Best fit page breaking
The total file line and pagebreaking algorithm by Donald Knuth is easily perceived as a monolith. Take it or leave it. Vincent Hennebert suggested a way to open the algorithm up for best fit page layout. One only collects feasible page breakpoints for the next page. When all are collected (the content between the only active node and the current breakpoint is too large to squeeze on the page, and the active node is deactivated), one determines the best pagebreak. Then one resets the algorithm to the chosen page breakpoint, which is the only active node for the layout calculation of the next page.
Why is a best fit page breaking algorithm required? The total fit algorithm does not easily allow one to use different inline progression dimensions for different pages. Recently some ideas to do this in a total fit approach have been discussed, see the page PageAndLineBreaking. But these ideas are rather complicated and far from practical implementation.
Why do we not write a simple best fit page breaking algorithm? Because we do not want to discard the total fit algorithm. Ideally the best fit and total fit algorithms coexist in FOP's code base. And they have a common design so that they share much code.
In FOP page and line breaking is done as follows. The PageSequenceLM starts the collection of page breakpoints by invoking its method GetNextKnuthElements. This is repeated recursively for all descendant LMs. Each LM returns to its parent LM the elements it and its subtree have collected. The BlockLevelLMs return block elements. The LineLMs collect the inline and block elements from their descendant InlineLevelLMs and BlockLevelLMs. For sequences of inline elements, i.e. paragraphs, the line breaker caculates the best line breaks in Knuth's total fit algorithm. Lines and block elements are then returned to the parent.
The collection process is finished when this method has returned to the PageSequenceLM once or multiple times, and the PSLM is finished. Then the page breaker kicks in and calculates the best page breaks in Knuth's total fit algorithm.
To realize the idea of best fit page breaking the page breaker must be able to interrupt the element collection process more often. It can then evaluate the collected block elements and see if the best fit for the next page break can already be determined. This requires a communication channel between the element collection process, which may have deeply descended the tree of LMs, and the page breaker.
This communication process can be achieved as follows. The method GetNextKnuthElements should not return the collected elements to the parent LM, but hand them over to the page breaker. In the current set up the page breaker performs a loop over the legal breakpoints. Actually this is a loop over all elements, and the iteration ends quickly if the element is not a legal page break. In the new situation this loop should be interruptible and resumable. When the page breaker receives a sequence of one or more elements, it performs the iterations over these elements. If it does not have enough elements to determine the best pagebreak, the collection process of the LMs continues. If it does find the best pagebreak, and if the ipd of the next page is different, the collection process is reset to the chose pagebreak. If the ipd of the next page is the same, the deselected feasible page breaks are removed, but the already collected elements can be retained. The selected page breakpoint becomes the only active node, the existing element list can be evaluated by the page breaker, and then the collection process can be resumed at the page break to which it was reset or at the point where it left off.
In this process paragraphs are always evaluated until the paragraph end, and the results contributed to the page breaker. When the page break is in the middle of a paragraph and the ipd of the next page is different, its linebreaks need to be recalculated from the page break. When the ipd of the next page is different, this procedure does not guarantee that it finds the best layout for the paragraph. For that purpose it should find the best linebreaks up to the last line on the page, not the best linebreaks for the whole paragraph. But this procedure is rather simple, and good enough.
In principle the linebreaking process need not be changed from the current situation. But the BlockLevelLMs in inline content expect to be able to return their elements to a breaker, which for inline content is probably the LineLM. The order of inline and block elements needs to be retained. Therefore, it may be required to redesign the inline element collection process along the same lines as the top-level block element collection process.
Best or total fit? This can be a parameter of the page breaker. In total fit mode the pagebreaker is the recipient of the elements as well. But it lets the collection process continue until it has the complete page sequence. Only then does it start the page breaking calculations.