This page contains information about the implementation of the "auto table layout" in Apache FOP.
Handling of Tables in FOP-2469
As soon as a table with table-layout="auto" is encountered, FOP has to determine the width of each of its columns. Patch 2469 introduces this capability using a preprocessing run (AutoLayoutDeterminationMode) over each such table and its content during FOP's rendering process.
Each column has a minimal width and a maximal width. It is up to FOP to determine these widths and (if the maximal width cannot be used) to find the optimal width (minimal<=optimal<=maximal).
- minimal width: if this width cannot be granted, FOP will most likely encounter an overflow (maximal miminmal width of all contained table cells)
- maximal width: this is the total amount of width the content of any individual table cell requires if it would be rendered without any breaks (maximal width of all contained table cells)
- optimal width: this width valueis used during the redistribution of space and, eventually, it is the width which FOP will use to render the column
This process starts in TableLayoutManager's getNextKnuthElements(), which calls determineAutoLayoutWidths() on its TableContentLayoutManager if its table uses auto layout (or is nested in another auto layout element). In this method, FOP determines MinOptMax values for each column by obtaining the corresponding widths of the table cells which this column contains. Thus, each TableCellLayoutManager has to recursively ask all of its contained LayoutManagers (LMs) for these values.
Currently, most LMs simply return the corresponding values of their children. Notable exceptions are...
- TableLM (determination depends on whether the table uses a fixed or an automatic layout)
- TableCellLM (adds indentations/paddings to the children's values)
- TextLM (trivial solution: only determines the length of the longest KnuthBox - currently, break possibilities such as hyphenations are not considered)
- ExternalGraphicLM (returns the width of the graphic - currently only for InlineViewports, the only type of figure I encountered - someone should look into the other graphics as well)
During this determination, each contained LayoutManager has to create a list of its knuth elements which, as indicated by Jeremias, can be quite expensive - especially since they are explicitly thrown away afterwards/recreated during the actual rendering of the table to avoid any sideeffects from the auto-layout preprocessing. The only information left behind are MinOptMax objects for encountered table columns (saved in the table's TableContentLM which remains intact and is reused) and refIPD values explicitly set for LayoutContexts.
In the following, simple tables are used as an example to explain the basic algorithm of determining the optimal widths of columns in an auto table. Afterwards, more specific issues such as spanned columns and nested tables are discussed.
Reading convention:
- w("abc") = width of the string "abc"
- LM as abbreviation for LayoutManager
- PGU as abbreviation for PrimaryGridUnit
- auto table as abbreviation a table with table-layout='auto' and a width based on the surrounding container's IPD (e.g., width='50%' instead of width='5cm')
- fixed table as abbreviation a table with table-layout='fixed' and a width based on the surrounding container's IPD
Simple Tables
Without Breaks
a |
b |
c |
a |
ab |
a b c |
The first row is checked and the width of each column is set to the width of its first table cell, thus: min/opt/max = w(character). During the determination of a cell's width, its PGU is inspected in the TableContentLM's setLength(PGU) to obtain the minimal (minIPD) and maximal (ipd) width. Afterwards, the second row is inspected.
- column 1: min/opt/max are identical for the second table-cell and, therefore, the column width remains the same
- column 2: since this table cell contains two consecutive characters, its minimum increases to their width. Again, the maximum width is identical and this column is set to min/opt/max = w(two characters)
- column 3: for this table cell, the minimum value is the width of the widest letter since FOP can insert a linebreak after each individual letter. The maximum and optimal value, however, are w("a b c")
After the table was inspected, each column has a MinOptMax instance representing its viable widths. Since the sum of all max values is less than the available space, the opt value (initialized as max) does not need to be changed. All columns are rendered using their respective opt value (in this case, opt==max).
With Breaks
a |
b |
c |
a |
abcdefghijklmn |
a b c d |
The first row is checked and the width of each column is set to the width of its first table cell, thus: min/opt/max = w(character). Afterwards, the second row is inspected.
- column 1: column width remains the same
- column 2: since this table cell contains consecutive characters, its minimum increases to their width. Again, the maximum width is identical and this column is set to min/opt/max = w('abcdefghijklmn')
- column 3: for this table cell, the
- min = w(widest character)
- max = w("a b c d")
- opt (for now) = w("a b c d")
Again, each column has a MinOptMax instance representing its viable widths. In this case, however, the sum of max values is greater than the available space (checked in computeOptimalColumnWidthsForAutoLayout()). The available space has to redistributed among the columns (cf. ColumnSetup's redistribute()).
Out of a list of all columns, FOP removes those which cannot be shrinked. At first, FOP excludes any columns containing only consecutive letters (i.e. min == max, in the example the first two columns) and columns which have a static width. Then, the remaining columns are set to math.max(column's min value, column's max value * shrink factor) - thus, they are either shrinked as far as possible or set to their min value. Opt values are assigned to these columns (iteratively) until the sum of their opt values is <= the available space.
In the example, columns a and b use their max(=min=opt) value, while column c uses its opt value which was shrinked (min = w(character), opt = w(2 characters and a space), max = ("a b c d")). As a result, column c is rendered with a linebreak.
Note: Columns with a static width (e.g. "2cm") are never resized.
Not Enough Space
In case the sum of all min values is greater than the available space, a warning is displayed ('columnsInAutoTableTooWide', produced in computeOptimalColumnWidthsForAutoLayout()). FOP sets all columns to their respective min value and will most probably produce an overflow.
Cells Spanning Rows and Columns
While tables containing table cells employing row-span values are not a problem (please have a look at FOP-2451 for a related patch), col-span table cells have to be taken into account.
FOP produces a PrimaryGridUnit (PGU) for each table cell. If this table cell happens to span y rows and x columns, the corresponding x*y-1 GridUnits are created to fill the model of the table with placeholders. Since FOP inspects a table from left to right (using a TableRowIterator) and from top to bottom (using a lot of them), the PGU is always encountered first during the FOP's column width determination. Spanned rows do not have an impact on the column's width since (after the content of the PGU was already processed) they do not carry additional content. For spanned columns, on the other hand, the following three cases exist.
Balanced Expansion
a |
b |
c |
abc abc abc |
|
a b c |
In case a table cell spans several columns, its MinOptMax determination is different. In the above example, the first row leads to columns with MinOptMax values of 15 points (static column), w(b), and w(c), respectively.
In the second row, FOP encounters a cell which has a min of w("abc") and a max of w("abc abc abc"), all of which are greater than the max widths of the spanned columns (=w(a) + w(b) + indentations between these spanned columns). Since the spanned columns are supposed to display the content of the spanning table cell as well, their potential widths need to be adapted. This is done when the corresponding PGU is processed in TableContentLM's setBaseLength(PGU).
First of all, the minimal spanned width (minSpanWidth := sum of min widths of all spanned columns) and the optimal spanned width (availableSpanWidth := sum of opt widths of all spanned columns) are calculated. Based on the dimension of the PGU's content, FOP already knows the spanning table cell's minimal width (minIPD := w("abc")) and its optimal width (ipd := w("abc abc abc")).
FOP determines that the spanned columns need to be widened since their minimal width (isNotSufficientForMinIPD := minSpanWidth < minIPD) and optimal width (isNotSufficientForIPD := availableSpanWidth < ipd) are insufficient. From the spanned columns (a and b in the example), FOP selects those which are not static (only b) and, thus, can be widened. If no column can be found, a warning is displayed.
If the minimum width is unavailable, all selected table columns' minimal width values (and their opt/max values) are increased (each gets a proportional cut of the missing space, cf. increaseMinimumWidthOfSpannedColumns()) to avoid an overflow.
The space gained through this increase is counted towards the sum of opt widths of all spanned columns. If_ _this optimal width is still insufficient, the opt/max widths of the selected columns is increased even more (increaseOptimalWidthOfSpannedColumns()).
In the example, column b would be the only column to widen - it would end up with an optimal width of w("abc abc abc") - (w("a") + Indentations). Of course, these are only the optimal values. If the table's optimal width would be too big, the columns would be reduced as far as possible as described before. In this case, the spanned columns (only column b) would provide their new minimal width (w("abc") - (w("a") + Indentations)). Thus, the described injection of new min, opt and max widths ensures that the width of table cells spanning multiple columns is always distributed as equally as possible among the columns which are spanned. Also, this distribution approach works quite robust - a column may be spanned by an arbitrary number of cells in almost any configuration imaginable. One exception is described in the sections "Worst Case".
An example illustrating a complex case can be found in issue FOP-2450 (resize-all-but-static-spanned-columns.xml/pdf).
Special Case: requiresSecondRun
There is a special case for the distribution approach described above (at least in the current implementation of the patch - it may and should be fixed for increased performance). The approach described above extends columns which already have a MinOptMax instance assigned to them. This is not possible if the very first row of a table contains a table cell spanning multiple rows (see table in this section) - at this point, the necessary data structures are not initialized. Therefore, the current implementation explicitly checks for this case (TableContentLM's iterateOverTableRows(): is there PGU which spans multiple columns in the first row?), skips this PGU, and sets a flag (requiresSecondDeterminationRun = true;) informing the table's TableLM to start the determination again (twice the effort for this case!).
Potential improvement: instead of revisiting the whole table, simply go back to the first row (if it fulfills the special case) and recalculate this row only before finishing the determination.
abc abc abc |
|
a b c |
a |
b |
c |
Worst Case
In a complex table, table cells spanning over multiple columns may overlap not only one, but constantly over one or more columns - HTML tables ignore this case, so here is an ASCII example:
============= | abcde | c | ============= | a | abcde | ============= ^a ^b ^c
In this case, FOP has to cope with a column (b) for which no PrimaryGridUnit can be found. Since the width determination of table cells is based on PGUs, no exact determination is possible. In its current implementation, the patch seems to favor such columns over any other column if spanned columns have to be widened, which can lead to imbalanced expansions.
Nested Tables
So far, all scenarios only described the handling of isolated tables. However, since tables may be nested in other tables, FOP also has to cope with fixed tables in auto tables, auto tables in fixed tables and auto tables in auto tables (the fourth combination is already handled correctly).
Auto Table in Fixed Table
This case is quite trivial: FOP knows the exact dimensions of the surrounding table cell (which either has a static width or requires one or more table-column units). Thus, the LayoutContext used to determine the dimensions of the nested table specifies the maximal width (its refIPD) which the contained table is allowed to use. This value is taken for granted, as long as the surrounding fixed table is not contained in another element which itself requires a width determination of its content (i.e., if the fixed table is contained in another auto table in which case the LM’s hasAutoLayoutParent() would return true).
Fixed Table in Auto Table
If an auto table's table cell contains a fixed table, things get a little bit more complicated.
The surrounding auto table needs to know the min and max widths of its columns, i.e. of each individual table cell. Only then it will be able to determine its optimal column width which, in turn, is required to inform the nested table how much space is has available. Luckily, the determination of the minimal the nested table's TableLM can determine the minimal width independently. So, when the nested TableLM's getMinimumIPD() is invoked, it returns the following result as its minimal required width for a complete table (cf. getMinimumIPDforFixedLayout()):
- sum of all static column widths + number of dynamic columns * (highest minimal value of all dynamic columns)
As max value, ColumnSetup's computeOptimalColumnWidthsForAutoLayout() returns the maximal width value of the table to inform the surrounding table of the possible dimensions of the auto table. Thus, FOP knows both possible width values to find the optimal column widths.
Auto Table in Auto Table
Again, the surrounding table needs to know the required space for the contained one while the contained table needs to know how much space will be available. In this case, the minimal required width for the contained auto table is the following (cf. getMinimumIPDforAutoLayout()):
- sum of each column's minimal width value
During the inspection of the contained table, its LM’s hasAutoLayoutParent() would return true (since the surrounding table uses table-layout=auto). In this special case, the contained table's ColumnSetup, which is responsible of computing the optimal column width (cf. computeOptimalColumnWidthsForAutoLayout()), returns the maximal width value of the table to inform the surrounding table of the possible dimensions of the table. Thus, FOP knows both possible width values to find the optimal column widths.
To inform the contained table about the computed optimal column width, the contained table will later on, during FOP's actual rendering run, be called with a LayoutContext which knows the final refIPD of the column.
Some Random Ideas
Today, FOP works with the Knuth element model in both inline- and block-progression-directions. For the auto table layout, we can make use of the inline Knuth element lists generated for content in table-cells. Before the line-breaking is done all the elements are gathered. By finding the largest/widest non-breakable sequence in an element list we can determine the minimal width of a column. By determining the full (unbroken) length of each element list involved we know how wide a column would have to be to accomodate the content without breaking it. Ideally, that latter value is used for the column width but if all the columns together would make the table wider than it can be, the column widths have to be reduced, in an extreme situation, down to the minimum established by the former value. I don't want to suggest any details on how to determine what effective width a column should get. The only purpose of the above is to give an idea for an approach to implement auto table layout.
So, in order to determine the column widths, the element lists have to be available early. The TableRowIterator already supports preflying a full table. The problem starts with actually getting at the element lists. Currently, LineLayoutManager directly starts line-breaking in getNextKnuthElements() when the element lists are available. This would have to be split, so table layout code could access the element lists separately. ATM I'm unsure if block-level content in inlines pose a problem to this. That will have to be looked at closely. Anyway, it should be taken care that an element list doesn't have to be recreated later to do the line-breaking. The number of objects created for an element list is already high enough. Creating objects is expensive.
Another topic might be whether to fly over the whole table or only the first n rows to determine the column layout. To make it even more complicated, the table layout could be recalculated for each page which makes the restart mechanism mandatory that we will need when we want to implement support for differring available IPD among different pages.