This page describes a proposal for an intermediate XML format in which FOP's area tree can be serialized and reread for delayed rendering. I've been approached by two parties who would like to have such a feature in FOP.

As is indicated below, the currently implemented intermediate format needs a different approach. The design ideas and discussion are found on the AreaTreeIntermediateXml NewDesign page.

Requirements

Benefits

The intermediate format allows various manipulations of the laid-out document. People can do things that are not possible during the XSL-FO stage and that they would otherwise have to do by post-processing FOP's output, by which point some information has already been lost along the way (like the simple-page-master used by a page).

Possible Problems

Implementation notes

Additional ideas

Coding log

Feedback from users

2007-02-28: The intermediate format has been available for quite some time now. Users seem generally quite happy with the functionality. Two problems have been identified, however:

The original approach was tempting because it promised relatively quick results by reusing as much as possible. The main intention of the intermediate format is to have FOP process a document as far as possible, so that later, in the final step, the ultimate output format can be generated as quickly as possible.

The use case behind this: Imagine a high-volume document production system where a lot of documents come in during the day. They are formatted as they come in (so the CPU consumption can be distributed over the day). At some point the printing department decides to print the queued documents. An operator issues a command to generate a big print job containing all queued documents. The documents (saved in intermediate format) are concatenated and enriched with OMR marks or barcodes for automated packaging, along with other things like job info pages. Finally, they are run through a renderer to generate the desired output format (often PostScript, AFP or PCL). This final task has to be very fast so the operator doesn't have to wait too long for the print job to be available for printing. Of course, the same could be done working in the actual output format (i.e. directly in PostScript or AFP), but then the functionality would have to be implemented separately for each output format, whereas with the intermediate format it only has to be implemented once.
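The concatenation step of this use case could look roughly like the following. This is only a minimal sketch using the JDK's DOM API, not FOP code: the class `PrintJobAssembler`, the element names `job` and `page`, and the `job-seq` attribute are all hypothetical and stand in for whatever the real intermediate format defines. The sequence number is a placeholder for the kind of per-page enrichment (OMR marks, barcodes) mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrintJobAssembler {

    // Hypothetical names: the real intermediate format may use different ones.
    static final String JOB_ROOT = "job";
    static final String PAGE = "page";
    static final String SEQUENCE_ATTR = "job-seq";

    /** Copies every page of the queued documents, in order, into one job document. */
    public static Document assemble(List<String> queuedDocuments) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document job = builder.newDocument();
        Element root = job.createElement(JOB_ROOT);
        job.appendChild(root);

        int sequence = 0;
        for (String xml : queuedDocuments) {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList pages = doc.getElementsByTagName(PAGE);
            for (int i = 0; i < pages.getLength(); i++) {
                // importNode makes a deep copy owned by the job document.
                Element page = (Element) job.importNode(pages.item(i), true);
                // Stand-in for enrichment: number the page within the whole job.
                page.setAttribute(SEQUENCE_ATTR, String.valueOf(++sequence));
                root.appendChild(page);
            }
        }
        return job;
    }

    /** Serializes the assembled job so it can be handed to a renderer later. */
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The point of the sketch is that this kind of merging and marking is written once against the XML and is independent of whether the job is later rendered to PostScript, AFP or PCL.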

So, the idea now is to specify a new intermediate format which will cover the original requirements (as seen above) and fix the above two problems. The rough idea is to simplify the intermediate format. That will mean that we will need a different approach for the Renderers. In the short term the original renderers will remain untouched. Maybe at some point they can be replaced by the new counterparts which are expected to be smaller and easier to develop since they don't have to cover the same amount of functionality as the current ones. The current XMLRenderer will likely remain in place for the time being if only to help with unit testing the layout engine.