This page documents various aspects of image handling in Apache FOP. Older design documentation can be found here: http://xmlgraphics.apache.org/fop/design/images.html
Current status
Some of the content below is slightly outdated. The "current problems" are mostly "past problems" now that the image loader framework has been introduced in XML Graphics Commons. Performance and memory consumption have improved as expected. However, image handling is still done differently in the various renderers. For example, Barcode4J currently makes calls against the Graphics2DAdapter interface, the ImageAdapter interface and the PSRenderer class, and it can still use the fallback via SVG. The coupling is too high. The PDFRenderer also still takes a slightly different approach to image handling than, say, the PSRenderer. Now, with the new intermediate format, all the code that depends directly on the Renderer interface becomes an obstacle to code reuse.
Unification of image handling is being worked on as part of the implementation of the new intermediate format (AreaTreeIntermediateXml/NewDesign). The proposal is described on the ImageSupport ImageHandler page.
Current problems
Certain renderers can embed images directly (JPEG, EPS and certain TIFF subformats, for example) but other renderers still require a decoded bitmap image. Currently, the cache only provides the first requested variant of an image. If the PDF renderer renders an FO file with a JPEG image and the same document is then rendered with the Java2DRenderer, there is a problem because JPEGImage loaded only the original data and did not decode the JPEG image.
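To make the two variants concrete, here is a minimal sketch that uses only the JDK's ImageIO, not FOP's own image classes. The names JpegVariants, sampleJpeg, rawVariant and decodedVariant are hypothetical. It produces the undecoded JPEG bytes that a renderer with 1:1 embedding would want, and the decoded bitmap that a renderer such as Java2D needs; a cache that keeps only one of the two cannot serve both.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical helper, not part of FOP: two variants of the same JPEG image. */
final class JpegVariants {

    static {
        // Keep ImageIO's stream cache in memory so no temp files are needed.
        ImageIO.setUseCache(false);
    }

    /** Builds a small JPEG file in memory so the example needs no external resource. */
    static byte[] sampleJpeg(int width, int height) {
        BufferedImage source = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(source, "jpg", out)) {
                throw new IllegalStateException("No JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Variant for 1:1 embedding: the undecoded file content, passed through as is. */
    static byte[] rawVariant(byte[] jpegFile) {
        return jpegFile.clone();
    }

    /** Variant for bitmap-only renderers: the decoded pixels. */
    static BufferedImage decodedVariant(byte[] jpegFile) {
        try {
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(jpegFile));
            if (image == null) {
                throw new IllegalArgumentException("No reader could decode the data");
            }
            return image;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = sampleJpeg(8, 4);
        byte[] raw = rawVariant(file);
        BufferedImage bitmap = decodedVariant(file);
        System.out.println("raw bytes: " + raw.length);
        System.out.println("decoded: " + bitmap.getWidth() + "x" + bitmap.getHeight());
    }
}
```

The raw variant is cheap to produce but useless to a renderer that paints pixels, which is why both variants may have to be available for the same image.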
Another problem is with color spaces. Details here.
Format support matrix
The following matrix tries to show all the possible combinations. A more graphical view of the whole thing can be seen here (SVG, 45KB).
JPEG
fo:external-graphic only
| Renderer | required/preferred variant | Comments |
| PDF | 1:1 embedding | |
| PostScript | 1:1 embedding | Requires PostScript Level 2 |
| Java2D | decoded bitmap | |
| PCL | decoded bitmap | |
| AFP | decoded bitmap | |
| SVG | referenced or RFC 2397 data URL | |
| RTF | 1:1 embedding | 1:1 embedding through FOP's own code. No support for decoding JPEG images through an image library yet. |
PNG
fo:external-graphic only
| Renderer | required/preferred variant | Comments |
| PDF | decoded bitmap, possibly 1:1 embedding | |
| PostScript | decoded bitmap, possibly 1:1 embedding for PS Level 3 | |
| Java2D | decoded bitmap | |
| PCL | decoded bitmap | |
| AFP | decoded bitmap | |
| SVG | referenced or RFC 2397 data URL | |
| RTF | 1:1 embedding | |
BMP
fo:external-graphic only
| Renderer | required/preferred variant | Comments |
| PDF | decoded bitmap | |
| PostScript | decoded bitmap | |
| Java2D | decoded bitmap | |
| PCL | decoded bitmap | |
| AFP | decoded bitmap | |
| SVG | referenced or RFC 2397 data URL | |
| RTF | 1:1 embedding | |
GIF
fo:external-graphic only
| Renderer | required/preferred variant | Comments |
| PDF | decoded bitmap | |
| PostScript | decoded bitmap | |
| Java2D | decoded bitmap | |
| PCL | decoded bitmap | |
| AFP | decoded bitmap | |
| SVG | referenced or RFC 2397 data URL | |
| RTF | decoded bitmap | |
TIFF
fo:external-graphic only
| Renderer | required/preferred variant | Comments |
| PDF | decoded bitmap | 1:1 embedding for CCITT encoded images |
| PostScript | decoded bitmap | 1:1 embedding for CCITT encoded images (NYI) |
| Java2D | decoded bitmap | |
| PCL | decoded bitmap | |
| AFP | 1:1 embedding for CCITT encoded images | |
| SVG | referenced or RFC 2397 data URL | |
| RTF | decoded bitmap | 1:1 embedding with the help of Batik's TIFF codec plus FOP's own code. |
SVG
fo:instream-foreign-object and fo:external-graphic
| Renderer | required/preferred variant | Comments |
| PDF | native conversion with Batik | |
| PostScript | native conversion with Batik | |
| Java2D | native conversion with Batik | |
| PCL | conversion to bitmap with Batik | HP/GL Graphics2D implementation only for the simplest of SVGs available |
| AFP | conversion to bitmap with Batik | GOCA implementation in the works |
| SVG | referenced or embedded | |
| RTF | conversion to bitmap with Batik | |
For output formats (like PCL and RTF) for which no native conversion is available, we need an alternative that provides the SVG as a bitmap image. This is currently implemented in AbstractGenericSVGHandler and, for RTF, in SVGConverter.
For PDF, it would be interesting to have a native picture painted into a Form XObject so such an image can be preprocessed and more easily reused. The difficulty there lies in features like links, which would need to be handled separately since they are not part of a Form object.
Similarly for PostScript, the SVG could be rendered as an EPS file which could be reused within the document.
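The bitmap fallback mentioned above for output formats such as PCL and RTF can be sketched with plain Java2D. This is an illustration only: the VectorPainter interface and the BitmapFallback class are hypothetical stand-ins, not the code in AbstractGenericSVGHandler or SVGConverter, and Batik is replaced by a trivial painter so that the example runs on the JDK alone.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Hypothetical stand-in for a vector graphic that can paint itself onto a Graphics2D. */
interface VectorPainter {
    /** Paints the graphic so that it fills the given area. */
    void paint(Graphics2D g2d, Rectangle2D area);
}

final class BitmapFallback {

    /** Rasterizes the painter; widthPt and heightPt are given in points (1/72 inch). */
    static BufferedImage rasterize(VectorPainter painter, double widthPt, double heightPt, int dpi) {
        int widthPx = (int) Math.ceil(widthPt * dpi / 72.0);
        int heightPx = (int) Math.ceil(heightPt * dpi / 72.0);
        BufferedImage image = new BufferedImage(widthPx, heightPx, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = image.createGraphics();
        try {
            g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    RenderingHints.VALUE_ANTIALIAS_ON);
            painter.paint(g2d, new Rectangle2D.Double(0, 0, widthPx, heightPx));
        } finally {
            g2d.dispose(); // always release the graphics context
        }
        return image;
    }

    public static void main(String[] args) {
        // A trivial painter standing in for an SVG rendered through Batik.
        VectorPainter redBox = (g2d, area) -> {
            g2d.setColor(Color.RED);
            g2d.fill(area);
        };
        BufferedImage bitmap = rasterize(redBox, 72, 36, 300);
        System.out.println(bitmap.getWidth() + "x" + bitmap.getHeight());
    }
}
```

The target resolution is a parameter because a bitmap fallback trades file size against print quality, and that trade-off differs per output format.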
EPS
fo:external-graphic only
| Renderer | required/preferred variant | Comments |
| PDF | embedded | PDF support is deprecated and not supported by newer Acrobat Readers |
| PostScript | embedded | |
| Java2D | not supported | |
| PCL | not supported | |
| SVG | not supported | |
| RTF | not supported | |
If we ever have a PostScript interpreter available to FOP, we can support EPS images for other output formats. An alternative could be to extract the TIFF previews provided by certain EPS images, but that case is better solved by using a more suitable image format.
The FOray project has the beginnings of a PostScript interpreter with a proof-of-concept implementation for rendering graphics. But making it usable would take a lot of work.
MathML
fo:instream-foreign-object and fo:external-graphic
MathML is internally converted to SVG in the MathML extension and subsequently handled as such. See the SVG section for details; the same problems apply. The alternative is to render MathML directly using Java2D.
One small issue here: a math expression usually has a baseline. This baseline should be aligned with the FO baseline.
Barcode4J
fo:instream-foreign-object only
| Renderer | required/preferred variant | Comments |
| PDF | painted using Java2D or internally converted to SVG | |
| PostScript | internally converted to EPS | |
| Java2D | direct painting to Graphics2D | |
| PCL | direct painting to Graphics2D | |
| AFP | direct painting to Graphics2D | See Bugzilla #41995 for an alternative using BCOCA |
| SVG | internally converted to SVG (NYI) | |
| RTF | internally converted to bitmap | |
The new FOP extension is available since Barcode4J 2.0alpha1.
Other foreign XML formats
The easiest way is to convert to SVG internally and let the renderers handle that format. Examples for this section: Example plan extension, JCharts support etc.
It is better to have those extensions work directly on Java2D, which bypasses Batik and speeds things up.
Requirements for the whole solution
- Extensions which support foreign XML formats should be able to convert their content at least to SVG. Generating bitmaps is also desirable so output formats like RTF can also be supported. Rendering to Java2D may be preferred to SVG as it can reduce some overhead.
- Renderers need to expose APIs to output directly supported formats other than bitmap formats. EPS for PostScript, Graphics2D for at least Java2DRenderer but possibly also for PDF and PS. See Graphics2DImagePainter/Graphics2DAdapter as an example for a solution which is already implemented for some renderers.
- Different renderers support different source formats/flavors for the images to be embedded. The current cache only supports exactly one flavor. If the same image is rendered with another renderer this might result in problems.
- The image cache should store one entry per URI and flavor.
- Examples of possible flavors are: raw/undecoded, RenderedImage/BufferedImage, Graphics2DImagePainter, EPS, XML (SVG, MathML...), RFC2396 URL, etc.
- Renderers would provide a prioritized list of supported/preferred flavors. The image package would then do necessary decoding/conversions and deliver the best flavor it can deliver in a particular case.
- During layout only the image dimensions need to be determined so the image can be properly placed. The actual image data only needs to be available during rendering. Ideally, the InputStream to load the dimensions from should be available to the component fully loading the image later on to avoid additional round-trips to fetch the image. Should additional flavors be needed, the InputStream can probably be reopened.
- To handle all kinds of formats, we may need a special PushBackInputStream which supports arbitrarily sized buffers to reset a stream to position 0. PNG is a case where the resolution information of an image is not guaranteed to be within the first 4KB of the file. See also here. The alternative (now chosen) is to use ImageIO's ImageInputStream which allows caching of content already read either in memory or in a temp file. Specialized implementations (like a ThresholdImageInputStream, switching from memory to file when a certain limit is reached) could be implemented if optimizations are needed.
- When someone works with the XML-based intermediate format to represent the area tree, layout and rendering might happen in different VMs. The actual image data might then never be loaded at all, but the InputStream still needs to be closed properly!
- We may have to make some distinction between fast and slow connections. If a resource is loaded from a local file, simply reopening the stream could be faster because we can rely on the operating system's cache. For resources loaded over the (Inter)net, local buffering makes sense, be that in memory or even in a temporary file (as ImageIO likes to do).
- All the little modules should be dynamically registerable. The current hard-coding in ImageFactory is bad.
- We need transparent support for GZIPped content (like SVGZ).
- Support baseline adjustments (for MathML).
- Optional: Some people told us in the past that they do dynamic image generation (for charts and such). We told them to implement this in a servlet, but it could be worthwhile (if easily implemented) to let a plug-in generate a dynamic image based on a given URI in some flavor (SVG, bitmap...).
- Optional: If possible, the package should be able to change between multiple implementations of the same document format while parsing. We've had cases where ImageIO gave better results than our internal codecs and vice versa.
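The requirements on flavors, the cache and dynamic registration can be sketched together as follows. This is a minimal illustration under stated assumptions, not the design that was eventually implemented: Flavor, ImageKey, FlavorAwareCache and registerLoader are hypothetical names, loaders are plain functions from a URI to an object, and strings stand in for real image data.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical flavor names, loosely following the examples in the list above. */
enum Flavor { RAW, BUFFERED_IMAGE, GRAPHICS2D_PAINTER, EPS, XML }

/** Cache key: one entry per URI and flavor. */
final class ImageKey {
    private final String uri;
    private final Flavor flavor;

    ImageKey(String uri, Flavor flavor) {
        this.uri = Objects.requireNonNull(uri);
        this.flavor = Objects.requireNonNull(flavor);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ImageKey)) {
            return false;
        }
        ImageKey key = (ImageKey) other;
        return uri.equals(key.uri) && flavor == key.flavor;
    }

    @Override
    public int hashCode() {
        return Objects.hash(uri, flavor);
    }
}

final class FlavorAwareCache {
    private final Map<ImageKey, Object> entries = new HashMap<>();
    private final Map<Flavor, Function<String, Object>> loaders = new EnumMap<>(Flavor.class);

    /** Registers, at run time, a loader that can produce the given flavor for a URI. */
    void registerLoader(Flavor flavor, Function<String, Object> loader) {
        loaders.put(flavor, loader);
    }

    /** Returns the image in the first flavor of the renderer's list that can be produced. */
    Object get(String uri, List<Flavor> preferredFlavors) {
        for (Flavor flavor : preferredFlavors) {
            Function<String, Object> loader = loaders.get(flavor);
            if (loader != null) {
                return entries.computeIfAbsent(new ImageKey(uri, flavor),
                        key -> loader.apply(uri));
            }
        }
        throw new IllegalArgumentException("No supported flavor for " + uri);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        FlavorAwareCache cache = new FlavorAwareCache();
        // Strings stand in for real image objects to keep the sketch short.
        cache.registerLoader(Flavor.RAW, uri -> "raw:" + uri);
        cache.registerLoader(Flavor.BUFFERED_IMAGE, uri -> "bitmap:" + uri);

        // A renderer that embeds 1:1 prefers raw data; a bitmap-only renderer does not list it.
        System.out.println(cache.get("img.jpg", Arrays.asList(Flavor.RAW, Flavor.BUFFERED_IMAGE)));
        System.out.println(cache.get("img.jpg", Arrays.asList(Flavor.BUFFERED_IMAGE)));
        System.out.println(cache.size() + " cache entries for one URI");
    }
}
```

Because the key includes the flavor, rendering the same document with a second renderer adds a second entry instead of returning a variant that the renderer cannot use.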
Random thoughts
It might be good to separate the image dimension object for the layout process from the actual decoded image, thus providing separate caches for both. [DONE]
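A minimal sketch of such a separate dimension cache, assuming the JDK's ImageIO is used to read the image header: DimensionPreloader, readSize, sizeOf and samplePng are hypothetical names and do not reflect the classes in XML Graphics Commons. Only the width and height are read; the pixel data is not decoded, and the stream is closed in every case.

```java
import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Hypothetical layout-side cache that stores image sizes only. */
final class DimensionPreloader {

    static {
        ImageIO.setUseCache(false); // in-memory stream cache, no temp files
    }

    private final Map<String, Dimension> sizeCache = new ConcurrentHashMap<>();

    /** Reads only what layout needs: width and height of the first image in the stream. */
    static Dimension readSize(InputStream in) {
        try (ImageInputStream iis = new MemoryCacheImageInputStream(in)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IllegalArgumentException("Unsupported image format");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis, true, true); // forward-only, ignore metadata
                return new Dimension(reader.getWidth(0), reader.getHeight(0));
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the cached size for the URI, opening the stream only on a cache miss. */
    Dimension sizeOf(String uri, Supplier<InputStream> opener) {
        return sizeCache.computeIfAbsent(uri, key -> {
            try (InputStream in = opener.get()) { // closed even if the image is never rendered
                return readSize(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Builds a PNG in memory so the example needs no external file. */
    static byte[] samplePng(int width, int height) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB), "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] png = samplePng(640, 480);
        DimensionPreloader preloader = new DimensionPreloader();
        Dimension size = preloader.sizeOf("sample.png", () -> new ByteArrayInputStream(png));
        System.out.println(size.width + "x" + size.height);
    }
}
```

The decoded image would live in a second cache that is only filled during rendering, so a layout-only run never pays for decoding.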
For high-volume PostScript environments (or PPML) it might be worthwhile not to fully load images at all but to simply insert resource placeholders (DSC comment %%IncludeResource) into the stream. This would speed up the rendering process considerably for environments where such an approach is possible. [DONE]
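As a rough illustration of the placeholder idea, the sketch below writes a DSC %%IncludeResource comment instead of the image data. The class PsResourcePlaceholders, the resource type "form" and the resource name "logo" are assumptions chosen for the example; the comments that FOP actually emits may differ.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical page writer that references images instead of embedding them. */
final class PsResourcePlaceholders {
    private final StringBuilder page = new StringBuilder();
    private final Set<String> needed = new LinkedHashSet<>();

    /** Writes a DSC placeholder; a later processing step is expected to supply the resource. */
    void includeForm(String resourceName) {
        needed.add(resourceName);
        page.append("%%IncludeResource: form ").append(resourceName).append('\n');
    }

    String pageContent() {
        return page.toString();
    }

    /** Distinct resources referenced so far, in order of first use. */
    Set<String> neededResources() {
        return needed;
    }

    public static void main(String[] args) {
        PsResourcePlaceholders ps = new PsResourcePlaceholders();
        ps.includeForm("logo");
        ps.includeForm("logo"); // the same image used twice is still one resource
        System.out.print(ps.pageContent());
        System.out.println(ps.neededResources().size() + " resource(s) needed");
    }
}
```

The image itself is never opened in this path, which is where the speed-up for high-volume environments comes from.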
If it is known which renderer the document will be rendered to during the layout stage, the images could be loaded in a separate thread after the dimensions have been determined while the actual layout continues. [ignored for now]
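The idea of loading images in the background while layout continues could look roughly like this. It is a sketch under stated assumptions, not something FOP implements: BackgroundImageLoader, prefetch and get are hypothetical names, and the loader function stands in for whatever decodes an image for the target renderer.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper that loads images on worker threads while layout continues. */
final class BackgroundImageLoader implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();
    private final Function<String, Object> loader;

    BackgroundImageLoader(Function<String, Object> loader) {
        this.loader = loader;
    }

    /** Called by layout once the dimensions are known; returns immediately. */
    void prefetch(String uri) {
        pending.computeIfAbsent(uri,
                key -> CompletableFuture.supplyAsync(() -> loader.apply(key), pool));
    }

    /** Called by the renderer; blocks only if loading has not finished yet. */
    Object get(String uri) {
        prefetch(uri); // no-op if the image was already requested
        return pending.get(uri).join();
    }

    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        try (BackgroundImageLoader images = new BackgroundImageLoader(uri -> "decoded:" + uri)) {
            images.prefetch("chart.png"); // layout: dimensions known, start loading
            // ... layout would continue here ...
            System.out.println(images.get("chart.png")); // rendering: wait only if needed
        }
    }
}
```

Each URI is loaded at most once, and the renderer never sees a half-loaded image because it always goes through the future.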
We need to get rid of our byte array approach for storing decoded images. This should be done entirely using Java2D/AWT means, i.e. RenderedImage/BufferedImage. [DONE]