Unifying URI Resolution

FOP currently has a multitude of pathways for resolving a URI, using java.net.URI, java.net.URL, java.lang.String, javax.xml.transform.Source and others. It is also unclear which base directory URIs are resolved against. org.apache.fop.apps.FOURIResolver does much of the resolution, and (I believe) a lot more than it should.

Why do we care?

If we want FOP to operate on a cloud platform, where there are multi-tenant requirements and both file access and URI resolution are strictly controlled, these issues need to be addressed. File access is not the only issue: a client shouldn't have to pass a java.io.File as a resource. Whether or not a File is being read is irrelevant; an InputStream is the only requirement for loading a resource into memory. The URI resolver has to convert a java.net.URI to a java.io.InputStream; how it does so isn't FOP's concern. The converse is true of data being written: a java.io.OutputStream is the only requirement. This would give the client much more control over FOP and would be a big step towards making it eligible for integration in the cloud. See also URIResolution/IORestrictedEnvironment.

Major Issues

The biggest issue here is maintaining backward compatibility. I'll post more information about this as I go along, but the main consequence is that FOP will be much less forgiving of user mistakes: a URI will be resolved against a single base, which the client defines. Note that the backward compatibility issues only affect those using relative URIs, since their resolution is ambiguous, as will be discussed in the design.
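To make the ambiguity concrete, here is a minimal sketch using only java.net.URI. The class name RelativeUriDemo and the example.org URIs are made up for illustration and are not part of FOP.

```java
import java.net.URI;

/** Illustration only: how one relative reference depends on the chosen base. */
class RelativeUriDemo {

    /** Resolves a (possibly relative) reference against one explicit base URI. */
    static URI resolve(URI base, String reference) {
        // URI.resolve(String) parses the reference and resolves it against
        // this URI; an absolute reference is returned unchanged.
        return base.resolve(reference);
    }
}
```

With this, resolve(URI.create("http://example.org/conf/"), "fonts/a.ttf") and resolve(URI.create("http://example.org/docs/"), "fonts/a.ttf") name two different resources, while an absolute reference such as "http://cdn.example.org/a.ttf" resolves to itself whatever the base. That is why only users of relative URIs should be affected.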

Design

The current plan is for the user to implement the following interface:

    public interface ResourceResolver {
        Resource getResource(URI uri) throws IOException;

        OutputStream getOutputStream(URI uri) throws IOException;
    }

NB: The Resource class extends FilterInputStream and wraps both the InputStream and its "type", which is the MIME type of the resource being retrieved. We will discuss why a MIME type needs to be bound to the resource later.
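As a rough sketch of how this could look from the client side, the block below spells out a Resource class, the two-method interface (under the name ResourceResolver used later on this page) and a resolver that never touches the file system. InMemoryResolver, the getType() accessor and the "mem:" URIs are assumptions made for the example, not the final API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Sketch of the Resource wrapper: an InputStream plus its MIME type. */
class Resource extends FilterInputStream {
    private final String type;

    Resource(String type, InputStream in) {
        super(in);
        this.type = type;
    }

    /** Assumed accessor name for the MIME type. */
    String getType() {
        return type;
    }
}

/** The two-method interface proposed above. */
interface ResourceResolver {
    Resource getResource(URI uri) throws IOException;

    OutputStream getOutputStream(URI uri) throws IOException;
}

/** Hypothetical resolver backed by maps; not thread-safe, illustration only. */
class InMemoryResolver implements ResourceResolver {
    private final Map<URI, byte[]> content = new HashMap<>();
    private final Map<URI, String> types = new HashMap<>();

    /** Registers a resource together with the MIME type the client chooses. */
    void put(URI uri, String mimeType, byte[] data) {
        content.put(uri, data.clone());
        types.put(uri, mimeType);
    }

    @Override
    public Resource getResource(URI uri) throws IOException {
        byte[] data = content.get(uri);
        if (data == null) {
            throw new FileNotFoundException("No resource registered for " + uri);
        }
        return new Resource(types.get(uri), new ByteArrayInputStream(data));
    }

    @Override
    public OutputStream getOutputStream(URI uri) {
        // Bytes are buffered in memory and published under the URI on close().
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                content.put(uri, toByteArray());
                types.putIfAbsent(uri, "application/octet-stream");
            }
        };
    }
}
```

FOP would only ever see the URI, the stream and the MIME type; whether the bytes come from a map, a database or a file would be entirely the client's business.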

Implementing this interface gives FOP a mechanism for resolving URIs. That leaves the question of where to take the base URI from. Intuitively you'd expect any relative URI within the FO to be resolved against the URI of the FO itself, and likewise for the fop.xconf. However, we don't have access to the URI of the FO, because FOP implements org.xml.sax.ContentHandler, so that's a no-go. Using the URI within the fop.xconf seems the obvious choice, but what about when no fop.xconf is given?

Internally, a class will wrap both the base URI and the implementation of the interface above, and will be passed to the constructor of the FOUserAgent. This maintains immutability for a FOP run (since the user agent is only valid for a single run) and keeps the API fairly simple.
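One possible shape for that wrapper is sketched below. The name BoundResolver and its methods are hypothetical, and Resource and ResourceResolver are repeated as minimal stand-ins so that the block compiles on its own. How the object reaches the FOUserAgent is deliberately left out.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.Objects;

/** Minimal stand-in for the Resource class described earlier. */
class Resource extends FilterInputStream {
    private final String type;

    Resource(String type, InputStream in) {
        super(in);
        this.type = type;
    }

    String getType() {
        return type;
    }
}

/** Minimal stand-in for the client-implemented interface. */
interface ResourceResolver {
    Resource getResource(URI uri) throws IOException;

    OutputStream getOutputStream(URI uri) throws IOException;
}

/** Hypothetical immutable pairing of one base URI with one resolver. */
final class BoundResolver {
    private final URI baseUri;
    private final ResourceResolver delegate;

    BoundResolver(URI baseUri, ResourceResolver delegate) {
        this.baseUri = Objects.requireNonNull(baseUri, "baseUri");
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    URI getBaseUri() {
        return baseUri;
    }

    /** Resolves the reference against the single base, then delegates. */
    Resource getResource(String reference) throws IOException {
        return delegate.getResource(baseUri.resolve(reference));
    }

    /** Same rule for output: one base, one resolver. */
    OutputStream getOutputStream(String reference) throws IOException {
        return delegate.getOutputStream(baseUri.resolve(reference));
    }
}
```

Because both fields are final and there are no setters, the base URI and resolver cannot change once the object has been handed over, which is the immutability property described above.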

MIME type resolution

Currently FOP determines the file type from the file-name suffix. This is fine on the command line, but it gets more involved with virtual file systems, databases or other more abstract resource storage and acquisition mechanisms. We therefore believe the MIME type should be resolved by the ResourceResolver; users should be able to control the MIME type of a resource if they so wish. Obviously, the default resolver (especially for the CLI) will still allow MIME types to be resolved from the file-name suffix, but the idea is to give clients more control over the resource-loading mechanism.
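The suffix-based fallback for a default resolver might look roughly like the sketch below. The class SuffixMimeTypes, its small suffix table and the "application/octet-stream" fallback are assumptions for illustration; they do not reflect FOP's actual lookup.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical suffix-to-MIME-type lookup for a default (CLI) resolver. */
final class SuffixMimeTypes {
    static final String UNKNOWN = "application/octet-stream";

    private static final Map<String, String> BY_SUFFIX = new HashMap<>();
    static {
        // Illustrative subset only.
        BY_SUFFIX.put("png", "image/png");
        BY_SUFFIX.put("jpg", "image/jpeg");
        BY_SUFFIX.put("svg", "image/svg+xml");
        BY_SUFFIX.put("pdf", "application/pdf");
    }

    private SuffixMimeTypes() {
    }

    /** Guesses a MIME type from the suffix of the last path segment. */
    static String forUri(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return UNKNOWN; // opaque URI such as "mailto:..." has no path
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must belong to the last segment and be followed by a suffix.
        if (dot <= slash || dot == path.length() - 1) {
            return UNKNOWN;
        }
        String suffix = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_SUFFIX.getOrDefault(suffix, UNKNOWN);
    }
}
```

A URI such as "db://assets/12345" carries no suffix at all, so a lookup like this cannot do better than the fallback. That is exactly the case where a client-supplied resolver, which knows what it stored, should be the one to state the MIME type.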

Please feel free to discuss any of the changes I'm suggesting above; there may be use cases I'm not considering. This stuff is very tricky, and since accessing resources is fundamental to FOP rendering documents, I have no doubt there are going to be objections.
