PropertyCache
Introduction
As FOP processes the input it receives from the javax.xml.transform.Transformer
, each startElement()
event it receives comes with a parameter attribute-list. Those XML attributes are converted into FO properties, passing through a PropertyList
, and subsequently bound to the generated FObj
as member variables (see also: PropertyHandling).
Even though the StaticPropertyList
, a Java object having two backing arrays whose length is equal to the total number of properties (= roughly two times Property[250]
), only has a life-span between startElement()
and endElement()
, the total number of FObj
instances generated by a document with significantly large fo:table}}s will be very high. Those objects and their members will basically all remain referenced until FOP's {{AreaTreeHandler
receives an endPageSequence()
event.
As a very rough example, for layout of an fo:table
with 15 explicit {{fo:table-column}}s, and 4000 explicit {{fo:table-row}}s FOP instantiates:
- 1
Table
- 15 {{TableColumn}}s
- 1
TableBody
- 4.000 {{TableRow}}s
- 60.000 {{TableCell}}s
- 60.000 {{Block}}s (a table-cell cannot be empty, when specified)
So, 124.017 objects will live in the Java heap, until the table is fully rendered, and this still excludes any further content or any layout-related classes.
Before the introduction of the PropertyCache
, if the author were to specify an explicit font-size of "10pt" on all the fo:table-cell}}s, this would lead to an equal amount of distinct {{FixedLength
instances. This could be avoided by making use of property inheritance: specify the property on the ancestor fo:table
, and through inheritance, only one FixedLength
instance was needed for the entire table, as the same instance would be picked up by the PropertyList}}s responsible for resolving the cells' properties. Ultimately, this could be brought down to one instance for the whole document, by specifying the global font-size on the {{fo:root
. After the introduction of the PropertyCache
, using inheritance still has its benefits. Even if the memory consumption would be roughly the same, inheritance eliminates the need to parse the properties at all for the descendants.
Note: a fact overlooked by many is that, according to the XSL-FO Recommendation, any property may be specified on any formatting object. The fact that the font-size property does not apply to fo:root
still does not mean that the property will be entirely lost. Due to regular inheritance, all PropertyList}}s will ultimately use the value of the root {{PropertyList
unless an override is provided on a deeper level.
Still, this type of streamlining of the stylesheet is not always possible. In contexts where the stylesheet is generated by a third-party tool, there are likely to be many XML attribute specifications that, from the point of view of the formatter would be redundant. On another note, there are also many <enum>
properties that share the same possible values, but are not inherited.
The initial idea for further optimization in that area was given to FOP by Richard Wheeldon in a Bugzilla patch (see: https://issues.apache.org/bugzilla/show_bug.cgi?id=41044).
In its most basic form, some Property
subclasses that did not require a layout context for resolution were equipped with a private static WeakHashMap
that retained all created instances to which other objects held hard references, and accompanying implementations for equals()
and hashCode()
. Their constructors were made inaccessible from outside the package, and instead new instances were created via a public static getInstance()
method. That method creates a new instance, performs a get()
on the HashMap
, adds it if it does not exist yet, and returns either the result of the get()
or the newly created instance.
All the related constructors are simple parameter assignment constructors, like:
public MyProperty(int someInt, Object someObj) { this.primitiveValue = someInt; this.referenceValue = someObj; }
and the generated instances are short-lived. so they only have to be moved in the heap (for Sun JVMs, 'out of eden': see http://java.sun.com/docs/hotspot/gc1.4.2/#2.%20Generations|outline) if they are really new instances to be added to the map. As a consequence, the involved overhead should be minimal in a language like Java, especially with modern VMs. The well-known warning about instantiation rates in the early days now more apply to circumstances where one would trigger a chain of constructors with every instantion (e.g. if the second assignment would be this.referenceValue = new OtherClass();
), and the OtherClass constructor might again trigger construction of a new object (which would increase the size of the young generation, and possibly cause relocations of instances in the heap).
As the number of objects to which this approach was applied, began to grow, the decision was made to implement a dedicated class that could be used by any of the Property
classes. The concern for multi-session environments was first taken into account by wrapping the WeakHashMap
in a standard synchronized map. The PropertyCache
itself would only have one or more fetch(Object obj)
methods as public access points, that took care of the operations that were initially performed within the Property
subclasses (see above), so the idiom had become simply:
public static Property getInstance(int someInt, Object someObj) { return myPropertyCache.fetch(new MyProperty(someInt, someObj)); }
The newly created instance would either be returned from this method, or disappear into oblivion and be replaced by the corresponding entry in the map.
After fixing some errors in the added hashCode()
and equals()
implementations, the first version of the PropertyCache
was silently released to the public with FOP 0.94. No big commotion was made back then. As this was still restricted to only a handful of properties, the assumed effect would be only minimal.
In the meantime development went further on the trunk. After reading some more documentation on multi-threading in Java and studying the concepts of concurrency and contention, it became obvious that there was a possible performance bottleneck involved in using Collections.synchronizedMap();
. Java 1.5 has its own java.util.concurrent
package, but FOP still targets 1.4 environments, so those could not be used. Even then, the util.concurrent
package only offers a ConcurrentHashMap
, no variant using WeakReference}}s, which would be needed to implement a cache that is shared by many threads. Besides that, a {{Map
always contains mappings. That is: every entry has a key and a value. But FOP needed only a weakly referenced key...
Structure
The following article at IBM gave a starting point for building our very own cache: http://www.ibm.com/developerworks/java/library/j-jtp08223/index.html
As pointed out at the start of the article, this is not for the faint of heart. Maybe we shouldn't have tried it at home, but we did anyway.
Thinking it over a bit more, it became apparent that the cache did not need to expose a public remove()
operation at all, which already made it a bit simpler. It didn't need to expose anything but the mentioned fetch()
. Since the approach was based on WeakReference
, all we needed to do was make sure the map never grew unnecessarily large. The weak references would be cleared automatically. The reference objects themselves were removed from the cache upon each put()
. Initially, rather audaciously, this was done in separate threads (as if implementing our own ConcurrentWeakHashSet
was not yet enough... ) Pretty soon, it became obvious that also using threading internally in the map was taking it one step too far. The cleanup cycle was moved, the threading abandoned, and a ReferenceQueue
introduced.
The private cleanSegment()
method is triggered upon each put()
, but only if the map segment in question grows beyond twice the number of buckets in the map. Starting with a map of 8 buckets, the cleanup will be attempted every time a map segment grows beyond 16 elements.
A segment is simply an object on which can be synchronized to lock a portion of the buckets (see also the IBM article), and whose element-count corresponds to the total for all those buckets. That element-count is updated with each put()
, and during the cleanup. At first, each segment corresponds to 1 bucket, since the map only contains 8, but once the number of buckets grows beyond 32, there will be multiple buckets per segment.
The cleanup does no more than check the reference queue of the segment in question for stale entries, and if there are any, removes the corresponding references from the cache. Upon return of the cleanup, there is a check on whether it had any effect, and the number of elements for the segment has decreased. If not, then a boolean switch is set, indicating that the map segment in question votes for a redistribution of the elements. If more than 8 of the map segments agree, a rehash()
is performed, which doubles the amount of buckets to 16. Since each bucket is its own segment, the first rehash will only be triggered if all of the buckets grow beyond size 16. So in that case, the cache would contain more than 128 elements. The second rehash then, will be triggered once 8 of the 16 buckets grows beyond size 32, or a total size of over 256 elements. The third when 8 of 32 grow over 64, so 512 elements. The fourth when 8 of 32 grow over 128, or 1024.
Note that 1024 property instances corresponding to distinct values is in most general use-cases already more than sufficient to cover the distinct values for all those properties together. The caches are divided over the applicable Property
subclasses, so it will likely never get to a fifth, which would mean 2K distinct values for one property type.
Since a rehash needs exclusive access to the map, this is done by recursively obtaining locks on all of the segments. The initial entry of the rehash()
method obtains the first lock, then calls itself, and does this until all locks are acquired. From that point on, no other thread can get access to any of the segments, and will have to wait until the rehash is completed. That is: each thread that would need to put a new instance in the cache. Calls to get()
can still be served, if a corresponding instance already exists, since it first tries the usual way, without synchronization.
The fetch()
method has been exposed in such a way that it only has public entry points for those types useful in this context, mostly for convenience. Calls to those are routed to the private generic method by casting the parameter instance to Object
, which triggers a get()
on the internal map. If there is already a weak reference to an equal instance, we get a hard reference to that one.
The approach has been applied to a number of Property
subtypes, roughly corresponding to those types of property values that can be resolved at parse-time (absolute lengths, strings, numbers, enums...), and to property-bundles consisting solely of such properties (CommonFont
and CommonHyphenation
).
(to be continued)