Page tree
Skip to end of metadata
Go to start of metadata


Here is a regular Cocoon Generator for SXW OpenOffice Writer documents. OO becomes more and more a real world production text tool. The original format is quite xml, except it is zipped, and could embed other resources (images and other media). This generator provides text informations, metadatas, and enough styles to resolve some tagging from XSL.


Tested on thousands of OO docs in production context, proposed as a Cocoon commit if someone wants it. All licence mentions could be let to Apache, except author name and company.


[FG] FredericGlorieux


  • 2004-10-17:[FG] Implemented as generator

  • 2004-06-30:[FG] Creation as a accheable XSP


Verify your XML catalog for resolution of entities. Default Cocoon Catalog here
have the declaration
PUBLIC "-// OfficeDocument 1.0//EN" "open-office/dummy.dtd"
where dummy.dtd is an empty file. This is enough to parse XML, but the file is not validate.

<map:sitemap xmlns:map="">
      <map:generator name="sxw" label="content" logger="sitemap.generator.sxw" src="org.apache.cocoon.generation.SXWGenerator"/>
      <map:match pattern="**.html">
        <map:select type="exists">
          <!-- OpenOfficeWriter -->
          <map:when test="{context-attr:xfolio.efolder}{1}.sxw">
            <map:generate type="sxw" src="{context-attr:xfolio.efolder}{1}.sxw"/>
            <map:transform src="transform/oo/oo2html.xsl"/>


OpenOffice Writer documents could be used like other XML documents as sources. To provide an HTML view or some other transformation, but also to extract metadata on vast amount of documents. So, performances is an issue, a generator should be cacheable.

This was first implemented as an XSP, easy for testing, and for people to integrate in their cocoon apps, it's now a Cocoon generator, ready to be commited in cocoon.

Fast performances test (thanks to profiler)

  • direct pipe cost ~10 ms
  • xinclude cost ~160ms
  • cinclude ~320ms
  • SXWGenerator cost ~200ms on first call, but is cached (~40 ms on second call)


  • default src="jar:myzip!myentry.xml" seems to load the file entry in memory, but never remember the changes (isnt'it the goal to load a class from a jar one type ?)


For now, the best solution is in Forrest, using the new zip protocol from cocoon, able to resolve things like src="zip://content.xml@{folder}/test.sxw". This is a part of the trick. The problem of their solution is to use a cinclude transformer, which is not cacheable. You need to regenerate your aggregation each time it is requested. This is not a big problem in Forrest context (essentially generate a site), but on a real life server...

Their solution looks like that

<map:generate src="transform/cocoon/dummy.xml"/>
<map:transform src="transform/cocoon/sxw2oo.xsl">
  <map:parameter name="src" value="{context-attr:xfolio.efolder}{1}.sxw"/>
<map:transform type="cinclude"/>
<map:serialize type="xml"/>

The dummy.xml is only there because you need something to begin with pure cocoon. The job is done by the XSL, to write something like that.

<office:document xmlns:**>
  <c:include select="/*/*">
    <xsl:attribute name="src">zip://meta.xml@<xsl:value-of select="$src"/></xsl:attribute>
  <c:include select="/*/*">
    <xsl:attribute name="src">zip://content.xml@<xsl:value-of select="$src"/></xsl:attribute>

Problems are

  • an Xpath include is expensive (all document need to be loaded as DOM)
  • Cinclude (like Xinclude) haven't seem to be cacheable for me

You may say, try a <map:agreggate/>... I tried, and produce so much problems of validation that I stopped.



cited by

about xfolio

about Open Office




  • No labels