...

Note: MetadataFilters only work with the /rmeta endpoint. They also do not short-circuit metadata extraction within Parsers; they only delete the unwanted fields after the parse has finished. This can still save storage and network bandwidth.

A user can map Tika field names to names they prefer. If excludeUnmapped is set to true, only those fields that are included in the mapping are passed back to the client.

FieldNameMappingFilter
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.FieldNameMappingFilter">
      <params>
        <excludeUnmapped>true</excludeUnmapped>
        <mappings>
          <mapping from="X-TIKA:content" to="content"/>
          <mapping from="a" to="b"/>
        </mappings>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>
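
The effect of this mapping can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Tika's implementation: the function name `map_field_names` and the sample metadata are hypothetical, and the pass-through of unmapped fields when excludeUnmapped is false is an assumption.

```python
def map_field_names(metadata, mappings, exclude_unmapped=False):
    """Rename metadata keys according to `mappings`.

    Illustrative only; this is not Tika code. Unmapped keys are dropped when
    exclude_unmapped is True and (by assumption) kept unchanged otherwise.
    """
    result = {}
    for name, values in metadata.items():
        if name in mappings:
            result[mappings[name]] = values
        elif not exclude_unmapped:
            result[name] = values
    return result


# Hypothetical metadata for one parsed document.
metadata = {
    "X-TIKA:content": ["Hello"],
    "a": ["1"],
    "Content-Type": ["text/plain"],
}

# Same mappings as in the config above.
mappings = {"X-TIKA:content": "content", "a": "b"}

filtered = map_field_names(metadata, mappings, exclude_unmapped=True)
```

With excludeUnmapped set to true, `filtered` holds only the renamed `content` and `b` fields; `Content-Type` is dropped because it has no mapping.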



A user can set the following in a tika-config.xml file to have the /rmeta endpoint return only three fields:

IncludeFieldMetadataFilter
<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
      <params>
        <include>
          <field>X-TIKA:content</field>
          <field>extended-properties:Application</field>
          <field>Content-Type</field>
        </include>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>
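
As a rough illustration of what an include filter does to the metadata once parsing is complete, here is a small Python sketch. It is not Tika's code: the helper `include_fields` and the sample metadata (including the `Content-Length` and `X-TIKA:Parsed-By` values) are hypothetical, chosen only to show which keys survive.

```python
def include_fields(metadata, include):
    """Keep only the metadata keys listed in `include` (illustrative, not Tika code)."""
    return {name: values for name, values in metadata.items() if name in include}


# Hypothetical metadata as it might look after the parse has completed.
parsed = {
    "X-TIKA:content": ["Some extracted text"],
    "extended-properties:Application": ["Microsoft Office Word"],
    "Content-Type": ["application/msword"],
    "Content-Length": ["12345"],
    "X-TIKA:Parsed-By": ["org.apache.tika.parser.DefaultParser"],
}

# The three fields named in the config above.
include = {"X-TIKA:content", "extended-properties:Application", "Content-Type"}

returned = include_fields(parsed, include)
```

The parser still extracts every field; the filter only trims what is sent back, which is why the savings are in storage and bandwidth rather than parse time.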

...

<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.ExcludeFieldMetadataFilter">
      <params>
        <exclude>
          <field>X-TIKA:content</field>
          <field>extended-properties:Application</field>
          <field>Content-Type</field>
        </exclude>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>

...

<properties>
  <metadataFilters>
    <metadataFilter class="org.apache.tika.metadata.filter.ClearByMimeMetadataFilter">
      <params>
        <mimes>
          <mime>image/emf</mime>
        </mimes>
      </params>
    </metadataFilter>
  </metadataFilters>
</properties>

...