This table highlights some of the differences among several of the handlers. I've temporarily left in question marks for items we still need to confirm.
|Feature||/tika (text|body)||/tika (html)||/tika (json)||/rmeta||/meta||/unpack|
|Text (including text of embedded documents)||Y||Y||Y||Y||N||Y (with /unpack/all)|
|Metadata of main document||N||Y||Y||Y||Y||Y (with /unpack/all)|
|Metadata of embedded documents/attachments||N||N||N||Y||N||N|
|Notification of parse exception||Y/N (see note 1)||Y/N (see note 1)||Y||Y||Y||Y?|
|Specific stacktrace if server is started with the -s (stacktrace) option||N||N||Y||Y||N||N|
|MetadataFilters are applied (see ModifyingContentWithHandlersAndMetadataFilters)||N||N||Y||Y||N||N|
|Notification of parse exception in embedded document||N||N||Y as of 2.4.1||Y||N||N?|
|Specific stacktrace for parse exception in embedded document||N||N||Y as of 2.4.1||Y||N||N|
|WriteLimit with the ||N||N||Y||Y||N/A||N|
|Actual attachments (raw bytes)||N||N||N||N||N||Y|
1 If the parse exception comes early in the parse, before streaming starts (as with an EncryptedDocumentException), you'll get an HTTP status 422 from /tika (text) and /tika (html). With /tika (text), if the parse exception happens after content has started streaming, the stream simply stops and you get no indication that there was a parse exception. With /tika (html), you'll see truncated HTML in that case.
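
As a rough illustration of note 1, here is a minimal Python sketch of how a client might interpret the status returned by /tika (text). It assumes a tika-server listening locally on its default port 9998; the function names `interpret_status` and `put_for_text` are hypothetical helpers, not part of Tika.

```python
import urllib.error
import urllib.request

# Assumption: a tika-server running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def interpret_status(status: int) -> str:
    """Translate an HTTP status from /tika (text) into what note 1 says it means."""
    if status == 422:
        # The parse exception was thrown before any content was streamed.
        return "parse exception before streaming started"
    if status == 200:
        # A later parse exception just cuts the stream short, so a 200
        # cannot distinguish a clean parse from a silently truncated one.
        return "ok or silently truncated"
    return "other status"


def put_for_text(data: bytes, url: str = TIKA_URL) -> tuple:
    """PUT raw file bytes to /tika and return (status, extracted text)."""
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            text = response.read().decode("utf-8", errors="replace")
            return response.status, text
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status is in err.code.
        return err.code, ""
```

A caller would pass the file's bytes to `put_for_text` and feed the returned status to `interpret_status`. Because a 200 is ambiguous here, clients that need reliable exception reporting may prefer /tika (json) or /rmeta, as the table indicates.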
2 Tika tries to stream while parsing and while writing the output. For some file formats, the parsers currently load the full document into memory and then write the content. So, this row focuses on whether Tika streams the writing of the content (and not the streaming read/parse of the file).
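
The table above shows /rmeta reporting parse exceptions for both the main document and embedded documents. The Python sketch below shows one way a client might scan an /rmeta JSON response for them. It assumes that the response is a JSON list with one metadata object per document (the main document first) and that exception keys start with `X-TIKA:EXCEPTION:`. Both the prefix and the sample payload are assumptions written by hand for illustration, so verify them against the output of your Tika version.

```python
import json

# Assumed key prefix for exception entries; verify against your Tika version.
EXCEPTION_PREFIX = "X-TIKA:EXCEPTION:"


def find_parse_exceptions(rmeta_json: str) -> list:
    """Return (document index, metadata key) pairs for every exception entry.

    Index 0 is taken to be the main document; higher indexes are
    embedded documents or attachments.
    """
    documents = json.loads(rmeta_json)
    hits = []
    for index, metadata in enumerate(documents):
        for key in metadata:
            if key.startswith(EXCEPTION_PREFIX):
                hits.append((index, key))
    return hits


# Hypothetical, hand-written response: a container with one embedded
# document whose parse failed. Not captured from a real server.
SAMPLE = json.dumps([
    {"Content-Type": "application/zip", "X-TIKA:content": "container text"},
    {
        "Content-Type": "application/pdf",
        "X-TIKA:EXCEPTION:embedded_exception": "stack trace text",
    },
])

print(find_parse_exceptions(SAMPLE))
```

An empty result means no exception keys were found in any document's metadata under the assumed prefix.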