...
- The ForkParser – this forks a child process to do the parsing, which protects the calling process against OOM errors and infinite loops.
- tika-batch – if you are processing files at desktop/VM scale (not cloud scale), you can run tika-batch via tika-app:
java -jar tika-app.jar -i <input_dir> -o <output_dir>
- tika-server – if you are using tika-server 1.x, start the server with the `-spawnChild` option, and it will fork a child process to do the actual parsing. In Tika 2.x, this is the default: parsing is done in a forked process. Clients need to be able to handle tika-server going offline when the forked parsing process has to restart.
- tika-pipes – in Tika 2.x, use tika-pipes programmatically, in tika-app with the `-a` option, or in tika-server via the `/async` or `/pipes` endpoints.
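Because a forked tika-server child can restart mid-stream, client code should treat a dropped or refused connection as a temporary condition. The following is a minimal sketch, not part of Tika's API: a hypothetical helper `RetryingTikaClient.withRetry` that retries a call on `IOException` (for example "connection refused" while the child restarts), doubling the delay after each failure. The class name, parameters and backoff policy are illustrative assumptions.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper (not a Tika class): retries a client call while
// tika-server is briefly offline during a restart of its forked child.
final class RetryingTikaClient {

    private RetryingTikaClient() {}

    /**
     * Calls {@code action}, retrying only on IOException. The delay before
     * the next attempt doubles after each failure. Once maxAttempts calls
     * have failed, the last IOException is rethrown. Other exceptions are
     * not retried and propagate immediately.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the server did not come back in time
                }
                Thread.sleep(delay); // wait for the child process to restart
                delay *= 2;          // exponential backoff
            }
        }
    }
}
```

In practice the lambda would wrap something like a `java.net.http.HttpClient.send(...)` call that PUTs a document to tika-server's `/tika` endpoint. The sketch only retries transport failures; whether a given HTTP status should also count as retryable is left to the caller. Keeping `maxAttempts` small is deliberate: a document that crashed the child once is likely to crash it again, so endless retries of the same file would just keep restarting the server.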
The Tika project has taken the following steps to identify and fix catastrophic problems:
...