Tika now supports EXIFTool through the ExternalParser. Read on to find out how to use it.
Download and install EXIFTool
EXIFTool is a wonderful tool that reads video, image, audio, and other media files and extracts EXIF metadata from them. If you're lucky, you can install EXIFTool with one of the following commands.
On Mac (Homebrew)
brew install exiftool
On Linux (CentOS)
sudo yum install perl-Image-ExifTool
To verify that EXIFTool works correctly, run:
which should output something like:
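For scripted setups, a minimal sketch of the same check might look like the block below. It assumes a POSIX shell. The command -v test checks whether exiftool is on the PATH, and EXIFTool's -ver option prints just its version number. EXIFTOOL_OK is a hypothetical variable name used only for illustration.

```shell
# Check that the exiftool binary is reachable on PATH before configuring Tika.
if command -v exiftool >/dev/null 2>&1; then
  # -ver prints only the EXIFTool version number.
  echo "exiftool found, version: $(exiftool -ver)"
  EXIFTOOL_OK=1
else
  echo "exiftool not found on PATH; install it first" >&2
  EXIFTOOL_OK=0
fi
```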
Using EXIFTool with Tika
To use EXIFTool, you'll need a custom Tika config that overrides Tika's default MP4 parser (if you are dealing with MP4 files). You can do so by creating a file such as the one below:
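The block below is a sketch of such a file, written with a shell here-document so that it is saved directly as exif-tika-config.xml.

Some parts are assumptions:
- The class names follow the Tika 1.x package layout, so check them against the Tika version you run.
- The parser-exclude element inside the DefaultParser is not required by the description. It is included so that the stock MP4Parser cannot still claim video/mp4 through the DefaultParser.

```shell
# Write the custom Tika config to exif-tika-config.xml in the current directory.
# Class names assume the Tika 1.x package layout; verify them for your version.
cat > exif-tika-config.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Everything Tika normally loads, minus the stock MP4Parser (assumption) -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.mp4.MP4Parser"/>
    </parser>
    <!-- Keep MP4Parser for its other types, but not video/mp4 -->
    <parser class="org.apache.tika.parser.mp4.MP4Parser">
      <mime-exclude>video/mp4</mime-exclude>
    </parser>
    <!-- Route video/mp4 to the external parsers, EXIFTool among them -->
    <parser class="org.apache.tika.parser.external.CompositeExternalParser">
      <mime>video/mp4</mime>
    </parser>
  </parsers>
</properties>
EOF
echo "wrote exif-tika-config.xml"
```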
Note that this config file initializes the DefaultParser (a CompositeParser), the CompositeExternalParser, and the MP4Parser. For the MP4Parser, it uses a new directive, mime-exclude, to exclude that parser from the video/mp4 type. It then declares that the CompositeExternalParser supports video/mp4. Since EXIFTool runs as an ExternalParser, this configuration makes sure it gets called for MP4 files.
Once you have written the config file above, save it in the current directory, e.g., as exif-tika-config.xml. You can then call Tika through Tika-App and/or Tika Server.
Using Tika App
Use the following command on a file, e.g.,
This should output:
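As a scripted variant of that Tika-App call, here is a minimal sketch.
- TIKA_APP_JAR and INPUT_FILE are hypothetical placeholders. The real jar name depends on your Tika version.
- --config loads the custom config, and -m asks Tika-App for metadata only.
- If either file is missing, the script only prints the command it would have run.

```shell
# Hypothetical placeholders: point these at your own jar and media file.
TIKA_APP_JAR="${TIKA_APP_JAR:-tika-app.jar}"
INPUT_FILE="${INPUT_FILE:-sample.mp4}"
CONFIG_FILE="exif-tika-config.xml"

# Build the command line as positional parameters:
#   --config loads the custom parser setup, -m prints metadata only.
set -- java -jar "$TIKA_APP_JAR" --config="$CONFIG_FILE" -m "$INPUT_FILE"

if [ -f "$TIKA_APP_JAR" ] && [ -f "$INPUT_FILE" ]; then
  "$@"
else
  # Nothing to run against; just show the command.
  echo "would run: $*"
fi
```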
Using Tika Server
You can also use Tika Server. First, start it up:
Now, PUT a file to it, e.g.,
This should return:
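The two Tika Server steps can also be scripted together. This minimal sketch starts the server with the custom config, waits until the GET /tika greeting endpoint answers, then PUTs the file to the /meta endpoint and asks for JSON.

Assumptions:
- TIKA_SERVER_JAR and INPUT_FILE are hypothetical placeholders.
- 9998 is Tika Server's default port.
- The --config flag is assumed to be supported by your Tika Server version.

```shell
# Hypothetical placeholders: point these at your own jar and media file.
TIKA_SERVER_JAR="${TIKA_SERVER_JAR:-tika-server.jar}"
INPUT_FILE="${INPUT_FILE:-sample.mp4}"
CONFIG_FILE="exif-tika-config.xml"
TIKA_URL="http://localhost:9998"   # Tika Server listens on port 9998 by default

if [ -f "$TIKA_SERVER_JAR" ] && [ -f "$INPUT_FILE" ]; then
  # Step 1: start the server in the background with the custom config.
  java -jar "$TIKA_SERVER_JAR" --config "$CONFIG_FILE" &
  SERVER_PID=$!

  # Wait (up to about 30 s) until GET /tika answers, i.e. the server is up.
  tries=0
  until curl -sf "$TIKA_URL/tika" >/dev/null; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && break
    sleep 1
  done

  # Step 2: PUT the file to /meta and ask for the metadata as JSON.
  curl -s -T "$INPUT_FILE" -H "Accept: application/json" "$TIKA_URL/meta"

  # Stop the background server when done.
  kill "$SERVER_PID"
else
  echo "skipping: set TIKA_SERVER_JAR and INPUT_FILE to existing files" >&2
fi
```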