Tika now supports EXIFTool through its External Parser. Read on to find out how to use it.
Download and install EXIFTool
EXIFTool is a wonderful tool that reads metadata, including EXIF, from images, videos, audio and other media files. On Mac and on CentOS Linux you can usually install it with one of the following commands.
On Mac
brew install exiftool
On Linux (CentOS)
sudo yum install perl-Image-ExifTool
To verify that EXIFTool works correctly, run:
exiftool -ver
This should print the installed version number, for example: 9.72
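If you want to run the same check from Java (for example, before starting Tika), here is a minimal sketch that invokes exiftool -ver through the JDK's ProcessBuilder. The class name ExifToolCheck is hypothetical, and the sketch assumes Java 9 or later and an exiftool executable on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper: runs "<executable> -ver" and returns the version string if it succeeds. */
public class ExifToolCheck {

    public static Optional<String> version(String executable) {
        ProcessBuilder pb = new ProcessBuilder(executable, "-ver");
        pb.redirectError(ProcessBuilder.Redirect.DISCARD); // ignore anything written to stderr
        try {
            Process process = pb.start(); // throws IOException if the executable cannot be found
            byte[] out = process.getInputStream().readAllBytes();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                return Optional.empty(); // the command ran but reported an error
            }
            String ver = new String(out, StandardCharsets.UTF_8).trim();
            return ver.isEmpty() ? Optional.empty() : Optional.of(ver);
        } catch (IOException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> ver = version("exiftool");
        System.out.println(ver.map(v -> "EXIFTool version: " + v)
                              .orElse("EXIFTool not found or not working"));
    }
}
```

Returning an Optional rather than throwing keeps the "not installed" case easy to handle at the call site.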
Using EXIFTool with Tika
To use EXIFTool on MP4 files, you'll need a custom Tika config that overrides Tika's default MP4 parser. You can do so by creating a file such as the one below:
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
    </parser>
    <parser class="org.apache.tika.parser.mp4.MP4Parser">
      <mime-exclude>video/mp4</mime-exclude>
    </parser>
    <parser class="org.apache.tika.parser.external.CompositeExternalParser">
      <mime>video/mp4</mime>
    </parser>
  </parsers>
</properties>
Note that this config file initializes the DefaultParser (a CompositeParser), the MP4Parser, and the CompositeExternalParser. For the MP4Parser, it uses a new directive, mime-exclude, to exclude that parser from the video/mp4 type, and it then declares that the CompositeExternalParser supports video/mp4. Since EXIFTool is run through an ExternalParser, this configuration makes sure it gets called for MP4 files.
Save the config above as a file, e.g., exif-tika-config.xml, in the current directory. You can then call Tika through Tika-App and/or Tika Server.
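Before calling Tika either way, you may want to double-check what each parser entry in the saved file declares. The sketch below uses only the JDK's built-in DOM parser (javax.xml) and simply lists the mime and mime-exclude values per parser class; the class name ConfigSummary is hypothetical, and it does not reproduce how Tika itself resolves which parser handles a type.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: summarizes the <parser> entries of a Tika config file. */
public class ConfigSummary {

    /** Returns one line per <parser>: its class plus any <mime> and <mime-exclude> values. */
    public static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            lines.add(parser.getAttribute("class")
                    + " mime=" + texts(parser, "mime")
                    + " mime-exclude=" + texts(parser, "mime-exclude"));
        }
        return lines;
    }

    // Collects the trimmed text of every descendant element with exactly this tag name.
    private static List<String> texts(Element parent, String tag) {
        List<String> values = new ArrayList<>();
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = Files.readString(Path.of("exif-tika-config.xml")); // Java 11+
        summarize(xml).forEach(System.out::println);
    }
}
```

For the config above, the MP4Parser line should list video/mp4 under mime-exclude and the CompositeExternalParser line should list it under mime.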
Using Tika-App
Use the following command on a file, e.g., spaghetti-to-sushi.mp4:
java -Dtika.config=exif-tika-config.xml -classpath tika-app/target/tika-app-1.9-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m spaghetti-to-sushi.mp4
This should output:
Audio Bits Per Sample: 16
Audio Channels: 2
Audio Format: mp4a
Audio Sample Rate: 22050
Average Bitrate: 0
Avg Bitrate: 1.26 Mbps
Balance: 0
Bit Depth: 24
Buffer Size: 0
Compatible Brands: mp41
Compressor ID: avc1
Compressor Name: h264
Content Create Date: created.with.SUPER(C).v2006.19
Content Create Date (ja): created.with.SUPER(C).v2006.19
Content-Length: 353985630
Content-Type: video/mp4
Create Date: 2006:12:17 18:50:47
Current Time: 0 s
Duration: 0:37:19
Elementary Stream Track: 201 101
ExifTool Version Number: 9.72
File Access Date/Time: 2015:05:25 21:18:08-07:00
File Inode Change Date/Time: 2014:09:26 20:32:27-07:00
File Modification Date/Time: 2011:07:28 13:01:54-07:00
File Name: spaghetti-to-sushi.mp4
File Permissions: rwxr-xr-x
File Size: 338 MB
File Type: MP4
Graphics Mode: srcCopy
Handler Description: GPAC MPEG-4 BIFS Handler
Handler Type: Metadata
Handler Vendor ID: Apple
Image Height: 480
Image Size: 640x480
Image Width: 640
MIME Type: video
Major Brand: MP4 v2 [ISO 14496-14]
Matrix Structure: 1 0 0 0 1 0 0 0 1
Max Bitrate: 0
Media Create Date: 2006:12:16 20:07:48
Media Duration: 1.00 s
Media Header Version: 0
Media Language Code: und
Media Modify Date: 2006:12:16 20:07:48
Media Time Scale: 90000
Minor Version: 0.0.1
Modify Date: 2006:12:17 18:50:47
Movie Data Offset: 473003
Movie Data Size: 353512586
Movie Header Version: 0
Next Track ID: 201
Op Color: 0 0 0
Other Format: mp4s
Poster Time: 0 s
Preferred Rate: 1
Preferred Volume: 100.00
Preview Duration: 0 s
Preview Time: 0 s
Rotation: 0
Selection Duration: 0 s
Selection Time: 0 s
Source Image Height: 480
Source Image Width: 720
Time Scale: 90000
Title: From Spaghetti to Sushi.mpeg
Title (ja): From Spaghetti to Sushi.mpeg
Track Create Date: 2006:12:17 18:50:47
Track Duration: 0:37:19
Track Header Version: 0
Track ID: 201
Track Layer: 0
Track Modify Date: 2006:12:16 20:07:48
Track Volume: 0.00
Vendor ID: FFmpeg
Video Frame Rate: 25
X Resolution: 72
X-Parsed-By: org.apache.tika.parser.CompositeParser
X-Parsed-By: org.apache.tika.parser.external.CompositeExternalParser
X-Parsed-By: org.apache.tika.parser.external.ExternalParser
Y Resolution: 72
resourceName: spaghetti-to-sushi.mp4
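Each line of this output has the form "Name: value", and a name can repeat (X-Parsed-By appears three times). If you want to post-process it, here is a minimal sketch that collects the lines into an ordered map of lists; the class name MetadataLines is hypothetical, and it assumes one metadata entry per line as shown above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: parses "Name: value" metadata lines. */
public class MetadataLines {

    /** Keys that appear more than once keep all of their values, in order of appearance. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (String line : lines) {
            int sep = line.indexOf(": "); // split on the first ": " only; values such as dates contain ':'
            if (sep <= 0) {
                continue; // skip blank or malformed lines
            }
            String key = line.substring(0, sep);
            String value = line.substring(sep + 2);
            metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        // Reads the output of "... TikaCLI -m <file>" from standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        Map<String, List<String>> md = parse(in.lines().collect(Collectors.toList()));
        System.out.println("Duration: " + md.get("Duration"));
        System.out.println("Parsed by: " + md.get("X-Parsed-By"));
    }
}
```

For example, you could pipe the Tika-App command above into java MetadataLines.java (single-file launch needs Java 11 or later).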
Using Tika Server
You can also use Tika Server. First, start it up:
java -Dtika.config=exif-tika-config.xml -classpath tika-server/target/tika-server-1.9-SNAPSHOT.jar org.apache.tika.server.TikaServerCli
Now, PUT a file to its /rmeta endpoint, e.g., spaghetti-to-sushi.mp4:
curl -T $HOME/Movies/spaghetti-to-sushi.mp4 -H "Content-Disposition: attachment;filename=spaghetti-to-sushi.mp4" http://localhost:9998/rmeta
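If you would rather call the server from Java than from curl, the JDK's java.net.http client (Java 11+) can send an equivalent PUT. This is a minimal sketch; the class name RmetaClient is hypothetical, and it assumes the server is running on localhost:9998 as in the curl command.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Hypothetical helper: sends a file to Tika Server's /rmeta endpoint. */
public class RmetaClient {

    /** Builds a PUT of the file to {baseUrl}/rmeta with the same Content-Disposition header as curl. */
    public static HttpRequest buildRequest(String baseUrl, Path file) throws FileNotFoundException {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rmeta"))
                .header("Content-Disposition", "attachment;filename=" + file.getFileName())
                .PUT(HttpRequest.BodyPublishers.ofFile(file)) // streams the file as the request body
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path file = Path.of(args[0]); // e.g. spaghetti-to-sushi.mp4
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest("http://localhost:9998", file), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // the JSON metadata returned by the server
    }
}
```

Keeping request construction in its own method makes it easy to inspect the request without a running server.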
This should return JSON similar to the following:
[
  {
    "Audio Bits Per Sample":"16",
    "Audio Channels":"2",
    "Audio Format":"mp4a",
    "Audio Sample Rate":"22050",
    "Average Bitrate":"0",
    "Avg Bitrate":"1.26 Mbps",
    "Balance":"0",
    "Bit Depth":"24",
    "Buffer Size":"0",
    "Compatible Brands":"mp41",
    "Compressor ID":"avc1",
    "Compressor Name":"h264",
    "Content Create Date":"created.with.SUPER(C).v2006.19",
    "Content Create Date (ja)":"created.with.SUPER(C).v2006.19",
    "Content-Type":"video/mp4",
    "Create Date":"2006:12:17 18:50:47",
    "Current Time":"0 s",
    "Duration":"0:37:19",
    "Elementary Stream Track":"201 101",
    "ExifTool Version Number":"9.72",
    "File Access Date/Time":"2015:05:25 21:20:47-07:00",
    "File Inode Change Date/Time":"2015:05:25 21:20:46-07:00",
    "File Modification Date/Time":"2015:05:25 21:20:46-07:00",
    "File Name":"apache-tika-3052147227532168299.tmp",
    "File Permissions":"rw-r--r--",
    "File Size":"338 MB",
    "File Type":"MP4",
    "Graphics Mode":"srcCopy",
    "Handler Description":"GPAC MPEG-4 BIFS Handler",
    "Handler Type":"Metadata",
    "Handler Vendor ID":"Apple",
    "Image Height":"480",
    "Image Size":"640x480",
    "Image Width":"640",
    "MIME Type":"video",
    "Major Brand":"MP4 v2 [ISO 14496-14]",
    "Matrix Structure":"1 0 0 0 1 0 0 0 1",
    "Max Bitrate":"0",
    "Media Create Date":"2006:12:16 20:07:48",
    "Media Duration":"1.00 s",
    "Media Header Version":"0",
    "Media Language Code":"und",
    "Media Modify Date":"2006:12:16 20:07:48",
    "Media Time Scale":"90000",
    "Minor Version":"0.0.1",
    "Modify Date":"2006:12:17 18:50:47",
    "Movie Data Offset":"473003",
    "Movie Data Size":"353512586",
    "Movie Header Version":"0",
    "Next Track ID":"201",
    "Op Color":"0 0 0",
    "Other Format":"mp4s",
    "Poster Time":"0 s",
    "Preferred Rate":"1",
    "Preferred Volume":"100.00",
    "Preview Duration":"0 s",
    "Preview Time":"0 s",
    "Rotation":"0",
    "Selection Duration":"0 s",
    "Selection Time":"0 s",
    "Source Image Height":"480",
    "Source Image Width":"720",
    "Time Scale":"90000",
    "Title":"From Spaghetti to Sushi.mpeg",
    "Title (ja)":"From Spaghetti to Sushi.mpeg",
    "Track Create Date":"2006:12:17 18:50:47",
    "Track Duration":"0:37:19",
    "Track Header Version":"0",
    "Track ID":"201",
    "Track Layer":"0",
    "Track Modify Date":"2006:12:16 20:07:48",
    "Track Volume":"0.00",
    "Vendor ID":"FFmpeg",
    "Video Frame Rate":"25",
    "X Resolution":"72",
    "X-Parsed-By":[
      "org.apache.tika.parser.CompositeParser",
      "org.apache.tika.parser.external.CompositeExternalParser",
      "org.apache.tika.parser.external.ExternalParser"
    ],
    "X-TIKA:parse_time_millis":"3638",
    "Y Resolution":"72",
    "resourceName":"spaghetti-to-sushi.mp4"
  }
]