With TIKA-605, you can now use Tika to parse geospatial file formats! To figure out how, read on.

Install GDAL

If you're lucky this will work:

$ brew install gdal --complete

Errors encountered with brew and Mavericks

Note: if you encounter errors here after upgrading to Mavericks, first run the following, then retry the install:

$ brew rm $(join <(brew leaves) <(brew deps gdal --complete ))

Note that the above instructions are Mac-centric. For other platforms, we recommend checking GDAL's website for specific instructions on installing GDAL on your operating system.

Once GDAL is installed, the following command should be available on your path:

$ gdalinfo

Running gdalinfo with no arguments should produce something like:

Usage: gdalinfo [--help-general] [-mm] [-stats] [-hist] [-nogcp] [-nomd]
                [-norat] [-noct] [-nofl] [-checksum] [-proj4]
                [-listmdd] [-mdd domain|`all`]*
                [-sd subdataset] datasetname

FAILURE: No datasource specified.

If that works you are in business!
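
If you would rather script this check than eyeball the usage message, a minimal sketch follows. It only tests whether gdalinfo can be found on the PATH; the helper name have_cmd is made up for this example and is not part of GDAL or Tika.

```shell
# Hypothetical helper: returns 0 if the named command is found on the PATH,
# non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd gdalinfo; then
  echo "gdalinfo found at: $(command -v gdalinfo)"
else
  echo "gdalinfo not found on PATH; check your GDAL installation" >&2
fi
```

This confirms only that the binary is reachable, not that every GDAL driver you need was built in.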

Using Tika and GDAL

To use Tika and GDAL, grab the latest 1.7-SNAPSHOT build of Tika and a geospatial file. This example uses a Flexible Image Transport System (FITS) file. Then run:

java -jar tika-app-1.7-SNAPSHOT.jar -m WFPC2u5780205r_c0fx.fits

This should produce output such as:

ALLG-MAX: 3.777701E3
ALLG-MIN: -7.319537E1
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-Parsed-By: org.apache.tika.parser.gdal.GDALParser

If you see X-Parsed-By: ..GDALParser and a bunch of geospatial metadata, you are in business!
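
The same check can be automated by searching the metadata output for the GDALParser line. The sketch below assumes the output format shown above (one "X-Parsed-By: ..." entry per line); the function name parsed_by_gdal is hypothetical.

```shell
# Hypothetical helper: reads Tika metadata output on stdin and succeeds
# only if one of the X-Parsed-By lines names the GDAL parser.
parsed_by_gdal() {
  grep -q '^X-Parsed-By: org\.apache\.tika\.parser\.gdal\.GDALParser'
}

# Example usage, with the same jar and file names as above:
# java -jar tika-app-1.7-SNAPSHOT.jar -m WFPC2u5780205r_c0fx.fits | parsed_by_gdal \
#   && echo "parsed by GDAL"
```

If the function fails, Tika fell back to another parser, which usually means GDAL was not found on the path.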

Using Tika Server and GDAL

Once you have GDAL and a fresh build of Tika 1.7-SNAPSHOT (including Tika Server), you can easily use Tika Server with GDAL. For example, to post a FITS file to the server and get back its metadata, run the following commands:

In one window, start Tika Server:

java -jar /path/to/tika-server-1.7-SNAPSHOT.jar

In another window, issue a cURL request:

curl -T /path/to/fits/image.fits http://localhost:9998/tika --header "Content-type: application/fits"
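
If you run both steps from one script, the request can fail simply because the server has not finished starting. One possible workaround, sketched here with a made-up retry helper (not part of Tika or cURL), is to repeat the request a few times before giving up.

```shell
# Hypothetical helper: run a command up to $1 times, pausing one second
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage (paths are placeholders, as in the command above):
# retry 10 curl -sf -T /path/to/fits/image.fits http://localhost:9998/tika \
#   --header "Content-type: application/fits"
```

The -s and -f options make cURL quiet and make it exit non-zero on HTTP errors, so a failed request is retried instead of being treated as a success.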

Note on FITS dependencies

On TIKA-2684, Susan Borda reports some important steps for getting a full FITS parse with GDAL. See Susan's comment and her pointer on properly loading fitsio.
