Prerequisites
DockerHub Repository and Access
To release the Apache Tika Docker image on DockerHub, you need access to the apache/tika repository. Access is controlled by the ASF Infra team and can be requested through an INFRA JIRA ticket. Make sure to tag the ticket with the Docker label.
tika-docker repo
This repository contains the Dockerfiles used to create the minimal and full images for Apache Tika. It also contains helper examples and configurations.
General Information
Image Types
There are two image types:
- Minimal - containing just Apache Tika and its base dependencies (i.e. Java)
- Full - containing Apache Tika, its dependencies, as well as Tesseract and GDAL.
The Dockerfile for each image is in the correspondingly named directory, and these Dockerfiles are the only assets used to publish the images.
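As a rough illustration of how the two image types might be addressed, the sketch below maps a Docker version to an image reference. It assumes the minimal image is tagged with the Docker version alone and the full image with a -full suffix (for example apache/tika:2.9.2.1-full); the helper name tika_image is hypothetical, so check the published tags on DockerHub before relying on it.

```shell
# Hypothetical helper: print the image reference for a Docker version.
# ASSUMPTION: the minimal image uses the bare version as its tag and the
# full image appends a "-full" suffix.
tika_image() {
  version="$1"
  variant="${2:-minimal}"
  case "$variant" in
    minimal) echo "apache/tika:${version}" ;;
    full)    echo "apache/tika:${version}-full" ;;
    *)       echo "unknown variant: $variant" >&2; return 1 ;;
  esac
}

# Example usage (not executed here); Tika Server listens on port 9998 by default:
#   docker run -d -p 9998:9998 "$(tika_image 2.9.2.1 full)"
```

Keeping the variant as an explicit argument makes it hard to pull the minimal image by accident when OCR or GDAL support is needed.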
Docker Compose Files
There are a number of Docker Compose files to allow users to quickly test certain scenarios:
- Recognising and Captioning Video and Images with TensorFlow REST (see here)
- Enriching Academic PDF Parsing with Grobid REST (see here)
- OCR of PDF or Images with Tesseract including a Custom Configuration (see here)
- Named Entity Recognition (see here)
These different scenarios use the corresponding configuration in the sample-configs directory.
Neither these Docker Compose YAML files nor the sample configurations are used for publishing Apache Tika's Docker images. They exist only to provide examples of complex configurations.
An example of using these is provided here.
docker-tool.sh
This shell file is a helper script used to simplify the building, testing and publication of the images.
It provides the following options:
- build - builds the minimal and full images for the passed-in version
- test - verifies that the built image can start and that the version number can be retrieved from it
- publish - builds the multi-arch images and publishes them on DockerHub (only for those who have access to the DockerHub repository)
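The test option is only described at a high level above. A comparable smoke test could look like the sketch below; it is not the contents of docker-tool.sh. It assumes the image serves Tika Server on its default port 9998 and that GET /version returns a body containing the Tika version string. The names version_matches and smoke_test_image are hypothetical, and the substring check is deliberately loose (2.9.2 would also match 2.9.20).

```shell
# Hypothetical smoke test, similar in spirit to "docker-tool.sh test".
# ASSUMPTIONS: the image runs Tika Server on its default port 9998, and
# GET /version returns a body that contains the Tika version string.

# Succeed if response body $1 contains expected version $2 (literal match).
version_matches() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Start image $1, poll /version for up to ~30s, then remove the container.
smoke_test_image() {
  image="$1"
  expected="$2"
  cid=$(docker run -d -p 9998:9998 "$image") || return 1
  tries=0
  while [ "$tries" -lt 30 ]; do
    if body=$(curl -sf http://localhost:9998/version) &&
       version_matches "$body" "$expected"; then
      break
    fi
    tries=$((tries + 1))
    sleep 1
  done
  docker rm -f "$cid" >/dev/null
  # Success only if the loop exited via "break" before running out of tries.
  [ "$tries" -lt 30 ]
}

# Example usage (not executed here):
#   smoke_test_image apache/tika:2.9.2.1 2.9.2 && echo "test passed"
```

Polling instead of a single request gives the JVM time to start before the check fails.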
republish-images.sh
This shell file was used to republish the older images when the Dockerfile was updated. It is now redundant but is kept in the repo in case something similar needs to be done in the future.
Release Process
- Update the README.md's Available Tags section
- Update the TAG version in .env to be X.Y.Z.Q+1, where X.Y.Z is the Tika version and Q is the current Docker image revision
- Update the version in .travis.yml to be X.Y.Z.Q+1 X.Y.Z
- Update CHANGES.md to include this release, changes and release date
- Test the release as in the example below
- Commit the changes
- To release a new version of Apache Tika on DockerHub, follow the steps below, replacing 2.9.2.1 and 2.9.2 with the Docker and Tika versions you wish to publish. As of 2.5.0, we version our Docker images separately, even when they are based on the same Tika version, so the Docker tag might be 2.5.0.1 for Tika version 2.5.0. The first version in each command line is the Docker version; the second version in the build and publish commands is the Tika version.
```shell
git clone https://github.com/apache/tika-docker && cd tika-docker
./docker-tool.sh build 2.9.2.1 2.9.2
./docker-tool.sh test 2.9.2.1
# If you see the test passed, you can then build the multi-arch images and publish them:
# NOTE THAT THIS STEP ALSO PUSHES THE *-latest tag. You may have to adjust the build script if you're pushing a BETA release!
./docker-tool.sh publish 2.9.2.1 2.9.2
```
- If everything worked, tag the last commit and push the tag:
```shell
git tag -a 2.9.2.1 -m "New release for 2.9.2.1"
git push --tags
```
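The version bookkeeping in the release steps above can be scripted. The sketch below assumes the X.Y.Z.Q scheme described above and that .env contains a line of the form TAG=<version>; the helper names tika_version_of, next_docker_version and set_env_tag are hypothetical. They only bump the revision Q of an existing Tika version, and the .travis.yml update is left manual.

```shell
# Hypothetical helpers for the X.Y.Z.Q scheme, where X.Y.Z is the Tika
# version and Q is the Docker image revision.

# Print the Tika version (X.Y.Z) for a Docker version (X.Y.Z.Q).
tika_version_of() {
  echo "${1%.*}"
}

# Print the next Docker version (X.Y.Z.Q+1) for the same Tika version.
next_docker_version() {
  base="${1%.*}"
  rev="${1##*.}"
  echo "${base}.$((rev + 1))"
}

# Rewrite the TAG line of an env file, leaving other lines untouched.
# ASSUMPTION: the file contains a line of the form TAG=<version>.
set_env_tag() {
  file="$1"
  new="$2"
  sed "s/^TAG=.*/TAG=${new}/" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example usage (not executed here):
#   new=$(next_docker_version 2.9.2.1)
#   set_env_tag .env "$new"
#   ./docker-tool.sh build "$new" "$(tika_version_of "$new")"
```

Writing to a temporary file and renaming it avoids the differences between GNU and BSD sed -i.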