This page documents how to build Docker containers for the various Impala daemon processes, suitable for deployment in Docker or another system like Kubernetes that can deploy Docker containers. If you want to do Impala development inside a Docker container, see Impala Development Environment inside Docker instead.
These instructions assume that you have an Impala development environment set up and that you can build Impala and run Impala tests. If not, see "Getting Started" on Impala Home.
Building containers and publishing to your local Docker repo
Building and publishing containers is integrated into the Impala CMake files, with dependencies on all the build artifacts that go into the containers. Building is as simple as:
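A minimal sketch of the build invocation, run against the Impala checkout. The target name docker_images and the IMPALA_BUILD_THREADS variable are assumptions rather than something stated on this page; check docker/CMakeLists.txt for the targets your checkout defines, and substitute ninja for make if that is how your build was generated.

```shell
# Assumption: IMPALA_BUILD_THREADS may be set by the Impala environment scripts;
# fall back to the number of online CPUs if it is not.
BUILD_THREADS="${IMPALA_BUILD_THREADS:-$(getconf _NPROCESSORS_ONLN)}"

if [ -n "${IMPALA_HOME:-}" ] && [ -d "$IMPALA_HOME" ]; then
  # Assumption: the CMake target that builds the images is named docker_images.
  (cd "$IMPALA_HOME" && make -j"$BUILD_THREADS" docker_images)
else
  echo "IMPALA_HOME is not set; source bin/impala-config.sh first" >&2
fi
```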
You may want to set USE_CDP_HIVE in bin/impala-config-local.sh to build containers with Hive 3 support.
Note: in the future, it would be nice to have a buildall.sh flag that also builds this target.
You can check whether the images are in your local repository with the following command:
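A sketch of such a check, using docker images and filtering the listing down to the Impala daemons. The name pattern is an assumption about how the images are named (for example impalad_coordinator, catalogd, statestored), and filter_impala_images is a hypothetical helper; adjust the pattern to match what your build produces.

```shell
# Keep only lines that look like Impala images.
# Assumption: image names contain "impala", "catalogd" or "statestored".
filter_impala_images() {
  grep -E 'impala|catalogd|statestored' || true
}

if command -v docker >/dev/null 2>&1; then
  # Print repository:tag for every local image, then filter the list.
  docker images --format '{{.Repository}}:{{.Tag}}' | filter_impala_images
fi
```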
Pushing Images to a Repository
./docker/push-images.sh is a script that pushes the built images to a Docker repository. See that script's help output for more information.
Running Dockerized Minicluster
As an initial step, you will need to set up a docker bridge network for the dockerized daemons to communicate over. We have a script ./docker/configure_test_network.sh to automate the setup. See that script for more details. You need to run it with the desired network name as the first argument, as follows:
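For example, the following sketch creates the network and then prints its gateway address. The network name test-impala-cluster is a hypothetical choice, and the docker network inspect step is an optional sanity check rather than part of the script's documented usage.

```shell
# Hypothetical network name; any name of your choosing works.
NETWORK_NAME="${NETWORK_NAME:-test-impala-cluster}"

if [ -x ./docker/configure_test_network.sh ]; then
  # Run from the root of the Impala checkout; the network name is the first argument.
  ./docker/configure_test_network.sh "$NETWORK_NAME"
  # Optional check: print the gateway IP of the new bridge network.
  docker network inspect --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}' "$NETWORK_NAME"
else
  echo "run this from the root of the Impala checkout" >&2
fi
```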
The network setup changes some settings in impala-config-local.sh. You need to regenerate the cluster configs and restart your minicluster services before starting Impala so that services such as HDFS and HMS listen for connections on your new Docker bridge network.
Once those services are up, you should be able to run dockerized Impala. If you previously had an Impala minicluster running, you must kill any non-dockerized Impala processes so that they do not hold the ports used by the dockerized daemons. If the cluster starts up successfully, you should be able to run some queries via impala-shell.
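The sequence might look like the following sketch. The --kill flag is mentioned later on this page; the --docker_network flag, the test-impala-cluster network name and the impala-shell.sh smoke query are assumptions, so check start-impala-cluster.py --help for the options your checkout supports.

```shell
# Hypothetical network name; use the one passed to configure_test_network.sh.
NETWORK_NAME="${NETWORK_NAME:-test-impala-cluster}"

start_dockerized_impala() {
  # Stop any non-dockerized Impala daemons so their ports are free.
  "$IMPALA_HOME/bin/start-impala-cluster.py" --kill &&
  # Assumption: --docker_network selects the bridge network created earlier.
  "$IMPALA_HOME/bin/start-impala-cluster.py" --docker_network="$NETWORK_NAME" &&
  # Smoke test: a query that touches no tables.
  "$IMPALA_HOME/bin/impala-shell.sh" -q 'select 1'
}

if [ -n "${IMPALA_HOME:-}" ] && [ -x "$IMPALA_HOME/bin/start-impala-cluster.py" ]; then
  start_dockerized_impala
else
  echo "IMPALA_HOME is not set or start-impala-cluster.py was not found" >&2
fi
```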
Note that querying any existing tables is likely to fail because "localhost" is baked into a lot of metadata. You will need to load, or re-load data before running end-to-end tests.
You can see the running Docker containers with "docker ps".
Automated end-to-end tests are run by this job: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/. The script https://github.com/apache/impala/blob/master/bin/jenkins/dockerized-impala-bootstrap-and-test.sh bootstraps Ubuntu 16.04 from scratch and runs the tests.
Tips for Working with Dockerized Minicluster
- The Impala debug pages are mostly exposed on the same ports as in the regular minicluster, i.e. localhost:25000, localhost:25001, localhost:25002, localhost:25010, localhost:25020. This is achieved by mapping the default webserver port inside each container to the desired port outside it. That is, every Impala daemon exposes its webserver on port 25000 inside its container.
- If you want to look at logs or other state in a running container, you can use "docker exec" to run a bash process inside it (using the name or ID from "docker ps"). The container holds a heavily stripped-down Ubuntu environment, so it may be missing many commands you're accustomed to!
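The port mapping in the first tip can be sketched with Docker's standard -p host:container publishing syntax, and the second tip with docker exec. The helper names publish_flag and open_shell are hypothetical and exist only for illustration.

```shell
# Every daemon serves its debug pages on this port inside its container.
CONTAINER_WEB_PORT=25000

# Print the "docker run" publish flag that maps a host port
# to the container's webserver port.
publish_flag() {
  printf '%s %s:%s\n' -p "$1" "$CONTAINER_WEB_PORT"
}

# Host ports listed in the first tip.
for host_port in 25000 25001 25002 25010 25020; do
  publish_flag "$host_port"
done

# Open an interactive bash shell in a running container
# (pass the name or ID shown by "docker ps").
open_shell() {
  docker exec -it "$1" bash
}
```

Each printed flag has the form -p HOST:25000, so the daemons can all keep the same in-container port while remaining reachable on distinct host ports.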
Switching between Dockerized and Non-Dockerized Minicluster
This is not totally streamlined. Here are some rough notes about the issues you may run into:
- HMS and Kudu metadata have hostnames embedded in various places, so if you loaded data with the non-dockerized cluster, you will likely not be able to access any tables with a dockerized cluster.
- If you load data with a dockerized cluster, you can generally access tables with the non-dockerized cluster so long as you keep your bridge network around and all of the processes are listening on the bridge network's gateway IP.
- You will likely need to explicitly kill running Impala processes with start-impala-cluster.py --kill to avoid port conflicts.