This page documents how to do Impala development inside a Docker container. This allows you to isolate your development environment from the rest of your system. If you want to build a containerized version of Impala suitable for production deployment with one daemon process per container, see Build and Test for Daemon Docker Containers.
If you don't have an Ubuntu 14.04 or 16.04 environment available, you can use Docker to develop. First, install Docker as you normally would. Make sure the resource limits of your Docker Engine are at least 4 CPU cores and 8 GB RAM (more is better). For example, with Docker for Mac, set these under Preferences → Advanced.
Then, back in your terminal:
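To confirm the engine actually received those limits, you can query `docker info`, which reports the `NCPU` and `MemTotal` fields. The following is a minimal bash sketch, not part of Impala; the helper names `meets_min_resources` and `check_docker_resources` are hypothetical, and the memory threshold is an assumption you may need to adjust.

```shell
#!/usr/bin/env bash
# Minimal sketch: compare the Docker Engine's CPU and memory against the
# minimums suggested on this page. Helper names are hypothetical.

MIN_CPUS=4
# 8 GB in decimal bytes. Using GB rather than GiB leaves a little headroom,
# because the VM behind Docker Desktop may report slightly less than the
# configured amount; lower this further if needed.
MIN_MEM_BYTES=8000000000

# meets_min_resources NCPU MEM_BYTES
# Returns 0 if both values meet the minimums, 1 otherwise (with a message).
meets_min_resources() {
  local ncpu=$1 mem_bytes=$2
  if (( ncpu < MIN_CPUS )); then
    echo "Only ${ncpu} CPUs available; at least ${MIN_CPUS} recommended." >&2
    return 1
  fi
  if (( mem_bytes < MIN_MEM_BYTES )); then
    echo "Only $(( mem_bytes / 1000000 )) MB RAM available; at least 8 GB recommended." >&2
    return 1
  fi
  return 0
}

# check_docker_resources: query the running engine and apply the check.
check_docker_resources() {
  local ncpu mem_bytes
  read -r ncpu mem_bytes < <(docker info --format '{{.NCPU}} {{.MemTotal}}') || return 1
  meets_min_resources "$ncpu" "$mem_bytes"
}
```

After sourcing the file, `check_docker_resources && echo "Docker resources look sufficient"` prints a confirmation or explains which limit is too low.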
docker pull ubuntu:16.04
# SYS_TIME is required for Kudu to work. The container will be able to change the time of the host.
# The -p options expose the container's ports to the host. Add more as needed.
# To share files between the container and the host, add a -v option, e.g. "-v ~/Downloads/:/HostShared".
docker run --cap-add SYS_TIME --interactive --tty --name impala-dev \
  -p 25000:25000 -p 25010:25010 -p 25020:25020 \
  ubuntu:16.04 bash
Now, within the container:
apt-get update
apt-get install --yes sudo
adduser --disabled-password --gecos '' impdev
echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
su - impdev
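The `echo ... >> /etc/sudoers` step appends a new copy of the line every time it runs. If you script the container setup, a minimal idempotent variant could look like the sketch below; `append_once` is a hypothetical helper, not something Impala provides.

```shell
#!/usr/bin/env bash
# Minimal sketch: append a line to a file only if it is not already there,
# so re-running the setup does not duplicate the sudoers entry.

# append_once LINE FILE
append_once() {
  local line=$1 file=$2
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
  # A missing FILE makes grep fail, so the line is appended (creating FILE).
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

As root, `append_once 'impdev ALL=(ALL) NOPASSWD:ALL' /etc/sudoers && visudo -c` adds the entry once and then checks the sudoers syntax, since a malformed `/etc/sudoers` can lock you out of `sudo`.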
Then, as impdev in the container:
sudo apt-get --yes install git
git clone https://git-wip-us.apache.org/repos/asf/impala.git ~/Impala
cd ~/Impala
export IMPALA_HOME=$(pwd)
# See https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala for developing Impala.
$IMPALA_HOME/bin/bootstrap_development.sh
or
# See https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala for testing Impala.
$IMPALA_HOME/bin/bootstrap_system.sh
source $IMPALA_HOME/bin/impala-config.sh
$IMPALA_HOME/buildall.sh -noclean -notests
$IMPALA_HOME/bin/create-test-configuration.sh -create_metastore -create_sentry_policy_db
$IMPALA_HOME/testdata/bin/run-all.sh
$IMPALA_HOME/bin/start-impala-cluster.py
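Once the cluster is running, you can check from the host that the ports published with `-p` in the `docker run` command respond. This sketch assumes Impala's default web UI ports (25000 for impalad, 25010 for statestored, 25020 for catalogd) and that `curl` is installed on the host; `probe_web_uis` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Minimal sketch: from the host, check that the published web UI ports answer.
# Assumes the default ports: impalad 25000, statestored 25010, catalogd 25020.

# probe_web_uis [HOST]
# Prints one status line per port; returns non-zero if any port fails.
probe_web_uis() {
  local host=${1:-localhost} port failed=0
  for port in 25000 25010 25020; do
    # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the page body.
    if curl -sf -o /dev/null --max-time 5 "http://${host}:${port}/"; then
      echo "${port}: OK"
    else
      echo "${port}: not reachable"
      failed=1
    fi
  done
  return "$failed"
}
```

If a port reports "not reachable" while the daemon is running inside the container, check that the corresponding `-p` mapping was given when the container was created; mappings cannot be added to an existing container with `docker start`.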
When that's done, start developing! When you're ready to pause, run the following in a new terminal on the host:
docker commit impala-dev && docker stop impala-dev
When you're ready to get back to work:
docker start --interactive impala-dev
If you just want to detach from the container instead of committing your work and stopping it, press Ctrl-P followed by Ctrl-Q. You can re-attach with the start command above or with docker attach impala-dev.
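Each time you restart the container, remember to run $IMPALA_HOME/bin/bootstrap_system.sh to start the services the build and minicluster depend on, such as PostgreSQL and sshd. The restarted container begins in a root shell, so switch back with su - impdev and re-export IMPALA_HOME first.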
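The pause and resume steps can be wrapped in two host-side functions. This is a minimal sketch, not an Impala tool: `pause_dev`, `resume_dev`, `CONTAINER` and `IMPALA_DIR` are hypothetical names, and it assumes the checkout lives at `/home/impdev/Impala` (as in the log paths on this page) and that `bootstrap_system.sh` can run unattended through `docker exec`.

```shell
#!/usr/bin/env bash
# Minimal sketch wrapping this page's pause/resume steps in two functions.
# CONTAINER matches the --name given to docker run; IMPALA_DIR assumes the
# checkout lives at /home/impdev/Impala.

CONTAINER=${CONTAINER:-impala-dev}
IMPALA_DIR=${IMPALA_DIR:-/home/impdev/Impala}

# pause_dev: snapshot the container into an (untagged) image, then stop it.
pause_dev() {
  docker commit "$CONTAINER" && docker stop "$CONTAINER"
}

# resume_dev: start the container, rerun bootstrap_system.sh as impdev so the
# dependent services come back up, then attach to the container's shell.
resume_dev() {
  docker start "$CONTAINER" || return 1
  docker exec --user impdev --env IMPALA_HOME="$IMPALA_DIR" "$CONTAINER" \
    "$IMPALA_DIR/bin/bootstrap_system.sh" || return 1
  docker attach "$CONTAINER"
}
```

Running the bootstrap through `docker exec` before attaching means the services are already up when you get the shell back; you still need `su - impdev` and `export IMPALA_HOME=...` inside the attached shell.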
Troubleshooting
1. Make processes are killed with errors like "collect2: error: ld terminated with signal 9 [Killed]", or fail with a "No space left on device" error.
Allocate more disk space and RAM to Docker. Signal 9 usually means the kernel's out-of-memory killer stopped the linker. If your host machine does not have that much RAM, lower make's concurrency by setting IMPALA_BUILD_THREADS to a smaller number (it defaults to the number of CPUs).
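One way to choose a smaller IMPALA_BUILD_THREADS value is to cap it by available memory. The sketch below assumes roughly 2 GB of RAM per parallel compile or link job; that ratio is a rule of thumb, not a figure from the Impala build scripts, and `suggest_build_threads` and `total_mem_gb` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Minimal sketch: derive IMPALA_BUILD_THREADS from CPU count and RAM.
# Assumption: about 2 GB of RAM per parallel compile/link job.

# suggest_build_threads NPROC MEM_GB
suggest_build_threads() {
  local nproc=$1 mem_gb=$2
  local by_mem=$(( mem_gb / 2 ))
  if (( by_mem < 1 )); then
    by_mem=1
  fi
  # Never exceed the CPU count, which is the default.
  if (( by_mem < nproc )); then
    echo "$by_mem"
  else
    echo "$nproc"
  fi
}

# total_mem_gb: total RAM visible in /proc/meminfo (kB), in whole GiB.
# Under Docker Desktop this is the VM's memory, i.e. the configured limit.
total_mem_gb() {
  awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}
```

Inside the container you could then run `export IMPALA_BUILD_THREADS=$(suggest_build_threads "$(nproc)" "$(total_mem_gb)")` before `$IMPALA_HOME/buildall.sh -noclean -notests`.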
2. The build fails with the following error:
Creating postgresql database for Hive metastore
dropdb: could not connect to database template1: could not connect to server: Connection refused
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
createdb: could not connect to database template1: could not connect to server: Connection refused
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
ERROR in /home/impdev/Impala/bin/create-test-configuration.sh at line 149: createdb -U hiveuser ${METASTORE_DB}
This usually happens after you restart your Docker container, and you just need to start PostgreSQL manually. See $IMPALA_HOME/bin/bootstrap_system.sh for how it starts PostgreSQL. For example, in Ubuntu:
sudo service postgresql start
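To avoid racing the metastore setup against a server that is still starting, you can start PostgreSQL only when needed and wait until it accepts connections. This sketch uses `pg_isready` from the PostgreSQL client tools; `ensure_postgres` and its timeout parameter are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: start PostgreSQL only if it is not already accepting
# connections, then poll until it is ready or a timeout expires.

# ensure_postgres [TIMEOUT_SECONDS]
ensure_postgres() {
  local timeout=${1:-30} waited=0
  # pg_isready -q exits 0 when the server accepts connections.
  if pg_isready -q; then
    return 0
  fi
  sudo service postgresql start || return 1
  until pg_isready -q; do
    if (( waited >= timeout )); then
      echo "PostgreSQL did not become ready within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}
```

Calling `ensure_postgres && $IMPALA_HOME/bin/create-test-configuration.sh -create_metastore -create_sentry_policy_db` reruns the failed step only once the server is up.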
3. Cannot start HBase in the minicluster:
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
running master, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-master-4846cae1a5dd.out
: running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-regionserver-4846cae1a5dd.out
running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-2-regionserver-4846cae1a5dd.out
running regionserver, logging to /home/impdev/Impala/logs/cluster/hbase/hbase-impdev-3-regionserver-4846cae1a5dd.out
Contents of HDFS root: []
Connecting to Zookeeper host(s).
No handlers could be found for logger "kazoo.client"
Could not connect to Zookeeper: Connection time-out
ERROR in /home/impdev/Impala/testdata/bin/run-hbase.sh at line 136: ${CLUSTER_BIN}/check-hbase-nodes.py
Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-hbase.20190705_00_25_47.xml
ERROR in testdata/bin/run-all.sh at line 64: tee ${IMPALA_CLUSTER_LOGS_DIR}/run-hbase.log
Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-all.20190705_00_25_47.xml
The first line shows the root cause: ssh cannot connect to localhost. Check whether you can ssh to localhost by running "ssh localhost whoami". Make sure the sshd service is running and that passwordless login is configured correctly; $IMPALA_HOME/bin/bootstrap_system.sh shows how both are set up. For example, in Ubuntu:
sudo service ssh start
Also verify that the following passwordless-login setup from $IMPALA_HOME/bin/bootstrap_system.sh has taken effect:
mkdir -p ~/.ssh
chmod go-rwx ~/.ssh
if ! [[ -f ~/.ssh/id_rsa ]]
then
  ssh-keygen -t rsa -N '' -q -f ~/.ssh/id_rsa
fi
{ echo "" | cat - ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; } && chmod 0600 ~/.ssh/authorized_keys
echo -e "\nNoHostAuthenticationForLocalhost yes" >> ~/.ssh/config && chmod 0600 ~/.ssh/config
Usually, these errors can be avoided if you run $IMPALA_HOME/bin/bootstrap_system.sh after restarting the container.
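The "ssh localhost whoami" check from item 3 can be scripted so that it fails fast instead of hanging at a password prompt. This is a minimal sketch; `check_ssh_localhost` is a hypothetical helper, and it relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed only if "ssh localhost whoami" prints the current
# user without prompting for a password.

check_ssh_localhost() {
  local me remote
  me=$(whoami)
  # BatchMode=yes makes ssh fail instead of prompting for a password;
  # ConnectTimeout bounds the wait if sshd is not reachable.
  if ! remote=$(ssh -o BatchMode=yes -o ConnectTimeout=5 localhost whoami); then
    echo "ssh to localhost failed: is sshd running (sudo service ssh start)?" >&2
    return 1
  fi
  if [[ "$remote" != "$me" ]]; then
    echo "ssh localhost whoami returned '${remote}', expected '${me}'" >&2
    return 1
  fi
}
```

Running `check_ssh_localhost && $IMPALA_HOME/testdata/bin/run-all.sh` avoids starting the minicluster until passwordless ssh works.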
Developing Impala with Dev Container
Refer to this doc.