The main development environment for Impala is Ubuntu 16.04 with the bash shell. The bin/bootstrap_development.sh script initializes a fresh Impala development environment on that platform. It alters your system configuration, including ~/.ssh/config and /etc/hosts, so consider running it in a VM or container. For instructions specific to Docker, see Impala Development Environment inside Docker.

Machine recommendations:

  • Impala requires 120GB of available disk space for a fully functional environment. An SSD is strongly recommended.
  • Impala compilation is CPU intensive. At least 4 CPUs are recommended. More CPUs will speed compilation.
  • Some Impala tests are memory intensive. 32GB of memory is recommended to be able to run all tests locally.
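
The recommendations above can be checked before running anything. The following is a minimal sketch, not part of the Impala tooling: the TARGET_DIR variable and the check function are hypothetical names introduced for illustration, and the thresholds are the figures listed above. It assumes a Linux host with df, nproc, and /proc/meminfo available.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare this machine against the recommendations above.
# TARGET_DIR is a hypothetical variable, not something Impala reads.
TARGET_DIR="${TARGET_DIR:-$HOME}"

# Available space in GB on the filesystem holding TARGET_DIR (df -P, 1K blocks).
avail_gb=$(df -Pk "$TARGET_DIR" | awk 'NR==2 {print int($4 / 1024 / 1024)}')
# Number of processing units available to this process.
cpus=$(nproc)
# MemTotal is reported in kB in /proc/meminfo on Linux.
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)

check() {  # usage: check LABEL ACTUAL RECOMMENDED
  if [ "$2" -ge "$3" ]; then
    echo "OK    $1: $2 (recommended >= $3)"
  else
    echo "WARN  $1: $2 (recommended >= $3)"
  fi
}

check "disk GB"   "$avail_gb" 120
check "CPUs"      "$cpus"     4
check "memory GB" "$mem_gb"   32
```

Treat the output as a rough guide. MemTotal excludes memory reserved by the kernel, so a machine sold as 32GB may report 31 here and still be adequate.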

Quick start commands:

# Pick a location to use for your Impala environment
export IMPALA_HOME=your/desired/directory

# Get the source
git clone https://gitbox.apache.org/repos/asf/impala.git ${IMPALA_HOME}

# Run the bootstrap script
${IMPALA_HOME}/bin/bootstrap_development.sh

Note:

The source, a full build, and creating and importing all the test data together require approximately 120GB of available space. If you see an error in the console while running bootstrap_development.sh similar to "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)", and a warning in hdfs-namenode.log similar to "org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3", you may not have enough available space. This can happen even if system utilities show free disk space.

If you encounter this error when rebuilding an existing cluster, clean up accumulated files in ${IMPALA_HOME}/logs and ${IMPALA_HOME}/be/build, and remove older versions of cdh_components-xxxxxx in ${IMPALA_HOME}/toolchain. Use df to check available disk space.
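
Before deleting anything, it can help to see how much space each of the locations named above is using. The following is a minimal sketch under stated assumptions: report_usage is a hypothetical helper, not a script shipped with Impala, and it only reports sizes with du and df without removing any files.

```shell
#!/usr/bin/env bash
# Sketch: report how much space the directories named above are using.
# Nothing is deleted; review the output before removing anything.
report_usage() {  # usage: report_usage IMPALA_HOME_PATH
  local home="$1" dir
  for dir in "$home/logs" "$home/be/build" "$home/toolchain"; do
    if [ -d "$dir" ]; then
      du -sh "$dir"
    else
      echo "missing: $dir"
    fi
  done
  # Size of each cdh_components version, if any are present.
  du -sh "$home"/toolchain/cdh_components-* 2>/dev/null
  # Free space on the filesystem that holds the checkout.
  df -h "$home"
}

# Run the report only when IMPALA_HOME points at an existing directory.
if [ -n "${IMPALA_HOME:-}" ] && [ -d "$IMPALA_HOME" ]; then
  report_usage "$IMPALA_HOME"
fi
```

The largest entries in the output are the first candidates for cleanup. Which cdh_components version is still needed depends on your checkout, so confirm that before removing older ones.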
