Please see the docs for the latest 1.99.* release at http://sqoop.apache.org/docs/. Some of the information below might be outdated.
Building from sources
Check out the sources and switch to the sqoop2 branch:
Setting up a build environment with Eclipse
- Install Eclipse
- Install Maven if it is not already on your machine
- Install Oracle's JDK
- Run the following commands
- Import the project into Eclipse by going to File > Import... > General > Existing Projects into Workspace > Next.
- In the next wizard window, click the Browse button next to "Select root directory" and browse to the root of the workspace where you checked out sqoop2. This will populate about 10 projects into your workspace, all of which are different modules within Sqoop2. Click the Finish button to bring these projects into the workspace and start working.
Note - if this is the first time you are setting up Eclipse for a Maven project, the import will show class path problems due to the missing variable M2_REPO (Unbound classpath variable: 'M2_REPO/...'). To fix this error, go to Preferences > Java > Build Path > Classpath Variables. Click on New..., enter the name M2_REPO, click on Folder and browse to the directory ~/.m2/repository. Click OK and close the preferences dialog. This will force a rebuild of the workspace and all projects should turn green.
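Before filling in the dialog, it can help to confirm that the local Maven repository actually exists. The following is a minimal sketch, assuming Maven uses its default local repository location (a custom `localRepository` in settings.xml would change this path):

```shell
# Assumption: Maven's default local repository location is in use.
M2_REPO="${HOME}/.m2/repository"

if [ -d "${M2_REPO}" ]; then
  # This is the folder to select for the M2_REPO classpath variable.
  echo "M2_REPO should point to: ${M2_REPO}"
else
  # The directory is created the first time Maven downloads dependencies.
  echo "No local repository at ${M2_REPO}; run a Maven build first to create it."
fi
```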
Similar steps need to be followed with IntelliJ IDEA as well.
Setting up the Code Formatter
Quick commands to compile and run tests
Run all unit tests:
Run all integration tests:
Running the integration tests takes up a lot of CPU, since these tests run on the actual execution engine (such as Hadoop MR).
Run one integration test:
If you want to run tests against the PostgreSQL repository, have a working installation of PostgreSQL and point to it when running the tests. In the following case we have a working postgres installation as
Sadly, as of this writing, this does not actually run the integration tests; it runs only the unit tests.
Build Sqoop:
Optionally, you can build Sqoop while skipping tests (both unit tests and integration tests):
Other handy commands that build and run all tests from scratch:
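The build and test steps above can be sketched as a small wrapper around standard Maven lifecycle invocations. This is only an illustration: the function name `sqoop_build` and its task names are hypothetical, and the goals shown are generic Maven phases plus the standard `-DskipTests` property, not necessarily the exact commands, profiles or properties the project used. Run it from the root of the checked-out sources.

```shell
# MVN can be overridden (for example MVN=echo) to print the arguments
# instead of running Maven.
MVN="${MVN:-mvn}"

# Hypothetical helper: maps a task name to a standard Maven invocation.
sqoop_build() {
  case "$1" in
    unit)        "$MVN" clean test ;;                    # unit tests only
    integration) "$MVN" clean integration-test ;;        # runs through the integration-test phase
    build)       "$MVN" clean install ;;                 # full build, tests included
    skip-tests)  "$MVN" clean install -DskipTests ;;     # build without running tests
    *) echo "usage: sqoop_build {unit|integration|build|skip-tests}" >&2
       return 2 ;;
  esac
}
```

For example, `sqoop_build skip-tests` builds without running tests, while setting `MVN=echo` first shows what would be executed.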
Creating Sqoop binaries
Now build and package the Sqoop2 binary distribution:
This process will create a directory and a tarball under the dist/target directory. The directory (named sqoop-2.0.0-SNAPSHOT-bin-hadoop200, depending on the Hadoop profile used) contains the binaries necessary to run Sqoop2, and its structure looks something like the one below.
VB: There is NO lib folder under the client in the latest code as of this writing
As part of this process, a copy of the Tomcat server is also downloaded and put under the server directory in the above structure.
If you are on a particular release branch such as 1.99.4, all the artifacts in it will be created with the 1.99.4 build version, for instance sqoop-1.99.4-bin-hadoop200.tar.gz.
Installing Sqoop2 on remote server
To install the generated binaries on a remote server, simply copy the directory sqoop-2.0.0-SNAPSHOT to your remote server:
The Sqoop server depends on Hadoop binaries, but they are not part of the distribution, so you need to install them into the Sqoop server manually. The latest Hadoop version we support is 2.5.2.
VB: There is no addtowar.sh in the latest code under sqoop-2.0.0-SNAPSHOT/bin as of this writing
To install the Hadoop libraries, execute the addtowar.sh command with the argument -hadoop $version $location. The following example is for Cloudera distribution version 4 (CDH4):
If you're running CDH4 MR1:
If you're running the original MapReduce implementation (MR1), you will also need to install its jar:
You can install arbitrary jars (connectors, JDBC drivers) using the -jars argument, which takes a list of jars separated by ":". Here is an example of installing the MySQL JDBC driver into the Sqoop server:
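As a minimal sketch of the ":"-separated list that -jars expects, the helper below joins jar paths and prints the resulting command line. The function name `join_jars` and the jar paths are hypothetical placeholders, and the command only applies to distributions that still ship addtowar.sh.

```shell
# Join all arguments with ":" (the first character of IFS is used by "$*").
join_jars() {
  ( IFS=:; printf '%s\n' "$*" )
}

# Hypothetical jar locations; replace with the real paths on your machine.
JARS="$(join_jars /path/to/mysql-connector-java.jar /path/to/other-driver.jar)"

# Print the command rather than running it, since addtowar.sh may be absent.
echo "./bin/addtowar.sh -jars ${JARS}"
```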
Installing a new connector to Sqoop2
If you are contributing or adding a new connector, say sqoop-foo-connector, to Sqoop2, here are the steps to follow.
Step 1: Create a sqoop-foo-connector.jar. Make sure the jar contains the sqoopconnector.properties file so that it is picked up by Sqoop.
A typical sqoopconnector.properties for a Sqoop2 connector looks like the one below:
Step 2: Add this jar to a folder on your installation machine and update the path to this folder in sqoop.properties, located under the server/conf directory of the Sqoop2 distribution, for the key
Step 3: Start the server. While the server is initializing, this jar should be loaded into Sqoop's class path and registered in the Sqoop repository.
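The steps above can be sketched as follows. This is an illustration under assumptions, not the project's confirmed configuration: the two property keys are the ones believed to be used by Sqoop2's bundled connectors, the connector class `org.example.sqoop.connector.foo.FooConnector` and the folder `/path/to/external/connectors` are hypothetical, and the load path key `org.apache.sqoop.connector.external.loadpath` should be verified against the sqoop.properties shipped with your release.

```shell
# Temporary staging directory standing in for the connector jar's contents.
WORK_DIR="$(mktemp -d)"

# Assumed keys; the class name below is a hypothetical placeholder.
cat > "${WORK_DIR}/sqoopconnector.properties" <<'EOF'
org.apache.sqoop.connector.class = org.example.sqoop.connector.foo.FooConnector
org.apache.sqoop.connector.name = sqoop-foo-connector
EOF

cat "${WORK_DIR}/sqoopconnector.properties"

# Hypothetical folder holding sqoop-foo-connector.jar on the server machine.
CONNECTOR_DIR="/path/to/external/connectors"

# Line to set in server/conf/sqoop.properties (key name is an assumption).
echo "org.apache.sqoop.connector.external.loadpath=${CONNECTOR_DIR}"
```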
Starting/Stopping Sqoop2 server
To start the Sqoop2 server, invoke the sqoop shell script:
The Sqoop2 server is then running as a web application within the Tomcat server.
Similarly, to stop the Sqoop2 server, do the following:
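Starting and stopping the server can be wrapped in a small helper, sketched below. It assumes the script is bin/sqoop.sh under the distribution directory and that it accepts `server start` and `server stop`, as in the 1.99.x releases of that period; later releases may name the script differently. The names `SQOOP_HOME`, `SQOOP_SH` and `sqoop_server` are introduced here for illustration only.

```shell
# Assumption: SQOOP_HOME is the unpacked distribution directory.
SQOOP_HOME="${SQOOP_HOME:-.}"
# SQOOP_SH can be overridden (for example SQOOP_SH=echo) for a dry run.
SQOOP_SH="${SQOOP_SH:-${SQOOP_HOME}/bin/sqoop.sh}"

# Hypothetical helper: start or stop the Sqoop2 server.
sqoop_server() {
  case "$1" in
    start|stop) "${SQOOP_SH}" server "$1" ;;
    *) echo "usage: sqoop_server {start|stop}" >&2
       return 2 ;;
  esac
}
```

Call `sqoop_server start` to bring the server up and `sqoop_server stop` to shut it down.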
Starting/Running Sqoop2 client
To start an interactive shell:
This will bring up an interactive client ready for input commands:
Please see the 5 min Demo Guide or the Command Line Shell Guide for the latest 1.99.* release at http://sqoop.apache.org/docs/
Sqoop configuration files
Both the default bootstrap configuration sqoop_bootstrap.properties and the main configuration sqoop.properties are located under the server/conf directory in the Sqoop2 distribution directory.
The bootstrap configuration sqoop_bootstrap.properties controls which mechanism provides configuration to the different managers in Sqoop.
The main configuration sqoop.properties answers questions such as:
- Where are the log files, and what are the logging levels?
- What is the repository used?
- What is the submission/execution engine used?
- What is the authentication mechanism used?
Debug Logs information
- The logs of the Tomcat server are located under the server/logs directory in the Sqoop2 distribution directory; the most relevant would be
- The logs of the Sqoop2 server are written to sqoop.log (by default, unless changed in the sqoop.properties configuration file above) under the (@LOGDIR) directory in the Sqoop2 distribution directory.
- The logs for the Derby repository are written to derbyrepo.log (by default, unless changed in the sqoop.properties configuration file above) under the (@LOGDIR) directory in the Sqoop2 distribution directory.
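When debugging, it is often enough to look at the tail of these two files. The sketch below assumes a `LOGDIR` variable that you set to whatever directory the (@LOGDIR) placeholder resolves to in your installation; the helper name `show_log_tail` and the default of the current directory are illustrative only.

```shell
# Assumption: LOGDIR is set to the log directory configured in sqoop.properties.
LOGDIR="${LOGDIR:-.}"

# Print the last 50 lines of a log file, or a notice if it does not exist.
show_log_tail() {
  if [ -f "$1" ]; then
    tail -n 50 "$1"
  else
    echo "not found: $1"
  fi
}

show_log_tail "${LOGDIR}/sqoop.log"
show_log_tail "${LOGDIR}/derbyrepo.log"
```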