Apache Knox provides a single gateway to the many services in your Hadoop cluster. You can use the Knox shell DSL to interact with services such as WebHDFS, WebHCat (Templeton), Oozie, and HBase. For example, using Groovy and the DSL you can submit Hive queries via WebHCat (Templeton) as simply as:
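A minimal Groovy sketch of such a Hive submission (the gateway URL, credentials, and HDFS paths below are placeholders, and the package prefix follows the pre-1.0 Knox releases):

```groovy
import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.job.Job

// Placeholder gateway URL and credentials -- substitute your own.
session = Hadoop.login( "https://knox-host:8443/gateway/default", "guest", "guest-password" )

// Submit a Hive script through WebHCat (Templeton) and capture the job id.
jobId = Job.submitHive(session).file("/user/guest/query.hql").statusDir("/user/guest/query_status").now().jobId
```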
submitSqoop Job API
With Apache Knox 0.10.0, you can now write applications using the Knox DSL for Apache Sqoop and easily submit Sqoop jobs. The WebHCat Job class in the DSL now supports submitSqoop() as follows:
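In outline, the call shape looks like this (a sketch; the session setup and a concrete command string are shown later in this post, and the statusDir value here is illustrative):

```groovy
// Build and submit a Sqoop job via WebHCat (Templeton), capturing the job id.
jobId = Job.submitSqoop(session).
            command("import --connect jdbc:mysql://hostname:3306/dbname ...").
            statusDir("/user/guest/sqoop_status").
            now().jobId
```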
The submitSqoop request takes the following arguments:
- command (String) - The Sqoop command string to execute.
- files (String) - Comma-separated list of files to be copied to the Templeton controller job.
- optionsfile (String) - The remote file containing the Sqoop command to run.
- libdir (String) - The remote directory containing the JDBC JAR to include with the Sqoop lib directory.
- statusDir (String) - The remote directory in which to store status output.
The request returns the jobId in the response.
In this example we will run a simple Sqoop job that extracts the scBlastTab table from the public genome database (MySQL) at UCSC into HDFS.
First, import the required packages:
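A sketch of the imports such a script typically needs (in Knox 0.10.0 the shell classes live under org.apache.hadoop.gateway; they were renamed to org.apache.knox.gateway in Knox 1.0.0):

```groovy
import org.apache.hadoop.gateway.shell.Hadoop       // gateway session handling
import org.apache.hadoop.gateway.shell.job.Job      // WebHCat (Templeton) job submission
import org.apache.hadoop.gateway.shell.hdfs.Hdfs    // HDFS operations for inspecting output
```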
Next, establish a session with the Knox gateway using Hadoop.login:
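For instance (the gateway URL and credentials below are placeholders for your own environment):

```groovy
// Placeholder gateway URL and credentials -- substitute your own.
gateway  = "https://knox-host:8443/gateway/default"
username = "guest"
password = "guest-password"

// Open an authenticated session against the Knox gateway.
session = Hadoop.login( gateway, username, password )
```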
Define your Sqoop job (assuming Sqoop is already configured with the MySQL driver):
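A hypothetical command string for this job; the UCSC server host, database name, and credentials below are illustrative and should be checked against UCSC's public MySQL access instructions. The -m 5 flag requests 5 map tasks, matching the 5 output parts mentioned later:

```groovy
// Illustrative connection details for the public UCSC genome MySQL server.
db = [ url:"jdbc:mysql://genome-mysql.cse.ucsc.edu/sacCer3", user:"genome" ]

// Sqoop import of the scBlastTab table into HDFS using 5 map tasks.
sqoop_command = "import --connect ${db.url} --username ${db.user} --table scBlastTab --target-dir scBlastTab -m 5"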
You can now submit sqoop_command to the cluster with submitSqoop:
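Assuming the session and command variables from the earlier steps are named session and sqoop_command, the submission might look like (the statusDir path is illustrative):

```groovy
// Submit the Sqoop job through WebHCat (Templeton) and capture the job id.
jobId = Job.submitSqoop(session).
            command(sqoop_command).
            statusDir("scBlastTab_status").
            now().jobId
println "Submitted job: " + jobId
```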
You can then check the job status and output as usual:
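For example, using the DSL's queryStatus call and an HDFS directory listing (a sketch; the jobId variable and status directory come from the submission step above):

```groovy
// Query WebHCat for the job's status JSON.
println Job.queryStatus(session).jobId(jobId).now().string

// List the target directory on HDFS to inspect the imported parts.
println Hdfs.ls(session).dir("scBlastTab").now().string
```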
Here is sample output of the above example run against a Hadoop cluster. You need a properly configured Hadoop cluster with the Apache Knox gateway, Apache Sqoop, and WebHCat (Templeton). The test was run against a BigInsights Hadoop cluster.
From the output above you can see the job output as well as the contents of the table directory on HDFS, which contains 5 parts (5 map tasks were used). WebHCat (Templeton) job console output goes to stderr in this case.
When compiling and running your code, make sure you have the following dependency: org.apache.knox:gateway-shell:0.10.0.
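If you run the example as a standalone Groovy script, one convenient way to pull that dependency is Groovy's Grape annotation:

```groovy
// Fetch the Knox gateway-shell client from Maven Central at script startup.
@Grab(group='org.apache.knox', module='gateway-shell', version='0.10.0')
import org.apache.hadoop.gateway.shell.Hadoop
```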