The 0.12.0 release of Apache Knox focused on the KnoxShell module. This module has been getting some uptake recently, and a number of improvements were made to its security, API classes, credential collectors, and even its structure and packaging.

The KnoxShell release artifact provides a small-footprint client environment that removes all unnecessary server dependencies, configuration, binary scripts, etc. It comprises a few different things that empower different sorts of users:

  • A set of SDK type classes for providing access to Hadoop resources over HTTP
  • A Groovy based DSL for scripting access to Hadoop resources based on the underlying SDK classes
  • KnoxShell token-based sessions that provide a CLI SSO session for executing multiple scripts

This article will go over the KnoxShell QuickStart from download to actively scripting in a few minutes.

The steps in this article should also work when the 0.12.0 knoxshell download is used against previous gateway server releases. Subsequent articles may focus on new feature additions from 0.12.0.

Download

In the 0.12.0 release, you may get to the knoxshell download through the Apache Knox site.

From that page, click the Gateway client binary archive link or just use the one here.

Unzip this file into your preferred location. This creates a knoxshell-0.12.0 directory, which we will refer to as {GATEWAY_HOME}.

cd {GATEWAY_HOME}

You should see something similar to the following:

home:knoxshell-0.12.0 larry$ ls -l
total 296
-rw-r--r--@  1 larry  staff  71714 Mar 14 14:06 LICENSE
-rw-r--r--@  1 larry  staff    164 Mar 14 14:06 NOTICE
-rw-r--r--@  1 larry  staff  71714 Mar 15 20:04 README
drwxr-xr-x@ 12 larry  staff    408 Mar 15 21:24 bin
drwxr--r--@  3 larry  staff    102 Mar 14 14:06 conf
drwxr-xr-x+  3 larry  staff    102 Mar 15 12:41 logs
drwxr-xr-x@ 18 larry  staff    612 Mar 14 14:18 samples

Directory   Description
bin         contains the main knoxshell jar and related shell scripts
conf        only contains log4j config
logs        contains the knoxshell.log file
samples     has numerous examples to help you get started

Setup Truststore for Client

Get or set up a truststore for the target Knox instance or the load balancer fronting it:

  • if you have access to the server, you may use the command
    • knoxcli.sh export-cert --type JKS
  • copy the resulting gateway-client-identity.jks to your user home directory
  • alternatively, ask your Knox administrator to provide you with the public cert for the gateway and create your own truststore within your user home directory

NOTE: if you see errors related to SSL and PKIX, your truststore is not set up properly.
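For the last option above, creating your own truststore from the gateway's public cert, a minimal sketch using the JDK's keytool might look like the following. The certificate file name gateway-identity.pem, the alias knox-gateway, and the store password changeit are assumptions made for illustration and do not come from this article; check with your Knox administrator which password the client expects. The default truststore file name is the one mentioned above.

```shell
#!/usr/bin/env bash
# Sketch only: import a gateway public certificate into a client truststore.
# ASSUMPTIONS (placeholders, not from the article): the alias "knox-gateway"
# and the default store password "changeit".

build_truststore() {
  local cert_file="$1"                                        # cert file from your Knox admin
  local store_file="${2:-$HOME/gateway-client-identity.jks}"  # truststore in the user home
  local store_pass="${3:-changeit}"                           # placeholder password (assumption)

  # keytool ships with the JDK; -importcert adds the cert as a trusted entry
  # and creates the keystore file if it does not exist yet.
  keytool -importcert -noprompt \
    -alias knox-gateway \
    -file "$cert_file" \
    -keystore "$store_file" \
    -storetype JKS \
    -storepass "$store_pass"
}

# Example usage (file name is hypothetical; uncomment to run):
# build_truststore ./gateway-identity.pem
# keytool -list -keystore "$HOME/gateway-client-identity.jks" -storepass changeit
```

Listing the store afterwards with keytool -list is a quick way to confirm that the gateway cert is present before chasing PKIX errors elsewhere.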

Execute a Sample Script

Execute an example script from the {GATEWAY_HOME}/samples directory, for instance:

bin/knoxshell.sh samples/ExampleWebHdfsLs.groovy

home:knoxshell-0.12.0 larry$ bin/knoxshell.sh samples/ExampleWebHdfsLs.groovy

Enter username: guest
Enter password:
[app-logs, apps, mapred, mr-history, tmp, user]

At this point, you should see something similar to the above output, probably with different directories listed. Take a look at the sample that we just ran:

import groovy.json.JsonSlurper
import org.apache.hadoop.gateway.shell.Credentials
import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.hdfs.Hdfs

// URL of the Knox gateway topology to talk to
gateway = "https://localhost:8443/gateway/sandbox"

// Prompt for the username in the clear and for the password with hidden input
credentials = new Credentials()
credentials.add("ClearInput", "Enter username: ", "user")
           .add("HiddenInput", "Enter password: ", "pass")
credentials.collect()

// Read the collected values back by the names given above
username = credentials.get("user").string()
pass = credentials.get("pass").string()

// Establish the session that is passed to the API classes
session = Hadoop.login( gateway, username, pass )

// List the HDFS root directory and print the entry names
text = Hdfs.ls( session ).dir( "/" ).now().string
json = (new JsonSlurper()).parseText( text )
println json.FileStatuses.FileStatus.pathSuffix

session.shutdown()

Some things to note about this sample:

  1. The gateway URL is hardcoded.
    • Alternatives would be passing it as an argument to the script, using an environment variable, or prompting for it with a ClearInput credential collector.
  2. Credential collectors gather credentials or other input from various sources. In this sample, the HiddenInput and ClearInput collectors prompt the user with the provided prompt text, and the values are retrieved by a subsequent get call using the provided name.
  3. The Hadoop.login method establishes a login session of sorts, which must be passed as an argument to the various API classes.
  4. The response text is easily retrieved as a string and can be parsed with JsonSlurper or whatever you like.
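To make the first and last notes more concrete from the command line, here is a rough shell sketch. It resolves the gateway URL from an argument or an environment variable before falling back to the sample's default, and it builds the WebHDFS LISTSTATUS URL that a directory listing through the gateway is assumed to correspond to. The variable name KNOX_GATEWAY_URL, both function names, and the gateway.pem file in the usage comment are hypothetical and not part of KnoxShell.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and KNOX_GATEWAY_URL are hypothetical.

# Pick the gateway URL: first argument, then the KNOX_GATEWAY_URL environment
# variable, then the default that the sample script hardcodes.
resolve_gateway() {
  echo "${1:-${KNOX_GATEWAY_URL:-https://localhost:8443/gateway/sandbox}}"
}

# Build the WebHDFS LISTSTATUS URL for an absolute HDFS path behind the gateway.
webhdfs_ls_url() {
  local gateway="$1"
  local path="${2:-/}"
  # Strip one trailing slash from the gateway URL so the result has no "//".
  echo "${gateway%/}/webhdfs/v1${path}?op=LISTSTATUS"
}

# Example usage (needs a reachable gateway; curl prompts for the password):
# curl --cacert gateway.pem -u guest "$(webhdfs_ls_url "$(resolve_gateway "$@")" /)"
```

The JSON returned by such a request has the same FileStatuses.FileStatus structure that the sample script walks to print each pathSuffix.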

A follow-up article will cover the use of the Knox Token service and related KnoxShell commands and credential collectors.