MapReduce Streaming Job — POST mapreduce/streaming
Create and queue a Hadoop streaming MapReduce job.
Version: Hive 0.13.0 and later
As of Hive 0.13.0, GET version/hadoop displays the Hadoop version used for the MapReduce job.
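That version check is itself a plain GET. The sketch below is illustrative only: the host and port (WebHCat's default, 50111) and the user name are assumptions, not values from this document, so the live call is left commented out.

```shell
# Assumed WebHCat base URL; templeton.port defaults to 50111.
WEBHCAT="http://localhost:50111/templeton/v1"

# GET version/hadoop reports the Hadoop version used for MapReduce jobs.
# Uncomment to run against a live gateway:
# curl -s "$WEBHCAT/version/hadoop?user.name=alice"
```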
Parameters

| Name | Description | Required? | Default |
|------|-------------|-----------|---------|
| input | Location of the input data in Hadoop. | Required | None |
| output | Location in which to store the output data. If not specified, WebHCat will store the output in a location that can be discovered using the queue resource. | Optional | See description |
| mapper | Location of the mapper program in Hadoop. | Required | None |
| reducer | Location of the reducer program in Hadoop. | Required | None |
| file | Add an HDFS file to the distributed cache. | Optional | None |
| define | Set a Hadoop configuration variable using the syntax `define=NAME=VALUE`. | Optional | None |
| cmdenv | Set an environment variable using the syntax `cmdenv=NAME=VALUE`. | Optional | None |
| arg | Set a program argument. | Optional | None |
| statusdir | A directory where WebHCat will write the status of the MapReduce job. If provided, it is the caller's responsibility to remove this directory when done. | Optional | None |
| enablelog | If statusdir is set and enablelog is "true", collect Hadoop job configuration and logs into a directory named `$statusdir/logs`. This parameter was introduced in Hive 0.12.0. (See HIVE-4531.) | Optional in Hive 0.12.0+ | None |
| callback | Define a URL to be called upon job completion. You may embed a specific job ID into this URL using `$jobId`. | Optional | None |
The standard parameters are also supported.
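Putting the parameters together, a submission is a single POST with one form field per parameter. The sketch below is a hedged example: the host, port, user name, and HDFS paths are all assumptions, so the curl call is left commented out.

```shell
# Assumed WebHCat endpoint; user.name goes in the query string (Hive 0.13.0+).
WEBHCAT="http://localhost:50111/templeton/v1"
JOB_URL="$WEBHCAT/mapreduce/streaming?user.name=alice"

# One -d pair per table parameter; repeat -d input=... for multiple inputs.
# All paths below are hypothetical.
# curl -s \
#      -d input=mydata/file01 \
#      -d input=mydata/file02 \
#      -d output=mycounts \
#      -d mapper=/bin/cat \
#      -d reducer='/usr/bin/wc -w' \
#      -d statusdir=mystatus \
#      "$JOB_URL"
```

On success, WebHCat answers with the queued job's ID, which can then be polled through the queue resource.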
Results

| Name | Description |
|------|-------------|
| id | A string containing the job ID, similar to "job_201110132141_0001". |
| info | A JSON object containing the information returned when the job was queued. See the Hadoop documentation (Class RunningJob) for more information. |
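As a rough illustration of the result shape, the snippet below extracts the job ID from a hand-written sample response; the JSON literal is an assumption for demonstration, not output captured from a live cluster.

```shell
# Sample response body (assumed shape; only "id" is documented above).
RESPONSE='{"id":"job_201110132141_0001"}'

# Pull out the job ID without jq, using a simple sed substitution.
JOB_ID=$(printf '%s' "$RESPONSE" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p')
echo "$JOB_ID"   # job_201110132141_0001
```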
Code and Data Setup
Prior to Hive 0.13.0, user.name was specified in POST requests as a form parameter: `curl -d user.name=<user>`.
In Hive 0.13.0 onward, user.name should be specified in the query string: `?user.name=<name>`. Specifying user.name as a form parameter is deprecated.
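The two styles can be contrasted side by side. Host, port, user name, and paths below are placeholders, so both calls are left commented out.

```shell
# Assumed WebHCat streaming endpoint.
BASE="http://localhost:50111/templeton/v1/mapreduce/streaming"

# Deprecated (pre-Hive 0.13.0): user.name sent as a form parameter.
# curl -s -d user.name=alice -d input=mydata -d output=mycounts \
#      -d mapper=/bin/cat -d reducer='/usr/bin/wc -w' "$BASE"

# Current (Hive 0.13.0+): user.name carried in the query string.
# curl -s -d input=mydata -d output=mycounts \
#      -d mapper=/bin/cat -d reducer='/usr/bin/wc -w' "$BASE?user.name=alice"
```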