...
This tutorial will appeal to Nutch administrators looking to improve runtime speed whilst maintaining MapReduce’s ability to scale to petabytes of data. Readers are encouraged to share their experiences using Nutch on Tez.
Related JIRA Tickets
What is Apache Tez?
Apache Tez is described as an application framework which allows for a complex directed-acyclic-graph (DAG) of tasks for processing data. It is currently built atop Apache Hadoop YARN.
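To make the idea of a DAG of tasks concrete, here is a minimal sketch in plain Python. It does not use the Tez API; the task names are hypothetical and exist only for illustration. Each task lists the tasks it depends on, and a topological sort yields an order in which every task runs after all of its inputs are ready.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph (not taken from Nutch or Tez):
# each key maps to the set of tasks that must finish before it can start.
task_graph = {
    "read_input": set(),
    "filter": {"read_input"},
    "aggregate": {"filter"},
    "join": {"filter", "aggregate"},
    "write_output": {"join"},
}


def execution_order(graph):
    """Return one valid order in which the tasks of the DAG can run."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(task_graph))
```

Unlike a fixed map-then-reduce pipeline, a graph like this can have a task feed several downstream tasks ("filter" feeds both "aggregate" and "join"), which is the kind of shape a DAG framework is designed to express.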
...
Running the Generator job on Tez
Run # | YARN Engine | # of URLs | Elapsed Time (hh:mm:ss) |
---|---|---|---|
1 | MapReduce | 11322 | 00:01:19 |
2 | MapReduce | 11322 | 00:01:18 |
3 | MapReduce | 11322 | 00:01:22 |
4 | MapReduce | 11322 | 00:01:23 |
5 | Tez | N/A | N/A |
6 | Tez | N/A | N/A |
7 | Tez | N/A | N/A |
8 | Tez | N/A | N/A |
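As a reference point for any future Tez measurements, the MapReduce rows in the table above can be summarised with a short script. This is only a sketch: the elapsed times are copied from the table, and the variable and function names are illustrative.

```python
from statistics import mean

# Elapsed times copied from the four MapReduce runs in the table above.
mapreduce_runs = ["00:01:19", "00:01:18", "00:01:22", "00:01:23"]


def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


durations = [to_seconds(t) for t in mapreduce_runs]
baseline = mean(durations)

print(f"mean={baseline:.1f}s min={min(durations)}s max={max(durations)}s")
# prints: mean=80.5s min=78s max=83s
```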
The Generator job was found to be incompatible with Tez, which is why runs 5 to 8 report N/A. The job execution log below details the outcome.
...