Miscellaneous list of information.
ClassNotFound Exceptions when running Tez DAGs
- ClassNotFound exceptions can happen in three places:
  - On the client side, where the Tez DAG is submitted from
  - In the Tez Application Master (AM)
  - In the task runtime
- For the client-side issue, the simplest fix is to add the Tez jars and the config directory containing tez-site.xml to the classpath, i.e. "${TEZ_CONF_DIR}:${TEZ_HOME}/*:${TEZ_HOME}/lib/*".
  - If you are using the hadoop jar command, its classpath can be augmented by setting HADOOP_CLASSPATH to the above value.
- For the Tez runtime (i.e. both the AM and the tasks):
  - One common approach to add jars/files to the classpath everywhere is to use the "tez.aux.uris" config parameter. This can be done either by setting this parameter directly in tez-site.xml or by passing -Dtez.aux.uris= on the command line for hadoop jar commands or for applications that implement the Tool interface.
    - The values specified here will be made available in the environments of both the AM and all the tasks.
  - Another global approach is to configure "tez.cluster.additional.classpath.prefix". This is a config-driven approach and can be used when all the required resources are already available on all nodes in the cluster.
  - For finer control over the resources, e.g. a separate set of jars for the AM or for the tasks of a particular DAG, add them programmatically by invoking the following APIs:
    - For the AM: TezClient::addAppMasterLocalFiles
    - For the tasks: Vertex::addTaskLocalFiles
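The client-side classpath and the command-line form of "tez.aux.uris" described above can be sketched in shell roughly as follows. This is only an illustration: the default install paths, the jar name my-app.jar, the class com.example.MyTool, and the HDFS URI are hypothetical placeholders. It also assumes that tez.aux.uris takes a comma-separated list of URIs and that the application parses generic -D options (for example by implementing Tool). The hadoop command is echoed rather than executed.

```shell
# Hypothetical install locations; used only if the variables are not already set.
TEZ_HOME="${TEZ_HOME:-/opt/tez}"
TEZ_CONF_DIR="${TEZ_CONF_DIR:-/etc/tez/conf}"

# Client side: the config directory (containing tez-site.xml) plus the Tez jars.
# The quotes keep the shell from expanding the wildcards; the JVM expands them itself.
TEZ_CLASSPATH="${TEZ_CONF_DIR}:${TEZ_HOME}/*:${TEZ_HOME}/lib/*"

# Prepend to any existing HADOOP_CLASSPATH so that `hadoop jar` picks it up.
export HADOOP_CLASSPATH="${TEZ_CLASSPATH}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"

# Runtime side (AM and tasks): placeholder URI of an extra jar to make available.
# Assumption: several URIs would be separated by commas.
AUX_URIS="hdfs:///apps/myapp/lib/extra.jar"

# Print the command instead of running it; jar and class names are placeholders.
echo hadoop jar my-app.jar com.example.MyTool "-Dtez.aux.uris=${AUX_URIS}"
```

Setting the same tez.aux.uris value in tez-site.xml instead would apply it to every submission that uses that config directory, whereas the -D form applies only to the single invocation.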