Miscellaneous list of information.

ClassNotFoundException errors when running Tez DAGs

  • ClassNotFoundException errors can occur in three places:
    • On the client side, where the Tez DAG is submitted from
    • In the Tez Application Master (AM)
    • In the task runtime
  • For the client-side issue, the simplest fix is to add the Tez jars and the config directory containing tez-site.xml to the classpath, i.e. "${TEZ_CONF_DIR}:${TEZ_HOME}/*:${TEZ_HOME}/lib/*".
    • If you are using the hadoop jar command, its classpath can be augmented by setting HADOOP_CLASSPATH to the above value.
  • For the Tez runtime (i.e. both the AM and the tasks):
    • One common approach to adding jars/files to the classpath everywhere is the "tez.aux.uris" config parameter. Set it either directly in tez-site.xml or via -Dtez.aux.uris= on the command line for hadoop jar commands or for applications that implement the Tool interface.
      • The values specified here are made available in the environments of both the AM and all the tasks.
    • Another global approach is to configure "tez.cluster.additional.classpath.prefix". This is a config-driven approach that can be used when all the required resources are already available on every node in the cluster.
    • For finer control over the resources, e.g. a separate set of jars for the AM or for the tasks of a particular DAG, add them programmatically by invoking the following APIs:
      • For the AM: TezClient::addAppMasterLocalFiles
      • For the Tasks: Vertex::addTaskLocalFiles
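
The client-side classpath and the -Dtez.aux.uris= argument described above can be assembled as in the following minimal sketch. It is only an illustration: the class TezClasspathSketch and its methods are hypothetical helpers, not part of Tez; the fallback paths /etc/tez/conf and /opt/tez and the example jar URI are placeholders; and the sketch assumes that tez.aux.uris takes a comma-separated list of URIs.

```java
import java.util.List;

// Hypothetical helper, not part of Tez: it only assembles strings.
final class TezClasspathSketch {

    // Client-side classpath from the text above: the config directory,
    // the Tez jars and the Tez lib jars, joined with ":" (Unix-style separator).
    static String clientClasspath(String tezConfDir, String tezHome) {
        return String.join(":", tezConfDir, tezHome + "/*", tezHome + "/lib/*");
    }

    // Command-line argument that sets tez.aux.uris.
    // Assumption: the value is a comma-separated list of URIs.
    static String auxUrisArg(List<String> uris) {
        return "-Dtez.aux.uris=" + String.join(",", uris);
    }

    public static void main(String[] args) {
        // Placeholder defaults, used only when the variables are not set.
        String tezConfDir = System.getenv().getOrDefault("TEZ_CONF_DIR", "/etc/tez/conf");
        String tezHome = System.getenv().getOrDefault("TEZ_HOME", "/opt/tez");

        System.out.println("HADOOP_CLASSPATH=" + clientClasspath(tezConfDir, tezHome));
        System.out.println(auxUrisArg(List.of("hdfs:///apps/myapp/lib/extra.jar")));
    }
}
```

The first printed line can be exported in the shell before running hadoop jar, and the second can be passed as an argument to an application that implements the Tool interface. The per-AM and per-task APIs listed above are not shown here because they require the Tez client libraries on the classpath.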





 

 

 
