Overview

Tez records its history through a configurable History Logging Service. The Tez framework ships with several implementations:

  • SimpleHistoryLoggingService 
    • Writes the history to a log file at a configurable location. The file can either be written so that it becomes part of the YARN application's aggregated logs (i.e. within the Application Master's container logs) or be written to a defined location on a distributed file system such as HDFS.
    • This is more of a prototype (useful for quick testing and analysis) and is not fully supported.
  • ATSHistoryLoggingService and ATSV15HistoryLoggingService
    • These implementations use YARN Timeline to store the history data.
    • The ATSV15 implementation (that is, v1.5) builds on the enhancements made in YARN-4233. Those enhancements mainly move most of the data being written and stored onto a distributed file system instead of LevelDB-based storage.
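
As a rough illustration of how one of these implementations might be selected, the sketch below renders the relevant properties as a `tez-site.xml` fragment. The property names `tez.history.logging.service.class` and `tez.simple.history.logging.dir` and the fully qualified class names are given from memory of the Tez source tree and should be checked against the Tez release in use. The helper functions and the HDFS path are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumed class names; verify them against your Tez release before use.
HISTORY_LOGGERS = {
    "simple": "org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService",
    "ats": "org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService",
    "atsv15": "org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService",
}


def history_logging_properties(kind, simple_log_dir=None):
    """Return the properties that select a history logging service.

    `kind` is one of the keys of HISTORY_LOGGERS (hypothetical short names).
    `simple_log_dir` only applies to the simple logger; when it is omitted,
    the history is expected to land in the AM's container logs.
    """
    props = {"tez.history.logging.service.class": HISTORY_LOGGERS[kind]}
    if kind == "simple" and simple_log_dir:
        props["tez.simple.history.logging.dir"] = simple_log_dir
    return props


def render_properties(props):
    """Render a dict of properties as Hadoop-style <property> elements."""
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example path only; any DFS location writable by the AM would do.
    print(render_properties(history_logging_properties("simple", "hdfs:///tmp/tez-history")))
```

The rendered elements would go inside the `<configuration>` element of `tez-site.xml`, or the same keys could be set on the configuration used to start the Tez session.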

The Tez UI currently works only with the YARN Timeline based stores. It cannot display any data for Tez DAGs that were configured to use the SimpleHistoryLoggingService. YARN Timeline v2 is not currently supported (it is under development and may become available in a future hadoop-3.x release).

Tez History Data in YARN Timeline

YARN Timeline supports a notion of Entities. Each entity is uniquely identified by a "Type" and an "Id", and entities can be related to other entities. Tez uses the following entity types:

  • TEZ_APPLICATION
    • Application-level data
    • Contains configuration used to initialize the Tez session/AM.
  • TEZ_APPLICATION_ATTEMPT
    • Application attempt specific data
  • TEZ_DAG_ID
    • DAG-specific data
    • Contains the dagPlan info as well as additional configuration info
    • Also contains the final status of the DAG, counters, diagnostics, etc.
    • Has event information on when it started, finished, etc.
  • TEZ_VERTEX_ID
    • Vertex-specific data
    • Also contains the final status of the Vertex, counters, diagnostics, etc.
    • Has event information on when it finished initializing, started, finished, etc.
  • TEZ_TASK_ID
    • Task-specific data
    • Also contains the final status of the Task, counters, diagnostics, etc.
    • Has event information on when it started, finished, etc.
  • TEZ_TASK_ATTEMPT_ID
    • TaskAttempt-specific data such as which container/node the attempt ran on.
    • Also contains the final status of the Attempt, counters, diagnostics, etc.
    • Has event information on when it started, finished, etc.
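
Because an entity is identified by its type and id, a single entity can be addressed directly through the Timeline v1 REST API. The sketch below builds such a URL, assuming the standard `/ws/v1/timeline/{entityType}/{entityId}` path of the YARN Timeline Server. The host name, port, and DAG id are placeholders, and the helper itself is hypothetical.

```python
from urllib.parse import quote

# Entity types listed in this section.
TEZ_ENTITY_TYPES = (
    "TEZ_APPLICATION",
    "TEZ_APPLICATION_ATTEMPT",
    "TEZ_DAG_ID",
    "TEZ_VERTEX_ID",
    "TEZ_TASK_ID",
    "TEZ_TASK_ATTEMPT_ID",
)


def entity_url(timeline_base, entity_type, entity_id):
    """Build the Timeline v1 REST URL for one (type, id) pair.

    `timeline_base` is the Timeline Server web address, for example
    "http://timeline.example.com:8188" (placeholder host).
    """
    if entity_type not in TEZ_ENTITY_TYPES:
        raise ValueError(f"not a Tez entity type: {entity_type}")
    base = timeline_base.rstrip("/")
    return f"{base}/ws/v1/timeline/{entity_type}/{quote(entity_id, safe='')}"


if __name__ == "__main__":
    print(entity_url("http://timeline.example.com:8188",
                     "TEZ_DAG_ID", "dag_1234567890123_0001_1"))
```

Fetching that URL with any HTTP client should return the entity as JSON, subject to the access controls in force on the Timeline Server.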

From a relationship perspective, a TaskAttempt is a child of a Task, a Task is a child of a Vertex, and a Vertex is a child of a DAG. These relationships make it possible to pull down all the necessary information for a given DAG.
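
The parent/child chain is also visible in the usual string form of Tez ids, where each child id extends its parent's id by one more field. The sketch below derives the Task, Vertex, and DAG ids from a TaskAttempt id and builds a Timeline query for the children of a DAG. It assumes the common id layout (`attempt_<clusterTs>_<appSeq>_<dag>_<vertex>_<task>_<attempt>`) and assumes that child entities are indexed with a `TEZ_DAG_ID` primary filter. Both are assumptions to confirm against your deployment, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def ancestors(attempt_id):
    """Derive parent entity ids from a task attempt id string.

    Assumed layout: attempt_<clusterTs>_<appSeq>_<dag>_<vertex>_<task>_<attempt>
    """
    parts = attempt_id.split("_")
    if len(parts) != 7 or parts[0] != "attempt":
        raise ValueError(f"unexpected task attempt id: {attempt_id}")
    ts, app, dag, vertex, task, _attempt = parts[1:]
    return {
        "TEZ_TASK_ID": f"task_{ts}_{app}_{dag}_{vertex}_{task}",
        "TEZ_VERTEX_ID": f"vertex_{ts}_{app}_{dag}_{vertex}",
        "TEZ_DAG_ID": f"dag_{ts}_{app}_{dag}",
    }


def children_query(child_type, dag_id, limit=100):
    """Build a Timeline v1 path that lists entities of `child_type` for a DAG.

    Uses the Timeline `primaryFilter` and `limit` query parameters; that the
    children carry a TEZ_DAG_ID primary filter is an assumption.
    """
    query = urlencode({"primaryFilter": f"TEZ_DAG_ID:{dag_id}", "limit": limit})
    return f"/ws/v1/timeline/{child_type}?{query}"


if __name__ == "__main__":
    parents = ancestors("attempt_1234567890123_0001_1_00_000000_0")
    for entity_type, entity_id in parents.items():
        print(entity_type, entity_id)
    print(children_query("TEZ_VERTEX_ID", parents["TEZ_DAG_ID"]))
```

Repeating the child query for `TEZ_VERTEX_ID`, `TEZ_TASK_ID`, and `TEZ_TASK_ATTEMPT_ID` is one way to collect everything recorded for a single DAG.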

Tez History Data in YARN Timeline v1.5 

 

Securing Access to Tez History Data

Access to data stored in Timeline is secured via Timeline ACLs. For more details, please refer to the Access Controls Guide. The SimpleHistoryLoggingService provides no such protection beyond the access controls enforced on the configured DFS location where the logs are written.
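
As a hedged illustration of what such ACLs might look like on the Tez side, the sketch below formats a view ACL value in the Hadoop style of comma-separated users, a space, then comma-separated groups. The property names `tez.am.view-acls` and `tez.am.dag.view-acls` are recalled from the Tez configuration and should be confirmed in the Access Controls Guide. The user and group names are made up, and the helper is hypothetical.

```python
def acl_string(users=(), groups=()):
    """Format an ACL value as "user1,user2 group1,group2".

    A groups-only ACL keeps the leading space; a users-only ACL has no
    trailing space.
    """
    return (",".join(users) + " " + ",".join(groups)).rstrip()


def view_acl_properties(am_users=(), am_groups=(), dag_users=(), dag_groups=()):
    """Return assumed Tez view-ACL properties for the AM and for a DAG."""
    return {
        # Assumed property names; check the Access Controls Guide.
        "tez.am.view-acls": acl_string(am_users, am_groups),
        "tez.am.dag.view-acls": acl_string(dag_users, dag_groups),
    }


if __name__ == "__main__":
    props = view_acl_properties(am_users=["alice", "bob"], am_groups=["analysts"],
                                dag_users=["carol"])
    for name, value in props.items():
        print(f"{name}={value!r}")
```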

Tez UI

 

 

Hosting the Tez UI

Tez UI as an Ambari View

 

 

 

 

 
