Work in Progress
Overview
The sequence diagrams below (after the lengthy Legend) give a fairly detailed description of the interactions that occur while defining, submitting and executing a MapReduce job on a secure Hadoop 2.x cluster. Each diagram covers a different phase of the overall process. They are intended to be read as one continuous flow, with the exception of the last diagram, which illustrates parallel steps that occur during the flow.
- Bootstrap
- Job Definition
- Job Submission
- Job Initiation
- Map Task Execution
- Reduce Task Execution
- Job Completion
- Client Monitoring
Legend
The descriptions of the interactions in the sequence diagrams below take this form.
message [Protocol] ( input ) : output
The [Protocol] portion describes the protocol, authentication mechanism and identities exchanged.
Abbreviation | Description |
---|---|
| Kerberos Protocol |
| RPC protocol with SASL mutual authentication using Kerberos tickets. |
| RPC protocol with SASL client authentication using access tokens (e.g. YARN Node Manager Token). |
| RPC protocol with SASL client authentication using delegation tokens (e.g. HDFS Name Node Delegation Token). |
| Shuffle data transfer protocol between ShuffleService and ReduceTask. HTTP protocol with TODO. |
| Block data transfer protocol between the DataNode and a client. HTTP protocol with block tokens plus SHA1 hash exchange. |
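To make the notation concrete, the following is a minimal sketch of a parser for lines of the form message [Protocol] ( input ) : output. It is only an illustration of how the notation breaks down into parts. The names Interaction and parse_interaction are hypothetical, and the protocol label "proto" in the usage line is a placeholder, not one of the abbreviations from the table above.

```python
import re
from typing import List, NamedTuple, Optional


class Interaction(NamedTuple):
    """One interaction line from a sequence diagram (hypothetical type)."""
    message: str
    protocol: Optional[str]
    inputs: List[str]
    output: Optional[str]


# Matches: message [Protocol] ( input ) : output
# The [Protocol] part and the ": output" part are treated as optional.
_PATTERN = re.compile(
    r"^\s*(?P<message>[^\[\(:]+?)\s*"
    r"(?:\[\s*(?P<protocol>[^\]]*?)\s*\]\s*)?"
    r"\(\s*(?P<inputs>[^)]*?)\s*\)\s*"
    r"(?::\s*(?P<output>.+?)\s*)?$"
)


def parse_interaction(text: str) -> Interaction:
    """Split a notation line into message, protocol, inputs and output."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in 'message [Protocol] ( input ) : output' form: {text!r}")
    # Inputs are assumed to be comma-separated; an empty list means no input.
    inputs = [part.strip() for part in match.group("inputs").split(",") if part.strip()]
    return Interaction(
        message=match.group("message"),
        protocol=match.group("protocol"),
        inputs=inputs,
        output=match.group("output"),
    )


# Usage with a placeholder protocol label:
example = parse_interaction("getDelegationToken [proto] ( u-nn-kt ) : c-nn-dt")
print(example)
```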
Suffixes are used in many cases to denote type.
Abbreviation | Description |
---|---|
tgt | Kerberos Ticket Granting Ticket |
kt | Kerberos Service Ticket: u-jt-kt = A Kerberos service ticket for User u to access the JobTracker jt |
kp | Kerberos Principal: nn-kp = The Kerberos principal for the NameNode nn |
dt | Delegation Token: c-nn-dt = A delegation token for the identity of the Client that can be presented to the NameNode. |
tkn | Access Token: am-tkn = An access token that can be presented to the ApplicationMaster for access. |
tkn-sk | Access Token Secret Key |
id | Identifier: job-id = Job Identifier |
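The suffix convention can be checked mechanically. The sketch below splits an abbreviation into its leading parts and its type suffix, using the suffixes from the table above (with kt for Kerberos service tickets, as in the u-jt-kt example). The function name split_suffix is hypothetical; the only subtlety it handles is trying longer suffixes first so that am-tkn-sk is read as a secret key rather than failing to match.

```python
from typing import List, Tuple

# Type suffixes taken from the table above.
SUFFIXES = {
    "tgt": "Kerberos Ticket Granting Ticket",
    "kt": "Kerberos Service Ticket",
    "kp": "Kerberos Principal",
    "dt": "Delegation Token",
    "tkn": "Access Token",
    "tkn-sk": "Access Token Secret Key",
    "id": "Identifier",
}


def split_suffix(name: str) -> Tuple[List[str], str]:
    """Return (leading parts, suffix) for an abbreviation such as 'u-jt-kt'."""
    # Longest suffix first, so 'tkn-sk' is matched before any shorter suffix.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if name.endswith("-" + suffix):
            prefix = name[: -(len(suffix) + 1)]
            return prefix.split("-"), suffix
    raise ValueError(f"no known type suffix in {name!r}")


for abbreviation in ("u-jt-kt", "nn-kp", "c-nn-dt", "am-tkn-sk", "job-id"):
    parts, suffix = split_suffix(abbreviation)
    print(abbreviation, "->", parts, SUFFIXES[suffix])
```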
Kerberos principals use the principal abbreviation and the kp suffix.
Abbreviation | Description |
---|---|
nn-kp | NameNode's Kerberos Principal |
dn-kp | DataNode's Kerberos Principal (Unique principal for each DataNode on every node) |
jt-kp | JobTracker's Kerberos Principal |
tt-kp | TaskTracker's Kerberos Principal (Unique principal for each TaskTracker on every node) |
Kerberos tickets use the consumer principal abbreviation, provider principal abbreviation and kt suffix.
Abbreviation | Description |
---|---|
u-nn-kt | Kerberos service ticket for User u to access NameNode nn |
u-jt-kt | Kerberos service ticket for User u to access JobTracker jt |
dn-nn-kt | Kerberos service ticket for DataNode dn to access NameNode nn |
jt-nn-kt | Kerberos service ticket for JobTracker jt to access NameNode nn |
tt-jt-kt | Kerberos service ticket for TaskTracker tt to access JobTracker jt |
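The two naming rules above (principal abbreviation plus kp, and consumer plus provider plus kt) are simple enough to express as helpers. The following sketch composes names from the component abbreviations used in the descriptions (u, nn, dn, jt, tt). The function names are hypothetical and exist only to illustrate the convention.

```python
# Component abbreviations as they appear in the descriptions above.
COMPONENTS = {
    "u": "User",
    "nn": "NameNode",
    "dn": "DataNode",
    "jt": "JobTracker",
    "tt": "TaskTracker",
}


def _check(component: str) -> str:
    """Reject abbreviations that are not in the COMPONENTS table."""
    if component not in COMPONENTS:
        raise KeyError(f"unknown component abbreviation: {component!r}")
    return component


def principal(component: str) -> str:
    """Kerberos principal name: principal abbreviation plus the kp suffix."""
    return f"{_check(component)}-kp"


def service_ticket(consumer: str, provider: str) -> str:
    """Kerberos service ticket name: consumer, provider, then the kt suffix."""
    return f"{_check(consumer)}-{_check(provider)}-kt"


def describe_ticket(consumer: str, provider: str) -> str:
    """Spell out a ticket in the same wording as the table rows."""
    return (
        f"Kerberos service ticket for {COMPONENTS[_check(consumer)]} {consumer} "
        f"to access {COMPONENTS[_check(provider)]} {provider}"
    )


print(principal("nn"))
print(service_ticket("tt", "jt"), "=", describe_ticket("tt", "jt"))
```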
Bootstrap
This diagram illustrates the interactions that occur when a Hadoop system is starting up and stabilizing. It involves various master components generating secret keys and slave components registering with the masters to receive these secret keys.
- createBlockAccessTokenSecretKey -
- kinit/AS_REQ -
- TGS_REQ -
- register/heartbeat -
- createNodeManagerTokenSecretKey -
- createAppContainerTokenSecretKey -
- kinit/AS_REQ -
- TGS_REQ -
- register/heartbeat -
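The pattern described for this phase (a master generates a secret key, workers register and receive it, and tokens signed with that key can then be verified locally by the worker) can be sketched as follows. This is a simplified illustration under stated assumptions, not Hadoop's implementation: the class names Master and Worker, the choice of HMAC-SHA256, and the token identifier format are all invented for the example, and the Kerberos-authenticated registration call is reduced to a plain method call.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def sign(key: bytes, identifier: bytes) -> bytes:
    """Compute the token password as an HMAC over the identifier (assumed scheme)."""
    return hmac.new(key, identifier, hashlib.sha256).digest()


class Master:
    """Stands in for a master component that owns a token secret key."""

    def __init__(self) -> None:
        # Analogue of a create...TokenSecretKey step: generate a random key.
        self.secret_key = secrets.token_bytes(32)
        self.registered = set()

    def register(self, worker_id: str) -> bytes:
        # In a secure cluster this call would be mutually authenticated
        # with Kerberos; that part is omitted in this sketch.
        self.registered.add(worker_id)
        return self.secret_key

    def issue_token(self, identifier: bytes) -> Tuple[bytes, bytes]:
        """Issue a token as (identifier, password) for a client to present."""
        return identifier, sign(self.secret_key, identifier)


class Worker:
    """Stands in for a worker component that verifies tokens locally."""

    def __init__(self, worker_id: str) -> None:
        self.worker_id = worker_id
        self.secret_key: Optional[bytes] = None

    def register_with(self, master: Master) -> None:
        # Analogue of register/heartbeat: receive the shared secret key.
        self.secret_key = master.register(self.worker_id)

    def verify(self, identifier: bytes, password: bytes) -> bool:
        if self.secret_key is None:
            return False  # not registered yet, so nothing can be verified
        expected = sign(self.secret_key, identifier)
        return hmac.compare_digest(expected, password)


master = Master()
worker = Worker("worker-1")
worker.register_with(master)
identifier, password = master.issue_token(b"block-1:read")
print(worker.verify(identifier, password))
```

The point of the design is that the worker never has to call back to the master to check a token: possession of the shared key is enough to recompute and compare the password.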
Job Definition
This diagram illustrates the steps taken by a client to define a MapReduce job that will later be submitted.
- TODO
- TODO
- TODO
Job Submission
This diagram illustrates the steps taken during the submission of a MapReduce job.
- TODO
- TODO
- TODO
Job Initiation
This diagram illustrates the steps taken when a MapReduce job is scheduled for execution.
- TODO
- TODO
- TODO
Map Task Execution
This diagram illustrates the steps taken when the Map portion of a MapReduce job is executed.
- TODO
- TODO
- TODO
Reduce Task Execution
This diagram illustrates the steps taken when the Reduce portion of a MapReduce job is executed.
- TODO
- TODO
- TODO
Job Completion
This diagram illustrates the steps taken after a MapReduce job has completed.
- TODO
- TODO
- TODO
Client Monitoring
This diagram illustrates the steps taken by a Client to monitor the status of a Job throughout the Job's life-cycle. The timeframe for this diagram spans several of the diagrams above, starting from Job Submission and running through Job Completion.
- TODO
- TODO
- TODO
NodeManager Token Flow
This diagram illustrates the flow of NodeManager Tokens throughout a MapReduce Job's life-cycle.
- TODO
- TODO
- TODO