
WARNING: Work In Progress

The descriptions of the interactions below take this form.

[Protocol] message( input ) : output

The [Protocol] portion identifies the protocol, the authentication mechanism, and the identities exchanged. The abbreviations are defined in the tables that follow; a worked example appears after them.

Abbreviation | Description
[KRB] | Kerberos Protocol
[RSK:{ticket}] | RPC protocol with SASL mutual authentication using Kerberos tickets.
[RSD:{delegation-token}] | RPC protocol with SASL mutual authentication using delegation tokens.
[DTP] | Data transfer protocol between the DataNode and a client. HTTP protocol with block tokens plus SHA1 hash exchange.
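For concreteness, the [KRB] and [RSK] rows above map onto Hadoop client/daemon code roughly as follows. This is a minimal sketch, assuming the UserGroupInformation API; the principal name, keytab path, and the particular NameNode RPC are illustrative only and not taken from this page.

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosRpcSketch {
        public static void main(String[] args) throws Exception {
            final Configuration conf = new Configuration();
            // Enable Kerberos authentication ("simple" is the insecure default).
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // [KRB]: logging in from a keytab obtains the Kerberos credentials (TGT)
            // that the RPC layer uses to request service tickets such as dn-nn-kt.
            UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "dn/dn1.example.com@EXAMPLE.COM",    // illustrative principal
                    "/etc/security/keytabs/dn.keytab");  // illustrative keytab path

            // [RSK:dn-nn-kt]: RPC calls issued under this identity are authenticated
            // to the NameNode with SASL mutual authentication using that ticket.
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    FileSystem fs = FileSystem.get(conf);
                    fs.getFileStatus(new Path("/"));  // any NameNode RPC, e.g. a metadata lookup
                    return null;
                }
            });
        }
    }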

Suffixes are used in many cases to denote type.

Abbreviation | Description
tgt | Kerberos Ticket Granting Ticket
kp | Kerberos Principal: nn-kp = the Kerberos principal for the NameNode nn
kt | Kerberos Ticket: u-jt-kt = a Kerberos ticket for User u to access the JobTracker jt

Kerberos principals use the principal abbreviation and the kp suffix.

Abbreviation | Description
nn-kp | NameNode's Kerberos Principal
dn-kp | DataNode's Kerberos Principal (a unique principal for each DataNode on every node)
jt-kp | JobTracker's Kerberos Principal
tt-kp | TaskTracker's Kerberos Principal (a unique principal for each TaskTracker on every node)

Kerberos tickets use the consumer principal abbreviation, the provider principal abbreviation, and the kt suffix.

Abbreviation | Description
u-nn-kt | Kerberos service ticket for User u to access NameNode nn
u-jt-kt | Kerberos service ticket for User u to access JobTracker jt
dn-nn-kt | Kerberos service ticket for DataNode dn to access NameNode nn
jt-nn-kt | Kerberos service ticket for JobTracker jt to access NameNode nn
tt-jt-kt | Kerberos service ticket for TaskTracker tt to access JobTracker jt

Secure MapReduce

Participants: Client (u / u-kp), Kerberos KDC, NameNode (hdfs / nn-kp), DataNode (hdfs / dn-kp), JobTracker (mapred / jt-kp), TaskTracker (mapred / tt-kp), TaskLauncher (root / jt-kp?), Task (u / u-*-dt).

Install

Q1. NameNode runs as hdfs using the nn Kerberos principal (nn-kp)?

Startup

1. [KRB] ticketRequest( tgt?, nn-kp ): dn-nn-kt
   DN acquires a ticket to access the NN using credentials in its keytab.
2. [RSK:dn-nn-kt] heartbeat(): void
   DN tells the NN it is alive.
3. [KRB] ticketRequest( tgt?, nn-kp ): jt-nn-kt
   JT acquires a ticket to access the NN using credentials in its keytab.
4. [KRB] ticketRequest( tgt?, jt-kp ): tt-jt-kt
   TT acquires a ticket to access the JT using credentials in its keytab.
5. [RSK:tt-jt-kt] heartbeat(): no-work-yet
   TT tells the JT it is alive and finds there are no queued jobs yet.

Q2. What else needs to be shown here relative to NN, DN, JT, TT, etc. getting tickets?

Authentication

6. [KRB] kinit(): tgt
   Acquire a Kerberos Ticket Granting Ticket for the user. Stored in the user's ticket cache.

Job Definition

7. [KRB] ticketRequest( tgt, jt-kp ): u-jt-kt
   Acquire a Kerberos ticket for the user to access the JobTracker. Stored in the user's ticket cache.
8. [RSK:u-jt-kt] getNewJobId(): job-id
   Create a new Job ID.
9. [KRB] ticketRequest( tgt, nn-kp ): u-nn-kt
   Acquire a Kerberos ticket for the user to access the NameNode. Stored in the user's ticket cache.
10. [RSK:u-nn-kt] getDelegationToken(): u-nn-dt
    Acquire a delegation token to allow Tasks to access HDFS files on behalf of the user.
11. [RSK:u-jt-kt] getDelegationToken(): u-jt-dt
    Acquire a delegation token to allow Tasks to submit additional jobs on behalf of the user.

Calculate splits. Store job files.

Loop [store each of: job-cfg-file, job-jar-file, splits, credentials(u-nn-dt, u-jt-dt)]:
12. [RSK:u-nn-kt] createFile( file-loc ): block-id, block-loc, block-token
    Create each file in HDFS.
    Loop [blocks]:
    13. [DTP] writeBlock( block-id, block-token, block-data ): void
        Store the file's data blocks.

Job Submission

14. [RSK:u-jt-kt] submitJob( job-id, job-cfg-dir, job-cfg-props ): status
    Submit the job by providing the staging directory location and configuration overrides.

Q3. Does the JT copy the input job-cfg-dir anywhere?
Q4. If so, how does it guarantee read access to the user's job-dir?
Q5. What is the JobTracker's system directory?

15. createJobToken(): job-token

Q6. Where is the job token stored?
Q7. Where is the job queue, HDFS?

16. enqueueJob()

Job Execution

Q8. Where did tt-jt-kt come from?

17. [RSK:tt-jt-kt] heartbeat(): work
    Tell the JT that the TT is alive and check for new tasks.

Q9. Where did jt-nn-kt come from?

Periodically:
18. [RSK:jt-nn-kt] renewDelegToken( u-nn-dt ): void
19. renewDelegToken( u-nn-dt ): void
    The JT periodically renews all active delegation tokens.

Map Task

Q10. What is passed to the TL on the command line, env-var?
Q11. How is it told which blocks to map?
Q12. How does the TL impersonate the JT (i.e. jt-kp, jt-nn-kt)?

20. [as root] exec() TaskLauncher(root/jt-kp?)
    Running as root with jt-kp for HDFS access.
    Loop [extract each of: job-cfg-file, job-jar-file, splits, credentials]:
    21. [RSK:jt-nn-kt?] readFile( file-loc ): block-id, block-loc, block-token
        Extract job files to the local job-dir.
        Loop [blocks]:
        22. [DTP] readBlock( block-id, block-token ): block-data
23. [as user] exec( job-jar, job-dir ) Task(u/u-*-dt)
    Untrusted customer map code.

Q13. Does each T only work with a single block?
Q14. How is the T told which block to use and which file it is in?
Q15. How is the NN asked for just a block token for a specific block?

24. [RSD:u-nn-dt] readFile( file-loc ): block-id, block-loc, block-token
25. [DTP] readBlock( block-id, block-token ): block-data
26. map( block-data ): shuffle-data
    The map result (i.e. shuffle-data) is written to local disk.

Note: it might be nice to show the submission of another task that would require the use of the u-jt-dt.

Q16. What does the TT do when the TL/T exits?

Reduce Task

Q17. What is passed to the TL on the command line, env-var?
Q18. How is it told which shuffles to reduce?

27. [as root] exec() TaskLauncher(root/jt-kp?)

Q19. Is the fetchShuffle done by the TT or the T?

Loop [each MapTask's TaskTracker]:
Q20. What are the inputs and outputs of fetchShuffle? Where is the fetched shuffle data stored, local disk? How does the MD5 work?
28. [HTTP] fetchShuffle( shuffle-url, sha1{shuffle-url/job-token} ): shuffle-data, sha1{sha1/job-token}

29. [as user] exec( job-jar, job-dir ) Task(u/u-*-dt)
    Untrusted customer reduce code.
    Loop [result-files]:
    30. [RSD:u-nn-dt] writeFile( file-loc, u-nn-dt ): block-id, block-loc, block-token
        Store job results into HDFS.
        Loop [blocks]:
        31. [DTP] writeBlock( block-id, block-token, block-data ): void

Status

Q21. What else needs to happen?
Q22. Are the delegation tokens invalidated?

32. [RSK:tt-jt-kt] status( job-token, status ): void
33. invalidateJobToken( job-token )
34. invalidateDelegationToken( u-jt-dt )
35. [RSK:jt-nn-kt] invalidateDelegationToken( u-nn-dt )
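The Job Definition and Job Submission steps (6 to 14) can be approximated in client code. The following is a minimal sketch, assuming the classic org.apache.hadoop.mapred client API; the renewer principal, hostnames, and token aliases are illustrative and are not prescribed by this page. The user is assumed to already hold a TGT from kinit (step 6), and the usual job configuration (mapper, input/output paths, job jar) is omitted for brevity.

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.security.Credentials;
    import org.apache.hadoop.security.token.Token;

    public class SecureJobSubmissionSketch {
        public static void main(String[] args) throws Exception {
            JobConf job = new JobConf();
            Credentials creds = job.getCredentials();

            // Steps 9-10: [RSK:u-nn-kt] getDelegationToken(): u-nn-dt
            // Tasks later present this token to the NameNode ([RSD:u-nn-dt], steps 24 and 30).
            // The renewer is the JobTracker principal so it can renew the token (steps 18-19).
            FileSystem fs = FileSystem.get(job);
            Token<?> hdfsToken = fs.getDelegationToken("mapred/jt.example.com@EXAMPLE.COM");
            creds.addToken(new Text("u-nn-dt"), hdfsToken);

            // Steps 7-8 and 11: talk to the JobTracker over [RSK:u-jt-kt] and collect u-jt-dt.
            JobClient jc = new JobClient(job);
            Token<?> mrToken = jc.getDelegationToken(new Text("mapred/jt.example.com@EXAMPLE.COM"));
            creds.addToken(new Text("u-jt-dt"), mrToken);

            // Steps 12-14: submitJob() stages job-cfg-file, job-jar-file and the splits in HDFS
            // together with the credentials, then submits the job to the JobTracker.
            RunningJob running = jc.submitJob(job);
            System.out.println("Submitted " + running.getID());
        }
    }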