The current delegation token framework mainly supports Kerberos authentication and Hadoop-based tokens. This satisfies the use cases described in FLIP-211; however, there are many non-Hadoop-compliant frameworks where the authentication protocol is not Kerberos. The main motivation is to generalize the current delegation token framework so that it is authentication protocol agnostic. This change would open the door to implementing providers for S3, for example (amongst many others).
The following design goals need to be achieved:
- Currently supported Hadoop token providers must work without migration after the change
- A new delegation token provider must be able to be added/used at runtime without using any Hadoop libraries (in practice this means not using the UserGroupInformation [UGI from now on] credentials class as the serialization format when sending tokens to task managers)
- A new delegation token provider must be able to be added/used at runtime without using Kerberos authentication
- All authentication protocols must be able to be used independently from each other
- Authentication credentials still need to be provided per cluster (this is not changing)
The following changes are planned:
- Add a generic DelegationTokenManager which wraps all Hadoop and non-Hadoop token provider handling. The manager must be initialized for all Flink applications, because without loading all token providers it’s impossible to know whether credentials for the various token providers are configured. Individual token providers are then activated only when their credentials are configured.
- Split the API into:
  - HadoopDelegationTokenProvider: Hadoop-based providers need Kerberos authentication and must obtain tokens inside a UGI.doAs context. The new manager would start the Hadoop-based part only when the Hadoop common library is on the classpath and valid credentials are provided.
  - DelegationTokenProvider: Non-Hadoop-based providers can use any custom authentication logic inside their implementations. Config parameters for such authentication can be provided through the Flink config or environment variables. The key change is that all authentication and token-obtaining logic must be included in the provider, which makes the framework fully pluggable and authentication type agnostic.
- Store obtained delegation tokens in a custom TokenContainer class when serializing and sending them from the job manager to the task managers
- A new singleton container, similar to UGI in Hadoop, needs to be introduced to store tokens
- Task managers need to deserialize UGI credentials, and set them in the UGI instance, only if the Hadoop common library is on the classpath
- External connectors need to look into the TokenContainer; when a token is found under a specific key, it must be used
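The token flow in the last four items can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the proposal does not fix the TokenContainer layout, so the key/byte-array map, the wire format, and the `TokenHolder` class with its `onNewTokens` and `find` methods are hypothetical names. No Hadoop class is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class TokenContainerSketch {

    /** Hypothetical Hadoop-free container: opaque token bytes keyed by a provider-chosen name. */
    static final class TokenContainer {
        private final Map<String, byte[]> tokens = new HashMap<>();

        void addToken(String key, byte[] token) {
            tokens.put(key, token.clone());
        }

        Map<String, byte[]> getTokens() {
            return tokens;
        }

        /** Job manager side: turn the container into bytes (assumed wire format). */
        byte[] serialize() {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(tokens.size());
                for (Map.Entry<String, byte[]> entry : tokens.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeInt(entry.getValue().length);
                    out.write(entry.getValue());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray();
        }

        /** Task manager side: rebuild the container from the received bytes. */
        static TokenContainer deserialize(byte[] data) {
            TokenContainer container = new TokenContainer();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    String key = in.readUTF();
                    byte[] token = new byte[in.readInt()];
                    in.readFully(token);
                    container.tokens.put(key, token);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return container;
        }
    }

    /** Hypothetical process-wide store on the task manager, playing the role UGI plays for Hadoop. */
    static final class TokenHolder {
        private static final Map<String, byte[]> CURRENT = new ConcurrentHashMap<>();

        static void onNewTokens(TokenContainer container) {
            CURRENT.putAll(container.getTokens());
        }

        /** Connectors look up their token by key and use it only when present. */
        static Optional<byte[]> find(String key) {
            return Optional.ofNullable(CURRENT.get(key));
        }
    }
}
```

Under these assumptions, a connector would call `TokenHolder.find("s3")` and fall back to its normal authentication path when the result is empty.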
New public API:
- HadoopDelegationTokenProvider: this is going to be the existing DelegationTokenProvider API. Authentication happens inside the manager.
- DelegationTokenProvider: this is going to be the new authentication protocol agnostic API. Authentication happens inside the provider.
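A minimal sketch of what "authentication happens inside the provider" could look like follows. The method names (`serviceName`, `init`, `delegationTokensRequired`, `obtainDelegationTokens`), the plain `Map` used as configuration, the config key, and the `ExampleProvider` class are all assumptions made for illustration; the proposal does not define the method signatures, and the token produced here is a placeholder rather than a call to any real service.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProviderApiSketch {

    /** Hypothetical shape of the protocol-agnostic API; method names are assumptions. */
    interface DelegationTokenProvider {
        String serviceName();

        void init(Map<String, String> config);

        /** True only when this provider's own credentials are configured. */
        boolean delegationTokensRequired();

        /** The provider authenticates by itself and returns opaque token bytes. */
        byte[] obtainDelegationTokens();
    }

    /** Illustrative provider for a made-up service; not a real connector. */
    static final class ExampleProvider implements DelegationTokenProvider {
        static final String ACCESS_KEY_OPTION = "example.service.access-key"; // hypothetical key

        private String accessKey;

        @Override
        public String serviceName() {
            return "example-service";
        }

        @Override
        public void init(Map<String, String> config) {
            accessKey = config.get(ACCESS_KEY_OPTION);
        }

        @Override
        public boolean delegationTokensRequired() {
            return accessKey != null && !accessKey.isEmpty();
        }

        @Override
        public byte[] obtainDelegationTokens() {
            // Placeholder for a real authentication round trip against the service.
            return ("session-token-for-" + accessKey).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Manager-side loop: no Kerberos and no UGI, only the providers' own logic. */
    static Map<String, byte[]> obtainAll(
            List<? extends DelegationTokenProvider> providers, Map<String, String> config) {
        Map<String, byte[]> tokens = new LinkedHashMap<>();
        for (DelegationTokenProvider provider : providers) {
            provider.init(config);
            if (provider.delegationTokensRequired()) {
                tokens.put(provider.serviceName(), provider.obtainDelegationTokens());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(ExampleProvider.ACCESS_KEY_OPTION, "demo");
        System.out.println(obtainAll(List.of(new ExampleProvider()), config).keySet());
    }
}
```

In this sketch the manager never inspects how a provider authenticates, which is the property that would keep the framework authentication type agnostic.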
New config parameters:
- security.delegation.tokens.renewal.retry.backoff needs to be added, with the same default value as security.kerberos.tokens.renewal.retry.backoff (1 hour).
- security.delegation.tokens.renewal.time-ratio needs to be added, with the same default value as security.kerberos.tokens.renewal.time-ratio (0.75).
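To make the two defaults concrete, here is a small arithmetic sketch. It assumes the new parameters keep the meaning of their Kerberos counterparts: the time ratio is the fraction of a token's remaining lifetime after which tokens are re-obtained, and the backoff is the wait before retrying after a failed attempt. The class and method names are hypothetical.

```java
import java.time.Duration;

final class RenewalScheduleSketch {

    // Defaults stated for the new parameters.
    static final double DEFAULT_TIME_RATIO = 0.75;
    static final Duration DEFAULT_RETRY_BACKOFF = Duration.ofHours(1);

    /** Delay until the next renewal: a fraction of the token's remaining lifetime. */
    static Duration nextRenewalDelay(long nowMillis, long expiryMillis, double timeRatio) {
        if (timeRatio <= 0.0 || timeRatio > 1.0) {
            throw new IllegalArgumentException("timeRatio must be in (0, 1]");
        }
        long remainingMillis = Math.max(0L, expiryMillis - nowMillis);
        return Duration.ofMillis((long) (remainingMillis * timeRatio));
    }

    /** Delay after a failed attempt: simply the configured backoff. */
    static Duration delayAfterFailure(Duration retryBackoff) {
        return retryBackoff;
    }

    public static void main(String[] args) {
        long now = 0L;
        long expiry = Duration.ofHours(24).toMillis();
        // A token valid for 24 hours with ratio 0.75 is renewed after 18 hours.
        System.out.println(nextRenewalDelay(now, expiry, DEFAULT_TIME_RATIO));
        System.out.println(delayAfterFailure(DEFAULT_RETRY_BACKOFF));
    }
}
```

With the defaults, a token that is valid for 24 hours would be renewed after 18 hours, and a failed renewal would be retried one hour later.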
Compatibility, Deprecation, and Migration Plan
No migration is needed. Once the feature is implemented, all existing delegation tokens should work as before.
Test Plan
Automated integration tests and end-to-end tests with dockerized service containers are planned.
Rejected Alternatives
- Have a single DelegationTokenProvider API: the implication would be that every existing token provider must include the authentication logic, which is the same for all Hadoop-based providers. Reasons for rejection:
  - It must be marked somewhere whether a provider is Hadoop-based or not (in practice, whether its tokens go into UGI or not). Storing this information at runtime would make the framework very hard to debug and would add extra code complexity (in practice, UGI must be rebuilt from scratch on each and every TM).
  - On the task manager side, each and every Hadoop connector must check whether the singleton TokenContainer has a token which can be used and must copy the content to the UGI. Since connectors are used from multiple threads, UGI writes would have to be synchronized. This would add considerable code complexity and would slow down data processing.
  - Having a common API would make it harder to drop the Hadoop part when it reaches end of life.
  - The same authentication logic would need to be used in all Hadoop providers (in practice, every Hadoop provider would contain the same authentication code in some form). This change would be error-prone and would add complexity without adding any value.