This page is intended to explain aspects of how Kerberos works with Impala. It is targeted at developers. It is not intended to provide guidance on how to run Impala in production. The information is accurate to the best of my knowledge as of the Impala 3.3.0 release.
Uses of Kerberos in Impala
Impala uses Kerberos for two categories of authentication:
1. Integration with secured metadata and storage systems such as HDFS, HMS, Sentry, Ranger, Kudu, and HBase. Kerberos provides authentication so that Impala can interact with those services as a privileged user.
2. Securing communication between Impala services, so that when one daemon connects to another, each can confirm the identity of the daemon on the other end of the TCP connection.
Configuration
Universal Kerberos
The usual and documented usage of Kerberos in Impala enables Kerberos for both #1 and #2 above. In this case the --principal and --keytab_file startup flags must be set (https://impala.apache.org/docs/build/html/topics/impala_kerberos.html). On startup, Impala initializes Kerberos as that principal, using the credentials in the provided keytab. Optionally, --be_principal can be used to specify a different principal for internal communication, and --krb5_conf can be set to point to a Kerberos config file in a custom location.
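As an illustrative sketch (the hostname, realm, and paths below are all hypothetical placeholders), a fully kerberized impalad might be started with flags like:

```shell
# Hypothetical example: enable Kerberos for both external (#1) and
# internal (#2) communication. All principal names and paths are placeholders.
impalad \
  --principal=impala/host1.example.com@EXAMPLE.COM \
  --keytab_file=/etc/impala/conf/impala.keytab \
  --be_principal=impala/host1.example.com@EXAMPLE.COM \
  --krb5_conf=/etc/impala/conf/krb5.conf
```

Here --be_principal and --krb5_conf are optional, as described above.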
Kerberos can be configured for the Java clients via the matching configuration files (generally XML).
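For instance, the Hadoop-side Java clients generally pick up Kerberos settings from files like core-site.xml; a minimal illustrative fragment (the value shown is a standard Hadoop setting, not anything Impala-specific) might be:

```xml
<!-- Illustrative core-site.xml fragment: tells Hadoop clients to authenticate
     via Kerberos rather than simple (username-based) auth. -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```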
Kerberos for External Services Only (Undocumented)
In some deployments, securing internal Impala communication is not necessary; for example, it may already be secured by a different mechanism. Impala does not (as of the 3.4.0 release) document a way to enable #1 but not #2.
Implementation Details
One basic fact about the Kerberos support in Impala is that both Java and C++ code use kerberized communication. Specifically, #1 is done entirely via Java clients, which use the JDK Kerberos implementation (via abstractions provided by Hadoop core). #2 is done via C++ clients, which use Kudu's Kerberos support for initializing and refreshing credentials. Both the Thrift and KRPC C++ implementations support Kerberos. Native security libraries (libkrb5, libsasl, etc.) provide the underlying implementation.
Universal Kerberos
The credential cache is shared by the two implementations. In the universal Kerberos mode, we use a filesystem-based credential cache in the location specified by --krb5_ccname. This is supported by both the Java and C++ Kerberos implementations, allowing sharing of credentials. Alternative credential cache implementations like MEMORY are not, as of the time of writing, supported by the Java implementation 1 .
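To make this concrete, a filesystem-based credential cache of the kind both implementations can share is simply a FILE: path, and it can be inspected with standard Kerberos tooling (the path below is hypothetical):

```shell
# Hypothetical: point standard tooling at the same FILE: cache that
# --krb5_ccname selects, then list the tickets it currently holds.
export KRB5CCNAME=FILE:/tmp/krb5cc_impala
klist
```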
The Java Kerberos implementation is pointed at the same credential cache with the KRB5CCNAME environment variable 2 . The Java clients use a principal determined by the configuration of external systems (the various clients, the Kerberos implementation, the default principal in the credential cache, etc.). Kerberos configs are picked up from the path specified by the "java.security.krb5.conf" JVM option (default /etc/krb5.conf). If --krb5_conf was set, both the KRB5_CONFIG environment variable and the "java.security.krb5.conf" JVM option are overridden.
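Roughly speaking, setting --krb5_conf amounts to the following sketch (the path is hypothetical, and JAVA_TOOL_OPTIONS is used here only to illustrate passing the JVM option; Impala sets JVM options through its own embedded-JVM startup, not this variable):

```shell
# Sketch of the effect of --krb5_conf=/opt/impala/krb5.conf (hypothetical path):
# the native libraries read KRB5_CONFIG, while the JDK reads the
# java.security.krb5.conf system property.
export KRB5_CONFIG=/opt/impala/krb5.conf
export JAVA_TOOL_OPTIONS="-Djava.security.krb5.conf=$KRB5_CONFIG"
echo "$JAVA_TOOL_OPTIONS"
```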
We expect the C++ thread to handle renewing the credential cache, using Kudu’s Kinit implementation.
Kerberos for External Services Only (Undocumented)
In this mode, Impala does not directly kinit or renew credentials as in mode #1. However, Hadoop core can start a thread to refresh Kerberos tickets behind the scenes 3 . This depends on a kinit being done *before* the process starts up, since no kinit is performed by Impala in this mode.
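Operationally, this means something like the following must happen before the daemon launches (the principal, keytab path, and daemon invocation are all hypothetical):

```shell
# Hypothetical: obtain a ticket for the service principal first, since
# Impala performs no kinit of its own in this mode; Hadoop core can then
# refresh tickets in the background once the process is up.
kinit -kt /etc/impala/conf/impala.keytab impala/host1.example.com@EXAMPLE.COM
impalad   # started without --principal/--keytab_file
```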
Note that the above-mentioned Impala flags *do not* result in configuring the corresponding JVM flags or environment variables in this mode, which is undesirable.
Footnotes
- https://web.mit.edu/kerberos/krb5-1.12/doc/basic/ccache_def.html ↩
- This is understood by Java - https://bugs.openjdk.java.net/browse/JDK-6832353 ↩
- https://github.com/c9n/hadoop/blob/a315107/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L793 ↩