Note: Please check the security documentation for the features that Kafka supports today. This page is no longer maintained, but it is kept for historical reasons.
The following is a proposal for securing Apache Kafka. It is based on the following goals:
- support authentication of client (i.e. consumer & producer) connections to brokers
- support authorization of the assorted operations that can take place over those connections
- support encrypting those connections
- support security principals representing interactive users, user groups, and long-running services
- security should be optional; installations that don't want the above features shouldn't have to pay for them
- preserve backward compatibility; in particular, extant third-party clients should still work
Current implementation efforts are tracked in KAFKA-1682.
Features In Scope
- Authentication via SSL & Kerberos through SASL
- Authorization through Unix-like users, permissions and ACLs
- Encryption over the wire (optional)
- It should be easy to enforce the use of security at a given site
We think all this can probably be done in a backwards compatible manner and without significant performance degradation for non-secure users.
We plan to only augment the new producer & consumer implementations, and not the 0.8 implementations. This will hopefully drive adoption of the new implementations, as well as mitigate risk.
Details on these items are given below.
Features Out Of Scope
- Encryption/security of data at rest (can be addressed for now by encrypting individual fields in the message & filesystem security features)
- Encryption/security of configuration files (can be addressed by filesystem security features)
- Per-column encryption/security
- Zookeeper operations & any add-on metrics
- Provisioning of security credentials
We need to support several methods of authentication:
- SSL for access from applications (must)
- Kerberos for access on behalf of people (must)
- Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks running in the Hadoop environment to access Kafka (nice-to-have)
- LDAP username/password (nice-to-have)
- All connections that have not yet been authenticated will be assigned a fake user ("nobody" or "josephk" or something). <- Note, admins should be able to disable fake users - auditors hate those.
A port dedicated to SSL connections obviates the need for any Kafka-specific protocol signalling that authentication is beginning or negotiating an authentication mechanism (since this is all implicit in the fact that the client is connecting on that port). Clients simply begin the session by sending the standard SSL CLIENT-HELLO message. This has two advantages:
- the SSL handshake provides message integrity
- clients can use standard SSL libraries to establish the connection
A dedicated SASL port will, however, require a new Kafka request/response pair, as the way a client and server negotiate the particular SASL mechanism is application-specific. This opens up the possibility of downgrade attacks (wherein an attacker could intercept the first message to the server requesting one authentication mechanism, and modify the message to request another, weaker mechanism). We can protect against that by designing the protocol so that the client requests a single authentication mechanism (that way, an attempted downgrade attack will result in handshake failure downstream).
Through this protocol, we could even support unauthenticated connections on the SASL port, as well.
A quick sketch:
- Client connects on the SASL port
- Server accepts, registers for reads on the new connection
- Client sends a (newly-defined) Authentication Request message containing an int indicating the desired mechanism, along with an optional initial SASL response packet
- Server can reject the request if it's not configured to use the requested mechanism, but if it is, it responds with the SASL challenge data
- Client replies with SASL response data
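The sketch above could be modelled roughly as follows. This is only an illustration: the class name, the wire layout (4-byte mechanism id, 4-byte length, response bytes) and the `isAccepted` helper are assumptions made here, not part of the proposal.

```java
import java.nio.ByteBuffer;
import java.util.Set;

/** Hypothetical form of the proposed Authentication Request; the layout is assumed. */
final class AuthenticationRequest {
    final int mechanism;          // the single mechanism the client asks for
    final byte[] initialResponse; // optional initial SASL response; empty when absent

    AuthenticationRequest(int mechanism, byte[] initialResponse) {
        this.mechanism = mechanism;
        this.initialResponse = initialResponse == null ? new byte[0] : initialResponse.clone();
    }

    /** Assumed layout: mechanism id, response length, response bytes. */
    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + initialResponse.length);
        buf.putInt(mechanism);
        buf.putInt(initialResponse.length);
        buf.put(initialResponse);
        buf.flip();
        return buf;
    }

    static AuthenticationRequest decode(ByteBuffer buf) {
        int mechanism = buf.getInt();
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("bad initial response length: " + length);
        }
        byte[] response = new byte[length];
        buf.get(response);
        return new AuthenticationRequest(mechanism, response);
    }

    /** Server side: reject unless the one requested mechanism is enabled in configuration. */
    boolean isAccepted(Set<Integer> enabledMechanisms) {
        return enabledMechanisms.contains(mechanism);
    }
}
```

Because the request names exactly one mechanism, the server either proceeds with that mechanism or rejects the request; there is no list for an attacker to trim.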
N.B. This document originally stated "We will use SASL for Kerberos and LDAP.", but AFAICT there is no SASL mechanism covering LDAP (and the Java SASL library doesn't support it, at any rate).
Administrators should be able to disable any authentication protocol in configuration. Presumably this would need to be maintained in the cluster metadata so clients can choose to connect to the appropriate port.
This feature requires some co-operation between the socket server and the API layer. The API layer will handle the authenticate request, but the username will be associated with the connection. One approach to implementing this would be to add the concept of a Session object that is maintained with the connection and contains the username. The session would be stored in the context for the socket in socket server and destroyed as part of socket close. The session would be passed down to the API layer with each request and we would have something like session.authenticatedAs() to get the username to use for authorization purposes. We will also record in the session information about the security level of the connection (does it use encryption? integrity checks?) for use in authorization.
All future checks for authorization will just check this session information.
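A minimal sketch of such a Session object, assuming a "nobody" placeholder for connections that have not yet authenticated. Only `authenticatedAs()` and `peerIpAddress()` are named in this document; the other members are hypothetical.

```java
import java.net.InetAddress;

/** Per-connection state kept by the socket server and passed to the API layer with each request. */
final class Session {
    static final String UNAUTHENTICATED_USER = "nobody"; // assumed placeholder user

    private final InetAddress peerIpAddress;
    private final boolean encrypted;        // does the connection use encryption?
    private final boolean integrityChecked; // does the connection use integrity checks?
    private volatile String user = UNAUTHENTICATED_USER;

    Session(InetAddress peerIpAddress, boolean encrypted, boolean integrityChecked) {
        this.peerIpAddress = peerIpAddress;
        this.encrypted = encrypted;
        this.integrityChecked = integrityChecked;
    }

    /** Called by the API layer once it has handled a successful authenticate request. */
    void markAuthenticated(String userName) {
        this.user = userName;
    }

    /** The user name to use for authorization checks. */
    String authenticatedAs() {
        return user;
    }

    InetAddress peerIpAddress() {
        return peerIpAddress;
    }

    boolean isEncrypted() {
        return encrypted;
    }

    boolean hasIntegrityChecks() {
        return integrityChecked;
    }
}
```

The socket server would create one instance per accepted connection and discard it on socket close.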
N.B. This is still under discussion; I've tried to pull together the current consensus here.
Regardless of the mechanism by which you connect and authenticate, the mechanism by which we check your permissions should be the same. The side effect of a successful connection via SSL with a client certificate or a successful authentication request by some other means will be that we store the user information along with that connection. The user will be passed along with the request object to KafkaApis on each subsequent request.
The plan will be to support unix-like permissions on a per-topic level.
Authorization will be done in the "business logic" layer in Kafka (aka KafkaApis). The API can be something like
PermissionManager.isPermitted(Subject subject, InetAddress ip, Permissions permission, String resource)
For example doing a produce request you would likely check something like the following:
PermissionManager.isPermitted(session.subject(), session.peerIpAddress(), Permissions.WRITE, topicName)
This check will obviously have to be quite quick as it will be done on every request so the necessary metadata will need to be cached.
The subject is basically the "user name" or identity of the person trying to take some action. This will be established via whatever authentication mechanism. The action is basically a list of things you may be permitted to do (e.g. read, write, etc).
The IP address of the originating connection is passed as it may be useful in certain authorization situations (e.g. whitelisting, or being more generous when the request originates on the loopback address).
The PermissionManager will both check whether access was permitted and also log the attempt for audit purposes.
The resource will generally be based on the topic name but there could be other resources we want to secure so we can just treat it as an arbitrary string.
I could imagine the following permissions:
READ - Permission to fetch data from the topic
WRITE - Permission to publish data to the topic
DELETE - Permission to delete the topic
CREATE - Permission to create the topic
CONFIGURE - Permission to change the configuration for the topic
DESCRIBE - Permission to fetch metadata on the topic
REPLICATE - Permission to participate as a replica (i.e. issue a fetch request with a non-negative node id). This is different from READ in that it has implications for when a write request is committed.
Permissions are not hierarchical since topics are not hierarchical. So a user will have a default value for these (a kind of umask) as well as a potential override on a per-topic basis. Note that CREATE and DESCRIBE permissions primarily make sense at the default level.
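As a rough illustration of the check, the defaults and the per-topic override, here is an in-memory sketch. Several things are assumptions made here: the subject is simplified to a String user name, a per-topic override replaces the user's default set rather than merging with it, and the audit trail is just a list of strings.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum Permissions { READ, WRITE, DELETE, CREATE, CONFIGURE, DESCRIBE, REPLICATE }

interface PermissionManager {
    boolean isPermitted(String subject, InetAddress ip, Permissions permission, String resource);
}

/** Hypothetical in-memory implementation: per-user defaults plus per-topic overrides. */
final class SimplePermissionManager implements PermissionManager {
    private final Map<String, Set<Permissions>> defaults = new HashMap<>();
    private final Map<String, Map<String, Set<Permissions>>> overrides = new HashMap<>();
    private final List<String> auditLog = new ArrayList<>();

    synchronized void setDefault(String subject, Set<Permissions> permitted) {
        defaults.put(subject, copy(permitted));
    }

    synchronized void setOverride(String subject, String topic, Set<Permissions> permitted) {
        overrides.computeIfAbsent(subject, s -> new HashMap<>()).put(topic, copy(permitted));
    }

    @Override
    public synchronized boolean isPermitted(String subject, InetAddress ip,
                                            Permissions permission, String resource) {
        Map<String, Set<Permissions>> perTopic = overrides.get(subject);
        // Assumption: an override, when present, replaces the default set entirely.
        Set<Permissions> effective = (perTopic != null && perTopic.containsKey(resource))
                ? perTopic.get(resource)
                : defaults.getOrDefault(subject, EnumSet.noneOf(Permissions.class));
        boolean allowed = effective.contains(permission);
        // Every attempt is recorded, whether or not it was permitted.
        auditLog.add(subject + "@" + ip.getHostAddress() + " " + permission + " " + resource
                + " -> " + (allowed ? "ALLOWED" : "DENIED"));
        return allowed;
    }

    synchronized List<String> auditLog() {
        return new ArrayList<>(auditLog);
    }

    private static Set<Permissions> copy(Set<Permissions> source) {
        Set<Permissions> result = EnumSet.noneOf(Permissions.class);
        result.addAll(source); // works for empty sets too
        return result;
    }
}
```

Both lookups are plain hash-map reads, which is in the spirit of keeping the per-request check cheap.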
Implementing the PermissionManager
The above just gives a high-level API for checking if a particular user is allowed to do a particular thing. How permissions are stored, and how users are grouped together is going to need to be pluggable.
There are several scenarios we have considered:
- Some users may want to pick up and run Kafka without much in the way of external dependencies. These users will want a simple way to maintain permissions that works well out of the box.
- Hortonworks and Cloudera each have separate nascent attempts at securing the larger Hadoop ecosystem across multiple services. As these mature the best way to integrate into the larger ecosystem for their users will be to use either Sentry (Cloudera) or Argus (Hortonworks) depending on the Hadoop distribution the particular organization has.
- Large organizations often have very particular ways of managing security, auditing access, or implementing groups. There are various theories on the best way to manage the assignment of permissions to users (i.e. via roles, groups, acls, etc.).
Unfortunately, there is no single implementation that can satisfy all these cases. Instead we can make the PermissionManager interface pluggable at run time so that users can specify their implementation in config.
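One way run-time pluggability could look is to name the implementation class in configuration and instantiate it reflectively. In this sketch the config key `permission.manager.class`, the two sample implementations and the allow-all fallback when the key is unset are all assumptions, and the interface is trimmed to a String permission to keep the example short.

```java
import java.net.InetAddress;
import java.util.Properties;

interface PermissionManager {
    boolean isPermitted(String subject, InetAddress ip, String permission, String resource);
}

/** Assumed fallback for installations that do not configure security. */
final class AllowAllPermissionManager implements PermissionManager {
    @Override
    public boolean isPermitted(String subject, InetAddress ip, String permission, String resource) {
        return true;
    }
}

/** Sample alternative implementation, used here only to show the switch. */
final class DenyAllPermissionManager implements PermissionManager {
    @Override
    public boolean isPermitted(String subject, InetAddress ip, String permission, String resource) {
        return false;
    }
}

final class PermissionManagers {
    static final String CONFIG_KEY = "permission.manager.class"; // hypothetical key name

    /** Instantiates the configured implementation via its no-argument constructor. */
    static PermissionManager fromConfig(Properties config) {
        String className = config.getProperty(CONFIG_KEY, AllowAllPermissionManager.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(PermissionManager.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load permission manager " + className, e);
        }
    }
}
```

A Sentry- or Argus-backed implementation would then be selected purely by changing that one config value.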
We will try to provide something simple out of the box.
Administrators may disable authentication in configuration (giving an "audit-only" mode).
Deriving a Principal Name from Authentication Credentials
If we are to make the authorization library independent of the authentication mechanism, then we need to map each mechanism's credentials to the principal abstraction to be used in the authorization API. LinkedIn security proposes the following:
The principal is just a user name (i.e. a
When the client authenticates using SSL, the user name will be the first element in the Subject Alternative Name field of the client certificate.
When the client authenticates using Kerberos, the user name will be the fully-qualified Kerberos principal name. Admins can modify this through configuration using the standard Kerberos auth_to_local mechanism (cf. here).
When the client does not authenticate, the user name will be "nobody".
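These three rules can be sketched as a single mapping function. The enum, class and method names are hypothetical, the Subject Alternative Name entries are assumed to have been extracted from the certificate already, and rejecting a certificate with no such entry is a choice made here, not something the proposal specifies. The auth_to_local rewriting is not modelled.

```java
import java.util.List;

enum AuthMechanism { SSL, KERBEROS, NONE }

final class PrincipalNames {
    static final String UNAUTHENTICATED_USER = "nobody";

    /**
     * Maps mechanism-specific credentials to the user name used for authorization.
     * Arguments that do not apply to the given mechanism are ignored.
     */
    static String derive(AuthMechanism mechanism, List<String> subjectAltNames, String kerberosPrincipal) {
        switch (mechanism) {
            case SSL:
                if (subjectAltNames == null || subjectAltNames.isEmpty()) {
                    // Assumption: a certificate without the field cannot be mapped to a user.
                    throw new IllegalArgumentException("client certificate has no Subject Alternative Name");
                }
                return subjectAltNames.get(0); // first element of the field
            case KERBEROS:
                if (kerberosPrincipal == null) {
                    throw new IllegalArgumentException("missing Kerberos principal");
                }
                return kerberosPrincipal; // fully-qualified principal name, unchanged
            default:
                return UNAUTHENTICATED_USER;
        }
    }
}
```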
All authentication operations will be logged to file by the Kafka code (i.e. this will not be pluggable). The implementation should use a dedicated logger so as to 1) segregate security logging & 2) support keeping the audit log in a separate (presumably secured) location.
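A dedicated logger might be set up along these lines. The sketch uses `java.util.logging` only to stay self-contained (the broker's actual logging framework may differ), and the logger name and record format are assumptions.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

final class AuthenticationAudit {
    static final String LOGGER_NAME = "kafka.security.authentication"; // hypothetical name
    private static final Logger LOG = Logger.getLogger(LOGGER_NAME);

    /** Routes audit records to their own handler, e.g. a FileHandler on a secured path. */
    static void attachHandler(Handler handler) {
        LOG.setUseParentHandlers(false); // keep audit records out of the general log
        LOG.setLevel(Level.INFO);
        LOG.addHandler(handler);
    }

    /** Records one authentication attempt, successful or not. */
    static void record(String user, String mechanism, String peer, boolean success) {
        LOG.log(Level.INFO, "authentication user={0} mechanism={1} peer={2} result={3}",
                new Object[] {user, mechanism, peer, success ? "SUCCESS" : "FAILURE"});
    }
}
```

Because the logger has its own name and does not propagate to parent handlers, its output can be directed to a separate location from the rest of the broker logs.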
For performance reasons, we propose making encryption optional. When using Kerberos (via SASL & GSS-API), there are explicit parameters through which clients can signal their interest in encryption (similarly for SSL).
Here is a proposed sequence of work
Phase 1: Prep
- Add session as communication mechanism between socket server and kafka api layer.
- Add SSL port to metadata request
Phase 2: Authentication
- Allow disabling sendfile for reads that need encryption or other integrity checks added
- Implement SSL
- Implement SASL
Phase 3: Authorization
- Implement PermissionManager interface and implement the "out of the box" implementation.
Do we need to separately model hosts? i.e. in addition to user do we need to pass into the authorization layer information about what host the access is coming from.
Likely we need a way to specify the minimum encryption/integrity level of a client that is allowed to read data. Likely we should define something generic like NONE < INTEGRITY < ENCRYPTED and allow the user to set a minimum level for each topic so you can guarantee a particular data stream never goes in the clear.
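The ordering and per-topic minimum could be expressed as below. This is a sketch only: the type names are hypothetical, and treating topics without a configured minimum as NONE is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

enum SecurityLevel {
    NONE, INTEGRITY, ENCRYPTED; // declaration order gives NONE < INTEGRITY < ENCRYPTED

    boolean satisfies(SecurityLevel minimum) {
        return compareTo(minimum) >= 0;
    }
}

final class TopicSecurityPolicy {
    private final Map<String, SecurityLevel> minimumByTopic = new HashMap<>();

    void setMinimum(String topic, SecurityLevel level) {
        minimumByTopic.put(topic, level);
    }

    /** True if a connection at the given level may read the topic. */
    boolean mayRead(String topic, SecurityLevel connectionLevel) {
        // Assumption: topics with no configured minimum accept any connection.
        return connectionLevel.satisfies(minimumByTopic.getOrDefault(topic, SecurityLevel.NONE));
    }
}
```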
On disk and per-field encryption
This is very important and something that can be facilitated within the wire protocol. It requires an additional map data structure for the "encrypted [data encryption key]". With this map (either in your object or in the wire protocol) you can carry the dynamically generated symmetric key (one per message) that was used to encrypt the data. You encrypt that data encryption key with the public key of each party that is expected to be able to decrypt it and, with it, the message. Each public-key-encrypted copy of the symmetric key (now the "encrypted [data encryption key]") is stored along with the public key it was encrypted with, giving a map of [publicKey] = encryptedDataEncryptionKey, as a chain. Other patterns can be implemented, but this is a pretty standard digital enveloping pattern with only 1 field added. Other patterns should be able to use that field to do their implementation too.
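A minimal sketch of this enveloping pattern using the standard Java crypto APIs. The algorithm choices (AES-GCM for the data, RSA-OAEP for the key), the class name, and keying the map by a recipient id string instead of the public key itself are all assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class Envelope {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    final byte[] iv;
    final byte[] ciphertext;
    final Map<String, byte[]> encryptedKeys; // recipient id -> encrypted data encryption key

    private Envelope(byte[] iv, byte[] ciphertext, Map<String, byte[]> encryptedKeys) {
        this.iv = iv;
        this.ciphertext = ciphertext;
        this.encryptedKeys = encryptedKeys;
    }

    /** Encrypts the message under a fresh symmetric key, then encrypts that key per recipient. */
    static Envelope seal(byte[] plaintext, Map<String, PublicKey> recipients) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey dek = keyGen.generateKey(); // one data encryption key per message
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(plaintext);
            Map<String, byte[]> encryptedKeys = new HashMap<>();
            for (Map.Entry<String, PublicKey> recipient : recipients.entrySet()) {
                Cipher rsa = Cipher.getInstance(RSA);
                rsa.init(Cipher.ENCRYPT_MODE, recipient.getValue());
                encryptedKeys.put(recipient.getKey(), rsa.doFinal(dek.getEncoded()));
            }
            return new Envelope(iv, ciphertext, encryptedKeys);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("envelope encryption failed", e);
        }
    }

    /** Recovers the data encryption key with the recipient's private key, then the message. */
    byte[] open(String recipientId, PrivateKey privateKey) {
        byte[] wrapped = encryptedKeys.get(recipientId);
        if (wrapped == null) {
            throw new IllegalArgumentException("no encrypted key for " + recipientId);
        }
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, privateKey);
            SecretKey dek = new SecretKeySpec(rsa.doFinal(wrapped), "AES");
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, iv));
            return aes.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("envelope decryption failed", e);
        }
    }

    /** Helper for trying the sketch out; real keys would come from provisioned credentials. */
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The message body is encrypted once regardless of the number of readers; only the small key entry is repeated per recipient, which is why a single added map field is enough.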
Non-repudiation and long term non-repudiation
By "security per topic" my understanding is that using ACLs, we can allow un-authenticated users to read/write certain topics.
I think an important requirement will be to have a "certified entirely secure cluster" - a cluster where the auditors can be 100% certain that only authenticated users can read/write.
To make it easy, separate ports for authenticated / unauthenticated access make sense (both kerberos and SSL, separate port for each), so by closing the un-authenticated port, admins can enforce the "secure only" rule.
Another option is just a "secure only" flag at the global level. But I think the separate port design will make sense anyway.
We could treat topics like file/directory permission like structures. The only difference I would expect is that if you don't have access then you don't even see the topic.
Agree. Very similar to how DB privileges affect access to metadata too.
A single "authenticated access only" flag or configuration is important for organizations that want to mark entire clusters as "out of scope" for an audit - this typically reduces the audit costs, so there's a lot of interest in doing that. Setting this on basically means that specific topics can't allow access to unauthenticated users.
Hadoop did a lot of work around authentication, and the Hadoop authentication libraries make a nice usable layer on top of Kerberos. Especially for cases where "delegation" is used to forward privileges between services (Kafka to ZK for example). Do you think it makes sense to add these libraries as dependency? I know that dependency on Hadoop-common is not a small thing, but I suspect it will pay off.
I think if we implement SASL properly then you should be able to plug in Hadoop Kerberos, assuming Hadoop Kerberos supports GSSAPI.
If you'll check out https://github.com/apache/hadoop-common/tree/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security
You'll see how they implement some fairly generic SASL input/outputs.
I'm wondering if it makes sense for us to reuse (or even read for inspiration...), unless you already have a design in mind?
I have read the Cassandra and Zookeeper implementations. I think we need to get to some MVP regardless of the design I have in mind. I will check out the Hadoop code link you posted too, thnx.
I think a face-to-face, or a phone/webex conf between the interested parties to hash out the details and come up with a plan will be a good idea.
There are a lot of details involved and we can benefit from high-bandwidth, low latency communication channel.
How about next Tuesday 09/09/2014 @ 2pm Eastern? If not I have flexibility all next week.
I can send out a google hangout for everyone or something (join folks in on skype doesn't matter to me).
I'm wondering if Kerberos is really a must.
For example, ActiveMQ supports user/password authentication (rather than ticket based) using JAAS, and allows integration with LDAP for centralized principal management.
This will probably be easier to implement and support. Also, since other pub/sub systems use this, it may make life easier on our users too.
Any reason we decided not to go in that direction?
Thanks for putting this document together. I am from the Apache Argus team and this design addresses our core requirement.
I had a few queries:
1) I think we should be passing in a byte called "identity" as part of the interface so whatever is in there is what would be received. If group isn't in there you should be able to look that information up based on the identity that was passed in. Is there a reason you explicitly need group as a field?
2) I am guessing you mean the client connected IP? I think that is a good idea but we should define the use cases for doing that... folks shouldn't get the ability to block IP and then think they can start to put their Kafka brokers out on the edge of their network accessible over the public internet... that is not what the security features are geared towards.
+1 for getting the IP.
Logging IPs as part of the audit mechanism is a good idea.
White/black listing IPs within an organization is also important.
We also have to decide whether we want to implement any IP checks at connection time - kerberos supports addressable tickets and x509v3 has IP address extension which tie them to a specific host.
Arvind Mani, you are correct. With kerberos, there are implicit ways to enforce it. But sometimes, Kerberos/x509 might not be needed within a data center.
From a Kafka perspective I think we don't have to decide and the plugins can do what they need to as long as they have the information they need.
So, it sounds like we should add the client connect IP to the identity byte. Anything else? Should it be a meta object, with clientIp being one of its fields?
One thing that just came to mind that hasn't come up yet ( I don't think ) is that we may want to consider adding an explicit audit log. This comes up A LOT in PCI DSS.
Regardless of our implementation we should have a "stream of data" that is distinguishable for audits... and a way in our code to utilize it easily passing appropriate information for the session context without the ability to mess it up... even in another file like we have controller and state change.
On the Argus side, the authorization plugin does the Audit also. But having another API for Audit is also a good idea. It will make all three As (Authentication, Authorization and Auditing) as pluggable and address most security use cases.
I wouldn't want to leave this up to the plugin to implement on its own without some oversight/monitoring ... I would even go so far as to say that if the plugin doesn't call the audit log function, an exception (SecurityPluginFailure) is thrown and everything is denied... or something.
I also think we should support a stacked set of plug-ins (kind of following the Accumulo iterators model), but that may be a phase 2, though it would be nice and not so hard to implement. It also provides for attribute-based access controls and fine-grained security, which NIST has weighed in on HEAVILY for big data and which we should/could satisfy (pluggable of course, if stacked).
I believe http://nvlpubs.nist.gov/nistpubs/specialpublications/NIST.sp.800-162.pdf is a good working model from a design perspective we can accommodate without a lot of headache.... it is also why I think we should get authentication out the door in 0.9 and leave authorization for 1.0 because we want to get things right. MVP I still think could be just SOX/PCI DSS (etc). Just putting my 0.0243252 cents on the vision of what we could/should be doing without a lot of angst to accomplish it.
Agreed. We should be careful about what the plugin can do.
I feel, we should go by the original design and keep the implementation simple. This way we can get the authentication and authorization out at the same time. This will satisfy most of the current requirements.
And we can learn from the current implementation and always implement a more extensive authorization in future releases.
Would that work?
Strong +1 for auditing stream.
We need to specify which actions will be audited, and perhaps allow admins to configure granularity.