Status
Current state: Under discussion
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Apache Kafka has many kinds of resources with names or identifiers, such as topics and consumer groups. However, unlike many systems, it is quite lax about specifying or enforcing maximum lengths for these identifiers. There are many ways in which accidentally or maliciously huge identifiers could cause problems, such as inflating the size of the data structures used by the group coordinator for managing group membership, or making command output unreadable. This KIP proposes setting maximum lengths for all resource names and identifiers present in the requests of the Kafka protocol.
The practical limit today for most identifiers is the serialization limit of a string in the Kafka protocol. For example, if you try to create a topic with a massively long name, a runtime exception is thrown in the admin client like this:
java.lang.RuntimeException: 'name' field is too long to be serialized
    at org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopic.addSize(CreateTopicsRequestData.java:548)
    at org.apache.kafka.common.message.CreateTopicsRequestData.addSize(CreateTopicsRequestData.java:207)
    at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
    at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
    at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:110)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:608)
    at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:582)
    at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:542)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:1302)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1516)
    at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1459)
    at java.base/java.lang.Thread.run(Thread.java:840)
Although the broker enforces a limit on topic names, an ugly client-side exception is still possible even for topic names. For other identifiers, such as group IDs, only the serialization limit prevents excessively long values because no broker-side limit is defined. This KIP introduces broker-side limits for all identifiers.
Proposed Changes
This KIP proposes enforcing maximum lengths for all resource names and identifiers, starting in Apache Kafka 5.0. The suggested maximum values are chosen to be sufficiently high that they are very unlikely to impact any existing users, while providing stronger enforcement of limits in the Kafka protocol.
It is proposed that all resource names for which configuration properties and ACLs can be defined take the same maximum length as currently enforced for topic names, which is 249 characters. This is still a generous limit for other identifiers such as client IDs, so the KIP proposes the same limit for almost all resource names and identifiers.
The following table summarises the proposal:
| Resource name or identifier | Current limit | Proposed broker limit | Proposed error code when broker limit breached | Notes |
|---|---|---|---|---|
| Topic name | 249 characters | 249 characters - no change | - | |
| Group ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| Group member ID | Serialization limit of a string | 36 characters | RESOURCE_IDENTIFIER_TOO_LARGE | The group member ID is not specified by the user. In KIPs 848, 932 and 1071, the client library is intended to generate a UUID to be used as the member ID. As a result, the maximum should be the length of a serialized UUID, which is 36 characters. |
| Group instance ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| Client ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| Transactional ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| Offset commit metadata | 4096 bytes | 4096 bytes - no change | - | Controlled by the offset.metadata.max.bytes broker configuration. |
| Rack ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| Resource name for configs and ACLs | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| ACL principal | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
| ACL host | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE | |
A single new error code RESOURCE_IDENTIFIER_TOO_LARGE is defined to use whenever any of the limits is breached, as opposed to a separate error code for each case. The exception message will provide more detail about which identifier was too long.
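As a sketch of the proposed enforcement, the broker-side check reduces to a simple length comparison against the limits in the table above. The class and method names below are illustrative and do not exist in the Kafka codebase:

```java
// Hypothetical sketch of the broker-side length check this KIP proposes.
// ResourceIdentifierValidator and isWithinLimit are illustrative names only.
public class ResourceIdentifierValidator {
    // Proposed limits from the table above.
    static final int MAX_IDENTIFIER_LENGTH = 249;      // most resource names and identifiers
    static final int MAX_GROUP_MEMBER_ID_LENGTH = 36;  // length of a serialized UUID

    /** Returns true if the identifier fits within the given limit. */
    static boolean isWithinLimit(String identifier, int maxLength) {
        return identifier != null && identifier.length() <= maxLength;
    }

    public static void main(String[] args) {
        System.out.println(isWithinLimit("my-group", MAX_IDENTIFIER_LENGTH));      // true
        System.out.println(isWithinLimit("x".repeat(250), MAX_IDENTIFIER_LENGTH)); // false
        System.out.println(isWithinLimit(java.util.UUID.randomUUID().toString(),
                MAX_GROUP_MEMBER_ID_LENGTH));                                      // true
    }
}
```

An identifier failing this check would cause the broker to reject the request with RESOURCE_IDENTIFIER_TOO_LARGE rather than processing it.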
A broker configuration resource.identifier.limit.enable is added which allows administrators of Kafka 4.x clusters to apply the limit in advance of Kafka 5.0, either to enforce the limit earlier or to evaluate whether they encounter problems with their existing workloads when the limits are applied. By setting the limits generously, it is anticipated that there will be no such problems.
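For illustration, opting in on a Kafka 4.x broker would be a one-line addition to server.properties (the property is proposed by this KIP and does not exist until it is implemented):

```properties
# Opt in to resource name and identifier length enforcement ahead of Kafka 5.0
resource.identifier.limit.enable=true
```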
Public Interfaces
Configuration
Broker configuration
| Configuration | Description | Values |
|---|---|---|
| resource.identifier.limit.enable | Whether the cluster enforces maximum sizes for resource names and identifiers. | true or false. In Kafka 4.x: default false. In Kafka 5.0 and later: default true. |
Kafka protocol changes
Error codes
The following new error code is defined:
RESOURCE_IDENTIFIER_TOO_LARGE (to be assigned) - The resource name or identifier is too large.
This new error code can be returned in the response from any Kafka protocol request which breaches the limit for a resource name or identifier. As a result, the RPC request and response versions for all affected RPCs must be bumped since this ensures that the client receiving the new error code will know how to interpret it. The RPCs affected will be any whose request schema includes a string-based identifier from the table above.
The associated exception ResourceIdentifierTooLargeException is a subclass of ApiException. It is not a retriable exception.
For clients which do not support RPC versions that can return the new RESOURCE_IDENTIFIER_TOO_LARGE error code, the broker will return INVALID_REQUEST instead. The idea is that the client-side changes for this KIP are made long in advance of Apache Kafka 5.0, so that most clients will support the new error code by the time production clusters are enforcing the limit.
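This down-conversion can be sketched as a version-gated choice of error code. The method and the version threshold below are illustrative, not actual Kafka code; error code 42 is INVALID_REQUEST in the Kafka protocol, and the new code's value is still to be assigned (999 is a placeholder):

```java
// Hypothetical sketch of returning the new error code only to clients whose
// request version is new enough to understand it; otherwise fall back to
// INVALID_REQUEST. Names and the placeholder value 999 are illustrative.
public class ErrorCodeMapper {
    static final short INVALID_REQUEST = 42;                 // existing Kafka error code
    static final short RESOURCE_IDENTIFIER_TOO_LARGE = 999;  // placeholder: to be assigned

    /** Choose the error code a broker can safely return for a given request version. */
    static short errorCodeFor(short requestVersion, short firstVersionWithNewError) {
        return requestVersion >= firstVersionWithNewError
                ? RESOURCE_IDENTIFIER_TOO_LARGE
                : INVALID_REQUEST;
    }

    public static void main(String[] args) {
        System.out.println(errorCodeFor((short) 10, (short) 9)); // new client: 999
        System.out.println(errorCodeFor((short) 8, (short) 9));  // old client: 42
    }
}
```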
Compatibility, Deprecation, and Migration Plan
The intent is that this KIP has no user impact at all, on the assumption that users' existing identifiers already fall within the new limits.
In Apache Kafka 5.0, the new limits will be applied unless the resource.identifier.limit.enable config is set to "false".
Test Plan
The code will be tested using a combination of unit tests, integration tests and system tests.
Rejected Alternatives
None considered.