Status

Current state: Under discussion

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Apache Kafka has many kinds of resources with names or identifiers, such as topics and consumer groups. However, unlike many systems, it is quite lax about specifying or enforcing maximum lengths for these identifiers. There are many ways in which accidentally or maliciously huge identifiers could cause problems, such as inflating the size of the data structures used by the group coordinator for managing group membership, or making command output unreadable. This KIP proposes setting maximum lengths for all resource names and identifiers present in the requests of the Kafka protocol.

The practical limit today for most identifiers is the serialization limit of a string in the Kafka protocol. For example, if you try to create a topic with a massively long name, a runtime exception is thrown in the admin client like this:

java.lang.RuntimeException: 'name' field is too long to be serialized
        at org.apache.kafka.common.message.CreateTopicsRequestData$CreatableTopic.addSize(CreateTopicsRequestData.java:548)
        at org.apache.kafka.common.message.CreateTopicsRequestData.addSize(CreateTopicsRequestData.java:207)
        at org.apache.kafka.common.protocol.SendBuilder.buildSend(SendBuilder.java:218)
        at org.apache.kafka.common.protocol.SendBuilder.buildRequestSend(SendBuilder.java:187)
        at org.apache.kafka.common.requests.AbstractRequest.toSend(AbstractRequest.java:110)
        at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:608)
        at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:582)
        at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:542)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:1302)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1516)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1459)
        at java.base/java.lang.Thread.run(Thread.java:840)
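The exception comes from a size check made while the request is serialized. The following is a simplified sketch of that kind of check. It assumes the limit is 32767 (0x7fff) bytes of UTF-8, which corresponds to the INT16 length prefix of a protocol STRING. The class and method names are illustrative and are not Kafka's generated code.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch, not Kafka's generated message code. Assumes a protocol
// string may not exceed 32767 (0x7fff) bytes once encoded as UTF-8.
final class StringSerializationLimit {
    static final int MAX_STRING_BYTES = Short.MAX_VALUE; // 32767

    private StringSerializationLimit() {}

    // Returns the UTF-8 encoded size of value, or throws if it cannot be serialized.
    // Note the limit is on bytes, so multi-byte characters reach it sooner.
    static int encodedSize(String fieldName, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_STRING_BYTES) {
            throw new RuntimeException("'" + fieldName + "' field is too long to be serialized");
        }
        return bytes.length;
    }
}
```

For example, `encodedSize("name", "my-topic")` returns 8, whereas a 32768-character ASCII name fails with the same message as the stack trace above, and only 16384 copies of a two-byte character such as "é" are needed to exceed the limit.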

Although the broker enforces a limit on topic names, it is still possible to get an unhelpful client-side exception like the one above, because a topic name that exceeds the serialization limit fails in the client before the request is ever sent. For other identifiers such as group IDs, the serialization limit is the only thing that prevents excessively long identifiers, because no broker-side limit is defined. This KIP introduces broker-side limits for all identifiers.

Proposed Changes

This KIP proposes enforcing maximum lengths for all resource names and identifiers, starting in Apache Kafka 5.0. The suggested maximum values are chosen to be sufficiently high that they are very unlikely to impact any existing users, while providing stronger enforcement of limits in the Kafka protocol.

It is proposed that all resource names for which configuration properties and ACLs can be defined have the same maximum length as is currently enforced for topic names, which is 249 characters. Because this is also a generous limit for other identifiers such as client IDs, the KIP proposes the same limit for almost all resource names and identifiers.

The following table summarises the proposal:

| Resource name or identifier | Current limit | Proposed broker limit | Proposed error code when broker limit breached | Notes |
| --- | --- | --- | --- | --- |
| Topic name | 249 characters | 249 characters (no change) | INVALID_TOPIC_EXCEPTION (17) (no change) | |
| Group ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| Group member ID | Serialization limit of a string | 36 characters | INVALID_REQUEST (42) | The group member ID is not specified by the user. In KIP-848, KIP-932 and KIP-1071, the client library is intended to generate a UUID to be used as the member ID, so the maximum is the length of a serialized UUID such as 8D8AC610-566D-4EF0-9C22-186B2A5ED792. |
| Group instance ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| Client ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| Transactional ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| Offset commit metadata | 4096 bytes | 4096 bytes (no change) | OFFSET_METADATA_TOO_LARGE (12) (no change) | Controlled by the offset.metadata.max.bytes configuration property. |
| Rack ID | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| Resource name for configs and ACLs | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| ACL principal | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |
| ACL host | Serialization limit of a string | 249 characters | RESOURCE_IDENTIFIER_TOO_LARGE (new) | |

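A single new error code, RESOURCE_IDENTIFIER_TOO_LARGE, is defined for use whenever any of the new limits on user-specified identifiers is breached, rather than a separate error code for each case. Topic names and offset commit metadata keep their existing error codes, and an over-long group member ID, which is generated by the client library rather than the user, is rejected with INVALID_REQUEST. The exception message will provide more detail about which identifier was too long.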
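The limits in the table can be pictured as a lookup from identifier kind to maximum length and error code. The following is a hypothetical sketch of such a broker-side check: IdentifierKind and IdentifierLimits are illustrative names, lengths are assumed to be measured with String.length() as in Kafka's existing topic-name validation, and topic names and offset commit metadata are omitted because their checks do not change.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed limits; names are illustrative, not Kafka code.
enum IdentifierKind {
    GROUP_ID("group ID", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    // Client-generated UUID, so an over-long value is treated as a malformed request.
    GROUP_MEMBER_ID("group member ID", 36, "INVALID_REQUEST"),
    GROUP_INSTANCE_ID("group instance ID", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    CLIENT_ID("client ID", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    TRANSACTIONAL_ID("transactional ID", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    RACK_ID("rack ID", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    RESOURCE_NAME("resource name", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    ACL_PRINCIPAL("ACL principal", 249, "RESOURCE_IDENTIFIER_TOO_LARGE"),
    ACL_HOST("ACL host", 249, "RESOURCE_IDENTIFIER_TOO_LARGE");

    final String description;
    final int maxLength;
    final String errorName;

    IdentifierKind(String description, int maxLength, String errorName) {
        this.description = description;
        this.maxLength = maxLength;
        this.errorName = errorName;
    }
}

final class IdentifierLimits {
    private IdentifierLimits() {}

    // Returns an error description naming the offending identifier if the value is
    // too long, or Optional.empty() if it is within the limit. A null value (for
    // example, an absent group instance ID) is treated as within the limit.
    static Optional<String> check(IdentifierKind kind, String value) {
        if (value == null || value.length() <= kind.maxLength) {
            return Optional.empty();
        }
        return Optional.of(kind.errorName + ": " + kind.description + " has length "
                + value.length() + ", which exceeds the maximum of " + kind.maxLength);
    }
}
```

Keeping the description in the message is what lets one shared error code still tell the user which identifier was too long.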

A broker configuration property, resource.identifier.limit.enable, is added. It allows administrators of Kafka 4.x clusters to apply the limits in advance of Kafka 5.0, either to enforce them earlier or to evaluate whether their existing workloads encounter problems when the limits are applied. Because the limits are set generously, no such problems are anticipated.

Public Interfaces

Configuration

Broker configuration

| Configuration | Description | Values |
| --- | --- | --- |
| resource.identifier.limit.enable | Whether the cluster enforces maximum sizes for resource names and identifiers. | In Kafka 4.x: default false. In Kafka 5.0 and later: default true. |
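The version-dependent default can be illustrated with the following sketch of how a broker might resolve the effective setting. The class name and the kafkaMajorVersion parameter are hypothetical devices for illustration; a real broker would define the property through its normal configuration mechanism, with a single default per release.

```java
import java.util.Properties;

// Hypothetical sketch: resolves resource.identifier.limit.enable, whose default is
// false in Kafka 4.x and true in Kafka 5.0 and later. An explicit value always wins.
final class IdentifierLimitConfig {
    static final String LIMIT_ENABLE_CONFIG = "resource.identifier.limit.enable";

    private IdentifierLimitConfig() {}

    static boolean limitsEnabled(Properties brokerProps, int kafkaMajorVersion) {
        String defaultValue = kafkaMajorVersion >= 5 ? "true" : "false";
        String value = brokerProps.getProperty(LIMIT_ENABLE_CONFIG, defaultValue).trim();
        // Reject anything other than true/false (case-insensitive) instead of
        // silently treating it as false, as Boolean.parseBoolean alone would.
        if (!value.equalsIgnoreCase("true") && !value.equalsIgnoreCase("false")) {
            throw new IllegalArgumentException(
                    "Invalid value '" + value + "' for " + LIMIT_ENABLE_CONFIG);
        }
        return Boolean.parseBoolean(value);
    }
}
```

Under this sketch, adding resource.identifier.limit.enable=true to the properties of a 4.x broker turns the limits on early, and setting it to false on a 5.0 broker turns them off.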

Kafka protocol changes

Error codes

The following new error code is defined:

  • RESOURCE_IDENTIFIER_TOO_LARGE (code to be assigned): The resource name or identifier is too large.

This new error code can be returned in the response to any Kafka protocol request that breaches the limit for a resource name or identifier. As a result, the request and response versions of every affected RPC must be bumped, so that the broker returns the new error code only to clients that know how to interpret it. The affected RPCs are those whose request schema includes a string-based identifier from the table above.

The associated exception, ResourceIdentifierTooLargeException, is a subclass of ApiException. It is not a retriable exception.
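The proposed exception hierarchy can be sketched as follows. To keep the block self-contained, KafkaException, ApiException and RetriableException are minimal local stand-ins for the real classes in org.apache.kafka.common and org.apache.kafka.common.errors; only ResourceIdentifierTooLargeException is proposed by this KIP. Extending ApiException directly, rather than RetriableException, is what marks the error as non-retriable.

```java
// Minimal local stand-ins for the real Kafka classes, declared only so that this
// sketch compiles on its own. They are not the real implementations.
class KafkaException extends RuntimeException {
    KafkaException(String message) { super(message); }
}

class ApiException extends KafkaException {
    ApiException(String message) { super(message); }
}

class RetriableException extends ApiException {
    RetriableException(String message) { super(message); }
}

// Proposed exception: a direct subclass of ApiException and not of
// RetriableException, so a client checking "instanceof RetriableException"
// treats it as a permanent failure and does not retry the request.
class ResourceIdentifierTooLargeException extends ApiException {
    ResourceIdentifierTooLargeException(String message) { super(message); }
}
```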

For clients using RPC versions that cannot return the new RESOURCE_IDENTIFIER_TOO_LARGE error code, the broker returns INVALID_REQUEST instead. The intention is that the client-side changes for this KIP are made long in advance of Apache Kafka 5.0, so that most clients will support the new error code by the time production clusters are enforcing the limits.
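The fallback amounts to choosing the error code from the request version. In the following sketch, firstVersionWithNewError is a hypothetical per-RPC parameter standing in for whichever version each RPC's schema bump introduces, and the enum uses error names rather than numeric codes because the new code has not yet been assigned.

```java
// Hypothetical sketch of version-dependent error selection for an over-long identifier.
enum IdentifierError { RESOURCE_IDENTIFIER_TOO_LARGE, INVALID_REQUEST }

final class IdentifierErrorSelector {
    private IdentifierErrorSelector() {}

    // Clients on a request version that predates the new error code cannot
    // interpret it, so they receive INVALID_REQUEST instead.
    static IdentifierError errorFor(short requestVersion, short firstVersionWithNewError) {
        return requestVersion >= firstVersionWithNewError
                ? IdentifierError.RESOURCE_IDENTIFIER_TOO_LARGE
                : IdentifierError.INVALID_REQUEST;
    }
}
```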

Compatibility, Deprecation, and Migration Plan

This KIP is expected to have no user impact at all, on the assumption that users already use identifiers within the new limits.

From Apache Kafka 5.0, the new limits will be applied unless the resource.identifier.limit.enable config is set to false. In Kafka 4.x, they are applied only if the config is set to true.

Test Plan

The code will be tested using a combination of unit tests, integration tests and system tests.

Rejected Alternatives

None considered.
