Status
Current state: Under Discussion
Discussion thread: https://lists.apache.org/thread/vdp8scrrzdq7ofvl0mm84dhphq8kmzgc
JIRA: KAFKA-967
Motivation
In production, producers commonly send keyed messages to leverage semantic partitioning for ordering guarantees, stream joins, and cross-cluster replication. However, kafka-producer-perf-test always produces records with null keys, making benchmark results systematically optimistic and unable to reflect the performance characteristics of real keyed workloads.
This proposal adds key distribution support to kafka-producer-perf-test, allowing engineers to benchmark keyed workloads with configurable key ranges and distribution strategies.
Public Interfaces
This proposal adds two new command-line arguments to kafka-producer-perf-test:
--key-distribution <none|range|random> (optional, default: none)
Controls how message keys are assigned:
- none — null key (current behavior, default)
- range — keys cycle through integers 0, 1, ..., KEY-RANGE-1 in round-robin order
- random — each record gets a randomly selected integer from [0, KEY-RANGE)
--message-key-range <KEY-RANGE> (required when --key-distribution is range or random; not allowed otherwise)
Defines the size of the key space. Must be a positive integer.
Proposed Changes
New Enum: KeyDistribution
public enum KeyDistribution {
NONE, RANGE, RANDOM
}
Generate key
Keys are serialized as their decimal string representation encoded in UTF-8, consistent with the ByteArraySerializer already configured for the producer. This keeps keys human-readable in tools like kafka-console-consumer.
| Distribution | Key value |
|---|---|
| NONE | null |
| RANGE | Integer.toString(recordIndex % keyRange) |
| RANDOM | Integer.toString(random.nextInt(keyRange)) |
Performance note: The random distribution reuses the single SplittableRandom instance that is already constructed for payload generation. SplittableRandom is a lightweight, non-thread-safe PRNG, and nextInt(bound) allocates nothing, so key generation adds negligible latency to the hot path.
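The key-generation rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: the class name KeyGeneratorSketch, the nextKey method, and the nested copy of the enum are hypothetical, and the real tool may wire the shared SplittableRandom differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.SplittableRandom;

// Hypothetical helper for illustration only; names are not taken from the patch.
final class KeyGeneratorSketch {
    // Mirrors the KeyDistribution enum proposed in this KIP.
    enum KeyDistribution { NONE, RANGE, RANDOM }

    private final KeyDistribution distribution;
    private final int keyRange;
    private final SplittableRandom random;

    KeyGeneratorSketch(KeyDistribution distribution, int keyRange, SplittableRandom random) {
        this.distribution = distribution;
        this.keyRange = keyRange;
        this.random = random; // assumed to be the instance shared with payload generation
    }

    // Returns the UTF-8 bytes of the decimal key, or null for NONE.
    byte[] nextKey(long recordIndex) {
        switch (distribution) {
            case RANGE:
                // Cycles 0, 1, ..., keyRange - 1 as recordIndex grows.
                return encode((int) (recordIndex % keyRange));
            case RANDOM:
                // Uniform integer in [0, keyRange).
                return encode(random.nextInt(keyRange));
            default:
                return null;
        }
    }

    private static byte[] encode(int key) {
        return Integer.toString(key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        KeyGeneratorSketch gen =
            new KeyGeneratorSketch(KeyDistribution.RANGE, 3, new SplittableRandom(42L));
        for (long i = 0; i < 5; i++) {
            System.out.println(new String(gen.nextKey(i), StandardCharsets.UTF_8));
        }
    }
}
```

With RANGE and a key range of 3, the loop in main prints the keys 0, 1, 2, 0, 1, which shows the round-robin cycle.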
Validation
ConfigPostProcessor enforces mutual consistency between the two new arguments:
| Condition | Error |
|---|---|
| --key-distribution range or random without --message-key-range | --message-key-range is required when --key-distribution is 'range' or 'random'. |
| --message-key-range specified with --key-distribution none | --key-distribution must be 'range' or 'random' when --message-key-range is specified. |
| --message-key-range ≤ 0 | --message-key-range should be greater than zero. |
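The three checks in the table can be sketched as a single validation function. This is an illustration only: the class KeyArgsValidationSketch and its validate method are hypothetical, the real ConfigPostProcessor works on parsed argument objects rather than raw strings, and the order in which the checks run is an assumption.

```java
// Hypothetical stand-in for the consistency checks; not the actual ConfigPostProcessor.
final class KeyArgsValidationSketch {
    // Returns the error message for an invalid combination, or null when the arguments are consistent.
    // A null messageKeyRange means --message-key-range was not given on the command line.
    static String validate(String keyDistribution, Integer messageKeyRange) {
        boolean keyed = "range".equals(keyDistribution) || "random".equals(keyDistribution);
        if (keyed && messageKeyRange == null) {
            return "--message-key-range is required when --key-distribution is 'range' or 'random'.";
        }
        if (!keyed && messageKeyRange != null) {
            return "--key-distribution must be 'range' or 'random' when --message-key-range is specified.";
        }
        if (messageKeyRange != null && messageKeyRange <= 0) {
            return "--message-key-range should be greater than zero.";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate("none", null));   // consistent: defaults only
        System.out.println(validate("range", null));  // missing key range
        System.out.println(validate("none", 100));    // key range without a keyed distribution
        System.out.println(validate("random", 0));    // non-positive key range
    }
}
```

Returning the message instead of throwing keeps the sketch easy to test; the real tool presumably reports these errors through its argument parser.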
Example Usage
- Null keys — existing behavior (default)
bin/kafka-producer-perf-test.sh \
  --topic my-topic --num-records 1000000 --record-size 1024 \
  --throughput -1 --bootstrap-server localhost:9092
- Round-robin across 100 distinct keys
bin/kafka-producer-perf-test.sh \
  --topic my-topic --num-records 1000000 --record-size 1024 \
  --throughput -1 --bootstrap-server localhost:9092 \
  --key-distribution range --message-key-range 100
- Random keys from a space of 10,000
bin/kafka-producer-perf-test.sh \
  --topic my-topic --num-records 1000000 --record-size 1024 \
  --throughput -1 --bootstrap-server localhost:9092 \
  --key-distribution random --message-key-range 10000
Compatibility, Deprecation, and Migration Plan
The default value of --key-distribution is none, which preserves the current behavior of sending null-key records. Existing scripts and benchmarks continue to work without modification.
Test Plan
All existing tests should continue to pass, and new unit tests will be added for the two new arguments and their validation.
Rejected Alternatives
UUID keys for random distribution
An alternative design would use UUID.randomUUID().toString() as the key for the random distribution, providing globally unique keys with no repeated values across the entire benchmark run.
This was rejected for two reasons:
- Unbounded key space defeats the purpose. The primary use case for random keys is to benchmark workloads with a known, bounded key space (e.g., 10,000 customer IDs). UUID keys give every record a unique key, which spreads records almost uniformly across partitions, much like round-robin, and eliminates the ability to model hot-key or skewed-partition scenarios.
- Performance overhead. UUID.randomUUID() uses SecureRandom internally, which is significantly slower than SplittableRandom.nextInt() and could become a bottleneck in high-throughput benchmarks, the opposite of what a perf tool should do.
Engineers who genuinely need globally unique keys can use --key-distribution random --message-key-range <large-number> (e.g., 2^31−1) to approximate the same effect without the overhead.
8 Comments
ViquarKhan
Please update KIP no 1299 already have KIP -KIP-1317:Mandatory DLQ Disposition Header for Share Groups
Ken Huang
You didn’t follow the guideline to update the KIP number, so I won’t change it.
ViquarKhan
Please check creation date else bring separate thread for discussion
Chia-Ping Tsai
hi Vaquar
Thanks for submitting these recent KIPs! I noticed that the KIP numbers aren't updated sequentially, which is causing a few conflicts on the wiki.
Just wanted to check if you have any strict restrictions on using those specific numbers (e.g., for an internal dashboard)? If not, we normally just pick the number from "Next KIP Number" rather than the creation date.
thanks.
ViquarKhan
I created this on March 14th, when 1298 and 1299 were the latest numbers. Unfortunately, I've been out sick and couldn't provide updates until now, and I see there are two new KIPs as of today. Each new KIP author should check whether a KIP with the same number already exists; we are already running the 1300+ series.
I have both KIP 1298 and 1299 in draft so i will change my no to avoid conflict and confusion.
PoAn Yang
ViquarKhan You put KIP in wrong directory. Please move it to Kafka Improvement Proposals.
ViquarKhan
Thanks Yang , accepted wrong directory and let me change my KIP no as its still move in discussion later .
ViquarKhan
No action required i have updated my KIP