Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/vdp8scrrzdq7ofvl0mm84dhphq8kmzgc

JIRA: KAFKA-967

Motivation

In production, producers commonly send keyed messages to leverage semantic partitioning for ordering guarantees, stream joins, and cross-cluster replication. However, kafka-producer-perf-test always produces records with null keys, making benchmark results systematically optimistic and unable to reflect the performance characteristics of real keyed workloads.

This proposal adds key distribution support to kafka-producer-perf-test, allowing engineers to benchmark keyed workloads with configurable key ranges and distribution strategies.

Public Interfaces

This proposal adds two new command-line arguments to kafka-producer-perf-test:

--key-distribution <none|range|random> (optional, default: none)

Controls how message keys are assigned:

  • none — null key (current behavior, default)
  • range — keys cycle through integers 0, 1, ..., KEY-RANGE-1 in round-robin order
  • random — each record gets a randomly selected integer from [0, KEY-RANGE)

--message-key-range <KEY-RANGE> (required when --key-distribution is range or random; must be omitted otherwise)

Defines the size of the key space. Must be a positive integer.

Proposed Changes

New Enum: KeyDistribution

public enum KeyDistribution {
  NONE, RANGE, RANDOM
}

Generate key

Keys are serialized as their decimal string representation encoded in UTF-8, consistent with the ByteArraySerializer already configured for the producer. This keeps keys human-readable in tools like kafka-console-consumer.

Distribution   Key value
NONE           null
RANGE          Integer.toString(recordIndex % keyRange)
RANDOM         Integer.toString(random.nextInt(keyRange))
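
The mapping above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code from the actual patch: the class name KeyGenerationSketch, the method generateKey, and its parameter list are hypothetical, and the enum is redeclared locally so the example is self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.SplittableRandom;

// Hypothetical sketch: class and method names are illustrative only.
final class KeyGenerationSketch {

    enum KeyDistribution { NONE, RANGE, RANDOM }

    /** Returns the serialized key for one record, or null for NONE. */
    static byte[] generateKey(KeyDistribution distribution, long recordIndex,
                              int keyRange, SplittableRandom random) {
        switch (distribution) {
            case RANGE:
                // Round-robin: 0, 1, ..., keyRange - 1, then wrap around.
                return encode((int) (recordIndex % keyRange));
            case RANDOM:
                // Uniform draw from [0, keyRange).
                return encode(random.nextInt(keyRange));
            default:
                // NONE: keep the existing null-key behavior.
                return null;
        }
    }

    // Decimal string representation encoded as UTF-8 bytes.
    private static byte[] encode(int key) {
        return Integer.toString(key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SplittableRandom random = new SplittableRandom(42L);
        for (long i = 0; i < 5; i++) {
            byte[] key = generateKey(KeyDistribution.RANGE, i, 3, random);
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

With a key range of 3, the RANGE loop in main prints 0, 1, 2, 0, 1, one value per line.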

Performance note: The random distribution reuses the single SplittableRandom instance that is already constructed for payload generation. SplittableRandom is a lightweight, non-thread-safe PRNG, and its nextInt(bound) call performs no allocation, so key generation adds negligible latency to the hot path.

Validation

ConfigPostProcessor enforces mutual consistency between the two new arguments:

Condition                                                       Error
--key-distribution range or random without --message-key-range  --message-key-range is required when --key-distribution is 'range' or 'random'.
--message-key-range specified with --key-distribution none      --key-distribution must be 'range' or 'random' when --message-key-range is specified.
--message-key-range ≤ 0                                         --message-key-range should be greater than zero.
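
The three rules can be sketched as a standalone check. This is an illustrative sketch only: the class KeyArgsValidationSketch and the method validate are hypothetical names, IllegalArgumentException stands in for whatever exception type the tool's argument parser actually raises, and the order in which the rules are checked is an assumption.

```java
import java.util.Locale;

// Hypothetical sketch: names and exception type are illustrative only.
final class KeyArgsValidationSketch {

    enum KeyDistribution { NONE, RANGE, RANDOM }

    /**
     * Validates the two arguments together and returns the parsed distribution.
     * messageKeyRange is null when --message-key-range was not supplied.
     */
    static KeyDistribution validate(String keyDistribution, Integer messageKeyRange) {
        // valueOf throws IllegalArgumentException for unknown values.
        KeyDistribution distribution =
            KeyDistribution.valueOf(keyDistribution.toUpperCase(Locale.ROOT));

        if (distribution == KeyDistribution.NONE) {
            if (messageKeyRange != null) {
                throw new IllegalArgumentException(
                    "--key-distribution must be 'range' or 'random' when --message-key-range is specified.");
            }
            return distribution;
        }
        if (messageKeyRange == null) {
            throw new IllegalArgumentException(
                "--message-key-range is required when --key-distribution is 'range' or 'random'.");
        }
        if (messageKeyRange <= 0) {
            throw new IllegalArgumentException(
                "--message-key-range should be greater than zero.");
        }
        return distribution;
    }

    public static void main(String[] args) {
        System.out.println(validate("range", 100));
    }
}
```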

Example Usage

  • Null keys — existing behavior (default)
    bin/kafka-producer-perf-test.sh \
      --topic my-topic --num-records 1000000 --record-size 1024 \
      --throughput -1 --bootstrap-server localhost:9092
  • Round-robin across 100 distinct keys
    bin/kafka-producer-perf-test.sh \
      --topic my-topic --num-records 1000000 --record-size 1024 \
      --throughput -1 --bootstrap-server localhost:9092 \
      --key-distribution range --message-key-range 100
  • Random keys from a space of 10,000
    bin/kafka-producer-perf-test.sh \
      --topic my-topic --num-records 1000000 --record-size 1024 \
      --throughput -1 --bootstrap-server localhost:9092 \
      --key-distribution random --message-key-range 10000

Compatibility, Deprecation, and Migration Plan

The default value of --key-distribution is none, which preserves the current behavior of sending null-key records. Existing scripts and benchmarks continue to work without modification.

Test Plan

All existing tests should continue to pass. New unit tests will be added to cover the new arguments.

Rejected Alternatives

UUID keys for random distribution

An alternative design would use UUID.randomUUID().toString() as the key for the random distribution, providing globally unique keys with no repeated values across the entire benchmark run.

This was rejected for two reasons:

  1. Unbounded key space defeats the purpose. The primary use case for random keys is to benchmark workloads with a known, bounded key space (e.g., 10,000 customer IDs). UUID keys give every record a unique key, so records spread across partitions almost uniformly, much like round-robin, which eliminates the ability to model hot-key or skewed-partition scenarios.
  2. Performance overhead. UUID.randomUUID() uses SecureRandom internally, which is significantly slower than SplittableRandom.nextInt() and could become a bottleneck in high-throughput benchmarks, the opposite of what a perf tool should do.

Engineers who genuinely need globally unique keys can use --key-distribution random --message-key-range <large-number> (e.g., 2^31−1) to approximate the same effect without the overhead.

8 Comments

  1. You didn’t follow the guideline to update the KIP number, so I won’t change it.

  2. Please check the creation date; otherwise, start a separate thread for discussion.

    1. Hi Vaquar,

      Thanks for submitting these recent KIPs! I noticed that the KIP numbers aren't updated sequentially, which is causing a few conflicts on the wiki.

      Just wanted to check if you have any strict restrictions on using those specific numbers (e.g., for an internal dashboard)? If not, we normally just pick the number from "Next KIP Number" rather than the creation date.

      Thanks.


      1. I created this on March 14th, when 1298 and 1299 were the latest numbers. Unfortunately, I've been out sick and couldn't provide updates until now, and I see there are two new KIPs as of today. Each new KIP author should check whether a KIP with the same number already exists; the 1300+ series is already in use.


        I have both KIP 1298 and 1299 in draft, so I will change my number to avoid conflict and confusion.

  3. ViquarKhan You put the KIP in the wrong directory. Please move it to Kafka Improvement Proposals.

  4. Thanks, Yang. Acknowledged on the wrong directory. I will change my KIP number later, as it is still under discussion.

  5. No action required; I have updated my KIP.