Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

  1. Modern Hardware Capabilities: Current deployments often use high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB segments are inefficiently small
  2. File Handle Optimization: Large Kafka deployments with many topics can have 50-100k open files across all segment types (.log, .index, .timeindex files). Each segment requires open file handles, and larger segments would reduce the total number of files and improve caching efficiency
  3. Performance Benefits: Fewer segment rotations in high-traffic scenarios would reduce I/O overhead, and larger segments keep more disk access sequential, which is much faster than random access
  4. Storage Efficiency: Having fewer segment files improves filesystem metadata performance and reduces inode usage on high-volume deployments
  5. Community Interest: Similar requests have been raised in community forums (see Confluent forum discussion)

Public Interfaces


Configuration Changes

  • log.segment.bytes: Configuration parameter data type changes from int to long
    • This affects server configuration validation and parsing
    • The maximum value rises from 2,147,483,647 bytes (the int limit, just under 2 GiB) to whatever upper bound validation enforces within the long range

Java Interfaces/Classes

  • RemoteLogSegmentMetadata: Public class in the tiered storage API that currently exposes segmentSizeInBytes as an int
    • The constructor and accessor signatures will change to use long for the segment size
    • This affects any code that constructs or consumes this class, including remote storage plugins

Binary Log Format (Potentially)

  • Index file format: If segments can exceed 2GB, the index file format may need changes
    • Current .index files use 4-byte integers for file positions
    • May require format version bump to support 8-byte positions
    • This would be a binary format change affecting log compatibility
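
To make the position-width concern concrete, the sketch below encodes an index entry with a 4-byte position next to a hypothetical wider entry with an 8-byte position. It assumes the current entry is 8 bytes (a 4-byte relative offset followed by the 4-byte position). The class and method names are illustrative and do not mirror Kafka's index code, and the 12-byte layout is only one possible shape for a new format, not a decided one.

```java
import java.nio.ByteBuffer;

// Illustrative only: hypothetical names, not Kafka's actual index implementation.
final class IndexEntrySketch {

    // Entry with a 4-byte position, as in current .index files (assumed 8-byte entry).
    static ByteBuffer encodeNarrow(int relativeOffset, long position) {
        if (position < 0 || position > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "position " + position + " does not fit in a 4-byte signed integer");
        }
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(relativeOffset);
        buf.putInt((int) position);
        buf.flip();
        return buf;
    }

    // Hypothetical wider entry: 4-byte relative offset plus 8-byte position (12 bytes).
    static ByteBuffer encodeWide(int relativeOffset, long position) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be non-negative");
        }
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putInt(relativeOffset);
        buf.putLong(position);
        buf.flip();
        return buf;
    }

    // Absolute read that skips the 4-byte relative offset at the start of the entry.
    static long decodeWidePosition(ByteBuffer entry) {
        return entry.getLong(4);
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println("wide entry position: "
            + decodeWidePosition(encodeWide(42, threeGiB)));
        try {
            encodeNarrow(42, threeGiB);
        } catch (IllegalArgumentException e) {
            System.out.println("narrow entry rejected: " + e.getMessage());
        }
    }
}
```

A wider entry grows each index entry by half in this layout, which is one reason to keep the existing format for segments that stay under the old limit.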

Monitoring and Metrics

  • JMX Metrics: Any metrics that report segment sizes may need to handle larger values
    • Metrics like segment size distributions, average segment sizes, etc.
    • Metric data types may need to change from int to long

Administrative Tools

  • Configuration validation: Tools that validate Kafka configurations
  • kafka-configs.sh: Command-line tool for setting configurations
  • kafka-log-dirs.sh: Tool that reports log directory information including segment sizes

Backward Compatibility Considerations

  • Protocol version: May require bumping if RemoteLogSegmentMetadata changes affect inter-broker communication
  • Log format version: If index file format changes, may require new log format version

Notes on Scope

  • Client-facing APIs (Producer/Consumer) should not be directly affected since this is a broker-side configuration
  • Serialization interfaces likely unaffected as this is primarily a storage-layer change

Proposed Changes

Core Change

  • Change log.segment.bytes configuration parameter data type from int to long
  • Update configuration validation to accept values larger than 2GB

Interface Updates

  • Modify RemoteLogSegmentMetadata.segmentSizeInBytes from int to long
  • Update any related method signatures and implementations
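
As a rough illustration of what the signature change means for callers, the sketch below uses a hypothetical stand-in interface rather than the real RemoteLogSegmentMetadata class. Code that assigns the size to an int would stop compiling once the accessor returns long. Callers can either carry the value as a long or, where an int is still required, convert with an explicit overflow check.

```java
import java.util.List;

// Hypothetical stand-in for the size accessor; this is not the real RemoteLogSegmentMetadata.
interface SegmentSizeSource {
    long segmentSizeInBytes();
}

final class SegmentSizeCallerSketch {

    // Preferred: keep the size as a long end to end, so totals cannot overflow an int.
    static long totalBytes(List<SegmentSizeSource> segments) {
        long total = 0L;
        for (SegmentSizeSource segment : segments) {
            total += segment.segmentSizeInBytes();
        }
        return total;
    }

    // For callers that must still hand an int to older code: fail loudly rather than truncate.
    static int sizeAsInt(SegmentSizeSource segment) {
        // Math.toIntExact throws ArithmeticException if the value does not fit in an int.
        return Math.toIntExact(segment.segmentSizeInBytes());
    }
}
```

The explicit conversion is a design choice for the sketch: a silent cast would wrap a 3 GiB size to a negative number, which is harder to diagnose than an exception.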

Index File Format

  • Evaluate if index file format needs updating to support 8-byte file positions for segments > 2GB
  • If needed, introduce new index format version with backward compatibility

Configuration and Validation

  • Update configuration parsing logic to handle long values
  • Implement reasonable upper bounds (e.g., prevent extremely large values that could cause issues)
  • Update configuration documentation and default value handling
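
The parsing and bounds check described above might look roughly like the following. This is a minimal standalone sketch, not the broker's configuration code: the class name is hypothetical, and both bounds are placeholder assumptions, since this proposal does not yet fix concrete limits.

```java
// Illustrative sketch: class name and both bounds are assumptions, not values from this KIP.
final class SegmentBytesConfigSketch {
    static final long MIN_SEGMENT_BYTES = 1L;                        // placeholder lower bound
    static final long MAX_SEGMENT_BYTES = 64L * 1024 * 1024 * 1024;  // placeholder cap (64 GiB)

    static long parseSegmentBytes(String raw) {
        final long value;
        try {
            value = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "log.segment.bytes must be a whole number of bytes, got: " + raw, e);
        }
        if (value < MIN_SEGMENT_BYTES || value > MAX_SEGMENT_BYTES) {
            throw new IllegalArgumentException(
                "log.segment.bytes must be between " + MIN_SEGMENT_BYTES + " and "
                    + MAX_SEGMENT_BYTES + ", got: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // An existing 1 GiB setting parses exactly as before.
        System.out.println(parseSegmentBytes("1073741824"));
        // A 4 GiB setting is beyond the old int range but within the placeholder cap.
        System.out.println(parseSegmentBytes("4294967296"));
    }
}
```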

Compatibility

  • Maintain backward compatibility for existing configurations < 2GB
  • Ensure graceful handling during upgrades from int to long values

Implementation Priority: Start with configuration change and RemoteLogSegmentMetadata interface updates, then address index file format if segments actually exceed 2GB in testing.

Compatibility, Deprecation, and Migration Plan

  • The Good News

    • Nothing breaks: Your existing Kafka setup keeps working exactly as before
    • No forced migration: You don't have to change anything if you don't want to
    • Rolling upgrades work: Upgrade brokers one by one like usual

    What Actually Happens

    When You Upgrade

    1. Install the new broker version - everything works the same
    2. Your current segment sizes (under 2GB) keep working fine
    3. Want bigger segments? Just change the config when you're ready

    If You Want Larger Segments

    • Set log.segment.bytes to something bigger than 2GB
    • Only new segments will be larger - old ones stay as-is
    • Your topics will have a mix of old small segments and new large ones (totally fine)

    The Index File Thing

    • If segments exceed 2GB, the offset index may need a new format, because current .index files store file positions as 4-byte integers
    • This happens automatically for new segments
    • Old segments keep using the old format
    • Both formats work together just fine

    Real Talk About Rollbacks

    • Easy rollback: If you never set log.segment.bytes above 2GB, you can downgrade no problem
    • Tricky rollback: If any segment has grown past 2GB, older broker versions are not expected to handle it, so those segments have to be gone (deleted or aged out) before downgrading
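
One way to tell which rollback case applies is to scan the log directories for segment files larger than the old int limit. The sketch below is an assumption about how an operator might do that with standard Java file APIs. It is not an existing Kafka tool, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka: lists .log files above a size threshold.
final class OversizedSegmentScan {
    // The largest segment size an int-based log.segment.bytes could express.
    static final long LEGACY_MAX_SEGMENT_BYTES = Integer.MAX_VALUE;

    static List<Path> findOversized(Path logDir, long thresholdBytes) throws IOException {
        try (Stream<Path> files = Files.walk(logDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".log"))
                .filter(p -> sizeOf(p) > thresholdBytes)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a broker log directory supplied by the operator.
        for (Path p : findOversized(Paths.get(args[0]), LEGACY_MAX_SEGMENT_BYTES)) {
            System.out.println(p);
        }
    }
}
```

An empty result corresponds to the easy rollback case. Any listed file is a segment that would need attention before a downgrade.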

    No Deprecation Drama

    • No configuration is deprecated: log.segment.bytes keeps its name, and existing values stay valid

Test Plan

Basic Functionality Tests

  • Configuration parsing: Test that broker accepts long values for log.segment.bytes
  • Validation: Ensure reasonable upper/lower bounds work (reject negative values, extremely large values)
  • Config tools: Verify kafka-configs.sh handles long values correctly

Segment Creation and Rotation

  • Small segments: Confirm existing behavior unchanged for values < 2GB
  • Large segments: Create segments between 2GB and 4GB and verify they work properly
  • Mixed sizes: Test topics with both small and large segments coexisting
  • Rotation triggers: Ensure large segments rotate correctly when size limit reached

The Tricky Stuff

  • Index file handling: When segments exceed 2GB, verify index files work correctly
    • Test message lookup performance doesn't degrade
    • Verify offset-to-position mappings work for large files
  • RemoteLogSegmentMetadata: Test interface changes don't break tiered storage
  • Memory usage: Check that large segments don't cause unexpected memory issues

Upgrade/Compatibility Testing

  • Rolling upgrades: Test mixed broker versions during upgrade
  • Config replication: Verify that values above the int limit are rejected gracefully by older brokers
  • Downgrade scenarios: Test rollback behavior with and without large segments

Stress Testing

  • High throughput: Test segment rotation under heavy write load with large segments
  • Many partitions: Verify file handle limits with fewer but larger segment files
  • Storage full: Test behavior when disk fills up with large segments

Edge Cases

  • Boundary values: Test at the old int limit (2,147,483,647 bytes), at exactly 2 GiB (2,147,483,648 bytes), and just over it
  • Concurrent operations: Multiple producers writing to segments near size limit
  • Broker restart: Verify large segments load correctly after restart
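
The boundary cases above can be exercised against a size-based roll check along these lines. The method is a hypothetical simplification for test design, not Kafka's actual roll logic. It compares in long arithmetic and uses subtraction so that sizes near a limit cannot overflow.

```java
// Hypothetical simplification of a size-based roll decision, for illustrating boundary tests.
final class RollCheckSketch {

    // Roll when appending the batch would push the segment past the configured limit.
    // All arguments are expected to be non-negative byte counts.
    static boolean shouldRoll(long currentSizeBytes, long batchSizeBytes, long maxSegmentBytes) {
        return currentSizeBytes > maxSegmentBytes - batchSizeBytes;
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;
        // A segment sitting at the old int limit with a 4 GiB cap.
        System.out.println(shouldRoll(Integer.MAX_VALUE, 1L, fourGiB));
        // A segment 10 bytes short of the cap receiving an 11-byte batch.
        System.out.println(shouldRoll(fourGiB - 10, 11L, fourGiB));
    }
}
```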

Performance Validation

  • Compare metrics before/after: segment rotation frequency, file handle count, disk I/O patterns

What We're NOT Testing

  • Client-side changes (there shouldn't be any)
  • Network protocol changes (this is broker-internal)

Rejected Alternatives


Option 1: Add New Configuration Parameter

Keep log.segment.bytes as int, add log.segment.bytes.v2 as long. Rejected: Confusing to have two configs doing the same thing.

Option 2: Multiple Files per Segment

Allow logical segments to span multiple 2GB files. Rejected: Doesn't solve the main issue - log.segment.bytes would still be an int limited to 2GB.

Option 3: Segment Compression

Compress segments to fit more data under 2GB. Rejected: Little to gain, since the broker stores record batches as opaque bytes that producers may already have compressed. Also adds CPU overhead.

The Winner

Change int to long, keep it simple. Once you create segments >2GB, you can't roll back to older broker versions, but that's the trade-off for bigger segments.
