Status
Current state: Under Discussion
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
- Modern Hardware Capabilities: Current deployments often use high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB segments are inefficiently small
- File Handle Optimization: Large Kafka deployments with many topics can have 50-100k open files across all segment types (.log, .index, .timeindex files). Each segment requires open file handles, and larger segments would reduce the total number of files and improve caching efficiency
- Performance Benefits: Fewer segment rotations in high-traffic scenarios would reduce I/O overhead and improve overall performance. Sequential disk operations are much faster than random access patterns
- Storage Efficiency: Having fewer segment files improves filesystem metadata performance and reduces inode usage on high-volume deployments
- Community Interest: Similar requests have been raised in community forums (see Confluent forum discussion)
Public Interfaces
Configuration Changes
- log.segment.bytes: Configuration parameter data type changed from int to long
- This affects server configuration validation and parsing
- Maximum value increases from ~2.1GB (2,147,483,647 bytes) to much larger values
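As a minimal illustration of why the type change is needed, the sketch below shows that a 4 GiB value cannot be parsed as a Java int but fits in a long. The class and method names are hypothetical and are not taken from Kafka's configuration code.

```java
// Hypothetical illustration only; this is not Kafka's actual config parsing code.
final class SegmentBytesParsing {

    // Returns true if the raw config string fits in a 32-bit signed int.
    static boolean fitsInInt(String raw) {
        try {
            Integer.parseInt(raw.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Parses the raw config string as a 64-bit signed long.
    static long parseAsLong(String raw) {
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        String currentCeiling = "2147483647"; // Integer.MAX_VALUE
        String fourGiB = "4294967296";        // 4 * 1024^3 bytes

        System.out.println(fitsInInt(currentCeiling)); // true
        System.out.println(fitsInInt(fourGiB));        // false
        System.out.println(parseAsLong(fourGiB));      // 4294967296
    }
}
```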
Java Interfaces/Classes
- RemoteLogSegmentMetadata: Public interface currently uses int segmentSizeInBytes
- Method signature will change to use long for segment size representation
- This affects any code implementing or consuming this interface
Binary Log Format (Potentially)
- Index file format: If segments can exceed 2GB, the index file format may need changes
- Current .index files use 4-byte integers for file positions
- May require format version bump to support 8-byte positions
- This would be a binary format change affecting log compatibility
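The concern with 4-byte positions can be sketched as follows, assuming a file position is stored as a signed 32-bit integer as described above. The class and method names are hypothetical, and the 8-byte variant is only one possible layout, not a decided format.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not Kafka's index implementation.
final class IndexPositionSketch {

    // Stores the position in 4 bytes and reads it back.
    static long roundTripFourBytes(long position) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt((int) position); // narrowing cast silently drops the high 32 bits
        buf.flip();
        return buf.getInt();
    }

    // Stores the position in 8 bytes and reads it back.
    static long roundTripEightBytes(long position) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(position);
        buf.flip();
        return buf.getLong();
    }

    public static void main(String[] args) {
        long small = 1_000_000L;
        long threeGiB = 3L * 1024 * 1024 * 1024; // 3221225472, beyond Integer.MAX_VALUE

        System.out.println(roundTripFourBytes(small));     // 1000000
        System.out.println(roundTripFourBytes(threeGiB));  // -1073741824 (corrupted)
        System.out.println(roundTripEightBytes(threeGiB)); // 3221225472
    }
}
```

Positions below 2GB survive the 4-byte round trip unchanged, which is why existing segments would be unaffected by a wider format for new segments.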
Monitoring and Metrics
- JMX Metrics: Any metrics that report segment sizes may need to handle larger values
- Metrics like segment size distributions, average segment sizes, etc.
- Metric data types may need to change from int to long
Administrative Tools
- Configuration validation: Tools that validate Kafka configurations
- kafka-configs.sh: Command-line tool for setting configurations
- kafka-log-dirs.sh: Tool that reports log directory information including segment sizes
Backward Compatibility Considerations
- Protocol version: May require bumping if RemoteLogSegmentMetadata changes affect inter-broker communication
- Log format version: If index file format changes, may require new log format version
Notes on Scope
- Client-facing APIs (Producer/Consumer) should not be directly affected since this is a broker-side configuration
- Serialization interfaces likely unaffected as this is primarily a storage-layer change
Proposed Changes
Core Change
- Change log.segment.bytes configuration parameter data type from int to long
- Update configuration validation to accept values larger than 2GB
Interface Updates
- Modify RemoteLogSegmentMetadata.segmentSizeInBytes from int to long
- Update any related method signatures and implementations
Index File Format
- Evaluate if index file format needs updating to support 8-byte file positions for segments > 2GB
- If needed, introduce new index format version with backward compatibility
Configuration and Validation
- Update configuration parsing logic to handle long values
- Implement reasonable upper bounds (e.g., prevent extremely large values that could cause issues)
- Update configuration documentation and default value handling
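A bounds check along these lines could look like the sketch below. The minimum and maximum shown are assumptions chosen for illustration; this proposal does not fix specific limits, and the class name is hypothetical.

```java
// Hypothetical validator; the bounds are illustrative assumptions, not proposed values.
final class SegmentBytesValidator {

    static final long MIN_SEGMENT_BYTES = 1024L * 1024L;               // assumed: 1 MiB
    static final long MAX_SEGMENT_BYTES = 64L * 1024L * 1024L * 1024L; // assumed: 64 GiB

    // Returns true if the value lies within the assumed bounds (inclusive).
    static boolean isValid(long segmentBytes) {
        return segmentBytes >= MIN_SEGMENT_BYTES && segmentBytes <= MAX_SEGMENT_BYTES;
    }

    // Returns the value unchanged, or throws if it is out of range.
    static long validate(long segmentBytes) {
        if (!isValid(segmentBytes)) {
            throw new IllegalArgumentException(
                "log.segment.bytes must be between " + MIN_SEGMENT_BYTES
                    + " and " + MAX_SEGMENT_BYTES + ", got " + segmentBytes);
        }
        return segmentBytes;
    }

    public static void main(String[] args) {
        System.out.println(isValid(1073741824L));    // true: 1 GiB, an existing-style value
        System.out.println(isValid(4294967296L));    // true: 4 GiB, newly allowed
        System.out.println(isValid(-1L));            // false: negative
        System.out.println(isValid(Long.MAX_VALUE)); // false: above the assumed ceiling
    }
}
```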
Compatibility
- Maintain backward compatibility for existing configurations < 2GB
- Ensure graceful handling during upgrades from int to long values
Implementation Priority: Start with configuration change and RemoteLogSegmentMetadata interface updates, then address index file format if segments actually exceed 2GB in testing.
Compatibility, Deprecation, and Migration Plan
The Good News
- Nothing breaks: Your existing Kafka setup keeps working exactly as before
- No forced migration: You don't have to change anything if you don't want to
- Rolling upgrades work: Upgrade brokers one by one like usual
What Actually Happens
When You Upgrade
- Install the new broker version - everything works the same
- Your current segment sizes (under 2GB) keep working fine
- Want bigger segments? Just change the config when you're ready
If You Want Larger Segments
- Set log.segment.bytes to something bigger than 2GB
- Only new segments will be larger - old ones stay as-is
- Your topics will have a mix of old small segments and new large ones (totally fine)
The Index File Thing
- If segments get really big (>2GB), Kafka might need to update how it tracks message positions
- This happens automatically for new segments
- Old segments keep using the old format
- Both formats work together just fine
Real Talk About Rollbacks
- Easy rollback: If you never set segments >2GB, you can downgrade no problem
- Tricky rollback: If you created huge segments, you'll need to deal with those first before downgrading
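Before a downgrade it may help to know which segments exceed the old limit. The sketch below lists .log files in a partition directory that are larger than a given threshold. It is an assumption about how an operator might check this, not a tool shipped with Kafka, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Kafka's tooling.
final class OversizedSegmentFinder {

    // Returns all *.log files directly inside partitionDir whose size exceeds thresholdBytes.
    static List<Path> findOversized(Path partitionDir, long thresholdBytes) throws IOException {
        List<Path> oversized = new ArrayList<>();
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(partitionDir, "*.log")) {
            for (Path segment : segments) {
                if (Files.size(segment) > thresholdBytes) {
                    oversized.add(segment);
                }
            }
        }
        return oversized;
    }
}
```

Calling `findOversized(dir, Integer.MAX_VALUE)` would report the segments that an older broker could not represent with an int size.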
No Deprecation Drama
Test Plan
Basic Functionality Tests
- Configuration parsing: Test that broker accepts long values for log.segment.bytes
- Validation: Ensure reasonable upper/lower bounds work (reject negative values, extremely large values)
- Config tools: Verify kafka-configs.sh handles long values correctly
Segment Creation and Rotation
- Small segments: Confirm existing behavior unchanged for values < 2GB
- Large segments: Create segments between 2-4GB and verify they work properly
- Mixed sizes: Test topics with both small and large segments coexisting
- Rotation triggers: Ensure large segments rotate correctly when size limit reached
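The rotation trigger for large segments can be sketched as a size comparison done entirely in long arithmetic. This is a simplified model under the assumption that rotation happens when the next batch would not fit; the names are hypothetical and the broker's real roll logic considers more conditions than size.

```java
// Hypothetical model of a size-based roll decision; not Kafka's actual roll logic.
final class RollCheckSketch {

    // Returns true if appending batchBytes would push the segment past maxSegmentBytes.
    // Assumes all arguments are non-negative.
    static boolean shouldRoll(long currentSizeBytes, int batchBytes, long maxSegmentBytes) {
        // Subtract on the limit side instead of adding to the current size,
        // so the comparison cannot overflow for non-negative inputs.
        return currentSizeBytes > maxSegmentBytes - batchBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        int mib = 1024 * 1024;

        System.out.println(shouldRoll(3 * gib, mib, 4 * gib));           // false: plenty of room
        System.out.println(shouldRoll(4 * gib - mib, mib, 4 * gib));     // false: batch fits exactly
        System.out.println(shouldRoll(4 * gib - mib / 2, mib, 4 * gib)); // true: batch would overflow the limit
    }
}
```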
The Tricky Stuff
- Index file handling: When segments exceed 2GB, verify index files work correctly
- Test message lookup performance doesn't degrade
- Verify offset-to-position mappings work for large files
- RemoteLogSegmentMetadata: Test interface changes don't break tiered storage
- Memory usage: Check that large segments don't cause unexpected memory issues
Upgrade/Compatibility Testing
- Rolling upgrades: Test mixed broker versions during upgrade
- Config replication: Verify that values above 2GB are rejected gracefully on older brokers
- Downgrade scenarios: Test rollback behavior with and without large segments
Stress Testing
- High throughput: Test segment rotation under heavy write load with large segments
- Many partitions: Verify file handle limits with fewer but larger segment files
- Storage full: Test behavior when disk fills up with large segments
Edge Cases
- Boundary values: Test exactly at 2GB, just over 2GB
- Concurrent operations: Multiple producers writing to segments near size limit
- Broker restart: Verify large segments load correctly after restart
Performance Validation
- Compare metrics before/after: segment rotation frequency, file handle count, disk I/O patterns
What We're NOT Testing
- Client-side changes (there shouldn't be any)
- Network protocol changes (this is broker-internal)
Rejected Alternatives
Option 1: Add New Configuration Parameter
Keep log.segment.bytes as int, add log.segment.bytes.v2 as long. Rejected: Confusing to have two configs doing the same thing.
Option 2: Multiple Files per Segment
Allow logical segments to span multiple 2GB files. Rejected: Doesn't solve the main issue - log.segment.bytes would still be an int limited to 2GB.
Option 3: Segment Compression
Compress segments to fit more data under 2GB. Rejected: Nothing to compress - Kafka stores raw bytes. Also adds CPU overhead.
The Winner
Change int to long, keep it simple. Once you create segments >2GB, you can't roll back to older broker versions, but that's the trade-off for bigger segments.