Status
Current state: "Under Discussion"
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently, when a Kafka broker runs out of disk space, continuing to write to a log dir results in a java.io.IOException: No space left on device error. This causes the log dir to be considered failed, and all topic partitions in the offline log dir go offline. When all log dirs under the broker fail, the broker shuts down to stop service.
In this state, the broker cannot process any requests or clean up expired logs. The service can only be recovered by manually deleting data or expanding disk storage and then restarting the broker.
In contrast, RocketMQ and RabbitMQ already support disk threshold configuration (such as diskMaxUsedSpaceRatio and disk_free_limit). When disk usage exceeds the limit, the system rejects new writes, but the service remains available.
Therefore, this proposal suggests introducing a similar mechanism in Kafka: configurable disk usage thresholds. When a threshold is exceeded, the broker rejects new produce requests while retaining its other service capabilities. The broker still cleans up expired logs, and users can easily adjust topic retention times or delete topics to free up disk space through Kafka management tools or APIs.
This sacrifices a small amount of disk space utilization, but by avoiding disk-full errors it improves service availability and avoids recovery operations after the disk fills up.
Public Interfaces
Configs
This proposal will add three broker configs:

disk.usage.check.interval.ms
- Type: long
- Default: 60000 (1 minute)
- Description: The frequency in milliseconds at which the log manager checks disk usage to determine whether any log dir exceeds the thresholds.

disk.max.used.percent
- Type: int (10 ~ 100)
- Default: 99 (99%)
- Description: The maximum allowed percentage of disk usage. If a log dir's disk usage exceeds this value, all topic partitions in that log dir will reject produce requests.

disk.min.free.bytes
- Type: long (bytes)
- Default: 1073741824 (1 GB)
- Description: The minimum free disk space in bytes. If a log dir's free space falls below this value, all topic partitions in that log dir will reject produce requests.
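As an illustration, a server.properties fragment using the proposed config names might look like the following (the values here are arbitrary examples, not recommendations):

```properties
# Check each log dir every 30 seconds
disk.usage.check.interval.ms=30000
# Reject produce requests once a log dir exceeds 95% usage...
disk.max.used.percent=95
# ...or drops below 5 GB of free space, whichever triggers first
disk.min.free.bytes=5368709120
```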
Metrics
This proposal will add two metrics:
kafka.log:type=LogManager,name=DiskFullLogDirectoryCount: The number of log dirs whose disk usage exceeds the thresholds.
kafka.log:type=LogManager,name=LogDirectoryDiskFull: Indicates whether a log dir exceeds the thresholds (1) or not (0).
Error Codes
This proposal will add a new error code for produce requests rejected due to disk usage protection.
NOT_ENOUGH_SPACE(128, "Messages are rejected since there is not enough disk space.", NotEnoughSpaceException::new);
Proposed Changes
- Add a property to UnifiedLog to indicate whether the remaining space on the disk is sufficient.
- Add a new scheduled task checkLogDirsUsage to LogManager to periodically check the usage (used percent, free space) of each log directory (log.dirs).
  - If any log directory exceeds disk.max.used.percent or falls below disk.min.free.bytes, add it to the disk-full log dirs set and update the disk status of every UnifiedLog under that log dir.
  - When the disk usage falls back within the thresholds, remove the log dir from the disk-full log dirs set and restore the disk status of its UnifiedLogs.
- When processing a client produce request, use the disk status of UnifiedLog to determine whether the log dir where the topic partition is located has sufficient space. If not, reject the produce request and return the NOT_ENOUGH_SPACE error.
- When the replica fetcher thread builds a fetch request, use the disk status of UnifiedLog to determine whether the log dir where the topic partition is located has sufficient space. If not, skip the partition.
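The threshold check itself could be sketched as follows. This is an illustrative, self-contained example, not the actual implementation; the class and method names (DiskUsageCheck, isDiskFull, checkLogDir) are hypothetical:

```java
import java.io.File;

public class DiskUsageCheck {

    // Returns true when a log dir should be marked "disk full" under the
    // proposed thresholds: used percent above disk.max.used.percent, or
    // usable bytes below disk.min.free.bytes.
    public static boolean isDiskFull(long totalBytes, long usableBytes,
                                     int maxUsedPercent, long minFreeBytes) {
        long usedBytes = totalBytes - usableBytes;
        double usedPercent = 100.0 * usedBytes / totalBytes;
        return usedPercent > maxUsedPercent || usableBytes < minFreeBytes;
    }

    // The periodic checkLogDirsUsage task would call something like this
    // for each directory in log.dirs.
    public static boolean checkLogDir(File logDir, int maxUsedPercent,
                                      long minFreeBytes) {
        return isDiskFull(logDir.getTotalSpace(), logDir.getUsableSpace(),
                          maxUsedPercent, minFreeBytes);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 100 GB disk with 0.5 GB free: over 99% used and under 1 GB free.
        System.out.println(isDiskFull(100 * gb, gb / 2, 99, gb));  // true
        // 100 GB disk with 20 GB free: healthy.
        System.out.println(isDiskFull(100 * gb, 20 * gb, 99, gb)); // false
    }
}
```

A scheduled task in LogManager would invoke such a check for each configured log dir every disk.usage.check.interval.ms milliseconds and flip the per-UnifiedLog disk status accordingly.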
Compatibility, Deprecation, and Migration Plan
For the server, after upgrading to the new version, users need to pay attention to the disk.max.used.percent and disk.min.free.bytes configs. The new version reserves a small amount of disk space on the broker to prevent disk-full failures. Users who want to keep the same behavior as before can set disk.max.used.percent to 100 and disk.min.free.bytes to 0.
For the client, because of the new error code: if the server rejects a produce request due to insufficient disk space, a client that has not been upgraded to the new version will receive Unexpected error code: 128, while an upgraded client will get a clear NOT_ENOUGH_SPACE error.
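For operators who want to opt out, a server.properties fragment (assuming the proposed config names) that effectively disables the protection:

```properties
# Restore pre-upgrade behavior: never reject produce requests for disk usage
disk.max.used.percent=100
disk.min.free.bytes=0
```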
Test Plan
- Unit Tests
- ApiErrorTest
- RequestResponseTest
- PartitionTest
- Test that appendRecordsToLeader succeeds when there is sufficient disk space
- Test that appendRecordsToLeader throws a NotEnoughSpaceException when the disk is full and the request comes from a client
- LogManagerTest
- Integration Tests
- Produce Request Fails When Disk is Full
- Replication Behavior When Follower Disk is Full
- Automatic Recovery After Disk Cleanup
- Performance Tests
Rejected Alternatives
- Check disk usage on each produce request, which would catch the disk filling up between check intervals. However, frequent checks under high throughput would increase system load and performance overhead.
- Check disk usage in the existing log cleanup task `cleanupLogs`, but this would make the task more complex, and the disk usage check task would likely require a higher frequency. (RocketMQ checks disk usage in the cleanup task, but its cleanup logic differs from Kafka's.)
- Automatically delete old data when a threshold is reached. However, because each partition must keep an active segment, and a minimum retention limit is needed to prevent the broker from deleting too much data, this can alleviate the problem in some cases but cannot completely prevent the broker's disk from filling up. (RocketMQ force-deletes old data; we have a similar implementation in Kafka with a minimum retention limit for auto-deletion; this could be discussed in the future.)