Current state: Under Discussion
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Currently, the log manager chooses a directory from those configured in `log.dirs` by counting the partitions in each directory and picking the one with the fewest. In production environments where partition data volumes are uneven, however, we have observed some directories becoming nearly full while others still had plenty of free space, which results in poor data distribution. The directory with the fewest partitions is not necessarily the one with the most free disk space.
We could offer users a new strategy in which the log manager takes the actual free disk space into account and selects the directory with the most free space as the next log directory.
This log directory selection strategy is implemented on the broker side. It adds a new broker configuration parameter, 'log.direcotry.select.strategy', that specifies which strategy is used to choose the directory in which a new log is created. There are two possible values:
- partition: the same as the current design, which is based on the number of partitions in each directory.
- size: the newly introduced strategy, which is based on the actual free disk space of each directory.
The size-based strategy could use the `getFreeSpace` method of Java's `File` class to retrieve the free space of a directory. If a `SecurityException` is caught, LogManager could fall back to the partition-count-based strategy.
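The two strategies and the fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the broker's actual code: the class `LogDirSelector`, the `Strategy` enum, and all method names are hypothetical, and the free-space lookup is passed in as a function so that the fallback path can be exercised without a security manager. The only real API relied on is `java.io.File.getFreeSpace()`.

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical helper; names are illustrative and not taken from the Kafka code base.
final class LogDirSelector {

    // Assumed mapping of the two proposed configuration values.
    enum Strategy { PARTITION, SIZE }

    // Existing behaviour: pick the directory holding the fewest partitions.
    // Directories missing from the map are treated as holding zero partitions.
    // Returns null if the list of directories is empty.
    static File byPartitionCount(List<File> dirs, Map<File, Integer> partitionCounts) {
        File best = null;
        int bestCount = Integer.MAX_VALUE;
        for (File dir : dirs) {
            int count = partitionCounts.getOrDefault(dir, 0);
            if (best == null || count < bestCount) {
                best = dir;
                bestCount = count;
            }
        }
        return best;
    }

    // Proposed behaviour: pick the directory reporting the most free space.
    // If the lookup throws SecurityException, fall back to the partition count.
    static File byFreeSpace(List<File> dirs,
                            Map<File, Integer> partitionCounts,
                            ToLongFunction<File> freeSpace) {
        try {
            File best = null;
            long bestFree = -1L;
            for (File dir : dirs) {
                long free = freeSpace.applyAsLong(dir);
                if (free > bestFree) {
                    best = dir;
                    bestFree = free;
                }
            }
            return best;
        } catch (SecurityException e) {
            return byPartitionCount(dirs, partitionCounts);
        }
    }

    // Dispatch on the configured strategy, using File.getFreeSpace() for SIZE.
    static File select(Strategy strategy, List<File> dirs, Map<File, Integer> partitionCounts) {
        if (strategy == Strategy.SIZE) {
            return byFreeSpace(dirs, partitionCounts, File::getFreeSpace);
        }
        return byPartitionCount(dirs, partitionCounts);
    }
}
```

Catching the exception around the whole loop, rather than per directory, means a single denied directory switches the entire decision to the partition-count strategy. That is one possible reading of the fallback described above; a real implementation might choose differently.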
Compatibility, Deprecation, and Migration Plan
There is no impact on existing users. Anyone who wants to use the new strategy has to set the parameter 'log.direcotry.select.strategy' to `size` and restart the broker.
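As a rough illustration, opting in might look like the following fragment of a broker properties file. The directory paths are made up for the example, and the parameter name is copied exactly as it is spelled in this proposal.

```
# Hypothetical log directories on two separate disks
log.dirs=/data1/kafka-logs,/data2/kafka-logs
# Opt in to the size-based strategy; omitting this line keeps the current behaviour
log.direcotry.select.strategy=size
```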