...
There may be situations where it is not desirable for all batches to be recompressed with a single compression type. For this reason, it may eventually be useful to provide a way to restrict which batches are recompressed for the analysis. For example, batches already compressed with a certain compression type could be excluded from recompression, so that only the remaining subset of the log is analyzed. However, this can be added as a follow-up once there is better motivation for which mechanisms are needed and how they might work.
Rejected Alternatives
Another approach could be to run the tool as a consumer-like process that fetches batches from the Kafka cluster and performs the compression measurements directly on those batches. This would require the tool to be provided with the appropriate authentication information for the topic/partition being analyzed. It would also require batches of records to be exposed to the tool, which the consumer's interface and internals (specifically the fetcher) do not currently do.