Current state: Under Discussion
Kafka consumers use the FetchRequest to read topic partition data starting at a requested offset. Messages written by producers are stored as immutable log files on non-volatile secondary storage such as a hard disk drive (HDD), and consumers read these messages sequentially. Kafka takes advantage of the kernel page cache, which enables low-latency fetches for reads that can be served from memory. If there is enough free memory, a page is kept in the cache indefinitely and can be reused by other processes without accessing the disk.
Note: In many use cases, consumers are expected to fetch at the tail end of the log.
- Metrics: The consumer lag metrics will be added to the existing metrics interface in the broker.
The proposal is to have stats on consumer lag in terms of bytes as well as time. We will add broker-level metrics aggregated across all partitions. Both metrics will be histograms.
Measuring the byte lag requires summing the sizes of the segments that are newer than the segment being fetched from, which is an O(N) operation. The number of such segments is assumed to be small (or zero) in the common case, because the majority of consumers fetch at the tail end (the most recent part) of the log.
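The byte-lag computation described above can be sketched as follows. This is a minimal illustration, not the broker's implementation: the class `ByteLagSketch`, the map of base offset to segment size, and the `positionInSegment` parameter are all hypothetical names introduced here. It assumes segment metadata is held in a sorted map, so that locating the segment for a fetch offset is an O(lg N) lookup and summing the newer segments is a walk over the tail of the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: names and structure are assumptions, not Kafka's actual classes.
final class ByteLagSketch {
    // Assumed segment metadata: segment base offset -> segment size in bytes.
    private final ConcurrentNavigableMap<Long, Long> segmentSizes = new ConcurrentSkipListMap<>();

    void addSegment(long baseOffset, long sizeInBytes) {
        segmentSizes.put(baseOffset, sizeInBytes);
    }

    /**
     * Bytes between the fetch position and the end of the log.
     *
     * @param fetchOffset       offset requested by the consumer
     * @param positionInSegment byte position of that offset inside its segment (assumed known)
     */
    long byteLag(long fetchOffset, long positionInSegment) {
        // O(lg N): find the segment whose base offset is the greatest one <= fetchOffset.
        Map.Entry<Long, Long> current = segmentSizes.floorEntry(fetchOffset);
        if (current == null) {
            throw new IllegalArgumentException("offset " + fetchOffset + " is before the first segment");
        }
        // Unread bytes remaining in the segment being fetched from.
        long lag = current.getValue() - positionInSegment;
        // O(N) in the worst case: add the sizes of all newer segments.
        // For a tail-end fetch this view is empty, so the loop does no work.
        for (long size : segmentSizes.tailMap(current.getKey(), false).values()) {
            lag += size;
        }
        return lag;
    }

    public static void main(String[] args) {
        ByteLagSketch log = new ByteLagSketch();
        log.addSegment(0L, 100L);
        log.addSegment(50L, 200L);
        log.addSegment(120L, 80L);
        // Fetch at offset 60, which sits 150 bytes into the segment with base offset 50.
        System.out.println(log.byteLag(60L, 150L)); // prints 130
    }
}
```

When the fetch lands in the newest segment, the tail view is empty and the cost stays close to that of the lookup alone, which is the case the proposal expects to dominate.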
Event times are used for the time-lag measurement, which has the side effect that the measurement depends on user-provided timestamp information.
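One way to picture the time-lag measurement and its sensitivity to user-provided timestamps is the sketch below. The class `TimeLagSketch` and its method are hypothetical, and clamping negative values to zero is an assumption made for this illustration rather than something the proposal specifies. It only shows that a timestamp ahead of the broker clock would otherwise produce a negative lag.

```java
// Hypothetical sketch: the clamping policy is an assumption, not part of the proposal.
final class TimeLagSketch {
    /**
     * Time lag of a fetched message, in milliseconds.
     *
     * @param nowMs              broker wall-clock time when the fetch is served
     * @param messageTimestampMs event time carried by the message (may be user provided)
     */
    static long timeLagMs(long nowMs, long messageTimestampMs) {
        // A user-provided timestamp can be in the future relative to the broker,
        // so the raw difference may be negative; clamp it to zero here.
        return Math.max(0L, nowMs - messageTimestampMs);
    }

    public static void main(String[] args) {
        System.out.println(timeLagMs(1_000L, 400L));   // prints 600
        System.out.println(timeLagMs(1_000L, 1_500L)); // prints 0
    }
}
```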
Compatibility, Deprecation, and Migration Plan
- No behavior change is expected. The fetch code path will be used for measuring the consumer lag. The current fetch path is O(lg N), where N = number of segments. Since the consumer lag has to be measured in bytes, this will change to an O(N) traversal over the map that holds the segment metadata, covering the segments at the tail end. Since the majority of scenarios are tail-end fetches, the impact is expected to be minimal.
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.