Current state: Under Discussion
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
The KafkaConsumer is a complex client that relies on many different components to function properly. When a consumer misbehaves, it can be difficult to identify the root cause and to determine which component is responsible (ConsumerCoordinator, Fetcher, ConsumerNetworkClient, etc.).
This KIP aims to improve the monitoring and detection of KafkaConsumer’s Fetcher component.
The Fetcher sends a fetch request to each node for which the consumer has assigned partitions.
A fetch request can fail in the following cases:
- Intermittent network issues
- The node sends an invalid full or incremental fetch response
These cases are logged, but it would be valuable to provide a corresponding metric that allows for monitoring and alerting.
This proposal adds the following metrics:
We will add a new sensor, fetchErrors, to FetchManagerMetrics. It will use a Meter to provide the rate and total calculations.
private final Sensor fetchErrors;
There will also be a helper method in FetchManagerMetrics to increment the fetchErrors sensor.
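To illustrate what a rate-and-total pair driven by a single helper method looks like, here is a minimal, dependency-free sketch. It is not the proposed implementation: FetchErrorMetricsSketch, recordFetchError, total, and rate are hypothetical names, and the rate here is a simple average since start, whereas Kafka's Meter computes its rate over sampled time windows.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for the fetchErrors sensor and its helper method.
 * It only models the two values a Meter exposes (a rate and a running total)
 * and does not use any Kafka classes.
 */
final class FetchErrorMetricsSketch {
    // Running count of fetch errors; plays the role of the "total" metric.
    private final AtomicLong fetchErrorTotal = new AtomicLong();
    // Timestamp (ms) from which the average rate is measured.
    private final long startMs;

    FetchErrorMetricsSketch(long startMs) {
        this.startMs = startMs;
    }

    /** Helper the fetch path would call once per failed fetch request. */
    void recordFetchError() {
        fetchErrorTotal.incrementAndGet();
    }

    /** Total number of fetch errors recorded so far. */
    long total() {
        return fetchErrorTotal.get();
    }

    /**
     * Average errors per second since startMs. This is an assumption made
     * for brevity; a real Meter uses a windowed rate instead.
     */
    double rate(long nowMs) {
        long elapsedMs = Math.max(1L, nowMs - startMs);
        return total() * 1000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        FetchErrorMetricsSketch metrics = new FetchErrorMetricsSketch(0L);
        metrics.recordFetchError(); // e.g. an intermittent network issue
        metrics.recordFetchError(); // e.g. an invalid fetch response
        // Two errors over four seconds.
        System.out.println("total=" + metrics.total() + " rate=" + metrics.rate(4_000L));
        // prints "total=2 rate=0.5"
    }
}
```

Keeping the increment behind one helper means each call site only has to signal that a fetch failed; how the rate and total are derived stays in one place.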
Fetch errors will be recorded in two places:
Compatibility, Deprecation, and Migration Plan
This proposal only adds new metrics and does not change any underlying operations, so no compatibility issues are expected.
None so far.