Status
Current state: Under Discussion
Discussion thread: here
JIRA: KAFKA-18455
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Currently, when a client attempts to connect to a broker and is either throttled or blocked because the maximum connection limit has been reached (so it must wait for an available connection slot), the broker emits no logs or metrics for these scenarios. Clients only receive connection timeout exceptions, which provide too little information for troubleshooting. Adding connection-related metrics would improve observability and help users diagnose connection issues.
Public Interfaces
MetricName | Type | Group | Tag | Description | JMX Bean |
---|---|---|---|---|---|
${listener_name}WaitingConnection | Gauge | Acceptor | listener:<listener_name> | Number of connections waiting for an available connection slot on the specific listener | kafka.network:type=Acceptor,name=${listener_name}WaitingConnection,listener=${listener_name} |
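As a rough illustration of how an operator might read this gauge over JMX, the sketch below builds the bean name from the table and queries it. It assumes the metric is registered under exactly the proposed name and, like other Yammer-based Kafka gauges, exposes its reading through a `Value` attribute. The class and method names (`WaitingConnectionMetric`, `objectNameFor`, `readGauge`) and the default listener `PLAINTEXT` are hypothetical and used only for this example.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; not part of Kafka.
class WaitingConnectionMetric {

    // Builds the JMX bean name proposed in the table for one listener.
    // Assumes the listener name has no characters that need quoting in an ObjectName.
    static ObjectName objectNameFor(String listenerName) {
        try {
            return new ObjectName("kafka.network:type=Acceptor,name="
                    + listenerName + "WaitingConnection,listener=" + listenerName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid listener name: " + listenerName, e);
        }
    }

    // Reads the gauge; "Value" is assumed to be the attribute the gauge exposes.
    static Object readGauge(MBeanServer server, String listenerName) throws JMException {
        return server.getAttribute(objectNameFor(listenerName), "Value");
    }

    public static void main(String[] args) {
        String listener = args.length > 0 ? args[0] : "PLAINTEXT";
        // The platform MBean server only holds broker metrics inside the broker JVM.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            System.out.println(objectNameFor(listener) + " = " + readGauge(server, listener));
        } catch (InstanceNotFoundException e) {
            System.out.println("Metric not registered in this JVM: " + objectNameFor(listener));
        } catch (JMException e) {
            System.out.println("Failed to read metric: " + e);
        }
    }
}
```

Reading the metric from outside the broker process would instead go through a remote JMX connection (for example via `JMXConnectorFactory`), using the same bean name.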
Proposed Changes
We propose adding the metric described in the Public Interfaces section, which would help users diagnose connection quota issues.
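To make the idea of a "waiting connections" gauge concrete, here is a minimal, self-contained sketch of the counting pattern. It is not Kafka's actual connection quota code: the class `WaitingConnectionGauge`, its methods, and the use of a `Semaphore` to model connection slots are all assumptions made for illustration. The counter is incremented only while a caller is blocked waiting for a slot, which is the value such a gauge would report.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of connection slots; not Kafka's implementation.
class WaitingConnectionGauge {
    private final Semaphore slots;                              // free connection slots
    private final AtomicInteger waiting = new AtomicInteger();  // value the gauge would expose

    WaitingConnectionGauge(int maxConnections) {
        this.slots = new Semaphore(maxConnections);
    }

    // Takes a slot, blocking if none is free. Returns false if interrupted while waiting.
    boolean acquireSlot() {
        if (slots.tryAcquire()) {
            return true;                     // fast path: never counted as waiting
        }
        waiting.incrementAndGet();           // counted only while blocked
        try {
            slots.acquire();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            waiting.decrementAndGet();
        }
    }

    // Returns a slot when a connection closes.
    void releaseSlot() {
        slots.release();
    }

    // Current number of callers blocked waiting for a slot.
    int waitingConnections() {
        return waiting.get();
    }

    public static void main(String[] args) {
        WaitingConnectionGauge gauge = new WaitingConnectionGauge(1);
        gauge.acquireSlot();                               // takes the only slot
        Thread blocked = new Thread(gauge::acquireSlot);   // this caller has to wait
        blocked.start();
        while (gauge.waitingConnections() == 0) {
            Thread.onSpinWait();                           // wait until the second caller blocks
        }
        System.out.println("waiting = " + gauge.waitingConnections());
        gauge.releaseSlot();                               // unblocks the waiting caller
    }
}
```

In a real broker the reading would be published through the metrics registry rather than printed, and a separate gauge would exist per listener.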
Compatibility, Deprecation, and Migration Plan
N/A.
This is a new metric, so there are no compatibility concerns.
Test Plan
The new metrics will need unit and integration tests to prove their correctness.
Rejected Alternatives
Adding logs to the SocketServer
This alternative was rejected because Kafka is a high-throughput system handling numerous concurrent connections.
Adding logs for connection throttling and limit exceeded scenarios would likely result in log flooding, potentially causing:
- I/O overhead
- Storage space issues
- Difficulty identifying critical issues among the massive volume of connection logs
Using metrics instead of logs provides a more suitable solution for monitoring connection states without the overhead of extensive logging.