Status

Current state: Under Discussion

Discussion thread: here 

JIRA: KAFKA-18455

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Currently, when a client tries to connect to a broker and the connection is throttled, or the broker has reached its maximum connection limit and the connection must wait for an available slot, the broker emits no logs or metrics for the event. The client sees only a connection timeout exception, which gives too little information for troubleshooting. Adding connection-related metrics would improve observability and help users diagnose connection issues.

Public Interfaces

| Metric Name | Type | Group | Tag | Description | JMX Bean |
|---|---|---|---|---|---|
| waiting-connection | Gauge | Acceptor | listener:<listener_name> | Waiting connections for the specific listener | kafka.network:type=Acceptor,name=waiting-connection,listener={listener_name} |
| connection-latency | Histogram | Acceptor | listener:<listener_name> | Connection wait time for the specific listener | kafka.network:type=Acceptor,name=connection-latency,listener={listener_name} |
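
As a rough illustration of how a monitoring tool might address these beans, the sketch below builds the JMX object names from the pattern listed above. The class name AcceptorMetricNames, the helper beanFor, and the listener name PLAINTEXT are hypothetical and chosen only for the example. The sketch only constructs the names; it does not connect to a broker, and the attribute names exposed on each bean are not specified by this proposal.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka: builds the JMX names proposed above.
class AcceptorMetricNames {

    // Fills the {listener_name} placeholder of the proposed bean pattern.
    static ObjectName beanFor(String metricName, String listenerName)
            throws MalformedObjectNameException {
        return new ObjectName(
            "kafka.network:type=Acceptor,name=" + metricName + ",listener=" + listenerName);
    }

    public static void main(String[] args) throws Exception {
        // "PLAINTEXT" is an assumed listener name used only for illustration.
        ObjectName waiting = beanFor("waiting-connection", "PLAINTEXT");
        ObjectName latency = beanFor("connection-latency", "PLAINTEXT");
        System.out.println(waiting.getKeyProperty("name") + " on " + waiting.getKeyProperty("listener"));
        System.out.println(latency.getKeyProperty("name") + " on " + latency.getKeyProperty("listener"));
    }
}
```

A client holding an MBeanServerConnection to the broker could pass such a name to getAttribute once the metrics exist.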

Proposed Changes

We propose adding the metrics described in the Public Interfaces section to help users diagnose connection quota issues.
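
To make the intended semantics concrete, here is a minimal sketch of the bookkeeping the two metrics imply: a count that rises while a connection waits for a slot and falls once it is admitted, plus a record of how long each connection waited. This is not Kafka's SocketServer code. The class WaitingConnectionTracker and its methods are hypothetical, and a running mean stands in for the proposed histogram to keep the example short.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-listener tracker; names are illustrative, not Kafka APIs.
class WaitingConnectionTracker {
    private final AtomicInteger waiting = new AtomicInteger();   // backs the gauge
    private final LongAdder totalWaitNanos = new LongAdder();    // simplified stand-in for a histogram
    private final LongAdder admitted = new LongAdder();

    // Called when a connection starts waiting for a slot; returns the start timestamp.
    long beginWait(long nowNanos) {
        waiting.incrementAndGet();
        return nowNanos;
    }

    // Called when the waiting connection is finally admitted.
    void endWait(long startNanos, long nowNanos) {
        waiting.decrementAndGet();
        totalWaitNanos.add(nowNanos - startNanos);
        admitted.increment();
    }

    // Value a waiting-connection gauge would report.
    int waitingConnections() {
        return waiting.get();
    }

    // Mean wait in milliseconds; a real histogram would also expose percentiles.
    double meanWaitMillis() {
        long n = admitted.sum();
        return n == 0 ? 0.0 : totalWaitNanos.sum() / 1_000_000.0 / n;
    }
}
```

Timestamps are passed in rather than read from the clock so that the behavior is easy to exercise in a unit test.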

Compatibility, Deprecation, and Migration Plan

N/A.

These are new metrics, so there are no compatibility concerns.

Test Plan

The new metrics will need unit and integration tests to prove their correctness.

Rejected Alternatives

Adding logs to the SocketServer 

This alternative was rejected because Kafka is a high-throughput system that handles many concurrent connections. Logging every connection-throttling and limit-exceeded event would likely flood the logs, potentially causing:

  • I/O overhead
  • Storage space issues
  • Difficulty identifying critical issues among the large volume of connection logs

Using metrics instead of logs provides a more suitable solution for monitoring connection states without the overhead of extensive logging.

