Status

Current state: Under Discussion 

Discussion thread: here 

JIRA: here 

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Kafka Streams offers a comprehensive set of metrics for analyzing an application. Although the health of a Kafka Streams application can be deduced from these metrics, operators need explicit state metrics to get a clearer view of the application's health. In particular, operators want to know the current state of each StreamThread in the client instance (CREATED, STARTING, PARTITIONS_REVOKED, PARTITIONS_ASSIGNED, RUNNING, PENDING_SHUTDOWN, DEAD) and the state of the client instance itself (CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR).

While KIP-1076 extended the KIP-714 mechanism to report additional application metrics, those metrics must map to either a Gauge (up/down counter) or a Sum (counter). The existing client-level state metric in Kafka Streams has a String value, so it cannot be collected via the KIP-1076 extension. Therefore, we'll introduce a new numeric metric for the client-level state and take the opportunity to add a similar numeric metric for the thread-level state. Since the thread-level state is also of interest to users relying on JMX metrics, we'll also add a thread-level metric with a String value that is available only via JMX.

Additionally, Kafka Streams provides several metrics with fine-grained details. Since recording all of these metrics would hurt performance, Kafka Streams does not record all of them by default. The configuration metrics.recording.level selects one of three recording levels: INFO, DEBUG, or TRACE. To avoid operator confusion or frustration about which metrics to expect, operators need to be able to see the current recording level quickly.
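
To illustrate why the configured level determines which metrics appear, the sketch below models the gating rule: a metric is recorded only when its own level is no more verbose than the configured metrics.recording.level. The RecordingLevelGate class and its Level enum are hypothetical stand-ins written for this page, not Kafka's own classes, and the case-insensitive parsing is an assumption of the sketch.

```java
import java.util.Locale;

public class RecordingLevelGate {

    // Hypothetical mirror of the three recording levels, ordered from least to most verbose.
    enum Level { INFO, DEBUG, TRACE }

    private final Level configured;

    // Parses a metrics.recording.level value such as "info" or "DEBUG" (case-insensitive by assumption).
    RecordingLevelGate(String configValue) {
        this.configured = Level.valueOf(configValue.trim().toUpperCase(Locale.ROOT));
    }

    // A metric is recorded only if its level is not more verbose than the configured one.
    boolean shouldRecord(Level metricLevel) {
        return metricLevel.compareTo(configured) <= 0;
    }

    public static void main(String[] args) {
        RecordingLevelGate gate = new RecordingLevelGate("info");
        System.out.println(gate.shouldRecord(Level.INFO));  // true
        System.out.println(gate.shouldRecord(Level.DEBUG)); // false: DEBUG metrics are skipped under INFO
    }
}
```

With the level set to INFO, every DEBUG and TRACE metric is silently absent, which is exactly the situation where being able to read the configured level directly saves an operator time.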

This KIP introduces new metrics to support Kafka Streams operators in quickly assessing an application's current health.

Public Interfaces

New Metrics

The following table lists each new metric's name, recording level, type, and description, along with its group, tags, converted telemetry name, and JMX MBean name.

Name | Recording Level | Metric Type | Description | Group | Tags | Telemetry Name | MBean Name
client-state | INFO | Gauge | The current state of the Kafka Streams instance | stream-metrics | client-id, process-id | org.apache.kafka.stream.client.state | kafka.streams:type=stream-metrics,client-id=([-.\w]+),process-id=([-.\w]+)
thread-state | INFO | Gauge | The current state of the StreamThread | stream-thread-metrics | thread-id | org.apache.kafka.stream.thread.thread.state | kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+)
state | INFO | String | The current state of the StreamThread | stream-thread-metrics | thread-id | N/A (JMX only) | kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+)
recording-level | INFO | Integer | The configured metrics recording level | stream-metrics | client-id, process-id | org.apache.kafka.stream.recording.level | kafka.streams:type=stream-metrics,client-id=([-.\w]+),process-id=([-.\w]+)
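
The telemetry names in the table follow a regular pattern. The sketch below reproduces them under an assumed conversion rule: prefix org.apache.kafka., drop the -metrics suffix from the group, append the metric name, and turn every dash into a dot. The TelemetryNames class is hypothetical, and the rule is inferred from the table rather than quoted from KIP-714.

```java
public class TelemetryNames {

    private static final String PREFIX = "org.apache.kafka.";
    private static final String GROUP_SUFFIX = "-metrics";

    // Assumed rule: strip "-metrics" from the group, join prefix, group and metric name,
    // then replace every '-' with '.'.
    static String telemetryName(String group, String metricName) {
        String g = group.endsWith(GROUP_SUFFIX)
                ? group.substring(0, group.length() - GROUP_SUFFIX.length())
                : group;
        return (PREFIX + g + "." + metricName).replace('-', '.');
    }

    public static void main(String[] args) {
        System.out.println(telemetryName("stream-metrics", "client-state"));        // org.apache.kafka.stream.client.state
        System.out.println(telemetryName("stream-thread-metrics", "thread-state")); // org.apache.kafka.stream.thread.thread.state
        System.out.println(telemetryName("stream-metrics", "recording-level"));     // org.apache.kafka.stream.recording.level
    }
}
```

Under this rule the doubled "thread" in org.apache.kafka.stream.thread.thread.state is expected: one comes from the stream-thread-metrics group and one from the thread-state metric name.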

Values for client-state metrics

The client-state metric will reflect the current state of the Kafka Streams client instance. Its value is the ordinal of the current KafkaStreams.State enum constant, as returned by KafkaStreams.State.ordinal().

Client State | Metric Value
CREATED | 0
REBALANCING | 1
RUNNING | 2
PENDING_SHUTDOWN | 3
NOT_RUNNING | 4
PENDING_ERROR | 5
ERROR | 6
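
On the collecting side, a numeric client-state value has to be turned back into a state name. A minimal sketch, assuming a hypothetical ClientState enum that copies the constant order of KafkaStreams.State as listed in the table above:

```java
public class ClientStateDecoder {

    // Hypothetical copy of KafkaStreams.State; the order must match the table,
    // because the metric value is the enum ordinal.
    enum ClientState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR }

    // Turns a collected gauge value back into a state; rejects values outside the table.
    static ClientState decode(long metricValue) {
        ClientState[] states = ClientState.values();
        if (metricValue < 0 || metricValue >= states.length) {
            throw new IllegalArgumentException("Unknown client-state value: " + metricValue);
        }
        return states[(int) metricValue];
    }

    public static void main(String[] args) {
        System.out.println(decode(2)); // RUNNING
        System.out.println(decode(6)); // ERROR
    }
}
```

Because the mapping relies on ordinals, reordering or inserting constants in KafkaStreams.State would change the meaning of existing values, so a consumer of this metric is probably safer pinning the table above than the enum itself.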

Values for thread-state metrics (KIP-1076 extension)

The thread-state metric will reflect the current state of a StreamThread instance. Its value is the ordinal of the current StreamThread.State enum constant, as returned by StreamThread.State.ordinal().

Thread State | Metric Value
CREATED | 0
STARTING | 1
PARTITIONS_REVOKED | 2
PARTITIONS_ASSIGNED | 3
RUNNING | 4
PENDING_SHUTDOWN | 5
DEAD | 6
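
Since there is one thread-state value per StreamThread, an operator usually looks at them together. The sketch below takes a map from thread-id to collected value and reports the threads that are not RUNNING. The ThreadStateSummary class, its ThreadState enum (copying the order of StreamThread.State from the table above), and the sample thread ids are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadStateSummary {

    // Hypothetical copy of StreamThread.State in ordinal order (0..6), matching the table.
    enum ThreadState { CREATED, STARTING, PARTITIONS_REVOKED, PARTITIONS_ASSIGNED, RUNNING, PENDING_SHUTDOWN, DEAD }

    // Returns thread-id -> decoded state for every thread whose value is not RUNNING.
    static Map<String, ThreadState> notRunning(Map<String, Long> valuesByThreadId) {
        ThreadState[] states = ThreadState.values();
        Map<String, ThreadState> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : valuesByThreadId.entrySet()) {
            long v = e.getValue();
            if (v < 0 || v >= states.length) {
                throw new IllegalArgumentException("Unknown thread-state value " + v + " for " + e.getKey());
            }
            ThreadState s = states[(int) v];
            if (s != ThreadState.RUNNING) {
                result.put(e.getKey(), s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sample = new TreeMap<>();
        sample.put("app-StreamThread-1", 4L); // RUNNING
        sample.put("app-StreamThread-2", 3L); // PARTITIONS_ASSIGNED
        System.out.println(notRunning(sample)); // {app-StreamThread-2=PARTITIONS_ASSIGNED}
    }
}
```

An empty result means every thread reports RUNNING; a thread stuck in PARTITIONS_ASSIGNED or DEAD shows up by name.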

Values for thread-state metrics (JMX)

The thread-level state metric, which is available only via JMX, will also reflect the current StreamThread.State, but its value will be the name of the enum constant in lowercase.

Thread State | Metric Value
CREATED | "created"
STARTING | "starting"
PARTITIONS_REVOKED | "partitions_revoked"
PARTITIONS_ASSIGNED | "partitions_assigned"
RUNNING | "running"
PENDING_SHUTDOWN | "pending_shutdown"
DEAD | "dead"
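
A JMX client can read these string values directly. The sketch below assumes the application runs in the same JVM (so the platform MBeanServer is used) and that the JMX reporter exposes each metric as an MBean attribute named after the metric, so the string value is read from an attribute called state. The ThreadStateJmxReader class is hypothetical; for a remote process, a JMXConnector would replace the platform server.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ThreadStateJmxReader {

    // Matches every stream-thread-metrics MBean, whatever its thread-id.
    static ObjectName threadMetricsPattern() {
        try {
            return new ObjectName("kafka.streams:type=stream-thread-metrics,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Finds the thread-level MBeans currently registered on the given server.
    static Set<ObjectName> threadMBeans(MBeanServer server) {
        return server.queryNames(threadMetricsPattern(), null);
    }

    public static void main(String[] args) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : threadMBeans(server)) {
            // Assumption: the String metric is exposed as an attribute called "state".
            Object state = server.getAttribute(name, "state");
            System.out.println(name.getKeyProperty("thread-id") + " -> " + state);
        }
    }
}
```

Wildcarding the property list keeps the query working regardless of which thread ids exist at the time it runs.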

Values for recording-level metrics

The recording-level metric will map each recording level to a numeric value, as shown in the following table:

Recording Level | Metric Value
INFO | 0
DEBUG | 1
TRACE | 2
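
The mapping can be written out explicitly rather than derived from an enum ordinal, which keeps the reported values fixed even if a level enum were reordered. A minimal sketch; the RecordingLevelValue class is hypothetical, and accepting lowercase input is an assumption of the sketch:

```java
import java.util.Locale;

public class RecordingLevelValue {

    // Maps a metrics.recording.level setting to the value reported by the recording-level metric.
    static int metricValue(String configValue) {
        switch (configValue.trim().toUpperCase(Locale.ROOT)) {
            case "INFO":  return 0;
            case "DEBUG": return 1;
            case "TRACE": return 2;
            default: throw new IllegalArgumentException("Unknown recording level: " + configValue);
        }
    }

    public static void main(String[] args) {
        System.out.println(metricValue("DEBUG")); // 1
    }
}
```

Because the values grow with verbosity, a dashboard can read recording-level >= 1 as "DEBUG-level metrics are being recorded".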

Proposed Changes

These new metrics will be recorded at the INFO level, so they are available regardless of the configured recording level. The client-state and thread-state metrics will be updated whenever the client or thread state changes. The recording-level metric is determined at startup and reports the configured value of metrics.recording.level.
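
One way to keep a state gauge current is to read the state at collection time instead of copying it on every transition, while the recording level is captured once at startup. A minimal sketch with hypothetical names (StateMetricsSketch, transitionTo, and java.util.function.Supplier standing in for a metrics gauge), not the Kafka Streams implementation:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class StateMetricsSketch {

    // Hypothetical copy of KafkaStreams.State in ordinal order.
    enum ClientState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR }

    private final AtomicReference<ClientState> current = new AtomicReference<>(ClientState.CREATED);
    private final int recordingLevel; // fixed once, when the instance is created ("startup")

    StateMetricsSketch(int recordingLevel) {
        this.recordingLevel = recordingLevel;
    }

    // Called on every state transition; the gauge needs no separate update step.
    void transitionTo(ClientState newState) {
        current.set(newState);
    }

    // Gauge for client-state: evaluated at collection time, so it always sees the latest state.
    Supplier<Integer> clientStateGauge() {
        return () -> current.get().ordinal();
    }

    // Gauge for recording-level: a constant captured at startup.
    Supplier<Integer> recordingLevelGauge() {
        return () -> recordingLevel;
    }

    public static void main(String[] args) {
        StateMetricsSketch metrics = new StateMetricsSketch(0); // INFO
        Supplier<Integer> clientState = metrics.clientStateGauge();
        metrics.transitionTo(ClientState.REBALANCING);
        metrics.transitionTo(ClientState.RUNNING);
        System.out.println(clientState.get());                   // 2
        System.out.println(metrics.recordingLevelGauge().get()); // 0
    }
}
```

Reading the state lazily means the reported value cannot drift from the actual state, and the per-collection cost is a single reference read plus an ordinal lookup.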

Compatibility, Deprecation, and Migration Plan

  • No impact is expected for existing users, as no existing interfaces or behavior are being changed.
  • The overhead of reporting the state and recording-level metrics is expected to be negligible.

Test Plan

Existing metrics tests will be updated to support these new metrics.

Rejected Alternatives

N/A
