Status
Current state: Under Discussion
Discussion thread: here
JIRA: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Kafka Streams offers a comprehensive set of metrics for analyzing an application. Although the health of a Kafka Streams application can be deduced from these metrics, operators need explicit status metrics for better insight into the application's health. In particular, operators will want to know the current state (CREATED, STARTING, PARTITIONS_REVOKED, PARTITIONS_ASSIGNED, RUNNING, PENDING_SHUTDOWN, DEAD) of each StreamThread in the client instance and the state (CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR) of the client instance itself. While KIP-1076 extended the use of KIP-714 to provide additional application metrics, those metrics must map to either a Gauge (up/down counter) or a Sum (counter). The existing Kafka Streams client-level state metric is a String, so it can't be collected via the KIP-1076 extension. To that end, we'll introduce a new numeric metric for the client-level state and take the opportunity to add a similar numeric metric for the thread-level state. Since the thread-level state will also be of interest to users leveraging JMX metrics, we'll also add a thread-level metric with a String value available only via JMX.
Additionally, Kafka Streams provides several metrics with fine-grained details. Since recording all these metrics would result in a performance hit, Kafka Streams does not record all metrics out of the box. There are different recording levels (info, debug, and trace), set by the configuration metrics.recording.level. To avoid operator confusion or frustration regarding what metrics to expect, being able to observe the current recording level quickly is essential.
This KIP introduces new metrics to support Kafka Streams operators in quickly assessing an application's current health.
Public Interfaces
New Metrics
The following table provides the name, recording level, type, and description of each new metric, along with its group, tags, converted telemetry name, and MBean name.
Name | Recording Level | Metric Type | Description | Group | Tags | Telemetry Name | MBean Name |
---|---|---|---|---|---|---|---|
client-state | INFO | Gauge | The current state of the Kafka Streams instance | stream-metrics | client-id, process-id | org.apache.kafka.stream.client.state | kafka.streams:type=stream-metrics,client-id=([-.\w]+), process-id=([-.\w]+) |
thread-state | INFO | Gauge | The current state of the StreamThread | stream-thread-metrics | thread-id | org.apache.kafka.stream.thread.thread.state | kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+) |
state | INFO | String | The current state of the StreamThread | stream-thread-metrics | thread-id | N/A JMX only | kafka.streams:type=stream-thread-metrics,thread-id=([-.\w]+) |
recording-level | INFO | Integer | The level of metrics recording | stream-metrics | client-id, process-id | org.apache.kafka.stream.recording.level | kafka.streams:type=stream-metrics,client-id=([-.\w]+), process-id=([-.\w]+) |
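As an illustration of how the MBean Name patterns in the table might be used, the following minimal sketch matches a thread-level MBean name against the pattern from the table and extracts the thread-id tag. The class name ThreadMBeanNames, the method threadId, and the sample thread id are hypothetical and only serve the example; the literal dot in kafka.streams is escaped here, which the table leaves implicit.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreadMBeanNames {
    // Pattern taken from the MBean Name column for the thread-level metrics,
    // with the dot in "kafka.streams" escaped so it only matches a literal dot.
    private static final Pattern THREAD_MBEAN =
        Pattern.compile("kafka\\.streams:type=stream-thread-metrics,thread-id=([-.\\w]+)");

    // Returns the thread-id tag if the whole name matches the pattern, otherwise empty.
    static Optional<String> threadId(String mbeanName) {
        Matcher m = THREAD_MBEAN.matcher(mbeanName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // "my-app-StreamThread-1" is a made-up thread id used for illustration only.
        String name = "kafka.streams:type=stream-thread-metrics,thread-id=my-app-StreamThread-1";
        System.out.println(threadId(name).orElse("no match"));
    }
}
```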
Values for client-state metrics
The client-state metric will reflect the current state of the Kafka Streams client instance, represented by the value that KafkaStreams.State.ordinal() returns for the current state.
Client State | Metric Value |
---|---|
CREATED | 0 |
REBALANCING | 1 |
RUNNING | 2 |
PENDING_SHUTDOWN | 3 |
NOT_RUNNING | 4 |
PENDING_ERROR | 5 |
ERROR | 6 |
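A consumer of the numeric client-state metric will typically want to turn the value back into a state name. The sketch below shows one way this could look. It uses a local enum, ClientState, as a stand-in that mirrors the table above so the example runs without Kafka on the classpath; the class ClientStateDecoder and the method fromMetricValue are hypothetical names, not part of this proposal.

```java
final class ClientStateDecoder {
    // Local stand-in that mirrors the table above (an assumption for this sketch).
    // With Kafka on the classpath, KafkaStreams.State would take its place.
    enum ClientState {
        CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR
    }

    // Maps a reported metric value back to the state with the same ordinal.
    static ClientState fromMetricValue(int value) {
        ClientState[] states = ClientState.values();
        if (value < 0 || value >= states.length) {
            throw new IllegalArgumentException("Unknown client-state value: " + value);
        }
        return states[value];
    }

    public static void main(String[] args) {
        System.out.println(fromMetricValue(2)); // prints RUNNING
    }
}
```

Because the metric value is the enum ordinal, a decoder like this only stays correct as long as the order of the enum constants matches the table.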
Values for thread-state metrics (KIP-1076 extension)
The thread-state metric will reflect the current state of a StreamThread instance, represented by the value that StreamThread.State.ordinal() returns for the current state.
Thread State | Metric Value |
---|---|
CREATED | 0 |
STARTING | 1 |
PARTITIONS_REVOKED | 2 |
PARTITIONS_ASSIGNED | 3 |
RUNNING | 4 |
PENDING_SHUTDOWN | 5 |
DEAD | 6 |
Values for thread-state metrics (JMX)
The thread-level state metric for JMX reporters (named state in the New Metrics table) will also reflect the current StreamThread.State, but its value will be the enum name as a lowercase string.
Thread State | Metric Value |
---|---|
CREATED | "created" |
STARTING | "starting" |
PARTITIONS_REVOKED | "partitions_revoked" |
PARTITIONS_ASSIGNED | "partitions_assigned" |
RUNNING | "running" |
PENDING_SHUTDOWN | "pending_shutdown" |
DEAD | "dead" |
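The two thread-state tables describe the same enum in two forms. The following sketch, under the assumption that the JMX value is simply the lowercase enum name as the table suggests, derives both values from one enum. ThreadState is a local stand-in for StreamThread.State, and ThreadStateValues, numericValue, and jmxValue are hypothetical names used only for illustration.

```java
import java.util.Locale;

final class ThreadStateValues {
    // Local stand-in for StreamThread.State, declared in the order used by the tables above.
    enum ThreadState {
        CREATED, STARTING, PARTITIONS_REVOKED, PARTITIONS_ASSIGNED, RUNNING, PENDING_SHUTDOWN, DEAD
    }

    // Numeric form, as in the thread-state (KIP-1076 extension) table.
    static int numericValue(ThreadState state) {
        return state.ordinal();
    }

    // String form, as in the JMX table: the enum name in lowercase (assumed derivation).
    static String jmxValue(ThreadState state) {
        return state.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (ThreadState state : ThreadState.values()) {
            System.out.println(state + " -> " + numericValue(state) + " / " + jmxValue(state));
        }
    }
}
```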
Values for recording-level metrics
The recording-level metric will report the configured recording level as one of the values in the following table.
Recording level | Metric Value |
---|---|
INFO | 0 |
DEBUG | 1 |
TRACE | 2 |
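The mapping in the table can be sketched as a small lookup from the configured metrics.recording.level value to the reported number. This is an illustrative sketch only: the class RecordingLevelMetric and the method metricValue are hypothetical, and matching the configured string case-insensitively is an assumption of the example.

```java
import java.util.Locale;

final class RecordingLevelMetric {
    // Translates a configured recording level into the metric value from the table above.
    // Case-insensitive matching is an assumption of this sketch.
    static int metricValue(String configuredLevel) {
        switch (configuredLevel.toUpperCase(Locale.ROOT)) {
            case "INFO":
                return 0;
            case "DEBUG":
                return 1;
            case "TRACE":
                return 2;
            default:
                throw new IllegalArgumentException("Unknown recording level: " + configuredLevel);
        }
    }

    public static void main(String[] args) {
        System.out.println(metricValue("debug")); // prints 1
    }
}
```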
Proposed Changes
These new metrics will be available at the INFO level. Both the client-state and thread-state metrics will be updated whenever the client or thread state changes. The recording-level metric is determined on startup and contains the configured metrics recording level.
Compatibility, Deprecation, and Migration Plan
- No impact is expected for existing users, as no existing interfaces or behavior is being changed.
- The expected performance impact of reporting the state and recording-level metrics is negligible.
Test Plan
Existing metrics tests will be updated to support these new metrics.
Rejected Alternatives
N/A