Status
Current state: Under Discussion.
Discussion thread: here
JIRA: KAFKA-20407
Motivation
Kafka transactions complete through multiple asynchronous steps coordinated by the transaction coordinator. When a producer sends EndTxn, the coordinator first appends a PREPARE_COMMIT or PREPARE_ABORT transition to the __transaction_state topic. After this append succeeds, the coordinator can respond to the client. It then continues transaction finalization by sending commit or abort markers to all partitions participating in the transaction. Once those markers complete, the coordinator appends the final COMPLETE_COMMIT or COMPLETE_ABORT transition to __transaction_state.
When this path is slow or repeatedly retried, users may observe increased EndTxn latency, slow progress for read_committed consumers, transactions remaining in prepare states, or delayed cleanup after coordinator failover.
However, today it is difficult for operators to identify which part of the transaction finalization path is responsible.
Existing request latency metrics can show that EndTxn is slow, but they do not distinguish coordinator validation, request queueing, metadata locking, and the durable append to __transaction_state. Generic broker storage and replication metrics may show disk or replica pressure, but they do not indicate whether that pressure is affecting the transaction state log specifically. Existing transaction marker queue metrics expose only selected queues, such as markers waiting for an unknown destination broker and retries of final transaction log appends, but they do not expose the number of transactions currently waiting for marker completion.
This lack of visibility makes common operational questions difficult to answer:
- Is the transaction coordinator slow because appends to __transaction_state are slow?
- Is EndTxn latency caused by the prepare transition, or is finalization delayed after the client has already received a successful response?
- Are transactions waiting because marker fan-out to data partition leaders is failing or retrying?
- Are marker completions done, but the final COMPLETE_COMMIT or COMPLETE_ABORT append to __transaction_state is failing or slow?
This KIP proposes adding transaction coordinator metrics for transaction state log append latency, append errors, marker retry errors, and transactions waiting for marker completion.
Public Interfaces
New Metrics
kafka.server:type=transaction-coordinator-metrics
These metrics are exposed through Kafka Metrics.
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| transaction-state-log-append-latency-ms | Histogram | target-state | The time in milliseconds to append a transaction state transition to __transaction_state. |
| transaction-state-log-append-errors-total | Counter | target-state, error | The count of failed __transaction_state append attempts. |
The target-state label is the transaction state being appended to __transaction_state. The error label is the Kafka error name.
kafka.coordinator.transaction:type=TransactionMarkerChannelManager
These metrics are exposed through the existing Yammer-based TransactionMarkerChannelManager metrics group.
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| TransactionsWithPendingMarkersCount | Gauge | none | The current number of transactions waiting for transaction marker completion. |
| TransactionMarkerRetriesPerSec | Meter | error | The rate of partition-level retriable transaction marker failures returned in |
The error tag is the Kafka error name returned for the partition-level marker failure.
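As a rough illustration of how an operator might address the two marker metrics over JMX, the following Java sketch builds the corresponding object names. It assumes the new metrics follow the same MBean layout as the existing Yammer metrics in this group (a name key, with tags as additional key properties); this KIP does not spell out that layout, and the class and method names here are hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka. The MBean layout is an assumption
// based on the convention used by existing Yammer-based broker metrics.
final class MarkerMetricNamesSketch {
    static final String GROUP =
        "kafka.coordinator.transaction:type=TransactionMarkerChannelManager";

    // Object name for the proposed gauge of transactions awaiting markers.
    static ObjectName pendingMarkersGauge() {
        return parse(GROUP + ",name=TransactionsWithPendingMarkersCount");
    }

    // Object name for the proposed meter, one MBean per Kafka error name.
    static ObjectName markerRetriesMeter(String error) {
        return parse(GROUP + ",name=TransactionMarkerRetriesPerSec,error=" + error);
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            // Convert to an unchecked exception to keep call sites simple.
            throw new IllegalArgumentException("Invalid metric name: " + name, e);
        }
    }
}
```

With a name like this, a monitoring tool could read the gauge or meter attributes from a broker's MBean server; which attributes are exposed depends on the metric type and the reporter in use.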
Proposed Changes
Transaction state log append metrics
The transaction coordinator will record latency and failures for transaction state transition appends to __transaction_state. The append latency metric measures the time from when the coordinator submits a transaction state transition append until that append completes. This metric is recorded only for append attempts that are actually submitted and completed.
The append error metric is incremented when a submitted transaction state transition append fails. It is tagged by the target transaction state and the Kafka error name. The target-state tag identifies the transaction state being appended and uses the transaction state enum name. Its possible values are EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, and COMPLETE_ABORT.
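The recording rules for the two append metrics can be sketched in plain Java as follows. This is a simplified model under stated assumptions, not the coordinator's implementation and not the Kafka Metrics API: the class, its methods, and the injected clock are hypothetical, and recording latency for failed appends as well as successful ones is an assumption, since the text above only requires that the append was submitted and completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical model of the proposed append metrics; not thread-safe.
final class TxnLogAppendMetricsSketch {
    // target-state -> append latencies in milliseconds
    final Map<String, List<Long>> latenciesMs = new HashMap<>();
    // "target-state/error" -> number of failed appends
    final Map<String, Long> errorsTotal = new HashMap<>();
    private final LongSupplier clockMs;

    TxnLogAppendMetricsSketch(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    // Called only when an append is actually submitted; returns its start time.
    long onSubmit() {
        return clockMs.getAsLong();
    }

    // Called when a submitted append completes; error is null on success.
    void onComplete(String targetState, long submitMs, String error) {
        long elapsedMs = clockMs.getAsLong() - submitMs;
        // Assumption: latency is recorded for both successful and failed completions.
        latenciesMs.computeIfAbsent(targetState, k -> new ArrayList<>()).add(elapsedMs);
        if (error != null) {
            errorsTotal.merge(targetState + "/" + error, 1L, Long::sum);
        }
    }
}
```

Because nothing is recorded unless onSubmit and onComplete are both called, a transition that is rejected before the append is submitted leaves both metrics untouched.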
Transaction marker metrics
The transaction coordinator will expose the number of transactions currently waiting for transaction marker completion. This count represents transactions, not marker entries or partitions. A transaction involving many partitions contributes one to this metric while it is waiting for marker completion. This metric does not include transactions that have already completed marker fan-out and are retrying the final transaction state log append. Such transactions are already represented by the existing final log append retry queue metric.
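The counting rule above (one per transaction, regardless of partition count) can be illustrated with a small Java model. The class and method names are hypothetical and the bookkeeping is deliberately simplified; it is not how TransactionMarkerChannelManager tracks pending markers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed gauge; not thread-safe.
final class PendingMarkerTxnsSketch {
    // transactional id -> partitions whose marker has not completed yet
    private final Map<String, Set<String>> pending = new HashMap<>();

    // Register a transaction once its markers are queued for fan-out.
    void register(String transactionalId, Set<String> partitions) {
        if (!partitions.isEmpty()) {
            pending.put(transactionalId, new HashSet<>(partitions));
        }
    }

    // Record one completed partition marker; drop the transaction when none remain.
    void markerCompleted(String transactionalId, String partition) {
        Set<String> remaining = pending.get(transactionalId);
        if (remaining == null) {
            return;
        }
        remaining.remove(partition);
        if (remaining.isEmpty()) {
            // From here on the transaction belongs to the final log append path
            // and no longer counts as waiting for markers.
            pending.remove(transactionalId);
        }
    }

    // Gauge value: number of transactions, not partitions or marker entries.
    int transactionsWithPendingMarkersCount() {
        return pending.size();
    }
}
```

In this model a transaction spanning three partitions contributes one to the gauge until its last marker completes, at which point it drops out even if the final state append is later retried.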
The transaction coordinator will also meter partition-level retriable transaction marker failures by Kafka error name. This meter is marked once for each partition that returns a retriable marker error in a WriteTxnMarkers response. Retries caused by a disconnected response are not included in this metric because they are not associated with a partition-level Kafka error. Such retries may be covered by existing network/client metrics. The error tag identifies the Kafka error that caused the retry. The error label uses the Kafka error enum name, such as NOT_ENOUGH_REPLICAS_AFTER_APPEND.
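The marking rule can be sketched in Java as below. The class is hypothetical, it counts marks instead of computing a rate as a real meter would, and the set of retriable error names is supplied by the caller because this KIP does not enumerate them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed meter; counts marks per error name.
final class MarkerRetryMeterSketch {
    private final Set<String> retriableErrors;
    // Kafka error name -> number of marks
    final Map<String, Long> marksByError = new HashMap<>();

    // retriableErrors is an assumption supplied by the caller.
    MarkerRetryMeterSketch(Set<String> retriableErrors) {
        this.retriableErrors = Set.copyOf(retriableErrors);
    }

    // partitionErrors maps each partition in one response to its error name.
    void onResponse(Map<String, String> partitionErrors) {
        for (String error : partitionErrors.values()) {
            if (retriableErrors.contains(error)) {
                // One mark per partition that returned a retriable error.
                marksByError.merge(error, 1L, Long::sum);
            }
        }
    }

    // A disconnect carries no partition-level error, so nothing is marked.
    void onDisconnect() {
    }
}
```

Successful partitions and non-retriable errors fall through without a mark, so the meter reflects only the partition-level failures that cause a marker retry.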
Compatibility, Deprecation, and Migration Plan
This change only adds new metrics. No existing metrics or APIs are deprecated.
This KIP adds transaction state log metrics using Kafka Metrics, consistent with the existing transaction-coordinator-metrics group. It adds marker-channel metrics to the existing Yammer-based TransactionMarkerChannelManager metrics group so that they remain colocated with existing marker queue metrics, such as UnknownDestinationQueueSize and LogAppendRetryQueueSize. This KIP does not migrate existing Yammer-based TransactionMarkerChannelManager metrics to Kafka Metrics. Such migration is outside the scope of this KIP and can be addressed separately.
Test Plan
Unit tests will be added or updated to verify the following behavior.
- Transaction state log append metrics
  - Verify that transaction-state-log-append-latency-ms is recorded when a transaction state append is submitted and completes successfully.
  - Verify that transaction-state-log-append-latency-ms is not recorded when the append is not submitted.
  - Verify that the latency metric is tagged with the expected target-state.
  - Verify that transaction-state-log-append-errors-total is incremented when a submitted transaction state append completes with an error.
  - Verify that the append error metric is tagged with the expected target-state and error.
  - Verify that no append error metric is recorded for successful appends.
- Transaction marker metrics
  - Verify that TransactionsWithPendingMarkersCount reflects the number of transactions waiting for marker completion.
  - Verify that TransactionsWithPendingMarkersCount increases when a transaction is registered for marker completion.
  - Verify that TransactionsWithPendingMarkersCount decreases when marker completion is no longer pending.
  - Verify that TransactionsWithPendingMarkersCount does not include transactions that have moved to the final transaction state log append retry queue.
  - Verify that TransactionMarkerRetriesPerSec is marked once for each partition-level retriable marker failure.
  - Verify that TransactionMarkerRetriesPerSec is tagged with the expected error.
  - Verify that non-retriable marker errors do not mark TransactionMarkerRetriesPerSec.
  - Verify that metrics registered by the transaction marker channel manager are removed when the manager shuts down.
  - Verify that transaction marker metrics are cleaned up correctly when a transaction topic partition is unloaded or the coordinator resigns.
Rejected Alternatives
Add transaction-count{state}
A transaction state count metric can be useful as diagnostic context, but it does not directly identify the bottleneck in transaction completion. For example, a high number of PREPARE_COMMIT or PREPARE_ABORT transactions may indicate marker backlog, but it does not distinguish marker fan-out delay from final __transaction_state append delay.
Add exact pending transaction age
An exact current-age gauge would require tracking the oldest pending transaction or scanning pending transactions when queried. Scanning is undesirable for high-cardinality transaction workloads, and maintaining exact oldest age adds complexity around removals, retries, coordinator failover, and partition migration.
Add per-transactional-id metrics
Per-transactional-id metrics would create unbounded cardinality and are not appropriate for broker metrics.
Add transaction completion latency
This KIP does not add a transaction completion latency metric. Defining this metric consistently across transaction coordinator failover is ambiguous. A transaction may enter the prepare state on one coordinator and be completed by another coordinator after loading the transaction state from __transaction_state. A coordinator-local latency would be implementable, but it would not represent end-to-end transaction completion time across failover. A prepare-state-duration metric could include failover time, but it would measure a different concept and mix marker processing, coordinator loading, and failover delay.