Status

Current state: Under Discussion

Discussion thread: here 

JIRA: KAFKA-20407 


Motivation

Kafka transactions complete through multiple asynchronous steps coordinated by the transaction coordinator. When a producer sends EndTxn, the coordinator first appends a PREPARE_COMMIT or PREPARE_ABORT transition to the __transaction_state topic. After this append succeeds, the coordinator can respond to the client, and it then continues transaction finalization by sending commit or abort markers to all partitions participating in the transaction. Once those markers complete, the coordinator appends the final COMPLETE_COMMIT or COMPLETE_ABORT transition to __transaction_state.

When this path is slow or repeatedly retried, users may observe increased EndTxn latency, slow progress for read_committed consumers, transactions remaining in prepare states, or delayed cleanup after coordinator failover. Today, however, it is difficult for operators to identify which part of the transaction finalization path is responsible.

Existing request latency metrics can show that EndTxn is slow, but they do not distinguish among coordinator validation, request queueing, metadata locking, and the durable append to __transaction_state. Generic broker storage and replication metrics may show disk or replica pressure, but they do not indicate whether that pressure is affecting the transaction state log specifically. Existing transaction marker queue metrics expose only selected queues, such as markers waiting for an unknown destination broker and retries of final transaction log appends, but they do not expose the number of transactions currently waiting for marker completion.


This lack of visibility makes common operational questions difficult to answer:

  1. Is the transaction coordinator slow because appends to __transaction_state are slow?
  2. Is EndTxn latency caused by the prepare transition, or is finalization delayed after the client has already received a successful response?
  3. Are transactions waiting because marker fan-out to data partition leaders is failing or retrying?
  4. Are marker completions done, but the final COMPLETE_COMMIT or COMPLETE_ABORT append to __transaction_state is failing or slow?


This KIP proposes adding transaction coordinator metrics for transaction state log append latency, append errors, marker retry errors, and transactions waiting for marker completion.


Public Interfaces

New Metrics

kafka.server:type=transaction-coordinator-metrics

These metrics are exposed through Kafka Metrics.

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| transaction-state-log-append-latency-ms | Histogram | target-state | The time in milliseconds to append a transaction state transition to __transaction_state. |
| transaction-state-log-append-errors-total | Counter | target-state, error | The count of failed __transaction_state append attempts. |

The target-state label is the transaction state being appended to __transaction_state. The error label is the Kafka error name.
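As an illustration of how a monitoring tool might select these metrics over JMX, the sketch below builds an ObjectName pattern for the group and checks that a tagged MBean name matches it. It assumes the usual Kafka Metrics JMX naming, in which tags appear as key properties after type=...; the exact MBean names produced for the new metrics are an assumption here, and AppendMetricMBeans is a hypothetical helper, not part of Kafka.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper; the tag-to-key-property layout is an assumption.
final class AppendMetricMBeans {
    static final String GROUP = "kafka.server:type=transaction-coordinator-metrics";

    // Pattern matching every MBean in the group, whatever its tags.
    static ObjectName anyTags() {
        return parse(GROUP + ",*");
    }

    // Name of the MBean carrying one specific target-state tag value.
    static ObjectName forTargetState(String targetState) {
        return parse(GROUP + ",target-state=" + targetState);
    }

    private static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            // Convert to unchecked so callers can use the helper in lambdas.
            throw new IllegalArgumentException(name, e);
        }
    }
}
```

A pattern such as anyTags() could be passed to MBeanServerConnection.queryNames to enumerate every target-state and error combination that the broker has registered so far.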


kafka.coordinator.transaction:type=TransactionMarkerChannelManager

These metrics are exposed through the existing Yammer-based TransactionMarkerChannelManager metrics group.

| Metric Name | Type | Labels | Description |
|---|---|---|---|
| TransactionsWithPendingMarkersCount | Gauge | none | The current number of transactions waiting for transaction marker completion. |
| TransactionMarkerRetriesPerSec | Meter | error | The rate of partition-level retriable transaction marker failures returned in WriteTxnMarkers responses. The meter count represents the cumulative total. |

The error tag is the Kafka error name returned for the partition-level marker failure.


Proposed Changes

Transaction state log append metrics

The transaction coordinator will record latency and failures for transaction state transition appends to __transaction_state. The append latency metric measures the time from when the coordinator submits a transaction state transition append until that append completes. This metric is recorded only for append attempts that are actually submitted and completed.
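The timing rule above can be sketched as follows. This is a minimal illustration, not the proposed implementation: AppendLatencyRecorder and its methods are hypothetical names, raw samples stand in for a real histogram, and the sketch records a sample on any completion without deciding whether failed completions should count.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.LongSupplier;

// Hypothetical recorder: keeps raw latency samples per target-state tag.
final class AppendLatencyRecorder {
    private final Map<String, List<Long>> samplesMs = new ConcurrentHashMap<>();
    private final LongSupplier clockMs;

    AppendLatencyRecorder(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    // Call only when the append is actually submitted. The returned callback
    // must be run when the append completes; if it is never run, nothing is
    // recorded, so appends that are never submitted leave no sample.
    Runnable onSubmitted(String targetState) {
        long startMs = clockMs.getAsLong();
        return () -> samplesMs
            .computeIfAbsent(targetState, k -> new CopyOnWriteArrayList<>())
            .add(clockMs.getAsLong() - startMs);
    }

    List<Long> samples(String targetState) {
        return samplesMs.getOrDefault(targetState, List.of());
    }
}
```

Taking the start timestamp at submission, rather than at request arrival, keeps validation, queueing, and lock wait time out of the measurement, which is what lets the metric isolate the durable append itself.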


The append error metric is incremented when a submitted transaction state transition append fails. It is tagged by the target transaction state and the Kafka error name. The target-state tag identifies the transaction state being appended and uses the transaction state enum name. Its possible values are EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, and COMPLETE_ABORT.
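A counter keyed by the two tags might look like the sketch below. The names AppendTargetState and AppendErrorCounters are hypothetical, the enum simply mirrors the values listed above, and the error names used in the usage note are examples only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Mirrors the target-state values listed in the text above.
enum AppendTargetState {
    EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT, COMPLETE_ABORT
}

// Hypothetical counter set: one cumulative count per (target-state, error) pair.
final class AppendErrorCounters {
    private final Map<Map.Entry<AppendTargetState, String>, LongAdder> counts =
        new ConcurrentHashMap<>();

    // Called once for each submitted append that completes with an error.
    void recordFailure(AppendTargetState targetState, String errorName) {
        counts.computeIfAbsent(Map.entry(targetState, errorName), k -> new LongAdder())
              .increment();
    }

    // Returns 0 for tag pairs that have never failed.
    long count(AppendTargetState targetState, String errorName) {
        LongAdder adder = counts.get(Map.entry(targetState, errorName));
        return adder == null ? 0L : adder.sum();
    }
}
```

For example, calling recordFailure(AppendTargetState.PREPARE_COMMIT, "NOT_ENOUGH_REPLICAS") twice leaves that pair at 2 while every other pair stays at 0. Because both tags come from fixed enums, the number of series stays bounded.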


Transaction marker metrics

The transaction coordinator will expose the number of transactions currently waiting for transaction marker completion. This count represents transactions, not marker entries or partitions. A transaction involving many partitions contributes one to this metric while it is waiting for marker completion. This metric does not include transactions that have already completed marker fan-out and are retrying the final transaction state log append. Such transactions are already represented by the existing final log append retry queue metric.
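The counting rule can be illustrated with a small tracker. This is a sketch under assumptions: PendingMarkerTracker is a hypothetical class, partitions are modeled as plain strings, and the real coordinator's bookkeeping may be organized differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker: one map entry per transaction awaiting markers.
final class PendingMarkerTracker {
    // transactional id -> partitions whose marker has not completed yet
    private final Map<String, Set<String>> pending = new ConcurrentHashMap<>();

    void register(String transactionalId, Set<String> partitions) {
        Set<String> remaining = ConcurrentHashMap.newKeySet();
        remaining.addAll(partitions);
        if (!remaining.isEmpty()) {
            pending.put(transactionalId, remaining);
        }
    }

    // Drops the transaction once its last partition completes, which is the
    // point where it would move on to the final state log append.
    void markerCompleted(String transactionalId, String partition) {
        pending.computeIfPresent(transactionalId, (id, remaining) -> {
            remaining.remove(partition);
            return remaining.isEmpty() ? null : remaining;
        });
    }

    // Gauge value: number of transactions, not partitions or marker entries.
    int transactionsWithPendingMarkersCount() {
        return pending.size();
    }
}
```

Reading the gauge as the map size makes it an O(1) operation and keeps a transaction that spans many partitions counted exactly once.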


The transaction coordinator will also meter partition-level retriable transaction marker failures by Kafka error name. This meter is marked once for each partition that returns a retriable marker error in a WriteTxnMarkers response. Retries caused by a disconnected response are not included in this metric because they are not associated with a partition-level Kafka error. Such retries may be covered by existing network/client metrics. The error tag identifies the Kafka error that caused the retry and uses the Kafka error enum name, such as NOT_ENOUGH_REPLICAS_AFTER_APPEND.
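The marking rule can be sketched as below. MarkerRetryMeter is a hypothetical class, the RETRIABLE set is an illustrative subset rather than the coordinator's actual classification, and only the cumulative count of a meter is modeled, not its rate.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical meter: cumulative retry counts keyed by the error tag.
final class MarkerRetryMeter {
    // Illustrative subset only; the real retriable set is defined by the
    // coordinator's response handling, not by this sketch.
    private static final Set<String> RETRIABLE = Set.of(
        "NOT_ENOUGH_REPLICAS", "NOT_ENOUGH_REPLICAS_AFTER_APPEND", "NOT_LEADER_OR_FOLLOWER");

    private final Map<String, LongAdder> retriesByError = new ConcurrentHashMap<>();

    // Marks once per partition whose error is retriable; successes ("NONE")
    // and non-retriable errors are ignored.
    void onResponse(Map<String, String> errorByPartition) {
        for (String error : errorByPartition.values()) {
            if (RETRIABLE.contains(error)) {
                retriesByError.computeIfAbsent(error, k -> new LongAdder()).increment();
            }
        }
    }

    // A disconnect carries no partition-level error, so nothing is marked.
    void onDisconnected() {
        // intentionally empty
    }

    long count(String error) {
        LongAdder adder = retriesByError.get(error);
        return adder == null ? 0L : adder.sum();
    }
}
```

With this shape, a single response that reports the same retriable error for two partitions marks the meter twice, matching the per-partition rule stated above.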


Compatibility, Deprecation, and Migration Plan

This change only adds new metrics. No existing metrics or APIs are deprecated.

This KIP adds transaction state log metrics using Kafka Metrics, consistent with the existing transaction-coordinator-metrics group. It adds marker-channel metrics to the existing Yammer-based TransactionMarkerChannelManager metrics group so that they remain colocated with existing marker queue metrics, such as UnknownDestinationQueueSize and LogAppendRetryQueueSize. This KIP does not migrate existing Yammer-based TransactionMarkerChannelManager metrics to Kafka Metrics. Such migration is outside the scope of this KIP and can be addressed separately.


Test Plan

Unit tests will be added or updated to verify the following behavior.

  • Transaction state log append metrics
    • Verify that transaction-state-log-append-latency-ms is recorded when a transaction state append is submitted and completes successfully.
    • Verify that transaction-state-log-append-latency-ms is not recorded when the append is not submitted.
    • Verify that the latency metric is tagged with the expected target-state.
    • Verify that transaction-state-log-append-errors-total is incremented when a submitted transaction state append completes with an error.
    • Verify that the append error metric is tagged with the expected target-state and error.
    • Verify that no append error metric is recorded for successful appends.
  • Transaction marker metrics
    • Verify that TransactionsWithPendingMarkersCount reflects the number of transactions waiting for marker completion.
    • Verify that TransactionsWithPendingMarkersCount increases when a transaction is registered for marker completion.
    • Verify that TransactionsWithPendingMarkersCount decreases when marker completion is no longer pending.
    • Verify that TransactionsWithPendingMarkersCount does not include transactions that have moved to the final transaction state log append retry queue.
    • Verify that TransactionMarkerRetriesPerSec is marked once for each partition-level retriable marker failure.
    • Verify that TransactionMarkerRetriesPerSec is tagged with the expected error.
    • Verify that non-retriable marker errors do not mark TransactionMarkerRetriesPerSec.
    • Verify that metrics registered by the transaction marker channel manager are removed when the manager shuts down.
    • Verify that transaction marker metrics are cleaned up correctly when a transaction topic partition is unloaded or the coordinator resigns.


Rejected Alternatives

Add transaction-count{state}

A transaction state count metric can be useful as diagnostic context, but it does not directly identify the bottleneck in transaction completion. For example, a high number of PREPARE_COMMIT or PREPARE_ABORT transactions may indicate marker backlog, but it does not distinguish marker fan-out delay from final __transaction_state append delay.


Add exact pending transaction age

An exact current-age gauge would require tracking the oldest pending transaction or scanning pending transactions when queried.  Scanning is undesirable for high-cardinality transaction workloads, and maintaining exact oldest age adds complexity around removals, retries, coordinator failover, and partition migration.


Add per-transactional-id metrics

Per-transactional-id metrics would create unbounded cardinality and are not appropriate for broker metrics.


Add transaction completion latency

This KIP does not add a transaction completion latency metric. Defining this metric consistently across transaction coordinator failover is ambiguous. A transaction may enter the prepare state on one coordinator and be completed by another coordinator after loading the transaction state from __transaction_state. A coordinator-local latency would be implementable, but it would not represent end-to-end transaction completion time across failover. A prepare-state-duration metric could include failover time, but it would measure a different concept and mix marker processing, coordinator loading, and failover delay.
