  • record-e2e-latency-min [ms]
  • record-e2e-latency-max [ms]
  • record-e2e-latency-p99 [ms]
  • record-e2e-latency-p90 [ms]
  • record-e2e-latency-avg [ms]

These will be exposed at the processor-node level and the task level with the following tags:


In all cases the metrics will be computed at the end of the operation, once processing has completed.
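A minimal sketch of this recording step, assuming the latency of a record is the wallclock time at the end of processing minus the record timestamp. The class `E2eLatencyTracker` and its methods are hypothetical names used for illustration; they are not part of the Kafka Streams API, and the real metrics would be recorded through the Streams metrics machinery rather than plain fields.

```java
// Hypothetical helper, NOT part of Kafka Streams: tracks min, max and avg
// end-to-end latency for the records seen by one node or task.
final class E2eLatencyTracker {
    private long minMs = Long.MAX_VALUE;
    private long maxMs = Long.MIN_VALUE;
    private long sumMs = 0L;
    private long count = 0L;

    // Call once processing of the record has completed.
    // Assumption: latency = wallclock time now - record timestamp.
    void record(long recordTimestampMs, long wallclockNowMs) {
        long latencyMs = wallclockNowMs - recordTimestampMs;
        minMs = Math.min(minMs, latencyMs);
        maxMs = Math.max(maxMs, latencyMs);
        sumMs += latencyMs;
        count++;
    }

    // Returns Long.MAX_VALUE if nothing has been recorded yet.
    long minMs() {
        return minMs;
    }

    // Returns Long.MIN_VALUE if nothing has been recorded yet.
    long maxMs() {
        return maxMs;
    }

    // Returns NaN if nothing has been recorded yet.
    double avgMs() {
        return count == 0 ? Double.NaN : (double) sumMs / count;
    }

    public static void main(String[] args) {
        E2eLatencyTracker tracker = new E2eLatencyTracker();
        tracker.record(1000L, 1015L); // latency 15 ms
        tracker.record(2000L, 2040L); // latency 40 ms
        tracker.record(3000L, 3005L); // latency 5 ms
        System.out.println("min=" + tracker.minMs()
                + " max=" + tracker.maxMs()
                + " avg=" + tracker.avgMs());
    }
}
```

Calling `record` only after the operation finishes means the measured latency includes the processing time of that operation itself.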


The min and max task-level INFO metrics were added in 2.6, and the remaining metrics will ship in the next version.

Proposed Changes

Imagine a simple 4-node subtopology with source node O, filter node F, aggregation node A, and sink node I. For any record flowing through this subtopology with record timestamp t, let tO be the system (wallclock) time when it is sent from the source topic, tA be the time when the aggregator node finishes processing it, and tI be the time when it leaves the sink node for the output or repartition topic. The end-to-end latency at operator for a given record is defined as 
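Read together with the definitions of t, tO, tA and tI above, one plausible formalization is the wallclock time at a node minus the record timestamp. This is an assumption made for illustration, and the symbol L is introduced here only as shorthand:

```latex
\begin{align*}
L_O(t) &= t_O - t \\
L_A(t) &= t_A - t \\
L_I(t) &= t_I - t
\end{align*}
```

Under this reading, a record with timestamp t = 1000 ms that leaves the sink node at wallclock time tI = 1040 ms would have an end-to-end latency of 40 ms at the sink.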


This idea was originally discussed but ultimately put to rest, as it does not address the specific goal set out in this KIP: to report the time for an event to be reflected in the output. This alternative metric, which we call "staleness", has some use as a gauge of the record time when received by an operator, which may have implications for how some operators process it. However, this issue is orthogonal, and the alternative was thus rejected in favor of measuring at the record output.

Reporting mean or median (p50)

Rejected because: