Current state: Vote Passed
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
KStream looks a lot like
java.util.stream.Stream; it includes methods such as
map, terminal operations like
to, and so on. One conspicuously absent method is
Most stream operations are expected to be pure functions.
peek, by its nature, is not. Usually mutating external state is discouraged but a number of diagnostic activities are made much easier.
For example, you might want to write:
KStream<String> words = someWordsWeWantToCount(); // since this is what basically every kafka application does, right? :) words.filter((w, v) -> w.startsWith("a")) .peek(w -> metrics.wordsStartingWithAProcessed.increment()) .peek(System.out::println) .groupByKey() .count();
In this particular case, we increment a metric counter (understanding that in failure cases some records may be double-counted due to retried work) and print every word observed out to
System.out. This provides useful information about the processing of the stream as opposed to the contents of the stream, which can be very useful when experimenting with stateful processors or otherwise diagnosing things that went wrong.
Additionally, the JDK is known for a high bar to include methods in the base class library.
Stream#peek made that cut. So that is pretty good evidence that some really smart Java API designers agree that it holds its weight.
A straightforward first pass is on GitHub PR #2493.
No compatibility issues forseen.
peek by its nature may be implemented as a simple transformation of
map. Unfortunately this is the wrong interface (as you need to return the same value passed in for the functional interface contract) and loses some semantic information that could be used for a potential future optimizer (that the stream is not modified by the operation).