Current state: Under Discussion
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
The documentation says that the ReplicationBytes(Out|In)PerSec metrics report "byte out/in (to the other brokers) rate per topic". According to the code, however, this statement is misleading: these metrics are actually reported only per broker. This mismatch between the documentation and the implementation might cause problems for users who rely on these metrics. For example, as can be seen here, Cruise Control relies on them to build a topic workload model. (It looks like the CC authors are already aware that ReplicationBytes(Out|In)PerSec do not yet report per-topic metrics.)
Thus, I'm proposing to align the implementation with the documentation and make ReplicationBytes(Out|In)PerSec report per-topic metrics.
Alternatively, we could align the documentation with the implementation and state that ReplicationBytes(Out|In)PerSec report only per-broker metrics. I've already opened a PR for that. (I'll close the PR if this KIP is accepted.)
MBeans will change as shown below:
Extend the current implementation of ReplicationBytes(Out|In)PerSec to include per-topic metrics.
Compatibility, Deprecation, and Migration Plan
Since the proposed changes won't remove any existing metrics, there shouldn't be any compatibility issues.
Newly added metrics can be tested using a JMX tool. I think it's also possible to implement integration tests (I'm not sure about this yet, but I'm ready to look into it).
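As a rough illustration of the JMX check mentioned above, here is a minimal sketch that looks for per-topic variants of a replication metric. It assumes the new MBeans would carry a `topic=` key alongside the existing `kafka.server:type=BrokerTopicMetrics,name=...` name, in the same way as other per-topic broker metrics; the final names are whatever this KIP settles on. `ReplicationMetricsCheck`, `FakeMeter`, and the topic `orders` are hypothetical stand-ins so that the example runs without a broker.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class ReplicationMetricsCheck {

    // Hypothetical stand-in for a broker meter, so the sketch runs without Kafka.
    public interface FakeMeterMBean {
        double getOneMinuteRate();
    }

    public static class FakeMeter implements FakeMeterMBean {
        @Override
        public double getOneMinuteRate() {
            return 0.0;
        }
    }

    // Assumed naming: the per-topic MBean adds a "topic" key to the broker-level name.
    public static ObjectName perTopicPattern(String metric) throws Exception {
        return new ObjectName(
            "kafka.server:type=BrokerTopicMetrics,name=" + metric + ",topic=*");
    }

    // Returns the topics for which a per-topic MBean of the given metric is registered.
    public static Set<String> topicsWithMetric(MBeanServerConnection conn, String metric)
            throws Exception {
        Set<String> topics = new TreeSet<>();
        for (ObjectName name : conn.queryNames(perTopicPattern(metric), null)) {
            topics.add(name.getKeyProperty("topic"));
        }
        return topics;
    }

    // Registers a fake meter under the given name unless one is already registered.
    public static void registerFake(MBeanServer server, String objectName) throws Exception {
        ObjectName name = new ObjectName(objectName);
        if (!server.isRegistered(name)) {
            server.registerMBean(new FakeMeter(), name);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        String base = "kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec";
        registerFake(server, base);                  // broker-level metric (exists today)
        registerFake(server, base + ",topic=orders"); // per-topic metric (assumed name)
        // Only the name carrying a topic key matches the pattern.
        System.out.println(topicsWithMetric(server, "ReplicationBytesInPerSec"));
    }
}
```

Against a real broker, the same `topicsWithMetric` call could be given an `MBeanServerConnection` obtained through `javax.management.remote.JMXConnectorFactory`, assuming remote JMX is enabled on the broker. An empty result would suggest that only the broker-level metric is being reported.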