This is based on several discussions on the user@ and dev@ mailing lists. The latest discussion is linked here.

Thanks to everyone who participated in the discussion, and special thanks to Abhishek Agarwal for sorting out the wishlist for the new metrics feature.

I present these as a wishlist because I don't consider them hard requirements for the new metrics. We can drop some of them if the new metrics feature can't handle them by design. The new metrics feature should first focus on resolving the limitations of the current metrics feature.

1. Aggregation at component level (average, sum, etc.)
2. Blacklist/whitelist
3. Allow only numbers for values
4. Efficient routing of built-in metrics to the UI (currently they are tagged along with the executor heartbeat, which puts pressure on ZooKeeper)
5. Worker/JVM-level metrics that are not owned by a particular component
6. Percentiles for latency metrics, such as p99, p95, etc.
7. Aggregation at stream level and machine level
8. A way to subscribe to cluster metrics
9. Non-sampled counter stats, if this doesn't hurt performance
10. More metrics, such as serialization/deserialization latency and queue status
11. Dynamically turning specific metrics on/off
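Items 1, 2 and 6 are the most concrete of these wishes. As a rough illustration of what they mean computationally, here is a minimal Java sketch: summing per-task counters into one value per component, a whitelist/blacklist check on metric names, and a nearest-rank percentile over latency samples. The class, its methods, and the `componentId#taskId` key format are hypothetical, chosen only for this example; they are not an existing metrics API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch of wishlist items 1, 2 and 6; not an existing metrics API.
final class MetricsWishlistSketch {

    // Item 6: nearest-rank percentile, p in (0, 100], over a sorted copy of the samples.
    static long percentile(List<Long> samples, double p) {
        if (samples.isEmpty() || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.stream().mapToLong(Long::longValue).sorted().toArray();
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    // Item 1: sum per-task counters into one value per component.
    // Assumes keys of the hypothetical form "componentId#taskId".
    static Map<String, Long> sumByComponent(Map<String, Long> perTask) {
        Map<String, Long> perComponent = new TreeMap<>();
        for (Map.Entry<String, Long> e : perTask.entrySet()) {
            String key = e.getKey();
            int sep = key.indexOf('#');
            String component = sep < 0 ? key : key.substring(0, sep);
            perComponent.merge(component, e.getValue(), Long::sum);
        }
        return perComponent;
    }

    // Item 2: a metric is reported if it matches the whitelist (an empty
    // whitelist allows everything) and matches no blacklist pattern.
    static boolean isReported(String name, List<Pattern> whitelist, List<Pattern> blacklist) {
        boolean allowed = whitelist.isEmpty()
                || whitelist.stream().anyMatch(p -> p.matcher(name).matches());
        boolean denied = blacklist.stream().anyMatch(p -> p.matcher(name).matches());
        return allowed && !denied;
    }

    public static void main(String[] args) {
        List<Long> latenciesMs = Arrays.asList(12L, 5L, 40L, 7L, 33L, 8L, 21L, 19L, 3L, 50L);
        System.out.println("p90=" + percentile(latenciesMs, 90) + " p99=" + percentile(latenciesMs, 99));

        Map<String, Long> acked = new TreeMap<>();
        acked.put("split#1", 10L);
        acked.put("split#2", 15L);
        acked.put("count#3", 7L);
        System.out.println(sumByComponent(acked));

        List<Pattern> noWhitelist = List.of();
        List<Pattern> blacklist = List.of(Pattern.compile(".*-count"));
        System.out.println(isReported("__emit-count", noWhitelist, blacklist));
    }
}
```

Running `main` prints `p90=40 p99=50`, then `{count=7, split=25}`, then `false`. Nearest-rank was picked only because it is exact and simple; a real implementation handling high-volume latency streams would more likely use a streaming histogram or sketch instead of sorting raw samples.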

Please note that some of these are already addressed by the current metrics feature.
