As a system, Kafka has a fairly simple description of correctness: all messages sent to a topic should be delivered to each consumer group consuming that topic. The goal of this facility is to continuously verify that this is happening. A similar facility exists at LinkedIn.
Clients that support auditing do so by periodically reporting a summary of the messages they have sent or received. These summaries are then aggregated to check that all messages sent were received. Auditing is an optional facility; clients that don't support it simply will not be able to use the audit monitoring tool.
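The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the audit tool's actual logic: the report fields (`role`, `topic`, `group`, `count`) and the function name `find_discrepancies` are assumptions made for the example, and a real check would also need to compare like-for-like intervals.

```python
from collections import defaultdict


def find_discrepancies(reports):
    """Return {(topic, group): (sent, received)} for every mismatch.

    `reports` is a list of dicts with hypothetical fields:
    role ("producer" or "consumer"), topic, count, and group (consumers only).
    """
    sent = defaultdict(int)       # topic -> total messages reported sent
    received = defaultdict(int)   # (topic, group) -> total messages reported received

    for report in reports:
        if report["role"] == "producer":
            sent[report["topic"]] += report["count"]
        else:
            received[(report["topic"], report["group"])] += report["count"]

    mismatches = {}
    for (topic, group), got in received.items():
        expected = sent.get(topic, 0)
        if got != expected:
            mismatches[(topic, group)] = (expected, got)
    return mismatches
```

Note that this sketch only flags consumer groups that have reported at least once; a group that reports nothing at all would need to be detected separately.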
Participants report by offset interval. A simple scheme would be to have all participants report every 10,000 offsets. The downside of this is that some topics may take a very long time to receive 10,000 messages per partition, so their reports would be delayed for just as long.
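To make the fixed-interval scheme concrete, here is a small sketch of how a participant might bucket offsets and emit a summary each time it crosses an interval boundary. It assumes one offset per message, and the names (`interval_for`, `IntervalReporter`) and summary fields are hypothetical; they are not the report format used by the tool.

```python
INTERVAL = 10_000  # offsets per report, the figure used in the text


def interval_for(offset, interval=INTERVAL):
    """Return the half-open [start, end) offset interval containing `offset`."""
    start = (offset // interval) * interval
    return start, start + interval


class IntervalReporter:
    """Counts messages per partition and emits a summary at each interval boundary."""

    def __init__(self, interval=INTERVAL):
        self.interval = interval
        self.current = {}     # partition -> (interval_start, count so far)
        self.summaries = []   # completed summaries, ready to be reported

    def record(self, partition, offset):
        start, _ = interval_for(offset, self.interval)
        prev = self.current.get(partition)
        if prev is not None and prev[0] != start:
            # The partition moved into a new interval: flush the finished one.
            self.summaries.append({
                "partition": partition,
                "start": prev[0],
                "end": prev[0] + self.interval,
                "count": prev[1],
            })
            prev = None
        count = prev[1] if prev is not None else 0
        self.current[partition] = (start, count + 1)
```

In this sketch a summary is only flushed when a later offset lands in the next interval, so a low-volume partition can sit on an unreported partial interval indefinitely. That is the downside described above.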
The format of this message is JSON in the form
General Cluster UI
There are several uses for a web UI related to Kafka, so we may as well fold these into a single tool. We did a code dump of the current audit tool as part of KAFKA-260, but if we are going to expand the functionality it might make sense to clean up that code and port it to Scala.
In addition to audit monitoring, the following would be useful:
- Cluster information: Which machines have which topics and partitions? Which are leaders?
- Administrative commands: Add topic, move partitions, etc.
A facility I would hesitate to add is general monitoring and time-series graphs, since most people already have a way to do that and it is probably better to integrate with those systems.