Current state: Under Discussion
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Since the early days of Apache Kafka, the way to obtain information about a running cluster has been to query ZooKeeper state. One piece of information available from ZooKeeper is the start time of each Kafka broker, stored in the timestamp field. This information is useful when building automation that provides functionality such as rolling restarts, where it can be used to determine whether a broker has successfully restarted.
We propose adding a field startTimeMs to the Node class that is returned in the DescribeClusterResult return value from AdminClient.describeCluster(). This would be a completely backwards compatible change and a logical evolution of the interface, requiring no changes to existing code that uses AdminClient. A new method
public long startTimeMs();
would be introduced to the Node class, returning the start time of the corresponding broker, expressed in non-leap milliseconds since the start of the Unix epoch. If a client with this feature connects to a cluster that does not yet implement it, the special value 0L would be returned.
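As a rough illustration of the accessor semantics described above, the following sketch uses a hypothetical stand-in class, NodeSketch, rather than the real org.apache.kafka.common.Node. The startTime() helper and the UNKNOWN_START_TIME_MS constant are assumptions added for illustration and are not part of the proposal; only startTimeMs() and the 0L sentinel come from the text.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical stand-in for the proposed accessor; this is NOT the real
// org.apache.kafka.common.Node class.
final class NodeSketch {
    // Sentinel from the proposal: the cluster did not report a start time.
    static final long UNKNOWN_START_TIME_MS = 0L;

    private final int id;
    private final long startTimeMs;

    NodeSketch(int id, long startTimeMs) {
        this.id = id;
        this.startTimeMs = startTimeMs;
    }

    int id() {
        return id;
    }

    // Mirrors the proposed accessor: milliseconds since the Unix epoch, or 0L.
    long startTimeMs() {
        return startTimeMs;
    }

    // Convenience helper (an assumption, not part of the proposal) that maps
    // the 0L sentinel to an empty Optional.
    Optional<Instant> startTime() {
        return startTimeMs == UNKNOWN_START_TIME_MS
                ? Optional.empty()
                : Optional.of(Instant.ofEpochMilli(startTimeMs));
    }
}
```

Wrapping the sentinel in an Optional is one way for calling code to avoid mistaking 0L for a genuine start time at the Unix epoch.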
The Unix timestamp is already written to the BrokerIdZNode in ZooKeeper when the node is created, but it is not currently read back by Kafka code. We propose the following changes to propagate this piece of information.
To complicate things a bit, the broker information returned by the describeCluster() API call is read from the metadata cache on the broker responding to the request. To provide the startTimeMs value, this information needs to be propagated from ZooKeeper to the metadata cache. This means that the UpdateMetadataBroker message that is part of the UpdateMetadataRequest message needs to be updated to include the timestamp, as does the Broker class, so that a version containing startTimeMs is cached on all brokers.
The MetadataResponseBroker message that is part of the MetadataResponse message also needs to be updated to hold a timestamp field, as does the Node class that is exposed by the AdminClient. An implication of these changes is that the versions of the affected protocol message pairs would be incremented.
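To make the versioning implication concrete, here is a minimal sketch of a version-gated field. It uses a toy byte layout, not Kafka's generated message classes or its actual wire format; the class name BrokerEntryCodec and the version number in FIRST_VERSION_WITH_START_TIME are hypothetical. The idea it illustrates is that the new field is only written and read at or above the version that introduces it, and readers of older versions fall back to 0L.

```java
import java.nio.ByteBuffer;

// Illustrative only: a toy encoding, not Kafka's real protocol classes.
final class BrokerEntryCodec {
    // Hypothetical version at which startTimeMs first appears on the wire.
    static final short FIRST_VERSION_WITH_START_TIME = 1;

    // Encodes the broker id, plus startTimeMs when the version supports it.
    static ByteBuffer encode(int brokerId, long startTimeMs, short version) {
        boolean hasStartTime = version >= FIRST_VERSION_WITH_START_TIME;
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + (hasStartTime ? Long.BYTES : 0));
        buf.putInt(brokerId);
        if (hasStartTime) {
            buf.putLong(startTimeMs);
        }
        buf.flip();
        return buf;
    }

    // Returns {brokerId, startTimeMs}; startTimeMs is 0L when the version
    // predates the field.
    static long[] decode(ByteBuffer buf, short version) {
        int brokerId = buf.getInt();
        long startTimeMs = version >= FIRST_VERSION_WITH_START_TIME ? buf.getLong() : 0L;
        return new long[] {brokerId, startTimeMs};
    }
}
```

Under this scheme an older peer never sees bytes it does not understand, which is why both the request and response message versions would need to be bumped together with the field.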
This is a proposed update to the MetadataResponse message that introduces the startTimeMs field
The corresponding change to the UpdateMetadataRequest message
Compatibility, Deprecation, and Migration Plan
This is a completely backwards compatible extension of the existing API. The only compatibility consideration is that a client that includes this change and connects to an older cluster needs to handle that condition as described in the public interface description above, with the startTimeMs() accessor returning the special value 0L.
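For rolling-restart automation, handling an older cluster could look like the following sketch. The RestartCheck class, its Status enum, and the restartRequestedAtMs parameter are hypothetical names introduced for illustration. The sketch also assumes that the clocks of the automation host and the broker are reasonably well synchronized.

```java
// Hypothetical helper for restart automation; not part of the proposal.
final class RestartCheck {
    enum Status { RESTARTED, NOT_YET_RESTARTED, UNKNOWN }

    // startTimeMs: value reported for the broker (0L if the cluster is too old).
    // restartRequestedAtMs: when the automation asked the broker to restart.
    static Status check(long startTimeMs, long restartRequestedAtMs) {
        if (startTimeMs == 0L) {
            // Older cluster: no start time available, caller must fall back
            // to some other mechanism.
            return Status.UNKNOWN;
        }
        return startTimeMs > restartRequestedAtMs
                ? Status.RESTARTED
                : Status.NOT_YET_RESTARTED;
    }
}
```

Returning an explicit UNKNOWN state keeps the automation from interpreting the 0L sentinel as a broker that has not restarted yet.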
It is certainly possible to use mechanisms outside of Kafka to determine when a broker was started, for example by using the operating system's process table. However, such solutions would be very specific to their execution environment, and it would take a lot of work to make them work as well as the solution outlined above.