Current state: Under Discussion
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
- GetOffsetShell is slow. It uses the SimpleConsumer API to get offsets and the old Producer API to get topic and partition metadata. It determines the leader broker for each partition and then queries that leader for offsets. In total, it sends as many requests to the brokers as there are partitions (plus a request to ZooKeeper for metadata).
- GetOffsetShell returns offsets for only one topic per invocation. To get the offsets of all topics, the user first has to obtain the list of topics (using another tool) and then invoke GetOffsetShell once for each topic.
- GetOffsetShell does not have a convenient shell launcher like the other console tools. It can be run only through the generic launcher kafka-run-class.sh.
- GetOffsetShell is inconsistent with the other command-line tools in its argument names. For example, it takes --broker-list instead of --bootstrap-servers.
This KIP introduces a new command line tool: kafka-get-offsets.sh. The tool provides the following arguments:
- --bootstrap-servers vm1:9092 - Comma-separated list of Kafka servers
- --topics topic1,topic2 - Comma-separated list of topics to query for offsets (if omitted, query all topics)
- --partitions 1,2 - Comma-separated list of partition IDs to query (if omitted, query all partitions of each topic)
- --include-internal-topics - Also query Kafka-internal topics, such as the consumer offsets topic (internal topics are ignored by default)
- --consumer-property - Pass an arbitrary consumer property to the consumer that actually queries the broker for offsets
For backward compatibility, the kafka-get-offsets.sh tool also accepts the following deprecated arguments from the old GetOffsetShell implementation:
- --broker-list - Same as --bootstrap-servers. When both are specified, --bootstrap-servers is used.
- --topic - Same as --topics. When both are specified, --topics is used.
- --offsets - Ignored. Exactly one offset is returned for each partition.
- --max-wait-ms - Ignored. Instead, use --consumer-property to pass the request.timeout.ms property.
When the user specifies a deprecated argument, the tool displays a warning message.
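The precedence rules for the deprecated arguments can be illustrated with a small sketch. This is not the tool's actual implementation; it is a Python model of the behaviour described above. The function name resolve_args, the shape of the returned settings, and the wording of the warnings are all hypothetical.

```python
import argparse


def resolve_args(argv):
    """Return (settings, warnings) for a kafka-get-offsets.sh style argument list.

    Hypothetical helper: models the precedence rules only, not the real tool.
    """
    parser = argparse.ArgumentParser(prog="kafka-get-offsets.sh", allow_abbrev=False)
    parser.add_argument("--bootstrap-servers")
    parser.add_argument("--topics")
    parser.add_argument("--partitions")
    # Deprecated arguments inherited from GetOffsetShell.
    parser.add_argument("--broker-list")
    parser.add_argument("--topic")
    parser.add_argument("--offsets")
    parser.add_argument("--max-wait-ms")
    ns = parser.parse_args(argv)

    warnings = []

    def deprecated(flag, hint):
        # The warning text is an assumption; the proposal does not specify it.
        warnings.append(f"Warning: {flag} is deprecated; {hint}")

    servers = ns.bootstrap_servers
    if ns.broker_list is not None:
        deprecated("--broker-list", "use --bootstrap-servers")
        if servers is None:  # the new argument wins when both are given
            servers = ns.broker_list

    topics = ns.topics
    if ns.topic is not None:
        deprecated("--topic", "use --topics")
        if topics is None:  # the new argument wins when both are given
            topics = ns.topic

    if ns.offsets is not None:
        deprecated("--offsets", "it is ignored")
    if ns.max_wait_ms is not None:
        deprecated("--max-wait-ms", "pass request.timeout.ms via --consumer-property")

    settings = {
        "bootstrap_servers": servers.split(",") if servers else None,
        "topics": topics.split(",") if topics else None,  # None means all topics
        "partitions": [int(p) for p in ns.partitions.split(",")] if ns.partitions else None,
    }
    return settings, warnings
```

A caller would then print each returned warning to stderr before querying the brokers.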
The new implementation, kafka-get-offsets.sh, uses the KafkaConsumer API. It makes at most two requests to the broker:
- One to query the existing topics and partitions.
- One to fetch all requested offsets.
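The two-step flow above can be sketched as follows. This is a simplified Python model, not the KafkaConsumer-based implementation: FakeCluster is a hypothetical in-memory stand-in for the broker, and its list_topics and end_offsets methods are illustrative names. It counts requests at the level of client calls, which is how the sketch reads the "at most two requests" claim.

```python
def get_offsets(client, topics=None, partitions=None):
    """Return {(topic, partition): offset} using at most two client calls."""
    # Request 1: fetch metadata for all topics in a single call.
    metadata = client.list_topics()  # {topic: [partition ids]}

    # Select the existing partitions that match the user's filters.
    wanted = [
        (topic, p)
        for topic in sorted(metadata if topics is None else topics)
        if topic in metadata
        for p in sorted(metadata[topic])
        if partitions is None or p in partitions
    ]
    if not wanted:
        return {}  # nothing to query, so the second request is skipped

    # Request 2: fetch every requested offset in one batch.
    return client.end_offsets(wanted)


class FakeCluster:
    """Hypothetical in-memory broker stand-in that counts the requests it receives."""

    def __init__(self, offsets):
        self._offsets = offsets  # {(topic, partition): offset}
        self.requests = 0

    def list_topics(self):
        self.requests += 1
        result = {}
        for topic, partition in self._offsets:
            result.setdefault(topic, []).append(partition)
        return result

    def end_offsets(self, topic_partitions):
        self.requests += 1
        return {tp: self._offsets[tp] for tp in topic_partitions}
```

The request count stays at two regardless of how many topics or partitions are queried, in contrast to the one-request-per-partition behaviour of GetOffsetShell.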
The new implementation correctly handles non-existent topics and partitions requested by the user:
kafka-get-offsets.sh --bootstrap-servers vm:9092 --topics AAA,ZZZ --partitions 0,1
AAA:1:Partition not found
ZZZ:0:Topic not found
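A minimal sketch of how such a report could be produced from the topic metadata is shown below. It rests on two assumptions that the sample output does not confirm: that the tool prints one line per requested topic and partition pair, and that found partitions are printed as topic:partition:offset. The function name format_report and its parameters are hypothetical.

```python
def format_report(topics, partitions, metadata, offsets):
    """Build one output line per requested (topic, partition) pair.

    metadata: {topic: [partition ids]} for the topics that exist.
    offsets:  {(topic, partition): offset} for the partitions that exist.
    """
    lines = []
    for topic in topics:
        for partition in partitions:
            if topic not in metadata:
                status = "Topic not found"
            elif partition not in metadata[topic]:
                status = "Partition not found"
            else:
                # Assumed format for a partition that exists.
                status = str(offsets[(topic, partition)])
            lines.append(f"{topic}:{partition}:{status}")
    return lines
```

Reporting the missing topic or partition on its own line lets the remaining requested offsets still be returned in the same invocation.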
The user can now get offsets for many topics at once, with no need to retrieve the list of existing topics and then query them one by one.
Moreover, the user can now retrieve offsets for all topics: this is the default when no topics are specified.
Compatibility, Deprecation, and Migration Plan
Any client tools that depend on the deprecated command-line arguments will continue to work without changes. The user will see warnings on stderr about the deprecated arguments, with pointers to the new arguments.
The deprecated arguments can be removed in the next minor Kafka release.