KIP-308: GetOffsetShell: new KafkaConsumer API, support for multiple topics, minimize the number of requests to server

Status

Current state: Under Discussion

Discussion thread: here

JIRA: here

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

  1. GetOffsetShell is slow. It uses the SimpleConsumer API to get offsets and the old Producer API to get topic/partition metadata. It determines the leader broker for each partition and then queries that leader for offsets. In total, it makes as many requests to the brokers as there are partitions (plus a request to ZooKeeper for metadata).
  2. The user gets offsets for only one topic per invocation and must run GetOffsetShell separately for each topic. To get offsets for all topics, the user first has to obtain the list of topics (using another tool) and then query each topic.
  3. GetOffsetShell does not have a convenient shell launcher like the other console tools. It can be run only through the generic launcher kafka-run-class.sh.
  4. GetOffsetShell is inconsistent with the other command-line tools in its command-line arguments. For example, it takes --broker-list instead of --bootstrap-servers.

Public Interfaces

This KIP introduces a new command-line tool: kafka-get-offsets.sh. The tool accepts the following arguments:

  • --bootstrap-servers vm1:9092 - Comma-separated list of Kafka servers
  • --topics topic1,topic2 - Comma-separated list of topics to query for offsets (if omitted, all topics are queried)
  • --partitions 1,2 - Comma-separated list of partition IDs to query (if omitted, all partitions of each topic are queried)
  • --include-internal-topics - Also query Kafka-internal topics, such as the consumer offsets topic (internal topics are ignored by default)
  • --consumer-property - Pass an arbitrary consumer property to the consumer that actually queries the broker for offsets

For backward compatibility, the kafka-get-offsets.sh tool also accepts the following deprecated arguments from the old GetOffsetShell implementation:

  • --broker-list - Same as --bootstrap-servers. When both are specified, --bootstrap-servers is used.
  • --topic - Same as --topics. When both are specified, --topics is used.
  • --offsets - Ignored. Exactly one offset is always returned for each partition.
  • --max-wait-ms - Ignored. Instead, use --consumer-property to set the request.timeout.ms property.

When the user specifies a deprecated argument, the tool displays a warning message.
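The precedence rules for the deprecated arguments can be sketched as follows. This is a minimal illustration, not the tool's actual parser: the class name DeprecatedArgs, the resolve method, the map-based argument representation, and the warning wording are all hypothetical. Only the argument names and the "new name wins" rule come from the lists above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DeprecatedArgs {
    // Deprecated name -> replacement, as listed in this KIP.
    private static final String[][] RENAMED = {
        {"--broker-list", "--bootstrap-servers"},
        {"--topic", "--topics"},
    };
    // Deprecated arguments that are accepted but ignored.
    private static final List<String> IGNORED = Arrays.asList("--offsets", "--max-wait-ms");

    /** Returns the effective arguments and appends one warning per deprecated argument seen. */
    static Map<String, String> resolve(Map<String, String> parsed, List<String> warnings) {
        Map<String, String> effective = new LinkedHashMap<>(parsed);
        for (String[] pair : RENAMED) {
            String oldName = pair[0];
            String newName = pair[1];
            if (effective.containsKey(oldName)) {
                String value = effective.remove(oldName);
                // The new argument wins when both are present.
                effective.putIfAbsent(newName, value);
                warnings.add(oldName + " is deprecated, use " + newName);
            }
        }
        for (String name : IGNORED) {
            if (effective.containsKey(name)) {
                effective.remove(name);
                warnings.add(name + " is deprecated and ignored");
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("--broker-list", "old:9092");
        parsed.put("--bootstrap-servers", "vm1:9092");
        parsed.put("--offsets", "5");
        List<String> warnings = new ArrayList<>();
        System.out.println(resolve(parsed, warnings));
        for (String w : warnings) {
            System.err.println("WARN: " + w); // warning text is made up for this sketch
        }
    }
}
```

In this sketch a deprecated argument always produces a warning, even when its value is overridden by the new argument.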

Proposed Changes

The new implementation, kafka-get-offsets.sh, uses the KafkaConsumer API. It makes at most two requests to the broker:

  1. To query the existing topics and partitions.
  2. To fetch all requested offsets.
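The two-step flow above can be sketched with a small stand-in for the consumer. The OffsetClient interface and FakeClient class are hypothetical and exist only to count round trips. Their two methods are modeled on KafkaConsumer's listTopics() and endOffsets(...), which is an assumption about which calls the tool would use. The offsets returned by the fake are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TwoRequestFlow {
    /** Hypothetical stand-in for the consumer; each method models one broker round trip. */
    interface OffsetClient {
        Map<String, List<Integer>> listTopics();                           // request 1: metadata
        Map<String, Long> endOffsets(Collection<String> topicPartitions);  // request 2: offsets
    }

    /** In-memory fake that counts round trips. */
    static final class FakeClient implements OffsetClient {
        int requests = 0;
        private final Map<String, List<Integer>> metadata;

        FakeClient(Map<String, List<Integer>> metadata) {
            this.metadata = metadata;
        }

        @Override
        public Map<String, List<Integer>> listTopics() {
            requests++;
            return metadata;
        }

        @Override
        public Map<String, Long> endOffsets(Collection<String> topicPartitions) {
            requests++;
            Map<String, Long> offsets = new LinkedHashMap<>();
            for (String tp : topicPartitions) {
                offsets.put(tp, 0L); // sample data: every partition is empty in this fake
            }
            return offsets;
        }
    }

    /** Fetches offsets for every partition of every topic, keyed by "topic:partition". */
    static Map<String, Long> fetchAllOffsets(OffsetClient client) {
        List<String> wanted = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> topic : client.listTopics().entrySet()) {
            for (int partition : topic.getValue()) {
                wanted.add(topic.getKey() + ":" + partition);
            }
        }
        if (wanted.isEmpty()) {
            return Collections.emptyMap(); // nothing to ask for, so only one request is made
        }
        return client.endOffsets(wanted); // one batched request, regardless of partition count
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> metadata = new LinkedHashMap<>();
        metadata.put("AAA", Arrays.asList(0, 1, 2));
        metadata.put("BBB", Arrays.asList(0));
        FakeClient client = new FakeClient(metadata);
        Map<String, Long> offsets = fetchAllOffsets(client);
        System.out.println(offsets.size() + " partitions, " + client.requests + " requests");
    }
}
```

The point of the sketch is that the request count stays at two no matter how many partitions exist, and drops to one when the metadata shows nothing to query.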

The new implementation correctly handles non-existent topics and partitions requested by the user:

kafka-get-offsets.sh --bootstrap-servers vm:9092 --topics AAA,ZZZ --partitions 0,1
AAA:0:7
AAA:1:Partition not found
ZZZ:0:Topic not found
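The topic:partition:offset output format, including the two error markers, can be sketched as below. The class OffsetReport and its report method are hypothetical. The sketch also assumes one output line per requested topic-partition pair, so for the example above it would additionally print ZZZ:1:Topic not found. The example output does not show that line, and this KIP does not specify whether the tool prints it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical formatter, for illustration only.
final class OffsetReport {
    /**
     * Builds one "topic:partition:value" line per requested pair.
     *
     * @param endOffsets topic -> (partition -> offset) for what actually exists on the cluster
     */
    static List<String> report(List<String> topics, List<Integer> partitions,
                               Map<String, Map<Integer, Long>> endOffsets) {
        List<String> lines = new ArrayList<>();
        for (String topic : topics) {
            Map<Integer, Long> known = endOffsets.get(topic);
            for (int partition : partitions) {
                String value;
                if (known == null) {
                    value = "Topic not found";
                } else if (!known.containsKey(partition)) {
                    value = "Partition not found";
                } else {
                    value = String.valueOf(known.get(partition));
                }
                lines.add(topic + ":" + partition + ":" + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample cluster state matching the example: topic AAA has only partition 0, at offset 7.
        Map<String, Map<Integer, Long>> endOffsets = new LinkedHashMap<>();
        Map<Integer, Long> aaa = new LinkedHashMap<>();
        aaa.put(0, 7L);
        endOffsets.put("AAA", aaa);
        for (String line : report(Arrays.asList("AAA", "ZZZ"), Arrays.asList(0, 1), endOffsets)) {
            System.out.println(line);
        }
    }
}
```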

The user can now get offsets for many topics at once, with no need to retrieve the list of existing topics and then query them one by one.

Moreover, the user can now retrieve offsets for all topics; this is the default when no topics are specified.
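A minimal sketch of the default topic selection follows. The class TopicSelection and its select method are hypothetical, and the caller supplies the list of internal topic names. The sketch assumes that the internal-topic filter applies only when no topics are requested explicitly; this KIP does not say how an explicitly requested internal topic is treated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
final class TopicSelection {
    /**
     * @param requested       topics passed via --topics (empty when the argument is omitted)
     * @param existing        topics that exist on the cluster
     * @param internal        names of Kafka-internal topics
     * @param includeInternal true when --include-internal-topics is given
     */
    static List<String> select(List<String> requested, List<String> existing,
                               List<String> internal, boolean includeInternal) {
        if (!requested.isEmpty()) {
            // Assumption: explicit requests are passed through unfiltered.
            return new ArrayList<>(requested);
        }
        List<String> selected = new ArrayList<>();
        for (String topic : existing) {
            if (includeInternal || !internal.contains(topic)) {
                selected.add(topic);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("AAA", "__consumer_offsets", "BBB");
        List<String> internal = Arrays.asList("__consumer_offsets");
        List<String> none = Collections.emptyList();
        System.out.println(select(none, existing, internal, false));
        System.out.println(select(none, existing, internal, true));
    }
}
```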

Compatibility, Deprecation, and Migration Plan

Any client tools that depend on the deprecated command-line arguments will continue to work without changes. Users will see warnings on stderr about the deprecated arguments, with pointers to the new arguments.

Deprecated arguments can be removed in the next minor Kafka release.

Rejected Alternatives

None
