
This page describes a proposed Kafka Improvement Proposal (KIP) process for proposing a major change to Kafka.

To create your own KIP, click on "Create" in the header and choose "KIP-Template" rather than "Blank page".

Purpose

We want to make Kafka a core architectural component for users. We also support a large number of integrations with other tools, systems, and clients. Keeping this kind of usage healthy requires a high level of compatibility between releases: core architectural elements can't break compatibility or shift functionality from release to release. As a result, each new major feature or public api has to be done in a way that we can stick with going forward.

This means that when making this kind of change we need to think through what we are doing as best we can prior to release. And as we go forward we need to stick to our decisions as much as possible. All technical decisions have pros and cons, so it is important that we capture the thought process that led to a decision or design to avoid flip-flopping needlessly.

Hopefully we can make these proportional in effort to their magnitude — small changes should just need a couple brief paragraphs, whereas large changes need detailed design discussions.

This process also isn't meant to discourage incompatible changes; proposing an incompatible change is totally legitimate. Sometimes we will have made a mistake and the best path forward is a clean break that cleans things up and gives us a good foundation going forward. Rather, this is intended to avoid accidentally introducing half thought-out interfaces and protocols that cause needless heartburn when changed. Likewise, the definition of "compatible" is itself squishy: small details like which errors are thrown when are clearly part of the contract but may need to change in some circumstances. Conversely, performance isn't part of the public contract, but dramatic changes may break use cases. So we just need to use good judgement about how big the impact of an incompatibility will be and how big the payoff is.

What is considered a "major change" that needs a KIP?

Any of the following should be considered a major change:

  • Any major new feature, subsystem, or piece of functionality
  • Any change that impacts the public interfaces of the project

What are the "public interfaces" of the project?

All of the following are public interfaces that people build around:

  • Binary log format
  • The network protocol and api behavior
  • Any class in the public packages under clients
    • org/apache/kafka/common/serialization
    • org/apache/kafka/common
    • org/apache/kafka/common/errors
    • org/apache/kafka/clients/producer
    • org/apache/kafka/clients/consumer (eventually, once stable)

  • Configuration, especially client configuration
  • Monitoring
  • Command line tools and arguments
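
As a rough illustration of the package list above, the following sketch checks whether a fully qualified class name falls in one of the listed public packages. The helper names (`package_of`, `is_public_class`) and the `include_unstable` flag are hypothetical, and the sketch assumes that only an exact package match counts (sub-packages that are not listed are treated as internal); the page itself does not state that rule.

```python
from typing import Optional

# Packages listed above, written in dotted form.
# The value records whether the package is already treated as stable;
# the consumer package is listed as "eventually, once stable".
PUBLIC_PACKAGES = {
    "org.apache.kafka.common.serialization": True,
    "org.apache.kafka.common": True,
    "org.apache.kafka.common.errors": True,
    "org.apache.kafka.clients.producer": True,
    "org.apache.kafka.clients.consumer": False,
}


def package_of(class_name: str) -> str:
    """Return the package part of a fully qualified, top-level class name."""
    return class_name.rpartition(".")[0]


def is_public_class(class_name: str, include_unstable: bool = False) -> bool:
    """Assumption: a class is public only if its package is listed exactly."""
    stable: Optional[bool] = PUBLIC_PACKAGES.get(package_of(class_name))
    if stable is None:
        return False  # package is not in the list at all
    return stable or include_unstable
```

For example, `is_public_class("org.apache.kafka.clients.producer.KafkaProducer")` is true, while a class in a package that is not listed, such as `org.apache.kafka.common.network`, is reported as internal. Nested class names are not handled by this sketch.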

Not all compatibility commitments are the same. We need to spend significantly more time on log format and protocol as these break code in lots of clients, cause downtime releases, etc. Public apis are next as they cause people to rebuild code and lead to compatibility issues in large multi-dependency projects (which end up requiring multiple incompatible versions). Configuration, monitoring, and command line tools can be faster and looser — changes here will break monitoring dashboards and require a bit of care during upgrades but aren't a huge burden.

For the most part monitoring, command line tool changes, and configs are added with new features so these can be done with a single KIP.

What should be included in a KIP?

A KIP should contain the following sections:

  • Motivation: describe the problem to be solved
  • Proposed Change: describe the new thing you want to do. This may be fairly extensive and have large subsections of its own. Or it may be a few sentences, depending on the scope of the change.
  • New or Changed Public Interfaces: impact to any of the "compatibility commitments" described above. We want to call these out in particular so everyone thinks about them.
  • Migration Plan and Compatibility: if this feature requires additional support for a no-downtime upgrade describe how that will work
  • Rejected Alternatives: What are the other alternatives you considered and why are they worse? The goal of this section is to help people understand why this is the best solution now, and also to prevent churn in the future when old alternatives are reconsidered.
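
The section list above can be turned into a simple completeness check for a draft. This is only a sketch: the function name `missing_sections` is hypothetical, and it assumes each section title appears on a line of its own in the draft (optionally followed by a colon), which is a convention chosen here for illustration rather than something the wiki enforces.

```python
from typing import List

# Section titles taken from the list above.
REQUIRED_SECTIONS = [
    "Motivation",
    "Proposed Change",
    "New or Changed Public Interfaces",
    "Migration Plan and Compatibility",
    "Rejected Alternatives",
]


def missing_sections(draft_text: str) -> List[str]:
    """Return the required section titles that never appear as their own line."""
    # Normalize each line: trim whitespace, drop a trailing colon, ignore case.
    present = {line.strip().rstrip(":").lower() for line in draft_text.splitlines()}
    return [title for title in REQUIRED_SECTIONS if title.lower() not in present]
```

An empty result means every section heading was found; it says nothing about whether the sections are filled in well.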

Who should initiate the KIP?

Anyone can initiate a KIP but you shouldn't do it unless you have an intention of getting the work done to implement it (otherwise it is silly).

Process

Here is the process for making a KIP:

  1. Create a page which is a child of this one. Take the next available KIP number and give your proposal a descriptive heading. e.g. "KIP 42: Allow Infinite Retention With Bounded Disk Usage".
  2. Fill in the sections as described above
  3. Start a [DISCUSS] thread on the Apache mailing list. Please ensure that the subject of the thread is of the format "[DISCUSS] KIP-{your KIP number} {your KIP heading}". The discussion should happen on the mailing list, not on the wiki, since the wiki comment system doesn't work well for larger discussions. In the process of the discussion you may update the proposal. You should let people know the changes you are making. When you feel you have a finalized proposal, proceed to the vote in the next step.
  4. Once the proposal is finalized call a [VOTE] to have the proposal adopted. These proposals are more serious than code changes and more serious even than release votes. The criterion for acceptance is lazy majority.
  5. Please update the KIP wiki page, and the index below, to reflect the current stage of the KIP after a vote. This acts as the permanent record indicating the result of the KIP (e.g., Accepted or Rejected). Also report the result of the KIP vote to the voting thread on the mailing list so the conclusion is clear.
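
Two parts of the process above are mechanical enough to sketch in code: the [DISCUSS] subject line and the lazy majority check. The function names are hypothetical. The vote check assumes the usual Apache definition of lazy majority (at least three binding +1 votes, and more binding +1 votes than binding -1 votes); this page only names the rule, so treat the thresholds as an assumption to confirm against the project bylaws.

```python
def discuss_subject(kip_number: int, heading: str) -> str:
    """Build the subject line in the format requested for discussion threads."""
    return f"[DISCUSS] KIP-{kip_number} {heading}"


def passes_lazy_majority(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Assumed rule: at least 3 binding +1 votes and more binding +1 than binding -1."""
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones
```

Only binding votes are counted here; non-binding votes are useful signal on the thread but do not enter the tally in this sketch.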

KIP round-up

Next KIP Number: 124

Use this number as the identifier for your KIP and increment this value.

Adopted KIPs

KIP Release
KIP-1 - Remove support of request.required.acks 0.9.0.0
KIP-2 - Refactor brokers to allow listening on multiple ports and IPs 0.9.0.0
KIP-3 - Mirror Maker Enhancement 0.9.0.0
KIP-4 - Command line and centralized administrative operations 0.9.0.0, 0.10.0.0, 0.10.1.0, (WIP)
KIP-4 - Metadata Protocol Changes 0.10.0.0
KIP-8 - Add a flush method to the producer API 0.9.0.0
KIP-11 - Kafka Authorizer design 0.9.0.0
KIP-12 - Kafka Sasl/Kerberos and SSL implementation 0.9.0.0
KIP-13 - Quota Design 0.9.0.0
KIP-15 - Add a close method with a timeout in the producer 0.9.0.0
KIP-16 - Automated Replica Lag Tuning 0.9.0.0
KIP-19 - Add a request timeout to NetworkClient 0.9.0.0
KIP-20 Enable log preallocate to improve consume performance under windows and some old Linux file system 0.9.0.0
KIP-21 - Dynamic Configuration 0.9.0.0, (WIP)
KIP-22 - Expose a Partitioner interface in the new producer 0.9.0.0
KIP-25 - System test improvements 0.9.0.0
KIP-26 - Add Kafka Connect framework for data import/export 0.9.0.0
KIP-28 - Add a processor client 0.10.0.0
KIP-31 - Move to relative offsets in compressed message sets 0.10.0.0
KIP-32 - Add timestamps to Kafka message 0.10.0.0
KIP-33 - Add a time based log index 0.10.1.0
KIP-35 - Retrieving protocol version 0.10.0.0
KIP-36 - Rack aware replica assignment 0.10.0.0
KIP-38: ZooKeeper Authentication 0.9.0.0
KIP-40: ListGroups and DescribeGroup 0.9.0.0
KIP-41: Consumer Max Records 0.10.0.0
KIP-42: Add Producer and Consumer Interceptors 0.10.0.0
KIP-43: Kafka SASL enhancements 0.10.0.0
KIP-45 - Standardize all client sequence interaction on j.u.Collection. 0.10.0.0
KIP-50 - Move Authorizer to o.a.k.common package 0.10.1.0
KIP-51 - List Connectors REST API 0.10.0.0
KIP-52: Connector Control APIs 0.10.0.0
KIP-54: Sticky Partition Assignment Strategy 0.10.3.0 (WIP)
KIP-55: Secure Quotas for Authenticated Users 0.10.1.0
KIP-56: Allow cross origin HTTP requests on all HTTP methods 0.10.0.0
KIP-57 - Interoperable LZ4 Framing 0.10.0.0
KIP-58 - Make Log Compaction Point Configurable 0.10.1.0
KIP-60 - Make Java client classloading more flexible 0.10.1.0
KIP-62: Allow consumer to send heartbeats from a background thread 0.10.1.0
KIP-63: Unify store and downstream caching in streams 0.10.1.0
KIP-65: Expose timestamps to Connect 0.10.1.0
KIP-66: Single Message Transforms for Kafka Connect 0.10.2.0
KIP-67: Queryable state for Kafka Streams 0.10.1.0
KIP-70: Revise Partition Assignment Semantics on New Consumer's Subscription Change 0.10.1.0
KIP-71: Enable log compaction and deletion to co-exist 0.10.1.0
KIP-72: Allow putting a bound on memory consumed by Incoming request  0.10.3.0 (WIP)
KIP-73: Replication Quotas 0.10.1.0
KIP-74: Add Fetch Response Size Limit in Bytes 0.10.1.0
KIP-75 - Add per-connector Converters 0.10.1.0
KIP-78: Cluster Id 0.10.1.0
KIP-79 - ListOffsetRequest/ListOffsetResponse v1 and add timestamp search methods to the new consumer 0.10.1.0
KIP-77: Improve Kafka Streams Join Semantics 0.10.2.0
KIP-84: Support SASL SCRAM mechanisms 0.10.2.0
KIP-85: Dynamic JAAS configuration for Kafka clients 0.10.2.0
KIP-88: OffsetFetch Protocol Update 0.10.2.0
KIP-89: Allow sink connectors to decouple flush and offset commit 0.10.2.0
KIP-90 - Remove zkClient dependency from Streams 0.10.2.0
KIP-92 - Add per partition lag metrics to KafkaConsumer 0.10.2.0
KIP-93: Improve invalid timestamp handling in Kafka Streams 0.10.2.0
KIP-94 Session Windows 0.10.2.0
KIP-96 - Add per partition metrics for in-sync and assigned replica count 0.10.2.0
KIP-97: Improved Kafka Client RPC Compatibility Policy 0.10.2.0
KIP-99: Add Global Tables to Kafka Streams 0.10.2.0
KIP-100 - Relax Type constraints in Kafka Streams API 0.10.2.0
KIP-101 - Alter Replication Protocol to use Leader Epoch rather than High Watermark for Truncation 0.11.0.0 (WIP)
KIP-102 - Add close with timeout for consumers 0.10.2.0
KIP-103: Separation of Internal and External traffic 0.10.2.0
KIP-104: Granular Sensors for Streams  0.10.2.0
KIP-105: Addition of Record Level for Sensors 0.10.2.0
KIP-106 - Change Default unclean.leader.election.enabled from True to False 0.11.0.0 (WIP)
KIP-107: Add purgeDataBefore() API in AdminClient 0.11.0.0 (WIP)
KIP-108: Create Topic Policy 0.10.2.0
KIP-109: Old Consumer Deprecation 0.10.3.0 (WIP)
KIP-115: Enforce offsets.topic.replication.factor upon __consumer_offsets auto topic creation 0.10.3.0
KIP-121: Add KStream peek method 0.10.3.0
KIP-48 Delegation token support for Kafka 0.10.3.0 (WIP)

KIPs under discussion

Dormant/inactive KIPs

 

Discarded KIPs

KIP Discussion Recordings

Date (link to recording) Summary
2017-01-07
  • KIP-112 - Handle disk failure for JBOD: We discussed whether we need to support JBOD directly in Kafka or just rely on the 1 disk per broker model. The general consensus is that direct JBOD support in Kafka is needed. There is some concern about the complexity added to Kafka, so we have to be careful with the implementation details. We discussed how directory failure should be detected, where the failure state is kept, and whether the state should be reset on broker restart. There is some confusion about what's written in the wiki. Dong is going to clarify the proposal based on the feedback and we will follow up on the details on the mailing list.
2016-10-19
  • KIP-82 - add record header: We agreed that there are use cases for third-party vendors building tools around Kafka. We haven't reached a conclusion on whether the use cases justify the added complexity. We will follow up on the mailing list with use cases, the container formats people have been using, and details on the proposal.
2016-09-13
  • KIP-54 (Sticky Partition Assignment): aims to minimise partition movement so that resource reinitialisation (e.g. caches) is minimised. It is partially sticky and partially fair. Some concerns around the fact that user code for partitionsRevoked and partitionsAssigned would have to be changed to work correctly with this assignment strategy. Good: more complex usage of an assigner that takes advantage of the user data field. Vahid will start the vote.

  • KIP-72 (Allow Sizing Incoming Request Queue in Bytes): large requests can kill the broker, no control over how much memory is allocated. Client quotas don't help as damage may already have been done by the time they kick in. There was a discussion on whether it was worth it to avoid the immediate return from select when there was no memory available in the pool. Radai will update the KIP to describe this aspect in more detail as well as the config validation that is performed.

  • KIP-79 (ListOffsetRequest/ListOffsetResponse v1 and add timestamp search methods to the new consumer): we discussed the option of passing multiple timestamps for the same partition in the same request. Becket thinks it's a rare use case and not worth supporting. Gwen said that it would be nice to have, but not essential. We talked about validation of duplicate topics. Becket will check the approach taken by the create topics request and evaluate if it can be adopted here too. PR will be available today and Jason will evaluate if it's feasible to include it in the next release once it's available.

2016-08-30
  • KIP-48 (delegation tokens): Harsha will update the wiki with more details on how to use delegation tokens and how to configure them.
  • KIP-78 (cluster id): There was discussion on adding human readable tags later. No major concerns.
2016-08-23
  • time-based release: No one seems to have objections. Ismael will follow up with a release wiki.
  • KIP-4: We discussed having separate ACL requests for add and delete. No one seems to object to it. We discussed the admin client. Grant will send a PR. We discussed how KStream can use the ACL api. It seems that we will need some kind of regex or namespace support in ACL to make the authorization convenient in KStream.
  • KIP-50: There is some discussion for further changes in the PR. Ashish will reply to the KIP email thread with the recommended changes. Ashish/Grant plan to look into whether it's possible to make the authorizer api change backward compatible. However, it seems that people are in general ok with a non-compatible api change.
  • KIP-74: No objections on the current proposal.
  • Java 7 support timeline: The consensus is to defer dropping the Java 7 support until the next major release (which will be next year). Ismael will follow up on the email thread.
  • KIP-48 (delegation tokens): Ashish will ping Harsha to see if this is still active.
  • Some of the KIPs have been idle. Grant will send a proposal on tagging them properly (e.g., blocked, inactive, no resource, etc).
2016-05-24
  • KIP-58 - Make Log Compaction Point Configurable: We want to start with just a time-based configuration since there is no good usage for byte-based or message-based configuration. Eric will change the KIP and start the vote.
  • KIP-4 - Admin api: Grant will pick up the work. Initially, he plans to route the write requests from the admin clients to the controller directly to avoid having the broker forward the requests to the controller.
  • KIP-48 - Delegation tokens: Two of the remaining issues are (1) how to store the delegation tokens and (2) how token expiration works. Since Parth wasn't able to attend the meeting, we will follow up on the mailing list.
2016-04-05
  • KIP-4: There is a slight debate on the metadata request schema, as well as on the internal ZK based implementation. We will wait for Jun to comment on the mailing list thread.
  • KIP-52: We decided to start a voting process for this.
  • KIP-35: Decided on renaming ApiVersionQuery api to ApiVersion. Consensus on using the api in the java client only to check for availability of current versions. ApiVersion api's versions will not be deprecated. The KIP-35 wiki will be updated with the latest info and a vote thread will be initiated.
2016-03-15
  • KIP-33 - Add a time based log index to Kafka: We decided NOT to include this in 0.10.0 since the changes may have performance risks.
  • KIP-45 - Standardize all client sequence interaction on j.u.Collection: There is no consensus in the discussion. We will just put it to vote.
  • KIP-35 - Retrieving protocol version: This got the longest discussion. There is still no consensus. Magnus thinks the current proposal of maintaining a global protocol version won't work and will try to submit a new proposal.
  • KIP-43 - Kafka SASL enhancements: Rajini will modify the KIP to only support native SASL mechanisms and leave the changes to Login and CallbackHandler to KIP-44 instead.
2016-02-23
  • KIP-33 and KIP-47: No issues. Will start the voting thread.
  • KIP-43: We discussed whether there is a need to support multiple SASL mechanisms at the same time and what's the best way to implement this. Will discuss this in more details in the email thread.
  • KIP-4: Grant gave a comprehensive summary of the current state. We have gaps on how to make the admin request block on the broker, how to integrate admin requests with ACL (especially with respect to client config changes for throttling and ACL changes), how to do the alter topic request properly. Grant will update the KIP with an interim plan and a long term plan.
  • KIP-43: We briefly discussed how to support multiple SASL mechanisms on the broker. Harsha will follow up with more details on the email thread.
  • Everyone seems to be in favor of making the next major release 0.10.0, instead of 0.9.1.
2016-01-26
  • KIP-42: We agreed to leave the broker side interceptor for another KIP. On the client side, people favor the 2nd option in Anna's proposal. Anna will update the wiki accordingly.
  • KIP-43: We discussed whether there is a need to support multiple SASL mechanisms at the same time and what's the best way to implement this. Will discuss this in more details in the email thread.
  • Jiangjie brought up an issue related to KIP-32 (adding timestamp field in the message). The issue is that currently there is no convenient way for the consumer to tell whether the timestamp in a message is the create time or the server time. He and Guozhang propose to use a bit in the message attribute to do that. Jiangjie will describe the proposal in the email thread.
2016-01-12
  • KIP-41: Discussed whether the issue of long processing time between poll calls is a common issue and whether we should revisit the poll api. Also discussed whether the number of records returned in poll calls can be made more dynamic. In the end, we feel that just adding a config that controls the number of records returned in poll() is the simplest approach at this moment.
  • KIP-36: Need to look into how to change the broker JSON representation in ZK w/o breaking rolling upgrades. Otherwise, ready for voting.
2015-10-20
  • KIP-38: No concerns with this KIP. Flavio will initiate the voting on this.
  • KIP-37: There are questions on how ACL, configurations, etc will work, and whether we should support "move" or not. We will discuss the details more in the mailing list.
  • KIP-32/KIP-33: Jiangjie raised some concerns on the approach that Jay proposed. Guozhang and Jay will follow up on the mailing list.
2015-10-13
  • 0.9.0 release: We discussed if KAFKA-2397 should be a blocker in 0.9.0. Jason and Guozhang will follow up on the jira.
  • KIP-32 and KIP-33: We discussed Jay's alternative proposal of just keeping CreateTime in the message and having a config to control how far off the CreateTime can be from the broker time. We will think a bit more on this and Jiangjie will update the KIP wiki.
  • KIP-36: We discussed an alternative approach of introducing a new broker property to designate the rack. It's simpler and potentially can work in the case when the broker to rack mapping is maintained externally. We need to make sure that we have an upgrade plan for this change. Allen will update the KIP wiki.
2015-10-06
  • We only had the time to go through KIP-35. The consensus is that we will add a BrokerProtocolRequest that returns the supported versions for every type of requests. It's up to the client to decide how to use this. Magnus will update the KIP wiki with more details.
2015-09-22
  • KIP-31: Need to figure out how to evolve inter.broker.protocol.version with multiple protocol changes within the same release, mostly for people who are deploying from trunk. Becket will update the wiki.
  • KIP-32/KIP-33: Having both CreateTime and LogAppendTime per message adds significant overhead. There are a couple of possibilities to improve this. Becket will follow up on this.
  • LinkedIn has been testing SSL in MirrorMaker (SSL is only enabled in the producer). So far, MirrorMaker can keep up with the load. LinkedIn folks will share some of the performance results.
2015-09-14
  • KIP-28: Discussed the improved proposal including 2 layers of API (the higher layer is for streaming DSL), and stream time vs processor time. Ready for review.
  • KIP-31, KIP-32: (1) Discussed whether the timestamp should be from the client or the broker. (2) Discussed the migration path and whether this requires all consumers to upgrade before the new message format can be used. (3) Since this is too big a change, it will NOT be included in 0.9.0 release. Becket will update the wiki.
2015-08-18
  • client-side assignment strategy: We discussed concerns about rebalancing time due to metadata inconsistency, especially when lots of topics are subscribed. Will discuss a bit more on the mailing list. 
  • CopyCat data api: The discussions are in KAFKA-2367 for people who are interested.
  • 0.8.2.2: We want to make this a low risk bug fix release since 0.8.3 is coming. So, will only include a small number of critical and small fixes.
  • 0.8.3: The main features will be security and the new consumer. We will be cutting a release branch when the major pieces for these new features have been committed.
2015-08-11