Status
Current state: Under Discussion
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Kafka powers real-time data processing, and a Kafka failure can severely impact the many businesses that depend on it. One notable class of Kafka failure is failure caused by a deployment. Depending on which component failed, the problem can be classified into
...
For Kafka user services that have a canary environment, Kafka lacks a way to achieve end-to-end canary isolation between producer and consumer services, which makes it hard to limit the blast radius of a bad producer or consumer service deployment.
Non-Goal
Canary isolation is not hard isolation. When the system is in a suboptimal state, Kafka will prioritize availability over canary isolation.
For example, if a topic has no canary partition, producers in canary will be able to produce messages to non-canary partitions.
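The availability-first fallback in the example above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and how a partition comes to be classified as canary is taken as an input rather than derived here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of any Kafka API.
final class CanaryPartitionFallback {

    /**
     * Returns the partitions a canary producer may write to.
     *
     * allPartitions    every partition id of the topic
     * canaryPartitions the partition ids treated as canary (assumed to be known)
     */
    static List<Integer> candidatesForCanaryProducer(List<Integer> allPartitions,
                                                     List<Integer> canaryPartitions) {
        List<Integer> candidates = new ArrayList<>();
        for (Integer partition : allPartitions) {
            if (canaryPartitions.contains(partition)) {
                candidates.add(partition);
            }
        }
        // No canary partition exists: prefer availability and fall back to
        // the topic's (non-canary) partitions instead of failing the send.
        return candidates.isEmpty() ? new ArrayList<>(allPartitions) : candidates;
    }
}
```

With partitions 0, 1, 2 and partition 2 marked as canary, the canary producer is limited to partition 2. With no canary partition, it may use all three.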
Proposed Changes
Canary Kafka broker
Canary brokers are a subset of Kafka brokers that serve canary traffic. They are identified by a new piece of broker metadata, pod, which is introduced to identify a subset of brokers. The pod value is canary-broker for canary brokers; its default value is broker.
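As a minimal sketch of how the pod metadata could be used to pick out canary brokers: the `BrokerInfo` and `CanaryBrokers` types below are hypothetical stand-ins for broker registration data, not existing Kafka classes. Treating a missing pod value as the default `broker` is an assumption based on the default described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified view of a broker's registration metadata.
final class BrokerInfo {
    final int id;
    final String pod; // null when the broker registered without a pod value

    BrokerInfo(int id, String pod) {
        this.id = id;
        this.pod = pod;
    }
}

// Hypothetical helper; not part of any Kafka API.
final class CanaryBrokers {
    static final String DEFAULT_POD = "broker";       // default value from the proposal
    static final String CANARY_POD = "canary-broker"; // canary value from the proposal

    // Assumption: a broker without a pod value behaves as a default broker.
    static String effectivePod(BrokerInfo broker) {
        return broker.pod == null ? DEFAULT_POD : broker.pod;
    }

    static boolean isCanary(BrokerInfo broker) {
        return CANARY_POD.equals(effectivePod(broker));
    }

    // Returns the ids of all brokers whose pod marks them as canary.
    static List<Integer> canaryBrokerIds(List<BrokerInfo> brokers) {
        List<Integer> ids = new ArrayList<>();
        for (BrokerInfo broker : brokers) {
            if (isCanary(broker)) {
                ids.add(broker.id);
            }
        }
        return ids;
    }
}
```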
...
It is the consumer service owner's responsibility to identify canary consumer instances. In the consumer rebalance protocol, each consumer instance is identified by a member ID. The member ID has two components: clientId + UUID if the consumer is a dynamic member, or groupInstanceId + UUID if the consumer is a static member. Consumer service owners can encode canary into the clientId or groupInstanceId so that the consumer leader can identify canary consumer instances and achieve canary isolation.
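One way a consumer leader might recognize canary members from their member IDs is sketched below. It makes three assumptions that the proposal does not specify: the marker is the literal substring `canary`, the prefix and UUID are joined with `-`, and the UUID is in its standard 36-character text form. `CanaryMembers` is a hypothetical helper, not a Kafka class.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper; not part of any Kafka API.
final class CanaryMembers {
    static final String CANARY_MARKER = "canary"; // assumed naming convention
    private static final int UUID_LENGTH = 36;    // length of UUID.toString()

    // Builds a member id as prefix + UUID; the "-" separator is an assumption.
    static String memberId(String clientIdOrGroupInstanceId) {
        return clientIdOrGroupInstanceId + "-" + UUID.randomUUID();
    }

    // Strips the trailing "-<uuid>" to recover the clientId or groupInstanceId.
    static String prefixOf(String memberId) {
        int cut = memberId.length() - UUID_LENGTH - 1;
        return cut > 0 ? memberId.substring(0, cut) : memberId;
    }

    // A member is treated as canary when its prefix contains the marker.
    static boolean isCanary(String memberId) {
        return prefixOf(memberId).toLowerCase(Locale.ROOT).contains(CANARY_MARKER);
    }
}
```

On the consumer side, the owner would set `client.id` (dynamic member) or `group.instance.id` (static member) to a value such as `orders-canary`, so that the resulting member ID carries the marker.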
Public Interfaces
New Configurations
New broker Configs
...
Code Block |
---|
Version 6 JSON schema for a broker is: { "version":6, "host":"localhost", //start of new field "pod":"broker", //end of new field } |
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
- If we are changing behavior how will we phase out the older behavior?
- If we need special migration tools, describe them here.
- When will we remove the existing behavior?
Test Plan
Describe in a few sentences how the KIP will be tested. We are mostly interested in system tests (since unit tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.
...