TL;DR
This is an addendum of the parent page on reducing rebalance cost, with a focus on Kafka Streams: as of today rebalancing is a costly operation that we need to optimize to achieve faster and more efficient application starts/restarts/shutdowns, failovers, elasticity.
Goals
- At a high-level, we want to strengthen the operability of Streams applications.
- To achieve faster and more efficient application startups / restarts / shutdowns (notably rolling restarts and rolling upgrades)
- "More efficient" includes less unnecessary network traffic and load on both app instances and on the backing Kafka clusters.
- Very similar to the previous point, to achieve faster and more efficient failovers (e.g. when 1 app instance out of 10 has died and the remaining 9 app instances need to take over work).
- "Very similar" because, from a rebalancing standpoint, there's not much difference between a planned app instance restart and an unplanned failover event, for example.
- To achieve faster and more efficient elasticity (scale in/scale out).
Desired features
We'd like to present the returned value in categories of scenarios. Note that sticky assignment and standby replication would be relevant determining the impact of each scenario.
Also I'm sorting the scenarios by their commonness and user impact (subjective and open for discussion):
- Application start: when multi-instance application is started, multiple rebalances are required to migrate states to newly started instances. Standby-replication will not help.
- Application shutdown: when multi-instance application is shutting down, multiple rebalances are required. Standby-replication only slightly remedy this situation.
- Application scale out: when a new instance is started, one rebalance is executed to shuffle all assignment. Standby-replication will not help.
- Application scale in: when an existing instance gracefully shutdown, once rebalance is executed to shuffle all assignment. Standby-replication will largely help in this situation.
- Application instance bounce (upgrade, config change etc): one instance shut down and then restart, it will trigger two rebalances. Standby-replication will largely help in this situation.
- Application instance failure: one instance failed, and probably a new instance start to take its assignment, it will trigger two rebalances. The different with 3) above is that new instance would not have local cached tasks. Standby-replication will not help.
Proposal
We have two proposals to generally improve the rebalance protocol in Incremental Cooperative Rebalancing: Support and Policies (we consider the "Incremental Imbalance" as a follow-up of "Delayed Imbalance").
1) Simple approach tackles on not involving all the partitions in each assignment as it will incurs committing costs; instead it introduces a partiton revokation field in the protocol such that a second join can be triggered to finally move the assignment.
2) Delayed Imbalance approach takes one step further on the Simple approach, that it defers (by a configured timeout) the second rebalance to really migrate the partitions; note in Simple the second rebalance to migrate the partitions always happen immediately.
There is a third semi-orthogonal proposal dependent on the Simple approach above:
3) Standby Bootstrap approach targeted to reduce new member taking restoration with long latency, by letting the new joining member to be assigned standby tasks only at first, and then when it has bootstraped completed trigger a another join group to move the active task.
I'd like to summarize their values on the above scenarios below compared with what we have today (counting the existing optimizations we have done as of 2017.Q4).
Note again the LOE is my personal estimates:
Approach / LOE | App Start | App Shutdown | App Scale-Up | App Scale-Down | Instance Bounce | Instance Failureover |
---|---|---|---|---|---|---|
Current | MAYBE OK
| MAYBE OK
| BAD
| MAYBE OK With Standby:
BAD Without Standby:
| MAYBE OK
| MAYBE OK With Standby:
BAD Without standby:
|
Simple | MAYBE BETTER
| MAYBE BETTER
| BAD
| BEST With standby:
BETTER Without standby:
| BETER
| MAYBE BETTER With Standby:
BAD Without Standby:
|
Delayed Imbalance | BETTER
| BEST
| BAD
| BEST With standby:
BETTER Without standby:
| BEST
| BETTER With Standby:
BETTER Without standby:
|
Standby Bootstrap | MAYBE BETTER
| MAYBE BETTER
| BEST
| BEST With standby:
BEST Without standby:
| BEST
| BETTER With Standby:
STILL NOT GOOD Without standby:
|
Delayed Imbalance + Standby Bootstrap | BETTER
| BEST
| BEST
| BEST With standby:
BEST Without standby:
| BEST
| BEST With Standby:
BEST Without standby:
|