DUE TO SPAM, SIGN-UP IS DISABLED. Goto Selfserve wiki signup and request an account.
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
This phase builds upon the Blue/Green Deployments for Flink on Kubernetes: Phase 1 (basic) by adding the capability to coordinate, in a more granular and extensible way, the transition between the 2 jobs to achieve Exactly-Once semantics, by aiming to eliminate (or at least minimize) record duplication.
A Blue/Green deployment architecture, or Active/StandBy, can be achieved by having 2 identical pipelines running side-by-side with only one (the Active) producing or outputting records at all times.
Moreover, since the pipelines are the ones “handling” each and every record, we explore the idea of empowering the pipeline to decide, at the record level, what data goes through and what doesn’t (by means of a “Gate” component).
Public Interfaces
On top of the new CRD introduced by Phase 1, this effort includes a new GateProcessFunction for Flink pipelines that gives jobs the capability of coordinating the transition with the Blue/Green Controller in an extensible way.
public class FlinkBlueGreenDeploymentSpec {
private CoordinationConfiguration coordinationConfiguration;
private FlinkDeploymentTemplateSpec template;
}Note: the new proposed CoordinationConfiguration (TBD) can be used to configure different aspects of the Coordination sequence
GateProcessFunction class hierarchy
Class/module organization
- Base classes → org.apache.flink.kubernetes.operator.api
- Custom extended classes → org.apache.flink.kubernetes.operator
- * This has sparked some discussion. Another suggestion is to place them under the “examples” project.
- ** Alternative: Since this solution is strictly for Kubernetes, it makes sense to create a new “bluegreen” module here in the Kubernetes Operator code base. This way it can still be managed in this code base while having a dedicated artifact.
Watermark based ConfigMap
{
“active-deployment-type”: [BLUE|GREEN],
“deployment-deletion-delay-ms”: [Long],
“stage”: [CLEAR_TO_TEARDOWN|FAILING|INITIALIZING|RUNNING|TRANSITIONING],
“watermark-toggle-value”: [Long],
“watermark-stage”: [WATERMARK_NOT_SET|WAITING_FOR_WATERMARK|WATERMARK_SET],
}
Proposed Changes
This effort will add our FlinkBlueGreenDeploymentController the capability of coordinating with the Blue/Green jobs for a more controlled and accurate transition sequence.
- At all times the controller will “know” which deployment type is/will become Active and from which point in time. This implies an initial Event Watermark (WatermarkGateProcessFunction) based implementation but with an extensible architecture to easily support other use cases.
- A ConfigMap will be used to both contain the deployment parameters as well as act as a communication conduit between the Controller and the two corresponding Jobs.
- All communication between the Controller and Jobs’ GateProcessFunction operators takes place via the fabric8Informer API by monitoring this centralized ConfigMap. This way each job will know if it’s meant to be Active or Standby, at record level.
Record Level Transition
The main idea is to have a global, per deployment, Watermark with a future value that the jobs can use as a reference. Since Watermarks are specific to each use case and susceptible to traffic and configuration variations, its value needs to be defined by the user.
As soon as a Blue/Green transition is signaled, and the Blue deployment is started from Green’s latest Checkpoint/Savepoint, the Gate component on each pipeline will evaluate each record’s Watermark vs. the global future value and perform the following actions:
- The Green gate will close and stop records with a Watermark greater than the global
- The Blue gate will open and allow records with a Watermark greater than the global
Notes:
- Having an independent Gate Process Function gives the user the freedom of placing this “gating” logic where it best fits their use cases, for example before/after state or Async I/O, to potentially avoid reprocessing/duplicating output.
- Even if the records are not flowing in exactly the same way in both pipelines, they’ll eventually make it to the sink as long as both pipelines remain active for the duration of this “crossover”. The final Blue to Green “promotion” won’t happen until after both pipelines’ records have crossed the global Watermark.
- This scenario assumes we’re using a source that can be read by both pipelines simultaneously.
Notes
- Only targeting Application jobs (no Session Jobs)
- The new Controller uses JOSDK Event Sources for CR changes/notifications as well as the stock UpdateControl.patchStatus mechanism.
- TODOs:
- A pipeline using a Gate is deployed as a BlueGreen application.
- A pipeline deployed as a BlueGreen application contains a Gate.
- Compare both deployment’s watermark before setting the transition/toggle value
- If the “Blue” deployment fails during the transition, the controller currently throws an exception and is left in a bad state. The status should probably be patched to eliminate Blue from it, thoughts? This can be reproduced by simulating a bad checkpoint location during the initialization of the Blue deployment.
- Validate that:
Controller Reconciliation Logic
Event Sequence for a Blue/Green with Coordination
- NOTES:
- For this initial implementation the concept of Watermark here refers to Event Time Watermarks, if the pipeline doesn't use them it'll fall back to Processing Time Watermarks.
- TBD: if the existing deployment is not stable the controller will not attempt to deploy the other one, instead it will treat the existing unstable job as a single one and will try to patch/fix it. Thoughts?
Open Questions
- Should starting from Checkpoints be optional? Starting a 2nd pipeline from a further away point will make “catching up” and hence reaching the handover point more difficult. Moreover, should we allow custom lastSavepointPath values? (similar consequences).
- Will changing the parallelism have unexpected consequences?
- Are there any scenarios where watermark-based cutover could cause loss of records?
- How to handle situations where sources cannot be consumed simultaneously?
- What happens when idleness is configured? Watermarks will get ignored from these “slow” subtasks and advance, could records from the ignored subtasks eventually be lost?
Compatibility, Deprecation, and Migration Plan
This is a new feature, no migration necessary
Test Plan
Describe in few sentences how the FLIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.



