DUE TO SPAM, SIGN-UP IS DISABLED. Goto Selfserve wiki signup and request an account.
Motivation
The CompiledPlan feature [1] was introduced to support Flink version evolution of Flink SQL pipelines.
JSON is used to serializes the optimized execution plan and the planner is able to restore a pipeline from that. The JSON plan is stored explicitly next to the table program and savepoint.
However together with growing of popularity of SQL and Table APIs the plans keep growing. This leads us thinking towards more space and time efficient format like Smile [2].
Smile has the following characteristics:
like JSON but smaller/faster; well supported by Jackson; works particularly well for processing long/big data streams.
Smaller content size for longer streams via String back-references (has ability to refer to previously written [short] String values). Optionally enabled to allow simpler encoder implementations as well [3]
More efficient binary encoding (similar to BSON, CBOR and UBJSON), but an additional feature is optional use of back references for property names and values. Back referencing allows replacing of property names and/or short (64 bytes or less) String values with 1- or 2-byte reference ids [4].
The support of it is already present in Jackson and it wouldn’t be hard to support it in Flink (it will require a new release of flink-shaded).
Moreover there is already a PoC showing that on average it consumes 50-70% less size than json [5].
[1] FLIP-190: Support Version Upgrades for Table API & SQL Programs
[2] https://github.com/FasterXML/smile-format-specification/blob/master/smile-design-goals.md
[3] https://cowtowncoder.medium.com/is-smile-format-splittable-yes-21bc9773a77c
[5] https://github.com/snuyanzin/flink/tree/smile
Public Interfaces
CompiledPlan.asSmileBytes()
PlanReference.fromSmileBytes()
Proposed Changes
We try to keep the changes to existing interfaces minimal
JSON is still the main representation for CompiledPlan
It is human readable
Easy to review in PR diffs
Printable to a console
Existing methods for resources and file read/write will still use the JSON representation.
Smile is an alternative for teams that embed the CompiledPlan feature into their platform / services.
Note: This FLIP focuses on a tradeoff between space and time efficiency. Thus, compression is not built-in by default. It’s up to the caller to compress the given bytes[] further.
Changes in CompiledPlan
public interface CompiledPlan {
byte[] asSmileBytes();
}
Notes:
CompiledPlanwas already prepared to support different formats, this is also whyasJsonStringexists.We add a
asSmileBytes()
Changes in PlanReference
public abstract class PlanReference {
PlanReference fromSmileBytes(byte[] bytes)
}
Notes:
PlanReferencewas already prepared to support different formats, this is also whyfromJsonStringexists.We add a
fromSmileBytes()
Compatibility, Deprecation, and Migration Plan
Test Plan
RestoreTests
The idea is to load plan existing plan in json, serialize it to Smile and load it from smile again, then continue normal test execution.
Rejected Alternatives
BSON: slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data)
CBOR[1]
* (longer) data streams; like log processing, or Hadoop, Kafka, Smile is more efficient to read and write, and takes less space as well
* shorter content like single messages (request/response): size difference negligible; performance likely to be very similar [2]