...

While several research projects explore disaggregated, embedded key-value stores (such as disaggregated RocksDB [1] and RocksDB-Cloud [2]), no widely adopted open-source solution exists yet. After weighing usability, extensibility, complexity, and performance, as well as the effort required to integrate with the Flink engine, we decided to build a disaggregated state store named ForSt on top of frocksdb. Additionally, we created a unified file system JNI proxy that leverages the existing file system implementations in Flink, ensuring compatibility with various file system options.
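The proxy idea can be illustrated with a minimal Java sketch. It is not the ForSt implementation: `SimpleFileSystem`, `InMemoryFileSystem`, and `FileSystemProxy` are hypothetical names introduced here, the in-memory backend only stands in for a real file system, and the native (JNI) side that would call into the proxy is omitted. The sketch only shows the delegation pattern, assuming the store addresses files by URI and the proxy picks a backend by scheme.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an engine-side file system abstraction.
interface SimpleFileSystem {
    void write(String path, byte[] data);
    byte[] read(String path);
    boolean exists(String path);
    boolean delete(String path);
}

// In-memory backend, used only to keep this sketch self-contained.
class InMemoryFileSystem implements SimpleFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    public void write(String path, byte[] data) {
        files.put(path, data.clone());
    }

    public byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new UncheckedIOException(new IOException("No such file: " + path));
        }
        return data.clone();
    }

    public boolean exists(String path) {
        return files.containsKey(path);
    }

    public boolean delete(String path) {
        return files.remove(path) != null;
    }
}

// Hypothetical proxy: the single entry point that native code would call.
// It resolves the backend from the URI scheme and delegates the operation.
class FileSystemProxy {
    private final Map<String, SimpleFileSystem> backends = new HashMap<>();

    void register(String scheme, SimpleFileSystem fs) {
        backends.put(scheme, fs);
    }

    private SimpleFileSystem resolve(String uri) {
        int idx = uri.indexOf("://");
        // Assumption for this sketch: URIs without a scheme map to "file".
        String scheme = idx < 0 ? "file" : uri.substring(0, idx);
        SimpleFileSystem fs = backends.get(scheme);
        if (fs == null) {
            throw new IllegalArgumentException("No file system for scheme: " + scheme);
        }
        return fs;
    }

    void writeFile(String uri, byte[] data) { resolve(uri).write(uri, data); }
    byte[] readFile(String uri) { return resolve(uri).read(uri); }
    boolean exists(String uri) { return resolve(uri).exists(uri); }
    boolean delete(String uri) { return resolve(uri).delete(uri); }
}
```

With this shape, supporting another storage system means registering one more backend under its scheme; the caller of the proxy does not change.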

...

Remote Compaction addresses this issue by delegating compaction tasks from the LSM engine itself to a separate worker node, as seen in RocksDB-cloud/Rockset [12]. This decouples the sporadic resource demands of compaction from normal data processing, leading to:

...

  1. Heavy Checkpointing Procedure: A considerable number of files need to be uploaded during checkpointing.
  2. Limited Data Structure Flexibility: Confining local disk data to the SST format restricts potential performance gains from alternative caching structures.
  3. Inaccurate Warm/Cold Distinction: File-level classification of data as warm or cold inaccurately reflects actual access patterns, leading to suboptimal resource allocation.
  4. More Complicated File Management: In this architecture, both the local disk and the DFS serve as part of the primary storage, so file management must be unified across the two, which is complicated in extreme cases such as error handling.

References

[1] Disaggregated RocksDB: https://dl.acm.org/doi/pdf/10.1145/3589772

[2] RocksDB-cloud/Rockset: https://github.com/rockset/rocksdb-cloud

Appendix: How to run the PoC

...