Definition

The type that determines how data is laid out in files and stored inside a def~table.



The following table summarizes the trade-offs between the two table types, def~copy-on-write (COW) and def~merge-on-read (MOR):

| Trade-off | def~copy-on-write (COW) | def~merge-on-read (MOR) |
|---|---|---|
| Data Latency | Higher | Lower |
| Update cost (I/O) | Higher (rewrite entire def~table parquet) | Lower (append to `delta log`) |
| Write Amplification | Higher | Lower (depending on compaction strategy to the def~table parquet) |
| Query/Read Amplification | Lower/Zero | Higher (merging base and deltas on the fly) |
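
The trade-offs above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not how the storage layer is implemented: a file is modeled as an in-memory dict, cost is counted in records rather than bytes, and the names `CopyOnWriteFile`, `MergeOnReadFile`, and `run_demo` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CopyOnWriteFile:
    """Toy COW file: every upsert rewrites the whole base file."""
    base: dict
    records_written: int = 0

    def upsert(self, updates: dict) -> None:
        # Rewrite the entire base with the updates applied.
        self.base = {**self.base, **updates}
        self.records_written += len(self.base)

    def read(self) -> dict:
        # Nothing to merge at read time.
        return dict(self.base)


@dataclass
class MergeOnReadFile:
    """Toy MOR file: upserts append to a delta log, reads merge on the fly."""
    base: dict
    delta_log: list = field(default_factory=list)
    records_written: int = 0
    records_merged_on_read: int = 0

    def upsert(self, updates: dict) -> None:
        # Only the changed records are written, as a new log block.
        self.delta_log.append(dict(updates))
        self.records_written += len(updates)

    def _merged(self) -> dict:
        merged = dict(self.base)
        for block in self.delta_log:
            merged.update(block)
        return merged

    def read(self) -> dict:
        # Every read pays for merging the outstanding log blocks.
        self.records_merged_on_read += sum(len(b) for b in self.delta_log)
        return self._merged()

    def compact(self) -> None:
        # Compaction folds the log into a new base, paying the rewrite once.
        self.base = self._merged()
        self.delta_log.clear()
        self.records_written += len(self.base)


def run_demo():
    """Apply three batches of 10 updates to a 1000-record file of each type."""
    base = {i: "v0" for i in range(1000)}
    cow = CopyOnWriteFile(dict(base))
    mor = MergeOnReadFile(dict(base))
    for batch in range(3):
        start = batch * 10
        updates = {i: f"v{batch + 1}" for i in range(start, start + 10)}
        cow.upsert(updates)
        mor.upsert(updates)
    return cow, mor
```

In this model, both files return the same data on read, but the COW file has rewritten the full base on every upsert, while the MOR file has written only the changed records and defers the merge cost to readers until `compact()` is called.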

Related concepts

  1. def~commit
  2. Commit List
  3. def~merge-on-read (MOR)
  4. def~copy-on-write (COW)
  5. def~timeline

Status (draft)




5 Comments

  1. Vinoth Chandar: would this not be named `commit type` instead?

    I'd call `HDFS` and `S3` `storage type`s instead.

    fyi: Balaji Varadarajan, Nishith Agarwal


    1. Commit is just one type of action done on a dataset. Not sure if that's a good way to describe it.


      1. I do not insist, but both COW and MOR def~table-types are about the `commit def~instant-action` (only).

        1. MOR does delta commits, not commits actually. Only COW and compaction do commits.

  2. SemanticBeeng, you are right. Better to call this dataset-type instead of storage type. Changing this.