Definition
The type that determines how data is laid out in files and stored within a def~table.
The following table summarizes the trade-offs between the two table types, def~copy-on-write (COW) and def~merge-on-read (MOR):
Trade-off | def~copy-on-write (COW) | def~merge-on-read (MOR) |
---|---|---|
Data Latency | Higher | Lower |
Update cost (I/O) | Higher (rewrites the entire def~table Parquet file) | Lower (appends to the `delta log`) |
Write Amplification | Higher | Lower (depends on the compaction strategy applied to the def~table Parquet files) |
Query/Read Amplification | Lower/Zero | Higher (merging base and deltas on the fly) |
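
The update-cost and read-amplification rows above can be illustrated with a toy model. This is only a sketch and not Hudi code: the class names `CopyOnWriteFile` and `MergeOnReadFile`, the `records_written` counter, and the in-memory dicts standing in for a Parquet base file and a delta log are all assumptions made for illustration. Data latency is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CopyOnWriteFile:
    """Hypothetical COW file: every upsert rewrites the whole base file."""
    base: Dict[int, str]
    records_written: int = 0

    def upsert(self, updates: Dict[int, str]) -> None:
        merged = {**self.base, **updates}
        self.base = merged
        # The entire file is rewritten, not just the changed records.
        self.records_written += len(merged)

    def read(self) -> Tuple[Dict[int, str], int]:
        # Nothing to merge at query time: the base file is already current.
        return dict(self.base), 0


@dataclass
class MergeOnReadFile:
    """Hypothetical MOR file: upserts append to a delta log, reads merge it."""
    base: Dict[int, str]
    delta_log: List[Dict[int, str]] = field(default_factory=list)
    records_written: int = 0

    def upsert(self, updates: Dict[int, str]) -> None:
        self.delta_log.append(dict(updates))
        # Only the changed records are written.
        self.records_written += len(updates)

    def read(self) -> Tuple[Dict[int, str], int]:
        # Base and deltas are merged on the fly; count the records merged.
        view = dict(self.base)
        merged_records = 0
        for block in self.delta_log:
            view.update(block)
            merged_records += len(block)
        return view, merged_records

    def compact(self) -> None:
        # Compaction folds the delta log into a new base file.
        view, _ = self.read()
        self.base = view
        self.delta_log.clear()
        self.records_written += len(view)


cow = CopyOnWriteFile(base={i: "v0" for i in range(1000)})
mor = MergeOnReadFile(base={i: "v0" for i in range(1000)})

# Three small update batches of 10 records each.
for batch in range(3):
    updates = {key: f"v{batch + 1}" for key in range(10)}
    cow.upsert(updates)
    mor.upsert(updates)

print(cow.records_written, mor.records_written)  # 3000 30
print(cow.read()[1], mor.read()[1])              # 0 30
print(cow.read()[0] == mor.read()[0])            # True

mor.compact()
print(mor.records_written, mor.read()[1])        # 1030 0
```

In this model both layouts return the same view of the data. COW pays for it at write time, while MOR defers the cost to reads until a compaction runs, which is why its write amplification depends on how often compaction is scheduled.
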
5 Comments
SemanticBeeng
Vinoth Chandar: would this not be named `commit type` instead?
I'd call `HDFS` and `S3` `storage type`s instead.
fyi: Balaji Varadarajan, Nishith Agarwal
Vinoth Chandar
A commit is just one type of action done on a dataset. Not sure if that's a good way to describe it.
SemanticBeeng
I do not insist but both COW and MOR def~table-types are about the `commit def~instant-action` (only).
Vinoth Chandar
MOR does delta commits, not commits, actually. Only COW and compaction do commits.
Vinoth Chandar
SemanticBeeng, you are right. Better to call this dataset-type instead of storage type. Changing this.