Discussion thread | https://lists.apache.org/thread/7qjzbcfzdshqb3h7ft31v9o3x43t8k6r |
---|---|
Vote thread | |
ISSUE | |
Release | TBD |
Motivation
Position deletes are one way to implement the Merge-On-Read (MOR) structure, and they have been adopted by other formats such as Iceberg[1] and Delta[2]. Combining them with Paimon's LSM tree lets us create a new position deletion mode unique to Paimon.
...
A delete file marks records in the original data file as deleted. The following figure illustrates how data is updated and deleted in delete file mode:
Currently, there are two ways to represent the deletion of records:
...
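To make the delete file mode described above concrete, here is a minimal sketch of a merge-on-read scan. It is an illustration under stated assumptions, not Paimon's implementation: the in-memory model (a data file as a list of rows, a delete file as a map from data file name to deleted row positions), the function name `read_with_deletes`, and the file name `data-0.orc` are all hypothetical.

```python
from typing import Dict, Iterator, List, Set


def read_with_deletes(
    file_name: str,
    rows: List[str],
    deletes: Dict[str, Set[int]],
) -> Iterator[str]:
    """Yield the rows of one data file, skipping positions marked as deleted.

    Assumption: `deletes` stands in for a delete file and maps a data file
    name to the set of row positions deleted in that file.
    """
    deleted = deletes.get(file_name, set())
    for position, row in enumerate(rows):
        if position not in deleted:
            yield row


# Hypothetical data file with three rows; the row at position 1 is deleted.
data_file = ["k1=a", "k2=b", "k3=c"]
deletes = {"data-0.orc": {1}}

visible = list(read_with_deletes("data-0.orc", data_file, deletes))
print(visible)
```

The data file itself is never rewritten here. The deletion is applied only at read time, which is the property that makes this a merge-on-read approach.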
Measurements: the total time spent calling "RoaringBitmap.add(x)", the time to serialize to a file, the time to deserialize from a file, the serialized file size, and the total time spent calling "RoaringBitmap.contains(x)".
data rate / max num | add(ms) | serialization(ms) | deserialization(ms) | file size(MB) | contains(ms) |
---|---|---|---|---|---|
20% /2,000,000 | 43 | 5 | 26 | 0.24 | 7 |
50% /2,000,000 | 47 | 3 | 52 | 0.24 | 5 |
80% /2,000,000 | 57 | 1 | 24 | 0.24 | 8 |
20% /20,000,000 | 450 | 13 | 247 | 2.4 | 49 |
50% /20,000,000 | 629 | 6 | 222 | 2.4 | 76 |
80% /20,000,000 | 1040 | 5 | 222 | 2.4 | 121 |
20% /200,000,000 | 5079 | 44 | 2262 | 24 | 442 |
50% /200,000,000 | 9469 | 43 | 2773 | 24 | 1107 |
80% /200,000,000 | 13625 | 38 | 2233 | 24 | 1799 |
20% /2,000,000,000 | 93753 | 568 | 22290 | 239 | 5747 |
50% /2,000,000,000 | 166070 | 679 | 22339 | 239 | 14735 |
80% /2,000,000,000 | 218233 | 553 | 22684 | 239 | 26504 |
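The measurement procedure behind the table can be sketched roughly as follows. This is not the original benchmark: it swaps in a plain uncompressed bitmap (`SimpleBitmap`, a hypothetical stand-in) for RoaringBitmap, serializes to memory instead of a file, and assumes that "data rate" means the percentage of positions in `[0, max num)` that are added and that `contains` is called once for every position. Absolute timings will therefore not match the table.

```python
import random
import time


def time_ms(fn):
    """Run fn() and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0


class SimpleBitmap:
    """Uncompressed bitmap used as a stand-in; this is NOT RoaringBitmap."""

    def __init__(self, max_num, data=None):
        # One bit per position, rounded up to whole bytes.
        self.bits = bytearray(data) if data is not None else bytearray((max_num + 7) // 8)

    def add(self, x):
        self.bits[x >> 3] |= 1 << (x & 7)

    def contains(self, x):
        return bool(self.bits[x >> 3] & (1 << (x & 7)))

    def serialize(self):
        return bytes(self.bits)

    @classmethod
    def deserialize(cls, payload):
        return cls(0, payload)


def run_benchmark(rate_percent, max_num, seed=0):
    """Time add, serialize, deserialize and contains for one table row."""
    rng = random.Random(seed)
    # Assumption: "data rate" is the share of distinct positions that get added.
    positions = rng.sample(range(max_num), max_num * rate_percent // 100)

    bitmap = SimpleBitmap(max_num)
    _, add_ms = time_ms(lambda: [bitmap.add(p) for p in positions])
    payload, ser_ms = time_ms(bitmap.serialize)
    restored, de_ms = time_ms(lambda: SimpleBitmap.deserialize(payload))
    # Assumption: contains() is probed once for every position in [0, max_num).
    hits, contains_ms = time_ms(
        lambda: sum(restored.contains(x) for x in range(max_num))
    )
    return {
        "add_ms": add_ms,
        "serialization_ms": ser_ms,
        "deserialization_ms": de_ms,
        "file_size_bytes": len(payload),
        "contains_ms": contains_ms,
        "hits": hits,
    }


# Small run so the sketch finishes quickly; the table uses far larger max num.
result = run_benchmark(20, 20_000)
print(result)
```

The stand-in always stores one bit per position, so its serialized size depends only on max num and not on the data rate. The file sizes reported in the table show the same pattern and are close to max num / 8 bytes.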
The results can be summarized as follows:
...