Discussion thread | https://lists.apache.org/thread/7qjzbcfzdshqb3h7ft31v9o3x43t8k6r |
---|---|
Vote thread | |
ISSUE | |
Release | TBD |
Motivation
Position deletes are one way to implement the Merge-On-Read (MOR) structure, and they have been adopted by other formats such as Iceberg[1] and Delta[2]. By combining position deletes with Paimon's LSM tree, we can create a new position deletion mode unique to Paimon.
...
"data rate / max num" = 20% / 2,000,000 means: call "RoaringBitmap.add(x)" with a random "x" in the range 0 to 2,000,000 a total of 20% * 2,000,000 = 400,000 times to build the bitmap, serialize it to a file, deserialize it from the file, and finally call "RoaringBitmap.contains(x)" 400,000 times to simulate filtering.
Measured: the total time of the "RoaringBitmap.add(x)" calls, the time to serialize to the file, the time to deserialize from the file, the serialized file size, and the total time of the "RoaringBitmap.contains(x)" calls.
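The procedure described above can be sketched roughly as follows. This is not the benchmark code behind the numbers in this page: the class name `BitmapBenchSketch`, the `Result` holder, the fixed seed, and the temp-file handling are all hypothetical, and `java.util.BitSet` is used as a stand-in for `RoaringBitmap` so that the sketch runs on a plain JDK. Swapping in `org.roaringbitmap.RoaringBitmap` (`add`, `serialize`, `deserialize`, `contains`) would bring it closer to the measured setup, and its timings and file sizes will differ from those of `BitSet`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.Random;

// Hypothetical sketch of the benchmark steps; BitSet stands in for RoaringBitmap.
final class BitmapBenchSketch {

    /** Timings in milliseconds, serialized size in bytes, and sanity-check fields. */
    static final class Result {
        long calls, addMs, serializeMs, deserializeMs, containsMs, fileBytes, hits;
        boolean roundTripOk;
    }

    static Result run(double rate, int maxNum, long seed) {
        Result r = new Result();
        r.calls = Math.round(rate * maxNum); // e.g. 20% * 2,000,000 = 400,000
        Random rnd = new Random(seed);

        // 1. Build the bitmap from random positions in [0, maxNum).
        BitSet bitmap = new BitSet(maxNum);
        long start = System.nanoTime();
        for (long i = 0; i < r.calls; i++) {
            bitmap.set(rnd.nextInt(maxNum));
        }
        r.addMs = elapsedMs(start);

        try {
            Path file = Files.createTempFile("deletion-bitmap", ".bin");
            try {
                // 2. Serialize the bitmap to a file.
                start = System.nanoTime();
                Files.write(file, bitmap.toByteArray());
                r.serializeMs = elapsedMs(start);
                r.fileBytes = Files.size(file);

                // 3. Deserialize the bitmap from the file.
                start = System.nanoTime();
                BitSet restored = BitSet.valueOf(Files.readAllBytes(file));
                r.deserializeMs = elapsedMs(start);

                // 4. Probe random positions to simulate filtering deleted rows.
                start = System.nanoTime();
                for (long i = 0; i < r.calls; i++) {
                    if (restored.get(rnd.nextInt(maxNum))) {
                        r.hits++;
                    }
                }
                r.containsMs = elapsedMs(start);
                r.roundTripOk = restored.equals(bitmap);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return r;
    }

    private static long elapsedMs(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        Result r = run(0.2, 2_000_000, 42L);
        System.out.printf("add=%dms ser=%dms deser=%dms size=%dB contains=%dms%n",
                r.addMs, r.serializeMs, r.deserializeMs, r.fileBytes, r.containsMs);
    }
}
```

Because the positions are drawn at random, some of them repeat, so the number of distinct bits set is somewhat lower than the number of `add` calls. A single un-warmed run like this gives only rough timings.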
data rate / max num | add(ms) | serialization(ms) | deserialization(ms) | file size(MB) | contains(ms) |
---|---|---|---|---|---|
20% /2,000,000 | 43 | 5 | 26 | 0.24 | 7 |
50% /2,000,000 | 47 | 3 | 52 | 0.24 | 5 |
80% /2,000,000 | 57 | 1 | 24 | 0.24 | 8 |
20% /20,000,000 | 450 | 13 | 247 | 2.4 | 49 |
50% /20,000,000 | 629 | 6 | 222 | 2.4 | 76 |
80% /20,000,000 | 1040 | 5 | 222 | 2.4 | 121 |
20% /200,000,000 | 5079 | 44 | 2262 | 24 | 442 |
50% /200,000,000 | 9469 | 43 | 2773 | 24 | 1107 |
80% /200,000,000 | 13625 | 38 | 2233 | 24 | 1799 |
20% /2,000,000,000 | 93753 | 568 | 22290 | 239 | 5747 |
50% /2,000,000,000 | 166070 | 679 | 22339 | 239 | 14735 |
80% /2,000,000,000 | 218233 | 553 | 22684 | 239 | 26504 |
The results can be summarized in the following points:
...