...
Reuse the current index layout and treat deleteMap as a new index file type.
...
2.2. DeleteMap index file encoding
Like the hash index, each bucket has one deleteMap index. A deleteMap index file therefore needs to contain the bitmaps of multiple files in the same bucket; its structure is effectively a map<fileName, bitmap>. To support high-performance reads, we have designed the following file encoding to store this map:
In IndexFileMeta:
{
  "org.apache.paimon.avro.generated.record": {
    "_VERSION": 1,
    "_KIND": 0,
    "_PARTITION": "\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000p\u0000\u0000\u0000\u0000\u0000\u0000",
    "_BUCKET": 0,
    "_TYPE": "DELETE_MAP",
    "_FILE_NAME": "index-32f16270-5a81-4e5e-9f93-0e096b8b58d3-0",
    "_FILE_SIZE": x,
    "_ROW_COUNT": count of the map,
    "_DELETE_INDEX_RANGES": "binary Map<String, Pair<Integer, Integer>>"
  }
}
In IndexFile:
- First, record the version as a single byte.
- Then, for each bitmap, record <serialized bitmap's size, serialized bitmap, serialized bitmap's checksum> in sequence.
For each serialized bitmap:
- First, record a constant magic number as an int.
- Then, record the serialized bitmap itself.
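The IndexFile layout above can be sketched as follows. This is only an illustration under stated assumptions, not the proposal's implementation: the class name `DeleteMapFileSketch`, the magic value, the use of CRC32 as the checksum, big-endian ints, and the convention that a range is (offset of the magic number, length of magic number plus bitmap) are all hypothetical. Bitmaps are passed in as already-serialized byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.zip.CRC32;

/** Sketch: version byte, then one <size, magic + bitmap, checksum> record per data file. */
final class DeleteMapFileSketch {
    static final byte VERSION = 1;         // matches the _VERSION value in the meta example
    static final int MAGIC = 0x44454C4D;   // hypothetical magic number, not from the proposal

    /** Encodes all bitmaps and fills ranges with fileName -> {start, size} (assumed range meaning). */
    static byte[] write(Map<String, byte[]> bitmaps, Map<String, int[]> ranges) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(VERSION);
        for (Map.Entry<String, byte[]> e : bitmaps.entrySet()) {
            byte[] bitmap = e.getValue();
            // Payload = magic number followed by the serialized bitmap.
            byte[] payload = ByteBuffer.allocate(Integer.BYTES + bitmap.length)
                    .putInt(MAGIC).put(bitmap).array();
            putInt(out, payload.length);                  // size (assumed to cover magic + bitmap)
            int start = out.size();                       // offset of the magic number
            out.write(payload, 0, payload.length);
            putInt(out, checksum(payload, 0, payload.length));
            ranges.put(e.getKey(), new int[] {start, payload.length});
        }
        return out.toByteArray();
    }

    /** Reads one bitmap directly from its range, validating size, magic number and checksum. */
    static byte[] readBitmap(byte[] file, int[] range) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int start = range[0];
        int size = range[1];
        if (buf.getInt(start - Integer.BYTES) != size) {
            throw new IllegalStateException("size mismatch");
        }
        if (buf.getInt(start) != MAGIC) {
            throw new IllegalStateException("bad magic number");
        }
        if (buf.getInt(start + size) != checksum(file, start, size)) {
            throw new IllegalStateException("checksum mismatch");
        }
        return Arrays.copyOfRange(file, start + Integer.BYTES, start + size);
    }

    private static void putInt(ByteArrayOutputStream out, int value) {
        out.write(ByteBuffer.allocate(Integer.BYTES).putInt(value).array(), 0, Integer.BYTES);
    }

    private static int checksum(byte[] data, int offset, int length) {
        CRC32 crc = new CRC32();                          // CRC32 is an assumed checksum choice
        crc.update(data, offset, length);
        return (int) crc.getValue();
    }
}
```

With ranges kept in the IndexFileMeta (as in `_DELETE_INDEX_RANGES`), a reader can jump straight to one file's bitmap without scanning the records that precede it.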
2.2. DeleteMap index file encoding
Like the hash index, each bucket has one deleteMap index. A deleteMap index file therefore needs to contain the bitmaps of multiple files in the same bucket; its structure is effectively a map<fileName, bitmap>. To support high-performance reads, we have designed the following file encoding to store this map:
- First, record the size of this map as an int.
- Then, record the max fileName length as an int (because file names may differ in length).
- Next, record <fileName (padded to the max length), the starting pos of the serialized bitmap, the size of the bitmap> in sequence.
- Finally, record <serialized bitmap> in sequence.
e.g:
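A minimal sketch of the directory-style encoding described in the list above, using fixed-width entries. Several details are assumptions made only for illustration: the class name `PaddedDirectorySketch`, UTF-8 file names padded on the right with zero bytes, int-typed positions and sizes, big-endian ints, and positions measured from the start of the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

/** Sketch: <map size, max name length, fixed-width directory entries, bitmaps>. */
final class PaddedDirectorySketch {

    static byte[] write(Map<String, byte[]> bitmaps) {
        int maxLen = 0;
        int bitmapBytes = 0;
        for (Map.Entry<String, byte[]> e : bitmaps.entrySet()) {
            maxLen = Math.max(maxLen, e.getKey().getBytes(StandardCharsets.UTF_8).length);
            bitmapBytes += e.getValue().length;
        }
        int entrySize = maxLen + 2 * Integer.BYTES;             // padded name + pos + size
        int headerSize = 2 * Integer.BYTES + bitmaps.size() * entrySize;
        ByteBuffer buf = ByteBuffer.allocate(headerSize + bitmapBytes);
        buf.putInt(bitmaps.size());                             // size of the map
        buf.putInt(maxLen);                                     // max fileName length
        int pos = headerSize;                                   // first bitmap follows the directory
        for (Map.Entry<String, byte[]> e : bitmaps.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            buf.put(Arrays.copyOf(name, maxLen));               // zero-padded to maxLen
            buf.putInt(pos);                                    // starting pos of the bitmap
            buf.putInt(e.getValue().length);                    // size of the bitmap
            pos += e.getValue().length;
        }
        for (byte[] bitmap : bitmaps.values()) {
            buf.put(bitmap);                                    // bitmaps in directory order
        }
        return buf.array();
    }

    /** Returns the serialized bitmap for fileName, or null if the file has no entry. */
    static byte[] read(byte[] file, String fileName) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int count = buf.getInt(0);
        int maxLen = buf.getInt(Integer.BYTES);
        byte[] raw = fileName.getBytes(StandardCharsets.UTF_8);
        if (raw.length > maxLen) {
            return null;                                        // longer than any stored name
        }
        byte[] wanted = Arrays.copyOf(raw, maxLen);
        int entrySize = maxLen + 2 * Integer.BYTES;
        for (int i = 0; i < count; i++) {
            int entry = 2 * Integer.BYTES + i * entrySize;
            if (Arrays.equals(Arrays.copyOfRange(file, entry, entry + maxLen), wanted)) {
                int start = buf.getInt(entry + maxLen);
                int size = buf.getInt(entry + maxLen + Integer.BYTES);
                return Arrays.copyOfRange(file, start, start + size);
            }
        }
        return null;
    }
}
```

Padding every name to the same width makes each directory entry the same size, so the i-th entry can be located by arithmetic alone and only the requested bitmap needs to be read.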
2.3 Classes
BitmapDeleteIndex
Implements DeleteIndex based on RoaringBitmap.
...
    public class DeleteMapIndexFile {

        public long fileSize(String fileName);

        public Map<String, long[]> readDeleteIndexBytesOffsets(String fileName);

        public Map<String, DeleteIndex> readAllDeleteIndex(String fileName, Map<String, Pair<Integer, Integer>> deleteIndexRanges);

        public DeleteIndex readDeleteIndex(String fileName, Pair<Integer, Integer> deleteIndexRange);

        public Pair<String, Map<String, Pair<Integer, Integer>>> write(Map<String, DeleteIndex> input);

        public void delete(String fileName);
    }
...