Hudi provides efficient upserts by consistently mapping a def~record-key + def~partition-path combination to a def~file-id via an indexing mechanism. This mapping between record key and file group/file id never changes once the first version of a record has been written to a file group. In short, the mapped file group contains all versions of a group of records. Hudi currently provides two choices for indexes, def~bloom-index and def~hbase-index (with a few more in the works: HUDI-466, HUDI-407), to map a record key to the file id it belongs to. This enables us to speed up upserts significantly, without scanning over every record in the table. Hudi indices can be classified based on their ability to look up records across partitions.
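To make the mapping concrete, here is a minimal in-memory sketch of the idea, not Hudi's actual implementation or API. The names `ToyIndex`, `tag_location`, `assign` and `upsert` are hypothetical, and the sketch assumes, for simplicity, that every previously unseen key opens its own file group (in a real table, a file group holds many records). It only illustrates that a (partition path, record key) pair resolves to the same file id on every write after the first.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Hypothetical toy index, for illustration only; this is not a Hudi API.
class ToyIndex:
    def __init__(self) -> None:
        # (partition_path, record_key) -> file_id
        self._location: Dict[Tuple[str, str], str] = {}

    def tag_location(self, partition_path: str, record_key: str) -> Optional[str]:
        """Return the file id the key is already mapped to, or None if unseen."""
        return self._location.get((partition_path, record_key))

    def assign(self, partition_path: str, record_key: str, file_id: str) -> str:
        """Record the mapping once; an existing mapping is never overwritten."""
        return self._location.setdefault((partition_path, record_key), file_id)


_file_counter = itertools.count()


def upsert(index: ToyIndex,
           file_groups: Dict[str, List[Tuple[str, str]]],
           partition_path: str,
           record_key: str,
           value: str) -> str:
    """Route a record to its file group using the index, without scanning data."""
    file_id = index.tag_location(partition_path, record_key)
    if file_id is None:
        # Simplifying assumption: each new key opens a new file group.
        new_id = f"file-{next(_file_counter)}"
        file_id = index.assign(partition_path, record_key, new_id)
    # Every version of the record is appended to the same file group.
    file_groups.setdefault(file_id, []).append((record_key, value))
    return file_id


index = ToyIndex()
file_groups: Dict[str, List[Tuple[str, str]]] = {}
first = upsert(index, file_groups, "2020/01/01", "key-1", "v1")
second = upsert(index, file_groups, "2020/01/01", "key-1", "v2")
other = upsert(index, file_groups, "2020/01/02", "key-1", "v1")
print(first == second, first != other)
```

Because the toy index is keyed by partition path plus record key, the same key written under a different partition is treated as a different record; an index able to look up records across partitions would key on the record key alone.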