...
Copy On Write Table
Merge On Read Table
Writing
Write Operations
...
- The small file handling feature in Hudi profiles the incoming workload and distributes inserts to existing def~file-group instead of creating new file groups, since creating new file groups can lead to small files.
- A cache of the def~timeline is employed in the writer, so that as long as the Spark cluster is not spun up every time, subsequent def~write-operations never list DFS directly to obtain the list of def~file-slices in a given def~table-partition.
- Users can also tune the size of the def~base-file as a fraction of def~log-files and the expected compression ratio, so that a sufficient number of inserts are grouped into the same file group, ultimately resulting in well-sized base files.
- Intelligently tuning the bulk insert parallelism can again result in nicely sized initial file groups. It is in fact critical to get this right, since file groups, once created, cannot be deleted, only expanded as explained before.
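The first point above, routing inserts to existing small file groups before opening new ones, can be illustrated with a minimal sketch. This is not Hudi's actual implementation. The names `FileGroup`, `assign_inserts`, `small_file_limit`, `max_file_size` and `avg_record_bytes` are hypothetical, and the sketch assumes a fixed average record size and a simple smallest-file-first packing order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileGroup:
    """Hypothetical view of a file group: its id and current base file size."""
    file_id: str
    size_bytes: int


def assign_inserts(
    file_groups: List[FileGroup],
    num_inserts: int,
    avg_record_bytes: int,
    small_file_limit: int,
    max_file_size: int,
) -> Tuple[Dict[str, int], List[int]]:
    """Pack inserts into existing small file groups first, then into new ones.

    Returns (records assigned per existing file_id, record counts for new groups).
    """
    if avg_record_bytes <= 0 or max_file_size < avg_record_bytes:
        raise ValueError("record size must be positive and fit in one file")

    assignments: Dict[str, int] = {}
    remaining = num_inserts

    # Fill the smallest files first; files at or above the limit are left alone.
    for group in sorted(file_groups, key=lambda g: g.size_bytes):
        if remaining == 0:
            break
        if group.size_bytes >= small_file_limit:
            continue
        capacity = (max_file_size - group.size_bytes) // avg_record_bytes
        take = min(capacity, remaining)
        if take > 0:
            assignments[group.file_id] = take
            remaining -= take

    # Whatever does not fit goes into new file groups sized up to max_file_size.
    records_per_new_group = max_file_size // avg_record_bytes
    new_groups: List[int] = []
    while remaining > 0:
        take = min(records_per_new_group, remaining)
        new_groups.append(take)
        remaining -= take

    return assignments, new_groups


if __name__ == "__main__":
    groups = [FileGroup("a", 20), FileGroup("b", 95), FileGroup("c", 60)]
    print(assign_inserts(groups, 15, 10, 80, 100))
```

In this sketch only the inserts that no small file can absorb create new file groups, which matches the intent of the bullet: small files are grown toward the target size rather than multiplied.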
Querying
<WIP>
Snapshot Queries
...
Incremental Queries
...
Read Optimized Queries
...