Definition
A set of records in tabular format (a table) ingested into `Hudi`. It represents data that is internal to and managed by `Hudi`, as opposed to external data that `Hudi` does not manage.
Design decisions
- each def~table has a single `parquet` file and one or more def~timelines (with `delta file`s / `log file`s)
- external data is ingested into `Hudi` by one or more def~commits
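
The relationship between a def~table, its timeline and def~commits can be illustrated with a toy model. This is only a sketch under stated assumptions, not `Hudi` code: the `Commit` and `Table` classes, the `ingest` and `snapshot` methods and the `id` record key are all hypothetical names, a single timeline is modelled for brevity, and plain in-memory dictionaries stand in for the `parquet` file and the `delta file`s / `log file`s.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    """One ingestion of external records (hypothetical model, not the Hudi API)."""
    instant: int                 # position of the commit on the timeline
    records: Dict[str, dict]     # record key -> row written by this commit


@dataclass
class Table:
    """Toy stand-in for a def~table: a base snapshot plus an ordered timeline."""
    base: Dict[str, dict] = field(default_factory=dict)    # plays the role of the parquet file
    timeline: List[Commit] = field(default_factory=list)   # commits, oldest first

    def ingest(self, rows: List[dict], key: str = "id") -> Commit:
        """Append one commit to the timeline for a batch of external rows."""
        commit = Commit(instant=len(self.timeline) + 1,
                        records={str(r[key]): r for r in rows})
        self.timeline.append(commit)
        return commit

    def snapshot(self) -> Dict[str, dict]:
        """Merge the base with every commit in order; later commits win."""
        view = dict(self.base)
        for commit in self.timeline:
            view.update(commit.records)
        return view


# Two batches of external data arrive as two commits.
table = Table()
table.ingest([{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}])
table.ingest([{"id": 2, "city": "Quito"}, {"id": 3, "city": "Hanoi"}])
latest = table.snapshot()
```

In this sketch the external rows only become part of the table through `ingest`, so every change is traceable to a commit on the timeline, and the current view of the table is derived by replaying those commits over the base.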
Related concepts
Status (draft)
SemanticBeeng
#todo clarify how `data schema` is defined and used when external datasets are registered & ingested in `Hudi`.