- Modify input record: This may involve dropping fields from the input data if they don’t have corresponding table columns, adding nulls for columns whose fields are missing, and adding __HIVE_DEFAULT_PARTITION__ if a partition column value is null or empty. Creating partitions dynamically requires understanding the incoming data format so that the partition values can be extracted from the last columns.
- Encode modified record: The encoding involves serialization using an appropriate Hive SerDe.
- Identify the bucket to which the record belongs: for bucketed tables, extract the bucket column values from the record.
- Identify the partition to which the record belongs: for partitioned tables in dynamic partitioning mode, extract the partition column values from the last N columns of the record (where N is the number of partition columns).
- Write encoded record to Hive using the AcidOutputFormat's record updater for the appropriate bucket.
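
The record-preparation steps in the list above can be sketched in a few lines. This is only an illustration in Python, not the Hive API: the function names (`modify_record`, `partition_values`, `bucket_id`), the dict-based record, and the column names are all hypothetical, and CRC32 is a stand-in hash chosen for the example (Hive uses its own bucketing hash). Serialization with a SerDe and the write through the record updater are left out.

```python
import zlib

HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def modify_record(record, data_columns, partition_columns):
    """Align an input record (a dict) with the table layout.

    Fields with no matching column are dropped, missing data columns become
    None, and null or empty partition values are replaced by the default
    partition name. Partition values are placed last in the row.
    """
    row = [record.get(col) for col in data_columns]
    for col in partition_columns:
        value = record.get(col)
        row.append(HIVE_DEFAULT_PARTITION if value in (None, "") else value)
    return row


def partition_values(row, num_partition_columns):
    """Return the last N values of the row (N = number of partition columns)."""
    if num_partition_columns == 0:
        return []
    return row[-num_partition_columns:]


def bucket_id(row, bucket_column_indexes, num_buckets):
    """Pick a bucket from the bucket column values.

    Assumption: CRC32 is only a placeholder hash for this sketch; it is not
    the hash function Hive uses to assign buckets.
    """
    key = "\x01".join(str(row[i]) for i in bucket_column_indexes)
    return zlib.crc32(key.encode("utf-8")) % num_buckets


# Hypothetical table: data columns (id, msg, ts), partitioned by (continent, country).
record = {"id": 7, "msg": "hello", "extra": "ignored", "country": ""}
row = modify_record(record, ["id", "msg", "ts"], ["continent", "country"])
# row == [7, "hello", None, "__HIVE_DEFAULT_PARTITION__", "__HIVE_DEFAULT_PARTITION__"]
parts = partition_values(row, 2)
bucket = bucket_id(row, [0], 4)
```

Here the unknown field `extra` is dropped, the missing `ts` column becomes a null, and both partition values fall back to the default partition name because one is absent and the other is empty.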