...

  1. Modify input record: This may involve dropping fields from the input data if they have no corresponding table columns, adding nulls for columns whose fields are missing, and substituting __HIVE_DEFAULT_PARTITION__ when a partition column value is null or empty. Creating partitions dynamically requires understanding the incoming data format, since the partition values must be extracted from the last columns of each record.
  2. Encode modified record: The encoding involves serialization using an appropriate Hive SerDe.
  3. For bucketed tables, extract the bucket column values from the record to identify the bucket to which the record belongs.
  4. For partitioned tables in dynamic partitioning mode, extract the partition column values from the last N columns of the record (where N is the number of partition columns) to identify the partition to which the record belongs.
  5. Write the encoded record to Hive using the AcidOutputFormat's record updater for the appropriate bucket (see the sketch after this list).
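
To make the flow concrete, here is a minimal client-side sketch using the V2 HiveStreamingConnection API (see the Streaming Data Ingest V2 child page). The database, table, agent name, delimiter, and record contents are illustrative; each write() below goes through the record-modification, encoding, bucket/partition resolution, and record-updater steps listed above.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class StreamingIngestSketch {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();

    // Delimited text records; the writer's SerDe performs the encoding (step 2).
    StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
        .withFieldDelimiter(',')
        .build();

    // Illustrative database/table names. With no static partition values
    // supplied, partitions are resolved dynamically from each record's last
    // columns (steps 1 and 4).
    HiveStreamingConnection connection = HiveStreamingConnection.newBuilder()
        .withDatabase("testdb")
        .withTable("alerts")
        .withAgentInfo("example-agent-1")
        .withRecordWriter(writer)
        .withHiveConf(conf)
        .connect();

    connection.beginTransaction();
    // In this sketch the last column carries the dynamic partition value.
    connection.write("1,val1,2023".getBytes());
    connection.write("2,val2,2023".getBytes());
    connection.commitTransaction();
    connection.close();
  }
}
```

Step 3's bucket resolution follows Hive's general scheme: a non-negative hash of the bucket column values taken modulo the table's bucket count. The snippet below is a simplified illustration only; Hive hashes typed column values through its ObjectInspector machinery, so Arrays.hashCode here is a stand-in.

```java
import java.util.Arrays;

public class BucketSketch {
  // Simplified bucket resolution (step 3): non-negative hash of the bucket
  // column values, modulo the number of buckets. Arrays.hashCode is a
  // stand-in for Hive's typed hashing.
  static int resolveBucket(Object[] bucketColumnValues, int numBuckets) {
    int hash = Arrays.hashCode(bucketColumnValues);
    return (hash & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    // A record whose bucket column value is "user42", in a 4-bucket table.
    System.out.println(resolveBucket(new Object[] {"user42"}, 4));
  }
}
```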

...