RFC - 20 : handle failed records

Proposers

Approvers

Status

Current state


Current State

UNDER DISCUSSION


IN PROGRESS

(tick)

ABANDONED


COMPLETED


INACTIVE


Discussion thread: here

JIRA: HUDI-648 - Getting issue details... STATUS

Released: 0.6.x

Abstract

Present a proposal for handling failed records in writer path.

Background

To handle failed records properly to facilitate investigation.

Implementation

Error record

Define an avro schema for error records

error record schema
{
  "type": "record",
  "namespace": "org.apache.hudi.common",
  "name": "ErrorRecord",
  "fields": [
    {
      "name": "uid",
      "type": "string"
    },
    {
      "name": "ts",
      "type": "string"
    },
    {
      "name": "schema",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "record",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "message",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "context",
      "type": ["null", {"type": "map", "values": "string"}],
      "default": null
    }
}
  • `uid`: uuid for the error record
  • `ts`: creation unix timestamp for the error record
  • `schema`: original schema for the record if any
  • `record`: original serialized record in json if any
  • `message`: additional message or any string like error stacktrace to be attached
  • `context`: kv pairs for any related context info like commitTime, tableName, partitionpath, recordKey, etc

Errors table

Users can configure, based on their preferences, error tables as local or global ones.

Local error table

By default, if error table is enabled, it will be a local error table. Failed records will be written to a local Hudi table alongside with the original Hudi table with a suffix (configurable) like `_errors`.

Global error table

To write to a global error table, users can configure `hoodie.write.error.table.base.path=<some file system path>` and `hoodie.write.error.table.name=foobar`. If either of these 2 configs were set, error table is set to global mode and `hoodie.write.error.table.suffix` will be omitted.

Configurations

key
default
hoodie.write.error.table.enabledset to true to activate error table handling featurefalse
hoodie.write.error.table.suffixsuffix for local error table name, stored alongside the target table. If the Hudi table is "foo", errored records will be saved to "foo_errors" at the same base dir as configured via `hoodie.base.path`"_errors"
hoodie.write.error.table.nameerror table name"hoodie_errors"
hoodie.write.error.table.base.pathbase path for global error tablesame as `hoodie.base.path`


Write path

Start with

  • org.apache.hudi.client.HoodieWriteClient#postWrite

  • org.apache.hudi.client.HoodieWriteClient#completeCompaction

CLI support

  • Consider adding CLI support for easy inspection

Metrics

  • Emit a count metric for the number of failed records

Rollout/Adoption Plan

  • Use configuration turn on this feature `hoodie.write.error.table.enabled=true`
  • Default to false for smooth roll-out

Test Plan

  • Functional test cases to cover both local and global cases.







4 Comments

  1. Raymond Xu I think our mailing list discussions at this point are lot ahead of this RFC.. I suggest we sketch a high level approach here and move onto to prototyping?

    1. Vinoth Chandar Updated to reflect the latest discussion points.

  2. Raymond Xu Left some comments around whether or not, we need to special case local vs global in implementation. Please let me know what you think.

    1. Vinoth Chandar yup updated the content with less config entries