Overview

This is a proposal to add an API to Hive that allows reading from and writing to Hive through a Hadoop-compatible API. Specifically, the interfaces being implemented are:

...

The classes will be named HiveApiInputFormat and HiveApiOutputFormat.

InputFormat (reading from Hive)

Usage

At a high level, to read from Hive using this API:

  1. Create a HiveInputDescription object
  2. Fill it with information about the table to read from
  3. Initialize HiveApiInputFormat with that information
  4. Go to town using HiveApiInputFormat with your Hadoop-compatible reading system.
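The first three steps can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: `TableReadRequest` and `ReadRequestConfig` are hypothetical stand-ins for HiveInputDescription and the initialization step, `java.util.Properties` stands in for the Hadoop job configuration, and the `sketch.hive.input.` key prefix is invented for the example. It also assumes that initialization means serializing the description into the job configuration so that the input format can recover it later.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for HiveInputDescription (steps 1 and 2):
// it only records which database, table and columns to read.
final class TableReadRequest {
    final String database;
    final String table;
    final List<String> columns;

    TableReadRequest(String database, String table, List<String> columns) {
        this.database = database;
        this.table = table;
        this.columns = columns;
    }
}

// Hypothetical stand-in for step 3: the description is written into the
// job configuration, and the input format side reads it back from there.
final class ReadRequestConfig {
    // Assumed key prefix, made up for this sketch.
    static final String PREFIX = "sketch.hive.input.";

    static void store(Properties conf, TableReadRequest request) {
        conf.setProperty(PREFIX + "database", request.database);
        conf.setProperty(PREFIX + "table", request.table);
        conf.setProperty(PREFIX + "columns", String.join(",", request.columns));
    }

    static TableReadRequest load(Properties conf) {
        String joined = conf.getProperty(PREFIX + "columns", "");
        List<String> columns = joined.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(joined.split(","));
        return new TableReadRequest(
                conf.getProperty(PREFIX + "database"),
                conf.getProperty(PREFIX + "table"),
                columns);
    }
}
```

With this shape, the caller builds a request, calls `ReadRequestConfig.store(conf, request)` once during job setup, and the reading side calls `ReadRequestConfig.load(conf)` to find out what to read.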

...

  • The HiveInputDescription describes the database, table and columns to select. It also has a partition filter property that can be used to read from only the partitions that match the filter statement.
  • HiveApiInputFormat supports reading from multiple tables by having a concept of profiles. Each profile stores its input description in a separate section, and the HiveApiInputFormat has a member which tells it which profile to read from. When initializing the input data in HiveApiInputFormat you can pair it with a profile. If no profile is selected then a default profile is used.
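One way to picture the profile mechanism is as a set of key-prefixed sections in the job configuration plus a single setting that names the active profile. The sketch below assumes exactly that. `ProfileSections`, its method names, the `sketch.hive.input.` keys and the `"default"` profile name are all hypothetical, and `java.util.Properties` again stands in for the Hadoop configuration.

```java
import java.util.Properties;

// Hypothetical sketch of profile-scoped input descriptions.
final class ProfileSections {
    // Assumed name of the profile used when none is selected.
    static final String DEFAULT_PROFILE = "default";
    private static final String ROOT = "sketch.hive.input.";
    private static final String ACTIVE_KEY = ROOT + "activeProfile";

    // Each profile gets its own key prefix, i.e. its own section.
    private static String key(String profile, String field) {
        return ROOT + profile + "." + field;
    }

    // Store a table name and partition filter under the given profile.
    static void describe(Properties conf, String profile,
                         String table, String partitionFilter) {
        conf.setProperty(key(profile, "table"), table);
        conf.setProperty(key(profile, "partitionFilter"), partitionFilter);
    }

    // Tell the reader which profile to use.
    static void select(Properties conf, String profile) {
        conf.setProperty(ACTIVE_KEY, profile);
    }

    // Falls back to the default profile when nothing was selected.
    static String activeProfile(Properties conf) {
        return conf.getProperty(ACTIVE_KEY, DEFAULT_PROFILE);
    }

    static String activeTable(Properties conf) {
        return conf.getProperty(key(activeProfile(conf), "table"));
    }

    static String activePartitionFilter(Properties conf) {
        return conf.getProperty(key(activeProfile(conf), "partitionFilter"));
    }
}
```

Because every profile writes to its own section, two tables can be described in the same configuration without overwriting each other, and switching between them only changes the active-profile setting.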

Future plans:

  • Lots of performance work. Expose more direct, byte[]-level semantics.

...

  • Filtering of rows returned.

OutputFormat (writing to Hive)

TODO