HBase Input Storage Driver for HCatalog

Author: Vandana Ayyalasomayajula

This document presents the design for the HBase input storage driver for HCatalog.

Problem Statement

HCatalog presently does not support reading records from HBase tables. As a first step for implementing repeatable reads (snapshots) from HBase tables, there is a need to develop an input storage driver for HCatalog to enable the client applications to read from HBase tables.

Approach

HBase tables can be read through HCatalog by creating a storage driver for HBase. The new input storage driver will use “TableInputFormat” to read rows from an HBase table. The input format provides a table record reader, which returns each HBase row as a <rowkey, Result> pair. The storage driver contains a method, "convertToHCatRecord", which converts this pair to an HCatRecord.

Diagram

Discussion Topics

Result Converter

The conversion of HBase rows into HCat records depends on the following:

  • The output schema, which determines the final columns that the user needs.
  • The mapping from HBase columns to HCatalog table columns.

For efficiency, the output schema could be used to set the columns of the scan operation, so that only the required columns are read from the HBase table. Every result obtained from the HBase table would then be of interest to the user. The result converter utility class would convert each result to an HCat record.
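The column selection described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the driver's actual code: the class name ScanColumnSelector, the representation of the mapping as a map from HCatalog column name to an HBase "family:qualifier" string, and the ":key" marker for the row key are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of HCatalog or HBase.
class ScanColumnSelector {

    /**
     * Returns the HBase columns ("family:qualifier") that back the requested
     * HCatalog columns, in the order the output schema lists them.
     */
    static List<String> columnsToScan(List<String> outputColumns,
                                      Map<String, String> hcatToHBase) {
        List<String> scanColumns = new ArrayList<>();
        for (String hcatColumn : outputColumns) {
            String hbaseColumn = hcatToHBase.get(hcatColumn);
            if (hbaseColumn == null) {
                throw new IllegalArgumentException(
                        "No HBase mapping for column: " + hcatColumn);
            }
            // Assumption: the row key is marked ":key". It comes back with
            // every row, so it does not need its own scan column.
            if (!":key".equals(hbaseColumn)) {
                scanColumns.add(hbaseColumn);
            }
        }
        return scanColumns;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("id", ":key");
        mapping.put("name", "info:name");
        mapping.put("email", "info:email");
        mapping.put("visits", "stats:visits");

        // The user asks for three of the four table columns.
        List<String> wanted = Arrays.asList("id", "name", "visits");
        System.out.println(columnsToScan(wanted, mapping));
        // prints [info:name, stats:visits]
    }
}
```

In a real driver, each returned entry would presumably be split into family and qualifier and added to the HBase scan (for example with Scan.addColumn) before the scan is handed to the input format.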

Newly Added Classes

HBase Input Driver

This class is an extension of “HCatInputStorageDriver” for HBase. The input driver uses the HBase "TableInputFormat" class to read HBase tables.

HBaseInputDriver.java
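public class HBaseInputDriver extends HCatInputStorageDriver {

	// Job configuration, captured in initialize().
	private Configuration jobConf;
	// Properties passed to getInputFormat().
	private Properties driverProps;
	// Output schema requested by the user, set in setOutputSchema().
	private HCatSchema desiredColSchema;
	// HBase column mapping, provided through the table properties (lookup not shown).
	private String mapping;
	// Converts HBase results to HCatalog records.
	private ResultConverter converter;

	@Override
	public void initialize(JobContext context, Properties storageDriverArgs) throws IOException {
		jobConf = context.getConfiguration();
		converter = new ResultConverter(this.desiredColSchema, mapping);
	}

	@Override
	public InputFormat getInputFormat(Properties howlProperties) {
		this.driverProps = howlProperties;
		TableInputFormat tableInputFormat = new TableInputFormat();
		tableInputFormat.setConf(jobConf);
		return tableInputFormat;
	}

	@Override
	public HCatRecord convertToHCatRecord(WritableComparable baseKey,
			Writable baseValue) throws IOException {
		// The value of each <rowkey, Result> pair carries the row contents.
		return this.converter.convert((Result) baseValue);
	}

	@Override
	public void setOutputSchema(JobContext jobContext, HCatSchema howlSchema)
			throws IOException {
		desiredColSchema = howlSchema;
	}

	@Override
	public void setPartitionValues(JobContext jobContext,
			Map partitionValues) throws IOException {
		// Throw an exception.
	}
}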


Result Converter

This class takes the output schema required by the user and the HBase column mapping provided through the table properties, and uses them to convert an HBase record to an HCatalog record. It is a utility class used by the HBase input driver, the HBase record reader, and the HCat result scanner.

ResultConverter.java
class ResultConverter {

	private HCatSchema dataSchema = null;
	private String colMapping = null;

	public ResultConverter(HCatSchema schema, String mapping) {
		this.dataSchema = schema;
		this.colMapping = mapping;
	}

	/**
	 * @param result - HBase result to convert
	 * @return HCatRecord that wraps the input
	 */
	public HCatRecord convert(Result result) {
		// Placeholder: perform the conversion and return the HCatRecord.
		throw new UnsupportedOperationException("Not yet implemented");
	}

	/**
	 * Given an HCatRecord, convert it to a Put for HBase.
	 * @param record - HCatRecord to convert
	 * @return Put to be given to HBase
	 */
	public Put convert(HCatRecord record) {
		// Placeholder: perform the conversion and return the Put.
		throw new UnsupportedOperationException("Not yet implemented");
	}
}
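
The read-side conversion could look roughly like the following sketch. It does not use the HBase or HCatalog classes: an HBase row is modelled as a row key plus a map from "family:qualifier" to cell bytes, and the record is a plain list of strings in output-schema order. The class name RowToRecordSketch, the ":key" marker, and the decision to decode every cell as a UTF-8 string and to return null for missing cells are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the result conversion; illustration only.
class RowToRecordSketch {

    /**
     * Builds one record from one row. Fields appear in the order of
     * outputColumns; a column with no cell in the row yields null.
     */
    static List<String> convert(byte[] rowKey,
                                Map<String, byte[]> cells,
                                List<String> outputColumns,
                                Map<String, String> hcatToHBase) {
        List<String> record = new ArrayList<>(outputColumns.size());
        for (String hcatColumn : outputColumns) {
            String hbaseColumn = hcatToHBase.get(hcatColumn);
            if (hbaseColumn == null) {
                throw new IllegalArgumentException(
                        "No HBase mapping for column: " + hcatColumn);
            }
            // Assumption: ":key" maps a column to the row key itself.
            byte[] raw = ":key".equals(hbaseColumn) ? rowKey : cells.get(hbaseColumn);
            record.add(raw == null ? null : new String(raw, StandardCharsets.UTF_8));
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("id", ":key");
        mapping.put("name", "info:name");
        mapping.put("email", "info:email");

        // A row that has a name cell but no email cell.
        Map<String, byte[]> cells = new HashMap<>();
        cells.put("info:name", "Ada".getBytes(StandardCharsets.UTF_8));

        List<String> record = convert(
                "row-1".getBytes(StandardCharsets.UTF_8),
                cells,
                Arrays.asList("id", "name", "email"),
                mapping);
        System.out.println(record);
        // prints [row-1, Ada, null]
    }
}
```

A real converter would also have to honor the field types declared in the output schema rather than treating every cell as a string.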
