Status

Current state: Discuss

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLIP-117-HBase-catalog-tc38908.html

JIRA: FLINK-16575

Released: 1.11

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Similarly to FLIP-93, this FLIP aims to ease the interoperability of Flink with popular storage systems such as HBase.

Proposal

We propose to add an `HBaseCatalog` implementation.

Design

`HBaseCatalog` will be a read-only catalog that supports the following operations on databases and tables:

  • database
    • listDatabases (i.e. namespaces)
    • databaseExists
  • table
    • getTable
    • listTables
    • tableExists

Users can only read these objects; manipulating them is not yet supported in this FLIP.
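The read-only surface listed above can be sketched in a stand-alone form. This is only an illustration under stated assumptions: `ReadOnlyCatalogSketch` is a hypothetical class that does not extend Flink's `AbstractCatalog`, an in-memory map stands in for HBase metadata, `getTable` is left out because it needs Flink's table types, and `createTable` is included only to show a write being rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch: it does NOT extend Flink's AbstractCatalog
// and uses an in-memory map in place of a live HBase connection.
class ReadOnlyCatalogSketch {
    // namespace -> table names, standing in for HBase metadata
    private final Map<String, List<String>> namespaces = new TreeMap<>();

    ReadOnlyCatalogSketch(Map<String, List<String>> namespaces) {
        this.namespaces.putAll(namespaces);
    }

    // Databases map to HBase namespaces.
    List<String> listDatabases() {
        return new ArrayList<>(namespaces.keySet());
    }

    boolean databaseExists(String db) {
        return namespaces.containsKey(db);
    }

    List<String> listTables(String db) {
        if (!databaseExists(db)) {
            throw new IllegalArgumentException("Unknown database: " + db);
        }
        return new ArrayList<>(namespaces.get(db));
    }

    boolean tableExists(String db, String table) {
        return databaseExists(db) && namespaces.get(db).contains(table);
    }

    // Any write operation is rejected because the catalog is read-only.
    void createTable(String db, String table) {
        throw new UnsupportedOperationException("HBaseCatalog is read-only");
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new TreeMap<>();
        metadata.put("default", Arrays.asList("orders", "users"));
        ReadOnlyCatalogSketch catalog = new ReadOnlyCatalogSketch(metadata);
        System.out.println(catalog.listDatabases());
        System.out.println(catalog.tableExists("default", "orders"));
    }
}
```

A real implementation would throw Flink's catalog exceptions instead of the generic ones used here.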

Configurations

To configure an `HBaseCatalog`, users only need to provide a `Map<String, String>` of properties (`tableProperties`):

  1. connector.zookeeper.quorum
  2. connector.zookeeper.znode.parent
  3. ... (Kerberos authentication or other features could require additional properties)

default_database will default to 'default'.
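As an illustration of how the properties above might be resolved, here is a minimal sketch. It rests on assumptions: `CatalogPropertiesSketch` and `withDefaults` are hypothetical names, the ZooKeeper quorum is treated as mandatory, and the default database is stored under the `default-database` key used in the SQL CLI example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: resolves catalog properties.
class CatalogPropertiesSketch {
    static final String QUORUM = "connector.zookeeper.quorum";
    static final String ZNODE_PARENT = "connector.zookeeper.znode.parent";
    // Assumption: same key as in the SQL CLI yaml example.
    static final String DEFAULT_DATABASE = "default-database";

    static Map<String, String> withDefaults(Map<String, String> userProps) {
        // Assumption: the catalog cannot reach HBase without a quorum.
        if (!userProps.containsKey(QUORUM)) {
            throw new IllegalArgumentException("Missing required property: " + QUORUM);
        }
        // Copy so the caller's map is left untouched.
        Map<String, String> resolved = new HashMap<>(userProps);
        // The default database falls back to the 'default' namespace.
        resolved.putIfAbsent(DEFAULT_DATABASE, "default");
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(QUORUM, "host1,host2,host3:2181");
        props.put(ZNODE_PARENT, "/hbase");
        System.out.println(withDefaults(props).get(DEFAULT_DATABASE));
    }
}
```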

Interfaces, Classes, and Modules

All classes will live in the flink-HBase module, in particular `org.apache.flink.table.catalog.hbase.HBaseCatalog`, which extends `AbstractCatalog`.

Does HBase support the concept of partitions? I don't think so.
Does HBase support functions? I don't think so.
Does HBase support statistics? I don't think so.
Does HBase support views? I don't think so.

Using the catalog

Table API

// java
tEnv.registerCatalog(name, new HBaseCatalog(...));

SQL CLI yaml configs

catalogs:
    - name: ...
      type: hbase
      zk.quorum: host1,host2,host3:2181 # the port can be omitted if it is the default 2181
      default-database: default # optional, mapped to hbase namespace

Versions

The HBase connector currently depends on HBase version 1.4.3. We rely on the driver itself for backward compatibility. Users can also build jars against their own HBase versions.

The proposed mapping is the following:

| Flink Catalog Metaspace Structure | HBase Metaspace Structure |
|---|---|
| catalog name (defined in Flink only) | n/a |
| database name | HBase namespace |
| table name | table name |


The full path of an HBase table in Flink would be "<catalog>.<db>.<table>", or simply "<catalog>.<table>" if using the default namespace.
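The path rule above could be resolved to an HBase table name roughly as follows. This is a sketch under assumptions: `HBasePathSketch` is a hypothetical helper, path segments are assumed to contain no dots or quoting, and the result uses HBase's `namespace:table` notation with `default` as the default namespace.

```java
// Hypothetical helper, not part of Flink or HBase.
class HBasePathSketch {
    static final String DEFAULT_NAMESPACE = "default";

    // Maps "<catalog>.<db>.<table>" or "<catalog>.<table>" to "namespace:table".
    static String toHBaseTableName(String flinkPath) {
        // Limit -1 keeps trailing empty segments so "a.b." is rejected below.
        String[] parts = flinkPath.split("\\.", -1);
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in path: " + flinkPath);
            }
        }
        if (parts.length == 3) {
            return parts[1] + ":" + parts[2];
        }
        if (parts.length == 2) {
            // No database given: fall back to the default namespace.
            return DEFAULT_NAMESPACE + ":" + parts[1];
        }
        throw new IllegalArgumentException("Expected 2 or 3 segments: " + flinkPath);
    }

    public static void main(String[] args) {
        System.out.println(toHBaseTableName("hbase.ns1.orders"));
        System.out.println(toHBaseTableName("hbase.orders"));
    }
}
```

Since HBase table names may themselves contain dots, a real implementation would need quoting or a different splitting strategy.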


HBase supports only byte[], so every Flink type must be converted to a byte[].
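To make the conversion concrete, the sketch below encodes a few Java values as byte[] and decodes them again. It is an assumption-laden illustration: `ByteConversionSketch` is a hypothetical class, only String, Integer and Long are covered, and the big-endian, UTF-8 layout is chosen to resemble what HBase's `Bytes` utility produces rather than taken from this FLIP.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes a handful of value types as byte[].
class ByteConversionSketch {
    static byte[] toBytes(Object value) {
        if (value instanceof byte[]) {
            return (byte[]) value;
        }
        if (value instanceof String) {
            return ((String) value).getBytes(StandardCharsets.UTF_8);
        }
        if (value instanceof Integer) {
            // ByteBuffer is big-endian by default: 4 bytes, most significant first.
            return ByteBuffer.allocate(Integer.BYTES).putInt((Integer) value).array();
        }
        if (value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    static int toInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static String toStringValue(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toInt(toBytes(42)));
        System.out.println(toStringValue(toBytes("row-1")));
    }
}
```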

How do we map columns (family:qualifier) to Flink table columns?

Column families are quite complex objects in HBase. Do we care about all their properties?
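One possible answer to the column mapping question, offered only as a sketch, is to follow the existing Flink HBase connector, which models each column family as a ROW-typed column whose fields are the qualifiers. The code below groups `family:qualifier` names accordingly and renders a schema string. `ColumnMappingSketch`, the `rowkey` column name, and typing every field as BYTES are assumptions made for illustration, not decisions of this FLIP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups "family:qualifier" names into nested columns.
class ColumnMappingSketch {
    static Map<String, List<String>> groupByFamily(List<String> columns) {
        // LinkedHashMap keeps families in first-seen order.
        Map<String, List<String>> families = new LinkedHashMap<>();
        for (String column : columns) {
            // Split on the first ':' only; qualifiers may contain further colons.
            int sep = column.indexOf(':');
            if (sep <= 0 || sep == column.length() - 1) {
                throw new IllegalArgumentException("Expected family:qualifier, got: " + column);
            }
            families.computeIfAbsent(column.substring(0, sep), f -> new ArrayList<>())
                    .add(column.substring(sep + 1));
        }
        return families;
    }

    // Renders one ROW column per family; every field is typed BYTES (assumption).
    static String toFlinkSchema(Map<String, List<String>> families) {
        List<String> columns = new ArrayList<>();
        columns.add("rowkey BYTES"); // hypothetical name for the row key column
        for (Map.Entry<String, List<String>> family : families.entrySet()) {
            List<String> fields = new ArrayList<>();
            for (String qualifier : family.getValue()) {
                fields.add(qualifier + " BYTES");
            }
            columns.add(family.getKey() + " ROW<" + String.join(", ", fields) + ">");
        }
        return String.join(", ", columns);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("cf1:q1", "cf1:q2", "cf2:q1");
        System.out.println(toFlinkSchema(groupByFamily(columns)));
    }
}
```

This sketch ignores column family properties entirely, so the question of which of them matter remains open.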

Rejected Alternatives

n/a
