Status
Current state: Discuss
Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLIP-117-HBase-catalog-tc38908.html
JIRA: FLINK-16575
Released: 1.11
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
Similarly to FLIP-93, this FLIP aims to ease the interoperability of Flink with popular storage systems such as HBase.
Proposal
We propose to add an `HBaseCatalog` implementation.
Design
`HBaseCatalog` will be a read-only catalog that supports the following operations on databases and tables:
- database
  - listDatabases (i.e. namespaces)
  - databaseExists
- table
  - getTable
  - listTables
  - tableExists
Users can only read these objects from the catalog; this FLIP does not yet allow them to create, alter, or drop them.
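The read-only contract described above can be sketched in plain Java. This is only an illustration under stated assumptions: `ReadOnlyCatalogSketch` is a hypothetical stand-in that holds an in-memory map from namespace to table names, whereas the proposed `HBaseCatalog` would extend `AbstractCatalog` and query HBase. `getTable` is left out because it depends on the type mapping discussed later, and `createTable` stands in for every write operation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the proposed catalog (not a Flink class).
// It serves reads from an in-memory snapshot and rejects writes.
final class ReadOnlyCatalogSketch {
    // namespace -> table names, kept sorted so listings are deterministic
    private final Map<String, List<String>> tablesByNamespace = new TreeMap<>();

    ReadOnlyCatalogSketch(Map<String, List<String>> snapshot) {
        for (Map.Entry<String, List<String>> e : snapshot.entrySet()) {
            tablesByNamespace.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
    }

    // A Flink database corresponds to an HBase namespace.
    List<String> listDatabases() {
        return new ArrayList<>(tablesByNamespace.keySet());
    }

    boolean databaseExists(String database) {
        return tablesByNamespace.containsKey(database);
    }

    List<String> listTables(String database) {
        List<String> tables = tablesByNamespace.get(database);
        if (tables == null) {
            throw new IllegalArgumentException("Unknown database: " + database);
        }
        return Collections.unmodifiableList(tables);
    }

    boolean tableExists(String database, String table) {
        List<String> tables = tablesByNamespace.get(database);
        return tables != null && tables.contains(table);
    }

    // Representative write operation: always rejected in a read-only catalog.
    void createTable(String database, String table) {
        throw new UnsupportedOperationException("This catalog is read-only");
    }
}
```

Rejecting writes with an exception, rather than silently ignoring them, keeps the limitation visible to callers.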
Configurations
To configure an `HBaseCatalog`, users only need to provide a Map<String, String> tableProperties containing:
- connector.zookeeper.quorum
- connector.zookeeper.znode.parent
- ... (Kerberos authentication or other features could require other props)
default_database will default to 'default'.
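As a rough sketch of how these properties might be checked before the catalog is created, the helper below validates the two ZooKeeper keys and fills in the default database. `HBaseCatalogOptions` and `withDefaults` are hypothetical names, treating both ZooKeeper keys as required is an assumption, and so is using `default_database` as the literal property key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Flink): validates catalog properties
// and applies the default database described in this FLIP.
final class HBaseCatalogOptions {
    static final String ZK_QUORUM = "connector.zookeeper.quorum";
    static final String ZK_ZNODE_PARENT = "connector.zookeeper.znode.parent";
    // Assumption: the property key is spelled exactly as in the text above.
    static final String DEFAULT_DATABASE = "default_database";

    static Map<String, String> withDefaults(Map<String, String> userProps) {
        // Assumption: both ZooKeeper keys are mandatory.
        for (String key : new String[] {ZK_QUORUM, ZK_ZNODE_PARENT}) {
            String value = userProps.get(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing required property: " + key);
            }
        }
        // Copy so the caller's map is not modified.
        Map<String, String> resolved = new HashMap<>(userProps);
        resolved.putIfAbsent(DEFAULT_DATABASE, "default");
        return resolved;
    }
}
```

A caller would pass the resolved map on to the catalog constructor; an explicitly provided default database is kept unchanged.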
Interfaces, Classes, and Modules
All classes will live in the flink-hbase module, in particular org.apache.flink.table.catalog.hbase.HBaseCatalog, which extends AbstractCatalog.
Open questions:
Does HBase support the concept of partitions? I don't think so.
Does HBase support functions? I don't think so.
Does HBase support statistics? I don't think so.
Does HBase support views? I don't think so.
Using the catalog
Table API
SQL CLI yaml configs
Versions
The HBase connector currently depends on HBase version 1.4.3. We rely on the HBase client itself for backward compatibility. Users can also build jars against their own versions.
Flink-HBase Mapping
The proposed mapping is the following:
Flink Catalog Metaspace Structure | n/a |
catalog name (defined in Flink only) | n/a |
database name | HBase namespace |
table name | table name |
The full path of an HBase table in Flink would be "<catalog>.<db>.<table>", or simply "<catalog>.<table>" if using the default namespace.
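The path convention above could be resolved along the following lines. This is a minimal sketch, not the proposed implementation: `HBasePaths`, `resolve`, and `toHBaseName` are hypothetical names, and the sketch assumes that names contain no dots and that HBase writes a namespaced table as `namespace:table`, omitting the prefix for the `default` namespace.

```java
// Hypothetical helper (not part of Flink or HBase): maps a Flink path
// to an HBase namespace and table name.
final class HBasePaths {
    static final String DEFAULT_NAMESPACE = "default";

    // Returns {namespace, table} for "<catalog>.<db>.<table>" or "<catalog>.<table>".
    static String[] resolve(String catalogName, String path) {
        // Limit -1 keeps empty trailing segments so "a.b." is rejected below.
        String[] parts = path.split("\\.", -1);
        if (parts.length < 2 || parts.length > 3 || !parts[0].equals(catalogName)) {
            throw new IllegalArgumentException("Unexpected path: " + path);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in path: " + path);
            }
        }
        // Two segments mean the database was omitted, so use the default namespace.
        String namespace = parts.length == 3 ? parts[1] : DEFAULT_NAMESPACE;
        return new String[] {namespace, parts[parts.length - 1]};
    }

    // Renders the name in HBase notation; the default namespace has no prefix.
    static String toHBaseName(String namespace, String table) {
        return DEFAULT_NAMESPACE.equals(namespace) ? table : namespace + ":" + table;
    }
}
```

With a catalog named `hbase`, the path `hbase.sales.orders` would resolve to namespace `sales` and table `orders`, while `hbase.users` would fall back to the `default` namespace.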
Flink-HBase Data Type Mapping
HBase stores only byte[] values, so every Flink type must be converted to a byte[].
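To make the conversion concrete, here is a small sketch that encodes a few common types using only the JDK. `ByteCodec` is a hypothetical class, and the chosen encoding (big-endian fixed-width integers, UTF-8 strings) is an assumption; to my knowledge it matches what HBase's own `Bytes` utility produces, but the FLIP does not fix an encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical codec (not part of Flink or HBase) for a few primitive types.
final class ByteCodec {
    // ByteBuffer is big-endian by default, so the most significant byte comes first.
    static byte[] fromInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static int toInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    static byte[] fromLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Strings are stored as their UTF-8 bytes, without a length prefix.
    static byte[] fromString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the stored bytes carry no type information, the reader has to know the Flink type of each column in advance, which is why the catalog needs a defined type mapping.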
How do we map columns (family:qualifier) to Flink table columns?
Column families are quite complex objects in HBase. Do we care about all their properties?
Rejected Alternatives
n/a