JdbcStorageHandler supports reading from a jdbc data source in Hive. Currently, writing to a jdbc data source is not supported. To use JdbcStorageHandler, you need to create an external table using JdbcStorageHandler. Here is a simple example:
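A sketch of such a create table statement, assuming a MySQL database named sample containing a STUDENT table (the connection URL, credentials, and table name are illustrative):

```sql
CREATE EXTERNAL TABLE student_jdbc
(
  name string,
  age int,
  gpa double
)
STORED BY 'org.apache.hive.storage.handler.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
  "hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "hive",
  "hive.sql.table" = "STUDENT",
  "hive.sql.dbcp.maxActive" = "1"
);
```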
Table properties such as the password can also be set or changed later with ALTER TABLE:
ALTER TABLE student_jdbc SET TBLPROPERTIES ("hive.sql.dbcp.password" = "passwd");
In the create table statement, you are required to specify the following table properties:
hive.sql.database.type: the type of the underlying database (e.g., MYSQL, POSTGRES, ORACLE, MSSQL, DERBY)
hive.sql.jdbc.driver: the class name of the jdbc driver
hive.sql.jdbc.url: the jdbc connection url
hive.sql.dbcp.username: the jdbc user name
hive.sql.dbcp.password: the jdbc password in clear text (strongly discouraged; see the keystore approach described below)
hive.sql.table / hive.sql.query: either the name of the table or a query to read from; exactly one of the two must be specified
In addition, the following optional table properties are supported:
hive.sql.catalog: jdbc catalog name (only valid if "hive.sql.table" is specified)
hive.sql.schema: jdbc schema name (only valid if "hive.sql.table" is specified)
hive.sql.jdbc.fetch.size: number of rows to fetch in a batch
hive.sql.dbcp.xxx: all dbcp parameters will be passed to commons-dbcp. See https://commons.apache.org/proper/commons-dbcp/configuration.html for the definition of the parameters. For example, if you specify hive.sql.dbcp.maxActive=1 in the table properties, Hive will pass maxActive=1 to commons-dbcp
Supported Data Types
The column data type for a Hive JdbcStorageHandler table can be:
Numeric data type: tinyint, smallint, int, bigint, float, double
Decimal, with scale and precision
String data type: string, char, varchar
Date
Timestamp
Note that complex data types (struct, map, array) are not supported.
hive.sql.table / hive.sql.query defines tabular data with a schema. The schema definition has to be the same as the table schema definition. For example, the following create table statement will fail:
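As an illustrative sketch (the connection properties are assumed from the earlier example), a statement along these lines fails because the Hive schema declares three columns while the query returns only two:

```sql
CREATE EXTERNAL TABLE student_jdbc
(
  name string,
  age int,
  gpa double
)
STORED BY 'org.apache.hive.storage.handler.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
  "hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "hive",
  -- schema mismatch: two columns selected, three declared above
  "hive.sql.query" = "SELECT name, age FROM STUDENT"
);
```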
Hive will try to convert the double "gpa" of the underlying table STUDENT to decimal(4,3) as the effective_gpa field of the student_jdbc table. In case the conversion is not possible, Hive will produce NULL for the field.
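A table definition along these lines would exercise that conversion (a sketch; column names and the aliased query are assumed for illustration):

```sql
CREATE EXTERNAL TABLE student_jdbc
(
  name string,
  age int,
  effective_gpa decimal(4,3)
)
STORED BY 'org.apache.hive.storage.handler.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
  "hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "hive",
  -- the double column gpa is exposed as decimal(4,3) effective_gpa
  "hive.sql.query" = "SELECT name, age, gpa AS effective_gpa FROM STUDENT"
);
```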
JdbcStorageHandler will ship the required jars to the MR/Tez/LLAP backend automatically if JdbcStorageHandler is used in the query, so users don't need to add jars manually. JdbcStorageHandler will also ship the required jdbc driver jar to the backend if it detects any jdbc driver jar in the classpath (including mysql, postgres, oracle and mssql). However, users are still required to copy the jdbc driver jar to the Hive classpath (usually, the lib directory in Hive).
In most cases, we don't want to store the jdbc password in clear text in the table property "hive.sql.dbcp.password". Instead, the user can store the password in a Java keystore file on HDFS using the following command:
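A sketch using the Hadoop credential command, assuming a keystore path of jceks://hdfs/user/foo/test.jceks and a password alias of host1.password (both illustrative):

```shell
hadoop credential create host1.password -provider jceks://hdfs/user/foo/test.jceks -v passwd1
```

The table can then reference the stored password through the properties "hive.sql.dbcp.password.keystore" = "jceks://hdfs/user/foo/test.jceks" and "hive.sql.dbcp.password.key" = "host1.password" instead of "hive.sql.dbcp.password".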
You will need to protect the keystore file by authorizing only the targeted user to read it, using an authorizer such as Ranger. Hive will check the permissions of the keystore file to make sure the user has read permission on it when creating/altering the table.
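For instance, assuming the keystore lives at /user/foo/test.jceks on HDFS (an illustrative path), read access could be restricted to its owner:

```shell
hadoop fs -chmod 400 /user/foo/test.jceks
```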
Hive is able to split the jdbc data source and process each split in parallel. The user can use the following table properties to decide whether to split the data source and how many splits to create:
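A sketch of a split configuration, assuming the partitioning properties hive.sql.partitionColumn, hive.sql.numPartitions, hive.sql.lowerBound, and hive.sql.upperBound, and an integer column ikey to split on (all illustrative):

```sql
ALTER TABLE student_jdbc SET TBLPROPERTIES (
  "hive.sql.partitionColumn" = "ikey",
  "hive.sql.numPartitions" = "4",
  "hive.sql.lowerBound" = "60",
  "hive.sql.upperBound" = "100"
);
```

With these settings, the range 60-100 is divided into 4 intervals of width 10, with the first and last intervals left open-ended to cover values outside the bounds, which matches the split log shown below.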
jdbc.JdbcInputFormat: Num input splits created 4
jdbc.JdbcInputFormat: split:interval:ikey[,70)
jdbc.JdbcInputFormat: split:interval:ikey[70,80)
jdbc.JdbcInputFormat: split:interval:ikey[80,90)
jdbc.JdbcInputFormat: split:interval:ikey[90,)
Hive will push down computation to the jdbc table aggressively, so we can make the best use of the native capabilities of the jdbc data source.
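For example (an illustrative query against the table sketched earlier), a filter and aggregation such as the following may be translated into SQL and executed by the underlying database rather than in Hive:

```sql
SELECT COUNT(*) FROM student_jdbc WHERE age > 20;
```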