Hive Metastore Administration
Note: This page only documents the metastore in Hive 2.x and earlier. For 3.x and later releases, please see AdminManual Metastore 3.0 Administration.
Introduction
All the metadata for Hive tables and partitions is accessed through the Hive Metastore. Metadata is persisted using the JPOX ORM solution (DataNucleus), so any database that it supports can be used by Hive. Most commercial relational databases and many open-source databases are supported; see the list of supported databases in the section below.
You can find an E/R diagram for the metastore here.
...
Configuration options for metastore database where metadata is persisted:
- Local/Embedded Metastore Database (Derby)
- Remote Metastore Database
Configuration options for metastore server:
- Local/Embedded Metastore Server
- Remote Metastore Server
Basic Configuration Parameters
The relevant configuration parameters are shown here. (Non-metastore parameters are described in Configuring Hive. Also see the Language Manual's Hive Configuration Properties, including Metastore and Hive Metastore Security.)
...
The Hive metastore is stateless, so multiple instances can run at once to achieve high availability. Using hive.metastore.uris, you can specify multiple remote metastores. By default Hive uses the first one in the list, but on a connection failure it picks a random one and tries to reconnect.
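The failover policy described above can be sketched as a small selection function. This is a minimal illustration of the stated behavior, not Hive's client code: the function name `next_metastore`, the example host names, and the choice to exclude the URI that just failed are all assumptions.

```python
import random
from typing import Optional, Sequence


def next_metastore(uris: Sequence[str], failed: Optional[str] = None,
                   rng: Optional[random.Random] = None) -> str:
    """Pick a metastore URI following the policy described above.

    Hypothetical helper: returns the first URI when no failure has occurred;
    after a connection failure, returns a randomly chosen URI. Excluding the
    URI that just failed is an assumption, not documented Hive behavior.
    """
    if not uris:
        raise ValueError("hive.metastore.uris lists no metastores")
    if failed is None:
        return uris[0]  # default: the first entry in the list
    rng = rng or random.Random()
    # Prefer the other metastores; fall back to the full list if only one exists.
    candidates = [u for u in uris if u != failed] or list(uris)
    return rng.choice(candidates)


uris = ["thrift://ms1.example.com:9083", "thrift://ms2.example.com:9083"]
print(next_metastore(uris))  # thrift://ms1.example.com:9083
print(next_metastore(uris, failed="thrift://ms1.example.com:9083"))  # thrift://ms2.example.com:9083
```

Passing an explicit `random.Random` instance keeps the choice reproducible in tests.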
Additional Configuration Parameters
The following metastore configuration parameters were carried over from old documentation without a guarantee that they all still exist. See the HiveConf Java class for current Hive configuration options, and see the Metastore and Hive Metastore Security sections of the Language Manual's Hive Configuration Properties for user-friendly descriptions of the metastore parameters.
Configuration Parameter | Description | Default Value |
---|---|---|
hive.metastore.metadb.dir | The location of the filestore metadata base directory. (Functionality removed in 0.4.0 with HIVE-143.) | |
hive.metastore.rawstore.impl | Name of the class that implements the org.apache.hadoop.hive.metastore.RawStore interface. This class is used for storage and retrieval of raw metadata objects such as tables and databases. (Hive 0.8.1 and later.) | |
org.jpox.autoCreateSchema | Creates the necessary schema on startup if one doesn't exist. (The schema includes tables, columns, and so on.) Set to false after creating it once. | |
org.jpox.fixedDatastore | Whether the datastore schema is fixed. | |
datanucleus.autoStartMechanism | Whether to initialize on startup. | |
hive.metastore.ds.connection.url.hook | Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used as the connection URL. (Hive 0.6 and later.) | |
hive.metastore.ds.retry.attempts | The number of times to retry a call to the backing datastore if there was a connection error. | 1 |
hive.metastore.ds.retry.interval | The number of milliseconds between datastore retry attempts. | 1000 |
hive.metastore.server.min.threads | Minimum number of worker threads in the Thrift server's pool. | 200 |
hive.metastore.server.max.threads | Maximum number of worker threads in the Thrift server's pool. | 100000 since Hive 0.8.1 |
hive.metastore.filter.hook | Metastore hook class for further filtering the metadata read results on the client side. | org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl |
hive.metastore.port | Hive metastore listener port. | 9083 |
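To make the two retry parameters concrete, here is a minimal sketch of a retry loop using their defaults (1 retry, 1000 ms between attempts). It is a hypothetical helper, not Hive's datastore code; Python's built-in `ConnectionError` stands in for whatever connection error the backing datastore raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 1, interval_ms: int = 1000,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying up to `attempts` extra times on ConnectionError.

    Defaults mirror hive.metastore.ds.retry.attempts (1) and
    hive.metastore.ds.retry.interval (1000 ms). Hypothetical helper.
    """
    if attempts < 0:
        raise ValueError("attempts must be >= 0")
    for retry in range(attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if retry == attempts:
                raise  # retries exhausted: surface the last error
            sleep(interval_ms / 1000.0)  # wait before the next try
    raise AssertionError("unreachable")


calls = []


def flaky_query() -> str:
    # Stand-in for a datastore call that fails once, then succeeds.
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("datastore unavailable")
    return "ok"


print(call_with_retry(flaky_query))  # ok (after one 1-second wait)
```

With the default of one retry, a call is attempted at most twice before the error is surfaced.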
Data Nucleus Auto Start
Warning: Configuring auto start for DataNucleus is highly recommended. See HIVE-4762 for more details.
Default Configuration
The default configuration sets up an embedded metastore which is used in unit tests and is described in the next section. More practical options are described in the subsequent sections.
Local/Embedded Metastore Database (Derby)
An embedded metastore database is mainly used for unit tests. Only one process can connect to the metastore database at a time, so it is not a practical solution for real deployments, but it works well for unit tests.
For unit tests, the Local/Embedded Metastore Server configuration for the metastore server is used in conjunction with the embedded database.
...
If you want to run Derby as a network server so the metastore can be accessed from multiple nodes, see Hive Using Derby in Server Mode.
Remote Metastore Database
In this configuration, you use a traditional standalone RDBMS server. The following example configuration sets up a metastore in a MySQL server. This metastore database configuration is recommended for any real use.
Config Param | Config Value | Comment |
---|---|---|
javax.jdo.option.ConnectionURL | | metadata is stored in a MySQL server |
javax.jdo.option.ConnectionDriverName | | MySQL JDBC driver class |
javax.jdo.option.ConnectionUserName | | user name for connecting to MySQL server |
javax.jdo.option.ConnectionPassword | | password for connecting to MySQL server |
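The parameters in the table above are normally set in hive-site.xml, which uses the Hadoop `<configuration>`/`<property>` layout. The following sketch renders them into that layout with Python's standard library. All values are hypothetical placeholders (host `db.example.com`, database `metastore`, user `hive`, password `changeme`); the driver class shown is the MySQL Connector/J 5.x name.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hive_site_xml(props: Dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout of hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical placeholder values: replace host, database, user, and password.
mysql_metastore = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://db.example.com:3306/metastore",
    # Connector/J 5.x class name; Connector/J 8.x uses com.mysql.cj.jdbc.Driver.
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "changeme",
}
print(hive_site_xml(mysql_metastore))
```

Writing the same properties into hive-site.xml by hand is equally valid; the script only shows the expected structure.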
Local/Embedded Metastore Server
In a local/embedded metastore setup, the metastore server component is used like a library within the Hive client. Each Hive client opens a connection to the database and makes SQL queries against it. Because this is a local store, make sure that the database is accessible from the machines where Hive queries are executed, and that the JDBC client library is in the Hive client's classpath. This configuration is often used with HiveServer2. To use an embedded metastore only with HiveServer2, add "--hiveconf hive.metastore.uris=' '" to the command-line parameters of the hiveserver2 start command, or use hiveserver2-site.xml (available in Hive 0.14).
Config Param | Config Value | Comment |
---|---|---|
hive.metastore.uris | not needed because this is a local store | |
hive.metastore.local | | this is a local store (removed in Hive 0.10, see configuration description section) |
hive.metastore.warehouse.dir | | Points to the default location of non-external Hive tables in HDFS. |
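The `hive.metastore.uris=' '` trick above works because a blank URI list means "no remote metastore". A minimal sketch of that rule follows, assuming (as this example does, without quoting Hive's source) that the client trims whitespace before checking for emptiness; the function name is hypothetical.

```python
from typing import Optional


def uses_embedded_metastore(metastore_uris: Optional[str]) -> bool:
    """Assumed rule: an unset or blank hive.metastore.uris means the metastore
    runs in-process (local/embedded) rather than as a remote Thrift server."""
    return metastore_uris is None or metastore_uris.strip() == ""


print(uses_embedded_metastore(" "))                              # True
print(uses_embedded_metastore("thrift://ms1.example.com:9083"))  # False
```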
Remote Metastore Server
In a remote metastore setup, all Hive clients connect to a metastore server, which in turn queries the datastore (MySQL in this example) for metadata. The metastore server and client communicate using the Thrift protocol. Starting with Hive 0.5.0, you can start a Thrift server by executing the following command:
...
If you execute Java directly, then JAVA_HOME, HIVE_HOME, and HADOOP_HOME must be set correctly, and CLASSPATH should contain the Hadoop, Hive (lib and auxlib), and Java jars.
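Before launching the metastore with `java` directly, a quick preflight check can catch missing environment variables. This sketch only checks that JAVA_HOME, HIVE_HOME, and HADOOP_HOME are set and non-empty; it does not verify that the paths are correct or inspect CLASSPATH. The function name is hypothetical.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("JAVA_HOME", "HIVE_HOME", "HADOOP_HOME")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty.

    Only checks presence; it does not verify that the paths are correct.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these before running the metastore with java directly:", ", ".join(missing))
```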
Server Configuration Parameters
The following example uses a Remote Metastore Database.
Config Param | Config Value | Comment |
---|---|---|
javax.jdo.option.ConnectionURL | | metadata is stored in a MySQL server |
javax.jdo.option.ConnectionDriverName | | MySQL JDBC driver class |
javax.jdo.option.ConnectionUserName | | user name for connecting to MySQL server |
javax.jdo.option.ConnectionPassword | | password for connecting to MySQL server |
hive.metastore.warehouse.dir | | default location for Hive tables. |
From Hive 3.0.0 (HIVE-16452) onwards the metastore database stores a GUID which can be queried using the Thrift API get_metastore_db_uuid by metastore clients in order to identify the backend database instance. This API can be accessed by the HiveMetaStoreClient using the method getMetastoreDbUuid().
Client Configuration Parameters
Config Param | Config Value | Comment |
---|---|---|
hive.metastore.uris | | host and port for the Thrift metastore server |
hive.metastore.local | | Metastore is remote. Note: This is no longer needed as of Hive 0.10. Setting hive.metastore.uris is sufficient. |
hive.metastore.warehouse.dir | | Points to the default location of non-external Hive tables in HDFS. |
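Clients name the server in hive.metastore.uris as `thrift://host:port`, comma-separated when there are several. This sketch splits such a value into (host, port) pairs, for example to feed monitoring scripts. The fallback to 9083 (the hive.metastore.port default) when a port is omitted is an assumption of this sketch, and the host names are placeholders.

```python
from typing import List, Tuple
from urllib.parse import urlparse

DEFAULT_PORT = 9083  # default of hive.metastore.port


def parse_metastore_uris(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated hive.metastore.uris value into (host, port) pairs."""
    endpoints: List[Tuple[str, int]] = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate stray commas and whitespace
        parsed = urlparse(raw)
        if parsed.scheme != "thrift" or not parsed.hostname:
            raise ValueError(f"expected thrift://host:port, got {raw!r}")
        # Falling back to 9083 when no port is given is an assumption of this sketch.
        endpoints.append((parsed.hostname, parsed.port or DEFAULT_PORT))
    return endpoints


print(parse_metastore_uris("thrift://ms1.example.com:9083,thrift://ms2.example.com:9084"))
# [('ms1.example.com', 9083), ('ms2.example.com', 9084)]
```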
...
hive --service metastore -p <port_num>
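After starting the server with the command above, a basic smoke test is to confirm that the chosen port accepts TCP connections (9083 is the default hive.metastore.port). This sketch only proves that something is listening; it makes no Thrift call, so it cannot confirm the metastore is healthy. The function name is hypothetical, and `localhost` stands in for the metastore host.

```python
import socket


def metastore_port_open(host: str, port: int = 9083, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that something is listening on the port; it does not
    perform a Thrift call, so it cannot confirm the metastore is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


# Replace "localhost" with the host running the metastore service.
print(metastore_port_open("localhost", 9083))
```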
Supported Backend Databases for Metastore
Database | Minimum Supported Version | Name for Parameter Values | See Also |
---|---|---|---|
MySQL | 5.6.17 | mysql | |
Postgres | 9.1.13 | postgres | |
Oracle | 11g | oracle | hive.metastore.orm.retrieveMapNullsAsEmptyStrings |
MS SQL Server | 2008 R2 | mssql | |
Metastore Schema Consistency and Upgrades
Info: Introduced in Hive 0.12.0. See HIVE-3764.
...