h1. Hive Metastore
{toc}
h3. Introduction

All the metadata for Hive tables and partitions is stored in the Hive Metastore. Metadata is persisted using the [JPOX|http://www.datanucleus.org/] ORM solution, so any store that JPOX supports can be used by Hive. Most commercial relational databases and many open source datastores are supported. Any datastore that has a JDBC driver can probably be used.

You can find an E/R diagram for the metastore [here|https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf].

There are 3 different ways to set up a metastore server using different Hive configurations. The relevant configuration parameters are:

| Config Param | Description |
| javax.jdo.option.ConnectionURL | JDBC connection string for the data store which contains metadata |
| javax.jdo.option.ConnectionDriverName | JDBC Driver class name for the data store which contains metadata |
| hive.metastore.uris | Hive connects to this URI to make metadata requests for a remote metastore |
| hive.metastore.local | whether the metastore is local (true) or remote (false) |
| hive.metastore.warehouse.dir | URI of the default location for native tables |

The default configuration sets up an embedded metastore, which is used in unit tests and is described in the next section. More practical options are described in the subsequent sections.
h3. Embedded Metastore
The embedded metastore is mainly used for unit tests. Only one process can connect to the metastore at a time, so it is not a practical solution for real deployments, but it works well for unit tests.
| Config Param | Config Value | Comment |
| javax.jdo.option.ConnectionURL | jdbc:derby:;databaseName=../build/test/junit_metastore_db;create=true | derby database located at hive/trunk/build... |
| javax.jdo.option.ConnectionDriverName | org.apache.derby.jdbc.EmbeddedDriver | Derby embedded JDBC driver class |
| hive.metastore.uris | not needed since this is a local metastore | |
| hive.metastore.local | true | embedded is local |
| hive.metastore.warehouse.dir | file://${user.dir}/../build/ql/test/data/warehouse | unit test data goes in here |

If you want to run the metastore as a network server so that it can be accessed from multiple nodes, try HiveDerbyServerMode.
h3. Local Metastore
In the local metastore setup, each Hive client opens a connection to the datastore and makes SQL queries against it. The following config sets up a metastore in a MySQL server. Make sure that the server is accessible from the machines where Hive queries are executed, since this is a local store. Also make sure that the JDBC client library is in the classpath of the Hive client.

| Config Param | Config Value | Comment |
| javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
| javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
| javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to mysql server |
| javax.jdo.option.ConnectionPassword | <password> | password for connecting to mysql server |
| hive.metastore.uris | not needed because this is a local store | |
| hive.metastore.local | true | this is a local store |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
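
As an illustration, the local metastore settings in the table above could be written as a hive-site.xml fragment along the following lines. This is a minimal sketch, assuming the properties are set in hive-site.xml; HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD and BASE_HDFS_PATH are placeholders standing in for the bracketed values in the table, not real values.

{code:xml}
<!-- Sketch only: local metastore backed by MySQL. -->
<!-- Uppercase words are placeholders to be replaced with site-specific values. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://HOST_NAME/DATABASE_NAME?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>USER_NAME</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>PASSWORD</value>
  </property>
  <property>
    <!-- true: each Hive client talks to the datastore directly -->
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>BASE_HDFS_PATH</value>
  </property>
</configuration>
{code}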
h3. Remote Metastore
In the remote metastore setup, all Hive clients make a connection to a metastore server, which in turn queries the datastore (MySQL in this example) for metadata. The metastore server and client communicate using the [Thrift|http://incubator.apache.org/thrift] protocol. Starting with Hive 0.5.0, you can start a Thrift server by executing the following command:

{code}
hive --service metastore
{code}

In versions of Hive earlier than 0.5.0, it is instead necessary to run the Thrift server by executing Java directly:

{code}
$JAVA_HOME/bin/java  -Xmx1024m -Dlog4j.configuration=file://$HIVE_HOME/conf/hms-log4j.properties -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/ -cp $CLASSPATH org.apache.hadoop.hive.metastore.HiveMetaStore
{code}

If you execute Java directly, then JAVA_HOME, HIVE_HOME, HADOOP_HOME must be correctly set; CLASSPATH should contain Hadoop, Hive (lib and auxlib), and Java jars.

Server Configuration Parameters
| Config Param | Config Value | Comment |
| javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
| javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
| javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to mysql server |
| javax.jdo.option.ConnectionPassword | <password> | password for connecting to mysql server |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
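
The server-side settings above might look as follows in the hive-site.xml read by the metastore server. This is a sketch under the assumption that the server picks up its configuration from hive-site.xml; HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD and BASE_HDFS_PATH are placeholders for the bracketed values in the table.

{code:xml}
<!-- Sketch only: configuration for the remote metastore server process. -->
<!-- Uppercase words are placeholders to be replaced with site-specific values. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://HOST_NAME/DATABASE_NAME?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>USER_NAME</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>PASSWORD</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>BASE_HDFS_PATH</value>
  </property>
</configuration>
{code}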

Client Configuration Parameters
| Config Param | Config Value | Comment |
| hive.metastore.uris | thrift://<host_name>:<port> | host and port for the thrift metastore server |
| hive.metastore.local | false | this is a remote store |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables. |
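
On the client side, the corresponding hive-site.xml fragment could look roughly like this. It is a sketch only: HOST_NAME, PORT and BASE_HDFS_PATH are placeholders for the bracketed values in the table, and no JDBC properties appear because the client talks to the metastore server over Thrift rather than to the datastore.

{code:xml}
<!-- Sketch only: Hive client configuration for a remote metastore. -->
<!-- Uppercase words are placeholders to be replaced with site-specific values. -->
<configuration>
  <property>
    <!-- Thrift endpoint of the metastore server -->
    <name>hive.metastore.uris</name>
    <value>thrift://HOST_NAME:PORT</value>
  </property>
  <property>
    <!-- false: metadata requests go to the remote metastore server -->
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>BASE_HDFS_PATH</value>
  </property>
</configuration>
{code}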

If you are using MySQL as the datastore for metadata, put the MySQL client libraries in HIVE_HOME/lib before starting the Hive client or the metastore server.
h2. Metastore Deployment Options in Pictures
[^metastore_usage.pptx|Metastore Deployments and Usage]