Hive Metastore Administration


Introduction

All the metadata for Hive tables and partitions is stored in the Hive metastore. Metadata is persisted using the JPOX ORM solution (since renamed DataNucleus), so any datastore that it supports can be used by Hive. Most commercial relational databases and many open source datastores are supported. Any datastore that has a JDBC driver can probably be used.

...

There are three different ways to set up the metastore server, using different Hive configurations:

...


The relevant configuration parameters are shown here. (Non-metastore parameters are described in Configuring Hive. Also see the Language Manual's Hive Configuration Properties, including Metastore and Hive Metastore Security.)

| Config Param | Description |
|---|---|
| javax.jdo.option.ConnectionURL | JDBC connection string for the data store which contains metadata |
| javax.jdo.option.ConnectionDriverName | JDBC driver class name for the data store which contains metadata |
| hive.metastore.uris | Hive connects to this URI to make metadata requests for a remote metastore |
| hive.metastore.local | Local or remote metastore. (Removed as of Hive 0.10: if hive.metastore.uris is empty, local mode is assumed; otherwise remote mode is assumed.) |
| hive.metastore.warehouse.dir | URI of the default location for native tables |

 

These variables were carried over from old documentation without a guarantee that they all still exist (see the HiveConf Java class for current Hive configuration options):

| Variable Name | Description | Default Value |
|---|---|---|
| hive.metastore.metadb.dir | | |
| hive.metastore.usefilestore | | |
| hive.metastore.rawstore.impl | | |
| org.jpox.autoCreateSchema | Creates the necessary schema (tables, columns, etc.) on startup if one doesn't exist. Set to false after creating it once. | |
| org.jpox.fixedDatastore | Whether the datastore schema is fixed. | |
| datanucleus.autoStartMechanism | Whether to initialize on startup. | |
| hive.metastore.checkForDefaultDb | | |
| hive.metastore.ds.connection.url.hook | Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used as the connection URL. | |
| hive.metastore.ds.retry.attempts | The number of times to retry a call to the backing datastore if there was a connection error. | 1 |
| hive.metastore.ds.retry.interval | The number of milliseconds between datastore retry attempts. | 1000 |
| hive.metastore.server.min.threads | Minimum number of worker threads in the Thrift server's pool. | 200 |
| hive.metastore.server.max.threads | Maximum number of worker threads in the Thrift server's pool. | 10000 |
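As a rough illustration of the advice on org.jpox.autoCreateSchema in the table above, a hive-site.xml fragment applied after the schema has been created once might turn off automatic schema creation and mark the schema as fixed. This is a sketch only: it assumes the settings go in hive-site.xml and that your Hive version still recognizes these property names, which, as noted above, is not guaranteed. Check HiveConf for the names your release actually uses.

```xml
<!-- Hypothetical hive-site.xml fragment. Property names are copied from the
     table above and may differ in your Hive version; verify against HiveConf. -->
<configuration>
  <!-- The schema already exists, so do not try to create it on startup. -->
  <property>
    <name>org.jpox.autoCreateSchema</name>
    <value>false</value>
  </property>
  <!-- Treat the existing datastore schema as fixed. -->
  <property>
    <name>org.jpox.fixedDatastore</name>
    <value>true</value>
  </property>
</configuration>
```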

 

 

Warning: Configuring datanucleus.autoStartMechanism is highly recommended

Configuring auto start for DataNucleus is highly recommended. See HIVE-4762 for more details.

<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>

 

The default configuration sets up an embedded metastore, which is used in unit tests and is described in the next section. More practical options are described in the subsequent sections.

Embedded Metastore

An embedded metastore is mainly used for unit tests. Only one process can connect to the metastore at a time, so it is not a practical solution for general use, but it works well for unit tests.

Derby is the default database for the embedded metastore.

| Config Param | Config Value | Comment |
|---|---|---|
| javax.jdo.option.ConnectionURL | jdbc:derby:;databaseName=../build/test/junit_metastore_db;create=true | Derby database located at hive/trunk/build... |
| javax.jdo.option.ConnectionDriverName | org.apache.derby.jdbc.EmbeddedDriver | Derby embedded JDBC driver class |
| hive.metastore.uris | not needed since this is a local metastore | |
| hive.metastore.local | true | embedded is local |
| hive.metastore.warehouse.dir | file://${user.dir}/../build/ql/test/data/warehouse | unit test data goes in here on your local filesystem |
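Collected into a single file, the embedded settings in the table above might look like the following hive-site.xml fragment. This is a sketch that assumes the settings live in hive-site.xml under the usual `<configuration>` root element; the values are copied from the table, and hive.metastore.local only matters for Hive versions earlier than 0.10.

```xml
<!-- Sketch of an embedded (Derby) metastore configuration for unit tests. -->
<configuration>
  <!-- Embedded Derby database under the build tree; create=true creates it on first use. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=../build/test/junit_metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
  <!-- hive.metastore.uris is deliberately left unset: this is a local metastore. -->
  <!-- Removed as of Hive 0.10; omit on newer versions. -->
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <!-- Unit test warehouse data on the local filesystem. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file://${user.dir}/../build/ql/test/data/warehouse</value>
  </property>
</configuration>
```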

If you want to run Derby as a network server so the metastore can be accessed from multiple nodes, see Hive Using Derby in Server Mode.

Local Metastore

In a local metastore setup, each Hive client opens a connection to the datastore and makes SQL queries against it. The following configuration sets up a metastore in a MySQL server. Because this is a local store, make sure that the server is accessible from the machines where Hive queries are executed. Also make sure the JDBC client library is in the Hive client's classpath.

| Config Param | Config Value | Comment |
|---|---|---|
| javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
| javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
| javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to the MySQL server |
| javax.jdo.option.ConnectionPassword | <password> | password for connecting to the MySQL server |
| hive.metastore.uris | not needed because this is a local store | |
| hive.metastore.local | true | this is a local store |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables |
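Written as a hive-site.xml fragment, the local MySQL settings in the table above might look like this sketch. It assumes hive-site.xml as the target file. HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD, and BASE_HDFS_PATH are placeholders for the angle-bracketed values in the table, since literal angle brackets would not be well-formed XML.

```xml
<!-- Sketch of a local metastore backed by MySQL. Replace the CAPITALIZED placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://HOST_NAME/DATABASE_NAME?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>USER_NAME</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>PASSWORD</value>
  </property>
  <!-- hive.metastore.uris is left unset: every client talks to MySQL directly. -->
  <!-- Removed as of Hive 0.10; omit on newer versions. -->
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>BASE_HDFS_PATH</value>
  </property>
</configuration>
```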

Remote Metastore

In a remote metastore setup, all Hive clients make a connection to a metastore server, which in turn queries the datastore (MySQL in this example) for metadata. The metastore server and clients communicate using the Thrift protocol. Starting with Hive 0.5.0, you can start a Thrift server by executing the following command:

    hive --service metastore

In versions of Hive earlier than 0.5.0, it's instead necessary to run the Thrift server via direct execution of Java:

    $JAVA_HOME/bin/java -Xmx1024m \
        -Dlog4j.configuration=file://$HIVE_HOME/conf/hms-log4j.properties \
        -Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/ \
        -cp $CLASSPATH org.apache.hadoop.hive.metastore.HiveMetaStore

If you execute Java directly, JAVA_HOME, HIVE_HOME, and HADOOP_HOME must be set correctly, and CLASSPATH should contain the Hadoop, Hive (lib and auxlib), and Java jars.

Server Configuration Parameters

| Config Param | Config Value | Comment |
|---|---|---|
| javax.jdo.option.ConnectionURL | jdbc:mysql://<host name>/<database name>?createDatabaseIfNotExist=true | metadata is stored in a MySQL server |
| javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver | MySQL JDBC driver class |
| javax.jdo.option.ConnectionUserName | <user name> | user name for connecting to the MySQL server |
| javax.jdo.option.ConnectionPassword | <password> | password for connecting to the MySQL server |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables |
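On the metastore server host, the parameters in the table above might be collected into a hive-site.xml fragment like this sketch. The capitalized values are placeholders for the angle-bracketed values in the table. The table lists no hive.metastore.uris entry for the server, since the server is the component that talks to the datastore directly.

```xml
<!-- Sketch of the metastore SERVER's configuration. Replace the CAPITALIZED placeholders. -->
<configuration>
  <!-- Only the server needs datastore (JDBC) credentials in the remote setup. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://HOST_NAME/DATABASE_NAME?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>USER_NAME</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>PASSWORD</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>BASE_HDFS_PATH</value>
  </property>
</configuration>
```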

Client Configuration Parameters

| Config Param | Config Value | Comment |
|---|---|---|
| hive.metastore.uris | thrift://<host_name>:<port> | host and port for the Thrift metastore server |
| hive.metastore.local | false | this is a remote store, not a local one |
| hive.metastore.warehouse.dir | <base hdfs path> | default location for Hive tables |
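A matching client-side hive-site.xml fragment might look like the following sketch. Note that it carries no javax.jdo.option.* settings: in the remote setup only the metastore server queries the datastore. HOST_NAME, PORT, and BASE_HDFS_PATH are placeholders; PORT should be the port the metastore server actually listens on (for example, the one passed with -p when starting it).

```xml
<!-- Sketch of a remote metastore CLIENT's configuration. Replace the CAPITALIZED placeholders. -->
<configuration>
  <!-- Send all metadata requests to the Thrift metastore server. -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://HOST_NAME:PORT</value>
  </property>
  <!-- Removed as of Hive 0.10, where a non-empty hive.metastore.uris implies remote mode. -->
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>BASE_HDFS_PATH</value>
  </property>
</configuration>
```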

If you are using MySQL as the datastore for metadata, put the MySQL client libraries in HIVE_HOME/lib before starting the Hive client or the Hive metastore server.

To change the metastore port, use this hive command:

    hive --service metastore -p <port_num>

Metastore Schema Consistency and Upgrades

...