...

This document applies only to the Metastore in Hive 3.0 and later releases. For Hive 0, 1, and 2 releases please see the Metastore Administration document.

Introduction

The definitions of Hive objects such as databases, tables, and functions are stored in the Metastore. Depending on how the system is configured, statistics and authorization records may also be stored there. Hive and other execution engines can then use this data at runtime to determine how to parse, authorize, and efficiently execute user queries.

...

Beginning in Hive 3.0, the Metastore can be run without the rest of Hive being installed. It is provided as a separate release in order to allow non-Hive systems to easily integrate with it. (It is, however, still included in the Hive release for convenience.) Making the Metastore a standalone service involved changing a number of configuration parameter names and tool names. All of the old configuration parameters and tools still work, in order to maximize backwards compatibility. This document covers both the old and new names. As new functionality is added, it will be given only new names; old, Hive-style names will not be added for it.

For details on using the Metastore without Hive, see Running the Metastore Without Hive below.

...

Parameter	Hive 2 Parameter	Default Value	Description
metastore.warehouse.dir	hive.metastore.warehouse.dir		URI of the default location for tables in the default catalog and database.
datanucleus.schema.autoCreateAll	datanucleus.schema.autoCreateAll	false	Auto creates the necessary schema in the RDBMS at startup if one does not exist. Set this to false after creating it once. To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended in production; run `schematool` instead.
metastore.schema.verification	hive.metastore.schema.verification	true	Enforce metastore schema version consistency. When set to true: verify that version information stored in the RDBMS is compatible with the version of the Metastore jar. Also disable automatic schema migration. Users are required to manually migrate the schema after upgrade, which ensures proper schema migration. This setting is strongly recommended in production. When set to false: warn if the version information stored in Metastore RDBMS doesn't match the version of the Metastore jar and allow auto schema migration.
metastore.hmshandler.retry.attempts	hive.hmshandler.retry.attempts	10	The number of times to retry a call to the metastore when there is a connection error.
metastore.hmshandler.retry.interval	hive.hmshandler.retry.interval	2 sec	Time between retry attempts.
metastore.log4j.file	hive.log4j.file	none	Log4j configuration file. If unset, the Metastore will look for `metastore-log4j2.properties` in $METASTORE_HOME/conf.
metastore.stats.autogather	hive.stats.autogather	true	Whether to automatically gather basic statistics during insert commands.
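
As a rough illustration, a few of the parameters above could be set in a Hadoop-style configuration file as sketched here. The file name (metastore-site.xml) and the warehouse URI are assumptions made for the example; the other values simply restate the defaults recommended for production in the table.

```xml
<!-- Sketch of a configuration file, assumed here to be metastore-site.xml -->
<configuration>
  <property>
    <name>metastore.warehouse.dir</name>
    <!-- Hypothetical URI; replace with your own warehouse location -->
    <value>hdfs://namenode.example.com:8020/warehouse</value>
  </property>
  <property>
    <!-- Keep schema verification on, as recommended for production -->
    <name>metastore.schema.verification</name>
    <value>true</value>
  </property>
  <property>
    <!-- Leave auto creation off and create the schema with schematool -->
    <name>datanucleus.schema.autoCreateAll</name>
    <value>false</value>
  </property>
</configuration>
```

Systems that still use the Hive 2 names would set hive.metastore.warehouse.dir and hive.metastore.schema.verification instead.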

...

Configuration Parameter	Comment
javax.jdo.option.ConnectionURL	Connection URL for the JDBC driver
javax.jdo.option.ConnectionDriverName	JDBC driver class
javax.jdo.option.ConnectionUserName	User name to connect to the RDBMS with, often 'hive' is used
javax.jdo.option.ConnectionPassword	Password to connect to the RDBMS with. The Metastore uses Hadoop's CredentialProvider API so this does not have to be stored in clear text in your configuration file.
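
The connection parameters above might look like the following for a PostgreSQL backing store. This is only a sketch: the host name, port, and database name are hypothetical, and the choice of PostgreSQL is an assumption made for the example.

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- Hypothetical host and database name -->
    <value>jdbc:postgresql://db.example.com:5432/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <!-- javax.jdo.option.ConnectionPassword is deliberately left out of this
       file; it can be supplied through Hadoop's CredentialProvider API
       instead of being stored in clear text. -->
</configuration>
```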

...

Except in the case of HiveServer2, using this mode raises a few concerns. First, having many clients will put a burden on the backing RDBMS since each client will have its own set of connections. Second, every client must have read/write access to the RDBMS. This makes it hard to properly secure the RDBMS. Therefore embedded mode is not recommended in production use cases with the exception of HiveServer2.

...

Configured On	Parameter	Hive 2 Parameter	Format	Default Value	Comment
Client	metastore.thrift.uris	hive.metastore.uris	thrift://<HOST>:<PORT>[, thrift://<HOST>:<PORT>...]	none	HOST = hostname, PORT = port. PORT should be set to match metastore.thrift.port on the server (which defaults to 9083). You can provide multiple servers in a comma-separated list.
Server	metastore.thrift.port	hive.metastore.port	integer	9083	Port Thrift will listen on.
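
To make the pairing of the two parameters concrete, here is a minimal sketch with a hypothetical host name. The server-side fragment sets the port Thrift listens on:

```xml
<!-- Server side -->
<configuration>
  <property>
    <name>metastore.thrift.port</name>
    <value>9083</value>
  </property>
</configuration>
```

The client-side fragment then points at that host and port:

```xml
<!-- Client side; metastore1.example.com is a hypothetical host -->
<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <value>thrift://metastore1.example.com:9083</value>
  </property>
</configuration>
```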

Once you have configured your clients, you can start the Metastore on a server using the start-metastore utility. See the -help option of that utility for available options. There is no stop-metastore script. Instead, you must locate the process id for the metastore and kill that process.

...

The Metastore service is stateless. This allows you to start multiple instances of the service to provide for high availability. It also allows you to configure some clients to embed the metastore (e.g. HiveServer2) while still running a Metastore service for other clients. If you are running multiple Metastore services, you can put all their URIs into your client's metastore.thrift.uris value and then set metastore.thrift.uri.selection (hive.metastore.uri.selection in Hive 2) to RANDOM or SEQUENTIAL. RANDOM will cause your client to randomly select one of the servers in the list, while SEQUENTIAL will cause it to start at the beginning of the list and attempt to connect to each server in order.
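
A client configured for two Metastore instances could look roughly like this. The host names are hypothetical, and RANDOM is chosen only for illustration.

```xml
<configuration>
  <property>
    <name>metastore.thrift.uris</name>
    <!-- Hypothetical hosts, given as a comma-separated list -->
    <value>thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083</value>
  </property>
  <property>
    <!-- RANDOM picks a server at random; SEQUENTIAL tries them in list order -->
    <name>metastore.thrift.uri.selection</name>
    <value>RANDOM</value>
  </property>
</configuration>
```

With SEQUENTIAL, the first URI in the list is always tried first, so the order of the list matters.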

Securing the Service

TODO: Need to fill in details for setting up with Kerberos, SSL, etc.

CLIENT_KERBEROS_PRINCIPAL, KERBEROS_*, SSL*, USE_SSL, USE_THRIFT_SASL

Running the Metastore Without Hive

Beginning in Hive 3.0, the Metastore is released as a separate package and can be run without the rest of Hive. This is referred to as standalone mode.

...

By default the Metastore is configured for use with Hive, so a few configuration parameters have to be changed in this configuration.

Configuration Parameter	Set to for Standalone Mode
metastore.task.threads.always	org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask
metastore.expression.proxy	org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy
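
Expressed as a configuration fragment, the two settings in the table above would look like the following sketch. The values are taken directly from the table; only the surrounding file layout is assumed.

```xml
<configuration>
  <property>
    <name>metastore.task.threads.always</name>
    <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
  </property>
  <property>
    <name>metastore.expression.proxy</name>
    <value>org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy</value>
  </property>
</configuration>
```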

Currently the following features have not been tested or are known not to work with the Metastore in standalone mode:

The compactor (for use with ACID tables) cannot be run without Hive. ACID tables can be read and written to, but they cannot be compacted.
Replication has not been tested outside of Hive.

Less Commonly Changed Configuration Parameters

...
