Installation from Tarball
HCatalog Installed with Hive
HCatalog is installed with Hive, starting with Hive release 0.11.0.
HCatalog Command Line
If you install Hive from the binary tarball, the hcat command is available in the hcatalog/bin directory.
The hcat command line is similar to the hive command line; the main difference is that it restricts the queries that can be run to metadata-only operations such as DDL and DML queries used to read metadata (for example, "show tables").
The HCatalog CLI is documented here and the Hive CLI is documented here.
Alternatively, most hcat commands can be issued as hive commands, except for "hcat -g" and "hcat -p". Note that the hcat command uses the -p flag for permissions, but hive uses it to specify a port number.
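As a minimal sketch, assuming Hive 0.11.0 was unpacked from the binary tarball into the directory named by HIVE_HOME, the helper below runs hcat from hcatalog/bin with whatever options you pass. The function name hcat_run and the DRY_RUN switch (which prints the command instead of running it) are hypothetical conveniences; -e (execute a statement) is a standard hcat option, and the group and permission values in the comments are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: invoke the hcat CLI shipped in the Hive 0.11.0 binary tarball.
# Assumption: HIVE_HOME points at the unpacked tarball, so hcat lives in
# $HIVE_HOME/hcatalog/bin. DRY_RUN=1 (a hypothetical switch) prints the
# command line instead of running it.

hcat_run() {
  local hcat="${HIVE_HOME:?set HIVE_HOME to your Hive installation}/hcatalog/bin/hcat"
  local cmd=("$hcat" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show what would be executed
  else
    "${cmd[@]}"                 # run hcat with the caller's options
  fi
}

# Examples:
#   hcat_run -e "show tables;"
#   # -g sets the group and -p the permissions of the table being created;
#   # hive has no equivalent, and "hive -p" means a port number instead.
#   hcat_run -g hadoop -p rwxr-x--- -e "create table demo (id int);"
```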
Hive installation is documented here.
Server Installation from Source
This information is adapted from the HCatalog 0.5.0 installation instructions. Now that HCatalog is part of the Hive project, it is installed with Hive, and therefore much of the information below is obsolete. Furthermore, there is no such thing as "HCatalog 0.11.0", although that fiction has been substituted below for "HCatalog 0.5.0". Until this wikidoc is revised to reflect current realities, you should be skeptical of what you read here.
Prerequisites
- A machine to put the installation tarball on
- A machine on which the server can be installed; it should have access to the Hadoop cluster in question and be accessible from the machines you launch jobs from
- An RDBMS; we recommend MySQL and provide instructions for it
- A Hadoop cluster
- A Unix user that the server will run as and, if you are running your cluster in secure mode, an associated Kerberos service principal and keytabs
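Before starting, a quick preflight check can confirm the basics on the server machine. This is a sketch under assumptions: it expects the java, hadoop, and mysql commands on PATH and a service account named hive (overridable through the hypothetical HCAT_USER variable); it does not check Kerberos principals or keytabs.

```shell
#!/usr/bin/env bash
# Sketch: preflight checks for the prerequisites listed above.
# Assumptions: the java, hadoop and mysql clients should be on PATH, and the
# service account is called "hive" unless HCAT_USER says otherwise.
# Kerberos principals and keytabs are not checked.

require_cmd() {
  # Succeeds if the named command can be found on PATH.
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}

require_user() {
  # Succeeds if the named Unix account exists.
  id "$1" >/dev/null 2>&1 || { echo "missing user: $1" >&2; return 1; }
}

preflight() {
  # Runs every check and reports all failures before returning non-zero.
  local rc=0 c
  for c in java hadoop mysql; do
    require_cmd "$c" || rc=1
  done
  require_user "${HCAT_USER:-hive}" || rc=1
  return "$rc"
}
```

Running preflight prints one line per missing item, so a single pass shows everything that still needs attention.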
Throughout these instructions, a word in italics (such as hive_home, root, dbroot, dbpassword, or portnum) marks a place where you should substitute a locally appropriate value, such as a hostname or password.
Tarball Location
A binary tarball for HCatalog 0.11.0 is provided in the build directory:
build/hcatalog-0.11.0.x.x.x.x-xxx.tar.gz
If it is in another location, move it to the HCatalog build directory.
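Because the x.x.x.x-xxx part of the file name varies by build, a glob is a convenient way to locate the tarball. The sketch below, with hypothetical function names, finds a tarball in a given directory and moves it into the build directory, creating that directory if needed.

```shell
#!/usr/bin/env bash
# Sketch: find an HCatalog 0.11.0 tarball and move it into the build dir.
# The x.x.x.x-xxx part of the file name varies, so a glob is used.

find_hcat_tarball() {
  # Prints the first hcatalog-0.11.0.*.tar.gz in the given directory.
  local dir="${1:-.}" f
  for f in "$dir"/hcatalog-0.11.0.*.tar.gz; do
    # An unmatched glob stays literal, so check the file really exists.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

stage_hcat_tarball() {
  # Moves the tarball found in SRC_DIR into BUILD_DIR (created if needed).
  local src_dir="$1" build_dir="$2" f
  f=$(find_hcat_tarball "$src_dir") || { echo "no HCatalog tarball in $src_dir" >&2; return 1; }
  mkdir -p "$build_dir" && mv "$f" "$build_dir"/
}
```

For example, stage_hcat_tarball ~/Downloads build moves a downloaded tarball into ./build.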
Database Setup
If you do not already have Hive installed with MySQL, the following will walk you through how to do so. If you have already set this up, you can skip this step.
Select a machine to install the database on. This need not be the same machine as the Thrift server, which we will set up later. For large clusters we recommend that they not be the same machine. For the purposes of these instructions we will refer to this machine as hivedb.acme.com.
Install MySQL server on hivedb.acme.com. You can obtain packages for MySQL from MySQL's download site. We have developed and tested with versions 5.1.46 and 5.1.48. We suggest you use these versions or later. Once you have MySQL up and running, use the mysql command line tool to add the hive user and hivemetastoredb database. You will need to pick a password for your hive user, and replace dbpassword in the following commands with it.
mysql -u root
mysql> CREATE USER 'hive'@'hivedb.acme.com' IDENTIFIED BY 'dbpassword';
mysql> CREATE DATABASE hivemetastoredb DEFAULT CHARACTER SET latin1 DEFAULT COLLATE latin1_swedish_ci;
mysql> GRANT ALL PRIVILEGES ON hivemetastoredb.* TO 'hive'@'hivedb.acme.com' WITH GRANT OPTION;
mysql> flush privileges;
mysql> quit;
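The statements above can also be generated from variables and piped into the mysql client, which avoids retyping the password placeholder. This is a sketch: DB_HOST, DB_NAME, and DB_PASS are hypothetical variable names, and the password is assumed to contain no single quotes because it is not escaped. Keep in mind that in MySQL the host part of 'hive'@'host' names the client host the account may connect from, so if your Thrift server runs on a machine other than hivedb.acme.com, the account may also need to exist for that machine.

```shell
#!/usr/bin/env bash
# Sketch: emit the metastore user/database statements shown above.
# DB_HOST, DB_NAME and DB_PASS are hypothetical variables; the password is
# assumed not to contain single quotes (it is not escaped).

metastore_sql() {
  local host="${DB_HOST:-hivedb.acme.com}"
  local db="${DB_NAME:-hivemetastoredb}"
  local pass="${DB_PASS:?set DB_PASS to your chosen dbpassword}"
  # Unquoted heredoc: variables expand, but the * is not globbed.
  cat <<EOF
CREATE USER 'hive'@'${host}' IDENTIFIED BY '${pass}';
CREATE DATABASE ${db} DEFAULT CHARACTER SET latin1 DEFAULT COLLATE latin1_swedish_ci;
GRANT ALL PRIVILEGES ON ${db}.* TO 'hive'@'${host}' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF
}

# Usage, as the MySQL root user:
#   DB_PASS='dbpassword' metastore_sql | mysql -u root
```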
Use the database installation script found in the Hive package to create the metastore schema in the database. In the command below, hive_home refers to the directory where you have installed Hive. If you are using Hive rpms, this will be /usr/lib/hive.
mysql -u hive -D hivemetastoredb -h hivedb.acme.com -p < hive_home/scripts/metastore/upgrade/mysql/hive-schema-0.11.0.mysql.sql
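A small wrapper can confirm that the schema script exists before handing it to mysql. This sketch assumes the path layout shown above; DB_HOST and DB_NAME are hypothetical overrides, DRY_RUN=1 prints the command instead of running it, and running from the script's own directory is a precaution in case the script refers to sibling files.

```shell
#!/usr/bin/env bash
# Sketch: load the Hive 0.11.0 MySQL schema into the metastore database.
# DB_HOST and DB_NAME are hypothetical overrides; DRY_RUN=1 prints the
# command instead of running it.

load_metastore_schema() {
  local hive_home="${1:?usage: load_metastore_schema HIVE_HOME}"
  local dir="$hive_home/scripts/metastore/upgrade/mysql"
  local schema="hive-schema-0.11.0.mysql.sql"
  local host="${DB_HOST:-hivedb.acme.com}" db="${DB_NAME:-hivemetastoredb}"

  # Fail early if the script is not where the instructions expect it.
  [ -r "$dir/$schema" ] || { echo "schema script not found: $dir/$schema" >&2; return 1; }

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "cd $dir && mysql -u hive -D $db -h $host -p < $schema"
  else
    # Run from the script's own directory in case it refers to sibling files.
    ( cd "$dir" && mysql -u hive -D "$db" -h "$host" -p < "$schema" )
  fi
}
```

mysql prompts for the hive user's password (dbpassword) because of -p.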
Thrift Server Setup
If you are not already running a Hive metastore server using Thrift, you can use the following instructions to set up and run one. You may skip this step if you are already using a Hive metastore server.
Select a machine to install your Thrift server on. For smaller and test installations this can be the same machine as the database. For the purposes of these instructions we will refer to this machine as hcatsvr.acme.com.
If you have not already done so, install Hive 0.11.0 on this machine. You can use the binary distributions provided by Hive or rpms available from Apache Bigtop. If you use the Apache Hive binary distribution, select a directory, henceforth referred to as hive_home, and untar the distribution there. If you use the rpms, hive_home will be /usr/lib/hive.
Install the MySQL Java connector libraries on hcatsvr.acme.com. You can obtain these from MySQL's download site.
Select a user to run the Thrift server as. This user should not be a human user, and must be able to act as a proxy for other users. We suggest the name "hive" for the user. Throughout the rest of this documentation we will refer to this user as hive. If necessary, add the user to hcatsvr.acme.com.
Select a root directory for your installation of HCatalog. This directory must be owned by the hive user. We recommend /usr/local/hive. If necessary, create the directory. You will need to be the hive user for the operations described in the remainder of this Thrift Server Setup section.
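Creating the service user and root directory takes a few commands as root. The sketch below assumes a Linux host where useradd from shadow-utils supports --system; that is an assumption about your platform, and the function name and DRY_RUN switch are hypothetical. It only creates the account if it does not exist yet, then makes the root directory owned by that user.

```shell
#!/usr/bin/env bash
# Sketch: create the non-human service account and the HCatalog root dir.
# Assumes a Linux host with shadow-utils (useradd) and root privileges.
# DRY_RUN=1 prints each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

setup_service_account() {
  local user="${1:-hive}" root="${2:-/usr/local/hive}"
  # Only create the account if it does not exist yet.
  if ! id "$user" >/dev/null 2>&1; then
    run useradd --system "$user"
  fi
  run mkdir -p "$root"
  run chown "$user" "$root"   # the root directory must be owned by this user
}
```

Running DRY_RUN=1 setup_service_account first shows exactly which commands would be executed.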
Copy the HCatalog installation tarball into a temporary directory and untar it. Then change directories into the new distribution and run the HCatalog server installation script. You will need to know the directory you chose as root and the directory you installed the MySQL Java connector libraries into (referred to in the command below as dbroot). You will also need your hadoop_home, the directory where you have Hadoop installed, and the port number on which you want HCatalog to operate (portnum).
tar zxf hcatalog-0.11.0.x.x.x.x-xxx.tar.gz
cd hcatalog-0.11.0.x.x.x.x-xxx
share/hcatalog/scripts/hcat_server_install.sh -r root -d dbroot -h hadoop_home -p portnum
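The three commands above can be wrapped with basic argument checks. This sketch assumes the tarball unpacks into a directory with the same base name (as the cd command implies) and that portnum must be an integer from 1 to 65535; the function name and DRY_RUN switch are hypothetical, and the install script options are exactly the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: untar the HCatalog tarball and run its server install script.
# Assumes the tarball unpacks into a directory with the same base name, as
# the cd command above implies. DRY_RUN=1 prints the commands instead.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

install_hcat_server() {
  if [ "$#" -ne 5 ]; then
    echo "usage: install_hcat_server TARBALL ROOT DBROOT HADOOP_HOME PORTNUM" >&2
    return 1
  fi
  local tarball="$1" root="$2" dbroot="$3" hadoop_home="$4" portnum="$5"
  local parent dir
  # Reject port numbers that are not integers in 1..65535.
  case "$portnum" in ''|*[!0-9]*) echo "portnum must be numeric" >&2; return 1 ;; esac
  if [ "$portnum" -lt 1 ] || [ "$portnum" -gt 65535 ]; then
    echo "portnum out of range: $portnum" >&2; return 1
  fi
  parent=$(dirname "$tarball")
  dir="$parent/$(basename "$tarball" .tar.gz)"
  run tar zxf "$tarball" -C "$parent"
  # Subshell so the cd does not leak into the caller's shell.
  ( run cd "$dir" &&
    run share/hcatalog/scripts/hcat_server_install.sh \
      -r "$root" -d "$dbroot" -h "$hadoop_home" -p "$portnum" )
}
```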
Now you need to edit your hive_home/conf/hive-site.xml file. If there is no such file in the Hive conf directory, copy hcat_home/etc/hcatalog/proto-hive-site.xml and rename it hive-site.xml in hive_home/conf/. Open this file in your favorite text editor. The following table shows the values you need to configure. (For more information about Hive configuration parameters, see Configuring Hive and Hive Configuration Properties.)
Parameter | Value to Set It To |
---|---|
hive.metastore.local | false |
javax.jdo.option.ConnectionURL | jdbc:mysql://hostname/hivemetastoredb?createDatabaseIfNotExist=true |
javax.jdo.option.ConnectionDriverName | com.mysql.jdbc.Driver |
javax.jdo.option.ConnectionUserName | hive |
javax.jdo.option.ConnectionPassword | dbpassword value you used in setting up the MySQL server above. |
hive.semantic.analyzer.factory.impl | org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory |
hive.metastore.warehouse.dir | The directory can be a URI or an absolute file path. If it is an absolute file path, it will be resolved to a URI by the metastore: |
hive.metastore.uris | thrift://hostname:portnum where hostname is the name of the machine hosting the Thrift server, and portnum is the port number used above in the installation script. |
hive.metastore.execute.setugi | true |
hive.metastore.sasl.enabled | Set to true if you are using Kerberos security with your Hadoop cluster, false otherwise. |
hive.metastore.kerberos.keytab.file | The path to the Kerberos keytab file containing the metastore Thrift server's service principal. Only required if you set hive.metastore.sasl.enabled above to true. |
hive.metastore.kerberos.principal | The service principal for the metastore Thrift server. You can reference your host as _HOST and it will be replaced with your actual hostname. Only required if you set hive.metastore.sasl.enabled above to true. |
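If you prefer to generate the file rather than edit it by hand, the sketch below writes a hive-site.xml with the properties from the table. The positional arguments and the SASL_ENABLED, KEYTAB, PRINCIPAL, and WAREHOUSE_DIR variables are hypothetical names used only by this sketch; values are XML-escaped so that a password containing & or < does not break the file. The Kerberos properties are written only when SASL_ENABLED is true, matching the table.

```shell
#!/usr/bin/env bash
# Sketch: write a hive-site.xml containing the properties from the table
# above. Positional arguments and the SASL_ENABLED, KEYTAB, PRINCIPAL and
# WAREHOUSE_DIR variables are hypothetical names used only by this sketch.

xml_escape() {
  # Escape the characters that would break an XML text node.
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
    "$1" "$(xml_escape "$2")"
}

write_hive_site() {
  if [ "$#" -ne 5 ]; then
    echo "usage: write_hive_site OUT DB_HOST DB_PASSWORD THRIFT_HOST PORTNUM" >&2
    return 1
  fi
  local out="$1" db_host="$2" db_pass="$3" thrift_host="$4" portnum="$5"
  local sasl="${SASL_ENABLED:-false}"
  {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    prop hive.metastore.local false
    prop javax.jdo.option.ConnectionURL \
      "jdbc:mysql://${db_host}/hivemetastoredb?createDatabaseIfNotExist=true"
    prop javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver
    prop javax.jdo.option.ConnectionUserName hive
    prop javax.jdo.option.ConnectionPassword "$db_pass"
    prop hive.semantic.analyzer.factory.impl \
      org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory
    # Optional: only written when WAREHOUSE_DIR is set.
    if [ -n "${WAREHOUSE_DIR:-}" ]; then
      prop hive.metastore.warehouse.dir "$WAREHOUSE_DIR"
    fi
    prop hive.metastore.uris "thrift://${thrift_host}:${portnum}"
    prop hive.metastore.execute.setugi true
    prop hive.metastore.sasl.enabled "$sasl"
    # Kerberos settings are only required when SASL is enabled.
    if [ "$sasl" = true ]; then
      prop hive.metastore.kerberos.keytab.file "${KEYTAB:?set KEYTAB}"
      prop hive.metastore.kerberos.principal "${PRINCIPAL:?set PRINCIPAL}"
    fi
    printf '</configuration>\n'
  } > "$out"
}
```

Note that this overwrites OUT; if you already have a hive-site.xml, write to a new file and merge the properties by hand.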
You can now proceed to starting the server.
Starting the Server
To start your server, HCatalog needs to know where Hive is installed. This is communicated by setting the environment variable HIVE_HOME to the location where you installed Hive. Start the HCatalog server by switching directories to root and invoking "HIVE_HOME=hive_home sbin/hcat_server.sh start".
Logging
Server activity logs are located in root/var/log/hcat_server. Logging configuration is located at root/conf/log4j.properties. Server logging uses DailyRollingFileAppender by default. It generates a new file each day and does not expire old log files automatically.
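Because old log files are never expired, you may want a periodic cleanup job. This is a minimal sketch that assumes the log directory named above; the function name is hypothetical and the 30-day default is arbitrary, so adjust it to your own retention policy.

```shell
#!/usr/bin/env bash
# Sketch: delete HCatalog server log files older than a retention period.
# The 30-day default is arbitrary; adjust to your own policy.

prune_hcat_logs() {
  local root="$1" days="${2:-30}"
  local logdir="$root/var/log/hcat_server"
  [ -n "$root" ] || { echo "usage: prune_hcat_logs ROOT [DAYS]" >&2; return 1; }
  case "$days" in ''|*[!0-9]*) echo "days must be a whole number" >&2; return 1 ;; esac
  [ -d "$logdir" ] || { echo "no log directory: $logdir" >&2; return 1; }
  # -mtime +N matches files whose age in whole days (rounded down) exceeds N.
  find "$logdir" -type f -mtime +"$days" -exec rm -f {} +
}
```

The current day's log is recently modified, so it is never matched; the function could be called from a nightly cron job.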
Stopping the Server
To stop the HCatalog server, change directories to the root directory and invoke "HIVE_HOME=hive_home sbin/hcat_server.sh stop".
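The start and stop commands differ only in their last argument, so they can share one wrapper. In this sketch the function name and DRY_RUN switch are hypothetical; it validates the action, then runs sbin/hcat_server.sh from the root directory with HIVE_HOME set, exactly as the instructions describe.

```shell
#!/usr/bin/env bash
# Sketch: start or stop the HCatalog server as described above.
# Arguments are the action, the HCatalog root and hive_home; DRY_RUN=1
# (a hypothetical switch) prints the command instead of running it.

hcat_server_ctl() {
  local action="$1" root="$2" hive_home="$3"
  case "$action" in
    start|stop) ;;
    *) echo "usage: hcat_server_ctl start|stop ROOT HIVE_HOME" >&2; return 1 ;;
  esac
  if [ -z "$root" ] || [ -z "$hive_home" ]; then
    echo "usage: hcat_server_ctl start|stop ROOT HIVE_HOME" >&2; return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "cd $root && HIVE_HOME=$hive_home sbin/hcat_server.sh $action"
  else
    # Subshell keeps the caller's working directory unchanged.
    ( cd "$root" && HIVE_HOME="$hive_home" sbin/hcat_server.sh "$action" )
  fi
}
```

For example, hcat_server_ctl start /usr/local/hive /usr/lib/hive starts a server installed under the recommended root with an rpm-based Hive.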
Client Installation
Select a root directory for your installation of the HCatalog client. We recommend /usr/local/hcat. If necessary, create the directory.
Copy the HCatalog installation tarball into a temporary directory, and untar it.
tar zxf hcatalog-0.11.0.x.x.x.x-xxx.tar.gz
Now you need to edit your hive_home/conf/hive-site.xml file. You can use the same file as on the server, except that the javax.jdo.option.ConnectionPassword property should be removed. This avoids having the password available in plain text on all of your clients.
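One way to produce the client copy is to filter the server's hive-site.xml and drop the whole property block that names javax.jdo.option.ConnectionPassword. This is a sketch with a hypothetical function name; it assumes the usual layout in which no line holds tags from more than one property, and it leaves every other property untouched.

```shell
#!/usr/bin/env bash
# Sketch: derive a client hive-site.xml from the server copy by dropping the
# javax.jdo.option.ConnectionPassword property. Assumes no line holds tags
# from more than one property, as in the usual hive-site.xml layout.

make_client_hive_site() {
  local src="$1" dst="$2"
  if [ ! -r "$src" ] || [ -z "$dst" ]; then
    echo "usage: make_client_hive_site SERVER_HIVE_SITE CLIENT_HIVE_SITE" >&2
    return 1
  fi
  awk '
    /<property>/ { inprop = 1; buf = ""; drop = 0 }   # start buffering a property
    inprop {
      buf = buf $0 "\n"
      if ($0 ~ /javax\.jdo\.option\.ConnectionPassword/) drop = 1
      if ($0 ~ /<\/property>/) {                       # end of the property block
        if (!drop) printf "%s", buf                    # keep all but the password
        inprop = 0
      }
      next
    }
    { print }                                          # lines outside properties
  ' "$src" > "$dst"
}
```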
The HCatalog command line interface (CLI) can now be invoked as HIVE_HOME=hive_home root/bin/hcat.
HCatalog Client Jars
In the Hive tar.gz, HCatalog libraries are available under hcatalog/share/hcatalog/.
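If a job needs these jars on its classpath, one way to assemble them is sketched below. It assumes the tarball layout just described (hcatalog/share/hcatalog under the Hive installation); the function name is hypothetical, and depending on your job you may also need Hive's own jars, which this sketch does not collect.

```shell
#!/usr/bin/env bash
# Sketch: build a colon-separated classpath from the HCatalog jars that the
# Hive tarball ships under hcatalog/share/hcatalog (searched recursively).

hcat_classpath() {
  local hive_home="$1"
  local dir="$hive_home/hcatalog/share/hcatalog"
  if [ -z "$hive_home" ] || [ ! -d "$dir" ]; then
    echo "no HCatalog jar directory under: ${hive_home:-<unset>}" >&2
    return 1
  fi
  # Sorted so the result is stable from run to run.
  find "$dir" -type f -name '*.jar' | sort | paste -s -d : -
}

# Example: prepend the jars to HADOOP_CLASSPATH for a job launched with hadoop jar.
#   export HADOOP_CLASSPATH="$(hcat_classpath "$HIVE_HOME")${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
```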
HCatalog Server
HCatalog server is the same as Hive metastore. You can just follow the Hive metastore documentation for setting it up.
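Since the HCatalog server is the Hive metastore, it can also be started through the hive launcher's metastore service. This sketch assumes hive_home/bin/hive exists and uses 9083, the conventional metastore port, as a default; the function name and DRY_RUN switch are hypothetical, and the process runs in the foreground, so you would typically background it or run it under a service manager.

```shell
#!/usr/bin/env bash
# Sketch: run the Hive metastore (the HCatalog server) in the foreground.
# Assumes hive_home/bin/hive exists; 9083 is the conventional metastore port.
# DRY_RUN=1 prints the command instead of running it.

start_metastore() {
  local hive_home="${1:?usage: start_metastore HIVE_HOME [PORT]}" port="${2:-9083}"
  local cmd=("$hive_home/bin/hive" --service metastore -p "$port")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show what would be executed
  else
    "${cmd[@]}"
  fi
}
```

Clients then reach it through hive.metastore.uris, for example thrift://hcatsvr.acme.com:9083.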