Hive Using Derby in Server Mode
Hive in embedded mode is limited to one active user at a time. Running Derby as a Network Server lets multiple users access it simultaneously from different systems.
It is suggested you download the version of Derby that ships with Hive. If you have already run Hive in embedded mode, the first line of derby.log contains the version.
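A quick way to read that version line is sketched below. The helper name derby_version_line is made up for illustration, and it assumes derby.log sits in the directory you ran embedded Hive from unless you pass another path.

```shell
# Print the first line of a Derby log file; per the text above, that line
# carries the Derby version. derby_version_line is a hypothetical helper name.
derby_version_line() {
    head -n 1 "${1:-derby.log}"
}

# Only print something if a derby.log exists in the current directory.
if [ -f derby.log ]; then
    derby_version_line
fi
```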
My structure looks like this:
The variable to set has changed over the years. DERBY_HOME is now the proper name. I set this and the legacy name.
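A minimal sketch of such a profile script follows. The install path is hypothetical, and DERBY_INSTALL is assumed to be the legacy name meant here (it is the name older Derby documentation used).

```shell
# Hypothetical install location; point this at wherever Derby was unpacked.
DERBY_HOME="${DERBY_HOME:-/opt/hadoop/db-derby}"

# Assumption: DERBY_INSTALL is the legacy variable name. Keep it in sync.
DERBY_INSTALL="$DERBY_HOME"

export DERBY_HOME DERBY_INSTALL
```

Dropping this into a file that login shells source (for example under /etc/profile.d, if your system uses that directory) makes both names available to every user.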
Hive also likes to know where Hadoop is installed:
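Which variable Hive reads depends on the release, so treat the name below as an assumption: later Hive launch scripts look for HADOOP_HOME or a hadoop executable on the PATH. The install path is hypothetical.

```shell
# Hypothetical Hadoop location; adjust to your installation.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop/hadoop}"
export HADOOP_HOME

# Also expose the hadoop executable on the PATH as a fallback.
PATH="$HADOOP_HOME/bin:$PATH"
export PATH
```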
You will likely want to run Derby when Hadoop starts up. Other than an lsb-init-script, an interesting place for this might be alongside Hadoop scripts like start-dfs. By default Derby creates databases in the directory it was started from.
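Because the working directory decides where the databases land, a start-up helper might change into a dedicated data directory first. This is a sketch under assumptions: both paths and the function name start_derby are hypothetical, and it relies on the startNetworkServer script that Derby ships in its bin directory.

```shell
# Hypothetical locations; adjust to your installation.
DERBY_HOME="${DERBY_HOME:-/opt/hadoop/db-derby}"
DERBY_DATA_DIR="${DERBY_DATA_DIR:-$DERBY_HOME/data}"

start_derby() {
    mkdir -p "$DERBY_DATA_DIR" || return 1
    (
        # Derby creates databases relative to the directory it starts in.
        cd "$DERBY_DATA_DIR" || exit 1
        # -h 0.0.0.0 listens on all interfaces so remote Hive clients can
        # connect; without -h the server only accepts local connections.
        nohup "$DERBY_HOME/bin/startNetworkServer" -h 0.0.0.0 \
            > derby-server.out 2>&1 &
    )
}
```

Calling start_derby from wherever you invoke start-dfs keeps the two services together. Listening on 0.0.0.0 exposes the server to the whole network, so restrict access with a firewall if that matters in your environment.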
Configure Hive to Use Network Derby
Edit /opt/hadoop/hive/conf/hive-site.xml as follows. Note that "hadoop1" should be replaced with the hostname or IP address where the Derby network server can be found.
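The two properties that matter are the JDBC connection URL and the driver class. The helper below prints them for a given host; it is a sketch, not a complete hive-site.xml. The function name is hypothetical, the database name metastore_db is an assumption, and 1527 is Derby's default network port.

```shell
# Print the metastore properties for a Derby network server.
# Usage: derby_metastore_properties [host] [port]
derby_metastore_properties() {
    host="${1:-hadoop1}"
    port="${2:-1527}"   # Derby's default network server port
    cat <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://${host}:${port}/metastore_db;create=true</value>
  <description>JDBC connect string for the metastore (database name assumed)</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Derby network client driver</description>
</property>
EOF
}
```

Running derby_metastore_properties hadoop1 prints the fragment, which belongs inside the configuration element of hive-site.xml.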
Version: JPOX properties are NOT used in Hive 5.0 or later.
JPOX properties can be specified in the Hive configuration, so changes to jpox.properties are normally not required.
Copy Derby Jar Files
Because Hive now connects through the Derby network client, you MUST make sure Hive has the Derby client jar files in its lib directory or on the classpath. The same would be true if you used MySQL or some other DB.
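A minimal sketch of the copy step, assuming the client jars are derbyclient.jar and derbytools.jar (the usual names in a Derby binary distribution). The Derby path and the function name are hypothetical; /opt/hadoop/hive matches the Hive location used on this page.

```shell
# Hypothetical Derby location; the Hive location follows this page's layout.
DERBY_HOME="${DERBY_HOME:-/opt/hadoop/db-derby}"
HIVE_HOME="${HIVE_HOME:-/opt/hadoop/hive}"

# Copy the Derby client jars into a lib directory (Hive's by default).
copy_derby_client_jars() {
    dest="${1:-$HIVE_HOME/lib}"
    for jar in derbyclient.jar derbytools.jar; do
        cp "$DERBY_HOME/lib/$jar" "$dest/" || return 1
    done
}
```

Call copy_derby_client_jars with no arguments to target Hive's lib directory, or pass another directory as the first argument.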
If you receive the error "javax.jdo.JDOFatalInternalException: Error creating transactional connection factory" where the stack trace originates at "org.datanucleus.exceptions.ClassNotResolvedException: Class 'org.apache.derby.jdbc.ClientDriver' was not found in the CLASSPATH. Please check your specification and your CLASSPATH", you may benefit from putting the Derby jar files directly in the Hadoop lib directory.
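When chasing that ClassNotResolvedException, it can help to check which lib directories actually hold the client jar. The helper below is a hypothetical diagnostic, not part of Hive or Derby, and it assumes the driver lives in derbyclient.jar.

```shell
# Report, for each directory given, whether derbyclient.jar is present.
# Returns non-zero if any directory is missing the jar.
check_derby_client() {
    status=0
    for dir in "$@"; do
        if [ -f "$dir/derbyclient.jar" ]; then
            echo "ok: $dir/derbyclient.jar"
        else
            echo "missing: $dir/derbyclient.jar"
            status=1
        fi
    done
    return "$status"
}
```

For example, pass it both the Hive and the Hadoop lib directories, then copy the jar into whichever one is reported as missing.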
Start Up Hive
The metastore will not be created until the first query hits it.
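Any query will do for that first hit. The sketch below runs one non-interactively through the Hive CLI's -e option; the function name is hypothetical and HIVE_HOME is assumed to be /opt/hadoop/hive as elsewhere on this page.

```shell
HIVE_HOME="${HIVE_HOME:-/opt/hadoop/hive}"

# Run a trivial query so that Hive connects to Derby and the metastore
# database gets created on the server side.
first_metastore_query() {
    "$HIVE_HOME/bin/hive" -e 'show tables;'
}
```

The same statement can be typed at the interactive hive> prompt instead.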
A database directory should then be created under the directory from which Derby was started.
Now you can run multiple Hive instances working on the same data simultaneously and remotely.