Objective

Demonstrate how to configure gaianDB to connect to a relational database source

Provide an example of using a reasonable size data set

Act as a baseline for experimenting with VDC, Atlas & ranger plugins

Excluded

This page doesn't cover setup/install of the ranger plugin, install of Mariadb/linux or setup of reporting tools or other data oriented UIs that use the data source

 

Topology

MacOS client running dbvisualizer

Ubuntu 17.10 server running MariaDB

  •  An Azure B series 4GB ram/2 cpu environment is sufficient

Data Source

Install mariadb database - a more open fork of MySQL that is compatible. Follow the instructions at get-started to get mariadb installed

My versions for this test are:

jonesn@worklinux:~$ dpkg -l | grep mariadb

ii  mariadb-client                             10.1.25-1                                    all          MariaDB database client (metapackage depending on the latest version)
ii  mariadb-client-10.1                        10.1.25-1                                    amd64        MariaDB database client binaries
ii  mariadb-client-core-10.1                   10.1.25-1                                    amd64        MariaDB database core client binaries
ii  mariadb-common                             10.1.25-1                                    all          MariaDB common metapackage
ii  mariadb-server                             10.1.25-1                                    all          MariaDB database server (metapackage depending on the latest version)
ii  mariadb-server-10.1                        10.1.25-1                                    amd64        MariaDB database server binaries
ii  mariadb-server-core-10.1                   10.1.25-1                                    amd64        MariaDB database core server files

Download MySQL employee sample database by cloning the repo at  https://github.com/datacharmer/test_db

  • 160MB
  • 4 million records (300k employees, 2.8 million salary entries)
  • 6 tables 
  • open license (please read the details)

Follow the instructions to load this data into mysql

Client

One good tool for checking the database is dbvisualizer but any JDBC client will do

Check one can connect to the above database - using something like jdbc:mysql://12.34.56.78:3306/employees 

GaianDB

Download gaiandb from github

Unpack the prebuilt binary from the build subdirectory (gaiandb is tricky to build currently)

  • I am using GAIANDB_V2.1.8_20160523.zip

Download the current mariadb jdbc driver (or copy/link from local system)

  • I am using MariaDB Connector/J 2.1.2

Copy the .jar into a convenient location, such as /home/jonesn/mysql

In theory gaianDB should pick up jdbc drivers from the 'ext' subdirectory in the unpacked directory - however I couldn't get this to work, so update launchGaianServer.sh with the following line under 'static jars setting'

export CLASSPATH="$CLASSPATH:/home/jonesn/mysql/mariadb-java-client-2.1.2.jar"

Also modify the java heap size in this script by searching for JAVA_OPTS and using something like (the default is 256 MB)

[[ -z "$JAVA_OPTS" ]] && JAVA_OPTS=-Xmx1024m

Finally configure the mysql data source by modifying gaiandb_config.properties

VEMPLOYEE_DEF=employee_number integer, birth_date date, firstname varchar(14), lastname varchar(14), gender char, hiring_date date

# Example of an RDB data source, this one points to the 'DB2' connection property below (specifying DRIVER, URL, USR, PWD).

MYSQL_DB_DRIVER=com.mysql.jdbc.Driver
MYSQL_DB_URL=jdbc:mysql://localhost:3306/employees
MYSQL_USR=gaiandb
MYSQL_PWD=PUT YOUR PASSWORD HERE
VEMPLOYEE_DS0_CONNECTION=MYSQL employees
VEMPLOYEE_DS0_URL=jdbc:mysql://localhost:3306/employees
VEMPLOYEE_DS0_C1=emp_no
VEMPLOYEE_DS0_C2=birth_date
VEMPLOYEE_DS0_C3=first_name
VEMPLOYEE_DS0_C4=last_name
VEMPLOYEE_DS0_C5=gender
VEMPLOYEE_DS0_C6=hire_date

Note that the columns are named slightly differently to the source. For our VDC project this is done automatically by the virtualizer component using metadata from Atlas

Now you can launch gaiandb with

./launchGaianServer.sh
 

There's also documentation on a larger data set backed by postgresSQL - UK Land Registry Data

 

 

 

 

  • No labels