
How to Build

Set up an environment with the dependencies installed

Install dependencies on Mac (with Xcode installed)

Make sure you have run "xcode-select --install" to install the developer tools.

Refer to the section below titled "Running catalog tidycat perl modules" for installing the perl-JSON module on Mac.

Note for Installing Dependencies

OS requirement

Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.

     kern.sysv.shmmax=2147483648
     kern.sysv.shmmin=1
     kern.sysv.shmmni=64
     kern.sysv.shmseg=16
     kern.sysv.shmall=524288
     kern.maxfiles=65535
     kern.maxfilesperproc=65536

  • Reboot to apply the change.
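
Before rebooting, it can help to confirm that every parameter listed above actually made it into the file. The following is a minimal sketch, not an official HAWQ tool: the function name missing_sysctl_settings is hypothetical, and it only inspects the file you pass in (it does not query the running kernel).

```shell
# Hypothetical helper: print each required macOS setting that is absent
# from (or has a different value in) the given sysctl.conf-style file.
# Prints nothing when all settings are present.
missing_sysctl_settings() {
    conf_file="$1"
    [ -r "$conf_file" ] || { echo "cannot read $conf_file" >&2; return 1; }

    # Strip spaces and tabs so "key = value" and "key=value" compare equal.
    normalized=$(tr -d ' \t' < "$conf_file")

    for setting in \
        kern.sysv.shmmax=2147483648 \
        kern.sysv.shmmin=1 \
        kern.sysv.shmmni=64 \
        kern.sysv.shmseg=16 \
        kern.sysv.shmall=524288 \
        kern.maxfiles=65535 \
        kern.maxfilesperproc=65536
    do
        # -x: match the whole line, -F: fixed string, -q: no output
        printf '%s\n' "$normalized" | grep -qxF "$setting" || echo "$setting"
    done
}

# Usage:
#   missing_sysctl_settings /etc/sysctl.conf
```

Any line the function prints still needs to be added or corrected in /etc/sysctl.conf.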

Install Xcode and command line tools

After installing or updating Xcode, run ‘xcode-select --install’ to install the command line tools, and then open Xcode to confirm that it is installed.

MUST: Turning Off Rootless System Integrity Protection in OS X El Capitan 10.11+

If you do not do this, you may encounter some tricky LIBRARY_PATH problems, e.g. HAWQ-513.

Follow the instructions below (see http://osxdaily.com/2015/10/05/disable-rootless-system-integrity-protection-mac-os-x ):

  1. Reboot the Mac and hold down the Command + R keys simultaneously after you hear the startup chime. This boots OS X into Recovery Mode.
  2. When the “OS X Utilities” screen appears, pull down the ‘Utilities’ menu at the top of the screen and choose “Terminal”.
  3. Type the following command into the terminal, then hit return: csrutil disable; reboot

Install dependencies on Red Hat/CentOS 7.X

Dependencies

 

OS requirement

  • Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.

     kernel.shmmax = 1000000000
     kernel.shmmni = 4096
     kernel.shmall = 4000000000
     kernel.sem = 250 512000 100 2048
     kernel.sysrq = 1
     kernel.core_uses_pid = 1
     kernel.msgmnb = 65536
     kernel.msgmax = 65536
     kernel.msgmni = 2048
     net.ipv4.tcp_syncookies = 0
     net.ipv4.conf.default.accept_source_route = 0
     net.ipv4.tcp_tw_recycle = 1
     net.ipv4.tcp_max_syn_backlog = 200000
     net.ipv4.conf.all.arp_filter = 1
     net.ipv4.ip_local_port_range = 1281 65535
     net.core.netdev_max_backlog = 200000
     vm.overcommit_memory = 2
     fs.nr_open = 3000000
     kernel.threads-max = 798720
     kernel.pid_max = 798720
     # increase network
     net.core.rmem_max=2097152
     net.core.wmem_max=2097152
  • Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
    sysctl -p
  • Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order in which they are listed:
     * soft nofile 2900000
     * hard nofile 2900000
     * soft nproc 131072
     * hard nproc 131072
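
Changes to /etc/security/limits.conf normally take effect only in a new login session, so it is worth checking the limits after logging in again. The sketch below assumes bash; the helper names limit_ok and check_hawq_limits are hypothetical and only compare the values reported by ulimit against the numbers listed above.

```shell
# Hypothetical helper: succeed when a limit reported by `ulimit`
# meets the required minimum ("unlimited" always passes).
limit_ok() {
    actual="$1"
    required="$2"
    [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]
}

# Compare the current session's soft limits with the values above.
check_hawq_limits() {
    limit_ok "$(ulimit -n)" 2900000 || echo "nofile too low: $(ulimit -n)"
    limit_ok "$(ulimit -u)" 131072  || echo "nproc too low: $(ulimit -u)"
}

# Usage (in a fresh login shell):
#   check_hawq_limits
```

No output from check_hawq_limits means both limits are at least as high as the configured values.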

Build dependencies yourself (tested on Red Hat 6.X)

Dependencies

You must install several dependencies, including gcc, before building Apache HAWQ (see the following tables). The libraries were tested at the given versions. Most of the dependencies can be installed through yum; the others must be installed from their source tarballs. Typically you can use "./configure && make && make install" to install from a source tarball.

Libraries that must be installed using source tarball.


 You might need to run "ldconfig <LIBRARY_INSTALL_PATH>" after installing them so that the dynamic linker can find the new libraries ("ldconfig -p" only prints the current cache).

For the thrift build, you might need to pass "--without-tests" to configure.

Install maven:
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven

Install pip:
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip --retries=50 --timeout=300 install pycrypto

Libraries that can be installed through yum.

     Name              Version
     epel-release      6-8
     make              3.81
     gcc               >=4.7.2
     gcc-c++           >=4.7.2
     gperf             3.0.4
     snappy-devel      1.1.3
     bzip2-devel       1.0.6
     python-devel      2.6.2
     libevent-devel    1.4.6
     krb5-devel        1.11.3
     libuuid-devel     2.26.2
     libgsasl-devel    1.8.0
     libxml2-devel     2.7.8
     zlib-devel        1.2.3
     readline-devel    6
     openssl-devel     0.9.8
     bison             1.875
     apr-devel         1.2.12
     libyaml-devel     0.1.1
     flex              >2.5.4
     lcov              1.12
     libesmtp-devel    1.0.4
     perl-JSON         2.15
     tomcat            6.0.44
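
Because the list above requires gcc and gcc-c++ 4.7.2 or newer, a quick version comparison can save a failed build. This is a minimal sketch: the function name version_ge is hypothetical, and it relies on the version sort option (-V) of GNU sort.

```shell
# Hypothetical helper: succeed when version $1 is greater than or
# equal to version $2. Relies on `sort -V` from GNU coreutils.
version_ge() {
    # After a version sort, the smaller (or equal) version comes first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (gcc -dumpversion prints the compiler's version):
#   version_ge "$(gcc -dumpversion)" 4.7.2 || echo "gcc is too old for HAWQ"
```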

The default version of gcc in Red Hat/CentOS 6.X is 4.4.7 or lower; you can quickly upgrade gcc by following the instructions below:

You will also need to install the same Python packages as those required for Red Hat/CentOS 7.

OS requirement

  • Use a text editor to edit the /etc/sysctl.conf file. Add or edit each of the following parameter definitions to set the required value.

     kernel.shmmax = 1000000000
     kernel.shmmni = 4096
     kernel.shmall = 4000000000
     kernel.sem = 250 512000 100 2048
     kernel.sysrq = 1
     kernel.core_uses_pid = 1
     kernel.msgmnb = 65536
     kernel.msgmax = 65536
     kernel.msgmni = 2048
     net.ipv4.tcp_syncookies = 0
     net.ipv4.conf.default.accept_source_route = 0
     net.ipv4.tcp_tw_recycle = 1
     net.ipv4.tcp_max_syn_backlog = 200000
     net.ipv4.conf.all.arp_filter = 1
     net.ipv4.ip_local_port_range = 1281 65535
     net.core.netdev_max_backlog = 200000
     vm.overcommit_memory = 2
     fs.nr_open = 3000000
     kernel.threads-max = 798720
     kernel.pid_max = 798720
     # increase network
     net.core.rmem_max=2097152
     net.core.wmem_max=2097152
  • Execute the following command to apply your updated /etc/sysctl.conf file to the operating system configuration:
    sysctl -p
  • Use a text editor to edit the /etc/security/limits.conf file. Add the following definitions in the exact order in which they are listed:
     * soft nofile 2900000
     * hard nofile 2900000
     * soft nproc 131072
     * hard nproc 131072

Build with Prebuilt Docker Image

The Apache HAWQ source code contains Dockerfiles that help developers set up a building and testing environment with Docker.

To use the Docker image, follow the steps at: https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker

Build optional extension modules

     Extension    How to enable                                 Pre-build steps on Mac
     PL/R         ./configure --with-r                          # install R before build
                                                                brew tap homebrew/science
                                                                brew install r
     PL/Python    ./configure --with-python
     PL/Java      ./configure --with-java
     PL/PERL      ./configure --with-perl
     pgcrypto     ./configure --with-pgcrypto --with-openssl
     gporca       ./configure --enable-orca
     rps          ./configure --enable-rps                      brew install tomcat@6
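
Several extensions can be enabled in one build by passing their flags to a single ./configure run. The sketch below only assembles and prints such a command line from the flags listed above. The short lowercase keys (plr, plpython, and so on) and both function names are hypothetical, chosen here for illustration.

```shell
# Hypothetical helper: map a short extension key to the configure
# flags listed for that extension above.
extension_flags() {
    case "$1" in
        plr)      echo "--with-r" ;;
        plpython) echo "--with-python" ;;
        pljava)   echo "--with-java" ;;
        plperl)   echo "--with-perl" ;;
        pgcrypto) echo "--with-pgcrypto --with-openssl" ;;
        gporca)   echo "--enable-orca" ;;
        rps)      echo "--enable-rps" ;;
        *)        echo "unknown extension: $1" >&2; return 1 ;;
    esac
}

# Print one ./configure command line enabling every requested extension.
configure_command() {
    cmd="./configure"
    for ext in "$@"; do
        flags=$(extension_flags "$ext") || return 1
        cmd="$cmd $flags"
    done
    echo "$cmd"
}

# Usage:
#   configure_command plpython pgcrypto
```

The printed command still has to be run from the HAWQ source directory, after any pre-build steps for the chosen extensions.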


Install Hadoop

Please follow the steps here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

Note:

  • You might need to build Hadoop from source on Red Hat/CentOS 6.x if the downloaded Hadoop package requires a higher glibc version. When that happens, you will probably see the following warning when running start-dfs.sh: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform"
  • You will also need to set the port for fs.defaultFS to 8020 in etc/hadoop/core-site.xml. (The example in the linked guide sets it to 9000.)
  • HDFS is a must, but YARN is optional. YARN is only needed when you want to use YARN as the global resource manager.
  • You must set up passphraseless ssh; otherwise "hawq init cluster" will run into problems in a later step.

You need to verify that your HDFS works.
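
As part of that check, you can confirm that core-site.xml really uses port 8020. This is a rough sketch under stated assumptions: the function name default_fs_port is hypothetical, it expects the value in the form hdfs://host:port, and the HADOOP_HOME variable in the usage comment is assumed to point at your Hadoop installation.

```shell
# Hypothetical helper: print the port of fs.defaultFS from the given
# core-site.xml file. Prints nothing if the property is not found or
# is not written as hdfs://host:port.
default_fs_port() {
    # Join the file into one line so <name> and <value> can be matched
    # together even when they sit on separate lines.
    tr -d '\n' < "$1" |
        sed -n 's|.*<name>fs\.defaultFS</name>[[:space:]]*<value>hdfs://[^:<]*:\([0-9][0-9]*\)</value>.*|\1|p'
}

# Usage:
#   default_fs_port "$HADOOP_HOME/etc/hadoop/core-site.xml"
#   hdfs dfs -ls /      # lists the HDFS root if the NameNode is reachable
```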

Get the HAWQ code and Compile

Once you have an environment with the necessary dependencies installed and Hadoop is ready, the next step is to get the code and build HAWQ.

Init/Start/Stop HAWQ

Connect and Run basic queries

Query external hadoop data

You will need to use PXF to query external Hadoop/Hive/HBase data. Refer to the PXF Build & Install document.

Test HAWQ

Running catalog tidycat perl modules

The JSON Perl module is required to run the set of Perl scripts in src/include/catalog. The versioned JSON-formatted catalog files are stored in tools/bin/gppylib/data/<version>.json. To install the JSON module, the developer needs to obtain it from CPAN. The following was validated on a MacBook Pro running OS X 10.11.6, using the information from the "Perl on Mac OSX" section of http://www.cpan.org/modules/INSTALL.html. The session performs the following steps:

  1. Validate that the JSON module is not in the environment. The appropriate error message is displayed.
  2. Run the cpan install JSON command to install the JSON Perl module. In this example, the module is installed locally (local::lib) and not in the system's Perl installation.
  3. Execute the environment variable updates added to the .bashrc file by the installation process.
  4. Validate that the tidycat.pl command can now be run without receiving an error.
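
The steps above can be sketched roughly as follows. The function names has_perl_module and install_json_for_tidycat are hypothetical, and the sketch assumes that the cpan installer wrote its environment settings to ~/.bashrc, as described in step 3. Nothing runs until you call the second function yourself.

```shell
# Hypothetical helper: succeed when the given Perl module can be loaded.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Steps 1 to 4, wrapped in a function so that defining it has no effect.
install_json_for_tidycat() {
    # Step 1: check whether the JSON module is already available.
    if has_perl_module JSON; then
        echo "JSON module already installed"
        return 0
    fi
    # Step 2: install from CPAN (may ask how to configure local::lib).
    cpan install JSON || return 1
    # Step 3: pick up the environment variables added to .bashrc.
    . "$HOME/.bashrc"
    # Step 4: tidycat.pl needs "require JSON" to succeed, so check it.
    has_perl_module JSON
}

# Usage:
#   install_json_for_tidycat && echo "ready to run tidycat.pl"
```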

Note:

  • JSON module version 2.27 and the latest version, 2.90, have both been verified to generate the proper JSON-formatted catalog file.
  • The scripts essentially check that *require JSON* succeeds; otherwise the following error message is displayed:

Fatal Error: The required package JSON is not installed -- please download it from www.cpan.org

 
