To set up PostgreSQL and MADlib with Anaconda Python on OSX, follow the super quick start.  Otherwise, follow the regular guides for installing from binaries or compiling from source.

Developers may want to use the Docker image described in the Developer Guide.

Some releases have release-specific variations of the installation procedure.  These exceptions are listed at the bottom of this page in the section called Release Specific Installations.

MADlib requires Python 2.7. Python 3.x is currently not supported.
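
One way to confirm the interpreter before installing is to check the text printed by `python --version`. The sketch below is an illustration only: the helper name `is_python27` is hypothetical and not part of MADlib, and it assumes the `python` on your PATH is the one your database uses.

```shell
# Hypothetical helper: succeeds only if the given version string reports Python 2.7.x.
# Expects text in the form printed by `python --version`, e.g. "Python 2.7.18".
is_python27() {
    case "$1" in
        "Python 2.7"|"Python 2.7."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (Python 2 prints its version on stderr, hence 2>&1):
#   is_python27 "$(python --version 2>&1)" || echo "MADlib needs Python 2.7" >&2
```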

Currently supported database versions: see this page for supported databases and operating systems.

The following Python libraries are required for their associated modules:

Deep Learning: dill, grpcio==1.39.0, protobuf==3.17.3, hyperopt==0.2.5, tensorflow==1.14, scikit-learn==0.19

XGBoost: pandas, xgboost==0.82

KNN: scipy==1.2.1

Unit tests: pgsanity
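
The pinned versions above can be kept in one place and passed to pip per module. This is a minimal sketch, not an official MADlib script: the function name `madlib_pip_requirements` and the module keys (`deep-learning`, `xgboost`, `knn`, `unit-tests`) are made up for illustration, and it assumes `pip` belongs to the Python 2.7 installation used by the database.

```shell
# Hypothetical helper: print the pinned pip requirements for one MADlib module,
# using exactly the versions listed above.
madlib_pip_requirements() {
    case "$1" in
        deep-learning) echo "dill grpcio==1.39.0 protobuf==3.17.3 hyperopt==0.2.5 tensorflow==1.14 scikit-learn==0.19" ;;
        xgboost)       echo "pandas xgboost==0.82" ;;
        knn)           echo "scipy==1.2.1" ;;
        unit-tests)    echo "pgsanity" ;;
        *)             echo "unknown module: $1" >&2; return 1 ;;
    esac
}

# Example use (the unquoted expansion is intentional so each package is one argument):
#   pip install $(madlib_pip_requirements xgboost)
```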

Super Quick Start

To set up PostgreSQL + MADlib with Anaconda Python on OSX: 

Quick Start With Binaries

Prerequisites

Install and configure your database of choice. MADlib currently supports the following platforms:

MADlib requires the GNU M4 Unix macro processor which must be present for installation to succeed.
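
A quick pre-flight check can confirm that m4 is on the PATH before you start. The helper name `require_tool` below is hypothetical; it is a sketch of one way to check, not something shipped with MADlib.

```shell
# Hypothetical helper: succeed if the named program is on PATH, otherwise print a message.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "required tool not found: $1" >&2
    return 1
}

# Example use before installing MADlib:
#   require_tool m4 || exit 1
```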

If the environment variables listed below are defined, it can save you some typing.

Postgres platform notes:

/usr/local/madlib/bin/madpack -s madlib -p postgres install
madpack.py : INFO : Detected PostgreSQL version 9.5.
madpack.py : INFO : *** Installing MADlib ***
madpack.py : INFO : MADlib tools version = 1.9.1 (//usr/local/madlib/Versions/1.9.1/bin/../madpack/madpack.py)
madpack.py : INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)
madpack.py : INFO : Testing PL/Python environment...
madpack.py : INFO : > Creating language PL/Python...
madpack.py : ERROR : SQL command failed:
SQL: CREATE LANGUAGE plpythonu;
ERROR: could not access file "$libdir/plpython2": No such file or directory
madpack.py : ERROR : Cannot create language plpythonu. Please check if you
                have configured and installed portid (your platform) with
                `--with-python` option. Stopping installation...
madpack.py : ERROR : MADlib installation failed
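
The log above shows madpack failing because PostgreSQL cannot find the PL/Python 2 library in `$libdir`. One way to check for the library before running madpack is sketched below. The helper name `has_plpython2` is hypothetical, and the example assumes that `pg_config` on your PATH belongs to the server you are installing into.

```shell
# Hypothetical helper: succeed if a plpython2 shared library exists in the given directory.
has_plpython2() {
    for f in "$1"/plpython2.*; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example use (`pg_config --pkglibdir` prints the directory PostgreSQL uses for $libdir):
#   has_plpython2 "$(pg_config --pkglibdir)" || echo "PL/Python 2 is not installed for this PostgreSQL" >&2
```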

Installing MADlib

  1. Download the MADlib binary
  2. Install the package.
    1. Postgres:
      • on OSX double click the installer package
      • on Redhat / CentOS run the following as root:

        yum install <madlib_package> --nogpgcheck

        or

        rpm -i <madlib_package>


    2. Greenplum:

      • on Redhat / CentOS run the following as gpadmin:

        gppkg -i <madlib_package>


    3. NOTE: if you are using an rpm package on a CentOS 5 system, please add the --nodeps flag to the rpm command.
  3. Ensure that the environment is set up for your database deployment and that the database is up and running.
  4. Run the MADlib deployment utility to deploy MADlib into each database in which you want to use it:
  5. After installation gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions. Otherwise, users will get "ERROR: permission denied for schema MADlib."  Also, install checks (see next step below) will fail if CREATE TEMP TABLE privileges are not granted on the schema where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.

  6. Test your installation
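
The grant described in step 5 can be sketched as follows. The helper name `madlib_grant_sql`, the database `mydb`, and the role `analyst` are placeholders for illustration; adjust the privileges to your own security policy.

```shell
# Hypothetical helper: print the GRANT statement for one schema and one role.
madlib_grant_sql() {
    schema=$1
    role=$2
    printf 'GRANT ALL PRIVILEGES ON SCHEMA %s TO %s;\n' "$schema" "$role"
}

# Example use, run as gpadmin (database `mydb` and role `analyst` are placeholders):
#   psql -d mydb -c "$(madlib_grant_sql madlib analyst)"
```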

Installing from PGXN (PostgreSQL)

Prerequisites

Requirements for installing MADlib:


Use the commands below to install and load the latest MADlib package uploaded to PGXN.

pgxn install madlib
pgxn load madlib 

If you see the following error, you are likely passing parallel execution flags to make.

[ 86%] Performing build step for 'EP_boost'
Ignored: make
[ 86%] Performing install step for 'EP_boost'
Ignored: make
[ 86%] Completed 'EP_boost'
[ 86%] Built target EP_boost
make[1]: *** [all] Error 2
make: *** [all] Error 2
ERROR: command returned 2: make PG_CONFIG=/usr/local/pg10/bin/pg_config all

You can run this as a workaround:

MAKEFLAGS='-j1' pgxn install madlib
pgxn load madlib 
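
If you prefer to try the normal install first and only fall back to a serial build when it fails, a small wrapper can do that. This is a sketch under the assumption that a failed first attempt can safely be retried; the function name `run_with_serial_fallback` is hypothetical.

```shell
# Hypothetical wrapper: run a command; if it fails, retry once with MAKEFLAGS='-j1'
# so that make runs serially.
run_with_serial_fallback() {
    "$@" || env MAKEFLAGS='-j1' "$@"
}

# Example use:
#   run_with_serial_fallback pgxn install madlib && pgxn load madlib
```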

Or, if you want to use parallel execution, you can also install Boost 1.60 yourself, and tell cmake where to find it.

For example, on OSX that looks like this:


brew install boost@1.60
export BOOST_INCLUDEDIR=/usr/local/opt/boost@1.60/include/

Compiling From Source

Prerequisites

Requirements for installing MADlib:

Installing MADlib

In the $MADLIB_ROOT directory (location of the MADlib source) run the following commands:

mkdir build 
cd build 
cmake .. 
make -j8 # if this causes issues, switch back to a plain `make`

Above, we built the executables in the build folder. This can, however, be any user-named folder (henceforth called $BUILD_ROOT).
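
Instead of hard-coding the job count, you can derive it from the number of CPUs. The sketch below assumes `getconf _NPROCESSORS_ONLN` is available (it is on common Linux and OSX systems) and falls back to 1 otherwise; the function name `build_jobs` is hypothetical.

```shell
# Hypothetical helper: print the number of online CPUs, or 1 if it cannot be determined.
build_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric output: use a serial build
    esac
    echo "$n"
}

# Example use from the build folder, falling back to a plain make if the parallel build fails:
#   make -j"$(build_jobs)" || make
```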

Deploying MADlib

Deploy MADlib into the database with the MADlib package manager, madpack, located under $BUILD_ROOT/src/bin.

Run the MADlib deployment utility to install MADlib into each database in which you want to use it:

Defining environment variables

The variables below will be automatically used by the madpack installer if no connection string is provided:

  1. User: PGUSER or USER (defaults to OS username)
  2. Password: PGPASSWORD (defaults to empty)
  3. Host: PGHOST (defaults to 'localhost')
  4. Database: PGDATABASE (defaults to OS username)
  5. Port: PGPORT (defaults to 5432)

An example of deploying MADlib using the environment variables:

export PGPORT=5430
export PGHOST=127.0.0.1
export PGDATABASE=madlibtest
$BUILD_ROOT/src/bin/madpack -p postgres install
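
To see which connection the defaults listed above resolve to in your current shell, you can reproduce the fallback rules yourself. The sketch below only mirrors the documented defaults; it is not madpack's own code, and the function name `madpack_default_connection` is hypothetical. The password is deliberately left out.

```shell
# Hypothetical helper: print the connection implied by the environment,
# as user@host:port/database, using the defaults listed above.
madpack_default_connection() {
    os_user=${USER:-$(id -un)}      # OS username
    user=${PGUSER:-$os_user}        # PGUSER, else OS username
    host=${PGHOST:-localhost}       # PGHOST, else localhost
    port=${PGPORT:-5432}            # PGPORT, else 5432
    db=${PGDATABASE:-$os_user}      # PGDATABASE, else OS username
    echo "${user}@${host}:${port}/${db}"
}

# Example use: pass the resolved value explicitly with -c
#   $BUILD_ROOT/src/bin/madpack -p postgres -c "$(madpack_default_connection)" install
```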

Defining GPDB variables

The variables below can be set in GPDB in case memory-related issues show up. Feel free to adjust them based on the specifics of the installed system.

set max_statement_mem='50GB';
set statement_mem='50GB';
set memory_spill_ratio=80;
set gp_resqueue_memory_policy=auto;
set work_mem='4GB';
set gp_vmem_protect_limit=20000;

Upgrading MADlib gppkg

  1. Download the MADlib binary

  2. Upgrade MADlib gppkg.

Release Specific Installations

Some releases have release-specific variations of the installation procedure.  These exceptions are listed in this section.

06/27/19 - Upgrading MADlib from 1.15

Currently, upgrading the rpm from 1.15 using rpm -U does not work because of a change in the rpm post-uninstall script in MADlib version 1.15.1.  Follow the steps below to upgrade from MADlib version 1.15:

  1. Remove existing MADlib rpm (this does not affect the database in any way)
    rpm -e <madlib 1.15 package name>
  2. Remove old MADlib files
    rm -rf /usr/local/madlib/Versions
  3. Install the MADlib 1.15.1 or 1.16 rpm
    rpm -i <madlib 1.15.1 or 1.16 package name>
  4. Upgrade the MADlib deployment in the database
    madpack -p <platform> -c <connection> upgrade
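
Because these steps remove files as root, it can help to print the exact commands for review before running them. The sketch below only prints the four commands from the steps above; the function name `madlib_115_upgrade_commands` and the example arguments are hypothetical.

```shell
# Hypothetical helper: print the four upgrade commands for review, one per line.
# Arguments: old rpm package name, new rpm file, madpack platform, madpack connection.
madlib_115_upgrade_commands() {
    printf 'rpm -e %s\n' "$1"
    printf 'rm -rf /usr/local/madlib/Versions\n'
    printf 'rpm -i %s\n' "$2"
    printf 'madpack -p %s -c %s upgrade\n' "$3" "$4"
}

# Example use (all four arguments are placeholders for your own values):
#   madlib_115_upgrade_commands old_package new_package.rpm postgres user@host:5432/mydb
```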

01/11/18 - Upgrading MADlib to 1.13

The upgrade to v1.13 has a minor problem with some leftover functions. The issue can be fixed with the following commands before running the regular madpack upgrade command.

psql <<DB_NAME>> -c "DROP FUNCTION IF EXISTS <<SCHEMA>>.knn(VARCHAR);"
psql <<DB_NAME>> -c "DROP FUNCTION IF EXISTS <<SCHEMA>>.knn();"

<<DB_NAME>> denotes the name of the database.

<<SCHEMA>> denotes the name of the madlib schema.

We have also attached a script called 'fix_upgrade.sh' to this wiki page that you can use.
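
If you would rather generate the two statements for a given schema yourself, a minimal sketch follows. It is not the attached 'fix_upgrade.sh' script; the function name `madlib_113_fix_sql` and the example database and schema names are hypothetical.

```shell
# Hypothetical helper: print the two DROP statements for the given MADlib schema.
madlib_113_fix_sql() {
    printf 'DROP FUNCTION IF EXISTS %s.knn(VARCHAR);\n' "$1"
    printf 'DROP FUNCTION IF EXISTS %s.knn();\n' "$1"
}

# Example use (database `mydb` and schema `madlib` are placeholders; psql reads SQL from stdin):
#   madlib_113_fix_sql madlib | psql mydb
```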

11/30/16 - Installation of MADlib 1.9.1 on GPDB 4.3.11

The procedure is exactly the same as described below for installation of MADlib on GPDB 4.3.10.

10/19/16 - Installation of MADlib 1.9.1 on GPDB 4.3.10

This is an important note for installation of MADlib on GPDB 4.3.10.  It does not apply to any other releases.

1) Fix madpack install utility
* issue: After gppkg installation of MADlib, you must run the script fix_madpack.sh BEFORE running the madpack utility (see below).  The script is downloadable from the Pivotal Network.

2) Install checks
* issue: Some MADlib install checks may fail even though the MADlib installation actually completed OK.

This is a poor customer experience that will be fixed in the next release. On the positive side, once the installation is done, MADlib should work OK.

------------------------------

More on fixing madpack from #1 above:

After gppkg installation of MADlib, you must run the script
fix_madpack.sh BEFORE running the madpack utility.
The syntax for fix_madpack.sh is below.

This can be somewhat confusing because, after gppkg
installation, you will get a message on the console
that says:

“Please run the following command to deploy MADlib
usage: madpack install [-s schema_name] -p hawq -c user@host:port/database
etc...”

So the correct order of operations is:

1. gppkg install of MADlib
2. run fix_madpack.sh
3. run madpack utility

*****************************************************
COMMAND NAME: fix_madpack.sh
*****************************************************

Script to fix a MADlib installation issue on GPDB 4.3.10.

This script patches a line in madpack.py, the MADlib installation
script. A backup of the original file is created in the same folder as
madpack.py called 'madpack.py.orig'.  The script is downloadable from the Pivotal Network.

*****************************************************
SYNOPSIS
*****************************************************

fix_madpack.sh [--prefix <MADLIB_INSTALL_PATH>]

fix_madpack.sh -h


*****************************************************
PREREQUISITES
*****************************************************

The following tasks should be performed prior to executing this script:

* Set $GPHOME to the correct GPDB installation directory containing MADlib
OR
* Set MADlib installation path using the --prefix option


*****************************************************
OPTIONS
*****************************************************

--prefix <MADLIB_INSTALL_PATH>
Optional. Expected MADlib installation path. If not set, the default value
${GPHOME}/madlib is used.

-h | -? | --help
Displays the online help.


*****************************************************
EXAMPLE
*****************************************************

/home/gpadmin/madlib/fix_madpack.sh --prefix /usr/local/gpdb/madlib