...
To set up PostgreSQL and MADlib with Anaconda Python on OSX, follow the super quick start. Otherwise, follow the regular guides for installing from binaries or compiling from source.
Developers may want to use the Docker image described in the Developer Guide.
Sometimes there are release-specific variations of the installation procedures. These exceptions are listed at the bottom of this page in the section called Installation Guide for MADlib 1.X Release Specific Installations.
MADlib requires Python 2.7. Python 3.x is not currently supported.
...
KNN: scipy==1.2.1
Unit tests: pgsanity
To set up PostgreSQL + MADlib with Anaconda Python on OSX:
PYTHON=/Users/janedoe/anaconda/bin/python
- Install Postgres with the Python extension specified (i.e., --with-python), as described in the PostgreSQL documentation. Note that previously you could install Postgres with Python support using brew by running 'brew install postgresql --with-python', but passing the '--with-python' flag is no longer supported.
- Set up the database and roles.
- Install the .dmg of the latest MADlib release, downloaded from the MADlib website: https://madlib.apache.org/download.html
- Run: /usr/local/madlib/bin/madpack -s madlib -p postgres install
Prerequisites
Install and configure your database of choice. MADlib currently supports the following platforms:
...
MADlib requires the GNU M4 Unix macro processor which must be present for installation to succeed.
If the environment variables listed below are defined, it can save you some typing.
...
    /usr/local/madlib/bin/madpack -s madlib -p postgres install
    madpack.py : INFO : Detected PostgreSQL version 9.5.
    madpack.py : INFO : *** Installing MADlib ***
    madpack.py : INFO : MADlib tools version = 1.9.1 (//usr/local/madlib/Versions/1.9.1/bin/../madpack/madpack.py)
    madpack.py : INFO : MADlib database version = None (host=localhost:5432, db=postgres, schema=madlib)
    madpack.py : INFO : Testing PL/Python environment...
    madpack.py : INFO : > Creating language PL/Python...
    madpack.py : ERROR : SQL command failed:
    SQL: CREATE LANGUAGE plpythonu;
    ERROR: could not access file "$libdir/plpython2": No such file or directory
    madpack.py : ERROR : Cannot create language plpythonu. Please check if you have configured and installed portid (your platform) with `--with-python` option. Stopping installation...
    madpack.py : ERROR : MADlib installation failed
Installing MADlib
- Download the MADlib binary
  - For Postgres: OS X and Linux binaries can be found on the MADlib download page
  - For Greenplum: Linux .gppkg binaries can be found on Pivotal Network in the "Greenplum Advanced Analytics Group"
    - NOTE: the above .gppkg binaries work for both open and closed source Greenplum and can be downloaded by anybody (after creating a Pivotal Network account)
- Install the package.
  - Postgres:
    - On OSX, double-click the installer package.
    - On Redhat / CentOS, run the following as root:
          yum install <madlib_package> --nogpgcheck
      or
          rpm -i <madlib_package>
  - Greenplum:
    - On Redhat / CentOS, run the following as gpadmin:
          gppkg -i <madlib_package>
  - NOTE: if you are using an rpm package on a CentOS 5 system, add the --nodeps flag to the command.
- Ensure that the environment is set up for your database deployment and that the database is up and running.
  - Ensure that psql, postgres, and pg_config are in your path:
        which psql postgres pg_config
  - Ensure that the database is started and running:
        psql -c 'select version()'
    The above may need user/port/password settings depending on how the database has been configured.
- Run the MADlib deployment utility, madpack, to deploy MADlib into each database in which you want to use it:
  - Postgres:
        /usr/local/madlib/bin/madpack -s madlib -p postgres install
    if environment variables are defined. Otherwise, use a fully defined connection string:
        /usr/local/madlib/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install
  - Greenplum Database:
        /usr/local/madlib/bin/madpack -p greenplum install
    The above may need user/port/password settings depending on how the database has been configured.
  After installation, gpadmin should grant all privileges on schema madlib to users who will be accessing MADlib functions. Otherwise, users will get "ERROR: permission denied for schema MADlib." Also, install checks (see the next step below) will fail if CREATE TEMP TABLE privileges are not granted on the schema where MADlib is installed. See the PostgreSQL docs for information on schemas and privileges.
- Test your installation:
  - Postgres:
        /usr/local/madlib/bin/madpack -s madlib -p postgres install-check
  - Greenplum Database:
        /usr/local/madlib/bin/madpack -p greenplum install-check
  The above may need user/port/password settings depending on how the database has been configured.
  Please note that if the optimizer_control GUC is set to off in Greenplum, the following install checks will fail, and these MADlib functions will not work: decision tree, random forest, LDA, k-Means, PMML export for decision tree, PMML export for random forest. This will be fixed in a future release (MADLIB-1109). The parameter optimizer_control controls whether the server configuration parameter optimizer can be changed. The parameter optimizer controls whether the GPORCA optimizer is enabled when running SQL queries.
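The privilege grant described in the deployment step above can be scripted. The following is a minimal sketch and not part of MADlib: `madlib_grant_sql` is a hypothetical helper that prints one `GRANT ALL ON SCHEMA` statement per role, and the role names in the usage note are placeholders. It assumes the schema and role names are plain identifiers that need no quoting.

```shell
# Hypothetical helper (not shipped with MADlib): print one GRANT statement
# per role. The first argument is the schema, the rest are role names.
# Assumes all names are simple identifiers that need no SQL quoting.
madlib_grant_sql() {
    schema=$1
    shift
    for role in "$@"; do
        printf 'GRANT ALL ON SCHEMA %s TO %s;\n' "$schema" "$role"
    done
}
```

To apply the statements, the output can be piped to psql as the administrative user, for example `madlib_grant_sql madlib analyst1 analyst2 | psql`, adding user/port/database options as your configuration requires.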
Prerequisites
Requirements for installing MADlib:
...
    brew install boost@1.60
    export BOOST_INCLUDEDIR=/usr/local/opt/boost@1.60/include/
Prerequisites
Requirements for installing MADlib:
- gcc and g++
  - For OS X, Clang will work for compiling the source, but not for the documentation. To compile on newer versions of Xcode, the CXX11 flag must be enabled. Setting -DCXX11=1 during cmake will auto-download Boost 1.75.0 if Boost > 1.65.0 is not found on the system.
  - Note: Setting -DCXX11=1 will enable C++11, which is not fully supported, i.e., MADlib compiles but some install-check/dev-check tests may fail.
- python 2.6 or 2.7
  - python 3.x is not currently supported by MADlib.
- cmake
  - NOTE: the latest version of cmake might cause issues. Try cmake 3.5.2 if you get an error or a segmentation fault.
  - NOTE: On CentOS 6 (and possibly other Linux variants), we have seen occasions where cmake fails with a segmentation fault if the greenplum_path.sh file has been sourced before cmake is run. If you encounter issues, you can use ldd on the cmake executable to check whether dynamic libraries are being picked up from the Greenplum installation directories. If they are, start a new shell session in which greenplum_path.sh has not been sourced. See MADLIB-1093 for additional details.
- An installed version of Greenplum Database or PostgreSQL (64-bit) 9.2+ with plpython support enabled.
  - NOTE: plpython may not be enabled in Postgres by default.
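Before starting a build, it can help to confirm that the tools listed above are on your path. The following is a minimal sketch, not part of MADlib: `missing_tools` is a hypothetical helper that prints the name of each command it cannot find and returns a non-zero status if any is missing.

```shell
# Hypothetical helper (not shipped with MADlib): print every listed command
# that is not found on PATH. Returns 1 if at least one is missing, else 0.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `missing_tools gcc g++ cmake pg_config` prints nothing when all four are available. This only checks presence, not versions, so the cmake and Python version notes above still need to be verified by hand.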
Installing MADlib
In the $MADLIB_ROOT directory (location of the MADlib source) run the following commands:
...
Above, we built the executables in the build folder. This can, however, be any user-named folder (henceforth called $BUILD_ROOT).
Deploying MADlib
Deploy MADlib into the database with the MADlib package manager madpack, located under $BUILD_ROOT/src/bin.
...
Postgres:
    $BUILD_ROOT/src/bin/madpack -s madlib -p postgres install
if environment variables are defined. Otherwise use a fully defined connection string:
    $BUILD_ROOT/src/bin/madpack -s madlib -p postgres -c [user[/password]@][host][:port][/database] install
Greenplum Database:
    $BUILD_ROOT/src/bin/madpack -p greenplum install
The above may need user/port/password setting depending on how the database has been configured.
To install:
    $BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install
To make sure that the installation is successful:
    $BUILD_ROOT/src/bin/madpack -p postgres -c [user[/password]@][host][:port][/database] install-check
For more information on the usage of madpack:
    $BUILD_ROOT/src/bin/madpack --help
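The connection string format `[user[/password]@][host][:port][/database]` used above has several optional parts. As an illustration only, the sketch below assembles such a string from its pieces. `build_madpack_conn` is a hypothetical helper, not part of madpack, and the values in the usage note are placeholders.

```shell
# Hypothetical helper (not part of madpack): assemble a connection string of
# the form [user[/password]@][host][:port][/database].
# Arguments: user password host port database (pass "" to omit a part).
build_madpack_conn() {
    user=${1:-} password=${2:-} host=${3:-} port=${4:-} database=${5:-}
    conn=""
    if [ -n "$user" ]; then
        conn="$user"
        # The password is only meaningful when a user is given.
        if [ -n "$password" ]; then
            conn="$conn/$password"
        fi
        conn="$conn@"
    fi
    conn="$conn$host"
    if [ -n "$port" ]; then
        conn="$conn:$port"
    fi
    if [ -n "$database" ]; then
        conn="$conn/$database"
    fi
    printf '%s\n' "$conn"
}
```

The result can be passed to the -c option, for example `$BUILD_ROOT/src/bin/madpack -p postgres -c "$(build_madpack_conn gpadmin "" localhost 5432 madlibtest)" install`. Keep in mind that a password supplied this way is visible in the shell history and process list.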
The variables below will be automatically used by the madpack installer if no connection string is provided:
...
    export PGPORT=5430
    export PGHOST=127.0.0.1
    export PGDATABASE=madlibtest
    $BUILD_ROOT/src/bin/madpack -p postgres install
The variables below can be set in GPDB in case memory-related issues show up. Feel free to adjust them based on the specifics of the installed system.
    set max_statement_mem='50GB';
    set statement_mem='50GB';
    set memory_spill_ratio=80;
    set gp_resqueue_memory_policy=auto;
    set work_mem='4GB';
    set gp_vmem_protect_limit=20000;
- Download the MADlib binary
- Greenplum database : Download the .gppkg binary from Pivotal Network
...
- Greenplum Database:
  - Upgrading gppkg to a higher version of MADlib (for example, upgrading from 1.15.1 to 1.16):
    - On Redhat / CentOS, run the following as gpadmin:
          gppkg -u <madlib_package_upgrading_to>
    - Upgrade the MADlib deployment in the database:
          madpack -p <platform> -c <connection> upgrade
  - Upgrading gppkg for the same version of MADlib (for example, upgrading from madlib_gppkg_1.16+1 to madlib_gppkg_1.16+2):
    - On Redhat / CentOS, run the following as gpadmin:
          gppkg -u <madlib_package_upgrading_to>
    - The MADlib deployment in the database does not need to be upgraded, as the MADlib version has not changed.
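The two upgrade cases above differ only in whether the MADlib version itself changes. As a rough sketch, the check below decides whether the madpack upgrade step is needed. `madlib_version_changed` is a hypothetical helper, and it assumes version strings of the form `<MADlib version>+<package revision>`, inferred from the `1.16+1` example above rather than from any documented gppkg naming rule.

```shell
# Hypothetical helper: succeed (exit status 0) when the MADlib version differs
# between two package version strings, ignoring any "+N" package revision.
# Assumes the "<version>+<revision>" form seen in names like 1.16+1.
madlib_version_changed() {
    old=${1%%+*}   # 1.16+1 -> 1.16; 1.15.1 stays 1.15.1
    new=${2%%+*}
    [ "$old" != "$new" ]
}
```

With this, `madlib_version_changed 1.15.1 1.16` succeeds, so the database deployment should also be upgraded with madpack, while `madlib_version_changed 1.16+1 1.16+2` fails, so only the gppkg step is needed.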
Sometimes there are release-specific variations of the installation procedures. These exceptions are listed in this section.
06/27/19 - Upgrading MADlib from 1.15
Currently, upgrading the rpm from 1.15 using rpm -U does not work due to a change in the rpm post-uninstall script in MADlib version 1.15.1. Follow the steps below to upgrade from MADlib version 1.15:
- Remove the existing MADlib rpm (this does not affect the database in any way):
      rpm -e <madlib 1.15 package name>
- Remove old MADlib files:
      rm -rf /usr/local/madlib/Versions
- Install the MADlib 1.15.1 or 1.16 rpm:
      rpm -i <madlib 1.16 package name>
- Upgrade the MADlib deployment in the database:
      madpack -p <platform> -c <connection> upgrade
01/11/18 - Upgrading MADlib to 1.13
The upgrade to v1.13 has a minor problem with some leftover functions. The issue can be fixed with the following commands before running the regular madpack upgrade command.
...
We have also attached a script to this wiki page called 'fix_upgrade.sh' that you can use.
11/30/16 - Installation of MADlib 1.9.1 on GPDB 4.3.11
The procedure is exactly the same as described below for the installation of MADlib on GPDB 4.3.10.
10/19/16 - Installation of MADlib 1.9.1 on GPDB 4.3.10
This is an important note for installation of MADlib on GPDB 4.3.10. It does not apply to any other releases.
...