...

  • MADlib runs on Greenplum Database (GPDB), HDB/HAWQ, and PostgreSQL.  The last two versions of each of these database platforms are tested directly with each MADlib release.  Older versions probably work as well, but they are no longer tested.

  • For MADlib 1.10:

    • GPDB 4.23.x and 4.3.x.  Note that there is a different MADlib build for the ORCA query optimizer (GPDB 4.3.5+) than for prior versions without ORCA.

    • HDB/HAWQ 1.3.x and 2.0.x

    • PostgreSQL 9.5 and 9.6

  • OS support:

    • GPDB

      1. RHEL 5.5-5.7, 6.1, 6.2, 6.4, 6.5

      2. CentOS 5.5-5.7, 6.1, 6.2

      3. Oracle Unbreakable Linux 5.5

    • HDB/HAWQ

      1. RHEL 6.1, 6.2, 6.4, 6.5

      2. CentOS 6.1, 6.2

    • PostgreSQL

      1. RHEL 5.x, 6.x

      2. CentOS 5.x, 6.x

      3. Mac OS X 10.6+

    • Note on OS support:

      1. Ubuntu is not on the list of supported platforms for GPDB and HDB/HAWQ, so it is not officially supported for MADlib.  However, users who have tried it report that it seems to work.

...

  • Mostly data scientists

  • Wide range of verticals including financial services, healthcare, retail, energy, manufacturing and government

Q1-5  What are the benefits of MADlib compared with products like R and scikit-learn?

  • Performance

    • MADlib is a fully parallelized implementation on GPDB and HAWQ, so on large data sets it can run much faster than single-node R or Python libraries.

  • Scalability

    • Add more nodes to achieve higher performance as your data scales.  R and Python libraries are limited by the amount of data you can load into memory on a single node.

    • Using all data, not a sample, can improve accuracy

  • Familiar, user-friendly SQL interface

  • Ease of data preparation

    • Supports commonly used database data formats

...

  • There are very few differences and they are listed below.

  • K-means clustering

    • On GPDB and PostgreSQL, you can specify a user-defined distance function.  HAWQ does not support UDFs for distance, so you are restricted to the built-in distance functions.

  • Support vector machines (“early stage development” algorithm)

    • On GPDB and PostgreSQL, you can specify a user-defined kernel.  HAWQ does not support UDFs for kernels, so you are restricted to the 3 built-in kernel functions (dot product, polynomial, Gaussian).

  • “Deprecated modules” quartile and profile have some minor differences and limitations between HAWQ and GPDB.  See the documentation for details.
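
To make the idea of a pluggable distance function concrete, here is a minimal Python sketch of the cluster-assignment step of k-means. It is illustrative only and is not MADlib code: the names `assign_clusters`, `squared_euclidean`, and `manhattan` are hypothetical, with `squared_euclidean` standing in for a built-in distance and `manhattan` standing in for a user-defined one.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Stand-in for a built-in distance: squared L2 norm of (a - b)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a: Point, b: Point) -> float:
    """Stand-in for a user-defined distance: L1 norm of (a - b)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_clusters(
    points: Sequence[Point],
    centroids: Sequence[Point],
    dist: Distance = squared_euclidean,
) -> List[int]:
    """For each point, return the index of the closest centroid under dist."""
    return [
        min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        for p in points
    ]


# The same point lands in a different cluster depending on the distance used.
points = [(0.0, 0.0)]
centroids = [(3.0, 0.0), (2.0, 2.0)]
print(assign_clusters(points, centroids))             # [1]
print(assign_clusters(points, centroids, manhattan))  # [0]
```

Where user-defined functions are unavailable, as described for HAWQ above, only the first kind of call is possible: the distance must be chosen from the functions the library already ships.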

...

  • Yes.  MADlib models can be exported in PMML format for use in scoring by a PMML evaluator.  

  • The following MADlib 1.10 algorithms can be exported in PMML format:

    • Linear regression

    • Logistic regression

    • GLM

    • Multinomial regression

    • Ordinal regression

    • Decision trees

    • Random forest

...