MADlib® is an open-source library for scalable in-database analytics.
It provides data-parallel implementations of mathematical, statistical,
graph, and machine learning methods for structured and unstructured data.
Quick Start Guides
Get going with a minimum of fuss.
Learn about MADlib.
- MADlib website
- Jupyter notebooks for many MADlib algorithms
- MADlib YouTube channel including step-by-step guides for common algorithms
- Module and algorithm documentation
Contribute to the project.
- Source code repo
- Contribution Guidelines
- Documentation Guide (Doxygen)
- Ideas for contribution
- Algorithm technical design document
See how the pieces fit together.
See what has been released.
Third Party Components
MADlib incorporates material from the following third-party components:
- argparse 1.2.1 provides an easy, declarative interface for creating command line tools
- Boost 1.47.0 (or newer) provides peer-reviewed portable C++ source libraries
- Eigen 3.2.2 is a C++ template library for linear algebra
- PyYAML 3.10 is a YAML parser and emitter for Python
- PyXB 1.2.4 is a Python library for XML Schema Bindings
- Porter2 stemmer reduces words to their common roots so that they can be compared and processed.
- UseLATEX.cmake contains CMake commands for using the LaTeX compiler
License information regarding MADlib and included third-party libraries can be found inside the license directory.
- MAD Skills: New Analysis Practices for Big Data (VLDB 2009)
- Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)
- Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)
- The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)
- PivotalR - lets users run the functions of the open-source big-data machine learning package MADlib directly from R.
- PyMADlib - a nascent Python wrapper for MADlib that combines the flexibility of Python with the number-crunching power of MADlib.