PivotalR is a package that enables users of R, the most popular open source statistical programming language and environment, to work with large data sets stored in Greenplum Database (GPDB), HAWQ, and PostgreSQL. It does so by providing an R interface to operations on tables and views in the database.
PivotalR is convenient for people who are familiar with R but whose data sets are too large to load into R. They can keep using familiar R syntax while benefiting from the scalability that MADlib® provides on GPDB and HAWQ. Think of it as a wrapper around MADlib that translates R code into SQL to run on massively parallel processing (MPP) databases.
All heavy lifting, including model computation, is done in the database. A minimal amount of data is transferred between the database and the R client.
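The in-database pattern described above can be illustrated with a small, self-contained sketch. It is not PivotalR code and not how MADlib implements regression; it is only an analogy written in Python, with the standard-library `sqlite3` module standing in for the database. The table `points`, its columns, and the function `fit_simple_regression` are hypothetical names chosen for the illustration. The client builds one SQL query, the database computes the aggregates over all rows, and only five numbers travel back to the client.

```python
import sqlite3


def fit_simple_regression(conn, table, x_col, y_col):
    """Fit y = intercept + slope * x from aggregates computed in the database.

    Assumption: table and column names are trusted identifiers, since they
    are interpolated directly into the SQL text.
    """
    # Step 1: translate the modelling request into a single SQL query.
    sql = (
        f"SELECT COUNT(*), SUM({x_col}), SUM({y_col}), "
        f"SUM({x_col} * {x_col}), SUM({x_col} * {y_col}) "
        f"FROM {table}"
    )
    # Step 2: the database scans every row; the client receives one row.
    n, sx, sy, sxx, sxy = conn.execute(sql).fetchone()
    # Step 3: solve the least-squares normal equations on the client.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical demo data: 1000 points lying on the line y = 2x + 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(1000)],
)

intercept, slope = fit_simple_regression(conn, "points", "x", "y")
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

The amount of data returned does not grow with the number of rows in the table, which is the property that lets a client-side tool work with tables far larger than client memory.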
Here are some links to learn more about PivotalR:
- GitHub
- CRAN package
- Pivotal blog posts:
  - http://blog.pivotal.io/pivotal/products/introducing-r-for-big-data-with-pivotalr
  - http://blog.pivotal.io/pivotal/products/how-to-20-minute-guide-to-get-started-with-pivotalr
- Technical paper
What is the difference between PivotalR and PL/R?
PivotalR is a client-side R package that connects to backend MPP platforms and can call in-database statistics libraries such as MADlib, which run in parallel. In other words, it translates R code into SQL, which GPDB/HAWQ then executes.
PL/R is a loadable procedural language for PostgreSQL that allows developers to write functions and triggers in the R programming language. A PL/R function is invoked from SQL as a GPDB/HAWQ function and executes in R on each GPDB/HAWQ segment.
As of the MADlib 1.8.x release, PivotalR supports the following MADlib algorithms:
Category | Algorithm |
---|---|
Generalized Linear Models | Linear Regression |
Generalized Linear Models | Logistic Regression |
Generalized Linear Models | Elastic Net Regularization |
Generalized Linear Models | Lasso Regression |
Generalized Linear Models | Ridge Regression |
Generalized Linear Models | Marginal Effects |
Generalized Linear Models | Probit Regression |
Generalized Linear Models | Poisson Regression |
Generalized Linear Models | Gamma Regression |
Cross Validation | Cross Validation |
Descriptive Statistics | Summary |
Support Modules | Array Operations (some) |
Time Series Analysis | ARIMA |
Tree Methods | Decision Tree |
Tree Methods | Random Forest |
Topic Modeling | Latent Dirichlet Allocation |
MADlib Early Stage Development | Linear Algebra Operations (some) |
Supervised Learning | Support Vector Machines |
Clustering | k-Means |
... your contribution here ... | |
We are actively looking for contributors to add more PivotalR modules to this list.