PivotalR is a package that lets users of R, a widely used open source statistical programming language and environment, work with large data sets stored in Greenplum Database (GPDB), HAWQ, and PostgreSQL. It does so by providing an R interface to operations on tables and views in the database.

PivotalR is convenient for people who are familiar with R but whose data sets are too large to load into R. They can keep using familiar R syntax while gaining the scalability that MADlib® provides on GPDB and HAWQ. Think of it as a wrapper around MADlib that translates R code into SQL to run on MPP (massively parallel processing) databases.

All heavy lifting, including model computation, is done in the database.  A minimal amount of data is transferred between the database and the R client.
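
The push-down idea can be illustrated with a small sketch. This is not PivotalR's or MADlib's actual implementation: it uses Python with an in-memory SQLite database as a stand-in for the R client and the MPP database, and the names `simple_lm_sql` and `fit_simple_lm` are hypothetical. The client only builds a SQL statement; the database scans the rows and returns a single row of aggregates, from which the client finishes a one-variable linear fit.

```python
import sqlite3


def simple_lm_sql(table, y, x):
    """Build one SQL statement returning the sufficient statistics for y ~ x.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so it must not be used with untrusted input.
    """
    return (
        f"SELECT COUNT(*), SUM({x}), SUM({y}), "
        f"SUM({x} * {x}), SUM({x} * {y}) FROM {table}"
    )


def fit_simple_lm(conn, table, y, x):
    """Fit y = intercept + slope * x from aggregates computed in the database."""
    # Only one row with five numbers crosses the database/client boundary,
    # no matter how many rows the table holds.
    n, sx, sy, sxx, sxy = conn.execute(simple_lm_sql(table, y, x)).fetchone()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Stand-in for a large table that lives in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(float(i), 2.0 * i + 1.0) for i in range(1000)],
)

intercept, slope = fit_simple_lm(conn, "points", "y", "x")
print(simple_lm_sql("points", "y", "x"))
print(intercept, slope)
```

The table has 1000 rows, yet the client receives five numbers. The same principle, applied to far larger tables and to MADlib's parallel algorithms, is what keeps the data transfer small.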

What is the difference between PivotalR and PL/R?

  • PivotalR is a client-side package that connects R to backend MPP platforms and can call in-database statistics libraries such as MADlib, which execute in parallel. In other words, it translates R code into SQL, which GPDB/HAWQ then executes.

  • PL/R is a PostgreSQL loadable procedural language that lets developers write functions and triggers in the R programming language. A PL/R function is invoked from SQL (as a GPDB/HAWQ function) and executed in R on each GPDB/HAWQ segment.
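
The direction of the call is the key difference, and a rough analogy can make it concrete. The sketch below is an assumption-laden illustration in Python with SQLite, not PivotalR or PL/R code: `to_sql` and `c_to_f` are hypothetical names, and SQLite runs in-process, so it mirrors only who calls whom, not the distribution across segments. In the first style the client translates an expression into SQL text; in the second, SQL invokes a function written in the host language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (celsius REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(0.0,), (100.0,)])


# Style 1 (PivotalR-like): the client turns an expression into SQL text,
# and the database evaluates that SQL itself.
def to_sql(column, scale, offset):
    """Hypothetical translator: a linear expression becomes a SQL query."""
    return f"SELECT {column} * {scale} + {offset} FROM readings ORDER BY {column}"


translated = [row[0] for row in conn.execute(to_sql("celsius", 1.8, 32.0))]


# Style 2 (PL/R-like): a host-language function is registered with the
# database, and a SQL statement calls it once per row.
def c_to_f(celsius):
    return celsius * 1.8 + 32.0


conn.create_function("c_to_f", 1, c_to_f)
invoked = [
    row[0]
    for row in conn.execute("SELECT c_to_f(celsius) FROM readings ORDER BY celsius")
]

print(translated, invoked)
```

Both styles yield the same values here. What differs is where the logic is written and which side starts the computation.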

As of the MADlib 1.8.x release, the following algorithms are supported:

 

Category                       | Algorithm
Generalized Linear Models      | Linear Regression
Generalized Linear Models      | Logistic Regression
Generalized Linear Models      | Elastic Net Regularization
Generalized Linear Models      | Lasso Regression
Generalized Linear Models      | Ridge Regression
Generalized Linear Models      | Marginal Effects
Generalized Linear Models      | Probit Regression
Generalized Linear Models      | Poisson Regression
Generalized Linear Models      | Gamma Regression
Cross Validation               | Cross Validation
Descriptive Statistics         | Summary
Support Modules                | Array Operations (some)
Time Series Analysis           | ARIMA
Tree Methods                   | Decision Tree
Tree Methods                   | Random Forest
Topic Modeling                 | Latent Dirichlet Allocation
MADlib Early Stage Development | Linear Algebra Operations (some)
Supervised Learning            | Support Vector Machines
Clustering                     | k-Means
... your contribution here ...

We are actively looking for contributors to add more PivotalR modules to this list.



