Goal / Scope

Final 1.11 release with new functionality and bug fixes.

Release Artifacts

Download the 1.11 release.

Source code, rpm and dmg binaries are posted.

Date of release:  May 16, 2017

JIRAs associated with the 1.11 release

Release Notes

MADlib v1.11
Release Date: 2017-MAY-16
New features:
* New module: Graph - PageRank
- Implements the original PageRank algorithm that assumes a random surfer model (MADLIB-1069)
- Grouping support is included for PageRank (MADLIB-1082)
* Graph - Single Source Shortest Path (SSSP): Add grouping support (MADLIB-1081)
* Pivot: Add support for array and svec output types (MADLIB-1066)
* DT and RF:
- Change default values for 2 parameters (max_depth and num_splits)
- Reduce memory footprint: Assign memory only for reachable nodes (MADLIB-1057)
- Include rows with NULL features in training (MADLIB-1095)
- Update error message for invalid parameter specification (num_splits)
* Array Operations: Add function to unnest 2-D arrays by one level into rows of 1-D arrays (MADLIB-1086)
* Build process on Apache infrastructure (MADLIB-920, MADLIB-1080)
* Updates for Apache Top Level Project readiness (MADLIB-1022, MADLIB-1076, MADLIB-1077, MADLIB-1090)
* Support for GPDB 5.0
Bug fixes:
* DT and RF:
- Fix accuracy issues related to integer categorical variables and tree depth
- Improve visualization of tree(s)
* Elastic Net:
- Fix install check on GPDB 5.0 and HAWQ 2.2 (MADLIB-1088)
- Fix inconsistent results with grouping (MADLIB-1092)
* PCA:
- Fix install check
Other:
- PMML: Skip install check when run without the ‘-t’ option (MADLIB-1078)
- Multiple user documentation improvements
Known issues:
* The default threshold parameter for PageRank is supposed to be 1/(num_of_vertices * 100),
as stated in the user documentation. In this release, however, the default value is set
to 1e-5, which causes PageRank to converge after only 1 iteration for graphs with roughly
one million or more vertices. As a workaround, for graphs that contain a hundred thousand
or more vertices, set the threshold lower than 1/(number of vertices), or alternatively
set it to 0. A threshold of 0 forces the algorithm to run the full max_iter number of
iterations. This issue will be fixed in the next release (MADLIB-1100).
* PageRank has a hard-coded schema name ("madlib") in the install check, which
causes the install check to fail if MADlib is installed in a non-default schema.
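
The arithmetic behind the PageRank threshold issue above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in the known issue, not MADlib code: the function and constant names are hypothetical, and the only values taken from the release notes are the documented formula 1/(num_of_vertices * 100), the shipped default of 1e-5, and the workaround bound of 1/(number of vertices).

```python
# Illustrative sketch only; none of these names exist in MADlib.

# Default threshold actually shipped in 1.11, per the known issue above.
SHIPPED_DEFAULT_THRESHOLD = 1e-5


def documented_threshold(num_vertices: int) -> float:
    """Threshold the user documentation describes: 1/(num_of_vertices * 100)."""
    if num_vertices <= 0:
        raise ValueError("num_vertices must be positive")
    return 1.0 / (num_vertices * 100)


def workaround_upper_bound(num_vertices: int) -> float:
    """The workaround asks for a threshold strictly lower than 1/num_vertices."""
    if num_vertices <= 0:
        raise ValueError("num_vertices must be positive")
    return 1.0 / num_vertices


def shipped_default_too_loose(num_vertices: int) -> bool:
    """True when the shipped default is not lower than 1/num_vertices,
    i.e. when the workaround calls for setting the threshold explicitly."""
    return SHIPPED_DEFAULT_THRESHOLD >= workaround_upper_bound(num_vertices)


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(
            n,
            documented_threshold(n),
            workaround_upper_bound(n),
            shipped_default_too_loose(n),
        )
```

The documented value is always 100 times smaller than 1/(number of vertices), so passing it explicitly satisfies the workaround for any graph size. Passing 0 instead makes the run length depend only on max_iter.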
