Goal / Scope

The 1.17.0 release adds new functionality and bug fixes. Please refer to the Deep Learning section for important information about the deep learning feature.

Release Artifacts

Download the 1.17.0 release

Source code, rpm, deb and dmg binaries are posted.

Date of release:  April 9, 2020

JIRAs associated with the 1.17.0 release

Release Notes

MADlib v1.17.0:

Release Date: April 9, 2020

New features:
- DL: Add optional params to madlib_keras_fit_multiple_model (MADLIB-1397)
- DL: Fit and evaluate changes for asymmetric cluster config (MADLIB-1393)
- DL: Make param search fit() function work with existing evaluate and predict (MADLIB-1387)
- DL: ParamSearch: Add utility function for generating model selection table (MADLIB-1375)
- DL: Predict changes for asymmetric cluster config (MADLIB-1394)
- DL: Preprocessor should evenly distribute data on an arbitrary number of segments (MADLIB-1378)
- DL: Preprocessor support for asymmetric segment distribution (MADLIB-1392)
- DL: Remove model_arch_table column from the output of load_model_selection_table (MADLIB-1381)
- DL: Support DL predict without training on MADlib (MADLIB-1359)
- DL: Transfer learning for multi-model (MADLIB-1389)
- Kmeans: Add simple silhouette score for every point (MADLIB-1382)
- Kmeans: Select number of centroids in k-means (MADLIB-1380)
- PostgreSQL 12 support (MADLIB-1391)

Improvements:
- Assoc rules: Add option to set number of posterior in association rules (MADLIB-1327)
- Correlation: Improve correlation and covariance memory usage with large number of groups (MADLIB-1301)
- DL: Helper function for asymmetric cluster config (MADLIB-1390)
- DL: Mini-batch preprocessor for images - performance issue (MADLIB-1342)
- DL: Modify warm start logic for DL to handle case of missing weight (MADLIB-1400)
- DL: Param search for multiple models on MPP architecture (MADLIB-1386)
- DL: Performance improvements to fit transition function (MADLIB-1418)
- Docs: Enhance Installation Guides (MADLIB-1399)
- Graph: SSSP should not show vertices in output table that are unreachable (MADLIB-1415)
- Knn: Add zero check and output distance array (MADLIB-1370)
- LDA: Add stopping criteria on perplexity to LDA (MADLIB-1351)
- Summary: Last optional param in summary errors when NULL (MADLIB-1413)
- Summary: Summary function has dups for MFV for approximate results (MADLIB-1412)
- SVM: Change default num_components for SVM to max(100, 2*num_features) (MADLIB-1384)
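To illustrate the new SVM default noted above (MADLIB-1384), here is a minimal sketch of the arithmetic only. It is not MADlib's implementation: the helper name `default_num_components` and the input check are assumptions made for this example. Only the formula max(100, 2*num_features) comes from the release note.

```python
def default_num_components(num_features: int) -> int:
    """Hypothetical helper mirroring the documented default:
    max(100, 2 * num_features). Not part of the MADlib API."""
    if num_features < 1:
        # Assumption for this sketch: a feature count must be positive.
        raise ValueError("num_features must be a positive integer")
    return max(100, 2 * num_features)


if __name__ == "__main__":
    # With few features the floor of 100 applies; with many features
    # the default grows as twice the feature count.
    for n in (10, 50, 200):
        print(n, default_num_components(n))
```

Under this formula, any data set with 50 or fewer features gets the floor of 100 components, and larger feature counts get twice the number of features.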

Bug fixes:
- DL: Deep Learning module does not work with tables in non-public schemas (MADLIB-1388)
- DL: Exception during madlib_keras_fit when model_arch_id is passed as NULL (MADLIB-1371)
- DL: fit and fit multiple fail with memory exception in gpdb6 (MADLIB-1405)
- DL: fit multiple takes up unnecessary disk space (MADLIB-1406)
- DL: Intermediate tables are not dropped (MADLIB-1404)
- DL: MADlib Keras operations create too many threads (MADLIB-1372)
- DL: metrics_elapsed_time for fit multi_model not captured correctly (MADLIB-1403)
- DL: predict fails with OOM in gpdb6 (MADLIB-1414)
- DL: Remove final function for fit multiple (MADLIB-1416)
- DL: Support schema qualified output tables for fit and fit_multiple (MADLIB-1417)
- Graph: APSP fails if both vertex id column and edge src column have the same name (MADLIB-1407)
- Graph: APSP Path Function fails if src or dest column type is bigint (MADLIB-1408)
- Graph: Graph/wcc fails if the user specifies a schema for the output table (MADLIB-1411)
- Kmeans: k-means related functions must use same default distance function (MADLIB-1383)
- LDA: Term frequency and LDA - turn off notices (MADLIB-1395)
- MADlib cannot be built on PowerPC machines with Linux (MADLIB-1410)
- Pivot: Pivot documentation should say "out_table" instead of "output_table" (MADLIB-1376)

Other:
- DL: Supports up to Keras version 2.2.4 and TensorFlow version 1.14.
- DL: If 'madlib_keras_fit_multiple_model()' is running on GPDB 5 and some versions of GPDB 6, disk usage will keep growing (in proportion to model size) while the query runs, and the space will only be released once the fit multiple query has completed execution. This is not the case for GPDB 6.5.0+, where disk space is released during the fit multiple query.
- DL: pg_temp is not allowed as an output table schema for madlib_keras_fit_multiple_model().
- DL: CUDA GPU memory cannot be released until the process holding it is terminated. This process holds the GPU memory until one of the following two things happens: the query finishes and the user logs out of the Postgres client/session; or the query finishes and the user waits for the timeout set by `gp_vmem_idle_resource_timeout`. The default value for this timeout in Greenplum is 18 seconds, but it can be changed.
- Build: Enable current versions of bison
- Build: Add cmake variable for gppkg filename
- Build: Add pull request template


