When Greenplum is compiled from source, there are known issues with changing the PYTHONPATH environment variable after compilation. For the deep learning features of MADlib to function properly, keras, tensorflow, and all their dependencies must be installed in the same Python directory that PYTHONPATH pointed to before Greenplum was compiled. A directory added to PYTHONPATH later is not reflected on the segments unless Greenplum is recompiled and restarted.
NOTE: If Greenplum is installed using gppkg or another binary package and PYTHONPATH is left at its default, users should be able to `pip install` keras (2.2.4), tensorflow (1.14), and all other dependencies into the appropriate location, where they will be used by the MADlib deep learning functions.
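To confirm where the packages actually resolve from, a small check along these lines can be run with the same Python interpreter the database uses, on the master and on each segment host. This is only a sketch: the helper names `describe_package` and `pythonpath_entries` are made up for illustration and are not part of MADlib.

```python
import importlib
import os


def describe_package(name):
    """Import `name` and return (version, install_dir), or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    module_file = getattr(module, "__file__", None)
    install_dir = os.path.dirname(os.path.abspath(module_file)) if module_file else None
    return version, install_dir


def pythonpath_entries():
    """Return the non-empty directories currently listed in PYTHONPATH."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("PYTHONPATH entries: %s" % pythonpath_entries())
    for package in ("keras", "tensorflow"):
        info = describe_package(package)
        if info is None:
            print("%s: not importable from this interpreter" % package)
        else:
            print("%s %s installed in %s" % (package, info[0], info[1]))
```

If a package is reported from a directory that was not on PYTHONPATH when Greenplum was compiled, the segments may not see it, as described above.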
- Support for keras/tensorflow on Greenplum/Postgres on CentOS 6:
CentOS 6 currently ships with glibc 2.12 by default, while TensorFlow requires at least glibc 2.17. Using the MADlib deep learning module on CentOS 6 requires installing Keras and TensorFlow, which might in turn require compiling glibc from source. Running Greenplum 5 with a newer version of glibc may impact database behavior.
- GPU memory management:
- Canceling any deep learning operation running on GPU may intermittently fail to release GPU memory. However, logging out of the psql session releases all of the memory.
- GPU memory cannot be released within the same session, even after the query finishes. For example, if madlib_keras_fit() with the argument gpus_per_host>=1 starts execution in a psql session (S1), it uses the underlying GPU memory, and that memory is not released once the query finishes successfully (see https://github.com/keras-team/keras/issues/9379 for more info).
- It is advisable to log out of the current psql session when switching from CPU to GPU computation or vice versa. Internally, the CUDA environment variable `CUDA_VISIBLE_DEVICES` is set based on the gpus_per_host flag. Once this variable is set to -1 (GPU disabled), it cannot be reset to enable the GPU, and that session will only ever use the CPU.
- Recommended GPU configuration: 1 GPU available per segment. If a segment host has fewer GPUs than segments, multiple segments share the same GPU, which may fail in some scenarios.
- Recommended format when specifying metric, optimizer, and loss values in the compile_params argument: loss=mean_squared_error
Currently, MADlib does not support the format that references individually imported loss functions, such as loss=losses.mean_squared_error.
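The difference between the two formats can be illustrated with a small pre-flight check that flags module-qualified values before a compile_params string is passed to MADlib. This is a rough sketch, not MADlib's own parser: the function name `find_qualified_values` and the regular expression are assumptions made for illustration, and they only cover the `losses.`, `optimizers.`, and `metrics.` prefixes.

```python
import re

# Matches loss=, optimizer= or metrics= followed by a module-qualified name
# such as losses.mean_squared_error or keras.losses.mean_squared_error.
_QUALIFIED = re.compile(
    r"\b(loss|optimizer|metrics)\s*=\s*[\[\s'\"]*"
    r"((?:keras\.)?(?:losses|optimizers|metrics)\.\w+)"
)


def find_qualified_values(compile_params):
    """Return {key: value} for every value written in module-qualified form."""
    return {key: value for key, value in _QUALIFIED.findall(compile_params)}


if __name__ == "__main__":
    supported = "loss=mean_squared_error, optimizer=adam, metrics=['accuracy']"
    unsupported = "loss=losses.mean_squared_error, optimizer=SGD(lr=0.01)"
    print(find_qualified_values(supported))
    print(find_qualified_values(unsupported))
```

An empty result means no module-qualified values of this kind were found; any entries returned should be rewritten to the bare name, for example loss=mean_squared_error.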
Keras's JSON serialization does not appear to be compatible across different versions of Keras. For example, we had an issue loading a model serialized locally with Keras 2.2.4 onto a cluster that had Keras 2.1.6 installed.
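One way to catch such a mismatch early is to compare the version recorded in the serialized model with the version installed on the cluster before loading it. The sketch below assumes the JSON produced by `model.to_json()` carries a top-level `keras_version` field, which Keras 2.x writes; the helper names and the sample JSON are illustrative only.

```python
import json


def serialized_keras_version(model_json):
    """Return the 'keras_version' recorded in a model's JSON string, or None if absent."""
    return json.loads(model_json).get("keras_version")


def versions_match(model_json, installed_version):
    """True only when the JSON records a version and it equals the installed one."""
    recorded = serialized_keras_version(model_json)
    return recorded is not None and recorded == installed_version


if __name__ == "__main__":
    # Trimmed-down stand-in for the output of model.to_json().
    sample = '{"class_name": "Sequential", "config": [], "keras_version": "2.2.4", "backend": "tensorflow"}'
    print(serialized_keras_version(sample))
    print(versions_match(sample, "2.1.6"))
```

On the cluster, the installed version to compare against can be read from `keras.__version__`.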