Title
MXNet Scala uses native memory to manage NDArray, Symbol, Executor and DataIterators through the MXNet c_api. The C APIs provide interfaces to create, access and free these objects. MXNet Scala has corresponding wrappers and APIs that hold pointer references to the native memory.
Current JVM users (Scala/Clojure/Java, ...) of Apache MXNet have to manage MXNet objects manually using the dispose pattern. There are a few usability problems with this approach.
With this change, MXNet-Scala users will be able to use MXNet Objects in 3 different ways:
1) Use MXNet Objects like regular Java Objects and let the ResourceHandler deal with deallocating off-heap memory. The user needs to select the right deallocation strategy, i.e., run System.gc() periodically, run System.gc() when off-heap memory reaches a certain threshold, or let the JVM decide when to run Garbage Collection when it feels pressure on the JVM heap.
This approach may not be sufficient when memory-intensive objects such as NDArrays are greedily allocated and the time interval for Garbage Collection has not elapsed, or when the call to System.gc() is not honored.
2) Use ResourceScope. ResourceScope collects all MXNet Object Off-heap pointer references within a block scope and releases them at the end of the scope.
This follows the try-with-resources paradigm in Java7+ and is similar to JavaCPP's PointerScope and NDArrayCollector that YiZhi Liu has implemented, but is enhanced to handle MXNet Objects in a generic way.
Since this uses ThreadLocal to manage the scopes, it is not thread-safe in an application such as Producer-Consumer, where the Producer might leave the scope before the Consumer has consumed the object, leaving the object unavailable.
Scala does not support try-with-resources, so an alternative, the using method, is made available on ResourceScope. At the end of the using method, the NativeResources within the block scope are disposed. This approach is suggested by Martin Odersky (creator of Scala) in Slide 21 from FOSDEM 2009.
    ResourceScope.using(new ResourceScope()) {
      var r = NDArray.ones(Shape(2,2))
      var r1 = NDArray.ones(Shape(50,50))
    }
Java (not tested):
    try (ResourceScope rs = new ResourceScope()) {
      NDArray r = NDArray.ones(Shape(2,2));
      NDArray r1 = NDArray.ones(Shape(50,50));
    }
3) Call dispose() explicitly at the end of using each MXNet object. This is the current experience in MXNet aside from NDArrayCollector.
    val nd: NDArray = NDArray.ones(Shape(2,2))
    nd.dispose()
For Automatic Resource Management, a NativeResource Interface is used that maintains references to nativeAddress and NativeDeAllocator Address of the MXNet Object.
An Object of type NativeResourceRef tracks NativeResource using PhantomReference.
MXNet Objects extend NativeResource, update nativeAddress, bytesAllocated and nativeDeAllocator during object creation, and call NativeResource.register to register a phantomRef.
When the Garbage Collector runs and finds that NativeResource Object is not reachable, it adds the tracking PhantomReference to the Reference Queue specified. A Separate Cleanup thread waits on a Reference Queue's remove Blocking method and releases the Native memory upon notification.
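The tracking mechanism described above can be sketched with only the JDK's java.lang.ref classes. This is a minimal illustration under stated assumptions, not the MXNet implementation: NativeResourceTracker, its register, drain and startCleanupThread methods, and the deAllocator callback are hypothetical names, and the "native memory" is just a Long address passed to a callback.

```scala
import java.lang.ref.{PhantomReference, ReferenceQueue}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical tracker; all names here are illustrative, not MXNet API.
object NativeResourceTracker {
  private val queue = new ReferenceQueue[AnyRef]
  // Phantom references must stay strongly reachable until processed,
  // otherwise they can be collected without ever being enqueued.
  private val live = ConcurrentHashMap.newKeySet[NativeResourceRef]()

  // Stores the native address and deallocator, never the referent itself.
  final class NativeResourceRef(referent: AnyRef,
                                val nativeAddress: Long,
                                deAllocator: Long => Unit)
      extends PhantomReference[AnyRef](referent, queue) {
    def release(): Unit = {
      // remove() succeeds only once, so the deallocator runs at most once.
      if (live.remove(this)) deAllocator(nativeAddress)
    }
  }

  def register(resource: AnyRef, address: Long,
               deAllocator: Long => Unit): NativeResourceRef = {
    val ref = new NativeResourceRef(resource, address, deAllocator)
    live.add(ref)
    ref
  }

  // Non-blocking: releases every reference enqueued so far, returns the count.
  def drain(): Int = {
    var released = 0
    var ref = queue.poll()
    while (ref != null) {
      ref match {
        case r: NativeResourceRef =>
          r.release()
          released += 1
        case _ =>
      }
      ref = queue.poll()
    }
    released
  }

  // Daemon thread that blocks on remove() until a reference is enqueued.
  def startCleanupThread(): Thread = {
    val t = new Thread(new Runnable {
      def run(): Unit =
        try {
          while (true) queue.remove() match {
            case r: NativeResourceRef => r.release()
            case _ =>
          }
        } catch { case _: InterruptedException => () }
    }, "native-resource-cleanup")
    t.setDaemon(true)
    t.start()
    t
  }
}
```

Because the reference object keeps only the address and the deallocator, it does not keep the tracked object reachable. When the collector actually enqueues a reference is up to the JVM, which is the timing problem discussed next.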
The problem with this approach is that native objects are released only after Garbage Collection has determined that the object is unreachable. For Garbage Collection to run, the GC subsystem has to feel pressure on the JVM heap. In MXNet Scala, however, most MXNet objects are allocated in native memory, so GC does not run as frequently as we would like. There are 3 options for deciding when Garbage Collection runs:
1) call System.gc periodically
2) call System.gc after a threshold of Off-Heap bytes is used.
3) Let the Garbage Collector run on its own schedule, based on the JVM implementation the code runs on.
Note: Though there is no guarantee that invoking System.gc() will force Garbage Collection to run, my experiments on OpenJDK 8 show that it is effective and helps release Native Memory.
It should also be noted that calling System.gc() is expensive, and calling it repeatedly is not recommended.
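As a rough sketch of option 2, the helper below counts off-heap bytes allocated since the last request and asks for a collection once a threshold is crossed. OffHeapGcTrigger, thresholdBytes and requestGc are hypothetical names introduced for illustration. The sketch also assumes that counting bytes allocated since the last request is an acceptable proxy for off-heap usage; it does not track frees.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper; names and the counting policy are assumptions.
final class OffHeapGcTrigger(thresholdBytes: Long,
                             requestGc: () => Unit = () => System.gc()) {
  private val bytesSinceLastGc = new AtomicLong(0L)

  // Call after each native allocation; returns true if a GC was requested.
  def recordAllocation(bytes: Long): Boolean = {
    val total = bytesSinceLastGc.addAndGet(bytes)
    // compareAndSet lets only one thread reset the counter per crossing,
    // so concurrent allocations do not each request a collection.
    if (total >= thresholdBytes && bytesSinceLastGc.compareAndSet(total, 0L)) {
      requestGc() // System.gc() by default, which the JVM may ignore
      true
    } else false
  }
}
```

Making requestGc a parameter keeps the policy testable without actually triggering a collection, and gives users a place to disable the call entirely.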
Android manages native C++ memory using the PhantomReference approach. This video from Google I/O '17 explains how to use PhantomReference and the issues with Finalizers.
ResourceScope
We can create a ResourceScope class that implements the AutoCloseable interface. This class provides a static method, using, which is similar to try-with-resources and takes a block of code to run as an additional parameter. Any NativeResource created within the block scope registers with the instance of ResourceScope. Upon exit of the block scope, the using method calls close on the ResourceScope, which releases all the NativeResources.
Scala users can call the static using method to execute a block; at the end of the block, it releases the stack of objects registered in the scope.
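A minimal, self-contained sketch of this design follows. It is an illustration under assumptions rather than the proposed implementation: Disposable and FakeNDArray are hypothetical stand-ins for NativeResource and NDArray, and register and current are names introduced here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an MXNet object backed by native memory.
trait Disposable { def dispose(): Unit }

class ResourceScope extends AutoCloseable {
  private val resources = ArrayBuffer.empty[Disposable]

  def register(r: Disposable): Unit = { resources += r; () }

  // Dispose in reverse creation order, then forget the resources.
  override def close(): Unit = {
    resources.reverseIterator.foreach(_.dispose())
    resources.clear()
  }
}

object ResourceScope {
  // One stack of open scopes per thread. A scope therefore does not
  // follow an object that is handed to another thread.
  private val scopes = new ThreadLocal[List[ResourceScope]] {
    override def initialValue(): List[ResourceScope] = Nil
  }

  def current: Option[ResourceScope] = scopes.get.headOption

  // Runs body with scope as the innermost scope, then always closes it.
  def using[A](scope: ResourceScope)(body: => A): A = {
    scopes.set(scope :: scopes.get)
    try body
    finally {
      scopes.set(scopes.get.tail)
      scope.close()
    }
  }
}

// Toy resource that registers with the innermost open scope, if any.
final class FakeNDArray extends Disposable {
  @volatile var disposed = false
  ResourceScope.current.foreach(_.register(this))
  // Idempotent, so an explicit dispose() inside a scope is harmless.
  def dispose(): Unit = { disposed = true }
}
```

With these definitions, `ResourceScope.using(new ResourceScope()) { new FakeNDArray }` disposes the array when the block exits, even if the block throws. An object created outside any scope is left alone.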
One problem I can see with this approach is misuse of ResourceScope. Instead of applying ResourceScope at a granular level, the user could wrap a higher-level function and hold onto the allocated memory for much longer, possibly until an OutOfMemory exception is raised. As an example, consider passing the entire training method, which runs for hundreds of epochs, as the block to execute within a single ResourceScope, instead of using a ResourceScope for each epoch or creating separate ResourceScopes for data pre-processing, training and post-processing.
This can be alleviated by tracking WeakReferences to NativeResources and leveraging the GC to find EOLed Objects + PhantomRef approach discussed above. (Needs to be tested)
Another problem is that since this approach uses ThreadLocal to manage scopes, it is not suitable for producer-consumer kind of applications where the producer might not wait until the consumer has used the NativeResource object.
Long running MNIST example with current code
Long running MNIST example, System.gc() called every epoch
Long running MNIST example, System.gc() called every 5 seconds
Long running MNIST example, System.gc() not explicitly called.
Long running MNIST example on GPU
Running GAN Example on GPU, calling System.gc() every second.
None
Yes, it will continue to work with WarnIfNotDisposed Interface.
Test Plan
Earlier versions of MXNet-Scala used WeakReferences and Finalizers to release NativeResources. However, this caused segfaults because the MXNet backend requires all calls to go through the same thread, while the Finalizer runs on its own thread.
I tried running the Finalizer and calling dispose using a Dispatcher pattern. As I researched and learnt more about Finalizers, it became clear that this approach is not recommended.
An Item and article on the perils of using Finalizers by Joshua Bloch(Author of Effective Java) is here.
A few key points related to finalizers:
We can provide an interface similar to a Java File for each MXNet Object by implementing the AutoCloseable interface, so that users can apply the try-with-resources approach they are familiar with and treat MXNet Objects like IO resources. However, this would become tedious because users would have to declare the objects beforehand.
We can implement an object pool for native objects such as NDArrays, so that objects return to the pool when they go out of scope. This could be an extension to the proposed approach and is probably useful for use cases such as Inference, where the size of the NDArrays does not change between runs.
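The pooling idea can be illustrated with a small sketch in which plain Float arrays stand in for NDArrays. ArrayPool and its acquire, release and pooled methods are hypothetical names. Keying the pool by element count is an assumption made here because the text notes that sizes stay fixed between inference runs.

```scala
import scala.collection.mutable

// Hypothetical pool; Array[Float] stands in for a native NDArray buffer.
final class ArrayPool {
  // Free buffers grouped by element count.
  private val free = mutable.Map.empty[Int, List[Array[Float]]]

  // Reuses a pooled buffer of the requested size, or allocates a new one.
  // A reused buffer still holds its previous contents.
  def acquire(size: Int): Array[Float] = synchronized {
    free.getOrElse(size, Nil) match {
      case head :: tail =>
        free(size) = tail
        head
      case Nil =>
        new Array[Float](size)
    }
  }

  // Returns a buffer to the pool instead of freeing it.
  def release(arr: Array[Float]): Unit = synchronized {
    free(arr.length) = arr :: free.getOrElse(arr.length, Nil)
  }

  // Number of free buffers currently pooled for a given size.
  def pooled(size: Int): Int = synchronized {
    free.getOrElse(size, Nil).size
  }
}
```

A real pool would also need an upper bound on retained memory; this sketch omits one.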
1. Implement NativeResource, extend NDArray, Symbol, Executor to be compatible with NativeResource.
2. Implement ResourceScope
3. Implement GCStrategy
4. Performance Tests using different GCStrategies.
5. Add Stress Test (long running tests).
I discussed this with a few colleagues (Frank Liu, Qing, Andrew, Yizhi, Calum, ...) and they provided the feedback below.
Andrew Ayres: What happens when dispose is called within ResourceScope? - This should work and not cause any issues.
Andrew Ayres: How do we use this in Java? - TBD
Frank Liu: We will get a compiler warning in Java, since the ResourceScope declared in the try statement will not be used. - This needs to be researched and resolved.
Frank Liu: Using ResourceScope will hurt readability in deeply nested code, such as a class that creates NDArray objects. Users will be unaware that the objects are tracked in a ResourceScope and deallocated when they go out of scope, and they might assume the objects can safely be used in another thread. - I can't think of a way around this right now; I will continue to explore.
YiZhi Liu: We should keep NDCollector for a while and then deprecate it after a few releases. - Agree.
YiZhi Liu: We should suggest using ResourceScope for training and probably the PhantomRef approach for Inference.
Sina Afrooze: Using ResourceScope might be more natural for Deep Learning applications than periodically calling System.gc.
Rakesh: Use a Real life Model instead of MNIST for testing.
Calum Leslie: It might be intrusive to call System.gc() on the user's behalf – mitigated through user controllable properties.
StrongReference: An ordinary, active reference to an object. For example, in val nd = NDArray.ones(Shape(2,2)), nd stores a strong reference to the NDArray created.
WeakReference: A weak reference is a reference to an object that does not prevent the Garbage Collector from collecting it.
PhantomReference: A phantom reference is enqueued after the collector determines that its referent may otherwise be reclaimed.