Title

Link to dev List discussion


https://lists.apache.org/thread.html/3a1d3fdd1fd76617792e7b7129a4a5dba34bea8462bff6173b58426a@%3Cdev.mxnet.apache.org%3E

Feature Shepherd

YiZhi Liu

Problem

MXNet Scala uses native memory to manage NDArray, Symbol, Executor, and DataIterators through the MXNet c_api. The C APIs provide interfaces to create, access, and free these objects. MXNet Scala has corresponding wrappers and APIs that hold pointer references to the native memory.

Current JVM users (Scala, Clojure, Java, ...) of Apache MXNet have to manage MXNet objects manually using the dispose pattern. There are a few usability problems with this approach.

Goals/Usecases

User Experience

With this change, MXNet-Scala users will be able to use MXNet Objects in 3 different ways:

1) Use MXNet objects like regular Java objects and let the ResourceHandler deallocate the off-heap memory. The user needs to select the right deallocation strategy: run System.gc() periodically, run System.gc() when off-heap memory reaches a certain threshold, or let the JVM decide when to run garbage collection based on pressure on the JVM heap.

This approach may not be sufficient when memory-intensive objects such as NDArrays are allocated greedily and the garbage collection interval has not yet elapsed, or when the call to System.gc() is not honored.

2) Use ResourceScope. ResourceScope collects all MXNet object off-heap pointer references within a block scope and releases them at the end of the scope.

This follows the try-with-resources paradigm in Java 7+ and is similar to JavaCPP's PointerScope and the NDArrayCollector that YiZhi Liu has implemented, but is enhanced to handle MXNet objects in a generic way.

Since this uses ThreadLocal to manage the scopes, it is not thread-safe in applications such as producer-consumer pipelines, where the producer's scope might close before the consumer has consumed the object, leaving the object unavailable.

Scala

Scala does not support try-with-resources, so an alternative, the using method, is made available on ResourceScope. At the end of the using method, the NativeResources created within the block scope are disposed. This approach was suggested by Martin Odersky (creator of Scala) in slide 21 of his FOSDEM 2009 talk.

    // Both NDArrays are released when the block exits.
    ResourceScope.using(new ResourceScope()) {
      val r = NDArray.ones(Shape(2, 2))
      val r1 = NDArray.ones(Shape(50, 50))
    }

Java

Not tested on Java.

    try (ResourceScope rs = new ResourceScope()) {
        // Shape construction is written as in Scala; the exact Java call is untested.
        NDArray r = NDArray.ones(Shape(2, 2));
        NDArray r1 = NDArray.ones(Shape(50, 50));
    }


3) Call dispose() explicitly when done with each MXNet object. This is the current experience in MXNet, aside from NDArrayCollector.

    val nd: NDArray = NDArray.ones(Shape(2, 2))
    nd.dispose()


Open Questions

  1. How should bytesAllocated be calculated for MXNet objects such as Symbol, Executor, DataIterator, etc.?
  2. bytesAllocated for an NDArray is calculated as the product of its Shape dimensions multiplied by the size of its dtype (e.g., sizeof(Float32)).
  3. Running multiple examples for a long period of time, I did not experience any issues when objects were freed on a separate thread.
  4. Will there be a situation where a native pointer is still in use but the Scala object is not reachable?
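
The NDArray calculation in item 2 can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper class NDArrayBytes and string dtype names; neither is part of the MXNet API.

```java
// Hypothetical helper, not part of MXNet: estimates the native bytes held by an NDArray.
final class NDArrayBytes {
    // Element size in bytes for a few dtypes; the string names are illustrative.
    static int sizeOf(String dtype) {
        switch (dtype) {
            case "uint8":   return 1;
            case "float16": return 2;
            case "float32":
            case "int32":   return 4;
            case "float64": return 8;
            default: throw new IllegalArgumentException("unknown dtype: " + dtype);
        }
    }

    // bytesAllocated = (product of the shape dimensions) * sizeof(dtype).
    static long bytesAllocated(int[] shape, String dtype) {
        long elements = 1L;
        for (int dim : shape) {
            // multiplyExact throws on overflow instead of silently wrapping.
            elements = Math.multiplyExact(elements, (long) dim);
        }
        return Math.multiplyExact(elements, (long) sizeOf(dtype));
    }
}
```

For a Float32 NDArray of Shape(50, 50), this gives 50 * 50 * 4 = 10000 bytes.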

Proposed Approach

For automatic resource management, a NativeResource interface is used that maintains references to the nativeAddress and the native deallocator address of the MXNet object.

An object of type NativeResourceRef tracks a NativeResource using a PhantomReference.

MXNet objects extend NativeResource, update nativeAddress, bytesAllocated, and nativeDeAllocator during object creation, and call NativeResource.register to register a phantomRef.

When the garbage collector runs and finds that a NativeResource object is not reachable, it adds the tracking PhantomReference to the specified ReferenceQueue. A separate cleanup thread waits on the ReferenceQueue's blocking remove method and releases the native memory upon notification.
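
The tracking mechanism described above can be sketched in plain Java. This is a sketch under stated assumptions, not the MXNet implementation: the names NativeResourceTracker, NativeRef, drainPending, and startCleanupThread are hypothetical, and the deallocator is modeled as a Runnable rather than a native function pointer.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; a sketch of the mechanism only.
final class NativeResourceTracker {
    // Phantom reference that remembers how to free the native memory of its referent.
    static final class NativeRef extends PhantomReference<Object> {
        private final Runnable deallocator;

        NativeRef(Object owner, ReferenceQueue<Object> queue, Runnable deallocator) {
            super(owner, queue);
            this.deallocator = deallocator;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Strong references to the NativeRef objects so they are not collected themselves.
    private final Set<NativeRef> live = ConcurrentHashMap.newKeySet();

    // The deallocator must not capture owner, or owner never becomes unreachable.
    NativeRef register(Object owner, Runnable deallocator) {
        NativeRef ref = new NativeRef(owner, queue, deallocator);
        live.add(ref);
        return ref;
    }

    // Frees one enqueued resource; returns false if it was already handled.
    private boolean release(Reference<?> ref) {
        if (!live.remove(ref)) return false;
        ((NativeRef) ref).deallocator.run();
        return true;
    }

    // Non-blocking drain, for cleanup on a thread the caller controls.
    int drainPending() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (release(ref)) released++;
        }
        return released;
    }

    // Dedicated daemon thread that blocks on ReferenceQueue.remove().
    Thread startCleanupThread() {
        Thread t = new Thread(() -> {
            try {
                while (true) release(queue.remove());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "native-resource-cleaner");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The set of live references matters: if nothing holds the PhantomReference itself, it can be collected before it is ever enqueued and the native memory leaks. The non-blocking drainPending variant is included because the backend's threading constraints may favor releasing memory on a known thread instead of a dedicated one.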


The problem with this approach is that native objects are released only after garbage collection has determined that the object is unreachable. For garbage collection to run, the GC subsystem has to feel pressure on the JVM heap. However, in MXNet Scala most MXNet objects are allocated in native memory, so the GC does not run as frequently as we would like. There are three options for when garbage collection runs; the first two alleviate this problem, and the third leaves the schedule to the JVM.

1) Call System.gc() periodically.

2) Call System.gc() after a threshold of off-heap bytes is used.

3) Let the garbage collector run on its own schedule, based on the JVM implementation the code runs on.
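
Option 2 could look roughly like the sketch below. The class name OffHeapGcTrigger and its methods are hypothetical, and the sketch assumes that "bytes used" means bytes allocated since the last GC request, which is a simplification; tracking live bytes would also require decrementing on release.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a threshold-based strategy; names are illustrative.
final class OffHeapGcTrigger {
    private final long maxOffHeapBytes;
    private final Runnable gc; // System::gc in production, a stub in tests
    private final AtomicLong allocatedSinceGc = new AtomicLong();

    OffHeapGcTrigger(long maxOffHeapBytes, Runnable gc) {
        this.maxOffHeapBytes = maxOffHeapBytes;
        this.gc = gc;
    }

    // Call on every native allocation; returns true when a GC was requested.
    boolean onAllocate(long bytes) {
        long total = allocatedSinceGc.addAndGet(bytes);
        if (total < maxOffHeapBytes) return false;
        // Only the thread that wins the reset requests the collection.
        if (!allocatedSinceGc.compareAndSet(total, 0L)) return false;
        gc.run();
        return true;
    }
}
```

In production the trigger would be constructed as new OffHeapGcTrigger(threshold, System::gc). Option 1 needs no counter at all: a ScheduledExecutorService can call scheduleAtFixedRate(System::gc, ...) at the chosen interval.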


Note: Though there is no guarantee that invoking System.gc() will force garbage collection to run, my experiments on OpenJDK 8 show that it is effective and helps release native memory.

It should also be noted that calling System.gc() is expensive, and calling it repeatedly is not recommended.

Android manages native C++ memory using the PhantomReference approach; the Google I/O '17 video (reference 2 below) explains how to use PhantomReference and the issues with finalizers.


ResourceScope

We can create a ResourceScope class that implements the AutoCloseable interface. This class provides a static method, using, which is similar to try-with-resources and takes a block of code to run as an additional parameter. Any NativeResource created within the block scope registers with the ResourceScope instance. Upon exit from the block scope, the using method calls close on the ResourceScope, which releases all the native resources.

Scala users can call the static using method to execute a block scope; at the end of the scope it releases the stack of objects.
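
A minimal Java sketch of this mechanism follows. It is illustrative only: the class is named ResourceScopeSketch to make clear that it is not the proposed ResourceScope, native resources are modeled as plain AutoCloseable objects, and the register method is a hypothetical hook that a resource's constructor would call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the scope mechanism; not the MXNet-Scala implementation.
final class ResourceScopeSketch implements AutoCloseable {
    // Per-thread stack of open scopes; the innermost scope is at the head.
    private static final ThreadLocal<Deque<ResourceScopeSketch>> SCOPES =
        ThreadLocal.withInitial(ArrayDeque::new);

    private final List<AutoCloseable> resources = new ArrayList<>();

    ResourceScopeSketch() {
        SCOPES.get().push(this);
    }

    // Attaches a resource to the innermost open scope of the current thread, if any.
    static void register(AutoCloseable resource) {
        ResourceScopeSketch current = SCOPES.get().peek();
        if (current != null) current.resources.add(resource);
    }

    // Runs body inside scope and closes the scope afterwards, even if body throws.
    static <A> A using(ResourceScopeSketch scope, Supplier<A> body) {
        try (ResourceScopeSketch s = scope) {
            return body.get();
        }
    }

    @Override
    public void close() {
        SCOPES.get().remove(this);
        // Release in reverse creation order, like unwinding a stack.
        for (int i = resources.size() - 1; i >= 0; i--) {
            try {
                resources.get(i).close();
            } catch (Exception e) {
                // A fuller version would keep closing the remaining resources.
                throw new RuntimeException(e);
            }
        }
        resources.clear();
    }
}
```

Because the stack of scopes lives in a ThreadLocal, a resource created on one thread is released when that thread's scope closes, regardless of whether another thread still holds the object.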

One problem I can see with this approach is misuse of ResourceScope: instead of applying a ResourceScope at a granular scope, the user could wrap a higher-level function and end up holding onto the allocated memory for much longer, or until an OutOfMemory exception is raised. For example, the entire training method, which runs for hundreds of epochs, might be passed as the block to execute within a single ResourceScope, instead of using a ResourceScope for each epoch or creating separate ResourceScopes for data preprocessing, training, and post-processing.

This can be alleviated by tracking WeakReferences to NativeResources and leveraging the GC to find end-of-life objects, combined with the PhantomReference approach discussed above. (Needs to be tested.)

Another problem is that, since this approach uses ThreadLocal to manage scopes, it is not suitable for producer-consumer applications where the producer might not wait until the consumer has used the NativeResource object.



Test Results of Prototype

Long running MNIST example with current code

Long running MNIST example, System.gc() called every epoch

Long running MNIST example, System.gc() called every 5 seconds

Long running MNIST example, System.gc() not explicitly called.

Long running MNIST example on GPU

Running GAN Example on GPU, calling System.gc() every second.

Addition of New APIs

None

Backward compatibility

Yes, it will continue to work with WarnIfNotDisposed Interface.

Performance Considerations

  1. Run MNIST training and measure the average time per epoch with the current code, which uses dispose.
  2. Measure the time taken when ResourceScope is used for each epoch.
  3. Measure the time taken when System.gc() is called periodically.
  4. Measure the time taken when System.gc() is called once maxOffHeapBytes is reached.
  5. Run inference using ResourceScope and test performance.

Test Plan

  1. Run from Master branch(without any changes) and see how it performs - use it as a baseline
  2. Run Tests calling System.gc periodically
  3. Run tests on JDK7 & JDK8 environment
  4. Run tests on OSX 
  5. Run tests using different memory consuming examples(training using large Images, GAN, RNN)
  6. Run inside a container with limited memory
  7. Run tests for ResourceScope and check for Memory Stability.

Alternative Approaches

Earlier versions of MXNet-Scala used WeakReferences and finalizers to release NativeResources. However, this caused segfaults because the MXNet backend requires all calls to go through the same thread, while the finalizer runs on its own thread.

Finalizer with Dispose using Dispatcher Pattern.

I ran the finalizer and called dispose using a Dispatcher pattern. As I researched and learned more about finalizers, it became clear that this approach is not recommended.

An item on the perils of using finalizers by Joshua Bloch (author of Effective Java) is listed as reference 1 below.

A few key points related to finalizers:

AutoCloseable and using try-with-resources{}

We can provide an interface similar to a Java File for each MXNet object by implementing the AutoCloseable interface, and users can apply the try-with-resources approach they are already familiar with to treat MXNet objects like I/O resources. However, this becomes tedious because users have to declare every object beforehand.
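
The tedium is easiest to see in a small example. The sketch below uses a hypothetical NativeArray class as a stand-in for an MXNet object that owns native memory; the static counter exists only to show that every instance is closed.

```java
// Hypothetical stand-in for an MXNet object that owns native memory.
final class NativeArray implements AutoCloseable {
    static int open = 0; // number of live instances, for illustration only
    final int size;

    NativeArray(int size) {
        this.size = size;
        open++;
    }

    // Returns a new array, which the caller must also close.
    NativeArray plus(NativeArray other) {
        return new NativeArray(size);
    }

    @Override
    public void close() {
        open--;
    }
}

final class TryWithResourcesExample {
    static int run() {
        // Every resource, including each intermediate result, must be declared up front.
        try (NativeArray a = new NativeArray(4);
             NativeArray b = new NativeArray(4);
             NativeArray sum = a.plus(b)) {
            return sum.size;
        }
    }
}
```

An expression such as a.plus(b).plus(c) would leak the intermediate result unless it is also bound to its own variable in the try header, which is what makes this approach cumbersome for numeric code.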

Object Pooling

We can implement an object pool for native objects such as NDArrays; when objects go out of scope, they return to the pool. This could be an extension to the proposed approach and is probably useful for use cases such as inference, where the sizes of NDArrays do not change between runs.
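
As a rough illustration of the pooling idea, the sketch below pools buffers by element count. It is an assumption-laden stand-in: BufferPool is a hypothetical name, and a float[] takes the place of a native NDArray so that the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: pools fixed-size buffers keyed by element count.
final class BufferPool {
    private final Map<Integer, Deque<float[]>> free = new HashMap<>();
    int allocations = 0; // counts real allocations, for illustration only

    // Reuses a released buffer of the same size if one exists, else allocates.
    synchronized float[] acquire(int elements) {
        Deque<float[]> bucket = free.get(elements);
        if (bucket != null && !bucket.isEmpty()) return bucket.pop();
        allocations++;
        return new float[elements];
    }

    // Returns a buffer to the pool; its contents are left as-is (stale).
    synchronized void release(float[] buffer) {
        free.computeIfAbsent(buffer.length, k -> new ArrayDeque<>()).push(buffer);
    }
}
```

With fixed input shapes, as in repeated inference calls, every run after the first is served from the pool and no new allocation takes place.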

Milestones

1. Implement NativeResource, extend NDArray, Symbol, Executor to be compatible with NativeResource.

2. Implement ResourceScope

3. Implement GCStrategy

4. Performance Tests using different GCStrategies.

5. Add Stress Test (long running tests).

References

  1. Item 7: Avoid finalizers: http://www.informit.com/articles/article.aspx?p=1216151&seqNum=7
  2. How to Manage Native C++ Memory in Android (Google I/O '17): https://www.youtube.com/watch?v=7_caITSjk1k
  3. try-with-resources in Java7+: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html
  4. Garbage Collection(Chapter 6,7): https://www.amazon.com/Optimizing-Java-techniques-application-performance/dp/1492025798
  5. PhantomReference: https://docs.oracle.com/javase/7/docs/api/java/lang/ref/PhantomReference.html
  6. JavaCPP's PointerScope: https://github.com/bytedeco/javacpp/blob/master/src/main/java/org/bytedeco/javacpp/PointerScope.java
  7. Yizhi's NDArrayCollector: https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/NDArrayCollector.scala

Feedback received so far:

I discussed this with a few colleagues I work with (Frank Liu, Qing, Andrew, Yizhi, Calum, ...) and they provided the feedback below.

Glossary:

StrongReference: An ordinary reference that keeps an object reachable. For example, in val nd = NDArray.ones(Shape(2,2)), nd holds a strong reference to the NDArray created.

WeakReference: A reference to an object that does not prevent the garbage collector from collecting that object.

PhantomReference: A reference object that is enqueued after the collector determines that its referent may otherwise be reclaimed.