1 Introduction

This document defines enhancements to GemFire for using memory outside the Java heap. The project merges the off-heap memory feature from GemFire XD into GemFire.

2 Related Documents

3 Terminology

4 Architectural Description

The goals and general approach to using off-heap memory are described in MemScale (high-level charter/goals).

Experiments show that using off-heap memory can provide a significant performance boost by reducing the CPU time consumed by the Java garbage collector. The extra processing required to serialize/deserialize and store/fetch the data through sun.misc.Unsafe is offset by the collector having less Java heap memory to manage and therefore using fewer CPU cycles.

4.1 Configuration

4.1.1 off-heap-memory-size GemFire property

* <b>Off-Heap Memory</b>
* 
* <dl>
* <a name="off-heap-memory-size"><dt>off-heap-memory-size</dt></a>
* <dd><U>Description</U>: The total size of off-heap memory specified as
* off-heap-memory-size=<n>[g|m]. <n> is the size. [g|m] indicates
* whether the size should be interpreted as gigabytes or megabytes.
* </dd>
* <dd><U>Default</U>: <code>""</code></dd>
* <dd><U>Since</U>: 9.0</dd>
* </dl>

Examples of valid values:

  1. off-heap-memory-size=4096m
  2. off-heap-memory-size=120g
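
As an illustration of the <n>[g|m] format, the following is a minimal sketch of how such a value could be converted to bytes. The class and method names are hypothetical and are not part of GemFire. The sketch assumes that the unit suffix is required and lowercase, that g and m mean binary gigabytes and megabytes, and that the default empty string means no off-heap memory.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a GemFire class.
final class OffHeapSizeParser {
  // <n> followed by exactly one unit character (assumption: suffix is required).
  private static final Pattern FORMAT = Pattern.compile("(\\d+)([gm])");

  /** Returns the configured size in bytes, or 0 for a null or empty value. */
  static long parseToBytes(String value) {
    if (value == null || value.isEmpty()) {
      return 0L; // assumption: the default "" means no off-heap memory
    }
    Matcher m = FORMAT.matcher(value);
    if (!m.matches()) {
      throw new IllegalArgumentException("Expected <n>[g|m] but got: " + value);
    }
    long n = Long.parseLong(m.group(1));
    long unit = m.group(2).equals("g") ? 1024L * 1024 * 1024 : 1024L * 1024;
    return Math.multiplyExact(n, unit); // throws ArithmeticException on overflow
  }
}
```

With these assumptions, 4096m and 4g both resolve to the same number of bytes.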

4.1.2 lock-memory GemFire property

 

* <dl>
* <a name="lock-memory"><dt>lock-memory</dt></a>
* <dd><U>Description</U>: Set to true to lock GemFire heap and off-heap memory pages into RAM. 
* This prevents the operating system from swapping the pages out to disk, which can cause severe performance degradation.
* </dd>
* <dd><U>Default</U>: false</dd>
* <dd><U>Since</U>: 9.0</dd>
* </dl>

 

4.1.3 off-heap Region attribute
Regions with "off-heap" set to true will store all their entry values in off-heap memory. This includes values for Region Entries and Queue entries. The region will still use heap memory for everything else (for example the entry key, an entry object, the ConcurrentHashMap). 
4.1.3.1 Declarative configuration of off-heap
<!ATTLIST region-attributes
...
off-heap (false | true) #IMPLIED
>
4.1.3.2 Programmatic configuration of off-heap

 

/**
* Enables this region's usage of off-heap memory if true.
* @since 9.0
* @param offHeap boolean flag to enable off-heap memory
*/
public void setOffHeap(boolean offHeap);

/**
* Returns whether or not this region uses off-heap memory.
* @return True if a usage of off-heap memory is enabled;
* false if usage of off-heap memory is disabled (default).
* @since 9.0
*/
public boolean getOffHeap();

 

4.2 Calculation of Slab Size

Our implementation organizes the off-heap memory into slabs. Rather than expose another configuration detail to the user, we will calculate the best slab size possible.

We believe that Integer.MAX_VALUE bytes (about 2 GB) is an acceptable maximum slab size. Since a single value must fit within one slab, this also makes 2 GB the maximum size of a single stored object.

For simplicity we are making the slabs all of equal size. Therefore the off-heap memory must be divisible by the calculated slab size with negligible waste.

If the total off-heap memory is 2 GB or less, we will have one slab. If the off-heap memory is greater than 2 GB then there will be 2 or more slabs.

public static long calcSlabSize(long offHeapMemorySize) {
  // GIGABYTE (assumed to be a long constant) and parseLongWithUnits are defined elsewhere.
  final String offHeapSlabConfig = System.getProperty("gemfire.OFF_HEAP_SLAB_SIZE");
  long slabSize = 0;
  if (offHeapSlabConfig != null && !offHeapSlabConfig.equals("")) {
    // an explicit slab size was supplied through the system property
    slabSize = parseLongWithUnits(offHeapSlabConfig, 100*1024*1024, 1024*1024);
  } else { // calculate slabSize
    if (offHeapMemorySize < (2*GIGABYTE)) {
      // under 2 GB has one slab
      slabSize = offHeapMemorySize;
    } else if (offHeapMemorySize % (2*GIGABYTE) == 0) {
      // 2 GB divides cleanly, so use maximum-size slabs
      slabSize = Integer.MAX_VALUE;
    } else {
      // over 2 GB has more than one slab; round the slab count up so that
      // no slab exceeds Integer.MAX_VALUE
      final long numSlabs = (offHeapMemorySize / (2*GIGABYTE)) + 1;
      slabSize = offHeapMemorySize / numSlabs;
    }
  }
  return slabSize;
}
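
To make the slab arithmetic easier to check by hand, here is a self-contained sketch of the automatic branch only (no system-property override). The class and method names are hypothetical. It assumes that the slab count is rounded up so that no slab exceeds Integer.MAX_VALUE, and that the number of slabs is the total size divided by the slab size, with any remainder treated as negligible waste.

```java
// Hypothetical, self-contained variant of the automatic slab calculation.
final class SlabSizing {
  static final long GIGABYTE = 1024L * 1024 * 1024;

  /** Slab size in bytes for a total off-heap size in bytes. */
  static long slabSize(long total) {
    if (total <= 0) {
      throw new IllegalArgumentException("total must be positive");
    }
    if (total < 2 * GIGABYTE) {
      return total; // a single slab holds everything
    }
    if (total % (2 * GIGABYTE) == 0) {
      return Integer.MAX_VALUE; // maximum-size slabs, one byte short of 2 GB each
    }
    long numSlabs = total / (2 * GIGABYTE) + 1; // round the slab count up
    return total / numSlabs;
  }

  /** Number of whole slabs that fit in the total size. */
  static long slabCount(long total) {
    return total / slabSize(total);
  }
}
```

Under these assumptions a 3 GB configuration yields two slabs of 1.5 GB each, and a 5 GB configuration yields three slabs.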

4.3 Off-Heap Memory statistics

name                 description
defragmentations     The total number of times off-heap memory has been defragmented.
defragmentationTime  The total time spent on the defragmentation of off-heap memory.
fragmentation        The percentage of off-heap memory fragmentation. Updated every time a defragmentation is performed.
fragments            The number of fragments of free off-heap memory. Updated every time a defragmentation is done.
freeMemory           The amount of off-heap memory, in bytes, that is not being used.
largestFragment      The largest fragment of memory found by the last defragmentation of off-heap memory. Updated every time a defragmentation is done.
maxMemory            The maximum amount of off-heap memory, in bytes. This is the amount of memory allocated at startup and does not change.
objects              The number of objects stored in off-heap memory.
reads                The total number of reads of off-heap memory.
usedMemory           The amount of off-heap memory, in bytes, that is being used to store data.

4.4 M&M Support

4.4.1 JMX Changes

MemberMXBean has new attributes which expose several of the off-heap memory statistics for the member.

If defragmentationTime is non-zero then the member has had to compact its off-heap memory. This may indicate that the member is close to running out of off-heap memory. It also indicates how much time the GemFire process is spending on compaction. Open question: does this help with capacity planning or analyzing usage patterns? High variability in value sizes is likely to result in more compaction than value sizes that are more uniform.

FreeMemory + UsedMemory is the same as MaxMemory so we did not expose MaxMemory as an attribute (it would never change value anyway). Note: the MemberMXBean operation listGemFireProperties would include off-heap-memory-size which is also the same as MaxMemory.

Objects shows the number of GemFire objects stored in off-heap memory.

In theory, Fragmentation would be a valuable attribute to monitor. An excessively high Fragmentation percentage indicates that the off-heap memory is fragmented enough to make it difficult to fill or fully utilize. In that case the member might become critical (see docs about ResourceManager) and throw LowMemoryException. It is also possible that a member throws OutOfOffHeapMemoryException, which causes the cache to close in order to maintain consistency across the cluster. The latter is a last resort; each member will first attempt to use the off-heap eviction and critical thresholds on the ResourceManager to prevent data inconsistency or member loss. This may deserve further discussion (it's complicated).

RegionMXBean listRegionAttributes includes a new boolean attribute that indicates if the region is configured to use off-heap memory or regular Java heap memory.

4.4.2 Pulse Support

Changes to Pulse for GemFireXD Off-Heap will need to be ported with minor changes if any. If possible, we should expose all the off-heap attributes on MemberMXBean including the new getOffHeapMaxMemory.

Each region using off-heap should also be flagged as an off-heap region. The attribute was "enable-off-heap-memory" in GFXD, but will be "off-heap" in GemFire.

4.4.3 GFSH Command Changes

4.5 Design Limits and Constraints

Discuss minimum and maximum configuration and usage. Most of the limits are due to the max size of primitives being used, such as integer for number of slabs or long for memory address pointers.

4.5.1 Maximum Object Size

The maximum off-heap data size is 2 GB, but some of this data includes implementation-specific meta-data which lowers the maximum serialized user data (bytes) to less than 2 GB.

4.5.2 All Members Must Configure Off-Heap Memory Enabled Uniformly

We will not be allowing two members to host data for the same Region with a different value for off-heap enabled. We stepped through some scenarios and the ResourceManager and LRU settings just didn't work properly unless the members were required to be configured the same way. TODO: this constraint may change for rolling a region to off-heap

4.6 Region Entries

Equality checks are done without deserializing the value stored in off-heap memory. If the value is serialized using PDX then PdxInstance equals is used for comparison. Otherwise the raw serialized bytes are compared. This means that custom equals() implementations will not be called when checking off-heap values for equality. Value equality checks are done by Region.replace(K,V,V) and Region.remove(Object, Object).
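
The practical effect of comparing serialized bytes can be sketched as follows. This is an illustration only: it uses standard Java serialization as a stand-in for GemFire's own serialization, and the Name class and helper methods are hypothetical. Two values that a custom equals() treats as equal can still differ in their serialized form.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo; Java serialization stands in for GemFire serialization.
final class SerializedEqualityDemo {
  /** Value type whose equals() ignores case. */
  static final class Name implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text;

    Name(String text) {
      this.text = text;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Name && ((Name) o).text.equalsIgnoreCase(text);
    }

    @Override
    public int hashCode() {
      return text.toLowerCase().hashCode();
    }
  }

  static byte[] serialize(Serializable value) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Byte-wise comparison of the serialized forms; equals() is never called. */
  static boolean sameSerializedForm(Serializable a, Serializable b) {
    return Arrays.equals(serialize(a), serialize(b));
  }
}
```

Here new Name("abc").equals(new Name("ABC")) is true, but the serialized forms differ, so a byte-wise check would treat the two values as different.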

4.7 Off-Heap Object Details

Off-heap memory is going to perform best for uniform value sizes as this minimizes the impact of fragmentation. We are using a defragmentation strategy which does not move live data. So usage patterns involving mass creates followed by mass destroys (if destroying at all) will also minimize effects of fragmentation.

The worst usage patterns are those that create entries and then perform updates with many different value sizes without ever destroying or clearing. This may result in severe fragmentation that our defragmentation algorithm will struggle with.

Defragmentation involves moving freed address ranges to a free list from which future creates will pull. There are 16384 free lists, one for each value size in multiples of 8 bytes, which covers sizes up to 8 x 16384 = 131,072 bytes. Beyond this the implementation uses a ConcurrentSkipList for the free list of larger chunks. Contiguous entries in a free list can be combined. The defragmentation of free spaces involves removing them from their size-specific free lists so they can be used for any size.
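
The mapping from a requested size to a free list can be sketched as below. This is a hypothetical illustration, not the actual allocator code: it assumes that the per-size free lists cover sizes in multiples of 8 bytes up to 8 x 16384 = 131,072 bytes, that requests are rounded up to the next multiple of 8, and that list 0 holds 8-byte chunks. A result of -1 stands for "use the skip list of larger chunks".

```java
// Hypothetical sketch of choosing a free list by chunk size.
final class FreeListIndex {
  static final int MULTIPLE = 8;
  static final int FREE_LIST_COUNT = 16384;
  static final int MAX_LISTED_SIZE = MULTIPLE * FREE_LIST_COUNT; // 131,072 bytes

  /** Returns the free list index for a size in bytes, or -1 for larger chunks. */
  static int indexFor(int size) {
    if (size <= 0) {
      throw new IllegalArgumentException("size must be positive");
    }
    // Round up to the next multiple of 8; long arithmetic avoids int overflow.
    long rounded = ((long) size + MULTIPLE - 1) / MULTIPLE * MULTIPLE;
    if (rounded > MAX_LISTED_SIZE) {
      return -1; // too large for the per-size lists
    }
    return (int) (rounded / MULTIPLE) - 1;
  }
}
```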

The number of free lists for values smaller than what's ultimately stored in the ConcurrentSkipList can be changed by setting the System property gemfire.OFF_HEAP_FREE_LIST_COUNT. We do not intend to expose this to the customer and will only use this to enable testing and profiling for determining the best number of free lists.

The on-heap foot-print is a long holding the absolute memory address of the off-heap value. The first 8 bytes of the off-heap value contain meta-data about the value, including its size.

Some values stored in an off-heap region will not use any off-heap memory. If the serialized size of the value is smaller than 8 bytes then it will not consume any off-heap memory.

Also if the value is an instance of java.lang.Long whose most significant byte is just an extension of the sign bit (that is the most significant 9 bits are all 1 or 0) then it will not consume any off-heap memory. These values are stored in the primitive long field that is normally used to reference off-heap memory. Since this primitive field is part of the heap overhead of an off-heap region that value will not consume any off-heap memory. Keep this in mind when estimating how much off-heap memory is needed and when viewing off-heap statistics.
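
The condition on java.lang.Long values can be expressed as a small check. The class and method names are hypothetical; the sketch only tests whether the most significant 9 bits are all 1 or all 0 and says nothing about how the value is actually encoded in the primitive field.

```java
// Hypothetical check for Long values that need no off-heap memory.
final class InlineLongCheck {
  /** True if the top byte of value is only an extension of the sign bit. */
  static boolean fitsInline(long value) {
    // Drop the top 8 bits, then sign-extend from bit 55. The result equals
    // the input only when bits 63..55 were all identical.
    return ((value << 8) >> 8) == value;
  }
}
```

For example, 42L and -1L qualify, while Long.MAX_VALUE and 1L << 55 do not.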

TODO: determine if we will eventually move to off-heap keys because this will impact how we compare two keys for off-heap values.

4.8 PDX


The value is currently copied from off-heap memory to a heap byte array when a PDX instance is created. TODO: investigate the possibility of changing PDX so that it just uses a reference to the off-heap bytes.

5 Feature Interactions

5.1 Callback EntryEvents

Callback Events (EntryEvents) are not fully usable beyond the scope of the callback on an off-heap region. Attempting to invoke methods which acquire values may throw an exception indicating that the value is no longer available if called after the callback returns. The exception is: IllegalStateException("Attempt to access off heap value after the EntryEvent was released.").

If the user needs to keep a reference to the EntryEvent for use after the CacheListener callback returns, they must instead create an on-heap copy of the value while still inside the callback, so that the value remains available to them.

For example, invoking com.gemstone.gemfire.cache.EntryEvent#getNewValue() is only safe while the thread is executing within the com.gemstone.gemfire.cache.CacheListener#afterUpdate(EntryEvent) method or other callback method. The EntryEvent methods getOldValue(), getSerializedOldValue(), and getSerializedNewValue() are similarly affected.
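
The copy-inside-the-callback pattern can be sketched with stand-in types. StubEvent and CopyingListener below are hypothetical and only model the behavior described above; they are not the GemFire EntryEvent or CacheListener classes. Releasing the stub zeroes its buffer to mimic off-heap memory being reused.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of an event whose value is released after the callback.
final class EventCopyDemo {
  static final class StubEvent {
    private byte[] value;

    StubEvent(byte[] value) {
      this.value = value;
    }

    byte[] getNewValue() {
      if (value == null) {
        throw new IllegalStateException(
            "Attempt to access off heap value after the EntryEvent was released.");
      }
      return value;
    }

    /** Called once the callback has returned; the memory may then be reused. */
    void release() {
      Arrays.fill(value, (byte) 0);
      value = null;
    }
  }

  /** Keeps a private copy of the value rather than a reference to the event. */
  static final class CopyingListener {
    final AtomicReference<byte[]> lastValue = new AtomicReference<>();

    void afterUpdate(StubEvent event) {
      // Copy while the callback is still running and the value is valid.
      lastValue.set(event.getNewValue().clone());
    }
  }
}
```

After release(), the listener's copy is still intact, while calling getNewValue() on the saved event throws IllegalStateException.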

5.2 ResourceManager

Existing ResourceManager thresholds would still apply because off-heap values have a heap foot-print which could trigger the existing thresholds.

We will also introduce new ResourceManager support for off-heap memory thresholds (eviction and critical) which will trigger thresholds specific to off-heap memory. The eviction threshold will support all eviction actions. The critical threshold will result in LowMemoryException being thrown when attempting write operations on an off-heap region.

We need two sets of thresholds because a GemFire node may have a mix of regions that use off-heap entries and regions that use heap entries.

The ResourceAdvisor, profiles and messages will require modification to include off-heap memory in addition to Java heap memory. Low memory messages will be sent separately for heap and off-heap memory status changes. Regions using off-heap will then throw LowMemoryException when either heap or off-heap is critical. Regions using heap will throw LowMemoryException only when heap is critical.
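
The resulting rule for when a region rejects writes can be written as a single predicate. This is a sketch with hypothetical names, not ResourceManager code.

```java
// Hypothetical summary of the heap / off-heap critical rule.
final class LowMemoryRule {
  /**
   * True if a write to the region should fail with LowMemoryException.
   * Heap regions react only to heap; off-heap regions react to either.
   */
  static boolean rejectsWrites(boolean regionIsOffHeap,
                               boolean heapCritical,
                               boolean offHeapCritical) {
    return heapCritical || (regionIsOffHeap && offHeapCritical);
  }
}
```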

5.3 Compression

Compression can be used with an off-heap region.

5.4 Delta Updates

Every time a delta is applied it will need to: deserialize the off-heap value, apply the delta, and re-serialize the value to off-heap. So off-heap with delta has to do a lot more work than heap with delta (since heap keeps the object form in memory letting it apply the delta without doing a deserialize and serialize).

5.5 Querying/OQL

Use with querying/OQL will probably cause frequent bulk copying to the heap.

5.6 Indexes

Off-heap regions with indexes are supported. In the first implementation the index data for an off-heap region will be kept on the heap, so heap usage will be lower than for a heap region but can still be high depending on the type and number of indexes.

5.7 OperationContext

The OperationContext passed to security callbacks may hold references to off-heap values. If it does, those values will no longer be available after the callback completes. If an attempt is made to get a value from the OperationContext after the callback returns, it will throw this exception: IllegalStateException("Attempt to access off-heap value after the OperationContext callback returned.").
 
Note that not all operations done on an off-heap region will keep the values in the OperationContext off-heap. Some of them will copy the value onto the heap before passing it to the OperationContext.

5.8 Result of Region.put, Region.destroy, Region.remove, and Region.Entry.setValue

The result of each of these operations is sometimes null even if an old value existed. This gives better performance for the common case of performing the operation without needing the old value. All of these operations will now also return null if the region is off-heap and the old value was stored in off-heap memory and not compressed.


5.9 PartitionAttributes localMaxMemory

The default value of localMaxMemory for an off-heap Partitioned Region will be 100% of off-heap-memory-size instead of 90% of the heap memory size.

6 Performance Requirements

General requirements and purpose for the off-heap memory feature:

6.2 Scalability Goals

7 Testing

8 Documentation / Customer Recommendations

In general we recommend against combining the use of off-heap memory with the following due to performance and memory overhead implications:

The following conditions result in significant fragmentation and would make use of off-heap memory questionable:

We do recommend using off-heap memory for all or any of the following:

8.1 Sizing Recommendations

TODO: We need to determine the heap foot-print of off-heap values and then calculate some meaningful guidelines for sizing GemFire servers, a reasonable minimum overall off-heap-memory-size, minimum entry value size (for example: 3 byte value sizes would not be a good match for off-heap), etc.

8.1.1 Sizing Details

type                     32-bit JVM  64-bit JVM (< 32 GB Java heap)  64-bit JVM (> 32 GB Java heap)
ordinary object pointer  4 bytes     4 bytes (compressed OOPS)       8 bytes
object header            8 bytes     12 bytes                        16 bytes
primitive long           8 bytes     8 bytes                         8 bytes
object Long              16 bytes    24 bytes                        24 bytes

The 64-bit JVM with compressed OOPS (enabled if Java heap is < 32 GB) performs better than 64-bit JVM > 32 GB Java heap. The 32-bit JVM is limited to 2-4 GB Java heap, so the sweet spot for a customer is to use the 64-bit JVM with < 32 GB Java heap and utilize our MemScale off-heap memory to scale as large as they need.

Based on the chart above we can determine approximate overhead and foot-print of user data and GemFire.