Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note: As of this point, the memory management is used only in the batch operators. The streaming operators follow a different concept.

 

Flink's Managed Memory

Conceptually, Flink splits the heap into three regions:

...

  • The size of the buffers can be adjusted via "taskmanager.network.bufferSizeInBytes", but 32K seems to be a good size for most setups.
  • There are ideas about how to unify the NetworkBuffer Pool and the Memory Manager region.
  • There are ideas to add a mode that makes the allocation of the memory buffers by the MemoryManager lazy (allocated when needed). This decreases the startup time of the TaskManager, but will cause more garbage collections later when the buffers are actually allocated.

Memory Segments

Flink represents all its memory as a collection of Memory Segments. The segment represents a region of memory (by default 32 KiBytes) and
provides methods to access the data at offsets (get and put longs, int, bytes, copy between segments and arrays, ...).
You can think of it as a version of the java.nio.ByteBuffer that is specialized for Flink (see below why we are not using the java.nio.ByteBuffer). 

...

  • It uses the sun.misc.Unsafe methods on its byte arrays, thus the fetching of types like long is much cheaper (no bit shifting necessary).
  • It has only absolute get/put methods and has those for all types, which makes it thread safe. The byte buffer is missing
    the absolute get/put for byte arrays and other ByteBuffers, forcing you to resort to relative methods, which require locking or giving up thread safety.
  • Since we have only one final implementation class of the MemorySegment, the JIT can perform better work in
    de-virtualizing and inlining the methods than for the ByteBuffer, which exists in at least 5 different implementations.

 

...

Impact on Garbage Collection

...