Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Corrected a link.

...

Garbage Collection (GC) frees objects that are no longer referenced by the Java application. There are several Java GC algorithms including Throughput (PS) and Concurrent Mark Sweep (CMS). The PS collector (the default) always and the CMS collector sometimes does what is known as a stop-the-world GC where it stops all application threads to clean up dead objects. This results in a GC pause. Long GC pauses have the potential to cause many issues with Geode JVMs including client timeouts, members being kicked out of the distributed system, etc. It is always recommended to use the CMS collector with Geode processes.

Determination

There are several ways to see GC pauses. One is to enable GC debugging. Startup parameters like below can be added to the JVM to enable GC debugging. For additional details, see the Java HotSpot VM Options guide or the Troubleshooting Guide for HotSpot VM guide.

...

Code Block
2014-09-29T00:37:36.047-0700: 184053.420: [Full GC2014-09-29T00:37:36.048-0700: 184053.421: [CMS: 38824730K->18654967K(98566144K), 21.7356830 secs] 39078944K->18654967K(100453632K), [CMS Perm : 45938K->45360K(46208K)], 21.7391550 secs] [Times: user=20.94 sys=0.78, real=21.73 secs]
 

vsd

Another way to see GC pauses is to use vsd to display the VMGCStats collections andcollectionTime values as well as the StatSampler delayDuration and jvmPausesvalues contained in a given Geode statistics archive.

VMGCStats

VMMemoryPoolStats

Logging the delta between the bottom left and bottom right of the line where it touches 0 shows a 23 second GC pause.

Code Block
Delta on {ConcurrentMarkSweep 19977} collectionTime
  point1: 09/29 00:37:36, 0
  point2: 09/29 00:37:59, 0
  time delta =       23 seconds.
  value delta =      0 Milliseconds / second.
  value/time delta = 0.0 Milliseconds / second per second.
  Of the 3 data points in this range:
    the min is:     0.0
    the max is:     987.954545455
    the average is: 329.318181818
    the stddev is:  570.395822765

StatSampler

The chart below shows the StatSampler delayDuration values. This statistic is set when the statistic sampler thread wakes up to sample statistics. It should be a fairly straight line at the statistic sampling interval. Spikes indicate that the thread didn't wake up at the normal time. This can be due to a number of things including high CPU, low memory and GC pauses. In this case, it spikes to >20 seconds. 

...

Code Block
[warning 2014/09/29 00:37:57.792 PDT server <Thread-10 StatSampler> tid=0xb8] Statistics sampling thread detected a wakeup delay of 20,995 ms, indicating a possible resource issue. Check the GC, memory, and CPU statistics.
 

Action

GC tuning can be difficult. The most important thing is to berify adequate heap headroom using the threshold JVM argument. Anywhere between 35% and 50% headroom is recommended to prevent heap fragmentation.

...

  • Make sure the NewSize is correct
  • Make sure PermGen is correct

See Sizing a Geode Cluster for additional details.