
Description

Garbage Collection (GC) frees objects that are no longer referenced by the Java application. There are several Java GC algorithms, including Throughput (PS) and Concurrent Mark Sweep (CMS). The PS collector (the default) always performs what is known as a stop-the-world GC, in which it stops all application threads to clean up dead objects; the CMS collector sometimes does. The result is a GC pause. Long GC pauses can cause many issues with Geode JVMs, including client timeouts and members being kicked out of the distributed system. Using the CMS collector is always recommended for Geode processes.

Determination

There are several ways to see GC pauses. One is to enable GC debugging by adding startup parameters such as the following to the JVM. For additional details, see the Java HotSpot VM Options guide or the Troubleshooting Guide for HotSpot VM.

  • -verbose:gc
  • -Xloggc:gc.log
  • -XX:+PrintGCDetails
  • -XX:+PrintGCTimeStamps
  • -XX:+PrintGCApplicationStoppedTime

The GC debugging VM arguments produce output that shows GC activity. For example, the -XX:+PrintGCApplicationStoppedTime VM argument shows application pauses. The output below shows a 7 second application pause.
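To find such pauses in a large log without reading it by eye, a small script can scan for the stopped-time lines. The following is a minimal sketch, assuming the Java 8 and earlier log wording "Total time for which application threads were stopped: N seconds". The function name, the 1 second default threshold, and the sample lines are made up for illustration and are not output from a real run.

```python
import re

# Line written by HotSpot (Java 8 and earlier) when
# -XX:+PrintGCApplicationStoppedTime is enabled.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def find_long_pauses(lines, threshold_seconds=1.0):
    """Return (line_number, seconds) for each stop at or above the threshold."""
    pauses = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                pauses.append((number, seconds))
    return pauses


# Made-up sample lines for illustration only.
sample = [
    "1.234: Total time for which application threads were stopped: 0.0002150 seconds",
    "9.876: Total time for which application threads were stopped: 7.0123456 seconds",
]
print(find_long_pauses(sample))
```

For a real log, pass the open file instead of the sample list, for example `find_long_pauses(open("gc.log"))`.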

The -XX:+PrintGCDetails VM argument also shows pauses, as shown below. The Full GC below paused the application for 21.73 seconds.
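Full GC pause times can be pulled out of a -XX:+PrintGCDetails log in a similar way. This sketch assumes the Java 8 and earlier format, in which each collection ends with a "[Times: user=... sys=..., real=N secs]" suffix and a full collection is marked with the text "Full GC", all on one line. The function name and the abbreviated sample lines are made up for illustration.

```python
import re

# "real=<seconds> secs" is the wall-clock part of the [Times: ...] suffix.
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def full_gc_pauses(lines):
    """Return the wall-clock seconds of every Full GC found in the lines."""
    pauses = []
    for line in lines:
        if "Full GC" not in line:
            continue  # skip young-generation collections and other output
        match = REAL_RE.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses


# Made-up, abbreviated sample lines for illustration only.
sample = [
    "[GC [PSYoungGen: ...] ..., 0.0200000 secs] "
    "[Times: user=0.05 sys=0.00, real=0.02 secs]",
    "[Full GC [PSYoungGen: ...] [ParOldGen: ...] ..., 21.7300000 secs] "
    "[Times: user=21.50 sys=0.10, real=21.73 secs]",
]
print(full_gc_pauses(sample))
```

Log entries that the JVM splits across several lines would need to be joined before being passed to this function.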

 

vsd

Another way to see GC pauses is to use vsd to display the VMGCStats collections and collectionTime values, as well as the StatSampler delayDuration and jvmPauses values, contained in a given Geode statistics archive.

VMGCStats

VMMemoryPoolStats

In the chart, logging the delta between the bottom-left and bottom-right points of the line, where it touches 0, shows a 23 second GC pause.

StatSampler

The chart below shows the StatSampler delayDuration values. This statistic is set when the statistic sampler thread wakes up to sample statistics. It should be a fairly straight line at the statistic sampling interval. Spikes indicate that the thread didn't wake up at the normal time. This can be due to a number of things including high CPU, low memory and GC pauses. In this case, it spikes to >20 seconds. 

[Chart: StatSampler delayDuration]

The chart below shows the StatSampler jvmPauses values. This statistic shows the delayDuration in a different way: any time the delayDuration is greater than 3 seconds, this statistic is incremented.

[Chart: StatSampler jvmPauses]
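The relationship between the two StatSampler statistics can be sketched in a few lines. This is an illustration of the logic described above, not Geode's implementation: it assumes delayDuration is the elapsed time between consecutive sampler wakeups and that jvmPauses counts every delay greater than 3 seconds. The function names, the 1 second sampling interval, and the wakeup times are hypothetical.

```python
def delay_durations(wakeup_times_ms):
    """Elapsed milliseconds between consecutive sampler wakeups."""
    return [
        later - earlier
        for earlier, later in zip(wakeup_times_ms, wakeup_times_ms[1:])
    ]


def count_jvm_pauses(wakeup_times_ms, pause_threshold_ms=3000):
    """Count the delays that exceed the pause threshold."""
    return sum(
        1 for delay in delay_durations(wakeup_times_ms)
        if delay > pause_threshold_ms
    )


# Hypothetical wakeups: 1 second sampling interval with one 23 second stall.
wakeups = [0, 1000, 2000, 25000, 26000]
print(delay_durations(wakeups))   # [1000, 1000, 23000, 1000]
print(count_jvm_pauses(wakeups))  # 1
```

A healthy JVM produces delays that stay close to the sampling interval, so the count stays at 0.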

Each jvmPauses increment is also accompanied by a warning message in the Geode log like:

 

Action

GC tuning can be difficult. The most important thing is to verify adequate heap headroom using the threshold JVM argument. Headroom of between 35% and 50% is recommended to prevent heap fragmentation.
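The headroom arithmetic can be checked with a short calculation. This sketch assumes the threshold is an occupancy percentage at which collection starts, such as the CMS option -XX:CMSInitiatingOccupancyFraction (which applies to the old generation), so that headroom is simply what remains above it. The function names and example values are hypothetical.

```python
def headroom_percent(occupancy_threshold_percent):
    """Percentage of the space left free above the occupancy threshold."""
    return 100 - occupancy_threshold_percent


def headroom_bytes(space_bytes, occupancy_threshold_percent):
    """Headroom expressed in bytes of the space the threshold applies to."""
    return space_bytes * headroom_percent(occupancy_threshold_percent) // 100


def in_recommended_range(occupancy_threshold_percent, low=35, high=50):
    """True when headroom falls within the recommended 35% to 50% band."""
    return low <= headroom_percent(occupancy_threshold_percent) <= high


# Assumed example thresholds, not recommendations for a specific cluster.
for threshold in (60, 80):
    print(threshold, headroom_percent(threshold), in_recommended_range(threshold))
```

With these assumptions, a threshold of 60 leaves 40% headroom and falls inside the band, while a threshold of 80 leaves only 20% and does not.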

Some other areas to check are:

  • Make sure NewSize (the young generation size) is set correctly
  • Make sure PermGen is sized correctly (Java 7 and earlier; Java 8 replaced PermGen with Metaspace)

See Sizing a Geode Cluster for additional details.
