Garbage Collection (GC) frees objects that are no longer referenced by the Java application. There are several Java GC algorithms including Throughput (PS) and Concurrent Mark Sweep (CMS). The PS collector (the default) always and the CMS collector sometimes does what is known as a stop-the-world GC where it stops all application threads to clean up dead objects. This results in a GC pause. Long GC pauses have the potential to cause many issues with Geode JVMs including client timeouts, members being kicked out of the distributed system, etc. It is always recommended to use the CMS collector with Geode processes.
There are several ways to see GC pauses. One is to enable GC debugging. Startup parameters like below can be added to the JVM to enable GC debugging. For additional details, see the Java HotSpot VM Options guide or the Troubleshooting Guide for HotSpot VM guide.
The output from the GC debugging VM arguments is to show GC activity. For example, the -XX:+PrintGCApplicationStoppedTime VM argument shows application pauses. The output below shows a 7 second application pause.
In addition, the -XX:+PrintGCDetails VM argument also shows pauses as shown below. The Full GC below paused the application for 21.73 seconds.
Another way to see GC pauses is to use
vsd to display the VMGCStats collections andcollectionTime values as well as the StatSampler delayDuration and jvmPausesvalues contained in a given Geode statistics archive.
Logging the delta between the bottom left and bottom right of the line where it touches 0 shows a 23 second GC pause.
The chart below shows the StatSampler delayDuration values. This statistic is set when the statistic sampler thread wakes up to sample statistics. It should be a fairly straight line at the statistic sampling interval. Spikes indicate that the thread didn't wake up at the normal time. This can be due to a number of things including high CPU, low memory and GC pauses. In this case, it spikes to >20 seconds.
The chart below shows the StatSampler jvmPauses values. This statistic shows thedelayDuration in a different way. Any time the dalayDuration is >3 seconds, this statistic is incremented.
It is also accompanied by a warning message in the Geode log like:
GC tuning can be difficult. The most important thing is to berify adequate heap headroom using the threshold JVM argument. Anywhere between 35% and 50% headroom is recommended to prevent heap fragmentation.
Some other areas to check are:
- Make sure the NewSize is correct
- Make sure PermGen is correct
See Sizing a Geode Cluster for additional details.