Since the G1 (garbage-first) garbage collector was released, there have been expectations that it would finally perform better for larger heap sizes (>16GB). Unfortunately, those expectations have not been met. While the G1 collector is meant to eliminate large GC pauses, its sporadic and unpredictable behavior on larger heaps renders it generally unusable for any system sensitive to performance SLAs.
At GridGain, having worked on a distributed caching (data grid) product for many years, we constantly benchmark various garbage collectors to find the optimal configuration for larger heap sizes. From these tests we have concluded that, unless you are utilizing some off-heap technology (e.g. GridGain OffHeap), no garbage collector provided with the JDK will deliver any kind of stable GC performance with heap sizes larger than 16GB. For example, on 50GB heaps we often encounter GC pauses of up to 5 minutes, with average pauses of 2 to 4 seconds.
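To make the off-heap idea concrete: data serialized into memory allocated outside of the Java heap (for example via direct ByteBuffers) is not scanned by the garbage collector, so growing the data set does not grow GC work. The following is only a minimal illustrative sketch of that idea in plain Java, not GridGain's actual OffHeap implementation; the class and method names are made up for the example.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative sketch only: values live in direct (off-heap) buffers, so the GC never scans their contents. */
public class OffHeapValueStore {
    // Only small on-heap handles (keys and buffer references) remain visible to the GC.
    private final ConcurrentMap<Long, ByteBuffer> index = new ConcurrentHashMap<>();

    public void put(long key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length); // allocated outside the Java heap
        buf.put(bytes);
        buf.flip();
        index.put(key, buf);
    }

    public String get(long key) {
        ByteBuffer buf = index.get(key);
        if (buf == null)
            return null;
        ByteBuffer copy = buf.duplicate(); // independent position, so concurrent readers don't interfere
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}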
Here is a comparison from a simple cache 'put' benchmark on 50GB of memory. The blue graph was produced with the G1 garbage collector set to a 200ms average GC pause target, and the green one with GridGain OffHeap Memory technology. In each test we store over 120,000,000 objects in cache from multiple threads under load, on a cluster of 2 servers:
As you see, using GridGain OffHeap memory renders a fairly smooth and predictable green graph.
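For readers who want to reproduce the general shape of this test, a stripped-down multi-threaded 'put' load loop might look like the sketch below. This is not the actual benchmark code (the real test used the Yardstick framework against a 2-node GridGain cluster); the in-process map, thread count, payload size, and key range are placeholders standing in for the distributed cache.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;

/** Simplified stand-in for the cache 'put' benchmark: several threads hammering put() with random keys. */
public class PutBenchmark {
    private static final int THREADS = 16;                 // placeholder thread count
    private static final long KEYS = 120_000_000L;         // roughly the object count from the test
    private static final long DURATION_MS = 20 * 60_000L;  // 20 minute run, as in the article

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the distributed cache; the real test used a GridGain cache on 2 servers.
        ConcurrentMap<Long, byte[]> cache = new ConcurrentHashMap<>();

        Runnable loader = () -> {
            long end = System.currentTimeMillis() + DURATION_MS;
            while (System.currentTimeMillis() < end) {
                long key = ThreadLocalRandom.current().nextLong(KEYS);
                cache.put(key, new byte[128]); // small payload per entry
            }
        };

        Thread[] threads = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            threads[i] = new Thread(loader, "loader-" + i);
            threads[i].start();
        }
        for (Thread t : threads)
            t.join();
    }
}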
Here are some of the GC printouts we received from the G1 garbage collector:
474.404: [GC pause (young) 13278M->9596M(20480M), 0.7638660 secs]
481.850: [GC pause (young) (initial-mark) 13356M->9674M(20480M), 0.7320680 secs]
482.583: [GC concurrent-root-region-scan-start]
482.784: [GC concurrent-root-region-scan-end, 0.2017740]
482.784: [GC concurrent-mark-start]
489.055: [GC pause (young) 13442M->9752M(20480M), 0.7715580 secs]
495.648: [GC concurrent-mark-end, 12.8631950 sec]
495.675: [GC remark, 0.0303560 secs]
495.715: [GC cleanup 13232M->13232M(20480M), 0.2076440 secs]
496.305: [GC pause (young) 13520M->9830M(20480M), 0.7453250 secs]
505.025: [GC pause (mixed) 13598M->9363M(20480M), 2.7538670 secs]
516.698: [GC pause (mixed) 13131M->9241M(26624M), 2.9020230 secs]
...
1160.396: [GC pause (young) 24923M->15508M(51200M), 3.2503740 secs]
1177.879: [GC pause (young) 25052M->15637M(51200M), 3.4448020 secs]
1202.492: [GC pause (young) 25189M->15765M(51200M), 3.1212860 secs]
1220.057: [GC pause (young) 25325M->15891M(51200M), 4.0836240 secs]
1245.212: [GC pause (young) 25459M->16016M(51200M), 3.6344050 secs]
1262.500: [GC pause (young) 25584M->16141M(51200M), 4.0109490 secs]
The G1 garbage collector had average GC pauses between 2 and 4 seconds, with major mixed or concurrent sweeps taking 6 to 12 seconds in the middle of the run. From the graphs you can easily see that the latencies and throughput (operations/second) for the G1 garbage collector are all over the place and frequently drop to near zero, rendering the system completely unresponsive multiple times during the 20 minute test run.
The JVM parameters we ran the G1 garbage collector with were:
-Xms50g -Xmx50g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
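The printouts above are standard HotSpot GC log output. The exact logging flags used in this run are not stated here, but a typical JDK 7/8 combination that produces timestamps and pause summaries in this format would be along the lines of:
-verbose:gc -XX:+PrintGCTimeStamps -Xloggc:gc.log
Adding -XX:+PrintGCDetails on top of these would additionally show per-phase and per-region breakdowns for each pause.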
The benchmark was run on 2 Dell R610 blades, each with 8 cores and 96GB of RAM. The benchmarking framework used, which also generated the graphs, was the open source Yardstick framework, available on GitHub.