1. Since the G1 (garbage-first) garbage collector was released, there have been expectations that it would finally perform better on larger heap sizes (>16GB). Unfortunately, those expectations have not been met. While the G1 collector is designed to eliminate large GC pauses, its sporadic and unpredictable behavior on larger heaps renders it generally unusable for any system sensitive to performance SLAs.

    At GridGain, having worked on a distributed caching (data grid) product for many years, we constantly benchmark various garbage collectors to find the optimal configuration for larger heap sizes. From numerous tests we have concluded that, unless you are utilizing some off-heap technology (e.g. GridGain OffHeap), no garbage collector provided with the JDK will render any kind of stable GC performance with heap sizes larger than 16GB. For example, on 50GB heaps we often encounter GC pauses of up to 5 minutes, with average pauses of 2 to 4 seconds.

    Here is a good comparison of a simple cache 'put' benchmark on 50GB of memory. The blue graph was produced with the G1 garbage collector set to a 200ms average GC pause target, and the green one with GridGain OffHeap Memory technology. In each test we store over 120,000,000 objects in cache from multiple threads, under load, on a cluster of 2 servers:
    As you can see, using GridGain OffHeap memory renders a fairly smooth and predictable green graph.

    Here are some of the GC printouts we received from G1 garbage collector:
    474.404: [GC pause (young) 13278M->9596M(20480M), 0.7638660 secs] 
    481.850: [GC pause (young) (initial-mark) 13356M->9674M(20480M), 0.7320680 secs]
    482.583: [GC concurrent-root-region-scan-start]
    482.784: [GC concurrent-root-region-scan-end, 0.2017740]
    482.784: [GC concurrent-mark-start]
    489.055: [GC pause (young) 13442M->9752M(20480M), 0.7715580 secs]
    495.648: [GC concurrent-mark-end, 12.8631950 sec]
    495.675: [GC remark, 0.0303560 secs]
    495.715: [GC cleanup 13232M->13232M(20480M), 0.2076440 secs]
    496.305: [GC pause (young) 13520M->9830M(20480M), 0.7453250 secs]
    505.025: [GC pause (mixed) 13598M->9363M(20480M), 2.7538670 secs]
    516.698: [GC pause (mixed) 13131M->9241M(26624M), 2.9020230 secs]
    ...
    1160.396: [GC pause (young) 24923M->15508M(51200M), 3.2503740 secs]
    1177.879: [GC pause (young) 25052M->15637M(51200M), 3.4448020 secs]
    1202.492: [GC pause (young) 25189M->15765M(51200M), 3.1212860 secs]
    1220.057: [GC pause (young) 25325M->15891M(51200M), 4.0836240 secs]
    1245.212: [GC pause (young) 25459M->16016M(51200M), 3.6344050 secs]
    1262.500: [GC pause (young) 25584M->16141M(51200M), 4.0109490 secs]
    The JVM Parameters we ran G1 garbage collector with were:
    -Xms50g -Xmx50g -XX:+UseG1GC -XX:MaxGCPauseMillis=200
    The G1 garbage collector had average GC pauses between 2 and 4 seconds, with major mixed or concurrent sweeps taking 6 to 12 seconds in the middle of the run. From the graphs you can easily see that latencies and throughput (operations/second) for the G1 collector are all over the place, often dropping to near zero and rendering the system completely unresponsive multiple times during the 20-minute test run.
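    The pause figures quoted above can be extracted mechanically from log lines like the ones shown. Here is a minimal sketch of such a helper; `GcPauseStats` and its regular expression are hypothetical names introduced for illustration, not part of GridGain or the JDK, and the pattern only covers the G1 log format shown above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts stop-the-world pause durations from G1 log
// lines like those above and averages them.
class GcPauseStats {
    private static final Pattern PAUSE =
        Pattern.compile("\\[GC (?:pause|remark|cleanup).*?([0-9]+\\.[0-9]+) secs?\\]");

    static double averagePauseSecs(List<String> logLines) {
        double total = 0;
        int count = 0;
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
                count++;
            }
        }
        return count == 0 ? 0 : total / count;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("1220.057: [GC pause (young) 25325M->15891M(51200M), 4.0836240 secs]");
        lines.add("505.025: [GC pause (mixed) 13598M->9363M(20480M), 2.7538670 secs]");
        System.out.printf("avg pause: %.3f s%n", averagePauseSecs(lines));
    }
}
```

    Running a script like this over a full log is how we arrive at the 2 to 4 second averages quoted above.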

    The benchmark was run on 2 Dell R610 blades, each with 8 cores and 96GB of RAM. The benchmarking framework used, which also generated the graphs, was the open source Yardstick framework, available on GitHub.


  2. If you prefer a video demo with coding examples, skip to the screencast at the bottom of this post.

    Distributed In-Memory Caching generally allows you to replicate or partition your data in memory across your cluster. Memory provides much faster access to the data, and by utilizing multiple cluster nodes the performance and scalability of the application increase significantly.
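    The partitioned case can be illustrated with a toy sketch: each key is hashed onto one of N nodes, with a backup copy on a neighbor for fault tolerance. `ToyPartitioner` is a made-up name for illustration only and is deliberately simpler than any real data grid's affinity function:

```java
// Toy illustration (not a real data grid's affinity function): assign each
// key to one of N cluster nodes by hashing, plus a backup on the next node.
class ToyPartitioner {
    static int primaryNode(Object key, int nodeCount) {
        // Map the hash onto [0, nodeCount); floorMod handles negative hashes.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    static int backupNode(Object key, int nodeCount) {
        // Keep one replica on the "next" node so a single failure loses no data.
        return (primaryNode(key, nodeCount) + 1) % nodeCount;
    }

    public static void main(String[] args) {
        int nodes = 4;
        for (int key = 0; key < 8; key++)
            System.out.printf("key %d -> primary %d, backup %d%n",
                key, primaryNode(key, nodes), backupNode(key, nodes));
    }
}
```

    Real products use more elaborate schemes (consistent hashing, rendezvous hashing) so that adding or removing a node does not reshuffle every key, but the core idea is the same.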


    The majority of products that do distributed caching call themselves In-Memory Data Grids. On top of simply providing hash-table-like access to your data, a data grid product should provide some combination of the following features:
    • Clustering
    • Distributed Messaging
    • Distributed Event Notifications
    • Distributed ACID Transactions
    • Distributed Locks
    • Distributed Data Queries, possibly using SQL
    • Distributed Data Structures, like Maps, Queues, Sets, etc.
    • Clustered Web Sessions
    • OR-Mapping Integration, including Hibernate
    • Persistent Database Support, like Oracle, MySQL, etc.
    Of course, the devil is in the details. For example, given the distributed nature of a cluster, anything can fail at any point, so a good question to ask is how failures are handled, especially failures that happen during commit. If a failure during commit can leave the cluster in a semi-committed state, that is definitely a problem.
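    The standard way to avoid the semi-committed state is a two-phase commit: no node applies changes until every node has voted that it can. Below is a deliberately simplified sketch of that idea; it is not GridGain's actual transaction protocol, and the `Participant` interface is invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Simplified two-phase commit sketch: the coordinator only tells participants
// to commit after ALL of them have prepared, so a failure during the prepare
// phase rolls everyone back and never leaves the cluster half-committed.
class TwoPhaseCommit {
    interface Participant {
        boolean prepare();  // Vote: can this node commit?
        void commit();
        void rollback();
    }

    static boolean run(List<Participant> participants) {
        // Phase 1: collect votes.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // Any failure before commit: roll everyone back.
                for (Participant q : participants) q.rollback();
                return false;
            }
        }
        // Phase 2: everybody voted yes, so committing is now safe on all nodes.
        for (Participant p : participants) p.commit();
        return true;
    }

    public static void main(String[] args) {
        Participant p = new Participant() {
            public boolean prepare() { return true; }
            public void commit() { System.out.println("committed"); }
            public void rollback() { System.out.println("rolled back"); }
        };
        run(Arrays.asList(p, p));
    }
}
```

    Even this sketch hides the hard part: what happens if the coordinator itself dies between the two phases. That recovery logic is exactly where data grid implementations differ, and exactly what you should ask a vendor about.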

    Another example would be queries. Are predicate queries supported? Can you do SQL queries, and in particular can SQL joins be handled? How are aggregate functions handled, etc.?

    Simplicity of APIs is very important as well. The ConcurrentMap API has become a de facto standard for accessing data stored in distributed caches, but not all products support it. It is also worth checking whether other standard data structures are supported. For example, GridGain supports Map, Set, BlockingQueue, AtomicLong, AtomicSequence, and CountDownLatch, all in distributed fashion.
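    The appeal of the ConcurrentMap standard is that the same calls you already know from the JDK carry over to the cluster. The example below uses a plain single-JVM ConcurrentHashMap; a data grid implementing ConcurrentMap gives these exact calls cluster-wide semantics:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// The ConcurrentMap contract in plain JDK form; a distributed cache that
// implements it behaves the same way, just across the cluster.
class ConcurrentMapDemo {
    public static void main(String[] args) {
        ConcurrentMap<Integer, String> cache = new ConcurrentHashMap<>();

        cache.putIfAbsent(1, "1");   // Stores "1".
        cache.putIfAbsent(1, "11");  // No-op: key 1 is already present.

        // Atomic compare-and-set: replace only if the current value matches.
        boolean replaced = cache.replace(1, "1", "2");

        System.out.println(cache.get(1) + " " + replaced);
    }
}
```

    If a product's cache API diverges from this contract, every piece of code you write against it becomes harder to port.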

    Last, but not least, always check performance. Load up the cluster and see what the throughput and latencies are, what the network load on each server is, etc. A good benchmarking tool for testing distributed systems is the open source Yardstick framework, available on GitHub.
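    At its core, the measurement a framework like Yardstick automates is simple: time a batch of operations and derive operations per second. A minimal single-threaded sketch (the `ThroughputSketch` name is invented for illustration; real benchmarks add warmup, multiple threads, and latency percentiles):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal sketch of the kind of measurement a benchmark framework automates:
// time a batch of operations and derive throughput (ops/sec).
class ThroughputSketch {
    static double opsPerSecond(Runnable op, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) op.run();
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        return iterations / elapsedSec;
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, Integer> map = new ConcurrentHashMap<>();
        double ops = opsPerSecond(() -> map.put(1, 1), 100_000);
        System.out.printf("~%.0f puts/sec (single thread, local map)%n", ops);
    }
}
```

    Raw ops/sec alone is not enough, though: as the G1 graphs in the previous post show, you also need to watch how latency behaves over time, not just its average.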

    Coding Example

    Here is a GridGain Data Grid coding example of some basic operations on distributed caches:
    private static void atomicMapOperations() throws GridException {
        GridCache<Integer, String> cache = GridGain.grid().cache(CACHE_NAME);
    
        // Put and return previous value.
        String v = cache.put(1, "1");
        assert v == null;
    
        // Put and do not return previous value 
        // (all methods ending with 'x' return boolean).
        // Performs better when previous value is not needed.
        cache.putx(2, "2");
    
        // Put asynchronously (every cache operation has an async counterpart).
        GridFuture<String> fut = cache.putAsync(3, "3");
        fut.get(); // Block until the asynchronous put completes.
    
        // Put-if-absent.
        boolean b1 = cache.putxIfAbsent(4, "4");  // true: key 4 was absent.
        boolean b2 = cache.putxIfAbsent(4, "44"); // false: key 4 already present.
    
    
        // Put-with-predicate, will succeed if predicate evaluates to true.
        cache.putx(5, "5");
        cache.putx(5, "55", new GridPredicate<GridCacheEntry<Integer, String>>() {
            @Override public boolean apply(GridCacheEntry<Integer, String> e) {
                return "5".equals(e.peek()); // Update only if previous value is "5".
            }
        });
    
        // Transform - assign new value based on previous value.
        cache.putx(6, "6");
        cache.transform(6, new GridClosure<String, String>() {
            @Override public String apply(String v) {
                return v + "6"; // Set new value based on previous value.
            }
        });
    
        // Replace.
        cache.putx(7, "7");
        b1 = cache.replace(7, "7", "77");  // true: current value is "7".
        b2 = cache.replace(7, "7", "777"); // false: value is already "77".
    }
    

    Screencast

    Here is a brief screencast showing how to get started with basic operations on your cache in under 5 minutes:

