1. Here is an example of how you can perform MergeSort on a distributed grid product like GridGain. This example is somewhat artificial, as you probably would never do the same thing in real life (executing the same code locally is most likely faster), but it does demonstrate some pretty cool features of GridGain, like recursive task execution and continuations.

    This is the task class, which splits the array in two and sends remote jobs to sort the two halves. The remote jobs in turn execute the same task recursively until the array size reaches 1, after which the merge process begins.
    class GridMergeSortTask extends GridTaskSplitAdapter<int[], int[]> {
        // Injected Grid instance.
        @GridInstanceResource private Grid grid;
    
        @Override 
        protected Collection<GridJob> split(int gridSize, int[] initArr) {
            Collection<GridJob> jobs = new LinkedList<GridJob>();
    
            for (final int[] arr : splitArray(initArr)) {
                jobs.add(new GridJobAdapterEx() {
                    // Auto-inject job context.
                    @GridJobContextResource
                    private GridJobContext jobCtx;
    
                    // Task execution result future.
                    private GridTaskFuture<int[]> fut;
    
                    @Override public Object execute() throws GridException {
                        if (arr.length == 1)
                            return arr;
    
                        // Future is null before holdcc() is called and
                        // not null after callcc() is called.
                        if (fut == null) {
                            // Launch the recursive child task asynchronously.
                            fut = grid.execute(new GridMergeSortTask(), arr);
    
                            // Add a listener to the future, that will resume the
                            // parent task once the child one is completed.
                            fut.listenAsync(new GridInClosure<GridFuture<int[]>>() {
                                @Override public void apply(GridFuture<int[]> fut) {
                                    // CONTINUATION:
                                    // =============
                                    // Resume suspended job execution.
                                    jobCtx.callcc();
                                }
                            });
    
                            // CONTINUATION:
                            // =============
                            // Suspend job execution to be continued later and
                            // release the executing thread.
                            return jobCtx.holdcc();
                        }
                        else {
                            assert fut.isDone();
    
                            // Return the result of a completed child task.
                            return fut.get();
                        }
                    }
                });
            }
    
            return jobs;
        }
    
        /**
         * GridTask reduce logic. This method is called when both child jobs
         * are completed, and is a Reduce step of Merge Sort algorithm.
         */
        @Override public int[] reduce(List<GridJobResult> results) {
            // This is in case we have a single-element array.
            if (results.size() == 1)
                return results.get(0).getData();
    
            assert results.size() == 2;
    
            int[] arr1 = results.get(0).getData();
            int[] arr2 = results.get(1).getData();
    
            return mergeArrays(arr1, arr2);
        }
    
        private static Iterable<int[]> splitArray(int[] arr) {
            int len1 = arr.length / 2;
            int len2 = len1 + arr.length % 2;
    
            int[] a1 = new int[len1];
            int[] a2 = new int[len2];
    
            System.arraycopy(arr, 0, a1, 0, len1);
            System.arraycopy(arr, len1, a2, 0, len2);
    
            System.out.println("Split array [arr1Len=" + a1.length + 
                ", arr2Len=" + a2.length + ']');
    
            return Arrays.asList(a1, a2);
        }
    
        private static int[] mergeArrays(int[] arr1, int[] arr2) {
            int[] ret = new int[arr1.length + arr2.length];
    
            int i1 = 0;
            int i2 = 0;
    
            // Merge 2 arrays into a resulting array
            for (int i = 0; i < ret.length; i++) {
                if (i1 >= arr1.length) {
                    System.arraycopy(arr2, i2, ret, i, arr2.length - i2);
    
                    break;
                }
                else if (i2 >= arr2.length) {
                    System.arraycopy(arr1, i1, ret, i, arr1.length - i1); 
    
                    break;
                }
                else
                    ret[i] = arr1[i1] <= arr2[i2] ? arr1[i1++] : arr2[i2++];
            }
    
            System.out.println("Merged arrays [resLen=" + ret.length + 
                ", arr1Len=" + arr1.length + ", arr2Len=" + arr2.length + ']');
    
            return ret;
        }
    }
    
    And here is how you would call this task:
        public static void main(String[] args) throws GridException {
            Grid grid = G.start();
    
            try {
                int[] inArr = generateRandomArray(30);
    
                System.out.println("Unsorted array: " + Arrays.toString(inArr));
    
                int[] outArr = grid.execute(new GridMergeSortTask(), inArr).get();
    
                System.out.println("Sorted array: " + Arrays.toString(outArr));
            }
            finally {
                G.stop(true);
            }
        }
    
        private static int[] generateRandomArray(int size) {
            int[] ret = new int[size];
    
            Random rnd = new Random();
    
            for (int i = 0; i < ret.length; i++)
                ret[i] = rnd.nextInt(100);
    
            return ret;
        }
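For readers without a grid at hand, the same split, recurse, and merge shape can be sketched with plain JDK classes. This is not GridGain API: `LocalMergeSortSketch` and its methods are hypothetical names, and `CompletableFuture` callbacks stand in for `holdcc()`/`callcc()`. What the two have in common is that the parent never blocks a thread while its children run; it registers a continuation that fires once both halves are sorted.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// Hypothetical local analogue of the grid task above (not GridGain API).
final class LocalMergeSortSketch {
    // Sorts asynchronously: split in two, sort each half as a child task,
    // then merge in a callback once both children have completed.
    static CompletableFuture<int[]> sortAsync(int[] arr) {
        if (arr.length <= 1)
            return CompletableFuture.completedFuture(arr);

        int mid = arr.length / 2;

        CompletableFuture<int[]> left = child(Arrays.copyOfRange(arr, 0, mid));
        CompletableFuture<int[]> right = child(Arrays.copyOfRange(arr, mid, arr.length));

        // No thread waits here: the merge is attached as a continuation.
        return left.thenCombine(right, LocalMergeSortSketch::merge);
    }

    // Hands one half to the common pool and recurses from there.
    private static CompletableFuture<int[]> child(int[] half) {
        return CompletableFuture.supplyAsync(() -> half)
            .thenCompose(LocalMergeSortSketch::sortAsync);
    }

    // Standard two-way merge of two sorted arrays.
    static int[] merge(int[] a, int[] b) {
        int[] ret = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;

        while (i < a.length && j < b.length)
            ret[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.length)
            ret[k++] = a[i++];
        while (j < b.length)
            ret[k++] = b[j++];

        return ret;
    }

    public static void main(String[] args) {
        int[] sorted = sortAsync(new int[] {5, 3, 9, 1, 4}).join();

        System.out.println("Sorted array: " + Arrays.toString(sorted));
    }
}
```

Only the outermost caller blocks (on `join()`), which mirrors the `get()` call in the `main` method above.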
    
    Enjoy!

  2. I have been getting many questions about how to tune GridGain, so I decided to write a brief manual that covers the most important tuning properties.

    1. GridGain is multi-threaded - Use It

    If you are experiencing slow cache updates, ask yourself whether you are using the full computing power (all the cores) of your machine. GridGain is multi-threaded internally, but if you issue operations sequentially from a single thread, you are not taking advantage of that. As a rule of thumb, use about 2 or 3 times as many threads as you have cores when populating the grid. All GridGain APIs are thread-safe, so you don't have to worry about concurrency issues when populating data.
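As a rough illustration of the threading advice, here is a minimal sketch that populates a map from a pool sized at twice the core count. It uses only JDK classes: the `ConcurrentHashMap` is a stand-in for a thread-safe cache, and `ParallelPopulateSketch` is a hypothetical name, not part of GridGain.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a ConcurrentHashMap stands in for the cache.
final class ParallelPopulateSketch {
    static Map<Integer, String> populate(int entries) {
        Map<Integer, String> cache = new ConcurrentHashMap<>();

        // Rule of thumb from the text: 2-3 threads per core.
        int threads = Runtime.getRuntime().availableProcessors() * 2;

        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            final int offset = t;

            // Each worker writes its own stripe of keys: offset, offset + threads, ...
            pool.submit(() -> {
                for (int key = offset; key < entries; key += threads)
                    cache.put(key, "value-" + key);
            });
        }

        pool.shutdown();

        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES))
                throw new IllegalStateException("Population did not finish in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while populating", e);
        }

        return cache;
    }
}
```

Striping the key space keeps the workers from contending on the same keys; with a real cache you would submit the same kind of workers against the cache API instead of a local map.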

    2. Use Collocated Computations

    GridGain enables you to execute MapReduce computations in memory. However, most computations work on data that is cached on remote grid nodes. Loading that data from remote nodes is usually expensive, and it is much cheaper to send the computation to the node where the data is. The easiest way to do this is the GridProjection.affinityRun(...) method; GridGain also has plenty of "mapKeysToNodes(...)" methods to help users figure out data ownership within the grid.
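To make the idea of key ownership concrete, here is a small sketch that groups keys by the node that would own them, so that one job per node can be sent instead of pulling data across the network. The modulo-based `ownerOf` function is an assumption made purely for illustration; it is not how GridGain computes affinity, and `CollocationSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of key-to-node grouping (not GridGain's affinity function).
final class CollocationSketch {
    // Assumed ownership rule: hash of the key modulo the number of nodes.
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Groups keys by owning node index; each group would become one job
    // executed on the node that already holds those keys.
    static <K> Map<Integer, List<K>> groupByOwner(Iterable<K> keys, int nodeCount) {
        Map<Integer, List<K>> byNode = new TreeMap<>();

        for (K key : keys)
            byNode.computeIfAbsent(ownerOf(key, nodeCount), n -> new ArrayList<>()).add(key);

        return byNode;
    }
}
```

With a real grid, the grouping step is what the "mapKeysToNodes(...)" family of methods does for you, using the cache's actual affinity.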

    3. Use Data Loader

    If you need to upload lots of data into cache, use org.gridgain.grid.GridDataLoader to do it. The data loader batches the updates before sending them to remote nodes and controls the number of parallel operations taking place on each node to avoid thrashing. Generally it performs about 10x better than a series of single-threaded updates.
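The batching part of that behavior can be sketched in a few lines. This is a simplified illustration with hypothetical names (`BatchingLoaderSketch`, `batchesSent`), not the GridDataLoader implementation: updates accumulate in a local buffer and are shipped as one batch once the buffer is full.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of update batching (not the GridDataLoader implementation).
final class BatchingLoaderSketch {
    private final int batchSize;
    private final Map<String, Integer> buffer = new HashMap<>();

    // Stand-in for the network: each element is one batch that was "sent".
    private final List<Map<String, Integer>> sentBatches = new ArrayList<>();

    BatchingLoaderSketch(int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");

        this.batchSize = batchSize;
    }

    // Buffers one update and ships the buffer once it reaches the batch size.
    void add(String key, int value) {
        buffer.put(key, value);

        if (buffer.size() >= batchSize)
            flush();
    }

    // Ships whatever is buffered; call once at the end to send the tail.
    void flush() {
        if (buffer.isEmpty())
            return;

        sentBatches.add(new HashMap<>(buffer));
        buffer.clear();
    }

    int batchesSent() {
        return sentBatches.size();
    }
}
```

Loading 250 distinct keys with a batch size of 100 results in three sends instead of 250, which is where most of the saving comes from.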

    4. Tune Initial Cache Size

    To avoid internal resizing of cache maps, always provide a proper cache start size. Otherwise performance can suffer significantly, because CPU cycles are spent on resizing internal cache maps instead of on application logic. You can configure the cache start size via the GridCacheConfiguration.getStartSize() configuration property.

    5. Tune Near Cache

    When using a Partitioned cache, GridGain fronts it with a local Near cache, so that an entry that does not belong to the local partitions is still kept in a smaller local cache for better performance on the next access.

    However, most usages of GridGain happen from collocated computations, i.e. computations submitted to the grid are usually routed automatically to the nodes where the data resides. In cases like this, the Near cache is redundant, as all data access happens from local memory anyway. To avoid the overhead, you can disable the Near cache by setting the GridCacheConfiguration.isNearEnabled() configuration property to false.

    6. Tune Off-Heap Memory

    If you plan to allocate large amounts of memory to your JVM for data caching (usually more than 10GB), your application will most likely suffer from prolonged stop-the-world GC pauses, which can significantly hurt latencies. To avoid GC pauses, use off-heap memory to cache data - your data is still cached in memory, but the JVM does not know about it and GC is not affected.

    The only configuration property you need to set to enable off-heap memory is GridCacheConfiguration.getMaxOffHeapMemory(), which tells GridGain how much off-heap memory to make available to your application. By default off-heap memory is disabled.
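To show what "in memory, but outside the heap" means in plain Java, here is a tiny sketch built on a direct `ByteBuffer`, whose contents may live outside the garbage-collected heap. It is an illustration only: the post does not describe how GridGain implements its off-heap storage, and `OffHeapSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: fixed-size slots of long values in a direct buffer.
final class OffHeapSketch {
    private final ByteBuffer store;

    OffHeapSketch(int slots) {
        // Direct buffers are allocated outside the normal garbage-collected heap.
        store = ByteBuffer.allocateDirect(slots * Long.BYTES);
    }

    // Writes a value at an absolute slot position.
    void put(int slot, long value) {
        store.putLong(slot * Long.BYTES, value);
    }

    // Reads the value stored at the given slot.
    long get(int slot) {
        return store.getLong(slot * Long.BYTES);
    }
}
```

The data is stored as raw bytes rather than as Java objects, so the collector has no object graph to trace for it, however large the buffer grows.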

    7. Tune Swap Storage

    First of all, if you don't plan to use swap storage (i.e. disk overflow storage), you should not change any default swap settings (swap storage is disabled by default). If you do need to use swap storage, then you should enable it via GridCacheConfiguration.isSwapEnabled() configuration property.

    8. Tune Query Indexing

    There are several configuration properties that you should watch out for here. First of all and most importantly, if you don't plan to use cache queries at all, you should disable indexing altogether via GridCacheConfiguration.isQueryIndexEnabled() configuration property.

    If you do plan to use cache queries, you should properly enable or disable indexing of primitive keys and values on GridH2IndexingSpi. Enable indexing for primitive keys by calling setDefaultIndexPrimitiveKey(...) with true on the SPI only if you plan to use primitive cache keys in your cache queries. The same goes for indexing primitive values, which is controlled by the setDefaultIndexPrimitiveValue(...) property.

    Also, if every value class has exactly one key class (i.e. you don't plan to use different key classes for the same value class), call setDefaultIndexFixedTyping(...) with true on the SPI. This way GridGain stores keys as their corresponding SQL types instead of in binary form, which makes key lookups faster.

    9. Tune Eviction Policy

    Again, if you don't plan to over-populate your cache, i.e. if you don't need any eviction policy at all, then you should disable eviction policy altogether via GridCacheConfiguration.isEvictionEnabled() configuration property. 

    If you do need GridGain to make sure that data in cache does not grow beyond allowed memory limits, you should carefully choose the eviction policy you need. Most likely you will need either the FIFO or the LRU eviction policy shipped with GridGain; however, depending on your application, you may need to configure LIRS or plug in your own custom eviction policy. Regardless of which eviction policy you use, you should carefully choose the maximum number of entries the eviction policy allows in cache - once the cache grows beyond this limit, evictions start occurring. Usually the max size is controlled by the setMaxSize(...) configuration property on the eviction policy instance.

    You should also almost always set the "setAllowEmptyEntries(...)" configuration property to false. By default GridGain keeps entries with null values in cache to preserve some other properties of the entry, such as time-to-live. However, if you don't use time-to-live, then most likely you should discard the entry once it expires or is invalidated.
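For readers who want to see the mechanics of a size-bounded LRU policy, the JDK's `LinkedHashMap` can demonstrate it in a few lines. This is a local illustration under a hypothetical name (`LruCacheSketch`), not one of the eviction policy classes shipped with GridGain; the `maxSize` field plays the role the text assigns to setMaxSize(...).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of LRU eviction with a maximum entry count.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxSize;

    LruCacheSketch(int maxSize) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);

        this.maxSize = maxSize;
    }

    // Called by LinkedHashMap after every insert; returning true evicts
    // the least recently used entry.
    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

With a max size of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.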

    10. Use Write-Behind Caching

    If you can afford for your persistent store to lag behind your in-memory cache, then use write-behind caching. When write-behind is enabled, GridGain accumulates cache updates and flushes them to the database in batches in the background, which can often provide significant performance benefits. You can enable write-behind caching via the GridCacheConfiguration.isWriteBehindEnabled() configuration property.
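The write-behind idea can be sketched as follows. All names here (`WriteBehindSketch`, `flush`, `storedValue`) are hypothetical and the "database" is just a map; in a real setup a background thread would call `flush()` periodically, whereas here it is invoked by hand to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of write-behind caching (not GridGain's implementation).
final class WriteBehindSketch {
    private final Map<String, String> cache = new HashMap<>();

    // Updates not yet written to the store; repeated writes to a key coalesce.
    private final Map<String, String> pending = new LinkedHashMap<>();

    // Stand-in for the persistent store.
    private final Map<String, String> store = new HashMap<>();

    // Updates the cache immediately and only records the store write.
    synchronized void put(String key, String value) {
        cache.put(key, value);
        pending.put(key, value);
    }

    synchronized String get(String key) {
        return cache.get(key);
    }

    // Writes all pending updates to the store in one batch and
    // returns how many entries were written.
    synchronized int flush() {
        int written = pending.size();

        store.putAll(pending);
        pending.clear();

        return written;
    }

    synchronized String storedValue(String key) {
        return store.get(key);
    }
}
```

Reads are served from the cache right away, while the store sees one batched write per flush; two updates to the same key between flushes cost a single store write.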