1. Before diving deeper into what it means to easily cluster an application, let's start by defining what a cluster really is. Wikipedia has a pretty good explanation of clustering: a high-level definition that covers fault tolerance, load balancing, scheduling, etc. However, the real magic behind clustering is in making these complex distributed operations seem easy.


    From a development standpoint, the ability to cluster an application can in most cases be reduced to being able to easily perform the following functions:
    1. Get a list of all currently alive cluster nodes
    2. Create sub-groups of nodes within the cluster at will
    3. Exchange messages between any nodes within the cluster
    4. Listen to events from any node within a specified group
    5. Easily compute and share data on any of the cluster nodes
    Here are coding examples of how to achieve the above in GridGain. I hope the code is simple enough to understand, but it would be interesting to get some feedback on it. Feel free to comment.

    Let's start by getting a list of all cluster nodes:

    // Default grid instance; the snippets below reuse this variable.
    Grid grid = GridGain.grid();

    Collection<GridNode> nodes = grid.nodes();
    

    Now, let's create different sub-groups of nodes within the cluster:
    // Remote nodes (all nodes, excluding local)
    GridProjection rmtNodes = grid.forRemotes();
    
    // Random remote node.
    GridProjection rmtRandomNode = rmtNodes.forRandom();
    
    // Current CPU load of remote random node.
    double cpu = rmtRandomNode.node().metrics().getCurrentCpuLoad();
    
    // All nodes on the same physical host as remote random node.
    GridProjection hostNodes = grid.forHost(rmtRandomNode.node());
    
    // All nodes marked by user as worker nodes.
    GridProjection workers = grid.forAttribute("worker", "true");
    

    Here is an example of exchanging messages between nodes in a GridGain cluster:
    // User-defined message topics.
    private enum TOPIC { MYTOPIC }
     
    // Get message instance to provide messaging functionality 
    // over a projection of remote nodes.
    GridMessaging msg = grid.forRemotes().message();
     
    // Register message listeners on all remote grid nodes.
    msg.remoteListen(TOPIC.MYTOPIC, new GridBiPredicate<UUID, String>() {
        @Override public boolean apply(UUID sndrNodeId, String msg) {
            System.out.println("Received message: " + msg);
             
            return true; // Return true to continue listening.
        }
    }).get();
     
    // Send a message to all remote nodes in the projection.
    msg.send(TOPIC.MYTOPIC, "Hello World");
    

    This example shows how to subscribe an event listener on all grid nodes:
    // This optional local callback is called for each event notification
    // that passed the remote predicate listener.
    GridBiPredicate<UUID, GridCacheEvent> locLsnr = new GridBiPredicate<UUID, GridCacheEvent>() {
        @Override public boolean apply(UUID uuid, GridCacheEvent evt) {
            System.out.println("Received event: " + evt.name());
    
            return true; // Continue listening.
        }
    };
    
    // Remote listener which only accepts events for keys that are greater than
    // or equal to 10, and only if the event node is the primary caching node for the key.
    GridPredicate<GridCacheEvent> rmtLsnr = new GridPredicate<GridCacheEvent>() {
        @Override public boolean apply(GridCacheEvent evt) {
            System.out.println("Cache event: " + evt.name());
    
            int key = evt.key(); // Cache key.
    
            return key >= 10 && cache.affinity().isPrimary(grid.localNode(), key);
        }
    };
    
    // Subscribe to specified cache events on all nodes that have "myCache" running.
    grid.forCache("myCache").events().remoteListen(locLsnr, rmtLsnr, EVT_CACHE_OBJECT_PUT).get();
    

    And finally, an example that distributes computations to the nodes where the data is cached:
    for (int i = 0; i < KEY_CNT; i++) {
        final int key = i;
    
        // This runnable will execute on the remote node where
        // the data with the given key is cached. 
        GridRunnable run = new GridRunnable() {
            @Override public void run() {
                System.out.println("Computing [key= " + key + ", value=" + cache.peek(key) + ']');
            }
        };
    
        grid.compute().affinityRun("myCache", key, run).get();
    }
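    To make the idea behind affinity-based routing concrete without a running cluster, here is a minimal plain-JDK sketch. It assumes a hypothetical hash-modulo mapping from keys to node indexes; `ownerOf` and `route` are illustrative names, not GridGain APIs, and GridGain's real affinity function is not shown here. The only point is that each key's job ends up grouped with the node that owns that key.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-JDK illustration of sending work to the node that owns a key. */
final class AffinitySketch {
    /** Hypothetical mapping (an assumption): owner index = key hash modulo node count. */
    static int ownerOf(Object key, int nodeCnt) {
        return Math.floorMod(key.hashCode(), nodeCnt);
    }

    /** Groups keys 0..keyCnt-1 by owning node, so each job can run next to its data. */
    static List<List<Integer>> route(int keyCnt, int nodeCnt) {
        List<List<Integer>> perNode = new ArrayList<>();

        for (int n = 0; n < nodeCnt; n++)
            perNode.add(new ArrayList<Integer>());

        for (int key = 0; key < keyCnt; key++)
            perNode.get(ownerOf(key, nodeCnt)).add(key);

        return perNode;
    }

    public static void main(String[] args) {
        // Six keys spread over three simulated nodes.
        System.out.println(route(6, 3)); // prints [[0, 3], [1, 4], [2, 5]]
    }
}
```

    In the GridGain example above, affinityRun(...) plays the role of this lookup: you pass the cache name and key, and the job is sent to the node where that key is cached.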
    



  2. As you may already know, GridGain went open source last week. Going open source was a lot more involved than simply opening up our code. We put a significant amount of thought into simplifying our APIs and making our development process as community-friendly as possible.

    As an example, take a look at how GridGain lets you take any local operation and distribute it across the cluster. Take the GridCache interface. In addition to distributed methods, like get(...) or put(...), many APIs on this interface are local. For example, size() returns the number of entries cached locally, and containsValue(...) checks whether a value is cached on the local node. These APIs are local on purpose: we anticipated that for certain methods, providing local information would be safer and more useful. But what if you need to check whether a value is contained anywhere in the cache, not just in the local node's cache?

    In GridGain, to make any operation distributed, you execute it across multiple nodes using the GridCompute functionality. Here is what a global contains method would look like:
    public boolean contains(final V val) {
        // Not all nodes in the grid may participate in caching data.
        // We want to make sure that we send our computation only to caching nodes.
        GridProjection cacheNodes = GridGain.grid().forCache("myCache");
    
        Collection<Boolean> results = cacheNodes.compute().broadcast(
            new GridCallable<Boolean>() {
                @Override public Boolean call() {
                    return GridGain.grid().cache("myCache").containsValue(val);
                }
            }
        ).get();
        
        return results.contains(true); 
    }
    
    You can do the same for any other operation as well. For example, if you need to find the total cache size across all cache nodes, you would simply broadcast a computation that returns cache.size() from each node, and then add the results up to get the total.
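    The broadcast-then-sum pattern can be sketched with the plain JDK, with no cluster involved. In this sketch each "node" is just a local map and a thread pool stands in for the grid; the class and method names (`GlobalCacheSize`, `totalSize`) are made up for illustration and are not GridGain APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Plain-JDK stand-in for "broadcast a job to every cache node, then reduce". */
final class GlobalCacheSize {
    /** Runs one size() job per simulated node and sums the local results. */
    static int totalSize(List<Map<String, String>> nodeCaches) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodeCaches.size()));

        try {
            List<Callable<Integer>> jobs = new ArrayList<>();

            // One job per node: report the number of locally cached entries.
            for (final Map<String, String> locCache : nodeCaches) {
                jobs.add(new Callable<Integer>() {
                    @Override public Integer call() {
                        return locCache.size();
                    }
                });
            }

            int total = 0;

            // invokeAll() waits for all jobs; get() then returns each local size.
            for (Future<Integer> res : pool.invokeAll(jobs))
                total += res.get();

            return total;
        }
        catch (Exception e) {
            throw new IllegalStateException("Failed to collect cache sizes.", e);
        }
        finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> node1 = new HashMap<>();
        node1.put("a", "1");
        node1.put("b", "2");

        Map<String, String> node2 = new HashMap<>();
        node2.put("c", "3");

        Map<String, String> node3 = new HashMap<>(); // A node with no entries.

        System.out.println("Total cache size: " + totalSize(Arrays.asList(node1, node2, node3)));
        // prints "Total cache size: 3"
    }
}
```

    With GridGain, the per-node job would be a GridCallable passed to broadcast(...), as in the contains example above, and the summing loop would run over the returned collection of results.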




  3. Yesterday GridGain released its 6.0 version under the Apache 2.0 open source license. Our CTO, Nikita Ivanov, wrote about the new GridGain features and licensing in his blog, so I will not repeat them here. Instead, I will briefly describe our vision behind In-Memory Computing and why we made the move to open source.

    Why is In-Memory Computing important? The simple answer is that there is no other way to process today’s enormous data volumes. To get answers from hundreds of terabytes of data in milliseconds, you absolutely must have an In-Memory solution in your architecture. GridGain is not alone in validating this. Large vendors, such as Oracle (in-memory database and in-memory Exadata), IBM (BLU analytics), and SAP (Hana), are moving in the same direction.

    So, with all those solutions out there, what makes GridGain different? In a nutshell, we provide a unified In-Memory Computing Platform aimed at solving a wide range of use cases. Our platform is composed of multiple natively integrated products, including High Performance Computing (HPC), the industry’s fastest In-Memory Data Grid (IMDG), CEP-based Streaming, and a plug-and-play Apache Hadoop Accelerator. With our new open source strategy, all of these products are now freely available for download, either a la carte or together as part of a larger platform edition.

    With the GridGain In-Memory Computing Platform you can process hundreds of thousands of computational jobs per second in parallel, store terabytes of data in memory for fast transactional access and SQL querying, index into never-ending streams of incoming data, or give your Hadoop installations up to a 100x boost.

    We've been around the block as well. The product has been vetted by many customers, including large production deployments exceeding thousands of nodes. Open sourcing our platform just seemed like a natural way to share our technology with the community and continue growing as part of a larger in-memory ecosystem. Unlike other commercial open source offerings, we went with a very liberal Apache license and with a feature set more than adequate for GridGain open source users to deploy in production. The product even includes Management and Monitoring, which most vendors rarely offer free of charge.

    In the upcoming days, I will be giving coding examples, demonstrating the ease of use of our APIs, and sharing various use cases. In the meantime, please feel free to download GridGain and give it a try. You can start by taking a look at our Getting Started guide and trying a few examples.



About me
- Antoine de Saint-Exupery -
"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."