1. If you don't like to read and prefer video demos, you can skip directly to the Screencast at the bottom of this post.

    What do clustering frameworks really do? More often than not, they provide the ability to auto-discover servers on the network, share resources, and schedule tasks. Some also add distributed messaging and distributed event notifications.

    While there are some well-known clustering frameworks, like ZooKeeper or Mesos, they usually provide only rudimentary clustering capabilities. Often, however, on top of basic clustering you also need to perform MapReduce computations, distribute closures, or distribute data. For cases like these, Compute Grids (a.k.a. High Performance Computing Grids) or Data Grids become very useful.

    For those not familiar with the term "Data Grid", it is simply a Distributed Cache with more advanced features, such as distributed data querying and transactions.

    Compute Grids and Data Grids often provide advanced clustering APIs that are nonetheless very simple to use. Here I will show some basic examples on top of GridGain In-Memory Data Grid, which is open source and licensed under the Apache license.

    GridGain clustering supports automatic node discovery, and also lets you create virtual sub-groups of grid nodes within the cluster and exchange messages between them or get remote event notifications. While I have blogged about it in more detail before, here is a pretty simple example which demonstrates auto-discovery and distributed computations on the cluster:
    try (Grid grid = GridGain.start()) {
       // Create sample runnable.
       Runnable r = new GridRunnable() {
          @Override public void run() {
             System.out.println("Hello World");
          }
       };
     
       // Broadcast to all grid nodes.
       grid.compute().broadcast(r).get();
     
       // Broadcast to remote nodes only.
       grid.forRemotes().compute().broadcast(r).get();
     
       // Unicast to some remote node picked by load balancer.
       grid.forRemotes().compute().run(r).get();
     
       // Unicast to some node with CPU load less than 50%.
       grid.forPredicate(new GridPredicate<GridNode>() {
          @Override public boolean apply(GridNode node) {
             return node.metrics().getCurrentCpuLoad() < 0.5;
          }
       }).compute().run(r).get();
    }
    

    Screencast

    Here is a brief screencast showing how to get started with running computations on your cluster in under 5 minutes:



  2. I am pleased to announce that GridGain 6.1.0 has been released today. This is the first major upgrade since GridGain 6.0.0 was released in February, and it contains some cool new functionality and performance improvements:

    Support for JDK8

    With GridGain 6.1.0 you can execute JDK8 closures and functions in a distributed fashion on the grid:
    try (Grid grid = GridGain.start()) {
      grid.compute().broadcast((GridRunnable)() -> 
          System.out.println("Hello World")).get();
    }
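
A note on the cast in the snippet above: a Java lambda can only be serialized, and so shipped to another JVM, when its target type is serializable. The sketch below shows that in plain Java, with no GridGain dependency. It assumes that GridRunnable extends both Runnable and Serializable, which this post does not state; `SerializableRunnable` and the helper methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializableLambdaSketch {
    /** Hypothetical stand-in for a serializable task interface (assumption, not a GridGain type). */
    interface SerializableRunnable extends Runnable, Serializable {}

    /** A lambda whose target type is plain Runnable: not serializable. */
    static Runnable plainTask() {
        return () -> System.out.println("plain");
    }

    /** The same kind of lambda, cast to a serializable functional interface. */
    static Runnable shippableTask() {
        return (SerializableRunnable) () -> System.out.println("Hello World");
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    /** Returns true if standard Java serialization accepts the object. */
    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Serializes and deserializes a task, as a transport layer would have to. */
    static Runnable roundTrip(Runnable task) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialize(task)))) {
            return (Runnable) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain lambda serializable: " + canSerialize(plainTask()));
        System.out.println("cast lambda serializable:  " + canSerialize(shippableTask()));
        roundTrip(shippableTask()).run();
    }
}
```

Only the lambda with a serializable target type survives the round trip, which is a plausible reason for the explicit cast.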
    

    Geospatial Indexes

    GridGain lets you easily query in-memory data with SQL using in-memory indexes. Now you can extend SQL to geospatial queries. For example, the query below will find all points on the map within a certain square region:

    // Assumes 'factory' is a JTS GeometryFactory instance created elsewhere.
    Polygon square = factory.createPolygon(new Coordinate[] {
       new Coordinate(0, 0),
       new Coordinate(0, 100),
       new Coordinate(100, 100),
       new Coordinate(100, 0),
       new Coordinate(0, 0)
    });
    
    cache.queries().
        createSqlQuery(MapPoint.class, "select * from MapPoint where location && ?").
             queryArguments(square).
                 execute().get();
    

    Near Cache in Atomic Mode

    Prior to 6.1.0, GridGain supported near cache only in transactional mode. Starting with 6.1.0, near cache is supported in atomic mode as well.

    Near cache allows for client-side caching (as opposed to traditional server-side caching) and can yield significant performance improvements in some cases.
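
To make the idea of client-side caching concrete, here is a minimal sketch in plain Java. It is not GridGain's near cache: the "remote" side is modeled as a hypothetical `Function` supplied by the caller, and there is no eviction or server-driven invalidation. It only shows why repeated reads of the same key can avoid a network hop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class NearCacheSketch<K, V> {
    // Entries kept on the client, next to the code that reads them.
    private final Map<K, V> local = new ConcurrentHashMap<>();

    // Hypothetical stand-in for a lookup that goes to a remote server.
    private final Function<K, V> remoteLookup;

    NearCacheSketch(Function<K, V> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Serves from the local copy when present, otherwise asks the remote side once. */
    V get(K key) {
        return local.computeIfAbsent(key, remoteLookup);
    }

    /** Drops the local copy, for example after the value changed on the server. */
    void invalidate(K key) {
        local.remove(key);
    }

    public static void main(String[] args) {
        NearCacheSketch<String, String> cache = new NearCacheSketch<>(key -> {
            System.out.println("remote lookup for " + key);
            return key.toUpperCase();
        });
        cache.get("a"); // goes to the "server"
        cache.get("a"); // served locally
        cache.invalidate("a");
        cache.get("a"); // goes to the "server" again
    }
}
```

A real near cache also has to keep the local copies consistent with the servers, which is where the transactional and atomic modes mentioned above come in.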

    Fair Affinity Functions

    Many know that Consistent Hashing provides a distribution of data within a cluster that stays stable when servers fail, but not many know that consistent hashing is not very fair. The discrepancies in distribution can be up to 20%, which means that some servers can end up with 20% more data than others. This may create uneven load distribution when running cluster-enabled computations or queries.
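
If you want to see the unevenness for yourself, the following is a rough sketch of a consistent hash ring in plain Java that counts how many keys land on each node. It is not GridGain's implementation: the node names, the number of ring points per node, and the bit mixer are all assumptions chosen for illustration, and the spread you observe will depend on them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConsistentHashSketch {
    // Hash ring: position on the ring -> owning node.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    ConsistentHashSketch(List<String> nodes, int pointsPerNode) {
        for (String node : nodes)
            for (int i = 0; i < pointsPerNode; i++)
                ring.put(mix((node + "#" + i).hashCode()), node);
    }

    /** Illustrative bit mixer to spread out String.hashCode() values. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** A key belongs to the first ring point at or after its hash, wrapping around. */
    String nodeFor(Object key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(mix(key.hashCode()));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Counts how many of the generated keys each node ends up owning. */
    static Map<String, Integer> distribution(List<String> nodes, int pointsPerNode, int keys) {
        ConsistentHashSketch ch = new ConsistentHashSketch(nodes, pointsPerNode);
        Map<String, Integer> counts = new TreeMap<>();
        for (String n : nodes)
            counts.put(n, 0);
        for (int k = 0; k < keys; k++)
            counts.merge(ch.nodeFor("key-" + k), 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-A", "node-B", "node-C", "node-D");
        // A perfectly fair split would give every node 25,000 keys.
        System.out.println(distribution(nodes, 64, 100_000));
    }
}
```

Raising the number of ring points per node usually narrows the gap between nodes, at the cost of a larger ring.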

    GridGain 6.1 added two more affinity functions in addition to consistent hashing: Rendezvous and Fair.

    The Rendezvous affinity function works faster than consistent hashing and, for smaller topologies (under 10 servers), provides a pretty fair distribution. One of the nice features here is that cache key affinity survives full cluster restarts. This means that you can back up data to disk and then reload it on restart knowing that all keys are still mapped to the same node.
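
For readers who have not met rendezvous (highest random weight) hashing, here is a minimal sketch of the general technique in plain Java. It is not GridGain's affinity function, and the node IDs and weight function are assumptions. The point it illustrates is that the owner of a key depends only on the key and the set of node IDs, so the mapping is the same no matter in which order nodes come back after a restart, provided the IDs themselves are stable.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class RendezvousSketch {
    /** Pseudo-random weight for a (node, key) pair; the mixer is illustrative only. */
    static long weight(String nodeId, Object key) {
        long h = nodeId.hashCode() * 31L + key.hashCode();
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** The node with the highest weight for this key wins; ties go to the smaller ID. */
    static String nodeFor(Collection<String> nodeIds, Object key) {
        String best = null;
        long bestWeight = 0;
        for (String id : nodeIds) {
            long w = weight(id, key);
            boolean better = best == null
                || w > bestWeight
                || (w == bestWeight && id.compareTo(best) < 0);
            if (better) {
                best = id;
                bestWeight = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same node IDs, different start order, as after a full cluster restart.
        List<String> beforeRestart = Arrays.asList("node-A", "node-B", "node-C");
        List<String> afterRestart = Arrays.asList("node-C", "node-A", "node-B");
        for (int k = 0; k < 5; k++) {
            String key = "key-" + k;
            System.out.println(key + " -> " + nodeFor(beforeRestart, key)
                + " / " + nodeFor(afterRestart, key));
        }
    }
}
```

A side effect of picking the maximum weight is that removing a node only remaps the keys that node owned; every other key keeps its owner.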

    The Fair affinity function provides an absolutely fair cache key distribution, with all grid nodes holding an equal number of keys at all times. However, it may change key-to-node assignments upon full cluster restarts.
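
The trade-off described above can be illustrated with a deliberately simple stand-in: deal a fixed number of partitions out to the nodes round-robin. This is not GridGain's Fair affinity function, and the partition count and node names are assumptions. The sketch shows that an assignment can be perfectly even and still depend on the order in which nodes joined, which is one plausible way an assignment could change after a full restart.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FairAssignmentSketch {
    /** Deals partitions out to nodes round-robin, in the order the nodes joined. */
    static Map<Integer, String> assign(List<String> nodesInJoinOrder, int partitions) {
        Map<Integer, String> owners = new TreeMap<>();
        for (int p = 0; p < partitions; p++)
            owners.put(p, nodesInJoinOrder.get(p % nodesInJoinOrder.size()));
        return owners;
    }

    /** Counts how many partitions each node owns. */
    static Map<String, Integer> counts(Map<Integer, String> owners) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : owners.values())
            counts.merge(node, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, String> first =
            assign(Arrays.asList("node-A", "node-B", "node-C", "node-D"), 1024);
        Map<Integer, String> restarted =
            assign(Arrays.asList("node-B", "node-A", "node-C", "node-D"), 1024);

        // Every node owns the same number of partitions in both runs...
        System.out.println(counts(first));
        System.out.println(counts(restarted));

        // ...but an individual partition may have moved to a different node.
        System.out.println("partition 0: " + first.get(0) + " vs " + restarted.get(0));
    }
}
```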

    Other Enhancements

    Other fixes and enhancements include improvements to the multicast protocol for discovery and significant performance improvements for distributed cache queues.

    You can download GridGain 6.1 here.

