1. I plan to blog in more detail about native SQL support in the GridGain data grid in a few days, but here is just a taste of how complex an SQL statement you can execute against a GridGain distributed in-memory cache. Note that you are not querying a database - you are querying objects cached in memory.

    The query below uses H2 database SQL syntax to fetch a collection of QueueItems at specified positions of a dynamic priority-based queue stored in a distributed GridGain cache. Querying an element at a certain position of a result set is not trivial in any SQL dialect, and usually requires nested 'select' statements with a 'rownum' field. On top of that, we need to pass a collection of item positions to the query, which is also a fairly advanced feature.

    String sql =
        "select * from (" +
        "select *, rownum as r from (" +
        "select * from QueueItem where qid=? " +
        "order by priority desc, seq asc" +
        ")" +
        ") where r in (select * from table(x int=?))";

    GridCacheQuery itemsQry = cache.createQuery(SQL, QueueItem.class, sql);

    // Specify positions of queue items we want to get.
    Collection<Integer> positions = Arrays.asList(10, 12, 14);

    // Query items at specified positions from queue with ID 123.
    GridCacheQueryFuture fut = itemsQry.queryArguments(123, positions).execute();

    System.out.println("Queried queue items at specified positions: " + fut.get());

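    For context, here is a rough sketch of what the cached QueueItem class behind this query might look like. The @GridCacheQuerySqlField annotation used below to mark queryable fields is an assumption on my part (check the GridGain query documentation for the exact annotation), and the field types are purely illustrative.
    // Hypothetical sketch of the cached value class behind the query above.
    // NOTE: @GridCacheQuerySqlField for marking queryable fields is an assumption,
    // not something confirmed by this post.
    public class QueueItem {
        // Queue ID ('qid' in the SQL above).
        @GridCacheQuerySqlField
        private int qid;

        // Item priority used for ordering.
        @GridCacheQuerySqlField
        private int priority;

        // Insertion sequence used to break priority ties.
        @GridCacheQuerySqlField
        private long seq;

        // Actual payload carried by the queue item.
        private String data;
    }
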
    Pretty powerful in my view, especially when you also get smart routing, pagination, custom transformers, reducers, visit-only support, text queries, and a lot more...

    More information on GridGain queries can be found here or here.


  2. - Why use GridGain if there is Infinispan?
          - Why use Infinispan if there is Hazelcast?
                - Why use Hazelcast if there is Map over RMI?
                      - Why use Map over RMI if you can do it yourself?



    I think you see the pattern... As you climb up this ladder you get more and more value from your grid product.


  3. I am pleased to announce that we recently released GridGain 3.0.4. The last couple of releases have focused, among other things, on convenient and effective collocation of computations and data, and on grouping data that is usually accessed together onto the same nodes. Sending computations exactly to the nodes where the accessed data resides is one of the key components of achieving better scalability. Without collocation, nodes fetch various data from other nodes for brief periods of time, often just to perform a quick computation and discard the data almost immediately thereafter. This creates unnecessary data traffic, a.k.a. data noise, and can at times bring a server to its knees.

    In my previous blog post I showed how to collocate computations and data directly via the GridCache.mapKeyToNode(..) method. We have also added analogous methods on the Grid API, so that data affinity can be determined even on nodes that do not cache any data themselves. In the latest 3.0.4 release we have also added a very convenient way to provide collocation via the @GridCacheAffinityMapped annotation.

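    For reference, here is a minimal sketch of that direct-API style of routing. It reuses the cache.mapKeyToNode(..) and UNICAST closure pattern that appears in the HelloWorld example further down this blog, where 'cache' stands for a GridGain cache projection; treat the exact signatures as illustrative rather than definitive.
    // Sketch: route a closure to the node that owns a given cache key.
    // 'cache' is assumed to be a GridGain cache (projection) instance.
    final Object affKey = "myCompanyId";

    // Find the primary node for the key.
    final UUID nodeId = cache.mapKeyToNode(affKey);

    // Run the closure on that node only, so cache accesses inside it are local.
    G.grid().node(nodeId).run(UNICAST, new Runnable() {
        @Override public void run() {
            System.out.println("Company cached locally: " + cache.peek(affKey));
        }
    });
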
    Say you have 2 types of objects, Person and Company. Multiple persons can work for the same company. This means that you generally may wish to access Person objects together with the Company for which they work. To do that in a scalable fashion, you may wish to ensure that all people working for the same company are cached on the same node. This way you can send computations to that node and access multiple people from the same company locally. Here is how it can be done in GridGain.
    public class PersonKey {
        // Person ID used to identify a person.
        private String personId;

        // Company ID which will be used for data affinity.
        @GridCacheAffinityMapped
        private String companyId;
        ...
    }
    ...
    // Instantiate person keys with the same company ID.
    Object personKey1 = new PersonKey("myPersonId1", "myCompanyId");
    Object personKey2 = new PersonKey("myPersonId2", "myCompanyId");

    // Both the company and the person objects will be cached on the same node.
    cache.put("myCompanyId", new Company(..));
    cache.put(personKey1, new Person(..));
    cache.put(personKey2, new Person(..));

    Now, if you want to perform a computation which involves multiple people working for the same company, all you have to do is send a grid job to the node where those people are cached. Here is how you would send a computation to the node which caches all people for the company with ID "myCompanyId".
    G.grid().run(GridClosureCallMode.BALANCE, new Runnable() {
        // This annotation specifies that the computation should be routed
        // precisely to the node where all objects with affinity key
        // 'myCompanyId' are cached.
        @GridCacheAffinityMapped
        private String companyId = "myCompanyId";

        @Override public void run() {
            // Some computation logic here.
            ...
        }
    });

    Now, when you properly collocate all your data within your data grid and then route your computations to the nodes where the data is cached, all cache operations become LOCAL, hence achieving the best performance and scalability without any data noise. This kind of goes in line with the first rule of distributed programming, which is DO NOT DISTRIBUTE.


  4. I have been thinking about how a HelloWorld example should look for a data grid. After checking some other products I noticed that the most popular approach for a HelloWorld app on a data grid is to create an example with two counterparts: a client and a server. The client part usually prints out the operation on the cache, and the remote server usually prints out the same operation whenever the data ends up on that server. This way users can see that the value stored in cache actually does get distributed to remote nodes.

    After looking at such examples it occurred to me that this client/server approach can be implemented much more simply in GridGain using zero deployment and basic event subscription. All we need to do is make sure that cache operations get printed out on remote nodes so we can visualize what's going on. However, for that we don't need to create a separate server app - we can do it all from our client example code.

    So, let's make sure that events are printed out. To do that we will execute a closure on all grid nodes which subscribes to cache events and prints them. This closure can be executed directly from the example code and will be automatically deployed on remote nodes. Here is what the code looks like:
    // Execute this runnable on all grid nodes, local and remote.
    G.grid().run(BROADCAST, new Runnable() {
        @Override public void run() {
            // Event listener which will print out cache events, so we 
            // can visualize what happens on remote nodes.
            GridLocalEventListener lsnr = new GridLocalEventListener() {
                @Override public void onEvent(GridEvent e) {
                    System.out.println("Event '" + e.type() + "' for key: " +
                        ((GridCacheEvent)e).key());
                }
            };
    
            // GridNodeLocal is a ConcurrentMap attached to every grid node.
            Object prev = G.grid().nodeLocal().putIfAbsent("lsnr", lsnr);
    
            // Make sure that we only subscribe one time regardless
            // of how many times we run the example.
            if (prev == null)
                G.grid().addLocalEventListener(lsnr,
                    EVT_CACHE_OBJECT_PUT, 
                    EVT_CACHE_OBJECT_READ, 
                    EVT_CACHE_OBJECT_REMOVED);
         }
    });
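
    If you restart the example often, a symmetric clean-up step can be handy. Below is a minimal sketch of how that could look; it assumes that removeLocalEventListener(..) exists as the counterpart of addLocalEventListener(..) used above, which this post does not confirm.
    // Unsubscribe the listener on all nodes (a sketch; removeLocalEventListener(..)
    // is assumed to be the counterpart of addLocalEventListener(..)).
    G.grid().run(BROADCAST, new Runnable() {
        @Override public void run() {
            // Remove the listener reference we stored in the node-local map.
            GridLocalEventListener lsnr =
                (GridLocalEventListener)G.grid().nodeLocal().remove("lsnr");

            if (lsnr != null)
                G.grid().removeLocalEventListener(lsnr);
        }
    });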
    Note how easy it is in GridGain to execute any kind of code on all grid nodes (or any subset of nodes) without actually having to deploy anything. Now let's play with some basic cache operations and see what happens:
    // Create strongly typed cache projection to avoid casting.
    final GridCacheProjection<Integer, String> cache = 
        G.grid().cache().projection(Integer.class, String.class);
    
    // Store some values in cache.
    for (int i = 0; i < 10; i++)
        cache.put(i, "value-" + i);
    
    // Note that size may differ depending on whether cache
    // is replicated or partitioned.
    System.out.println("Cache size: " + cache.size());
    
    // Visit every cache element stored on local node.
    // Note that 'CI1' is just a type alias for the 'GridInClosure' type.
    cache.forEach(new CI1<GridCacheEntry<Integer, String>>() {
        @Override public void apply(GridCacheEntry<Integer, String> e) {
            // Peek at locally cached values.
            System.out.println("Visited locally cached entry: " + e.peek());
        }
    });
    
    // Collocate computations and data.
    for (int i = 0; i < 10; i++) {
        final int key = i;
    
        // Find primary node for a key.
        final UUID nodeId = cache.mapKeyToNode(key);
    
        // Execute your computations on nodes where the data is cached to avoid a 
        // potentially heavy operation of bringing data to the local node. 
        // This is called Collocation of Computations and Data.
        G.grid().node(nodeId).run(UNICAST, new Runnable() {
            @Override public void run() {
                System.out.println("Collocating computations and data " + 
                    "on node: " + nodeId);
    
                // Usually you would do something more complex than this :)
                System.out.println("Cached value: " + cache.peek(key));
            }
        });
    }
    
    // The 'get' operation will bring values from remote nodes
    // even if they are not cached on local node. Generally,
    // you would want to avoid it, if possible, as it may 
    // create unnecessary data traffic.
    for (int i = 0; i < 10; i++)
        System.out.println("Cached value: " + cache.get(i));
    The example above is just a small sample of what you can do with the GridGain data grid. Note that if the cache is configured to be replicated (which is the default), then data will be replicated to all nodes and every node will get the same copy. If the cache is partitioned, then only a designated primary node (and also backup nodes, if any) will cache a specific key-value pair.

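    If you are curious where individual keys actually live, you can reuse the same mapKeyToNode(..) call from the example above to print the primary node for each key. This is just a small sketch built from calls already shown; with a replicated cache every node holds every entry, while with a partitioned cache each key maps to its designated primary node.
    // Print the primary node for every key, reusing the 'cache' projection
    // and mapKeyToNode(..) from the example above.
    for (int i = 0; i < 10; i++)
        System.out.println("Primary node for key " + i + ": " + cache.mapKeyToNode(i));
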
    Also note how easily we brought our computations to the nodes where the data is cached, as opposed to bringing the data to the computations. Performing computations without any unnecessary data movement (a.k.a. data noise) is one of the most important elements in achieving better scalability.

    To run this example, start up a few standalone GridGain nodes by executing the GRIDGAIN_HOME/bin/ggstart.{sh|bat} script and watch what happens.

    Enjoy!

  5. GridGain will present at the NYC Scala Meetup on January 4th; all details are here: http://www.meetup.com/ny-scala/calendar/15665134/.

    The topic is “Distributed Computing with Scala and GridGain”. As always – lots of live coding, the world’s shortest MapReduce app, a Scala DSL for distributed programming, grid-enabled PingPong, and more...

    Hope to see you there!

