1. GridGain 2.0.3 has been released. For the most part it is a stability release that underwent a tremendous amount of testing and, as a result, brings a great improvement to the overall fault-tolerance and scalability of the product. We tested all sorts of scenarios where nodes kept joining and leaving the grid at will under significant load, and introduced a lot of performance improvements.

    However, this release does have a new feature I am very excited about - the Grid-Enabled ExecutorService, which executes all tasks submitted to it on remote grid nodes. Basically, you use it as you would normally use java.util.concurrent.ExecutorService, but you get all the cool GridGain features right out of the box, such as peer-class-loading, fault-tolerance, load balancing, job scheduling, collision resolution, etc.

    Here is a "Hello World" example that shows how simple it is to use it. Let's first create a simple java.util.concurrent.Callable that prints out a string and returns the number of characters in that string:

    public class ExampleCallable implements Callable<Integer>, Serializable {
        /** String argument. */
        private String arg;

        public ExampleCallable(String arg) {
            this.arg = arg;
        }

        public Integer call() {
            // Print out string passed in.
            System.out.println(arg);

            // Return number of characters.
            return arg.length();
        }
    }

    Now, let's execute our ExampleCallable on the grid:

    public final class GridExecutorExample {
        public static void main(String[] args) throws Exception {
            // Start grid node.
            GridFactory.start();

            try {
                Grid grid = GridFactory.getGrid();

                // Create new grid-enabled ExecutorService.
                ExecutorService exec = grid.newGridExecutorService();

                // Execute ExampleCallables on the grid.
                Future<Integer> hello = exec.submit(new ExampleCallable("Hello"));
                Future<Integer> world = exec.submit(new ExampleCallable("World"));

                // Print out number of characters from both executions.
                System.out.println("'Hello' character count: " + hello.get());
                System.out.println("'World' character count: " + world.get());

                // Close executor service.
                exec.shutdown();
            }
            finally {
                // Stop grid node.
                GridFactory.stop(true);
            }
        }
    }
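
    Since the returned executor implements the standard java.util.concurrent.ExecutorService interface, the usual bulk operations should work as well. Here is a minimal sketch (a fragment reusing the exec variable and ExampleCallable from the example above) that submits several callables at once with invokeAll(...); each callable may end up executing on a different grid node:

    // Submit several callables at once. invokeAll(...) blocks until
    // all of them have completed, on whichever nodes they ran.
    List<ExampleCallable> calls = Arrays.asList(
        new ExampleCallable("Hello"),
        new ExampleCallable("World"),
        new ExampleCallable("Grid"));

    for (Future<Integer> fut : exec.invokeAll(calls))
        System.out.println("Character count: " + fut.get());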

    To make it interesting, let's start a couple of standalone grid nodes by simply executing the gridgain.sh or gridgain.bat script in the GRIDGAIN_HOME/bin folder (you can start them on the same physical box if you like).

    Note that we don't need to do any deployment of our code to the grid. All required classes will be peer-class-loaded automatically.

    After running our example, you should observe that one node prints the word "Hello" and another node prints the word "World".

    Enjoy!

  2. When designing your tasks for execution on the grid, you need to decide whether or not to split your task into smaller jobs for parallel execution, and what the size of the split should be.

    When Not To Split

    People often think of a grid as infrastructure for executing long-running tasks. That is not always the case. Compute grids can add a lot of value even for quick, short-running jobs. Imagine, for example, that your application constantly needs to calculate, in real time, a bunch of statistical metrics and averages on a financial portfolio, say, for displaying them in a UI. Although every single calculation may not take too long, having all calculations performed concurrently on the same server or a thick client can bring it to its knees fairly quickly. A good solution would be to take every calculation and execute it on a separate grid node.

    The example above is a good case for when not to split your tasks. When deciding whether to split or not, you should take the local execution time into consideration. If your task can execute locally fast enough, then don't split it; run it on the grid as a whole. By doing that you get the following benefits:
    • You remove a single point of failure. If a grid node crashes in the middle of a calculation, then GridGain will automatically fail it over to another node.
    • You balance the load across your grid nodes. GridGain will automatically load balance your jobs to the least loaded nodes. You can also turn on job stealing and have less loaded nodes steal jobs from more loaded nodes.
    • You improve overall scalability of your application. Now you can add or remove grid nodes on demand whenever your application load peaks or slows down and, hence, keep the response times constant regardless of the load. For example, you can configure your grid to include more nodes into topology as application load grows.
    • You get GridGain's simplicity. Here is how simple it can be to execute some portfolio calculation on the grid. Note that all you have to do is attach the @Gridify annotation to your Java method and that's it!

    @Gridify
    public void calculatePortfolioPosition() {
        ...
    }

    How To Split

    Now, let's say you really need to split your task into smaller jobs in order to speed up execution. A good formula for deciding on the size of your split is to take the time your task takes to execute locally and divide it by the time you would like to achieve. So if your task executes in 2 seconds and you would like to achieve 100 milliseconds, then the number of jobs your task should split into is 2000 ms / 100 ms = 20. In reality, the execution time will be slightly more than 100 ms, as most likely your jobs will not be exactly equal, and there is also a slight network communication overhead.

    That is not to say that for this example you would only need to have 20 nodes in the grid. Ideally you should have as many nodes as your application needs in order to handle the load - let GridGain pick the 20 most available nodes for execution of the individual jobs within your task.
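
    To make the arithmetic concrete, here is a minimal plain-Java sketch (the SplitSizing class and its helper names are hypothetical, not part of the GridGain API) that computes the number of jobs from the measured local time and the target time, and partitions the work items into that many chunks; each chunk would then become one job handed to the grid:

    import java.util.ArrayList;
    import java.util.List;

    public final class SplitSizing {
        /** Number of jobs = local execution time / desired execution time. */
        public static int splitSize(long localMs, long desiredMs) {
            // E.g. 2000 ms / 100 ms = 20 jobs.
            return (int)Math.max(1, localMs / desiredMs);
        }

        /** Partition work items into roughly equal chunks, one per job. */
        public static <T> List<List<T>> partition(List<T> items, int jobs) {
            List<List<T>> chunks = new ArrayList<List<T>>(jobs);

            // Round the chunk size up so no item is left out.
            int chunkSize = (items.size() + jobs - 1) / jobs;

            for (int i = 0; i < items.size(); i += chunkSize)
                chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));

            return chunks;
        }
    }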

    For more information, visit our Wiki or watch the Grid Application In 15 Minutes screencast.

     


  3. We sometimes get questions from users on how to implement the Master-Worker pattern within the peer-to-peer (P2P) architecture in GridGain. When designing our API and our deployment model, we purposely went with a P2P architecture because we wanted to have ultimate freedom in how a grid node is used. As a result, in GridGain a node can act as a master, a worker, or both, depending on your configuration. Moreover, you don't even have to change a single line of code to get this to work.

    The following example shows how it can be done. In GridGain, every node has a notion of user attributes, which it gets at startup. Here is an example that shows how a node can get a "worker" attribute from a Spring XML configuration file at startup:

    <bean
        id="grid.cfg"
        class="org.gridgain.grid.GridConfigurationAdapter"
        scope="singleton">
        ...
        <property name="userAttributes">
            <map>
                <!-- Make this node a worker node. -->
                <entry key="segment.worker" value="true"/>
            </map>
        </property>
        ...
    </bean>

    Now, we need to make sure that only worker nodes are passed into the GridTask.map(...) method on the master nodes. To do this, on the master nodes we need to configure GridAttributesTopologySpi, the purpose of which is to filter nodes based on their attributes. Here is what the configuration will look like:

    <bean
        id="grid.cfg"
        class="org.gridgain.grid.GridConfigurationAdapter"
        scope="singleton">
        ...
        <property name="topologySpi">
            <bean class="org.gridgain.grid.spi.topology.attributes.GridAttributesTopologySpi">
                <property name="attributes">
                    <map>
                        <!-- Include only worker nodes. -->
                        <entry key="segment.worker" value="true"/>
                    </map>
                </property>
            </bean>
        </property>
        ...
    </bean>

    That's it! To verify that it works, we can add an assertion to our GridTask implementation to make sure that all included nodes are indeed "worker" nodes, as follows:

    public class FooBarGridTask
        extends GridTaskAdapter<String, String> {
        ...
        public Map<GridJob, GridNode> map(
            List<GridNode> topology, String arg) {

            Map<GridJob, GridNode> jobs =
                new HashMap<GridJob, GridNode>(topology.size());

            for (GridNode node : topology) {
                String workerAttr =
                    node.getAttribute("segment.worker");

                // Assert that worker attribute is present and
                // is assigned value "true".
                assert workerAttr != null;
                assert Boolean.parseBoolean(workerAttr);

                jobs.put(new FooBarWorkerJob(arg), node);
            }

            return jobs;
        }
        ...
    }

    Note that although we only segmented the grid into two segments, masters and workers, you can configure as many segments as you like by providing additional node attributes. For example, you can have several worker groups, each responsible for processing only a certain subset of jobs (see the sketch below). Take a look at the Segmenting Grid Nodes article on our Wiki for additional examples.
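
    As a minimal sketch (the segment.worker.group attribute name and the statistics value are hypothetical, not anything predefined by GridGain), a node could declare which worker group it belongs to through an additional user attribute, and the GridAttributesTopologySpi configuration on the corresponding master would then filter on that attribute as well:

    <property name="userAttributes">
        <map>
            <!-- This node is a worker dedicated to the "statistics" group. -->
            <entry key="segment.worker" value="true"/>
            <entry key="segment.worker.group" value="statistics"/>
        </map>
    </property>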

    Enjoy!


  4. GridGain vs. Hadoop

    I've been repeatedly asked how GridGain relates to Hadoop. Having answered this question over and over again, I've compacted the answer to just a few words:
    We love Hadoop HDFS, but we feel sorry for the people who have to use Hadoop MapReduce.
    Let me explain...

    Hadoop HDFS

    We love Hadoop HDFS. It is a new and improved version of the enterprise tape drive. It is an excellent technology for storing historically large data sets (TB and PB scale) in distributed disk-based storage. Essentially, every computer in a Hadoop cluster contributes a portion of its disk(s) to Hadoop HDFS, and you get a unified view of this large virtual file system.

    It has its shortcomings too, such as slow performance, the complexity of ETL, the inability to update a file that has already been written, and the inability to deal effectively with small files - but some of these are expected, and since the project is still in development, some of these issues will be mitigated in the future. Still, today HDFS is probably the most economical way to keep a very large static data set of TB and PB scale in a distributed file system for long-term storage.

    GridGain provides several integration points for HDFS, such as a dedicated data loader and cache loaders. The dedicated data loader allows data to be bulk-loaded into the In-Memory Data Grid, while the cache loaders allow for much more fine-grained transactional loading and storing of data to and from HDFS. The fact that many clients use GridGain with HDFS is a good litmus test for that integration.

    Hadoop MapReduce

    As much as we like Hadoop HDFS, we think Hadoop's implementation of MapReduce processing is inadequate and outdated.
    Would you run your analytics today off tape drives? That is effectively what you do when you use Hadoop MapReduce.
    The fundamental flaw in Hadoop MapReduce is the assumption that a) storing data and b) acting upon data should be based on the same underlying storage.

    Hadoop MapReduce runs jobs over the data stored in HDFS and thus inherits, and even amplifies, all the shortcomings of HDFS: extremely slow performance and a disk-based storage model that leads to a heavy batch orientation, which in turn leads to an inability to effectively process low-latency tasks... all of which ultimately makes Hadoop MapReduce the "elephant in the room" when it comes to its inability to deliver real-time big data processing.

    Yet one of the most glaring shortcomings of Hadoop MapReduce is that you will never be able to run your jobs over live data. HDFS by definition requires some sort of ETL process to load data from traditional online/transactional (i.e. OLTP) systems into HDFS. By the time the data is loaded (hours, if not days, later), the very data you are going to run your jobs over is... stale or, frankly, dead.

    GridGain

    GridGain's MapReduce implementation addresses many of these problems. We keep both highly transactional and unstructured data smartly cached in an extremely scalable In-Memory Data Grid and provide the industry's first fully integrated In-Memory Compute Grid, which allows you to run MapReduce or distributed SQL/Lucene queries over the data in memory.

    Having both data and computations co-located in memory makes low-latency or streaming processing simple.

    You can, of course, still keep any data for the long term in underlying SQL, ERP, or Hadoop HDFS storage when using GridGain - and GridGain intelligently supports any type of long-term storage.

    Yet GridGain doesn't force you to use the same storage for processing data - we give you the choice to use the best of both worlds: keep data in memory for processing, and keep data in HDFS for long-term storage.


    GridGain - Real Time Big Data from GridGain Systems on Vimeo.

  5. GridGain and Grid Dynamics announced a partnership today.

    We are looking forward to working together with Grid Dynamics, as there is a lot of synergy between the two companies. Grid Dynamics brings to the table broad grid computing expertise which, together with the GridGain open source product and professional support, will help us deliver best-of-breed, cost-effective solutions to our clients.

    You can see the full press release here.
