1. In a nutshell, Grid Computing is a way to distribute your computations across multiple computers (nodes). That definition alone is too broad, though: JMS also moves work between machines, yet JMS is a messaging API, not a grid computing product. To classify Grid Computing products correctly, we have to split them into two categories: Compute Grids and Data Grids.

    Compute Grid
    Compute Grids allow you to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel. The obvious benefit is that your computation finishes faster, because it can now use resources from all grid nodes at once. One of the most common design patterns for parallel execution is MapReduce. However, Compute Grids are useful even if you don't need to split your computation: they improve the overall scalability and fault-tolerance of your system by offloading computations onto the most available nodes. Some of the "must have" Compute Grid features are:
    • Automatic Deployment - deploys classes and resources onto the grid automatically, without any extra steps from the user. This feature alone provides one of the largest productivity boosts in distributed systems. Users can usually just execute a task from one grid node and, as task execution spreads through the grid, all classes and resources are deployed automatically.
    • Topology Resolution - lets you provision nodes based on any node characteristic or user-specific configuration. For example, you can decide to include only Linux nodes for execution, or only a certain group of nodes within a certain time window. You should also be able to choose all nodes with CPU load under, say, 50% that have more than 2GB of available heap memory.
    • Collision Resolution - lets users control which jobs get executed, which jobs get rejected, how many jobs can execute in parallel, the order of overall execution, etc.
    • Load Balancing - lets you properly balance your system load within the grid. The range of load balancing policies varies between products. Some of the most common ones are Round Robin, Random, and Adaptive. More advanced vendors also provide Affinity Load Balancing, where grid jobs always end up on the same node based on the job's affinity key. This policy works well with Data Grids, described below.
    • Fail-over - grid jobs should automatically fail over to other nodes if a node crashes or a job fails for some other reason.
    • Data Grid Integration - it is important that Compute Grids integrate natively with Data Grids, as businesses quite often need both computational and data features working within the same application.
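
    To make Topology Resolution and Load Balancing a little more concrete, here is a minimal plain-JDK sketch. It is not the API of any grid product: NodeInfo, GridSketch and all method names are hypothetical, and the node metrics are simply passed in by hand. It filters nodes with the example criteria from the list (Linux, CPU under 50%, more than 2GB of free heap) and then picks a node either in Round Robin order or by affinity key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical node description; real products expose far richer metrics. */
final class NodeInfo {
    final String id;
    final String os;
    final double cpuLoad;      // 0.0 (idle) to 1.0 (fully loaded)
    final long freeHeapBytes;

    NodeInfo(String id, String os, double cpuLoad, long freeHeapBytes) {
        this.id = id;
        this.os = os;
        this.cpuLoad = cpuLoad;
        this.freeHeapBytes = freeHeapBytes;
    }
}

/** Illustration only; none of these methods belong to a real grid API. */
final class GridSketch {
    private static final long GB = 1024L * 1024L * 1024L;

    /** Topology resolution: keep only the nodes that match the filter. */
    static List<NodeInfo> resolve(List<NodeInfo> all, Predicate<NodeInfo> filter) {
        List<NodeInfo> selected = new ArrayList<>();
        for (NodeInfo node : all) {
            if (filter.test(node)) {
                selected.add(node);
            }
        }
        return selected;
    }

    /** Example filter: Linux, CPU under 50%, more than 2GB of free heap. */
    static Predicate<NodeInfo> lightlyLoadedLinux() {
        return n -> n.os.equals("Linux") && n.cpuLoad < 0.5 && n.freeHeapBytes > 2 * GB;
    }

    /** Round Robin: the i-th job goes to the i-th node, wrapping around. */
    static NodeInfo roundRobin(List<NodeInfo> nodes, int jobIndex) {
        return nodes.get(Math.floorMod(jobIndex, nodes.size()));
    }

    /** Affinity: the same key maps to the same node while the node list is unchanged. */
    static NodeInfo affinity(List<NodeInfo> nodes, Object affinityKey) {
        return nodes.get(Math.floorMod(affinityKey.hashCode(), nodes.size()));
    }
}
```

    For example, GridSketch.resolve(allNodes, GridSketch.lightlyLoadedLinux()) returns the eligible nodes, and GridSketch.affinity(eligible, customerId) then keeps all jobs for one customer on one node, which is what makes this policy pair well with a partitioned Data Grid.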

    Some Compute Grid vendors:

    GridGain - Professional Open Source
    JPPF - Open Source
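
    The split/execute/reduce flow that Compute Grids build on can be sketched on a single machine with a plain JDK thread pool. This is only an analogy, assuming threads stand in for grid nodes; SplitReduceSketch and countWords are hypothetical names, and a real Compute Grid would add deployment, load balancing and fail-over on top.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Single-JVM analogy of split/execute/reduce; threads stand in for grid nodes. */
final class SplitReduceSketch {
    /** Counts the words of every line in parallel, then sums the partial counts. */
    static int countWords(List<String> lines, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Split: one job per line.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String line : lines) {
                jobs.add(() -> {
                    String trimmed = line.trim();
                    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
                });
            }

            // Execute: invokeAll blocks until every job has finished.
            List<Future<Integer>> results = pool.invokeAll(jobs);

            // Reduce: merge the partial results into one answer.
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```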

    Data Grid
    Data Grids allow you to distribute your data across the grid. Most of us are used to the term Distributed Cache rather than Data Grid (Data Grid does sound more savvy, though). The main goal of a Data Grid is to serve as much data as possible from memory on every grid node while ensuring data coherency. Some of the important Data Grid features include:
    • Data Replication - all data is fully replicated to all nodes in the grid. This strategy consumes the most resources, but it is the most effective solution for Read-Mostly scenarios, as data is available everywhere for immediate access.
    • Data Invalidation - in this scenario, nodes load data on demand. Whenever data changes on one of the nodes, the same data on all other nodes is purged (invalidated). It is then loaded on demand the next time it is accessed.
    • Distributed Transactions - transactions are required to ensure Data Coherency. Cache updates must work just like database updates: whenever an update fails, the whole transaction must be rolled back. Most Data Grids support various Transaction Policies, such as Read Committed, Write Committed, Serializable, etc.
    • Data Backups - useful for fail-over. Some Data Grid products provide the ability to assign backup nodes for the data. This way, whenever a node crashes, the data is immediately available from another node.
    • Data Affinity/Partitioning - data affinity allows you to split (partition) your whole data set into multiple subsets and assign every subset to a grid node. In the purest form, data is not replicated between nodes at all, and every node is responsible only for its own subset of data. However, various Data Grid products may provide different flavors of Data Affinity, such as replication only to backup nodes.

      Data Affinity is one of the more advanced features, and is not provided by every vendor. To my knowledge, among commercial vendors Oracle Coherence and GemStone have it (there may be others). In the Professional Open Source space you can take a look at the combination of GridGain with Affinity Load Balancing and JBossCache.
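
    As a rough illustration of Data Affinity with backups, the sketch below maps a key to one primary node plus a configurable number of backup nodes. It is a deliberately naive scheme, assuming a fixed node list and simple modulo hashing; PartitionSketch and ownersFor are hypothetical names and do not reflect how any of the products mentioned here implement partitioning.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive key-to-node mapping; illustration only. */
final class PartitionSketch {
    /**
     * Returns the nodes that hold the given key: the primary first, then the backups.
     * The key's hash picks the primary; the following nodes in list order act as backups.
     */
    static List<String> ownersFor(Object key, List<String> nodes, int backups) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("At least one node is required");
        }
        int primary = Math.floorMod(key.hashCode(), nodes.size());
        // Never ask for more copies than there are nodes.
        int copies = Math.min(nodes.size(), 1 + Math.max(0, backups));

        List<String> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add(nodes.get((primary + i) % nodes.size()));
        }
        return owners;
    }
}
```

    With backups set to 0 this is the "purest form" described above, where each key lives on exactly one node. One known weakness of plain modulo hashing is that most keys move to a different node whenever the node list changes, so treat this strictly as a teaching aid.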
    Some Data Grid/Cache vendors:

    Oracle Coherence - Commercial
    GemStone - Commercial
    GigaSpaces - Commercial
    JBossCache - Professional Open Source
    EhCache - Open Source
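
    The Data Invalidation strategy described above can also be sketched in a few lines. This is a single-threaded toy model, assuming an in-memory map stands in for the database and direct method calls stand in for network messages; InvalidationSketch, Store and Node are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of invalidation-based caching; not thread-safe, illustration only. */
final class InvalidationSketch {
    /** Shared backing store standing in for a database. */
    static final class Store {
        final Map<String, String> data = new HashMap<>();
        int loads; // how many times a node had to go to the store
    }

    static final class Node {
        private final Store store;
        private final List<Node> cluster; // all nodes, including this one
        private final Map<String, String> local = new HashMap<>();

        Node(Store store, List<Node> cluster) {
            this.store = store;
            this.cluster = cluster;
            cluster.add(this);
        }

        /** Loads on demand: only a local miss goes to the backing store. */
        String get(String key) {
            if (!local.containsKey(key)) {
                store.loads++;
                local.put(key, store.data.get(key));
            }
            return local.get(key);
        }

        /** Writes through to the store, then purges the key on every other node. */
        void put(String key, String value) {
            store.data.put(key, value);
            local.put(key, value);
            for (Node peer : cluster) {
                if (peer != this) {
                    peer.local.remove(key);
                }
            }
        }
    }
}
```

    After one node calls put, every other node misses on its next get for that key and reloads the fresh value, which is the "purge, then load on demand" behavior described in the feature list.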


  2. Recently I had to help a client who is using GridGain to process XML messages from external sources. The business case is that multiple external sources send XML messages that take a while to process, and a grid architecture was chosen to provide timely parsing/processing and to ensure overall redundancy and automatic fail-over. XStream was chosen as an XML-to-object mapper.

    The best thing about this solution was that even though XStream does need some initialization, with GridGain you are able to start up bare stand-alone remote grid nodes and deploy XStream resources onto remote nodes using Peer Class Loading automatically. This can be achieved with @GridUserResource injection in GridGain.

    Here is how the code for the deployed XStream resource looks. Note the @GridUserResourceOnDeployed and @GridUserResourceOnUndeployed annotations that control the resource lifecycle on remote node.

    public class GridXStreamResource {
        // XStream instance used for parsing.
        private XStream xstream = null;

        // Gets fully initialized instance of XStream.
        public XStream getXStream() {
            return xstream;
        }

        // Callback invoked once whenever a task
        // is deployed on a processing grid node.
        @GridUserResourceOnDeployed
        private void onDeployed() {
            xstream = new XStream();

            // Initialize for processing certain XML bean classes.
            xstream.processAnnotations(MyXmlBean1.class);
            xstream.processAnnotations(MyXmlBean2.class);
        }

        // Destructs the resource.
        @GridUserResourceOnUndeployed
        private void onUndeployed() {
            // Give to GC.
            xstream = null;
        }
    }
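
    If you are wondering how a container can drive lifecycle callbacks like the ones above, here is a generic sketch using plain Java reflection. It does not show GridGain's actual implementation, which this post does not cover; the OnDeployed and OnUndeployed annotations, the fire method and ParserResource are all hypothetical stand-ins.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Generic annotation-driven lifecycle; all names are hypothetical. */
final class LifecycleSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface OnDeployed {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface OnUndeployed {}

    /** Invokes every method of the resource that carries the given marker annotation. */
    static void fire(Object resource, Class<? extends Annotation> marker) {
        for (Method method : resource.getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(marker)) {
                try {
                    method.setAccessible(true); // callbacks may be private
                    method.invoke(resource);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Callback failed: " + method.getName(), e);
                }
            }
        }
    }

    /** Stand-in for a resource such as the XStream holder above. */
    static final class ParserResource {
        private StringBuilder parser; // placeholder for an expensive object

        boolean isReady() {
            return parser != null;
        }

        @OnDeployed
        private void start() {
            parser = new StringBuilder();
        }

        @OnUndeployed
        private void stop() {
            parser = null;
        }
    }
}
```

    A container built this way would call LifecycleSketch.fire(resource, LifecycleSketch.OnDeployed.class) once after creating the resource, and the OnUndeployed counterpart before discarding it.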

    Below is the code for GridTask and GridJob that will automatically deploy all code including GridXStreamResource onto remote nodes.

    public class XmlProcessingTask extends GridTaskSplitAdapter<byte[], Boolean> {
        @Override
        protected Collection<? extends GridJob> split(int gridSize, byte[] payload) {
            // We only had to create one job per message without splitting it.
            return Collections.singletonList(new GridJobAdapter<byte[]>(payload) {
                // Inject XStream instance used for parsing messages.
                @GridUserResource
                private transient GridXStreamResource xstreamRsrc = null;

                public Serializable execute() {
                    byte[] payload = getArgument();

                    // Parse XML message.
                    MyXmlBean1 bean = (MyXmlBean1)xstreamRsrc.getXStream()
                        .fromXML(new ByteArrayInputStream(payload));

                    // Add processing logic here.

                    return true;
                }
            });
        }

        public Boolean reduce(List<GridJobResult> gridJobResults) {
            // Simply return result from remote job.
            return gridJobResults.get(0).getData();
        }
    }

    Here is the code that executes the above task on the grid:

    // XML payload received from external source.
    byte[] payload = ...;

    GridTaskFuture<Boolean> future = grid.execute(XmlProcessingTask.class, payload);

    // Wait for result.
    boolean success = future.get();

    Now, to run it, simply start up standalone GridGain nodes by executing the gridgain.sh or gridgain.bat script that comes with the installation and watch your code automatically deploy and execute on remote nodes.

    Enjoy!

     


  3. We have been getting many requests about fully wiring GridGain from Spring using dependency injection. Up until now, GridGain was configured from Spring, but to get an instance of Grid you had to call GridFactory directly.

    For example, one way to configure GridGain is as follows:

    Grid grid = GridFactory.start(configPath);

    GridTaskFuture future = grid.execute(MyTask.class, myArg);

    where configPath is the path to a Spring XML configuration file containing a GridConfigurationAdapter bean. However, this approach requires invoking a static factory method, which makes it very inconvenient to reference the Grid instance from within other Spring beans.

    With GridGain 2.1, which is planned for release within 2-3 weeks, we added GridSpringBean, which is a fully initialized instance of Grid. Here is what the configuration will look like:

    <!-- Example of bean definition with given configuration. -->
    <bean id="mySpringBean" class="org.gridgain.grid.GridSpringBean"
        scope="singleton">
        <property name="configuration">
            <bean id="grid.cfg" class="org.gridgain.grid.GridConfigurationAdapter"
                scope="singleton">
                <property name="gridName" value="mySpringGrid"/>
            </bean>
        </property>
    </bean>

    The above example still configures GridConfigurationAdapter (with all defaults apart from the grid name, which is why it is nearly empty), but this adapter is now part of GridSpringBean, which implements the Grid interface. Note that by virtue of implementing Spring's InitializingBean and DisposableBean interfaces, the grid will be started and stopped automatically whenever the ApplicationContext is initialized or destroyed.

    Here is an example of how it could be used:

    AbstractApplicationContext ctx = new FileSystemXmlApplicationContext(
    "/path/to/spring/file");

    // We register Spring shutdown hook to provide
    // automatic beans destruction by Spring.
    ctx.registerShutdownHook();

    // Get Grid from Spring.
    Grid grid = (Grid)ctx.getBean("mySpringBean");

    // Execute your task.
    GridTaskFuture<Integer> future = grid.execute(MyTask.class, myArg);

    // Wait for task completion.
    future.get();
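
    Because GridSpringBean implements the Grid interface, it should also be possible to inject it into your own beans instead of looking it up from the context. Here is a minimal sketch of such wiring, assuming a hypothetical application class com.example.ReportService with a setGrid(Grid) setter (neither is part of GridGain):

```xml
<!-- Hypothetical application bean; only "mySpringBean" comes from the configuration above. -->
<bean id="reportService" class="com.example.ReportService">
    <!-- Calls the assumed setGrid(Grid) setter with the grid bean. -->
    <property name="grid" ref="mySpringBean"/>
</bean>
```

    This addresses the inconvenience of the static factory call, since no application code has to reference GridFactory.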

    Enjoy!

     

