In a nutshell, Grid Computing is a way to distribute your computations across multiple computers (nodes). However, even JMS does that, but JMS is not a grid computing product - it's a messaging protocol. To correctly classify Grid Computing products we have to split them into 2 categories: Compute Grids and Data Grids.
Compute Grids allow you to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel. The obvious benefit here is that your computation will perform faster as it now can use resources from all grid nodes in parallel. One of the most common design patterns for parallel execution is MapReduce. However, Compute Grids are useful even if you don't need to split your computation - they help you improve overall scalability and fault-tolerance of your system by offloading your computations onto most available nodes. Some of the "must have" Compute Grid features are:
- Automatic Deployment - allows for automatic deployment of classes and resources onto grid without any extra steps from user. This feature alone provides one of the largest productivity boosts in distributed systems. Users usually are able to simply execute a task from one grid node and as task execution penetrates the grid, all classes and resources are also automatically deployed.
- Topology Resolution - allows to provision nodes based on any node characteristic or user-specific configuration. For example, you can decide to only include Linux nodes for execution, or to only include a certain group of nodes within certain time window. You should also be able to choose all nodes with CPU loaded, say, under 50% that have more than 2GB of available Heap memory.
- Collision Resolution - allows users to control which jobs get executed, which jobs get rejected, how many jobs can be executed in parallel, order of overall execution, etc.
- Load Balancing - allows to balance properly balance your system load within grid. Usually range of load balancing policies varies within products. Some of the most common ones are Round Robin, Random, or Adaptive. More advanced vendors also provide Affinity Load Balancing where grid jobs always end up on the same node based on job's affinity key. This policy works well with Data Grids described below.
- Fail-over - grid jobs should automatically fail-over onto other nodes in case of node crash or some other job failure.
- Data Grid Integration - it is important that Compute Grid are able to natively integrate with Data Grids as quite often businesses will need both, computational and data features working within same application.
Some Compute Grid vendors:
GridGain - Professional Open Source
JPPF - Open Source
Data Grids allow you to distribute your data across the grid. Most of us are used to the term Distributed Cache rather than Data Grid (data grid does sound more savvy though). The main goal of Data Grid is to provide as much data as possible from memory on every grid node and to ensure data coherency. Some of the important Data Grid features include:
- Data Replication - all data is fully replicated to all nodes in the grid. This strategy consumes the most resources, however it is the most effective solution for Read-Mostly scenarios, as data is available everywhere for immediate access.
- Data Invalidation - in this scenario, nodes load data on demand. Whenever data changes on one of the nodes, then the same data on all other nodes is purged (invalidated). Then this data will be loaded on-demand the next time it is accessed.
- Distributed Transactions - transactions are required to ensure Data Coherency. Cache updates must work just like database updates - whenever an update failed, then the whole transaction must be rolled back. Most Data Grid support various Transaction Policies, such as Read Committed, Write Committed, Serializable, etc...
- Data Backups - useful for fail-over. Some Data Grid products provide ability to assign backup nodes for the data. This way whenever a node crashes, the data is immediately available from another node.
- Data Affinity/Partitioning - data affinity allows you to split/partition your whole data set into multiple subsets and assign every subset to a grid node. In the purest form, data is not replicated between nodes at all, every node is only responsible for it's own subset of data. However, various Data Grid products may provide different flavors of Data Affinity, such as replication only to back up nodes for example.
Data Affinity is one of the more advanced features, and is not provided by every vendor. To my knowledge, out of commercial vendors Oracle Coherence and GemStone have it (there may be others). In Professional Open Source space you can take a look at combination of GridGain With Affinity Load Balancing and JBossCache.
Oracle Coherence - Commercial
GemStone - Commercial
GigaSpaces - Commercial
JBossCache - Professional Open Source
EhCache - Open Source