Thursday, March 26, 2009

Auto-Searches On The Cloud

In GridGain 3.0 we are introducing a set of useful annotations that automate grid-enabling of common functionality that works on ranges and collections. The idea is that, given a collection, GridGain can automatically split it into sub-collections, send them to remote nodes for execution, get the results back, reduce them, and return the final result to the user. Sort of an automatic map-reduce for collections.

Let's take search, for example. You can model almost any search as taking a collection of values, picking the right value in that collection, and returning that value. For example, let's assume that we need to find the max value in a collection (forgive the simplicity of the example; in real life you would probably be grid-enabling searches that are a lot more complex than this one).

In Java this would look like the following:


public Integer findMax(Collection<Integer> vals) {
    Integer max = Collections.max(vals);

    return max;
}

If we were to grid-enable the above functionality, we would have to do the following:
  1. Map Step: Split the initial collection into a number of sub-collections.
  2. Send each sub-collection to a remote node.
  3. Have every remote node find the max value in the sub-collection assigned to it and return it.
  4. Reduce Step: Find the maximum of all values returned from the remote nodes and return it to the user.

As simple as it is, you can apply the same kind of steps to many other searches you do in real life.
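
To make the four steps concrete, here is a minimal local sketch in plain Java. It is not GridGain code: worker threads in a thread pool stand in for remote nodes, and the names LocalMapReduceMax, split and the nodes parameter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Local stand-in for the four steps: pool threads play the role of remote nodes.
// Nothing here is GridGain API; all names are hypothetical.
final class LocalMapReduceMax {

    // Step 1 (map): split the input into at most `parts` non-empty sub-collections.
    static List<List<Integer>> split(Collection<Integer> vals, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<Integer> all = new ArrayList<>(vals);
        int chunk = Math.max(1, (all.size() + parts - 1) / parts); // ceiling division
        List<List<Integer>> subs = new ArrayList<>();
        for (int i = 0; i < all.size(); i += chunk) {
            subs.add(new ArrayList<>(all.subList(i, Math.min(i + chunk, all.size()))));
        }
        return subs;
    }

    static Integer findMax(Collection<Integer> vals, int nodes) {
        if (vals.isEmpty()) {
            throw new IllegalArgumentException("cannot take the max of an empty collection");
        }
        List<List<Integer>> subs = split(vals, nodes);
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            // Steps 2 and 3: hand each sub-collection to a "node" that computes its local max.
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (List<Integer> sub : subs) {
                futures.add(CompletableFuture.supplyAsync(() -> Collections.max(sub), pool));
            }
            // Step 4 (reduce): collect the partial results and take the max of those.
            List<Integer> partial = new ArrayList<>();
            for (CompletableFuture<Integer> f : futures) {
                partial.add(f.join());
            }
            return Collections.max(partial);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Splits into [7, 42], [3, 19], [8, 23]; partial maxes are 42, 19, 23.
        System.out.println(findMax(Arrays.asList(7, 42, 3, 19, 8, 23), 3)); // prints 42
    }
}
```

The reduce step works here because the max of the partial maxes equals the max of the whole collection. Any search you want to split this way needs the same property.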

To do this search in GridGain 3.0, all you would have to do is attach the @GridifySetToValue(recursive=true) annotation to your method and you are done:

@GridifySetToValue(recursive=true)
public Integer findMax(Collection<Integer> vals) {
    Integer max = Collections.max(vals);

    return max;
}

By virtue of attaching a single annotation, you are basically telling GridGain to perform steps 1 to 4 described above automatically. On top of that, with GridGain's peer-class-loading functionality, no code needs to be explicitly deployed to remote nodes at all. Simply bring up several GridGain images on a cloud and they are ready to start computing whatever you throw at them.

In the coming weeks I will show how some other annotations can be used to automate grid-enabling of other common tasks we encounter on a daily basis.

I should also mention that you can achieve the above with relative ease on the current version of GridGain (you would have to do some of the steps manually though).

Stay tuned for GridGain 3.0, scheduled for release this summer.

 

4 comments:

Sergio Bossa said...

Hi Dmitriy,

it seems GridGain sends all sub-collections from one node to other remote nodes through the network: isn't that a questionable practice?

What about data affinity strategies?

dsetrakyan said...

Hi Sergio.

In this example GridGain sends only sub-collections, but in general, you can use GridGain for any MapReduce purpose. GridGain basically allows you to take any task, split it into jobs, send them to remote nodes and reduce the results.

As far as affinity goes, GridGain does support node affinity. As a matter of fact, with GridGain you can take any cache that does not support affinity, e.g. JBoss Cache, and turn it into a partitioned cache with data affinity. Take a look at our GridAffinityLoadBalancingSpi and at our JBoss Cache examples on our Wiki.

Best.

Sergio Bossa said...

I know that.
I successfully used GridGain in several projects :)

I was just referring to your particular example, which doesn't seem to take into account any data affinity strategy, given that you transfer the sub-collections over the network.

Cheers,

Sergio B.

dsetrakyan said...

Hi Sergio.

I think the approach above is meant to simplify some more generic cases, where data affinity does not matter.

If your algorithm requires data affinity, then you should switch back to the standard GridGain MapReduce approach.

Best.