
I am pleased to announce that we just recently released
GridGain 3.0.4. The last couple of releases have been focused, among other things, around convenient and effective
collocation of computations and data, and also
grouping of data that is usually accessed together on the same nodes. Sending computations exactly to the nodes where the accessed data is residing is one of the key components in achieving better scalability. Without collocation, nodes fetch various data from other nodes for brief periods of time, just to perform often a quick computation and discard it almost immediately thereafter. This creates unnecessary data traffic, a.k.a.
data noise, and can at times bring a server to its knees.
In my
previous blog post I showed how to collocate computations and data using direct API via
GridCache.mapKeyToNode(..) method. We have also added analogous methods on
Grid API to provide capability of finding data affinity on the nodes that do not cache any data themselves. In our latest 3.0.4 release we have also added a very convenient way to provide collocation via
@GridCacheAffinityMapped annotation.
Say you have 2 types of objects,
Person and
Company. Multiple persons can work for the same company. This means that you generally may wish to access
Person objects together with the
Company for which they work. To do that in a scalable fashion, you may wish to ensure that all people working for the same company are cached on the same node. This way you can send computations to that node and access multiple people from the same company locally. Here is how it can be done in GridGain.
public class PersonKey {
// Person ID used to identify a person.
private String personId;
// Company ID which will be used for data affinity.
@GridCacheAffinityMapped
private String companyId;
...
}
...
// Instantiate person keys with same company ID.
Object personKey1 = new PersonKey("myPersonId1", "myCompanyId");
Object personKey2 = new PersonKey("myPersonId2", "myCompanyId");
// Both, the company and the person objects will be cached on the same node.
cache.put("myCompanyId", new Company(..));
cache.put(personKey1, new Person(..));
cache.put(personKey2, new Person(..));
Now, if you want to perform a computation which involves multiple people working for the same company, all you have to do is send a grid job to the node where those people are cached. Here is how you would send a computation to the node which caches all people for the company with ID
"myCompanyId".
G.grid().run(GridClosureCallMode.BALANCE, new Runnable() {
// This annotation specifies that computation should be routed
// precisely to the node where all objects with affinity key
// of 'myCompanyId' are cached.
@GridCacheAffinityMapped
private String companyId = "myCompanyId";
@Override public void run() {
// Some computation logic here.
...
}
};
Now, when you properly collocate all your data within your data grid and then route your computations to the nodes where your data is cached, all cache operations become
LOCAL, hence achieving best performance and scalability without any
data noise. Kind of goes inline with the
first rule of distributed programming, which is
DO NOT DISTRIBUTE.
View comments