Many of us have already heard the terms Data Grid and
Data Fabric, but neither term has been well defined in the
industry. In this blog, I will try to add some clarity to both terms by
outlining the main features of data grids and data fabrics.
What is a Data Grid
Often when giving meetup presentations about Apache Ignite, I ask the crowd whether anyone has heard of a Data Grid. I usually get only a few hands. However, when I flip the question and ask what Distributed Caching is, everyone in the room immediately raises their hands and nods in understanding. The reality is that a Data Grid can be viewed as a Distributed Cache with extra features, so if you know what a Distributed Cache is, you probably already know a lot about Data Grids as well.
Generally, the term distributed
cache means the ability to replicate data in memory, so that it is accessible from
anywhere in the cluster. Data Grids usually accomplish this by partitioning
data in memory, where each cluster member is responsible only for its own
subset of the data. You can also think of it as a distributed hash table. This
way, the more servers you have in your cluster, the more data you can
cache.
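To make the distributed hash table analogy concrete, here is a minimal single-process sketch of partitioning. It is only an illustration: the `PartitionedCache` class, the node names, and the hash-modulo placement rule are assumptions made for this example and are not how Apache Ignite or any other product is implemented.

```python
import zlib


class PartitionedCache:
    """Toy model of a partitioned cache; each dict stands in for one server's memory."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.nodes = {name: {} for name in self.node_names}

    def owner(self, key):
        # A stable hash of the key decides which node is responsible for it.
        # (Hypothetical rule: CRC32 modulo the number of nodes.)
        index = zlib.crc32(str(key).encode("utf-8")) % len(self.node_names)
        return self.node_names[index]

    def put(self, key, value):
        # The entry is stored only on the node that owns the key.
        self.nodes[self.owner(key)][key] = value

    def get(self, key):
        # Any caller can find the entry by recomputing the owner.
        return self.nodes[self.owner(key)].get(key)


cache = PartitionedCache(["node-a", "node-b", "node-c"])
for i in range(1000):
    cache.put(f"user:{i}", {"id": i})

# Each node holds only its own subset; together they hold everything.
sizes = {name: len(data) for name, data in cache.nodes.items()}
print(sizes)
print(cache.get("user:42"))
```

Because every entry lives on exactly one node, adding nodes adds capacity. A plain modulo rule like this one would move most keys whenever the node count changes, so real data grids generally use more careful placement schemes.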
Data grids are generally known for having a fairly rich
feature set on top of in-memory caches. The 3 main features that are absolutely
mandatory for any data grid solution are:
- distributed transactions
- distributed queries
- collocation of compute and data
Other commonly seen features include:
- SQL support
- Off-Heap Memory (to avoid lengthy GC pauses)
- WebSession Caching
- Hibernate Integration
- Database Integration
Some of the popular Data Grid providers include Apache Ignite (incubating), Hazelcast and Infinispan in the open source space, and
Oracle Coherence and GridGain among commercial offerings. GridGain is the
commercial edition of Apache Ignite.
What is an In-Memory Data Fabric
In-Memory Data Fabrics represent the natural evolution of in-memory computing. Data Fabrics generally take a broader approach to in-memory computing, grouping the whole set of in-memory computing use cases into a collection of well-defined independent components. Usually a Data Grid is just one of the components provided by a Data Fabric. In addition to the data grid functionality, an In-Memory Data Fabric typically also includes a Compute Grid, CEP Streaming, an In-Memory File System, and more.
The main advantage of an In-Memory Data Fabric is that all of
the provided in-memory computing components can be used independently, while
being well integrated with each other. For example, in Apache Ignite a Compute Grid knows how to
load-balance and schedule computations within a cluster, but when used together
with a Data Grid, the Compute Grid will also route all the computations that
process data to the cluster members responsible for caching that data. The same
goes for Streaming and CEP - when working with streamed data, all the
processing happens on the cluster members responsible for caching that data as
well.
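The routing idea can be sketched in a few lines. This is a toy, single-process model under stated assumptions: the `Node` and `Cluster` classes, the `affinity_run` method name, and the hash-modulo ownership rule are all made up for illustration and are not Apache Ignite's API.

```python
import zlib


class Node:
    """Stand-in for one cluster member: caches its data and runs jobs locally."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.jobs_run = 0

    def run(self, job, key):
        # The job reads the value straight from this node's local memory.
        self.jobs_run += 1
        return job(self.data[key])


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(name) for name in names]

    def owner(self, key):
        # Hypothetical ownership rule: stable hash modulo the node count.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).data[key] = value

    def affinity_run(self, key, job):
        # Send the computation to the node caching the data,
        # instead of pulling the data to wherever the computation is.
        return self.owner(key).run(job, key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("order:17", [1999, 500, 4250])  # line-item prices in cents

total = cluster.affinity_run("order:17", sum)
print(cluster.owner("order:17").name, total)
```

Only the node that owns `order:17` executes the job, and only the small result travels back to the caller. That is the benefit of combining the compute and data components.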
Commonly seen features of In-Memory Data Fabrics include:
- Data Grid (must have for any Data Fabric)
- Compute Grid
- Service Grid
- Streaming & CEP
- Distributed File System
- In-Memory Database
Apache Ignite, an Apache Incubator project, is the only
In-Memory Data Fabric available in the open source space. GridGain provides a
commercial, enterprise edition of Apache Ignite that is targeted toward
production, business-critical use cases.