1. Some of us may have already heard the terms Data Grid and Data Fabric. However, neither of these terms has been well defined in the industry. In this blog, I will try to add some clarity to both terms by outlining the main features of data grids and data fabrics.

    What is a Data Grid


    Often when doing meetup presentations about Apache Ignite, I ask the crowd if anyone has ever heard of what a Data Grid is. I usually get only a few hands. However, when I flip the question and ask what Distributed Caching is, everyone in the room immediately raises their hands and nods in understanding. The reality is that a Data Grid can be viewed as a Distributed Cache with extra features, so if you do know what a Distributed Cache is, you probably already know a lot about Data Grids as well.

    Generally, the term distributed cache means the ability to distribute data in memory across a cluster, so that it is accessible from any cluster member. Data Grids usually accomplish this by partitioning data in memory, where each cluster member is responsible only for its own subset of the data. You can also think of it as a distributed hash table. This way, the more servers you have in your cluster, the more data you can cache.
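
    The partitioning idea above can be sketched in a few lines of plain Java. This is only an illustration, not how any particular data grid is implemented: the class name PartitionedCacheSketch and its methods are hypothetical, each "node" is just an in-process map, and the modulo-based owner lookup is an assumption made for brevity (real products typically use a more elaborate affinity function so that few keys move when nodes join or leave).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a cache whose entries are spread over several "nodes".
// Each node is modeled as a plain in-process map; a real data grid would
// place these maps on separate servers.
class PartitionedCacheSketch {
    private final List<Map<String, Integer>> nodes = new ArrayList<>();

    PartitionedCacheSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++)
            nodes.add(new HashMap<>());
    }

    // Maps a key to the index of the single node that owns it.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Stores the entry only on the owning node.
    void put(String key, Integer value) {
        nodes.get(ownerOf(key)).put(key, value);
    }

    // Reads the entry from the owning node.
    Integer get(String key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    // Number of entries held by one node.
    int sizeOf(int nodeIndex) {
        return nodes.get(nodeIndex).size();
    }

    public static void main(String[] args) {
        PartitionedCacheSketch cache = new PartitionedCacheSketch(3);
        cache.put("Hello", 1);
        cache.put("World", 2);
        System.out.println("Hello is owned by node " + cache.ownerOf("Hello"));
        System.out.println("Hello = " + cache.get("Hello"));
    }
}
```

    Because every key lives on exactly one node in this sketch, adding nodes adds capacity, which is the property the paragraph above describes.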

    Data grids are generally known for having a fairly rich feature set on top of in-memory caches. The 3 main features that are absolutely mandatory for any data grid solution are:
    • distributed transactions
    • distributed queries
    • collocation of compute and data

    Without the above 3 features, you cannot really call a product a data grid. Many vendors also differentiate themselves by adding other popular features, including:
    • SQL support
    • Off-Heap Memory (to avoid lengthy GC pauses)
    • WebSession Caching
    • Hibernate Integration
    • Database Integration 

    Some of the popular Data Grid providers include Apache Ignite (incubating), Hazelcast and Infinispan in the open source space, and Oracle Coherence and GridGain among the commercial offerings. GridGain is a commercial offering of Apache Ignite.

    What is an In-Memory Data Fabric


    In-Memory Data Fabrics represent the natural evolution of in-memory computing. Data Fabrics generally take a broader approach to in-memory computing, grouping the whole set of in-memory computing use cases into a collection of well-defined independent components. Usually a Data Grid is just one of the components provided by a Data Fabric. In addition to the data grid functionality, an In-Memory Data Fabric typically also includes a Compute Grid, CEP Streaming, an In-Memory File System, and more.

    The main advantage of an In-Memory Data Fabric is that all of the provided in-memory computing components can be used independently, while being well integrated with each other. For example, in Apache Ignite a Compute Grid knows how to load-balance and schedule computations within a cluster, but when used together with a Data Grid, the Compute Grid will also route all the computations that process data to the cluster members responsible for caching that data. The same goes for Streaming and CEP - when working with streamed data, all the processing happens on the cluster members responsible for caching that data as well.
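
    To make the routing idea concrete, here is a minimal, self-contained Java sketch of sending a computation to the member that caches a key. It does not use the Ignite API: the names CollocationSketch, Node and callOnOwner are hypothetical, the cluster is simulated inside a single process, and the modulo-based owner lookup is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "collocation of compute and data" in one process.
class CollocationSketch {
    // One simulated cluster member: its id and the cache entries it owns.
    static final class Node {
        final int id;
        final Map<String, Integer> data = new HashMap<>();

        Node(int id) {
            this.id = id;
        }

        // Runs the job "locally", i.e. against this node's own data.
        <R> R execute(Function<Node, R> job) {
            return job.apply(this);
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    CollocationSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++)
            nodes.add(new Node(i));
    }

    // Finds the node responsible for caching the key.
    Node ownerOf(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    void put(String key, int value) {
        ownerOf(key).data.put(key, value);
    }

    // Sends the job to the node that caches the key, instead of
    // pulling the value across the network to the caller.
    <R> R callOnOwner(String key, Function<Node, R> job) {
        return ownerOf(key).execute(job);
    }

    public static void main(String[] args) {
        CollocationSketch cluster = new CollocationSketch(4);
        cluster.put("Hello", 11);

        String result = cluster.callOnOwner("Hello",
            node -> "node " + node.id + " computed " + (node.data.get("Hello") * 2));
        System.out.println(result);
    }
}
```

    In this sketch only the small job and its result would cross the network in a real deployment, while the cached data stays where it is.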

    Commonly seen features of In-Memory Data Fabrics include:
    • Data Grid (must have for any Data Fabric)
    • Compute Grid
    • Service Grid
    • Streaming & CEP
    • Distributed File System
    • In-Memory Database

    Apache Ignite, an Apache Incubator project, is the only In-Memory Data Fabric available in the Open Source space. GridGain provides a commercial, enterprise edition of Apache Ignite that is targeted toward production, business critical use cases.


  2. Today, as part of the community of Apache Ignite (incubating), I am proud to announce that we have made the first code drop of the Apache Ignite In-Memory Data Fabric – Apache Ignite v1.0 Release Candidate - available.

    The Apache Ignite project started in September 2014, when the Open Source edition of the GridGain In-Memory Data Fabric was donated to the Apache Software Foundation and branded as Apache Ignite. Now, after 5 months of many dev-list discussions and late nights, we finally have released a stable, well-tested release candidate of Apache Ignite 1.0.

    Going forward, the role of GridGain engineers will be to continue to actively contribute to the Ignite code base, but also to provide a hardened, enterprise-grade feature set on top of Apache Ignite. GridGain support will be available for both Apache Ignite and the GridGain In-Memory Data Fabric Enterprise Edition.

    What is an In-Memory Data Fabric

    So why should you care about Apache Ignite? First and foremost, Ignite is lightning fast and has virtually unlimited scale. Ignite is based on the former GridGain In-Memory Data Fabric Open Source edition, the leading open source in-memory data fabric, which has several known 1000+ node deployments.

    Apache Ignite has a very rich feature set. From the get-go, our main goal was to make Apache Ignite an all-in-one stop for everything you need for in-memory computing.

    Some of the main features of the project include:
    • Advanced Clustering
    • Distributed Caching (JCache)
    • Data Grid
    • Compute Grid
    • Service Grid
    • Streaming & CEP
    • Distributed File System - IgniteFS
    • Hadoop Accelerator
    • Distributed Data Structures
    • Distributed Messaging
    • Distributed Events


    But one of the coolest new features in Ignite is its ability to automatically integrate with different relational databases, such as Oracle, MySQL, PostgreSQL, DB2, Microsoft SQL Server, etc. This feature automatically generates the application domain model based on the schema definition of the underlying database, and then loads the data in memory.

    While you may be able to get some subset of the above functionality from other individual point solutions, the main benefit you get from the Apache Ignite In-Memory Data Fabric is the integration of all these components. For example, Ignite will automatically route your computations that need to process data to the cluster nodes responsible for caching this data. The same goes for the processing of streaming data as well. This approach is called "collocation of compute and data" and when applied, significantly reduces network traffic and increases scalability and performance.

    Here is an example of how you would broadcast a computation in Ignite:
    // "compute" is an IgniteCompute instance, e.g. obtained via ignite.compute()
    compute.broadcast(() -> System.out.println("Hello World"));
    

    Here is an example of how to perform an in-memory distributed transaction on a cache in Ignite:
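    try (Transaction tx = ignite.transactions().txStart()) {
        Integer hello = cache.get("Hello");

        // cache.get() returns null for a missing key, so check before unboxing
        if (hello != null && hello == 1)
            cache.put("Hello", 11);

        cache.put("World", 22);

        // Both updates are applied together; closing without commit rolls back
        tx.commit();
    }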
    

    Ease Of Use

    Despite the breadth of its feature set, Apache Ignite is easy to deploy and use.

    Installation

    The product does not have any custom installers. It comes as one ZIP file, which is ready to go once you unzip it. To start up multiple cluster nodes, simply execute the "bin/ignite.sh" script multiple times.

    Dependencies

    The project has 1 main dependency - ignite-core.jar. All other dependencies, such as the Spring integration for configuration or the H2 database for SQL, can be added to the process a la carte by dragging the corresponding folders from the "libs/optional" folder into the "libs/" folder.

    Maven

    The project is fully mavenized, and is composed of over a dozen Maven artifacts that can be imported and used in any combination.

    Standard APIs

    Ignite is based on standard Java APIs. For distributed caches and data grids, Ignite will implement the JCache (JSR 107) standard in its final 1.0 release. For distributed computations, you can use the standard ExecutorService API. There are also distributed Queues and Sets. Ignite has also implemented most of the data structures from the java.util.concurrent package by distributing them in memory. IgniteFS, the distributed file system provided by Ignite, implements the standard Hadoop FileSystem API and can be automatically plugged into any Hadoop installation.
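
    As a rough illustration of why coding against the standard ExecutorService interface is convenient, the sketch below submits tasks through that interface only. The class name ExecutorServiceSketch and the method totalLength are hypothetical, and a local thread pool is used as a stand-in (an assumption for this example) for a cluster-backed executor. A distributed executor would typically also require the submitted tasks to be serializable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: application code that only knows about ExecutorService.
class ExecutorServiceSketch {
    // Sums the lengths of the given words by submitting one task per word.
    // The method depends only on the ExecutorService interface, so it does not
    // care whether the tasks run on local threads or on cluster nodes.
    static int totalLength(ExecutorService exec, List<String> words) throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (String word : words)
            tasks.add(word::length);

        int total = 0;
        // invokeAll blocks until every task has completed.
        for (Future<Integer> f : exec.invokeAll(tasks))
            total += f.get();
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a local thread pool stands in for a cluster-backed executor.
        ExecutorService exec = Executors.newFixedThreadPool(4);
        try {
            System.out.println(totalLength(exec, Arrays.asList("Hello", "World")));
        } finally {
            exec.shutdown();
        }
    }
}
```

    Swapping the executor passed to totalLength is the only change such code would need in order to run its tasks somewhere else.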

    Code Quality

    Apache Ignite is very stable and well tested. The development process is structured in such a way that before any merge to the main branch happens, over 10,000 tests are executed on JetBrains TeamCity, and all of them need to pass.

    Moreover, Ignite has its own QA team. All the main functionality undergoes scrupulous testing for every release. Also, every release is benchmarked against a previous release to ensure that it is at least as fast (or faster) and as stable as the previous release.

    Also, the project inherited several years of thorough testing and stability-tuning from the GridGain In-Memory Data Fabric Open Source edition, which boasts over a thousand production installations.

    Community

    Even though the project has been in Apache for less than 4 months, it already has a vibrant and growing community. The project currently has 11 committers and about as many contributors, all of whom are very active. Some have joined the project just recently, but have already been actively contributing.

    We always welcome community contributions. If you would like to contribute, send an email to the Ignite dev list and we will get you started. And even if you are not ready to contribute immediately, I would like to invite everyone to join our dev list. Most of the discussions happen there, and you can find out a lot about where the project is going and also provide your own ideas.

    Another way, of course, to familiarize yourself with Apache Ignite, is to take a look at the code and see what it can do for your project.

    You can download the Ignite bits on the Apache Ignite homepage.

    Apache®, Apache Ignite, Ignite®, and the Apache Ignite logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.