If you ever looked at Apache Ignite, you have probably noticed that it is a fairly rich platform with lots of components. However, despite the extensive feature set, Ignite community aims to make the platform easy to use and understand. Here is how the Ignite community defines their project:

Apache Ignite is
the in-memory computing platform
that is durable, strongly consistent, and highly available
with powerful SQL, key-value and processing APIs

So, in summary, Ignite looks like a distributed data storage that can work both, in-memory and on-disk, and provides SQL, key-value and processing APIs to the data. Sounds simple enough. However, to get a complete picture, perhaps it is better to define Ignite by answering several "Is Ignite a ...?" questions:

Is Ignite a persistent or pure in-memory storage?

Both. Native persistence in Ignite can be turned on and off. This allows Ignite to store data sets bigger than can fit in the available memory. Essentially, the smaller operational data sets can be stored in-memory only, and larger data sets that do not fit in memory can be stored on disk, using memory as a caching layer for better performance.


Is Ignite an in-memory database (IMDB)?

Yes. Even though Ignite durable memory works well in-memory and on-disk, the disk persistence can be disabled and Ignite can act as a pure distributed in-memory database, with support for SQL and distributed joins.


Is Ignite an in-memory data grid (IMDG)?

Yes. Ignite is a full-featured data grid, which can be used either in pure in-memory mode or with Ignite native persistence. It can also automatically integrate with any 3rd party databases, including any RDBMS or NoSQL stores.


Is Ignite a distributed database?

Yes. Data in Ignite is either partitioned or replicated across a cluster of multiple nodes. This provides scalability and adds resiliency to the system. Ignite automatically controls how data is partitioned, however, users can plugin their own distribution (affinity) functions and collocate various pieces of data together for efficiency.


Is Ignite an SQL database?

Not fully. Although Ignite aims to behave like any other relational SQL database, there are differences in how Ignite handles constraints and indexes. Ignite supports primary and secondary indexes, however, the uniqueness can only be enforced for the primary indexes. Ignite also does not support foreign key constraints.

Essentially, Ignite purposely does not support any constraints that would entail a cluster broadcast message for each update and significantly hurt performance and scalability of the system.


Is Ignite a transactional database? 

Not fully. ACID Transactions are supported, but only at key-value API level. Ignite also supports cross-partition transactions, which means that transactions can span keys residing in different partitions on different servers. At SQL level Ignite supports atomic, but not yet transactional consistency. Ignite community plans to implement SQL transactions in version 2.4.


Is Ignite a key-value store?

Yes. Ignite provides a feature rich key-value API, that is JCache (JSR-107) compliant and supports Java, C++, and .NET.

You can find out more about Ignite by visiting the freshly redesigned Ignite website.


1

View comments

  1. If you ever looked at Apache Ignite, you have probably noticed that it is a fairly rich platform with lots of components. However, despite the extensive feature set, Ignite community aims to make the platform easy to use and understand. Here is how the Ignite community defines their project:

    Apache Ignite is
    the in-memory computing platform
    that is durable, strongly consistent, and highly available
    with powerful SQL, key-value and processing APIs

    So, in summary, Ignite looks like a distributed data storage that can work both, in-memory and on-disk, and provides SQL, key-value and processing APIs to the data. Sounds simple enough. However, to get a complete picture, perhaps it is better to define Ignite by answering several "Is Ignite a ...?" questions:

    Is Ignite a persistent or pure in-memory storage?

    Both. Native persistence in Ignite can be turned on and off. This allows Ignite to store data sets bigger than can fit in the available memory. Essentially, the smaller operational data sets can be stored in-memory only, and larger data sets that do not fit in memory can be stored on disk, using memory as a caching layer for better performance.


    Is Ignite an in-memory database (IMDB)?

    Yes. Even though Ignite durable memory works well in-memory and on-disk, the disk persistence can be disabled and Ignite can act as a pure distributed in-memory database, with support for SQL and distributed joins.


    Is Ignite an in-memory data grid (IMDG)?

    Yes. Ignite is a full-featured data grid, which can be used either in pure in-memory mode or with Ignite native persistence. It can also automatically integrate with any 3rd party databases, including any RDBMS or NoSQL stores.


    Is Ignite a distributed database?

    Yes. Data in Ignite is either partitioned or replicated across a cluster of multiple nodes. This provides scalability and adds resiliency to the system. Ignite automatically controls how data is partitioned, however, users can plugin their own distribution (affinity) functions and collocate various pieces of data together for efficiency.


    Is Ignite an SQL database?

    Not fully. Although Ignite aims to behave like any other relational SQL database, there are differences in how Ignite handles constraints and indexes. Ignite supports primary and secondary indexes, however, the uniqueness can only be enforced for the primary indexes. Ignite also does not support foreign key constraints.

    Essentially, Ignite purposely does not support any constraints that would entail a cluster broadcast message for each update and significantly hurt performance and scalability of the system.


    Is Ignite a transactional database? 

    Not fully. ACID Transactions are supported, but only at key-value API level. Ignite also supports cross-partition transactions, which means that transactions can span keys residing in different partitions on different servers. At SQL level Ignite supports atomic, but not yet transactional consistency. Ignite community plans to implement SQL transactions in version 2.4.


    Is Ignite a key-value store?

    Yes. Ignite provides a feature rich key-value API, that is JCache (JSR-107) compliant and supports Java, C++, and .NET.

    You can find out more about Ignite by visiting the freshly redesigned Ignite website.


    1

    View comments

  2. Ignite is the in-memory computing platform
    that is durable, strongly consistent, and highly available
    with powerful SQL, key-value and processing APIs

    Starting with 2.1 release, Apache Ignite has become one of a very few in-memory computing systems that provides its own distributed persistence layer. Essentially, users do not have to integrate Ignite with any type of 3rd party databases (although such integration is supported), and start using Ignite as a primary storage of their data on disk and in memory.

    So, what makes Ignite data storage unique? Let us look at a few important features provided by Ignite. You will probably notice that some of these features can also be seen in other data storage systems. However, it is the combination of these features in one cohesive platform that makes Ignite stand out among others. 

    1. Durable Memory

    Ignite durable memory component treats RAM not just as a caching layer, but as a complete fully-functional storage layer. This means that users can turn the persistence on and off as needed. If the persistence is off, then Ignite, just like always, can act as a distributed In-Memory Database or as an In-Memory Data Grid, depending whether you prefer to use SQL or key-value APIs. If the persistence is turned on, then Ignite becomes a distributed, horizontally scalable database that guarantees full data consistency and is resilient to full cluster failures. On top of that, the data is stored in off-heap memory so there are no GC pauses even on large data sets.


    2. Complete SQL support

    With the latest release, in addition to SQL querying, Ignite added support for DDL and DML, allowing users to interact with Ignite using pure SQL without writing any code. This means that users can create tables and indexes, insert, update, and query data using only SQL. Having such complete SQL support makes Ignite a one-of-a-kind distributed SQL database.

    3. ACID compliance

    Data stored in Ignite is ACID-compliant both in memory and on disk, making Ignite a strongly consistent system. Ignite transactions work across the network and can span multiple servers. This makes Ignite stand out from the eventually consistent NoSQL systems that hardly support any type of transactions.

    4. Collocated Processing

    Most traditional SQL and NoSQL databases work in a client-server fashion, meaning that data must be brought to the client side for processing. This approach requires lots of data movement from servers to clients and generally does not scale.  Ignite, on the other hand, allows for sending computations to the data, moving only the light weight compute functions across the network. As a result, Ignite scales better and minimizes data movement. When collocated, all the data processing happens locally on the node that stores the data, and only the result is brought back to the user.

    5. Scalability and Durability

    Ignite is an elastic, horizontally scalable distributed system that supports adding and removing cluster nodes on demand. Ignite also allows for storing multiple copies of the data, making it resilient to partial cluster failures. If the persistence is enabled, then data stored in Ignite will also survive full cluster failures. Cluster restarts in Ignite can be very fast, as the data becomes operational instantaneously directly from disk. As a result the data does not need to be preloaded in memory to begin processing, and Ignite caches will lazily warm up resuming the in memory performance.

    5

    View comments

  3. Today the GridGain team has announced the release of enterprise-grade GridGain In-Memory Data Fabric v. 7.5, based on Apache Ignitetm v. 1.5. For those not familiar with GridGain or Apache Ignite, it provides the ability to distribute, cache, and compute on data in memory, including such features as in-memory data grid, compute grid, ANSI-99 in-memory SQL, real-time streaming, in-memory file system, and many more.

    Some of the most important features of this release, among others, include deadlock-free in-memory transactions, significant improvements to the zero-deployment model, and major performance improvements. All these features have been available in Apache Ignite 1.5 for a while, but now, after many rounds of load testing and bug fixes, have finally received GridGain's ready-for-production stamp of approval.

    Deadlock-Free Transactions

    Deadlocks usually happen when the same objects are concurrently updated in a different order by different transactions. Transactions begin to indefinitely wait on each other to complete, causing deadlock scenarios. Such problems are very difficult to spot and are even harder to debug. In production, they would require a full cluster restart, leading to costly system down times.

    The traditional solution to the deadlock problem is to ensure that applications acquire locks in the same order. However, this is easier said than done, especially in large distributed teams. Just imagine how many objects may be updated by a simple "transfer(...)" method on some bank's API.  Grouping such calls in a common transaction is almost certain to generate a deadlock.

    A much better solution is to drop the locks altogether, which is what Apache Ignite community did. Essentially, all the transactions are given a chance to succeed until it is impossible to logically order some transaction. When this happens, the transaction optimistically fails with an exception and is allowed to be retried. Turns out that this optimistic-serializable consistency model is about 50% faster than its pessimistic counterpart. However, the biggest benefit is that the deadlocks are now impossible.

    Zero-Deployment (revisited)

    With this release user objects are always kept in the binary form and are never deserialized on the server side. This makes GridGain servers agnostic to user domain models, allowing users to dynamically add or remove fields to their data types, or create and deploy new data types on the fly. When it comes to executing computations, Ignite provides a distributed class-loader which will automatically undeploy the old computation logic and deploy the new one.

    Such combination of the binary protocol together with the distributed class-loader creates a deployment-free cluster environment, where both data model and computation logic can be dynamically updated without any explicit deployment steps or down times.

    Pushing Performance Boundaries

    With the introduction of the revised compact binary protocol, GridGain and Ignite became a lot faster, beating its nearest data grid competitor, Hazelcast, by over 100% in throughput and latencies. The performance benchmarks are public and can be viewed and downloaded on the Ignite website.

    Other features of 7.5 release include OSGI compliance as well as new data streamers, including support for Twitter, MQTT, and Flume real time streams.

    The latest release can be downloaded here.

    0

    Add a comment

  4. In my previous post I have demonstrated benchmarks for atomic JCache (JSR 107) operations and optimistic transactions between Apache Ignitetm data grid and Hazelcast. In this blog I will focus on benchmarking the pessimistic transactions.

    The difference between optimistic and pessimistic modes is in the lock acquisition. In pessimistic mode locks are acquired on first access, while in optimistic mode locking happens during the commit phase. Pessimistic transactions provide a more consistent view on the data, given that, since locks are acquired early, you are guaranteed that no changes will happen to the data between transaction start and commit steps.

    Yardstick Framework

    Just like before, I will be using Yardstick Framework for the benchmarks, specifically Yardstick-Docker extension. 

    Transparency

    One of the most important characteristics of any benchmark is full transparency. The code for both, Apache Ignite and Hazelcast benchmarks is provided in the corresponding GIT repos:
    1. https://github.com/yardstick-benchmarks/yardstick-ignite
    2. https://github.com/yardstick-benchmarks/yardstick-hazelcast/
    Both, Apache Ignite and Hazelcast teams were given the opportunity to review the configuration and provide feedback. 

    Hardware

    Both benchmarks were executed on 4 AWS c4.2xlarge instances used as servers and 1 AWS c4.2xlarge instance used as the client and the driver for the benchmark.

    Benchmarks

    In this benchmark we attempt to compare pessimistic cache transactions only. Both, Ignite and Hazelcast have many other features that you can learn more about on their respective websites.
    The benchmarks were run in 2 modes, synchronous backups and asynchronous backups. In case of synchronous backups, the client waited until both, primary and backup copies were updated. In case of asynchronous backups, the client waited only for the primary copies to be updated and the backups were updated asynchronously. This was controlled with configuration properties of both products.

    Also, in both benchmarks clients were allowed to read data from backups whenever necessary.
    The code used for the benchmark execution is very simple and can be found on GitHub:

    Apache Ignite:

    try (Transaction tx = ignite().transactions().txStart()) {
        Object val = cache.get(key);
    
        if (val != null)
            key = nextRandom(args.range() / 2, args.range());
    
        cache.put(key, new SampleValue(key));
    
        tx.commit();
    }

    Hazelcast:

    TransactionContext tCtx = hazelcast().newTransactionContext(txOpts);
    
    tCtx.beginTransaction();
    
    TransactionalMap<Object, Object> txMap = tCtx.getMap("map");
    
    Object val = txMap.getForUpdate(key);
    
    if (val != null)
        key = nextRandom(args.range() / 2, args.range());
    
    txMap.put(key, new SampleValue(key));
    
    tCtx.commitTransaction();

    Result:

    Just like with optimistic transactions, we found that in pessimistic mode, Apache Ignite data grid is about 44% faster than Hazelcast. Apache Ignite averaged approximately 16,500 transactions per second, while Hazelcast came in at about 11,000 transactions per second.

    Here is a sample graph produced by Yardstick:


    Also, when running Hazelcast benchmarks, the following exception kept popping up in the logs, which keeps me wondering about the consistency of the data cached in Hazelcast overall:
    SEVERE: [172.30.1.95]:57500 [dev] [3.4.2] Lock is not owned by the transaction! Caller: fa705359-7154-4346-a5f2-292e1a2a75a5, Owner: Owner: fa705359-7154-4346-a5f2-292e1a2a75a5, thread-id: 105
    com.hazelcast.transaction.TransactionException: Lock is not owned by the transaction! Caller: fa705359-7154-4346-a5f2-292e1a2a75a5, Owner: Owner: fa705359-7154-4346-a5f2-292e1a2a75a5, thread-id: 105
            at com.hazelcast.map.impl.tx.TxnPrepareBackupOperation.run(TxnPrepareBackupOperation.java:48)
            at com.hazelcast.spi.impl.Backup.run(Backup.java:92)
            at com.hazelcast.spi.impl.BasicOperationService$OperationHandler.handle(BasicOperationService.java:749)
            at com.hazelcast.spi.impl.BasicOperationService$OperationHandler.access$500(BasicOperationService.java:725)
            at com.hazelcast.spi.impl.BasicOperationService$OperationPacketHandler.handle(BasicOperationService.java:699)
            at com.hazelcast.spi.impl.BasicOperationService$OperationPacketHandler.handle(BasicOperationService.java:643)
            at com.hazelcast.spi.impl.BasicOperationService$OperationPacketHandler.access$1500(BasicOperationService.java:630)
            at com.hazelcast.spi.impl.BasicOperationService$BasicDispatcherImpl.dispatch(BasicOperationService.java:582)
            at com.hazelcast.spi.impl.BasicOperationScheduler$OperationThread.process(BasicOperationScheduler.java:466)
            at com.hazelcast.spi.impl.BasicOperationScheduler$OperationThread.doRun(BasicOperationScheduler.java:458)
            at com.hazelcast.spi.impl.BasicOperationScheduler$OperationThread.run(BasicOperationScheduler.java:432)
    
    3

    View comments

  5. Recently I have been doing many benchmarks comparing the incubating Apache Ignitetm project to other products. In this blog I will describe my experience in comparing Apache Ignite Data Grid vs Hazelcast Data Grid.

    Yardstick Framework

    I will be using Yardstick Framework for the benchmarks, specifically Yardstick-Docker extension. Yardstick is an open source framework for performing distributed benchmarks. One of the best things about Yardstick is that it generates graphs at the end, so we can observe how the benchmark behaved throughout the whole execution.

    Transparency

    One of the most important characteristics of any benchmark is full transparency. The code for both, Apache Ignite and Hazelcast benchmarks is provided in the corresponding GIT repos:
    1. https://github.com/yardstick-benchmarks/yardstick-ignite
    2. https://github.com/yardstick-benchmarks/yardstick-hazelcast/
    On startup, Yardstick simply accepts the URL of a GIT repo as a parameter and executes all the benchmarks provided in that repository. This approach makes it really easy to change existing benchmarks or add new ones.

    In the interest of full disclosure, I should also mention that I am one of the committers for Apache Ignite project. However, to the best of my ability, I try to stay away from any opinions and simply state the discovered facts here.

    Hardware

    Both benchmarks were executed on 4 AWS c4.2xlarge instances used as servers and 1 AWS c4.2xlarge instance used as the client and the driver for the benchmark.

    Benchmarks

    Yardstick S3 functionality automatically adds benchmark results to the specified S3 bucket on Amazon S3 store. Moreover, if you run multiple sets of benchmarks, e.g. Apache Ignite and Hazelcast benchmarks, then Yardstick will automatically generate comparison graphs and store them in S3 bucket as well.
    In this benchmark we attempt to compare Data Grid basic cache operations and transactions only. Both, Ignite and Hazelcast have many other features that you can find out on their respective websites.
    After some tweaking and tuning, here is what I found about Ignite and Hazelcast:
    1. Both, Apache Ignite and Hazelcast, support distributed data grids (i.e. distributed partitioned caches). In short, they can be viewed as distributed partitioned key-value in-memory stores.
    2. Both, Apache Ignite and Hazelcast, implement JCache (JSR 107) specification
    3. Both are fairly easy to configure and introduce minimal dependencies into the project. 
    4. Both have redundancy and failover. In the benchmarks, we configure both products with 1 primary and 1 backup copies for each key stored in cache.
    5. Apache Ignite and Hazelcast have different configuration properties, but it is possible to configure them in the same way for the benchmark.
    6. Both have support for ACID transactions. Ignite allows to set OPTIMISTIC or PESSIMISTIC mode for transactions. Hazelcast also can be coded to work in OPTIMISTIC and PESSIMISTIC modes, even though they don't call it that way explicitly.
    7. The querying capabilities of both products are very different. I will be benchmarking them in the nearest future and will describe them in my next blog.

    Basic Atomic Operations

    We compared basic puts and puts-and-gets into the cache.

    The code used for the benchmark execution can be found on GitHub:


    Result:
    We found that both Ignite and Hazelcast exhibit about the same performance with Ignite being about 4% to 7% faster on most of the runs.

    Here are the graphs produced by Yardstick:



    Basic Transaction Operations

    We compared basic transactional puts and puts-and-gets into the cache in OPTIMISTIC mode.

    The code used for the benchmark execution can be found on GitHub:


    Result:
    The performance difference for OPTIMISTIC transactions was much bigger, with Ignite transactions outperforming Hazelcast transactions by about 35% to 45%.

    Here are the graphs produced by Yardstick:



    In my following blogs I will compare the query performance of both products as well and will post my findings.
    7

    View comments


  6. In its 1.0 release Apache Ignitetm added much better streaming support with ability to perform various data transformations, as well as query the streamed data using standard SQL queries. Streaming in Ignite is generally used to ingest continuous large volumes of data into Ignite distributed caches (possibly configured with sliding windows). Streamers can also be used to simply preload large amounts of data into caches on startup.

    Here is an example of processing a stream of random numbers.

    1. The stream gets partitioned to multiple cluster nodes in such a way that same numbers will always be processed on the same node. 
    2. Upon receiving a number, our StreamTransformer will get the current count for that number and increment it by 1.

    try (IgniteDataStreamer<Integer, Long> stmr = ignite.dataStreamer("numbers")) {
        // Allow data updates.
        stmr.allowOverwrite(true);
    
        // Configure data transformation to count random numbers 
        // added to the stream.
        stmr.receiver(StreamTransformer.from((e, arg) -> {
            // Get current count.
            Long val = e.getValue();
    
            // Increment count by 1.
            e.setValue(val == null ? 1L : val + 1);
    
            return null;
        }));
    
        // Stream 10 million of random numbers in the range of 0 to 1000.
        for (int i = 1; i <= 10_000_000; i++) {
            stmr.addData(RAND.nextInt(1000), 1L);
    
            if (i % 500_000 == 0)
                System.out.println("Number of tuples streamed into Ignite: " + i);
        }
    }
    
    As we are streaming the data into the system, we can also query it using standard SQL. In this case, the data type name (in the example below it is "Long") is treated as a table name.

    In the query below, we select 10 most popular numbers out of the stream.
    // Query top 10 most popular numbers every.
    SqlFieldsQuery top10Qry = new SqlFieldsQuery(
        "select _key, _val from Long order by _val desc limit 10");
    
    // Execute query and get the whole result set.
    List<List<?>> top10 = stmCache.query(top10Qry).getAll();
    
    1

    View comments


  7. In this example we will stream text into Apache Ignite and count each individual word. We will also issue periodic SQL queries into the stream to query top 10 most popular words.

    The example will work as follows:
    1. We will setup up a cache to hold the words as they come from a stream.
    2. We will setup a 1 second sliding window to keep the words only for the last 1 second.
    3. StreamWords program will stream text data into Ignite.
    4. QueryWords program will query top 10 words out of the stream.

    Cache Configuration

    We define a CacheConfig class which will provide configuration to be used from both programs, StreamWords and QueryWords. The cache will be a partitioned cache which will store words as values. To guarantee that identical words are cached on the same data node, we use AffinityUuid type for unique cache keys.

    Note that in this example we use a sliding window of 1 second for our cache. This means that words will disappear from cache after 1 second since they were first entered into cache.
    public class CacheConfig {
      public static CacheConfiguration<String, Long> wordCache() {
        CacheConfiguration<String, Long> cfg = new CacheConfiguration<>("words");
    
        // Index individual words.
        cfg.setIndexedTypes(AffinityUuid.class, /*word type*/String.class);
    
        // Sliding window of 1 seconds.
        cfg.setExpiryPolicyFactory(FactoryBuilder.factoryOf(
          new CreatedExpiryPolicy(new Duration(SECONDS, 1))));
    
        return cfg;
      }
    }
    

    Stream Words

    We define a StreamWords class which will be responsible to continuously read words form a local text file ("alice-in-wonderland.txt" in our case) and stream them into Ignite "words" cache.

    Example
    public class StreamWords {
      public static void main(String[] args) throws Exception {
        // Mark this cluster member as client.
        Ignition.setClientMode(true);
    
        try (Ignite ignite = Ignition.start("examples/config/example-ignite.xml")) {
          // The cache is configured with sliding window holding 1 second of the streaming data.
          IgniteCache<AffinityUuid, String> stmCache = ignite.getOrCreateCache(CacheConfig.wordCache());
    
          try (IgniteDataStreamer<AffinityUuid, String> stmr = ignite.dataStreamer(stmCache.getName())) {
            // Stream words from "alice-in-wonderland" book.
            while (true) {
              InputStream in = StreamWords.class.getResourceAsStream("alice-in-wonderland.txt");
    
              try (LineNumberReader rdr = new LineNumberReader(new InputStreamReader(in))) {
                for (String line = rdr.readLine(); line != null; line = rdr.readLine()) {
                  for (String word : line.split(" "))
                    if (!word.isEmpty())
                      // Stream words into Ignite.
                      // By using AffinityUuid as a key, we ensure that identical
                      // words are processed on the same cluster node.
                      stmr.addData(new AffinityUuid(word), word);
                }
              }
            }
          }
        }
      }
    }
    

    Query Words

    We define a QueryWords class which will periodically query word counts form the cache.

    SQL Query
    1. We use standard SQL to query popular words.
    2. Ignite SQL treats Java classes as SQL tables. Since our words are stored as simple String type, the SQL query below queries String table.
    3. Ignite always stores cache keys and values as "_key" and "_val" fields. In our case, "_val" is the word, so we use this syntax in our SQL query.
    Example
    public class QueryWords {
      public static void main(String[] args) throws Exception {
        // Mark this cluster member as client.
        Ignition.setClientMode(true);
    
        try (Ignite ignite = Ignition.start()) {
          IgniteCache<String, Long> stmCache = ignite.getOrCreateCache(CacheConfig.wordCache());
    
          // Select top 10 words.
          SqlFieldsQuery top10Qry = new SqlFieldsQuery(
              "select _val, count(_val) as cnt from String " + 
                "group by _val " + 
                "order by cnt desc " + 
                "limit 10",
              true /*collocated*/
          );
    
          // Query top 10 popular numbers every 5 seconds.
          while (true) {
            // Execute queries.
            List<List<?>> top10 = stmCache.query(top10Qry).getAll();
    
            // Print top 10 words.
            ExamplesUtils.printQueryResults(top10);
    
            Thread.sleep(5000);
          }
        }
      }
    }
    

    Starting Server Nodes

    In order to run the example, you need to start data nodes. In Ignite, data nodes are called server nodes. You can start as many server nodes as you like, but you should have at least 1 in order to run the example.

    Server nodes can be started from command line as follows:
    bin/ignite.sh
    
    You can also start server nodes programmatically, like so:
    public class ExampleNodeStartup {
        public static void main(String[] args) throws IgniteException {
            Ignition.start();
        }
    }
    
    Here is how the output of the QueryWords program looks like on my MacBook Pro laptop (I have started 2 server nodes and one StreamWords program as well)
    ...
    Query results:
    (the,2890)
    (and,1355)
    (to,1298)
    (a,1139)
    (of,1029)
    (said,1002)
    (in,912)
    (she,820)
    (was,766)
    (you,711)
    Query results:
    (the,1679)
    (to,830)
    (and,810)
    (a,680)
    (of,629)
    (she,491)
    (it,357)
    (in,330)
    (said,315)
    (was,274)
    ...
    
    0

    Add a comment


  8. Ever seen a product which has duplicated mirrored APIs for synchronous and asynchronous processing? I never liked such APIs as they introduce extra noise to what otherwise could be considered a clean design. There is really no point to have myMethod() and myMethodAsync() methods while all you are trying to do is to change the mode of method execution from synchronous to asynchronous.

    So how do we approach this problem? How do we make the same API operate synchronously or asynchronously by simply flipping a switch? I believe that Apache Ignite project came up with a very neat and elegant solution to this problem.

    Enhance the Java Futures

    First of all, we need to enhance standard Java futures. Standard java.util.concurrent.Future allows you to cancel a task and wait for its completion synchronously, but it misses the whole point of asynchronous execution, which is to be notified about the completion of the operation asynchronously. Java 8 actually addresses this problem with CompletableFuture abstraction, however, it does a lot more than that and may be an overkill for most of the programming tasks.

    Apache Ignite has an API called IgniteFuture which extends standard java.util.concurrent.Future and adds ability to register asynchronous callbacks and chain callbacks one after another:
    public interface IgniteFuture<V> extends Future<V> {
        ...
    
        /**
         * Registers listener closure to be asynchronously notified whenever future completes.
         *
         * @param lsnr Listener closure to register. If not provided - this method is no-op.
         */
        public void listen(IgniteInClosure<? super IgniteFuture<V>> lsnr);
    
        /**
         * Make a chained future to convert result of this future (when complete) into a new format.
         * It is guaranteed that done callback will be called only ONCE.
         *
         * @param doneCb Done callback that is applied to this future when it finishes to produce chained future result.
         * @return Chained future that finishes after this future completes and done callback is called.
         */
        public <T> IgniteFuture<T> chain(IgniteClosure<? super IgniteFuture<V>, T> doneCb);
        
        ...
    }
    
    Now that we have the truly asynchronous futures, let's see how we can avoid duplicity of synchronous and asynchronous APIs.

    IgniteAsyncSupport

    By default, all API invocations in Apache Ignite are synchronous. From usability standpoint this makes sense, as most of the time we all utilize synchronous APIs and resort to asynchronous ones only when we really have to.

    For whenever we need asynchronous behavior, Apache Ignite has IgniteAsyncSupport interface which is a parent to all the APIs that require both, synchronous and asynchronous mode of operation. In Ignite, such APIs usually have to do with distributed operations and may take longer to comlete, like storing data in distributed caches, or executing a distributed computation.

    The main method here is IgniteAsyncSupport.withAsync() which switches any API into asynchronous mode of operation. Whenever asynchronous mode is enabled, the APIs will always store a future for every previous call on per-thread basis. This way, after having invoked an API in asynchronous mode, you can always get the IgniteFuture for that call and listen for the result asynchronously.

    Here is an example of synchronous and asynchronous computations on the cluster.

    Synchronous Compute:
    // IgniteCompute has synchronous and asynchronous modes.
    IgniteCompute compute = ignite.compute();
    
    // Execute a job synchronously and wait for the result.
    String res = compute.call(() -> "Hello world");
    

    Asynchronous Compute:
    // Enable asynchronous mode (note that the same IgniteCompute API is used).
    IgniteCompute asyncCompute = ignite.compute().withAsync();
    
    // Asynchronously execute a job.
    asyncCompute.call(() -> "Hello world");
    
    // Get the future for the above invocation.
    IgniteFuture<String> fut = asyncCompute.future();
    
    // Asynchronously listen for completion and print out the result.
    fut.listen(f -> System.out.println("Job result: " + f.get()));

    Here is an example of a how asynchronous mode is enabled for distributed caches.

    Synchronous Cache:
    IgniteCache<String, Integer> cache = ignite.jcache("mycache");
    
    // Synchronously store value in cache and get previous value.
    Integer val = cache.getAndPut("1", 1);
    
    Asynchronous Cache:
    // Enable asynchronous mode (note that the same IgniteCache API is used).
    IgniteCache<String, Integer> asyncCache = ignite.jcache("mycache").withAsync();
    
    // Asynchronously store value in cache.
    asyncCache.getAndPut("1", 1);
    
    // Get future for the above invocation.
    IgniteFuture<Integer> fut = asyncCache.future();
    
    // Asynchronously listen for the operation to complete.
    fut.listenAsync(f -> System.out.println("Previous cache value: " + f.get()));
    

    See Ignite documentation for more information about Ignite Asynchronous Mode.



    Apache Ignite is a distributed In-Memory Data Fabric which allows to distribute and cache data in memory, perform distributed computations, streaming, etc... Since most of the supported functionality in Ignite is distributed, having properly implemented asynchronous mode of operation becomes very critical.
    1

    View comments


  9. An easy-to-manage network cluster is a cluster in which all nodes are equal and can be brought up with identical configuration. However, even though all nodes are equal, it often still makes sense to assign application-specific roles to them, like "workers', "clients", or "data-nodes". In Apache Ignite, this concept is called cluster groups.

    Apache Ignite is an In-Memory Data Fabric composed of multiple distributed components with Clustering APIs serving as the main backbone for the rest of the components, including Data Grid, Compute Grid, and Service Grid. I am one of the committers to this project and generally blog quite a bit about it.

    You can create virtual cluster groups in Ignite based on any application-specific custom filter. However, to make things easier, Ignite comes with some predefined filters.


    Select Remote Nodes

    Here is how you can execute a simple closure on all remote nodes. Remote nodes include all cluster members, except for the member who is starting the execution.

    final Ignite ignite = Ignition.ignite();
    
    IgniteCluster cluster = ignite.cluster();
    
    // Get compute instance which will only execute
    // over remote nodes, i.e. not this node.
    IgniteCompute compute = ignite.compute(cluster.forRemotes());
    
    // Broadcast to all remote nodes and print the ID of the node 
    // on which this closure is executing.
    compute.broadcast(() -> System.out.println("Hello Node: " + cluster.localNode().id());
    


    Select Worker Nodes

    You can assign application specific roles to cluster members, like "masters" and "workers", for example. This can be done via user attributes specified on node startup. For example, here is how you can bring up a cluster node with "ROLE" attribute set to "worker":

    IgniteConfiguration cfg = new IgniteConfiguration();
    
    Map<String,String> attrs = Collections.singletonMap("ROLE", "worker");
    
    cfg.setUserAttributes(attrs);
    
    // Start Ignite node.
    Ignite ignite = Ignition.start(cfg);
    

    Then here is how you would execute a closure only over nodes with role "worker":

    IgniteCluster cluster = ignite.cluster();
    
    // Get compute instance which will only execute over "worker" nodes.
    IgniteCompute compute = ignite.compute(cluster.forAttribute("ROLE", "worker"));
    
    // Broadcast to all "worker" nodes and print the ID of the node 
    // on which this closure is executing.
    compute.broadcast(() -> System.out.println("Hello Node: " + cluster.localNode().id());
    


    Custom Cluster Groups

    And finally, you can create custom cluster groups based on any user-defined predicates. Such cluster groups will always only include the nodes that pass the predicate. For example, here is how we would create a group of cluster nodes that have CPU utilization less than 50%:

    // Nodes with less than 50% CPU load.
    ClusterGroup readyNodes = cluster.forPredicate((node) -> node.metrics().getCurrentCpuLoad() < 0.5);
    
    // Broadcast to all nodes with CPU load less than 50% and
    // print the ID of the node on which this closure is executing.
    compute.broadcast(() -> System.out.println("Hello Node: " + cluster.localNode().id());
    

    For more on cluster groups, visit Ignite Cluster Groups documentation.
    5

    View comments


  10. Some of us may have already heard the terms Data Grid and Data Fabric, however, neither of these terms has been well defined in the industry. In this blog, I will try to add some clarity to both terms by outlining some main features for data grids and data fabrics.

    What is a Data Grid


    Often when doing meetup presentations about Apache Ignite, I ask the crowd if anyone has ever heard of what a Data Grid is. I usually get only a few hands. However, when I flip the question and ask what Distributed Caching is, everyone in the room immediately raises their hands and nods in understanding. The reality is that a Data Grid can be viewed as a Distributed Cache with extra features, so if you do know what a Distributed Cache is, you probably already know a lot about Data Grids as well.

    Generally, the term distributed cache means ability to replicate data in memory, so it is accessible from anywhere in the cluster. Data Grids usually accomplish this by partitioning data in memory, where each cluster member is responsible only for its own subset of the data. You can also think of it as a distributed Hash Table. This way, the more servers are available in your cluster, the more data you can cache.

    Data grids are generally known for having a fairly rich feature set on top of in-memory caches. The 3 main features that are absolutely mandatory for any data grid solution are:
    • distributed transactions
    • distributed queries
    • collocation of compute and data

     Without the above 3 features, you cannot really call a product a data grid. Many vendors also differentiate between each other by adding other popular features, including: 
    • SQL support
    • Off-Heap Memory (to avoid lengthy GC pauses)
    • WebSession Caching
    • Hibernate Integration
    • Database Integration 

    Some of the popular Data Grid providers include Apache Ignite (incubating), Hazelcast and Infinispan in the open source space, and Oracle Coherence and GridGain commercial offerings. GridGain is a commercial offering of  the Apache Ignite.

    What is an In-Memory Data Fabric


    In Memory Data Fabrics represent the natural evolution of in-memory computing. Data Fabrics generally take a broader approach to in memory computing, grouping the whole set of in memory computing use cases into a collection of well-defined independent components. Usually a Data Grid is just one of the components provided by a Data Fabric.  Additionally to the data grid functionality, an In-Memory Data Fabric typically also includes a Compute Grid, CEP Streaming, an In-Memory File System, and more.

    The main advantage of an In-Memory Data Fabric is that all of the provided in-memory computing components can be used independently, while being well integrated with each other. For example, in Apache Ignite a Compute Grid knows how to load-balance and schedule computations within a cluster, but when used together with a Data Grid, the Compute Grid will also route all the computations that process data to the cluster members responsible for caching that data. The same goes for Streaming and CEP - when working with streamed data, all the processing happens on the cluster members responsible for caching that data as well.

    Commonly seen features of In-Memory Data Fabrics include:
    • Data Grid (must have for any Data Fabric)
    • Compute Grid
    • Service Grid
    • Streaming & CEP
    • Distributed File System
    • In-Memory Database

    Apache Ignite, an Apache Incubator project, is the only In-Memory Data Fabric available in the Open Source space. GridGain provides a commercial, enterprise edition of Apache Ignite that is targeted toward production, business critical use cases.
    4

    View comments

About me
About me
- Antoine de Saint-Exupery -
- Antoine de Saint-Exupery -
"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."
Blog Archive
Blogs I frequent
Loading
Dynamic Views theme. Powered by Blogger.