Benchmarking Data Grids: Apache Ignite vs Hazelcast, Part I
Recently I have been running many benchmarks comparing the incubating Apache Ignite™ project to other products. In this post I will describe my experience comparing the Apache Ignite Data Grid with the Hazelcast Data Grid.

Yardstick Framework

I will be using the Yardstick framework for the benchmarks, specifically the Yardstick-Docker extension. Yardstick is an open source framework for running distributed benchmarks. One of the best things about Yardstick is that it generates graphs at the end of a run, so we can observe how the benchmark behaved throughout the whole execution.

Transparency

One of the most important characteristics of any benchmark is full transparency. The code for both the Apache Ignite and the Hazelcast benchmarks is available in the corresponding Git repositories:
https://github.com/yardstick-benchmarks/yardstick-ignite

https://github.com/yardstick-benchmarks/yardstick-hazelcast/

On startup, Yardstick simply accepts the URL of a Git repository as a parameter and executes all the benchmarks provided in that repository. This approach makes it easy to change existing benchmarks or add new ones.

In the interest of full disclosure, I should also mention that I am one of the committers for the Apache Ignite project. However, to the best of my ability, I try to stay away from opinions and simply state the facts I found.

Hardware

Both benchmarks were executed on 4 AWS c4.2xlarge instances used as servers and 1 AWS c4.2xlarge instance used as the client and the driver for the benchmark.

Benchmarks

Yardstick's S3 support automatically uploads benchmark results to the specified Amazon S3 bucket. Moreover, if you run multiple sets of benchmarks, e.g. the Apache Ignite and Hazelcast benchmarks, Yardstick will automatically generate comparison graphs and store them in the S3 bucket as well.

In this benchmark we compare only basic Data Grid cache operations and transactions. Both Ignite and Hazelcast have many other features, which you can read about on their respective websites.

After some tweaking and tuning, here is what I found about Ignite and Hazelcast:

Both Apache Ignite and Hazelcast support distributed data grids (i.e. distributed partitioned caches). In short, they can be viewed as distributed, partitioned, in-memory key-value stores.

Both Apache Ignite and Hazelcast implement the JCache (JSR 107) specification.

Both are fairly easy to configure and introduce minimal dependencies into the project.

Both have redundancy and failover. In the benchmarks, we configure both products with 1 primary and 1 backup copy of each key stored in the cache.

Apache Ignite and Hazelcast have different configuration properties, but it is possible to configure them in the same way for the benchmark.

Both support ACID transactions. Ignite lets you set OPTIMISTIC or PESSIMISTIC mode for transactions. Hazelcast can also be coded to work in OPTIMISTIC and PESSIMISTIC modes, even though it does not use those names explicitly.

The querying capabilities of the two products are very different. I will benchmark them in the near future and describe the results in my next post.
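
For readers unfamiliar with the two transaction modes mentioned above, here is a rough single-JVM sketch of the general idea. It is not taken from Ignite or Hazelcast and says nothing about how either product implements transactions; the class name `TxModesSketch` and its methods are hypothetical. The optimistic path reads without locking and validates at commit time, retrying on conflict, while the pessimistic path takes a lock before touching the value.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Toy illustration of optimistic vs. pessimistic updates (hypothetical, not product code). */
final class TxModesSketch {
    private final AtomicLong optimisticValue = new AtomicLong();
    private final ReentrantLock lock = new ReentrantLock();
    private long pessimisticValue;

    /** Optimistic: read without locking, commit only if nobody changed the value meanwhile. */
    long incrementOptimistic() {
        while (true) {
            long seen = optimisticValue.get();
            long updated = seen + 1;

            // The "commit" succeeds only if the value is still what we read; otherwise retry.
            if (optimisticValue.compareAndSet(seen, updated))
                return updated;
        }
    }

    /** Pessimistic: acquire the lock first, so no other writer can interfere. */
    long incrementPessimistic() {
        lock.lock();
        try {
            return ++pessimisticValue;
        }
        finally {
            lock.unlock();
        }
    }
}
```

Under low contention the optimistic path avoids holding locks at all, which is one common reason the two modes can perform differently.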

Basic Atomic Operations
We compared basic puts and puts-and-gets into the cache.

The code used for the benchmark execution can be found on GitHub:

Apache Ignite: IgnitePutBenchmark and IgnitePutGetBenchmark.

Hazelcast: HazelcastPutBenchmark and HazelcastPutGetBenchmark.

Result:
We found that Ignite and Hazelcast exhibit about the same performance, with Ignite being about 4% to 7% faster on most of the runs.

Here are the graphs produced by Yardstick:

Basic Transaction Operations
We compared basic transactional puts and puts-and-gets into the cache in OPTIMISTIC mode.

The code used for the benchmark execution can be found on GitHub:

Apache Ignite: IgnitePutTxBenchmark and IgnitePutGetTxBenchmark.

Hazelcast: HazelcastPutTxBenchmark and HazelcastPutGetTxBenchmark.

Result:
The performance difference for OPTIMISTIC transactions was much bigger, with Ignite transactions outperforming Hazelcast transactions by about 35% to 45%.

Here are the graphs produced by Yardstick:
In upcoming posts I will also compare the query performance of both products and share my findings.
GridGain - Fastest In-Memory Computing
Posted 14th April 2015 by Anonymous

Streaming and Transforming Data with Apache Ignite
In its 1.0 release Apache Ignite™ added much better streaming support, with the ability to perform various data transformations as well as query the streamed data using standard SQL queries. Streaming in Ignite is generally used to ingest continuous large volumes of data into Ignite distributed caches (possibly configured with sliding windows). Streamers can also be used simply to preload large amounts of data into caches on startup.

Here is an example of processing a stream of random numbers.
1. The stream is partitioned across multiple cluster nodes in such a way that the same number is always processed on the same node.
2. Upon receiving a number, our StreamTransformer will get the current count for that number and increment it by 1.
```
try (IgniteDataStreamer<Integer, Long> stmr = ignite.dataStreamer("numbers")) {
    // Allow data updates.
    stmr.allowOverwrite(true);

    // Configure data transformation to count random numbers 
    // added to the stream.
    stmr.receiver(StreamTransformer.from((e, arg) -> {
        // Get current count.
        Long val = e.getValue();

        // Increment count by 1.
        e.setValue(val == null ? 1L : val + 1);

        return null;
    }));

    // Stream 10 million random numbers in the range [0, 1000).
    for (int i = 1; i <= 10_000_000; i++) {
        stmr.addData(RAND.nextInt(1000), 1L);

        if (i % 500_000 == 0)
            System.out.println("Number of tuples streamed into Ignite: " + i);
    }
}
```
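To make the two steps above concrete without a cluster, here is a minimal plain-Java sketch. It is not Ignite code: the class `PartitionedCounter` is hypothetical, each "node" is just an in-memory map, and the modulo-based routing is my simplifying assumption rather than Ignite's actual affinity function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Single-JVM stand-in for the cluster: each "node" is a map of number -> count. */
final class PartitionedCounter {
    private final List<Map<Integer, Long>> nodes = new ArrayList<>();

    PartitionedCounter(int nodeCount) {
        for (int i = 0; i < nodeCount; i++)
            nodes.add(new HashMap<>());
    }

    /** Step 1: the same number always maps to the same node (assumed modulo routing). */
    int nodeFor(int number) {
        return Math.floorMod(Integer.hashCode(number), nodes.size());
    }

    /** Step 2: take the current count for the number and increment it by 1. */
    void add(int number) {
        nodes.get(nodeFor(number)).merge(number, 1L, Long::sum);
    }

    /** Returns how many times the number has been added so far. */
    long count(int number) {
        return nodes.get(nodeFor(number)).getOrDefault(number, 0L);
    }

    public static void main(String[] args) {
        PartitionedCounter counter = new PartitionedCounter(4);
        Random rand = new Random();

        for (int i = 0; i < 100_000; i++)
            counter.add(rand.nextInt(1000));

        System.out.println("Count for 42: " + counter.count(42));
    }
}
```

Because a given number is always routed to one node, its count can be updated locally without coordinating with the other nodes.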
As we are streaming the data into the system, we can also query it using standard SQL. In this case, the data type name (in the example below it is "Long") is treated as a table name.

In the query below, we select the 10 most popular numbers out of the stream.
```
// Query the top 10 most popular numbers.
SqlFieldsQuery top10Qry = new SqlFieldsQuery(
    "select _key, _val from Long order by _val desc limit 10");

// Execute query and get the whole result set.
List<List<?>> top10 = stmCache.query(top10Qry).getAll();
```
Posted 8th April 2015 by Anonymous


Apache Ignite Word Count Streaming Example

In this example we will stream text into Apache Ignite and count each individual word. We will also issue periodic SQL queries against the stream to find the 10 most popular words.

The example will work as follows:

1. We will set up a cache to hold the words as they come from a stream.
2. We will set up a 1-second sliding window to keep only the words from the last second.
3. The StreamWords program will stream text data into Ignite.
4. The QueryWords program will query the top 10 words out of the stream.

Cache Configuration

We define a CacheConfig class that provides the cache configuration used by both programs, StreamWords and QueryWords. The cache is a partitioned cache that stores words as values. To guarantee that identical words are cached on the same data node, we use the AffinityUuid type for unique cache keys.

Note that in this example we use a sliding window of 1 second for our cache. This means that a word will disappear from the cache 1 second after it was first entered.

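```
import javax.cache.configuration.FactoryBuilder;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.cache.affinity.AffinityUuid;
import org.apache.ignite.configuration.CacheConfiguration;

import static java.util.concurrent.TimeUnit.SECONDS;

public class CacheConfig {
  // Keys are AffinityUuid and values are the words themselves,
  // matching how StreamWords writes into the cache.
  public static CacheConfiguration<AffinityUuid, String> wordCache() {
    CacheConfiguration<AffinityUuid, String> cfg = new CacheConfiguration<>("words");

    // Index individual words.
    cfg.setIndexedTypes(AffinityUuid.class, /*word type*/String.class);

    // Sliding window of 1 second.
    cfg.setExpiryPolicyFactory(FactoryBuilder.factoryOf(
      new CreatedExpiryPolicy(new Duration(SECONDS, 1))));

    return cfg;
  }
}
```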
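
To illustrate what "expire 1 second after creation" means for the sliding window, here is a small plain-Java sketch. It is my own simplification, not how Ignite or JCache implement expiry: the class `CreatedExpiryMap` is hypothetical, and the current time is passed in explicitly so the behavior is deterministic. Like a creation-based expiry policy, overwriting a live entry does not extend its lifetime.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy map whose entries expire a fixed time after they were first created (hypothetical). */
final class CreatedExpiryMap<K, V> {
    private static final class Entry<V> {
        final V value;
        final long createdAtMillis;

        Entry(V value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;

    CreatedExpiryMap(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Stores a value; an update of a live entry keeps the original creation time. */
    void put(K key, V value, long nowMillis) {
        Entry<V> old = entries.get(key);
        boolean live = old != null && nowMillis - old.createdAtMillis < ttlMillis;

        entries.put(key, new Entry<>(value, live ? old.createdAtMillis : nowMillis));
    }

    /** Returns the value, or null if it is missing or older than the time-to-live. */
    V get(K key, long nowMillis) {
        Entry<V> e = entries.get(key);

        if (e == null)
            return null;

        if (nowMillis - e.createdAtMillis >= ttlMillis) {
            entries.remove(key);
            return null;
        }

        return e.value;
    }
}
```

With a time-to-live of 1000 ms, a word written at time 0 is still visible at 999 ms and gone at 1000 ms, which is the behavior the queries in this example rely on.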

Stream Words

We define a StreamWords class which is responsible for continuously reading words from a local text file ("alice-in-wonderland.txt" in our case) and streaming them into the Ignite "words" cache.

Example

```
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.AffinityUuid;

public class StreamWords {
  public static void main(String[] args) throws Exception {
    // Mark this cluster member as client.
    Ignition.setClientMode(true);

    try (Ignite ignite = Ignition.start("examples/config/example-ignite.xml")) {
      // The cache is configured with sliding window holding 1 second of the streaming data.
      IgniteCache<AffinityUuid, String> stmCache = ignite.getOrCreateCache(CacheConfig.wordCache());

      try (IgniteDataStreamer<AffinityUuid, String> stmr = ignite.dataStreamer(stmCache.getName())) {
        // Stream words from "alice-in-wonderland" book, re-reading the file forever.
        while (true) {
          InputStream in = StreamWords.class.getResourceAsStream("alice-in-wonderland.txt");

          try (LineNumberReader rdr = new LineNumberReader(new InputStreamReader(in))) {
            for (String line = rdr.readLine(); line != null; line = rdr.readLine()) {
              for (String word : line.split(" "))
                if (!word.isEmpty())
                  // Stream words into Ignite.
                  // By using AffinityUuid as a key, we ensure that identical
                  // words are processed on the same cluster node.
                  stmr.addData(new AffinityUuid(word), word);
            }
          }
        }
      }
    }
  }
}
```
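
The key idea in the code above is that each cache key is unique, yet routing depends only on the word. Here is a minimal plain-Java sketch of that idea. The class `WordKey` is a hypothetical stand-in, not the real AffinityUuid, and the modulo-based node selection is an assumption made for illustration only.

```java
import java.util.UUID;

/** Hypothetical stand-in for an affinity-aware key: a unique id plus a routing key. */
final class WordKey {
    /** Makes every entry unique, so repeated words do not overwrite each other. */
    final UUID id = UUID.randomUUID();

    /** Decides which node owns the entry; here it is the word itself. */
    final String affinityKey;

    WordKey(String affinityKey) {
        this.affinityKey = affinityKey;
    }

    /** Routing looks only at the affinity key, never at the unique id (assumed modulo routing). */
    int nodeFor(int nodeCount) {
        return Math.floorMod(affinityKey.hashCode(), nodeCount);
    }
}
```

Two keys created for the same word get different ids but land on the same node, which is what lets a collocated query count words locally on each node.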

Query Words

We define a QueryWords class which will periodically query word counts from the cache.

SQL Query

We use standard SQL to query popular words.
Ignite SQL treats Java classes as SQL tables. Since our words are stored as plain String values, the SQL query below queries the String table.
Ignite always exposes cache keys and values as the "_key" and "_val" fields. In our case, "_val" is the word, so we use it in our SQL query.

Example

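```
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.AffinityUuid;
import org.apache.ignite.cache.query.SqlFieldsQuery;
// ExamplesUtils is a helper from the Ignite examples module.
import org.apache.ignite.examples.ExamplesUtils;

public class QueryWords {
  public static void main(String[] args) throws Exception {
    // Mark this cluster member as client.
    Ignition.setClientMode(true);

    try (Ignite ignite = Ignition.start()) {
      // Same key and value types that StreamWords uses for the "words" cache.
      IgniteCache<AffinityUuid, String> stmCache = ignite.getOrCreateCache(CacheConfig.wordCache());

      // Select top 10 words.
      SqlFieldsQuery top10Qry = new SqlFieldsQuery(
          "select _val, count(_val) as cnt from String " +
            "group by _val " +
            "order by cnt desc " +
            "limit 10",
          true /*collocated*/
      );

      // Query top 10 popular words every 5 seconds.
      while (true) {
        // Execute queries.
        List<List<?>> top10 = stmCache.query(top10Qry).getAll();

        // Print top 10 words.
        ExamplesUtils.printQueryResults(top10);

        Thread.sleep(5000);
      }
    }
  }
}
```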
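
If the SQL above is unfamiliar, the following plain-Java sketch shows roughly the same aggregation (group by word, count, order by count descending, limit) over an in-memory list. It is only an analogy: the class `WordCountSketch` and its sample sentence are made up for illustration, and it does not involve Ignite or its distributed query engine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** In-memory analogy of the "top 10 words" SQL query (hypothetical helper). */
final class WordCountSketch {
    /** Counts the words and returns the n most frequent ones, highest count first. */
    static List<Map.Entry<String, Long>> topWords(List<String> words, int n) {
        // Equivalent of "group by _val" with "count(_val)".
        Map<String, Long> counts = words.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Equivalent of "order by cnt desc limit n".
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the quick the lazy the dog dog".split(" "));

        System.out.println(topWords(words, 2));
    }
}
```

In the real example the cache only holds words from the last second, so each query run effectively counts a fresh window of the stream.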

Starting Server Nodes

In order to run the example, you need to start data nodes. In Ignite, data nodes are called server nodes. You can start as many server nodes as you like, but you should have at least 1 in order to run the example.

Server nodes can be started from command line as follows:

bin/ignite.sh

You can also start server nodes programmatically, like so:

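```
import org.apache.ignite.IgniteException;
import org.apache.ignite.Ignition;

public class ExampleNodeStartup {
    public static void main(String[] args) throws IgniteException {
        // Starts a server node with the default configuration.
        Ignition.start();
    }
}
```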

Here is what the output of the QueryWords program looks like on my MacBook Pro laptop (I started 2 server nodes and one StreamWords program as well):

...
Query results:
(the,2890)
(and,1355)
(to,1298)
(a,1139)
(of,1029)
(said,1002)
(in,912)
(she,820)
(was,766)
(you,711)
Query results:
(the,1679)
(to,830)
(and,810)
(a,680)
(of,629)
(she,491)
(it,357)
(in,330)
(said,315)
(was,274)
...

Posted 6th April 2015 by Anonymous

Distributed Thoughts

By Dmitriy Setrakyan
