Distributed Thoughts: Benchmarking Data Grids: Apache Ignite vs Hazelcast, Part I

Recently I have been running many benchmarks comparing the incubating Apache Ignite™ project to other products. In this post I will describe my experience comparing Apache Ignite Data Grid with Hazelcast Data Grid.

Yardstick Framework

I will be using the Yardstick framework for the benchmarks, specifically its Yardstick-Docker extension. Yardstick is an open source framework for running distributed benchmarks. One of the best things about Yardstick is that it generates graphs at the end, so we can observe how the benchmark behaved throughout the whole run.

Transparency

One of the most important characteristics of any benchmark is full transparency. The code for both the Apache Ignite and Hazelcast benchmarks is provided in the corresponding Git repos:

On startup, Yardstick simply accepts the URL of a Git repo as a parameter and executes all the benchmarks provided in that repository. This approach makes it easy to change existing benchmarks or add new ones.

In the interest of full disclosure, I should also mention that I am one of the committers for the Apache Ignite project. However, I have tried, to the best of my ability, to stay away from opinions and simply state the facts I discovered.

Hardware

Both benchmarks were executed on 4 AWS c4.2xlarge instances used as servers and 1 AWS c4.2xlarge instance used as the client and the driver for the benchmark.

Benchmarks

Yardstick's S3 functionality automatically uploads benchmark results to the specified bucket on Amazon S3. Moreover, if you run multiple sets of benchmarks, e.g. Apache Ignite and Hazelcast benchmarks, Yardstick will automatically generate comparison graphs and store them in the S3 bucket as well.

In this benchmark we compare only basic data grid cache operations and transactions. Both Ignite and Hazelcast have many other features, which you can read about on their respective websites.

After some tweaking and tuning, here is what I found about Ignite and Hazelcast:

Both Apache Ignite and Hazelcast support distributed data grids (i.e. distributed partitioned caches). In short, they can be viewed as distributed, partitioned, in-memory key-value stores.
Both Apache Ignite and Hazelcast implement the JCache (JSR 107) specification.
Both are fairly easy to configure and introduce minimal dependencies into the project.
Both provide redundancy and failover. In the benchmarks, we configure both products to keep 1 primary and 1 backup copy of each key stored in the cache.
Apache Ignite and Hazelcast have different configuration properties, but it is possible to configure them equivalently for the benchmark.
Both support ACID transactions. Ignite lets you set OPTIMISTIC or PESSIMISTIC mode for transactions. Hazelcast can also be coded to work in optimistic and pessimistic modes, even though it does not name them that way explicitly.
The querying capabilities of the two products are very different. I will benchmark them in the near future and describe the results in my next post.
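To make the partitioning and backup setup above more concrete, here is a minimal Java sketch of how a key might be mapped to a partition and then to a primary and a backup node. It is a toy model under stated assumptions: the class `PartitionedStoreSketch`, the modulo-based node assignment and the partition count of 1024 are all hypothetical, and neither Ignite nor Hazelcast assigns partitions exactly this way.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a partitioned key-value store with backup copies.
 * Hypothetical: the real products use their own affinity/partitioning logic.
 */
class PartitionedStoreSketch {
    private final int partitions;
    private final int nodes;
    private final int backups;

    PartitionedStoreSketch(int partitions, int nodes, int backups) {
        if (backups >= nodes) {
            throw new IllegalArgumentException("need more nodes than backups");
        }
        this.partitions = partitions;
        this.nodes = nodes;
        this.backups = backups;
    }

    /** Maps a key to a partition; floorMod keeps negative hash codes in range. */
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Node indexes holding the key: the primary first, then backups on distinct nodes. */
    List<Integer> ownersOf(Object key) {
        int primary = partitionOf(key) % nodes;
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i <= backups; i++) {
            owners.add((primary + i) % nodes);
        }
        return owners;
    }

    public static void main(String[] args) {
        // 4 server nodes and 1 backup as in the benchmark; 1024 partitions is an assumed count.
        PartitionedStoreSketch store = new PartitionedStoreSketch(1024, 4, 1);
        for (int key = 0; key < 5; key++) {
            System.out.println("key " + key + " -> partition " + store.partitionOf(key)
                    + ", owners " + store.ownersOf(key));
        }
    }
}
```

With 4 server nodes and 1 backup, every key in this sketch lands on two distinct nodes, so losing any single node still leaves one copy of each key.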

Basic Atomic Operations

We compared basic puts and puts-and-gets into the cache.

The code used for the benchmark execution can be found on GitHub:

Apache Ignite: IgnitePutBenchmark and IgnitePutGetBenchmark.
Hazelcast: HazelcastPutBenchmark and HazelcastPutGetBenchmark.
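The following is a rough, single-threaded Java sketch of what a put and a put-get benchmark measure: the throughput of random-key operations after a warm-up period that is not counted. A `ConcurrentHashMap` stands in for the distributed cache, and the class name `PutGetLoopSketch`, the key range and the operation mix (a get followed by a put) are assumptions. This is not the code of IgnitePutBenchmark or HazelcastPutGetBenchmark, which run inside Yardstick against a live cluster.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical local stand-in for a put / put-get throughput benchmark. */
class PutGetLoopSketch {
    static final int KEY_RANGE = 100_000; // assumed key range

    /** One "put" operation: write a random key. */
    static void put(ConcurrentMap<Integer, Integer> cache) {
        int key = ThreadLocalRandom.current().nextInt(KEY_RANGE);
        cache.put(key, key);
    }

    /** One "put-get" operation: read a random key, then write it (assumed mix). */
    static void putGet(ConcurrentMap<Integer, Integer> cache) {
        int key = ThreadLocalRandom.current().nextInt(KEY_RANGE);
        cache.get(key);
        cache.put(key, key);
    }

    /** Runs op during warm-up without counting, then returns measured ops/sec. */
    static double opsPerSecond(Runnable op, long warmupMillis, long measureMillis) {
        long warmupEnd = System.nanoTime() + warmupMillis * 1_000_000L;
        while (System.nanoTime() - warmupEnd < 0) {
            op.run();
        }
        long ops = 0;
        long start = System.nanoTime();
        long end = start + measureMillis * 1_000_000L;
        while (System.nanoTime() - end < 0) {
            op.run();
            ops++;
        }
        return ops / ((System.nanoTime() - start) / 1e9);
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, Integer> cache = new ConcurrentHashMap<>();
        // Short durations for illustration; the Yardstick configs use a 60 s warm-up and a 300 s run.
        System.out.printf("put:     %.0f ops/sec%n", opsPerSecond(() -> put(cache), 200, 1000));
        System.out.printf("put-get: %.0f ops/sec%n", opsPerSecond(() -> putGet(cache), 200, 1000));
    }
}
```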

Result:
Ignite and Hazelcast showed roughly the same performance, with Ignite about 4% to 7% faster in most runs.
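Assuming "X% faster" refers to relative throughput (the faster product's operations per second divided by the slower one's, minus one), the arithmetic is a one-liner. The Java sketch below uses hypothetical throughput figures, not numbers from these benchmark runs.

```java
/** Percentage by which throughput opsA exceeds the baseline throughput opsB. */
class ThroughputComparison {
    static double percentFaster(double opsA, double opsB) {
        if (opsB <= 0) {
            throw new IllegalArgumentException("baseline throughput must be positive");
        }
        return (opsA / opsB - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical throughputs in ops/sec, not measured results.
        double a = 107_000;
        double b = 100_000;
        System.out.printf("A is %.1f%% faster than B%n", percentFaster(a, b)); // prints "A is 7.0% faster than B"
    }
}
```

Note that the figure depends on the baseline: if A is 7% faster than B, B is about 6.5% slower than A, not 7%.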

Here are the graphs produced by Yardstick:

Basic Transaction Operations

We compared basic transactional puts and puts-and-gets into the cache in OPTIMISTIC mode.

The code used for the benchmark execution can be found on GitHub:

Apache Ignite: IgnitePutTxBenchmark and IgnitePutGetTxBenchmark.
Hazelcast: HazelcastPutTxBenchmark and HazelcastPutGetTxBenchmark.
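To illustrate what OPTIMISTIC mode means for a transactional put-get, here is a minimal Java sketch of generic optimistic concurrency control: reads record the version they saw, writes are buffered, and commit validates those versions under a lock, failing if another transaction committed first. A pessimistic transaction would instead lock each key on first access. The class `OptimisticTxSketch` and its single global lock are hypothetical simplifications; this is not the commit protocol of either Ignite or Hazelcast.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy optimistic transactions over a versioned in-memory map. */
class OptimisticTxSketch {
    /** Committed value plus a version bumped on every committed write. */
    record Versioned(Integer value, long version) {}

    private final Map<Integer, Versioned> store = new HashMap<>();

    /** A transaction: no locks are held until commit(). */
    class Tx {
        private final Map<Integer, Long> readVersions = new HashMap<>();
        private final Map<Integer, Integer> writes = new HashMap<>();

        Integer get(int key) {
            if (writes.containsKey(key)) {
                return writes.get(key); // read your own buffered write
            }
            Versioned v;
            synchronized (store) {
                v = store.get(key);
            }
            readVersions.putIfAbsent(key, v == null ? 0L : v.version());
            return v == null ? null : v.value();
        }

        void put(int key, int value) {
            writes.put(key, value); // buffered until commit
        }

        /** Validates all read versions, then applies writes; false means a conflict. */
        boolean commit() {
            synchronized (store) {
                for (Map.Entry<Integer, Long> e : readVersions.entrySet()) {
                    Versioned current = store.get(e.getKey());
                    long currentVersion = current == null ? 0L : current.version();
                    if (currentVersion != e.getValue()) {
                        return false;
                    }
                }
                for (Map.Entry<Integer, Integer> w : writes.entrySet()) {
                    Versioned old = store.get(w.getKey());
                    long next = (old == null ? 0L : old.version()) + 1;
                    store.put(w.getKey(), new Versioned(w.getValue(), next));
                }
                return true;
            }
        }
    }

    Tx begin() {
        return new Tx();
    }

    public static void main(String[] args) {
        OptimisticTxSketch grid = new OptimisticTxSketch();
        Tx t1 = grid.begin();
        Tx t2 = grid.begin();
        t1.get(1); // both transactions read key 1 (absent, version 0)
        t2.get(1);
        t1.put(1, 10);
        t2.put(1, 20);
        System.out.println("t1 commit: " + t1.commit()); // true
        System.out.println("t2 commit: " + t2.commit()); // false: key 1 changed after t2 read it
    }
}
```

In this sketch a failed commit simply returns `false`; a real client would typically retry the whole transaction.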

Result:
The performance difference for OPTIMISTIC transactions was much bigger, with Ignite transactions outperforming Hazelcast transactions by about 35% to 45%.

Here are the graphs produced by Yardstick:

In upcoming posts I will also compare the query performance of both products and post my findings.

7 comments:

Unknown, June 2, 2015 at 1:37 AM
I am curious to know the following about this benchmark:

1. Where are these operations performed from, the server or a client (for both Ignite and Hazelcast)? I am guessing from the server.

2. What was the duration of a benchmark run? The graphs show 60 seconds, but the configuration in the code says "Note that each benchmark is set to run for 300 seconds (5 mins) with warm-up set to 60 seconds (1 minute)".
Unknown, June 2, 2015 at 7:42 PM
1. The benchmarks were performed from the server, but I have also tried running them from a client and the results were similar.

2. When I ran the benchmarks, I shortened their duration, as doing so had no effect on the measured performance of either product.

The benchmarks are stored in the Yardstick repository and are fully transparent. You can experiment with different configurations and share your findings here.
Unknown, June 23, 2015 at 11:09 PM
Thanks for the responses. You mentioned that you used a partitioned cache with 1 backup (i.e. 1 primary and 1 backup); I am interested in more details:

1. What are the partition/bucket counts for the cache?

2. You used FULL_SYNC mode. In that case, does the thread performing an operation wait for the primary and all backups to be updated?

3. Is it a 'master-slave' model, where only the primary is responsible for processing events, or a 'master-master' model, where any partition can process events?

Thanks!
Unknown, October 28, 2015 at 11:08 AM
It seems you ran the benchmarks with HZ 3.4, which was the version available back in April. Version 3.5 included several fixes and improvements. Could you run the TX benchmarks again using 3.5?

Also, I can't see the transaction aspect in IgnitePutBenchmark.

Tuesday, April 14, 2015
