If you have ever looked at Apache Ignite, you have probably noticed that it is a fairly rich platform with lots of components. However, despite the extensive feature set, the Ignite community aims to make the platform easy to use and understand. Here is how the Ignite community defines its project:

Apache Ignite is
the in-memory computing platform
that is durable, strongly consistent, and highly available
with powerful SQL, key-value and processing APIs

So, in summary, Ignite looks like a distributed data store that can work both in memory and on disk, and provides SQL, key-value, and processing APIs to the data. Sounds simple enough. However, to get a complete picture, perhaps it is better to define Ignite by answering several "Is Ignite a ...?" questions:

Is Ignite a persistent or pure in-memory storage?

Both. Native persistence in Ignite can be turned on and off. When it is turned on, Ignite can store data sets bigger than the available memory. Essentially, the smaller operational data sets can be stored in memory only, and larger data sets that do not fit in memory can be stored on disk, using memory as a caching layer for better performance.


Is Ignite an in-memory database (IMDB)?

Yes. Even though Ignite durable memory works well both in memory and on disk, the disk persistence can be disabled and Ignite can act as a pure distributed in-memory database, with support for SQL and distributed joins.


Is Ignite an in-memory data grid (IMDG)?

Yes. Ignite is a full-featured data grid, which can be used either in pure in-memory mode or with Ignite native persistence. It can also automatically integrate with third-party databases, including any RDBMS or NoSQL stores.


Is Ignite a distributed database?

Yes. Data in Ignite is either partitioned or replicated across a cluster of multiple nodes. This provides scalability and adds resiliency to the system. Ignite automatically controls how data is partitioned. However, users can plug in their own distribution (affinity) functions and collocate various pieces of data together for efficiency.
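
To make the idea of partitioning and collocation a bit more concrete, here is a minimal sketch in plain Java. It does not use the Ignite API and it is not Ignite's actual affinity function. The class name `ToyAffinity`, the node names, and the simple modulo-based placement are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: NOT Ignite's affinity function or API.
// All names here (ToyAffinity, node-A, ...) are hypothetical.
final class ToyAffinity {
    private final int partitions;
    private final List<String> nodes;

    ToyAffinity(int partitions, List<String> nodes) {
        if (partitions <= 0 || nodes.isEmpty())
            throw new IllegalArgumentException("need at least one partition and one node");
        this.partitions = partitions;
        this.nodes = new ArrayList<>(nodes);
    }

    // Map an affinity key to a partition. Equal keys always map to the same partition.
    int partition(Object affinityKey) {
        return Math.floorMod(affinityKey.hashCode(), partitions);
    }

    // Map a partition to the node that owns it (simple round-robin placement).
    String nodeFor(int partition) {
        return nodes.get(partition % nodes.size());
    }

    // Convenience: the node that owns the partition of the given affinity key.
    String nodeForKey(Object affinityKey) {
        return nodeFor(partition(affinityKey));
    }

    public static void main(String[] args) {
        ToyAffinity aff = new ToyAffinity(1024, Arrays.asList("node-A", "node-B", "node-C"));

        // A person and that person's orders are collocated when both
        // use the person ID as their affinity key.
        Integer personId = 42;
        System.out.println("person  -> " + aff.nodeForKey(personId));
        System.out.println("orders  -> " + aff.nodeForKey(personId));
    }
}
```

The point of the sketch is that placement depends only on the affinity key. If related entries (say, a person and that person's orders) share the same affinity key, they land in the same partition and therefore on the same node, which is what makes collocated processing possible. A real affinity function also has to handle nodes joining and leaving the cluster, which this toy version ignores.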


Is Ignite an SQL database?

Not fully. Although Ignite aims to behave like any other relational SQL database, there are differences in how Ignite handles constraints and indexes. Ignite supports primary and secondary indexes, but uniqueness can only be enforced for the primary indexes. Ignite also does not support foreign key constraints.

Essentially, Ignite purposely does not support any constraints that would entail a cluster broadcast message for each update and significantly hurt performance and scalability of the system.


Is Ignite a transactional database?

Not fully. ACID transactions are supported, but only at the key-value API level. Ignite also supports cross-partition transactions, which means that transactions can span keys residing in different partitions on different servers. At the SQL level, Ignite supports atomic, but not yet transactional, consistency. The Ignite community plans to implement SQL transactions in version 2.4.


Is Ignite a key-value store?

Yes. Ignite provides a feature-rich key-value API that is JCache (JSR-107) compliant and supports Java, C++, and .NET.

You can find out more about Ignite by visiting the freshly redesigned Ignite website.

