Download - An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017

Introduction to Apache Ignite and GridGainMandhir Gidda

ROME 24-25 MARCH 2017

© 2015 The Apache Software Foundation. Apache, Apache Ignite, the Apache feather and the Apache Ignite logo are trademarks of The Apache Software Foundation.

In-Memory Computing Platform

Built on

MANDHIR GIDDAEMEA Solutions Architect

Agenda• GridGain & Apache Ignite Project• Ignite In-Memory Computing

Platform• Introduction to Clustering• Data Grid• Compute Grid, Service Grid &

Streaming• Hadoop & Spark Integration

• GridGain Roadmap• Q & A

© 2016 GridGain Systems, Inc.

What is Apache Ignite

High-performance distributed in-memory platform for computing and transacting on large-scale data sets in near real-time.


Apache Ignite Project - Recap• 2007: First version of GridGain (compute

grid)• Oct. 2014: GridGain contributes Ignite to

ASF• Aug. 2015: Ignite is the second fastest

project to graduate after Spark

• Today vs. Feb. 2016: • 28 more contributors: 88+ contributors• Huge development momentum -

Estimated 248 years of effort since the first commit in February, 2014 vs 192 year last Feb. [Openhub]

• 200k more SLOC and 2.5k more commits: 900k+ SLOC & more than 18.5k commits

February 2017

https://www.openhub.net/p/apache-ignite/estimated_cost


• What is GridGain Enterprise Edition?• Is a binary build of Apache Ignite™ created by

GridGain• Added enterprise features for enterprise

deployments• Earlier features and bug fixes by a few weeks• Heavily tested


Customer Use CasesAutomated Trading

SystemsReal time analysis of trading positions & market risk. High volume transactions, ultra low latencies.

Financial ServicesFraud Detection, Risk Analysis, Insurance rating and modelling.

Online & Mobile AdvertisingStream processing, geo-targeting & personalisation & segmentation

Big Data/Visual AnalyticsCustomer 360 view, real-time analysis of KPIs, up-to-the-second operational BI.

Online Gaming Real-time back-ends for mobile and massively parallel games.

SaaS Platforms & AppsHigh performance next-generation architectures for Software as a Service Application vendors.

Travel & E-CommerceHigh performance next-generation architectures for online hotel booking.


What In-Memory Capabilities are Supported?

‣ HPC‣ Machine learning‣ Risk analysis‣ Grid computing

‣ HA API Services‣ Scalable

Middleware

‣ Web-session clustering

‣ Distributed caching‣ In-Memory SQL

‣ Real-time Analytics‣ Big Data‣ Monitoring tools

‣ Big Data‣ Realtime Analytics‣ Batch processing

‣ Distributed In-Memory File System

‣ Node2Node & Topic-based Messaging

‣ Fault Tolerance‣ Multiple backups‣ Cluster groups‣ Auto Rebalancing

‣ Complex event processing

‣ Event driven design

‣ Distributed queues‣ Atomic variables‣ Dist. Semaphore


Introduction to Clustering


Definitions and TerminologyAn Ignite cluster is a group of Ignite nodes working together to accomplish tasks like distributed compute and caching

An Ignite node is a single Ignite process running in a JVM

Many Ignite nodes can live on one physical server or JVM

Ignite nodes can be Clients or Servers . Nodes can be named and logically grouped into Cluster Groups

Server/VM/Container Server/VM/Container

JVM

Ignite

JVM

Ignite

Ignite

JVM

Ignite…


Shared-nothing architecture involves multiple identical nodes forming a cluster with no single master or coordinator

All nodes in a shared-nothing cluster run the exact same processes

Nodes communicate using message passing as Peer to Peer

Unlike Master-Slave, No single point of failure or bottleneck = Linearly scalable & highly available

Operational efficiency at scale

Ignite Clustering

Server/VM/Container

JVMIgnite

Server/VM/Container

JVMIgnite

Server/VM/Container

JVMIgnite

Server/VM/Container

JVMIgnite


An Ignite node can be started as a client or a server.

Server nodes participate in caching and computations. Client nodes can also participate in computations.

Client nodes are used for IgniteAPI operations from the client side such as cache operations, issuing SQL, transactions, and data streaming.

Ignite Clients & Servers

Server Server

Server

Server

ClientClient Client

Data & Compute Nodes

Client Connectors


– Distributed Key-Value Store– Pluggable SPI Design

– E.g FailoverSPI, LoadBalancerSPI– Fault Tolerance and Scalability

– CacheMode, DiscoverySPI – SQL Queries (ANSI 99)– ACID Transactions– In-Memory Indexes– RDBMS / NoSQL Integration– 100% JCache Compliant (JSR 107)

High-level Architecture


In-Memory Data Grid


Data Grid: Cache Modes & Horizontal Scaling

Replicated Cache

All data replicated to each node

Highest availability topology

Every data update propagated to all nodes, scaling can become an issue

Best for scenarios where dataset is small, and high read activity > 80%


Data Grid: Cache Modes & Horizontal Scaling

Partitioned Cache

Most scalable distributed topology

Dataset is divided into partitions, and distributed equally amongst nodes (x TB)

Updates are cheap, 1x Primary partition and optionally 1+ backup partition

Reads expensive if data is not collocated

Near-Cache use is more relevant

Best for large datasets, frequent updates


• Vertical Scale – OnHeap, OffHeap, OffHeap_Values,

Swap Space• Avoid Java GC Collection

Pauses• Off-Heap Indexes

– OffHeap, OffHeap_Values• Full RAM Utilization• I/O + Network Efficiency

• OffHeap to disk or network -> zero copies. OS optimized

• Simple Configuration

Data Grid: Off-Heap Memory


Data Grid: External Persistence

• Read-through & Write-through

• Support for Write-behind• Configurable eviction

policies• LRU, FIFO, Sorted, TTL

• DB schema mapping wizard:• Generates all the XML configuration

and Java POJOs


Data Grid: Cache APIs• Predicate-based Scan Queries• Text Queries based on Lucene

indexing• Query configuration using annotations,

Spring XML or simple Java code• SQL Queries: ANSI-99 Compliant• Memcached (PHP, Java, Python,

Ruby)• HTTP REST API• JDBC & ODBC


• ANSI-99 SQL• In-Memory Indexes (On and

Off-Heap)• Automatic Group By,

Aggregations, Sorting• Cross-Cache Joins, Unions• Use local H2 engine

Data Grid: SQL Support (ANSI 99)

© 2017 GridGain Systems, Inc. GridGain Company Confidential

In-Memory SQL Grid (aka. IMDB)• JDBC and ODBC as a connection point• SQL for data access• DML for Data Modification• DDL for Cache and Index Management• Advantages

– GridGain as a distributed SQL based storage– No need to rewrite application logic– Support of variety of tools and languages– Ad-hoc queries (GeoSpatial)

• Available in Apache Ignite (open source)

Application Layer

GridGain Cluster ANSI SQL-99 DML DDL

ODBC/JDBC

ACID Transactions

Data Layer

Persistent Storage


Data Grid: Transactions• Fully ACID• Support for Transactional &

Atomic • Cross-cache transactions• Optimistic and Pessimistic

concurrency modes with multiple isolation levels

• Deadlock protection (Serializable)• JTA Integration


Distributed Java Structures

Use of java.util.Concurrent• Distributed Map (cache)• Distributed Set• Distributed Queue• CountDownLatch• AtomicLong• AtomicSequence• AtomicReference• Distributed

ExecutorService


Continuous Queries• Execute a query and get

notified on data changes captured in the filter

• Remote filter to evaluate event and local listener to receive notification

• Guarantees exactly once delivery of an event


• IgniteDataStreamer - Collocation & Indexing, millions of objects/sec

• StreamReceiver/StreamVisitor• Sliding Windows for

CEP/Continuous Query,• JMS, Kafka, MQTT, Flume,

Camel data streamer integrations

• Real-time visual analytics/BI

Streaming and CEP


• Create chains of event processors & transform an object through various states• Synchronous or asynchronous execution of remote filters & listeners with thread control

Payment Validator Payment Verifier Payment Processor

Ignite Cache

Event Processing using Ignite (SEDA)


Event Processing using Ignite


Data Grid: Tiered Memory & Local Store

• Tiered Memory• On-Heap -> Off-Heap -> Swap

(volatile)• Persistent On-Disk Store• Fast Recovery• Local Data Reload

• Eliminate Network and Db impacts when reloading in-memory store


Ultimate EditionGridGain Disk Storage


Current GridGain Model

Application LayerMobile Cloud/SaaS Social IoT Enterprise Applications

Data LayerRDBMS NoSQL Hadoop

In-Memory Computing Layer Data Grid Compute Grid Streaming Service Grid ANSI SQL-99 File System Spark Integration

Unified API

ACID Transactions


Current GridGain Model

Application LayerMobile Cloud/SaaS Social IoT Enterprise Applications

Data LayerRDBMS NoSQL Hadoop

In-Memory Computing Layer Data Grid Compute Grid Streaming Service Grid ANSI SQL-99 File System DML/DDL Spark Integration

Unified API

ACID Transactions

Data Layer Disk Storage


• All the features of Enterprise Edition Including

– Persistent Data Store– Ability to query memory and

disk together – Instantaneous Restarts– Full and Incremental Cluster

Snapshots– Point-In-Time Recovery

• Special license is required

Ultimate Edition: Features Set


• Multiple (up to 32) Data Centres

• Complex Replication Technologies

• Active-Active & Active-Passive

• Smart Conflict Resolution• Durable Persistent Queues• Automatic Throttling• GridGain Enterprise

Data Grid: DC Replication


In-Memory Compute Grid & the rest


Client-Server vs. Affinity Colocation

12

43 Data 1

Job 1

2

3Data 2

Job 2

ProcessingNode 1

ProcessingNode 2

ClientNode

Data Node 1

Data Node 2

ProcessingNode 1

1

3

4Data 1

Data 2

2

2

1. Initial Request2. Fetch data from remote

nodes3. Process entire data-set4. Return to client

1. Initial Request2. Co-locating processing with data3. Return partial result4. Reduce & return to client


• Direct API for MapReduce (ComputeTask, map(), result(), reduce() )

• Cron-like Task Scheduling• State Checkpoints• Load Balancing

• Round-robin• Random & weighted• Probe

• Automatic Failover - FailoverSPI• Per-node Shared State• Zero Deployment

• P2P distributed class loading• Inter-node bytecode transfer

In-Memory Compute Grid


• Distributed Closures• Java lambda expressions• Java Runnable(s)/Callable(s)

• ExecutorService (JDK)• Distributed, Fault Tolerant,

Load Balanced• Sync or Async• Task Deployment (GAR)

In-Memory Compute Grid


Messaging & Events

• Topic-based messaging• Ordered & Unordered

messages• Local & Remote message

listeners• Local & Remote event

listeners• Trigger actions from any cluster

events or operations• Query events via

IgniteEvents API• Full Events logging not best

practice


• Deploy arbitrary user-defined services on cluster

• Control (SLA) how many service instances are deployed on each cluster or node

• Automatically ensure deployment and fault tolerance

• Singletons on the Cluster– Cluster Singleton– Node Singleton– Key Singleton

• Guaranteed Availability– Auto Redeployment in Case of

Failures

In-Memory Service Grid


• Resilience - Build an in-memory resilient service layer between your client application and the grid

• Shielding- Only expose application APIs and not direct grid APIs

• Continuations - Call services internally via compute tasks to create service chains

In-Memory Service Grid


Hadoop & Spark Integration


• Ignite In-Memory File System (IGFS)

– Hadoop-compliant– Easy to Install– On-Heap and Off-Heap– Caching Layer for HDFS– Write-through and Read-through

HDFS– Any Hadoop distribution– Performance Boost

IGFS: In-Memory File System

MR HIVE PIG

In-Memory MapReduce

IGFS

HDFS

IGFS

YARN }Any Hadoop Distro


Hadoop Accelerator: Map Reduce

• In-Memory Performance• Zero Code Change

• Use existing MR code• Use existing Hive queries

• No Name Node• No Network Noise• In-Process Data Colocation• Eager Push Scheduling

User Application

Hadoop Client

Ignite Client

Hadoop Jobtracker

Hadoop Name Node

Hadoop Tasktracker

Hadoop Tasktracker

Ignite Data Node (IGFS)

Ignite Data Node (IGFS)

Hadoop Data Node

(HDFS)

Hadoop Data Node

(HDFS)Ignite PathHadoop Path


• IgniteRDD – Share RDD across jobs on

the host– Share RDD across jobs in

the application– Share RDD globally

• Faster SQL– In-Memory Indexes– SQL on top of Shared RDD

Spark & Ignite IntegrationSpark Application

Spark Worker

Spark Job

Spark Job

Ignite Node

Yarn Mesos Docker Cloud

Server

Spark Worker

Spark Job

Spark Job

Ignite Node

Server

Spark Worker

Spark Job

Spark Job

Ignite Node

Server

In-Memory Shared RDDs


• Docker• Amazon AWS• Azure Marketplace• Google Cloud• Apache JClouds• Mesos• YARN• Apache Karaf (OSGi)

Deployment


Thank You!

www.gridgain.comhttps://ignite.apache.org

@gridgain #gridgainThank you for joining us. Follow the

conversation.

Author: Mandhir Gidda

http://www.gridgain.com/

https://ignite.apache.org/

Download - An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017

Top Related