In-Memory Computing Essentials for Architects and Engineers

Upload: denis-magda

Post on 21-Jan-2018

TRANSCRIPT

Page 1: In-Memory Computing Essentials for Architects and Engineers

© 2017 GridGain Systems, Inc.

In-Memory Performance

Durability of Disk

Page 2: In-Memory Computing Essentials for Architects and Engineers

In-Memory Computing Essentials

for Java Developers

Denis Magda

Ignite PMC Chair

GridGain Director of Product Management

Page 3: In-Memory Computing Essentials for Architects and Engineers

Agenda

• Apache Ignite Overview

• Clustering and Deployment

• Distributed Storage

• Distributed SQL

• Distributed Computations

• Machine Learning

• Memory Architecture & Persistence

Page 4: In-Memory Computing Essentials for Architects and Engineers

Apache Ignite In-Memory Computing Platform

• Use cases: Financial Services, E-Commerce, Travel & Logistics, Pharma & Healthcare, Telco, IoT

• APIs: SQL, Transactions, Compute, Services, ML, Streaming, Key/Value

• Memory-Centric Storage

• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)

• Third-Party Persistence (RDBMS, HDFS, NoSQL)

Page 5: In-Memory Computing Essentials for Architects and Engineers

Clustering and Deployment

Page 6: In-Memory Computing Essentials for Architects and Engineers

Clustering

• Server Nodes

    • Act as containers for data and computations

    • Generally started as standalone processes

• Client Nodes

    • Provide a cluster entry point to run operations

    • Embedded in application code

Page 7: In-Memory Computing Essentials for Architects and Engineers

Deployment

• Nodes are logical entities

    • A node runs in a JVM process

    • Many nodes can run in a single JVM process

• On-Premise and Cloud

    • Physical server or VM

    • AWS, Azure, Google Compute Engine

    • Kubernetes, Mesos, YARN

Page 8: In-Memory Computing Essentials for Architects and Engineers

Distributed Storage

Page 9: In-Memory Computing Essentials for Architects and Engineers

Distributed Storage

• Distributed Key-Value Store: distributed partitioned hash map

• ACID Transactions

• JCache & SQL

• Dynamic Scaling

• 3rd party storage caching: RDBMS, NoSQL, HDFS

[Diagram: JCache, Transactions, Compute and SQL APIs over three server nodes, each with Durable Memory.]

Page 10: In-Memory Computing Essentials for Architects and Engineers

Where Does an Entry Go?

put(key, value) → Ignite Node 1 or Ignite Node 2?

Page 11: In-Memory Computing Essentials for Architects and Engineers

Key to Node Mapping

Key → Partition → Server Node (on-disk)
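The two-step lookup on this slide can be sketched in plain Java. This is a simplified illustration only: the class name, the modulo-based hashing and the owner list are assumptions made for the example, not Ignite's actual affinity function.

```java
import java.util.List;

/** Simplified key -> partition -> node lookup (illustrative, not Ignite's algorithm). */
final class KeyToNodeSketch {
    /** Maps a key to a partition number in [0, partitions). */
    static int partitionFor(Object key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Looks up the node that owns the given partition. */
    static String nodeFor(int partition, List<String> partitionOwners) {
        return partitionOwners.get(partition);
    }

    public static void main(String[] args) {
        // Hypothetical assignment of 4 partitions to 2 nodes.
        List<String> owners = List.of("node-1", "node-2", "node-1", "node-2");
        int partition = partitionFor(42, owners.size());
        System.out.println("key 42 -> partition " + partition + " -> " + nodeFor(partition, owners));
    }
}
```

Because the key is first mapped to a partition and only then to a node, moving a partition to another node does not change which partition a key belongs to.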

Page 12: In-Memory Computing Essentials for Architects and Engineers

Caches and Partitions

Cache

• Partition 1: (K1, V1), (K2, V2), (K3, V3), (K4, V4)

• Partition 2: (K5, V5), (K6, V6), (K7, V7), (K8, V8), (K9, V9)

Page 13: In-Memory Computing Essentials for Architects and Engineers

Partitions Distribution

• Ignite Node 1: partitions 0, 2, 4, 6, 8, 10, 12, 14

• Ignite Node 2: partitions 1, 3, 5, 7, 9, 11, 13, 15
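The even/odd split shown here is what a plain round-robin assignment of 16 partitions to 2 nodes would produce. The sketch below reproduces that picture under this assumption; the class and method names are made up for the example, and a real cluster may distribute partitions differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Round-robin partition distribution (an assumed strategy, used only to mirror the slide). */
final class PartitionDistributionSketch {
    /** Returns, for each node index, the list of partition numbers it owns. */
    static List<List<Integer>> distribute(int partitions, int nodes) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            owned.get(p % nodes).add(p); // partition p goes to node (p mod nodes)
        }
        return owned;
    }

    public static void main(String[] args) {
        List<List<Integer>> owned = distribute(16, 2);
        System.out.println("Ignite Node 1: " + owned.get(0));
        System.out.println("Ignite Node 2: " + owned.get(1));
    }
}
```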

Page 14: In-Memory Computing Essentials for Architects and Engineers

Where Does an Entry Go?

put(key, value) → ?

• Ignite Node 1: partitions 0, 2, 4

• Ignite Node 2: partitions 1, 3, 5

Page 15: In-Memory Computing Essentials for Architects and Engineers

Where Does an Entry Go?

put(key, value)

• Ignite Node 1: partitions 0, 2, 4

• Ignite Node 2: partitions 1, 3, 5

Page 16: In-Memory Computing Essentials for Architects and Engineers

Backup Copies

[Diagram: four Ignite nodes, each holding one primary partition (0, 1, 2, 3).]

Page 17: In-Memory Computing Essentials for Architects and Engineers

Backup Copies

[Diagram: the same four Ignite nodes with primary partitions 0, 1, 2, 3; each node additionally holds a backup copy of one partition (0, 1, 2, 3).]
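One simple way to picture primary and backup placement is a ring: the backup of a partition lives on the node after its primary. The sketch below assumes exactly that rule, which is an illustration rather than the placement Ignite computes; the names are hypothetical.

```java
/** Ring-style primary/backup placement (an assumed rule for illustration). */
final class BackupCopiesSketch {
    /** Node index holding the primary copy of a partition. */
    static int primaryNode(int partition, int nodes) {
        return partition % nodes;
    }

    /** Node index holding the backup copy: the next node in the ring. */
    static int backupNode(int partition, int nodes) {
        // With two or more nodes this never equals the primary node.
        return (primaryNode(partition, nodes) + 1) % nodes;
    }

    public static void main(String[] args) {
        int nodes = 4;
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + ": primary on node " + primaryNode(p, nodes)
                + ", backup on node " + backupNode(p, nodes));
        }
    }
}
```

With one backup per partition, losing any single node still leaves a copy of every partition elsewhere in the cluster.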

Page 18: In-Memory Computing Essentials for Architects and Engineers

Distributed SQL

Page 19: In-Memory Computing Essentials for Architects and Engineers

Distributed SQL

• DDL, DML Support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER

• Cross-platform Compatibility: JDBC, ODBC, SQL API; Java, .NET, C++, BI Tools

• Indexes in RAM or Disk

• Dynamic Scaling

[Diagram: Apache Ignite Cluster of three server nodes, each with Durable Memory.]

Page 20: In-Memory Computing Essentials for Architects and Engineers

Connectivity

• JDBC

• ODBC

• REST

• Java, .NET and C++ APIs

import java.sql.Connection;
import java.sql.DriverManager;

// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");

// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.0.50");

./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/

Page 21: In-Memory Computing Essentials for Architects and Engineers

Data Definition Language

• CREATE/DROP TABLE

• CREATE/DROP INDEX

• ALTER TABLE

• Changes Durability

• Ignite Native Persistence

CREATE TABLE `city` (
  `ID` INT(11),
  `Name` CHAR(35),
  `CountryCode` CHAR(3),
  `District` CHAR(20),
  `Population` INT(11),
  PRIMARY KEY (`ID`, `CountryCode`)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";

Page 22: In-Memory Computing Essentials for Architects and Engineers

Data Manipulation Language

• ANSI-99 specification

• Fault-tolerant and consistent

• INSERT, UPDATE, DELETE

• SELECT

• JOINs

• Subqueries

SELECT country.name, city.name, MAX(city.population) as max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;

Page 23: In-Memory Computing Essentials for Architects and Engineers

Affinity Collocation

Tables: Country, City, Language

• key (country = 5) → Partition 10

• key (cityId = 10, countryId = 5) → Partition 10

• key (cityId = 11, countryId = 9) → Partition 12

[Diagram: two server nodes, each with on-disk storage.]
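The point of the slide is that a City entry is placed by its countryId, not by its full key, so it lands in the same partition as its Country. A minimal sketch of that idea follows. The partition count and the modulo hashing are assumptions for the example, so the partition numbers it yields differ from the 10 and 12 shown on the slide.

```java
/** Affinity-key based placement: cities follow their country (illustrative hashing). */
final class AffinitySketch {
    static final int PARTITIONS = 1024; // assumed partition count

    /** Partition of a Country entry, derived from the country ID. */
    static int countryPartition(int countryId) {
        return Math.floorMod(Integer.hashCode(countryId), PARTITIONS);
    }

    /** Partition of a City entry: only countryId (the affinity key) is used, cityId is ignored. */
    static int cityPartition(int cityId, int countryId) {
        return countryPartition(countryId);
    }

    public static void main(String[] args) {
        System.out.println("country 5         -> partition " + countryPartition(5));
        System.out.println("city 10, country 5 -> partition " + cityPartition(10, 5));
        System.out.println("city 11, country 9 -> partition " + cityPartition(11, 9));
    }
}
```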

Page 24: In-Memory Computing Essentials for Architects and Engineers

Collocated Joins

1. Initial Query

2. Query execution over local data

3. Reduce multiple results into one

SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';

• Ignite Node: Canada (Toronto, Ottawa, Montreal, Calgary)

• Ignite Node: India (Mumbai, New Delhi)

Page 25: In-Memory Computing Essentials for Architects and Engineers

Non-Collocated Joins

1. Initial Query

2. Query execution (local + remote data)

3. Potential data movement

4. Reduce multiple results into one

SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';

• Ignite Node: Canada (Toronto, Calgary)

• Ignite Node: India (Mumbai, New Delhi), plus Montreal and Ottawa

• Step 3 moves Montreal and Ottawa between the nodes
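The difference between the two join modes can be sketched with two in-memory "nodes". In the layout below, taken from this slide, a purely local join finds only two Canadian cities, while pulling City rows from the other node finds all four. The class, the method names and the numeric country IDs are hypothetical; this models the idea, not Ignite's query engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Local-only join versus join with data movement (illustrative model). */
final class JoinSketch {
    /** Data held by one node. */
    static final class Node {
        final Map<Integer, String> countries = new TreeMap<>(); // country id -> country name
        final Map<String, Integer> cities = new TreeMap<>();    // city name -> country id
    }

    /** Joins the given countries and cities, keeping rows for one country name. */
    static List<String> join(Map<Integer, String> countries, Map<String, Integer> cities, String countryName) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> city : cities.entrySet()) {
            String country = countries.get(city.getValue());
            if (countryName.equals(country)) {
                rows.add(country + ", " + city.getKey());
            }
        }
        return rows;
    }

    /** Collocated mode: every node joins only its own data, then results are merged. */
    static List<String> localOnly(List<Node> nodes, String countryName) {
        List<String> result = new ArrayList<>();
        for (Node node : nodes) {
            result.addAll(join(node.countries, node.cities, countryName));
        }
        return result;
    }

    /** Non-collocated mode: City rows from all nodes are made visible to every node before joining. */
    static List<String> withDataMovement(List<Node> nodes, String countryName) {
        Map<String, Integer> allCities = new TreeMap<>();
        for (Node node : nodes) {
            allCities.putAll(node.cities); // stands in for the potential data movement
        }
        List<String> result = new ArrayList<>();
        for (Node node : nodes) {
            result.addAll(join(node.countries, allCities, countryName));
        }
        return result;
    }

    /** The slide's layout: two Canadian cities sit on the node that holds India. */
    static List<Node> slideLayout() {
        Node first = new Node();
        first.countries.put(1, "Canada");
        first.cities.put("Toronto", 1);
        first.cities.put("Calgary", 1);
        Node second = new Node();
        second.countries.put(2, "India");
        second.cities.put("Mumbai", 2);
        second.cities.put("New Delhi", 2);
        second.cities.put("Montreal", 1);
        second.cities.put("Ottawa", 1);
        return List.of(first, second);
    }
}
```

When the data is collocated, `localOnly` already returns the complete result, which is why collocated joins avoid the extra network traffic.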

Page 26: In-Memory Computing Essentials for Architects and Engineers

Distributed Computations

Page 27: In-Memory Computing Essentials for Architects and Engineers

Compute Grid

• Automatic Failover

• Load Balancing

• Zero Deployment

C = C1 + C2

R = R1 + R2

(C = Compute, R = Result), in T/2 time

[Diagram: Ignite Cluster of two nodes with Durable Memory; one runs C1 and returns R1, the other runs C2 and returns R2.]
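The C = C1 + C2, R = R1 + R2 idea can be sketched on a single machine, with two asynchronous tasks standing in for two cluster nodes. This uses only the JDK's `CompletableFuture`; it is an analogy for the split-and-combine pattern, not Ignite's compute API, and the class name is made up.

```java
import java.util.concurrent.CompletableFuture;

/** Splits one computation into two halves and combines the partial results. */
final class ComputeSplitSketch {
    /** Sums the integers in [from, to]. */
    static long sumRange(long from, long to) {
        long sum = 0;
        for (long i = from; i <= to; i++) {
            sum += i;
        }
        return sum;
    }

    /** Sums 1..n by running the two halves of the range in parallel. */
    static long parallelSum(long n) {
        long mid = n / 2;
        CompletableFuture<Long> r1 = CompletableFuture.supplyAsync(() -> sumRange(1, mid));     // C1 -> R1
        CompletableFuture<Long> r2 = CompletableFuture.supplyAsync(() -> sumRange(mid + 1, n)); // C2 -> R2
        return r1.join() + r2.join();                                                           // R = R1 + R2
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000));
    }
}
```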

Page 28: In-Memory Computing Essentials for Architects and Engineers

Client-Server Processing

1. Initial Request

2. Fetch data from remote nodes

3. Process entire data-set

Co-located Processing

1. Initial Request

2. Co-located processing with data

3. Reduce multiple results in one

[Diagram: in both cases a client node talks to two server nodes with on-disk storage.]
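A rough way to compare the two approaches is to count how many values have to cross the network. In the sketch below each inner list plays the role of one server node's data; the `Result` type and the method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Compares data transferred by client-server and co-located processing (toy model). */
final class ProcessingModesSketch {
    /** The computed sum and how many values crossed the "network". */
    static final class Result {
        final long sum;
        final int valuesTransferred;

        Result(long sum, int valuesTransferred) {
            this.sum = sum;
            this.valuesTransferred = valuesTransferred;
        }
    }

    /** Client-server: fetch every value to the client, then process the entire data set there. */
    static Result clientServer(List<List<Integer>> nodes) {
        List<Integer> fetched = new ArrayList<>();
        for (List<Integer> nodeData : nodes) {
            fetched.addAll(nodeData); // step 2: fetch data from remote nodes
        }
        long sum = 0;
        for (int value : fetched) {
            sum += value; // step 3: process entire data set
        }
        return new Result(sum, fetched.size());
    }

    /** Co-located: each node sums its own data and only the partial result travels. */
    static Result colocated(List<List<Integer>> nodes) {
        long sum = 0;
        int transferred = 0;
        for (List<Integer> nodeData : nodes) {
            long partial = 0;
            for (int value : nodeData) {
                partial += value; // step 2: runs next to the data
            }
            sum += partial; // step 3: reduce partial results
            transferred++;
        }
        return new Result(sum, transferred);
    }
}
```

Both modes return the same sum; the co-located variant transfers one value per node instead of the whole data set.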

Page 29: In-Memory Computing Essentials for Architects and Engineers

Machine Learning

Page 30: In-Memory Computing Essentials for Architects and Engineers

Genetic Algorithm Grid

• Biological Evolution Simulation

• Chromosome and Genes Cluster

• Collocated Computation

F = Fitness Calculation

C = Crossover

M = Mutation

F = F1 + F2

C = C1 + C2

M = M1 + M2

[Diagram: Ignite Cluster of two nodes with Durable Memory; one runs F1, C1, M1 and the other runs F2, C2, M2.]
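For readers unfamiliar with the three operators, here is a minimal single-machine sketch of F, C and M on boolean chromosomes. The "count the true genes" fitness function, the single-point crossover and the names are assumptions chosen for brevity; the slide does not specify which variants the grid uses.

```java
import java.util.Random;

/** Toy versions of the three genetic operators named on the slide. */
final class GeneticOperatorsSketch {
    /** F: fitness is the number of genes set to true (a toy objective). */
    static int fitness(boolean[] chromosome) {
        int score = 0;
        for (boolean gene : chromosome) {
            if (gene) {
                score++;
            }
        }
        return score;
    }

    /** C: single-point crossover; genes before the cut come from a, the rest from b (equal lengths assumed). */
    static boolean[] crossover(boolean[] a, boolean[] b, int cut) {
        boolean[] child = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            child[i] = i < cut ? a[i] : b[i];
        }
        return child;
    }

    /** M: returns a copy in which each gene is flipped with the given probability. */
    static boolean[] mutate(boolean[] chromosome, double rate, Random random) {
        boolean[] mutated = chromosome.clone();
        for (int i = 0; i < mutated.length; i++) {
            if (random.nextDouble() < rate) {
                mutated[i] = !mutated[i];
            }
        }
        return mutated;
    }
}
```

In a distributed setting each node would apply these operators to its own share of the population, which is what the F1/F2, C1/C2 and M1/M2 labels suggest.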

Page 31: In-Memory Computing Essentials for Architects and Engineers

Machine Learning Grid

• Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest

• Multi-Language Support: R, C++, Python, Java, Scala, REST

• Dense and Sparse Algebra

• Large Scale Parallelization

• No ETL

[Diagram: Distributed Core Algebra on top of three server nodes, each with Durable Memory.]

Page 32: In-Memory Computing Essentials for Architects and Engineers

Memory Architecture & Persistence

Page 33: In-Memory Computing Essentials for Architects and Engineers

Durable Memory

• Off-heap: removes noticeable GC pauses

• Automatic Defragmentation

• Stores Superset of Data

• Predictable memory consumption

• Fully Transactional (Write-Ahead Log)

• Instantaneous Restarts

[Diagram: Ignite Cluster of three server nodes, each with Durable Memory.]

Page 35: In-Memory Computing Essentials for Architects and Engineers

Regions and Segments

• Memory split into regions

• Regions split into segments

• Segments include pages

Page 36: In-Memory Computing Essentials for Architects and Engineers

B+Tree

• Self-balancing tree

• Memory & Disk

• Sorted Index

• Secondary Indexes

• Hash Index

• Primary Keys

• Hash code based sorting

Page 37: In-Memory Computing Essentials for Architects and Engineers

Free Lists

• Track pages with approximately equal free space

    • 25% free

    • 75% free

• Essential for updates

    • Give a page with the minimum free space needed

    • Reduce fragmentation

    • Lower page compaction activity
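The bucket idea behind a free list can be sketched as follows: pages are grouped by how much free space they have, and a request is served from the smallest bucket that is guaranteed to fit. The page size, the four-bucket granularity and all names are assumptions for this example, not Ignite internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

/** Free list that groups pages into buckets of roughly equal free space (illustrative). */
final class FreeListSketch {
    static final int PAGE_SIZE = 4096;            // assumed page size in bytes
    static final int BUCKET_SPAN = PAGE_SIZE / 4; // each bucket covers 25% of a page

    // bucket index -> ids of pages whose free space falls into that bucket
    private final TreeMap<Integer, Deque<Integer>> buckets = new TreeMap<>();

    /** Registers a page under the bucket that matches its free space. */
    void put(int pageId, int freeBytes) {
        buckets.computeIfAbsent(freeBytes / BUCKET_SPAN, b -> new ArrayDeque<>()).add(pageId);
    }

    /** Removes and returns a page guaranteed to fit the request, or -1 if none is tracked. */
    int take(int neededBytes) {
        // Round up so that every page in the chosen bucket is large enough.
        // A page in a lower bucket might also fit, but the sketch does not check that.
        int firstBucket = (neededBytes + BUCKET_SPAN - 1) / BUCKET_SPAN;
        for (Deque<Integer> pages : buckets.tailMap(firstBucket, true).values()) {
            if (!pages.isEmpty()) {
                return pages.poll();
            }
        }
        return -1;
    }
}
```

Searching from the smallest suitable bucket upward keeps nearly empty pages available for large entries, which is one way to read the "reduces fragmentation" bullet.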

Page 38: In-Memory Computing Essentials for Architects and Engineers

Ignite Native Persistence

1. Update (RAM)

2. Persist (Write-Ahead Log)

3. Ack

4. Checkpointing (Partition File 1 ... Partition File N)

[Diagram: the write path inside one Server Node.]
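The four steps above can be modeled with in-memory collections standing in for RAM, the write-ahead log and the partition files. This is a sketch of the ordering only: the names are hypothetical, nothing is written to disk, and clearing the log after a checkpoint is a simplifying assumption of the model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Models the update -> WAL -> ack -> checkpoint ordering (no real disk I/O). */
final class WriteAheadSketch {
    final Map<String, String> ram = new HashMap<>();           // in-memory data
    final List<String> wal = new ArrayList<>();                // append-only write-ahead log
    final Map<String, String> partitionFile = new HashMap<>(); // stand-in for partition files
    private final Set<String> dirtyKeys = new HashSet<>();     // updated since the last checkpoint

    /** Steps 1 to 3: update RAM, append the change to the WAL, then acknowledge. */
    boolean put(String key, String value) {
        ram.put(key, value);        // 1. Update
        wal.add(key + "=" + value); // 2. Persist
        dirtyKeys.add(key);
        return true;                // 3. Ack
    }

    /** Step 4: copy dirty entries from RAM to the partition file. */
    void checkpoint() {
        for (String key : dirtyKeys) {
            partitionFile.put(key, ram.get(key));
        }
        dirtyKeys.clear();
        wal.clear(); // assumption of this model: logged changes are no longer needed after a checkpoint
    }
}
```

The acknowledgement in step 3 depends only on the log append, so an update does not wait for the slower checkpoint in step 4.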

Page 39: In-Memory Computing Essentials for Architects and Engineers

Any Questions?

Thank you for joining us. Follow the conversation.

http://ignite.apache.org

#apacheignite

#denismagda