In-Memory Computing Essentials for Architects and Engineers
TRANSCRIPT
© 2017 GridGain Systems, Inc.
In-Memory Performance
Durability of Disk
In-Memory Computing Essentials
for Java Developers
Denis Magda
Ignite PMC Chair
GridGain Director of Product Management
Agenda
• Apache Ignite Overview
• Clustering and Deployment
• Distributed Storage
• Distributed SQL
• Distributed Computations
• Machine Learning
• Memory Architecture & Persistence
Apache Ignite In-Memory Computing Platform
Memory-Centric Storage
Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
Third-Party Persistence (RDBMS, HDFS, NoSQL)
SQL, Transactions, Compute, Services, ML, Streaming, Key/Value
IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
Clustering and Deployment
Clustering
• Server Nodes
  • Act as containers for data and computations
  • Generally started as standalone processes
• Client Nodes
  • Provide a cluster entry point to run operations
  • Embedded in application code
Deployment
• Nodes are logical entities
  • Each node runs in a JVM process
  • Many nodes can run in a single JVM process
• On-Premises and Cloud
  • Physical server or VM
  • AWS, Azure, Google Compute Engine
  • Kubernetes, Mesos, YARN
Distributed Storage
Distributed Key-Value Store
JCache, Transactions, Compute, SQL
• Distributed partitioned hash map
• Dynamic Scaling
• ACID Transactions
• JCache & SQL
• 3rd party storage caching (RDBMS, NoSQL, HDFS)
Server Node (DURABLE MEMORY) x 3
Where Does an Entry Go?
put(key, value)
Ignite Node 1 or Ignite Node 2?
Key to Node Mapping
Key -> Partition -> Server Node (ON-DISK)
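The two-step lookup on this slide (a key maps to a partition, and the partition maps to a node) can be sketched in plain Java. This is a simplified illustration, not Ignite's implementation: the class name `KeyToNodeSketch`, the partition count of 16, the hash-modulo rule, and the owner table are all assumptions made for the example.

```java
// Simplified sketch of the key -> partition -> node lookup; not Ignite's implementation.
class KeyToNodeSketch {
    // Assumed partition count for the example.
    static final int PARTITIONS = 16;

    // Step 1: a key always maps to the same partition.
    static int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Step 2: the partition is looked up in a partition-to-node assignment table.
    static String nodeFor(Object key, String[] partitionOwners) {
        return partitionOwners[partitionFor(key)];
    }

    public static void main(String[] args) {
        // Hypothetical assignment: even partitions on node 1, odd partitions on node 2.
        String[] owners = new String[PARTITIONS];
        for (int p = 0; p < PARTITIONS; p++) {
            owners[p] = (p % 2 == 0) ? "Ignite Node 1" : "Ignite Node 2";
        }
        for (int key = 1; key <= 4; key++) {
            System.out.println("key " + key + " -> partition " + partitionFor(key)
                + " -> " + nodeFor(key, owners));
        }
    }
}
```

Because the key only determines the partition, nodes can join or leave without rehashing keys; only the partition-to-node table changes.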
Caches and Partitions
Cache
Partition 1: (K1, V1), (K2, V2), (K3, V3), (K4, V4)
Partition 2: (K5, V5), (K6, V6), (K7, V7), (K8, V8), (K9, V9)
Partitions Distribution
Ignite Node 1: 0 2 4 6 8 10 12 14
Ignite Node 2: 1 3 5 7 9 11 13 15
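The slide shows an idealized even/odd split of 16 partitions over two nodes. One way such an assignment can be computed without a central coordinator is rendezvous (highest-random-weight) hashing, the technique behind Ignite's default affinity function. The sketch below is a minimal stand-alone version, not Ignite's code: the class `RendezvousSketch`, the node names, and the scoring function are assumptions for illustration, and the resulting split will not match the slide's even/odd layout.

```java
import java.util.Arrays;
import java.util.List;

// Minimal rendezvous hashing sketch; not Ignite's affinity function.
class RendezvousSketch {
    // Mixes (partition, node) into a pseudo-random score
    // using the MurmurHash3 64-bit finalizer.
    static long score(int partition, String node) {
        long h = node.hashCode() * 31L + partition;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // The node with the highest score for a partition owns it.
    static String ownerOf(int partition, List<String> nodes) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String node : nodes) {
            long s = score(partition, node);
            if (best == null || s > bestScore) {
                best = node;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2");
        for (int p = 0; p < 16; p++) {
            System.out.println("partition " + p + " -> " + ownerOf(p, nodes));
        }
    }
}
```

A useful property of this scheme: when a node joins, a partition either stays where it was or moves to the new node, so existing nodes never exchange partitions among themselves.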
Where Does an Entry Go?
put(key, value)
Ignite Node 1 (partitions 0 2 4) or Ignite Node 2 (partitions 1 3 5)?
Backup Copies
Four Ignite nodes, each owning one primary partition: 0, 1, 2, 3
With backups enabled, each partition (0, 1, 2, 3) also has a backup copy on a different node
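A minimal sketch of primary and backup placement, assuming a simple "next node in the ring" rule. The rule, the class `BackupSketch`, and the method `ownersOf` are hypothetical; Ignite's affinity function decides the real placement. The point illustrated is only that a backup copy never shares a node with its primary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical placement rule for illustration; not Ignite's affinity function.
class BackupSketch {
    // Returns the node indexes holding a partition: primary first, then backups.
    static List<Integer> ownersOf(int partition, int nodeCount, int backups) {
        if (backups >= nodeCount) {
            throw new IllegalArgumentException("need more nodes than backup copies");
        }
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i <= backups; i++) {
            // Each further copy goes to the next node, so copies never share a node.
            owners.add((partition + i) % nodeCount);
        }
        return owners;
    }

    public static void main(String[] args) {
        // Four nodes, four partitions, one backup each, as on the slide.
        for (int p = 0; p < 4; p++) {
            List<Integer> owners = ownersOf(p, 4, 1);
            System.out.println("partition " + p + ": primary on node " + owners.get(0)
                + ", backup on node " + owners.get(1));
        }
    }
}
```

With one backup, losing any single node leaves at least one copy of every partition in the cluster.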
Distributed SQL
JDBC, ODBC, SQL API
Java, .NET, C++, BI Tools
• DDL, DML Support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• Cross-platform Compatibility
• Indexes in RAM or Disk
• Dynamic Scaling
Apache Ignite Cluster: Server Node (DURABLE MEMORY) x 3
Connectivity
• JDBC
• ODBC
• REST
• Java, .NET and C++ APIs
// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.0.50");
./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/
Data Definition Language
• CREATE/DROP TABLE
• CREATE/DROP INDEX
• ALTER TABLE
• Changes are durable when Ignite Native Persistence is enabled
CREATE TABLE `city` (
`ID` INT(11),
`Name` CHAR(35),
`CountryCode` CHAR(3),
`District` CHAR(20),
`Population` INT(11),
PRIMARY KEY (`ID`, `CountryCode`)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";
Data Manipulation Language
• ANSI-99 specification
• Fault-tolerant and consistent
• INSERT, UPDATE, DELETE
• SELECT
• JOINs
• Subqueries
SELECT country.name, city.name, MAX(city.population) as max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;
Affinity Collocation
Tables: Country, Language, City
key (country = 5) -> Partition 10
key (cityId = 10, countryId = 5) -> Partition 10
key (cityId = 11, countryId = 9) -> Partition 12
Server Node (ON-DISK) x 2
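The idea on this slide is that a City entry is placed by its country part only, so it lands in the same partition as its Country entry (in SQL this is what the `affinityKey=CountryCode` parameter of `CREATE TABLE` declares). The following is a plain-Java sketch under stated assumptions: the class `AffinitySketch`, the nested `CityKey` type, the partition count of 1024, and the hash-modulo rule are illustrative, so the partition numbers it prints differ from the ones on the slide.

```java
// Illustrative sketch of affinity collocation; not Ignite's implementation.
class AffinitySketch {
    // Assumed partition count for the example.
    static final int PARTITIONS = 1024;

    // The partition is derived from the affinity key only.
    static int partitionFor(Object affinityKey) {
        return Math.floorMod(affinityKey.hashCode(), PARTITIONS);
    }

    // Composite City key: only countryId takes part in partition mapping.
    static final class CityKey {
        final int cityId;
        final int countryId;

        CityKey(int cityId, int countryId) {
            this.cityId = cityId;
            this.countryId = countryId;
        }

        Object affinityKey() {
            return countryId;
        }
    }

    public static void main(String[] args) {
        int countryPartition = partitionFor(5);
        int cityPartition = partitionFor(new CityKey(10, 5).affinityKey());
        int otherCityPartition = partitionFor(new CityKey(11, 9).affinityKey());
        System.out.println("country 5 -> partition " + countryPartition);
        System.out.println("city 10 (country 5) -> partition " + cityPartition);
        System.out.println("city 11 (country 9) -> partition " + otherCityPartition);
    }
}
```

Since a partition lives on one primary node, entries that share a partition can be joined or processed together without network hops.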
Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results into one
Ignite Node: Canada (Toronto, Ottawa, Montreal, Calgary)
Ignite Node: India (Mumbai, New Delhi)
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
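The three steps above can be sketched as a small map/reduce in plain Java: every node joins only the rows it stores, and the caller merges the partial results. This is a single-process simulation with hypothetical names (`CollocatedJoinSketch`, `localJoin`, `query`); each map stands in for the data of one Ignite node, laid out as on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Single-process simulation of a collocated join; not Ignite's SQL engine.
class CollocatedJoinSketch {
    // Step 2: a node joins a country with the cities stored on the same node.
    static List<String> localJoin(Map<String, List<String>> nodeData, String country) {
        List<String> rows = new ArrayList<>();
        List<String> cities = nodeData.get(country);
        if (cities != null) {
            for (String city : cities) {
                rows.add(country + ", " + city);
            }
        }
        return rows;
    }

    // Steps 1 and 3: send the query to every node, then reduce the partial results.
    static List<String> query(List<Map<String, List<String>>> nodes, String country) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> node : nodes) {
            result.addAll(localJoin(node, country));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> node1 = Collections.singletonMap(
            "Canada", Arrays.asList("Toronto", "Ottawa", "Montreal", "Calgary"));
        Map<String, List<String>> node2 = Collections.singletonMap(
            "India", Arrays.asList("Mumbai", "New Delhi"));
        for (String row : query(Arrays.asList(node1, node2), "Canada")) {
            System.out.println(row);
        }
    }
}
```

The join is correct here only because every city is stored next to its country; no node needs rows held by another node.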
Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results into one
Ignite Node: Canada, with cities Toronto and Calgary
Ignite Node: India, with cities Montreal, Ottawa, Mumbai and New Delhi
Step 3: Montreal and Ottawa are moved between nodes
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
Distributed Computations
Compute Grid
Ignite Cluster: two nodes (DURABLE MEMORY), one running C1 -> R1, the other C2 -> R2
C = C1 + C2
R = R1 + R2 in T/2 time
C = Compute
R = Result
• Automatic Failover
• Load Balancing
• Zero Deployment
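The split shown here (C = C1 + C2, R = R1 + R2) can be sketched with a local thread pool standing in for two cluster nodes. This is not the Ignite compute API; the class `ComputeSplitSketch` and its methods are hypothetical, and a character count is used as an arbitrary example computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local thread pool standing in for two nodes; not the Ignite compute API.
class ComputeSplitSketch {
    // One sub-computation (C1 or C2): total length of the words in its part.
    static Callable<Integer> lengthOf(final List<String> part) {
        return () -> {
            int n = 0;
            for (String word : part) {
                n += word.length();
            }
            return n;
        };
    }

    // Splits the computation in two, runs both halves in parallel, sums the results.
    static int countChars(List<String> words) {
        int mid = words.size() / 2;
        List<Callable<Integer>> jobs = new ArrayList<>();
        jobs.add(lengthOf(words.subList(0, mid)));            // C1
        jobs.add(lengthOf(words.subList(mid, words.size()))); // C2
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            int total = 0;
            for (Future<Integer> partial : pool.invokeAll(jobs)) {
                total += partial.get();                       // R = R1 + R2
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("sub-computation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countChars(Arrays.asList("Hello", "World")));
    }
}
```

The T/2 figure on the slide is the ideal case for two equal halves; in practice the speed-up also depends on how evenly the work splits and on the cost of the reduce step.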
Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set
Client Node; Server Node (ON-DISK) with Data 1; Server Node (ON-DISK) with Data 2

Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results in one
Client Node; Server Node (ON-DISK) x 2
Machine Learning
Genetic Algorithm Grid
Biological Evolution Simulation
Chromosome and Genes Cluster
Collocated Computation
Ignite Cluster: two nodes (DURABLE MEMORY), one running F1, C1, M1, the other F2, C2, M2
F = F1 + F2
C = C1 + C2
M = M1 + M2
F = Fitness Calculation
C = Crossover
M = Mutation
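To make the F, C and M steps concrete, here is a minimal single-JVM genetic algorithm for a toy problem (maximize the number of `true` genes). It is a sketch, not Ignite's Genetic Algorithm Grid API: the class `GeneticSketch`, the tournament selection, the 2% mutation rate, and the elitism step are assumptions chosen to keep the example short. In a cluster, each of these steps would be run next to the chromosomes stored on each node.

```java
import java.util.Random;

// Single-JVM sketch of the F, C, M steps; not Ignite's Genetic Algorithm Grid API.
class GeneticSketch {
    // F: fitness is the number of genes set to true.
    static int fitness(boolean[] chromosome) {
        int score = 0;
        for (boolean gene : chromosome) {
            if (gene) score++;
        }
        return score;
    }

    // C: single-point crossover of two parents.
    static boolean[] crossover(boolean[] a, boolean[] b, Random rnd) {
        int cut = rnd.nextInt(a.length);
        boolean[] child = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            child[i] = i < cut ? a[i] : b[i];
        }
        return child;
    }

    // M: flip each gene with a small probability.
    static void mutate(boolean[] chromosome, double rate, Random rnd) {
        for (int i = 0; i < chromosome.length; i++) {
            if (rnd.nextDouble() < rate) chromosome[i] = !chromosome[i];
        }
    }

    static boolean[] fittest(boolean[][] population) {
        boolean[] best = population[0];
        for (boolean[] c : population) {
            if (fitness(c) > fitness(best)) best = c;
        }
        return best;
    }

    // Tournament selection: the fitter of two random individuals.
    static boolean[] select(boolean[][] population, Random rnd) {
        boolean[] a = population[rnd.nextInt(population.length)];
        boolean[] b = population[rnd.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    // Returns the best fitness after the given number of generations.
    static int evolve(int populationSize, int genes, int generations, long seed) {
        Random rnd = new Random(seed);
        boolean[][] population = new boolean[populationSize][genes];
        for (boolean[] c : population) {
            for (int i = 0; i < genes; i++) c[i] = rnd.nextBoolean();
        }
        for (int g = 0; g < generations; g++) {
            boolean[][] next = new boolean[populationSize][];
            next[0] = fittest(population).clone(); // elitism: the best is never lost
            for (int i = 1; i < populationSize; i++) {
                boolean[] child = crossover(select(population, rnd), select(population, rnd), rnd);
                mutate(child, 0.02, rnd);
                next[i] = child;
            }
            population = next;
        }
        return fitness(fittest(population));
    }

    public static void main(String[] args) {
        System.out.println("best fitness: " + evolve(20, 16, 30, 42L));
    }
}
```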
Machine Learning Grid
R, C++, Python, Java, Scala, REST
Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest
Distributed Core Algebra
• Dense and Sparse Algebra
• Large Scale Parallelization
• Multi-Language Support
• No ETL
Server Node (DURABLE MEMORY) x 3
Memory Architecture & Persistence
Durable Memory
• Off-heap, removes noticeable GC pauses
• Automatic Defragmentation
• Stores Superset of Data
• Predictable memory consumption
• Fully Transactional (Write-Ahead Log)
• Instantaneous Restarts
Ignite Cluster: Server Node (DURABLE MEMORY) x 3
Regions and Segments
• Memory split into regions
• Regions split into segments
• Segments include pages
B+Tree
• Self-balancing tree
• Memory & Disk
• Sorted Index
  • Secondary Indexes
• Hash Index
  • Primary Keys
  • Hash code based sorting
Free Lists
• Track pages with approximately equal free space
  • 25% free
  • 75% free
• Essential for updates
  • Give a page with the minimum free space needed
  • Reduce fragmentation
  • Lower page compaction activity
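A free list can be sketched as buckets of page ids grouped by how much free space each page has. The version below is an illustration under stated assumptions, not Ignite's data structure: the class `FreeListSketch`, the 4096-byte page size, and the four 25% buckets are chosen for the example. A write asks for the lowest bucket whose pages are guaranteed to fit the requested size, which keeps emptier pages available for larger entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative free list with 25% buckets; not Ignite's implementation.
class FreeListSketch {
    static final int PAGE_SIZE = 4096;             // assumed page size in bytes
    static final int BUCKET_WIDTH = PAGE_SIZE / 4; // one bucket per 25% of free space

    // bucket index -> ids of pages whose free space falls into that bucket
    private final TreeMap<Integer, Deque<Integer>> buckets = new TreeMap<>();
    // page id -> exact free bytes
    private final Map<Integer, Integer> freeBytes = new HashMap<>();

    // Registers a page under the bucket matching its free space.
    void track(int pageId, int free) {
        freeBytes.put(pageId, free);
        buckets.computeIfAbsent(free / BUCKET_WIDTH, k -> new ArrayDeque<>()).add(pageId);
    }

    // Writes `needed` bytes into a page that fits and returns its id, or -1 if none fits.
    int write(int needed) {
        // Round up so that every page in the chosen bucket has enough free space.
        int firstSafeBucket = (needed + BUCKET_WIDTH - 1) / BUCKET_WIDTH;
        Map.Entry<Integer, Deque<Integer>> entry = buckets.ceilingEntry(firstSafeBucket);
        if (entry == null) {
            return -1;
        }
        int pageId = entry.getValue().poll();
        if (entry.getValue().isEmpty()) {
            buckets.remove(entry.getKey());
        }
        // The page moves to the bucket that matches its remaining free space.
        track(pageId, freeBytes.get(pageId) - needed);
        return pageId;
    }

    public static void main(String[] args) {
        FreeListSketch freeList = new FreeListSketch();
        freeList.track(1, 1024); // page 1 is 25% free
        freeList.track(2, 3072); // page 2 is 75% free
        System.out.println("2000 bytes go to page " + freeList.write(2000));
        System.out.println("1000 bytes go to page " + freeList.write(1000));
    }
}
```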
Ignite Native Persistence
Server Node
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 ... Partition File N)
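The four steps above can be sketched with in-memory collections standing in for RAM, the write-ahead log and the partition files. This is a toy model, not Ignite's storage engine: the class `PersistenceSketch`, its fields, and the recovery method are hypothetical, and real disk I/O, paging and deletes are left out. What it shows is why an update can be acknowledged before checkpointing: the log alone is enough to rebuild the state after a restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of update -> WAL -> ack -> checkpoint; not Ignite's storage engine.
class PersistenceSketch {
    final Map<String, String> ram = new HashMap<>();            // data held in memory
    final List<String[]> wal = new ArrayList<>();               // append-only log ("disk")
    final Map<String, String> partitionFiles = new HashMap<>(); // checkpointed state ("disk")

    boolean put(String key, String value) {
        ram.put(key, value);                // 1. Update
        wal.add(new String[] {key, value}); // 2. Persist the change in the write-ahead log
        return true;                        // 3. Ack
    }

    // 4. Checkpointing: copy the in-memory state to the partition files.
    void checkpoint() {
        partitionFiles.putAll(ram);
        wal.clear(); // log records covered by the checkpoint are no longer needed
    }

    // Simulated restart: RAM content is lost, then rebuilt from the
    // partition files plus the log records written after the last checkpoint.
    void recoverAfterCrash() {
        ram.clear();
        ram.putAll(partitionFiles);
        for (String[] record : wal) {
            ram.put(record[0], record[1]);
        }
    }

    public static void main(String[] args) {
        PersistenceSketch node = new PersistenceSketch();
        node.put("a", "1");
        node.checkpoint();
        node.put("b", "2"); // acknowledged, not yet checkpointed
        node.recoverAfterCrash();
        System.out.println(node.ram);
    }
}
```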
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite
#denismagda