imc summit 2016 breakout - amit golander - the benefits of memory and storage convergence to...

IMC BENEFITS FROM MEMORY & STORAGE CONVERGENCE DR. AMIT GOLANDER PLEXISTOR, CTO

Upload: in-memory-computing-summit

Post on 21-Jan-2017


TRANSCRIPT

Page 1: IMC Summit 2016 Breakout - Amit Golander - The Benefits of Memory and Storage Convergence to In-Memory Computing

IMC BENEFITS FROM

MEMORY & STORAGE CONVERGENCE

DR. AMIT GOLANDER PLEXISTOR, CTO

Page 2

[Figure: Data set and Working set]

ABSTRACT

In-memory compute gave up on Storage and moved the working set to Memory.

This brings tremendous performance gains, but also:

1. Consumes expensive DRAM resources

2. Puts data at risk

3. Suffers from slow recovery time when power failures occur

The big question:

What will IMC look like when Memory and Storage converge?

Page 3

Agenda:

History & The convergence of Memory & Storage

Benefits – Out-of-the-box

Benefits – That require some work

Page 4

A LONG TIME AGO…

Requirements for Ideal Storage:

1. Low latency reads

2. High volume persistent writes

3. Reasonable cost

4. Transparent & easy to use

Unfortunately such Storage (#2) did not exist

[Figure: Big Data Middleware and Ideal Storage; DRAM, HDD and SSD compared on Cost, Latency and Persistency]

Page 5

SO MIDDLEWARE DEVELOPERS & USERS COMPROMISED

[Figure: Commit Log, Memory Table and Storage Table, with the labels "Persistent, Pretty Fast", "Cheap", "Fast" and "Search acceleration"]

1. Storage had horrible latency for persistent writes,

but not as bad if written sequentially

2. So IMC middleware compensated by using:

- Sequential writes at the expense of read latency

- Async writes at the risk of data loss

- Caching like crazy at the expense of HW cost (DRAM)

- Write amplification at the expense of HW cost (Storage)

- Compaction at the expense of HW cost (CPU)
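
The compensations listed above can be illustrated with a minimal Python sketch of an append-only commit log in front of an in-memory table. It is not taken from any particular middleware: the class name `TinyStore` and its `durable` flag are hypothetical, chosen only to show the tradeoff. With `durable=True` every write pays for an `fsync`; with `durable=False` the write returns sooner but recent updates may be lost on a power failure.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical key-value store: append-only commit log + in-memory table."""

    def __init__(self, log_path, durable=True):
        self.log_path = log_path
        self.durable = durable      # True: fsync every write; False: flush lazily
        self.table = {}             # the "memory table" that serves reads
        self._replay()              # recovery: rebuild the table from the log
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.table[record["k"]] = record["v"]

    def put(self, key, value):
        # Sequential append: far cheaper on HDD/SSD than a random in-place write.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        if self.durable:
            self.log.flush()
            os.fsync(self.log.fileno())   # pay the persistent-write latency now
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)

    def close(self):
        self.log.flush()
        os.fsync(self.log.fileno())
        self.log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    store = TinyStore(path, durable=True)
    store.put("a", 1)
    store.put("a", 2)
    store.close()

    recovered = TinyStore(path)       # simulates a restart
    print(recovered.get("a"))         # prints 2
    recovered.close()
```

Note that recovery time grows with the length of the log, which is one reason such designs also need compaction.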

Original requirements Vs. IMC reality:

1. Low latency reads

2. High volume (eventual) persistent writes

3. Reasonable cost

4. Transparent & easy to use

Page 6

WHAT HAS CHANGED?

Memory & Storage are converging:

New HW - Persistent Memory (PM, e.g. NVDIMM-N)

New SW - Software Defined Memory (SDM)

PM+SDM delivers:

1. Low latency reads

2. High volume persistent writes

3. Reasonable cost

4. Transparent & easy to use

[Figure: DRAM, HDD, SSD, PM and SDM compared on Cost, Latency and Persistency]

SDM-ephemeral delivers:

1. Low latency reads

2. High volume persistent* writes

3. Reasonable cost

4. Transparent & easy to use**

* Persistent on orderly shutdowns, not power failures

** Easy to use within shared-nothing architectures

[Figure: DRAM, HDD, SSD and SDM-ephemeral compared on Cost, Latency and Persistency]

Page 7

HOW TO LEVERAGE SDM?

Scenario I: Existing Middleware – Out of the box

Scenario II: New Middleware / Some work to existing

[Figure: SDM in Scenario I and Scenario II]

Page 8

Agenda:

History & the convergence of Memory & Storage

Benefits – Out-of-the-box

Benefits – That require some work

Page 9

OUT OF THE BOX INTEGRATION

[Figure: software stack. Interfaces: Virtual Memory, HDFS, POSIX. Paths: I/O Path, Memory Path. Presented as: Fast Storage, Huge Memory. Plexistor FS (Multi Tier, DAX) with Data Services, on Linux. Media: DRAM/PM, FLASH, DISK]

1. Download & Install SDM

2. Mount m1fs

3. Run your application
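
To make the two access paths concrete, here is a minimal Python sketch that touches the same file once through ordinary `write()`/`fsync()` (the I/O path) and once through `mmap` (the memory path). It uses only standard POSIX-style calls, so it does not depend on any SDM-specific API. The mount point `/mnt/m1fs` is an assumption (the slides do not name one); when it is absent the sketch falls back to a temporary directory on whatever filesystem is available.

```python
import mmap
import os
import tempfile

# Hypothetical mount point; not stated in the slides.
MOUNT_POINT = "/mnt/m1fs"


def data_dir():
    """Use the assumed mount point if it exists and is writable, else a temp dir."""
    if os.path.isdir(MOUNT_POINT) and os.access(MOUNT_POINT, os.W_OK):
        return MOUNT_POINT
    return tempfile.mkdtemp()


def write_via_io_path(path, payload):
    """I/O path: an ordinary write() followed by fsync()."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def update_via_memory_path(path, offset, payload):
    """Memory path: map the file and modify it with plain memory stores."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(payload)] = payload
            m.flush()   # msync(): ask the kernel to make the update durable


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(data_dir(), "demo.bin")
    write_via_io_path(path, b"hello storage")
    update_via_memory_path(path, 6, b"memory!")
    print(read_file(path))   # prints b'hello memory!'
```

On a conventional filesystem the mapped update goes through the page cache; on a filesystem with DAX support the mapping can refer to the persistent media directly.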

Page 10

OOB BENEFIT 1: LARGE WORKING SETS

Performance is highly sensitive to the working set size exceeding the aggregated memory size

Working set size is dynamic and hard to predict

Large clusters are expensive

Benchmark (working set 2x memory size; Cassandra v3.0.2 on an AWS i2.4xlarge instance):

SDM at 17,000 ops/sec

XFS at 2,000 ops/sec

[Figure: Data set and Working set]

Page 11

OOB BENEFIT 2: PERSISTENCY

Performance is highly sensitive to persistency/durability requirements

Replication/Mirroring between nodes without persistency is vulnerable to Power Failures

Data loss risk is often not well explained. Confusion leads to wasteful behavior (#copies, Network)

[Figure: "The Traditional Tradeoff". Ops/sec (axis from 0 to 180,000) for the (B) Balanced and (D) Durable configurations. MongoDB v3.2, E5-2650v3, CloudSpeed SSD]

(*) This actually writes two persistent copies: in Memory Table and in Commit Log

Page 12

OOB BENEFIT 3: LONG RE-BUILD TIMES

Nodes occasionally fail in large clusters

Re-builds take many hours to complete due to extra pressure on the storage layer

[Figure: Clients connected to a cluster of five Couchbase servers, one of them failed (X). Couchbase v4.5 beta, E5-2650v4, CloudSpeed SSD]

Page 13

OOB BENEFIT 4: PREDICTABILITY

No hiccups caused by separate memory and storage stacks

Highly predictable performance

[Figure: TPS over time. MySQL v5.6, E5-2680v3, HGST SN150. DB load generator runs at target (not maximal) speed]

Page 14

Agenda:

History & the convergence of Memory & Storage

Benefits – Out-of-the-box

Benefits – That require some work

Page 15

BENEFITS THAT REQUIRE WORK AT THE MIDDLEWARE LAYER

A lot of potential for Fast Queries & Simplicity

[Figure: SDM and Storage under Big Data middleware, measured with file-level FIO. E5-2650v3, CloudSpeed SSD]

Page 16

EXAMPLE - AMPOOL

• Fast & Standard access throughout the data pipeline

• 56x faster ingest

• 3-4x faster OLTP & OLAP than HBase

• 6x faster Spark than Tachyon

Page 17

DESIGNING MIDDLEWARE IN THE SDM ERA

1. Realize that you’re a storage/memory billionaire – focus on your business logic

2. Use standard POSIX API and share files between frameworks (polyglot)

3. Use SDM zero-cost clones (cp --reflink)

4. Rely on SDM Auto-tiering (If you must – hint via fadvise/madvise)

5. Consider relying on SDM Mirroring capabilities

6. Use SDM monitoring tools to understand your resource consumption
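
Items 3 and 4 above rely on standard interfaces, so they can be tried with a short Python sketch. The helper names `clone_file` and `read_with_hints` are hypothetical. The clone step shells out to GNU `cp --reflink=always` and falls back to a full copy when the underlying filesystem (or the local `cp`) does not support reflinks. The hint step uses `posix_fadvise` and `madvise` with the "will need" advice; how a given tiering layer reacts to these hints is not specified in the slides, so treat them as advisory only.

```python
import mmap
import os
import shutil
import subprocess
import tempfile


def clone_file(src, dst):
    """Try a copy-on-write clone; fall back to a regular copy if unsupported."""
    try:
        subprocess.run(["cp", "--reflink=always", src, dst],
                       check=True, capture_output=True)
        return "reflink"
    except (subprocess.CalledProcessError, FileNotFoundError):
        shutil.copyfile(src, dst)
        return "copy"


def read_with_hints(path):
    """Read a non-empty file after telling the kernel it will be needed soon."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):          # not available on every OS
            try:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            except OSError:
                pass                              # hints are advisory; ignore failures
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            if hasattr(mmap, "MADV_WILLNEED"):    # Python 3.8+, platform dependent
                try:
                    m.madvise(mmap.MADV_WILLNEED)
                except OSError:
                    pass
            return m[:]
    finally:
        os.close(fd)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "table.dat")
    dst = os.path.join(workdir, "table.clone")
    with open(src, "wb") as f:
        f.write(b"rows" * 1024)

    method = clone_file(src, dst)
    print(method, len(read_with_hints(dst)))
```

Because the clone falls back to a plain copy, the sketch runs on any filesystem; the zero-cost behaviour only appears where reflinks are actually supported.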

Page 18

SUMMARY

Memory and Storage have already started converging (SDM)

IMC best practices are no longer the “best”

SDM provides value to IMC out-of-the-box, but there is even greater opportunity for those willing to integrate

[Figure labels: Efficiency, Simplicity]

Page 19

Q & A

Free SDM download - www.plexistor.com/download/

White papers - www.plexistor.com/resources/

Blog - www.plexistor.com/blog/

Page 20

HIGH AVAILABILITY - CLARIFICATION

Almost zero latency added for having a 2nd copy, provided that a high-speed RDMA network is in place

Public cloud deployments – Keep using your current HA strategy

On-premises deployments – Can substitute most copies with storage redundancy

[Figure: App servers 1 to N, each running Plexistor SDM, connected over high-speed RDMA to Open Brick 1 to Open Brick M]

Page 21

SDM VS. XFS-DAX VS. NVML - CLARIFICATION

Plexistor | ext4/xfs DAX | NVML

Scale Out Application

Auto Tiering Application

Snapshots/Clones Application

Legacy Applications

NVML support

High availability Application

IT policy hooks

[Figure: App using mmap and App using NVML, on top of Virtual Memory / POSIX, FS w/ DAX support* on Linux, Memory Path to DRAM/PM]

(*) Who supports DAX:

- Plexistor SDM

- Linux xfs-dax, and ext4-dax (WIP)

- MS ReFS-dax (WIP)