
Efficient Memory Disaggregation with Infiniswap

Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, Kang G. Shin

Agenda

• Motivation and related work

• Design and system overview

• Implementation and evaluation

• Future work and conclusion

3/30/17 1

Memory-intensive applications

Performance degradation

[Figure: bar chart of normalized performance for VoltDB (TPC-C), Memcached (Facebook/FBSYS), PowerGraph (TunkRank), and GraphX (PageRank) with 100%, 75%, and 50% of the working set in memory. Bar labels, in application order: 0.18, 0.47, 0.94, 0.97 at 75%; 0.04, 0.06, 0.12, 0.04 at 50%.]

Memory underutilization

• Google Cluster Analysis [1]

Memory overestimation

[Figure: portion of memory allocated vs. used over time (days). Annotations: 0.8 allocated, 0.5 used, a gap of ≈30%.]

How to utilize ABU memory?

Can we utilize this memory?

[1] Reiss, Charles, et al. "Heterogeneity and dynamicity of clouds at scale: Google trace analysis." SoCC’12.

Disaggregate free memory

[Figure: Machine 1 through Machine N, each holding used memory and free memory. A Memory Disaggregation Layer spans the machines, and free memory on one machine serves as remote memory for another. Legend: used memory, free memory, remote memory.]

What are the challenges?

• Minimize deployment overhead
  • No hardware design
  • No application modification

• Tolerate failures
  • e.g., network disconnection, machine crash

• Manage remote memory at scale

Recent work on memory disaggregation

Compared on: no hardware design, no application modification, fault-tolerance, scalability

• Memory Blade [ISCA’09]
• HPBD [CLUSTER’05] / NBDX [1]
• RDMA key-value service (e.g., HERD [SIGCOMM’14], FaRM [NSDI’14])
• Intel Rack Scale Architecture (RSA) [2]
• Infiniswap

[1] https://github.com/accelio/NBDX
[2] http://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html


System Overview

[Figure: Machine 1 runs Application 1 and Application 2 in user space. In kernel space, the Virtual Memory Manager (VMM) sits above the Infiniswap Block Device, which writes synchronously to the RNIC and asynchronously to the local disk. Machine 2 runs an application and the Infiniswap Daemon in user space, reachable through its RNIC.]

Infiniswap Block Device
• Swap space
• Request router

Local disk
• [ASYNC] back up swapped-out data
• Tolerate remote memory failure

Infiniswap Daemon
• Local memory region
• Remote memory service

RDMA
• One-sided operations
• Bypass remote CPU
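The roles listed on the System Overview slides can be sketched as a toy model. This is only an illustration under stated assumptions: the class and method names (`RemoteMemory`, `BlockDevice`, `page_out`, `page_in`) are hypothetical, the remote daemon is a plain dictionary instead of an RDMA target, and the asynchronous disk backup is modeled as a queue that is flushed on demand.

```python
from collections import deque


class RemoteUnavailable(Exception):
    """Raised by the stand-in remote store when the daemon cannot serve a page."""


class RemoteMemory:
    """Hypothetical stand-in for a remote Infiniswap daemon."""

    def __init__(self):
        self.pages = {}
        self.alive = True

    def write(self, page_no, data):
        if not self.alive:
            raise RemoteUnavailable(page_no)
        self.pages[page_no] = data

    def read(self, page_no):
        if not self.alive or page_no not in self.pages:
            raise RemoteUnavailable(page_no)
        return self.pages[page_no]


class BlockDevice:
    """Toy request router: remote memory first, local disk as backup."""

    def __init__(self, remote):
        self.remote = remote
        self.disk = {}
        self.pending_disk_writes = deque()  # models the async backup path

    def page_out(self, page_no, data):
        # Queue the disk backup (async), then write to remote memory (sync).
        self.pending_disk_writes.append((page_no, data))
        try:
            self.remote.write(page_no, data)
        except RemoteUnavailable:
            # No remote copy exists, so the backup must reach disk now.
            self.flush_disk()

    def flush_disk(self):
        while self.pending_disk_writes:
            page_no, data = self.pending_disk_writes.popleft()
            self.disk[page_no] = data

    def page_in(self, page_no):
        try:
            return self.remote.read(page_no)
        except RemoteUnavailable:
            # Remote failure: fall back to the local backup copy.
            self.flush_disk()
            return self.disk[page_no]
```

In this model a page-in keeps working after `remote.alive` is set to `False`, which is the property the local backup disk is meant to provide.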

How to meet the design objectives?

Objectives and ideas:
• No hardware design, no application modification: remote paging
• Fault-tolerance: local backup disk
• Scalability: decentralized remote memory management

One-to-many

[Figure: the Infiniswap Block Device on Machine 1 connects through its RNIC to Infiniswap Daemons on Machine 2 and Machine 3, and also writes to its local disk. Labels: Async, Sync.]

Many-to-many

[Figure: Machine 1 and Machine 4 each run applications, a VMM, an Infiniswap Block Device, and a local disk. Both connect through their RNICs to Infiniswap Daemons on Machine 2 and Machine 3. Labels: Async, Sync.]

How to scale remote memory?

• How to find remote memory in the cluster?
• Which remote mapping should be evicted?


Management unit: memory page?

[Figure: one Infiniswap Block Device mapping individual pages onto three Infiniswap Daemons. Mapping table: local page p100 maps to remote page <s1, p1>.]

• 1 GB = 256K entries
• 1 GB = 256K RTTs

Management unit: memory slab!

[Figure: one Infiniswap Block Device mapping whole slabs onto three Infiniswap Daemons.]
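The "1GB = 256K entries" count on the page-granularity slide can be checked with a short calculation. The 4 KB page size and the 1 GB slab size below are assumptions made for this sketch; the slides do not state either value.

```python
PAGE_SIZE = 4 * 1024   # assumed page size: 4 KB
SLAB_SIZE = 1024 ** 3  # assumed slab size: 1 GB (hypothetical)
GIB = 1024 ** 3


def mapping_entries(region_bytes, unit_bytes):
    """Number of mapping entries needed to cover a region at a given unit size."""
    return region_bytes // unit_bytes


per_page = mapping_entries(GIB, PAGE_SIZE)  # one entry (and one lookup) per page
per_slab = mapping_entries(GIB, SLAB_SIZE)  # one entry per slab

print(per_page // 1024, "K entries per GB at page granularity")
print(per_slab, "entry per GB at slab granularity")
```

Under these assumptions, managing remote memory per page needs 262,144 entries per gigabyte, while managing it per slab needs one, which is the motivation for the slab as the management unit.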

Which remote machine should be selected?

Goal: balance memory utilization

• Central controller
• Decentralized approach

Power of two choices [1]

[Figure: one Infiniswap Block Device choosing among three Infiniswap Daemons.]

[1] Mitzenmacher, Michael. "The power of two choices in randomized load balancing." Ph.D. thesis, U.C. Berkeley, 1996.
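A minimal sketch of the power-of-two-choices idea applied to machine selection, assuming the block device can learn the memory utilization of the machines it probes. The function name `pick_machine` and the `utilization` dictionary are hypothetical; the slides do not describe how utilization is reported.

```python
import random


def pick_machine(utilization, rng=random):
    """Probe two random machines and return the less utilized one.

    utilization: dict mapping machine name to memory utilization in [0, 1]
    (hypothetical input; at least two machines are required).
    """
    first, second = rng.sample(sorted(utilization), 2)
    return first if utilization[first] <= utilization[second] else second


# Example: the most loaded machine is never chosen when three are available,
# because any pair of probes contains at least one less loaded machine.
cluster = {"machine2": 0.90, "machine3": 0.40, "machine4": 0.65}
choice = pick_machine(cluster, random.Random(1))
```

Only two machines are contacted per placement, so no central controller or cluster-wide view is needed.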

Slab eviction

[Figure: an Infiniswap Daemon holding slabs 1 to 4, shown in three successive states with different shares of remote memory and used memory. Legend: mapped slab, unmapped slab.]

Which slab should be evicted?

• Daemon: does not know the swap activities
• Daemon: too expensive to query all the slabs

Power of multiple choices [1]

Select E least-active slabs from E+E’ random slabs

[Figure: an Infiniswap Daemon holding slabs 1 to 4; after eviction it holds slabs 1, 2, and 4.]

[1] Park, Gahyun. "A generalization of multiple choice balls-into-bins." PODC’11.
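The selection rule "E least-active slabs from E+E’ random slabs" can be sketched as follows. The `slab_activity` dictionary and the function name `choose_evictions` are hypothetical. In this sketch the activity values are passed in directly; since the slides note that the daemon does not know the swap activities, a real daemon would presumably have to ask for them, and sampling keeps that to E+E’ slabs.

```python
import random


def choose_evictions(slab_activity, e, e_extra, rng=random):
    """Pick e slabs to evict out of e + e_extra randomly sampled slabs.

    slab_activity: dict mapping slab id to an activity measure
    (hypothetical; lower means less active).
    """
    sample_size = min(e + e_extra, len(slab_activity))
    candidates = rng.sample(sorted(slab_activity), sample_size)
    # Keep the least-active slabs among the sampled candidates only.
    candidates.sort(key=lambda slab: slab_activity[slab])
    return candidates[:e]


# Example with four slabs: sampling all four and evicting one
# selects the slab with the lowest activity.
activity = {1: 50, 2: 70, 3: 5, 4: 90}
victims = choose_evictions(activity, e=1, e_extra=3)
```

With a smaller `e_extra` the daemon examines fewer slabs at the cost of possibly evicting a slab that is not the globally least active one.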


Implementation

• Connection management
  • One RDMA connection per active block device-daemon pair

• Control plane
  • SEND, RECV

• Data plane
  • One-sided RDMA READ, WRITE

[Figure: the Infiniswap Block Device (kernel space) connected over RDMA to the Infiniswap Daemon (user space).]

What are we expecting from Infiniswap?

• Application performance

• Cluster memory utilization

• Network usage

• Eviction overhead

• Fault-tolerance overhead

• Performance as a block device

Evaluation

32-node cluster connected by an InfiniBand network. Each node:

• 2 x 8 cores (32 vcores)
• 64 GB DRAM
• 56 Gbps InfiniBand NIC

Application performance

• 50% working sets in memory
• Application performance is improved by 2-16x

[Figure: bar chart of normalized performance for VoltDB (TPC-C), Memcached (Facebook/FBSYS), PowerGraph (TunkRank), and GraphX (PageRank). Series: 100% working sets in memory; Disk + 50% working sets in memory; Infiniswap + 50% working sets in memory. Bar labels, in application order: 0.04, 0.06, 0.12, 0.04 for Disk; 0.66, 0.77, 0.61, 0.08 for Infiniswap.]
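The "2-16x" improvement can be related to the bar labels with a short calculation. The assignment of labels to applications below assumes they appear in the same left-to-right order as the applications on the chart; that ordering is an assumption of this sketch, and the labels themselves are rounded.

```python
apps = ["VoltDB", "Memcached", "PowerGraph", "GraphX"]

# Normalized performance with 50% of the working set in memory
# (assumed label-to-application order).
disk = [0.04, 0.06, 0.12, 0.04]
infiniswap = [0.66, 0.77, 0.61, 0.08]

# Speedup of Infiniswap over disk-backed swap for each application.
speedups = {
    app: round(with_infiniswap / with_disk, 1)
    for app, with_disk, with_infiniswap in zip(apps, disk, infiniswap)
}

for app, speedup in speedups.items():
    print(f"{app}: {speedup}x")
```

Under that assumption the ratios range from about 2x (GraphX) to about 16x (VoltDB), in line with the stated range.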

Cluster memory utilization

• 90 containers (applications), mixing all applications and memory constraints
• Cluster memory utilization is improved from 40.8% to 60% (1.47x)

[Figure: memory utilization (%) by rank of machines, with Infiniswap and without Infiniswap.]


Limitations and future work

• Trade-off in fault-tolerance
  • Local disk is the bottleneck
  • Multiple remote replicas
  • Fault-tolerance vs. space-efficiency

• Performance isolation among applications
  • W/o limitation on each application’s usage
  • W/o mapping between remote memory and applications

Conclusion

• Infiniswap: remote paging over RDMA
  • Application performance
  • Cluster memory utilization

• Efficient, practical memory disaggregation
  • No hardware design
  • No application modification
  • Fault-tolerance
  • Scalability

Source code is coming soon!
https://github.com/Infiniswap/infiniswap.git

Thank you!