

Page 1: ConCORD: Easily Exploiting Memory Content Redundancy Through the Content-aware Service Command

Lei Xia, Kyle Hale, Peter Dinda

HPDC’14, Vancouver, Canada, June 23-27

Hobbes: http://xstack.sandia.gov/hobbes/

Page 2: Overview

• Claim: Memory content-sharing detection and tracking should be built as a separate service
  – Exploiting memory content sharing in parallel systems through content-aware services
• Feasibility: Implementation of ConCORD, a distributed system that tracks memory contents across collections of entities (VMs/processes)
• Content-aware service command minimizes the effort to build various content-aware services
• Collective checkpoint service
  – Only ~200 lines of code

Page 3: Outline

• Content-sharing in scientific workloads
  – Content-aware services in HPC
  – Content-sharing tracking as a service
• Architecture of ConCORD
  – Implementation in brief
• Content-aware service command
• Collective checkpoint on service command
• Performance evaluation
• Conclusion

Page 4: Content-based Memory Sharing

• Eliminate identical pages of memory across multiple VMs/processes

• Reduce memory footprint size in one physical machine

• Intra-node deduplication

[Barker-USENIX’12]

Page 5: Memory Content Sharing is Common in Scientific Workloads in Parallel Systems [previous work published at VTDC’12]

[Figure: number of memory pages (axis ticks 2048, 20480, 204800, 2048000) per workload, with three series: Total Memory, Intra-node Distinct, Inter/Intra Distinct. Workloads: Moldy.water.2, Moldy.water.4, Moldy.water.8, Moldy.water.12, Moldy.quartz.2, Moldy.quartz.4, Moldy.quartz.8, Moldy.quartz.12, Lammps.2, Lammps.4, Lammps.8, Lammps.12, HPCC.12, NPB.bt.B.4, NPB.cg.B.4, NPB.cg.C.4, NPB.ep.C.4, NPB.ep.D.4, NPB.lu.C.4, NPB.sp.B.4, NPB.sp.C.4.]

Page 6: Memory Content Sharing in Parallel Workloads [previous work published at VTDC’12]

• Both intra-node and inter-node sharing are common in scientific workloads
• Many workloads have a significant amount of inter-node content sharing beyond intra-node sharing

[A Case for Tracking and Exploiting Inter-node and Intra-node Memory Content Sharing in Virtualized Large-Scale Parallel Systems, VTDC’12]

Page 7: Content-aware Services in HPC

• Many services in HPC systems can be simplified and improved by leveraging intra-node and inter-node content sharing
• Content-aware service: a service that can utilize memory content sharing to improve or simplify itself
  – Content-aware checkpointing
    • Collectively checkpoint a set of related VMs/processes
  – Collective virtual machine co-migration
    • Collectively move a set of related VMs
  – Collective virtual machine reconstruction
    • Reconstruct/migrate a VM from multiple source VMs
  – Many other services

Page 8: Content-aware Collective Checkpointing

[Diagram: memory blocks of three processes. P1: A B C D E; P2: F C A B E; P3: G A B C D. Checkpoint: every block of every process is saved.]

Page 9: Content-aware Collective Checkpointing

Reduce checkpoint size by saving only one copy of each distinct content block across all processes.

[Diagram: memory blocks of three processes. P1: A B C D E; P2: F C A B E; P3: G A B C D. Checkpoint: every block of every process is saved. Collective checkpoint: A B C D E F G.]
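The size reduction in the diagram can be reproduced with a few lines of Python. This is a minimal sketch, not ConCORD code: the letters stand in for block contents, and `collective_checkpoint` and `restore` are hypothetical names introduced only for illustration.

```python
# Block layout copied from the slide; each letter stands for one distinct block content.
processes = {
    "P1": ["A", "B", "C", "D", "E"],
    "P2": ["F", "C", "A", "B", "E"],
    "P3": ["G", "A", "B", "C", "D"],
}


def collective_checkpoint(processes):
    """Save each distinct block once and remember every process's block order."""
    store = {}   # content id -> block content, written only once
    layout = {}  # process -> ordered content ids, needed to restore
    for pid, blocks in processes.items():
        layout[pid] = list(blocks)
        for block in blocks:
            # In a real system the key would be a content hash and the value the bytes.
            store.setdefault(block, block)
    return store, layout


def restore(store, layout, pid):
    """Rebuild one process's memory from the shared store."""
    return [store[cid] for cid in layout[pid]]


store, layout = collective_checkpoint(processes)
raw_blocks = sum(len(blocks) for blocks in processes.values())
print(f"raw: {raw_blocks} blocks, collective: {len(store)} blocks")
```

For the slide's example the raw checkpoint holds 15 blocks and the collective checkpoint holds 7, while each process can still be restored from its layout.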

Page 10: Collective VM Reconstruction

[Diagram: Single VM Migration. VM-1 on Host-1 holds A B C D E; VM-2 on Host-2 holds F C A B E; VM-3 on Host-3 holds A B C D G; Host-4 is the migration target.]

Page 11: Collective VM Reconstruction

[Diagram: Collective VM Reconstruction. VM-1 on Host-1 holds A B C D E; VM-2 on Host-2 holds F C A B E; VM-3 on Host-3 holds A B C D G. The blocks of VM-3 (A B, D G, C) are sent to Host-4 from multiple sources.]

Page 12: Collective VM Reconstruction

Speed up VM migration by reconstructing the VM's memory from multiple sources.

[Diagram: Collective VM Migration. VM-1 on Host-1 holds A B C D E; VM-2 on Host-2 holds F C A B E; Host-3 holds A B C D G. The blocks A B, D G, C arrive at Host-4, where VM-3 (A B C D G) is reconstructed.]

Page 13: Content-sharing Detection and Tracking

• We need to detect and track memory content sharing
  – Continuously, while the system is running
  – Both intra-node and inter-node sharing
  – Scalable to large-scale parallel systems with minimal overhead

Page 14: Content-sharing Tracking As a Service

• Content-sharing tracking should be factored into a separate service
  – Maintain and enhance a single implementation of memory content tracking
    • Allows us to focus on developing an efficient and effective tracking service itself
  – Avoid redundant content-tracking overhead when multiple services exist
  – Much easier to build content-aware services on an existing tracking service

Page 15: ConCORD: Overview

• A distributed inter-node and intra-node memory content redundancy detection and tracking system
  – Continuously tracks all memory content sharing in a distributed-memory parallel system
  – Hash-based content detection
    • Each memory block is represented by a hash value (content hash)
    • Two blocks with the same content hash are considered to have the same content
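Hash-based content detection can be sketched as follows. The slides do not say which hash function or block size ConCORD uses, so SHA-1 and 4 KiB blocks are assumptions here, and `content_hash` and `shared_blocks` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size (one 4 KiB page)


def content_hash(block: bytes) -> str:
    """Content hash of one block; SHA-1 is an assumption, not ConCORD's stated choice."""
    return hashlib.sha1(block).hexdigest()


def shared_blocks(memory: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Map content hash -> offsets, keeping only hashes seen at more than one offset."""
    by_hash = defaultdict(list)
    for offset in range(0, len(memory), block_size):
        by_hash[content_hash(memory[offset:offset + block_size])].append(offset)
    return {h: offs for h, offs in by_hash.items() if len(offs) > 1}


# Three blocks; the first and third have identical content.
memory = b"\x00" * BLOCK_SIZE + b"\x01" * BLOCK_SIZE + b"\x00" * BLOCK_SIZE
dups = shared_blocks(memory)
print(dups)
```

Treating equal hashes as equal content trades a very small collision risk for not having to compare block bytes across nodes.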

Page 16: ConCORD: System Architecture

[Architecture diagram. ConCORD, running across nodes: Memory Content Update Interface, Content-sharing Query Interface, Content-aware Service Command Controller, Distributed Memory Content Tracer, Service Command Execution Engine. On each node: a Memory Update Monitor in the hypervisor (VMM) for VMs, a Memory Update Monitor in the OS (labels: ptrace, inspect) for processes, and the content-aware service.]

Page 17: Distributed Memory Content Tracer

• Uses a customized lightweight distributed hash table (DHT)
  – To track memory content sharing and the location of contents system-wide

                   Conventional DHT                  DHT in ConCORD
Target system      Distributed systems               Large-scale parallel systems
Type of key        Variable length                   Fixed length
Type of object     Variable size, variable format    Small size
Fault tolerance    Strict                            Loose
Persistency        Yes                               No

Page 18: DHT in ConCORD

• DHT entry: <content-hash, Entity-List>
• DHT content is split into partitions and distributed (stored and maintained) over the ConCORD instances
• Given a content hash, computing its partition and its responsible instance is fast and straightforward
  – Zero-hop
  – No peer information is needed
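A zero-hop lookup means every node can compute the owner of a content hash locally. The sketch below shows one way to do that with modulo arithmetic over a static instance list; the slides do not give the actual mapping, so the partition count, the instance names, and both functions are assumptions for illustration.

```python
import hashlib

NUM_PARTITIONS = 1024                             # hypothetical partition count
INSTANCES = ["node0", "node1", "node2", "node3"]  # hypothetical ConCORD instances


def partition_of(content_hash: bytes) -> int:
    """Fixed-length hash -> partition, computed locally with no network traffic."""
    return int.from_bytes(content_hash[:8], "big") % NUM_PARTITIONS


def responsible_instance(content_hash: bytes) -> str:
    """Partition -> instance via a static assignment that every node already knows."""
    return INSTANCES[partition_of(content_hash) % len(INSTANCES)]


h = hashlib.sha1(b"some block content").digest()
owner = responsible_instance(h)
print(partition_of(h), owner)
```

Because the mapping depends only on the hash and on configuration shared by all nodes, an update or query goes straight to the responsible instance without routing through peers.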

Page 19: Content-sharing Queries

• Examine memory content sharing and shared locations in the system
• Node-wise queries
  – Given a content hash:
    • Find the number of copies existing in a set of entities
    • Find the exact locations of these blocks
• Collective queries
  – Degree of sharing: overall level of content redundancy among a set of entities
  – Hot memory content: contents duplicated in more than k copies in a set of entities
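The queries above can be illustrated against a small in-memory table of `<content-hash, Entity-List>` entries. This is a sketch under stated assumptions: the function names are hypothetical, each list entry is taken to be one copy, and the degree of sharing is computed as the fraction of redundant copies, which is only one plausible reading because the slides do not define the metric.

```python
# Example entries: content hash -> entities holding a copy (one list entry per copy).
dht = {
    "A": ["p1", "p2", "p3"],
    "B": ["p1", "p2", "p3"],
    "C": ["p1", "p2", "p3"],
    "D": ["p1", "p3"],
    "E": ["p1", "p2"],
    "F": ["p2"],
    "H": ["p3"],
}


def locations(dht, h, entities):
    """Node-wise query: which of the given entities hold content h."""
    return [e for e in dht.get(h, []) if e in entities]


def num_copies(dht, h, entities):
    """Node-wise query: number of copies of content h within the given entities."""
    return len(locations(dht, h, entities))


def degree_of_sharing(dht, entities):
    """Collective query (assumed definition): fraction of copies that are redundant."""
    counts = [num_copies(dht, h, entities) for h in dht]
    total = sum(counts)
    distinct = sum(1 for c in counts if c > 0)
    return 1 - distinct / total if total else 0.0


def hot_content(dht, entities, k):
    """Collective query: contents with more than k copies in the given entities."""
    return sorted(h for h in dht if num_copies(dht, h, entities) > k)


everyone = {"p1", "p2", "p3"}
print(num_copies(dht, "D", everyone), hot_content(dht, everyone, 2))
```

In ConCORD the table is partitioned across instances, so a collective query would be evaluated per partition and combined; the single dictionary here only shows what each query computes.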

Page 20: How can we build a content-aware service?

• Run content-sharing queries inside a service
  – Uses sharing information to exploit content sharing and improve the service
  – Requires much effort from service developers
    • How to utilize content sharing efficiently and effectively

Page 21: How can we build a content-aware service?

• Run a service inside a collective query
  – ConCORD provides a query template
  – The service developer defines a service by parameterizing the query template
  – ConCORD executes the parameterized query over all shared memory content
    • While the query runs, ConCORD completes the service and exploits the memory content sharing in the system
  – Minimizes service developers’ effort

Page 22: Content-aware Service Command

• A service command is a parameterizable query template
• Services built on top of it are automatically parallelized and executed by ConCORD
  – Partitioning of the task
  – Scheduling the subtasks to execute across nodes
  – Managing all inter-node communication

Page 23: Challenge: Correctness vs. best-effort

• ConCORD provides a best-effort service
  – ConCORD DHT’s view of memory content may be outdated
• Application services require correctness
  – Ensure correctness using best-effort sharing information

Page 24: Service Command: Two Phase Execution

• Collective phase
  – Each node performs subtasks in parallel on locally available memory blocks
  – Best-effort, using content tracked by ConCORD
  – Stale blocks are ignored
  – Driven by the DHT for performance and efficiency
• Local phase
  – Each node performs subtasks in parallel on memory blocks ConCORD does not know of
  – All fresh blocks are covered
  – Driven by local content for correctness

Page 25: Collective Checkpoint: Initial State

Memory content in processes:
P-1: A B C D E
P-2: A B C E F
P-3: A B C D G

Page 26: Collective Checkpoint: Initial State

Memory content in processes:
P-1: A B C D E
P-2: A B C E F
P-3: A B C D G

Memory content in ConCORD’s view:
P-1: A B C D E
P-2: A B C E F
P-3: A B C D H

Page 27: Collective Checkpoint: Initial State

Memory content in processes:
P-1: A B C D E
P-2: A B C E F
P-3: A B C D G

Memory content in ConCORD’s view:
P-1: A B C D E
P-2: A B C E F
P-3: A B C D H

ConCORD’s DHT:

Content Hash    Process Map
A               {p1, p2, p3}
B               {p1, p2, p3}
C               {p1, p2, p3}
D               {p1, p3}
E               {p1, p2}
F               {p2}
H               {p3}

Page 28: Collective Checkpoint: Collective Phase

[Diagram: processes P-1 (A B C D E), P-2 (A B C E F), P-3 (A B C D G) and the ConCORD Service Execute Engine.]

Page 29: Collective Checkpoint: Collective Phase

[Diagram: the ConCORD Service Execute Engine sends save requests to the processes.]
P-1 (A B C D E): Save {A, D}
P-2 (A B C E F): Save {B, E, F}
P-3 (A B C D G): Save {C, H}


Page 31: Collective Checkpoint: Collective Phase

[Diagram: the processes report back to the ConCORD Service Execute Engine.]
P-1 (A B C D E): {A, D} saved
P-2 (A B C E F): {B, E, F} saved
P-3 (A B C D G): {C} saved (H is stale: P-3 no longer holds it, so it is ignored)

Page 32: Collective Checkpoint: Collective Phase

[Diagram: the processes report back to the ConCORD Service Execute Engine.]
P-1 (A B C D E): {A, D} saved
P-2 (A B C E F): {B, E, F} saved
P-3 (A B C D G): {C} saved

Completed: {A, B, C, D, E, F}

Page 33: Collective Checkpoint: Local Phase

[Diagram: the ConCORD Service Execute Engine sends the completed set {A, B, C, D, E, F} to each of P-1 (A B C D E), P-2 (A B C E F), and P-3 (A B C D G).]

Local phase:
P-1: Do nothing
P-2: Do nothing
P-3: Save G

Page 34: Example Service: Collective Checkpoint

• User-defined service-specific functions:
  – Function executed during the collective phase:
      For each requested content hash:
          if a block with that content exists locally:
              save the memory block to the user-defined file
  – Function executed during the local phase:
      For each local memory block:
          if the block is not saved:
              save the block to the user-defined file
• Implementation: 220 lines of C code (code in Lei Xia’s Ph.D. thesis)
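The two callbacks can be traced on the running example from the preceding slides. The sketch below is a single-process simulation in Python rather than the authors' C implementation; the least-loaded assignment policy and the names `collective_phase` and `local_phase` are assumptions chosen so that the requests resemble the ones in the diagrams.

```python
# Actual memory content per process (letters stand for content hashes).
actual = {
    "p1": {"A", "B", "C", "D", "E"},
    "p2": {"A", "B", "C", "E", "F"},
    "p3": {"A", "B", "C", "D", "G"},
}
# ConCORD's best-effort view: it still lists stale block H for p3 and has not seen G.
dht = {
    "A": ["p1", "p2", "p3"], "B": ["p1", "p2", "p3"], "C": ["p1", "p2", "p3"],
    "D": ["p1", "p3"], "E": ["p1", "p2"], "F": ["p2"], "H": ["p3"],
}


def collective_phase(dht, actual):
    """Ask one holder per tracked hash to save it; stale entries are skipped."""
    requests = {p: 0 for p in actual}  # requests sent so far, used for load balancing
    saved_by = {}                      # content hash -> process that saved it
    for h, holders in dht.items():
        # Try the least-loaded holder first (stable sort keeps list order on ties).
        for p in sorted(holders, key=lambda q: requests[q]):
            requests[p] += 1
            if h in actual[p]:         # callback: "if a local block exists, save it"
                saved_by[h] = p
                break
    return saved_by


def local_phase(actual, completed):
    """Each process saves the local blocks that the collective phase did not cover."""
    return {p: sorted(blocks - completed) for p, blocks in actual.items()}


saved_by = collective_phase(dht, actual)
completed = set(saved_by)
local_saves = local_phase(actual, completed)
print(sorted(completed), local_saves)
```

The collective phase completes {A, B, C, D, E, F} and drops the stale entry H; the local phase then leaves p1 and p2 with nothing to do and has p3 save G, so correctness does not depend on the DHT being up to date.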

Page 35: Performance Evaluation

• Service command framework
  – Use a service class with all empty methods (Null Service Command)
• Content-aware collective checkpoint
• Testbed
  – IBM x335 cluster (20 nodes)
    • Intel Xeon 2.0 GHz/1.5 GB RAM
    • 1 Gbps Ethernet NIC (1000BASE-T)
  – HPC cluster (500 nodes)
    • Two quad-core 2.4 GHz Intel Nehalem/48 GB RAM
    • InfiniBand DDR network

Page 36: Null Service Command: Execution Time Linearly Increases with Total Memory Size

[Figure: service time (ms, 0 to 4500) vs. memory size per process (256, 512, 1024, 2048, 4096, 8192; 6 processes, 6 nodes).]

Execution time is linear in the total memory size of the processes.

Page 37: Null Service Command: Execution Time Scales with Increasing Nodes

[Figure: service time (ms, 0 to 800) vs. number of nodes (1, 2, 4, 8, 12; 1 process/node, 1 GB/process).]

Page 38: Null Service Command: Execution Time Scales in Large Testbed

Page 39: Checkpoint Size

[Figure: compression ratio (0% to 100%) vs. number of nodes (1, 2, 4, 6, 8, 12, 16; Moldy, 1 process/node). Series: Raw-gzip, ConCORD.]

When running an application with plenty of inter-node content sharing, ConCORD achieves a better compression ratio.

Page 40: Checkpoint Size: ConCORD achieves better compression ratio

[Figure: compression ratio (0% to 100%) vs. number of nodes (1, 2, 4, 6, 8, 12, 16; Moldy, 1 process/node). Series: Raw-gzip, ConCORD, ConCORD-gzip.]

• Content-aware checkpointing achieves better compression than GZIP for applications with a lot of inter-node content sharing

Page 41: Collective Checkpoint: Checkpoint Time Scales with Increasing Nodes

[Figure: checkpoint time (ms; axis ticks 1024, 10240, 102400) vs. number of nodes (1, 2, 4, 8, 12, 16, 20; 1 process/node, 1 GB/process, Moldy). Series: Raw-Gzip, ConCORD-Checkpoint.]

Page 42: Collective Checkpoint: Checkpoint Time Scales with Increasing Nodes

[Figure: checkpoint time (ms; axis ticks 1024, 10240, 102400) vs. number of nodes (1, 2, 4, 8, 12, 16, 20; 1 process/node, 1 GB/process, Moldy). Series: Raw-Gzip, ConCORD-Checkpoint, Raw-Chkpt.]

• Content-aware checkpointing scales well with an increasing number of nodes
• Content-aware checkpointing takes significantly less time than a memory dump plus GZIP while achieving the same or a better compression ratio

Page 43: Checkpoint Time Scales Well in Large Testbed

Page 44: Conclusion

• Claim: Content-sharing tracking should be factored out as a separate service
• Feasibility: Implementation and evaluation of ConCORD
  – A distributed system that tracks memory contents in large-scale parallel systems
• Content-aware service command minimizes the effort to build content-aware services
• Collective checkpoint service
  – Performs well
  – Only ~200 lines of code

Page 45

• Lei Xia
• [email protected]
• http://www.lxia.net
• http://www.v3vee.org
• http://xstack.sandia.gov/hobbes/

Page 46: Backup Slides


Page 48: Memory Update Monitor

• Collects and monitors memory content updates in each process/VM periodically
  – Tracks updated memory pages
  – Collects updated memory content
  – Populates memory content in ConCORD
• Maintains a map table from content hash to all local memory pages with the corresponding content
  – Allows ConCORD to locate a memory content block given a content hash
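The monitor's bookkeeping can be sketched as a periodic scan that keeps a hash-to-pages map and reports which content hashes appeared or disappeared locally. This is an illustration only: the class, its methods, the `("insert", hash)` and `("remove", hash)` update tuples, and the use of SHA-1 are assumptions, and the sketch rehashes every page it is given instead of tracking dirty pages as the real monitor does.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Content hash of one page; SHA-1 is an assumption."""
    return hashlib.sha1(data).hexdigest()


class MemoryUpdateMonitor:
    """Hypothetical per-node monitor: tracks local pages and emits hash updates."""

    def __init__(self):
        self.page_hash = {}                 # page number -> last seen content hash
        self.hash_pages = defaultdict(set)  # content hash -> local page numbers

    def scan(self, pages):
        """Compare pages (page number -> bytes) with the last scan; return DHT updates."""
        updates = []
        for page_no, data in pages.items():
            new, old = content_hash(data), self.page_hash.get(page_no)
            if new == old:
                continue                    # page unchanged since the last scan
            if old is not None:
                self.hash_pages[old].discard(page_no)
                if not self.hash_pages[old]:
                    del self.hash_pages[old]
                    updates.append(("remove", old))  # last local copy is gone
            if not self.hash_pages[new]:
                updates.append(("insert", new))      # first local copy appeared
            self.hash_pages[new].add(page_no)
            self.page_hash[page_no] = new
        return updates

    def locate(self, h):
        """Map table lookup: local pages that currently hold content h."""
        return sorted(self.hash_pages.get(h, ()))


monitor = MemoryUpdateMonitor()
first = monitor.scan({0: b"a", 1: b"a", 2: b"b"})
print(first, monitor.locate(content_hash(b"a")))
```

Only the first and last local copy of a content hash produce an update, which keeps traffic to the DHT low when many local pages share the same content.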

Page 49: Code Size of ConCORD

• ConCORD total: 11326
  – Distributed content tracer: 5254
  – Service command execution engine: 3022
  – Memory update monitor: 780
  – Service command execution agent: 1946
• Content-sharing query library: 978
• Service command library: 1325
• Service command terminal: 1826
• Management panel: 1430

Page 50: ConCORD Service Daemon (xDaemon)

[Architecture diagram of the ConCORD running instances. xDaemon components: Update Interface (hash updates), Query Interface (content queries), Control Interface (ConCORD control, system control), Content Tracer (DHT), xCommand Execution Engine, xCommand Controller (xCommand synchronization). Palacios kernel modules, reached through the ConCORD-VMM interface: Memory Update Monitor (sends hash updates), xCommand VMM Execution Agent, xCommand Controller (xCommand synchronization).]

Page 51: Service Command: Run-time Parameters

• Run-time parameters
  – Service Virtual Machines (SVMs): VMs this service is applied to
  – Participating Virtual Machines (PVMs): VMs that can contribute to speeding up the service
  – Service mode: interactive vs. batch mode
  – Timeout, service data, pause-VM

Page 52: Checkpoint: Significant Inter-node Sharing

[Figure: compression ratio (0% to 100%) vs. number of nodes (1, 2, 4, 6, 8, 12, 16; Moldy VM, 1 VM/node). Series: Raw, Raw-gzip, ConCORD, ConCORD-gzip, DoS.]

Page 53: Checkpoint: Significant Intra-node Sharing

[Figure: compression ratio (0% to 100%) vs. number of nodes (1, 2, 4, 8, 12, 16; HPCCG, 1 VM/node). Series: Raw, Raw-gzip, ConCORD, ConCORD-gzip, DoS.]

Page 54: Checkpoint: Zero Sharing

[Figure: compression ratio (50% to 110%) vs. number of nodes (1, 2, 4, 8, 12, 16; 1 VM/node). Series: Raw, ConCORD.]

In the worst case, with zero sharing, the checkpoint generated by content-aware checkpointing is 3% larger than the raw memory size.

Page 55: Service Command: Fault tolerance

• Communication failure
  – Message loss between
    • xDaemon and VM: UDP
    • xDaemon and library: reliable UDP
• xDaemon instance failure
  – Loss of the work done during the collective phase
• Client library failure
  – Command is aborted

Page 56: Zero-hop DHT: Fault Tolerance

• Communication failure
  – Update message loss between the memory update monitor and a DHT instance is tolerable
  – Causes inaccuracy, which is acceptable
• xDaemon instance failure
  – Let it fail; there is no replication in the current implementation
  – Hash partitions on the instance are lost
  – Assume the failed instance comes back soon (on the same or a different physical node)
  – Lost content hashes on that instance will eventually be added again
  – Causes inaccuracy, which is acceptable