TRANSCRIPT
ASCAR: Automating Contention Management for High-Performance Storage Systems
(MSST ’15, June 5th, 2015)
Yan Li, Xiaoyuan Lu, Ethan Miller, Darrell Long
Storage Systems Research Center (SSRC), University of California, Santa Cruz
Contention in a modern shared-storage system
[Diagram: Clients A–E connect through network switches to four storage servers and their storage devices.]
Contention harms efficiency
[Chart: total throughput for 100, 150, and 200 clients.]
Contention causes fluctuation
[Chart: per-client throughput (MB/s) over 25 seconds for Nodes 1–5.]
Client throughput of a random write workload
5 nodes accessing 5 servers
High temporal and spatial I/O speed fluctuation
This is what’s happening at each server node
[Photo; picture credit: https://www.checkfelix.com/reiseblog/10-argumente-urlaub-schottland/]
Limiting client I/O rates can improve performance
… if done properly
[Chart: total throughput for 100, 150, and 200 clients, plus 200 clients with contention control.]
Challenges of distributed I/O rate control: 1. capability discovery
[Chart series: throughput of 200 clients with contention control under several rate limits; nonoptimal rate limits leave throughput on the table, while the optimal rate limit maximizes it.]
Capability discovery usually involves communication:
• between clients
• with a central controller
Challenges of distributed I/O rate control: 2. scalability
Capability discovery usually involves communication:
• between clients
• with a central controller
Inter-node communication grows at O(n²)
Adds overhead to an already congested network
Low responsiveness for highly dynamic workloads
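To see why all-to-all capability discovery scales poorly, count the communication channels involved (a simple counting sketch for illustration, not part of ASCAR):

```python
def pairwise_channels(n_clients):
    """All-to-all capability exchange needs one channel per client pair,
    so coordination traffic grows quadratically: n * (n - 1) / 2."""
    return n_clients * (n_clients - 1) // 2

# Doubling the client count roughly quadruples the coordination traffic.
```

For example, going from 100 to 200 clients grows the pair count from 4,950 to 19,900, a 4x increase for a 2x larger system.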
Our goal
Reduce congestion and improve overall system efficiency
• General purpose
• Scalability
• Reduced human involvement
• Reduced management cost
Target environments
• High-performance computing: homogeneous I/O among clients; bandwidth sensitive; requires fair bandwidth allocation
• Virtual machines accessing a shared image: performance isolation
• Enterprise computing: interactive workloads, backup, indexing, data sharing
• Future data center-wide storage systems that serve multiple customers: performance isolation
Core ideas
Client-side rule-based I/O rate control:
1. no need for central scheduling or coordination; nimble and highly responsive
2. no need to change server software or hardware
3. no scale-up bottleneck
Use machine learning and heuristics for rule generation and optimization:
no prior knowledge of the system or workload is required
Components of the ASCAR prototype
[Diagram: each client node runs an application, a file system client, and an ASCAR traffic controller that enforces traffic rules on I/O to the storage servers. The ASCAR Rule Manager holds a Workload Character & Rule Set Database, a Workload Classifier, and a Rule Generator & Optimizer (SHARP); it receives workload character markers from clients and sends traffic rules back.]
Rule-based Contention Control
Rules tell the controller how to react to congestion:
• shrink the congestion window if congestion is building up
• enlarge the congestion window if speed is increasing
Each rule maps a congestion state to an action:
(congestion state (CS) statistics) → <action>
Each client tracks three congestion state statistics: (ack_ewma, send_ewma, pt_ratio)
An action <m, b, τ> describes how to change the congestion window and rate limit:
new_cong_window = m × cong_window + b
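As a concrete sketch of the mechanism above (names and data layout are simplifications for illustration, not the actual Lustre client code), a rule lookup and action application might look like:

```python
from dataclasses import dataclass

@dataclass
class Action:
    m: float    # multiplier on the congestion window
    b: float    # additive term
    tau: int    # rate limit: minimum gap between requests

@dataclass
class Rule:
    ack_ewma_max: float   # rule applies while ack_ewma is below this bound
    pt_ratio_max: float   # ... and pt_ratio is below this bound
    action: Action

def apply_rules(rules, ack_ewma, pt_ratio, cong_window):
    """Look up the rule covering the observed congestion state and apply
    its action: new_cong_window = m * cong_window + b (floored at 1)."""
    for rule in rules:
        if ack_ewma <= rule.ack_ewma_max and pt_ratio <= rule.pt_ratio_max:
            a = rule.action
            return max(1.0, a.m * cong_window + a.b), a.tau
    raise ValueError("a rule set must cover the whole congestion state space")

# A one-rule set covering the whole state space:
rules = [Rule(float("inf"), float("inf"), Action(0.9, 1, 121))]
```

Because the lookup is purely local to the client, reacting to a congestion-state change costs one table scan and no network round trips.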
Rule set
A rule set contains many rules
A rule set covers the whole congestion state space
[Diagram: the congestion state space as a plane with axes ack_ewma and pt_ratio, spanning (0,0) to (INT_MAX, INT_MAX); each region maps to an Action <m, b, τ>.]
Generating a rule set for a certain application
Extract a short signature workload that covers the important features of the application’s I/O workload
• a signature workload is usually 20 to 30 seconds long
Generate candidate rules and benchmark them with the signature workload on the real system
• test possible combinations of action variables
Split the hottest rule in the set to generate more rules
Begin with one rule: the whole state space maps to one action
[Diagram: a single Action <m, b, τ> covers the whole state space.]
Try different values of <m, b, τ> with the workload
Try the Cartesian product of:
{m ± 0.01, m ± 0.02, m ± 0.04, … } ×
{b ± 0.01, b ± 0.02, b ± 0.04, … } ×
{ τ ± 1, τ ± 2, τ ± 4, … }
Find the rule that yields the highest performance
[Diagram: the whole state space now maps to Action <0.9, 1, 121>.]
Split the state space at the most observed state values
[Diagram: the state space is split at ack_ewma = 0.98312 s and pt_ratio = 1.8 into four regions, each initially keeping Action <0.9, 1, 121>.]
Run the workload; find the rules that were triggered most often
[Diagram: of the four regions (all Action <0.9, 1, 121>), one was used 32k times; the others 2k, 0.3k, and 120 times.]
Improve the most used rule by sweeping all possible values of <m, b, τ>
Try the Cartesian product of:
{m ± 0.01, m ± 0.02, m ± 0.04, … } ×
{b ± 0.01, b ± 0.02, b ± 0.04, … } ×
{ τ ± 1, τ ± 2, τ ± 4, … }
Find the action that works best
[Diagram: the most used region’s action becomes <1.1, 1.5, 80>; the other three regions keep Action <0.9, 1, 121>.]
Split the most used rule’s state space at the most observed state values
[Diagram: the region with Action <1.1, 1.5, 80> is further split into four sub-regions, each keeping Action <1.1, 1.5, 80>.]
Run the workload, find the most used rule, and improve it
[Diagram: the most used sub-region’s action becomes <0.5, -1, 30>.]
After finding the best action, split the most used rule
[Diagram: the region with Action <0.5, -1, 30> is split again into four sub-regions.]
Repeat this process
[Diagram: further iterations refine the rule set, producing regions with actions such as <1, 2, 10> and <2, 1, 10>.]
… until either we are satisfied with the results or we fail to find a better rule
[Diagram: the final rule set divides the state space into many regions, each with its own action.]
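The iterative process above boils down to a greedy loop: find the hottest rule, sweep candidate actions for it, then split its region at the most observed state values. A toy sketch, where `benchmark` and `hottest_state` are caller-supplied stand-ins for running the signature workload on the real system (everything here is illustrative, not the SHARP implementation):

```python
import itertools

def neighbors(action, steps=2):
    """Candidate actions: perturb each of <m, b, tau> by exponentially
    growing deltas and take the Cartesian product."""
    m, b, tau = action
    def ladder(center, base):
        return [center] + [center + s * base * (2 ** i)
                           for i in range(steps) for s in (-1, 1)]
    return list(itertools.product(ladder(m, 0.01), ladder(b, 0.01), ladder(tau, 1)))

def split(region, point):
    """Split a rectangular region of the state space into four at `point`."""
    (alo, ahi), (plo, phi) = region
    a, p = point
    return [((alo, a), (plo, p)), ((a, ahi), (plo, p)),
            ((alo, a), (p, phi)), ((a, ahi), (p, phi))]

def sharp(benchmark, hottest_state, rounds=2):
    """Greedy rule-set search: optimize the most used rule's action, then
    split its region at the most observed state values.
    benchmark(rules) -> (throughput, usage count per rule);
    hottest_state(region) -> most observed (ack_ewma, pt_ratio) in region."""
    whole = ((0.0, float("inf")), (0.0, float("inf")))
    rules = [(whole, (1.0, 0.0, 100))]       # start: one rule covers everything
    for _ in range(rounds):
        tp, usage = benchmark(rules)         # 1. find the hottest rule
        hot = max(range(len(rules)), key=lambda i: usage[i])
        region, action = rules[hot]
        best_tp, best = tp, action           # 2. sweep candidate actions
        for cand in neighbors(action):
            rules[hot] = (region, cand)
            cand_tp, _ = benchmark(rules)
            if cand_tp > best_tp:
                best_tp, best = cand_tp, cand
        # 3. split the hottest region; sub-regions inherit the best action
        rules[hot:hot + 1] = [(r, best) for r in split(region, hottest_state(region))]
    return rules
```

Each round multiplies the rule count only where the workload actually spends its time, which keeps the rule table small while refining exactly the hot part of the state space.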
Prototype and Evaluation
The ASCAR prototype works with Lustre, a popular high-performance file system used in HPC
The client controller works within the Lustre client: highly responsive and low overhead
Hardware: 5 servers, 5 clients; Intel Xeon CPU E3-1230 V2 @ 3.30 GHz, 16 GB RAM, Intel 330 SSD for the OS, a dedicated 7200 RPM HGST Travelstar Z7K500 hard drive for Lustre, Gigabit Ethernet
ASCAR is good at increasing throughput and decreasing speed variance
[Charts: client throughput (MB/s) over 25 seconds for Nodes 1–5, without ASCAR (large fluctuations) and with ASCAR (steadier and higher).]
Workload Throughput Improvements
[Scatter chart, shown over several slides: speed variance change (y, -60% to +30%) vs. throughput improvement (x, 0% to 40%) for Seq. read, Seq. write, Random read, Random write, Random r/w, BTIO (B4), and BTIO (C4); two series, “TP only” and “TP and Var”. Callouts highlight higher throughput, lower speed variance, and the Random write, BTIO (B4), and BTIO (C4) points.]
Our prototype shows that …
ASCAR increases the performance of all workloads by 2% to 36%
ASCAR works best at boosting write-heavy workloads: 25% to 36%
and it can lower speed variance at the same time by more than 50%
Handling a new workload without offline training
Measure the workload’s features
Compare its features with known workloads
Find the most similar known workload and use its contention control rules
Components of the ASCAR prototype
[Diagram: the ASCAR components again, highlighting the Workload Character & Rule Set Database and the Workload Classifier.]
Measuring workloads’ pressure on the system
Workloads can put radically different pressure on the underlying system
and they require different contention management rules
The combined workload from many applications is a mix of read/write and random/sequential
A 75% sequential + 25% random workload can be very different from another
[Diagram: two request streams with the same sequential/random mix but different offset patterns, e.g. short contiguous runs (p, p+1, p+2, p+3, p+4) mixed with random requests (p+100, p+200, p+351) vs. scattered pairs and triples (p, p+100, p+101, p+120, p+121, p+122, p+200, p+201).]
And there are many different 60% read + 40% write workloads out there
[Diagram: different interleavings of read and write requests that all yield a 60/40 read-to-write ratio.]
The feature set we are using now
Op type: read/write/metadata
Ratios between ops (read to write, read/write to metadata, etc.)
For each type of op, we measure:
1. average size of sequential ops
2. location relation between sequential ops: random or sequential
3. average temporal gap between sequential ops
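A rough sketch of extracting features like these from a request trace (the trace format, field names, and run detection are my assumptions, not ASCAR’s actual tracer):

```python
def workload_features(trace):
    """Extract a small feature vector from an I/O trace.
    trace: list of (op, offset, size) tuples, op in {'read', 'write'}.
    Features: read-to-write ratio and average size of sequential runs
    (consecutive same-op requests with contiguous offsets) per op type."""
    counts = {"read": 0, "write": 0}
    runs = {"read": [], "write": []}
    cur_op, cur_end, cur_size = None, None, 0

    def end_run():
        if cur_op is not None:
            runs[cur_op].append(cur_size)

    for op, offset, size in trace:
        counts[op] += 1
        if op == cur_op and offset == cur_end:
            cur_size += size              # extends the current sequential run
        else:
            end_run()                     # a new run starts here
            cur_op, cur_size = op, size
        cur_end = offset + size
    end_run()

    avg = {op: sum(r) / len(r) if r else 0.0 for op, r in runs.items()}
    return {"rw_ratio": counts["read"] / max(counts["write"], 1),
            "avg_seq_read": avg["read"],
            "avg_seq_write": avg["write"]}
```

Two traces with the same rw_ratio can still differ sharply in avg_seq_read and avg_seq_write, which is exactly the distinction the sample on the next slide illustrates.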
Sample: different 60% read + 40% write workloads
[Diagram: two request streams, both with a read-to-write ratio of 60/40; one has an average sequential read size of 60 MB and write size of 40 MB, the other 15 MB and 13 MB.]
Calculating similarity of workloads
Goal: determine whether two workloads can use the same contention control rules
How: start from a known workload and tweak its features, measuring the efficiency of existing rules on the changed workload
Result: we used the results of hundreds of benchmarks to generate a decision tree
[Decision tree: split on rw_ratio (≤7 vs. >7); for rw_ratio >7, split on read_size (≤75 vs. >75); each leaf holds a linear model (Linear model 1, 2, 3) predicting rule-set performance.]
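Written out as code, such a model tree routes each workload’s features to a per-leaf linear model (the split thresholds 7 and 75 come from the slide; the linear-model coefficients are invented placeholders):

```python
def predict_rule_performance(rw_ratio, read_size):
    """M5-style model tree: decision splits route a workload's features to a
    leaf linear model that predicts how well an existing rule set will work.
    Coefficients below are illustrative placeholders, not the trained models."""
    if rw_ratio <= 7:
        return 0.010 * rw_ratio + 0.50                      # Linear model 1
    if read_size <= 75:
        return 0.002 * read_size + 0.60                     # Linear model 2
    return 0.001 * read_size - 0.005 * rw_ratio + 0.70      # Linear model 3
```

Model trees keep the interpretability of a decision tree while letting each leaf fit a smooth trend, which suits a regression target like rule-set performance.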
This decision tree can precisely predict the performance of using a specific rule set with a new workload
Correlation coefficient: 0.8676
Mean absolute error: 0.038
Root mean squared error: 0.0462
Conclusion: ASCAR …
• Does not require knowledge of the system or workloads: fully unsupervised
• Only needs client-side controllers: no need to change hardware, applications, the network, or server software
• Can improve highly dynamic workloads like bursty I/O
• Work-conserving: no wasted bandwidth
Conclusion: ASCAR …
• We distilled a feature set and a method for calculating the similarity between workloads in terms of applying contention control rules
• The source code of our prototype is published as an open source project: http://www.ssrc.ucsc.edu/ascar.html
Future work: online rule optimization
• The current ASCAR prototype requires a lengthy offline learning process
• Use cloud computing services to evaluate at a larger scale
• Tweak rules online using random-restart hill climbing
• Evaluate the ASCAR algorithm on other workloads: databases, web services
Acknowledgments
This research was supported in part by the National Science Foundation under awards IIP-1266400, CCF-1219163, CNS-1018928, the Department of Energy under award DE-FC02-10ER26017/DESC0005417, a Symantec Graduate Fellowship, and industrial members of the Center for Research in Storage Systems. We would like to thank the sponsors of the Storage Systems Research Center (SSRC), including Avago Technologies, Center for Information Technology Research in the Interest of Society (CITRIS of UC Santa Cruz), Department of Energy/Office of Science, EMC, Hewlett Packard Laboratories, Intel Corporation, National Science Foundation, NetApp, Sandisk, Seagate Technology, Symantec, and Toshiba for their generous support.
Thanks!
ASCAR project: http://www.ssrc.ucsc.edu/ascar.html
Contact: Yan Li <[email protected]>