01/22/09icdcs20061 load unbalancing to improve performance under autocorrelated traffic ningfang mi...

27
01/22/09 ICDCS2006 1 Load Unbalancing to Load Unbalancing to Improve Performance Improve Performance under Autocorrelated under Autocorrelated Traffic Traffic Ningfang Mi Ningfang Mi College of William and Mary College of William and Mary Joint work with Qi Zhang Joint work with Qi Zhang College of William and Mary College of William and Mary Alma Riska Alma Riska Seagate Research Seagate Research Evgenia Smirni Evgenia Smirni College of William and Mary College of William and Mary

Post on 21-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 1

Load Unbalancing to Load Unbalancing to Improve Performance under Improve Performance under

Autocorrelated TrafficAutocorrelated TrafficNingfang Mi Ningfang Mi College of William and MaryCollege of William and Mary

Joint work with Qi Zhang Joint work with Qi Zhang College of William and MaryCollege of William and Mary

Alma Riska Alma Riska Seagate ResearchSeagate Research

Evgenia Smirni Evgenia Smirni College of William and MaryCollege of William and Mary

Page 2: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 2

OutlineOutline

MotivationMotivation Our solutionOur solution Conclusion and future workConclusion and future work

Page 3: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 3

Clustered ServersClustered Servers

Front-end Dispatche

rBack-end

Nodes

Load Balancin

g

Heavy tailed service time

Round Robin (RR)Round Robin (RR) RandomRandom Join Shortest Queue (JSQ)Join Shortest Queue (JSQ) Join Shortest Weighted Join Shortest Weighted

Queue (JSWQ)Queue (JSWQ) AdaptLoadAdaptLoad

Autocorrelated Interarrival time

Performanc

e?

Page 4: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 4

Why Considering Dependence?Why Considering Dependence? BADBAD performance effect performance effect

Higher ACF, higher dependence, worse performance

- 0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350 400 450 500

lag (k)

Aut

ocor

rela

tion

Fun

ctio

n (A

CF)

NOACF

SRD

LRD

- 500

0

500

1000

1500

2000

2500

3000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Utilization

Resp

onse

Tim

e

NOACF

SRD

LRD

Higher ACF, higher dependence

Page 5: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 5

Effect of ACF on Load BalancingEffect of ACF on Load Balancing

1

10

100

1000

10000

NOACF SRD LRD

Response Time

AdaptLoad JSWQ JSQ RR

Size-based Policies do NOT

win!

WHY?

Page 6: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 6

SRD

- 0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350 400 450 500

lag (k)

ACF

Original stream

server 1

server 2

server 3

server 4

Possible Reason ...Possible Reason ...What is ACF in Each Node?What is ACF in Each Node?

Load plus ACF.

Load+ACF

Page 7: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 7

OutlineOutline

MotivationMotivation Our solutionOur solution Conclusion and future workConclusion and future work

Page 8: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 8

SolutionSolution

EqAL (EqAL (EqEqually distribute work ually distribute work guided by guided by AAutocorrelation and utocorrelation and LLoad)oad)

Balancing load Balancing load AdaptLoadAdaptLoad

Balancing ACFBalancing ACF Move jobs from strongly correlated node to Move jobs from strongly correlated node to

weakly correlated nodeweakly correlated node

Page 9: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 9

Review: AdaptLoadReview: AdaptLoad

Each node only serves request with Each node only serves request with size falling in certain rangesize falling in certain range [s[s000, s0, s11), [s), [s11, s, s22), … [s), … [sN-1N-1, s, sNN∞)∞)

Self-adjust the size ranges by Self-adjust the size ranges by predicting the incoming workload predicting the incoming workload based on the histogram of previous based on the histogram of previous requestsrequests

Page 10: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 10

Review: AdaptLoadReview: AdaptLoad

0

50

100

150

200

250

request size bin

tota

l si

zeStep 1: Build histogram on-line

e.g., request sizes in sequential: 25 100 57 34 22 9 210 …

Page 11: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 11

Review: AdaptLoadReview: AdaptLoadStep 1: Build histogram on-line

e.g., request sizes in sequential: 25 100 57 34 22 9 210 …

0

10000

20000

30000

40000

50000

60000

70000

request size bin

tota

l s

ize

Page 12: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 12

Review: AdaptLoadReview: AdaptLoad

0

10000

20000

30000

40000

50000

60000

70000

request size bin

tota

l s

ize

Step 1: Build histogram on-line

Step 2: At the end of monitoring window, find the boundaries to partition the total work (area) equally

Server 1

Server 2

Server 3 Server 4

s0 s1 s2 s3

s4

Page 13: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 13

S_EQALS_EQAL Server i increase pi of its work

Corrective factor pi : ∑ pi = 0 negative (reducing work) vs. positive (increasing work) p1 =-R (pre-determined corrective constant) pi using semi-geometric method to decide

0

10000

20000

30000

40000

50000

60000

70000

request size bin

tota

l s

ize

Server 1

Server 2

Server 3 Server 4

Server 1

Server 2

Server 3 Server 4

Page 14: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 14

Performance of S_EQALPerformance of S_EQAL

Service time: WorldCup 1998 TraceService time: WorldCup 1998 Trace Inter-arrival time: MMPP(2)Inter-arrival time: MMPP(2)

Same statistics moments as WorldCupSame statistics moments as WorldCup With short range dependence (SRD)With short range dependence (SRD)

4 servers in the cluster4 servers in the cluster Average utilization per server: 62%Average utilization per server: 62%

Page 15: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 15

Average Slowdown by Average Slowdown by RR

0

1000

2000

3000

4000

5000

6000

7000S

low

do

wn

0 10 20 30 40 50 60 70 80 90

R (%)

Best Slowdown

Page 16: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 16

Average Response Time by Average Response Time by RR

0

2000

4000

6000

8000

10000

12000R

esp

on

se t

ime

0 10 20 30 40 50 60 70 80 90

R (%)

Best Response Time

How to get optimal R ?

Page 17: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 17

Dynamic Policy: D_EQALDynamic Policy: D_EQAL

Self adjustSelf adjust R R RR is initialized as 0 Adjust RR for a small value AdjAdj at the end of each

monitoring window The adjustment should improve both slowdown

and response time If not, wrong direction

Recalculate ppii

Set size boundaries

Page 18: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 18

Performance of D_EQALPerformance of D_EQAL

0

1000

2000

3000

4000

5000

6000

7000

Slo

wdo

wn

0 10 20 30 50 70 90 D_EQAL

R (%)

0

2000

4000

6000

8000

10000

12000

Res

pons

e tim

e

0 10 30 50 70 90 D_EQAL

R (%)

Page 19: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 19

Effectiveness of D_EQALEffectiveness of D_EQAL

0

10

20

30

40

50

60

70

0 100 200 300 400 500 600 700 800 900 1000

Monitoring windows

R (

%)

Page 20: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 20

OutlineOutline

MotivationMotivation Our solutionOur solution Conclusion and future workConclusion and future work

Page 21: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 21

Conclusion and Future WorkConclusion and Future Work

Load balancing policy should also consider Load balancing policy should also consider dependence structure in traffic.dependence structure in traffic.

D_EQAL balances the load and correlationD_EQAL balances the load and correlation Self-adaptive Self-adaptive effectiveeffective

Future workFuture work More adaptive -- detect the change of More adaptive -- detect the change of

dependence structuredependence structure Multiple classes - consider different priority Multiple classes - consider different priority

Page 22: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 22

Thank you !

Questions?

Page 23: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 23

Why Considering Dependence?Why Considering Dependence? BADBAD performance effect performance effect Metric: Autocorrelation function (ACF) Metric: Autocorrelation function (ACF)

Inter-arrival time of the Inter-arrival time of the iithth request: request: XXii

The correlation between inter-arrival times The correlation between inter-arrival times with lag with lag kk

corr [ X t , X tk ]=E [ X t− E [ X ] X tk −E [ X ] ]Var [ X ]

x0 x1 x2 x3 x4 x5 x6 x7 x8 lag(1)

lag(2)Higher ACF, higher dependence, worse performance

Page 24: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 24

Examples of ACFExamples of ACF

- 0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350 400 450 500

lag (k)

ACF

NOACF

SRD

LRD

Page 25: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 25

Effect of ACF on a Single ServerEffect of ACF on a Single Server

- 500

0

500

1000

1500

2000

2500

3000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Utilization (%)

Res

pons

e T

ime

NOACF

SRD

LRD

0

5000

10000

15000

20000

25000

30000

35000

40000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Utilization (%)Q

ueue

Len

gth

NOACF

SRD

LRD

Page 26: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 27

Inside Each ServerInside Each Server

0

20

40

60

80

100

Uti

liza

tio

n (

%)

0 10 30 50 70 90

R (%)

Server 1 Server 2 Server 3 Server 4

Page 27: 01/22/09ICDCS20061 Load Unbalancing to Improve Performance under Autocorrelated Traffic Ningfang Mi College of William and Mary Joint work with Qi Zhang

01/22/09 ICDCS2006 28

1

10

100

1000

10000

100000

1000000R

esp

on

se t

ime

0 10 30 50 70 90

R (%)

Server 1 Server 2 Server 3 Server 4

0102030405060708090

100

Util

izat

ion

(%)

0 10 30 50 70 90

R (%)

Inside Each ServerInside Each Server

Too Bias

How to get optimal R ?