01/22/09icdcs20061 load unbalancing to improve performance under autocorrelated traffic ningfang mi...

Post on 21-Dec-2015

215 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

01/22/09 ICDCS2006 1

Load Unbalancing to Load Unbalancing to Improve Performance under Improve Performance under

Autocorrelated TrafficAutocorrelated TrafficNingfang Mi Ningfang Mi College of William and MaryCollege of William and Mary

Joint work with Qi Zhang Joint work with Qi Zhang College of William and MaryCollege of William and Mary

Alma Riska Alma Riska Seagate ResearchSeagate Research

Evgenia Smirni Evgenia Smirni College of William and MaryCollege of William and Mary

01/22/09 ICDCS2006 2

OutlineOutline

MotivationMotivation Our solutionOur solution Conclusion and future workConclusion and future work

01/22/09 ICDCS2006 3

Clustered ServersClustered Servers

Front-end Dispatche

rBack-end

Nodes

Load Balancin

g

Heavy tailed service time

Round Robin (RR)Round Robin (RR) RandomRandom Join Shortest Queue (JSQ)Join Shortest Queue (JSQ) Join Shortest Weighted Join Shortest Weighted

Queue (JSWQ)Queue (JSWQ) AdaptLoadAdaptLoad

Autocorrelated Interarrival time

Performanc

e?

01/22/09 ICDCS2006 4

Why Considering Dependence?Why Considering Dependence? BADBAD performance effect performance effect

Higher ACF, higher dependence, worse performance

- 0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350 400 450 500

lag (k)

Aut

ocor

rela

tion

Fun

ctio

n (A

CF)

NOACF

SRD

LRD

- 500

0

500

1000

1500

2000

2500

3000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Utilization

Resp

onse

Tim

e

NOACF

SRD

LRD

Higher ACF, higher dependence

01/22/09 ICDCS2006 5

Effect of ACF on Load BalancingEffect of ACF on Load Balancing

1

10

100

1000

10000

NOACF SRD LRD

Response Time

AdaptLoad JSWQ JSQ RR

Size-based Policies do NOT

win!

WHY?

01/22/09 ICDCS2006 6

SRD

- 0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350 400 450 500

lag (k)

ACF

Original stream

server 1

server 2

server 3

server 4

Possible Reason ...Possible Reason ...What is ACF in Each Node?What is ACF in Each Node?

Load plus ACF.

Load+ACF

01/22/09 ICDCS2006 7

OutlineOutline

MotivationMotivation Our solutionOur solution Conclusion and future workConclusion and future work

01/22/09 ICDCS2006 8

SolutionSolution

EqAL (EqAL (EqEqually distribute work ually distribute work guided by guided by AAutocorrelation and utocorrelation and LLoad)oad)

Balancing load Balancing load AdaptLoadAdaptLoad

Balancing ACFBalancing ACF Move jobs from strongly correlated node to Move jobs from strongly correlated node to

weakly correlated nodeweakly correlated node

01/22/09 ICDCS2006 9

Review: AdaptLoadReview: AdaptLoad

Each node only serves request with Each node only serves request with size falling in certain rangesize falling in certain range [s[s000, s0, s11), [s), [s11, s, s22), … [s), … [sN-1N-1, s, sNN∞)∞)

Self-adjust the size ranges by Self-adjust the size ranges by predicting the incoming workload predicting the incoming workload based on the histogram of previous based on the histogram of previous requestsrequests

01/22/09 ICDCS2006 10

Review: AdaptLoadReview: AdaptLoad

0

50

100

150

200

250

request size bin

tota

l si

zeStep 1: Build histogram on-line

e.g., request sizes in sequential: 25 100 57 34 22 9 210 …

01/22/09 ICDCS2006 11

Review: AdaptLoadReview: AdaptLoadStep 1: Build histogram on-line

e.g., request sizes in sequential: 25 100 57 34 22 9 210 …

0

10000

20000

30000

40000

50000

60000

70000

request size bin

tota

l s

ize

01/22/09 ICDCS2006 12

Review: AdaptLoadReview: AdaptLoad

0

10000

20000

30000

40000

50000

60000

70000

request size bin

tota

l s

ize

Step 1: Build histogram on-line

Step 2: At the end of monitoring window, find the boundaries to partition the total work (area) equally

Server 1

Server 2

Server 3 Server 4

s0 s1 s2 s3

s4

01/22/09 ICDCS2006 13

S_EQALS_EQAL Server i increase pi of its work

Corrective factor pi : ∑ pi = 0 negative (reducing work) vs. positive (increasing work) p1 =-R (pre-determined corrective constant) pi using semi-geometric method to decide

0

10000

20000

30000

40000

50000

60000

70000

request size bin

tota

l s

ize

Server 1

Server 2

Server 3 Server 4

Server 1

Server 2

Server 3 Server 4

01/22/09 ICDCS2006 14

Performance of S_EQALPerformance of S_EQAL

Service time: WorldCup 1998 TraceService time: WorldCup 1998 Trace Inter-arrival time: MMPP(2)Inter-arrival time: MMPP(2)

Same statistics moments as WorldCupSame statistics moments as WorldCup With short range dependence (SRD)With short range dependence (SRD)

4 servers in the cluster4 servers in the cluster Average utilization per server: 62%Average utilization per server: 62%

01/22/09 ICDCS2006 15

Average Slowdown by Average Slowdown by RR

0

1000

2000

3000

4000

5000

6000

7000S

low

do

wn

0 10 20 30 40 50 60 70 80 90

R (%)

Best Slowdown

01/22/09 ICDCS2006 16

Average Response Time by Average Response Time by RR

0

2000

4000

6000

8000

10000

12000R

esp

on

se t

ime

0 10 20 30 40 50 60 70 80 90

R (%)

Best Response Time

How to get optimal R ?

01/22/09 ICDCS2006 17

Dynamic Policy: D_EQALDynamic Policy: D_EQAL

Self adjustSelf adjust R R RR is initialized as 0 Adjust RR for a small value AdjAdj at the end of each

monitoring window The adjustment should improve both slowdown

and response time If not, wrong direction

Recalculate ppii

Set size boundaries

01/22/09 ICDCS2006 18

Performance of D_EQALPerformance of D_EQAL

0

1000

2000

3000

4000

5000

6000

7000

Slo

wdo

wn

0 10 20 30 50 70 90 D_EQAL

R (%)

0

2000

4000

6000

8000

10000

12000

Res

pons

e tim

e

0 10 30 50 70 90 D_EQAL

R (%)

01/22/09 ICDCS2006 19

Effectiveness of D_EQALEffectiveness of D_EQAL

0

10

20

30

40

50

60

70

0 100 200 300 400 500 600 700 800 900 1000

Monitoring windows

R (

%)

01/22/09 ICDCS2006 20

OutlineOutline

MotivationMotivation Our solutionOur solution Conclusion and future workConclusion and future work

01/22/09 ICDCS2006 21

Conclusion and Future WorkConclusion and Future Work

Load balancing policy should also consider Load balancing policy should also consider dependence structure in traffic.dependence structure in traffic.

D_EQAL balances the load and correlationD_EQAL balances the load and correlation Self-adaptive Self-adaptive effectiveeffective

Future workFuture work More adaptive -- detect the change of More adaptive -- detect the change of

dependence structuredependence structure Multiple classes - consider different priority Multiple classes - consider different priority

01/22/09 ICDCS2006 22

Thank you !

Questions?

01/22/09 ICDCS2006 23

Why Considering Dependence?Why Considering Dependence? BADBAD performance effect performance effect Metric: Autocorrelation function (ACF) Metric: Autocorrelation function (ACF)

Inter-arrival time of the Inter-arrival time of the iithth request: request: XXii

The correlation between inter-arrival times The correlation between inter-arrival times with lag with lag kk

corr [ X t , X tk ]=E [ X t− E [ X ] X tk −E [ X ] ]Var [ X ]

x0 x1 x2 x3 x4 x5 x6 x7 x8 lag(1)

lag(2)Higher ACF, higher dependence, worse performance

01/22/09 ICDCS2006 24

Examples of ACFExamples of ACF

- 0.1

0

0.1

0.2

0.3

0.4

0.5

0 50 100 150 200 250 300 350 400 450 500

lag (k)

ACF

NOACF

SRD

LRD

01/22/09 ICDCS2006 25

Effect of ACF on a Single ServerEffect of ACF on a Single Server

- 500

0

500

1000

1500

2000

2500

3000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Utilization (%)

Res

pons

e T

ime

NOACF

SRD

LRD

0

5000

10000

15000

20000

25000

30000

35000

40000

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Utilization (%)Q

ueue

Len

gth

NOACF

SRD

LRD

01/22/09 ICDCS2006 27

Inside Each ServerInside Each Server

0

20

40

60

80

100

Uti

liza

tio

n (

%)

0 10 30 50 70 90

R (%)

Server 1 Server 2 Server 3 Server 4

01/22/09 ICDCS2006 28

1

10

100

1000

10000

100000

1000000R

esp

on

se t

ime

0 10 30 50 70 90

R (%)

Server 1 Server 2 Server 3 Server 4

0102030405060708090

100

Util

izat

ion

(%)

0 10 30 50 70 90

R (%)

Inside Each ServerInside Each Server

Too Bias

How to get optimal R ?

top related