cicada - usenix · 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm 8 vm 9 vm 2 vm 3 vm 4 vm 5 vm 6 vm 7 vm 8 vm 9...

Post on 11-Jul-2020

93 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

CICADAPREDICTIVE GUARANTEES FOR CLOUD NETWORKS

Katrina LaCurts Jeff Mogul

Hari Balakrishnan Yoshio Turner

MIT CSAIL, Google Inc., HP Labs

lacurts@mit|2

BANDWIDTH GUARANTEES

`

lacurts@mit|2

BANDWIDTH GUARANTEES10 machines

`

lacurts@mit|2

BANDWIDTH GUARANTEES10 machines

`

lacurts@mit|2

BANDWIDTH GUARANTEES10 machines

`

lacurts@mit|2

BANDWIDTH GUARANTEES

bandwidth guarantee: a promise from the provider to the customer that its VMs will be able to

communicate with each other at a particular rate (informal definition)

10 machines

`

lacurts@mit|3

WHY GUARANTEES?

lacurts@mit|3

WHY GUARANTEES?

is there really that much data?

lacurts@mit|3

WHY GUARANTEES?applications can send

terabytes or petabytes of data internally [1]

is there really that much data?

[1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

lacurts@mit|3

WHY GUARANTEES?applications can send

terabytes or petabytes of data internally [1]

aren’t cloud networks homogeneous and well-provisioned?

[1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

lacurts@mit|3

WHY GUARANTEES?applications can send

terabytes or petabytes of data internally [1]application traffic is

affected by other users

aren’t cloud networks homogeneous and well-provisioned?

[1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

lacurts@mit|3

WHY GUARANTEES?applications can send

terabytes or petabytes of data internally [1]application traffic is

affected by other users

what customers care about network performance?

[1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

lacurts@mit|3

WHY GUARANTEES?applications can send

terabytes or petabytes of data internally [1]application traffic is

affected by other users

enterprise customers need to satisfy SLAs with their own customers

what customers care about network performance?

[1]  Zaharia,  et  al.  “Improving  MapReduce  Performance  in  Heterogeneous  Environments”.  OSDI  2008

lacurts@mit|4

POSSIBLE ARCHITECTURE

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

10 machines, 10Gb/s between each pair

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

provider places customer’s VMs,

enforces guarantee

10 machines, 10Gb/s between each pair

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

provider places customer’s VMs,

enforces guarantee

10 machines, 10Gb/s between each pair

what if some pairs of VMs use fewer than 10Gb/s?

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

provider places customer’s VMs,

enforces guarantee

10 machines, 10Gb/s between each pair

what if some pairs of VMs use fewer than 10Gb/s?

over-provisioned network increased cost to customer,

wasted bandwidth within network

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

provider places customer’s VMs,

enforces guarantee

10 machines, 10Gb/s between each pair

what if some pairs of VMs use fewer than 10Gb/s?

what if some pairs of VMs need more than 10Gb/s?

over-provisioned network increased cost to customer,

wasted bandwidth within network

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

provider places customer’s VMs,

enforces guarantee

10 machines, 10Gb/s between each pair

what if some pairs of VMs use fewer than 10Gb/s?

what if some pairs of VMs need more than 10Gb/s?

over-provisioned network increased cost to customer,

wasted bandwidth within network

under-provisioned network poor performance for

customer’s application

lacurts@mit|4

POSSIBLE ARCHITECTURE

customer requests machines and guarantee

provider places customer’s VMs,

enforces guarantee

10 machines, 10Gb/s between each pair

what if some pairs of VMs use fewer than 10Gb/s?

what if some pairs of VMs need more than 10Gb/s?

over-provisioned network increased cost to customer,

wasted bandwidth within network

under-provisioned network poor performance for

customer’s application

problem: how do customers know what guarantee they need?

lacurts@mit|5

CICADA’S ARCHITECTUREcicada makes predictions about an application’s

traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

customer makes initial request for machines

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

customer makes initial request for machines provider places

tenant VMs

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

hypervisors send measurements to cicada controller

customer makes initial request for machines provider places

tenant VMs

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

hypervisors send measurements to cicada controller

customer makes initial request for machines provider places

tenant VMs

cicada predicts bandwidth for each tenant

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

hypervisors send measurements to cicada controller

customer makes initial request for machines

provider offers guarantee based on cicada’s prediction

provider places tenant VMs

cicada predicts bandwidth for each tenant

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

hypervisors send measurements to cicada controller

customer makes initial request for machines

provider offers guarantee based on cicada’s prediction

provider places tenant VMs

cicada predicts bandwidth for each tenant

provider updates placement based on cicada’s prediction

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|6

CICADA’S ARCHITECTURE

hypervisors send measurements to cicada controller

customer makes initial request for machines

provider offers guarantee based on cicada’s prediction

provider places tenant VMs

cicada predicts bandwidth for each tenant

provider updates placement based on cicada’s prediction

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|6

CICADA’S ARCHITECTURE

hypervisors send measurements to cicada controller

customer makes initial request for machines

provider offers guarantee based on cicada’s prediction

provider places tenant VMs

cicada predicts bandwidth for each tenant

provider updates placement based on cicada’s prediction

problem: is application traffic actually predictable? if so, is it captured by existing models?

lacurts@mit|7

APPLICATION TRAFFICto design cicada’s prediction algorithm, we need to understand how

cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

more data

less data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

variable application

more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

variable application

spatial variability: different pairs of VMs transfer different amounts of data

more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

variable application

spatial variability: different pairs of VMs transfer different amounts of data

more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

variable application

spatial variability: different pairs of VMs transfer different amounts of data

more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

variable application

spatial variability: different pairs of VMs transfer different amounts of datatemporal variability: pairs of VMs transfer different amounts of data at different times

more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|

DATASET

... ... ......

HP Cloud Serviceshttp://hpcloud.com

8

lacurts@mit|

DATASET

... ... ......

observed sflow data (could also use tcpdump, etc.)

HP Cloud Serviceshttp://hpcloud.com

8

lacurts@mit|

DATASET

... ... ......

observed sflow data (could also use tcpdump, etc.)

HP Cloud Serviceshttp://hpcloud.com

8

(in this talk) collected one traffic matrix per hour, for

each application

lacurts@mit|

DATASET

... ... ......

observed sflow data (could also use tcpdump, etc.)

M1 M2 Mn

...

observed traffic

HP Cloud Serviceshttp://hpcloud.com

8

each entry represents the number of bytes

transferred between i and j

(in this talk) collected one traffic matrix per hour, for

each application

lacurts@mit|

DATASET

... ... ......

observed sflow data (could also use tcpdump, etc.)

M1 M2 Mn

...

observed traffic

HP Cloud Serviceshttp://hpcloud.com

8

each entry represents the number of bytes

transferred between i and j

(in this talk) collected one traffic matrix per hour, for

each application

goal: use this dataset to quantify spatial and temporal variability

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal

Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal

Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal

Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal

Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal

Fij = 1/20 Fij < 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal

Fij = 1/20 Fij < 1/20 Fij > 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

2. calculate the coefficient of variation (σ/µ) of the Fij values

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

2. calculate the coefficient of variation (σ/µ) of the Fij values

⇒ σ(Fij)/µ(Fij) = 0

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

2. calculate the coefficient of variation (σ/µ) of the Fij values

⇒ σ(Fij)/µ(Fij) = 0

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

⇒ σ(Fij)/µ(Fij) = 1*

*see paper for this result

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

2. calculate the coefficient of variation (σ/µ) of the Fij values

⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

⇒ σ(Fij)/µ(Fij) = 1*

*see paper for this result

more data

less datano data

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

1. let Fij = fraction of tenant traffic sent from VMi to VMj

all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

2. calculate the coefficient of variation (σ/µ) of the Fij values

⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

Fij = 0 Fij = 1/12

two distinct Fij values

⇒ σ(Fij)/µ(Fij) = 1*

*see paper for this result

the higher the cov, the higher the spatial variation

more data

less datano data

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F

Spatial Variation(cov of Fij values)

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F

Spatial Variation(cov of Fij values)

applications exhibit high spatial variability

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F

Spatial Variation(cov of Fij values)

applications exhibit high spatial variability

cov values as large as 100

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F

Spatial Variation(cov of Fij values)

applications exhibit high spatial variability

cov values as large as 100

(the same result holds for temporal variability, see paper)

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F

Spatial Variation(cov of Fij values)

applications exhibit high spatial variability

cov values as large as 100

(the same result holds for temporal variability, see paper)

customers cannot model such variability themselves, and existing models do not capture it

lacurts@mit|11

CICADA’S ALGORITHMM1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

M̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

= f(M1,M2, . . . ,Mn)

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions

wi(n+ 1) = wi(n)

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions

· e�L(i,n)loss function

wi(n+ 1) = wi(n)

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions

L(i, n) =k ~Mi � ~Mnk

k ~Mnk· e�L(i,n)

loss function

wi(n+ 1) = wi(n)

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|11

CICADA’S ALGORITHM

= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions

L(i, n) =k ~Mi � ~Mnk

k ~Mnk· e�L(i,n)

loss function

normalization

· 1

Zn+1

wi(n+ 1) = wi(n)

Cicada’s  prediction  algorithm  draws  inspiration  from  the  following  works:  [1]  Wang,  et  al.  “COPE:  Traffic  Engineering  in  Dynamic  Networks”.  SIGCOMM  2006.  [2]  Herbster  and  Warmuth.  “Tracking  the  Best  Expert”.  Machine  Learning  Vol  32,  1998.

lacurts@mit|12

EVALUATIONto evaluate cicada, we compared its predictions to ones made by

a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

lacurts@mit|12

EVALUATIONto evaluate cicada, we compared its predictions to ones made by

a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data

VOC requires customers to input parameters

(group sizes, amount of bandwidth)

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

lacurts@mit|12

EVALUATIONto evaluate cicada, we compared its predictions to ones made by

a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data

VOC requires customers to input parameters

(group sizes, amount of bandwidth)

we used an oracle method to pick the best possible

parameters, and also allowed for temporal variability

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

lacurts@mit|12

EVALUATIONto evaluate cicada, we compared its predictions to ones made by

a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data

VOC requires customers to input parameters

(group sizes, amount of bandwidth)

we used an oracle method to pick the best possible

parameters, and also allowed for temporal variability

L2-norm errorE =

kM̂ �MkkM̂k

relative error

E =

Pi=n2

i=1 M̂ [i]�M [i]Pi=n2

i=1 M [i]

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

lacurts@mit|12

EVALUATIONto evaluate cicada, we compared its predictions to ones made by

a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data

VOC requires customers to input parameters

(group sizes, amount of bandwidth)

we used an oracle method to pick the best possible

parameters, and also allowed for temporal variability

L2-norm errorE =

kM̂ �MkkM̂k

relative error

E =

Pi=n2

i=1 M̂ [i]�M [i]Pi=n2

i=1 M [i]

differentiates between over- and under-prediction

[1]  Ballani,  et  al.  “Towards  Predictable  Datacenter  Networks”.  SIGCOMM  2011.

lacurts@mit|13

RESULTS PREDICTING AVERAGE DEMAND

does not differentiate between over- and under-prediction

differentiates between over- and under-prediction

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F

Relative L2-norm Error (fraction)

CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F

Relative Error (fraction)

CicadaVOC (Oracle)

lacurts@mit|13

RESULTS PREDICTING AVERAGE DEMAND

cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

require no customer input(71% for L2 error)

does not differentiate between over- and under-prediction

differentiates between over- and under-prediction

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F

Relative L2-norm Error (fraction)

CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F

Relative Error (fraction)

CicadaVOC (Oracle)

lacurts@mit|13

RESULTS

cicada occasionally under-predicts; VOC does not because we selected its parameters to

avoid under-prediction

PREDICTING AVERAGE DEMAND

cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

require no customer input(71% for L2 error)

does not differentiate between over- and under-prediction

differentiates between over- and under-prediction

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F

Relative L2-norm Error (fraction)

CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F

Relative Error (fraction)

CicadaVOC (Oracle)

lacurts@mit|13

RESULTS

cicada occasionally under-predicts; VOC does not because we selected its parameters to

avoid under-prediction

PREDICTING AVERAGE DEMAND

cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

require no customer input(71% for L2 error)

does not differentiate between over- and under-prediction

differentiates between over- and under-prediction

cicada’s predictions typically require only 1-2 hours of

application history

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F

Relative L2-norm Error (fraction)

CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F

Relative Error (fraction)

CicadaVOC (Oracle)

lacurts@mit|14

PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

lacurts@mit|14

PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

network resources must be available

lacurts@mit|14

PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

network resources must be available customers can add a buffer to the offered guarantees

accept cicada’s guarantee, but add 5% more bandwidth

lacurts@mit|14

PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

network resources must be available

the provider may choose to augment predictions that cicada is not confident about

typically not confident for “small” applications

customers can add a buffer to the offered guarantees

accept cicada’s guarantee, but add 5% more bandwidth

lacurts@mit|14

PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

network resources must be available

the provider may choose to augment predictions that cicada is not confident about

typically not confident for “small” applications

customers can add a buffer to the offered guarantees

accept cicada’s guarantee, but add 5% more bandwidth

cicada can detect when a guarantee is too low, and offer a new one

observe packet drops

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

provider’s goal: minimize wasted bandwidth

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

provider’s goal: minimize wasted bandwidth

intra-rack bandwidth is usually cheap

(sometimes free)

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

provider’s goal: minimize wasted bandwidth

intra-rack bandwidth is usually cheap

(sometimes free)

inter-rack bandwidth is more expensive

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

provider’s goal: minimize wasted inter-rack bandwidth

intra-rack bandwidth is usually cheap

(sometimes free)

inter-rack bandwidth is more expensive

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck

Ba

nd

wid

th A

va

ila

ble

(GB

yte

/ho

ur)

Oversubscription Factor

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)

Oversubscription Factor

bw factor 1x = VOC’s placement [1]

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

= VOC’s placement [1]= cicada’s placement

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck

Ba

nd

wid

th A

va

ila

ble

(GB

yte

/ho

ur)

Oversubscription Factor

bw factor 1x

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

= VOC’s placement [1]= cicada’s placement

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck

Ba

nd

wid

th A

va

ila

ble

(GB

yte

/ho

ur)

Oversubscription Factor

bw factor 1xbw factor 25x

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)

Oversubscription Factor

bw factor 1xbw factor 25x

bw factor 250x

= VOC’s placement [1]= cicada’s placement

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)

Oversubscription Factor

bw factor 1xbw factor 25x

bw factor 250x

= VOC’s placement [1]= cicada’s placement

as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)

Oversubscription Factor

bw factor 1xbw factor 25x

bw factor 250x

= VOC’s placement [1]= cicada’s placement

in some scenarios, cicada’s placement more than doubles the amount

of available inter-rack bandwidth

as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth

[1]  Ballani,  et  al.    “Towards  Predictable  Datacenter  Networks”.    SIGCOMM  2011.

cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|17

SUMMARY

accurate

cloud applications exhibit variability that existing models don’t capture

cicada captures this variability, and provides guarantees that are

calculated quickly*

require little history

increase network utilization

*see paper for this result

lacurts@mit|17

SUMMARY

accurate

cloud applications exhibit variability that existing models don’t capture

cicada captures this variability, and provides guarantees that are

calculated quickly*

require little history

increase network utilization

*see paper for this result

improve relative error by 90%

lacurts@mit|17

SUMMARY

accurate

cloud applications exhibit variability that existing models don’t capture

cicada captures this variability, and provides guarantees that are

calculated quickly*

require little history

increase network utilization

*see paper for this result

improve relative error by 90%

< 10milliseconds per application

lacurts@mit|17

SUMMARY

accurate

cloud applications exhibit variability that existing models don’t capture

cicada captures this variability, and provides guarantees that are

calculated quickly*

require little history

increase network utilization

*see paper for this result

improve relative error by 90%

< 10milliseconds per application

1--2 hours for most applications

lacurts@mit|17

SUMMARY

accurate

cloud applications exhibit variability that existing models don’t capture

cicada captures this variability, and provides guarantees that are

calculated quickly*

require little history

increase network utilization

*see paper for this result

improve relative error by 90%

< 10milliseconds per application

1--2 hours for most applications

in some cases, inter-rack utilization is doubled

top related