is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · fs known in...

73
Is advance knowledge of flow sizes a plausible assumption? Vojislav Dukic, Sangeetha Abdu Jyothi, Bojan Karlas, Muhsen Owaida, Ce Zhang, Ankit Singla

Upload: others

Post on 08-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Is advance knowledge of flow sizes a plausible assumption?

Vojislav Dukic, Sangeetha Abdu Jyothi, Bojan Karlas,

Muhsen Owaida, Ce Zhang, Ankit Singla

Page 2: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size information can increase efficiency

2

Performance

FS known in advance

FS not known in advance

Page 3: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size information can increase efficiency

3

…fine-grained circuit scheduling

…packet scheduling within switches

…deadline-aware prioritization

…adaptive routing

…congestion control schemes

Page 4: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

100 KB

100 MB

Page 5: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

!5

Typically, first-in first-out

Packet scheduling at a switch queue

100 MB

100 KB

Page 6: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

!5

Typically, first-in first-out

Packet scheduling at a switch queue

100 MB

100 KB

Page 7: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

!6

Packet scheduling at a switch queue

But could apply shortest job first idea!

100 MB

100 KB

Page 8: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

!7

Packet scheduling at a switch queue

Least remaining bytes first[*pFabric: Minimal Near-Optimal Datacenter Transport; SIGCOMM ’13]

100 MB

100 KB

Page 9: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

!7

Packet scheduling at a switch queue

Least remaining bytes first[*pFabric: Minimal Near-Optimal Datacenter Transport; SIGCOMM ’13]

100 MB

100 KB

Page 10: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

!8

Packet scheduling at a switch queue

Simple, but problematic: do we know the shortest flow?

100 MB

100 KB

Page 11: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Assumption:Flow size is know in advance

Page 12: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

“In many datacenter applications flow sizes or deadlines are known at initiation time and can be conveyed to the network stack (e.g., through a socket API)...”

pFabric – SIGCOMM ’13Improves FCT by 4x

“When an application calls send() or sendto() on a socket, the operating

system sends this demand in a request message to the Fastpass arbiter,

specifying the destination and the number of bytes.”

FastPass – SIGCOMM ‘14 Improves FCT by 15x

“The sender must specify the size of a message when presenting its first byte to the transport... Knowledge of message sizes is particularly valuable because it allows transports to prioritize shorter messages.”

Homa – SIGCOMM ’18Reduces tail latency by 100x

Time

Page 13: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Assumption: Flow size is know in advance

Page 14: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Assumption: Flow size is know in advance

Page 15: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

“In this paper, we question the validity of this assumption, and point out that, for many applications, such information is difficult to obtain, and may even be unavailable.”

PIAS – NSDI ‘15

“A number of applications are unable to provide size/deadline information at the start of their flows, e.g. database access

and HTTP chunked transfer.”

Karuna – SIGCOMM ‘16

“We ignore flow size and duration as they cannot be acquired until a flow finishes.”

CODA – SIGCOMM ’16

Time

Page 16: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Example alternative: Flow aging

Packets sent = packets remaining

012340123 5

100 MB

100 KB

Page 17: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Example alternative: Flow aging

Packets sent = packets remaining

45

100 MB

100 KB

Page 18: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Possible alternative: flow aging

Works well for long tail flow size distributions

Page 19: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Possible alternative: flow aging

Works well for long tail flow size distributions

Works only when relative flow size is needed

Doesn’t work for equally sized flows

Page 20: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Is advance knowledge of flow sizes a plausible assumption?

Page 21: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

100% knowledge is likely intractable* * or at least too expensive

Is advance knowledge of flow sizes a plausible assumption?

Page 22: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

100% knowledge is likely intractable* * or at least too expensive

… but partial knowledge is plausible and useful

Is advance knowledge of flow sizes a plausible assumption?

Page 23: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

17

Sources of flow size information?

How useful is imprecise / partial knowledge?

Incentives for operators and users?

Page 24: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

18

Sources of flow size information?

How useful is imprecise / partial knowledge?

Incentives for operators and users?

Page 25: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size estimation: design space

19

Many apps know flow sizes

Doable in private DCs

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Page 26: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size estimation: design space

19

Many apps know flow sizes

Doable in private DCs

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past tracesSome apps don’t

Change a lot of applications

Change network API

Page 27: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0KB

50KB

100KB

150KB

200KB

250KB

1 1.002 1.004 1.006 1.008 1.01

Buffe

r occ

upan

cy

Time (s)

1G network

Flow size estimation: design space

20

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Buffer occupancy

Page 28: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0KB

50KB

100KB

150KB

200KB

250KB

1 1.002 1.004 1.006 1.008 1.01

Buffe

r occ

upan

cy

Time (s)

1G network

Flow size estimation: design space

21

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Buffer occupancy

Page 29: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0KB

50KB

100KB

150KB

200KB

250KB

1 1.002 1.004 1.006 1.008 1.01

Buffe

r occ

upan

cy

Time (s)

1G network

Flow size estimation: design space

22

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Buffer occupancy

Page 30: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0KB

50KB

100KB

150KB

200KB

250KB

1 1.002 1.004 1.006 1.008 1.01

Buffe

r occ

upan

cy

Time (s)

1G network

Flow size estimation: design space

23

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Buffer occupancy

Page 31: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0KB

50KB

100KB

150KB

200KB

250KB

1 1.002 1.004 1.006 1.008 1.01

Buffe

r occ

upan

cy

Time (s)

1G network10G network

Flow size estimation: design space

24

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Buffer occupancy

Page 32: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size estimation: design space

25

while(data in buffer):y = write(socket_desc,buffer+x,100KB

)

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Page 33: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size estimation: design space

26

flow_size = f(disk I/O,memory I/O,past network traffic,computation

)

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Page 34: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Learning flow size

27

* Values are in R2

1 – perfect prediction0 – mean value prediction

WorkloadWeb server

TensorFlow

PageRank

KMeans

SGD

Performance0.960.970.830.880.79

+

TraceDisk I/O

Memory I/O

CPU cycles

Past traffic

+

ModelGBDT

FFNN

LSTM =

Page 35: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

28

Sources of flow size information?

How useful is imprecise / partial knowledge?

Incentives for operators and users?

Page 36: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow scheduling using imprecise knowledge

[TensorFlow workload]

0 0.05

0.1 0.15

0.2 0.25

0.3 0.35

0.4 0.45

pFabric pHost FastPass

Mea

n FC

T (m

s)Perfect

Fifo

Mean FCT (ms)

1

[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]

2

[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]

Page 37: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow scheduling using imprecise knowledge

0 0.05

0.1 0.15

0.2 0.25

0.3 0.35

0.4 0.45

pFabric pHost FastPass

Mea

n FC

T (m

s)Perfect

Fifo

Mean FCT (ms)

[TensorFlow workload]

1

[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]

2

[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]

Least remaining bytes first

Page 38: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow scheduling using imprecise knowledge

0 0.05

0.1 0.15

0.2 0.25

0.3 0.35

0.4 0.45

pFabric pHost FastPass

Mea

n FC

T (m

s)Perfect

Fifo

Mean FCT (ms)

[TensorFlow workload]

1

[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]

2

[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]

Page 39: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow scheduling using imprecise knowledge

0 0.05

0.1 0.15

0.2 0.25

0.3 0.35

0.4 0.45

pFabric pHost FastPass

Mea

n FC

T (m

s)Perfect

FifoPrediction

Mean FCT (ms)

[TensorFlow workload]

1

[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]

2

[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]

Page 40: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow scheduling using imprecise knowledge

0 0.05

0.1 0.15

0.2 0.25

0.3 0.35

0.4 0.45

pFabric pHost FastPass

Mea

n FC

T (m

s)Perfect

FifoPrediction

Mean FCT (ms)

[TensorFlow workload]

1

[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]

2

[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]

Page 41: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0 0.05

0.1 0.15

0.2 0.25

0.3 0.35

0.4 0.45

pFabric pHost FastPass

Mea

n FC

T (m

s)Perfect

FifoPrediction

Aging

Flow scheduling using imprecise knowledge

Mean FCT (ms)

[TensorFlow workload]

1

[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]

2

[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]

Page 42: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

1 1.2 1.4 1.6 1.8

2 2.2 2.4 2.6

PageRankKMeans SGD

TensorFlowWeb Server

Rela

tive

perfo

rman

ce

degr

adat

ion

Coflow scheduling using imprecise knowledge (Sincronia*)

35

Slowdown

[*Sincronia: Near-Optimal Network Design for Coflows; SIGCOMM ’18]

Page 43: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

1 1.2 1.4 1.6 1.8

2 2.2 2.4 2.6

PageRankKMeans SGD

TensorFlowWeb Server

Rela

tive

perfo

rman

ce

degr

adat

ion

Coflow scheduling using imprecise knowledge (Sincronia*)

36

Slowdown

[*Sincronia: Near-Optimal Network Design for Coflows; SIGCOMM ’18]

Workload R2

Web server 0.96

TensorFlow 0.97

PageRank 0.83

KMeans 0.88

SGD 0.79

Page 44: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

1 1.2 1.4 1.6 1.8

2 2.2 2.4 2.6

PageRankKMeans SGD

TensorFlowWeb Server

Rela

tive

perfo

rman

ce

degr

adat

ion

Coflow scheduling using imprecise knowledge (Sincronia*)

37

Slowdown

[*Sincronia: Near-Optimal Network Design for Coflows; SIGCOMM ’18]

Workload R2

Web server 0.96

TensorFlow 0.97

PageRank 0.83

KMeans 0.88

SGD 0.79

Page 45: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

1 1.2 1.4 1.6 1.8

2 2.2 2.4 2.6

PageRankKMeans SGD

TensorFlowWeb Server

Rela

tive

perfo

rman

ce

degr

adat

ion

Coflow scheduling using imprecise knowledge (Sincronia*)

38

Slowdown

[*Sincronia: Near-Optimal Network Design for Coflows; SIGCOMM ’18]

Workload R2

Web server 0.96

TensorFlow 0.97

PageRank 0.83

KMeans 0.88

SGD 0.79

Page 46: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0

0.2

0.4

0.6

0.8

1

1 10 100

CDF

Latency (µs)

CPU

Fast enough, deployable learning?

39

CDF

Page 47: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0

0.2

0.4

0.6

0.8

1

1 10 100

CDF

Latency (µs)

CPU

Fast enough, deployable learning?

40

CDF

Page 48: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0

0.2

0.4

0.6

0.8

1

1 10 100

CDF

Latency (µs)

CPUFPGA

Fast enough, deployable learning?

41

CDF

Page 49: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Flow size estimation: design space

42

Works only for repetitive workloads

Must identify the application Substantial implementation effort.

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Learning from past traces

Page 50: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Learning from past traces

Every technique provides some knowledge

43

Estimation effort

Flow sizeknowledge

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Page 51: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Learning from past traces

Every technique provides some knowledge

43

Estimation effort

Flow sizeknowledge

Exact sizes given by application

TCP buffer occupancy

Monitoring system calls

Page 52: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

44

Sources of flow size information?

How useful is imprecise / partial knowledge?

Incentives for operators and users?

Page 53: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

What we want: More knowledge ⇒ better performance

Knowledge = Effort

Performance

0% 100%

Page 54: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Time

Page 55: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Unknown

Time

Page 56: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Unknown Known

TimeTime

Page 57: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Unknown Known

Q1: Can this flow’s performance deteriorate?

TimeTime

Page 58: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Unknown Known

Q2: Can the whole system’s performance deteriorate?

TimeTime

Page 59: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q1: Can this flow’s performance deteriorate?

Unknown Known

Q2: Can the whole system’s performance deteriorate?

Page 60: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q1: Can this flow’s performance deteriorate?

Unknown Known

Old scheduling problem, but novel, interesting questions raised by our unique angle!

Q2: Can the whole system’s performance deteriorate?

Page 61: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q1: Can this flow’s performance deteriorate?

Unknown Known

Page 62: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q1: Can this flow’s performance deteriorate?

Unknown Known

Least remaining bytes first

Page 63: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q1: Can this flow’s performance deteriorate?

Unknown Known

Least remaining bytes first

+ 1

Flow aging0101223

Page 64: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q1: Can this flow’s performance deteriorate?

Unknown Known

A flow’s performance cannot deteriorate!

Least remaining bytes first

+ 1

Flow aging0101223

Page 65: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

For coflow scheduling with mean flow size used for unknowns,performance can in fact deteriorate

Unknown Known

Q1: Can this flow’s performance deteriorate?

Page 66: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q2: Can the whole system’s performance deteriorate?

Unknown Known

Page 67: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Q2: Can the whole system’s performance deteriorate?

Average co/flow completion time system-wide can deteriorate!

Unknown Known

Page 68: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0 0.5

1 1.5

2 2.5

3 3.5

0% 20% 40% 60% 80% 100%

Slow

down

Percentage of known flows

Slowdown

Perfect knowledge performance

Unknown Known

Q2: Can the whole system’s performance deteriorate?

Page 69: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

0 0.5

1 1.5

2 2.5

3 3.5

0% 20% 40% 60% 80% 100%

Slow

down

Percentage of known flows

Slowdown

Open question: positive results with some restrictions?

Perfect knowledge performance

Unknown Known

Q2: Can the whole system’s performance deteriorate?

Page 70: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

55

Sources of flow size information? Partial knowledge available from many sources

How useful is imprecise / partial knowledge?Can still provide performance improvements

Incentives for operators and users? Need better guarantees on value of investment

Is advance knowledge of flow sizes a plausible assumption?

Page 71: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Thank you for your attention

[email protected]

Page 72: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Performance improvement

0

0.2

0.4

0.6

0.8

1

0 1KB 10KB 100KB 1MB 10MB All

Perfo

rman

ce

Size of known flows

SGDPageRank

TensorFlow

Page 73: Is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · FS known in advance FS not known ... Web server 0.96 TensorFlow 0.97 PageRank 0.83 KMeans 0.88 SGD

Features

58

Feature DescriptionStart time, tf Start time of f relative to job start timeFlow gap Time since the end of the previous flowFirst call Size of the first system call tfNetwork in Data received until tfNetwork out Data sent until tfNetwork in(d) Data received from flow’s dest. d until tfNetwork out(d) Data sent by this host to d until tfCPU CPU cycles used until tfDisk I/O Total disk I/O until tfMemory I/O Total memory I/O until tfPrevious flows Flow sizes for last k flows