new bandwidth partitioning - stanford university · 2011. 5. 9. · complete bandwidth partitioning...
Bandwidth partitioning
(jointly with R. Pan, C. Psounis, C. Nair, B. Yang, L. Breslau and S. Shenker)
2
The Setup
• A congested network with many users
• Problems: – allocate bandwidth fairly – control queue size and hence delay
3
Approach 1: Network-centric
• Network node: fair queueing
• User traffic: any type
Pros: perfect (short-time scale) fairness
Cons:
– depending on the number of users/flows, need to add and remove queues dynamically
– incremental addition of queues is not possible
– in data centers, may want to support VM migrations, etc.
4
Approach 2: User-centric
• Network node: FIFO
• User traffic: responsive to congestion (e.g. TCP)
• Problem: requires user cooperation, and there are many TCP implementations!
• For example, if the red source blasts away, it will get all of the link’s bandwidth
• Question: Can we prevent a single source (or a small number of sources) from hogging up all the bandwidth, without explicitly identifying the rogue source?
• We will deal with full-scale bandwidth partitioning later
5
Active Queue Management
• Solve problems with tail-drop
– Proactively indicate congestion
– No bias against bursty flows
– No synchronization effect
• Provide better QoS at router
– low steady-state delay
– lower packet dropping
6
Random Early Detection (RED)
• Arriving packet: is AvgQsize > Minth?
– no: admit the new packet, end
– yes: is AvgQsize > Maxth?
   – yes: drop the new packet, end
   – no: admit packet with a probability p, end
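The RED decision flow on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `red_decision` and its parameters are hypothetical, the average queue size is assumed to be computed elsewhere, and `p_admit` follows the slide's wording ("admit packet with a probability p").

```python
import random

def red_decision(avg_qsize, min_th, max_th, p_admit, rng=random):
    """Return "admit" or "drop" for an arriving packet (illustrative names)."""
    if avg_qsize <= min_th:
        # AvgQsize > Minth? no: admit the new packet.
        return "admit"
    if avg_qsize > max_th:
        # AvgQsize > Maxth? yes: drop the new packet.
        return "drop"
    # Between the two thresholds: admit with probability p_admit.
    return "admit" if rng.random() < p_admit else "drop"
```

For example, `red_decision(10, 5, 15, 0.9)` falls between the thresholds and so returns a randomized answer, while values at or below `min_th` are always admitted.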
7
RED Dropping Curve
[Figure: drop probability (y-axis, ticks 0, maxp, 1) versus average queue size (x-axis, ticks minth, maxth)]
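The dropping curve can be written as a small function. The sketch below assumes the usual piecewise-linear RED shape (zero below minth, a linear ramp up to maxp at maxth, and 1 above maxth); the plot itself is not reproduced in this transcript, so the linear ramp is an assumption, and the function name is hypothetical.

```python
def red_drop_probability(avg_qsize, min_th, max_th, max_p):
    """Drop probability as a function of average queue size (assumed linear ramp)."""
    if avg_qsize < min_th:
        return 0.0  # below minth: never drop
    if avg_qsize > max_th:
        return 1.0  # above maxth: always drop
    # Between minth and maxth: rise linearly from 0 to max_p.
    return max_p * (avg_qsize - min_th) / (max_th - min_th)
```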
8
What QoS does RED Provide?
• Lower buffer delay: good interactive service – qavg is controlled and small
• With congestion responsive flows: packet dropping is reduced – early congestion indication allows traffic to throttle back before congestion
• With responsive flows: fair bandwidth allocation, approximately and over long time scales
9
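Simulation Comparison: The setup
[Figure: TCP sources S(1), S(2), ..., S(m) and UDP sources S(m+1), ..., S(m+n) attach to router R1; TCP sinks D(1), D(2), ..., D(m) and UDP sinks D(m+1), ..., D(m+n) attach to router R2; link labels: 10Mbps (source side), 1Mbps (at R1), 10Mbps (sink side)]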
10
1 UDP source and 32 TCP sources
11
A Randomized Algorithm: First Cut
• Consider a single link shared by 1 unresponsive (red) flow and k distinct responsive (green) flows
• Suppose the buffer gets congested
• Observe: It is likely there are more packets from the red (unresponsive) source
• So if a randomly chosen packet is evicted, it will likely be a red packet
• Therefore, one algorithm could be: when the buffer is congested, evict a randomly chosen packet
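A minimal sketch of this first-cut idea, assuming the buffer is modeled as a Python list of flow ids (the names `evict_random` and `red_eviction_chance` are hypothetical). With a uniformly random eviction, the chance of hitting a red packet equals the red flow's share of the buffer.

```python
import random

def evict_random(buffer, rng=random):
    """Remove and return one uniformly chosen packet (its flow id) from the buffer."""
    idx = rng.randrange(len(buffer))
    return buffer.pop(idx)

def red_eviction_chance(buffer, red_id="red"):
    """Chance that a uniformly random eviction hits the red flow."""
    return buffer.count(red_id) / len(buffer)

# Example buffer: 80 red packets and 10 packets from each of two green flows.
buf = ["red"] * 80 + ["g1"] * 10 + ["g2"] * 10
chance = red_eviction_chance(buf)  # 0.8 for this buffer
```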
12
Comments
• Unfortunately, this doesn’t work because there is a small non-zero chance of evicting a green packet
• Since green sources are responsive, they interpret the packet drop as a congestion signal and back-off
• This only frees up more room for red packets
13
Randomized algorithm: Second attempt
• Suppose we choose two packets at random from the queue and compare their ids. If the ids agree, it is quite unlikely that both packets are green
• This suggests another algorithm: Choose two packets at random and drop them both if their ids agree
• This works: it limits the maximum bandwidth the red source can consume
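To see why matching ids point at the red flow, here is a small worked calculation. It assumes (these are assumptions, not from the slides) that the red flow holds a fraction `red_share` of the queue, the k green flows split the rest equally, and the two packets are drawn independently with replacement.

```python
def match_probabilities(red_share, k):
    """Return (P[both packets red], P[both packets from the same green flow])."""
    green_share = (1.0 - red_share) / k   # share of each individual green flow
    both_red = red_share ** 2
    both_same_green = k * green_share ** 2
    return both_red, both_same_green

# Example: red holds 80% of the queue, 32 green flows share the rest.
both_red, both_same_green = match_probabilities(0.8, 32)
```

In this example a red match is several hundred times more likely than a green match, so dropping on a match penalizes the red flow almost exclusively.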
14
RED
• Arriving packet: is AvgQsize > Minth?
– no: admit the new packet, end
– yes: is AvgQsize > Maxth?
   – yes: drop the new packet, end
   – no: admit packet with a probability p, end
CHOKe
• Arriving packet: is AvgQsize > Minth?
– no: admit the new packet, end
– yes: draw a packet at random from queue; is its flow id the same as the new packet id?
   – yes: drop both matched packets, end
   – no: is AvgQsize > Maxth?
      – yes: drop the new packet, end
      – no: admit packet with a probability p, end
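The CHOKe flow above can be sketched as follows. The queue is modeled as a list of flow ids, the average queue size is assumed to be maintained elsewhere, and all names (`choke_on_arrival`, `p_admit`) are hypothetical; this is a sketch of the flowchart, not the authors' implementation.

```python
import random

def choke_on_arrival(queue, new_flow_id, avg_qsize, min_th, max_th, p_admit, rng=random):
    """Apply the CHOKe flowchart to one arriving packet; mutates `queue`."""
    if avg_qsize <= min_th:
        queue.append(new_flow_id)          # no congestion: admit
        return "admit"
    if queue:
        idx = rng.randrange(len(queue))    # draw a packet at random from queue
        if queue[idx] == new_flow_id:
            del queue[idx]                 # drop the matched queued packet
            return "drop both"             # the new packet is dropped too
    if avg_qsize > max_th:
        return "drop"                      # severe congestion: drop the new packet
    if rng.random() < p_admit:             # otherwise admit with probability p
        queue.append(new_flow_id)
        return "admit"
    return "drop"
```

The only per-packet cost added on top of RED is one random draw and one flow-id comparison, and no per-flow state is kept.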
15
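Simulation Comparison: The setup
[Figure: TCP sources S(1), S(2), ..., S(m) and UDP sources S(m+1), ..., S(m+n) attach to router R1; TCP sinks D(1), D(2), ..., D(m) and UDP sinks D(m+1), ..., D(m+n) attach to router R2; link labels: 10Mbps (source side), 1Mbps (at R1), 10Mbps (sink side)]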
16
1 UDP source and 32 TCP sources
17
A Fluid Analysis
[Figure: the queue drawn as a permeable tube with leakage; the leakage is labeled "discards from the queue"]
18
Setup
• N: the total number of packets in the buffer
• λi: the arrival rate for flow i
• Li(t): the rate at which flow i packets cross location t
[Figure: the tube with locations 0, t, t+δt and D marked along it, Li(t) at location t, and discards from the queue]
19
The Equation
Boundary Conditions
20
Simulation Comparison: 1 UDP, 32 TCPs
21
Complete bandwidth partitioning
• We have just seen how to prevent a small number of sources from hogging all the bandwidth
• However, this is far from ideal fairness – What happens if we use a bit more state?
22
Our approach: Exploit power laws
• Most flows are very small (mice), most bandwidth is consumed by a few large (elephant) flows: simply partition the bandwidth amongst the elephant flows
• New problem: Quickly (automatically) identify elephant flows, allocate bandwidth to them
23
Detecting large (elephant) flows
• Detection:
– Flip a coin with bias p (= 0.1, say) for heads on each arriving packet, independently from packet to packet
– A flow is “sampled” if one of its packets comes up heads
• A flow of size X has roughly 0.1X chance of being sampled (for small X)
– flows with fewer than 5 packets are sampled with prob at most 0.5
– flows with many more than 10 packets are sampled with prob close to 1
• Most mice will not be sampled, most elephants will be
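The sampling probabilities can be checked directly. With independent coin flips of bias p, a flow of X packets is sampled with probability 1 - (1 - p)^X, which is close to pX when X is small. The function name below is hypothetical.

```python
def sample_probability(flow_size, p=0.1):
    """Probability that at least one of flow_size packets comes up heads."""
    return 1.0 - (1.0 - p) ** flow_size

# A few illustrative flow sizes with p = 0.1.
mouse = sample_probability(1)      # a single-packet flow
small = sample_probability(5)      # below the linear estimate 0.1 * 5
elephant = sample_probability(50)  # very likely to be sampled
```

The linear estimate 0.1X overstates the chance for mid-sized flows, but the qualitative conclusion is unchanged: mice are rarely sampled and elephants almost always are.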
24
The AFD Algorithm
[Figure: AFD block diagram with labels Di, Data Buffer and Flow Table]
• AFD is a randomized algorithm
– joint work with Rong Pan, Lee Breslau and Scott Shenker
– currently being ported onto Cisco’s core router (and other) platforms
AFD vs. WRED
Test 1: TCP Traffic (1Gbps, 4 Classes)
[Figure: number of active flows over time (axis ticks 0, 15, 30, 45, 60) for Class 1 to Class 4, with Weight 1 to Weight 4; flow counts shown in the panels include 160, 400 and 1600 flows]
TCP Traffic: Throughputs Under WRED
Throughput Ratio: 13/1
TCP Traffic: Throughputs Under AFD
Throughput Ratio: 2/1 as desired
AFD’s Implementation in IOS
• AFD is implemented as an IOS feature directly on top of IO driver – Integration Branch : haw_t – LABEL=V124_24_6
• Lines of code: 689 – Structure definition and initialization: 253 – Per-packet enqueue process function: 173 – Background timer function: 263
• Tested on e-ARMS c3845 platform
Throughput Comparison vs. 12.5T IOS
Scenario 2 (100Mbps link) with smaller measurement intervals: the longer the interval, the better the rate accuracy
AFD Tradeoffs
• There is no free lunch, and AFD does make a tradeoff
• AFD’s tradeoff is as follows:
– by allowing bandwidth (rate) guarantees to be delivered over relatively larger time intervals, the algorithm is able to achieve increased efficiency and a lower-cost implementation (e.g., lower-cost ASICs; lower instruction and memory bandwidth overhead for software)
– what does “allowing bandwidth (rate) guarantees to be delivered over relatively larger time intervals” really mean? For example: if a traffic stream is guaranteed a rate of 10Mbps, is that rate delivered over every time interval of size 1 sec, or is the rate delivered over time intervals of 100 milliseconds?
– if the time interval is larger, AFD is more efficient, but the traffic can be more bursty within the interval
– as link speeds go up, the time intervals for which AFD can be efficient become smaller and smaller
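The 10Mbps example can be made concrete with a little arithmetic. A rate guarantee over an interval only fixes the total volume delivered per interval, so a longer interval leaves more room for burstiness inside it. The helper name below is hypothetical, and Mbps is taken as 10^6 bits per second.

```python
def bits_per_interval(rate_bps, interval_ms):
    """Volume (in bits) that a rate guarantee implies for one interval."""
    return rate_bps * interval_ms // 1000

RATE_BPS = 10_000_000  # the 10Mbps guarantee from the example

per_second = bits_per_interval(RATE_BPS, 1000)  # guarantee checked every 1 sec
per_100ms = bits_per_interval(RATE_BPS, 100)    # guarantee checked every 100 ms
```

Under the 1 sec interval the stream could, in the extreme, deliver its whole 10,000,000-bit budget in one burst; under the 100 ms interval any single burst is capped at 1,000,000 bits.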