turboflow: information rich flow record generation on … · 2020-02-06 · turboflow architecture:...
TurboFlow: Information Rich Flow Record Generation on Commodity Switches
John Sonchack¹, Adam J. Aviv², Eric Keller³, Jonathan M. Smith¹
¹University of Pennsylvania, ²USNA, ³University of Colorado
Introduction: Network Monitoring with Flow Records
Flow record: <srcIp, dstIp, srcPort, dstPort, arrivalTs, avgInterArrival, pktsDroppedCt, queueLen, …>
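As a rough illustration of the record layout above, the sketch below models a flow record as a Python dataclass. The field names mirror the slide; the types, the units, and the `key()` helper are assumptions made for illustration, and the fields elided by "…" are left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Field names follow the slide; types and units are assumptions.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    arrival_ts: float         # assumed: seconds
    avg_inter_arrival: float  # assumed: seconds
    pkts_dropped_ct: int
    queue_len: int

    def key(self):
        """Hypothetical helper: the fields that identify the flow."""
        return (self.src_ip, self.dst_ip, self.src_port, self.dst_port)


# Example values are made up for illustration.
rec = FlowRecord("10.0.0.1", "10.0.0.2", 1234, 80, 0.0, 0.003, 0, 12)
print(rec.key())
```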
Traffic Engineering, Security, Debugging
Low throughput because packets dropped at switch 2!
Congestion at switch 2!
Hosts 2 and 4 are in a botnet! (figure label: "From botnet")
3.2 Tb/s, > 100 M packets/s, > 10 M flows/s
Flow Monitoring Switches: Prior Work
Sampling: Sampled Packets -> Flow Records (Inaccurate)
Custom Hardware Offloading, Server Offloading: Packets (or other records) -> Flow Records (Expensive, Restrictive)
Introduction: TurboFlow
Main idea: Optimize instead of offload.
Q: What can we get out of the programmable hardware in next-generation commodity switches?
A: Flow record generation for multi-terabit rate traffic without sampling or offloading.
Hardware: Programmable Forwarding Engines, Onboard Microservers
TurboFlow: Packets -> Programmable Forwarding Engine (Pre-aggregation) -> Partial Flow Records -> Microserver (Flow Record Generation) -> Flow Records
Outline
• Introduction
• Architecture
• Evaluation
• Conclusion
TurboFlow Architecture
Background: Programmable Forwarding Engines
Forwarding Engine: Match, Action, Stateful Variables
Switch CPU
Forwarding Engine table (Match, Action, Stateful Variables):

| Match: Flow (IP 5-tuple) | Action | Packet Count | Average Interarrival | … |
|---|---|---|---|---|
| A -> B | Update | 3 | 1 ms | … |
| E -> G | Update | 49 | 8 ms | … |
| F -> G | Update | 3 | 42 ms | … |
Switch CPU: Table Manager
Rule installation rate: < 10 K / s
Flow arrival rate @ 1 Tb/s: > 10,000 K / s
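To make the mismatch concrete, the two bounds quoted on the slide can be compared directly. This is only a back-of-the-envelope ratio of the slide's own figures, not a measurement:

$$
\frac{\text{flow arrival rate}}{\text{rule installation rate}} \;>\; \frac{10{,}000\ \text{K/s}}{10\ \text{K/s}} \;=\; 1000
$$

So a design that installs one rule per new flow would fall behind by at least three orders of magnitude at 1 Tb/s.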
TurboFlow Architecture: Using the FE Efficiently
Forwarding Engine: Match on Flow Key Hash (e.g. 1234); Stateful Variables: Current Flow (IP 5-tuple), Packet Count, Average Interarrival, …
Flow Key (IP 5-tuple) -> HASH -> Flow Key Hash (e.g. 1234)

| Current Flow (IP 5-tuple) | Packet Count | Average Interarrival | … |
|---|---|---|---|
| A -> B | 4 | 3 ms | … |
| C -> D | 49 | 8 ms | … |
| E -> F | 3 | 42 ms | … |
| Z -> Q | 9 | 10 ms | … |

Tracked Flow (A->B): Update Counters
Untracked Flow (G->H): Replace colliding record, send it to CPU
Record sent to the Record Aggregator on the Switch CPU: Z -> Q: 9, 10 ms, …

| Tracked Flow (IP 5-tuple) | Packet Count | Average Interarrival | … |
|---|---|---|---|
| A -> B | 4 | 3 ms | … |
| C -> D | 49 | 8 ms | … |
| E -> F | 3 | 42 ms | … |
| G -> H | 1 | - | … |
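The tracked/untracked behaviour described on these slides can be modelled in software. The following is a minimal sketch, not the P4 implementation: it assumes a direct-mapped table indexed by a CRC32 hash of the flow key and a simple running mean for the interarrival time. The class name `FlowTable`, the hash choice, and the averaging formula are all assumptions; the slides only state that a tracked flow updates its counters and that an untracked flow replaces the colliding record, which is sent to the CPU.

```python
import zlib


class FlowTable:
    """Software model of a hash-indexed flow table with evict-on-collision."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # one partial record (dict) per slot

    def _index(self, key):
        # Assumed hash function; the slides do not name one.
        return zlib.crc32(repr(key).encode()) % len(self.slots)

    def update(self, key, ts):
        """Process one packet. Returns the evicted partial record, or None."""
        i = self._index(key)
        slot = self.slots[i]
        if slot is not None and slot["key"] == key:
            # Tracked flow: update counters in place.
            gap = ts - slot["last_ts"]
            gaps_so_far = slot["count"] - 1
            slot["avg_gap"] = (slot["avg_gap"] * gaps_so_far + gap) / (gaps_so_far + 1)
            slot["count"] += 1
            slot["last_ts"] = ts
            return None
        # Untracked flow: replace the colliding record and hand it to the CPU.
        self.slots[i] = {"key": key, "count": 1, "avg_gap": 0.0, "last_ts": ts}
        return slot  # None if the slot was empty


# A one-slot table forces every distinct flow to collide.
table = FlowTable(num_slots=1)
first = table.update("A->B", 0.000)
second = table.update("A->B", 0.002)
evicted = table.update("G->H", 0.005)
```

With a single slot, the third packet evicts the A->B record (two packets) and starts tracking G->H with a count of 1, mirroring the slide's example. A real table would have many slots, so evictions would happen only on hash collisions.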
TurboFlow Design
TurboFlow Architecture: Using the CPU Efficiently
Partial Flow Records (Count: 2, Count: 10) -> Flow Stats Dictionary -> Flow Records (Count: 12)

Flow Stats Dictionary:

| Key | Count |
|---|---|
| A->B | 12 |
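The merge step on the CPU can be sketched as a dictionary keyed by flow. This is an illustrative model only: the name `RecordAggregator` comes from the earlier slide, but the methods `add_partial` and `export` are hypothetical, only the packet count is merged, and the slides do not say when a finished flow record is exported.

```python
class RecordAggregator:
    """Merges partial flow records from the forwarding engine by flow key."""

    def __init__(self):
        self.stats = {}  # flow key -> accumulated packet count

    def add_partial(self, key, count):
        # Sum the counts of partial records that belong to the same flow.
        self.stats[key] = self.stats.get(key, 0) + count

    def export(self, key):
        # Hypothetical export step: remove the flow and return its record.
        return {"key": key, "count": self.stats.pop(key)}


agg = RecordAggregator()
agg.add_partial("A->B", 2)
agg.add_partial("A->B", 10)
record = agg.export("A->B")
```

The two partial records with counts 2 and 10 combine into one flow record with count 12, matching the slide.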
| Optimization | Performance vs. Baseline |
|---|---|
| baseline (std::unordered_map) | - |
| Reduce Pointer Operations | 1.64X |
| Vectorize Key Comparison | 3.79X |
| Batch and Prefetch | 4.9X |

Average of 146 cycles spent per partial flow record.
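The cycle count can be turned into a rough per-core throughput estimate. The sketch below assumes a 2.0 GHz core clock, which is a hypothetical figure chosen for illustration (the slides do not state the switch CPU's clock rate); only the 146 cycles per record comes from the slide.

```python
CYCLES_PER_RECORD = 146   # from the slide
ASSUMED_CLOCK_HZ = 2.0e9  # hypothetical clock rate, not from the slides


def records_per_second(clock_hz, cycles_per_record=CYCLES_PER_RECORD):
    """Upper-bound estimate: one core spending every cycle on records."""
    return clock_hz / cycles_per_record


rate = records_per_second(ASSUMED_CLOCK_HZ)
print(f"{rate / 1e6:.1f} M partial flow records/s per core")
```

Under that assumption a single core would handle on the order of 13 to 14 million partial flow records per second. A different clock rate scales the estimate linearly.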
Implementation and Evaluation
Benchmark Workloads
• 10 Gb/s Internet Router Traces (CAIDA 2015)
• 144 Node Simulated Datacenter Cluster (YAPS simulator)
Implementations
• P4 Switch (3.2 Tb/s Barefoot Tofino)
• P4 SmartNIC (40 Gb/s Netronome NFP)
Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links
[Figure: Partial Flow Records per Second (Millions), y-axis 0 to 40, x-axis 0.0 to 1.0 (unlabeled); series: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%)]
Partial aggregation using 5 MB of FE memory reduces workload by ~4X.
[Figure: Partial Flow Records per Second (Millions), y-axis 0 to 40, vs. Switch CPU Cores (1 to 4); series: std::unordered_map; Fully Optimized; also shown: No FE Aggregation; FE Aggregation with 5 MB FE Memory (26%)]
Optimizations improve performance by ~5X.
FE pre-aggregation + optimizations = terabit rate workloads using 1 core and ~26% of FE memory.
Outline
• Introduction
• TurboFlow Design
• Implementation and Evaluation
• Conclusion
In the Paper
• Cost analysis
• More interesting flow features
• Pipeline layouts
• Pseudocode
• Expected worst case analysis
• More evaluation
Conclusion (and Thank You for Listening!)
• Flow records are important for monitoring, but difficult to generate at the switch due to high traffic rates.
• TurboFlow is a flow record generator carefully optimized for next-generation commodity switch hardware; it scales to multi-terabit rate traffic without sampling.