TRANSCRIPT
-
Confluo: Distributed Monitoring and Diagnosis
Stack for High-speed Networks
Anurag Khandelwal, Rachit Agarwal, Ion Stoica
-
Motivation
• Managing large-scale networks is increasingly complex: network misconfigurations, network failures, load imbalance, network congestion
• Network issues ⇒ performance degradation, loss in revenue
-
Opportunity: Networks can capture a lot of data…
• In-network techniques, e.g., Marple [SIGCOMM’17], FlowRadar [NSDI’16], UnivMon [SIGCOMM’16]; constrained by limited switch storage
• In-band Network Telemetry (INT): embed a wide range of telemetry data within packet headers:
‣ Packet trajectory
‣ Hop latency
‣ Queue lengths
‣ Link utilization, and many more…
• Analyze telemetry data at end-hosts, e.g., Trumpet [SIGCOMM’16], PathDump [OSDI’16], SwitchPointer [NSDI’18]
-
Example: Checking Path Conformance
[Diagram: a packet traverses switches S1 → S2 → S3, recording each switch ID in its header]
• Does the packet pass through switch S1? [PathDump, OSDI’16]
• Embed a wide range of telemetry data within packet headers:
‣ Packet trajectory
‣ Hop latency
‣ Queue lengths
‣ Link utilization, and many more…
-
Goals for end-host stack design
End-host stacks need to support:
• Real-time monitoring of rich telemetry data
• Low-overhead distributed diagnosis of network events
• Highly concurrent reads & writes of headers using minimal CPU
-
Challenge: …Networks capture a lot of data
Line rate for 10Gbps links ⇒ 0.9-16 million packets/second, i.e., a budget of ~50 nanoseconds per packet header!
[Chart: Throughput (packets/s), 0M-16M, for Storm+Kafka, Flink+Kafka, Kafka, BTrDB, CorfuDB, TimescaleDB, each on 32 cores; these systems provide transactional semantics, and all fall well below the max packet rate @ 10Gbps]
-
Existing Approaches
[Chart: Functionality vs. Performance trade-off]
• Traditional End-host Stacks: Tribeca [VLDB’96], Gigascope [SIGMOD’03], Time Machine [SIGCOMM’08]
• End-host Stacks using Stream-processing Systems/Key-Value Stores: OpenSOC, Tigon, PathDump [OSDI’16], SwitchPointer [NSDI’18]; monitor rich telemetry data
• Custom-designed Monitoring Stacks: FloSIS [USENIX ATC’15], Trumpet [SIGCOMM’16]; handle 10-40 Gbps links, but with limited functionality or precision
• Ideal: Can we achieve both simultaneously?
-
Confluo
[Chart: Write Throughput (Ops), 0M-28M: Storm+Kafka, Flink+Kafka, Kafka, BTrDB, CorfuDB, TimescaleDB on 32 cores each fall below the max packet rate @ 10Gbps; the Atomic MultiLog exceeds it using a single core]
Confluo achieves this using a new data structure, the Atomic MultiLog.
-
Atomic MultiLog
A new data structure that exploits structure in telemetry data to meet all goals:
• Attributes of interest are fixed-sized (32-bit IP addresses, timestamps, 16-bit port numbers, switch IDs, queue lengths, etc.) ⇒ low-overhead indexing with specialized perfect k-ary trees
• Data once written is not updated; it is aggregated only at coarse-grained timescales ⇒ append-only, write-efficient data structures
• Serializable transactions are not required; linearizability is sufficient (linearizability: single-operation, single-object; serializability: multi-operation, multi-object) ⇒ concurrency mechanisms trim down to updating 2 integers
-
Atomic MultiLog: Write Efficient Storage
Traditional data stores use complex data structures to support general workloads, compromising on write efficiency. In contrast, telemetry data once written is not updated, so the Atomic MultiLog is built from concurrent, append-only logs:
• Header Log: raw packet headers
• Attribute Indexes with Reference Logs (e.g., srcIP)
• Time-Indexed Filters (e.g., srcIP=10.0.0.1 && dstPort=90)
• Time-Indexed Aggregates (e.g., min(CWND))
Append-only logs provide write efficiency, but do not support in-place updates.
-
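The storage layout above can be sketched in a few lines of Python. This is a minimal illustration, not Confluo's implementation: the header schema (srcIP, dstPort, cwnd), the filter name f1, and the min-CWND aggregate are invented for the example, and all of Confluo's concurrency control is omitted. It shows only how one append fans out to the four append-only components.

```python
from collections import defaultdict

class ToyMultiLog:
    """Toy sketch of the Atomic MultiLog storage layout: a header log plus
    attribute indexes, time-indexed filters, and time-indexed aggregates,
    all of which only ever grow (no in-place updates)."""

    def __init__(self, filters):
        self.header_log = []        # append-only log of raw headers
        # attribute -> value -> reference log of header-log offsets
        self.indexes = defaultdict(lambda: defaultdict(list))
        self.filters = filters      # filter name -> predicate over a header
        # filter name -> time bucket -> matching header-log offsets
        self.filter_logs = defaultdict(lambda: defaultdict(list))
        self.aggregates = {}        # (aggregate name, time bucket) -> value

    def append(self, header, time_bucket):
        offset = len(self.header_log)
        self.header_log.append(header)              # 1. append the raw header
        for attr in ("srcIP", "dstPort"):           # 2. update attribute indexes
            self.indexes[attr][header[attr]].append(offset)
        for name, pred in self.filters.items():     # 3. update time-indexed filters
            if pred(header):
                self.filter_logs[name][time_bucket].append(offset)
        key = ("min_cwnd", time_bucket)             # 4. update a running aggregate
        self.aggregates[key] = min(self.aggregates.get(key, float("inf")),
                                   header["cwnd"])
        return offset

log = ToyMultiLog({"f1": lambda h: h["srcIP"] == "10.0.0.1" and h["dstPort"] == 90})
log.append({"srcIP": "10.0.0.1", "dstPort": 90, "cwnd": 12}, time_bucket=0)
log.append({"srcIP": "10.0.0.2", "dstPort": 80, "cwnd": 7}, time_bucket=0)
```

Every structure here is append-only or monotone, which is what makes the write path cheap; point queries later scan the relevant reference log instead of a general-purpose index.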
Atomic MultiLog Consistency
Database transactions interleave reads and writes from each user: Read, Write, Write, Read, …
Network monitoring & diagnosis separates the roles: the network only writes (Write, Write, Write, …), while the network operator only reads (Read, Read, Read, …).
This workload does not require serializable transactions; linearizability is sufficient.
-
Efficient Linearizability for Logs
Support for concurrent appends & reads via two counters, a Read-Tail and a Write-Tail:
• An append first advances the Write-Tail to reserve space, writes its data, then advances the Read-Tail
• Everything before the Read-Tail is safe for concurrent reads
• This yields linearizable reads & appends, implemented with lock-free techniques for efficiency
-
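The read-tail/write-tail protocol can be sketched as follows. This is an illustration, not Confluo's implementation: Confluo uses lock-free atomic hardware instructions in C++, whereas here a Python lock stands in for atomic fetch-and-add and compare-and-swap, and the fixed-capacity buffer and byte-string records are invented for the example.

```python
import threading

class TwoTailLog:
    """Sketch of a log with a Write-Tail (end of reserved space) and a
    Read-Tail (end of fully written, readable data)."""

    def __init__(self, capacity=1 << 20):
        self.buf = bytearray(capacity)
        self.write_tail = 0
        self.read_tail = 0
        self._tail_lock = threading.Lock()   # stand-in for atomic instructions

    def append(self, record: bytes) -> int:
        with self._tail_lock:                # stands in for atomic fetch-and-add
            offset = self.write_tail
            self.write_tail += len(record)
        self.buf[offset:offset + len(record)] = record   # write outside the "atomic"
        while True:                          # stands in for a compare-and-swap loop:
            with self._tail_lock:            # advance the Read-Tail only once all
                if self.read_tail == offset: # earlier appends have completed
                    self.read_tail = offset + len(record)
                    return offset

    def read(self, offset: int, length: int) -> bytes:
        if offset + length > self.read_tail:
            raise IndexError("read past read-tail")   # not yet safe for readers
        return bytes(self.buf[offset:offset + length])

log = TwoTailLog()
first = log.append(b"hdr1")
second = log.append(b"pkt2")
```

Readers never see partially written records because the Read-Tail only moves past a record after its bytes are in place; this is the "trim concurrency down to updating 2 integers" idea from the previous slide.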
Atomic MultiLog Linearizability
[Diagram: the Header Log, Attribute Indexes (with Reference Logs), Time-Indexed Filters, and Time-Indexed Aggregates of an Atomic MultiLog share a common offset space (0, 50, 100, 150, 200, 250, …); a Global Read Tail and a Global Write Tail track which offsets are fully written across all component logs]
Relax linearizability for individual logs; ensure linearizability only for end-to-end operations.
Significant performance gains with linearizability at high degrees of concurrency.
No support for transactions.
-
Atomic MultiLog Indexing
Traditional indexes are a poor fit: atomicity is expensive to ensure, write paths have high overhead, etc.
Attributes of interest are fixed-sized, and header fields have fixed domain sizes, e.g., a 16-bit port lies in [0, 2^16).
Perfect k-ary tree: every internal node has exactly k children (unused children are NULL), and the tree has fixed depth; leaf nodes point to Reference Logs.
• Efficient write path and write-conflict resolution: 2.2x faster (1 core), 7.8x faster (48 cores)
• Ordered access via inexpensive range queries: ~5.1x faster
• Limitation: only supports fixed-sized attributes
-
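The perfect k-ary tree index can be sketched as follows. The parameters are my assumptions for illustration, not from the talk: k = 16 with 4-bit digits, so a 16-bit key (e.g., a port number) always takes exactly 4 levels; Confluo's actual branching factor and its lock-free conflict resolution are not modeled here. Leaves hold append-only reference logs of header-log offsets.

```python
class KAryIndex:
    """Sketch of a perfect k-ary tree over a fixed-width key. The fixed
    domain size gives a fixed depth: no rebalancing, and an insert touches
    exactly DEPTH nodes."""

    K, DEPTH = 16, 4          # assumed: 16^4 = 2^16 distinct keys

    def __init__(self):
        self.root = [None] * self.K

    def _digits(self, key: int):
        # fixed-depth path, most-significant 4-bit digit first
        return [(key >> (4 * (self.DEPTH - 1 - d))) & 0xF
                for d in range(self.DEPTH)]

    def insert(self, key: int, offset: int):
        node = self.root
        for d in self._digits(key)[:-1]:
            if node[d] is None:
                node[d] = [None] * self.K   # lazily allocate internal node
            node = node[d]
        last = self._digits(key)[-1]
        if node[last] is None:
            node[last] = []                 # leaf: append-only reference log
        node[last].append(offset)           # no rebalancing, ever

    def lookup(self, key: int):
        node = self.root
        for d in self._digits(key):
            if node[d] is None:
                return []
            node = node[d]
        return node                         # the leaf's reference log

idx = KAryIndex()
idx.insert(80, 0)      # e.g., dstPort=80 first seen at header-log offset 0
idx.insert(80, 7)
idx.insert(443, 3)
```

Because keys map to leaves in sorted order, a range query is a left-to-right walk over the leaf level, which is what makes ordered access inexpensive.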
Confluo End-host Architecture
[Diagram: on each end-host, the hypervisor runs Confluo's End-host Module alongside VM1, VM2, …, VMk and native apps. A Mirror Module (MM) and Spray Module (SM) copy packet headers from the NIC into ring buffers; Writer threads drain the ring buffers into the Atomic MultiLog, which serves the Monitor, Diagnoser, and Archiver components.]
A network-wide Coordinator spans the End-host Modules, enabling consistent analysis of network-wide events using Linearizable Snapshots.
-
Consistency in Distributed Diagnosis
Diagnostic queries Q1 and Q2 execute against distributed snapshots S1 and S2.
• Confluo provides linearizable snapshots, i.e.,
• Each snapshot is atomic
• Snapshots are totally ordered, i.e., if S1 “happens before” S2, then S2 must contain all the changes in S1
• Limitation: does not consider stack delays in ordering packets across end-hosts
Please see our paper for details on the snapshot algorithm!
-
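A toy illustration of the total-order property only (the coordination protocol that actually makes snapshots atomic is in the paper and is not modeled here): if a snapshot is viewed as the vector of per-host global read tails, then "S1 happens before S2" means S2's tails dominate S1's componentwise, i.e., S2 contains every change in S1.

```python
def snapshot(hosts):
    # A snapshot as the vector of per-host global read tails
    # (illustrative representation, not Confluo's wire format).
    return tuple(h["read_tail"] for h in hosts)

def happens_before(s1, s2):
    # S1 "happens before" S2 iff every per-host tail in S2 is at
    # least as large as the corresponding tail in S1.
    return all(a <= b for a, b in zip(s1, s2))

hosts = [{"read_tail": 100}, {"read_tail": 250}]
s1 = snapshot(hosts)
hosts[0]["read_tail"] = 150    # host 0 captures more headers
s2 = snapshot(hosts)
```

Two snapshots taken one after the other are thus always comparable, which is the total-order guarantee the slide states.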
Evaluation
• Setup:
‣ Servers: 12-core 2.3 GHz Xeon CPUs, 252GB RAM
‣ Network: 10Gbps links, Pica8 P-3297 switches
• Summary of Results:
‣ Capture packet headers at line rate on 10Gbps links while evaluating 1000s of triggers & 10s of filters with minimal CPU usage
‣ Exploit rich telemetry data in packet headers to enable a large class of network monitoring and diagnosis applications
Please see our paper for detailed evaluation!
-
Atomic MultiLog Performance
[Charts: Throughput (packets/s), 0-40M vs. #Indexes (0, 1, 2, 4) for 1 and 16 filters; 0-100M vs. #Cores (1, 2, 4, 8) for 1 and 16 filters and for 1 and 4 indexes]
Indexes: srcIP, srcPort, dstIP, dstPort, timestamp
Filter Templates: (f1) packets from VM A to VM B; (f2) packets to VM A; (f3) packets from VM A on destination port P; (f4) packets between (IP1, P1) and (IP2, P2); and (f5) packets to or from VM A.
Takeaway: Packet write-rate degrades gracefully on adding more filters and indexes.
Takeaway: Confluo’s write throughput scales well with #cores due to inexpensive concurrency control.
-
Atomic MultiLog Performance
[Charts: CPU Utilization (%) vs. Packet Size (64-1500 bytes) for 1 and 16 filters, and for 1 and 4 indexes, while processing packets at line rate on a 10Gbps link]
Indexes: srcIP, srcPort, dstIP, dstPort, timestamp
Filter Templates: (f1)-(f5) as above.
Takeaway: Confluo can capture packets at line rate on 10Gbps links for a wide range of packet sizes using a single CPU core.
Many more results in the paper…
-
General Applicability and Impact
• Confluo exploits three properties: fixed-sized attributes, append-only data, non-transactional access
• Many other applications exhibit similar properties:
• Distributed messaging, e.g., Apache Kafka, Amazon Kinesis, etc.
• Time-series databases, e.g., OpenTSDB, InfluxDB, etc.
• We are actively exploring Confluo’s applicability beyond network monitoring and debugging
Open Source: https://www.github.com/ucbrise/confluo
-
Confluo Summary
Distributed monitoring and diagnosis stack for high-speed networks
Introduces a new data structure, the Atomic MultiLog, that exploits structure in network telemetry data to support:
• Rich monitoring
• Low-overhead diagnosis, and
• High-concurrency reads and writes
Thank You!
https://www.github.com/ucbrise/confluo
-
Backup Slides
-
Atomic MultiLog Performance
[Charts: Throughput (packets/s), 0-40M vs. #Indexes (0, 1, 2, 4) for 1/4/16/64 filters; 0-100M vs. #Cores (1, 2, 4, 8) for 1/4/16/64 filters and for 0/1/2/4 indexes]
Indexes: srcIP, srcPort, dstIP, dstPort, timestamp
Filter Templates: (f1)-(f5) as above.
Takeaway: Packet write-rate degrades gracefully on adding more filters and indexes.
Takeaway: Confluo’s write throughput scales well with #cores owing to its inexpensive lock-free concurrency.
-
Atomic MultiLog Performance
Takeaway: At line rate of 10Gbps, Confluo can handle average packet size as small as 128B with 16 filters and 2 indexes on a single core.
[Figure: CPU Utilization (%) vs. Packet Size (64-1500 bytes) at 10Gbps line rate. Left: for 1, 4, 16, 64 filters. Right: for 0, 1, 2, 4 indexes.]
!24
Indexes: srcIP, srcPort, dstIP, dstPort, timestamp
Filter Templates: (f1) packets from VM A to VM B; (f2) packets to VM A; (f3) packets from VM A on destination port P; (f4) packets between (IP1, P1) and (IP2, P2); and (f5) packets to or from VM A.
-
Atomic MultiLog Performance
Takeaway: Confluo can evaluate 1000s of trigger queries with less than 4% CPU utilization at 1ms intervals, and with latency less than 70μs.
Takeaway: Diagnostic query latency in Confluo increases linearly with the number of captured packets.
[Figure: Left: CPU Utilization (%) vs. #Triggers (1-1000) at 1, 5, 10, 20 ms intervals. Middle: trigger latency (10µs-100ms) vs. #Triggers (1-1000). Right: query latency (0-250 ms) vs. #Captured Packets (50-300 million) for queries q1-q5.]
Query Templates: (q1) packets from VM A to VM B; (q2) packets to VM A; (q3) packets from VM A on destination port P; (q4) packets between (IP1, P1) and (IP2, P2); and (q5) packets to or from VM A.
Trigger Templates: aggregate > threshold, aggregate in {sum(pktSize), min(priority), max(CWND), count(pkts), …}
!25
-
Monitoring & Diagnosis Scenario
“Detect TCP packet losses, determine if it is due to difference in flow priorities.”
Flow1 Flow2
Switch ’S'
priority(flow1) > priority(flow2)
Monitoring:
• Filter TCP retransmissions as pkt_drops
• Aggregate drop_count on pkt_drops
• Trigger alert if drop_count > T in 1ms interval
Diagnosis: Check if:
• priority(flow1) > priority(flow2)
• drop_count(flow1) < drop_count(flow2)
26
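The filter → aggregate → trigger pipeline sketched on this slide can be mimicked with a small self-contained Python simulation. The packet trace, flow names, priorities, and threshold below are invented for illustration; this is not Confluo's API:

```python
from collections import defaultdict

# Hypothetical packet records: (flow_id, seq, priority)
packets = [
    ("flow1", 1, 7), ("flow1", 2, 7),
    ("flow2", 1, 1), ("flow2", 1, 1), ("flow2", 2, 1),
    ("flow2", 2, 1), ("flow2", 3, 1), ("flow2", 3, 1),
]

THRESHOLD = 2  # stand-in for the trigger threshold T
seen = defaultdict(set)
drop_count = defaultdict(int)

for flow, seq, _prio in packets:
    # Filter: a repeated sequence number approximates a TCP retransmission
    if seq in seen[flow]:
        drop_count[flow] += 1       # Aggregate: per-flow drop_count
    seen[flow].add(seq)

# Trigger: alert if drop_count exceeds T within the interval
alerts = [f for f, c in drop_count.items() if c > THRESHOLD]
print(alerts)  # ['flow2']

# Diagnosis: compare priorities and drop counts across the two flows
prio = {"flow1": 7, "flow2": 1}
assert prio["flow1"] > prio["flow2"]
assert drop_count["flow1"] < drop_count["flow2"]
```

The low-priority flow2 retransmits three times, exceeding the threshold, while the high-priority flow1 sees no drops; confirming both conditions at diagnosis time attributes the losses to the priority difference.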
-
Diagnosing Losses due to Flow Priorities
“Detect TCP packet losses, determine if it is due to difference in flow priorities.”
Flow1 Flow2
Switch ’S'
priority(flow1) > priority(flow2)
Setup: 15 low priority flows & 1 high priority flow w/ 10Gbps links
Takeaway: Confluo is able to diagnose packet drops due to flow priorities.
[Figure: drop_count (0-1800) per FlowID (1-16); FlowID 1: High Priority, FlowID 2-16: Low Priority]
27
-
Diagnosing Losses due to Flow Priorities
“Detect TCP packet losses, determine if it is due to difference in flow priorities.”
Setup: k low priority flows & 1 high priority flow w/ 10Gbps links
Takeaway: Confluo can diagnose issues across 100s of VMs in a few ms
[Figure: Latency (0-5 ms) vs. #Low priority flows k (1-128), broken down into Atomic Snapshot and Query Execution]
28
-
Consistency in Distributed Diagnosis
MultiLog#1 MultiLog#2 MultiLog#3
Pi = Packet Writes
Naive approach to snapshot: obtain globalReadTails for all MultiLogs
[Figure: writes P11, P12 (MultiLog#1), P21, P22 (MultiLog#2), P31, P32 (MultiLog#3) shown from begin to completion (visibility) along wall-clock time, with Snapshot 1 and Snapshot 2 taken at different points]
Snapshot 1 contains P22 but not P12; Snapshot 2 contains P12 but not P22
Not consistent!
29
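The inconsistent interleaving on this slide can be reproduced in a few lines of Python. Plain lists stand in for MultiLogs and their lengths for readTail values (a deliberate simplification; the write names follow the slide):

```python
# Two snapshots read each log's readTail independently, with writes
# P12 and P22 completing between the reads.
log1, log2 = ["P11"], ["P21"]

snap2 = {"log2": len(log2)}   # Snapshot 2 reads log2's tail before P22 lands
snap1 = {"log1": len(log1)}   # Snapshot 1 reads log1's tail before P12 lands
log2.append("P22")            # P22 becomes visible
snap1["log2"] = len(log2)     # Snapshot 1 now picks up P22
log1.append("P12")            # P12 becomes visible
snap2["log1"] = len(log1)     # Snapshot 2 now picks up P12

s1 = set(log1[:snap1["log1"]] + log2[:snap1["log2"]])
s2 = set(log1[:snap2["log1"]] + log2[:snap2["log2"]])
print("P22" in s1 and "P12" not in s1)  # True: Snapshot 1 has P22, not P12
print("P12" in s2 and "P22" not in s2)  # True: Snapshot 2 has P12, not P22
```

Neither snapshot is a prefix of the other's view, so no single global order of writes can explain both reads.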
-
Consistency in Distributed Diagnosis
Existing Approaches:
• Centralized sequencer orders all writes to system (e.g., DBMS)
• Algorithms with weaker consistency (e.g., Chandy-Lamport)
Infeasible!
[Figure: same write timeline across MultiLog#1-3 with Snapshot 1 and Snapshot 2]
30
-
Linearizable Snapshots in Confluo
• Impose order on some writes during query execution rather than during writes
• Key insight: Delay visibility of P12, P22 to all queries until after the snapshots
• P12, P22 now excluded from both snapshots
[Figure: same write timeline, with the visibility of P12 and P22 delayed past Snapshot 1 and Snapshot 2]
31
-
Linearizable Snapshots in Confluo
Delay packet writes that happen during a snapshot operation
Server#1 Server#2 Server#3 Server#n Coordinator
1. Broadcast FreezeAndGet request to all servers
   - Each server atomically freezes and gets the value of readTail
2. Receive the readTail values from all servers
3. Send Unfreeze request to all servers
   - If no other snapshot in progress, atomically un-freeze readTail
4. Receive ACKs from all servers
32
-
Linearizable Snapshots in Confluo
Delay packet writes that happen during a snapshot operation
Server#1 Server#2 Server#3 Server#n Coordinator
1. Broadcast FreezeAndGet request to all servers
   - Each server atomically freezes and gets the value of readTail
2. Receive the readTail values from all servers
3. Send Unfreeze request to all servers
   - If no other snapshot in progress, atomically un-freeze readTail
4. Receive ACKs from all servers
The visibility of any packet write that would have completed during the snapshot is delayed by freezing the globalReadTail (Step 1)
The packet writes are only made visible (in Step 3) after snapshot(s) have been collected (in Step 2)
33
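The four-step protocol above can be sketched as a toy Python simulation. `LogServer` and `linearizable_snapshot` are hypothetical names, and a single boolean replaces the "no other snapshot in progress" bookkeeping a real coordinator would need:

```python
import threading

class LogServer:
    """Toy stand-in for a server holding one MultiLog's readTail."""
    def __init__(self):
        self.read_tail = 0
        self._frozen = False
        self._lock = threading.Lock()

    def write(self, n_bytes):
        # A write becomes visible only by advancing readTail; while the
        # tail is frozen, visibility is delayed (the slide's key step).
        with self._lock:
            if self._frozen:
                return False
            self.read_tail += n_bytes
            return True

    def freeze_and_get(self):
        # Step 1: atomically freeze and read the current readTail
        with self._lock:
            self._frozen = True
            return self.read_tail

    def unfreeze(self):
        # Step 3: make delayed writes visible again
        with self._lock:
            self._frozen = False

def linearizable_snapshot(servers):
    tails = [s.freeze_and_get() for s in servers]  # Steps 1-2
    for s in servers:                              # Steps 3-4
        s.unfreeze()
    return tails

servers = [LogServer() for _ in range(3)]
servers[0].write(64)
servers[1].write(128)
print(linearizable_snapshot(servers))  # [64, 128, 0]

# A write attempted while tails are frozen is delayed:
frozen_tails = [s.freeze_and_get() for s in servers]
assert not servers[2].write(32)   # delayed during the snapshot
for s in servers:
    s.unfreeze()
assert servers[2].write(32)       # visible once the snapshot completes
```

Because no write can become visible between freezing the first tail and reading the last one, the collected tail values describe a cut that every concurrent query either entirely precedes or entirely follows.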