Generating Realistic TCP WorkloadsGenerating Realistic TCP Workloads
Felix Hernandez-CamposFelix Hernandez-Campos
Ph. D. CandidatePh. D. Candidate
Dept. of Computer ScienceDept. of Computer Science
Univ. of North Carolina at Chapel HillUniv. of North Carolina at Chapel Hill
Recipient of the 2001 CMG FellowshipRecipient of the 2001 CMG Fellowship
Joint work withJoint work with
F. Donelson Smith and Kevin JeffayF. Donelson Smith and Kevin Jeffay
22
Experimental Networking ResearchExperimental Networking Researchand Performance Evaluationand Performance Evaluation
•• Evaluating network protocols andEvaluating network protocols andmechanisms requires careful experimentationmechanisms requires careful experimentation–– Network simulation (NS, Network simulation (NS, OpnetOpnet, , etc.etc.))–– Network testbedsNetwork testbeds
•• A critical element of these experiments is theA critical element of these experiments is thetraffic workloadtraffic workload–– Are current workloads Are current workloads realisticrealistic??
•• LetLet’’s look at some measurementss look at some measurements–– SprintSprint’’s tier-1 backbone networks tier-1 backbone network
33
OC-48 Link in San Jose, CA
February 3rd, 2004
TCP is the dominant transport protocol
© 2004 Sprint ATL
44
Internet traffic is driven by a large number of applications
© 2004 Sprint ATL
55
RouterRouter RouterRouter
EthernetEthernetSwitchSwitch
TrafficTrafficGeneratorsGenerators
TrafficTrafficGeneratorsGenerators
100/1000100/1000MbpsMbps
EthernetSwitch
10001000MbpsMbps
10001000MbpsMbps
100100MbpsMbps
100100MbpsMbps
Internet Traffic GenerationInternet Traffic GenerationTestbed Testbed exampleexample
66
RouterRouter RouterRouter
EthernetEthernetSwitchSwitch
TrafficTrafficGeneratorsGenerators
TrafficTrafficGeneratorsGenerators
100/1000100/1000MbpsMbps
EthernetSwitch
10001000MbpsMbps
10001000MbpsMbps
100100MbpsMbps
100100MbpsMbps
Internet Traffic GenerationInternet Traffic GenerationTestbed Testbed exampleexample
•• Evaluate queuing mechanisms in the routersEvaluate queuing mechanisms in the routers
•• Evaluate transport protocolsEvaluate transport protocols
•• Evaluate intrusion/anomaly detection mechanismsEvaluate intrusion/anomaly detection mechanisms
•• Evaluate traffic monitoring techniquesEvaluate traffic monitoring techniques
•• ……
77
Traffic GenerationTraffic GenerationState-of-the-ArtState-of-the-Art
•• Open-loopOpen-loop–– Large number of sophisticated modelsLarge number of sophisticated models
»» Packet-level modelingPacket-level modeling
–– But TCP is a closed-loop protocolBut TCP is a closed-loop protocol»» Open-loop traffic generation breaks reliability, flowOpen-loop traffic generation breaks reliability, flow
control, and congestion controlcontrol, and congestion control
•• Closed-loopClosed-loop–– The idea is to simulate the behavior ofThe idea is to simulate the behavior of
users/applicationsusers/applications»» Source-level modeling of specific applicationSource-level modeling of specific application
88
Application-Specific ModelingApplication-Specific ModelingHTTP Model ExampleHTTP Model Example
Web
Page
Request sizes,
File sizes
Inactive OFF
times
(user think time)
Active OFF times
Requests
to same
server
TimeBase HTML
Base HTML
Embedded
Embedded
Embedded
Embedded
Embedded
99
SURGE HTTP ModelSURGE HTTP Model[[Barford Barford and and CrovellaCrovella, 1998], 1998]
Web
Page
Request sizes,
File sizes
Inactive OFF
times
(user think time)
Active OFF times
Requests
to same
server
TimeBase HTML
Base HTML
Embedded
Embedded
Embedded
Embedded
Embedded
Embedded Object Sizes:Embedded Object Sizes:
Lognormal(Lognormal(µµ = 8.215, = 8.215, = 1.46) = 1.46)
Base HTML Sizes:Base HTML Sizes:
Lognormal(Lognormal(µµ = 7.63, = 7.63, = 1.001) = 1.001) –– BodyBody
Pareto(Pareto(kk = 10K, = 10K, = 1.0) = 1.0) –– TailTail
OFF Time Durations:OFF Time Durations:
Pareto(Pareto(kk = 10K, = 10K, = 1.0) = 1.0)
No. of EmbeddedNo. of Embedded
ObjectsObjects: Pareto(: Pareto(kk = 2, = 2, = =
1.245)1.245)
1010
Application-Specific ModelsApplication-Specific ModelsShortcomingsShortcomings
•• Creating application models is a challenging, time-Creating application models is a challenging, time-consuming taskconsuming task
–– Traffic mixes are driven by a large number of applicationsTraffic mixes are driven by a large number of applications
•• Set of dominant applications evolves quicklySet of dominant applications evolves quickly–– Applications themselves also evolveApplications themselves also evolve–– We cannot even identify a significant fraction of the trafficWe cannot even identify a significant fraction of the traffic
•• Modeling closed application protocols requiresModeling closed application protocols requiresreverse-engineeringreverse-engineering
•• Privacy considerations complicate data acquisitionPrivacy considerations complicate data acquisition–– TCP header tracesTCP header traces are ok, are ok, application dataapplication data is not is not
1111
Our ApproachOur Approach
•• Abstract source-level modelingAbstract source-level modeling–– Application-neutral technique to describe the source-levelApplication-neutral technique to describe the source-level
behavior of any TCP connectionbehavior of any TCP connection–– Efficient analysis applicable any arbitrary trace of TCPEfficient analysis applicable any arbitrary trace of TCP
headers (no analysis of payloads)headers (no analysis of payloads)–– We can also measure network-level parameters, such asWe can also measure network-level parameters, such as
round-trip times, receiver window sizes, round-trip times, receiver window sizes, etc.etc.
•• Source-level trace replaySource-level trace replay–– Replaying abstract source-level behaviorReplaying abstract source-level behavior
•• Validation in network Validation in network testbedtestbed–– We wrote a distributed, scalable traffic generator (We wrote a distributed, scalable traffic generator (tmixtmix))–– Comparison of original and synthetic tracesComparison of original and synthetic traces
1212
Different Views of Internet TrafficDifferent Views of Internet TrafficAbstract Source-level ModelingAbstract Source-level Modeling
UNC Internet
time
DATA
from UNCDATA
from Internet
DATA
UNC
DATA
from Inet
Request URL HTML Source Req. Image
Aggregate Packet
Arrival Level
Single-Flow Packet
Arrivals
Abstract
Source Level
Application Level
(e.g., web traffic)
2,500 bytes 4,800 bytes 800 b 1,800 b
Our Approach
Per-Flow Filtering
1313
Client-Server ApplicationsClient-Server ApplicationsPersistent HTTP ExamplePersistent HTTP Example
TIME329 b
803 b
BROWSER
SERVER
HTTP
Request 1
HTTP
Response 1
403 b
25,821 b
HTTP
Request 2
HTTP
Response 2
356 b
1,198 b
HTTP
Request 3
HTTP
Response 3
0.12
secs
3.12
secs
Epoch 2 Epoch 3Epoch 1
•• We call a pair of application data units (We call a pair of application data units (ADUsADUs) that) thatcarry a request/response exchange an carry a request/response exchange an epochepoch
•• Quiet timesQuiet times are also part of the workload of TCP are also part of the workload of TCP
1414
Sequential A-b-t ModelSequential A-b-t Model
•• Abstract source-level model for describing theAbstract source-level model for describing theworkload of TCP connectionsworkload of TCP connections
•• Each connection is summarized using a Each connection is summarized using a connectionconnectionvectorvector of the form of the form C Cii = ( = (ee11, , ee22,,……, , eenn) with ) with nn 11epochsepochs
–– Each epoch has the formEach epoch has the form e ejj = ( = (aajj, , tatajj, , bbjj, , tbtbjj))
•• Connection vectors can be extracted from trace ofConnection vectors can be extracted from trace ofTCP segment headersTCP segment headers
–– Sequence number directionality, timing analysis, writeSequence number directionality, timing analysis, writesize and packet size interactionssize and packet size interactions
–– OO((n log nn log n) + ) + OO((nn**WW))
1515
Client-Server ApplicationsClient-Server ApplicationsSMTP and NNTP ExamplesSMTP and NNTP Examples
TIME
93 b
SMTP
SENDER
SMTP
RECEIVER
220 Host
Info
32 b
HELO
191 b
250 Domain
Info
77 b
59b
250 Ok
RCPT
75b
38b
250 Ok
DATA
6b 22,568 b
Message
50b
250 Ok
44b
250 Ok
TIME
53 b
NNTP
READER
NNTP
SERVER
200 News
Srv 5.7b1
MODE
READER ARTICLE n
12 b
1056 bytes
220 n <id1> article
retrieved [article]
13 b
42 b
200 News
Service
GROUP
unc.help
19 b
42 b
unc.support
status
GROUP
unc.test
15 b
32 b
unccs.test
status
XOVER
15 b
32 b
224 data
[header n]
5.02 secs
1616
Beyond the Client-Server ModelBeyond the Client-Server ModelIcecast Icecast –– Internet Radio Internet Radio
TIME392bPLAYER
SERVER
Request 100
msecs.
48
ms.
217b 1253b 1686b 432b 863b 2442b 436b
105
msecs.
21
ms.
100
msecs.
34
ms.
1671b
108
msecs.
Audio Frames
•• Server PUSH applications do not follow theServer PUSH applications do not follow thetraditional client-server modeltraditional client-server model
•• The sequential a-b-t model is still applicableThe sequential a-b-t model is still applicable–– Make Make aaii and and tbtbii zerozero
1717
Beyond the Client-Server ModelBeyond the Client-Server ModelNNTP in Stream-ModeNNTP in Stream-Mode
TIME52 b
NNTP
PEER
NNTP
PEER
201 Server
Ready
13 b
MODE
STREAM
43 b
203
StreamOK
CHECK
<id1>
41 b
CHECK
<id2>
41 b
CHECK
<id3>
43 b
TAKETHIS <id2>
[article]
15,678 bytes
49 b
438
don’t
send
<id1>
CHECK
<id4>
41 b
42 b
238
send
<id2>
51 b
438
don’t
send
<id3>
49 b
438
don’t
send
<id4>
TIME68 b
PEER A
PEER B
BitTorrent
Protocol
68 b
BitTorrent
Protocol
657 b
Bitfield
657 b
Bitfield
5b
Unchoke
5b
Interested
5b
Interested
17b
16397 b
Piece
i
Request
Piece i
17b
Request
Piece j
17b
Request
Piece k
17b
Request
Piece l
17b
Request
Piece m
16397 b
Piece
j
16397 b
Piece
k
16397 b
Piece
l
16397 b
Piece
m
1818
Concurrent A-b-t ModelConcurrent A-b-t Model
•• Some connections are said to exhibit Some connections are said to exhibit data exchangedata exchangeconcurrencyconcurrency
•• Two reasons:Two reasons:–– Increasing performanceIncreasing performance–– Enabling natural concurrencyEnabling natural concurrency
•• Concurrent a-b-t model describes each side of theConcurrent a-b-t model describes each side of theconnection separatelyconnection separately
((((aa11, , tata11), (), (aa22, , tata22),),……, (, (aann, , tatann))))((((bb11, , tbtb11), (), (bb22, , tbtb22),),……, (, (bbmm, , tbtbmm))))
•• Concurrency can be detected with high probabilityConcurrency can be detected with high probability–– p.p.seqnoseqno > > q.q.acknoackno and and q.q.seqnoseqno > > p.p.acknoackno–– O(O(nn**WW))
1919
Source-Level Trace ReplaySource-Level Trace ReplayTraffic Generation in Lab Traffic Generation in Lab TestbedTestbed
Anonymized Packet
Header Trace
Source-level Trace:
Set of Connection Vectors
Traffic
Generators
Traffic
Generators
Processing
Workload Partitioning
Synthetic Packet
Header Trace
TESTBED
Source-level
Performance
Metrics
2020
Sample ComparisonSample Comparison
2121 2222
ConclusionConclusion
•• New method for modeling traffic mixesNew method for modeling traffic mixes–– Empirically-derived connection vectorsEmpirically-derived connection vectors–– Studied sequential vs. concurrent dichotomyStudied sequential vs. concurrent dichotomy–– Fully automated, efficient analysisFully automated, efficient analysis
•• New traffic generation approachNew traffic generation approach–– Enables comparison of real and synthetic trafficEnables comparison of real and synthetic traffic–– Implemented a distributed traffic generatorImplemented a distributed traffic generator–– Techniques for scaling traffic loadTechniques for scaling traffic load