applications & application-layer protocols: synthetic ...mweigle/clemson/courses/...»(current...

8
1 CPSC 826 Internetworking Applications & Application-Layer Protocols: Synthetic Traffic Generation * Michele Weigle Department of Computer Science Clemson University [email protected] September 13, 2004 http://www.cs.clemson.edu/~mweigle/courses/cpsc826 * Slides derived from “How Real Can Network Traffic Be” by Kevin Jeffay, presented at the Univ. of Virginia, 2004. 2 Generating Synthetic Traffic Outline Why do we need to generate synthetic traffic? Why can’t we just replay a packet trace? How can HTTP traffic be described? » History (Mah model, SURGE, thttp, PackMime) Is just describing HTTP traffic enough? » a-b-t trace modeling 3 Generating Synthetic Traffic Why? Suppose you develop a new router active queue management (AQM) scheme How does it affect the performance of current Internet applications? Simulation! » In a network simulator (like ns-2) » In a testbed ISP 1 Edge ISP 1 Edge Router Router ISP 2 Edge ISP 2 Edge Router Router ISP1 ISP1 Clients/Servers Clients/Servers ISP2 ISP2 Clients/Servers Clients/Servers Will AQM Will AQM Help? Help? 4 Generating Synthetic Traffic ns-2 and Testbeds ns-2 » Network simulator » www.isi.edu/nsnam/ns/ » Simulate thousands of connections on one computer Testbeds » Use lots of computers to simulate even more computers » Input some traffic » Monitor how it performs “Tuning RED for Web Traffic,”Mikkel Christiansen, Kevin Jeffay, Don Smith, and David Ott, SIGCOMM 2000.

Upload: others

Post on 09-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

1

CPSC 826Internetworking

Applications &Application-Layer Protocols:Synthetic Traffic Generation*

Michele WeigleDepartment of Computer Science

Clemson [email protected]

September 13, 2004

http://www.cs.clemson.edu/~mweigle/courses/cpsc826

* Slides derived from “How Real Can Network Traffic Be” by Kevin Jeffay, presented at the Univ. of Virginia, 2004. 2

Generating Synthetic TrafficOutline

Why do we need to generate synthetic traffic?

Why can’t we just replay a packet trace?

How can HTTP traffic be described?» History (Mah model, SURGE, thttp, PackMime)

Is just describing HTTP traffic enough?» a-b-t trace modeling

3

Generating Synthetic TrafficWhy?

Suppose you develop a new router active queuemanagement (AQM) scheme

How does it affect the performance of currentInternet applications?

Simulation!» In a network simulator (like ns-2)» In a testbed

ISP 1 EdgeISP 1 EdgeRouterRouter

ISP 2 EdgeISP 2 EdgeRouterRouter

ISP1ISP1Clients/ServersClients/Servers

ISP2ISP2Clients/ServersClients/Servers

Will AQMWill AQMHelp?Help?

4

Generating Synthetic Trafficns-2 and Testbeds

ns-2» Network simulator» www.isi.edu/nsnam/ns/» Simulate thousands of

connections on onecomputer

Testbeds» Use lots of computers to

simulate even morecomputers

» Input some traffic» Monitor how it performs

“Tuning RED for Web Traffic,”Mikkel Christiansen, KevinJeffay, Don Smith, and David Ott, SIGCOMM 2000.

Page 2: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

5

Internet

Generating Synthetic TrafficFirst Approach

“Realistic” traffic generation» Collect a packet trace from a link of interest

Arrival times, packet sizes, …» Replay the trace directly, or» Model the trace and use the model to generate statistically similar

traces Will the resulting traffic be “real” enough?

Clemson

time

6

Generating Synthetic TrafficSource-level traffic generation

Since the network shapes the traffic, what about the traffic isinvariant of the network?» Axiom: The application/user’s behavior is invariant of low-level

network processes

The Floyd, Paxson argument: source-level generation of trafficis preferred over packet-level generation» We desire application-dependent, network independent models of traffic

InternetClemson

time

7

Generating Synthetic TrafficSource-level traffic generation

Clemson Internet

time

WebRequest

HTMLSource Req. Image

2,500 bytes 4,800 bytes 800 b 1,800 b

Per-Flow Filtering

We need models ofhow applicationsgenerate traffic» Models of

applicationprotocols plusmodels of howapplications areused by users

Approaches:» Analytic models» Empirical models

8

Source-Level Traffic GenerationApproaches

Mah Model» Mah, “An Empirical Model of HTTP Network Traffic”, INFOCOM

1997. SURGE

» Barford and Crovella, “Generating Representative Web Workloads forNetwork and Server Performance Evaluation”, ACM SIGMETRICS1998

thttp» Hernandez-Campos, Jeffay, and Smith, “Tracking the Evolution of

Web Traffic: 1995-2003”, IEEE/ACM MASCOTS 2003 PackMime

» Cao, Cleveland, Gao, Jeffay, Smith, and Weigle, “Stochastic Modelsfor Generating Synthetic HTTP Source Traffic”, INFOCOM 2004.

tmix» Hernandez-Campos, Smith, and Jeffay, “Generating Realistic TCP

Workloads”, CMG 2004.

Page 3: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

9

ApproachesMah Model (1997)

Based on packet traces» Late 1995» 10 Mbps Ethernet» ~1.7 million HTTP packets traced

~5000 HTTP responses HTTP/1.0

» no persistent connections» no pipelining

Model is a series of empirical CDFs Primary random variables:

» Request sizes/Reply sizes» User think time» Document size - number of embedded images/page» Consecutive documents retrievals - same server» Server selection - popularity

REQREQ

RSPRSP

UserUser

ServerServer

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

TimeTime

10

ApproachesSURGE (1998) REQREQ

RSPRSP

UserUser

ServerServer

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

TimeTime

Intensity measured in “user equivalents”» The number of web users whose behavior is simulated» Make request, wait» ON / OFF process

Primary random variables:» Request sizes/Reply sizes» Popularity» Embedded references» Temporal locality» OFF times (active and inactive)

11

Approachesthttp (2003)

Primary random variables:» Request sizes/Reply sizes» User think time» Persistent connection usage» Number of objects per

persistent connection

» Number of embedded images/page» Number of parallel connections» Consecutive documents per server» Number of servers per page

REQREQ

RSPRSP

UserUser

ServerServer

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

TimeTime

Based on packet traces» 2003» 1 Gbps Ethernet» Millions of HTTP packets traced

96 million HTTP responses HTTP/1.0 and HTTP/1.1 Model is a series of empirical CDFs

12

ApproachesPackMime (2004)

Based on packet traces» Early 2000 / Late 2000» 100 Mbps Ethernet / 1 Gpbs Ethernet» Millions of HTTP packets

HTTP/1.0 and HTTP/1.1 Connection-based rather than page-based Intensity measured in new connections per second Primary random variables:

» Request sizes/Reply sizes» Time between requests» Number of pages requested» Number of request-response exchanges per page to same server» Network-specific: RTT, loss, bottleneck link speed

REQREQ

RSPRSP

UserUser

ServerServer

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

REQREQ

RSPRSP

TimeTime

Page 4: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

13

Source-Level Traffic GenerationThe failure of existing approaches

Dominant approach is tomodel individual applications

Wide-area traffic isgenerated by many differentapplications

Simulation/testbedexperiments should generate“traffic mixes”

Does the HTTP source-levelmodel constructionparadigm scale to otherapplications?

WebWeb

File-SharingFile-Sharing

Other TCPOther TCPOther UDPOther UDP

FTP, EmailFTP, EmailStreaming, DNS, GamesStreaming, DNS, Games

14

Constructing Source-Level ModelsSteps for simple request/response protocols

Obtain a trace of TCP/IP headers from a network link» (Current ethics dictate that tracing beyond TCP header is

inappropriate without users’ permission) Use changes in TCP sequence numbers (and

knowledge of HTTP) to infer application data unit(ADU) boundaries

Compute empirical distributions of the ADUs (andhigher-level objects) of interest

15

Ex: HTTP Model ConstructionHTTP inference from TCP packet headers

ClientClient ServerServer

DATADATA

ACKACK

DATADATADATADATAACKACK

FINFINFIN-ACKFIN-ACK

FINFINFIN-ACKFIN-ACK

SYNSYN

SYN-ACKSYN-ACKACKACK

seqnoseqno 305305 acknoackno 11

seqnoseqno 11 acknoackno 305305

seqnoseqno 14611461 acknoackno 305305

seqnoseqno 28762876 acknoackno 305305

seqnoseqno 305305 acknoackno 28762876TIMETIME

16

Ex: HTTP Model ConstructionHTTP inference from TCP packet headers

ClientClient ServerServer

DATADATA

ACKACK

DATADATADATADATAACKACK

FINFINFIN-ACKFIN-ACK

FINFINFIN-ACKFIN-ACK

SYNSYN

SYN-ACKSYN-ACKACKACK

305305 bytes bytes

28762876 bytes bytes

seqnoseqno 305305 acknoackno 11

seqnoseqno 1 1 ackno ackno 305305

seqnoseqno 14611461 acknoackno 305305

seqnoseqno 28762876 acknoackno 305305

seqnoseqno 305 305 ackno ackno 28762876TIMETIME

Page 5: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

17

Ex: HTTP Model ConstructionHTTP inference from TCP packet headers

ClientClient ServerServer

ACKACK

ACKACK

FINFINFIN-ACKFIN-ACK

FINFINFIN-ACKFIN-ACK

SYNSYN

SYN-ACKSYN-ACKACKACK

305 bytes305 bytes

seqnoseqno 1 1 ackno ackno 305305

seqnoseqno 305 305 ackno ackno 28762876TIMETIME

2,876 bytes2,876 bytes

HTTPHTTPRequestRequest

HTTPHTTPResponseResponse

18

Source-Level Traffic GenerationDo current model generation methods scale?

Implicit assumptions behind application modelingtechniques:» We can identify the application corresponding to a given flow

recorded during a measurement period» We can identify traffic generated by (instances) of the same

application» We know the operation of the application-level protocol

Ex: The HTTP success story:» Request sizes/Reply sizes» User think time» Persistent connection usage» Nbr of objects per persistent

connection

» Number of embedded images/page– Number of parallel connections» Consecutive documents per server» Number of servers per page

19

Source-Level Traffic GenerationDo current model generation methods scale?

Implicit assumptions behind application modelingtechniques:» We can identify the application corresponding to a given flow

recorded during a measurement period» We can identify traffic generated by (instances) of the same

application» We know the operation of the application-level protocol

What’s needed is an application-independentmethod of constructing source-level traffic models» We need to be able to construct application-level models

of traffic without knowing what applications are beingused or how the applications work

» We need to construct source-level models of applicationmixes seen in real networks

20

TCP Connection SignaturesRecording communication “patterns”

DATADATA

ACKACK

DATADATADATADATAACKACK

FINFINFIN-ACKFIN-ACK

FINFINFIN-ACKFIN-ACK

SYNSYN

SYN-ACKSYN-ACKACKACK

seqseq 305305 ackack 11

seqseq 11 ackack 305305

seqseq 14611461 ackack 305305

seqseq 28762876 ackack 305305

seqseq 305305 ackack 28762876

WebWebServerServer

2,876 bytes2,876 bytes

DATADATA

ACKACK

DATADATADATADATAACKACK

FINFINFIN-ACKFIN-ACK

FINFINFIN-ACKFIN-ACK

SYNSYN

SYN-ACKSYN-ACKACKACK

seqseq 305 305 ack ack 11

seqseq 1 1 ack ack 305305

seqseq 1461 1461 ack ack 305305

seqseq 2876 2876 ack ack 305305

seqseq 305 305 ack ack 28762876

CallerCaller CalleeCallee

HTTPHTTPResponseResponse

HTTPHTTPRequestRequest305 bytes305 bytes

WebWebBrowserBrowser

Page 6: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

21

TCP Connection SignaturesRecording communication “patterns”

TIMETIME

Web ClientWeb Client

Web ServerWeb Server

HTTPHTTPRequestRequest305 bytes305 bytes

HTTPHTTPResponseResponse

2,876 bytes2,876 bytes

Communication pattern was (a1, b1)» E.g., (305 bytes, 2,876 bytes)

22

TCP Connection SignaturesThe a-b-t trace model

CallerCaller

CalleeCallee

aa11bytesbytes

bb11 bytes bytes

aa22bytesbytes

bb33bytesbytes

aa33bytesbytes

Epoch 1Epoch 1 Epoch 2Epoch 2 Epoch 3Epoch 3

tt11 seconds seconds tt22 seconds seconds

bb22bytesbytes

We model a TCP connection as a-b-t vector:((a1, b1, t1), (a2, b2, t2), …, (ae, be, ⊥))

where e is the number of epochs

23

SMTP (send email)

The a-b-t Trace ModelTypical Communication Patterns

Telnet (remote terminal)

FTP-DATA (file download)

24

Source-Level Trace ReplayTraffic generation in a laboratory testbed

Given a testbed or simulator, can we effectively simulateAbilene?

» Can we simulate “the Internet” in a lab or inside a modest computerusing a simple dumbbell topology?

» Can we get away from having to make arbitrary decisions about howwe generate synthetic traffic?

Cloud1Cloud1Browsers/Browsers/

ServersServersCloud2Cloud2

Browsers/Browsers/ServersServers

EthernetEthernetSwitchSwitch

EthernetEthernetSwitchSwitch

Cloud 1Cloud 1Edge RouterEdge Router

Cloud 2Cloud 2Edge RouterEdge Router

… …

Abilene traffic?Abilene traffic?

Page 7: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

25

Source-Level Trace ReplayTraffic generation in a laboratory testbed

Testbed:» 150+ end-systems, 10/100/1,000 Mbps connectivity,

dozens of switches and routers

Cloud1Cloud1Browsers/Browsers/

ServersServersCloud2Cloud2

Browsers/Browsers/ServersServers

EthernetEthernetSwitchSwitch

EthernetEthernetSwitchSwitch

Cloud 1Cloud 1Edge RouterEdge Router

Cloud 2Cloud 2Edge RouterEdge Router

… …

Abilene traffic?Abilene traffic?

Input trace: A 2-hour Abilene trace from theNLANR repository» 334 billion bytes, 404 million packets, 5 million TCP

connections26

Source-Level Trace ReplayTraffic generation in a laboratory testbed

Anonymized Packet Header Trace

Source-level Trace:Set of a-b-t Connection Vectors

TrafficGenerators

TrafficGenerators

Processing

WorkloadPartitioning

Synthetic Packet Header Trace

TESTBED

How doessynthetic trace

compare tooriginal trace?

27

Source-Level Trace ReplayTraffic generation in a simulator

Anonymized Packet Header Trace

Source-level Trace:Set of a-b-t Connection Vectors

SimulatedSources

SimulatedSources

Processing

Synthetic Packet Header Trace

SIMULATOR

How doessynthetic trace

compare tooriginal trace?

Translation

28

Validation of Generated TrafficQuestions

Can we reproduce source-level properties of the originaltraffic?

Can we reproduce interesting measures of the originaltrace?» Throughput per unit time» Number of active connections per unit time» Connection transmission rates» Long range dependence in packet and byte arrivals» …

Can we see interesting differences between UNC trafficand Abilene traffic?

Page 8: Applications & Application-Layer Protocols: Synthetic ...mweigle/clemson/courses/...»(Current ethics dictate that tracing beyond TCP header is inappropriate without users’ permission)

29

Synthetic Traffic GenerationSummary

Simulation is the backbone of networking research

Too little attention is paid to realistic traffic generation» How can we derive fundamental truths from today’s

simulation results?

Model traffic as patterns of data exchange patternswithin TCP connections

» Application-independent, network-independent

Development of new, flexible traffic generators» With tunable degrees of realism