Long-Range Dependence in a Changing Internet Traffic Mix


Upload: whitney-oneal

Post on 18-Jan-2018




1 Long-Range Dependence in a Changing Internet Traffic Mix
Statistical and Applied Mathematical Sciences Institute
- Félix Hernández-Campos and Don Smith, Department of Computer Science, UNC-Chapel Hill
- J. S. Marron, Department of Statistics and Operations Research, UNC-Chapel Hill
- Cheolwoo Park, SAMSI
- David Rolls, Department of Mathematics and Statistics, UNC-Wilmington

2 Measurements
- Capture TCP/IP packet headers on Gigabit Ethernet link (inbound from Internet).
- Diagram labels: UNC, Internet, Monitor (tcpdump), 1 Gbps Ethernet, ~35,000 Internet users.

3 Summary data
- Two-hour traces, 2nd week in April of 2002 and 2003
  - 5:00 AM, 10:00 AM, 3:00 PM, 9:30 PM on each of 7 days
  - 28 traces (56 hours) per year
- 2002 Traces
  - ~5 billion packets
  - ~1.6 terabytes of network traffic
  - ~95% TCP packets, ~5% UDP packets
  - ~93% TCP bytes, ~7% UDP bytes
  - 10% max 2-hr. mean link utilization
  - 0.01%-0.16% packets dropped by monitor
- 2003 Traces
  - ~10 billion packets
  - ~2.9 terabytes of network traffic
  - ~75% TCP packets, ~25% UDP packets
  - ~86% TCP bytes, ~14% UDP bytes
  - 18% max 2-hr. mean link utilization
  - 0 packets dropped by monitor

4 Hurst parameter (H) estimates and confidence intervals
- H estimated from wavelet analysis tools (logscale diagrams of D. Veitch).
- H estimates for 2003 packet counts were significantly lower than for 2002 (not true for byte counts).
- Several traces had H > 1 or very wide confidence intervals.
- H estimates were independent of time of day or day of week (both packets and bytes) in both years.
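
To make the logscale-diagram idea on slide 4 concrete, here is a minimal sketch, not the Veitch tools themselves. It swaps in a plain Haar wavelet and a simple weighted line fit, and it assumes the usual long-range-dependence relation that the log2 energy of the detail coefficients grows roughly linearly in the octave j with slope 2H - 1. The function names, the octave range, and the synthetic inputs are all hypothetical; no confidence intervals or bias corrections are computed.

```python
import numpy as np


def haar_logscale(x, max_octave):
    """Return octaves j, log2 of the mean squared Haar detail, and counts."""
    approx = np.asarray(x, dtype=float)
    octaves, log2_energy, counts = [], [], []
    for j in range(1, max_octave + 1):
        approx = approx[: 2 * (len(approx) // 2)]  # drop an odd trailing sample
        if len(approx) < 8:  # too few coefficients to average meaningfully
            break
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # Haar detail coefficients
        approx = (even + odd) / np.sqrt(2.0)   # Haar approximation for next octave
        octaves.append(j)
        log2_energy.append(np.log2(np.mean(detail ** 2)))
        counts.append(len(detail))
    return np.array(octaves), np.array(log2_energy), np.array(counts)


def hurst_from_logscale(x, j1, j2):
    """Fit a line to the logscale diagram over octaves j1..j2; H = (slope + 1) / 2."""
    octaves, log2_energy, counts = haar_logscale(x, j2)
    keep = octaves >= j1
    # Octaves with more coefficients have less noisy energies, so weight them more.
    slope, _ = np.polyfit(octaves[keep], log2_energy[keep], 1,
                          w=np.sqrt(counts[keep]))
    return (slope + 1.0) / 2.0


# Synthetic stand-ins for per-bin packet counts (assumptions, not trace data).
rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 16)   # no memory: flat diagram, H near 0.5
walk = np.cumsum(white)                # non-stationary: steep diagram, H above 1

h_white = hurst_from_logscale(white, 1, 10)
h_walk = hurst_from_logscale(walk, 1, 10)
print(f"white noise H ~ {h_white:.2f}, random walk H ~ {h_walk:.2f}")
```

The random-walk case is only meant to illustrate one way an estimate above 1 can arise: a series with strong trends or other non-stationarity produces a steeper diagram than any stationary long-range-dependent process would.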

5 H not related to link utilization or active TCP connections

6 Extreme examples of H > 1 or wide confidence intervals

7 Dependent SiZer analysis of wide CI example
- Test for statistically significant differences from FGN process with parameters estimated from data, H = 0.8.
- Top: local linear smoothing of data with different window widths.
- Bottom: statistical inference on trends of smoothed curve at each window width.

8 Dependent SiZer analysis of H > 1 example
- Analysis shows both non-linear trends and greater variability than FGN process at many time scales.

9 Logscale diagram of typical 2002 and 2003 traces
- Protocol-dependent analysis suggested by increase in UDP.
- Filtered traces to create new traces: TCP only and UDP only.
- TCP is dominant influence in all cases except 2003 packet counts, where UDP dominates.
- Sharp increase at middle scales shapes H estimate (less slope, so lower H).

10 Same conclusion for all traces. Why?

11 The Blubster effect (2003's hot new peer-to-peer file sharing application)
- Recall that UDP packets increased to 25% of 2003 packets (but only 14% of bytes).
- Analysis of UDP packets found 70% from application (Blubster) in 2003 that was negligible in 2002.
- Second filtering: make Blubster-only and Rest (TCP + other UDP) traces.
- Blubster alone dominated H estimate for packets, not bytes.

12 Why? Blubster's packet traffic is periodic
- SiZer analysis of Blubster trace looking for structure beyond white noise.
- Found high-frequency variability with periods in 1-5 second range (caused by update and search queries among peers).
- These correspond to the time-scales in logscale diagram where UDP dominates the wavelet coefficients.

13 Results summary
- We presented results from a study of traffic on the UNC Internet link from two years, 2002 and 2003.
- A single application generating about 18% of packets and < 10% of bytes in traces can strongly influence the H estimate (in this case, because of periodic behavior).
- A significant number of traces produced H estimates > 1 or wide confidence intervals.
- Dependent SiZer is an effective tool for augmenting wavelet analysis and understanding structure in Internet data.
- H was not related to time-of-day, day-of-week, link utilization, or number of active TCP connections.
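
The claim that a periodic application can pull the H estimate around can be illustrated with a small synthetic experiment. This is a sketch under loud assumptions: the background is plain white noise rather than real long-range-dependent traffic, the "application" is a hypothetical burst train with a period of 64 bins aligned to the wavelet grid, the wavelet is Haar rather than whatever the original analysis used, and H is read off an unweighted line fit via slope = 2H - 1. None of the numbers come from the traces.

```python
import numpy as np


def haar_energies(x, max_octave):
    """Mean squared Haar detail coefficient at octaves 1..max_octave.

    Assumes len(x) is divisible by 2**max_octave.
    """
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(max_octave):
        even, odd = approx[0::2], approx[1::2]
        energies.append(np.mean(((even - odd) / np.sqrt(2.0)) ** 2))
        approx = (even + odd) / np.sqrt(2.0)
    return np.array(energies)


rng = np.random.default_rng(1)
n_bins = 2 ** 16
background = rng.standard_normal(n_bins)      # stand-in for all other traffic

# Hypothetical periodic application: a burst of 8 bins every 64 bins.
period, burst_len, burst_height = 64, 8, 3.0
bursts = np.zeros(n_bins)
bursts[np.arange(n_bins) % period < burst_len] = burst_height

e_background = haar_energies(background, 10)
e_mixed = haar_energies(background + bursts, 10)
ratio = e_mixed / e_background                # index 0 is octave 1

# Fit the logscale diagram over octaves 5..10 and convert slope to H.
octaves = np.arange(5, 11)
slope_background = np.polyfit(octaves, np.log2(e_background[4:]), 1)[0]
slope_mixed = np.polyfit(octaves, np.log2(e_mixed[4:]), 1)[0]
h_background = (slope_background + 1.0) / 2.0
h_mixed = (slope_mixed + 1.0) / 2.0

for j, r in enumerate(ratio, start=1):
    print(f"octave {j:2d}: energy ratio mixed/background = {r:.2f}")
print(f"H background ~ {h_background:.2f}, H with bursts ~ {h_mixed:.2f}")
```

In this toy setup the bursts add energy only at the octaves just below the burst period and nothing at coarser octaves, so the fitted line tilts downward and the H estimate drops, which is the same qualitative mechanism the summary attributes to the periodic application.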