
Root Cause Analysis of TCP Throughput: Methodology, Techniques, and Applications

Matti Siekkinen

Ph.D. Defense, October 30, 2006

Institut Eurecom, Sophia Antipolis, France

Outline

Introduction and Motivation
• Root cause analysis of TCP throughput: what and why?

Part 1: Methodology
• InTraBase: Integrated Traffic Analysis Based on Object-Relational DBMS

Part 2: Root cause analysis techniques
• Taxonomy of TCP rate limitation causes
• Our approach to infer limitation causes

Part 3: Case study on Performance Analysis of ADSL Clients

Conclusions
• Contributions
• Future work

The Internet over the last 5 years…

• Traffic volumes and the number of users have skyrocketed
• Access link capacities have multiplied
• Dominance has shifted from Web and FTP to peer-to-peer applications
• TCP is still the dominant transport protocol
  • It carries over 90% of traffic

The Internet: questions raised

• ISPs would like to know how their clients are doing
• What performance limitations do Internet applications face?
  • Why does a client with 4 Mbit/s ADSL access obtain a total download rate of only a few KB/s with eDonkey?
  • Why do I see no improvement in throughput after upgrading my link?
• The Internet does not provide these answers directly
  • The network is dumb!
• We need techniques for traffic measurement and analysis

Root Cause Analysis of TCP Throughput

What? The analysis and inference of the reasons that prevent a given TCP connection from achieving a higher throughput. These reasons are called limitation causes.

Why TCP? TCP typically carries over 90% of all traffic.

Background

TCP Rate Analysis Tool (T-RAT) by Zhang et al. (SIGCOMM 2002)
• Pioneering research work
  • Groundbreaking insights: it is not all congestion!
  • Opened up many questions
• We implemented and tested it
  • Results are way off too often
  • Fundamental assumptions do not hold
• T-RAT analyzes unidirectional traffic
  • Passively collected measurements
  • Usable in more cases (asymmetric paths)
  • The source of the problems

Our approach

• We analyze only passive traffic measurements
  • Capture and store all TCP/IP headers, analyze them later off-line
• Observe traffic at a single measurement point
  • Applicable in diverse situations, e.g. at the edge of an ISP’s network
  • Know all about the clients’ downloads and uploads
• Bidirectional packet traces
• Connection-level analysis
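A minimal Python sketch of connection-level grouping for a bidirectional trace, assuming the packets have already been parsed from their TCP/IP headers into the hypothetical `Pkt` record below. Both directions of a connection map to the same direction-independent key, so a client's upload and download volumes can be read from one entry. It is a simplification: it ignores reuse of the same 4-tuple over time and SYN/FIN boundaries.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Endpoint = Tuple[str, int]            # (IP address, TCP port)
ConnKey = Tuple[Endpoint, Endpoint]   # direction-independent connection key


class Pkt(NamedTuple):
    """Hypothetical parsed TCP/IP header record."""
    ts: float      # capture timestamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    payload: int   # TCP payload length in bytes


def conn_key(p: Pkt) -> ConnKey:
    """Order the two endpoints so that both directions yield the same key.

    Plain tuple comparison is arbitrary here; it only has to be consistent.
    """
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (a, b) if a <= b else (b, a)


def group_connections(pkts: List[Pkt]) -> Dict[ConnKey, Dict[Endpoint, int]]:
    """Return, per connection, the payload bytes sent by each endpoint."""
    conns: Dict[ConnKey, Dict[Endpoint, int]] = defaultdict(lambda: defaultdict(int))
    for p in pkts:
        # Pure ACKs add 0 bytes but still register the sending endpoint.
        conns[conn_key(p)][(p.src, p.sport)] += p.payload
    return conns
```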

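Challenges (1/3)

• Single measurement point anywhere along the path
  • We cannot (or do not want to) control where it is
  • This complicates the estimation of parameters (RTT and cwnd)
• A: RTT ~ d1, a piece of cake… B: RTT ~ d3 + d4
  • How do we get d4? (Did ack2 trigger data2?)

[Figure: packet exchange between hosts A and B as seen at the measurement point, showing ack2 and data2]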
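One common way to obtain the "easy" part of the RTT, between the measurement point and the host that acknowledges the data, is to match each data segment with the first later cumulative ACK that covers it. The Python sketch below is a simplified illustration, not one of the thesis's five RTT estimators: its input format is an assumption, it ignores sequence-number wraparound, and it skips retransmitted segments because their ACK is ambiguous (Karn's rule). The hard part above, deciding whether an ACK triggered a data packet, is not attempted.

```python
from collections import Counter
from typing import List, Tuple


def data_to_ack_delays(data: List[Tuple[float, int]],
                       acks: List[Tuple[float, int]]) -> List[float]:
    """Delay from each data segment to the first later ACK covering it.

    data: (timestamp in s, end sequence number) of data segments, one direction
    acks: (timestamp in s, cumulative ACK number) seen in the reverse direction
    Both lists are in capture order. Sequence-number wraparound is ignored.
    """
    seen = Counter(end for _, end in data)
    delays: List[float] = []
    for ts, end in data:
        if seen[end] > 1:
            continue  # retransmitted segment: the matching ACK is ambiguous (Karn's rule)
        for ack_ts, ack in acks:
            if ack_ts >= ts and ack >= end:
                delays.append(ack_ts - ts)
                break
    return delays
```

Delayed ACKs inflate individual samples, so taking the minimum (or a low percentile) of the returned delays is a common choice.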

Challenges (2/3)

• A lot of data to analyze
  • Potentially millions of connections per trace
• Deep analysis of each connection of each trace
  • Compute many metrics
  • Divide connections into pieces
    • Analyze them separately and compute more metrics
  • Need to keep track of everything

Challenges (3/3)

• Find the right metrics to characterize all limitations
  • Not too many
  • Need to gather a lot of experience
• Get it right!
  • There are several methods for computing a particular metric
  • Choose the “best” one for the situation to maximize the correctness of results, e.g. there are 5 ways to estimate RTTs
• Careful validation
  • Benchmark with many reference traces
  • Cross-validate metrics

Part 1: Methodology (InTraBase: Integrated Traffic Analysis Based on Object-Relational DBMS)

Why did we need InTraBase?

First try: ad-hoc scripts and specialized software tools (tcptrace et al.)

Problems:
1. Management
• Data, metadata, and tools
• We got lost among files containing data and ad-hoc scripts
• Many metrics to compute and combine
2. Cumbersome analysis process
• Iterative analysis
• Data loses semantics and structure
3. Scalability
• Cannot analyze large enough data sets

[Figure: analysis steps Filter, Process, Combine, Store, Interpret]

Our InTraBase approach

[Figure: InTraBase architecture. tcpdump on the network link, application logs, and Web100 (Application, TCP, and IP layers) produce raw base data files, which are preprocessed and loaded into the database system holding base data, metadata, functions, and results; users issue queries]

• Store traffic measurements in files as base data
• Upload the base data into the DB and process it within the DB
  • Issue SQL queries
  • Object-relational DBMS: create functions for advanced processing
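A small sketch of the same workflow, with Python's built-in `sqlite3` module standing in for the object-relational DBMS: packet-level base data goes into tables, a user-defined SQL function computes a per-connection metric inside the database, and queries do the rest. The table layout, the sample rows, and the `kbps()` function are hypothetical; the query for the fastest connection mirrors the InTraBase query that picks the connection with the maximum throughput.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical connection and packet tables."""
    db = sqlite3.connect(":memory:")
    # User-defined function evaluated by the DB engine: throughput in kbit/s.
    db.create_function("kbps", 2, lambda nbytes, secs: nbytes * 8 / 1000.0 / secs)
    db.executescript("""
        CREATE TABLE cnxs (cnxid INTEGER PRIMARY KEY, bytes INTEGER,
                           duration REAL, throughput REAL);
        CREATE TABLE packets (cnxid INTEGER, ts REAL, size INTEGER);
        CREATE INDEX packets_by_cnx ON packets (cnxid, ts);
    """)
    db.executemany("INSERT INTO cnxs (cnxid, bytes, duration) VALUES (?, ?, ?)",
                   [(1, 3000, 2.0), (2, 4500, 1.0)])
    db.executemany("INSERT INTO packets VALUES (?, ?, ?)",
                   [(1, 0.0, 1500), (1, 2.0, 1500),
                    (2, 0.0, 1500), (2, 0.25, 1500), (2, 1.0, 1500)])
    # Derived metric stored next to the base data and reusable by later queries.
    db.execute("UPDATE cnxs SET throughput = kbps(bytes, duration)")
    return db


def fastest_connection_iats(db: sqlite3.Connection) -> List[float]:
    """Packet inter-arrival times (s) of the connection with the highest throughput."""
    (cnxid,) = db.execute(
        "SELECT cnxid FROM cnxs, (SELECT max(throughput) AS m FROM cnxs) AS t1 "
        "WHERE cnxs.throughput = t1.m").fetchone()
    # For each packet except the first, subtract the previous packet's timestamp.
    rows = db.execute(
        "SELECT p.ts - (SELECT max(q.ts) FROM packets AS q "
        "               WHERE q.cnxid = p.cnxid AND q.ts < p.ts) "
        "FROM packets AS p "
        "WHERE p.cnxid = ? AND EXISTS (SELECT 1 FROM packets AS q "
        "                              WHERE q.cnxid = p.cnxid AND q.ts < p.ts) "
        "ORDER BY p.ts", (cnxid,)).fetchall()
    return [r[0] for r in rows]
```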

Benefits from a DBMS-based Approach

• Organize and manage data, related metadata, analysis results, and tools
• Data becomes structured and has semantics
  • Processing and updating data is easier
  • Tools “understand” the data: higher-level programming
• Searching is more efficient (indexes)
• Store reusable intermediate results
• It is easier to combine different data sources, e.g. across OSI layers

-- Histogram of packet inter-arrival times of the fastest connection.
-- iat() and plot_ts_hist() are InTraBase functions; the iat() call is built as a
-- query string, so the connection id and direction are concatenated into it.
SELECT plot_ts_hist(
         'SELECT * FROM iat(' || t2.cnxid || ', ' || t2.reverse || ', ''packets'')',
         'histogram.pdf')
FROM (SELECT cnxid, reverse
      FROM cnxs, (SELECT max(throughput) FROM cnxs) AS t1
      WHERE cnxs.throughput = t1.max) AS t2;

[Figure: histogram of the packet inter-arrival times of the fastest connection. The query reads table connections (connection id, bytes, packets, tput, …) and table packets (connection id, timestamp, start #seq, end #seq, flags, …), then applies iat(…) and plot_ts_hist() to produce histogram.pdf]

Part 2: Root cause analysis techniques (taxonomy of TCP rate limitation causes; our approach to infer limitation causes)

Scope

• Study long-lived TCP connections
  • Short connections are another topic: dominated by slow start?
• Assume FIFO scheduling
  • Necessary for link capacity estimation with packet dispersion techniques
  • Reasonable assumption for most traffic
  • May not hold for cable modem and 802.11 access networks
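Packet dispersion in its simplest form: when two MSS-sized packets leave the bottleneck back to back, their spacing equals the transmission time of the second packet, so capacity ≈ size / gap. That only holds if the bottleneck serves packets in FIFO order, which is why the assumption above is needed. The Python sketch below takes the median of the per-pair estimates; the input format and the median filter are assumptions, and practical estimators filter the samples far more carefully.

```python
from statistics import median
from typing import List, Tuple


def dispersion_capacity_bps(pkts: List[Tuple[float, int]], mss: int) -> float:
    """Estimate bottleneck capacity (bit/s) from consecutive MSS-sized packets.

    pkts: (arrival timestamp in s, size in bytes) of one direction's data
    packets, in capture order. Each consecutive MSS/MSS pair gives one sample
    size * 8 / gap; the median discards pairs that were not back to back.
    """
    estimates = []
    for (t1, s1), (t2, s2) in zip(pkts, pkts[1:]):
        gap = t2 - t1
        if s1 == mss and s2 == mss and gap > 0:
            estimates.append(s2 * 8 / gap)
    if not estimates:
        raise ValueError("no usable packet pairs")
    return median(estimates)
```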

Limitation Causes for TCP Throughput

• Application
• Transport layer
  • TCP receiver: receiver window limitation
  • TCP protocol: slow start, …
• Network layer
  • Bottleneck link

Example: an application that sends large bursts separated by idle periods, e.g. BitTorrent or persistent HTTP/1.1

[Figure: connection timeline alternating between transfer periods and periods with only keep-alive messages]

Limitation Causes: Application

The application does not even attempt to use all network resources.

TCP connections are partitioned into two kinds of periods:
• Bulk Transfer Period (BTP): the application constantly provides data to transfer, so TCP never runs out of data in buffer B1
• Application Limited Period (ALP): the opposite of a BTP; TCP has to wait for data because B1 is empty

[Figure: sender and receiver, each an application on top of TCP, connected by the network; B1 is the send buffer between the sending application and TCP]

Limitation Causes: TCP Receiver

The receiver’s advertised window limits the rate:
• Maximum amount of outstanding bytes = min(cwnd, rwnd)
• The sender is idle, waiting for ACKs to arrive

Flow control: the sending application sends faster than the receiving application reads, so buffer B2 is full

Configuration problem (unintentional):
• The default receiver advertised window is set too low
• Window scaling is not enabled

[Figure: sender and receiver, each an application on top of TCP, connected by the network; B2 is the receive buffer between TCP and the receiving application]
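The bound behind receiver window limitation is simple arithmetic: with at most min(cwnd, rwnd) bytes outstanding per round trip, throughput cannot exceed min(cwnd, rwnd) / RTT. A Python sketch; the numbers in the usage note are illustrative, not measurements from this study.

```python
# Largest receive window expressible without the TCP window scale option
# (the window field in the TCP header is 16 bits).
MAX_UNSCALED_WINDOW = 65535


def window_limited_rate_bps(cwnd_bytes: int, rwnd_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput (bit/s): min(cwnd, rwnd) bytes per RTT."""
    return min(cwnd_bytes, rwnd_bytes) * 8 / rtt_s
```

Without window scaling, at a 100 ms RTT a connection cannot exceed about 5.2 Mbit/s whatever the link capacity; a default window of 16384 bytes at the same RTT caps it near 1.3 Mbit/s, well below a 4 Mbit/s ADSL line.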

Limitation Causes: Network

Limitation is due to congestion at a bottleneck link:
• Shared bottleneck: the connection obtains only a fraction of its capacity
• Non-shared bottleneck: the connection obtains all of its capacity

Our Approach to Root Cause Analysis

Divide & conquer:
1. Partition connections into BTPs and ALPs
   • Filter out the application’s impact
2. Analyze the bulk transfer periods for limitation by
   • TCP receiver
   • TCP protocol
   • Network

Methods are based on metrics computed from packet headers.

Why filter out the application effect?

Many TCP/IP-level traffic studies (of RTTs, burstiness, …) do not account for the application effect. They try to study network properties but end up measuring the application effect instead!

Distinguishing BTPs from ALPs: Isolate & Merge algorithm

Phase 1: Isolate
• Fact: TCP always tries to send MSS-sized packets
• Consequence: small packets (size < MSS) and idle time indicate application limitation, because the buffer between the application and TCP is empty

[Figure: timeline of MSS packets and packets smaller than MSS; an idle time > RTT and a period with a large fraction of small packets are both marked as ALPs]
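A simplified Python sketch of the Isolate phase for one direction of a connection. It applies the two rules above literally: an idle gap longer than one RTT becomes an ALP, packets smaller than the MSS count as ALP traffic, and MSS-sized packets count as BTP traffic. The `Packet` and `Period` records are hypothetical, and the thesis's actual criteria (for example how it decides that a "large fraction" of packets is small) may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    ts: float   # capture timestamp in seconds
    size: int   # TCP payload size in bytes


@dataclass
class Period:
    kind: str     # "BTP" or "ALP"
    start: float  # seconds
    end: float    # seconds
    nbytes: int   # payload bytes carried in the period


def isolate(packets: List[Packet], mss: int, rtt: float) -> List[Period]:
    """Split one direction of a connection into alternating BTP/ALP periods."""
    periods: List[Period] = []

    def extend(kind: str, start: float, end: float, nbytes: int) -> None:
        # Grow the last period if it has the same kind, otherwise open a new one.
        if periods and periods[-1].kind == kind:
            periods[-1].end = end
            periods[-1].nbytes += nbytes
        else:
            periods.append(Period(kind, start, end, nbytes))

    prev_ts: Optional[float] = None
    for p in packets:
        if prev_ts is not None and p.ts - prev_ts > rtt:
            extend("ALP", prev_ts, p.ts, 0)  # idle longer than one RTT: nothing to send
        extend("BTP" if p.size >= mss else "ALP", p.ts, p.ts, p.size)
        prev_ts = p.ts
    return periods
```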

Distinguishing BTPs from ALPs: Isolate & Merge algorithm

Phase 2: Merge

Why?
• After Isolate, BTPs may be separated by very short ALPs
• Analyze the impact of the application: how much do ALPs decrease the overall throughput?

How?
• Merge subsequent transfer periods separated by an ALP to create a new BTP
• Mergers are controlled with the drop parameter
• Iterate until all possible mergers are performed
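A sketch of the Merge phase in Python, operating on BTP/ALP periods such as those an Isolate step produces. The slide does not define the drop parameter; here it is assumed to bound the relative throughput loss a merge may cause: a BTP, ALP, BTP triple is merged when the throughput over the whole span is at least (1 - drop) times the throughput of the two BTPs alone. This criterion is a hypothetical stand-in, not the thesis's definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Period:
    kind: str     # "BTP" or "ALP"
    start: float  # seconds
    end: float    # seconds
    nbytes: int   # payload bytes carried in the period


def merge(periods: List[Period], drop: float) -> List[Period]:
    """Repeatedly merge BTP, ALP, BTP triples into one BTP.

    Assumed criterion: span throughput >= (1 - drop) * throughput of the two BTPs.
    The input list and its Period objects are not modified.
    """
    out = list(periods)
    merged = True
    while merged:
        merged = False
        for i in range(len(out) - 2):
            a, gap, b = out[i], out[i + 1], out[i + 2]
            if (a.kind, gap.kind, b.kind) != ("BTP", "ALP", "BTP"):
                continue
            span = b.end - a.start
            busy = (a.end - a.start) + (b.end - b.start)
            if span <= 0 or busy <= 0:
                continue
            total = a.nbytes + gap.nbytes + b.nbytes
            if total / span >= (1 - drop) * (a.nbytes + b.nbytes) / busy:
                out[i:i + 3] = [Period("BTP", a.start, b.end, total)]
                merged = True
                break  # indices changed: rescan from the start
    return out
```

With drop = 0 nothing merges unless the ALP costs no throughput at all; larger values tolerate longer ALPs inside a BTP.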

BTP Analysis

1. Compute limitation scores for each BTP
   • 4 quantitative scores in [0,1]
   • We use retransmission rates, inter-arrival time patterns, path capacity, RTT, etc.
2. Classify BTPs into limitation causes
   • Map (a combination of) limitation scores to a cause
   • Threshold-based scheme
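As an example of a score in [0,1], a retransmission score could be the fraction of payload bytes whose sequence range had already been seen at the measurement point. This definition is an assumption for illustration, not the thesis's formula. The Python sketch ignores sequence-number wraparound and counts packets reordered upstream of the probe as retransmissions.

```python
from typing import List, Tuple


def retransmission_score(segments: List[Tuple[int, int]]) -> float:
    """Fraction of payload bytes that repeat already-seen sequence space.

    segments: (start seq, end seq) of data packets in capture order.
    """
    highest = None          # highest end sequence number seen so far
    total = retx = 0
    for start, end in segments:
        total += end - start
        if highest is not None and start < highest:
            retx += min(end, highest) - start  # overlap with data already sent
        highest = end if highest is None else max(highest, end)
    return retx / total if total else 0.0
```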

Classification scheme

4 thresholds need to be set. Scores used:
• b-score
• Dispersion score
• Retransmission score
• Receiver window limitation score
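A generic Python sketch of a threshold-based classifier: rules are checked in priority order, and the first score that exceeds its threshold decides the cause. The rule order, the threshold values, and which score maps to which cause in `EXAMPLE_RULES` are hypothetical placeholders, not the calibrated scheme; the b-score would enter as one more rule.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, float, str]  # (score name, threshold, limitation cause)

# HYPOTHETICAL order, thresholds, and score-to-cause mapping, for illustration only.
EXAMPLE_RULES: List[Rule] = [
    ("rwnd", 0.5, "TCP receiver"),
    ("dispersion", 0.8, "network: unshared bottleneck"),
    ("retransmission", 0.01, "network: shared bottleneck"),
]


def classify_btp(scores: Dict[str, float], rules: List[Rule],
                 default: str = "undetermined") -> str:
    """Return the cause of the first rule whose score exceeds its threshold."""
    for name, threshold, cause in rules:
        if scores[name] > threshold:
            return cause
    return default
```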

Classification: calibrating the thresholds

Difficult task: diversity vs. control
• Reference data needs to be representative and diverse enough
• No simulations
• We need to control the experiments in some way to get what we want

Reference data from partially controlled experiments, trying to generate transfers limited by a given cause:
• FTP downloads from Fedora Core mirror sites: 232 sites covering all continents
• Artificial bottleneck links with rshaper (network limitation)
• Nistnet to add delay (receiver limitation: Wr/RTT < bw)
• Control the number of simultaneous downloads (unshared vs. shared bottleneck)

[Figure: experiment setup with Eurecom, rshaper, and Nistnet connected over the Internet to mirror sites in e.g. Australia, Japan, Finland, and the USA]
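The receiver-limitation condition Wr/RTT < bw also tells how much delay an experiment has to add: the RTT must exceed 8·Wr / bw (Wr in bytes, bw in bit/s). A Python sketch; the function names are hypothetical and the usage numbers are illustrative.

```python
def min_rtt_for_receiver_limit_s(wr_bytes: int, bottleneck_bps: float) -> float:
    """Boundary RTT (s): the transfer is receiver-limited when the RTT exceeds it."""
    return wr_bytes * 8 / bottleneck_bps


def extra_delay_s(wr_bytes: int, bottleneck_bps: float, base_rtt_s: float) -> float:
    """Additional round-trip delay needed beyond the path's base RTT (0 if none)."""
    return max(0.0, min_rtt_for_receiver_limit_s(wr_bytes, bottleneck_bps) - base_rtt_s)
```

For example, with Wr = 65535 bytes and a 1 Mbit/s bottleneck the RTT must exceed about 0.524 s, so a path with a 100 ms base RTT would need roughly 0.424 s of extra round-trip delay.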

Classification: calibrating the thresholds, example

Bottleneck set at 1 Mbit/s, 1 download at a time

[Figure: score distribution of these reference transfers, annotated “set th1 here”]
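The slide places th1 by inspecting the score distribution of reference transfers. A hypothetical way to automate that choice, sketched in Python: given the scores of reference transfers known to be limited by the cause (positives) and of transfers that are not (negatives), pick the observed value that labels the most transfers correctly under the rule "score > threshold means limited by the cause". This is an illustration, not the calibration procedure of the thesis.

```python
from typing import List


def calibrate_threshold(pos: List[float], neg: List[float]) -> float:
    """Return the observed score t that maximizes correct labels under 'score > t'."""
    candidates = sorted(set(pos) | set(neg))
    if not candidates:
        raise ValueError("no reference scores")
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(s > t for s in pos) + sum(s <= t for s in neg)
        if correct > best_correct:  # strict: keep the lowest threshold on ties
            best_t, best_correct = t, correct
    return best_t
```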

Part 3: Case study on Performance Analysis of ADSL Clients

Motivation

• Stress test for our techniques: do we learn useful things?
• Knowing throughput limitations (= performance) is useful
  • ISPs want satisfied clients
  • They need to know what is going on before things can be improved
• We installed InTraBase at France Telecom to study the traffic of their ADSL access network
  • Root cause analysis techniques are implemented within InTraBase

Measurement Setup

• 24 hours of traffic on March 10, 2006
• 290 GB of TCP traffic: 64% downstream, 36% upstream
• Packets observed from ~3000 clients; only 1335 analyzed
  • The excluded clients did not generate enough traffic for RCA

[Figure: access network, collect network, and Internet, with the location of the two pcap probes marked]

Warming up…

• Connections
  • The size distribution is highly skewed
  • We use only 1% of them for RCA, yet they represent > 85% of all traffic
• Clients
  • Heavy hitters: 15% of clients generate 85-90% of traffic (up and down)
  • Low access link utilization. Why?
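To quantify such skew, a short Python sketch that computes the share of bytes carried by the largest fraction of connections. The deck does not state how the 1% of connections used for RCA were selected; taking the largest ones, as here, is an assumption, and any sizes fed to it are hypothetical inputs.

```python
import math
from typing import List


def top_share(sizes: List[int], fraction: float) -> float:
    """Share of all bytes carried by the largest `fraction` of connections."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    ordered = sorted(sizes, reverse=True)
    k = max(1, math.ceil(fraction * len(ordered)))  # at least one connection
    return sum(ordered[:k]) / total
```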

Results of Limitation Analysis

Striking result: the application limits the performance of over 80% of clients. What is going on?

Application analysis: application-limited traffic

• Quite stable and symmetric volumes
• Over 80% of all traffic
• eDonkey and “other” dominate

[Figure: application-limited traffic volumes broken down into eDonkey, P2P, and other]

Application analysis: saturated access link

• No recognized P2P
• Asymmetric: port 80/8080 traffic downstream
• Real Web traffic?

Connecting the evidence…

• Most clients’ performance is limited by applications
• Very low link utilization for application-limited traffic
• Most application-limited traffic seems to be P2P
• Peers often have asymmetric uplink and downlink capacities
• P2P applications/users enforce upload rate limits

Most clients’ download performance seems to suffer from P2P clients drastically limiting their upload rates.

[Figure: a downloading client with low link utilization receiving over the Internet from uploading clients with low capacity and a rate limiter]
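A back-of-the-envelope Python sketch of this argument: if each uploading peer sends at most its rate limit, the downloading client's rate is bounded by the sum of those limits, however fast its own access link is. It ignores TCP dynamics, and the numbers in the usage note are hypothetical.

```python
from typing import List


def download_utilization(access_bps: float, peer_upload_caps_bps: List[float]) -> float:
    """Fraction of the downloader's access capacity used when every uploading
    peer sends at its (rate-limited) upload cap."""
    aggregate = min(access_bps, sum(peer_upload_caps_bps))
    return aggregate / access_bps
```

For example, a 4 Mbit/s ADSL client fed by five peers each capped at 5 KB/s (40 kbit/s) would use about 5% of its downlink.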

Conclusions (contributions and future work)

Conclusions: claims and contributions

1. (Part 1) DBMSs provide a powerful infrastructure for the analysis of passive traffic measurements, and their performance is good.

2. (Part 2) We can infer the root causes of TCP throughput limitations using bidirectional packet traces captured at a single measurement point located anywhere on the TCP/IP path.

3. (Part 2) Today’s Internet applications interact with TCP in diverse ways, which introduces bias and error into TCP/IP path analysis; their effects must be filtered out first.

4. (Part 3) TCP root cause analysis techniques combined with DBMS-based analysis enable performance evaluation of applications, evaluation of network utilization, and identification of TCP configuration problems.

The case is not yet closed…

• Short connections
  • Challenge previous “old” results with RCA
  • What about persistent connections?
• Wireless traffic
  • Non-FIFO scheduling
  • Link-layer issues
• Extended case study on ADSL clients
  • We saw one day; what about a week?
  • Trends, consistency

Thank you!

Questions?