![Page 1: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/1.jpg)
1
An Integrated Approach to Improving Web Performance
Lili Qiu
Cornell University
B-exam, December 2000
![Page 2: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/2.jpg)
2
Acknowledgement
  Robbert van Renesse, George Varghese, Ken Birman, Zygmunt Haas, Eva Tardos
  Venkata N. Padmanabhan, Geoff Voelker, Yin Zhang, Srinivasan Keshav
![Page 3: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/3.jpg)
3
Outline
  Motivation & Open Issues
  Solutions
    Study the workload of a busy Web server
    Properly provision content distribution networks
    Optimize TCP performance for Web transfers
  Summary & Other Work
![Page 4: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/4.jpg)
4
Motivation
  The Web is the dominant source of traffic on the Internet today
    Accounts for over 70% of wide-area traffic
  Web performance is often unsatisfactory
    WWW – World Wide Wait
    Consequence: losing potential customers!
  [Figure labels: network congestion; overloaded Web server]
![Page 5: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/5.jpg)
5
Challenges in Providing Highly Efficient Web Services
  Workload characterization
    The workload of busy Web sites is not well understood
  Infrastructure provisioning
    Current trend: Content Distribution Networks
    Problem: Where to place replicas?
  Protocol inefficiency
    Mismatch between Web transfers and the TCP protocol
  [Figure labels: Workload Characterization, Infrastructure Provisioning, Protocol Inefficiency]
![Page 6: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/6.jpg)
6
Our Solutions
  Web workload characterization
    Study the workload of a busy Web server
  Provision Web infrastructure
    Develop placement algorithms for content distribution networks (CDNs)
  Improve protocol efficiency
    Optimize TCP startup performance for Web transfers
![Page 7: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/7.jpg)
7
Part I Web Workload Characterization
The Content and Access Dynamics of a Busy Web Site: Findings and Implications. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (Joint work with V. N. Padmanabhan)
![Page 8: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/8.jpg)
8
Motivation
  A solid understanding of Web workload is critical for designing robust and scalable systems
  Each of the Web components provides a unique perspective on the functioning of the Web
  [Figure labels: Clients, proxy, Internet, replica, Servers]
![Page 9: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/9.jpg)
9
Motivation (Cont.)
  Distinguishing features of our work
    Study the MSNBC Web site
      A large news server
      Consistently ranked among the busiest sites on the Web
    Study content & access dynamics
      The dynamics of file modification and creation
      The dynamics of user accesses
![Page 10: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/10.jpg)
10
Overview
  MSNBC server site
    A large news site
    Server cluster with 40 nodes
    25 million accesses a day (HTML content alone)
    Period studied: Aug. – Oct. 99 & Dec. 17, 98 flash crowd
  Server logs
    HTTP access logs
    Content Replication System (CRS) logs
    HTML content logs
  Data analysis
    Content dynamics
    Access dynamics
![Page 11: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/11.jpg)
11
Content Dynamics
  Period studied: 10/1/99 – 10/28/99
  Predictive power of modification history
    Modification history is a rough predictor of the future modification interval
  Extent of change upon file modification
    Most file modifications are minimal, so delta encoding can be very useful
![Page 12: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/12.jpg)
12
Predictive Power of Modification History
  Has significant bearing on cache consistency control algorithms, such as adaptive TTL
  Prediction algorithm studied
    Estimate the future modification interval as the mean of the past x samples
  Performance metrics
    Correlation coefficient between the predicted and actual values
    Error in prediction
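The predictor and the two metrics above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names are hypothetical, and the percentage-error formula (absolute error relative to the actual interval) is an assumption, since the slides do not spell it out.

```python
from statistics import mean


def predict_intervals(intervals, window):
    """Predict each interval as the mean of the previous `window` samples."""
    predicted, actual = [], []
    for i in range(window, len(intervals)):
        predicted.append(mean(intervals[i - window:i]))
        actual.append(intervals[i])
    return predicted, actual


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def percentage_errors(predicted, actual):
    """Assumed error definition: |predicted - actual| / actual, in percent."""
    return [abs(p - a) / a * 100 for p, a in zip(predicted, actual)]


# Hypothetical modification intervals (seconds) for one file.
history = [10, 20, 30, 40, 50]
predicted, actual = predict_intervals(history, window=2)
errors = percentage_errors(predicted, actual)
```

Varying `window` on real modification logs would reproduce the kind of window-size sweep the slides describe.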
![Page 13: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/13.jpg)
13
Correlation Coefficient
  A larger averaging window helps to predict the future modification interval, but only up to a certain point.
  [Figure: correlation coefficient (y-axis, 0 to 0.8) vs. averaging window size in samples (x-axis, 0 to 20)]
![Page 14: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/14.jpg)
14
Error in Prediction
  [Figure: percentage error in predicting file modification interval (x-axis ticks 50 to 850, y-axis 0 to 0.7); averaging window: 16 samples; mean error: 226%; median error: 45%]
  Modification history yields only a rough predictor, so an alternative mechanism (e.g. callback-based invalidation) is needed as backup
![Page 15: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/15.jpg)
15
Extent of Change Upon File Modifications
  Compute the delta using the vdelta algorithm
  Metric: $\dfrac{|\mathrm{vdelta}(v_1, v_2)|}{(|v_1| + |v_2|)/2}$
  Results
    In 77% of cases, the metric is below 1%
    In 96% of cases, the metric is below 10%
  Modification between successive versions is small
  Delta encoding can be very useful
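To make the metric concrete, here is a rough sketch that normalizes a delta size by the mean size of the two versions. It does not implement vdelta: as a stand-in it compresses the new version with zlib, using the old version as a preset dictionary, which only approximates a delta encoder and is limited by zlib's 32 KB window. All names and the synthetic page bodies are hypothetical.

```python
import hashlib
import zlib


def delta_size(old: bytes, new: bytes) -> int:
    """Bytes needed to encode `new` when `old` is available as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=old)
    return len(compressor.compress(new) + compressor.flush())


def change_ratio(old: bytes, new: bytes) -> float:
    """Delta size divided by the mean size of the two versions."""
    return delta_size(old, new) / ((len(old) + len(new)) / 2)


def pseudo_page(seed: str, blocks: int = 64) -> bytes:
    """Deterministic, poorly compressible stand-in for a page body."""
    return b"".join(
        hashlib.sha256(f"{seed}{i}".encode()).digest() for i in range(blocks)
    )


v1 = pseudo_page("story")
v2 = v1[:1000] + b"UPDATED HEADLINE" + v1[1000:]  # small edit to v1
unrelated = pseudo_page("other")                   # complete rewrite

small_change = change_ratio(v1, v2)
full_rewrite = change_ratio(v1, unrelated)
```

A small edit yields a ratio near zero, while unrelated content yields a ratio near one, which is the contrast the metric is meant to capture.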
![Page 16: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/16.jpg)
16
Access Dynamics
  Spatial locality in client accesses
    Domain membership is significant except when there is a “hot” event of global interest
  Temporal stability of file popularity
    The set of popular documents mostly remains stable over a timescale of days
  Distribution of file popularity
    Zipf-like distribution, but with a much larger α than at proxies
![Page 17: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/17.jpg)
17
Temporal Stability of File Popularity
Methodology
  Consider the traces from a pair of days
  Pick the top n popular documents from each day
  Compute the overlap
Results
  One day apart: significant overlap (80%)
  Two months apart: smaller overlap (20-80%)
  Ten months apart: very small overlap (mostly below 20%)
[Figure: extent of overlap (y-axis, 0 to 1) vs. number of popular documents picked (x-axis, log scale, 1 to 100000); curves: 17DEC98 - 18OCT99, 01AUG99 - 18OCT99, 17OCT99 - 18OCT99]
The set of popular documents remains stable for days
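The overlap methodology can be sketched as follows. This is an illustration under assumptions: the overlap is taken to be the size of the intersection of the two top-n sets divided by n, and the request lists are made-up document IDs rather than trace data.

```python
from collections import Counter


def top_n(requests, n):
    """Return the set of the n most requested documents in a trace."""
    return {doc for doc, _ in Counter(requests).most_common(n)}


def overlap(day_a, day_b, n):
    """Fraction of the top-n documents that the two days have in common."""
    return len(top_n(day_a, n) & top_n(day_b, n)) / n


# Hypothetical one-day request logs (one document ID per request).
day1 = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
day2 = ["a"] * 4 + ["c"] * 3 + ["e"] * 2 + ["b"]

overlap_top2 = overlap(day1, day2, 2)
overlap_top3 = overlap(day1, day2, 3)
```

Sweeping `n` over several orders of magnitude for each pair of days would give one curve per pair, as in the figure.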
![Page 18: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/18.jpg)
18
Spatial Locality in Client Accesses
  [Figure, two panels: fraction of requests shared (y-axis) vs. domain ID (x-axis), for a normal day and for Dec. 17, 1998; curves: Trace, Random]
  Domain membership is significant except when there is a “hot” event of global interest
![Page 19: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/19.jpg)
19
Spatial Distribution of Client Accesses
Cluster clients using network-aware clustering [KW00]
  IP addresses with the same address prefix belong to a cluster
Top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively
A small number of client clusters contribute most of the requests.
![Page 20: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/20.jpg)
20
The Applicability of Zipf's Law to Web Requests
  Web requests follow a Zipf-like distribution
    Request frequency ∝ 1/i^α, where i is a document’s ranking
  The value of α is much larger in MSNBC traces
    1.4 – 1.8 in MSNBC traces
    Smaller than or close to 1 in the proxy traces
    Close to 1 in the small departmental server logs [ABC+96]
    Highest when there is a hot event
  [Figure: α (y-axis, 0 to 2) for MSNBC, proxies, and less popular servers]
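One common way to estimate the Zipf exponent from a trace is a least-squares line fit on the log-log rank-frequency plot. The sketch below takes that approach as an assumption (the slides do not say how the exponent was fitted), and the function name and synthetic counts are hypothetical.

```python
import math


def estimate_alpha(request_counts):
    """Estimate the Zipf exponent as the negated slope of log(count) vs. log(rank)."""
    counts = sorted((c for c in request_counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    return -slope_num / slope_den


# Synthetic per-document request counts that follow 1/i^1.5 exactly.
synthetic_counts = [1000.0 / rank ** 1.5 for rank in range(1, 51)]
alpha = estimate_alpha(synthetic_counts)
```

On real traces the fit is noisier, especially in the tail, so the estimate depends on how many ranks are included.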
![Page 21: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/21.jpg)
21
Impact of larger α
  Accesses in MSNBC traces are much more concentrated
  90% of the accesses are accounted for by
    Top 2-4% of files in MSNBC traces
    Top 36% of files in proxy traces (Microsoft proxies and the proxies studied in [BCF+99])
    Top 10% of files in small departmental server logs reported in [AW96]
  Popular news sites like MSNBC see much more concentrated accesses
    Reverse caching and replication can be very effective!
  [Figure: percentage of requests (y-axis) vs. percentage of documents, sorted by popularity (x-axis); curves: 12/17/98 server traces, 08/01/99 server traces, 10/06/99 proxy traces]
![Page 22: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/22.jpg)
22
Summary of Results & Implications
  Fact: Past modification history, when averaged over a sufficiently large window, yields a rough predictor
  Implication: Guide for setting TTL, but an alternative mechanism (e.g. callback-based invalidation) is needed as backup
  Fact: Modification between successive versions is small
  Implication: Delta encoding can be very useful
![Page 23: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/23.jpg)
23
Summary of Results & Implications (Cont.)
  Fact: The set of popular documents remains stable over a timescale of days
  Implication: Prefetch/push previously popular files that have undergone modification
  Fact: File popularity follows a Zipf-like distribution, but with a much larger α than at proxies
  Implication: Potential of reverse caching & replication
![Page 24: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/24.jpg)
24
Part II Provision Content Distribution Networks (CDNs)
On the Placement of Web Server Replicas. To appear in INFOCOM'2001. (Joint work with V. N. Padmanabhan and G. M. Voelker)
![Page 25: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/25.jpg)
25
Introduction to CDNs
  Content providers want to offer better service to their clients at lower cost
  Increasing deployment of content distribution networks (CDNs)
    Akamai, Digital Island, Exodus …
  Idea: a network of servers
  Features:
    Outsourcing infrastructure
    Improve performance by moving content closer to end users
    Flash crowd protection
  [Figure labels: Content Providers, CDN servers, Clients]
![Page 26: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/26.jpg)
26
Placement of CDN servers
  Goal
    Minimize users’ latency or bandwidth usage
  Minimum K-median problem
    Select K centers to minimize the sum of assignment costs
    Cost can be latency, bandwidth, or any other metric we want to optimize
    NP-hard problem
  [Figure labels: Content Providers, CDN servers, Clients]
![Page 27: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/27.jpg)
27
Placement Algorithms
  Tree-based algorithm [LGG+99]
    Assume the underlying topologies are trees, and model placement as a dynamic programming problem
    O(N^3 M^2) for choosing M replicas among N potential places
  Random
    Pick the best among several random assignments
  Hot spot
    Place replicas near the clients that generate the largest load
![Page 28: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/28.jpg)
28
Placement Algorithms (Cont.)
Greedy algorithm
  Greedy(N, M) {
    for i = 1 .. M {
      for each remaining candidate site R {
        cost[R] = cost after placing an additional replica at R
      }
      select the site with the lowest cost
    }
  }
Super-optimal algorithm
  Lagrangian relaxation + subgradient method
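The greedy pseudocode can be made runnable with a concrete cost model. The sketch below assumes the cost is the load-weighted distance from each client to its nearest replica; the `dist` matrix, `load` vector, and function names are hypothetical, and the toy topology is four nodes on a line rather than real data.

```python
def placement_cost(dist, load, replicas):
    """Sum over clients of load times distance to the nearest chosen replica."""
    return sum(
        load[c] * min(dist[c][r] for r in replicas) for c in range(len(load))
    )


def greedy_placement(dist, load, m):
    """Pick m replica sites one at a time, each time minimizing the total cost."""
    candidates = set(range(len(dist[0])))
    chosen = []
    for _ in range(m):
        # Evaluate every remaining site; ties go to the lowest site index.
        best = min(
            sorted(candidates),
            key=lambda site: placement_cost(dist, load, chosen + [site]),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Toy example: clients and candidate sites share four positions on a line.
positions = [0, 1, 2, 10]
dist = [[abs(a - b) for b in positions] for a in positions]
load = [1, 1, 1, 5]  # the far node generates most of the requests

sites = greedy_placement(dist, load, 2)
total_cost = placement_cost(dist, load, sites)
```

In this toy case the first replica goes to the heavily loaded far node and the second to the middle of the remaining cluster. Each round re-evaluates the full cost, so the sketch favors clarity over speed.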
![Page 29: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/29.jpg)
29
Simulation Methodology
  Network topology
    Randomly generated topologies
      Using the GT-ITM Internet topology generator
    Real Internet network topology
      AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers
  Web workload
    Real server traces
      MSNBC, ClarkNet, NASA Kennedy Space Center
  Performance metric
    Relative performance: cost_practical / cost_super-optimal
![Page 30: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/30.jpg)
30
Simulation Results in Random Tree Topologies
![Page 31: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/31.jpg)
31
Simulation Results in Random Graph Topologies
![Page 32: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/32.jpg)
32
Simulation Results in Real Internet Topologies
![Page 33: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/33.jpg)
33
Effects of Imperfect Knowledge about Input Data
Predict load using moving window average
(a) Perfect knowledge about topology
(b) Topology knowledge accurate to within a factor of 2
![Page 34: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/34.jpg)
34
Summary
  First experimental study on placement of CDNs
  Knowledge about client workload and topology is crucial for provisioning CDNs
  The greedy algorithm performs the best
    Within a factor of 1.1 – 1.5 of super-optimal
  The greedy algorithm is insensitive to noise
    Stays within a factor of 2 of super-optimal when the salted error is a factor of 4
  The hot spot algorithm performs nearly as well
    Within a factor of 1.6 – 2 of super-optimal
  How to obtain inputs
    Moving window average for load prediction
    Using BGP router data to obtain topology information
![Page 35: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/35.jpg)
35
Part III Transport Layer Optimization for the Web
Speeding Up Short Data Transfers: Theory, Architectural Support, and Simulation Results. Proceedings of NOSSDAV 2000. (Joint work with Yin Zhang and Srinivasan Keshav)
![Page 36: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/36.jpg)
36
Motivation
  Characteristics of Web data transfers
    Short & bursty [Mah97]
    Use TCP
  Problem: Short data transfers interact poorly with TCP!
![Page 37: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/37.jpg)
37
TCP/Reno Basics
Slow Start
  Exponential growth in congestion window
  Slow: log(n) round trips for n segments
Congestion Avoidance
  Linear probing of bandwidth
Fast Retransmission
  Triggered by 3 duplicate ACKs
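The "log(n) round trips for n segments" cost of slow start can be checked with a small count. This sketch is an idealized model under stated assumptions: the window starts at one segment, doubles every round trip, and there are no losses or delayed ACKs. The function name is hypothetical.

```python
def slow_start_rounds(n_segments, initial_cwnd=1):
    """Round trips needed to send n segments when cwnd doubles every RTT."""
    sent, cwnd, rounds = 0, initial_cwnd, 0
    while sent < n_segments:
        sent += cwnd   # one window's worth of segments per round trip
        cwnd *= 2      # idealized slow start: exponential growth
        rounds += 1
    return rounds


rounds_for_10 = slow_start_rounds(10)    # windows of 1, 2, 4, 8
rounds_for_100 = slow_start_rounds(100)  # windows of 1, 2, 4, ..., 64
```

Even a 10-segment transfer spends four round trips in this model, which is why short Web transfers are dominated by the startup phase.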
![Page 38: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/38.jpg)
38
Related Work
  P-HTTP [PM94]
    Reuses a single TCP connection for multiple Web transfers, but still pays the slow start penalty
  T/TCP [Bra94]
    Cache connection count, RTT
  TCP Control Block Interdependence [Tou97]
    Cache cwnd, but large bursts cause losses
  Rate Based Pacing [VH97]
  4K Initial Window [AFP98]
  Fast Start [PK98, Pad98]
    Need router support to ensure TCP friendliness
![Page 39: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/39.jpg)
39
Our Approach
  Directly enter Congestion Avoidance
  Choose optimal initial congestion window
    A geometry problem: fitting a block to the service rate curve to minimize completion time
![Page 40: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/40.jpg)
40
Optimal Initial cwnd
  Minimize completion time by having the transfer end at an epoch boundary.
![Page 41: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/41.jpg)
41
Shift Optimization
  Minimize initial cwnd while keeping the same integer number of RTTs
  Before optimization: cwnd = 9
  After optimization: cwnd = 5
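The idea of shrinking the initial window without adding round trips can be illustrated with a simple model. The sketch assumes the sender starts in congestion avoidance and grows cwnd by one segment per RTT with no losses; the function names and the 10-segment and 30-segment transfer sizes are hypothetical and are not taken from the slide's figure.

```python
def ca_rounds(n_segments, initial_cwnd):
    """Round trips to send n segments when cwnd grows by one segment per RTT."""
    sent, cwnd, rounds = 0, initial_cwnd, 0
    while sent < n_segments:
        sent += cwnd
        cwnd += 1      # assumed linear growth in congestion avoidance
        rounds += 1
    return rounds


def shift_optimize(n_segments, initial_cwnd):
    """Smallest initial cwnd that needs no more RTTs than `initial_cwnd` does."""
    target = ca_rounds(n_segments, initial_cwnd)
    cwnd = initial_cwnd
    while cwnd > 1 and ca_rounds(n_segments, cwnd - 1) == target:
        cwnd -= 1
    return cwnd


# A 10-segment transfer takes 2 RTTs with cwnd = 9, and still 2 RTTs with cwnd = 5.
shifted = shift_optimize(10, 9)
# A 30-segment transfer needs the full window of 9 to finish in 3 RTTs.
unshifted = shift_optimize(30, 9)
```

A smaller starting window sends a smaller initial burst while finishing in the same number of round trips, which is the point of the optimization.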
![Page 42: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/42.jpg)
42
Effect of Shift Optimization
![Page 43: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/43.jpg)
43
TCP/SPAND
  Estimate network state by sharing performance information
    SPAND: Shared PAssive Network Discovery [SSK97]
  Directly enter Congestion Avoidance, starting with the optimal initial cwnd
  Avoid large bursts by pacing
  [Figure labels: Internet, Web Servers, Performance Server]
![Page 44: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/44.jpg)
44
Implementation Issues
  Scope for sharing and aggregation
    24-bit heuristic
    Network-aware clustering [KW00]
  Collecting performance information
    Performance reports, new TCP option, Windmill’s approach, …
  Information aggregation
    Sliding window average
  Retrieving estimation of network state
    Explicit query, active push, …
  Pacing
    Leaky bucket based pacing
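Two of the items above, the 24-bit heuristic and sliding window aggregation, combine naturally in a small sketch. This is an illustration only: the class and method names are hypothetical, the shared quantity is assumed to be an RTT sample in milliseconds, and the window of 8 samples is an arbitrary choice.

```python
import ipaddress
from collections import defaultdict, deque


def client_network(ip):
    """Map an IPv4 address to its /24 network (the 24-bit heuristic)."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


class SharedEstimates:
    """Keep a sliding window of recent samples per client network."""

    def __init__(self, window=8):
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def report(self, client_ip, rtt_ms):
        """Record one performance sample observed for this client."""
        self._samples[client_network(client_ip)].append(rtt_ms)

    def estimate(self, client_ip):
        """Sliding window average for the client's network, or None if unseen."""
        samples = self._samples.get(client_network(client_ip))
        return sum(samples) / len(samples) if samples else None


estimates = SharedEstimates()
estimates.report("192.0.2.10", 100)
estimates.report("192.0.2.200", 200)

shared = estimates.estimate("192.0.2.77")   # same /24, never seen before
unknown = estimates.estimate("198.51.100.1")
```

A new connection from a previously unseen host can reuse measurements from its neighbors in the same /24, which is what makes sharing useful for short transfers.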
![Page 45: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/45.jpg)
45
Opportunity for Sharing
  MSNBC: 90% of requests arrive within 5 minutes of the most recent request from the same client network (using the 24-bit heuristic)
![Page 46: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/46.jpg)
46
Cost for Sharing
  MSNBC: 15,000-25,000 different client networks in a 5-minute interval during peak hours (using the 24-bit heuristic)
![Page 47: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/47.jpg)
47
Simulation Results
  Methodology
    Download files in rounds
  Performance metric
    Average completion time
  TCP flavors considered
    reno-ssr: Reno with slow start restart
    reno-nssr: Reno w/o slow start restart
    newreno-ssr: NewReno with slow start restart
    newreno-nssr: NewReno w/o slow start restart
![Page 48: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/48.jpg)
48
Simulation Topologies
![Page 49: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/49.jpg)
49
T1 Terrestrial WAN Link with Single Bottleneck
![Page 50: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/50.jpg)
50
T1 Terrestrial WAN Link with Multiple Bottlenecks
![Page 51: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/51.jpg)
51
T1 Terrestrial WAN Link with Multiple Bottlenecks and Heavy Congestion
![Page 52: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/52.jpg)
52
TCP Friendliness (I): Against reno-ssr with 50-ms Timer
![Page 53: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/53.jpg)
53
TCP Friendliness (II): Against reno-ssr with 200-ms Timer
![Page 54: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/54.jpg)
54
Summary
  TCP/SPAND significantly reduces latency for short data transfers
    35-65% compared to reno-ssr / newreno-ssr
    20-50% compared to reno-nssr / newreno-nssr
    Even higher for fatter pipes
  TCP/SPAND is TCP-friendly
  TCP/SPAND is incrementally deployable
    Server-side modification only
    No modification at client-side
![Page 55: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/55.jpg)
55
Contributions
  Workload characterization
    Study the workload of the MSNBC Web site
  Infrastructure provisioning
    Develop placement algorithms for Content Distribution Networks
  Protocol efficiency
    Optimize TCP startup performance for Web transfers
  [Figure labels: Workload Characterization, Infrastructure Provisioning, Protocol Inefficiency]
![Page 56: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/56.jpg)
56
Other Work
  Available at http://www.cs.cornell.edu/lqiu/papers/papers.html
  Fast Firewall Implementations for Software and Hardware-based Routers. Submitted to ACM SIGMETRICS’2001.
  Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet. Proceedings of IEEE INFOCOM'2000, Tel-Aviv, Israel, March 2000.
  On Individual and Aggregate TCP Performance. 7th International Conference on Network Protocols (ICNP'99), Toronto, Canada, October 1999.
![Page 57: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/57.jpg)
57
Contributions
  Study the workload of a busy Web server
  Develop placement algorithms for Content Distribution Networks
  Optimize TCP startup performance for short Web transfers
![Page 58: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/58.jpg)
58
Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms
  Internet telephony is subject to
    Variable loss rate
    Variable delay
  Previous work has addressed the two problems separately
    Use FEC for loss recovery
    Use playout buffer adaptation for delay jitter compensation
![Page 59: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/59.jpg)
59
Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms (Cont.)
  Our work
    Demonstrate the interaction between playout algorithm and FEC
      Playout algorithm should depend on FEC as well as on network loss conditions and network jitter
    Propose several playout algorithms that provide this coupling
    Demonstrate the effectiveness of the algorithms through simulations
![Page 60: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/60.jpg)
60
On Individual and Aggregate TCP Performance
  Motivation
    TCP behavior under many competing TCP connections has not been sufficiently explored
  Our work
    Use extensive simulations to investigate the individual and aggregate TCP performance for many concurrent connections
![Page 61: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/61.jpg)
61
On Individual and Aggregate TCP Performance (Cont.)
  Major findings
    When all connections have the same RTT
      Wc > 3*Conn: global synchronization
      Conn < Wc < 3*Conn: local synchronization
      Wc < Conn: connections get shut off
    Adding random processing time: synchronization and consistent discrimination become less pronounced
    Derive the general characterization of overall throughput, goodput, and loss probability
    Quantify the round-trip bias for connections with different RTTs
![Page 62: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/62.jpg)
62
Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment
  Motivation
    IETF recommends widespread deployment of RED in routers
    Most previous work studies RED in relatively homogeneous environments
  Our work
    Investigate the interaction of RED with five types of heterogeneity
![Page 63: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/63.jpg)
63
Understanding the End-to-End Performance Impact of RED in a Heterogeneous Environment (Cont.)
  Major findings
    Mix of short and long TCP connections
      Short TCP connections get higher goodput with RED than with Drop Tail
    Mix of TCP and UDP
      Bursty UDP tends to get lower loss rate with RED than with Drop Tail
    Mix of ECN and non-ECN capable traffic
      ECN-capable TCP connections get higher goodput than non-ECN-capable TCP connections
    Effect of different RTT
      RED reduces the bias against long-RTT bulk transfers
    Effect of two-way traffic
      When the ACK path is congested, TCP gets higher goodput with RED than with Drop Tail
![Page 64: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/64.jpg)
64
Effects of Imperfect Knowledge about Input Data
![Page 65: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/65.jpg)
65
Effects of Imperfect Knowledge about Input Data (Cont.)
  The effect of imperfect topology information
    Randomly remove from 0 up to 50% of the edges in the AS topology derived from the BGP routing tables
  The greedy algorithm is insensitive to edge removal
    Performs within a factor of 2.6 of optimal when the edge removal is 50%
![Page 66: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/66.jpg)
66
Why is the Web so slow?
  Application layer
    Web servers are overloaded …
  Transport layer
    Web transfers are short and bursty, and interact poorly with TCP
  Network layer
    Routers are not fast enough
    Network congestion
    Route flaps and routing instabilities
    …
  Inefficiency in any layer of the protocol stack can slow down the Web!
![Page 67: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/67.jpg)
67
Challenges in Providing Highly Efficient Web Services
  Workload characterization
    The workload of busy Web sites is not well understood
  Infrastructure provisioning
    Current trend: Building efficient Web services through replication (Content Distribution Networks)
    Problem: Where to place replicas?
  Protocol inefficiency
    Mismatch between Web transfers and the TCP protocol
![Page 68: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/68.jpg)
68
Introduction
  A solid understanding of Web workload is critical for designing robust and scalable systems
  The workload of popular Web servers is not well understood
  Study the content and access dynamics of the MSNBC Web site
    A large news server
    One of the busiest sites on the Web
    25 million accesses a day (HTML content alone)
    Period studied: Aug. – Oct. 99 & Dec. 17, 98 flash crowd
![Page 69: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/69.jpg)
69
Content Dynamics
  Period studied: 10/1/99 – 10/28/99
  CDF of modification intervals
    Distinct knees in the CDF at one hour and one day
  Predictive power of modification history
    Modification history is a rough predictor of the future modification interval
  Extent of change upon file modification
    Most file modifications are minimal, so delta encoding can be very useful
![Page 70: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/70.jpg)
70
CDF of Modification Intervals
  [Figure: CDF (y-axis, 0 to 1) of modification interval in seconds (x-axis, log scale, 1.E+01 to 1.E+07)]
  Distinct knees in the CDF at one hour and one day
![Page 71: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/71.jpg)
71
Impact of Age on Popularity
  For most documents, accesses are concentrated soon after creation
  [Figure: document ID, sorted in decreasing order of popularity (y-axis, 0 to 200) vs. time elapsed since creation in seconds (x-axis, 0 to 500000)]
![Page 72: 1 An Integrated Approach to Improving Web Performance Lili Qiu Cornell University B-exam December, 2000](https://reader036.vdocument.in/reader036/viewer/2022062517/56649f0e5503460f94c22b01/html5/thumbnails/72.jpg)
72
Causes of First-time Misses
  Up to 40% of cache misses are due to first-time misses [VDA+99]

  Date         New files (%)  Old files (%)
  Oct. 8, 99   23.16          76.84
  Oct. 9, 99   13.22          86.78
  Oct. 10, 99  13.25          86.75
  Oct. 11, 99  18.75          81.28

  Accesses to old documents account for most first-time misses, so it is hard to anticipate such accesses and eliminate first-time misses