Traffic Prediction on the Internet Anne Denton

Post on 14-Jan-2016


Page 1: Traffic Prediction on the Internet

Traffic Prediction on the Internet

Anne Denton

Page 2: Traffic Prediction on the Internet

Outline

Paper by Y. Baryshnikov, E. Coffman, D. Rubenstein and B. Yimwadsana

Solutions

Time-series prediction

Our work for the KDD Cup 03

Page 3: Traffic Prediction on the Internet

Time Series Prediction on the Internet

By Y. Baryshnikov, E. Coffman, D. Rubenstein and B. Yimwadsana

Adjustment to “hot spots”
  Avoiding degradation, even “denial of service”
Can “hot spots” be predicted?
Can predicted “hot spots” be avoided?

Page 4: Traffic Prediction on the Internet

What are “hot spots”?
  Exceptionally large numbers of requests
  Spontaneous, short lifetime
  “Instant” ramp-up in traffic
    Only valid on long time scales
    Claim: time scale for increase is larger than time scale to react
  Why does the increase take time?
    Passing on the word
  How good does a predictor have to be?
    Cost of missing a “hot spot” is higher than the aggregate cost of false alarms (similar to hurricane warnings)

Page 5: Traffic Prediction on the Internet

Examples

Olympics (Nagano 98)
Soccer World Cup (98)
NASA (95)

Page 6: Traffic Prediction on the Internet

What to do about “hot spots”?

<Detour>
“The Columbia Hotspot Rescue Service: A Research Plan”, E. Coffman, P. Jelenkovic, J. Nieh, and D. Rubenstein

Approaches
  Deal ad hoc with high request load
  Build a better network (expensive)
  Content delivery services
    Caching
    Extra bandwidth
Suggested solution: use available and underutilized resources

Page 7: Traffic Prediction on the Internet

Hotspot Rescue Service

Server-based approach
  Requires additional resources from server when necessary
  Resources provided by other members of Hotspot Rescue Service
Peer-to-peer approach
  Requires additional resources from client when necessary
  Caching

Page 8: Traffic Prediction on the Internet

Four Phases
  Prediction (see rest of presentation)
    Server-based: daemons
    P2P: plug-ins
  Replication
    Server-based: replication of objects
    P2P: identified cached copies
    More advanced: redistribution of traffic load
  Notification
    Modifications to DNS (Domain Name System)
    P2P system proactively announces hot objects and indicates alternative locations?
  Termination

<End of Detour>

Page 9: Traffic Prediction on the Internet

Tail of Distribution

Requests per 10-second time slot
  X-axis: number of hits per time slot
  Y-axis: probability that that number of hits will be exceeded
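
A tail plot of this kind can be computed directly from per-slot hit counts. The following is a minimal sketch, not the paper's code; the function name `tail_distribution` and the sample counts are made up for illustration.

```python
import numpy as np

def tail_distribution(hits_per_slot):
    """Return (x, p): p[i] is the fraction of time slots with more than x[i] hits."""
    hits = np.asarray(hits_per_slot)
    x = np.unique(hits)  # sorted distinct hit counts (x-axis)
    p = np.array([(hits > v).mean() for v in x])  # empirical exceedance probability (y-axis)
    return x, p

# Hypothetical hit counts for five 10-second slots.
x, p = tail_distribution([1, 2, 2, 3, 10])
```

Plotting `p` against `x` on logarithmic axes makes a heavy tail, if present, easier to see.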

Page 10: Traffic Prediction on the Internet

Time Scales
  Prediction relies on correlation between values at different times
  Autocorrelation function: $A(\tau) = \int f(t)\, f(t+\tau)\, dt$
  Predictability on time scales of 5-30 min
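
For sampled traffic, the autocorrelation can be estimated as a sum over lagged products. The sketch below is an illustration under assumptions: it subtracts the mean and normalizes by the lag-0 value (a common convention, not something the slides specify), and the name `autocorrelation` is hypothetical. It expects a non-constant series.

```python
import numpy as np

def autocorrelation(f, max_lag):
    """Normalized sample autocorrelation of f for lags 0..max_lag."""
    f = np.asarray(f, dtype=float)
    f = f - f.mean()       # remove the mean (assumed convention)
    denom = np.dot(f, f)   # lag-0 value, used for normalization
    n = len(f)
    return np.array([np.dot(f[:n - k], f[k:]) / denom for k in range(max_lag + 1)])

# A strictly alternating series is strongly anti-correlated at lag 1.
acf = autocorrelation([1, -1, 1, -1, 1, -1], max_lag=2)
```

Lags at which the estimate stays well above zero indicate time scales on which past traffic is informative about future traffic.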

Page 11: Traffic Prediction on the Internet

Prediction Algorithm

Standard problem
  Signal processing
  Econometrics
Internet traffic
  Particularly bursty
Simplest model
  Linear extrapolation

Page 12: Traffic Prediction on the Internet

Structure of Prediction Algorithms
  Traffic observation
    Number of requests in time unit $(t-1, t]$
    Usually 1 s
  Prediction window
    Duration $W_p \ge 0$
  Advance notice $\tau \ge 0$
  Prediction at time $t$
    Mapping of observations in $[t - W_p, t]$ to a number $p_t \ge 0$ of requests predicted in the interval $[t+\tau, t+\tau+1]$, that is, $\tau$ units in the future

Page 13: Traffic Prediction on the Internet

Linear Prediction
  Linear fit: least-squares linear fit
    $p_t = f_t(t+\tau)$ with $f_t(s) = a_t s + b_t$
    Minimizing $\sum_{i=t-W_p}^{t} \left( f_t(i) - r_i \right)^2$, where $r_i$ is the number of requests observed in time unit $(i-1, i]$
  Performance: O(W+T)
    W: window size
    T: uptime duration
  Problems
    Prediction window size must match burstiness parameters governing request flow
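
The linear predictor can be sketched as a least-squares line fitted over the prediction window and evaluated some time units ahead. This is a minimal illustration, not the paper's implementation: the names `linear_predict`, `window` and `tau` are assumptions, and the line is refitted from scratch at each call rather than updated incrementally as an O(W+T) implementation would do. Negative extrapolations are clipped to zero because a request count cannot be negative.

```python
import numpy as np

def linear_predict(requests, t, window, tau):
    """Fit a line to requests[t-window..t] and extrapolate it to time t + tau."""
    times = np.arange(t - window, t + 1)
    values = np.asarray(requests[t - window:t + 1], dtype=float)
    a, b = np.polyfit(times, values, deg=1)   # least-squares slope and intercept
    return max(0.0, a * (t + tau) + b)        # predicted request count, never negative

# Hypothetical request counts: steadily rising, then steadily falling.
rising = linear_predict([0, 2, 4, 6, 8, 10], t=5, window=3, tau=2)
falling = linear_predict([10, 8, 6, 4, 2, 0], t=5, window=3, tau=5)
```

A short window reacts quickly but amplifies noise, while a long window smooths over bursts; this is the window-size problem noted on the slide.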

Page 14: Traffic Prediction on the Internet

Results

Depends on properties of auto-correlation function

Page 15: Traffic Prediction on the Internet

Conclusions of Paper
  Build a load-based taxonomy of web server traffic
    Depends on technological, sociological, and psychological factors
  Look for quantification of basic patterns reflecting behavior

Do we agree? Why cluster when we can classify!

Page 16: Traffic Prediction on the Internet

Our Approach

Normally time series prediction uses only data in that time series

We use similarity to other instances
  E.g., other web sites

Model-free

Weighted nearest-neighbor approach

Problem: how to integrate time?

Page 17: Traffic Prediction on the Internet

Typical Nearest Neighbor
  Classification / regression
  R(A1, …, An, C)
    Attributes Ai
    C: class label (classification) or continuous variable (regression)
  Based on distance function on Ai
    K nearest neighbors
    Neighbors within a range
    Use kernel function to weight closer ones higher
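
Kernel-weighted nearest-neighbor regression can be sketched as follows. This is an illustration under assumptions: Euclidean distance, a Gaussian weighting kernel, and the names `knn_regress` and `bandwidth` are all choices made here, not taken from the slides.

```python
import numpy as np

def knn_regress(X, y, query, k, bandwidth=1.0):
    """Predict a continuous value as the kernel-weighted mean of the k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]                              # indices of the k closest rows
    weights = np.exp(-dist[nearest] ** 2 / (2 * bandwidth ** 2))  # closer neighbors weigh more
    return float(np.dot(weights, y[nearest]) / weights.sum())

# The query sits midway between the first two training points, so both weigh equally.
estimate = knn_regress([[0.0], [1.0], [10.0]], [0.0, 2.0, 100.0], query=[0.5], k=2)
```

For classification, the weighted mean would be replaced by a weighted vote over class labels.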

Page 18: Traffic Prediction on the Internet

Weighting of Attributes

Some attributes are more important than others

Apply scaling to space

Optimize weights through
  Hill-climbing
  Genetic algorithm

How does this generalize to a time-series?
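
For the ordinary (non-time-series) setting, hill-climbing over attribute weights can be sketched as below. Everything here is an assumption made for illustration: the leave-one-out 1-nearest-neighbor error as the objective, the Manhattan distance, the fixed step size, and the names `loo_error` and `hill_climb`. The slides do not describe the actual optimization procedure.

```python
import numpy as np

def loo_error(weights, X, y):
    """Leave-one-out L1 error of 1-nearest-neighbor regression in the weighted space."""
    Xs = X * weights                           # scale each attribute by its weight
    total = 0.0
    for i in range(len(X)):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to every row
        dist[i] = np.inf                       # exclude the point itself
        total += abs(y[np.argmin(dist)] - y[i])
    return total

def hill_climb(X, y, step=0.5, rounds=20):
    """Greedily adjust one weight at a time, keeping only changes that lower the error."""
    weights = np.ones(X.shape[1])
    best = loo_error(weights, X, y)
    for _ in range(rounds):
        improved = False
        for j in range(len(weights)):
            for delta in (step, -step):
                trial = weights.copy()
                trial[j] = max(0.0, trial[j] + delta)
                err = loo_error(trial, X, y)
                if err < best:
                    weights, best, improved = trial, err, True
        if not improved:
            break                              # local optimum reached
    return weights, best

# Hypothetical data: the target depends only on the first attribute.
X = np.array([[0.0, 5.0], [1.0, 0.0], [2.0, 6.0], [3.0, 1.0], [4.0, 7.0], [5.0, 2.0]])
y = X[:, 0].copy()
baseline = loo_error(np.ones(2), X, y)
weights, best = hill_climb(X, y)
```

Because only improving moves are accepted, the returned error is never worse than the unweighted baseline, but the search can stop in a local optimum.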

Page 19: Traffic Prediction on the Internet

Our Answer

Identify “relevant” sections in the time series
  E.g., times with already high download rates

We’ll call each relevant section a “prediction”

Page 20: Traffic Prediction on the Internet

Predictions

Each prediction contains information about
  The nature of the time series
  The time instance in question, i.e., the history of requests
  The actual change in requests

Make a table of predictions
  Leads to a relation, just as in the standard classification / regression setting

Page 21: Traffic Prediction on the Internet

Data Set Paper citations in “e-print ArXive” Background: KDD-cup 03

Predict the change in citations in successive 3-month periods

Only consider periods with at least 6 citations Evaluation: L1 distance (Manhattan distance)

between predicted and real difference Very close match between citation history

and request history Predict change in requests Only consider periods that already show large

number of requests
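
The L1 evaluation measure is the sum of absolute differences between predicted and actual changes. The sketch below uses made-up numbers and a hypothetical function name `l1_distance`; it is not the competition's scoring code.

```python
def l1_distance(predicted, actual):
    """Sum of absolute differences between predicted and actual changes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))

# Hypothetical actual changes in citation counts for three papers.
actual_changes = [2, -1, 3]
no_change_score = l1_distance([0, 0, 0], actual_changes)             # "no change" baseline
small_decrease_score = l1_distance([-0.3] * 3, actual_changes)       # constant small decrease
```

Lower scores are better, so a predictor is useful only if it beats trivial baselines such as predicting no change.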

Page 22: Traffic Prediction on the Internet

Attributes of a “Prediction”
  Quantitative attributes
    Number of citations in window
    Gradient of citations in window
    Aggregate number of citations up to and through window (assume finite time series)
  Attribute values given by time series
    Keyword occurrences
    Author
    Number of revisions of papers
    Maximum time interval between revisions
    Country of origin
    Format
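
Turning one time series into rows of a prediction table can be sketched as follows. This is a simplified illustration: each window is a single period, the gradient is the difference to the previous period, and the names `prediction_table` and `min_count` are assumptions. Attributes of the series as a whole (keywords, author and so on) would be appended to every row and are omitted here.

```python
def prediction_table(series, min_count=6):
    """Build one row per relevant period of a series of per-period counts."""
    rows = []
    for t in range(1, len(series) - 1):
        if series[t] < min_count:
            continue                                   # keep only periods with enough activity
        rows.append({
            "count": series[t],                        # number of citations in window
            "gradient": series[t] - series[t - 1],     # change leading into the window
            "cumulative": sum(series[: t + 1]),        # aggregate up to and through window
            "change": series[t + 1] - series[t],       # target: the actual change that followed
        })
    return rows

# Hypothetical citation counts for five successive periods.
rows = prediction_table([0, 3, 6, 10, 8])
```

Rows collected from many series form a single relation that a standard nearest-neighbor regressor can use, with `change` as the value to predict.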

Page 23: Traffic Prediction on the Internet

Similarity Function

Common kernel function

$K(x_0, x_1) = \exp\left( -\frac{(x_0 - x_1)^2}{2\sigma^2} \right)$

What worked better

$K(x_0, x_1) = \frac{1}{1 + w\,|x_0 - x_1|}$
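
The two similarity functions differ mainly in how quickly they discount distant neighbors. A minimal sketch for scalar inputs follows; the function names and the default values of `sigma` and `w` are assumptions chosen for illustration.

```python
import math

def gaussian_kernel(x0, x1, sigma=1.0):
    """Gaussian similarity: falls off very quickly with distance."""
    return math.exp(-((x0 - x1) ** 2) / (2 * sigma ** 2))

def inverse_kernel(x0, x1, w=1.0):
    """1 / (1 + w * distance): falls off slowly, so distant neighbors still contribute."""
    return 1.0 / (1.0 + w * abs(x0 - x1))

near_gauss, near_inv = gaussian_kernel(0.0, 1.0), inverse_kernel(0.0, 1.0)
far_gauss, far_inv = gaussian_kernel(0.0, 10.0), inverse_kernel(0.0, 10.0)
```

At a distance of 10 the Gaussian weight is effectively zero, while the inverse form still gives the neighbor a weight of 1/11.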

Page 24: Traffic Prediction on the Internet

Plot of Similarity Function

[Figure: f(x) plotted against x for x from 0 to 20, comparing the Gaussian kernel with 1/(1+x); vertical axis from 0 to 1.]

Page 25: Traffic Prediction on the Internet

Accuracy
  No linear extrapolation data available
    Could lead to negative citations
  Comparison
    Default prediction (no change): 1851
    Very simple model (decrease by 0.3 in 3 months): 1532
    Prediction based on average of time series (synchronized at first non-0): 1593
    Prediction based on quantitative attributes: 1465
    Full prediction (preliminary): 1357
    Weight optimized (very preliminary): reduction 1414 -> 1391

Page 26: Traffic Prediction on the Internet

Results

[Figure: chart of Series1 to Series4 over categories 1 to 11; vertical axis from 0 to 3000.]

Page 27: Traffic Prediction on the Internet

Conclusions

Method works well for citation prediction

Yet to be tested for hot-spot prediction