Traffic Prediction on the Internet

Anne Denton

Outline

Paper by Y. Baryshnikov, E. Coffman, D. Rubenstein and B. Yimwadsana
Solutions
Time-series prediction
Our work for the KDD-cup 03

Time Series Prediction on the Internet

By Y. Baryshnikov, E. Coffman, D. Rubenstein and B. Yimwadsana

Adjustment to “hot spots”
  Avoiding degradation, even “denial of service”
Can “hot spots” be predicted?
Can predicted “hot spots” be avoided?

What are “hot spots”?

Exceptionally large numbers of requests
Spontaneous, short lifetime
“Instant” ramp-up in traffic
  Only valid on long time scales
  Claim: time scale for increase larger than time scale to react
Why does increase take time?
  Passing on the word

How good does a predictor have to be?

Cost of missing a “hot spot” higher than aggregate cost of false alarms (similar to hurricane)

Examples

Olympics (Nagano 98)
Soccer World Cup (98)
NASA (95)

What to do about “hot spots”?

<Detour>
“The Columbia Hotspot Rescue Service: A Research Plan”, E. Coffman, P. Jelenkovic, J. Nieh, and D. Rubenstein

Approaches
  Deal ad hoc with high request load
  Build a better network (expensive)
  Content delivery services
    Caching
    Extra bandwidth
Suggested solution: use available and underutilized resources

Hotspot Rescue Service

Server-based approach
  Requires additional resources from server when necessary
  Resources provided by other members of Hotspot Rescue Service
Peer-to-Peer approach
  Requires additional resources from client when necessary
  Caching

Four Phases

Prediction (see rest of presentation)
  Server-based: daemons
  P2P: plug-ins
Replication
  Server-based: replication of objects
  P2P: identified cached copies
  More advanced: redistribution of traffic load
Notification
  Modifications to DNS (Domain Name System)
  P2P system proactively announces hot objects and indicates alternative locations?
Termination

<End of Detour>

Tail of Distribution

Requests per 10-second time slot
X-axis: number of hits per time slot
Y-axis: probability that this number of hits will be exceeded
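The tail plot described above can be produced from a raw request log roughly as follows. This is a minimal sketch and not the paper's code: the function name `tail_distribution`, the made-up sample data, and the assumption that requests are given as arrival timestamps in seconds are all illustrative.

```python
from collections import Counter

def tail_distribution(timestamps, slot_seconds=10):
    """Return {n: P(hits per slot > n)} for fixed-width time slots.

    Assumes `timestamps` are request arrival times in seconds (hypothetical format).
    Empty slots between the first and the last request count as 0 hits.
    """
    if not timestamps:
        return {}
    first_slot = int(min(timestamps) // slot_seconds)
    last_slot = int(max(timestamps) // slot_seconds)
    hits = Counter(int(t // slot_seconds) for t in timestamps)
    counts = [hits.get(s, 0) for s in range(first_slot, last_slot + 1)]
    total = len(counts)
    # For each threshold n, the fraction of slots with more than n hits.
    return {n: sum(1 for c in counts if c > n) / total
            for n in range(max(counts) + 1)}

# Made-up arrival times: 3 hits in slot 0, none in slot 1, 1 hit in slot 2.
sample = [0.5, 2.0, 9.9, 25.0]
tail = tail_distribution(sample)
```

Plotting the keys of `tail` on the x-axis against its values on the y-axis (often on a logarithmic scale) gives the kind of tail curve the slide describes.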

Time Scales

Prediction relies on correlation between values at different times
Auto-correlation function: $\int f(t)\, f(t+\tau)\, dt$
Predictability on time scales of 5-30 min
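For sampled request counts, the auto-correlation can be estimated with a discrete sum. The sketch below is one common variant and an assumption on my part: it removes the mean and normalizes so that lag 0 gives 1, which the slide does not specify. The function name and the sample series are illustrative.

```python
def autocorrelation(series, lag):
    """Sample auto-correlation of `series` at integer `lag` (0 <= lag < len(series)).

    Discrete, mean-removed, and normalized so that the value at lag 0 is 1.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return 0.0  # constant series: correlation is undefined, report 0
    covariance = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return covariance / variance

# Made-up request counts per time slot that repeat every 4 slots.
requests = [10, 20, 10, 0] * 6
same_phase = autocorrelation(requests, 4)      # one full period apart
opposite_phase = autocorrelation(requests, 2)  # half a period apart
```

A lag at which the value stays clearly above zero is a time scale on which past traffic still carries information about future traffic.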

Prediction Algorithm

Standard problem
  Signal processing
  Econometrics
Internet traffic
  Particularly bursty
Simplest model
  Linear extrapolation

Structure of Prediction Algorithms

Traffic observation
  Number of requests in time unit (t-1, t]
  Time unit usually 1 s
Prediction window
  Duration $W_p \ge 0$
Advance notice
  $\tau \ge 0$
Prediction at time t
  Mapping of observations in $[t - W_p, t]$ to a number $p_t \ge 0$ of requests predicted in interval $[t+\tau, t+\tau+1]$, that is, $\tau$ units in the future

Linear Prediction

Linear fit
  Least-squares linear fit: $p_t = f_t(t+\tau)$ with $f_t(s) = a_t s + b_t$
  Minimizing $\sum_{i=t-W_p}^{t} \left(f_t(i) - r_i\right)^2$ ($r_i$: number of requests observed in time unit i)
Performance: O(W+T)
  W: window size
  T: uptime duration
Problems
  Prediction window size must match burstiness parameters governing request flow
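The least-squares fit and extrapolation can be sketched as below. This is an illustrative implementation, not the paper's: the function name, the clamping at zero (chosen so the prediction stays non-negative), and the sample trace are assumptions. It refits the line from scratch on every call, which costs O(W) per prediction rather than using an incremental update.

```python
def linear_prediction(requests, t, window, advance):
    """Fit a least-squares line to requests[t-window..t] and extrapolate to t+advance.

    requests[i] is the number of requests observed in time unit i.
    The result is clamped at 0 because a request count cannot be negative.
    """
    xs = list(range(t - window, t + 1))
    ys = [requests[i] for i in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0    # a_t in the slide's notation
    intercept = mean_y - slope * mean_x  # b_t in the slide's notation
    return max(0.0, slope * (t + advance) + intercept)

# Made-up trace that grows by 2 requests per time unit.
trace = [2 * i + 5 for i in range(20)]
prediction = linear_prediction(trace, t=10, window=5, advance=3)
```

On this perfectly linear trace the extrapolation recovers the true value at time 13. On bursty traffic the result depends strongly on the choice of `window`, which is the problem the slide points out.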

Results

Depends on properties of auto-correlation function

Conclusions of Paper

Build a load-based taxonomy of web server traffic
  Depends on technological, sociological, and psychological factors
Look for quantification of basic patterns reflecting behavior

Do we agree? Why cluster when we can classify!

Our Approach

Normally, time series prediction uses only data in that time series
We use similarity to other instances
  E.g., other web sites
Model-free
Weighted nearest-neighbor approach
Problem: how to integrate time?

Typical Nearest Neighbor Classification / Regression

R(A1, …, An, C)
  Attributes Ai
  C: class label (classification) or continuous variable (regression)
Based on distance function on Ai
  K nearest neighbors
  Neighbors within a range
  Use kernel function to weight closer ones higher
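A kernel-weighted nearest-neighbor regression over a relation R(A1, …, An, C) can be sketched as follows. The Euclidean distance, the Gaussian weighting, the parameter names `k` and `bandwidth`, and the sample rows are assumptions made for illustration, not choices stated in the slides.

```python
import math

def knn_regress(rows, query, k=3, bandwidth=1.0):
    """Kernel-weighted k-nearest-neighbor regression.

    rows:  list of (attributes, target) pairs; attributes are equal-length tuples.
    query: attribute tuple for which a target value is estimated.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(rows, key=lambda row: distance(row[0], query))[:k]
    # Gaussian kernel: closer neighbors receive higher weight.
    weights = [math.exp(-distance(attrs, query) ** 2 / (2 * bandwidth ** 2))
               for attrs, _ in nearest]
    total = sum(weights)
    if total == 0:
        # All weights underflowed to 0: fall back to a plain average.
        return sum(target for _, target in nearest) / len(nearest)
    return sum(w * target for w, (_, target) in zip(weights, nearest)) / total

# Made-up rows with a single attribute each.
rows = [((0.0,), 1.0), ((1.0,), 3.0), ((10.0,), 100.0)]
estimate = knn_regress(rows, (0.5,), k=2)
```

With `k=2` the distant row is ignored, and the two remaining neighbors are equally far from the query, so the estimate is their plain average.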

Weighting of Attributes

Some attributes are more important than others

Apply scaling to space
Optimize weights through
  Hill-climbing
  Genetic algorithm
How does this generalize to a time series?
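Weight optimization by hill climbing can be sketched as a loop that perturbs one attribute weight at a time and keeps the change only when the error drops. Everything named here is hypothetical: `error_fn` stands in for whatever validation error is used, and the toy error surface exists only to make the example runnable.

```python
import random

def hill_climb_weights(error_fn, n_attrs, steps=200, step_size=0.1, seed=0):
    """Simple hill climbing over non-negative attribute weights.

    error_fn(weights) -> float is a hypothetical callback returning a validation error.
    A random change to a single weight is kept only if it lowers the error.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_attrs
    best = error_fn(weights)
    for _ in range(steps):
        candidate = list(weights)
        i = rng.randrange(n_attrs)
        candidate[i] = max(0.0, candidate[i] + rng.choice([-step_size, step_size]))
        err = error_fn(candidate)
        if err < best:
            weights, best = candidate, err
    return weights, best

# Toy error surface with its minimum at weights (2.0, 0.0); a stand-in for a real error.
def toy_error(w):
    return abs(w[0] - 2.0) + abs(w[1] - 0.0)

weights, error = hill_climb_weights(toy_error, n_attrs=2)
```

A weight that drifts to 0 effectively removes its attribute from the distance function, so the same loop also performs a crude form of attribute selection.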

Our Answer

Identify “relevant” sections in the time series
  E.g., times with already high download rates
We’ll call each relevant section a “prediction”

Predictions

Each prediction contains information about
  The nature of the time series
  The time instance in question, i.e., the history of requests
  The actual change in requests
Make a table of predictions
  Leads to a relation, just as in the standard classification / regression setting
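One way to flatten several time series into such a table is sketched below. The column names, the window length of 3 periods, the threshold of 6, and the sample data are assumptions for illustration. Each row pairs attributes of the series and of the window with the change that actually followed.

```python
def build_prediction_table(series_by_id, static_attrs, window=3, min_count=6):
    """Turn several time series into one flat table of "predictions".

    series_by_id: {series_id: [count per period, ...]}
    static_attrs: {series_id: dict of attributes describing the series as a whole}
    A row is emitted for every window whose total count is at least `min_count`
    and that is followed by another full window, so that the change is known.
    """
    table = []
    for sid, counts in series_by_id.items():
        for start in range(0, len(counts) - 2 * window + 1):
            current = counts[start:start + window]
            following = counts[start + window:start + 2 * window]
            if sum(current) < min_count:
                continue  # not a "relevant" section
            table.append({
                "series": sid,
                **static_attrs.get(sid, {}),
                "count_in_window": sum(current),
                "gradient": current[-1] - current[0],
                "total_so_far": sum(counts[:start + window]),
                "change": sum(following) - sum(current),  # regression target
            })
    return table

# Made-up data: two series, only the first has a window with enough counts.
series = {"paper-a": [1, 2, 4, 3, 2, 1], "paper-b": [0, 1, 0, 0, 0, 1]}
static = {"paper-a": {"revisions": 2}}
table = build_prediction_table(series, static)
```

The resulting rows have the shape R(A1, …, An, C) with `change` in the role of C, so a standard nearest-neighbor regression can be applied to them.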

Data Set

Paper citations in the “e-print arXiv”
Background: KDD-cup 03
  Predict the change in citations in successive 3-month periods
  Only consider periods with at least 6 citations
  Evaluation: L1 distance (Manhattan distance) between predicted and real difference
Very close match between citation history and request history
  Predict change in requests
  Only consider periods that already show a large number of requests
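The L1 (Manhattan) evaluation amounts to summing the absolute differences between predicted and real changes. A minimal sketch with made-up numbers; the function name is illustrative:

```python
def l1_distance(predicted, actual):
    """Sum of absolute differences between predicted and real changes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual))

# Made-up predicted and real changes for three periods.
score = l1_distance([1, -2, 0], [3, -2, -1])
```

Lower is better, and a perfect prediction scores 0.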

Attributes of a “Prediction”

Quantitative attributes
  Number of citations in window
  Gradient of citations in window
  Aggregate number of citations up to and through window (assume finite time series)
Attribute values given by time series
  Keyword occurrences
  Author
  Number of revisions of papers
  Maximum time interval between revisions
  Country of origin
  Format

Similarity Function

Common kernel function (Gaussian)
  $K(x_0, x_1) = \exp\left(-\frac{(x_0 - x_1)^2}{2\sigma^2}\right)$
What worked better
  $K(x_0, x_1) = \frac{1}{1 + w\,|x_0 - x_1|}$
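The two similarity functions differ mainly in how quickly they fall off with distance. The sketch below assumes scalar attribute values, a Gaussian width `sigma`, and a scale `w`; the default values are illustrative.

```python
import math

def gaussian_kernel(x0, x1, sigma=1.0):
    """Gaussian similarity: drops off very quickly once |x0 - x1| exceeds sigma."""
    return math.exp(-((x0 - x1) ** 2) / (2 * sigma ** 2))

def inverse_kernel(x0, x1, w=1.0):
    """1 / (1 + w * distance): heavy-tailed, so distant neighbors keep some weight."""
    return 1.0 / (1.0 + w * abs(x0 - x1))

# Compare both similarities at a few distances from 0.
for d in (0, 1, 5, 20):
    print(d, gaussian_kernel(0, d), inverse_kernel(0, d))
```

Both functions equal 1 at distance 0. At larger distances the Gaussian is practically 0 while the inverse form still assigns noticeable weight, which may be why it worked better when close neighbors are scarce.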

Plot of Similarity Function

[Figure: f(x) plotted against x for x from 0 to 20 and f(x) from 0 to 1, comparing the Gaussian kernel with 1/(1+x)]

Accuracy

No linear extrapolation data available
  Could lead to negative citations
Comparison
  Default prediction (no change): 1851
  Very simple model (decrease by 0.3 in 3 months): 1532
  Prediction based on average of time series (synchronized at first non-0): 1593
  Prediction based on quantitative attributes: 1465
  Full prediction (preliminary): 1357
  Weight optimized (very preliminary): reduction 1414 -> 1391
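To put the comparison in relative terms, the reduction of each method against the no-change default can be computed directly. The sketch assumes the reported numbers are total L1 errors, so lower is better; the slide does not label them explicitly.

```python
# Error totals copied from the comparison (assumed to be L1 errors, lower is better).
errors = {
    "no change": 1851,
    "decrease by 0.3 in 3 months": 1532,
    "average of time series": 1593,
    "quantitative attributes": 1465,
    "full prediction (preliminary)": 1357,
}
baseline = errors["no change"]
# Percentage by which each method's error lies below the no-change baseline.
improvement = {name: 100.0 * (baseline - err) / baseline
               for name, err in errors.items()}
for name, pct in improvement.items():
    print(f"{name}: {pct:.1f}% lower than the no-change baseline")
```

Under that assumption the full prediction lies about 27% below the no-change baseline.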

Results

[Figure: chart of four series (Series1 to Series4) over categories 1 to 11; vertical axis from 0 to 3000]

Conclusions

Method works well for citation prediction

Yet to be tested for hot-spot prediction
