time series forecasting modeling cmg12
“And” or “Or” ?
CMG 12, December 4, 2012
Alex Gilgur, Josep Ferrandiz, Matthew Beason
OVERVIEW
• Definitions: Loose but True
• Business Case
• So what’s the problem?
• How much traffic do you have to support?
• Regression
• Can you support the traffic at time T?
• Forecasting
• What If…
• Solution
• Real-World Use Case
• A Digression about Regression
• Conclusions
• Acknowledgments
• Q&A
[Figures: plots labeled HIGH, LOW, and R]
1000s hosts * 100s apps * 10s metrics
12 months * 4.5 weeks * 7 days of the week * 24 hours
Business Demands
[Figure: Q vs. BM. Traffic vs. Business Metric; series Q, Mdl, Lo90%, Hi90%]
DEFINITIONS: LOOSE BUT TRUE
Statistics = The art of torturing data until they talk to you
Regression: a black box, y = f(x), mapping x to y
Time Series: a black box, y = f(t), mapping t to y
[Figure: Max Daily Concurrency, 9/24/2011 to 12/3/2011, annotated with Trend, Seasonality, Level, Events]
TSA and Regression allow us to reconstruct the y given the x and/or the t, and the parameters
DEFINITIONS: CONTINUED
Forecasting = The art of meaningful reflection on the past
Forecasting: Predicting the future based on the past
[Figure: <pool ABCD>: peak-hour busy threads of <app1234>, 9/24/2011 to 2/21/2012]
R, SAS, ForecastPro...
FOR (Level, Trend, Seasonality, Events):
• Compute a Weighted Moving Average (WMA)
• Extend it 1 point
• Add that point to the WMA
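The loop above (compute a WMA, extend it one point, feed that point back in) can be sketched in a few lines of Python. This is a minimal illustration for the level component only; the function names, weights, and data are made up for the example, and trend, seasonality, and events would each need their own treatment.

```python
def wma(values, weights):
    """Weighted moving average of the last len(weights) values.

    weights[-1] applies to the most recent value.
    Assumes len(values) >= len(weights).
    """
    window = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)


def extend_forecast(history, weights, horizon):
    """Forecast `horizon` points by repeatedly extending the series with its WMA."""
    series = list(history)                     # copy; the caller's data is untouched
    for _ in range(horizon):
        series.append(wma(series, weights))    # extend 1 point, then reuse it
    return series[len(history):]


history = [100.0, 104.0, 103.0, 108.0, 110.0]  # made-up observations
weights = [1.0, 2.0, 3.0]                      # newest point weighted most (assumption)
forecast = extend_forecast(history, weights, horizon=3)
print(forecast)
```

Because every new point is an average of earlier points, a level-only forecast like this flattens out quickly, which is one reason the trend and seasonality components are modeled separately.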
BUSINESS CASE
• You have a web site
• You know your business metric behavior
• can forecast it
• can simulate it
• You need to size the servers while minimizing the cost
• CPU
• Memory
• Worker threads
• Storage
• Network
So what’s the problem?
“It’s complicated”
A black box: y = f(x)
The same black box: y = f(t)
y = f(x, t) + ε(t)
[Figure: BM, Q, X, and R as time series; panels q, x, r, BM]
Q(BM, t) = X(BM, t) * R(BM, t)
X = throughput (TPS)
R = response time
Q = concurrency (traffic)
BM = business metric
How much traffic do you need to support?
HOW MUCH TRAFFIC DO YOU HAVE TO SUPPORT?
A better question is…
Q(BM, t) = X(BM, t) * R(BM, t)
t = time
BM = f(t)
X = f(BM)
R = f(X, BM)
Q = R * X = f(R, X)
Q(BM, t) = X(BM, t) * R[X(BM, t), BM, t]
Tools: 1: Enter Regression
The complexity of the relationships is enormous
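To make the chain Q = X * R concrete, here is a small Python sketch. All three functional forms below are hypothetical stand-ins chosen for illustration (linear BM growth, throughput proportional to BM, and a single-queue approximation for response time); none of them comes from the slides.

```python
def business_metric(t):
    """Hypothetical BM(t): linear growth per day."""
    return 1000.0 + 5.0 * t


def throughput(bm):
    """Hypothetical X = f(BM): transactions per second proportional to BM."""
    return 0.02 * bm


def response_time(x, service_time=0.01):
    """Hypothetical R = f(X): single-queue approximation R = S / (1 - X * S)."""
    utilization = x * service_time
    if utilization >= 1.0:
        raise ValueError("saturated: utilization >= 1")
    return service_time / (1.0 - utilization)


def concurrency(t):
    """Q(BM, t) = X(BM, t) * R[X(BM, t)], composed through BM(t)."""
    x = throughput(business_metric(t))
    return x * response_time(x)


for day in (0, 180, 365):
    print(day, round(concurrency(day), 3))
```

Even with these simple assumed forms, Q grows faster than BM because R itself depends on X, which is the kind of compounding that makes the relationships hard to model by inspection.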
TOOLS: 2: A WORD FOR FORECASTING
• If we cannot regress it, we forecast it.
• Not an Excel-style regression to time
• Not a point forecast:
• need the prediction interval
Holt-Winters and ARIMA are standard tools; new methods are being developed.
[Figure panels: Holt-Winters, ARIMA]
Can you support the traffic that you will have at time T?
A simple example
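The point about needing a prediction interval rather than a point forecast can be illustrated with simple exponential smoothing, the most basic member of the family that Holt-Winters extends. This sketch is not Holt-Winters or ARIMA; it assumes roughly normal one-step errors, and the smoothing constant and data are made up.

```python
import math


def ses_forecast(series, alpha, horizon, z=1.645):
    """Simple exponential smoothing with an approximate prediction interval.

    z = 1.645 gives a roughly 90% two-sided interval under normal errors.
    Returns a list of (low, point, high) tuples, one per step ahead.
    """
    level = series[0]
    errors = []
    for y in series[1:]:
        errors.append(y - level)               # one-step-ahead forecast error
        level = level + alpha * (y - level)    # update the smoothed level
    sigma2 = sum(e * e for e in errors) / len(errors)
    result = []
    for h in range(1, horizon + 1):
        # h-step forecast variance for simple exponential smoothing
        var_h = sigma2 * (1.0 + (h - 1) * alpha ** 2)
        half = z * math.sqrt(var_h)
        result.append((level - half, level, level + half))
    return result


busy_threads = [50.0, 52.0, 51.0, 53.0, 52.0, 54.0, 53.0, 55.0]  # made-up data
for low, point, high in ses_forecast(busy_threads, alpha=0.3, horizon=4):
    print(round(low, 2), round(point, 2), round(high, 2))
```

The point forecast stays flat while the interval widens with the horizon; for capacity sizing, the upper bound is usually the number of interest.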
MORE SERIOUS CASES:
http://robjhyndman.com/papers/complex-seasonality/
http://forecastingprinciples.com/
There are cases where regression would not have worked
Exponentially Weighted Moving Average (HW)
Auto Regressive Integrated Moving Average (ARIMA)
FOR (Level, Trend, Seasonality, Events):
• Extend it 1 point
• Add that point to the Time Series
IF BM = f(t) …
We need to outsmart the model
Q[BM(t), t] = X[BM(t), t] * R{X[BM(t), t], BM, t}
1. Forecast the BM; get the value at time T
2. Build a regression of performance metrics to BM
i. How good is the regression?
ii. How do we measure the goodness of the regression?
Can you support the traffic that you will have at time T?
Q = f(BM) + ε
The ε is the residuals; if the fit is good, ε is small => R² is high
What if the R² is OK, but…
• we used a linear model on quadratic data?
• we missed a pattern in the data?
What if the ε is time-dependent? Q(t) = f[BM(t)] + ε(t)
AN ILLUSTRATION (SOTTO VOCE)
Tried to fit a quadratic model: R² = 0.995
Obviously missed a trend: the data are cubic
R² is not good enough
Here the missed trend may not matter, but it’s only an illustration
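The illustration can be reproduced with synthetic data. The sketch below fits a quadratic to noise-free cubic data, reports R², and then checks the lag-1 autocorrelation of the residuals as one possible signal of a missed pattern. The data, the helper names, and the choice of lag-1 autocorrelation as the diagnostic are assumptions for this example.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients a0..a_degree via the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def lag1_autocorr(res):
    """Lag-1 autocorrelation; values near 0 suggest patternless residuals."""
    mean = sum(res) / len(res)
    num = sum((p - mean) * (q - mean) for p, q in zip(res, res[1:]))
    den = sum((r - mean) ** 2 for r in res)
    return num / den


xs = [i / 10 for i in range(41)]      # 0.0 .. 4.0
ys = [x ** 3 for x in xs]             # synthetic cubic data, no noise
coef = polyfit(xs, ys, 2)             # deliberately fit the wrong (quadratic) model
fitted = [sum(c * x ** p for p, c in enumerate(coef)) for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
r2 = r_squared(ys, fitted)
print(round(r2, 4), round(lag1_autocorr(residuals), 3))
```

A high R² together with strongly autocorrelated residuals is the situation the slide describes: the fit looks good by the usual measure, yet the residuals still carry structure.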
SOLUTION: FORECAST THE RESIDUALS!
Forecast IV; build regression; forecast residuals; add it all together
[Flowchart]
Start with DATA: DV(t), IV(t)
DV == f(IV)?
• NO:
• Generate TSA FORECASTS for IV and DV
• Project IV and DV to t = T independently
• Done
• YES:
• Generate DV(IV) REGRESSION
• Generate TSA FORECAST for Residuals and for IV
• Project to t = T
• Combine DV[IV(t = T)] + Residuals(t = T)
DV(t) = f[IV(t), t] |t = T* + ε(t) |t = T*
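The YES branch of the flowchart (regress DV on IV, forecast IV and the residuals separately, then add the pieces) might look like the following Python sketch. It assumes a linear regression and uses Holt's linear-trend smoothing as the TSA forecaster; both choices, the smoothing constants, and the synthetic data are illustrative, not the authors' implementation.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Holt's linear-trend exponential smoothing; returns `horizon` forecasts."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def fit_line(xs, ys):
    """Ordinary least squares for ys ~ a0 + a1 * xs; returns (a0, a1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a1 = sxy / sxx
    return my - a1 * mx, a1


def forecast_dv(iv, dv, horizon):
    """Combine DV[IV(t = T)] + Residuals(t = T) for each step ahead."""
    a0, a1 = fit_line(iv, dv)                          # DV(IV) regression
    residuals = [y - (a0 + a1 * x) for x, y in zip(iv, dv)]
    iv_future = holt_forecast(iv, horizon)             # TSA forecast for IV
    res_future = holt_forecast(residuals, horizon)     # TSA forecast for residuals
    return [a0 + a1 * x + r for x, r in zip(iv_future, res_future)]


# Synthetic example: IV grows linearly; DV has an extra time-dependent term
iv = [100.0 + 2.0 * t for t in range(30)]
dv = [0.5 * x + 0.05 * t * t for t, x in enumerate(iv)]
combined = forecast_dv(iv, dv, horizon=5)
print([round(v, 2) for v in combined])
```

In this synthetic case the regression alone cannot see the term that depends on time, so it underpredicts; forecasting the residuals recovers most of that gap.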
TRADITIONAL SOLUTION:
[Figure panels: BM, Response Time, Throughput, Traffic]
A real-life example
Size the worker threads for an application for the next year
REGRESSION IS OPTIMIZATION
[Figure: Concurrency vs. Business Metric; Q with trendline Linear (Q): y = 0.241x + 24.215, R² = 0.03376]
DV = f(IV, A) : A = arg min(ε); ε = DV|predict - DV
Averages: OLS: Simple algebra; CI from StDev
95%ile?
Linear: a0 + a1 * IV
Polynomial: a0 + a1 * IV + a2 * IV^2 + …
Exponential: a0 * exp(a1 * IV)
Logarithmic: a0 * log(a1 * IV)
Power: a0 * IV ^ a1
[Figure: Q vs. BM. Traffic vs. Business Metric; series Q, Mdl, Lo90%, Hi90%]
library(quantreg)                 # provides rq() for quantile regression
Mdl = rq(DV ~ IV, tau = 0.95)     # fit the 95th-percentile line
DV_bar = predict(Mdl)             # fitted 95th-percentile values
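The idea that a percentile fit is just a different optimization target can be shown without any library. Quantile regression minimizes the so-called pinball loss instead of squared error; the Python sketch below applies that loss to the simplest possible model, a constant, and searches a list of candidates. It is a toy illustration on synthetic data, not a reimplementation of `rq`.

```python
def pinball_loss(actual, predicted, tau):
    """Quantile (pinball) loss: under-predictions cost tau, over-predictions 1 - tau."""
    total = 0.0
    for y, p in zip(actual, predicted):
        diff = y - p
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total


def best_constant(actual, tau, candidates):
    """Pick the constant prediction that minimizes the pinball loss."""
    return min(
        candidates,
        key=lambda c: pinball_loss(actual, [c] * len(actual), tau),
    )


data = [float(v) for v in range(101)]        # synthetic values 0..100
p95 = best_constant(data, tau=0.95, candidates=data)
p50 = best_constant(data, tau=0.50, candidates=data)
print(p95, p50)
```

With tau = 0.5 the minimizer is the median, and with tau = 0.95 it is the 95th percentile; replacing the constant with a0 + a1 * IV gives the linear quantile regression that the R snippet fits.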
EXAMPLE (CONTINUED)
Forecast BM
Build Regression Q ~ BM
Forecast Residuals
Q(t) = f[BM(t)] + ε(t)
Size the worker threads for an application for the next year
[Figure panels: a), b)]
FINISHING TOUCHES
Red = regular regression
Blue = our method
Green and black = data
Grey = predictive interval bounds
CONCLUSIONS
• Downsides:
• It is an extra step in building the projection, increasing the runtime of computing the models.
• If the regression model is good, then the residuals are unforecastable.
• Advantages:
• It is a very robust method:
• No worries about the data not being suitable for the regression:
• missed trend and periodicity in the residuals will be picked up by the TSA forecasts.
• It is a versatile method:
• Regression and TSA forecasting combined:
• give us more control in tuning regression and TSA models than regression by itself and TSA forecasting by itself.
• TSA forecast of residuals can only be inappropriate if the regression is good.
• Then the weight (significance) of the residuals is negligible compared with the actual data.
• There are forecasting methods even for unforecastable data.
• Forecast replacement for nonlinear time series data:
• Linear is too conservative
• Exponential is too optimistic
• Quadratic regression to time
• Forecast residuals
There is no reason not to use it
ACKNOWLEDGMENTS
• Co-authors and reviewers:
• Dr. Josep Ferrandiz
• Matthew Beason
• A big thank-you goes to
• Dr. Igor Trubin, who inspired this paper at CMG’11
• Mike Perka, who has been my guide on this journey into the world of IT data