Big Data at Home Depot KSU – Big Data Survey Course Steve Einbender Advanced Analytics Architect

Upload: curtis-mcdowell

Post on 05-Jan-2016


TRANSCRIPT

Page 1

Big Data at Home Depot

KSU – Big Data Survey Course

Steve Einbender

Advanced Analytics Architect

Page 2

Time Series Concept

Statistical Forecasting and/or Statistical Estimation are the primary goals of Time Series modeling

Time Series modeling is essential to accurately model and account for Time Effects. If you don't account for them, you will, in general, confound your experiment and, specifically, your covariates/predictors.

Conceptual Model:

[equation not preserved in the transcript] ….. T is Trend, S is Seasonal, E is Error ….. and the two remaining symbols denote the Time period and the Lag period.

In general, assess the behavior of the Dependent Variable (e.g., Net_Sales) and apply an appropriate Time Series Model (e.g., ARIMA) to the Trend and Seasonality of the DV to account for the Time Effects….. Then the fun begins…
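One way to make the Trend / Seasonal / Error idea concrete is a classical additive decomposition. This is only a sketch under assumptions: the additive form, the centered moving-average trend, the function name `decompose_additive`, and the made-up quarterly `net_sales` numbers are all illustrative and are not taken from the slides.

```python
from statistics import mean

def decompose_additive(series, period):
    """Split a series into trend, seasonal, and error parts (additive form)."""
    n = len(series)
    half = period // 2
    # Centered moving average as the trend estimate; undefined near the ends.
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the two end points get half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    # Seasonal effect: average detrended value at each position in the cycle.
    detrended = [(i, series[i] - trend[i]) for i in range(n) if trend[i] is not None]
    raw = [mean(v for i, v in detrended if i % period == pos) for pos in range(period)]
    centre = mean(raw)
    seasonal = [raw[i % period] - centre for i in range(n)]
    # Error is whatever trend and seasonality leave unexplained.
    error = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
             for i in range(n)]
    return trend, seasonal, error

# Hypothetical quarterly data: linear trend plus a repeating seasonal pattern.
net_sales = [10 + 2 * t + [5, -3, -4, 2][t % 4] for t in range(12)]
trend, seasonal, error = decompose_additive(net_sales, period=4)
```

Because the example data contain no noise, the error component is zero wherever the trend is defined; with real data it is the part left for the ARIMA terms and the covariates to explain.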

Page 3

Operationalizing the Concept

p - AutoRegressive (Auto Correlation)
d - Integrated (Stationarity / Trend)
q - Moving Average (Shocks / Error)

P - Seasonal Auto Correlation
D - Seasonal Trend
Q - Seasonal Error

Seasonal effects: If there are spikes in the data every four periods for quarterly data, or every 12 periods for monthly data, there is a seasonal effect.
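For reference, the six orders above are usually combined in Box-Jenkins backshift notation. This is the standard textbook form and not a formula taken from the slides; the backshift operator $B$, the season length $s$ (4 for quarterly data, 12 for monthly data) and the polynomial symbols are notation introduced here.

```latex
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t
  \;=\; \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t,
\qquad B\,y_t = y_{t-1}
```

Here $\phi_p$ and $\theta_q$ are the non-seasonal autoregressive and moving-average polynomials of orders $p$ and $q$, $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts, and the two difference terms carry the orders $d$ and $D$.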

Page 4

Time Series Parameter Specifications

ARIMA modeling involves three stages:

(1) Identification of the initial p, d, and q parameters
Autoregressive component (p). Usually 0, 1, or 2
Integrated component (d). Usually 0, 1, or 2
Moving average component (q). Usually 0, 1, or 2

(2) Estimation of the p (auto-regressive) and q (moving average) components to see if they contribute significantly to the model or if one or the other should be dropped; and

(3) Diagnosis of the residuals to see if they are random and normally distributed, indicating a good model.

An ARIMA (0,1,1) model means no autoregressive component, differencing one time to remove linear trends, and a lag 1 moving average component.
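The ARIMA (0,1,1) case can be sketched in a few lines. The assumptions here are loud ones: the model is written as y_t - y_{t-1} = e_t - theta * e_{t-1} (some packages use the opposite sign on theta), theta is supplied by hand rather than estimated, and the function names and numbers are hypothetical.

```python
def difference(series):
    """First difference (d = 1): removes a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]

def arima_011_forecasts(series, theta):
    """One-step-ahead forecasts for y_t - y_{t-1} = e_t - theta * e_{t-1}.

    theta is assumed known here; in practice it comes from the estimation stage.
    """
    forecasts = [series[0]]  # start-up assumption: first forecast equals first value
    for y in series:
        error = y - forecasts[-1]            # shock observed at this step
        forecasts.append(y - theta * error)  # lag 1 moving average correction
    return forecasts  # forecasts[t] predicts series[t]; the last entry is the next period

# Hypothetical series with a steady linear trend.
series = [100, 102, 104, 106, 108]
diffs = difference(series)
forecasts = arima_011_forecasts(series, theta=0.5)
random_walk = arima_011_forecasts(series, theta=0.0)
```

With theta = 0 the forecast is simply the last observed value (a random walk). For other values of theta the recursion is the same as simple exponential smoothing with smoothing weight 1 - theta.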

Page 5

Time Series Forecasting System (TSFS) Demo

Data Range identification
View Series graphically
What Functions and Tests do we use to derive the most accurate Time Series model possible?
Autocorrelation Function (ACF)
Partial Autocorrelation Function (PACF)
Patterns in the ACF/PACF can be used to suggest different models to use.
White Noise Test
Dickey-Fuller Unit Root / Stationarity Test

After a candidate set of models is identified, the models are estimated and their fit is assessed.

The best-fitting model is used to generate a forecast.
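Two of the diagnostics listed above can be sketched without any library: the sample Autocorrelation Function and a white noise test. The Ljung-Box Q statistic is used here as one common white noise test, which is an assumption, since the transcript does not say which test the TSFS demo reports. The function names and data are hypothetical, and the Dickey-Fuller test is not sketched.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def ljung_box_q(series, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_k r_k^2 / (n - k); large Q rejects white noise."""
    n = len(series)
    r = acf(series, max_lag)
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))

# Hypothetical quarterly series: the same four-period pattern repeated six times.
series = [5, -3, -4, 2] * 6
r = acf(series, max_lag=4)
q = ljung_box_q(series, max_lag=4)
```

In this example the autocorrelation at lag 4 is large while lag 1 is near zero, which is the kind of ACF pattern that suggests a seasonal term. Q is compared against a chi-square critical value with `max_lag` degrees of freedom (9.488 at the 5% level for 4 lags).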

Page 6

ARIMA with Dynamic Regression

Another use of Time Series is for the introduction of Covariates/Predictors.
An extension of ordinary Regression
One or more of the Independent Variables (i.e., predictors) are correlated with the Dependent Variable at non-concurrent time lags.
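A simple way to look for a non-concurrent relationship is to correlate the Dependent Variable with lagged copies of a predictor. The sketch below is a raw cross-correlation only, under stated assumptions: the names `promo_spend` and `net_sales` and all numbers are hypothetical, and a careful analysis would usually prewhiten both series before reading the lags.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged_correlations(predictor, dependent, max_lag):
    """Correlation between predictor[t - k] and dependent[t] for k = 0 .. max_lag."""
    n = len(dependent)
    return {k: pearson(predictor[:n - k], dependent[k:]) for k in range(max_lag + 1)}

# Hypothetical data: sales respond to promotion spend two periods later.
promo_spend = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
net_sales = [55, 52] + [50 + 2 * x for x in promo_spend[:-2]]
corr = lagged_correlations(promo_spend, net_sales, max_lag=3)
best_lag = max(corr, key=corr.get)
```

The lag with the strongest correlation (2 in this constructed example) is a candidate for the lag at which the predictor enters the dynamic regression.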

Intervention Analysis
Two basic activities:
Identify the Functional Form of the Intervention
Assess the Statistical Significance of the Intervention
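The Functional Form of an intervention is commonly encoded as an indicator variable. The sketch below shows two usual choices, a step (permanent level shift) and a pulse (one-period effect), plus a naive before/after mean comparison. The function names and data are hypothetical, and the comparison is not a significance test: assessing significance means fitting the indicator as a regressor in the time series model and testing its coefficient.

```python
def step(n, start):
    """Permanent level shift: 0 before the intervention, 1 from `start` onward."""
    return [1 if t >= start else 0 for t in range(n)]

def pulse(n, start):
    """One-period effect: 1 only at the intervention period."""
    return [1 if t == start else 0 for t in range(n)]

def mean_shift(series, indicator):
    """Naive effect size: mean where the indicator is on minus mean where it is off."""
    on = [y for y, flag in zip(series, indicator) if flag]
    off = [y for y, flag in zip(series, indicator) if not flag]
    return sum(on) / len(on) - sum(off) / len(off)

# Hypothetical series with a level shift starting at period 4.
series = [20, 21, 19, 20, 26, 25, 27, 26]
step_flag = step(len(series), start=4)
pulse_flag = pulse(len(series), start=4)
shift = mean_shift(series, step_flag)
```

Which form fits depends on whether the effect persists (step) or fades after a single period (pulse).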

Let’s look at how we build a Time Series ADS….