Data Science and Predictive SPC


Upload: alex-gilgur

Post on 13-Jan-2017


Category:

Data & Analytics



What does a data scientist do?

Alex Gilgur. Data Science & Predictive SPC


The Maslow Pyramid of Data Science

IT Infrastructure

Software Engineering

Quantitative Analytics

Domain

Data


Data Science = Nuclear Energy

Blow up in our face

…or…

Give us Power


What’s the Team We’re Rooting For?

DATA

INFORMATION



Servers = argmax (Revenue | Budget)

Revenue = f[Throughput(Servers, SW, Budget)]

Servers = argmin (Budget | Revenue)

• Throughput = t(UX)
• Revenue = r(Throughput)
• Budget = f(SW, Servers)

Constraints:
• Domain
• Budget ≤ B
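The two formulations above (maximize revenue for a given budget, or minimize budget for a given revenue) can be sketched as a brute-force search over server counts. This is only an illustration: the throughput curve, the revenue rate, the server price, and every function name below are hypothetical and do not come from the slides.

```python
def throughput(servers: int) -> float:
    # Hypothetical diminishing-returns curve (assumption, not from the slides).
    return 1000.0 * servers / (1.0 + 0.05 * (servers - 1))


def revenue(tput: float) -> float:
    # Hypothetical: revenue is proportional to throughput.
    return 0.02 * tput


def budget(sw_cost: float, servers: int, server_cost: float = 5000.0) -> float:
    # Budget = f(SW, Servers), here assumed to be a simple linear cost.
    return sw_cost + server_cost * servers


def best_servers_for_budget(budget_cap, sw_cost, max_servers=1000):
    """Servers = argmax(Revenue | Budget <= B)."""
    feasible = [n for n in range(1, max_servers + 1)
                if budget(sw_cost, n) <= budget_cap]
    if not feasible:
        return None  # even one server does not fit the budget
    return max(feasible, key=lambda n: revenue(throughput(n)))


def cheapest_servers_for_revenue(target, sw_cost, max_servers=1000):
    """Servers = argmin(Budget | Revenue >= target)."""
    feasible = [n for n in range(1, max_servers + 1)
                if revenue(throughput(n)) >= target]
    if not feasible:
        return None  # the target is out of reach in the searched range
    return min(feasible, key=lambda n: budget(sw_cost, n))
```

With a real throughput model in place of the hypothetical curve, the same two searches answer "how many servers can we afford?" and "how few servers do we need?".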

From X to Y to X

Closing the Loop


CENTRAL LIMIT THEOREM

The arithmetic mean of a random sample drawn from any distribution with finite variance is asymptotically normally distributed as the sample size tends to infinity.
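A small simulation can make the theorem tangible. The sketch below draws samples from a skewed exponential distribution and checks how the sample means behave. The choice of distribution, the sample size of 50, and the helper name `sample_means` are assumptions made for illustration.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Return the arithmetic means of n_samples independent samples."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Exponential(1) is strongly skewed, with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0),
                     sample_size=50, n_samples=2000)

mu = statistics.fmean(means)
sd = statistics.stdev(means)
# For a normal distribution, about 68% of values lie within one sigma.
within_1sd = sum(abs(m - mu) <= sd for m in means) / len(means)
```

Here `mu` should land close to 1, `sd` close to 1/sqrt(50), and `within_1sd` close to the normal-distribution value of roughly 0.68, even though the underlying data are far from normal.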


http://www.isixsigma.com/

[Figure: Key Performance Metric (KPM) chart; labels: 72 hrs, LSL, USL]

How did HAL Know?


A Few Words About Forecasting

Methods:
● EWMA
● ARIMA
● Regression

EWMA models are very specific and computationally fast, but they must be told the form of the trend (linear or exponential) and of the seasonality (additive or multiplicative).
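To show what "being told the trend" means in practice, here is a minimal sketch of a plain EWMA next to an EWMA with an explicit linear trend term. The latter is written as Holt's linear method, which is one common formulation and an assumption here, since the slides do not say which variant is used. The smoothing constants are arbitrary.

```python
def ewma(series, alpha):
    """Plain exponentially weighted moving average (no trend, no seasonality)."""
    level = series[0]
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def holt_linear_forecast(series, alpha, beta, horizon):
    """EWMA with an explicit linear trend (Holt's linear method)."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        # Update the level from the new point and the previous projection.
        level = alpha * x + (1 - alpha) * (level + trend)
        # Update the trend from the change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


series = [2 * t + 1 for t in range(10)]   # a perfectly linear series
plain = ewma(series, alpha=0.5)            # lags behind the trend
forecast = holt_linear_forecast(series, alpha=0.5, beta=0.3, horizon=3)
```

On trending data the plain EWMA always trails the latest observation, while the trend-aware version extrapolates the line. An exponential trend or a seasonal component would need its own explicit term in the same way.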

An ARIMA model implicitly accounts for trends, seasonality, and stationarity of the data. The autocorrelation of the ARIMA residuals reveals any periodicities that the model has missed.
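The residual check can be sketched without any fitting library. The example below computes the sample autocorrelation of a residual series and reports the lag where it peaks. The residuals are synthetic (a pure 7-step cycle standing in for what a fitted model left behind), and the function names are hypothetical.

```python
import math


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return num / denom


def dominant_lag(residuals, max_lag):
    """Lag in 1..max_lag with the largest autocorrelation."""
    return max(range(1, max_lag + 1),
               key=lambda k: autocorrelation(residuals, k))


# Synthetic residuals: a leftover cycle with a period of 7 steps.
residuals = [math.sin(2 * math.pi * t / 7) for t in range(70)]
missed_period = dominant_lag(residuals, max_lag=10)
```

A strong peak at some lag suggests a periodicity of that length that the model did not capture. Residuals from a well-specified model should show no such peak.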

For stationary data, use ARIMA
For non-stationary data, use EWMA
EWMA and ARIMA overlap

When to use Regression:
● Data are monotonic.
● Seasonality is NOT statistically significant.
● EWMA and ARIMA fail.

When to use Quantile Regression:
● Upper and lower bounds behave differently.
● Outliers are possible.
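Both conditions for quantile regression follow from its loss function, the pinball loss, which weights positive and negative residuals differently. The sketch below fits a straight line for a chosen quantile by brute force, relying on the property that some optimal line passes through two of the data points. It is only practical for small data sets, and the function names are hypothetical. A production fit would normally use a linear-programming solver or a statistics library instead.

```python
from itertools import combinations


def pinball_loss(residual, q):
    """Quantile (pinball) loss for one residual at quantile q in (0, 1)."""
    return q * residual if residual >= 0 else (q - 1) * residual


def fit_quantile_line(xs, ys, q):
    """Return (intercept, slope) minimizing total pinball loss.

    Brute force over all lines through two data points; O(n^3).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid fit
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(pinball_loss(y - (intercept + slope * x), q)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[5] = 100                      # a single large outlier
intercept, slope = fit_quantile_line(xs, ys, q=0.5)
```

The median fit (q = 0.5) ignores the outlier and recovers the underlying line. Fitting separate lines for, say, q = 0.05 and q = 0.95 gives lower and upper bounds that are free to behave differently.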

For each data set, we can run a model competition: score each forecast model by a weighted sum of its goodness of fit, its suitability for forecasting, the stationarity of the data, and the variability of the data, then select the model that scores best for that data set.
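The competition described above reduces to a weighted sum and an argmax. In the sketch below the weights, the metric names, and all of the scores are made-up placeholders (assumed to be normalized to the range 0 to 1, higher being better). The slides do not give the actual weights or the scoring formulas.

```python
def competition_score(metrics, weights):
    """Weighted sum of normalized quality metrics for one model."""
    return sum(weights[name] * metrics[name] for name in weights)


def pick_winner(candidates, weights):
    """Name of the model with the highest weighted score."""
    return max(candidates,
               key=lambda model: competition_score(candidates[model], weights))


# Hypothetical weights; they sum to 1 only for readability.
weights = {
    "goodness_of_fit": 0.4,
    "forecast_suitability": 0.3,
    "stationarity": 0.2,
    "variability": 0.1,
}

# Made-up scores for one data set, for illustration only.
candidates = {
    "EWMA": {"goodness_of_fit": 0.8, "forecast_suitability": 0.6,
             "stationarity": 0.3, "variability": 0.5},
    "ARIMA": {"goodness_of_fit": 0.7, "forecast_suitability": 0.9,
              "stationarity": 0.8, "variability": 0.6},
    "Quantile Regression": {"goodness_of_fit": 0.6, "forecast_suitability": 0.5,
                            "stationarity": 0.4, "variability": 0.9},
}

winner = pick_winner(candidates, weights)
```

Running the same scoring per data set lets each series end up with the model that suits it, rather than forcing one method on all of them.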

[Figures: EWMA, ARIMA, and Quantile Regression panels; legend: p50R …, Target, (LCL…UCL), (LSL…USL)]

www.isixsigma.com

www.amstat.org

www.cmg.org

www.linkedin.com

http://alexonsimanddata.blogspot.com/

http://josepferrandiz.blogspot.com/

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”

- H. G. Wells (1866-1946)

THANK YOU


Universal Scalability Law