Upload: institute-of-product-leadership

Post on 28-Jan-2018


Page 1: Data Science demystified

IPL CONFIDENTIAL

Data Science demystified

Murthy Kolluru, Ph.D.

Page 4: Data Science demystified

How good is my customer?

• Within the first few weeks of engagement, figure out how much revenue can be expected in the first two years.

• 100,000 customers over 5 years and a lot of data

• POS data, play data, demographics

• Over 50 attributes

Page 5: Data Science demystified

[Figure: plot with axes Attribute 1 and Attribute 2]

Page 6: Data Science demystified

Probability of being high value = -0.25 * age + 0.34 * income + 0.78 * number of kids

[Diagram labels: Age, Income, Kids; Output]
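The weighted sum on this slide can be sketched in a few lines of Python. The three weights are the ones shown on the slide. Everything else is an assumption made for illustration: the customer values are invented, the attributes are assumed to be pre-scaled to comparable ranges, and the logistic function is added only because a raw weighted sum is not guaranteed to fall between 0 and 1.

```python
import math

# Weights copied from the slide; all other values below are illustrative assumptions.
WEIGHTS = {"age": -0.25, "income": 0.34, "kids": 0.78}

def linear_score(customer):
    """Weighted sum of the three attributes, as written on the slide."""
    return sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)

def to_probability(score):
    """Logistic squashing (an assumption, not on the slide) to keep the output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical customer whose attributes are already scaled to comparable ranges.
customer = {"age": 0.5, "income": 1.0, "kids": 2.0}
score = linear_score(customer)
print(round(score, 3), round(to_probability(score), 3))
```

A model of this shape draws a straight-line boundary between high-value and other customers, which is the simplest of the model families the deck walks through.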

Page 7: Data Science demystified

[Figure: plot with axes Attribute 1 and Attribute 2]

Page 8: Data Science demystified

If parents are old, number of kids is less than 2, and income is less than $10K, the value is low.

[Diagram label: Output]
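A rule like this one maps directly onto a conditional. The sketch below uses the kids and income thresholds from the slide; the age cutoff for "old" is not given in the deck, so `OLD_PARENT_AGE` is a made-up placeholder, and the "not low" label is also an assumption because the slide only states when the value is low.

```python
# The slide does not say what counts as "old"; this cutoff is a hypothetical placeholder.
OLD_PARENT_AGE = 45

def customer_value(parent_age, num_kids, income):
    """Apply the slide's single rule; anything it does not cover is labeled 'not low'."""
    if parent_age >= OLD_PARENT_AGE and num_kids < 2 and income < 10_000:
        return "low"
    return "not low"

print(customer_value(parent_age=50, num_kids=1, income=8_000))
```

Unlike the weighted sum, a rule of this kind can be read aloud and checked by a business owner, which is part of its appeal.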

Page 9: Data Science demystified

[Figure: plot with axes Attribute 1 and Attribute 2]

Page 10: Data Science demystified

[Figure: plot with axes Attribute 1 and Attribute 2]

Page 12: Data Science demystified

Simplest form of non-linearity

Page 13: Data Science demystified

By carefully combining simple non-linearities, you can get highly non-linear curves.
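The idea of combining simple non-linearities can be illustrated with a short sketch. The transcript does not say which non-linearity the slide shows, so the hinge function `relu` below is an assumption, and the names `bump` and `two_bumps` are invented for this example. Three shifted hinges add up to a triangular bump, and two bumps add up to a curve with two peaks.

```python
def relu(x):
    """A hinge: zero for negative inputs, the identity for positive ones."""
    return max(0.0, x)

def bump(x):
    """Three shifted hinges combine into a triangle that rises on [0, 1] and falls on [1, 2]."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

def two_bumps(x):
    """Adding a second, smaller bump shifted right by 3 gives a curve with two peaks."""
    return bump(x) + 0.5 * bump(x - 3.0)

for x in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(x, two_bumps(x))
```

Stacking more such pieces, with learned rather than hand-picked weights and shifts, is roughly how a neural network builds up a complicated curve.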

Page 16: Data Science demystified

Finally, the mind is demystified!

"Rival," The New Yorker, December 6, 1958, p. 44

ABSTRACT: Talk story about the perceptron, a new electronic brain which hasn't been built, but which has been successfully simulated on the I.B.M. 704. Talk with Dr. Frank Rosenblatt, of the Cornell Aeronautical Laboratory, who is one of the two men who developed the prodigy; the other man is Dr. Marshall C. Yovits, of the Office of Naval Research, in Washington. Dr. Rosenblatt defined the perceptron as the first non-biological object which will achieve an organization of its external environment in a meaningful way. It interacts with its environment, forming concepts that have not been made ready for it by a human agent. If a triangle is held up, the perceptron's eye picks up the image & conveys it along a random succession of lines to the response units, where the image is registered. It can tell the difference between a cat and a dog, although it wouldn't be able to tell whether the dog was to the left or right of the cat. Right now it is of no practical use, Dr. Rosenblatt conceded, but he said that one day it might be useful to send one into outer space to take in impressions for us.

Page 18: Data Science demystified

• Black-box models only solve part of the problem

• How do we get explainability?

Page 19: Data Science demystified

[Figure labels: Attribute 4, Attribute 1, Attribute 2, Attribute 5]

Page 20: Data Science demystified

What we did

• Created more features

• Did they have a favorite game?

• How are the kids' ages distributed?

• When did the first sale happen?

• …
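Two of the questions above (when the first sale happened and how the kids' ages are distributed) can be turned into model inputs with very little code. This is a minimal sketch, not the presenter's pipeline: the function name, the input shape, and the specific summaries chosen (month of first sale, youngest age, age spread) are all assumptions.

```python
from datetime import date

def derive_features(first_sale, kid_ages):
    """Turn raw fields into extra attributes; the field names here are hypothetical."""
    ages = sorted(kid_ages)
    return {
        "first_sale_month": first_sale.month,
        "num_kids": len(ages),
        "youngest_kid": ages[0] if ages else None,
        "kid_age_spread": ages[-1] - ages[0] if ages else None,
    }

# Invented example record, used only to exercise the function.
features = derive_features(date(2013, 12, 14), [4, 9])
print(features)
```

Each derived column is one more angle from which the same customers can be compared.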

Page 21: Data Science demystified

Patterns

Favorite – Played a game more than 50% of the time

Uniform – Played multiple games
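The two pattern definitions can be sketched as a small labeling function. The 50% threshold comes from the slide; the input format (a list of game names, one entry per play) and the "unknown" label for an empty log are assumptions made for this example.

```python
from collections import Counter

def play_pattern(games_played):
    """Label a play log 'favorite' if one game exceeds 50% of plays, else 'uniform'."""
    if not games_played:
        return "unknown"  # assumption: the slide does not cover customers with no plays
    counts = Counter(games_played)
    _, top_count = counts.most_common(1)[0]
    if top_count / len(games_played) > 0.5:
        return "favorite"
    return "uniform"

print(play_pattern(["bowling", "bowling", "bowling", "darts"]))
print(play_pattern(["bowling", "darts", "pool", "bowling"]))
```

Note that a customer whose top game accounts for exactly half of the plays is labeled uniform here, since the slide says "more than 50%".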

Page 24: Data Science demystified

Customers who play uniformly in their first 30 days are, on average, sticky and bring in more revenue over two years.
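A comparison like this one amounts to grouping customers by pattern and averaging their revenue. The sketch below shows only the mechanics. The rows are invented placeholder numbers, not the presenter's data, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_revenue_by_pattern(rows):
    """Average two-year revenue per play pattern from (pattern, revenue) pairs."""
    groups = defaultdict(list)
    for pattern, revenue in rows:
        groups[pattern].append(revenue)
    return {pattern: mean(values) for pattern, values in groups.items()}

# Placeholder values chosen only to exercise the function; they carry no real finding.
toy_rows = [("uniform", 120.0), ("favorite", 80.0), ("uniform", 100.0), ("favorite", 60.0)]
summary = mean_revenue_by_pattern(toy_rows)
print(summary)
```

The same grouping is what a pivot table in a spreadsheet would produce.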

Page 25: Data Science demystified

First sale

Dec and Jan win!

Page 26: Data Science demystified

Upsell?

Dec & Jan lose big!

Page 27: Data Science demystified

Action Points

• A great model on simple and incomplete data almost always loses to a simple and incomplete model on great data

• Pick unsolved problems in your business where you have some past data

• Create as many additional factors as you can from the data

• View it from multiple angles in Excel

• You will most likely have some Aha moments in store!

Page 28: Data Science demystified

There will be a shortage of 100,000 data scientists and 1,000,000 data-smart managers by 2020.

McKinsey

Page 29: Data Science demystified

IPL’s Big Data Analytics Track

Architecting data science solutions & products

Hands-on model building

Data visualizations and storytelling

Complexities in data sourcing, privacy, security

Page 30: Data Science demystified

THANK YOU

11/29/2014