Machine Learning for Suits


#ODSC, Boston, May 30th, 2015

MACHINE LEARNING FOR SUITS

Rahul Dave ([email protected])
LxPrior Inc. & Harvard IACS

CLASSIFICATION

• will a customer churn?
• is this a check? For how much?
• a man or a woman?
• will this customer buy?
• do you have cancer?
• is this spam?
• whose picture is this?
• what is this text about?†

† image from code in http://bit.ly/1Azg29G

[figure: scatter plot of the sample data, axes x and y]

REGRESSION

• how many dollars will you spend?
• what is your creditworthiness?
• how many people will vote for Bernie t days before the election?
• used to predict probabilities for classification
• causal modeling in econometrics

PART 1

HYPOTHESIS SPACES

A polynomial of degree d looks like so:

h(x) = a0 + a1 x + a2 x^2 + … + ad x^d

All polynomials of a given degree, or complexity, d constitute a hypothesis space Hd.

30 points of data. Which fit is better? The straight line in H1, or the curve from a higher-degree Hd?

[figure: the 30 data points, axes x and y]

What does it mean to FIT?

Minimize distance from the line?

Minimize squared distance from the line.

Get the intercept and slope of the best-fit line.
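As a rough illustration of "minimize squared distance" (a minimal numpy sketch, not code from the deck; the data here are made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # 30 made-up points standing in for the data on the slide
    x = np.linspace(0, 10, 30)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.5, size=30)

    # np.polyfit minimizes the summed squared distance from the line (degree 1)
    slope, intercept = np.polyfit(x, y, deg=1)

    # the fitted line and its in-sample mean squared error
    yhat = intercept + slope * x
    print(intercept, slope, np.mean((y - yhat) ** 2))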

EMPIRICAL RISK MINIMIZATION

The sample must be representative of the population!

A: The empirical (in-sample) risk estimates the out-of-sample risk. B: So if we make the empirical risk small, the out-of-sample risk should be small too.

Is this enough?
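In standard notation consistent with the deck's Rout (a sketch for reference, not copied from the slides), the empirical risk and the out-of-sample risk are:

    R_{\mathrm{emp}}(h) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i,\, h(x_i)\big),
    \qquad
    R_{\mathrm{out}}(h) \;=\; \mathbb{E}_{x \sim P(x)}\, \ell\big(f(x),\, h(x)\big).

Empirical risk minimization picks the g in H with the smallest Remp, and hopes (A and B above) that Rout(g) is then small too.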

[learning diagram*: a TARGET y = f(x); a SAMPLE of TRAINING EXAMPLES (x1, y1), (x2, y2), …, (xn, yn) drawn using an INPUT DISTRIBUTION P(x); a HYPOTHESIS SET H = {h1, h2, …, g, …, hM}; a RISK/ERROR MEASURE; the LEARNER outputs the FINAL HYPOTHESIS g ≈ f]

* image based on amlbook.com

THE REAL WORLD HAS NOISE

Which fit is better now? The line or the curve?

[learning diagram with noise*: same setup, but the deterministic TARGET y = f(x) becomes a TARGET DISTRIBUTION P(y|x), giving the JOINT DISTRIBUTION P(x, y); the training examples now come from y = f(x) + ϵ]

* image based on amlbook.com

DETERMINISTIC NOISE (Bias) vs STOCHASTIC NOISE*

* image based on amlbook.com

UNDERFITTING (Bias) vs OVERFITTING (Variance)

If you are having problems in machine learning, the reason is almost always OVERFITTING!

Background from http://commons.wikimedia.org/wiki/File:Overfitting.svg

DATA SIZE MATTERS: straight-line fits to a sine curve

Corollary: Must fit simpler models to less data!

HOW DO WE LEARN?

Split the dataset D into a Training Set and a Test Set.
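A minimal scikit-learn sketch of such a split (illustrative only; the data and the 80/20 fraction are assumptions, not from the deck):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # made-up dataset D standing in for real features X and labels y
    rng = np.random.default_rng(0)
    X = rng.random((100, 3))
    y = (X[:, 0] + X[:, 1] > 1).astype(int)

    # hold out 20% of D as a test set, never touched during training
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=0)
    print(Xtrain.shape, Xtest.shape)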

BALANCE THE COMPLEXITY

[figure: error, or risk, vs complexity "d"; low d gives high bias / low variance (underfitting), high d gives low bias / high variance (overfitting)]

VALIDATION

• train-test is not enough, as we end up fitting the complexity on the test set and contaminating it
• thus do train-validate-test: split the dataset D into a Training Set, a Validation Set, and a Test Set

[diagram: for each hypothesis space H0, H1, …, H∗, …, Hn, the Training Set trains a best g-i in Hi and the Validation Set estimates Rout(g-i); pick the H∗ with the lowest Rout(g-∗), then retrain in H∗ on the entire set; the Test Set tests g∗ ∈ H∗ and estimates Rout(g∗)]
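A minimal sketch of train-validate-test selection over polynomial degree (made-up data; the degrees and split fractions are illustrative assumptions, not the deck's):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 60)
    y = np.sin(x) + rng.normal(scale=0.3, size=60)

    # D -> train / validation / test
    xtmp, xtest, ytmp, ytest = train_test_split(x, y, test_size=0.2, random_state=0)
    xtr, xval, ytr, yval = train_test_split(xtmp, ytmp, test_size=0.25, random_state=0)

    # train g-d in each hypothesis space H_d; the validation set estimates Rout(g-d)
    val_risk = {}
    for d in range(1, 9):
        coeffs = np.polyfit(xtr, ytr, deg=d)
        val_risk[d] = np.mean((yval - np.polyval(coeffs, xval)) ** 2)

    # pick H* with the lowest validation risk, retrain on train+validation, test once
    best_d = min(val_risk, key=val_risk.get)
    coeffs = np.polyfit(np.concatenate([xtr, xval]), np.concatenate([ytr, yval]), deg=best_d)
    print(best_d, np.mean((ytest - np.polyval(coeffs, xtest)) ** 2))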

CROSS-VALIDATION

[diagram: as above, but each hypothesis space Hi is scored by a cross-validation estimate RiCV, obtained by repeatedly splitting the Training Set into training and validation folds; pick the H∗ with the lowest RCV, then retrain in H∗ on the entire set; the left-over Test Set tests g∗ ∈ H∗ and estimates Rout(g∗)]


CROSS-VALIDATION is

• robust to an outlier validation set
• allows for larger training sets
• allows for error estimates

Here we find the best complexity d.
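A minimal scikit-learn sketch of cross-validating the complexity d (illustrative data and degrees; not code from the deck):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 60).reshape(-1, 1)
    y = np.sin(x).ravel() + rng.normal(scale=0.3, size=60)

    # 5-fold cross-validated risk estimate for each hypothesis space H_d
    for d in (1, 3, 5, 10):
        model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
        risk = -cross_val_score(model, x, y, cv=5, scoring="neg_mean_squared_error").mean()
        print(d, risk)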

REGULARIZATION

Keep the higher a-priori complexity and impose a complexity penalty on the risk instead, to choose a SUBSET of the hypothesis space. We'll make the coefficients small, e.g. by minimizing Remp(g) + α Σ ai² rather than Remp(g) alone.

[figure: error, or risk, vs the regularizer α; small α gives low bias / high variance (overfitting), large α gives high bias / low variance (underfitting); the penalty picks out subsets S of a big space like H13, with the best α∗ selecting S∗]


REGULARIZATION

As we increase α, the coefficients go towards 0.

The Lasso uses an L1 penalty and sets coefficients to exactly 0.

Thus regularization automates: FEATURE ENGINEERING
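A minimal scikit-learn sketch of the L2 (ridge) vs L1 (lasso) penalties (made-up data and α values):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)   # only 2 of 20 features matter

    for alpha in (0.01, 1.0, 100.0):
        ridge = Ridge(alpha=alpha).fit(X, y)    # L2 penalty: shrinks coefficients towards 0
        lasso = Lasso(alpha=alpha).fit(X, y)    # L1 penalty: sets many coefficients to exactly 0
        print(alpha, np.abs(ridge.coef_).max(), int((lasso.coef_ != 0).sum()))

As α grows, both sets of coefficients shrink, and the lasso keeps fewer and fewer nonzero coefficients, which is the automation the slide alludes to.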


PART 2

CLASSIFICATION BY LINEAR SEPARATION

Which line?

• Different algorithms, different lines.
• SVM uses max-margin.†

† image from code in http://bit.ly/1Azg29G

PROBABILISTIC CLASSIFICATION

Model P(y|x) or P(x|y).

If the latter, use Bayes' Theorem: P(y|x) = P(x|y) P(y) / P(x).

DISCRIMINATIVE CLASSIFIER

GENERATIVE CLASSIFIER

The virtual ATM†

† inspired by http://blog.yhathq.com/posts/image-classification-in-Python.html

PCA: unsupervised dimension reduction from 332x137x3 to 50

Diagram from http://stats.stackexchange.com/a/140579
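A minimal scikit-learn sketch of that kind of reduction (the 332x137x3 image size is from the slide; the number of images and the pixel values here are random stand-ins):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_images = 80                                    # assumed, not from the deck
    images = rng.random((n_images, 332 * 137 * 3))   # each image flattened to one long vector

    pca = PCA(n_components=50)
    reduced = pca.fit_transform(images)              # unsupervised: no labels used
    print(reduced.shape)                             # (80, 50)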

kNN CLASSIFICATION

[figure: three panels, A, B, C]

BIAS and VARIANCE in kNN

[figure: Bias / Oversmoothing vs Variance / Undersmoothing]

kNN CLASSIFICATION, CROSS-VALIDATED
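A minimal scikit-learn sketch of cross-validating the number of neighbors k (made-up data; the name bestcv echoes the variable used later in the deck, but this is not the deck's code):

    import numpy as np
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)    # made-up labels
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, random_state=0)

    # small k: high variance / undersmoothing; large k: high bias / oversmoothing
    bestcv = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 11, 21, 41]}, cv=5)
    bestcv.fit(Xtrain, ytrain)                           # refits the best k on all of Xtrain
    print(bestcv.best_params_, bestcv.score(Xtest, ytest))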

EVOO from WHERE?†

† data and image from http://www.ggobi.org/book/

SVM uses KERNELS.†

† fig from http://nlp.stanford.edu/IR-book/

PART 3

EVALUATING CLASSIFIERS

                      Predicted 0 (PN)       Predicted 1 (PP)
  Observed 1 (OP)     FN  False Negative     TP  True Positive
  Observed 0 (ON)     TN  True Negative      FP  False Positive

confusion_matrix(bestcv.predict(Xtest), ytest)

[[11, 0], [3, 4]]

checks = blue = 1, dollars = red = 0
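A minimal sketch of turning such a confusion matrix into rates (toy labels; note that scikit-learn's convention is confusion_matrix(y_true, y_pred), with rows and columns ordered by label 0 then 1):

    from sklearn.metrics import confusion_matrix

    # toy observed labels and predictions standing in for ytest and bestcv.predict(Xtest)
    ytrue = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    ypred = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

    tn, fp, fn, tp = confusion_matrix(ytrue, ypred).ravel()
    tpr = tp / (tp + fn)    # true positive rate (recall)
    fpr = fp / (fp + tn)    # false positive rate
    print(tn, fp, fn, tp, tpr, fpr)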

CLASSIFIER PROBABILITIES

• classifiers output rankings or probabilities
• these ought to be well calibrated, or at least similarly ordered

CLASSIFICATION RISK

• The usual loss is the 1-0 loss.
• Thus the risk of predicting 1 is R(1|x) = P(0|x), and the risk of predicting 0 is R(0|x) = P(1|x).

CHOOSE THE CLASS WITH THE LOWEST RISK:

choose 1 if R(1|x) < R(0|x), i.e. if P(1|x) > P(0|x),

i.e. choose 1 if P(1|x) > 0.5. Intuitive!

ASYMMETRIC RISK^

We want no checks misclassified as dollars, i.e., no false negatives: penalize a false negative much more heavily than a false positive.

Start with the class risks weighted by these asymmetric losses, and again choose 1 if its risk is lower; this pushes the threshold on P(1|x) below 0.5.

^ image of breast carcinoma from http://bit.ly/1QgqhBw
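A hedged sketch of the algebra, with ℓFN and ℓFP denoting the losses for a false negative and a false positive (standard decision theory; the deck's exact notation may differ):

    R(1 \mid x) = \ell_{FP}\, P(0 \mid x), \qquad R(0 \mid x) = \ell_{FN}\, P(1 \mid x)

Choosing the class with the lowest risk, we pick 1 when R(1|x) < R(0|x), i.e. when

    P(1 \mid x) \;>\; \frac{\ell_{FP}}{\ell_{FP} + \ell_{FN}},

so a large ℓFN (hating false negatives) pushes the threshold below 0.5.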

ASYMMETRIC RISK

i.e. choose 1 if P(1|x) exceeds a threshold set by the ratio of the two losses.

                      Predicted 0 (PN)       Predicted 1 (PP)
  Observed 1 (OP)     FN  False Negative     TP  True Positive
  Observed 0 (ON)     TN  True Negative      FP  False Positive

confusion_matrix(ytest, bestcv2, r=10, Xtest)

[[5, 4], [0, 9]]

ROC SPACE+

                      Predicted 0 (PN)       Predicted 1 (PP)
  Observed 1 (OP)     FN  False Negative     TP  True Positive
  Observed 0 (ON)     TN  True Negative      FP  False Positive

+ this and the next fig: Data Science for Business, Foster et al.

ROC CURVE

ROC CURVE

• Rank the test set by prob/score from highest to lowest
• At the beginning nothing is predicted positive
• Keep moving the threshold down
• Compute a confusion matrix at each threshold (a sketch of this sweep follows)
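A minimal sketch of that threshold sweep (illustrative; the scores and labels are made up, and scikit-learn's roc_curve does the same job):

    import numpy as np

    # made-up true labels and classifier scores, standing in for a real test set
    ytest = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
    scores = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])

    # sweep the threshold from the highest score to the lowest, recording (FPR, TPR) points
    points = []
    for t in sorted(scores, reverse=True):
        pred = (scores >= t).astype(int)
        tp = ((pred == 1) & (ytest == 1)).sum()
        fp = ((pred == 1) & (ytest == 0)).sum()
        fn = ((pred == 0) & (ytest == 1)).sum()
        tn = ((pred == 0) & (ytest == 0)).sum()
        points.append((fp / (fp + tn), tp / (tp + fn)))
    print(points)   # the ROC curve passes through these points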

COMPARING CLASSIFIERS

Telecom customer churn data set from @YhatHQ<

< http://blog.yhathq.com/posts/predicting-customer-churn-with-sklearn.html

ROC curves

ASYMMETRIC CLASSES

• A has large FP.
• B has large FN.
• On asymmetric data sets, A will do very badly.
• Both downsampling and unequal class weighting are used in practice.

# figure from Data Science for Business, Foster et al.

ASYMMETRIC CLASSES

We look for lines in ROC space whose slope is set by the loss ratio and the class proportions.

A large false-negative loss penalizes FN.

Churn and cancer: you don't want false negatives, an uncaught churner or cancer patient (P = churn/cancer).

Lift Curve

• Rank the test set by prob/score from highest to lowest
• Calculate the TPR for each confusion matrix
• Calculate the fraction x of the test set predicted as positive
• Plot TPR vs x. Lift represents the advantage of a classifier over random guessing.

EXPECTED VALUE FORMALISM

Can be used for risk or for profit/utility (negative risk).

Fraction of the test set predicted to be positive: x = (TP + FP) / N.
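In the style of Data Science for Business, which the deck cites, the expected profit can be decomposed over the confusion matrix as (a hedged sketch, not necessarily the slide's exact notation):

    \mathbb{E}[\text{profit}] \;=\; p(P)\,\big[\,TPR \cdot b_{TP} + (1 - TPR)\cdot c_{FN}\,\big]
    \;+\; p(N)\,\big[\,FPR \cdot c_{FP} + (1 - FPR)\cdot b_{TN}\,\big]

where p(P) and p(N) are the class priors, TPR and FPR come from the confusion matrix at a given threshold, and the b's and c's are the per-cell benefits and costs.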

Profit curve

• Rank the test set by prob/score from highest to lowest
• Calculate the expected profit/utility for each confusion matrix
• Calculate the fraction x of the test set predicted as positive
• Plot the expected profit/utility against x

Finite budget#

• 100,000 customers, a $40,000 budget, $5 per customer
• so we can target 8,000 customers
• thus target the top 8%
• classifier 1 does better there, even though classifier 2 makes the maximum profit

# figure from Data Science for Business, Foster et al.

WHERE TO GO FROM HERE?

• Follow @YhatHQ, @kdnuggets, etc.
• Read Provost, Foster; Fawcett, Tom. Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media.
• Read Learning From Data by Yaser Abu-Mostafa et al. if you want the math. The learning diagram and some demos were inspired by that book.
• Check out Harvard's cs109 or Andrew Ng's Coursera course videos online.

• Ask lots of questions of your data science team
• Follow your intuition

THANKS! @rahuldave