All Models Are Wrong, But Some Are Useful: 6 Lessons for Making Predictive Analytics Work
TRANSCRIPT
Dr. Brian Mac Namee
[email protected]
@brianmacnamee
machine learning, artificial intelligence, data science, cognitive computing, big data, deep learning
Inspired by Brendan Tierney http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
if LOAN-SALARY RATIO < 1.5 then OUTCOME = 'repay'
else if LOAN-SALARY RATIO > 4 then OUTCOME = 'default'
else if AGE < 40 and OCCUPATION = 'industrial' then OUTCOME = 'default'
else OUTCOME = 'repay'
end if
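The rule set above translates directly into code. A minimal sketch in Python (the function name and example inputs are hypothetical; the thresholds and outcomes are taken from the slide):

```python
def predict_outcome(loan_salary_ratio, age, occupation):
    """Apply the loan-decision rules shown on the slide."""
    if loan_salary_ratio < 1.5:
        return "repay"
    elif loan_salary_ratio > 4:
        return "default"
    elif age < 40 and occupation == "industrial":
        return "default"
    else:
        return "repay"

# A few example applications of the rules
print(predict_outcome(1.2, 35, "industrial"))    # low ratio: repay
print(predict_outcome(5.0, 50, "professional"))  # high ratio: default
print(predict_outcome(2.0, 30, "industrial"))    # young industrial worker: default
print(predict_outcome(2.0, 45, "professional"))  # none of the rules fire: repay
```

A model like this is just a function from descriptive features to a prediction; machine learning algorithms learn such functions from data rather than having them hand-written.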
Fundamentals of Machine Learning for Predictive Data Analytics, John Kelleher, Brian Mac Namee, and Aoife D'Arcy, www.machinelearningbook.com
1. Prediction is a lot of things
2. There is no such thing as a free lunch
3. Look for Goldilocks
4. Better data usually beats bigger models
5. Choose your evaluation carefully
6. Remember Occam's Razor
1. Prediction Is A Lot Of Things

Forecast: predicting the value of an unknown variable at a time in the future
[Figure: example forecast chart, monthly values (0 to 110) from July to May]
Label: predict the value of an unknown variable associated with an object
[Figure: two image sets, one containing nerves and one not containing nerves]
Rank: predicting the propensity of somebody to take an action at a time in the future
[Figure: a population ordered from least likely to respond to most likely to respond]
"In data analytics a prediction is an assignment of a value to an unknown variable." Fundamentals of Machine Learning for Predictive Data Analytics, John Kelleher, Brian Mac Namee, and Aoife D'Arcy, www.machinelearningbook.com
Prediction means a lot of different things, which means we can apply predictive modelling to many different problems.
Think carefully about what type of decision you want to make (label, rank, or forecast), and then design a predictive modelling solution that best helps with that.
Lesson
2. There Is No Such Thing As A Free Lunch
www.rapidminer.com
"We have dubbed the associated results No Free Lunch theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems."
Wolpert & Macready
"No Free Lunch Theorems for Optimization", David H. Wolpert and William G. Macready, IEEE Transactions On Evolutionary Computation, vol. 1, no. 1, 1997 http://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf
[Figures: a dataset fit by a Tree Model, a Nearest Neighbour Model, and a Linear Model, illustrating how different algorithms produce very different decision boundaries on the same data]
There are a huge number of different predictive modelling algorithms. You need to experiment with lots of different ones.
Lesson
random forest, decision tree, isotonic regression, neural network, nearest neighbour, naive Bayes, support vector machine, logistic regression, Bayesian network, ensemble, gradient boosting, linear model, winnow
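The lesson above is easy to act on in code: hold the data fixed and swap learners in and out. A minimal sketch with two hand-rolled learners on synthetic data (everything here, data and learners alike, is illustrative, not from the talk):

```python
import random

random.seed(42)

def make_data(n=200):
    """Synthetic 2-D data: two classes separated by the line x + y = 1."""
    data = []
    for _ in range(n):
        x, y = random.random(), random.random()
        data.append(((x, y), 1 if x + y > 1 else 0))
    return data

def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    labels = [lab for _, lab in train]
    majority = max(set(labels), key=labels.count)
    return lambda point: majority

def one_nn_classifier(train):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(point):
        def sq_dist(item):
            (px, py), _ = item
            return (px - point[0]) ** 2 + (py - point[1]) ** 2
        return min(train, key=sq_dist)[1]
    return predict

def accuracy(model, test):
    return sum(model(p) == lab for p, lab in test) / len(test)

train, test = make_data(150), make_data(50)
for name, learner in [("majority", majority_classifier),
                      ("1-NN", one_nn_classifier)]:
    print(f"{name}: {accuracy(learner(train), test):.2f}")
```

On this data the nearest-neighbour learner wins easily, but the No Free Lunch theorems warn that no such ranking holds across all problems, which is exactly why the experiment loop matters.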
3. Look For Goldilocks
[Figures: the same small set of data points (Age vs Income) fit by models of increasing complexity, from underfitting, to a good fit, to overfitting]
[Figures: misclassification rate on the training set and validation set over 200 training iterations; training-set error keeps falling while validation-set error eventually rises again, revealing overfitting]
Always tune your models, but be very careful of overfitting. A validation dataset is crucial here.
Lesson
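The validation-set idea can be sketched concretely. Here a model parameter (k in k-nearest neighbours) is tuned on a separate validation set rather than on the training set, where k = 1 always looks perfect. All data and code are illustrative, not from the talk:

```python
import random

random.seed(0)

def make_point():
    """Noisy synthetic point: class boundary x + y = 1, 10% labels flipped."""
    x, y = random.random(), random.random()
    label = 1 if x + y > 1 else 0
    if random.random() < 0.1:
        label = 1 - label  # label noise makes overfitting possible
    return (x, y), label

train = [make_point() for _ in range(200)]
valid = [make_point() for _ in range(100)]

def knn_predict(point, k):
    def sq_dist(item):
        (px, py), _ = item
        return (px - point[0]) ** 2 + (py - point[1]) ** 2
    neighbours = sorted(train, key=sq_dist)[:k]
    votes = sum(lab for _, lab in neighbours)
    return 1 if votes * 2 > k else 0

def error_rate(data, k):
    return sum(knn_predict(p, k) != lab for p, lab in data) / len(data)

# k=1 gives zero error on the training set (every point is its own
# nearest neighbour), but that is pure overfitting: only the
# validation set reveals which k actually generalises.
print("training error at k=1:", error_rate(train, 1))
best_k = min([1, 3, 5, 9, 15], key=lambda k: error_rate(valid, k))
print("best k by validation error:", best_k)
```

The training-set score of the over-tuned model is exactly the misleading signal the curves on the previous slide warn about; the validation curve is the one to trust.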
4. Better Data Usually Beats Bigger Models
Digital Image Processing, Gonzalez & Woods, 2002
[Slide shows an excerpt and figures from: Sur and Grédiac, "Automated removal of quasiperiodic noise using frequency domain statistics", Journal of Electronic Imaging, 24(1), 2015. Fig. 11, Apollo experiment: (a) denoised image, (b) estimation of the noise, (c) close-up view of the noisy image, (d) close-up view of the denoised image.]
Activity signals: Raw Activity, Normalised Activity, Wake Aligned Activity, Cumulative Wake Aligned Activity
Derived features: Peak activity (day), Variation in activity (day), Total activity (day), Peak activity (1st hour), Variation in activity (1st hour), Total activity (1st hour), Area under cumulative activity curve, …
Choose An Algorithm
Generate Data
Tune Model Parameters
Developing new, richer features is often a better way to improve model performance than using more sophisticated modelling techniques.
Lesson
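The feature-engineering step in the activity example above can be sketched in a few lines. The raw signal here is hypothetical (one reading every 10 minutes after waking), but the derived features are the ones named on the slide:

```python
import statistics

# Hypothetical raw activity signal: one reading every 10 minutes
# after waking (18 readings = 3 hours)
raw_activity = [2, 3, 5, 8, 7, 6,   # first hour after waking
                5, 4, 4, 6, 9, 8,
                7, 5, 3, 2, 2, 1]

first_hour = raw_activity[:6]  # six 10-minute readings = 1st hour

# Cumulative wake-aligned activity curve
cumulative, total = [], 0
for reading in raw_activity:
    total += reading
    cumulative.append(total)

features = {
    "peak_activity": max(raw_activity),
    "variation_in_activity": statistics.stdev(raw_activity),
    "total_activity": sum(raw_activity),
    "peak_activity_1st_hour": max(first_hour),
    "variation_in_activity_1st_hour": statistics.stdev(first_hour),
    "total_activity_1st_hour": sum(first_hour),
    # crude area approximation: sum of the cumulative curve's values
    "area_under_cumulative_activity_curve": sum(cumulative),
}

for name, value in features.items():
    print(f"{name}: {value}")
```

A handful of summary features like these, fed to a simple model, often outperforms a far more elaborate model trained on the raw signal.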
An Aside On Deep Learning
[Figure: Google Trends interest in "Deep Learning", 2005 to 2015. Source: http://www.google.com/trends/]
"Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level."
[LeCun et al., 2015]
Deep Learning Yann LeCun, Yoshua Bengio & Geoffrey Hinton http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
[Figure: handwritten digits 0 to 9]
Convolutional neural networks seem to brilliantly address the selectivity-invariance dilemma that is fundamental to all efforts to learn to classify objects: they produce representations that are selective to the aspects of the image that are important for discrimination, but that are invariant to irrelevant aspects.
Convolutional networks hold records for problems in image recognition, speech recognition, and text classification, amongst other areas.
On Welsh Corgis, Computer Vision, and the Power of Deep Learning, Microsoft Research, 2014 http://research.microsoft.com/en-us/news/features/dnnvision-071414.aspx
Rise of the machines, The Economist, 2015 http://www.economist.com/news/briefing/21650526-artificial-intelligence-scares-peopleexcessively-so-rise-machines
Hardware, Data, Algorithms
Applications
5. Choose Your Evaluation Carefully
A marketing company working for a charity has developed two different models that predict the likelihood that donors will respond to a mail-shot asking them to make a special extra donation. Both models have been built and an evaluation experiment has been performed. Now we must decide which model to use.
Model 1 (Classification Accuracy: 85.93%)

                 Prediction
                 TRUE    FALSE
Target  TRUE     2355      337
        FALSE     329     1714

Model 2 (Classification Accuracy: 79.62%)

                 Prediction
                 TRUE    FALSE
Target  TRUE     2198      494
        FALSE     471     1572
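The accuracies above can be reproduced directly from the confusion matrices, and supplemented with measures arguably better matched to a mail-shot targeting decision. Precision and recall are an addition of this sketch, not taken from the slide:

```python
def metrics(tp, fn, fp, tn):
    """Compute evaluation measures from a binary confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # fraction of contacted donors who respond
        "recall": tp / (tp + fn),     # fraction of responders we actually reach
    }

# Counts taken from the confusion matrices on the slide
model1 = metrics(tp=2355, fn=337, fp=329, tn=1714)
model2 = metrics(tp=2198, fn=494, fp=471, tn=1572)

for name, m in [("Model 1", model1), ("Model 2", model2)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

A single accuracy figure hides the trade-off between wasted mailings (precision) and missed donors (recall), which is exactly why the evaluation measure should be chosen to match the decision being made.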
[Figures: further evaluation charts for Model 1 and Model 2]
There are many different performance measures that we can use to evaluate the performance of a model. You need to pick the one that best matches the decisions you are trying to make.
Lesson
6. Remember Occam's Razor
[Figure: Twitter data available for a user: Timeline, Followers, and Following, each with Tweets + Metadata and a Profile]
http://www.cso.ie/en/releasesandpublications/er/ibn/irishbabiesnames2014/
Always start with simple solutions first. Only add complexity if required.
Lesson
Frustra fit per plura quod potest fieri per pauciora (It is futile to do with more things that which can be done with fewer)
1. Prediction is a lot of things
2. There is no such thing as a free lunch
3. Look for Goldilocks
4. Better data usually beats bigger models
5. Choose your evaluation carefully
6. Remember Occam's Razor
Thank You! Questions?
Training Course: Fundamentals of Machine Learning for Predictive Data Analytics, Dublin, March 21st-23rd, www.theanalyticsstore.ie/training/
[email protected]
@brianmacnamee