All Models Are Wrong, But Some Are Useful: 6 Lessons for Making Predictive Analytics Work
TRANSCRIPT
Dr. Brian Mac Namee
[email protected]
@brianmacnamee
machine learning, artificial intelligence, data science, cognitive computing, big data, deep learning
Inspired by Brendan Tierney http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
if LOAN-SALARY RATIO < 1.5 then OUTCOME = 'repay'
else if LOAN-SALARY RATIO > 4 then OUTCOME = 'default'
else if AGE < 40 and OCCUPATION = 'industrial' then OUTCOME = 'default'
else OUTCOME = 'repay'
end if
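The rule set above translates directly into code. A minimal sketch in Python (the function name and example inputs are hypothetical; the thresholds and outcomes are taken from the slide):

```python
def predict_outcome(loan_salary_ratio, age, occupation):
    """Apply the loan-decision rules shown on the slide."""
    if loan_salary_ratio < 1.5:
        return "repay"
    elif loan_salary_ratio > 4:
        return "default"
    elif age < 40 and occupation == "industrial":
        return "default"
    else:
        return "repay"

# A few example applications of the rules
print(predict_outcome(1.2, 35, "industrial"))    # low ratio: repay
print(predict_outcome(5.0, 50, "professional"))  # high ratio: default
print(predict_outcome(2.0, 30, "industrial"))    # young industrial worker: default
print(predict_outcome(2.0, 45, "professional"))  # none of the rules fire: repay
```

A model like this is just a function from descriptive features to a prediction; machine learning algorithms learn such functions from data rather than having them hand-written.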
Fundamentals of Machine Learning for Predictive Data Analytics, John Kelleher, Brian Mac Namee, and Aoife D'Arcy, www.machinelearningbook.com
1. Prediction is a lot of things
2. There is no such thing as a free lunch
3. Look for Goldilocks
4. Better data usually beats bigger models
5. Choose your evaluation carefully
6. Remember Occam's Razor
1. Prediction Is A Lot Of Things

Forecast: predicting the value of an unknown variable at a time in the future
[Figure: example forecast chart, monthly values (0 to 110) from July to May]
Label: predict the value of an unknown variable associated with an object
[Figure: two image sets, one containing nerves and one not containing nerves]
Rank: predicting the propensity of somebody to take an action at a time in the future
[Figure: a population ordered from least likely to respond to most likely to respond]
"In data analytics a prediction is an assignment of a value to an unknown variable." Fundamentals of Machine Learning for Predictive Data Analytics, John Kelleher, Brian Mac Namee, and Aoife D'Arcy, www.machinelearningbook.com
Prediction means a lot of different things, which means we can apply predictive modelling to many different problems.
Think carefully about what type of decision you want to make (label, rank, or forecast), and then design a predictive modelling solution that best helps with that.
Lesson
2. There Is No Such Thing As A Free Lunch
www.rapidminer.com
"We have dubbed the associated results No Free Lunch theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems."
Wolpert & Macready
"No Free Lunch Theorems for Optimization", David H. Wolpert and William G. Macready, IEEE Transactions On Evolutionary Computation, vol. 1, no. 1, 1997 http://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf
[Figures: a dataset fit by a Tree Model, a Nearest Neighbour Model, and a Linear Model, illustrating how different algorithms produce very different decision boundaries on the same data]
There are a huge number of different predictive modelling algorithms. You need to experiment with lots of different ones.
Lesson
random forest, decision tree, isotonic regression, neural network, nearest neighbour, naive Bayes, support vector machine, logistic regression, Bayesian network, ensemble, gradient boosting, linear model, winnow
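The lesson above is easy to act on in code: hold the data fixed and swap learners in and out. A minimal sketch with two hand-rolled learners on synthetic data (everything here, data and learners alike, is illustrative, not from the talk):

```python
import random

random.seed(42)

def make_data(n=200):
    """Synthetic 2-D data: two classes separated by the line x + y = 1."""
    data = []
    for _ in range(n):
        x, y = random.random(), random.random()
        data.append(((x, y), 1 if x + y > 1 else 0))
    return data

def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    labels = [lab for _, lab in train]
    majority = max(set(labels), key=labels.count)
    return lambda point: majority

def one_nn_classifier(train):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(point):
        def sq_dist(item):
            (px, py), _ = item
            return (px - point[0]) ** 2 + (py - point[1]) ** 2
        return min(train, key=sq_dist)[1]
    return predict

def accuracy(model, test):
    return sum(model(p) == lab for p, lab in test) / len(test)

train, test = make_data(150), make_data(50)
for name, learner in [("majority", majority_classifier),
                      ("1-NN", one_nn_classifier)]:
    print(f"{name}: {accuracy(learner(train), test):.2f}")
```

On this data the nearest-neighbour learner wins easily, but the No Free Lunch theorems warn that no such ranking holds across all problems, which is exactly why the experiment loop matters.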
3. Look For Goldilocks
[Figures: the same small set of data points (Age vs Income) fit by models of increasing complexity, from underfitting, to a good fit, to overfitting]
[Figures: misclassification rate on the training set and validation set over 200 training iterations; training-set error keeps falling while validation-set error eventually rises again, revealing overfitting]
Always tune your models, but be very careful of overfitting. A validation dataset is crucial here.
Lesson
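The validation-set idea can be sketched concretely. Here a model parameter (k in k-nearest neighbours) is tuned on a separate validation set rather than on the training set, where k = 1 always looks perfect. All data and code are illustrative, not from the talk:

```python
import random

random.seed(0)

def make_point():
    """Noisy synthetic point: class boundary x + y = 1, 10% labels flipped."""
    x, y = random.random(), random.random()
    label = 1 if x + y > 1 else 0
    if random.random() < 0.1:
        label = 1 - label  # label noise makes overfitting possible
    return (x, y), label

train = [make_point() for _ in range(200)]
valid = [make_point() for _ in range(100)]

def knn_predict(point, k):
    def sq_dist(item):
        (px, py), _ = item
        return (px - point[0]) ** 2 + (py - point[1]) ** 2
    neighbours = sorted(train, key=sq_dist)[:k]
    votes = sum(lab for _, lab in neighbours)
    return 1 if votes * 2 > k else 0

def error_rate(data, k):
    return sum(knn_predict(p, k) != lab for p, lab in data) / len(data)

# k=1 gives zero error on the training set (every point is its own
# nearest neighbour), but that is pure overfitting: only the
# validation set reveals which k actually generalises.
print("training error at k=1:", error_rate(train, 1))
best_k = min([1, 3, 5, 9, 15], key=lambda k: error_rate(valid, k))
print("best k by validation error:", best_k)
```

The training-set score of the over-tuned model is exactly the misleading signal the curves on the previous slide warn about; the validation curve is the one to trust.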
4. Better Data Usually Beats Bigger Models
Digital Image Processing, Gonzalez & Woods, 2002
[Slide shows an excerpt and figures from: Sur and Grédiac, "Automated removal of quasiperiodic noise using frequency domain statistics", Journal of Electronic Imaging, 24(1), 2015. Fig. 11, Apollo experiment: (a) denoised image, (b) estimation of the noise, (c) close-up view of the noisy image, (d) close-up view of the denoised image.]
Activity signals: Raw Activity, Normalised Activity, Wake Aligned Activity, Cumulative Wake Aligned Activity
Derived features: Peak activity (day), Variation in activity (day), Total activity (day), Peak activity (1st hour), Variation in activity (1st hour), Total activity (1st hour), Area under cumulative activity curve, …
Choose An Algorithm
Generate Data
Tune Model Parameters
Developing new, richer features is often a better way to improve model performance than using more sophisticated modelling techniques.
Lesson
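The feature-engineering step in the activity example above can be sketched in a few lines. The raw signal here is hypothetical (one reading every 10 minutes after waking), but the derived features are the ones named on the slide:

```python
import statistics

# Hypothetical raw activity signal: one reading every 10 minutes
# after waking (18 readings = 3 hours)
raw_activity = [2, 3, 5, 8, 7, 6,   # first hour after waking
                5, 4, 4, 6, 9, 8,
                7, 5, 3, 2, 2, 1]

first_hour = raw_activity[:6]  # six 10-minute readings = 1st hour

# Cumulative wake-aligned activity curve
cumulative, total = [], 0
for reading in raw_activity:
    total += reading
    cumulative.append(total)

features = {
    "peak_activity": max(raw_activity),
    "variation_in_activity": statistics.stdev(raw_activity),
    "total_activity": sum(raw_activity),
    "peak_activity_1st_hour": max(first_hour),
    "variation_in_activity_1st_hour": statistics.stdev(first_hour),
    "total_activity_1st_hour": sum(first_hour),
    # crude area approximation: sum of the cumulative curve's values
    "area_under_cumulative_activity_curve": sum(cumulative),
}

for name, value in features.items():
    print(f"{name}: {value}")
```

A handful of summary features like these, fed to a simple model, often outperforms a far more elaborate model trained on the raw signal.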
An Aside On Deep Learning
[Figure: Google Trends interest in "Deep Learning", 2005 to 2015. Source: http://www.google.com/trends/]
"Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level."
[LeCun et al., 2015]
Deep Learning Yann LeCun, Yoshua Bengio & Geoffrey Hinton http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
[Figure: handwritten digits 0 to 9]
Convolutional neural networks seem to brilliantly address the selectivity-invariance dilemma that is fundamental to all efforts to learn to classify objects: they produce representations that are selective to the aspects of the image that are important for discrimination, but that are invariant to irrelevant aspects.
Convolutional networks hold records for problems in image recognition, speech recognition, and text classification, amongst other areas.
On Welsh Corgis, Computer Vision, and the Power of Deep Learning, Microsoft Research, 2014 http://research.microsoft.com/en-us/news/features/dnnvision-071414.aspx
Rise of the machines, The Economist, 2015 http://www.economist.com/news/briefing/21650526-artificial-intelligence-scares-peopleexcessively-so-rise-machines
Hardware, Data, Algorithms
Applications
5. Choose Your Evaluation Carefully
A marketing company working for a charity has developed two different models that predict the likelihood that donors will respond to a mail-shot asking them to make a special extra donation. Both models have been built and an evaluation experiment has been performed. Now we must decide which model to use.
Model 1 (Classification Accuracy: 85.93%)

                 Prediction
                 TRUE    FALSE
Target  TRUE     2355      337
        FALSE     329     1714

Model 2 (Classification Accuracy: 79.62%)

                 Prediction
                 TRUE    FALSE
Target  TRUE     2198      494
        FALSE     471     1572
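The accuracies above can be reproduced directly from the confusion matrices, and supplemented with measures arguably better matched to a mail-shot targeting decision. Precision and recall are an addition of this sketch, not taken from the slide:

```python
def metrics(tp, fn, fp, tn):
    """Compute evaluation measures from a binary confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # fraction of contacted donors who respond
        "recall": tp / (tp + fn),     # fraction of responders we actually reach
    }

# Counts taken from the confusion matrices on the slide
model1 = metrics(tp=2355, fn=337, fp=329, tn=1714)
model2 = metrics(tp=2198, fn=494, fp=471, tn=1572)

for name, m in [("Model 1", model1), ("Model 2", model2)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

A single accuracy figure hides the trade-off between wasted mailings (precision) and missed donors (recall), which is exactly why the evaluation measure should be chosen to match the decision being made.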
[Figures: further evaluation charts for Model 1 and Model 2]
There are many different performance measures that we can use to evaluate the performance of a model. You need to pick the one that best matches the decisions you are trying to make.
Lesson
6. Remember Occam's Razor
[Figure: Twitter data available for a user: Timeline, Followers, and Following, each with Tweets + Metadata and a Profile]
http://www.cso.ie/en/releasesandpublications/er/ibn/irishbabiesnames2014/
Always start with simple solutions first. Only add complexity if required.
Lesson
Frustra fit per plura quod potest fieri per pauciora (It is futile to do with more things that which can be done with fewer)
1. Prediction is a lot of things
2. There is no such thing as a free lunch
3. Look for Goldilocks
4. Better data usually beats bigger models
5. Choose your evaluation carefully
6. Remember Occam's Razor
Thank You! Questions?
Training Course: Fundamentals of Machine Learning for Predictive Data Analytics, Dublin, March 21st-23rd, www.theanalyticsstore.ie/training/
[email protected]
@brianmacnamee