Model selection and resampling methods
MSc Data Science, 2nd semester
Audience
Minimal background in mathematics and statistics
- analysis and calculus (integrals, derivatives, study of functions, …)
- basic statistical concepts (expectation, median, covariance, distributions, …)
Minimal knowledge of statistical modeling
- regression
- expectation, variance/covariance, statistical descriptors, …
Basic expertise with Python and Jupyter Notebook
- installing new packages
- writing basic code and running pipelines
- knowledge of standard libraries (numpy, pandas, scikit-learn)
The course
Based on lessons and notebooks. Additional reading material and references are provided at each lesson. All the material is available at the course website:
https://marcolorenzi.github.io/teaching.html
10 lessons
What is expected from you:
• Bi-weekly assignments: 4 in total, 10 points
• Final exam: 10 points
Why model selection?
Source: https://machinelearningmastery.com
The common denominator
Data
Hypothesis
Fancy way to combine data according to the hypothesis
Answer: prediction, data relationship, …
Machine Learning and Pigeons
“… a clock is arranged to present the food hopper at regular intervals with no reference whatsoever to the bird's behavior”.
- One bird was conditioned to turn counter-clockwise about the cage
- Another repeatedly thrust its head into one of the upper corners of the cage
- A third developed a ‘tossing’ response, as if placing its head beneath an invisible bar and lifting it repeatedly
- Two birds developed a pendulum motion of the head and body
- Another bird was conditioned to make incomplete pecking or brushing movements directed toward but not touching the floor
The bird happens to be executing some response as the hopper appears; as a result, it tends to repeat this response.
‘Superstition’ in the Pigeon
B. F. Skinner, Journal of Experimental Psychology #38, 1947
Models can be superstitious (and behave like pigeons)
They tend to enhance a behavior (prediction) when a positive response (data) is met. The “enhancing” ability depends on their assumptions:
A model is not true. A model provides an opinion:
- based on some sense of reality, in the form of data
- based on its own assumptions
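A superstitious model is easy to reproduce. The sketch below is our own illustration, not course material: it assumes synthetic data and a hand-written 1-nearest-neighbour classifier (`one_nn_predict` is a hypothetical helper). The labels are pure coin flips, so there is nothing to learn, yet the model "explains" its training data perfectly; only data it has never seen exposes the superstition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: labels are coin flips, independent of the features.
X_train = rng.standard_normal((200, 5))
y_train = rng.integers(0, 2, size=200)
X_test = rng.standard_normal((2000, 5))
y_test = rng.integers(0, 2, size=2000)

def one_nn_predict(X_ref, y_ref, X_query):
    """Hypothetical 1-NN classifier: copy the label of the closest reference point."""
    # Squared Euclidean distance between every query point and every reference point.
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    return y_ref[np.argmin(d2, axis=1)]

# On its own training data each point is its own nearest neighbour (distance 0).
train_acc = np.mean(one_nn_predict(X_train, y_train, X_train) == y_train)
# On fresh data the "learned" pattern is worth no more than guessing.
test_acc = np.mean(one_nn_predict(X_train, y_train, X_test) == y_test)
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

Like the pigeon, the model repeats whatever happened to coincide with a reward in its training data; the held-out accuracy is what reveals that the behavior is not grounded in anything real.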
The job of a data scientist is to determine whether an opinion can be trusted.
When different opinions are available, choosing among them is called model selection.
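A common empirical way to arbitrate between opinions is k-fold cross-validation. The following is a minimal sketch under stated assumptions: synthetic data (a noisy sine), polynomials of different degrees as the competing "opinions", and a hypothetical helper `kfold_mse`. Every candidate is scored on the same folds, so differences in error come from the models and not from the split.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0, 2 * np.pi, n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)  # hypothetical noisy data

def kfold_mse(x, y, degree, k=5, seed=0):
    """Hypothetical helper: k-fold CV estimate of the MSE of a degree-`degree` polynomial."""
    # A fixed seed gives the same folds for every candidate degree.
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)  # fit on k-1 folds
        pred = np.polyval(coefs, x[test_idx])                   # predict the held-out fold
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

degrees = [1, 3, 5, 9]
cv_mse = {d: kfold_mse(x, y, d) for d in degrees}
best_degree = min(cv_mse, key=cv_mse.get)
for d in degrees:
    print(f"degree {d}: CV MSE = {cv_mse[d]:.3f}")
print("selected degree:", best_degree)
```

The straight line is too rigid to follow the sine, while high degrees are free to chase the noise; cross-validation scores both failure modes on data the fit did not see.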
Protagoras (source: wikipedia.org)
Relativism: “… there is no absolute evaluation of the nature … because the evaluation will be relative to who is perceiving it. Therefore, to Person X, the weather is cold, whereas to Person Y, the weather is hot. This philosophy implies that there are no absolute "truths". The truth, according to Protagoras, is relative, and differs according to each individual”
Why model selection (2)
Tackling any machine learning problem (regression, classification, …):
1 - Writing some code implementing a model idea
2 - Getting the data from some repository
3 - Training the model:
   3a. Bug fixes, parameter hacking
   3b. Using all or some part of the data, cross-validating, collecting results, …
4 - Repeating point 3 many (many) times. Trying, trying, trying, …
5 - Once everything works reasonably, fixing all the parameters and processing steps
6 - Publishing/reporting the results
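Points 3 to 6 hide a trap: when many attempts are scored on the same data and only the best one is reported, the reported score is optimistically biased. The sketch below makes this concrete under deliberately extreme, hypothetical assumptions (not taken from the course): 200 candidate "models", each a one-feature threshold rule (`stump_predict` is a hypothetical helper), on a problem where the labels are pure noise, so every candidate's true accuracy is exactly 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_test, n_candidates = 50, 10_000, 200

# Hypothetical pure-noise problem: labels are independent of every feature.
X_val = rng.standard_normal((n_val, n_candidates))
y_val = rng.integers(0, 2, size=n_val)
X_test = rng.standard_normal((n_test, n_candidates))
y_test = rng.integers(0, 2, size=n_test)

def stump_predict(X, j):
    """Hypothetical candidate model j: predict class 1 when feature j is positive."""
    return (X[:, j] > 0).astype(int)

# Points 3-4: try every candidate on the same validation data and keep the best.
val_scores = np.array([np.mean(stump_predict(X_val, j) == y_val)
                       for j in range(n_candidates)])
best = int(np.argmax(val_scores))
best_val_score = val_scores[best]

# Points 5-6: the reported score vs. accuracy on data never used for selection.
test_score = np.mean(stump_predict(X_test, best) == y_test)
print(f"best validation accuracy: {best_val_score:.2f}")
print(f"fresh test accuracy:      {test_score:.2f}")
```

The gap between the two numbers is pure selection bias. Keeping a test set (or an outer cross-validation loop) completely outside the trial-and-error of points 3 to 5 is the usual safeguard.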
A case study: automated clinical diagnosis of Alzheimer’s disease
Mendelson et al., NeuroImage: Clinical, 2017
Standard approaches to model selection
- Empirical (part 1): jackknife, bootstrap, cross-validation, step-wise comparison
- Theoretical (part 2): information criteria, Bayesian model comparison
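As a first taste of the empirical approaches, here is a minimal jackknife sketch (our own illustration on a hypothetical sample, with a hypothetical helper `jackknife_se`): the statistic is recomputed n times, each time leaving one observation out, and the spread of these leave-one-out values gives a standard-error estimate. For the sample mean it reproduces the textbook formula s / sqrt(n) exactly, which makes a handy sanity check.

```python
import numpy as np

def jackknife_se(x, statistic):
    """Hypothetical helper: jackknife standard error of `statistic` on the 1-D sample `x`."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Recompute the statistic n times, leaving out observation i each time.
    loo = np.array([statistic(np.delete(x, i)) for i in range(n)])
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=40)  # hypothetical skewed sample

se_jack = jackknife_se(x, np.mean)
se_formula = x.std(ddof=1) / np.sqrt(len(x))
print(f"jackknife SE of the mean: {se_jack:.4f}")
print(f"s / sqrt(n):              {se_formula:.4f}")
```

The same function accepts any statistic, which is where the method earns its keep: a closed-form standard error rarely exists outside simple cases like the mean.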
Part 1. A focus on resampling methods.
“Pull yourself up by your bootstraps” … widely thought to be based on one of the eighteenth-century Adventures of Baron Munchausen, by Rudolf Erich Raspe.
Efron & Tibshirani, An Introduction to the Bootstrap, 1993
Key concept of data resampling: making the best of the available resources.
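The bootstrap makes the most of a single sample by resampling it: draw n observations with replacement, recompute the statistic, and repeat many times. A minimal sketch, assuming a hypothetical skewed sample and the median as the statistic; the interval shown is the simple percentile interval, one of several bootstrap interval constructions.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=1.0, size=41)  # hypothetical skewed sample
B = 2000                                          # number of bootstrap replicates

# Each replicate: resample n observations with replacement, recompute the median.
boot_medians = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                         for _ in range(B)])

se_boot = boot_medians.std(ddof=1)                          # bootstrap standard error
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])  # 95% percentile interval
print(f"sample median: {np.median(x):.3f}")
print(f"bootstrap SE:  {se_boot:.3f}")
print(f"95% percentile interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

No formula for the standard error of the median is needed: the empirical distribution of the sample stands in for the unknown population.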
At the end of the course you will be able to:
• Critically assess the performance of a model on a specified task
• Identify and prevent the sources of assessment bias
• Create your own benchmark for a variety of modeling problems
• Identify modeling alternatives and evaluation strategies
• Visualize and present performance across models
• Understand the basis of theoretical approaches to model selection
Questions?