supervised learning - penn engineering · 2019. 1. 22. · supervised learning : examples §...
Supervised Learning
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
The Badges Game
Background:
• Pre-registered attendees at the 1994 Machine Learning Conference received a name badge labeled with a "+" or "-"
• The label is based only upon the name
• There are 294 examples (210 positive and 84 negative)
What function was used to generate the +/- labeling?
+ Naoki Abe
- Eric Baum
Training Data
+ Naoki Abe
- Myriam Abramson
+ David W. Aha
+ Kamal M. Ali
- Eric Allender
+ Dana Angluin
- Chidanand Apte
+ Minoru Asada
+ Lars Asker
+ Javed Aslam
+ Jose L. Balcazar
- Cristina Baroglio
+ Peter Bartlett
- Eric Baum
+ Welton Becket
- Shai Ben-David
+ George Berg
+ Neil Berkman
+ Malini Bhandaru
+ Bir Bhanu
+ Reinhard Blasig
- Avrim Blum
- Anselm Blumer
+ Justin Boyan
+ Carla E. Brodley
+ Nader Bshouty
- Wray Buntine
- Andrey Burago
+ Tom Bylander
+ Bill Byrne
- Claire Cardie
+ John Case
+ Jason Catlett
- Philip Chan
- Zhixiang Chen
- Chris Darken
Test Data
? Shivani Agarwal
? Chris Callison-Burch
? Eric Eaton
? Peter Stone
? Matthew Taylor
Labeled Test Data
- Shivani Agarwal
- Chris Callison-Burch
- Eric Eaton
+ Peter Stone
+ Matthew Taylor
What is Learning?
• The Badges Game is an example of a key learning protocol: supervised learning
• First question: Are you sure you got it? Why?
• Issues:
– Which problem was easier: prediction or modeling?
– Representation
– Problem setting
– Background Knowledge
– When did learning take place?
Algorithm: can you write a program that takes this data as input and predicts the label for your name?
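One way to approach this question is to guess a candidate rule and check it against the labeled names. The sketch below tests one hypothetical rule, "the second letter of the first name is a vowel", on a few names copied from the training and test slides. The rule and the function name `predict_badge` are assumptions for illustration; the slides do not state which function generated the labels.

```python
# Hypothetical candidate rule (an assumption, not stated in the slides):
# predict "+" when the second letter of the first name is a vowel.
VOWELS = set("aeiou")

def predict_badge(name: str) -> str:
    first_name = name.split()[0].lower()
    return "+" if len(first_name) > 1 and first_name[1] in VOWELS else "-"

# A few labeled names copied from the training and test slides.
labeled = [
    ("+", "Naoki Abe"), ("-", "Eric Baum"), ("-", "Myriam Abramson"),
    ("+", "David W. Aha"), ("-", "Shai Ben-David"), ("+", "Justin Boyan"),
    ("-", "Shivani Agarwal"), ("+", "Peter Stone"), ("+", "Matthew Taylor"),
]

# Count how many names the candidate rule labels correctly.
correct = sum(predict_badge(name) == label for label, name in labeled)
print(f"{correct} of {len(labeled)} names predicted correctly")
```

Agreement on a sample does not prove that the rule is the true labeling function; other rules may fit the same names.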
Supervised Learning
Input: x ∈ X, an item x drawn from an input space X
System: y = f(x)
Output: y ∈ Y, an item y drawn from an output space Y
• We consider systems that apply an unknown function f() to input items x and return an output y = f(x).
• In (supervised) machine learning, our goal is to learn a function h() from examples that approximates f()
Supervised learning
Input: x ∈ X, an item x drawn from an instance space X
Target function: y = f(x)
Learned Model: y = h(x)
Output: y ∈ Y, an item y drawn from a label space Y
Supervised learning: Training
• Give the learner examples in D train
• The learner returns a model h(x)
Labeled Training Data D train: (x1, y1), (x2, y2), …, (xN, yN)
Learning Algorithm
Learned model h(x)
Can you suggest other learning protocols?
h(x) is the model we’ll use in our application
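The training step can be read as a function that takes D train and returns a model h. A minimal sketch, assuming a 1-nearest-neighbor learner over numeric feature vectors; the choice of learner, the name `learn`, and the toy data are illustrative assumptions, not something the slides prescribe.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # one labeled pair (x, y)

def learn(d_train: List[Example]) -> Callable[[Sequence[float]], str]:
    """Hypothetical learning algorithm: returns a 1-nearest-neighbor model h."""
    def h(x: Sequence[float]) -> str:
        # Squared Euclidean distance from x to a training instance.
        def dist(example: Example) -> float:
            return sum((a - b) ** 2 for a, b in zip(example[0], x))
        # Predict the label of the closest training instance.
        return min(d_train, key=dist)[1]
    return h

# Toy training data (made up for illustration).
d_train = [((0.0, 0.0), "-"), ((0.2, 0.1), "-"),
           ((1.0, 1.0), "+"), ((0.9, 1.2), "+")]
h = learn(d_train)
print(h((0.1, 0.0)), h((1.1, 0.9)))
```

The returned `h` is an ordinary function, so the application only needs the model, not the training data or the learning algorithm.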
Function Approximation
Problem Setting
• Set of possible instances $X$
• Set of possible labels $Y$
• Unknown target function $f : X \to Y$
• Set of function hypotheses $H = \{h \mid h : X \to Y\}$
Input: Training examples of unknown target function f
$\{\langle x_i, y_i \rangle\}_{i=1}^{n} = \{\langle x_1, y_1 \rangle, \ldots, \langle x_n, y_n \rangle\}$
Output: Hypothesis $h \in H$ that best approximates f
Based on slide by Tom Mitchell
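The problem setting can be sketched as a search over a small, explicit hypothesis set H for the h with the fewest training errors. The three threshold hypotheses, the toy examples, and the name `best_hypothesis` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Callable[[int], int]

# A tiny hypothetical hypothesis set H of threshold functions h : X -> Y,
# with X = integers and Y = {0, 1}.
H: Dict[str, Hypothesis] = {
    "x >= 2": lambda x: int(x >= 2),
    "x >= 5": lambda x: int(x >= 5),
    "x >= 8": lambda x: int(x >= 8),
}

def training_errors(h: Hypothesis, examples: List[Tuple[int, int]]) -> int:
    """Number of training examples <x, y> on which h(x) != y."""
    return sum(h(x) != y for x, y in examples)

def best_hypothesis(examples: List[Tuple[int, int]]) -> str:
    """Name of the hypothesis in H with the fewest training errors."""
    return min(H, key=lambda name: training_errors(H[name], examples))

# Training examples from a target f that the learner never sees directly
# (here f is x >= 4, which is not a member of H).
examples = [(1, 0), (3, 0), (5, 1), (6, 1), (9, 1)]
print(best_hypothesis(examples))
```

The selected hypothesis fits every training example even though the target itself is not in H, which is the sense in which h only approximates f.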
Sample Dataset
• Columns denote features Xi
• Rows denote labeled instances $\langle x_i, y_i \rangle$
• Class label denotes whether a tennis game was played
Supervised learning: Testing
• Reserve some labeled data for testing
Labeled Test Data D test: (x’1, y’1), (x’2, y’2), …, (x’M, y’M)
Raw Test Data X test: x’1, x’2, …, x’M
Test Labels Y test: y’1, y’2, …, y’M
• Apply the model to the raw test data
• Evaluate by comparing predicted labels against the test labels
Learned model h(x)
Predicted Labels h(X test): h(x’1), h(x’2), …, h(x’M)
Can you use the test data otherwise?
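The testing protocol can be sketched as follows: split the labeled data, keep the test labels away from the model, and compare its predictions against them. The majority-class learner, the 75/25 split, and the toy data are assumptions chosen only to keep the example short.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (x, y)

def learn_majority(d_train: List[Example]) -> Callable[[str], str]:
    """Hypothetical baseline learner: always predict the most common training label."""
    majority = Counter(y for _, y in d_train).most_common(1)[0][0]
    return lambda x: majority

def accuracy(h: Callable[[str], str], d_test: List[Example]) -> float:
    x_test = [x for x, _ in d_test]      # raw test data X test
    y_test = [y for _, y in d_test]      # test labels Y test
    predicted = [h(x) for x in x_test]   # predicted labels h(X test)
    return sum(p == y for p, y in zip(predicted, y_test)) / len(y_test)

# Toy labeled data (made up); reserve the last quarter for testing.
data = [("a", "+"), ("b", "+"), ("c", "-"), ("d", "+"),
        ("e", "+"), ("f", "-"), ("g", "+"), ("h", "-")]
split = int(0.75 * len(data))
d_train, d_test = data[:split], data[split:]

model = learn_majority(d_train)
print(f"test accuracy: {accuracy(model, d_test):.2f}")
```

The learner only ever sees `d_train`; the test labels are used solely inside `accuracy`.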
Supervised Learning: Examples
§ Disease diagnosis
  § x: Properties of patient (symptoms, lab tests)
  § f: Disease (or maybe: recommended therapy)
§ Part-of-Speech tagging
  § x: An English sentence (e.g., The can will rust)
  § f: The part of speech of a word in the sentence
§ Face recognition
  § x: Bitmap picture of person’s face
  § f: Name the person (or maybe: a property of)
§ Automatic Steering
  § x: Bitmap picture of road surface in front of car
  § f: Degrees to turn the steering wheel
Many problems that do not seem like classification problems can be decomposed into classification problems.
Key Issues in Machine Learning
• Modeling
– How to formulate application problems as machine learning problems?
– How to represent the data?
– Learning Protocols (where is the data & labels coming from?)
• Representation
– What functions should we learn (hypothesis spaces)?
– How to map raw input to an instance space?
– Any rigorous way to find these? Any general approach?
• Algorithms
– What are good algorithms?
– How do we define success?
– Generalization vs. overfitting
– The computational problem
Using supervised learning
§ What is our instance space?
  § What kind of features are we using?
§ What is our label space?
  § What kind of learning task are we dealing with?
§ What is our hypothesis space?
  § What kind of functions (models) are we learning?
§ What learning algorithm do we use?
  § How do we learn the model from the labeled data?
§ What is our loss function/evaluation metric?
  § How do we measure success? What drives learning?
1. The instance space X
Input: x ∈ X, an item x drawn from an instance space X
Learned Model: h(x)
Output: y ∈ Y, an item y drawn from a label space Y
• Designing an appropriate instance space X is crucial for how well we can predict y.
1. The instance space X
§ When we apply machine learning to a task, we first need to define the instance space X.
§ Instances x ∈ X are defined by features:
  § Boolean features:
    § Is there a folder named after the sender?
    § Does this email contain the word ‘class’?
    § Does this email contain the word ‘waiting’?
    § Does this email contain the word ‘class’ and the word ‘waiting’? (Does it add anything?)
  § Numerical features:
    § How often does ‘learning’ occur in this email?
    § How long is the email?
    § How many emails have I seen from this sender over the last day/week/month?
  § Bag of tokens
    § Just list all the tokens in the input
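The three kinds of features listed for the email example can be sketched as a single extraction function. Whitespace tokenization, measuring length in tokens, and the name `email_features` are simplifying assumptions; features such as folder names or sender history would need data that a raw message does not contain.

```python
from collections import Counter
from typing import Dict, Union

def email_features(text: str) -> Dict[str, Union[bool, int]]:
    tokens = text.lower().split()   # naive whitespace tokenization (assumption)
    counts = Counter(tokens)        # bag of tokens: token -> number of occurrences
    return {
        # Boolean features
        "contains_class": "class" in counts,
        "contains_waiting": "waiting" in counts,
        "contains_class_and_waiting": "class" in counts and "waiting" in counts,
        # Numerical features
        "count_learning": counts["learning"],
        "length_in_tokens": len(tokens),
    }

features = email_features("Waiting for the machine learning class to start")
print(features)
```

The conjunction feature is fully determined by the two single-word features, which is one way to read the question of whether it adds anything: it only helps a model that cannot form the conjunction on its own.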
What’s X for the Badges game?
§ Possible features:
  § Gender
  § Name’s country-of-origin
  § Length of their first or last name
  § Does the name contain letter ‘x’?
  § How many vowels does their name contain?
  § Is the n-th letter a vowel?
  § Does the name have the same number of vowels and consonants?
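Several of the name-based features can be computed directly from the badge text. A minimal sketch, assuming that 'y' counts as a consonant and that only alphabetic characters matter; gender and country-of-origin are omitted because they cannot be derived from the string alone. The function name `name_features` is hypothetical.

```python
from typing import List

VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant

def name_features(name: str, n: int = 2) -> List[int]:
    """Map a name to a small numeric feature vector."""
    letters = [c for c in name.lower() if c.isalpha()]
    num_vowels = sum(c in VOWELS for c in letters)
    first_name = name.split()[0]
    return [
        len(first_name),                                       # length of the first name
        int("x" in letters),                                   # contains the letter 'x'?
        num_vowels,                                            # number of vowels
        int(len(letters) >= n and letters[n - 1] in VOWELS),   # is the n-th letter a vowel?
        int(num_vowels == len(letters) - num_vowels),          # as many vowels as consonants?
    ]

print(name_features("Naoki Abe"))
print(name_features("Eric Baum"))
```

Each name becomes a fixed-length list of numbers, so every instance can be handled as a point in the same feature space.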
X as a vector space
§ X is an N-dimensional vector space (e.g. $\mathbb{R}^N$)
  § Each dimension = one feature.
§ Each x is a feature vector (hence the boldface x).
§ Think of x = [x1 … xN] as a point in X:
[Figure: a point x plotted against the axes x1 and x2]
Good features are essential
§ The choice of features is crucial for how well a task can be learned
  § In many application areas (language, vision, etc.), a lot of work goes into designing suitable features
  § This requires domain expertise
§ Think about the badges game – what if you were focusing on visual features?
§ We can’t teach you what specific features to use for your task
  § But we will touch on some general principles
2. The label space Y
Input: x ∈ X, an item x drawn from an instance space X
Learned Model: h(x)
Output: y ∈ Y, an item y drawn from a label space Y
• The label space Y determines what kind of supervised learning task we are dealing with
Supervised learning tasks I
§ Output labels y ∈ Y are categorical:
  § Binary classification: Two possible labels
  § Multi-class classification: k possible labels
§ Output labels y ∈ Y are structured objects (sequences of labels, parse trees, etc.)
  § Structure learning
Supervised learning tasks II
§ Output labels y ∈ Y are numerical:
  § Regression (linear/polynomial):
    § Labels are continuous-valued
    § Learn a linear/polynomial function f(x)
  § Ranking:
    § Labels are ordinal
    § Learn an ordering f(x1) > f(x2) over input
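For the regression case, learning a linear function from continuous-valued labels can be sketched with ordinary least squares on a single input feature. The closed-form fit below and the toy data are illustrative assumptions; the slides do not commit to a particular fitting method.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = w * x + b with one input feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(f"h(x) = {w:.1f} * x + {b:.1f}")
```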
3. The model h(x)
Input: x ∈ X, an item x drawn from an instance space X
Learned Model: h(x)
Output: y ∈ Y, an item y drawn from a label space Y
• We need to choose what kind of model we want to learn
A Learning Problem
Unknown function: y = f(x1, x2, x3, x4)
Example  x1  x2  x3  x4  y
1        0   0   1   0   0
2        0   1   0   0   0
3        0   0   1   1   1
4        1   0   0   1   1
5        0   1   1   0   0
6        1   1   0   0   0
7        0   1   0   1   0
Can you learn this function? What is it?
Hypothesis Space
Complete Ignorance: There are $2^{16} = 65536$ possible functions over four input features.
We can’t figure out which one is correct until we’ve seen every possible input-output pair.
After observing seven examples we still have $2^9$ possibilities for f.
Is Learning Possible?
Example  x1  x2  x3  x4  y
1        0   0   0   0   ?
2        0   0   0   1   ?
3        0   0   1   0   0
4        0   0   1   1   1
5        0   1   0   0   0
6        0   1   0   1   0
7        0   1   1   0   0
8        0   1   1   1   ?
9        1   0   0   0   ?
10       1   0   0   1   1
11       1   0   1   0   ?
12       1   0   1   1   ?
13       1   1   0   0   0
14       1   1   0   1   ?
15       1   1   1   0   ?
16       1   1   1   1   ?
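§ There are $|Y|^{|X|}$ possible functions f(x) from the instance space X to the label space Y.
§ Learners typically consider only a subset of the functions from X to Y, called the hypothesis space H: $H \subseteq Y^X$, where $Y^X$ denotes the set of all functions from X to Y.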
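The counts for the four-feature learning problem can be checked by brute force: enumerate every Boolean function over four inputs and keep those that agree with the seven observed examples. The variable names are illustrative; the observed examples are copied from the learning-problem table.

```python
from itertools import product

# The seven observed examples: (x1, x2, x3, x4) -> y.
observed = {
    (0, 0, 1, 0): 0, (0, 1, 0, 0): 0, (0, 0, 1, 1): 1, (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0, (1, 1, 0, 0): 0, (0, 1, 0, 1): 0,
}

inputs = list(product([0, 1], repeat=4))   # all 16 possible inputs
total = 0
consistent = 0
# A Boolean function over four inputs is one choice of output bit per input row.
for outputs in product([0, 1], repeat=len(inputs)):
    total += 1
    f = dict(zip(inputs, outputs))
    if all(f[x] == y for x, y in observed.items()):
        consistent += 1

print(total, consistent)
```

Seven of the sixteen rows are fixed by the data, so each of the remaining nine rows can still be labeled either way, which is where the count of consistent functions comes from.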
General strategies for Machine Learning
§ Develop flexible hypothesis spaces:
  § Decision trees, neural networks, nested collections.
  § Constraining the hypothesis space is done algorithmically
§ Develop representation languages for restricted classes of functions:
  § Serve to limit the expressivity of the target models
  § E.g., Functional representation (n-of-m); Grammars; linear functions; stochastic models;
  § Get flexibility by augmenting the feature space
§ In either case:
  § Develop algorithms for finding a hypothesis in our hypothesis space, that fits the data
  § And hope that they will generalize well