machine learning – some basics - bishop prml ch....
TRANSCRIPT
![Page 1: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/1.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Machine Learning – some basicsBishop PRML Ch. 1
Torsten Möller
![Page 2: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/2.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Outline
Machine Learning: What, Why, and How?
Curve Fitting: (e.g.) Regression and Model Selection
Goal Function: Decision Theory—ML, Loss Function, MAP
Probability Theory: (e.g.) Coin Tossing and Parameter Estimation
Conclusion
Machine Learning–The basics ©Möller / Mori 1
![Page 3: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/3.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
References
• Christopher M. Bishop: Pattern Recognition and MachineLearning; Springer, 2007
• Han, Kamber: Data Mining: Concepts and Techniques;Elsevier, 2012
• Hastie, Tibshirani, Friedman: The Elements of StatisticalLearning; Springer, 2009
• James, Witten, Hastie, Tibshirani: An Introduction toStatistical Learning with Applications in R; Springer, 2015
• Kevin Murphy: Machine Learning, A ProbabilisticPerspective, The MIT Press, 2012
• Tan, Steinbach, Kumar: Introduction to Data Mining;Pearson, 2005
Machine Learning–The basics ©Möller / Mori 2
![Page 4: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/4.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Outline
Machine Learning: What, Why, and How?
Curve Fitting: (e.g.) Regression and Model Selection
Goal Function: Decision Theory—ML, Loss Function, MAP
Probability Theory: (e.g.) Coin Tossing and Parameter Estimation
Conclusion
Machine Learning–The basics ©Möller / Mori 3
![Page 5: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/5.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
What is Machine Learning (ML)?
• Algorithms that automatically improve performance throughexperience
• Often this means define a model by hand, and use data to fitits parameters
Machine Learning–The basics ©Möller / Mori 4
![Page 6: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/6.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Why ML?
• The real world is complex – difficult to hand-craft solutions.• ML is the preferred framework for applications in many fields:
• Computer Vision• Natural Language Processing, Speech Recognition• Robotics• …
Machine Learning–The basics ©Möller / Mori 5
![Page 7: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/7.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Hand-written Digit Recognition
!"#$%&'"()#*$#+ !,- )( !.,$#/+ +%((/#/01/! 233456 334
7/#()#-! 8/#9 */:: )0 "'%! /$!9 +$"$;$!/ +,/ ") "'/ :$1< )(
8$#%$"%)0 %0 :%&'"%0& =>?@ 2ABC D,!" -$</! %" ($!"/#56E'/ 7#)")"97/ !/:/1"%)0 $:&)#%"'- %! %::,!"#$"/+ %0 F%&6 GH6
C! !//0I 8%/*! $#/ $::)1$"/+ -$%0:9 ()# -)#/ 1)-7:/J1$"/&)#%/! *%"' '%&' *%"'%0 1:$!! 8$#%$;%:%"96 E'/ 1,#8/-$#</+ 3BK7#)") %0 F%&6 L !')*! "'/ %-7#)8/+ 1:$!!%(%1$"%)07/#()#-$01/ ,!%0& "'%! 7#)")"97/ !/:/1"%)0 !"#$"/&9 %0!"/$+)( /.,$::9K!7$1/+ 8%/*!6 M)"/ "'$" */ );"$%0 $ >6? 7/#1/0"
/##)# #$"/ *%"' $0 $8/#$&/ )( )0:9 (),# "*)K+%-/0!%)0$:
8%/*! ()# /$1' "'#//K+%-/0!%)0$: );D/1"I "'$0<! ") "'/
(:/J%;%:%"9 7#)8%+/+ ;9 "'/ -$"1'%0& $:&)#%"'-6
!"# $%&'() *+,-. */0+12.33. 4,3,5,6.
N,# 0/J" /J7/#%-/0" %08):8/! "'/ OAPQKR !'$7/ !%:'),/""/
+$"$;$!/I !7/1%(%1$::9 B)#/ PJ7/#%-/0" BPK3'$7/KG 7$#" SI
*'%1' -/$!,#/! 7/#()#-$01/ )( !%-%:$#%"9K;$!/+ #/"#%/8$:
=>T@6 E'/ +$"$;$!/ 1)0!%!"! )( GI?HH %-$&/!U RH !'$7/
1$"/&)#%/!I >H %-$&/! 7/# 1$"/&)#96 E'/ 7/#()#-$01/ %!
-/$!,#/+ ,!%0& "'/ !)K1$::/+ V;,::!/9/ "/!"IW %0 *'%1' /$1'
!"# $%%% &'()*(+&$,)* ,) -(&&%') ()(./*$* ()0 1(+2$)% $)&%..$3%)+%4 5,.6 784 ),6 784 (-'$. 7997
:;<6 #6 (== >? @AB C;DE=FDD;?;BG 1)$*& @BD@ G;<;@D HD;I< >HJ CB@A>G KLM >H@ >? "94999N6 &AB @BO@ FP>QB BFEA G;<;@ ;IG;EF@BD @AB BOFCR=B IHCPBJ
?>==>SBG PT @AB @JHB =FPB= FIG @AB FDD;<IBG =FPB=6
:;<6 U6 M0 >PVBE@ JBE><I;@;>I HD;I< @AB +,$.W79 GF@F DB@6 +>CRFJ;D>I >?@BD@ DB@ BJJ>J ?>J **04 *AFRB 0;D@FIEB K*0N4 FIG *AFRB 0;D@FIEB S;@A!!"#$%&$' RJ>@>@TRBD K*0WRJ>@>N QBJDHD IHCPBJ >? RJ>@>@TRB Q;BSD6 :>J**0 FIG *04 SB QFJ;BG @AB IHCPBJ >? RJ>@>@TRBD HI;?>JC=T ?>J F==>PVBE@D6 :>J *0WRJ>@>4 @AB IHCPBJ >? RJ>@>@TRBD RBJ >PVBE@ GBRBIGBG >I@AB S;@A;IW>PVBE@ QFJ;F@;>I FD SB== FD @AB PB@SBBIW>PVBE@ D;C;=FJ;@T6
:;<6 "96 -J>@>@TRB Q;BSD DB=BE@BG ?>J @S> G;??BJBI@ M0 >PVBE@D ?J>C @AB+,$. GF@F DB@ HD;I< @AB F=<>J;@AC GBDEJ;PBG ;I *BE@;>I !676 X;@A @A;DFRRJ>FEA4 Q;BSD FJB F==>EF@BG FGFR@;QB=T GBRBIG;I< >I @AB Q;DHF=E>CR=BO;@T >? FI >PVBE@ S;@A JBDRBE@ @> Q;BS;I< FI<=B6
Belongie et al. PAMI 2002
• Difficult to hand-craft rules about digits
Machine Learning–The basics ©Möller / Mori 6
![Page 8: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/8.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Hand-written Digit Recognition
xi =
!"#$%&'"()#*$#+ !,- )( !.,$#/+ +%((/#/01/! 233456 334
7/#()#-! 8/#9 */:: )0 "'%! /$!9 +$"$;$!/ +,/ ") "'/ :$1< )(
8$#%$"%)0 %0 :%&'"%0& =>?@ 2ABC D,!" -$</! %" ($!"/#56E'/ 7#)")"97/ !/:/1"%)0 $:&)#%"'- %! %::,!"#$"/+ %0 F%&6 GH6
C! !//0I 8%/*! $#/ $::)1$"/+ -$%0:9 ()# -)#/ 1)-7:/J1$"/&)#%/! *%"' '%&' *%"'%0 1:$!! 8$#%$;%:%"96 E'/ 1,#8/-$#</+ 3BK7#)") %0 F%&6 L !')*! "'/ %-7#)8/+ 1:$!!%(%1$"%)07/#()#-$01/ ,!%0& "'%! 7#)")"97/ !/:/1"%)0 !"#$"/&9 %0!"/$+)( /.,$::9K!7$1/+ 8%/*!6 M)"/ "'$" */ );"$%0 $ >6? 7/#1/0"
/##)# #$"/ *%"' $0 $8/#$&/ )( )0:9 (),# "*)K+%-/0!%)0$:
8%/*! ()# /$1' "'#//K+%-/0!%)0$: );D/1"I "'$0<! ") "'/
(:/J%;%:%"9 7#)8%+/+ ;9 "'/ -$"1'%0& $:&)#%"'-6
!"# $%&'() *+,-. */0+12.33. 4,3,5,6.
N,# 0/J" /J7/#%-/0" %08):8/! "'/ OAPQKR !'$7/ !%:'),/""/
+$"$;$!/I !7/1%(%1$::9 B)#/ PJ7/#%-/0" BPK3'$7/KG 7$#" SI
*'%1' -/$!,#/! 7/#()#-$01/ )( !%-%:$#%"9K;$!/+ #/"#%/8$:
=>T@6 E'/ +$"$;$!/ 1)0!%!"! )( GI?HH %-$&/!U RH !'$7/
1$"/&)#%/!I >H %-$&/! 7/# 1$"/&)#96 E'/ 7/#()#-$01/ %!
-/$!,#/+ ,!%0& "'/ !)K1$::/+ V;,::!/9/ "/!"IW %0 *'%1' /$1'
!"# $%%% &'()*(+&$,)* ,) -(&&%') ()(./*$* ()0 1(+2$)% $)&%..$3%)+%4 5,.6 784 ),6 784 (-'$. 7997
:;<6 #6 (== >? @AB C;DE=FDD;?;BG 1)$*& @BD@ G;<;@D HD;I< >HJ CB@A>G KLM >H@ >? "94999N6 &AB @BO@ FP>QB BFEA G;<;@ ;IG;EF@BD @AB BOFCR=B IHCPBJ
?>==>SBG PT @AB @JHB =FPB= FIG @AB FDD;<IBG =FPB=6
:;<6 U6 M0 >PVBE@ JBE><I;@;>I HD;I< @AB +,$.W79 GF@F DB@6 +>CRFJ;D>I >?@BD@ DB@ BJJ>J ?>J **04 *AFRB 0;D@FIEB K*0N4 FIG *AFRB 0;D@FIEB S;@A!!"#$%&$' RJ>@>@TRBD K*0WRJ>@>N QBJDHD IHCPBJ >? RJ>@>@TRB Q;BSD6 :>J**0 FIG *04 SB QFJ;BG @AB IHCPBJ >? RJ>@>@TRBD HI;?>JC=T ?>J F==>PVBE@D6 :>J *0WRJ>@>4 @AB IHCPBJ >? RJ>@>@TRBD RBJ >PVBE@ GBRBIGBG >I@AB S;@A;IW>PVBE@ QFJ;F@;>I FD SB== FD @AB PB@SBBIW>PVBE@ D;C;=FJ;@T6
:;<6 "96 -J>@>@TRB Q;BSD DB=BE@BG ?>J @S> G;??BJBI@ M0 >PVBE@D ?J>C @AB+,$. GF@F DB@ HD;I< @AB F=<>J;@AC GBDEJ;PBG ;I *BE@;>I !676 X;@A @A;DFRRJ>FEA4 Q;BSD FJB F==>EF@BG FGFR@;QB=T GBRBIG;I< >I @AB Q;DHF=E>CR=BO;@T >? FI >PVBE@ S;@A JBDRBE@ @> Q;BS;I< FI<=B6
ti = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
• Represent input image as a vector xi ∈ R784.• Suppose we have a target vector ti
• This is supervised learning• Discrete, finite label set: perhaps ti ∈ {0, 1}10, a classificationproblem
• Given a training set {(x1, t1), . . . , (xN , tN )}, learningproblem is to construct a “good” function y(x) from these.
• y : R784 → R10
Machine Learning–The basics ©Möller / Mori 7
![Page 9: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/9.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Face Detection
Schneiderman and Kanade, IJCV 2002
• Classification problem• ti ∈ {0, 1, 2}, non-face, frontal face, profile face.
Machine Learning–The basics ©Möller / Mori 8
![Page 10: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/10.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Spam Detection
• Classification problem• ti ∈ {0, 1}, non-spam, spam• xi counts of words, e.g. Viagra, stock, outperform,
multi-bagger
Machine Learning–The basics ©Möller / Mori 9
![Page 11: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/11.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Caveat - Horses (source?)• Once upon a time there were two neighboring farmers, Jedand Ned. Each owned a horse, and the horses both liked tojump the fence between the two farms. Clearly the farmersneeded some means to tell whose horse was whose.
• So Jed and Ned got together and agreed on a scheme fordiscriminating between horses. Jed would cut a small notchin one ear of his horse. Not a big, painful notch, but just bigenough to be seen. Well, wouldn’t you know it, the day afterJed cut the notch in horse’s ear, Ned’s horse caught on thebarbed wire fence and tore his ear the exact same way!
• Something else had to be devised, so Jed tied a big bluebow on the tail of his horse. But the next day, Jed’s horsejumped the fence, ran into the field where Ned’s horse wasgrazing, and chewed the bow right off the other horse’s tail.Ate the whole bow!
Machine Learning–The basics ©Möller / Mori 10
![Page 12: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/12.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Caveat - Horses (source?)• Once upon a time there were two neighboring farmers, Jedand Ned. Each owned a horse, and the horses both liked tojump the fence between the two farms. Clearly the farmersneeded some means to tell whose horse was whose.
• So Jed and Ned got together and agreed on a scheme fordiscriminating between horses. Jed would cut a small notchin one ear of his horse. Not a big, painful notch, but just bigenough to be seen. Well, wouldn’t you know it, the day afterJed cut the notch in horse’s ear, Ned’s horse caught on thebarbed wire fence and tore his ear the exact same way!
• Something else had to be devised, so Jed tied a big bluebow on the tail of his horse. But the next day, Jed’s horsejumped the fence, ran into the field where Ned’s horse wasgrazing, and chewed the bow right off the other horse’s tail.Ate the whole bow!
Machine Learning–The basics ©Möller / Mori 11
![Page 13: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/13.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Caveat - Horses (source?)• Once upon a time there were two neighboring farmers, Jedand Ned. Each owned a horse, and the horses both liked tojump the fence between the two farms. Clearly the farmersneeded some means to tell whose horse was whose.
• So Jed and Ned got together and agreed on a scheme fordiscriminating between horses. Jed would cut a small notchin one ear of his horse. Not a big, painful notch, but just bigenough to be seen. Well, wouldn’t you know it, the day afterJed cut the notch in horse’s ear, Ned’s horse caught on thebarbed wire fence and tore his ear the exact same way!
• Something else had to be devised, so Jed tied a big bluebow on the tail of his horse. But the next day, Jed’s horsejumped the fence, ran into the field where Ned’s horse wasgrazing, and chewed the bow right off the other horse’s tail.Ate the whole bow!
Machine Learning–The basics ©Möller / Mori 12
![Page 14: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/14.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Caveat - Horses (source?)
• Finally, Jed suggested, and Ned concurred, that they shouldpick a feature that was less apt to change. Height seemedlike a good feature to use. But were the heights different?Well, each farmer went and measured his horse, and do youknow what? The brown horse was a full inch taller than thewhite one!
Moral of the story: ML provides theory and tools for settingparameters. Make sure you have the right model and features.Think about your “feature vector x.”
Machine Learning–The basics ©Möller / Mori 13
![Page 15: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/15.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Caveat - Horses (source?)
• Finally, Jed suggested, and Ned concurred, that they shouldpick a feature that was less apt to change. Height seemedlike a good feature to use. But were the heights different?Well, each farmer went and measured his horse, and do youknow what? The brown horse was a full inch taller than thewhite one!
Moral of the story: ML provides theory and tools for settingparameters. Make sure you have the right model and features.Think about your “feature vector x.”
Machine Learning–The basics ©Möller / Mori 14
![Page 16: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/16.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Stock Price Prediction
• Problems in which ti is continuous are called regression• E.g. ti is stock price, xi contains company profit, debt, cashflow, gross sales, number of spam emails sent, …
Machine Learning–The basics ©Möller / Mori 15
![Page 17: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/17.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Clustering Images
Wang et al., CVPR 2006
• Only xi is defined: unsupervised learning• E.g. xi describes image, find groups of similar images
Machine Learning–The basics ©Möller / Mori 16
![Page 18: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/18.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Types of Learning Problems
• Supervised Learning• Classification• Regression
• Unsupervised Learning• Density estimation• Clustering: k-means, mixture models, hierarchical clustering• Hidden Markov models
• Reinforcement Learning
Machine Learning–The basics ©Möller / Mori 17
![Page 19: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/19.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Outline
Machine Learning: What, Why, and How?
Curve Fitting: (e.g.) Regression and Model Selection
Goal Function: Decision Theory—ML, Loss Function, MAP
Probability Theory: (e.g.) Coin Tossing and Parameter Estimation
Conclusion
Machine Learning–The basics ©Möller / Mori 18
![Page 20: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/20.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
An Example - Polynomial Curve Fitting
�
�
0 1
−1
0
1
• Suppose we are given training set of N observations(x1, . . . , xN ) and (t1, . . . , tN ), xi, ti ∈ R
• Regression problem, estimate y(x) from these dataMachine Learning–The basics ©Möller / Mori 19
![Page 21: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/21.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Polynomial Curve Fitting
• What form is y(x)?• Let’s try polynomials of degree M :
y(x,w) = w0+w1x+w2x2+. . .+wMxM
• This is the hypothesis space.
• How do we measure success?• Sum of squared errors:
E(w) =1
2
N∑n=1
{y(xn,w)− tn}2
• Among functions in the class, choosethat which minimizes this error
�
�
0 1
−1
0
1
t
x
y(xn,w)
tn
xn
Machine Learning–The basics ©Möller / Mori 20
![Page 22: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/22.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Polynomial Curve Fitting
• What form is y(x)?• Let’s try polynomials of degree M :
y(x,w) = w0+w1x+w2x2+. . .+wMxM
• This is the hypothesis space.
• How do we measure success?• Sum of squared errors:
E(w) =1
2
N∑n=1
{y(xn,w)− tn}2
• Among functions in the class, choosethat which minimizes this error
�
�
0 1
−1
0
1
t
x
y(xn,w)
tn
xn
Machine Learning–The basics ©Möller / Mori 21
![Page 23: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/23.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Polynomial Curve Fitting
• What form is y(x)?• Let’s try polynomials of degree M :
y(x,w) = w0+w1x+w2x2+. . .+wMxM
• This is the hypothesis space.
• How do we measure success?• Sum of squared errors:
E(w) =1
2
N∑n=1
{y(xn,w)− tn}2
• Among functions in the class, choosethat which minimizes this error
�
�
0 1
−1
0
1
t
x
y(xn,w)
tn
xn
Machine Learning–The basics ©Möller / Mori 22
![Page 24: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/24.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Polynomial Curve Fitting
• Error function
E(w) =1
2
N∑n=1
{y(xn,w)− tn}2
• Best coefficients
w∗ = argminw
E(w)
• Found using pseudo-inverse
Machine Learning–The basics ©Möller / Mori 23
![Page 25: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/25.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Which Degree of Polynomial?
�
�
�����
0 1
−1
0
1
�
�
�����
0 1
−1
0
1
�
�
�����
0 1
−1
0
1
�
�
�����
0 1
−1
0
1
• A model selection problem• M = 9 → E(w∗) = 0: This is over-fitting
Machine Learning–The basics ©Möller / Mori 24
![Page 26: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/26.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Generalization
�
�����
0 3 6 90
0.5
1TrainingTest
• Generalization is the holy grail of ML• Want good performance for new data
• Measure generalization using a separate set• Use root-mean-squared (RMS) error: ERMS =
√2E(w∗)/N
Machine Learning–The basics ©Möller / Mori 25
![Page 27: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/27.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Controlling Over-fitting: Regularization
�
�
�����
0 1
−1
0
1
• As order of polynomial M increases, so do coefficientmagnitudes
• Penalize large coefficients in error function:
E(w) =1
2
N∑n=1
{y(xn,w)− tn}2 +λ
2||w||2
Machine Learning–The basics ©Möller / Mori 26
![Page 28: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/28.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Controlling Over-fitting: Regularization
�
�
�����
0 1
−1
0
1
• As order of polynomial M increases, so do coefficientmagnitudes
• Penalize large coefficients in error function:
E(w) =1
2
N∑n=1
{y(xn,w)− tn}2 +λ
2||w||2
Machine Learning–The basics ©Möller / Mori 27
![Page 29: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/29.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Controlling Over-fitting: Regularization
�
�
� ������� �
0 1
−1
0
1
�
�
� ������
0 1
−1
0
1
Machine Learning–The basics ©Möller / Mori 28
![Page 30: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/30.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Controlling Over-fitting: Regularization
�����
� ���−35 −30 −25 −200
0.5
1TrainingTest
• Note the ERMS for the training set. Perfect match of trainingset with the model is a result of over-fitting
• Training and test error show similar trend
Machine Learning–The basics ©Möller / Mori 29
![Page 31: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/31.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Over-fitting: Dataset size
�
�
�������
0 1
−1
0
1
�
�
���������
0 1
−1
0
1
• With more data, more complex model (M = 9) can be fit• Rule of thumb: 10 datapoints for each parameter
Machine Learning–The basics ©Möller / Mori 30
![Page 32: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/32.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Validation Set
• Split training data into training set and validation set• Train different models (e.g. diff. order polynomials) ontraining set
• Choose model (e.g. order of polynomial) with minimum erroron validation set
Machine Learning–The basics ©Möller / Mori 31
![Page 33: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/33.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Cross-validationrun 1
run 2
run 3
run 4
• Data are often limited• Cross-validation creates S groups of data, use S − 1 to train,other to validate
• Extreme case leave-one-out cross-validation (LOO-CV): S isnumber of training data points
• Cross-validation is an effective method for model selection,but can be slow
• Models with multiple complexity parameters: exponentialnumber of runs
Machine Learning–The basics ©Möller / Mori 32
![Page 34: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/34.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary
• Want models that generalize to new data• Train model on training set• Measure performance on held-out test set
• Performance on test set is good estimate of performance onnew data
Machine Learning–The basics ©Möller / Mori 33
![Page 35: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/35.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary - Model Selection
• Which model to use? E.g. which degree polynomial?• Training set error is lower with more complex model
• Can’t just choose the model with lowest training error• Peeking at test error is unfair. E.g. picking polynomial withlowest test error
• Performance on test set is no longer good estimate ofperformance on new data
Machine Learning–The basics ©Möller / Mori 34
![Page 36: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/36.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary - Model Selection
• Which model to use? E.g. which degree polynomial?• Training set error is lower with more complex model
• Can’t just choose the model with lowest training error• Peeking at test error is unfair. E.g. picking polynomial withlowest test error
• Performance on test set is no longer good estimate ofperformance on new data
Machine Learning–The basics ©Möller / Mori 35
![Page 37: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/37.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary - Solutions I
• Use a validation set• Train models on training set. E.g. different degree polynomials• Measure performance on held-out validation set• Measure performance of that model on held-out test set
• Can use cross-validation on training set instead of aseparate validation set if little data and lots of time
• Choose model with lowest error over all cross-validation folds(e.g. polynomial degree)
• Retrain that model using all training data (e.g. polynomialcoefficients)
Machine Learning–The basics ©Möller / Mori 36
![Page 38: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/38.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary - Solutions I
• Use a validation set• Train models on training set. E.g. different degree polynomials• Measure performance on held-out validation set• Measure performance of that model on held-out test set
• Can use cross-validation on training set instead of aseparate validation set if little data and lots of time
• Choose model with lowest error over all cross-validation folds(e.g. polynomial degree)
• Retrain that model using all training data (e.g. polynomialcoefficients)
Machine Learning–The basics ©Möller / Mori 37
![Page 39: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/39.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary - Solutions II
• Use regularization• Train complex model (e.g high order polynomial) but penalizebeing “too complex” (e.g. large weight magnitudes)
• Need to balance error vs. regularization (λ)• Choose λ using cross-validation
• Get more data
Machine Learning–The basics ©Möller / Mori 38
![Page 40: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/40.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Summary - Solutions II
• Use regularization• Train complex model (e.g high order polynomial) but penalizebeing “too complex” (e.g. large weight magnitudes)
• Need to balance error vs. regularization (λ)• Choose λ using cross-validation
• Get more data
Machine Learning–The basics ©Möller / Mori 39
![Page 41: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/41.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Common Task Framework
• common to many competitions (like kaggle)1. A publicly available training dataset involving, for each
observation, a list of (possibly many) feature measurements,and a class label for that observation.
2. A set of enrolled competitors whose common task is to infera class prediction rule from the training data.
3. A scoring referee, to which competitors can submit theirprediction rule. The referee runs the prediction rule against atesting dataset which is sequestered behind a Chinese wall.The referee objectively and automatically reports the score(prediction accuracy) achieved by the submitted rule
Machine Learning–The basics ©Möller / Mori 40
![Page 42: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/42.jpg)
Outline
Machine Learning: What, Why, and How?
Curve Fitting: (e.g.) Regression and Model Selection
Goal Function: Decision Theory—ML, Loss Function, MAP
Probability Theory: (e.g.) Coin Tossing and Parameter Estimation
Conclusion
![Page 43: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/43.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Goal Function – Decision Theory
For a sample x, decide which class(Ck) it is from.
Ideas:• Maximum Likelihood• Minimum Loss/Cost (e.g. misclassification rate)• Maximum Aposteriori (MAP)
Intro. to Machine Learning Alireza Ghane 42
![Page 44: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/44.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Maximum Likelihood
• Inference step: Determine statistics from training data.p(x, t) OR p(x|Ck)
• Decision step: Determine optimal t for test input x:
t = argmaxk
{ p (x|Ck)︸ ︷︷ ︸ }
Likelihood
Intro. to Machine Learning Alireza Ghane 43
![Page 45: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/45.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Maximum Likelihood
• Inference step: Determine statistics from training data.p(x, t) OR p(x|Ck)
• Decision step: Determine optimal t for test input x:
t = argmaxk
{ p (x|Ck)︸ ︷︷ ︸ }
Likelihood
Intro. to Machine Learning Alireza Ghane 44
![Page 46: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/46.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Maximum Likelihood
• Inference step: Determine statistics from training data.p(x, t) OR p(x|Ck)
• Decision step: Determine optimal t for test input x:
t = argmaxk
{ p (x|Ck)︸ ︷︷ ︸ }
Likelihood
Intro. to Machine Learning Alireza Ghane 45
![Page 47: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/47.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Minimum Misclassification Rate
p(mistake) = p (x ∈ R1, C2) + p (x ∈ R2, C1)=
∫R1
p (x, C2) dx +∫R2
p (x, C1) dx
p(mistake) =∑k
∑j
∫Rj
p (x, Ck) dx
R1 R2
x0 x
p(x, C1)
p(x, C2)
x
x: decision boundary.x0: optimal decision boundary
x0 : argminR1
{p (mistake)}
Intro. to Machine Learning Alireza Ghane 46
![Page 48: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/48.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Minimum Misclassification Rate
p(mistake) = p (x ∈ R1, C2) + p (x ∈ R2, C1)=
∫R1
p (x, C2) dx +∫R2
p (x, C1) dx
p(mistake) =∑k
∑j
∫Rj
p (x, Ck) dx
R1 R2
x0 x
p(x, C1)
p(x, C2)
x
x: decision boundary.x0: optimal decision boundary
x0 : argminR1
{p (mistake)}
Intro. to Machine Learning Alireza Ghane 47
![Page 49: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/49.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Minimum Loss/Cost
• Misclassification rate:
R : argmin{Ri|i∈{1,··· ,K}}
∑k
∑j
L (Rj , Ck)
• Weighted loss/cost function:
R : argmin{Ri|i∈{1,··· ,K}}
∑k
∑j
Wj,kL (Rj , Ck)
Is useful when:
• The population of the classes are different• The failure cost is non-symmetric• · · ·
Intro. to Machine Learning Alireza Ghane 48
![Page 50: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/50.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Decision: Maximum Aposteriori (MAP)
Bayes’ Theorem: P{A|B} = P{B|A}P{A}P{B}
p(Ck|x)︸ ︷︷ ︸Posterior
∝ p(x|Ck)︸ ︷︷ ︸Likelihood
p(Ck)︸ ︷︷ ︸Prior
• Provides an Aposteriori Belief for the estimation, rather thana single point estimate.
• Can utilize Apriori Information in the decision.
Intro. to Machine Learning Alireza Ghane 49
![Page 51: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/51.jpg)
Outline
Machine Learning: What, Why, and How?
Curve Fitting: (e.g.) Regression and Model Selection
Goal Function: Decision Theory—ML, Loss Function, MAP
Probability Theory: (e.g.) Coin Tossing and Parameter Estimation
Conclusion
![Page 52: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/52.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Coin Tossing
• Let’s say you’re given a coin, and you want to find outP (heads), the probability that if you flip it it lands as “heads”.
• Flip it a few times: H H T
• P (heads) = 2/3
• Hmm... is this rigorous? Does this make sense?
Machine Learning–The basics ©Möller / Mori 51
![Page 53: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/53.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Coin Tossing
• Let’s say you’re given a coin, and you want to find outP (heads), the probability that if you flip it it lands as “heads”.
• Flip it a few times: H H T
• P (heads) = 2/3
• Hmm... is this rigorous? Does this make sense?
Machine Learning–The basics ©Möller / Mori 52
![Page 54: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/54.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Coin Tossing
• Let’s say you’re given a coin, and you want to find outP (heads), the probability that if you flip it it lands as “heads”.
• Flip it a few times: H H T
• P (heads) = 2/3
• Hmm... is this rigorous? Does this make sense?
Machine Learning–The basics ©Möller / Mori 53
![Page 55: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/55.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Coin Tossing - Model
• Bernoulli distribution P (heads) = µ, P (tails) = 1− µ
• Assume coin flips are independent and identically distributed(i.i.d.)
• i.e. All are separate samples from the Bernoulli distribution• Given data D = {x1, . . . , xN}, heads: xi = 1, tails: xi = 0,the likelihood of the data is:
p(D|µ) =N∏
n=1
p(xn|µ) =N∏
n=1
µxn(1− µ)1−xn
Machine Learning–The basics ©Möller / Mori 54
![Page 56: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/56.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum Likelihood Estimation
• Given D with h heads and t tails• What should µ be?• Maximum Likelihood Estimation (MLE): choose µ whichmaximizes the likelihood of the data
µML = argmaxµ
p(D|µ)
• Since ln(·) is monotone increasing:
µML = argmaxµ
ln p(D|µ)
Machine Learning–The basics ©Möller / Mori 55
![Page 57: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/57.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum Likelihood Estimation• Likelihood:
p(D|µ) =N∏
n=1
µxn(1− µ)1−xn
• Log-likelihood:
ln p(D|µ) =N∑
n=1
xn lnµ+ (1− xn) ln(1− µ)
• Take derivative, set to 0:
d
dµln p(D|µ) =
N∑n=1
xn1
µ− (1− xn)
1
1− µ=
1
µh− 1
1− µt
⇒ µ =h
t+ hMachine Learning–The basics ©Möller / Mori 56
![Page 58: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/58.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum Likelihood Estimation• Likelihood:
p(D|µ) =N∏
n=1
µxn(1− µ)1−xn
• Log-likelihood:
ln p(D|µ) =N∑
n=1
xn lnµ+ (1− xn) ln(1− µ)
• Take derivative, set to 0:
d
dµln p(D|µ) =
N∑n=1
xn1
µ− (1− xn)
1
1− µ=
1
µh− 1
1− µt
⇒ µ =h
t+ hMachine Learning–The basics ©Möller / Mori 57
![Page 59: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/59.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum Likelihood Estimation• Likelihood:
p(D|µ) =N∏
n=1
µxn(1− µ)1−xn
• Log-likelihood:
ln p(D|µ) =N∑
n=1
xn lnµ+ (1− xn) ln(1− µ)
• Take derivative, set to 0:
d
dµln p(D|µ) =
N∑n=1
xn1
µ− (1− xn)
1
1− µ=
1
µh− 1
1− µt
⇒ µ =h
t+ hMachine Learning–The basics ©Möller / Mori 58
![Page 60: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/60.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum Likelihood Estimation• Likelihood:
p(D|µ) =N∏
n=1
µxn(1− µ)1−xn
• Log-likelihood:
ln p(D|µ) =N∑
n=1
xn lnµ+ (1− xn) ln(1− µ)
• Take derivative, set to 0:
d
dµln p(D|µ) =
N∑n=1
xn1
µ− (1− xn)
1
1− µ=
1
µh− 1
1− µt
⇒ µ =h
t+ hMachine Learning–The basics ©Möller / Mori 59
![Page 61: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/61.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum Likelihood Estimation• Likelihood:
p(D|µ) =N∏
n=1
µxn(1− µ)1−xn
• Log-likelihood:
ln p(D|µ) =N∑
n=1
xn lnµ+ (1− xn) ln(1− µ)
• Take derivative, set to 0:
d
dµln p(D|µ) =
N∑n=1
xn1
µ− (1− xn)
1
1− µ=
1
µh− 1
1− µt
⇒ µ =h
t+ hMachine Learning–The basics ©Möller / Mori 60
![Page 62: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/62.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Bayesian Learning
• Wait, does this make sense? What if I flip 1 time, heads? DoI believe µ=1?
• Learn µ the Bayesian way:
P (µ|D) =P (D|µ)P (µ)
P (D)
P (µ|D)︸ ︷︷ ︸posterior
∝ P (D|µ)︸ ︷︷ ︸likelihood
P (µ)︸ ︷︷ ︸prior
• Prior encodes knowledge that most coins are 50-50• Conjugate prior makes math simpler, easy interpretation
• For Bernoulli, the beta distribution is its conjugate
Machine Learning–The basics ©Möller / Mori 61
![Page 63: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/63.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Bayesian Learning
• Wait, does this make sense? What if I flip 1 time, heads? DoI believe µ=1?
• Learn µ the Bayesian way:
P (µ|D) =P (D|µ)P (µ)
P (D)
P (µ|D)︸ ︷︷ ︸posterior
∝ P (D|µ)︸ ︷︷ ︸likelihood
P (µ)︸ ︷︷ ︸prior
• Prior encodes knowledge that most coins are 50-50• Conjugate prior makes math simpler, easy interpretation
• For Bernoulli, the beta distribution is its conjugate
Machine Learning–The basics ©Möller / Mori 62
![Page 64: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/64.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Bayesian Learning
• Wait, does this make sense? What if I flip 1 time, heads? DoI believe µ=1?
• Learn µ the Bayesian way:
P (µ|D) =P (D|µ)P (µ)
P (D)
P (µ|D)︸ ︷︷ ︸posterior
∝ P (D|µ)︸ ︷︷ ︸likelihood
P (µ)︸ ︷︷ ︸prior
• Prior encodes knowledge that most coins are 50-50• Conjugate prior makes math simpler, easy interpretation
• For Bernoulli, the beta distribution is its conjugate
Machine Learning–The basics ©Möller / Mori 63
![Page 65: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/65.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Beta Distribution• We will use the Beta distribution to express our priorknowledge about coins:
Beta(µ|a, b) = Γ(a+ b)
Γ(a)Γ(b)︸ ︷︷ ︸normalization
µa−1(1− µ)b−1
• Parameters a and b control the shape of this distribution
Machine Learning–The basics ©Möller / Mori 64
![Page 66: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/66.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Posterior
P (µ|D) ∝ P (D|µ)P (µ)
∝N∏
n=1
µxn(1− µ)1−xn
︸ ︷︷ ︸likelihood
µa−1(1− µ)b−1︸ ︷︷ ︸prior
∝ µh(1− µ)tµa−1(1− µ)b−1
∝ µh+a−1(1− µ)t+b−1
• Simple form for posterior is due to use of conjugate prior• Parameters a and b act as extra observations• Note that as N = h+ t → ∞, prior is ignored
Machine Learning–The basics ©Möller / Mori 65
![Page 67: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/67.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Posterior
P (µ|D) ∝ P (D|µ)P (µ)
∝N∏
n=1
µxn(1− µ)1−xn
︸ ︷︷ ︸likelihood
µa−1(1− µ)b−1︸ ︷︷ ︸prior
∝ µh(1− µ)tµa−1(1− µ)b−1
∝ µh+a−1(1− µ)t+b−1
• Simple form for posterior is due to use of conjugate prior• Parameters a and b act as extra observations• Note that as N = h+ t → ∞, prior is ignored
Machine Learning–The basics ©Möller / Mori 66
![Page 68: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/68.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Posterior
P (µ|D) ∝ P (D|µ)P (µ)
∝N∏
n=1
µxn(1− µ)1−xn
︸ ︷︷ ︸likelihood
µa−1(1− µ)b−1︸ ︷︷ ︸prior
∝ µh(1− µ)tµa−1(1− µ)b−1
∝ µh+a−1(1− µ)t+b−1
• Simple form for posterior is due to use of conjugate prior• Parameters a and b act as extra observations• Note that as N = h+ t → ∞, prior is ignored
Machine Learning–The basics ©Möller / Mori 67
![Page 69: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/69.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum A Posteriori
• Given posterior P (µ|D) we could compute a single value,known as the Maximum a Posteriori (MAP) estimate for µ:
µMAP = argmaxµ
P (µ|D)
• Known as point estimation• However, correct Bayesian thing to do is to use the fulldistribution over µ
• i.e. ComputeEµ[f ] =
∫p(µ|D)f(µ)dµ
• This integral is usually hard to compute
Machine Learning–The basics ©Möller / Mori 68
![Page 70: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/70.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum A Posteriori
• Given posterior P (µ|D) we could compute a single value,known as the Maximum a Posteriori (MAP) estimate for µ:
µMAP = argmaxµ
P (µ|D)
• Known as point estimation• However, correct Bayesian thing to do is to use the fulldistribution over µ
• i.e. ComputeEµ[f ] =
∫p(µ|D)f(µ)dµ
• This integral is usually hard to compute
Machine Learning–The basics ©Möller / Mori 69
![Page 71: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/71.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Maximum A Posteriori
• Given posterior P (µ|D) we could compute a single value,known as the Maximum a Posteriori (MAP) estimate for µ:
µMAP = argmaxµ
P (µ|D)
• Known as point estimation• However, correct Bayesian thing to do is to use the fulldistribution over µ
• i.e. ComputeEµ[f ] =
∫p(µ|D)f(µ)dµ
• This integral is usually hard to compute
Machine Learning–The basics ©Möller / Mori 70
![Page 72: Machine Learning – some basics - Bishop PRML Ch. 1vda.univie.ac.at/Teaching/FDA/19s/LectureNotes/01_basics.pdf• ti 2 f0;1;2g,non-face,frontalface,profileface. MachineLearning–Thebasics](https://reader036.vdocument.in/reader036/viewer/2022070811/5f09ebd17e708231d42924bb/html5/thumbnails/72.jpg)
Machine Learning Model Selection Goal Function Coin Tossing Conclusion
Conclusion
• Types of learning problems• Supervised: regression, classification• Unsupervised
• Learning as optimization• Squared error loss function• Maximum likelihood (ML)• Maximum a posteriori (MAP)
• Want generalization, avoid over-fitting• Cross-validation• Regularization• Bayesian prior on model parameters
• see also Bishop: Chapters 1.1, 1.3, 1.5, 2.1
Machine Learning–The basics ©Möller / Mori 71