Machine Learning, University of Washington (homes.cs.washington.edu/~shwetak/classes/cse599u/notes/MachineLearning)
Machine Learning
Roughly speaking, for a given learning task, with a given finite amount of training data, the
best generalization performance will be achieved if the right balance is struck between the
accuracy attained on that particular training set, and the “capacity” of the machine, that is, the
ability of the machine to learn any training set without error. A machine with too much capacity
is like a botanist with a photographic memory who, when presented with a new tree,
concludes that it is not a tree because it has a different number of leaves from anything she
has seen before; a machine with too little capacity is like the botanist’s lazy brother, who
declares that if it’s green, it’s a tree. Neither can generalize well. The exploration and
formalization of these concepts has resulted in one of the shining peaks of the theory of
statistical learning.
(Vapnik, 1979)
What is machine learning?

Data (examples) → Model (training) → Output:
• Predictions
• Classifications
• Clusters
• Ordinals

Why: Face Recognition?
Categories of problems

By output:
• Clustering
• Classification
• Regression
• Ordinal Reg.
• Prediction

By input:
• Vector, X
• Time Series, x(t)
One size never fits all…
• Improving an algorithm:
– First option: better features
• Visualize classes
• Trends
• Histograms
– Next: make the algorithm smarter (more complicated)
• Interaction of features
• Better objective and training criteria
Tools: WEKA or GGobi
[Plots: polynomial regression example, y = 1 + 0.5t + 4t² - t³, input vs. output]

Categories of ML algorithms

By model:
• Parametric (model parameters only)
• Non-parametric (raw data only)
• Kernel methods

By training:
• Supervised (labeled)
• Unsupervised (unlabeled)
[Plots: several model fits of the same input vs. output data, plus a histogram panel]
Training a ML algorithm
• Choose data
• Optimize model parameters according to:
– Objective function
[Plots: regression fit (input vs. output) and two-class classification with a separating boundary]

• Regression: Mean Square Error
• Classification: Max Margin
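As a concrete sketch of the Mean Square Error objective, here is a least-squares polynomial fit in NumPy (a hypothetical recreation of the slide's cubic example, not the original code):

```python
import numpy as np

# Noiseless samples of the slide's example curve y = 1 + 0.5t + 4t^2 - t^3.
t = np.linspace(-4, 6, 50)
y = 1 + 0.5 * t + 4 * t**2 - t**3

# np.polyfit solves the least-squares (MSE-minimizing) problem.
coeffs = np.polyfit(t, y, deg=3)          # highest power first
y_hat = np.polyval(coeffs, t)
mse = np.mean((y - y_hat) ** 2)
```

On noiseless data the fit recovers the generating coefficients and the MSE is essentially zero; with noise added, the same call returns the MSE-optimal cubic.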
Pitfalls of ML algorithms

• Clean your features:
  – Training volume: more is better
  – Outliers: remove them!
  – Dynamic range: normalize it!
• Generalization:
  – Over fitting
  – Under fitting
• Speed: parametric vs. non-parametric
• What are you learning? …features, features, features…
Outliers

[Plots: input vs. output fits with and without outliers]

Keep a “good” percentile range!
5-95, 1-99: depends on your data
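A minimal sketch of percentile-based outlier removal, on made-up data with a few injected extreme values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)
x[:5] = [40.0, -35.0, 60.0, -50.0, 45.0]   # inject gross outliers

# Keep only points inside the 5th-95th percentile range.
lo, hi = np.percentile(x, [5, 95])
kept = x[(x >= lo) & (x <= hi)]
```

Roughly 90% of the points survive, and the injected extremes are gone; the right range (5-95, 1-99, …) really does depend on the data.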
Dynamic range

[Plots: two-class scatter plots of features f1 vs. f2, before and after normalizing their ranges]
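Two common ways to normalize dynamic range, sketched on made-up features where f1 spans [0, 1000] and f2 spans [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0, 1000, 200),   # f1: huge range
                     rng.uniform(0, 1, 200)])     # f2: tiny range

# Min-max scaling: squash every feature into [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Z-score scaling: zero mean, unit variance per feature.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)
```

Without one of these, distance-based methods (k-means, k-NN) are dominated by whichever feature happens to have the largest numeric range.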
Over fitting and comparing algorithms
• Early stop
• Regularization
• Validation Sets
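Regularization and validation sets can be combined in a few lines: fit at several regularization strengths on a training split and keep whichever generalizes best on a held-out split. A minimal sketch with closed-form ridge regression on synthetic data (the candidate strengths are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 10))
w_true = rng.normal(size=10)
y = X @ w_true + rng.normal(scale=0.5, size=120)

X_tr, y_tr = X[:80], y[:80]     # training split
X_va, y_va = X[80:], y[80:]     # held-out validation split

def ridge(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^-1 X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

val_err = {lam: np.mean((X_va @ ridge(X_tr, y_tr, lam) - y_va) ** 2)
           for lam in (0.01, 0.1, 1.0, 10.0, 100.0)}
best_lam = min(val_err, key=val_err.get)
```

Early stopping follows the same pattern: monitor validation error during training and stop when it starts rising.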
Under fitting and the curse of dimensionality
K-Means clustering

• Planar decision boundaries, depending on the space you are in…
• Highly efficient
• Not always great (but usually pretty good)
• Needs good starting criteria
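A minimal k-means (Lloyd's algorithm) sketch on two made-up blobs; the random initialization from data points is exactly the "starting criteria" the bullet above warns about:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),    # blob around (0, 0)
               rng.normal(5, 0.5, (50, 2))])   # blob around (5, 5)

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from data
    for _ in range(iters):
        # Assign each point to its nearest center (a planar partition).
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)
```

With a bad initialization (both centers in one blob), plain Lloyd's can converge to a poor partition, which is why k-means is usually restarted several times in practice.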
K-Nearest Neighbor

• Arbitrary decision boundaries
• Not so efficient…
• With enough data in each class… optimal
• Easy to train; known as a lazy classifier
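A tiny k-NN classifier on made-up blobs. "Training" is just storing the data (hence lazy); all the work, and the inefficiency, is at query time:

```python
import numpy as np

rng = np.random.default_rng(4)
X_train = np.vstack([rng.normal(0, 1, (30, 2)),    # class 0
                     rng.normal(6, 1, (30, 2))])   # class 1
y_train = np.array([0] * 30 + [1] * 30)

def knn_predict(x, X_train, y_train, k=5):
    d = np.linalg.norm(X_train - x, axis=1)   # distance to every stored point
    nearest = y_train[np.argsort(d)[:k]]      # labels of the k nearest
    return np.bincount(nearest).argmax()      # majority vote

pred0 = knn_predict(np.array([0.0, 0.0]), X_train, y_train)
pred1 = knn_predict(np.array([6.0, 6.0]), X_train, y_train)
```

Each query scans the whole training set, which is where "not so efficient" comes from; tree or hash indexes are the usual fix.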
Mixture of Gaussians

• Arbitrary decision boundaries, given enough Gaussians
• Efficient, depending on the number of models and Gaussians
• Can represent more than just Gaussian distributions
• Generative; sometimes tough to train up
• Spurious singularities
• Can get a distribution for a specific class and feature(s)… and get a Bayesian classifier
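A minimal EM sketch for a two-component 1-D Gaussian mixture on made-up data. It is illustrative only: there is no variance floor, so the "spurious singularities" above (a component collapsing onto one point) can bite on real data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])

def normal_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Deliberately rough starting point for the two components.
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: responsibility of each component for each point.
    r = pi * normal_pdf(x[:, None], mu, var)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and variances.
    n = r.sum(axis=0)
    pi, mu = n / len(x), (r * x[:, None]).sum(axis=0) / n
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
```

The per-component densities `pi[j] * normal_pdf(x, mu[j], var[j])` are exactly the class-conditional pieces a Bayesian classifier combines.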
Components Analysis (principal or independent)

• Reduces dimensionality
• All other classifiers work in a rotated space
• Remember eigenvalues and eigenvectors?
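PCA really is just that eigen-decomposition: diagonalize the covariance matrix and rotate the data into its eigenvector basis. A sketch on made-up correlated 2-D data:

```python
import numpy as np

rng = np.random.default_rng(6)
# Correlated data: almost all the variance lies along one direction.
base = rng.normal(size=(300, 1))
X = np.hstack([base, base]) + rng.normal(scale=0.1, size=(300, 2))

Xc = X - X.mean(axis=0)                    # center first
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending
order = eigvals.argsort()[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_rot = Xc @ eigvecs                       # data in the rotated space
```

Dropping the trailing columns of `X_rot` (small eigenvalues) is the dimensionality reduction; any classifier then works in that rotated, truncated space.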
Tree Classifiers

• Arbitrary decision boundaries
• Can be quite efficient (or not!)
• Needs good criteria for splitting
• Easy to visualize
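One common splitting criterion is Gini impurity. A minimal sketch of the core tree-growing step, finding the best threshold for a single node on made-up, sorted 1-D data (a decision "stump"):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0])  # sorted feature
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                      # labels

def gini(labels):
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Candidate thresholds: midpoints between consecutive sorted values.
    best_t, best_score = None, np.inf
    for t in (x[:-1] + x[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        # Weighted impurity of the two children.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

t, score = best_split(x, y)
```

A full tree just applies this search recursively to each child, over all features, until a stopping rule fires.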
Multi-Layer Perceptron

• Arbitrary (non-linear) decision boundaries
• Can be quite efficient (or not!)
• What did it learn?
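A one-hidden-layer perceptron trained by plain gradient descent on XOR, the classic problem no linear classifier can solve. Layer sizes, learning rate, and iteration count are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)     # hidden layer (8 units)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

losses = []
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)                      # hidden activations
    p = sigmoid(h @ W2 + b2)                      # output probability
    losses.append(np.mean((p - y) ** 2))
    # Backpropagation of the squared-error loss.
    dz2 = 2 * (p - y) * p * (1 - p) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = dz2 @ W2.T * (1 - h ** 2)               # tanh derivative
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    for P, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        P -= 0.5 * g                              # gradient step
```

"What did it learn?" is a fair question even here: the fitted `W1`, `W2` solve XOR but are not easy to interpret by inspection.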
Support Vector Machines

• Arbitrary decision boundaries
• Efficiency depends on the number of support vectors and the feature size
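A linear max-margin classifier can be sketched as stochastic sub-gradient descent on the hinge loss (a Pegasos-style sketch; kernels, the bias term, and the regularization constant here are simplifying assumptions, not the full SVM training problem):

```python
import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)),   # class -1
               rng.normal(2, 0.5, (40, 2))])   # class +1
y = np.array([-1] * 40 + [1] * 40)

w, lam = np.zeros(2), 0.01
for t in range(1, 2001):
    i = rng.integers(len(X))                   # pick one random example
    eta = 1 / (lam * t)                        # decaying step size
    if y[i] * (X[i] @ w) < 1:                  # example violates the margin
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
    else:                                      # only the regularizer acts
        w = (1 - eta * lam) * w

pred = np.sign(X @ w)
```

Only margin-violating points ever move `w`, which mirrors why prediction cost scales with the number of support vectors.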
Hidden Markov Models

• Arbitrary decision boundaries
• Efficiency depends on the state space and the number of models
• Generalizes to incorporate features that change over time
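The time-varying part is handled by the forward algorithm: the probability of an observation sequence, summing over all hidden state paths. A sketch for a toy 2-state, 2-symbol HMM (all probabilities made up):

```python
import numpy as np

A = np.array([[0.7, 0.3],      # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities per state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(obs):
    # alpha[j] = P(observations so far, current state = j)
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then emit
    return alpha.sum()

p = forward([0, 1, 0])
```

Classification with HMMs typically trains one model per class and picks the class whose model gives the observed sequence the highest probability; the cost per step is quadratic in the number of states.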
More sophisticated approaches

• Graphical models (like an HMM)
  – Bayesian networks
  – Markov random fields
• Boosting
  – AdaBoost
• Voting
• Cascading
• Stacking…