6.S093 Visual Recognition through Machine Learning Competition
Image by kirkh.deviantart.com
Joseph Lim and Aditya Khosla
Acknowledgment: Many slides from David Sontag and Machine Learning Group of UT Austin
What is Machine Learning???

According to Wikipedia, machine learning concerns the construction and study of systems that can learn from data.
What is ML?
• Classification – From data to discrete classes
• Regression – From data to a continuous value
• Ranking – Comparing items
• Clustering – Discovering structure in data
• Others
Classification: data to discrete classes
• Spam filtering (e.g., Spam / Regular / Urgent)
• Object classification (e.g., Fries / Hamburger / None)
Regression: data to a value
• Stock market
• Weather prediction
Clustering: Discovering structure

Set of Images
Clustering
• Unsupervised learning
• Requires data, but NO LABELS
• Detect patterns
  – Group emails or search results
  – Regions of images
• Useful when you don’t know what you’re looking for
Clustering
• Basic idea: group together similar instances
• Example: 2D point patterns
• What could “similar” mean?
  – One option: small Euclidean distance
  – Clustering is crucially dependent on the measure of similarity (or distance) between “points”
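The Euclidean option can be made concrete in a few lines; a minimal sketch (the points and the function name are illustrative):

```python
import math

def euclidean(p, q):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Under this measure, points in the same visual cluster are close,
# while points in different clusters are far apart.
d_close = euclidean((1.0, 1.0), (1.5, 1.2))
d_far = euclidean((1.0, 1.0), (8.0, 9.0))
```

Swapping in a different distance (e.g., one over colours or word counts) changes which instances end up grouped together.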
K-Means
• An iterative clustering algorithm
  – Initialize: Pick K random points as cluster centers
  – Alternate:
    • Assign data points to the closest cluster center
    • Change each cluster center to the average of its assigned points
  – Stop when no points’ assignments change

Animation is from Andrey A. Shabalin’s website
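The steps above can be sketched in plain Python (a minimal 2-D version; the dataset is illustrative, and a real implementation would add restarts and vectorization):

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    # Lloyd's algorithm on 2-D points; a minimal sketch without restarts.
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))      # Initialize: pick K random points
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its closest cluster center
        new_assignment = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:       # Stop when no assignments change
            break
        assignment = new_assignment
        # Change each center to the average of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, assignment

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, 2)
```

On this toy data the two obvious groups end up with different labels regardless of which points are drawn as initial centers.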
Properties of the K-means algorithm
• Guaranteed to converge in a finite number of iterations
• Running time per iteration:
  – Assigning data points to the closest cluster center: O(KN)
  – Changing each cluster center to the average of its assigned points: O(N)
Example: K-Means for Segmentation (K=2 vs. Original)

The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance.
Example: K-Means for Segmentation (K=2, K=3, K=10 vs. Original)
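As a toy illustration of the idea, clustering pixels by colour: only the assignment step is shown, on a hypothetical 2×4 “image”, with hand-picked colour centers standing in for centers that K-means would normally learn from the pixels.

```python
# K=2 colour clustering on a hypothetical 2x4 "image" (values are RGB tuples).
image = [[(10, 10, 10), (12, 11, 9), (240, 240, 235), (250, 248, 249)],
         [(9, 12, 10), (11, 10, 12), (245, 242, 240), (251, 249, 250)]]

centers = [(0, 0, 0), (255, 255, 255)]  # cluster 0 = dark, cluster 1 = bright

def nearest(pixel, centers):
    # Index of the closest colour center (squared Euclidean distance in RGB)
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, centers[c])))

# The segmentation replaces each pixel by its cluster index
segmentation = [[nearest(px, centers) for px in row] for row in image]
```

Larger K gives more colour regions, which is exactly what the K=2 / K=3 / K=10 comparison on the slide shows.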
Pitfalls of K-Means
• The K-means algorithm is heuristic
  – It requires initial means
  – It does matter what you pick!
• K-means can get stuck
[Two stuck examples: one where K=1 should be better, one where K=2 should be better]
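A common workaround for the initialization sensitivity is to run K-means several times and keep the lowest-cost solution; a 1-D sketch (the data and the number of restarts are illustrative):

```python
import random

def kmeans_1d(xs, k, seed):
    # One K-means run from a seeded random initialization
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(100):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: (x - centers[c]) ** 2)].append(x)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    # Within-cluster sum of squared distances: lower is better
    cost = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, cost

xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
best_centers, best_cost = min((kmeans_1d(xs, 3, s) for s in range(20)),
                              key=lambda run: run[1])
```

A single unlucky run can get stuck with two centers in one group; taking the best of many restarts reliably finds the three true clusters here.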
Classification
Typical Recognition System
[Pipeline: image x → features F(x) → classifier → y, where y = +1 (bus) or −1 (not bus)]

• Extract features from an image
• A classifier will make a decision based on the extracted features
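The x → F(x) → y pipeline can be sketched end to end; both the feature (mean brightness) and the threshold rule are hypothetical stand-ins for real image features and a trained classifier.

```python
# A toy version of the pipeline x -> F(x) -> y.
def extract_features(image):
    # F(x): reduce the raw pixel grid to a single brightness value
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def classify(feature, threshold=128):
    # y: +1 ("bus") if the feature clears the threshold, else -1 ("not bus")
    return 1 if feature > threshold else -1

bright_image = [[200, 210], [220, 205]]
dark_image = [[10, 20], [15, 5]]
```

The point of the sketch is the separation of concerns: feature extraction turns raw data into a vector, and the classifier only ever sees that vector.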
Classification
• Supervised learning
• Requires data, AND LABELS
• Useful when you know what you’re looking for
Linear classifier
• Which of these linear separators is optimal?
Maximum Margin Classification
• Maximizing the margin is good according to intuition and PAC theory.
• It implies that only support vectors matter; other training examples are ignorable.
Classification Margin
• Distance from example xi to the separator is r = |wTxi + b| / ||w||
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between support vectors.
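The distance formula can be checked numerically on a toy 2-D example (the line and point are illustrative):

```python
import math

def distance_to_hyperplane(w, b, x):
    # r = |w^T x + b| / ||w||: distance from point x to the hyperplane w.x + b = 0
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi ** 2 for wi in w))

# Toy example: the line x + y - 1 = 0 and the point (1, 1)
r = distance_to_hyperplane([1.0, 1.0], -1.0, [1.0, 1.0])  # 1 / sqrt(2)
```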
Linear SVM Mathematically
• Let the training set {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (xi, yi):

  wTxi + b ≤ −ρ/2 if yi = −1
  wTxi + b ≥ ρ/2 if yi = 1

  or equivalently: yi(wTxi + b) ≥ ρ/2

• For every support vector xs the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is

  r = ys(wTxs + b) / ||w|| = 1 / ||w||

• Then the margin can be expressed through the (rescaled) w and b as:

  ρ = 2r = 2 / ||w||
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:

  Find w and b such that
  ρ = 2 / ||w|| is maximized
  and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1

  which can be reformulated as:

  Find w and b such that
  Φ(w) = ||w||² = wTw is minimized
  and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1
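The constraints and the resulting margin can be checked on a toy example; the separator (w, b) here is hand-picked for illustration, not obtained by actually solving the quadratic program.

```python
import math

# Two linearly separable classes on a line, with a hand-picked separator.
X = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
y = [-1, -1, 1, 1]
w, b = (1.0,), 0.0  # support vectors are x = -1 and x = 1

def functional_margin(w, b, x, label):
    # y_i (w^T x_i + b): must be >= 1 for every training example
    return label * (sum(wi * xi for wi, xi in zip(w, x)) + b)

margins = [functional_margin(w, b, x, t) for x, t in zip(X, y)]
geometric_margin = 2.0 / math.sqrt(sum(wi ** 2 for wi in w))  # rho = 2 / ||w||
```

Every constraint holds, it is tight exactly at the support vectors, and ρ = 2/||w|| gives the gap between them.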
Is this perfect?
Soft Margin Classification
• What if the training set is not linearly separable?
• Slack variables ξi can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.
Soft Margin Classification Mathematically
• The old formulation:

  Find w and b such that
  Φ(w) = wTw is minimized
  and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1

• The modified formulation incorporates slack variables:

  Find w and b such that
  Φ(w) = wTw + CΣξi is minimized
  and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0

• The parameter C can be viewed as a way to control overfitting: it “trades off” the relative importance of maximizing the margin and fitting the training data.
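The trade-off can be seen by evaluating Φ(w) = wTw + CΣξi directly, with ξi = max(0, 1 − yi(wTxi + b)) as the hinge slack. The data and the separator are illustrative; normally w and b would come from solving the optimization problem.

```python
def soft_margin_objective(w, b, X, y, C):
    # Phi(w) = w^T w + C * sum of slacks, slack_i = max(0, 1 - y_i (w^T x_i + b))
    slacks = [max(0.0, 1.0 - t * (sum(wi * xi for wi, xi in zip(w, x)) + b))
              for x, t in zip(X, y)]
    return sum(wi * wi for wi in w) + C * sum(slacks)

X = [(-2.0,), (-1.0,), (0.5,), (2.0,)]  # the point at 0.5 violates the margin
y = [-1, -1, 1, 1]
w, b = (1.0,), 0.0
small_C = soft_margin_objective(w, b, X, y, C=0.1)    # slack is cheap
large_C = soft_margin_objective(w, b, X, y, C=100.0)  # slack is expensive
```

With small C the margin violation barely affects the objective; with large C the same violation dominates it, pushing the optimizer toward fitting every training point.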
Exercise
• Exercise 1. Implement K-means
• Exercise 2. Play with the SVM’s C parameter