![Page 1: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/1.jpg)
Machine Learning Overview
Tamara Berg
CS 590-133 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan
![Page 2: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/2.jpg)
Announcements
• HW4 is due April 3
• Reminder: Midterm2 next Thursday
– Next Tuesday’s lecture topics will not be included (but material will be on the final so attend!)
• Midterm review – Monday, 5pm in FB009
![Page 3: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/3.jpg)
Midterm Topic List
Be able to define the following terms and answer basic questions about them:
Reinforcement learning
– Passive vs Active RL
– Model-based vs model-free approaches
– Direct utility estimation
– TD Learning and TD Q-learning
– Exploration vs exploitation
– Policy Search
– Application to Backgammon/Aibos/helicopters (at a high level)
Probability
– Random variables
– Axioms of probability
– Joint, marginal, conditional probability distributions
– Independence and conditional independence
– Product rule, chain rule, Bayes rule
![Page 4: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/4.jpg)
Midterm Topic List
Bayesian Networks General
– Structure and parameters
– Calculating joint and conditional probabilities
– Independence in Bayes Nets (Bayes Ball)
Bayesian Inference
– Exact Inference (Inference by Enumeration, Variable Elimination)
– Approximate Inference (Forward Sampling, Rejection Sampling, Likelihood Weighting)
– Networks for which efficient inference is possible
Naïve Bayes
– Parameter learning including Laplace smoothing
– Likelihood, prior, posterior
– Maximum likelihood (ML), maximum a posteriori (MAP) inference
– Application to spam/ham classification
– Application to image classification (at a high level)
![Page 5: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/5.jpg)
Midterm Topic List
HMMs
– Markov Property
– Markov Chains
– Hidden Markov Model (initial distribution, transitions, emissions)
– Filtering (forward algorithm)
Machine Learning
– Unsupervised/supervised/semi-supervised learning
– K Means clustering
– Training, tuning, testing, generalization
![Page 6: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/6.jpg)
Machine learning
Image source: https://www.coursera.org/course/ml
![Page 7: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/7.jpg)
Machine learning
• Definition
– Getting a computer to do well on a task without explicitly programming it
– Improving performance on a task based on experience
![Page 8: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/8.jpg)
Big Data!
![Page 9: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/9.jpg)
What is machine learning?
• Computer programs that can learn from data
• Two key components
– Representation: how should we represent the data?
– Generalization: the system should generalize from its past experience (observed data items) to perform well on unseen data items.
![Page 10: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/10.jpg)
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled examples
![Page 11: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/11.jpg)
![Page 12: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/12.jpg)
Clustering
– The assignment of objects into groups (aka clusters) so that objects in the same cluster are more similar to each other than to objects in different clusters.
– Clustering is a common technique for statistical data analysis, used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
![Page 13: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/13.jpg)
Euclidean distance, angle between data vectors, etc
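The two measures named on this slide can be sketched in a few lines. This is a minimal illustration, assuming data items are equal-length lists or tuples of numbers; the function names `euclidean_distance` and `angle_between` are made up for this example.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle_between(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)
```

Euclidean distance depends on vector length, while the angle only depends on direction, so the two can rank the same pairs of items differently.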
![Page 14: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/14.jpg)
![Page 15: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/15.jpg)
K-means clustering
• Want to minimize sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k
$$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} (x_i - m_k)^2$$
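One common way to reduce this objective is to alternate between assigning points to their nearest center and moving each center to the mean of its points. The sketch below is an illustration under assumptions, not the course's reference code: the names `kmeans` and `objective`, the random initialization from the data, and the fixed `seed` are all choices made for this example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Alternate assignment and update steps until assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initialize centers with k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assignments

def objective(points, centers, assignments):
    """Sum of squared distances between points and their assigned centers."""
    return sum(squared_distance(p, centers[a]) for p, a in zip(points, assignments))
```

The result depends on the initial centers, so in practice the procedure is often run several times and the run with the lowest objective is kept.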
![Page 16: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/16.jpg)
![Page 17: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/17.jpg)
![Page 18: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/18.jpg)
![Page 19: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/19.jpg)
![Page 20: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/20.jpg)
![Page 21: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/21.jpg)
![Page 22: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/22.jpg)
![Page 23: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/23.jpg)
![Page 24: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/24.jpg)
![Page 25: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/25.jpg)
![Page 26: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/26.jpg)
![Page 27: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/27.jpg)
![Page 28: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/28.jpg)
![Page 29: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/29.jpg)
Source: Hinrich Schutze
![Page 30: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/30.jpg)
Hierarchical clustering strategies
• Agglomerative clustering
– Start with each data point in a separate cluster
– At each iteration, merge two of the “closest” clusters
• Divisive clustering
– Start with all data points grouped into a single cluster
– At each iteration, split the “largest” cluster
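The agglomerative strategy can be sketched as follows. The slide leaves "closest" open; this example assumes single linkage (the distance between two clusters is the smallest distance between any pair of their points), which is only one of several possible choices. The function names are made up for this illustration.

```python
def single_link_distance(cluster_a, cluster_b):
    """Smallest squared distance between any pair of points (single linkage)."""
    return min(
        sum((x - y) ** 2 for x, y in zip(p, q))
        for p in cluster_a
        for q in cluster_b
    )

def agglomerative(points, num_clusters=1):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []  # record of merged pairs, i.e. the hierarchy
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_link_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges
```

Running it with `num_clusters=1` records every merge, and the list of merges is the full hierarchy of clusterings.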
![Page 31: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/31.jpg)
Produces a hierarchy of clusterings
![Page 32: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/32.jpg)
![Page 33: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/33.jpg)
Divisive Clustering
• Top-down (instead of bottom-up as in Agglomerative Clustering)
• Start with all data points in one big cluster
• Then recursively split clusters
• Eventually each data point forms a cluster on its own.
![Page 34: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/34.jpg)
Flat or hierarchical clustering?
• For high efficiency, use flat clustering (e.g. K-means)
• For deterministic results: hierarchical clustering
• When a hierarchical structure is desired: hierarchical algorithm
• Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)
Source: Hinrich Schutze
![Page 35: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/35.jpg)
Clustering in Action – example from computer vision
![Page 36: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/36.jpg)
Recall: Bag of Words Representation
• Represent document as a “bag of words”
![Page 37: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/37.jpg)
Bag-of-features models
Slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
![Page 38: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/38.jpg)
Bags of features for image classification
1. Extract features
![Page 39: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/39.jpg)
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
![Page 40: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/40.jpg)
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”
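Step 3 can be sketched once a vocabulary exists. This is a minimal illustration that assumes the visual vocabulary is a list of cluster centers (for example from K-means over features extracted in step 1) and that each image has already been turned into a list of feature vectors. The names `nearest_word` and `bag_of_features` are hypothetical.

```python
def nearest_word(feature, vocabulary):
    """Index of the visual word (cluster center) closest to a feature vector."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((f - v) ** 2 for f, v in zip(feature, vocabulary[k])),
    )

def bag_of_features(features, vocabulary):
    """Normalized histogram of visual-word frequencies for one image."""
    counts = [0] * len(vocabulary)
    for feature in features:
        counts[nearest_word(feature, vocabulary)] += 1
    total = sum(counts)
    # An image with no features keeps an all-zero histogram.
    return [c / total for c in counts] if total else counts
```

The histogram has one entry per visual word, so images with different numbers of features all map to vectors of the same length, which is what a classifier needs as input.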
![Page 41: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/41.jpg)
1. Feature extraction
![Page 42: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/42.jpg)
2. Learning the visual vocabulary
![Page 43: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/43.jpg)
2. Learning the visual vocabulary
Clustering
![Page 44: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/44.jpg)
2. Learning the visual vocabulary
Clustering → Visual vocabulary
![Page 45: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/45.jpg)
Example visual vocabulary
Fei-Fei et al. 2005
![Page 46: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/46.jpg)
3. Image representation
[Figure: histogram; axes: visual words, frequency]
![Page 47: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/47.jpg)
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled examples
![Page 48: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/48.jpg)
![Page 49: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/49.jpg)
![Page 50: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/50.jpg)
![Page 51: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/51.jpg)
![Page 52: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/52.jpg)
Example: Sentiment analysis
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
![Page 53: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/53.jpg)
Example: Image classification
[Figure: input images paired with desired output labels: apple, pear, tomato, cow, dog, horse]
![Page 55: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/55.jpg)
Example: Seismic data
[Figure: scatter plot of nuclear explosions and earthquakes; axes: body wave magnitude, surface wave magnitude]
![Page 56: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/56.jpg)
![Page 57: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/57.jpg)
The basic classification framework
y = f(x), where y is the output, f is the classification function, and x is the input
• Learning: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the parameters of the prediction function f
• Inference: apply f to a never before seen test example x and output the predicted value y = f(x)
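To make the learning and inference steps concrete, here is a small sketch that uses a nearest-class-mean classifier. That choice is an assumption made for illustration (the framework itself does not fix f): the "parameters" estimated during learning are one mean vector per class, and inference picks the class whose mean is closest to x. The name `learn` is hypothetical.

```python
from collections import defaultdict

def learn(training_set):
    """Learning: estimate one mean vector per class from labeled pairs (x, y)."""
    grouped = defaultdict(list)
    for x, y in training_set:
        grouped[y].append(x)
    means = {
        y: [sum(column) / len(xs) for column in zip(*xs)]
        for y, xs in grouped.items()
    }

    def f(x):
        """Inference: predict the class whose mean is closest to x."""
        return min(
            means,
            key=lambda y: sum((a - m) ** 2 for a, m in zip(x, means[y])),
        )

    return f
```

For example, `f = learn(training_set)` followed by `f(x)` on a new input mirrors the two bullets: the first call estimates parameters, the second applies the learned function.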
![Page 58: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/58.jpg)
Naïve Bayes classifier
$$f(\mathbf{x}) = \arg\max_y P(y \mid \mathbf{x}) = \arg\max_y P(y)\, P(\mathbf{x} \mid y) = \arg\max_y P(y) \prod_d P(x_d \mid y)$$
x_d: a single dimension or attribute of x
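The decision rule can be sketched for text, where each x_d is a word in a document. This is an illustration under assumptions rather than the course's implementation: it uses word counts per class, Laplace (add-alpha) smoothing, and log probabilities to avoid numerical underflow. The names `train_naive_bayes` and `predict` are made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, alpha=1.0):
    """documents: list of (list_of_words, label). Returns the model parameters."""
    label_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary, alpha

def predict(model, words):
    """Return argmax_y of log P(y) + sum_d log P(x_d | y)."""
    label_counts, word_counts, vocabulary, alpha = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # log prior, log P(y)
        denom = sum(word_counts[label].values()) + alpha * len(vocabulary)
        for w in words:
            if w in vocabulary:  # assumption: skip words never seen in training
                # Smoothed likelihood, log P(x_d | y)
                score += math.log((word_counts[label][w] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Smoothing matters because a single word that never appeared with a class would otherwise give that class probability zero, no matter what the other words say.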
![Page 59: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/59.jpg)
Example: Image classification
Input: Image → Representation → Classifier (e.g. Naïve Bayes, Neural Net, etc.) → Output: Predicted label (Car)
![Page 60: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/60.jpg)
Example: Training and testing
• Key challenge: generalization to unseen examples
Training set (labels known) Test set (labels unknown)
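A common way to estimate generalization is to hold out part of the labeled data and measure accuracy on it. The sketch below assumes that setup: the held-out labels are known to us but hidden from the learner. The names `train_test_split` and `accuracy`, the 25% test fraction, and the fixed seed are choices made for this example.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def accuracy(f, labeled_examples):
    """Fraction of examples (x, y) for which the prediction f(x) equals y."""
    correct = sum(1 for x, y in labeled_examples if f(x) == y)
    return correct / len(labeled_examples)
```

A large gap between accuracy on the training set and accuracy on the held-out set suggests the classifier has fit the training examples without generalizing.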
![Page 61: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/61.jpg)
![Page 62: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/62.jpg)
Some classification methods
• Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
![Page 63: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/63.jpg)
Classification … more soon
![Page 64: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/64.jpg)
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled examples
![Page 65: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/65.jpg)
Supervised learning has many successes
• recognize speech
• steer a car
• classify documents
• classify proteins
• recognize faces, objects in images
• ...
Slide Credit: Avrim Blum
![Page 66: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/66.jpg)
However, for many problems, labeled data can be rare or expensive: need to pay someone to do it, requires special testing, …
Unlabeled data is much cheaper.
Slide Credit: Avrim Blum
![Page 67: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/67.jpg)
Examples: speech, images, medical outcomes, customer modeling, protein sequences, web pages
Slide Credit: Avrim Blum
![Page 68: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/68.jpg)
[From Jerry Zhu]
Slide Credit: Avrim Blum
![Page 69: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/69.jpg)
Can we make use of cheap unlabeled data?
Slide Credit: Avrim Blum
![Page 70: Machine Learning Overview](https://reader038.vdocument.in/reader038/viewer/2022103006/568134b2550346895d9bcfcb/html5/thumbnails/70.jpg)
Semi-Supervised Learning
Can we use unlabeled data to augment a small labeled sample to improve learning?
But unlabeled data is missing the most important info!!
But maybe still has useful regularities that we can use.
But…But…But…
Slide Credit: Avrim Blum
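One simple way to exploit such regularities is self-training: fit a classifier on the labeled sample, give its most confident predictions on unlabeled points as pseudo-labels, and refit. The slides do not name this technique, so the sketch below is only an illustration under assumptions: data points are single numbers, the classifier is nearest class mean, there are at least two classes, and `margin` is a hypothetical confidence threshold.

```python
def class_means(labeled):
    """Mean of the (scalar) examples for each label."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def self_train(labeled, unlabeled, rounds=5, margin=1.0):
    """Grow the labeled set with confidently pseudo-labeled points, then refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        means = class_means(labeled)
        still_unlabeled = []
        for x in pool:
            distances = sorted((abs(x - m), y) for y, m in means.items())
            (best_dist, best_label), (second_dist, _) = distances[0], distances[1]
            # Confident only if x is clearly closer to one class mean than the next.
            if second_dist - best_dist >= margin:
                labeled.append((x, best_label))  # pseudo-label
            else:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(pool):
            break  # nothing new was labeled in this round
        pool = still_unlabeled
    return class_means(labeled)
```

The risk is that early mistakes get reinforced, since wrong pseudo-labels are treated as if they were true labels in later rounds.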