Supervised Learning Recap
Machine Learning
Last Time
• Support Vector Machines
• Kernel Methods
Today
• Review of Supervised Learning
• Unsupervised Learning
– (Soft) K-means clustering
– Expectation Maximization
– Spectral Clustering
– Principal Component Analysis
– Latent Semantic Analysis
Supervised Learning
• Linear Regression
• Logistic Regression
• Graphical Models
– Hidden Markov Models
• Neural Networks
• Support Vector Machines
– Kernel Methods
Major concepts
• Gaussian, Multinomial, Bernoulli Distributions
• Joint vs. Conditional Distributions
• Marginalization
• Maximum Likelihood
• Risk Minimization
• Gradient Descent
• Feature Extraction, Kernel Methods
Some favorite distributions
• Bernoulli
• Multinomial
• Gaussian
Maximum Likelihood
• Identify the parameter values that yield the maximum likelihood of generating the observed data.
• Take the partial derivative of the likelihood function
• Set it to zero
• Solve
• NB: the maximum likelihood parameters are the same as the maximum log likelihood parameters, since the log is monotonic.
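As a small sketch of this recipe (not from the slides): for Bernoulli data, setting the derivative of the log likelihood to zero gives the familiar closed form, the sample mean. The data and function name below are illustrative.

```python
# Hypothetical example: maximum likelihood for a Bernoulli parameter p.
# For flips x_1..x_n, the log likelihood is
#   sum_i [ x_i * log(p) + (1 - x_i) * log(1 - p) ].
# Setting d/dp = 0 and solving gives p_hat = mean(x).

def bernoulli_mle(xs):
    """Closed-form maximum likelihood estimate: the sample mean."""
    return sum(xs) / len(xs)

data = [1, 0, 1, 1, 0, 1, 1, 0]   # 5 heads out of 8 flips
print(bernoulli_mle(data))  # 0.625
```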
Maximum Log Likelihood
• Why do we like the log function?
• It turns products (difficult to differentiate) into sums (easy to differentiate):
– log(xy) = log(x) + log(y)
– log(x^c) = c log(x)
Risk Minimization
• Pick a loss function
– Squared loss
– Linear loss
– Perceptron (classification) loss
• Identify the parameters that minimize the loss function.
– Take the partial derivative of the loss function
– Set it to zero
– Solve
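When the zero of the gradient has no convenient closed form, the same minimization can be done iteratively. A minimal sketch (illustrative data and learning rate, not from the slides): gradient descent on the mean squared loss of a line fit.

```python
# Fit y = w*x + b by gradient descent on the mean squared loss.
# Data is generated from y = 2x + 1, so the minimizer should
# recover w ~ 2 and b ~ 1.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b, lr = 0.0, 0.0, 0.02
for _ in range(5000):
    # Partial derivatives of the mean squared loss w.r.t. w and b.
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * dw
    b -= lr * db

print(round(w, 3), round(b, 3))  # ~2.0, ~1.0
```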
Frequentists v. Bayesians
• Point estimates vs. Posteriors
• Risk Minimization vs. Maximum Likelihood
• L2-Regularization
– Frequentists: add a constraint on the size of the weight vector
– Bayesians: introduce a zero-mean prior on the weight vector
– The result is the same!
L2-Regularization
• Frequentists:
– Introduce a cost on the size of the weights
• Bayesians:
– Introduce a prior on the weights
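A minimal numeric sketch of why the two views coincide, assuming a 1-D model y = w·x with no intercept (toy data, illustrative penalty weight): the squared loss plus an L2 penalty lam·w² is, up to constants, the negative log of a Gaussian likelihood times a zero-mean Gaussian prior on w, so both are minimized by the same shrunken estimate.

```python
# Closed form for the 1-D ridge estimate:
#   w_hat = sum(x*y) / (sum(x^2) + lam)
# lam = 0 recovers ordinary least squares; lam > 0 shrinks w toward
# zero, exactly as a zero-mean Gaussian prior on w would.

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
lam = 0.5

sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

w_ols = sxy / sxx
w_ridge = sxy / (sxx + lam)

print(round(w_ols, 3), round(w_ridge, 3))  # the penalty shrinks w toward 0
```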
Types of Classifiers
• Generative Models
– Highest resource requirements
– Need to approximate the joint probability
• Discriminative Models
– Moderate resource requirements
– Typically fewer parameters to approximate than generative models
• Discriminant Functions
– Can be trained probabilistically, but the output does not include confidence information
Linear Regression
• Fit a line to a set of points
Linear Regression
• Extension to higher dimensions
– Polynomial fitting
– Arbitrary function fitting
• Wavelets
• Radial basis functions
• Classifier output
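A short sketch of the key point (illustrative data, not from the slides): polynomial fitting is still *linear* regression, because the model is linear in the weights once x is expanded into basis features [1, x, x²].

```python
import numpy as np

# Data generated from y = 1 + 2x + 3x^2; least squares on the
# polynomial-feature design matrix should recover w ~ [1, 2, 3].
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1 + 2 * x + 3 * x ** 2

# Design matrix with basis features [1, x, x^2].
X = np.stack([np.ones_like(x), x, x ** 2], axis=1)

# Ordinary least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 3))  # ~[1. 2. 3.]
```

The same construction works for any fixed basis (wavelets, radial basis functions, classifier outputs): only the columns of the design matrix change.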
Logistic Regression
• Fit Gaussians to the data for each class
• The decision boundary is where the PDFs cross
• There is no “closed form” solution for the zero of the gradient
• Use gradient descent instead
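A minimal sketch of that gradient descent loop (toy 1-D data and learning rate are illustrative): the gradient of the log likelihood for logistic regression is a sum of (prediction − label) terms, and we iterate rather than solve.

```python
from math import exp

# Logistic regression by gradient descent on toy 1-D data
# separable at x = 0: model P(y=1 | x) = sigmoid(w*x + b).

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    # Gradient of the negative log likelihood.
    dw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
    db = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys))
    w -= lr * dw
    b -= lr * db

preds = [1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs]
print(preds)  # matches ys on this separable toy set
```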
Graphical Models
• General way to describe the dependence relationships between variables.
• Junction Tree Algorithm allows us to efficiently calculate marginals over any variable.
Junction Tree Algorithm
• Moralization
– “Marry the parents”
– Make the graph undirected
• Triangulation
– Add edges so that no chordless cycle of length 4 or more remains
• Junction Tree Construction
– Identify separators such that the running intersection property holds
• Introduction of Evidence
– Pass messages around the junction tree to generate marginals
Hidden Markov Models
• Sequential Modeling
– Generative Model
• Relationship between observations and state (class) sequences
Perceptron
• Step function used for squashing
• Classifier-as-neuron metaphor
Perceptron Loss
• Classification Error vs. Sigmoid Error
– Loss is only calculated on mistakes
– Perceptrons use strictly classification error
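A sketch of the resulting learning rule (toy 2-D data and epoch count are illustrative): because loss is counted only on mistakes, the perceptron updates its weights only when the current prediction is wrong.

```python
# Perceptron: step-function "neuron" with mistake-driven updates.

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Linearly separable 2-D toy data, labels in {-1, +1}.
data = [([2.0, 1.0], 1), ([1.5, 2.0], 1),
        ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]

w, b = [0.0, 0.0], 0.0
for _ in range(100):                  # epochs; converges if separable
    mistakes = 0
    for x, y in data:
        if predict(w, b, x) != y:     # update only on a mistake
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y
            mistakes += 1
    if mistakes == 0:
        break

preds = [predict(w, b, x) for x, _ in data]
print(preds)  # [1, 1, -1, -1]
```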
Neural Networks
• Interconnected Layers of Perceptrons or Logistic Regression “neurons”
Neural Networks
• There are many possible configurations of neural networks
– Vary the number of layers
– Vary the size of layers
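As a tiny illustration of layering (hand-picked weights, purely hypothetical): one hidden layer of logistic “neurons” feeding a logistic output can compute XOR, which no single perceptron can represent.

```python
from math import exp

# A minimal two-layer network computing XOR with hand-picked weights.

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def neuron(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_net(x1, x2):
    h1 = neuron([10.0, 10.0], -5.0, [x1, x2])    # ~OR of the inputs
    h2 = neuron([10.0, 10.0], -15.0, [x1, x2])   # ~AND of the inputs
    return neuron([10.0, -10.0], -5.0, [h1, h2]) # OR and-not AND => XOR

print([round(xor_net(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 1, 1, 0]
```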
Support Vector Machines
• Maximum Margin Classification
– Figure: small margin vs. large margin separators
Support Vector Machines
• Optimization Function
• Decision Function
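The slide's exact equations were on the slide images; as a hedged stand-in, here is one standard formulation, the soft-margin primal objective for a linear SVM through the origin, minimized by subgradient descent (toy data, lam, and lr are illustrative choices):

```python
# min_w  lam * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i))
# Decision function: f(x) = sign(w . x)

data = [([2.0, 2.0], 1), ([1.0, 2.5], 1),
        ([-1.5, -1.0], -1), ([-2.0, -2.0], -1)]
lam, lr, n = 0.01, 0.05, len(data)

w = [0.0, 0.0]
for _ in range(2000):
    grad = [2 * lam * wi for wi in w]        # gradient of the L2 term
    for x, y in data:
        if y * sum(wi * xi for wi, xi in zip(w, x)) < 1:  # hinge active
            grad = [gi - y * xi / n for gi, xi in zip(grad, x)]
    w = [wi - lr * gi for wi, gi in zip(w, grad)]

preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
         for x, _ in data]
print(preds)  # [1, 1, -1, -1]
```

Points whose hinge term stays active at the optimum are the support vectors: they alone determine w.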
Visualization of Support Vectors
Questions?
• Now would be a good time to ask questions about Supervised Techniques.
Clustering
• Identify discrete groups of similar data points
• Data points are unlabeled
Recall K-Means
• Algorithm
– Select K, the desired number of clusters
– Initialize K cluster centroids
– For each point in the data set, assign it to the cluster with the closest centroid
– Update each centroid based on the points assigned to its cluster
– If any data point has changed clusters, repeat
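The loop above can be sketched directly (toy 1-D data and initial centroids are illustrative):

```python
# K-means: alternate nearest-centroid assignment and centroid update
# until no assignment changes.

def kmeans(points, centroids, max_iters=100):
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: each centroid becomes the mean of its points.
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:      # no assignment can change: converged
            break
        centroids = new
    return centroids

print(kmeans([0.5, 1.0, 1.5, 9.0, 10.0, 11.0], [0.0, 5.0]))  # [1.0, 10.0]
```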
k-means output
Soft K-means
• In k-means, we force every data point to exist in exactly one cluster.
• This constraint can be relaxed.
– Hard assignment minimizes the entropy of the cluster assignment
Soft k-means example
Soft k-means
• We still define a cluster by a centroid, but we calculate the centroid as the weighted mean of all the data points
• Convergence is based on a stopping threshold rather than changed assignments
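A sketch of those two points (the stiffness parameter beta, the softmax-style responsibilities, and the toy data are illustrative choices, not necessarily the slide's exact formulation):

```python
from math import exp

# Soft k-means: every point carries a responsibility for every cluster,
# and each centroid is the responsibility-weighted mean of ALL points.

def soft_kmeans(points, centroids, beta=2.0, iters=50):
    for _ in range(iters):
        # Soft assignment: responsibilities from negative squared distance.
        resp = []
        for p in points:
            ws = [exp(-beta * (p - c) ** 2) for c in centroids]
            z = sum(ws)
            resp.append([w / z for w in ws])
        # Update: weighted mean over all points.
        centroids = [
            sum(r[k] * p for r, p in zip(resp, points)) /
            sum(r[k] for r in resp)
            for k in range(len(centroids))
        ]
    return centroids

print(soft_kmeans([0.5, 1.0, 1.5, 9.0, 10.0, 11.0], [0.0, 5.0]))
```

In practice the loop would stop when centroids move less than a threshold; a fixed iteration count keeps the sketch short. Large beta recovers hard k-means.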
Gaussian Mixture Models
• Rather than identifying clusters by “nearest” centroids, fit a set of k Gaussians to the data.
GMM example
Gaussian Mixture Models
• Formally, a mixture model is the weighted sum of a number of PDFs, where the weights are determined by a mixing distribution.
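A direct sketch of that definition for the Gaussian case (component parameters below are illustrative): p(x) = Σ_k π_k · N(x | μ_k, σ_k²), with the mixing weights π_k summing to 1.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of component densities."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Two components with mixing weights 0.3 and 0.7.
p = mixture_pdf(0.0, [0.3, 0.7], [0.0, 4.0], [1.0, 2.0])
print(p)
```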
Graphical Models with Unobserved Variables
• What if you have variables in a graphical model that are never observed?
– Latent Variables
• Training latent variable models is an unsupervised learning application
– Figure: example model with variables “laughing”, “amused”, “sweating”, “uncomfortable”
Latent Variable HMMs
• We can cluster sequences using an HMM with unobserved state variables
• We will train the latent variable models using Expectation Maximization
Expectation Maximization
• Both the training of GMMs and of Gaussian models with latent variables are accomplished using Expectation Maximization
– Step 1: Expectation (E-step)
• Evaluate the “responsibilities” of each cluster with the current parameters
– Step 2: Maximization (M-step)
• Re-estimate parameters using the existing “responsibilities”
• Related to k-means
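The two steps can be sketched for a two-component 1-D Gaussian mixture (data, initialization, and iteration count are illustrative; a real implementation would check log-likelihood convergence):

```python
from math import exp, sqrt, pi

# EM for a 1-D mixture of two Gaussians.
# E-step: responsibilities; M-step: re-estimate weights, means, variances.

def normal(x, mu, var):
    return exp(-0.5 * (x - mu) ** 2 / var) / sqrt(2 * pi * var)

xs = [0.4, 0.6, 1.0, 1.2, 8.8, 9.0, 9.5, 10.1]
pis, mus, vars_ = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]

for _ in range(50):
    # E-step: responsibility of each component for each point.
    resp = []
    for x in xs:
        ws = [p * normal(x, m, v) for p, m, v in zip(pis, mus, vars_)]
        z = sum(ws)
        resp.append([w / z for w in ws])
    # M-step: re-estimate parameters from the responsibilities.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        pis[k] = nk / len(xs)
        mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
        vars_[k] = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk

print([round(m, 2) for m in mus])  # means settle near the two data clumps
```

The k-means connection: if the responsibilities are hardened to 0/1 and the variances held fixed and equal, the E-step becomes nearest-centroid assignment and the M-step becomes the centroid update.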
Questions
• One more time for questions on supervised learning…
Next Time
• Gaussian Mixture Models (GMMs)
• Expectation Maximization