![Page 1: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/1.jpg)
Machine Learning Basics
Classification and Clustering
Humberto Marchezi
November 2015
![Page 2: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/2.jpg)
Definitions
Machine learning combines pattern recognition, artificial intelligence and a bit of data mining
It solves a given task without being explicitly programmed to do so; instead, it makes predictions from the data provided
Machine learning algorithms can be divided into 3 categories:
Supervised learning
Unsupervised learning
Reinforcement learning
Problem types
Classification
Regression
Clustering
etc.
![Page 3: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/3.jpg)
Algorithms
Supervised Learning
Naive Bayesian Classifier
Linear/Polynomial/Logistic/Multinomial Regression
Neural Networks
etc.
Unsupervised Learning
K-means / K-medoids
Principal Component Analysis
Gaussian Distribution (Anomaly Detection)
etc.
![Page 4: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/4.jpg)
Naive Bayes Classifier
Classify information based on probabilistic model score
Score for a category $C_k$ with features $f_1, f_2, f_3, \ldots, f_n$:
$$p(C_k \mid f_1, f_2, \ldots, f_n) = \frac{P(C_k)\, p(f_1 \mid C_k)\, p(f_2 \mid C_k) \cdots p(f_n \mid C_k)}{p(f_1)\, p(f_2) \cdots p(f_n)}$$
For a text classifier, the features above are the individual words in the sentence (bag-of-words model)
Also known as the multinomial naive Bayes classifier
![Page 5: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/5.jpg)
Naive Bayes Classifier
Concrete Example
Ingredients
2 tbsp salt
lemon
Instructions
Cut lemon
Pour salt
![Page 6: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/6.jpg)
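Naive Bayes Classifier
Concrete Example

Ingredients

| word | occurrences |
| --- | --- |
| 2 | 1 |
| tbsp | 1 |
| salt | 1 |
| lemon | 1 |
| total | 4 |
| examples | 2 |

Instructions

| word | occurrences |
| --- | --- |
| cut | 1 |
| lemon | 1 |
| pour | 1 |
| salt | 1 |
| total | 4 |
| examples | 2 |

Global

| word | occurrences |
| --- | --- |
| 2 | 1 |
| tbsp | 1 |
| salt | 2 |
| lemon | 2 |
| cut | 1 |
| pour | 1 |
| total | 8 |
| examples | 4 |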
![Page 7: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/7.jpg)
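Naive Bayes Classifier
Concrete Example

Ingredients (category probability 1/2)

| word | probability |
| --- | --- |
| 2 | 1/4 |
| tbsp | 1/4 |
| salt | 1/4 |
| lemon | 1/4 |

Instructions (category probability 1/2)

| word | probability |
| --- | --- |
| cut | 1/4 |
| lemon | 1/4 |
| pour | 1/4 |
| salt | 1/4 |

Global

| word | probability |
| --- | --- |
| 2 | 1/8 |
| tbsp | 1/8 |
| salt | 2/8 |
| lemon | 2/8 |
| cut | 1/8 |
| pour | 1/8 |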
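
The counts and probabilities in the example can be reproduced with a short script. This is only a sketch: the `tokenize` helper (lowercase, split on whitespace) and all variable names are assumptions made for illustration, not part of the original slides.

```python
from collections import Counter
from fractions import Fraction

# Training lines from the example, grouped by category.
training = {
    "ingredients": ["2 tbsp salt", "lemon"],
    "instructions": ["Cut lemon", "Pour salt"],
}

def tokenize(line):
    # Assumed bag-of-words tokenizer: lowercase, split on whitespace.
    return line.lower().split()

def word_counts(lines):
    # Count how often each word occurs across the given lines.
    return Counter(word for line in lines for word in tokenize(line))

def to_probabilities(counts):
    # Divide each word count by the total number of words.
    total = sum(counts.values())
    return {word: Fraction(n, total) for word, n in counts.items()}

per_category = {c: word_counts(lines) for c, lines in training.items()}
global_counts = sum(per_category.values(), Counter())

# Category probability: share of training examples in each category.
n_examples = sum(len(lines) for lines in training.values())
priors = {c: Fraction(len(lines), n_examples) for c, lines in training.items()}

likelihoods = {c: to_probabilities(counts) for c, counts in per_category.items()}
evidence = to_probabilities(global_counts)

for category in training:
    print(category, priors[category], dict(likelihoods[category]))
print("global", dict(evidence))
```

`Fraction` keeps the values exact, so 2/8 is shown in its reduced form 1/4.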
![Page 8: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/8.jpg)
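Naive Bayes Classifier
Concrete Example
Query ’1 tbsp salt’
Ingredients (I)
$$p(I \mid \text{'1'}, \text{'tbsp'}, \text{'salt'}) = \frac{P(I)\, p(\text{'1'} \mid I)\, p(\text{'tbsp'} \mid I)\, p(\text{'salt'} \mid I)}{p(\text{'1'})\, p(\text{'tbsp'})\, p(\text{'salt'})} = \frac{0.5 \times 0.0001 \times 0.25 \times 0.25}{0.0001 \times 0.125 \times 0.25} = 1$$
Instructions (D)
$$p(D \mid \text{'1'}, \text{'tbsp'}, \text{'salt'}) = \frac{P(D)\, p(\text{'1'} \mid D)\, p(\text{'tbsp'} \mid D)\, p(\text{'salt'} \mid D)}{p(\text{'1'})\, p(\text{'tbsp'})\, p(\text{'salt'})} = \frac{0.5 \times 0.0001 \times 0.0001 \times 0.25}{0.0001 \times 0.125 \times 0.25} = 0.0004$$
Result: Ingredients (since it has the highest probability)
Note: 0.0001 is the probability of an unknown element (cannot be zero!)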
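
The calculation for the query can be checked with a few lines of Python. This is a minimal sketch using the probabilities from the example; the function names `score` and `classify` and the constant `UNKNOWN` are illustrative assumptions.

```python
UNKNOWN = 0.0001  # probability used for a word that was never seen

priors = {"ingredients": 0.5, "instructions": 0.5}
likelihoods = {
    "ingredients": {"2": 0.25, "tbsp": 0.25, "salt": 0.25, "lemon": 0.25},
    "instructions": {"cut": 0.25, "lemon": 0.25, "pour": 0.25, "salt": 0.25},
}
evidence = {"2": 0.125, "tbsp": 0.125, "salt": 0.25,
            "lemon": 0.25, "cut": 0.125, "pour": 0.125}

def score(category, words):
    # P(C) * product of p(word | C), divided by the product of p(word).
    numerator = priors[category]
    denominator = 1.0
    for word in words:
        numerator *= likelihoods[category].get(word, UNKNOWN)
        denominator *= evidence.get(word, UNKNOWN)
    return numerator / denominator

def classify(text):
    # Score every category and return the best one with all scores.
    words = text.lower().split()
    scores = {c: score(c, words) for c in priors}
    return max(scores, key=scores.get), scores

label, scores = classify("1 tbsp salt")
print(label, scores)
```

The word '1' is unknown in both categories, so it contributes the same factor everywhere; the decision comes from 'tbsp', which only appears under Ingredients.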
![Page 9: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/9.jpg)
Naive Bayes Classifier
Examples
Classify email as spam or not spam
Document type classification
Document sections classification
Image Classification
![Page 10: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/10.jpg)
K-Means
Unsupervised learning algorithm to identify clusters
Find clusters for unlabeled data
Algorithm
k-means
  Choose K examples as initial centroids
  While centroids move
    1) Assign each example xi to its closest centroid Ki and store the distance ci
    2) Recompute each centroid Ki as the mean of the examples in its cluster
  end
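
The loop above can be sketched in plain Python as follows. This is one possible reading of the pseudocode, not the author's implementation: points are assumed to be tuples of numbers, Euclidean distance is assumed, and the helper names and the `max_iter` safety limit are additions for illustration.

```python
import math

def closest(point, centroids):
    # Return (index, distance) of the centroid nearest to point.
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        # Step 1: assign each example to its closest centroid.
        assignments = [closest(p, centroids) for p in points]
        # Step 2: recompute each centroid from its cluster members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, (idx, _) in zip(points, assignments) if idx == k]
            new_centroids.append(mean(members) if members else old)
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, assignments

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = kmeans(points, [points[0], points[3]])
labels = [idx for idx, _ in assignments]
print(centroids, labels)
```

An empty cluster keeps its previous centroid here; that is a design choice of this sketch, since the slides do not say how to handle that case.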
![Page 11: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/11.jpg)
K-Means
K-means example steps to converge to final solution
Figure: Taken from https://en.wikipedia.org/wiki/File:K_Means_Example_Step_2.svg
![Page 12: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/12.jpg)
K-Means
How to avoid sub-optimal results?
Figure: Generated from http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
![Page 13: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/13.jpg)
K-Means
How to avoid sub-optimal results?
k-means
  Repeat N times
    Randomly choose K examples as initial centroids
    While centroids move
      1) Assign each example xi to its closest centroid Ki and store the distance ci
      2) Recompute each centroid Ki as the mean of the examples in its cluster
    end
    Calculate result cost (average distance of examples to their centroids)
    If result cost is lower than the best cost so far, keep this result
  end (repeat)
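
A minimal sketch of the repeated-run idea, assuming Euclidean distance and points given as tuples. The names `run_kmeans` and `best_of_n`, the `seed` parameter and the `max_iter` limit are illustrative assumptions; the compact k-means is included only so the example runs on its own.

```python
import math
import random

def run_kmeans(points, k, rng, max_iter=100):
    # Randomly choose K examples as initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster (an empty cluster keeps its centroid).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Result cost: average distance of the examples to their closest centroid.
    cost = sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)
    return centroids, cost

def best_of_n(points, k, n_runs=10, seed=0):
    rng = random.Random(seed)
    best_centroids, best_cost = None, math.inf
    for _ in range(n_runs):
        centroids, cost = run_kmeans(points, k, rng)
        if cost < best_cost:  # keep the cheapest result seen so far
            best_centroids, best_cost = centroids, cost
    return best_centroids, best_cost

points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (5, 10), (6, 10), (5, 11)]
best_centroids, best_cost = best_of_n(points, k=3)
print(best_centroids, round(best_cost, 3))
```

Fixing the seed makes the runs repeatable; in practice the number of runs trades computation time against the chance of escaping a poor initialization.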
![Page 14: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/14.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: K-means elbow method
![Page 15: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/15.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: Solution for k=1
![Page 16: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/16.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: Solution for k=2
![Page 17: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/17.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: Solution for k=3
![Page 18: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/18.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: Solution for k=4
![Page 19: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/19.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: Solution for k=5
![Page 20: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/20.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: Cluster costs
![Page 21: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/21.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Elbow method
  Repeat for clusters K = 1, 2, 3, ..., n
    Run K-Means
    Compute average cost for K clusters, $\frac{1}{n}\sum_{i=1}^{n} c_i$ (simplifying, $\sum_{i=1}^{n} c_i$)
  end (repeat)
  Plot cost for each K and choose the one located at the "elbow"
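
The cost-per-K table behind the elbow plot can be sketched as below. This is an illustration under assumptions: Euclidean distance, points as tuples, a few random restarts per K, and hypothetical names (`average_cost`, `elbow_costs`). The compact k-means is repeated only to keep the example self-contained, and the costs are printed instead of plotted.

```python
import math
import random

def average_cost(points, centroids):
    # (1/n) * sum of c_i, where c_i is the distance from x_i to its closest centroid.
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)

def kmeans(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def elbow_costs(points, max_k, n_runs=5, seed=0):
    # For each K, keep the lowest average cost over a few random restarts.
    rng = random.Random(seed)
    costs = {}
    for k in range(1, max_k + 1):
        costs[k] = min(average_cost(points, kmeans(points, k, rng))
                       for _ in range(n_runs))
    return costs

points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 10), (6, 10)]
costs = elbow_costs(points, max_k=len(points))
for k, cost in costs.items():
    print(f"K={k}: average cost {cost:.2f}")
```

With K equal to the number of distinct examples every example is its own centroid and the cost is zero, which is why the curve has to be read for its bend rather than its minimum.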
![Page 22: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/22.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: K-means elbow method
![Page 23: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/23.jpg)
K-Means
Elbow Method - How to identify the number of clusters?
Figure: K-means elbow method
It is not always possible to find an elbow (e.g. when the examples are evenly distributed)
Best practice: associate the number of clusters with a business meaning
![Page 24: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/24.jpg)
K-Means
Examples
Figure: Customer segmentation with k-means
![Page 25: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/25.jpg)
K-Means
Examples
Figure: Identify related news and articles
![Page 26: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/26.jpg)
K-Means
Examples
Figure: Image color reduction - http://opencv-python-tutroals.readthedocs.org/en/latest/_images/oc_color_quantization.jpg
![Page 27: Machine Learning Basics](https://reader031.vdocument.in/reader031/viewer/2022021921/58f36cd11a28ab24318b463b/html5/thumbnails/27.jpg)
References and Resources
1 Coursera Machine Learning
https://www.coursera.org/learn/machine-learning
2 Naive Bayes Classifier - Wikipedia
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
3 K-Means Clustering - Wikipedia
https://en.wikipedia.org/wiki/K-means_clustering
4 Visualizing K-Means Clustering
http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
5 Naive Bayes for Image Processing
http://www.cs.ubc.ca/~lowe/papers/12mccannCVPR.pdf
6 Document Clustering with K-Means
http://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm