machine learning for dummies
DESCRIPTION
Introduction to Machine Learning, The ML Philosophy, Advantages, Applications and ML AlgorithmsTRANSCRIPT
![Page 1: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/1.jpg)
Introduction to Machine LearningThe PhilosophyAdvantagesApplicationsAlgorithms
Venkat Reddy
![Page 2: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/2.jpg)
The Learning
123456
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
2
![Page 3: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/3.jpg)
Machine Learning
• How did our brain process the images?
• How did the grouping happen?
• Human brain processed the given images - learning
• After learning the brain simply looked at the new image and compared with the groups classified the image to the closest group - Classification
• If a machine has to perform the same operation we use Machine Learning
• We write programs for learning and then classification, this is nothing but machine learning
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
3
![Page 4: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/4.jpg)
Machine Learning
Learning
algorithm
TRAININGDATA Answer
Trained
machine
Query
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
4
![Page 5: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/5.jpg)
Example: Classification using ML
• Image processing:
• Machine learning can be used in classification Images & objects in an image
• Ship
• Water
• Rock
• Iron object
• Fiber Object etc.,
• Does this really help?
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
5
![Page 6: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/6.jpg)
Image processing Example
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
6
![Page 7: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/7.jpg)
Machine learning application
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
7
![Page 8: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/8.jpg)
The main advantage of ML
• Learning and writing an algorithm
• Its easy for human brain but it is tough for a machine. It takes some time and good amount of training data for machine to accurately classify objects
• Implementation and automation
• This is easy for a machine. Once learnt a machine can process one million images without any fatigue where as human brain can’t
• That’s why ML with bigdata is a deadly combination
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
8
![Page 9: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/9.jpg)
Applications of Machine Learning
• Banking / Telecom / Retail
• Identify:
• Prospective customers
• Dissatisfied customers
• Good customers
• Bad payers
• Obtain:
• More effective advertising
• Less credit risk
• Fewer fraud
• Decreased churn rate
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
9
![Page 10: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/10.jpg)
Applications of Machine Learning
• Biomedical / Biometrics• Medicine:
• Screening
• Diagnosis and prognosis
• Drug discovery
• Security:
• Face recognition
• Signature / fingerprint / iris verification
• DNA fingerprinting
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
10
![Page 11: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/11.jpg)
Applications of Machine Learning
• Computer / Internet• Computer interfaces:
• Troubleshooting wizards
• Handwriting and speech
• Brain waves
• Internet• Hit ranking
• Spam filtering
• Text categorization
• Text translation
• Recommendation
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
11
![Page 12: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/12.jpg)
Main Parts in Machine Learning process• Any Machine Learning algorithm has three parts
1. The Output
2. The Objective Function or Performance Matrix
3. The Input
Email Spam Classification
• The Output: Categorize email messages as spam or legitimate.
• Objective Function: Percentage of email messages correctly classified.
• The Input: Database of emails, some with human-given labels
Hand Writing Recognition
• The Output: Recognizing hand-written words
• Objective Function: Percentage of words correctly classified
• The Input: Database of human-labeled images of handwritten words
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
12
![Page 13: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/13.jpg)
Some Machine Learning Methods
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
13
![Page 14: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/14.jpg)
Everything should be made as simple as possible, but no simpler
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
14
![Page 15: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/15.jpg)
Bayes Network
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
15
![Page 16: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/16.jpg)
The Bayes Method
A(2%) B(90%) C(8%)
1% 60% 2%
• A, B, C Generate bolts, given a machine what is the probability that it will generate a defective bolt.
• Now given a defective bolt, can we tell which machine has generated it?
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
16
![Page 17: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/17.jpg)
Several Machine in Several Layers
• Several intermediate machines• Each has its on error chance, This problem requires a Bayesian network
to solve • Does C has anything to do if the bolt is finally coming out of F?
A(20%) B(50%) C(30%)
1% 5% 2%
D(60%) E(40%)
F G H
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
17
![Page 18: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/18.jpg)
Artificial Neural Networks
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
18
![Page 19: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/19.jpg)
The Philosophy of Neural Network
• If we have to predict whether we win in a chess game
• Can we use logistic regression?
• Since there can be almost infinite number of intermediate steps its almost impossible to predict the probability of winning
• If each move is considered as an input variable then there are almost (countable)infinite moves. A coefficient for each of them is always negligible.
Neural network uses a simple technique, instead of predicting the final output they predict the intermediate output that lead to a win previously, this will be an input for predicting the next move
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
19
![Page 20: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/20.jpg)
Biological Motivation for NN
• Aartificial neural networks are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs (possibly the outputs of other units) and produces a single real-valued output (which may become the input to many other units).
• The human brain is estimated to contain a densely interconnected network of approximately 1011 neurons, each connected, on average, to 104 others.
• Neuron activity is typically inhibited through connections to other neurons.
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
20
![Page 21: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/21.jpg)
ANN in Self Driving Car
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
21
![Page 22: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/22.jpg)
PCA and FA
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
22
![Page 23: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/23.jpg)
Look at below Cricket Team Players Data
Player Avg Runs Total wickets
Height Notouts
HighestScore
BestBowling
1 45 3 5.5 15 120 1
2 50 34 5.2 34 209 2
3 38 0 6 36 183 0
4 46 9 6.1 78 160 3
5 37 45 5.8 56 98 1
6 32 0 5.10 89 183 0
7 18 123 6 2 35 4
8 19 239 6.1 3 56 5
9 18 96 6.6 5 87 7
10 16 83 5.9 7 32 7
11 17 138 5.10 9 12 6
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
23
![Page 24: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/24.jpg)
Describe the players
• If we have to describe or segregate the players do we really need Avg Runs, Total wickets, Height, Not outs, Highest Score, Best bowling?
• Can we simply take
• Avg Runs+ Not outs+ Highest Score as one factor?
• Total wickets+ Height+ Best bowling as second factor?
Defining these imaginary variables or a linear combination of variables to reduce the dimensions is called PCA or FA
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
24
![Page 25: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/25.jpg)
Look at below Cricket Team Players Data
Player Avg Runs Total wickets
Height Notouts
HighestScore
BestBowling
1 45 3 5.5 15 120 1
2 50 34 5.2 34 209 2
3 38 0 6 36 183 0
4 46 9 6.1 78 160 3
5 37 45 5.8 56 98 1
6 32 0 5.10 89 183 0
7 18 123 6 2 35 4
8 19 239 6.1 3 56 5
9 18 96 6.6 5 87 7
10 16 83 5.9 7 32 7
11 17 138 5.10 9 12 6
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
25
![Page 26: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/26.jpg)
SVM Classification
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
26
![Page 27: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/27.jpg)
Linear Classifier
Consider a two dimensional dataset
with two classes
How would we classify this dataset?
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
27
![Page 28: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/28.jpg)
Linear Classifier
Both of the lines can be linear classifiers.
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
28
![Page 29: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/29.jpg)
Linear Classifier
There are many lines that can be linear classifiers.
Which one is the optimal classifier.
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
29
![Page 30: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/30.jpg)
Classifier Margin
Define the margin of a linear classifier as the width that theboundary could be increased by before hitting a datapoint.
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
30
![Page 31: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/31.jpg)
Maximum Margin
• The maximum margin linear classifier is the linear classifier
• with the maximum margin.
• This is the simplest kind of SVM (Called Linear SVM)
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
31
![Page 32: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/32.jpg)
Support Vectors
• Examples closest to the hyper plane are support vectors.
• Margin ρ of the separator is the distance between support vectors.
ρw . x + b > 0
w . x + b < 0
w . x + b = 0
Support Vectors
f(x) = sign(w . x + b)
red is +1
blue is -1
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
32
![Page 33: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/33.jpg)
Difference Traditional Models & Machine Learning• ML is more heuristic• Focused on improving performance of a learning agent• Also looks at real-time self learning and robotics – areas
not part of data mining• Some algorithms are too heuristic that there is no one
right or wrong answer• Lets take K-Means example
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
33
![Page 34: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/34.jpg)
K-Means clustering
Overall population
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
34
![Page 35: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/35.jpg)
K-Means clustering
Fix the number of clusters
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
35
![Page 36: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/36.jpg)
K-Means clustering
Calculate the distance of each case from all clusters
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
36
![Page 37: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/37.jpg)
K-Means clustering
Re calculate the cluster centers
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
37
![Page 38: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/38.jpg)
K-Means clustering
Continue till there is no significant change between two iterations
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
38
![Page 39: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/39.jpg)
Calculating the distanceWeight
Cust1 68
Cust2 72
Cust3 100
Weight Age
Cust1 68 25
Cust2 72 70
Cust3 100 28
Weight Age Income
Cust1 68 25 60,000
Cust2 72 70 9,000
Cust3 100 28 62,000
Which of the two customers are similar?
Which of the two customers are similar now?
Which two of the customers are similar in this case?
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
39
![Page 40: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/40.jpg)
Distance Measures
• Euclidean distance• City-block (Manhattan) distance
• Chebychev similarity• Minkowski distance• Mahalanobis distance• Maximum distance• Cosine similarity• Simple correlation between observations• Minimum distance• Weighted distance• Venkat Reddy distance
• Not sure all these measures will result in same clusters in the above example.
• So, same ML algorithms with different results. Generally this is not the case with conventional methods
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
40
![Page 41: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/41.jpg)
Thank you
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
41
![Page 42: Machine Learning for Dummies](https://reader034.vdocument.in/reader034/viewer/2022042518/54c66ac84a795944538b4571/html5/thumbnails/42.jpg)
More ..
Big
dat
a A
nal
ytic
s
Ven
kat
Red
dy
42
http://www.slideshare.net/21_venkat/presentations
https://www.facebook.com/groups/SASanalysts/