Machine Learning using Python
http://ocl.space
What is Machine Learning?
Types of Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
Supervised Learning
y = f(X)
X is the features/inputs
y is the target/output
f(X) is the learning function
Types :
● Regression
● Classification
Unsupervised Learning
● We have input data (X) but no corresponding output variable (y).
● The goal is to model the distribution of the data in order to learn more about the data.
● Types of unsupervised learning :
--> Clustering
--> Association
Other learning methods...
● Reinforcement learning
● Semi-supervised learning
● Transfer learning
Regression
● A predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
● It is used for forecasting, time series modelling and finding the causal effect relationship between the variables.
● It indicates the significant relationships between the dependent variable and the independent variables.
● It indicates the strength of the impact of multiple independent variables on a dependent variable.
● Types of regression : Linear, Logistic, Polynomial, Stepwise, Ridge, Lasso and ElasticNet
Classification
● A classification problem is when the output variable is a category.
● Example : Email filtering (Spam / Not Spam)
Clustering and Association
● The aim is to segregate groups with similar traits and assign them into clusters.
● Types of Clustering :
--> Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not.
--> Soft Clustering: In soft clustering, instead of assigning each data point to exactly one cluster, each data point is given a probability or likelihood of belonging to each cluster.
● When we want to discover rules that describe portions of the input data, it is known as an association problem.
Linear Regression
● It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s).
● Here, we establish the relationship between the independent and dependent variables by fitting a best-fit line.
● This best-fit line is known as the regression line and is represented by the linear equation
Y = a * X + b
Y – Dependent variable
a – Slope
X – Independent variable
b – Intercept
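A minimal sketch of fitting the line Y = a * X + b with ordinary least squares, using only the Python standard library. The function name fit_line and the sample data are assumptions made for illustration; they are not part of the slides.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = a * X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Hypothetical data that lies exactly on Y = 2 * X + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)
```

With noisy data the points no longer lie on the line, and a and b are the values that minimise the sum of squared vertical distances to it.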
Logistic Regression
● It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).
● It predicts the probability of occurrence of an event by fitting data to a logistic function, whose inverse is the logit function.
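A small sketch of how a fitted logistic regression model turns an input into a probability. The coefficients a and b below are hypothetical values chosen for illustration, not the result of any training run.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function: the log-odds of probability p."""
    return math.log(p / (1.0 - p))


def predict_proba(x, a, b):
    """Probability of the positive class for a single feature x."""
    return sigmoid(a * x + b)


# Hypothetical coefficients; a real model would learn these from data.
a, b = 1.5, -3.0
p = predict_proba(2.0, a, b)
label = 1 if p >= 0.5 else 0   # threshold the probability to get a class
print(p, label)
```

The linear part a * x + b is the log-odds of the event; the logistic function converts it into a probability.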
Overfitting & Underfitting
● Overfitting happens when a model performs very well on training data but does not perform well on unseen data.
● Underfitting happens when a model performs poorly on both the training data and unseen data.
Cross Validation
● A method to test how well a model performs on unseen data.
● Types of Cross Validation methods :
--> Hold out method
--> K-fold method
--> Leave-one-out cross validation
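A sketch of how the K-fold method can split sample indices into training and test sets. The helper name k_fold_indices is an assumption for illustration; in practice the data is usually shuffled first, which this sketch leaves out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold. Setting k equal to n_samples gives leave-one-out cross validation, and a single train/test split corresponds to the hold out method.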
Learning = Representation + Evaluation + Optimization
Naive Bayes
● Naive Bayes is a supervised learning algorithm based on Bayes' theorem.
● The word naive comes from the assumption of independence among features.
● We can write Bayes' theorem as follows :
P(y | x) = P(x | y) * P(y) / P(x)
Where,
P(x) is the prior probability of a feature.
P(x | y) is the probability of a feature given the target. It's also known as the likelihood.
P(y) is the prior probability of a target, or of a class in the case of classification.
P(y | x) is the posterior probability of the target given the feature.
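A worked sketch of the posterior calculation, posterior = likelihood * prior / evidence, for a spam filter. All probabilities below are made-up numbers for illustration, and the function name naive_posterior is an assumption.

```python
def naive_posterior(likelihoods_by_class, priors):
    """Return P(class | features), assuming features are independent given the class.

    likelihoods_by_class maps each class to a list of P(feature_i | class).
    priors maps each class to P(class).
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for p in likelihoods_by_class[cls]:
            score *= p               # the "naive" step: multiply per-feature likelihoods
        scores[cls] = score
    evidence = sum(scores.values())  # P(x), summed over all classes
    return {cls: s / evidence for cls, s in scores.items()}


# Hypothetical numbers: 40% of emails are spam; the word "offer" appears
# in 50% of spam emails and in 10% of non-spam emails.
priors = {"spam": 0.4, "not spam": 0.6}
likelihoods = {"spam": [0.5], "not spam": [0.1]}
result = naive_posterior(likelihoods, priors)
print(result)
```

Here P(spam | "offer") = 0.5 * 0.4 / (0.5 * 0.4 + 0.1 * 0.6) = 0.20 / 0.26, roughly 0.77. Adding more words only appends more entries to each likelihood list.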
Support Vector Machines (SVMs)
● SVMs are among the most widely used supervised learning algorithms.
● They are effective in high-dimensional spaces and are memory efficient as well.
● We plot each data item as a point in n-dimensional space and perform classification by finding the hyperplane that best separates the two classes.
● Many different hyperplanes can separate the two classes.
● The optimal hyperplane is obtained by maximizing the margin.
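A sketch of what maximizing the margin means. It does not train an SVM; it only compares two hand-picked candidate hyperplanes w . x + b = 0 and keeps the one whose closest point is farthest away. The points, labels and candidates are hypothetical.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w . x + b = 0.

    Labels are +1 or -1. A positive result means every point is on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in zip(points, labels)
    )


# Hypothetical, linearly separable 2-D data.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate hyperplanes (w, b); both separate the classes.
candidates = {
    "A": ((1.0, 0.0), -3.0),   # vertical line x = 3
    "B": ((1.0, 1.0), -5.5),   # diagonal line x + y = 5.5
}
margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, best)
```

An actual SVM solver searches over all hyperplanes rather than a fixed list, but the quantity it maximizes is this same margin.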
Decision Tree
● Decision Tree is a supervised learning algorithm that can be used for classification as well as regression problems.
● Here we split the population into increasingly homogeneous subsets by asking a series of questions.
● Example : To decide what to do on a particular day.
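A sketch of how a tree can judge which question gives the more homogeneous subsets. It uses Gini impurity as the measure, which is an assumption (the slides do not name a criterion), and the "what to do today" data is made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels are the same, higher when mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, question):
    """Weighted impurity of the two subsets produced by a yes/no question."""
    yes = [label for row, label in zip(rows, labels) if question(row)]
    no = [label for row, label in zip(rows, labels) if not question(row)]
    n = len(labels)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


# Hypothetical days and what was done on each.
rows = [
    {"raining": True, "weekend": True},
    {"raining": True, "weekend": False},
    {"raining": False, "weekend": True},
    {"raining": False, "weekend": False},
]
labels = ["stay in", "stay in", "go out", "go out"]

by_rain = split_impurity(rows, labels, lambda r: r["raining"])
by_weekend = split_impurity(rows, labels, lambda r: r["weekend"])
print(by_rain, by_weekend)
```

Asking "is it raining?" yields perfectly homogeneous subsets in this toy data, so a tree would ask that question first.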
Random Forest
● Random Forest is one of the most common types of Ensemble Learning.
● It is a collection of decision trees.
● To classify a new object based on attributes, each tree gives a classification and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
● Random forests have several advantages: they are fast to train and require little input preparation.
● One disadvantage of random forests is that the model may become too large.
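A sketch of the voting step only. The three "trees" are hand-written rules standing in for trained decision trees, which is an assumption for illustration; in a real random forest each tree is learned from a random sample of the data.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Let every tree vote and return the class with the most votes."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each maps a sample to a class.
trees = [
    lambda s: "stay in" if s["raining"] else "go out",
    lambda s: "go out" if s["weekend"] else "stay in",
    lambda s: "stay in" if s["raining"] else "go out",
]

sample = {"raining": False, "weekend": False}
prediction = forest_predict(trees, sample)
print(prediction)
```

Two of the three trees vote "go out" for this sample, so the forest predicts "go out" even though one tree disagrees.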
K-nearest Neighbors (KNN)
● KNN can be used for both classification and regression problems.
● It stores all available cases and classifies a new case by a majority vote of its k nearest neighbors.
● KNN is computationally expensive at prediction time, because each new case must be compared against the stored cases.
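A minimal KNN classifier sketch using Euclidean distance (math.dist, available from Python 3.8). The distance measure and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Classify query by a majority vote among its k nearest training points."""
    # The distance to every stored point is computed for each query,
    # which is where the cost of KNN comes from.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D training data with two classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(train_points, train_labels, (2, 2), k=3))
print(knn_classify(train_points, train_labels, (6, 5), k=3))
```

For regression, the vote would be replaced by an average of the neighbors' target values.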
K-means clustering
● K-means is one of the simplest unsupervised learning algorithms used for clustering problems.
● Our goal is to group objects based on the similarity of their features.
● The basic idea behind K-means is that we define k centroids, one for each cluster.
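A sketch of the K-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The starting centroids are passed in explicitly and the iteration count is fixed, both simplifying assumptions; the data is made up.

```python
import math


def k_means(points, centroids, n_iters=10):
    """Run a fixed number of assignment/update steps and return (centroids, clusters)."""
    for _ in range(n_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visible groups, and k = 2 starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, so practical implementations usually try several random starts.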
Neural Networks
● A Neural Network is an information processing system: we pass some input to the network, some processing happens, and we get some output.
● Neural Networks are inspired by the biological connections between neurons and by how information processing happens in the brain.
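A sketch of the "input, processing, output" idea: each artificial neuron computes a weighted sum of its inputs and passes it through an activation function. The sigmoid activation, the network shape and all weights are assumptions picked for illustration; a real network learns its weights from data.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def tiny_network(inputs):
    """Two inputs -> two hidden neurons -> one output neuron (hypothetical weights)."""
    hidden = [
        neuron(inputs, [0.5, -0.5], 0.0),
        neuron(inputs, [-0.5, 0.5], 0.0),
    ]
    return neuron(hidden, [1.0, 1.0], -1.0)


output = tiny_network([1.0, 1.0])
print(output)
```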
Let’s get started...