machine learning using python

Upload: suraj-kumar-jana

Post on 24-Jan-2018

TRANSCRIPT

Page 1: Machine Learning using Python
Page 2: Machine Learning using Python

http://ocl.space

Page 3: Machine Learning using Python

What is Machine Learning?

Page 4: Machine Learning using Python

Types of Machine Learning

● Supervised Learning

● Unsupervised Learning

● Reinforcement Learning

Page 5: Machine Learning using Python

Supervised Learning

y = f(X)

X is the features/inputs
y is the target/output
f(X) is the learning function

Types :

● Regression

● Classification

Page 6: Machine Learning using Python

Unsupervised Learning

● We have input data (X) but no corresponding output variable (y).

● The goal is to model the distribution of the data in order to learn more about the data.

● Types of unsupervised learning :

--> Clustering

--> Association

Page 7: Machine Learning using Python

Other learning methods...

● Reinforcement learning

● Semi-supervised learning

● Transfer learning

Page 8: Machine Learning using Python

Regression

● A predictive modelling technique which investigates the relationship between a dependent variable (target) and one or more independent variables (predictors).

● It is used for forecasting, time series modelling and finding cause-and-effect relationships between variables.

● It indicates the significant relationships between the dependent variable and the independent variables.

● It indicates the strength of impact of multiple independent variables on a dependent variable.

● Types of regression : Linear, Logistic, Polynomial, Stepwise, Ridge, Lasso and ElasticNet

Page 9: Machine Learning using Python

Classification

● A classification problem is when the output variable is a category.

● Example : Email filtering, where each message is labelled Spam or Not Spam

Page 10: Machine Learning using Python

Clustering and Association

● The aim is to segregate groups with similar traits and assign them into clusters.

● Types of Clustering :

--> Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not.

--> Soft Clustering: In soft clustering, instead of putting each data point into a single cluster, each data point is assigned a probability (or likelihood) of belonging to each cluster.

● When we want to discover rules that describe portions of the input data, it is known as an association problem.
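
To make the association idea concrete, here is a minimal sketch that scores one candidate rule on made-up shopping baskets. The support and confidence measures, the function names and the data are assumptions added for illustration; the slides do not name a specific rule-scoring method.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
print(rule_support, rule_confidence)
```

A rule such as "bread implies butter" is usually kept only when both numbers clear thresholds chosen by the analyst.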

Page 11: Machine Learning using Python

Linear Regression

● It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s).

● Here, we establish a relationship between the independent and dependent variables by fitting a best-fit line.

● This best-fit line is known as the regression line and is represented by the linear equation

Y = a * X + b

Y : Dependent variable
a : Slope
X : Independent variable
b : Intercept
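
The slope a and intercept b can be estimated from data with ordinary least squares. The sketch below is one way to do it in plain Python; the function name fit_line and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a * X + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Made-up points that lie exactly on Y = 2 * X + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)
```

With noisy data the same formulas return the line that minimises the sum of squared vertical distances to the points.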

Page 12: Machine Learning using Python

Linear Regression

Page 13: Machine Learning using Python

Logistic Regression

● It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).

● It predicts the probability of occurrence of an event by fitting the data to a logistic (sigmoid) function, whose inverse is the logit function.
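
As a rough sketch of how such a model can be fitted, the code below trains a one-feature logistic regression with plain gradient descent on the log loss. The learning rate, epoch count, function names and the "hours studied" data are all assumptions for illustration, not values from the slides.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Gradient descent on the average log loss for p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied -> passed the exam (1) or not (0)
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)

p_low = sigmoid(w * 1.0 + b)   # predicted probability of passing after 1 hour
p_high = sigmoid(w * 4.0 + b)  # predicted probability of passing after 4 hours
print(p_low, p_high)
```

To get a yes/no answer, the predicted probability is compared with a threshold, commonly 0.5.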

Page 14: Machine Learning using Python

Logistic Regression

Page 15: Machine Learning using Python

Overfitting & Underfitting

● Overfitting happens when a model fits the training data very closely, including its noise, but does not perform well on unseen data.

● Underfitting happens when a model performs poorly on both the training data and unseen data.
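
A quick way to spot overfitting is to compare the error on the training data with the error on data the model has not seen. The sketch below uses a model that simply memorises its training points (a 1-nearest-neighbour lookup); the data and function names are assumptions made up for illustration.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Return the target of the closest training point (pure memorisation)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]          # 2 * x plus alternating noise of +1 / -1
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]      # noise-free targets for unseen inputs

train_error = mse(
    [nearest_neighbor_predict(train_x, train_y, x) for x in train_x], train_y
)
test_error = mse(
    [nearest_neighbor_predict(train_x, train_y, x) for x in test_x], test_y
)
print(train_error, test_error)
```

The training error is zero because every training point is reproduced exactly, noise included, while the error on the unseen points is clearly larger. An underfitting model would show a large error on both sets.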

Page 16: Machine Learning using Python

Cross Validation

● A method to test how well a model performs on unseen data.

● Types of Cross Validation methods :

--> Hold out method

--> K-fold method

--> Leave-one-out cross validation
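
The three methods differ only in how the data is divided. The sketch below generates the index splits for K-fold; the helper name k_fold_indices is an assumption, and in practice the data is usually shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(6, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)

# Leave-one-out is the special case k == n_samples
leave_one_out = list(k_fold_indices(4, 4))
```

Each sample is used for testing exactly once. The hold out method corresponds to using a single such split, and the model's score under K-fold is the average over the k test folds.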

Page 17: Machine Learning using Python

Learning = Representation + Evaluation + Optimization

Page 18: Machine Learning using Python

Naive Bayes

● Naive Bayes is a supervised learning algorithm which is based on Bayes' theorem.

● The word naive comes from the assumption of independence among features.

● We can write Bayes' theorem as follows :

P(y | x) = P(x | y) * P(y) / P(x)

Where,
P(x) is the prior probability of a feature (the evidence).
P(x | y) is the probability of a feature given the target. It's also known as the likelihood.
P(y) is the prior probability of a target, or of a class in the case of classification.
P(y | x) is the posterior probability of the target given the feature.
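
Here is a small worked example of the theorem with the naive independence assumption: the likelihoods of the individual features are multiplied together. All probabilities, word names and the function name are made-up assumptions for illustration.

```python
import math

# Hypothetical prior probabilities P(y) of each class
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical likelihoods P(x | y) of seeing a word in a message of each class
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}


def posteriors(words):
    """Return P(y | words) for each class y, treating words as independent given y."""
    scores = {
        y: priors[y] * math.prod(likelihoods[y][w] for w in words)
        for y in priors
    }
    evidence = sum(scores.values())  # P(x), which normalises the scores
    return {y: score / evidence for y, score in scores.items()}


one_word = posteriors(["free"])
two_words = posteriors(["free", "meeting"])
print(one_word)
print(two_words)
```

Seeing only "free" makes spam the more likely class, while adding "meeting" shifts the balance towards ham.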

Page 19: Machine Learning using Python

Support Vector Machines (SVMs)

● SVMs are among the most widely used supervised learning algorithms.

● They are effective in high-dimensional spaces and memory efficient as well, because the decision boundary depends only on a subset of the training points (the support vectors).

● We plot each data item as a point in n-dimensional space and perform classification by finding the hyperplane that best separates the two classes.

● Many different hyperplanes can separate the same two classes.


● The optimal hyperplane is obtained by maximizing the margin.
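
To illustrate what "maximizing the margin" means, the sketch below measures the margin of a few hand-picked separating hyperplanes and keeps the widest one. It is not an SVM solver: the candidate hyperplanes, the data and the function name are assumptions for illustration, and a real SVM searches over all hyperplanes.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from the hyperplane w . x + b = 0 to any point.

    Labels are +1 / -1. The result is negative if a point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Made-up 2-D points from two classes
points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Hand-picked candidate hyperplanes as (w, b) pairs
candidates = {
    "x1 = 3": ((1, 0), -3),
    "x1 = 2.5": ((1, 0), -2.5),
    "x1 + x2 = 5.5": ((1, 1), -5.5),
}

margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins)
print("widest margin among the candidates:", best)
```

All three candidates classify the points correctly, but they leave different amounts of room between the boundary and the nearest point.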

Page 20: Machine Learning using Python

Support Vector Machines (SVMs)

Page 21: Machine Learning using Python

Decision Tree

● Decision Tree is a supervised learning algorithm which can be used for classification as well as regression problems.

● Here we split the population into increasingly homogeneous sets by asking a series of questions.

● Example : To decide what to do on a particular day.
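
One common way to choose which question to ask first is to pick the feature whose split leaves the purest groups. The sketch below uses Gini impurity for that; the impurity measure, the feature names and the toy "what to do today" data are assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when all labels are the same, larger when they are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, feature):
    """Weighted Gini impurity of the groups obtained by splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())


# Hypothetical days and what was done on each
rows = [
    {"outlook": "sunny", "weekend": "yes"},
    {"outlook": "sunny", "weekend": "no"},
    {"outlook": "rainy", "weekend": "yes"},
    {"outlook": "rainy", "weekend": "no"},
]
labels = ["go out", "go out", "stay in", "stay in"]

impurities = {f: split_impurity(rows, labels, f) for f in ("outlook", "weekend")}
best_feature = min(impurities, key=impurities.get)
print(impurities, best_feature)
```

Asking about the outlook separates the two activities perfectly, so it would become the root question of the tree. The same procedure is then repeated inside each group.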

Page 22: Machine Learning using Python

Decision Tree

Page 23: Machine Learning using Python

Random Forest

● Random Forest is one of the most common types of Ensemble Learning.

● It is a collection of decision trees.

● To classify a new object based on attributes, each tree gives a classification and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).

● Random forests have several advantages: for example, they are fast to train and require little input preparation.

● One of the disadvantages of random forests is that the model may become too large.
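
The voting step can be sketched in a few lines. The three "trees" below are hand-written rules standing in for trained decision trees, and the feature names are assumptions for illustration; a real random forest would learn each tree from a random sample of the data.

```python
from collections import Counter


def forest_predict(trees, x):
    """Each tree votes for a class; the forest returns the class with the most votes."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained trees (one simple rule each)
trees = [
    lambda x: "spam" if x["links"] > 2 else "ham",
    lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham",
    lambda x: "spam" if x["length"] < 20 else "ham",
]

short_linky_message = {"links": 5, "caps_ratio": 0.1, "length": 10}
long_shouty_message = {"links": 0, "caps_ratio": 0.9, "length": 200}

print(forest_predict(trees, short_linky_message))
print(forest_predict(trees, long_shouty_message))
```

Using an odd number of trees avoids ties in a two-class problem.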

Page 24: Machine Learning using Python

K-nearest Neighbors (KNN)

● KNN can be used for both classification and regression problems.

● It stores all available cases and classifies a new case by a majority vote of its k nearest neighbors.

● KNN is computationally expensive at prediction time, because each new case has to be compared with the stored cases.
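
A minimal sketch of KNN classification follows, assuming Euclidean distance and k = 3. The data points, labels and function name are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by a majority vote among its k nearest training points."""
    # Distance to every stored case: this full scan is why prediction is expensive
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(train_points, train_labels, (2, 2)))
print(knn_classify(train_points, train_labels, (6, 5)))
```

For regression, the vote would be replaced by the average of the neighbors' target values.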

Page 25: Machine Learning using Python

K-means clustering

● K-means is one of the simplest unsupervised learning algorithms used for clustering problems.

● Our goal is to group objects based on the similarity of their features.

● The basic idea behind K-means is that we define k centroids, one for each cluster.
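
A common way to use the centroids is to alternate two steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. The sketch below assumes 2-D points, fixed starting centroids and a fixed number of iterations, all chosen for illustration.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate assignment and update steps; returns (centroids, clusters)."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9)]
initial_centroids = [(0, 0), (10, 10)]   # hypothetical starting guesses, k = 2
centroids, clusters = k_means(points, initial_centroids)
print(centroids)
print(clusters)
```

The result depends on the starting centroids, so in practice the algorithm is often run several times with different random starts.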

Page 26: Machine Learning using Python

Neural Networks

● A Neural Network is an information processing system: we pass some input to the network, some processing happens, and we get some output.

● Neural Networks are inspired by the biological connections between neurons and by how information processing happens in the brain.
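
The "input, processing, output" flow can be sketched as a forward pass through a tiny network with one hidden layer. The weights below are hand-picked (an assumption for illustration, not trained values) so that the network computes XOR of two binary inputs.

```python
import math


def sigmoid(z):
    """Activation function that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked weights: hidden neuron 1 acts like OR, hidden neuron 2 like NAND,
# and the output neuron like AND, which together give XOR
HIDDEN_W = [[20, 20], [-20, -20]]
HIDDEN_B = [-10, 30]
OUTPUT_W = [[20, 20]]
OUTPUT_B = [-30]


def forward(inputs):
    """Pass the inputs through the hidden layer and then the output layer."""
    hidden = dense(inputs, HIDDEN_W, HIDDEN_B)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, round(forward(pair)))
```

Training a network means finding such weights automatically from examples instead of setting them by hand.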

Page 27: Machine Learning using Python

Let’s get started...