Top 5 Algorithms Used in Data Science
TRANSCRIPT
![Page 1: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/1.jpg)
www.edureka.co/data-science
Top 5 Algorithms Used in Data Science
![Page 2: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/2.jpg)
What are we going to learn today?
At the end of the session you will be able to understand:
What Data Science is
What Data Scientists do
The top 5 Data Science algorithms: Decision Tree, Random Forest, Association Rule Mining, Linear Regression, K-Means Clustering
A demo of the K-Means Clustering algorithm
![Page 3: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/3.jpg)
Data Science
![Page 4: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/4.jpg)
What is Data Science?
Data science is the practice of extracting meaningful and actionable knowledge from data.
![Page 5: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/5.jpg)
Who are Data Scientists?
Data scientists are people who combine a multitude of skills and who love playing with data.
![Page 6: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/6.jpg)
Data Science from 1000 feet
[Diagram: Data Science, together with Visualization, Data Engineering, Statistics, Advanced Computing and Domain Expertise]
![Page 7: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/7.jpg)
Arsenal of a Data Scientist
Data Science
Data Architecture. Tool: Hadoop
Machine Learning. Tools: Mahout, Weka, Spark MLlib
Analytics. Tools: R, Python
Note that evaluating different machine learning algorithms is part of a data scientist's daily work, so it is very important for a data scientist to have a good grip on various machine learning algorithms.
![Page 8: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/8.jpg)
Machine Learning
Machine Learning is a method of teaching computers to make and improve predictions based on data.
Machine learning is a huge field, with hundreds of different algorithms for solving a myriad of different problems.
Supervised Learning: the categories of the data are already known.
Unsupervised Learning: the learning process attempts to find appropriate categories for the data.
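To make the supervised/unsupervised distinction concrete, here is a minimal Python sketch. The data values, the category names and both helper functions are hypothetical, chosen only for illustration; the nearest-neighbour lookup and the "largest gap" split are stand-ins for real algorithms, not methods taken from the slides.

```python
# Supervised: every training example comes with a known category (label).
labelled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]

def nearest_neighbour(x):
    """Predict the label of x by copying the label of the closest training value."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - x))
    return label

# Unsupervised: only the values are given; the grouping has to be discovered.
unlabelled = [1.0, 1.5, 8.0, 9.0]

def split_at_largest_gap(values):
    """Split sorted values into two groups at the widest gap between neighbours."""
    ordered = sorted(values)
    gaps = [(b - a, i) for i, (a, b) in enumerate(zip(ordered, ordered[1:]))]
    _, i = max(gaps)
    return ordered[: i + 1], ordered[i + 1 :]

predicted = nearest_neighbour(2.0)
groups = split_at_largest_gap(unlabelled)
```

In the first half the categories are supplied up front; in the second half the two groups emerge from the data alone.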
![Page 9: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/9.jpg)
Decision Tree
![Page 10: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/10.jpg)
Decision Tree Example
Training Data
![Page 11: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/11.jpg)
Decision Tree, Root: Student (Step 1)
[Tree diagram: root node Student with branches No and Yes]
![Page 12: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/12.jpg)
Decision Tree, Root: Student (Step 2)
[Tree diagram: the Student = No branch splits on Income (High, Medium); the Student = Yes branch splits on Income (Low, Medium, High)]
![Page 13: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/13.jpg)
Decision Tree, Root: Student (Step 3)
[Tree diagram: as in Step 2, with YES leaves added under the Medium and High Income branches of Student = Yes]
![Page 14: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/14.jpg)
Decision Tree, Root: Student (Step 4)
[Tree diagram: under Student = No, Income High splits on Age (<= 30, 31…40) and Income Medium splits on CR (Fair, Excellent); under Student = Yes, Income Low splits on CR (Fair, Excellent), while Income Medium and Income High end in YES]
![Page 15: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/15.jpg)
Decision Tree, Root: Student (Step 5)
[Tree diagram: as in Step 4, with leaves added: under Student = No and Income High, Age <= 30 gives No and Age 31…40 gives Yes; under Student = Yes and Income Low, CR Fair gives Yes]
![Page 16: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/16.jpg)
Decision Tree, Root: Student (Step 6)
[Tree diagram: the complete tree; the remaining CR branches (Fair, Excellent) split on Age (<= 30, 31…40, > 40) and every path ends in a Yes or No leaf]
![Page 17: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/17.jpg)
Decision Tree, Root: Student
Classification rules:
1. student(no) ^ income(high) ^ age(<=30) => buys_computer(no)
2. student(no) ^ income(high) ^ age(31…40) => buys_computer(yes)
3. student(no) ^ income(medium) ^ CR(fair) ^ age(>40) => buys_computer(yes)
4. student(no) ^ income(medium) ^ CR(fair) ^ age(<=30) => buys_computer(no)
5. student(no) ^ income(medium) ^ CR(excellent) ^ age(>40) => buys_computer(no)
6. student(no) ^ income(medium) ^ CR(excellent) ^ age(31…40) => buys_computer(yes)
7. student(yes) ^ income(low) ^ CR(fair) => buys_computer(yes)
8. student(yes) ^ income(low) ^ CR(excellent) ^ age(31…40) => buys_computer(yes)
9. student(yes) ^ income(low) ^ CR(excellent) ^ age(>40) => buys_computer(no)
10. student(yes) ^ income(medium) => buys_computer(yes)
11. student(yes) ^ income(high) => buys_computer(yes)
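The classification rules on this slide can be read as a lookup table: the first rule whose conditions all match a record decides the outcome. The sketch below encodes the eleven rules that way in Python. The names `RULES` and `buys_computer`, the key `cr` (for the slide's CR attribute) and the age bucket spelling `"31-40"` are assumptions made for this illustration; records that no rule covers return `None`.

```python
# Each entry pairs a rule's conditions with its outcome, in slide order (rules 1 to 11).
RULES = [
    ({"student": "no", "income": "high", "age": "<=30"}, "no"),
    ({"student": "no", "income": "high", "age": "31-40"}, "yes"),
    ({"student": "no", "income": "medium", "cr": "fair", "age": ">40"}, "yes"),
    ({"student": "no", "income": "medium", "cr": "fair", "age": "<=30"}, "no"),
    ({"student": "no", "income": "medium", "cr": "excellent", "age": ">40"}, "no"),
    ({"student": "no", "income": "medium", "cr": "excellent", "age": "31-40"}, "yes"),
    ({"student": "yes", "income": "low", "cr": "fair"}, "yes"),
    ({"student": "yes", "income": "low", "cr": "excellent", "age": "31-40"}, "yes"),
    ({"student": "yes", "income": "low", "cr": "excellent", "age": ">40"}, "no"),
    ({"student": "yes", "income": "medium"}, "yes"),
    ({"student": "yes", "income": "high"}, "yes"),
]

def buys_computer(record):
    """Return 'yes' or 'no' from the first matching rule, or None if no rule applies."""
    for conditions, outcome in RULES:
        if all(record.get(key) == value for key, value in conditions.items()):
            return outcome
    return None

customer = {"student": "yes", "income": "medium", "cr": "fair", "age": ">40"}
decision = buys_computer(customer)  # matched by rule 10
```

Because the rules come from a tree, at most one rule matches any record, so the order of the table does not change the result.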
![Page 18: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/18.jpg)
Random Forest
![Page 19: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/19.jpg)
Random Forest: Example
Suppose you're very indecisive about watching a movie, “Edge of Tomorrow”.
You can do one of the following:
1. Ask your best friend whether you will like the movie.
2. Or ask a group of your friends.
![Page 20: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/20.jpg)
Random Forest: Example
In order to answer, your best friend first needs to figure out what movies you like, so you give her a bunch of movies and tell her whether you liked each one or not (i.e., you give her a labelled training set).
Example: Do you like movies starring Emily Blunt?
[Decision tree diagram: Ask Best Friend. Question nodes: “Is it based on a true incident?”, “Does Emily Blunt star in it?”, “Is she the main lead?”; the Yes/No branches lead to the leaves “Yes, you will like the movie” and “No, you will not like the movie”]
![Page 21: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/21.jpg)
Random Forest: Example
But your best friend might not always generalize your preferences very well (i.e., she overfits).
To get more accurate recommendations, you'd like to ask a bunch of your friends, e.g. Friend #1, Friend #2 and Friend #3, and have them vote on whether you will like a movie.
The majority of the votes decides the final outcome.
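The voting step can be sketched in a few lines of Python. The three friend functions below are hypothetical, hand-written stand-ins for individually trained decision trees, and the movie record (its `genre` and `cast` fields) is an assumed format used only for this illustration.

```python
from collections import Counter

# Hypothetical friends: each applies a different simple rule of thumb.
def friend_1(movie):
    return "yes" if movie["genre"] == "action" else "no"

def friend_2(movie):
    return "yes" if "Emily Blunt" in movie["cast"] else "no"

def friend_3(movie):
    return "no" if "Tom Cruise" in movie["cast"] else "yes"

def majority_vote(movie, friends):
    """Ask every friend and return the answer given most often."""
    votes = Counter(friend(movie) for friend in friends)
    return votes.most_common(1)[0][0]

movie = {"title": "Edge of Tomorrow", "genre": "action",
         "cast": ["Tom Cruise", "Emily Blunt"]}
result = majority_vote(movie, [friend_1, friend_2, friend_3])  # two "yes" votes beat one "no"
```

Using an odd number of voters avoids ties in a yes/no vote.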
![Page 22: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/22.jpg)
Random Forest: Example
[Diagram: Friend 1, Friend 2 and Friend 3 each build their own decision tree from what they know about you. Statements shown in the trees: “You didn't like ‘Far and Away’”, “You liked ‘Oblivion’”, “You like action movies”, “You like Tom Cruise”, “You like his pairing with Emily Blunt”, “You did not like ‘Top Gun’”, “You loved ‘Godzilla’”, “You hate Tom Cruise”. Each path ends in “Yes, you will like the movie” or “No, you will not like the movie”]
![Page 23: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/23.jpg)
What is Random Forest?
Random Forest is an ensemble classifier made using many decision tree models.
What are ensemble models?
Ensemble models combine the results from different models.
The result from an ensemble model is usually better than the result from any one of the individual models.
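As a rough sketch of the ensemble idea, the Python code below trains many one-split trees ("stumps") on bootstrap samples of a tiny data set and lets them vote. This is a deliberate simplification: a full random forest grows deeper trees and also picks a random subset of features at each split, which is skipped here because the toy data has a single feature. All names and data values are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split tree: predict 1 when x > threshold, else 0."""
    xs = sorted({x for x, _ in sample})
    # Candidate thresholds: below the minimum, between neighbours, above the maximum.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

    def errors(threshold):
        return sum((x > threshold) != bool(y) for x, y in sample)

    return min(candidates, key=errors)

def train_forest(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(int(x > threshold) for threshold in forest)
    return votes.most_common(1)[0][0]

# Toy labelled data: small values belong to class 0, large values to class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
        (10, 1), (11, 1), (12, 1), (13, 1), (14, 1)]
forest = train_forest(data)
```

Each stump sees a slightly different sample, so the thresholds differ from tree to tree; the vote smooths out those differences.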
![Page 24: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/24.jpg)
Association Rule Mining
![Page 25: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/25.jpg)
![Page 26: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/26.jpg)
Association Rule Mining
Association Rule Mining is a popular and well-researched method for discovering interesting relations between variables in large data sets.
For example, a rule found in the sales data of a supermarket might indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat.
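How strong such a rule is can be measured with support and confidence, the two standard measures in association rule mining (they are not named on the slide). The Python sketch below computes both for the onions-and-potatoes rule; the five transactions are made-up illustration data, not real sales figures.

```python
# Hypothetical supermarket baskets, one set of items per transaction.
transactions = [
    {"onions", "potatoes", "hamburger meat"},
    {"onions", "potatoes", "hamburger meat", "beer"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "hamburger meat"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

antecedent = {"onions", "potatoes"}
consequent = {"hamburger meat"}
rule_support = support(antecedent | consequent, transactions)        # 2 of 5 baskets
rule_confidence = confidence(antecedent, consequent, transactions)   # 2 of the 3 baskets with onions and potatoes
```

A rule is usually kept only when both values exceed thresholds chosen by the analyst.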
![Page 27: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/27.jpg)
Linear Regression
![Page 28: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/28.jpg)
Regression Analysis – Linear Regression
Regression analysis helps us understand how the value of the dependent variable changes when any one of the independent variables changes, while the other independent variables are kept fixed.
Linear Regression is one of the most widely used algorithms for prediction and forecasting.
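For a single independent variable, the fitted line can be computed directly with the ordinary least squares formulas. The Python sketch below assumes one independent variable and uses made-up numbers; `fit_line`, `hours` and `scores` are hypothetical names chosen for the illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on the line y = 2x + 1.
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]
slope, intercept = fit_line(hours, scores)
forecast = slope * 5 + intercept  # predicted value for an unseen x = 5
```

The slope is the quantity the slide describes: how much the dependent variable changes when the independent variable increases by one unit.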
![Page 29: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/29.jpg)
K-Means Clustering
![Page 30: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/30.jpg)
K-Means Clustering
Clustering is the process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, but as similar as possible within each group.
The objects in group 1 should be as similar as possible.
But objects in different groups should differ as much as possible.
The attributes of the objects are allowed to determine which objects should be grouped together.
[Diagram: the total population divided into Group 1, Group 2, Group 3 and Group 4]
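The grouping idea above can be sketched with the standard K-Means loop: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. The Python code below is a minimal illustration, not the demo from the session; the points are made up, and starting from the first `k` points is a simplifying assumption (practical implementations usually choose starting centroids more carefully). It needs Python 3.8 or later for `math.dist`.

```python
import math

def k_means(points, k, max_iter=100):
    """Cluster 2-D points into k groups; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # simplistic start: the first k points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data: three points near (1, 1) and three near (9, 9).
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
labels, centroids = k_means(points, k=2)
```

Points that share a label are close to each other, while the two centroids end up far apart, which matches the similar-within, dissimilar-between goal stated on the slide.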
![Page 31: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/31.jpg)
Hands-On
Demo K-Means Clustering
![Page 32: Top 5 algorithms used in Data Science](https://reader033.vdocument.in/reader033/viewer/2022042907/58cf2d891a28ab00168b514b/html5/thumbnails/32.jpg)
Thank You …
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours.