machine learning and data mining i spring 2021 · 2021. 2. 23. · lda vs logistic regression...

DS 4400 Alina Oprea Associate Professor Khoury College of Computer Science Northeastern University February 23 2021 Machine Learning and Data Mining I Spring 2021


DS 4400

Alina Oprea, Associate Professor

Khoury College of Computer Science, Northeastern University

February 23, 2021

Machine Learning and Data Mining I, Spring 2021


Announcements

• Homework 3 will be out later today
  – Due on Monday, March 8
• Project proposal is due on March 4
• Midterm exam is on Tuesday, March 2
  – During class on Gradescope, 11:45am to 1:45pm
  – Short review on Thursday, Feb. 25


Project Proposal
• Project Title
• Project Team
• Problem Description
  – What is the prediction problem you are trying to solve?
• Dataset
  – Link to data, brief description, number of records, feature dimensionality (at least 20K records)
• Approach and methodology
  – Normalization
  – Feature selection
  – Machine learning models you will try (recommended >= 4)
  – Language and packages you plan to use
• Metrics (how you will evaluate your models)
• References
  – How did you find out about the dataset? Has anyone else used the data for a similar prediction task?


Outline
• Generative vs Discriminative Models
• Linear Discriminant Analysis (LDA)
  – Training and inference
  – Why LDA is a linear classifier
• Lab: Logistic Regression, LDA, and kNN
• Density Estimation and Naïve Bayes


Generative vs Discriminative
• Generative model
  – Given X and Y, learns the joint probability P(X, Y)
  – Can generate more examples from the distribution
  – Examples: LDA, Naïve Bayes, language models (GPT-2, GPT-3, BERT)
• Discriminative model
  – Given X and Y, learns a decision function for classification
  – Examples: logistic regression, kNN


LDA
• Classify to one of K classes
• Logistic regression computes P(Y = 1 | X = x) directly
  – Assumes a sigmoid function
• LDA uses Bayes' Theorem to estimate it
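
Written out, the Bayes' Theorem step takes the standard form below. The formula on the original slide was not preserved in this transcript, so this is the textbook statement rather than a copy of the slide. The symbols are assumptions that follow the notation used elsewhere in the deck: π_k is the prior probability of class k, and f_k(x) is the density of X within class k.

```latex
\[
\Pr(Y = k \mid X = x) \;=\; \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)}
\]
```

The denominator is the same for every class, so choosing the most probable class only requires comparing π_k f_k(x) across k.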


LDA
• Assume f_k(x) is Gaussian!
• Unidimensional case (d = 1)


Continuous Random Variables

• X: U ⟶ V is a continuous RV if it takes values in a continuum (uncountably many values)
• The cumulative distribution function (CDF) F: R ⟶ [0, 1] for X is defined for every value x by: F(x) = Pr(X ≤ x)
• The probability density function (PDF) f(x) for X is f(x) = dF(x)/dx

[Figure: a pdf and its associated cdf; the cdf is increasing]
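
The relation f(x) = dF(x)/dx can be checked numerically. The sketch below is an illustration, not part of the original slides: it uses the standard normal as the example distribution, and the function names `normal_cdf`, `normal_pdf`, and `numeric_derivative` are hypothetical helpers chosen for this example.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal, F(x) = Pr(X <= x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """PDF of the standard normal, the analytic derivative of the CDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def numeric_derivative(F, x, h=1e-5):
    """Central finite-difference approximation of dF/dx at x."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# Compare the analytic PDF with the numerical slope of the CDF.
for x in (-1.0, 0.0, 2.0):
    print(x, normal_pdf(x), numeric_derivative(normal_cdf, x))
```

The two printed columns should agree to several decimal places, and the CDF values themselves never decrease as x grows.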


Gaussian Distribution
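
The formula and plot from this slide were not preserved in the transcript. For reference, the standard density of a one-dimensional Gaussian is given below; μ (mean) and σ² (variance) are the usual textbook symbols, assumed here rather than taken from the slide.

```latex
\[
f(x) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}
\exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
\]
```

In one-dimensional LDA, each class k has its own mean μ_k in this formula, while the variance σ² is shared across classes.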


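LDA Training and Testing
Given training data (x_i, y_i), i = 1, …, n, y_i ∈ {1, …, K}:

1. Estimate the mean of each class and the variance (shared across classes)

2. Estimate the prior of each class

Given a testing point x, predict the class k that maximizes the posterior Pr(Y = k | X = x)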
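
The training and testing steps above can be sketched in a few lines for the one-dimensional case. This is a minimal illustration under stated assumptions, not the course's implementation: it uses the usual pooled-variance estimate (within-class squared deviations divided by n − K) and the standard linear discriminant score, and the function names `fit_lda_1d`, `discriminant`, and `predict_lda_1d` are hypothetical.

```python
import math
from collections import defaultdict

def fit_lda_1d(xs, ys):
    """Estimate per-class means, a pooled (shared) variance, and class priors."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n, K = len(xs), len(groups)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    priors = {k: len(v) / n for k, v in groups.items()}
    # Pooled variance: within-class squared deviations divided by n - K.
    ss = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
    var = ss / (n - K)
    return means, var, priors

def discriminant(x, mean, var, prior):
    """Score that is linear in x; the largest score has the largest posterior."""
    return x * mean / var - mean ** 2 / (2.0 * var) + math.log(prior)

def predict_lda_1d(x, means, var, priors):
    """Return the class label with the highest discriminant score."""
    return max(means, key=lambda k: discriminant(x, means[k], var, priors[k]))

# Toy data (made up for illustration): two well-separated classes.
xs = [1.0, 1.5, 2.0, 2.5, 5.0, 5.5, 6.0, 6.5]
ys = [1, 1, 1, 1, 2, 2, 2, 2]
means, var, priors = fit_lda_1d(xs, ys)
print(means, var, priors)
print(predict_lda_1d(2.2, means, var, priors), predict_lda_1d(5.1, means, var, priors))
```

With equal priors, the predicted label switches at the midpoint of the two class means (3.75 for this toy data).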


LDA decision boundary
Pick the class k that maximizes the posterior Pr(Y = k | X = x)

Example: K = 2, π_1 = π_2
Classify as class 1 if x > (μ_1 + μ_2)/2 (assuming μ_1 > μ_2)

[Figure: true decision boundary vs. estimated decision boundary]
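
The threshold in the two-class example can be derived in a few lines. The derivation below is a reconstruction under the deck's assumptions (one-dimensional Gaussian class densities with class means μ_k, a shared variance σ², and priors π_k); the symbol δ_k for the discriminant score is an assumed name, not taken from the slide.

```latex
% Take the log of \pi_k f_k(x) and drop the terms that do not depend on k:
\[
\delta_k(x) \;=\; x\,\frac{\mu_k}{\sigma^2} \;-\; \frac{\mu_k^2}{2\sigma^2} \;+\; \log \pi_k
\]
% With K = 2 and \pi_1 = \pi_2, class 1 wins when \delta_1(x) > \delta_2(x):
\[
x\,\frac{\mu_1 - \mu_2}{\sigma^2} \;>\; \frac{\mu_1^2 - \mu_2^2}{2\sigma^2}
\quad\Longleftrightarrow\quad
2x\,(\mu_1 - \mu_2) \;>\; (\mu_1 - \mu_2)(\mu_1 + \mu_2)
\]
% Dividing by (\mu_1 - \mu_2) > 0 gives the threshold:
\[
x \;>\; \frac{\mu_1 + \mu_2}{2}
\]
```

If μ_1 < μ_2 the inequality flips. The score δ_k(x) is linear in x, which is why LDA is a linear classifier.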


Multi-Dimensional LDA

• LDA can be extended to multi-dimensional data
• Assumption that f_k(x) is a multivariate Gaussian
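
For reference, the multivariate assumption can be written out as follows. The slide's own formulas were not preserved, so these are the standard textbook expressions; the symbols are assumptions: μ_k is the mean vector of class k, Σ is the covariance matrix shared by all classes, π_k is the class prior, and δ_k is an assumed name for the discriminant score.

```latex
\[
f_k(x) \;=\; \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
\exp\!\left( -\tfrac{1}{2}\,(x - \mu_k)^{\top} \Sigma^{-1} (x - \mu_k) \right)
\]
% Taking the log of \pi_k f_k(x) and dropping terms that do not depend on k:
\[
\delta_k(x) \;=\; x^{\top} \Sigma^{-1} \mu_k
\;-\; \tfrac{1}{2}\,\mu_k^{\top} \Sigma^{-1} \mu_k
\;+\; \log \pi_k
\]
```

The quadratic term in x cancels because Σ is shared across classes, so the score stays linear in x and the boundaries between classes are hyperplanes.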


Linear models

• Logistic regression

• LDA: predict the class k that maximizes a score that is linear in x


LDA vs Logistic Regression
• Logistic regression computes Pr(Y = 1 | X = x) directly by assuming a sigmoid function
  – Uses Maximum Likelihood Estimation
  – Discriminative model
• LDA uses Bayes' Theorem to estimate it
  – Estimates mean, covariance, and prior from training data
  – Generative model
  – Assumes a Gaussian distribution for f_k(x) = Pr(X = x | Y = k)
• Which one is better?
  – LDA can be sensitive to outliers
  – LDA works well when the class distributions are close to Gaussian
  – Logistic regression is more expensive to fit (no closed-form solution, so it is solved iteratively), but usually provides a better model
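
To make the comparison concrete, the sketch below fits both models on the same one-dimensional data and compares their decision boundaries. Everything in it is an assumption made for illustration: the data are synthetic (two Gaussians with equal variance and equal class sizes), logistic regression is fit with plain gradient ascent on the average log-likelihood, and the helper names `fit_logistic_1d` and `class_mean` are hypothetical.

```python
import math
import random

random.seed(0)
# Synthetic 1-D data: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1), 200 points each.
xs = [random.gauss(-1.0, 1.0) for _ in range(200)] + \
     [random.gauss(1.0, 1.0) for _ in range(200)]
ys = [0] * 200 + [1] * 200

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(xs, ys, lr=0.5, steps=500):
    """Maximum likelihood by gradient ascent: an iterative fit, no closed form."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(w * x + b)  # gradient of the log-likelihood
            gw += err * x
            gb += err
        w += lr * gw / n
        b += lr * gb / n
    return w, b

def class_mean(xs, ys, label):
    vals = [x for x, y in zip(xs, ys) if y == label]
    return sum(vals) / len(vals)

# Logistic regression boundary: the x where w * x + b = 0.
w, b = fit_logistic_1d(xs, ys)
lr_boundary = -b / w
# LDA boundary with equal priors and shared variance: midpoint of the class means.
lda_boundary = 0.5 * (class_mean(xs, ys, 0) + class_mean(xs, ys, 1))
print(lr_boundary, lda_boundary)
```

Because this data matches LDA's Gaussian assumption, the two boundaries should land close to each other; the closed-form LDA estimate needs one pass over the data, while logistic regression needs many gradient steps.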


Linear Classifier Lab


https://www.kaggle.com/ronitf/heart-disease-uci


Metrics for LR


Classification Report
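
The report shown on this slide was not preserved in the transcript. As a rough guide to what a classification report summarizes, the pure-Python sketch below computes accuracy, precision, recall, and F1 for a binary problem. It is not the lab's code: the function name `classification_metrics` and the toy labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (made up): 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```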


ROC Curve
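
The plot from this slide was not preserved. An ROC curve is traced by sweeping the decision threshold over the classifier's scores and recording the false positive rate and true positive rate at each step. The sketch below shows that construction in pure Python; it is not the lab's code, the names `roc_points` and `auc` are hypothetical, and it assumes 0/1 labels with at least one example of each class.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold from high to low scores."""
    pos = sum(y_true)            # labels are assumed to be 0 or 1
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores (made up): higher score means "more likely positive".
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points, auc(points))
```

The area under the curve equals the fraction of (positive, negative) pairs that the scores rank correctly, here 8 out of 9.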


Lab LDA


LDA Metrics


LDA ROC Curve


Lab kNN
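
The lab code from these slides was not preserved. As a reminder of how kNN classifies a point, the sketch below implements majority voting among the k nearest training points with Euclidean distance. It is a minimal pure-Python illustration, not the lab's code; the function name `knn_predict` and the toy data are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points nearest to x."""
    # math.dist computes the Euclidean distance (Python 3.8+).
    neighbors = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D data (made up): two clusters with labels 0 and 1.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (2.0, 2.0)))
print(knn_predict(train_X, train_y, (6.0, 6.5)))
```

Unlike LDA and logistic regression, kNN has no training step: all the work happens at prediction time, and the choice of k controls how smooth the decision boundary is.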


Acknowledgements

• Slides made using resources from:
  – Andrew Ng
  – Eric Eaton
  – David Sontag
• Thanks!