Midterm Exam
DESCRIPTION
Midterm Exam: 02/28 (Thursday), take-home, due at noon on 02/29 (Friday). Project: 03/14 (Phase 1): 10% of the training data is available for algorithm development. 04/04 (Phase 2): the full training data and test examples are available. - PowerPoint PPT Presentation

TRANSCRIPT
![Page 1: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/1.jpg)
Logistic Regression
Rong Jin
![Page 2: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/2.jpg)
Logistic Regression
• Generative models often lead to a linear decision boundary
• Linear discriminative model: directly model the linear decision boundary
• w is the parameter to be estimated
![Page 3: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/3.jpg)
Logistic Regression
![Page 4: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/4.jpg)
Logistic Regression
Learn parameter w by Maximum Likelihood Estimation (MLE)
• Given training data
![Page 5: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/5.jpg)
Logistic Regression
• Convex objective function, global optimum
• Gradient descent
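Because the negative log-likelihood is convex, plain gradient descent reaches the global optimum. A minimal numpy sketch of this MLE-by-gradient-descent step (the toy data, learning rate, and iteration count below are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w, X, y):
    # y in {-1, +1}; p(y|x) = sigmoid(y * w^T x)
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

def fit_logistic_gd(X, y, lr=0.1, n_iters=500):
    """Minimize the negative log-likelihood by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # gradient of -log L(w) is sum_i -y_i x_i * sigmoid(-y_i w^T x_i)
        margins = y * (X @ w)
        grad = -(X.T @ (y * sigmoid(-margins)))
        w -= lr * grad
    return w

# toy 1-D data with a bias column
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w = fit_logistic_gd(X, y)
```

After training, the sign of `X @ w` recovers the labels and the objective is lower than at the starting point `w = 0`.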
![Page 6: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/6.jpg)
![Page 7: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/7.jpg)
Illustration of Gradient Descent
![Page 8: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/8.jpg)
How to Decide the Step Size?
• Backtracking line search
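Backtracking line search shrinks a trial step until the Armijo sufficient-decrease condition holds. A sketch (the constants `alpha`, `beta`, and the quadratic test function are typical illustrative choices, not from the slides):

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha=0.3, beta=0.8, t0=1.0):
    """Shrink the step size t until the Armijo condition holds:
       f(x + t*d) <= f(x) + alpha * t * grad_f(x)^T d
    """
    t = t0
    fx, g = f(x), grad_f(x)
    while f(x + t * d) > fx + alpha * t * (g @ d):
        t *= beta  # step too long: shrink geometrically
    return t

# example: f(x) = ||x||^2, descent direction d = -gradient
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([3.0, -4.0])
t = backtracking_line_search(f, grad, x, -grad(x))
```

The returned step is guaranteed to decrease the objective along the descent direction.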
![Page 9: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/9.jpg)
Example: Heart Disease
• Input feature x: age group id
• Output y: whether the person has heart disease
• y = +1: heart disease
• y = -1: no heart disease
1: 25-29
2: 30-34
3: 35-39
4: 40-44
5: 45-49
6: 50-54
7: 55-59
8: 60-64
[Bar chart: number of people (0 to 10) in each age group (1 to 8), split into "No heart disease" and "Heart disease"]
![Page 10: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/10.jpg)
Example: Heart Disease
[Bar chart repeated: number of people in each age group, with and without heart disease]
![Page 11: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/11.jpg)
Example: Text Categorization
Learn to classify text into two categories
• Input d: a document, represented by a word histogram
• Output y: +1 for a political document, -1 for a non-political document
![Page 12: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/12.jpg)
Example: Text Categorization
• Training data
![Page 13: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/13.jpg)
Example 2: Text Classification
• Dataset: Reuters-21578
• Classification accuracy
  • Naïve Bayes: 77%
  • Logistic regression: 88%
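The reported numbers are on Reuters-21578; the pipeline itself (word-histogram features plus the gradient-based MLE from the earlier slides) can be sketched on a tiny invented corpus. All documents, words, and constants below are illustrative assumptions:

```python
import numpy as np

# toy corpus: +1 = political, -1 = non-political (invented, not Reuters-21578)
docs = [
    ("election vote senate policy", 1),
    ("parliament vote election law", 1),
    ("football match goal score", -1),
    ("movie actor film score", -1),
]
vocab = sorted({tok for text, _ in docs for tok in text.split()})
index = {tok: j for j, tok in enumerate(vocab)}

def histogram(text):
    """Word-histogram representation with a leading bias term."""
    x = np.zeros(len(vocab) + 1)
    x[0] = 1.0
    for tok in text.split():
        if tok in index:  # words outside the training vocabulary are ignored
            x[1 + index[tok]] += 1.0
    return x

X = np.array([histogram(t) for t, _ in docs])
y = np.array([float(lbl) for _, lbl in docs])

# gradient ascent on the log-likelihood
w = np.zeros(X.shape[1])
for _ in range(300):
    m = y * (X @ w)
    w += 0.1 * (X.T @ (y / (1.0 + np.exp(m))))
```

A new document is classified by the sign of its histogram's inner product with `w`.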
![Page 14: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/14.jpg)
Logistic Regression vs. Naïve Bayes
• Both produce linear decision boundaries
• Naïve Bayes: weights determined by the estimated class-conditional probabilities
• Logistic regression: learn weights by MLE
• Both can be viewed as modeling p(d|y)
  • Naïve Bayes: independence assumption
  • Logistic regression: assume an exponential family distribution for p(d|y) (a broad assumption)
![Page 15: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/15.jpg)
Logistic Regression vs. Naïve Bayes
![Page 16: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/16.jpg)
Discriminative vs. Generative
Discriminative models: model P(y|x)
• Pros: usually good performance
• Cons: slow convergence; expensive computation; sensitive to noisy data

Generative models: model P(x|y)
• Pros: usually fast convergence; cheap computation; robust to noisy data
• Cons: usually worse performance
![Page 17: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/17.jpg)
Overfitting Problem
Consider text categorization
• What is the weight for a word j that appears in only one training document dk?
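For such a word the gradient of the log-likelihood with respect to its weight never changes sign, so without regularization the MLE weight grows without bound as training continues. A small numpy demonstration (data, learning rate, and regularization constant are illustrative):

```python
import numpy as np

# Feature j fires only in one training document (doc 0, labeled +1).
X = np.array([[1.0], [0.0], [0.0]])
y = np.array([1.0, 1.0, -1.0])

def weight_after(n_iters, lam=0.0, lr=0.5):
    """Gradient descent on the (optionally L2-regularized) logistic loss."""
    w = np.zeros(1)
    for _ in range(n_iters):
        m = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(m)))) + 2 * lam * w
        w -= lr * grad
    return w[0]

w_100, w_1000 = weight_after(100), weight_after(1000)
w_reg = weight_after(1000, lam=0.1)
# without regularization the weight keeps growing; with lam > 0 it stays bounded
```

This is exactly the overfitting mechanism the slide asks about: rare words get arbitrarily large weights unless the objective penalizes them.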
![Page 18: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/18.jpg)
Overfitting Problem
![Page 19: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/19.jpg)
Overfitting Problem
[Plot: test-data classification accuracy vs. iteration, with and without regularization; without regularization, accuracy on the test data decreases as training proceeds]
![Page 20: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/20.jpg)
Solution: Regularization
Regularized log-likelihood
The effects of the regularizer:
• Favor small weights
• Guarantee a bounded norm of w
• Guarantee a unique solution
![Page 21: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/21.jpg)
Regularized Logistic Regression
[Plot: classification performance vs. iteration, with and without regularization]
![Page 22: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/22.jpg)
Regularization as Robust Optimization
• Assume each data point is unknown but bounded within a sphere of radius r centered at xi
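This robust view connects directly to L2 regularization: the logistic loss decreases in the margin, so the worst perturbation inside the sphere is the one that most reduces the margin. A sketch of the standard argument:

```latex
\max_{\|\delta_i\|_2 \le r}
\log\!\left(1 + e^{-y_i \mathbf{w}^\top (\mathbf{x}_i + \delta_i)}\right)
= \log\!\left(1 + e^{-y_i \mathbf{w}^\top \mathbf{x}_i + r\,\|\mathbf{w}\|_2}\right)
```

with the maximum attained at \(\delta_i = -r\, y_i\, \mathbf{w}/\|\mathbf{w}\|_2\). The worst-case objective therefore adds a term that grows with \(\|\mathbf{w}\|_2\), which is the effect of an L2 regularizer.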
![Page 23: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/23.jpg)
Sparse Solution by Lasso Regularization
RCV1 collection:
• 800K documents
• 47K unique words
![Page 24: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/24.jpg)
Sparse Solution by Lasso Regularization
How to solve the optimization problem?
• Subgradient descent
• Minimax
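The slide lists subgradient descent; a closely related proximal-gradient step (a gradient step on the smooth logistic term, then soft-thresholding for the L1 term) makes the sparsity explicit, since it drives uninformative weights exactly toward zero. A sketch on synthetic data (all constants and the data-generating process are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_logistic(X, y, lam=0.3, lr=0.1, n_iters=1000):
    """Proximal gradient on sum_i log(1+exp(-y_i w^T x_i)) + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = -(X.T @ (y / (1.0 + np.exp(y * (X @ w)))))  # smooth-part gradient
        w = soft_threshold(w - lr * grad, lr * lam)         # L1 proximal step
    return w

# two informative features plus one pure-noise feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sign(X[:, 0] - X[:, 1])
w = lasso_logistic(X, y)
# the weight on the noise feature shrinks far below the informative ones
```

On the RCV1 scale (47K unique words), this sparsity is what keeps the learned model small.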
![Page 25: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/25.jpg)
Bayesian Treatment
• Compute the posterior distribution of w
• Laplace approximation
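The Laplace approximation fits a Gaussian to the posterior of w: the mean is the MAP estimate and the covariance is the inverse Hessian of the negative log-posterior at that point. A numpy sketch assuming a Gaussian prior with precision 2·lam (the data and constants are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_approx(X, y, lam=0.1, lr=0.2, n_iters=2000):
    """Posterior over w approximated as N(w_MAP, H^{-1})."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):  # find the MAP estimate by gradient descent
        grad = -(X.T @ (y * sigmoid(-y * (X @ w)))) + 2 * lam * w
        w -= lr * grad
    # Hessian of the negative log-posterior: X^T diag(p(1-p)) X + 2*lam*I
    p = sigmoid(X @ w)
    H = (X.T * (p * (1 - p))) @ X + 2 * lam * np.eye(X.shape[1])
    return w, np.linalg.inv(H)

X = np.array([[1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w_map, cov = laplace_approx(X, y)
```

The returned covariance is symmetric positive definite because the regularized Hessian is.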
![Page 26: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/26.jpg)
Bayesian Treatment
• Laplace approximation
![Page 27: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/27.jpg)
Multi-class Logistic Regression
• How to extend the logistic regression model to multi-class classification?
![Page 28: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/28.jpg)
Conditional Exponential Model
• Let classes be
• Need to learn
Normalization factor (partition function)
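With one weight vector w_s per class, the model is p(y = s | x) = exp(w_s^T x) / Z(x), where the partition function Z(x) sums the exponentials over all classes. A minimal sketch (weights and input are illustrative):

```python
import numpy as np

def softmax_probs(W, x):
    """Conditional exponential model: p(y=s|x) = exp(w_s^T x) / Z(x)."""
    scores = W @ x            # one score w_s^T x per class
    scores -= scores.max()    # shift for numerical stability (Z is unchanged)
    expm = np.exp(scores)
    return expm / expm.sum()  # divide by the partition function Z(x)

W = np.array([[1.0, 0.0],    # 3 classes, 2 features
              [0.0, 1.0],
              [-1.0, -1.0]])
p = softmax_probs(W, np.array([2.0, -1.0]))
```

The probabilities sum to one by construction, and the predicted class is the argmax score.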
![Page 29: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/29.jpg)
Conditional Exponential Model
• Learn weights ws by maximum likelihood estimation
• Any problem?
![Page 30: Middle Term Exam](https://reader035.vdocument.in/reader035/viewer/2022062323/5681667d550346895dda2423/html5/thumbnails/30.jpg)
Modified Conditional Exponential Model