Page 1: Statistical Learninghorebeek/epe/rosset1.pdfOutline • Part 1: Introduction to Statistical Learning Roughly chapters 1-3 of “Elements of Statistical Learning” by Hastie, Tibshirani

Statistical Learning

Saharon Rosset
Special thanks: Trevor Hastie

Outline

• Part 1: Introduction to Statistical Learning
  Roughly chapters 1-3 of “Elements of Statistical Learning” by Hastie, Tibshirani and Friedman (2001)
  – Motivation and problem examples
  – Introduction of fundamental concepts:
    • Supervised learning: regression and classification
    • Local models (k-NN, kernel smoothing)
    • Linear models
    • Bias-variance tradeoff(s)
    • Examples
  – Illustration through discussion of some simple regression methods: linear regression and k-NN

Outline

• Part 2: Regularization and Boosting
  – Regularized optimization: introduction and examples
  – Boosting: introduction and examples
  – Boosting as approximate L1 regularization

• Part 3: L1 Regularization: statistical and computational properties
  – Piecewise linear regularized solution paths
  – L1 regularization in infinite dimensional feature spaces

ESL Chap1 - Introduction

Statistical Learning Problems

• Identify the risk factors for prostate cancer (lcavol), based on clinical and demographic variables.

• Classify a recorded phoneme, based on a log-periodogram.

• Predict whether someone will have a heart attack on the basis of demographic, diet and clinical measurements

• Customize an email spam detection system.

• Identify the numbers in a handwritten zip code, from a digitized image

• Classify a tissue sample into one of several cancer classes, based on a gene expression profile.

• Classify the pixels in a LANDSAT image, according to usage: {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}

The Supervised Learning Problem

• Outcome measurement Y (also called dependent variable, response, target)

• Vector of p predictor measurements X (also called independent variables, inputs, regressors, covariates, features)

• In regression problems, Y is quantitative (price, blood pressure)

• In classification problems, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample)

We often use G for classification labels (e.g. G ∈ {survived, died})

• We have training data $(x_1, y_1), \ldots, (x_N, y_N)$. These are observations (examples, instances) of these measurements.

Objectives

On the basis of the training data we would like to:

• Accurately predict unseen test cases

• Understand which inputs affect the outcome, and how

• Assess the quality of our predictions and inferences

Philosophy

• It is important to understand the ideas behind the various techniques, in order to know how and when to use them.

• One has to understand the simpler methods first, in order to grasp the more sophisticated ones.

• It is important to accurately assess the performance of a method, to know how well or how badly it is working [simpler methods often perform as well as fancier ones!]

• This is an exciting research area, having important applications in science, industry and finance.

200 points generated in $\mathbb{R}^2$ from an unknown distribution; 100 in each of two classes G = {GREEN, RED}. Can we build a rule to predict the color of future points?

Linear Regression

The decision boundary is the set of points where the prediction is exactly 0.5.

It is linear (obviously) and seems to be making a lot of prediction errors in this case.
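The rule described here can be sketched in a few lines: code the two classes as 0 and 1, fit a least squares plane, and predict RED wherever the fitted value exceeds 0.5. The sketch below is only an illustration under stated assumptions: the two Gaussian clusters are a hypothetical stand-in for the slide's data (which come from an unknown distribution), and the helper names `solve`, `fit_linear`, `predict` and `classify` are made up for this example.

```python
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_linear(X, y):
    """Least squares fit of y on (1, x1, x2) via the normal equations."""
    Z = [[1.0] + list(x) for x in X]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)

def predict(beta, x):
    """Fitted value: intercept plus linear terms."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def classify(beta, x):
    """The decision boundary is the line where the fitted value equals 0.5."""
    return "RED" if predict(beta, x) > 0.5 else "GREEN"

# Hypothetical data: 100 points per class from two Gaussian clusters.
rng = random.Random(0)
green = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
red = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(100)]
X = green + red
y = [0.0] * 100 + [1.0] * 100          # GREEN coded as 0, RED as 1
labels = ["GREEN"] * 100 + ["RED"] * 100

beta = fit_linear(X, y)                # 3 parameters: intercept and 2 slopes
train_error = sum(classify(beta, x) != g for x, g in zip(X, labels)) / len(X)
print("coefficients:", beta)
print("training error:", train_error)
```

Because the fit has only three coefficients, the boundary is a single straight line no matter how the classes are actually arranged.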

Possible Scenarios

K-Nearest Neighbors

15-nearest neighbor classification. Fewer training data are misclassified, and the decision boundary adapts to the local densities of the classes.

1-nearest neighbor classification. None of the training data are misclassified.
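A minimal sketch of k-nearest-neighbor classification by majority vote may help show why 1-NN never misclassifies a training point: each training point is its own nearest neighbor. The simulated clusters and the function names `knn_classify` and `training_error` are assumptions made for this illustration, not the data or code behind the slides.

```python
import random
from collections import Counter

def knn_classify(train_X, train_g, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: (train_X[i][0] - x[0]) ** 2 + (train_X[i][1] - x[1]) ** 2,
    )
    votes = Counter(train_g[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def training_error(train_X, train_g, k):
    """Fraction of training points misclassified by k-NN fit on the same data."""
    wrong = sum(
        knn_classify(train_X, train_g, x, k) != g
        for x, g in zip(train_X, train_g)
    )
    return wrong / len(train_X)

# Hypothetical data: two overlapping Gaussian clusters, 100 points each.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
X += [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(100)]
g = ["GREEN"] * 100 + ["RED"] * 100

err_15 = training_error(X, g, k=15)
err_1 = training_error(X, g, k=1)   # each point is its own nearest neighbor
print("15-NN training error:", err_15)
print("1-NN training error:", err_1)
```

Zero training error for k = 1 says nothing about performance on new points, which is why training error alone cannot be used to choose k.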

Discussion

• Linear regression uses 3 parameters to describe its fit. Does K-nearest neighbors use just 1, the value of k?

• More realistically, k-nearest neighbors uses about N/k effective parameters.
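As a rough worked example of the N/k heuristic, take the N = 200 training points from the two-class example. The usual intuition (an informal argument, not a precise count) is that if the neighborhoods did not overlap there would be N/k of them, each fitting one local average.

```latex
\[
\left.\frac{N}{k}\right|_{k=15} = \frac{200}{15} \approx 13.3,
\qquad
\left.\frac{N}{k}\right|_{k=1} = \frac{200}{1} = 200,
\qquad
\text{linear regression: } 3 .
\]
```

So 15-NN is already a considerably more flexible fit than the linear model, and 1-NN is as flexible as the training set allows.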

Many modern procedures are variants of linear regression and K-nearest neighbors:

• Kernel smoothers (or viewed as RKHS regression)
• Local linear regression
• Linear basis expansions
• Projection pursuit and neural networks
• Support vector machines and logistic regression

See page 17 for more details, or the book website for the actual data.

The Bayes error is the best performance possible. The decision boundary shown in the figure attains it.

How should we choose the right modeling approach?

• We want to minimize EPE (the expected prediction error)
• What kind of considerations do we need to keep in mind?
  – Data in high dimension is sparse: Curse of Dimensionality
    ⇒ Makes estimation hard, affects some methods more
  – If the models we keep are too complex, they will be overfitted
    ⇒ Have high variance, be unstable
  – If the models are too simple, they will be too poor to represent f(x)
    ⇒ Have high bias, predict poorly

In the next few slides we will give a little more detail and examples, and we will revisit these concepts later.
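One standard way to see the sparsity of high-dimensional data is sketched below. It is an illustration added here, not taken from the slides, and it assumes inputs spread uniformly over the unit hypercube in p dimensions. To capture a fraction r of the data, a sub-cube needs edge length r^(1/p), so "local" neighborhoods quickly stop being local as p grows. The function name `edge_length` is made up for this example.

```python
def edge_length(fraction, p):
    """Edge length of a sub-cube of the unit cube in p dimensions that
    contains the given fraction of uniformly distributed points."""
    return fraction ** (1.0 / p)

# How wide must a neighborhood be to hold 1% or 10% of the data?
for p in (1, 2, 10, 100):
    print(
        f"p={p:>3}: 1% of data needs edge {edge_length(0.01, p):.3f}, "
        f"10% needs edge {edge_length(0.10, p):.3f}"
    )
```

In 10 dimensions, a neighborhood holding only 1% of the data already spans well over half of the range of each input, which is why methods built on local averaging are hit hardest.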

The bias-variance decomposition

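• In a regression setting, using squared loss
• Assume we are building a model which predicts $\hat{f}(x)$
• What makes up our expected risk?

$$
\begin{aligned}
EPE(\hat{f}(X)) &= E\big(Y - \hat{f}(X)\big)^2 \\
&= E\big(Y - EY + EY - E\hat{f}(X) + E\hat{f}(X) - \hat{f}(X)\big)^2 \\
&= Var(Y) + \big(EY - E\hat{f}(X)\big)^2 + E\big(E\hat{f}(X) - \hat{f}(X)\big)^2
\end{aligned}
$$

• $Var(Y)$: irreducible error of the best possible estimator, $\hat{f}(X) = E(Y|X)$
• $\big(EY - E\hat{f}(X)\big)^2$: squared bias, measuring our model’s lack of expressiveness
• $E\big(E\hat{f}(X) - \hat{f}(X)\big)^2$: variance of our model’s prediction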
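The decomposition can be checked numerically with a small simulation. The sketch below makes several assumptions that are not in the slides: a true function f(x) = sin(2πx), Gaussian noise with standard deviation 0.5, uniform inputs on [0, 1], and k-NN regression as the model. It repeatedly draws a training set, predicts at a fixed point x0, and compares a direct estimate of the EPE at x0 with the sum of the three terms. Here EY is read as E(Y | X = x0) = f(x0).

```python
import math
import random
import statistics

def f(x):
    """Hypothetical true regression function."""
    return math.sin(2 * math.pi * x)

def knn_predict(xs, ys, x0, k):
    """k-NN regression: average the responses of the k closest inputs."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def simulate(x0=0.3, k=5, n_train=50, sigma=0.5, reps=2000, seed=0):
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(reps):
        # Fresh training set, so the prediction at x0 is itself random.
        xs = [rng.random() for _ in range(n_train)]
        ys = [f(x) + rng.gauss(0, sigma) for x in xs]
        pred = knn_predict(xs, ys, x0, k)
        # Fresh test response at x0, independent of the training set.
        y0 = f(x0) + rng.gauss(0, sigma)
        preds.append(pred)
        sq_errors.append((y0 - pred) ** 2)
    return {
        "irreducible": sigma ** 2,                                   # Var(Y)
        "bias_sq": (f(x0) - statistics.fmean(preds)) ** 2,           # squared bias
        "variance": statistics.pvariance(preds),                     # model variance
        "epe": statistics.fmean(sq_errors),                          # direct estimate
    }

parts = simulate()
total = parts["irreducible"] + parts["bias_sq"] + parts["variance"]
print("direct EPE estimate:      ", round(parts["epe"], 4))
print("Var(Y) + bias^2 + variance:", round(total, 4))
```

Rerunning `simulate` with a larger k should lower the variance term and raise the squared bias term, which is the tradeoff the slide describes. The two printed numbers agree only up to Monte Carlo error.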

Effect as dimension p increases
