
Page 1

Applied Machine Learning
Lecture 3: Pre-processing steps and hyperparameter tuning

Selpi ([email protected])

The slides are a further development of Richard Johansson's slides

January 28, 2020

Page 2

Overview

Converting features to numerical vectors

Dealing with missing values

Feature Selection

Hyperparameter Tuning

Imbalanced Classes

Review and Closing

Page 3

the first step: mapping features to numerical vectors

- scikit-learn's learning methods work with features as numbers, not strings
- they can't directly use the feature dicts we have stored in X
- converting from strings to numbers is the purpose of these lines:

vec = DictVectorizer()
Xe = vec.fit_transform(X)

Page 4

types of vectorizers

- a DictVectorizer converts from attribute-value dicts
- a CountVectorizer converts from texts (after splitting into words) or lists
- a TfidfVectorizer is like a CountVectorizer, but also uses TF*IDF (downweighting common words)
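To make this concrete, here is a small sketch (not from the original slides) of the two text vectorizers on a made-up two-document corpus:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat", "the cat ate the fish"]   # made-up toy corpus

cv = CountVectorizer()
print(cv.fit_transform(docs).toarray())   # raw word counts per document
print(cv.get_feature_names())             # the vocabulary, i.e. the column order

tv = TfidfVectorizer()
print(tv.fit_transform(docs).toarray())   # TF*IDF weights: "the" occurs in both documents, so it is downweighted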

Page 5

what goes on in a DictVectorizer?

- each feature corresponds to one or more columns in the output matrix
- easy case: a boolean or numerical feature maps directly to a single column
- for string features, we reserve one column for each possible value: one-hot encoding
- that is, we convert the string values to booleans

Page 7

code example (DictVectorizer)

from sklearn.feature_extraction import DictVectorizer

X = [{'f1':'B', 'f2':'F', 'f3':False, 'f4':7},
     {'f1':'B', 'f2':'M', 'f3':True, 'f4':2},
     {'f1':'O', 'f2':'F', 'f3':False, 'f4':9}]

vec = DictVectorizer()
Xe = vec.fit_transform(X)
print(Xe.toarray())
print(vec.get_feature_names())

the result:

[[ 1.  0.  1.  0.  0.  7.]
 [ 1.  0.  0.  1.  1.  2.]
 [ 0.  1.  1.  0.  0.  9.]]
['f1=B', 'f1=O', 'f2=F', 'f2=M', 'f3', 'f4']

Page 9

Overview

Converting features to numerical vectors

Dealing with missing values

Feature Selection

Hyperparameter Tuning

Imbalanced Classes

Review and Closing

Page 10

Dealing with missing values

- Remove rows/columns that contain missing values (the easiest, but we may throw away useful data)
- Replace the missing values with some values; this process is called imputation (more strategic)

Page 11

Imputation of missing values

- Derive the value from the same feature (univariate feature imputation), e.g., mean, median, most frequent
- Derive the value from several features (multivariate imputation)
- See scikit-learn's guide on imputation of missing values
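A small sketch of univariate imputation (the toy matrix is made up for illustration):

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# replace each missing value with the mean of its column
imp = SimpleImputer(strategy='mean')
print(imp.fit_transform(X))   # the NaNs become 4.0 and 2.5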

Page 12

Overview

Converting features to numerical vectors

Dealing with missing values

Feature Selection

Hyperparameter Tuning

Imbalanced Classes

Review and Closing

Page 13

Challenges with a large number of features

- Imagine working with two data sets, one with 500 features, the other with 10 features. Assume the number of samples is moderate and the same in both data sets.
- What are the potential challenges with more features?
- Finding the key information/feature(s) becomes more difficult
- Higher complexity ⇒ bad for generalization
- Training takes longer
- To address these issues: try a feature selection method

Page 16

Selecting Features: Brute Force

for every possible set of features S:
    train and evaluate the model based on S
return the set Smax that gave the best result

(with n features there are 2^n candidate sets, so this is only feasible when n is very small)

Page 17

Feature Selection: Overall Ideas

- filter methods: score each feature independently of the learning algorithm, then keep the best-ranked ones
- wrapper methods: search over feature subsets, training and evaluating the model on each candidate subset
- embedded methods: the learning algorithm selects features as part of its own training (e.g., L1 regularization)

[Source: Wikipedia]
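As an illustration of a filter method (a sketch, not from the slides): SelectKBest in scikit-learn scores each feature against the labels and keeps the k best.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# score every feature with the chi-squared statistic, keep the 2 best
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print(X_new.shape)   # (150, 2): two columns survive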

Page 20

Association between a feature and the output (idea)

Page 21

Feature Ranking Example (document categories)

[Source: Manning & Schütze, Introduction to Information Retrieval]

Page 23

Example of a wrapper method: greedy forward selection

S = empty set
repeat:
    find the feature F that gives the largest improvement when added to S
    if there was an improvement, add F to S
until there was no improvement

- the MLXtend library has an implementation
- see SequentialFeatureSelector (a sketch follows below)
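A minimal sketch of MLXtend's forward selection (the estimator and k_features are illustrative choices, not prescribed by the slides):

from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# grow the feature set greedily, one feature at a time
sfs = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=3),
                                k_features=2,      # stop after selecting 2 features
                                forward=True,      # forward, not backward, selection
                                scoring='accuracy',
                                cv=5)
sfs.fit(X, y)
print(sfs.k_feature_idx_)   # indices of the selected features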

Page 24

Example of a wrapper method: backward selection

- See the notebook
- Recursive Feature Elimination with Cross-Validation (RFECV)
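A sketch with scikit-learn's RFECV (the linear SVM is an illustrative choice; RFE needs an estimator that exposes feature weights):

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# repeatedly drop the least important feature; cross-validation
# decides how many features to keep in the end
selector = RFECV(SVC(kernel='linear'), step=1, cv=5)
selector.fit(X, y)
print(selector.support_)   # boolean mask over the original features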

Page 25

Overview

Converting features to numerical vectors

Dealing with missing values

Feature Selection

Hyperparameter Tuning

Imbalanced Classes

Review and Closing

Page 26

Hyperparameters vs parameters

Hyperparameters
- often used to help estimate the model parameters
- specified by the users
- tuned for the problem at hand
- in Random Forests: number of trees, number of features considered
- in Neural Networks: learning rate, number of hidden layers

Parameters
- are needed by the model for making predictions
- the values are estimated from data
- usually are not set by the users
- in Random Forests: the learned trees (split features and thresholds)
- in Neural Networks: weights

Page 28

How to tune the hyperparameters?

- Trial and error
- In scikit-learn (see http://scikit-learn.org/stable/modules/grid_search.html):
  - GridSearchCV
  - RandomizedSearchCV
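A minimal sketch of a grid search (the estimator and parameter grid are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# try every combination in the grid, scoring each with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
gs = GridSearchCV(SVC(), param_grid, cv=5)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)

RandomizedSearchCV has the same interface but samples a fixed number of parameter settings from given distributions, which scales better when the search space is large.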

Page 29

Overview

Converting features to numerical vectors

Dealing with missing values

Feature Selection

Hyperparameter Tuning

Imbalanced Classes

Review and Closing

Page 30

"Finding a needle in a haystack" problems

- Finding rare diseases
- Finding anomalies in travel patterns
- Differentiating crash vs non-crash situations
- ...

[source]

Page 31

Evaluation of imbalanced cases

def has_disease(patient):
    return False

- a DummyClassifier will have a high accuracy
- better to use precision and recall (more later)
- or sensitivity/specificity, or FPR/FNR, ...
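A sketch of why accuracy misleads here, on made-up data with 1% positives:

import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((1000, 1))              # dummy features, made up for illustration
y = np.array([1] * 10 + [0] * 990)   # 1% "has disease"

clf = DummyClassifier(strategy='most_frequent')   # always predicts the majority class
clf.fit(X, y)
print(clf.score(X, y))   # 0.99 accuracy, yet it never detects a single positive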

Page 33

Dealing with class imbalance

- Change parameter(s) of the learning algorithm
- Change the data
- Use appropriate performance metric(s)

Page 36

Change the parameter(s) of the learning algorithm

- Example in scikit-learn: LinearSVC (see the sketch below)
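The relevant knob here is class_weight; a minimal sketch (X_train and y_train are hypothetical names for some imbalanced training set):

from sklearn.svm import LinearSVC

# 'balanced' weights errors inversely to class frequency, so mistakes
# on the rare class cost more during training
clf = LinearSVC(class_weight='balanced')
# clf.fit(X_train, y_train)   # X_train, y_train: your imbalanced data (hypothetical names)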

Page 37

Changing the data

- Undersample the majority class or oversample the minority class [source]
- Or generate synthetic data using the existing data (a sketch follows below)
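A sketch of generating synthetic minority examples with SMOTE from the imbalanced-learn library (introduced on a later slide); the data set is made up:

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# made-up imbalanced data, roughly a 9:1 class ratio
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))

# SMOTE interpolates new minority examples between existing neighbours
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))   # the classes are now balanced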

Page 39

The imbalanced-learn library

- https://imbalanced-learn.readthedocs.io/
- See for example: BalancedRandomForestClassifier
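A minimal sketch (reusing the made-up imbalanced data from the previous example):

from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# each tree is trained on a bootstrap sample where the majority class
# is undersampled, so every tree sees a balanced sample
clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)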

Page 40

Overview

Converting features to numerical vectors

Dealing with missing values

Feature Selection

Hyperparameter Tuning

Imbalanced Classes

Review and Closing

Page 41

Review of pre-processing and hyperparameter tuning

- Explain different ways of selecting features
- Explain the pros/cons of GridSearchCV and RandomizedSearchCV
- Explain different ways of dealing with imbalanced classes

Page 42

Next lecture

- Linear classifiers and regression models
- (Methods for) data collection and bias
- Annotating data