
Feature Selection Topics

Sergei V. Gleyzer

Data Science at the LHC Workshop

Nov. 9, 2015


Outline

• Motivation

• What is Feature Selection

• Feature Selection Methods

• Recent work and ideas

• Caveats

Nov. 9, 2015 Sergei V. Gleyzer Data Science at LHC Workshop 2


Motivation

• Common Machine Learning (ML) problems in HEP:

– Classification or class discrimination

• Higgs event or background?

– Regression or function estimation

• How to best model particle energy based on detector measurements


Motivation continued

• While performing data analysis, one of the most crucial decisions is which features to use

– Garbage In = Garbage Out

– Ingredients:

• Relevance to the problem

• Level of understanding of the feature

• Power of the feature and its relationship with others


Goal

• How to select, assess, and improve the feature set used to solve the problem


Example

• Build a classifier to discriminate events of different classes based on event kinematics

• Typical initial feature set:

– Functions of object four-vectors in event

– Basic kinematics: transverse momenta, invariant masses, angular separations

• More complex features relating objects in the event topology, using physics knowledge to help discriminate among classes (thrusts, helicity, etc.)
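As a rough illustration of the "basic kinematics" bullet, the sketch below computes transverse momentum, invariant mass, and angular separation from four-vectors. The tuple layout `(E, px, py, pz)` and all function names are assumptions made for this example; they are not taken from the talk.

```python
import math

# Four-vectors are assumed to be tuples (E, px, py, pz) in consistent units.

def pt(p):
    """Transverse momentum."""
    return math.hypot(p[1], p[2])

def eta(p):
    """Pseudorapidity; requires non-zero transverse momentum."""
    return math.asinh(p[3] / pt(p))

def phi(p):
    """Azimuthal angle in (-pi, pi]."""
    return math.atan2(p[2], p[1])

def invariant_mass(p1, p2):
    """Invariant mass of the two-object system."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding

def delta_r(p1, p2):
    """Angular separation sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi)."""
    dphi = (phi(p1) - phi(p2) + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta(p1) - eta(p2), dphi)

# Two massless, back-to-back objects in the transverse plane (toy values).
jet1 = (50.0, 50.0, 0.0, 0.0)
jet2 = (50.0, -50.0, 0.0, 0.0)
features = {
    "pt_jet1": pt(jet1),
    "m_jj": invariant_mass(jet1, jet2),
    "dR_jj": delta_r(jet1, jet2),
}
```

Each entry of `features` would become one column of the initial feature set.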


Initial Selection

• Features initially chosen due to their individual performance

– How well does X discriminate between signal and background?

• Vetoes: Is X well-understood?

• Theoretical and other uncertainties

• Monte-Carlo and data agreement

– Arrive at on the order of 10-30 features (95% of use cases)
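One possible way to quantify "how well does X discriminate between signal and background" for a single feature is the area under the ROC curve, computed directly from pairwise comparisons. This is a minimal sketch: the choice of AUC as the criterion, the function names, and the toy values are assumptions for illustration, and the vetoes listed above still have to be applied by hand.

```python
def single_feature_auc(signal, background):
    """Probability that a random signal value exceeds a random background value.

    Ties count as one half. A value of 0.5 means no discrimination.
    """
    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal) * len(background))

def rank_features(signal_events, background_events):
    """Order feature names by |AUC - 0.5|, most discriminating first."""
    power = {
        name: abs(single_feature_auc(signal_events[name], background_events[name]) - 0.5)
        for name in signal_events
    }
    return sorted(power, key=power.get, reverse=True)

# Toy values per feature (hypothetical): "mass" separates the classes, "noise" does not.
signal_events = {"mass": [3, 4, 5], "noise": [1, 2, 3]}
background_events = {"mass": [1, 2, 3], "noise": [1, 2, 3]}
ranking = rank_features(signal_events, background_events)
```

The quadratic loop is fine for a sketch; for large samples a rank-based computation would be the usual choice.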


Feature Engineering

• By combining features with each other and boosting into other frames of reference*, this set can grow quickly from tens to hundreds of features

– That’s OK if you have enough computational power

• Still small compared to the 100k features of cancer/image recognition datasets

– Balance between Occam’s razor and need for additional performance/power

* K. Black et al., JHEP 1104:069, 2011


Practicum

[Diagram: Training Set → Feature Selection → Model Building → f()]

[Diagram: Training/Test Set → Feature Selection* → Model Building → f(); Test Set → f() → Estimate Performance]

*Feature Selection Bias
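The flow in the diagrams can be sketched as a hold-out pipeline in which feature selection only ever sees the training set, so the test set stays untouched until the final performance estimate. The mean-difference ranking, the nearest-centroid model, and the toy numbers below are all assumptions chosen to keep the example short; they stand in for whatever selector and classifier an analysis actually uses.

```python
from statistics import mean

def select_features(rows, labels, k):
    """Keep the k column indices with the largest |mean(signal) - mean(background)|."""
    def separation(j):
        sig = [r[j] for r, y in zip(rows, labels) if y == 1]
        bkg = [r[j] for r, y in zip(rows, labels) if y == 0]
        return abs(mean(sig) - mean(bkg))
    return sorted(range(len(rows[0])), key=separation, reverse=True)[:k]

def fit(rows, labels, cols):
    """Nearest-centroid model f() restricted to the selected columns."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(r[j] for r in members) for j in cols]
    return cols, centroids

def predict(model, row):
    cols, centroids = model
    def dist2(center):
        return sum((row[j] - c) ** 2 for j, c in zip(cols, center))
    return min(centroids, key=lambda cls: dist2(centroids[cls]))

def holdout_estimate(train, test, k):
    (x_train, y_train), (x_test, y_test) = train, test
    cols = select_features(x_train, y_train, k)  # selection uses training data only
    model = fit(x_train, y_train, cols)          # model building
    hits = sum(predict(model, r) == y for r, y in zip(x_test, y_test))
    return cols, hits / len(y_test)              # performance estimated on the test set

# Toy data: column 0 separates the classes, column 1 does not.
train = ([[0.1, 5.0], [0.2, 4.0], [0.9, 5.0], [1.0, 4.0]], [0, 0, 1, 1])
test = ([[0.0, 4.5], [1.1, 4.5]], [0, 1])
selected, accuracy = holdout_estimate(train, test, k=1)
```

Running the selection on the combined training and test data instead would be the "feature selection bias" flagged by the asterisk.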


Methods

• Filters [diagram: Feature Selection, then Model Building]

• Wrappers [diagram: Model Building, Feature Selection]

• Embedded-Hybrid [diagram: Feature Selection during Model Building]


Filter Methods

• Filters: usually fast

– No feedback from the Classifier

– Use correlations or mutual information (information gain)

– “Quick and dirty” and less accurate

– Useful in pre-processing

– Example algorithms: information gain, Relief, rank-sum test, etc.

[Diagram: Feature Selection, then Model Building]
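A filter based on information gain, one of the example algorithms named above, might look like the sketch below. Splitting each feature at its median is a simplifying assumption made here (real implementations usually bin or scan thresholds), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import median

def entropy(labels):
    """Shannon entropy of the class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the sample at `threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - conditional

def filter_select(features, labels, k):
    """Keep the k feature names with the largest gain at their median split.

    No classifier is trained: the score depends only on the data.
    """
    gains = {name: information_gain(v, labels, median(v)) for name, v in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Toy data: "a" separates the classes perfectly at its median, "b" not at all.
labels = [0, 0, 1, 1]
features = {"a": [1, 2, 3, 4], "b": [1, 3, 2, 4]}
kept = filter_select(features, labels, k=1)
```

Because each feature is scored on its own, a filter like this is fast but cannot see interactions between features.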


Wrapper Methods

• Wrappers: typically slower and relatively more accurate (due to model-building)

– Tied to a chosen model:

• Use it to evaluate features

• Assess feature interactions

• Search for optimal subset of features

– Different types:

• Methodical

• Probabilistic (random hill-climbing)

• Heuristic (forward selection, backward elimination)

[Diagram: Model Building, Feature Selection]
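The heuristic wrapper type can be illustrated with greedy forward selection. In this sketch `score(subset)` stands for "train the chosen model on this subset and return its performance"; here it is replaced by a made-up lookup (`toy_score`) so the example runs on its own. Both names and all numbers are assumptions for illustration.

```python
def forward_selection(features, score, min_gain=1e-9):
    """Greedy wrapper: repeatedly add the feature that most improves score(subset)."""
    selected = []
    best = score(frozenset())
    remaining = list(features)
    while remaining:
        trial = {f: score(frozenset(selected + [f])) for f in remaining}
        candidate = max(remaining, key=lambda f: trial[f])
        if trial[candidate] - best < min_gain:
            break  # no remaining feature helps enough
        selected.append(candidate)
        remaining.remove(candidate)
        best = trial[candidate]
    return selected, best

def toy_score(subset):
    """Hypothetical performance table; a real wrapper would train a model here."""
    s = 0.5
    if "mass" in subset:
        s += 0.2
    if "pt" in subset:
        s += 0.1
    if {"pt", "dr"} <= subset:
        s += 0.15  # "dr" only helps together with "pt"
    return s

chosen, performance = forward_selection(["pt", "mass", "dr"], toy_score)
```

Every call to `score` costs a model training, which is why wrappers are slower than filters. Note that if `pt` had no gain of its own, greedy forward selection would never discover the `pt`-`dr` interaction.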


Ex: Feature Importance

• Feature importance is proportional to the performance of the classifiers in which the feature participates

• Full feature set {V}

• Feature subsets {S}

• Classifier performance F(S)

• Fast stochastic version uses random subset seeds

$$FI(X_i) = \sum_{S \subseteq V:\, X_i \in S} F(S) \times W_{X_i}(S)$$

$$W_{X_i}(S) = 1 - \frac{F(S - \{X_i\})}{F(S)}$$
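The importance sum can be evaluated exhaustively for a small feature set. The sketch below enumerates every subset S of V, which is only feasible for a handful of features (the stochastic version mentioned above exists for that reason). The performance function `toy_performance`, including its value for the empty set, is an assumption made so the example is self-contained.

```python
from itertools import combinations

def subsets(features):
    """Yield every subset of `features` as a frozenset."""
    for size in range(len(features) + 1):
        for combo in combinations(features, size):
            yield frozenset(combo)

def feature_importance(features, F):
    """FI(X) = sum over S containing X of F(S) * W_X(S), with W_X(S) = 1 - F(S - {X}) / F(S)."""
    importance = {x: 0.0 for x in features}
    for S in subsets(features):
        f_s = F(S)
        if f_s == 0:
            continue  # the weight is undefined when F(S) = 0
        for x in S:
            weight = 1.0 - F(S - {x}) / f_s
            importance[x] += f_s * weight
    return importance

def toy_performance(subset):
    """Hypothetical F(S); in practice this is the performance of a classifier trained on S."""
    return 0.5 + (0.2 if "mass" in subset else 0.0) + (0.1 if "pt" in subset else 0.0)

fi = feature_importance(["mass", "pt"], toy_performance)
```

Since F(S) × W(S) reduces to F(S) − F(S − {X}), each term is the performance lost when X is dropped from S.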


Example: RuleFit

• RuleFit: rule-based binary classification and regression (J. Friedman)

– Transforms decision trees into rule ensembles

– A powerful classifier even if some rules are poor

• Feature Importance:

– Proportional to performance of classifiers in which features participate (similar)

– Difference: no W_{X_i}(S)

• Individual classifier performance evenly divided among participating features


Selection Caveats

• Feature Selection Bias

– Common mistake: “using” the testing dataset during feature selection, which leads to over-optimistic evaluation of classifiers

– Solution:

• M-fold cross-validation/Bootstrap

• Second testing sample for evaluation
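For the M-fold cross-validation remedy, the key point is that feature selection is repeated inside every fold, on that fold's training part only. The sketch below shows the control flow; `select`, `fit`, and `score` are user-supplied callables with a hypothetical interface, and the stubs at the end exist only to make the example runnable.

```python
def m_fold_indices(n, m):
    """Yield (train_idx, test_idx) for m interleaved folds over range(n)."""
    for fold in range(m):
        test = [i for i in range(n) if i % m == fold]
        train = [i for i in range(n) if i % m != fold]
        yield train, test

def cross_validate(rows, labels, m, select, fit, score):
    """Average test score over m folds, re-running feature selection in each fold."""
    scores = []
    for train, test in m_fold_indices(len(rows), m):
        x_train = [rows[i] for i in train]
        y_train = [labels[i] for i in train]
        x_test = [rows[i] for i in test]
        y_test = [labels[i] for i in test]
        cols = select(x_train, y_train)      # never sees the held-out fold
        model = fit(x_train, y_train, cols)
        scores.append(score(model, x_test, y_test))
    return sum(scores) / len(scores)

# Stub callables (hypothetical) that record how much data the selector was shown.
seen_sizes = []

def stub_select(x, y):
    seen_sizes.append(len(x))
    return [0]

def stub_fit(x, y, cols):
    return cols

def stub_score(model, x, y):
    return 1.0

rows = [[float(i)] for i in range(6)]
labels = [0, 1, 0, 1, 0, 1]
average = cross_validate(rows, labels, 3, stub_select, stub_fit, stub_score)
```

Selecting features once on the full sample and cross-validating only the model fit would reintroduce the bias.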


Feature Interactions

• Features often interact strongly in the classification process.

– Their removal affects the performance of remaining interacting partners

• Strength of interaction quantified by some wrapper methods

– In some classifiers features can be overlooked (or shadowed) by their interacting partners


Beware of hidden reefs
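A textbook illustration of interacting features is the XOR pattern: each feature alone carries no information about the class, yet the pair classifies perfectly. The lookup-table classifier below is an assumption chosen for brevity and is evaluated on the training points only; it is not a method from the talk.

```python
from collections import Counter, defaultdict

def table_accuracy(rows, labels, cols):
    """Training accuracy of a lookup-table classifier that sees only columns `cols`."""
    votes = defaultdict(Counter)
    for row, y in zip(rows, labels):
        votes[tuple(row[j] for j in cols)][y] += 1
    # For each distinct key, the majority label is predicted.
    correct = sum(counter.most_common(1)[0][1] for counter in votes.values())
    return correct / len(labels)

# XOR toy data: label = x0 XOR x1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

alone_0 = table_accuracy(rows, labels, [0])     # chance level
alone_1 = table_accuracy(rows, labels, [1])     # chance level
together = table_accuracy(rows, labels, [0, 1]) # perfect separation
```

A criterion that scores features one at a time would discard both, which is one of the hidden reefs.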


Selection Caveats

[Figure: feature importance before and after removing a feature]

Importance Landscape Has Changed

Holds for any criterion that doesn’t incorporate interactions


Global Loss Function

• GLoss Function: global measure of loss

– Selects feature subsets for global removal

• Shows the amount of predictive power loss relative to the upper bound of performance of remaining classifiers


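S’ is the subset to be removed:

$$GF(S') = 1 - \frac{\sum_{S \subseteq (V - S')} F(S)}{2^{|V - S'|}}$$

Upper bound of performance of remaining classifiers:

$$\max_{S \subseteq (V - S')} F(S) \times 2^{|V - S'|}$$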


Global Loss and Classifier Performance

GLoss Function minimization is NOT EQUIVALENT to maximization of F(S), i.e. finding the highest-performing classifier and its constituent features


Recent Work

• Probabilistic Wrapper Methods:

– Stochastic approach

• Hybrid Algorithms:

– combine Filters and Wrappers

• Embedded Methods


Embedded Methods

• At model-building stage assess feature importance and incorporate it in the process

– Way to penalize/remove features in the classification or regression process

• Regularization

• Examples: LASSO, RegTrees
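LASSO shows how regularization removes features during the fit itself: the L1 penalty drives weak coefficients exactly to zero. Below is a minimal coordinate-descent sketch for the objective (1/2n)·||y − Xw||² + α·||w||₁. The solver, the toy design matrix, and the value of `alpha` are assumptions for illustration; a real analysis would normally use a library implementation.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for LASSO. Assumes no column of X is identically zero."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Toy data: y depends strongly on column 0 and only weakly on column 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.05, 1.95, -1.95, -2.05]  # y = 2.0 * x0 + 0.05 * x1
weights = lasso(X, y, alpha=0.1)
```

With this penalty the weak feature's coefficient is set to zero and the strong one is shrunk by `alpha`, so selection and model building happen in one step.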


Regularized Trees

• Inspired by Rules Regularization in Friedman and Popescu 2008

• Decision Tree Reminder:

[Figure: example decision tree with split features Pt_Jet1Jet2, Ht_AllJets, QTimesEta, Shat, and DeltaRJet1Jet2, cut values 80.46, 140.1, 349.3, 1.12, and 3.05, and leaves labeled BGND or SIGNAL]

• “Votes” taken at decision junctions on possible splits among the features

• RegTrees penalize, during voting, features similar to those used in previous decisions

• End up with a high-quality feature set classifier


Feature Amplification

• Another example: feed feature importance back into classifier building

[Figure: example decision tree with split features Pt_Jet1Jet2, Ht_AllJets, QTimesEta, Shat, and DeltaRJet1Jet2, cut values 80.46, 140.1, 349.3, 1.12, and 3.05, and leaves labeled BGND or SIGNAL]

• Weight votes by log(feature importance)


The Decade Ahead

H. Prosper

[Diagram: Data compared with the SM, New Theory 1, and New Theory 2 through p(Data | Theory) and p(Theory | Data)]

SM parameters:

me, mµ, mτ

mu, md, ms, mc, mb, mt

θ12, θ23, θ13, δ

g1, g2, g3

θQCD

µ, λ

Basic statistical questions:

1. Which theories are preferred, given the data?

2. And which parameter sub-spaces within these theories?

All interesting theories are multi-parameter models


Minimal SUSY


In HEP

• Often in HEP one searches for new phenomena and applies classifiers, trained on MC for at least one of the classes (signal) or sometimes both, to real data

– Flexibility is KEY to any search

– It is more beneficial to choose a reduced parameter space that consistently produces strong-performing classifiers at actual analysis time

• Useful for general SUSY and other new phenomena searches


Feature Selection Tools

• R (CRAN): Boruta, RFE, CFS, FSelector, caret

• TMVA: FAST algorithm (stochastic wrapper), Global Loss function

• scikit-learn

• Bioconductor


Summary

• Feature selection is an important part of robust HEP Machine Learning applications

• Many methods available

• Watch out for caveats

• Happy ANALYZING
