towards explainable prediction models on high-dimensional ... · behavioral & textual data by...

91
Towards Explainable Prediction Models on High-dimensional Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens Research Seminar – June 19, 2020 – 11am Faculty of Business & Economics, University of Antwerp

Upload: others

Post on 08-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Towards Explainable Prediction Models on High-dimensional

Behavioral & Textual data by Yanou Ramon

supervisor: Prof. David Martens

Research Seminar – June 19, 2020 – 11amFaculty of Business & Economics, University of Antwerp

Page 2: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

ABOUT ME

PhD student at University of AntwerpApplied Data Mining Group - Prof. David Martens

Towards explainable prediction models on high-dimensional behavioral and textual data

M.Sc. in Business Engineering (Finance)University of Antwerp

Big Data, Data Mining, Artificial Intelligence, programming etc.

YANOU RAMON

119th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 3: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Prediction model

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

0

0

0

0

000

0

0

0

0

0

0

0

Predicted

value of

variable of

interest

Behavioral and textual data (High-dimensional & sparse)

2

DATA-DRIVEN DECISION-MAKING

automated

decisions or

decision

support

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 4: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

LOCATION DATA

smartphone sensor data (GPS locations), online “check-ins”,…

Example applications:• Ecommerce: efficient parcel delivery• Psychological/behavioral profiling• Customer relationship management • Political party preference & orientation• Daily habits, interests & preferences

319th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 5: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

LOCATION DATA

smartphone sensor data (GPS locations), online “check-ins”,…

Example applications:• Ecommerce: efficient parcel delivery• Psychological/behavioral profiling• Customer relationship management • Political party preference & orientation• Daily habits, interests & preferences

4

Financial Times, April 2020

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 6: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

SOCIAL MEDIA & BROWSING DATA

Facebook/Instagram “likes”, Twitter posts, online reviews/blogposts, search queries,…

Example applications:• Psychological/behavioral profiling• Product interest & online targeted advertising• Political party preference & orientation• Behavioral credit scoring

519th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 7: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

SOCIAL MEDIA & BROWSING DATA

Facebook/Instagram “likes”, Twitter posts, online reviews/blogposts, search queries,…

Example applications:• Psychological/behavioral profiling• Product interest & online targeted advertising• Political party preference & orientation• Behavioral credit scoring

6

Business Insider, November 2017; Matz et al., 2017

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 8: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

SOCIAL MEDIA & BROWSING DATA

Facebook/Instagram “likes”, Twitter posts, online reviews/blogposts, search queries,…But also: “metadata”

Example applications:• Psychological/behavioral profiling• Product interest & online targeted advertising• Political party preference & orientation• Behavioral credit scoring

7

de Montjoye et al., 2013; Financial Times, March 2019

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 9: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Prediction model

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

0

0

0

0

000

0

0

0

0

0

0

0

Predicted

value of

variable of

interest

Behavioral and textual data (High-dimensional & sparse)

8

DATA-DRIVEN DECISION-MAKING

Thousands of coefficients Nonlinear techniques

“Black Box”?

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 10: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Prediction model

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

0

0

0

0

000

0

0

0

0

0

0

0

Predicted

value of

variable of

interest

Behavioral and textual data (High-dimensional & sparse)

9

DATA-DRIVEN DECISION-MAKING

Thousands of coefficients Nonlinear techniques

“Black Box”?

“EXplainable Artificial Intelligence (XAI)” “Interpretable Machine Learning”

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 11: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

MOTIVATION

10

TRUST

INSIGHT

IMPROVEWrong

Accurate

To what extent is the prediction (model) in line with expectations?

(Martens, 2020)

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 12: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPLAINING PREDICTION MODELS

EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)

DIMENSIONS

11

Scope Global Instance-level

Flexibility Model-specific Model-agnostic

Faithfulness Intrinsic Post-hoc

Output format Rule, importance-ranked list, visualization, linear model,…

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 13: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPLAINING PREDICTION MODELS

EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)

DIMENSIONS

12

Scope Global Instance-level

Flexibility Model-specific Model-agnostic

Faithfulness Intrinsic Post-hoc

Output format Rule, importance-ranked list, visualization, linear model,…

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 14: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPLAINING PREDICTION MODELS

EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)

DIMENSIONS

13

Scope Global Instance-level

Flexibility Model-specific Model-agnostic

Faithfulness Intrinsic Post-hoc

Output format Rule, importance-ranked list, visualization, linear model,…

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 15: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPLAINING PREDICTION MODELS

EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)

DIMENSIONS

14

Scope Global Instance-level

Flexibility Model-specific Model-agnostic

Faithfulness Intrinsic Post-hoc

Output format Rule, importance-ranked list, visualization, linear model,…

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 16: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPLAINING PREDICTION MODELS

EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)

DIMENSIONS

15

Scope Global Instance-level

Flexibility Model-specific Model-agnostic

Faithfulness Intrinsic Post-hoc

Output format Rule, importance-ranked list, visualization, linear model,…

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 17: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

OVERVIEW OF PROJECTS

I. Deep Learning for Big, Sparse, Behavioral data De Cnudde et al., Big Data (2019)

II. Instance-level explanation algorithms on behavioural and textual data: a counterfactual-oriented comparisonRamon et al., Forthcoming in Advances in Data Analysis and Classification (2020)

III. Improving the cost of explainability for high-dimensional, sparse data using metafeatures-based rule-extraction Ramon et al., Submitted to Machine Learning (2020)

1619th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 18: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

OVERVIEW OF PROJECTS

17

GLOBALINSTANCE-

LEVEL

Theoretical

research

Applied

research

Validation of methods in (business) applications

Metafeatures-based

rule-extraction

Counterfactual

explanation algorithms

Psychological

profiling/targeting

User study about

explanation attributes

Churn prediction …

Towards explainable prediction models on

high-dimensional behavioral and textual data

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 19: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

PROJECT 1

Page 20: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

INSTANCE-LEVEL EXPLANATION ALGORITHMS ON BEHAVIORAL AND TEXTUAL DATA:

A COUNTERFACTUAL-ORIENTED COMPARISON

19

Yanou Ramon, David Martens, Foster Provost, Theodoros EvgeniouForthcoming in Advances in Data Analysis and Classification (2020)

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 21: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

PROBLEM STATEMENT

Page 22: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Co

lum

bia

Un

ive

rsit

y

Tim

e

Sq

uare

DU

MB

O

Ch

els

ea

Mark

et

Targ

et

To

uri

st

Anna 1 1 1 … 0 1

Jack 1 0 0 … 1 0

… … … … … … …

Bill 0 0 1 … 0 0

evidence “present” = active feature

LOCATION DATA NYC: tourist or citizen?

21

data matrix is very high-dimensional and sparse

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 23: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

“Black Box” model Thousands of coefficients Nonlinear techniques

? ො𝑦 = 1 if touristelse ො𝑦 = 0

LOCATION DATA NYC

22

(Local) interpretability issues Counterfactual explanations

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 24: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COUNTERFACTUAL EXPLANATIONS

● Instance-level● Causality within the model● Minimal set of features such that the predicted class changes

when “removing” them (setting value to zero)● Very intuitive and comprehensible contrastive nature

“Why X rather than not-X?” (Miller, 2017)

2319th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 25: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COUNTERFACTUAL EXPLANATIONS

EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)

DIMENSIONS

24

Scope Global Instance-level

Flexibility Model-specific Model-agnostic

Faithfulness Intrinsic Post-hoc

Output format Rule, importance-ranked list, visualization, linear model,…

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 26: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COUNTERFACTUAL EXPLANATIONS

Example: Tourist prediction using NYC location data

Anna visited 120 places last monthAnna was predicted as “tourist”

2519th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 27: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COUNTERFACTUAL EXPLANATIONS

Example: Tourist prediction using NYC location data

Anna visited 120 places last monthAnna was predicted as “tourist”

WHY?

2619th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 28: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COUNTERFACTUAL EXPLANATIONS

Example: Tourist prediction using NYC location data

Anna visited 120 places last monthAnna was predicted as “tourist”

IF Anna would not have visited {Time Square, DUMBO},THEN the predicted class changes from “tourist” to “NY citizen”

2719th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 29: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COUNTERFACTUAL ALGORITHMS

Page 30: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

DESIDERATA

● Model-agnostic algorithm● Find minimum-sized counterfactual explanation 𝐸 for a

single model prediction of instance 𝐱

2919th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 31: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

DESIDERATA

● Model-agnostic algorithm● Find minimum-sized counterfactual explanation 𝐸 for a

single model prediction of instance 𝐱

3019th of June, Online Research Seminar, Explaining prediction models on Big Data

More actionable: e.g., “cloak” fewer online traces to get a desired outcome (not be targeted with ads of gay bars)

More comprehensible(~cognitive limitations)

Page 32: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

FORMAL OBJECTIVE FUNCTION

● Original instance 𝐱 vs Example: NYC location dataperturbed instance 𝐳

3119th of June, Online Research Seminar, Explaining prediction models on Big Data

𝐱𝒛𝟏𝒛𝟐

𝐼 forms a subset of the set of indices of the “active” features of 𝐱

𝑬∗ = {Time Square, DUMBO}

𝒛∗ = 𝒛𝟏

Page 33: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

FORMAL OBJECTIVE FUNCTION

• Original instance 𝐱 vs perturbed instance 𝐳

• Find 𝐳∗ (or 𝑬∗) that is as close as possible to 𝐱 and has a different predicted class

3219th of June, Online Research Seminar, Explaining prediction models on Big Data

cosine distance predicted class change only “active” features”are perturbed

Page 34: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

FORMAL OBJECTIVE FUNCTION

3319th of June, Online Research Seminar, Explaining prediction models on Big Data

𝐱

𝒛∗ = 𝒛𝟏

𝒛𝟐

𝒛𝟑𝒛𝟒

𝒛𝟓

𝒅

tourist

NY citizen

𝒛∗ = 𝒙\{Time Square, DUMBO}

perturbed instances

original instance

𝒅 cosine distance

𝑬∗ = {Time Square, DUMBO}

𝒛𝟔

Page 35: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

WHY COMPLETE SEARCH FAILS

● Start with removing one feature and increase number of features in the subset until the predicted class changes

● Scales exponentially with active features 𝑚 and required number of features 𝑘 to be removede.g., for an instance with m features, a combination of 𝑘 features requires 𝑚!

𝑚−𝑘 !𝑘!evaluations

3419th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 36: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

BEST-FIRST SEARCH (SEDC)

● Explaining document classifications (Martens & Provost, 2013)

● Model-agnostic algorithm SEDC: heuristic best-first search● Optimal for linear models

Implementation on https://github.com/yramon/edc

3519th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 37: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

BEST-FIRST SEARCH (SEDC)

Check “active” features

Expand best-first feature (set) with one extra feature

Counterfactual explanation found

Class change?

Class change?

No?

No? Yes?

Yes?

3619th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 38: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

NOVEL HYBRID ALGORITHMS

Additive Feature Attribution (AFA) methods:● LIME: Local Model-agnostic Explainer (Ribeiro et al., 2016)

● SHAP: Shapley Additive Explanations (Lundberg et al., 2018)

Output: Importance-ranked list

3719th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 39: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

NOVEL HYBRID ALGORITHMSLIME / SHAP Example: Tourist prediction using NYC location data

0.211 Time Square

0.205 DUMBO

0.202 Central Park

0.197 Top of the Rock

0.192 MoMA

0.186 Fifth Avenue

0.183 Eataly

Washington Square Park -0.185

3819th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 40: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Originality: importance rankings may be an “intelligent” starting point for efficiently computing counterfactuals

Novel algorithms: LIME–C and SHAP-C

NOVEL HYBRID ALGORITHMS

3919th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 41: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

NOVEL HYBRID ALGORITHMSLIME-C / SHAP-CExample: Tourist prediction using NYC location data

Remove features with positive importanceweight until the class changes

40

0.211 Time Square

0.205 DUMBO

0.202 Central Park

0.197 Top of the Rock

0.192 MoMA

0.186 Fifth Avenue

0.183 Eataly

Washington Square Park -0.185

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 42: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPERIMENTAL SETUP

Page 43: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Collect data sets

and build models

Textual data:

linear/rbf SVM

Behavioral data:

LR/MLP

4219th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 44: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Collect data sets

and build models

Textual data:

linear/rbf SVM

Behavioral data:

LR/MLP

Generate

explanations for

test instances

SEDC

LIME-C

SHAP-C

Positively-predicted test

instances

max. 2 minutes

max. 30 features

43

SEDC: max 50 iterations

LIME/SHAP-C: 5000 samples

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 45: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EVALUATION CRITERIA

The goal is to find the minimum-sized counterfactual as fast as possible tradeoff between:

• EffectivenessPercentage explainedSwitching point: amount of features in explanation

• EfficiencyComputation time in seconds

4419th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 46: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

Collect data sets

and build models

Textual data:

linear/rbf SVM

Behavioral data:

LR/MLP

Generate

explanations for

test instances

SEDC

LIME-C

SHAP-C

Positively-predicted test

instances

max. 2 minutes

max. 30 features

Evaluation

Percentage

explained

Switching point

Computation time

45

SEDC: max. 50 iterations

LIME/SHAP-C: 5000 samples

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 47: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

RESULTS & CONCLUSION

Page 48: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

47

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 49: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

48

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 50: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

49

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 51: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

50

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 52: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

51

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 53: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

52

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 54: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

53

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 55: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

54

EFFECTIVENESS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 56: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

55

EFFICIENCY

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 57: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

56

EFFICIENCY

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 58: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

57

EFFICIENCY vs SWITCHING POINT

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 59: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

58

EFFICIENCY

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 60: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

CONCLUSION

• SEDC most efficient and effective for small data instances, however- flaw in heuristic best-first for some nonlinear models

• SHAP-C overall good performance, however- problems with highly unbalanced data- computation time more sensitive to # active features than LIME-C- relatively worse effectiveness/efficiency

LIME-C: suitable alternative to SEDC because of good tradeoff- good effectiveness results for all data and models- low computation times- efficiency least sensitive to switching point

5919th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 61: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

CONCLUSION

• SEDC most efficient and effective for small data instances, however- flaw in heuristic best-first for some nonlinear models

• SHAP-C overall good performance, however- problems with highly unbalanced data- computation time more sensitive to # active features than LIME-C- relatively worse effectiveness/efficiency

LIME-C: suitable alternative to SEDC because of good tradeoff- good effectiveness results for all data and models- low computation times- efficiency least sensitive to switching point

! Also addresses problem of setting complexity of LIME/SHAP explanation

6019th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 62: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

PROJECT 2

Page 63: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

IMPROVING THE COST OF EXPLAINABILITY FOR HIGH-DIMENSIONAL, SPARSE DATA USING METAFEATURES-BASED RULE-EXTRACTION

62

Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene PraetSubmitted in Machine Learning (Special Issue on Feature Engineering)

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 64: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

PROBLEM STATEMENT

Page 65: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

“Black Box” model Thousands of coefficients Nonlinear techniques

? ො𝑦 = 1 if touristelse ො𝑦 = 0

LOCATION DATA NYC

64

(Global) comprehensibility issues Rule-extraction

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 66: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

RULE-EXTRACTION

• Train a comprehensible model (“white-box”) to mimic the predictions of a more complex, highly accurate “black-box” model

• Black-box model: all models on high-dimensional, sparse data • Small decision trees and concise rule sets as “white-boxes”• Black-box model predictions 𝑦𝐵𝐵 are used as new labels instead of

the true labels 𝑦

6519th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 67: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

RULE-EXTRACTION

6619th of June, Online Research Seminar, Explaining prediction models on Big Data

LOCATION DATA NYC

“Eataly”=1

NY citizen“Chelsea

Market”=1

Tourist NY citizen

False

False True

True

Explains global classification behaviour over entire instance/feature space

Page 68: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

CHALLENGES FOR HIGH-DIMENSIONAL, SPARSE DATA

Existing research focuses on low-dimensional, dense data

Challenges1. Complexity of extracted rules 2. Computational complexity3. Fine-grained feature comprehensibility

6719th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 69: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

CHALLENGES FOR HIGH-DIMENSIONAL, SPARSE DATA

Existing research focuses on low-dimensional, dense data

Challenges1. Complexity of extracted rules 2. Computational complexity3. Fine-grained feature comprehensibility

It is questionable whether the original fine-grained (FG) features are the best representation to achieve high explanation quality. This motivates our approach to use “metafeatures”.

6819th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 70: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

METAFEATURES

Address sparsity of fine-grained features by mapping FG data onto a higher-level MF representation: ℎ 𝑥 : 𝑋𝐹𝐺 → 𝑋𝑀𝐹 ⊂ ℝ𝑘

Desired properties1. Low dimensionality2. High density3. Faithfulness4. Mutual exclusivity5. Semantic comprehensibility

6919th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 71: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

GENERATING METAFEATURESBig Behavioral & Text Data Metafeatures

Social media data

(e.g., Facebook “Likes”)

Categories of Facebook “Likes”

(e.g., Humor, Music, Art)

Transaction data Spending categories

(e.g., Gambling, Gift Shops)

Location data Regions/venue types (e.g., Concert

halls, Sports venues)

Textual data Topics

Movie viewing data Movie genres

Web browsing data Words on a page/categories of URLs

Domain-based metafeatures vs data-driven metafeatures

7019th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 72: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

MAIN CLAIM

“Metafeatures” are more appropriate (↑ fidelity, ↑ stability) for extracting comprehensible rules from classifiers that are trained on high-dimensional, sparse data than the original fine-grained features

7119th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 73: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

RULE-EXTRACTION

7219th of June, Online Research Seminar, Explaining prediction models on Big Data

LOCATION DATA NYC

“Eataly”=1

NY citizen“Chelsea

Market”=1

Tourist NY citizen

False

False True

True

Italian restaurants

>= 1

NY citizen Musicals >= 1

NY citizen Tourist

False

False True

True Rule-extraction with metafeatures(‘venue types’)

Rule-extraction with fine-grained features

Page 74: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

PROPOSED METHODOLOGY

Page 75: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

1

Build

classification

model 𝑪𝑩𝑩 from

labeled

training data

{𝑿𝑭𝑮,𝒕𝒓𝒂𝒊𝒏, 𝒀𝒕𝒓𝒂𝒊𝒏}

7419th of June, Online Research Seminar, Explaining prediction models on Big Data

Predict labels

𝑦𝐵𝐵 for all

data

instances

(train, test,

validation)

Generate

metafeatures

𝑿𝑴𝑭

Extract

cognitively

simple rules

using 𝑿𝑴𝑭

and 𝑦𝐵𝐵

Evaluate the

quality of

explanation

rules (fidelity,

stability,

accuracy)

2 3 4 5

PROPOSED METHODOLOGY

Page 76: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

• Domain-based 𝑋𝐷𝑜𝑚𝑎𝑖𝑛𝑀𝐹

• Data-driven approach 𝑋𝐷𝐷𝑀𝐹 approach based on Non-negative Matrix Factorizationparameter of 𝑋𝐷𝐷𝑀𝐹 is 𝑘 (number of generated metafeatures)𝑘 ∈ [10, 1000]

7519th of June, Online Research Seminar, Explaining prediction models on Big Data

GENERATING METAFEATURES

Page 77: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

COGNITIVELY SIMPLE RULE-EXTRACTION

• CART decision tree algorithm (Scikit-learn library in Python)• Based on Gini impurity • Max. tree depth of 5 (~32 rules) in line with cognitive simplicity

arguments and cognitive load theory

7619th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 78: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EVALUATION CRITERIA

• Fidelity: how well does the explanation model 𝐶𝑊𝐵 (extracted rules) approximate the underlying model 𝐶𝐵𝐵?

(“cost of explainability”: 100% - fidelity is the loss in fidelity when replacing the black-box with an explanation model) • Explanation stability: how stable is the explanation model over

different training sessions with (slightly) different training sets?• Accuracy: how well does the explanation model predict true labels 𝑦?

7719th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 79: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

EXPERIMENTAL SETUP

Page 80: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

79

DATA

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 81: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

80

PREDICTION MODELS

19th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 82: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

RESULTS & CONCLUSION

Page 83: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

8219th of June, Online Research Seminar, Explaining prediction models on Big Data

FIDELITY

Page 84: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

8319th of June, Online Research Seminar, Explaining prediction models on Big Data

Correlation between Gini impurityreduction ratio of best FG vs best MF and difference in fidelity: 0.929

FIDELITY

Page 85: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

8419th of June, Online Research Seminar, Explaining prediction models on Big Data

STABILITY - ACCURACY

Page 86: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

CONCLUSION• Metafeatures-based rule-extraction leads to better tradeoffs:

- Improved “cost of explainability”: small trees/rules that explain a large(r) percentage of black-box predictions+5% fidelity, +15% stability, +5% accuracy

• Important tradeoff: increasing the complexity leads to increased fidelity but decreased stability

• Finetune 𝑘 (or any other parameter of explanation model 𝐶𝑊𝐵) to get desired fidelity/stability tradeoff

8519th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 87: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

KEY TAKEAWAYS

Page 88: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

OVERVIEW OF PROJECTS

I. Deep Learning for Big, Sparse, Behavioral data De Cnudde et al., Big Data (2019)

II. Instance-level explanation algorithms on behavioural and textual data: a counterfactual-oriented comparisonRamon et al., Forthcoming in Advances in Data Analysis and Classification (2020)

III. Improving the cost of explainability for high-dimensional, sparse data using metafeatures-based rule-extraction Ramon et al., Submitted to Machine Learning (2020)

8719th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 89: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

OVERVIEW OF PROJECTSI. Deep Learning for Big, Sparse, Behavioral data

De Cnudde et al., Big Data (2019)

II. Instance-level explanation algorithms on behavioural and textual data: a counterfactual-oriented comparisonRamon et al., Forthcoming in Advances in Data Analysis and Classification (2020)

III. Improving the cost of explainability for high-dimensional, sparse data using metafeatures-based rule-extraction Ramon et al., Submitted to Machine Learning (2020)

8819th of June, Online Research Seminar, Explaining prediction models on Big Data

SEDC is most effective/efficient for data with small instances LIME-C algorithm is a good alternative to SEDC algorithm for large data instances

Page 90: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

OVERVIEW OF PROJECTSI. Deep Learning for Big, Sparse, Behavioral data

De Cnudde et al., Big Data (2019)

II. Instance-level explanation algorithms on behavioural and textual data: a counterfactual-oriented comparisonRamon et al., Forthcoming in Advances in Data Analysis and Classification (2020)

III. Improving the cost of explainability for high-dimensional, sparse data using metafeatures-based rule-extraction Ramon et al., Submitted to Machine Learning (2020)

Metafeatures-based rule-extraction improves a key “cost of explainability”: higher fidelity compared to rules using fine-grained features

8919th of June, Online Research Seminar, Explaining prediction models on Big Data

Page 91: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik.

Please keep this slide for attribution.

Further questions?

Mail: [email protected]/in/yanou-ramonhttps://yramon.github.io/www.applieddatamining.com

THANKS!

90