![Page 1: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/1.jpg)
Towards Explainable Prediction Models on High-dimensional
Behavioral & Textual data by Yanou Ramon
supervisor: Prof. David Martens
Research Seminar, June 19, 2020, 11am
Faculty of Business & Economics, University of Antwerp
![Page 2: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/2.jpg)
ABOUT ME
PhD student at University of Antwerp
Applied Data Mining Group - Prof. David Martens
Towards explainable prediction models on high-dimensional behavioral and textual data
M.Sc. in Business Engineering (Finance), University of Antwerp
Big Data, Data Mining, Artificial Intelligence, programming etc.
YANOU RAMON
19th of June, Online Research Seminar, Explaining prediction models on Big Data
![Page 3: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/3.jpg)
DATA-DRIVEN DECISION-MAKING
[Figure: behavioral and textual data (high-dimensional & sparse binary matrix) → prediction model → predicted value of variable of interest → automated decisions or decision support]
![Page 4: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/4.jpg)
LOCATION DATA
smartphone sensor data (GPS locations), online “check-ins”,…
Example applications:
• Ecommerce: efficient parcel delivery
• Psychological/behavioral profiling
• Customer relationship management
• Political party preference & orientation
• Daily habits, interests & preferences
![Page 5: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/5.jpg)
Financial Times, April 2020
![Page 6: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/6.jpg)
SOCIAL MEDIA & BROWSING DATA
Facebook/Instagram “likes”, Twitter posts, online reviews/blogposts, search queries,…
Example applications:
• Psychological/behavioral profiling
• Product interest & online targeted advertising
• Political party preference & orientation
• Behavioral credit scoring
![Page 7: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/7.jpg)
Business Insider, November 2017; Matz et al., 2017
![Page 8: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/8.jpg)
But also: “metadata”
de Montjoye et al., 2013; Financial Times, March 2019
![Page 9: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/9.jpg)
DATA-DRIVEN DECISION-MAKING
[Figure: behavioral and textual data (high-dimensional & sparse) → prediction model → predicted value of variable of interest]
Prediction model: thousands of coefficients, nonlinear techniques
“Black Box”?
![Page 10: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/10.jpg)
“EXplainable Artificial Intelligence (XAI)” “Interpretable Machine Learning”
![Page 11: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/11.jpg)
MOTIVATION
TRUST
INSIGHT
IMPROVE
Wrong
Accurate
To what extent is the prediction (model) in line with expectations?
(Martens, 2020)
![Page 12: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/12.jpg)
EXPLAINING PREDICTION MODELS
EXPLANATIONS help users to understand the relationship between the input (features) and the model’s predicted output (target)
DIMENSIONS
• Scope: Global / Instance-level
• Flexibility: Model-specific / Model-agnostic
• Faithfulness: Intrinsic / Post-hoc
• Output format: Rule, importance-ranked list, visualization, linear model,…
![Page 13: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/13.jpg)
![Page 14: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/14.jpg)
![Page 15: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/15.jpg)
![Page 16: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/16.jpg)
![Page 17: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/17.jpg)
OVERVIEW OF PROJECTS
I. Deep Learning for Big, Sparse, Behavioral data
De Cnudde et al., Big Data (2019)
II. Instance-level explanation algorithms on behavioral and textual data: a counterfactual-oriented comparison
Ramon et al., Forthcoming in Advances in Data Analysis and Classification (2020)
III. Improving the cost of explainability for high-dimensional, sparse data using metafeatures-based rule-extraction
Ramon et al., Submitted to Machine Learning (2020)
![Page 18: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/18.jpg)
OVERVIEW OF PROJECTS
[Diagram: “Towards explainable prediction models on high-dimensional behavioral and textual data”]
• Scope: Global / Instance-level
• Theoretical research: Metafeatures-based rule-extraction; Counterfactual explanation algorithms
• Applied research (validation of methods in (business) applications): Psychological profiling/targeting; User study about explanation attributes; Churn prediction; …
![Page 19: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/19.jpg)
PROJECT 1
![Page 20: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/20.jpg)
INSTANCE-LEVEL EXPLANATION ALGORITHMS ON BEHAVIORAL AND TEXTUAL DATA: A COUNTERFACTUAL-ORIENTED COMPARISON
Yanou Ramon, David Martens, Foster Provost, Theodoros Evgeniou
Forthcoming in Advances in Data Analysis and Classification (2020)
![Page 21: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/21.jpg)
PROBLEM STATEMENT
![Page 22: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/22.jpg)
LOCATION DATA NYC: tourist or citizen?

| | Columbia University | Times Square | DUMBO | … | Chelsea Market | Target: Tourist |
|---|---|---|---|---|---|---|
| Anna | 1 | 1 | 1 | … | 0 | 1 |
| Jack | 1 | 0 | 0 | … | 1 | 0 |
| … | … | … | … | … | … | … |
| Bill | 0 | 0 | 1 | … | 0 | 0 |

evidence “present” = active feature
data matrix is very high-dimensional and sparse
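As a rough illustration of the data layout on this slide, the sketch below stores only the "active" features per user and rebuilds a dense 0/1 row on demand. The check-in log, the helper `dense_row`, and the four-place vocabulary are hypothetical and chosen only to mirror the three named rows of the table; real behavioral data would have many thousands of columns.

```python
# Hypothetical check-in log (user, place); values mirror the table rows for
# Anna, Jack and Bill, restricted to the four named places.
checkins = [
    ("Anna", "Columbia University"), ("Anna", "Times Square"), ("Anna", "DUMBO"),
    ("Jack", "Columbia University"), ("Jack", "Chelsea Market"),
    ("Bill", "DUMBO"),
]

# Sparse representation: keep only the active features of each user.
active = {}
for user, place in checkins:
    active.setdefault(user, set()).add(place)

# Column order of the (conceptual) data matrix.
places = sorted({place for _, place in checkins})


def dense_row(user):
    """Return the binary feature vector of one user in column order."""
    return [1 if place in active[user] else 0 for place in places]


# Fraction of non-zero cells; for real behavioral data this is typically tiny.
density = sum(len(s) for s in active.values()) / (len(active) * len(places))
```

Storing the active sets rather than the full matrix is the usual design choice for such data, since almost all cells are zero.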
![Page 23: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/23.jpg)
LOCATION DATA NYC
“Black Box” model (?): thousands of coefficients, nonlinear techniques
$\hat{y} = 1$ if tourist, else $\hat{y} = 0$
(Local) interpretability issues → counterfactual explanations
![Page 24: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/24.jpg)
COUNTERFACTUAL EXPLANATIONS
● Instance-level
● Causality within the model
● Minimal set of features such that the predicted class changes when “removing” them (setting value to zero)
● Very intuitive and comprehensible contrastive nature: “Why X rather than not-X?” (Miller, 2017)
![Page 25: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/25.jpg)
![Page 26: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/26.jpg)
COUNTERFACTUAL EXPLANATIONS
Example: Tourist prediction using NYC location data
Anna visited 120 places last month
Anna was predicted as “tourist”
![Page 27: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/27.jpg)
WHY?
![Page 28: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/28.jpg)
IF Anna had not visited {Times Square, DUMBO},
THEN the predicted class changes from “tourist” to “NY citizen”
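The IF-THEN statement can be checked mechanically: remove the features in the explanation (set them to zero) and see whether the predicted class changes. The sketch below assumes a toy linear scorer; `WEIGHTS`, the place names "Local grocery" and "Office", and the rule "score > 0 means tourist" are all hypothetical and stand in for whatever model is being explained.

```python
# Hypothetical linear model: a positive total score means "tourist".
WEIGHTS = {
    "Times Square": 10,
    "DUMBO": 8,
    "Central Park": 1,
    "Local grocery": -6,
    "Office": -1,
}


def predict(active_places):
    """Predict the class from the set of active (visited) places."""
    score = sum(WEIGHTS.get(place, 0) for place in active_places)
    return "tourist" if score > 0 else "NY citizen"


def is_counterfactual(active_places, removed):
    """True if setting the `removed` features to zero changes the class."""
    return predict(active_places) != predict(active_places - removed)


# A toy version of Anna with five active features instead of 120.
anna = {"Times Square", "DUMBO", "Central Park", "Local grocery", "Office"}

single = is_counterfactual(anna, {"Times Square"})          # not enough
pair = is_counterfactual(anna, {"Times Square", "DUMBO"})   # class changes
```

In this toy setting no single place flips the prediction, while the pair {Times Square, DUMBO} does, which is what makes it a counterfactual explanation of size two.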
![Page 29: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/29.jpg)
COUNTERFACTUAL ALGORITHMS
![Page 30: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/30.jpg)
DESIDERATA
● Model-agnostic algorithm
● Find minimum-sized counterfactual explanation $E$ for a single model prediction of instance $\mathbf{x}$
![Page 31: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/31.jpg)
A minimum-sized explanation is:
● More actionable: e.g., “cloak” fewer online traces to get a desired outcome (not be targeted with ads of gay bars)
● More comprehensible (~cognitive limitations)
![Page 32: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/32.jpg)
FORMAL OBJECTIVE FUNCTION
● Original instance $\mathbf{x}$ vs perturbed instance $\mathbf{z}$
Example: NYC location data
[Table: rows $\mathbf{x}$, $\mathbf{z}_1$, $\mathbf{z}_2$; cell values not extracted]
$I$ forms a subset of the set of indices of the “active” features of $\mathbf{x}$
$E^* = \{\text{Times Square}, \text{DUMBO}\}$
$\mathbf{z}^* = \mathbf{z}_1$
![Page 33: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/33.jpg)
FORMAL OBJECTIVE FUNCTION
• Original instance $\mathbf{x}$ vs perturbed instance $\mathbf{z}$
• Find $\mathbf{z}^*$ (or $E^*$) that is as close as possible to $\mathbf{x}$ and has a different predicted class
[Equation not extracted; its annotated parts: cosine distance, predicted class change, only “active” features are perturbed]
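The slide's equation did not survive extraction. One way to write an objective that matches its three annotations is sketched below; the symbols $f$ (the model's predicted class) and $d_{\cos}$ are assumptions introduced here, and the exact notation on the original slide may differ.

$$
\mathbf{z}^* = \arg\min_{\mathbf{z}} \; d_{\cos}(\mathbf{x}, \mathbf{z})
\quad \text{subject to} \quad
f(\mathbf{z}) \neq f(\mathbf{x}), \qquad
z_i = \begin{cases} 0 & \text{if } i \in I \\ x_i & \text{otherwise} \end{cases}, \qquad
I \subseteq \{\, i : x_i = 1 \,\}
$$

with $d_{\cos}(\mathbf{x}, \mathbf{z}) = 1 - \dfrac{\mathbf{x} \cdot \mathbf{z}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{z} \rVert}$. The explanation $E^*$ is then the set of features indexed by the optimal $I$.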
![Page 34: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/34.jpg)
FORMAL OBJECTIVE FUNCTION
[Figure: original instance $\mathbf{x}$ and perturbed instances $\mathbf{z}_1, \dots, \mathbf{z}_6$ in regions labeled “tourist” and “NY citizen”; $d$ denotes cosine distance]
$\mathbf{z}^* = \mathbf{z}_1 = \mathbf{x} \setminus \{\text{Times Square}, \text{DUMBO}\}$
$E^* = \{\text{Times Square}, \text{DUMBO}\}$
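For binary data, the cosine distance between $\mathbf{x}$ and a perturbed copy depends only on how many active features were removed: with $m$ active features and $k$ removed, it equals $1 - \sqrt{(m-k)/m}$. Minimizing the distance is therefore the same as minimizing the explanation size. The sketch below checks this numerically; the vocabulary size of 1000 is a hypothetical value, while the 120 active features come from the Anna example.

```python
import math


def cosine_distance(x, z):
    """Cosine distance 1 - (x . z) / (|x| |z|) between two numeric vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_z = math.sqrt(sum(b * b for b in z))
    return 1.0 - dot / (norm_x * norm_z)


def remove_features(x, indices):
    """Return a copy of x with the given positions set to zero."""
    return [0 if i in indices else v for i, v in enumerate(x)]


m = 120                      # active features, as in the Anna example
x = [1] * m + [0] * 880      # hypothetical vocabulary of 1000 places

# Distance after removing k = 1..5 active features.
distances = [
    cosine_distance(x, remove_features(x, set(range(k)))) for k in range(1, 6)
]
closed_form = [1.0 - math.sqrt((m - k) / m) for k in range(1, 6)]
```

The distances grow strictly with `k`, so the closest class-changing instance is also the one with the fewest removed features.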
![Page 35: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/35.jpg)
WHY COMPLETE SEARCH FAILS
● Start with removing one feature and increase the number of features in the subset until the predicted class changes
● Scales exponentially with the number of active features $m$ and the required number of features $k$ to be removed, e.g., for an instance with $m$ active features, checking all combinations of $k$ features requires $\frac{m!}{(m-k)!\,k!}$ evaluations
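To get a feel for the growth, the binomial count can be evaluated directly. The sketch below uses `m = 120` from the Anna example; the helper `evaluations_up_to` and the chosen values of `k` are illustrative only.

```python
from math import comb


def evaluations_up_to(m, k_max):
    """Model evaluations a complete search needs to cover all subset sizes 1..k_max."""
    return sum(comb(m, k) for k in range(1, k_max + 1))


m = 120  # active features of the instance

# Number of subsets of exactly k features, for a few explanation sizes.
per_size = {k: comb(m, k) for k in (1, 2, 3, 5, 10)}

# Cumulative cost if the smallest counterfactual has size 2 or 3.
cost_size_2 = evaluations_up_to(m, 2)
cost_size_3 = evaluations_up_to(m, 3)
```

Already at size 3 a single instance requires hundreds of thousands of model evaluations, which motivates heuristic search.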
![Page 36: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/36.jpg)
BEST-FIRST SEARCH (SEDC)
● Explaining document classifications (Martens & Provost, 2013)
● Model-agnostic algorithm SEDC: heuristic best-first search
● Optimal for linear models
Implementation on https://github.com/yramon/edc
![Page 37: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/37.jpg)
BEST-FIRST SEARCH (SEDC)
[Flowchart]
1. Check “active” features. Class change? Yes: counterfactual explanation found. No: go to step 2.
2. Expand best-first feature (set) with one extra feature. Class change? Yes: counterfactual explanation found. No: repeat step 2.
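The flowchart can be sketched as a small best-first search. This is a simplified illustration under stated assumptions, not the authors' implementation (see the repository linked on the previous slide for that): the function name, the `score` callback, the threshold convention, and the toy weights are hypothetical. Only the 50-iteration cap is taken from the experimental setup.

```python
def best_first_counterfactual(active, score, threshold=0.0, max_iter=50):
    """Search for a small feature set whose removal pushes the score to or
    below `threshold`. Returns the set, or None if none is found in time."""
    active = frozenset(active)
    scores = {}       # removed feature set -> model score after removal
    expanded = set()  # feature sets that have already been expanded

    def evaluate(removed):
        s = score(active - removed)
        scores[removed] = s
        return s <= threshold  # True means the predicted class changed

    # Step 1: check every single active feature.
    for feature in sorted(active):
        if evaluate(frozenset([feature])):
            return {feature}

    # Step 2: repeatedly expand the set with the largest score drop so far.
    for _ in range(max_iter):
        frontier = [r for r in scores if r not in expanded]
        if not frontier:
            return None
        best = min(frontier, key=lambda r: (scores[r], sorted(r)))
        expanded.add(best)
        for feature in sorted(active - best):
            candidate = best | {feature}
            if candidate not in scores and evaluate(candidate):
                return set(candidate)
    return None


# Hypothetical linear scorer: a positive score means "tourist".
WEIGHTS = {"Times Square": 10, "DUMBO": 8, "Central Park": 1,
           "Local grocery": -6, "Office": -1}


def toy_score(active_places):
    return sum(WEIGHTS[p] for p in active_places)


explanation = best_first_counterfactual(set(WEIGHTS), toy_score)
```

Because the search only calls `score`, it does not depend on the model's internals, which is what makes the approach model-agnostic.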
![Page 38: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/38.jpg)
NOVEL HYBRID ALGORITHMS
Additive Feature Attribution (AFA) methods:
● LIME: Local Interpretable Model-agnostic Explanations (Ribeiro et al., 2016)
● SHAP: SHapley Additive exPlanations (Lundberg et al., 2018)
Output: importance-ranked list
![Page 39: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/39.jpg)
NOVEL HYBRID ALGORITHMS
LIME / SHAP example: Tourist prediction using NYC location data
0.211 Times Square
0.205 DUMBO
0.202 Central Park
0.197 Top of the Rock
0.192 MoMA
0.186 Fifth Avenue
0.183 Eataly
…
-0.185 Washington Square Park
![Page 40: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/40.jpg)
NOVEL HYBRID ALGORITHMS
Originality: importance rankings may be an “intelligent” starting point for efficiently computing counterfactuals
Novel algorithms: LIME-C and SHAP-C
![Page 41: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/41.jpg)
NOVEL HYBRID ALGORITHMS
LIME-C / SHAP-C example: Tourist prediction using NYC location data
Remove features with positive importance weight, in order of decreasing importance, until the class changes
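The removal rule on this slide can be sketched as a loop over the importance ranking. The importance weights below are the ones shown in the LIME / SHAP example; everything else is an assumption for illustration: the function name, and in particular the stand-in model `toy_predict`, which simply sums the same weights and compares them with a hypothetical threshold of 0.8. In practice the ranking would come from LIME or SHAP and `predict` would be the model under study.

```python
# Importance weights from the LIME / SHAP example slide.
IMPORTANCES = {
    "Times Square": 0.211,
    "DUMBO": 0.205,
    "Central Park": 0.202,
    "Top of the Rock": 0.197,
    "MoMA": 0.192,
    "Fifth Avenue": 0.186,
    "Eataly": 0.183,
    "Washington Square Park": -0.185,
}


def counterfactual_from_ranking(active, importances, predict):
    """Remove positively weighted features, most important first, until the
    predicted class changes. Returns the removed features, or None."""
    original_class = predict(active)
    ranked = sorted(
        (f for f in active if importances.get(f, 0.0) > 0.0),
        key=lambda f: importances[f],
        reverse=True,
    )
    removed = []
    for feature in ranked:
        removed.append(feature)
        if predict(active - set(removed)) != original_class:
            return removed
    return None


def toy_predict(active_places):
    """Hypothetical stand-in model: 'tourist' if summed weights exceed 0.8."""
    score = sum(IMPORTANCES[p] for p in active_places)
    return "tourist" if score > 0.8 else "NY citizen"


explanation = counterfactual_from_ranking(set(IMPORTANCES), IMPORTANCES, toy_predict)
```

Each step costs one model evaluation, so an explanation of size `k` needs only `k` evaluations once the ranking is available; the ranking itself is where the sampling budget goes.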
![Page 42: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/42.jpg)
EXPERIMENTAL SETUP
![Page 43: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/43.jpg)
Collect data sets and build models
• Textual data: linear/rbf SVM
• Behavioral data: LR/MLP
![Page 44: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/44.jpg)
Generate explanations for test instances
• Algorithms: SEDC, LIME-C, SHAP-C
• Positively-predicted test instances
• Limits: max. 2 minutes, max. 30 features
• SEDC: max. 50 iterations
• LIME-C/SHAP-C: 5000 samples
![Page 45: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/45.jpg)
EVALUATION CRITERIA
The goal is to find the minimum-sized counterfactual as fast as possible, which implies a tradeoff between:
• Effectiveness
  - Percentage explained
  - Switching point: number of features in the explanation
• Efficiency
  - Computation time in seconds
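The three criteria can be computed with a small harness around any explanation function. The sketch below is an assumption about how such a harness might look, not the evaluation code used in the study: `evaluate`, the result keys, and the stub `toy_explain` are hypothetical, and the 2-minute time limit from the setup is not enforced here.

```python
import time
from statistics import mean


def evaluate(explain, instances, max_features=30):
    """Summarize effectiveness and efficiency of `explain` over test instances.

    `explain(instance)` returns a collection of features, or None if no
    counterfactual was found.
    """
    sizes, seconds, n_explained = [], [], 0
    for instance in instances:
        start = time.perf_counter()
        explanation = explain(instance)
        seconds.append(time.perf_counter() - start)
        # Count an instance as explained only within the feature limit.
        if explanation is not None and len(explanation) <= max_features:
            n_explained += 1
            sizes.append(len(explanation))
    return {
        "percentage_explained": 100.0 * n_explained / len(instances),
        "avg_switching_point": mean(sizes) if sizes else None,
        "avg_seconds": mean(seconds),
    }


def toy_explain(instance):
    """Hypothetical stub: 'explains' any instance with at least two features."""
    return sorted(instance)[:2] if len(instance) >= 2 else None


summary = evaluate(toy_explain, [{"a", "b", "c"}, {"a"}, {"b", "c"}])
```

Reporting the switching point only over explained instances keeps the two effectiveness measures separate, so a method cannot look better on size simply by failing on hard instances.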
![Page 46: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/46.jpg)
Evaluation
• Percentage explained
• Switching point
• Computation time
![Page 47: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/47.jpg)
RESULTS & CONCLUSION
![Page 48: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/48.jpg)
EFFECTIVENESS
![Page 49: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/49.jpg)
EFFECTIVENESS
![Page 50: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/50.jpg)
EFFECTIVENESS
![Page 51: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/51.jpg)
EFFECTIVENESS
![Page 52: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/52.jpg)
EFFECTIVENESS
![Page 53: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/53.jpg)
EFFECTIVENESS
![Page 54: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/54.jpg)
EFFECTIVENESS
![Page 55: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/55.jpg)
EFFECTIVENESS
![Page 56: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/56.jpg)
EFFICIENCY
![Page 57: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/57.jpg)
EFFICIENCY
![Page 58: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/58.jpg)
EFFICIENCY vs SWITCHING POINT
![Page 59: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/59.jpg)
EFFICIENCY
![Page 60: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/60.jpg)
![Page 61: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/61.jpg)
CONCLUSION
• SEDC: most efficient and effective for small data instances; however,
  - flaw in heuristic best-first search for some nonlinear models
• SHAP-C: overall good performance; however,
  - problems with highly unbalanced data
  - computation time more sensitive to # active features than LIME-C
  - relatively worse effectiveness/efficiency
• LIME-C: suitable alternative to SEDC because of good tradeoff
  - good effectiveness results for all data and models
  - low computation times
  - efficiency least sensitive to switching point
! Also addresses problem of setting complexity of LIME/SHAP explanation
![Page 62: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/62.jpg)
PROJECT 2
![Page 63: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/63.jpg)
IMPROVING THE COST OF EXPLAINABILITY FOR HIGH-DIMENSIONAL, SPARSE DATA USING METAFEATURES-BASED RULE-EXTRACTION
Yanou Ramon, David Martens, Theodoros Evgeniou, Stiene Praet
Submitted to Machine Learning (Special Issue on Feature Engineering)
![Page 64: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/64.jpg)
PROBLEM STATEMENT
![Page 65: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/65.jpg)
LOCATION DATA NYC
“Black Box” model: thousands of coefficients, nonlinear techniques
? $\hat{y} = 1$ if tourist, else $\hat{y} = 0$
(Global) comprehensibility issues → Rule-extraction
![Page 66: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/66.jpg)
RULE-EXTRACTION
• Train a comprehensible model (“white-box”) to mimic the predictions of a more complex, highly accurate “black-box” model
• Black-box model: all models on high-dimensional, sparse data
• Small decision trees and concise rule sets as “white-boxes”
• Black-box model predictions $y_{BB}$ are used as new labels instead of the true labels $y$
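The relabeling idea in the bullets above can be sketched as follows. This is a minimal illustration and not the authors' code: the synthetic sparse data, the random forest used as a stand-in black box, the tree depth of 3, and all variable names are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)

# Synthetic stand-in for high-dimensional, sparse binary data (assumption).
X = sparse_random(500, 200, density=0.05, format="csr", random_state=rng)
X.data[:] = 1.0  # turn the nonzero entries into binary "active feature" flags

# Hypothetical true labels: positive if any of the first five features is active.
y = (np.asarray(X[:, :5].sum(axis=1)).ravel() > 0).astype(int)

# Stand-in black-box model, trained on the true labels y.
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# The black-box predictions y_BB replace the true labels.
y_bb = black_box.predict(X)

# White-box: a small decision tree trained to mimic y_BB instead of y.
white_box = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Share of instances on which the white-box agrees with the black-box.
agreement = float(np.mean(white_box.predict(X) == y_bb))
print(f"agreement with black-box: {agreement:.3f}")
```

The only change relative to ordinary supervised learning is the target passed to the white-box model: it is fitted on `y_bb`, so it describes what the black-box does rather than what the true labels are.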
![Page 67: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/67.jpg)
RULE-EXTRACTION
LOCATION DATA NYC
[Figure: decision tree on fine-grained features with splits on “Eataly”=1 and “Chelsea Market”=1 (True/False branches) and leaves NY citizen, Tourist, NY citizen]
Explains global classification behaviour over entire instance/feature space
![Page 68: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/68.jpg)
![Page 69: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/69.jpg)
CHALLENGES FOR HIGH-DIMENSIONAL, SPARSE DATA
Existing research focuses on low-dimensional, dense data
Challenges
1. Complexity of extracted rules
2. Computational complexity
3. Fine-grained feature comprehensibility
It is questionable whether the original fine-grained (FG) features are the best representation to achieve high explanation quality. This motivates our approach to use “metafeatures”.
![Page 70: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/70.jpg)
METAFEATURES
Address sparsity of fine-grained features by mapping FG data onto a higher-level MF representation: $h(x): X_{FG} \to X_{MF} \subset \mathbb{R}^k$
Desired properties
1. Low dimensionality
2. High density
3. Faithfulness
4. Mutual exclusivity
5. Semantic comprehensibility
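As an illustration of the mapping from fine-grained features to metafeatures, the sketch below counts active venues per venue type. It is not taken from the talk: the venue-to-type dictionary is hypothetical (only “Eataly”, “Chelsea Market”, “Italian restaurants” and “Musicals” appear on the slides), and using counts as metafeature values is an assumption.

```python
from collections import Counter

# Hypothetical mapping from fine-grained features (venues) to metafeatures
# (venue types). Each venue belongs to exactly one type (mutual exclusivity).
VENUE_TYPE = {
    "Eataly": "Italian restaurants",
    "Hypothetical Trattoria": "Italian restaurants",
    "Chelsea Market": "Markets",
    "Hypothetical Theatre": "Musicals",
}
METAFEATURES = sorted(set(VENUE_TYPE.values()))  # fixed column order, k = 3


def h(active_venues):
    """Map a set of active fine-grained features to a dense count vector."""
    counts = Counter(VENUE_TYPE[v] for v in active_venues if v in VENUE_TYPE)
    return [counts.get(mf, 0) for mf in METAFEATURES]


x_fg = {"Eataly", "Hypothetical Trattoria", "Hypothetical Theatre"}
x_mf = h(x_fg)
print(dict(zip(METAFEATURES, x_mf)))
```

With thousands of venues and only a handful of venue types, the output vector is short and mostly nonzero, which is the low-dimensionality and high-density behaviour listed among the desired properties.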
![Page 71: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/71.jpg)
GENERATING METAFEATURES
Big Behavioral & Text Data → Metafeatures
• Social media data (e.g., Facebook “Likes”) → Categories of Facebook “Likes” (e.g., Humor, Music, Art)
• Transaction data → Spending categories (e.g., Gambling, Gift Shops)
• Location data → Regions/venue types (e.g., Concert halls, Sports venues)
• Textual data → Topics
• Movie viewing data → Movie genres
• Web browsing data → Words on a page/categories of URLs
Domain-based metafeatures vs data-driven metafeatures
![Page 72: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/72.jpg)
MAIN CLAIM
“Metafeatures” are more appropriate (↑ fidelity, ↑ stability) for extracting comprehensible rules from classifiers that are trained on high-dimensional, sparse data than the original fine-grained features
![Page 73: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/73.jpg)
RULE-EXTRACTION
LOCATION DATA NYC
[Figure: two decision trees with True/False branches and leaves NY citizen or Tourist.
Rule-extraction with fine-grained features: splits on “Eataly”=1 and “Chelsea Market”=1.
Rule-extraction with metafeatures (‘venue types’): splits on Italian restaurants >= 1 and Musicals >= 1.]
![Page 74: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/74.jpg)
PROPOSED METHODOLOGY
![Page 75: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/75.jpg)
PROPOSED METHODOLOGY
1. Build classification model $C_{BB}$ from labeled training data $\{X_{FG,train}, Y_{train}\}$
2. Predict labels $y_{BB}$ for all data instances (train, test, validation)
3. Generate metafeatures $X_{MF}$
4. Extract cognitively simple rules using $X_{MF}$ and $y_{BB}$
5. Evaluate the quality of explanation rules (fidelity, stability, accuracy)
![Page 76: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/76.jpg)
GENERATING METAFEATURES
• Domain-based approach: $X_{DomainMF}$
• Data-driven approach: $X_{DDMF}$, based on Non-negative Matrix Factorization
  - the parameter of $X_{DDMF}$ is $k$ (number of generated metafeatures)
  - $k \in [10, 1000]$
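A data-driven metafeature matrix of this kind can be sketched with scikit-learn's `NMF`. The synthetic binary input, the choice `k = 10`, the initialisation and iteration settings, and the variable names are assumptions for illustration; the slides do not specify the implementation used.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)

# Synthetic binary fine-grained matrix X_FG: 300 instances, 100 features,
# about 5% active entries (an assumption standing in for real behavioral data).
X_fg = (rng.rand(300, 100) < 0.05).astype(float)

k = 10  # number of metafeatures; the slides consider k in [10, 1000]
nmf = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)

# X_FG is approximated by W @ H with W >= 0 and H >= 0.
X_mf = nmf.fit_transform(X_fg)  # W: instances x metafeatures
loadings = nmf.components_      # H: metafeatures x fine-grained features

print(X_mf.shape, loadings.shape)
print("density FG:", np.mean(X_fg > 0), "density MF:", np.mean(X_mf > 0))
```

Each row of `loadings` shows which fine-grained features contribute to a metafeature, which is what one would inspect to give a data-driven metafeature a human-readable name.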
![Page 77: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/77.jpg)
COGNITIVELY SIMPLE RULE-EXTRACTION
• CART decision tree algorithm (Scikit-learn library in Python)
• Based on Gini impurity
• Max. tree depth of 5 (at most 2^5 = 32 rules), in line with cognitive simplicity arguments and cognitive load theory
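For readers unfamiliar with the split criterion, the sketch below computes Gini impurity and the impurity reduction of one binary split in plain Python, and shows where the bound of 32 rules comes from. The toy labels and function names are made up for the example; scikit-learn's CART implementation performs the equivalent computation internally.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Toy node: 4 tourists (1) and 4 NY citizens (0), split on a binary feature.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]

print(gini(parent))                         # 0.5
print(gini_reduction(parent, left, right))  # 0.125
print(2 ** 5)                               # max. number of leaves at depth 5: 32
```

At every node CART picks the feature and threshold with the largest impurity reduction. Each leaf of the final tree corresponds to one rule, so a binary tree of depth 5 yields at most 32 rules.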
![Page 78: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/78.jpg)
EVALUATION CRITERIA
• Fidelity: how well does the explanation model $C_{WB}$ (extracted rules) approximate the underlying model $C_{BB}$?
  (“cost of explainability”: 100% - fidelity is the loss in fidelity when replacing the black-box with an explanation model)
• Explanation stability: how stable is the explanation model over different training sessions with (slightly) different training sets?
• Accuracy: how well does the explanation model predict true labels $y$?
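The three criteria can be made concrete with a small sketch. Fidelity and accuracy follow directly from the definitions above. The stability function is an assumption, since the slides do not state how stability is measured: it uses mean pairwise prediction agreement between explanation models retrained on different training sets as one simple proxy. All toy predictions are hypothetical.

```python
from itertools import combinations


def agreement(a, b):
    """Fraction of positions where two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fidelity(y_wb, y_bb):
    """Share of black-box predictions reproduced by the explanation model."""
    return agreement(y_wb, y_bb)


def accuracy(y_wb, y_true):
    """Share of true labels predicted correctly by the explanation model."""
    return agreement(y_wb, y_true)


def stability_proxy(runs):
    """ASSUMPTION: mean pairwise prediction agreement across retrained models.

    The slides do not define the stability measure; this is only one proxy.
    """
    pairs = list(combinations(runs, 2))
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)


# Toy predictions on the same five test instances (all values hypothetical).
y_true = [1, 0, 1, 0, 0]
y_bb = [1, 0, 1, 1, 0]
y_wb = [1, 0, 0, 1, 0]
retrained = [y_wb, [1, 0, 1, 1, 0], [1, 0, 0, 1, 1]]

fid = fidelity(y_wb, y_bb)
print("fidelity:", fid)                     # 0.8
print("cost of explainability:", 1 - fid)
print("accuracy:", accuracy(y_wb, y_true))  # 0.6
print("stability proxy:", stability_proxy(retrained))
```

Fidelity is computed against the black-box predictions and accuracy against the true labels, so an explanation model can have high fidelity and still inherit the black-box's mistakes.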
![Page 79: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/79.jpg)
EXPERIMENTAL SETUP
![Page 80: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/80.jpg)
DATA
![Page 81: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/81.jpg)
PREDICTION MODELS
![Page 82: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/82.jpg)
RESULTS & CONCLUSION
![Page 83: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/83.jpg)
FIDELITY
![Page 84: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/84.jpg)
FIDELITY
Correlation between Gini impurity reduction ratio of best FG vs best MF and difference in fidelity: 0.929
![Page 85: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/85.jpg)
STABILITY - ACCURACY
![Page 86: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/86.jpg)
CONCLUSION
• Metafeatures-based rule-extraction leads to better tradeoffs:
  - Improved “cost of explainability”: small trees/rules that explain a large(r) percentage of black-box predictions
  - +5% fidelity, +15% stability, +5% accuracy
• Important tradeoff: increasing the complexity leads to increased fidelity but decreased stability
• Finetune $k$ (or any other parameter of explanation model $C_{WB}$) to get desired fidelity/stability tradeoff
![Page 87: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/87.jpg)
KEY TAKEAWAYS
![Page 88: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/88.jpg)
OVERVIEW OF PROJECTS
I. Deep Learning for Big, Sparse, Behavioral data
   De Cnudde et al., Big Data (2019)
II. Instance-level explanation algorithms on behavioural and textual data: a counterfactual-oriented comparison
   Ramon et al., Forthcoming in Advances in Data Analysis and Classification (2020)
III. Improving the cost of explainability for high-dimensional, sparse data using metafeatures-based rule-extraction
   Ramon et al., Submitted to Machine Learning (2020)
![Page 89: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/89.jpg)
Project II: SEDC is most effective/efficient for data with small instances; the LIME-C algorithm is a good alternative to the SEDC algorithm for large data instances
![Page 90: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/90.jpg)
Project III: Metafeatures-based rule-extraction improves a key “cost of explainability”: higher fidelity compared to rules using fine-grained features
![Page 91: Towards Explainable Prediction Models on High-dimensional ... · Behavioral & Textual data by Yanou Ramon supervisor: Prof. David Martens ... 1 19th of June, Online Research Seminar,](https://reader034.vdocument.in/reader034/viewer/2022052023/6039067849ec797f433c075c/html5/thumbnails/91.jpg)
Further questions?
Mail: [email protected]
/in/yanou-ramon
https://yramon.github.io/
www.applieddatamining.com
THANKS!