Trade-offs in Explanatory Model Learning
DCAP MeetingMadalina Fiterau
22nd of February 2012
Outline•Motivation: need for interpretable models
•Overview of data analysis tools
•Model evaluation – accuracy vs complexity
•Model evaluation – understandability
•Example applications
•Summary
Example Application: Nuclear Threat Detection
•Border control: vehicles are scanned
•Human in the loop interpreting results
(Feedback loop: vehicle scan → prediction → feedback)
Boosted Decision Stumps
•Accurate, but hard to interpret
How is the prediction derived from the input?
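A boosted-stump ensemble answers this question badly: the label is the sign of a weighted sum of many one-feature threshold votes, so no single rule explains the outcome. A minimal hypothetical sketch (the feature indices, thresholds, and weights below are invented for illustration):

```python
# Hypothetical sketch: how a boosted-stump ensemble turns an input into a
# prediction -- a weighted vote over many one-feature threshold rules.

def stump(feature_idx, threshold, sign):
    """One decision stump: votes +1 or -1 from a single feature threshold."""
    return lambda x: sign * (1 if x[feature_idx] > threshold else -1)

# An ensemble is a list of (weight, stump) pairs, as produced by boosting.
ensemble = [
    (0.9, stump(0, 0.5, +1)),
    (0.6, stump(2, 1.2, -1)),
    (0.4, stump(1, 0.1, +1)),
]

def predict(x):
    # The final label is the sign of the weighted sum of all stump votes;
    # the prediction is spread across every rule, hence hard to interpret.
    score = sum(w * h(x) for w, h in ensemble)
    return 1 if score > 0 else -1

print(predict([0.7, 0.0, 2.0]))  # → -1
```

Even with only three stumps, tracing why the answer is -1 requires summing every weighted vote; real boosted models use hundreds.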
Decision Tree – More Interpretable
•Radiation > x%? no → Clear; yes ↓
•Payload type = ceramics? yes ↓
•Uranium level > max. admissible for ceramics? (consider the balance of Th232, Ra226 and Co60) yes → Threat; no → Clear
Motivation
Many users are willing to trade accuracy to better understand the results the system yields.
•Need: a simple, interpretable model
•Need: an explanatory prediction process
Analysis Tools – Black-box
•Random Forests – a very accurate tree ensemble (L. Breiman, 'Random Forests', 2001)
•Boosting – guarantee: decreases the training error (R. Schapire, 'The boosting approach to machine learning')
•Multi-boosting – bagged boosting (G. Webb, 'MultiBoosting: A Technique for Combining Boosting and Wagging')
Analysis Tools – White-box
•CART – decision tree based on the Gini impurity criterion
•Feating – decision tree with leaf classifiers (K. Ting, G. Webb, 'FaSS: Ensembles for Stable Learners')
•Subspacing – an ensemble in which each discriminator is trained on a random subset of features (R. Bryll, 'Attribute bagging')
•EOP – builds a decision list that selects the classifier to deal with a query point
Explanation-Oriented Partitioning
[Figure: (X,Y) scatter plot of the synthetic data – 2 Gaussians and a uniform cube]
EOP Execution Example – 3D data
•Step 1: Select a projection – (X1,X2)
•Step 2: Choose a good classifier – call it h1
•Step 3: Estimate the accuracy of h1 at each point (OK / NOT OK)
•Step 4: Identify high-accuracy regions
•Step 5: Remove the covered training points from consideration
•After the first iteration, repeat the steps on the remaining data (second iteration, and so on)
•Iterate until all data is accounted for or the error cannot be decreased
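The steps above can be sketched in a few lines of Python. This is a hypothetical simplification, not the authors' implementation: projections here are single features rather than 2-D pairs, the base classifier is a single threshold, and the "high-accuracy region" is simply the set of points the chosen classifier gets right.

```python
# Minimal sketch of the EOP loop (simplified, hypothetical details).

def fit_threshold(points, labels, dim):
    """Toy base classifier: best single threshold on feature `dim`."""
    best = None
    for t in sorted({p[dim] for p in points}):
        for sign in (+1, -1):
            preds = [sign if p[dim] > t else -sign for p in points]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    return best  # (training accuracy, threshold, sign)

def eop(points, labels, n_features, min_acc=0.9, max_iters=5):
    decision_list = []
    remaining = list(zip(points, labels))
    for _ in range(max_iters):
        if not remaining:
            break  # all data is accounted for
        pts = [p for p, _ in remaining]
        ys = [y for _, y in remaining]
        # Steps 1-2: pick the projection whose classifier does best.
        (acc, t, sign), dim = max(
            ((fit_threshold(pts, ys, d), d) for d in range(n_features)),
            key=lambda c: c[0][0],
        )
        if acc < min_acc:
            break  # the error cannot be decreased further
        decision_list.append((dim, t, sign))
        # Steps 3-5: set aside the points this classifier handles correctly.
        remaining = [
            (p, y) for p, y in remaining
            if (sign if p[dim] > t else -sign) != y
        ]
    return decision_list

print(eop([[0], [1], [2], [3]], [-1, -1, 1, 1], n_features=1))  # → [(0, 1, 1)]
```

On this tiny 1-D example a single threshold covers everything, so one iteration suffices; the real algorithm carves sub-regions of each projection instead of claiming every correctly classified point.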
Learned Model – Processing a query [x1 x2 x3]
•[x1x2] in R1? yes → h1(x1x2); no ↓
•[x2x3] in R2? yes → h2(x2x3); no ↓
•[x1x3] in R3? yes → h3(x1x3); no → Default Value
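The query-processing decision list above can be sketched directly: each entry pairs a low-dimensional region test with the classifier to use inside it, and a query falls through to a default value if no region claims it. The region boxes and classifiers below are invented placeholders.

```python
# Hypothetical sketch of the learned decision-list structure.

def in_box(values, box):
    """Axis-aligned region test: every coordinate inside its interval."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(values, box))

def make_decision_list(default):
    # Each rule: (feature indices, region box, classifier for that region).
    rules = []
    def add(dims, box, clf):
        rules.append((dims, box, clf))
    def predict(x):
        for dims, box, clf in rules:
            proj = [x[d] for d in dims]   # project the query point
            if in_box(proj, box):         # e.g. "[x1x2] in R1?"
                return clf(proj)          # answer with the local h_i
        return default                    # fell through every region
    return add, predict

add, predict = make_decision_list(default=0)
add((0, 1), [(0, 1), (0, 1)], lambda p: 1)    # R1 on (x1, x2) -> h1
add((1, 2), [(2, 3), (2, 3)], lambda p: -1)   # R2 on (x2, x3) -> h2

print(predict([0.5, 0.5, 9.0]))   # → 1   (matches R1)
print(predict([9.0, 2.5, 2.5]))   # → -1  (falls through to R2)
print(predict([9.0, 9.0, 9.0]))   # → 0   (default value)
```

Note the explanation comes for free: the first matching region names exactly which two features and which local classifier produced the answer.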
Parametric / Nonparametric Regions
•Bounding Polyhedra: enclose points in convex shapes (hyper-rectangles / spheres). Easy to test inclusion; visually appealing; inflexible.
•Nearest-neighbor Score: consider the k nearest neighbors; Region: { X | Score(X) > t }, where t is a learned threshold. Easy to test inclusion; can look insular; deals with irregularities.
[Figure: a query point and its neighbors n1…n5, marked correctly / incorrectly classified, informing the decision]
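The nonparametric variant can be sketched as follows, assuming (hypothetically) that the score is the fraction of the query's k nearest training points that the base classifier got right; the training points, correctness flags, and threshold below are made up.

```python
# Hypothetical sketch of the nearest-neighbor score region test.

def knn_score(query, train, correct, k=3):
    """Fraction of the k nearest training points classified correctly."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(range(len(train)), key=lambda i: dist(query, train[i]))[:k]
    return sum(correct[i] for i in nearest) / k

def in_region(query, train, correct, t=0.6, k=3):
    # Region: { X | Score(X) > t }, with t a learned threshold.
    return knn_score(query, train, correct, k) > t

train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
correct = [1, 1, 1, 0, 0]  # did the base classifier get each point right?
print(in_region((0.2, 0.2), train, correct))  # → True  (well-classified area)
print(in_region((5.1, 5.4), train, correct))  # → False (misclassified area)
```

This illustrates the trade-off on the slide: inclusion is still cheap to test, but the region's shape follows the data's irregularities rather than a convex outline.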
EOP in context – similarities and differences:
•CART – similarity: decision structure; difference: default classifiers in leaves
•Feating – similarity: local models; difference: models trained on all features
•Subspacing – similarity: low-d projection; difference: keeps all data points
•Boosting / Multiboosting – similarity: committee decision, gradually deals with difficult data; difference: ensemble learner
Overview of datasets
•Real-valued features, binary output
•Artificial data – 10 features
▫Low-d Gaussians / uniform cubes
•UCI repository
•Application-related datasets
•Results by k-fold cross-validation:
▫Accuracy
▫Complexity = the expected number of vector operations performed for a classification task
▫Understandability = w.r.t. acknowledged metrics
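The complexity measure above can be sketched for a binary decision tree: the expected number of vector operations is the average root-to-leaf path length, weighted by the fraction of data reaching each leaf. The toy tree and its weights below are invented for illustration.

```python
# Hypothetical sketch of the complexity metric (expected vector operations).

def expected_ops(node, depth=0):
    """node = ('leaf', fraction_of_data) or ('split', left, right)."""
    if node[0] == 'leaf':
        return node[1] * depth  # each level costs one test on the vector
    _, left, right = node
    return expected_ops(left, depth + 1) + expected_ops(right, depth + 1)

# A tree where 50% of the data exits after one test, the rest after two.
tree = ('split',
        ('leaf', 0.5),
        ('split', ('leaf', 0.25), ('leaf', 0.25)))
print(expected_ops(tree))  # → 1.5  (0.5*1 + 0.25*2 + 0.25*2)
```

For a decision list the same idea applies with region tests in place of splits, which is why a short list over low-d projections scores as low complexity.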
EOP vs AdaBoost – SVM base classifiers
•EOP is often less accurate, but not significantly so (mean difference in accuracy: 0.5%; p-value of the paired t-test: 0.832)
•The reduction in complexity is statistically significant (mean difference in complexity: 85; p-value of the paired t-test: 0.003)
[Charts: per-dataset Accuracy (0.84–1) and Complexity (0–250) for Boosting vs EOP (nonparametric)]
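The paired t-test behind those p-values compares the two methods fold by fold on the same data. A minimal sketch of the standard statistic (not the authors' code; the per-fold accuracies below are invented):

```python
# Standard paired t-test statistic, hand-rolled for illustration.
import math

def paired_t(xs, ys):
    d = [a - b for a, b in zip(xs, ys)]        # per-fold differences
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)           # t with n-1 degrees of freedom

# Invented per-fold accuracies for two methods on the same folds.
boosting_acc = [0.96, 0.94, 0.97, 0.95, 0.93]
eop_acc      = [0.95, 0.94, 0.96, 0.95, 0.92]
print(round(paired_t(boosting_acc, eop_acc), 3))
```

Pairing by fold removes the between-fold variance, which is why the complexity difference can be significant (p = 0.003) even when the accuracy difference is not (p = 0.832).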
EOP (stumps as base classifiers) vs CART on data from the UCI repository

Dataset        # of Features  # of Points
Breast Tissue  10             1006
Vowel          9              990
MiniBOONE      10             5000
Breast Cancer  10             596

•CART is the most accurate
•Parametric EOP yields the simplest models
[Charts: Accuracy (0–1) and Complexity (0–35) for CART, EOP N. and EOP P. on BCW, MB, V, BT]
Why are EOP models less complex? A typical XOR dataset:
•CART is accurate, but takes many iterations and does not uncover or leverage the structure of the data
•EOP is equally accurate and uncovers the structure, separating the XOR quadrants in two iterations
[Figure: XOR data with the regions found in iterations 1 and 2]
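The XOR contrast can be checked numerically: no single axis-aligned split improves on chance, yet a rule built from the quadrant structure classifies perfectly. A toy demonstration with hypothetical data:

```python
# Toy XOR illustration: one split is useless, two regions are perfect.
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

def best_single_split_accuracy():
    # Any one threshold on one feature leaves half the points misclassified.
    best = 0.0
    for dim in (0, 1):
        for sign in (+1, -1):
            preds = [sign if x[dim] > 0.5 else -sign for x, _ in data]
            acc = sum(p == y for p, (_, y) in zip(preds, data)) / len(data)
            best = max(best, acc)
    return best

def region_predict(x):
    # The off-diagonal quadrants (the XOR structure) form the positive region.
    in_pos = (x[0] <= 0.5) != (x[1] <= 0.5)
    return 1 if in_pos else -1

print(best_single_split_accuracy())                  # → 0.5
print(all(region_predict(x) == y for x, y in data))  # → True
```

This is exactly why a greedy top-down tree needs several levels while region-based partitioning captures the structure directly.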
Error Variation With Model Complexity for EOP and CART
•At low complexities, EOP is typically more accurate
[Chart: EOP_Error − CART_Error (−2 to 0.5) vs depth of decision tree/list (1–8) for BCW, BT, MB and Vowel]
UCI data – Accuracy
•White-box models – including EOP – do not lose much
[Chart: accuracy (0–1) on BCW, MB, BT and Vowel for R-EOP, N-EOP, CART, Feating, Sub-spacing, Multiboosting and Random Forests]
UCI data – Model complexity
•White-box models – especially EOP – are less complex
•The complexity of Random Forests is huge – thousands of nodes
[Chart: complexity (0–70) on BCW, MB, BT and Vowel for R-EOP, N-EOP, CART, Feating, Sub-spacing and Multiboosting]
Robustness
•Accuracy-targeting EOP (AT-EOP)
▫identifies which portions of the data can be confidently classified at a given rate
▫is allowed to set aside the noisy part of the data
[Chart: accuracy of AT-EOP on 3D data vs the maximum allowed expected error]
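The accuracy-targeting idea can be sketched as a filter: given a maximum allowed expected error, keep only the regions whose estimated error clears it and set the rest of the data aside as not confidently classifiable. The region names, coverage fractions, and error estimates below are invented.

```python
# Hypothetical sketch of accuracy-targeting EOP region filtering.

def at_eop_filter(regions, max_error):
    """regions: list of (name, fraction_of_data, estimated_error)."""
    kept = [r for r in regions if r[2] <= max_error]
    covered = sum(f for _, f, _ in kept)
    # Expected error over the covered portion only; the rest is set aside.
    err = (sum(f * e for _, f, e in kept) / covered) if covered else 0.0
    return kept, covered, err

regions = [('R1', 0.5, 0.02), ('R2', 0.3, 0.08), ('R3', 0.2, 0.30)]
kept, covered, err = at_eop_filter(regions, max_error=0.10)
# R3 is too noisy for the target, so 20% of the data is set aside.
print([name for name, _, _ in kept], round(covered, 4), round(err, 4))
```

Tightening `max_error` trades coverage for confidence, which is the shape of the AT-EOP accuracy curve on the slide.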
Metrics of Explainability*
•Lift
•Bayes Factor
•J-Score
•Normalized Mutual Information

* L. Geng and H. J. Hamilton – 'Interestingness measures for data mining: A survey'
Evaluation with usefulness metrics
•For 3 out of 4 metrics, EOP beats CART (higher values are better)

        CART                          EOP
        BF     L      J      NMI      BF     L      J      NMI
MB      1.982  0.004  0.389  0.040    1.889  0.007  0.201  0.502
BCW     1.057  0.007  0.004  0.011    2.204  0.069  0.150  0.635
BT      0.000  0.009  0.210  0.000    Inf    0.021  0.088  0.643
V       Inf    0.020  0.210  0.010    2.166  0.040  0.177  0.383
Mean    1.520  0.010  0.203  0.015    2.047  0.034  0.154  0.541

BF = Bayes Factor. L = Lift. J = J-score. NMI = Normalized Mutual Information.
Spam Detection (UCI 'SPAMBASE')
•10 features: frequencies of misc. words in e-mails
•Output: spam or not
[Charts: Accuracy (0.65–0.9) and Complexity (splits, 0–100) for Multiboosting, Sub-spacing, Feating, CART, N-EOP, R-EOP and Adaboost]
Spam Detection – Iteration 1
▫the classifier labels everything as spam
▫the high-confidence regions do enclose mostly spam, where: the incidence of the word 'your' is low, and the length of text in capital letters is high
Spam Detection – Iteration 2
▫the required incidence of capitals is increased
▫the square region on the left also encloses examples that will be marked as 'not spam'
Spam Detection – Iteration 3
▫the classifier marks everything as spam
▫the frequencies of 'our' and 'hi' (word_frequency_hi) determine the regions
Example Applications
•Stem Cells: explain relationships between treatment and cell evolution
•ICU Medication: identify combinations of drugs that correlate with readmissions
•Fuel Consumption: identify causes of excess fuel consumption among driving behaviors
•Nuclear Threats: support the interpretation of radioactivity detected in cargo containers
Challenges across these applications: sparse features, class imbalance, high-d data, train/test sets from different distributions, and an adapted regression problem.
Effects of Cell Treatment
•Monitored a population of cells
•7 features: cycle time, area, perimeter, ...
•Task: determine which cells were treated
[Charts: Accuracy (0.7–0.8) and Complexity (splits, 0–25) for Multiboosting, Sub-spacing, Feating, CART, N-EOP, R-EOP and Adaboost]
Mimic Medication Data
•Information about administered medication
•Features: dosage for each drug
•Task: predict patient return to the ICU
[Charts: Accuracy (0.9915–0.9945) and Complexity (splits, 0–25) for Multiboosting, Sub-spacing, Feating, CART, N-EOP, R-EOP and Adaboost]
Predicting Fuel Consumption
•10 features: vehicle and driving-style characteristics
•Output: fuel consumption level (high/low)
[Charts: Accuracy (0–0.8) and Complexity (splits, 0–25) for Multiboosting, Sub-spacing, Feating, CART, N-EOP, R-EOP and Adaboost]
Nuclear threat detection data – 325 features
•Random Forests accuracy: 0.94
•Rectangular EOP accuracy: 0.881 … but:
Regions found in the 1st iteration for Fold 0:
▫incident.riidFeatures.SNR in [2.90, 9.2]
▫incident.riidFeatures.gammaDose in [0, 1.86]·10^−8
Regions found in the 1st iteration for Fold 1:
▫incident.rpmFeatures.gamma.sigma in [2.5, 17.381]
▫incident.rpmFeatures.gammaStatistics.skewdose in [1.31, …]
No match between the folds.
Summary
•In most cases EOP:
▫maintains accuracy
▫reduces complexity
▫identifies useful aspects of the data
•EOP typically wins in terms of expressiveness
•Open questions:
▫What if no good low-dimensional projections exist?
▫What to do with inconsistent models across folds of CV?
▫What is the best metric of explainability?