“To Explain or To Predict”
“To Know or To Act” (Pure Science vs. Engineering, 2004)
Using Target-Based Bayesian Nets for Suspects Monitoring (joint work with A. Gruber and S. Yanovski)
Irad Ben-Gal
Tel Aviv University
Tel Aviv University Department of Industrial Engineering
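DOE: Vs-optimal designs, Ginsburg & Ben-Gal (2004)
x (control) → f(x) → Y (output)
f(x) known: solve $\partial f(x)/\partial x = 0$ to obtain $x^*$
f(x) unknown:
Estimate g(x) (meta-model: DOE, RSM, …)
Solve $\partial g(x)/\partial x = 0$ to obtain $x^*$, which is now a random variable (R.V.)
‘Scientists’ (to know): best estimation of f(x), i.e., minimize the variance of the model estimate (e.g., D-optimal experiments)
‘Practitioners’ (to act): best estimation of x*, i.e., min V(x*) (new DOE optimality criterion)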
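To see why the estimated optimum is itself a random variable, the following minimal sketch refits a quadratic meta-model g(x) on noisy observations many times and records the resulting x* each time. The response surface `true_f`, the three-point design, the noise level and the number of replicates are all illustrative assumptions; this is not the Vs-optimality procedure of Ginsburg & Ben-Gal, only a simulation of the quantity V(x*) that the criterion targets.

```python
import random
import statistics

def true_f(x):
    # Hypothetical response surface with its optimum at x = 0.3.
    return (x - 0.3) ** 2

def estimated_optimum(rng, noise_sd=0.05, replicates=5):
    """Fit g(x) = b0 + b1*x + b2*x^2 on the design {-1, 0, 1}, then solve g'(x) = 0."""
    design = (-1.0, 0.0, 1.0)
    # Average the noisy replicates observed at each design point.
    y_minus, y_zero, y_plus = (
        statistics.mean(true_f(x) + rng.gauss(0.0, noise_sd) for _ in range(replicates))
        for x in design
    )
    # Closed-form least-squares coefficients for this saturated three-point design.
    b1 = (y_plus - y_minus) / 2.0
    b2 = (y_plus + y_minus) / 2.0 - y_zero
    return -b1 / (2.0 * b2)

rng = random.Random(0)
optima = [estimated_optimum(rng) for _ in range(2000)]
mean_x_star = statistics.mean(optima)
var_x_star = statistics.variance(optima)  # empirical V(x*)
print(f"mean x* = {mean_x_star:.3f}, V(x*) = {var_x_star:.6f}")
```

Changing the design points or the replicate count changes `var_x_star`, which is the kind of comparison a criterion based on min V(x*) would make between candidate designs.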
The Bias-Variance Tradeoff
Presentation Layout
Bayesian networks and classifiers
Targeted Bayesian Network Learning (TBNL) (with Gruber)
TBNL application on suspects monitoring
Summary
Bayesian Networks (Pearl, 85)
What is a Bayesian Network?
X1 X2 X3 X4 Prob.
1 1 1 2 0.083
1 1 2 2 0.167
1 2 2 3 0.25
2 2 1 1 0.25
2 2 2 1 0.25
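Joint Probability Distribution (JPD)
Factorization:
$P(\mathbf{X}) = P(X_2)\, P(X_3 \mid X_2)\, P(X_4 \mid X_3, X_2)\, P(X_1 \mid X_4, X_3, X_2)$
A complete Bayesian network $B(G, \Theta)$ encodes the domain’s JPD
$G = (V, E)$ = Directed Acyclic Graph
$\Theta_{X_3}$, the conditional probability table of X3 given X2:
        X2=1   X2=2
X3=1    0.33   0.33
X3=2    0.67   0.67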
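The conditional table for X3 can be recomputed from the five-row joint probability table above. The sketch below is only an illustration of that arithmetic; the function name `conditional_table` and the dictionary layout are assumptions made for the example.

```python
from collections import defaultdict

# Joint probability table from the slide: (X1, X2, X3, X4) -> probability.
jpd = {
    (1, 1, 1, 2): 0.083,
    (1, 1, 2, 2): 0.167,
    (1, 2, 2, 3): 0.25,
    (2, 2, 1, 1): 0.25,
    (2, 2, 2, 1): 0.25,
}

def conditional_table(jpd, child, parent):
    """Return P(child | parent) as {(child_value, parent_value): probability}."""
    joint = defaultdict(float)
    marginal = defaultdict(float)
    for assignment, prob in jpd.items():
        c, p = assignment[child], assignment[parent]
        joint[(c, p)] += prob
        marginal[p] += prob
    return {(c, p): prob / marginal[p] for (c, p), prob in joint.items()}

# Positions are 0-based: X2 is index 1 and X3 is index 2.
theta_x3 = conditional_table(jpd, child=2, parent=1)
for (x3, x2), prob in sorted(theta_x3.items()):
    print(f"P(X3={x3} | X2={x2}) = {prob:.2f}")
```

Both columns of the table come out the same, so in this toy distribution X3 carries no information about X2.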
Explain or Predict (classify)
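Tree / GBN: Chow & Liu (1968), Williamson (2000)
  True distribution: $p(\mathbf{X})$
  Modeled distribution: $q(\mathbf{X})$
  Objective (principle): minimize $D_{\mathrm{KL}}\big(p(\mathbf{X}) \,\|\, q(\mathbf{X})\big)$
  Consequence: maximize $\sum_i I(X_i; Z_i)$
TBNL: Gruber & Ben-Gal (2010)
  True distribution: $p(\mathbf{X})$
  Modeled distribution: $q(\mathbf{X})$
  Objective (principle): minimize $D_{\mathrm{KL}}\big(p(X_i) \,\|\, q(X_i)\big)$, with $p(X_i) = \sum_{x' \in \mathbf{X} \setminus X_i} p(X_i \mid x')\, p(x')$
  Consequence: maximize $I(X_i; Z_i)$, then maximize $\sum_{X_j \in Z_i} I(X_j; Z_j)$
Here $Z_i$ denotes the parent set of $X_i$ in the network.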
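The link between the principle (minimize a KL divergence) and the consequence (maximize mutual information) can be checked numerically in the smallest case: for one variable X and one candidate parent Z, the divergence between the true joint and a model with no arc between them equals I(X; Z). The joint distribution below is hypothetical, and the helper names are assumptions made for this sketch.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same outcomes."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

def marginal(joint, axis):
    """Marginal distribution of one coordinate of a joint keyed by tuples."""
    out = {}
    for outcome, prob in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + prob
    return out

def mutual_information(joint):
    """I(X; Z) in bits for a joint distribution keyed by (x, z)."""
    px, pz = marginal(joint, 0), marginal(joint, 1)
    return sum(
        pxz * math.log2(pxz / (px[x] * pz[z]))
        for (x, z), pxz in joint.items()
        if pxz > 0
    )

# Hypothetical joint distribution of X and a candidate parent Z.
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px, pz = marginal(p_joint, 0), marginal(p_joint, 1)
# Model without the arc Z -> X: the two variables are treated as independent.
q_no_arc = {(x, z): px[x] * pz[z] for x in px for z in pz}

info = mutual_information(p_joint)
loss = kl_divergence(p_joint, q_no_arc)
print(f"I(X;Z) = {info:.4f} bits, D_KL(p || q_no_arc) = {loss:.4f} bits")
```

The two printed quantities coincide, which is why adding the arc with the largest mutual information removes the largest share of the modeling error.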
Unconstrained Learning
Assume X3 is the target variable
[Figure: two panels, GBN (adding-arrows) and Target-Oriented (TBNL); step labels i=1, i=4, i=3, i=1, i=4]
Equivalent Encoding!!!
Constrained Learning
Assume X3 is the target variable
[Figure: two panels, GBN (adding-arrows) and Target-Oriented (TBNL); step labels i=1, i=4, i=1, i=4, i=3]
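As a rough illustration of target-oriented learning under a constraint, the sketch below greedily adds to the target the attribute that most increases the mutual information between the target and its parent set, and stops at a cap on the number of parents. This is not the TBNL algorithm from the talk: the cap `max_parents` is an assumed form of the constraint, and the data set is a hypothetical truth table.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(rows, target, parents):
    """Empirical I(X_target; Z) in bits, where Z is the tuple of parent columns."""
    n = len(rows)
    joint = Counter((r[target], tuple(r[p] for p in parents)) for r in rows)
    pt = Counter(r[target] for r in rows)
    pz = Counter(tuple(r[p] for p in parents) for r in rows)
    return sum(
        (c / n) * math.log2(c * n / (pt[t] * pz[z]))
        for (t, z), c in joint.items()
    )

def greedy_parents(rows, target, max_parents):
    """Add, one at a time, the attribute that most increases I(target; parents)."""
    candidates = [i for i in range(len(rows[0])) if i != target]
    parents, best = [], 0.0
    while candidates and len(parents) < max_parents:
        gains = {c: mutual_information(rows, target, parents + [c]) for c in candidates}
        choice = max(gains, key=gains.get)
        if gains[choice] <= best + 1e-12:
            break  # no remaining attribute adds information about the target
        parents.append(choice)
        best = gains[choice]
        candidates.remove(choice)
    return parents, best

# Hypothetical data: columns 0 and 1 are inputs, column 2 is irrelevant,
# and column 3 (the target) is the AND of columns 0 and 1.
rows = [(a, b, n, a & b) for a, b, n in product((0, 1), repeat=3)]
parents, info = greedy_parents(rows, target=3, max_parents=3)
print(sorted(parents), round(info, 4))
```

The search stops after two parents even though three are allowed, because the irrelevant column adds nothing about the target; lowering `max_parents` to 1 trades information for a simpler model.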
Differential Complexity
[Figure: differential complexity plot with axes 𝜂𝑟 and 𝜂𝑡; regions labeled “Predict (Classify)” and “Explain”]
𝜂𝑡 = maximum percentage of relative information exploited about the target
𝜂𝑟 = maximum percentage of relative information exploited about the remaining attributes
Results (1/2) Data Sets Properties and Testing Methods
Dataset        Attributes  Classes  Instances  Test method  Instances/Attributes ratio
australian     14          2        690        CV5          ~49
breast         9           2        683        CV5          ~76
chess          36          2        3196       holdout      ~89
cleve          11          2        196        CV5          ~18
corral         6           2        128        CV5          ~21
crx            15          2        653        CV5          ~44
german         20          2        1000       CV5          ~50
glass          9           7        214        CV5          ~24
Iris           5           3        150        CV5          ~30
lymphography   18          4        148        CV5          ~8
mofn-3-7-10    10          2        1324       holdout      ~132
vote           16          3        435        CV5          ~27
Naïve Bayes: Predict
[Figure: naïve Bayes network on the Corral dataset, with the Class node linked to the attributes A0, A1, B0, B1, Irrelevant and Correlated]
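A minimal naïve Bayes sketch on a Corral-like concept is given below. It assumes the commonly cited definition of the Corral target, Class = (A0 AND A1) OR (B0 AND B1), and trains on the noiseless 16-row truth table of the four relevant attributes only; the Irrelevant and Correlated attributes, and the real 128-instance data set, are left out. The function names and the Laplace smoothing constant are choices made for the example.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def train_naive_bayes(rows, labels, alpha=1.0):
    """Collect class counts and per-attribute value counts for each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts, alpha

def predict(model, row, n_values=2):
    """Return the class with the highest posterior under the independence assumption."""
    class_counts, value_counts, alpha = model
    total = sum(class_counts.values())

    def log_posterior(label):
        score = math.log(class_counts[label] / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace-smoothed estimate of P(attribute value | class).
            score += math.log((count + alpha) / (class_counts[label] + alpha * n_values))
        return score

    return max(class_counts, key=log_posterior)

# Assumed Corral concept over (A0, A1, B0, B1); noiseless truth table.
rows = list(product((0, 1), repeat=4))
labels = [int((a0 and a1) or (b0 and b1)) for a0, a1, b0, b1 in rows]

model = train_naive_bayes(rows, labels)
predictions = [predict(model, row) for row in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(rows)
print(f"training accuracy = {accuracy:.3f}")
```

Because each attribute is scored independently given the class, the model cannot represent the pairwise interactions inside the concept, so it misses some rows even on its own training table.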
Tree Augmented Network (TAN)
[Figure: several network structures on the Corral dataset, each over the Class node and the attributes A0, A1, B0, B1, Irrelevant and Correlated]
Managing the Trade-off
[Figure: three trade-off panels, evaluated with CV5, CV5 and Holdout (2/3:1/3)]
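The two testing schemes named on the slides, five-fold cross-validation (CV5) and a 2/3:1/3 holdout, can be sketched as index splits. The function names, the shuffling seed, and the reading of 2/3 as the training share are assumptions; the slides do not say how the splits were drawn.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def holdout_indices(n, train_fraction=2 / 3, seed=0):
    """Return (train, test) index lists for a single holdout split."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    cut = round(n * train_fraction)
    return order[:cut], order[cut:]

# Sizes taken from the data set table: Iris (150, CV5) and chess (3196, holdout).
iris_folds = list(k_fold_indices(150, 5))
chess_train, chess_test = holdout_indices(3196)
print(len(iris_folds), len(chess_train), len(chess_test))
```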
Results (2/2) Accuracy
Best & worst methods (incl. 5% runner up) in Bold & Italic respectively
Dataset        TBNL    BNC-2P  NB      TAN     C4.5    HGC
australian     83.3    87.0    85.1    82.5    84.9    85.6
breast         95.9    95.8    97.6    96.5    93.9    97.6
chess          96.9    95.8    87.3    92.4    99.5    95.3
cleve          81.4    80.0    82.1    78.4    79.4    78.7
corral         100.0   98.8    87.2    98.6    98.5    100.0
crx            86.4    84.2    85.0    83.7    86.1    86.9
german         69.7    73.6    75.4    73.9    72.9    72.5
glass          60.0    58.3    55.9    54.2    59.3    31.2
Iris           97.0    95.8    93.0    92.4    96.0    95.7
lymphography   81.8    83.7    83.4    82.2    78.4    63.8
mofn-3-7-10    100.0   91.4    86.7    91.5    84.0    86.7
vote           96.0    95.8    90.1    94.9    94.7    95.4
Average        87.4    86.7    84.1    85.1    85.6    82.4
StdE           4%      3%      3%      4%      3%      6%
Paired t-tests show significance
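A paired t statistic over the twelve data sets can be computed directly from the accuracy table. The sketch below pairs the TBNL and NB columns as one example; which pairs the authors actually tested, and at what significance level, is not stated on the slide, so no p-value or conclusion is claimed here.

```python
import math
import statistics

# Accuracy columns copied from the results table (same data set order).
tbnl = [83.3, 95.9, 96.9, 81.4, 100.0, 86.4, 69.7, 60.0, 97.0, 81.8, 100.0, 96.0]
nb = [85.1, 97.6, 87.3, 82.1, 87.2, 85.0, 75.4, 55.9, 93.0, 83.4, 86.7, 90.1]

def paired_t_statistic(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1

mean_diff, t_stat, dof = paired_t_statistic(tbnl, nb)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, dof = {dof}")
```

The resulting t value would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value.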
Presentation Layout
Bayesian networks and classifiers
Targeted Bayesian Network Learning (TBNL)
TBNL application on suspects monitoring (w. Gruber & Yanovski)
Summary
Domain Description
Motivation
  Simplicity: complexity-error tradeoff
  Information extraction: utilization of meta-data
  Support: help the expert understand
Available data
  CDR (call detail records), privatized and laundered
Requirements
  50% recall with at most 1% false alarms
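The requirement can be expressed as a simple check on a set of alerts. The sketch below reads it as "recall of at least 50% while the false alarm rate stays at or below 1%", which is an interpretation of the slide; the population sizes and counts are hypothetical.

```python
def recall_and_false_alarm(labels, flagged):
    """labels: 1 = target, 0 = non-target; flagged: True if an alert was raised."""
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    tn = sum(1 for y, f in zip(labels, flagged) if y == 0 and not f)
    return tp / (tp + fn), fp / (fp + tn)

def meets_requirement(labels, flagged, min_recall=0.50, max_false_alarm=0.01):
    """Check the assumed reading of the requirement on one set of alerts."""
    recall, false_alarm = recall_and_false_alarm(labels, flagged)
    return recall >= min_recall and false_alarm <= max_false_alarm

# Hypothetical population: 10 targets (6 flagged) and 1000 non-targets (8 flagged).
labels = [1] * 10 + [0] * 1000
flagged = [True] * 6 + [False] * 4 + [True] * 8 + [False] * 992
recall, false_alarm = recall_and_false_alarm(labels, flagged)
print(recall, false_alarm, meets_requirement(labels, flagged))
```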
Data Description of the Domain
Call Detail Record (CDR)
Field         Description
Main party    Unique identifier of the monitored object
Other party   Unique identifier of the other party
year          Year of call start
month         Month of call start
day           Day of call start
hour          Hour of call start
minute        Minute of call start
second        Second of call start
duration      Call duration in seconds
caller        Call initiator {1/0}: 1 = main party initiated the call, 0 = other party initiated the call
type_id       Type of interaction {1/0}: 1 = phone call, 0 = SMS (text message)
tag           Group of the monitored object {1/0}: 1 = main party is a target, 0 = main party is a non-target
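One way to hold a record with these fields in code is a small dataclass. The class name, the Python types, the underscored field names `main_party` and `other_party`, and the sample identifiers are assumptions for illustration; the slides give only the field list and the 1/0 encodings.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CallDetailRecord:
    main_party: str   # unique identifier of the monitored object
    other_party: str  # unique identifier of the other party
    year: int
    month: int
    day: int
    hour: int
    minute: int
    second: int
    duration: int     # call duration in seconds
    caller: int       # 1 = main party initiated, 0 = other party initiated
    type_id: int      # 1 = phone call, 0 = SMS
    tag: int          # 1 = main party is a target, 0 = non-target

    @property
    def start(self) -> datetime:
        """Call start time assembled from the six date and time fields."""
        return datetime(self.year, self.month, self.day,
                        self.hour, self.minute, self.second)

    @property
    def is_outgoing(self) -> bool:
        return self.caller == 1

# Hypothetical record with made-up identifiers.
record = CallDetailRecord("A17", "B42", 2012, 3, 5, 14, 30, 0, 95, 1, 1, 0)
print(record.start.isoformat(), record.is_outgoing)
```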
ROC curve
[Figure: ROC curve, annotated “1900 missed targets” and “40 suspects to no avail”]
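An ROC curve is traced by sweeping a threshold over the classifier scores and recording the false alarm rate and recall at each step. The sketch below is a generic construction with hypothetical labels and scores; it does not reproduce the curve shown in the talk.

```python
def roc_points(labels, scores):
    """Return (false alarm rate, recall) pairs, one per distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item that shares this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical labels (1 = target) and suspicion scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points, round(auc, 4))
```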
Feature Extraction
Inter_prc_q1, Inter_prc_q2, Inter_prc_q3, Inter_prc_q4: percentage of activities in the 1st, 2nd, 3rd and 4th quarter of the day
[Figure: activity of calls during the day for two distinct groups]
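The four Inter_prc features can be derived from the `hour` field of the call records. The sketch below assumes the quarters are the six-hour blocks 00:00-05:59, 06:00-11:59, 12:00-17:59 and 18:00-23:59; the slides do not define the boundaries, and the function name is made up for the example.

```python
def quarter_percentages(hours):
    """Percentage of activities in each six-hour quarter of the day."""
    counts = [0, 0, 0, 0]
    for hour in hours:
        counts[hour // 6] += 1  # hours 0-5 -> q1, 6-11 -> q2, 12-17 -> q3, 18-23 -> q4
    total = len(hours)
    return {
        f"Inter_prc_q{i + 1}": (100.0 * count / total if total else 0.0)
        for i, count in enumerate(counts)
    }

# Hypothetical call start hours for one monitored object.
features = quarter_percentages([1, 7, 8, 13, 14, 15, 20, 23])
print(features)
```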
Learning & Mining Mobility Patterns (PIs: Ben-Gal, Toch and Lerner, 2012)
Conclusions
“To Explain or To Predict”, “To Know or To Act” (constraint modeling)
Managing the error-complexity tradeoff!
An “engineering approach” to modeling
Target-based BN Learning (2006), Gruber and Ben-Gal (2010)…
Vs-optimality criterion min V(x*), Ginsburg and Ben-Gal (2006)
VOBN, Ben-Gal et al. (2005): scenario dependent
More….
Prediction can help…