ben gal

23
“To Explain or To Predict” “To Know or To Act” (Pure Science vs. Engineering, 2004) Using Target-Based Bayesian Nets for Suspects Monitoring (joint work with A. Gruber and S. Yanovski) Irad Ben-Gal Tel Aviv University

Upload: yairgo11

Post on 24-May-2015

Page 1: Ben Gal

“To Explain or To Predict”

“To Know or To Act” (Pure Science vs. Engineering, 2004)

Using Target-Based Bayesian Nets for Suspects Monitoring (joint work with A. Gruber and S. Yanovski)

Irad Ben-Gal

Tel Aviv University

Page 2: Ben Gal

Tel Aviv University Department of Industrial Engineering

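DOE: Vs-optimal designs, Ginsburg & Ben-Gal (2004)

System: x (control) → f(x) → Y (output)

f(x) known: solve $\partial f(x)/\partial x = 0 \Rightarrow x^*$

f(x) unknown:

Estimate g(x) (meta-model: DOE, RSM, …)

Solve $\partial g(x)/\partial x = 0 \Rightarrow x^*$ (a random variable)

‘Scientist’ (to know): best estimation of f(x), $\min V(\hat{\beta})$ over the estimated model coefficients (e.g., D-optimal experiments)

‘Practitioner’ (to act): best estimation of x*, $\min V(x^*)$ (new DOE optimality criterion)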
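To make "x* is a random variable" concrete, here is a minimal simulation sketch. It is not the Vs-optimal procedure itself: the response `true_f`, the noise level, and the two candidate designs are hypothetical choices made only for illustration. Each simulated experiment fits a quadratic meta-model g(x) with `numpy.polyfit` and solves dg/dx = 0; repeating this gives an empirical V(x*) per design.

```python
import numpy as np

def true_f(x):
    """Hypothetical response surface with its maximum at x = 2 (an assumption)."""
    return 5.0 - (x - 2.0) ** 2

def estimate_x_star(design, rng, noise_sd=0.2):
    """Simulate one experiment, fit a quadratic meta-model g(x), return its optimum."""
    y = true_f(design) + rng.normal(0.0, noise_sd, size=design.shape)
    b2, b1, _b0 = np.polyfit(design, y, deg=2)  # g(x) = b2*x^2 + b1*x + b0
    return -b1 / (2.0 * b2)                     # solves dg/dx = 0

def x_star_mean_and_variance(design, n_rep=2000, seed=0):
    """Empirical mean and variance of x* over repeated simulated experiments."""
    rng = np.random.default_rng(seed)
    stars = np.array([estimate_x_star(design, rng) for _ in range(n_rep)])
    return stars.mean(), stars.var(ddof=1)

# Two hypothetical 12-run designs: same number of runs, different spread.
wide = np.repeat([0.0, 2.0, 4.0], 4)
narrow = np.repeat([1.0, 2.0, 3.0], 4)

for name, design in (("wide", wide), ("narrow", narrow)):
    m, v = x_star_mean_and_variance(design)
    print(f"{name:6s} design: mean x* = {m:.3f}, V(x*) = {v:.5f}")
```

Under these assumptions the two designs estimate the same optimum but with different V(x*), which is the quantity the practitioner-oriented criterion targets.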

Page 3: Ben Gal

The Bias-Variance Tradeoff

Page 4: Ben Gal

Presentation Layout

Bayesian networks and classifiers

Targeted Bayesian Network Learning (TBNL) (with Gruber)

TBNL application on suspects monitoring

Summary

Page 5: Ben Gal

Bayesian Networks (Pearl, 1985)

Page 6: Ben Gal

What is a Bayesian Network?

Joint Probability Distribution (JPD):

X1   X2   X3   X4   Prob.
1    1    1    2    0.083
1    1    2    2    0.167
1    2    2    3    0.25
2    2    1    1    0.25
2    2    2    1    0.25

Factorization:

$P(\mathbf{X}) = P(X_2)\,P(X_3 \mid X_2)\,P(X_4 \mid X_3, X_2)\,P(X_1 \mid X_4, X_3, X_2)$

A complete Bayesian network $B(G, \Theta)$ encodes the domain’s JPD.

$G = (V, E)$ = Directed Acyclic Graph

Conditional probability table of X3 given X2:

        X2=1   X2=2
X3=1    0.33   0.33
X3=2    0.67   0.67
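The conditional probability table for X3 can be recomputed directly from the joint table. The sketch below is illustrative only (helper names such as `marginal`, `conditional` and `chain_rule` are mine): it derives P(X3 | X2) by marginalization and checks that the chain-rule factorization in the order X2, X3, X4, X1 reproduces every joint entry.

```python
from collections import defaultdict

# Joint probability table from the slide: (x1, x2, x3, x4) -> probability
JPD = {
    (1, 1, 1, 2): 0.083,
    (1, 1, 2, 2): 0.167,
    (1, 2, 2, 3): 0.25,
    (2, 2, 1, 1): 0.25,
    (2, 2, 2, 1): 0.25,
}

def marginal(jpd, idx):
    """Sum the joint table down to the variables at 0-based positions idx."""
    out = defaultdict(float)
    for assignment, p in jpd.items():
        out[tuple(assignment[i] for i in idx)] += p
    return dict(out)

def conditional(jpd, child, parents):
    """Return P(child | parents) as {(child_value, parent_values): prob}."""
    joint = marginal(jpd, (child,) + tuple(parents))
    parent_marg = marginal(jpd, tuple(parents))
    return {(k[0], k[1:]): p / parent_marg[k[1:]] for k, p in joint.items()}

# Chain-rule order X2, X3 | X2, X4 | X3,X2, X1 | X4,X3,X2 (0-based positions)
ORDER = [(1, ()), (2, (1,)), (3, (2, 1)), (0, (3, 2, 1))]

def chain_rule(jpd, assignment):
    """Multiply the conditional factors for one full assignment."""
    prob = 1.0
    for child, parents in ORDER:
        cpt = conditional(jpd, child, parents)
        prob *= cpt[(assignment[child], tuple(assignment[i] for i in parents))]
    return prob

cpt_x3 = conditional(JPD, child=2, parents=(1,))
for (x3, (x2,)), p in sorted(cpt_x3.items()):
    print(f"P(X3={x3} | X2={x2}) = {p:.2f}")
```

Rounded to two decimals, the printed values are the 0.33 / 0.67 entries of the slide's table; both X2 columns are identical, so in this example X3 does not depend on X2.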

Page 7: Ben Gal

Explain or Predict (classify)

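Tree / GBN: Chow & Liu (1968), Williamson (2000)

  True distribution:     $p(\mathbf{X})$
  Modeled distribution:  $q(\mathbf{X})$
  Objective:             $p(\mathbf{X})$
  Principle:             Minimize $D_{KL}\bigl(p(\mathbf{X}) \,\|\, q(\mathbf{X})\bigr)$
  Consequence:           Maximize $\sum_i I(X_i; Z_i)$

TBNL: Gruber & Ben-Gal (2010)

  True distribution:     $p(\mathbf{X})$
  Modeled distribution:  $q(\mathbf{X})$
  Objective:             $p(X_i) = \sum_{\mathbf{x}' \in \mathbf{X} \setminus X_i} p(X_i \mid \mathbf{x}')\, p(\mathbf{x}')$
  Principle:             Minimize $D_{KL}\bigl(p(X_i) \,\|\, q(X_i)\bigr)$
  Consequence:           Maximize $I(X_i; Z_i)$
                         Maximize $\sum_{X_j \in Z_i} I(X_j; Z_j)$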
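The step from the "Principle" row to the "Consequence" row in the Tree / GBN column follows from a standard identity. The derivation below is a sketch under two assumptions that the slide does not state explicitly: $Z_i$ denotes the parent set of $X_i$ in the network, and the model $q$ factorizes over the network using conditionals taken from $p$.

```latex
\begin{align*}
D_{KL}\bigl(p(\mathbf{X}) \,\|\, q(\mathbf{X})\bigr)
  &= \sum_{\mathbf{x}} p(\mathbf{x}) \log \frac{p(\mathbf{x})}{\prod_i p(x_i \mid z_i)} \\
  &= -H(\mathbf{X}) + \sum_i H(X_i \mid Z_i) \\
  &= \sum_i H(X_i) - H(\mathbf{X}) - \sum_i I(X_i; Z_i)
\end{align*}
```

The entropy terms $\sum_i H(X_i)$ and $H(\mathbf{X})$ do not depend on the chosen structure, so minimizing the divergence is equivalent to maximizing $\sum_i I(X_i; Z_i)$.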

Page 8: Ben Gal

Unconstrained Learning

Assume $X_3$ is the target variable.

[Figure: GBN (adding-arrows) vs. Target-Oriented (TBNL) network construction; panel labels: i=1, i=4, i=3, i=1, i=4]

Equivalent encoding!

Page 9: Ben Gal

Constrained Learning

Assume $X_3$ is the target variable.

[Figure: GBN (adding-arrows) vs. Target-Oriented (TBNL) network construction; panel labels: i=1, i=4, i=1, i=4, i=3]

Page 10: Ben Gal

Differential Complexity

$\eta_t$ = maximum percentage relative information exploitation about the target

$\eta_r$ = maximum percentage relative information exploitation about the rest of the attributes

[Figure: labels $\eta_r$ and $\eta_t$; regions marked “Predict (Classify)” and “Explain”]

Page 11: Ben Gal

Results (1/2) Data Sets Properties and Testing Methods

Dataset        # Attributes   # Classes   # Instances   Test      Instances/Attributes ratio
australian     14             2           690           CV5       ~49
breast         9              2           683           CV5       ~76
chess          36             2           3196          holdout   ~89
cleve          11             2           196           CV5       ~18
corral         6              2           128           CV5       ~21
crx            15             2           653           CV5       ~44
german         20             2           1000          CV5       ~50
glass          9              7           214           CV5       ~24
Iris           5              3           150           CV5       ~30
lymphography   18             4           148           CV5       ~8
mofn-3-7-10    10             2           1324          holdout   ~132
vote           16             3           435           CV5       ~27

Page 12: Ben Gal

Naïve Bayes: Predict

Corral Dataset

[Figure: Naïve Bayes structure over the nodes A0, A1, B0, B1, Irrelevant, Correlated and Class]
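A small sketch of why Naïve Bayes struggles on Corral. It assumes the commonly cited Corral concept, Class = (A0 AND A1) OR (B0 AND B1); the slide itself does not state the concept, and the Irrelevant and Correlated attributes are left out here for brevity. A hand-written Naïve Bayes with Laplace smoothing is fitted and evaluated on the full 16-row truth table of the four relevant attributes.

```python
from itertools import product
from math import log

def corral_class(a0, a1, b0, b1):
    """Assumed Corral concept: (A0 AND A1) OR (B0 AND B1)."""
    return int((a0 and a1) or (b0 and b1))

# Noise-free truth table over the four relevant attributes.
rows = [(bits, corral_class(*bits)) for bits in product((0, 1), repeat=4)]

def fit_naive_bayes(rows, alpha=1.0):
    """Estimate log P(class) and log P(x_i = v | class) with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        members = [bits for bits, label in rows if label == c]
        log_prior = log((len(members) + alpha) / (len(rows) + 2 * alpha))
        log_lik = []
        for i in range(4):
            ones = sum(bits[i] for bits in members)
            p1 = (ones + alpha) / (len(members) + 2 * alpha)
            log_lik.append((log(1.0 - p1), log(p1)))  # index 0 -> v=0, 1 -> v=1
        model[c] = (log_prior, log_lik)
    return model

def predict(model, bits):
    """Pick the class with the larger log posterior score."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[i][v] for i, v in enumerate(bits))
    return max((0, 1), key=score)

model = fit_naive_bayes(rows)
accuracy = sum(predict(model, bits) == label for bits, label in rows) / len(rows)
print(f"Naive Bayes accuracy on the truth table: {accuracy:.3f}")
```

Because the attributes enter the Naïve Bayes score independently, the classifier effectively counts how many attributes are set, so it cannot separate the two positive rows (1,1,0,0) and (0,0,1,1) from the negative rows that also have exactly two attributes set.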

Page 13: Ben Gal

[Figure: six network structures over the Corral nodes A0, A1, B0, B1, Irrelevant, Correlated and Class, one of them labeled “Tree Augmented Network (TAN)”]

Page 14: Ben Gal

Managing the Trade-off

[Figure panels: CV5; CV5; Holdout (2/3 : 1/3)]

Page 15: Ben Gal

Results (2/2) Accuracy

Best & worst methods (incl. 5% runner up) in Bold & Italic respectively

Dataset        TBNL    BNC-2P   NB     TAN    C4.5   HGC
australian     83.3    87.0     85.1   82.5   84.9   85.6
breast         95.9    95.8     97.6   96.5   93.9   97.6
chess          96.9    95.8     87.3   92.4   99.5   95.3
cleve          81.4    80.0     82.1   78.4   79.4   78.7
corral         100.0   98.8     87.2   98.6   98.5   100.0
crx            86.4    84.2     85.0   83.7   86.1   86.9
german         69.7    73.6     75.4   73.9   72.9   72.5
glass          60.0    58.3     55.9   54.2   59.3   31.2
Iris           97.0    95.8     93.0   92.4   96.0   95.7
lymphography   81.8    83.7     83.4   82.2   78.4   63.8
mofn-3-7-10    100.0   91.4     86.7   91.5   84.0   86.7
vote           96.0    95.8     90.1   94.9   94.7   95.4
Average        87.4    86.7     84.1   85.1   85.6   82.4
StdE           4%      3%       3%     4%     3%     6%

Paired t-tests show significance.
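The per-method averages can be rechecked from the per-dataset accuracies, and the same numbers show how a paired t statistic is formed. This is only a sketch: the slide does not say which method pairs were tested or at what significance level, so TBNL vs. NB is used here as an arbitrary example pair and no conclusion about significance is drawn.

```python
from math import sqrt
from statistics import mean, stdev

# Accuracy (%) per dataset, in the row order of the table above.
ACCURACY = {
    "TBNL":   [83.3, 95.9, 96.9, 81.4, 100.0, 86.4, 69.7, 60.0, 97.0, 81.8, 100.0, 96.0],
    "BNC-2P": [87.0, 95.8, 95.8, 80.0, 98.8, 84.2, 73.6, 58.3, 95.8, 83.7, 91.4, 95.8],
    "NB":     [85.1, 97.6, 87.3, 82.1, 87.2, 85.0, 75.4, 55.9, 93.0, 83.4, 86.7, 90.1],
    "TAN":    [82.5, 96.5, 92.4, 78.4, 98.6, 83.7, 73.9, 54.2, 92.4, 82.2, 91.5, 94.9],
    "C4.5":   [84.9, 93.9, 99.5, 79.4, 98.5, 86.1, 72.9, 59.3, 96.0, 78.4, 84.0, 94.7],
    "HGC":    [85.6, 97.6, 95.3, 78.7, 100.0, 86.9, 72.5, 31.2, 95.7, 63.8, 86.7, 95.4],
}

def paired_t(a, b):
    """Paired t statistic: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

for method, values in ACCURACY.items():
    print(f"{method:7s} average accuracy = {mean(values):.2f}")

t_stat = paired_t(ACCURACY["TBNL"], ACCURACY["NB"])
print(f"paired t (TBNL vs. NB, 11 degrees of freedom) = {t_stat:.2f}")
```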

Page 16: Ben Gal

Presentation Layout

Bayesian networks and classifiers

Targeted Bayesian Network Learning (TBNL)

TBNL application on suspects monitoring (w. Gruber & Yanovski)

Summary

Page 17: Ben Gal

Domain Description

Motivation:

  Simplicity: complexity-error tradeoff

  Information extraction: utilization of meta-data

  Support: help the expert understand

Available data:

  CDR

  Privatized

  Laundered

Requirements:

  50% recall with at most 1% false alarms
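The recall / false-alarm requirement can be read as a constraint on the decision threshold of a scoring model. The sketch below is an illustration, not the method used in the study: the scores and labels are made up, and `pick_threshold` is a hypothetical helper that scans candidate thresholds and keeps the one with the highest recall whose false-alarm rate stays within the limit.

```python
def pick_threshold(scores, labels, max_far=0.01):
    """Return (threshold, recall, false_alarm_rate) with the highest recall
    among thresholds whose false-alarm rate is at most max_far, or None."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for thr in sorted(set(scores), reverse=True):
        flagged = [s >= thr for s in scores]
        tp = sum(f and y == 1 for f, y in zip(flagged, labels))
        fp = sum(f and y == 0 for f, y in zip(flagged, labels))
        recall, far = tp / n_pos, fp / n_neg
        if far <= max_far and (best is None or recall > best[1]):
            best = (thr, recall, far)
    return best

# Hypothetical data: 4 targets (label 1) and 200 non-targets (label 0).
target_scores = [0.95, 0.90, 0.40, 0.30]
non_target_scores = [0.92, 0.35, 0.32] + [0.001 * i for i in range(197)]
scores = target_scores + non_target_scores
labels = [1] * len(target_scores) + [0] * len(non_target_scores)

threshold, recall, far = pick_threshold(scores, labels)
meets_requirement = recall >= 0.50 and far <= 0.01
print(f"threshold={threshold}, recall={recall:.2f}, false alarms={far:.3f}, "
      f"requirement met: {meets_requirement}")
```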

Page 18: Ben Gal

Data Description of the Domain

Call Detail Record (CDR)

Field         Description
Main party    Unique identifier of the monitored object
Other party   Unique identifier of the other party
year          Year of call start
month         Month of call start
day           Day of call start
hour          Hour of call start
minute        Minute of call start
second        Second of call start
duration      Call duration in seconds
caller        Call initiator {1/0}: 1 = main party initiated the call, 0 = other party initiated the call
type_id       Type of interaction {1/0}: 1 = phone call, 0 = SMS (text message)
tag           Type (group) of monitored object {1/0}: 0 = main party is a non-target, 1 = main party is a target
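One way to hold such a record in code is a small typed structure. The sketch below mirrors the fields listed for the CDR; the class name, the string type for the identifiers, the helper properties and the example values are my own assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CallDetailRecord:
    main_party: str   # unique identifier of the monitored object
    other_party: str  # unique identifier of the other party
    year: int
    month: int
    day: int
    hour: int
    minute: int
    second: int
    duration: int     # call duration in seconds
    caller: int       # 1 = main party initiated, 0 = other party initiated
    type_id: int      # 1 = phone call, 0 = SMS
    tag: int          # 1 = main party is a target, 0 = non-target

    @property
    def start(self) -> datetime:
        """Call start assembled from the separate date and time fields."""
        return datetime(self.year, self.month, self.day,
                        self.hour, self.minute, self.second)

    @property
    def is_outgoing(self) -> bool:
        return self.caller == 1

    @property
    def is_sms(self) -> bool:
        return self.type_id == 0

    @property
    def is_target(self) -> bool:
        return self.tag == 1

# Hypothetical record, for illustration only.
record = CallDetailRecord("id_001", "id_742", 2012, 3, 14, 21, 5, 30,
                          duration=95, caller=1, type_id=1, tag=0)
print(record.start.isoformat(), record.is_outgoing, record.is_sms, record.is_target)
```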

Page 19: Ben Gal

ROC curve

[Figure: ROC curve, annotated “1900 missed targets” and “40 suspects to no avail”]

Page 20: Ben Gal

Feature Extraction

Inter_prc_q1, Inter_prc_q2, Inter_prc_q3, Inter_prc_q4: percentage of activities in the 1st, 2nd, 3rd and 4th quarter of the day

[Figure: call activity during the day for two distinct groups]
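A minimal sketch of how the four Inter_prc_q features could be computed from the CDR hour field. The quarter boundaries (00:00-05:59, 06:00-11:59, 12:00-17:59, 18:00-23:59) are an assumption, since the slide does not define them, and the function name and example hours are illustrative.

```python
from collections import Counter

def inter_prc_quarters(hours):
    """Percentage of activities falling in each assumed 6-hour quarter of the day.

    hours: iterable of call-start hours (0-23) for one monitored object.
    """
    hours = list(hours)
    counts = Counter(h // 6 for h in hours)  # quarter index 0..3
    total = len(hours)
    return {
        f"Inter_prc_q{q + 1}": (100.0 * counts.get(q, 0) / total) if total else 0.0
        for q in range(4)
    }

# Hypothetical call-start hours for one monitored object.
features = inter_prc_quarters([1, 7, 8, 13, 14, 15, 20, 23])
print(features)
```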

Page 21: Ben Gal

Learning & Mining Mobility Patterns (PIs: Ben-Gal, Toch and Lerner, 2012)

Page 22: Ben Gal

Conclusions

“To Explain or To Predict”, “To Know or To Act” (constraint modeling)

Managing the error-complexity tradeoff!

An “engineering approach” to modeling

Target-based BN Learning (2006), Gruber and Ben-Gal (2010)…

Vs-optimality criterion min V(x*), Ginsburg and Ben-Gal (2006)

VOBN, Ben-Gal et al. (2005): scenario dependent

More….

Page 23: Ben Gal

Prediction can help…