
Decision-Making under Statistical Uncertainty

Jayakrishnan Unnikrishnan

PhD Defense, ECE Department

University of Illinois at Urbana-Champaign

CSL 141, 12 June 2010

Statistical Decision-Making

Relevant in several contexts: receiver design for communication systems; sensor networks for environment monitoring and failure detection; drug testing

Based on probabilistic model for observations

Well-studied problem, but questions still remain: uncertain statistical knowledge


Statistics in Detection

Example: Likelihood ratio test for binary hypotheses

Ĥ = I{ L(X) > τ }

requires knowledge of the likelihood-ratio function

L(X) = p_1(X) / p_0(X)
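As a toy numeric sketch of this rule (the Gaussian densities, means, and threshold below are illustrative assumptions, not from the talk):

```python
import math

def likelihood_ratio(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """L(x) = p1(x)/p0(x) for two unit-variance Gaussian hypotheses."""
    p1 = math.exp(-(x - mu1) ** 2 / (2 * sigma ** 2))
    p0 = math.exp(-(x - mu0) ** 2 / (2 * sigma ** 2))
    return p1 / p0

def lrt_decision(x, tau=1.0):
    """H_hat = I{L(x) > tau}: declare H1 when the ratio exceeds tau."""
    return 1 if likelihood_ratio(x) > tau else 0
```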

Imperfect Statistics in Detection

Often perfect statistical knowledge is not available, e.g.: fault-onset detection, intrusion detection, anomaly detection, spam filtering, primary detection and dynamic spectrum access for cognitive radio

How to cope with uncertain statistics? Robust change detection, universal hypothesis testing, online learning

Focus on i.i.d. observations

Outline

Robust Quickest Change Detection: designing for worst-case guarantees minimax optimality

Universal Hypothesis Testing: partial knowledge helps

Universal Hypothesis Testing under Model Uncertainty


Outline

Robust Quickest Change Detection: designing for worst-case guarantees minimax optimality


Quickest Change Detection

Single observation sequence. Stopping time τ at which the change is declared. Tradeoff between detection delay and frequency of false alarms.

Applications: process monitoring, quality control

Lorden Criterion

Change-point λ modeled as deterministic. Minimize worst-case delay subject to a bound on the expected time to false alarm:

Minimize WDD(τ) subject to E_{ν_0}(τ) ≥ B

where WDD(τ) = sup_{λ≥1} ess sup E_λ[ (τ − λ + 1)^+ | X_1, …, X_{λ−1} ]


CUSUM stopping rule is optimal


τ_C = inf{ n ≥ 1 : max_{1≤k≤n} ∏_{i=k}^n ν_1(X_i)/ν_0(X_i) ≥ η }
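The CUSUM rule above can be sketched with its standard log-domain recursion W_n = max(0, W_{n−1} + log(ν_1(X_n)/ν_0(X_n))); the unit-variance Gaussian pre/post-change densities here are an illustrative assumption:

```python
def cusum_stopping_time(xs, eta, mu0=0.0, mu1=1.0):
    """Return the first n at which the CUSUM statistic crosses the
    log-threshold eta, or None if it never does.

    Uses W_n = max(0, W_{n-1} + log(nu1(x_n)/nu0(x_n))), which equals
    max_{1<=k<=n} prod_{i=k}^n nu1(x_i)/nu0(x_i) in the log domain.
    """
    w = 0.0
    for n, x in enumerate(xs, start=1):
        llr = (mu1 - mu0) * x - (mu1 ** 2 - mu0 ** 2) / 2.0  # Gaussian log-LR
        w = max(0.0, w + llr)
        if w >= eta:
            return n
    return None
```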

Uncertain Statistics

Most known results assume the pre-change and post-change distributions ν_0 and ν_1 are known. Often ν_0 and ν_1 are not completely known in applications.

Example 1: Infrastructure Monitoring

Post-fault distribution is uncertain

Example 2: Intrusion Detection

Post-intrusion system behavior is uncertain, e.g., network security

Robust Change Detection

Suppose ν_0 and ν_1 are known to lie in uncertainty classes of densities P_0 and P_1.

Minimax robust formulation: minimize worst-case delay among all distributions from P_0 and P_1, subject to a uniform bound on the expected time to false alarm under all possible distributions ν_0 ∈ P_0:

min_τ sup_{ν_0 ∈ P_0, ν_1 ∈ P_1} WDD(τ)   s.t.   inf_{ν_0 ∈ P_0} E_{ν_0}(τ) ≥ B

Solution via LFDs

Approach: identify least favorable distributions (LFDs) under a stochastic ordering condition [Veeravalli et al. 1994]

Like Huber’s approach to robust hypothesis testing [Huber 1965]


For random variables X_1, X_2 we denote X_1 ≽ X_2 if P(X_1 ≥ t) ≥ P(X_2 ≥ t) for all t.

JSB condition: for (ν_0, ν_1) ∈ P_0 × P_1, let L*(·) = ν_1*(·)/ν_0*(·); we need L*(X_{ν_1}) ≽ L*(X_{ν_1*}) and L*(X_{ν_0*}) ≽ L*(X_{ν_0}).

E.g., ε-contamination classes, total variation and Prohorov distance neighborhoods.

Solution via LFDs

Under JSB and some other regularity conditions the optimal stopping rule designed with respect to LFDsoptimal stopping rule designed with respect to LFDssolves robust problemExample: { (0 1)}P Np

0

1

{ (0,1)}{ ( ,1) : 0.1 3}θ θ

== ≤ ≤

P NP N

Can easily show that LFDs are*0 (0,1)ν = N0*1 (0.1,1)ν = N

16
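For this Gaussian example, a minimal sketch of the robust rule, which simply runs the CUSUM recursion with the LFD pair ν_0* = N(0,1), ν_1* = N(0.1,1) (the function name is hypothetical):

```python
def robust_cusum_statistic(xs, mu_lfd=0.1):
    """CUSUM recursion W_n = max(0, W_{n-1} + llr(x_n)) driven by the LFD
    log-likelihood ratio log(nu1*(x)/nu0*(x)) for nu0* = N(0,1),
    nu1* = N(0.1,1), i.e. llr(x) = mu*x - mu^2/2."""
    w = 0.0
    path = []
    for x in xs:
        w = max(0.0, w + mu_lfd * x - mu_lfd ** 2 / 2.0)
        path.append(w)
    return path
```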

Cost of Robustness


Comparison with GLR test

A benchmark scheme: CUSUM based on the Generalized Likelihood Ratio (GLR test)

τ_GLR = inf{ n ≥ 1 : max_{1≤k≤n} sup_{ν_1 ∈ P_1} Σ_{i=k}^n log( ν_1(X_i)/ν_0(X_i) ) ≥ η }

Asymptotically as good as CUSUM with known distributions in exponential families. Often too complex to implement.

Robust CUSUM admits a simple recursion.
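To illustrate the complexity gap, a sketch of the GLR-CUSUM statistic with the sup over ν_1 taken on a finite grid of Gaussian means (an assumption made purely for illustration); each new step re-scans all past change points, so per-step cost grows with n, unlike the O(1) robust-CUSUM recursion:

```python
def glr_cusum_statistic(xs, thetas=(0.1, 0.5, 1.0, 2.0, 3.0)):
    """GLR-CUSUM statistic: max over candidate change points k and sup over
    theta (finite grid) of the cumulative Gaussian log-likelihood ratio,
    floored at zero.  Per-step cost is O(n * |grid|): no simple recursion."""
    n = len(xs)
    best = 0.0
    for k in range(n):  # candidate change point (0-indexed)
        for th in thetas:
            llr = sum(th * x - th ** 2 / 2.0 for x in xs[k:])
            best = max(best, llr)
    return best
```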

Robust test vs GLR test


Other Criteria for Optimality

Pollak criterion: alternate definition of delay; the SRP stopping rule is asymptotically optimal.

Bayesian criterion: change-point modeled as a geometric random variable; minimize average delay subject to a probability-of-false-alarm constraint; the Shiryaev test is optimal.


Robust tests designed for LFDs are optimal


Outline

Universal Hypothesis Testing: partial knowledge helps


Universal Hypothesis Testing

Given a sequence of i.i.d. observations X_1, X_2, …, X_n, test whether they were drawn according to a modeled distribution p_0:

Null H_0: X_i ~ p_0
Alternate H_1: X_i ~ p, p ≠ p_0, p unknown

Applications: anomaly detection, spam filtering, etc.

Hoeffding’s Universal Test

Hoeffding test is optimal in the error-exponent sense:

Ĥ = I{ D(p_n ‖ p_0) > τ }

Uses the Kullback-Leibler divergence of the empirical distribution p_n from p_0 as the test statistic.

[Figure: divergence ball {q : D(q ‖ p_0) ≤ τ} around p_0]

Select τ for a target false-alarm probability via Sanov's Theorem in large deviations (error exponents):

P_FA = P_{p_0}(Ĥ ≠ 0) ≈ exp(−nτ)

Weak convergence under p_0:

2n D(p_n ‖ p_0) →d χ²_{N−1} as n → ∞

Error exponents are inaccurate. [Figure: alphabet size N = 20]
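A minimal sketch of the Hoeffding statistic on a finite alphabet (the alphabet, model p_0, and threshold below are illustrative assumptions):

```python
import math
from collections import Counter

def empirical_kl(samples, p0):
    """D(p_n || p0): KL divergence of the empirical type p_n from the model p0."""
    n = len(samples)
    return sum((c / n) * math.log((c / n) / p0[s])
               for s, c in Counter(samples).items())

def hoeffding_test(samples, p0, tau):
    """H_hat = I{D(p_n || p0) > tau}."""
    return 1 if empirical_kl(samples, p0) > tau else 0
```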

Large Alphabet Regime

Hoeffding test performs poorly for large N (alphabet size): suffers from high bias and variance.

E_{p_0}[ D(p_n ‖ p_0) ] ≈ (N−1)/(2n)

Var_{p_0}[ D(p_n ‖ p_0) ] ≈ (N−1)/(2n²)

Can do better if we have partial information about the alternate hypothesis.

Mismatched Test

Mismatched test uses the mismatched divergence instead of the KL divergence:

Ĥ = I{ D^MM(p_n ‖ p_0) > τ }

introduced as a lower bound to the KL divergence D(p_n ‖ p_0).

The MM test is equivalent to replacing p_n with the ML estimate from a family {p_θ}, i.e., it is a GLRT:

D^MM(p_n ‖ p_0) = D(p_{θ̂_ML} ‖ p_0)
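A sketch of the mismatched statistic with the sup over {p_θ} taken over a small explicit family (the family and distributions below are illustrative assumptions); since it equals D(p_n‖p_0) minus the divergence to the best family member, it never exceeds the full KL statistic:

```python
import math
from collections import Counter

def empirical_kl(samples, p0):
    """D(p_n || p0) for the empirical type p_n of the samples."""
    n = len(samples)
    return sum((c / n) * math.log((c / n) / p0[s])
               for s, c in Counter(samples).items())

def mismatched_divergence(samples, p0, family):
    """D^MM(p_n || p0): sup over {p_theta} of the normalized log-likelihood
    ratio (1/n) sum_i log(p_theta(X_i)/p0(X_i)).  This equals
    D(p_n||p0) - min_theta D(p_n||p_theta), hence lower-bounds D(p_n||p0)."""
    n = len(samples)
    counts = Counter(samples)
    return max(sum(c * math.log(p[s] / p0[s]) for s, c in counts.items()) / n
               for p in family)
```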

Mismatched Test properties

+ Addresses high variance issues:

E_{p_0}[ D^MM(p_n ‖ p_0) ] ≈ d/(2n),   Var_{p_0}[ D^MM(p_n ‖ p_0) ] ≈ d/(2n²),   where θ ∈ R^d

− However, sub-optimal in the error-exponent sense
+ Optimal when the alternate distribution lies in {p_θ}

Partial knowledge of the unknown alternate distribution can give substantial performance improvement for large alphabets.

Performance comparison

[Figure: N = 19, n = 40]

Outline

Universal Hypothesis Testing under Model Uncertainty


Uncertain Null Hypothesis

Consider the following hypothesis testing problem:

H_0: X_i ~ p, for any p ∈ P
H_1: X_i ~ q, for any q ∉ P

A robust universal formulation; relevant when the null-hypothesis distribution p_0 is uncertain.

Pandit and Meyn studied this when P is a moment class:

P = { p : Σ_x p(x) ψ_i(x) = 0, 1 ≤ i ≤ d }

Robust Hoeffding Test

Robust Hoeffding test:

Ĥ = I{ D^ROB(p_n ‖ P) > τ }

where D^ROB(q ‖ P) := inf_{p ∈ P} D(q ‖ p)

[Figure: divergence balls {q : D(q ‖ p) ≤ τ} and {q : D(q ‖ p_0) ≤ τ}]

[Figure: robust divergence ball {q : D^ROB(q ‖ P) ≤ τ} around the class P]

Guarantees exponential decay of the worst-case false alarm probability:

max_{p ∈ P} P_p(Ĥ ≠ 0) ≈ exp(−nτ)

− Error exponents are not a good indicator of error probability.
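A sketch of the robust statistic when the uncertainty class P is approximated by a finite set of candidate null distributions (an illustrative simplification of the moment-class setting; function names are hypothetical):

```python
import math

def kl(q, p):
    """D(q || p) for distributions over a common finite alphabet."""
    return sum(q[s] * math.log(q[s] / p[s]) for s in q if q[s] > 0)

def robust_divergence(q, uncertainty_class):
    """D^ROB(q || P) = inf over p in P of D(q || p); here P is a finite
    set of candidate null distributions."""
    return min(kl(q, p) for p in uncertainty_class)

def robust_hoeffding_test(q, uncertainty_class, tau):
    """H_hat = I{D^ROB(q || P) > tau}, applied to an empirical type q."""
    return 1 if robust_divergence(q, uncertainty_class) > tau else 0
```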

Weak Convergence Result

Can interpret the robust divergence as a mismatched divergence.

Yields a weak convergence result under p ∈ P:

2n D^ROB(p_n ‖ P) →d χ²_{d_p} as n → ∞

where d_p ≤ d gives a better approximation for the false-alarm probability.

Similar robust Kolmogorov-Smirnov test for continuous distributions.

Kolmogorov-Smirnov Test

Universal hypothesis test for a continuous alphabet: H_0: X_i ~ F_0

KS test statistic:

D_n = sup_x | F_n(x) − F_0(x) |,   where F_n(x) = (1/n) Σ_{i=1}^n I{X_i ≤ x}

Thresholds set using weak convergence of D_n. Problem of overfitting for large n.
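A minimal sketch of the KS statistic; the sup over x is attained at the jump points of the empirical CDF, so only the sorted sample points need to be checked (the model CDF F0 is passed in as a callable, an illustrative interface):

```python
def ks_statistic(samples, F0):
    """D_n = sup_x |F_n(x) - F0(x)|, evaluated at the sample points where
    the empirical CDF jumps; both sides of each jump are checked."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        d = max(d, abs((i + 1) / n - F0(x)), abs(i / n - F0(x)))
    return d
```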

Robust KS Test

F_0 unknown, from the uncertainty class

F = { F : F−(x) ≤ F(x) ≤ F+(x), ∀x }

[Figure: CDF band F−(x) ≤ F(x) ≤ F+(x) containing F_0(x)]

Robust KS Test

Uncertainty class via stochastic ordering:

F = { F : F−(x) ≤ F(x) ≤ F+(x), ∀x }

Modified test statistic:

E_n = min_{F ∈ F} sup_x | F_n(x) − F(x) |

We obtain weak convergence results for E_n that are useful for setting thresholds.
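A sketch of the modified statistic: for a band-type class, the min over F equals the sup over x of the distance from F_n(x) to the band [F−(x), F+(x)] (assuming the band edges are continuous and the clipped CDF lies in the class; the band functions below are illustrative):

```python
def robust_ks_statistic(samples, F_lo, F_hi):
    """E_n = min over F in the band [F_lo, F_hi] of sup_x |F_n(x) - F(x)|,
    computed as the largest distance from the empirical CDF to the band,
    checking both sides of each jump of F_n."""
    xs = sorted(samples)
    n = len(xs)
    e = 0.0
    for i, x in enumerate(xs):
        for fn in ((i + 1) / n, i / n):  # F_n on either side of the jump
            gap = max(F_lo(x) - fn, fn - F_hi(x), 0.0)
            e = max(e, gap)
    return e
```

When F_n stays inside the band, the statistic is zero, unlike the plain KS distance to any single CDF.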

Conclusion

Various approaches to coping with uncertainty:
Robust change detection: designing for LFDs guarantees minimax optimality
Universal hypothesis testing: partial knowledge improves performance
Dynamic spectrum access: online learning

Extensions:
Performance analysis of other robust stopping rules
Adapting dimensionality d with observation length n
Convergence rates of weak convergence results
Extending to non-i.i.d. settings

Thank You!


References

J. Unnikrishnan, D. Huang, S. Meyn, A. Surana, and V. V. Veeravalli, “Universal and Composite Hypothesis Testing via Mismatched Divergence,” IEEE Trans. Inf. Theory, revised April 2010.

J. Unnikrishnan, V. V. Veeravalli, and S. Meyn, “Minimax Robust Quickest Change Detection,” submitted to IEEE Trans. Inf. Theory, revised May 2010.

J. Unnikrishnan, S. Meyn, and V. Veeravalli, “On Thresholds for Robust Goodness-of-Fit Tests,” to be presented at the IEEE Information Theory Workshop, Dublin, Aug. 2010.

Available at http://www.ifp.illinois.edu/~junnikr2
