universal outlier hypothesis testing

Universal Outlier Hypothesis Testing Venu Veeravalli ECE Dept & CSL & ITI University of Illinois at Urbana-Champaign http://www.ifp.illinois.edu/~vvv (with Sirin Nitinawarat and Yun Li) NSF Workshop on Signal Processing for Big Data March 22, 2013

TRANSCRIPT

Page 1: Universal Outlier Hypothesis Testing

Universal Outlier Hypothesis Testing

Venu Veeravalli ECE Dept & CSL & ITI

University of Illinois at Urbana-Champaign http://www.ifp.illinois.edu/~vvv

(with Sirin Nitinawarat and Yun Li)

NSF Workshop on Signal Processing for Big Data

March 22, 2013

Page 2: Universal Outlier Hypothesis Testing


Statistical Outlier Detection

•  Single sequence of observations

•  Normal observations follow some fixed (possibly unknown) distribution or generating mechanism

•  Outliers follow a different generating mechanism

•  Goal: To find outliers efficiently

•  Applications: fraud detection, public health monitoring, cleaning up sensor data, etc.


Veeravalli, NSF Big Data 3/22/13

Page 3: Universal Outlier Hypothesis Testing

Fraud Detection

•  Example: spending records for a male graduate student

Transactions:  Grocery  Gas  Books  …
Amount:        $30      $35  $75
Behavior:      Normal

Page 4: Universal Outlier Hypothesis Testing

Fraud Detection

•  Example: spending records for a male graduate student

Transactions:  Grocery  Gas  Books  …  Spa   Women's apparel  Cosmetics
Amount:        $30      $35  $75       $250  $1500            $500
Behavior:      Normal                  Abnormal

Page 5: Universal Outlier Hypothesis Testing

Fraud Detection: Group Monitoring

•  Male graduate students

Student 1   Student 2   Student 3         …   Student M
Grocery     Dining      Grocery           …   Gas
Dining      Grocery     Gas               …   Books
Books       Movie       Books             …   Grocery
…           …           …                 …   …
Movie       Books       Spa               …   Dining
Grocery     Gas         Women's apparel   …   Movie
Gas         Books       Cosmetics         …   Grocery

Page 7: Universal Outlier Hypothesis Testing

Outlier Hypothesis Testing

•  M sequences of observations, with M large

•  Almost all sequences are generated from a common typical distribution

•  A small subset of sequences is generated from different (outlier) distributions

•  Special case:
   o  Exactly one sequence is generated from the outlier distribution
   o  Goal: to detect the outlier sequence efficiently
   o  Universal setting: neither the typical nor the outlier distribution is known; no training data provided

Page 8: Universal Outlier Hypothesis Testing

Universal Outlier Hypothesis Testing

•  Typical distribution: π

•  Outlier distribution: µ

Distribution of each of the M sequences under each hypothesis:

          1   2   …   M
H_1:      µ   π   …   π
H_2:      π   µ   …   π
 ⋮
H_M:      π   π   …   µ

Page 9: Universal Outlier Hypothesis Testing

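Mathematical Model

Observations: M sequences y^{(1)}, …, y^{(M)}, each of length n:

$$
\begin{array}{cccc}
y^{(1)} & y^{(2)} & \cdots & y^{(M)} \\
\hline
y_1^{(1)} & y_1^{(2)} & \cdots & y_1^{(M)} \\
y_2^{(1)} & y_2^{(2)} & \cdots & y_2^{(M)} \\
\vdots & \vdots & & \vdots \\
y_n^{(1)} & y_n^{(2)} & \cdots & y_n^{(M)}
\end{array}
$$

$$
H_i:\quad p_i\bigl(y^{(1)}, \ldots, y^{(M)}\bigr) = \prod_{k=1}^{n} \Bigl[ \mu\bigl(y_k^{(i)}\bigr) \prod_{j \neq i} \pi\bigl(y_k^{(j)}\bigr) \Bigr]
$$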

Page 10: Universal Outlier Hypothesis Testing

Hypothesis Testing Problem

$$
H_i:\quad p_i\bigl(y^{Mn}\bigr) = \prod_{k=1}^{n} \Bigl[ \mu\bigl(y_k^{(i)}\bigr) \prod_{j \neq i} \pi\bigl(y_k^{(j)}\bigr) \Bigr]
$$

Nothing is known about (µ, π) except that they are distinct.

Universal detector, independent of (µ, π):

$$
\delta : \mathcal{Y}^{Mn} \to \{1, \ldots, M\}
$$
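
To make the model concrete, here is a minimal simulation sketch. It assumes a finite alphabet {0, …, |Y|−1}, indexes sequences from 0 rather than 1, and uses hypothetical helper names (`sample_sequences`, `empirical`) that do not come from the talk.

```python
import random

def sample_sequences(mu, pi, M, n, outlier, seed=0):
    """Draw M sequences of length n under hypothesis H_outlier.

    Sequence `outlier` (0-based, an assumption of this sketch) is i.i.d.
    from mu; every other sequence is i.i.d. from pi.
    """
    rng = random.Random(seed)
    symbols = list(range(len(pi)))
    data = []
    for i in range(M):
        dist = mu if i == outlier else pi
        data.append(rng.choices(symbols, weights=dist, k=n))
    return data

def empirical(seq, alphabet_size):
    """Empirical distribution (normalized symbol counts) of one sequence."""
    counts = [0] * alphabet_size
    for y in seq:
        counts[y] += 1
    return [c / len(seq) for c in counts]

if __name__ == "__main__":
    data = sample_sequences([0.36, 0.64], [0.64, 0.36], M=5, n=2000, outlier=2)
    for i, seq in enumerate(data):
        print(i, empirical(seq, 2))
```

A universal detector only sees `data`; it is not given `mu`, `pi`, or `outlier`.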

Page 12: Universal Outlier Hypothesis Testing

Performance Metrics

•  Maximal error probability:

$$
e\bigl(\delta, (\mu, \pi)\bigr) = \max_{i} P_i\bigl\{ \delta(y^{Mn}) \neq i \bigr\}
$$

•  Exponent for maximal error probability:

$$
\alpha\bigl(\delta, (\mu, \pi)\bigr) = \lim_{n \to \infty} -\frac{1}{n} \log e\bigl(\delta, (\mu, \pi)\bigr)
$$

Consistency: e → 0 as n → ∞

Exponential consistency: α > 0

Page 13: Universal Outlier Hypothesis Testing

Background: Binary Hypothesis Testing

$$
H_1:\ p_1(y) = \prod_{k=1}^{n} \pi(y_k), \qquad H_2:\ p_2(y) = \prod_{k=1}^{n} \mu(y_k)
$$

Chernoff information:

$$
C(\mu, \pi) = \max_{0 \le s \le 1} \; -\log \Bigl( \sum_{y} \mu(y)^{s} \, \pi(y)^{1-s} \Bigr)
$$

If (µ, π) is known, the ML rule

$$
\delta_{ML}(y) = \arg\max_{i} \log p_i(y)
$$

has α(δ_ML, (µ, π)) = C(µ, π) > 0 ← exponential consistency
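
The Chernoff information can be evaluated numerically. The sketch below is one simple way to do it, assuming finite-alphabet distributions with overlapping support and using a plain grid search over s; the function name and the grid size are choices made for this illustration.

```python
import math

def chernoff_information(mu, pi, grid=10001):
    """Approximate C(mu, pi) = max over s in [0, 1] of -log sum_y mu(y)^s pi(y)^(1-s).

    Uses a uniform grid over s (an assumption of this sketch, not an exact
    optimizer). Symbols where either probability is zero are skipped.
    """
    best = 0.0
    for t in range(grid):
        s = t / (grid - 1)
        total = sum((m ** s) * (p ** (1 - s))
                    for m, p in zip(mu, pi) if m > 0 and p > 0)
        best = max(best, -math.log(total))
    return best

if __name__ == "__main__":
    # Distributions taken from the Numerical Results slide of this talk.
    print(chernoff_information([0.36, 0.64], [0.64, 0.36]))
```

For this symmetric pair the maximum is attained at s = 1/2, so the Chernoff information coincides with the Bhattacharyya distance −log(0.96).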

Page 14: Universal Outlier Hypothesis Testing

Outlier Hypothesis Testing: known (µ, π)

$$
H_i:\quad p_i\bigl(y^{Mn}\bigr) = \prod_{k=1}^{n} \Bigl[ \mu\bigl(y_k^{(i)}\bigr) \prod_{j \neq i} \pi\bigl(y_k^{(j)}\bigr) \Bigr]
$$

          1   2   …   M
H_1:      µ   π   …   π
H_2:      π   µ   …   π
 ⋮
H_M:      π   π   …   µ

ML rule:

$$
\delta_{ML}\bigl(y^{Mn}\bigr) = \arg\max_{i} \log p_i\bigl(y^{Mn}\bigr)
$$

Exponential consistency:

$$
\alpha\bigl(\delta_{ML}, (\mu, \pi)\bigr) = 2B(\mu, \pi)
$$

Bhattacharyya distance:

$$
B(\mu, \pi) = -\log \Bigl( \sum_{y} \mu(y)^{1/2} \, \pi(y)^{1/2} \Bigr)
$$
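
A small sketch of the known-(µ, π) case follows. It relies on the observation that the product of π over all M sequences is common to every hypothesis, so log p_i differs across i only through the log-likelihood ratio of µ to π on sequence i. Full support of both distributions, 0-based indexing, and the function names are assumptions of this illustration.

```python
import math

def bhattacharyya(mu, pi):
    """Bhattacharyya distance B(mu, pi) = -log sum_y sqrt(mu(y) * pi(y))."""
    return -math.log(sum(math.sqrt(m * p) for m, p in zip(mu, pi)))

def ml_outlier(data, mu, pi):
    """ML rule when (mu, pi) is known: argmax_i log p_i(y^{Mn}).

    Since log p_i = const + sum_k log(mu(y_k^(i)) / pi(y_k^(i))), it is
    enough to score each sequence by its own log-likelihood ratio.
    Assumes mu(y) > 0 and pi(y) > 0 for every symbol.
    """
    scores = [sum(math.log(mu[y] / pi[y]) for y in seq) for seq in data]
    return max(range(len(data)), key=scores.__getitem__)

if __name__ == "__main__":
    mu, pi = [0.36, 0.64], [0.64, 0.36]
    print("2B(mu, pi) =", 2 * bhattacharyya(mu, pi))
    toy = [[0, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
    print("detected outlier index:", ml_outlier(toy, mu, pi))
```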

Page 15: Universal Outlier Hypothesis Testing

Binary Hypothesis Testing: Unknown µ

$$
H_1:\ p_1(y) = \prod_{k=1}^{n} \pi(y_k), \qquad H_2:\ p_2(y) = \prod_{k=1}^{n} \mu(y_k)
$$

If µ is unknown: for any given δ there exists µ such that α = 0

No exponential consistency!

Page 16: Universal Outlier Hypothesis Testing

Outlier Hypothesis Testing: unknown µ

µ unknown: µ̂_i = γ_i ← empirical distribution

$$
H_i:\quad \hat{p}_i\bigl(y^{Mn}\bigr) = \prod_{k=1}^{n} \Bigl[ \hat{\mu}_i\bigl(y_k^{(i)}\bigr) \prod_{j \neq i} \pi\bigl(y_k^{(j)}\bigr) \Bigr]
$$

          1   2   …   M
H_1:      µ   π   …   π
H_2:      π   µ   …   π
 ⋮
H_M:      π   π   …   µ

Generalized Likelihood (GL) rule:

$$
\delta_{GL}\bigl(y^{Mn}\bigr) = \arg\max_{i} \log \hat{p}_i\bigl(y^{Mn}\bigr)
$$

Exponential consistency:

$$
\alpha\bigl(\delta_{GL}, (\mu, \pi)\bigr) = 2B(\mu, \pi)
$$

Same as for known (µ, π)

Page 18: Universal Outlier Hypothesis Testing

Key Tool: Sanov’s Theorem

•  Sanov’s Theorem: For i.i.d. rvs Y^n ∼ p, the exponent of the probability that the random empirical distribution falls in a closed set E is

$$
\lim_{n \to \infty} -\frac{1}{n} \log P\bigl\{ \mathrm{Empirical}(Y^n) \in E \bigr\} = \min_{q \in E} D(q \,\|\, p)
$$

[Figure: the set E, the distribution p, and the minimizer q* in E, with divergence D(q* ‖ p)]
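
As a sanity check of the statement, the sketch below looks at a binary special case chosen for this illustration: Y_k ∼ Bernoulli(p) and E = {q : q(1) ≥ a} with a > p, for which the minimizer is Bernoulli(a). It computes the exact binomial tail in the log domain and compares −(1/n) log P with D(q* ‖ p). The function names are hypothetical.

```python
import math

def kl(q, p):
    """KL divergence D(q || p) in nats for finite distributions."""
    return sum(qi * math.log(qi / pj) for qi, pj in zip(q, p) if qi > 0)

def log_binom_tail(n, p, k_min):
    """log P{X >= k_min} for X ~ Binomial(n, p), computed with log-sum-exp."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def sanov_rate(n, p, a):
    """-(1/n) log P{fraction of ones >= a} for n i.i.d. Bernoulli(p) draws."""
    return -log_binom_tail(n, p, math.ceil(a * n)) / n

if __name__ == "__main__":
    p, a = 0.3, 0.5
    target = kl([1 - a, a], [1 - p, p])
    for n in (100, 1000, 10000):
        print(n, sanov_rate(n, p, a), "limit:", target)
```

The finite-n rate sits slightly above the limit and approaches it as n grows, since the polynomial prefactor contributes a term that vanishes after normalizing by n.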

Page 19: Universal Outlier Hypothesis Testing

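Proposed Universal Test

(µ, π) not known; use empirical distributions:

$$
\hat{\mu}_i = \gamma_i, \qquad \hat{\pi}_i = \frac{1}{M-1} \sum_{j \neq i} \gamma_j
$$

$$
H_i:\quad \hat{\hat{p}}_i\bigl(y^{Mn}\bigr) = \prod_{k=1}^{n} \Bigl[ \hat{\mu}_i\bigl(y_k^{(i)}\bigr) \prod_{j \neq i} \hat{\pi}_i\bigl(y_k^{(j)}\bigr) \Bigr]
$$

          1   2   …   M
H_1:      µ   π   …   π
H_2:      π   µ   …   π
 ⋮
H_M:      π   π   …   µ

Generalized Likelihood (GL) rule:

$$
\delta_{GL}\bigl(y^{Mn}\bigr) = \arg\max_{i} \log \hat{\hat{p}}_i\bigl(y^{Mn}\bigr)
$$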
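
The GL rule can be written directly from the plug-in likelihood. The following is a minimal sketch rather than the authors' implementation: it assumes a finite alphabet {0, …, |Y|−1}, equal-length sequences, 0-based indexing, and M ≥ 3, and all function names are hypothetical. It evaluates the per-sample log-likelihood (1/n) log of the plug-in likelihood through the empirical distributions γ_j.

```python
import math

def empirical(seq, alphabet_size):
    """Empirical distribution (normalized symbol counts) of one sequence."""
    counts = [0] * alphabet_size
    for y in seq:
        counts[y] += 1
    return [c / len(seq) for c in counts]

def gl_outlier(data, alphabet_size):
    """Generalized likelihood rule with mu_hat_i = gamma_i and
    pi_hat_i = average of gamma_j over j != i. Returns the 0-based argmax."""
    M = len(data)
    gammas = [empirical(seq, alphabet_size) for seq in data]
    scores = []
    for i in range(M):
        pi_hat = [sum(gammas[j][y] for j in range(M) if j != i) / (M - 1)
                  for y in range(alphabet_size)]
        # Contribution of the candidate outlier: sum_y gamma_i(y) log gamma_i(y).
        score = sum(g * math.log(g) for g in gammas[i] if g > 0)
        # Contribution of the other sequences: sum_y gamma_j(y) log pi_hat_i(y).
        for j in range(M):
            if j != i:
                score += sum(g * math.log(ph)
                             for g, ph in zip(gammas[j], pi_hat) if g > 0)
        scores.append(score)
    return max(range(M), key=scores.__getitem__)

if __name__ == "__main__":
    typical = [0, 0, 0, 1, 0, 0, 1, 0]
    odd = [1, 1, 1, 0, 1, 1, 0, 1]
    print(gl_outlier([typical, typical, odd, typical], alphabet_size=2))
```

Because the sum of the negative entropies of all γ_j does not depend on i, maximizing this score is the same as minimizing the sum over j ≠ i of D(γ_j ‖ π̂_i).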

Page 21: Universal Outlier Hypothesis Testing

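Performance of Universal Test

$$
\alpha\bigl(\delta, (\mu, \pi)\bigr) = \min_{\substack{p_1, \ldots, p_M : \\ \sum_{j \neq 1} D\bigl(p_j \,\|\, \frac{1}{M-1} \sum_{k \neq 1} p_k\bigr) \;\ge\; \sum_{j \neq 2} D\bigl(p_j \,\|\, \frac{1}{M-1} \sum_{k \neq 2} p_k\bigr)}} \; D(p_1 \,\|\, \mu) + D(p_2 \,\|\, \pi) + \cdots + D(p_M \,\|\, \pi) \;>\; 0, \quad \forall\, (\mu, \pi)
$$

Universal exponential consistency!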

Page 22: Universal Outlier Hypothesis Testing

Asymptotic Optimality

•  Motivation: when only π is known, the optimal error exponent is 2B(µ, π)

•  The estimate of π satisfies

$$
\lim_{n \to \infty} \frac{1}{M} \sum_{i=1}^{M} \gamma_i = \frac{1}{M}\mu + \frac{M-1}{M}\pi, \qquad \lim_{M \to \infty} \Bigl( \frac{1}{M}\mu + \frac{M-1}{M}\pi \Bigr) = \pi
$$
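
A short numerical illustration of the second limit: the mixture (1/M)µ + ((M−1)/M)π moves toward π at rate 1/M. The sketch uses total variation distance as the yardstick, which is a choice made here for illustration, and the function names are hypothetical.

```python
def pooled_limit(mu, pi, M):
    """Mixture (1/M) * mu + ((M - 1)/M) * pi."""
    return [(m + (M - 1) * p) / M for m, p in zip(mu, pi)]

def total_variation(p, q):
    """Total variation distance between two finite distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

if __name__ == "__main__":
    mu, pi = [0.36, 0.64], [0.64, 0.36]
    for M in (3, 10, 100, 1000):
        mix = pooled_limit(mu, pi, M)
        print(M, mix, total_variation(mix, pi))
```

The distance to π equals the total variation between µ and π divided by M, so the contamination from the single outlier becomes negligible for large M.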

Page 24: Universal Outlier Hypothesis Testing

Asymptotic Optimality

Our universal outlier detector achieves an error exponent lower bounded by

$$
\min_{q \,:\, D(q \,\|\, \pi) \;\le\; \frac{1}{M-1} \bigl( 2B(\mu, q) + C_\pi \bigr)} \; 2B(\mu, q)
$$

This lower bound is non-decreasing in M ≥ 3, and converges to 2B(µ, π) as M → ∞

Page 25: Universal Outlier Hypothesis Testing

Numerical Results

[Figure: lower bound of the error exponent plotted against log2(M), where M is the number of the coordinates, together with the reference level 2B(µ, π) for µ = (0.36, 0.64), π = (0.64, 0.36). Horizontal axis: log2(M), from 0 to 25. Vertical axis: lower bound of the true error exponent, from 0 to 0.09.]

Page 28: Universal Outlier Hypothesis Testing

Discussion

•  Generalized likelihood (GL) test is universally exponentially consistent for single outlier hypothesis testing

•  GL test is asymptotically optimum in error exponent for large M

•  What if we have more than one outlier?
   o  If it is known that we have exactly K << M outliers, then the GL test is still universally exponentially consistent
   o  If the number of outliers is not known a priori, then a universally exponentially consistent test does not exist!