universal outlier hypothesis testing

Universal Outlier Hypothesis Testing

Venu Veeravalli ECE Dept & CSL & ITI

University of Illinois at Urbana-Champaign http://www.ifp.illinois.edu/~vvv

(with Sirin Nitinawarat and Yun Li)

NSF Workshop on Signal Processing for Big Data

March 22, 2013

2

Statistical Outlier Detection

•  Single sequence of observations

•  Normal observations follow some fixed (possibly unknown) distribution or generating mechanism

•  Outliers follow different generating mechanism

•  Goal: To find outliers efficiently

•  Applications: fraud detection, public health monitoring, cleaning up sensor data, etc.

v

Veeravalli, NSF Big Data 3/22/13

3

Fraud Detection

•  Example: spending records for a male graduate student

Trans-actions

Grocery Gas Books ---

Amount 30$ 35$ 75$

Normal behavior


4

Fraud Detection

•  Example: spending records for a male graduate student

Trans-actions

Grocery Gas Books --- Spa Women apparels

Cosmetics

Amount 30$ 35$ 75$ 250$ 1500$ 500$

Normal behavior Abnormal behavior


5

Fraud Detection: Group Monitoring

• Male graduate students

Student 1 Student 2 Student 3 … Student M

Grocery Dining Grocery … Gas

Dining Grocery Gas … Books

Books Movie Books … Grocery

… … … … …

Movie Books Spa … Dining

Grocery Gas Women apparel … Movie

Gas Books Cosmetics … Grocery


6

Outlier Hypothesis Testing

• M sequences of observations, with M large •  Almost all sequences are generated from

common typical distribution

•  Small subset of sequences generated from different (outlier) distributions


7

Outlier Hypothesis Testing

• M sequences of observations, with M large •  Almost all sequences are generated from

common typical distribution

•  Small subset of sequences generated from different (outlier) distributions

•  Special case: o  Exactly one sequence is generated from outlier

distribution

o Goal: to detect outlier sequence efficiently o Universal setting: neither typical nor outlier

distributions known; no training data provided


8

Universal Outlier Hypothesis Testing

•  Typical distribution

• Outlier distribution

8

µπ

πµ π π ππ µ π π ππ µπ π π

π µππ ππ π ππ µ

21 MH1

H

2

H

M


9

Mathematical Model

yn(1) y

n(2) … y

n(M )

y1(1) y

1(2) … y

1(M )

y2(1) y

2(2) … y

2(M )

H

i: p

iy(1), …, y(M )( ) = µ y

k(i )( ) π y

k( j )( )

j ≠ i∏

⎡

⎣⎢⎢

⎤

⎦⎥⎥

k = 1

n

∏

y(1)

y(2)

y(M )


10

Hypothesis Testing Problem

Universal Detector: δ : Y Mn → 1, …, M{ }

H

i: p

iyMn( ) = µ y

k(i )( ) π y

k( j )( )

j ≠ i∏

⎡

⎣⎢⎢

⎤

⎦⎥⎥

k = 1

n

∏

Nothing is known about (µ,π ) except that they are distinct

Independent of (µ,π )


11

Performance Metrics

• Maximal error probability:

•  Exponent for maximal error probability:

α δ, µ, π( )( ) = lim

n→∞ −

1

nloge δ, µ, π( )( )

e δ, µ, π( )( ) = max

iP

iδ(yMn )≠ i{ }


12

Performance Metrics

• Maximal error probability:

•  Exponent for maximal error probability:

α δ, µ, π( )( ) = lim

n→∞ −

1

nloge δ, µ, π( )( )

e δ, µ, π( )( ) = max

iP

iδ(yMn )≠ i{ }

Consistency : e→ 0 as n→∞Exponential Consistency : α> 0


13

Background: Binary Hypothesis Testing

H

1: p

1(y) =

k=1

n

∏ π(yk) H

2: p

2(y) =

k=1

n

∏µ(yk)

Chernoff Info :C(µ,π) = max

0≤s≤1− log µ

y∑ (y )sπ(y )1−s⎛

⎝⎜⎜⎜⎜

⎞

⎠⎟⎟⎟⎟

If (µ,π) known, δML

(y) = argmaxi

logpi(y)

has α(δ,(µ,π)) = C(µ,π) > 0 ← exponential consistency


14

Outlier Hypothesis testing: known (µ,π )

Hi: p

iyMn( ) = µ y

k(i )( ) π y

k( j )( )

j ≠ i∏

⎡

⎣⎢⎢

⎤

⎦⎥⎥

k = 1

n

∏

ML Rule : δ

ML(yMn ) = argmax

i logp

i(yMn )



21 MH1

H

2

H

M

Exponential Consistency : α(δML

,(µ,π)) = 2B(µ,π)

Bhattacharya Distance :B(µ,π) = −log µy∑ (y )1/2π(y )1/2⎛

⎝⎜⎜⎜⎜

⎞

⎠⎟⎟⎟⎟


15

Binary Hypothesis Testing: Unknown

H

1: p

1(y) =

k=1

n

∏ π(yk) H

2: p

2(y) =

k=1

n

∏µ(yk)

µ

If µ unknown

for any given δ there existsµ s.t. α = 0

No exponential consistency!


16

Outlier Hypothesis Testing: unknown µ

H

i: p̂

iyMn( ) = µ̂

iy

k(i )( ) π y

k( j )( )

j ≠ i∏

⎡

⎣⎢⎢

⎤

⎦⎥⎥

k = 1

n

∏



21 MH1

H

2

H

M

Generalized Likelihood (GL) Rule :

δGL

(yMn ) = argmaxi

logp̂i(yMn )

Exponential Consistency : α(δ

GL,(µ,π)) = 2B(µ,π)

Same as known µ,π

µ uknown: µ̂i

= γi← empirical Distribution


17

Sanov’s Theorem

•  Sanov’s Theorem: For i.i.d. rvs exponent of probability that random empirical distribution falls in closed set E is

Yn p,


18

Key Tool: Sanov’s Theorem

•  Sanov’s Theorem: For i.i.d. rvs exponent of probability that random empirical distribution falls in closed set E is

Yn p,

lim

n −

1

nlogP Empirical Y n( ) ∈ E{ } = min

q∈E D q || p( )

E

p

D q* || p( )

q*


19

Proposed Universal Test

H

i: ˆ̂p

iyMn( ) = µ̂

iy

k(i )( ) π̂ y

k( j )( )

j ≠ i∏

⎡

⎣⎢⎢⎢

⎤

⎦⎥⎥⎥k = 1

n

∏



21 MH1

H

2

H

M

µ,π( ) not known : µ̂

i= γ

iπ̂

i=

1

M −1γ

jj≠i∑

Generalized Likelihood (GL) Rule :

δGL

(yMn ) = argmaxilog ˆ̂p

i(yMn )

Empirical distributions


20

Performance of Universal Test

α δ, µ, π( )( )= minp1, …., pM

D(p1|| µ)+D(p

2|| π)+…+D(p

M|| π)

D(p

j||

1

M −1p

kk≠1∑ )

j≠1∑ ≥ D(p

j||

1

M −1p

kk≠2∑ )

j≠2∑


21

Universally exponential consistency!

Performance of Universal Test

α δ, µ, π( )( )= minp1, …., pM

D(p1|| µ)+D(p

2|| π)+…+D(p

M|| π)

D(p

j||

1

M −1p

kk≠1∑ )

j≠1∑ ≥ D(p

j||

1

M −1p

kk≠2∑ )

j≠2∑

> 0, ∀ µ,π( )


22

Asymptotic Optimality

• Motivation: When only is known, optimal error exponent is

•  Estimate of satisfies

π

2B µ, π( )

π

limn→∞

1

Mγ

ii=1

M

∑ = 1

Mµ +

M −1

Mπ,

limM→∞

1

Mµ +

M −1

Mπ = π


23

Our universal outlier detector achieves error exponent lower bounded by


min q: D (q||π) ≤

1

M−1( 2B(µ, q )+Cπ )

2B(µ, q)


24

Our universal outlier detector achieves error exponent lower bounded by

This lower bound is non-decreasing in

and converges to as


min q: D (q||π) ≤

1

M−1( 2B(µ, q )+Cπ )

2B(µ, q)

M ≥3,

2B(µ, π) M →∞


25

Numerical Results

5 10 15 20 250

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

log2 (M), where M is the number of the coordinates

Low

er B

ound

of t

he T

rue

Erro

r Exp

onen

t

Lower bound of the errorexponent

2 B (µ, π), µ=(0.36, 0.64)π=(0.64, 0.36)

M Veeravalli, NSF Big Data 3/22/13

26

Discussion

•  Generalized likelihood (GL) test is universally exponentially consistent for single outlier hypothesis testing •  GL test is asymptotically optimum in error

exponent for large M • What if we have more than one outlier?


27

Discussion



o  If it is known that we have exactly K << M outliers, then GL test is still universally exponentially consistent


28

Discussion



o  If it is known that we have exactly K << M outliers, then GL test is still universally exponentially consistent

o  If number of outliers is not known a priori, then universally exponentially does not exist!


universal outlier hypothesis testing

Documents