sequential change-point detection based on nearest neighbors


Sequential Change-Point Detection Based on Nearest Neighbors

Hao Chen

Department of Statistics, University of California, Davis

February 2018

*This work is partially supported by NSF-DMS 1513653.

Control chart

How about

monitoring multiple streams?

monitoring non-Euclidean data?

Modern data examples

fMRI:

Social networks:

. . .

Outline

1 Graph-based two-sample test

2 Offline change-point detection

3 Sequential (Online) change-point detection

4 An application


Graph-based two-sample test

Assume we already have a similarity measure on the sample space.

Two samples from the same distribution:

# of NNs from the other sample: 27


Two-sample test based on nearest neighbors

Two samples from different distributions:

# of NNs from the other sample: 8


Two-sample test based on k-nearest neighbors

Let $y_1, \ldots, y_n$ be the pooled observations of the two samples.

$$g_i = \begin{cases} 1 & \text{if } y_i \text{ belongs to sample 1,} \\ 0 & \text{if } y_i \text{ belongs to sample 2.} \end{cases}$$

$$a_{ij}^{(r)} = \begin{cases} 1 & \text{if } y_j \text{ is the } r\text{th nearest neighbor of } y_i, \\ 0 & \text{otherwise.} \end{cases}$$

$$a_{ij}^{+} = \sum_{r=1}^{k} a_{ij}^{(r)}.$$

Number of nearest neighbors from the other sample:

$$X = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \big(a_{ij}^{+} + a_{ji}^{+}\big)\, I(g_i \neq g_j)$$

[Schilling, 1986; Henze, 1988]
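The cross-sample count can be computed in a few lines. Below is a minimal sketch, assuming Euclidean data and plain numpy (the method itself only requires a similarity measure); the function name `knn_cross_count` is illustrative, not from the original work.

```python
import numpy as np

def knn_cross_count(sample1, sample2, k=1):
    """k-NN two-sample count statistic (Schilling 1986; Henze 1988).

    Since each ordered pair (i, j) contributes (a+_ij + a+_ji)/2 over both
    orderings, X equals the number of directed NN edges whose endpoints
    lie in different samples. Inputs are float arrays of shape (m, d).
    """
    pooled = np.vstack([sample1, sample2])
    labels = np.r_[np.zeros(len(sample1)), np.ones(len(sample2))]
    # pairwise squared Euclidean distances
    d2 = ((pooled[:, None, :] - pooled[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbors of each point
    cross = labels[nbrs] != labels[:, None]  # NN edges crossing the samples
    return int(cross.sum())
```

Under the null the count is close to its expectation; two well-separated samples yield a small count, which is the basis of the test.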

Outline

1 Graph-based two-sample test

2 Offline change-point detection

3 Sequential (Online) change-point detection

4 An application

Offline change-point detection based on NNs

Observation sequence:


Offline change-point detection based on NNs

Number of NNs from the other sample:

Standardize the count:

$$X(t) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \big(a_{ij}^{+} + a_{ji}^{+}\big)\, I(g_i(t) \neq g_j(t)), \quad g_i(t) = I(i > t).$$

Expectation and variance under the permutation null distribution:

$$E(X(t)) = \frac{2kt(n-t)}{n-1},$$

$$\mathrm{Var}(X(t)) = \frac{t(n-t)}{n-1} \left( h(t, n-t)\left(q_{1,n} + k - \frac{2k^2}{n-1}\right) + \big(1 - h(t, n-t)\big)\big(q_{2,n} + k - k^2\big) \right),$$

where

$$h(t, n-t) = \frac{4t(n-t)}{(n-2)(n-3)}, \quad q_{1,n} = \frac{1}{n} \sum_{i,j} a_{ij}^{+} a_{ji}^{+}, \quad q_{2,n} = \frac{1}{n} \sum_{i \neq j;\, l} a_{il}^{+} a_{jl}^{+}.$$

Standardized count:

$$Z(t) = -\frac{X(t) - E(X(t))}{\sqrt{\mathrm{Var}(X(t))}}$$

In contrast: no change-point.

Test statistic: $\max_{n_0 \leq t \leq n - n_0} Z(t)$
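The scan over candidate change-points can be sketched directly from these formulas. A minimal illustration, assuming the directed k-NN adjacency matrix `A` (A[i, j] = 1 iff y_j is among the k nearest neighbors of y_i) has already been built; the function name and layout are illustrative, not the author's code.

```python
import numpy as np

def standardized_counts(A, k, n0):
    """Return (t values, Z(t)) for t = n0, ..., n - n0."""
    n = A.shape[0]
    q1 = (A * A.T).sum() / n                   # q_{1,n}: mutual-NN quantity
    indeg = A.sum(axis=0)                      # in-degrees of the k-NN graph
    q2 = ((indeg ** 2).sum() - n * k) / n      # q_{2,n}: shared-NN quantity
    ts = np.arange(n0, n - n0 + 1)
    Z = np.empty(len(ts))
    for m, t in enumerate(ts):
        g = np.arange(n) >= t                  # g_i(t) = I(i > t), 0-indexed
        diff = g[:, None] != g[None, :]
        X = 0.5 * ((A + A.T) * diff).sum()     # cross count X(t)
        E = 2 * k * t * (n - t) / (n - 1)
        h = 4 * t * (n - t) / ((n - 2) * (n - 3))
        V = t * (n - t) / (n - 1) * (
            h * (q1 + k - 2 * k**2 / (n - 1))
            + (1 - h) * (q2 + k - k**2))
        Z[m] = -(X - E) / np.sqrt(V)           # note the sign convention
    return ts, Z
```

The test statistic is then `Z.max()`; a large value suggests a change-point near the maximizing `t`.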

Outline

1 Graph-based two-sample test

2 Offline change-point detection

3 Sequential (Online) change-point detection

4 An application

Online change-point detection based on NNs

$N_0$ historical observations: $y_1, \ldots, y_{N_0}$

Subsequent observations: $y_{N_0+1}, y_{N_0+2}, \ldots, y_n, \ldots$

$Z_n(t)$: standardized count for the sequence $y_1, \ldots, y_n$.

$$\max_{n_0 \leq t \leq n - n_0} Z_n(t)$$


Stopping Time

$$T_1 = \inf\left\{ n - N_0 : \max_{n_0 \leq t \leq n - n_0} Z_n(t) > b_1 \right\}$$

$$T_2 = \inf\left\{ n - N_0 : \max_{n - n_1 \leq t \leq n - n_0} Z_n(t) > b_2 \right\}$$

$$T_3 = \inf\left\{ n - N_0 : \max_{n - n_1 \leq t \leq n - n_0} Z_{n,L}(t) > b_3 \right\},$$

$Z_{n,L}(t)$: standardized count for observations $y_{n-L+1}, \ldots, y_n$.
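The rule $T_3$, which scans only the most recent observations, has a simple online structure. A schematic sketch; here `max_stat` stands in for the windowed scan $\max_{n-n_1 \le t \le n-n_0} Z_{n,L}(t)$ and `b3` for the threshold, both placeholders rather than the author's implementation.

```python
def first_alarm(stream, N0, b3, max_stat):
    """Process observations one by one; return the stopping time n - N0 at
    which the scan statistic first exceeds b3, or None if it never does."""
    history = []
    for n, y in enumerate(stream, start=1):
        history.append(y)
        if n <= N0:                # still collecting historical observations
            continue
        if max_stat(history) > b3:
            return n - N0          # stopping time counts new observations only
    return None
```

Because $Z_{n,L}(t)$ depends only on the last $L$ observations, the per-step cost stays bounded as the stream grows.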


Detection Delay

Average run length: $E_\infty(T)$. Expected detection delay: $E_r(N - r \mid N > r)$.

Threshold $b$ selected subject to $P_\infty(T < 1{,}000) = 0.05$.

$r - N_0 = 200$.


Early stops (False discovery)

Threshold $b$ selected subject to $P_\infty(T < 1{,}000) = 0.05$.

False discovery rate at 200 new observations after the start of the test:

        T1      T2      T3
1-NN    0.0178  0.0205  0.0107
3-NN    0.0148  0.0183  0.0103

Average run length

$$T = \inf\left\{ n : \max_{n - n_1 \leq t \leq n - n_0} Z_{n,L}(t) > b \right\},$$

$$E_\infty(T_b) = 10{,}000 \;\Rightarrow\; b = ?$$


Average run length

$$T = \inf\left\{ n : \max_{n - n_1 \leq t \leq n - n_0} Z_{n,L}(t) > b \right\},$$

Theorem

Suppose $L, b, n_0, n_1 \to \infty$ in such a way that $b = c\sqrt{L}$, $n_0 = u_0 L$ and $n_1 = u_1 L$ for some fixed $0 < c < \infty$, $0 < u_0 < u_1 < 1$. When there is no change-point, $T$ is asymptotically exponentially distributed with expectation

$$E_\infty(T_b) \sim \frac{\sqrt{2\pi}\, \exp(b^2/2)}{c^2\, b \int_{u_0}^{u_1} h_1(u)\, h_2(u)\, \nu\!\big(c\sqrt{2 h_1(u)}\big)\, \nu\!\big(c\sqrt{2 h_2(u)}\big)\, du},$$

where

$$h_1(u) = \big[ 16u(1-u)(k + p_{k,\infty}) + 2(1-2u)^2 (q_{k,\infty} - k^2 + k) \big] \big/ \sigma^2(u),$$

$$h_2(u) = \big[ 16u^2(1-u)^2 \big( p_{k,\infty} + q_{k,\infty} + k^2 + 2p^{(k)}_{k,\infty} - 2q^{(k)}_{k,\infty} \big) + 4u(1-u)\big( 2q^{(k)}_{k,\infty} - 3q_{k,\infty} + k^2 + k \big) + 2\big( q_{k,\infty} - k^2 + k \big) \big] \big/ \sigma^2(u),$$

$$\sigma(u) = 4u(1-u)\big( 4u(1-u)(k + p_{k,\infty}) + (1-2u)^2 (q_{k,\infty} - k^2 + k) \big).$$
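The approximation can be evaluated numerically to pick the threshold $b$. A sketch using a standard closed-form approximation to Siegmund's special function $\nu(\cdot)$; the constant integration limits and the stand-in functions passed for $h_1, h_2$ below are purely illustrative, not values from the talk.

```python
import math

def nu(x):
    """Closed-form approximation to Siegmund's special function nu(x), x > 0."""
    z = x / 2.0
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) CDF
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf
    return (2.0 / x) * (Phi - 0.5) / (z * Phi + phi)

def arl(b, L, h1, h2, u0, u1, steps=200):
    """E_inf(T_b) from the theorem; h1 and h2 are callables h(u)."""
    c = b / math.sqrt(L)
    du = (u1 - u0) / steps
    integral = 0.0
    for i in range(steps):                  # midpoint rule over [u0, u1]
        u = u0 + (i + 0.5) * du
        integral += (h1(u) * h2(u)
                     * nu(c * math.sqrt(2.0 * h1(u)))
                     * nu(c * math.sqrt(2.0 * h2(u))) * du)
    return math.sqrt(2.0 * math.pi) * math.exp(b * b / 2.0) / (c * c * b * integral)

def threshold_for_arl(target, L, h1, h2, u0, u1, lo=1.0, hi=10.0):
    """Bisect for b with E_inf(T_b) = target (the ARL grows rapidly in b)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if arl(mid, L, h1, h2, u0, u1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In practice $h_1, h_2$ would be built from the $p_{k,\infty}, q_{k,\infty}$ quantities of the theorem (or their plug-in estimates).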

Mutual NN and Shared NN

Mutual NN:

$$p_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i,j} a^{+}_{n,ij}\, a^{+}_{n,ji} \right], \quad p^{(k)}_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i,j} a^{+}_{n,ij}\, a^{(k)}_{n,ji} \right]$$

Shared NN:

$$q_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i;\, j \neq l} a^{+}_{n,ji}\, a^{+}_{n,li} \right], \quad q^{(k)}_{k,\infty} = \lim_{n\to\infty} E\!\left[ \frac{1}{n} \sum_{i;\, j \neq l} a^{+}_{n,ji}\, a^{(k)}_{n,li} \right]$$

For multivariate data under the Euclidean distance, $p_{k,\infty}$, $q_{k,\infty}$, $p^{(k)}_{k,\infty}$, $q^{(k)}_{k,\infty}$ can be expressed as analytic functions of the dimension of the data.

In practice, it is better to use $p_{k,L}$, $q_{k,L}$, $p^{(k)}_{k,L}$, $q^{(k)}_{k,L}$ estimated from the data.
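The plug-in estimates from a window of $L$ observations reduce to sums over the k-NN graph. A sketch, assuming a directed adjacency matrix `A` (A[i, j] = 1 iff y_j is among the k NNs of y_i) and `Ak` marking only the k-th nearest neighbor; the function and variable names are illustrative.

```python
import numpy as np

def nn_quantities(A, Ak):
    """Plug-in estimates of the mutual-NN and shared-NN quantities."""
    L = A.shape[0]
    p_kL = (A * A.T).sum() / L                        # mutual NN pairs
    pk_kL = (A * Ak.T).sum() / L                      # mutual with k-th NN
    indeg = A.sum(axis=0)                             # in-degrees
    # shared NN: sum over j != l of a+_ji a+_li, via in-degrees minus diagonal
    q_kL = ((indeg ** 2).sum() - (A * A).sum()) / L
    qk_kL = ((indeg * Ak.sum(axis=0)).sum() - (A * Ak).sum()) / L
    return p_kL, pk_kL, q_kL, qk_kL
```

For $k = 1$, `Ak` coincides with `A`, so the paired quantities agree.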


How does the asymptotic result work for finite L?

L = 200.

Check the threshold $b$ such that $E_\infty(T) = 10{,}000$.

Multivariate Gaussian data.

                   n0 = 3                n0 = 10
                   Monte Carlo*  Asymp.  Monte Carlo  Asymp.
d = 10,  k = 1     4.04          4.40    4.04         4.31
d = 10,  k = 3     4.14          4.34    4.14         4.23
d = 100, k = 1     3.76          4.37    3.76         4.26
d = 100, k = 3     3.78          4.33    3.78         4.20

* 10,000 simulation runs.

Skewness correction

$$E_\infty(T_3) \sim \frac{\sqrt{2\pi}\, \exp(b^2/2)}{c^2\, b \int_{u_0}^{u_1} S(u)\, h_1(u)\, h_2(u)\, \nu\!\big(c\sqrt{2h_1(u)}\big)\, \nu\!\big(c\sqrt{2h_2(u)}\big)\, du}$$

$S(u)$ depends on the probabilities of the following events:


Skewness Correction

Check the threshold $b$ such that $E_\infty(T) = 10{,}000$.

                   n0 = 3                                 n0 = 10
                   Monte Carlo  Skew. Corrected  Asymp.   Monte Carlo  Skew. Corrected  Asymp.
d = 10,  k = 1     4.04         4.07             4.40     4.04         4.07             4.31
d = 10,  k = 3     4.14         4.14             4.34     4.14         4.14             4.23
d = 100, k = 1     3.76         3.79             4.37     3.76         3.79             4.26
d = 100, k = 3     3.78         3.79             4.33     3.78         3.79             4.20

Power assessment

Percentage of trials (out of 1,000) that the method successfully detects the change-point.

"Successful detection": the change-point is detected within 100 observations after it occurs.

                   Normal data          Lognormal data
                   d = 10    d = 100    d = 10    d = 100
                   Δ = 0.7   Δ = 1.8    Δ = 1.5   Δ = 2
1-NN               0.02      0.21       0.48      0.08
3-NN               0.07      0.55       0.87      0.48
5-NN               0.15      0.81       0.95      0.77
Hotelling's T²     0.69      0.63       0.34      0.02

Δ: change in the mean parameter.

Outline

1 Graph-based two-sample test

2 Offline change-point detection

3 Sequential (Online) change-point detection

4 An application

Is there a change in phone call pattern?

Mobile phone data collected by the MIT Media Lab

87 students and faculty

7/20/2004 – 6/14/2005

$M_t$: adjacency matrix for day $t$; element $[i, j]$ is 1 if subject $i$ called subject $j$ on day $t$.

We consider two distances:

The number of different entries: $\|M_{t_1} - M_{t_2}\|_F^2$.

The number of different entries, normalized by the geometric mean of the total edges in each day: $\dfrac{\|M_{t_1} - M_{t_2}\|_F^2}{\|M_{t_1}\|_F\, \|M_{t_2}\|_F}$.
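Both distances reduce to a few numpy operations on the 0/1 daily matrices: for a binary matrix the squared Frobenius norm counts its edges, so the normalizer is the geometric mean of the two daily edge totals. A small sketch (function names are illustrative):

```python
import numpy as np

def dist_entries(M1, M2):
    """Number of differing entries = squared Frobenius norm of M1 - M2."""
    return float(((M1 - M2) ** 2).sum())

def dist_normalized(M1, M2):
    """Same count, divided by ||M1||_F * ||M2||_F, i.e. the geometric mean
    of the two daily edge counts for 0/1 adjacency matrices."""
    return dist_entries(M1, M2) / (np.linalg.norm(M1) * np.linalg.norm(M2))
```

Either distance can then feed the k-NN graph construction on the sequence of daily networks.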


Phone-call network

Stopping times and nearby academic events

Distance 1           Distance 2           Nearby academic event*
n = 66:  2004/9/23   n = 60:  2004/9/17   9/9: first day of class for Fall term
n = 166: 2005/1/1    n = 140: 2004/12/6   12/18: last day of class for Fall term
n = 198: 2005/2/2    n = 194: 2005/1/29   2/2: first day of class for Spring term
—                    n = 252: 2005/3/28   3/21: Spring vacation

* The dates of the academic events are from the 2015–2016 academic calendar of MIT, as the 2004–2005 academic calendar of MIT cannot be found online.

Summary

Sequential change-point detection based on nearest neighbors can be applied to multivariate data and non-Euclidean data as long as a similarity measure on the sample space can be well defined.

The stopping time based on the recent observations is recommended. Its asymptotic distribution is derived and shown to be quite accurate in finite-sample scenarios after skewness correction. This makes the method an easy off-the-shelf approach to real problems.

Thank You!

