Almost optimal sequential detection in multiple data streams


Page 1: Almost optimal sequential detection in multiple data streams

Almost optimal sequential detection in multiple data streams

Georgios Fellouris

Department of Statistics

University of Illinois

Joint work with Alexander Tartakovsky

University of Michigan, Ann Arbor, May 13th, 2015


Page 2: Almost optimal sequential detection in multiple data streams

Outline

1 A simple null against a simple alternative

2 A simple null against a finite number of alternatives

3 The continuous-parameter case


Page 3: Almost optimal sequential detection in multiple data streams

Sequential testing of two simple hypotheses

Sequentially acquired observations

$$X_1, \ldots, X_t, \ldots \overset{\text{iid}}{\sim} f.$$

Stop sampling as soon as possible and distinguish between

H0 : f = f0 and H1 : f = f1.

Let Ft be the history of observations up to time t,

Ft = σ(Xs : 1 ≤ s ≤ t).


Page 4: Almost optimal sequential detection in multiple data streams

Wald’s formulation

Find an $\mathcal{F}_t$-stopping time $T$, at which to stop sampling, and an $\mathcal{F}_T$-measurable r.v. $d_T$, so that

$$\{d_T = 1\} = \{\text{Accept } H_1,\; T < \infty\}, \qquad \{d_T = 0\} = \{\text{Accept } H_0,\; T < \infty\}.$$

A sequential test is such a pair $(T, d_T)$.

Goal: Minimize $E_0[T]$ and $E_1[T]$ in

$$\mathcal{C}_{\alpha,\beta} = \{(T, d_T) : P_0(d_T = 1) \le \alpha \ \text{ and } \ P_1(d_T = 0) \le \beta\}.$$


Page 5: Almost optimal sequential detection in multiple data streams

Wald’s SPRT (1945)

Let $Z_t$ be the log-likelihood ratio of the first $t$ observations:

$$Z_t := \sum_{s=1}^{t} \log \frac{f_1(X_s)}{f_0(X_s)}, \qquad Z_0 := 0.$$

Sequential Probability Ratio Test (SPRT)
Let $\alpha, \beta$ be such that $\alpha + \beta < 1$ and let $A, B > 0$ be fixed thresholds. Define

$$S = \inf\{t \ge 1 : Z_t \notin (-A, B)\}, \qquad d_S = \begin{cases} 0, & \text{if } Z_S \le -A, \\ 1, & \text{if } Z_S \ge B. \end{cases}$$
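For concreteness, here is a minimal sketch of the SPRT for a Gaussian mean shift, using Wald's classical approximate thresholds; the Gaussian model, the particular means, and the threshold formulas $A \approx \log((1-\alpha)/\beta)$, $B \approx \log((1-\beta)/\alpha)$ are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sprt(xs, A, B, mu0=0.0, mu1=1.0):
    """Run the SPRT on the stream xs; return (stopping time, decision, Z).

    decision: 1 = accept H1 (Z >= B), 0 = accept H0 (Z <= -A),
    None = the stream was exhausted before a boundary was reached.
    """
    Z = 0.0
    for t, x in enumerate(xs, start=1):
        # LLR increment log f1(x)/f0(x) for N(mu1, 1) vs N(mu0, 1)
        Z += (mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)
        if Z >= B:
            return t, 1, Z
        if Z <= -A:
            return t, 0, Z
    return len(xs), None, Z

# Wald-style approximate thresholds for target error probabilities alpha, beta
alpha, beta = 0.01, 0.01
A = np.log((1 - alpha) / beta)   # accept H0 once Z <= -A
B = np.log((1 - beta) / alpha)   # accept H1 once Z >= B

rng = np.random.default_rng(0)
xs = rng.normal(1.0, 1.0, size=10_000)   # data generated under H1
print(sprt(xs, A, B))
```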


Page 6: Almost optimal sequential detection in multiple data streams

An exact optimality property (Wald & Wolfowitz (1948))

Suppose that A,B are selected so that

P0(dS = 1) = α and P1(dS = 0) = β.

Then,

$$E_0[S] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_0[T] \qquad \text{and} \qquad E_1[S] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_1[T].$$


Page 7: Almost optimal sequential detection in multiple data streams

Optimal Asymptotic Performance (Woodroofe, 1982)

Suppose that $\beta|\log\alpha| + \alpha|\log\beta| \to 0$ and $E_i[Z_1^2] < \infty$, $i = 0, 1$.

Then, as $\alpha, \beta \to 0$,

$$E_1[S] = \frac{1}{I_1}\big[|\log\alpha| + \rho_1 + \log\delta_1 + o(1)\big], \qquad E_0[S] = \frac{1}{I_0}\big[|\log\beta| + \rho_0 + \log\delta_0 + o(1)\big].$$

$I_1 := D(f_1 \| f_0)$ and $I_0 := D(f_0 \| f_1)$ are the Kullback–Leibler information numbers.

Let $H_1$ be the asymptotic distribution of the overshoot of $Z$ over a large positive level under $P_1$. Then

$$\rho_1 := \int x\, H_1(dx) \qquad \text{and} \qquad \delta_1 := \int e^{-x}\, H_1(dx).$$

Let $H_0$ be the asymptotic distribution of the overshoot of $-Z$ over a large positive level under $P_0$. Then

$$\rho_0 := \int x\, H_0(dx) \qquad \text{and} \qquad \delta_0 := \int e^{-x}\, H_0(dx).$$
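The correction terms $\rho_1$ and $\delta_1$ are rarely available in closed form; a rough Monte Carlo sketch approximates them by simulating the overshoot of the LLR random walk over a large level. The Gaussian model, the level $a$, and the replication count below are assumptions for the sketch.

```python
import numpy as np

def overshoot_corrections(mu0=0.0, mu1=1.0, a=25.0, reps=5_000, seed=1):
    """Estimate rho1 = E[overshoot] and delta1 = E[exp(-overshoot)] for the
    LLR random walk of N(mu1, 1) vs N(mu0, 1), observed under P1."""
    rng = np.random.default_rng(seed)
    overshoots = np.empty(reps)
    for r in range(reps):
        Z = 0.0
        while Z < a:                      # run the walk until it first crosses level a
            x = rng.normal(mu1, 1.0)
            Z += (mu1 - mu0) * x - 0.5 * (mu1**2 - mu0**2)
        overshoots[r] = Z - a             # record the overshoot over the level
    rho1 = overshoots.mean()
    delta1 = np.exp(-overshoots).mean()
    return rho1, delta1

print(overshoot_corrections())
```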


Page 8: Almost optimal sequential detection in multiple data streams

Sequentially testing a simple null against a finite number of alternatives

Sequentially acquired observations

$$X_1, \ldots, X_t, \ldots \overset{\text{iid}}{\sim} f.$$

Stop sampling as soon as possible and distinguish between

H0 : f = f0 vs H1 : f ∈ {f1, . . . , fM}.

Let Ft be the history of observations up to time t,

Ft = σ(Xs : 1 ≤ s ≤ t).

Find (T, dT), where T is an Ft-stopping time and dT an FT -measurable r.v.

$$\{d_T = i\} = \{\text{Accept } H_i,\; T < \infty\}, \qquad i = 0, 1.$$


Page 9: Almost optimal sequential detection in multiple data streams

1st Motivation: The multichannel problem


Page 10: Almost optimal sequential detection in multiple data streams

1st Motivation: The multichannel problem

Observations are collected from $K$ independent sources, so that

$$X_t = (X_t^1, \ldots, X_t^K).$$

For every sensor $k$, the true density is $f^k$ and

$$X_1^k, \ldots, X_t^k, \ldots \overset{\text{iid}}{\sim} f^k = \begin{cases} f_0^k, & \text{noise} \\ f_1^k, & \text{signal} \end{cases}$$

We want to test the simple hypothesis

$$H_0 : f^k = f_0^k \quad \forall\, k \in \{1, \ldots, K\}$$

against

$$H_1 : f^k = \begin{cases} f_0^k, & k \notin \mathcal{A} \\ f_1^k, & k \in \mathcal{A} \end{cases}$$

where $\mathcal{A}$ is an unknown subset of $\{1, \ldots, K\}$.


Page 11: Almost optimal sequential detection in multiple data streams

1st Motivation: The multichannel problem

$\mathcal{A}$ is known to belong to some class $\mathcal{P}$ of subsets of $\{1, \ldots, K\}$.

Then $H_1$ contains $M = |\mathcal{P}|$ possibilities, where $|\mathcal{P}|$ is the size of the class $\mathcal{P}$.

When the signal can be present in only one sensor, $|\mathcal{P}| = K$.

When the signal can be present in at most $L$ sensors, $|\mathcal{P}| = \sum_{k=1}^{L} \binom{K}{k}$.
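A one-line check of these counts, for arbitrary illustrative values of $K$ and $L$:

```python
from math import comb

K, L = 10, 3
num_single = K                                              # signal in exactly one sensor
num_at_most_L = sum(comb(K, k) for k in range(1, L + 1))    # at most L affected sensors
print(num_single, num_at_most_L)                            # 10 175
```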


Page 12: Almost optimal sequential detection in multiple data streams

2nd motivation: Discretization of a continuous alternative

Sequentially acquired observations

$$X_1, \ldots, X_t, \ldots \overset{\text{iid}}{\sim} f \in \{f_\theta,\ \theta \in \Theta\}.$$

Stop sampling as soon as possible and distinguish between

H0 : θ = θ0 and H1 : θ ∈ Θ1,

where θ0 /∈ Θ1 ⊂ Θ.

Then, we would like to minimize Eθ0 [T] and Eθ[T] for every θ ∈ Θ1 in

$$\mathcal{C}_{\alpha,\beta} = \{(T, d_T) : P_{\theta_0}(d_T = 1) \le \alpha \ \text{ and } \ \sup_{\theta \in \Theta_1} P_\theta(d_T = 0) \le \beta\}.$$

Approximating Θ1 by {θ1, . . . , θM} ⊂ Θ1 may have (computational) benefits.


Page 13: Almost optimal sequential detection in multiple data streams

A simple null against a finite number of alternatives

Sequentially acquired observations

$$X_1, \ldots, X_t, \ldots \overset{\text{iid}}{\sim} f.$$

Let Ft be the history of observations up to time t,

Ft = σ(Xs : 1 ≤ s ≤ t).

Stop sampling as soon as possible and distinguish between

H0 : f = f0 vs H1 : f ∈ {f1, . . . , fM}.

Find (T, dT), where T is an Ft-stopping time and dT an FT -measurable r.v.

$$\{d_T = i\} = \{\text{Accept } H_i,\; T < \infty\}, \qquad i = 0, 1.$$


Page 14: Almost optimal sequential detection in multiple data streams

Sequentially testing a simple null against a finite number of alternatives

Pi is the probability measure and Ei the expectation when

f = fi, i = 0, 1, . . . ,M.

Goal: Minimize

$$E_0[T] \quad \text{and} \quad E_i[T], \ i = 1, \ldots, M,$$

among sequential tests in

$$\mathcal{C}_{\alpha,\beta} = \{(T, d_T) : P_0(d_T = 1) \le \alpha \ \text{ and } \ \max_{1 \le i \le M} P_i(d_T = 0) \le \beta\}.$$

This can be done only in an asymptotic sense, i.e., as α, β → 0.


Page 15: Almost optimal sequential detection in multiple data streams

Generalized Sequential Likelihood Ratio Test (GSLRT)

For $i = 1, \ldots, M$, let

$$\Lambda_t^i := \prod_{s=1}^{t} \frac{f_i(X_s)}{f_0(X_s)}, \qquad Z_t^i := \log \Lambda_t^i, \qquad t \in \mathbb{N}.$$

GSLRT
Following a maximum-likelihood approach, we obtain

$$S = \inf\Big\{t \ge 1 : \max_{1 \le i \le M} Z_t^i \notin (-A, B)\Big\},$$

$$\{d_S = 1\} = \Big\{\max_{1 \le i \le M} Z_S^i \ge B\Big\}, \qquad \{d_S = 0\} = \Big\{\max_{1 \le i \le M} Z_S^i \le -A\Big\},$$

where A,B > 0 are fixed thresholds.

Studied by Tartakovsky et al (2003).
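A minimal sketch of the GSLRT stopping rule for Gaussian alternatives with different means; the Gaussian model, the particular means, and the thresholds below are illustrative assumptions.

```python
import numpy as np

def gslrt(xs, A, B, mus, mu0=0.0):
    """Generalized sequential LRT: track Z_t^i for each alternative mean mus[i]
    and stop once max_i Z_t^i leaves (-A, B).  Returns (stopping time, decision)."""
    mus = np.asarray(mus, dtype=float)
    Z = np.zeros(len(mus))
    for t, x in enumerate(xs, start=1):
        Z += (mus - mu0) * x - 0.5 * (mus**2 - mu0**2)   # per-alternative LLR increments
        if Z.max() >= B:
            return t, 1          # accept H1
        if Z.max() <= -A:
            return t, 0          # accept H0
    return len(xs), None         # stream exhausted without a decision

rng = np.random.default_rng(2)
xs = rng.normal(0.0, 1.0, size=10_000)               # data generated under H0
print(gslrt(xs, A=5.0, B=5.0, mus=[0.5, 1.0, 2.0]))
```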


Page 16: Almost optimal sequential detection in multiple data streams

Weighted Sequential Likelihood Ratio Test (WSLRT)

Recall that

$$\Lambda_t^i := \prod_{s=1}^{t} \frac{f_i(X_s)}{f_0(X_s)}, \qquad Z_t^i := \log \Lambda_t^i, \qquad t \in \mathbb{N}.$$

We will call $q = (q_1, \ldots, q_M)$ a weight if $q_i > 0$ for every $i$. We will write

$$\Lambda_t(q) := \sum_{i=1}^{M} q_i\, \Lambda_t^i \qquad \text{and} \qquad Z_t(q) := \log \Lambda_t(q).$$

WSLRT

$$S = \inf\{t \ge 1 : Z_t(q) \notin (-A, B)\},$$

$$\{d_S = 1\} = \{Z_S(q) \ge B\}, \qquad \{d_S = 0\} = \{Z_S(q) \le -A\}.$$

An idea that goes back to Wald (1945) in the case of a continuous parameter.


Page 17: Almost optimal sequential detection in multiple data streams

Weighted GSLRT

In order to treat the maximizing and the averaging approaches in the same way, let $q = (q_1, \ldots, q_M)$ be a weight. We will write

$$\Lambda_t(q) := \max_{1 \le i \le M} \big(q_i\, \Lambda_t^i\big) \qquad \text{and} \qquad Z_t(q) := \log \Lambda_t(q).$$

Weighted GSLRT (WGSLRT)

$$S = \inf\{t \ge 1 : Z_t(q) \notin (-A, B)\},$$

$$\{d_S = 1\} = \{Z_S(q) \ge B\}, \qquad \{d_S = 0\} = \{Z_S(q) \le -A\}.$$
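The two weighted statistics differ only in how the per-alternative likelihood ratios are combined; a small sketch of both (the use of logsumexp for the mixture is an implementation choice, not from the slides):

```python
import numpy as np
from scipy.special import logsumexp

def weighted_stat(Z, q, how="sum"):
    """Combine per-alternative LLRs Z = (Z_t^1, ..., Z_t^M) with weights q.

    how = "sum": Z_t(q) = log sum_i q_i exp(Z_t^i)   (WSLRT statistic)
    how = "max": Z_t(q) = max_i (log q_i + Z_t^i)    (weighted GSLRT statistic)
    """
    Z, logq = np.asarray(Z, dtype=float), np.log(np.asarray(q, dtype=float))
    if how == "sum":
        return logsumexp(Z + logq)
    return np.max(Z + logq)

Z = np.array([1.2, -0.3, 4.0])       # hypothetical current LLR values
q = np.array([0.2, 0.3, 0.5])
print(weighted_stat(Z, q, "sum"), weighted_stat(Z, q, "max"))
```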


Page 18: Almost optimal sequential detection in multiple data streams

Controlling the error probabilities

For any given $\alpha, \beta \in (0, 1)$, both weighted tests belong to $\mathcal{C}_{\alpha,\beta}$ when $A, B$ are chosen so that

$$A = |\log\beta| + \log\Big(\max_{1 \le i \le M} q_i\Big) \qquad \text{and} \qquad B = |\log\alpha| + \log\Big(\sum_{i=1}^{M} q_i\Big).$$

Suppose also that each $Z_1^i$ has a non-arithmetic distribution. Then $P_0(d_S = 1) \sim \alpha$ when

$$B = |\log\alpha| + \log\Big(\sum_{i=1}^{M} q_i\, \delta_i\Big).$$

Let $H_i$ be the limiting distribution of the overshoot of the random walk $Z^i$ under $P_i$. That is, if we set

$$T_a^i := \inf\{t : Z_t^i \ge a\},$$

then $H_i$ is the limiting distribution of $Z^i_{T_a^i} - a$ as $a \to \infty$, and

$$\delta_i := \int e^{-x}\, H_i(dx).$$
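A small sketch of the two threshold recipes on this slide: the conservative choice that guarantees the error constraints, and the overshoot-corrected choice for $B$. The numerical $\delta_i$ values plugged in below are placeholders.

```python
import numpy as np

def conservative_thresholds(alpha, beta, q):
    """A, B guaranteeing that the weighted tests belong to C_{alpha,beta}."""
    q = np.asarray(q, dtype=float)
    A = -np.log(beta) + np.log(q.max())     # |log beta| + log(max_i q_i)
    B = -np.log(alpha) + np.log(q.sum())    # |log alpha| + log(sum_i q_i)
    return A, B

def corrected_B(alpha, q, delta):
    """Overshoot-corrected upper threshold, aiming at P0(d_S = 1) ~ alpha."""
    q, delta = np.asarray(q, dtype=float), np.asarray(delta, dtype=float)
    return -np.log(alpha) + np.log(np.sum(q * delta))

q = [0.2, 0.3, 0.5]
print(conservative_thresholds(1e-3, 1e-3, q))
print(corrected_B(1e-3, q, delta=[0.6, 0.55, 0.5]))   # placeholder delta_i values
```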


Page 19: Almost optimal sequential detection in multiple data streams

Asymptotic Expansions under H1

Suppose further that:

each $Z_1^i$ has a finite second moment under $P_i$;

$|\log\alpha| / |\log\beta|$ converges to some constant as $\alpha, \beta \to 0$;

$A, B \to \infty$ so that

$$k_0\, \alpha\, (1 + o(1)) \le P_0(d_S = 1) \le \alpha\, (1 + o(1)), \qquad k_1\, \beta\, (1 + o(1)) \le \max_{1 \le i \le M} P_i(d_S = 0) \le \beta\, (1 + o(1))$$

for some $k_0, k_1 \in (0, 1)$.

Then, as $A, B \to \infty$, for both weighted tests we have

$$E_i[S] = \frac{1}{I_i}\big[B + \rho_i - \log q_i\big] + o(1),$$

where $\rho_i$ is the limiting expected overshoot of $Z^i$ under $P_i$, i.e.,

$$\rho_i := \int x\, H_i(dx) = \lim_{a \to \infty} E_i\big[Z^i_{T_a^i} - a\big], \qquad T_a^i := \inf\{n : Z_n^i \ge a\}.$$


Page 20: Almost optimal sequential detection in multiple data streams

Uniform Second-Order Asymptotic Optimality under H1

Suppose that the previous assumptions hold.

If $A, B$ are selected so that both weighted tests belong to $\mathcal{C}_{\alpha,\beta}$, then for every $i$ we have

$$E_i[S] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_i[T] + O(1).$$

However, even first-order asymptotic optimality is lost when $f \notin \{f_1, \ldots, f_M\}$.

If $A, B$ are instead selected so that $P_0(d_S = 1) \sim \alpha$ for both tests, then

$$E_i[S] = \frac{1}{I_i}\Big[|\log\alpha| + \log\Big(\sum_{k=1}^{M} q_k\, \delta_k\Big) + \rho_i - \log q_i\Big] + o(1) \;\ge\; E_i[S^i],$$

where $S^i$ is the SPRT for testing $f_0$ against $f_i$ alone, at the same error levels.


Page 21: Almost optimal sequential detection in multiple data streams

Accuracy of asymptotic approximations

[Figure: expected sample size in the first channel vs. $\log_{10}(\alpha)$; $M = 3$, exponential distribution, $\theta_1 = 0.5$, $\theta_2 = 1$, $\theta_3 = 2$.]


Page 22: Almost optimal sequential detection in multiple data streams

Accuracy of asymptotic approximations

[Figure: expected sample size in the second channel vs. $\log_{10}(\alpha)$; $M = 3$, exponential distribution, $\theta_1 = 0.5$, $\theta_2 = 1$, $\theta_3 = 2$.]


Page 23: Almost optimal sequential detection in multiple data streams

Accuracy of asymptotic approximations

[Figure: expected sample size in the third channel vs. $\log_{10}(\alpha)$; $M = 3$, exponential distribution, $\theta_1 = 0.5$, $\theta_2 = 1$, $\theta_3 = 2$.]


Page 24: Almost optimal sequential detection in multiple data streams

Asymptotic Optimality under H0

Let $I_0^i = D(f_0 \| f_i)$ for every $1 \le i \le M$ and

$$I_0 = \min_{1 \le i \le M} I_0^i.$$

If there is a unique $i$ that attains $I_0$, then

$$E_0[S] = \frac{1}{I_0}\big[|\log\beta| + O(1)\big].$$

If not,

$$E_0[S] = \frac{1}{I_0}\Big[|\log\beta| + \Theta\big(\sqrt{\log B}\big)\Big].$$

The second-order term is not always constant.

If $A, B$ are selected so that both weighted tests belong to $\mathcal{C}_{\alpha,\beta}$, then

$$E_0[S] \sim \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_0[T]$$

for both weighted tests.


Page 25: Almost optimal sequential detection in multiple data streams

Remarks

These asymptotic results are based on non-linear renewal theory (Lai and Siegmund '77, '79, Woodroofe '82, Zhang '88) and on Dragalin et al. ('99, '00).

They were known for the GSLRT (Tartakovsky et al., 2003).

Here, we have shown that they hold for arbitrary weights (and for both tests).

How should one choose these weights?

We will show that a particular choice of weights satisfies an even stronger asymptotic optimality property.


Page 26: Almost optimal sequential detection in multiple data streams

Almost minimax?

What if we select $q$ so that

$$\max_{1 \le i \le M} E_i[S] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} \max_{1 \le i \le M} E_i[T] + o(1)?$$

This would require that $E_i[S] = E_j[S] + o(1)$ for every $1 \le i, j \le M$.

However, to have $E_i[S] \sim E_j[S]$ for every $1 \le i, j \le M$, we would need

$$E_i[S] \sim \frac{|\log\alpha|}{I_i} \sim \frac{|\log\alpha|}{I_j} \sim E_j[S],$$

which is not possible unless $I_1 = \ldots = I_M$.


Page 27: Almost optimal sequential detection in multiple data streams

Almost optimality with respect to a weighted expected sample size

Let $p = (p_1, \ldots, p_M)$ be a vector of positive numbers that add up to 1.

We will try to design the proposed tests so that

$$\sum_{i=1}^{M} p_i\, E_i[S] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} \sum_{i=1}^{M} p_i\, E_i[T] + o(1)$$

for both weighted tests. (We discuss later how to choose the $p_i$'s.)

For this, we need to generalize the class of sequential tests.


Page 28: Almost optimal sequential detection in multiple data streams

Two Families of Sequential Tests

Let $q_0, q_1$ be $M$-dimensional vectors of positive numbers.

WSLRT

$$S = \inf\{t \ge 1 : Z_t(q_1) \ge B \ \text{ or } \ Z_t(q_0) \le -A\},$$

$$\{d_S = 1\} = \{Z_S(q_1) \ge B\}, \qquad \{d_S = 0\} = \{Z_S(q_0) \le -A\}.$$

WGSLRT

$$S = \inf\{t \ge 1 : Z_t(q_1) \ge B \ \text{ or } \ Z_t(q_0) \le -A\},$$

$$\{d_S = 1\} = \{Z_S(q_1) \ge B\}, \qquad \{d_S = 0\} = \{Z_S(q_0) \le -A\},$$

where $Z_t(\cdot)$ is the weighted mixture statistic in the first case and the weighted maximum in the second.


Page 29: Almost optimal sequential detection in multiple data streams

Almost optimality

Theorem (F. & Tartakovsky, 2013)
If $A, B$ are chosen so that $(S, d_S) \in \mathcal{C}_{\alpha,\beta}$ and

$$q_1^i = p_i / L_i, \qquad q_0^i = p_i\, L_i,$$

then, as $\alpha, \beta \to 0$ with $|\log\alpha| \sim |\log\beta|$, we have

$$\sum_{i=1}^{M} p_i\, E_i[S] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} \sum_{i=1}^{M} p_i\, E_i[T] + o(1).$$

The $L_i$'s were introduced by Lorden (1977):

$$L_i := \exp\Big\{-\sum_{n=1}^{\infty} n^{-1}\big[P_0(Z_n^i > 0) + P_i(Z_n^i \le 0)\big]\Big\} = \delta_i\, I_i.$$

The condition $|\log\alpha| \sim |\log\beta|$ is more restrictive than what we had assumed before.
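For a Gaussian mean-shift alternative the probabilities in Lorden's series have a closed form, so $L_i$ can be computed by truncating the series; the Gaussian model and the truncation point are assumptions for this sketch. The last printed ratio checks the relation $\delta_i = L_i / I_i$ (with $I_i = \theta_i^2/2$ in this model).

```python
import numpy as np
from scipy.stats import norm

def lorden_L_gaussian(theta, nmax=100_000):
    """L = exp(-sum_{n>=1} n^{-1} [P0(Z_n > 0) + P1(Z_n <= 0)]) for the LLR walk
    of N(theta, 1) vs N(0, 1); both probabilities equal Phi(-theta*sqrt(n)/2)."""
    n = np.arange(1, nmax + 1)
    terms = 2.0 * norm.cdf(-theta * np.sqrt(n) / 2.0) / n
    return np.exp(-terms.sum())

for theta in (0.5, 1.0, 2.0):
    L = lorden_L_gaussian(theta)
    print(theta, L, L / (theta**2 / 2))   # theta, L_i, and delta_i = L_i / I_i
```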


Page 30: Almost optimal sequential detection in multiple data streams

Ingredients of proof

Formulate a Bayesian problem in which there is a penalty for a wrong decision under each hypothesis and a cost of sampling, $c$, per observation.

Show that the WSLRT with these particular weights that involve the $L$ numbers (and appropriate thresholds) attains the Bayes risk up to an $o(c)$ term (Lorden, 1977).

A third-order asymptotic expansion for the expected sample size of this rule:

$$E_i[S] = \frac{1}{I_i}\Big[|\log\alpha| + \rho_i + \log\delta_i + C_i(p)\Big] + o(1),$$

where

$$C_i(p) = \log\Big(\sum_{k=1}^{M} \frac{p_k}{I_k}\Big) - \log\Big(\frac{p_i}{I_i}\Big).$$


Page 31: Almost optimal sequential detection in multiple data streams

How to select p?

We have seen that an almost minimax rule does not make sense.

We may instead design the rule so that

$$\max_{1 \le i \le M} \big(I_i\, E_i[S]\big) = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} \max_{1 \le i \le M} \big(I_i\, E_i[T]\big) + o(1).$$

This is achieved when $p_i$ is selected proportional to $L_i\, e^{\rho_i}$.

It is not clear whether this is a good criterion.


Page 32: Almost optimal sequential detection in multiple data streams

Robustness

Let $S^i$ be the optimal SPRT for testing $f_0$ against $f_i$, and set

$$J_i[S] := \frac{E_i[S] - E_i[S^i]}{E_i[S^i]}$$

when both tests satisfy, at least approximately, the error probability constraints.

Based on the previous approximations,

$$J_i[S] \approx \frac{C_i(p)}{|\log\alpha| + \rho_i + \log\delta_i}, \qquad \text{where} \quad C_i(p) = \log\Big(\sum_{k=1}^{M} \frac{p_k}{I_k}\Big) - \log\Big(\frac{p_i}{I_i}\Big).$$

Setting $p_i \propto I_i$ guarantees that

$$J_i[S] \sim J_j[S] \quad \forall\ 1 \le i \ne j \le M.$$
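A small numerical check, with placeholder values of $I_i$, $\rho_i$, $\delta_i$, that the choice $p_i \propto I_i$ makes the term $C_i(p)$ identical across channels, so the approximate relative losses are nearly equal:

```python
import numpy as np

def relative_loss(p, I, rho, delta, alpha):
    """Approximate J_i = C_i(p) / (|log alpha| + rho_i + log delta_i),
    with C_i(p) = log(sum_k p_k / I_k) - log(p_i / I_i)."""
    p, I, rho, delta = map(np.asarray, (p, I, rho, delta))
    C = np.log(np.sum(p / I)) - np.log(p / I)
    return C / (-np.log(alpha) + rho + np.log(delta))

I = np.array([0.125, 0.5, 2.0])                                     # placeholder K-L numbers
rho, delta = np.array([0.4, 0.5, 0.6]), np.array([0.7, 0.6, 0.5])   # placeholder corrections
print(relative_loss(I / I.sum(), I, rho, delta, alpha=1e-4))   # p ∝ I: C_i identical, J_i ≈ equal
print(relative_loss(np.full(3, 1/3), I, rho, delta, alpha=1e-4))   # uniform p: unequal losses
```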


Page 33: Almost optimal sequential detection in multiple data streams

Example

Two channels with densities

$$f_0^k(x) = h(x) \qquad \text{and} \qquad f_1^k(x) = e^{\theta_k x - \psi(\theta_k)}\, h(x), \qquad k = 1, 2.$$

Say $\theta_1 = 4$ (fixed) and let $\theta_2 = x$ vary.


Page 34: Almost optimal sequential detection in multiple data streams

Relative performance loss vs relative signal strength

[Figure: relative performance loss vs. $x$ for the first channel ($\theta = 4$, left panel) and the second channel ($\theta = x$, right panel).]

$$L_i \le I_i \le e^{\rho_i} L_i$$


Page 35: Almost optimal sequential detection in multiple data streams

Sequential testing of a continuous parameter

Sequentially acquired observations

$$X_1, \ldots, X_n, \ldots \overset{\text{iid}}{\sim} f \in \{f_\theta,\ \theta \in \Theta\}.$$

Stop sampling as soon as possible and distinguish between

H0 : θ = θ0 and H1 : θ ∈ Θ1,

where $\theta_0 \notin \Theta_1 \subset \Theta$. Then, we would like to minimize $E_{\theta_0}[T]$ and $E_\theta[T]$ for every $\theta \in \Theta_1$ in

$$\mathcal{C}_{\alpha,\beta} = \{(T, d_T) : P_{\theta_0}(d_T = 1) \le \alpha \ \text{ and } \ \sup_{\theta \in \Theta_1} P_\theta(d_T = 0) \le \beta\}.$$


Page 36: Almost optimal sequential detection in multiple data streams

A multi-parameter exponential family

Setup
An exponential family

$$f_\theta(x) := e^{\langle \theta, x\rangle - \psi(\theta)}, \qquad x \in \mathbb{R}^d, \ \theta \in \Theta \subset \mathbb{R}^d.$$

$\Theta = \{\theta \in \mathbb{R}^d : \int e^{\langle\theta, x\rangle}\, \nu(dx) < \infty\}$ is the natural parameter space.

$\psi(\theta) = \log \int e^{\langle\theta, x\rangle}\, \nu(dx)$ is the log-moment generating function of $X$.

We denote by $\dot\psi(\theta)$ the gradient and by $\ddot\psi(\theta)$ the Hessian matrix of $\psi(\theta)$, and we assume that $\ddot\psi(\theta)$ is non-singular for all $\theta \in \Theta$.

The Kullback–Leibler information number between $f_{\theta_2}$ and $f_{\theta_1}$ is

$$I(\theta_2, \theta_1) := E_{\theta_2}\Big[\log\frac{f_{\theta_2}(X)}{f_{\theta_1}(X)}\Big] = \langle \theta_2 - \theta_1, \dot\psi(\theta_2)\rangle - [\psi(\theta_2) - \psi(\theta_1)].$$

$I(\theta_0, \theta_1) > 0$ for all $\theta_0 \in \Theta_0,\ \theta_1 \in \Theta_1$.
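A quick sanity check of this K-L formula in the simplest case, the $N(\theta, I_d)$ family in its natural parametrization, where $\psi(\theta) = \|\theta\|^2/2$ and $\dot\psi(\theta) = \theta$; the choice of family is an illustrative assumption.

```python
import numpy as np

def psi(theta):                      # log-MGF of N(theta, I_d) in natural form
    return 0.5 * np.dot(theta, theta)

def grad_psi(theta):                 # gradient of psi
    return theta

def kl(theta2, theta1):
    """I(theta2, theta1) = <theta2 - theta1, grad psi(theta2)> - [psi(theta2) - psi(theta1)]."""
    return np.dot(theta2 - theta1, grad_psi(theta2)) - (psi(theta2) - psi(theta1))

t1, t2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
print(kl(t2, t1), 0.5 * np.sum((t2 - t1) ** 2))   # both equal ||theta2 - theta1||^2 / 2
```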


Page 37: Almost optimal sequential detection in multiple data streams

The one-sided setup

Suppose that sampling needs to stop only to reject H0.

Then, we need to minimize Eθ[T] for every θ ∈ Θ1 among stopping times in

$$\mathcal{C}_\alpha = \{T : P_{\theta_0}(T < \infty) \le \alpha\}.$$

Let $\ell_n(\theta)$ be the likelihood of the first $n$ observations under $P_\theta$, i.e.,

$$\ell_n(\theta) = \prod_{k=1}^{n} f_\theta(X_k).$$

Open-ended WSPRT and GSLRT
Let $B > 1$ be a fixed threshold and $g$ a positive function on $\Theta_1$. Define

$$\bar S_B(g) = \inf\{n \ge 1 : \bar\Lambda_n \ge B\}, \qquad \bar\Lambda_n = \frac{1}{\ell_n(\theta_0)} \int_{\Theta_1} \ell_n(\theta)\, g(\theta)\, d\theta,$$

$$\hat S_B = \inf\{n \ge 1 : \hat\Lambda_n \ge B\}, \qquad \hat\Lambda_n = \frac{1}{\ell_n(\theta_0)} \sup_{\theta \in \Theta_1} \ell_n(\theta).$$
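A sketch of how the two open-ended statistics can be computed for a one-parameter Gaussian family, with the mixture integral approximated on a grid over $\Theta_1$; the grid, the uniform weight $g$, the threshold, and the Gaussian family are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.special import logsumexp

def open_ended_stats(xs, theta0=0.0, grid=np.linspace(0.5, 3.0, 200)):
    """Return log Lambda_bar_n (mixture, uniform g on the grid) and
    log Lambda_hat_n (GLR over the grid) after each observation, N(theta, 1) family."""
    n = np.arange(1, len(xs) + 1)
    S = np.cumsum(xs)                                   # sufficient statistic
    # LLR of theta vs theta0 after n obs: (theta - theta0) S_n - n (theta^2 - theta0^2) / 2
    Z = np.outer(S, grid - theta0) - np.outer(n, (grid**2 - theta0**2) / 2.0)
    log_mix = logsumexp(Z, axis=1) - np.log(len(grid))  # mixture with uniform weights
    log_glr = Z.max(axis=1)                             # generalized likelihood ratio
    return log_mix, log_glr

rng = np.random.default_rng(3)
xs = rng.normal(1.0, 1.0, size=50)
log_mix, log_glr = open_ended_stats(xs)
B = 10.0                                                # illustrative threshold
# first time each statistic crosses log B (assumes a crossing occurs in the sample)
print(np.argmax(log_mix >= np.log(B)) + 1, np.argmax(log_glr >= np.log(B)) + 1)
```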


Page 38: Almost optimal sequential detection in multiple data streams

A minimax, second-order property

The weighting idea goes back to Wald (1945). The GSLRT has been studied by Schwarz (1962), Wong (1968), Lorden (1977), Lai (1988, 2004), among others.

If $\Theta_1$ is a compact set bounded away from $0$, both tests attain

$$\inf_{T \in \mathcal{C}_\alpha} \sup_{\theta \in \Theta_1} I(\theta, 0)\, E_\theta[T]$$

within an $O(1)$ term as $\alpha \to 0$.

Pollak (1978) proved this result for the WSPRT with any continuous mixing density whose support includes $\Theta_1$ (for a one-parameter exponential family).

Lai (2004) proved this result for the GSLRT.


Page 39: Almost optimal sequential detection in multiple data streams

Almost Minimax WSPRT

Asymptotic average overshoot
Consider the one-sided SPRT for testing $f_\theta$ versus $f_0$,

$$T_a^\theta := \inf\{n : Z_n^\theta \ge a\}, \qquad \text{where } Z_n^\theta := \log\frac{\ell_n(\theta)}{\ell_n(\theta_0)},$$

and define $\kappa_\theta := \lim_{a \to \infty} E_\theta\big[Z^\theta_{T_a^\theta} - a\big]$.

Theorem (F. & Tartakovsky, 2013)
Consider the WSPRT $\bar S_B(g)$ with weight function

$$g(\theta) := e^{\kappa_\theta}\, \sqrt{\det\big(\ddot\psi(\theta)\big)} \,\big/\, I(\theta, 0)$$

and suppose that $P_0(\bar S_B(g) < \infty) = \alpha$. Then, as $\alpha \to 0$,

$$\sup_{\theta \in \Theta_1} I(\theta, 0)\, E_\theta[\bar S_B(g)] = \inf_{T \in \mathcal{C}_\alpha} \sup_{\theta \in \Theta_1} I(\theta, 0)\, E_\theta[T] + o(1).$$


Page 40: Almost optimal sequential detection in multiple data streams

Idea of Proof

Auxiliary Bayesian problem
Consider the sequential decision problem with

loss 1 when stopping under $P_0$;

sampling cost per observation equal to $c\, I(\theta, 0)$ under $P_\theta$;

conditional prior distribution on $\Theta_1$, given that $\theta \ne 0$, equal to $g$.

The WSPRT $\bar S_B(g)$ is asymptotically Bayes as $c \to 0$, within an $o(c)$ term.

Almost equalizer
As $B \to \infty$,

$$I(\theta, 0)\, E_\theta[\bar S_B(g)] = \log B + \frac{d}{2}\, \log\log B + C + o(1),$$

where $C$ is a constant that does not depend on $\theta$.

Idea of proof
"Almost Bayesian + Almost Equalizer = Almost Minimax."


Page 41: Almost optimal sequential detection in multiple data streams

Almost Minimax Weighted GSLRT

A weighted version of the GSLRT turns out to have the same optimality property. Recall that

$$\hat\Lambda_n = \sup_{\theta \in \Theta_1} \frac{\ell_n(\theta)}{\ell_n(\theta_0)} = \frac{\ell_n(\hat\theta_n)}{\ell_n(\theta_0)},$$

where $\hat\theta_n$ is the MLE of $\theta$ based on the first $n$ observations, constrained to $\Theta_1$. We define

$$\hat S_B(g) = \inf\{n \ge 1 : \hat\Lambda_n\, g(\hat\theta_n) \ge B\},$$

where $g$ is some positive function on $\Theta_1$ and $B > 1$ is a fixed threshold.

Theorem
Consider the WGSLRT with $g(\theta) = e^{\kappa_\theta}$. If $P_0(\hat S_B(g) < \infty) = \alpha$, then, as $\alpha \to 0$,

$$\sup_{\theta \in \Theta_1} I(\theta, 0)\, E_\theta[\hat S_B(g)] = \inf_{T \in \mathcal{C}_\alpha} \sup_{\theta \in \Theta_1} I(\theta, 0)\, E_\theta[T] + o(1).$$


Page 42: Almost optimal sequential detection in multiple data streams

Remarks

The function θ → κθ usually does not admit a closed-form expression.

As a result, the previous nearly minimax sequential tests can be implemented only approximately, since the corresponding mixture-based and generalized likelihood ratio statistics can only be computed numerically.

For the two-sided testing problem, we need an almost Bayes rule for exponential families (Lorden, 1977) and we need to consider again the L numbers (Keener, 2005).


Page 43: Almost optimal sequential detection in multiple data streams

Work in progress

Extension to a composite null hypothesis.

Extension to multiple hypotheses.


Page 44: Almost optimal sequential detection in multiple data streams

THE END

THANK YOU!


Page 45: Almost optimal sequential detection in multiple data streams

References

Dragalin, V. P., Tartakovsky, A. G., and Veeravalli, V. V. (2000). "Multihypothesis sequential probability ratio tests—Part II: Accurate asymptotic expansions for the expected sample size." IEEE Trans. Inform. Theory 46, 1366–1383.

Fellouris, G. and Tartakovsky, A. G. (2012). "Nearly minimax mixture-based open-ended sequential tests." Sequential Analysis 31, 297–325.

Fellouris, G. and Tartakovsky, A. G. (2013). "Almost optimal sequential tests for discrete composite hypotheses." Statistica Sinica 23.

Lai, T. L. (1988). "Nearly optimal sequential tests of composite hypotheses." Ann. Statist. 16, 856–886.

Lai, T. L. and Siegmund, D. (1977). "A nonlinear renewal theory with applications to sequential analysis I." Ann. Statist. 5, 628–643.

Lai, T. L. and Siegmund, D. (1979). "A nonlinear renewal theory with applications to sequential analysis II." Ann. Statist. 7, 60–76.

Lorden, G. (1967). "Integrated risk of asymptotically Bayes sequential tests." Ann. Math. Statist. 38, 1399–1422.


Page 46: Almost optimal sequential detection in multiple data streams

References

Lorden, G. (1977). "Nearly optimal sequential tests for finitely many parameter values." Ann. Statist. 5, 1–21.

Schwarz, G. (1962). "Asymptotic shapes of Bayes sequential testing regions." Ann. Math. Statist. 33, 224–236.

Tartakovsky, A. G., Li, X. R., and Yaralov, G. (2003). "Sequential detection of targets in multichannel systems." IEEE Trans. Inform. Theory 49, 425–445.

Wald, A. and Wolfowitz, J. (1948). "Optimum character of the sequential probability ratio test." Ann. Math. Statist. 19, 326–339.

Wald, A. (1947). Sequential Analysis. Wiley, New York.
