Estimation – Detection · Herwig.Wendt/data/slides_dect_12.pdf · 2012-12-03


Estimation – Detection

Herwig Wendt

CNRS

IRIT - ENSEEIHT

Theme 1: Information Analysis and Synthesis

[email protected]

Estimation - Detection, 2012 – p. 1/29


Outline

Part 1: Estimation

Part 2: Detection - Statistical tests

Introduction, example

Neyman Pearson Theorem

Bayesian test (simple hypotheses)

Generalized likelihood ratio test

Bayesian test (composite hypotheses)

Goodness of fit tests


Introduction

Principle: A statistical test is a procedure for deciding between different hypotheses H0, H1, ... from n observations x1, ..., xn. We restrict ourselves here to two hypotheses H0 and H1. Performing a test consists in determining a test statistic T(X1, ..., Xn) and a set ∆ such that

H0 rejected if T(X1, ..., Xn) ∈ ∆

H0 accepted if T(X1, ..., Xn) ∉ ∆.

Terminology

H0 is the null hypothesis

H1 is the alternative hypothesis

{(x1, ..., xn) | T(x1, ..., xn) ∈ ∆}: critical region


Definitions

Parametric and non-parametric tests

Simple and composite hypotheses

Type I error = probability of false alarm

α = PFA = P[Reject H0 | H0 true]

Type II error = probability of non-detection

β = PND = P[Accept H0 | H1 true]

Power of the test = probability of detection

π = PD = P[Reject H0 | H1 true] = 1 − β


Example

Xi ∼ N(m, σ²), σ² known

Hypotheses

H0 : m = m0, H1 : m = m1 > m0

Test strategy

$$\text{Reject } H_0 \text{ if } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i > t_\alpha$$

Problems: Determine the critical value tα, the risk β and the power π of the test.


ROC

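Receiver operating characteristic

PD = h(PFA)

Example: Xi ∼ N(m, σ²), σ² known

H0 : m = m0, H1 : m = m1 > m0

Critical value for a given probability of false alarm α

$$t_\alpha = m_0 + \frac{\sigma}{\sqrt{n}}\,F^{-1}(1-\alpha)$$

Detection probability

$$P_D = \pi = 1 - F\left(\frac{t_\alpha - m_1}{\sigma/\sqrt{n}}\right)$$

where F is the cumulative distribution function of the law N(0, 1).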
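The two quantities of this example can be evaluated numerically. The following is a minimal sketch using only the Python standard library; the function names and the numerical values are illustrative assumptions, and the detection probability is computed as the probability that the sample mean exceeds tα under H1, that is 1 − F((tα − m1)/(σ/√n)) with F the standard normal distribution function.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # plays the role of F: the N(0, 1) distribution function


def critical_value(m0, sigma, n, alpha):
    """t_alpha such that P[mean(X) > t_alpha | H0] = alpha."""
    return m0 + sigma / sqrt(n) * STD_NORMAL.inv_cdf(1.0 - alpha)


def power(m1, sigma, n, t_alpha):
    """P_D = P[mean(X) > t_alpha | H1]."""
    return 1.0 - STD_NORMAL.cdf((t_alpha - m1) / (sigma / sqrt(n)))


# Hypothetical numbers (m0 = 0, m1 = 1 as in the ROC figure; alpha chosen freely).
m0, m1, sigma, n, alpha = 0.0, 1.0, 1.0, 15, 0.05
t_alpha = critical_value(m0, sigma, n, alpha)
p_d = power(m1, sigma, n, t_alpha)
beta = 1.0 - p_d
print(f"t_alpha = {t_alpha:.3f}, beta = {beta:.3f}, power = {p_d:.3f}")
```

A quick sanity check of the sketch: evaluating `power` at m1 = m0 returns α, since the power under H0 is the false-alarm probability.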


ROC

ROC curves for previous example

[Figure: ROC curves, PD (vertical axis) against α = PFA (horizontal axis), both from 0 to 1, for m0 = 0, m1 = 1 and four settings: σ = 3, n = 7; σ = 1, n = 7; σ = 3, n = 15; σ = 1, n = 15.]
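The ROC curves of the Gaussian example can be recomputed by eliminating tα between the false-alarm and detection probabilities. The sketch below is one possible way to do it, assuming the test "reject H0 if the sample mean exceeds tα"; the helper name `roc_point` and the grid of PFA values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def roc_point(p_fa, m0, m1, sigma, n):
    """P_D = h(P_FA) for the test 'reject H0 if mean(X) > t_alpha'."""
    d = (m1 - m0) * sqrt(n) / sigma  # normalised distance between H0 and H1
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - p_fa) - d)


m0, m1 = 0.0, 1.0
grid = [k / 20 for k in range(1, 20)]  # P_FA values strictly inside (0, 1)
curves = {
    (sigma, n): [roc_point(a, m0, m1, sigma, n) for a in grid]
    for sigma, n in [(3, 7), (1, 7), (3, 15), (1, 15)]
}
for (sigma, n), p_d in curves.items():
    print(f"sigma={sigma}, n={n}: P_D at P_FA={grid[0]:.2f} is {p_d[0]:.3f}")
```

Each list in `curves` can be passed to any plotting tool; the curve moves towards the upper-left corner as n grows or σ shrinks.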


Neyman-Pearson Theorem

Parametric test for simple hypotheses

H0 : θ = θ0 and

H1 : θ = θ1

Continuous random variables

Theorem: for fixed α, the test that minimizes β (maximizes π) is defined by

$$\text{Reject } H_0 \text{ if } \frac{L(x_1, \ldots, x_n \mid H_1)}{L(x_1, \ldots, x_n \mid H_0)} > S_\alpha$$

Remark: L(x1, ..., xn|Hi) = f(x1, ..., xn|θi)

Example: Xi ∼ N(m, σ²), σ² known

H0 : m = m0, H1 : m = m1 > m0
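For this Gaussian example, a short derivation suggests that the likelihood ratio test reduces to a threshold test on the sample mean. This is a sketch under the stated assumptions (i.i.d. observations, σ² known, m1 > m0); x̄ denotes the sample mean and tα the resulting threshold on x̄.

```latex
\begin{align*}
\frac{L(x_1,\ldots,x_n \mid H_1)}{L(x_1,\ldots,x_n \mid H_0)}
  &= \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}
     \left[(x_i-m_1)^2-(x_i-m_0)^2\right]\right) \\
  &= \exp\!\left(\frac{n(m_1-m_0)}{\sigma^2}\,\bar{x}
     -\frac{n(m_1^2-m_0^2)}{2\sigma^2}\right), \\
\frac{L(x_1,\ldots,x_n \mid H_1)}{L(x_1,\ldots,x_n \mid H_0)} > S_\alpha
  &\iff \bar{x} > \frac{m_0+m_1}{2}
     +\frac{\sigma^2 \ln S_\alpha}{n\,(m_1-m_0)} =: t_\alpha
  \qquad (m_1 > m_0).
\end{align*}
```

The threshold tα is then fixed by the false-alarm constraint P[X̄ > tα | H0] = α rather than by the value of Sα itself.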


Neyman-Pearson theorem

Discrete random variables

Theorem: among all tests with type I error ≤ α (α fixed), the most powerful test is defined by

$$\text{Reject } H_0 \text{ if } \frac{L(x_1, \ldots, x_n \mid H_1)}{L(x_1, \ldots, x_n \mid H_0)} > t_\alpha$$

Remark:

L(x1, ..., xn|Hi) = P[X1 = x1, ..., Xn = xn|θi]

Example: Xi ∼ P(λ), H0 : λ = λ0, H1 : λ = λ1 > λ0

Asymptotic law: when n is sufficiently large, application of the central limit theorem
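For the Poisson example, the likelihood ratio equals (λ1/λ0)^(Σxi) · exp(−n(λ1 − λ0)), which increases with Σxi when λ1 > λ0, so the test rejects H0 when the sum of the observations is large. The sketch below compares an exact integer threshold with the central-limit approximation; the function names and the values of λ0, n and α are assumptions chosen for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_sf(k, mu):
    """P[S > k] for S ~ Poisson(mu), by summing the pmf from 0 to k."""
    term = exp(-mu)  # P[S = 0]
    cdf = term
    for j in range(1, k + 1):
        term *= mu / j  # P[S = j] obtained from P[S = j - 1]
        cdf += term
    return max(0.0, 1.0 - cdf)


def exact_threshold(lam0, n, alpha):
    """Smallest integer k with P[sum(X) > k | H0] <= alpha."""
    k = 0
    while poisson_sf(k, n * lam0) > alpha:
        k += 1
    return k


def clt_threshold(lam0, n, alpha):
    """Gaussian approximation: sum(X) is roughly N(n*lam0, n*lam0) under H0."""
    mu = n * lam0
    return mu + sqrt(mu) * NormalDist().inv_cdf(1.0 - alpha)


lam0, n, alpha = 2.0, 30, 0.05  # hypothetical values
k = exact_threshold(lam0, n, alpha)
print(k, poisson_sf(k, n * lam0), clt_threshold(lam0, n, alpha))
```

Because the sum is integer-valued, the exact test generally has a type I error strictly below α, which is why the theorem is stated with "≤ α" in the discrete case.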


Summary

Performing a Neyman-Pearson test:

1. Determination of the test statistic and the critical region of the test

2. Determination of the relation between the critical value tα and the significance level α of the test

3. Determination of the risk β and the power π of the test (or the ROC curve)

4. Numerical application: one accepts or rejects H0 for a given risk α


Bayesian test

Simple hypotheses H0 : θ = θ0 and H1 : θ = θ1 with prior probabilities P(H0) and P(H1)

costs cij to decide on Hi when Hj is true

probabilities pij to decide on Hi when Hj is true

Minimize cost E[C] = c00 p00 + c01 p01 + c10 p10 + c11 p11

Definition:

$$\text{Reject } H_0 \text{ if } \frac{L(x_1, \ldots, x_n \mid H_1)}{L(x_1, \ldots, x_n \mid H_0)} > \frac{P(H_0)}{P(H_1)}\,\frac{c_{10} - c_{00}}{c_{01} - c_{11}}$$

Example: Xi ∼ N(m, σ²), σ² known

H0 : m = m0, H1 : m = m1 > m0
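For the Gaussian example, the Bayesian threshold on the likelihood ratio translates into a threshold on the sample mean, because the ratio is an increasing function of the sample mean when m1 > m0. A minimal sketch follows; the function names, the priors and the costs are hypothetical, and the threshold on the likelihood ratio is called η here.

```python
from math import log


def bayes_threshold(p_h0, p_h1, c00, c01, c10, c11):
    """Likelihood-ratio threshold; c_ij is the cost of deciding H_i when H_j is true."""
    return (p_h0 / p_h1) * (c10 - c00) / (c01 - c11)


def mean_threshold(eta, m0, m1, sigma, n):
    """Equivalent threshold on the sample mean for the Gaussian example (m1 > m0)."""
    return (m0 + m1) / 2 + sigma**2 * log(eta) / (n * (m1 - m0))


# Hypothetical setting: equal priors and 0-1 costs (c00 = c11 = 0, c01 = c10 = 1).
eta = bayes_threshold(p_h0=0.5, p_h1=0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0)
t = mean_threshold(eta, m0=0.0, m1=1.0, sigma=1.0, n=15)
print(eta, t)
```

With equal priors and symmetric costs, η = 1 and the decision boundary is the midpoint (m0 + m1)/2; making H0 more probable a priori, or false alarms more costly, moves the boundary towards m1.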


Generalized likelihood ratio test

Parametric test for composite hypotheses

H0 : θ ∈ Ω0 and H1 : θ ∈ Ω1

Definition (GLR Test)

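$$\text{Reject } H_0 \text{ if } \frac{L(x_1, \ldots, x_n \mid \hat{\theta}_1^{ML})}{L(x_1, \ldots, x_n \mid \hat{\theta}_0^{ML})} > t_\alpha$$

where $\hat{\theta}_0^{ML}$ and $\hat{\theta}_1^{ML}$ are the maximum likelihood estimators of θ under the hypotheses H0 and H1.

Remark

$$L(x_1, \ldots, x_n \mid \hat{\theta}_i^{ML}) = \sup_{\theta \in \Omega_i} L(x_1, \ldots, x_n \mid \theta)$$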
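As an illustration only (this case is not worked out in the slides), assume Xi ∼ N(m, σ²) with σ² known, H0 : m = m0 and H1 : m ≠ m0. The ML estimate is m0 under H0 and the sample mean under H1, so the ratio equals exp(n(x̄ − m0)²/(2σ²)), and 2 ln(ratio) follows a χ² law with one degree of freedom under H0. The function names and the observations below are made up for the sketch.

```python
from statistics import NormalDist, fmean


def glr_statistic(x, m0, sigma):
    """2 * ln(GLR) for H0: m = m0 against H1: m != m0, sigma known."""
    n = len(x)
    return n * (fmean(x) - m0) ** 2 / sigma**2


def glr_threshold(alpha):
    """(1 - alpha) quantile of the chi-square law with 1 degree of freedom.

    Uses the fact that the square of a N(0, 1) variable is chi-square(1).
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2) ** 2


x = [0.3, 1.2, -0.4, 0.9, 1.5, 0.1, 0.8, 1.1]  # made-up observations
stat = glr_statistic(x, m0=0.0, sigma=1.0)
threshold = glr_threshold(0.05)
print(stat, threshold, "reject H0" if stat > threshold else "accept H0")
```

The threshold here applies to 2 ln(ratio); the corresponding threshold on the ratio itself is exp(threshold / 2).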


Bayesian test

Parametric test for composite hypotheses

H0 : θ ∈ Ω0 and H1 : θ ∈ Ω1

Principle: Define prior laws p0(θ) and p1(θ) that correspond to the constraints θ ∈ Ω0 and θ ∈ Ω1

Definition (Bayesian detector)

$$\text{Reject } H_0 \text{ if } \frac{\int f(x_1, \ldots, x_n \mid \theta)\, p_1(\theta)\, d\theta}{\int f(x_1, \ldots, x_n \mid \theta)\, p_0(\theta)\, d\theta} > t_\alpha$$

Example: Xi ∼ N(m, σ²), σ² known

H0 : m = 0, H1 : m ∼ N(0, ν²)
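For this example the ratio can be written in closed form. The derivation below is a sketch that assumes p0 is a point mass at m = 0 (H0 is simple) and introduces the shorthand s² = σ²/n, which does not appear in the slides; x̄ denotes the sample mean and t'α a threshold on |x̄|.

```latex
\begin{align*}
f(x_1,\ldots,x_n \mid m) &\propto \exp\!\left(-\frac{(\bar{x}-m)^2}{2s^2}\right),
  \qquad s^2 = \frac{\sigma^2}{n}
  \quad \text{(factor independent of } m \text{ omitted)}, \\
\frac{\int f(x_1,\ldots,x_n \mid m)\,p_1(m)\,dm}{f(x_1,\ldots,x_n \mid m=0)}
  &= \sqrt{\frac{s^2}{s^2+\nu^2}}\;
     \exp\!\left(\frac{\nu^2\,\bar{x}^2}{2s^2\,(s^2+\nu^2)}\right), \\
\text{hence the detector rejects } H_0 &\iff |\bar{x}| > t'_\alpha .
\end{align*}
```

Since X̄ ∼ N(0, σ²/n) under H0, a false-alarm probability α corresponds to t'α = (σ/√n) F⁻¹(1 − α/2), with F the N(0, 1) distribution function.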


χ² test

The χ² test is a nonparametric goodness-of-fit test for deciding between the following two hypotheses,

H0 : L = L0, H1 : L ≠ L0,

where L0 is a given law. The test consists in determining whether (x1, ..., xn) follows the law L0 or not. For simplicity, we only consider the case xi ∈ R.

Definition

$$\text{Reject } H_0 \text{ if } \phi_n = \sum_{k=1}^{K} \frac{(n_k - n p_k)^2}{n p_k} > t_\alpha$$

Remark: L0 can be discrete or continuous


χ² test

Test statistic

nk: number of observations xi in class Ck, k = 1, ..., K

pk: probability that an observation xi belongs to class Ck when Xi ∼ L0

pk = P[Xi ∈ Ck | Xi ∼ L0]

n: total number of observations

Law of the test statistic

$$\phi_n \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_{K-1}$$


Remarks

Interpretation of φn

$$\phi_n = \sum_{k=1}^{K} \frac{n}{p_k} \left( \frac{n_k}{n} - p_k \right)^2$$

Distance between theoretical and empirical probabilities

Asymptotic law of φn: see course or textbooks

Finite number of observations

Heuristic: The asymptotic law of φn is a good approximation for finite n if npk ≥ 5 ∀k = 1, ..., K

⟹ choose equally likely classes


Remarks

Correction

When (a subset of) the parameters of L0 are unknown

$$\phi_n \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2_{K-1-n_p}$$

where np is the number of unknown parameters, estimated by the maximum likelihood method

Power of the test

Cannot be computed (H1 does not specify a single law)


Example

4.13 1.41 −1.16 −0.75 1.96 2.46 0.197 0.24 0.42 2.00

2.08 1.48 1.73 0.82 0.33 −0.76 0.42 4.60 −2.83 0.197

2.59 0.54 4.06 −0.69 4.99 0.67 2.45 5.61 2.13 1.76

5.03 0.85 1.29 0.17 −0.38 2.76 −1.03 1.87 4.48 0.73

Is it reasonable to assume that the observations stem from a population of law N(1, 4)?

Solution

Classes

C1 : ]−∞, −0.34], C2 : ]−0.34, 1], C3 : ]1, 2.34], C4 : ]2.34, ∞[

Number of observations

n1 = 7, n2 = 12, n3 = 10, n4 = 11


Example

Test statistic

φn = 1.4

Critical values

          χ²_2      χ²_3
t_0.05    5.991     7.815
t_0.01    9.210     11.345

Here K = 4 and no parameter is estimated, so the χ²_3 column applies; hence hypothesis H0 is accepted with risks α = 0.01 and α = 0.05.
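The class counts and the value of the statistic can be checked numerically. The sketch below uses the 40 observations and the class bounds of the example; it assumes pk = 0.25 for every class, which treats the bounds −0.34, 1 and 2.34 as the (rounded) quartiles of N(1, 4), and the variable names are illustrative.

```python
from bisect import bisect_left

data = [
    4.13, 1.41, -1.16, -0.75, 1.96, 2.46, 0.197, 0.24, 0.42, 2.00,
    2.08, 1.48, 1.73, 0.82, 0.33, -0.76, 0.42, 4.60, -2.83, 0.197,
    2.59, 0.54, 4.06, -0.69, 4.99, 0.67, 2.45, 5.61, 2.13, 1.76,
    5.03, 0.85, 1.29, 0.17, -0.38, 2.76, -1.03, 1.87, 4.48, 0.73,
]
edges = [-0.34, 1.0, 2.34]    # upper bounds of C1, C2, C3 (classes closed on the right)
p = [0.25, 0.25, 0.25, 0.25]  # assumed class probabilities under N(1, 4)

n = len(data)
counts = [0] * len(p)
for x in data:
    counts[bisect_left(edges, x)] += 1  # index of the class ]a, b] containing x

phi = sum((nk - n * pk) ** 2 / (n * pk) for nk, pk in zip(counts, p))
critical_value_5pct = 7.815  # chi-square, 3 degrees of freedom, taken from the table
print(counts, round(phi, 3), "reject H0" if phi > critical_value_5pct else "accept H0")
```

Every class has n·pk = 10 ≥ 5, so the heuristic for using the asymptotic law is satisfied.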


Kolmogorov test

The Kolmogorov test is a nonparametric goodness-of-fit test for deciding between the following two hypotheses,

H0 : L = L0, H1 : L ≠ L0,

where L0 is a given law. The test consists in determining whether (x1, ..., xn) follows the law L0 or not. For simplicity, we only consider the case xi ∈ R.

Definition

$$\text{Reject } H_0 \text{ if } \Psi_n = \sup_{x \in \mathbb{R}} |F(x) - F_0(x)| > t_\alpha$$

Remark: L0 must be a continuous law


Remarks

Test statistic

F0(x) is the theoretical cumulative distribution function of L0 and F(x) is the empirical distribution function of (x1, ..., xn)

Asymptotic law of Ψn: see textbooks

$$P\left[\sqrt{n}\,\Psi_n < y\right] \xrightarrow[n \to \infty]{} 1 - 2\sum_{l=1}^{+\infty} (-1)^{l-1} \exp(-2 l^2 y^2) = K(y)$$

Determination of the critical value: $t_\alpha = \frac{1}{\sqrt{n}} K^{-1}(1 - \alpha)$

The critical value depends on α and n.
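The function K and its inverse have no elementary closed form, but both are easy to evaluate numerically. The following is a minimal sketch: the series is truncated after a fixed number of terms and K is inverted by bisection. The function names, the truncation length and the bisection bracket are assumptions made for illustration.

```python
from math import exp, sqrt


def kolmogorov_cdf(y, terms=100):
    """K(y) = 1 - 2 * sum_{l>=1} (-1)**(l-1) * exp(-2 * l**2 * y**2), truncated."""
    if y <= 0:
        return 0.0
    s = sum((-1) ** (l - 1) * exp(-2 * l * l * y * y) for l in range(1, terms + 1))
    return 1.0 - 2.0 * s


def kolmogorov_inv(p, lo=0.2, hi=5.0, tol=1e-10):
    """Solve K(y) = p by bisection (K is increasing for y > 0)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def asymptotic_critical_value(alpha, n):
    """t_alpha = K^{-1}(1 - alpha) / sqrt(n), valid for large n."""
    return kolmogorov_inv(1.0 - alpha) / sqrt(n)


print(asymptotic_critical_value(0.05, 20), asymptotic_critical_value(0.01, 20))
```

For small n, tabulated critical values are usually preferred to this asymptotic approximation, which can differ noticeably from them.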


Remarks

Computing Ψn

$$\Psi_n = \max_{i \in \{1, \ldots, n\}} \max\{E_i^+, E_i^-\}$$

$$E_i^+ = \left| F(x_i^{*+}) - F_0(x_i^*) \right|, \qquad E_i^- = \left| F(x_i^{*-}) - F_0(x_i^*) \right|$$

$x_1^*, \ldots, x_n^*$ are the order statistics of x1, ..., xn.

$F(x_i^{*+}) = i/n$ and $F(x_i^{*-}) = (i-1)/n$.

Power of the test

Cannot be computed


Example

Is it reasonable to believe that the following observations stem from a population of uniform law U(0, 1)?

i                1       2      3     4     5     6     7     8     9     10
x_i              0.0078  0.063  0.10  0.25  0.32  0.39  0.40  0.48  0.49  0.53
E_i^-            0.0078  0.013  0.00  0.10  0.12  0.14  0.10  0.13  0.09  0.08
E_i^+            0.0422  0.037  0.05  0.05  0.07  0.09  0.05  0.08  0.04  0.03
max(E_i^+,E_i^-) 0.0422  0.037  0.05  0.10  0.12  0.14  0.10  0.13  0.09  0.08

i                11    12    13    14    15    16    17    18    19    20
x_i              0.67  0.68  0.69  0.73  0.79  0.80  0.87  0.88  0.90  0.996
E_i^-            0.17  0.13  0.09  0.08  0.09  0.05  0.07  0.03  0.00  0.046
E_i^+            0.12  0.08  0.04  0.03  0.04  0.00  0.02  0.02  0.05  0.004
max(E_i^+,E_i^-) 0.17  0.13  0.09  0.08  0.09  0.05  0.07  0.03  0.05  0.046


Example

Test statistic

Ψn = 0.17

Critical values for n = 20

t_0.05    0.294
t_0.01    0.352

hence hypothesis H0 is accepted with risks α = 0.01 and α = 0.05.
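The value of the statistic can be reproduced from the 20 observations using the order-statistic formulas, with F0(x) = x on [0, 1] for the law U(0, 1). This is a minimal sketch; the function names are illustrative, and the critical value is the tabulated one for n = 20 and α = 0.05.

```python
x = [0.0078, 0.063, 0.10, 0.25, 0.32, 0.39, 0.40, 0.48, 0.49, 0.53,
     0.67, 0.68, 0.69, 0.73, 0.79, 0.80, 0.87, 0.88, 0.90, 0.996]


def kolmogorov_statistic(sample, cdf0):
    """max over i of max(E_i^+, E_i^-), computed on the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    best = 0.0
    for i, xi in enumerate(xs, start=1):
        f0 = cdf0(xi)
        e_plus = abs(i / n - f0)         # empirical cdf just after x_i
        e_minus = abs((i - 1) / n - f0)  # empirical cdf just before x_i
        best = max(best, e_plus, e_minus)
    return best


def uniform_cdf(t):
    """Distribution function of U(0, 1)."""
    return min(1.0, max(0.0, t))


psi = kolmogorov_statistic(x, uniform_cdf)
critical_value_5pct = 0.294  # tabulated value for n = 20
print(round(psi, 3), "reject H0" if psi > critical_value_5pct else "accept H0")
```

The maximum is reached at i = 11, where the empirical distribution function jumps from 0.50 to 0.55 while F0(0.67) = 0.67.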
