Post on 22-Dec-2015

a) Transformation method (for continuous distributions)

U(0,1): uniform distribution

f(x): arbitrary distribution

f(x) dx = U(0,1)(u) du

When the inverse of the cumulative distribution F(x), $F^{-1}(u)$, is known, then $x = F^{-1}(u)$ is distributed according to f(x).

Example: Exponential distribution

4. MC Methods 4.2 Generators for arbitrary distributions

K. Desch – Statistical methods of data analysis SS10

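$$u = F(x) = \int_{-\infty}^{x} f(t)\,dt$$

$$f(x;\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$

$$u = \int_{0}^{x} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} \quad\Rightarrow\quad x = F^{-1}(u) = -\ln(1-u)/\lambda$$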
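The exponential example can be sketched in Python as follows. This is a minimal illustration and not code from the lecture: the function name `sample_exponential`, the seed, and the value λ = 2 are assumptions chosen for the demonstration.

```python
import math
import random

def sample_exponential(lam, rng):
    """Draw one value from f(x; lam) = lam * exp(-lam * x), x >= 0."""
    u = rng.random()                   # u uniform in [0, 1)
    return -math.log(1.0 - u) / lam    # x = F^{-1}(u)

rng = random.Random(1)  # fixed seed (assumption) so the sketch is reproducible
lam = 2.0               # illustrative rate parameter
xs = [sample_exponential(lam, rng) for _ in range(100_000)]
sample_mean = sum(xs) / len(xs)  # should come out close to 1/lam
```

Since `rng.random()` never returns 1, the argument of the logarithm stays strictly positive.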

b) Transformation method (discrete distributions)

$$P_{k+1} = \sum_{i=1}^{k} P(x_i), \qquad P_1 = 0, \quad P_{n+1} = 1$$
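A common way to use cumulative sums of the probabilities, assumed here because the selection rule is not spelled out on the slide, is to draw u from U(0,1) and select the value x_k whose cumulative interval contains u. The sketch below does this with a binary search; the helper name `make_discrete_sampler` and the three-value distribution are hypothetical.

```python
import bisect
import itertools
import random
from collections import Counter

def make_discrete_sampler(values, probs):
    """Return a function that draws from `values` with probabilities `probs`."""
    # Cumulative sums of the probabilities; the leading 0 is implicit.
    cumulative = list(itertools.accumulate(probs))

    def sample(rng):
        u = rng.random()  # u uniform in [0, 1)
        # Index of the first cumulative sum that exceeds u.
        k = bisect.bisect_right(cumulative, u)
        # Guard against the last cumulative sum rounding to slightly below 1.
        return values[min(k, len(values) - 1)]

    return sample

rng = random.Random(2)  # fixed seed (assumption) for reproducibility
draw = make_discrete_sampler(["a", "b", "c"], [0.2, 0.5, 0.3])
n_draws = 100_000
counts = Counter(draw(rng) for _ in range(n_draws))
frequencies = {value: counts[value] / n_draws for value in "abc"}
```

The observed frequencies should approach the input probabilities as the number of draws grows.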

c) Hit-or-miss method (brute force)

Uniform distribution from 0 to c: ui

Uniform distribution from xmin to xmax: xi

When ui ≤ f(xi), accept xi; otherwise reject it

- two random numbers per try

- inefficient when f(x) « c over most of the range

- need to (conservatively) estimate c (the maximum of f(x))

(can be done in a “warm-up” run)
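A minimal Python sketch of the hit-or-miss generator, under the assumption that f(x) = sin(x) on [0, π] with c = 1 is the target. The function name `hit_or_miss` and the example density are illustrative choices, not taken from the lecture.

```python
import math
import random

def hit_or_miss(f, x_min, x_max, c, rng):
    """Draw one x distributed as f on [x_min, x_max]; c must satisfy c >= max f.

    Returns the accepted x and the number of tries it took.
    """
    tries = 0
    while True:
        tries += 1
        x = rng.uniform(x_min, x_max)  # candidate x_i
        u = rng.uniform(0.0, c)        # height u_i
        if u <= f(x):                  # accept when the point lies under f
            return x, tries

rng = random.Random(3)  # fixed seed (assumption) for reproducibility
accepted, total_tries = [], 0
for _ in range(20_000):
    x, tries = hit_or_miss(math.sin, 0.0, math.pi, 1.0, rng)
    accepted.append(x)
    total_tries += tries

mean_x = sum(accepted) / len(accepted)    # near pi/2 by symmetry of sin on [0, pi]
efficiency = len(accepted) / total_tries  # near 2/pi, the area under f over the box area
```

The efficiency figure makes the cost of the method visible: every rejected try spends two random numbers without producing a value.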

Improvement:

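- search for an analytical function s(x) close to f(x)

- choose c so that c • s(x) ≥ f(x) for all x

$$S(x) = \int_{x_{\min}}^{x} s(t)\,dt, \qquad x_i = S^{-1}(u_i)$$

1. take ui uniformly in [0,1] and calculate $x_i = S^{-1}(u_i)$ (with s(x) normalized so that S(xmax) = 1)

2. take uj uniformly in [0,c]

3. when uj • s(xi) ≤ f(xi), accept xi; otherwise reject it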
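The three steps can be sketched as follows. The target f(x) = x e^{-x}, the exponential envelope s(x) = ½ e^{-x/2}, and c = 1.5 are assumptions picked so that S can be inverted in closed form; none of them come from the lecture.

```python
import math
import random

def f(x):
    """Target density (illustrative): f(x) = x * exp(-x), x >= 0."""
    return x * math.exp(-x)

def s(x):
    """Envelope shape (illustrative): exponential p.d.f. with rate 1/2."""
    return 0.5 * math.exp(-0.5 * x)

def s_inverse_cdf(u):
    """S^{-1}(u) for S(x) = 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - u)

C = 1.5  # c * s(x) >= f(x) for all x, because max f/s = 4/e, about 1.47

def sample_with_envelope(rng):
    """Return one x distributed as f and the number of tries needed."""
    tries = 0
    while True:
        tries += 1
        x = s_inverse_cdf(rng.random())  # step 1: x_i = S^{-1}(u_i)
        u = rng.uniform(0.0, C)          # step 2: u_j in [0, c]
        if u * s(x) <= f(x):             # step 3: accept or reject
            return x, tries

rng = random.Random(4)  # fixed seed (assumption) for reproducibility
xs, total_tries = [], 0
for _ in range(50_000):
    x, tries = sample_with_envelope(rng)
    xs.append(x)
    total_tries += tries

mean_x = sum(xs) / len(xs)          # the mean of this target density is 2
efficiency = len(xs) / total_tries  # about 1/c when f is normalized
```

The closer c • s(x) hugs f(x), the closer the efficiency gets to 1.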

Search for:

$$I = \int_a^b g(x)\,dx$$

4. MC Methods 4.3 Monte Carlo Integration

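Integration over one dimension:

$$I = \int_a^b g(x)\,dx = (b-a)\,\frac{1}{b-a}\int_a^b g(x)\,dx = (b-a)\,E[g]$$

(E[g] = expectation value of g w.r.t. the uniform distribution on [a,b])

Take xi uniformly distributed in [a,b] →

$$I \approx I_{MC} = \frac{b-a}{n}\sum_{i=1}^{n} g(x_i)$$

Variance (with $g_i = g(x_i)$):

$$V[I_{MC}] = \sigma_I^2 = V\!\left[\frac{b-a}{n}\sum_{i=1}^{n} g_i\right] = \left(\frac{b-a}{n}\right)^2 V\!\left[\sum_{i=1}^{n} g_i\right] = \frac{(b-a)^2}{n}\,V[g_i]$$

$$V[g_i] = E[g_i^2] - (E[g_i])^2 \approx \frac{\sum_i g_i^2}{n} - \left(\frac{\sum_i g_i}{n}\right)^2$$

(CLT: for large n, $I_{MC}$ is approximately Gaussian distributed with variance $\sigma_I^2$)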
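A short Python sketch of the uniform-sampling estimate together with its error from the sample variance of g. The integrand sin(x) on [0, π] (exact integral 2) and the function name `mc_integrate` are assumptions made for the illustration.

```python
import math
import random

def mc_integrate(g, a, b, n, rng):
    """Return the MC estimate of the integral of g over [a, b] and its standard deviation."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        gi = g(rng.uniform(a, b))  # g evaluated at a uniform x_i
        total += gi
        total_sq += gi * gi
    mean_g = total / n
    # V[g] = E[g^2] - E[g]^2, clipped at 0 to absorb rounding.
    var_g = max(total_sq / n - mean_g ** 2, 0.0)
    estimate = (b - a) * mean_g
    sigma = (b - a) * math.sqrt(var_g / n)  # sigma_I = (b - a) * sqrt(V[g] / n)
    return estimate, sigma

rng = random.Random(5)  # fixed seed (assumption) for reproducibility
estimate, sigma = mc_integrate(math.sin, 0.0, math.pi, 100_000, rng)
```

Quadrupling n halves `sigma`, which is the 1/√n behaviour implied by the variance formula.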

Alternative: hit-or-miss integration

Variance-reduced methods

a) Importance sampling:

If f(x) is a known p.d.f. that can be integrated and inverted, then:

$$I = \int_a^b g(x)\,dx = \int_a^b \frac{g(x)}{f(x)}\,f(x)\,dx = E_f\!\left[\frac{g(x)}{f(x)}\right] = E_f[r(x)], \qquad r(x) = \frac{g(x)}{f(x)}$$

The expectation value of r(x) can be estimated with random numbers xi distributed according to f(x). For f(x) normalized to 1 on [a,b]:

$$I_{MC} = \frac{1}{n}\sum_{i=1}^{n} \frac{g(x_i)}{f(x_i)}$$

$$V[r_i] = E\left[(r_i - \bar{r})^2\right]$$

- The variance of r(x) is small when r is flat, i.e. when f ≈ g up to normalization

- The method takes care of (integrable) singularities

(find an f(x) which has the same singularity structure as g(x))
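As a hedged illustration of importance sampling with an integrable singularity, the sketch below integrates g(x) = cos(x)/√x over (0, 1] using the p.d.f. f(x) = 1/(2√x), which has the same singularity and is normalized to 1 on that range, so the estimate is the plain average of r = g/f. The integrand, the choice of f, and all function names are assumptions made for this example.

```python
import math
import random

def g(x):
    """Integrand with an integrable singularity at x = 0 (illustrative)."""
    return math.cos(x) / math.sqrt(x)

def f(x):
    """Sampling p.d.f. on (0, 1] with the same singularity as g."""
    return 0.5 / math.sqrt(x)

def f_inverse_cdf(u):
    """F(x) = sqrt(x) on [0, 1], so F^{-1}(u) = u^2."""
    return u * u

def importance_sampling(n, rng):
    """Return the estimate of the integral of g and its standard deviation."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()  # u in (0, 1], which keeps x away from 0
        x = f_inverse_cdf(u)    # x distributed according to f
        r = g(x) / f(x)         # r = g/f is bounded: 2 * cos(x)
        total += r
        total_sq += r * r
    mean_r = total / n
    var_r = max(total_sq / n - mean_r ** 2, 0.0)
    return mean_r, math.sqrt(var_r / n)

rng = random.Random(6)  # fixed seed (assumption) for reproducibility
estimate, sigma = importance_sampling(100_000, rng)
```

Uniform sampling of the same integrand would have infinite variance, because g² is not integrable at 0, while r stays between about 1.08 and 2 here.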

b) Control function

(subtraction of an integrable analytical function)

$$\int g(x)\,dx = \underbrace{\int f(x)\,dx}_{\text{analytical}} + \underbrace{\int \bigl(g(x) - f(x)\bigr)\,dx}_{\text{MC}}$$
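A minimal sketch of the control-function idea, assuming g(x) = e^x on [0, 1] and the control function f(x) = 1 + x, whose integral 3/2 is known analytically. Both functions and the helper name `mc_integrate` are illustrative assumptions.

```python
import math
import random

def mc_integrate(h, a, b, n, rng):
    """Uniform-sampling MC estimate of the integral of h over [a, b], with its sigma."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        hi = h(rng.uniform(a, b))
        total += hi
        total_sq += hi * hi
    mean_h = total / n
    var_h = max(total_sq / n - mean_h ** 2, 0.0)
    return (b - a) * mean_h, (b - a) * math.sqrt(var_h / n)

g = math.exp                 # function to integrate (illustrative)

def f(x):
    """Control function: first two terms of the Taylor series of exp."""
    return 1.0 + x

F_ANALYTICAL = 1.5           # integral of 1 + x over [0, 1]

rng = random.Random(7)       # fixed seed (assumption) for reproducibility
n = 50_000
plain, plain_sigma = mc_integrate(g, 0.0, 1.0, n, rng)
rest, rest_sigma = mc_integrate(lambda x: g(x) - f(x), 0.0, 1.0, n, rng)
controlled = F_ANALYTICAL + rest  # analytical part plus MC part
```

Only the difference g - f is sampled, so the error of `controlled` is `rest_sigma`, which is smaller than `plain_sigma` whenever f follows g closely.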

c) Partitioning

(split the integration range into several regions in which the integrand is “flatter”)
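One simple way to partition, assumed here for illustration, is to cut [a, b] into equal-width regions, integrate each with the same number of points, and add the estimates and their variances. The integrand e^x on [0, 1], the region count, and the function names are hypothetical choices.

```python
import math
import random

def mc_integrate(g, a, b, n, rng):
    """Return the MC estimate of the integral of g over [a, b] and the variance of that estimate."""
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        gi = g(rng.uniform(a, b))
        total += gi
        total_sq += gi * gi
    mean_g = total / n
    var_g = max(total_sq / n - mean_g ** 2, 0.0)
    return (b - a) * mean_g, (b - a) ** 2 * var_g / n

def partitioned_integrate(g, a, b, n_regions, n_per_region, rng):
    """Sum independent MC estimates over equal-width regions."""
    width = (b - a) / n_regions
    estimate = 0.0
    variance = 0.0
    for k in range(n_regions):
        lo = a + k * width
        est_k, var_k = mc_integrate(g, lo, lo + width, n_per_region, rng)
        estimate += est_k
        variance += var_k  # regions are sampled independently
    return estimate, math.sqrt(variance)

rng = random.Random(8)  # fixed seed (assumption) for reproducibility
plain, plain_var = mc_integrate(math.exp, 0.0, 1.0, 10_000, rng)
plain_sigma = math.sqrt(plain_var)
split, split_sigma = partitioned_integrate(math.exp, 0.0, 1.0, 10, 1_000, rng)
```

Both calls use 10,000 points in total; the partitioned estimate has the smaller error because e^x varies much less inside each narrow region than over the whole range.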

Let x be a random variable distributed according to f(x).

n independent “measurements” of x, x = (x1,…,xn), form a sample of size n from the distribution f(x) (the outcome of an experiment).

x = (x1,…,xn) is itself a random variable with p.d.f. fsample(x)

sample space: all possible values of x = (x1,…,xn)

If all xi are independent,

fsample(x) = f(x1)•f(x2)• … •f(xn)

is the p.d.f. for x
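For independent measurements the sample p.d.f. is a plain product, which the following sketch evaluates for an exponential f(x). The choice of distribution, the three measurement values, and the function names are assumptions for illustration only.

```python
import math

def exponential_pdf(x, lam):
    """f(x; lam) = lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def sample_pdf(xs, lam):
    """f_sample(x) = f(x1) * f(x2) * ... * f(xn) for independent x_i."""
    return math.prod(exponential_pdf(x, lam) for x in xs)

sample = [0.3, 1.2, 0.7]               # illustrative measurements
density = sample_pdf(sample, lam=1.0)  # equals exp(-(0.3 + 1.2 + 0.7)) for lam = 1
```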

5. Estimation 5.1 Sample space, Estimators

A central problem of (frequentist) statistics:

Find the properties of f(x) when only a sample x = (x1,…,xn) has been measured

Task: construct functions of the xi to estimate the properties of f(x) (e.g. μ, σ2, …)

Often f depends on parameters θj: f(xi;θj). Try to estimate the parameters θj from the measured sample x.

A function of the xi is called a statistic.

If a statistic is used to estimate parameters (μ, σ2, θ, …), it is called an estimator.

Notation: $\hat{\theta}$ is an estimator for θ

$\hat{\theta}$ can be calculated; the true value θ is unknown

Estimation of p.d.f. parameters is also called a fit

5. Estimation 5.2 Properties of Estimators

1. Consistency:

an estimator is consistent if for each ε > 0:

$$\lim_{n\to\infty} P\left(|\hat{\theta} - \theta| > \varepsilon\right) = 0$$

in simple words: $\hat{\theta} \to \theta$ for n→∞ (or “$\lim_{n\to\infty} \hat{\theta} = \theta$”)

2. Bias:

$\hat{\theta}(x_1,\dots,x_n)$ is itself a random variable, distributed according to a p.d.f. $g(\hat{\theta};\theta)$

This p.d.f. is called the sampling distribution

Expectation value of the sampling distribution:

$$E[\hat{\theta}(x)] = \int \hat{\theta}\, g(\hat{\theta};\theta)\, d\hat{\theta} = \int\!\cdots\!\int \hat{\theta}(x)\, f(x_1;\theta)\cdots f(x_n;\theta)\, dx_1 \cdots dx_n$$

because

$$g(\hat{\theta}(x_1,\dots,x_n))\, d\hat{\theta} = \prod_{i=1}^{n} f(x_i)\, dx_i$$

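The bias of an estimator is defined as

$$b = E[\hat{\theta}] - \theta$$

An estimator is unbiased (or bias-free) if b = 0, i.e. $E[\hat{\theta}] = \theta$

An estimator is asymptotically unbiased if

$$\lim_{n\to\infty} b = 0$$

Attention: consistency refers to the limit of large sample size; unbiasedness refers to a fixed sample size

3. Efficiency:

One estimator is more efficient than another if its variance is smaller,

or more precise if its mean squared error (MSE) is smaller

$$\mathrm{MSE} = E\left[(\hat{\theta} - \theta)^2\right] = V[\hat{\theta}] + b^2$$

because

$$E\left[(\hat{\theta} - \theta)^2\right] = E[\hat{\theta}^2] - 2\theta E[\hat{\theta}] + \theta^2 = E[\hat{\theta}^2] + b^2 - (E[\hat{\theta}])^2 = V[\hat{\theta}] + b^2$$

and

$$b^2 = (E[\hat{\theta}] - \theta)^2 = (E[\hat{\theta}])^2 - 2\theta E[\hat{\theta}] + \theta^2$$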

4. Robustness

An estimator is robust if it does not strongly depend on single measurements (which might be systematically wrong)

5. Simplicity

(subjective)

5. Estimation 5.3 Estimation of the mean

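In principle one can construct an arbitrary number of different estimators for the mean value of a p.d.f., μ = E[x]

Examples:

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{(mean of the sample)}$$

$$\hat{\mu} = \frac{1}{10}\sum_{i=1}^{10} x_i \qquad \text{(mean of the first ten members of the sample)}$$

$$\hat{\mu} = \frac{1}{n-1}\sum_{i=1}^{n} x_i$$

$$\hat{\mu} = 42$$

$$\hat{\mu} = \text{median of the sample}$$

$$\hat{\mu} = \frac{x_{\max} + x_{\min}}{2}$$

all have different (wanted and unwanted) properties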
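The properties of such estimators can be explored by simulation. The sketch below applies six example estimators to repeated Gaussian samples and records bias, variance, and MSE for each; the Gaussian parent distribution, μ = 5, σ = 2, n = 50, and the function names are assumptions made for the illustration.

```python
import random
import statistics

def estimators(xs):
    """Six candidate estimators for the mean, evaluated on one sample."""
    n = len(xs)
    return {
        "sample mean": sum(xs) / n,
        "first ten": sum(xs[:10]) / 10,
        "sum/(n-1)": sum(xs) / (n - 1),
        "constant 42": 42.0,
        "median": statistics.median(xs),
        "midrange": (max(xs) + min(xs)) / 2,
    }

def study(mu, sigma, n, repetitions, rng):
    """Return {name: (bias, variance, mse)} estimated from repeated samples."""
    results = {}
    for _ in range(repetitions):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        for name, value in estimators(xs).items():
            results.setdefault(name, []).append(value)
    summary = {}
    for name, values in results.items():
        bias = statistics.fmean(values) - mu
        variance = statistics.pvariance(values)
        mse = statistics.fmean((v - mu) ** 2 for v in values)
        summary[name] = (bias, variance, mse)
    return summary

rng = random.Random(9)  # fixed seed (assumption) for reproducibility
summary = study(mu=5.0, sigma=2.0, n=50, repetitions=2_000, rng=rng)
```

In such a run the constant estimator has zero variance but a large bias, the mean of the first ten members is unbiased but has a larger variance than the full sample mean, and for every estimator the MSE equals variance plus squared bias.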

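The mean of a sample provides an estimate of the true mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

a) $\bar{x}$ is consistent:

CLT: the p.d.f. of $\bar{x}$ approaches a Gaussian with variance

$$\sigma_{\bar{x}}^2 = \frac{1}{n}\,\sigma_x^2 \to 0 \quad \text{for } n \to \infty$$

b) $\bar{x}$ is unbiased:

$$E[\bar{x}] = E\left[\frac{1}{n}\sum_{i} x_i\right] = \frac{1}{n}\,(n\mu) = \mu$$

c) Is $\bar{x}$ efficient?

$$V[\hat{\theta}] = V[\bar{x}] = E\left[(\bar{x} - E[\bar{x}])^2\right] = E\left[\left(\frac{\sum_i x_i}{n} - \mu\right)^2\right]$$

$$= \frac{1}{n^2}\,E\left[\Bigl(\sum_i (x_i - \mu)\Bigr)^2\right] = \frac{1}{n^2}\sum_i E\left[(x_i - \mu)^2\right] = \frac{1}{n^2}\,n\,V[x] = \frac{\sigma^2}{n}$$

using cov(x_i, x_j) = 0 for i ≠ j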
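The σ²/n behaviour of the variance of the sample mean can be checked numerically. The sketch below uses uniform random numbers on [0, 1), whose variance is 1/12; the parent distribution, the sample sizes, and the function name are assumptions made for the illustration.

```python
import random
import statistics

def variance_of_sample_mean(n, repetitions, rng):
    """Empirical variance of the mean of n uniform(0, 1) numbers."""
    means = [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(repetitions)
    ]
    return statistics.pvariance(means)

rng = random.Random(10)  # fixed seed (assumption) for reproducibility
v10 = variance_of_sample_mean(10, 4_000, rng)    # expected near (1/12) / 10
v100 = variance_of_sample_mean(100, 4_000, rng)  # expected near (1/12) / 100
```

Increasing the sample size by a factor of ten reduces the variance of the mean by roughly the same factor, which is the consistency statement in numerical form.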
