The PDF Estimation Problem

Scientific Computing and Numerical Analysis Seminar

October 5, 2010

Outline

- The Big Picture
- Basic Probability Theory
- Hermite Polynomial Interpolation
- Histogram Interpolation
- Kernel Estimation
- Data Regeneration


The Big Picture

Continuum-Microscopic (CM) Method Steps

1. Create a microscopic system
2. Run the microscopic updating scheme for a small number of time steps
3. Average the results and send these values to the macro-scale
4. Run the macroscopic updating scheme


The Big Picture

- Goal: Perform Step 1 of the CM Algorithm by utilizing past information from the micro-scale
- Track the evolution of the microscopic variables by tracking their probability distribution functions (PDFs)
- Use these PDFs to predict the PDF of each variable at the desired future point in time


Probability Theory

The Probability Distribution Function (PDF)

- A random variable $X$ is defined by its set of possible values $\Omega$ and its probability distribution function $f(X)$
- The probability that $X$ takes on a value between $x$ and $x + dx$ is given by

$$\int_{x}^{x+dx} f(x)\,dx$$

- $f(x)$ is such that its integral normalizes to 1:

$$\int_{\Omega} f(X)\,dX = 1$$


Probability Theory

Expectation

- The expected value (or mean) of a probability distribution function is given by:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

- More generally, the expectation of any function $g(X)$ of a random variable with PDF $f(X)$ is given by:

$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

- If $f(X)$ is unknown, the expectation can be approximated by taking the average over the given data values $X_1, \dots, X_N$:

$$E(g(X)) \approx \frac{\sum_{i=1}^{N} g(X_i)}{N}$$
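The sample-average approximation can be sketched in a few lines of Python. This is only an illustration: the helper name `sample_expectation`, the seed, and the choice of 20000 standard normal draws are assumptions made here, not part of the talk.

```python
import random

def sample_expectation(g, samples):
    """Approximate E(g(X)) by averaging g over the observed data values."""
    return sum(g(x) for x in samples) / len(samples)

# Illustrative data (assumption): 20000 draws from a standard normal, fixed seed.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# g(x) = x estimates the mean; g(x) = x^2 estimates the second moment.
mean_estimate = sample_expectation(lambda x: x, data)
second_moment_estimate = sample_expectation(lambda x: x * x, data)
```

For standard normal data the two estimates should land near 0 and 1 respectively, with an error that shrinks roughly like $1/\sqrt{N}$.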


Probability Theory

The Cumulative Distribution Function (CDF)

- The CDF is defined as:

$$F(x) = \int_{-\infty}^{x} f(X)\,dX$$

- In words, $F(x)$ represents the probability that $X$ takes on a value between $-\infty$ and $x$
- The CDF will be useful for Data Regeneration


Probability Theory

Joint Probability Distribution Function (JPDF)

- Given random variables $X_1, X_2, \dots, X_N$, the JPDF $f(X_1, X_2, \dots, X_N)$ is interpreted as follows: the probability that $X_1 \in (x_1, x_1 + dx_1)$, $X_2 \in (x_2, x_2 + dx_2)$, ..., $X_N \in (x_N, x_N + dx_N)$ is given by:

$$\int_{x_1}^{x_1+dx_1} \int_{x_2}^{x_2+dx_2} \cdots \int_{x_N}^{x_N+dx_N} f(x_1, x_2, \dots, x_N)\,dx_1\,dx_2 \cdots dx_N$$

- The JPDF can be written as a product of single-variable PDFs, $f(x_1)\,f(x_2) \cdots f(x_N)$, if the variables are independent


Probability Theory

Common Distribution Functions

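- Uniform Distribution:

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b, \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$$

- Normal Distribution:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: plots of the uniform and normal distributions]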
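As a quick check of the two formulas, the sketch below codes both densities and integrates them numerically to confirm they normalize to 1. The function names and the midpoint-rule helper are assumptions introduced for illustration.

```python
import math

def uniform_pdf(x, a, b):
    """Uniform density: 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma * sigma)

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * dx) for i in range(steps)) * dx

# Both areas should be very close to 1.
uniform_area = midpoint_integral(lambda x: uniform_pdf(x, 1.0, 3.0), 0.0, 4.0)
normal_area = midpoint_integral(normal_pdf, -10.0, 10.0)
```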


Probability Theory

The PDF Estimation Problem

- Classic problem of Probability Theory
- Given a set of data, the goal is to determine the PDF $f(X)$ that produced that data
- Common techniques: Series Expansions, Histogram Interpolation, and Kernel Estimation


Probability Theory

The PDF Estimation Problem

- Each technique will be tested on a set of data produced by a normal distribution with mean = 0 and standard deviation = 1


Probability Theory

Error Estimation

- Error for each technique will be estimated by computing the Mean Square Error (MSE):

$$E\big((f(x) - \hat{f}(x))^2\big)$$

- $E$ indicates expectation, or average:

$$E\big((f(x) - \hat{f}(x))^2\big) \approx \frac{1}{n} \sum_{i=1}^{n} \big(f(x_i) - \hat{f}(x_i)\big)^2$$
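A minimal sketch of the MSE computation follows. The slides do not say at which points $x_i$ the error is evaluated, so the evenly spaced grid on $[-3, 3]$ is an assumption, as are the function names and the deliberately wrong "estimate" (a normal density with standard deviation 1.2).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma * sigma)

def mean_square_error(f_true, f_est, points):
    """Average of (f(x_i) - f_hat(x_i))^2 over the evaluation points."""
    return sum((f_true(x) - f_est(x)) ** 2 for x in points) / len(points)

# Assumed evaluation grid: 61 evenly spaced points on [-3, 3].
points = [-3.0 + 0.1 * i for i in range(61)]

# Hypothetical estimate: the right shape but a slightly wrong spread.
error = mean_square_error(normal_pdf, lambda x: normal_pdf(x, 0.0, 1.2), points)
```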


Hermite Polynomial Expansion

- Goal is to estimate the underlying PDF $f(x)$
- $f(x)$ could be approximated by a truncated series expansion:

$$f(x) \approx \sum_{n=0}^{N} c_n H_n(x)$$

  where $c_n$ are coefficients and $H_n(x)$ are a set of basis functions
- For this demonstration, we choose $H_n(x)$ to be the orthogonal Hermite polynomials



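- The Hermite polynomials are defined as:

$$H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}$$

- The Hermite polynomials are orthogonal on $(-\infty, \infty)$ with respect to the weight $e^{-x^2}$, meaning:

$$\int_{-\infty}^{\infty} H_m(x) H_n(x) e^{-x^2}\,dx = \begin{cases} 0 & \text{if } m \ne n, \\ n!\,2^n \sqrt{\pi} & \text{if } m = n \end{cases}$$

- The orthogonality of $H_n(x)$ will allow for easy computation of the $c_n$ coefficients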
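The orthogonality relation can be checked numerically. The sketch below evaluates $H_n(x)$ with the standard three-term recurrence $H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)$ (which is equivalent to the derivative definition, but does not appear in the slides) and approximates the weighted integral with a midpoint rule on $[-10, 10]$. The function names and the integration range are assumptions.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h_curr = 1.0, 2.0 * x  # H_0 and H_1
    if n == 0:
        return h_prev
    for k in range(1, n):
        # H_{k+1} = 2x H_k - 2k H_{k-1}
        h_prev, h_curr = h_curr, 2.0 * x * h_curr - 2.0 * k * h_prev
    return h_curr

def weighted_inner_product(m, n, lo=-10.0, hi=10.0, steps=20000):
    """Midpoint-rule approximation of the integral of H_m H_n exp(-x^2)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += hermite(m, x) * hermite(n, x) * math.exp(-x * x)
    return total * dx

off_diagonal = weighted_inner_product(2, 3)  # expected to be near 0
diagonal = weighted_inner_product(2, 2)      # expected near 2! * 2^2 * sqrt(pi)
```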


Hermite Polynomial Interpolation

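Start from the truncated expansion, multiply both sides by $H_m(x) e^{-x^2}$, integrate over $(-\infty, \infty)$, and exchange the sum and the integral:

$$\hat{f}(x) = \sum_{n=0}^{N} c_n H_n(x)$$

$$\hat{f}(x) H_m(x) e^{-x^2} = \sum_{n=0}^{N} c_n H_n(x) H_m(x) e^{-x^2}$$

$$\int_{-\infty}^{\infty} \hat{f}(x) H_m(x) e^{-x^2}\,dx = \int_{-\infty}^{\infty} \sum_{n=0}^{N} c_n H_n(x) H_m(x) e^{-x^2}\,dx$$

$$\int_{-\infty}^{\infty} \hat{f}(x) H_m(x) e^{-x^2}\,dx = \sum_{n=0}^{N} \int_{-\infty}^{\infty} c_n H_n(x) H_m(x) e^{-x^2}\,dx$$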



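By orthogonality, only the $n = m$ term of the sum survives:

$$\int_{-\infty}^{\infty} \hat{f}(x) H_n(x) e^{-x^2}\,dx = c_n\, n!\, 2^n \sqrt{\pi}$$

$$c_n = \frac{1}{n!\, 2^n \sqrt{\pi}} \int_{-\infty}^{\infty} \hat{f}(x) H_n(x) e^{-x^2}\,dx$$

$$c_n = \frac{1}{n!\, 2^n \sqrt{\pi}}\, E\big(H_n(x) e^{-x^2}\big)$$

$$c_n \approx \frac{1}{n!\, 2^n \sqrt{\pi}} \cdot \frac{\sum_{i=1}^{N} H_n(x_i) e^{-x_i^2}}{N}$$

In the last line the expectation is replaced by the average over the data points $x_i$; there $N$ counts the data points rather than the number of terms in the expansion.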
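The coefficient formula can be turned into a short sketch: estimate each $c_n$ as a sample average of $H_n(x_i) e^{-x_i^2}$ and then evaluate the truncated series. The function names, the seed, the sample size of 10000, and the truncation order of 6 are assumptions for illustration; the Hermite polynomials are evaluated with the standard recurrence $H_{k+1} = 2x H_k - 2k H_{k-1}$.

```python
import math
import random

def hermite_values(n_max, x):
    """Return [H_0(x), ..., H_{n_max}(x)] using H_{k+1} = 2x H_k - 2k H_{k-1}."""
    values = [1.0, 2.0 * x]
    for k in range(1, n_max):
        values.append(2.0 * x * values[k] - 2.0 * k * values[k - 1])
    return values[: n_max + 1]

def hermite_coefficients(samples, n_terms):
    """Estimate c_0..c_{n_terms} as sample averages of H_n(x_i) exp(-x_i^2)."""
    sums = [0.0] * (n_terms + 1)
    for x in samples:
        weight = math.exp(-x * x)
        for n, h in enumerate(hermite_values(n_terms, x)):
            sums[n] += h * weight
    count = len(samples)
    return [
        s / count / (math.factorial(n) * 2.0 ** n * math.sqrt(math.pi))
        for n, s in enumerate(sums)
    ]

def hermite_estimate(coeffs, x):
    """Evaluate the truncated series sum_n c_n H_n(x)."""
    return sum(c * h for c, h in zip(coeffs, hermite_values(len(coeffs) - 1, x)))

# Assumed test data: standard normal draws with a fixed seed.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(10000)]
coeffs = hermite_coefficients(data, 6)
estimate_at_zero = hermite_estimate(coeffs, 0.0)  # true density there is about 0.3989
```

Because every $c_n$ carries sampling noise that is multiplied by a rapidly growing $H_n(x)$, adding more terms does not necessarily reduce the error, especially away from the center of the data.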



[Figure: results for different numbers of terms in the expansion: 6, 10, 20, and 40 terms]



| Number of Terms | MSE     |
|-----------------|---------|
| 6               | 0.03374 |
| 10              | 0.00109 |
| 20              | 0.00291 |
| 40              | 0.01914 |

- $\hat{f}(x)$ performs poorly at the edges of the domain
- Errors due to truncation of terms
- Approximation theory says the error between $f(x)$ and $\sum_{n=0}^{N} c_n H_n(x)$ should decrease as $N$ increases if the $c_n$ are computed exactly


Histogram Interpolation

- One of the oldest, most common PDF estimation techniques
- First step is to establish the bins into which data will be sorted
- Given a starting point $x_0$ and bin width $h$, the bins can be established as:

$$[x_0 + mh,\; x_0 + (m+1)h) \quad \text{for integers } m$$

- The histogram is then defined as:

$$\hat{f}(x) = \frac{1}{nh}\,(\text{No. of } X_i \text{ in same bin as } x)$$
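The histogram definition translates almost directly into code. In this sketch the bin of a value is identified by the integer $m = \lfloor (x - x_0)/h \rfloor$, which treats bins as closed on the left and open on the right so that every point belongs to exactly one bin; that convention and the function name are assumptions.

```python
import math

def histogram_estimate(x, samples, x0, h):
    """f_hat(x) = (number of X_i in the same bin as x) / (n * h)."""
    bin_index = math.floor((x - x0) / h)
    count = sum(1 for xi in samples if math.floor((xi - x0) / h) == bin_index)
    return count / (len(samples) * h)

# Tiny illustrative data set with bins [0, 0.5), [0.5, 1.0), ...
samples = [0.1, 0.2, 0.7, 1.5]
density_first_bin = histogram_estimate(0.3, samples, x0=0.0, h=0.5)   # 2 / (4 * 0.5)
density_second_bin = histogram_estimate(0.6, samples, x0=0.0, h=0.5)  # 1 / (4 * 0.5)
density_empty_bin = histogram_estimate(1.2, samples, x0=0.0, h=0.5)   # no data in bin
```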



- $\hat{f}(x)$ is a piecewise constant estimate of the underlying PDF $f(x)$
- If a continuous function approximation is needed, $\hat{f}(x)$ can be interpolated (e.g. splines)
- Choice of bin endpoints and width will create different results
- Wide bins: Smooth and blur details in data
- Narrow bins: Not enough data per bin, resulting approximation very spiky



[Figure: results for different bin widths: h = 0.8, 0.5, 0.2, and 0.05]



| Bin Width | MSE      |
|-----------|----------|
| 0.8       | 3.045e-5 |
| 0.5       | 1.589e-5 |
| 0.2       | 3.753e-5 |
| 0.05      | 2.094e-4 |

- Optimal bin width can be found by solving an error minimization problem (provided $f(X)$ is known)
- Formulas exist to estimate optimal bin width for data that is close to normally distributed
- Example: Sturges' formula: $k = \log_2 n + 1$, where $k$ is the number of bins
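Sturges' formula gives a bin count rather than a width, so a small sketch helps connect the two. Rounding $\log_2 n$ up to an integer and spreading the $k$ bins evenly over the data range are common conventions assumed here; the slides state neither, and the function names are hypothetical.

```python
import math

def sturges_bin_count(n):
    """Number of bins k = ceil(log2(n)) + 1 (rounding up is an assumption)."""
    return math.ceil(math.log2(n)) + 1

def sturges_bin_width(samples):
    """Bin width obtained by dividing the data range into k equal bins."""
    k = sturges_bin_count(len(samples))
    return (max(samples) - min(samples)) / k

bins_for_1000_points = sturges_bin_count(1000)
width_for_small_set = sturges_bin_width([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```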


Kernel Estimation

- Another very popular PDF estimation technique
- Similar to histograms, but instead of creating separate bins into which data is collected and counted, $\hat{f}(X)$ is computed as a sum of functions centered at each data point:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

- $h$ is still a "width" parameter, and $n$ is the number of data points $X_i$


Kernel Estimation

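- The "kernel" function $K$ is usually a symmetric probability distribution function, like a normal distribution:

$$K\left(\frac{x - X_i}{h}\right) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - X_i}{h}\right)^2}$$

- Combined with the $1/(nh)$ prefactor, each term is a normal density with mean $X_i$ and standard deviation $h$, so $\hat{f}(X)$ is a sum of normal distributions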
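A minimal sketch of the kernel estimate with a normal kernel follows. It uses the standard normal density $K(u) = e^{-u^2/2}/\sqrt{2\pi}$, so that with the $1/(nh)$ prefactor the estimate integrates to 1. The function names and the three-point data set are assumptions for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_estimate(x, samples, h):
    """f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (len(samples) * h)

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * dx) for i in range(steps)) * dx

# With a single data point at 0 and h = 1 the estimate is the standard normal PDF.
single_point_value = kernel_estimate(0.0, [0.0], 1.0)

# The estimate should integrate to 1 for any data set and any h > 0.
samples = [-1.0, 0.0, 2.0]
total_area = midpoint_integral(lambda x: kernel_estimate(x, samples, 0.5), -8.0, 8.0)
```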


Kernel Estimation

- $\hat{f}(X)$ is a smooth, differentiable function
- Do not need to choose where to center bins; here bins are centered at each data point and overlap with one another
- As in histogram interpolation, there are various methods for choosing $h$
- Methods include: Minimization of mean square error (which requires knowledge of $f(X)$) and others such as least squares cross-validation
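To make least squares cross-validation concrete, here is a hedged sketch for a normal kernel. It scores each candidate $h$ by $\int \hat{f}^2\,dx - \frac{2}{n}\sum_i \hat{f}_{-i}(X_i)$, where $\hat{f}_{-i}$ is the estimate built without $X_i$, and picks the smallest score. For normal kernels the integral has the closed form $\frac{1}{n^2}\sum_{i,j}\varphi(X_i - X_j;\, h\sqrt{2})$, with $\varphi(d; s)$ the normal density with standard deviation $s$. The function names, candidate grid, seed, and sample size are assumptions, and the talk does not say which variant it has in mind.

```python
import math
import random

def normal_density(d, s):
    """Normal density with mean 0 and standard deviation s, evaluated at d."""
    return math.exp(-0.5 * (d / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def lscv_score(samples, h):
    """Least squares cross-validation score for a normal kernel of width h."""
    n = len(samples)
    integral_of_square = 0.0  # accumulates the closed form of int f_hat^2
    leave_one_out = 0.0       # accumulates sum_i sum_{j != i} K_h(X_i - X_j)
    for i, xi in enumerate(samples):
        for j, xj in enumerate(samples):
            d = xi - xj
            integral_of_square += normal_density(d, h * math.sqrt(2.0))
            if i != j:
                leave_one_out += normal_density(d, h)
    return integral_of_square / n ** 2 - 2.0 * leave_one_out / (n * (n - 1))

def select_bandwidth(samples, candidates):
    """Return the candidate h with the smallest cross-validation score."""
    return min(candidates, key=lambda h: lscv_score(samples, h))

# Assumed test data: 200 standard normal draws, and a coarse candidate grid.
rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
candidates = [0.01, 0.2, 0.4, 0.8, 3.0]
best_h = select_bandwidth(data, candidates)
```

The double loop costs $O(n^2)$ per candidate, which is fine for a few hundred points but would need a faster scheme for large data sets.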


Kernel Estimation

[Figure: results for the test case with different h values: h = 0.5, 0.2, 0.1, and 0.05]

Kernel Estimation

| Width h | MSE      |
|---------|----------|
| 0.5     | 2.983e-4 |
| 0.2     | 1.896e-5 |
| 0.1     | 2.345e-5 |
| 0.05    | 5.110e-5 |

- The estimated function $\hat{f}(X)$ is a smooth function, only requires a choice of the width $h$, and has low errors
- Conclusions: Kernel Estimation will be used to estimate the PDFs of data from microscopic variables in the CM model


Data Regeneration

- Given a PDF, we need to generate a set of data from it, to assign to the elements or particles in the micro-system
- This is done by computing the cumulative distribution function $F(x) = \int_{-\infty}^{x} f(X)\,dX$
- The values of $F(x)$ range from 0 to 1
- A random number generator is used to pick a value $c \in [0, 1]$
- A root-finding algorithm is then used to solve $F(x) - c = 0$ for $x$ (the desired data point)
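The regeneration steps can be sketched as follows, using bisection as the root-finder and the standard normal CDF (written with `math.erf`) as a stand-in for the CDF of an estimated PDF. The talk does not name a root-finding method, so bisection, the bracket $[-10, 10]$, the tolerance, and all function names are assumptions.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, used here as an example F(x)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def invert_cdf(cdf, c, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve cdf(x) - c = 0 by bisection, assuming cdf is nondecreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < c:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)

def regenerate(cdf, count, rng):
    """Draw `count` data points by inverting the CDF at uniform random values c."""
    return [invert_cdf(cdf, rng.random()) for _ in range(count)]

rng = random.Random(2)
new_data = regenerate(normal_cdf, 2000, rng)
sample_mean = sum(new_data) / len(new_data)
sample_variance = sum((x - sample_mean) ** 2 for x in new_data) / len(new_data)
```

The regenerated points should reproduce the statistics of the source distribution; here the sample mean and variance should come out near 0 and 1.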


Data Regeneration

PDF → CDF → Data Set


Summary

- Kernel Estimation will be used to estimate PDFs of various variables of the microscopic system
- These PDFs will be collected over time during the microscopic evolution
- A new PDF at the desired future point in time will be extrapolated from these saved PDFs
- A new micro-system will be created at the future point in time based on these predicted PDFs


References

Silverman, B. W. "Density Estimation for Statistics and Data Analysis", Chapman and Hall, 1986.


Seminar Speakers

We need volunteers to give a talk at this seminar for the following dates: October 20, 27 and November 3, 10

