
The PDF Estimation Problem

Scientific Computing and Numerical Analysis Seminar

October 5, 2010


Outline

The Big Picture
Basic Probability Theory
Hermite Polynomial Interpolation
Histogram Interpolation
Kernel Estimation
Data Regeneration


The Big Picture

Continuum-Microscopic (CM) Method Steps
1. Create a microscopic system
2. Run the microscopic updating scheme for a small number of time steps
3. Average the results and send these values to the macro-scale
4. Run the macroscopic updating scheme


The Big Picture

Goal: Perform Step 1 of the CM Algorithm by utilizing past information from the micro-scale
Track the evolution of the microscopic variables by tracking their probability distribution functions (PDFs)
Use these PDFs to predict the PDF of each variable at the desired future point in time


Probability Theory

The Probability Distribution Function (PDF)
A random variable $X$ is defined by its set of possible values $\Omega$ and its probability distribution function $f(X)$
The probability that $X$ takes on a value between $x$ and $x + dx$ is given by

$$\int_{x}^{x+dx} f(X)\,dX$$

$f(X)$ is such that its integral normalizes to 1:

$$\int_{\Omega} f(X)\,dX = 1$$


Probability Theory

Expectation
The expected value (or mean) of a random variable $X$ with probability distribution function $f(X)$ is given by:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

More generally, the expectation of any function $g(X)$ of a random variable with PDF $f(X)$ is given by:

$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$

If $f(X)$ is unknown, the expectation can be approximated by taking the average over the given data values $X_1, \dots, X_N$:

$$E(g(X)) \approx \frac{\sum_{i=1}^{N} g(X_i)}{N}$$
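The sample-average approximation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_expectation`, the seed, and the sample size are not from the talk, and the data are drawn from the standard normal test distribution.

```python
import random


def sample_expectation(g, samples):
    """Approximate E(g(X)) by averaging g over the given data values."""
    return sum(g(x) for x in samples) / len(samples)


# Assumed test data: 50,000 draws from a normal distribution (mean 0, std 1).
rng = random.Random(0)  # fixed seed, chosen only for repeatability
data = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

mean_estimate = sample_expectation(lambda x: x, data)      # exact E(X) is 0
second_moment = sample_expectation(lambda x: x * x, data)  # exact E(X^2) is 1
print(mean_estimate, second_moment)
```

Both averages should land close to the exact values, with an error that shrinks roughly like $1/\sqrt{N}$.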


Probability Theory

The Cumulative Distribution Function (CDF)
The CDF is defined as:

$$F(x) = \int_{-\infty}^{x} f(X)\,dX$$

In words, $F(x)$ represents the probability that $X$ takes on a value between $-\infty$ and $x$
The CDF will be useful for Data Regeneration


Probability Theory

Joint Probability Distribution Function (JPDF)
Given random variables $X_1, X_2, \dots, X_N$, the JPDF $f(X_1, X_2, \dots, X_N)$ is defined so that the probability that $X_1 \in (x_1, x_1 + dx_1)$, $X_2 \in (x_2, x_2 + dx_2)$, ..., $X_N \in (x_N, x_N + dx_N)$ is given by:

$$\int_{x_1}^{x_1+dx_1} \int_{x_2}^{x_2+dx_2} \cdots \int_{x_N}^{x_N+dx_N} f(X_1, X_2, \dots, X_N)\,dX_1\,dX_2 \cdots dX_N$$

The JPDF can be written as a product of single-variable PDFs, $f(x_1)\,f(x_2) \cdots f(x_N)$, if the variables are independent


Probability Theory

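Common Distribution Functions

Uniform Distribution:

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b, \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$$

Normal Distribution:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: plots of the Uniform and Normal distributions]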


Probability Theory

The PDF Estimation Problem
Classic problem of Probability Theory
Given a set of data, the goal is to determine the PDF $f(X)$ that produced that data
Common techniques: Series Expansions, Histogram Interpolation, and Kernel Estimation


Probability Theory

The PDF Estimation Problem
Each technique will be tested on a set of data produced by a normal distribution with mean = 0 and standard deviation = 1


Probability Theory

Error Estimation
Error for each technique will be estimated by computing the Mean Square Error (MSE):

$$E\left((f(x) - \hat{f}(x))^2\right)$$

$E$ indicates Expectation or average:

$$E\left((f(x) - \hat{f}(x))^2\right) \approx \frac{1}{n} \sum_{i=1}^{n} \left(f(x_i) - \hat{f}(x_i)\right)^2$$
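A minimal sketch of the MSE computation follows. The talk does not say which points $x_i$ the error is evaluated at, so the evenly spaced grid on $[-4, 4]$ is an assumption, as are the helper names and the deliberately mismatched stand-in estimate.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal PDF with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def mean_square_error(f, f_hat, points):
    """Average of (f(x_i) - f_hat(x_i))^2 over the evaluation points."""
    return sum((f(x) - f_hat(x)) ** 2 for x in points) / len(points)


# Assumed evaluation points: 201 evenly spaced values on [-4, 4].
grid = [-4.0 + 8.0 * i / 200 for i in range(201)]


# Hypothetical estimate used only to exercise the function:
# a normal PDF whose standard deviation is slightly wrong.
def f_hat(x):
    return normal_pdf(x, sigma=1.1)


mse = mean_square_error(normal_pdf, f_hat, grid)
print(mse)
```

Any of the estimators in this talk can be passed as `f_hat`, provided it is a function of a single point `x`.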


Hermite Polynomial Expansion

Goal is to estimate the underlying PDF $f(x)$
$f(x)$ could be approximated by a truncated series expansion:

$$f(x) \approx \sum_{n=0}^{N} c_n H_n(x)$$

where $c_n$ are coefficients and $H_n(x)$ are a set of basis functions
For this demonstration, we choose $H_n(x)$ to be the orthogonal Hermite polynomials


Hermite Polynomial Expansion

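The Hermite polynomials are defined as:

$$H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2}$$

The Hermite polynomials are orthogonal on $(-\infty, \infty)$ with respect to the weight $e^{-x^2}$, meaning:

$$\int_{-\infty}^{\infty} H_m(x) H_n(x) e^{-x^2}\,dx = \begin{cases} 0 & \text{if } m \ne n \\ n!\,2^n \sqrt{\pi} & \text{if } m = n \end{cases}$$

The orthogonality of $H_n(x)$ will allow for easy computation of the $c_n$ coefficients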
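The orthogonality relation can be checked numerically. The sketch below evaluates $H_n(x)$ with the standard three-term recurrence $H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)$ rather than the derivative formula, and approximates the weighted integral with a trapezoid rule on $[-10, 10]$. The function names, the integration interval, and the step count are illustrative assumptions.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x  # H_0 and H_1
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def weighted_inner_product(m, n, lo=-10.0, hi=10.0, steps=20_000):
    """Trapezoid approximation of the integral of H_m(x) H_n(x) exp(-x^2)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * hermite(m, x) * hermite(n, x) * math.exp(-x * x)
    return total * dx


off_diagonal = weighted_inner_product(2, 3)  # should be close to 0
diagonal = weighted_inner_product(3, 3)      # should be close to 3! * 2^3 * sqrt(pi)
print(off_diagonal, diagonal, math.factorial(3) * 2 ** 3 * math.sqrt(math.pi))
```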


Hermite Polynomial Interpolation

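$$\hat{f}(x) = \sum_{n=0}^{N} c_n H_n(x)$$

$$\hat{f}(x) H_m(x) e^{-x^2} = \sum_{n=0}^{N} c_n H_n(x) H_m(x) e^{-x^2}$$

$$\int_{-\infty}^{\infty} \hat{f}(x) H_m(x) e^{-x^2}\,dx = \int_{-\infty}^{\infty} \sum_{n=0}^{N} c_n H_n(x) H_m(x) e^{-x^2}\,dx$$

$$\int_{-\infty}^{\infty} \hat{f}(x) H_m(x) e^{-x^2}\,dx = \sum_{n=0}^{N} \int_{-\infty}^{\infty} c_n H_n(x) H_m(x) e^{-x^2}\,dx$$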


Hermite Polynomial Expansion

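By orthogonality, only the $m = n$ term of the sum survives:

$$\int_{-\infty}^{\infty} \hat{f}(x) H_n(x) e^{-x^2}\,dx = c_n\, n!\, 2^n \sqrt{\pi}$$

$$c_n = \frac{1}{n!\, 2^n \sqrt{\pi}} \int_{-\infty}^{\infty} \hat{f}(x) H_n(x) e^{-x^2}\,dx$$

$$c_n = \frac{1}{n!\, 2^n \sqrt{\pi}}\, E\left(H_n(x) e^{-x^2}\right)$$

$$c_n \approx \frac{1}{n!\, 2^n \sqrt{\pi}} \cdot \frac{\sum_{i=1}^{N} H_n(x_i) e^{-x_i^2}}{N}$$

In the last line, $N$ is the number of data values $x_i$, not the number of terms in the expansion.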
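The coefficient formula can be sketched directly from data. This is an illustration rather than the code used for the talk: the names `hermite_coefficients`, `hermite_estimate`, and `max_order`, the seed, and the sample size are assumptions. For the standard normal test case the exact value of $c_0$ is $1/\sqrt{3\pi}$, which gives a simple sanity check.

```python
import math
import random


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def hermite_coefficients(data, max_order):
    """Estimate c_n for n = 0..max_order as a sample average over the data."""
    coeffs = []
    for n in range(max_order + 1):
        average = sum(hermite(n, x) * math.exp(-x * x) for x in data) / len(data)
        coeffs.append(average / (math.factorial(n) * 2 ** n * math.sqrt(math.pi)))
    return coeffs


def hermite_estimate(coeffs, x):
    """Evaluate the truncated expansion sum_n c_n H_n(x)."""
    return sum(c * hermite(n, x) for n, c in enumerate(coeffs))


# Assumed test data: 20,000 draws from a normal distribution (mean 0, std 1).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
coeffs = hermite_coefficients(data, max_order=6)
print(coeffs[0], 1.0 / math.sqrt(3.0 * math.pi))
print(hermite_estimate(coeffs, 0.0))
```

Because each $c_n$ is a noisy sample average, adding more terms does not necessarily reduce the error of the resulting estimate.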


Hermite Polynomial Expansion

Results for different numbers of terms in the expansion:

[Figure: estimates with 6, 10, 20, and 40 terms]


Hermite Polynomial Expansion

Number of Terms    MSE
6                  0.03374
10                 0.00109
20                 0.00291
40                 0.01914

$\hat{f}(x)$ performs poorly at the edges of the domain
Errors are due to truncation of terms
Approximation theory says the error between $f(x)$ and $\sum_{n=0}^{N} c_n H_n(x)$ should decrease as $N$ increases if the $c_n$ are computed exactly


Histogram Interpolation

One of the oldest, most common PDF estimation techniques
First step is to establish the bins into which data will be sorted
Given a starting point $x_0$ and bin width $h$, the bins can be established as:

$$[x_0 + mh,\; x_0 + (m+1)h) \quad \text{for integer } m$$

The histogram is then defined as:

$$\hat{f}(x) = \frac{1}{nh}\,(\text{No. of } X_i \text{ in same bin as } x)$$

where $n$ is the number of data points $X_i$
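The histogram definition translates almost line for line into code. The sketch below assumes half-open bins, so that every point belongs to exactly one bin, and uses a small hand-made data set; the function name and the data are illustrative only.

```python
import math


def histogram_estimate(x, data, x0, h):
    """f_hat(x) = (number of X_i in the same bin as x) / (n h)."""
    m = math.floor((x - x0) / h)  # index of the bin containing x
    count = sum(1 for xi in data if math.floor((xi - x0) / h) == m)
    return count / (len(data) * h)


# Toy data with x0 = 0 and h = 0.5, giving bins [0, 0.5), [0.5, 1), [1, 1.5), ...
toy = [0.1, 0.2, 0.6, 1.4]
print(histogram_estimate(0.3, toy, x0=0.0, h=0.5))  # two points in bin: 2 / (4 * 0.5) = 1.0
print(histogram_estimate(0.7, toy, x0=0.0, h=0.5))  # one point in bin:  1 / (4 * 0.5) = 0.5
print(histogram_estimate(2.0, toy, x0=0.0, h=0.5))  # empty bin: 0.0
```

Summing `h * histogram_estimate(...)` over all occupied bins gives 1, so the estimate is itself a valid PDF.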


Histogram Interpolation

$\hat{f}(x)$ is a piecewise constant estimate of the underlying PDF $f(x)$
If a continuous function approximation is needed, $\hat{f}(x)$ can be interpolated (e.g. splines)
Choice of bin endpoints and width will create different results
Wide bins: Smooth and blur details in data
Narrow bins: Not enough data per bin, resulting approximation very spiky


Histogram Interpolation

[Figure: results for bin widths h = 0.8 and h = 0.5]


Histogram Interpolation

[Figure: results for bin widths h = 0.2 and h = 0.05]


Histogram Interpolation

Bin Width    MSE
0.8          3.045e-5
0.5          1.589e-5
0.2          3.753e-5
0.05         2.094e-4

Optimal bin width can be found by solving an error minimization problem (provided $f(X)$ is known)
Formulas exist to estimate optimal bin width for data that is close to normally distributed
Example: Sturges' formula: $k = \log_2 n + 1$, where $k$ is the number of bins and $n$ is the number of data points
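Sturges' formula is easy to apply in code. The sketch below rounds $\log_2 n$ up so that $k$ is a whole number of bins, and converts $k$ to a bin width by dividing the data range evenly; both choices are assumptions, since the slide gives only the unrounded formula.

```python
import math


def sturges_bins(n):
    """Sturges' formula k = log2(n) + 1, with log2(n) rounded up to an integer."""
    return math.ceil(math.log2(n)) + 1


def bin_width_from_count(data, k):
    """Split the range of the data into k equal-width bins (an assumed convention)."""
    return (max(data) - min(data)) / k


print(sturges_bins(1024))                        # log2(1024) = 10, so k = 11
print(bin_width_from_count([0.0, 1.0, 4.0], 4))  # range 4 split into 4 bins: 1.0
```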


Kernel Estimation

Another very popular PDF estimation technique
Similar to histograms, but instead of creating separate bins into which data is collected and counted, $\hat{f}(X)$ is computed as a sum of functions centered at each data point

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$

$h$ is still a "width" parameter, and $n$ is the number of data points $X_i$


Kernel Estimation

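The "kernel" function $K$ is usually a symmetric probability distribution function, like a normal distribution:

$$K\left(\frac{x - X_i}{h}\right) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - X_i}{h}\right)^2}$$

Combined with the $\frac{1}{nh}$ prefactor, $\hat{f}(X)$ is a sum of normal distributions, each centered at a data point $X_i$ with standard deviation $h$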
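A minimal sketch of the kernel estimator with a normal kernel follows. It assumes $K$ is the standard normal density $K(u) = e^{-u^2/2}/\sqrt{2\pi}$, so that the $\frac{1}{nh}$ prefactor supplies the only factor of $1/h$ and the estimate integrates to 1. The function names, seed, and sample size are illustrative.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_estimate(x, data, h):
    """f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


# Toy check: a Riemann sum of the estimate over a wide grid should be close to 1.
toy_data = [-1.0, 0.5, 2.0]
dx = 0.01
area = sum(kernel_estimate(-6.0 + dx * i, toy_data, 0.3) * dx for i in range(1301))
print(area)

# Assumed test data: 2,000 draws from a normal distribution (mean 0, std 1).
rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(2_000)]
print(kernel_estimate(0.0, data, h=0.2))  # compare with the true value 1/sqrt(2 pi)
```

Each evaluation costs one kernel call per data point, so evaluating the estimate on a grid scales with the grid size times $n$.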


Kernel Estimation

$\hat{f}(X)$ is a smooth, differentiable function
Do not need to choose where to center bins; here bins are centered at each data point and overlap with one another
As in histogram interpolation, there are various methods for choosing $h$
Methods include: Minimization of mean square error (which requires knowledge of $f(X)$) and others such as least squares cross-validation


Kernel Estimation

Results for test case, different h values:

[Figure: kernel estimates for h = 0.5, 0.2, 0.1, and 0.05]


Kernel Estimation

Bin Width    MSE
0.5          2.983e-4
0.2          1.896e-5
0.1          2.345e-5
0.05         5.110e-5

The estimated function $\hat{f}(X)$ is a smooth function, only requires a choice of bin width, and has low errors
Conclusions: Kernel Estimation will be used to estimate the PDFs of data from microscopic variables in the CM model


Data Regeneration

Given a PDF, we need to generate a set of data from it, to assign to the elements or particles in the micro-system
This is done by computing the cumulative distribution function $F(x) = \int_{-\infty}^{x} f(X)\,dX$
The values of $F(x)$ range from 0 to 1
A random number generator is used to pick a value $c \in [0, 1]$ uniformly at random
A root-finding algorithm is then used to solve $F(x) - c = 0$ for $x$ (the desired data point)
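The regeneration steps above can be sketched as follows. The talk does not name a root-finding method, so bisection is an assumed choice, and the standard normal CDF (written with `math.erf`) stands in for the CDF of an estimated PDF. The search bracket $[-10, 10]$, the tolerance, and all function names are illustrative.

```python
import math
import random


def normal_cdf(x):
    """CDF of the normal distribution with mean 0 and standard deviation 1."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def invert_cdf(cdf, c, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve cdf(x) - c = 0 by bisection; assumes cdf is nondecreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def regenerate(cdf, n, rng):
    """Draw n data points: pick c uniformly in [0, 1), then solve F(x) = c."""
    return [invert_cdf(cdf, rng.random()) for _ in range(n)]


rng = random.Random(2)  # fixed seed, chosen only for repeatability
samples = regenerate(normal_cdf, 5_000, rng)
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)
print(sample_mean, sample_var)  # should be near 0 and 1 for this test case
```

Bisection only needs $F$ to be nondecreasing, which every CDF is, so the same routine works when `cdf` is built by integrating a kernel estimate.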


Data Regeneration

PDF → CDF → Data Set


Summary

Kernel Estimation will be used to estimate PDFs of various variables of the microscopic system
These PDFs will be collected over time during the microscopic evolution
A new PDF at the desired future point in time will be extrapolated from these saved PDFs
A new micro-system will be created at the future point in time based on these predicted PDFs


References

Silverman, B.W. "Density Estimation for Statistics and Data Analysis", Chapman and Hall, 1986.


Seminar Speakers

We need volunteers to give a talk at this seminar for the following dates:
October 20, 27 and November 3, 10
