TRANSCRIPT
Modeling and Estimation of Uncertain Systems
Lecture 1:
Uncertainty I: Probability and Stochastic Processes
Two distinct activities: Modeling is the activity of constructing a mathematical description of a system of interest; it encompasses specification of the model structure and its parameterization. Estimation is concerned with determining the “state” of a system relative to some model.
Broadly speaking, there are four possible cases: (1) well-defined system, rich data source(s); (2) poorly defined system, rich data source(s); (3) well-defined system, sparse data; (4) poorly defined system, sparse data, listed in order of increasing difficulty.
The classification is characterized by the amount of a priori information that can be embedded in the model and the amount of data available for inference.
Modeling and Estimation: What is the problem?
Lecture 1, May 15, 2011
Uncertainty permeates every aspect of this problem: What parts of the system are important? What are the right descriptions of the constituents? What is the right way to describe their interactions? What are the available observations? What are the dynamics of the observation processes? …
There are two types of uncertainty:
Epistemic: Uncertainty due to a lack of knowledge of quantities or processes of the system or the environment. AKA subjective uncertainty, reducible uncertainty, or model form uncertainty.
Aleatory: Inherent variation associated with the physical system or the environment. AKA variability, irreducible uncertainty, or stochastic uncertainty.
Many different “uncertainty” theories, each with their own strengths and weaknesses.
Uncertainty: “What” is the problem!
Syllabus: Lecture Series Agenda
1. Uncertainty I: Probability and Stochastic Processes
2. Filtering I: Kalman Filtering
3. Filtering II: Estimation – The Big Picture
4. Uncertainty II: Information Theory
5. Model Inference I: Symbolic Dynamics and the Thermodynamic Formalism
6. Model Inference II: Probabilistic Grammatical Inference
7. Uncertainty III: Representations of Uncertainty
8. Decision under Uncertainty: Plausible Inference
Probability and Stochastic Processes: Lecture 1 Agenda
1. What is probability?
a) Frequentist Interpretation
b) Bayesian Interpretation
2. Calculus of Probability
a) Probability Spaces
b) Kolmogorov’s Axioms
c) Conditioning and Bayes’ Theorem
3. Random Variables (RVs)
a) Distribution and Density Functions
b) Joint and Marginal Distributions
c) Expectation and Moments
4. Stochastic Processes
a) Stationarity
b) Ergodicity
Probabilities of events are associated with their relative frequencies in a long run of trials: associated with random physical systems (e.g., dice); makes sense only in the context of well-defined situations.
Frequentism: If n_A is the number of occurrences of event A in n trials, then the probability of A is the limiting relative frequency below.
The odds of getting “heads” in a fair coin toss is 1/2 because it has been demonstrated empirically, not because there are two equally likely events.
Propensity theory: Interpret probability as the “propensity” for an event’s occurrence, and explain long-run frequencies via the Law of Large Numbers (LLN).
Frequentist Interpretation: Physical Interpretation
\Pr\{A\} = \lim_{n \to \infty} \frac{n_A}{n} = p

\Pr\left\{ \left| \frac{n_A}{n} - p \right| \le \varepsilon \right\} \to 1 \quad \text{as } n \to \infty
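The limiting relative frequency can be explored numerically. A minimal sketch in Python (the event probability, trial counts, and seed are arbitrary choices for illustration, not from the lecture):

```python
import random

def relative_frequency(p, n, seed=0):
    """Estimate Pr{A} by the relative frequency n_A / n over n independent
    trials of an event A that occurs with probability p per trial."""
    rng = random.Random(seed)
    n_A = sum(1 for _ in range(n) if rng.random() < p)
    return n_A / n

# The relative frequency approaches p as n grows (the LLN at work).
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(0.5, n))
```

For large n the estimate clusters ever more tightly around p, which is exactly the content of the limit statement above.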
Probability can be assigned to any statement whatsoever, even when no random process is involved: it represents subjective plausibility, or the degree to which the statement is supported by available evidence, and is interpreted as a “measure of a state of knowledge.” The Bayesian approach specifies a prior probability which is then updated in the light of new information.
Objectivism: Bayesian statistics can be justified by the requirements of rationality and consistency and interpreted as an extension of logic; it is not dependent upon belief.
Subjectivism: Probability is regarded as a measure of the degree of belief of the individual assessing the uncertainty of a particular situation; rationality and consistency constrain the probabilities.
Bayesian Interpretation: Evidentiary Interpretation
A probability space is a triple (Ω, F, ρ): Ω is the set of all possible outcomes, known as the sample space; F is a set of events, where each event is a subset of Ω containing zero or more outcomes, and F must form a σ-algebra under complementation and intersection; ρ is a measure of the probability of an event and is called a probability measure.
Example: pairs of (fair) coin tosses:
A probability space describes processes containing states that occur randomly.
Probability Spaces: Basis for Axiomatic Probability Theory
\Omega = \{HH, HT, TH, TT\}

F = 2^{\Omega} \quad \text{(the power set of } \Omega \text{: all 16 subsets, e.g., } \varnothing, \{HH\}, \{HT, TH\}, \Omega, \ldots)

\rho(E) = |E| / |\Omega|
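The coin-pair probability space can be built explicitly; a small sketch (taking F to be the power set, as above), checking the σ-algebra closure properties along the way:

```python
from itertools import combinations

omega = frozenset({"HH", "HT", "TH", "TT"})   # sample space: two coin tosses

# F as the power set of omega: every subset is an event (2^4 = 16 events)
F = [frozenset(c) for r in range(len(omega) + 1)
     for c in combinations(sorted(omega), r)]

def rho(event):
    """Uniform probability measure for fair coins: rho(E) = |E| / |omega|."""
    return len(event) / len(omega)

# the sigma-algebra requirements: closure under complement and intersection
events = set(F)
assert all((omega - E) in events for E in F)
assert all((A & B) in events for A in F for B in F)

print(rho(frozenset({"HH", "TT"})))   # the event "both tosses agree"
```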
Constraints on ρ are needed to ensure consistency. Kolmogorov’s axioms:
Non-negativity: The probability of an event is a non-negative real number (ρ(E) ≥ 0 for all events E ∈ F).
Unit measure: The probability that some event in the entire sample space will occur is 1 (ρ(Ω) = 1).
σ-additivity: The probability of the union (sum) of a collection of non-intersecting sets is equal to the sum of the probabilities of the sets (whenever {A_n} is a sequence of pairwise disjoint sets in F such that ⋃ A_n is also in F, then ρ(⋃ A_n) = Σ ρ(A_n)).
A measure which satisfies these axioms is known as a probability measure.
This is not the only possible set of probability axioms, merely the most common.
Kolmogorov Axioms: The Probability Axioms
\rho(E) \ge 0 \quad \text{for all } E \in F

\rho(\Omega) = 1

\rho\!\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \rho(A_n) \quad \text{for pairwise disjoint } A_n \in F
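The axioms are easy to verify on a concrete finite space; a sketch using the coin-pair example with the uniform measure (the variable names are mine):

```python
omega = ["HH", "HT", "TH", "TT"]   # two fair coin tosses, uniform measure

def rho(event):
    return len(set(event)) / len(omega)

# Axiom 1 (non-negativity): rho(E) >= 0 for sample events
assert rho({"HH"}) >= 0 and rho(set()) >= 0

# Axiom 2 (unit measure): rho(omega) = 1
assert rho(omega) == 1

# Axiom 3 (additivity): for pairwise disjoint A1, A2,
# rho(A1 union A2) = rho(A1) + rho(A2)
A1, A2 = {"HH"}, {"HT", "TH"}
assert rho(A1 | A2) == rho(A1) + rho(A2)
print(rho(A1 | A2))
```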
Assume that if an event occurs, we will only know that the event which has occurred lies in M (E ∈ M). What is the probability that E has occurred, given that M has occurred? If M is true, then its complement M̄ is false: ρ(M̄|M) = 0. The relative probabilities of events in M remain unchanged, i.e., if E1, E2 ⊆ M with ρ(E2) > 0, then their ratio satisfies the relation below. A little bit of algebra then yields the conditional probability ρ(E|M) of E, and we say that it is conditioned on M.
Conditioning: Basic Probabilistic Inference
\rho(\bar{M} \mid M) = 0

\frac{\rho(E_1)}{\rho(E_2)} = \frac{\rho(E_1 \mid M)}{\rho(E_2 \mid M)}

\rho(E \mid M) = \frac{\rho(E \cap M)}{\rho(M)}
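The conditioning formula can be exercised on the coin-pair space; a minimal sketch (the conditioning events chosen here are my own examples):

```python
omega = ["HH", "HT", "TH", "TT"]   # two fair coin tosses, uniform measure

def rho(event):
    return len(set(event)) / len(omega)

def cond(E, M):
    """Conditional probability rho(E | M) = rho(E intersect M) / rho(M)."""
    return rho(set(E) & set(M)) / rho(M)

at_least_one_head = {"HH", "HT", "TH"}
print(cond({"HH"}, at_least_one_head))   # 0.25 / 0.75 = 1/3
print(cond({"HH"}, {"HH", "HT"}))        # conditioning on "first toss is heads"
```

Note how conditioning renormalizes the measure: probabilities of events inside M keep their relative sizes but now sum to one over M.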
Consider a biased coin whose bias (towards heads) is either b_H = 2/3 or b_T = 1/3. Assume b_T is deemed more likely, with ρ(b_T) = 0.99. Now assume the coin is tossed 25 times and heads comes up 19 times…
We have probably made a bad assumption and would like to update the probability based upon the new information. There are 2^26 possible outcomes (2^25 toss sequences for each of the two biases). The prior probability of the coin having bias b_T and producing a particular sequence E_n with n heads is given below.
Conditioning Example: Converging to the right answer…
\rho(b_T \cap E_n) = 0.99 \left( \frac{1}{3} \right)^{n} \left( \frac{2}{3} \right)^{25 - n}
Thus, given 19 heads, we have the two joint probabilities below, and their sum gives ρ(E_19). Conditioned on seeing the sequence E_19, the probability that the coin has bias b_T is thus as shown.
Conditioning Example (Cont.): Converging to the right answer…
\rho(b_T \cap E_{19}) = 0.99 \left( \frac{1}{3} \right)^{19} \left( \frac{2}{3} \right)^{6}

\rho(b_H \cap E_{19}) = 0.01 \left( \frac{2}{3} \right)^{19} \left( \frac{1}{3} \right)^{6}

\rho(E_{19}) = \rho(b_T \cap E_{19}) + \rho(b_H \cap E_{19})

\rho(b_T \mid E_{19}) = \frac{0.99\,(1/3)^{19}(2/3)^{6}}{\rho(E_{19})} = \frac{99}{99 + 2^{13}} \approx 0.01
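The posterior 99/(99 + 2^13) can be checked directly; a short sketch of the computation (variable names are mine):

```python
prior_T, prior_H = 0.99, 0.01          # prior beliefs about the bias
n, heads = 25, 19

# joint probabilities of each bias with the observed 19-head sequence
p_T = prior_T * (1/3)**heads * (2/3)**(n - heads)
p_H = prior_H * (2/3)**heads * (1/3)**(n - heads)

posterior_T = p_T / (p_T + p_H)        # rho(b_T | E_19)
print(posterior_T)                     # 99 / (99 + 2**13), about 0.012
```

Despite the 0.99 prior on b_T, nineteen heads in twenty-five tosses drives the posterior on b_T down to about one percent.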
Bayes’ Rule: for ρ(E), ρ(M) > 0, the rule below holds.
It is one of the most widely used results in probability theory, as it provides a fundamental mechanism for updating beliefs.
Example: consider a test known to be 99% reliable. If this test indicates that an event E has occurred, how likely is it that the event has occurred? What does 99% reliable mean? Assume that it means:
– 99% of the time E occurs, the test correctly indicates that it has occurred (a 1% false-negative rate), and
– 99% of the time that E does not occur, the test correctly indicates that it has not occurred (a 1% false-positive rate).
Bayes’ Rule: Inverse Conditioning
\rho(E \mid M) = \frac{\rho(M \mid E)\,\rho(E)}{\rho(M)}
Let P be the event that the test indicates that E has occurred. By Bayes’ rule we have the expression below. Since the (positive) reliability is 99%, we have ρ(P|E) = 0.99. Note that we cannot compute ρ(E|P) without additional information, i.e., ρ(E). Though it looks like we also need ρ(P), we can in fact construct this from the reliability (positive and negative):
Bayes’ Rule Example (Cont.): The Reliability of Tests
\rho(E \mid P) = \frac{\rho(P \mid E)\,\rho(E)}{\rho(P)}

\rho(P) = \rho(E \cap P) + \rho(\bar{E} \cap P)

\rho(E \cap P) = \rho(P \mid E)\,\rho(E) = 0.99\,\rho(E)

\rho(\bar{E} \cap P) = \rho(P \mid \bar{E})\,\rho(\bar{E}) = \left( 1 - \rho(\bar{P} \mid \bar{E}) \right) (1 - \rho(E)) = 0.01\,(1 - \rho(E))
hence
Substituting into Bayes’ rule produces
N.B. We cannot determine the probability of event E conditioned on a positive test result P without knowing the probability of E, i.e., ½(E).
Bayes’ Rule (Cont.): The Reliability of Tests
\rho(P) = 0.99\,\rho(E) + 0.01 - 0.01\,\rho(E) = 0.01 + 0.98\,\rho(E)

\rho(E \mid P) = \frac{0.99\,\rho(E)}{0.01 + 0.98\,\rho(E)}

ρ(E)      ρ(E|P)
0.001     ≈ 0.09
0.01      0.5
0.3333…   ≈ 0.98

Only if the event is fairly common does the posterior ρ(E|P) approach the test’s 99% reliability!
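The table follows directly from the formula for ρ(E|P); a small sketch (the function name and keyword defaults are mine, with tpr and fpr standing in for the positive and negative reliability):

```python
def posterior(prior, tpr=0.99, fpr=0.01):
    """rho(E | P) for a test with true-positive rate tpr and
    false-positive rate fpr, given the prior rho(E)."""
    p_P = tpr * prior + fpr * (1 - prior)   # rho(P) = 0.01 + 0.98 * prior here
    return tpr * prior / p_P

# reproduce the table: the posterior only nears 0.99 for common events
for prior in (0.001, 0.01, 1/3):
    print(prior, posterior(prior))
```

This is the classic base-rate effect: for rare events even a 99%-reliable test yields a posterior far below 99%.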
A random variable (RV) x is a rule for assigning a number x(E) to every outcome E of an experiment. This function must satisfy two conditions: the set {x ≤ x} is an event for every x, and the probabilities of the events {x = ∞} and {x = −∞} equal zero.
The key observation here is that random variables provide a tool for structuring sample spaces.
In many cases, some decision or diagnosis must be made on the basis of expectation, and RVs play a key role in computing these expectations.
Note that a random variable does not have a value per se. A realization of a random variable does, however, have a definite value.
Random Variables: Alea iacta est
The elements of the sample space Ω that are contained in the event {x ≤ x} change as the number x takes on different values. The probability Pr{x ≤ x} of the event {x ≤ x} is, therefore, a number that depends on x. This number is expressed in terms of the (cumulative) distribution function of the random variable x and is denoted F_x(x). Formally, we say F_x(x) = Pr{x ≤ x} for every x. The derivative below is known as the density function and is closely related to the measure ρ introduced earlier. For our purposes, we will treat this density function as the specification of ρ.
Probability Distributions: Spreading the Wealth Around
f_x(x) = \frac{dF_x(x)}{dx}
Distribution Example: Normal Distribution
CDF: F_x(x) = \frac{1}{2} \left[ 1 + \operatorname{erf}\!\left( \frac{x - \mu}{\sqrt{2\sigma^2}} \right) \right]

PDF: f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}

Distributions can be discrete, continuous, or hybrid.
[Figure: example CDFs of discrete, continuous, and hybrid distributions]
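The normal CDF and PDF above, and the derivative relationship between them, can be checked numerically; a minimal sketch using the standard-library error function (function names are mine):

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2 * sigma**2)))

# the density is the derivative of the distribution function:
h = 1e-6
slope = (normal_cdf(h) - normal_cdf(-h)) / (2 * h)
print(normal_cdf(0.0))            # 0.5 by symmetry
print(slope, normal_pdf(0.0))     # both about 0.3989
```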
A probability distribution that is a function of multiple RVs is a multivariate distribution and is defined by the joint distribution below, with the associated joint density function determined via partial differentiation. The probability that an outcome (x_1, …, x_N) lies within a domain D is then the integral of the joint density over D.
Joint Distributions: Building Up Multivariate Distributions
F(x_1, \ldots, x_N) = \Pr\{x_1 \le x_1, \ldots, x_N \le x_N\}

f(x_1, \ldots, x_N) = \left. \frac{\partial^N F}{\partial x_1 \cdots \partial x_N} \right|_{x}

\Pr(x_1, \ldots, x_N \in D) = \int_D f_{x_1, \ldots, x_N}(x_1, \ldots, x_N)\, dx_1 \cdots dx_N
We say that the RVs are independent if their joint density factors into the product of the individual densities, as below.
The statistics of a subset of the random variables of a multivariate distribution are known as marginal statistics. The associated distributions are known as marginal distributions and are defined below.
The distribution of the marginal variables is said to be obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.
Independence and Marginal Distributions: Breaking Down Multivariate Distributions
f_{x_1, \ldots, x_N}(x_1, \ldots, x_N) = f_{x_1}(x_1) \cdots f_{x_N}(x_N)

f_{x_i}(x_i) = \int_{D_1} \cdots \int_{D_{i-1}} \int_{D_{i+1}} \cdots \int_{D_N} f_{x_1, \ldots, x_N}(x_1, \ldots, x_N)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_N
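For discrete RVs the marginalization integral becomes a sum; a minimal sketch on a made-up two-variable pmf (the numbers are illustrative, not from the lecture):

```python
# a (made-up) discrete joint pmf of two RVs: x1 in {0, 1}, x2 in {0, 1, 2}
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

def marginal(joint, keep):
    """Marginalize a discrete joint pmf onto variable `keep` (0 or 1)
    by summing out, i.e. 'marginalizing out', the other variable."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[keep]] = out.get(outcome[keep], 0.0) + p
    return out

print(marginal(joint, 0))   # pmf of x1 (masses about 0.4 and 0.6)
print(marginal(joint, 1))   # pmf of x2
```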
Example Multivariate Distributions: Visualizing Joint and Marginal Distributions
[Figure: a joint distribution and its marginal distributions]
Marginal distributions are projections of the joint distribution.
The expected value, or mean, of an RV x is defined by the first integral below. It is commonly denoted η_x or just η. For RVs of discrete (lattice) type, we obtain the expected value via the sum below. The conditional mean, or conditional expected value, is obtained by replacing f_x(x) with the conditional density f(x|E).
Expected Values: What did you expect?
E\{x\} = \int_{-\infty}^{\infty} x\, f_x(x)\, dx

E\{x\} = \sum_i p_i x_i, \qquad p_i = \Pr\{x = x_i\}

E\{x \mid E\} = \int_{-\infty}^{\infty} x\, f(x \mid E)\, dx
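The discrete-type expectation is a weighted sum; a minimal sketch (the values and probabilities are hypothetical):

```python
# a discrete RV taking value x_i with probability p_i (hypothetical numbers)
values = [-1, 0, 1, 2]
probs  = [0.1, 0.2, 0.3, 0.4]

def expectation(f=lambda x: x):
    """E{f(x)} = sum_i p_i f(x_i) for the discrete RV above."""
    return sum(p * f(x) for x, p in zip(values, probs))

print(expectation())                  # the mean, about 1.0
print(expectation(lambda x: x**2))    # the second moment E{x^2}, about 2.0
```

Passing a function f generalizes the same sum to expectations of functions of the RV, which is how the moments on the next slide are computed.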
The variance is defined by the integral below.
The constant σ, also denoted σ_x, is called the standard deviation of x.
The variance measures the concentration of probability mass near the mean η.
This is the second (central) moment of the distribution; other moments of interest are moments, central moments, absolute moments, and generalized moments:
Variance and Higher Moments: Concentration and Distortion
\sigma^2 = \int_{-\infty}^{\infty} (x - \eta)^2 f_x(x)\, dx

\text{Moments: } m_n = E\{x^n\} = \int_{-\infty}^{\infty} x^n f(x)\, dx

\text{Central moments: } \mu_n = E\{(x - \eta)^n\} = \int_{-\infty}^{\infty} (x - \eta)^n f(x)\, dx

\text{Absolute moments: } E\{|x|^n\}, \quad E\{|x - \eta|^n\}

\text{Generalized moments: } E\{(x - a)^n\}, \quad E\{|x - a|^n\}
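All four moment families reduce to one sum for a discrete RV; a sketch (the RV and the helper name are mine):

```python
values = [-1, 0, 1, 2]                 # hypothetical discrete RV
probs  = [0.1, 0.2, 0.3, 0.4]

def gen_moment(n, a=0.0, absolute=False):
    """Generalized moment E{(x - a)^n}, or E{|x - a|^n} when absolute=True.
    a=0 gives ordinary moments; a=eta gives central moments."""
    total = 0.0
    for x, p in zip(values, probs):
        d = abs(x - a) if absolute else (x - a)
        total += p * d**n
    return total

eta = gen_moment(1)                    # the mean (first moment)
var = gen_moment(2, a=eta)             # variance: second central moment
print(eta, var, var ** 0.5)            # mean, variance, standard deviation
```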
A stochastic process x(t) is a rule for assigning a function x(t, E) to every event E.
We shall denote stochastic processes by x(t), and hence x(t) can be interpreted several ways: a family, or ensemble, of functions x(t, E) [t and E are variable]; a single time function, or sample of the process [E is fixed]; a random variable [t is fixed]; a number [t and E are fixed].
Examples: For Brownian motion, x(t) consists of the motion of all particles (the ensemble), and a realization x(t, E_i) is the motion of a specific particle. A phasor with random amplitude and phase is a family of pure sine waves; a single sample of the process is shown below.
Stochastic Processes: Generalized RVs
x(t) = r \cos(\omega t + \phi)

x(t, E_i) = r(E_i) \cos(\omega t + \phi(E_i))
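The four interpretations of x(t) can be made concrete with the random phasor; a sketch (the amplitude and phase distributions are my own illustrative choices):

```python
import math
import random

def phasor(seed):
    """One realization x(t, E_i) = r(E_i) cos(w t + phi(E_i)) of a random
    phasor: amplitude r and phase phi are drawn once per outcome E_i."""
    rng = random.Random(seed)
    r = rng.uniform(0.5, 1.5)                # hypothetical amplitude law
    phi = rng.uniform(0.0, 2 * math.pi)      # phase uniform on [0, 2*pi)
    w = 2 * math.pi                          # fixed angular frequency
    return lambda t: r * math.cos(w * t + phi)

x = phasor(seed=42)                 # E fixed: a single deterministic time function
print(x(0.0), x(0.25))              # t also fixed: plain numbers
print([phasor(seed=i)(0.0) for i in range(4)])   # t fixed, E varying: samples of an RV
```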
First-order properties: For a specific t, x(t) is an RV with distribution F(x; t) = Pr{x(t) ≤ x}. F(x; t) is called the first-order distribution of x(t); its derivative w.r.t. x is called the first-order density of x(t).
Second-order properties: The mean η(t) of x(t) is the expected value of the RV x(t). The autocorrelation R(t1, t2) of x(t) is the expected value of the product x(t1)x(t2). The autocovariance C(t1, t2) of x(t) is the covariance of the RVs x(t1) and x(t2).
Statistics of Stochastic Processes: Time Dependence
F(x; t) = \Pr\{x(t) \le x\}

f(x; t) = \frac{\partial F(x; t)}{\partial x}

\eta(t) = E\{x(t)\} = \int_{-\infty}^{\infty} x\, f(x; t)\, dx

R(t_1, t_2) = E\{x(t_1)\, x(t_2)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2; t_1, t_2)\, dx_1\, dx_2

C(t_1, t_2) = R(t_1, t_2) - \eta(t_1)\,\eta(t_2)
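These ensemble statistics can be estimated by averaging over many realizations; a sketch using a random-phase cosine (my own illustrative process; for it, η(t) = 0 and R(t1, t2) = (1/2) cos(ω(t1 − t2))):

```python
import math
import random

# hypothetical ensemble: x(t) = cos(w t + phi), phi uniform on [0, 2*pi)
def make_ensemble(n_realizations, seed=0, w=2 * math.pi):
    rng = random.Random(seed)
    phis = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_realizations)]
    return [lambda t, phi=phi: math.cos(w * t + phi) for phi in phis]

def mean_at(ensemble, t):
    """eta(t) = E{x(t)} estimated by averaging across the ensemble."""
    return sum(x(t) for x in ensemble) / len(ensemble)

def autocorr(ensemble, t1, t2):
    """R(t1, t2) = E{x(t1) x(t2)} estimated across the ensemble."""
    return sum(x(t1) * x(t2) for x in ensemble) / len(ensemble)

xs = make_ensemble(20000)
print(mean_at(xs, 0.0))         # about 0: the uniform phase averages out
print(autocorr(xs, 0.0, 0.5))   # about 0.5*cos(w*(t1 - t2)) = -0.5 here
```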
A stochastic process x(t) is called strict-sense stationary (SSS) if its statistical properties are invariant to a shift of the origin, i.e., the first equation below holds for any c.
A stochastic process is called wide-sense stationary (WSS) if its mean is constant and its autocorrelation depends only on τ = t1 − t2.
An SSS process is WSS. Stationarity basically says that the statistical properties do not evolve in time.
Stationarity: Hard to hit a moving target
f(x_1, \ldots, x_n; t_1, \ldots, t_n) = f(x_1, \ldots, x_n; t_1 + c, \ldots, t_n + c)

E\{x(t)\} = \eta

E\{x(t + \tau)\, x^*(t)\} = R(\tau)
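Wide-sense stationarity can be probed numerically by estimating R at the same lag τ from two different time origins; a sketch using the random-phase cosine (a hypothetical WSS process of my choosing):

```python
import math
import random

def R_hat(t1, t2, n=20000, seed=0, w=2 * math.pi):
    """Ensemble estimate of R(t1, t2) for x(t) = cos(w t + phi),
    phi uniform on [0, 2*pi): a hypothetical WSS process."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        phi = rng.uniform(0.0, 2 * math.pi)
        total += math.cos(w * t1 + phi) * math.cos(w * t2 + phi)
    return total / n

# same lag tau = t1 - t2 at two different time origins: the estimates agree,
# i.e. the autocorrelation depends only on tau
print(R_hat(0.0, 0.125), R_hat(1.0, 1.125))   # both about 0.5*cos(w*0.125)
```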
Ergodicity is a property connected with the homogeneity of a process. A process whose time average is the same as its space, or ensemble, average is said to be mean-ergodic. This definition can be extended to include other statistics as well (e.g., covariance); ergodicity is also a measure of how well the process “mixes.”
Example: Brownian motion. The time-averaged motion of a specific particle will tend toward the ensemble average of all the particles’ motions.
Ergodicity is important because it tells us how long, or how often, a process must be sampled in order for its statistics to be estimated.
Ergodicity: Mixing it up
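Mean-ergodicity can be illustrated with the random-phase cosine (a hypothetical mean-ergodic process of my choosing): the time average along one realization matches the ensemble average across many realizations.

```python
import math
import random

W = 2 * math.pi   # fixed angular frequency of the random-phase cosine

def time_average(phi, n_steps=100000, dt=0.001):
    """Time average of one realization x(t) = cos(W t + phi)."""
    return sum(math.cos(W * k * dt + phi) for k in range(n_steps)) / n_steps

def ensemble_average(n=5000, seed=0, t=0.0):
    """Ensemble average E{x(t)} over many independently drawn phases."""
    rng = random.Random(seed)
    return sum(math.cos(W * t + rng.uniform(0.0, 2 * math.pi))
               for _ in range(n)) / n

# both averages are (approximately) 0: one long sample suffices to
# estimate the mean of this process
print(time_average(phi=1.234))
print(ensemble_average())
```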
Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, Third Ed., McGraw-Hill, New York, NY, 1991.
E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK, 2003.
Eugene Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill, New York, NY, 1971.
References: Some Good Books…