TRANSCRIPT
Modeling and Estimation of Uncertain Systems
Lecture 1:
Uncertainty I: Probability and Stochastic Processes
Two distinct activities: Modeling is the activity of constructing a mathematical description of a system of interest; it encompasses specification of the model structure and its parameterization. Estimation is concerned with determining the “state” of a system relative to some model.
Broadly speaking, there are four possible cases: (1) well-defined system, rich data source(s); (2) poorly defined system, rich data source(s); (3) well-defined system, sparse data; (4) poorly defined system, sparse data, listed in order of increasing difficulty.
The classification is characterized by the amount of a priori information that can be embedded in the model and the amount of data available for inference.
Modeling and Estimation: What is the problem?
Lecture 1, May 15, 2011
Uncertainty permeates every aspect of this problem: What parts of the system are important? What are the right descriptions of the constituents? What is the right way to describe their interactions? What are the available observations? What are the dynamics of the observation processes? …
There are two types of uncertainty:
Epistemic: Uncertainty due to a lack of knowledge of quantities or processes of the system or the environment. AKA subjective uncertainty, reducible uncertainty, or model form uncertainty.
Aleatory: Inherent variation associated with the physical system or the environment. AKA variability, irreducible uncertainty, or stochastic uncertainty.
Many different “uncertainty” theories, each with their own strengths and weaknesses.
Uncertainty: “What” is the problem!
Syllabus: Lecture Series Agenda
1. Uncertainty I: Probability and Stochastic Processes
2. Filtering I: Kalman Filtering
3. Filtering II: Estimation – The Big Picture
4. Uncertainty II: Information Theory
5. Model Inference I: Symbolic Dynamics and the Thermodynamic Formalism
6. Model Inference II: Probabilistic Grammatical Inference
7. Uncertainty III: Representations of Uncertainty
8. Decision under Uncertainty: Plausible Inference
Probability and Stochastic Processes: Lecture 1 Agenda
1. What is probability?
a) Frequentist Interpretation
b) Bayesian Interpretation
2. Calculus of Probability
a) Probability Spaces
b) Kolmogorov’s Axioms
c) Conditioning and Bayes’ Theorem
3. Random Variables (RVs)
a) Distribution and Density Functions
b) Joint and Marginal Distributions
c) Expectation and Moments
4. Stochastic Processes
a) Stationarity
b) Ergodicity
Probabilities of events are associated with their relative frequencies in a long run of trials: associated with random physical systems (e.g., dice); makes sense only in the context of well-defined situations.
Frequentism: If n_A is the number of occurrences of event A in n trials, then the probability of A is the limiting relative frequency below.
The odds of getting “heads” in a fair coin toss is 1/2 because it has been demonstrated empirically, not because there are two equally likely events.
Propensity theory: Interpret probability as the “propensity” for an event’s occurrence, and explain long-run frequencies via the Law of Large Numbers (LLN).
Frequentist Interpretation: Physical Interpretation
\Pr\{A\} = \lim_{n \to \infty} \frac{n_A}{n} = p

\Pr\left\{ \left| \frac{n_A}{n} - p \right| \le \varepsilon \right\} \to 1 \quad \text{as } n \to \infty
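The limiting relative frequency can be explored numerically. A minimal sketch in Python (the event probability, trial counts, and seed are arbitrary choices for illustration, not from the lecture):

```python
import random

def relative_frequency(p, n, seed=0):
    """Estimate Pr{A} by the relative frequency n_A / n over n independent
    trials of an event A that occurs with probability p per trial."""
    rng = random.Random(seed)
    n_A = sum(1 for _ in range(n) if rng.random() < p)
    return n_A / n

# The relative frequency approaches p as n grows (the LLN at work).
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(0.5, n))
```

For large n the estimate clusters ever more tightly around p, which is exactly the content of the limit statement above.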
Probability can be assigned to any statement whatsoever, even when no random process is involved: it represents subjective plausibility, or the degree to which the statement is supported by available evidence, and is interpreted as a “measure of a state of knowledge.” The Bayesian approach specifies a prior probability which is then updated in the light of new information.
Objectivism: Bayesian statistics can be justified by the requirements of rationality and consistency and interpreted as an extension of logic; it is not dependent upon belief.
Subjectivism: Probability is regarded as a measure of the degree of belief of the individual assessing the uncertainty of a particular situation; rationality and consistency constrain the probabilities.
Bayesian Interpretation: Evidentiary Interpretation
A probability space is a triple (Ω, F, ρ): Ω is the set of all possible outcomes, known as the sample space; F is a set of events, where each event is a subset of Ω containing zero or more outcomes, and F must form a σ-algebra under complementation and intersection; ρ is a measure of the probability of an event and is called a probability measure.
Example: pairs of (fair) coin tosses:
A probability space describes processes containing states that occur randomly.
Probability Spaces: Basis for Axiomatic Probability Theory
\Omega = \{HH, HT, TH, TT\}

F = 2^{\Omega} \quad \text{(the power set of } \Omega \text{: all 16 subsets, e.g., } \varnothing, \{HH\}, \{HT, TH\}, \Omega, \ldots)

\rho(E) = |E| / |\Omega|
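The coin-pair probability space can be built explicitly; a small sketch (taking F to be the power set, as above), checking the σ-algebra closure properties along the way:

```python
from itertools import combinations

omega = frozenset({"HH", "HT", "TH", "TT"})   # sample space: two coin tosses

# F as the power set of omega: every subset is an event (2^4 = 16 events)
F = [frozenset(c) for r in range(len(omega) + 1)
     for c in combinations(sorted(omega), r)]

def rho(event):
    """Uniform probability measure for fair coins: rho(E) = |E| / |omega|."""
    return len(event) / len(omega)

# the sigma-algebra requirements: closure under complement and intersection
events = set(F)
assert all((omega - E) in events for E in F)
assert all((A & B) in events for A in F for B in F)

print(rho(frozenset({"HH", "TT"})))   # the event "both tosses agree"
```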
Constraints on ρ are needed to ensure consistency. Kolmogorov’s axioms:
Non-negativity: The probability of an event is a non-negative real number (ρ(E) ≥ 0 for all events E ∈ F).
Unit measure: The probability that some event in the entire sample space will occur is 1 (ρ(Ω) = 1).
σ-additivity: The probability of the union (sum) of a collection of non-intersecting sets is equal to the sum of the probabilities of the sets (whenever {A_n} is a sequence of pairwise disjoint sets in F such that ⋃ A_n is also in F, then ρ(⋃ A_n) = Σ ρ(A_n)).
A measure which satisfies these axioms is known as a probability measure.
This is not the only possible set of probability axioms, merely the most common.
Kolmogorov Axioms: The Probability Axioms
\rho(E) \ge 0 \quad \text{for all } E \in F

\rho(\Omega) = 1

\rho\!\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \rho(A_n) \quad \text{for pairwise disjoint } A_n \in F
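The axioms are easy to verify on a concrete finite space; a sketch using the coin-pair example with the uniform measure (the variable names are mine):

```python
omega = ["HH", "HT", "TH", "TT"]   # two fair coin tosses, uniform measure

def rho(event):
    return len(set(event)) / len(omega)

# Axiom 1 (non-negativity): rho(E) >= 0 for sample events
assert rho({"HH"}) >= 0 and rho(set()) >= 0

# Axiom 2 (unit measure): rho(omega) = 1
assert rho(omega) == 1

# Axiom 3 (additivity): for pairwise disjoint A1, A2,
# rho(A1 union A2) = rho(A1) + rho(A2)
A1, A2 = {"HH"}, {"HT", "TH"}
assert rho(A1 | A2) == rho(A1) + rho(A2)
print(rho(A1 | A2))
```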
Assume that if an event occurs, we will only know that the event which has occurred lies in M (E ∈ M). What is the probability that E has occurred, given that M has occurred? If M is true, then its complement M̄ is false: ρ(M̄|M) = 0. The relative probabilities of events in M remain unchanged, i.e., if E1, E2 ⊆ M with ρ(E2) > 0, then their ratio satisfies the relation below. A little bit of algebra then yields the conditional probability ρ(E|M) of E, and we say that it is conditioned on M.
Conditioning: Basic Probabilistic Inference
\rho(\bar{M} \mid M) = 0

\frac{\rho(E_1)}{\rho(E_2)} = \frac{\rho(E_1 \mid M)}{\rho(E_2 \mid M)}

\rho(E \mid M) = \frac{\rho(E \cap M)}{\rho(M)}
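The conditioning formula can be exercised on the coin-pair space; a minimal sketch (the conditioning events chosen here are my own examples):

```python
omega = ["HH", "HT", "TH", "TT"]   # two fair coin tosses, uniform measure

def rho(event):
    return len(set(event)) / len(omega)

def cond(E, M):
    """Conditional probability rho(E | M) = rho(E intersect M) / rho(M)."""
    return rho(set(E) & set(M)) / rho(M)

at_least_one_head = {"HH", "HT", "TH"}
print(cond({"HH"}, at_least_one_head))   # 0.25 / 0.75 = 1/3
print(cond({"HH"}, {"HH", "HT"}))        # conditioning on "first toss is heads"
```

Note how conditioning renormalizes the measure: probabilities of events inside M keep their relative sizes but now sum to one over M.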
Consider a biased coin whose bias (towards heads) is either b_H = 2/3 or b_T = 1/3. Assume b_T is deemed more likely, with ρ(b_T) = 0.99. Now assume the coin is tossed 25 times and heads comes up 19 times…
We have probably made a bad assumption and would like to update the probability based upon the new information. There are 2^26 possible outcomes (2^25 toss sequences for each of the two biases). The prior probability of the coin having bias b_T and producing a particular sequence E_n with n heads is given below.
Conditioning Example: Converging to the right answer…
\rho(b_T \cap E_n) = 0.99 \left( \frac{1}{3} \right)^{n} \left( \frac{2}{3} \right)^{25 - n}
Thus, given 19 heads, we have the two joint probabilities below, and their sum gives ρ(E_19). Conditioned on seeing the sequence E_19, the probability that the coin has bias b_T is thus as shown.
Conditioning Example (Cont.): Converging to the right answer…
\rho(b_T \cap E_{19}) = 0.99 \left( \frac{1}{3} \right)^{19} \left( \frac{2}{3} \right)^{6}

\rho(b_H \cap E_{19}) = 0.01 \left( \frac{2}{3} \right)^{19} \left( \frac{1}{3} \right)^{6}

\rho(E_{19}) = \rho(b_T \cap E_{19}) + \rho(b_H \cap E_{19})

\rho(b_T \mid E_{19}) = \frac{0.99\,(1/3)^{19}(2/3)^{6}}{\rho(E_{19})} = \frac{99}{99 + 2^{13}} \approx 0.01
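The posterior 99/(99 + 2^13) can be checked directly; a short sketch of the computation (variable names are mine):

```python
prior_T, prior_H = 0.99, 0.01          # prior beliefs about the bias
n, heads = 25, 19

# joint probabilities of each bias with the observed 19-head sequence
p_T = prior_T * (1/3)**heads * (2/3)**(n - heads)
p_H = prior_H * (2/3)**heads * (1/3)**(n - heads)

posterior_T = p_T / (p_T + p_H)        # rho(b_T | E_19)
print(posterior_T)                     # 99 / (99 + 2**13), about 0.012
```

Despite the 0.99 prior on b_T, nineteen heads in twenty-five tosses drives the posterior on b_T down to about one percent.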
Bayes’ Rule: for ρ(E), ρ(M) > 0, the rule below holds.
It is one of the most widely used results in probability theory, as it provides a fundamental mechanism for updating beliefs.
Example: consider a test known to be 99% reliable. If this test indicates that an event E has occurred, how likely is it that the event has occurred? What does 99% reliable mean? Assume that it means:
– 99% of the time E occurs, the test correctly indicates that it has occurred (a 1% false-negative rate), and
– 99% of the time that E does not occur, the test correctly indicates that it has not occurred (a 1% false-positive rate).
Bayes’ Rule: Inverse Conditioning
\rho(E \mid M) = \frac{\rho(M \mid E)\,\rho(E)}{\rho(M)}
Let P be the event that the test indicates that E has occurred. By Bayes’ rule we have the expression below. Since the (positive) reliability is 99%, we have ρ(P|E) = 0.99. Note that we cannot compute ρ(E|P) without additional information, i.e., ρ(E). Though it looks like we also need ρ(P), we can in fact construct this from the reliability (positive and negative):
Bayes’ Rule Example (Cont.): The Reliability of Tests
\rho(E \mid P) = \frac{\rho(P \mid E)\,\rho(E)}{\rho(P)}

\rho(P) = \rho(E \cap P) + \rho(\bar{E} \cap P)

\rho(E \cap P) = \rho(P \mid E)\,\rho(E) = 0.99\,\rho(E)

\rho(\bar{E} \cap P) = \rho(P \mid \bar{E})\,\rho(\bar{E}) = \left( 1 - \rho(\bar{P} \mid \bar{E}) \right) (1 - \rho(E)) = 0.01\,(1 - \rho(E))
hence
Substituting into Bayes’ rule produces
N.B. We cannot determine the probability of event E conditioned on a positive test result P without knowing the probability of E, i.e., ½(E).
Bayes’ Rule (Cont.): The Reliability of Tests
\rho(P) = 0.99\,\rho(E) + 0.01 - 0.01\,\rho(E) = 0.01 + 0.98\,\rho(E)

\rho(E \mid P) = \frac{0.99\,\rho(E)}{0.01 + 0.98\,\rho(E)}

ρ(E)      ρ(E|P)
0.001     ≈ 0.09
0.01      0.5
0.3333…   ≈ 0.98

Only if the event is fairly common does the posterior ρ(E|P) approach the test’s 99% reliability!
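The table follows directly from the formula for ρ(E|P); a small sketch (the function name and keyword defaults are mine, with tpr and fpr standing in for the positive and negative reliability):

```python
def posterior(prior, tpr=0.99, fpr=0.01):
    """rho(E | P) for a test with true-positive rate tpr and
    false-positive rate fpr, given the prior rho(E)."""
    p_P = tpr * prior + fpr * (1 - prior)   # rho(P) = 0.01 + 0.98 * prior here
    return tpr * prior / p_P

# reproduce the table: the posterior only nears 0.99 for common events
for prior in (0.001, 0.01, 1/3):
    print(prior, posterior(prior))
```

This is the classic base-rate effect: for rare events even a 99%-reliable test yields a posterior far below 99%.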
A random variable (RV) x is a rule for assigning a number x(E) to every outcome E of an experiment. This function must satisfy two conditions: the set {x ≤ x} is an event for every x, and the probabilities of the events {x = ∞} and {x = −∞} equal zero.
The key observation here is that random variables provide a tool for structuring sample spaces.
In many cases, some decision or diagnosis must be made on the basis of expectation, and RVs play a key role in computing these expectations.
Note that a random variable does not have a value per se. A realization of a random variable does, however, have a definite value.
Random Variables: Alea iacta est
The elements of the sample space Ω that are contained in the event {x ≤ x} change as the number x takes on different values. The probability Pr{x ≤ x} of the event {x ≤ x} is, therefore, a number that depends on x. This number is expressed in terms of the (cumulative) distribution function of the random variable x and is denoted F_x(x). Formally, we say F_x(x) = Pr{x ≤ x} for every x. The derivative below is known as the density function and is closely related to the measure ρ introduced earlier. For our purposes, we will treat this density function as the specification of ρ.
Probability Distributions: Spreading the Wealth Around
f_x(x) = \frac{dF_x(x)}{dx}
Distribution Example: Normal Distribution
CDF: F_x(x) = \frac{1}{2} \left[ 1 + \operatorname{erf}\!\left( \frac{x - \mu}{\sqrt{2\sigma^2}} \right) \right]

PDF: f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}

Distributions can be discrete, continuous, or hybrid.
[Figure: example CDFs of discrete, continuous, and hybrid distributions]
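The normal CDF and PDF above, and the derivative relationship between them, can be checked numerically; a minimal sketch using the standard-library error function (function names are mine):

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2 * sigma**2)))

# the density is the derivative of the distribution function:
h = 1e-6
slope = (normal_cdf(h) - normal_cdf(-h)) / (2 * h)
print(normal_cdf(0.0))            # 0.5 by symmetry
print(slope, normal_pdf(0.0))     # both about 0.3989
```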
A probability distribution that is a function of multiple RVs is a multivariate distribution and is defined by the joint distribution below, with the associated joint density function determined via partial differentiation. The probability that an outcome (x_1, …, x_N) lies within a domain D is then the integral of the joint density over D.
Joint Distributions: Building Up Multivariate Distributions
F(x_1, \ldots, x_N) = \Pr\{x_1 \le x_1, \ldots, x_N \le x_N\}

f(x_1, \ldots, x_N) = \left. \frac{\partial^N F}{\partial x_1 \cdots \partial x_N} \right|_{x}

\Pr(x_1, \ldots, x_N \in D) = \int_D f_{x_1, \ldots, x_N}(x_1, \ldots, x_N)\, dx_1 \cdots dx_N
We say that the RVs are independent if their joint density factors into the product of the individual densities, as below.
The statistics of a subset of the random variables of a multivariate distribution are known as marginal statistics. The associated distributions are known as marginal distributions and are defined below.
The distribution of the marginal variables is said to be obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.
Independence and Marginal Distributions: Breaking Down Multivariate Distributions
f_{x_1, \ldots, x_N}(x_1, \ldots, x_N) = f_{x_1}(x_1) \cdots f_{x_N}(x_N)

f_{x_i}(x_i) = \int_{D_1} \cdots \int_{D_{i-1}} \int_{D_{i+1}} \cdots \int_{D_N} f_{x_1, \ldots, x_N}(x_1, \ldots, x_N)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_N
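For discrete RVs the marginalization integral becomes a sum; a minimal sketch on a made-up two-variable pmf (the numbers are illustrative, not from the lecture):

```python
# a (made-up) discrete joint pmf of two RVs: x1 in {0, 1}, x2 in {0, 1, 2}
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30,
}

def marginal(joint, keep):
    """Marginalize a discrete joint pmf onto variable `keep` (0 or 1)
    by summing out, i.e. 'marginalizing out', the other variable."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[keep]] = out.get(outcome[keep], 0.0) + p
    return out

print(marginal(joint, 0))   # pmf of x1 (masses about 0.4 and 0.6)
print(marginal(joint, 1))   # pmf of x2
```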
Example Multivariate Distributions: Visualizing Joint and Marginal Distributions
[Figure: a joint distribution and its marginal distributions]
Marginal distributions are projections of the joint distribution.
The expected value, or mean, of an RV x is defined by the first integral below. It is commonly denoted η_x or just η. For RVs of discrete (lattice) type, we obtain the expected value via the sum below. The conditional mean, or conditional expected value, is obtained by replacing f_x(x) with the conditional density f(x|E).
Expected Values: What did you expect?
E\{x\} = \int_{-\infty}^{\infty} x\, f_x(x)\, dx

E\{x\} = \sum_i p_i x_i, \qquad p_i = \Pr\{x = x_i\}

E\{x \mid E\} = \int_{-\infty}^{\infty} x\, f(x \mid E)\, dx
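The discrete-type expectation is a weighted sum; a minimal sketch (the values and probabilities are hypothetical):

```python
# a discrete RV taking value x_i with probability p_i (hypothetical numbers)
values = [-1, 0, 1, 2]
probs  = [0.1, 0.2, 0.3, 0.4]

def expectation(f=lambda x: x):
    """E{f(x)} = sum_i p_i f(x_i) for the discrete RV above."""
    return sum(p * f(x) for x, p in zip(values, probs))

print(expectation())                  # the mean, about 1.0
print(expectation(lambda x: x**2))    # the second moment E{x^2}, about 2.0
```

Passing a function f generalizes the same sum to expectations of functions of the RV, which is how the moments on the next slide are computed.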
The variance is defined by the integral below.
The constant σ, also denoted σ_x, is called the standard deviation of x.
The variance measures the concentration of probability mass near the mean η.
This is the second (central) moment of the distribution; other moments of interest are moments, central moments, absolute moments, and generalized moments:
Variance and Higher Moments: Concentration and Distortion
\sigma^2 = \int_{-\infty}^{\infty} (x - \eta)^2 f_x(x)\, dx

\text{Moments: } m_n = E\{x^n\} = \int_{-\infty}^{\infty} x^n f(x)\, dx

\text{Central moments: } \mu_n = E\{(x - \eta)^n\} = \int_{-\infty}^{\infty} (x - \eta)^n f(x)\, dx

\text{Absolute moments: } E\{|x|^n\}, \quad E\{|x - \eta|^n\}

\text{Generalized moments: } E\{(x - a)^n\}, \quad E\{|x - a|^n\}
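All four moment families reduce to one sum for a discrete RV; a sketch (the RV and the helper name are mine):

```python
values = [-1, 0, 1, 2]                 # hypothetical discrete RV
probs  = [0.1, 0.2, 0.3, 0.4]

def gen_moment(n, a=0.0, absolute=False):
    """Generalized moment E{(x - a)^n}, or E{|x - a|^n} when absolute=True.
    a=0 gives ordinary moments; a=eta gives central moments."""
    total = 0.0
    for x, p in zip(values, probs):
        d = abs(x - a) if absolute else (x - a)
        total += p * d**n
    return total

eta = gen_moment(1)                    # the mean (first moment)
var = gen_moment(2, a=eta)             # variance: second central moment
print(eta, var, var ** 0.5)            # mean, variance, standard deviation
```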
A stochastic process x(t) is a rule for assigning a function x(t, E) to every event E.
We shall denote stochastic processes by x(t), and hence x(t) can be interpreted several ways: a family, or ensemble, of functions x(t, E) [t and E are variable]; a single time function, or sample of the process [E is fixed]; a random variable [t is fixed]; a number [t and E are fixed].
Examples: For Brownian motion, x(t) consists of the motion of all particles (the ensemble), and a realization x(t, E_i) is the motion of a specific particle. A phasor with random amplitude and phase is a family of pure sine waves; a single sample of the process is shown below.
Stochastic Processes: Generalized RVs
x(t) = r \cos(\omega t + \phi)

x(t, E_i) = r(E_i) \cos(\omega t + \phi(E_i))
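The four interpretations of x(t) can be made concrete with the random phasor; a sketch (the amplitude and phase distributions are my own illustrative choices):

```python
import math
import random

def phasor(seed):
    """One realization x(t, E_i) = r(E_i) cos(w t + phi(E_i)) of a random
    phasor: amplitude r and phase phi are drawn once per outcome E_i."""
    rng = random.Random(seed)
    r = rng.uniform(0.5, 1.5)                # hypothetical amplitude law
    phi = rng.uniform(0.0, 2 * math.pi)      # phase uniform on [0, 2*pi)
    w = 2 * math.pi                          # fixed angular frequency
    return lambda t: r * math.cos(w * t + phi)

x = phasor(seed=42)                 # E fixed: a single deterministic time function
print(x(0.0), x(0.25))              # t also fixed: plain numbers
print([phasor(seed=i)(0.0) for i in range(4)])   # t fixed, E varying: samples of an RV
```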
First-order properties: For a specific t, x(t) is an RV with distribution F(x; t) = Pr{x(t) ≤ x}. F(x; t) is called the first-order distribution of x(t); its derivative w.r.t. x is called the first-order density of x(t).
Second-order properties: The mean η(t) of x(t) is the expected value of the RV x(t). The autocorrelation R(t1, t2) of x(t) is the expected value of the product x(t1)x(t2). The autocovariance C(t1, t2) of x(t) is the covariance of the RVs x(t1) and x(t2).
Statistics of Stochastic Processes: Time Dependence
F(x; t) = \Pr\{x(t) \le x\}

f(x; t) = \frac{\partial F(x; t)}{\partial x}

\eta(t) = E\{x(t)\} = \int_{-\infty}^{\infty} x\, f(x; t)\, dx

R(t_1, t_2) = E\{x(t_1)\, x(t_2)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2; t_1, t_2)\, dx_1\, dx_2

C(t_1, t_2) = R(t_1, t_2) - \eta(t_1)\,\eta(t_2)
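These ensemble statistics can be estimated by averaging over many realizations; a sketch using a random-phase cosine (my own illustrative process; for it, η(t) = 0 and R(t1, t2) = (1/2) cos(ω(t1 − t2))):

```python
import math
import random

# hypothetical ensemble: x(t) = cos(w t + phi), phi uniform on [0, 2*pi)
def make_ensemble(n_realizations, seed=0, w=2 * math.pi):
    rng = random.Random(seed)
    phis = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_realizations)]
    return [lambda t, phi=phi: math.cos(w * t + phi) for phi in phis]

def mean_at(ensemble, t):
    """eta(t) = E{x(t)} estimated by averaging across the ensemble."""
    return sum(x(t) for x in ensemble) / len(ensemble)

def autocorr(ensemble, t1, t2):
    """R(t1, t2) = E{x(t1) x(t2)} estimated across the ensemble."""
    return sum(x(t1) * x(t2) for x in ensemble) / len(ensemble)

xs = make_ensemble(20000)
print(mean_at(xs, 0.0))         # about 0: the uniform phase averages out
print(autocorr(xs, 0.0, 0.5))   # about 0.5*cos(w*(t1 - t2)) = -0.5 here
```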
A stochastic process x(t) is called strict-sense stationary (SSS) if its statistical properties are invariant to a shift of the origin, i.e., the first equation below holds for any c.
A stochastic process is called wide-sense stationary (WSS) if its mean is constant and its autocorrelation depends only on τ = t1 − t2.
An SSS process is WSS. Stationarity basically says that the statistical properties do not evolve in time.
Stationarity: Hard to hit a moving target
f(x_1, \ldots, x_n; t_1, \ldots, t_n) = f(x_1, \ldots, x_n; t_1 + c, \ldots, t_n + c)

E\{x(t)\} = \eta

E\{x(t + \tau)\, x^*(t)\} = R(\tau)
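Wide-sense stationarity can be probed numerically by estimating R at the same lag τ from two different time origins; a sketch using the random-phase cosine (a hypothetical WSS process of my choosing):

```python
import math
import random

def R_hat(t1, t2, n=20000, seed=0, w=2 * math.pi):
    """Ensemble estimate of R(t1, t2) for x(t) = cos(w t + phi),
    phi uniform on [0, 2*pi): a hypothetical WSS process."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        phi = rng.uniform(0.0, 2 * math.pi)
        total += math.cos(w * t1 + phi) * math.cos(w * t2 + phi)
    return total / n

# same lag tau = t1 - t2 at two different time origins: the estimates agree,
# i.e. the autocorrelation depends only on tau
print(R_hat(0.0, 0.125), R_hat(1.0, 1.125))   # both about 0.5*cos(w*0.125)
```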
Ergodicity is a property connected with the homogeneity of a process. A process whose time average is the same as its space, or ensemble, average is said to be mean-ergodic. This definition can be extended to include other statistics as well (e.g., covariance); ergodicity is also a measure of how well the process “mixes.”
Example: Brownian motion. The time-averaged motion of a specific particle will tend toward the ensemble average of all the particles’ motions.
Ergodicity is important because it tells us how long, or how often, a process must be sampled in order for its statistics to be estimated.
Ergodicity: Mixing it up
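Mean-ergodicity can be illustrated with the random-phase cosine (a hypothetical mean-ergodic process of my choosing): the time average along one realization matches the ensemble average across many realizations.

```python
import math
import random

W = 2 * math.pi   # fixed angular frequency of the random-phase cosine

def time_average(phi, n_steps=100000, dt=0.001):
    """Time average of one realization x(t) = cos(W t + phi)."""
    return sum(math.cos(W * k * dt + phi) for k in range(n_steps)) / n_steps

def ensemble_average(n=5000, seed=0, t=0.0):
    """Ensemble average E{x(t)} over many independently drawn phases."""
    rng = random.Random(seed)
    return sum(math.cos(W * t + rng.uniform(0.0, 2 * math.pi))
               for _ in range(n)) / n

# both averages are (approximately) 0: one long sample suffices to
# estimate the mean of this process
print(time_average(phi=1.234))
print(ensemble_average())
```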
Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, Third Ed., McGraw-Hill, New York, NY, 1991.
E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK, 2003.
Eugene Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill, New York, NY, 1971.
References: Some Good Books…