    Stochastic Signal Processing

    Arnab Ghoshal

    Spoken Language Systems, Saarland University

    November 09, 2009


    Outline

    1 Introduction to probability

    2 Random variables, probability distributions

    3 Moments of random variables

    4 Random vectors

    5 Random sequences

    6 Random processes

    Coverage: Stark & Woods: Probability and Random Processes with Applications to Signal Processing, Chapters 1-7


    Sigma fields

    For a set $\Omega$, a subset $\mathcal{F}$ of the power set $2^\Omega$ forms a $\sigma$-field (or $\sigma$-algebra) if it satisfies the following properties:

    1 Non-empty: $\emptyset \in \mathcal{F}$ and $\Omega \in \mathcal{F}$

    2 Closed under complement: $E^c \in \mathcal{F}$ for every $E \in \mathcal{F}$

    3 Closed under countable unions and intersections: $\bigcup_{i=1}^{\infty} E_i \in \mathcal{F}$ and $\bigcap_{i=1}^{\infty} E_i \in \mathcal{F}$, for all $E_i \in \mathcal{F}$

    For example: if $\Omega$ is the sample space of an experiment, then the subsets of $\Omega$ are called events and form a $\sigma$-field.


    Axiomatic definition of probability

    Given a sample space $\Omega$ and a $\sigma$-field $\mathcal{F}$ of events defined on $\Omega$, we define probability $\Pr[\cdot]$ as a measure on each event $E \in \mathcal{F}$, such that:

    1 $\Pr[E] \ge 0$

    2 $\Pr[\Omega] = 1$

    3 $\Pr[E \cup F] = \Pr[E] + \Pr[F]$, if $E \cap F = \emptyset$

    The triplet $\mathcal{P} = (\Omega, \mathcal{F}, \Pr)$ is called a probability space.

    It follows from the axiomatic definition that:

    $\Pr[E \cup F] = \Pr[E] + \Pr[F] - \Pr[E \cap F]$

    $\Pr\left[\bigcup_{n=1}^{N} E_n\right] = \sum_{n=1}^{N} \Pr[E_n]$, if $E_i \cap E_j = \emptyset$ for all $i \ne j$. This can be extended to the case of countable additivity:

    $\Pr\left[\bigcup_{n=1}^{\infty} E_n\right] = \sum_{n=1}^{\infty} \Pr[E_n]$, if $E_i \cap E_j = \emptyset$ for all $i \ne j$.


    Conditional probability

    Given a probability space $\mathcal{P} = (\Omega, \mathcal{F}, \Pr)$ and events $E, F \in \mathcal{F}$, we define:

    1 Joint probability: $\Pr[E, F] = \Pr[E \cap F]$

    2 Conditional probability: $\Pr[E|F] = \frac{\Pr[E, F]}{\Pr[F]}$

    3 Events $E$ and $F$ are independent iff:

    $\Pr[E, F] = \Pr[E] \Pr[F] \iff \Pr[E|F] = \Pr[E] \iff \Pr[F|E] = \Pr[F]$

    4 If $E_1, \ldots, E_n$ are exhaustive (i.e. $\bigcup_{i=1}^{n} E_i = \Omega$) and mutually exclusive (i.e. $E_i \cap E_j = \emptyset$, $i \ne j$), then:

    $\Pr[F] = \Pr[F|E_1] \Pr[E_1] + \ldots + \Pr[F|E_n] \Pr[E_n]$


    Bayes Theorem

    Given a probability space $\mathcal{P} = (\Omega, \mathcal{F}, \Pr)$, a set of disjoint, exhaustive events $E_1, E_2, \ldots, E_n \in \mathcal{F}$, and any event $F \in \mathcal{F}$ with $\Pr[F] > 0$, the probability $\Pr[E_i|F]$ is calculated as:

    $\Pr[E_i|F] = \frac{\Pr[F|E_i] \Pr[E_i]}{\Pr[F|E_1] \Pr[E_1] + \ldots + \Pr[F|E_n] \Pr[E_n]}$

    Proof:

    $\Pr[E_i|F] = \frac{\Pr[F, E_i]}{\Pr[F]}$   (1)

    $= \frac{\Pr[F|E_i] \Pr[E_i]}{\Pr[F]}$   (2)

    $= \frac{\Pr[F|E_i] \Pr[E_i]}{\Pr[F|E_1] \Pr[E_1] + \ldots + \Pr[F|E_n] \Pr[E_n]}$
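    A minimal numerical sketch of the theorem (not part of the original slides): the priors and likelihoods below are made-up illustrative values for three disjoint, exhaustive events.

    import numpy as np

    # Hypothetical priors Pr[E_i] for three disjoint, exhaustive events
    prior = np.array([0.5, 0.3, 0.2])
    # Hypothetical likelihoods Pr[F | E_i]
    likelihood = np.array([0.9, 0.5, 0.1])

    # Total probability: Pr[F] = sum_i Pr[F | E_i] Pr[E_i]
    pr_f = np.sum(likelihood * prior)

    # Bayes theorem: Pr[E_i | F] = Pr[F | E_i] Pr[E_i] / Pr[F]
    posterior = likelihood * prior / pr_f

    print("Pr[F] =", pr_f)           # 0.62
    print("posterior:", posterior)   # entries sum to 1
    assert np.isclose(posterior.sum(), 1.0)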


    Random Variable

    A real random variable $X(\zeta)$ is a mapping from $\Omega$ to the real line (i.e. $X : \Omega \to \mathbb{R}$), which assigns a number $X(\zeta)$ to every outcome $\zeta \in \Omega$ and satisfies the following properties:

    1 For every Borel set $B$ making up the $\sigma$-field $\mathcal{B}$ on $\mathbb{R}$, the set $E_B \triangleq \{\zeta : X(\zeta) \in B\}$ is an event

    2 $\Pr[X = -\infty] = \Pr[X = +\infty] = 0$

    Note that:

    An RV can be complex: $Z(\zeta) = X(\zeta) + jY(\zeta)$, where $X$ and $Y$ are real RVs.

    Property 1 implies that the set $\{\zeta : X(\zeta) \le x\}$ is an event for every $x$.

    An RV can also be thought of as a mapping between probability spaces, i.e. $X : (\Omega, \mathcal{F}, \Pr) \to (\mathbb{R}, \mathcal{B}, \Pr_X)$


    Probability Distribution Function

    The distribution function $F_X(\cdot)$ of the RV $X$ is defined as:

    $F_X(x) = \Pr[\{\zeta : X(\zeta) \le x\}]$

    Properties:

    1 $F_X(+\infty) = 1$, $F_X(-\infty) = 0$

    2 $F_X(x)$ is a non-decreasing function of $x$, i.e. if $x_1 \le x_2$ then $F_X(x_1) \le F_X(x_2)$

    3 $F_X(x)$ is continuous from the right, i.e. $F_X(x) = \lim_{\epsilon \downarrow 0} F_X(x + \epsilon)$

    4 If $F_X(x_0) = 0$ then $F_X(x) = 0$ for every $x \le x_0$

    5 $\Pr[X > x] = 1 - F_X(x)$

    6 $\Pr[x_1 < X \le x_2] = F_X(x_2) - F_X(x_1)$


    Probability Density Function (pdf)

    If $F_X(x)$ is continuous and differentiable, then the pdf is defined as

    $f_X(x) \triangleq \frac{dF_X(x)}{dx} = \lim_{\Delta x \to 0} \frac{\Pr[x \le X \le x + \Delta x]}{\Delta x}$

    If $X$ is discrete and $\Pr[X = x_i] = p_i$, then

    $f_X(x) = \sum_i p_i\, \delta(x - x_i)$, where $\delta(x) = \begin{cases} 1, & \text{for } x = 0 \\ 0, & \text{otherwise} \end{cases}$

    Properties:

    1 $f_X(x) \ge 0$

    2 $F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy$

    3 $\int_{-\infty}^{\infty} f_X(y)\, dy = F_X(\infty) - F_X(-\infty) = 1$

    4 $\Pr[x_1 < X \le x_2] = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(y)\, dy$
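    A quick numerical sketch of property 2 (not from the slides), using an exponential pdf as an arbitrary example: integrating the pdf recovers the distribution function.

    import numpy as np

    # Exponential pdf f(x) = lam * exp(-lam * x) on x >= 0 (illustrative choice)
    lam = 2.0
    x = np.linspace(0.0, 5.0, 2001)
    pdf = lam * np.exp(-lam * x)

    # Property 2: F_X(x) is the running integral of f_X, approximated
    # here with a cumulative trapezoidal rule
    dx = x[1] - x[0]
    cdf_numeric = np.concatenate(([0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2 * dx)))

    cdf_exact = 1.0 - np.exp(-lam * x)
    print(np.max(np.abs(cdf_numeric - cdf_exact)))  # small discretization error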


    Conditional distributions and densities

    The conditional distribution function of $X : \Omega \to \mathbb{R}$, given the event $B$, is defined as:

    $F_X(x|B) \triangleq \frac{\Pr[X \le x, B]}{\Pr[B]}$,

    where $\Pr[X \le x, B] = \Pr[\{\zeta \in B : X(\zeta) \le x\}]$.

    Similarly, the conditional density $f(x|B)$ is defined as:

    $f(x|B) \triangleq \frac{dF_X(x|B)}{dx}$

    For exhaustive and mutually exclusive $B_1, \ldots, B_n$:

    $F_X(x) = F_X(x|B_1) \Pr[B_1] + F_X(x|B_2) \Pr[B_2] + \ldots + F_X(x|B_n) \Pr[B_n]$

    $f_X(x) = f_X(x|B_1) \Pr[B_1] + f_X(x|B_2) \Pr[B_2] + \ldots + f_X(x|B_n) \Pr[B_n]$


    Bayes theorem and total probability for pdfs

    From the definition of conditional probability, it follows that

    $\Pr[B | x_1 < X \le x_2] = \frac{\Pr[x_1 < X \le x_2 | B]}{\Pr[x_1 < X \le x_2]} \Pr[B] = \frac{F_X(x_2|B) - F_X(x_1|B)}{F_X(x_2) - F_X(x_1)} \Pr[B]$

    We can compute $\Pr[B | X = x]$ as the limit:

    $\Pr[B | X = x] = \lim_{\Delta x \to 0} \Pr[B | x < X \le x + \Delta x] = \frac{f_X(x|B)}{f_X(x)} \Pr[B]$

    Total probability formula:

    $\Pr[B] = \int_{-\infty}^{\infty} \Pr[B | X = x]\, f_X(x)\, dx$


    Joint distributions and densities

    The joint distribution of two RVs $X$ and $Y$ is defined as:

    $F_{XY}(x, y) \triangleq \Pr[X \le x, Y \le y]$.

    For continuous and differentiable $F_{XY}(x, y)$, the joint pdf is:

    $f_{XY}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{XY}(x, y)$

    $F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(\alpha, \beta)\, d\alpha\, d\beta$

    Statistics for each of the RVs are called marginals:

    $F_X(x) = F_{XY}(x, \infty)$   $F_Y(y) = F_{XY}(\infty, y)$

    $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$   $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx$


    Functions of random variables

    For a real RV $X$ and a function $g(\cdot)$ defined on the reals, the expression $Y = g(X)$ is a new RV, defined as follows: for every $\zeta \in \Omega$, $X(\zeta)$ is a real number, and $g(X(\zeta)) \triangleq Y(\zeta)$ is another real number assigned to $\zeta$ by the RV $Y$.

    The events $\{\zeta : Y(\zeta) \le y\} \equiv \{\zeta : g(X(\zeta)) \le y\}$, and

    $F_Y(y) = \Pr[Y \le y] = \Pr[g(X) \le y]$

    pdf of $Y = g(X)$: if $y = g(x)$ has $n$ real roots $x_1, \ldots, x_n$, then

    $f_Y(y) = \sum_{i=1}^{n} \frac{f_X(x_i)}{|g'(x_i)|}, \qquad g'(x_i) \ne 0.$
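    A Monte Carlo sketch of this root formula (not from the slides), taking $X$ standard normal and $g(x) = x^2$ as an arbitrary example: $y = x^2$ has the two real roots $\pm\sqrt{y}$ and $|g'(x)| = 2|x|$.

    import numpy as np

    rng = np.random.default_rng(0)

    # X standard normal, Y = g(X) = X^2; the formula above gives
    # f_Y(y) = [f_X(sqrt(y)) + f_X(-sqrt(y))] / (2 sqrt(y))
    x = rng.standard_normal(1_000_000)
    y = x**2

    def f_X(t):
        return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

    # Empirical density of Y from a Monte Carlo histogram on (0.05, 4.0)
    counts, edges = np.histogram(y, bins=100, range=(0.05, 4.0))
    centers = (edges[:-1] + edges[1:]) / 2
    hist = counts / (y.size * np.diff(edges))

    f_Y = (f_X(np.sqrt(centers)) + f_X(-np.sqrt(centers))) / (2 * np.sqrt(centers))
    print(np.max(np.abs(hist - f_Y)))  # small Monte Carlo error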


    Expected value of an RV

    The expected value (or mean) of an RV is defined as:

    $E[X] \triangleq \int_{-\infty}^{\infty} x\, f_X(x)\, dx = \mu_X$

    The expected value of $Y = g(X)$ is:

    $E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$

    The linearity of expectation follows from its definition:

    $E\left[\sum_{i=1}^{n} \alpha_i\, g_i(X)\right] = \sum_{i=1}^{n} \alpha_i\, E[g_i(X)]$


    Variance and Covariance

    The variance of an RV $X$ is defined as:

    $\sigma_X^2 \triangleq \int_{-\infty}^{\infty} (x - \mu_X)^2\, f_X(x)\, dx$

    For two RVs $X$ and $Y$, the covariance is defined as:

    $\mathrm{Cov}[X, Y] \triangleq E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$,

    where $E[XY]$ is called the correlation of $X$ and $Y$.

    Cauchy-Schwarz inequality:

    $\mathrm{Cov}[X, Y] \le \sqrt{E[(X - \mu_X)^2]\, E[(Y - \mu_Y)^2]}$

    Expectation and variance may not exist; e.g. the Cauchy pdf:

    $f_X(x) = \frac{1}{\pi (x^2 + 1)}$
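    A small simulation sketch of that caveat (not from the slides): sample means of a standard Cauchy fail to settle down as $n$ grows, unlike a distribution whose mean exists.

    import numpy as np

    rng = np.random.default_rng(1)

    # Sample means of a standard Cauchy do not converge, because E[X]
    # does not exist; compare with a standard normal.
    for n in (10**3, 10**4, 10**5, 10**6):
        cauchy_mean = rng.standard_cauchy(n).mean()
        normal_mean = rng.standard_normal(n).mean()
        print(f"n={n:>8}  Cauchy mean={cauchy_mean:+9.3f}  normal mean={normal_mean:+9.5f}")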


    Independence and correlation

    Two random variables $X$ and $Y$ are said to be independent if their joint distribution and density can be factorized as the product of the marginals:

    $F_{XY}(x, y) = F_X(x) F_Y(y)$   $f_{XY}(x, y) = f_X(x) f_Y(y)$

    RVs $X$ and $Y$ are said to be uncorrelated if

    $\mathrm{Cov}[X, Y] = 0 \iff E[XY] = E[X] E[Y]$

    Independence $\Rightarrow$ uncorrelated, but not vice versa

    For uncorrelated $X$ and $Y$: $\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$
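    A standard counterexample for "not vice versa", sketched numerically (not from the slides): $X$ uniform on $[-1, 1]$ and $Y = X^2$ are uncorrelated by symmetry, yet clearly dependent.

    import numpy as np

    rng = np.random.default_rng(2)

    # Cov[X, Y] = E[X^3] - E[X] E[X^2] = 0 by symmetry, but Y is a
    # deterministic function of X, so the two are not independent.
    x = rng.uniform(-1.0, 1.0, 1_000_000)
    y = x**2

    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    print(f"sample Cov[X, Y] = {cov:+.5f}")  # close to 0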


    Random vectors

    A random vector is a vector $\mathbf{X} = [X_1, \ldots, X_n]^T$ whose components $X_i$ are random variables.

    The probability distribution function of $\mathbf{X}$ is the joint distribution of the RVs $X_i$:

    $F_{\mathbf{X}}(\mathbf{x}) \triangleq \Pr[\mathbf{X} \le \mathbf{x}] \triangleq \Pr[\{X_1 \le x_1, \ldots, X_n \le x_n\}]$.

    The probability density function of $\mathbf{X}$ is similarly:

    $f_{\mathbf{X}}(\mathbf{x}) \triangleq \frac{\partial^n F_{\mathbf{X}}(\mathbf{x})}{\partial x_1 \ldots \partial x_n}$.

    For two random vectors $\mathbf{X} = [X_1, \ldots, X_n]^T$ and $\mathbf{Y} = [Y_1, \ldots, Y_m]^T$:

    $F_{\mathbf{XY}} \triangleq \Pr[\mathbf{X} \le \mathbf{x}, \mathbf{Y} \le \mathbf{y}]$   $f_{\mathbf{XY}}(\mathbf{x}, \mathbf{y}) \triangleq \frac{\partial^{n+m} F_{\mathbf{XY}}(\mathbf{x}, \mathbf{y})}{\partial x_1 \ldots \partial x_n\, \partial y_1 \ldots \partial y_m}$.


    Random Vectors: Expectation vector

    The expectation of $\mathbf{X}$ is a vector whose elements are:

    $\mu_i \triangleq \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i\, f_{\mathbf{X}}(x_1, \ldots, x_n)\, dx_1 \ldots dx_n = \int_{-\infty}^{\infty} x_i\, f_{X_i}(x_i)\, dx_i$,

    where $f_{X_i}(x_i)$ is the marginal of the $i$-th component:

    $f_{X_i}(x_i) \triangleq \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(x_1, \ldots, x_n)\, dx_1 \ldots dx_{i-1}\, dx_{i+1} \ldots dx_n$


    Random Vectors: Covariance matrix

    For a real random vector $\mathbf{X}$, the covariance matrix is:

    $\mathbf{K} \triangleq E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]$,

    while for a complex random vector $\mathbf{Z}$ it is:

    $\mathbf{K} \triangleq E[(\mathbf{Z} - \boldsymbol{\mu})(\mathbf{Z} - \boldsymbol{\mu})^H]$,

    where $\mathbf{Z}^H$ is the conjugate transpose or Hermitian transpose of a complex-valued vector (or matrix) $\mathbf{Z}$.

    Similarly, the correlation matrix is defined as:

    $\mathbf{R} \triangleq E[\mathbf{X}\mathbf{X}^T] = \mathbf{K} + \boldsymbol{\mu}\boldsymbol{\mu}^T$, for a real random vector $\mathbf{X}$

    $\mathbf{R} \triangleq E[\mathbf{Z}\mathbf{Z}^H] = \mathbf{K} + \boldsymbol{\mu}\boldsymbol{\mu}^H$, for a complex random vector $\mathbf{Z}$


    Properties of covariance matrix

    1 $\mathbf{K}$ is symmetric (i.e. $\mathbf{K} = \mathbf{K}^T$) for a real random vector, and Hermitian (i.e. $\mathbf{K} = \mathbf{K}^H$) for a complex one:

    $K_{ij} \triangleq E[(X_i - \mu_i)(X_j - \mu_j)] = E[(X_j - \mu_j)(X_i - \mu_i)] = K_{ji}$, $i, j = 1, \ldots, n$

    2 $\mathbf{K}$ is positive semi-definite (p.s.d.), i.e. $\mathbf{y}^T \mathbf{K}\, \mathbf{y} \ge 0$ for all $\mathbf{y}$:

    $\mathbf{y}^T \mathbf{K}\, \mathbf{y} = \mathbf{y}^T E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]\, \mathbf{y} = E[\mathbf{y}^T (\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T \mathbf{y}] = E[(\mathbf{y}^T (\mathbf{X} - \boldsymbol{\mu}))^2] \ge 0$

    3 If $\mathbf{K}$ is full-rank, then it is positive definite
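    A small numerical sketch of properties 1 and 2 (not from the slides): the sample covariance of arbitrarily mixed data comes out symmetric with non-negative eigenvalues.

    import numpy as np

    rng = np.random.default_rng(3)

    n, d = 10_000, 4
    A = rng.standard_normal((d, d))          # arbitrary mixing matrix
    X = rng.standard_normal((n, d)) @ A.T    # correlated samples

    K = np.cov(X, rowvar=False)

    print(np.allclose(K, K.T))                      # symmetric
    print(np.all(np.linalg.eigvalsh(K) >= -1e-12))  # p.s.d. spectrum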


    Diagonalization of two covariance matrices

    Let $\mathbf{P}$ and $\mathbf{Q}$ be $n \times n$ real symmetric matrices, with $\mathbf{P}$ positive definite. Then:

    $\mathbf{P} = \mathbf{U}\mathbf{D}\mathbf{U}^T$, where $\mathbf{U}^T\mathbf{U} = \mathbf{I}$ and $\mathbf{D} = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $\lambda_i > 0$.

    The diagonal matrix $\mathbf{Z} = \mathrm{diag}(\lambda_1^{-1/2}, \ldots, \lambda_n^{-1/2})$ exists and is real.

    $\mathbf{Z}^T\mathbf{U}^T\mathbf{P}\,\mathbf{U}\mathbf{Z} = \mathbf{Z}^T\mathbf{D}\mathbf{Z} = \mathbf{I}$.

    Now, $\mathbf{A} \triangleq \mathbf{Z}^T\mathbf{U}^T\mathbf{Q}\,\mathbf{U}\mathbf{Z}$ is real symmetric, and so it can be factorized as $\mathbf{A} = \mathbf{W}\boldsymbol{\Lambda}\mathbf{W}^T$, where $\mathbf{W}^T\mathbf{W} = \mathbf{I}$ and $\boldsymbol{\Lambda} = \mathrm{diag}(\gamma_1, \ldots, \gamma_n)$.

    The transform $\mathbf{V} \triangleq \mathbf{U}\mathbf{Z}\mathbf{W}$ simultaneously diagonalizes $\mathbf{P}$ and $\mathbf{Q}$:

    $\mathbf{V}^T\mathbf{P}\mathbf{V} = \mathbf{W}^T\mathbf{Z}^T\mathbf{U}^T\mathbf{P}\,\mathbf{U}\mathbf{Z}\mathbf{W} = \mathbf{W}^T\mathbf{I}\mathbf{W} = \mathbf{I}$

    $\mathbf{V}^T\mathbf{Q}\mathbf{V} = \mathbf{W}^T\mathbf{Z}^T\mathbf{U}^T\mathbf{Q}\,\mathbf{U}\mathbf{Z}\mathbf{W} = \mathbf{W}^T\mathbf{A}\mathbf{W} = \boldsymbol{\Lambda}$

    $\mathbf{Q}\mathbf{V} = (\mathbf{V}^T)^{-1}\boldsymbol{\Lambda} = \mathbf{P}\mathbf{V}\boldsymbol{\Lambda}$
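    A minimal NumPy sketch of this construction (not from the slides); $\mathbf{P}$ and $\mathbf{Q}$ below are arbitrary illustrative inputs.

    import numpy as np

    rng = np.random.default_rng(4)

    def simultaneous_diagonalize(P, Q):
        """Return V with V.T @ P @ V = I and V.T @ Q @ V diagonal,
        following the whitening-then-rotation steps above."""
        lam, U = np.linalg.eigh(P)        # P = U diag(lam) U^T, lam_i > 0
        Z = np.diag(lam ** -0.5)          # whitening: Z^T U^T P U Z = I
        A = Z.T @ U.T @ Q @ U @ Z         # real symmetric
        gamma, W = np.linalg.eigh(A)      # A = W diag(gamma) W^T
        return U @ Z @ W, gamma

    n = 4
    B = rng.standard_normal((n, n))
    P = B @ B.T + n * np.eye(n)           # positive definite
    C = rng.standard_normal((n, n))
    Q = (C + C.T) / 2                     # symmetric

    V, gamma = simultaneous_diagonalize(P, Q)
    print(np.allclose(V.T @ P @ V, np.eye(n)))       # True
    print(np.allclose(V.T @ Q @ V, np.diag(gamma)))  # True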


    Random sequences

    A random sequence is a sequence of RVs. Formally, a random sequence $X[n, \zeta]$ is a mapping of the sample space $\Omega$ into the space of real (or complex) valued sequences, such that $X[n, \zeta]$ is a random variable for each integer $n$.

    A random sequence $X[n]$ is statistically specified by its $N$-th order distribution functions, for all $N \ge 1$ and for all times $n$:

    $F_X(x_n, x_{n+1}, \ldots, x_{n+N-1};\ n, n+1, \ldots, n+N-1) = \Pr\{X[n] \le x_n, X[n+1] \le x_{n+1}, \ldots, X[n+N-1] \le x_{n+N-1}\}$

    Similarly, the $N$-th order densities are obtained as:

    $f_X(x_n, x_{n+1}, \ldots, x_{n+N-1};\ n, n+1, \ldots, n+N-1) = \frac{\partial^N F_X(x_n, x_{n+1}, \ldots, x_{n+N-1};\ n, n+1, \ldots, n+N-1)}{\partial x_n\, \partial x_{n+1} \ldots \partial x_{n+N-1}}$

    Note that, for each order $N$, we need to specify an infinite number of PDFs, for all times $-\infty < n < +\infty$.

    Moments of Random sequences


    A random sequence can also be specified (in a weaker sense) by its moments:

    The mean function of a random sequence at time $n$ is:

    $\mu_X[n] \triangleq E\{X[n]\} = \int_{-\infty}^{\infty} x_n\, f_X(x_n; n)\, dx_n$

    The (auto)correlation function for times $m$ and $n$ is:

    $R_{XX}[m, n] \triangleq E\{X[m]\, X^*[n]\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_m x_n\, f_X(x_m, x_n; m, n)\, dx_m\, dx_n$

    Similarly, the (auto)covariance $K_{XX}[m, n]$ is defined as the correlation of the centered sequence $X_c[n] = X[n] - \mu_X[n]$:

    $K_{XX}[m, n] \triangleq E\{(X[m] - \mu_X[m])(X[n] - \mu_X[n])^*\}$

    Properties of correlation functions

    $R_{XX}[m, n]$ is Hermitian symmetric, i.e. $R_{XX}[m, n] = R^*_{XX}[n, m]$

    $R_{XX}[m, n]$ is positive semidefinite, i.e. for any $a_1, \ldots, a_N$:

    $\sum_{n=1}^{N} \sum_{m=1}^{N} a_n\, a^*_m\, R_{XX}[m, n] \ge 0$, for all $N > 0$.

    The average power of $X[n]$ is given by $R_{XX}[n, n] = E\{|X[n]|^2\}$

    Diagonal dominance: $|R_{XX}[m, n]| \le \sqrt{R_{XX}[m, m]\, R_{XX}[n, n]}$, which follows from the Cauchy-Schwarz inequality.

    Stationary sequences

    A random sequence $X[n]$ is said to be stationary if, for all orders $N \ge 1$ and all shifts $k$:

    $F_X(x_n, x_{n+1}, \ldots, x_{n+N-1};\ n, n+1, \ldots, n+N-1) = F_X(x_n, x_{n+1}, \ldots, x_{n+N-1};\ n+k, n+1+k, \ldots, n+N-1+k)$

    A random sequence is called wide-sense stationary (WSS) if:

    (1) The mean function is constant for all $n$, i.e. $\mu_X[n] = \mu_X[0]$

    (2) For all times $m$ and $n$ and all shifts $k$, the autocorrelation (autocovariance) function does not depend on $k$:

    $R_{XX}[m, n] = R_{XX}[m+k, n+k]$   $K_{XX}[m, n] = K_{XX}[m+k, n+k]$

    Properties of WSS sequences (writing $R_{XX}[m] \triangleq R_{XX}[m, 0]$):

    1 Hermitian symmetric: $R_{XX}[-m] = R^*_{XX}[m]$

    2 $\forall m$: $0 \le |R_{XX}[m]| \le R_{XX}[0]$, and $|R_{XY}[m]| \le \sqrt{R_{XX}[0]\, R_{YY}[0]}$

    WSS Random sequences and LTI systems

    For an LTI system $L(\cdot)$ and a WSS random sequence $X[n]$, the mean of the output random sequence $Y[n]$ is

    $E\{Y[n]\} = E\{h[n] * X[n]\} = h[n] * E\{X[n]\} = H(z)|_{z=1}\, \mu_X$,

    if the impulse response $h[n]$ of $L(\cdot)$ is absolutely summable.

    The output cross-correlation function $R_{XY}[m]$ is given by:

    $R_{XY}[m] \triangleq E\{X[m]\, Y^*[0]\}$

    $= \sum_{k=-\infty}^{\infty} h^*[-k]\, E\{X[m]\, X^*[k]\}$

    $= \sum_{k=-\infty}^{\infty} h^*[-k]\, R_{XX}[m-k]$

    $= h^*[-m] * R_{XX}[m]$

    WSS Random sequences and LTI systems (contd.)

    The output autocorrelation function $R_{YY}[m]$ is given by:

    $R_{YY}[m] \triangleq E\{Y[m]\, Y^*[0]\}$

    $= \sum_{k=-\infty}^{\infty} h[k]\, E\{X[m-k]\, Y^*[0]\}$

    $= \sum_{k=-\infty}^{\infty} h[k]\, R_{XY}[m-k]$

    $= h[m] * R_{XY}[m]$

    $= h[m] * (h^*[-m] * R_{XX}[m])$

    $= (h[m] * h^*[-m]) * R_{XX}[m]$

    $= g[m] * R_{XX}[m]$,

    where $g[m] \triangleq h[m] * h^*[-m]$ is called the autocorrelation impulse response (AIR).
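    A simulation sketch of this result (not from the slides): white WSS input ($R_{XX}[m] = \delta[m]$) through a short real FIR filter, so the output autocorrelation should match the AIR $g[m]$ directly. The filter taps are arbitrary illustrative values.

    import numpy as np

    rng = np.random.default_rng(5)

    h = np.array([1.0, -0.5, 0.25])
    n = 2_000_000
    x = rng.standard_normal(n)
    y = np.convolve(x, h, mode="full")[:n]

    # Theoretical AIR g[m] = h[m] * h[-m] (conjugation drops for real h),
    # at lags m = -2..2
    g = np.convolve(h, h[::-1])

    # Empirical autocorrelation R_YY[m] = E{Y[i+m] Y[i]} at the same lags
    lags = range(-2, 3)
    r_yy = [np.mean(y[2 + m : n - 2 + m] * y[2 : n - 2]) for m in lags]

    print(np.round(g, 3))
    print(np.round(r_yy, 3))  # close to g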

    Power Spectral Density

    For a WSS random sequence $X[n]$, the power spectral density (psd) is defined as the Fourier transform of its autocorrelation function:

    $S_{XX}(\omega) \triangleq \sum_{m=-\infty}^{\infty} R_{XX}[m]\, e^{-j\omega m}$, for $-\pi \le \omega \le +\pi$.

    Hence the autocorrelation can be computed as the IFT:

    $R_{XX}[m] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{XX}(\omega)\, e^{j\omega m}\, d\omega$.

    The cross-power spectral density between two jointly WSS random sequences $X[n]$ and $Y[n]$ is:

    $S_{XY}(\omega) \triangleq \sum_{m=-\infty}^{\infty} R_{XY}[m]\, e^{-j\omega m}$, for $-\pi \le \omega \le +\pi$.

    For an LTI system $Y[n] = h[n] * X[n]$ with WSS input $X[n]$:

    $S_{YY}(\omega) = G(\omega)\, S_{XX}(\omega) = |H(\omega)|^2\, S_{XX}(\omega)$.
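    A numerical sketch of the last identity (not from the slides): with white input ($S_{XX}(\omega) = 1$), the DTFT of the AIR $g[m]$ from the previous slide should equal $|H(\omega)|^2$.

    import numpy as np

    h = np.array([1.0, -0.5, 0.25])  # same illustrative taps as above
    g = np.convolve(h, h[::-1])      # AIR, lags -2..2

    nfft = 512
    w = 2 * np.pi * np.fft.fftfreq(nfft)

    H = np.fft.fft(h, nfft)
    # The FFT treats g as starting at lag 0, so undo the two-sample
    # shift with a factor e^{+j 2 w} to get the DTFT of g
    S_yy = np.fft.fft(g, nfft) * np.exp(2j * w)

    print(np.allclose(S_yy.imag, 0.0, atol=1e-12))  # real valued
    print(np.allclose(S_yy.real, np.abs(H) ** 2))   # equals |H(w)|^2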

    Properties of power spectral density

    $S_{XX}(\omega) \triangleq \sum_{m=-\infty}^{\infty} R_{XX}[m]\, e^{-j\omega m}$, for $-\pi \le \omega \le +\pi$.

    1 $S_{XX}(\omega)$ is real valued for any $X[n]$, since $R_{XX}[m]$ is Hermitian symmetric

    2 $S_{XX}(\omega)$ is an even function of $\omega$ if $X[n]$ is real-valued

    3 For any $X[n]$, $S_{XX}(\omega) \ge 0$ for every $\omega$

    4 If $R_{XX}[m]$ has finite support (i.e. $R_{XX}[m] = 0$ for $|m| > M$, for some finite $M > 0$), then $S_{XX}(\omega)$ is an analytic function in $\omega$.

    5 The average power in $X[n]$ can be computed as:

    $E\{|X[n]|^2\} = R_{XX}[0] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_{XX}(\omega)\, d\omega$.

    The average power in a narrow band $[\omega_0 - \Delta\omega, \omega_0 + \Delta\omega]$ centered at $\omega_0$ is approximately $\frac{\Delta\omega}{\pi}\, S_{XX}(\omega_0)$.


    Vector random sequences

    A vector random sequence is a sequence of random vectors, such that for each event $\zeta$ we have a random vector $\mathbf{X}[n, \zeta]$.

    Let $\mathbf{X}[n]$ and $\mathbf{Y}[n]$ be the input and output of a first-order LCCDE:

    $\mathbf{Y}[n] = \mathbf{A}\,\mathbf{Y}[n-1] + \mathbf{B}\,\mathbf{X}[n]$

    With zero initial conditions, the response to $\mathbf{X}[n]$ is:

    $\mathbf{Y}[n] = \sum_{k=0}^{n} \mathbf{A}^{n-k}\mathbf{B}\,\mathbf{X}[k] \triangleq \mathbf{h}[n] * \mathbf{X}[n]$,

    where $\mathbf{h}[n] = \mathbf{A}^n \mathbf{B}\, u[n]$ is the vector impulse response.
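    A short sketch checking the closed form against the recursion (not from the slides); $\mathbf{A}$ and $\mathbf{B}$ are arbitrary illustrative matrices, with $\mathbf{A}$ chosen stable.

    import numpy as np

    rng = np.random.default_rng(6)

    A = np.array([[0.5, 0.1], [-0.2, 0.3]])
    B = np.array([[1.0, 0.0], [0.5, 1.0]])

    N = 50
    X = rng.standard_normal((N, 2))

    # Recursive simulation of Y[n] = A Y[n-1] + B X[n], zero initial state
    Y = np.zeros((N, 2))
    y_prev = np.zeros(2)
    for n in range(N):
        y_prev = A @ y_prev + B @ X[n]
        Y[n] = y_prev

    # Closed form Y[n] = sum_k A^{n-k} B X[k] at the final time
    n = N - 1
    y_closed = sum(np.linalg.matrix_power(A, n - k) @ B @ X[k] for k in range(n + 1))
    print(np.allclose(Y[n], y_closed))  # True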


    Vector random sequences (contd.)

    For WSS $\mathbf{X}[n]$, the cross-correlation matrices are obtained as:

    $\mathbf{R}_{XY}[m] \triangleq E\{\mathbf{X}[m]\,\mathbf{Y}^H[0]\} = \mathbf{R}_{XX}[m] * \mathbf{h}^H[-m]$

    $\mathbf{R}_{YX}[m] \triangleq E\{\mathbf{Y}[m]\,\mathbf{X}^H[0]\} = \mathbf{h}[m] * \mathbf{R}_{XX}[m]$

    The output autocorrelation matrix is:

    $\mathbf{R}_{YY}[m] = \mathbf{h}[m] * \mathbf{R}_{XX}[m] * \mathbf{h}^H[-m]$.

    Taking the matrix Fourier transform, the psd is:

    $\mathbf{S}_{YY}(\omega) = \mathbf{H}(\omega)\,\mathbf{S}_{XX}(\omega)\,\mathbf{H}^H(\omega)$.


    Convergence of series


    Convergence: A sequence of numbers $x_n$ converges to $x$ if $\forall \epsilon > 0$, $\exists N(\epsilon)$ such that $n > N(\epsilon) \Rightarrow |x_n - x| < \epsilon$.

    Cauchy criterion: A sequence $x_n$ converges to a limit iff $\forall \epsilon > 0$, $\exists N(\epsilon)$ such that $n, m > N(\epsilon) \Rightarrow |x_n - x_m| < \epsilon$.

    Uniform convergence: A sequence of functions $f_n : D \to \mathbb{R}$ converges uniformly to the function $f(x)$ iff $\forall \epsilon > 0$, $\exists N(\epsilon)$ s.t. $n > N(\epsilon) \Rightarrow |f_n(x) - f(x)| < \epsilon$, $\forall x \in D$.

    Pointwise convergence: A sequence of functions $f_n(x)$, defined over the same domain $D$, converges pointwise to the function $f(x)$ iff $\forall \epsilon > 0$ and $\forall x \in D$, $\exists N(\epsilon, x)$ such that $n > N(\epsilon, x) \Rightarrow |f_n(x) - f(x)| < \epsilon$.


    Convergence of random sequences


    Sure convergence: The random sequence $X[n, \zeta]$ converges to the RV $X(\zeta)$ if the sequence of functions $X[n, \zeta]$ converges to the function $X(\zeta)$ for every $\zeta \in \Omega$.

    Almost-sure convergence: The random sequence $X[n, \zeta]$ converges almost surely (or with probability 1) to $X(\zeta)$ if $\lim_{n \to \infty} X[n, \zeta] = X(\zeta)$ for all $\zeta \in A$, where $\Pr[A] = 1$:

    $\Pr[\{\zeta : \lim_{n \to \infty} X[n, \zeta] = X(\zeta)\}] = 1$

    Mean-square convergence: A random sequence $X[n]$ converges to the RV $X$ in the mean-square sense if

    $\lim_{n \to \infty} E\{|X[n] - X|^2\} = 0$
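    A simulation sketch of mean-square convergence (not from the slides): the sample mean $M_n$ of iid uniform(0, 1) RVs has $E\{|M_n - \mu|^2\} = \sigma^2 / n \to 0$.

    import numpy as np

    rng = np.random.default_rng(7)

    mu, var = 0.5, 1.0 / 12.0   # mean and variance of uniform(0, 1)
    trials = 5000

    for n in (10, 100, 1000):
        samples = rng.uniform(0.0, 1.0, size=(trials, n))
        mse = np.mean((samples.mean(axis=1) - mu) ** 2)
        print(f"n={n:>5}  E|M_n - mu|^2 ~ {mse:.6f}  (theory: {var / n:.6f})")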


    Convergence of random sequences (contd.)


    Convergence in probability: The probability $\Pr[|X[n] - X| > \epsilon]$ is a sequence of numbers which depends on $\epsilon$. If this sequence has a limit of 0, i.e.

    $\lim_{n \to \infty} \Pr[|X[n] - X| > \epsilon] = 0$,

    then the random sequence $X[n, \zeta]$ is said to converge to the RV $X(\zeta)$ in probability.

    Convergence in distribution: A random sequence $X[n]$ with probability distribution function $F_n(x)$ converges in distribution to the random variable $X$ with probability distribution $F(x)$ if

    $\lim_{n \to \infty} F_n(x) = F(x)$

    at all $x$ for which $F$ is continuous.


    Random processes


    A random process $X(t, \zeta)$ is a mapping from the sample space $\Omega$ to the space of continuous-time functions, such that $X(t, \zeta)$ is a random variable for each $t \in (-\infty, +\infty)$.

    $\mu_X(t) \triangleq E[X(t)]$, for all $-\infty < t < +\infty$

    $R_{XX}(t_1, t_2) \triangleq E[X(t_1)\, X^*(t_2)]$, for all $-\infty < t_1, t_2 < +\infty$

    $K_{XX}(t_1, t_2) \triangleq E[(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2))^*]$

    $K_{XX}(t_1, t_2) = R_{XX}(t_1, t_2) - \mu_X(t_1)\, \mu_X^*(t_2)$

    Variance function: $\sigma_X^2(t) \triangleq K_{XX}(t, t)$; average power function: $E[|X(t)|^2] = R_{XX}(t, t)$

    Hermitian symmetry: $R_{XX}(t_1, t_2) = R^*_{XX}(t_2, t_1)$

    Positive semi-definite: for all $N > 0$ and $t_1 < t_2 < \ldots < t_N$, and for any $a_1, \ldots, a_N$:

    $\sum_{i=1}^{N} \sum_{j=1}^{N} a_i\, a^*_j\, R_{XX}(t_i, t_j) \ge 0$.

    Diagonal dominance: $|R_{XX}(t, s)| \le \sqrt{R_{XX}(t, t)\, R_{XX}(s, s)}$

    Random processes (contd.)


    $X(t)$ and $Y(t)$ are uncorrelated if $R_{XY}(t_1, t_2) = \mu_X(t_1)\, \mu_Y^*(t_2)$ for all $t_1$ and $t_2$.

    $X(t)$ and $Y(t)$ are orthogonal if $R_{XY}(t_1, t_2) = 0$ for all $t_1, t_2$.

    $X(t)$ and $Y(t)$ are independent if for all $N > 0$:

    $F_{XY}(x_1, y_1, x_2, y_2, \ldots, x_N, y_N;\ t_1, t_2, \ldots, t_N) = F_X(x_1, \ldots, x_N;\ t_1, \ldots, t_N)\, F_Y(y_1, \ldots, y_N;\ t_1, \ldots, t_N)$

    $X(t)$ is stationary if for all $N > 0$ and all times $T$:

    $F_X(x_1, \ldots, x_N;\ t_1, \ldots, t_N) = F_X(x_1, \ldots, x_N;\ t_1 + T, \ldots, t_N + T)$

    $X(t)$ is called wide-sense stationary (WSS) if: $E[X(t)] = \mu_X$, a constant; and $E[X(t + \tau)\, X^*(t)] = R_{XX}(\tau)$ for all $-\infty < \tau < +\infty$, independent of the time parameter $t$.


    WSS Random processes and LTI systems


    For an LTI system $L(\cdot)$ with impulse response $h(t)$ and a WSS random process $X(t)$, the mean of the output process $Y(t)$ is

    $E[Y(t)] = \int_{-\infty}^{\infty} h(\tau)\, \mu_X(t - \tau)\, d\tau = \mu_X H(0)$,

    where $H(\omega)$ is the frequency response of $L$.

    The output cross-correlation functions are given by:

    $R_{XY}(\tau) = h^*(-\tau) * R_{XX}(\tau)$

    $R_{YX}(\tau) = h(\tau) * R_{XX}(\tau)$

    The output autocorrelation function is given by:

    $R_{YY}(\tau) = h(\tau) * h^*(-\tau) * R_{XX}(\tau) = g(\tau) * R_{XX}(\tau)$,

    where $g(\tau) \triangleq h(\tau) * h^*(-\tau)$ is the autocorrelation impulse response.


    Power Spectral Density


    For a WSS random process $X(t)$, the power spectral density (psd) is defined as the Fourier transform of its autocorrelation function:

    $S_{XX}(\omega) \triangleq \int_{-\infty}^{\infty} R_{XX}(\tau)\, e^{-j\omega\tau}\, d\tau$

    The autocorrelation is the inverse Fourier transform of the psd:

    $R_{XX}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_{XX}(\omega)\, e^{j\omega\tau}\, d\omega$.

    The cross-power spectral density between two jointly WSS random processes $X(t)$ and $Y(t)$ is:

    $S_{XY}(\omega) \triangleq \int_{-\infty}^{\infty} R_{XY}(\tau)\, e^{-j\omega\tau}\, d\tau$.


    Properties of power spectral density


    $S_{XX}(\omega) \triangleq \int_{-\infty}^{\infty} R_{XX}(\tau)\, e^{-j\omega\tau}\, d\tau$

    1 $S_{XX}(\omega)$ is real valued, since $R_{XX}(\tau)$ is Hermitian symmetric.

    2 $S_{XX}(\omega)$ is an even function of $\omega$ if $X(t)$ is real-valued

    3 $S_{XX}(\omega) \ge 0$ for every $\omega$

    4 The average power in $X(t)$ is computed as:

    $E\{|X(t)|^2\} = R_{XX}(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_{XX}(\omega)\, d\omega$.

    5 The power in the frequency band $(\omega_1, \omega_2)$ is $\frac{1}{2\pi} \int_{\omega_1}^{\omega_2} S_{XX}(\omega)\, d\omega$.

    6 For every real, non-negative, and integrable function $F(\omega)$, there exists a stationary random process with power spectral density $S(\omega) = F(\omega)$.


    Periodic and cyclostationary processes


    A random process $X(t)$ is wide-sense periodic if there exists a $T > 0$ such that:

    $\mu_X(t) = \mu_X(t + T)$ for all $t$

    $K_{XX}(t_1, t_2) = K_{XX}(t_1 + T, t_2) = K_{XX}(t_1, t_2 + T)$ for all $t_1, t_2$.

    The smallest such $T$ is called the period.

    A random process $X(t)$ is wide-sense cyclostationary if there exists a $T > 0$ such that:

    $\mu_X(t) = \mu_X(t + T)$ for all $t$

    $K_{XX}(t_1, t_2) = K_{XX}(t_1 + T, t_2 + T)$ for all $t_1, t_2$.


    Vector processes


    Let $\mathbf{X}(t) \triangleq [X_1(t), X_2(t)]^T$ and $\mathbf{Y}(t) \triangleq [Y_1(t), Y_2(t)]^T$ be the input and output of a general two-channel LTI system denoted by:

    $\mathbf{h}(t) \triangleq \begin{bmatrix} h_{11}(t) & h_{12}(t) \\ h_{21}(t) & h_{22}(t) \end{bmatrix}$.

    Then $\mathbf{Y}(t) = \mathbf{h}(t) * \mathbf{X}(t)$, where vector convolution is defined as:

    $(\mathbf{h}(t) * \mathbf{X}(t))_i \triangleq \sum_{j=1}^{N} h_{ij}(t) * X_j(t)$.


    Vector processes


    Defining the input and output correlation matrices

    $\mathbf{R}_{XX}(\tau) \triangleq \begin{bmatrix} R_{X_1X_1}(\tau) & R_{X_1X_2}(\tau) \\ R_{X_2X_1}(\tau) & R_{X_2X_2}(\tau) \end{bmatrix}$   $\mathbf{R}_{YY}(\tau) \triangleq \begin{bmatrix} R_{Y_1Y_1}(\tau) & R_{Y_1Y_2}(\tau) \\ R_{Y_2Y_1}(\tau) & R_{Y_2Y_2}(\tau) \end{bmatrix}$,

    we get:

    $\mathbf{R}_{YY}(\tau) = \mathbf{h}(\tau) * \mathbf{R}_{XX}(\tau) * \mathbf{h}^H(-\tau)$.

    Taking the matrix Fourier transform:

    $\mathbf{S}_{YY}(\omega) = \mathbf{H}(\omega)\,\mathbf{S}_{XX}(\omega)\,\mathbf{H}^H(\omega)$.
