stochastic data representation

4/30/2018


P. S. Game – PICT, Pune

Stochastic Data Representation

Ref: T1- C3,C5

• No system behaves in a precisely predictable manner; real systems are innately noisy.

• A degree of randomness needs to be added to make a model realistic.

• Random? Do random inputs have a definite structure?

• One can predict how the next event will be distributed.

• Using historical data, one can statistically analyze the input.

• Using statistics (mean, SD etc), one can generate realistic inputs.



Uniformly Distributed Random Numbers

• U[0,1] generators

• Most of the uniform random number generators are based on Linear Congruential generators (LCG)

a - multiplier, c - increment, m – modulus

Maximum possible random numbers?

• Determine the sequence of numbers generated by the LCG with a=5, c=3, m=16 and Z0 = 7

• Pseudorandom numbers generated by LCG are DETERMINISTIC.

• The LCG chosen is said to be full period, if m different random numbers occur for m repetitions.

$Z_0$ = 'seed'

$Z_{k+1} = (a Z_k + c) \bmod m$

$U_k = Z_k / m$

Explicit formula

$Z_k = \left[ a^k Z_0 + \frac{c\,(a^k - 1)}{a - 1} \right] \bmod m$

$U_k = Z_k / m$
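As a minimal sketch, the recurrence and the explicit formula can be checked against each other for the example LCG (a = 5, c = 3, m = 16, Z0 = 7). The function names `lcg` and `lcg_explicit` are illustrative and do not come from the slides.

```python
def lcg(a, c, m, seed, count):
    """Return the first `count` states Z_0, Z_1, ... of the LCG."""
    z = seed
    states = []
    for _ in range(count):
        states.append(z)
        z = (a * z + c) % m
    return states


def lcg_explicit(a, c, m, seed, k):
    """Compute Z_k directly from the explicit formula (requires a != 1)."""
    # (a**k - 1) is always divisible by (a - 1), so the integer division is exact.
    return (a**k * seed + c * (a**k - 1) // (a - 1)) % m


states = lcg(a=5, c=3, m=16, seed=7, count=17)
uniforms = [z / 16 for z in states]  # U_k = Z_k / m
print(states)
# [7, 6, 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, 3, 2, 13, 4, 7]
```

The 17th value returns to the seed 7, so this particular generator visits all 16 residues once per cycle.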


• Hull-Dobell Theorem

The LCG has full period iff:

- c and m are relatively prime

- all prime numbers q that divide m also divide a-1

- if 4 divides m, then 4 also divides a-1
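The three conditions can be checked mechanically. The sketch below uses the standard statement of the Hull-Dobell theorem, in which the first condition is on c and m; the helper names `prime_factors` and `has_full_period` are hypothetical.

```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n (trial division, fine for small m)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a, c, m):
    """Check the Hull-Dobell conditions for Z_{k+1} = (a Z_k + c) mod m."""
    if gcd(c, m) != 1:                                   # c and m relatively prime
        return False
    if any((a - 1) % q != 0 for q in prime_factors(m)):  # each prime q | m divides a - 1
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:                  # 4 | m implies 4 | a - 1
        return False
    return True


print(has_full_period(5, 3, 16))  # True
```

With c = 0 the first condition fails for any m > 1, which is why a multiplicative generator cannot have full period m.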



• Computer implementation (of the LCG example) at hardware level

– Use a 4-bit shift register (R) to store the value of Zk.

– Multiplication is done using left shifts.

– R ranges over [0000] – [1111].

– Z0 = 7, so initially R holds (7)10 = [0111]2 (which is the required remainder). To get U0, assume a binary point to the left of the leading digit.

– (0.0111)2 = (0.4375)10

– Division is just a right shift of the shift register.

– For values of 16 or more, the leading bits overflow the 4-bit register and are lost; what remains is the required remainder.

- Try for Z1…

- Because only shift operations are used, multiplications and divisions take few cycles.

$Z_{k+1} = a Z_k + c$
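The register behaviour can be imitated in software. This sketch assumes the multiplication by a = 5 is realised as a left shift by two places plus one addition (5z = 4z + z), which is one plausible reading of "multiplication by left shifts"; the names `MASK`, `next_state` and `to_uniform` are illustrative.

```python
MASK = 0b1111  # a 4-bit register keeps only the lowest four bits


def next_state(z):
    """One LCG step with a = 5, c = 3, m = 16 using shifts and a mask."""
    times_five = (z << 2) + z       # 5z = 4z + z; the shift does the multiplication
    return (times_five + 3) & MASK  # dropping overflow bits is the same as mod 16


def to_uniform(z):
    """Place the binary point to the left of the 4-bit value (divide by 2**4)."""
    return z / 16


z1 = next_state(7)
print(z1, to_uniform(7))  # 6 0.4375
```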


• If the increment is set to c = 0, the LCG is called a multiplicative generator.

• The Hull-Dobell theorem is not satisfied when c = 0, so a multiplicative generator cannot attain the full period m.

• Example generators

– Coveyou and MacPherson generator, used on UNIVAC 1100 series machines ($a = 5^{15}$, b = 36, c = 0)

– RANDU generator in IBM's old Scientific Subroutine ($a = 2^{16} + 3$, c = 0, $m = 2^{31}$)

– IBM's new Mathematical Subroutine Library uses ($a = 7^5$, c = 0, $m = 2^{31} - 1$)



Statistical Properties of U[0,1] Generators

• Should have large period and a hardware compatible modulus.

- The generator must be uniform.

- Tested using “Chi-square” test

- The sequence must be independent.

– Tested using “runs” test


Chi-Square test

• Given:

n – number of generated random numbers

m – number of subclasses or subintervals

$f_k$ – observed (empirical) frequency in each subinterval

$e_k = n/m$ – expected (theoretical) frequency for each subclass

– The test statistic is $\chi^2 = \sum_{k=1}^{m} \frac{(f_k - e_k)^2}{e_k}$.

– To accept or reject, the result is compared with the standard (critical) value for the confidence level α and degrees of freedom v = m − 1.



• Ex. A proposed U[0,1] generator called SNAFU is tested by generating 100 numbers and counting the frequencies in each of the following ranges: 0.00 <= x < 0.25, 0.25 <= x < 0.50, 0.50 <= x < 0.75, 0.75 <= x < 1.00. The results are f1 = 21, f2 = 31, f3 = 26, f4 = 22. Is this “close enough” to be uniform?

• Solution: n = 100, m = 4, expected frequency n/m = 25 (i.e. numbers in each class)

$\chi^2 = \frac{(21-25)^2 + (31-25)^2 + (26-25)^2 + (22-25)^2}{25} = \frac{62}{25} = 2.48$

Using the chi-square table, the critical χ2 for α = 95% and v = 3 is 7.81, which is more than the calculated value.

Hence the given generator is close enough to be uniform.
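The SNAFU calculation can be reproduced in a few lines. This is a sketch of the equal-width-class case only; the function name `chi_square` is illustrative, and the critical value 7.81 is simply the table value quoted in the example.

```python
def chi_square(observed):
    """Chi-square statistic for equally likely classes (expected = n / m)."""
    n = sum(observed)
    m = len(observed)
    expected = n / m
    return sum((f - expected) ** 2 / expected for f in observed)


chi2 = chi_square([21, 31, 26, 22])
CRITICAL = 7.81  # table value for 95% confidence and v = 3, as quoted above
is_uniform_enough = chi2 < CRITICAL
print(round(chi2, 2), is_uniform_enough)  # 2.48 True
```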


Chi-Square Distribution Function



Runs test

• Generate the sequence of n proposed random numbers.

• Check the lengths of the ascending runs throughout the sequence.

Each of the n numbers in the list belongs to exactly one 'run'. The frequencies $r_k$ are defined as

$r_k$ = number of runs of length k, for k = 1, 2, ..., 5

$r_6$ = number of runs of length greater than 5

B = [$b_i$] is given by

A = [$a_{ij}$] is given by

$b_k$ is the theoretical probability of achieving a run of length k. The expected frequency is $n b_k$.
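Counting the run frequencies r1 to r6 is the mechanical part of the test. The sketch below assumes a run ends whenever the next number is not larger than the current one; the function name `run_length_counts` and the sample sequence are illustrative, and the test statistic itself is not computed because the B and A values are not reproduced here.

```python
def run_length_counts(seq):
    """Return [r1, ..., r6]: counts of ascending runs by length (r6 = longer than 5)."""
    counts = [0] * 6
    if not seq:
        return counts
    length = 1
    for prev, cur in zip(seq, seq[1:]):
        if cur > prev:
            length += 1                       # the run keeps ascending
        else:
            counts[min(length, 6) - 1] += 1   # the run ended at `prev`
            length = 1
    counts[min(length, 6) - 1] += 1           # close the final run
    return counts


# Runs: (1 2 9) (8) (5) (3 6 7) (0 4)
print(run_length_counts([1, 2, 9, 8, 5, 3, 6, 7, 0, 4]))  # [2, 1, 2, 0, 0, 0]
```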



Generation of non-uniform random numbers

• Formula Method

– It is simple to generate uniform random numbers on the arbitrary interval [a,b], using a U[0,1] RND generator

X= (b-a)RND + a

– Formula for non-uniform random distribution (normalized Gaussian random variate)

Z is random variate with mean 0 and standard deviation 1.

– Formula by inversion method

Compute the distribution function F and invert it.

(Example on next slide)
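The first two formulas of the formula method can be sketched as follows. The Gaussian formula itself is not reproduced in the text, so this sketch assumes the Box-Muller formula, Z = sqrt(−2 ln RND1) cos(2π RND2), which is one common choice and may differ from the one intended. `uniform_ab` and `gaussian_box_muller` are hypothetical names.

```python
import math
import random


def uniform_ab(a, b, rng):
    """Uniform variate on [a, b]: X = (b - a) RND + a."""
    return (b - a) * rng.random() + a


def gaussian_box_muller(rng):
    """Standard Gaussian variate (mean 0, SD 1) via the assumed Box-Muller formula."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(1)  # seeded so the run is repeatable
samples = [gaussian_box_muller(rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
sample_sd = math.sqrt(sum((z - sample_mean) ** 2 for z in samples) / len(samples))
```

For a large sample, `sample_mean` should be close to 0 and `sample_sd` close to 1.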


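Derive a formula by which to generate exponentially distributed random variates with mean μ.

• Solution: The density function f(x) for an exponential random variate with mean E[X] = 1/λ = μ is

$f(x) = \frac{1}{\mu} e^{-x/\mu}, \quad x \ge 0$

The distribution function is

$F(x) = \int_0^x f(t)\,dt = 1 - e^{-x/\mu}$

Solving for x, $x = -\mu \ln(1 - F)$

Since 1 − RND is also uniform on [0,1], $X = -\mu \ln(\mathrm{RND})$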
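A minimal sketch of the inversion result follows. It uses ln(1 − RND) rather than ln(RND) only because Python's `random()` can return exactly 0 but never 1; the two forms have the same distribution. The function name `exponential` is illustrative.

```python
import math
import random


def exponential(mu, rng):
    """Exponential variate with mean mu, by inverting F(x) = 1 - exp(-x / mu)."""
    return -mu * math.log(1.0 - rng.random())


rng = random.Random(42)
draws = [exponential(3.0, rng) for _ in range(20000)]
average = sum(draws) / len(draws)  # should be close to mu = 3
```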


• Rejection Method

– Inverse of distribution function is not always possible

– E.g., Gaussian and Beta distributions

– Methods based on throwing darts can be used.

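x = a + (b − a) RND

y = c RND

Target: the rectangle [a, b] × [0, c]. It must enclose the density function. A dart (x, y) that lands under the density curve is accepted and x is returned; otherwise it is rejected and a new dart is thrown.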
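The dart-throwing idea can be sketched as a loop that keeps drawing (x, y) until the point falls under the density. The target density f(x) = 6x(1 − x) on [0, 1] (a Beta(2, 2) density with peak 1.5) is an illustrative choice, not an example from the slides, and `rejection_sample` is a hypothetical name.

```python
import random


def rejection_sample(density, a, b, c, rng):
    """Draw one variate from `density` on [a, b]; c must bound the density from above."""
    while True:
        x = a + (b - a) * rng.random()  # horizontal dart coordinate
        y = c * rng.random()            # vertical dart coordinate
        if y <= density(x):             # the dart landed under the curve: accept
            return x


def beta22(x):
    """Beta(2, 2) density, used here only as a test target."""
    return 6.0 * x * (1.0 - x)


rng = random.Random(7)
points = [rejection_sample(beta22, 0.0, 1.0, 1.5, rng) for _ in range(20000)]
mean_x = sum(points) / len(points)  # Beta(2, 2) has mean 0.5
```

The tighter the rectangle fits the density, the fewer darts are wasted; here two thirds of the darts are accepted on average because the area under the density is 1 and the rectangle has area 1.5.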


• Convolution method

• Consider a random variable X defined as the sum of n other independent and identically distributed (IID) random variables X1, X2, ..., Xn.

• Specifically, if Xi has density function fi(x) for i = 1, 2, ..., n, the density function f(x) of X is the convolution of all n basis density functions.

• Formally,

$f(x) = f_1(x) \otimes f_2(x) \otimes \cdots \otimes f_n(x)$

• where $f_i(x)$ is the density function of $X_i$ and the convolution operator is defined as

$f_1(x) \otimes f_2(x) = \int_{-\infty}^{\infty} f_1(\lambda)\, f_2(x - \lambda)\, d\lambda$

• So the random variate itself can be found by adding n IID variates, each of which can be found by other means.



• The m-Erlang distribution (ref. pg. 495) is defined as the sum of m IID exponential random variates. The mean of an m-Erlang distribution is

$\mu = m / \lambda$

where λ is the reciprocal of the exponential distribution's mean.

• An m-Erlang random variate with mean μ can be generated by adding m exponential random variates, each with mean μ/m.

x = 0

for k = 1 to m

x = x – μ ln(RND) / m

next k

print x
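The pseudocode above translates directly into Python. As a sketch, each of the m exponential terms has mean μ/m so that the sum has mean μ; the function name `erlang` is illustrative, and ln(1 − RND) replaces ln(RND) only to avoid log(0).

```python
import math
import random


def erlang(m, mu, rng):
    """m-Erlang variate with mean mu: the sum of m exponentials of mean mu / m."""
    x = 0.0
    for _ in range(m):
        x -= mu * math.log(1.0 - rng.random()) / m
    return x


rng = random.Random(3)
values = [erlang(3, 2.0, rng) for _ in range(20000)]
erlang_mean = sum(values) / len(values)  # should be close to mu = 2
```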

Generation of Arbitrary Random Variates

• For many situations, there is no explicit formula for the density function.

• Only a set of empirical variates is known.

• In order to validate a model, one must verify the system performance from the historical records.

• Random variates can be generated:

– Set of data {x1, x2, …, xn}

– F(x) is a piecewise-linear and continuous distribution function.

– Find $F^{-1}(x)$ to generate random variates.
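One way to realise this, sketched under an explicit assumption: the piecewise-linear F passes through the sorted data points with F equal to (i − 1)/(n − 1) at the i-th smallest value. The slides do not state which plotting positions are used, so this is one convention among several, and `empirical_variate` is a hypothetical name.

```python
import random


def empirical_variate(data, rng):
    """Draw from the piecewise-linear empirical distribution of `data` (needs n >= 2)."""
    xs = sorted(data)
    n = len(xs)
    pos = rng.random() * (n - 1)  # position along the n - 1 linear segments
    i = min(int(pos), n - 2)      # index of the segment's left endpoint
    frac = pos - i                # fraction of the way across that segment
    return xs[i] + frac * (xs[i + 1] - xs[i])


history = [2.1, 3.7, 1.4, 5.0, 2.9]  # illustrative "historical" data
rng = random.Random(11)
generated = [empirical_variate(history, rng) for _ in range(1000)]
```

Every generated value lies between the smallest and largest observation, which is a known limitation of this kind of empirical generator.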



• For discrete distributions:

• Consider mass functions p(0), p(1), ... summing to unity.
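For the discrete case, a common approach is to invert the cumulative mass function: draw RND and return the first value whose running total exceeds it. The slides do not show the algorithm, so this sketch is an assumption about the intended method; `discrete_variate` and the example mass function are illustrative.

```python
import random


def discrete_variate(pmf, rng):
    """Return x in {0, 1, ...} with probability pmf[x], by cumulative inversion."""
    u = rng.random()
    cumulative = 0.0
    for value, p in enumerate(pmf):
        cumulative += p
        if u < cumulative:
            return value
    return len(pmf) - 1  # guards against round-off when the masses sum to just under 1


rng = random.Random(5)
pmf = [0.2, 0.5, 0.3]  # p(0), p(1), p(2)
draws = [discrete_variate(pmf, rng) for _ in range(20000)]
freq_one = draws.count(1) / len(draws)  # should be close to p(1) = 0.5
```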


Random Processes

• To simulate a dynamic system, one needs to model the input along with the system.

• Input is rarely deterministic; it is a random process.

• A stochastic signal is not necessarily totally random.

• It is actually a collection of signals called the process ensemble.

• An individual entity in the ensemble is called an instance of the signal. The whole process is called a random process.

• A continuous-time random process is denoted by X(t).

• X(t) represents the ensemble and x(t) an instance.

• Simply view X(t) as a random variable, and the normal statistics (mean, SD, moments, autocorrelation) are available.



• Autocorrelation is the expected value of the product of the signal with itself evaluated τ time units later: $R_{XX}(t, \tau) = E[X(t)\,X(t+\tau)]$.

• In the event that $R_{XX}$ is a function only of τ, the random process is called autocorrelated.

• If, in addition, the mean $\mu_X(t) = E[X(t)]$ is constant, the process is wide-sense stationary.

• The stationary property means that the system has achieved steady-state behavior.


• For a wide-sense stationary process, the autocorrelation has the following properties:



• Consider an ensemble consisting of four equally likely signals defined over the time interval [0,1] as:

X(t) = {2t+1, t+2, 3t+2, 4t+1}

Find mean, second moment, variance, and autocorrelation of X(t).
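Because the four signals are equally likely, each statistic is a plain average over the ensemble. Working the averages by hand gives mean 2.5t + 1.5, second moment 7.5t² + 7t + 2.5, variance 1.25t² − 0.5t + 0.25, and autocorrelation 7.5t² + 7t + 2.5 + τ(7.5t + 3.5). The sketch below evaluates the same averages numerically as a check; the function names are illustrative.

```python
# The four equally likely member signals of the ensemble.
SIGNALS = [
    lambda t: 2 * t + 1,
    lambda t: t + 2,
    lambda t: 3 * t + 2,
    lambda t: 4 * t + 1,
]


def mean(t):
    """Ensemble mean E[X(t)]."""
    return sum(x(t) for x in SIGNALS) / len(SIGNALS)


def second_moment(t):
    """Ensemble second moment E[X(t)^2]."""
    return sum(x(t) ** 2 for x in SIGNALS) / len(SIGNALS)


def variance(t):
    """Ensemble variance E[X(t)^2] - E[X(t)]^2."""
    return second_moment(t) - mean(t) ** 2


def autocorrelation(t, tau):
    """Ensemble autocorrelation E[X(t) X(t + tau)]."""
    return sum(x(t) * x(t + tau) for x in SIGNALS) / len(SIGNALS)


print(mean(0.5), second_moment(0.5), variance(0.5), autocorrelation(0.5, 0.25))
# 2.75 7.875 0.3125 9.6875
```

The autocorrelation depends on t as well as τ, and the mean is not constant, so this ensemble is not wide-sense stationary.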


Characterizing Random Processes

Random Variables Vs Random Processes

• A random variable is a theoretical collection of numbers {xi}, from which we randomly select one and examine P[X = xi] and P[X < xi].

• A random process is a collection of signals or functions {xi(t)}.

• There is no explicit formula for the signals, only a characterization of them.

• Random variates: create a sequence of numbers with a predefined density function.

• Random process: create a sequence of signals with a predefined autocorrelation.



• Mean and autocorrelation are used to characterize the random processes.

• Autocovariance, $C_{xx}(t,\tau) = E[(x(t) - \mu_x)(x(t+\tau) - \mu_x)]$, which measures the autocorrelation of the process relative to its mean, can also be used.

• If mean is 0, Cxx = Rxx

• An ergodic process is one in which every sequence or sizable sample represents the whole process.


• For wide-sense stationary ergodic process

• Time average is taken over only one instance.

Continuous time t Discrete time k



Discrete Probability Distributions

• Discrete uniform distribution

Parameters:

i: left endpoint;

j: right endpoint.


• Binomial distribution

Parameters: n (a positive integer): the number of trials;

p(0 < p < 1): the probability of a single success.



• Geometric distribution

Parameter: p (0 < p < 1): the probability of a single success


• Poisson Distribution

Parameter: μ (positive real number): mean.



Continuous Probability Distributions

• Continuous uniform distribution

Parameters:

a: left endpoint;

b: right endpoint.


• Gamma Distributions

Parameters: α (α > 0), β (β>0)



• Exponential Distribution: This is a special case of the Gamma distribution with α = 1 and β = 1/λ.

Parameter: λ(λ > 0)


• Chi-Square distribution: This is a special case of the Gamma distribution with α = (½) v, v a positive integer, and β = 2.

Parameter: v (a positive integer): degrees of freedom.



• m- Erlang Distribution : This is a special case of the Gamma distribution with α = m and β= 1/λ.

m (a positive integer ) is the number of IID exponential variates.

Parameters: λ(λ > 0).


• Gaussian Distribution

Parameters:

μ: mean;

σ: standard deviation.



Generating Random Processes

• For simulations, random processes are random signals. Two types:

– Regular: at each clock tick or beat, a random signal is generated (regular random process).

– Episodic: irregular, asynchronous random events. When an event occurs, the signal value changes.

• Time-driven systems, open loop systems

• Event-driven systems, closed-loop systems


• Episodic Random Processes

– Characterized by non-deterministic inter-event times.

– If the inter-event times are exponentially distributed, the process is called Markovian.

– If the signal value x(t) is the number of events up to that time, the process is Poisson.

– Simulation requires scheduling the event times {tk} and generating the event sequence {xk}.

– For a Poisson process

• Schedule: $t_k = t_{k-1} - \mu \ln(\mathrm{RND})$ for event k > 0

• Generate: $x_0 = 0$, $x_k = x_{k-1} + 1$



Generating a single instance of a Poisson process with mean μ on the time horizon [0, t(m)]
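The listing that belongs under this heading is not reproduced, so the following is a sketch rather than the original algorithm. It assumes the mean refers to the exponential inter-event time, starts at t(0) = 0 with x(0) = 0, and generates exactly m events; `poisson_process` is a hypothetical name.

```python
import math
import random


def poisson_process(mu, m, rng):
    """Return event times t(0..m) and counts x(0..m) for one instance."""
    times = [0.0]
    counts = [0]
    for _ in range(m):
        gap = -mu * math.log(1.0 - rng.random())  # exponential inter-event time, mean mu
        times.append(times[-1] + gap)             # schedule the next event
        counts.append(counts[-1] + 1)             # each event increments the count
    return times, counts


rng = random.Random(9)
times, counts = poisson_process(mu=2.0, m=10, rng=rng)
```

Plotting `counts` against `times` as a step function gives the familiar staircase of a counting process.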


• Telegraph Process

– It models a digital pulse train of bipolar bits as they proceed synchronously over a communication link.

– Amplitude and frequency are fixed; however, the phase and the sign of the amplitude are random.

– Let X(t) be either +1 or -1 with equal probability.

– Every T seconds a new x arrives, but when the data stream starts is unknown. Let it start somewhere in [0,T] with equal probability.

– Randomness is in the initial time t0 and the x-value.

– Schedule: $t_0 = T \cdot \mathrm{RND}$, $t_k = t_{k-1} + T$

– Generate: P[X(t) = 1] = ½ and P[X(t) = -1] = ½.

So if b = RND, then xk = -1 for b < 0.5 and xk = 1 for b >= 0.5, for all k
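The schedule and generate steps of the telegraph process can be sketched as below. The start time is taken as T times RND so that it is uniform on [0, T], matching the description of the random phase; `telegraph` is an illustrative name.

```python
import random


def telegraph(T, n_bits, rng):
    """Return (times, values) for one instance of the telegraph process."""
    t = T * rng.random()  # random phase: the first bit arrives uniformly within [0, T)
    times, values = [], []
    for _ in range(n_bits):
        times.append(t)
        b = rng.random()
        values.append(-1 if b < 0.5 else 1)  # bipolar bit, each sign with probability 1/2
        t += T                               # bits arrive every T seconds
    return times, values


rng = random.Random(4)
times, values = telegraph(T=0.5, n_bits=8, rng=rng)
```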



Regular Random Process

• Scheduling in the regular random process model is automatic.

• Both the initial time and the inter-arrival times are given, and are usually fixed throughout the simulation.

• Three important cases to note:

1. If the signal is discrete, time is defined on the basis of a non-negative integer k. Between successive times k and k + 1, it is assumed that the system is dormant and there are no signal changes.

2. It is also possible to have a system in which system dynamics can occur between discrete times k. In this case, the discrete time is related to the continuous time t by t = kT, where T is the sampling interval and 1/T is the sampling frequency.

3. The last option is continuous time t. With continuous time, models are often described by differential equations. We define the integration step size h to be a small time increment over which no system dynamics can take place and by which appropriate numerical methods can be employed, where t = hk.


Random Walks

Figure: Four random walks with p = 1/2


• Probability of being in position n at time k

• This equation is similar to the binomial equation.

• It can be generalized to 2D, where the walker steps left-right and up-down.

• Other step distributions can also be used; for example, the increment at each step can be a Gaussian random variable.
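A one-dimensional walk is easy to simulate. The probability formula is not reproduced in the text, so the sketch below assumes the usual model: the walker starts at 0 and at each step moves +1 with probability p and −1 otherwise, which gives a binomial count of rightward steps. `random_walk` and `position_probability` are hypothetical names.

```python
import random
from math import comb


def random_walk(steps, p, rng):
    """Return the positions x(0), ..., x(steps) of one walk starting at 0."""
    x = 0
    path = [0]
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path


def position_probability(n, k, p):
    """P[walker is at position n at time k] under the +1/-1 step assumption."""
    if abs(n) > k or (k + n) % 2 != 0:
        return 0.0                # unreachable position or wrong parity
    right = (k + n) // 2          # number of +1 steps needed to reach n
    return comb(k, right) * p**right * (1 - p) ** (k - right)


rng = random.Random(8)
path = random_walk(20, 0.5, rng)
print(position_probability(0, 2, 0.5))  # 0.5
```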


White Noise

• A random walk requires two specifications:

– the initial state and

– a recursive formula by which to proceed to the next state at each time step.

• Clearly, such a random process will have a significant autocorrelation.

• This is in stark contrast to the concept of noise, where the future is more unpredictable.

• Since to model noise is to model a signal that cannot be anticipated, we think of noise as not related to any other signal X(k) or to itself in any way whatsoever.



• Consider a noisy signal W(k) to be a signal whose cross-correlation is

• and whose autocorrelation function is

• In the special case where the mean of the noise is $\mu_w = 0$, it is called white noise.

• In general, only the lag τ = 0 gives a non-zero contribution to the autocorrelation.

• This is a defining equation for a white noise signal W(k).


• For continuous time δ(t) is taken as Dirac delta

• For discrete time δ(t) is taken as Kronecker delta

• Time-series white noise is characterized by following properties

• Note: there is no explicit or implicit formula for white noise, since it is not a unique signal.
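Although there is no formula for white noise, one instance can be generated and its autocorrelation estimated by a time average. The sketch assumes zero-mean Gaussian samples, which is only one possible white-noise source, and `autocorrelation_estimate` is an illustrative name.

```python
import random


def autocorrelation_estimate(x, lag):
    """Time-average estimate of R_xx(lag) from a single instance x(0..n-1)."""
    n = len(x) - lag
    return sum(x[k] * x[k + lag] for k in range(n)) / n


rng = random.Random(0)
sigma_w = 2.0
w = [rng.gauss(0.0, sigma_w) for _ in range(20000)]  # assumed Gaussian white noise

r0 = autocorrelation_estimate(w, 0)  # should be close to sigma_w**2 = 4
r1 = autocorrelation_estimate(w, 1)  # should be close to 0
r5 = autocorrelation_estimate(w, 5)  # should be close to 0
```

Only the zero-lag estimate is far from zero, which is the delta-function shape expected of white noise.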



Random Process Models

• System is fixed and deterministic

• Signal driving system is random and noisy.

• Random signals come from ensembles of random processes

• Need to find out how the statistical characteristics of random process input relate to random process output.

Signal | Deterministic | Stochastic

Input | Single input vector | Ensemble of input vectors

Output | Single output vector | Ensemble of output vectors

Analysis | Transient phase; steady-state system | Initial non-stationary phase; stationary phase

Defining input | Impulse | White noise

Signal descriptor | Explicit formula | Autocorrelation; spectral density


• Statistical characteristics of random signals can be defined using the mean, autocorrelation, and spectral density.

• Spectral density ($S_{xx}$) is the Fourier transform of the autocorrelation.

• Wiener-Khinchine relation: $S_{xx}(\omega) = \int_{-\infty}^{\infty} R_{xx}(\tau)\, e^{-j\omega\tau}\, d\tau$

• In a deterministic linear system, if x(t) and y(t) are the input and output signals and X(ω) and Y(ω) are their Fourier transforms, then

$Y(\omega) = H(j\omega)\, X(\omega)$

$S_{yy}(\omega) = |H(j\omega)|^2\, S_{xx}(\omega)$

• Thus, if the input characteristics are well understood, the output characteristics can be inferred.



Moving-Average (MA) Processes

• Why are random signals used in simulations?

– There is noise that contaminates the underlying process

– Ignorance

• Noise is often of the form x(k) = w(k), with $R_{xx}(\tau) = \sigma_w^2\, \delta(\tau)$

• Ignorance: it can happen that the signal x(k) depends on previous white noise events (shock events), such that

$x(k) = \sum_{i=0}^{q} b_i\, w(k-i)$

This is called a moving-average process, and it acts as a low-pass filter.

• Being an average, it softens the peaks and valleys of the white noise environment.
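The moving-average sum can be sketched directly. The sketch assumes w(k) = 0 for k < 0 (no shocks before the simulation starts); `moving_average` and the coefficient values are illustrative.

```python
def moving_average(w, b):
    """x(k) = sum_{i=0}^{q} b[i] * w(k - i), taking w(k) = 0 for k < 0."""
    x = []
    for k in range(len(w)):
        x.append(sum(b[i] * w[k - i] for i in range(len(b)) if k - i >= 0))
    return x


# A single unit shock is spread over q + 1 = 3 samples, weighted by the b_i.
print(moving_average([1.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2]))
# [0.5, 0.3, 0.2, 0.0]
```

The response to a single shock dies out after q steps, so an MA process has finite memory.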


• Another possibility is that, rather than retaining memory through delayed inputs w(k), memory is retained through delays in the output. This model is the autoregressive process.

$x(k) = w(k) - \sum_{i=1}^{p} a_i\, x(k-i)$

It observes not only the current white noise input but also the previous states.

• Combining the two, we get the most general of all linear random processes, the Autoregressive Moving-Average (ARMA) model

$x(k) = -\sum_{i=1}^{p} a_i\, x(k-i) + \sum_{i=0}^{q} b_i\, w(k-i)$

This model retains memory through both white noise shock inputs and signal values at sampling points.
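The ARMA recursion can be sketched as one loop over time. The sketch assumes x(k) = 0 and w(k) = 0 for k < 0; the function name `arma` and the coefficient lists `a` = [a_1, ..., a_p] and `b` = [b_0, ..., b_q] are illustrative conventions.

```python
def arma(w, a, b):
    """x(k) = -sum_{i=1}^{p} a_i x(k-i) + sum_{i=0}^{q} b_i w(k-i), zero initial state."""
    x = []
    for k in range(len(w)):
        # Autoregressive part: weighted previous outputs (a[i - 1] holds a_i).
        ar = sum(a[i - 1] * x[k - i] for i in range(1, len(a) + 1) if k - i >= 0)
        # Moving-average part: weighted current and previous shocks.
        ma = sum(b[i] * w[k - i] for i in range(len(b)) if k - i >= 0)
        x.append(ma - ar)
    return x


# AR(1) with a_1 = -0.5 driven by a unit impulse: x(k) = 0.5 x(k-1) + w(k).
print(arma([1.0, 0.0, 0.0, 0.0], a=[-0.5], b=[1.0]))
# [1.0, 0.5, 0.25, 0.125]
```

Passing an empty `a` gives a pure MA process, and `b = [1.0]` gives the AR process of the earlier formula.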



General ARMA Model

• Autocorrelation of MA is given as,


Autoregressive (AR) Processes

• Mathematical representation: $x(k) = w(k) - \sum_{i=1}^{p} a_i\, x(k-i)$

• AR is similar to a random walk: the next signal value depends on previous signal values.

• AR is similar to MA in having two phases: transient and post-transient (i.e. stationary).

• However, an MA process becomes stationary regardless of its coefficients.

• For an AR process, not all coefficient choices lead to a stationary state.



• If the process is stationary, the AR process is ergodic. The autocorrelation can be found using the Yule-Walker equations:

• For the cross-correlation $R_{xw}(\tau)$, there is no relationship between x(k) and w(k+τ) unless τ = 0

$R_{xw}(0) = E[w(k)\,w(k)] = \sigma_w^2$

in general,

This equation is a system of linear equations. By taking τ = 0, 1, 2, …, p and writing the resultant equations in matrix form…


• Rxx(τ) is an even function, there are actually p unknown AR.

• So if the coefficients $a_i$ and the variance $\sigma_w^2$ are known, the autocorrelation of the AR process can be determined.
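For the first-order case the system can be solved by hand. The general equation is not reproduced in the text, so this sketch derives it from the AR definition as an assumption: multiplying x(k) = w(k) − a1 x(k−1) by x(k−τ) and taking expectations gives Rxx(0) = σw² − a1 Rxx(1) and Rxx(τ) = −a1 Rxx(τ−1) for τ ≥ 1. `ar1_autocorrelation` is a hypothetical name.

```python
def ar1_autocorrelation(a1, sigma_w, max_lag):
    """Return [R_xx(0), ..., R_xx(max_lag)] for a stationary AR(1) process."""
    if abs(a1) >= 1.0:
        raise ValueError("AR(1) is stationary only when |a1| < 1")
    r0 = sigma_w**2 / (1.0 - a1**2)  # from R(0) = sigma^2 - a1 R(1) and R(1) = -a1 R(0)
    return [r0 * (-a1) ** tau for tau in range(max_lag + 1)]


r = ar1_autocorrelation(a1=-0.5, sigma_w=1.0, max_lag=2)
# r[0] = 4/3, r[1] = 2/3, r[2] = 1/3
```

The values decay geometrically with the lag, in contrast to white noise, whose autocorrelation vanishes for every non-zero lag.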



Big-Z notation

• Z: special operator to represent discrete signals.

One-time-step advance: $Z[x(k)] = x(k+1)$

• Higher-order operators. Two-time-step advance: $Z^2[x(k)] = Z[Z[x(k)]] = x(k+2)$

• $Z^{-1}$: the inverse operator, called the signal delay.

$Z^{-1}[x(k)] = x(k-1)$

• $\frac{x(k)}{w(k)} = H(Z)$ is called the signal transfer function.

Big-Z notation is very closely related to the common Z-transform for discrete signals.

Autoregressive Moving-Average (ARMA) models

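• We have seen $x(k) = -\sum_{i=1}^{p} a_i\, x(k-i) + \sum_{i=0}^{q} b_i\, w(k-i)$

• If all $a_i$ are zero, it reduces to an MA process, and if q = 0, it reduces to an AR process.

• Rewriting using Z notation

$x(k) = -x(k) \sum_{i=1}^{p} a_i Z^{-i} + w(k) \sum_{i=0}^{q} b_i Z^{-i}$

• System transfer function is

$H(Z) = \frac{x(k)}{w(k)} = \frac{\sum_{i=0}^{q} b_i Z^{-i}}{1 + \sum_{i=1}^{p} a_i Z^{-i}}$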



• The denominator is called the characteristic polynomial C(Z).

• Solutions of C(Z) = 0 are called the system poles.

• The system poles determine the stability of the random process.

• If the transfer function ratio is divided out so that it is a power series

$H(Z) = \sum_{i=0}^{\infty} h(i)\, Z^{-i}$

then the coefficients h(i) of $Z^{-i}$ are called the signal impulse response.
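For a second-order process the poles can be found with the quadratic formula. The sketch makes two assumptions that the slides do not spell out: the denominator 1 + a1 Z⁻¹ + a2 Z⁻² is multiplied by Z² to give C(Z) = Z² + a1 Z + a2, and the process is treated as stable when every pole lies strictly inside the unit circle (the usual discrete-time criterion). The function names are illustrative.

```python
import cmath


def poles_second_order(a1, a2):
    """Roots of C(Z) = Z**2 + a1 * Z + a2 (possibly complex)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)


def is_stable(a1, a2):
    """Assumed criterion: stable when all poles have magnitude below 1."""
    return all(abs(z) < 1.0 for z in poles_second_order(a1, a2))


print(is_stable(-0.5, 0.06))  # True: poles near 0.3 and 0.2
print(is_stable(-2.5, 1.0))   # False: poles at 2 and 0.5
```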


• Autocorrelation function can be found by



Additive Noise

• Analyzing the response to an ideal input is not enough.

• There is inevitably a residual error between the ideal input and the realistic input.

• A model which adds a random noise component is called an additive noise model.

• This is done using the superposition principle.
