stochastic processes and their applications in computer … · 2006-06-23 · continuous...


Bi04a_1

© Copyright W. Schreiner 2005

Stochastic Processes and their Applications in Computer Simulations for Bioinformatics

Unit 04a:

Bi04a_2


Stochastic Processes and their Applications in Computer Simulations for Bioinformatics

Basic Probability Concepts revisited
Crude Monte Carlo
Markov-Chain Monte Carlo
   a) Metropolis-Hastings
   b) Gibbs-Sampling
   Real Example: How to find the Consensus Sequence (Unit 4c, Start)
Simulated Annealing using Markov-Chain-Monte Carlo
   Real Example: How to find the Consensus Sequence (Unit 4c, ctd)
Genetic Algorithms
   a) General applications used to optimize structure of multidimensional objects: molecule conformations, sequence alignments, etc.
      Real example: Multiple sequence alignment by GA Program SAGA
   b) Genetic Computing: Genetic Algorithms used to optimize computer programs


Bi04a_3


Basic Probability Concepts revisited

Discrete distributions
• absolute frequencies
• relative frequencies
• limiting behaviour as N → ∞
• probabilities & cumulative probabilities

Continuous distributions
• absolute frequencies
• relative frequencies
• limiting behaviour as N → ∞
• probabilities & cumulative probabilities

Bi04a_4


Basic Concepts revisited

1. Discrete Distributions
random variable takes only a few possible values

example: ξ ∈ {1, 2, 3, 4, 5, 6} for a (normal) dice

N-times throwing a dice:

$$\{H_1, H_2, H_3, H_4, H_5, H_6\} \quad \text{absolute frequencies}$$

$H_3$: number of times the dice shows "3"

$$\sum_{i=1}^{6} H_i = N \quad \text{summation condition for absolute frequencies}$$

$$\left\{ h_1 = \frac{H_1}{N},\; h_2 = \frac{H_2}{N},\; \ldots,\; h_6 = \frac{H_6}{N} \right\} \quad \text{relative frequencies}$$

$$\sum_{i=1}^{6} h_i = 1 \quad \text{normalization condition for relative frequencies}$$

"some value has to occur, after all"


Bi04a_5



Limiting behaviour as N → ∞

{h_1, ..., h_6} → {p_1, ..., p_6}: "Probability" is the limit of relative frequencies as N → ∞

$$p_i = \lim_{N\to\infty} h_i = \lim_{N\to\infty} \frac{H_i}{N}$$

Normalization condition to 1:

$$\sum_{i=1}^{6} h_i = \sum_{i=1}^{6} p_i = 1$$

Generalize:
a) Consider an arbitrary number of values (L instead of 6)
b) Consider unequal probabilities p_i ≠ p_j for i ≠ j

$$\sum_{i=1}^{L} h_i = \sum_{i=1}^{L} p_i = 1$$

Bi04a_6



Example for limiting behaviour of relative frequencies as N → ∞

small N: {h1, h2, ... h6}
we obtain very different results for each run

large N: {h1, h2, ... h6}
we obtain more similar results for each run

N → ∞: {h1, h2, ... h6} → {p1, p2, ... p6}
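The limiting behaviour can be tried out with a few lines of Python. This is only a sketch: the function name `relative_frequencies`, the seed and the values of N are assumptions chosen for illustration, and a fair dice (p_i = 1/6) is assumed.

```python
import random
from collections import Counter

def relative_frequencies(n_throws, rng):
    """Throw a fair six-sided dice n_throws times and return [h_1, ..., h_6]."""
    # Absolute frequencies H_i: how often each face i showed up.
    counts = Counter(rng.randint(1, 6) for _ in range(n_throws))
    # Relative frequencies h_i = H_i / N.
    return [counts[i] / n_throws for i in range(1, 7)]

rng = random.Random(1)  # fixed seed (arbitrary) so that the run is repeatable
for n in (10, 1000, 100000):
    h = relative_frequencies(n, rng)
    print(n, [round(x, 3) for x in h])
```

For small N the printed h_i scatter widely; for large N they should settle close to 1/6.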


Bi04a_7



"probability distribution": p(i) = Pr(ξ = i)

"cumulative probability": F(i) = Pr(ξ ≤ i)

$$F(i) = \sum_{l=1}^{i} p(l)$$

Bi04a_8



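A simple example: normal dice versus "Mafia" dice

[Figure: probability p and cumulative probability F for the normal dice (outcomes 1 to 6, p = 1/6) and for the "Mafia" dice]

$$\sum_{l=1}^{L} p(l) = 1$$

$$F(i) = \sum_{l=1}^{i} p(l)$$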
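The two formulas can be illustrated with exact fractions. In this sketch the fair dice uses p = 1/6; the weights of the loaded dice are purely hypothetical, since the slide does not give the probabilities of the "Mafia" dice, and the helper name `cumulative` is likewise an assumption.

```python
from fractions import Fraction
from itertools import accumulate

def cumulative(p):
    """Return [F(1), ..., F(L)] with F(i) = p(1) + ... + p(i)."""
    return list(accumulate(p))

fair = [Fraction(1, 6)] * 6
# Hypothetical loaded dice: these weights are an assumption for illustration only.
loaded = [Fraction(1, 12)] * 5 + [Fraction(7, 12)]

for name, p in (("fair", fair), ("loaded", loaded)):
    F = cumulative(p)
    # The last cumulative value is the normalization sum and has to be 1.
    print(name, [str(f) for f in F], "normalized:", F[-1] == 1)
```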


Bi04a_9



2. Continuous Distributions
random variable takes arbitrary (real) values within a given interval

ex.: ξ ∈ ℝ

finite intervals: 0 ≤ ξ ≤ 1, −2 ≤ ξ ≤ 5
infinite intervals: 0 ≤ ξ < ∞, −∞ < ξ < ∞

Bi04a_10


Basic Concepts revisited
Continuous Distributions, ctd.

Since the random variable can take any (in between) value, the concept of discrete probability values must be generalized to a continuous probability distribution function.

p(x) is defined such that

$$\Pr(x \le \xi \le x + dx) = p(x)\,dx$$

dx ... infinitesimally small interval (shaded area)

[a,b] ... "real" interval:

$$\Pr(a \le \xi \le b) = \int_a^b p(\xi)\,d\xi$$

Normalization for probability density function (p.d.f.):

$$\Pr(-\infty \le \xi \le \infty) = \int_{-\infty}^{\infty} p(\xi)\,d\xi = 1$$

"ξ has to lie somewhere, after all"

[Figure: uniform distribution, p(x) = 1 on [0,1], with the shaded strip between x and x+dx]


Bi04a_11



Continuous Distributions, ctd.
specifically consider interval [-∞,x]

$$\Pr(-\infty \le \xi \le x) = \int_{-\infty}^{x} p(\xi)\,d\xi = F(x) = \Pr(\xi \le x)$$

p(x): probability density function (p.d.f.)
F(x): cumulative distribution function

if this holds, we know from calculus:

$$p(x) = \frac{d}{dx}F(x)$$

F(-∞) = 0
F(+∞) = 1
F(x) is increasing since p(x) ≥ 0

[Figure: p(x) and F(x) plotted against ξ]

Bi04a_12


Example: linearly increasing p.d.f. on [0,1]: p(x) = 2x

Normalization:

$$\int_0^1 p(x)\,dx = 2\left[\frac{x^2}{2}\right]_0^1 = 1 - 0 = 1$$

Cumulative distribution function:

$$F(x) = \int_0^x p(\xi)\,d\xi = \left[\xi^2\right]_0^x = x^2$$

[Figure: p(x) and F(x)]
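The normalization and the cumulative distribution function of this example can be checked numerically. The sketch below uses a plain midpoint rule; the helper names `integrate` and `F` and the number of steps are assumptions for illustration.

```python
def p(x):
    """Linearly increasing p.d.f. on [0, 1] from the example."""
    return 2.0 * x

def integrate(f, a, b, steps=10000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

def F(x):
    """Cumulative distribution function: integral of p from 0 to x."""
    return integrate(p, 0.0, x)

print("normalization:", integrate(p, 0.0, 1.0))
for x in (0.25, 0.5, 0.75):
    print(x, F(x), x ** 2)  # numerical F(x) next to the analytic x**2
```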


Bi04a_13




from Kalos (1986)

Bi04a_14



How to get a random variable ξ?

physically:
throw a 10-sided prism to construct a real number as 0.n1n2n3n4...nL ∈ [0,1)

on a computer:
use a random number generator! (see Appendix 1)

from Morgan (1984)
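The physical construction can be mimicked in code: each throw of the 10-sided prism gives one decimal digit n_k, and the digits are assembled into 0.n1n2...nL. In this sketch the prism is replaced by Python's own generator, and the function name `number_from_digits` and the choice L = 8 are assumptions.

```python
import random

def number_from_digits(digits):
    """Assemble decimal digits n_1, ..., n_L into the real number 0.n_1 n_2 ... n_L."""
    value = 0.0
    for k, d in enumerate(digits, start=1):
        value += d * 10.0 ** (-k)
    return value

rng = random.Random(7)  # stands in for the physical prism
digits = [rng.randrange(10) for _ in range(8)]  # L = 8 throws
print(digits, number_from_digits(digits))
```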


Bi04a_15


Appendix 1: How to generate random numbers

• "Real" random numbers (from nature, i.e. physical process)
• "Pseudo" random numbers (PRNs) (from computer programs)

"congruential method":

$$x_{n+1} = a \cdot x_n + b \pmod{m}$$

e.g. a = 1573, b = 19, m = 1000, x_0 = 89 (seed)

$$\xi_n = \frac{x_n}{m} \in [0,1) \quad \text{uniformly distributed}$$

from Morgan (1984)
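The congruential method with the parameters quoted above can be written as a small generator. This is a minimal sketch; the function names `lcg` and `uniform_stream` are assumptions, and the parameters are only the slide's example, not a recommendation for real simulations.

```python
from itertools import islice

def lcg(seed, a=1573, b=19, m=1000):
    """Yield x_1, x_2, ... from x_{n+1} = (a * x_n + b) mod m, starting at x_0 = seed."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def uniform_stream(seed, a=1573, b=19, m=1000):
    """Yield xi_n = x_n / m, which lies in [0, 1)."""
    for x in lcg(seed, a, b, m):
        yield x / m

print(list(islice(lcg(89), 4)))             # [16, 187, 170, 429]
print(list(islice(uniform_stream(89), 4)))
```

With m = 1000 the generator can produce at most 1000 distinct values, which is far too few for serious work.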

Bi04a_16


Really random versus Pseudo random

"Real" random numbers (from nature, i.e. physical process)
• no prediction possible
• randomness (of e.g. dice) due to uncertainty principle of quantum physics
• statistical tests will not reveal systematic predictability
• awkward for simulation

"Pseudo" random numbers (PRNs) (from computer programs)
• strictly predictable
• "randomness" due to algorithm, its parameters & seed
• statistical tests will not easily reveal systematic predictability
• however: beware of loops
• handy for simulation
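The warning about loops can be made concrete: a congruential generator with modulus m has at most m states, so its sequence must eventually repeat. The sketch below measures when that happens; the function name `lcg_period` is an assumption, and the default parameters are the example values a = 1573, b = 19, m = 1000 with seed 89.

```python
def lcg_period(seed, a=1573, b=19, m=1000):
    """Return (tail_length, loop_length) of x_{n+1} = (a * x_n + b) mod m."""
    first_seen = {}  # maps each state to the step at which it first occurred
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + b) % m
        step += 1
    # x is the first repeated state: the loop starts where x was first seen.
    return first_seen[x], step - first_seen[x]

tail, period = lcg_period(89)
print("steps before the loop starts:", tail)
print("loop length:", period)
```

A simulation that draws more numbers than the loop length simply reuses the same values in the same order.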


Bi04a_17


Appendix 1: PRNs Quality

[Figure: "Bad Generator" versus "Improved Generator", from Morgan (1984)]

Bi04a_18


Appendix 1: PRNs from most common distributions

ξ ∈ [0,1] uniformly distributed
is available in (almost) every programming language (library)

ξ ~ N(µ,σ) normal (Gaussian) distribution, with mean µ and standard deviation σ
is available in many programming languages (libraries)

[Figures from Kalos (1986)]
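As one example of such library support, Python's standard `random` module offers both distributions. The sketch below draws samples and compares their sample statistics with the requested parameters; the seed, the sample size and the values µ = 5, σ = 2 are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary fixed seed for repeatability

uniform = [rng.random() for _ in range(10000)]          # uniform on [0, 1)
gaussian = [rng.gauss(5.0, 2.0) for _ in range(10000)]  # mean 5, standard deviation 2

print("uniform mean:", statistics.mean(uniform))
print("gaussian mean:", statistics.mean(gaussian))
print("gaussian standard deviation:", statistics.stdev(gaussian))
```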


Bi04a_19


Appendix 1: How to get PRNs from other distributions

• statistical and mathematical program libraries
• deriving desired distribution from uniform PRNs on [0,1]:
   • inversion method
   • rejection method (works for arbitrary distributions wanted!)
   • table lookup method
   • other, even more sophisticated methods ...
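As an illustration of the inversion method, take the linearly increasing p.d.f. p(x) = 2x on [0,1] used earlier in this unit. Its cumulative distribution function is F(x) = x², so setting F(x) = u for a uniform PRN u gives x = √u. The function name `sample_linear_pdf`, the seed and the sample size in this sketch are assumptions.

```python
import math
import random

def sample_linear_pdf(rng):
    """Draw one value from p(x) = 2x on [0, 1] by inverting F(x) = x**2."""
    u = rng.random()      # uniform PRN on [0, 1)
    return math.sqrt(u)   # x = F^{-1}(u)

rng = random.Random(3)
samples = [sample_linear_pdf(rng) for _ in range(20000)]

# The exact mean of p(x) = 2x is the integral of x * 2x over [0, 1], i.e. 2/3.
print("sample mean:", sum(samples) / len(samples))
# F(0.5) = 0.25, so about a quarter of the samples should fall below 0.5.
print("fraction below 0.5:", sum(1 for s in samples if s <= 0.5) / len(samples))
```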

Bi04a_20


Appendix 1: Linear transformations of PRNs

uniform PRNs → uniform PRNs
Gaussian PRNs → Gaussian PRNs
arbitrarily distributed PRNs → (? not generally predictable)

ξ → N·ξ ∈ [0,N]
ξ → 2ξ − 1 ∈ [−1,+1]
ξ → A + (B − A)ξ ∈ [A,B]

CAUTION: what happens at start & end of interval!

[Figure: each transformation maps the interval [0,1] onto [0,N], [−1,+1] or [A,B]]
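The transformations and the caution about the interval ends can be sketched as follows. Python's `random.random()` returns values in [0, 1), so ξ = 1 never occurs there, but other generators may include it; the guard in `random_index` covers that case. Both function names are assumptions for illustration.

```python
import random

def scale_to_interval(xi, a, b):
    """Map xi from [0, 1] linearly onto [a, b]."""
    return a + (b - a) * xi

def random_index(xi, n):
    """Map xi onto an integer 0 .. n-1; the min() guards the end point xi == 1."""
    return min(int(n * xi), n - 1)

rng = random.Random(5)
xi = rng.random()
print("xi:", xi)
print("on [-1, +1]:", scale_to_interval(xi, -1.0, 1.0))
print("index out of 6:", random_index(xi, 6))
```

Without the guard, ξ = 1 would produce the index n, one past the last valid entry.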


Bi04a_21


Appendix 1: General transformations of PRNs

Consider the cumulative distribution function for the argument:

$$F_Y(y) = \Pr(Y \le y) = \Pr(X \le x) = F_X(x)$$

Apply the differentiation operator d/dy to both sides of the equation and use the chain rule for differentiation:

$$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dx}F_X(x) \cdot \frac{dx}{dy}$$

$$f_Y(y) = f_X(x) \cdot \frac{dx}{dy}$$

$$f_Y(y) = f_X(x) \left(\frac{dy}{dx}\right)^{-1}$$

Taking absolute values |dx/dy| will make the formula valid for decreasing and increasing transformation functions!

[Figure from Morgan (1984)]

And for details: see Morgan (1984), p.29 or Kalos (1986), p.40

Bi04a_22


Appendix 1: General transformations of PRNs

An illustration of the transformation y = √x relating the densities

$$f_X(x) = e^{-x}, \qquad f_Y(y) = 2y\,e^{-y^2}.$$

The shaded regions have equal areas.

from Morgan (1984)
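This pair of densities can be checked by simulation: draw X from the exponential density e^{-x}, form Y = √X, and compare the empirical cumulative distribution of Y with the integral of 2y·e^{-y²}, which is 1 − e^{-y²}. The sketch uses `random.expovariate` from the Python standard library; the helper names, the seed and the sample size are assumptions.

```python
import math
import random

rng = random.Random(11)
n = 50000
# X has density f_X(x) = exp(-x); the transformation is Y = sqrt(X).
ys = [math.sqrt(rng.expovariate(1.0)) for _ in range(n)]

def cdf_y(y):
    """Integral of f_Y(t) = 2 t exp(-t**2) from 0 to y."""
    return 1.0 - math.exp(-y * y)

def empirical_cdf(samples, y):
    """Fraction of samples that are <= y."""
    return sum(1 for s in samples if s <= y) / len(samples)

for y in (0.5, 1.0, 1.5):
    print(y, round(empirical_cdf(ys, y), 3), round(cdf_y(y), 3))
```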


Bi04a_23


Appendix 1: Random numbers from libraries

SAS
NORMAL    generates a normally distributed pseudo-random variate
RANBIN    generates an observation from a binomial distribution
RANCAU    generates a Cauchy deviate
RANEXP    generates an exponential deviate
RANGAM    generates an observation from a gamma distribution
RANNOR    generates a normal deviate
RANPOI    generates an observation from a Poisson distribution
RANTBL    generates deviates from a tabled probability mass function
RANTRI    generates an observation from a triangular distribution
RANUNI    generates a uniform deviate
UNIFORM   generates a pseudo-random variate uniformly distributed on the interval (0,1)

NAG
G05CAF    uniform over (0,1)
G05DAF    uniform over (a,b)
G05DBF    exponential
G05DDF    Normal
G05DYF    discrete uniform
G05DRF    Poisson
G05FEF    Beta distribution (multiple)
G05DFF    Cauchy distribution
G05DHF    Chi-square distribution
G05DKF    F-distribution
G05FFF    Gamma distribution (multiple)
G05DCF    Logistic distribution
G05DEF    Lognormal distribution
G05DJF    Student's t-distribution
G05FSF    von Mises distribution
G05DPF    Weibull distribution

IMSL
nextBeta                beta distribution
nextBinomial            binomial distribution
nextCauchy              Cauchy distribution
nextChiSquared          Chi-squared distribution
nextExponential         standard exponential distribution
nextExponentialMix      mixture of two exponential distributions
nextGamma               standard gamma distribution
nextGeometric           geometric distribution
nextHypergeometric      hypergeometric distribution
nextLogarithmic         logarithmic distribution
nextLogNormal           lognormal distribution
nextMultivariateNormal  multivariate normal distribution
nextNegativeBinomial    negative binomial distribution
nextNormal              standard normal distribution using an inverse CDF method
nextNormalAR            standard normal distribution using an acceptance/rejection method
nextPoisson             Poisson distribution
nextStudentsT           Student's t distribution
nextTriangular          triangular distribution
nextVonMises            von Mises distribution
nextWeibull             Weibull distribution

Bi04a_24


Appendix 1: Rejection method

[Figure: wanted distribution p (for some peculiar reason...) over ξ ∈ [a,b], with the random points (ξ, η) scattered in the box around it]

• Rejection method is simple!
• works for the most fancy distributions
• "normalizes itself"!

recipe:
• generate ξ ∈ [a,b], uniform
• generate η ∈ [0, max(p)], uniform
• if η < p(ξ) ⇒ take ξ as next random number (points below the curve)
  otherwise discard (ξ, η) (points above the curve)
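The recipe translates almost line by line into Python. In this sketch the target p(x) = x² on [0, 2] is a hypothetical, deliberately unnormalized shape (its area is 8/3, not 1), which shows that the method "normalizes itself"; the function names, the seed and the sample size are assumptions.

```python
import random

def rejection_sample(p, a, b, p_max, rng):
    """Return one draw from the density proportional to p on [a, b]."""
    while True:
        xi = rng.uniform(a, b)          # candidate on [a, b]
        eta = rng.uniform(0.0, p_max)   # height on [0, max(p)]
        if eta < p(xi):                 # point below the curve: accept
            return xi
        # point above the curve: discard (xi, eta) and try again

def target(x):
    """Hypothetical unnormalized target shape, chosen only for illustration."""
    return x * x

rng = random.Random(2)
samples = [rejection_sample(target, 0.0, 2.0, 4.0, rng) for _ in range(20000)]

# The normalized density is 3x^2/8, whose exact mean is 1.5.
print("sample mean:", sum(samples) / len(samples))
```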


Bi04a_25


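Appendix 1: Rejection method, ctd.

rejection method is simple but:

may be inefficient: "Simple but dead slow!"

Each trial is accepted only with probability (area under p) / ((b − a) · max(p)), so a sharply peaked distribution wastes most of the generated points.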
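How slow the method gets can be measured by counting accepted trials. The sharply peaked target in this sketch is hypothetical (a Gaussian-shaped bump of width 0.01 on [0, 1]), and the function names, the seed and the number of trials are assumptions.

```python
import math
import random

def acceptance_rate(p, a, b, p_max, n_trials, rng):
    """Fraction of rejection-method trials (xi, eta) that fall below the curve p."""
    accepted = 0
    for _ in range(n_trials):
        xi = rng.uniform(a, b)
        eta = rng.uniform(0.0, p_max)
        if eta < p(xi):
            accepted += 1
    return accepted / n_trials

def narrow_peak(x):
    """Hypothetical sharply peaked target on [0, 1], maximum value 1 at x = 0.5."""
    return math.exp(-((x - 0.5) / 0.01) ** 2)

rng = random.Random(4)
rate = acceptance_rate(narrow_peak, 0.0, 1.0, 1.0, 100000, rng)
# Area under the peak divided by the area of the box [0, 1] x [0, 1].
expected = 0.01 * math.sqrt(math.pi)
print("measured acceptance rate:", rate)
print("area ratio:", expected)
```

Here fewer than two trials in a hundred are accepted, so almost all generated random numbers are thrown away.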

Bi04a_26


Appendix 1: Select randomly from a group of n individuals according to weights (probabilities)

("table lookup method for a discrete random variable")

1) Case 1: equal probabilities (=weights)

[Figure: the interval [0,1] on the ξ axis divided into n equidistant intervals with boundaries 0, 1/n, 2/n, 3/n, ..., (n-1)/n, n/n = 1, one interval per individual # 1, 2, 3, ..., n]

recipe:
• draw pseudo random number ξ from uniform distribution over [0,1]
• individual selected:

$$i = \operatorname{int}\!\left[\frac{\xi}{1/n}\right] + 1, \qquad i = \min(i, n)$$
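A direct transcription of this recipe into Python might look as follows. The function name `select_equal`, the seed and the group size n = 5 are assumptions; the `min` implements the second part of the formula and only matters when ξ = 1.

```python
import random
from collections import Counter

def select_equal(xi, n):
    """Select individual i in 1..n from xi in [0, 1], all with equal probability 1/n."""
    i = int(xi / (1.0 / n)) + 1
    return min(i, n)  # xi == 1 would otherwise give n + 1

rng = random.Random(9)
counts = Counter(select_equal(rng.random(), 5) for _ in range(10000))
print(sorted(counts.items()))  # each individual should appear roughly 2000 times
```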


Bi04a_27


Appendix 1: Select randomly from a group of n individuals according to weights (probabilities)

("table lookup method for a discrete random variable")

2) Case 2: different probabilities (=weights), e.g. p_i proportional to e^{-E(s_i)/τ}

$$p_i = e^{-E(s_i)/\tau} \left[\sum_{j=1}^{n} e^{-E(s_j)/\tau}\right]^{-1}, \qquad \text{Normalization: } \sum_{i=1}^{n} p_i = 1$$

[Figure: the interval [0,1] on the ξ axis divided into n intervals, one per individual # 1, 2, 3, ..., n, the i-th interval having width p_i]

recipe:
• draw pseudo random number ξ from uniform distribution over [0,1]
• individual i selected so that:

$$\sum_{l=1}^{i-1} p_l < \xi \le \sum_{l=1}^{i} p_l$$
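The table lookup for unequal weights can be sketched with cumulative sums and a binary search from the standard `bisect` module. The energies and the value of τ below are hypothetical, and the function names `boltzmann_probabilities` and `select_weighted` are assumptions for illustration.

```python
import bisect
import math
import random
from itertools import accumulate

def boltzmann_probabilities(energies, tau):
    """Return p_i = exp(-E_i / tau) / sum_j exp(-E_j / tau)."""
    weights = [math.exp(-e / tau) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def select_weighted(xi, probabilities):
    """Return i (counted from 1) with sum_{l<i} p_l < xi <= sum_{l<=i} p_l."""
    cumulative = list(accumulate(probabilities))
    # bisect_left finds the first cumulative sum that is >= xi.
    i = bisect.bisect_left(cumulative, xi) + 1
    return min(i, len(probabilities))  # guard against rounding in the last sum

energies = [0.0, 1.0, 2.0]  # hypothetical energies E(s_i)
probs = boltzmann_probabilities(energies, tau=1.0)
rng = random.Random(6)
picks = [select_weighted(rng.random(), probs) for _ in range(10000)]
for i, p in enumerate(probs, start=1):
    print(i, round(p, 3), picks.count(i) / len(picks))
```

Individuals with low energy get wide intervals and are therefore selected more often.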

Bi04a_28


Literature on Random Numbers

Morgan, B.J.T. 1984. Elements of Simulation. Chapman and Hall, New York.

Rosanow, J.A. 1974. Wahrscheinlichkeitstheorie. Rowohlt Taschenbuch Verlag, Hamburg.

Kalos, M.H. and P.A. Whitlock. 1986. Monte Carlo Methods. Wiley, New York.

SAS Institute. 1985. SAS User's Guide: Basics. Cary.

NAG. 1993. Fortran Library Mark 16. Oxford.

JMSL Reference Manual. 2002. Visual Numerics, San Ramon. http://www.vni.com/books/docs/
