stochastic processes and their applications in computer … · 2006-06-23 · continuous...



Bi04a_1

© Copyright W. Schreiner 2005

Stochastic Processes and their Applications in Computer Simulations for Bioinformatics

Unit 04a:

Bi04a_2

Stochastic Processes and their Applications in Computer Simulations for Bioinformatics

Basic Probability Concepts revisited

Crude Monte Carlo

Markov-Chain Monte Carlo
a) Metropolis-Hastings
b) Gibbs Sampling
Real example: How to find the Consensus Sequence (Unit 4c, start)

Simulated Annealing using Markov-Chain Monte Carlo
Real example: How to find the Consensus Sequence (Unit 4c, ctd.)

Genetic Algorithms
a) General applications used to optimize the structure of multidimensional objects: molecule conformations, sequence alignments, etc.
Real example: multiple sequence alignment by the GA program SAGA
b) Genetic Computing: Genetic Algorithms used to optimize computer programs

Bi04a_3

Basic Probability Concepts revisited

Discrete distributions
• absolute frequencies
• relative frequencies
• limiting behaviour as N → ∞
• probabilities & cumulative probabilities

Continuous distributions
• absolute frequencies
• relative frequencies
• limiting behaviour as N → ∞
• probabilities & cumulative probabilities

Bi04a_4

Basic Concepts revisited

1. Discrete Distributions
random variable takes only a few possible values

example: ξ ∈ {1, 2, 3, 4, 5, 6} for a (normal) die

Throwing a die N times:

$\{H_1, H_2, H_3, H_4, H_5, H_6\}$ ... absolute frequencies (e.g. the die shows "3" $H_3$ times)

$\sum_{i=1}^{6} H_i = N$ ... summation condition for absolute frequencies

$\{h_1, h_2, \ldots, h_6\} = \left\{ \frac{H_1}{N}, \frac{H_2}{N}, \ldots, \frac{H_6}{N} \right\}$ ... relative frequencies

$\sum_{i=1}^{6} h_i = 1$ ... normalization condition for relative frequencies ("some value has to occur, after all")

Bi04a_5

Basic Concepts revisited

Limiting behaviour as N → ∞

$\{h_1, \ldots, h_6\} \rightarrow \{p_1, \ldots, p_6\}$ ... "probability" is the limit of relative frequencies as N → ∞

$p_i = \lim_{N \to \infty} h_i = \lim_{N \to \infty} \frac{H_i}{N}$

Normalization condition to 1:

$\sum_{i=1}^{6} h_i = \sum_{i=1}^{6} p_i = 1$

Generalize:
a) Consider an arbitrary number of values (L instead of 6)
b) Consider unequal probabilities $p_i \neq p_j$ for $i \neq j$

$\sum_{i=1}^{L} h_i = \sum_{i=1}^{L} p_i = 1$

Bi04a_6

Basic Concepts revisited

Example for limiting behaviour of relative frequencies as N → ∞

small N: $\{h_1, h_2, \ldots, h_6\}$
we obtain very different results for each run

large N: $\{h_1, h_2, \ldots, h_6\}$
we obtain more similar results for each run

N → ∞: $\{h_1, h_2, \ldots, h_6\} \rightarrow \{p_1, p_2, \ldots, p_6\}$
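The limiting behaviour of the relative frequencies can be tried out directly. The following is a minimal sketch, assuming a fair die simulated with Python's standard `random` module; the function name `relative_frequencies` and the seed are illustrative choices, not part of the slides.

```python
import random
from collections import Counter

def relative_frequencies(n_throws, rng):
    """Throw a fair six-sided die n_throws times and return [h_1, ..., h_6]."""
    # absolute frequencies H_i
    counts = Counter(rng.randint(1, 6) for _ in range(n_throws))
    # relative frequencies h_i = H_i / N
    return [counts[i] / n_throws for i in range(1, 7)]

if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary fixed seed, only to make the run repeatable
    for n in (10, 1000, 100000):
        for run in range(3):  # three runs per sample size
            h = relative_frequencies(n, rng)
            print(n, [round(x, 3) for x in h])
```

For small N the three runs differ strongly; for large N all h_i settle near 1/6.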

Bi04a_7

Basic Concepts revisited

"probability distribution": p(i) = Pr(ξ = i)

"cumulative probability": F(i) = Pr(ξ ≤ i)

$F(i) = \sum_{l=1}^{i} p(l)$

Bi04a_8

Basic Concepts revisited

A simple example: normal die versus "Mafia" die

[Figure: probability distribution p and cumulative probability F for a normal die (values 1 to 6, p = 1/6 each, F rising to 1) and for a "Mafia" die]

$\sum_{l=1}^{L} p(l) = 1$

$F(i) = \sum_{l=1}^{i} p(l)$
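The cumulative probability is just a running sum of the probability distribution. A small sketch follows, using exact fractions; the probabilities of the `loaded` die are a purely hypothetical stand-in, since the slides do not give the numbers for the "Mafia" die.

```python
from fractions import Fraction
from itertools import accumulate

def cumulative(p):
    """Return [F(1), ..., F(L)] with F(i) = p(1) + ... + p(i)."""
    return list(accumulate(p))

# normal die: p(i) = 1/6 for i = 1..6
fair = [Fraction(1, 6)] * 6

# hypothetical loaded die (assumed numbers): "6" comes up far too often
loaded = [Fraction(1, 12)] * 5 + [Fraction(7, 12)]

if __name__ == "__main__":
    print("fair   F:", [str(f) for f in cumulative(fair)])
    print("loaded F:", [str(f) for f in cumulative(loaded)])
```

In both cases the last entry F(L) equals 1, which is the normalization condition.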

Bi04a_9

Basic Concepts revisited

2. Continuous Distributions
random variable takes arbitrary (real) values within a given interval

ex.: ξ ∈ ℝ,
0 ≤ ξ ≤ 1, -2 ≤ ξ ≤ 5 ... finite intervals
0 ≤ ξ ≤ ∞, -∞ ≤ ξ ≤ ∞ ... infinite intervals

Bi04a_10

Basic Concepts revisited

Continuous Distributions, ctd.

Since the random variable can take any (in-between) value, the concept of discrete probability values must be generalized to a continuous probability distribution function.

p(x) is defined such that

$\Pr(x \le \xi \le x + dx) = p(x)\,dx$, dx ... infinitesimally small interval (shaded area in the figure)

[a,b] ... "real" interval:

$\Pr(a \le \xi \le b) = \int_{a}^{b} p(\xi)\,d\xi$

Normalization for the probability density function (p.d.f.) ("ξ has to lie somewhere, after all"):

$\Pr(-\infty \le \xi \le \infty) = \int_{-\infty}^{\infty} p(\xi)\,d\xi = 1$

[Figure: uniform distribution, p(x) = 1 on [0,1], with the shaded strip between x and x+dx]

Bi04a_11

Basic Concepts revisited

Continuous Distributions, ctd.
specifically consider the interval [-∞, x]:

$\Pr(-\infty \le \xi \le x) = \int_{-\infty}^{x} p(\xi)\,d\xi = F(x) = \Pr(\xi \le x)$

p(x) ... probability density function (p.d.f.)
F(x) ... cumulative distribution function

if this holds, we know from calculus:

$p(x) = \frac{d}{dx} F(x)$

F(-∞) = 0
F(+∞) = 1
F(x) is monotonically non-decreasing since p(x) ≥ 0

[Figure: p(x) and F(x) plotted over ξ]

Bi04a_12

Basic Concepts revisited

Example: linearly increasing p.d.f. on [0,1]: p(x) = 2x

Normalization:

$\int_{0}^{1} p(x)\,dx = \left[ 2\,\frac{x^2}{2} \right]_{0}^{1} = 1 - 0 = 1$

Cumulative distribution function:

$F(x) = \int_{0}^{x} p(\xi)\,d\xi = \left[ 2\,\frac{\xi^2}{2} \right]_{0}^{x} = x^2$

[Figure: p(x) and F(x)]
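The normalization and the cumulative distribution function of p(x) = 2x can be checked numerically. This is only a sketch: the helper `integrate` is a plain midpoint rule written for this illustration, and the number of steps is an arbitrary choice.

```python
def p(x):
    """Linearly increasing p.d.f. on [0, 1]."""
    return 2.0 * x

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

def F(x):
    """Cumulative distribution function, obtained by integrating p from 0 to x."""
    return integrate(p, 0.0, x)

if __name__ == "__main__":
    print("normalization:", integrate(p, 0.0, 1.0))
    for x in (0.25, 0.5, 0.75):
        print(x, F(x), x ** 2)  # numerical F(x) next to the analytic x^2
```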

Bi04a_13

Basic Concepts revisited

[Figure from Kalos (1986)]

Bi04a_14

Basic Concepts revisited

How to get a random variable ξ?

physically:
throw a 10-sided prism to construct a real number as $0.n_1 n_2 n_3 n_4 \ldots n_L \in [0,1)$

on a computer:
use a random number generator! (see appendix)

[Figure from Morgan (1984)]

Bi04a_15

Appendix 1: How to generate random numbers

"Real" random numbers (from nature, i.e. a physical process)

"Pseudo" random numbers (PRNs) (from computer programs)

"congruential method":

$x_{n+1} = a \cdot x_n + b \pmod{m}$

e.g. a = 1573, b = 19, m = 1000, $x_0$ = 89 (seed)

$\xi_n = \frac{x_n}{m} \in [0,1)$ ... uniformly distributed (since $x_n < m$, the value 1 itself is never reached)

[Figure from Morgan (1984)]
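The congruential method with the example parameters from the slide can be sketched as follows. The function names `congruential` and `period` are illustrative; `period` is an added helper that measures how long the sequence runs before it starts to repeat.

```python
from itertools import islice

def congruential(a, b, m, seed):
    """Yield x_1, x_2, ... from x_{n+1} = (a * x_n + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def period(a, b, m, seed):
    """Length of the cycle the sequence eventually falls into."""
    seen = {}  # value -> step at which it first appeared
    x, n = seed, 0
    while x not in seen:
        seen[x] = n
        x = (a * x + b) % m
        n += 1
    return n - seen[x]

if __name__ == "__main__":
    a, b, m, seed = 1573, 19, 1000, 89  # parameters from the slide
    xs = list(islice(congruential(a, b, m, seed), 5))
    print("x_n :", xs)
    print("xi_n:", [x / m for x in xs])  # scaled to [0, 1)
    print("cycle length:", period(a, b, m, seed))
```

Since there are only m distinct values, the sequence must repeat after at most m steps, which is why a small modulus such as 1000 is suitable for illustration only.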

Bi04a_16

Really random versus pseudo random

"Real" random numbers (from nature, i.e. a physical process)
• no prediction possible
• randomness (of e.g. dice) due to the uncertainty principle of quantum physics
• statistical tests will not reveal systematic predictability
• awkward for simulation

"Pseudo" random numbers (PRNs) (from computer programs)
• strictly predictable
• "randomness" due to the algorithm, its parameters & seed
• statistical tests will not easily reveal systematic predictability
• however: beware of loops
• handy for simulation

Bi04a_17

Appendix 1: PRNs Quality

[Figure from Morgan (1984): "Bad Generator" versus "Improved Generator"]

Bi04a_18

Appendix 1: PRNs from most common distributions

ξ ∈ [0,1], uniformly distributed:
available in (almost) every programming language (library)

ξ ∈ N(µ,σ), normal (Gaussian) distribution with mean µ and standard deviation σ:
available in many programming languages (libraries)

[Figures from Kalos (1986): uniform and normal distribution]
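As one example of such a library, Python's standard `random` module provides both distributions. The sketch below draws samples and compares their mean and standard deviation with the requested values; the parameters µ = 5, σ = 2 and the seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary fixed seed for a repeatable run

# uniform PRNs; random() returns values in [0.0, 1.0)
uniform_sample = [rng.random() for _ in range(10000)]

# Gaussian PRNs with mean mu = 5 and standard deviation sigma = 2
normal_sample = [rng.gauss(5.0, 2.0) for _ in range(10000)]

if __name__ == "__main__":
    print("uniform mean:", statistics.mean(uniform_sample))
    print("normal  mean:", statistics.mean(normal_sample))
    print("normal  sd  :", statistics.stdev(normal_sample))
```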

Bi04a_19

Appendix 1: How to get PRNs from other distributions

• statistical and mathematical program libraries
• deriving the desired distribution from uniform PRNs on [0,1]:
  inversion method
  other, even more sophisticated methods ...
  rejection method (works for arbitrary wanted distributions!)
  table lookup method
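The slides only name the inversion method. As a hedged illustration of the standard technique: if F is the cumulative distribution function of the wanted distribution and ξ is uniform on [0,1], then x = F⁻¹(ξ) has the wanted distribution. For the earlier example p(x) = 2x on [0,1] we have F(x) = x², so x = √ξ. The function name below is an illustrative choice.

```python
import math
import random

def sample_linear_pdf(rng):
    """Draw from p(x) = 2x on [0, 1] by inversion: F(x) = x^2, hence x = sqrt(xi)."""
    xi = rng.random()  # uniform PRN
    return math.sqrt(xi)

if __name__ == "__main__":
    rng = random.Random(11)  # arbitrary seed
    sample = [sample_linear_pdf(rng) for _ in range(100000)]
    # the exact mean of p(x) = 2x on [0, 1] is the integral of x * 2x, i.e. 2/3
    print("sample mean:", sum(sample) / len(sample))
```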

Bi04a_20

Appendix 1: Linear transformations of PRNs

uniform PRNs → uniform PRNs
Gaussian PRNs → Gaussian PRNs
arbitrarily distributed PRNs → (? not generally predictable)

ξ → N·ξ ∈ [0,N]

ξ → 2ξ - 1 ∈ [-1,+1]

ξ → A + (B-A)·ξ ∈ [A,B]

CAUTION: check what happens at the start & end of the interval!

[Figures: each transformation ("trafo") maps the uniform density on [0,1] to a uniform density on [0,N], [-1,+1] and [A,B], respectively]
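A short sketch of the linear transformation ξ → A + (B-A)·ξ, which also touches on the caution about the interval ends. It assumes Python's `random.random()`, which returns values in [0.0, 1.0); the helper name `scale_to_interval` is illustrative.

```python
import random

def scale_to_interval(xi, a, b):
    """Map xi from the unit interval linearly onto the interval between a and b."""
    return a + (b - a) * xi

if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed
    sample = [scale_to_interval(rng.random(), -1.0, 1.0) for _ in range(100000)]
    print("min:", min(sample), "max:", max(sample))
    # xi = 0 maps exactly onto the lower end of the interval
    print(scale_to_interval(0.0, -1.0, 1.0))
```

Because the generator never returns exactly 1, the upper end B is normally not produced, although floating-point rounding in a + (b - a) * xi can occasionally land on it. Code that must exclude or include an end point should therefore test for it explicitly.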

Bi04a_21

Appendix 1: General transformations of PRNs

Consider the cumulative distribution function for the argumentation (for an increasing transformation y = y(x)):

$F_Y(y) = \Pr(Y \le y) = \Pr(X \le x) = F_X(x)$

Apply the differentiation operator $\frac{d}{dy}$ to both sides of the equation and use the chain rule for differentiation:

$f_Y(y) = \frac{d F_X(x)}{dx} \cdot \frac{dx}{dy}$

$f_Y(y) = f_X(x) \cdot \frac{dx}{dy} = f_X(x) \left( \frac{dy}{dx} \right)^{-1}$

Taking absolute values, $\left| \frac{dx}{dy} \right|$, makes the formula valid for decreasing as well as increasing transformation functions!

For details see Morgan (1984), p. 29 or Kalos (1986), p. 40.

[Figure from Morgan (1984)]

Bi04a_22

Appendix 1: General transformations of PRNs

An illustration of the transformation $y = \sqrt{x}$ relating the densities

$f_X(x) = e^{-x}$, $f_Y(y) = 2y\,e^{-y^2}$.

The shaded regions have equal areas.

[Figure from Morgan (1984)]
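The transformation y = √x can be checked by simulation. Integrating f_Y(y) = 2y·e^(-y²) from 0 to y gives F_Y(y) = 1 - e^(-y²), which the sketch below compares with the empirical fraction of transformed samples not exceeding y. It assumes `random.expovariate(1.0)` from the Python standard library as the source of X with density e^(-x); the function names and the seed are illustrative.

```python
import math
import random

def f_Y(y):
    """Density of Y = sqrt(X) when X has density exp(-x)."""
    return 2.0 * y * math.exp(-y * y)

def F_Y(y):
    """Cumulative distribution function belonging to f_Y."""
    return 1.0 - math.exp(-y * y)

def empirical_cdf_of_sqrt_exp(y, n, rng):
    """Fraction of n samples sqrt(X), X exponential with rate 1, that are <= y."""
    hits = sum(1 for _ in range(n) if math.sqrt(rng.expovariate(1.0)) <= y)
    return hits / n

if __name__ == "__main__":
    rng = random.Random(3)  # arbitrary seed
    for y in (0.5, 1.0, 1.5):
        print(y, empirical_cdf_of_sqrt_exp(y, 100000, rng), F_Y(y))
```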

Bi04a_23

Appendix 1: Random numbers from libraries

SAS
NORMAL    generates a normally distributed pseudo-random variate
RANBIN    generates an observation from a binomial distribution
RANCAU    generates a Cauchy deviate
RANEXP    generates an exponential deviate
RANGAM    generates an observation from a gamma distribution
RANNOR    generates a normal deviate
RANPOI    generates an observation from a Poisson distribution
RANTBL    generates deviates from a tabled probability mass function
RANTRI    generates an observation from a triangular distribution
RANUNI    generates a uniform deviate
UNIFORM   generates a pseudo-random variate uniformly distributed on the interval (0,1)

NAG
G05CAF    uniform over (0,1)
G05DAF    uniform over (a,b)
G05DBF    exponential
G05DDF    Normal
G05DYF    discrete uniform
G05DRF    Poisson
G05FEF    Beta distribution (multiple)
G05DFF    Cauchy distribution
G05DHF    Chi-square distribution
G05DKF    F-distribution
G05FFF    Gamma distribution (multiple)
G05DCF    Logistic distribution
G05DEF    Lognormal distribution
G05DJF    Student's t-distribution
G05FSF    von Mises distribution
G05DPF    Weibull distribution

IMSL
nextBeta                beta distribution
nextBinomial            binomial distribution
nextCauchy              Cauchy distribution
nextChiSquared          Chi-squared distribution
nextExponential         standard exponential distribution
nextExponentialMix      mixture of two exponential distributions
nextGamma               standard gamma distribution
nextGeometric           geometric distribution
nextHypergeometric      hypergeometric distribution
nextLogarithmic         logarithmic distribution
nextLogNormal           lognormal distribution
nextMultivariateNormal  multivariate normal distribution
nextNegativeBinomial    negative binomial distribution
nextNormal              standard normal distribution using an inverse CDF method
nextNormalAR            standard normal distribution using an acceptance/rejection method
nextPoisson             Poisson distribution
nextStudentsT           Student's t distribution
nextTriangular          triangular distribution
nextVonMises            von Mises distribution
nextWeibull             Weibull distribution

Bi04a_24

Appendix 1: Rejection method

[Figure: wanted distribution p over ξ ∈ [a,b] (wanted for some peculiar reason...)]

• Rejection method is simple!
• works for the most fancy distributions
• "normalizes itself"!

recipe:
• generate ξ ∈ [a,b], uniform
• generate η ∈ [0, max(p)], uniform
• if η < p(ξ) ⇒ take ξ as the next random number (points below the curve),
  otherwise discard (ξ, η) (points above the curve)

Bi04a_25

Appendix 1: Rejection method, ctd.

The rejection method is simple, but it may be inefficient: "Simple but dead slow!"
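The rejection recipe can be sketched as follows. The wanted shape here is the deliberately unnormalized p(x) = x on [0,1], an illustrative choice that shows the method "normalizing itself": the accepted values follow the normalized density 2x. Function and variable names are assumptions for this sketch.

```python
import random

def rejection_sample(p, a, b, p_max, rng):
    """Return (xi, tries): one accepted value and the number of pairs it took."""
    tries = 0
    while True:
        tries += 1
        xi = rng.uniform(a, b)          # candidate, uniform on [a, b]
        eta = rng.uniform(0.0, p_max)   # height, uniform on [0, max(p)]
        if eta < p(xi):                 # point below the curve: accept
            return xi, tries
        # point above the curve: discard (xi, eta) and try again

if __name__ == "__main__":
    rng = random.Random(2)  # arbitrary seed
    shape = lambda x: x     # unnormalized wanted distribution (illustrative)
    results = [rejection_sample(shape, 0.0, 1.0, 1.0, rng) for _ in range(50000)]
    accepted = [xi for xi, _ in results]
    total_tries = sum(t for _, t in results)
    print("sample mean    :", sum(accepted) / len(accepted))
    print("acceptance rate:", len(accepted) / total_tries)
```

The acceptance rate equals the area under the curve divided by the area of the bounding box, here one half. For sharply peaked distributions that ratio becomes small, which is where the method turns "dead slow".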

Bi04a_26

Appendix 1: Select randomly from a group of n individuals according to weights (probabilities)

("table lookup method for a discrete random variable")

1) Case 1: equal probabilities (= weights)

[Figure: the interval [0,1] for ξ divided into equidistant intervals with boundaries 0, 1/n, 2/n, 3/n, ..., (n-1)/n, n/n = 1, one interval per individual # 1, 2, 3, ..., n]

recipe:
• draw a pseudo random number ξ from the uniform distribution over [0,1]
• individual selected: $i = \mathrm{int}\left[ \frac{\xi}{1/n} \right] + 1$, $i = \min(i, n)$
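A minimal sketch of the recipe for equal probabilities. The function name `select_equal` is illustrative; dividing by 1/n is written as multiplying by n.

```python
def select_equal(xi, n):
    """Pick individual i in 1..n from a uniform number xi in [0, 1]."""
    i = int(xi * n) + 1  # int[xi / (1/n)] + 1
    return min(i, n)     # xi = 1 would give n + 1, so clamp to n

if __name__ == "__main__":
    for xi in (0.0, 0.39, 0.99, 1.0):
        print(xi, "->", select_equal(xi, 5))
```

The `min` only matters for ξ = 1 exactly, which is the "end of interval" case a closed interval [0,1] allows.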

Bi04a_27

Appendix 1: Select randomly from a group of n individuals according to weights (probabilities)

("table lookup method for a discrete random variable")

2) Case 2: different probabilities (= weights), e.g. $p_i$ proportional to $e^{-E(s_i)/\tau}$

$p_i = e^{-E(s_i)/\tau} \left[ \sum_{j=1}^{n} e^{-E(s_j)/\tau} \right]^{-1}$

Normalization: $\sum_{i=1}^{n} p_i = 1$

[Figure: the interval [0,1] for ξ divided into n intervals of width $p_i$, one per individual # 1, 2, 3, 4, 5, 6, 7, ..., n]

recipe:
• draw a pseudo random number ξ from the uniform distribution over [0,1]
• individual i selected so that:

$\sum_{l=1}^{i-1} p_l < \xi \le \sum_{l=1}^{i} p_l$
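The table lookup for unequal weights can be sketched with a cumulative table and a binary search. The energies and the value of τ below are made-up numbers for illustration, and the function names are assumptions; `bisect_left` and `accumulate` are from the Python standard library.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate

def boltzmann_probabilities(energies, tau):
    """p_i proportional to exp(-E_i / tau), normalized so that the p_i sum to 1."""
    weights = [math.exp(-e / tau) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def select_weighted(xi, probabilities):
    """Return i in 1..n with sum_{l<i} p_l < xi <= sum_{l<=i} p_l."""
    cumulative = list(accumulate(probabilities))  # the lookup table
    i = bisect_left(cumulative, xi) + 1           # first entry that is >= xi
    return min(i, len(probabilities))             # guard against rounding at xi = 1

if __name__ == "__main__":
    rng = random.Random(4)                        # arbitrary seed
    p = boltzmann_probabilities([0.0, 1.0, 2.0], tau=1.0)  # hypothetical energies
    counts = [0] * len(p)
    for _ in range(100000):
        counts[select_weighted(rng.random(), p) - 1] += 1
    print("p       :", [round(x, 3) for x in p])
    print("observed:", [c / 100000 for c in counts])
```

When many draws are made from the same weights, the cumulative table would be built once and reused instead of being recomputed inside `select_weighted`.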

Bi04a_28

Literature on Random Numbers

Morgan, B.J.T. 1984. Elements of Simulation. Chapman and Hall, New York.

Rosanow, J.A. 1974. Wahrscheinlichkeitstheorie. Rowohlt Taschenbuch Verlag, Hamburg.

Kalos, M.H. and P.A. Whitlock. 1986. Monte Carlo Methods. Wiley, New York.

SAS Institute. 1985. SAS User's Guide: Basics. Cary.

NAG. 1993. Fortran Library Mark 16. Oxford.

JMSL Reference Manual. 2002. Visual Numerics, San Ramon. http://www.vni.com/books/docs/