
Mathematical Biology

Lecture notes for MATH 365

Jeffrey R. Chasnov

The Hong Kong University of Science and Technology

The Hong Kong University of Science and Technology
Department of Mathematics
Clear Water Bay, Kowloon
Hong Kong

Copyright © 2009 by Jeffrey Robert Chasnov. All rights reserved.

Unlimited permission to copy or use is hereby granted subject to inclusion of this copyright notice.

Preface

What follows are my lecture notes for Math 365: Mathematical Biology, taught at the Hong Kong University of Science and Technology. This applied mathematics course is primarily for final year mathematics major and minor students. Other students are also welcome to enroll, but must have the necessary mathematical skills.

My main emphasis is on mathematical modeling, with biology the sole application area. I assume that students have no knowledge of biology, but I hope they will learn a substantial amount during the course. Students are required to know differential equations and linear algebra, and this usually means having taken two courses in these subjects. I will also touch on topics in stochastic modeling, which requires some knowledge of probability. A full course on probability, however, is not a prerequisite, though it may be helpful.

Biology, as it is usually taught, requires memorizing a wide selection of facts and remembering them for the exams, sometimes forgetting them soon after. For students exposed to biology in secondary school, my course may seem like a different subject. The ability to model problems using mathematics requires almost no rote memorization, but does require a deep understanding of basic principles and a wide range of mathematical techniques. Biology offers a rich variety of topics with which to practice modeling and to exercise applied mathematics, and I chose the specific topics that I find to be the most interesting.

If, as a UST student, you have not yet decided whether to take my course, please browse these lecture notes to see if you too are interested in these topics. Other web surfers are welcome to download these notes and to use them freely for teaching and learning. I welcome any comments, suggestions, or corrections sent to me by email ([email protected]). Although most of the material in my notes can be found elsewhere, I hope some of it will be considered original.

Jeffrey R. Chasnov

Hong Kong

May 2009


Contents

1 Population Growth
    1.1 Deterministic model of population growth
    1.2 Stochastic model of population growth
    1.3 Asymptotics of large initial populations
        1.3.1 Derivation of deterministic model
        1.3.2 Derivation of Normal probability distribution
    1.4 Simulation of population growth

2 Age-structured populations
    2.1 Fibonacci's rabbits
        2.1.1 The golden ratio Φ
    2.2 Rabbits are an age-structured population
    2.3 Discrete age-structured population
    2.4 Continuous age-structured population
    2.5 The brood size of a hermaphroditic worm

3 Population Dynamics
    3.1 Logistic equation for population growth
    3.2 A model of species competition
    3.3 Lotka-Volterra predator-prey model

4 Infectious Disease Modeling
    4.1 SI model
    4.2 SIS model
    4.3 SIR epidemic disease model
    4.4 SIR endemic disease model
    4.5 Vaccination
    4.6 Evolution of virulence

5 Population Genetics
    5.1 Haploid genetics
        5.1.1 Spread of a favored allele
        5.1.2 Mutation-selection balance
    5.2 Diploid genetics
        5.2.1 Sexual reproduction
        5.2.2 Spread of a favored allele
        5.2.3 Mutation-selection balance
        5.2.4 Heterosis
    5.3 Frequency-dependent selection
    5.4 Linkage equilibrium
    5.5 Random genetic drift

6 Biochemical Reactions
    6.1 Law of mass action
    6.2 Enzyme kinetics
    6.3 Competitive inhibition
    6.4 Allosteric inhibition
    6.5 Cooperativity

7 Sequence Alignment
    7.1 DNA
    7.2 Brute force alignment
    7.3 Dynamic programming
    7.4 Gaps
    7.5 Local alignments
    7.6 Software

Chapter 1

Deterministic and Stochastic Modeling of Population Growth

Populations may grow in size when the birth rate exceeds the death rate. If we neglect interactions of individuals with their environment or with each other, then modeling population growth should be easy, and indeed it is, provided population sizes are large. But smaller populations exhibit stochastic effects, and these can considerably complicate the modeling. Since modeling stochastic processes in biology is, in general, an important yet difficult topic, we will spend some time here analyzing the simplest model of births in finite populations.

1.1 Deterministic model of population growth

Let $N(t)$ be the number of individuals in a population at time $t$, and let $b$ and $d$ be the average per capita birth rate and death rate, respectively. In a short time $\Delta t$, the number of births in the population is $b\,\Delta t\,N$, and the number of deaths is $d\,\Delta t\,N$. An equation for $N$ at time $t + \Delta t$ is then determined to be

$$N(t+\Delta t) = N(t) + b\,\Delta t\,N(t) - d\,\Delta t\,N(t),$$

which can be rearranged to

$$\frac{N(t+\Delta t) - N(t)}{\Delta t} = (b-d)N(t),$$

and as $\Delta t \to 0$,

$$\frac{dN}{dt} = (b-d)N.$$

With an initial population size of $N_0$, the solution shows that the population size grows exponentially:

$$N(t) = N_0 e^{(b-d)t},$$

provided $b > d$. Our deterministic approach to population growth is clearly an approximation: the precise number of births and deaths in the population during a time $\Delta t$ should, in principle, be a random variable since $b$ and $d$ are only average rates. An exact equation within our simple model, however, could have been obtained by working with the expected number of individuals in a population at time $t$, $\langle N\rangle(t)$, rather than with the actual number of individuals, so that

$$\langle N\rangle(t) = N_0 e^{(b-d)t}.$$

Other important statistics such as the variance of $N(t)$ remain unknown.

In general, a deterministic approach to modeling is simplest and is usually the first approach to a problem when the biology of interest is unaffected by stochastic effects. Nevertheless, some fundamental problems such as extinction or genetic drift require a stochastic approach, and we show how to formulate a stochastic model for the simplest case of births in a population of specified initial size. The inclusion of deaths is left as an exercise.
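The passage from the difference equation to the exponential solution can be illustrated numerically. The following Python sketch is not part of the original notes; the function names and parameter values are chosen only for illustration. It iterates the update $N(t+\Delta t) = N(t) + (b-d)\,\Delta t\,N(t)$ and compares the result with $N_0 e^{(b-d)t}$ for decreasing $\Delta t$.

```python
import math

def euler_growth(n0, b, d, t_end, dt):
    """Iterate N(t + dt) = N(t) + b*dt*N(t) - d*dt*N(t) from t = 0 to t_end."""
    steps = round(t_end / dt)
    n = float(n0)
    for _ in range(steps):
        n += (b - d) * dt * n
    return n

def exact_growth(n0, b, d, t):
    """Exponential solution N(t) = N0 * exp((b - d) t)."""
    return n0 * math.exp((b - d) * t)

# Illustrative values: N0 = 100, b = 1.0, d = 0.5, final time t = 2.
for dt in (0.1, 0.01, 0.001):
    approx = euler_growth(100, 1.0, 0.5, 2.0, dt)
    print(f"dt = {dt}: difference equation {approx:.3f}, "
          f"exponential {exact_growth(100, 1.0, 0.5, 2.0):.3f}")
```

The difference between the two values shrinks as $\Delta t$ is reduced, which is the content of the limit $\Delta t \to 0$.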

1.2 Stochastic model of population growth

The size of the population $N$ is now considered to be a discrete random variable. We define the time-dependent probability mass function $p_N(t)$ of $N$ to be the probability that the population is of size $N$ at time $t$. Since $N$ must take on one of the values from zero to infinity, we have

$$\sum_{N=0}^{\infty} p_N(t) = 1,$$

for all $t \ge 0$. Again, let $b$ be the average per capita birth rate. We make the simplifying approximations that all births are singlets, and that the probability of an individual giving birth is independent of past birthing history. We can then interpret $b$ probabilistically by supposing that as $\Delta t \to 0$, the probability that an individual gives birth during the time $\Delta t$ is given by $b\,\Delta t$. For example, if the average per capita birth rate is one offspring every 365-day year, then the probability that a given individual gives birth in a given day is $1/365$. We neglect probabilities of more than one birth in the population in the time interval $\Delta t$ since they are of order $(\Delta t)^2$ or higher, and we will be considering the limit as $\Delta t \to 0$. Furthermore, we will suppose that at $t = 0$, the population size is known to be $N_0$, so that $p_{N_0}(0) = 1$, with all other $p_N$'s at $t = 0$ equal to zero.

We can determine a system of differential equations for the probability mass function $p_N(t)$ as follows. For a population to be of size $N > 0$ at a time $t + \Delta t$, either it was of size $N - 1$ at time $t$ and one birth occurred, or it was of size $N$ at time $t$ and there were no births, that is

$$p_N(t+\Delta t) = p_{N-1}(t)\,b(N-1)\Delta t + p_N(t)(1 - bN\Delta t).$$

Subtracting $p_N(t)$ from both sides, dividing by $\Delta t$, and taking the limit $\Delta t \to 0$ results in the forward Kolmogorov differential equations:

$$\frac{dp_N}{dt} = b\left[(N-1)p_{N-1} - Np_N\right], \quad N = 1, 2, \ldots \tag{1.1}$$

where $p_0(t) = p_0(0)$ since a population of zero size remains zero. This system of coupled first-order linear differential equations can be solved iteratively.


We first review how to solve a first-order linear differential equation of the form

$$\frac{dy}{dt} + ay = g(t), \quad y(0) = y_0, \tag{1.2}$$

where $y = y(t)$ and $a$ is constant. First, we look for an integrating factor $\mu$ such that

$$\frac{d}{dt}(\mu y) = \mu\left(\frac{dy}{dt} + ay\right).$$

Differentiating the left-hand side and multiplying out the right-hand side results in

$$\frac{d\mu}{dt}y + \mu\frac{dy}{dt} = \mu\frac{dy}{dt} + a\mu y;$$

and cancelling terms yields

$$\frac{d\mu}{dt} = a\mu.$$

We may integrate this equation with arbitrary initial condition, so for simplicity we take $\mu(0) = 1$. Therefore, $\mu(t) = e^{at}$. Hence,

$$\frac{d}{dt}\left(e^{at}y\right) = e^{at}g(t).$$

Integrating this equation from $0$ to $t$ yields

$$e^{at}y(t) - y(0) = \int_0^t e^{as}g(s)\,ds.$$

Therefore, the solution is

$$y(t) = e^{-at}\left(y(0) + \int_0^t e^{as}g(s)\,ds\right). \tag{1.3}$$
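As a sanity check on formula (1.3), the following Python sketch evaluates it by numerical quadrature and compares the result with a direct numerical integration of (1.2). This is only an illustration and not part of the original notes: the function names, the trapezoidal and Runge-Kutta schemes, and the test problem $a = 2$, $g(t) = e^{-t}$, $y_0 = 1$ are all assumptions made here. For this test problem (1.3) reduces to $y(t) = e^{-t}$.

```python
import math

def solve_by_formula(a, g, y0, t, n=2000):
    """Evaluate y(t) = e^{-at} (y0 + integral_0^t e^{as} g(s) ds) by the trapezoidal rule."""
    h = t / n
    total = 0.5 * (g(0.0) + math.exp(a * t) * g(t))
    for k in range(1, n):
        s = k * h
        total += math.exp(a * s) * g(s)
    return math.exp(-a * t) * (y0 + h * total)

def solve_by_rk4(a, g, y0, t, n=2000):
    """Integrate dy/dt = -a*y + g(t) directly with classical fourth-order Runge-Kutta."""
    h = t / n
    y, s = y0, 0.0

    def f(s, y):
        return -a * y + g(s)

    for _ in range(n):
        k1 = f(s, y)
        k2 = f(s + h / 2, y + h * k1 / 2)
        k3 = f(s + h / 2, y + h * k2 / 2)
        k4 = f(s + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return y

def forcing(s):
    """Illustrative forcing term g(s) = e^{-s}."""
    return math.exp(-s)

print(solve_by_formula(2.0, forcing, 1.0, 1.5))
print(solve_by_rk4(2.0, forcing, 1.0, 1.5))
print(math.exp(-1.5))  # closed form for this particular test problem
```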

The forward Kolmogorov differential equation (1.1) is of the form Eq. (1.2) with $a = bN$ and $g(t) = b(N-1)p_{N-1}$. With the population size known to be $N_0$ at $t = 0$, the initial conditions can be written succinctly as $p_N(0) = \delta_{N,N_0}$, where $\delta_{ij}$ is the Kronecker delta, defined as

$$\delta_{ij} = \begin{cases} 0 & \text{for } i \ne j, \\ 1 & \text{for } i = j. \end{cases}$$

Therefore, formal integration of (1.1) using Eq. (1.3) results in

$$p_N(t) = e^{-bNt}\left[\delta_{N,N_0} + b(N-1)\int_0^t e^{bNs}p_{N-1}(s)\,ds\right]. \tag{1.4}$$

The first few solutions of (1.4) can now be obtained by successive integrations:

$$p_N(t) = \begin{cases} 0 & \text{if } N < N_0, \\ e^{-bN_0t} & \text{if } N = N_0, \\ N_0e^{-bN_0t}\left[1 - e^{-bt}\right] & \text{if } N = N_0 + 1, \\ \tfrac{1}{2}N_0(N_0+1)e^{-bN_0t}\left[1 - e^{-bt}\right]^2 & \text{if } N = N_0 + 2, \\ \;\;\vdots & \end{cases}$$

Although we will not need it, for completeness here is the form of the complete solution. Defining the binomial coefficient as the number of ways one can select $k$ objects from a set of $n$ distinct objects, where the order of selection is immaterial,

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$

(read as "$n$ choose $k$"), the general solution for $p_N(t)$, $N \ge N_0$, is known to be

$$p_N(t) = \binom{N-1}{N_0-1}e^{-bN_0t}\left[1 - e^{-bt}\right]^{N-N_0},$$

which statisticians call a "shifted negative binomial distribution." Determining the time-evolution of the probability mass function of $N$ completely solves the stochastic problem.
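The closed-form solution can be checked against a direct numerical integration of the forward Kolmogorov equations (1.1). The Python sketch below is an illustration only: the function names, the truncation of the infinite system at a maximum size `n_max`, the forward Euler time stepping, and the parameter values are assumptions made here, not part of the original notes.

```python
import math

def pmf_exact(n, n0, b, t):
    """Shifted negative binomial p_N(t) for the pure birth process."""
    if n < n0:
        return 0.0
    return (math.comb(n - 1, n0 - 1) * math.exp(-b * n0 * t)
            * (1.0 - math.exp(-b * t)) ** (n - n0))

def pmf_euler(n0, b, t_end, n_max, dt=1e-3):
    """Forward Euler integration of dp_N/dt = b[(N-1)p_{N-1} - N p_N], N = 1..n_max."""
    p = [0.0] * (n_max + 1)
    p[n0] = 1.0  # initial condition p_N(0) = delta_{N,N0}
    for _ in range(round(t_end / dt)):
        new = p[:]
        for n in range(1, n_max + 1):
            new[n] = p[n] + dt * b * ((n - 1) * p[n - 1] - n * p[n])
        p = new
    return p

# Illustrative values: N0 = 3, b = 1, t = 0.5, system truncated at N = 40.
n0, b, t = 3, 1.0, 0.5
p_num = pmf_euler(n0, b, t, n_max=40)
for n in range(n0, n0 + 4):
    print(n, round(p_num[n], 4), round(pmf_exact(n, n0, b, t), 4))
```

The truncation is harmless as long as `n_max` is far above the mean population size at the final time, so that the neglected probabilities are negligible.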

Of usual main interest are the mean and variance of the population size, and although both could in principle be computed from the probability mass function, we will compute them directly from the differential equation for $p_N$. The definitions of the mean population size $\langle N\rangle$ and its variance $\sigma^2$ are

$$\langle N\rangle = \sum_{N=0}^{\infty} Np_N, \qquad \sigma^2 = \sum_{N=0}^{\infty}(N - \langle N\rangle)^2p_N, \tag{1.5}$$

and we will make use of the equality

$$\sigma^2 = \langle N^2\rangle - \langle N\rangle^2. \tag{1.6}$$

Multiplying the differential equation (1.1) by the constant $N$, summing over $N$, and using $p_N = 0$ for $N < N_0$, we obtain

$$\frac{d\langle N\rangle}{dt} = b\left[\sum_{N=N_0+1}^{\infty}N(N-1)p_{N-1} - \sum_{N=N_0}^{\infty}N^2p_N\right].$$

Now $N(N-1) = (N-1)^2 + (N-1)$, so that the first term on the right-hand side is

$$\begin{aligned}\sum_{N=N_0+1}^{\infty}N(N-1)p_{N-1} &= \sum_{N=N_0+1}^{\infty}(N-1)^2p_{N-1} + \sum_{N=N_0+1}^{\infty}(N-1)p_{N-1} \\ &= \sum_{N=N_0}^{\infty}N^2p_N + \sum_{N=N_0}^{\infty}Np_N,\end{aligned}$$

where the second equality was obtained by shifting the summation index downward by one. Therefore, we find the familiar equation

$$\frac{d\langle N\rangle}{dt} = b\langle N\rangle.$$

Together with the initial condition $\langle N\rangle(0) = N_0$, we have the solution $\langle N\rangle(t) = N_0\exp(bt)$. We proceed similarly to find $\sigma^2$ by first determining the differential equation for $\langle N^2\rangle$. Multiplying the differential equation for $p_N$, Eq. (1.1), by $N^2$ and summing over $N$ results in

$$\frac{d\langle N^2\rangle}{dt} = b\left[\sum_{N=N_0+1}^{\infty}N^2(N-1)p_{N-1} - \sum_{N=N_0}^{\infty}N^3p_N\right].$$

Here, we use the equality $N^2(N-1) = (N-1)^3 + 2(N-1)^2 + (N-1)$. Proceeding in the same way as above by shifting the index downward, we obtain

$$\frac{d\langle N^2\rangle}{dt} - 2b\langle N^2\rangle = b\langle N\rangle,$$

which is a first-order linear inhomogeneous equation for $\langle N^2\rangle$ since $\langle N\rangle$ is known, and can be solved using an integrating factor. The solution obtained using Eq. (1.3) is

$$\langle N^2\rangle = e^{2bt}\left(N_0^2 + \int_0^t e^{-2bs}\,b\langle N\rangle(s)\,ds\right),$$

with $\langle N\rangle(t) = N_0e^{bt}$. Performing the integration, we obtain

$$\langle N^2\rangle = e^{2bt}\left[N_0^2 + N_0(1 - e^{-bt})\right].$$

Finally, using $\sigma^2 = \langle N^2\rangle - \langle N\rangle^2$, we obtain the variance. Thus we collect our final results for the population mean and variance:

$$\langle N\rangle = N_0e^{bt}, \qquad \sigma^2 = N_0e^{2bt}(1 - e^{-bt}). \tag{1.7}$$
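The results (1.7) were derived from the differential equations, but they can also be recovered by summing the definitions (1.5) over the shifted negative binomial distribution. The Python sketch below does this with a truncated sum. The function name, the cutoff `n_max`, and the parameter values are assumptions made for illustration and are not part of the original notes.

```python
import math

def moments_from_pmf(n0, b, t, n_max=2000):
    """Mean and variance of N from truncated sums over the shifted negative binomial pmf."""
    q = 1.0 - math.exp(-b * t)
    mean = 0.0
    second = 0.0
    for n in range(n0, n_max + 1):
        p = math.comb(n - 1, n0 - 1) * math.exp(-b * n0 * t) * q ** (n - n0)
        mean += n * p
        second += n * n * p
    return mean, second - mean ** 2  # variance via sigma^2 = <N^2> - <N>^2

# Illustrative values: N0 = 5, b = 1, t = 1.
n0, b, t = 5, 1.0, 1.0
mean, var = moments_from_pmf(n0, b, t)
print(mean, n0 * math.exp(b * t))
print(var, n0 * math.exp(2 * b * t) * (1.0 - math.exp(-b * t)))
```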

1.3 Asymptotics of large initial populations

Here, we determine the limiting form of the probability distribution when the initial population size is large. Our goal is to find the leading-order terms of an expansion of the distribution in powers of $1/N_0$; notice that $1/N_0$ is small if $N_0$ is large. To zeroth order, that is in the limit $N_0 \to \infty$, we show that the deterministic model of population growth is recovered. To first order in $1/N_0$, we show that the probability distribution is Normal. The latter result will be shown to be a consequence of the well-known Central Limit Theorem in probability theory.

We develop our expansion by working directly with the differential equation for $p_N(t)$. Now, when the population size $N$ is a discrete random variable (taking only nonnegative integer values $0, 1, 2, \ldots$), $p_N(t)$ is the probability mass function for $N$. If $N_0$ is large, then the discrete nature of $N$ is inconsequential, and it is preferable to work with a continuous random variable and its probability density function. Accordingly, we define the random variable $x = N/N_0$, and treat $x$ as a continuous random variable, with $0 \le x < \infty$. Now $p_N(t)$ is the probability that the population is of size $N$ at time $t$, and the probability density function of $x$, $P(x,t)$, is defined such that $\int_a^b P(x,t)\,dx$ is the probability that $a \le x \le b$. The relationship between $p$ and $P$ can be determined by considering how to approximate a discrete probability distribution by a continuous distribution, that is by defining $P$ such that

$$\begin{aligned}p_N(t) &= \int_{(N-\frac{1}{2})/N_0}^{(N+\frac{1}{2})/N_0}P(x,t)\,dx \\ &= P(N/N_0, t)/N_0,\end{aligned}$$

where the last equality becomes exact as $N_0 \to \infty$. Therefore, the appropriate definition for $P(x,t)$ is given by

$$P(x,t) = N_0p_N(t), \quad x = N/N_0, \tag{1.8}$$

which satisfies

$$\int_0^{\infty}P(x,t)\,dx = \sum_{N=0}^{\infty}P(N/N_0,t)(1/N_0) = \sum_{N=0}^{\infty}p_N(t) = 1,$$

the first equality (exact only when $N_0 \to \infty$) being a Riemann sum approximation to the integral.

We now transform the infinite set of ordinary differential equations (1.1) for $p_N(t)$ into a single partial differential equation for $P(x,t)$. We multiply (1.1) by $N_0$, and substitute $N = N_0x$, $p_N(t) = P(x,t)/N_0$, and $p_{N-1}(t) = P(x - \frac{1}{N_0}, t)/N_0$ to obtain

$$\frac{\partial P(x,t)}{\partial t} = b\left[(N_0x - 1)P\!\left(x - \tfrac{1}{N_0}, t\right) - N_0xP(x,t)\right]. \tag{1.9}$$

We next Taylor series expand $P(x - 1/N_0, t)$ around $x$, treating $1/N_0$ as a small parameter. That is, we make use of

$$\begin{aligned}P\!\left(x - \tfrac{1}{N_0}, t\right) &= P(x,t) - \frac{1}{N_0}P_x(x,t) + \frac{1}{2N_0^2}P_{xx}(x,t) - \ldots \\ &= \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,N_0^n}\frac{\partial^nP}{\partial x^n}.\end{aligned}$$

The two leading terms proportional to $N_0$ on the right-hand side of (1.9) cancel exactly, and if we group the remaining terms in powers of $1/N_0$, we obtain for the first three leading terms in the expansion

$$\begin{aligned}P_t &= -b\left[(xP_x + P) - \frac{1}{N_0}\left(\frac{xP_{xx}}{2!} + \frac{P_x}{1!}\right) + \frac{1}{N_0^2}\left(\frac{xP_{xxx}}{3!} + \frac{P_{xx}}{2!}\right) - \ldots\right] \\ &= -b\left[(xP)_x - \frac{1}{N_0\,2!}(xP)_{xx} + \frac{1}{N_0^2\,3!}(xP)_{xxx} - \ldots\right];\end{aligned} \tag{1.10}$$

and higher-order terms can be obtained by following the evident pattern.

Equation (1.10) may be further analyzed by a perturbation expansion of the probability density function in powers of $1/N_0$:

$$P(x,t) = P^{(0)}(x,t) + \frac{1}{N_0}P^{(1)}(x,t) + \frac{1}{N_0^2}P^{(2)}(x,t) + \ldots. \tag{1.11}$$

Here, the unknown functions $P^{(0)}(x,t)$, $P^{(1)}(x,t)$, $P^{(2)}(x,t)$, etc. are to be determined by substituting the expansion (1.11) into (1.10) and equating coefficients of powers of $1/N_0$. We thus obtain for the coefficients of $(1/N_0)^0$ and $(1/N_0)^1$,

$$P^{(0)}_t = -b\left(xP^{(0)}\right)_x, \tag{1.12}$$

$$P^{(1)}_t = -b\left[\left(xP^{(1)}\right)_x - \frac{1}{2}\left(xP^{(0)}\right)_{xx}\right]. \tag{1.13}$$


1.3.1 Derivation of deterministic model

The zeroth-order term in the perturbation expansion (1.11),

$$P^{(0)}(x,t) = \lim_{N_0\to\infty}P(x,t),$$

satisfies (1.12). Equation (1.12) is a first-order linear partial differential equation with a variable coefficient. One way to solve this equation is to try the ansatz

$$P^{(0)}(x,t) = h(t)f(r), \quad r = r(x,t), \tag{1.14}$$

together with the initial condition

$$P^{(0)}(x,0) = f(x),$$

since at $t = 0$, the probability distribution is assumed to be a known function. This initial condition imposes further useful constraints on the functions $h(t)$ and $r(x,t)$:

$$h(0) = 1, \quad r(x,0) = x.$$

The partial derivatives of Eq. (1.14) are

$$P^{(0)}_t = h'f + hr_tf', \quad P^{(0)}_x = hr_xf',$$

which after substitution into Eq. (1.12) results in

$$(h' + bh)f + (r_t + bxr_x)hf' = 0.$$

This equation can be satisfied for any $f$ provided that $h(t)$ and $r(x,t)$ satisfy

$$h' + bh = 0, \quad r_t + bxr_x = 0. \tag{1.15}$$

The first equation for $h(t)$, together with the initial condition $h(0) = 1$, is easily solved to yield

$$h(t) = e^{-bt}. \tag{1.16}$$

To determine a solution for $r(x,t)$, we try the technique of separation of variables. We write $r(x,t) = X(x)T(t)$, and upon substitution into the differential equation for $r(x,t)$, we obtain

$$XT' + bxX'T = 0;$$

and division by $XT$ and separation yields

$$\frac{T'}{T} = -bx\frac{X'}{X}. \tag{1.17}$$

Since the left-hand side is independent of $x$, and the right-hand side is independent of $t$, both sides must be constant, independent of both $x$ and $t$. Now, our initial condition is $r(x,0) = x$, so that $X(x)T(0) = x$. Without loss of generality, we can take $T(0) = 1$, so that $X(x) = x$. The right-hand side of (1.17) is therefore equal to the constant $-b$, and we obtain the differential equation and initial condition

$$T' + bT = 0, \quad T(0) = 1,$$

which we can solve to yield $T(t) = e^{-bt}$. Therefore, our solution for $r(x,t)$ is

$$r(x,t) = xe^{-bt}. \tag{1.18}$$

Putting together our solutions (1.16) and (1.18) into our ansatz (1.14), we have obtained the general solution to the PDE:

$$P^{(0)}(x,t) = e^{-bt}f(xe^{-bt}).$$

To determine $f$, we apply the probability mass function initial condition $p_N(0) = \delta_{N,N_0}$. From Eq. (1.8), the corresponding initial condition on the probability distribution function is

$$P(x,0) = \begin{cases} N_0 & \text{if } 1 - \frac{1}{2N_0} \le x \le 1 + \frac{1}{2N_0}, \\ 0 & \text{otherwise.} \end{cases}$$

In the limit $N_0 \to \infty$, $P(x,0) \to P^{(0)}(x,0) = \delta(x-1)$, where $\delta(x-1)$ is the Dirac delta-function, centered around 1. The delta-function is widely used in quantum physics and was introduced by Dirac for that purpose. It now finds many uses in applied mathematics. It can be defined by requiring that for any function $g(x)$,

$$\int_{-\infty}^{+\infty}g(x)\delta(x)\,dx = g(0).$$

The usual view of the delta-function $\delta(x-a)$ is that it is zero everywhere except at $x = a$, at which it is infinite, and its integral is one. It is not really a function, but is what mathematicians call a distribution.

Now, since $P^{(0)}(x,0) = f(x) = \delta(x-1)$, our solution becomes

$$P^{(0)}(x,t) = e^{-bt}\delta(xe^{-bt} - 1). \tag{1.19}$$

This can be rewritten by noting that (letting $y = ax - c$),

$$\int_{-\infty}^{+\infty}g(x)\delta(ax-c)\,dx = \frac{1}{a}\int_{-\infty}^{+\infty}g((y+c)/a)\delta(y)\,dy = \frac{1}{a}g(c/a),$$

yielding the identity

$$\delta(ax - c) = \frac{1}{a}\delta\!\left(x - \frac{c}{a}\right).$$

From this, we can rewrite our solution (1.19) in the more intuitive form

$$P^{(0)}(x,t) = \delta(x - e^{bt}). \tag{1.20}$$

Using (1.20), the zeroth-order expected value of $x$ is

$$\begin{aligned}\langle x_0\rangle &= \int_0^{\infty}xP^{(0)}(x,t)\,dx \\ &= \int_0^{\infty}x\,\delta(x - e^{bt})\,dx \\ &= e^{bt};\end{aligned}$$

while the zeroth-order variance is

$$\begin{aligned}\sigma^2_{x_0} &= \langle x_0^2\rangle - \langle x_0\rangle^2 \\ &= \int_0^{\infty}x^2P^{(0)}(x,t)\,dx - e^{2bt} \\ &= \int_0^{\infty}x^2\delta(x - e^{bt})\,dx - e^{2bt} \\ &= e^{2bt} - e^{2bt} \\ &= 0.\end{aligned}$$

Thus in the infinite population limit, the random variable $x$ has zero variance, and is therefore no longer random, but follows $x = e^{bt}$ deterministically. We say that the probability distribution of $x$ becomes sharp in the limit of large population sizes. The general principle of modeling large populations deterministically can simplify mathematical models when stochastic effects are unimportant.

1.3.2 Derivation of Normal probability distribution

We now consider the first-order term in the perturbation expansion (1.11), which satisfies (1.13). We do not know how to solve (1.13) directly, so we will attempt to find a solution following a more circuitous route. First, we proceed by computing the moments of the probability distribution. We have

$$\begin{aligned}\langle x^n\rangle &= \int_0^{\infty}x^nP(x,t)\,dx \\ &= \int_0^{\infty}x^nP^{(0)}(x,t)\,dx + \frac{1}{N_0}\int_0^{\infty}x^nP^{(1)}(x,t)\,dx + \ldots \\ &= \langle x_0^n\rangle + \frac{1}{N_0}\langle x_1^n\rangle + \ldots,\end{aligned}$$

where the last equality defines $\langle x_0^n\rangle$, etc. Now, using (1.20),

$$\begin{aligned}\langle x_0^n\rangle &= \int_0^{\infty}x^nP^{(0)}(x,t)\,dx \\ &= \int_0^{\infty}x^n\delta(x - e^{bt})\,dx \\ &= e^{nbt}.\end{aligned} \tag{1.21}$$

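To determine $\langle x_1^n\rangle$ we need to start with the PDE (1.13). Multiplying by $x^n$ and integrating, we have

$$\frac{d\langle x_1^n\rangle}{dt} = -b\left[\int_0^{\infty}x^n\left(xP^{(1)}\right)_x dx - \frac{1}{2}\int_0^{\infty}x^n\left(xP^{(0)}\right)_{xx}dx\right]. \tag{1.22}$$

We integrate by parts to remove the derivatives of $xP$, assuming that $xP$ and all its derivatives vanish on the boundaries of integration ($x$ equal to zero and infinity). For example

$$\begin{aligned}\int_0^{\infty}x^n\left(xP^{(1)}\right)_x dx &= -\int_0^{\infty}nx^nP^{(1)}\,dx \\ &= -n\langle x_1^n\rangle,\end{aligned}$$

and

$$\begin{aligned}\frac{1}{2}\int_0^{\infty}x^n\left(xP^{(0)}\right)_{xx}dx &= -\frac{n}{2}\int_0^{\infty}x^{n-1}\left(xP^{(0)}\right)_x dx \\ &= \frac{n(n-1)}{2}\int_0^{\infty}x^{n-1}P^{(0)}\,dx \\ &= \frac{n(n-1)}{2}\langle x_0^{n-1}\rangle.\end{aligned}$$

Therefore, after integration by parts, the two minus signs in (1.22) combine and (1.22) becomes

$$\frac{d\langle x_1^n\rangle}{dt} = b\left[n\langle x_1^n\rangle + \frac{n(n-1)}{2}\langle x_0^{n-1}\rangle\right]. \tag{1.23}$$

Equation (1.23) is a first-order linear inhomogeneous differential equation and can be solved using an integrating factor (see (1.3) and the preceding discussion). Solving this differential equation using (1.21) and initial condition $\langle x_1^n\rangle(0) = 0$, we obtain

$$\langle x_1^n\rangle = \frac{n(n-1)}{2}e^{nbt}\left(1 - e^{-bt}\right). \tag{1.24}$$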
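A quick numerical check can confirm that the closed form (1.24) is consistent with the moment equation. Substituting the two integrations by parts into (1.22), and using $\langle x_0^{n-1}\rangle = e^{(n-1)bt}$ from (1.21), gives $d\langle x_1^n\rangle/dt = b\left[n\langle x_1^n\rangle + \tfrac{1}{2}n(n-1)e^{(n-1)bt}\right]$. The Python sketch below compares a centered finite difference of (1.24) with this right-hand side. The function names, step size, and parameter values are assumptions made for illustration only.

```python
import math

def x1_moment(n, b, t):
    """Closed form (1.24): <x_1^n> = n(n-1)/2 * e^{nbt} * (1 - e^{-bt})."""
    return 0.5 * n * (n - 1) * math.exp(n * b * t) * (1.0 - math.exp(-b * t))

def moment_rhs(n, b, t):
    """b [ n <x_1^n> + n(n-1)/2 <x_0^{n-1}> ], with <x_0^{n-1}> = e^{(n-1)bt}."""
    return b * (n * x1_moment(n, b, t)
                + 0.5 * n * (n - 1) * math.exp((n - 1) * b * t))

def centered_derivative(n, b, t, h=1e-6):
    """Centered finite-difference approximation to d<x_1^n>/dt."""
    return (x1_moment(n, b, t + h) - x1_moment(n, b, t - h)) / (2.0 * h)

# Illustrative values: b = 0.7, t = 1.2, moments n = 1..4.
for n in range(1, 5):
    print(n, centered_derivative(n, 0.7, 1.2), moment_rhs(n, 0.7, 1.2))
```

The same check fails if the overall sign of the right-hand side is reversed, so it also guards against sign slips in the integration by parts.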

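The probability distribution function, accurate to order $1/N_0$, may be obtained by making use of the so-called moment generating function $\Psi(s)$, defined as

$$\begin{aligned}\Psi(s) &= \langle e^{sx}\rangle \\ &= 1 + s\langle x\rangle + \frac{s^2}{2!}\langle x^2\rangle + \frac{s^3}{3!}\langle x^3\rangle + \ldots \\ &= \sum_{n=0}^{\infty}\frac{s^n\langle x^n\rangle}{n!}.\end{aligned}$$

To order $1/N_0$, we have

$$\Psi(s) = \sum_{n=0}^{\infty}\frac{s^n\langle x_0^n\rangle}{n!} + \frac{1}{N_0}\sum_{n=0}^{\infty}\frac{s^n\langle x_1^n\rangle}{n!} + O(1/N_0^2). \tag{1.25}$$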

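Now, using (1.21),

$$\sum_{n=0}^{\infty}\frac{s^n\langle x_0^n\rangle}{n!} = \sum_{n=0}^{\infty}\frac{\left(se^{bt}\right)^n}{n!} = e^{se^{bt}},$$

and using (1.24),

$$\begin{aligned}\sum_{n=0}^{\infty}\frac{s^n\langle x_1^n\rangle}{n!} &= \frac{1}{2}\left(1 - e^{-bt}\right)\sum_{n=0}^{\infty}\frac{n(n-1)}{n!}\left(se^{bt}\right)^n \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2e^{2bt}\sum_{n=0}^{\infty}\frac{1}{n!}n(n-1)\left(se^{bt}\right)^{n-2} \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2\sum_{n=0}^{\infty}\frac{1}{n!}\frac{\partial^2}{\partial s^2}\left(se^{bt}\right)^n \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2\frac{\partial^2}{\partial s^2}\sum_{n=0}^{\infty}\frac{\left(se^{bt}\right)^n}{n!} \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2\frac{\partial^2}{\partial s^2}\left(e^{se^{bt}}\right) \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2e^{2bt}e^{se^{bt}}.\end{aligned}$$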

Therefore,

$$\Psi(s) = e^{se^{bt}}\left(1 + \frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt} + \ldots\right). \tag{1.26}$$

We can recognize the parenthetical term of (1.26) as a Taylor-series expansion of an exponential function truncated at first order, i.e.,

$$\exp\left(\frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt}\right) = 1 + \frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt} + O(1/N_0^2).$$

Therefore, to first order in $1/N_0$, we have

$$\Psi(s) = \exp\left(se^{bt} + \frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt}\right) + O(1/N_0^2). \tag{1.27}$$

Standard books on probability theory (e.g., A First Course in Probability by Sheldon Ross, pg. 365) detail the derivation of the moment generating function of a normal random variable:

$$\Psi(s) = \exp\left(\langle x\rangle s + \frac{1}{2}\sigma^2s^2\right), \quad \text{for a normal random variable}; \tag{1.28}$$

and comparing (1.28) with (1.27) shows us that the probability distribution $P(x,t)$ to first order in $1/N_0$ is normal with mean and variance given by

$$\langle x\rangle = e^{bt}, \qquad \sigma_x^2 = \frac{1}{N_0}e^{2bt}\left(1 - e^{-bt}\right). \tag{1.29}$$

The mean and variance of $x = N/N_0$ are equivalent to those derived for $N$ in (1.7), but now we learn that $N$ is approximately normal for large populations.

The appearance of a normal probability distribution (also called a Gaussian probability distribution) at first order is in fact a particular case of the Central Limit Theorem, one of the most important and useful theorems in probability and statistics. We state here a simple version of this theorem without proof:

Central Limit Theorem: Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid) random variables with mean $\langle X\rangle$ and variance $\sigma_X^2$. Then for sufficiently large $n$, the probability distribution of the average of the $X_i$'s, denoted as the random variable $Z = \frac{1}{n}\sum_{i=1}^{n}X_i$, is well-approximated by a Gaussian with mean $\langle X\rangle$ and variance $\sigma^2 = \sigma_X^2/n$.

The central limit theorem can be applied directly to our problem. Consider that our population consists of $N_0$ founders. If $m_i(t)$ denotes the number of individuals descended from founder $i$ at time $t$ (including the still living founder), then the total number of individuals at time $t$ is $N(t) = \sum_{i=1}^{N_0}m_i(t)$; and the average number of descendants of a single founder is $x(t) = N(t)/N_0$. If the mean number of descendants of a single founder is $\langle m\rangle$, with variance $\sigma_m^2$, then applying the central limit theorem for large $N_0$, the probability distribution function of $x$ is well-approximated by a Gaussian with mean $\langle x\rangle = \langle m\rangle$ and variance $\sigma_x^2 = \sigma_m^2/N_0$. Comparing with our results (1.29), we find $\langle m\rangle = e^{bt}$ and $\sigma_m^2 = e^{2bt}(1 - e^{-bt})$.
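The normal approximation can be compared directly with the exact shifted negative binomial distribution of Section 1.2. In terms of $N = N_0x$, the approximation is a Gaussian with mean $N_0e^{bt}$ and variance $N_0e^{2bt}(1 - e^{-bt})$. The Python sketch below is an illustration only: the function names and parameter values are assumptions, and the log-gamma function is used for the binomial coefficient simply to avoid overflow at large $N_0$.

```python
import math

def log_pmf(n, n0, b, t):
    """log p_N(t) for the shifted negative binomial, using lgamma for the binomial coefficient."""
    log_binom = math.lgamma(n) - math.lgamma(n0) - math.lgamma(n - n0 + 1)
    return log_binom - b * n0 * t + (n - n0) * math.log(1.0 - math.exp(-b * t))

def normal_density(n, mean, var):
    """Gaussian density with the given mean and variance, evaluated at n."""
    return math.exp(-(n - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Illustrative values: N0 = 400 founders, b = 1, t = 1.
n0, b, t = 400, 1.0, 1.0
mean = n0 * math.exp(b * t)
var = n0 * math.exp(2 * b * t) * (1.0 - math.exp(-b * t))

for n in (round(mean) - 40, round(mean), round(mean) + 40):
    print(n, math.exp(log_pmf(n, n0, b, t)), normal_density(n, mean, var))
```

Repeating the comparison with a small value such as $N_0 = 5$ shows a visibly skewed exact distribution, which is expected because the Gaussian form holds only to first order in $1/N_0$.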

1.4 Simulation of population growth

As we have seen, stochastic modeling is significantly more complicated than deterministic modeling. As the modeling becomes more detailed, it may become necessary to solve a stochastic model numerically. Here, for illustration, we show how to simulate individual realizations of population growth.

A straightforward approach would make use of the birth rate $b$ directly. In a time $\Delta t$, each individual of a population has probability $b\,\Delta t$ of giving birth, accurate to order $\Delta t$. If we have $N$ individuals at time $t$, we can compute the number of individuals at time $t + \Delta t$ by computing $N$ random deviates (random numbers between zero and unity) and taking the number of new births to be the number of random deviates less than $b\,\Delta t$. In principle, this approach will be accurate provided $\Delta t$ is sufficiently small.

There is, however, a more efficient way to simulate population growth. We first determine the probability density function of the interevent time $\tau$, defined as the time required for the population to grow from size $N$ to size $N + 1$ because of a single birth. A simulation from population size $N_0$ to size $N_f$ would then simply require computing $N_f - N_0$ different random values of $\tau$, a relatively easy and quick computation.

Let $P(\tau)\,d\tau$ be the probability that the interevent time for a population of size $N$ lies in the interval $(\tau, \tau + d\tau)$, $F(\tau) = \int_0^{\tau}P(\tau')\,d\tau'$ the probability that the interevent time is less than $\tau$, and $G(\tau) = 1 - F(\tau)$ the probability that the interevent time is greater than $\tau$. $P(\tau)$ is called the probability density function of $\tau$, and $F(\tau)$ the cumulative distribution function of $\tau$. They satisfy the relation $P(\tau) = F'(\tau)$. Since the probability that the interevent time is greater than $\tau + \Delta\tau$ is given by the probability that it is greater than $\tau$ times the probability that no new births occur in the time interval $(\tau, \tau + \Delta\tau)$, the distribution $G(\tau + \Delta\tau)$ is given for small $\Delta\tau$ by

$$G(\tau + \Delta\tau) = G(\tau)(1 - bN\Delta\tau).$$

Differencing $G$ and taking the limit $\Delta\tau \to 0$ yields the differential equation

$$\frac{dG}{d\tau} = -bNG,$$

which can be integrated using initial condition $G(0) = 1$ to further yield

$$G(\tau) = e^{-bN\tau}.$$

Therefore,

$$F(\tau) = 1 - e^{-bN\tau}, \qquad P(\tau) = bNe^{-bN\tau}; \tag{1.30}$$

$P(\tau)$ has the form of an exponential distribution.

[Figure: plot of $y = F(\tau) = 1 - e^{-bN\tau}$ versus $\tau$, with $y$ ranging from 0 to 1 and the points $y_1 = F(\tau_1)$ and $y_2 = F(\tau_2)$ marked.]

Figure 1.1: How to generate a random number with a given probability density function.

We now compute random values for $\tau$ using its known cumulative distribution function $F(\tau)$. We do this in reference to Figure 1.1, where we plot $y = F(\tau)$ versus $\tau$. The range $y$ of $F(\tau)$ is $[0, 1)$, and for each $y$, we can compute $\tau$ by inverting $y(\tau) = 1 - \exp(-bN\tau)$, that is

$$\tau(y) = -\frac{\ln(1 - y)}{bN}.$$

Our claim is that with $y$ a random number sampled uniformly on $(0, 1)$, the random numbers $\tau(y)$ have probability density function $P(\tau)$.
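The claim can be tested empirically before it is proved. The Python sketch below is an illustration only: the function name, the fixed random seed, the sample size, and the parameter values are assumptions made here. It draws many values of $\tau(y)$ and compares the sample mean with $1/(bN)$, the mean of an exponential distribution with rate $bN$, and the fraction of samples below a threshold $\tau^*$ with $F(\tau^*) = 1 - e^{-bN\tau^*}$.

```python
import math
import random

def sample_interevent_time(b, n, rng):
    """Inverse-transform sample: tau = -ln(1 - y) / (b n), with y uniform on [0, 1)."""
    y = rng.random()
    return -math.log(1.0 - y) / (b * n)

rng = random.Random(0)   # fixed seed so that the run is reproducible
b, n = 1.0, 5            # illustrative birth rate and population size
samples = [sample_interevent_time(b, n, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
tau_star = 0.2
fraction_below = sum(1 for s in samples if s < tau_star) / len(samples)

print(sample_mean, 1.0 / (b * n))
print(fraction_below, 1.0 - math.exp(-b * n * tau_star))
```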

Now, the fraction of random numbers $y$ in the interval $(y_1, y_2)$ is equal to the corresponding fraction of random numbers $\tau$ in the interval $(\tau_1, \tau_2)$, where $\tau_1 = \tau(y_1)$ and $\tau_2 = \tau(y_2)$; and for $y$ sampled uniformly on $(0, 1)$ this fraction is equal to $y_2 - y_1$. We thus have the following string of equalities: (the fraction of random numbers in $(y_1, y_2)$) $=$ (the fraction of random numbers in $(\tau_1, \tau_2)$) $= y_2 - y_1 = F(\tau_2) - F(\tau_1) = \int_{\tau_1}^{\tau_2}P(\tau)\,d\tau$. Therefore, (the fraction of random numbers in $(\tau_1, \tau_2)$) $= \int_{\tau_1}^{\tau_2}P(\tau)\,d\tau$. This is exactly the definition of $\tau$ having probability density function $P(\tau)$.

Below, we illustrate a simple MATLAB function which simulates one realization of population growth from initial size $N_0$ to final size $N_f$, with growth rate $b$.

    function [t, N] = population_growth_simulation(b,N0,Nf)
    % simulates population growth from N0 to Nf with growth rate b
    N=N0:Nf; t=0*N;
    y=rand(1,Nf-N0);
    tau=-log(1-y)./(b*N(1:Nf-N0)); % interevent times
    t=[0 cumsum(tau)]; % cumulative sum of interevent times

The function population_growth_simulation.m can be driven by a MATLAB script to compute realizations of population growth. For instance, the following script computes 100 realizations of population growth from 10 to 100 with $b = 1$ and plots all the realizations:

    % calculate nreal realizations and plot
    b=1; N0=10; Nf=100;
    nreal=100;
    for i=1:nreal
        [t,N]=population_growth_simulation(b,N0,Nf);
        plot(t,N); hold on;
    end
    xlabel('t'); ylabel('N');
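For readers without MATLAB, the same interevent-time algorithm can be sketched in Python using only the standard library. This is an assumed port written for illustration, not code from the original notes; the function keeps the MATLAB name, but the optional `rng` argument and the omission of plotting are choices made here.

```python
import math
import random

def population_growth_simulation(b, n0, nf, rng=random):
    """Simulate one realization of pure-birth growth from size n0 to size nf.

    Returns (times, sizes), where times[k] is the time at which the
    population first reaches size sizes[k] = n0 + k.
    """
    sizes = list(range(n0, nf + 1))
    times = [0.0]
    for n in sizes[:-1]:
        y = rng.random()                        # uniform on [0, 1)
        tau = -math.log(1.0 - y) / (b * n)      # interevent time at size n
        times.append(times[-1] + tau)           # cumulative sum of interevent times
    return times, sizes

# Five realizations from N0 = 10 to Nf = 100 with b = 1 (illustrative values).
rng = random.Random(1)
for _ in range(5):
    t, N = population_growth_simulation(1.0, 10, 100, rng)
    print(f"time to reach N = {N[-1]}: {t[-1]:.3f}")
```

Since the interevent time at size $N$ has mean $1/(bN)$, the average time to grow from $N_0$ to $N_f$ is $\sum_{N=N_0}^{N_f-1}1/(bN)$, which gives a simple check on the output.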

In Figure 1.2 are three graphs, showing 25 realizations of population growth starting with population sizes 10, 100, and 1000, and ending with population sizes a factor of 10 larger. Observe that the variance, relative to the initial population size, decreases as the initial population size increases, following our analytical result (1.29).

[Figure: three panels plotting $N$ versus $t$ for $0 \le t \le 3$, with $N$ axes running up to 100, 1000, and 10000, respectively.]

Figure 1.2: Twenty-five realizations of population growth with initial population sizes 10, 100, and 1000, respectively.

Chapter 2

Age-structured populations

Determining the age-structure of a population helps governments plan economic development. Age-structure theory can also help evolutionary biologists better understand a species's life-history. An age-structured population occurs because offspring are born to mothers at different ages. If average per capita birth and death rates at different ages are constant, then a stable age-structure arises. However, a rapid change in birth or death rates can cause the age-structure to shift distributions. In this chapter, we develop the theory of age-structured populations. We will develop both discrete- and continuous-time models. We also present two interesting applications: (1) modeling age-structure changes in China and other countries as these populations age; and (2) modeling the life cycle of a hermaphroditic worm. We begin this chapter, however, with what is one of the oldest problems in mathematical biology: Fibonacci's rabbits. This will lead us to a brief digression about the golden mean, rational approximations and flower development, before returning to our main topic.

2.1 Fibonacci’s rabbits

In 1202, Fibonacci introduced the following puzzle, which we paraphrase here:

A certain man put a pair of newly born rabbits in a place surrounded on all sides by a wall. How many pairs of rabbits can be produced by that pair in a year if it is supposed that every month each pair begets a new pair, which from the second month on becomes productive?

To answer Fibonacci’s question, we first need to decide how to count rabbits. Let $a_n$ be the number of rabbit pairs at the start of the $n$th month, censused before the birth of any new rabbits. There are twelve months in a year, so the number of rabbit pairs at the start of the 13th month minus the initial pair of rabbits will be the solution to Fibonacci’s puzzle. Now $a_1 = 1$ since this is the initial newborn rabbit pair at the start of the first month; $a_2 = 1$ since this is still the initial rabbit pair before they have reproduced; and $a_3 = a_2 + a_1$ since $a_2$ is the number of rabbit pairs present in the previous month (here equal to one), and $a_1$ is the number of new rabbit pairs born to all rabbit pairs that are at least one month old (here, also equal to one). In general

$$a_{n+1} = a_n + a_{n-1}. \tag{2.1}$$


For only 12 months we can simply count using (2.1) and $a_1 = a_2 = 1$, so the Fibonacci numbers are

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, . . .

where $a_{13} = 233$, so that the solution to Fibonacci’s puzzle is 232 rabbit pairs.

Let us solve Eq. (2.1) for all the $a_n$’s. This equation is a second-order linear difference equation, and to solve it we can look for a solution of the form $a_n = \lambda^n$. Substitution into Eq. (2.1) yields

$$\lambda^{n+1} = \lambda^n + \lambda^{n-1},$$

or after division by $\lambda^{n-1}$:

$$\lambda^2 - \lambda - 1 = 0,$$

with solution

$$\lambda_\pm = \frac{1 \pm \sqrt{5}}{2}.$$

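Define

$$\Phi = \frac{1 + \sqrt{5}}{2} = 1.61803\ldots,$$

and

$$\phi = \frac{\sqrt{5} - 1}{2} = \Phi - 1 = 0.61803\ldots.$$

Then $\lambda_+ = \Phi$ and $\lambda_- = -\phi$. Also, notice that since $\Phi^2 - \Phi - 1 = 0$, division by $\Phi$ yields $1/\Phi = \Phi - 1$, so that

$$\phi = \frac{1}{\Phi}.$$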

As in the solution of linear homogeneous differential equations, the two values of $\lambda$ can be used to construct a general solution to the linear difference equation using the principle of linear superposition:

$$a_n = c_1 \Phi^n + c_2 (-1)^n \phi^n.$$

Extending the Fibonacci sequence to $a_0 = 0$ (since $a_0 = a_2 - a_1$), we satisfy the conditions $a_0 = 0$ and $a_1 = 1$:

$$c_1 + c_2 = 0, \qquad c_1 \Phi - c_2 \phi = 1.$$

Therefore, $c_2 = -c_1$, and $c_1(\Phi + \phi) = 1$. Since $\Phi + \phi = \sqrt{5}$, we have $c_1 = 1/\sqrt{5}$ and $c_2 = -1/\sqrt{5}$. We can rewrite the solution as

$$a_n = \frac{1}{\sqrt{5}} \Phi^n \left[ 1 + (-1)^{n+1} \Phi^{-2n} \right]. \tag{2.2}$$

In this form, we see that as $n \to \infty$, $a_n \to \Phi^n/\sqrt{5}$, and $a_{n+1}/a_n \to \Phi$.
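As a quick numerical check, the following MATLAB sketch compares the recursion (2.1) with the closed form (2.2) for the first 13 months. The variable names are our own choices for illustration; nothing here is needed for the theory.

```matlab
% Compare the recursion (2.1) with the closed form (2.2).
nmax = 13;                      % months needed for Fibonacci's puzzle
a = zeros(1, nmax);
a(1) = 1; a(2) = 1;             % a_1 = a_2 = 1
for n = 2:nmax-1
    a(n+1) = a(n) + a(n-1);     % Eq. (2.1)
end

Phi = (1 + sqrt(5))/2;          % golden ratio
n = 1:nmax;
a_closed = (Phi.^n/sqrt(5)) .* (1 + (-1).^(n+1) .* Phi.^(-2*n));   % Eq. (2.2)

max_diff = max(abs(a - a_closed));   % difference is at rounding level
pairs_produced = a(13) - 1;          % pairs produced by the initial pair
ratio = a(13)/a(12);                 % already close to Phi
fprintf('a_13 = %d, produced = %d, max diff = %.2e, ratio = %.6f\n', ...
        a(13), pairs_produced, max_diff, ratio);
```

The two computations should agree to rounding error, and the ratio of the last two terms is already close to $\Phi$.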


2.1.1 The golden ratio Φ

The number $\Phi$ is known as the golden ratio. Two positive numbers are said to be in the golden ratio if the ratio between the sum of those numbers and the larger one is the same as the ratio between the larger one and the smaller. An algebraic solution for this ratio yields $\Phi$. In a well-defined sense, $\Phi$ can also be called the most irrational of the irrational numbers.

To understand why $\Phi$ has this distinction, we first need to understand an algorithm, the continued fraction expansion, for approximating irrational numbers by rational numbers. Let the $n_i$ be positive integers. To construct a continued fraction expansion of the positive irrational number $x$, we choose the largest possible $n_i$’s which satisfy the following inequalities:

$$\begin{aligned}
x &> c_1 = n_1, \\
x &< c_2 = n_1 + \frac{1}{n_2}, \\
x &> c_3 = n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3}}, \\
x &< c_4 = n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cfrac{1}{n_4}}}, \quad \text{etc.}
\end{aligned}$$

There is a simple algorithm for calculating the $n_i$’s. First, $c_1 = n_1$ is the integer part of $x$. One computes the remainder, $x - c_1$, and takes its reciprocal $1/(x - c_1)$. The integer part of this reciprocal is $n_2$. One then takes the remainder $1/(x - c_1) - n_2$, and its reciprocal. The integer part of this reciprocal is $n_3$. We follow this algorithm to find successively better rational approximations to $\pi = 3.141592654\ldots$:

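$$\begin{array}{lll}
c\text{’s} & \text{remainder} & 1/\text{remainder} \\
c_1 = 3 & 0.141592654 & 7.06251329 \\
c_2 = 3 + \cfrac{1}{7} & 0.06251329 & 15.99659848 \\
c_3 = 3 + \cfrac{1}{7 + \cfrac{1}{15}} & 0.99659848 & 1.00341313 \\
c_4 = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1}}} & &
\end{array}$$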

Follow how this works. The first rational approximation to $\pi$ is $c_1 = 3$, which is less than $\pi$. The remainder is 0.141592654. The remainder’s reciprocal is 7.06251329. The integer part of this reciprocal is 7, so our next rational approximation to $\pi$ is $c_2 = 3 + 1/7$, which is larger than $\pi$ since 1/7 is greater than 0.141592654. The next remainder is 7.06251329 − 7 = 0.06251329, and its reciprocal is 15.99659848. Therefore, our next rational approximation is $c_3 = 3 + 1/(7 + 1/15)$, and since 1/15 is greater than 0.06251329, $c_3$ is less than $\pi$. Rationalizing the continued fractions gives us the following successive rational approximations to $\pi$: {3, 22/7, 333/106, 355/113, . . .}, or in decimal: {3, 3.1428 . . . , 3.141509 . . . , 3.14159292 . . .}.
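The remainder-and-reciprocal algorithm is short enough to sketch in MATLAB. The first loop follows the steps described above; the second loop rationalizes the $c_i$ with the standard recurrence for continued fraction convergents, which is our addition and is not derived in these notes. In floating point, only the first several $n_i$ can be trusted.

```matlab
% Continued fraction expansion of x by the remainder/reciprocal algorithm.
x = pi;
nterms = 4;                       % number of integers n_i to extract
n = zeros(1, nterms);
y = x;
for i = 1:nterms
    n(i) = floor(y);              % integer part
    y = 1/(y - n(i));             % reciprocal of the remainder
end

% Rationalize c_1, ..., c_nterms as h_i/k_i using the standard recurrence
% h_i = n_i*h_{i-1} + h_{i-2},  k_i = n_i*k_{i-1} + k_{i-2}.
h = zeros(1, nterms); k = zeros(1, nterms);
hm1 = 1; hm2 = 0; km1 = 0; km2 = 1;
for i = 1:nterms
    h(i) = n(i)*hm1 + hm2;
    k(i) = n(i)*km1 + km2;
    hm2 = hm1; hm1 = h(i);
    km2 = km1; km1 = k(i);
end
for i = 1:nterms
    fprintf('c_%d = %d/%d = %.8f\n', i, h(i), k(i), h(i)/k(i));
end
```

Setting x = (1 + sqrt(5))/2 instead of pi gives the corresponding approximations to the golden ratio.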

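Now, consider a rational approximation for $\Phi = 1.61803399\ldots$:

$$\begin{array}{lll}
c\text{’s} & \text{remainder} & 1/\text{remainder} \\
c_1 = 1 & 0.61803399\ldots = \phi & \Phi \\
c_2 = 1 + \cfrac{1}{1} & \phi & \Phi
\end{array}$$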

and so on. All the $n_i$’s are one, and the sequence results in the slowest possible convergence for the $c_i$’s. Another derivation of the continued fraction expansion for $\Phi$ can be obtained from $\Phi^2 - \Phi - 1 = 0$, written as $\Phi = 1 + 1/\Phi$. Inserting this expression for $\Phi$ back into the right-hand side gives $\Phi = 1 + 1/(1 + 1/\Phi)$, and iterating ad infinitum shows that all the $n_i$’s are one. (As a side note, another interesting expression for $\Phi$ comes from iterating $\Phi = \sqrt{1 + \Phi}$.)

The sequence of rational approximations just constructed for $\Phi$ is 1, 2, 3/2, 5/3, 8/5, 13/8, etc., which are the ratios of consecutive Fibonacci numbers. We have already shown that this ratio converges to $\Phi$.
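Both expressions for $\Phi$ can be iterated numerically. The sketch below, with an arbitrary starting value of 1 and an arbitrary iteration count, iterates $x \to 1 + 1/x$ (whose iterates 2, 3/2, 5/3, . . . are the ratios of consecutive Fibonacci numbers) and $y \to \sqrt{1 + y}$.

```matlab
% Two fixed-point iterations that converge to the golden ratio.
Phi = (1 + sqrt(5))/2;
x = 1;                      % continued-fraction iteration, x -> 1 + 1/x
y = 1;                      % nested-radical iteration,    y -> sqrt(1 + y)
for k = 1:40
    x = 1 + 1/x;
    y = sqrt(1 + y);
end
fprintf('x = %.12f, y = %.12f, Phi = %.12f\n', x, y, Phi);
```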

Because the golden ratio is the most irrational number, it has a way of appearing unexpectedly in nature. One well-known example is in the structure of flower petals. In class, I will show you how to see $\Phi$ in a sunflower. Now, let us return to our discussion of age-structured populations.

2.2 Rabbits are an age-structured population

Fibonacci’s rabbits form an age-structured population and we can use this simple case to illustrate the more general approach. Fibonacci’s rabbits can be categorized into two meaningful age classes: juveniles and adults. Juveniles are less than one month old and do not reproduce; adults are at least one month old and reproduce. Beginning with a newborn pair at the beginning of the first month, we census the population at the beginning of each subsequent month before adults give birth. At the start of the $n$th month, let $u_{1,n}$ be the number of rabbit pairs that are less than one month old (juveniles), and let $u_{2,n}$ be the number of rabbit pairs that are at least one month old (adults). Since each adult pair gives birth to a juvenile pair, the number of juvenile pairs at the start of the $(n+1)$th month is equal to the number of adult pairs at the start of the $n$th month. And since the number of adult pairs at the start of the $(n+1)$th month is equal to the sum of adult and juvenile pairs at the start of the $n$th month, we have

$$\begin{aligned}
u_{1,n+1} &= u_{2,n}, \\
u_{2,n+1} &= u_{1,n} + u_{2,n};
\end{aligned}$$

or written in matrix form

$$\begin{pmatrix} u_{1,n+1} \\ u_{2,n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} u_{1,n} \\ u_{2,n} \end{pmatrix}. \tag{2.3}$$

Rewritten in vector form,

$$\mathbf{u}_{n+1} = L \mathbf{u}_n, \tag{2.4}$$

where the vector $\mathbf{u}_n$ and the matrix $L$ are defined by comparison with (2.3). Initial conditions, with one juvenile pair and no adults, are

$$\begin{pmatrix} u_{1,1} \\ u_{2,1} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$


The solution of (2.4), a system of coupled first-order linear difference equations, proceeds similarly to that of coupled first-order differential equations. Looking for a solution of the form $\mathbf{u}_n = \lambda^n \mathbf{v}$, we obtain upon substitution into Eq. (2.4) the eigenvalue problem

$$L\mathbf{v} = \lambda \mathbf{v},$$

whose solution yields two eigenvalues $\lambda_1$ and $\lambda_2$, with corresponding eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$. The general solution to Eq. (2.4) is then

$$\mathbf{u}_n = c_1 \lambda_1^n \mathbf{v}_1 + c_2 \lambda_2^n \mathbf{v}_2, \tag{2.5}$$

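with $c_1$ and $c_2$ determined from the initial conditions. Now, suppose $|\lambda_1| > |\lambda_2|$. If we rewrite Eq. (2.5) in the form

$$\mathbf{u}_n = \lambda_1^n \left( c_1 \mathbf{v}_1 + c_2 \left( \frac{\lambda_2}{\lambda_1} \right)^n \mathbf{v}_2 \right),$$

then since $|\lambda_2/\lambda_1| < 1$, as $n \to \infty$, $\mathbf{u}_n \to c_1 \lambda_1^n \mathbf{v}_1$. Therefore, the long-time asymptotics of the population depend only on $\lambda_1$ and the corresponding eigenvector $\mathbf{v}_1$. For Fibonacci’s rabbits, the eigenvalues are obtained by solving $\det(L - \lambda I) = 0$, and we find

$$0 = \det \begin{pmatrix} -\lambda & 1 \\ 1 & 1 - \lambda \end{pmatrix} = -\lambda(1 - \lambda) - 1,$$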

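or $\lambda^2 - \lambda - 1 = 0$, with solutions $\Phi$ and $-\phi$. Since $\Phi > \phi$, the eigenvalue $\Phi$ and its corresponding eigenvector determine the long-time asymptotic population age-structure. The eigenvector may be found by solving

$$(L - \Phi I)\mathbf{v}_1 = 0,$$

or

$$\begin{pmatrix} -\Phi & 1 \\ 1 & 1 - \Phi \end{pmatrix} \begin{pmatrix} v_1^{(1)} \\ v_1^{(2)} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

The first equation is just $-\Phi$ times the second equation (use $\Phi^2 - \Phi - 1 = 0$), so that $v_1^{(2)} = \Phi v_1^{(1)}$. Taking $v_1^{(1)} = 1$,

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ \Phi \end{pmatrix}.$$

The asymptotic age-structure obtained from $\mathbf{v}_1$ shows that the ratio of adults to juveniles approaches the golden mean, i.e., $v_1^{(2)}/v_1^{(1)} = \Phi$.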
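The same conclusion can be checked numerically. The sketch below iterates Eq. (2.4) from the initial condition and, separately, asks MATLAB's eig for the dominant eigenvalue and eigenvector of $L$; the number of iterations (30) is an arbitrary choice.

```matlab
% Fibonacci's rabbits as a two-class model, Eq. (2.3).
L = [0 1; 1 1];
u = [1; 0];                         % one juvenile pair, no adults
for n = 1:30
    u = L*u;                        % Eq. (2.4)
end
ratio = u(2)/u(1);                  % adults per juvenile after 30 months

[V, D] = eig(L);
[lambda1, idx] = max(diag(D));      % dominant eigenvalue
v1 = V(:, idx)/V(1, idx);           % scale so the first component is 1
fprintf('ratio = %.10f, lambda1 = %.10f, v1 = (%g, %.10f)\n', ...
        ratio, lambda1, v1(1), v1(2));
```

Both the iterated ratio and the second component of the scaled eigenvector should be close to $\Phi = 1.6180\ldots$.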

2.3 Discrete age-structured population

In a discrete model, population censuses occur at discrete times and individuals are assigned to age classes, each spanning a range of ages. For model simplicity we assume the time between censuses is equal to the age span of each age class. An example is a country that censuses its population every five years, and assigns individuals to age classes spanning five years (e.g., 0-4 years old, 5-9 years old, etc.). Although country censuses commonly count females and males separately, we will only count females and ignore males.

$u_{i,n}$    number of individuals in age class $i$ at census $n$
$s_i$    fraction of individuals surviving from age class $i-1$ to $i$
$m_i$    number of offspring from an individual in age class $i$
$l_i = s_1 \cdots s_i$    fraction of individuals surviving from birth to age class $i$
$f_i = m_i l_i$    number of offspring from a newborn after reaching age class $i$
$R_0 = \sum_i f_i$    basic reproductive ratio
$\lambda_1$    only positive eigenvalue of the discrete Euler-Lotka equation
$r$    related to $\lambda_1$ by $\lambda_1 = e^r$

Table 2.1: Definitions needed in an age-structured, discrete time population model

There are half a dozen or so new definitions in this section, and I place these in Table 2.1 for easy reference. We define $u_{i,n}$ to be the number of females in age class $i$ at census $n$. We assume that $i = 1$ represents the first age class and $i = \omega$ the last. No female survives past the last age class. We define $s_i$ to be the fraction of females that survive from age class $i-1$ to age class $i$ (with $s_1$ the fraction of newborns that survive to their first census), and $m_i$ the expected number of female births per female in age class $i$.

We construct difference equations for $\{u_{i,n+1}\}$ in terms of $\{u_{i,n}\}$. First, newborns at census $n+1$ were born between census $n$ and $n+1$ to females of different ages, with $m_i$ the expected number born to a female in age class $i$. Also, only a fraction $s_1$ of these newborns survive to their first census. Second, only a fraction $s_{i+1}$ of females in age class $i$ that were counted in census $n$ survive to be counted in age class $i+1$ in census $n+1$. Putting these two ideas together, we determine the difference equations for $\{u_{i,n+1}\}$ to be

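$$\begin{aligned}
u_{1,n+1} &= s_1 \left( m_1 u_{1,n} + m_2 u_{2,n} + \ldots + m_\omega u_{\omega,n} \right), \\
u_{2,n+1} &= s_2 u_{1,n}, \\
u_{3,n+1} &= s_3 u_{2,n}, \\
&\;\;\vdots \\
u_{\omega,n+1} &= s_\omega u_{\omega-1,n},
\end{aligned}$$

which can be written as the matrix equation

$$\begin{pmatrix} u_{1,n+1} \\ u_{2,n+1} \\ u_{3,n+1} \\ \vdots \\ u_{\omega-1,n+1} \\ u_{\omega,n+1} \end{pmatrix} =
\begin{pmatrix}
s_1 m_1 & s_1 m_2 & \ldots & s_1 m_{\omega-1} & s_1 m_\omega \\
s_2 & 0 & \ldots & 0 & 0 \\
0 & s_3 & \ldots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \ldots & 0 & 0 \\
0 & 0 & \ldots & s_\omega & 0
\end{pmatrix}
\begin{pmatrix} u_{1,n} \\ u_{2,n} \\ u_{3,n} \\ \vdots \\ u_{\omega-1,n} \\ u_{\omega,n} \end{pmatrix};$$

or in vector form

$$\mathbf{u}_{n+1} = L \mathbf{u}_n,$$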

where $L$ is called the Leslie matrix. It is useful to define two more parameters (see Table 2.1). Let $l_i = s_1 s_2 \cdots s_i$ be the fraction of females that survive from birth to age class $i$, and let $f_i = m_i l_i$ be the number of offspring expected from a newborn after reaching age class $i$ (i.e., the expected number of births $m_i$ in age class $i$ times the fraction of individuals $l_i$ that survive to age class $i$). For homework, you will show that the characteristic equation derived from $\det(L - \lambda I) = 0$ is

$$\lambda^\omega - f_1 \lambda^{\omega-1} - f_2 \lambda^{\omega-2} - \ldots - f_\omega = 0. \tag{2.6}$$
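Before attempting the homework, it may help to confirm Eq. (2.6) numerically on a small example. The sketch below builds a three-class Leslie matrix from survival fractions and birth numbers that are purely hypothetical (chosen only for illustration) and checks that each eigenvalue of $L$ is a root of the polynomial in (2.6).

```matlab
% Hypothetical three-class example (all numbers are illustrative only).
s = [0.5 0.8 0.6];            % s_i: survival from class i-1 to class i
m = [0 2 3];                  % m_i: female births per female in class i
w = numel(s);                 % number of age classes (omega)

L = zeros(w);
L(1, :) = s(1)*m;             % first row: s_1*m_i
for i = 2:w
    L(i, i-1) = s(i);         % subdiagonal: s_2, ..., s_omega
end

l = cumprod(s);               % l_i = s_1*...*s_i
f = m .* l;                   % f_i = m_i*l_i

lambda = eig(L);                              % eigenvalues of the Leslie matrix
residual = max(abs(polyval([1, -f], lambda))); % left-hand side of Eq. (2.6)
fprintf('largest residual of Eq. (2.6): %.2e\n', residual);
```

A numerical check on one example is of course not a proof, but a residual at rounding level is reassuring.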

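Denote the eigenvalues obtained from Eq. (2.6) by $\lambda_1, \lambda_2, \ldots, \lambda_\omega$, with corresponding eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_\omega$. For homework, you will also show that (2.6) admits only one positive eigenvalue and, furthermore, it is the eigenvalue of largest magnitude. Denote this positive eigenvalue by $\lambda_1$. Since

$$\begin{aligned}
\mathbf{u}_n &= c_1 \lambda_1^n \mathbf{v}_1 + c_2 \lambda_2^n \mathbf{v}_2 + \ldots + c_\omega \lambda_\omega^n \mathbf{v}_\omega \\
&= \lambda_1^n \left( c_1 \mathbf{v}_1 + c_2 \left( \frac{\lambda_2}{\lambda_1} \right)^n \mathbf{v}_2 + \ldots + c_\omega \left( \frac{\lambda_\omega}{\lambda_1} \right)^n \mathbf{v}_\omega \right),
\end{aligned}$$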

for large $n$, $\mathbf{u}_n \sim c_1 \lambda_1^n \mathbf{v}_1$, and the asymptotic age-structure of the population is determined by $\lambda_1$ and $\mathbf{v}_1$. Since $\lambda_1$ is the only positive eigenvalue of (2.6), it is the only eigenvalue that can be written, with $r$ real, as

$$\lambda_1 = e^r, \tag{2.7}$$

since $e^r > 0$ for all real $r$. Substitution of (2.7) into (2.6) yields

$$e^{r\omega} - f_1 e^{r(\omega-1)} - \ldots - f_\omega = 0,$$

or after division by $e^{r\omega}$

$$1 = f_1 e^{-r} + f_2 e^{-2r} + \ldots + f_\omega e^{-\omega r}.$$

This equation, known as the discrete Euler-Lotka equation, can be rewritten in the compact form

$$\sum_{j=1}^{\omega} f_j e^{-jr} = 1. \tag{2.8}$$

If numerical values of the $f_j$’s are known, then the discrete Euler-Lotka equation (2.8) can be solved numerically for real $r$ using Newton’s method.

Newton’s method is an efficient root-finding algorithm that solves $F(x) = 0$ for $x$. Newton’s method can be derived graphically by plotting $F(x)$ versus $x$, approximating $F(x)$ by the tangent line at $x = x_n$ with slope $F'(x_n)$, and letting $x_{n+1}$ be the intercept of the tangent line with the $x$-axis. An alternative derivation starts with the Taylor series $F(x_{n+1}) = F(x_n) + (x_{n+1} - x_n)F'(x_n) + \ldots$. We set $F(x_{n+1}) = 0$, drop higher-order terms in the Taylor series, and solve for $x_{n+1}$:

$$x_{n+1} = x_n - \frac{F(x_n)}{F'(x_n)},$$

which is to be iterated, starting with a wise choice of initial guess $x_0$, until convergence.
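As an illustration, the sketch below applies Newton’s method to the discrete Euler-Lotka equation (2.8), written as $F(r) = \sum_j f_j e^{-jr} - 1 = 0$. The values of the $f_j$ are hypothetical, and the initial guess $r = 0$ and the stopping tolerance are our own choices.

```matlab
% Newton's method for the discrete Euler-Lotka equation (2.8),
% F(r) = sum_j f_j*exp(-j*r) - 1 = 0, with hypothetical f_j.
f = [0 0.8 0.72];                 % illustrative net maternity values
j = 1:numel(f);
F  = @(r) sum(f .* exp(-j*r)) - 1;
Fp = @(r) -sum(j .* f .* exp(-j*r));   % derivative dF/dr

r = 0;                            % initial guess
for it = 1:50
    dr = F(r)/Fp(r);
    r = r - dr;                   % Newton update
    if abs(dr) < 1e-14, break, end
end
lambda1 = exp(r);                 % Eq. (2.7)
fprintf('r = %.6f, lambda_1 = %.6f\n', r, lambda1);
```

The resulting $\lambda_1 = e^r$ should also satisfy the characteristic equation (2.6) with the same $f_j$.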


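Once $\lambda_1$ is determined, the corresponding eigenvector $\mathbf{v}_1$ can be computed using the Leslie matrix. We have

$$\begin{pmatrix}
s_1 m_1 - \lambda_1 & s_1 m_2 & \ldots & s_1 m_{\omega-1} & s_1 m_\omega \\
s_2 & -\lambda_1 & \ldots & 0 & 0 \\
0 & s_3 & \ldots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \ldots & -\lambda_1 & 0 \\
0 & 0 & \ldots & s_\omega & -\lambda_1
\end{pmatrix}
\begin{pmatrix} v_1^{(1)} \\ v_1^{(2)} \\ v_1^{(3)} \\ \vdots \\ v_1^{(\omega-1)} \\ v_1^{(\omega)} \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}.$$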

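Arbitrarily taking $v_1^{(\omega)} = 1$, we have, beginning with the last row and working backwards:

$$\begin{aligned}
v_1^{(\omega-1)} &= \lambda_1/s_\omega, \\
v_1^{(\omega-2)} &= \lambda_1^2/(s_\omega s_{\omega-1}), \\
&\;\;\vdots \\
v_1^{(1)} &= \lambda_1^{\omega-1}/(s_\omega s_{\omega-1} \cdots s_2).
\end{aligned}$$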

We can renormalize this eigenvector by multiplying by $(s_1 s_2 \cdots s_\omega)/\lambda_1^\omega$. Using the definition $l_i = s_1 \cdots s_i$, we then obtain the simpler result

$$v_1^{(i)} = l_i/\lambda_1^i, \quad \text{for } i = 1, 2, \ldots, \omega.$$

We can obtain an interesting implication of this result by forming the ratio of two consecutive age classes. Asymptotically,

$$u_{i+1,n}/u_{i,n} \sim v_1^{(i+1)}/v_1^{(i)} = s_{i+1}/\lambda_1.$$

With the survival fractions $\{s_i\}$ fixed, a smaller $\lambda_1$ implies a larger ratio; a slower growing population has relatively more older people than a faster growing population. In fact, we are now living through a time when developed countries, particularly Japan and Western Europe, have substantially lowered their population growth rates and increased the average age of their populations.
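The eigenvector formula $v_1^{(i)} = l_i/\lambda_1^i$ and the ratio $s_{i+1}/\lambda_1$ can be verified on a small example. The sketch below reuses hypothetical survival fractions and birth numbers (illustrative values only) and obtains $\lambda_1$ from MATLAB's eig rather than from Newton’s method.

```matlab
% Stable age distribution v_i = l_i/lambda_1^i for a hypothetical Leslie matrix.
s = [0.5 0.8 0.6];  m = [0 2 3];  w = numel(s);   % illustrative values
L = zeros(w);  L(1, :) = s(1)*m;
for i = 2:w, L(i, i-1) = s(i); end

l = cumprod(s);                     % l_i = s_1*...*s_i
lambda1 = max(real(eig(L)));        % the positive eigenvalue has the largest magnitude
v = (l ./ lambda1.^(1:w)).';        % predicted eigenvector, as a column
residual = norm(L*v - lambda1*v);   % vanishes if v is an eigenvector
age_ratio = v(2:end) ./ v(1:end-1); % ratio of consecutive age classes
fprintf('residual = %.2e\n', residual);
disp([age_ratio, s(2:end).'/lambda1]);   % the two columns should match
```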

To determine simply whether a population grows ($r > 0$) or decays ($r < 0$), one can calculate the basic reproductive ratio $R_0$, defined as the expected number of female offspring a female produces over her lifetime. Stasis is obtained if the female only replaces herself before dying. If $R_0 > 1$ then the population grows, and if $R_0 < 1$ then the population decays. $R_0$ is equal to the number of female offspring expected from a newborn when she is in age class $i$, summed over all age classes, or

$$R_0 = \sum_{i=1}^{\omega} f_i.$$

For equal numbers of males and females, $R_0 = 1$ means a newborn female must produce on average two children over her lifetime. News stories in the western press frequently state that for zero population growth, women need to have 2.1 children. The term women used in these stories presumably means women of child-bearing age. Since girls that die young have no children, the statistic 2.1 children implies that 0.1/2.1, or about 5%, of children die before reaching adulthood.
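The link between $R_0$ and the sign of $r$ can be illustrated numerically. The sketch below takes hypothetical values $f_i$, rescales them so that $R_0$ is below, equal to, and above one, and computes $\lambda_1$ as the largest real root of the characteristic equation (2.6) using MATLAB's roots. The scale factors are arbitrary choices.

```matlab
% Growth or decay as determined by R0, using hypothetical f_i.
f = [0 0.8 0.72];                     % illustrative values, R0 = 1.52
scales = [0.5, 1/sum(f), 1];          % gives R0 < 1, R0 = 1, R0 > 1
R0_all  = zeros(size(scales));
lam_all = zeros(size(scales));
for k = 1:numel(scales)
    fc = scales(k)*f;
    R0_all(k)  = sum(fc);                        % basic reproductive ratio
    lam_all(k) = max(real(roots([1, -fc])));     % positive root of Eq. (2.6)
    fprintf('R0 = %.3f, lambda_1 = %.4f, r = %+.4f\n', ...
            R0_all(k), lam_all(k), log(lam_all(k)));
end
```

In each case the sign of $r = \ln \lambda_1$ agrees with the sign of $R_0 - 1$.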


$a$    age of individual
$t$    time
$u(a, t)$    population age density
$m(a)$    maternity function
$l(a)$    survival function
$f(a) = l(a)m(a)$    net maternity function
$R$    population growth rate

Table 2.2: Definitions needed in an age-structured, continuous time population model

A useful application of the mathematical model developed in this section is to predict the future age-structure within various countries. This can be important for governmental economic planning, for instance in determining the tax revenues needed to pay for the rising costs of health care as a population ages. For accurate predictions of the future age-structure of a given country, immigration and emigration must also be modeled. An interesting website to browse is at

http://www.census.gov/ipc/www/idb.

This website, created by the U.S. Census Bureau, provides access to the International Data Base (IDB), a computerized source of demographic and socioeconomic statistics for 227 countries and areas of the world. In class, we will look at and discuss the dynamic output of some of the population pyramids, in particular those for Hong Kong and China.

2.4 Continuous age-structured population

Here, we derive a continuous time model by considering the discrete model in the limit of the age span $\Delta a$ of an age class going to zero. For use in a subsequent application, our main theoretical target is the continuous version of the Euler-Lotka equation (2.8).

New definitions from this section are placed in Table 2.2. Time $t$ and age $a$ are now continuous variables. In the discrete model, age class $i$ included all ages $a$ in the range $a_i \le a < a_i + \Delta a$, and censuses occurred at discrete times $t_n$, with $t_{n+1} = t_n + \Delta a$. Now, we define a population age density $u(a, t)$ so that the number of individuals in age class $i$ at the $n$th census is given by $u_{i,n} = \int_{a_i}^{a_i + \Delta a} u(a, t_n)\,da$. As $\Delta a \to 0$, we obtain

$$u(a_i, t_n) = u_{i,n}/\Delta a.$$

With this definition for $u(a, t)$, the population size $N(t)$ at time $t$ is determined by the integral

$$N(t) = \int_0^\infty u(a, t)\,da.$$

The maternity function $m(a)$ is defined so that $m(a)\Delta a$ is the expected number of offspring per individual in the age interval $(a, a + \Delta a)$. In the discrete model, $m_i$ was the expected number of offspring per individual in age class $i$. Since the time between censuses is $\Delta a$, $m_i$ is in fact the expected number of offspring per individual over the ages $a_i$ to $a_i + \Delta a$. Therefore,

$$m_i = m(a_i)\Delta a.$$

Now $l_i$ was the fraction of individuals living to age class $i$, so we can simply define the survival function $l(a)$ to be the fraction of individuals living to age $a$, i.e.,

$$l_i = l(a_i).$$

Finally, we define the net maternity function $f(a) = l(a)m(a)$, in analogy to our definition of $f_i = l_i m_i$.

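Now, to obtain the continuous version of the Euler-Lotka equation, we take the limit $\Delta a \to 0$ of the discrete Euler-Lotka equation (2.8) as follows:

$$\begin{aligned}
1 &= \sum_{j=1}^{\omega} f_j e^{-jr} && \text{(discrete Euler-Lotka equation)} \\
&= \sum_{j=1}^{\omega} l_j m_j e^{-j\Delta a (r/\Delta a)} && \text{(using } f_j = l_j m_j;\ r = \Delta a (r/\Delta a)\text{)} \\
&= \sum_{j=1}^{\omega} l(a_j) m(a_j) e^{-a_j R} \Delta a && \text{(using } m_j = m(a_j)\Delta a;\ a_j = j\Delta a;\ R = r/\Delta a\text{)} \\
&= \sum_{j=1}^{\omega} f(a_j) e^{-a_j R} \Delta a && \text{(using } l(a_j) m(a_j) = f(a_j)\text{)} \\
&= \int_0^\infty f(a) e^{-aR}\,da. && \text{(by converting the Riemann sum to an integral)}
\end{aligned}$$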

Notice that in the discrete model, asymptotically $u_{i,n} \sim \lambda_1^n = e^{rn}$. Since $t_n = n\Delta a$, in the continuous model $u(a, t_n) \sim e^{r t_n/\Delta a} = e^{R t_n}$. Therefore, $N(t) \sim e^{Rt}$, and $R$ is the asymptotic population growth rate. The continuous Euler-Lotka equation

$$\int_0^\infty f(a) e^{-aR}\,da = 1, \tag{2.9}$$

is thus an integral equation for the population growth rate $R$ given the net maternity function $f(a)$. Again, this equation may be solved using Newton’s method.
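A minimal sketch of such a Newton solve follows. The net maternity function $f(a) = 4a e^{-a}$ is hypothetical, chosen only because Eq. (2.9) then reduces to $4/(1+R)^2 = 1$ with exact solution $R = 1$, which gives something to check against. The integrals are evaluated with MATLAB's integral function, and the initial guess and tolerance are arbitrary.

```matlab
% Newton's method for the continuous Euler-Lotka equation (2.9),
% F(R) = int_0^inf f(a)*exp(-a*R) da - 1 = 0, with a hypothetical f(a).
f  = @(a) 4*a.*exp(-a);           % illustrative net maternity function
F  = @(R)  integral(@(a) f(a).*exp(-a*R), 0, Inf) - 1;
Fp = @(R) -integral(@(a) a.*f(a).*exp(-a*R), 0, Inf);   % dF/dR

R = 0;                            % initial guess
for it = 1:50
    dR = F(R)/Fp(R);
    R = R - dR;                   % Newton update
    if abs(dR) < 1e-8, break, end
end
fprintf('R = %.6f (exact value for this f is 1)\n', R);
```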

2.5 The brood size of a hermaphroditic worm

Caenorhabditis elegans, a soil-dwelling nematode worm about 1 mm in length, is a widely studied model organism in biology. With a body made up of approximately 1000 cells, it is one of the simplest multicellular organisms under study. Advances in understanding the development of this multicellular organism led to the awarding of the 2002 Nobel Prize in Physiology or Medicine to the three C. elegans biologists Sydney Brenner, H. Robert Horvitz and John E. Sulston.

The worm C. elegans has two sexes: hermaphrodites, which are essentially females that can produce internal sperm and self-fertilize their own eggs, and males, which must mate with hermaphrodites to produce offspring. In laboratory cultures, males are rare and worms generally propagate by self-fertilization. Typically, a hermaphrodite lays about 250-350 self-fertilized eggs before becoming infertile. It is reasonable to assume that the forces of natural selection have shaped the life-history of C. elegans, and that the number of offspring produced by a selfing hermaphrodite must be in some sense optimal. Here, we show how an age-structured model applied to C. elegans gives theoretical insight into the brood size of a selfing hermaphrodite.

Figure 2.1: Caenorhabditis elegans, a nematode worm used by biologists as a simple animal model of a multicellular organism.

Figure 2.2: A hermaphrodite’s life-history

To develop a mathematical model for C. elegans, we need to know details of its life-history. A simplified time chart of a hermaphrodite’s life-history is shown in Fig. 2.2. The fertilized egg is laid at time $t = 0$. During a growth period, the immature worm develops through four larval stages (L1-L4). Towards the end of L4 and a short while after the final molt, spermatogenesis occurs and the hermaphrodite produces and stores all of the sperm it will later use. Then the hermaphrodite produces eggs (oogenesis), self-fertilizes them using internally stored sperm, and lays them. In the absence of males, egg production ceases after all the sperm are utilized. We assume the growth period occurs for $0 < t < g$, spermatogenesis for $g < t < g + s$, and egg production, self-fertilization, and laying for $g + s < t < g + s + e$.


$g$    growth period    64 hr
$s$    sperm production period    12 hr
$e$    egg production period    48 hr
$p$    sperm production rate    24 hr$^{-1}$
$m$    egg production rate    6 hr$^{-1}$
$b$    brood size    288

Table 2.3: Parameters in a life-history model of C. elegans, with experimental estimates

Here, we want to understand why hermaphrodites limit their sperm production. Biologists define males and females from the size and metabolic cost of their gametes: sperm are cheap and eggs are expensive. So on first look, it is puzzling why the total number of offspring produced by a hermaphrodite is limited by the number of sperm produced, rather than by the number of eggs. There must be a hidden cost other than metabolic to the hermaphrodite for producing additional sperm. To understand the basic biology, it is instructive to consider two limiting cases: (1) no sperm production; (2) infinite sperm production. In both cases, no offspring are produced: in the first because there are no sperm, and in the second because there are no eggs. The number of sperm produced by a hermaphrodite before laying eggs is therefore a compromise; although more sperm means more offspring, it also means delayed egg production.

Our main theoretical assumption is that natural selection will select worms that have the largest growth rate $R$. Worms containing a genetic mutation resulting in a larger value for $R$ will eventually outnumber all other worms, and this genetic mutation will become fixed in the population. From the continuous Euler-Lotka equation (2.9), the growth rate $R$ is determined from the net maternity function $f(a)$. Our mathematical attack on the problem is to model $f(a)$, determine $R = R(s)$, where $s$ is the period of spermatogenesis, and then maximize $R$ with respect to $s$. The value of $s$ at which $R$ is maximum will be considered the naturally selected value.

The parameters we need in our mathematical model are listed in Table 2.3, together with estimated experimental values. In addition to the growth period $g$, sperm production period $s$, and egg production period $e$ (all in units of time), we also need the sperm production rate $p$ and egg production rate $m$ (both in units of inverse time). We also define the brood size $b$ as the total number of fertilized eggs laid by a selfing hermaphrodite. It is reasonable to assume that the growth period $g$ and the sperm and egg production rates $p$ and $m$ are independent of the length of the sperm production period $s$. The egg production period $e$, however, depends on $s$ since an increase in the number of sperm produced will correspondingly increase the number of eggs laid. In fact, the brood size is equal to the number of sperm produced, and also equal to the number of eggs laid, so that $b = ps = me$. We can thus find $e = e(s)$ explicitly as

$$e(s) = ps/m.$$

We thus eliminate $e$ from our model in favor of the independent variable $s$, and the $s$-independent parameters $p$ and $m$.

One of the most challenging aspects of modeling is to connect the idealized parameters used in a model with real experimental data. Here, the most accessible experimental data obtainable from direct observations are: (i) the time for a laid egg to develop into an egg-laying adult ($g + s$ hours); (ii) the duration of the egg-laying period ($e$ hours); (iii) the rate at which eggs are produced ($m$ eggs per hour); and (iv) the brood size ($b$ fertilized eggs). Although direct observations yield averaged values of these parameters, different worm labs have reported different values, perhaps because of subtle differences in the culturing of the worms, but also because of the inherent variability of biological processes. The least accessible experimental data are those modeling the internal sperm production: (v) the sperm production period $s$, and (vi) the sperm production rate $p$. Notice that $b = ps$, so experimental determination of the brood size $b$ and one of $s$ or $p$ determines the other. In Table 2.3 are some reasonable experimental estimates for these parameters for worms cultured at 20°C, but note that we have slightly adjusted the recorded experimental values to exactly satisfy the constraint $ps = me = b$.

Out of the six parameters in Table 2.3, only four are independent because of the two equations $b = ps$ and $b = me$. Maximizing $R$ with respect to $s$ will yield an additional equation. Therefore, experimental values of three independent parameters can be used to predict the remaining parameters, which can then be compared to the values in Table 2.3. We will use the experimental estimates of the growth period $g$, and the sperm and egg production rates $p$ and $m$ in Table 2.3 to predict the numerical values of the sperm production period $s$, the egg production period $e$, and the brood size $b$. It is unrealistic to expect exact agreement: the values of Table 2.3 contain experimental error, and our model may be oversimplified.

The continuous Euler-Lotka equation (2.9) for the growth rate $R$ requires a model for $f(a) = m(a)l(a)$, where $m(a)$ is the age-dependent maternity function and $l(a)$ is the age-dependent survival function. Now $l(a)$ is the probability of a worm surviving to age $a$. To model $l(a)$, we assume that the probability of a worm of age $a$ dying in the next small interval of time $\Delta a$ is $d\Delta a$, where $d$ is the constant death rate of the worm. Here we have assumed that worms do not die of old age during egg-laying, or $d$ would necessarily increase with age, but rather die of predation, starvation, disease, or other age-independent causes. Such an assumption is reasonable since worms can live for several weeks after sperm depletion. Now, the probability of a worm surviving to age $a + \Delta a$ is equal to the probability of surviving to age $a$ times the probability of not dying in the next small interval of time $\Delta a$, that is, $(1 - d \cdot \Delta a)$. Hence,

$$l(a + \Delta a) = l(a)(1 - d \cdot \Delta a),$$

or

$$\frac{l(a + \Delta a) - l(a)}{\Delta a} = -d \cdot l(a),$$

and as $\Delta a \to 0$,

$$l' = -d \cdot l.$$

With $l(0) = 1$, the solution to the ODE is

$$l(a) = \exp(-d \cdot a). \tag{2.10}$$
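The limit $\Delta a \to 0$ can be seen numerically: applying $l(a + \Delta a) = l(a)(1 - d \cdot \Delta a)$ repeatedly from $l(0) = 1$ gives $(1 - d\Delta a)^n$ after $n$ steps, which approaches (2.10) as the step shrinks. The death rate and age below are hypothetical values chosen only for illustration.

```matlab
% Discrete survival recursion versus the exponential solution (2.10).
d = 0.02;  a = 50;                     % hypothetical death rate (1/hr) and age (hr)
l_exact = exp(-d*a);
for nsteps = [10 100 1000 10000]
    da = a/nsteps;                     % size of each age step
    l_discrete = (1 - d*da)^nsteps;    % l(a) after nsteps survival steps
    fprintf('da = %8.4f: l = %.6f (exact %.6f)\n', da, l_discrete, l_exact);
end
```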

Now the age-dependent maternity function $m(a)$ is defined so that $m(a)\Delta a$ is the expected number of offspring produced over the age interval $\Delta a$. We assume that a hermaphrodite lays eggs at a constant rate $m$ over the ages $g + s < a < g + s + e$; therefore,

$$m(a) = \begin{cases} m & \text{for } g + s < a < g + s + e, \\ 0 & \text{otherwise.} \end{cases} \tag{2.11}$$

Figure 2.3: $B = B(s)$ versus $s$. (Plot of $B$ in hr$^{-1}$ against $s$ in hr, for $0 \le s \le 10$.)

Using (2.10) and (2.11), the continuous Euler-Lotka equation (2.9) for the growth rate $R$ becomes

$$\int_{g+s}^{g+s+e} m \exp[-(d + R)a]\,da = 1. \tag{2.12}$$

Since the death rate $d$ is independent of $s$, the value of $s$ that maximizes $R$ will also maximize $d + R$. We can therefore eliminate the unknown parameter $d$ by defining $B = d + R$, and maximize $B$ (instead of $R$) with respect to $s$.

We first integrate the Euler-Lotka equation (2.12):

$$\begin{aligned}
1 &= \int_{g+s}^{g+s+e} m \exp(-Ba)\,da \\
&= \frac{m}{B} \left\{ \exp[-(g+s)B] - \exp[-(g+s+e)B] \right\} \\
&= \frac{m}{B} \exp[-(g+s)B] \left\{ 1 - \exp[-(ps/m)B] \right\},
\end{aligned}$$

where in the last equality we have used $e = ps/m$. The Euler-Lotka equation has reduced to a nonlinear transcendental equation for $B = B(s)$:

$$\frac{B}{m} \exp[(g+s)B] + \exp[-(ps/m)B] - 1 = 0, \tag{2.13}$$


where we assume the parameters $g$, $p$ and $m$ are known from experiments. Equation (2.13) may be solved for $B = B(s)$ using Newton’s method. We let

$$F(B) = \frac{B}{m} \exp[(g+s)B] + \exp[-(ps/m)B] - 1,$$

and differentiate with respect to $B$, to obtain

$$F'(B) = \frac{1}{m} \left( 1 + (g+s)B \right) \exp[(g+s)B] - \frac{ps}{m} \exp[-(ps/m)B].$$

For a given $s$, we solve $F(B) = 0$ by iterating

$$B_{n+1} = B_n - \frac{F(B_n)}{F'(B_n)}.$$

Numerical experimentation can find appropriate starting values for $B$. Note that $B = 0$ always satisfies $F(B) = 0$; the root we want is the positive one. A graph of $B(s)$ versus $s$ is presented in Fig. 2.3. The MATLAB code used to generate this figure is printed below.

function [s, B] = B_newton
% Solve Eq. (2.13) for B = B(s) by Newton's method and plot B versus s
global g p m
g=64; p=24; m=6;            % experimental estimates
maxit=50; tol=5*eps;        % for Newton's method
s=linspace(0.5,10,100);
B=zeros(size(s));
for i=1:length(s)
    if i==1, B(i)=.35; else, B(i)=B(i-1); end   % initial guess
    for j=1:maxit
        delta = F(B(i),s(i))/Fp(B(i),s(i));
        B(i) = B(i) - delta;
        err=abs(F(B(i),s(i)));   % renamed from error, which shadows a built-in
        if err < tol, break, end
    end
    if j==maxit, disp(s(i)), end   % display s whenever maxit was reached
end
plot(s,B);
xlabel('s (hr)'); ylabel('B (hr^{-1})');
title('B=B(s) versus s');
%
function y=F(B,s)
global g p m
y=(B/m)*exp((g+s)*B) + exp(-(p*s/m)*B) - 1;
%
function y=Fp(B,s)
global g p m
y=(1/m)*(1+(g+s)*B)*exp((g+s)*B) - (p*s/m)*exp(-(p*s/m)*B);

To compute $B = B(s)$, we have used the values of $g$, $p$ and $m$ from Table 2.3. The maximum value of $B = B(s)$ can be shown from the computation to occur when $s = 6.5$ hr. Using this value of $s$, we compute $e = 26$ hr and $b = 156$. The values for $s$, $e$ and $b$ are all approximately 45% less than the approximate experimental values. Possible reasons for this discrepancy will be discussed in class.
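One way to locate the maximum is a simple grid search over $s$, sketched below. This is our own illustration rather than the code used for Fig. 2.3: it uses anonymous functions instead of globals, a grid spacing of 0.01 hr, and restarts Newton’s method from $B = 0.35$ for every $s$. Because $F$ is convex for $B > 0$ and this starting value lies to the right of the positive root, the iterates decrease monotonically to the positive root and avoid the trivial root $B = 0$.

```matlab
% Grid search for the sperm production period s that maximizes B(s).
g = 64; p = 24; m = 6;             % estimates from Table 2.3
F  = @(B, s) (B/m)*exp((g+s)*B) + exp(-(p*s/m)*B) - 1;
Fp = @(B, s) (1/m)*(1+(g+s)*B)*exp((g+s)*B) - (p*s/m)*exp(-(p*s/m)*B);

s = linspace(0.5, 10, 951);        % grid spacing 0.01 hr
B = zeros(size(s));
for i = 1:numel(s)
    Bi = 0.35;                     % start to the right of the positive root
    for it = 1:200
        dB = F(Bi, s(i))/Fp(Bi, s(i));
        Bi = Bi - dB;              % Newton update
        if abs(dB) < 1e-13, break, end
    end
    B(i) = Bi;
end
[Bmax, imax] = max(B);
s_opt = s(imax);                   % predicted sperm production period (hr)
e_opt = p*s_opt/m;                 % predicted egg production period (hr)
b_opt = p*s_opt;                   % predicted brood size
fprintf('s = %.2f hr, e = %.1f hr, b = %.0f, B = %.4f per hr\n', ...
        s_opt, e_opt, b_opt, Bmax);
```

Since $B(s)$ is quite flat near its maximum, the location of the maximum is sensitive to the grid and to small changes in the parameters.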


References

Hodgkin, J. & Barnes, T.M. More is not better: brood size and population growth in a self-fertilizing nematode. Proc. R. Soc. Lond. B (1991) 246, 19-24.

Barker, D.M. Evolution of sperm shortage in a selfing hermaphrodite. Evolution (1992) 46, 1951-1955.

Cutter, A.D. Sperm-limited fecundity in nematodes: How many sperm are enough? Evolution (2004) 58, 651-655.

Chapter 3

Population Dynamics

Our previous model of population growth neglected interactions of individuals with their environment or with each other. Without interactions and with the birth rate exceeding the death rate, the population size grows exponentially. Thomas Malthus, in An Essay on the Principle of Population (1798), used unchecked population growth to famously predict a global famine unless governments regulated family size (an idea later echoed by Mainland China’s one-child policy). The reading of Malthus is said by Charles Darwin in his autobiography to have inspired his formulation of what is now the cornerstone of modern biology: the principle of evolution by natural selection.

Unchecked exponential growth obviously does not occur in nature. Population growth may be regulated by limited food or other environmental resources, and by competition among individuals within a species or across species. Here, we will develop models for three types of regulation. First, we will discuss a simple mathematical model of environmentally regulated population growth. Second, we will extend this model to species competition. And third, we will present the famous Lotka-Volterra predator-prey model. Since all these mathematical models are nonlinear differential equations, mathematical methods to analyze such equations will be developed.

3.1 Logistic equation for population growth

The deterministic model of population growth is given by

$$\frac{dN}{dt} = rN,$$

where $r = b - d$ is the difference between the birth and death rates. If $r > 0$, then the population grows exponentially (so-called Malthusian growth). Populations, when left alone, however, do not grow exponentially forever. Eventually their growth will be checked by the over-consumption of resources. We assume that the environment has an intrinsic carrying capacity $K$, and populations larger than this size suffer heightened death rates.

To model population growth with an environmental carrying capacity $K$, we look for a nonlinear equation of the form

$$\frac{dN}{dt} = rNF(N),$$


where $F(N)$ provides a model for environmental regulation. This function should satisfy $F(0) = 1$ (the population grows exponentially with growth rate $r$ when $N$ is small), $F(K) = 0$ (the population stops growing at the carrying capacity), and $F(N) < 0$ when $N > K$ (the population decays when larger than the carrying capacity). The simplest function $F(N)$ satisfying these conditions is linear, and given by $F(N) = 1 - N/K$. The resulting model is the well-known logistic equation,

$$\frac{dN}{dt} = rN(1 - N/K), \tag{3.1}$$

an important model for many processes besides bounded population growth.

Although Eq. (3.1) is a nonlinear equation, an analytical solution can be found by separating variables. Before we embark on this algebra, we first illustrate some basic concepts used in analyzing nonlinear differential equations.

Fixed points, also called equilibria, of a differential equation such as (3.1) are defined as the values of $N$ where $\dot{N} = 0$. Here, we see that the fixed points of (3.1) are $N = 0$ and $N = K$. If the initial value of $N$ is at one of these fixed points, then $N$ will remain fixed there for all time. Fixed points, however, can be stable or unstable. A fixed point is stable if a small perturbation from the fixed point decays to zero so that the solution returns to the fixed point. Likewise, a fixed point is unstable if a small perturbation grows exponentially so that the solution moves away from the fixed point. Calculation of stability by means of small perturbations is called linear stability analysis. For example, consider the general one-dimensional differential equation

$$\dot{x} = f(x), \tag{3.2}$$

with $x_*$ a fixed point of the equation, that is, $f(x_*) = 0$. To determine analytically if $x_*$ is a stable or unstable fixed point, we perturb the solution. Let us write our solution $x = x(t)$ in the form

$$x(t) = x_* + \epsilon(t), \tag{3.3}$$

where initially $\epsilon(0)$ is small but different from zero. Substituting (3.3) into (3.2), we obtain

$$\begin{aligned} \dot{\epsilon} &= f(x_* + \epsilon) \\ &= f(x_*) + \epsilon f'(x_*) + \dots \\ &= \epsilon f'(x_*) + \dots, \end{aligned}$$

where the second equality uses a Taylor series expansion of $f(x)$ about $x_*$, and the third equality uses the fact that $x_*$ is a fixed point. If $f'(x_*) \neq 0$, we can neglect higher-order terms in $\epsilon$ for small times, and integrating we have

$$\epsilon(t) = \epsilon(0)e^{f'(x_*)t}.$$

The perturbation $\epsilon(t)$ to the fixed point $x_*$ goes to zero as $t \to \infty$ provided $f'(x_*) < 0$. Therefore, the stability condition on $x_*$ is

$$x_* \text{ is } \begin{cases} \text{a stable fixed point} & \text{if } f'(x_*) < 0, \\ \text{an unstable fixed point} & \text{if } f'(x_*) > 0. \end{cases}$$

Figure 3.1: Solving one-dimensional stability using a graphical approach. (Plot of f(x) versus x with three fixed points, labeled unstable, stable, and unstable.)

Another equivalent but sometimes simpler approach to analyzing the stability of the fixed points of a one-dimensional nonlinear equation such as (3.2) is to plot $f(x)$ versus $x$. We show a generic example in Fig. 3.1. The fixed points are the $x$-intercepts of the graph. Directional arrows on the $x$-axis can be drawn based on the sign of $f(x)$. If $f(x) < 0$, then the arrow points to the left; if $f(x) > 0$, then the arrow points to the right. The arrows show the direction of motion for a particle at position $x$ satisfying $\dot{x} = f(x)$. As illustrated in Fig. 3.1, fixed points with arrows on both sides pointing in are stable, and fixed points with arrows on both sides pointing out are unstable.

Considering the logistic equation (3.1), the fixed points are $N_* = 0, K$. A sketch of $F(N) = rN(1 - N/K)$ versus $N$, with $r, K > 0$, in Fig. 3.2 (where $F(N)$ now denotes the full right-hand side of (3.1)) immediately shows that $N_* = 0$ is an unstable fixed point and $N_* = K$ is a stable fixed point. The analytical approach computes $F'(N) = r(1 - 2N/K)$, so that $F'(0) = r > 0$ and $F'(K) = -r < 0$. Again we conclude that $N_* = 0$ is unstable and $N_* = K$ is stable.
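The linear stability test above is easy to check numerically. The following MATLAB sketch approximates the derivative of the right-hand side of (3.1) at each fixed point by a central difference and classifies the fixed point by its sign. The values of r and K, the step h, and the variable names are assumptions chosen only for illustration.

```matlab
% Hypothetical parameter values, chosen only for illustration.
r = 0.5; K = 100;
f = @(N) r*N.*(1 - N/K);     % right-hand side of the logistic equation (3.1)
Nstar = [0, K];              % the two fixed points found in the text
h = 1e-6;                    % step for the central difference (assumption)
fprime = (f(Nstar + h) - f(Nstar - h)) / (2*h);   % approximates f'(N*)
for k = 1:numel(Nstar)
    if fprime(k) < 0
        label = 'stable';
    else
        label = 'unstable';
    end
    fprintf('N* = %g: f''(N*) = %+.4f (%s)\n', Nstar(k), fprime(k), label);
end
```

The same few lines can be reused for any one-dimensional equation of the form (3.2) by changing the function handle f and the list of fixed points.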

Figure 3.2: Determining stability of the fixed points of the logistic equation. (Plot of F(N) versus N; the fixed point at N = 0 is labeled unstable and the fixed point at N = K is labeled stable.)

We now solve the logistic equation analytically. Although this relatively simple equation can be solved as is, we first nondimensionalize to illustrate this very important technique that will later prove most useful. Perhaps here one can guess the appropriate unit of time to be $1/r$ and the appropriate unit of population size to be $K$. However, we prefer to demonstrate a more general technique that may be useful for equations that are difficult to guess. We begin by nondimensionalizing time and population size:

$$\tau = t/t_*, \quad \eta = N/N_*,$$

with $t_*$ and $N_*$ unknown dimensional units. The derivative $\dot{N}$ is computed as

$$\frac{dN}{dt} = \frac{d(N_*\eta)}{d\tau}\frac{d\tau}{dt} = \frac{N_*}{t_*}\frac{d\eta}{d\tau}.$$

Therefore, the logistic equation (3.1) becomes

$$\frac{d\eta}{d\tau} = rt_*\eta\left(1 - \frac{N_*\eta}{K}\right),$$

which assumes the simplest form with the choices $t_* = 1/r$ and $N_* = K$. Therefore, our nondimensional variables are

$$\tau = rt, \quad \eta = N/K,$$

and the logistic equation, in nondimensional form, becomes

$$\frac{d\eta}{d\tau} = \eta(1 - \eta), \tag{3.4}$$

with nondimensional initial condition $\eta(0) = \eta_0 = N_0/K$, where $N_0$ is the initial population size. Note that the nondimensional logistic equation (3.4) has no free parameters, while the dimensional form of the equation (3.1) contains $r$ and $K$. Reduction in the number of free parameters (here, two: $r$ and $K$) by the number of independent units (here, also two: time and population size) is a general feature of nondimensionalization. The theoretical result is known as the Buckingham Pi Theorem. Reducing the number of free parameters in a problem to the absolute minimum is especially important before proceeding to a numerical solution. The parameter space that must be explored may be substantially reduced.

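Solution of the nondimensional logistic equation (3.4) can proceed by separating variables. Separating, and integrating from $\tau = 0$ to $\tau$ and $\eta_0$ to $\eta$, yields

$$\int_{\eta_0}^{\eta} \frac{d\eta'}{\eta'(1 - \eta')} = \int_0^{\tau} d\tau'.$$

The integral on the left-hand side can be done using the method of partial fractions:

$$\frac{1}{\eta(1 - \eta)} = \frac{A}{\eta} + \frac{B}{1 - \eta} = \frac{A + (B - A)\eta}{\eta(1 - \eta)};$$

and equating the coefficients of the numerators proportional to $\eta^0$ and $\eta^1$, we find $A = 1$ and $B = 1$. Therefore,

$$\begin{aligned} \int_{\eta_0}^{\eta} \frac{d\eta'}{\eta'(1 - \eta')} &= \int_{\eta_0}^{\eta} \frac{d\eta'}{\eta'} + \int_{\eta_0}^{\eta} \frac{d\eta'}{1 - \eta'} \\ &= \ln\frac{\eta}{\eta_0} - \ln\frac{1 - \eta}{1 - \eta_0} \\ &= \ln\frac{\eta(1 - \eta_0)}{\eta_0(1 - \eta)} \\ &= \tau. \end{aligned}$$

Solving for $\eta$, we first exponentiate both sides and then isolate $\eta$:

$$\frac{\eta(1 - \eta_0)}{\eta_0(1 - \eta)} = e^{\tau},$$

or

$$\eta(1 - \eta_0) = \eta_0 e^{\tau} - \eta\eta_0 e^{\tau},$$

or

$$\eta(1 - \eta_0 + \eta_0 e^{\tau}) = \eta_0 e^{\tau},$$

or

$$\eta = \frac{\eta_0}{\eta_0 + (1 - \eta_0)e^{-\tau}}.$$

Returning to dimensional variables, we finally have

$$N(t) = \frac{N_0}{N_0/K + (1 - N_0/K)e^{-rt}}. \tag{3.5}$$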

There are many ways to write (3.5), so please examine my carefully considered choice. Aesthetic considerations in writing the final result are an important element of mathematical technique that students all too often neglect. In deciding how to write (3.5), I considered whether it was easy to observe the following limiting results: (1) $N(0) = N_0$; (2) $\lim_{t\to\infty} N(t) = K$; (3) $\lim_{K\to\infty} N(t) = N_0 \exp(rt)$.
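The three limiting results can be confirmed numerically, and (3.5) can be compared against a direct numerical integration of (3.1). The MATLAB sketch below does both; the values of r, K, and N0, the large values standing in for the limits, and the solver tolerances are assumptions made only for illustration.

```matlab
% Hypothetical parameter values, chosen only for illustration.
r = 0.8; K = 50; N0 = 5;
Nexact = @(t,K) N0 ./ (N0/K + (1 - N0/K).*exp(-r*t));   % Eq. (3.5)

lim1 = Nexact(0, K);          % limit (1): should equal N0
lim2 = Nexact(100/r, K);      % limit (2): large t, should be close to K
t1 = 2;
lim3 = Nexact(t1, 1e12);      % limit (3): very large K
malthus = N0*exp(r*t1);       % Malthusian growth, for comparison with lim3

% Independent check: integrate (3.1) with ode45 and compare with (3.5).
opts = odeset('RelTol',1e-8,'AbsTol',1e-10);
[t, N] = ode45(@(t,N) r*N.*(1 - N/K), [0 10], N0, opts);
maxerr = max(abs(N - Nexact(t, K)));
fprintf('N(0) = %g, N(large t) = %g, max ode45 error = %.2e\n', ...
        lim1, lim2, maxerr);
```

Writing the solution as a function handle of both t and K is a convenience for testing limit (3); it is not part of the derivation above.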

Figure 3.3: Solution of the nondimensional logistic equation. (Plot titled "Logistic Equation" of η versus τ for 0 ≤ τ ≤ 8.)

In Fig. 3.3, we plot the solution to the nondimensional logistic equation for initial conditions $\eta_0$ = 0.02, 0.2, 0.5, 0.8, 1.0, and 1.2. The lowest curve is the characteristic 'S-shape' usually associated with the solution of the logistic equation. This sigmoidal curve appears in many other types of models. The MATLAB script to produce Fig. 3.3 is shown below.

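    eta0=[0.02 .2 .5 .8 1 1.2];   % initial conditions
    tau=linspace(0,8);            % nondimensional time
    for i=1:length(eta0)
        % analytical solution of the nondimensional logistic equation (3.4)
        eta=eta0(i)./(eta0(i)+(1-eta0(i)).*exp(-tau));
        plot(tau,eta); hold on
    end
    axis([0 8 0 1.25]);
    xlabel('\tau'); ylabel('\eta'); title('Logistic Equation');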

3.2 A model of species competition

Suppose two species compete for the same resources. To build a model, we can start with logistic equations for both species. Different species would have different growth rates and different carrying capacities. If we let $N_1$ and $N_2$ be the number of individuals of species one and species two, then

$$\frac{dN_1}{dt} = r_1N_1(1 - N_1/K_1),$$

$$\frac{dN_2}{dt} = r_2N_2(1 - N_2/K_2).$$


These are uncoupled equations so that asymptotically, $N_1 \to K_1$ and $N_2 \to K_2$. How can we model the competition between species? If $N_1$ is much smaller than $K_1$, and $N_2$ much smaller than $K_2$, then resources are plentiful and populations grow exponentially with growth rates $r_1$ and $r_2$. If species one and two compete, then the growth of species one reduces resources available to species two, and vice-versa. Since we do not know the impact species one and two have on each other, we introduce two additional parameters to model the competition. A reasonable modification that couples the two logistic equations is

$$\frac{dN_1}{dt} = r_1N_1\left(1 - \frac{N_1 + \alpha_{12}N_2}{K_1}\right), \tag{3.6}$$

$$\frac{dN_2}{dt} = r_2N_2\left(1 - \frac{\alpha_{21}N_1 + N_2}{K_2}\right), \tag{3.7}$$

where $\alpha_{12}$ and $\alpha_{21}$ are dimensionless parameters that model the consumption of species one's resources by species two, and vice-versa. For example, suppose that both species eat exactly the same food, but species two consumes twice as much as species one. Since one individual of species two consumes the equivalent of two individuals of species one, the correct model is $\alpha_{12} = 2$ and $\alpha_{21} = 1/2$.

Another example supposes that species one and two occupy the same niche, consume resources at the same rate, but may have different growth rates and carrying capacities. Can the species coexist, or does one species eventually drive the other to extinction? It is possible to answer this question without actually solving the differential equations. With $\alpha_{12} = \alpha_{21} = 1$ as appropriate for this example, the coupled logistic equations (3.6) and (3.7) become

$$\frac{dN_1}{dt} = r_1N_1\left(1 - \frac{N_1 + N_2}{K_1}\right), \quad \frac{dN_2}{dt} = r_2N_2\left(1 - \frac{N_1 + N_2}{K_2}\right). \tag{3.8}$$

For the sake of argument, we assume $K_1 > K_2$. The only fixed points other than the trivial one $(N_1, N_2) = (0, 0)$ are $(N_1, N_2) = (K_1, 0)$ and $(N_1, N_2) = (0, K_2)$. Stability can be computed analytically by a two-dimensional Taylor series expansion, but here a simpler argument can suffice. We first consider $(N_1, N_2) = (K_1, \epsilon)$, with $\epsilon$ small. Since $K_1 > K_2$, the factor $1 - (K_1 + \epsilon)/K_2$ in (3.8) is negative, so that $\dot{N}_2 < 0$ and species two goes extinct. Therefore $(N_1, N_2) = (K_1, 0)$ is a stable fixed point. Now consider $(N_1, N_2) = (\epsilon, K_2)$, with $\epsilon$ small. Again, since $K_1 > K_2$, the factor $1 - (\epsilon + K_2)/K_1$ in (3.8) is positive, so that $\dot{N}_1 > 0$ and species one increases in number. Therefore $(N_1, N_2) = (0, K_2)$ is an unstable fixed point. We have thus found that, within our coupled logistic model, species that occupy the same niche and consume resources at the same rate cannot coexist, and that the species with the largest carrying capacity will survive and drive the other species extinct. This is the so-called principle of competitive exclusion, also called K-selection since the species with the largest carrying capacity wins. In fact, ecologists also talk about r-selection, in which the species with the largest growth rate wins. Our coupled logistic model does not model r-selection, demonstrating the potential limitations of a too simple mathematical model.

For some values of $\alpha_{12}$ and $\alpha_{21}$, our model admits a stable equilibrium solution where two species coexist. The calculation of the fixed points and their stability is more complicated than the calculation just done, and I only present the results. The stable coexistence of two species within our model is possible only if $\alpha_{12}K_2 < K_1$ and $\alpha_{21}K_1 < K_2$.
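Both regimes can be explored by integrating (3.6) and (3.7) numerically. The MATLAB sketch below runs the model twice: once with α12 = α21 = 1 (competitive exclusion) and once with α12 = α21 = 0.5, which satisfies the coexistence condition for the chosen carrying capacities. All parameter values, initial conditions, and the final time are assumptions made for illustration. The coexistence point is obtained here by setting both bracketed factors in (3.6) and (3.7) to zero and solving the resulting linear system, a step not carried out in the text.

```matlab
% Hypothetical parameter values, chosen only for illustration.
r1 = 1; r2 = 0.5; K1 = 100; K2 = 80;
% Right-hand side of Eqs. (3.6)-(3.7); N = [N1; N2].
rhs = @(N,a12,a21) [r1*N(1)*(1-(N(1)+a12*N(2))/K1); ...
                    r2*N(2)*(1-(a21*N(1)+N(2))/K2)];
opts = odeset('RelTol',1e-8,'AbsTol',1e-10);
Ninit = [10; 10];

% Case A: same niche (a12 = a21 = 1), with K1 > K2.
[~, NA] = ode45(@(t,N) rhs(N,1,1), [0 300], Ninit, opts);

% Case B: a12*K2 < K1 and a21*K1 < K2, so coexistence is expected.
a12 = 0.5; a21 = 0.5;
[~, NB] = ode45(@(t,N) rhs(N,a12,a21), [0 300], Ninit, opts);
% Coexistence point: N1 + a12*N2 = K1 and a21*N1 + N2 = K2.
Nco = [1 a12; a21 1] \ [K1; K2];

fprintf('Case A final state: (%.3f, %.3f)\n', NA(end,1), NA(end,2));
fprintf('Case B final state: (%.3f, %.3f), predicted (%.3f, %.3f)\n', ...
        NB(end,1), NB(end,2), Nco(1), Nco(2));
```

In Case A the solution should approach (K1, 0), and in Case B it should approach the coexistence point computed from the linear system.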


3.3 Lotka-Volterra predator-prey model

Suppose a predator population (e.g., foxes) feeds on a prey population (e.g., rabbits). We assume that the number of prey grows exponentially in the absence of predators (there is unlimited food available to the prey), and that the number of predators decays exponentially in the absence of prey (predators must eat prey or starve). Contact between predators and prey increases the number of predators and decreases the number of prey.

Let $U(t)$ and $V(t)$ be the number of prey and predators at time $t$. To develop a coupled differential equation model, we consider population sizes at a time $t + \Delta t$. Exponential growth of prey in the absence of predators and exponential decay of predators in the absence of prey can be modeled by the usual linear terms. The coupling between prey and predator must be modeled with two additional parameters. We write the population sizes at time $t + \Delta t$ as

$$\begin{aligned} U(t + \Delta t) &= U(t) + \alpha\Delta t\,U(t) - \gamma\Delta t\,U(t)V(t), \\ V(t + \Delta t) &= V(t) + e\gamma\Delta t\,U(t)V(t) - \beta\Delta t\,V(t). \end{aligned}$$

The parameters $\alpha$ and $\beta$ are the average per capita birthrate of prey and deathrate of predators, in the absence of the other species. The coupling terms model contact between predators and prey. The parameter $\gamma$ is the fraction of prey caught per predator per unit time; the total number of prey caught by predators in a time $\Delta t$ is $\gamma\Delta t\,UV$. The prey eaten is then converted into newborn predators (view this as a conversion of biomass), with conversion factor $e$, so that the number of predators in the time $\Delta t$ increases by $e\gamma\Delta t\,UV$.

We convert these equations into differential equations by letting $\Delta t \to 0$, obtaining the well-known Lotka-Volterra predator-prey equations:

$$\frac{dU}{dt} = \alpha U - \gamma UV, \quad \frac{dV}{dt} = e\gamma UV - \beta V. \tag{3.9}$$

Before analyzing the Lotka-Volterra equations, we first review fixed point and linear stability analysis applied to what is called an autonomous system of differential equations. For simplicity, we consider a system of only two differential equations of the form

$$\frac{dx}{dt} = f(x, y), \quad \frac{dy}{dt} = g(x, y), \tag{3.10}$$

though our results can be generalized to larger systems. The system given by (3.10) is said to be autonomous since $f$ and $g$ do not depend explicitly on the independent variable $t$. Fixed points of this system are determined by setting $\dot{x} = \dot{y} = 0$, and solving for $x$ and $y$. Suppose one fixed point is $(x_*, y_*)$. To determine its linear stability, we consider initial conditions for $(x, y)$ near the fixed point with small independent perturbations in both directions, i.e., $x(0) = x_* + \epsilon(0)$, $y(0) = y_* + \delta(0)$. If the initial perturbation grows in time we say the fixed point is unstable; if it decays we say the fixed point is stable. Accordingly, we let

$$x(t) = x_* + \epsilon(t), \quad y(t) = y_* + \delta(t), \tag{3.11}$$


and substitute (3.11) into (3.10) to determine the time-dependence of $\epsilon$ and $\delta$. Since $x_*$ and $y_*$ are constants, we have

$$\frac{d\epsilon}{dt} = f(x_* + \epsilon, y_* + \delta), \quad \frac{d\delta}{dt} = g(x_* + \epsilon, y_* + \delta).$$

Linear stability analysis proceeds by assuming that the initial perturbations $\epsilon(0)$ and $\delta(0)$ are small enough to truncate the two-dimensional Taylor series expansion of $f$ and $g$ about $\epsilon = \delta = 0$ at first order. Note that in general, the two-dimensional Taylor series of a function $F(x, y)$ about the origin is given by

$$F(x, y) = F(0, 0) + xF_x(0, 0) + yF_y(0, 0) + \frac{1}{2}\left[x^2F_{xx}(0, 0) + 2xyF_{xy}(0, 0) + y^2F_{yy}(0, 0)\right] + \dots,$$

where the terms in the expansion can be remembered by requiring that all of the partial derivatives of the series agree with those of $F(x, y)$ at the origin. We now Taylor series expand $f(x_* + \epsilon, y_* + \delta)$ and $g(x_* + \epsilon, y_* + \delta)$ about $(\epsilon, \delta) = (0, 0)$. The constant terms vanish since $(x_*, y_*)$ is a fixed point, and we neglect all terms of higher order than $\epsilon$ and $\delta$. Therefore,

$$\frac{d\epsilon}{dt} = \epsilon f_x(x_*, y_*) + \delta f_y(x_*, y_*), \quad \frac{d\delta}{dt} = \epsilon g_x(x_*, y_*) + \delta g_y(x_*, y_*),$$

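which may be written in matrix form as

$$\frac{d}{dt}\begin{pmatrix} \epsilon \\ \delta \end{pmatrix} = \begin{pmatrix} f_x^* & f_y^* \\ g_x^* & g_y^* \end{pmatrix}\begin{pmatrix} \epsilon \\ \delta \end{pmatrix}, \tag{3.12}$$

where $f_x^* = f_x(x_*, y_*)$, etc. Equation (3.12) is a system of linear ODEs, and solution proceeds by assuming the form

$$\begin{pmatrix} \epsilon \\ \delta \end{pmatrix} = e^{\lambda t}\mathbf{v}. \tag{3.13}$$

Upon substitution of (3.13) into (3.12), and cancelling $e^{\lambda t}$, we obtain the linear algebra eigenvalue problem

$$J^*\mathbf{v} = \lambda\mathbf{v}, \quad \text{with } J^* = \begin{pmatrix} f_x^* & f_y^* \\ g_x^* & g_y^* \end{pmatrix},$$

with $\lambda$ the eigenvalue, $\mathbf{v}$ the corresponding eigenvector, and $J^*$ the Jacobian matrix evaluated at the fixed point. The eigenvalue is determined from the characteristic equation

$$\det(J^* - \lambda I) = 0,$$

which for a two-by-two Jacobian matrix results in a quadratic equation for $\lambda$. From the form of the solution (3.13), the fixed point is stable if $\mathrm{Re}\{\lambda\} < 0$ for all eigenvalues $\lambda$, and unstable if $\mathrm{Re}\{\lambda\} > 0$ for at least one $\lambda$. Here $\mathrm{Re}\{\lambda\}$ means the real part of the (possibly) complex eigenvalue $\lambda$.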

We now reconsider the Lotka-Volterra equations (3.9). Fixed point solutions are found by solving $\dot{U} = \dot{V} = 0$, and the two solutions are

$$(U_*, V_*) = (0, 0) \quad \text{or} \quad \left(\frac{\beta}{e\gamma}, \frac{\alpha}{\gamma}\right). \tag{3.14}$$


The trivial fixed point $(0, 0)$ is unstable since the prey population grows exponentially if initially small. To determine the stability of the second fixed point, we write the Lotka-Volterra equations in the form

$$\frac{dU}{dt} = F(U, V), \quad \frac{dV}{dt} = G(U, V),$$

with

$$F(U, V) = \alpha U - \gamma UV, \quad G(U, V) = e\gamma UV - \beta V.$$

The partial derivatives are then computed to be

$$\begin{aligned} F_U &= \alpha - \gamma V, & F_V &= -\gamma U, \\ G_U &= e\gamma V, & G_V &= e\gamma U - \beta. \end{aligned}$$

The Jacobian at the fixed point $(U_*, V_*) = (\beta/e\gamma, \alpha/\gamma)$ is

$$J^* = \begin{pmatrix} 0 & -\beta/e \\ e\alpha & 0 \end{pmatrix};$$

and

$$\det(J^* - \lambda I) = \begin{vmatrix} -\lambda & -\beta/e \\ e\alpha & -\lambda \end{vmatrix} = \lambda^2 + \alpha\beta = 0$$

has the solutions $\lambda_\pm = \pm i\sqrt{\alpha\beta}$, which are pure imaginary. When the eigenvalues of the two-by-two Jacobian are pure imaginary, the fixed point is called a center and the perturbation neither grows nor decays, but oscillates. Here, the angular frequency of oscillation is $\omega = \sqrt{\alpha\beta}$, and the period of the oscillation is $2\pi/\omega$.
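The eigenvalue calculation can be checked numerically for any particular choice of parameters. The MATLAB sketch below builds the Jacobian from the partial derivatives given above, evaluates it at the nontrivial fixed point (3.14), and compares the eigenvalues returned by eig with the predicted angular frequency. The parameter values are assumptions chosen only for illustration.

```matlab
% Hypothetical parameter values, chosen only for illustration.
alpha = 1.2; beta = 0.3; gamma = 0.01; e = 0.2;
Ustar = beta/(e*gamma);          % nontrivial fixed point, Eq. (3.14)
Vstar = alpha/gamma;
% Jacobian [F_U, F_V; G_U, G_V] evaluated at (Ustar, Vstar).
J = [alpha-gamma*Vstar, -gamma*Ustar; ...
     e*gamma*Vstar,     e*gamma*Ustar-beta];
lambda = eig(J);                 % expected: +/- i*sqrt(alpha*beta)
omega  = sqrt(alpha*beta);       % predicted angular frequency
period = 2*pi/omega;             % predicted period of small oscillations
fprintf('Re(lambda) = %.2e, %.2e\n', real(lambda(1)), real(lambda(2)));
fprintf('|Im(lambda)| = %.4f, omega = %.4f, period = %.4f\n', ...
        abs(imag(lambda(1))), omega, period);
```

The real parts are zero only up to floating-point rounding, so a small tolerance is needed when testing for a center numerically.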

We plot $U$ and $V$ versus $t$ (time series plot), and $V$ versus $U$ (phase space diagram) to see how the solutions behave. For a nonlinear system of equations such as (3.9), a numerical solution is required. The Lotka-Volterra system has four free parameters $\alpha$, $\beta$, $\gamma$ and $e$. The relevant units here are time, number of prey, and number of predators, and the Buckingham Pi Theorem predicts that nondimensionalizing the equations can reduce the number of free parameters by three, to a manageable single nondimensional grouping of parameters. We choose to nondimensionalize time using the angular frequency of oscillation, and the number of prey and predators by their fixed point values. With carets denoting the nondimensional variables, we let

$$\hat{t} = \sqrt{\alpha\beta}\,t, \quad \hat{U} = U/U_* = \frac{e\gamma}{\beta}U, \quad \hat{V} = V/V_* = \frac{\gamma}{\alpha}V. \tag{3.15}$$

Substitution into the Lotka-Volterra equations (3.9) results in the nondimensional equations

$$\frac{d\hat{U}}{d\hat{t}} = r(\hat{U} - \hat{U}\hat{V}), \quad \frac{d\hat{V}}{d\hat{t}} = \frac{1}{r}(\hat{U}\hat{V} - \hat{V}),$$

with single nondimensional grouping $r = \sqrt{\alpha/\beta}$. A numerical solution uses MATLAB's ode45.m built-in function to integrate the differential equations. The code below produces Fig. 3.4. Notice how the predator population lags the prey population: an increase in prey numbers results in a delayed increase in predator numbers as the predators eat more prey. The phase space diagrams clearly show the periodicity of the oscillation. Note that the curves move counterclockwise: prey numbers increase when predator numbers are minimum, and prey numbers decrease when predator numbers are maximum.


    function lotka_volterra
    % plots time series and phase space diagrams
    clear all; close all;
    t0=0; tf=6*pi; eps=0.1; delta=0;   % initial perturbations from the fixed point (1,1)
    r=[1/2, 1, 2];                     % values of the nondimensional grouping r
    options = odeset('RelTol',1e-6,'AbsTol',[1e-6 1e-6]);
    %time series plots
    for i=1:length(r);
        [t,UV]=ode45(@lv_eq,[t0,tf],[1+eps 1+delta],options,r(i));
        U=UV(:,1); V=UV(:,2);
        subplot(3,1,i); plot(t,U,t,V,':');
        axis([0 6*pi,0.8 1.25]); ylabel('predator,prey');
        text(3,1.15,['r=',num2str(r(i))]);
    end
    xlabel('t');
    subplot(3,1,1);
    legend('prey','predator','Location','NorthOutside');
    %phase space plot
    xpos=[-4 -1 0]; ypos=[2 1 1];%for annotating graph
    for i=1:length(r);
        for eps=0.1:0.1:1.0;
            [t,UV]=ode45(@lv_eq,[t0,tf],[1+eps 1+delta],options,r(i));
            U=UV(:,1); V=UV(:,2);
            figure(2);subplot(3,1,i); plot(U,V); hold on;
        end
        ylabel('predator'); axis equal;
        text(xpos(i),ypos(i),['r=',num2str(r(i))]);
    end
    xlabel('prey')

    function dUV=lv_eq(t,UV,r)
    % right-hand side of the nondimensional Lotka-Volterra equations
    dUV=zeros(2,1);
    dUV(1) = r*(UV(1)-UV(1)*UV(2));
    dUV(2) = (1/r)*(UV(1)*UV(2)-UV(2));


Figure 3.4: Solution of the nondimensional Lotka-Volterra equations. (Time series of prey and predator versus t, and phase space diagrams of predator versus prey, each shown for r = 0.5, 1, and 2.)

Chapter 4

Infectious Disease Modeling

In the late 1320s, an outbreak of the bubonic plague occurred in China. The disease is caused by the bacterium Yersinia pestis and is transmitted from rats to humans by fleas. The outbreak in China spread west, and the first major outbreak in Europe occurred in 1347. During a five-year period, 25 million people in Europe died of the black death, approximately 1/3 of the population. Other more recent epidemics include the influenza pandemic known as the Spanish flu, which killed 50-100 million people worldwide during 1918-1919, and the present AIDS epidemic, originating in Africa and first recognized in the USA in 1981, which has killed more than 25 million people. For comparison, the SARS epidemic for which Hong Kong was the global epicenter resulted in 8096 known SARS cases and 774 deaths. Yet, we know well that this relatively small epidemic caused local social and economic turmoil.

Here, we introduce the most basic mathematical models of infectious disease epidemics and endemics. These models form the basis of the necessarily more detailed models currently used by world health organizations, both to predict the future spread of a disease, and to develop strategies for containment and eradication.

4.1 SI model

The simplest model of an infectious disease categorizes people as either susceptible or infective. One can imagine that susceptible people are healthy and infective people are sick. A susceptible person can become infective by contact with an infective. Here, and in all subsequent models, we assume that the population under study is well-mixed so that every person has equal probability of coming into contact with every other person. This is a major approximation. For example, while the population of Amoy Gardens could be considered well-mixed during the SARS epidemic because of shared water pipes and elevators, the population of Hong Kong as a whole could not because of the larger geographical distances, and the limited travel of many people outside the neighborhoods where they live.

We derive the governing differential equation for the SI model by considering the number of people that become infective in a time $\Delta t$. Let $\beta\Delta t$ be the probability that a random infective person infects a random susceptible person in a time $\Delta t$. Then with $S$ susceptible and $I$ infective people, the expected number of newly infected people in the total population in a time $\Delta t$ is $\beta\Delta t\,SI$. Thus

$$I(t + \Delta t) = I(t) + \beta\Delta t\,S(t)I(t),$$

and in the limit $\Delta t \to 0$,

$$\frac{dI}{dt} = \beta SI. \tag{4.1}$$

We diagram (4.1) as

$$S \xrightarrow{\ \beta SI\ } I.$$

Later, diagrams will make it easier to construct more complicated systems of equations. We now assume a constant population size $N$, neglecting births and deaths, so that $S + I = N$. We can eliminate $S$ from (4.1) and rewrite the equation as

$$\frac{dI}{dt} = \beta NI\left(1 - \frac{I}{N}\right),$$

which can be recognized as a logistic equation, with growth rate $\beta N$ and carrying capacity $N$. Therefore $I \to N$ as $t \to \infty$ and the entire population will become infective.

4.2 SIS model

The SI model may be extended to the SIS model, where an infective can recover and become susceptible again. We assume that the probability that an infective recovers in a time $\Delta t$ is given by $\gamma\Delta t$. Then the total number of infective people that recover in a time $\Delta t$ is given by $I \times \gamma\Delta t$, and

$$I(t + \Delta t) = I(t) + \beta\Delta t\,S(t)I(t) - \gamma\Delta t\,I(t),$$

or as $\Delta t \to 0$,

$$\frac{dI}{dt} = \beta SI - \gamma I, \tag{4.2}$$

which we diagram as

$$S \xrightarrow{\ \beta SI\ } I \xrightarrow{\ \gamma I\ } S.$$

Using $S + I = N$, we eliminate $S$ from (4.2) and define the basic reproductive ratio as

$$\mathcal{R}_0 = \frac{\beta N}{\gamma}. \tag{4.3}$$

Equation (4.2) may then be rewritten as

$$\frac{dI}{dt} = \gamma(\mathcal{R}_0 - 1)I\left(1 - \frac{I}{N(1 - 1/\mathcal{R}_0)}\right),$$

which is again a logistic equation, but now with growth rate $\gamma(\mathcal{R}_0 - 1)$ and carrying capacity $N(1 - 1/\mathcal{R}_0)$. The disease will disappear if the growth rate is negative, that is, $\mathcal{R}_0 < 1$, and will become endemic if the growth rate is positive, that is, $\mathcal{R}_0 > 1$. For an endemic disease with $\mathcal{R}_0 > 1$, the number of infected people approaches the carrying capacity: $I \to N(1 - 1/\mathcal{R}_0)$ as $t \to \infty$.
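The threshold at a basic reproductive ratio of one can be illustrated by integrating (4.2), with S = N − I, for one parameter set on each side of the threshold. In the MATLAB sketch below, the values of β, γ, and N, the initial numbers of infectives, and the final time are all assumptions made for illustration.

```matlab
% Hypothetical parameter values, chosen only for illustration.
beta = 0.002; N = 1000;
sis = @(I,gamma) beta*(N - I).*I - gamma*I;   % Eq. (4.2) with S = N - I
opts = odeset('RelTol',1e-8,'AbsTol',1e-8);

% Endemic case: R0 = beta*N/gamma = 4 > 1.
gammaA = 0.5;  R0A = beta*N/gammaA;
[tA, IA] = ode45(@(t,I) sis(I,gammaA), [0 30], 1, opts);
Iendemic = N*(1 - 1/R0A);                     % predicted endemic level

% Disease dies out: R0 = beta*N/gamma = 2/3 < 1.
gammaB = 3;  R0B = beta*N/gammaB;
[tB, IB] = ode45(@(t,I) sis(I,gammaB), [0 30], 10, opts);

fprintf('R0 = %.2f: I(end) = %.3f, predicted %.3f\n', R0A, IA(end), Iendemic);
fprintf('R0 = %.2f: I(end) = %.2e\n', R0B, IB(end));
```

Plotting IA against tA should show the same S-shaped curve as the logistic solution of Chapter 3, with the carrying capacity replaced by the endemic level.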


We can give a biological interpretation to the basic reproductive ratio $\mathcal{R}_0$. Let $l(t)$ be the probability that an individual initially infected at $t = 0$ is still infective at time $t$. Since the probability of being infective at time $t + \Delta t$ is equal to the probability of being infective at time $t$ times the probability of not recovering during the time $\Delta t$, we have

$$l(t + \Delta t) = l(t)(1 - \gamma\Delta t),$$

or as $\Delta t \to 0$,

$$\frac{dl}{dt} = -\gamma l.$$

With initial condition $l(0) = 1$,

$$l(t) = e^{-\gamma t}. \tag{4.4}$$

Now, the expected number of secondary infections produced by a single primary infective over the time period $(t, t + dt)$ is given by the probability that the primary infective is still infectious at time $t$, times the expected number of secondary infections produced by a single infective over the time $dt$, that is, $l(t) \times S(t)\beta\,dt$. We assume that the total number of secondary infections from a single infective individual is small relative to the population size $N$. Therefore, the expected number of secondary infectives produced by a single primary infective introduced into a completely susceptible population is

$$\begin{aligned} \int_0^\infty \beta l(t)S(t)\,dt &\approx \beta N\int_0^\infty l(t)\,dt \\ &= \beta N\int_0^\infty e^{-\gamma t}\,dt \\ &= \frac{\beta N}{\gamma} \\ &= \mathcal{R}_0, \end{aligned}$$

where we have approximated $S(t) \approx N$ over the time period in which the infective remains infectious. If a single infected individual introduced into a completely susceptible population produces more than one secondary infection before recovering, then $\mathcal{R}_0 > 1$ and the disease becomes endemic.

We have seen previously two other analogous calculations with results similar to (4.4): the first when computing the probability distribution of the interevent time between births (§1.4); the second when computing the probability distribution of worm lifetime (§2.5). We have also seen an analogous definition of the basic reproductive ratio in our previous discussion of age-structured populations (§2.3). There the basic reproductive ratio was the number of female offspring expected of a newborn female over her lifetime; the population size would grow if this value was greater than unity. Both $l(t)$ and the basic reproductive ratio are good examples of how the same mathematical concept can be applied to seemingly unrelated biological problems.

4.3 SIR epidemic disease model

The SIR model, first published by Kermack & McKendrick in 1927, is undoubtedly the most famous mathematical model for the spread of an infectious disease. Here, people are characterized into three classes: susceptible $S$, infective $I$ and removed $R$. Removed individuals are no longer susceptible nor infective for whatever reason; for example, they have recovered from the disease and are now immune, or they have been vaccinated, or they have been isolated from the rest of the population, or perhaps they have died from the disease. As in the SIS model, we assume that infectives leave the $I$ class with constant rate $\gamma$, but in the SIR model they move directly into the $R$ class. The model may be diagrammed as

$$S \xrightarrow{\ \beta SI\ } I \xrightarrow{\ \gamma I\ } R,$$

and the corresponding coupled differential equations are

$$\frac{dS}{dt} = -\beta SI, \quad \frac{dI}{dt} = \beta SI - \gamma I, \quad \frac{dR}{dt} = \gamma I, \tag{4.5}$$

with the constant population constraint $S + I + R = N$. For convenience, we nondimensionalize (4.5) using $N$ for population size and $\gamma^{-1}$ for time, that is, let

$$\hat{S} = S/N, \quad \hat{I} = I/N, \quad \hat{R} = R/N, \quad \hat{t} = \gamma t,$$

and define the basic reproductive ratio

$$\mathcal{R}_0 = \frac{\beta N}{\gamma}. \tag{4.6}$$

The nondimensional SIR equations are

$$\frac{d\hat{S}}{d\hat{t}} = -\mathcal{R}_0\hat{S}\hat{I}, \quad \frac{d\hat{I}}{d\hat{t}} = \mathcal{R}_0\hat{S}\hat{I} - \hat{I}, \quad \frac{d\hat{R}}{d\hat{t}} = \hat{I}, \tag{4.7}$$

with nondimensional constraint $\hat{S} + \hat{I} + \hat{R} = 1$.

We will use the SIR model to address two fundamental questions: (1) Under what condition does an epidemic occur? (2) If an epidemic occurs, what fraction of the population gets sick?

Let $(\hat{S}_*, \hat{I}_*, \hat{R}_*)$ be the fixed points of (4.7). Setting $d\hat{S}/d\hat{t} = d\hat{I}/d\hat{t} = d\hat{R}/d\hat{t} = 0$, we immediately observe from the equation for $d\hat{R}/d\hat{t}$ that $\hat{I} = 0$, and this value forces all the time-derivatives to vanish for any $\hat{S}$ and $\hat{R}$. Since with $\hat{I} = 0$ we have $\hat{R} = 1 - \hat{S}$, evidently all the fixed points of (4.7) are given by the one-parameter family $(\hat{S}_*, \hat{I}_*, \hat{R}_*) = (\hat{S}_*, 0, 1 - \hat{S}_*)$.

An epidemic occurs when a small number of infectives introduced into a susceptible population results in an increasing number of infectives. We can assume an initial population at a fixed point of (4.7), perturb this fixed point by introducing a small number of infectives, and determine the fixed point's stability. An epidemic occurs when the fixed point is unstable. The linear stability problem may be solved by considering only the equation for $d\hat{I}/d\hat{t}$ in (4.7). With $\hat{I} \ll 1$ and $\hat{S} \approx \hat{S}_0$, we have

$$\frac{d\hat{I}}{d\hat{t}} = \left(\mathcal{R}_0\hat{S}_0 - 1\right)\hat{I},$$

so that an epidemic occurs if $\mathcal{R}_0\hat{S}_0 - 1 > 0$. With the basic reproductive ratio given by (4.6), and $\hat{S}_0 = S_0/N$, where $S_0$ is the number of initial susceptible individuals, an epidemic occurs if

$$\mathcal{R}_0\hat{S}_0 = \frac{\beta S_0}{\gamma} > 1, \tag{4.8}$$

which could have been guessed. An epidemic occurs if an infective individual introduced into a population of $S_0$ susceptible individuals infects on average more than one other person.

Figure 4.1: Fraction of the population that gets sick in the SIR model as a function of the basic reproductive ratio $\mathcal{R}_0$. (Plot titled "fraction of population that get sick" of $\hat{R}_\infty$ versus $\mathcal{R}_0$ for $0 \le \mathcal{R}_0 \le 2$.)

We now address the second question: If an epidemic occurs, what fraction of the population gets sick? For simplicity, we assume that the entire initial population is susceptible to the disease, so that $\hat{S}_0 = 1$. We expect the solution of the governing equations (4.7) to approach a fixed point asymptotically in time (so that the final number of infectives will be zero), and we define this fixed point to be $(\hat{S}, \hat{I}, \hat{R}) = (1 - \hat{R}_\infty, 0, \hat{R}_\infty)$, with $\hat{R}_\infty$ equal to the fraction of the population that gets sick. To compute $\hat{R}_\infty$, it is simpler to work with a transformed version of (4.7). By the chain rule, $d\hat{S}/d\hat{t} = (d\hat{S}/d\hat{R})(d\hat{R}/d\hat{t})$, so that

$$\frac{d\hat{S}}{d\hat{R}} = \frac{d\hat{S}/d\hat{t}}{d\hat{R}/d\hat{t}} = -\mathcal{R}_0\hat{S},$$

which is separable. Separating and integrating from the initial to final conditions,

$$\int_1^{1 - \hat{R}_\infty} \frac{d\hat{S}}{\hat{S}} = -\mathcal{R}_0\int_0^{\hat{R}_\infty} d\hat{R},$$

which upon integration and simplification, results in the following transcendental equation for $\hat{R}_\infty$:

$$1 - \hat{R}_\infty - e^{-\mathcal{R}_0\hat{R}_\infty} = 0, \tag{4.9}$$

which can be solved using Newton's method. We have

$$F(\hat{R}_\infty) = 1 - \hat{R}_\infty - e^{-\mathcal{R}_0\hat{R}_\infty},$$

$$F'(\hat{R}_\infty) = -1 + \mathcal{R}_0e^{-\mathcal{R}_0\hat{R}_\infty},$$

and Newton's method for solving $F(\hat{R}_\infty) = 0$ iterates

$$\hat{R}_\infty^{(n+1)} = \hat{R}_\infty^{(n)} - \frac{F(\hat{R}_\infty^{(n)})}{F'(\hat{R}_\infty^{(n)})}$$

for fixed $\mathcal{R}_0$ and suitable initial condition for $\hat{R}_\infty^{(0)}$, which we take to be unity.

My code for computing $\hat{R}_\infty$ as a function of $\mathcal{R}_0$ is given below. The result is shown in Fig. 4.1. There is an explosion in the number of infections as $\mathcal{R}_0$ increases from unity, and this rapid increase is a classic example of what is known more generally as a threshold phenomenon.

    function [R0, R_inf] = sir_rinf
    % computes solution of R_inf using Newton's method from SIR model
    nmax=10; numpts=1000;
    R0 = linspace(0,2,numpts); R_inf = ones(1,numpts);   % initial guess is unity
    for i=1:nmax
        % one Newton step, applied to all values of R0 at once
        R_inf = R_inf - F(R_inf,R0)./Fp(R_inf,R0);
    end
    plot(R0,R_inf); axis([0 2 -0.02 0.8])
    xlabel('R_0'); ylabel('R_\infty');
    title('fraction of population that get sick')
    %subfunctions
    function y = F(R_inf,R0)
    y = 1 - R_inf - exp(-R0.*R_inf);
    function y = Fp(R_inf,R0)
    y = -1 + R0.*exp(-R0.*R_inf);
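As an independent check on (4.9), the nondimensional SIR equations (4.7) can be integrated directly and the final removed fraction compared with the Newton solution. The MATLAB sketch below does this for a single value of the basic reproductive ratio. The value R0 = 2, the small initial infective fraction I0, the final time, and the solver tolerances are assumptions made only for illustration; because I0 is not exactly zero, the two results agree only approximately.

```matlab
R0 = 2;          % hypothetical basic reproductive ratio, for illustration
I0 = 1e-6;       % small initial infective fraction (assumption)
% Right-hand side of Eq. (4.7), with y = [S; I; R] (nondimensional).
sir = @(t,y) [-R0*y(1)*y(2); R0*y(1)*y(2)-y(2); y(2)];
opts = odeset('RelTol',1e-10,'AbsTol',1e-12);
[t, y] = ode45(sir, [0 100], [1-I0; I0; 0], opts);
R_ode = y(end,3);                 % removed fraction at the final time

% Newton iteration on Eq. (4.9), started from unity as in the text.
Rn = 1;
for n = 1:20
    Rn = Rn - (1 - Rn - exp(-R0*Rn)) / (-1 + R0*exp(-R0*Rn));
end
fprintf('R_inf from ode45: %.6f, from Newton: %.6f\n', R_ode, Rn);
```

Plotting y(:,2) against t shows the epidemic curve itself: the infective fraction rises, peaks, and returns to zero.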

4.4 SIR endemic disease model

A disease that is constantly present in a population is said to be endemic. Endemic diseases prevail over long time-scales: babies are born, older people die. Let $b$ be the birth rate and $d$ the disease-unrelated death rate. We separately define $c$ to be the disease-related death rate; $R$ is now the immune class. We may diagram a SIR model of an endemic disease as

$$\xrightarrow{\ bN\ } S \xrightarrow{\ \beta SI\ } I \xrightarrow{\ \gamma I\ } R,$$

with individuals also leaving the $S$ class at rate $dS$, the $I$ class at rates $dI$ and $cI$, and the $R$ class at rate $dR$,


and the governing differential equations are

$$\frac{dS}{dt} = bN - \beta SI - dS, \quad \frac{dI}{dt} = \beta SI - (c + d + \gamma)I, \quad \frac{dR}{dt} = \gamma I - dR, \tag{4.10}$$

with $N = S + I + R$. In our endemic disease model, $N$ separately satisfies the differential equation

$$dN/dt = (b - d)N - cI, \tag{4.11}$$

and is not necessarily constant.

We model two simplified situations for which equilibrium solutions exist. First, we assume no disease-related deaths ($c = 0$) and equal birth and death rates ($b = d$); the population size $N$ will then be constant and the equations will admit an equilibrium solution. Second, we explore whether disease-related deaths can stabilize a disease-free growing population with $b > d$.

In our first model, with c = 0, b = d and S + I + R = N constant, thegoverning equations simplify to

dS

dt= d(I +R)− βSI, dI

dt= βSI − (γ + d)I,

dR

dt= γI − dR.

Fixed points are obtained from $\dot S = \dot I = \dot R = 0$. To solve, we first consider $\dot I = 0$. The solution is either $I_* = 0$ or $S_* = (\gamma + d)/\beta$. For $I_* = 0$, the equation $\dot R = 0$ yields $R_* = 0$, and the constraint $S + I + R = N$ yields $S_* = N$. Hence, one fixed point is $(S_*, I_*, R_*) = (N, 0, 0)$, corresponding to a disease-free population. The other fixed point corresponds to an endemic disease. We eliminate $R_*$ from the equation $\dot R = 0$ using $R_* = N - S_* - I_*$, and substitute $S_* = (\gamma + d)/\beta$ to obtain

$$\gamma I_* - d\left(N - \frac{\gamma + d}{\beta} - I_*\right) = 0,$$

which can be solved for $I_*$. We then determine $R_*$ using $R_* = (\gamma/d)I_*$. After some simple algebraic manipulation, the endemic disease equilibrium solution is determined to be

$$S_* = \frac{\gamma + d}{\beta}, \tag{4.12}$$

$$I_* = \frac{dN}{\gamma + d}\left(1 - \frac{\gamma + d}{\beta N}\right), \tag{4.13}$$

$$R_* = \frac{\gamma N}{\gamma + d}\left(1 - \frac{\gamma + d}{\beta N}\right). \tag{4.14}$$

Clearly, this solution only exists if $\beta N/(\gamma + d) \equiv R_0 > 1$, defining the basic reproductive ratio $R_0$ for this model. (Note that the probability that an infective leaves class $I$ in a time $\Delta t$ is given by $(\gamma + d)\Delta t$, since the infective can either recover, or die a natural death.) It can be shown that the disease-free state is unstable and the endemic disease state is stable when $R_0 > 1$.
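As a quick numerical check of (4.12)-(4.14), the following Python sketch evaluates the endemic equilibrium and confirms that the right-hand sides of the simplified equations vanish there. The function names and the parameter values are illustrative assumptions, not taken from the notes.

```python
def endemic_equilibrium(beta, gamma, d, N):
    """Endemic fixed point (4.12)-(4.14) for the model with c = 0 and b = d."""
    R0 = beta * N / (gamma + d)
    if R0 <= 1:
        return None  # only the disease-free state (N, 0, 0) exists
    S = (gamma + d) / beta
    I = d * N / (gamma + d) * (1 - 1 / R0)
    R = gamma * N / (gamma + d) * (1 - 1 / R0)
    return S, I, R

def rhs(S, I, R, beta, gamma, d):
    """Right-hand sides of the simplified equations for dS/dt, dI/dt, dR/dt."""
    return (d * (I + R) - beta * S * I,
            beta * S * I - (gamma + d) * I,
            gamma * I - d * R)

# Assumed illustrative parameter values (R0 is about 3.8 here)
beta, gamma, d, N = 0.002, 0.5, 0.02, 1000.0
S, I, R = endemic_equilibrium(beta, gamma, d, N)
print(S, I, R, rhs(S, I, R, beta, gamma, d))
```

With a smaller $\beta$ such that $R_0 \le 1$, the function returns `None`, reflecting that only the disease-free fixed point exists.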

In our second model, we consider the full system of equations (4.10)-(4.11), assume $b > d$, and look for an equilibrium solution with $N = N_*$ constant. From $\dot N = 0$, we have

$$I_* = \frac{b - d}{c} N_*;$$

from $\dot I = 0$, we have

$$S_* = \frac{\gamma + d + c}{\beta};$$

and from $\dot R = 0$, we have

$$R_* = \frac{\gamma (b - d)}{dc} N_*.$$

Using $N_* = S_* + I_* + R_*$, we obtain

$$N_* = \left(\frac{\gamma + d + c}{\beta}\right) \frac{1}{1 - \left(1 + \dfrac{\gamma}{d}\right)\left(\dfrac{b - d}{c}\right)}.$$

The condition $N_* > 0$ implies

$$\left(1 + \frac{\gamma}{d}\right)\left(\frac{b - d}{c}\right) < 1,$$

or

$$c > (b - d)\left(1 + \frac{\gamma}{d}\right). \tag{4.15}$$

When the disease-related death rate satisfies (4.15), then the population size, in equilibrium, is constant. To interpret (4.15), we use $\gamma/d = R_*/I_*$ and rewrite (4.15) as

$$\frac{c I_*}{I_* + R_*} > (b - d).$$

The extra factor $I_*/(I_* + R_*)$ must occur because only infectious but not recovered people can die from the disease.
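The equilibrium of the second model can be checked numerically in the same spirit. The sketch below, with hypothetical function names and assumed parameter values, tests condition (4.15), computes $(N_*, S_*, I_*, R_*)$, and verifies that the right-hand sides of (4.10) and (4.11) vanish.

```python
def growing_population_equilibrium(b, d, c, beta, gamma):
    """Equilibrium of (4.10)-(4.11) with b > d; None if (4.15) fails."""
    threshold = (b - d) * (1 + gamma / d)   # right-hand side of (4.15)
    if c <= threshold:
        return None
    S = (gamma + d + c) / beta
    N = S / (1 - (1 + gamma / d) * (b - d) / c)
    I = (b - d) / c * N
    R = gamma * (b - d) / (d * c) * N
    return N, S, I, R

# Assumed illustrative values: the threshold in (4.15) is 0.06 and c = 0.1 exceeds it
b, d, c, beta, gamma = 0.03, 0.02, 0.1, 0.001, 0.1
N, S, I, R = growing_population_equilibrium(b, d, c, beta, gamma)
residuals = (b * N - beta * S * I - d * S,        # dS/dt
             beta * S * I - (c + d + gamma) * I,  # dI/dt
             gamma * I - d * R,                   # dR/dt
             (b - d) * N - c * I)                 # dN/dt
print(N, S, I, R, residuals)
```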

4.5 Vaccination

Table 4.1 lists the diseases for which vaccines exist and are widely administered to children. Health care authorities must determine the fraction of a population that must be vaccinated to prevent epidemics.

We address this problem within the SIR epidemic disease model. Let $p$ be the fraction of the population that is vaccinated, and $p_*$ the minimum fraction required to prevent an epidemic. When $p > p_*$, an epidemic cannot occur. Since even non-vaccinated people are protected by the absence of epidemics, we say that the population has acquired herd immunity.

We assume that individuals are susceptible unless vaccinated, and vaccinated individuals are in the removed class. The initial population is then modeled as $(S, I, R) = (1 - p, 0, p)$. We have already determined the stability of this fixed point to perturbation by a small number of infectives. The condition for an epidemic to occur is given by (4.8), with $S_0 = 1 - p$. Therefore,

$$p_* = 1 - \frac{1}{R_0}.$$

Diseases with smaller values of $R_0$ are easier to eradicate than diseases with larger values of $R_0$ since a population can acquire herd immunity with a smaller fraction of the population vaccinated. For example, smallpox with $R_0 \approx 4$ has been eradicated throughout the world whereas measles with $R_0 \approx 17$ still has occasional outbreaks.
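To make the comparison concrete, a minimal Python sketch evaluates $p_* = 1 - 1/R_0$ for the two values of $R_0$ quoted above. The function name is hypothetical, and clipping at zero for $R_0 \le 1$ (where no vaccination is needed) is an added convention.

```python
def critical_vaccination_fraction(R0):
    """Minimum vaccinated fraction p* = 1 - 1/R0 (zero when R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / R0)

for name, R0 in [("smallpox", 4.0), ("measles", 17.0)]:
    print(f"{name}: R0 = {R0}, p* = {critical_vaccination_fraction(R0):.3f}")
```

For smallpox this gives a threshold of three quarters of the population, while for measles the threshold exceeds 94%.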


| Disease | Description | Symptoms | Complications |
|---|---|---|---|
| Diphtheria | A bacterial respiratory disease | Sore throat and low-grade fever | Airway obstruction, coma, and death |
| Haemophilus influenzae type b (Hib) | A bacterial infection occurring primarily in infants | Skin and throat infections, meningitis, pneumonia, sepsis, and arthritis | Death in one out of 20 children, and permanent brain damage in 10%-30% of the survivors |
| Hepatitis A | A viral liver disease | Potentially none; yellow skin or eyes, tiredness, stomach ache, loss of appetite, or nausea | Usually none |
| Hepatitis B | Same as Hepatitis A | Same as Hepatitis A | Life-long liver problems, such as scarring of the liver and liver cancer |
| Measles | A viral respiratory disease | Rash, high fever, cough, runny nose, and red, watery eyes | Diarrhea, ear infections, pneumonia, encephalitis, seizures, and death |
| Mumps | A viral lymph node disease | Fever, headache, muscle ache, and swelling of the lymph nodes close to the jaw | Meningitis, inflammation of the testicles or ovaries, inflammation of the pancreas and deafness |
| Pertussis (whooping cough) | A bacterial respiratory disease | Severe spasms of coughing | Pneumonia, encephalitis, and death, especially in infants |
| Pneumococcal disease | A bacterial disease | High fever, cough, and stabbing chest pains, bacteremia, and meningitis | Death |
| Polio | A viral lymphatic and nervous system disease | Fever, sore throat, nausea, headaches, stomach aches, stiffness in the neck, back, and legs | Paralysis that can lead to permanent disability and death |
| Rubella (German measles) | A viral respiratory disease | Rash and fever for two to three days | Birth defects if acquired by a pregnant woman |
| Tetanus (lockjaw) | A bacterial nervous system disease | Lockjaw, stiffness in the neck and abdomen, and difficulty swallowing | Death in one third of the cases, especially people over age 50 |
| Varicella (chickenpox) | A viral disease in the Herpes family | A skin rash of blister-like lesions | Bacterial infection of the skin, swelling of the brain, and pneumonia |
| Human papillomavirus | A viral skin and mucous membrane disease | Warts, cervical cancer | 5-year survival rate from all diagnoses of cervical cancer is 72% |

Table 4.1: Previously common diseases for which vaccines have been developed.


4.6 Evolution of virulence

Microorganisms continuously evolve due to selection pressures in their environment. Antibiotics are a common source of selection pressure on pathogenic bacteria, and the development of antibiotic resistant strains presents a major health challenge to medical science. Viruses also compete directly with each other for reproductive success, resulting in the evolution of virulence. Here, using the SIR endemic disease model, we study how virulence may evolve.

We assume that a population is initially in equilibrium with an endemic disease caused by a wildtype virus, say. Now, suppose a virus mutates by a random, undirected process that occurs naturally. We want to determine the conditions under which the mutant virus will replace the wildtype virus in the population. We can frame this biological question as a mathematical stability problem: Under what conditions does the original endemic disease equilibrium solution become unstable upon introduction of a mutant viral strain? We assume that the original wildtype virus has infection rate $\beta$, removal rate $\gamma$, and disease-related death rate $c$, and that the mutant virus has corresponding rates $\beta'$, $\gamma'$ and $c'$. We further assume that an individual infected with either wildtype or mutant virus gains immunity to subsequent infection from both wildtype and mutant viral forms. Our model thus has a single susceptible class $S$, two distinct infective classes $I$ and $I'$ depending on which virus causes the infection, and a single recovered class $R$. The appropriate diagram is

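[Diagram: births enter $S$ at rate $bN$; susceptibles move to $I$ at rate $\beta SI$ or to $I'$ at rate $\beta' SI'$; infectives move to $R$ at rates $\gamma I$ and $\gamma' I'$; deaths remove $dS$ from $S$, $(d + c)I$ from $I$, $(d + c')I'$ from $I'$, and $dR$ from $R$.]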

with corresponding differential equations

$$\frac{dS}{dt} = bN - dS - S(\beta I + \beta' I'),$$

$$\frac{dI}{dt} = \beta SI - (\gamma + d + c)I,$$

$$\frac{dI'}{dt} = \beta' SI' - (\gamma' + d + c')I',$$

$$\frac{dR}{dt} = \gamma I + \gamma' I' - dR.$$

If the population is in equilibrium with the wildtype virus, then $I' = 0$, $I \neq 0$, and the equilibrium value for $S$ is

$$S_* = \frac{\gamma + d + c}{\beta}. \tag{4.16}$$

We perturb this endemic disease equilibrium by introducing a small number of infectives carrying the mutated virus, that is, by letting $I'$ be small. Rather than solve the stability problem by means of a Jacobian analysis, we can directly examine the equation for $dI'/dt$. Here, with $S = S_*$ given by (4.16), we have

$$\frac{dI'}{dt} = \left[\frac{\beta'(\gamma + d + c)}{\beta} - (\gamma' + d + c')\right] I',$$

and $I'$ increases exponentially if

$$\frac{\beta'(\gamma + d + c)}{\beta} - (\gamma' + d + c') > 0,$$

or after some algebra:

$$\frac{\beta'}{\gamma' + d + c'} > \frac{\beta}{\gamma + d + c}. \tag{4.17}$$

Our result (4.17) suggests that endemic viruses (or other microorganisms) will tend to evolve (1) to be more easily transmitted between people ($\beta' > \beta$); (2) to make people sick longer ($\gamma' < \gamma$); and (3) to be less deadly ($c' < c$). In other words, viruses evolve to increase their basic reproductive ratio. For instance, viruses evolve to be less deadly because dead people do not spread disease (at least they do not in our model). Our result would not be applicable if dead people in fact did spread disease, a possibility if disposal of the dead was not done with sufficient care, perhaps because of certain cultural traditions such as family washing of the dead body.
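The invasion criterion can be illustrated with a short Python sketch that evaluates the initial per-capita growth rate of $I'$ at the wildtype equilibrium and compares its sign with the ratio form (4.17). The function name, argument names, and parameter values are assumptions made for illustration only.

```python
def mutant_growth_rate(beta, gamma, c, beta_m, gamma_m, c_m, d):
    """Per-capita growth rate of I' when S sits at the wildtype value (4.16)."""
    S_star = (gamma + d + c) / beta
    return beta_m * S_star - (gamma_m + d + c_m)

def satisfies_4_17(beta, gamma, c, beta_m, gamma_m, c_m, d):
    """Ratio form (4.17) of the same condition."""
    return beta_m / (gamma_m + d + c_m) > beta / (gamma + d + c)

# Assumed wildtype parameters
wild = dict(beta=0.002, gamma=0.5, c=0.1)
d = 0.02
# A less deadly mutant (c' < c) and a less transmissible mutant (beta' < beta)
less_deadly = dict(beta_m=0.002, gamma_m=0.5, c_m=0.05)
less_transmissible = dict(beta_m=0.0015, gamma_m=0.5, c_m=0.1)

for label, mutant in [("less deadly", less_deadly),
                      ("less transmissible", less_transmissible)]:
    rate = mutant_growth_rate(**wild, **mutant, d=d)
    print(label, rate, satisfies_4_17(**wild, **mutant, d=d))
```

In this example the less deadly mutant has a positive growth rate and invades, whereas the less transmissible mutant decays.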

Chapter 5

Population Genetics

Deoxyribonucleic acid, or DNA, is a large double-stranded, helical molecule with rungs made from pairs of the four bases adenine (A), cytosine (C), thymine (T) and guanine (G); it carries the inherited genetic information. The ordering of the bases A, C, T and G determines the DNA sequence. A gene is a particular DNA sequence that is the fundamental unit of heredity for a particular trait. Most sexual species are diploid, carrying two copies of every gene, one from each parent. Bacterial species are haploid with only one copy. When we say there is a gene for eye color, we mean there is a particular DNA sequence that may vary in a population, and that there are at least two subtypes, called alleles, where individuals with two copies of the blue allele have blue eyes, those with two copies of the brown allele, brown eyes. An individual with two copies of the same allele is homozygous for that particular gene (or a homozygote), while an individual carrying two different alleles is heterozygous (or a heterozygote). For the eye-color gene, an individual carrying both a blue and brown allele has brown eyes. We say that blue eyes is a recessive trait (or the blue-eye allele is recessive), brown eyes is a dominant trait (or the brown-eye allele is dominant). The combination of alleles carried by an individual is called his genotype, while the actual trait (blue or brown eyes) is called his phenotype. A gene that has more than one allele in a population is called polymorphic, and we say the population has a polymorphism for that particular gene.

Population genetics can be defined as the mathematical modeling of the evolution and maintenance of polymorphism in a population. Population genetics together with Charles Darwin’s theory of evolution by natural selection and Gregor Mendel’s theory of biological inheritance forms the modern evolutionary synthesis (sometimes called the modern synthesis, the evolutionary synthesis, neo-Darwinian synthesis or neo-Darwinism). The primary founders in the early twentieth century of population genetics were Sewall Wright, J. B. S. Haldane and Ronald Fisher.

Allele frequencies in a population can change due to the influence of four primary evolutionary forces: natural selection, genetic drift, mutation, and migration. Here, we mainly focus on natural selection and mutation. Genetic drift is the study of stochastic effects, and is important for small populations. Migration requires consideration of the spatial distribution of a population, and is usually modeled mathematically by partial differential equations.

The simplified models we will consider assume infinite population sizes (neglecting stochastic effects except in §5.5), well-mixed populations (neglecting spatial distribution), and discrete generations (neglecting age-structure). Our main purpose is to illustrate the fundamental ways that a genetic polymorphism can be maintained in a population.

| genotype | $A$ | $a$ |
|---|---|---|
| number | $n_A$ | $n_a$ |
| viability fitness | $g_A$ | $g_a$ |
| fertility fitness | $f_A$ | $f_a$ |

Table 5.1: Haploid genetics using population size and absolute viability and fertility fitnesses.

5.1 Haploid genetics

We first consider the modeling of selection in a population of haploid organisms. Selection is modeled by fitness coefficients, with different genotypes having different fitnesses. We begin with a simple model that counts the number of individuals in the next generation. We will demonstrate how this model can be reformulated in terms of allele frequencies and relative fitness coefficients.

Table 5.1 formulates the basic model. We assume two alleles $A$ and $a$ for a particular haploid gene. These alleles are carried in the population by $n_A$ and $n_a$ individuals, respectively. A fraction $g_A$ ($g_a$) of individuals carrying allele $A$ ($a$) is assumed to survive to reproduction age, and those that survive contribute $f_A$ ($f_a$) offspring to the next generation. These are of course average values, but under the assumption of an infinite population we model deterministically. Accordingly, with $n_A^{(i)}$ ($n_a^{(i)}$) representing the number of individuals carrying allele $A$ ($a$) in the $i$th generation, and formulating a discrete generation model, we have

$$n_A^{(i+1)} = f_A g_A n_A^{(i)}, \qquad n_a^{(i+1)} = f_a g_a n_a^{(i)}. \tag{5.1}$$

It is mathematically easier and more transparent to work with allele frequencies rather than individual numbers. We denote the frequency (or more accurately, proportion) of allele $A$ ($a$) in the $i$th generation by $p_i$ ($q_i$), that is

$$p_i = \frac{n_A^{(i)}}{n_A^{(i)} + n_a^{(i)}}, \qquad q_i = \frac{n_a^{(i)}}{n_A^{(i)} + n_a^{(i)}},$$

where evidently $p_i + q_i = 1$. Now from (5.1),

$$n_A^{(i+1)} + n_a^{(i+1)} = f_A g_A n_A^{(i)} + f_a g_a n_a^{(i)}, \tag{5.2}$$

so that dividing the first equation in (5.1) by (5.2) yields

$$p_{i+1} = \frac{f_A g_A n_A^{(i)}}{f_A g_A n_A^{(i)} + f_a g_a n_a^{(i)}} = \frac{f_A g_A p_i}{f_A g_A p_i + f_a g_a q_i} = \frac{p_i}{p_i + \left(\dfrac{f_a g_a}{f_A g_A}\right) q_i}, \tag{5.3}$$

where the second equality comes from dividing the numerator and denominator by $n_A^{(i)} + n_a^{(i)}$, and the third equality from dividing the numerator and denominator by $f_A g_A$. Similarly,

$$q_{i+1} = \frac{\left(\dfrac{f_a g_a}{f_A g_A}\right) q_i}{p_i + \left(\dfrac{f_a g_a}{f_A g_A}\right) q_i}, \tag{5.4}$$

which could also be derived using $q_{i+1} = 1 - p_{i+1}$. We observe from the evolution equations for the allele frequencies, (5.3) and (5.4), that only the relative fitness $f_a g_a / f_A g_A$ of the alleles matters. Accordingly, in our models we will only consider relative fitnesses, and arbitrarily set one fitness to unity to simplify the algebra and make the final result more transparent.

| genotype | $A$ | $a$ |
|---|---|---|
| freq. of gamete | $p$ | $q$ |
| relative fitness | $1 + s$ | $1$ |
| freq. after selection | $(1 + s)p/w$ | $q/w$ |
| mean fitness | $w = (1 + s)p + q$ | |

Table 5.2: Haploid genetic model of the spread of a favored allele.
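The equivalence between counting individuals with (5.1) and iterating the frequency equation (5.3) can be checked numerically. The Python sketch below uses hypothetical function names and assumed fitness values; it runs both descriptions for ten generations and compares the resulting frequency of $A$.

```python
def next_counts(nA, na, fA, gA, fa, ga):
    """One generation of the count model (5.1)."""
    return fA * gA * nA, fa * ga * na

def next_freq(p, rel):
    """One generation of (5.3); rel = (fa*ga)/(fA*gA) is the relative fitness."""
    return p / (p + rel * (1.0 - p))

# Assumed absolute fitnesses: A has viability 0.6, a has 0.5, both have fertility 2
fA, gA, fa, ga = 2.0, 0.6, 2.0, 0.5
nA, na = 100.0, 900.0
p = nA / (nA + na)
rel = (fa * ga) / (fA * gA)

for _ in range(10):
    nA, na = next_counts(nA, na, fA, gA, fa, ga)
    p = next_freq(p, rel)

p_from_counts = nA / (nA + na)
print(p_from_counts, p)
```

The two frequencies agree to rounding error even though the total population size changes every generation, which is why only the relative fitness matters.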

5.1.1 Spread of a favored allele

We consider a simple model for the spread of a favored allele in Table 5.2, with $s > 0$. Denoting by $p'$ the frequency of $A$ in the next generation (not the derivative of $p$!), the model equation is given by

$$p' = \frac{(1 + s)p}{w} \tag{5.5}$$

$$\phantom{p'} = \frac{(1 + s)p}{1 + sp}, \tag{5.6}$$

where we have used $(1 + s)p + q = 1 + sp$, since $p + q = 1$. Note that (5.6) is the same as (5.3) with $p' = p_{i+1}$, $p = p_i$, and $f_A g_A / f_a g_a = 1 + s$. Fixed points of (5.6) are determined from $p' = p$. We find two fixed points: $p_* = 0$, corresponding to a population in which allele $A$ is absent; and $p_* = 1$, corresponding to a population in which allele $A$ is fixed. Intuitively, $p_* = 0$ is unstable while $p_* = 1$ is stable.

To illustrate how a stability analysis is done analytically for a difference equation (instead of a differential equation), consider the general difference equation

$$p' = f(p). \tag{5.7}$$

With $p = p_*$ a fixed point so that $p_* = f(p_*)$, we write $p = p_* + \epsilon$, and (5.7) becomes

$$p_* + \epsilon' = f(p_* + \epsilon) = f(p_*) + \epsilon f'(p_*) + \ldots = p_* + \epsilon f'(p_*) + \ldots,$$

where $f'(p_*)$ denotes the derivative of $f$ evaluated at $p_*$. Therefore, to leading-order in $\epsilon$

$$|\epsilon'/\epsilon| = |f'(p_*)|,$$

and the fixed point is stable provided $|f'(p_*)| < 1$. For our haploid model,

$$f(p) = \frac{(1 + s)p}{1 + sp}, \qquad f'(p) = \frac{1 + s}{(1 + sp)^2},$$

so that $f'(p_* = 0) = 1 + s > 1$, and $f'(p_* = 1) = 1/(1 + s) < 1$, confirming that $p_* = 0$ is unstable and $p_* = 1$ is stable.

| genotype | $A$ | $a$ |
|---|---|---|
| freq. of gamete | $p$ | $q$ |
| relative fitness | $1$ | $1 - s$ |
| freq. after selection | $p/w$ | $(1 - s)q/w$ |
| freq. after mutation | $(1 - u)p/w$ | $[(1 - s)q + up]/w$ |
| mean fitness | $w = p + (1 - s)q$ | |

Table 5.3: Haploid genetic model of mutation-selection balance.

If the selection coefficient $s$ is small, the model equation (5.6) simplifies further. We have

$$p' = \frac{(1 + s)p}{1 + sp} = (1 + s)p\left(1 - sp + O(s^2)\right) = p + (p - p^2)s + O(s^2),$$

so that to leading-order in $s$,

$$p' - p = sp(1 - p).$$

If $p' - p \ll 1$, which is valid for $s \ll 1$, we can approximate this difference equation by the differential equation

$$\frac{dp}{dn} = sp(1 - p),$$

which shows that the frequency of allele $A$ satisfies the now very familiar logistic equation.
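The quality of the logistic approximation can be gauged by iterating (5.6) directly and comparing with the closed-form logistic solution $p(n) = p_0 e^{sn}/(1 - p_0 + p_0 e^{sn})$. This is a sketch only; the function names and the values $s = 0.01$, $p_0 = 0.01$ are assumptions.

```python
import math

def iterate_favored(p0, s, n):
    """Iterate the exact difference equation (5.6) for n generations."""
    p = p0
    for _ in range(n):
        p = (1 + s) * p / (1 + s * p)
    return p

def logistic(p0, s, n):
    """Solution of dp/dn = s p (1 - p) with p(0) = p0."""
    e = math.exp(s * n)
    return p0 * e / (1 - p0 + p0 * e)

s, p0 = 0.01, 0.01
for n in (0, 100, 500, 1000):
    print(n, iterate_favored(p0, s, n), logistic(p0, s, n))
```

For small $s$ the two curves stay close throughout the sweep of allele $A$ from rare to nearly fixed.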

Although a polymorphism for this gene exists in the population as the new allele spreads, eventually $A$ becomes fixed in the population and the polymorphism is lost. In the next section, we consider how a polymorphism can be maintained in a haploid population by the balance between mutation and selection.

5.1.2 Mutation-selection balance

We consider a gene with two alleles: a wildtype allele $A$ and a mutant allele $a$. We view the mutant allele as a defective genotype, which confers on the carrier a lowered fitness $1 - s$ relative to the wildtype. Although all mutant alleles may not have identical DNA sequence, we assume they share in common the same phenotype of reduced fitness. We model the opposing effects of two evolutionary forces: natural selection, which favors the wildtype allele $A$ over the mutant allele $a$, and mutation, which confers a small probability $u$ that allele $A$ mutates to allele $a$ in each newborn individual. Schematically,

$$A \;\underset{s}{\overset{u}{\rightleftarrows}}\; a$$

where $u$ represents mutation and $s$ represents selection. The model is shown in Table 5.3. The equations for $p$ and $q$ in the next generation are

$$p' = \frac{(1 - u)p}{w} = \frac{(1 - u)p}{1 - s(1 - p)}, \tag{5.8}$$

and

$$q' = \frac{(1 - s)q + up}{w} = \frac{(1 - s - u)q + u}{1 - sq},$$

where we have used $p + q = 1$ to eliminate $q$ from the equation for $p'$ and $p$ from the equation for $q'$. The equations for $p'$ and $q'$ are linearly dependent since $p' + q' = 1$, and we need solve only one of them.

Considering the equation for $p'$, the fixed points determined from $p' = p$ are $p_* = 0$, for which the mutant allele $a$ is fixed in the population and there is no polymorphism, and the solution to

$$1 - s(1 - p_*) = 1 - u,$$

which is $p_* = 1 - u/s$, and there is a polymorphism. The stability of these two fixed points is determined by considering $p' = f(p)$, with $f(p)$ given by the right-hand-side of (5.8). Taking the derivative of $f$:

$$f'(p) = \frac{(1 - u)(1 - s)}{[1 - s(1 - p)]^2},$$

so that

$$f'(p_* = 0) = \frac{1 - u}{1 - s}, \qquad f'(p_* = 1 - u/s) = \frac{1 - s}{1 - u}.$$

Applying the criterion $|f'(p_*)| < 1$ for stability, $p_* = 0$ is stable for $s < u$ and $p_* = 1 - u/s$ is stable for $s > u$. A polymorphism is therefore possible under mutation-selection balance when $s > u > 0$.
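The two stability regimes can be observed by iterating (5.8) numerically. In the Python sketch below, the function name and the values of $s$ and $u$ are assumed for illustration; the first run has $s > u$ and should approach $p_* = 1 - u/s$, while the second has $s < u$ and should approach $p_* = 0$.

```python
def iterate_mutation_selection(p0, s, u, n):
    """Iterate p' = (1 - u) p / (1 - s (1 - p)), equation (5.8), n times."""
    p = p0
    for _ in range(n):
        p = (1 - u) * p / (1 - s * (1 - p))
    return p

# s > u: polymorphic fixed point p* = 1 - u/s = 0.9
p_poly = iterate_mutation_selection(0.5, s=0.1, u=0.01, n=500)

# s < u: mutation overwhelms selection and allele A is lost
p_lost = iterate_mutation_selection(0.5, s=0.005, u=0.01, n=5000)

print(p_poly, p_lost)
```

The second run needs many more generations because $f'(0) = (1 - u)/(1 - s)$ is very close to one for these small rates.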

5.2 Diploid genetics

Most sexually reproducing species are diploid. In particular, our species Homo sapiens is diploid with two exceptions: we are haploid at the gamete stage (sperm and unfertilized egg); and males are haploid for most genes on the unmatched X and Y sex chromosomes (females are XX and diploid). This latter seemingly innocent fact is of great significance to males suffering from genetic diseases due to an X-linked recessive mutation inherited from their mother. Females inheriting this mutation are most probably disease-free because of the functional gene inherited from their father.

A polymorphic gene with alleles $A$ and $a$ can appear in a diploid gene as three distinct genotypes: $AA$, $Aa$ and $aa$. Conventionally, we denote $A$ to be the wildtype allele and $a$ the mutant allele. Table 5.4 presents the terminology of diploidy.

| genotype | $AA$ | $Aa$ | $aa$ |
|---|---|---|---|
| referred to as | wildtype homozygote | heterozygote | mutant homozygote |
| frequency | $P$ | $Q$ | $R$ |

Table 5.4: Terminology of diploidy.

As for haploid genetics, we will determine evolution equations for allele and/or genotype frequencies. To develop the appropriate definitions and relations, we initially assume a population of size $N$ (which we will later take to be infinite), and assume that the number of individuals with genotypes $AA$, $Aa$ and $aa$ are $N_{AA}$, $N_{Aa}$ and $N_{aa}$. Now, $N = N_{AA} + N_{Aa} + N_{aa}$. Define genotype frequencies $P$, $Q$ and $R$ by

$$P = \frac{N_{AA}}{N}, \qquad Q = \frac{N_{Aa}}{N}, \qquad R = \frac{N_{aa}}{N},$$

so that $P + Q + R = 1$. It will also be useful to define allele frequencies. Let $n_A$ and $n_a$ be the number of alleles $A$ and $a$ in the population, with $n = n_A + n_a$ the total number of alleles. Since the population is of size $N$ and diploid, $n = 2N$; and since each homozygote contains two identical alleles, and each heterozygote contains one of each allele, $n_A = 2N_{AA} + N_{Aa}$ and $n_a = 2N_{aa} + N_{Aa}$. Defining the allele frequencies $p$ and $q$ as previously,

$$p = \frac{n_A}{n} = \frac{2N_{AA} + N_{Aa}}{2N} = P + \frac{1}{2}Q;$$

and similarly,

$$q = \frac{n_a}{n} = \frac{2N_{aa} + N_{Aa}}{2N} = R + \frac{1}{2}Q.$$

With five frequencies, $P$, $Q$, $R$, $p$, $q$, and four constraints $P + Q + R = 1$, $p + q = 1$, $p = P + Q/2$, $q = R + Q/2$, how many independent frequencies are there? In fact, there are two because one of the four constraints is linearly dependent on the others. We may choose any two frequencies other than the choice $\{p, q\}$ as our linearly independent set. For instance, one choice is $\{P, p\}$; then

$$q = 1 - p, \qquad Q = 2(p - P), \qquad R = 1 + P - 2p.$$

Similarly, another choice is $\{P, Q\}$; then

$$R = 1 - P - Q, \qquad p = P + \frac{1}{2}Q, \qquad q = 1 - P - \frac{1}{2}Q.$$
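These definitions translate directly into a few lines of Python. The sketch below, with a hypothetical function name and made-up genotype counts, computes all five frequencies from the counts and confirms that the remaining three follow from the pair $\{P, p\}$.

```python
def frequencies(N_AA, N_Aa, N_aa):
    """Genotype frequencies (P, Q, R) and allele frequencies (p, q) from counts."""
    N = N_AA + N_Aa + N_aa
    P, Q, R = N_AA / N, N_Aa / N, N_aa / N
    p = (2 * N_AA + N_Aa) / (2 * N)   # each AA carries two A alleles, each Aa one
    q = (2 * N_aa + N_Aa) / (2 * N)
    return P, Q, R, p, q

# Assumed counts for illustration
P, Q, R, p, q = frequencies(50, 30, 20)
print(P, Q, R, p, q)

# Recover q, Q, R from the independent pair {P, p}
print(1 - p, 2 * (p - P), 1 + P - 2 * p)
```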

5.2.1 Sexual reproduction

Diploid reproduction may be sexual or asexual, and sexual reproduction may be of varying types (e.g., random mating, selfing, brother-sister mating, and various other types of assortative mating). The two simplest types to model exactly are random mating and selfing. These mating systems are useful to explore the biology of both outbreeding and inbreeding.

Random mating

The simplest type of sexual reproduction to model mathematically is random mating. Here, we assume a well-mixed population of individuals that have equal probability of mating with every other individual. We will determine the genotype frequencies of the zygotes (fertilized eggs) in terms of the allele frequencies using two approaches: (1) the gene pool approach; and (2) the mating table approach.

The gene pool approach models sexual reproduction by assuming that males and females release their gametes into pools. Offspring genotypes are determined by randomly combining one gamete from the male pool and one gamete from the female pool. Since the probability of a random gamete containing allele $A$ or $a$ is equal to the allele’s population frequency $p$ or $q$, respectively, the probability of an offspring being $AA$ is $p^2$, of being $Aa$ is $2pq$ (male $A$ female $a$ + female $A$ male $a$), and of being $aa$ is $q^2$. Therefore, after a single generation of random mating, the genotype frequencies can be given in terms of the allele frequencies by

$$P = p^2, \qquad Q = 2pq, \qquad R = q^2.$$

Notice that under the assumption of random mating, there is now only a single independent frequency, greatly simplifying the mathematical modeling. For example, if $p$ is taken as the independent frequency, then

$$q = 1 - p, \qquad P = p^2, \qquad Q = 2p(1 - p), \qquad R = (1 - p)^2.$$

Most modeling is done assuming random mating unless the biology under study is influenced by inbreeding.

The second approach uses a mating table (see Table 5.5). This approach to modeling sexual reproduction is more general and can be applied to other mating systems. We explain this approach by considering the mating $AA \times Aa$. The genotypes $AA$ and $Aa$ have frequencies $P$ and $Q$, respectively. The frequency of $AA$ males mating with $Aa$ females is $PQ$ and is the same as $AA$ females mating with $Aa$ males, so the sum is $2PQ$. Half of the offspring will be $AA$ and half $Aa$, and the frequencies $PQ$ are entered under progeny frequency. The sum of each column of progeny frequencies is given in the Totals row, and use of the relationship between the genotype and allele frequencies recovers the random mating results.


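| mating | frequency | progeny $AA$ | progeny $Aa$ | progeny $aa$ |
|---|---|---|---|---|
| $AA \times AA$ | $P^2$ | $P^2$ | $0$ | $0$ |
| $AA \times Aa$ | $2PQ$ | $PQ$ | $PQ$ | $0$ |
| $AA \times aa$ | $2PR$ | $0$ | $2PR$ | $0$ |
| $Aa \times Aa$ | $Q^2$ | $\frac{1}{4}Q^2$ | $\frac{1}{2}Q^2$ | $\frac{1}{4}Q^2$ |
| $Aa \times aa$ | $2QR$ | $0$ | $QR$ | $QR$ |
| $aa \times aa$ | $R^2$ | $0$ | $0$ | $R^2$ |
| Totals | $(P + Q + R)^2 = 1$ | $(P + \frac{1}{2}Q)^2 = p^2$ | $2(P + \frac{1}{2}Q)(R + \frac{1}{2}Q) = 2pq$ | $(R + \frac{1}{2}Q)^2 = q^2$ |

Table 5.5: Random mating table.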
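The Totals row of Table 5.5 can be reproduced by brute-force enumeration. The following Python sketch, in which the names `GAMETES` and `random_mating` and the starting frequencies are illustrative assumptions, loops over all ordered parent pairs and all gamete combinations and accumulates the progeny frequencies.

```python
from itertools import product

# Gamete types produced by each genotype, with Mendelian probabilities
GAMETES = {"AA": {"A": 1.0}, "Aa": {"A": 0.5, "a": 0.5}, "aa": {"a": 1.0}}

def random_mating(P, Q, R):
    """Progeny genotype frequencies after one round of random mating."""
    freq = {"AA": P, "Aa": Q, "aa": R}
    progeny = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for mother, father in product(freq, repeat=2):
        pair = freq[mother] * freq[father]           # frequency of this ordered mating
        for (g1, w1), (g2, w2) in product(GAMETES[mother].items(),
                                          GAMETES[father].items()):
            child = "".join(sorted(g1 + g2))         # "aA" and "Aa" both become "Aa"
            progeny[child] += pair * w1 * w2
    return progeny

P, Q, R = 0.5, 0.3, 0.2          # assumed starting genotype frequencies
p, q = P + Q / 2, R + Q / 2
result = random_mating(P, Q, R)
print(result, (p**2, 2 * p * q, q**2))
```

Because ordered pairs are enumerated, the factor of two in rows such as $AA \times Aa$ arises automatically from the two orderings.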

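| mating | frequency | progeny $AA$ | progeny $Aa$ | progeny $aa$ |
|---|---|---|---|---|
| $AA\,\otimes$ | $P$ | $P$ | $0$ | $0$ |
| $Aa\,\otimes$ | $Q$ | $\frac{1}{4}Q$ | $\frac{1}{2}Q$ | $\frac{1}{4}Q$ |
| $aa\,\otimes$ | $R$ | $0$ | $0$ | $R$ |
| Totals | $1$ | $P + \frac{1}{4}Q$ | $\frac{1}{2}Q$ | $R + \frac{1}{4}Q$ |

Table 5.6: Selfing mating table.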

Selfing

Perhaps the next simplest type of mating system is self-fertilization, or selfing. Here, an individual reproduces sexually (passing through a haploid gamete stage in its life-cycle), but provides both of the gametes. For example, the nematode worm C. elegans can reproduce by selfing. The mating table for selfing is given in Table 5.6. The selfing frequency of a particular genotype is just the frequency of the genotype itself. For a selfing population, disregarding selection or any other evolutionary forces, the genotype frequencies evolve as

$$P' = P + \frac{1}{4}Q, \qquad Q' = \frac{1}{2}Q, \qquad R' = R + \frac{1}{4}Q. \tag{5.9}$$

We can solve (5.9) with initial conditions $Q_0 = 1$ and $P_0 = R_0 = 0$, that is, an initially heterozygous population. In the worm lab, this type of initial population is commonly created by crossing wildtype homozygous C. elegans males with mutant homozygous C. elegans hermaphrodites, where the mutant allele is recessive. Wildtype hermaphrodite offspring, which are necessarily heterozygous, are then picked to separate worm plates and allowed to self-fertilize. (Do you see why the experiment is not done with wildtype hermaphrodites and mutant males?) From the equation for $Q'$ in (5.9), we have $Q_n = (1/2)^n$, and from symmetry, $P_n = R_n$. Then since $P_n + Q_n + R_n = 1$, we obtain the complete solution

$$P_n = \frac{1}{2}\left(1 - \left(\frac{1}{2}\right)^n\right), \qquad Q_n = \left(\frac{1}{2}\right)^n, \qquad R_n = \frac{1}{2}\left(1 - \left(\frac{1}{2}\right)^n\right).$$

The main result to be emphasized here is that the heterozygosity of the population decreases by a factor of two each generation. Selfing populations rapidly become homozygous. For homework, I will ask you to determine the genotype frequencies for an initially random mating population that transitions to selfing.
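The closed-form selfing solution can be confirmed by iterating (5.9) exactly with rational arithmetic. This Python sketch uses hypothetical function names and follows an initially heterozygous population for a chosen number of generations.

```python
from fractions import Fraction

def selfing_step(P, Q, R):
    """One generation of selfing, equation (5.9)."""
    return P + Q / 4, Q / 2, R + Q / 4

def selfing(n, P=Fraction(0), Q=Fraction(1), R=Fraction(0)):
    """Genotype frequencies after n generations of selfing."""
    for _ in range(n):
        P, Q, R = selfing_step(P, Q, R)
    return P, Q, R

for n in range(6):
    P_n, Q_n, R_n = selfing(n)
    closed = Fraction(1, 2) * (1 - Fraction(1, 2) ** n)
    print(n, P_n, Q_n, R_n, closed)
```

Using `Fraction` keeps every frequency exact, so the iterated values can be compared with the formula without rounding error.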

Constancy of allele frequencies

Mating by itself does not change allele frequencies in a population, but only reshuffles alleles into different genotypes. We can show this directly for random mating and selfing. For random mating,

$$p' = P' + \frac{1}{2}Q' = p^2 + \frac{1}{2}(2pq) = p(p + q) = p;$$

and for selfing,

$$p' = P' + \frac{1}{2}Q' = \left(P + \frac{1}{4}Q\right) + \frac{1}{2}\left(\frac{1}{2}Q\right) = P + \frac{1}{2}Q = p.$$

The conservation of allele frequency by mating is an important element of neo-Darwinism. In Darwin’s time, most biologists believed in blending inheritance, where the genetic material from parents with different traits actually blended in their offspring, rather like the mixing of paints with different color. If blending inheritance occurred, then genetic variation, or polymorphism, would eventually be lost over several generations as the “genetic paints” became well-mixed. Mendel’s work on peas, published in 1866, suggested a particulate theory of inheritance. Sadly, Mendel’s paper was not read by Darwin (who published The Origin of Species in 1859 and died in 1882) or other influential biologists during Mendel’s lifetime (Mendel died in 1884). After being rediscovered in 1900, Mendel’s work eventually became widely celebrated.

5.2.2 Spread of a favored allele

We consider the spread of a favored allele in a diploid population. The classic example, widely repeated in biology textbooks as a modern example of natural selection, is the change in the frequencies of the dark and light phenotypes of the peppered moth during England’s industrial revolution. The evolutionary story begins with the observation that pollution killed the light colored lichen on trees during industrialization of the cities. Light colored peppered moths camouflage well on light colored lichens, but are exposed to birds on plain tree bark. On the other hand, dark colored peppered moths camouflage well on plain tree bark, but are exposed on light colored lichens (see Fig. 5.1). Natural selection therefore favored the light-colored allele in preindustrialized England and the dark-colored allele during industrialization. The dark-colored allele, presumably kept at low frequency by mutation-selection balance in pre-industrialized England, increased rapidly under natural selection in industrializing England.

Figure 5.1: Evolution of peppered moths in industrializing England.

We present our model in Table 5.7. Here, we consider $aa$ as the wildtype genotype and normalize its fitness to unity. The allele $A$ is the mutant whose frequency increases in the population. In our example of the peppered moth, the $aa$ phenotype is light colored and the $AA$ phenotype is dark colored. The color of the $Aa$ phenotype depends on the relative dominance of $A$ and $a$. Usually, no pigment results in light color, and is a consequence of nonfunctioning pigment-producing genes. One functioning pigment-producing allele is usually sufficient to result in a dark-colored moth. With $A$ a functioning pigment-producing allele and $a$ the mutated nonfunctioning allele, $a$ is most likely recessive, $A$ is most likely dominant, and the phenotype of $Aa$ is most likely dark, so $h \approx 1$. For the moment, though, we leave $h$ as a free parameter.

We assume random mating, and this simplification is used to write the genotype frequencies as $P = p^2$, $Q = 2pq$, and $R = q^2$. Since $q = 1 - p$, we reduce our problem to determining an equation for $p'$ in terms of $p$. Using $p' = P_s + (1/2)Q_s$, where $p'$ is the $A$ allele frequency in the next generation’s zygotes, and $P_s$ and $Q_s$ are the $AA$ and $Aa$ genotype frequencies in the present generation after selection,

$$p' = \frac{(1 + s)p^2 + (1 + sh)pq}{w},$$

with $q = 1 - p$, and

$$w = (1 + s)p^2 + 2(1 + sh)pq + q^2 = 1 + s(p^2 + 2hpq).$$

After some algebra, the final evolution equation written solely in terms of $p$ is

$$p' = \frac{(1 + sh)p + s(1 - h)p^2}{1 + 2shp + s(1 - 2h)p^2}. \tag{5.10}$$

The expected fixed points of this equation are $p_* = 0$ (unstable) and $p_* = 1$ (stable), where our assignment of stability assumes positive selection coefficients.


| genotype | $AA$ | $Aa$ | $aa$ |
|---|---|---|---|
| freq. of zygote | $p^2$ | $2pq$ | $q^2$ |
| relative fitness | $1 + s$ | $1 + sh$ | $1$ |
| freq. after selection | $(1 + s)p^2/w$ | $2(1 + sh)pq/w$ | $q^2/w$ |
| mean fitness | $w = (1 + s)p^2 + 2(1 + sh)pq + q^2$ | | |

Table 5.7: Diploid genetic model of the spread of a favored allele assuming random mating.

The evolution equation (5.10) in this form is not particularly illuminating. In general, numerical solution requires specifying numerical values for $s$ and $h$, as well as an initial value for $p$. Here, we investigate analytically the increase of a favored allele $A$ assuming the selection coefficient $s \ll 1$, to determine how the spread of $A$ depends on the dominance coefficient $h$. We Taylor-series expand the right-hand-side of (5.10) in powers of $s$, keeping terms to order $s$:

$$\begin{aligned}
p' &= \frac{(1 + sh)p + s(1 - h)p^2}{1 + 2shp + s(1 - 2h)p^2} \\
&= \frac{p + s\left(hp + (1 - h)p^2\right)}{1 + s\left(2hp + (1 - 2h)p^2\right)} \\
&= \left[p + s\left(hp + (1 - h)p^2\right)\right]\left[1 - s\left(2hp + (1 - 2h)p^2\right) + O(s^2)\right] \\
&= p + sp\left(h + (1 - 3h)p - (1 - 2h)p^2\right) + O(s^2).
\end{aligned} \tag{5.11}$$

If $s \ll 1$, we expect a small change in allele frequency each generation, so we can approximate $p' - p \approx dp/dn$, where $n$ denotes the generation number, and $p = p(n)$. The approximate differential equation obtained from (5.11) is

$$\frac{dp}{dn} = sp\left(h + (1 - 3h)p - (1 - 2h)p^2\right). \tag{5.12}$$

If $A$ is partially dominant so that $h \neq 0$ (e.g., the heterozygous moth is darker than the light-colored $aa$ homozygote), then the solution to (5.12) behaves similarly to the solution of a logistic equation: $p$ initially grows exponentially as $p(n) = p_0 \exp(shn)$, and asymptotes to one for large $n$. If $A$ is recessive so that $h = 0$ (e.g., the heterozygous moth is as light-colored as the $aa$ homozygote), then (5.12) reduces to

$$\frac{dp}{dn} = sp^2(1 - p), \quad \text{for } h = 0. \tag{5.13}$$

Of main interest is the initial growth of $p$ when $p(0) = p_0 \ll 1$, so that $dp/dn \approx sp^2$. This differential equation may be integrated by separating variables to yield

$$p(n) = \frac{p_0}{1 - sp_0 n} \approx p_0(1 + sp_0 n),$$

where the approximation holds for $sp_0 n \ll 1$. The frequency of a recessive favored allele increases only linearly across generations, a consequence of the heterozygote being hidden from natural selection. Most likely the peppered-moth heterozygote is significantly darker than the light-colored homozygote since the dark colored moth rapidly increased in frequency over a short period of time.

| disease | mutation | symptoms |
|---|---|---|
| Thalassemia | haemoglobin | anemia |
| Sickle cell anemia | haemoglobin | anemia |
| Haemophilia | blood clotting factor | uncontrolled bleeding |
| Cystic Fibrosis | chloride ion channel | thick lung mucous |
| Tay-Sachs disease | Hexosaminidase A enzyme | nerve cell damage |
| Fragile X syndrome | FMR1 gene | mental retardation |
| Huntington’s disease | HD gene | brain degeneration |

Table 5.8: Seven common monogenic diseases.

| genotype | $AA$ | $Aa$ | $aa$ |
|---|---|---|---|
| freq. of zygote | $p^2$ | $2pq$ | $q^2$ |
| relative fitness | $1$ | $1 - sh$ | $1 - s$ |
| freq. after selection | $p^2/w$ | $2(1 - sh)pq/w$ | $(1 - s)q^2/w$ |
| mean fitness | $w = p^2 + 2(1 - sh)pq + (1 - s)q^2$ | | |

Table 5.9: Diploid genetic model of mutation-selection balance assuming random mating.

As a final comment, linear growth in the frequency of A when h = 0 issensitive to our assumption of random mating. If selfing occurred, or anothertype of close family mating, then a recessive favored allele may still increaseexponentially. In this circumstance, the production of homozygous offspringfrom more frequent heterozygote pairings allows selection to act more effectively.
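The contrast between a partially dominant and a recessive favored allele can be checked numerically. The following Python sketch iterates the exact recursion whose right-hand side is the first line of (5.11), and counts the generations needed for $p$ to reach a target frequency. The function names and the parameter values ($s = 0.01$, $p_0 = 0.001$, target $1/2$) are illustrative assumptions, not values from these notes.

```python
def next_p(p, s, h):
    """One generation of selection: the exact recursion expanded in (5.11)."""
    numerator = (1 + s * h) * p + s * (1 - h) * p * p
    denominator = 1 + 2 * s * h * p + s * (1 - 2 * h) * p * p
    return numerator / denominator


def generations_to_reach(target, p0, s, h, max_generations=10**7):
    """Count generations until p first reaches target (illustrative helper)."""
    p, n = p0, 0
    while p < target and n < max_generations:
        p = next_p(p, s, h)
        n += 1
    return n


def recessive_early_approx(n, p0, s):
    """Early-time solution p0 / (1 - s p0 n) of dp/dn = s p^2 (case h = 0)."""
    return p0 / (1 - s * p0 * n)


if __name__ == "__main__":
    s, p0 = 0.01, 0.001  # assumed values, chosen only for illustration
    print("h = 0.5:", generations_to_reach(0.5, p0, s, 0.5), "generations")
    print("h = 0  :", generations_to_reach(0.5, p0, s, 0.0), "generations")
```

With these assumed values, the recessive allele should need far more generations to reach $p = 1/2$ than the partially dominant one, in line with the slow initial growth derived above.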

5.2.3 Mutation-selection balance

By virtue of self-knowledge, the species with the most known mutant phenotypes is Homo sapiens. There are thousands of known genetic diseases in humans, many of them caused by mutation of a single gene (denoted as a monogenic disease). For an easy-to-read overview of genetic disease in humans, see the website

http://www.who.int/genomics/public/geneticdiseases.

Table 5.8 lists seven common monogenic diseases. The first two diseases are maintained at significant frequencies in some human populations by heterosis. We will discuss in §5.2.4 the maintenance of a polymorphism by heterosis, for which the heterozygote has higher fitness than either homozygote. It is postulated that Tay-Sachs disease, prevalent among people of Eastern European Jewish ancestry, and cystic fibrosis may also have been maintained by heterosis acting in the past. (Note that the cystic fibrosis gene was identified in 1989 by a Toronto group led by Lap Chee Tsui, who later became President of the University of Hong Kong.) The other disease genes listed may be maintained by mutation-selection balance.

Our model for diploid mutation-selection balance is given in Table 5.9. We further assume that mutations of type $A \to a$ occur in gamete production with frequency $u$. Back-mutation is neglected. The gametic frequencies of $A$ and $a$ after selection but before mutation are given by $p_s = P_s + Q_s/2$ and $q_s = R_s + Q_s/2$, where $P_s$, $Q_s$ and $R_s$ denote the frequencies of $AA$, $Aa$ and $aa$ after selection (Table 5.9), and the gametic frequency of $a$ after mutation is given by $q' = up_s + q_s$. Therefore,

\[
q' = \left\{ u\left[p^2 + (1-sh)pq\right] + \left[(1-s)q^2 + (1-sh)pq\right] \right\} / w,
\]

with

\[
\begin{aligned}
w &= p^2 + 2(1-sh)pq + (1-s)q^2 \\
&= 1 - sq(2hp + q).
\end{aligned}
\]

Using $p = 1 - q$, we write the evolution equation for $q'$ in terms of $q$ alone. After some algebra that could be facilitated using computer algebra software such as Mathematica, we obtain

\[
q' = \frac{u + \left[1 - u - sh(1+u)\right]q - s\left[1 - h(1+u)\right]q^2}{1 - 2shq - s(1-2h)q^2}. \tag{5.14}
\]
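The algebra leading to (5.14) can be spot-checked with exact rational arithmetic instead of a computer algebra system. The sketch below, a minimal check rather than a proof, evaluates selection followed by mutation directly from Table 5.9 and compares it with the closed form (5.14). The function names and the test values of $u$, $s$, $h$ and $q$ are arbitrary illustrative choices.

```python
from fractions import Fraction


def q_next_two_step(q, u, s, h):
    """Selection (Table 5.9) followed by mutation A -> a with frequency u."""
    p = 1 - q
    w = p * p + 2 * (1 - s * h) * p * q + (1 - s) * q * q
    p_sel = (p * p + (1 - s * h) * p * q) / w            # gametic freq of A after selection
    q_sel = ((1 - s) * q * q + (1 - s * h) * p * q) / w  # gametic freq of a after selection
    return u * p_sel + q_sel


def q_next_closed_form(q, u, s, h):
    """Right-hand side of (5.14)."""
    numerator = u + (1 - u - s * h * (1 + u)) * q - s * (1 - h * (1 + u)) * q * q
    denominator = 1 - 2 * s * h * q - s * (1 - 2 * h) * q * q
    return numerator / denominator


if __name__ == "__main__":
    # Arbitrary rational test values (illustrative only).
    u, s, h = Fraction(1, 1000), Fraction(1, 10), Fraction(1, 4)
    for q in (Fraction(1, 7), Fraction(1, 2), Fraction(9, 10)):
        print(q, q_next_two_step(q, u, s, h) == q_next_closed_form(q, u, s, h))
```

Because `Fraction` arithmetic is exact, agreement at a handful of points is a meaningful consistency check on the two rational functions, although it does not replace the symbolic derivation.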

To determine the equilibrium solutions of (5.14), we set $q_* \equiv q' = q$ to obtain a cubic equation for $q_*$. One solution readily found is $q_* = 1$, in which all the $A$ alleles have mutated to $a$. This is a solution because of the neglect of back mutation in our model. The $q_* = 1$ solution may be factored out of the cubic equation, resulting in a quadratic equation with two solutions. Rather than show the exact result here, we determine equilibrium solutions under two approximations: (i) $0 < u \ll h, s$; and (ii) $0 = h < u < s$.

First, when $0 < u \ll h, s$, we look for a solution of the form $q_* = au + O(u^2)$, where $a$ is a constant, and Taylor-series expand in $u$ (assuming $s, h = O(u^0)$). If such a solution exists, then (5.14) will determine the unknown coefficient $a$. We have

\[
\begin{aligned}
au + O(u^2) &= \frac{u + (1-sh)au + O(u^2)}{1 - 2shau + O(u^2)} \\
&= (1 + a - sha)u + O(u^2);
\end{aligned}
\]

and equating powers of $u$, $a = 1 + a - sha$, or $a = 1/sh$. Therefore,

\[
q_* = \frac{u}{sh} + O(u^2), \quad \text{for } 0 < u \ll h, s.
\]

Second, when $0 = h < u < s$, we substitute $h = 0$ directly into (5.14),

\[
q_* = \frac{u + (1-u)q_* - sq_*^2}{1 - sq_*^2},
\]

which we then write as a cubic equation for $q_*$,

\[
q_*^3 - q_*^2 - \frac{u}{s}q_* + \frac{u}{s} = 0.
\]

Factoring this cubic equation,

\[
(q_* - 1)\left(q_*^2 - u/s\right) = 0;
\]

and the polymorphic equilibrium solution is

\[
q_* = \sqrt{u/s}, \quad \text{for } 0 = h < u < s.
\]


Notice that $q_* < 1$ only if $s > u$, so that this solution does not exist if $s < u$.

Table 5.10 summarizes our results for the equilibrium frequencies of the genotypes at mutation-selection balance. The first row of frequencies, $0 < u \ll s, h$, corresponds to a dominant ($h = 1$) or partially dominant ($u \ll h < 1$) mutation, where the heterozygote is of reduced fitness and shows symptoms of the genetic disease. The second row of frequencies, $0 = h < u < s$, corresponds to a recessive mutation, where the heterozygote is symptom-free. Notice that individuals carrying a dominant mutation are twice as prevalent in the population as individuals homozygous for a recessive mutation (with the same $u$ and $s$).

genotype                   $AA$                 $Aa$                      $aa$
freq: $0 < u \ll s, h$     $1 + O(u)$           $2u/(sh) + O(u^2)$        $u^2/(sh)^2 + O(u^3)$
freq: $0 = h < u < s$      $1 + O(\sqrt{u})$    $2\sqrt{u/s} + O(u)$      $u/s$

Table 5.10: Equilibrium frequencies of the genotypes at diploid mutation-selection balance.

genotype                $AA$             $Aa$        $aa$
relative fitness        $1-s$            $1$         $1-t$
freq after selection    $(1-s)p^2/w$     $2pq/w$     $(1-t)q^2/w$
mean fitness            $w = (1-s)p^2 + 2pq + (1-t)q^2$

Table 5.11: Diploid genetic model of heterosis assuming random mating.
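The two approximate equilibria can be compared with a direct iteration of (5.14). The sketch below starts from $q = 0$ and applies (5.14) for a fixed number of generations. The values $u = 10^{-6}$ and $s = 0.1$, the function names, and the generation count (assumed large enough for convergence at these values) are illustrative assumptions.

```python
import math


def q_next(q, u, s, h):
    """One generation of (5.14)."""
    numerator = u + (1 - u - s * h * (1 + u)) * q - s * (1 - h * (1 + u)) * q * q
    denominator = 1 - 2 * s * h * q - s * (1 - 2 * h) * q * q
    return numerator / denominator


def equilibrium(u, s, h, generations=200_000):
    """Iterate (5.14) from q = 0 for a fixed, assumed sufficient, number of generations."""
    q = 0.0
    for _ in range(generations):
        q = q_next(q, u, s, h)
    return q


if __name__ == "__main__":
    u, s = 1e-6, 0.1  # illustrative values
    print("h = 0.5:", equilibrium(u, s, 0.5), "vs u/(sh) =", u / (s * 0.5))
    print("h = 0  :", equilibrium(u, s, 0.0), "vs sqrt(u/s) =", math.sqrt(u / s))
```

The recessive case converges slowly because the map is nearly neutral close to $q_* = \sqrt{u/s}$, which is why the generation count is set generously here.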

A heterozygote carrying a dominant mutation most commonly arises either de novo (by direct mutation of allele $A$) or by the mating of a heterozygote with a wildtype. The latter is more common for $s \ll 1$, while the former must occur for $s = 1$ (a heterozygote with an $s = h = 1$ mutation by definition does not reproduce). One of the most common autosomal dominant genetic diseases is Huntington's disease, resulting in brain deterioration during middle age. Because individuals with Huntington's disease have children before disease symptoms appear, $s$ is small and the disease is usually passed to offspring by the mating of a heterozygote with a wildtype homozygote. For a recessive mutation, a mutant homozygote usually occurs by the mating of two heterozygotes. If both parents carry a single recessive disease allele, then their child has a 1/4 chance of getting the disease.

5.2.4 Heterosis

Heterosis, also called overdominance or heterozygote advantage, occurs when the heterozygote has higher fitness than either homozygote. The best-known examples are sickle-cell anemia and thalassemia, both diseases that affect hemoglobin, the oxygen-carrier protein of red blood cells. The sickle-cell mutations are most common in people of West African descent, while the thalassemia mutations are most common in people from the Mediterranean and Asia. In Hong Kong, the television stations occasionally play public service announcements concerning thalassemia. The heterozygote carrier of the sickle-cell or thalassemia gene is healthy and resistant to malaria; the wildtype homozygote is healthy, but susceptible to malaria; the mutant homozygote is sick with anemia. In class, we will watch the following short movie about the sickle cell gene, which you can also view at your convenience:

http://www.pbs.org/wgbh/evolution/library/01/2/l_012_02.html.

player \ opponent    $H$               $D$
$H$                  $E_{HH} = -2$     $E_{HD} = 2$
$D$                  $E_{DH} = 0$      $E_{DD} = 1$

Table 5.12: General payoff matrix for the Hawk-Dove game, and the usually assumed values. The payoffs are paid to the player (first column) when playing against the opponent (first row).

Table 5.11 presents our model for heterosis. Both homozygotes are of lower fitness than the heterozygote, whose relative fitness we arbitrarily set to unity. Writing the equation for $p'$, we have

\[
\begin{aligned}
p' &= \frac{(1-s)p^2 + pq}{1 - sp^2 - tq^2} \\
&= \frac{p - sp^2}{1 - t + 2tp - (s+t)p^2}.
\end{aligned}
\]

At equilibrium, $p_* \equiv p' = p$, and we obtain a cubic equation for $p_*$:

\[
(s+t)p_*^3 - (s+2t)p_*^2 + tp_* = 0. \tag{5.15}
\]

Evidently, $p_* = 0$ and $p_* = 1$ are fixed points, and (5.15) can be factored as

\[
p_*(1 - p_*)\left(t - (s+t)p_*\right) = 0.
\]

The polymorphic solution is therefore

\[
p_* = \frac{t}{s+t}, \qquad q_* = \frac{s}{s+t},
\]

valid when $s, t > 0$. Since the value of $q_*$ can be large, recessive mutations that cause disease, yet are highly prevalent in a population, are suspected to provide some benefit to the heterozygote. However, only a few genes are unequivocally known to exhibit heterosis.
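As a numerical illustration, the recursion for $p'$ under heterosis can be iterated from several starting frequencies to see whether they approach $t/(s+t)$. This is a minimal sketch; the selection coefficients $s = 0.1$, $t = 0.3$, the starting values, and the function names are assumptions chosen for illustration.

```python
def next_p(p, s, t):
    """One generation of selection under heterosis (Table 5.11)."""
    return (p - s * p * p) / (1 - t + 2 * t * p - (s + t) * p * p)


def iterate(p0, s, t, generations=2000):
    """Apply the recursion a fixed number of times starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_p(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.1, 0.3  # illustrative selection coefficients
    for p0 in (0.01, 0.5, 0.99):
        print(p0, "->", iterate(p0, s, t), "predicted:", t / (s + t))
```

Starting values on either side of the polymorphic equilibrium are included because convergence from both sides is what distinguishes a stable polymorphism from the unstable fixed points at 0 and 1.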

5.3 Frequency-dependent selection

A polymorphism may also result from frequency-dependent selection. A well-known model for frequency-dependent selection is the Hawk-Dove game. Most commonly, frequency-dependent selection is studied using game theory, and following John Maynard Smith, one looks for an evolutionarily stable strategy (ESS). Here, we study frequency-dependent selection using a population-genetics model.

We consider two phenotypes: Hawk and Dove, with no mating between different phenotypes (for example, different phenotypes may correspond to different species, such as hawks and doves). We define Hawk and Dove as follows: (i) when Hawk meets Dove, Hawk gets the resource and Dove retreats before injury; (ii) when Hawks meet, they engage in an escalating fight, seriously risking injury; and (iii) when Doves meet, they share the resource. The Hawk-Dove game is modeled by a payoff matrix, as shown in Table 5.12. The player in the first column receives the payoff when playing the opponent in the first row. For instance, Hawk playing Dove gets the payoff $E_{HD}$. The numerical values are commonly chosen such that $E_{HH} < E_{DH} < E_{DD} < E_{HD}$; that is, Hawk playing Dove does better than Dove playing Dove, which does better than Dove playing Hawk, which does better than Hawk playing Hawk.

genotype                $H$                              $D$
freq of zygote          $p$                              $q$
relative fitness        $K + pE_{HH} + qE_{HD}$          $K + pE_{DH} + qE_{DD}$
freq after selection    $(K + pE_{HH} + qE_{HD})p/w$     $(K + pE_{DH} + qE_{DD})q/w$
mean fitness            $w = (K + pE_{HH} + qE_{HD})p + (K + pE_{DH} + qE_{DD})q$

Table 5.13: Haploid genetic model for frequency-dependent selection in the Hawk-Dove game.

Frequency-dependent selection occurs because the fitness of Hawk or Dove depends on the frequency of Hawks and Doves in the population. For example, a Hawk in a population of Doves does well, but a Hawk in a population of Hawks does poorly. We model the Hawk-Dove game using a haploid genetic model, with $p$ and $q$ the population frequencies of Hawks and Doves. We assume that the number of times a player phenotype plays an opponent phenotype is proportional to the population frequency of the opponent phenotype. For example, if the population is composed of 1/4 Hawks and 3/4 Doves, then Hawk or Dove plays Hawk 1/4 of the time and Dove 3/4 of the time. Our haploid genetic model assuming frequency-dependent selection is formulated in Table 5.13. The fitness parameter $K$ is arbitrary, but assumed to be large and positive so that the fitness of Hawk or Dove is always positive. A negative fitness in our haploid model has no meaning (see §5.1).

From Table 5.13, the equation for $p'$ is

\[
p' = (K + pE_{HH} + qE_{HD})p/w, \tag{5.16}
\]

with

\[
w = (K + pE_{HH} + qE_{HD})p + (K + pE_{DH} + qE_{DD})q. \tag{5.17}
\]

Both $p_* = 0$ and $p_* = 1$ are fixed points; a stable polymorphism exists when both these fixed points are unstable.

First, consider the fixed point $p_* = 0$. With perturbation $p \ll 1$ and $q = 1 - p$, to leading order in $p$,

\[
p' = \left(\frac{K + E_{HD}}{K + E_{DD}}\right)p.
\]

Therefore, $p_* = 0$ is an unstable fixed point when $E_{HD} > E_{DD}$: a population of Doves is unstable if Hawk playing Dove does better than Dove playing Dove. By symmetry, $p_* = 1$ is an unstable fixed point when $E_{DH} > E_{HH}$: a population of Hawks is unstable if Dove playing Hawk does better than Hawk playing Hawk. Both homogeneous populations of either all Hawks or all Doves are unstable for the numerical values shown in Table 5.12.

The polymorphic solution may be determined from (5.16) and (5.17). Assuming $p_* \equiv p' = p$, $q_* = 1 - p_*$, and $p_*, q_* \neq 0$, we have

\[
(K + p_*E_{HH} + q_*E_{HD})p_* + (K + p_*E_{DH} + q_*E_{DD})q_* = K + p_*E_{HH} + q_*E_{HD};
\]

and solving for $p_*$,

\[
p_* = \frac{E_{HD} - E_{DD}}{(E_{HD} - E_{DD}) + (E_{DH} - E_{HH})}.
\]

Since we have assumed $E_{HD} > E_{DD}$ and $E_{DH} > E_{HH}$, the polymorphic solution satisfies $0 < p_* < 1$. With the numerical values of Table 5.12,

\[
p_* = \frac{2 - 1}{(2 - 1) + (0 + 2)} = 1/3,
\]

and the stable polymorphic population, maintained by frequency-dependent selection, consists of 1/3 Hawks and 2/3 Doves.
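The equilibrium can also be reached by iterating (5.16) and (5.17) directly. The sketch below uses the payoff values of Table 5.12; the baseline fitness $K = 10$, the starting frequencies, and the function names are assumptions made for illustration (any $K$ large enough to keep both fitnesses positive should give the same equilibrium).

```python
from fractions import Fraction


def polymorphic_equilibrium(E_HH, E_HD, E_DH, E_DD):
    """Equilibrium Hawk frequency; assumes E_HD > E_DD and E_DH > E_HH."""
    return Fraction(E_HD - E_DD, (E_HD - E_DD) + (E_DH - E_HH))


def next_p(p, payoff, K):
    """One generation of (5.16)-(5.17) for the Hawk frequency p."""
    E_HH, E_HD, E_DH, E_DD = payoff
    q = 1 - p
    fitness_hawk = K + p * E_HH + q * E_HD
    fitness_dove = K + p * E_DH + q * E_DD
    mean_fitness = fitness_hawk * p + fitness_dove * q
    return fitness_hawk * p / mean_fitness


def iterate(p0, payoff, K, generations=2000):
    """Apply the recursion a fixed number of times starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_p(p, payoff, K)
    return p


if __name__ == "__main__":
    payoff = (-2, 2, 0, 1)  # Table 5.12: E_HH, E_HD, E_DH, E_DD
    K = 10                  # assumed baseline fitness
    print("formula:", polymorphic_equilibrium(*payoff))
    for p0 in (0.01, 0.99):
        print(p0, "->", iterate(p0, payoff, K))
```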

5.4 Recombination and the approach to linkage equilibrium

When considering a polymorphism at a single genetic locus, we assumed two distinct alleles, $A$ and $a$. The diploid then occurs as one of three types: $AA$, $Aa$ and $aa$. We now consider a polymorphism at two genetic loci, each with two distinct alleles. If the alleles at the first genetic locus are $A$ and $a$, and those at the second $B$ and $b$, then four distinct haploid gametes are possible, namely $AB$, $Ab$, $aB$ and $ab$. Ten distinct diplotypes are possible, obtained by forming pairs of all possible haplotypes. We can write these ten diplotypes as $AB/AB$, $AB/Ab$, $AB/aB$, $AB/ab$, $Ab/Ab$, $Ab/aB$, $Ab/ab$, $aB/aB$, $aB/ab$, and $ab/ab$, where the numerator represents the haplotype from one parent and the denominator represents the haplotype from the other parent. We do not distinguish here which haplotype came from which parent.

To proceed further, we define the allelic and gametic frequencies for our two-locus problem in Table 5.14. If the probability that a gamete contains allele $A$ or $a$ does not depend on whether the gamete contains allele $B$ or $b$, then the two loci are said to be independent and the problem simplifies. Under the assumption of independence, the gametic frequencies are the products of the allelic frequencies, i.e., $p_{AB} = p_A p_B$, $p_{Ab} = p_A p_b$, etc. The evolution of alleles at both loci can be solved separately, and the gametic frequencies obtained by multiplying the corresponding allelic frequencies. Note that for the two loci to remain independent, the relative fitness of a diploid genotype such as $AB/aB$ must be given by the product of the relative fitnesses of $Aa$ and $BB$.

allele or gamete genotype    $A$      $a$      $B$      $b$      $AB$        $Ab$        $aB$        $ab$
frequency                    $p_A$    $p_a$    $p_B$    $p_b$    $p_{AB}$    $p_{Ab}$    $p_{aB}$    $p_{ab}$

Table 5.14: Definition of allelic and gametic frequencies for two genetic loci each with two alleles.


Figure 5.2: A schematic of crossing-over and recombination during meiosis (figure from Access Excellence @ the National Health Museum).

Often, the two loci are not independent. This can be due to epistatic selection, or epistasis. As an example, suppose that two loci in humans influence height, and that the most fit genotype is the one resulting in an average height. Selection that favors the average population value of a trait is called normalizing or stabilizing. Suppose $A$ and $B$ are hypothetical tall alleles, $a$ and $b$ are short alleles, and a person with two tall and two short alleles obtains average height. Then selection may favor the specific genotypes $AB/ab$, $Ab/Ab$, $Ab/aB$, and $aB/aB$. Selection may act against both the genotypes yielding above-average heights, $AB/AB$, $AB/Ab$, and $AB/aB$, and those yielding below-average heights, $Ab/ab$, $aB/ab$ and $ab/ab$. Epistatic selection occurs because the fitness of the alleles at the $A, a$ locus depends on which alleles are present at the $B, b$ locus. Here, $A$ has higher fitness when paired with $b$ than when paired with $B$.

The two loci may also not be independent because of a finite population size (i.e., stochastic effects). For instance, suppose a mutation $a \to A$ occurs only once in a finite population (in an infinite population, any possible mutation occurs an infinite number of times), and that $A$ is strongly favored by natural selection. The frequency of $A$ may then increase. If the allele at a nearby polymorphic locus on the same chromosome as $A$ happens to be $B$ (say, with an alternative allele $b$ present in the population), then $AB$ gametes may substantially increase in frequency, with $Ab$ absent. We say that the allele $B$ hitchhikes with the favored allele $A$.

When the two loci are not independent, we say that the loci are in gametic phase disequilibrium, or more commonly linkage disequilibrium. When the loci are independent, we say they are in linkage equilibrium. Here, we will model how two loci, initially in linkage disequilibrium, approach linkage equilibrium through the process of recombination.

To begin, we need a rudimentary understanding of meiosis. During meiosis, a diploid cell's DNA, arranged in very long molecules called chromosomes, is replicated once and separated twice, producing four haploid cells, each containing half of the original cell's chromosomes. Sexual reproduction results in syngamy, the fusing of a haploid egg and sperm cell to form a diploid zygote cell.

Fig. 5.2 presents a schematic of meiosis and the process of crossing-over resulting in recombination. In a diploid, each chromosome has a corresponding homologous chromosome, one chromosome originating from the egg, one from the sperm. These homologous chromosomes have the same genes, but possibly different alleles. In Fig. 5.2, we schematically show alleles $a, b, c$ on the light chromosome, $A, B, C$ on the dark homologous chromosome. In the first step of meiosis, each chromosome replicates itself exactly. In the second step, homologous chromosomes exchange genetic material by the process of crossing-over. All four chromosomes then separate into haploid cells. Notice from the schematic that the process of crossing-over can result in genetic recombination. Suppose that the schematic of Fig. 5.2 represents the production of a sperm by a male. If the chromosome from the male's father contains the alleles $ABC$ and that from the male's mother $abc$, recombination can result in the sperm containing a chromosome with alleles $ABc$ (the third gamete in Fig. 5.2). We say this chromosome is a recombinant; it contains alleles from both its paternal grandfather and paternal grandmother. It is likely that the precise combination of alleles on this recombinant chromosome has never existed before in a single person. Recombination is the reason why everybody, with the exception of identical twins, is genetically unique.

Genes that occur on the same chromosome are said to be linked. The closer the genes are to each other on the chromosome, the tighter the linkage, and the less likely recombination is to separate them. Tightly linked genes are likely to be inherited from the same grandparent. Genes on different chromosomes are by definition unlinked; independent assortment of chromosomes results in a 50% chance of a gamete receiving either grandparent's gene.

To define and model the evolution of linkage disequilibrium, we first obtain allele frequencies from gametic frequencies by

\[
\begin{aligned}
p_A &= p_{AB} + p_{Ab}, & p_a &= p_{aB} + p_{ab}, \\
p_B &= p_{AB} + p_{aB}, & p_b &= p_{Ab} + p_{ab}.
\end{aligned}
\tag{5.18}
\]

Since the frequencies sum to unity,

\[
p_A + p_a = 1, \qquad p_B + p_b = 1, \qquad p_{AB} + p_{Ab} + p_{aB} + p_{ab} = 1. \tag{5.19}
\]

There are two independent allelic frequencies and three independent gametic frequencies, and in general it is not possible to obtain the gametic frequencies from the allelic frequencies without assuming linkage equilibrium. We can, however, introduce a new variable, $D$, called the coefficient of linkage disequilibrium; the gametic frequencies can then be obtained from the allelic frequencies and $D$. We define $D$ to be the difference between the actual gametic frequency and what the gametic frequency would be if the loci were in linkage equilibrium:

\[
D = p_{AB} - p_A p_B.
\]

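Using (5.18) and (5.19),

\[
\begin{aligned}
D &= p_{AB} - p_A p_B \\
&= (p_A - p_{Ab}) - p_A p_B \\
&= p_A(1 - p_B) - p_{Ab} \\
&= p_A p_b - p_{Ab},
\end{aligned}
\]

and

\[
\begin{aligned}
D &= p_{AB} - p_A p_B \\
&= (p_B - p_{aB}) - p_A p_B \\
&= p_B(1 - p_A) - p_{aB} \\
&= p_a p_B - p_{aB}.
\end{aligned}
\]

Finally,

\[
\begin{aligned}
D &= p_a p_B - p_{aB} \\
&= p_a p_B - (p_a - p_{ab}) \\
&= p_a(p_B - 1) + p_{ab} \\
&= p_{ab} - p_a p_b.
\end{aligned}
\]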

Therefore, the gametic frequencies can be written in terms of the allelic frequencies and the coefficient of linkage disequilibrium as

\[
\begin{aligned}
p_{AB} &= p_A p_B + D, & p_{ab} &= p_a p_b + D, \\
p_{Ab} &= p_A p_b - D, & p_{aB} &= p_a p_B - D.
\end{aligned}
\tag{5.20}
\]

With our definition, positive linkage disequilibrium ($D > 0$) implies the $AB$ and $ab$ gametes are over-represented, the $Ab$ and $aB$ gametes under-represented; negative linkage disequilibrium ($D < 0$) implies the opposite. An equality obtainable from (5.20) that we will later find useful is

\[
\begin{aligned}
p_{AB}p_{ab} - p_{Ab}p_{aB} &= (p_A p_B + D)(p_a p_b + D) - (p_A p_b - D)(p_a p_B - D) \\
&= D(p_A p_B + p_a p_b + p_A p_b + p_a p_B) \\
&= D.
\end{aligned}
\tag{5.21}
\]

Without selection and mutation, $D$ evolves only because of recombination. With primes representing the values in the next generation, and using $p'_A = p_A$ and $p'_B = p_B$ since sexual reproduction by itself does not change allele frequencies,

\[
\begin{aligned}
D' &= p'_{AB} - p'_A p'_B \\
&= p'_{AB} - p_A p_B \\
&= p'_{AB} - (p_{AB} - D) \\
&= D + (p'_{AB} - p_{AB}),
\end{aligned}
\]

where we have used (5.20) to obtain the third equality. Therefore, the change in $D$ is equal to the change in frequency of $AB$ gametes,

\[
D' - D = p'_{AB} - p_{AB}. \tag{5.22}
\]


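                                   gamete freq / diploid freq
diploid    dip freq                $AB$         $Ab$         $aB$         $ab$
$AB/AB$    $p_{AB}^2$              $1$          $0$          $0$          $0$
$AB/Ab$    $2p_{AB}p_{Ab}$         $1/2$        $1/2$        $0$          $0$
$AB/aB$    $2p_{AB}p_{aB}$         $1/2$        $0$          $1/2$        $0$
$AB/ab$    $2p_{AB}p_{ab}$         $(1-r)/2$    $r/2$        $r/2$        $(1-r)/2$
$Ab/Ab$    $p_{Ab}^2$              $0$          $1$          $0$          $0$
$Ab/aB$    $2p_{Ab}p_{aB}$         $r/2$        $(1-r)/2$    $(1-r)/2$    $r/2$
$Ab/ab$    $2p_{Ab}p_{ab}$         $0$          $1/2$        $0$          $1/2$
$aB/aB$    $p_{aB}^2$              $0$          $0$          $1$          $0$
$aB/ab$    $2p_{aB}p_{ab}$         $0$          $0$          $1/2$        $1/2$
$ab/ab$    $p_{ab}^2$              $0$          $0$          $0$          $1$

Table 5.15: Computation of gamete frequencies.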

To understand why gametic frequencies change across generations, we should first recognize when they do not change. Without genetic recombination, chromosomes maintain their exact identity across generations. Chromosome frequencies without recombination are therefore constant, and for genetic loci with, say, alleles $A, a$ and $B, b$ on the same chromosome, $p'_{AB} = p_{AB}$. In an infinite population without selection or mutation, gametic frequencies change only for genetic loci in linkage disequilibrium on different chromosomes, or for genetic loci in linkage disequilibrium on the same chromosome subjected to genetic recombination.

We will compute the frequency $p'_{AB}$ of $AB$ gametes in the next generation, given the frequency $p_{AB}$ of $AB$ gametes in the present generation, using two different methods. The first method uses a 'mating' table (see Table 5.15). The first column is the parent diploid genotype before meiosis. The second column is the diploid genotype frequency assuming random mating. The next four columns are the haploid genotype frequencies (normalized by the corresponding diploid frequencies to simplify the table presentation). Here, we define $r$ to be the frequency at which the gamete arises from a combination of grandmother and grandfather genes. If the $A, a$ and $B, b$ loci occur on the same chromosome, then $r$ is the recombination frequency. If the $A, a$ and $B, b$ loci occur on different chromosomes, then $r = 1/2$, assuming the chromosomes assort independently. Notice that recombination or independent assortment is of importance only if the grandfather's and grandmother's contributions to the diploid genotype share no common alleles (i.e., $AB/ab$ and $Ab/aB$ genotypes). The frequency $p'_{AB}$ in the next generation is given by the sum of the $AB$ column (after multiplication by the diploid frequencies). Therefore,

\[
\begin{aligned}
p'_{AB} &= p_{AB}^2 + p_{AB}p_{Ab} + p_{AB}p_{aB} + (1-r)p_{AB}p_{ab} + rp_{Ab}p_{aB} \\
&= p_{AB}(p_{AB} + p_{Ab} + p_{aB} + p_{ab}) + r(p_{Ab}p_{aB} - p_{AB}p_{ab}) \\
&= p_{AB} - rD,
\end{aligned}
\]

where the final equality makes use of (5.19) and (5.21).

The second method for computing $p'_{AB}$ is more direct. An $AB$ haplotype can arise from a diploid of general type $AB/XX$ without recombination, or a diploid of type $AX/XB$ with recombination, where $X$ stands for either allele at the corresponding locus. Therefore,

\[
p'_{AB} = (1-r)p_{AB} + rp_A p_B,
\]

where the first term is from non-recombinants and the second term from recombinants. With $p_A p_B = p_{AB} - D$, we have

\[
\begin{aligned}
p'_{AB} &= (1-r)p_{AB} + r(p_{AB} - D) \\
&= p_{AB} - rD,
\end{aligned}
\]

the same result as that obtained using the mating table.

Using (5.22), we derive

\[
D' = (1-r)D,
\]

with solution

\[
D_n = D_0(1-r)^n.
\]

Recombination decreases linkage disequilibrium each generation by a factor $(1 - r)$. Tightly linked genes on the same chromosome have small values of $r$; unlinked genes on different chromosomes have $r = 1/2$. For unlinked genes, linkage disequilibrium decreases by a factor of two each generation. We conclude that very strong selection is required to maintain linkage disequilibrium for genes on different chromosomes, while weak selection can maintain linkage disequilibrium for tightly linked genes.
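The decay of $D$ can be reproduced by carrying out the random-mating and meiosis bookkeeping of Table 5.15 explicitly. The sketch below sums over ordered pairs of parental haplotypes, so each heterozygous diploid is counted twice, matching the factor of 2 in the table. The initial gametic frequencies, the value $r = 1/10$, and the function names are illustrative assumptions; exact fractions are used so the comparison with $D_0(1-r)^n$ involves no rounding.

```python
from fractions import Fraction
from itertools import product

HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def linkage_disequilibrium(freq):
    """D = p_AB - p_A p_B, with allele frequencies obtained from (5.18)."""
    p_A = freq["AB"] + freq["Ab"]
    p_B = freq["AB"] + freq["aB"]
    return freq["AB"] - p_A * p_B


def next_generation(freq, r):
    """Gametic frequencies after random mating and meiosis (Table 5.15)."""
    new = {h: Fraction(0) for h in HAPLOTYPES}
    half = Fraction(1, 2)
    for h1, h2 in product(HAPLOTYPES, repeat=2):
        weight = freq[h1] * freq[h2]        # frequency of the ordered pair
        # Non-recombinant gametes: each parental haplotype with probability 1/2.
        new[h1] += weight * (1 - r) * half
        new[h2] += weight * (1 - r) * half
        # Recombinant gametes: alleles at the second locus are exchanged.
        new[h1[0] + h2[1]] += weight * r * half
        new[h2[0] + h1[1]] += weight * r * half
    return new


if __name__ == "__main__":
    r = Fraction(1, 10)                     # assumed recombination frequency
    freq = {"AB": Fraction(4, 10), "Ab": Fraction(1, 10),
            "aB": Fraction(1, 10), "ab": Fraction(4, 10)}
    D0 = linkage_disequilibrium(freq)
    for n in range(1, 6):
        freq = next_generation(freq, r)
        print(n, linkage_disequilibrium(freq), D0 * (1 - r) ** n)
```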

5.5 Random genetic drift

Up to now, our simplified genetic models have all assumed an infinite population, neglecting stochastic effects. Now, we consider the random drift of allelic frequencies in a finite population.

We develop a simple genetic model incorporating random drift by assuming a fixed-sized population of $N$ individuals, and modeling the evolution of a single two-allele haploid genetic locus. Let $n$ denote the number of individuals in the population with allele $A$; $N - n$ the number of individuals with allele $a$. There are two widely used genetic models for finite populations: the Wright-Fisher model and the Moran model. The Wright-Fisher model is most similar to our infinite-population discrete-generation model. In this model, $N$ adult individuals release a very large number of gametes into a gene pool; the next generation is formed from $N$ random gametes independently chosen from the gene pool. The Moran model takes a different approach. In this model, a single evolution step consists of one random individual in the population reproducing, and another independent random individual dying, maintaining the population at a fixed size $N$. A total of $N$ evolution steps in the Moran model is comparable, although not exactly identical, to a single discrete generation in the Wright-Fisher model. For our purposes, the Moran model is mathematically more tractable and we adopt it here.

We develop our model in analogy to the stochastic population growth model derived in §1.2. With $n = 0, 1, 2, \ldots, N$ a discrete random variable, the probability mass function $p_n(g)$ denotes the probability of $n$ individuals carrying allele $A$ at evolution step $g$. The probability at each evolution step of an individual carrying allele $A$ reproducing or dying when there are $n$ such individuals in the population is given by $s_n = n/N$; the corresponding probability for $a$ is $1 - s_n$. There are three ways to obtain a population of $n$ individuals with allele $A$ at evolution step $g + 1$. First, there were $n$ individuals with allele $A$ at evolution step $g$, and the individual reproducing carried the same allele as the individual dying. Second, there were $n - 1$ individuals with allele $A$ at evolution step $g$, and the individual reproducing carried $A$ and the individual dying carried $a$. And third, there were $n + 1$ individuals with allele $A$ at evolution step $g$, and the individual reproducing carried $a$ and the individual dying carried $A$. Multiplying the probabilities and summing the three cases results in

\[
\begin{aligned}
p_n(g+1) = {} & \left(s_n^2 + (1 - s_n)^2\right)p_n(g) \\
& + s_{n-1}(1 - s_{n-1})p_{n-1}(g) + s_{n+1}(1 - s_{n+1})p_{n+1}(g).
\end{aligned}
\tag{5.23}
\]

Note that this equation is valid for $0 < n < N$, and that the equations at the boundaries, representing the probabilities that one of the alleles is fixed, are

\[
\begin{aligned}
p_0(g+1) &= p_0(g) + s_1(1 - s_1)p_1(g), \\
p_N(g+1) &= p_N(g) + s_{N-1}(1 - s_{N-1})p_{N-1}(g).
\end{aligned}
\tag{5.24}
\]

Once an allele becomes fixed there is no further change in allele frequencies: the boundaries are called absorbing and the probability of fixation of an allele monotonically increases with each birth and death.

We illustrate the solution of (5.23)-(5.24) in Fig. 5.3 for a small population of size $N = 20$, and where the number of individuals carrying $A$ is precisely known in the founding generation, with either (a) $p_{10}(0) = 1$, or (b) $p_{13}(0) = 1$. We plot the probability mass function of the number of $A$ individuals every $N$ evolution steps, up to $7N$ steps, corresponding to approximately 7 discrete generations of evolution in the Wright-Fisher model. Notice how the probability distribution diffuses away from its initial values, and how the probabilities eventually concentrate on the boundaries, with both $p_0$ and $p_{20}$ monotonically increasing.
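A numerical solution of the kind described here takes only a few lines. The following sketch advances the probability mass function with (5.23) in the interior and (5.24) at the boundaries, and prints the fixation probabilities $p_0$ and $p_N$ every $N$ steps. It is one possible implementation with hypothetical function names, not necessarily the code used to produce Fig. 5.3.

```python
def moran_step(p):
    """Advance the probability mass function by one evolution step, (5.23)-(5.24)."""
    N = len(p) - 1
    s = [n / N for n in range(N + 1)]
    new = [0.0] * (N + 1)
    for n in range(N + 1):
        # Reproducing and dying individuals carry the same allele: n is unchanged.
        new[n] = (s[n] ** 2 + (1 - s[n]) ** 2) * p[n]
        if n > 0:
            new[n] += s[n - 1] * (1 - s[n - 1]) * p[n - 1]  # gain from n - 1
        if n < N:
            new[n] += s[n + 1] * (1 - s[n + 1]) * p[n + 1]  # gain from n + 1
    return new


def evolve(N, n0, steps):
    """Start with exactly n0 copies of A and apply `steps` evolution steps."""
    p = [0.0] * (N + 1)
    p[n0] = 1.0
    for _ in range(steps):
        p = moran_step(p)
    return p


if __name__ == "__main__":
    N = 20
    for generations in range(8):  # g = 0, N, ..., 7N
        p = evolve(N, 13, generations * N)
        print(generations, round(p[0], 4), round(p[N], 4))
```

Since $s_0 = 0$ and $s_N = 1$, the interior update automatically reduces to (5.24) at $n = 0$ and $n = N$, so no separate boundary code is needed.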

To better understand this numerical solution, we consider the limit of large (but not infinite) populations by expanding in powers of $1/N$. We first rewrite (5.23) as

\[
\begin{aligned}
p_n(g+1) - p_n(g) = {} & s_{n+1}(1 - s_{n+1})p_{n+1}(g) - 2s_n(1 - s_n)p_n(g) \\
& + s_{n-1}(1 - s_{n-1})p_{n-1}(g).
\end{aligned}
\tag{5.25}
\]

Figure 5.3: $p_n$ versus $n$ with $N = 20$. Evolution steps plotted correspond to $g = 0, N, 2N, \ldots, 7N$. (a) Initial number of $A$ individuals is $n = 10$. (b) Initial number of $A$ individuals is $n = 13$. [Plots not reproduced: eight panels for each of (a) and (b), with $n$ from 0 to 20 on the horizontal axis and $p_n$ from 0 to 0.2 on the vertical axis.]

We then introduce the continuous random variable $x = n/N$, with $0 \le x \le 1$, and the continuous time $t = g/N$. A unit of time corresponds to approximately a single discrete generation in the Wright-Fisher model (i.e., $N$ evolution steps in the Moran model). The probability density function is defined by

\[
P(x, t) = Np_n(g), \quad \text{with } x = n/N, \ t = g/N.
\]

Furthermore,

\[
s_n = n/N = x \equiv S(x);
\]

and similarly, $s_{n+1} = S(x + \Delta x)$, $s_{n-1} = S(x - \Delta x)$, where $\Delta x = 1/N$. Then with $\Delta t = 1/N$, (5.25) transforms into

\[
\begin{aligned}
P(x, t + \Delta t) - P(x, t) = {} & S(x + \Delta x)\left(1 - S(x + \Delta x)\right)P(x + \Delta x, t) \\
& - 2S(x)\left(1 - S(x)\right)P(x, t) \\
& + S(x - \Delta x)\left(1 - S(x - \Delta x)\right)P(x - \Delta x, t).
\end{aligned}
\tag{5.26}
\]

To simplify further, we use the well-known central-difference approximation to the second derivative of a function $f(x)$,

\[
f''(x) = \frac{f(x + \Delta x) - 2f(x) + f(x - \Delta x)}{\Delta x^2} + O(\Delta x^2),
\]

and recognize the right-hand side of (5.26) to be the numerator of a second derivative. With $\Delta x = \Delta t = 1/N \to 0$, and $S(x) = x$, we derive to leading order in $1/N$ the partial differential equation

\[
\frac{\partial P(x, t)}{\partial t} = \frac{\partial^2}{\partial x^2}\left\{V(x)P(x, t)\right\}, \tag{5.27}
\]

with

\[
V(x) = \frac{x(1 - x)}{N}.
\]

The function $V(x)$ can be interpreted using a result from probability theory. For $n$ independent trials, each with probability of success $p$ and probability of failure $1 - p$, the number of successes, denoted by $X$, is a binomial random variable with parameters $(n, p)$. Well-known results are $E[X] = np$ and $\mathrm{Var}[X] = np(1 - p)$, where $E[\ldots]$ is the expected value, and $\mathrm{Var}[\ldots]$ is the variance. The number of $A$ individuals chosen when forming the next Wright-Fisher generation is a binomial random variable $n'$ with parameters $(N, n/N)$. Therefore, $E[n'] = n$ and $\mathrm{Var}[n'] = n(1 - n/N)$. With $x' = n'/N$ and $x = n/N$, we have $E[x'] = x$, and $\mathrm{Var}[x'] = x(1 - x)/N$. The function $V(x)$ can therefore be interpreted as the variance of $x$ over a single Wright-Fisher generation.
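The binomial mean and variance quoted above can be confirmed by summing over the exact probability mass function. The sketch below does this with exact fractions for one assumed pair of values, $N = 20$ and $n = 13$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(N, x):
    """Exact pmf of the number of A gametes among N draws, each A with probability x."""
    return [comb(N, k) * x ** k * (1 - x) ** (N - k) for k in range(N + 1)]


def mean_and_variance_of_frequency(N, n):
    """Mean and variance of x' = n'/N after one Wright-Fisher generation."""
    x = Fraction(n, N)
    pmf = binomial_pmf(N, x)
    mean = sum(Fraction(k, N) * pk for k, pk in enumerate(pmf))
    variance = sum((Fraction(k, N) - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, variance


if __name__ == "__main__":
    N, n = 20, 13  # illustrative values
    x = Fraction(n, N)
    mean, variance = mean_and_variance_of_frequency(N, n)
    print(mean == x, variance == x * (1 - x) / N)
```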

Equation (5.27) is a diffusion equation for the probability distribution. Referring again to Fig. 5.3, we observe that after many generations, $p_n$ becomes nearly independent of $n$ at the interior points. Accordingly, we look for a solution of (5.27) with $P(x, t) = P(t)$, independent of $x$, for $0 < x < 1$. Equation (5.27) becomes

\[
\frac{dP(t)}{dt} = V''(x)P(t), \quad \text{with } V''(x) = -2/N, \tag{5.28}
\]

with solution

\[
P(t) = ce^{-2t/N}. \tag{5.29}
\]

Now the probability of an allele being fixed is determined from the unknown boundary values $P(0, t)$ and $P(1, t)$ for which (5.29) is not a valid solution. Since $\int_0^1 P(x, t)\,dx = 1$, the fixation probability missing from (5.29) can be found from

\[
1 - \int_0^1 ce^{-2t/N}\,dx = 1 - ce^{-2t/N}.
\]

An allele is therefore likely to become fixed once the number of Wright-Fisher generations is much larger than the population size, i.e., when $e^{-2t/N} \ll 1$.
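The decay rate $2/N$ in (5.29) can be compared with the discrete Moran model. One convenient diagnostic, introduced here only for this check, is $H(g) = \sum_n n(N - n)p_n(g)$, which receives no contribution from the absorbing boundaries. From the transition probabilities in (5.23), a step up or down each occurs with probability $s_n(1 - s_n)$, and the two changes in $n(N - n)$ sum to $-2$, so $H$ shrinks by the factor $1 - 2/N^2$ per evolution step. Over $N$ steps (one unit of $t$) this gives $(1 - 2/N^2)^N \approx e^{-2/N}$. The sketch below checks both statements for $N = 20$; the function names are hypothetical.

```python
import math


def moran_step(p):
    """One evolution step of the Moran model, (5.23)-(5.24)."""
    N = len(p) - 1
    move = [(n / N) * (1 - n / N) for n in range(N + 1)]  # s_n (1 - s_n)
    new = [(1 - 2 * move[n]) * p[n] for n in range(N + 1)]
    for n in range(N + 1):
        if n > 0:
            new[n] += move[n - 1] * p[n - 1]
        if n < N:
            new[n] += move[n + 1] * p[n + 1]
    return new


def heterozygosity(p):
    """H = sum over n of n (N - n) p_n; zero once either allele is fixed."""
    N = len(p) - 1
    return sum(n * (N - n) * pn for n, pn in enumerate(p))


if __name__ == "__main__":
    N = 20
    p = [0.0] * (N + 1)
    p[10] = 1.0  # start from n = 10
    H_start = heterozygosity(p)
    for _ in range(N):  # N steps correspond to one unit of time t
        p = moran_step(p)
    print(heterozygosity(p) / H_start, math.exp(-2 / N))
```

The update uses $1 - 2s_n(1 - s_n)$ for the probability of no change, which equals $s_n^2 + (1 - s_n)^2$ in (5.23).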


A problem of interest, with a particularly simple solution, is the following. Suppose within a population of $N$ individuals homogeneous for $A$, a single neutral mutation occurs so that one individual now carries allele $a$. What is the probability that allele $a$ eventually becomes fixed? We can, in fact, answer this question without any calculation. Intuitively, after a sufficient number of generations has passed, all living individuals should be descended from a single ancestral individual living at the time the single mutation occurred. The probability that that single individual carries the $a$ allele is $1/N$, since only one out of $N$ individuals carries the $a$ mutation, and this is the probability of fixation of $a$.
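The intuitive answer can be cross-checked against the Moran model by a first-step argument. If $u_n$ denotes the probability that a neutral allele now carried by $n$ of the $N$ individuals is eventually fixed, then conditioning on one evolution step gives $u_n = s_n(1 - s_n)(u_{n+1} + u_{n-1}) + (1 - 2s_n(1 - s_n))u_n$, with $u_0 = 0$ and $u_N = 1$. The sketch below, with a hypothetical function name and the assumed value $N = 20$, verifies in exact arithmetic that $u_n = n/N$ satisfies these equations, so that a single mutant has fixation probability $1/N$.

```python
from fractions import Fraction


def is_fixation_probability(u, N):
    """Check the first-step equations of the Moran model for a candidate u[n]."""
    if u[0] != 0 or u[N] != 1:
        return False  # boundary conditions: loss and fixation are absorbing
    for n in range(1, N):
        s = Fraction(n, N)
        move = s * (1 - s)  # probability of n -> n + 1, and also of n -> n - 1
        expected = move * u[n + 1] + move * u[n - 1] + (1 - 2 * move) * u[n]
        if expected != u[n]:
            return False
    return True


if __name__ == "__main__":
    N = 20
    u = [Fraction(n, N) for n in range(N + 1)]
    print(is_fixation_probability(u, N), "single mutant:", u[1])
```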

Chapter 6

Biochemical Reactions

Biochemistry is the study of the chemistry of life. It can be considered a branch of molecular biology, perhaps more focused on specific molecules and their reactions, or a branch of chemistry focused on the complex chemical reactions occurring in living organisms. Perhaps the first application of biochemistry happened about 5000 years ago when bread was made using yeast.

Modern biochemistry, however, had a relatively slow start among the sciences, as did modern biology. Isaac Newton's publication of Principia Mathematica in 1687 preceded Darwin's Origin of Species in 1859 by more than 170 years. I find this amazing because the ideas of Darwin are in many ways simpler and easier to understand than the mathematical theory of Newton. Most of the delay must be attributed to a fundamental conflict between science and religion. Physical sciences experienced early conflict (witness the famous prosecution of Galileo by the Catholic Church in 1633, in which he was forced to recant his heliocentric view), but the conflict of religion with evolutionary biology continues even to this day. Advances in biochemistry were initially delayed because it was long believed that life was not subject to the laws of science the way nonlife was, and that only living things could produce the molecules of life. Certainly, this was more a religious conviction than a scientific one. Then Friedrich Wöhler in 1828 published his landmark paper on the synthesis of urea (a waste product neutralizing toxic ammonia before excretion in the urine), proving for the first time that organic compounds can be created artificially.

Here, we present mathematical models for biochemical reactions. We begin by introducing a useful model for a chemical reaction: the law of mass action. We then model what may be the most important biochemical reactions, namely those catalyzed by enzymes. Using the mathematical model of enzyme kinetics, we will consider three fundamental enzymatic properties: competitive inhibition, allosteric inhibition, and cooperativity.

6.1 Law of mass action

The law of mass action describes the rate at which chemicals interact in reactions. It is assumed that different chemical molecules come into contact by collision before reacting, and that the collision rate is directly proportional to the number of molecules of each reacting species. Suppose that two chemicals A and B react to form a product chemical C, written as

$$
A + B \overset{k}{\rightarrow} C,
$$

with k the rate constant of the reaction. For simplicity, we will use the same symbol C, say, to refer to both the chemical C and its concentration. The law of mass action says that dC/dt is proportional to the product of the concentrations A and B, with proportionality constant k. That is,

$$
\frac{dC}{dt} = kAB. \tag{6.1}
$$

Similarly, the law of mass action enables us to write equations for the time-derivatives of the reactant concentrations A and B:

$$
\frac{dA}{dt} = -kAB, \qquad \frac{dB}{dt} = -kAB. \tag{6.2}
$$

Notice that when applying the law of mass action to write an equation for the time-derivative of a concentration, the chemical that the arrow points towards is increasing in concentration (positive sign), and the chemical that the arrow points away from is decreasing in concentration (negative sign). The product of concentrations on the right-hand side is always that of the reactants from which the arrow points away, multiplied by the rate constant that is on top of the arrow.

Equation (6.1) can be solved analytically using conservation laws. Each reactant, original and converted to product, is conserved since one molecule of each reactant gets converted into one molecule of product. Therefore,

$$
\begin{aligned}
\frac{d}{dt}(A + C) = 0 &\implies A + C = A_0, \\
\frac{d}{dt}(B + C) = 0 &\implies B + C = B_0,
\end{aligned}
$$

where $A_0$ and $B_0$ are the initial concentrations of the reactants, and no product is present initially. Using the conservation laws, (6.1) becomes

$$
\frac{dC}{dt} = k(A_0 - C)(B_0 - C), \quad \text{with } C(0) = 0,
$$

which may be integrated by separating variables. After some algebra, the solution is determined to be

$$
C(t) = A_0 B_0 \, \frac{e^{(B_0 - A_0)kt} - 1}{B_0 e^{(B_0 - A_0)kt} - A_0},
$$

which is a complicated expression with the simple limits

$$
\lim_{t \to \infty} C(t) =
\begin{cases}
A_0 & \text{if } A_0 < B_0, \\
B_0 & \text{if } B_0 < A_0.
\end{cases}
\tag{6.3}
$$

The reaction stops after one of the reactants is depleted, and the final concentration of product is equal to the initial concentration of the depleted reactant.
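As a check on the closed-form solution, the following sketch integrates $dC/dt = k(A_0 - C)(B_0 - C)$ numerically and compares the result with the formula for $C(t)$. The function names and the parameter values ($k = 1$, $A_0 = 1$, $B_0 = 2$) are chosen only for illustration, and the integrator is a generic fourth-order Runge-Kutta scheme rather than anything prescribed by these notes.

```python
import math

def product_exact(t, k, a0, b0):
    """Closed-form C(t) for A + B -> C with C(0) = 0 (requires a0 != b0)."""
    e = math.exp((b0 - a0) * k * t)
    return a0 * b0 * (e - 1.0) / (b0 * e - a0)

def product_rk4(t_end, k, a0, b0, steps=10000):
    """Integrate dC/dt = k (a0 - C)(b0 - C) with the classical Runge-Kutta method."""
    def rate(c):
        return k * (a0 - c) * (b0 - c)
    h = t_end / steps
    c = 0.0
    for _ in range(steps):
        s1 = rate(c)
        s2 = rate(c + 0.5 * h * s1)
        s3 = rate(c + 0.5 * h * s2)
        s4 = rate(c + h * s3)
        c += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return c

k, a0, b0 = 1.0, 1.0, 2.0  # illustrative values only
print(product_exact(1.0, k, a0, b0), product_rk4(1.0, k, a0, b0))
print(product_exact(20.0, k, a0, b0))  # long-time value approaches min(a0, b0)
```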

If we also include the reverse reaction,

$$
A + B \underset{k_-}{\overset{k_+}{\rightleftharpoons}} C,
$$

then the time-derivative of the product is given by

$$
\frac{dC}{dt} = k_+ AB - k_- C.
$$

Notice that $k_+$ and $k_-$ have different units. At equilibrium, $dC/dt = 0$, and using the conservation laws $A + C = A_0$, $B + C = B_0$, we obtain

$$
(A_0 - C)(B_0 - C) - \frac{k_-}{k_+} C = 0;
$$

from which we define the equilibrium constant $K_{eq}$ by

$$
K_{eq} = k_-/k_+,
$$

which has units of concentration. Therefore, at equilibrium the concentration of product is given by the solution of the quadratic equation

$$
C^2 - (A_0 + B_0 + K_{eq})C + A_0 B_0 = 0,
$$

with the extra condition that $0 < C < \min(A_0, B_0)$. For instance, if $A_0 = B_0 \equiv R_0$, then at equilibrium

$$
C = R_0 - \frac{1}{2} K_{eq} \left[ \sqrt{1 + 4R_0/K_{eq}} - 1 \right].
$$

If $K_{eq} \ll R_0$, then A and B have a high affinity, and the reaction proceeds mainly to C, with $C \to R_0$.
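The equilibrium concentration can be evaluated directly from the quadratic. In the sketch below the function name `equilibrium_product` and the numerical values are illustrative assumptions; the smaller root is taken because it is the one satisfying $0 < C < \min(A_0, B_0)$.

```python
import math

def equilibrium_product(a0, b0, keq):
    """Root of C^2 - (a0 + b0 + keq) C + a0 b0 = 0 lying between 0 and min(a0, b0)."""
    s = a0 + b0 + keq
    disc = s * s - 4.0 * a0 * b0
    return 0.5 * (s - math.sqrt(disc))  # the smaller root is the physical one

r0 = 1.0  # illustrative value with A0 = B0 = R0
for keq in (1.0, 0.01, 1e-4):
    c = equilibrium_product(r0, r0, keq)
    closed = r0 - 0.5 * keq * (math.sqrt(1.0 + 4.0 * r0 / keq) - 1.0)
    print(keq, c, closed)  # the two columns agree, and C approaches R0 as Keq shrinks
```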

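Below are two interesting reactions. In reaction (ii), A is assumed to be held at a constant concentration.

(i)

$$
A + X \underset{k_-}{\overset{k_+}{\rightleftharpoons}} 2X
$$

(ii)

$$
A + X \overset{k_1}{\rightarrow} 2X, \qquad X + Y \overset{k_2}{\rightarrow} 2Y, \qquad Y \overset{k_3}{\rightarrow} B
$$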

Can you write down the equations for X in reaction (i), and X and Y in reaction (ii)? When normalized properly, the equations from reaction (ii) reduce to the Lotka-Volterra predator-prey equations introduced in §3.3, so that the chemical concentrations X and Y must oscillate in time like predator and prey.

6.2 Enzyme kinetics

Enzymes are catalysts, usually proteins, that help convert other molecules called substrates into products, but are themselves unchanged by the reaction.

Figure 6.1: Michaelis-Menten reaction of two substrates converting to one product. (Drawn by User:IMeowbot, released under the GNU Free Documentation License.)

Each enzyme has high specificity for at least one reaction, and can accelerate this reaction millions of times. Without enzymes, most biochemical reactions are too slow for life to be possible. Enzymes are so important to our lives that a single amino acid mutation in one enzyme out of the more than 2000 enzymes in our bodies can result in a severe or lethal genetic disease. Enzymes do not follow the law of mass action directly: with S substrate, P product, and E enzyme, the reaction

$$
S + E \overset{k}{\rightarrow} P + E,
$$

is a poor model since the reaction velocity dP/dt is known to attain a finite limit with increasing substrate concentration. Rather, Michaelis and Menten (1913) proposed the following reaction scheme with an intermediate molecule:

$$
S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C \overset{k_2}{\rightarrow} P + E,
$$

where C is a complex formed by the enzyme and the substrate. A cartoon of the Michaelis-Menten reaction with an enzyme catalyzing a reaction between two substrates is shown in Fig. 6.1.

The complete set of differential equations representing the reaction of an enzyme catalyzing the conversion of a single substrate to a product is obtained from the law of mass action:

$$
\begin{aligned}
dS/dt &= k_{-1}C - k_1 SE, \\
dE/dt &= (k_{-1} + k_2)C - k_1 SE, \\
dC/dt &= k_1 SE - (k_{-1} + k_2)C, \\
dP/dt &= k_2 C.
\end{aligned}
$$

Commonly, substrate is continuously provided to the reaction and product is continuously removed. The removal of product has been modeled by neglecting the reverse reaction $P + E \to C$. Because substrate is continuously provided, biochemists usually want to determine the reaction velocity dP/dt in terms of the substrate concentration S, and total enzyme concentration $E_0$. We can eliminate E in favor of $E_0$ from the conservation law that the enzyme, free and bound, is conserved; that is

$$
\frac{d(E + C)}{dt} = 0 \implies E + C = E_0 \implies E = E_0 - C.
$$

We rewrite the equation for dC/dt eliminating E:

$$
\begin{aligned}
\frac{dC}{dt} &= k_1 S(E_0 - C) - (k_{-1} + k_2)C \\
&= k_1 E_0 S - (k_{-1} + k_2 + k_1 S)C.
\end{aligned}
\tag{6.4}
$$

We can solve (6.4) for C under the so-called quasi-steady-state approximation, where we assume that the complex C is in equilibrium, with its rate of formation equal to its rate of dissociation. Accordingly, with $dC/dt = 0$ in (6.4), we have

$$
C = \frac{k_1 E_0 S}{k_{-1} + k_2 + k_1 S}.
$$

The reaction velocity is then given by

$$
\begin{aligned}
\frac{dP}{dt} &= k_2 C \\
&= \frac{k_1 k_2 E_0 S}{k_{-1} + k_2 + k_1 S} \\
&= \frac{V_m S}{K_m + S},
\end{aligned}
\tag{6.5}
$$

where two fundamental constants are defined:

$$
K_m = (k_{-1} + k_2)/k_1, \qquad V_m = k_2 E_0. \tag{6.6}
$$

The Michaelis-Menten constant or Michaelis constant $K_m$ has units of concentration, and the maximum reaction velocity $V_m$ has units of concentration divided by time. The interpretation of these constants is obtained by considering the following limits:

$$
\begin{aligned}
&\text{as } S \to \infty, && C \to E_0 \text{ and } dP/dt \to V_m, \\
&\text{if } S = K_m, && C = \tfrac{1}{2} E_0 \text{ and } dP/dt = \tfrac{1}{2} V_m.
\end{aligned}
$$

Therefore $V_m$ is the limiting reaction velocity obtained by saturating the reaction with substrate so that every enzyme is bound; and $K_m$ is the concentration of S at which only one-half of the enzymes are bound and the reaction proceeds at one-half maximum velocity.
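The quasi-steady-state result can be illustrated numerically. The sketch below holds the substrate concentration fixed (as when substrate is continuously supplied), integrates (6.4) with a simple forward Euler step, and confirms that the complex relaxes to the quasi-steady-state value, so that $k_2 C$ matches (6.5). All function names and rate-constant values are hypothetical choices made for illustration.

```python
def complex_qssa(s, e0, k1, km1, k2):
    """Quasi-steady-state complex concentration obtained from dC/dt = 0 in (6.4)."""
    return k1 * e0 * s / (km1 + k2 + k1 * s)

def velocity(s, vm, km):
    """Michaelis-Menten reaction velocity (6.5)."""
    return vm * s / (km + s)

def relax_complex(s, e0, k1, km1, k2, t_end=5.0, dt=1e-3):
    """Integrate (6.4) from C(0) = 0 with S held fixed, using forward Euler."""
    c = 0.0
    for _ in range(int(round(t_end / dt))):
        c += dt * (k1 * e0 * s - (km1 + k2 + k1 * s) * c)
    return c

k1, km1, k2, e0 = 2.0, 1.0, 3.0, 0.5  # illustrative rate constants (km1 stands for k_{-1})
km = (km1 + k2) / k1                   # Michaelis-Menten constant (6.6)
vm = k2 * e0                           # maximum velocity (6.6)
s = 0.7
print(relax_complex(s, e0, k1, km1, k2), complex_qssa(s, e0, k1, km1, k2))
print(k2 * complex_qssa(s, e0, k1, km1, k2), velocity(s, vm, km))
print(velocity(km, vm, km), 0.5 * vm)  # half-maximal velocity at S = Km
```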

6.3 Competitive inhibition

Competitive inhibition occurs when inhibitor molecules compete with substrate molecules for binding to the same enzyme active site. When an inhibitor is bound to the enzyme, no product is produced so competitive inhibition will reduce the velocity of the reaction. The cartoon of this process is shown in Fig. 6.2.

Figure 6.2: Competitive inhibition. (Drawn by G. Andruk, released under the GNU Free Documentation License.)

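To model competitive inhibition, we introduce an additional reaction associated with the inhibitor-enzyme binding:

$$
S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C_1 \overset{k_2}{\rightarrow} P + E,
\qquad
I + E \underset{k_{-3}}{\overset{k_3}{\rightleftharpoons}} C_2.
$$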

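With more complicated enzymatic reactions, the reaction schematic becomes difficult to interpret. Perhaps an easier way to visualize the reaction is from the following redrawn schematic:

$$
\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \overset{k_2}{\longrightarrow} & P + E \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & & & \\
C_2 & & & &
\end{array}
$$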

Here, the substrate S and inhibitor I are combined with the relevant rate constants, rather than treated separately. It is immediately obvious from this redrawn schematic that inhibition is accomplished by sequestering enzyme in the form $C_2$, preventing its participation in the catalysis of S to P.

Our goal is to determine the reaction velocity dP/dt in terms of the substrate and inhibitor concentrations, and the total concentration of enzyme (free and bound). The law of mass action applied to the complex concentrations and the product results in

$$
\begin{aligned}
\frac{dC_1}{dt} &= k_1 SE - (k_{-1} + k_2)C_1, \\
\frac{dC_2}{dt} &= k_3 IE - k_{-3}C_2, \\
\frac{dP}{dt} &= k_2 C_1.
\end{aligned}
$$

The enzyme, free and bound, is conserved so that

$$
\frac{d}{dt}(E + C_1 + C_2) = 0 \implies E + C_1 + C_2 = E_0 \implies E = E_0 - C_1 - C_2.
$$

Under the quasi-equilibrium approximation, $dC_1/dt = dC_2/dt = 0$, so that

$$
\begin{aligned}
k_1 S(E_0 - C_1 - C_2) - (k_{-1} + k_2)C_1 &= 0, \\
k_3 I(E_0 - C_1 - C_2) - k_{-3}C_2 &= 0,
\end{aligned}
$$

which results in the following system of two linear equations and two unknowns ($C_1$ and $C_2$):

$$
(k_{-1} + k_2 + k_1 S)C_1 + k_1 S C_2 = k_1 E_0 S, \tag{6.7}
$$

$$
k_3 I C_1 + (k_{-3} + k_3 I)C_2 = k_3 E_0 I. \tag{6.8}
$$

We define the Michaelis-Menten constant $K_m$ as before, and an additional constant $K_i$ associated with the inhibitor reaction:

$$
K_m = \frac{k_{-1} + k_2}{k_1}, \qquad K_i = \frac{k_{-3}}{k_3}.
$$

Dividing (6.7) by k1 and (6.8) by k3 yields

$$
(K_m + S)C_1 + S C_2 = E_0 S, \tag{6.9}
$$

$$
I C_1 + (K_i + I)C_2 = E_0 I. \tag{6.10}
$$

Since our goal is to obtain the velocity of the reaction, which requires determining $C_1$, we multiply (6.9) by $(K_i + I)$ and (6.10) by S, and subtract:

$$
\begin{array}{rcl}
(K_m + S)(K_i + I)C_1 + S(K_i + I)C_2 & = & E_0 (K_i + I) S \\
-\quad SIC_1 + S(K_i + I)C_2 & = & E_0 SI \\
\hline
\left[(K_m + S)(K_i + I) - SI\right] C_1 & = & K_i E_0 S;
\end{array}
$$

or after cancellation and rearrangement

$$
\begin{aligned}
C_1 &= \frac{K_i E_0 S}{K_m K_i + K_i S + K_m I} \\
&= \frac{E_0 S}{K_m(1 + I/K_i) + S}.
\end{aligned}
$$


Figure 6.3: Allosteric inhibition. (Unknown artist, released under the GNU Free Documentation License.)

Therefore, the reaction velocity is given by

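$$
\begin{aligned}
\frac{dP}{dt} &= \frac{(k_2 E_0) S}{K_m(1 + I/K_i) + S} \\
&= \frac{V_m S}{K'_m + S},
\end{aligned}
\tag{6.11}
$$

where

$$
V_m = k_2 E_0, \qquad K'_m = K_m(1 + I/K_i). \tag{6.12}
$$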

Comparing the inhibited reaction velocity (6.11) and (6.12) with the uninhibited reaction velocity (6.5) and (6.6), we observe that inhibition increases the Michaelis-Menten constant of the reaction, but leaves unchanged the maximum reaction velocity. Since the Michaelis-Menten constant is defined as the substrate concentration required to attain one-half of the maximum reaction velocity, addition of an inhibitor with substrate concentration fixed acts to decrease the reaction velocity. However, a reaction saturated with substrate still attains the uninhibited maximum reaction velocity.
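The algebra leading to (6.11) can be verified by solving the linear system (6.9)-(6.10) directly. The sketch below uses Cramer's rule for the 2-by-2 system and compares $k_2 C_1$ with the closed form; the function names and the numerical values are assumptions made for illustration only.

```python
def complexes_competitive(s, i, e0, km, ki):
    """Solve the linear system (6.9)-(6.10) for C1 and C2 by Cramer's rule."""
    a11, a12, b1 = km + s, s, e0 * s
    a21, a22, b2 = i, ki + i, e0 * i
    det = a11 * a22 - a12 * a21
    c1 = (b1 * a22 - a12 * b2) / det
    c2 = (a11 * b2 - a21 * b1) / det
    return c1, c2

def velocity_competitive(s, i, vm, km, ki):
    """Reaction velocity (6.11) with the modified constant K'_m = K_m (1 + I/K_i)."""
    return vm * s / (km * (1.0 + i / ki) + s)

k2, e0, km, ki = 3.0, 0.5, 2.0, 0.5  # illustrative values
vm = k2 * e0
for s, i in ((1.5, 1.0), (1.5, 0.0), (1e6, 1.0)):
    c1, _ = complexes_competitive(s, i, e0, km, ki)
    print(s, i, k2 * c1, velocity_competitive(s, i, vm, km, ki))
```

With the inhibitor present the velocity at moderate S drops, while the last row (substrate saturation) stays close to the uninhibited maximum $V_m$.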

6.4 Allosteric inhibition

The term allostery comes from the Greek allos, meaning different, and stereos, meaning solid, and refers to an enzyme with a regulatory binding site separate from its active binding site. In our model for allosteric inhibition, an inhibitor molecule is assumed to bind to its own regulatory site on the enzyme, resulting in either a lowered binding affinity of the substrate to the enzyme, or a lowered conversion rate of substrate to product. A cartoon of allosteric inhibition due to a lowered binding affinity is shown in Fig. 6.3.

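In general, we need to define three complexes: $C_1$ is the complex formed from substrate and enzyme; $C_2$ from inhibitor and enzyme; and $C_3$ from substrate, inhibitor, and enzyme. We write the chemical reactions as follows:

$$
\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \overset{k_2}{\longrightarrow} & P + E \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & {\scriptstyle k'_3 I}\,\downarrow\uparrow\,{\scriptstyle k'_{-3}} & & \\
C_2 & \underset{k'_{-1}}{\overset{k'_1 S}{\rightleftharpoons}} & C_3 & \overset{k'_2}{\longrightarrow} & P + C_2
\end{array}
$$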

The general model for allosteric inhibition with ten independent rate constants appears to be complicated and may be difficult to analyze. Accordingly, we simplify the model to reduce the number of rate constants, yet still exhibit the unique features of allosteric inhibition. Now, the binding of both S and I to E at different sites defines allostery and must be retained in our model. One possible but uninteresting simplification assumes that if I binds to E, then S doesn’t. However, this reduces allosteric inhibition to competitive inhibition and we do not consider it further. Instead, we simplify by allowing both I and S to simultaneously bind to E, but assume that the binding of I prevents substrate conversion to product. With this simplification, $k'_2 = 0$. To further reduce the number of independent rate constants, we assume that the binding of S to E is unaffected by the bound presence of I, and the binding of I to E is unaffected by the bound presence of S. These approximations imply that all the primed rate constants equal the corresponding unprimed rate constants, e.g., $k'_1 = k_1$, etc. With these simplifications, the schematic of the chemical reaction reduces to

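$$
\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \overset{k_2}{\longrightarrow} & P + E \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & {\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
C_2 & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_3 & &
\end{array}
$$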

where now there are only five independent rate constants. We write the equations for the complexes using the law of mass action:

$$
\frac{dC_1}{dt} = k_1 SE + k_{-3}C_3 - (k_{-1} + k_2 + k_3 I)C_1, \tag{6.13}
$$

$$
\frac{dC_2}{dt} = k_3 IE + k_{-1}C_3 - (k_{-3} + k_1 S)C_2, \tag{6.14}
$$

$$
\frac{dC_3}{dt} = k_3 I C_1 + k_1 S C_2 - (k_{-1} + k_{-3})C_3, \tag{6.15}
$$

while the reaction velocity is given by

$$
\frac{dP}{dt} = k_2 C_1. \tag{6.16}
$$

Again, enzyme free and bound is conserved, so that $E = E_0 - C_1 - C_2 - C_3$. With the quasi-equilibrium approximation $dC_1/dt = dC_2/dt = dC_3/dt = 0$, we obtain a system of three equations and three unknowns: $C_1$, $C_2$ and $C_3$. Despite our simplifications, the analytical solution for the reaction velocity remains messy (see Keener & Sneyd, referenced at the chapter end) and not especially illuminating. We omit the complete analytical result here, and determine only the maximum reaction velocity.

The maximum reaction velocity $V'_m$ for the allosteric inhibited reaction is defined as the time-derivative of the product concentration when the reaction is saturated with substrate, that is

$$
V'_m = \lim_{S \to \infty} dP/dt = k_2 \lim_{S \to \infty} C_1.
$$

With substrate saturation, every enzyme will have its substrate binding site occupied. Enzymes are either bound with only substrate in the complex $C_1$, or bound together with substrate and inhibitor in the complex $C_3$. Accordingly, the schematic of the chemical reaction with substrate saturation simplifies to

$$
\begin{array}{ccc}
C_1 & \overset{k_2}{\longrightarrow} & P + C_1 \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
C_3 & &
\end{array}
$$

The equations for C1 and C3 with substrate saturation are thus given by

$$
\frac{dC_1}{dt} = k_{-3}C_3 - k_3 I C_1, \tag{6.17}
$$

$$
\frac{dC_3}{dt} = k_3 I C_1 - k_{-3}C_3, \tag{6.18}
$$

and the quasi-equilibrium approximation yields the single independent equation

$$
\begin{aligned}
C_3 &= (k_3/k_{-3}) I C_1 \\
&= (I/K_i) C_1,
\end{aligned}
\tag{6.19}
$$

with $K_i = k_{-3}/k_3$ as before. The equation expressing the conservation of enzyme is given by $E_0 = C_1 + C_3$. This conservation law together with (6.19) permits solution for $C_1$:

$$
C_1 = \frac{E_0}{1 + I/K_i}.
$$

Therefore, the maximum reaction velocity for the allosteric inhibited reaction is given by

$$
\begin{aligned}
V'_m &= \frac{k_2 E_0}{1 + I/K_i} \\
&= \frac{V_m}{1 + I/K_i},
\end{aligned}
$$

where $V_m$ is the maximum reaction velocity of both the uninhibited and the competitive inhibited reaction. The allosteric inhibitor is thus seen to reduce the maximum velocity of the uninhibited reaction by the factor $(1 + I/K_i)$, which may be large if the concentration of allosteric inhibitor is substantial.
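Although the full analytical solution is omitted, the quasi-equilibrium system is easy to solve numerically. The sketch below sets the right-hand sides of (6.13)-(6.15) to zero, substitutes $E = E_0 - C_1 - C_2 - C_3$, solves the resulting 3-by-3 linear system, and checks that the velocity at very large S approaches $V_m/(1 + I/K_i)$. The helper `solve_linear`, the function `velocity_allosteric` and all numerical values are hypothetical and serve only as an illustration.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x

def velocity_allosteric(s, i, e0, k1, km1, k2, k3, km3):
    """k2*C1 from (6.13)-(6.15) at quasi-equilibrium; km1, km3 stand for k_{-1}, k_{-3}."""
    a = [
        [k1 * s + km1 + k2 + k3 * i, k1 * s, k1 * s - km3],   # from (6.13)
        [k3 * i, k3 * i + km3 + k1 * s, k3 * i - km1],        # from (6.14)
        [-k3 * i, -k1 * s, km1 + km3],                        # from (6.15)
    ]
    b = [k1 * s * e0, k3 * i * e0, 0.0]
    c1, _, _ = solve_linear(a, b)
    return k2 * c1

e0, k1, km1, k2, k3, km3 = 1.0, 1.0, 1.0, 2.0, 1.0, 0.5  # illustrative values
inhibitor = 1.5
ki = km3 / k3
print(velocity_allosteric(1e7, inhibitor, e0, k1, km1, k2, k3, km3))
print(k2 * e0 / (1.0 + inhibitor / ki))  # predicted maximum velocity V'_m
```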

6.5 Cooperativity

Enzymes and other protein complexes may have multiple binding sites, and when a substrate binds to one of these sites, the other sites may become more active. A well-studied example is the binding of the oxygen molecule to the hemoglobin protein. Hemoglobin can bind four molecules of O2, and when three molecules are bound, the fourth molecule has been demonstrated to have an increased affinity for binding.

We will model cooperativity by assuming that an enzyme has two separated but indistinguishable binding sites for a substrate S. For example, the enzyme may be a protein dimer, composed of two identical subproteins with identical binding sites for S. Because the two binding sites are indistinguishable, we need consider only two complexes: $C_1$ and $C_2$, with enzyme bound to one or two substrate molecules, respectively. When the enzyme exhibits cooperativity, the binding of the second substrate molecule has a greater rate constant than the binding of the first. We therefore consider the following reaction:

$$
\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \overset{k_2}{\longrightarrow} & P + E \\
 & & {\scriptstyle k_3 S}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
 & & C_2 & \overset{k_4}{\longrightarrow} & P + C_1
\end{array}
$$

where cooperativity supposes that $k_1 \ll k_3$. Application of the law of mass action results in

$$
\begin{aligned}
\frac{dC_1}{dt} &= k_1 SE + (k_{-3} + k_4)C_2 - (k_{-1} + k_2 + k_3 S)C_1, \\
\frac{dC_2}{dt} &= k_3 S C_1 - (k_{-3} + k_4)C_2.
\end{aligned}
$$

Applying the quasi-equilibrium approximation $dC_1/dt = dC_2/dt = 0$ and the conservation law $E_0 = E + C_1 + C_2$ results in the following system of two equations and two unknowns:

$$
(k_{-1} + k_2 + (k_1 + k_3)S)C_1 - (k_{-3} + k_4 - k_1 S)C_2 = k_1 E_0 S, \tag{6.20}
$$

$$
k_3 S C_1 - (k_{-3} + k_4)C_2 = 0. \tag{6.21}
$$

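We divide both (6.20) and (6.21) by $k_3$, and define

$$
K_1 = \frac{k_{-1} + k_2}{k_1}, \qquad K_2 = \frac{k_{-3} + k_4}{k_3}, \qquad \varepsilon = k_1/k_3,
$$

to obtain

$$
(\varepsilon K_1 + (1 + \varepsilon)S)C_1 - (K_2 - \varepsilon S)C_2 = \varepsilon E_0 S, \tag{6.22}
$$

$$
S C_1 - K_2 C_2 = 0. \tag{6.23}
$$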

We can subtract (6.23) from (6.22) and cancel $\varepsilon$ to obtain

$$
(K_1 + S)C_1 + S C_2 = E_0 S. \tag{6.24}
$$

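Equations (6.23) and (6.24) can be solved for $C_1$ and $C_2$:

$$
C_1 = \frac{K_2 E_0 S}{S^2 + K_2 S + K_1 K_2}, \tag{6.25}
$$

$$
C_2 = \frac{E_0 S^2}{S^2 + K_2 S + K_1 K_2}, \tag{6.26}
$$

so that the reaction velocity is given by

$$
\begin{aligned}
\frac{dP}{dt} &= k_2 C_1 + k_4 C_2 \\
&= \frac{(k_2 K_2 + k_4 S) E_0 S}{S^2 + K_2 S + K_1 K_2}.
\end{aligned}
\tag{6.27}
$$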

To illuminate this result, we consider two limiting cases: (i) the active sites act independently so that each protein dimer, say, can be considered as two independent protein monomers; (ii) the enzyme exhibits extreme cooperativity so that the binding of the second substrate has a much greater rate constant than the binding of the first.

Independent active sites

The free enzyme E has two independent binding sites while $C_1$ has only a single binding site. Consulting the reaction schematic: $k_1$ is the rate constant for the binding of S to two independent binding sites; $k_{-1}$ and $k_2$ are the rate constants for the dissociation and conversion of a single S from the enzyme; $k_3$ is the rate constant for the binding of S to a single free binding site; and $k_{-3}$ and $k_4$ are the rate constants for the dissociation and conversion of one of two independent S’s from the enzyme. Accounting for these factors of two, and assuming independence of active sites, we have

$$
k_1 = 2k_3, \qquad k_{-3} = 2k_{-1}, \qquad k_4 = 2k_2.
$$

We define the Michaelis-Menten constant $K_M$ representative of the protein monomer with one binding site, that is,

$$
K_M = \frac{k_{-1} + k_2}{k_3} = 2K_1 = \frac{1}{2}K_2.
$$

Therefore, for independent active sites, the reaction velocity becomes

$$
\begin{aligned}
\frac{dP}{dt} &= \frac{(2k_2 K_M + 2k_2 S) E_0 S}{S^2 + 2K_M S + K_M^2} \\
&= \frac{2k_2 E_0 S}{S + K_M}.
\end{aligned}
$$

As one could expect, the reaction velocity for a dimer protein enzyme composed of independent identical monomers is simply double that of a monomer protein enzyme.
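The reduction of (6.27) to twice the monomer velocity can be confirmed numerically. In the sketch below, the function name `velocity_two_sites` and the numerical rate constants are illustrative assumptions; the relations $k_1 = 2k_3$, $k_{-3} = 2k_{-1}$ and $k_4 = 2k_2$ are those stated above for independent sites.

```python
def velocity_two_sites(s, e0, k1, km1, k2, k3, km3, k4):
    """Reaction velocity (6.27); km1 and km3 stand for k_{-1} and k_{-3}."""
    big_k1 = (km1 + k2) / k1
    big_k2 = (km3 + k4) / k3
    return (k2 * big_k2 + k4 * s) * e0 * s / (s * s + big_k2 * s + big_k1 * big_k2)

k3, km1, k2, e0 = 1.0, 0.5, 1.5, 1.0   # illustrative monomer constants
k_m = (km1 + k2) / k3                   # monomer Michaelis-Menten constant K_M
for s in (0.1, 1.0, 10.0):
    dimer = velocity_two_sites(s, e0, 2.0 * k3, km1, k2, k3, 2.0 * km1, 2.0 * k2)
    monomer = k2 * e0 * s / (s + k_m)
    print(s, dimer, 2.0 * monomer)  # the last two columns coincide
```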

High cooperativity

We now assume that after the first substrate binds to the enzyme, the second substrate binds much more easily, so that $k_1 \ll k_3$. The number of enzymes bound to a single substrate molecule should consequently be much less than the number bound to two substrate molecules, resulting in $C_1 \ll C_2$. Dividing (6.25) by (6.26), this inequality becomes

$$
\frac{C_1}{C_2} = \frac{K_2}{S} \ll 1.
$$

Dividing top and bottom of (6.27) by $S^2$, we have

$$
\frac{dP}{dt} = \frac{\left( k_2 \dfrac{K_2}{S} + k_4 \right) E_0}{1 + \dfrac{K_2}{S} + \dfrac{K_1}{S}\dfrac{K_2}{S}}.
$$

To take the limit of this expression as $K_2/S \to 0$, we set $K_2/S = 0$ everywhere except in the last term in the denominator, since $K_1/S$ is inversely proportional to $k_1$ and may go to infinity in this limit. Taking the limit and multiplying the numerator and denominator by $S^2$,

$$
\frac{dP}{dt} = \frac{k_4 E_0 S^2}{S^2 + K_1 K_2}.
$$

Here, the maximum reaction velocity is $V_m = k_4 E_0$, and the modified Michaelis-Menten constant is $K_M = \sqrt{K_1 K_2}$, so that

$$
\frac{dP}{dt} = \frac{V_m S^2}{S^2 + K_M^2}.
$$

In biochemistry, this reaction velocity is generalized to

$$
\frac{dP}{dt} = \frac{V_m S^n}{S^n + K_M^n},
$$

known as the Hill equation. By varying n, experimental data can be fit using this formula.
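To see how closely the Hill form with n = 2 tracks the full velocity (6.27) when $k_1 \ll k_3$, the sketch below evaluates both. The function names and the rate constants (chosen so that $k_1/k_3 = 10^{-6}$) are illustrative assumptions only.

```python
import math

def hill(s, vm, k_m, n):
    """Hill equation: vm * s**n / (s**n + k_m**n)."""
    return vm * s ** n / (s ** n + k_m ** n)

def velocity_two_sites(s, e0, k1, km1, k2, k3, km3, k4):
    """Reaction velocity (6.27); km1 and km3 stand for k_{-1} and k_{-3}."""
    big_k1 = (km1 + k2) / k1
    big_k2 = (km3 + k4) / k3
    return (k2 * big_k2 + k4 * s) * e0 * s / (s * s + big_k2 * s + big_k1 * big_k2)

# Strong cooperativity: the second binding step is much faster than the first.
e0, k1, km1, k2, k3, km3, k4 = 1.0, 1e-3, 1.0, 1.0, 1e3, 1.0, 1.0
vm = k4 * e0
k_m = math.sqrt(((km1 + k2) / k1) * ((km3 + k4) / k3))  # sqrt(K1 K2)
for s in (0.5, 2.0, 8.0):
    print(s, velocity_two_sites(s, e0, k1, km1, k2, k3, km3, k4), hill(s, vm, k_m, 2))
```

Whatever the value of n, the Hill velocity equals one-half of $V_m$ at $S = K_M$; larger n only makes the transition around $K_M$ steeper.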


References

Keener, J. & Sneyd, J. Mathematical Physiology. Springer-Verlag, New York (1998). Pg. 30, Exercise 2.

Chapter 7

Sequence Alignment

The software program BLAST (Basic Local Alignment Search Tool) uses sequence alignment algorithms to compare a query sequence against a database to identify other known sequences similar to the query sequence. Often, the annotations attached to the already known sequences yield important biological information about the query sequence. Almost all biologists use BLAST, making sequence alignment one of the most important algorithms of bioinformatics.

The sequence under study can be composed of nucleotides (from the nucleic acids DNA or RNA) or amino acids (from proteins). Nucleic acids chain together four different nucleotides: A,C,T,G for DNA and A,C,U,G for RNA; proteins chain together twenty different amino acids. The sequence of a DNA molecule or of a protein is the linear order of nucleotides or amino acids in a specified direction, defined by the chemistry of the molecule. There is no need for us to know the exact details of the chemistry; it is sufficient to know that a protein has distinguishable ends called the N-terminus and the C-terminus, and that the usual convention is to read the amino acid sequence from N-terminus to C-terminus. Specification of direction is more complicated for a DNA molecule than for a protein molecule because of the double helix structure of DNA, and this will be explained in Section 7.1.

The basic sequence alignment algorithm aligns two or more sequences to highlight their similarity, inserting a small number of gaps into each sequence (usually denoted by dashes) to align wherever possible identical or similar characters. For instance, Fig. 7.1 presents an alignment using the software tool ClustalW of the hemoglobin beta-chain from Human, Chimpanzee, Rat, and Zebra fish. The human and chimpanzee sequences are identical, a consequence of our very close evolutionary relationship. The rat sequence differs from human/chimpanzee at only 27 out of 146 amino acids; we are all mammals. The zebra fish sequence, though clearly related, diverges significantly. Notice the insertion of a gap in each of the mammal sequences at the zebra fish amino acid position 122. This permits the subsequent zebra fish sequence to better align with the mammal sequences, and implies either an insertion of a new amino acid in fish, or a deletion of an amino acid in mammals. The insertion or deletion of a character in a sequence is called an indel. A mismatch in sequence, such as those occurring between Zebra fish and mammals at amino acid positions 2 and 3, is called a mutation. ClustalW places a ’*’ on the last line to denote exact amino acid matches across all sequences, and a ’:’ and ’.’ to denote chemically similar amino acids across all sequences (each amino acid has characteristic chemical properties, and can be grouped according to similar properties). In this chapter we detail the algorithms used to align sequences.

CLUSTAL W (1.83) multiple sequence alignment

Human       VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV 60
Chimpanzee  VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV 60
Rat         VHLTDAEKAAVNGLWGKVNPDDVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKV 60
Zebrafish   VEWTDAERTAILGLWGKLNIDEIGPQALSRCLIVYPWTQRYFATFGNLSSPAAIMGNPKV 60
            *. *  *::*: .****:* *::* :**.* *:*******:* :**:**:. *:******

Human       KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK 120
Chimpanzee  KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK 120
Rat         KAHGKKVINAFNDGLKHLDNLKGTFAHLSELHCDKLHVDPENFRLLGNMIVIVLGHHLGK 120
Zebrafish   AAHGRTVMGGLERAIKNMDNVKNTYAALSVMHSEKLHVDPDNFRLLADCITVCAAMKFGQ 120
             ***:.*:..:. .: ::**:*.*:* ** :*.:******:*****.: :.   . ::*:

Human       E-FTPPVQAAYQKVVAGVANALAHKYH 146
Chimpanzee  E-FTPPVQAAYQKVVAGVANALAHKYH 146
Rat         E-FTPCAQAAFQKVVAGVASALAHKYH 146
Zebrafish   AGFNADVQEAWQKFLAVVVSALCRQYH 147
              *.. .* *:**.:* *..**.::**

Figure 7.1: Multiple alignment of the hemoglobin beta-chain for Human, Chimpanzee, Rat and Zebra fish, obtained using ClustalW.

7.1 The minimum you need to know about DNA chemistry and the genetic code

In one of the most important scientific papers ever published, James Watson and Francis Crick in 1953 elucidated the structure of DNA using a three-dimensional molecular model, Fig. 7.2. The DNA molecule consists of two strands wound around each other to form the now famous double helix. Arbitrarily, one strand is labeled by the sequencing group to be the positive strand, and the other the negative strand. The two strands of the DNA molecule bind to each other by base pairing: the bases of one strand pair with the bases of the other strand. Adenine (A) always pairs with thymine (T), and guanine (G) always pairs with cytosine (C): A with T, G with C. For RNA, T is replaced by uracil (U). When reading the sequence of nucleotides from a single strand, the direction of the read must be specified, and this is possible by referring to the chemical bonds of the DNA backbone. There are of course only two possible directions to read a linear sequence of bases, and these are denoted as 5’-to-3’ and 3’-to-5’. Importantly, the two separate strands of the DNA molecule are oriented in opposite directions. Below is the beginning of the DNA coding sequence for the human hemoglobin beta chain protein discussed earlier:

5’-GTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTG-3’
   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3’-CACGTGGACTGAGGACTCCTCTTCAGACGGCAATGACGGGACACCCCGTTCCACTTGCAC-5’

Figure 7.2: James Watson and Francis Crick posing in front of their DNA model. The original photograph was taken in 1953, the year of discovery, and was recreated in 2003, fifty years later. Francis Crick, the man on the right, died in 2004.

It is important to realize that there are two unique DNA sequences here, and either one, or even both, can be coding. Reading from 5’-to-3’, the upper sequence begins ’GTGCACCTG...’, while the lower sequence ends ’...CAGGTGCAC’. Here, only the upper sequence codes for the human hemoglobin beta chain, and the lower sequence is non-coding.

How is the DNA code read? Enzymes separate the two strands of DNA, and transcription occurs as the DNA sequence is copied into messenger RNA (mRNA). If the upper strand is the coding sequence, then the complementary lower strand serves as the template for constructing the mRNA sequence. The ACUG nucleotides of the mRNA bind to the lower sequence and construct a single stranded mRNA molecule containing the sequence ’GUGCACCUG...’, which exactly matches the sequence of the upper, coding strand, but with T replaced by U. This mRNA is subsequently translated in the ribosome of the cell, where each nucleotide triplet codes for a single amino acid. The triplet coding of nucleotides for amino acids is the famous genetic code, shown in Fig. 7.3. Here, the translation to amino acid sequence is ’VHL...’, where we have used the genetic code ’GUG’ = V, ’CAC’ = H, ’CUG’ = L. The three out of the twenty amino acids used here are V = Valine, H = Histidine, and L = Leucine.
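Base pairing, transcription and translation of the sequence above can be reproduced in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the codon table is deliberately partial, containing just the standard codons that occur in this 60-nucleotide fragment rather than the full genetic code of Fig. 7.3.

```python
CODING = "GTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTG"          # upper strand, 5'-to-3'
TEMPLATE_3_TO_5 = "CACGTGGACTGAGGACTCCTCTTCAGACGGCAATGACGGGACACCCCGTTCCACTTGCAC"  # lower strand, 3'-to-5'

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> pairing mRNA base

# Partial codon table: only the codons needed for this fragment.
CODONS = {
    "GUG": "V", "GUU": "V", "CAC": "H", "CUG": "L", "ACU": "T", "CCU": "P",
    "GAG": "E", "AAG": "K", "UCU": "S", "GCC": "A", "UGG": "W", "GGC": "G",
    "AAC": "N",
}

def complement(dna):
    """Base-pair partner of each nucleotide, position by position."""
    return "".join(DNA_PAIR[base] for base in dna)

def transcribe(template_3_to_5):
    """mRNA (read 5'-to-3') built on a template strand that is read 3'-to-5'."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5)

def translate(mrna):
    """Translate successive nucleotide triplets using the partial codon table."""
    return "".join(CODONS[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))

mrna = transcribe(TEMPLATE_3_TO_5)
print(complement(CODING) == TEMPLATE_3_TO_5)
print(mrna[:9])
print(translate(mrna))
```

The translated string can be compared with the first twenty amino acids of the human sequence in Fig. 7.1.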

7.2 Sequence alignment by brute force

One (bad) approach to sequence alignment is to align the two sequences in all possible ways, score the alignments (assuming some scoring system) and find the highest scoring alignment. The problem with this brute force approach is that the number of possible alignments is an exponential function of sequence length. For sequences of reasonable length, the computation is already impossible. For example, the number of ways to align two sequences of 50 characters each (a rather small alignment problem) is about $1.5 \times 10^{37}$, already an astonishingly large number. It is informative to count the number of possible alignments between two sequences since a similar algorithm is used for sequence alignment.

Figure 7.3: The genetic code.

Suppose we want to align two sequences. Gaps in either sequence are allowed but a gap cannot be aligned with a gap. By way of illustration, we demonstrate the three ways that the first character of the capital letter alphabet and the lower case alphabet may align:

A     -A     A-
|  ,  ||  ,  ||
a     a-     -a

and the five ways in which the first two characters of the capital letter alphabet can align with the first character of the lower case alphabet:

AB     AB     AB-     A-B     -AB
||  ,  ||  ,  |||  ,  |||  ,  ||| .
-a     a-     --a     -a-     a--

A recursion relation for the total number of possible alignments of a sequence of i characters with a sequence of j characters may be derived by considering the alignment of the last character. There are three possibilities which we illustrate assuming the ith character is "F" and the jth character is "d":

(1) i − 1 characters of the first sequence are already aligned with j − 1 characters of the second sequence, and the ith character of the first sequence aligns exactly with the jth character of the second sequence:

...F

||||

...d

(2) i − 1 characters of the first sequence are aligned with j characters of the second sequence and the ith character of the first sequence aligns with a gap in the second sequence:

...F

||||

...-

(3) i characters of the first sequence are aligned with j − 1 characters of the second sequence and a gap in the first sequence aligns with the jth character of the second sequence:

...-

||||

...d

If C(i, j) is the number of ways to align an i character sequence with a j character sequence, then from our counting

$$
C(i, j) = C(i - 1, j - 1) + C(i - 1, j) + C(i, j - 1). \tag{7.1}
$$

This recursion relation requires boundary conditions to solve. Since there is only one way to align an i > 0 character sequence against a zero character sequence, i.e., i characters against i gaps, the boundary conditions are C(0, j) = C(i, 0) = 1 for all i, j > 0. We may also add the additional boundary condition C(0, 0) = 1, obtained from the known result C(1, 1) = 3.

Using the recursion relation (7.1), we can construct the following dynamic matrix to count the number of ways to align the two five character sequences a1a2a3a4a5 and b1b2b3b4b5:

      -    b1   b2   b3   b4   b5
-     1    1    1    1    1    1
a1    1    3    5    7    9    11
a2    1    5    13   25   41   61
a3    1    7    25   63   129  231
a4    1    9    41   129  321  681
a5    1    11   61   231  681  1683

The size of this dynamic matrix is 6 × 6, and for convenience we label the rows and columns starting from zero (i.e., row 0, row 1, . . . , row 5). This matrix was constructed by first writing − a1 a2 a3 a4 a5 to the left of the matrix and − b1 b2 b3 b4 b5 above the matrix, then filling in ones across the zeroth row and down the zeroth column to satisfy the boundary conditions, and finally applying the recursion relation directly by going across the first row from left-to-right, the second row from left-to-right, etc. To demonstrate the filling in of the matrix, we have going across the first row: 1 + 1 + 1 = 3, 1 + 1 + 3 = 5, 1 + 1 + 5 = 7, etc., and going across the second row: 1 + 3 + 1 = 5, 3 + 5 + 5 = 13, 5 + 7 + 13 = 25, etc. Finally, the last element entered gives the number of ways to align two five character sequences: 1683, already a remarkably large number.

It is possible to solve analytically for C(i, j) the recursion relation (7.1), together with the boundary conditions, using generating functions. Although the solution method is interesting (and in fact was shown to me by a student), the final analytical result is messy and we omit it here. In general, computation of C(i, j) is best done numerically by constructing the dynamic matrix.
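Constructing the dynamic matrix numerically takes only a few lines. The sketch below fills the matrix row by row using (7.1) and the boundary conditions; the function name `count_alignments` is a hypothetical choice.

```python
def count_alignments(n, m):
    """Dynamic matrix C with C[i][j] = number of ways to align i characters with j characters."""
    # Boundary conditions: ones along the zeroth row and the zeroth column.
    c = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Recursion (7.1): match/mismatch, gap in second sequence, gap in first sequence.
            c[i][j] = c[i - 1][j - 1] + c[i - 1][j] + c[i][j - 1]
    return c

for row in count_alignments(5, 5):
    print(row)

# Python integers have arbitrary precision, so large cases need no special handling.
print(float(count_alignments(50, 50)[50][50]))
```

The cost is proportional to the number of matrix entries, in contrast to the exponentially growing count that the matrix records.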


7.3 Sequence alignment by dynamic programming

Two reasonably sized sequences cannot be aligned by brute force. Luckily, there is another algorithm borrowed from computer science, dynamic programming, that uses a dynamic matrix.

First, a scoring system is required to judge the quality of an alignment. The goal is to find the alignment that has maximum score. We assume that the alignment of character $a_i$ with character $b_j$ has score $S(a_i, b_j)$. For example, when aligning two DNA sequences, a match (A-A, C-C, T-T, G-G) may be scored as +2, and a mismatch (A-C, A-T, A-G, etc.) scored as −1. We also assume that an indel (a nucleotide aligned with a gap) is scored as g, with a typical value for DNA alignment g = −2. In the next section, we develop a better and more widely used model for indel scoring that distinguishes gap openings from gap extensions.

Now, let $T(i, j)$ denote the maximum score for aligning a sequence of length $i$ with a sequence of length $j$. We can compute $T(i, j)$ provided we know $T(i-1, j-1)$, $T(i-1, j)$ and $T(i, j-1)$. Indeed, our logic is similar to that used when counting the total number of alignments. There are again three ways to compute $T(i, j)$: (1) $i-1$ characters of the first sequence are already aligned with $j-1$ characters of the second sequence with maximum score $T(i-1, j-1)$, and the $i$th character of the first sequence then aligns with the $j$th character of the second sequence with updated maximum score $T(i-1, j-1) + S(a_i, b_j)$; (2) $i-1$ characters of the first sequence are aligned with $j$ characters of the second sequence with maximum score $T(i-1, j)$, and the $i$th character of the first sequence aligns with a gap in the second sequence with updated maximum score $T(i-1, j) + g$; or (3) $i$ characters of the first sequence are aligned with $j-1$ characters of the second sequence with maximum score $T(i, j-1)$, and a gap in the first sequence aligns with the $j$th character of the second sequence with updated maximum score $T(i, j-1) + g$. We then compare these three scores, and assign $T(i, j)$ the maximum. That is,

$$T(i, j) = \max \begin{cases} T(i-1, j-1) + S(a_i, b_j), \\ T(i-1, j) + g, \\ T(i, j-1) + g. \end{cases} \qquad (7.2)$$

Boundary conditions give the score of aligning a sequence with a null sequence of gaps, so that

$$T(i, 0) = T(0, i) = i g, \quad i > 0,$$

with $T(0, 0) = 0$.

The recursion (7.2), together with the boundary conditions, can be used to fill in a dynamic matrix. The score of the best alignment is then given by the last filled-in element of the matrix, which for aligning a sequence of length $n$ with a sequence of length $m$ is $T(n, m)$. Besides this score, however, we also want to determine the alignment itself. This is obtained by tracing back the path in the dynamic matrix followed to compute $T(i, j)$ at each matrix element. There could be more than one path, and the best alignment can be degenerate.

Sequence alignment is always done using computers and there are excellent software tools freely available on the web (see §7.6). Just to illustrate the dynamic programming algorithm, we compute by hand the dynamic matrix for aligning two short DNA sequences GGAT and GAATT, scoring a match as +2, a mismatch as −1 and an indel as −2:

    -   G   A   A   T   T
-   0  -2  -4  -6  -8 -10
G  -2   2   0  -2  -4  -6
G  -4   0   1  -1  -3  -5
A  -6  -2   2   3   1  -1
T  -8  -4   0   1   5   3

In our hand calculation, the two sequences to be aligned go to the left and above the dynamic matrix, leading with a gap character "-". Row 0 and column 0 are then filled by boundary conditions, starting with 0 in position (0, 0) and incrementing by the gap penalty −2 across row 0 and down column 0. The recursion relation (7.2) is then used to fill in the dynamic matrix one row at a time moving from left to right and top to bottom. To determine the (i, j) matrix element, three numbers must be compared and the maximum taken: (1) inspect the nucleotides to the left of row i and above column j and add +2 for a match or −1 for a mismatch to the (i − 1, j − 1) matrix element; (2) add −2 to the (i − 1, j) matrix element; (3) add −2 to the (i, j − 1) matrix element. For example, the first computed matrix element 2 at position (1, 1) was determined by taking the maximum of (1) 0 + 2 = 2, since G-G is a match; (2) −2 − 2 = −4; (3) −2 − 2 = −4. You can test your understanding of dynamic programming by computing the other matrix elements.
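The hand calculation can be checked with a short Python sketch that fills the dynamic matrix using recursion (7.2) and its boundary conditions. The function name `nw_matrix` and its keyword arguments are chosen here for illustration; the scoring values are those of the example (+2, −1, −2).

```python
def nw_matrix(a, b, match=2, mismatch=-1, gap=-2):
    """Fill the global-alignment dynamic matrix T for sequences a (rows) and b (columns)."""
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary conditions: T(i, 0) = i*g and T(0, j) = j*g.
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    # Recursion (7.2), one row at a time from left to right.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s,   # diagonal: a_i aligned with b_j
                          T[i - 1][j] + gap,     # vertical: a_i aligned with a gap
                          T[i][j - 1] + gap)     # horizontal: gap aligned with b_j
    return T

for row in nw_matrix("GGAT", "GAATT"):
    print(row)
```

The last printed row ends in 3, the score of the best global alignment.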

After the matrix is constructed, traceback to obtain the best alignment starts at the bottom-right element of the matrix, here the (4, 5) matrix element with entry 3. The matrix element used to compute 3 was either at (4, 4) (horizontal move) or at (3, 4) (diagonal move). The existence of two possibilities implies that the best alignment is degenerate. For now, we arbitrarily choose the diagonal move. Building the alignment from end to beginning with GGAT on top and GAATT on bottom:

T

|

T

We illustrate our current position in the dynamic matrix by eliminating all the elements that are not on the traceback path and are no longer accessible:

    -   G   A   A   T   T
-   0  -2  -4  -6  -8
G  -2   2   0  -2  -4
G  -4   0   1  -1  -3
A  -6  -2   2   3   1
T                       3

We start again from the 1 entry at (3, 4). This value came from the 3 entry at (3, 3) by a horizontal move. Therefore, the alignment is extended to

-T

||

TT


where a gap is inserted in the top sequence for a horizontal move. (A gap is inserted in the bottom sequence for a vertical move.) The dynamic matrix now looks like

    -   G   A   A   T   T
-   0  -2  -4  -6
G  -2   2   0  -2
G  -4   0   1  -1
A  -6  -2   2   3   1
T                       3

Starting again from the 3 entry at (3, 3), this value came from the 1 entry at (2, 2) in a diagonal move, extending the alignment to

A-T

|||

ATT

The dynamic matrix now looks like

    -   G   A   A   T   T
-   0  -2  -4
G  -2   2   0
G  -4   0   1
A               3   1
T                       3

Continuing in this fashion (try to do this), the final alignment is

GGA-T
: : :
GAATT

where it is customary to represent a matching character with a colon ':'. The traceback path in the dynamic matrix is

    -   G   A   A   T   T
-   0
G       2
G           1
A               3   1
T                       3

If the other degenerate path had been taken initially, the final alignment would be

GGAT-
: ::
GAATT

and the traceback path is

    -   G   A   A   T   T
-   0
G       2
G           1
A               3
T                   5   3


The score of both alignments is easily recalculated to be the same, with 2 − 1 + 2 − 2 + 2 = 3 and 2 − 1 + 2 + 2 − 2 = 3.
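The traceback, including the degeneracy, can also be sketched in Python. The function below (the name `nw_all_alignments` is chosen here for illustration) refills the dynamic matrix with recursion (7.2) and then follows every move that reproduces the stored score, so that all best alignments are returned rather than one arbitrary choice.

```python
def nw_all_alignments(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best global score and all alignments (top, bottom) achieving it."""
    n, m = len(a), len(b)

    def score(x, y):
        return match if x == y else mismatch

    # Fill the dynamic matrix with recursion (7.2) and its boundary conditions.
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(T[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          T[i - 1][j] + gap,
                          T[i][j - 1] + gap)

    results = []

    def trace(i, j, top, bottom):
        # Build the alignment from end to beginning, branching on ties.
        if i == 0 and j == 0:
            results.append((top, bottom))
            return
        if i > 0 and j > 0 and T[i][j] == T[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            trace(i - 1, j - 1, a[i - 1] + top, b[j - 1] + bottom)  # diagonal move
        if i > 0 and T[i][j] == T[i - 1][j] + gap:
            trace(i - 1, j, a[i - 1] + top, "-" + bottom)           # vertical move
        if j > 0 and T[i][j] == T[i][j - 1] + gap:
            trace(i, j - 1, "-" + top, b[j - 1] + bottom)           # horizontal move

    trace(n, m, "", "")
    return T[n][m], results

best, alignments = nw_all_alignments("GGAT", "GAATT")
print(best)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For GGAT and GAATT this reports a score of 3 and the two alignments found by hand. The recursive traceback is adequate for short sequences; for long or highly degenerate alignments the number of paths can become very large.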

The algorithm for aligning two proteins is similar, except match and mismatch scores depend on the pair of aligning amino acids. With twenty different amino acids found in proteins, the score is represented by a 20 × 20 substitution matrix. The most commonly used matrices are the PAM series and BLOSUM series of matrices, with BLOSUM62 the commonly used default matrix.

7.4 Gap opening and gap extension penalties

Empirical evidence suggests that gaps cluster, in both nucleotide and protein sequences. Clustering is usually modeled by different penalties for gap opening ($g_o$) and gap extension ($g_e$), with $g_o < g_e < 0$. For example, the default scoring scheme for the widely used BLASTN software is +1 for a nucleotide match, −3 for a nucleotide mismatch, −5 for a gap opening, and −2 for a gap extension.

Two types of gaps (opening and extension) complicate the dynamic programming algorithm. When an indel is added to an existing alignment, the scoring increment depends on whether the indel is a gap opening or a gap extension. For example, the extended alignment

AB      AB-
||  ->  |||
ab      abc

adds to the score a gap opening penalty $g_o$, whereas

A-      A--
||  ->  |||
ab      abc

adds to the score a gap extension penalty $g_e$. The score increment depends not only on the current aligning pair, but also on the previously aligned pair.
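The dependence on the previously aligned pair can be made concrete with a small Python sketch that scores a finished alignment column by column. The function names and the pair-type labels are illustrative choices; the substitution score `S` is passed in as a function, and a placeholder that returns zero is used so that only the gap penalties show up in the printed totals.

```python
def affine_alignment_score(top, bottom, S, g_open, g_ext):
    """Score a finished alignment column by column (top:bottom)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total, prev = 0, None
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot pair two gaps")
        if y == "-":
            kind = "gap_bottom"   # pair of type a_i : -
        elif x == "-":
            kind = "gap_top"      # pair of type - : b_j
        else:
            kind = "pair"         # pair of type a_i : b_j
        if kind == "pair":
            total += S(x, y)
        else:
            # Extension only if the previous column had the same type of gap.
            total += g_ext if prev == kind else g_open
        prev = kind
    return total

def zero_score(x, y):
    # Placeholder substitution score so that only gap penalties are visible.
    return 0

print(affine_alignment_score("AB-", "abc", zero_score, -5, -2))  # -5
print(affine_alignment_score("A--", "abc", zero_score, -5, -2))  # -7
```

In the first call the final indel opens a gap (penalty −5); in the second it extends an existing gap, adding only −2 to the −5 already paid.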

The final aligning pair of a sequence of length $i$ with a sequence of length $j$ can be one of three possibilities (top:bottom): (1) $a_i : b_j$; (2) $a_i : -$; (3) $- : b_j$. Only for (1) is the score increment $S(a_i, b_j)$ unambiguous. For (2) or (3), the score increment depends on the presence or absence of indels in the previously aligned characters. For instance, for alignments ending as $a_i : -$, the previously aligned character pair could be one of (i) $a_{i-1} : b_j$, (ii) $- : b_j$, (iii) $a_{i-1} : -$. If the previous aligned character pair was (i) or (ii), the score increment would be the gap opening penalty $g_o$; if it was (iii), the score increment would be the gap extension penalty $g_e$.

To remove the ambiguity that occurs with a single dynamic matrix, we need to simultaneously compute three dynamic matrices, with matrix elements denoted by $T(i, j)$, $T_-(i, j)$ and $T^-(i, j)$, corresponding to the three types of aligning pairs (the minus sign is written below or above according to whether the gap is in the bottom or the top sequence). The recursion relations are

(1) $a_i : b_j$

$$T(i, j) = \max \begin{cases} T(i-1, j-1) + S(a_i, b_j), \\ T_-(i-1, j-1) + S(a_i, b_j), \\ T^-(i-1, j-1) + S(a_i, b_j); \end{cases} \qquad (7.3)$$

(2) $a_i : -$

$$T_-(i, j) = \max \begin{cases} T(i-1, j) + g_o, \\ T_-(i-1, j) + g_e, \\ T^-(i-1, j) + g_o; \end{cases} \qquad (7.4)$$

(3) $- : b_j$

$$T^-(i, j) = \max \begin{cases} T(i, j-1) + g_o, \\ T_-(i, j-1) + g_o, \\ T^-(i, j-1) + g_e. \end{cases} \qquad (7.5)$$

For aligning a sequence of length $n$ with a sequence of length $m$, the best alignment score is the maximum of the scores obtained from the three dynamic matrices:

$$T_{\rm opt}(n, m) = \max \begin{cases} T(n, m), \\ T_-(n, m), \\ T^-(n, m). \end{cases} \qquad (7.6)$$

Traceback to obtain the best alignment proceeds as before by starting with the matrix element corresponding to the best alignment score, $T_{\rm opt}(n, m)$, and tracing back to the matrix element that determined this score. The optimum alignment is then built up from last to first as before, but now switching may occur between the three dynamic matrices.
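A minimal Python sketch of the three-matrix computation follows; it returns only the best score of (7.6) and omits traceback. The arrays `T`, `T_bot` and `T_top` hold the scores of alignments ending in a pair of type $a_i : b_j$, $a_i : -$ and $- : b_j$, respectively. The boundary conditions are not stated in the text and are an assumption made here: a leading run of $k$ gaps costs $g_o + (k-1) g_e$, and matrix elements that cannot be reached are set to minus infinity.

```python
NEG_INF = float("-inf")

def affine_global_score(a, b, S, g_open, g_ext):
    """Best global alignment score with gap opening and extension penalties."""
    n, m = len(a), len(b)

    def new_matrix():
        return [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    T, T_bot, T_top = new_matrix(), new_matrix(), new_matrix()
    # Assumed boundary conditions (not given in the text).
    T[0][0] = 0
    for i in range(1, n + 1):
        T_bot[i][0] = g_open + (i - 1) * g_ext   # a_1..a_i aligned with gaps only
    for j in range(1, m + 1):
        T_top[0][j] = g_open + (j - 1) * g_ext   # b_1..b_j aligned with gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = S(a[i - 1], b[j - 1])
            # (7.3): alignment ends with a_i : b_j
            T[i][j] = max(T[i - 1][j - 1], T_bot[i - 1][j - 1], T_top[i - 1][j - 1]) + s
            # (7.4): alignment ends with a_i : -
            T_bot[i][j] = max(T[i - 1][j] + g_open,
                              T_bot[i - 1][j] + g_ext,
                              T_top[i - 1][j] + g_open)
            # (7.5): alignment ends with - : b_j
            T_top[i][j] = max(T[i][j - 1] + g_open,
                              T_bot[i][j - 1] + g_open,
                              T_top[i][j - 1] + g_ext)
    # (7.6): best score over the three matrices
    return max(T[n][m], T_bot[n][m], T_top[n][m])

def dna_score(x, y):
    # Example substitution score: +2 for a match, -1 for a mismatch.
    return 2 if x == y else -1

# With equal opening and extension penalties the result reduces to the
# constant-gap-penalty score of Section 7.3.
print(affine_global_score("GGAT", "GAATT", dna_score, -2, -2))  # 3
```

Setting the two penalties equal is a convenient consistency check, since the three matrices then carry the same information as the single matrix of recursion (7.2).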

7.5 Local alignments

We have so far discussed how to align two sequences over their entire length, called a global alignment. Often, however, it is more useful to align two sequences over only part of their lengths, called a local alignment. In bioinformatics, the algorithm for global alignment is called "Needleman-Wunsch," and that for local alignment "Smith-Waterman." Local alignments are useful, for instance, when searching a long genome sequence for alignments to a short DNA segment. They are also useful when aligning two protein sequences since proteins can consist of multiple domains, and only a single domain may align.

If for simplicity we consider a constant gap penalty $g$, then a local alignment can be obtained using the rule

$$T(i, j) = \max \begin{cases} 0, \\ T(i-1, j-1) + S(a_i, b_j), \\ T(i-1, j) + g, \\ T(i, j-1) + g. \end{cases} \qquad (7.7)$$

After the dynamic matrix is computed using (7.7), traceback proceeds by starting with the matrix element with highest score, and stops when encountering a zero score.

If we apply the Smith-Waterman algorithm to locally align the two sequences GGAT and GAATT considered previously, with a match scored as +2, a mismatch −1 and an indel −2, the dynamic matrix is

    -   G   A   A   T   T
-   0   0   0   0   0   0
G   0   2   0   0   0   0
G   0   2   1   0   0   0
A   0   0   4   3   1   0
T   0   0   2   3   5   3

Traceback starts at the highest score, here the 5 at matrix element (4, 4), and ends at the 0 at matrix element (0, 0). The resulting local alignment is

GGAT
: ::
GAAT

which has a score of five, larger than the previous global alignment score of three.
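The local alignment can be reproduced with a short Python sketch of rule (7.7) with a constant gap penalty. The function name `smith_waterman` is an illustrative choice, and when several moves or several maximal elements tie, this sketch simply keeps the first one it encounters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, top, bottom) for one best local alignment of a and b."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero for local alignment.
    T = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Rule (7.7): scores are never allowed to drop below zero.
            T[i][j] = max(0,
                          T[i - 1][j - 1] + s,
                          T[i - 1][j] + gap,
                          T[i][j - 1] + gap)
            if T[i][j] > best:
                best, best_pos = T[i][j], (i, j)

    # Traceback from the highest score until a zero is encountered.
    i, j = best_pos
    top, bottom = "", ""
    while T[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if T[i][j] == T[i - 1][j - 1] + s:      # diagonal move
            top, bottom = a[i - 1] + top, b[j - 1] + bottom
            i, j = i - 1, j - 1
        elif T[i][j] == T[i - 1][j] + gap:      # vertical move
            top, bottom = a[i - 1] + top, "-" + bottom
            i -= 1
        else:                                   # horizontal move
            top, bottom = "-" + top, b[j - 1] + bottom
            j -= 1
    return best, top, bottom

print(smith_waterman("GGAT", "GAATT"))  # (5, 'GGAT', 'GAAT')
```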

7.6 Software

If you have in hand two or more sequences that you would like to align, there is a choice of software tools available. For relatively short sequences, one can use the LALIGN program for global or local alignments:

http://www.ch.embnet.org/software/LALIGN_form.html

For longer sequences, the BLAST software has a flavor that permits local alignment of two sequences:

http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

Another useful program for global alignment of two or more long DNA sequences is PipMaker:

http://pipmaker.bx.psu.edu/pipmaker/

Multiple global alignments of protein sequences use ClustalW or Tcoffee:

http://www.ebi.ac.uk/clustalw/index.html

http://igs-server.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi

Most users of sequence alignment software want to compare a given sequence against a database of sequences. The BLAST software is most widely used, and comes in several flavors depending on the type of sequence and database search one is performing:

http://www.ncbi.nlm.nih.gov/BLAST/
