
Hardy-Weinberg Theorem

Alan Hastings, University of California, Davis, California, USA

The Hardy-Weinberg theorem states what the genotype frequencies are, in terms of the gene frequencies, at a single locus, under the simplest assumptions for the genetic processes in a diploid organism. It is essentially the cornerstone on which much of the theory of population genetics has been built.

What It States

The Hardy-Weinberg theorem is named after Hardy and Weinberg, who independently discovered it in 1908. The theorem states what the genotype frequencies are, in terms of the gene frequencies, at a single locus, under the simplest assumptions for the genetic processes in a diploid organism. These genotype frequencies are attained after one generation of random mating, under the assumption of no selection, no mutation, no random changes in frequencies and no migration. Moreover, the gene frequencies remain constant from generation to generation. The formula for the genotype frequencies is easiest to present in the two-allele case, although the theorem also holds with more alleles.

Denote the frequency of allele A by $p$, and the frequency of allele a by $q = 1 - p$. Then, after a single generation of random mating, the genotype frequencies are as given in Table 1. A similar relationship holds if there are three alleles at a single locus, which we could designate as A1, A2, A3. In this case there are six different genotypes, so the situation is a bit more complicated algebraically, although the underlying principles are exactly the same. In fact, the theorem extends to an arbitrary number of alleles, which we could designate as $A_i$, where the subscript denotes the allele. Assume that the frequency of the $A_i$ allele is given by $p_i$. Then the frequency of the $A_iA_i$ homozygote genotype will be $p_i^2$, and the frequency of the $A_iA_j$ heterozygote (with $i$ and $j$ different) will be $2p_ip_j$.
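As a minimal sketch of the multi-allele rule just stated, the Python function below assigns $p_i^2$ to each homozygote and $2p_ip_j$ to each heterozygote. The function name `hw_genotype_frequencies`, the string genotype labels and the example frequencies are illustrative assumptions; they are not part of the theorem.

```python
from itertools import combinations_with_replacement


def hw_genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies for any number of alleles.

    allele_freqs maps an allele label (e.g. "A1") to its frequency p_i.
    Homozygote AiAi -> p_i**2; heterozygote AiAj (i != j) -> 2 * p_i * p_j.
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    genotypes = {}
    # Each unordered pair of alleles (including an allele with itself) is one genotype.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        genotypes[a + b] = pa * pb if a == b else 2 * pa * pb
    return genotypes


freqs = hw_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
print(len(freqs))   # 6 genotypes for three alleles
print(freqs)
```

The six frequencies sum to $(p_1 + p_2 + p_3)^2 = 1$, which is a quick consistency check on any set of Hardy-Weinberg proportions.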

One can also extend Hardy-Weinberg equilibrium to the case of multiple loci. This is easiest to explain for two loci with two alleles each. Denote the alleles at the A locus by A and a, and the alleles at the B locus by B and b. There are then four haplotypes, AB, Ab, aB and ab, and consequently ten genotypes: AB/AB, AB/Ab, AB/aB, AB/ab, Ab/Ab, Ab/aB, Ab/ab, aB/aB, aB/ab and ab/ab. A further issue arises that we will not deal with here, namely linkage disequilibrium, or the nonrandom assortment of alleles at the different loci. However, the Hardy-Weinberg proportions still hold, and after one generation the genotype frequencies can be determined from the haplotype frequencies.
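The two-locus statement can be sketched the same way, assuming random union of gametes. Under that assumption an unordered pair of haplotypes X/Y has frequency $x^2$ if X = Y and $2xy$ otherwise. The helper name and the haplotype frequencies below are hypothetical. The frequencies are deliberately chosen with AB and ab in excess, yet the A locus on its own still shows Hardy-Weinberg proportions.

```python
from itertools import combinations_with_replacement


def two_locus_genotypes(haplotype_freqs):
    """Genotype frequencies at two loci from haplotype (gamete) frequencies.

    Assumes random union of gametes: the unordered haplotype pair X/Y has
    frequency x**2 when X == Y and 2*x*y otherwise.
    """
    genotypes = {}
    for h1, h2 in combinations_with_replacement(sorted(haplotype_freqs), 2):
        x, y = haplotype_freqs[h1], haplotype_freqs[h2]
        genotypes[f"{h1}/{h2}"] = x * x if h1 == h2 else 2 * x * y
    return genotypes


# Hypothetical haplotype frequencies with AB and ab in excess.
haps = {"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}
genos = two_locus_genotypes(haps)
print(len(genos))   # 10

# Single-locus check: the frequency of AA individuals equals p_A**2.
p_A = haps["AB"] + haps["Ab"]
freq_AA = sum(f for g, f in genos.items() if g[0] == "A" and g[3] == "A")
print(abs(freq_AA - p_A ** 2) < 1e-12)   # True
```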

Since the Hardy-Weinberg law depends on knowing the allele frequencies, it is important to state that in any generation, independent of whether the population is in Hardy-Weinberg equilibrium, the allele frequencies can be derived from the genotype frequencies. For the case of two alleles at a single locus, the frequency of the allele A is given by the sum of the frequency of the AA homozygotes plus one half the frequency of the Aa heterozygotes:

$$p = p_{AA} + \tfrac{1}{2}p_{Aa}$$

For more than two alleles, the frequency of the allele A is given by the sum of the frequency of the AA homozygotes plus one half the frequency of all the heterozygotes that have an A allele.
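Here is a short sketch of this rule for any number of alleles. It assumes genotypes are supplied as counts keyed by allele pairs, an input format chosen here for illustration. Counting each individual's two alleles is equivalent to adding the homozygote frequency and half of each relevant heterozygote frequency.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Allele frequencies from diploid genotype counts.

    genotype_counts maps an allele pair such as ("A", "a") to the number of
    individuals with that genotype. Every individual carries two alleles, so a
    homozygote contributes two copies of its allele and a heterozygote one copy
    of each; dividing by 2n gives p = p_AA + p_Aa / 2 for two alleles.
    """
    allele_counts = Counter()
    n = 0
    for (a1, a2), count in genotype_counts.items():
        allele_counts[a1] += count
        allele_counts[a2] += count
        n += count
    return {allele: c / (2 * n) for allele, c in allele_counts.items()}


# Counts from Table 2: 360 AA, 480 Aa, 160 aa.
print(allele_frequencies({("A", "A"): 360, ("A", "a"): 480, ("a", "a"): 160}))
# {'A': 0.6, 'a': 0.4}
```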

Example of a Population in Hardy-Weinberg Equilibrium

If there are data providing genotype frequencies at a single locus, one can tell whether the population is in Hardy-Weinberg equilibrium. An example of a hypothetical population in Hardy-Weinberg equilibrium is given in Table 2. No natural population will be exactly in Hardy-Weinberg proportions, but in many cases the genotype frequencies are very close to them.
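The numbers in Table 2 follow from the Table 1 formulas with $p = 0.6$ and a population of 1000. The last line recovers $p$ from the genotype frequencies using the rule given earlier:

$$
\begin{aligned}
q &= 1 - p = 0.4\\
p^2 &= 0.36, \quad 2pq = 0.48, \quad q^2 = 0.16\\
1000 \times (0.36,\ 0.48,\ 0.16) &= (360,\ 480,\ 160)\\
p &= p_{AA} + \tfrac{1}{2}p_{Aa} = 0.36 + 0.24 = 0.6
\end{aligned}
$$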

How You Can Tell Whether a Population is in Hardy-Weinberg Equilibrium

Deciding whether a population is in Hardy-Weinberg equilibrium involves a statistical test to determine the likelihood of a population with the observed allele frequencies having the observed genotype frequencies. Following the usual statistical procedures, one accepts the null hypothesis that the population is in Hardy-Weinberg equilibrium if the chance of a deviation from the Hardy-Weinberg proportions at least as large as that observed is greater than 0.05. If the probability is less than 0.05, then one would say that the population is not in Hardy-Weinberg equilibrium.

ENCYCLOPEDIA OF LIFE SCIENCES / © 2001 Macmillan Publishers Ltd, Nature Publishing Group / www.els.net

In discussing these tests, it is useful to define the Hardy-Weinberg disequilibrium for the allele A, $D_A$, as the frequency of the AA genotype minus the square of the frequency of the A allele:

$$D_A = p_{AA} - p_A^2$$

The statistical question is then to test the null hypothesis that $D_A = 0$. Written this way, the test can be used even if there are more than two alleles, by lumping together all the alleles other than A. The appropriate statistical test is outlined in Weir (1996), and two different approaches can be used. The simplest test begins by using the observed allele frequencies to compute the expected genotype frequencies, as given in Table 3. The expected genotype numbers are then calculated as the sample size times the expected frequencies.

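The χ² goodness-of-fit statistic is then:

$$X_A^2 = \sum_{\text{genotypes}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(nD_A)^2}{np^2} + \frac{(-2nD_A)^2}{2npq} + \frac{(nD_A)^2}{nq^2} = \frac{nD_A^2}{p^2q^2}$$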

This statistic is approximately distributed as a χ² with one degree of freedom, so significance can be determined using standard tables. The test has one degree of freedom rather than two because the three genotype classes are subject to two constraints: the allele frequency is estimated from the data, and the genotype frequencies must sum to one. That leaves $3 - 2 = 1$ degree of freedom.
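The following is a minimal Python sketch of the χ² test just described, for two alleles. It estimates $p$ from the sample and forms the expected numbers of Table 3. It then converts $X_A^2$ to a p-value using the fact that a χ² variable with one degree of freedom is the square of a standard normal, so its upper tail equals `math.erfc(math.sqrt(x / 2))`. The function name and the example counts are hypothetical.

```python
import math


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions at a two-allele locus.

    Returns (D_A, X2, p_value). The allele frequency is estimated from the
    sample, which is why the statistic is referred to one degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # p = p_AA + p_Aa / 2
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    d_a = n_AA / n - p * p                    # D_A = p_AA - p_A^2
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)   # expected numbers, Table 3
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return d_a, x2, p_value


# Hypothetical sample with a heterozygote deficit.
d_a, x2, p_value = hw_chi_square(400, 400, 200)
print(d_a, x2, p_value)
```

For these counts $p = 0.6$ and $D_A = 0.04$. The statistic is therefore about 27.8, the same as $nD_A^2/(p^2q^2)$, and the p-value is far below 0.05.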

There are problems with this approach, especially if numbers are small. Because the situation is quite simple, exact tests are available, based on computing exactly the probability of an observed set of genotype frequencies. This approach, which goes back to Fisher (1935), is summarized by Weir (1996). Conceptually, the test consists of summing either the probabilities of observing no more homozygotes than were actually observed, or the probabilities of observing no fewer homozygotes than were actually observed.

We will give the formula in the case of two alleles, A and a, in a population of size $n$ (so there are $2n$ alleles). For a given number of A alleles, $n_A$, the conditional probability of observing $x$ heterozygotes can be shown to be:

$$\Pr(x \mid n_A) = \frac{n!\, n_A!\, (2n - n_A)!\, 2^x}{\left(\frac{n_A - x}{2}\right)!\, x!\, \left(n - \frac{n_A + x}{2}\right)!\, (2n)!}$$

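For the observed value of the frequency of the A allele and the given population size, this formula can be evaluated numerically for all possible values of the number of heterozygotes, $x$. Only values of $x$ with the same parity as $n_A$ are possible, since $n_A - x$ is twice the number of AA homozygotes. The least likely outcomes whose probabilities sum to a given rejection level form a rejection region of that size.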
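The exact test can be sketched directly from the formula. Python's exact integers and `fractions.Fraction` keep rounding out of the probabilities. The function returns two summaries:

- a one-sided p-value for a heterozygote deficit, summing over outcomes with no more heterozygotes than observed (that is, no fewer homozygotes);
- a two-sided p-value that sums the probabilities of all outcomes no more likely than the one observed, which is one common way to build the "least likely outcomes" rejection region.

The function names are hypothetical. For large samples, a log-scale implementation would be faster.

```python
from fractions import Fraction
from math import factorial


def het_probability(x, n, n_A):
    """Pr(x | n_A): probability of x heterozygotes among n diploids carrying n_A copies of A."""
    n_AA = (n_A - x) // 2          # number of AA homozygotes
    n_aa = n - n_AA - x            # number of aa homozygotes, n - (n_A + x) / 2
    numerator = factorial(n) * factorial(n_A) * factorial(2 * n - n_A) * 2 ** x
    denominator = factorial(n_AA) * factorial(x) * factorial(n_aa) * factorial(2 * n)
    return Fraction(numerator, denominator)


def hw_exact_test(n_AA, n_Aa, n_aa):
    """Return (two_sided_p, deficit_p) for the exact two-allele Hardy-Weinberg test."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    # x shares the parity of n_A and cannot exceed the count of the rarer allele.
    x_values = range(n_A % 2, min(n_A, 2 * n - n_A) + 1, 2)
    probs = {x: het_probability(x, n, n_A) for x in x_values}
    p_observed = probs[n_Aa]
    two_sided = sum(p for p in probs.values() if p <= p_observed)
    deficit = sum(p for x, p in probs.items() if x <= n_Aa)
    return float(two_sided), float(deficit)


# Five AA and five aa individuals pooled together: no heterozygotes at all.
print(hw_exact_test(5, 0, 5))
```

For this pooled sample both p-values come out far below 0.05, whereas a sample of 36 AA, 48 Aa and 16 aa, which matches Table 2 scaled to 100 individuals, gives a two-sided p-value of 1.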

Assumptions, and How Much and In What Sense They Matter

The Hardy-Weinberg theorem rests upon a series of assumptions, namely that:

• Generations are nonoverlapping.
• There is random mating.
• The population size is very large (effectively infinite).
• There is no migration.
• There is no mutation.
• There is no natural selection.

In addition, the model is phrased for diploid sexual organisms. Violation of any of these assumptions would mean that the theorem is not strictly true. It is recognized that the Hardy-Weinberg theorem applies only to idealized populations, and that the assumptions will never be met exactly for any real population. Of more interest is how large the deviation from the Hardy-Weinberg proportions is when a particular assumption is not met.

Table 1 Genotype frequencies after one generation of random mating as given by the Hardy-Weinberg law

Genotype     AA        Aa        aa
Frequency    $p^2$     $2pq$     $q^2$

Table 2 Numbers of a hypothetical population in perfect Hardy-Weinberg equilibrium; the population size is 1000, and the frequency of the A allele is 0.6

Genotype     AA                Aa                aa
Frequency    $p^2$ = 0.36      $2pq$ = 0.48      $q^2$ = 0.16
Numbers      360               480               160

Table 3 Test for Hardy-Weinberg

Genotype               AA          Aa           aa
Observed number        $n_{AA}$    $n_{Aa}$     $n_{aa}$
Expected number        $np^2$      $2npq$       $nq^2$
Observed − Expected    $nD_A$      $-2nD_A$     $nD_A$

If generations are overlapping, then Hardy-Weinberg proportions are approached, but only asymptotically rather than after a single generation, as is the case with nonoverlapping generations. The rate of approach is geometric, so the theorem holds approximately after several generations, even if it is not strictly true. Similarly, for sex-linked loci, the Hardy-Weinberg proportions are only approached asymptotically and not in a single generation.
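The article does not give the recursion for sex-linked loci, but the asymptotic approach can be sketched under standard assumptions:

- XX females and XY males, with an equal sex ratio;
- random mating, with no other forces acting;
- daughters receive one X from each parent, and sons receive their X from their mother.

Under these assumptions the difference between female and male allele frequencies is halved and changes sign each generation. Female genotype frequencies therefore approach Hardy-Weinberg proportions geometrically rather than in one step. Function and variable names are hypothetical.

```python
def x_linked_generations(p_female, p_male, generations):
    """Frequency of allele A at an X-linked locus under random mating.

    Assumptions (for illustration): XX females, XY males, equal sex ratio,
    nonoverlapping generations, no selection, mutation, migration or drift.
    Returns one (p_female, p_male, freq_AA_females) tuple per offspring generation.
    """
    history = []
    for _ in range(generations):
        # A daughter's genotype pairs a maternal X with a paternal X.
        freq_AA_females = p_female * p_male
        # Daughters average both parents; sons inherit their X from the mother only.
        p_female, p_male = (p_female + p_male) / 2, p_female
        history.append((p_female, p_male, freq_AA_females))
    return history


history = x_linked_generations(p_female=1.0, p_male=0.0, generations=30)
for generation, (pf, pm, aa) in enumerate(history[:4], start=1):
    print(generation, pf, pm, aa)
```

Suppose all females start as AA and all males carry a. The female and male frequencies then oscillate (0.5 and 1.0, then 0.75 and 0.5, and so on) toward the weighted mean $(2p_f + p_m)/3 = 2/3$. Meanwhile the frequency of AA females approaches $(2/3)^2 = 4/9$.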

Lack of random mating is perhaps the most important reason for deviations from Hardy-Weinberg proportions in natural populations. The most common reason for an observed population to show a deficiency of heterozygotes relative to the Hardy-Weinberg proportions is that two populations that do not interbreed freely have been sampled together, a result known as the Wahlund effect. How this arises is easiest to understand where it is most dramatic: two populations that are essentially monomorphic for different alleles are sampled together, and the deficiency of heterozygotes can be very large. Consider the case where one population consists entirely of AA individuals and the other entirely of aa individuals. A sample of these two populations taken together will contain no Aa individuals and will appear not to be in Hardy-Weinberg equilibrium. Thus it would be fallacious to attribute deviations from Hardy-Weinberg equilibrium to the action of selection without extensive further investigation.
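Below is a minimal sketch of the Wahlund effect. It assumes that each subpopulation is itself in Hardy-Weinberg proportions and that the pooled sample draws from them in known proportions; both assumptions are made for illustration. The observed heterozygote frequency is the weighted average of the subpopulations' $2pq$ values, while the Hardy-Weinberg expectation uses the pooled allele frequency. The difference equals twice the variance of allele frequency among subpopulations, so it is largest when the subpopulations are fixed for different alleles.

```python
def wahlund(subpop_freqs, weights=None):
    """Observed vs expected heterozygosity when subpopulations are pooled.

    subpop_freqs: frequency of allele A in each subpopulation (each assumed in HW).
    weights: share of the pooled sample from each subpopulation (default equal).
    Returns (observed_het, expected_het) for the pooled sample.
    """
    k = len(subpop_freqs)
    if weights is None:
        weights = [1.0 / k] * k
    # Observed: each subpopulation contributes its own 2pq heterozygotes.
    observed_het = sum(w * 2 * p * (1 - p) for w, p in zip(weights, subpop_freqs))
    # Expected: Hardy-Weinberg proportions computed from the pooled allele frequency.
    p_bar = sum(w * p for w, p in zip(weights, subpop_freqs))
    expected_het = 2 * p_bar * (1 - p_bar)
    return observed_het, expected_het


# One population entirely AA, the other entirely aa, sampled in equal numbers.
print(wahlund([1.0, 0.0]))   # (0.0, 0.5)
```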

In a similar vein, other deviations from random mating, such as selfing, would also lead to deviations from Hardy-Weinberg equilibrium. The size of the deviation from Hardy-Weinberg proportions would depend on the degree of deviation from random mating.
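The next sketch illustrates how the deviation scales with the degree of nonrandom mating, using one hypothetical mating system. A fraction `s` of offspring come from selfing and the rest from random mating, with no other forces acting. Allele frequencies stay fixed, but heterozygotes fall below $2pq$. The resulting $D_A$ can be written as $Fpq$, and in this model $F$ settles at $s/(2 - s)$, so higher selfing rates give larger departures. The function name and parameters are illustrative.

```python
def partial_selfing(p_AA, p_Aa, p_aa, s, generations):
    """Iterate genotype frequencies when a fraction s of offspring come from selfing."""
    p = p_AA + p_Aa / 2            # allele frequency of A; neither mating mode changes it
    q = 1 - p
    for _ in range(generations):
        # Selfed homozygotes breed true; selfed heterozygotes give 1/4 AA, 1/2 Aa, 1/4 aa.
        selfed = (p_AA + p_Aa / 4, p_Aa / 2, p_aa + p_Aa / 4)
        # Randomly mated offspring are in Hardy-Weinberg proportions.
        outcrossed = (p * p, 2 * p * q, q * q)
        p_AA, p_Aa, p_aa = ((1 - s) * o + s * f for o, f in zip(outcrossed, selfed))
    return p_AA, p_Aa, p_aa


p = 0.5
p_AA, p_Aa, p_aa = partial_selfing(0.25, 0.5, 0.25, s=0.5, generations=50)
D_A = p_AA - p ** 2
F = D_A / (p * (1 - p))
print(round(F, 6))   # 0.333333, i.e. s / (2 - s) with s = 0.5
```

With complete selfing (`s=1.0`) the heterozygote frequency halves every generation, the extreme case of this dependence.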

If the population size is small enough, random effects will lead to deviations from the Hardy-Weinberg proportions. These effects will be very small, however, unless the population size is extremely small (much less than 50).

If there is migration (input of alleles from an outside population), there can be deviations from Hardy-Weinberg proportions. The size of the deviation will depend on the rate of migration, but will typically be quite small. However, the Wahlund effect described above could be viewed as an effect of migration. Similarly, mutation is only likely to cause very small deviations from Hardy-Weinberg proportions.

In theory, one way to detect natural selection would be to look for deviations from Hardy-Weinberg proportions. In practice, however, any deviation from Hardy-Weinberg proportions resulting from selection would be relatively small unless selection is very strong. Moreover, the deviations from Hardy-Weinberg proportions depend on the form of selection. Very strong selection could clearly lead to large deviations from Hardy-Weinberg proportions, as when all heterozygotes die. However, strong directional selection (favouring a single allele) may lead to no deviations from Hardy-Weinberg proportions.
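Here is a minimal sketch contrasting these two cases. It assumes viability selection acting on zygotes formed in Hardy-Weinberg proportions, with genotypes censused among the survivors; the fitness values are hypothetical. Heterozygote lethality produces a large $D_A$. By contrast, strong directional selection with multiplicative fitnesses ($w_{AA} = w^2$, $w_{Aa} = w$, $w_{aa} = 1$) leaves the survivors exactly in Hardy-Weinberg proportions even though the allele frequency changes. The multiplicative form is one illustrative case; other directional schemes can leave a small nonzero $D_A$.

```python
def hw_after_viability_selection(p, w_AA, w_Aa, w_aa):
    """Adult genotype frequencies and D_A after selection on Hardy-Weinberg zygotes."""
    q = 1 - p
    # Zygotes in HW proportions, each genotype surviving in proportion to its fitness.
    survivors = (p * p * w_AA, 2 * p * q * w_Aa, q * q * w_aa)
    mean_fitness = sum(survivors)
    P_AA, P_Aa, P_aa = (s / mean_fitness for s in survivors)
    p_adult = P_AA + P_Aa / 2
    # D_A measures the departure from HW proportions among the survivors.
    return (P_AA, P_Aa, P_aa), P_AA - p_adult ** 2


cases = [("heterozygotes lethal", (1.0, 0.0, 1.0)),
         ("multiplicative selection against A", (0.25, 0.5, 1.0))]
for label, fitnesses in cases:
    _, d_a = hw_after_viability_selection(0.5, *fitnesses)
    print(label, d_a)
```

Starting from $p = 0.5$, the lethal case gives $D_A = 0.25$. The multiplicative case gives $D_A = 0$ up to floating-point rounding, while $p$ falls to 1/3.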

In summary, although all the assumptions listed are required for the strict truth of the Hardy-Weinberg theorem, nonrandom mating is the most likely cause of large deviations from Hardy-Weinberg proportions. Selection can also produce sizeable deviations from Hardy-Weinberg proportions, but these will be large enough to be detected only if selection is very strong and sample sizes are very large.

Interest and Importance of the Theorem

The Hardy-Weinberg theorem is essentially the cornerstone on which much of the theory of population genetics has been built. It is thus of great historical importance. It also has a number of direct consequences of great importance in population genetics.

One of the most important questions in population genetics is what maintains variability. The Hardy-Weinberg theorem shows that, under its assumptions, variability will be maintained, because allele frequencies remain constant from generation to generation. Before the Hardy-Weinberg theorem was demonstrated, this result was not known, even though it seems so obvious today.

Further work on understanding the dynamics of populations depends critically on the Hardy-Weinberg theorem. Since the Hardy-Weinberg proportions are obtained in one generation, population genetic questions can be described by the frequencies of alleles rather than the frequencies of genotypes. Thus, if there are two alleles, only one variable, a single allele frequency, is needed to describe the genetic state of the population. (The other allele frequency can be obtained because the frequencies must sum to one.) If one followed genotypes, two variables would be needed: three frequencies, minus one because the frequencies must sum to one. In general, in a system with $n$ alleles, the Hardy-Weinberg theorem means that only $n - 1$ variables are needed, whereas $n(n + 1)/2 - 1$ variables would be needed to follow the genotype frequencies. This is a much larger number.
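The counting argument can be checked with a few lines of Python. The $n$ allele frequencies summing to one leave $n - 1$ free variables, while the $n(n + 1)/2$ genotype frequencies summing to one leave $n(n + 1)/2 - 1$. The function name is illustrative.

```python
def state_variables(n_alleles):
    """Independent frequencies needed for one locus with n_alleles alleles.

    Allele frequencies: n values summing to one leave n - 1 free variables.
    Genotype frequencies: n(n + 1)/2 values summing to one leave n(n + 1)/2 - 1.
    """
    allele_vars = n_alleles - 1
    genotype_vars = n_alleles * (n_alleles + 1) // 2 - 1
    return allele_vars, genotype_vars


for n in (2, 3, 5, 10):
    print(n, state_variables(n))
# 2 (1, 2)
# 3 (2, 5)
# 5 (4, 14)
# 10 (9, 54)
```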

Similarly, as we noted, the theorem applies even to systems with more than one locus. Once again, this allows a great simplification in the description of these systems, since only the frequencies of haplotypes, rather than genotypes, need be followed. This observation is essential for organizing large data sets in population genetics.

References

Fisher RA (1935) The logic of inductive inference. Journal of the Royal Statistical Society 98: 39–54.

Weir BS (1996) Genetic Data Analysis II. Sunderland, MA: Sinauer.

Further Reading

Hardy GH (1908) Mendelian proportions in a mixed population. Science 28: 49–50.


Hartl DL and Clark AG (1989) Principles of Population Genetics, 2nd edn. Sunderland, MA: Sinauer.

Vithayasai C (1973) Exact critical values of the Hardy-Weinberg test statistic for two alleles. Communications in Statistics 1: 229–242.

Weinberg W (1908) On the demonstration of heredity in man. Translated by Boyer SH IV (1963). In: Papers on Human Genetics, pp. 41. Englewood Cliffs, NJ: Prentice-Hall.
