Post on 01-Jan-2016

2

Plant and Animal Association Mapping

• Introduce yourselves:
- What species do you work on
- What kind of population
- What traits interest you
- What marker resources do you have

3

Objective

• Convey ideas and tools to help you think about your population to decide what resources and methods will help you identify loci that affect the traits that interest you
- Are we going to have the same problems as human geneticists are having?

• This is a grand objective that we will fail to achieve

• Think about what we present in that light

4

[Concept map. Top-level labels: Association Mapping; LD; Methods; Germplasm. Other node labels: Definition; Causes; Haplotype blocks; Marker density; Recombination hotspots; Extent of LD; Model-based or PCA?; Candidate loci or whole genome?; Regression; Genomic selection; Multiple testing vs. shrinkage; Gene identification or marker-assisted selection?; Sub-population structure; Breeding system; Signatures of selection; Species; Panel diversity; Confounded structure and polymorphism]

5

Outline

• All mapping requires linkage disequilibrium
• Why association over linkage mapping?
• Refresher on LD: measures and causes
• Whole-genome scan marker densities
• Extent of LD in plants

6

Association Mapping

• It’s the same thing as linkage mapping in a bi-parental population but in a population that has not been carefully designed and generated experimentally.

7

QTL detection

[Figure: two panels show the parents and the individuals of the population (1 to 6), genotyped at marker A (alleles A, a) and at marker B (alleles B, b), together with the QTL alleles (Q, q) they carry. The distribution of agronomic performance in the population is plotted by genotype at marker A (aa vs. AA) or at marker B (bb vs. BB).]

• There is a QTL near marker A
• There is no QTL near marker B

8

Linkage Disequilibrium <–> Association

Jannink, J.-L. et al. 2001. Trends Plant Sci 6:337-342

9

Linkage Disequilibrium <–> Association

• $p_{QM} = p_Q p_M \iff p(Q|M) = p(Q|m) = p_Q$
• $p_{QM} \neq p_Q p_M \iff p(Q|M) \neq p(Q|m)$
• Lines carrying M do not carry Q at the same frequency as lines carrying m.
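The equivalence on this slide can be checked numerically. The sketch below is illustrative only: the function name and the gamete frequencies are made up, and it assumes the four two-locus gamete frequencies (MQ, Mq, mQ, mq) are known and sum to 1.

```python
def conditional_q_given_marker(p_MQ, p_Mq, p_mQ, p_mq):
    """Return (p(Q|M), p(Q|m), D) from the four gamete frequencies."""
    p_M = p_MQ + p_Mq          # frequency of marker allele M
    p_m = p_mQ + p_mq          # frequency of marker allele m
    p_Q = p_MQ + p_mQ          # frequency of QTL allele Q
    D = p_MQ - p_M * p_Q       # departure from independence
    return p_MQ / p_M, p_mQ / p_m, D


# Hypothetical frequencies with p_M = 0.4 and p_Q = 0.3 in both cases.
equilibrium = conditional_q_given_marker(0.12, 0.28, 0.18, 0.42)
disequilibrium = conditional_q_given_marker(0.20, 0.20, 0.10, 0.50)

print(equilibrium)      # p(Q|M) and p(Q|m) agree, D is zero
print(disequilibrium)   # p(Q|M) and p(Q|m) differ, D is non-zero
```

In the second case, lines carrying M carry Q more often than lines carrying m, which is what makes the marker informative about the QTL.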

10

Why Association Mapping?

• Sometimes you can’t generate a population experimentally…

• Mapping efficiency
• Fine mapping
• Link to plant / livestock breeding

11

Dissecting A Quantitative Trait: Resolution Versus Time

[Figure: research time in years (1 to 5) versus resolution in bp (1x10^7 down to 1). Labels in the plot: Associations; NILs; Positional Cloning; RI QTL Mapping; Pedigree; F2 or RIL Mapping]

Yu and Buckler, Curr Opin Biotechnol 17: 1-6 (2006)

12

Resolution Versus Allelic Range

[Figure: alleles evaluated (1 to >40) versus resolution in bp (1x10^7 down to 1). Labels in the plot: Associations in Diverse Germplasm; Associations in Narrow Germplasm; NIL; Pedigree; F2 or RIL Mapping; Positional Cloning]

Yu and Buckler, Curr Opin Biotechnol 17: 1-6 (2006)

13

Link to plant / livestock breeding

• Phenotyping is not getting cheaper
- Use data collected in breeding for discovery: association mapping does not require the generation of experimental mapping populations
• Dense genome-wide markers together can predict polygenic breeding values
- Predict yield in the greenhouse before seed increase for field testing
- Predict performance of embryo / immature animal

14

Refresher on LD

• Definition: alleles at different loci are not inherited independently
• Association with the phenotype
• Parameter D; min and max of D
• Standardize D: D’ and $r^2$
• Causes of LD
- mutation, drift / sampling, structure, selection
• Decay of LD: recombination

15

Linkage disequilibrium

• Alleles are co-inherited either more or less often than predicted by “chance”:

• Loci M and Q. Alleles {M, m} and {Q, q}
• $p_{MQ}$: probability that a parent transmits a gamete carrying both alleles M and Q
• “Chance” = “Alleles are independent”

16

Independence in a Table

17

Non-independence

Non-Independence in a Table

          p_A                  q_A
p_B   r = p_A p_B + D     t = q_A p_B – D
q_B   s = p_A q_B – D     u = q_A q_B + D

• Algebra shows $D = ru - st$
• By convention, $q_A$ and $q_B$ are the minor allele frequencies

18
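A quick numerical check of $D = ru - st$. This is a minimal sketch that assumes r, s, t, u are the frequencies of the AB, Ab, aB and ab gametes (the labelling used on the later slide about the bounds of D); the function name and example values are illustrative.

```python
def ld_coefficient(r, s, t, u):
    """Return D computed two ways from gamete frequencies.

    r, s, t, u are assumed to be the AB, Ab, aB, ab gamete frequencies.
    """
    p_A = r + s                           # AB + Ab
    p_B = r + t                           # AB + aB
    d_from_definition = r - p_A * p_B     # D = p_AB - p_A * p_B
    d_from_cross_product = r * u - s * t  # D = ru - st
    return d_from_definition, d_from_cross_product


d1, d2 = ld_coefficient(0.20, 0.20, 0.10, 0.50)
print(d1, d2)  # the two values agree up to floating-point rounding
```

The identity relies on r + s + t + u = 1, so the inputs must be frequencies rather than raw counts.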

19

Linkage Disequilibrium <–> Association

• $p_{QM} = p_Q p_M \iff p(Q|M) = p(Q|m) = p_Q$
• $p_{QM} \neq p_Q p_M \iff p(Q|M) \neq p(Q|m)$
• Lines carrying M do not carry Q at the same frequency as lines carrying m.

20

Minimal and maximal values of D

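• If $D < 0$:
- $r = p_A p_B + D \ge 0 \Rightarrow D \ge -p_A p_B$
- $u = q_A q_B + D \ge 0 \Rightarrow D \ge -q_A q_B$
• $D \ge \max(-p_A p_B, -q_A q_B)$
• If $D > 0$:
- $s = p_A q_B - D \ge 0 \Rightarrow D \le p_A q_B$
- $t = q_A p_B - D \ge 0 \Rightarrow D \le q_A p_B$
• $D \le \min(p_A q_B, q_A p_B)$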

21

Standardize D between 0 and 1

• Define $D_{max} = \max(-p_A p_B, -q_A q_B)$ if $D < 0$, or $D_{max} = \min(p_A q_B, q_A p_B)$ if $D > 0$; then $0 \le D' = D / D_{max} \le 1$
• When is $|D|$ maximized?

          p_A          q_A
p_B   r = 0        t = p_B
q_B   s = p_A      u = q_A q_B – p_A p_B

=> $D = -p_A p_B$
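The standardization can be sketched in a few lines. The function name and example frequencies below are illustrative, and D = 0 is treated as a special case returning 0, a choice the slide does not specify.

```python
def d_prime(p_A, p_B, D):
    """Standardize D by its sign-specific bound so that 0 <= D' <= 1."""
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    if D == 0:
        return 0.0  # convention chosen here, not stated on the slide
    if D < 0:
        d_max = max(-p_A * p_B, -q_A * q_B)  # negative bound
    else:
        d_max = min(p_A * q_B, q_A * p_B)    # positive bound
    return D / d_max


# With p_A = 0.4 and p_B = 0.3:
print(d_prime(0.4, 0.3, 0.08))   # D is below its positive bound of 0.18
print(d_prime(0.4, 0.3, -0.12))  # the AB gamete is absent, so D' is about 1
```

Because the bound has the same sign as D, the ratio is always non-negative.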

22

Recombination and Maximal D

• After a new mutation, one of the four gametes is missing so D’ = 1

• The missing gamete can be created by recombination

• D’ = 1 until recombination occurs: series of loci with D’ = 1 can define a haplotype block

Number of alleles:
A1B1: N11    A2B1: N21
A1B2: 1      A2B2: 0

Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9:477-485.

23

Other standardized LD measure

• See D as a covariance of allelic states: $\mathrm{cov}(X, Y) = E(XY) - E(X)E(Y)$
- If allele A1 (B1) is present, X (Y) = 1, else 0
- $D = p_{AB} - p_A p_B$

• To standardize a covariance turn it into a correlation

24

Coefficient of Determination

• Allele at locus M can predict allele at Q
• Correlation between allelic states: $r = D / \sqrt{p_M q_M p_Q q_Q}$, so $r^2 = D^2 / (p_M q_M p_Q q_Q)$

• This standardization is more useful for association mapping purposes
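To make the "correlation between allelic states" concrete, the sketch below computes $r^2$ two ways: from D and the allele frequencies, and as the squared correlation of 0/1 indicator variables built from gamete counts. Function names, the generic locus labels A and B, and the counts are all illustrative.

```python
def r_squared(p_A, p_B, D):
    """r^2 = D^2 / (p_A q_A p_B q_B)."""
    return D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))


def r_squared_from_counts(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation of allele indicators X (A present) and Y (B present)."""
    n = n_AB + n_Ab + n_aB + n_ab
    # One entry per gamete, in the order AB, Ab, aB, ab.
    x = [1] * (n_AB + n_Ab) + [0] * (n_aB + n_ab)
    y = [1] * n_AB + [0] * n_Ab + [1] * n_aB + [0] * n_ab
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y  # this is D
    var_x = mean_x * (1 - mean_x)
    var_y = mean_y * (1 - mean_y)
    return cov * cov / (var_x * var_y)


# 100 hypothetical gametes: p_A = 0.4, p_B = 0.3, D = 0.08
print(r_squared(0.4, 0.3, 0.08))
print(r_squared_from_counts(20, 20, 10, 50))
```

Both calls give the same value up to rounding, since D is exactly the covariance of the two indicators.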

25

Variance explained by marker

• Non-independence leads the mean of lines carrying M versus m to differ: the marker explains phenotypic variance

• variance explained by the marker
• variance generated by the QTL

Long, A.D. and Langley, C.H. 1999. The Power of Association Studies to Detect the Contribution of Candidate Genetic Loci to Variation in Complex Traits. Genome Res. 9: 720-731

26

r2 and Sample Size

• If you typed the causal polymorphism, you would need a sample of size $N_1$ to detect it

• Then to identify a marker in LD with the cause, you need a sample of size $N_2 \approx N_1 / r^2$

Pritchard, J.K., and M. Przeworski. 2001. Linkage disequilibrium in humans: models and data. Am J Hum Genet 69:1-14.
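As a small worked example of $N_2 \approx N_1 / r^2$, the helper below scales a sample size by the LD between marker and causal polymorphism. The function name and the numbers are illustrative, and rounding up to a whole individual is an added choice.

```python
import math


def marker_sample_size(n_causal, r2):
    """Approximate sample size needed at a marker in LD (r2) with the cause."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_causal / r2)


# If 500 individuals suffice when the causal polymorphism itself is typed:
for r2 in (1.0, 0.5, 0.25):
    print(r2, marker_sample_size(500, r2))
```

Halving $r^2$ doubles the required sample, which is why low marker-QTL LD is so costly.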

27

r2max

[Figure: $r^2_{max}$ (0 to 1) versus associated marker frequency $p_B$ (0 to 1), for focal locus frequencies $p_A$ = 0.2 and $p_A$ = 0.5]

28

r2max

[Figure, two panels of $r^2_{max}$ (0 to 1) versus associated marker frequency $p_B$. One panel: $p_B$ from 0 to 1, focal locus frequencies $p_A$ = 0.2 and $p_A$ = 0.5. Other panel: $p_B$ from 0 to 0.2, focal locus frequency $p_A$ = 0.02]

29

QTL & Marker frequencies must match

[Figure: $r^2_{max}$ as a function of associated marker frequency and focal locus frequency]
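The frequency-matching point can be reproduced from the bounds on D. The sketch below takes the largest |D| allowed by the allele frequencies, for either sign of D, and converts it to $r^2$. The function name is illustrative, and allowing both signs is an assumption that may differ from how the plotted curves were drawn.

```python
def r2_max(p_A, p_B):
    """Largest r^2 attainable given allele frequencies p_A and p_B."""
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    d_pos = min(p_A * q_B, q_A * p_B)   # bound on D when D > 0
    d_neg = min(p_A * p_B, q_A * q_B)   # bound on |D| when D < 0
    d = max(d_pos, d_neg)
    return d * d / (p_A * q_A * p_B * q_B)


print(r2_max(0.2, 0.2))    # matched frequencies: r^2 can reach about 1
print(r2_max(0.5, 0.2))    # mismatched frequencies: capped near 0.25
print(r2_max(0.02, 0.5))   # rare focal allele, common marker: capped very low
```

A rare QTL allele can only be tagged well by a marker allele of similar frequency, however close the two loci are.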

30

Multi-allelic LD

• l is the minimum number of alleles at loci A and B; that is, l = min(k, m)

• If A and B are biallelic

Zhao, H., D. Nettleton, M. Soller, and J.C.M. Dekkers. 2005. Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genetical Research 86:77-87

31

Causes of LD

• Four categories
- Mutation
- Drift / Sampling
- Structure
- Selection

Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9:477-485.

32

Mutation

• Locus M is polymorphic {M, m}
• Locus Q is monomorphic {q}
• Allele Q appears on gamete carrying M
• M and Q are not independent since Q always appears with M

33

Drift / Sampling

• Linkage equilibrium: MQ, Mq, mQ, mq all represented in expected frequencies

• Sampling differentially increases or reduces certain combinations by chance

• LD has appeared since combinations are no longer in expected frequencies

• One-time event: founder effects
• Recurrent: finite population, effective size Ne

34

Structure

• Differential relatedness among individuals in the sample

• Four sub-categories
- Subpopulation structure
- Admixture
- Migration
- Hybridization / Pedigree relatedness / “Familial structure”

35

Subpopulation structure

• Mating is random within sub-populations but very little occurs between sub-populations

• “Very little” means $0 < 4Nm < 1$
- N: effective population size
- m: migration rate (number of migrants / population size)
• No / low migration –> allele frequencies drift: $p_A(1) \neq p_A(2)$ and $p_B(1) \neq p_B(2)$

36

Subpopulation Structure

37

Extreme / General cases

• Extreme case:
- One subpopulation fixed for A and B, the other for a and b => the Ab and aB gametes never occur
• Each subpopulation is contributing an excess of its “major two-locus haplotype”
• General two-subpopulation case: $D = k(1 - k)[p_A(1) - p_A(2)][p_B(1) - p_B(2)]$
- k: proportion of subpopulation 1 in total population
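The two-subpopulation formula can be checked against a direct calculation on the pooled sample. This sketch assumes each subpopulation is itself in linkage equilibrium, so that all LD in the pooled sample comes from structure; names and frequencies are illustrative.

```python
def structure_d(k, pA1, pB1, pA2, pB2):
    """Return (D from the formula, D computed directly on the pooled sample)."""
    formula = k * (1 - k) * (pA1 - pA2) * (pB1 - pB2)
    # Pooled frequencies, assuming no LD within either subpopulation.
    p_AB = k * pA1 * pB1 + (1 - k) * pA2 * pB2
    p_A = k * pA1 + (1 - k) * pA2
    p_B = k * pB1 + (1 - k) * pB2
    direct = p_AB - p_A * p_B
    return formula, direct


print(structure_d(0.5, 1.0, 1.0, 0.0, 0.0))  # extreme case: fixed differences
print(structure_d(0.3, 0.8, 0.6, 0.2, 0.1))  # general case
```

If either locus has the same allele frequency in both subpopulations, the product is zero and structure creates no LD between that pair.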

38

One gamete per subpopulation reduces structure-induced LD

• 18 unlinked SSRs: of 149 wheats, 95 retained based on diversity

• Low “repeat contributions” from one subpop

• <=> single pop with high Ne

[Figure: panels labeled 149 and 95]

Breseghello, F., and M.E. Sorrells. 2006. Association Mapping of Kernel Size and Milling Quality in Wheat (Triticum aestivum L.) Cultivars. Genetics 172:1165-1177.

39

Admixture

• Just like subpopulation structure, but reproductive barriers between subpopulations have recently broken down

Migration

• Occasional gametes with “non-equilibrium” linkage phases arrive in the population

40

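Hybridization

• Start with a double heterozygote (AB/ab)
• It produces gametes with frequencies $(1 - c)/2$ for each parental type (AB, ab) and $c/2$ for each recombinant type (Ab, aB), where c is the recombination rate
• So: $D = \frac{1}{2}(\frac{1}{2} - c)$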
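A minimal sketch of the LD generated by a single double heterozygote, assuming a coupling-phase AB/ab parent and recombination rate c between the two loci (the function name is illustrative). It reproduces the hybrid-source result $D = \frac{1}{2}(\frac{1}{2} - c)$ quoted later in the deck.

```python
def double_heterozygote_gametes(c):
    """Gamete frequencies and D produced by an AB/ab double heterozygote."""
    parental = (1.0 - c) / 2.0     # AB and ab, each
    recombinant = c / 2.0          # Ab and aB, each
    gametes = {"AB": parental, "ab": parental,
               "Ab": recombinant, "aB": recombinant}
    # D = ru - st with r = AB, u = ab, s = Ab, t = aB
    D = gametes["AB"] * gametes["ab"] - gametes["Ab"] * gametes["aB"]
    return gametes, D


for c in (0.0, 0.1, 0.5):
    _, D = double_heterozygote_gametes(c)
    print(c, D, 0.5 * (0.5 - c))  # the last two columns agree
```

For unlinked loci (c = 0.5) all four gametes are equally frequent and D is zero after a single generation.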

41

Selection

• Two sub-categories
- LD between different causal loci
  o Selection under an additive model that changes the additive variance
  o Selection under an epistatic model
- LD between causal and nearby neutral loci

42

Selection that changes the variance

• Variance conferred by two-locus AB gametes:
  $\mathrm{var}(AX + BY) = A^2 \mathrm{var}(X) + B^2 \mathrm{var}(Y) + 2AB\,\mathrm{cov}(X, Y) = A^2 \mathrm{var}(X) + B^2 \mathrm{var}(Y) + 2AB\,D$

• LD modulates additive variance

[Figure panels: Directional Selection (Bulmer Effect); Disruptive Selection]
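To see how LD modulates the variance, the sketch below evaluates the two-locus expression for a few values of D. It assumes X and Y are 0/1 allele indicators, so that var(X) = p_A q_A and var(Y) = p_B q_B; the lower-case effect names a and b stand for the slide's A and B, and all numbers are illustrative.

```python
def two_locus_variance(a, b, p_A, p_B, D):
    """var(aX + bY) for 0/1 allele indicators X, Y with cov(X, Y) = D."""
    var_x = p_A * (1.0 - p_A)
    var_y = p_B * (1.0 - p_B)
    return a * a * var_x + b * b * var_y + 2.0 * a * b * D


# Equal effects, both alleles at frequency 0.5:
for D in (-0.1, 0.0, 0.1):
    print(D, two_locus_variance(1.0, 1.0, 0.5, 0.5, D))
```

With effects of the same sign, negative D hides part of the variance and positive D inflates it.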

43

Selection under epistasis

• Favored gamete will have higher frequency than expected

44

LD caused by hitchhiking

• Selection increases frequency of a novel mutation

• Combinations of neutral loci are co-inherited and reach higher-than-equilibrium frequencies

Smith, J.M., and J. Haigh. 1974. The hitch-hiking effect of a favorable gene. Genetical Research 23:23-35

45

Selection example (Bovine)

Hayes, B.J. et al. 2008. Animal Genetics 39:105-111.

46

Selection example (Human)

Sabeti, P.C. et al. 2007. Nature 449:913-918

47

Decay of LD

• One systematic process: recombination
• Generation 0
- $r_0 = \Pr(A_1B_1)$ and $D_0 = r_0 - p_A p_B$
• What are the origins of Generation 1 $A_1B_1$ gametes?
- non-recombinant from A1B1/AB parent: $r_0(1 - c)$
- recombinant from A1B/AB1 parent: $p_A p_B c$
- $r_1 = r_0(1 - c) + p_A p_B c$

48

Recombination decay of LD

• In Generation 1
- $r_1 = r_0(1 - c) + p_A p_B c$
- $D_1 = r_1 - p_A p_B = D_0(1 - c)$
• In Generation t
- $D_t = D_0(1 - c)^t$
• Valid for
- Random mating
- No drift

[Figure: $D_t / D_0$ (0 to 1) versus generation (0 to 25) for c = 0.001, 0.005, 0.01, 0.05, 0.1, 0.5]
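The decay can be simulated directly. The sketch below iterates the one-generation gamete recurrence and compares it with the closed form, under the same assumptions as the slide (random mating, no drift). Function names and starting values are illustrative.

```python
import math


def ld_after(d0, c, t):
    """Closed form: D_t = D_0 (1 - c)^t."""
    return d0 * (1.0 - c) ** t


def iterate_gamete_frequency(r0, p_A, p_B, c, t):
    """Apply r_{t+1} = r_t (1 - c) + p_A p_B c for t generations."""
    r = r0
    for _ in range(t):
        r = r * (1.0 - c) + p_A * p_B * c
    return r


def generations_to_halve(c):
    """Number of generations for D to fall to half its value (0 < c < 1)."""
    return math.log(0.5) / math.log(1.0 - c)


p_A, p_B, r0 = 0.4, 0.3, 0.30          # so D_0 = 0.30 - 0.12 = 0.18
r10 = iterate_gamete_frequency(r0, p_A, p_B, 0.1, 10)
print(r10 - p_A * p_B, ld_after(0.18, 0.1, 10))  # the two values agree
print(generations_to_halve(0.01))      # about 69 generations at c = 0.01
```

Even unlinked loci (c = 0.5) only lose half of their LD per generation at the population level.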

49

Hybrid-Source vs. Population LD

• Double heterozygote hybrid: $D = \frac{1}{2}(\frac{1}{2} - c)$
- Unlinked D goes to zero in one generation

• Population-wide: $D_1 = D_0(1 - c)$
- Unlinked D is reduced by half in one generation

• In the population, gametes are produced by individuals for which recombination is ineffective, e.g.

50

Generation + Decay => Equilibrium

• Random mating population of constant size:
- Mutation and drift are constantly generating LD
- Recombination removes it as a function of distance between loci

• $E(r^2) = 1/(1 + 4N_e c)$
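A short illustration of the equilibrium expectation, taking c as the recombination fraction between the two loci. The function name and the parameter values are illustrative.

```python
def expected_r2(n_e, c):
    """Equilibrium expectation E(r^2) = 1 / (1 + 4 * N_e * c)."""
    return 1.0 / (1.0 + 4.0 * n_e * c)


# Same genetic distance, different effective population sizes:
for n_e in (50, 100, 1000):
    print(n_e, expected_r2(n_e, 0.01))
```

A larger effective population size lowers the expected $r^2$ at any given distance, so more diverse panels need denser markers.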

51

Linked IBD and E(r2)

Sved J.A. (2009) Genetics Research 91:183-192

52

Relationship between LIBD and LD

• LD does not require non-recombination between loci, LIBD does

• For tightly linked loci, LD ≈ LIBD
• For loosely linked loci, LD ≠ LIBD

53

LIBD recurrence equation

• If loci have not recombined, they are perfectly correlated, else they are uncorrelated:

• If loci have not recombined over two independent pathways, they are LIBD:

• From one generation to the next:

54

Extension to population subdivision

• Both α and β depend on migration between subpopulations
• Any LIBD across subpopulations is generated within subpopulations
• Barring migration, common linkage phase will be ancestral to subpopulation divergence
• Structure is a case where LD and LIBD would not be expected to be similar

55

Variation around E(r2)

• $E(r^2) = 1/(1 + 4N_e c)$
• This is an expectation. There is a LOT of variability around it.

56

Simulated / expected LD

[Figure panels: Mutation / Drift LD, Ne = 100; LD in RIL (Hybridization)]

57

Marker Density (chicken example)

About 140 cM

• $E(r^2)$ still only about 0.4 at 0.2 cM
• And when you are that close, you still have some probability of a very low $r^2$

Andreescu, C. et al. 2007. Genetics 177:2161-2169

58

Diverse 2-row barley example

• $E(r^2)$ about 0.3 at 0.2 cM
• But when you are that close, you still have good probability of a very low $r^2$ (< 0.2)

59

Mean versus P(r2 > 0.5)

[Figure: $P(r^2 > 0.5)$, elite N. American spring oat dataset]

60

QTL & Marker frequencies must match

[Figure: $r^2_{max}$ as a function of associated marker frequency and focal locus frequency]

61

Association Mapping: a search in 2D

[Figure: axes are MAF (0.0 to 0.5) and Genome]

Associated markers need to be close in the genome to be in high LD, but they also need to have comparable allele frequencies

62

Extent of LD and marker density

• Power of detection is a function of QTL effect size, number of observations, and LD between QTL and marker

• Use this relationship to choose the desired r2

• “Extent of LD” analyses show the expected r2 at a given distance

• Combine to determine the required density
• Hedge because of variability
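One way to combine the two pieces is sketched below. It is a rough sketch under strong assumptions that are not from the slides: the equilibrium expectation $E(r^2) = 1/(1 + 4N_e c)$ is taken to describe the panel, markers are evenly spaced, map distance in Morgans is used as c (reasonable only for short distances), and the worst-placed QTL sits halfway between two markers. All names and numbers are illustrative.

```python
import math


def max_distance_for_r2(n_e, target_r2):
    """Invert E(r^2) = 1/(1 + 4 N_e c): largest c (Morgans) with E(r^2) >= target."""
    return (1.0 / target_r2 - 1.0) / (4.0 * n_e)


def markers_needed(genome_morgans, n_e, target_r2):
    """Evenly spaced markers so every point is within the target distance of one."""
    c = max_distance_for_r2(n_e, target_r2)
    spacing = 2.0 * c  # a QTL is at most half a spacing from its nearest marker
    return math.ceil(genome_morgans / spacing)


# Hypothetical 10-Morgan genome, Ne = 100, desired expected r^2 of 0.2:
print(max_distance_for_r2(100, 0.2))   # about 0.01 Morgan (1 cM)
print(markers_needed(10.0, 100, 0.2))
```

Since this uses only the expectation, the count is a lower bound in practice; the variability around $E(r^2)$ and the need for matching allele frequencies both argue for more markers.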

63

Coverage ≠ Power

Spencer, C.C.A. et al. 2009. Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip. PLoS Genet 5:e1000477.

64

Whole Genome or Candidate Loci?

• Focusing on candidate loci imposes the bias of our current biological knowledge

• Whole Genome imposes two burdens
- Higher genotyping cost (usually)
- Higher multiplicity of testing

65

Extent of LD in plants

• Breeding system (selfing / outcrossing)
• Selection at the loci assayed
• Diversity of the panel
- $E(r^2) = 1/(1 + 4N_e c)$
- More diverse –> larger Ne
• Population structure in the sample

66

LD affected by selfing rate s

• Recombination is ineffective in homozygotes
• Ceteris paribus, LD decays more slowly in (partial) selfers than in outcrossers

[Figure panels: S = 0.00; S = 0.95]

Nordborg, M. 2000. Genetics 154:923-929

67

Maize: Outcrosser; Diverse vs. Elite

Tenaillon, M.I. et al. 2002. Patterns of Diversity and Recombination Along Chromosome 1 of Maize. Genetics 162:1401-1413

68

Maize

Remington, D.L. et al. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. PNAS USA 98:11479-11484.

Ching, A. et al. 2002. SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genetics 3:19.

69

Arabidopsis: Selection; Founder effects

Nordborg, M. et al. 2002. The extent of linkage disequilibrium in Arabidopsis thaliana. Nat Genet 30:190-193.

Very approximately 10 Mbp

70

LD decay in Pinus taeda

• Loblolly pine has pollen that moves >100 km

• Summary of LD at 19 candidate genes

Brown et al. 2004 PNAS 101:15255

71

Copyright © 2007 by the Genetics Society of America

Krutovsky, K. V. et al. Genetics 2005. 171:2029-2041

Douglas fir

72

Ingvarsson, P. K. Genetics 2005. 169:945-953

Aspen (Populus)

73

Barley: Selfer; Wild vs Cultivated

Caldwell, K.S. et al. 2006. Extreme Population-Dependent Linkage Disequilibrium Detected in an Inbreeding Plant Species, Hordeum vulgare. Genetics 172:557-567.

Wild

Cultivated

74

More Wild Barley

Steffenson, B.J. et al. 2007. Aust. J Agric. Res. 58:532-544

75

Barley: North American Elite

Very approximately 200 Mbp

Hamblin et al. 2010. Crop Science 50:556-566

76

Rice

Mather, K.A. et al. 2007. The Extent of Linkage Disequilibrium in Rice (Oryza sativa L.). Genetics 177:2223-2232.

77

Rice

McNally et al. in preparation

78

Hamblin, M. T. et al. Genetics 2005;171:1247-1256

Sorghum
