Maize Evolution
J. Ross-Ibarra
Introduction
Domestication: Origins, Diversity
Adaptation: Parallel, Introgression
Improvement: US germplasm, Iowa RRS, Deleterious
Conclusions
Population genetics of maize domestication, adaptation, and improvement
Jeffrey Ross-Ibarra
www.rilab.org
@jrossibarra
rossibarra
February 5, 2014
Lead Authors
UC Davis
Sofiane Mezmouk
Joost van Heerwaarden
Matthew Hufford
Shohei Takuno
U Missouri
Justin Gerke
U Copenhagen
Rute Fonseca
Acknowledgements
People
• Ed Buckler (USDA)
• Jer-Ming Chia (CSHL)
• John Doebley (Wisconsin)
• Jode Edwards (USDA)
• Tom Gilbert (Copenhagen)
• Mike McMullen (USDA)
• Tanja Pyhajarvi
• Lauren Sagara
• Nathan Springer (Minnesota)
• Doreen Ware (USDA)
Funding
Maize evolutionary genetics
Diversity
Genome Sequence
Selective Sweep
Outline
1 Domestication
  Geographic origins of maize domestication
  Impacts of selection on genomic diversity
2 Adaptation
  Parallel adaptation to new environments
  Adaptive introgression from wild relatives
3 Improvement
  Historical genomics of US maize
  Drift and selection in the Iowa RRS
  The role of deleterious alleles in maize
4 Conclusions
Maize origins: single domestication
Sawers & Sanchez Leon 2011 Front. Genet.
Matsuoka et al. 2002 PNAS
• Single domestication from lowland ssp. parviglumis
• Microsatellite data suggested oldest maize from highlands
Highland landraces genetically similar to teosinte
Results

Patterns of Genetic Structure and Differentiation. Principal components analysis (PCA) (17) of the maize SNP data identifies 58 significant principal components (PCs) (explaining 37.6% of total variance), probably reflecting isolation by distance (18) and linkage effects (19). We use the first nine PCs, which present the strongest spatial autocorrelation (Fig. S2) and explain a large portion of the total variance (18.7%), to cluster the accessions into 10 geographically distinct groups (Fig. 1A). Meso-American maize falls into three groups: the Meso-American Lowland group, which includes predominantly lowland accessions from southeast Mexico and the Caribbean; the West Mexico group, representing both lowlands and highlands; and the Mexican Highland group, encompassing most of Matsuoka et al.'s highland Mexican accessions (5) as well as accessions from highland Guatemala. These clusters also confirm the presence of US-derived varieties in South America (20); we excluded these accessions from further analysis.

In the joint PCA analysis of the three subspecies, the first PC (10.8% of variance) separates maize from its wild relatives and confirms the similarity between maize from the Mexican Highland group and parviglumis (Fig. 1B). The second PC (4.8% of variance) mainly separates the genetic groups of maize along a north–south axis, with the Northern United States and Andean Highlands at the extremes. The third PC (2.7% of variance) predominately reflects the difference between parviglumis and mexicana. The Mexican Highland cluster extends toward mexicana along both PC 1 and 3, suggesting that the similarity of highland maize to parviglumis may reflect admixture with mexicana.

Admixture Analysis. Simulation of gene flow of mexicana into the Meso-American Lowland maize group suggests that 13% cumulative historical introgression is sufficient to explain observed differences between lowland and highland maize in terms of heterozygosity and differentiation from parviglumis (Fig. S3). Structure analysis (21) of all Mexican accessions lends support for this magnitude of introgression (Fig. 2). The three subspecies form clearly separated clusters, but evidence of admixture is evident in all three groups, and the two wild relatives show clear signs of bidirectional introgression at altitudes where their ranges overlap (Fig. 2). Highland maize shows strong signs of mexicana introgression, with 20% admixture observed in the Mexican Highland cluster, but below 1,500 m mexicana introgression drops to less than 1%. Introgression from parviglumis into maize is much lower overall, reaching its highest average value (3%) in the lowland West Mexico group.

Drift Analysis. Because introgression from mexicana may affect ancestry inference based on genetic distance from parviglumis, we took an approach that does not require reference to the wild relatives. Under models of historical range expansion, genetic differentiation increases away from the population of origin (22, 23), and estimates of drift from ancestral frequencies have been applied successfully to identify ancestral populations (24). We therefore applied the method of Nicholson et al. (25) to estimate simultaneously ancestral frequencies and F, a measure of genetic drift away from these frequencies, for sets of predefined populations.

To illustrate the potential impact of mexicana introgression, we first performed a standard analysis that includes each maize population in turn in conjunction with the two wild relatives. Average drift away from the inferred common ancestor of maize, parviglumis, and mexicana is higher for maize (F = 0.24) than for mexicana (F = 0.15) or parviglumis (F = 0.07), probably due to changes in allele frequency following the domestication bottleneck. Because the inferred ancestral frequencies are closer to those of the wild relatives than to present-day maize, comparison with this ancestor is sensitive to introgression from these subspecies. It therefore is not surprising that estimates of F between individual maize populations and the common ancestor of all three taxa identify the Mexican Highland group as being most similar (Fig. 3A). This pattern is maintained in an analysis excluding mexicana, in which Mexican Highland maize is tied with the West Mexico group as the most ancestral population (Fig. 3B).

To mitigate the impact of introgression, we used a slightly modified approach that excludes both parviglumis and mexicana and calculates genetic drift with respect to ancestral frequencies inferred from domesticated maize alone. Because the genetic
Fig. 1. (A) Map of sampled maize accessions colored by genetic group. (B) First three genetic PCs of all sampled accessions.
van Heerwaarden et al. PNAS | January 18, 2011 | vol. 108 | no. 3 | 1089
• 1K SNPs from 1200 landraces across Americas
• PCA identifies genetic clusters and confirms highland maize is most similar to teosinte
van Heerwaarden et al. 2011 PNAS
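The PCA step on this slide can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes genotypes are coded as 0/1/2 allele counts, computes only the first principal component, and uses plain power iteration so that no external library is needed. The function name `first_pc_scores` and the toy data are hypothetical, and the approach is only practical for small examples.

```python
import math

def first_pc_scores(genotypes, n_iter=200):
    """Return each sample's score on the first principal component.

    genotypes: list of samples, each a list of allele counts (0, 1, 2) per SNP.
    Uses power iteration on the sample-by-sample Gram matrix of
    mean-centered genotypes (toy-scale only).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Center each SNP (column) on its mean across samples.
    means = [sum(g[j] for g in genotypes) / n for j in range(m)]
    centered = [[g[j] - means[j] for j in range(m)] for g in genotypes]
    # Gram matrix G = X X^T; its top eigenvector, scaled by the square
    # root of its eigenvalue, gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    # Deterministic, non-constant starting vector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no variation in the data
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]

# Toy data: two samples from each of two differentiated groups.
toy = [[2, 2, 0, 0], [2, 2, 0, 1], [0, 0, 2, 2], [0, 0, 2, 1]]
scores = first_pc_scores(toy)
```

In the toy data the two groups fall on opposite sides of zero on PC1. The sign of a principal component is arbitrary, so only the separation is meaningful.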
Modern maize originated in lowlands
similarity of some of our maize groups violates the assumption of independent drift, we infer ancestral frequencies by averaging over estimates obtained for pairs of diverged maize groups and calculate drift of individual populations with respect to these frequencies. In contrast to previous results, this comparison identifies the West Mexico group as being most similar to the common domesticated ancestor, followed by the Mexican Highland and Meso-American Lowland groups (Fig. 3C). Moreover, splitting the West Mexico group into highland (>2,000 m) and lowland (<1,500 m) components reveals that the lowland West Mexico group is most similar to the inferred ancestral maize. Direct comparison of genetic drift among the lowland West Mexico, Mexican Highland, and each of the remaining eight clusters shows further that the lowland West Mexico group is significantly closer than the Mexican Highland group to the inferred ancestor of each triplet (Fig. S4). These results strongly suggest that maize from the western lowlands of Mexico is genetically most similar to the common ancestor of maize and is more closely related to other extant populations than is maize from the highlands of central Mexico.

The ancestral position of the lowland West Mexico group is confirmed in a spatially explicit analysis of current allele frequencies in modern landraces, in which we mapped the moment estimator of F with respect to inferred ancestral allele frequencies. Mapping against allele frequencies observed in parviglumis (Fig. 4A) recapitulates earlier genetic results identifying highland maize as most similar to its wild ancestor (5). Points in the lower 0.05 quantile of F cluster in the highlands, with a mean altitude of 1,745 m. In contrast, mapping F with respect to inferred ancestral allele frequencies (Fig. 4B) identifies the lowest 0.05 quantile of F values in the lowlands of western Mexico, including the Balsas region and the region south of the Mexican highlands, resulting in an average altitude of 1,268 m; this analysis also clearly estimates higher values of F for maize in the Mexican highlands, particularly in areas of high inferred introgression from mexicana (Fig. S5).

Discussion

Resolving the origins and spread of domesticated crops is a fascinating and challenging endeavor that requires the integration of botanical, archeological, and genetic evidence (26, 27, 28). Maize provides an exceptional opportunity for studying the processes of domestication and subsequent diffusion because of the wealth of existing archaeobotanical data, germplasm accessions, and molecular markers. The contradiction between evidence supporting the earliest cultivation in the lowlands and the genetically ancestral position of Mexican Highland maize is therefore of particular interest. The disagreement is important, because the adaptive differences between highland and lowland maize are profound (14, 29). In other crops, uncertainty about
Fig. 2. (Lower) Bar plot of assignment values for the sample of Mexican accessions: mexicana (red), parviglumis (green), and mays (blue). (Upper) The solid black line indicates the altitude for each sample. The dotted line marks the minimum altitude at which mexicana occurs. Groups shown: mexicana, parviglumis, Meso-American Lowland, West Mexico, Mex. Highland; altitude axis from 0 to 2,500 m.
Fig. 3. Posterior densities of the genetic drift parameter F for 10 genetic groups with respect to (A) mexicana and (B) parviglumis. Only lowland accessions of the West Mexico group (light blue) were included. (C) Drift of all 10 genetic groups with respect to inferred ancestral frequencies. Light blue represents West Mexico; dotted line indicates the division between lowlands (<1,500 m, solid line) and highlands (>2,000 m). Groups shown: South-West US, Central US, S American Lowland, Bolivian Lowland, Meso-American Lowland, West Mexico, Coastal Brazil, Mexican Highland, North US, Andean Highland; F axis from 0.1 to 0.4 in panels A, B, and C.
1090 | www.pnas.org/cgi/doi/10.1073/pnas.1013011108 van Heerwaarden et al.
• Identifying gene flow from ssp. mexicana
• Ancestral reconstruction identifies lowland origin
van Heerwaarden et al. 2011 PNAS
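The drift parameter F can be illustrated with a simple moment-style estimator. This is a sketch under stated assumptions, not the Bayesian method of Nicholson et al. used in the paper: it assumes that a population's allele frequency at each SNP scatters around the ancestral frequency with variance F × π(1 − π), and it ignores sampling error. The function name `drift_f` and the example frequencies are hypothetical.

```python
def drift_f(pop_freqs, anc_freqs):
    """Moment-style estimate of drift F away from ancestral frequencies.

    Assumes Var(p_j) = F * a_j * (1 - a_j) at SNP j, where a_j is the
    ancestral frequency, so F is estimated as
    sum((p_j - a_j)^2) / sum(a_j * (1 - a_j)).
    Sampling error in the frequencies is ignored in this sketch.
    """
    num = 0.0
    den = 0.0
    for p, a in zip(pop_freqs, anc_freqs):
        num += (p - a) ** 2
        den += a * (1.0 - a)
    if den == 0.0:
        raise ValueError("ancestral frequencies are all fixed (0 or 1)")
    return num / den

# Hypothetical frequencies at two SNPs.
ancestral = [0.5, 0.5]
population = [0.6, 0.4]
f_hat = drift_f(population, ancestral)  # about 0.04
```

Populations with lower F against the inferred ancestral frequencies are read as closer to the ancestor, which is the logic behind the lowland-origin result above.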
Allele frequencies reveal bottleneck, growth
Rare ⇐ ⇒ Common
• 30X landrace genome estimates population size
• Genic regions reflect bottleneck loss of rare alleles
• Nongenic regions of maize show new mutations (≈ 40% unique) due to exponential growth
Vince Buffalo, in prep.
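The rare-to-common axis on this slide corresponds to a site frequency spectrum (SFS). As a hedged illustration only: the sketch below tabulates an unfolded SFS from derived-allele counts and reports the share of singletons. A bottleneck is expected to deplete the rare classes, while recent growth adds an excess of rare new mutations. The function names and the input counts are hypothetical, and this is not the analysis behind the slide.

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Unfolded SFS: entry k-1 is the number of sites with k derived copies.

    Sites that are fixed (count 0 or n_chromosomes) are not segregating
    and are skipped.
    """
    counts = Counter(c for c in derived_counts if 0 < c < n_chromosomes)
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]

def singleton_fraction(sfs):
    """Share of segregating sites at which the derived allele appears once."""
    total = sum(sfs)
    return sfs[0] / total if total else 0.0

# Hypothetical derived-allele counts at seven sites in four chromosomes.
sfs = site_frequency_spectrum([1, 1, 2, 0, 4, 3, 1], 4)  # [3, 1, 1]
rare_share = singleton_fraction(sfs)                      # 0.6
```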
Genome sequencing identifies changes in diversity
• Full genome sequencing to ≈ 5x of > 100 temperate and tropical inbreds, landraces, and teosinte
• Maize retained most diversity through both domestication (≈ 80%) and improvement (> 95%)
Hufford et al. 2012 Nature Genetics; Chia et al. 2012 Nature Genetics
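"Diversity retained" can be read as a ratio of nucleotide diversity (π) between a derived group and its ancestor. The sketch below assumes biallelic sites, complete data, and alternate-allele counts per site; the function names and numbers are hypothetical and are not taken from the cited papers.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Average pairwise differences per site (pi) for biallelic sites.

    alt_counts: alternate-allele count at each variable site.
    n_sites: total number of sites surveyed, including invariant ones.
    Per-site contribution is 2k(n - k) / (n(n - 1)).
    """
    n = n_chromosomes
    total = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in alt_counts)
    return total / n_sites

def diversity_retained(pi_derived, pi_ancestral):
    """Fraction of ancestral diversity still present in the derived group."""
    return pi_derived / pi_ancestral

# Hypothetical example: two heterozygous sites among four surveyed sites
# in a sample of two chromosomes.
pi_toy = nucleotide_diversity([1, 1], 2, 4)   # 0.5
retained = diversity_retained(0.4, 0.5)        # about 0.8
```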
Strong selection, including regulatory regions
GRMZM2G136072
• Selection stronger during domestication (s ≈ 1.5%)
• ≈ 18% domestication genes show continued selection
• 6–10% of selected regions contain no genes
• Expression suggests selection on regulatory sequence
Hufford et al. 2012 Nature Genetics; Swanson-Wagner et al. 2012 PNAS
Domestication candidate genes
• 484 selected regions identified
• Majority show stronger selection than tb1 or tga1
Hufford et al. 2012 Nature Genetics
Repeated adaptation to highlands
Domestication: 9,000 BP
Lowland S. America: 6,000 BP
Highland S. America: 4,000 BP
Highland Mexico: 6,000 BP
Highland SW US: 4,000 BP
Fonseca et al. in prep.
Parallel phenotypes in S. America and Mexico
Barthakur 1974 Int. J. Biometeor.
Genetic data confirm independent origin
• GBS data from Mexico and S. America landraces
• Independent origins, little admixture between highlands
Takuno et al. in prep
Distinct genetic architecture of highland adaptation
Yi et al. 2010 Science
• F_ST identifies many candidate SNPs, < 5% shared
• Most (> 80%) found segregating in lowland samples
• Contrast to highland adaptation in humans
Takuno et al. in prep
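The F_ST scan and the "< 5% shared" comparison can be sketched as follows. This is an illustration under simple assumptions, not the analysis in preparation: it uses the basic heterozygosity-based definition of F_ST for two equally weighted populations, without sample-size correction, and measures sharing as overlap of two candidate sets relative to their union. All names and inputs are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Per-SNP F_ST = (H_T - H_S) / H_T for two equally weighted populations.

    p1, p2: allele frequencies in the two populations (for example,
    lowland and highland). No correction for sample size is applied.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)          # total expected heterozygosity
    if h_t == 0.0:
        return 0.0                              # monomorphic site
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t

def shared_fraction(candidates_a, candidates_b):
    """Overlap of two candidate SNP sets as a fraction of their union."""
    a, b = set(candidates_a), set(candidates_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical candidate SNP identifiers from two independent scans.
mexico_hits = {"snp1", "snp2", "snp3"}
andes_hits = {"snp3", "snp4"}
overlap = shared_fraction(mexico_hits, andes_hits)  # 0.25
```

Low overlap between the two scans is what would be expected if the two highland populations adapted through largely different loci.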
Repeated adaptation in maize and teosinte
maize mexicana
Photo: Pesach Lubinsky
mexicana parviglumis
Lauter et al. 2004 Genetics
• Colonization of highland Mexico brought maize into sympatry with highland ssp. mexicana
• mexicana and parviglumis diverged ≈ 60,000 BP
Ross-Ibarra et al. 2009 Genetics
Widespread introgression from mexicana
Figure (panel A): Chromosome 4, maize, position in Mb (0 to 250). Rows: El Porvenir, Opopeo, Santa Clara, Nabogame, Puruandiro, Xochimilco, Tenango del Aire, San Pedro, Ixtlan, and an allopatric reference. Methods shown: HAPMIX and STRUCTURE.
• SNP genotyping of 8 landraces sympatric with mexicana
• 6 genomic regions with mexicana haplotypes introgressed in multiple landraces at high frequencies
• No consistent introgression from maize into mexicana
Hufford et al. 2013 PLoS Genetics
Introgressed regions overlap with teosinte QTL
Hufford et al. 2013 PLoS Genetics
Adaptive introgression from mexicana
• Landraces with introgression show mexicana-like phenotype and superior growth in cold temperatures
• Maize adapted to highland environments in Mexico via gene flow from mexicana
Hufford et al. 2013 PLoS Genetics
Historical genomics of US maize
• SNP genotyping of 400 historical landraces and inbreds
• Track allele frequencies
• Estimate genome-wide ancestry using identity by state
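Identity by state (IBS) between two lines can be sketched as the share of alleles that match across SNPs. The code below is only an illustration of that idea, not the ancestry method used in the study: it assumes genotypes coded as 0/1/2 allele counts with `None` for missing data, scores each SNP as 2 − |a − b| shared alleles, and picks the most similar reference line. The function names and line labels are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles identical by state between two lines.

    g1, g2: genotype vectors of allele counts (0, 1, 2); None marks
    missing data, and such sites are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)   # 2, 1, or 0 alleles shared at this SNP
        compared += 2
    return shared / compared if compared else float("nan")

def most_similar(query, references):
    """Name of the reference line with the highest IBS to the query."""
    return max(references, key=lambda name: ibs_similarity(query, references[name]))

# Hypothetical reference lines and a query line.
refs = {"line_A": [0, 0, 2], "line_B": [2, 2, 0]}
closest = most_similar([0, 1, 2], refs)  # "line_A"
```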
Genetic structure and diversity of US maize
• Increasing structure over time mirrors development of heterotic groups
• Number and diversity of ancestors decrease over time
van Heerwaarden et al. 2012 PNAS
Selection on quantitative traits
Discussion

The genomics of breeding history is of great importance to understanding the genetic basis of crop improvement and is instrumental to the identification of molecular targets of artificial selection. The current state of marker technology has granted us an unprecedented look across eight decades of breeding and selection, providing insight into historical developments in diversity, ancestry, and the effects of selection across the genome.

The transition from open-pollinated varieties to inbred lines and the emergence of heterotic groups have caused profound changes in population structure, linkage disequilibrium, and ancestry patterns. Differentiation in the first two eras, although significant, is weak and our results support pedigree analyses (4) that suggest current population structure is mainly due to recent divergence of breeding pools rather than to different landrace origins. The strong differentiation observed in the modern era 3 lines is likely the result of the use of smaller numbers of more closely related breeding lines and limited genetic exchange among heterotic groups in the last two eras. Nonetheless, differential landrace ancestry remains detectable in elite material, providing some justification for the use of the traditional designations Reid (Yellow Dent) and Lancaster for the SS and NSS heterotic groups.

Compared with the dramatic shifts in ancestry, directional selection has had limited effect on the genome, with only 5% of SNPs showing some evidence of consistent selection. Candidate sites, apart from a slight reduction in ancestral diversity, do not deviate meaningfully from genome-wide patterns of haplotype length and ancestry. A potential caveat regarding this observation is that our selection scan is most sensitive to cumulative changes in allele frequency, possibly missing alleles fixed in the early stages of maize breeding. To account for this potential bias, we measured ancestry distortion and haplotype diversity at the 236 SNPs with highest frequency differentiation between eras 0 and 3, finding similar results as for our candidate SNPs (i.e., no increase in distortion and only 12% diversity reduction). Our results are also consistent with a recent resequencing study showing modest genome-wide effects of recent selection in a limited but geographically diverse sample of maize accessions (23). Nonetheless, a considerable number of candidate regions are identified across the genome, containing many genes affecting processes of agronomic relevance such as lignin synthesis (24) and response to auxin (25) and stress (1). It must also be noted that we have mapped selection associated with breeding progress per se, and that further analyses may detect selective changes specific to individual heterotic groups.

Fig. 3. Evidence for directional selection (Top), basal ancestry distortion (Middle), and ancestral haplotype diversity (Bottom) across the genome. Colors indicate the separate chromosomes with red vertical lines marking the centromeres. Green dashed horizontal line marks the 99th percentile of Bayes factors; purple dashed horizontal lines indicate median values of ancestry distortion and effective number of basal ancestors. Black vertical ticks mark selected features. Gray dots mark candidate SNPs. Black circles mark candidates that coincide with sites of low ancestral diversity.
van Heerwaarden et al. PNAS | July 31, 2012 | vol. 109 | no. 31 | 12423
• Time GWA reveals SNPs selected across breeding pools
• Frequency, diversity suggest selection on common alleles of small effect at quantitative traits
van Heerwaarden et al. 2012 PNAS
Ancestry, not selection, drives diversity
DiscussionThe genomics of breeding history is of great importance to un-derstanding the genetic basis of crop improvement and is in-strumental to the identification of molecular targets of artificialselection. The current state of marker technology has granted usan unprecedented look across eight decades of breeding andselection, providing insight into historical developments in di-versity, ancestry, and the effects of selection across the genome.The transition from open-pollinated varieties to inbred lines
and the emergence of heterotic groups have caused profoundchanges in population structure, linkage disequilibrium, and an-cestry patterns. Differentiation in the first two eras, although sig-nificant, is weak and our results support pedigree analyses (4) thatsuggest current population structure is mainly due to recent di-vergence of breeding pools rather than to different landrace ori-gins. The strong differentiation observed in themodern era 3 linesis likely the result of the use of smaller numbers of more closelyrelated breeding lines and limited genetic exchange among het-erotic groups in the last two eras. Nonetheless, differential land-race ancestry remains detectable in elite material, providing somejustification for the use of the traditional designations Reid(YellowDent) and Lancaster for the SS andNSS heterotic groups.
Compared with the dramatic shifts in ancestry, directionalselection has had limited effect on the genome, with only 5% ofSNPs showing some evidence of consistent selection. Candidatesites, apart from a slight reduction in ancestral diversity, do notdeviate meaningfully from genome-wide patterns of haplotypelength and ancestry. A potential caveat regarding this observa-tion is that our selection scan is most sensitive to cumulativechanges in allele frequency, possibly missing alleles fixed in theearly stages of maize breeding. To account for this potential bias,we measured ancestry distortion and haplotype diversity at the236 SNPs with highest frequency differentiation between eras0 and 3, finding similar results as for our candidate SNPs (i.e., noincrease in distortion and only 12% diversity reduction). Ourresults are also consistent with a recent resequencing studyshowing modest genome-wide effects of recent selection in alimited but geographically diverse sample of maize accessions(23). Nonetheless, a considerable number of candidate regionsare identified across the genome, containing many genes af-fecting processes of agronomic relevance such as lignin synthesis(24) and response to auxin (25) and stress (1). It must also benoted that we have mapped selection associated with breedingprogress per se, and that further analyses may detect selectivechanges specific to individual heterotic groups.
Fig. 3. Evidence for directional selection (Top), basal ancestry distortion (Middle), and ancestral haplotype diversity (Bottom) across the genome. Colors indicate the separate chromosomes with red vertical lines marking the centromeres. Green dashed horizontal line marks the 99th percentile of Bayes factors; purple dashed horizontal lines indicate median values of ancestry distortion and effective number of basal ancestors. Black vertical ticks mark selected features. Gray dots mark candidate SNPs. Black circles mark candidates that coincide with sites of low ancestral diversity.
van Heerwaarden et al. PNAS | July 31, 2012 | vol. 109 | no. 31 | 12423
• No deviation from genome-wide ancestry at selected sites
• Unusual ancestry instead reflects diversity in ancestral lines
van Heerwaarden et al. 2012 PNAS
Popular lines do not show superior genotypes
The genomic signature of selection is informative of the genetic architecture of breeding progress. Two issues of obvious interest are the selective importance of rare alleles of large effect and the contribution of dominant ancestors with superior multilocus genotypes. The infrequent occurrence of rare ancestral contributors and absence of extended haplotypes at candidate loci favor a model of selection on common variants rather than one of strong selective sweeps (26, 27), and we find no evidence of the long-term success of specific lines being determined by their multilocus genotype. This being said, the exceptionally favorable genotypes observed for some era 1 inbreds suggest that selection of outstanding lines may have occurred, albeit with limited effect on future genomic composition.
In all, our results suggest that genetic gain achieved by plant breeding has been a complex process, involving a steady accumulation of changes at multiple loci (28), combined with heterosis due to differentiation of breeding pools (29). We thereby support the notion that selected traits of agronomic importance are predominantly quantitative in nature (30), with relatively few dominant contributions from individual alleles or lines. It will therefore be interesting to see whether our candidates prove useful in defining improved multilocus targets for genomic selection. Although challenging, the application of historical genomics to crop improvement is a tantalizing prospect that we hope breeders will soon put to the test.
Methods
Samples and Genotyping. We obtained a total of 400 accessions from US Department of Agriculture (USDA)’s National Plant Germplasm System and collaborators. Lines were chosen by a combination of literature research, consultation with plant breeders, and by querying the stock database hosted at maizegdb.org for accessions with a large number of references. Approximate ages of the selected lines were similarly obtained from the literature and germplasm databases. Accessions were divided into 99 classic North American landraces (era 0), 94 early inbreds from before the 1950s (era 1), 70 advanced public lines from the 1960s and 70s (era 2), and 137 elite commercial lines from the 1980s and 90s (era 3) that are no longer under plant variety protection (ex-PVP).
For each accession, DNA was extracted by a standard cetyltrimethyl ammonium bromide (CTAB) protocol (31) for genotyping on the Illumina MaizeSNP50 Genotyping BeadChip platform using the clustering algorithm of the GenomeStudio Genotyping Module v1.0 (Illumina). Of the total of 56,110 markers contained on the chip, 45,997 polymorphic SNPs were genotyped successfully with less than 10% missing data for use in subsequent analysis. SNPs were of diverse origins and discovery schemes. We evaluated the effects of ascertainment by comparing results for 33,575 SNPs derived from more diverse discovery panels to 12,422 SNPs that were discovered between the advanced public lines B73 and Mo17. Effects on differentiation and selection inference were found to be statistically significant but modest (SI Text).
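As a rough illustration of the marker filter described above (polymorphic SNPs with less than 10% missing data), the sketch below applies both criteria to a toy genotype table. The data layout (one list of calls per SNP, with None for a failed call), the function names `passes_qc` and `filter_snps`, and the marker names are assumptions made for this example; they are not taken from the GenomeStudio pipeline.

```python
from typing import Dict, List, Optional

Call = Optional[str]  # e.g. "AA", "AG", "GG"; None marks a missing call


def passes_qc(calls: List[Call], max_missing: float = 0.10) -> bool:
    """Return True if a SNP is polymorphic and has < max_missing missing calls."""
    if not calls:
        return False
    missing = sum(1 for c in calls if c is None)
    if missing / len(calls) >= max_missing:
        return False
    # Polymorphic: more than one distinct allele among the observed calls.
    alleles = {allele for c in calls if c is not None for allele in c}
    return len(alleles) > 1


def filter_snps(genotypes: Dict[str, List[Call]]) -> List[str]:
    """Names of SNPs that pass both filters, in input order."""
    return [name for name, calls in genotypes.items() if passes_qc(calls)]


# Toy data: hypothetical marker names, 10 accessions each.
toy = {
    "snp_ok": ["AA", "AG", "GG", "AA", "AA", "GG", "AG", "AA", "GG", "AA"],
    "snp_monomorphic": ["AA"] * 10,
    "snp_missing": ["AA", None, "GG", None, "AA", "GG", "AG", "AA", "GG", "AA"],
}
kept = filter_snps(toy)
print(kept)
```

The two ascertainment subsets reported in the text are consistent with the filtered total: 33,575 + 12,422 = 45,997.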
Diversity, Linkage, and Ancestry Analysis. Diversity analyses followed (32, 33). Briefly, PCA was performed on normalized genotype matrices and the number of significant eigenvalues determined by comparison with a Tracy–Widom (TW) distribution (18). Genotypes were assigned to k groups by Ward clustering on the Euclidean distance calculated from the k − 1 significant PCs. PCA-based clustering into groups was done separately for each era. To improve clustering within era 0, Corn Belt Dents were analyzed separately from Northern Flints and a divergent group containing a popcorn and a Cherokee Flower Corn (referred to here as popcorn). Genetic differentiation within each era was measured as the weighted mean of Nicholson’s population-specific differentiation parameter C (19), a measure of allele frequency divergence from an estimated base population frequency, calculated for each genetic group using the popdiv function of the R (34) package popgen.
For linkage and ancestry analysis, era 0 genotypes were converted to phased haplotypes using the program fastPHASE (35). To correct for background linkage caused by genetic differentiation, linkage disequilibrium (LD) between SNPs was calculated as the squared correlation (r²) between inverse logit-transformed residuals of a multiple logistic regression on each SNP, using the first six genetic PCs as covariates to correct for population structure. LD decay was described by nonlinear regression as in ref. 36. Mean haplotype length was calculated at 1,000 random positions across the genome and compared with the expected length obtained by randomizing SNPs within each genetic group around the same positions. Linkage disequilibrium between closely spaced SNPs was accounted for by randomizing blocks of SNPs separated by more than 4 kb.
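The idea of structure-corrected LD can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the paper: instead of residuals from a multiple logistic regression on six genetic PCs, it subtracts each group's mean allele code, which is a much cruder correction. The function names and the toy genotypes are hypothetical.

```python
from typing import Dict, List, Sequence


def squared_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (r^2) between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no LD information
    return (sxy * sxy) / (sxx * syy)


def center_within_groups(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    """Subtract each group's mean: a crude stand-in for regression residuals."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


# Two SNPs (0/1 allele codes) whose frequencies differ strongly between groups.
groups = ["a"] * 6 + ["b"] * 6
snp1 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
snp2 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1]

raw_r2 = squared_correlation(snp1, snp2)
adj_r2 = squared_correlation(center_within_groups(snp1, groups),
                             center_within_groups(snp2, groups))
print(round(raw_r2, 3), round(adj_r2, 3))
```

In this toy case most of the raw association between the two SNPs comes from the allele frequency difference between groups, so the corrected r² is smaller than the raw value.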
We estimated direct genomic ancestry by shared haplotype analysis. For each line, the longest shared haplotype with lines from the same era or older
[Figure 4 panels. Left: ancestral overrepresentation of individual era 1 lines at favorable alleles; x-axis log10(genome-wide ancestry), y-axis log10(ancestry at favorable alleles). Right: relation between enrichment for favorable alleles in individual era 1 lines and genome-wide ancestry; x-axis log10(genome-wide ancestry), y-axis enrichment for favorable alleles; labeled lines: C103, OH43, WF9, W22, B14, B37, I205, MO1W, H49, W182B, CI187-2.]
Fig. 4. Analysis of disproportionate ancestral contributions of individual era 1 lines to favorable alleles in era 3. Left: Overrepresentation of individual era 1 lines in the ancestry of favorable alleles, estimated by plotting the average ancestry proportion at favorable alleles against the genome-wide proportion. Right: Enrichment (as defined by the log probability ratio (LPR) with respect to noncandidate SNP) of favorable alleles in era 1 lines as a function of their average ancestral contribution to era 3. Black dotted lines represent the 1:1 diagonal and 0 horizontal, respectively. Gray dotted lines are regression lines (slope/r²: 1.15/0.85 and −0.1/0.00). Line names on the Right are shown for lines with LPR values higher than 4 or ancestry proportion above 0.03. Labels in boldface mark breeding lines of known historic popularity.
12424 | www.pnas.org/cgi/doi/10.1073/pnas.1209275109 van Heerwaarden et al.
• No over-representation of early inbreds at selected sites
• Early inbreds contributing most to ancestry are not enriched for beneficial alleles
van Heerwaarden et al. 2012 PNAS
Selection in the Iowa RRS
• BSSS, BSCB1 selected for hybrid yield and agronomics
• SNP genotyping of founders and plants from 5 cycles
• Allele frequency divergence mostly due to genetic drift
Gerke et al. In Review
No overlap in selection suggests complementation
Gerke et al. In Review
Many new mutations, most deleterious
Lohmueller 2013 arXiv; Jones 1924 Genetics
• ≈90 mutations per meiosis, >80% deleterious
• Population growth increases rare deleterious variants and these explain a larger proportion of V_A
• GWAS has low power to detect rare deleterious variants
Computational prediction of deleterious alleles
• Published GBS, heterosis data of maize 282 population
• A priori identify putatively deleterious alleles from conservation and physicochemical properties
• Deleterious nonsynonymous at lower frequencies than nondeleterious
Mezmouk & Ross-Ibarra 2014 G3
Constraint; no evidence of positive selection
[Figure: K_N/K_S (axis range 0.0 to 0.5) for gene categories C and NotC; legend "del": Deleterious, None.]
• Few high-frequency deleterious alleles show significant signals of selection
• Genes with del. SNPs show lower constraint (higher K_N/K_S)
Mezmouk & Ross-Ibarra 2014 G3; Hufford et al. 2012 Nature Genetics
Deleterious allele frequencies consistent with BPH
• BPH increases with distance from B73 tester
• Significant BPH even among stiff-stalk lines
Mezmouk & Ross-Ibarra 2014 G3
Heterosis GWA genes enriched for deleterious alleles
• No enrichment for individual deleterious SNPs (low power)
• Genes associated with heterosis (for all traits) are enriched in deleterious alleles
Mezmouk & Ross-Ibarra 2014 G3
Conclusions
• Population genetic analyses are allowing clarification of maize origins and effects of selection on the maize genome.
• Maize adaptation to new environments has taken multiple distinct routes, including utilizing genes from wild relatives.
• Genetic drift and selection on common variants appear to have dominated US maize germplasm.
• Patterns of complementation and frequencies of deleterious alleles support a simple dominance model of heterosis.
Improvement candidate genes
• 695 selected regions identified
Hufford et al. 2012 Nature Genetics
Selection on gene expression
Expression changes
                        Domestication   Improvement
Directional change      yes             no
Tissue Specificity      no              yes
Dominance in crosses    no              yes
• Expression at >18,000 genes in both maize and teosinte
• Domestication directly acted on candidate gene expression
• Improvement worked with highly expressed genes
• Modern breeding selected for dominance in expression
Hufford et al. 2012 Nature Genetics; Swanson-Wagner et al. 2012 PNAS
Repeated evolution at grassy tillers
• Cloned gt1 as gene underlying QTL for prolificacy
• Selection on different parts of gene:
A) Temperate zones: selection on 5’ enhancer region
B) Tropical zones: selection on 3’ UTR
Wills et al. 2013 PLoS Genetics
Consistent with QTL for heterosis
• QTL for heterosis enriched in centromeric regions¹
¹ Lariepe et al. 2012 Genetics
Consistent with change in inbred and hybrid yield?
continuing volatility of genotypes over the decades (“genetic diversity in time”).
Analysis by multidimensional scaling of allele polymorphisms among the parental inbred lines of the hybrids separated the older inbred lines (usually parents of double cross hybrids) from the newer inbred lines. The newer lines sorted into two heterotic groups, called Stiff Stalk and Non Stiff Stalk. This separation of the newer lines agrees with the observation that breeders, using pedigree information and empiricism (practical experience), have established two breeding pools to balance important traits in the final hybrids, as well as to improve efficiency of seed production (Stiff Stalk for seed parents and Non Stiff Stalk for pollinator parents).
Heterosis: hybrid and inbred performance
DUVICK (2005) reviewed several studies (DUVICK, 1984b, 1999; MEGHJI et al., 1984; DUVICK et al., 2004b) that examined the contribution of heterosis to improvements in yield of U.S. maize hybrids. He concluded that
(1) Absolute heterosis for grain yield has increased over the years to a small extent (more so under abiotic stress) but its annual gain is less (sometimes much less) than total genetic gain in hybrid yield. Inbred and single cross yields have each increased over the decades, but single cross yield has advanced to greater degree (Fig. 7). Absolute heterosis is defined as “yield of a single cross minus the mean yield of its inbred parents” (SCHNELL, 1974).
(2) Relative heterosis for grain yield has decreased over the years according to two reports, but increased slightly in a third study. Relative heterosis is defined as “absolute heterosis as percentage of single cross yield” (SCHNELL, 1974).
(3) Absolute heterosis for plant size and maturity has decreased to a small degree, in contrast with heterosis for grain yield.
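The two definitions quoted above translate directly into arithmetic. The following Python sketch is an illustration only: the function names are hypothetical and the yields are made-up numbers, not data from Duvick's trials.

```python
from statistics import mean
from typing import Sequence


def absolute_heterosis(single_cross_yield: float, parent_yields: Sequence[float]) -> float:
    """Yield of a single cross minus the mean yield of its inbred parents (SX - MP)."""
    return single_cross_yield - mean(parent_yields)


def relative_heterosis(single_cross_yield: float, parent_yields: Sequence[float]) -> float:
    """Absolute heterosis as a percentage of single cross yield."""
    return 100.0 * absolute_heterosis(single_cross_yield, parent_yields) / single_cross_yield


# Made-up yields (t/ha) for an "older" and a "newer" single cross.
old_abs = absolute_heterosis(6.0, [1.5, 2.5])
old_rel = relative_heterosis(6.0, [1.5, 2.5])
new_abs = absolute_heterosis(10.0, [4.0, 6.0])
new_rel = relative_heterosis(10.0, [4.0, 6.0])
print(old_abs, round(old_rel, 1))
print(new_abs, round(new_rel, 1))
```

With these invented numbers, absolute heterosis rises from the older to the newer cross while relative heterosis falls, because inbred parent yields also increased. This shows how points (1) and (2) can hold at the same time.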
SUMMARY AND CONCLUSIONS
Maize grain yields have risen continually in the U.S. since the 1930s, concomitant with changes in crop management and with the utilization and improvement of hybrid maize. Approximately 40 to 50% of the yield gains are owed to changes in management (e.g., use of herbicides, increased amounts of nitrogen fertilizer) and 50 to 60% to continuing genetic improvements in maize hybrids released during the past seven decades.
GENETIC PROGRESS IN YIELD OF U.S. MAIZE 199
FIGURE 6 - Groups of 968 alleles from 98 SSR loci distributed across 10 chromosomes for the historical series of widely grown hybrids. Six groups based on patterns of change in allele frequency across nine decades. Number of alleles per group is shown in parentheses. From DUVICK et al. (2004a). Copyright © 2004 by John Wiley & Sons, Inc. This material is used by permission of John Wiley & Sons, Inc.
FIGURE 7 - Yields of single crosses (SX) and their inbred parent means (MP), and heterosis as SX – MP. Single-cross pedigrees are based on heterotic inbred combinations in the Era hybrids during the six decades, 1930s through 1980s, 12 inbreds and six single crosses per decade. Means of trials grown in three locations in 1992 and two locations in 1993 at three densities (30, 54, and 79 thousand plants/ha) with one replication per density. From DUVICK et al. (2004b). Copyright © 2004 by John Wiley & Sons, Inc. This material is used by permission of John Wiley & Sons, Inc.
• Selection in high recombination regions improves inbreds?
• Haplotype blocks in low recombination maintain heterosis?²
² Duvick 2005 Advances in Agronomy