Doctoral seminar on
Presented by: Zuge Sopan Shivaji, Ph.D. Scholar, Dept. of Genetics and Plant Breeding, CoA, Raipur.
Introduction
• Agronomic traits show polygenic inheritance: they are controlled by multiple
genes whose expression is affected by many factors. Hence phenotypic selection
becomes a tedious job.
• Family mapping (limitations: biparental population, low resolution, analysis
of only 2 alleles, time consuming).
• Population or association mapping offers (i) increased mapping resolution,
(ii) reduced research time, and (iii) greater allele number (Yu and Buckler, 2006).
• Association mapping identifies quantitative trait loci (QTLs) by examining the
marker-trait associations that can be attributed to the strength of linkage
disequilibrium between markers and phenotype across a set of diverse
germplasm.
Association mapping (AM)
• Association mapping, also known as "linkage disequilibrium
mapping", is a method of mapping quantitative trait loci (QTLs)
that takes advantage of linkage disequilibrium to link phenotypes
to genotypes.
• Offers greater precision in QTL location than family-based
linkage analysis.
• Does not require family or pedigree information, and can be applied to
a range of experimental and non-experimental populations.
How it works?
• Association studies are based on the assumption that a marker locus
is ‘sufficiently close’ to a trait locus so that some marker allele
would be ‘travelling’ along with the trait allele through many
generations during recombination.
Direct and Indirect Allelic Association
• Direct association: the allele of interest (*) is itself involved in the phenotype (D). Its disease relevance is measured directly, ignoring correlated markers nearby.
• Indirect association and LD: the genotyped marker allele (M1, M2, ..., Mn) is not itself involved in the phenotype, but it is correlated with a nearby susceptibility/etiologic variant that changes the phenotype. Trait effects on D are assessed via these correlated markers.
Linkage mapping
Alfred Sturtevant constructed the first (very small) genetic map in 1913.
A linkage map places genes/markers in order, indicates the relative genetic distances between them, and assigns them to their chromosomes.
Distance = Recombination frequency = (No. of recombinants / Total progeny) × 100
Suppose the recombination between loci A and B is 6%, that between loci B and C is 20%, and that between A and C 24%, then we can order the loci along the chromosome as…
(Hartal et al., 2010)
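The ordering logic in the example above can be sketched in a few lines of Python. This is a minimal illustration, not a mapping tool: the function names are hypothetical, and it assumes exactly three loci, where the pair with the largest recombination frequency forms the two ends of the map. Note that 6% + 20% is slightly more than the observed 24% between A and C, as expected when double crossovers between distant loci go undetected.

```python
from itertools import combinations


def recombination_frequency(recombinants, total_progeny):
    """Recombination frequency in percent: recombinants / total progeny x 100."""
    return recombinants / total_progeny * 100


def order_three_loci(rf):
    """Order three loci from their pairwise recombination frequencies (%).

    rf maps frozenset({locus1, locus2}) to a recombination frequency.
    The pair with the largest frequency is placed at the two ends and
    the remaining locus is placed between them.
    """
    loci = sorted(set().union(*rf))
    ends = max(combinations(loci, 2), key=lambda pair: rf[frozenset(pair)])
    middle = next(locus for locus in loci if locus not in ends)
    return [ends[0], middle, ends[1]]


# Values taken from the slide: A-B 6%, B-C 20%, A-C 24%
rf = {frozenset("AB"): 6.0, frozenset("BC"): 20.0, frozenset("AC"): 24.0}
order = order_three_loci(rf)
print("-".join(order))
```

For more than three loci, dedicated mapping software would normally be used instead of this pairwise rule.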
Mapping resolution
(Braulio et al., 2012)
Marker-trait associations in experimental and natural populations
• Experimental populations (e.g. F2, RIL): two parental alleles; small genetic variation; few meiotic cycles; low resolution
• Natural populations: many alleles; large genetic variation; many meiotic cycles; high resolution
Approaches to mapping genes
(Yu and Buckler, 2006)
Advantages of AM over linkage mapping
Linkage Mapping | Association Mapping
Structured population (e.g. biparental population) | Unstructured population (e.g. germplasm lines)
Low resolution (few to several centimorgans away from gene/QTL) | High resolution (much closer than those by linkage mapping)
Only few alleles can be detected | Many alleles can be detected
Moderate marker density | High/moderate marker density
Feasible in annual and biennial species, not feasible in perennial species | Feasible in annual, biennial and perennial species
Narrow range | Wide range
Time consuming | Less time required
(Yu et al., 2006)
Types of association mapping
1. Genome-wide association mapping: searches the whole genome for causal
genetic variation. A large number of markers are tested for association
with various complex traits, and no prior information on candidate genes
is required.
2. Candidate gene association mapping: dissects the genetic control
of complex traits using candidate genes chosen from the available results
of genetic, biochemical, or physiological studies in model and non-model
plant species (Mackay, 2001). It requires identification of SNPs between
lines within specific genes.
(Zhu et al., 2009)
Steps in association mapping
Mapping population and Population structure
• The germplasm may be randomly or non-randomly mated.
• Randomly mated populations represent a rather narrow group of
germplasm, which is likely to lower resolution and to harbor only a narrow
range of alleles.
• When non-randomly mated germplasm is used, population structure needs to
be controlled in the statistical analysis.
• Cluster analysis is done to assess the variation in the population, and the
most diverse individuals are selected from each cluster to represent that
cluster.
(Yu et al., 2006)
Phenotyping
• Success of AM depends on accuracy and throughput of phenotyping
• Replications across multiple years, locations and environments in
randomized plots.
• Field design: incomplete block design (lattice) or randomized block design
(RBD) (Eskridge, 2003).
Should be done on the basis of:
• Diversity: on the basis of phenotype and genotype
• Population structure: systematic difference in allele frequencies
between sub-populations…
Genotyping
• Mostly multiallelic, reproducible, PCR-based markers are used.
• Codominant markers such as microsatellites or simple sequence repeats
(SSRs) and SNPs are more informative than their dominant counterparts and,
therefore, are more powerful.
• Due to their higher density in the genome, lower mutation rate and wide
distribution throughout the genome, SNPs are rapidly becoming the
marker of choice for complex trait studies.
Linkage Disequilibrium
Linkage disequilibrium means that we don’t need to genotype the
exact causal variant, but only a variant that is correlated with it.
Linkage Disequilibrium Map & Allelic Association
Primary Aim of LD maps: To identify the relationship between marker and QTL or trait of interest.
[Figure: markers 1, 2, 3, ..., n in LD with the trait locus D]
Linkage disequilibrium (LD)
• LD refers to the non-random association of alleles at different loci.
• LD follows from the fact that closely located genes are transmitted as a block, which only
rarely breaks up in meiosis.
• Closely located genes are often in linkage disequilibrium with each other. As an example,
consider two genes A and B with two alleles each (A, a and B, b respectively).
• At equilibrium (independent segregation), the frequency of the AB haplotype equals the
product of the allele frequencies of A and B: $P_{AB} = P_A P_B$.
• Equivalently, $P_{AB} P_{ab} = P_{Ab} P_{aB}$ (1:1 ratio = no LD).
• Any deviation from these values implies LD.
        | A  | a  | Total
B       | AB | aB | B
b       | Ab | ab | b
Total   | A  | a  |
(Zhu et al., 2009)
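The deviation from equilibrium described above is commonly summarised as D, D' and r². The following is a minimal sketch in Python, assuming the four haplotype frequencies are already known; the function name and the example frequencies are hypothetical and chosen only for illustration.

```python
def ld_statistics(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the marginal sums of the haplotype table
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    p_a = 1.0 - p_A
    p_b = 1.0 - p_B

    # D is the deviation of the observed AB frequency from its expectation
    D = p_AB - p_A * p_B

    # D' scales |D| by the largest value it could take for these allele frequencies
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between alleles at the two loci
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Equilibrium: every haplotype at the product of its allele frequencies
print(ld_statistics(0.25, 0.25, 0.25, 0.25))
# Disequilibrium: AB and ab haplotypes are over-represented
print(ld_statistics(0.40, 0.10, 0.10, 0.40))
```

In the second call, D is positive because AB and ab occur more often than expected from the allele frequencies alone.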
LD Decay with time for four different recombination fractions (ϴ)
(Powell et al., 2006)
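The decay curve can be reproduced from the standard relation D_t = D_0 (1 − θ)^t, which holds under random mating. A minimal sketch in Python follows; the four θ values are assumptions chosen for illustration and are not necessarily those plotted in the figure.

```python
import math


def ld_after_generations(d0, theta, generations):
    """LD remaining after t generations of random mating: D_t = D_0 * (1 - theta)**t."""
    return d0 * (1.0 - theta) ** generations


def generations_to_halve(theta):
    """Number of generations for LD to fall to half its starting value."""
    return math.log(0.5) / math.log(1.0 - theta)


d0 = 0.25  # maximum possible D when all allele frequencies are 0.5
for theta in (0.5, 0.1, 0.01, 0.001):  # illustrative recombination fractions
    remaining = ld_after_generations(d0, theta, 10)
    print(f"theta={theta}: D after 10 generations = {remaining:.4f}, "
          f"half-life = {generations_to_halve(theta):.1f} generations")
```

Unlinked loci (θ = 0.5) lose half of their LD every generation, whereas tightly linked loci retain it for many generations, which is why closely linked markers remain useful for association mapping.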
Factors affecting LD
LD increases with population structure, relatedness (kinship), small
founder population size or genetic drift, and selection (natural or
artificial). Factors such as outcrossing, high recombination rate, high
mutation rate and gene conversion decrease or disrupt LD. Thus, LD
declines with (1) increasing genetic distance and (2) increasing number
of generations.
(Huttley et al., 2005)
Evaluation of linkage disequilibrium and genotype-phenotype
association
• TASSEL (http://www.maizegenetics.net) is used to measure the
extent of LD as squared allele frequency correlation estimates (r²,
Weir, 1996) and to measure the significance of r².
• Besides TASSEL, many other software packages, such as DnaSP and
Arlequin, are used to calculate D' and r².
Software used in AM
Sr. | Software | Focus | Description
1 | TASSEL | Association analysis | Free; LD statistics, sequence analysis, association mapping
2 | Haploview 4.2 | Haplotype analysis and LD | LD and haplotype block analysis, haplotype population frequency estimation, single SNP and haplotype association tests
3 | SVS 7 | Stratification, LD and AM | Estimates stratification, LD, haplotype blocks and multiple AM approaches for up to 1.8 million SNPs and 10,000 samples
4 | GenStat | Stratification, LD and AM | SSR markers, GLM and MLM-PCA methods
5 | JMP Genomics | Stratification, LD and structured AM | SNPs, CG and GWAS, analysis of common and rare variants
6 | GenAMap | Stratification, LD and structured AM | SNPs, tree of functional branches, multiple visualization tools
7 | PLINK | Stratification, LD and structured AM | SNPs, multiple AM approaches, IBD and IBS analyses
8 | STRUCTURE | Population structure | MCMC Bayesian analysis to estimate the proportion of an individual's genome originating from the different inferred populations
9 | SPAGeDi | Relative kinship | Genetic relationship analysis
(Braulio et al., 2012)
Advantages of AM
1. Saves time, effort, and cost needed for the development of specific
mapping populations.
2. The QTL-linked markers identified by AM can be directly used for MAS
3. AM has high resolution
4. AM would assess the entire range of diversity in the trait of interest
5. Associated markers identified during AM can be used for either selection
of parents for hybridization or for selection of desirable segregants
Disadvantages of AM
1. The results from AM are affected by several factors such as selection
history, population structure and kinship, which may lead to false-positive
associations.
2. A large number (hundreds of thousands or even millions) of markers
would be required to adequately cover the entire genome.
3. High-quality phenotypic data are required (multiple environments and
multiple locations).
4. The rate of recombination is not uniform throughout the genome.
Need of Association mapping in Rice
• Rice (Oryza sativa) is a staple food that feeds 3 billion people.
• Rice has among the largest germplasm variability and genomic database
resources of any crop species.
• Most major agronomic traits in rice (grain yield, days to maturity,
height, etc.) show quantitative inheritance.
• The challenge and opportunity is to use this information to
understand and predict how genotypic variation gives rise to the
abundance of phenotypic variation, and to apply it in MAS.
Association mapping studies in Rice
Population | Sample size | Markers used | Trait | Reference
Germplasm | 523 | 5291 SNPs | 12 agronomic traits | (Qing et al., 2015)
Diverse accessions | 203 | 154 SSRs | Harvest index | (Li et al., 2012)
Diverse rice accessions | 383 | 44,000 SNPs | Aluminum tolerance | (Famoso et al., 2011)
Diverse accessions | 413 | 44K SNPs | Agronomic traits | (Zhao et al., 2011)
Diverse accessions | 210 | 86 SSRs | Yield and grain quality | (Borba et al., 2010)
Landraces | 517 | 3,625,200 SNPs | 14 agronomic traits | (Xuehui et al., 2010)
Mini core collection | 90 | 108 SSRs | Stigma and spikelet characteristics | (Yan et al., 2009)
Diverse accessions | 103 | 123 SSRs | Yield and its components | (Agrama et al., 2007)
Case Study
• 517 landraces were phenotyped and genotyped by sequencing using the
Illumina Genome Analyzer II.
• Sequence reads were aligned to the rice reference genome for SNP
identification.
• Discrepancies with the rice reference genome were called as candidate
SNPs.
A total of 3,625,200 SNPs were identified, resulting in an average of 9.32
SNPs per kb, with 87.9% of the SNPs located within 0.2 kb of the nearest SNP
A total of 167,514 SNPs were found in the coding regions of 25,409 genes.
3,625 large-effect SNPs (representing mutations predicted to cause large
effects) were identified.
Principal-component analysis
separated the rice germplasm into two
groups, i.e. indica and japonica.
Further, both indica and japonica had three subgroups.
Because of strong population differentiation between the two
subspecies of cultivated rice, GWAS was conducted only for the 373
indica lines, using a mixed linear model (MLM).
80 associations for the 14 agronomic traits were identified.
Heading date strongly correlated with both population structure and
geographic distribution.
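Why population structure must be accounted for can be shown with a small simulation. The sketch below is not the mixed linear model used in the study: it uses ordinary least squares with the subpopulation label as a fixed covariate, on hypothetical data in which the trait depends on subpopulation only and the marker has no effect. All names and parameter values are assumptions made for illustration.

```python
import numpy as np


def marker_t_statistic(y, marker, covariates=None):
    """t-statistic of the marker effect in an ordinary least-squares fit."""
    n = len(y)
    columns = [np.ones(n), marker]
    if covariates is not None:
        columns.append(covariates)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = residuals @ residuals / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov_beta[1, 1])


rng = np.random.default_rng(0)
n_per_group = 200
subpop = np.repeat([0.0, 1.0], n_per_group)

# The marker allele is rare in one subpopulation and common in the other
allele_freq = np.where(subpop == 0, 0.1, 0.9)
marker = rng.binomial(2, allele_freq).astype(float)

# The trait differs between subpopulations but is NOT affected by the marker
trait = 5.0 * subpop + rng.normal(0.0, 1.0, size=2 * n_per_group)

t_naive = marker_t_statistic(trait, marker)
t_adjusted = marker_t_statistic(trait, marker, covariates=subpop)
print(f"t without structure covariate: {t_naive:.1f}")
print(f"t with structure covariate:    {t_adjusted:.1f}")
```

Without the covariate the marker appears strongly associated with the trait, which is a false positive driven purely by structure. In practice, principal components or a kinship matrix typically play the role of the subpopulation label.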
Conclusion
• The ultimate aim of plant breeding is prediction of phenotype from
genotype.
• Major agricultural economic traits are of a complex nature.
• There is a pressing need to dissect these complex traits and assign
function to the loci involved.
• Advanced genomic tools like association mapping are a valuable
option that can be used effectively and efficiently to accelerate crop
improvement.
• Association mapping is a long-term commitment, so the required
resources should be in place before starting.