gbs usage cases: examples from maize - cornell...
Post on 02-Jan-2021
11 Views
Preview:
TRANSCRIPT
GBS Usage Cases:Examples from Maize
Jeff Glaubitz (jcg233@cornell.edu)Senior Research Associate, Buckler Lab, Cornell University
Panzea Project Manager
Cornell IGD/CBSU WorkshopJanuary 28‐29, 2014
• Marker discovery• Phylogeny/Kinship• Linkage mapping of QTL in a biparental cross• Fine‐mapping QTL• Genomic selection • Genome Wide Association Studies (GWAS)• NAM‐GWAS• Improving reference genome assembly
Some potential applications of GBS Data
< 450 bp
ApeKI site (GCWGC)( ) 64‐base sequence tag
Loss of cut site
Sample1
Sample2
Marker Discovery
• GBS markers can be converted to SNPs or PCR assays of indels• Develop SNP assays from polymorphic tags at same location• Develop PCR primers from adjacent tags & hope for large indels
Phylogeny/Kinship• Missing data not an issue for estimating pairwise genetic distance or kinship Each pair of individuals has large, “random” sample of markers in common
• Works really well even in non‐model organisms Fei Lu’s previous talk on switchgrass
• Principle Coordinates Analysis better than Principle Components Analysis Uses distance matrix rather than every genotype Missing data not an issue for Prin. Coord. Analysis
• SNPs can be strongly affected by ascertainment bias Panel used to discover the SNPs can severely distort estimates of population genetic parameters (e.g., kinship, diversity) Industry SNPs on the Maize 55K SNP chip an extreme example
Less ascertainment bias than array‐based SNPs?
Ram Sharma
B73
Mo17
(ss)
(nss)
Less ascertainment bias than array‐based SNPs?
Ram Sharma
B73
Mo17
(ss)
(nss)
Less ascertainment bias than array‐based SNPs?
Ram Sharma
Mo17
B73
(ss)
(nss)
Less ascertainment bias than array‐based SNPs?
Ram Sharma
B73
Mo17
(ss)
(nss)
Linkage mapping of QTL in a biparental cross• In maize, we use the reference genome to order markers
• With ApeKI, too many markers for traditional software (MapMaker, JoinMap, R‐QTL etc.)
• Filter for a smaller set of markers with high coverage• Use 6 base cutter (or combination of enzymes) for fewer markers with higher coverage
• JoinMap can handle at least 3,000 markers• Newer software? MSTMap claims 10,000 – 100,000 markers Guided Evolutionary Strategy (Mester et al. 2004, Comp Biol
Chem 28: 281) – currently being implemented in TASSEL Others?
HMM for calling hets/correcting errors in biparental RIL populations
Locus 1 Locus 2 Locus 3 Locus 4
heterozygous
homozygous parent 2
→→→
homozygous parent 1
distance between nodes = log[ P(obsi|gi)P(gi|gi‐1)]
)()|()(
)()|()|( GPGSPSP
GPGSPSGP
Peter Bradbury
HMM for calling hets/correcting errors in biparental RIL populations
Low % het RILactual calls
after HMM
B73het
non-B73
B73het
non-B73
Med % het RIL
B73het
non-B73
B73het
non-B73
High % het RIL
B73het
non-B73
B73het
non-B73
Physical Position on Chr1 (bp)
Peter Bradbury
GBS analysis of Teosinte/W22 BC2S3
• 868 RILs in total• Framework map with 492 SNPs• 51,238 bi‐allelic GBS markers (ApeKI)
TeoW22 BC2S3 – Chr9 – 4122 bi‐allelic GBS markers
W22 Teo Het
Reproducibility – 60 RILs – Chr9
W22 Teo Het
GBS analysis of Teosinte/W22 BC2S3
• 868 RILs in total• Framework map with 492 SNPs• 51,238 bi‐allelic GBS markers (ApeKI)
(96-plex)
Maize Sh1 orthologs are located at seed shattering QTLs
201244:720‐725
Fine mapping QTL• Need to saturate interval containing QTL with markers
• GBS a good source of markers
• Also need to collect recombinants in the interval
• Near‐isogenic lines (NILs) helpful (Mendelize)
• Good reference genome
Marker1 Marker2
Map position (cM)
Trait Y
Trait Z
Fine Mapping of Domestication QTL in Maize
W22 Teo Het
7,756 GBS SNPs along Chr?
136 RC-NILS
Genomic Selection & GWAS• Genomic Selection: very basic imputation approaches adequate? e.g., code non‐missing genotypes as 0, 0.5, 1 & missing data as the mean genotypic value
Accurate genomic prediction of height based on GBS data
R2= 0.82
USDA Ames InbredsRidge‐Regression BLUP
2800 diverse maize breeding lines
Jason Peiffer
• Missing data are more problematic for GWAS erroneous imputation might cause spurious results avoid false imputation of biologically missing regions area of active research
Genomic Selection & GWAS• Genomic Selection: very basic imputation approaches adequate? e.g., code non‐missing genotypes as 0, 0.5, 1 & missing data as the mean genotypic value
GWAS directly hits known Mendelian traits
The best hit for kernel color lies within Y1
GWAS of white vs. yellow kernels in 1,595 USDA Ames inbreds
Y1
Cinta Romay
Romay et al. (2013) Genome Biology 14: R55
GWAS of a more complex trait directly hits known flowering time genes
Even with ~660k SNPs we almost missed ZmCCT (only 1 significant SNP)
Cinta Romay
Romay et al. (2013) Genome Biology 14: R55
GWAS of growing degree days to silking in 2,279 inbreds
GWAS of a more complex trait directly hits known flowering time genes
Zhiwu Zhang
Alex Lipka
GWAS of growing degree days to silking in 2,279 inbreds
Lipka et al. (2013) Bioinformatics 28: 2397‐2399
• Missing data are more problematic for GWAS erroneous imputation might cause spurious results avoid false imputation of biologically missing regions area of active research
• In NAM‐GWAS, imputation is much less of an issue NAM = “Nested Association Mapping” population
Genomic Selection & GWAS• Genomic Selection: very basic imputation approaches adequate? e.g., code non‐missing genotypes as 0, 0.5, 1 & missing data as the mean genotypic value
NAM
1
2
200
B73×
F1s
SSD
25 DL
B97
CM
L103
CM
L228
CM
L247
CM
L277
CM
L322
CM
L333
CM
L52
CM
L69
Hp3
01
Il14H Ki11 Ki3
Ky2
1
M16
2W
M37
W
Mo1
8W
MS
71
NC
350
NC
358
Oh4
3
Oh7
B
P39
Tx30
3
Tzi8
The maize NAM population was built for NAM-GWAS
liguleless1 and liguleless2 explain the two “biggest” leaf angle QTL
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
BP
P
0
10
20
30
40
50
60
−log
(p)
Associations with positive effectAssociations with negative effectLinkage QTL peak
lg1 lg3 lg2 lg4
Upper leaf angle
1020
1 2 3 4 5 6 7 8 9 10
cM/M
b
Chromosomes
1 2 3 4 5 6 7 8 9 10Chromosomes
Tian, Bradbury, et al 2011 Nature Genetics
ligule
Photo: Jason Wallace
We are using GBS to pinpoint the location of cross overs in the NAM RILs
• B73 is the reference genome: complete knowledge• Remaining NAM parents whole genome sequenced via Illumina at 4x coverage (paired end random sheared) 26 million high quality SNPs
• Precise knowledge of crossover locations in NAM RILs allows us to more accurately project sequences of parents onto RILs:
B73MS71
Z019E001
Z019E001
1100 SNPs:
GBS SNPs:
GBS data improves the resolution of joint linkage analysis in the NAM population
Preliminary NAM‐GWAS results also show improved resolution
Peter Bradbury
Markers Median QTL support interval
1,106 array SNPs 6.2 cM
171,479 GBS SNPs 2.6 cM
Trait: Days to silk (flowering time)
Recombination Rates for NAM from GBS Data
Peter Bradbury – USDA Scientist, Buckler lab, Cornell (unpublished)
The maize B73 reference genome: room for improvement?
1) The B73 reference genome accurate for B73 but less so for other maize lines (e.g., Mo17)
2) Even for B73, some regions of the genome are in the wrong place
3) Some large (multiple BAC) contigs could not be anchored assigned to “chromosome 0” 30 chr0 contigs in B73 RefGenV1 17 chr0 contigs in B73 RefGenV2
4) Some regions of the genome are missing ≈5% of B73 sequence is not in the B73 reference genome
The maize B73 reference genome: room for improvement?
1) The B73 reference genome accurate for B73 but less so for other maize lines (e.g., Mo17)
2) Even for B73, some regions of the genome are in the wrong place
3) Some large (multiple BAC) contigs could not be anchored assigned to “chromosome 0” 30 chr0 contigs in B73 RefGenV1 17 chr0 contigs in B73 RefGenV2
4) Some regions of the genome are missing ≈5% of B73 sequence is not in the B73 reference genome
< 450 bp
ApeKI site (GCWGC)( ) 64‐base sequence tag
Loss of cut site
B73
Mo17
Most tags can be mapped as individual alleles• In a biparental cross such as maize IBM (B73 x Mo17)
• Provided that they are polymorphic between the parents
Genetically mapping individual GBS alleles in IBM
Min #Successes p-value
max Recomb.
Total # GBS tags mapped
# B73 tagsmapped
# Mo17 tagsmapped
10 <10-3 <5% 485,860 266,192 219,668
20 <10-6 <5% 235,531 123,094 112,437
30 <10-7 <5% 140,713 73,829 66,884
Genetically mapping individual GBS alleles in IBM
Min #Successes p-value
max Recomb.
Total # GBS tags mapped
# B73 tagsmapped
# Mo17 tagsmapped
10 <10-3 <5% 485,860 266,192 219,668
20 <10-6 <5% 235,531 123,094 112,437
30 <10-7 <5% 140,713 73,829 66,884
B73 reference genome highly accurate for B73…
• 0.4% of B73 tags genetically map to different chromosome than they align to
B73 reference genome highly accurate for B73…
…but far less so for other maize lines
• 9.3% of Mo17 tags genetically map to different chromosome than they align to
• 0.4% of B73 tags genetically map to different chromosome than they align to
Only 50% of the maize genome is shared between two varieties
Fu & Dooner 2002, Morgante et al. 2005, Brunner et al 2005Numerous PAVs and CNVs - Springer, Lai, Schnable in 2010
50%
Plant 1
Plant 2 Plant 3
99%
Person 1
Person 2 Person 3
Maize Humans
Some chunks of the B73 reference genome are in the wrong place
Physical Chr
Start(Mb)
End(Mb)
GeneticChr
Approx. GeneticLocation (Mb) # Tags
10 139.3 139.8 2 16.5–16.8 49
9 102.5 106.9 9 15–32 49
7 150.1 161.8 5 192–214 13
10 0.2 0.4 4 83–151 12
8 48.4 50 2 61–127 12
10 0.07 0.2 7 47–100 9
2 231.2 231.2 7 18–26 8
3 228.1 230.5 5 194–212 6
• Marker discovery• Phylogeny/Kinship• Linkage mapping of QTL in a biparental cross• Fine‐mapping QTL• Genomic selection • Genome Wide Association Studies (GWAS)• NAM‐GWAS• Improving reference genome assembly
Some potential applications of GBS Data
top related