sharon e. mitchell - cornell universitycbsu.tc.cornell.edu/lab/doc/gbs_method_overview2.pdf ·...
TRANSCRIPT
![Page 1: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/1.jpg)
Genotyping By Sequencing (GBS) Method Overview
Sharon E. MitchellInstitute for Genomic Diversity
Cornell University
http://www.maizegenetics.net/
![Page 2: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/2.jpg)
•Background/Goals
•GBS lab protocol
•Illumina sequencing review
•GBS adapter system
•How GBS differs from RAD
•Modifying GBS for different species
•GBS Workflow
Topics Presented
![Page 3: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/3.jpg)
Background
Genotyping by sequencing (GBS) in any large genome species requires reduction of genome complexity.
II. Restriction Enzymes (REs)I. Target enrichment
•Long range PCR of specific genes or genomic subsets
•Molecular inversion probes
•Sequence capture approaches hybridization‐based (microarrays)
*Technically less challenging*
• Methylation sensitive REs filter out repetitive genomic fraction
![Page 4: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/4.jpg)
QTL are often located in non‐coding regionsVgt1, Tb, B regulatory regions 60‐150kb from gene
Exon Exon Exon Exon
Exon captureMap large numbers ofgenome‐wide markers
QTL
![Page 5: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/5.jpg)
Create inexpensive, robust multiplex
sequencing protocol
Low‐coverageSequencing Illumina GA
Informatics Pipelin
Anchor markers across the genomeImpute missing data
Combine genotypic & phenotypic data for GS and GWAS
Our goal is to create a public genotyping/informatics platform based on next‐generation sequencing
![Page 6: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/6.jpg)
Open Source
• Method available for anyone to use / modify.• Analysis pipeline details and code are public.• Promote dataset compatibility.• Method published in PLoS ONE to promote accessibility.
• Genotype calls publically available.
![Page 7: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/7.jpg)
< 450 bp
Restriction site( ) sequence tag
Loss of cut site
Sample1
Overview of Genotyping by Sequencing (GBS)
• Focuses NextGen sequencing power to ends of restriction fragments• Scores both SNPs and presence/absence markers
Sample2
![Page 8: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/8.jpg)
• Reduced sample handling• Few PCR & purification steps• No DNA size fractionation• Efficient barcoding system• Simultaneous marker discovery & genotyping
• Scales very well
GBS is a simple, highly multiplexed system for constructing libraries for next‐gen sequencing
![Page 9: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/9.jpg)
GBS 96‐plex Protocol(http://www.maizegenetics.net/gbs‐overview)
1. Plate DNA & adapter pair
Now at 384‐plex
![Page 10: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/10.jpg)
GBS Adapters and Enzymes
BarcodeAdapter
“Sticky Ends”
Barcode(4‐8 bp)
CommonAdapter
P1 P2
ApeKI G CWGCPstI CTGCA GEcoT22I ATGCA T
5’ 3’
Restriction Enzymes
Illumina Sequencing Primer 2
Illumina Sequencing Primer 1
![Page 11: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/11.jpg)
GBS 96‐plex Protocol(http://www.maizegenetics.net/gbs‐overview)
............. ...
..... ..................... ........... ..
..
. .. ...
.
...
..... ......... .. .........
.....
...
. .. ....... ......
...
.. .......
...
. .
..
.... . .. . ....
1. Plate DNA & adapter pair
5. PCR
Primers
2. Digest DNA with RE3. Ligate adapters
Clean‐up4. Pool 96 samples
![Page 12: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/12.jpg)
. .
..
..
.
....
.. ...
...........
.
.
.. ........
.
.
. ... ...
..
.
.
..
.
.
.
.....
.
.
...
... ..
..
.
.
.
.
...
.. ...
.. ...
......
.
..
.....
. ......
....
... ..
...
.. ..
. .
..
..
.
........
.
. .
P1 P2 P1 P2
.
O1 O2
PCR
Insert
Pooled Digestion/LigationReactions (n=94)
GBS“Library”
PRC primers:
Insert
![Page 13: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/13.jpg)
GBS 96‐plex Protocol(http://www.maizegenetics.net/gbs‐overview)
............. ...
..... ..................... ........... ..
..
. .. ...
.
...
..... ......... .. .........
.....
...
. .. ....... ......
...
.. .......
...
. .
..
.... . .. . ....
1. Plate DNA & adapter pair
5. PCR
Primers
2. Digest DNA with RE3. Ligate adapters4. Pool 96 DNA aliquots
6. Evaluate fragment sizes
Clean‐upClean‐up
![Page 14: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/14.jpg)
Perform Titration to Minimize Adapter Dimers Before Sequencing
Size Standards
LibraryAdapterDimer
1500 bp15 bp
NOTE: Done once with a small number of samples.Adapter dimers constitute only 0.05% of raw sequence reads
Fluo
rescen
seintensity
Time Time
Optimal adapter amount
![Page 15: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/15.jpg)
Small Fragments are Enriched in GBS Libraries Total Fragm
ents
0%
5%
10%
15%
20%
25%
30%
35%
40%
0 50 100
150
200
250
300
350
400
450
500
550
600
650
700
750
800
850
900
950
1000
1000
0000
REFGENOME
IBM
ApeKI fragment size (bp) >10000
B73 RefGen v1IBM (B73 X Mo17) RILs
![Page 16: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/16.jpg)
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
384‐plex GBS Results for Maize
Reads
Mean read count per line = 528,000c.v. = 0.22
![Page 17: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/17.jpg)
Flowcell8 channels
Solid Phase Oligos
Illumina Sequencing by Synthesis Review
![Page 18: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/18.jpg)
Flow cell with bound oligos
Denatured “Library”
Cluster Formation Amplifies Sequencing Signal
Bridge Amplification
Linearization
![Page 19: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/19.jpg)
ACGTGGC
ACGT
GC
T
G
TG
P1 primer
C
ACGT
GC
TG
GA
ACGT
GC
TGC
G
T TG TGC TGCA
Sequencing by Synthesis
FlowcellHiSeq 2000
![Page 20: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/20.jpg)
![Page 21: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/21.jpg)
GACGT
GC
T
G
ACGTGGC
T
C
ACGT
GC
TG
G
T N
ACGTGGC
T ACGTGGC
T ACGTGGC
T
“In Phase” “Out of Phase”
![Page 22: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/22.jpg)
Read 1
GBS captures barcode and insert DNA sequence in single read
Insert DNA
Barcode
Sequencing Primer 1
Sequencing Primer 2
Illumina
Read 1
Read 2
Sequencing Primer 1
BarcodeRE cut‐site
Insert DNA
GBS
![Page 23: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/23.jpg)
Variable Length GBS Barcodes Solves Sequence Phasing Issues
•First 12 nt used to calculate phasing.•Algorithm assumes random nt distribution.•Incorrect phasing causes incorrect base calls.
Barcode
Illumina
Insert DNA
•Good design and modulating the RE cut site position with variable length barcodes produces even nt distribution.
BarcodeRE cut‐site
GBS
Insert DNA
Read 2
Read 1
![Page 24: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/24.jpg)
Invariant GBS barcodes cause loss of signal intensity.
![Page 25: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/25.jpg)
Successful GBS sequencing run.
![Page 26: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/26.jpg)
GBS Adapter Design
![Page 27: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/27.jpg)
Barcode Design Considerations• Barcode sets are enzyme specific
– Must not recreate the enzyme recognition site– Must have complementary overhangs
• Sets must be of variable length• Bases must be well balanced at each position• Must different enough from each other to avoid confusion if
there is a sequencing error.– At least 3 bp differences among barcodes.
• Must not nest within other barcodes• No mononucleotide runs of 3 or more bases
http://www.deenabio.com
![Page 28: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/28.jpg)
Most significant GBS technical issue?
DNA quality
![Page 29: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/29.jpg)
Different Ends 50%
Same Ends 50%
100%
Ligation
LigationPCR
GBSStandard
GBS does not use standard “Y” adapters
![Page 30: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/30.jpg)
Denatured “library”
Flow Cell with Bound Oligos
Same‐ended Fragments Do Not Form Clusters
Bridge Amplification
Cluster Formation &“Linearization”
No P1 bindingsite
Cleaved from surface
![Page 31: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/31.jpg)
< 450 bp
Restriction site( ) sequence tag
Loss of cut site
Sample1
GBS vs. RAD
• Focuses NextGen sequencing power to ends of restriction fragments• Scores both SNPs and presence/absence markers
Sample2
![Page 32: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/32.jpg)
Digest
Ligate adapters
Pool
Random shear
Size select
Ligate Y adapters
PCR
RADGBS
ReferenceDavey et al. 2011
![Page 33: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/33.jpg)
Modifying GBS
Considerations for using GBS with new species and / or different enzymes.
![Page 34: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/34.jpg)
Why Modify the GBS Protocol?
• More markers• Fewer markers (deeper sequence coverage per locus)
• Increase multiplexing• More genome appropriate (avoid more repetitive DNA classes)
• Other novel applications (i.e., bisulfitesequencing).
![Page 35: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/35.jpg)
Genome Sampling Strategies Vary by Species
Dependent on Factors that Affect Diversity:
•Mating System(Outcrosser, inbreeder, clonal?)
•Ploidy(Haploid, diploid, auto‐ or allopolyploid?)
•Geographical Distribution(Island population, cosmopolitan?)
![Page 36: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/36.jpg)
Other Factors
• Genome size– The size of the genome has some bearing on the size of the fragment pool.
– Amount of repetitive DNA directly correlated with genome size.
• Genome composition– The base composition of the genome can affect the frequency and distribution of the cut sites.
![Page 37: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/37.jpg)
Sampling large genomes with methylation‐sensitive restriction enzymes.
5‐base cutter
6‐base cutter
MethylatedDNA
UnmethylatedDNA
GBS LibrarySequenced Fragments
![Page 38: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/38.jpg)
Scrub jay
Vole Giantsquid
Deer Mouse
TunicateYeast
Solitary Bee
Grape Maize Cacao
Rice
Raspberry
Barley
Cassava
Goose
Sorghum
Cassava
Shrub willow
Optimizing GBS in New Species
![Page 39: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/39.jpg)
Choosing Appropriate Restriction Enzymes
“Life is like a bowl of chocolates”.
![Page 40: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/40.jpg)
Deer Mouse
ApeKI
PstI
![Page 41: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/41.jpg)
Goose
PstI
Giant Squid
ApeKI
![Page 42: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/42.jpg)
B
Bovine GBS LibrariesPstI ApeKI
EcoT22I
X
![Page 43: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/43.jpg)
PstI ApeKI
Total unique tags 442,813 1,138,366
Total SNPs 63,697 13,328
Tags with SNPs 14% 1%
Results for PstI and ApeKI libraries
![Page 44: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/44.jpg)
single copy92%
single copy93%
PstI Library ApeKI Library
2‐5 copies >5 copies
Proportion of sequence tags by copy number
![Page 45: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/45.jpg)
0%
10%
20%
30%
40%
50%
60%
70%
80%
1 2‐10 11‐100 101‐1000 1001‐10000 >10,0000
Proportion of sequences by copy number
ApekIPstI
![Page 46: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/46.jpg)
0
20000000
40000000
60000000
80000000
100000000
120000000
140000000
160000000
180000000
0 1000 2000 3000
0
10000000
20000000
30000000
40000000
50000000
60000000
70000000
80000000
0 1000 2000 3000 4000 5000 6000 7000 8000
Bos taurusPstI SNPsChromosome 1
Sorghum bicolor ApeKI SNPsChromosome 1
Position
SNPs
![Page 47: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/47.jpg)
GBS SNP calls in Sorghum bicolorLots of Missing Data
091211.c10.hmp alleles chrom SC283 BR007B RIL_100 RIL_101 RIL_102 RIL_103 RIL_104 RIL_105 RIL_106 RIL_107 RIL_108 RIL_109 RIL_10 RIL_110 RIL_111 RIL_112 RIL_113 RIL_114 RIL_115
S10_29954 C/A 10 A N A C A N N C C N N N N N N N N C N
S10_29963 A/C 10 C N C A C N N A A N N N N N N N N A N
S10_29965 T/C 10 C N C T C N N T T N N N N N N N N T N
S10_84178 C/G 10 N N C N G N N N N N N N N N C N N N N
S10_84512 T/C 10 T T N N T N T N N N N C N N N N N N N
S10_84513 C/T 10 C C N N C N C N N N N T N N N N N N N
S10_85556 G/A 10 G A G A G N G G N N G G G R N A N R G
S10_85556 G/A 10 G N N A G A G N N G G G N N A N G G G
S10_115300 T/C 10 T C T C T C T N N T T N N N C N N T T
S10_133268 T/C 10 T C N N N N N T T N N N N N N N N N N
S10_158101 G/A 10 G N N A G N N N N N N N G A A N G G N
S10_158101 G/A 10 G A N A G A N N N G N G N N N N G G N
S10_163724 G/A 10 G A N A N N N G N N G N N N N N G N G
S10_163929 T/G 10 G T N T G N G N N G G N G T T T N G N
S10_163931 C/T 10 T C N C T N T N N T T N T C C C N T N
S10_180445 G/A 10 G A G N G A G G G N G N G A A N G G G
S10_209049 C/G 10 N G C S C G C C C N C C N C G G N C N
S10_209231 G/A 10 N A G A G A G G N N G G G N A A G G N
S10_214941 C/T 10 C N N T N N N N C N C N N N N N N C C
S10_234260 T/G 10 T N T N T G T N T T T T T N N G T N T
S10_234260 T/G 10 T G T G T G N T T T T T T N G G T T T
S10_238937 G/A 10 N A N A N N G G G N G N N N A N N N N
S10_252729 G/T 10 G T G T G N N G G G N N N N N T G G G
S10_252730 C/T 10 C T C T C N N C C C N N N N N T C C C
S10_261151 T/C 10 T C T C T C N N T T N N T N N N T N N
S10_268392 C/A 10 N N N N A N A N N N N N N N N N N N N
S10_268393 G/A 10 N N N N A N A N N N N N N N N N N N N
S10_274414 C/T 10 N N C N N N N C T C N N C T N N C N C
S10_281272 T/C 10 T N T N N N N N N N N N N N N N N T T
S10_287523 G/T 10 G N G N N N N G G G N N N N N N N N N
S10_292020 A/T 10 A T N N A N N A A A N A A N T T A A A
S10_292020 A/T 10 A T A T N N N A N A N A N N T T A N A
S10_294189 C/T 10 N N C N C N C C C N N N C T N N C N C
S10_302638 C/T 10 C T C T C N N C N C C N C T T N C C N
S10_312560 A/C 10 A C A N A N A A A A A A A C C N A A A
S10_347848 T/C 10 T C T C T C N T T T T N T N C N T T T
![Page 48: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/48.jpg)
Missing Data Strategies
• Impute• Technical Options
– Reduce the multiplexing level– Sequence the same library multiple times
• Molecular Options– Choose less frequently cutting enzymes
![Page 49: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/49.jpg)
DNA sample info entered to webform
HTS databaseProject approved?Samples shipped
GBS libraries made
Reference genome
Non‐reference genome
Data FileSNPs/Sample
Lab Analysis Pipelines
DNA sequence data Data File
States/Sample
GBS Workflow
![Page 50: Sharon E. Mitchell - Cornell Universitycbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview2.pdf · 2014-10-31 · GBS SNP calls in Sorghum bicolor Lots of Missing Data 091211.c10.hmp](https://reader030.vdocument.in/reader030/viewer/2022041120/5f33c261f7ce1a34c91dc9f1/html5/thumbnails/50.jpg)
BioinformaticsJeff GlaubitzQi SunJames Harriman
Method DevelopmentRob ElshireEd Buckler
Sharon Mitchell
GBS Team
Laboratory/ProductionCharlotte Acharya
Wenyan ZhuLisa Blanchard