Tomato Genome SL2.50 and Beyond…
Surya Saha, Jeremy Edwards and Lukas Mueller
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
@SahaSurya
Slides: http://bit.ly/PAGbld230
https://fanart.tv/movie/196/back-to-the-future-part-iii/
Gene to Genome – The BIG picture
[Diagram labels: CHROMOSOMES, SCAFFOLDS, CONTIGS, GENES, SCAFFOLD GAPS, CHROMOSOME GAPS]
GENES: TM2 (Chr 9), L2 (Chr 10)
SGN Workshop, PAG 2015
Genome Assembly @NCBI
Contigs
• Components
Tiling Path file (TPF)
• Accession numbers
• Can have nested components
Accession
Golden Path files (AGP)
• Scaffold IDs
• Orientation
• Chromosome from contig AGP
• Chromosome from scaffold AGP
• Scaffold from contig AGP
NCBI
Jeremy Edwards
https://github.com/solgenomics/Bio-GenomeUpdate
FISH
• Order
• Orientation
• Gap sizes
Tiling Path file (TPF)
Accession
Golden Path files (AGP)
NCBI
Gap extension
Scaffold flip
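The two layout changes named above, scaffold flip and gap extension, can be illustrated with a small Python sketch. It is not the Bio-GenomeUpdate implementation (that toolkit is written in Perl); the `Part` record, `flip_scaffold`, and `extend_gap` are hypothetical names, and the layout is a simplified stand-in for a TPF or AGP rather than either real file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Part:
    """One entry in an ordered scaffold layout (hypothetical, simplified)."""
    name: str         # contig identifier, or "GAP" for a gap entry
    length: int       # length in bp
    orientation: str  # "+" or "-" for contigs, "" for gaps


def flip_scaffold(parts: List[Part]) -> List[Part]:
    """Flip a layout: reverse the order of entries and toggle each strand."""
    toggle = {"+": "-", "-": "+"}
    return [
        Part(p.name, p.length, toggle.get(p.orientation, p.orientation))
        for p in reversed(parts)
    ]


def extend_gap(parts: List[Part], index: int, new_length: int) -> List[Part]:
    """Return a copy of the layout with the gap at `index` resized."""
    if parts[index].name != "GAP":
        raise ValueError("entry at index is not a gap")
    updated = list(parts)
    updated[index] = Part("GAP", new_length, "")
    return updated
```

Flipping twice returns the original layout, which is a convenient sanity check after editing a tiling path by hand.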
Jeremy Edwards
https://github.com/solgenomics/Bio-GenomeUpdate
SL2.40 Annotation
• SL2.40 AGP
• SL2.50 AGP
• SL2.40 GFF3
SL2.50 Annotation
• SL2.50 GFF3
• Validated via Fasta
Errors corrected
• Start/end coordinates in different scaffolds
• Start > end coordinates for UTRs
• Start or end coordinates in gap region
• Dropped Solyc03g053140.1 and Solyc12g032910.1
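The annotation update above (old AGP + new AGP + old GFF3 giving a new GFF3) can be sketched as a two-step coordinate lift: chromosome position to scaffold position through the old AGP, then scaffold position to chromosome position through the new AGP. The Python below is a minimal sketch under stated assumptions, not the pipeline used for SL2.50: it assumes tab-separated chromosome-from-scaffold AGP rows in the NCBI AGP column order, and every function name is hypothetical. Features that start or end in a gap, or that span two scaffolds, return `None`, mirroring the error classes listed above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    obj: str        # chromosome
    obj_beg: int
    obj_end: int
    comp: str       # scaffold
    comp_beg: int
    comp_end: int
    orient: str


def parse_agp(text: str) -> List[Row]:
    """Keep sequence rows only; gap rows (component type N or U) are skipped."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):
            continue
        rows.append(Row(f[0], int(f[1]), int(f[2]), f[5], int(f[6]), int(f[7]), f[8]))
    return rows


def to_scaffold(rows: List[Row], chrom: str, pos: int) -> Optional[Tuple[str, int, str]]:
    """Chromosome position -> (scaffold, position, orientation); None in a gap."""
    for r in rows:
        if r.obj == chrom and r.obj_beg <= pos <= r.obj_end:
            off = pos - r.obj_beg
            cpos = r.comp_end - off if r.orient == "-" else r.comp_beg + off
            return r.comp, cpos, r.orient
    return None


def to_chromosome(rows: List[Row], scaffold: str, cpos: int) -> Optional[Tuple[str, int, str]]:
    """Scaffold position -> (chromosome, position, orientation)."""
    for r in rows:
        if r.comp == scaffold and r.comp_beg <= cpos <= r.comp_end:
            off = r.comp_end - cpos if r.orient == "-" else cpos - r.comp_beg
            return r.obj, r.obj_beg + off, r.orient
    return None


def lift_feature(old_rows, new_rows, chrom, start, end, strand):
    """Lift one feature; returns (chrom, start, end, strand) or None."""
    a = to_scaffold(old_rows, chrom, start)
    b = to_scaffold(old_rows, chrom, end)
    if a is None or b is None:      # start or end falls in a gap
        return None
    if a[0] != b[0]:                # start and end in different scaffolds
        return None
    na = to_chromosome(new_rows, a[0], a[1])
    nb = to_chromosome(new_rows, b[0], b[1])
    if na is None or nb is None or na[0] != nb[0]:
        return None
    if (a[2] == "-") != (na[2] == "-"):   # scaffold was flipped between builds
        strand = {"+": "-", "-": "+"}.get(strand, strand)
    lo, hi = sorted((na[1], nb[1]))       # keep start <= end after a flip
    return na[0], lo, hi, strand
```

Sorting the lifted coordinates keeps start no greater than end when a scaffold has been flipped, and checking the lifted sequence against the new FASTA (as the slide notes) would catch any remaining off-by-one errors.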
SL2.50 Genome Release
Genome build 2.5 Fasta + ITAG 2.4 GFFs
CHADO
FTP site
Website
JBrowse
Blast DBs
State of the SL2.50 Build
[Figure: chart by chromosome (0 to 12); y-axis from 0 to 120,000,000]
[Figure: chart by chromosome (0 to 12); y-axis from 0% to 100%; series: Sequence, Scaffold gap length, Component gap length]
Length 823Mb
Sequence 737Mb
Component gaps 43Mb (5.30%)
Scaffold gaps 42Mb (5.17%)
Total gaps 86Mb (10.47%)
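Totals like these can be tallied from an AGP file. The Python below is a rough sketch, assuming tab-separated rows in the NCBI AGP column order, where gap rows carry component type N or U with the gap length in column 6 and the gap type in column 7. `summarize_agp` and `pct` are hypothetical names, and how the slide's "component" and "scaffold" gap categories map onto AGP files or gap types is not stated in the source, so the sketch simply groups gaps by the gap-type column.

```python
from collections import Counter


def summarize_agp(text: str) -> Counter:
    """Sum sequence bases and gap bases (grouped by gap type) from AGP text."""
    totals = Counter()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):
            # Gap row: column 6 is the gap length, column 7 the gap type.
            totals["gap:" + f[6]] += int(f[5])
        else:
            # Sequence row: span on the object, inclusive coordinates.
            totals["sequence"] += int(f[2]) - int(f[1]) + 1
    return totals


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` taken up by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)
```

Dividing each gap total by the sum of all totals gives percentages of the same kind as those on the slide.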
BAC Resources
Bruce Roe
HTGS Phase 1: 332
HTGS Phase 2: 520
HTGS Phase 3: 2751
http://www.ncbi.nlm.nih.gov/genbank/htgs/faq
HTGS Phase 3 BACs
Chr 0 53
Chr 1 589
Chr 2 248
Chr 3 137
Chr 4 147
Chr 5 117
Chr 6 104
Chr 7 111
Chr 8 249
Chr 9 119
Chr 10 620
Chr 11 100
Chr 12 86
Unknown 84
Jeremy Edwards
https://github.com/solgenomics/Bio-GenomeUpdate
BAC assemblies
• Phrap
• ACE files
BAC sets
• Assembled BACs
• Singleton BACs
Align to SL2.50
• Nucmer
• 100bp word size
• 500bp minimum alignment
• 99% identity
Novel sequences
• Extensions
• Gap coverage
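The filtering and gap-coverage steps above can be sketched in Python. This is an illustration under assumptions, not the actual pipeline: the `Hit` record is a hypothetical stand-in for one parsed alignment (it is not Nucmer's output format), the 100bp word size is a seed parameter of the aligner and is not modeled here, and the `flank` window used to decide whether a hit sits next to a gap is an invented parameter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """One BAC-contig alignment to the reference (hypothetical record)."""
    ref: str          # reference chromosome
    ref_start: int    # 1-based, inclusive
    ref_end: int
    query: str        # BAC contig or singleton BAC name
    identity: float   # percent identity


def filter_hits(hits: List[Hit], min_len: int = 500,
                min_identity: float = 99.0) -> List[Hit]:
    """Keep hits that meet the minimum length and identity thresholds."""
    return [
        h for h in hits
        if h.ref_end - h.ref_start + 1 >= min_len and h.identity >= min_identity
    ]


def gaps_bridged(hits: List[Hit], gaps: List[Tuple[str, int, int]],
                 flank: int = 1000) -> List[Tuple[str, int, int, str]]:
    """Report (ref, gap_start, gap_end, query) where one query has a hit
    ending within `flank` bp before the gap and another starting within
    `flank` bp after it."""
    bridged = []
    for ref, g_start, g_end in gaps:
        left = {h.query for h in hits
                if h.ref == ref and g_start - flank <= h.ref_end < g_start}
        right = {h.query for h in hits
                 if h.ref == ref and g_end < h.ref_start <= g_end + flank}
        for q in sorted(left & right):
            bridged.append((ref, g_start, g_end, q))
    return bridged
```

A fuller check would also confirm that the two flanking hits are consistent in order and orientation on the BAC contig before treating its sequence as gap coverage.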
Phrap Assembly (HTGS Phase 3 BACs)
[Figure: chart by chromosome (1 to 12); y-axis from 0% to 100%; series: Assembled BACs, Singleton BACs]
Phrap Assembly (HTGS Phase 3 BACs)
Chr10 Contig68 10 BACs (242Kb!!)
Chr2 Contig185 7 BACs (566Kb!!)
Future Work
• Manually examine assembled BAC contigs with < 99% identity
• Evaluate HTGS Phase 2 BACs
• Use PCR walking to close gaps
• Create TPF files for SL3.0
• Annotate SL3.0 and lift over annotations from SL2.50