Solanum lycopersicum Chromosome 4
Mapping and Finishing Update
SRC-UK and Wellcome Trust Sanger Institute
SOL Korea – September 2007
Wellcome Trust Medical Photographic Library
Tomato Physical Map
Library    No. of clones   Average insert   Genome equivalents   Fingerprints
LE_HBa     129,024         117 kb           15 X                 88,000 (AGI)
SL_MboI    52,992          135 kb           7 X                  43,000 (WTSI)
SL_EcoI    72,264          95-100 kb        7 X
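As a rough cross-check of the genome-equivalent column, coverage can be estimated as clones x average insert / genome size. The sketch below uses the library figures listed above; the genome size of about 950 Mb is an assumption (the slides do not state one), and the 97.5 kb insert for SL_EcoI is simply the midpoint of the quoted 95-100 kb range.

```python
# Rough genome-equivalent estimate: clones * average insert / genome size.
# ASSUMPTION: genome size of ~950 Mb; this value is not given in the slides.
ASSUMED_GENOME_KB = 950_000

# (number of clones, average insert in kb), taken from the library table.
LIBRARIES = {
    "LE_HBa": (129_024, 117.0),
    "SL_MboI": (52_992, 135.0),
    "SL_EcoI": (72_264, 97.5),  # midpoint of 95-100 kb (assumption)
}


def genome_equivalents(n_clones, insert_kb, genome_kb=ASSUMED_GENOME_KB):
    """Return the estimated fold coverage of the genome by one library."""
    return n_clones * insert_kb / genome_kb


if __name__ == "__main__":
    for name, (n_clones, insert_kb) in LIBRARIES.items():
        print(f"{name}: {genome_equivalents(n_clones, insert_kb):.1f} X")
```

Under that assumed genome size the estimates come out close to, but slightly above, the rounded 15 X / 7 X / 7 X figures in the table; a different genome size would shift them proportionally.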
BACs are selected for sequencing on chromosome 4 using the physical map assembled in FPC.
The map has been assembled using fingerprinted clones from 2 BAC libraries. Extending and gap-filling clones are identified using end sequences. Clones are fingerprinted and entered in FPC, and their overlaps are checked, before they are selected for sequencing.
Tomato BAC libraries
Map Coverage – Chromosome 4
Chromosome 4 is represented by 45 FPC contigs that cover approximately 22.2 Mb, estimated from fingerprints (5 bands/kb). 40 clones have been selected to extend original contigs based on clone end sequence matches.
All contigs are anchored to the chromosome by SGN chromosome 4 markers.
FISH (H. de Jong, Wageningen) has confirmed the placement of some contigs on chromosome 4, but may refute the placement of at least 7 contigs. Confirmation of chromosome 4 contigs is a high priority.
Of the 907 SGN chromosome 4 markers, 142 are missing from the current FPC build. Overgo probes are being used to screen the BAC libraries and may identify ~47 additional clones.
The Syngenta marker data will also be used to identify additional BACs.
FISH Data
Confirmation of chromosome location
Verification of contig and marker placement
Assessment of heterochromatin & euchromatin distribution
This image demonstrates:
– LE_HBa114C15 on short arm
– LE_HBa308B7 on heterochromatin/centromere border
– LE_HBa20F17 on long arm
FISH performed by S. B. Chang at Prof S. Stack’s Laboratory, University of Colorado, USA.
Chromosome 4 – Distribution of contigs
[Figure: mapped markers along chromosome 4 with contigs ctg503, ctg15, ctg5716, ctg5014, ctg5252, ctg5711, ctg916, ctg1406, ctg1189 and ctg1795; FISH-confirmed contigs are marked.]
This shows that clones for sequencing have been selected from seed contigs along the length of the chromosome, including those selected from putative heterochromatic regions to try to assess the boundary domains.
Distribution of Chromosome 4 Contigs
This shows that clones for sequencing have been selected from seed contigs along the length of the chromosome. Ten contigs shown are from the current 45 fpc contigs on chr4 - including those selected from putative heterochromatic regions to try to assess the boundary domains.
[Figure: Chr4 mapped markers (TG485, T0635, T0954, T1322, CT_At5g37360, T1068, TG287, TG163P41P74) shown against contigs ctg503, ctg15, ctg5716, ctg5014, ctg5252, ctg5711, ctg916, ctg1406, ctg1189 and ctg1795; FISH-confirmed contigs are marked.]
Analysed BACs and number of gene models
[Figure: analysed BACs positioned along chromosome 4 relative to the centromere, colour-coded as euchromatin or heterochromatin.]
bTH8H22 – 4 Genes
bTH36C23 – 2 Genes
bTH50I18 – 3 Genes
bTH114C15 – 2 Genes
bTH308B7 – 0 Genes
bTH198L24 – 0 Genes
bTH31H5 – 1 Gene
bTH132O113 Genes
bTH53M25 Genes
bTH59M167 Genes
The number of gene models obtained from the gene prediction training set
Sequence Plot of ctg916 euchromatin
Sequence Plot of ctg5711 euchromatin
Sequence Plot of ctg15 (heterochromatic -euchromatic boundary region)
Same plot as before with greyscale adjusted to view repeat features
Sequence Plot of ctg5014 near centromere
Same plot as before with greyscale adjusted to view repeat features
TPF File
Tile Path Format file – tab delimited flat file
GAP        type-3            ?
?          LE_HBa-24G5       ctg145
CT990489   LE_HBa-20F17      ctg145
GAP        type-3            ?
CT990488   LE_HBa-114C15     ctg5716
?          SL_MboI-143K21    ctg5716
GAP        type-3            ?
?          LE_HBa-147F16     ctg5014
CT990558   LE_HBa-308B7      ctg5014
GAP        type-3            ?
CT990624   LE_HBa-27G19      ctg15
CT476825   LE_HBa-198L24     ctg15
CT573298   LE_HBa-119A16     ctg15
CT485992   LE_HBa-31H5       ctg15
AGP File
Accessioned Golden Path – tab delimited flat file
chr4   1        50000    1    N   50000        clone    no
chr4   50001    100000   2    N   50000        clone    no
chr4   100001   150000   3    N   50000        contig   no
chr4   150001   200000   4    N   50000        clone    no
chr4   200001   360432   5    F   CT476825.1   1        160432   +
chr4   360433   370113   6    F   CT573298.1   2001     11681    +
chr4   370114   532277   7    F   CT485992.1   2001     164164   +
chr4   532278   582277   8    N   50000        contig   no
chr4   582278   632277   9    N   50000        clone    no
chr4   632278   682277   10   N   50000        contig   no
Gaps and unfinished clones are entered as 50,000 bp sections to more accurately represent the chromosome in each build
Order and alignment of Phase 3 finished accessions
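The AGP rows on the slide can be sanity-checked mechanically. The sketch below is a minimal illustration, not the pipeline's own validator: it assumes whitespace-separated rows laid out as on the slide (object, start, end, part number, type, then either gap length or component accession with its start and end), and the function names `check_agp` and `sequenced_bases` are hypothetical.

```python
# Rows transcribed from the AGP example on the slide (whitespace-separated here).
AGP_ROWS = [
    "chr4 1 50000 1 N 50000 clone no",
    "chr4 50001 100000 2 N 50000 clone no",
    "chr4 100001 150000 3 N 50000 contig no",
    "chr4 150001 200000 4 N 50000 clone no",
    "chr4 200001 360432 5 F CT476825.1 1 160432 +",
    "chr4 360433 370113 6 F CT573298.1 2001 11681 +",
    "chr4 370114 532277 7 F CT485992.1 2001 164164 +",
    "chr4 532278 582277 8 N 50000 contig no",
    "chr4 582278 632277 9 N 50000 clone no",
    "chr4 632278 682277 10 N 50000 contig no",
]


def check_agp(rows):
    """Return a list of consistency problems; an empty list means none found."""
    problems = []
    expected_start = 1
    for expected_part, row in enumerate(rows, start=1):
        fields = row.split()
        start, end, part = int(fields[1]), int(fields[2]), int(fields[3])
        kind = fields[4]
        span = end - start + 1
        if start != expected_start:
            problems.append(f"part {part}: starts at {start}, expected {expected_start}")
        if part != expected_part:
            problems.append(f"part {part}: expected part number {expected_part}")
        if kind == "N":
            length = int(fields[5])  # gap rows carry the gap length directly
        else:
            length = int(fields[7]) - int(fields[6]) + 1  # component end - start + 1
        if length != span:
            problems.append(f"part {part}: spans {span} bp but describes {length} bp")
        expected_start = end + 1
    return problems


def sequenced_bases(rows):
    """Sum the object span of every non-gap row."""
    return sum(
        int(f[2]) - int(f[1]) + 1
        for f in (row.split() for row in rows)
        if f[4] != "N"
    )


if __name__ == "__main__":
    print(check_agp(AGP_ROWS) or "rows are consistent")
    print(sequenced_bases(AGP_ROWS), "bp in finished components")
```

The check relies only on the arithmetic implied by the format: each row must start where the previous one ended, and the object span must equal either the gap length or the component span.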
AGP View on SGN
PseudoGoldenPath analysis for Contig Extension and Gap Closure
A PGP viewer is being developed to visualise sequence alignments and contig positioning
Contains finished and unfinished sequence
Unfinished clones are represented as sequence contigs
Unmasked BES aligned to PGP sequence using ssaha2
Example parameters: minimum percentage identity of 95%, with at least 60% of the end sequence found in the alignment
Map gaps are assigned an arbitrary 5kb size
Clone candidates for contig extension checked with BLAST and fingerprinted
Aim to incorporate other data such as markers
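The BES alignment thresholds listed above can be expressed as a simple filter. This is a sketch under stated assumptions: the `BesHit` record and its field names are hypothetical, not ssaha2 output columns, and "60% of the end sequence found" is interpreted here as aligned bases divided by read length.

```python
from dataclasses import dataclass


@dataclass
class BesHit:
    """Hypothetical record for one BAC end sequence aligned to PGP sequence."""
    read_name: str
    read_length: int          # length of the end sequence in bases
    aligned_length: int       # bases of the end sequence covered by the alignment
    percent_identity: float   # identity over the aligned region, 0-100


def passes_thresholds(hit, min_identity=95.0, min_fraction=0.60):
    """Apply the example thresholds: >= 95% identity and >= 60% of the read aligned."""
    if hit.read_length <= 0:
        return False
    fraction = hit.aligned_length / hit.read_length
    return hit.percent_identity >= min_identity and fraction >= min_fraction


if __name__ == "__main__":
    # Illustrative values only; these are not real alignments.
    hits = [
        BesHit("end_a", 700, 650, 98.2),
        BesHit("end_b", 700, 300, 99.0),   # too little of the read aligned
        BesHit("end_c", 700, 690, 91.5),   # identity too low
    ]
    for hit in hits:
        print(hit.read_name, passes_thresholds(hit))
```

Hits that pass would still need the BLAST and fingerprint checks described above before a clone is accepted as an extender.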
Closing the Map using PGP
[Figure: a map gap between sequenced clones, spanned by bridging clones identified from BES alignments to sequence.]
53 clone extensions have been identified, including 5 merges with previously unplaced contigs. 2 merges of chromosome 4 contigs have also been made.
Extender from Fosmid Library
Fosmid end sequences deposited by Cornell have been aligned to chromosome 4 sequence
A copy of the fosmid library has been received at WTSI. Approximately 50,000 clones will be end sequenced by December, and the sequences will be deposited in the Ensembl / NCBI Trace repositories.
Potential Extender
WTSI Tomato Clone Pipeline
Pipeline Stage       Number of BACs
Subcloning           34
Shotgun              21
Assembly Start       7
Auto-prefinishing    3
Finishing            11
QC Checking          4
Finished             63
Total                143
HTGS: Phase 1, Phase 2, Phase 3 (slide labels marking the pipeline stages that fall in each phase)
Chromosome 4 Sequence Generated
Total Sequence Available 10,666,227 bp
Total Unique Sequence 10,633,995 bp
Total Finished Sequence 7,543,322 bp
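The three totals above can be related with a few lines of arithmetic. The sketch below is only illustrative; the function name `summarise` is hypothetical, and the finished fraction is taken relative to total sequence available, which is one reasonable reading of the slide.

```python
# Totals quoted on the slide, in base pairs.
TOTAL_BP = 10_666_227
UNIQUE_BP = 10_633_995
FINISHED_BP = 7_543_322


def summarise(total_bp, unique_bp, finished_bp):
    """Derive redundancy and completion figures from the three totals."""
    return {
        "redundant_bp": total_bp - unique_bp,        # overlap between clones
        "unfinished_bp": total_bp - finished_bp,     # sequence not yet finished
        "finished_fraction": finished_bp / total_bp,
    }


if __name__ == "__main__":
    stats = summarise(TOTAL_BP, UNIQUE_BP, FINISHED_BP)
    print(f"redundant: {stats['redundant_bp']:,} bp")
    print(f"unfinished: {stats['unfinished_bp']:,} bp")
    print(f"finished: {stats['finished_fraction']:.1%} of total")
```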
Summary of Progress on Chromosome 4
45 map contigs have been built on chromosome 4
Clone end sequence alignments visualised with the PGP viewer are being used to extend contigs and close gaps
~100,000 fosmid end sequences will be generated by end 2007
10.6Mb of sequence has been generated, of which 7.5Mb are finished
All sequence assemblies >2kb are deposited in HTGS divisions of EMBL/GenBank/DDBJ
Acknowledgements
Wellcome Trust Sanger Institute: Jane Rogers; Sean Humphray; Clare Riddle and Mapping Core Group; Karen McLaren and Finishing Team 46; Stuart McLaren and Pre-finishing Team 58; Christine Lloyd and QC Team 57; Karen Oliver; Matt Jones; Carol Scott
Imperial College London: Gerard Bishop; Daniel Buchan; James Abbott; Sarah Butcher
University of Nottingham: Graham Seymour
Scottish Crop Research Institute: Glenn Bryan
Cornell University: Lukas Mueller; Jim Giovannoni
MIPS/IBI Institute for Bioinformatics: Klaus Mayer; Remy Bruggmann
FISH Resources: Stephen Stack Group (Colorado); Hans de Jong (Wageningen)
FUNDING