high-throughput sequencing
Post on 03-Dec-2014
DESCRIPTION
Talk on High-Throughput Sequencing: Overview and Selected Applications, for Masters students. Nov 9th 2011
TRANSCRIPT
1. High-throughput sequencing: Overview and selected applications
- Professor Mark Pallen
- University of Birmingham
2. Outline
- What is high-throughput sequencing?
- How it works
- Key considerations
- Applications
- Clinical Microbiology
- Cancer Biology
3. Conventional Sequencing
- Sanger dideoxy chemistry 1970s
- Bacterial genome sequencing 1990s
- Whole-genome shotgun
- Clonal populations of template molecules in vector/cloning host
- Read lengths >500 bp
- De novo assembly
- Drawbacks
- Time-consuming, expensive, onerous
- Beyond average project grant
- Out of reach of university infrastructure
- Relies on colony propagation and picking
- Some sequences cannot be cloned
4. High-throughput Sequencing
- >100x faster, >100x cheaper!
- A disruptive technology
- Three second-generation technologies in the marketplace
- 454 (Roche)
- Solexa (Illumina)
- SOLiD (ABI)
- Fundamentally new approaches
- Solid-phase amplification of clonal templates in molecular colonies
- Massive increase in number of clones compensates for shorter read length
- New chemistries for sequence reading
- Pyrophosphate detection (PPi release upon base addition): 454
- Reversible addition of fluorescent nucleotides (reversible terminators): Solexa
- Sequencing by Ligation: SOLiD
5. Recent Developments
- Single-molecule sequencing
- Pacific Biosciences (PacBio)
- Nanopore
- Benchtop sequencers
- Ion Torrent
- MiSeq
6. Sequencing in Birmingham
7. 454 Life Sciences (Roche)
8. 454 Life Sciences (Roche)
9. Solexa/Illumina Sequencing
10. SOLiD Sequencing
- Requires emPCR
- Long run-times
- Short read-lengths (stuck at 50 bp)
- Sequences in colour space
11. Source: http://www.politigenomics.com/next-generation-sequencing-informatics
Vendors: Roche (454: GS20, FLX, Ti), Illumina (Solexa GA: I, II, IIx), ABI (SOLiD: 1, 2, 3)
| Metric | 454 GS20 | 454 FLX | 454 Ti | GA I | GA II | GA IIx | SOLiD 1 | SOLiD 2 | SOLiD 3 |
|---|---|---|---|---|---|---|---|---|---|
| Reads (M) | 0.5 | 0.5 | 1.25 | 28 | 100 | 150 | 40 | 115 | 320 |
| Fragment: read length (b) | 100 | 200 | 400* | 35 | 50 | 100 | 25 | 35 | 50 |
| Fragment: run time (d) | 0.25 | 0.3 | 0.4 | 3 | 3 | 5 | 6 | 5 | 8 |
| Fragment: yield (Gb#) | 0.05 | 0.1 | 0.5 | 1 | 5 | 15 | 1 | 4 | 16 |
| Fragment: rate (Gb/d) | 0.2 | 0.33 | 1.25 | 0.33 | 1.67 | 3 | 0.34 | 1.6 | 2 |
| Fragment: images (TB#) | 0.01 | 0.01 | 0.03 | 0.5 | 1.1 | 2.8 | 1.8 | 2.5 | 1.9 |
| Paired-end: read length (b) | - | 200 | 400 | 2×35 | 2×50 | 2×100 | 2×25 | 2×35 | 2×50 |
| Paired-end: insert (kb) | - | 3.5 | 3.5* | 0.2 | 0.2 | 0.2 | 3 | 3 | 3 |
| Paired-end: run time (d) | - | 0.3 | 0.4 | 6 | 10 | 10 | 12 | 10 | 16 |
| Paired-end: yield (Gb) | - | 0.1 | 0.5 | 2 | 9 | 30 | 2 | 8 | 32 |
* Now improved to 1 kb reads and a choice of 3, 8 or 20 kb inserts
# b = bases, B = bytes
- Moore's law applies!
- The Sequencing Singularity!
- Everything published is out of date!
12. Modes and Applications
- For some applications, 454 read length essential, e.g.
- amplicon sequencing; otherwise assembly will create chimeras
- differential splicing; translocations
- For other applications read number is more important; read length less so
- Transcriptomics, where a 35-base read is enough to identify the transcript
- SNP discovery/screening
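The trade-off between read length and read number can be made concrete with the usual depth-of-coverage estimate (total sequenced bases divided by genome size). The sketch below uses read counts and fragment read lengths from the platform comparison slide; the 4 Mb genome size is an assumption chosen to represent a typical bacterial genome, and the function name is illustrative only.

```python
def expected_coverage(n_reads: float, read_length: int, genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


# (number of reads, fragment read length in bases) per run,
# taken from the platform comparison slide.
platforms = {
    "454 Ti": (1.25e6, 400),
    "Illumina GA IIx": (150e6, 100),
    "SOLiD 3": (320e6, 50),
}

GENOME_SIZE = 4_000_000  # assumption: a ~4 Mb bacterial genome

for name, (n_reads, read_length) in platforms.items():
    depth = expected_coverage(n_reads, read_length, GENOME_SIZE)
    print(f"{name}: about {depth:,.0f}x coverage per run")
```

Under these assumptions a single short-read run yields far more depth than one bacterial genome needs, which is why read number rather than read length drives applications such as SNP screening.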
13. Modes and Applications
- Modes
- Basic shotgun library
- Paired-end or mate-pair shotgun
- Amplicon sequencing
- Applications
- Whole genome
- Metagenome, phylogenetic profiling
- Transcriptome
- SNP analysis; Splice variants; Methylation
- Targeted sequence capture by microarray; Small RNAs
14. Modes and Applications
- Sequencing run is the basic unit
- Basic cost of 454 or Illumina ~several 1000s per run in consumables & essential on-costs
- Additions for consumables and/or staff time for
- multiple library preparation
- some modes, e.g. paired end
- data analysis etc
- Run can be subdivided
- Plate-dividing gaskets (loss of wells)
- Multiplex identifiers (MIDs or sequence barcodes)
- So cost per sample may be ~10s not 1000s!
- But logistics of filling a plate may incur delays
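As a rough illustration of how sequence barcodes let one run be shared, the sketch below assigns reads to samples by an exact match on a leading barcode and then divides a run cost across samples. The barcode sequences, sample names, run cost and sample count are all hypothetical; real MID sets and demultiplexing tools also tolerate sequencing errors, which this sketch does not.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by exact match of its leading barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence -> sample name
    Returns a dict of sample name -> list of reads with the barcode trimmed.
    Reads with no matching barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


def cost_per_sample(run_cost, n_samples):
    """Share the cost of one run equally across multiplexed samples."""
    return run_cost / n_samples


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTAAA", "GGGGGGCCCCCC"]
print(demultiplex(reads, barcodes))

# Hypothetical figures: a run costing 5000 units shared by 96 samples.
print(round(cost_per_sample(5000, 96), 2))
```

With these made-up figures the per-sample share drops from thousands to tens of units, in line with the slide's point; library preparation and analysis costs per sample would come on top.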
15. De novo assembly versus alignment against template (aka re-sequencing)
16. Bacterial Genomic Epidemiology
- Genome sequencing brings the advantages of
- open-endedness (revealing the unknown unknowns),
- universal applicability
- ultimate in resolution
- High-throughput platforms
- 454, Illumina, PacBio
- Expense and set-up puts them beyond average lab
- Bench-top sequencing platforms
- generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems
17. The Birth of Genomic Epidemiology for Bacteria
18. The Birth of Genomic Epidemiology for Bacteria
19. Sequencing in Birmingham
@mjpallen @pathogenomenick #AAMTHI
20. Case Study: Acinetobacter baumannii
- Gram-negative bacillus
- Multi-drug resistant
- colistin and tigecycline as reserve agents
- moving towards pan-resistance
- Associated with
- wound infections and ventilator-associated pneumonia
- bloodstream infections
- returning military personnel from Iraq and Afghanistan
- transmission from military to civilian patients
21. Acinetobacter baumannii: problems
- Hard to identify in clinical laboratory
- Two related genomospecies, 3 and 13TU (now A. pittii and A. nosocomialis), are impossible to distinguish phenotypically
- Outbreak strains can be identified by PFGE, VNTR and gene-specific assays
- BUT mode of spread and transmission chains often uncertain, hindering optimal management of outbreaks and rational design of policies
- Mechanism of resistance hard to identify in individual cases
- Poor understanding of pathogen biology
22. Applications and Questions
- Epidemiology
- Q1: Can whole-genome sequencing detect differences between isolates within an outbreak?
- Q2: Can these differences be used to help determine chains of transmission?
- Emergence of Resistance
- Q3: Can it reveal how resistance emerges?
- Taxonomy and Identification
- Q4: Can it tell us what defines a species within a genus?
23. Acinetobacter Genomic Epidemiology
- Outbreak in Birmingham Hospital in 2008
- Isolates indistinguishable by current typing methods
24. Acinetobacter Genomic Epidemiology
- 454 whole-genome sequencing of 6 isolates
- SNP detection by mapping reads against draft reference assembly
- SNP filtering for false positives
- SNP validation with Sanger sequencing of PCR amplicons
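The slide does not say which filters were applied, so the following is only a minimal sketch of one common kind of false-positive filter: a candidate SNP is kept only if enough reads cover the position and nearly all of them agree on the variant base, as expected for a clonal haploid bacterial isolate. The function name and both thresholds are hypothetical, not the authors' pipeline.

```python
from collections import Counter


def call_snp(ref_base, pileup_bases, min_depth=10, min_fraction=0.9):
    """Return the variant base if the position passes both filters, else None.

    ref_base      -- base in the reference assembly at this position
    pileup_bases  -- string of bases from the reads mapped over this position
    min_depth     -- hypothetical minimum number of covering reads
    min_fraction  -- hypothetical minimum fraction of reads supporting the variant
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        return None  # too few reads to trust the call
    base, count = Counter(pileup_bases).most_common(1)[0]
    if base == ref_base:
        return None  # majority of reads agree with the reference
    if count / depth < min_fraction:
        return None  # mixed signal, likely sequencing or mapping error
    return base


print(call_snp("C", "T" * 19 + "C"))       # well supported variant
print(call_snp("C", "T" * 5))              # rejected: low depth
print(call_snp("C", "T" * 12 + "C" * 8))   # rejected: mixed bases
```

Calls that survive filters of this kind would still need the independent confirmation listed on the slide, such as Sanger sequencing of PCR amplicons.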
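25. Outbreak isolates distinguishable at only three loci
| Strain | SNP 1 | SNP 2 | SNP 3 |
|---|---|---|---|
| AB0057 | C | A | G |
| M1 | C | A | G |
| M2 | T | A | G |
| M3 | T | A | T |
| M4 | T | A | G |
| C1 | T | T | G |
| C2 | T | A | G |
26.
27.
28. Before and after tigecycline therapy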
- Genomes of two Acinetobacter baumannii isolates from a single patient sequenced
- AB210 before tigecycline therapy (susceptible); 454 sequenced
- AB211 after therapy (resistant); Illumina-sequenced
29. Before and after tigecycline therapy
- Eighteen SNPs detected between AB210 and AB211
- nine non-synonymous
- including a SNP in adeS, which accounts for the resistance phenotype
- Three contigs i