high-throughput sequencing

Post on 03-Dec-2014

DESCRIPTION

Talk on High-Throughput Sequencing: Overview and Selected Applications for Masters students. Nov 9th 2011

TRANSCRIPT

1. High-throughput sequencing: Overview and selected applications

  • PROFESSOR MARK PALLEN
  • UNIVERSITY OF BIRMINGHAM

2. Outline

  • What is high-throughput sequencing?
    • How it works
    • Key considerations
  • Applications
    • Clinical Microbiology
    • Cancer Biology

3. Conventional Sequencing

  • Sanger dideoxy chemistry 1970s
  • Bacterial genome sequencing 1990s
    • Whole-genome shotgun
    • Clonal populations of template molecules in vector/cloning host
    • Read lengths >500 bp
    • De novo assembly
  • Drawbacks
    • Time-consuming, expensive, onerous
      • Beyond average project grant
      • Out of reach of university infrastructure
    • Relies on colony propagation and picking
      • Some sequences cannot be cloned

4. High-throughput Sequencing

  • >100x faster, >100x cheaper!
    • A disruptive technology
  • Three second-generation technologies in the marketplace
    • 454 (Roche)
    • Solexa (Illumina)
    • SOLiD (ABI)
  • Fundamentally new approaches
    • Solid-phase amplification of clonal templates in molecular colonies
      • Massive increase in number of clones compensates for shorter read length
    • New chemistries for sequence reading
      • Pyrophosphate detection (PPi release upon base addition): 454
      • Reversible addition of fluorescent nucleotides: Solexa
      • Sequencing by Ligation: SOLiD

5. Recent Developments

  • Single-molecule sequencing
    • Pacific Biosciences (PacBio)
    • Nanopore
  • Benchtop sequencers
    • Ion Torrent
    • MiSeq

6. Sequencing in Birmingham

7. 454 Life Sciences (Roche)

8. 454 Life Sciences (Roche)

9. Solexa/Illumina Sequencing

10. SOLiD Sequencing

  • Requires emPCR
  • Long run-times
  • Short read-lengths (stuck at 50 bp)
  • Sequences in colour space

11. Source: http://www.politigenomics.com/next-generation-sequencing-informatics

Vendor:              Roche                Illumina             ABI
Technology:          454                  Solexa GA            SOLiD
Platform:            GS20   FLX    Ti     I      II     IIx    1      2      3
Reads (M):           0.5    0.5    1.25   28     100    150    40     115    320
Fragment
  Read length:       100    200    400*   35     50     100    25     35     50
  Run time (d):      0.25   0.3    0.4    3      3      5      6      5      8
  Yield (Gb#):       0.05   0.1    0.5    1      5      15     1      4      16
  Rate (Gb/d):       0.2    0.33   1.25   0.33   1.67   3      0.34   1.6    2
  Images (TB#):      0.01   0.01   0.03   0.5    1.1    2.8    1.8    2.5    1.9
Paired-end
  Read length:       -      200    400    2×35   2×50   2×100  2×25   2×35   2×50
  Insert (kb):       -      3.5    3.5*   0.2    0.2    0.2    3      3      3
  Run time (d):      -      0.3    0.4    6      10     10     12     10     16
  Yield (Gb):        -      0.1    0.5    2      9      30     2      8      32

* Now improved to 1 kb reads and choice of 3, 8 or 20 kb inserts
# b = bases, B = bytes

  • Moore's law applies!
  • The Sequencing Singularity!
  • Everything published is out of date!

12. Modes and Applications

  • For some applications, 454 read length essential, e.g.
    • amplicon sequencing; otherwise assembly will create chimeras
    • differential splicing; translocations
  • For other applications read number is more important; read length less so
    • Transcriptomics, where a 35-base read will identify the transcript
    • SNP discovery/screening
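
The trade-off between read length and read number can be made concrete with a little arithmetic on the platform figures from slide 11. The sketch below is illustrative only: it assumes yield is simply reads × read length, and it assumes a 4 Mb bacterial genome for the coverage estimate (a figure not given on the slides). The function names are hypothetical.

```python
def yield_gb(reads_millions, read_length_bases):
    """Total bases per run in gigabases, assuming every read is full length."""
    return reads_millions * 1e6 * read_length_bases / 1e9


def mean_coverage(yield_in_gb, genome_size_mb):
    """Average number of times each base of the genome is sequenced."""
    return yield_in_gb * 1e9 / (genome_size_mb * 1e6)


# (reads in millions, fragment read length in bases), taken from slide 11
platforms = {
    "454 Ti": (1.25, 400),
    "Illumina GA IIx": (150, 100),
}

ASSUMED_GENOME_MB = 4.0  # assumption: a typical bacterial genome size

for name, (reads_m, length) in platforms.items():
    total = yield_gb(reads_m, length)
    depth = mean_coverage(total, ASSUMED_GENOME_MB)
    print(f"{name}: {total:.2f} Gb per run, ~{depth:.0f}x of a 4 Mb genome")
```

The computed yields (0.5 Gb and 15 Gb) match the fragment yields listed on slide 11: far more reads more than offset the shorter read length when total bases are what matters.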

13. Modes and Applications

  • Modes
    • Basic shotgun library
    • Paired-end or mate-pair shotgun
    • Amplicon sequencing
  • Applications
    • Whole genome
    • Metagenome, phylogenetic profiling
    • Transcriptome
    • SNP analysis; Splice variants; Methylation
    • Targeted sequence capture by microarray; Small RNAs

14. Modes and Applications

  • Sequencing run is the basic unit
    • Basic cost of 454 or Illumina ~several 1000s per run in consumables & essential on-costs
    • Additions for consumables and/or staff time for
      • multiple library preparation
      • some modes, e.g. paired end
      • data analysis etc
  • Run can be subdivided
    • Plate-dividing gaskets (loss of wells)
    • Multiplex identifiers (MIDs or sequence barcodes)
  • So cost per sample may be ~10s not 1000s!
    • But logistics of filling a plate may incur delays
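
As a rough illustration of how barcodes let one run be shared between samples, here is a minimal sketch. It assumes each read begins with its sample's barcode and that only exact matches are accepted; the barcode sequences, sample names, run cost and sample count are illustrative values, not figures from the talk, and real demultiplexing tools usually tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the barcode at their start, trimming the barcode off.

    Reads with no exact barcode match are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


def cost_per_sample(run_cost, n_samples):
    """Consumables cost per sample when a run is shared equally."""
    return run_cost / n_samples


# Illustrative barcodes and reads (assumptions, not data from the slides)
barcodes = {"ACGAGTGCGT": "isolate_1", "ACGCTCGACA": "isolate_2"}
reads = ["ACGAGTGCGTTTGACCA", "ACGCTCGACAGGATCC", "TTTTTTTTTTGGGG"]

print(demultiplex(reads, barcodes))
print(cost_per_sample(3000, 96))  # assumed run cost and sample count
```

With an assumed run cost of 3000 shared across 96 barcoded samples, the per-sample figure falls to roughly 31, which is the "10s not 1000s" effect described above.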

15. De novo assembly versus alignment against template (aka re-sequencing)

16. Bacterial Genomic Epidemiology

  • Genome sequencing brings the advantages of
    • open-endedness (revealing the unknown unknowns),
    • universal applicability
    • ultimate in resolution
  • High-throughput platforms
    • 454, Illumina, PacBio
    • Expense and set-up puts them beyond average lab
  • Bench-top sequencing platforms
    • generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems

17. The Birth of Genomic Epidemiology for Bacteria

18. The Birth of Genomic Epidemiology for Bacteria

19. Sequencing in Birmingham @mjpallen @pathogenomenick #AAMTHI

20. Case Study: Acinetobacter baumannii

  • Gram-negative bacillus
  • Multi-drug resistant
    • colistin and tigecycline as reserve agents
    • moving towards pan-resistance
  • Associated with
    • wound infections and ventilator-associated pneumonia
    • bloodstream infections
    • returning military personnel from Iraq and Afghanistan
    • transmission from military to civilian patients

21. Acinetobacter baumannii: problems

  • Hard to identify in clinical laboratory
    • Two related genomospecies, 3 and 13TU (now A. pittii and A. nosocomialis), impossible to distinguish phenotypically
  • Outbreak strains can be identified by PFGE, VNTR and gene-specific assays
    • BUT mode of spread and transmission chains often uncertain, hindering optimal management of outbreaks and rational design of policies
  • Mechanism of resistance hard to identify in individual cases
  • Poor understanding of pathogen biology

22. Applications and Questions

  • Epidemiology
    • Q1: Can whole-genome sequencing detect differences between isolates within an outbreak?
    • Q2: Can these differences be used to help determine chains of transmission?
  • Emergence of Resistance
    • Q3: Can it reveal how resistance emerges?
  • Taxonomy and Identification
    • Q4: Can it tell us what defines a species within a genus?

23. Acinetobacter Genomic Epidemiology

  • Outbreak in Birmingham Hospital in 2008
  • Isolates indistinguishable by current typing methods

24. Acinetobacter Genomic Epidemiology

  • 454 whole-genome sequencing of 6 isolates
  • SNP detection by mapping reads against draft reference assembly
  • SNP filtering for false positives
  • SNP validation with Sanger sequencing of PCR amplicons
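
The filtering step can be sketched as follows. This is a generic illustration, not the pipeline used in the study: the `Candidate` fields, the thresholds (minimum depth 10, at least 90% of reads supporting the variant) and the homopolymer exclusion are all assumptions, the last chosen because 454 pyrosequencing is known to miscall homopolymer lengths.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    position: int         # coordinate on the reference assembly
    ref: str              # reference base
    alt: str              # variant base seen in the mapped reads
    depth: int            # reads covering the position
    alt_reads: int        # reads supporting the variant base
    in_homopolymer: bool  # position lies within a homopolymer run


def passes_filters(c, min_depth=10, min_alt_fraction=0.9):
    """Return True if a candidate SNP survives the (assumed) filters."""
    if c.depth < min_depth:
        return False  # too few reads to trust the call
    if c.alt_reads / c.depth < min_alt_fraction:
        return False  # mixed signal; clonal isolates should be near 100%
    if c.in_homopolymer:
        return False  # error-prone context for pyrosequencing
    return True


# Made-up candidates for illustration only
candidates = [
    Candidate(1200, "C", "T", depth=25, alt_reads=25, in_homopolymer=False),
    Candidate(5400, "A", "G", depth=4, alt_reads=4, in_homopolymer=False),
    Candidate(9100, "G", "T", depth=30, alt_reads=12, in_homopolymer=False),
    Candidate(9900, "A", "T", depth=30, alt_reads=30, in_homopolymer=True),
]

kept = [c.position for c in candidates if passes_filters(c)]
print(kept)
```

Candidates that survive filtering like this would then go forward to validation, here by Sanger sequencing of PCR amplicons.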

25. Outbreak isolates distinguishable at only three loci

Isolate  SNP 1  SNP 2  SNP 3
AB0057   C      A      G
M1       C      A      G
M2       T      A      G
M3       T      A      T
M4       T      A      G
C1       T      T      G
C2       T      A      G

26.

27.

28. Before and after tigecycline therapy

  • Genomes of two Acinetobacter baumannii isolates from single patient sequenced
    • AB210 before tigecycline therapy (susceptible); 454 sequenced
    • AB211 after therapy (resistant); Illumina-sequenced

29. Before and after tigecycline therapy

  • Eighteen SNPs detected between AB210 and AB211
    • nine non-synonymous
    • including a SNP in adeS which accounts for resistance phenotype
  • Three contigs i