high-throughput sequencing
DESCRIPTION
Talk on High-Throughput Sequencing: Overview and Selected Applications for Masters students. Nov 9th 2011
TRANSCRIPT
PROFESSOR MARK PALLEN
UNIVERSITY OF BIRMINGHAM
High-throughput sequencing
Overview and selected applications
Outline
What is high-throughput sequencing?
  How it works
  Key considerations
Applications
  Clinical Microbiology
  Cancer Biology
Conventional Sequencing
Sanger dideoxy chemistry, 1970s
Bacterial genome sequencing, 1990s
  Whole-genome shotgun
  Clonal populations of template molecules in vector/cloning host
  Read lengths >500 bp
  De novo assembly
Drawbacks
  Time-consuming, expensive, onerous
  Beyond average project grant
  Out of reach of university infrastructure
  Relies on colony propagation and picking
  Some sequences cannot be cloned
High-throughput Sequencing
>100x faster, >100x cheaper! A disruptive technology
Three “second-generation” technologies in the marketplace
  454 (Roche)
  Solexa (Illumina)
  SOLiD (ABI)
Fundamentally new approaches
  Solid-phase amplification of clonal templates in “molecular colonies”
  Massive increase in number of “clones” compensates for shorter read length
New chemistries for sequence reading
  Pyrophosphate detection (PPi release upon base addition): 454
  Reversible addition of fluorescent nucleotides: Solexa
  Sequencing by ligation: SOLiD
Recent Developments
Single-molecule sequencing
  Pacific Biosciences (PacBio)
  Nanopore
Benchtop sequencers
  Ion Torrent
  MiSeq
Sequencing in Birmingham
454 Life Sciences (Roche)
Solexa/Illumina Sequencing
SOLiD Sequencing
Requires emPCR
Long run-times
Short read-lengths (stuck at 50 bp)
Sequences in colour space
Vendor:             Roche                Illumina             ABI
Technology:         454                  Solexa GA            SOLiD
Platform:           GS20   FLX    Ti     I      II     IIx    1      2      3
Reads (M):          0.5    0.5    1.25   28     100    150    40     115    320

Fragment
Read length (b):    100    200    400*   35     50     100    25     35     50
Run time (d):       0.25   0.3    0.4    3      3      5      6      5      8
Yield (Gb#):        0.05   0.1    0.5    1      5      15     1      4      16
Rate (Gb/d):        0.2    0.33   1.25   0.33   1.67   3      0.34   1.6    2
Images (TB#):       0.01   0.01   0.03   0.5    1.1    2.8    1.8    2.5    1.9

Paired-end
Read length (b):    -      200    400    2×35   2×50   2×100  2×25   2×35   2×50
Insert (kb):        -      3.5    3.5*   0.2    0.2    0.2    3      3      3
Run time (d):       -      0.3    0.4    6      10     10     12     10     16
Yield (Gb):         -      0.1    0.5    2      9      30     2      8      32

Source: http://www.politigenomics.com/next-generation-sequencing-informatics
* Now improved to 1 kb reads and choice of 3, 8 or 20 kb inserts
# b = bases, B = bytes
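As a rough cross-check on the fragment-run figures, yield should be about reads × read length, and rate about yield ÷ run time. The sketch below redoes that arithmetic with the values from the table; the dictionary layout and function names are just for illustration.

```python
# Fragment-run figures copied from the table above:
# platform -> (reads in millions, read length in bases, run time in days)
FRAGMENT_RUNS = {
    "454 GS20":   (0.5, 100, 0.25),
    "454 FLX":    (0.5, 200, 0.3),
    "454 Ti":     (1.25, 400, 0.4),
    "Solexa I":   (28, 35, 3),
    "Solexa II":  (100, 50, 3),
    "Solexa IIx": (150, 100, 5),
    "SOLiD 1":    (40, 25, 6),
    "SOLiD 2":    (115, 35, 5),
    "SOLiD 3":    (320, 50, 8),
}


def yield_gb(reads_millions: float, read_length: int) -> float:
    """Yield in gigabases = number of reads x bases per read."""
    return reads_millions * 1e6 * read_length / 1e9


def rate_gb_per_day(reads_millions: float, read_length: int, run_days: float) -> float:
    """Sequencing rate in gigabases per day of run time."""
    return yield_gb(reads_millions, read_length) / run_days


if __name__ == "__main__":
    for platform, (reads, length, days) in FRAGMENT_RUNS.items():
        print(f"{platform:<11} yield {yield_gb(reads, length):6.2f} Gb   "
              f"rate {rate_gb_per_day(reads, length, days):5.2f} Gb/d")
```

Computed this way, the yields agree with the table throughout, and the rates agree for 454, Solexa and SOLiD 3. SOLiD 1 and 2 come out at about half the tabulated rate, which the source table does not explain.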
Moore’s law applies!
The Sequencing Singularity!
Everything published is out of date!
Modes and Applications
For some applications, 454 read length is essential, e.g.
  amplicon sequencing; otherwise assembly will create chimeras
  differential splicing; translocations
For other applications read number is more important; read length less so
  Transcriptomics, where a 35 b read will identify the transcript
  SNP discovery/screening
Modes and Applications
Modes
  Basic shotgun ‘library’
  Paired-end or mate-pair shotgun
  Amplicon sequencing
Applications
  Whole genome
  Metagenome, phylogenetic profiling
  Transcriptome
  SNP analysis; Splice variants; Methylation
  Targeted sequence capture by microarray; Small RNAs
Modes and Applications
Sequencing run is the basic unit
  Basic cost of 454 or Illumina ~several £1000s per run in consumables & essential on-costs
  Additions for consumables and/or staff time for
    multiple library preparation
    some modes, e.g. paired end
    data analysis etc
Run can be subdivided
  Plate-dividing gaskets (loss of wells)
  Multiplex identifiers (MIDs or sequence barcodes)
So cost per sample may be ~£10s not £1000s!
  But logistics of filling a plate may incur delays
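The multiplexing idea can be sketched as follows: each library carries a short barcode (MID) at the start of every read, and after the run the reads are sorted back into samples by that prefix. The barcodes, sample names and run cost below are made-up placeholders. Real pipelines also tolerate sequencing errors in the barcode, which this sketch does not.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign reads to samples by exact barcode prefix match.

    reads:    iterable of read sequences (strings)
    barcodes: dict mapping barcode sequence -> sample name
    Returns a dict of sample name -> list of reads with the barcode trimmed.
    Reads with no matching barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


def cost_per_sample(run_cost: float, n_samples: int) -> float:
    """Share the cost of one run equally between the multiplexed samples."""
    return run_cost / n_samples


if __name__ == "__main__":
    # Hypothetical barcodes and reads, for illustration only.
    barcodes = {"AACCGG": "sample_1", "TTGGCC": "sample_2"}
    reads = ["AACCGGTTGACC", "TTGGCCGGATCC", "GGGGGGAAAA"]
    print(demultiplex(reads, barcodes))
    # Hypothetical figures: a £5000 run shared by 96 barcoded samples.
    print(round(cost_per_sample(5000, 96), 2))
```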
“De novo assembly” versus “alignment against template” (aka “re-sequencing”)
Bacterial Genomic Epidemiology
Genome sequencing brings the advantages of
  open-endedness (revealing the “unknown unknowns”)
  universal applicability
  ultimate in resolution
High-throughput platforms
  454, Illumina, PacBio
  Expense and set-up put them beyond the average lab
Bench-top sequencing platforms generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems
The Birth of Genomic Epidemiology for Bacteria
Sequencing in Birmingham
@mjpallen @pathogenomenick
#AAMTHI
Case Study Acinetobacter baumannii
Gram-negative bacillus
Multi-drug resistant
  colistin and tigecycline as reserve agents
  moving towards pan-resistance
Associated with
  wound infections and ventilator-associated pneumonia
  bloodstream infections
  returning military personnel from Iraq and Afghanistan
  transmission from military to civilian patients
Acinetobacter baumannii: problems
Hard to identify in clinical laboratory
  Two related genomospecies, 3 and 13TU (now A. pittii and A. nosocomialis), impossible to distinguish phenotypically
Outbreak strains can be identified by PFGE, VNTR and gene-specific assays
  BUT mode of spread and transmission chains often uncertain, hindering optimal management of outbreaks and rational design of policies
Mechanism of resistance hard to identify in individual cases
Poor understanding of pathogen biology
Applications and Questions
Epidemiology
  Q1: Can whole-genome sequencing detect differences between isolates within an outbreak?
  Q2: Can these differences be used to help determine chains of transmission?
Emergence of Resistance
  Q3: Can it reveal how resistance emerges?
Taxonomy and Identification
  Q4: Can it tell us what defines a species within a genus?
Acinetobacter Genomic Epidemiology
Outbreak in Birmingham Hospital in 2008
Isolates indistinguishable by current typing methods
Acinetobacter Genomic Epidemiology
454 whole-genome sequencing of 6 isolates
SNP detection by mapping reads against draft reference assembly
SNP filtering for false positives
SNP validation with Sanger sequencing of PCR amplicons
Outbreak isolates distinguishable at only three loci
Strain   SNP 1  SNP 2  SNP 3
AB0057   C      A      G
M1       C      A      G
M2       T      A      G
M3       T      A      T
M4       T      A      G
C1       T      T      G
C2       T      A      G
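One simple way to summarise a table like this is to group strains by genotype and count pairwise SNP differences. The sketch below does that with the alleles listed above. It is an illustration of the arithmetic, not the analysis used in the study, and the function names are made up for the example.

```python
from itertools import combinations

# Alleles at SNP 1, SNP 2 and SNP 3, copied from the table above.
GENOTYPES = {
    "AB0057": "CAG",
    "M1": "CAG",
    "M2": "TAG",
    "M3": "TAT",
    "M4": "TAG",
    "C1": "TTG",
    "C2": "TAG",
}


def snp_distance(a: str, b: str) -> int:
    """Number of SNP positions at which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def group_by_genotype(genotypes):
    """Map each distinct genotype to the strains that carry it."""
    groups = {}
    for strain, alleles in genotypes.items():
        groups.setdefault(alleles, []).append(strain)
    return groups


if __name__ == "__main__":
    groups = group_by_genotype(GENOTYPES)
    for alleles, strains in groups.items():
        print(alleles, ", ".join(strains))
    # Pairwise distances between the distinct genotypes.
    for g1, g2 in combinations(groups, 2):
        print(g1, g2, snp_distance(g1, g2))
```

On these data there are four genotypes, and the one shared by M2, M4 and C2 sits a single SNP away from each of the other three. Differences of this kind are what Q2 asks about, although genotype alone does not establish the direction of transmission.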
Before and after tigecycline therapy
Genomes of two Acinetobacter baumannii isolates from a single patient sequenced
  AB210 before tigecycline therapy (susceptible); 454-sequenced
  AB211 after therapy (resistant); Illumina-sequenced
Before and after tigecycline therapy
Eighteen SNPs detected between AB210 and AB211
  nine non-synonymous
  including a SNP in adeS which accounts for resistance phenotype
Three contigs in AB210 not covered by reads in AB211, representing three deletions of ~15, 44 and 17 kb
  mutS truncated; likely increase in mutation rate
Ion Torrent
Millions of wells reading sequences
Microchip detects release of protons
~3 hour run-time
~£500 cost per run
Applications: Cancer Biology
Malignant Darwinism
Mutational frequency heterogeneity analysis to become an integral component of molecular pathology
Cancer is an evolutionary process
Applications: Cancer Biology
Genome versus exome versus transcriptome
Even a transcriptome provides
  • abundance of RNAs
  • expressed mutations (point mutations, indels, inversions), alternative and novel splicing, gene fusions, RNA editing
Applications: Cancer Biology
Deep precision measurements of mutation frequency in a tissue can be made using next generation sequencing of PCR amplicons spanning the mutation
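A minimal sketch of the arithmetic behind such a measurement, assuming the reads covering the mutated position have already been counted. The read counts are invented, and the Wilson score interval is one common choice for the uncertainty of a binomial proportion, not necessarily what any given study used.

```python
import math


def variant_allele_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def wilson_interval(variant_reads: int, total_reads: int, z: float = 1.96):
    """Wilson score interval for the variant frequency (z=1.96 gives ~95%)."""
    n = total_reads
    p = variant_reads / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts: the same 5% frequency at two sequencing depths.
    for variant, total in [(5, 100), (500, 10000)]:
        low, high = wilson_interval(variant, total)
        vaf = variant_allele_frequency(variant, total)
        print(f"depth {total}: VAF {vaf:.3f} (interval {low:.3f} to {high:.3f})")
```

The interval narrows as depth increases, which is the sense in which deep amplicon sequencing gives a precise frequency estimate.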
Challenges
In recent cancer genomes ~50% of predicted SNVs from the primary sequence data could not be revalidated.
Many private germline polymorphisms still exist in every individual, so additional qualification against germline DNA is always necessary to distinguish somatic variants
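The germline qualification step can be sketched as a set difference: variants called in the tumour are kept as somatic candidates only if they are absent from the matched germline sample. The variant records below are hypothetical, and the sketch assumes both call sets use the same coordinates and allele representation. Real somatic callers weigh the read evidence in both samples rather than comparing finished call lists.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumour_variants, germline_variants):
    """Return tumour variants not seen in the matched germline sample."""
    germline = set(germline_variants)
    return [v for v in tumour_variants if v not in germline]


if __name__ == "__main__":
    # Hypothetical calls, for illustration only.
    tumour = [
        Variant("chr1", 1000, "A", "G"),
        Variant("chr2", 2500, "C", "T"),
        Variant("chr3", 400, "G", "A"),
    ]
    germline = [
        Variant("chr1", 1000, "A", "G"),  # private germline polymorphism
    ]
    for variant in somatic_candidates(tumour, germline):
        print(variant)
```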
Applications: Cancer Biology
Coding SNPs
  dominated by a few frequently mutated loci (oncogenes or tumour suppressors)
  long tail of population-infrequent SNPs
  driver/passenger distinction
  regulatory sequence mutations yet to be explored
Hundreds of genomes for each cancer type required to make sense of the mutations seen?
BUT driver mutations in some cancer subtypes found with much smaller studies
  C134W FOXL2 mutation in adult-type granulosa cell tumours from the transcriptomes of four granulosa cell cases
Multiple Displacement Amplification
Single-cell Genomics
(Or FACS or dilution or microfluidics)
What will you do when you can sequence everything?
Further Information
High-throughput sequencing technology
  http://pathogenomics.bham.ac.uk/blog
  http://www.nature.com/nrg/journal/v11/n1/pdf/nrg2626.pdf
  http://onlinelibrary.wiley.com/doi/10.1002/smll.200900976/pdf
  http://dx.doi.org/10.1016/j.tibtech.2008.07.003
Clinical Microbiology
  Pallen, Loman, Penn. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Current Opinion in Microbiology
  http://www.sciencedirect.com/science/journal/13695274
Further Information
Cancer genomics
  http://www.ncbi.nlm.nih.gov/pubmed/19921711,19918804,20016485,20164919,20016488,20200521,20371490
  http://www.nature.com/nature/journal/v458/n7239/pdf/nature07943.pdf
  http://www.nature.com/news/2010/100414/pdf/464972a.pdf
  http://www.nature.com/nature/journal/v464/n7289/pdf/464678a.pdf
  http://www.nature.com/nature/journal/v464/n7289/pdf/464679a.pdf
  http://omicsomics.blogspot.com/2010/04/value-of-cancer-genomics.html
  http://cancergenome.nih.gov/
  http://www.sanger.ac.uk/genetics/CGP/
  http://scienceonline.org/cgi/content/full/sci;327/5969/1074
  http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12623&&m=1&br=80&audio=false
  http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12624&&m=1&br=80&audio=false