high-throughput sequencing

PROFESSOR MARK PALLEN UNIVERSITY OF BIRMINGHAM High-throughput sequencing Overview and selected applications

Upload: mark-pallen

Post on 03-Dec-2014


DESCRIPTION

Talk on High-Throughput Sequencing: Overview and Selected Applications for Masters students. Nov 9th 2011

TRANSCRIPT

Page 1: High-Throughput Sequencing

PROFESSOR MARK PALLEN
UNIVERSITY OF BIRMINGHAM

High-throughput sequencing

Overview and selected applications

Page 2: High-Throughput Sequencing

Outline

What is high-throughput sequencing?
  • How it works
  • Key considerations
Applications
  • Clinical Microbiology
  • Cancer Biology

Page 3: High-Throughput Sequencing

Conventional Sequencing

Sanger dideoxy chemistry: 1970s
Bacterial genome sequencing: 1990s
  • Whole-genome shotgun
  • Clonal populations of template molecules in vector/cloning host
  • Read lengths >500 bp
  • De novo assembly
Drawbacks
  • Time-consuming, expensive, onerous
      Beyond average project grant
      Out of reach of university infrastructure
  • Relies on colony propagation and picking
  • Some sequences cannot be cloned

Page 4: High-Throughput Sequencing

High-throughput Sequencing

>100x faster, >100x cheaper!
A disruptive technology
Three “second-generation” technologies in the marketplace
  • 454 (Roche)
  • Solexa (Illumina)
  • SOLiD (ABI)
Fundamentally new approaches
  • Solid-phase amplification of clonal templates in “molecular colonies”
  • Massive increase in number of “clones” compensates for shorter read length
  • New chemistries for sequence reading
      Pyrophosphate detection (PPi release upon base addition): 454
      Reversible addition of fluorescent nucleotides: Solexa
      Sequencing by ligation: SOLiD
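The pyrophosphate-detection idea can be illustrated with a toy model: nucleotides are flowed one at a time, and the light signal at each flow is proportional to how many bases are incorporated. This is a minimal sketch only. The flow order `TACG`, the function name `flowgram` and the noise-free integer signals are assumptions made for illustration, not a description of the instrument software.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Return ideal (noise-free) signals for each nucleotide flow.

    Each entry is (flowed_base, bases_incorporated). A homopolymer run
    such as 'TT' is incorporated in a single flow and gives a signal of 2.
    The flow order is an assumed, illustrative value.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Incorporate every consecutive copy of the flowed base.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


if __name__ == "__main__":
    for base, signal in flowgram("TTAGC"):
        print(base, signal)
```

Because a whole homopolymer run is read in one flow, long runs have to be inferred from signal intensity, which is one reason such runs are a typical source of error in this kind of chemistry.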

Page 5: High-Throughput Sequencing

Recent Developments

Single-molecule sequencing
  • Pacific Biosciences (PacBio)
  • Nanopore
Benchtop sequencers
  • Ion Torrent
  • MiSeq

Page 6: High-Throughput Sequencing

Sequencing in Birmingham

Page 7: High-Throughput Sequencing

454 Life Sciences (Roche)

Page 8: High-Throughput Sequencing

454 Life Sciences (Roche)

Page 9: High-Throughput Sequencing

Solexa/Illumina Sequencing

Page 10: High-Throughput Sequencing

SOLiD Sequencing

Requires emPCR
Long run-times
Short read-lengths (stuck at 50 bp)
Sequences in colour space
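"Colour space" means that each call encodes a pair of adjacent bases rather than a single base. A minimal sketch of the usual two-base encoding follows, assuming the common convention A=0, C=1, G=2, T=3 with the colour given by the XOR of the two codes. The function names are hypothetical, and keeping the first base as a literal letter is a simplification.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def to_colour_space(seq):
    """Encode a base sequence as its first base plus one colour per dinucleotide."""
    colours = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colours)


def from_colour_space(encoded):
    """Decode by walking from the known first base through each colour."""
    bases = [encoded[0]]
    for colour in encoded[1:]:
        previous = BASE_TO_CODE[bases[-1]]
        bases.append(CODE_TO_BASE[previous ^ int(colour)])
    return "".join(bases)


if __name__ == "__main__":
    print(to_colour_space("ATGC"))    # A313
    print(from_colour_space("A313"))  # ATGC
```

Since decoding depends on every preceding colour, a single miscalled colour corrupts all downstream bases if reads are naively converted, which is why colour-space reads are usually aligned in colour space.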

Page 11: High-Throughput Sequencing

Vendor:             Roche                   Illumina                ABI
Technology:         454                     Solexa GA               SOLiD
Platform:           GS20    FLX     Ti      I       II      IIx     1       2       3
Reads (M):          0.5     0.5     1.25    28      100     150     40      115     320

Fragment
Read length:        100     200     400*    35      50      100     25      35      50
Run time (d):       0.25    0.3     0.4     3       3       5       6       5       8
Yield (Gb#):        0.05    0.1     0.5     1       5       15      1       4       16
Rate (Gb/d):        0.2     0.33    1.25    0.33    1.67    3       0.34    1.6     2
Images (TB#):       0.01    0.01    0.03    0.5     1.1     2.8     1.8     2.5     1.9

Paired-end
Read length:        -       200     400     2×35    2×50    2×100   2×25    2×35    2×50
Insert (kb):        -       3.5     3.5*    0.2     0.2     0.2     3       3       3
Run time (d):       -       0.3     0.4     6       10      10      12      10      16
Yield (Gb):         -       0.1     0.5     2       9       30      2       8       32

Source: http://www.politigenomics.com/next-generation-sequencing-informatics

*Now improved to 1 kb reads and choice of 3, 8 or 20 kb inserts
#b=bases, B=bytes
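The fragment yields in the table appear to follow from read count multiplied by read length. A short check of that arithmetic, using only the figures quoted above; the function name `yield_gb` is a hypothetical helper.

```python
# (platform, reads in millions, fragment read length in bases, listed yield in Gb)
PLATFORMS = [
    ("454 GS20", 0.5, 100, 0.05),
    ("454 FLX", 0.5, 200, 0.1),
    ("454 Ti", 1.25, 400, 0.5),
    ("Solexa GA I", 28, 35, 1),
    ("Solexa GA II", 100, 50, 5),
    ("Solexa GA IIx", 150, 100, 15),
    ("SOLiD 1", 40, 25, 1),
    ("SOLiD 2", 115, 35, 4),
    ("SOLiD 3", 320, 50, 16),
]


def yield_gb(reads_millions, read_length):
    """Yield in gigabases = number of reads x bases per read."""
    return reads_millions * 1e6 * read_length / 1e9


if __name__ == "__main__":
    for name, reads_m, length, listed in PLATFORMS:
        print(f"{name:14s} computed {yield_gb(reads_m, length):6.2f} Gb, listed {listed} Gb")
```

The computed values agree with the listed ones to within rounding, which illustrates the trade-off on the slide: short-read platforms reach higher yields through read number, not read length.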

Moore’s law applies!

The Sequencing Singularity!

Everything published is out of date!

Page 12: High-Throughput Sequencing

Modes and Applications

For some applications, 454 read length is essential, e.g.
  • amplicon sequencing; otherwise assembly will create chimeras
  • differential splicing; translocations
For other applications read number is more important; read length less so
  • Transcriptomics, where a 35 b read will identify the transcript
  • SNP discovery/screening

Page 13: High-Throughput Sequencing

Modes and Applications

Modes
  • Basic shotgun ‘library’
  • Paired-end or mate-pair shotgun
  • Amplicon sequencing
Applications
  • Whole genome
  • Metagenome, phylogenetic profiling
  • Transcriptome
  • SNP analysis; Splice variants; Methylation
  • Targeted sequence capture by microarray; Small RNAs

Page 14: High-Throughput Sequencing

Modes and Applications

Sequencing run is the basic unit
  • Basic cost of 454 or Illumina ~several £1000s per run in consumables & essential on-costs
  • Additions for consumables and/or staff time for
      multiple library preparation
      some modes, e.g. paired end
      data analysis etc
Run can be subdivided
  • Plate-dividing gaskets (loss of wells)
  • Multiplex identifiers (MIDs or sequence barcodes)
So cost per sample may be ~£10s not £1000s!
But logistics of filling a plate may incur delays
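Multiplexing with sequence barcodes can be sketched as follows: each library carries a short known tag at the start of its reads, and reads are sorted back into samples after the run. This is a minimal sketch under stated assumptions. The barcode sequences, sample names, exact-match rule and the run cost of £5000 for 96 samples are all invented for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign reads to samples by exact match of a leading barcode.

    barcodes maps barcode sequence -> sample name. The barcode is trimmed
    from each assigned read; unmatched reads go to 'unassigned'.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


def cost_per_sample(run_cost, n_samples):
    """Share the fixed cost of one run equally across multiplexed samples."""
    return run_cost / n_samples


if __name__ == "__main__":
    # Hypothetical barcodes and reads.
    barcodes = {"ACGAGT": "isolate_A", "TCTCTA": "isolate_B"}
    reads = ["ACGAGTTTGACC", "TCTCTAGGCATT", "GGGGGGAAAA"]
    print(demultiplex(reads, barcodes))
    print(f"£{cost_per_sample(5000, 96):.0f} per sample")
```

Real demultiplexers usually tolerate a mismatch or two in the barcode; exact matching is used here only to keep the sketch short.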

Page 15: High-Throughput Sequencing

“De novo assembly” versus “alignment against template” (aka “re-sequencing”)
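The two approaches on this slide can be contrasted with a toy sketch: de novo assembly joins reads using only their overlaps with each other, while re-sequencing places each read on an existing reference. The functions below are hypothetical, handle only ungapped exact overlaps and mismatches, and are far simpler than any real assembler or aligner.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """De novo step: join two reads through their overlap, with no reference."""
    size = overlap_length(left, right, min_overlap)
    return left + right[size:] if size else None


def map_read(read, reference, max_mismatches=1):
    """Re-sequencing step: best ungapped placement of a read on a reference.

    Returns (position, mismatches), or None if no placement is good enough.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


if __name__ == "__main__":
    contig = merge_reads("ACGTTGCA", "TGCAGGTC")
    print(contig)                     # ACGTTGCAGGTC
    print(map_read("TTGAA", contig))  # (3, 1)
```

The mismatch reported by `map_read` is the raw material for SNP detection; the merge step is what is needed when no suitable reference exists.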

Page 16: High-Throughput Sequencing

Bacterial Genomic Epidemiology

Genome sequencing brings the advantages of
  • open-endedness (revealing the “unknown unknowns”)
  • universal applicability
  • ultimate in resolution
High-throughput platforms: 454, Illumina, PacBio
  • Expense and set-up puts them beyond average lab
Bench-top sequencing platforms
  • generate data sufficiently quickly and cheaply to have an impact on real-world clinical and epidemiological problems

Page 17: High-Throughput Sequencing

The Birth of Genomic Epidemiology for Bacteria

Page 18: High-Throughput Sequencing

The Birth of Genomic Epidemiology for Bacteria

Page 19: High-Throughput Sequencing

Sequencing in Birmingham

@mjpallen @pathogenomenick

#AAMTHI

Page 20: High-Throughput Sequencing

Case Study: Acinetobacter baumannii

Gram-negative bacillus
Multi-drug resistant
  • colistin and tigecycline as reserve agents
  • moving towards pan-resistance
Associated with
  • wound infections and ventilator-associated pneumonia
  • bloodstream infections
  • returning military personnel from Iraq and Afghanistan
  • transmission from military to civilian patients

Page 21: High-Throughput Sequencing

Acinetobacter baumannii: problems

Hard to identify in clinical laboratory
  • Two related genomospecies, 3 and 13TU (now A. pittii and A. nosocomialis), impossible to distinguish phenotypically
Outbreak strains can be identified by PFGE, VNTR and gene-specific assays
  • BUT mode of spread and transmission chains often uncertain, hindering optimal management of outbreaks and rational design of policies

Mechanism of resistance hard to identify in individual cases

Poor understanding of pathogen biology

Page 22: High-Throughput Sequencing

Applications and Questions

Epidemiology
  • Q1: Can whole-genome sequencing detect differences between isolates within an outbreak?
  • Q2: Can these differences be used to help determine chains of transmission?
Emergence of Resistance
  • Q3: Can it reveal how resistance emerges?
Taxonomy and Identification
  • Q4: Can it tell us what defines a species within a genus?

Page 23: High-Throughput Sequencing

Acinetobacter Genomic Epidemiology

Outbreak in Birmingham Hospital in 2008
Isolates indistinguishable by current typing methods

Page 24: High-Throughput Sequencing

Acinetobacter Genomic Epidemiology

454 whole-genome sequencing of 6 isolates
SNP detection by mapping reads against draft reference assembly
SNP filtering for false positives
SNP validation with Sanger sequencing of PCR amplicons
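The detection and filtering steps can be sketched as a simple rule applied to per-position base counts from mapped reads. This is an illustrative sketch only: the thresholds (`min_depth=10`, `min_fraction=0.9`), the data layout and the example counts are assumptions, not the filters used in the study.

```python
def call_snps(reference, pileup, min_depth=10, min_fraction=0.9):
    """Call candidate SNPs from per-position base counts.

    pileup maps a 0-based reference position to {base: read_count}.
    A SNP is reported when coverage is adequate and nearly all reads
    support the same non-reference base. Thresholds are illustrative.
    """
    snps = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust the call
        base, support = max(counts.items(), key=lambda item: item[1])
        if base != reference[pos] and support / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGT"
    pileup = {
        2: {"G": 1, "T": 29},  # well-supported change G -> T
        5: {"C": 3, "A": 2},   # low coverage, filtered out
        6: {"G": 18, "A": 2},  # majority matches the reference
    }
    print(call_snps(reference, pileup))  # [(2, 'G', 'T', 30)]
```

Candidates that survive such filters still need independent confirmation, which is the role of the Sanger validation step on the slide.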

Page 25: High-Throughput Sequencing

Outbreak isolates distinguishable at only three loci

         SNP 1  SNP 2  SNP 3
AB0057   C      A      G
M1       C      A      G
M2       T      A      G
M3       T      A      T
M4       T      A      G
C1       T      T      G
C2       T      A      G
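The three-locus profiles in the table can be grouped into genotypes and compared pairwise, which is the starting point for reasoning about transmission chains. A minimal sketch using the alleles from the table; the function names are hypothetical.

```python
from collections import defaultdict

# Alleles at SNP 1, SNP 2 and SNP 3, taken from the table above.
PROFILES = {
    "AB0057": "CAG",
    "M1": "CAG",
    "M2": "TAG",
    "M3": "TAT",
    "M4": "TAG",
    "C1": "TTG",
    "C2": "TAG",
}


def group_by_genotype(profiles):
    """Group isolates that share an identical SNP profile."""
    groups = defaultdict(list)
    for isolate, profile in profiles.items():
        groups[profile].append(isolate)
    return dict(groups)


def snp_distance(profile_a, profile_b):
    """Number of loci at which two profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


if __name__ == "__main__":
    for profile, isolates in group_by_genotype(PROFILES).items():
        print(profile, isolates)
    print(snp_distance(PROFILES["M3"], PROFILES["C1"]))  # 2
```

Four genotypes emerge from seven strains, with M3 and C1 each one step away from the TAG genotype shared by M2, M4 and C2.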

Page 26: High-Throughput Sequencing
Page 27: High-Throughput Sequencing
Page 28: High-Throughput Sequencing

Before and after tigecycline therapy

Genomes of two Acinetobacter baumannii isolates from a single patient sequenced
  • AB210 before tigecycline therapy (susceptible); 454-sequenced
  • AB211 after therapy (resistant); Illumina-sequenced

Page 29: High-Throughput Sequencing

Before and after tigecycline therapy

Eighteen SNPs detected between AB210 and AB211
  • nine non-synonymous, including a SNP in adeS which accounts for resistance phenotype
Three contigs in AB210 not covered by reads in AB211, representing three deletions of ~15, 44, 17 kb
  • mutS truncated; likely increase in mutation rate
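Whether a coding SNP is synonymous or non-synonymous is decided by translating the codon before and after the change. A minimal sketch using the standard genetic code follows; the function name, the codon-level interface and the example codons are illustrative and are not taken from the AB210/AB211 analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, offset, new_base):
    """Classify a single-base change at `offset` (0, 1 or 2) within a codon."""
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"  # premature stop, a special non-synonymous case
    return "non-synonymous"


if __name__ == "__main__":
    print(classify_snp("GCT", 2, "C"))  # Ala -> Ala: synonymous
    print(classify_snp("TGT", 1, "A"))  # Cys -> Tyr: non-synonymous
    print(classify_snp("TGG", 2, "A"))  # Trp -> stop: nonsense
```

In practice the codon and offset come from the gene annotation and reading frame at the SNP position, which this sketch takes as given.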

Page 30: High-Throughput Sequencing
Page 31: High-Throughput Sequencing

Ion Torrent

Millions of wells reading sequences
Microchip detects release of protons
~3 hour run-time
~£500 cost per run

Page 32: High-Throughput Sequencing
Page 33: High-Throughput Sequencing
Page 34: High-Throughput Sequencing
Page 35: High-Throughput Sequencing
Page 36: High-Throughput Sequencing

Applications: Cancer Biology

Page 37: High-Throughput Sequencing

Malignant Darwinism

Mutational frequency heterogeneity analysis to become an integral component of molecular pathology

Cancer is an evolutionary process

Page 38: High-Throughput Sequencing

Applications: Cancer Biology

Genome versus exome versus transcriptome
Even a transcriptome provides
  • abundance of RNAs
  • expressed mutations (point mutations, indels, inversions), alternative and novel splicing, gene fusions, RNA editing

Page 39: High-Throughput Sequencing

Applications: Cancer Biology

Deep precision measurements of mutation frequency in a tissue can be made using next generation sequencing of PCR amplicons spanning the mutation
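With deep amplicon sequencing, the mutation frequency is estimated as the fraction of reads carrying the mutant allele, and read depth sets how precise that estimate is. A minimal sketch using a Wilson score interval; the choice of interval, the function name and the example counts are assumptions for illustration, and sequencing error is ignored.

```python
import math


def mutation_frequency(mutant_reads, total_reads, z=1.96):
    """Return (frequency, lower, upper) with a Wilson score interval.

    z=1.96 gives an approximate 95% interval. Sequencing error is ignored,
    so this is an optimistic view of the achievable precision.
    """
    p = mutant_reads / total_reads
    denominator = 1 + z ** 2 / total_reads
    centre = (p + z ** 2 / (2 * total_reads)) / denominator
    half_width = (
        z
        * math.sqrt(p * (1 - p) / total_reads + z ** 2 / (4 * total_reads ** 2))
        / denominator
    )
    return p, centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts: 50 mutant reads out of 10,000 spanning the site.
    freq, low, high = mutation_frequency(50, 10_000)
    print(f"frequency {freq:.4f}, interval {low:.4f} to {high:.4f}")
```

Increasing depth narrows the interval, which is why amplicons sequenced to very high coverage can resolve rare subclones that whole-genome coverage would miss.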

Page 40: High-Throughput Sequencing
Page 41: High-Throughput Sequencing

Challenges

In recent cancer genomes ~50% of predicted SNVs from the primary sequence data could not be revalidated.

Many private germline polymorphisms still exist in every individual, so additional qualification against germline DNA is always necessary to distinguish somatic variants
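Qualification against germline DNA amounts to removing every tumour variant that is also seen in the matched normal sample. A minimal sketch of that set subtraction; the variant tuples and coordinates are invented for illustration.

```python
def somatic_variants(tumour_calls, germline_calls):
    """Keep tumour variants that are absent from the matched germline sample.

    Each variant is a (chromosome, position, ref_base, alt_base) tuple.
    """
    return sorted(set(tumour_calls) - set(germline_calls))


if __name__ == "__main__":
    # Hypothetical calls.
    tumour = [
        ("chr1", 100, "A", "T"),
        ("chr2", 200, "G", "C"),
        ("chr3", 300, "C", "T"),
    ]
    germline = [("chr2", 200, "G", "C")]  # private germline polymorphism
    print(somatic_variants(tumour, germline))
```

This only works if the germline sample is sequenced deeply enough at the same positions; a variant missed in the normal through low coverage would be wrongly kept as somatic.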

Page 42: High-Throughput Sequencing

Applications: Cancer Biology

Coding SNPs dominated by a few frequently mutated loci (oncogenes or tumour suppressors)
  • long tail of population-infrequent SNPs
  • driver/passenger distinction
  • regulatory sequence mutations yet to be explored
Hundreds of genomes for each cancer type required to make sense of the mutations seen?
BUT driver mutations in some cancer subtypes found with much smaller studies
  • C134Y FOXL2 mutation in adult type granulosa cell tumours from the transcriptomes of four granulosa cell cases

Page 43: High-Throughput Sequencing
Page 44: High-Throughput Sequencing
Page 45: High-Throughput Sequencing
Page 46: High-Throughput Sequencing

Multiple Displacement Amplification

Single-cell Genomics

(Or FACS or dilution or microfluidics)

Page 47: High-Throughput Sequencing
Page 48: High-Throughput Sequencing

What will you do when you can sequence everything?

Page 49: High-Throughput Sequencing

Further Information

High-throughput sequencing technology
  http://pathogenomics.bham.ac.uk/blog
  http://www.nature.com/nrg/journal/v11/n1/pdf/nrg2626.pdf
  http://onlinelibrary.wiley.com/doi/10.1002/smll.200900976/pdf
  http://dx.doi.org/10.1016/j.tibtech.2008.07.003

Clinical Microbiology
  Pallen, Loman, Penn. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Current Opinion in Infectious Disease
  http://www.sciencedirect.com/science/journal/13695274

Page 50: High-Throughput Sequencing

Further Information

Cancer genomics
  http://www.ncbi.nlm.nih.gov/pubmed/19921711,19918804,20016485,20164919,20016488,20200521,20371490
  http://www.nature.com/nature/journal/v458/n7239/pdf/nature07943.pdf
  http://www.nature.com/news/2010/100414/pdf/464972a.pdf
  http://www.nature.com/nature/journal/v464/n7289/pdf/464678a.pdf
  http://www.nature.com/nature/journal/v464/n7289/pdf/464679a.pdf
  http://omicsomics.blogspot.com/2010/04/value-of-cancer-genomics.html
  http://cancergenome.nih.gov/
  http://www.sanger.ac.uk/genetics/CGP/
  http://scienceonline.org/cgi/content/full/sci;327/5969/1074
  http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12623&&m=1&br=80&audio=false
  http://app2.capitalreach.com/esp1204/servlet/tc?cn=aacr&c=10165&s=20435&e=12624&&m=1&br=80&audio=false