dna and genome sequencing - illinoisstan.cropsci.uiuc.edu/courses/cpsc265/class9.pdf · dna...

Matthew Hudson Dept of Crop Sciences University of Illinois DNA and genome sequencing

Post on 27-Jun-2020

Page 1: DNA and genome sequencing - Illinoisstan.cropsci.uiuc.edu/courses/cpsc265/Class9.pdf · DNA Sequencing • Dideoxy sequencing was developed by Fred Sanger at Cambridge in the 1970s

Matthew Hudson

Dept of Crop Sciences

University of Illinois

DNA and genome sequencing

Page 2

Genome projects

2,424 ongoing genome projects

696 for eukaryotes

520 completed genomes

47 from eukaryotes

Almost every crop now has a genome project

Page 3

DNA Sequencing

• Dideoxy sequencing was developed by Fred Sanger at Cambridge in the 1970s. Often called “Sanger sequencing”.

Nobel prize number 2 for Fred Sanger in 1980, shared with Walter Gilbert from Harvard (inventor of the now little-used Maxam-Gilbert sequencing method).

Page 4

Sanger’s Dideoxy DNA sequencing method - How it works:

1. DNA template is denatured to single strands.

2. DNA primer (with 3’ end near sequence of interest) is annealed to the template DNA and extended with DNA polymerase.

3. Four reactions are set up, each containing:

   1. DNA template – e.g. a plasmid
   2. Primer
   3. DNA polymerase
   4. dNTPs (dATP, dTTP, dCTP, and dGTP)

4. Next, a different radio-labeled dideoxynucleotide (ddATP, ddTTP, ddCTP, or ddGTP) is added to each of the four reaction tubes at 1/100th the concentration of normal dNTPs……

Page 5
Page 6

ddNTPs are terminators: they possess a 3’-H instead of a 3’-OH, compete in the reaction with normal dNTPs, and cannot form a phosphodiester bond with the next nucleotide.

Whenever the radio-labeled ddNTPs are incorporated in the chain, DNA synthesis terminates.

Terminators stop further elongation of a DNA deoxyribose-phosphate backbone

Page 7

“hasta la vista”

Page 8

Manual Dideoxy DNA sequencing - How it works (cont.):

5. Each of the four reaction mixtures produces a population of DNA molecules with DNA chains terminating at each “terminator” base.

6. Extension products in each of the four reaction mixtures also end with a different radio-labeled ddNTP (depending on the base).

7. Next, each reaction mixture is electrophoresed in a separate lane (4 lanes) at high voltage on a polyacrylamide gel.

8. Pattern of bands in each of the four lanes is visualized on X-ray film.

9. Location of “bands” in each of the four lanes indicates the size of the fragment terminating with the respective radio-labeled ddNTP.

10. DNA sequence is deduced from the pattern of bands in the 4 lanes.
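Steps 3 to 10 can be sketched as a small simulation. This is an illustration only, not the lab protocol: it assumes the chance of incorporating a ddNTP at a matching position equals the 1/100 concentration ratio, counts fragment lengths from the first synthesized base (ignoring the primer), and the names `sanger_fragments` and `read_gel` are made up for this example.

```python
import random

def sanger_fragments(product, dd_base, dd_fraction=0.01, n_molecules=20000, seed=0):
    """Return the set of fragment lengths seen in one reaction tube.

    product is the newly synthesized strand, written 5' to 3'.
    At every position that matches dd_base the polymerase is assumed to
    pick the ddNTP with probability dd_fraction, which ends the chain.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for position, base in enumerate(product, start=1):
            if base == dd_base and rng.random() < dd_fraction:
                lengths.add(position)  # chain terminated here
                break
    return lengths

def read_gel(lanes):
    """Read the four lanes together, from the shortest band to the longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))

product = "GGATATAACCCCTGT"
# One tube (and one gel lane) per ddNTP: ddATP, ddCTP, ddGTP, ddTTP.
lanes = {base: sanger_fragments(product, base, seed=i) for i, base in enumerate("ACGT")}
print(read_gel(lanes))
```

With many template molecules per tube, every position ends up represented by a band in exactly one lane, so reading the bands in order of size gives back the sequence.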

Page 9

Vigilant et al. 1989, PNAS 86:9350-9354

Page 10

[Figure: gel lanes for the radio-labeled ddNTP reactions (4 rxns), labeled from short products to long products]

Sequence (5’ to 3’): GGATATAACCCCTGT

Page 11

Manual vs automatic sequencing

Manual sequencing has basically died out.

It needs four lanes per template and radioactive gels. In one day, a technician can get four sets of four lanes from one gel, with maybe 300 base pairs of data from each template.

Everyone now uses “automatic sequencing”. The downside is that no single lab can afford the machine, so it is done in a central facility (e.g. the Keck Center).

Most automated DNA sequencers can load robotically and operate around the clock for weeks with minimal labor.

Page 12

Dye deoxy terminators

One tube. One gel lane or capillary.

Page 13

Robotic 96 capillary machine: ABI 3730 xl

Page 14

DNA sequence output from ABI 377 (a gel-based sequencer)

1. Trace files (dye signals) are analyzed and bases called to create chromatograms.

2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
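Step 2 can be illustrated with a toy reconciliation of two base-called reads. This is a sketch under strong assumptions: both reads cover exactly the same region, uncalled bases are written as N, and no quality scores are used, whereas real software weighs the trace quality at each position. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def reconcile(forward, reverse):
    """Merge a forward read with a read from the opposite strand.

    Agreeing calls are kept, an N on one strand is filled in from the
    other strand, and a real disagreement is reported as N.
    """
    flipped = reverse_complement(reverse)
    if len(flipped) != len(forward):
        raise ValueError("this sketch expects reads covering the same bases")
    merged = []
    for f, r in zip(forward, flipped):
        if f == r:
            merged.append(f)
        elif f == "N":
            merged.append(r)
        elif r == "N":
            merged.append(f)
        else:
            merged.append("N")
    return "".join(merged)

forward_read = "GGATNTAACC"   # one uncalled base on the forward strand
reverse_read = "GNTTATATCC"   # opposite strand, also with one uncalled base
print(reconcile(forward_read, reverse_read))
```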

Page 15

Genome sequencing

How do you use these chunks of sequence to make a “whole genome” sequence?

Page 16

The “traditional” genome

A physical map is made
A BAC “tiling path” is created
BACs are farmed out to hundreds of collaborating laboratories
Each lab does a few BACs

Arabidopsis, E. coli etc were done this way, but since Craig Venter got interested, everything is “going shotgun”

Page 17

Shotgun Genome Sequencing

Slow and expensive...
but accurate and complete
and assembly is straightforward

Much faster and cheaper
very hard to get complete genome
assembly of large (>10Mb) genomes

Page 18

Finished genome              Shotgun genome          Maize now

Whole chromosome sequences   100kb average chunks    Some BAC contigs
Done clone by clone          Need physical map       MAGIs
e.g. human, Arabidopsis      e.g. poplar

Page 19

Shotgun sequencing

~700 bases per read

One or two reads per clone

Shotgun sequence of mouse, ~2.6 Gb, 7x coverage

That’s 26,000,000 sequencing reactions, 13,000,000 minipreps…
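The read and miniprep counts above follow from the genome size, coverage and read length on this slide. A minimal check of the arithmetic, assuming two reads per clone (the slide says one or two); the function name is made up for the example.

```python
def shotgun_workload(genome_bases, coverage, read_length, reads_per_clone):
    """Number of sequencing reactions and clones (minipreps) needed."""
    reads = genome_bases * coverage / read_length
    clones = reads / reads_per_clone
    return round(reads), round(clones)

# Mouse: ~2.6 Gb genome, 7x coverage, ~700 bases per read, 2 reads per clone
reads, clones = shotgun_workload(2_600_000_000, 7, 700, 2)
print(f"{reads:,} sequencing reactions, {clones:,} minipreps")
```

With one read per clone instead, the number of minipreps would equal the number of reactions.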

Extract DNA
Shear
Ligate into library
Pick clones
Grow clones
Extract vector DNA
Sequence using ddNTPs
Read fragments with gel or capillary

Page 20

The genome factory

There are a few centers around the world that have a “factory” big enough to do shotgun sequence of a large eukaryotic genome:

Broad Institute, MIT
Baylor College of Medicine, Houston
Washington University, St Louis
DoE Joint Genome Institute, Walnut Creek, CA
Sanger Centre, Cambridge
Beijing Genomics Institute, Chinese Academy of Sciences

Page 21

Pictures from JGI

Page 22

Qpix robot – picks colonies

Page 23
Page 24
Page 25

Biomek – PCR / cleanup robot

Page 26
Page 27

PCR – 384 x 4 x 48 x 3

Page 28

About 150 sequencers, at $200,000 each…

Page 29

Sequence analysis

Page 30

Bioinformatics

Armies of programmers and large supercomputers are necessary to assemble and annotate the sequence

Page 31

Assembly and annotation

Assembly – we have to compare those 30,000,000 sequences with each other and work out how they fit together. Nasty mathematical problem…

Annotation – when we have the sequence, we have to work out where the genes are and what they do. Mostly a computational problem – very large databases.
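To give a feel for the assembly problem, here is a toy greedy assembler that repeatedly merges the two reads with the longest suffix-to-prefix overlap. It is a sketch only: it assumes error-free reads that all come from the same strand, it compares every pair of reads (hopeless for 30,000,000 of them), and it is easily fooled by repeats. It is not how any production assembler works, and the function names are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more: the rest stay separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

reads = ["TATAACCC", "AACCCCTGT", "GGATATAA"]
print(greedy_assemble(reads))
```

The pairwise comparison is the expensive part: the number of read pairs grows with the square of the number of reads, which is one reason large genomes need serious computing.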

Page 32

Whole-genome resequencing

Wouldn’t it be great to have the whole genome of each line you work with? Then the whole genome would be haplotyped.

Whole plant or metazoan genomes still cost $40-50m

NIH has a target for a human genome to cost $100,000 in 2010

$1,000 in 2020

This is likely to be achieved ahead of schedule

Human resequencing technology is likely to have a big impact on plant biology also.

Page 33

Cost of sequencing is falling exponentially

[Figure: cost per base ($), log scale from 0.001 to 10, by year, 1994 to 2006]

Page 34

Robotic 96 capillary machine: ABI 3730 xl

Page 35

DNA sequence output from ABI 377 (a gel-based sequencer)

1. Trace files (~350KB / run)

2. Analyzed and bases called to create sequence and quality files (~2 KB / run)

3. One run is about 700 base pairs (bp)

4. Typical genome project – soybean – 6M runs so far
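A rough sense of the data volume for a project of this size, using the per-run figures above. This assumes every one of the 6M runs has the file sizes quoted on this slide and uses decimal units (1 TB = 10^9 KB); the function name is hypothetical.

```python
def project_data_volume(runs, trace_kb=350, seq_qual_kb=2, bases_per_run=700):
    """Return (trace files in TB, sequence + quality files in GB, raw gigabases)."""
    trace_tb = runs * trace_kb / 1e9      # KB to TB
    seq_qual_gb = runs * seq_qual_kb / 1e6  # KB to GB
    gigabases = runs * bases_per_run / 1e9
    return trace_tb, seq_qual_gb, gigabases

trace_tb, seq_qual_gb, gigabases = project_data_volume(6_000_000)
print(f"traces: {trace_tb:.1f} TB, sequence+quality: {seq_qual_gb:.0f} GB, "
      f"raw sequence: {gigabases:.1f} Gb")
```

So the traces come to about 2 TB, while the base-called sequence and quality files are closer to 12 GB for roughly 4.2 Gb of raw sequence.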

Page 36

Limits to how cheap sequencing can get using the Sanger method

~700 bases per read

One or two reads per clone

Cost: $2 per read high throughput
Plus costs of clone generation ~$1
Total current lowest cost, ~$5/kb, 0.5c / Q20 base
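The per-kb figure can be roughly reproduced from the per-read costs. This is a back-of-envelope sketch: it assumes the ~$1 clone cost is carried by a single read and that all ~700 bases count, which gives a little over $4/kb. The slide's rounder ~$5/kb and 0.5c per Q20 base would fit if fewer of the bases reach Q20 (about 600 per read), but that is an interpretation, not something the slide states.

```python
def sanger_cost(read_cost=2.0, clone_cost=1.0, usable_bases=700, reads_per_clone=1):
    """Return (dollars per kb, cents per base) for Sanger shotgun reads."""
    dollars_per_read = read_cost + clone_cost / reads_per_clone
    dollars_per_base = dollars_per_read / usable_bases
    return dollars_per_base * 1000, dollars_per_base * 100

per_kb, cents_per_base = sanger_cost()
print(f"${per_kb:.2f}/kb, {cents_per_base:.2f}c per base")

# Same costs, assuming only 600 of the bases are Q20 or better
per_kb_q20, cents_q20 = sanger_cost(usable_bases=600)
print(f"${per_kb_q20:.2f}/kb, {cents_q20:.2f}c per Q20 base")
```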

Extract DNA
Shear
Ligate into library
Pick clones
Grow clones
Extract vector DNA
Sequence using ddNTPs
Read fragments with gel or capillary

Page 37

Next-generation sequencing

A number of proprietary technologies, most based on the manipulation of microbeads and/or nanobeads where sequencing is performed without gels or capillaries

First on the market was a company called 454 (now Roche), which is now on its second generation of instruments.

454 have a major competitor in Solexa (now Illumina)

Recently AB announced its own next-generation platform, SOLiD (AB acquired Agencourt)

Page 38

Next-generation sequencing approach

Extract and shear DNA
Isolate clonal molecules on beads
“polony” amplification
Immobilize on solid support
Fluorescent or luminescent readout in situ

No E. coli

No plasmids

No freezers

No hydras

No gels

No capillaries

Page 39

454 Sequencing technology

Page 40

Picowell (50 µm) technology

Page 41

Sequencing by synthesis using chemiluminescence

GS20:

20Mb of sequence for ~$5,000 in running costs
Quality is similar to early ESTs (97-98% at best)
We have no clone information, so no read pairings

Homopolymer…

Page 42
Page 43

Data output

“flowgram file” – binary SFF format

About 250 MB per run

Similar to trace file – contains luminosity readings for each of 1.6M wells from a photomultiplier, for each of four bases, for each of 42 flow cycles

Processed using on-board FPGA with instrument

Others have tried to improve software, but 454’s is still best all round
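A sketch of how luminosity readings for one well could be turned into bases, and why homopolymers are the weak point. The assumptions here are not from the slide: nucleotides are flowed in the repeating order T, A, C, G, the readings are already normalized so that one incorporated base gives a signal near 1.0, and base calling is plain rounding. The instrument's own software is more involved, and `call_bases` is a name invented for this example.

```python
def call_bases(flow_values, flow_order="TACG"):
    """Convert normalized flow signals for one well into a base sequence.

    Each flow offers a single nucleotide; the signal is taken to be
    proportional to how many copies were incorporated in a row.
    """
    sequence = []
    for i, signal in enumerate(flow_values):
        base = flow_order[i % len(flow_order)]
        count = int(signal + 0.5)  # nearest whole number of incorporations
        sequence.append(base * count)
    return "".join(sequence)

# Two flow cycles (T, A, C, G, T, A, C, G) for one well
flows = [1.02, 0.05, 0.10, 0.97, 0.03, 2.10, 3.60, 0.00]
print(call_bases(flows))

# The same well with a slightly weaker C signal loses one base from the run of Cs
flows[6] = 3.40
print(call_bases(flows))
```

A small change in signal does not swap one base for another; it changes the length of a run, so errors show up as insertions and deletions in homopolymers.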

Page 44

454 “FLX”

Claimed: 100 Mb per run, 200+ base reads

Cost: ~$12,000 / run in reagents & basic maintenance

Ours delivered Tues June 12 – no data yet

Page 45

1Gb of sequence for < $3,000 in running costs

Page 46
Page 47
Page 48

Data output

No access to data yet, reportedly:

A series of huge image files
Each is color
Analysis uses image analysis techniques
Raw data output is ~ 500GB per run
Current customers say compute infrastructure cannot cope
100s of CPU hours to process one run
Raw data currently must be discarded

Page 49

Polony sequencing / ABI SOLiD

George Church’s group invented “polony” method

Since developed by Agencourt

Now bought by ABI

Similar to Solexa – no wells, small beads, 4-color fluorescent detection, about 1 Gb per run, about $3,000 per run

Uses ligation of nucleotide-specific probes rather than reversible terminators
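Putting the running-cost figures from these slides side by side, as cost per megabase. This only rearranges numbers quoted in the lecture (Sanger ~$5/kb, GS20 20 Mb for ~$5,000, FLX 100 Mb claimed for ~$12,000, SOLiD about 1 Gb for about $3,000). It ignores instrument purchase, read length and accuracy, so it is a rough comparison rather than a like-for-like one.

```python
# (megabases per unit of work, running cost in dollars), as quoted on the slides
platforms = {
    "Sanger (capillary)": (1, 5_000),      # ~$5/kb is ~$5,000 per Mb
    "454 GS20": (20, 5_000),
    "454 FLX (claimed)": (100, 12_000),
    "SOLiD (Solexa similar)": (1_000, 3_000),
}

def cost_per_megabase(megabases, dollars):
    """Running cost in dollars per megabase of raw sequence."""
    return dollars / megabases

for name, (megabases, dollars) in platforms.items():
    print(f"{name:24s} ${cost_per_megabase(megabases, dollars):>8,.0f} per Mb")
```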

Page 50
Page 51

Summary of NGS technologies