Polymerase colonies & Fisseq
Polymerase colonies & Fisseq. 1-Apr-2003, Santa Fe. Thanks to: DOE HGP & GtL & DARPA BioComp. Wash U: Rob Mitra. HMS: Jay Shendure, Jun Zhu, Vincent Butty, Ben Williams. U. Del: Jeremy Edwards, Josh Merritt. Ambergen: Jerzy Olejnik.
Thanks to: DOE HGP & GtL & DARPA BioComp
Wash U: Rob Mitra
HMS: Jay Shendure, Jun Zhu, Vincent Butty,
Ben Williams.
U. Del: Jeremy Edwards, Josh Merritt
Ambergen: Jerzy Olejnik
1-Apr-2003 Santa Fe
Polymerase colonies & Fisseq
gggatttagctcagttgggagagcgccagactgaa gatttg gaggtcctgtgttcgatccacagaattcgcacca
Modeling successes:
3D & Sequence alignment
DNA RNA Proteins
Metabolites
Replication rate
Environment
Biosystems Integrating Measures & Models
Microbes
Cancer & stem cells
Darwinian optima
In vitro replication
Small multicellular organisms
RNAi
Insertions
SNPs
interactions
Improving Models & Measures
Why model?
“Killer Applications”: Share, Search, Merge, Check, Design
The issue is not speed, but integration.
Cost per 99.99% bp: including reagents, personnel, equipment/5 yr, overhead/sq. m
• Sub-mm scale: 1 µm = femtoliter (10^-15)
• Instruments should match GHz / $2K CPU
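As a quick check of the scale bullet, the sketch below assumes it refers to a cube 1 µm on a side and converts that volume to liters. The function name is illustrative and not from the slides.

```python
def cube_volume_liters(edge_m):
    """Volume in liters of a cube with the given edge length in meters."""
    return edge_m ** 3 * 1000.0  # 1 cubic meter = 1000 liters


# Assumption: the bullet describes a cube 1 micrometer (1e-6 m) on a side.
one_micron_cube = cube_volume_liters(1e-6)
print(one_micron_cube)  # about 1e-15 L, i.e. one femtoliter
```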
Why improve measurements?
Human genomes (6 billion)^2 = 10^19 bp
Immune & cancer genome changes >10^10 bp per time point
RNA ends & splicing: in situ 10^12 bits/mm^3
Biodiversity: Environmental & lab evolution
Compact storage 10^5 now to 10^17 bits/mm^3 eventually
& How? ($1K per genome, 10^8-10^13 bits/$)
Examples of cost bottlenecks
Affymetrix $30M? microfabricator limited by chemical reaction rate to one set of chips per day.
Electrophoresis limited to 4000 bp/capillary/day. Fix cost ratio of capillaries to CPUs.
Projected costs determine when biosystems data overdetermination is feasible.
In 1984, pre-HGP (X, pBR322, etc.): 0.1 bp/$, would have been $30B per human genome.
In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M
10^3 bp/$ (4 log improvement)
Other data I/O (e.g. video): 10^13 bits/$
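The cost figures on this slide follow from dividing genome size by throughput per dollar. The sketch below reproduces that arithmetic, assuming a human genome of about 3 billion bp (a figure not stated on the slide); the function names are illustrative.

```python
import math

HUMAN_GENOME_BP = 3e9  # assumption: roughly 3 billion bp per human genome


def cost_per_genome(bp_per_dollar, genome_bp=HUMAN_GENOME_BP):
    """Dollars needed to sequence genome_bp bases at a given bp/$ rate."""
    return genome_bp / bp_per_dollar


def log_improvement(old_bp_per_dollar, new_bp_per_dollar):
    """Improvement in bp/$ expressed in powers of ten."""
    return math.log10(new_bp_per_dollar / old_bp_per_dollar)


print(cost_per_genome(0.1))       # 1984 rate: on the order of $30B
print(cost_per_genome(1e3))       # 2002 resequencing rate: on the order of $3M
print(log_improvement(0.1, 1e3))  # about 4 logs
```

Under the same assumption, the $300M de novo figure corresponds to roughly 10 bp/$.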
Steeper than exponential growth
[Plot: bp/$ (log scale, 0.001 to 10000) vs. year, 1970 to 2010]
[Plot: log(IPS/$K) and log(bits/sec transmit) (scale -5 to 15) vs. year, 1830 to 2010]
Fit annotations: R2 = 0.985; R2 = 0.992
Kurzweil/Moore's law of ICs 1965
http://www.faughnan.com/poverty.html
http://www.kurzweilai.net/meme/frame.html?main=/articles/art0184.html
New sequencing approaches in commercial R&D
Method            liter/bp  Length  Error   Test-set  $/device  bp/hr
Capil fluidics    e-6       600     <0.1%   1e11      350k      80k
  ABI, Amersham, GenoMEMS, Caliper*, RTS*
SeqByHyb          e-12      1       <5%     1e9       200k      1M
  Perlegen-Affymetrix*, Xeotron*
Mass Spectrometry
  Sequenom, Bruker*
Single molecule   >e-24     >>40    ?       >80       30k-1M    180k
  Pore (Agilent*), Fluor (USGenomics, Solexa), FRET (VisiGen, Mobious, Caltech)
In vitro DNA-Amplification (e.g. Polonies) -- Multiplex cycles:
Lynx*             e-15      20      <3%     1e7       ?         1M
Pyroseq.*         e-6       >40     <1%     1e6       100k      5k
HMS*              e-13              <1%     40        90k       >1M?
  ParAllele, 454, RTS*
*Church lab involvement
Why single molecules?
(1) Integration from cells/genomes/RNAs to data
(2) Geometry, “cis-ness” on a molecule, complex, or cell.
e.g. DNA Haplotypes & RNA splice-forms
(3) Asynchronous dNTP incorporation
Polymerase colonies (polonies) along a DNA or RNA molecule
[Diagram labels: A, A', B; Single Molecule From Library; 1st Round of PCR; Primer is Extended by Polymerase]
Polymerase colony (polony) PCR in a gel
Primer A has 5’ immobilizing Acrydite
Mitra & Church Nucleic Acids Res. 27: e34
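• Hybridize Universal Primer
• Add Red (Cy3) dTTP. Wash.
• Add Green (FITC) dCTP
• Wash; Scan
[Diagram: two primed templates, each labeled B, B', 3', 5'; first template AGT. with extension TC; second template GCG.. with extension C]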
Sequence polonies by sequential, fluorescent single-base extensions
Inexpensive, off-the-shelf equipment
MJR in situ Cycler: $10K
Automated slide fluidics: $4K
Microarray Scanner: $26K-100K
Slide fluidics automation #1: The Grunt
Human Haplotype: CFTR gene
45 kbp
Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams
Quantitative removal of Fluorophores
Rob Mitra
Template ST30: 3' TCACGAGT
Base added:
  (C)  A   G   T  (C)
  (A)  G  (T)  C  (A)
  (G)  T   C   A
3' TCACGAGT
5' AGTGCTCA
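The base-addition pattern on this slide can be reproduced with a small simulation. This is a minimal sketch, not the lab's software: it assumes nucleotides are dispensed cyclically in the order C, A, G, T, that parentheses on the slide mark additions with no incorporation, and that an unterminated nucleotide extends through a whole homopolymer run in one addition. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_extension(template_3to5, dispense_order="CAGT", max_additions=40):
    """Simulate sequential single-nucleotide additions on a primed template.

    template_3to5 lists template bases in the order the polymerase copies
    them (3' to 5'). Returns a list of (base_added, bases_incorporated).
    """
    needed = [COMPLEMENT[b] for b in template_3to5]
    pos = 0
    log = []
    for i in range(max_additions):
        if pos >= len(needed):
            break
        base = dispense_order[i % len(dispense_order)]
        incorporated = 0
        # Assumption: no terminators, so a run of identical bases is
        # filled in completely by a single addition.
        while pos < len(needed) and needed[pos] == base:
            pos += 1
            incorporated += 1
        log.append((base, incorporated))
    return log


def format_log(log):
    """Render additions in the slide's style: (X) means no incorporation."""
    return " ".join(b if n else "(" + b + ")" for b, n in log)


def read_sequence(log):
    """Sequence of the extended strand, 5' to 3'."""
    return "".join(b * n for b, n in log)


log = simulate_extension("TCACGAGT")
print(format_log(log))
print(read_sequence(log))
```

For template ST30 the simulated pattern matches the slide's three rows read left to right, and the recovered read is the complementary strand AGTGCTCA.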
Sequencing multiple polonies
Rob Mitra
Multiple Image Alignment
Metric based on optimal coincidence of high intensity noise pixels over a matrix of local offsets (0.4 pixel precision)
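A simplified sketch of a coincidence metric of this kind follows. It is an illustration under stated assumptions rather than the method used for these images: it searches a single global integer offset instead of a matrix of local offsets, so it does not reach sub-pixel precision, and the threshold, search range, and all function names are hypothetical.

```python
def bright_pixels(image, threshold):
    """Coordinates (row, col) of pixels brighter than threshold."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value > threshold}


def best_offset(reference, moving, threshold, max_shift=3):
    """Integer (d_row, d_col) shift of `moving` that maximizes the number of
    bright pixels coinciding with bright pixels in `reference`."""
    ref = bright_pixels(reference, threshold)
    mov = bright_pixels(moving, threshold)
    best, best_score = (0, 0), -1
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            score = sum((r + dr, c + dc) in ref for r, c in mov)
            if score > best_score:
                best, best_score = (dr, dc), score
    return best, best_score


def make_image(size, bright, background=1, signal=100):
    """Toy square image with a few bright pixels on a flat background."""
    image = [[background] * size for _ in range(size)]
    for r, c in bright:
        image[r][c] = signal
    return image


# Toy data: the second image is the first displaced by one row and two columns.
reference = make_image(10, [(3, 4), (5, 6), (7, 3), (2, 7)])
moving = make_image(10, [(2, 6), (4, 8), (6, 5), (1, 9)])
offset, score = best_offset(reference, moving, threshold=50)
print(offset, score)
```

Using isolated bright noise pixels as landmarks makes the score sharply peaked at the correct offset, since unrelated pixels rarely coincide by chance.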
Polony exclusion principle & Single pixel sequences
Mitra & Shendure
DNA RNA Proteins
Metabolites
Replication rate
Environment
Biosystems Integrating Measures & Models
Microbes
Cancer & stem cells
Darwinian optima
In vitro replication
Small multicellular organisms
RNAi
Insertions
SNPs
interactions
Alternatively Spliced Cell Adhesion Molecule
Specific variable exons are up-or-down-regulated in various cancers
Controversial prospective diagnostic / prognostic marker (>1000 papers)
Can full isoforms resolve controversy and/or act as superior markers?
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
[Diagram labels: CD44; F, R; v1 v2 v3 v4 v5 v6 v7 v8 v9 v10; TMA]
CD44 Exon Combinatorics (Zhu & Shendure)
1. Search Signature Image for qualified ‘objects’
   a. > 50 connected pixels with same signature value
   b. ‘solidity’ of > 0.50
   c. long axis / short axis ratio < 3
   OR
   a. > 25 connected pixels with same signature value
   b. ‘solidity’ of > 0.80
   c. long axis / short axis ratio < 1.5
2. Search for internal regional maxima within each object (lest two adjacent polonies with same signature get counted as one)
3. Assign centroid locations as qualified individual ‘polonies’
Trial & Error Derived Algorithm for Polony Finding
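The qualification rule in step 1 can be written as a single predicate. The sketch below assumes the object's pixel count, solidity, and axis ratio have already been measured by some image-analysis step; the function and parameter names are hypothetical, and steps 2 and 3 (regional maxima and centroids) are not covered.

```python
def qualifies_as_object(n_pixels, solidity, axis_ratio):
    """Step 1 of the polony-finding rules: accept an object if it passes
    either the large-object criteria or the small-but-compact criteria.

    n_pixels   -- connected pixels sharing one signature value
    solidity   -- object area divided by its convex-hull area (assumed meaning)
    axis_ratio -- long axis divided by short axis
    """
    large = n_pixels > 50 and solidity > 0.50 and axis_ratio < 3
    small_compact = n_pixels > 25 and solidity > 0.80 and axis_ratio < 1.5
    return large or small_compact


# Hypothetical measurements for three candidate objects.
print(qualifies_as_object(60, 0.60, 2.0))   # passes the first rule set
print(qualifies_as_object(30, 0.85, 1.2))   # passes the second rule set
print(qualifies_as_object(30, 0.60, 1.2))   # too small for the first set, not solid enough for the second
```

The second rule set trades size for shape: a smaller object is accepted only if it is rounder and more solid.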
[Figure labels: V1, V2, V3, V4, V5, V6, V7, V8, V9, V10]
Jun Zhu
EXON PATTERN           Eph4  Eph4bDD  TOTAL   RATIO  P-V
------------7-8-9-10    609      764   1373    1.17  1E-4
--------------8-9-10    320      390    710    1.13  3E-2
----------6-7-8-9-10    431      251    682   -1.85  4E-18
------4-5-6-7-8-9-10    218      216    434   -1.08  2E-1
----------------9-10     68      143    211    1.96  7E-7
--------5-6-7-8-9-10     86       39    125   -2.37  2E-6
----3-4-5-6-7-8-9-10     40       56     96    1.30  9E-2
------4-5---7-8-9-10     16       74     90    4.30  2E-9
--2-3-4-5-6-7-8-9-10     44       28     72   -1.69  1E-2
1-2-3-4-5-6-7-8-9-10     22        5     27   -4.73  3E-4
--------5---7-8-9-10      5       19     24    3.53  3E-3
----3-4-5---7-8-9-10      1       15     16   13.95  4E-4
--2-3-4-5---7-8-9-10      1       10     11    9.30  5E-3
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
Summary of Counts (isoforms)
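The slide does not say how the ratio column was computed. One reading consistent with the numbers is a signed fold change between the two cell lines after normalizing each count by its library total; the sketch below implements that assumption. Totals are taken from the 13 listed patterns only, so results land close to the slide's values without always matching them (the first row comes out near 1.16 rather than 1.17). Names are hypothetical, and the p-value column is not reproduced.

```python
# (Eph4 count, Eph4bDD count) for the 13 exon patterns listed on the slide.
COUNTS = [
    (609, 764), (320, 390), (431, 251), (218, 216), (68, 143),
    (86, 39), (40, 56), (16, 74), (44, 28), (22, 5),
    (5, 19), (1, 15), (1, 10),
]


def signed_fold(count_a, count_b, total_a, total_b):
    """Fold change of B relative to A after normalizing by library totals.

    Positive when the isoform is relatively more frequent in B,
    negative (as -A/B) when it is relatively more frequent in A.
    Assumes both counts are nonzero.
    """
    freq_a = count_a / total_a
    freq_b = count_b / total_b
    if freq_b >= freq_a:
        return freq_b / freq_a
    return -(freq_a / freq_b)


total_eph4 = sum(a for a, _ in COUNTS)
total_bdd = sum(b for _, b in COUNTS)

for a, b in COUNTS[:3]:
    print(a, b, round(signed_fold(a, b, total_eph4, total_bdd), 2))
```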
1. Replica Plating of DNA images [Mitra et al. NAR 1999]
2. Long Range Haplotyping [Mitra et al. PNAS 2003]
3. Allelic mRNA Quantitation (HEP) [Mitra et al. in prep]
4. Alternative Splicing Combinatorics [Zhu et al. 2003]
5. Precise SNP-mutant & mRNA ratios [Merritt et al. 2003]
6. Fluorescent in situ Sequencing (FISSEQ) [Mitra et al. 2003]
7. Multiplex Genotyping [ApoE, Hyman, Shendure & Williams]
8. In situ / single-cell extensions of the above [Zhu & Williams]
Polony Flavors
1. Scale up slide making
2. Anchor points in long DNA (mini-Tn vs tagged-random primers)
3. Runs
   a. Signature
   b. Quantitate
   c. Terminators
Next steps
Long-range continuity inspired by DNA-Fiber Fluorescent In Situ Hybridization
300 kb = 100 microns
http://allserv.rug.ac.be/~fspelema/neubla/content/images_r.htm
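The "300 kb = 100 microns" scale can be checked from the length of stretched DNA. The sketch assumes fully extended B-form DNA with a rise of about 0.34 nm per base pair, a value not given on the slide; the function name is illustrative.

```python
RISE_PER_BP_NM = 0.34  # assumption: B-form DNA rise per base pair, in nm


def contour_length_um(n_bp, rise_nm=RISE_PER_BP_NM):
    """Contour length in micrometers of a linear DNA of n_bp base pairs."""
    return n_bp * rise_nm / 1000.0  # 1000 nm per micrometer


print(contour_length_um(300_000))  # roughly 100 micrometers
```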
129 bp mini Tn5