Page 1:

Thanks to: Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen

For more info see: arep.med.harvard.edu

DOE Wed 3-Nov-2004 11:30 AM

Analysis & Synthesis of Omes

Page 2:

Systems Biology Loop

[Diagram: a loop connecting Models, Experimental designs, Syntheses & Perturbations, and (Systematic) Data. Labeled topics: Proteasome targeting; Genome engineering; Metabolic optimality; Flux & Competitive growth; DNA & RNA Polony-Seq; Synthetic Biology Tools.]

Page 3:

DOE Synthetic Genomes: Why?

Cheaper/faster "standard biology", hypothesis testing

Systems Biology: Multiple simultaneous tests

Viruses: Aid strain transfer; generate variants, new haplotypes

Anti-viral vaccines and therapeutics (including variants)

In vitro: Make products that are toxic in E. coli.

Microbes: Interspecific hybrids (e.g. codon usage)

Structural biology: variants

Rapid vaccine response to engineered bioterrorism.

Cell-mediated immunity + humoral.

Fix mismatch between genome analysis & synthesis

Page 4:

DOE Synthetic Genomes: Why?

In vitro
Microbial & Human Antimutators
Artificial ecosystems (laboratory scales)
Energy aiding pathway improvement
Industrial production: Enzymes, Single-Cell Protein, Protein-drugs
Remediation: Hybrid genomes (opt. codons), combinatorial pathway (Maxygen & Diversa). Xylose & Oil
Pharmaceuticals: Combinatorial syntheses
Nano science: Combinatorial syntheses, Complex nanosystems, more general nanoassembly (in reach of polymerases and ribosome-like factories)
Health research: 10X faster results per current $ (cost/benefit)
Hypothesize & test unknown gene combinations
Synthetic standards (arrays, MS, quantitation, etc)
Agriculture: salt, cold, drought, pest tolerant hybrid genomes

Page 5:

Motif Co-occurrence, comparative genomics, RNA clusters, and/or ChIP2-location data

P = 10^-6 to 10^-11

Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Page 6:

Synthetic testing of DNA motif combinations

RNA ratio (motif- to wild type) for each flanking gene:

1.3   2.4 (1.3 in argR)
1.1   1.3
0.7   2.5
0.2   1.4
1.4   3.5

Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Page 7:

Synthetic Genomes & Proteomes. Why?

• Test or engineer cis-DNA/RNA-elements
• Access to any protein (complex) including post-transcriptional modifications
• Affinity agents for the above.
• Mass spectrometry standards, protein design
• Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, SP6, Roche)

Toward these goals design a chassis:
• 115 kbp genome. 150 genes.
• Nearly all 3D structures known.
• Comprehensive functional data.

Page 8:

(PURE) translation utility

Removing tRNA-synthetases, translational release-factors, RNases & proteases

Selection of scFvs specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods. 284:147

Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353

High level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568

Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5.

Page 9:

in vitro genetic codes

[Figure: an mRNA (5' to 3') whose codons are read as fM, T, N, V, E and the unnatural amino acids mS, yU, eU; a codon table indexed by 5', second, and 3' base with mS, yU, eU assigned; and a plot of 3H-E dpm (0 to 3500) against time (30 to 80 min) for fM yU mS eU E.]

80% average yield per unnatural coupling.

bK = biotinyllysine, mS = O-methylserine, eU = 2-amino-4-pentenoic acid, yU = 2-amino-4-pentynoic acid

Forster, et al. (2003) PNAS 100:6353-7

Page 10:

Mirror world: enzyme, parasite, & predator resistance & access to 2^n diastereomers (n chiral atoms)

L-amino acids & D-ribose (rNTPs, dNTPs)

Transition: EF-Tu, peptidyl transferase, DNA-ligase

D-amino acids & L-ribose (rNTPs, dNTPs)

Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125, 6616-7

Page 11:

Escherichia coli | Mycoplasma | 3D structure
Coliphage φ29 DNA polymerase | + | +
Coliphage P1 Cre recombinase | - | +
> Coliphage Lox/Cre recombinase site | - | +
Coliphage T7 RNA polymerase | + | +
> Coliphage T7 RNA polymerase initiation site | + | +
> Coliphage T7 RNA polymerase termination site | + | +
RNase P RNA | + | -
RNase P protein | + | +
> RNase P site/RNA primer for DNA polymerase | + | +
Small subunit 16S ribosomal RNA | + | +
All 21 small subunit ribosomal proteins (1-21) | + except 1, 21 | +
Large subunit 5S ribosomal RNA | + | +
Large subunit 23S ribosomal RNA | + | +
Large subunit 23S rRNA G2445>m2G methylase: unknown | ? | -
Large subunit 23S rRNA U2449>dihydroU synthetase: unknown | ? | -
Large subunit 23S rRNA U2457>pseudoU synthetase | ? | -
Large subunit 23S rRNA C2498>Cm methylase: unknown | ? | -
Large subunit 23S rRNA A2503>m2A methylase: unknown | ? | -
Large subunit 23S rRNA U2504>pseudoU synthetase | ? | -
All 33 large subunit ribosomal proteins (1-7, 9-11, 13-25, 27-36) | + except 25, 30 | +
Translational initiation factor 1 | + | +
Translational initiation factor 2 | + | +
Translational initiation factor 3 | + | +
Translational elongation factor Tu | + | +
Translational elongation factor Ts | + | +
Translational elongation factor G | + | +
Translational release factor 1 | + | +
Translational release factor 2 | - | +
Translational release factor Gln methylase | + | +
Translational release factor 3 | - | +
Ribosome recycling factor | + | +
33/45 Transfer RNAs (see Fig. 2) | 29/33 | +
tRNA(I) C34>lysidine synthetase | ? | +
tRNA(R) A34>I deaminase | ? | +
tRNA(ASV) U34>cmo5U (=V) synthetase: unknown | - | -
tRNA(R) U34>2sU Cys desulfurase | - | +
tRNA(R) nm5U34 methylase | ? | +
tRNA(R) U34>cmnm5U GTPase | ? | +
tRNA(R) U34>cmnm5U synthetase | ? | +
tRNA(R) cmnm5U34>nm5U,mnm5U synthetase | ? | -
tRNA(R) G37 N1-methylase | + | +
tRNA(RNIKM) A37>t6A N6-threonylcarbamoyl-A synthetase: unknown | + | -
tRNA(CLFSWY) A37>i6A synthetase | - | +
tRNA(CLFSWY) i6A37>s2i6A(ms2i6A) synthetase | - | +
All 22 aminoacyl-tRNA synthetase subunits (20 enzymes) | + except G subunit, Q | + except G subunit
Met-tRNA formyltransferase | + | +
Chaperonin DnaK | + | +
Chaperonin GroEL | + | +
Chaperonin GroES | + | +

Total genes = 150

Forster & Church

Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome, respectively)

Page 12:

Up to 760K Oligos/Chip

18 Mbp for $700 raw (6-18K genes)

Oligos/chip | Source | Chemistry | Credits
<1K | Oxamer | Electrolytic acid/base |
8K | Atactic/Xeotron/Invitrogen | Photo-Generated Acid | Sheng, Zhou, Gulari, Gao (U.Houston)
24K | Agilent | Ink-jet standard reagents |
48K | Febit | |
100K | Metrigen | |
380K | Nimblegen | Photolabile 5' protection | Nuwaysir, Smith, Albert

Tian, Gong, Church

Page 13:

Improve DNA Synthesis Cost

Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!

Solution: Amplify the oligos then release them.

10 + 50 + 10 => ss-70-mer (chip)

20-mer PCR primers with restriction sites at the 50-mer junctions => ds-90-mer

Cut at the restriction sites => ds-50-mer

Tian, Gong, Sheng, Zhou, Gulari, Gao, Church

Page 14:

Improve DNA Synthesis Accuracy via mismatch selection

Tian & Church

Other mismatch methods: MutS (&H,L)

Page 15:

Genome assembly

Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates
3. >30 kbp homologous (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters.

Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.

50, 75, 125, 225, 425, 825, … 100*2^(n-1)
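
One simple reading of the length series on this slide is pairwise joining of equal-length fragments that share a fixed overlap. The sketch below assumes a 25-base overlap, which is inferred from the numbers rather than stated on the slide, and counts n from the 75-mer when comparing with the slide's 100*2^(n-1) rule of thumb (also an assumed indexing).

```python
def assembly_lengths(start: int = 50, overlap: int = 25, rounds: int = 5) -> list[int]:
    """Fragment length after each round of pairwise overlap assembly.

    Joining two fragments of length L that share `overlap` bases gives a
    product of length 2 * L - overlap. The 25-base default is an inference
    from the slide's numbers, not a stated parameter.
    """
    lengths = [start]
    for _ in range(rounds):
        lengths.append(2 * lengths[-1] - overlap)
    return lengths


lengths = assembly_lengths()
print(lengths)

# Compare with 100 * 2^(n-1), taking n = 0 at the 75-mer (assumed indexing).
for n, exact in enumerate(lengths[1:]):
    rough = int(100 * 2 ** (n - 1))
    print(f"n={n}: exact {exact}, rule of thumb {rough}, difference {exact - rough}")
```

With this indexing the rule of thumb is always 25 bases short of the exact length, so it becomes a better approximation as the fragments grow.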

Page 16:

All 30S-Ribosomal-protein DNAs (codon re-optimized)

[Gel image; size markers at 1.7 kb and 0.3 kb.]

Tian, Gong, Sheng, Zhou, Gulari, Gao, Church

Page 17:

Improving synthesis accuracy 9-fold

Method | Total bp | # Clones | Transition | Transversion | Deletion | Addition | Bp/error
Hyb selection, PCR | 23641 | 9 | 7 | 3 | 5 | 2 | 1391
Gel selection, PCR | 24546 | 35 | 28 | 12 | 11 | 3 | 455
No selection, ligation+PCR | 6093 | 25 | 6 | 6 | 22 | 4 | 160
No selection, PCR | 9243 | 21 | 25 | 13 | 19 | 1 | 159

Tian & Church
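
The Bp/error column and the roughly 9-fold headline can be rechecked from the raw counts. This is a small sketch using only the numbers in the table; it assumes Bp/error is total bp divided by the sum of the four error classes, which matches the tabulated values after rounding.

```python
# Rows from the slide: (method, total bp, transitions, transversions, deletions, additions)
rows = [
    ("Hyb selection, PCR", 23641, 7, 3, 5, 2),
    ("Gel selection, PCR", 24546, 28, 12, 11, 3),
    ("No selection, ligation+PCR", 6093, 6, 6, 22, 4),
    ("No selection, PCR", 9243, 25, 13, 19, 1),
]


def bp_per_error(total_bp: int, *error_counts: int) -> float:
    """Sequenced base pairs per observed error, summing all error classes."""
    return total_bp / sum(error_counts)


results = {name: bp_per_error(total, *errors) for name, total, *errors in rows}
for name, value in results.items():
    print(f"{name}: {value:.0f} bp/error")

# Improvement of hybridization selection over unselected PCR assembly.
fold = results["Hyb selection, PCR"] / results["No selection, PCR"]
print(f"improvement: {fold:.1f}-fold")
```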

Page 18:

Extreme mRNA makeover for protein expression in vitro

RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.

RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.

Solution: Iteratively resynthesize all mRNAs with less mRNA structure.

Tian & Church

[Western blot based on His-tags. Lanes: 20w, 20m, 17w, 17m, 16w, 16m (w: wild-type, m: modified); 10 kd marker.]

Page 19:

Systems Biology Loop

[Diagram repeated from Page 2: a loop connecting Models, Experimental designs, Syntheses & Perturbations, and (Systematic) Data.]

Page 20:

Why sequence?

• Cancer: mutation sets for individual clones, loss-of-heterozygosity
• Pathogen "weather map", biowarfare sensors
• RNA splicing & chromatin modification patterns.
• Synthetic biology & lab selections
• Antibodies or "aptamers" for any protein
• B & T-cell receptor diversity: Temporal profiling, clinical
• Preventative medicine & genotype–phenotype associations
• Cell-lineage during development
• Phylogenetic footprinting, biodiversity

Shendure et al. 2004 Nature Rev Gen 5, 335.

Page 21:

Sequencing single molecules

Ecosystem studies really need single-cell amplification because of multiple chromosomes (& RNAs)

(Even an 80% genome coverage is better than 100 kb BACs)

Page 22:

Single bacterial chromosome amplification

Ratio to unamplified hybridization along the chromosome of Escherichia & Prochlorococcus on Affymetrix chips.

Page 23:

Convergence on non-electrophoretic tag sequencing methods?

Method | Tag length (bp)
EST | >400
SAGE | 14-26
MPSS | 20
454 | 100
Polony-Seq | 26 (2-ends)

• Single-molecule vs. amplified single molecule.
• Array vs. bead packing vs. random
• Rapid scans vs. long scans (chemically limited, 454)
• Number of immobilized primers:
  0: Chetverin'97 "Molecular Colonies"
  1: Mitra'99 > Agencourt "Bead Polonies"
  2: Kawashima'88, Adams'97 > Lynx/Solexa: "Clusters"

http://arep.med.harvard.edu/Polonator/Plone.htm

Page 24:

Polony Fluorescent In Situ Sequencing Libraries

Greg Porreca, Abraham Rosenbaum

[Diagram: 1 to 100 kb genomic fragments; library constructs labeled M, L, R; PCR bead; sequencing primers; selector bead.]

2x20bp after MmeI (BceAI, AcuI)

Dressman et al PNAS 2003 emulsion

Page 25:

Cleavable dNTP-Fluorophore (& terminators)

Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65

Reduce or photo-cleave

Page 26:

Polony-FISSeq: up to 2 billion beads/slide

0.5% of full gel area

Page 27:

Polony-FISSeq: up to 2 billion beads/slide

Cy5 primer (570nm); Cy3 dNTP (666nm)

Jay Shendure

Page 28:

Polony FISSeq Stats

• # of bases sequenced (total) 23,703,953
• # bases sequenced (unique) 73
• Avg fold coverage 324,711 X
• Pixels used per bead (analysis) ~3.6
• Read Length per primer 14-15 bp
• Insertions 0.5%
• Deletions 0.7%
• Substitutions (raw) 4e-5
• Throughput: 360,000 bp/min

Current capillary sequencing 1400 bp/min (600X speed/cost ratio, ~$5K/1X)

(This may omit: PCR, homopolymer, context errors)

Shendure
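
Two of the figures on this slide follow directly from the others, which the sketch below rechecks. It uses only numbers from the slide; the 600X speed/cost ratio also depends on cost data that the slide does not give, so it is not reproduced here.

```python
total_bases = 23_703_953       # bases sequenced (total), from the slide
unique_bases = 73              # bases sequenced (unique), from the slide
polony_bp_per_min = 360_000    # Polony FISSeq throughput, from the slide
capillary_bp_per_min = 1_400   # capillary sequencing throughput, from the slide

# Average fold coverage is total sequenced bases over unique template bases.
fold_coverage = total_bases / unique_bases

# Raw throughput ratio only; this ignores cost, unlike the slide's 600X figure.
raw_speedup = polony_bp_per_min / capillary_bp_per_min

print(f"average fold coverage: {int(fold_coverage):,} X")
print(f"raw throughput ratio: {raw_speedup:.0f}X")
```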

Page 29:

Systems Biology Loop

[Diagram repeated from Page 2: a loop connecting Models, Experimental designs, Syntheses & Perturbations, and (Systematic) Data.]

Page 31:

High accuracy special case: homopolymers (e.g. AAA, CC, etc.)

• Use "compressed" tags, ACG = ACCG = ACCCG
• Quantitate incorporation
• Reversible terminators
• "Wobble sequencing"

All of these work.

• Maintenance of amplification fidelity using linear amplification from initial genomic fragment
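
The "compressed" tag idea from this slide (treating ACG, ACCG, and ACCCG as the same tag) amounts to collapsing each homopolymer run to a single base. The sketch below is illustrative only; the function names are hypothetical and not taken from any sequencing pipeline.

```python
from itertools import groupby


def compress_tag(seq: str) -> str:
    """Collapse each homopolymer run to one base, e.g. ACCCG -> ACG."""
    return "".join(base for base, _run in groupby(seq))


def run_lengths(seq: str) -> list[tuple[str, int]]:
    """Return (base, run length) pairs, the information a compressed tag discards."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


for tag in ("ACG", "ACCG", "ACCCG"):
    print(tag, "->", compress_tag(tag), run_lengths(tag))
```

Compressed tags sidestep homopolymer length errors at the price of discarding run lengths, which the other listed options (quantitating incorporation, reversible terminators, wobble sequencing) aim to recover.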

Page 32:

"Wobble sequencing" for homopolymers

6 positions * 16 primers * 4 dNTPs => 13 bp (paired ends)

CCTCATTCTCT AA + dATP (then C, …)
CCTCATTCTCT AC + dATP (then C, …)
. . .
CCTCATTCTCTnnAA + dATP (then C, …)
. . .
CCTCATTCTCTnnNNnnNNnnTT + dATP (then C, …)

4.5/64 bp/cycle (for wobble sequencing) vs. 2.5/4 bp/cycle (for simple sequential base-extension)
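
The primer set implied by "6 positions * 16 primers * 4 dNTPs" can be enumerated as a check on the counts. This is a sketch under stated assumptions: the six positions are taken to be 0, 2, 4, 6, 8, and 10 degenerate bases after the anchor (matching the examples shown), and the alternating nn/NN case on the slide is treated as visual grouping only.

```python
from itertools import product

ANCHOR = "CCTCATTCTCT"        # anchor sequence shown on the slide
OFFSETS = range(0, 12, 2)     # 0, 2, ..., 10 degenerate bases (assumed spacing)
DNTPS = "ACGT"


def wobble_primers(anchor: str = ANCHOR) -> list[str]:
    """Anchor + degenerate spacer + each of the 16 terminal dinucleotides."""
    primers = []
    for offset in OFFSETS:
        for dinucleotide in product("ACGT", repeat=2):
            primers.append(anchor + "n" * offset + "".join(dinucleotide))
    return primers


primers = wobble_primers()
# One extension reaction per primer per dNTP.
extensions = len(primers) * len(DNTPS)

print(len(primers), "primers;", extensions, "primer/dNTP extensions")
print(primers[0], "...", primers[-1])
```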