Post on 22-Dec-2015


Thanks to: Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen

For more info see: arep.med.harvard.edu

DOE Wed 3-Nov-2004 11:30 AM

Analysis & Synthesis of Omes

Systems Biology Loop

Loop: Syntheses & Perturbations; Models; Experimental designs; (Systematic) Data

Topics: Proteasome targeting; Genome engineering; Metabolic optimality; Flux & Competitive growth; DNA & RNA Polony-Seq; Synthetic Biology Tools

DOE Synthetic Genomes: Why?

Cheaper/faster "standard biology", hypothesis testing

Systems Biology: Multiple simultaneous tests

Viruses: Aid strain transfer; generate variants, new haplotypes

Anti-viral vaccines and therapeutics (including variants)

In vitro: Make products toxic in E.coli.

Microbes: Interspecific hybrids (e.g. codon usage)

Structural biology: variants

Rapid vaccine response to engineered bioterrorism.

Cell-mediated immunity + humoral.

Fix mismatch between genome analysis & synthesis

DOE Synthetic Genomes: Why?

In vitro

Microbial & Human Antimutators

Artificial ecosystems (laboratory scales)

Energy aiding pathway improvement

Industrial production: Enzymes, Single-Cell Protein, Protein-drugs

Remediation: Hybrid genomes (opt. codons), combinatorial pathway (Maxygen & Diversa). Xylose & Oil

Pharmaceuticals: Combinatorial syntheses

Nano science: Combinatorial syntheses, Complex nanosystems, more general nanoassembly (in reach of polymerases and ribosome-like factories)

Health research: 10X faster results per current $ (cost/benefit)

Hypothesize & test unknown gene combinations

Synthetic standards (arrays, MS, quantitation, etc.)

Agriculture: salt, cold, drought, pest tolerant hybrid genomes

Motif Co-occurrence, comparative genomics, RNA clusters, and/or ChIP2-location data

P = 10^-6 to 10^-11

Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Synthetic testing of DNA motif combinations

RNA Ratio (motif- to wild type) for each flanking gene:

1.3 | 2.4 (1.3 in argR)
1.1 | 1.3
0.7 | 2.5
0.2 | 1.4
1.4 | 3.5

Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208

Synthetic Genomes & Proteomes. Why?

• Test or engineer cis-DNA/RNA-elements

• Access to any protein (complex) including post-transcriptional modifications

• Affinity agents for the above.

• Mass spectrometry standards, protein design

• Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, SP6, Roche)

Toward these goals design a chassis:

• 115 kbp genome. 150 genes.

• Nearly all 3D structures known.

• Comprehensive functional data.

(PURE) translation utility

Removing tRNA-synthetases, translational release-factors, RNases & proteases

Selection of scFvs specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods. 284:147

Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353

High level cell-free expression & specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568

Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5.

in vitro genetic codes

[Figure: mRNA (5' to 3') translated as fM-T-N-V-E with codons reassigned to mS, yU, and eU; codon table indexed by second base; time course of 3H-E dpm (0 to 3500) versus time (30 to 80 min.)]

Forster, et al. (2003) PNAS 100:6353-7

80% average yield per unnatural coupling.

bK = biotinyllysine, mS = O-methylserine, eU = 2-amino-4-pentenoic acid, yU = 2-amino-4-pentynoic acid
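To get a feel for what an 80% average yield per unnatural coupling implies, here is a minimal sketch. It assumes the couplings are independent so that yields multiply; that assumption, the function name, and the example coupling counts are illustrative and not taken from the slide.

```python
def overall_yield(per_coupling_yield: float, n_couplings: int) -> float:
    """Fraction of full-length product after n couplings.

    Assumes each coupling succeeds independently with the same yield
    (an illustrative simplification, not stated on the slide).
    """
    if not 0.0 <= per_coupling_yield <= 1.0:
        raise ValueError("yield must be between 0 and 1")
    if n_couplings < 0:
        raise ValueError("number of couplings must be non-negative")
    return per_coupling_yield ** n_couplings


if __name__ == "__main__":
    # Hypothetical coupling counts; 3 matches the mS, yU, eU example.
    for n in (1, 3, 5, 10):
        print(n, round(overall_yield(0.80, n), 3))
```

Under this assumption, three unnatural couplings (such as mS, yU, and eU) would leave roughly half of the product full length.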

Mirror world: enzyme, parasite, & predator resistance & access 2^n diastereomers (n chiral atoms)

L-amino acids & D-ribose (rNTPs, dNTPs)

Transition: EF-Tu, peptidyl transferase, DNA-ligase

D-amino acids & L-ribose (rNTPs, dNTPs)

Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125, 6616-7

Escherichia coli | Mycoplasma | 3D structure
Coliphage φ29 DNA polymerase | + | +
Coliphage P1 Cre recombinase | - | +
>Coliphage Lox/Cre recombinase site | - | +
Coliphage T7 RNA polymerase | + | +
>Coliphage T7 RNA polymerase initiation site | + | +
>Coliphage T7 RNA polymerase termination site | + | +
RNase P RNA | + | -
RNase P protein | + | +
>RNase P site/RNA primer for DNA polymerase | + | +
Small subunit 16S ribosomal RNA | + | +
All 21 small subunit ribosomal proteins (1-21) | + except 1, 21 | +
Large subunit 5S ribosomal RNA | + | +
Large subunit 23S ribosomal RNA | + | +
Large subunit 23S rRNA G2445>m2G methylase: unknown | ? | -
Large subunit 23S rRNA U2449>dihydroU synthetase: unknown | ? | -
Large subunit 23S rRNA U2457>pseudoU synthetase | ? | -
Large subunit 23S rRNA C2498>Cm methylase: unknown | ? | -
Large subunit 23S rRNA A2503>m2A methylase: unknown | ? | -
Large subunit 23S rRNA U2504>pseudoU synthetase | ? | -
All 33 large subunit ribosomal proteins (1-7, 9-11, 13-25, 27-36) | + except 25, 30 | +
Translational initiation factor 1 | + | +
Translational initiation factor 2 | + | +
Translational initiation factor 3 | + | +
Translational elongation factor Tu | + | +
Translational elongation factor Ts | + | +
Translational elongation factor G | + | +
Translational release factor 1 | + | +
Translational release factor 2 | - | +
Translational release factor Gln methylase | + | +
Translational release factor 3 | - | +
Ribosome recycling factor | + | +
33/45 Transfer RNAs (see Fig. 2) | 29/33 | +
tRNA(I) C34>lysidine synthetase | ? | +
tRNA(R) A34>I deaminase | ? | +
tRNA(ASV) U34>cmo5U (=V) synthetase: unknown | - | -
tRNA(R) U34>2sU Cys desulfurase | - | +
tRNA(R) nm5U34 methylase | ? | +
tRNA(R) U34>cmnm5U GTPase | ? | +
tRNA(R) U34>cmnm5U synthetase | ? | +
tRNA(R) cmnm5U34>nm5U,mnm5U synthetase | ? | -
tRNA(R) G37 N1-methylase | + | +
tRNA(RNIKM) A37>t6A N6-threonylcarbamoyl-A synthetase: unknown | + | -
tRNA(CLFSWY) A37>i6A synthetase | - | +
tRNA(CLFSWY) i6A37>s2i6A(ms2i6A) synthetase | - | +
All 22 aminoacyl-tRNA synthetase subunits (20 enzymes) | + except G subunit, Q | + except G subunit
Met-tRNA formyltransferase | + | +
Chaperonin DnaK | + | +
Chaperonin GroEL | + | +
Chaperonin GroES | + | +

Total genes = 150

Forster & Church

Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome, respectively)

Up to 760K Oligos/Chip

18 Mbp for $700 raw (6-18K genes)

<1K | Oxamer | Electrolytic acid/base
8K | Atactic/Xeotron/Invitrogen | Photo-Generated Acid | Sheng, Zhou, Gulari, Gao (U. Houston)
24K | Agilent | Ink-jet standard reagents
48K | Febit
100K | Metrigen
380K | Nimblegen | Photolabile 5' protection | Nuwaysir, Smith, Albert

Tian, Gong, Church
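The "18 Mbp for $700 raw (6-18K genes)" figure can be unpacked with a short calculation. This is only a sketch: the 1 kbp and 3 kbp gene lengths are assumptions chosen because they reproduce the 6-18K range, and the function names are illustrative.

```python
CHIP_COST_USD = 700.0          # raw chip cost quoted on the slide
CHIP_CAPACITY_BP = 18_000_000  # 18 Mbp of raw oligo sequence per chip


def cost_per_bp(cost_usd: float, capacity_bp: int) -> float:
    """Raw synthesis cost per base pair."""
    return cost_usd / capacity_bp


def genes_per_chip(capacity_bp: int, gene_length_bp: int) -> int:
    """How many genes of a given length fit in the raw capacity."""
    return capacity_bp // gene_length_bp


if __name__ == "__main__":
    print(f"${cost_per_bp(CHIP_COST_USD, CHIP_CAPACITY_BP):.2e} per bp")
    # Assumed gene lengths (not from the slide): 1 kbp and 3 kbp.
    for gene_length in (1_000, 3_000):
        print(gene_length, genes_per_chip(CHIP_CAPACITY_BP, gene_length))
```

These are raw figures only; they ignore redundancy, error correction, and assembly losses.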

Improve DNA Synthesis Cost

Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) & bimolecular kinetics slow with the square of the concentration decrease!

Solution: Amplify the oligos then release them.
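The kinetics argument can be made concrete with a small sketch. It assumes a simple second-order rate law (rate proportional to the product of the two reactant concentrations) and that both reactants are diluted by the same factor in the same volume; the function name is illustrative.

```python
def relative_bimolecular_rate(molecules_low: float, molecules_high: float) -> float:
    """Rate at the low amount relative to the rate at the high amount.

    Assumes rate = k * [A] * [B] with both reactants scaled by the same
    factor in a fixed volume, so the rate scales with the square of
    that factor.
    """
    if molecules_low <= 0 or molecules_high <= 0:
        raise ValueError("molecule counts must be positive")
    fold = molecules_low / molecules_high
    return fold ** 2


if __name__ == "__main__":
    # Chip-scale amount (1e6 molecules) versus the usual 1e12.
    print(relative_bimolecular_rate(1e6, 1e12))
```

A million-fold drop in amount therefore costs about twelve orders of magnitude in bimolecular rate, which is why amplifying the oligos first is attractive.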

10 + 50 + 10 => ss-70-mer (chip)

20-mer PCR primers with restriction sites at the 50-mer junctions => ds-90-mer

=> ds-50-mer

Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
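The fragment lengths in this amplify-then-release scheme follow from simple arithmetic. The sketch below assumes that each 20-mer primer anneals to a 10-nt flank and overhangs it by the remaining 10 nt, which is one reading that reproduces the 90-mer; the slide does not state this explicitly, and all names are illustrative.

```python
FLANK_NT = 10    # flanking sequence on each side of the chip oligo
PAYLOAD_NT = 50  # the 50-mer that is ultimately released
PRIMER_NT = 20   # PCR primer length


def chip_oligo_length(flank: int, payload: int) -> int:
    """Single-stranded oligo as synthesized on the chip."""
    return flank + payload + flank


def amplicon_length(flank: int, payload: int, primer: int) -> int:
    """Double-stranded PCR product.

    Assumption: each primer covers one flank and adds (primer - flank)
    extra bases beyond the end of the chip oligo.
    """
    overhang = primer - flank
    return chip_oligo_length(flank, payload) + 2 * overhang


def released_length(payload: int) -> int:
    """Product after cutting at the restriction sites at the 50-mer junctions."""
    return payload


if __name__ == "__main__":
    print(chip_oligo_length(FLANK_NT, PAYLOAD_NT))           # 70
    print(amplicon_length(FLANK_NT, PAYLOAD_NT, PRIMER_NT))  # 90
    print(released_length(PAYLOAD_NT))                       # 50
```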

Improve DNA Synthesis Accuracy via mismatch selection

Tian & Church

Other mismatch methods: MutS (&H,L)

Genome assembly

Moving forward:

1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)

2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates

3. >30 kbp homologous (Nick Reppas)

4. Phage integrase site-specific recombination, also for counters.

Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.

50, 75, 125, 225, 425, 825, … 100*2^(n-1)
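The listed fragment sizes are consistent with pairwise joining in which two equal-length pieces share a fixed overlap. The sketch below assumes a 25-nt overlap, which is inferred from the numbers rather than stated on the slide; with that assumption each round gives 2L - 25, approaching the doubling described by 100*2^(n-1).

```python
def assembly_lengths(start_nt: int = 50, overlap_nt: int = 25, rounds: int = 5) -> list[int]:
    """Fragment length after each round of pairwise overlap assembly.

    Assumption: every round joins two fragments of equal length that
    overlap by `overlap_nt` bases, so the new length is 2 * L - overlap.
    """
    lengths = [start_nt]
    for _ in range(rounds):
        lengths.append(2 * lengths[-1] - overlap_nt)
    return lengths


if __name__ == "__main__":
    print(assembly_lengths())  # [50, 75, 125, 225, 425, 825]
```

Under the same assumption the exact closed form is 25 * (2^k + 1) after k rounds, so the constant 25-nt term becomes negligible as fragments grow.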

All 30S-Ribosomal-protein DNAs (codon re-optimized)

[Gel figure: size markers at 1.7 kb and 0.3 kb]

Tian, Gong, Sheng, Zhou, Gulari, Gao, Church

Improving synthesis accuracy 9-fold

Method | Total bp | # Clones | Transition | Transversion | Deletion | Addition | Bp/error
Hyb selection, PCR | 23641 | 9 | 7 | 3 | 5 | 2 | 1391
Gel selection, PCR | 24546 | 35 | 28 | 12 | 11 | 3 | 455
No selection, ligation+PCR | 6093 | 25 | 6 | 6 | 22 | 4 | 160
No selection, PCR | 9243 | 21 | 25 | 13 | 19 | 1 | 159

Tian & Church
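The Bp/error column and the headline 9-fold figure can be recomputed from the counts in the table. This sketch assumes that the error total for each method is the sum of the transition, transversion, deletion, and addition counts; that reading reproduces the tabulated Bp/error values.

```python
# method -> (total bp sequenced, (transitions, transversions, deletions, additions))
ROWS = {
    "Hyb selection, PCR": (23641, (7, 3, 5, 2)),
    "Gel selection, PCR": (24546, (28, 12, 11, 3)),
    "No selection, ligation+PCR": (6093, (6, 6, 22, 4)),
    "No selection, PCR": (9243, (25, 13, 19, 1)),
}


def bp_per_error(total_bp: int, error_counts: tuple[int, ...]) -> float:
    """Average number of sequenced base pairs per observed error."""
    return total_bp / sum(error_counts)


if __name__ == "__main__":
    for method, (total_bp, counts) in ROWS.items():
        print(f"{method}: {bp_per_error(total_bp, counts):.0f} bp/error")
    best = bp_per_error(*ROWS["Hyb selection, PCR"])
    baseline = bp_per_error(*ROWS["No selection, PCR"])
    print(f"improvement: {best / baseline:.1f}-fold")
```

The ratio of hybridization selection to unselected PCR comes out a little under 9, matching the rounded figure in the slide title.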

Extreme mRNA makeover for protein expression in vitro

RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.

RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.

Solution: Iteratively resynthesize all mRNAs with less mRNA structure.

Tian & Church

[Figure: Western blot based on His-tags; lanes 20w, 20m, 17w, 17m, 16w, 16m (w: wild-type, m: modified); 10 kd marker]

Systems Biology Loop


Why sequence?

• Cancer: mutation sets for individual clones, loss-of-heterozygosity

• Pathogen "weather map", biowarfare sensors

• RNA splicing & chromatin modification patterns.

• Synthetic biology & lab selections

• Antibodies or "aptamers" for any protein

• B & T-cell receptor diversity: Temporal profiling, clinical

• Preventative medicine & genotype–phenotype associations

• Cell-lineage during development

• Phylogenetic footprinting, biodiversity

Shendure et al. 2004 Nature Rev Gen 5, 335.

Sequencing single molecules

Ecosystem studies really need single-cell amplification because of multiple chromosomes (& RNAs)

(Even an 80% genome coverage is better than 100 kb BACs)

Single bacterial chromosome amplification

Ratio to unamplified hybridization along the chromosome of Escherichia & Prochlorococcus on Affymetrix chips.

Convergence on non-electrophoretic tag sequencing methods?

Method | EST | SAGE | MPSS | 454 | Polony-Seq
Tag | >400 | 14-26 | 20 | 100 | 26 bp (2-ends)

• Single-molecule vs. amplified single molecule.

• Array vs. bead packing vs. random

• Rapid scans vs. long scans (chemically limited, 454)

• Number of immobilized primers:

0: Chetverin'97 "Molecular Colonies"

1: Mitra'99 > Agencourt "Bead Polonies"

2: Kawashima'88, Adams'97 > Lynx/Solexa: "Clusters"

http://arep.med.harvard.edu/Polonator/Plone.htm

Polony Fluorescent In Situ Sequencing Libraries

Greg Porreca, Abraham Rosenbaum

[Figure: library construction from 1 to 100 kb genomic fragments; tags L and R flanking M; PCR bead, selector bead, sequencing primers]

2x20bp after MmeI (BceAI, AcuI)

Dressman et al PNAS 2003 emulsion

Cleavable dNTP-Fluorophore (& terminators)

Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65

Reduce or photo-cleave

0.5% of full gel area

Polony-FISSeq: up to 2 billion beads/slide

Cy5 primer (570nm); Cy3 dNTP (666nm)

Polony FISSeq Stats

• # of bases sequenced (total) 23,703,953

• # bases sequenced (unique) 73

• Avg fold coverage 324,711 X

• Pixels used per bead (analysis) ~3.6

• Read Length per primer 14-15 bp

• Insertions 0.5%

• Deletions 0.7%

• Substitutions (raw) 4e-5

• Throughput: 360,000 bp/min

Current capillary sequencing 1400 bp/min (600X speed/cost ratio, ~$5K/1X)

(This may omit: PCR, homopolymer, context errors)

Jay Shendure
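Two of the quoted numbers can be cross-checked directly from the others. This is a sketch with illustrative function names; it covers only fold coverage and the raw throughput ratio, and does not attempt to reproduce the 600X speed/cost figure, which also depends on cost data not given here.

```python
TOTAL_BASES = 23_703_953       # bases sequenced (total)
UNIQUE_BASES = 73              # bases sequenced (unique)
POLONY_BP_PER_MIN = 360_000    # Polony FISSeq throughput
CAPILLARY_BP_PER_MIN = 1_400   # capillary sequencing throughput


def fold_coverage(total_bases: int, unique_bases: int) -> float:
    """Average number of times each unique base was read."""
    return total_bases / unique_bases


def throughput_ratio(fast_bp_per_min: float, slow_bp_per_min: float) -> float:
    """Raw speed ratio between two sequencing methods."""
    return fast_bp_per_min / slow_bp_per_min


if __name__ == "__main__":
    print(f"coverage: {fold_coverage(TOTAL_BASES, UNIQUE_BASES):,.0f} X")
    print(f"speed ratio: {throughput_ratio(POLONY_BP_PER_MIN, CAPILLARY_BP_PER_MIN):.0f} X")
```

The coverage agrees with the listed 324,711 X once truncated, while the raw speed ratio is about 257X, so the remaining factor in the 600X figure presumably reflects cost.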

Systems Biology Loop


High accuracy special case: homopolymers (e.g. AAA, CC, etc.)

• Use "compressed" tags, ACG = ACCG = ACCCG

• Quantitate incorporation

• Reversible terminators

• "Wobble sequencing"

All of these work.
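The "compressed" tag idea, in which ACG, ACCG, and ACCCG are treated as the same tag, amounts to collapsing each homopolymer run to a single base. A minimal sketch follows, with an illustrative function name.

```python
from itertools import groupby


def compress_homopolymers(sequence: str) -> str:
    """Collapse every run of identical bases to a single base."""
    return "".join(base for base, _run in groupby(sequence))


if __name__ == "__main__":
    for tag in ("ACG", "ACCG", "ACCCG", "AAACCGTT"):
        print(tag, "->", compress_homopolymers(tag))
```

Compressed tags give up run-length information in exchange for robustness, so two loci that differ only in homopolymer length become indistinguishable.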

• Maintenance of amplification fidelity using linear amplification from initial genomic fragment

"Wobble sequencing" for homopolymers

6 positions * 16 primers * 4 dNTPs => 13 bp (paired ends)

CCTCATTCTCT AA + dATP (then C, …)

CCTCATTCTCT AC + dATP (then C, …)

. . .

CCTCATTCTCTnnAA + dATP (then C, …)

. . .

CCTCATTCTCTnnNNnnNNnnTT + dATP (then C, …)

4.5/64 bp/cycle (for wobble sequencing) vs. 2.5/4 bp/cycle (for simple sequential base-extension)
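The primer set for wobble sequencing can be enumerated to check the 6 * 16 * 4 count. This sketch takes the anchor sequence from the slide and assumes that each successive position inserts one more degenerate pair (written nn or NN, alternating as in the slide notation) before the interrogating dinucleotide; the function name is illustrative.

```python
from itertools import product

ANCHOR = "CCTCATTCTCT"  # anchor sequence shown on the slide
BASES = "ACGT"


def wobble_primers(anchor: str = ANCHOR, positions: int = 6) -> list[str]:
    """All anchor + degenerate-spacer + dinucleotide primers.

    Assumption: position p (0-based) carries p degenerate pairs,
    alternating 'nn' and 'NN' purely to mirror the slide's notation.
    """
    primers = []
    for position in range(positions):
        spacer = "".join("nn" if i % 2 == 0 else "NN" for i in range(position))
        for dinucleotide in product(BASES, repeat=2):
            primers.append(anchor + spacer + "".join(dinucleotide))
    return primers


if __name__ == "__main__":
    primers = wobble_primers()
    print(len(primers), "primers")               # 6 positions * 16 dinucleotides
    print(len(primers) * len(BASES), "cycles")   # each primer tried with 4 dNTPs
    print(primers[0], primers[-1])
```

The deepest primer spans 12 bases past the anchor, and the single extended base brings the read to 13 bp, consistent with the figure on the slide.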
