Adleman and computing on a surface
1 Introduction
2 Theoretical background Biochemistry/molecular biology
3 Theoretical background computer science
4 History of the field
5 Splicing systems
6 P systems
7 Hairpins
8 Detection techniques
9 Micro technology introduction
10 Microchips and fluidics
11 Self assembly
12 Regulatory networks
13 Molecular motors
14 DNA nanowires
15 Protein computers
16 DNA computing - summary
17 Presentation of essay and discussion
Course outline
Who’s who?
Tom Head
http://www.math.binghamton.edu/tom/
Areas of interest
Algebra
Computing with biomolecules
Formal representations of communication
Department of Mathematical Sciences
Binghamton University
http://www.usc.edu/dept/molecular-science/fm-adleman.htm
Areas of interest
Method for Obtaining Digital Signatures and Public-Key Cryptosystems
Distinguishing Prime Numbers From Composite Numbers
The First Case of Fermat's Last Theorem
Primality Testing And Two Dimensional Abelian Varieties Over Finite Fields
Molecular Computation of Solutions To Combinatorial Problems
Leonard Adleman
Turing Award 2002
Department of Computer Science
Theoretical Computer Science, College of
Computing, Georgia Tech
Richard Lipton
http://www.cc.gatech.edu/computing/Theory/theory.html
Areas of interest
Algorithms and Complexity Theory
Cryptography
DNA Computing
Laura Landweber
http://www.princeton.edu/~lfl/
Areas of interest
Origins of Genes, Genomes, the Genetic Code
Early Pathways of RNA Evolution
Scrambled Genes
RNA Editing
Gene Scrambling
DNA Computing
Dept. of Ecology and Evolutionary Biology
Princeton University
John Reif
http://www.cs.duke.edu/~reif/
Computer Science, Duke University
Areas of interest
DNA nanostructures
Molecular Computation
Efficient Algorithms
Parallel Computation
Robotic Motion Planning
Optical Computing
Erik Winfree
http://www.dna.caltech.edu/~winfree/
Computer Science; Computation and Neural Systems, Caltech
Areas of interest
DNA-based computers
Computing by self-assembly
Genetic Regulatory Networks
Signal Transduction Cascades
Ribosomal Translation
DNA and RNA folding
MacArthur Fellow 2000
Nadrian Seeman
Department of Chemistry
New York University
Areas of interest
DNA Nanotechnology
Macromolecular Design and Topology
Biophysical Chemistry of Recombinational Intermediates
DNA-Based Computation
Crystallography
http://www.nyu.edu/pages/chemistry/faculty/seeman.html
Robert Corn
http://corninfo.chem.wisc.edu/
Chemistry Department
University of Wisconsin
Areas of interest
Surface plasmon resonance (SPR) to monitor biopolymer adsorption
Chemical modification of surfaces
Characterization of molecular monolayers
Electron transfer processes at liquid/liquid electrochemical interfaces
DNA computing algorithms at surfaces
Multilayer polyelectrolyte films for ion transport applications
Hagiya Masami
http://hagi.is.s.u-tokyo.ac.jp
Department of Computer Science,
University of Tokyo
Areas of interest
Automated Deduction
Formal Verification and Programming Languages
Bio-Computing
Hybrid Systems...
Akira Suyama
http://talent.c.u-tokyo.ac.jp/suyama/
Graduate School of Arts and Sciences,
University of Tokyo
Areas of interest
SNPs
Probe design
DNA chips
Quantitative gene expression
Hybrid Systems...
John Rose
Areas of interest
The DNA chip, especially Tag-Antitag Systems
Whiplash PCR, a simple autonomous DNA computer
Equilibrium chemistry/statistical thermodynamic model
http://hagi.is.s.u-tokyo.ac.jp/~johnrose/
Department of Computer Science,
University of Tokyo
Gheorghe Păun
Areas of interest
Formal language theory (and applications)
Combinatorics on words
Semiotics
Operational research
DNA Computing
Membrane Computing
http://stoilow.imar.ro/~gpaun/
Institute of Mathematics of
the Romanian Academy
Grzegorz Rozenberg
http://www.wi.leidenuniv.nl/~rozenber/
Institute of Advanced Computer Science
University of Leiden
Areas of interest
Molecular Computing
Evolutionary Algorithms
Neural Networks
Areas of interest
H systems
P systems
Neural Networks
Giancarlo Mauri
http://bioinformatics.bio.disco.unimib.it/
Dipartimento di Informatica,
Sistemistica e Comunicazione (DISCo)
Milano
Ehud Shapiro
Areas of interest
DNA as input fuel
Biological nanocomputer
Turing machine-like model
http://www.weizmann.ac.il/mathusers/lbn/index.html
Computer Science and Applied Mathematics
the Weizmann Institute
Byoung-Tak Zhang
http://scai.snu.ac.kr/~btzhang/
Areas of interest
Evolutionary Intelligence
Neural Intelligence
Molecular Intelligence
Computational Learning Theory
School of Computer Science and Engineering
Seoul National University
Danny van Noort
http://bi.snu.ac.kr/~danny/
Areas of interest
microstructure design and fabrication
DNA hybridisation
instrumentation
fluorescent microscopy
affinity biosensors
protein chips
DNA computing
cell behaviour
School of Computer Science and Engineering
Seoul National University
NP-complete problems
Tractable and intractable problems
NP-complete problems
The theory of NP-completeness
Classify problems as tractable or
intractable.
A problem is tractable if there exists at least
one polynomially bounded algorithm that solves
it.
An algorithm is polynomially bounded if its
worst-case growth rate can be bounded by a
polynomial p(n) in the size n of the problem:
$p(n) = a_k n^k + \dots + a_1 n + a_0$, where $k$ is a constant
Classifying problems
• A problem is intractable if it is not tractable.
• No algorithm that solves the problem is polynomially bounded.
• Every algorithm that solves it has a worst-case growth rate f(n) which cannot be bounded by a polynomial p(n) in the size n of the problem.
• For intractable problems the bounds are, for example:
$f(n) = c^n$ (with $c > 1$) or $f(n) = n^{\log n}$, etc.
Intractable problems
There are many practical problems for which
no one has yet found a polynomially bounded
algorithm.
Examples: traveling salesperson, 0/1
knapsack, graph coloring, bin packing, etc.
Most design automation problems, such as
testing and routing.
Many network, database and graph problems.
Hard practical problems
The theory of NP-completeness makes it possible
to show that these problems are at least as
hard as the NP-complete problems.
The practical implication of knowing that a problem is
NP-complete is that it is probably
intractable (whether it is or not has not
been proved yet).
So any algorithm that solves it will
probably be very slow for large inputs.
The theory of NP-completeness
A decision problem answers yes or no for a
given input.
Examples:
Given a graph G, is there a path from s to t
of length at most k?
Does graph G contain a Hamiltonian cycle?
Given a graph G, is it bipartite?
Decision problems
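The first example, "is there a path from s to t of length at most k?", can be answered in polynomial time. The sketch below is minimal and rests on two assumptions: "length" means the number of edges in an unweighted graph, and the technique is breadth-first search, which the slide does not name. The function `has_short_path` and the graph `g` are hypothetical.

```python
from collections import deque

def has_short_path(graph, s, t, k):
    """Decision problem: is there a path from s to t with at most k edges?

    graph maps each vertex to a list of its neighbours. Breadth-first
    search visits vertices in order of distance from s, so it runs in
    O(V + E) time and the problem is polynomially bounded.
    """
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u] <= k  # dist[t] is the fewest-edge distance
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return False  # t is unreachable from s

# Hypothetical undirected graph stored as adjacency lists.
g = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s"], "t": ["a"]}
print(has_short_path(g, "s", "t", 2))  # True: s-a-t uses 2 edges
print(has_short_path(g, "s", "t", 1))  # False
```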
A Hamiltonian cycle of a graph G is a
cycle that includes each vertex of the
graph exactly once.
Problem: Given a graph G, does G have
a Hamiltonian cycle?
Decision problem: Hamiltonian cycle
P is the class of decision problems that
can be solved by a polynomially bounded algorithm.
Is the following problem in P?
Given a weighted graph G, is there a
spanning tree of weight at most B?
The decision versions of problems such as
shortest distance and minimum spanning
tree belong to P.
The class P
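The answer to the question above is yes. One way to see this is to compute a minimum spanning tree in polynomial time and compare its weight with B. The sketch uses Kruskal's algorithm with union-find, which runs in O(E log E). That choice is ours; the slide names no algorithm. The function name and the `(weight, u, v)` edge format are illustrative assumptions.

```python
def has_spanning_tree_within(n, edges, bound):
    """Decision version of minimum spanning tree.

    n: number of vertices, labelled 0..n-1
    edges: list of (weight, u, v) tuples of an undirected weighted graph
    bound: the threshold B
    Returns True if some spanning tree has total weight <= bound.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for w, u, v in sorted(edges):  # Kruskal: consider cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it
            parent[ru] = rv
            total += w
            used += 1
    if used != n - 1:
        return False  # graph is disconnected, so no spanning tree exists
    return total <= bound

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (10, 0, 3), (4, 0, 2)]
print(has_spanning_tree_within(4, edges, 6))  # True (MST weight is 1 + 2 + 3)
print(has_spanning_tree_within(4, edges, 5))  # False
```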
NP is the class of decision problems for
which there is a polynomially bounded
verification algorithm.
It can be shown that
all decision problems in P, and
decision problems such as traveling
salesman, knapsack and bin packing, are also in
NP.
The class NP
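A verification algorithm receives the input together with a proposed certificate and checks it. Below is a sketch for the Hamiltonian cycle problem, where the certificate is an ordering of the vertices. The check clearly runs in polynomial time, even though finding such an ordering may not. The function name and the representation of edges as `frozenset` pairs are assumptions made for illustration.

```python
def verify_hamiltonian_cycle(vertices, edges, certificate):
    """Polynomial-time verifier for the Hamiltonian cycle problem.

    vertices: iterable of vertex names
    edges: set of frozenset({u, v}) pairs (undirected graph)
    certificate: proposed visiting order, e.g. ["a", "b", "c", "d"]
    """
    vertex_set = set(vertices)
    # Every vertex must appear exactly once in the certificate.
    if len(certificate) != len(vertex_set) or set(certificate) != vertex_set:
        return False
    # Consecutive vertices, wrapping back to the start, must be adjacent.
    n = len(certificate)
    return all(
        frozenset((certificate[i], certificate[(i + 1) % n])) in edges
        for i in range(n)
    )

# A square a-b-c-d-a.
square = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}
print(verify_hamiltonian_cycle("abcd", square, ["a", "b", "c", "d"]))  # True
print(verify_hamiltonian_cycle("abcd", square, ["a", "c", "b", "d"]))  # False
```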
$P \subseteq NP$
If a problem is solvable in polynomial
time, a polynomial time verification
algorithm can easily be designed that
ignores the certificate and answers “yes”
for all inputs with the answer “yes”.
The relation between P and NP
It is not known whether P = NP.
Problems in P can be solved “quickly”.
Problems in NP can be verified “quickly”.
Verifying a solution appears to be easier than
solving the problem.
Most researchers believe that P and NP
are not the same class.
The relation between P and NP
A problem A is NP-complete if
1. it is in NP, and
2. for every other problem A’ in NP, $A' \le_p A$ (A’ is polynomial-time reducible to A).
A problem A is NP-hard if
for every other problem A’ in NP, $A' \le_p A$.
NP-complete problems
Cook’s theorem
Satisfiability is NP-complete
This was the first problem shown to be NP-complete
Other problems
the decision version of knapsack,
the decision version of traveling salesman
Examples of NP-complete problems
Satisfiability problem
First, Conjunctive Normal Form (CNF)
will be defined
Then, the Satisfiability problem will
be defined
The satisfiability problem
A logical (Boolean) variable is a variable
that may be assigned the value true or false
(x, y, w and z are Boolean variables)
A literal is a logical variable or the
negation of a logical variable ($x$ and $\lnot y$ are literals)
A clause is a disjunction of literals
($(w \lor x \lor y)$ and $(\lnot x \lor y)$ are clauses)
Conjunctive normal form (CNF)
A logical (Boolean) expression is in
Conjunctive Normal Form if it is a
conjunction of clauses.
The following expression is in
conjunctive normal form:
$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$
Conjunctive normal form (CNF)
Is there a truth assignment to the n
variables of a logical expression in
Conjunctive Normal Form which makes the
value of the expression true?
For the answer to be yes, all clauses
must evaluate to true
Otherwise the answer is no
The satisfiability problem
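The most direct way to decide satisfiability is to try all $2^n$ truth assignments. That takes exponential time in n, which is exactly why SAT is hard. The sketch below does this for the four-variable example. The negation signs in the printed formula were lost in extraction, so the formula here is a reconstruction: (w ∨ x ∨ y) ∧ (w ∨ ¬y ∨ z) ∧ (¬x ∨ y) ∧ (¬w ∨ ¬y). This reading agrees with the truth assignment x = F, y = F, w = T, z = T given for it. The names `FORMULA`, `satisfies` and `all_solutions` are hypothetical.

```python
from itertools import product

# Each clause is a list of literals; a literal is (variable, value that makes it true).
FORMULA = [
    [("w", True), ("x", True), ("y", True)],    # (w or x or y)
    [("w", True), ("y", False), ("z", True)],   # (w or not y or z)
    [("x", False), ("y", True)],                # (not x or y)
    [("w", False), ("y", False)],               # (not w or not y)
]
VARIABLES = ["w", "x", "y", "z"]

def satisfies(assignment, formula):
    """A CNF formula is true iff every clause has at least one true literal."""
    return all(any(assignment[var] == value for var, value in clause)
               for clause in formula)

def all_solutions(formula, variables):
    """Brute force over all 2**n truth assignments."""
    solutions = []
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if satisfies(assignment, formula):
            solutions.append(assignment)
    return solutions

for s in all_solutions(FORMULA, VARIABLES):
    print("".join("1" if s[v] else "0" for v in VARIABLES))  # bits in order w x y z
```

The loop prints 0011, 0111, 1000 and 1001 (bits in the order w, x, y, z).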
x=F, y=F, w=T and z=T is a truth
assignment for:
$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$
Note that if y = F then $\lnot y$ = T
Each clause evaluates to true
The satisfiability problem
Adleman’s experiment
The 1994 experiment
DNA computer
The 1994 experiment
Basic Idea
Perform a molecular biology experiment
to find the solution to a mathematical problem.
The 1994 experiment
(Named after William Rowan Hamilton)
Given a network of nodes and directed
connections between them, is there a path
through the network that begins with the start
node and ends with the end node, visiting
each node exactly once (a “Hamiltonian path”)?
Does a Hamiltonian path exist or not?
Hamiltonian path
[Figure: network of Atlanta, Boston, Chicago and Detroit with a start city and an end city]
Hamiltonian path does exist
[Figure: the same four cities with a different set of connections]
Hamiltonian path does not exist
Generation-&-Test Algorithm
Step 1 Generate random paths on the network.
Step 2 Keep only those paths that begin with
start city and conclude with end city.
Step 3 If there are N cities, keep only those
paths of length N.
Step 4 Keep only those that enter all cities at
least once.
Step 5 Any remaining paths are solutions (i.e.,
Hamiltonian paths).
Solving the Hamiltonian problem
[X] D -> B -> A
[X] B -> C -> D -> B -> A -> B
[X] A -> B -> C -> B
[X] C -> D -> B -> A
[X] A -> B -> A -> D
[O] A -> B -> C -> D
[X] A -> B -> A -> B -> C -> D
The paths
Solving the Hamiltonian problem
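The five Generation-&-Test steps can be sketched in software as a chain of filters. This sketch makes two assumptions. First, the network is reconstructed from the hops that appear in the example paths above (A->B, B->A, B->C, C->B, C->D, D->B, A->D), with A as the start city and D as the end city. Second, Step 1's random generation is replaced by enumerating every walk up to a fixed length, so the result is deterministic. All names are hypothetical.

```python
# Hypothetical network rebuilt from the example paths; A = start, D = end.
EDGES = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
         ("C", "D"), ("D", "B"), ("A", "D")}
CITIES = {"A", "B", "C", "D"}

def all_walks(edges, max_nodes):
    """Step 1 stand-in: every walk visiting up to max_nodes cities."""
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    walks = [[c] for c in nodes]
    result = list(walks)
    for _ in range(max_nodes - 1):
        # Extend each walk along every edge leaving its last city.
        walks = [w + [v] for w in walks for (u, v) in sorted(edges) if u == w[-1]]
        result.extend(walks)
    return result

def generate_and_test(edges, cities, start, end, max_nodes=8):
    n = len(cities)
    paths = all_walks(edges, max_nodes)                            # Step 1
    paths = [p for p in paths if p[0] == start and p[-1] == end]  # Step 2
    paths = [p for p in paths if len(p) == n]                     # Step 3
    paths = [p for p in paths if set(p) == set(cities)]           # Step 4
    return paths                                                  # Step 5

print(generate_and_test(EDGES, CITIES, "A", "D"))  # [['A', 'B', 'C', 'D']]
```

Once Step 3 has fixed the length at N, Step 4's "all cities at least once" condition is the same as "each city exactly once".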
The total number of paths grows exponentially
as the network size increases:
(e.g.) $10^{6}$ paths for N = 10 cities, $10^{12}$ paths
(N = 20), $10^{100}$ paths!! (N = 100)
The Generation-&-Test algorithm takes
“forever”. Some sort of smart algorithm must be
devised, but no polynomial-time algorithm has been found so far (the problem is NP-complete).
Combinatorial explosion
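The exact number of paths depends on how densely the network is connected. As a rough sketch, assume a fully connected network. Then every ordering of the N cities is a candidate, which gives N! orderings: about $3.6 \times 10^{6}$ for N = 10 and about $2.4 \times 10^{18}$ for N = 20. The function name `orderings` is hypothetical.

```python
import math

def orderings(n):
    """Candidate visiting orders of n cities in a fully connected network: n!."""
    return math.factorial(n)

for n in (10, 20, 100):
    # Print the order of magnitude of n!.
    print(n, f"about 10^{math.log10(orderings(n)):.1f}")
```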
The key to solving the problem is using DNA to
perform the five steps of the Generation-&-
Test algorithm as a parallel search instead of a
serial search.
Finding a solution with DNA
Protein that produces a complementary DNA strand
A -> T, T -> A, C -> G, G -> C
Requires a primer (starter) and a template strand
Enables DNA to reproduce
Intermezzo: DNA polymerase
The bio-nanomachine
hops onto DNA strand
slides along
reads each base
writes its complement
onto new strand
Intermezzo: DNA polymerase
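The base-pairing rule that the polymerase applies (A -> T, T -> A, C -> G, G -> C) can be written as a lookup table. This sketch gives two variants. The first is the position-by-position complement, which is the convention used in the "COMPLEMENT" column of the city-coding table. The second is the reverse complement, which is the new strand read in its own 5' to 3' direction. The function names are hypothetical.

```python
# Watson-Crick base pairing: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base complement, read in the same direction as the input."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Complement read in the opposite direction (the new strand's own 5' to 3' order)."""
    return complement(strand)[::-1]

print(complement("ACTTGCAG"))          # TGAACGTC
print(reverse_complement("ACTTGCAG"))  # CTGCAAGT
```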
Ingredients and tools needed
DNA strands that encode city names and
connections between them
Polymerases, ligase, water, salt, other
ingredients
Polymerase chain reaction (PCR) set
Gel electrophoresis tool (separates strands by length,
so that non-solution strands can be filtered out)
Experimental set-up
Gel electrophoresis
[Figure: map of Atlanta, Boston, Chicago and Detroit with the start city and end city marked]
Solving a Hamiltonian path problem
CITY      DNA NAME   COMPLEMENT
ATLANTA   ACTTGCAG   TGAACGTC
BOSTON    TCGGACTG   AGCCTGAC
CHICAGO   GGCTATGT   CCGATACA
DETROIT   CCGAGCAA   GGCTCGTT
City coding
[Figure: the Atlanta and Boston complements (TGAACGTC, AGCCTGAC) joined by the Atlanta-Boston edge strand (GCAGTCGG); sequence shown: TGAACGTCAGCCTGACGCAGTCGG]
City coding with DNA
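The sequences in the city table can be checked against the Atlanta-Boston example. This sketch assumes an Adleman-style edge encoding: the edge strand is the second half of the source city's code followed by the first half of the destination's code. With that rule, the edge strand pairs across the junction of the two city complements and holds them together. The rule reproduces the sequence shown for Atlanta-Boston. Like the table, the sketch ignores strand direction (5'/3'). The function names are hypothetical.

```python
# City codes from the table above (8-mers).
CITY = {
    "ATLANTA": "ACTTGCAG",
    "BOSTON": "TCGGACTG",
    "CHICAGO": "GGCTATGT",
    "DETROIT": "CCGAGCAA",
}
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    return "".join(PAIR[b] for b in strand)

def edge_strand(src, dst):
    """Assumed encoding: second half of src's code + first half of dst's code."""
    half = len(CITY[src]) // 2
    return CITY[src][half:] + CITY[dst][:half]

def bridges(src, dst):
    """True if the edge strand is complementary to the junction of the two
    city complements (end of src's complement + start of dst's complement)."""
    half = len(CITY[src]) // 2
    junction = complement(CITY[src])[half:] + complement(CITY[dst])[:half]
    return complement(edge_strand(src, dst)) == junction

ab = edge_strand("ATLANTA", "BOSTON")
print(ab)  # GCAGTCGG
print(complement(CITY["ATLANTA"]) + complement(CITY["BOSTON"]) + ab)
print(bridges("ATLANTA", "BOSTON"))  # True
```

The second line printed is TGAACGTCAGCCTGACGCAGTCGG, the same as the combined sequence in the figure.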
[Figure: network with start and end cities; a path built from the edge strands Atlanta-Boston, Boston-Chicago and Chicago-Detroit and the city complements Atlanta*, Boston*, Chicago* and Detroit*]
Possible paths
[Figure: network with start and end cities; a path built from the edge strands Boston-Atlanta and Atlanta-Detroit and the city complements Boston*, Atlanta* and Detroit*]
Possible paths
In pictures
1. In a test tube, mix the prepared DNA pieces
together (they link randomly with each
other, forming many different paths).
2. Perform PCR with the ‘start’ and ‘end’ DNA
pieces as the two primers (this creates millions of
copies of the DNA strands with the right start
and end).
3. Perform gel electrophoresis to keep only
the strands of the right length (e.g., N = 4 cities).
The DNA experiment
4. Use DNA ‘probe’ molecules to check whether
the remaining paths pass through all intermediate
cities.
5. All DNA pieces that are left in the tube
should be precisely those representing
Hamiltonian paths.
If the tube contains any DNA at all, then
conclude that a Hamiltonian path exists, and
otherwise not.
When it does, the DNA sequence represents
the specific path of the solution.
The DNA experiment
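Each bench step above behaves like a filter on strings. In this sketch, PCR keeps the strands with the right prefix and suffix, the gel keeps the right length, and the probes keep strands that contain every city code. Two assumptions apply. The letters A to D of the example paths are mapped onto Atlanta, Boston, Chicago and Detroit. A ligated path is modeled as a plain concatenation of city codes, leaving out the edge strands. The function names are hypothetical.

```python
# City codes from the city-coding table; mapping letters A-D onto them is an assumption.
CODE = {"A": "ACTTGCAG", "B": "TCGGACTG", "C": "GGCTATGT", "D": "CCGAGCAA"}
WORD = len(CODE["A"])  # 8 bases per city

def to_dna(path):
    """Concatenate city codes, standing in for a ligated path strand."""
    return "".join(CODE[c] for c in path)

def words_of(strand):
    """Split a strand into city-sized words."""
    return [strand[i:i + WORD] for i in range(0, len(strand), WORD)]

def run_experiment(tube, start, end, n_cities):
    # Step 2, PCR: only strands with the start code at the front and the
    # end code at the back are amplified, so keep just those.
    tube = [s for s in tube if s.startswith(CODE[start]) and s.endswith(CODE[end])]
    # Step 3, gel electrophoresis: keep strands of length n_cities * WORD.
    tube = [s for s in tube if len(s) == n_cities * WORD]
    # Step 4, probes: keep strands that contain every city code.
    tube = [s for s in tube if set(CODE.values()) <= set(words_of(s))]
    # Step 5: decode whatever is left back into city letters.
    decode = {v: k for k, v in CODE.items()}
    return ["".join(decode[w] for w in words_of(s)) for s in tube]

# Step 1 stand-in: the example paths listed earlier, written as DNA.
tube = [to_dna(p) for p in ["DBA", "BCDBAB", "ABCB", "CDBA", "ABAD", "ABCD", "ABABCD"]]
print(run_experiment(tube, "A", "D", 4))  # ['ABCD']
```

ABAD survives PCR and the gel but is rejected by the probes because it never visits C. ABABCD is rejected by the gel because it is too long.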
Why does it work?
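Enormous parallelism, with $10^{23}$ DNA pieces
working in parallel to find the solution
simultaneously.
Takes less than a week (vs. thousands of
years for a supercomputer)
Extraordinarily energy efficient
($10^{-10}$ of supercomputer energy use)
Note: the experiment solves one specific problem instance and is not itself a universal Turing machine, although some DNA computing models (e.g., algorithmic self-assembly) have been shown to be Turing-universal.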
Summary and conclusion
Experimental set-up
[Figure sequence: electrophoresis set-up with - and + electrodes and a capture layer (-R or G), shown over several steps, the last one heated (HOT)]
DNA computing on a surface
DNA computing on surfaces
Advantages over “solution phase” chemistry:
Facile purification steps
Reduced interference between strands
Easily automated
Disadvantages:
Loss of information density (2D)
Lower surface hybridization efficiency
Slower surface enzyme kinetics
DNA computing on surfaces
DNA strands representing the set {0,1}^n are
synthesized and subsequently immobilized on
a surface in a non-addressed fashion
DNA surface model: input
A strand is composed of
words. Each word is a
short DNA strand (16-mer)
representing one or more
bits.
[Figure: words 1 to 4 and the bit each encodes, laid out along the strands as 1 2 3 4 1 2 3 4 ...]
Encoding binary information
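Here is one possible reading of "a strand is composed of words", as a minimal sketch. It assumes one word per bit position and two alternative words per position, one for value 0 and one for value 1. The slides use 16-mers; the sequences below are hypothetical 4-mers chosen only so the example stays readable. `WORDS`, `encode` and `decode` are illustrative names.

```python
# Hypothetical code words: WORDS[j][b] is the word for bit position j with value b.
WORDS = [
    ("AACC", "GGTT"),
    ("ACAC", "TGTG"),
    ("CAAG", "GTTC"),
    ("CCAA", "TTGG"),
]

def encode(bits, words=WORDS):
    """Build a strand by concatenating one word per bit."""
    return "".join(words[j][b] for j, b in enumerate(bits))

def decode(strand, words=WORDS):
    """Read the strand back word by word; raises ValueError on an unknown word."""
    size = len(words[0][0])
    return [words[j].index(strand[j * size:(j + 1) * size]) for j in range(len(words))]

s = encode([0, 1, 1, 0])
print(s)          # AACCTGTGGTTCCCAA
print(decode(s))  # [0, 1, 1, 0]
```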
Requirements of a “DNA code”
Success in specific hybridization between a
DNA code word and its Watson-Crick complement
Few false positive signals
Virtually all designs enforce combinatorial
constraints on the code words
Applications:
Information storage and retrieval for DNA
computing
Molecular bar codes for chemical libraries
DNA word design problem
Hamming: distance between two code words
should be large
Reverse complement: distance between a
word and the reverse complement of
another word should be large
Also: frame shift, distinct sub-words,
forbidden sub-words, …
DNA word design problem
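The Hamming and reverse-complement constraints can be enforced with a simple greedy random search. A candidate word is accepted only if it stays at least `min_dist` away from every chosen word and from the reverse complement of every chosen word, including itself. This is a deliberately naive sketch, not one of the published design methods. It ignores frame-shift and sub-word constraints. The function names and parameters are hypothetical.

```python
import random

BASES = "ACGT"
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def revcomp(s):
    return "".join(PAIR[b] for b in reversed(s))

def design_words(count, length, min_dist, seed=0, max_tries=100_000):
    """Greedy random search for DNA code words satisfying:
      - Hamming constraint: hamming(c, w) >= min_dist for chosen words w
      - reverse-complement constraint: hamming(c, revcomp(w)) >= min_dist
    May return fewer than count words if max_tries runs out.
    """
    rng = random.Random(seed)
    words = []
    for _ in range(max_tries):
        if len(words) == count:
            break
        c = "".join(rng.choice(BASES) for _ in range(length))
        if hamming(c, revcomp(c)) < min_dist:
            continue  # too close to its own reverse complement
        if all(hamming(c, w) >= min_dist and hamming(c, revcomp(w)) >= min_dist
               for w in words):
            words.append(c)
    return words

for w in design_words(8, 16, 8):
    print(w)
```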
Seeman (1990): de novo design of sequences
for nucleic acid structural engineering
Brenner (1997): sorting polynucleotides
using DNA tags
Shoemaker et al. (1996): analysis of yeast
deletion mutants using a parallel molecular
bar-coding strategy
Many other examples in DNA computing
Work on DNA code design
Word design example
MARK strands in which bit j = 0 (or 1):
hybridize with the Watson-Crick complements of the
word containing bit j, followed by polymerization
DESTROY unmarked strands:
exonuclease degradation
UNMARK strands:
wash in distilled water
DNA surface model: process
Detect remaining strands (if any) by
detaching strands from surface and
amplifying using PCR (polymerase
chain reaction).
DNA surface model: output
Theorem: Any CNF-SAT formula of size m
can be computed using O(m) mark,
unmark and destroy operations.
Theorem: Any circuit of size m can be
computed using O(m) mark, unmark,
destroy, and append operations.
Computational power
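The first theorem can be illustrated with a toy simulation of the surface. The sketch handles each clause in one cycle: one MARK per literal, then DESTROY, then UNMARK. The operation count is therefore (number of literals) + 2 × (number of clauses), which is linear in the formula size m. It models strands as integers and treats MARK as an exact set operation. It does not model hybridization errors or the append operation. The class and function names are hypothetical. The example formula is the four-variable one, read as (w ∨ x ∨ y) ∧ (w ∨ ¬y ∨ z) ∧ (¬x ∨ y) ∧ (¬w ∨ ¬y).

```python
class Surface:
    """Toy surface: all n-bit strands, plus the subset currently marked."""

    def __init__(self, n_bits):
        self.n = n_bits
        self.strands = set(range(2 ** n_bits))  # {0,1}^n as integers
        self.marked = set()
        self.ops = 0

    def bit(self, strand, j):
        # Bit j counted from the left, so with order w, x, y, z: 0b1000 means w = 1.
        return (strand >> (self.n - 1 - j)) & 1

    def mark(self, j, value):
        """MARK strands whose bit j equals value (hybridization)."""
        self.ops += 1
        self.marked |= {s for s in self.strands if self.bit(s, j) == value}

    def destroy(self):
        """DESTROY unmarked strands (exonuclease degradation)."""
        self.ops += 1
        self.strands &= self.marked

    def unmark(self):
        """UNMARK all strands (wash in distilled water)."""
        self.ops += 1
        self.marked = set()

def solve_cnf(clauses, n_bits):
    """clauses: lists of (bit index, value that makes the literal true)."""
    surface = Surface(n_bits)
    for clause in clauses:
        for j, value in clause:  # one MARK per literal
            surface.mark(j, value)
        surface.destroy()  # strands satisfying no literal of the clause are removed
        surface.unmark()
    return sorted(surface.strands), surface.ops

# Bit indices: w = 0, x = 1, y = 2, z = 3.
clauses = [[(0, 1), (1, 1), (2, 1)],
           [(0, 1), (2, 0), (3, 1)],
           [(1, 0), (2, 1)],
           [(0, 0), (2, 0)]]
print(solve_cnf(clauses, 4))  # ([3, 7, 8, 9], 18)
```

The surviving strands 3, 7, 8 and 9 are 0011, 0111, 1000 and 1001 in the order w, x, y, z. The formula has 10 literals and 4 clauses, so the run uses 10 + 2 × 4 = 18 operations.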
Input: 16 strands
Process:
MARK if bit z = 1, MARK if bit w = 1, MARK if bit y = 0, DESTROY, UNMARK
MARK if bit w = 0, MARK if bit y = 0, DESTROY, UNMARK, …
Output: exactly those strands that satisfy
the circuit remain on the surface.
[Figure: circuit for the formula, built from OR, NOT and AND gates over the inputs w, x, y, z]
The satisfiability problem
$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$
{0000} {0001} {0010} {0011} {0100} {0101}{0110} {0111}{1000} {1001} {1010} {1011} {1100} {1101}{1110} {1111}
4-variable SAT demo
The logic of the DNA computation in each cycle
leads, at the end, to four types of DNA molecules
remaining on the surface.
The identity of the molecules that correspond to
the solutions was determined by PCR.
Solution: S3, S7, S8, S9
4-variable SAT demo
S3: w=0, x=0, y=1, z=1
S7: w=0, x=1, y=1, z=1
S8: w=1, x=0, y=0, z=0
S9: w=1, x=0, y=0, z=1
y = 1: $(w \lor x \lor y)$
z = 1: $(w \lor \lnot y \lor z)$
x = 0 or y = 1: $(\lnot x \lor y)$
w = 0: $(\lnot w \lor \lnot y)$
4-variable SAT, the answers
[Figure: Synthesize; Attach, then a cycle of Mark, Destroy and Unmark, then Readout]
4-variable SAT demo
Solid-phase chemistry is a promising approach
to DNA computing
DNA computing will require greatly improved
DNA surface attachment chemistries and control
of chemical and enzymatic processes
Conclusions