Post on 21-Dec-2015
DNA Sequencing
Jessica Scheld
Recall: DNA
Polymer of nucleotides which encodes information
Made up of long sequences of A,T,C,G’s
We want to read the DNA sequence, but it’s often too long to just read off
Problem: How do we know that a reassembled DNA sequence is in the right order?
http://www.tokyo-med.ac.jp/genet/picts/dna.jpg
DNA Sequencing-Biology
The Players
DNA
DNA polymerase
TAQ polymerase
primer
deoxynucleotides
dideoxynucleotides
[Figure labels: C, A, T, G]
What would our sequence read? T-C-G-A
Problem
Say we want to read this piece of single-stranded DNA:
ATCGACTATAAGGCATCGAA
We can’t read it all in one piece, so we break it up into “snippets” of length l, and then piece them back together to reconstruct the DNA sequence.
Here, we’ll use snippets of 4 nucleotides (l = 4). For the sequence above, we would get snippets of:
ATCG, TCGA, CGAC, GACT, ACTA, CTAT, TATA, ATAA, TAAG, AAGG, AGGC, GGCA, GCAT, CATC, ATCG, TCGA, CGAA
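The snippet step can be sketched in a few lines of Python. This is only an illustration: the function name `snippets` is made up here, and it assumes every window of 4 consecutive letters is read exactly once, in order.

```python
def snippets(sequence, length):
    """Return every window of `length` consecutive letters, left to right."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


reads = snippets("ATCGACTATAAGGCATCGAA", 4)
print(len(reads))   # number of snippets
print(reads)
```

Note that ATCG and TCGA each show up twice, which is exactly what makes the reassembly ambiguous.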
Constructing the DeBruijn Graph
These snippets can be represented by a graph.
Each snippet has length 4, so each vertex of the graph is a string of 3 letters. For each snippet, the “head” vertex contains its first three letters and the “tail” vertex contains its last three; a directed edge runs from the head to the tail.
Example:
GGCA = GGC → GCA
Constructing the DeBruijn Graph cont.
Do this for all snippets, connecting the vertices with directed edges. Below is the DeBruijn graph for the DNA sequence above.
[Figure: DeBruijn graph of ATCGACTATAAGGCATCGAA on the vertices ATC, TCG, CGA, GAC, ACT, CTA, TAT, ATA, TAA, AAG, AGG, GGC, GCA, CAT, GAA]
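The construction can be sketched as follows. The name `de_bruijn_edges` is hypothetical, and the sketch assumes one directed edge per snippet, so a snippet that occurs twice gives two parallel edges.

```python
from collections import Counter


def de_bruijn_edges(reads):
    """One directed edge per snippet: first 3 letters -> last 3 letters."""
    return [(read[:-1], read[1:]) for read in reads]


sequence = "ATCGACTATAAGGCATCGAA"
reads = [sequence[i:i + 4] for i in range(len(sequence) - 3)]
edges = de_bruijn_edges(reads)
vertices = sorted({vertex for edge in edges for vertex in edge})
multiplicity = Counter(edges)   # how many parallel copies of each edge

print(len(vertices), "vertices,", len(edges), "edges")
print(multiplicity[("ATC", "TCG")], "copies of ATC -> TCG")
```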
Creating the 2-in 2-out digraph
Notice that only 3 vertices (ATC, TCG, and CGA) have degree greater than two. We can redraw this graph so that it only has vertices of degree 4 (2 edges in, 2 edges out).
[Figure: the DeBruijn graph above, redrawn on the vertices ATC, TCG, and CGA only]
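A quick degree count shows which vertices stand out. This is a sketch only; `in_degree`, `out_degree`, and `busy` are names chosen for the illustration.

```python
from collections import Counter

sequence = "ATCGACTATAAGGCATCGAA"
# One edge per 4-letter snippet: first 3 letters -> last 3 letters.
edges = [(sequence[i:i + 3], sequence[i + 1:i + 4]) for i in range(len(sequence) - 3)]

out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)

# Vertices whose total degree (in + out) is more than two.
busy = sorted(v for v in set(in_degree) | set(out_degree)
              if in_degree[v] + out_degree[v] > 2)
print(busy)
```

In this sketch TCG and CGA already have 2 edges in and 2 edges out, while ATC has 1 in and 2 out. Joining the last vertex GAA back to the first vertex ATC would bring ATC up to degree 4 as well; that closing step is an assumption about how the redrawn graph was obtained.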
Recap
There is almost always more than one way to reconstruct a strand of DNA (and only 1 correct way)
DeBruijn graph visualizes these ways and can be redrawn as an Eulerian digraph
We want to find the number of ways to reconstruct the DNA
Finding the probability of getting the correct one
Eulerian digraphs
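Required: Only 2-in 2-out digraphs
• why not 1 or 3?
Use circuit: ABCDCABDA
Then we can rewrite this graph state as a chord diagram and an interlace graph:
[Figure: the digraph on vertices A, B, C, D, its chord diagram, and its interlace graph]
Each vertex is visited twice by the circuit and becomes a chord joining its two positions on a circle. Two vertices are adjacent in the interlace graph when their chords cross, i.e. when their labels alternate along the circuit. For ABCDCABDA the interlace graph has edges AB, AD, BD, and CD.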
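The step from circuit to interlace graph can be sketched like this. The function name `interlace_edges` is made up for the example, and the sketch assumes the circuit is written with its start vertex repeated at the end and that every vertex is visited exactly twice.

```python
from itertools import combinations


def interlace_edges(circuit):
    """Edges of the interlace graph of a closed circuit such as "ABCDCABDA"."""
    word = circuit[:-1]                     # drop the repeated start vertex
    positions = {}
    for index, vertex in enumerate(word):
        positions.setdefault(vertex, []).append(index)
    edges = set()
    for u, v in combinations(sorted(positions), 2):
        (u1, u2), (v1, v2) = positions[u], positions[v]
        # The chords cross when exactly one end of v's chord lies between u's ends.
        if (u1 < v1 < u2) != (u1 < v2 < u2):
            edges.add((u, v))
    return edges


print(sorted(interlace_edges("ABCDCABDA")))
```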
Interlace polynomial
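$$q(G, x) = \begin{cases} x^n & \text{if } G = E_n, \text{ the edgeless graph on } n \text{ vertices} \\ q(G - v, x) + q(G^{vw} - w, x) & \text{if } vw \in E(G) \end{cases}$$
Arratia, Bollobás, Sorkin, ‘00
Pivot $G \to G^{vw}$: interchange edges and non-edges among $A_{vw}$, $A_v$, and $A_w$.
Here $A_v$ is the set of vertices adjacent to v but not w, $A_w$ the set adjacent to w but not v, and $A_{vw}$ the set adjacent to both. The interchange applies to pairs of vertices taken from two different sets.
[Figure: the vertices v and w with the sets $A_v$, $A_w$, and $A_{vw}$, before and after the pivot]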
Finding the Interlace Polynomial
[Figure: G, the pivoted graph $G^{db}$, and the two reduced graphs $G - d$ and $G^{db} - b$]
$q(G) = q(G - d) + q(G^{db} - b)$
Finding the Interlace Polynomial cont.
[Figure: $G - d$ and $G^{db} - b$ are reduced again, along the edges ab and da, until only edgeless graphs remain]
Finding Interlace Polynomial cont.
[Figure: q(G) written as a sum over the edgeless graphs left by the reductions]
$q(G, x) = x^2 + x^2 + x + x + x + x = 2x^2 + 4x$
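The reduction can be checked mechanically. Below is a minimal sketch of the recursion q(G) = q(G − v) + q(G^{vw} − w), with x^n for an edgeless graph on n vertices. All function names are made up for the example, polynomials are stored as coefficient lists [c0, c1, c2, ...], and the input graph is assumed to be the interlace graph of ABCDCABDA with edges AB, AD, BD, CD.

```python
def delete(adj, v):
    """Graph with vertex v removed."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}


def pivot(adj, v, w):
    """Toggle every pair of vertices taken from two different neighbour classes."""
    only_v = adj[v] - adj[w] - {w}
    only_w = adj[w] - adj[v] - {v}
    both = adj[v] & adj[w]
    new = {u: set(nbrs) for u, nbrs in adj.items()}
    for first, second in ((only_v, only_w), (only_v, both), (only_w, both)):
        for a in first:
            for b in second:
                if b in new[a]:
                    new[a].discard(b)
                    new[b].discard(a)
                else:
                    new[a].add(b)
                    new[b].add(a)
    return new


def add(p, q):
    """Add two coefficient lists."""
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(size)]


def interlace_polynomial(adj):
    """Coefficients [c0, c1, ...] of q(G, x); adj maps vertex -> set of neighbours."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    edge = next(((v, w) for v in sorted(adj) for w in sorted(adj[v])), None)
    if edge is None:                        # edgeless graph on n vertices: x^n
        return [0] * len(adj) + [1]
    v, w = edge
    return add(interlace_polynomial(delete(adj, v)),
               interlace_polynomial(delete(pivot(adj, v, w), w)))


H = {"A": {"B", "D"}, "B": {"A", "D"}, "C": {"D"}, "D": {"A", "B", "C"}}
print(interlace_polynomial(H))   # coefficients of 1, x, x^2
```

The sketch always reduces along the alphabetically first edge rather than the edge db used on the slide; the interlace polynomial does not depend on which edge is picked.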
Interlace Polynomial Reconstructions
To find the number of reconstructions, we must relate the interlace polynomial to the circuit partition polynomial.
Theorem*: If $\vec{G}$ is a 4-regular Eulerian digraph, C is any Eulerian circuit of $\vec{G}$, and H is the circle graph of the chord diagram determined by C, then
$$f(\vec{G}; x) = x\, q(H; x+1).$$
Thus:
$$\begin{aligned}
q(H; x) &= 2x^2 + 4x \\
f(\vec{G}; x) &= x \cdot q(H; x+1) \\
&= x \cdot \left[ 2(x+1)^2 + 4(x+1) \right] \\
&= x \cdot \left[ (2x^2 + 4x + 2) + 4x + 4 \right] \\
&= 2x^3 + 4x^2 + 2x + 4x^2 + 4x \\
&= 2x^3 + 8x^2 + 6x
\end{aligned}$$
6 reconstructions
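The substitution in the theorem is easy to mechanize. A small sketch, assuming polynomials are stored as coefficient lists [c0, c1, c2, ...]; the function name is made up for the example.

```python
from math import comb


def circuit_partition_from_interlace(q):
    """Coefficients of f(x) = x * q(x + 1), given q = [q0, q1, q2, ...]."""
    shifted = [0] * len(q)
    for power, coeff in enumerate(q):
        for k in range(power + 1):          # expand coeff * (x + 1)^power
            shifted[k] += coeff * comb(power, k)
    return [0] + shifted                    # multiplying by x shifts everything up


f = circuit_partition_from_interlace([0, 4, 2])   # q(H; x) = 2x^2 + 4x
print(f)        # coefficients of 1, x, x^2, x^3
print(f[1])     # coefficient of x
```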
*J.A. Ellis-Monaghan, I. Sarmiento. Properties of the Interlace Polynomial via Isotropic Systems, Preprint.
Different Possible Cycles
[Figure: four copies of the digraph on vertices A, B, C, D, each traced by a different circuit]
ABCABDCDA
ABCDABDCA
ABCDCABDA
ABCDABDCA
ABCABDCDA
ABCDCABDA
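For a graph this small the circuits can also be listed by brute force. The sketch below is only an illustration (the function name is made up): it treats the two parallel edges A → B as different edges and pins one of them as the first step, so each circuit is found once.

```python
def eulerian_circuits(edges, start):
    """All closed walks from `start` that use every directed edge exactly once.

    Parallel edges count as distinct, and the first edge is pinned so that
    rotations of the same circuit are not listed again.
    """
    first = next(i for i, (u, _) in enumerate(edges) if u == start)
    found = []

    def extend(vertex, used, walk):
        if len(used) == len(edges):
            if vertex == start:
                found.append("".join(walk))
            return
        for i, (u, v) in enumerate(edges):
            if u == vertex and i not in used:
                extend(v, used | {i}, walk + [v])

    extend(edges[first][1], {first}, [start, edges[first][1]])
    return found


circuit = "ABCDCABDA"
edges = list(zip(circuit, circuit[1:]))     # A->B, B->C, C->D, D->C, ...
walks = eulerian_circuits(edges, "A")
print(len(walks))
print(walks)
```

Because the first edge is pinned, some of the printed walks are rotations of the strings listed above. For example, ABDABCDCA is ABCDCABDA read from its second A.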
What does it mean?
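The 6x term means there are 6 ways to reconstruct the original Eulerian graph: the coefficient of x in the circuit partition polynomial counts the ways to cover every edge with a single circuit.
We can find this by counting, but with bigger graphs it would take much longer to find all the different circuits (and we might miss one)
Useful in determining the probability of getting the right sequence of DNA
Problem above: probability = 1/6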
Problem for the Class
Find the interlace polynomial, using these graphs:
What is the sequence we are using? How can you tell?
How many reconstructions are possible?
[Figure: the graphs for this problem, on the vertices A, B, C, D]
Solution
[Figure: the graphs from the problem, on the vertices A, B, C, D]
Solution cont.
[Figure: G, the pivoted graph $G^{ab}$, and the two reduced graphs $G - a$ and $G^{ab} - b$]
$q(G) = q(G - a) + q(G^{ab} - b)$
Solution cont.
[Figure: q(G) written as a sum over the edgeless graphs left by the reductions]
Similar to the previous example, the interlace polynomial is $q(H; x) = 2x^2 + 4x$. It can be related to the circuit partition polynomial, giving $f(\vec{G}; x) = x\, q(H; x+1) = 2x^3 + 8x^2 + 6x$, implying that there are 6 ways to reconstruct this snippet of DNA, so the probability that you have the correct one is 1/6.
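The circuit partition polynomial can also be counted directly, without going through the interlace polynomial. The sketch below assumes the circuit for this problem is ABCDACBDA and uses a made-up function name. At every vertex there are two ways to pair the incoming edges with the outgoing ones; each combination of choices splits the edges into some number k of circuits and adds 1 to the coefficient of x^k.

```python
from itertools import product


def circuit_partition_polynomial(circuit):
    """Coefficients [f0, f1, ...]; fk counts the pairings that give k circuits."""
    edges = list(zip(circuit, circuit[1:]))
    vertices = sorted(set(circuit))
    incoming = {v: [i for i, (_, dst) in enumerate(edges) if dst == v] for v in vertices}
    outgoing = {v: [i for i, (src, _) in enumerate(edges) if src == v] for v in vertices}
    coeffs = [0] * (len(edges) + 1)
    for choice in product((0, 1), repeat=len(vertices)):
        successor = {}                      # edge index -> next edge index
        for v, flip in zip(vertices, choice):
            (in1, in2), (out1, out2) = incoming[v], outgoing[v]
            if flip:
                out1, out2 = out2, out1
            successor[in1], successor[in2] = out1, out2
        seen, count = set(), 0
        for edge in range(len(edges)):
            if edge not in seen:            # a new circuit starts here
                count += 1
                while edge not in seen:
                    seen.add(edge)
                    edge = successor[edge]
        coeffs[count] += 1
    return coeffs


coeffs = circuit_partition_polynomial("ABCDACBDA")
print(coeffs[:4])    # coefficients of 1, x, x^2, x^3
```

The three nonzero coefficients add up to 16, one for each of the 2^4 ways of choosing a pairing at the four vertices.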
The 6 circuits
[Figure: the digraph on vertices A, B, C, D]
ABCDACBDA
ABCDACBDA
ABCBDACDA
ABCBDACDA
ABDACBCDA
ABDACBCDA