DNA Sequencing Jessica Scheld

Post on 21-Dec-2015


DNA Sequencing

Jessica Scheld


Recall: DNA

Polymer of nucleotides which encodes information

Made up of long sequences of A,T,C,G’s

We want to read the DNA sequence, but it’s often too long to just read off

Problem: How do we know that a reassembled DNA sequence is in the right order?

http://www.tokyo-med.ac.jp/genet/picts/dna.jpg


DNA Sequencing-Biology

The Players

DNA

DNA polymerase

TAQ polymerase

primer

deoxynucleotides

dideoxynucleotides

C A T G


What would our sequence read? T-C-G-A


Problem

Say we want to read this piece of single-stranded DNA:

ATCGACTATAAGGCATCGAA

We can’t read it all in one piece, so we break it up into l-length “snippets,” and then piece it back together to reconstruct the DNA sequence.

Here, we’ll use snippets of 4 nucleotides (l = 4). For the sequence above, we would get snippets of:

ATCG, TCGA, CGAC, GACT, ACTA, CTAT, TATA, ATAA, TAAG, AAGG, AGGC, GGCA, GCAT, CATC, ATCG, TCGA, CGAA
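The snippet step can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (every window of 4 consecutive nucleotides is read, with no errors); the function name `snippets` is hypothetical.

```python
def snippets(sequence, length=4):
    """Return every window of `length` consecutive nucleotides, left to right."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


dna = "ATCGACTATAAGGCATCGAA"
reads = snippets(dna, 4)
print(len(reads))   # 17 windows for a 20-nucleotide strand
print(reads[:3])    # ['ATCG', 'TCGA', 'CGAC']
```

Note that ATCG and TCGA each occur twice, which is why the reassembly is ambiguous.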


Constructing the DeBruijn Graph

These snippets can be represented by a graph.

Each snippet has length 4. Make each vertex of the graph consist of 3 of these letters. The “head” vertex contains the first three of one snippet and the “tail” vertex contains the last three.

Example:

GGCA = GGC → GCA

Snippets: ATCG, TCGA, CGAC, GACT, ACTA, CTAT, TATA, ATAA, TAAG, AAGG, AGGC, GGCA, GCAT, CATC, ATCG, TCGA, CGAA


Constructing the DeBruijn Graph cont.

Do this for all snippets and connect the vertices with directed edges. The result is the DeBruijn graph for the DNA sequence ATCGACTATAAGGCATCGAA.

[Figure: DeBruijn graph on the vertices ATC, TCG, CGA, GAC, ACT, CTA, TAT, ATA, TAA, AAG, AGG, GGC, GCA, CAT, GAA.]
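A minimal sketch of the construction, assuming the graph is stored as a plain dictionary from each 3-letter vertex to the list of vertices it points to (the name `de_bruijn` is illustrative, not from the slides). Each snippet contributes one directed edge from its first three letters to its last three.

```python
from collections import defaultdict


def de_bruijn(snippets):
    """Map each 3-letter prefix to the list of 3-letter suffixes it points to."""
    graph = defaultdict(list)
    for s in snippets:
        graph[s[:-1]].append(s[1:])   # one directed edge per snippet
    return dict(graph)


dna = "ATCGACTATAAGGCATCGAA"
reads = [dna[i:i + 4] for i in range(len(dna) - 3)]
graph = de_bruijn(reads)
vertices = set(graph) | {v for targets in graph.values() for v in targets}

print(graph["GGC"])          # ['GCA'], the edge for snippet GGCA
print(sorted(graph["CGA"]))  # ['GAA', 'GAC']
print(len(vertices))         # 15
```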


Creating the 2-in 2-out digraph

Notice that only 3 vertices (ATC, TCG, and CGA) have degree greater than two. We can redraw this graph so that it has only vertices of degree 4.

[Figure: the DeBruijn graph above, and the redrawn graph on the vertices CGA, ATC, and TCG.]
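The degree claim can be checked directly. The sketch below assumes "degree" means in-degree plus out-degree, counting repeated snippets as separate edges.

```python
from collections import Counter

dna = "ATCGACTATAAGGCATCGAA"
edges = [(dna[i:i + 3], dna[i + 1:i + 4]) for i in range(len(dna) - 3)]

degree = Counter()
for source, target in edges:
    degree[source] += 1   # out-degree contribution
    degree[target] += 1   # in-degree contribution

busy = sorted(v for v, d in degree.items() if d > 2)
print(busy)  # ['ATC', 'CGA', 'TCG']
```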


Recap

There is almost always more than one way to reconstruct a strand of DNA (and only 1 correct way)

DeBruijn graph visualizes these ways and can be redrawn as an Eulerian digraph

We want to find the number of ways to reconstruct the DNA

• Finding the probability of getting the correct one


Eulerian digraphs

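Required: Only 2-in 2-out digraphs

• why not 1 or 3?

Use circuit: ABCDCABDA

Then we can rewrite this graph state as:

[Figure: the 2-in 2-out digraph on the vertices A, B, C, D traversed by the circuit ABCDCABDA, and its redrawn graph state.]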


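Chord Diagram and Interlace Graph

Use circuit: ABCDCABDA

[Figure: left, the chord diagram, with the letters of the circuit placed around a circle and the two occurrences of each letter joined by a chord; right, the interlace graph on the vertices A, B, C, D.]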
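As a sketch of how the interlace graph follows from the circuit: two vertices are joined exactly when their visits alternate along the circuit (their chords cross). The code assumes the circuit is written with its start vertex repeated at the end and that every vertex is visited exactly twice; `interlace_graph` is a hypothetical helper name.

```python
from itertools import combinations


def interlace_graph(circuit):
    """Edges {u, v} for every pair of vertices whose two visits alternate
    (u ... v ... u ... v) along the Eulerian circuit."""
    word = circuit[:-1]                      # drop the closing repeat of the start vertex
    positions = {}
    for index, vertex in enumerate(word):
        positions.setdefault(vertex, []).append(index)
    edges = set()
    for u, v in combinations(sorted(positions), 2):
        (u1, u2), (v1, v2) = positions[u], positions[v]
        if u1 < v1 < u2 < v2 or v1 < u1 < v2 < u2:
            edges.add(frozenset((u, v)))
    return edges


edges = interlace_graph("ABCDCABDA")
print(sorted("".join(sorted(e)) for e in edges))  # ['AB', 'AD', 'BD', 'CD']
```

For this circuit the interlace graph is the triangle A, B, D with C attached to D.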


Interlace polynomial

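$$q(G, x) = \begin{cases} x^n & \text{if } G \cong E_n \text{, the edgeless graph on } n \text{ vertices} \\ q(G - v, x) + q(G^{vw} - w, x) & \text{if } vw \in E(G) \end{cases}$$

Arratia, Bollobas, Sorkin, ‘00

$G^{vw}$: Interchange edges and non-edges among $A_{vw}$, $A_v$, and $A_w$, where $A_v$ is the set of neighbors of $v$ only, $A_w$ the set of neighbors of $w$ only, and $A_{vw}$ the set of common neighbors of $v$ and $w$. Only pairs of vertices lying in different sets are interchanged.

[Figure: $G$ and $G^{vw}$, showing $v$, $w$, and the sets $A_v$, $A_w$, $A_{vw}$ before and after the interchange.]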
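The recursion can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: a graph is a dictionary from each vertex to its set of neighbors, a polynomial is a `Counter` mapping exponent to coefficient, and the helper names `pivot`, `remove_vertex`, and `interlace` are made up for the example.

```python
from collections import Counter


def pivot(adj, v, w):
    """Return G^{vw}: toggle adjacency between vertices in different classes."""
    only_v = adj[v] - adj[w] - {w}
    only_w = adj[w] - adj[v] - {v}
    both = adj[v] & adj[w]
    classes = [only_v, only_w, both]
    new = {u: set(nbrs) for u, nbrs in adj.items()}
    for i in range(3):
        for j in range(i + 1, 3):
            for a in classes[i]:
                for b in classes[j]:
                    new[a] ^= {b}    # edge becomes non-edge and vice versa
                    new[b] ^= {a}
    return new


def remove_vertex(adj, v):
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}


def interlace(adj):
    """Coefficients of q(G, x) as a Counter {exponent: coefficient}."""
    for v, nbrs in adj.items():
        if nbrs:                     # found an edge vw, apply the recursion
            w = min(nbrs)
            return (interlace(remove_vertex(adj, v))
                    + interlace(remove_vertex(pivot(adj, v, w), w)))
    return Counter({len(adj): 1})    # edgeless graph on n vertices: x^n


# Interlace graph of the circuit ABCDCABDA: triangle A, B, D plus the edge CD.
G = {"A": {"B", "D"}, "B": {"A", "D"}, "C": {"D"}, "D": {"A", "B", "C"}}
print(interlace(G))   # exponent 2 has coefficient 2, exponent 1 has coefficient 4
```

The result corresponds to $2x^2 + 4x$. The running time grows exponentially with the number of edges, so this is only practical for small graphs.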


Finding the Interlace Polynomial

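[Figure: the interlace graph $G$ on the vertices A, B, C, D is split into $G - d$ and $G^{db} - b$, where $G^{db}$ is obtained from $G$ by interchanging edges and non-edges with respect to the edge $db$.]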


Finding the Interlace Polynomial cont.

[Figure: the reduction continues on the smaller graphs until only edgeless graphs remain. Labels recoverable from the figure: $G^{ab}$, $G^{ab} - b$, $G - a$, $G^{da}$, $G^{da} - a$, $G - d$.]


Finding Interlace Polynomial cont.

[Figure: $q(G, x)$ written as a sum of the polynomials of the resulting edgeless graphs.]

$$q(G, x) = 2x^2 + 4x$$


Interlace Polynomial Reconstructions

To find the number of reconstructions, we must relate the interlace polynomial to the circuit partition polynomial.

Theorem*: If $\vec{G}$ is a 4-regular Eulerian digraph, C is any Eulerian circuit of $\vec{G}$, and H is the circle graph of the chord diagram determined by C, then

$$f(\vec{G}; x) = x\, q(H; x+1).$$

Thus:

$$\begin{aligned}
q(H; x) &= 2x^2 + 4x \\
f(\vec{G}; x) &= x \cdot q(H; x+1) \\
&= x \cdot \left[ 2(x+1)^2 + 4(x+1) \right] \\
&= x \cdot \left[ (2x^2 + 4x + 2) + 4x + 4 \right] \\
&= 2x^3 + 4x^2 + 2x + 4x^2 + 4x \\
&= 2x^3 + 8x^2 + 6x
\end{aligned}$$

6 reconstructions

*J.A. Ellis-Monaghan, I. Sarmiento. Properties of the Interlace Polynomial via Isotropic Systems, Preprint.
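The substitution in the theorem is mechanical, so it can be double-checked with a short sketch. It assumes polynomials are stored as dictionaries from exponent to coefficient; the function name `circuit_partition` is illustrative.

```python
from math import comb


def circuit_partition(q):
    """Given q(H; x) as {exponent: coefficient}, return x * q(H; x + 1) in the same form."""
    f = {}
    for power, coeff in q.items():
        for k in range(power + 1):   # (x + 1)^power = sum over k of C(power, k) x^k
            f[k + 1] = f.get(k + 1, 0) + coeff * comb(power, k)
    return f


f = circuit_partition({2: 2, 1: 4})   # q(H; x) = 2x^2 + 4x
print(f)                              # {1: 6, 2: 8, 3: 2}, i.e. 2x^3 + 8x^2 + 6x
```

The coefficient of $x$, here `f[1]`, is the number of reconstructions.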


Different Possible Cycles

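[Figure: four copies of the digraph on the vertices A, B, C, D, each traced by a different circuit.]

ABCABDCDA

ABCDABDCA

ABCDCABDA

ABCDABDCA

ABCABDCDA

ABCDCABDA

Each vertex sequence is listed twice: the digraph has two parallel edges from A to B, and swapping them gives a different circuit with the same vertex sequence.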


What does it mean?

The 6x term means there are 6 ways to reconstruct the original Eulerian graph.

We can find this by counting, but with bigger graphs it would take much longer to find all the different circuits (and we might miss one).

Useful in determining the probability of getting the right sequence of DNA

Problem above: probability = 1/6
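For a graph this small, the direct count can be done by brute force, which gives an independent check on the polynomial. The sketch below is an assumption-laden illustration: it reads the digraph's edges off a known circuit, treats parallel edges as distinct, and fixes the first edge so that rotations of the same circuit are not counted again. The function name is hypothetical.

```python
def count_eulerian_circuits(circuit):
    """Count Eulerian circuits of the digraph whose edges are the consecutive
    pairs of `circuit`, treating parallel edges as distinct."""
    edges = list(zip(circuit, circuit[1:]))
    used = [False] * len(edges)

    def extend(vertex, remaining):
        if remaining == 0:
            return 1 if vertex == edges[0][0] else 0
        total = 0
        for i, (source, target) in enumerate(edges):
            if not used[i] and source == vertex:
                used[i] = True
                total += extend(target, remaining - 1)
                used[i] = False
        return total

    used[0] = True                   # fix the first edge so rotations are not recounted
    return extend(edges[0][1], len(edges) - 1)


print(count_eulerian_circuits("ABCDCABDA"))  # 6
```

The search visits every partial trail, which is exactly the cost the polynomial approach is meant to avoid on larger graphs.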


Problem for the Class

Find the interlace polynomial, using these graphs:

What is the sequence we are using? How can you tell?

How many reconstructions are possible?

[Figure: three graphs on the vertices A, B, C, D.]


Solution

[Figure: the graphs from the problem, on the vertices A, B, C, D.]


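Solution cont.

[Figure: the interlace graph $G$ on the vertices A, B, C, D is split into $G - a$ and $G^{ab} - b$, where $G^{ab}$ is obtained from $G$ by interchanging edges and non-edges with respect to the edge $ab$.]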


Solution cont.

[Figure: the reduction continues until $q$ is written as a sum over edgeless graphs.]

$$q(H, x) = 2x^2 + 4x$$

Similar to the previous example, the interlace polynomial can be equated to the circuit partition polynomial, giving

$$x\, q(H, x+1) = 2x^3 + 8x^2 + 6x,$$

implying that there are 6 ways to reconstruct this snippet of DNA, so the probability you have the correct one is 1/6.
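The whole chain (circuit, interlace graph, circuit partition polynomial) can be sketched end to end. Note the swap of technique: instead of the recursive definition from the slides, this sketch uses a known closed form in which $q(H; x+1)$ is the sum of $x^{\text{nullity}}$ over all vertex subsets, where the nullity is that of the subset's adjacency matrix over GF(2). That closed form, the helper names, and the assumption that every vertex is visited exactly twice are not from the slides.

```python
from itertools import combinations


def interlace_adjacency(circuit):
    """Neighbor sets of the interlace graph of a circuit written with its
    start vertex repeated at the end."""
    pos = {}
    for i, v in enumerate(circuit[:-1]):
        pos.setdefault(v, []).append(i)
    adj = {v: set() for v in pos}
    for u, v in combinations(sorted(pos), 2):
        (u1, u2), (v1, v2) = pos[u], pos[v]
        if u1 < v1 < u2 < v2 or v1 < u1 < v2 < u2:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot     # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def circuit_partition(circuit):
    """Return f = x * q(H; x + 1) as {exponent: coefficient}."""
    adj = interlace_adjacency(circuit)
    names = sorted(adj)
    f = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            index = {v: i for i, v in enumerate(subset)}
            rows = [sum(1 << index[u] for u in adj[v] if u in index) for v in subset]
            nullity = size - gf2_rank(rows)
            f[nullity + 1] = f.get(nullity + 1, 0) + 1
    return f


f = circuit_partition("ABCDACBDA")
print(f[1])   # 6 reconstructions
```

Using the circuit ABCDACBDA from this problem, the coefficients agree with $2x^3 + 8x^2 + 6x$.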


The 6 circuits

[Figure: the digraph on the vertices A, B, C, D.]

ABCDACBDA

ABCDACBDA

ABCBDACDA

ABCBDACDA

ABDACBCDA

ABDACBCDA
