The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis
DESCRIPTION
The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis. Zoë Abrams, Ho-Lin Chen. Restriction Site Analysis.
TRANSCRIPT
![Page 1: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/1.jpg)
The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis
Zoë Abrams
Ho-Lin Chen
![Page 2: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/2.jpg)
Restriction Site Analysis
An enzyme cuts a target DNA strand into fragments, and these fragments are used to reconstruct the locations of the enzyme's restriction sites.
Two common approaches:
Double Digest Problem (NP-complete) [Goldstein, Waterman ’87]
Partial Digest Problem
![Page 3: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/3.jpg)
Partial Digest Problem
Reconstruct the site locations from the lengths of all fragments that can possibly be produced.
The hardness of the problem is unknown. [Skiena, Sundaram ’93] [Lemke, Skiena, Smith ’02]
If the primary fragments are added to the input, a unique reconstruction can be found in polynomial time. [Pandurangan, Ramesh ’01]
The input is susceptible to experimental error caused by missing fragments.
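As a small illustration of the Partial Digest input, the sketch below lists the lengths of all fragments that can be produced from a given set of cut points, that is, all pairwise distances. The function name `partial_digest` and the example strand are assumptions made for illustration and do not come from the slides.

```python
from itertools import combinations

def partial_digest(points):
    """Return the sorted multiset of all pairwise distances between points.

    `points` must include both ends of the strand (0 and its length).
    """
    pts = sorted(points)
    return sorted(b - a for a, b in combinations(pts, 2))

# Hypothetical example: a strand of length 10 with restriction sites at 2 and 7.
print(partial_digest([0, 2, 7, 10]))  # [2, 3, 5, 7, 8, 10]
```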
![Page 4: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/4.jpg)
Simplified Partial Digest Problem
Proposed by Blazewicz et al. ’01
Uses primary fragments and base fragments to reconstruct restriction sites
Primary fragments: one of the endpoints is an endpoint of the original DNA strand
Base fragments: the two endpoints are consecutive sites on the DNA strand
![Page 5: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/5.jpg)
Problem Definition
Given
$X_0 = 0$, $X_{n+1} = D$
A set of base fragments
$\{X_i - X_{i-1}\}_{1 \le i \le n+1}$
A set of primary fragments
$\{X_{n+1} - X_i\}_{1 \le i \le n} \cup \{X_i - X_0\}_{1 \le i \le n}$
Reconstruct the original series $X_1, \dots, X_n$.
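To make the definition concrete, here is a minimal sketch that builds the two input multisets from a known list of sites. The helper name `spdp_fragments` and the example values are assumptions for illustration; the slides give only the set definitions.

```python
def spdp_fragments(sites, D):
    """Return (base, primary) fragment multisets as sorted lists.

    sites: the positions X_1..X_n strictly between 0 and D.
    D:     the strand length, so X_0 = 0 and X_{n+1} = D.
    """
    inner = sorted(sites)
    xs = [0] + inner + [D]
    # Base fragments: distances between consecutive sites.
    base = sorted(xs[i] - xs[i - 1] for i in range(1, len(xs)))
    # Primary fragments: distance from each site to either end of the strand.
    primary = sorted(inner + [D - x for x in inner])
    return base, primary

base, primary = spdp_fragments([2, 7], 10)
print(base)     # [2, 3, 5]
print(primary)  # [2, 3, 7, 8]
```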
![Page 6: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/6.jpg)
Theoretical and Algorithmic Issues
The algorithm that finds the exact solution may take $2^n$ time in the worst case. [Blazewicz, Jaroszewski ’03]
The Simplified Partial Digest Problem may have an exponential number of solutions.
The problem is APX-hard.
Simple algorithms can give the correct solution with high probability.
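The $2^n$ behaviour can be seen in a naive exhaustive search: each site contributes a pair of primary fragments $\{x, D - x\}$, and the search tries both placements for every pair. The sketch below is a plain brute force written for illustration, not the exact algorithm of Blazewicz and Jaroszewski; the name `solve_spdp_bruteforce` is hypothetical, and distinct integer sites are assumed.

```python
from collections import Counter
from itertools import product

def solve_spdp_bruteforce(base, primary, D):
    """Enumerate every site set consistent with the fragment multisets."""
    # Pair up the primary fragments: a site x yields both x and D - x.
    remaining = Counter(primary)
    pairs = []
    for p in sorted(primary):
        if remaining[p] > 0:
            remaining[p] -= 1
            remaining[D - p] -= 1
            pairs.append((p, D - p))
    if any(count != 0 for count in remaining.values()):
        return []  # the primary fragments do not pair up

    target = Counter(base)
    solutions = set()
    # Try both candidate positions for every pair: 2^n combinations.
    for choice in product(*pairs):
        sites = sorted(set(choice))
        if len(sites) != len(pairs):
            continue  # two pairs were mapped to the same position
        xs = [0] + sites + [D]
        gaps = Counter(b - a for a, b in zip(xs, xs[1:]))
        if gaps == target:
            solutions.add(tuple(sites))
    return sorted(solutions)

print(solve_spdp_bruteforce([2, 3, 5], [2, 3, 7, 8], 10))  # [(2, 7), (3, 8)]
```

The two answers in the example are mirror images of each other, which is the simplest reason an instance can have more than one solution.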
![Page 7: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/7.jpg)
Proof of APX-Hardness
We prove that the Simplified Partial Digest Problem is APX-hard by reducing the Tripartite Matching Problem to it.
Tripartite Matching Problem:
Given a set $S$ of triples in $\{1, 2, \dots, n\}^3$ with $|S| = T$,
decide whether there exists a subset $M$ of $S$ such that $|M| = n$ and no two triples in $M$ agree in any coordinate.
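The statement of Tripartite Matching can be checked directly on small instances. The sketch below is an exhaustive test written only to illustrate the definition; the function name `has_tripartite_matching` is an assumption, and the search is exponential in general.

```python
from itertools import combinations

def has_tripartite_matching(triples, n):
    """Return True if some n triples are pairwise distinct in every coordinate."""
    for subset in combinations(triples, n):
        # n triples agree nowhere exactly when each coordinate takes n values.
        if all(len({t[k] for t in subset}) == n for k in range(3)):
            return True
    return False

# Hypothetical instances with n = 2.
print(has_tripartite_matching([(1, 1, 1), (2, 2, 2), (1, 2, 2)], 2))  # True
print(has_tripartite_matching([(1, 1, 1), (1, 2, 2)], 2))             # False
```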
![Page 8: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/8.jpg)
Tripartite Matching Problem
![Page 9: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/9.jpg)
Tripartite Matching Problem
![Page 10: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/10.jpg)
Proof of APX-Hardness
Use symmetric restriction sites to cut the segment into 2T equal-length segments.
[Figure: the strand divided into segments numbered 1, 2, ..., 2T]
![Page 11: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/11.jpg)
Proof of APX-Hardness
Use symmetric restriction sites to cut the segment into 2T equal-length segments.
[Figure: pairs of symmetric restriction sites]
![Page 12: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/12.jpg)
![Page 13: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/13.jpg)
![Page 14: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/14.jpg)
Proof of APX-Hardness
Use symmetric restriction sites to cut the segment into 2T equal-length segments.
In each pair of equal-length segments, there are seven restriction sites that can be put on either side.
[Figure: segments numbered 1, 2, ..., 2T; sites marked “x” can be on either side]
![Page 15: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/15.jpg)
![Page 16: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/16.jpg)
Proof of APX-Hardness
Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively.
![Page 17: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/17.jpg)
Proof of APX-Hardness
Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively.
In each segment, restriction sites in the same group must be put on the same side.
![Page 18: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/18.jpg)
Proof of APX-Hardness
Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively.
In each segment, restriction sites in the same group must be put on the same side.
Each placement of restriction sites corresponds to a set of triples chosen in the Tripartite Matching Problem.
[Figure labels: not chosen, chosen]
![Page 19: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/19.jpg)
Proof of APX-Hardness
Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively.
In each segment, restriction sites in the same group must be put on the same side.
Each placement of restriction sites corresponds to a set of triples chosen in the Tripartite Matching Problem.
The current placement of restriction sites is a solution iff the corresponding set of triples is a solution to the Tripartite Matching Problem.
![Page 20: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/20.jpg)
A Simple Algorithm
Put all symmetric points at their correct locations.
Put all asymmetric points on the left side.
![Page 21: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/21.jpg)
A Simple Algorithm
Put all symmetric points at their correct locations.
Put all asymmetric points on the left side.
For each site, working from the endpoints toward the middle:
If the base segment is matched, fix its location.
![Page 22: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/22.jpg)
A Simple Algorithm
Put all symmetric points at their correct locations.
Put all asymmetric points on the left side.
For each site, working from the endpoints toward the middle:
If the base segment is matched, fix its location.
If the base segment isn't matched, move it and all points toward the middle to the other side.
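The slides do not spell out how "matched" is tested, so the following is only one possible reading of the procedure, not the authors' implementation. It assumes distinct integer sites, treats a base segment as matched when the gap to the innermost site already fixed on the same side is still available among the unused base fragments, and models "move it and all points toward the middle" by switching the default side for all later points. The name `greedy_spdp` is hypothetical.

```python
from collections import Counter

def greedy_spdp(base, primary, D):
    """One reading of the slides' greedy procedure; returns sorted sites."""
    prim = Counter(primary)
    gaps = Counter(base)           # base fragments not yet used
    left_edge, right_edge = 0, D   # innermost sites fixed so far on each side
    on_left = True                 # side currently assumed for asymmetric points
    sites = []
    # Visit candidates by distance from the nearer end (endpoints to middle).
    for a in sorted({min(p, D - p) for p in prim}):
        if 2 * a == D:             # a site exactly in the middle
            sites.append(a)
            gaps[a - left_edge] -= 1
            left_edge = a
        elif prim[a] >= 2:         # symmetric pair: both a and D - a are sites
            sites += [a, D - a]
            gaps[a - left_edge] -= 1
            gaps[right_edge - (D - a)] -= 1
            left_edge, right_edge = a, D - a
        else:                      # asymmetric point: its side is unknown
            gap = a - left_edge if on_left else right_edge - (D - a)
            if gaps[gap] <= 0:     # not matched: switch sides from here inward
                on_left = not on_left
                gap = a - left_edge if on_left else right_edge - (D - a)
            gaps[gap] -= 1
            if on_left:
                sites.append(a)
                left_edge = a
            else:
                sites.append(D - a)
                right_edge = D - a
    return sorted(sites)

# Hypothetical instance: D = 20 with true sites 2, 3, 11, 15.
base = [2, 1, 8, 4, 5]
primary = [2, 3, 11, 15, 18, 17, 9, 5]
print(greedy_spdp(base, primary, 20))  # [2, 3, 11, 15]
```

Because a gap can match by coincidence, a sketch like this can commit to the wrong side; it also makes a single pass and never backtracks. The mirror image of any output is equally consistent with the input.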
![Page 23: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/23.jpg)
![Page 24: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/24.jpg)
Analysis of the Algorithm
Assuming a uniform distribution for restriction sites, for many practical parameters*, the algorithm outputs the correct locations with probability at least 0.4.
All primary fragments are matched, and at least ¼ of all base fragments are matched in the worst case.
Runs in time linear in the number of sites.
*Ex: length of the DNA strand around 20,000, with 10-20 restriction sites
![Page 25: The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis](https://reader035.vdocument.in/reader035/viewer/2022062421/56812acf550346895d8eab21/html5/thumbnails/25.jpg)
Future Work
Construct better heuristics to solve SPDP
Analyze the hardness of the Partial Digest Problem
Find other characterizations of restriction sites that are both easy to measure and can be used to reconstruct the sites