Querying and Embedding
Compressed Texts
Yury Lifshits (1), Markus Lohrey (2)
(1) Steklov Institute of Mathematics at St. Petersburg
(2) Stuttgart University
Stara Lesna, August 2006
Yury Lifshits, Markus Lohrey (SPb-Stuttgart) Querying and Embedding Compressed Texts MFCS’2006 1 / 18
Subsequence Matching (Embedding)
INPUT: pattern TEAM and text
I N T E R N A T I O N A L S Y M P O S I U M M F C S
TASK: check whether the text contains the pattern as a subsequence (i.e., gaps are allowed)
OUTPUT: Yes (for example, T, E, A from INTERNATIONAL and M from SYMPOSIUM)
Problem for this talk:
Given a COMPRESSED text and a COMPRESSED pattern, can we solve embedding faster than just “unpack-and-search”?
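For reference, the uncompressed version of the task can be sketched in a few lines. This is only the "search" half of the unpack-and-search baseline, on plain strings. The function name `is_subsequence` is an illustrative choice, not something from the talk.

```python
def is_subsequence(pattern: str, text: str) -> bool:
    """Return True if pattern embeds into text with gaps allowed."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator until it finds ch (or runs out),
    # so every text character is examined at most once.
    return all(ch in remaining for ch in pattern)


text = "INTERNATIONAL SYMPOSIUM MFCS".replace(" ", "")
print(is_subsequence("TEAM", text))  # True
print(is_subsequence("MEAT", text))  # False
```

The scan is linear in the length of the uncompressed text, which can be exponential in the size of a compressed representation.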
Outline of the Talk
1 New topic in computer science: algorithms for compressed texts
2 Our problems and our results
3 Some proof ideas
Part I
What are compressed texts?
Can we do something interesting without unpacking?
Straight-line Programs: Definition
A straight-line program (SLP) is a context-free grammar generating exactly one string.
Two types of productions: Xi → a and Xi → Xp Xq
Example
abaababaabaab
X1 → b
X2 → a
X3 → X2 X1
X4 → X3 X2
X5 → X4 X3
X6 → X5 X4
X7 → X6 X5
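The example grammar can be checked mechanically. The sketch below encodes the seven rules as a Python dictionary (this encoding, and the names `RULES`, `expand` and `lengths`, are assumptions made for illustration) and relies on each rule referring only to lower-numbered rules, as in the example.

```python
# Rules from the slide: a terminal rule maps to a character,
# a binary rule maps to a pair of earlier rule indices.
RULES = {
    1: "b",
    2: "a",
    3: (2, 1),
    4: (3, 2),
    5: (4, 3),
    6: (5, 4),
    7: (6, 5),
}


def expand(rules, start):
    """Decompress: return the string generated by rule `start`."""
    text = {}
    for i in sorted(rules):  # assumes rules reference only smaller indices
        rhs = rules[i]
        text[i] = rhs if isinstance(rhs, str) else text[rhs[0]] + text[rhs[1]]
    return text[start]


def lengths(rules):
    """Length of every rule's expansion, computed without decompressing."""
    n = {}
    for i in sorted(rules):
        rhs = rules[i]
        n[i] = 1 if isinstance(rhs, str) else n[rhs[0]] + n[rhs[1]]
    return n


print(expand(RULES, 7))   # abaababaabaab
print(lengths(RULES)[7])  # 13
```

The lengths follow the Fibonacci recurrence here, so continuing the same rule pattern to Xk yields a text whose length grows exponentially in k while the grammar stays of size k.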
SLP = Compressed Text
Rytter, 2003: Consider an archive of size z obtained by LZ78, LZW, or some other dictionary-based compression method. Then we can convert it in time O(z) to an SLP of size O(z) generating the same text.
Rytter, 2003: Consider an LZ77-compressed or RLE-compressed text T of original length n with an archive of size z. Then we can convert it in time O(z log n) to an SLP of size O(z log n) generating the same text.
In the following, by a compressed text we mean an SLP generating it.
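To make the LZ78 case concrete, here is a minimal sketch of one straightforward conversion. It is not necessarily the construction from the cited work. It assumes the archive is a list of `(prev, ch)` tokens, where `prev` is the number of an earlier phrase (0 for the empty phrase) and `ch` is the appended character. The names `lz78_to_slp` and `expand` and the rule encoding are illustrative assumptions.

```python
def lz78_to_slp(tokens):
    """Turn LZ78 tokens [(prev, ch), ...] into SLP rules.

    Returns (rules, start); rules[i] is a character or a pair (p, q)
    of earlier rule indices.
    """
    rules = {}

    def new_rule(rhs):
        rules[len(rules) + 1] = rhs
        return len(rules)

    char_rule = {}    # character -> index of its terminal rule
    phrase_rule = {}  # phrase number -> rule generating that phrase
    start = None      # rule generating all phrases seen so far
    for k, (prev, ch) in enumerate(tokens, start=1):
        if ch not in char_rule:
            char_rule[ch] = new_rule(ch)
        if prev == 0:
            phrase_rule[k] = char_rule[ch]
        else:
            # phrase k = phrase prev followed by ch: one binary rule
            phrase_rule[k] = new_rule((phrase_rule[prev], char_rule[ch]))
        # one more binary rule to append phrase k to the text so far
        start = phrase_rule[k] if start is None else new_rule((start, phrase_rule[k]))
    return rules, start


def expand(rules, start):
    """Decompress the SLP (used here only to check the conversion)."""
    text = {}
    for i in sorted(rules):
        rhs = rules[i]
        text[i] = rhs if isinstance(rhs, str) else text[rhs[0]] + text[rhs[1]]
    return text[start]


# LZ78 parse of abaababaabaab: a | b | aa | ba | baa | baab
tokens = [(0, "a"), (0, "b"), (1, "a"), (2, "a"), (4, "a"), (5, "b")]
rules, start = lz78_to_slp(tokens)
print(expand(rules, start))  # abaababaabaab
```

Each token adds at most two binary rules plus possibly one terminal rule, so the grammar has O(z) rules for z tokens (plus the alphabet size), matching the bound quoted above.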
Why algorithms on compressed texts?
Answer for algorithms people:
Might be faster than “unpack-and-search”
Saves storage space and transmission costs
Many fields have highly compressible data: statistics (internet log files), automatically generated texts, message sequence charts for parallel programs
Answer for complexity people:
Some problems are hard in the worst case, but they might be easy for compressible inputs
New complexity relations: similar problems have different complexities on compressed inputs
Problems on SLP-generated texts
∃ polynomial algorithms:
GKPR’96: Equivalence
GKPR’96: Regular Language Membership
GKPR’96: Shortest Period
L’06: Shortest Cover
L’06: Fingerprint Table
GKPR’96: Fully Compressed Pattern Matching
CGLM’06: Window Subsequence Matching

At least NP-hard:
L’06: Hamming distance
Lohrey’04: Context-Free Language Membership
BKLPR’02: Two-dimensional Compressed Pattern Matching
Part II
What are embedding and querying problems on compressed texts?
How computationally hard are they?
Querying and Embedding Compressed Texts
Compressed Embedding Problem:
INPUT: Two SLPs generating strings T and P
OUTPUT: YES if T contains P as a subsequence, otherwise NO

Compressed Querying Problem:
INPUT: An SLP generating string T, position i, character a
OUTPUT: YES if T_i = a, otherwise NO
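The querying problem has a simple sequential polynomial-time solution: compute the length of every rule's expansion, then walk down from the start rule, going left or right according to those lengths. A minimal sketch follows. The dictionary encoding, the name `char_at`, and the 1-based position convention are assumptions for illustration.

```python
def char_at(rules, start, i):
    """Return T[i] (1-based) for the text T generated by rule `start`."""
    length = {}
    for k in sorted(rules):  # assumes rules reference only smaller indices
        rhs = rules[k]
        length[k] = 1 if isinstance(rhs, str) else length[rhs[0]] + length[rhs[1]]
    if not 1 <= i <= length[start]:
        raise IndexError("position out of range")
    node = start
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i <= length[left]:
            node = left          # position falls inside the left part
        else:
            i -= length[left]    # skip the left part, continue on the right
            node = right
    return rules[node]


# SLP for abaababaabaab: terminal rules are characters, others are pairs.
RULES = {1: "b", 2: "a", 3: (2, 1), 4: (3, 2), 5: (4, 3), 6: (5, 4), 7: (6, 5)}
print(char_at(RULES, 7, 1))   # a
print(char_at(RULES, 7, 13))  # b
```

The descent visits at most one rule per level of the grammar, so the text is never decompressed. The lengths can have polynomially many bits, which Python's arbitrary-precision integers handle directly.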
Compressed Embedding is Hard
GKPR’96 proved that string matching, when both the text and the pattern are compressed, has a polynomial algorithm.
Natural question: then what about subsequence matching?
MAIN RESULT 1:
Compressed Embedding problem is NP-hard.
Compressed Embedding problem is co-NP-hard.
Compressed Embedding problem is Θ2-hard.
Compressed Querying is Hard
The most used operation on compressed texts is decompressing.
Natural question: can it be done efficiently by a parallel algorithm?
MAIN RESULT 2:
Compressed Querying problem is P-complete.
Part III
How to prove NP-hardness of Embedding?
How to prove co-NP-hardness of Embedding?
Proving NP-hardness
Classical reduction:
1 Take an NP-complete problem (Subset Sum)
2 For every instance of Subset Sum, construct two straight-line programs such that Embedding holds ⇔ Subset Sum has answer “Yes”
Proving co-NP-hardness
Lemma (Yes-No symmetry):
For any SLPs X and Y we can construct, in polynomial time, SLPs X′ and Y′ such that:
Embedding holds for X and Y ⇔ Embedding does not hold for X′ and Y′
Corollary: NP-hardness implies co-NP-hardness
Summary
Main points:
Compressed text = text generated by SLP
For compressed texts, querying is P-complete and embedding is Θ2-hard
Method: reduction from the Subset Sum problem, “yes-no” symmetry
Open Problems:
What is the exact complexity of the Compressed Embedding problem (we know that it is somewhere between Θ2 and PSPACE)?
Construct an O(nm) algorithm for edit distance, where n is the length of T1 and m is the compressed size of T2
Last Slide
Yury Lifshits http://logic.pdmi.ras.ru/~yura/
Our relevant papers:
Yury Lifshits and Markus Lohrey. Querying and Embedding Compressed Texts. MFCS’06.
Yury Lifshits. Solving Classical String Problems on Compressed Texts. Preprint at arXiv:cs.DS/0604058, 2006.
P. Cegielski, I. Guessarian, Yu. Lifshits and Yu. Matiyasevich. Window Subsequence Problems for Compressed Texts. CSR’06.
Markus Lohrey. Word Problems and Membership Problems on Compressed Words. ICALP’04.
Thanks for your attention!