a grammar compression algorithm based on induced suffix ... · introductiongcisresultsfinal...

54
Introduction GCIS Results Final Considerations A Grammar Compression Algorithm based on Induced Suffix Sorting D. S. N. Nunes 1,2 F. Louza 3 S. Gog 4 M. Ayala-Rinc´ on 2 G. Navarro 5 1 Federal Institute of Education, Science and Technology of Bras´ ılia, Brazil 2 Department of Computer Science, University of Bras´ ılia, Brazil 3 Department of Computing and Mathematics, University of S˜ ao Paulo, Brazil 4 Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Germany 5 Department of Computer Science, University of Chile, Chile 28 th March 2018 Data Compression Conference Snowbird, Utah, U.S. Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 1/33

Upload: others

Post on 02-Nov-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

A Grammar Compression Algorithm based onInduced Suffix Sorting

D. S. N. Nunes 1,2 F. Louza 3 S. Gog4 M. Ayala-Rincon 2

G. Navarro 5

1Federal Institute of Education, Science and Technology of Brasılia, Brazil

2Department of Computer Science, University of Brasılia, Brazil

3Department of Computing and Mathematics, University of Sao Paulo, Brazil

4Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Germany

5Department of Computer Science, University of Chile, Chile

28th March 2018

Data Compression Conference

Snowbird, Utah, U.S.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 1/33

Page 2: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Summary

1 Introduction

2 GCIS

3 Results

4 Final Considerations

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 2/33

Page 3: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Summary

1 Introduction

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 3/33

Page 4: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Introduction

Lossless data compression reduce space requirement byidentifying and eliminating redundancy.

Useful in practice: reduce resources required to store andtransmit data.

Space/Time trade-off.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 4/33

Page 5: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Introduction

The Suffix Array Data Structure is used extensively inStringology.

Key to solve several text-related problems in efficient oroptimal time, using a small footprint of memory, if comparedto other data structures.I Exact Pattern Matching;I Approximate Pattern Matching;I LZ factorization;I Finding repeats.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 5/33

Page 6: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Suffix Array

i A[i] TA[i]

0 22 0

1 21 a0

2 18 aana0

3 13 aananaana0

4 8 aananaananaana0

5 3 aananaananaananaana0

6 19 ana0

7 16 anaana0

8 11 anaananaana0

9 6 anaananaananaana0

10 1 anaananaananaananaana0

11 14 ananaana0

12 9 ananaananaana0

13 4 ananaananaananaana0

14 0 banaananaananaananaana0

15 20 na0

16 17 naana0

17 12 naananaana0

18 7 naananaananaana0

19 2 naananaananaananaana0

20 15 nanaana0

21 10 nanaananaana0

22 5 nanaananaananaana0

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 6/33

Page 7: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Nong’s Algorithm

Nong et al. algorithm is capable of sorting the suffixes of atext in linear optimal time.

Very fast in practice.

Induces the order of suffixes based in the already calculatedorder of other suffixes.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 7/33

Page 8: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Our Contribution

Develop a novel grammar compression algorithm based on theinduced suffix sorting algorithm from Nong et al. to compressthe original string.I Faster in compression than 7-zip and Re-Pair.I Lower memory peak under compression than Re-Pair.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 8/33

Page 9: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Summary

2 GCIS

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 9/33

Page 10: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCISWe adapt SAIS framework from Nong et al. to build our grammar:

1 Classify suffixes into “S”, “L”, or “LMS” types.

2 Sort all LMS-suffixes by its first symbol.3 Induce:

1 L-type suffixes from LMS-type suffixes;2 S-Type suffixes from L-type suffixes;

4 Rename the LMS-Substrings to obtain T ′ and create grammarrules;

5 If there are equal renamed factors, solve the problemrecursively for T ′. Else, store T explicitly.

6 LMS-suffixes are now sorted regarding all symbols. Repeat 3.1and 3.2 to obtain the suffix array.

7 LMS-suffixes are now sorted regarding all symbols. Repeat 3.1and 3.2 to obtain the suffix array.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 10/33

Page 11: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Example

i

T

0

b

1

a

2

n

3

a

4

a

5

n

6

a

7

n

8

a

9

a

10

n

11

a

12

n

13

a

14

a

15

n

16

a

17

n

18

a

19

a

20

n

21

a

22

0

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 11/33

Page 12: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Classification

i

T

B

0

b

L

1

a

*

2

n

L

3

a

*

4

a

S

5

n

L

6

a

*

7

n

L

8

a

*

9

a

S

10

n

L

11

a

*

12

n

L

13

a

*

14

a

S

15

n

L

16

a

*

17

n

L

18

a

*

19

a

S

20

n

L

21

a

L

22

0

*

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 12/33

Page 13: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Radixsort

i

T

B

A

0

b

L

0

22

1

a

*

1

2

n

L

2

3

a

*

3

4

a

S

4

5

n

L

5

6

a

*

6

1

7

n

L

7

3

8

a

*

8

6

9

a

S

9

8

10

n

L

10

11

11

a

*

11

13

12

n

L

12

16

13

a

*

13

18

14

a

S

14

15

n

L

15

16

a

*

16

17

n

L

17

18

a

*

18

19

a

S

19

20

n

L

20

21

a

L

21

22

0

*

22

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 13/33

Page 14: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Inducing L-Type Suffixes

i

T

B

A

A

0

b

L

22

22

0

1

a

*

21

1

2

n

L

2

3

a

*

3

4

a

S

4

5

n

L

5

6

a

*

1

1

6

7

n

L

3

3

7

8

a

*

6

6

8

9

a

S

8

8

9

10

n

L

11

11

10

11

a

*

13

13

11

12

n

L

16

16

12

13

a

*

18

18

13

14

a

S

0

14

15

n

L

20

15

16

a

*

2

16

17

n

L

5

17

18

a

*

7

18

19

a

S

10

19

20

n

L

12

20

21

a

L

15

21

22

0

*

17

22

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 14/33

Page 15: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Inducing S-Type Suffixes

i

T

B

A

A

0

b

L

22

22

0

1

a

*

21

21

1

2

n

L

18

2

3

a

*

3

3

4

a

S

8

4

5

n

L

13

5

6

a

*

1

19

6

7

n

L

3

1

7

8

a

*

6

4

8

9

a

S

8

6

9

10

n

L

11

9

10

11

a

*

13

11

11

12

n

L

16

14

12

13

a

*

18

16

13

14

a

S

0

0

14

15

n

L

20

20

15

16

a

*

2

2

16

17

n

L

5

5

17

18

a

*

7

7

18

19

a

S

10

10

19

20

n

L

12

12

20

21

a

L

15

15

21

22

0

*

17

17

22

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 15/33

Page 16: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Definition (LMS-Substring)

A LMS-Substring is either the sentinel symbol 0 or a substringT [i, j − 1] with both T [i] and T [j] being of type LMS and thereis no other LMS symbols for i 6= j. We denote this substring bysub(i).

After inducing L and S-type suffixes, all LMS-substrings aresorted.

All LMS-substrings are renamed according to their order.

A pairwise comparison should be done to check if theLMS-substrings are equal.

T is replaced by the renamed factors, giving place to T ′.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 16/33

Page 17: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

GCIS

Renaming

i

T

B

A

0

b

L

22

*

1

a

*

21

2

n

L

18

*

3

a

*

3

*

4

a

S

8

*

5

n

L

13

*

6

a

*

19

7

n

L

1

*

8

a

*

4

9

a

S

6

*

10

n

L

9

11

a

*

11

*

12

n

L

14

13

a

*

16

*

14

a

S

0

15

n

L

20

16

a

*

2

17

n

L

5

18

a

*

7

19

a

S

10

20

n

L

12

21

a

L

15

22

0

*

17

sub(22) = 0 7→ 0 sub(1) = an 7→ 3sub(18) = aana 7→ 1 sub(6) = an 7→ 3sub(3) = aan 7→ 2 sub(11) = an 7→ 3sub(8) = aan 7→ 2 sub(16) = an 7→ 3sub(13) = aan 7→ 2

T ′ = 323232310Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 17/33

Page 18: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Creating Rules

For every unique LMS-substring S a rule of the form X → Sis created.

In order to obtain a good compression ratio, certain propertiesshared between the LMS-substrings shall be explored.

The first property is that the LMS-substrings are sorted!

We can represent them by a pair (`, s):I `: the common prefix length shared with the previous

LMS-substring.I s the remaining suffix not encoded by `.

A special rule is created to store the first symbols of T notcovered by a LMS-substring.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 18/33

Page 19: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Encoding the Factors

For T = banaananaananaananaana0

0→ (0,’0’); //’0’;

1→ (0,’aana’); // ’aana’

2→ (3,”); // ’aan’

3→ (1,’n’); // ’an’

Tail→ b

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 19/33

Page 20: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Encoding the Factors

The ` values can be encoded succinctly by using Simple-8b

Tries to fit as many fixed-width integer as possible in a 64-bitword.

With a single memory access, many values are retrieved.

Table: Simple8b possible arrangements.

Selector value 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Item width 0 0 1 2 3 4 5 6 7 8 10 12 15 20 30 60Group Size 240 120 60 30 20 15 12 10 8 7 6 5 4 3 2 1Wasted bits 60 60 0 0 0 0 0 0 4 4 0 0 0 0 0 0

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 20/33

Page 21: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Encoding the Factors

The s parts are concatenated in a single array of integers withfixed width of dlg(|Σ|)e bits.

V = 0aanan

A support bitmap encoding the length of each s part iscreated.

BV = 0100001101

Length of suffix i =select1(BV, i + 1)− select1(BV, i)− 1

Start of suffix i in V = select1(BV, i)− i + 1.

To improve space usage, the bitmap is encoded usingElias-Fano.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 21/33

Page 22: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decoding

We begin by the reduced string.

At each recursion level, we read T [i] = X and replace it by itsthe correct LMS-substring X → S.

To make the decoding faster, all rules from the current levelare decompressed previously.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 22/33

Page 23: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Summary

3 Results

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 23/33

Page 24: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Results

In order to visualize the potential of the proposed grammarcompressor (GCIS), experiments were done with therepetitive Pizza-Chili corpus, which was populated with textsof different nature.

GCIS was compared against popular compressors suited torepetitive sequences: Re-Pair and 7-zip.

Three subjects were evaluated:I Compression ratio: compressedSize/Size.I Compression time (s).I Decompression time (s).

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 24/33

Page 25: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 26: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 27: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 28: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 29: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 30: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 31: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 32: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 25/33

Page 33: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 34: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 35: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 36: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 37: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 38: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 39: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 40: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Compression Ratio

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 26/33

Page 41: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 42: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 43: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 44: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 45: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 46: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 47: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 48: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Decompression Time

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 27/33

Page 49: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Peak Memory during Compression

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 28/33

Page 50: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Peak Memory during Decompression

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 29/33

Page 51: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Summary

4 Final Considerations

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 30/33

Page 52: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Final Considerations

GCIS showed to be a practical alternative to popularcompressors due its good compression ratio and time.

Competitive in compression regarding 7-zip and Re-Pairand much faster than both.

Slower decode times.

Worse compression ratio, but comparable to Re-Pair.

Lower memory peak than Re-Pair.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 31/33

Page 53: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Work in Progress

It is possible to support extraction of any substring withoutmuch space by storing the rule lenghts succinctly.

Re-Pair also supports extract but in a less space-efficientversion which requires (2.x to 3 times more space) .

7-zip does not support extraction.

Develop extraction and compare with the less space-efficientversion of Re-Pair.

We hope that GCIS will be more space-efficient thanRe-Pair when supporting the extract operation.

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 32/33

Page 54: A Grammar Compression Algorithm based on Induced Suffix ... · IntroductionGCISResultsFinal Considerations A Grammar Compression Algorithm based on Induced Su x Sorting D. S. N. Nunes

Introduction GCIS Results Final Considerations

Thank you

Thank you!

Nunes, D. S. N. et al. A Grammar Compression Algorithm based on Induced Suffix Sorting DCC 2018 33/33