An Improved Succinct Dynamic k-Ary Tree Representation (work in progress)
Diego Arroyuelo
Department of Computer Science, Universidad de Chile

Post on 19-Jan-2016

Roadmap

  Succinct data structures
    Static tree representations
    Dynamic tree representations
  Our basic dynamic tree representation
    Representing blocks
    Representing the frontier of blocks
    Representing inter-block pointers
  Solving operations
    Basic operations
    Specialized operations
  Discussion

Succinct data structures

In a k-ary tree each node has at most k children, each child labeled with a distinct symbol from the set {1, …, k} (as in tries)

A succinct data structure requires space close to the information-theoretic lower bound

There are $\frac{1}{kn+1}\binom{kn+1}{n}$ different k-ary trees with n nodes

Therefore, the information-theoretic lower bound is about $n\log k + n\log e$ bits if k is not a constant with respect to n

We are interested in succinct representations that can be navigated

We are interested in the following operations:
  parent(x): parent of node x
  child(x, i): i-th child of node x
  child(x, a): child of node x by label a
  depth(x)
  degree(x)
  subtree-size(x)
  preorder(x)
  is-ancestor(x, y): is node x an ancestor of node y?
  insertions (assumed to occur at the leaves)
  deletions (only of unary nodes and leaves)

The traditional (pointer-based) representation of trees requires n log n bits for (almost) each operation supported

Succinct tree representations

Succinct representations for static trees:
  LOUDS [Jacobson, FOCS’89]
  Balanced Parentheses [MR, STOC’97]
  DFUDS [Benoit et al., Algorithmica 2005]
  xbw [Ferragina et al., FOCS’05]
  Ultra succinct trees [Jansson et al., SODA’07]

These must be rebuilt from scratch upon insertion or deletion of nodes

The case of succinct dynamic trees has been studied only for binary trees

Munro, Raman, and Storm [SODA’01]
  2n + o(n) bits
  parent and child in constant time
  Updates and subtree-size in O(polylog(n)) time

Raman and Rao [ICALP’03]
  2n + o(n) bits
  parent, child, preorder, and subtree-size in O(1) time
  Updates in O((loglog n)^(1+ε)) amortized time (O(log n loglog n) worst case)

k-ary trees: basic navigation in O(k) time (assume k is not a constant)

Dynamic balanced parentheses

Chan et al. [TALG 2007] define a dynamic representation for balanced parentheses

This can be used to represent a dynamic k-ary tree using O(n) bits of space

The time for all operations is related to the number of nodes in the tree rather than to k (O(log n) time)

This data structure cannot take advantage of the case where k is asymptotically smaller than n (e.g., k = O(polylog(n)))

We aim to achieve o(log n) time whenever log k = o(log n)

Motivations

This work is motivated by previous works on LZ-indices

Space-efficient construction of LZ-index [AN, ISAAC’05]
  Very preliminary representation: n log n bits for pointers; child operation and insertions in O(k) worst-case time

LZ-index on disk [AN, CPM’07]
  Basic operations in O(1) CPU time, yet n log n bits are needed for pointers, and it supports neither insertions nor deletions

Our basic tree representation

We incrementally divide the tree into disjoint blocks [MRS, RR, AN]

Every block represents a subtree of N nodes such that

Nmin ≤ N ≤ Nmax

We arrange these blocks in a tree by adding inter-block pointers (the entire tree is a tree of subtrees)

[Figure: the tree divided into blocks, with labels marking the frontier of the block and the duplicated nodes]

We define Nmin (minimum block size) as follows

Inter-block pointers should require o(n) bits

Therefore we define Nmin = Θ(log² n) (in general, Nmin = Θ(log n · f(n)), for f(n) = ω(1))

In this way we have (in the worst case) one pointer for every Θ(log² n) nodes

And hence o(n) bits for pointers
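
The o(n) bound can be checked with a short calculation. The sketch below assumes (this is not stated on the slide) that each pointer takes O(log n) bits and that every block stores a constant number of pointers, so the number of pointers is proportional to the number of blocks, which is at most about n / Nmin:

```latex
\underbrace{O\!\left(\frac{n}{N_{\min}}\right)}_{\text{number of pointers}}
\cdot
\underbrace{O(\log n)}_{\text{bits per pointer}}
= O\!\left(\frac{n \log n}{\log n \cdot f(n)}\right)
= O\!\left(\frac{n}{f(n)}\right)
= o(n)
\quad \text{for } f(n) = \omega(1).
```

With the concrete choice f(n) = log n, that is Nmin = Θ(log² n), the pointers take O(n / log n) bits.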

We define Nmax (maximum block size) as follows

In case of block overflow we should be able to create a new block of size at least Nmin from the full block

In the worst case, the root of the block has its k children, all of them having a subtree of the same size

By choosing Nmax = Θ(k log² n) we solve this problem
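
The reason this choice works can be spelled out with a pigeonhole argument. In the sketch below, N is the number of nodes of the overflowing block and c_1, ..., c_k are the children of its root (both names are introduced here only for illustration); the N − 1 non-root nodes are spread over at most k child subtrees:

```latex
N > N_{\max}
\;\Longrightarrow\;
\max_{1 \le i \le k} \mathrm{size}(c_i)
\;\ge\; \frac{N - 1}{k}
\;\ge\; \frac{N_{\max}}{k}
\;\ge\; N_{\min}
\quad \text{whenever } N_{\max} \ge k \cdot N_{\min} = \Theta(k \log^2 n).
```

So even in the worst case of k equally sized child subtrees, at least one of them is large enough to become a block of its own.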

The blocks cannot be as small as we would like

We support dynamic operations on the tree by:

Dividing the tree into blocks (we only need to rebuild a block upon updates)

Making these smaller trees dynamic (different to other approaches)

We represent the blocks using a dynamic DFUDS representation on top of the data structure of Chan et al. [TALG 2007]
  We solve the basic navigation inside blocks in O(log N) = O(log k + loglog n) time
  Insertions can also be handled in the same time
  We require 2n + o(n) bits overall

Representing the blocks

We represent the symbols Sp labeling the arcs of the trie with a data structure for rank and select [GN, submitted]

We compute childp(x, a) by
  rank and select on Sp
  childp(x, i) on p

childp(x, a) can be computed in O(log N log k / loglog N) = O(log k (log k + loglog n) / log(log k + loglog n)) time

The space requirement is nlog k + o(nlog k) bits
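
A minimal sketch of how rank and select on Sp can reduce child by label to child by position. It assumes, as in DFUDS-style layouts, that the labels of the children of a node occupy a contiguous range of Sp; the names `rank`, `select`, `child_index_by_label`, `start` and `degree` are hypothetical, and the naive linear-time scans stand in for the dynamic rank/select structure cited on the slide.

```python
from typing import List, Optional, Sequence


def rank(S: Sequence[str], a: str, i: int) -> int:
    """Number of occurrences of symbol a in S[0:i] (naive linear scan)."""
    return sum(1 for c in S[:i] if c == a)


def select(S: Sequence[str], a: str, j: int) -> int:
    """Position of the j-th occurrence (j >= 1) of a in S, or -1 if absent."""
    seen = 0
    for pos, c in enumerate(S):
        if c == a:
            seen += 1
            if seen == j:
                return pos
    return -1


def child_index_by_label(S: Sequence[str], start: int, degree: int,
                         a: str) -> Optional[int]:
    """Index i (0-based) such that the i-th child of x is labeled a.

    Assumption: the labels of x's children are S[start : start + degree].
    Returns None when x has no child labeled a.
    """
    before = rank(S, a, start)                 # occurrences of a before x's labels
    if rank(S, a, start + degree) == before:   # no occurrence inside x's range
        return None
    return select(S, a, before + 1) - start    # first occurrence inside the range


# Labels of two nodes stored one after the other: "abc" and then "abd".
S: List[str] = list("abcabd")
second_child = child_index_by_label(S, start=3, degree=3, a="b")
```

The returned index would then be passed to childp(x, i) to obtain the actual node.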

Representing the frontier of a block

We need to indicate which nodes in a block have a pointer to a child block

This can be done by using a bit vector
  However, this would require 3n + o(n) bits overall for the tree structure

We define array Fp storing the preorders of the nodes having a child pointer
  Since there are O(n / log² n) pointers, this requires o(n) bits

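Tp: (((())(()))((())))

Fp (differential form): 3 5 8 4, encoding the preorders 3, 8, 16, 20

After an insertion we must change all the preorders in Fp from the affected position onward: they become 3, 9, 17, 21

In differential form only one entry changes (3 6 8 4), which takes O(log N) time

Array Fp is represented in differential form with a data structure for Searchable Partial Sums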
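
To illustrate why the differential form helps, here is a minimal sketch using a Fenwick (binary indexed) tree as a stand-in for a Searchable Partial Sums structure. This is an assumption made for illustration: a Fenwick tree supports prefix sums, point updates and searches in O(log n) time over a fixed number of entries, whereas the structure used in the talk must also support inserting and deleting entries. The class name `PartialSums` is hypothetical.

```python
from typing import List


class PartialSums:
    """Fenwick tree over a fixed-length list of non-negative differences."""

    def __init__(self, values: List[int]) -> None:
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i: int, delta: int) -> None:
        """Add delta to entry i (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i: int) -> int:
        """Sum of entries 0..i inclusive, i.e. the i-th stored preorder."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, target: int) -> int:
        """Smallest index whose prefix sum is >= target, or n if none."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < target:
                pos = nxt
                target -= self.tree[nxt]
            step >>= 1
        return pos

    def contains(self, preorder: int) -> bool:
        """Is this preorder one of the stored frontier preorders?"""
        j = self.search(preorder)
        return j < self.n and self.sum(j) == preorder


# Differences 3 5 8 4 encode the preorders 3, 8, 16, 20.
fp = PartialSums([3, 5, 8, 4])
# One point update shifts every preorder from position 1 onward by one.
fp.add(1, 1)
```

After the single update the stored preorders are 3, 9, 17, 21, matching the example on the slide, without touching the later entries one by one.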

Representing inter-block pointers

Pointers to child blocks
  We store the pointers to child blocks in array PTRp
  Increasingly sorted according to the preorders of the nodes in the frontier

Pointers to parent block
  In each block p we need a pointer to the representation of the root of p in the parent block
  However, the position of a node changes upon updates
  A parent pointer is composed of
    A pointer to the parent block q
    If p is the j-th child of q, then we store value j in p

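[Figure: block p with its parentheses sequence Tp: (((())(()))((()))), its frontier array Fp, and its pointer array PTRp with entries 1 to 4; each of the four child blocks stores the parent pointer (p, j), for j = 1, ..., 4]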

Solving the basic operations

child(x, i):
  Look for the preorder of x in Fp
  If we find it, follow the child pointer to block q and apply childq on the root of q
  Otherwise, use the childp operation
  This takes O(log N) = O(log k + loglog n) time

child(x, a) is solved in the same way, but using childp(x, a) instead

parent(x): if x is the root of its block, follow the parent pointer to the parent block p; then apply parentp to the representation of x in p
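
The navigation across blocks can be sketched as follows. This is only an illustration under simplifying assumptions: each block stores its subtree as plain Python lists indexed by local preorder (0 is the block root) rather than as a dynamic DFUDS sequence, and Fp is a sorted list searched with `bisect` rather than a partial-sums structure. The names `Block`, `children`, `parent_in_block`, `frontier`, `ptr`, `parent_block` and `index_in_parent` are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Block:
    children: List[List[int]]                 # children[v]: local preorders of v's children
    parent_in_block: List[Optional[int]]      # local parent of v; None for the block root
    frontier: List[int] = field(default_factory=list)   # Fp, sorted local preorders
    ptr: List["Block"] = field(default_factory=list)    # PTRp, aligned with frontier
    parent_block: Optional["Block"] = None    # parent pointer: block q ...
    index_in_parent: int = -1                 # ... and position j in q's frontier


Node = Tuple[Block, int]                      # (block, local preorder)


def child(node: Node, i: int) -> Optional[Node]:
    """i-th child (0-based) of node, crossing to a child block if needed."""
    block, v = node
    j = bisect_left(block.frontier, v)        # look for the preorder of x in Fp
    if j < len(block.frontier) and block.frontier[j] == v:
        block, v = block.ptr[j], 0            # x is duplicated as the root of q
    kids = block.children[v]
    return (block, kids[i]) if i < len(kids) else None


def parent(node: Node) -> Optional[Node]:
    """Parent of node, crossing to the parent block if node is a block root."""
    block, v = node
    if v == 0:
        if block.parent_block is None:
            return None                       # root of the whole tree
        j = block.index_in_parent
        block = block.parent_block
        v = block.frontier[j]                 # representation of the root in q
    return (block, block.parent_in_block[v])


# Block p: root 0 with children 1 and 2; node 2 is on the frontier.
p = Block(children=[[1, 2], [], []], parent_in_block=[None, 0, 0])
# Block q: its root duplicates node 2 of p and has one child.
q = Block(children=[[1], []], parent_in_block=[None, 0],
          parent_block=p, index_in_parent=0)
p.frontier, p.ptr = [2], [q]
```

Note that a frontier node has two representations, one in each block, and both functions accept either of them. Storing the index j instead of a position is what keeps the parent pointer valid when preorders inside the parent block shift.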

Insert:
  We use the corresponding insertion operation on the block
  When a block p becomes full
    1. Choose node z in block p
    2. Reinsert the nodes in the subtree of z in a new block q (along with the corresponding part in the frontier of p)
    3. Delete the subtree of z from p

Total cost is O(log k + loglog n) amortized (if we are able to spend time proportional to the size of the subtree of z)

List of candidate subtrees in each block (o(n) bits overall)
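
The slides do not say how node z is chosen in step 1. One possible rule, shown here purely as an assumption, is to descend from the block root into the heaviest child subtree while that subtree still has at least Nmin nodes. The function names and the dictionary-based tree are hypothetical, and the sketch recomputes all subtree sizes, which the list of candidate subtrees mentioned above is presumably meant to avoid.

```python
from typing import Dict, List, Optional


def subtree_sizes(children: Dict[int, List[int]], root: int) -> Dict[int, int]:
    """Size of the subtree rooted at every node (iterative post-order)."""
    size: Dict[int, int] = {}
    stack = [(root, False)]
    while stack:
        v, done = stack.pop()
        if done:
            size[v] = 1 + sum(size[c] for c in children.get(v, []))
        else:
            stack.append((v, True))
            stack.extend((c, False) for c in children.get(v, []))
    return size


def choose_split_node(children: Dict[int, List[int]], root: int,
                      n_min: int) -> Optional[int]:
    """Pick a node z != root whose subtree has at least n_min nodes.

    Descends into the heaviest child while that child's subtree still has
    at least n_min nodes. Returns None if no child subtree of the root is
    large enough, i.e. the block cannot be split under this rule.
    """
    size = subtree_sizes(children, root)
    z = root
    while True:
        kids = children.get(z, [])
        heaviest = max(kids, key=size.__getitem__, default=None)
        if heaviest is None or size[heaviest] < n_min:
            return None if z == root else z
        z = heaviest


# Root 0 has children 1 and 2; node 1 has three leaves, node 2 has one leaf.
block_tree = {0: [1, 2], 1: [3, 4, 5], 2: [6]}
z = choose_split_node(block_tree, root=0, n_min=2)
```

Under this rule every child subtree of z has fewer than n_min nodes, so the subtree of z holds at most k(Nmin − 1) + 1 nodes, and the new block q stays below Nmax when Nmax is at least k · Nmin.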

Solving specialized operations

We can solve other operations by using this representation:
  degree(x)
  depth(x)
  subtree-size(x)

[Figure: subtree-size(x), with labels x and Sizep]

  preorder(x)
  is-ancestor(x, y)
  lca(x, y)

Conclusions

We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretical lower bound

We can profit from smaller alphabets
  o(log n) time for operations whenever log k = o(log n)
  In particular, O(loglog n) time for k = O(polylog(n))
  Versus O(log n) time of Chan et al. for any alphabet size

We need extra o(nlog k) bits of space

Discussion

What happens if we have external pointers to the tree nodes?

Can we compress the dynamic DFUDS representation of blocks? (just as in [JSS, SODA’07])

Suffix links in little space? (assuming a suffix-closed trie)