Post on 19-Jan-2016

An Improved Succinct Dynamic k-Ary Tree Representation (work in progress)

Diego Arroyuelo
Department of Computer Science, Universidad de Chile

Roadmap

- Succinct data structures
  - Static tree representations
  - Dynamic tree representations
- Our basic dynamic tree representation
  - Representing blocks
  - Representing the frontier of blocks
  - Representing inter-block pointers
- Solving operations
  - Basic operations
  - Specialized operations
- Discussion

Succinct data structures

In a k-ary tree each node has at most k children, each child labeled with a symbol in the set {1, …, k} (as in tries)

A succinct data structure requires space close to the information-theoretic lower bound

There are $\frac{1}{kn+1}\binom{kn+1}{n}$ different k-ary trees with n nodes

Therefore, the information-theoretical lower bound is about $\log\left(\frac{1}{kn+1}\binom{kn+1}{n}\right) = n\log k + O(n)$ bits if k is not a constant with respect to n
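As a quick numerical check of the counting argument, the sketch below evaluates the number of k-ary trees with n nodes, using the standard count $\frac{1}{kn+1}\binom{kn+1}{n}$, and the corresponding lower bound in bits. The function names are illustrative and do not come from the talk.

```python
from math import ceil, comb, log2


def count_kary_trees(n, k):
    """Number of k-ary (cardinal) trees with n nodes."""
    # The binomial is always divisible by kn + 1, so integer division is exact.
    return comb(k * n + 1, n) // (k * n + 1)


def lower_bound_bits(n, k):
    """Bits needed to distinguish all k-ary trees with n nodes."""
    return ceil(log2(count_kary_trees(n, k)))


if __name__ == "__main__":
    # Compare the exact bound with the leading term n log k for one setting.
    n, k = 1000, 256
    print(lower_bound_bits(n, k), n * log2(k))
```

For k = 2 the count reduces to the Catalan numbers, which is a convenient sanity check.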

Succinct data structures

We are interested in succinct representations that can be navigated

We are interested in the following operations:
- parent(x): parent of node x
- child(x, i): i-th child of node x
- child(x, a): child of node x by label a
- depth(x)
- degree(x)
- subtree-size(x)
- preorder(x)
- is-ancestor(x, y): is node x an ancestor of node y?
- insertions (assumed to occur at the leaves)
- deletions (just for unary nodes and leaves)

The traditional pointer-based representation of trees requires n log n bits for (almost) each operation

Succinct tree representations

Succinct representations for static trees:

- LOUDS [Jacobson, FOCS’89]
- Balanced Parentheses [MR, STOC’97]
- DFUDS [Benoit et al., Algorithmica 2005]
- xbw [Ferragina et al., FOCS’05]
- Ultra succinct trees [Jansson et al., SODA’07]

These must be rebuilt from scratch upon insertion or deletion of nodes

Succinct tree representations

The case of succinct dynamic trees has been studied only for binary trees

Munro, Raman, and Storm [SODA’01]
- 2n + o(n) bits
- parent, child in constant time
- Updates and subtree-size in O(polylog(n)) time

Raman and Rao [ICALP’03]
- 2n + o(n) bits
- parent, child, preorder, and subtree-size in O(1) time
- Updates in $O((\log\log n)^{1+\epsilon})$ amortized time (O(log n loglog n) worst case)

k-ary trees: basic navigation in O(k) time (assume k is not a constant)

Dynamic balanced parentheses

Chan et al. [TALG 2007] define a dynamic representation for balanced parentheses

This can be used to represent a dynamic k-ary tree using O(n) bits of space

The time for all operations is related to the number of nodes in the tree rather than to k (O(log n) time)

This data structure cannot take advantage of the case where k is asymptotically smaller than n (e.g., k = O(polylog(n)))

We aim to achieve o(log n) time whenever log k = o(log n)

Motivations

This work is motivated by previous works on LZ-indices

Space-efficient construction of LZ-index [AN, ISAAC’05]
- Very preliminary representation: n log n bits for pointers, child operation and insertions in O(k) worst-case time

LZ-index on disk [AN, CPM’07]
- Basic operations in O(1) CPU time, yet n log n bits are needed for pointers, and it supports neither insertions nor deletions

Our basic tree representation

We incrementally divide the tree into disjoint blocks [MRS, RR, AN]

Every block represents a subtree of N nodes such that Nmin ≤ N ≤ Nmax

We arrange these blocks in a tree by adding inter-block pointers (entire tree is tree of subtrees)

Our basic tree representation

[Figure: the tree divided into blocks, marking the frontier of a block and the duplicated nodes]

Our basic tree representation

We define Nmin (minimum block size) as follows

Inter-block pointers should require o(n) bits

Therefore we define $N_{\min} = \Theta(\log^2 n)$ (in general, $N_{\min} = \Theta(\log n \cdot f(n))$ for $f(n) = \omega(1)$)

In this way we have (in the worst case) one pointer per $\Theta(\log^2 n)$ nodes

And hence o(n) bits for pointers
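The space bound for pointers can be worked out as follows. This is a sketch that assumes each inter-block pointer takes $\Theta(\log n)$ bits, a width the slides do not state explicitly:

```latex
\[
\underbrace{O\!\left(\frac{n}{N_{\min}}\right)}_{\text{number of pointers}}
\cdot \underbrace{\Theta(\log n)}_{\text{bits per pointer}}
= O\!\left(\frac{n \log n}{\log^{2} n}\right)
= O\!\left(\frac{n}{\log n}\right)
= o(n) \text{ bits.}
\]
```

The same calculation with $N_{\min} = \Theta(\log n \cdot f(n))$ gives $O(n / f(n))$ bits, which is $o(n)$ exactly when $f(n) = \omega(1)$.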

Our basic tree representation

We define Nmax (maximum block size) as follows

In case of block overflow we should be able to create a new block of size at least Nmin from the full block

In the worst case, the root of the block has its k children, all of them having a subtree of the same size

By choosing $N_{\max} = \Theta(k \log^2 n)$ we solve this problem
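The worst case above is a pigeonhole argument. As a sketch with explicit constants (the choice $N_{\max} \ge k\,N_{\min} + 1$ is an assumption made here for illustration; the slides only give the asymptotic form): a full block has $N_{\max}$ nodes, its root has at most $k$ children, so the largest child subtree has size $s$ with

```latex
\[
s \;\ge\; \frac{N_{\max} - 1}{k} \;\ge\; \frac{k\,N_{\min}}{k} \;=\; N_{\min},
\qquad\text{where } N_{\min} = \Theta(\log^{2} n),\; N_{\max} = \Theta(k \log^{2} n).
\]
```

Hence some subtree hanging from the root is always large enough to form a new block.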

Our basic tree representation

The blocks cannot be as small as we would like

We support dynamic operations on the tree by:
- Dividing the tree into blocks (we only need to rebuild a block upon updates)
- Making these smaller trees dynamic (unlike other approaches)

We represent the blocks using a dynamic DFUDS representation on top of the dynamic balanced parentheses of Chan et al. [TALG 2007]
- We solve the basic navigation inside blocks in O(log N) = O(log k + loglog n) time
- Insertions can also be handled in the same time
- We require 2n + o(n) bits overall

Representing the blocks

We represent the symbols Sp labeling the arcs of the trie with a data structure for rank and select [GN, submitted]

We compute childp(x, a) by
- rank and select on Sp
- childp(x, i) on p

childp(x, a) can be computed in $O(\log N \log k / \log\log N) = O\big(\log k\,(\log k + \log\log n) / \log(\log k + \log\log n)\big)$ time

The space requirement is n log k + o(n log k) bits
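To make the reduction from child-by-label to child-by-index concrete, here is a minimal static sketch of DFUDS navigation. It uses linear scans as a naive stand-in for the rank, select, and matching-parenthesis structures of the talk, and all function names, as well as the nested-dict input format, are hypothetical.

```python
def build_dfuds(tree):
    """tree: dict mapping a label to a subtree (another dict).
    Returns (parens, labels): the DFUDS sequence and the child labels
    of every node, concatenated in preorder."""
    parens = ["("]  # extra leading parenthesis keeps the sequence balanced
    labels = []

    def visit(node):
        items = sorted(node.items())
        parens.append("(" * len(items) + ")")  # degree in unary
        labels.extend(label for label, _ in items)
        for _, subtree in items:
            visit(subtree)

    visit(tree)
    return "".join(parens), labels


def degree(parens, x):
    """Number of '(' from position x up to the next ')'."""
    return parens.index(")", x) - x


def find_close(parens, i):
    """Position of the ')' matching the '(' at position i (naive scan)."""
    depth = 0
    for j in range(i, len(parens)):
        depth += 1 if parens[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")


def child(parens, x, i):
    """Position of the i-th child (1-based) of the node starting at x."""
    end = x + degree(parens, x)  # the ')' closing the description of x
    return find_close(parens, end - i) + 1


def child_by_label(parens, labels, x, a):
    """Child of x by label a, or None if x has no such child."""
    d = degree(parens, x)
    start = parens.count("(", 1, x)  # rank of '(' before x, skipping the extra one
    block = labels[start:start + d]  # labels of the children of x
    if a not in block:
        return None
    return child(parens, x, block.index(a) + 1)
```

A node is identified by the position where its description starts; the root is at position 1. In the talk's setting the label lookup would be answered by rank and select on Sp instead of the list search used here.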

Representing the frontier of a block

We need to indicate which nodes in a block have a pointer to a child block

This can be done by using a bit vector
- However, this would require 3n + o(n) bits overall for the tree structure

We define array Fp storing the preorders of the nodes having a child pointer
- Since there are $O(n / \log^2 n)$ pointers, this requires o(n) bits

Representing the frontier of a block

Tp: (((())(()))((())))

Fp:

We must change allthe preorders in FP from this position 3 5 8 4

(3) (8) (16) (20) 3 6 8 4(3) (9) (17) (21)

O(log N) time

Array Fp is represented in differential form with a data structure for Searchable Partial Sums
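The effect of the differential form can be illustrated with a small sketch. It uses a Fenwick tree as a simple stand-in for a searchable partial sums structure; unlike the structure the talk relies on, this version has a fixed length and does not support inserting or deleting entries. The class and method names are hypothetical.

```python
class PartialSums:
    """Fixed-length searchable partial sums over positive integers."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, i, delta):
        """Add delta to the i-th difference (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        """Prefix sum of differences 0..i, i.e. the i-th stored preorder."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, target):
        """Smallest index whose prefix sum is >= target (n if none)."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < target:
                pos = nxt
                target -= self.tree[nxt]
            step >>= 1
        return pos


# Differences 3 5 8 4 encode the preorders 3, 8, 16, 20.
fp = PartialSums([3, 5, 8, 4])
# A node inserted before the second frontier node shifts every later
# preorder by one; only a single difference has to change.
fp.add(1, 1)
print([fp.sum(i) for i in range(4)])  # preorders after the update
```

Both the update and the search touch O(log m) entries for m stored differences, which is the point of storing Fp in differential form.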

Representing inter-block pointers

Pointers to child blocks
- We store the pointers to child blocks in array PTRp, sorted in increasing order of the preorders of the nodes in the frontier

Pointers to the parent block
- In each block p we need a pointer to the representation of the root of p in the parent block
- However, the position of a node changes upon updates
- A parent pointer is composed of
  - A pointer to the parent block q
  - If p is the j-th child of q, then we store value j in p

Representing inter-block pointers

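[Figure: block p with Tp = (((())(()))((()))), its frontier array Fp, and the array PTRp with entries 1 2 3 4; the four child blocks store the parent pointers (p,1), (p,2), (p,3), (p,4)]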

Solving the basic operations

child(x, i):
- Look for the preorder of x in Fp
- If we find it, follow the child pointer to block q and apply childq on the root of q
- Otherwise, use the childp operation
- This takes O(log N) = O(log k + loglog n) time

child(x, a) is solved in the same way, but using childp(x, a) instead

parent(x): if x is the root of a block, follow the parent pointer to block p. Then apply parentp(x)
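The child(x, i) procedure can be sketched as follows. This is an illustration under simplifying assumptions: the hypothetical `Block` class keeps explicit child lists instead of a dynamic DFUDS sequence, nodes are identified by 0-based local preorders (the root of every block is 0), and a binary search over a plain sorted list replaces the partial-sums search on Fp.

```python
from bisect import bisect_left


class Block:
    def __init__(self, children):
        # children: local preorder -> list of local preorders of its children
        self.children = children
        self.frontier = []  # F_p: sorted preorders of nodes with a child block
        self.ptr = []       # PTR_p: child blocks, in the same order as frontier

    def attach(self, preorder, child_block):
        """Record that the frontier node `preorder` continues in child_block."""
        j = bisect_left(self.frontier, preorder)
        self.frontier.insert(j, preorder)
        self.ptr.insert(j, child_block)


def child(block, x, i):
    """i-th child (1-based) of node x in `block`, as a (block, preorder) pair."""
    j = bisect_left(block.frontier, x)
    if j < len(block.frontier) and block.frontier[j] == x:
        # x is in the frontier: it is duplicated as the root of block q.
        block, x = block.ptr[j], 0
    kids = block.children.get(x, [])
    return (block, kids[i - 1]) if 1 <= i <= len(kids) else None


# Block p: root 0 with children 1 and 2; node 2 continues in block q.
p = Block({0: [1, 2]})
q = Block({0: [1]})
p.attach(2, q)
print(child(p, 2, 1) == (q, 1))
```

The parent(x) direction would follow the stored pair (q, j) back to the j-th frontier entry of q, which is why the talk stores j rather than a position that shifts under updates.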

Solving the basic operations

Insert:
- We use the corresponding insertion operation on the block
- When a block p becomes full:
  1. Choose node z in block p
  2. Reinsert the nodes in the subtree of z in a new block q (along with the corresponding part of the frontier of p)
  3. Delete the subtree of z from p
- Total cost is O(log k + loglog n) amortized (if we are able to spend time proportional to the size of the subtree of z)
- List of candidate subtrees in each block (o(n) bits overall)

Solving specialized operations

We can solve other operations by using this representation:
- degree(x)
- depth(x)
- subtree-size(x)
- preorder(x)
- is-ancestor(x, y)
- lca(x, y)

[Figure: node x in a block, together with the array Sizep used for subtree-size(x)]

Conclusions

We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretical lower bound

We can profit from smaller alphabets
- o(log n) time for operations whenever log k = o(log n)
- In particular, O(loglog n) time for k = O(polylog(n))
- Versus the O(log n) time of Chan et al. for any alphabet size

We need extra o(n log k) bits of space
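The polylogarithmic case follows directly from the O(log k + loglog n) bound for the basic operations. As a worked step, writing $k = O(\log^{c} n)$ for a constant $c$ (the symbol $c$ is introduced here for illustration):

```latex
\[
\log k = O(c \log\log n) = O(\log\log n)
\quad\Longrightarrow\quad
O(\log k + \log\log n) = O(\log\log n) = o(\log n).
\]
```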

Discussion

What happens if we have external pointers to the tree nodes?

Can we compress the dynamic DFUDS representation of blocks? (just as in [JSS, SODA’07])

Suffix links in little space? (assuming a suffix-closed trie)
