An Improved Succinct Dynamic k-Ary Tree
Representation (work in progress)
Diego Arroyuelo
Department of Computer Science, Universidad de Chile
Roadmap
Succinct data structures
  Static tree representations
  Dynamic tree representations
Our basic dynamic tree representation
  Representing blocks
  Representing the frontier of blocks
  Representing inter-block pointers
Solving operations
  Basic operations
  Specialized operations
Discussion
Succinct data structures
In a k-ary tree each node has at most k children, each child labeled with a symbol in the set {1,…, k} (as in tries)
A succinct data structure requires space close to the information-theoretic lower bound
There are $\frac{1}{kn+1}\binom{kn+1}{n}$ different k-ary trees with n nodes
Therefore, the information-theoretical lower bound is about $n\log k + n\log e$ bits if k is not a constant with respect to n
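The counting argument can be checked numerically. The sketch below assumes the standard count of k-ary (cardinal) trees with n nodes, $\frac{1}{kn+1}\binom{kn+1}{n}$, and compares the exact bound with the leading-term estimate $n(\log k + \log e)$; the function names are hypothetical and chosen only for this illustration.

```python
from math import comb, e, log2


def count_kary_trees(n: int, k: int) -> int:
    """Number of k-ary (cardinal) trees with n nodes: C(kn+1, n) / (kn+1)."""
    return comb(k * n + 1, n) // (k * n + 1)


def lower_bound_bits(n: int, k: int) -> float:
    """Exact information-theoretic lower bound: log2 of the number of trees."""
    return log2(count_kary_trees(n, k))


def approx_bits(n: int, k: int) -> float:
    """Leading-term estimate n(log k + log e), meaningful when k grows with n."""
    return n * (log2(k) + log2(e))


if __name__ == "__main__":
    for n, k in [(1000, 16), (1000, 256), (10000, 256)]:
        print(n, k, round(lower_bound_bits(n, k)), round(approx_bits(n, k)))
```

For k = 2 the count reduces to the Catalan numbers, which gives a quick sanity check of the formula.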
Succinct data structures
We are interested in succinct representations that can be navigated
We are interested in operations
  parent(x): parent of node x
  child(x, i): ith child of node x
  child(x, a): child of node x by label a
  depth(x)
  degree(x)
  subtree-size(x)
  preorder(x)
  is-ancestor(x, y): is node x an ancestor of node y?
  insertions (assumed to occur at the leaves)
  deletions (just for unary nodes and leaves)
The traditional pointer-based representation of trees requires n log n bits for (almost) each operation it supports
Succinct tree representations
Succinct representations for static trees:
  LOUDS [Jacobson, FOCS’89]
  Balanced Parentheses [MR, STOC’97]
  DFUDS [Benoit et al., Algorithmica 2005]
  xbw [Ferragina et al., FOCS’05]
  Ultra succinct trees [Jansson et al., SODA’07]
These must be rebuilt from scratch upon insertion or deletion of nodes
Succinct tree representations
The case of succinct dynamic trees has been studied only for binary trees
Munro, Raman, and Storm [SODA’01]
  2n + o(n) bits
  parent, child in constant time
  Updates and subtree-size in O(polylog(n)) time
Raman and Rao [ICALP’03]
  2n + o(n) bits
  Parent, child, preorder, and subtree-size in O(1) time
  Updates in $O((\log\log n)^{1+\epsilon})$ amortized time ($O(\log n \log\log n)$ worst case)
k-ary trees: basic navigation in O(k) time (assume k is not a constant)
Dynamic balanced parentheses
Chan et al. [TALG 2007] define a dynamic representation for balanced parentheses
This can be used to represent a dynamic k-ary tree using O(n) bits of space
The time for all operations is related to the number of nodes in the tree rather than to k (O(log n) time)
This data structure cannot take advantage of the case where k is asymptotically smaller than n (e.g., k = O(polylog(n)))
We aim to achieve o(log n) time whenever log k = o(log n)
Motivations
This work is motivated by previous works on LZ-indices
Space-efficient construction of LZ-index [AN, ISAAC’05]
  Very preliminary representation: n log n bits for pointers, child operation and insertions in O(k) worst-case time
LZ-index on disk [AN, CPM’07]
  Basic operations in O(1) CPU time, yet n log n bits are needed for pointers, and it supports neither insertions nor deletions
Our basic tree representation
We incrementally divide the tree into disjoint blocks [MRS, RR, AN]
Every block represents a subtree of N nodes such that
Nmin ≤ N ≤ Nmax
We arrange these blocks in a tree by adding inter-block pointers (the entire tree is a tree of subtrees)
Our basic tree representation
[Figure: a tree divided into blocks, labeling the frontier of a block and the duplicated nodes]
Our basic tree representation
We define Nmin (minimum block size) as follows
Inter-block pointers should require o(n) bits
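Therefore we define $N_{\min} = \Theta(\log^2 n)$ (in general, $N_{\min} = \Theta(\log n \cdot f(n))$, for $f(n) = \omega(1)$)
In this way we have (worst case) one pointer out of $\Theta(\log^2 n)$ nodes
And hence $O(n/\log^2 n)$ pointers of $O(\log n)$ bits each, which is o(n) bits for pointers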
Our basic tree representation
We define Nmax (maximum block size) as follows
In case of block overflow we should be able to create a new block of size at least Nmin from the full block
In the worst case, the root of the block has its k children, all of them having a subtree of the same size
By choosing $N_{\max} = \Theta(k \log^2 n)$ we solve this problem
…
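The choice of block sizes can be illustrated with a little arithmetic. This is a minimal sketch under stated assumptions: the slides fix only the asymptotic orders $\Theta(\log^2 n)$ and $\Theta(k \log^2 n)$, so the constant `c` below, the use of base-2 logarithms, and all function names are hypothetical.

```python
from math import log2


def block_bounds(n: int, k: int, c: int = 2):
    """Return (n_min, n_max) with n_min = log^2 n and n_max = c * k * log^2 n.

    The constant c is an assumption made for this illustration only.
    """
    n_min = int(log2(n) ** 2)
    return n_min, c * k * n_min


def pointer_bits(n: int, n_min: int) -> float:
    """Worst-case pointer space: one (log n)-bit pointer per n_min nodes."""
    return (n / n_min) * log2(n)


def worst_case_split_size(n_max: int, k: int) -> int:
    """Size of each root subtree when a full block's root has k equal children."""
    return (n_max - 1) // k


if __name__ == "__main__":
    n, k = 2 ** 20, 16
    n_min, n_max = block_bounds(n, k)
    print(n_min, n_max)                         # block size range
    print(pointer_bits(n, n_min) / n)           # pointer bits per node, about 1 / log n
    print(worst_case_split_size(n_max, k) >= n_min)
```

With these assumed constants, even the worst case (a root with k equally sized subtrees) leaves a subtree of at least n_min nodes to move into a new block.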
Our basic tree representation
The blocks cannot be as small as we would like
We support dynamic operations on the tree by:
Dividing the tree into blocks (we only need to rebuild a block upon updates)
Making these smaller trees dynamic (unlike other approaches)
We represent the blocks using a dynamic DFUDS representation on top of Chan et al.’s data structure [TALG 2007]
We solve the basic navigation inside blocks in O(log N) = O(log k + loglog n) time
Insertions can also be handled in the same time
We require overall 2n+o(n) bits
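As a reminder of what a DFUDS sequence looks like, here is a minimal static sketch: one extra opening parenthesis, then, for each node in preorder, one "(" per child followed by a ")". It only builds the 2n-parenthesis string; the dynamic machinery of Chan et al. is not modeled, and the dictionary-based tree input is an assumption made for this illustration.

```python
def dfuds(children, root):
    """Return the DFUDS parentheses sequence of the tree rooted at `root`.

    `children` maps a node to the ordered list of its children; nodes that
    are missing from the mapping are treated as leaves.
    """
    out = ["("]                 # extra leading parenthesis keeps the sequence balanced
    stack = [root]
    while stack:                # iterative preorder traversal
        node = stack.pop()
        kids = children.get(node, [])
        out.append("(" * len(kids) + ")")   # degree in unary, then a closing parenthesis
        stack.extend(reversed(kids))        # leftmost child is visited first
    return "".join(out)


if __name__ == "__main__":
    tree = {1: [2, 3, 6], 3: [4, 5], 6: [7, 8, 9]}
    print(dfuds(tree, 1))
```

A tree with n nodes always yields exactly 2n parentheses, which is where the 2n + o(n) bits for the tree structure come from.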
Representing the blocks
We represent the symbols Sp labeling the arcs of the trie with a data structure for rank and select [GN, submitted]
We compute childp(x, a) by
  rank and select on Sp
  childp(x, i) on p
childp(x, a) can be computed in $O(\log N \log k / \log\log N) = O(\log k\,(\log k + \log\log n) / \log(\log k + \log\log n))$ time
The space requirement is nlog k + o(nlog k) bits
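The reduction from child by label to child by rank can be sketched with naive rank and select. The layout assumed here (the labels of a node's children stored contiguously in the label sequence, starting at a known position) and all function names are assumptions made for illustration; the linear-time scans stand in for the dynamic rank/select structure cited on the slide.

```python
def rank(seq, a, i):
    """Number of occurrences of symbol a in seq[0:i]."""
    return seq[:i].count(a)


def select(seq, a, j):
    """Position of the j-th occurrence (1-based) of a in seq, or -1 if absent."""
    seen = 0
    for pos, symbol in enumerate(seq):
        if symbol == a:
            seen += 1
            if seen == j:
                return pos
    return -1


def child_rank_by_label(labels, start, degree, a):
    """Return i such that the child labeled a is the i-th child, or None.

    labels[start:start + degree] are assumed to hold the labels of the
    children of the node, in order.
    """
    before = rank(labels, a, start)          # occurrences of a before this node's labels
    pos = select(labels, a, before + 1)      # next occurrence of a at or after start
    if pos == -1 or pos >= start + degree:
        return None                          # no child of this node is labeled a
    return pos - start + 1


if __name__ == "__main__":
    labels = list("abcabac")   # root: a b c; second node: a b; third node: a c
    print(child_rank_by_label(labels, 3, 2, "b"))
    print(child_rank_by_label(labels, 3, 2, "c"))
```

Once i is known, the child itself is obtained with the rank-based child operation on the block.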
Representing the frontier of a block
We need to indicate which nodes in a block have a pointer to a child block
This can be done by using a bit vector
  However, this would require 3n+o(n) bits overall for the tree structure
We define array Fp storing the preorders of the nodes having a child pointer
  Since there are $O(n/\log^2 n)$ pointers, this requires o(n) bits
Representing the frontier of a block
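Tp: (((())(()))((())))
Fp (differential form, with the actual preorders in parentheses):
  before the insertion: 3 5 8 4   (3) (8) (16) (20)
  after the insertion:  3 6 8 4   (3) (9) (17) (21)
Upon an insertion we must change all the preorders in Fp from the affected position onward
Array Fp is represented in differential form with a data structure for Searchable Partial Sums, so the change takes O(log N) time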
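To see why the differential form helps, here is a minimal sketch of searchable partial sums using a Fenwick (binary indexed) tree. The Fenwick tree is a stand-in chosen for brevity, not necessarily the structure used in the talk, and the class and method names are hypothetical. Shifting every preorder from some position onward becomes a single update of one stored difference.

```python
class PartialSums:
    """Searchable partial sums over positive integers (Fenwick tree)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.update(i, v)

    def update(self, i, delta):
        """Add delta to the i-th stored difference (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        """Sum of differences 0..i, i.e. the preorder stored at position i."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, target):
        """Smallest position whose prefix sum is >= target (n if there is none)."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < target:
                pos = nxt
                target -= self.tree[nxt]
            step >>= 1
        return pos


if __name__ == "__main__":
    fp = PartialSums([3, 5, 8, 4])                 # preorders 3, 8, 16, 20
    fp.update(1, 1)                                # one node inserted before the second entry
    print([fp.prefix_sum(i) for i in range(4)])    # preorders 3, 9, 17, 21
```

Both `update` and `search` touch O(log m) cells for an array of m entries, which matches the O(log N) bound quoted above when m is at most N.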
Representing inter-block pointers
Pointers to child blocks
  We store the pointers to child blocks in array PTRp, sorted increasingly according to the preorders of the nodes in the frontier
Pointers to parent block
  In each block p we need a pointer to the representation of the root of p in the parent block
  However, the position of a node changes upon updates
  A parent pointer is composed of
    A pointer to the parent block q
    If p is the j-th child of q, then we store value j in p
Representing inter-block pointers
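[Figure: block p with Tp = (((())(()))((()))), its frontier array Fp, and PTRp = 1 2 3 4; the four child blocks store the parent pointers (p,1), (p,2), (p,3), (p,4)]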
Solving the basic operations
child(x, i):
  Look for the preorder of x in Fp
  If we find it, follow the child pointer to block q and apply childq on the root of q
  Otherwise, use the childp operation
  This takes O(log N) = O(log k + loglog n) time
child(x, a) is solved in the same way, but using childp(x, a) instead
parent(x): if x is the root of a block, follow the parent pointer to block p; then apply parentp(x)
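The child(x, i) procedure can be sketched on a toy model of blocks. Everything here is an assumption made for illustration: the `Block` class, the dictionary that stands in for the succinct block representation, and the convention that a block's root has local preorder 0. A binary search over a sorted list plays the role of the lookup in Fp.

```python
from bisect import bisect_left

ROOT = 0  # assumed local preorder of a block's root inside its own block


class Block:
    """Toy block: explicit child lists instead of a succinct encoding."""

    def __init__(self, local_children, frontier=(), ptr=()):
        self.local_children = local_children  # local preorder -> child preorders
        self.frontier = list(frontier)        # sorted preorders with a child block
        self.ptr = list(ptr)                  # ptr[j] is the block hanging from frontier[j]


def child(block, x, i):
    """Return (block, preorder) of the i-th child of node x, or None."""
    j = bisect_left(block.frontier, x)
    if j < len(block.frontier) and block.frontier[j] == x:
        # x is a frontier node: its children live in the child block,
        # where x is duplicated as the root.
        block, x = block.ptr[j], ROOT
    kids = block.local_children.get(x, [])
    return (block, kids[i - 1]) if 1 <= i <= len(kids) else None


if __name__ == "__main__":
    q = Block({0: [1, 2]})
    p = Block({0: [1, 2]}, frontier=[2], ptr=[q])
    print(child(p, 0, 2) == (p, 2))   # stays inside block p
    print(child(p, 2, 1) == (q, 1))   # crosses into block q
```

The parent operation is symmetric: at the root of a block, the stored pair (parent block, j) identifies the frontier entry to return to.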
Solving the basic operations
Insert:
  We use the corresponding insertion operation on the block
  When a block p becomes full
    1. Choose node z in block p
    2. Reinsert the nodes in the subtree of z in a new block q (along with the corresponding part in the frontier of p)
    3. Delete the subtree of z from p
  Total cost is O(log k + loglog n) amortized (if we are able to spend time proportional to the size of the subtree of z)
  List of candidate subtrees in each block (o(n) bits overall)
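The three overflow steps can be sketched on an explicit tree. This is only an illustration under assumptions: the selection rule (smallest subtree with at least n_min nodes), the dictionary representation, and the function names are hypothetical, and the sketch recomputes subtree sizes from scratch instead of maintaining the list of candidates mentioned above. Node z is kept in the old block as a frontier leaf and becomes the root of the new block, in line with the duplicated nodes shown earlier.

```python
def subtree_sizes(children, root):
    """Map every node to the size of its subtree (iterative post-order)."""
    sizes = {}
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        kids = children.get(node, [])
        if done:
            sizes[node] = 1 + sum(sizes[c] for c in kids)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in kids)
    return sizes


def choose_split_node(children, root, n_min):
    """Step 1: pick a non-root node z whose subtree has at least n_min nodes."""
    sizes = subtree_sizes(children, root)
    candidates = [z for z in sizes if z != root and sizes[z] >= n_min]
    return min(candidates, key=lambda z: (sizes[z], z)) if candidates else None


def split(children, z):
    """Steps 2 and 3: move the subtree below z into a new block.

    Returns the child lists of the new block; `children` is modified in place
    so that z becomes a leaf of the old block.
    """
    new_block = {}
    stack = [z]
    while stack:
        node = stack.pop()
        kids = children.pop(node, [])
        if kids:
            new_block[node] = kids
            stack.extend(kids)
    return new_block


if __name__ == "__main__":
    block = {1: [2, 3], 2: [4, 5], 3: [6], 6: [7, 8, 9]}
    z = choose_split_node(block, 1, 4)
    print(z, split(block, z), block)
```

Moving the subtree costs time proportional to its size, which is the condition stated for the amortized bound.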
Solving specialized operations
We can solve other operations by using this representation
  degree(x)
  depth(x)
  subtree-size(x)
[Figure labels: node x, Sizep]
Solving specialized operations
We can solve other operations by using this representation
  preorder(x)
  is-ancestor(x, y)
  lca(x, y)
Conclusions
We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretical lower bound
We can profit from smaller alphabets
  o(log n) time for operations whenever log k = o(log n)
  In particular, O(loglog n) time for k = O(polylog(n))
  Versus O(log n) time of Chan et al. for any alphabet size
We need extra o(nlog k) bits of space
Discussion
What happens if we have external pointers to the tree nodes?
Can we compress the dynamic DFUDS representation of blocks? (just as in [JSS, SODA’07])
Suffix links in little space? (assuming a suffix-closed trie)