lecture04 range searching
-
Orthogonal Range Searching
Lecture 4, CS 631100
Sheung-Hung [email protected]
Fall 2011, National Tsing Hua University (NTHU)
Lecture 4, CS 631100 Orthogonal Range Searching 1
-
Outline
Reference
  Textbook chapter 5
  Mount's Lectures 17 and 18
Problem: querying a database
Solution in one dimension
Data structure in $\mathbb{R}^2$: range trees
Extension to higher dimensions
log n factor improvement
-
An Example of Application on Database
A database in a bank records transactions
A query: find all the transactions such that
  The amount is between $1000 and $2000
  It happened between 10:40am and 11:20am
-
An Example of Application on Database (continued)
Geometric interpretation: each transaction is a point (time, amount) in the plane, and the query is an axis-parallel rectangle
-
Query problems
Assume n is the total number of transactions in the database
We will show how to build a data structure in O(n log n) time that answers this type of query in O(k + log n) time, where k is the size of the output (the number of transactions that are reported)
The data structure is built only once, then a large number of queries can be answered quickly
O(n log n) is the preprocessing time
O(k + log n) is the query time
-
Boxes
2D box
  Also known as a rectangle
  Parallel to the coordinate axes
3D box: $[0, 3] \times [0, 2.5] \times [0, 2]$
Generalizes to any dimension
Algorithmic problems with boxes are relatively easy
-
Problem statement
Let $P$ be a set of $n$ points in $\mathbb{R}^d$
We assume $d = O(1)$
Preprocess $P$ so as to answer queries of the type
  Input: $(a_1, b_1, a_2, b_2, \dots, a_d, b_d)$
  Output: $P \cap ([a_1, b_1] \times [a_2, b_2] \times \dots \times [a_d, b_d])$
We denote $k = |P \cap ([a_1, b_1] \times [a_2, b_2] \times \dots \times [a_d, b_d])|$
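As a reference point for the problem statement, the query can be answered without any preprocessing by scanning all of P. The sketch below is only a baseline under assumptions: the function name `range_query_naive` and the sample transactions are made up for illustration, and each query costs O(dn) instead of the bounds targeted in this lecture.

```python
def range_query_naive(points, box):
    """Return every point p with a_i <= p[i] <= b_i for all i.

    points: list of d-tuples; box: list of d pairs (a_i, b_i).
    Linear scan, so each query costs O(d * n).
    """
    return [p for p in points
            if all(lo <= c <= hi for c, (lo, hi) in zip(p, box))]


# Hypothetical transactions: (amount in dollars, minutes since midnight).
transactions = [(1500.0, 650), (900.0, 655), (1999.0, 681), (1200.0, 600)]
# Amount in [1000, 2000], time in [10:40am, 11:20am] = [640, 680] minutes.
hits = range_query_naive(transactions, [(1000, 2000), (640, 680)])
```

Here `len(hits)` plays the role of k. The structures on the following slides must return the same set of points, which makes this scan a convenient correctness check.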
-
One Dimensional Case (d=1): Using BBST
-
Problem statement
P is a set of real numbers
Queries: find all the points in P that are between a and b
Data structure: Balanced Binary Search Tree (BBST)
  Preprocessing time: $\Theta(n \log n)$ time to build a BBST
  Space usage: $\Theta(n)$
  Query time: $\Theta(k + \log n)$ time. How?
-
Answering a query
Algorithm Report(T, a, b)
Input: a BBST T storing P, an interval [a, b]
Output: $P \cap [a, b]$
1. if T = NULL
2.   then return
3. x ← value stored at the root of T
4. if a ...
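The pseudocode above breaks off after step 4. A minimal Python sketch of one standard way such a recursive report can be written follows. It is an assumption about how the algorithm continues, not a transcription of the slide. The names `Node`, `build_balanced` and `report` are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def build_balanced(sorted_keys):
    """Build a balanced BST from an already sorted list of keys."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build_balanced(sorted_keys[:mid]),
                build_balanced(sorted_keys[mid + 1:]))


def report(t, a, b, out):
    """Append to `out`, in sorted order, every key of subtree t in [a, b]."""
    if t is None:
        return
    x = t.key
    if a <= x:              # the left subtree may still hold keys >= a
        report(t.left, a, b, out)
    if a <= x <= b:         # the root itself lies in the interval
        out.append(x)
    if x <= b:              # the right subtree may still hold keys <= b
        report(t.right, a, b, out)


keys = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105]
tree = build_balanced(keys)
found = []
report(tree, 18, 77, found)
```

The recursion only enters a subtree that can still contain keys of [a, b], so it visits the two search paths plus the subtrees whose keys are all reported.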
-
Analysis of query time
Report the left path, the right path, $v_{split}$ and the subtrees in between.
Length of
  the path from the root to $v_{split}$
  the left path
  the right path
All lengths are O(log n)
Sum of the sizes of the red subtrees (the subtrees hanging between the two paths): at most k
Query time: O(k + log n)
-
Two Dimensional Case (d=2): Using range tree
-
Introduction
A set $P$ of $n$ points in $\mathbb{R}^2$
Query: given $(a_1, b_1, a_2, b_2)$, find all points $(x, y)$ from $P$ in the rectangle $[a_1, b_1] \times [a_2, b_2]$.
Results presented in this section
  $\Theta(n \log n)$ preprocessing time
  $\Theta(n \log n)$ space usage
  $\Theta(k + \log^2 n)$ query time
Query time will be slightly improved in the last section
-
Canonical sets
First store P in a BBST T using the x-coordinates as keys
We associate each node v of T with a canonical set $C_v$ containing the points of P stored in the subtree rooted at v.
-
Range trees in $\mathbb{R}^2$
Each canonical set $C_v$ is stored in a BBST $T_v$ using the y-coordinates as keys.
$T_v$ is called the canonical tree at node v.
We make the query in TWO steps: first on the x-coordinates, then on the y-coordinates (as shown in the following slides).
-
Step 1: Querying x-coordinates
First make the query with range $[a_1, b_1]$ on the x-coordinates
Let $P' = P \cap ([a_1, b_1] \times (-\infty, \infty))$
Let $P''$ be the set of points on the right path and the left path (when searching for $a_1$ and $b_1$)
We partition $P' \setminus P''$ into $c$ canonical subsets
Thus $P' \subseteq P'' \cup C_1 \cup C_2 \cup \dots \cup C_c$ (a point of $P''$ may fall outside $[a_1, b_1]$, so it is checked individually in Step 2)
-
Partitioning $P'$
After we make the query with range $[a_1, b_1]$ on the x-coordinates:
We take the nodes on the left path and the right path, which gives $P''$.
For each node on the left path where the search for $a_1$ goes left, select the canonical tree $T_i$ of its right child (this gives some $C_i$).
For each node on the right path where the search for $b_1$ goes right, select the canonical tree $T_i$ of its left child (this gives some $C_i$).
It takes O(log n) time (the height of the BBST).
There are c = O(log n) canonical sets in our partition.
-
Step 2: Querying y-coordinates
For every $p \in P''$, check whether $p \in [a_1, b_1] \times [a_2, b_2]$, and report it if it is.
For all $i$, use the interval $[a_2, b_2]$ to perform a 1-dim. search query in $C_i$ using the canonical tree $T_i$.
The union of all these results gives $P \cap ([a_1, b_1] \times [a_2, b_2])$
Analysis of query time:
Let $k_i$ = no. of points reported from $T_i$; then $\sum_{i=1}^{c} k_i \le k$
Query time:
$$\sum_{i=1}^{c} O(\log n + k_i) = O\Big(c \log n + \sum_{i=1}^{c} k_i\Big) = O(\log^2 n + k)$$
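The two query steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the lecture's implementation: each canonical tree $T_v$ is replaced by a list sorted by y and searched with `bisect` (same O(log n + k_i) cost per canonical set), every node of T stores one point, and the names `RangeTreeNode`, `build`, `report_canonical` and `query` are hypothetical. The build uses a plain sort at every node, so it runs in O(n log^2 n).

```python
from bisect import bisect_left, bisect_right


class RangeTreeNode:
    """Node of the x-tree T: one point plus its canonical set sorted by y."""

    def __init__(self, pts_by_x):
        # pts_by_x: non-empty list of (x, y) tuples sorted by x
        mid = len(pts_by_x) // 2
        self.point = pts_by_x[mid]
        self.left = RangeTreeNode(pts_by_x[:mid]) if mid > 0 else None
        self.right = (RangeTreeNode(pts_by_x[mid + 1:])
                      if mid + 1 < len(pts_by_x) else None)
        self.by_y = sorted(pts_by_x, key=lambda p: p[1])  # stands in for T_v
        self.ys = [p[1] for p in self.by_y]


def build(points):
    pts = sorted(points)
    return RangeTreeNode(pts) if pts else None


def report_canonical(node, a2, b2, out):
    """Step 2: 1-dim. query with [a2, b2] on one canonical set."""
    if node is not None:
        lo = bisect_left(node.ys, a2)
        hi = bisect_right(node.ys, b2)
        out.extend(node.by_y[lo:hi])


def query(root, a1, b1, a2, b2):
    out = []

    def check(p):  # path nodes are tested one by one
        if a1 <= p[0] <= b1 and a2 <= p[1] <= b2:
            out.append(p)

    v = root  # walk down to the split node
    while v is not None and not (a1 <= v.point[0] <= b1):
        v = v.left if b1 < v.point[0] else v.right
    if v is None:
        return out
    check(v.point)
    u = v.left  # left path: search for a1
    while u is not None:
        if a1 <= u.point[0]:
            check(u.point)
            report_canonical(u.right, a2, b2, out)
            u = u.left
        else:
            u = u.right
    w = v.right  # right path: search for b1
    while w is not None:
        if w.point[0] <= b1:
            check(w.point)
            report_canonical(w.left, a2, b2, out)
            w = w.right
        else:
            w = w.left
    return out
```

A call such as `query(build(points), a1, b1, a2, b2)` returns the points in the rectangle in no particular order. Each of the O(log n) selected canonical sets costs one binary search, which is where the $\log^2 n$ term comes from.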
-
Analysis of total query time
[Figure: a canonical set $C_i$ and its canonical tree $T_i$]
Query on x-coordinates on T: obtain $P''$ (points on the left and right paths) and the canonical trees $T_i$. It takes O(log n) time.
Query on y-coordinates on the $T_i$: it takes $O(\log^2 n + k)$ (refer to the previous slide).
Total query time = $O(\log n) + O(\log^2 n + k) = O(\log^2 n + k)$.
-
Space complexity (Proof 1)
A point p belongs to all the canonical sets on the path from the vertex of T that stores p to the root (and only these canonical sets)
Thus p lies in O(log n) canonical sets
Hence
$$\sum_{v \in T} |C_v| = O(n \log n),$$
where $C_v$ = the canonical set at node v.
The memory space used is O(n log n).
Actually, it is $\Theta(n \log n)$. Why?
-
Space complexity (Proof 2)
[Figure: the tree T with its canonical trees $T_i$, drawn level by level over log n levels]
Top level: 1 canonical set of size n, total n
Next level: 2 canonical sets of size n/2, total 2(n/2) = n
Next level: 4 canonical sets of size n/4, total 4(n/4) = n
...
Last level: n canonical sets of size 1, total n(1) = n
Total = $\Theta(n \log n)$
-
Preprocessing time
$T_v$ can be built in $O(|C_v| \log |C_v|)$ time
Hence the range tree can be built in time
$$\sum_{v} |C_v| \log |C_v| \le \log n \sum_{v} |C_v| = \log n \cdot O(n \log n) = O(n \log^2 n)$$
We can do better ...
Compute the $T_v$'s from leaves to root
Computing $T_v$ is merging two sorted sequences
It takes $O(|C_v|)$ time
Overall, we can build the range tree in time
$$\sum_{v} |C_v| = \Theta(n \log n)$$
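The leaves-to-root construction can be sketched as below. The assumptions are mine: every node stores one point, the canonical structure is kept as a y-sorted list rather than a BBST (a balanced tree can be built from a sorted list in linear time), nodes are plain dictionaries, and the function names are illustrative. `heapq.merge` merges the already sorted child lists, so the work at node v is $O(|C_v|)$.

```python
from heapq import merge
from operator import itemgetter


def build_bottom_up(pts_by_x):
    """pts_by_x: list of (x, y) sorted by x. Returns a dict node or None."""
    if not pts_by_x:
        return None
    mid = len(pts_by_x) // 2
    left = build_bottom_up(pts_by_x[:mid])
    right = build_bottom_up(pts_by_x[mid + 1:])
    # Merge the children's y-sorted lists with the node's own point
    # instead of sorting the canonical set from scratch.
    by_y = list(merge(left["by_y"] if left else [],
                      [pts_by_x[mid]],
                      right["by_y"] if right else [],
                      key=itemgetter(1)))
    return {"point": pts_by_x[mid], "left": left, "right": right,
            "by_y": by_y}


def total_size(node):
    """Sum of |C_v| over all nodes v, i.e. the space used by the lists."""
    if node is None:
        return 0
    return (len(node["by_y"]) + total_size(node["left"])
            + total_size(node["right"]))
```

The only super-linear step left is the initial sort of the points by x, which is needed once before calling `build_bottom_up`.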
-
Range trees in higher dimensions
-
Idea
We assume d > 1 and d = O(1).
We want to perform range searching in $\mathbb{R}^d$.
We still build T with respect to the $x_1$-coordinate.
For each canonical set of T we build a $(d-1)$-dimensional range searching data structure using coordinates $(x_2, x_3, \dots, x_d)$.
To answer a d-dimensional query
  Find the canonical trees of T associated with $[a_1, b_1]$
  Make a $(d-1)$-dimensional query on each canonical tree recursively, using $[a_2, b_2] \times [a_3, b_3] \times \dots \times [a_d, b_d]$
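The recursion on the dimension can be sketched as follows. This is an illustration under assumptions of mine, not the lecture's code: every tree node stores one point, the structure on the last coordinate is a sorted list searched with `bisect` instead of a BBST, each level is built with a plain sort (so preprocessing is not the optimized bound), and the names `LastLevel`, `TreeNode`, `build` and `query` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class LastLevel:
    """1-dim. structure on the last coordinate: a sorted list."""

    def __init__(self, pts, dim):
        self.pts = sorted(pts, key=lambda p: p[dim])
        self.keys = [p[dim] for p in self.pts]


class TreeNode:
    """Node of the tree on coordinate `dim`, with its associated structure."""

    def __init__(self, pts, dim, d):
        # pts: non-empty list of d-tuples sorted by coordinate `dim`
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = TreeNode(pts[:mid], dim, d) if mid > 0 else None
        self.right = (TreeNode(pts[mid + 1:], dim, d)
                      if mid + 1 < len(pts) else None)
        self.assoc = build(pts, dim + 1, d)  # structure on x_{dim+1..d}


def build(pts, dim, d):
    if not pts:
        return None
    if dim == d - 1:
        return LastLevel(pts, dim)
    return TreeNode(sorted(pts, key=lambda p: p[dim]), dim, d)


def query(s, box, dim, out):
    """Append to `out` the points of s inside box on coordinates dim..d-1."""
    if s is None:
        return
    lo, hi = box[dim]
    if isinstance(s, LastLevel):
        out.extend(s.pts[bisect_left(s.keys, lo):bisect_right(s.keys, hi)])
        return

    def check(p):  # path nodes are tested against the whole box
        if all(a <= c <= b for c, (a, b) in zip(p, box)):
            out.append(p)

    v = s
    while v is not None and not (lo <= v.point[dim] <= hi):
        v = v.left if hi < v.point[dim] else v.right
    if v is None:
        return
    check(v.point)
    u = v.left
    while u is not None:
        if lo <= u.point[dim]:
            check(u.point)
            if u.right is not None:
                query(u.right.assoc, box, dim + 1, out)
            u = u.left
        else:
            u = u.right
    w = v.right
    while w is not None:
        if w.point[dim] <= hi:
            check(w.point)
            if w.left is not None:
                query(w.left.assoc, box, dim + 1, out)
            w = w.right
        else:
            w = w.left
```

A query starts with `query(build(points, 0, d), box, 0, out)`, where `box` is the list of pairs $(a_i, b_i)$ and `out` is an empty list that receives the reported points.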
-
Analysis
Query time: $O(\log^d n + k)$
  Due to d nested levels in the d-dim. range tree, searching for d levels takes $O(\log^d n)$ time.
  Reporting all points inside the query range takes O(k) time.
Space complexity: $O(n \log^{d-1} n)$
  By induction on d (see next slide ...)
Preprocessing time: $O(n \log^{d-1} n)$
  Compute the $T_v$'s from leaves to root
  As the size of the range tree is $O(n \log^{d-1} n)$, building the whole range tree takes $O(n \log^{d-1} n)$.
-
Space complexity (Proof by Induction)
Suppose a $(d-1)$-dim. range tree has size $O(n \log^{d-2} n)$.
[Figure: the tree T with its associated structures $T_i$, drawn level by level over log n levels]
Top level: $O(n \log^{d-2} n)$
Next level: $2 \cdot O(\frac{n}{2} \log^{d-2} \frac{n}{2}) = O(n \log^{d-2} \frac{n}{2})$
Next level: $4 \cdot O(\frac{n}{4} \log^{d-2} \frac{n}{4}) = O(n \log^{d-2} \frac{n}{4})$
...
Last level: $n \cdot O(1) = O(n)$
Then the size of the d-dim. range tree is
$$O(n \log^{d-2} n) + O(n \log^{d-2} \tfrac{n}{2}) + O(n \log^{d-2} \tfrac{n}{4}) + \dots + O(n) = \log n \cdot O(n \log^{d-2} n) = O(n \log^{d-1} n).$$
-
Improved range trees: Fractional cascading
-
Motivation
In $\mathbb{R}^2$ the query time of range trees is $\Theta(k + \log^2 n)$
For comparison-based algorithms, $\Omega(k + \log n)$ is a lower bound.
Can we do better to achieve the lower bound?
Yes, we'll then show how to obtain $\Theta(k + \log n)$ optimal query time.
-
Step 1: Querying x-coordinates (same as before)
Make the query with range $[a_1, b_1]$ on the x-coordinates.
[Figure: canonical sets $C_i$ and $C_j$ hanging off the left and right paths]
Take the nodes on the left path and the right path.
Select the canonical set $C_i$ at the right child of a node on the left path;
select the canonical set $C_j$ at the left child of a node on the right path.
It takes O(log n) time (the height of the BBST T).
Let $\{C_1, C_2, \dots, C_c\}$ = the canonical sets selected, where c = O(log n).
-
Step 2: Querying y-coordinates (Modified)
When processing a query $(a_1, b_1, a_2, b_2)$, we search canonical trees $T_v$, always with the same two keys $a_2$ and $b_2$.
For each such tree, we spend O(log n) searching time.
Main Idea: as $C_{v.left}$ and $C_{v.right}$ are subsets of $C_v$, we keep pointers from the nodes of $T_v$ to the nodes of $T_{v.left}$ and $T_{v.right}$ that hold the same key, or the next larger key.
[Figure: array $A_v$ with pointers into $A_{v.left}$ and $A_{v.right}$]
Thus after performing a search on $a_2$ or $b_2$ in $T_v$, we can perform the search on $a_2$ or $b_2$ in $T_{v.left}$ and $T_{v.right}$ in O(1) time.
-
Step 2: Querying y-coordinates (Modified)
Minor Idea: replace each canonical tree $T_i$ by a canonical array $A_i$ (sorted by y-coordinate) for the canonical set $C_i$:
  Make a search for key $a_2$ in array $A_i$;
  Starting from $a_2$, walk along array $A_i$ until $b_2$ is exceeded.
[Figure: array $A_v$ with pointers into $A_{v.left}$ and $A_{v.right}$]
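The search-then-walk on a canonical array can be sketched in a few lines of Python. The function name `report_from_array` and the sample array are made up for illustration. The list of y-keys is rebuilt on every call here only to keep the example short, while a real structure would store it once at build time.

```python
from bisect import bisect_left


def report_from_array(a_i, a2, b2):
    """a_i: list of (x, y) points sorted by y. Report those with a2 <= y <= b2."""
    ys = [p[1] for p in a_i]
    i = bisect_left(ys, a2)        # first position whose key is >= a2
    out = []
    while i < len(a_i) and a_i[i][1] <= b2:  # walk until b2 is exceeded
        out.append(a_i[i])
        i += 1
    return out


array = [(5, 1), (2, 3), (9, 4), (1, 8), (7, 10)]
inside = report_from_array(array, 3, 8)
```

Only the first position needs a search; the walk itself costs one step per reported point plus one final comparison.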
-
Step 2: Querying y-coordinates (Modified)
First make a binary search for $a_2$ in $A_{root}$, which takes O(log n) time.
[Figure: pointer links from $A_{root}$ down through $A_u$, $A_v$, $A_w$ to the canonical arrays $A_i$, $A_j$ of $C_i$, $C_j$]
By following pointer links, we can locate $a_2$ in a canonical array $A_i$ in O(1) time per link followed.
Starting from $a_2$, walk along array $A_i$ (reporting the points) until $b_2$ is exceeded.
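The pointer links can be sketched in Python as below. This is an illustration under my own assumptions: every node of T stores one point, the arrays are built by a plain sort rather than by merging, and the names `cascade`, `Node`, `build_layered`, `walk` and `query` are hypothetical. For each position i of $A_v$ (plus one past-the-end position), `ptr_left[i]` and `ptr_right[i]` hold the index of the first entry of the child's array whose key is at least `ys[i]`.

```python
from bisect import bisect_left


def cascade(parent_ys, child_ys):
    """ptr[i] = first index j with child_ys[j] >= parent_ys[i]; linear time."""
    ptr, j = [], 0
    for y in parent_ys:
        while j < len(child_ys) and child_ys[j] < y:
            j += 1
        ptr.append(j)
    ptr.append(len(child_ys))      # entry for the past-the-end position
    return ptr


class Node:
    def __init__(self, pts_by_x):
        mid = len(pts_by_x) // 2
        self.point = pts_by_x[mid]
        self.left = Node(pts_by_x[:mid]) if mid > 0 else None
        self.right = (Node(pts_by_x[mid + 1:])
                      if mid + 1 < len(pts_by_x) else None)
        self.by_y = sorted(pts_by_x, key=lambda p: p[1])   # array A_v
        self.ys = [p[1] for p in self.by_y]
        self.ptr_left = cascade(self.ys, self.left.ys if self.left else [])
        self.ptr_right = cascade(self.ys, self.right.ys if self.right else [])


def build_layered(points):
    pts = sorted(points)
    return Node(pts) if pts else None


def walk(node, i, b2, out):
    """Report entries of A_node from position i until b2 is exceeded."""
    while i < len(node.ys) and node.ys[i] <= b2:
        out.append(node.by_y[i])
        i += 1


def query(root, a1, b1, a2, b2):
    out = []
    if root is None:
        return out
    v, i = root, bisect_left(root.ys, a2)   # the only binary search
    while v is not None and not (a1 <= v.point[0] <= b1):
        if b1 < v.point[0]:
            v, i = v.left, v.ptr_left[i]
        else:
            v, i = v.right, v.ptr_right[i]
    if v is None:
        return out
    if a2 <= v.point[1] <= b2:
        out.append(v.point)
    u, iu = v.left, v.ptr_left[i]
    while u is not None:                     # left path
        if a1 <= u.point[0]:
            if a2 <= u.point[1] <= b2:
                out.append(u.point)
            if u.right is not None:
                walk(u.right, u.ptr_right[iu], b2, out)
            u, iu = u.left, u.ptr_left[iu]
        else:
            u, iu = u.right, u.ptr_right[iu]
    w, iw = v.right, v.ptr_right[i]
    while w is not None:                     # right path
        if w.point[0] <= b1:
            if a2 <= w.point[1] <= b2:
                out.append(w.point)
            if w.left is not None:
                walk(w.left, w.ptr_left[iw], b2, out)
            w, iw = w.right, w.ptr_right[iw]
        else:
            w, iw = w.left, w.ptr_left[iw]
    return out
```

The position of $a_2$ is carried down the tree through the stored indices, so each node on the search paths costs O(1) after the single binary search at the root.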
-
Improving d-dim. range trees
Hence we can answer a 2-dim. range query in O(log n + k) optimal time.
This technique is known as fractional cascading.
By induction, it also improves by a factor O(log n) the results in d > 2 (by using canonical arrays at the last level, and the linking pointers).
Hence range trees with fractional cascading in $d \ge 2$ yield
  Query time: $O(k + \log^{d-1} n)$ (improved by an O(log n) factor)
  Space usage: $O(n \log^{d-1} n)$ (same as before)
  Preprocessing time: $O(n \log^{d-1} n)$ (same as before)
-
Remarks on 2-dim. improved range trees
O(log n + k) query time and O(n log n) preprocessing time are optimal.
But the space complexity is NOT optimal.
O(n log n / log log n) space is possible in 2 dimensions with the same query time, and this is optimal.
(not covered in this course)
-
Concluding remarks
Range trees:
  simple
  nearly optimal
Spatial databases mainly use R-trees
  not covered in this course
  good in practice with real data sets
  but no performance guarantee (no good worst-case bound on the query time)
-
Summary of this lecture: Orthogonal Range Searching
  2-dim. range trees
  d-dim. range trees
  Fractional cascading
Next lecture: Segment Trees and Interval Trees
  Segment Trees
  Interval Trees