Generalized Search Trees

J. M. Hellerstein, J. F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. on VLDB, Sep. 1995. Presented by Ihab Ilyas.

Page 1: Generalized Search Trees

Generalized Search Trees

J. M. Hellerstein, J. F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. on VLDB, Sep. 1995

Presented by Ihab Ilyas

Page 2: Generalized Search Trees

Topics

Motivation
Database Search Trees
Generalized Search Tree
Properties
Methods
Applications

Page 3: Generalized Search Trees

Motivation

New applications (multimedia, CAD tools, document libraries, etc.)

New data types

Extending search trees for maximum flexibility

Page 4: Generalized Search Trees

Before GiST

Specialized Search Trees
    Example: spatial search trees (R-trees)
    Problem: each new application requires a new tree structure built from scratch.

Search Trees for Extensible Data Types
    Example: extending B+-trees to index any ordinal data
    Problem: this extends the data types supported, but not the set of queries supported.

Page 5: Generalized Search Trees

GiST

A third direction for extending search trees

Extensible both in data types supported and in the queries applied on this data.

Allows new data types to be indexed in a manner that supports the queries natural to the data type.

Page 6: Generalized Search Trees

GiST (Cont.)

Unifies previously disparate structures for currently common data types.
    Example: B+-trees and R-trees can both be implemented as extensions of GiST.

Provides a single code base for indexing multiple dissimilar applications.

Page 7: Generalized Search Trees

Database Search Trees

[Figure: rough canonical picture of a database search tree. Internal nodes hold keys (Key1, Key2, …); the leaf nodes at the bottom level form a linked list.]

Page 8: Generalized Search Trees

Search Trees (cont.)

Search Key: a search key may be any arbitrary predicate that holds for each datum below the key.

Search Tree: a hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy.

Page 9: Generalized Search Trees

Generalized Search Tree

Definition: A GiST is a balanced multi-way tree of variable fan-out between kM and M, where k is the minimum fill factor, 2/M ≤ k ≤ 1/2.

The root node is the exception: its fan-out can range from 2 to M.

Page 10: Generalized Search Trees

GiST (Cont.)

Leaf nodes hold entries (p, ptr):
    p: predicate used as a search key.
    ptr: the identifier of some tuple in the database.

Non-leaf nodes hold entries (p, ptr):
    p: predicate used as a search key.
    ptr: pointer to another tree node.
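The entry layout above can be sketched in a few lines of Python. This is only an illustration: the class names `Entry` and `Node`, the use of `(lo, hi)` tuples as predicates, and the string tuple identifiers are assumptions made for the example, not part of the presentation.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Entry:
    p: Any    # predicate used as the search key
    ptr: Any  # tuple identifier (leaf) or child Node (non-leaf)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


# Hypothetical two-level tree; predicates are half-open intervals [lo, hi).
leaf_a = Node(True, [Entry((1, 2), "tid-1"), Entry((4, 5), "tid-4")])
leaf_b = Node(True, [Entry((7, 8), "tid-7"), Entry((9, 10), "tid-9")])
root = Node(False, [Entry((1, 5), leaf_a), Entry((7, 10), leaf_b)])
```

Leaf and non-leaf entries share one shape; only the meaning of `ptr` differs, which is why a single `Entry` type suffices here.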

Page 11: Generalized Search Trees

Properties

Every node contains between kM and M index entries unless it is the root.
For each index entry (p, ptr) in a leaf node, p holds for the tuple identified by ptr.
For each index entry (p, ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr.
All leaves appear on the same level.
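The two structural properties (fan-out bounds and balanced leaves) can be checked mechanically. The sketch below is an assumption-laden illustration: `Node`, `leaf_depths` and `fanout_ok` are hypothetical names, predicates are opaque strings, and the predicate properties are not checked because they depend on the key domain.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    is_leaf: bool
    entries: List[tuple]  # (p, ptr) pairs; ptr is a child Node on non-leaf nodes


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur (one value if balanced)."""
    if node.is_leaf:
        return {depth}
    depths = set()
    for _p, child in node.entries:
        depths |= leaf_depths(child, depth + 1)
    return depths


def fanout_ok(node, M, k, is_root=True):
    """Check kM <= entries <= M on every node; a non-leaf root needs 2..M."""
    n = len(node.entries)
    if is_root:
        ok = n <= M and (node.is_leaf or n >= 2)
    else:
        ok = math.ceil(k * M) <= n <= M
    if node.is_leaf:
        return ok
    return ok and all(fanout_ok(child, M, k, False) for _p, child in node.entries)


leaf1 = Node(True, [("p1", "t1"), ("p2", "t2")])
leaf2 = Node(True, [("p3", "t3"), ("p4", "t4")])
root = Node(False, [("pa", leaf1), ("pb", leaf2)])
```

With M = 4 and k = 0.5, every non-root node here must hold between 2 and 4 entries, which both leaves do.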

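Page 12: Generalized Search Trees

Note on Properties

[Figure: a non-leaf entry (p, ptr) points to a node containing the entry (p’, ptr’), which in turn points to a leaf containing the entries (p1, ptr1) and (p2, ptr2).]

p holds for the tuples under (p1, ptr1) and (p2, ptr2).

p’ holds for the tuples under (p1, ptr1) and (p2, ptr2).

p’ → p is not required.

This allows keys at different levels to use orthogonal classifications. Recall that an R-tree, in contrast, arranges its bounding boxes in a containment hierarchy, so p’ → p always holds there.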

Page 13: Generalized Search Trees

GiST Methods

Key Methods: the methods the user can specify to configure the GiST. They encapsulate the structure and behavior of the object class used for keys in the tree.

Tree Methods: provided by the GiST; they may invoke the required key methods.

Page 14: Generalized Search Trees

Key Methods

E is an entry of the form (p, ptr), q is a query, and P is a set of entries.

Consistent(E, q): returns false if p ∧ q is guaranteed unsatisfiable, true otherwise.
Union(P): returns a predicate r that holds for all tuples stored below the entries in P, i.e. (p1 ∨ … ∨ pn) → r.
Compress(E): returns (p’, ptr), where p’ is a compressed representation of p.
Decompress(E): given a compressed entry, returns (r, ptr) where p → r. The compression may be lossy, since p ↔ r is not required.

Page 15: Generalized Search Trees

Key Methods (Cont.)

Penalty(E1, E2): returns a domain-specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric represents the increase in size from E1.p to Union({E1, E2}).

PickSplit(P): given a set P of M+1 entries, splits P into two sets of entries P1 and P2, each of size at least kM. The choice of the minimum fill factor is controlled here.
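One way to picture the six key methods is as an abstract interface that each key class implements. The sketch below assumes Python names (`KeyMethods`, `pick_split`, and so on) that do not appear in the presentation, and it assumes that Compress and Decompress default to the identity for key classes that need no compression.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple

Entry = Tuple[Any, Any]  # (p, ptr)


class KeyMethods(ABC):
    """Hypothetical interface a key class supplies to configure a GiST."""

    @abstractmethod
    def consistent(self, entry: Entry, query: Any) -> bool:
        """False only if entry.p AND query is guaranteed unsatisfiable."""

    @abstractmethod
    def union(self, entries: List[Entry]) -> Any:
        """Predicate that holds for everything below the given entries."""

    @abstractmethod
    def penalty(self, e1: Entry, e2: Entry) -> float:
        """Domain-specific cost of inserting e2 under e1."""

    @abstractmethod
    def pick_split(self, entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
        """Split M+1 entries into two groups of at least kM entries each."""

    def compress(self, entry: Entry) -> Entry:
        # Assumed default: no compression.
        return entry

    def decompress(self, entry: Entry) -> Entry:
        # Assumed default: identity, so p -> r holds trivially.
        return entry
```

The tree methods would then be written once against this interface, and a B+-tree or R-tree becomes a concrete subclass.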

Page 16: Generalized Search Trees

Tree Methods

Search: controlled by the Consistent method.
Insert: controlled by the Penalty and PickSplit methods.
Delete: controlled by the Consistent method.
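A minimal sketch of how Search can be driven purely by Consistent follows. The names `Node`, `search` and `overlaps` are assumptions for illustration; predicates are half-open intervals `(lo, hi)` and the query is an interval as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    is_leaf: bool
    entries: List[tuple]  # (p, ptr) pairs


def overlaps(entry, q):
    """Consistent for interval keys: can [lo, hi) and the query range intersect?"""
    (lo, hi), _ptr = entry
    return lo < q[1] and q[0] < hi


def search(node, q, consistent):
    """Descend into every entry whose predicate is consistent with q."""
    results = []
    for entry in node.entries:
        if not consistent(entry, q):
            continue  # guaranteed unsatisfiable: prune this subtree
        _p, ptr = entry
        if node.is_leaf:
            results.append(ptr)
        else:
            results.extend(search(ptr, q, consistent))
    return results


leaf_a = Node(True, [((1, 2), "t1"), ((4, 5), "t4")])
leaf_b = Node(True, [((7, 8), "t7"), ((9, 10), "t9")])
root = Node(False, [((1, 5), leaf_a), ((7, 10), leaf_b)])
```

Unlike a B+-tree lookup, this search may follow several paths, because more than one entry on a node can be consistent with the query.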

Page 17: Generalized Search Trees

Example

[Figure: inserting a new entry (q, ptr). Starting from the root R, the insertion descends at each level into the entry with the smaller penalty (m < n at the first level, j < i at the next). The leaf reached is full, so it is split according to PickSplit and (q, ptr) is stored in one of the resulting nodes.]
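The descent step of this example can be sketched as follows. Everything named here (`Node`, `growth`, `choose_leaf`) is hypothetical, predicates are intervals `(lo, hi)`, and the split and the upward key adjustment that a full Insert would perform are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    is_leaf: bool
    entries: List[tuple]  # (p, ptr) pairs; p is an interval (lo, hi) here


def growth(entry, new_entry):
    """Assumed penalty: how far the interval must stretch to cover the new key."""
    (lo, hi), _ = entry
    (new_lo, new_hi), _ = new_entry
    return max(new_hi - hi, 0) + max(lo - new_lo, 0)


def choose_leaf(root, new_entry, penalty):
    """Follow the minimum-penalty entry at each level; return the path taken."""
    node = root
    path = [node]
    while not node.is_leaf:
        _p, child = min(node.entries, key=lambda e: penalty(e, new_entry))
        node = child
        path.append(node)
    return path


leaf_a = Node(True, [((1, 2), "t1"), ((4, 5), "t4")])
leaf_b = Node(True, [((7, 8), "t7"), ((9, 10), "t9")])
root = Node(False, [((1, 5), leaf_a), ((7, 10), leaf_b)])
path = choose_leaf(root, ((6, 7), "t6"), growth)
```

Here the entry [1, 5) would have to grow by 2 and [7, 10) by 1, so the descent ends at `leaf_b`. If that leaf already held M entries, PickSplit would be applied to the M+1 entries.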

Page 18: Generalized Search Trees

Applications

GiST over ℤ (B+-trees)

GiST over polygons in ℝ² (R-trees)

Page 19: Generalized Search Trees

B+ Trees Using GiST

Here p is of the form Contains([xp, yp), v).

Consistent(E, q) returns true if:
    for q = Contains([xq, yq), v): (xp < yq) ∧ (yp > xq)
    for q = Equal(xq, v): xp ≤ xq < yp

Union(P) returns [MIN(x1, x2, …, xn), MAX(y1, y2, …, yn)).
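The two interval tests and the union translate directly into code. In this sketch a predicate Contains([x, y), v) is modelled as the tuple `(x, y)` and a query as a `(kind, argument)` pair; both encodings are assumptions made for the example.

```python
def consistent(p, q):
    """Consistent for B+-tree keys: p = (xp, yp) stands for Contains([xp, yp), v)."""
    xp, yp = p
    kind, arg = q
    if kind == "contains":
        xq, yq = arg
        return xp < yq and yp > xq
    if kind == "equal":
        return xp <= arg < yp
    raise ValueError(f"unsupported query kind: {kind}")


def union(preds):
    """Smallest half-open interval covering every interval in preds."""
    return (min(x for x, _ in preds), max(y for _, y in preds))
```

For the key [10, 20), the range query [20, 30) is rejected because the intervals are half-open and share no value, while Equal(10) is accepted.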

Page 20: Generalized Search Trees

B+ Trees Using GiST (Cont.)

Penalty(E, F), where E.p = Contains([x1, y1), v) and F.p = Contains([x2, y2), v):
    If E is the leftmost pointer on its node, returns MAX(y2 − y1, 0).
    If E is the rightmost pointer on its node, returns MAX(x1 − x2, 0).
    Otherwise, returns MAX(y2 − y1, 0) + MAX(x1 − x2, 0).

PickSplit(P): the first ⌊|P|/2⌋ entries, in order, go to the left node and the remaining ⌈|P|/2⌉ entries go to the right node.
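A small sketch of these two methods, assuming keys are `(x, y)` tuples for the interval [x, y) and that the caller passes the position of E on its node through the hypothetical flags `leftmost` and `rightmost`:

```python
def penalty(e, f, leftmost=False, rightmost=False):
    """Growth needed for interval e = (x1, y1) to cover f = (x2, y2)."""
    (x1, y1), (x2, y2) = e, f
    grow_right = max(y2 - y1, 0)
    grow_left = max(x1 - x2, 0)
    if leftmost:
        return grow_right  # only growth on the right side is counted
    if rightmost:
        return grow_left   # only growth on the left side is counted
    return grow_right + grow_left


def pick_split(entries):
    """First floor(|P|/2) entries in key order go left, the rest go right."""
    ordered = sorted(entries)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]
```

For example, covering [5, 25) with the key [10, 20) costs 5 on each side, so the penalty is 10 for an interior entry and 5 for a leftmost or rightmost one.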

Page 21: Generalized Search Trees

B+ Trees Using GiST (Cont.)

Compress(E):
    If E is the leftmost key on a non-leaf node, returns 0 bytes.
    Otherwise, returns E.p.x.

Decompress(E):
    If E is the leftmost key on a non-leaf node, let x = −∞; otherwise let x = E.p.x.
    If E is the rightmost key on a non-leaf node, let y = ∞.
    If E is any other entry on a non-leaf node, let y = the value stored in the next key.
    Otherwise (E is on a leaf node), let y = x + 1.
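The rules above can be tried out on a single node. This sketch makes several assumptions: keys are integers, the "0 bytes" result is modelled as `None`, and `decompress` receives the whole list of compressed keys on the node plus an index, since it needs the next key to rebuild the right endpoint.

```python
import math


def compress(p, leftmost_nonleaf=False):
    """Keep only the left endpoint x of [x, y); the leftmost non-leaf key stores nothing."""
    x, _y = p
    return None if leftmost_nonleaf else x


def decompress(keys, i, is_leaf):
    """Rebuild the interval [x, y) for the i-th compressed key on a node."""
    if is_leaf:
        x = keys[i]
        return (x, x + 1)  # integer keys: a leaf key covers exactly one value
    last = len(keys) - 1
    x = -math.inf if i == 0 else keys[i]
    y = math.inf if i == last else keys[i + 1]
    return (x, y)


intervals = [(-math.inf, 10), (10, 20), (20, math.inf)]
keys = [compress(p, leftmost_nonleaf=(i == 0)) for i, p in enumerate(intervals)]
```

The right endpoint of each interval is never stored; it is recovered from the neighbouring key, which is what lets a B+-tree node hold one value per pointer.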

Page 22: Generalized Search Trees

R-Trees Using GiST

The key here is of the form (xul, yul, xlr, ylr): the upper-left and lower-right corners of a rectangle, with y increasing upward.

Query predicates are:

Contains((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2))
    Returns true if (xul1 ≤ xul2) ∧ (yul1 ≥ yul2) ∧ (xlr1 ≥ xlr2) ∧ (ylr1 ≤ ylr2)

Overlaps((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2))
    Returns true if (xul1 ≤ xlr2) ∧ (yul1 ≥ ylr2) ∧ (xul2 ≤ xlr1) ∧ (ylr1 ≤ yul2)

Equal((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2))
    Returns true if (xul1 = xul2) ∧ (yul1 = yul2) ∧ (xlr1 = xlr2) ∧ (ylr1 = ylr2)
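The three rectangle predicates can be written out as a quick check of the inequalities. The `Rect` type and the lowercase function names are assumptions for illustration; the coordinate convention is upper-left and lower-right corners with y increasing upward.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xul: float  # x of upper-left corner
    yul: float  # y of upper-left corner
    xlr: float  # x of lower-right corner
    ylr: float  # y of lower-right corner


def contains(a, b):
    """True if rectangle a fully encloses rectangle b."""
    return a.xul <= b.xul and a.yul >= b.yul and a.xlr >= b.xlr and a.ylr <= b.ylr


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a.xul <= b.xlr and a.yul >= b.ylr and b.xul <= a.xlr and a.ylr <= b.yul


def equal(a, b):
    """True if all four coordinates match."""
    return a == b


outer = Rect(0, 10, 10, 0)
inner = Rect(2, 8, 5, 3)
partial = Rect(8, 12, 15, 5)
far = Rect(20, 30, 25, 22)
```

`outer` contains `inner`, merely overlaps `partial`, and is disjoint from `far`.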

Page 23: Generalized Search Trees

R-Trees Using GiST (Cont.)

Consistent(E, q): p is Contains((xul1, yul1, xlr1, ylr1), v), and q is Contains, Overlaps or Equal applied to (xul2, yul2, xlr2, ylr2). For any of these queries, returns true if Overlaps((xul1, yul1, xlr1, ylr1), (xul2, yul2, xlr2, ylr2)).

Union(P): returns the coordinates of the minimum bounding rectangle of all rectangles in P.
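A minimal sketch of both methods, assuming rectangles are plain `(xul, yul, xlr, ylr)` tuples with y increasing upward and that the query is passed as its rectangle argument only:

```python
def overlaps(a, b):
    """Rectangles as (xul, yul, xlr, ylr) tuples, y increasing upward."""
    return a[0] <= b[2] and a[1] >= b[3] and b[0] <= a[2] and a[3] <= b[1]


def consistent(p, q_rect):
    """Contains, Overlaps and Equal queries all reduce to an overlap test on the key."""
    return overlaps(p, q_rect)


def union(rects):
    """Minimum bounding rectangle of a collection of rectangles."""
    return (
        min(r[0] for r in rects),  # leftmost x
        max(r[1] for r in rects),  # topmost y
        max(r[2] for r in rects),  # rightmost x
        min(r[3] for r in rects),  # bottommost y
    )
```

An overlap test is enough at internal nodes because any rectangle satisfying Contains or Equal with respect to the query must also overlap the bounding rectangle stored in the key.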

Page 24: Generalized Search Trees

R-Trees Using GiST (Cont.)

Penalty(E, F): compute q = Union({E, F}) and return area(q) − area(E.p).

PickSplit(P): a variety of algorithms are available for splitting the entries of an over-full node.
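The area-growth penalty is short enough to work through by hand. In this sketch rectangles are `(xul, yul, xlr, ylr)` tuples with y increasing upward, and `area`, `union` and `penalty` are illustrative names only.

```python
def area(r):
    """Area of a rectangle given as (xul, yul, xlr, ylr)."""
    xul, yul, xlr, ylr = r
    return (xlr - xul) * (yul - ylr)


def union(rects):
    """Minimum bounding rectangle of the given rectangles."""
    return (
        min(r[0] for r in rects),
        max(r[1] for r in rects),
        max(r[2] for r in rects),
        min(r[3] for r in rects),
    )


def penalty(e, f):
    """Extra area the key e needs in order to also cover f."""
    return area(union([e, f])) - area(e)
```

With e = (0, 10, 10, 0) and f = (8, 12, 15, 5), the bounding rectangle is (0, 12, 15, 0) with area 180, so the penalty is 180 − 100 = 80. A rectangle already inside e costs nothing.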

Page 25: Generalized Search Trees

R-Trees Using GiST (Cont.)

Compress(E): form the bounding rectangle of E.p.

Decompress(E): the identity function.
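As an illustration of this lossy compression, the sketch below assumes a polygon is given as a list of `(x, y)` vertices and that the compressed key is an `(xul, yul, xlr, ylr)` tuple with y increasing upward; neither representation is specified in the presentation.

```python
def compress(polygon):
    """Replace a polygon (list of (x, y) vertices) by its bounding rectangle."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), max(ys), max(xs), min(ys))


def decompress(key):
    """Identity: the stored bounding rectangle is used as the key directly."""
    return key


triangle = [(1, 1), (4, 2), (3, 6)]
box = decompress(compress(triangle))
```

The triangle's shape cannot be recovered from `box`, but every point of the triangle lies inside it, so p → r holds as Decompress requires.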