Page 1: Hierarchies & Trees in SQL

by Joe Celko

copyright 2008

Page 2: Trees in SQL

Trees are graph structures used to represent
– Hierarchies
– Parts explosions
– Organizational charts

Three methods in SQL
– Adjacency list model
– Nested set model
– Path enumeration

Page 3: Trees in SQL -2

Trees are not hierarchies
– Hierarchies have subordination
– Kill your captain, you still have to take orders from your general
– Break an edge in a tree, and you have two or more disjoint trees.

This means an adjacency list model is a tree, but not a hierarchy

Page 4: Tree Terminology

[Figure: tree diagram labeling the root, parent, child, sibling, grandchild, and leaf node]

Page 5: Tree as Graph

[Figure: the tree drawn as a graph. Root has children A0 and B0; A0 has children A1 and A2]

Page 6: Trees as Nested Sets

[Figure: the same tree drawn as nested sets. The root set contains A0 and B0; A0 contains A1 and A2]

Page 7: Graphs as Tables

Nodes and edges are not the same kind of things
– Organizational chart & Personnel file

You should use separate tables for the structure and the elements
– The structure table will be small (two integers and a key)
– You can put more than one structure table on the same elements

Page 8: Adjacency List Model

node  parent  cost
==================
Root  NULL    2.50
A0    Root    1.75
A1    A0      2.00
A2    A0      3.50
B0    Root    4.00

Cost really should not be in the table, but most adjacency list tables mix nodes and edges (see Oracle’s Scott/Tiger sample database)

Most common method in use.

Page 9: Adjacency List Model

Programmers do not add constraints:

CHECK ((SELECT COUNT(*) FROM Tree) - 1
       = (SELECT COUNT(*)
            FROM ((SELECT child FROM Tree)
                  UNION
                  (SELECT parent FROM Tree)) AS AllNodes))

Page 10: Path Enumeration Model

Tree
node  cost  path
========================
Root  2.50  'Root'
A0    1.75  'Root,A0'
A1    2.00  'Root,A0,A1'
A2    3.50  'Root,A0,A2'
B0    4.00  'Root,B0'

Cost really should not be in the table, but most path enumeration tables mix nodes and edges.

Paths are searched with path LIKE 'Root,%' predicates
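
A minimal sketch of the LIKE search on the path table above, run through Python's sqlite3 module so it can be executed as-is. The table layout follows the slide; the variable names and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, cost REAL, path TEXT NOT NULL UNIQUE)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", 2.50, "Root"),
    ("A0", 1.75, "Root,A0"),
    ("A1", 2.00, "Root,A0,A1"),
    ("A2", 3.50, "Root,A0,A2"),
    ("B0", 4.00, "Root,B0"),
])

# Everything strictly below A0: paths that start with A0's path plus a comma.
below_a0 = [row[0] for row in con.execute(
    "SELECT node FROM Tree WHERE path LIKE 'Root,A0,%' ORDER BY path")]

# Everything strictly below Root.
below_root = [row[0] for row in con.execute(
    "SELECT node FROM Tree WHERE path LIKE 'Root,%' ORDER BY path")]

print(below_a0)    # ['A1', 'A2']
print(below_root)  # ['A0', 'A1', 'A2', 'B0']
```

Including the trailing comma in the pattern keeps the node itself out of the result and avoids matching a sibling whose name merely starts with the same characters.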

Page 11: Graph with Traversal

[Figure: the tree graph with traversal numbers on each node. Root: left = 1, right = 10; A0: left = 2, right = 7; A1: left = 3, right = 4; A2: left = 5, right = 6; B0: left = 8, right = 9]

Page 12: Nested Sets with Numbers

[Figure: the nested sets Root, A0, A1, A2 and B0 laid out along a number line running from 1 to 10]

Page 13: Nested Sets as Numbers

Split nodes and edges into two tables. You can join them back together later.

This could be personnel and Org chart
– Tree.node would be job titles
– Nodes would need job titles and the person holding it

Tree                  Nodes
node  lft  rgt        node  cost
==============        ==========
Root    1   10        Root  2.50
A0      2    7        A0    1.75
A1      3    4        A1    2.00
A2      5    6        A2    3.50
B0      8    9        B0    4.00

Page 14: Problems with Adjacency list

You have to use cursors or self-joins to traverse the tree

Cursors are not a table -- their order has meaning -- Closure violation!

Cursors take MUCH longer than queries

Ten level self-joins are worse than cursors
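
To make the self-join problem concrete, here is a hedged sketch using the adjacency list table from Page 8, run through Python's sqlite3 module. The aliases L1 and L2 are illustrative names, not from the slides. Each additional level of depth needs one more join, so the query text has to know the depth of the tree in advance.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, parent TEXT, cost REAL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", None, 2.50),
    ("A0", "Root", 1.75),
    ("A1", "A0", 2.00),
    ("A2", "A0", 3.50),
    ("B0", "Root", 4.00),
])

# One self-join per level: this reaches exactly two levels below Root
# and would silently miss anything deeper.
rows = con.execute("""
    SELECT L1.node, L2.node
      FROM Tree AS L1
           LEFT OUTER JOIN Tree AS L2
           ON L2.parent = L1.node
     WHERE L1.parent = 'Root'
     ORDER BY L1.node, L2.node
""").fetchall()

print(rows)  # [('A0', 'A1'), ('A0', 'A2'), ('B0', None)]
```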

Page 15: Problems with Path Enumeration

Path can get long in a deep tree

Great for searching down the tree, but not up the tree
– SELECT * FROM Tree WHERE path LIKE 'Root,%';
– SELECT * FROM Tree WHERE path LIKE '%,B0';

Inserting and deleting nodes is complicated
– Requires string manipulation to change all the paths beneath the insertion or deletion point
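
As an illustration of the string manipulation involved, the sketch below deletes A0 from the path table and re-attaches its children to Root. It is one possible approach under stated assumptions: the prefix variables are hypothetical names, and substr/length are SQLite's spellings of the string functions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, path TEXT NOT NULL UNIQUE)")
con.executemany("INSERT INTO Tree VALUES (?, ?)", [
    ("Root", "Root"),
    ("A0", "Root,A0"),
    ("A1", "Root,A0,A1"),
    ("A2", "Root,A0,A2"),
    ("B0", "Root,B0"),
])

# Every path beneath A0 starts with old_prefix; swap that prefix for new_prefix.
old_prefix, new_prefix = "Root,A0,", "Root,"
con.execute(
    "UPDATE Tree SET path = ? || substr(path, length(?) + 1) WHERE path LIKE ? || '%'",
    (new_prefix, old_prefix, old_prefix))
con.execute("DELETE FROM Tree WHERE node = 'A0'")

paths = dict(con.execute("SELECT node, path FROM Tree"))
print(paths["A1"], paths["A2"])  # Root,A1 Root,A2
```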

Page 16: Tree Aggregates

Give me the total cost for all subtrees
– (Root, 13.75) -- sum of every node in tree
– (A0, 7.25) -- sum of "A0" subtree
– (A1, 2.00)
– (A2, 3.50)

Dropping A2 would reduce all superior rows by 3.50, but would not change A1

Page 17: Find Root of Tree

SELECT * FROM Tree WHERE lft = 1;

It helps to have an index on the lft column

The root's rgt value will be twice the number of nodes in the tree.

General rule: The number of nodes in any subtree is ((rgt - lft) + 1) / 2
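
The root lookup and the subtree-size rule can be checked against the sample tree from Page 13. This is a small sketch via Python's sqlite3 module; the variable names are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9),
])

# The root is the row whose lft is 1; its rgt is twice the node count (5 nodes).
root = con.execute("SELECT node, rgt FROM Tree WHERE lft = 1").fetchone()

# Size of the subtree rooted at each node, counting the node itself.
sizes = con.execute(
    "SELECT node, (rgt - lft + 1) / 2 FROM Tree ORDER BY lft").fetchall()

print(root)   # ('Root', 10)
print(sizes)  # [('Root', 5), ('A0', 3), ('A1', 1), ('A2', 1), ('B0', 1)]
```

The rule only holds while the numbering has no gaps; after deletions the numbers have to be closed up first.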

Page 18: Find All Leaf Nodes

SELECT * FROM Tree WHERE lft = rgt - 1;

An index on lft will help

A covering index on (lft, rgt) is even better

Page 19: Find Superiors of X

SELECT Sup.*
  FROM Tree AS T1, Tree AS Sup
 WHERE T1.node = 'X'
   AND T1.lft BETWEEN Sup.lft AND Sup.rgt;

This is the most important trick in this method

The BETWEEN predicates preserve subordination in the hierarchy

One query for any depth tree
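
A runnable sketch of the superiors query against the sample tree, using Python's sqlite3 module (the choice of A1 as the node X is just an example). Because BETWEEN is inclusive, the result contains the node itself along with its superiors.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9),
])

# Every set whose (lft, rgt) range contains A1's lft value.
superiors = [row[0] for row in con.execute("""
    SELECT Sup.node
      FROM Tree AS T1, Tree AS Sup
     WHERE T1.node = ?
       AND T1.lft BETWEEN Sup.lft AND Sup.rgt
     ORDER BY Sup.lft
""", ("A1",))]

print(superiors)  # ['Root', 'A0', 'A1']
```

Ordering by Sup.lft lists the chain of command from the root downward.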

Page 20: Find Subordinates of X

SELECT Sub.*
  FROM Tree AS T1, Tree AS Sub
 WHERE T1.node = 'X'
   AND Sub.lft BETWEEN T1.lft AND T1.rgt;

This is the same pattern as finding superiors

Page 21: Find Depth of Tree

SELECT T1.node, COUNT(T2.node) AS level
  FROM Tree AS T1, Tree AS T2
 WHERE T1.lft BETWEEN T2.lft AND T2.rgt
 GROUP BY T1.node;

Count the containing nested sets for levels

The closer to the root a node is, the greater the value of (rgt - lft)
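
Run against the sample tree, the level query gives the following. This is a sketch through Python's sqlite3 module; the column alias lvl is an assumption chosen to avoid the word LEVEL, which some products reserve.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9),
])

# A node's level is the number of sets that contain it, itself included.
levels = dict(con.execute("""
    SELECT T1.node, COUNT(T2.node) AS lvl
      FROM Tree AS T1, Tree AS T2
     WHERE T1.lft BETWEEN T2.lft AND T2.rgt
     GROUP BY T1.node
"""))

print(levels["Root"], levels["A0"], levels["A1"])  # 1 2 3
```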

Page 22: Totals by Level in Tree

SELECT T1.node,
       SUM(T2.cost) AS tot_level_cost
  FROM Tree AS T1, Tree AS T2
 WHERE T2.lft BETWEEN T1.lft AND T1.rgt
 GROUP BY T1.node;

Uses any aggregate function the same way
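
The totals quoted on the Tree Aggregates slide can be reproduced with this query. The sketch below assumes, for brevity, that cost is stored in the same table as lft and rgt, and runs through Python's sqlite3 module.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER, cost REAL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?, ?)", [
    ("Root", 1, 10, 2.50), ("A0", 2, 7, 1.75), ("A1", 3, 4, 2.00),
    ("A2", 5, 6, 3.50), ("B0", 8, 9, 4.00),
])

# For each node T1, add up the cost of every node T2 nested inside it.
totals = dict(con.execute("""
    SELECT T1.node, SUM(T2.cost)
      FROM Tree AS T1, Tree AS T2
     WHERE T2.lft BETWEEN T1.lft AND T1.rgt
     GROUP BY T1.node
"""))

print(totals["Root"], totals["A0"])  # 13.75 7.25
```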

Page 23: Delete a Subtree

Remove subtree rooted at :my_node

DELETE FROM Tree
 WHERE lft BETWEEN (SELECT lft FROM Tree WHERE node = :my_node)
               AND (SELECT rgt FROM Tree WHERE node = :my_node);

Page 24: Delete a Single Node

Method one - promote a child to the parent’s prior position in the tree. Oldest son inherits family business.

Method two - subordinate the entire subtree to the grandparent. Orphans go live with grandmother.

Page 25: Delete & Promote Oldest - 1

Delete A0 node

[Figure: the original tree. Root has children A0 and B0; A0 has children A1 and A2]

Page 26: Delete & Promote Oldest - 2

[Figure: the tree after the deletion. Root has children A1 and B0; A2 is attached to A1]

Page 27: Delete & Promote Subtree - 1

Delete A0 node

[Figure: the original tree. Root has children A0 and B0; A0 has children A1 and A2]

Page 28: Delete & Promote Subtree - 2

[Figure: the tree after the deletion. Root has children A1, A2 and B0]

Page 29: Closing gaps in nested set model -1

Deleted nodes leave gaps in the numbering of lft and rgt values.

Fill in gaps by sliding everyone over to the left until there are no gaps.

UPDATE Tree
   SET lft = CASE WHEN lft > gap_start THEN lft - gap_size ELSE lft END,
       rgt = CASE WHEN rgt > gap_start THEN rgt - gap_size ELSE rgt END
 WHERE rgt > gap_start;
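
A hedged sketch of deleting the A0 subtree and then closing the gap, via Python's sqlite3 module. It assumes gap_start is the lft of the deleted subtree's root and gap_size is the count of numbers it occupied, and it shifts each column with its own CASE so that a superior's lft stays put while its rgt moves. The demo table declares no UNIQUE constraints on lft and rgt, which keeps the multi-row update simple.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9),
])

# Remember the range being removed, then delete everything inside it.
gap_start, gap_end = con.execute(
    "SELECT lft, rgt FROM Tree WHERE node = ?", ("A0",)).fetchone()
gap_size = gap_end - gap_start + 1
con.execute("DELETE FROM Tree WHERE lft BETWEEN ? AND ?", (gap_start, gap_end))

# Slide every number that lies to the right of the gap down by gap_size.
con.execute("""
    UPDATE Tree
       SET lft = CASE WHEN lft > :gap_start THEN lft - :gap_size ELSE lft END,
           rgt = CASE WHEN rgt > :gap_start THEN rgt - :gap_size ELSE rgt END
     WHERE rgt > :gap_start
""", {"gap_start": gap_start, "gap_size": gap_size})

remaining = con.execute("SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(remaining)  # [('Root', 1, 4), ('B0', 2, 3)]
```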

Page 30: Closing gaps in nested set model -2

CREATE VIEW LftRgt (i)
AS SELECT lft FROM Tree
   UNION ALL
   SELECT rgt FROM Tree;

UPDATE Tree
   SET lft = (SELECT COUNT(*) FROM LftRgt WHERE i <= lft),
       rgt = (SELECT COUNT(*) FROM LftRgt WHERE i <= rgt);
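
The renumbering idea can be tried on a tree that already has gaps. In this sketch (Python's sqlite3 module) the new numbers are computed with a SELECT first and applied afterwards; that two-step form is an assumption made here so the result does not depend on how a given engine treats an UPDATE that reads the table it is changing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
# What is left of the sample tree after the A0 subtree was deleted: numbers 2..7 are missing.
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [("Root", 1, 10), ("B0", 8, 9)])

# All lft and rgt values in one column.
con.execute("CREATE VIEW LftRgt AS SELECT lft AS i FROM Tree UNION ALL SELECT rgt FROM Tree")

# A value's new number is its rank among all the numbers still in use.
renumbered = con.execute("""
    SELECT (SELECT COUNT(*) FROM LftRgt WHERE i <= T.lft),
           (SELECT COUNT(*) FROM LftRgt WHERE i <= T.rgt),
           T.node
      FROM Tree AS T
""").fetchall()
con.executemany("UPDATE Tree SET lft = ?, rgt = ? WHERE node = ?", renumbered)

result = con.execute("SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(result)  # [('Root', 1, 4), ('B0', 2, 3)]
```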

Page 31: Inserting into a Tree

The real trick is numbering the subtree correctly before inserting it.

Basic idea is to spread the nested set numbers apart to make a gap the size of the subtree, then add the subtree.

The position of the subtree within the siblings of the new parent in the tree is another decision.
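
A minimal sketch of the spread-then-insert idea for a single new node, via Python's sqlite3 module. The node name A3 and the choice to add it as the rightmost child of A0 are assumptions for illustration; a one-node subtree needs a gap of two numbers.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tree (node TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
con.executemany("INSERT INTO Tree VALUES (?, ?, ?)", [
    ("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9),
])

parent_rgt = con.execute("SELECT rgt FROM Tree WHERE node = ?", ("A0",)).fetchone()[0]

# Open a gap of 2 just inside the parent's right edge.
con.execute("""
    UPDATE Tree
       SET lft = CASE WHEN lft > :p THEN lft + 2 ELSE lft END,
           rgt = CASE WHEN rgt >= :p THEN rgt + 2 ELSE rgt END
     WHERE rgt >= :p
""", {"p": parent_rgt})

# The new leaf takes the two numbers that were freed.
con.execute("INSERT INTO Tree VALUES (?, ?, ?)", ("A3", parent_rgt, parent_rgt + 1))

rows = con.execute("SELECT node, lft, rgt FROM Tree ORDER BY lft").fetchall()
print(rows)
# [('Root', 1, 12), ('A0', 2, 9), ('A1', 3, 4), ('A2', 5, 6), ('A3', 7, 8), ('B0', 10, 11)]
```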

Page 33: Inserting into a Tree -2

If you are worried about having to update the tree structure too often, then use a bigger spread in the numbering.

At higher levels, use steps of 100,000, then 10,000 and so forth.

Most SQL products can handle DECIMAL(p,s) of 30 or more digits.

Since insertions are done on the right side of the siblings, you can re-organize the tree by sliding everyone to the left and closing the gaps.

Page 34: Inserting into a Tree -4

[Figure: the nested sets Root, A0, A1, A2 and B]

Slide everyone to the left

Page 35: Creating a Tree -1

If you want to have all the constraints for a proper hierarchy, then it is complicated.

CREATE TABLE Tree
(node_id INTEGER NOT NULL REFERENCES Nodes(node_id),
 lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
 rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
 UNIQUE (lft, rgt), -- redundant, but useful
 CHECK (lft < rgt)
);

You can also declare node_id to be the PRIMARY KEY, but then one person cannot hold two jobs.

Page 36: Creating a Tree -2

Other needed constraints – no overlaps in the nodes

SELECT *
  FROM Tree AS T1
 WHERE EXISTS
       (SELECT *
          FROM Tree AS T2
         WHERE T1.lft BETWEEN T2.lft AND T2.rgt
           AND T1.rgt NOT BETWEEN T2.lft AND T2.rgt);

Page 37: Creating a Tree -3

Other needed constraints – no disjoint nodes

SELECT *
  FROM Tree AS T1
 WHERE EXISTS
       (SELECT *
          FROM Tree AS T2
         WHERE T1.lft < (SELECT rgt FROM Tree WHERE lft = 1));

Page 38: Creating a Tree -4

If you do not have triggers or CREATE ASSERTION, you can use an updatable view

CREATE VIEW GoodTree (node, i, j)
AS
SELECT T1.node, T1.i, T1.j
  FROM Tree AS T1
 WHERE NOT EXISTS (<overlaps>)
   AND NOT EXISTS (<disjoint>)
WITH CHECK OPTION;

Page 39: Converting an Adjacency Model into a Nested Set Model

Current best method is to load nodes into a tree in a host language, then do a recursive pre-order tree traversal to get the lft and rgt traversal numbers.

Adjacency list method does not order siblings; nested set model does automatically

Classic push down stack algorithm works

You can keep both models in one table with a column for the immediate superior
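
One way the push-down stack traversal could look in a host language, here Python. This is a sketch under stated assumptions, not the author's own code: the function name to_nested_sets is hypothetical, the input is a list of (node, parent) pairs with None as the root's parent, and siblings are numbered in the order they appear in the input.

```python
def to_nested_sets(edges):
    """Return {node: (lft, rgt)} for a list of (node, parent) pairs."""
    children = {}
    root = None
    for node, parent in edges:
        children.setdefault(node, [])
        if parent is None:
            root = node
        else:
            children.setdefault(parent, []).append(node)

    counter = 1
    numbers = {root: [counter, None]}        # lft is assigned on the way down
    stack = [(root, iter(children[root]))]   # each entry: node plus its unvisited children
    while stack:
        node, kids = stack[-1]
        child = next(kids, None)
        counter += 1
        if child is None:
            numbers[node][1] = counter       # no children left: assign rgt and pop
            stack.pop()
        else:
            numbers[child] = [counter, None]
            stack.append((child, iter(children[child])))
    return {node: tuple(pair) for node, pair in numbers.items()}


adjacency = [("Root", None), ("A0", "Root"), ("A1", "A0"), ("A2", "A0"), ("B0", "Root")]
nested = to_nested_sets(adjacency)
print(nested["Root"], nested["A0"], nested["B0"])  # (1, 10) (2, 7) (8, 9)
```

Using an explicit stack instead of recursion keeps the traversal safe on very deep trees.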

Page 40: Converting a Nested Set Model into an Adjacency Model

This is actually pretty straightforward; you can put it into a single view

SELECT B.emp AS boss, P.emp
  FROM OrgChart AS P
       LEFT OUTER JOIN
       OrgChart AS B
       ON B.lft
          = (SELECT MAX(lft)
               FROM OrgChart AS S
              WHERE P.lft > S.lft
                AND P.lft < S.rgt);
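
Applied to the sample tree, the conversion query recovers the parent column of the adjacency list from Page 8. The sketch below uses Python's sqlite3 module and assumes an OrgChart table that reuses the sample node names as emp values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE OrgChart (emp TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)")
con.executemany("INSERT INTO OrgChart VALUES (?, ?, ?)", [
    ("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9),
])

# The boss is the containing set with the largest lft, i.e. the tightest one.
bosses = dict(con.execute("""
    SELECT P.emp, B.emp AS boss
      FROM OrgChart AS P
           LEFT OUTER JOIN OrgChart AS B
           ON B.lft = (SELECT MAX(S.lft)
                         FROM OrgChart AS S
                        WHERE P.lft > S.lft
                          AND P.lft < S.rgt)
"""))

print(bosses["A1"], bosses["B0"], bosses["Root"])  # A0 Root None
```

The outer join is what keeps the root in the result, with a NULL boss.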

Page 41: Structure versus Contents

Nested set model allows the structure of trees to be compared.

For each tree, find the lft value of its root node (lftmost)

Make a canonical form and UNION ALL them

EXISTS (SELECT *
          FROM (SELECT (lft - lftmost), (rgt - lftmost)
                  FROM Tree1
                UNION ALL
                SELECT (lft - lftmost), (rgt - lftmost)
                  FROM Tree2) AS BothTrees (lft, rgt)
         GROUP BY BothTrees.lft, BothTrees.rgt
        HAVING COUNT(*) = 1)
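
A hedged sketch of the comparison, via Python's sqlite3 module. It assumes lftmost can be taken as MIN(lft) of each tree, and the helper names load and unmatched_pairs, as well as the tables Tree2 and Tree3, are hypothetical. A count of zero means every canonical (lft, rgt) pair occurs in both trees, so the structures match regardless of node names.

```python
import sqlite3

con = sqlite3.connect(":memory:")

def load(table, rows):
    # Table names are interpolated only because SQL parameters cannot name tables.
    con.execute(f"CREATE TABLE {table} (node TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
    con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

load("Tree1", [("Root", 1, 10), ("A0", 2, 7), ("A1", 3, 4), ("A2", 5, 6), ("B0", 8, 9)])
# Same shape, different names, numbering starts at 101.
load("Tree2", [("R", 101, 110), ("X", 102, 107), ("Y", 103, 104), ("Z", 105, 106), ("W", 108, 109)])
# Different shape: B0 hangs under A0 instead of under Root.
load("Tree3", [("Root", 1, 10), ("A0", 2, 9), ("A1", 3, 4), ("A2", 5, 6), ("B0", 7, 8)])

def unmatched_pairs(a, b):
    """Count canonical (lft, rgt) pairs that occur in only one of the two trees."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT l, r
              FROM (SELECT lft - (SELECT MIN(lft) FROM {a}) AS l,
                           rgt - (SELECT MIN(lft) FROM {a}) AS r
                      FROM {a}
                    UNION ALL
                    SELECT lft - (SELECT MIN(lft) FROM {b}),
                           rgt - (SELECT MIN(lft) FROM {b})
                      FROM {b}) AS BothTrees
             GROUP BY l, r
            HAVING COUNT(*) = 1)
    """
    return con.execute(sql).fetchone()[0]

same = unmatched_pairs("Tree1", "Tree2")
different = unmatched_pairs("Tree1", "Tree3")
print(same, different)  # 0 4
```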

Page 42: Questions & Answers

?