Are p and q connected?
Network connectivity
Yes, they are connected!
Network connectivity
- Problem: Given a set of nodes N and a set of links L between pairs of nodes, decide whether node p and node q are connected (p ∈ N, q ∈ N).
Real World Application
Kruskal
The union-find data structure
Zhengtian Xu, Xiaoqing Geng, Lihua Qian, Ruxuan Zhang, Chen Feng
Department of Computer Science and Engineering, Shanghai Jiao Tong University
8th December 2016
Outline
Brief Introduction for Union-Find Data Structure
Improvement
Time Complexity Analysis
Union-Find data structure type
Goal. Support three operations on a set of elements:
- MAKE-SET(x). Create a new set containing only element x.
- FIND(x). Return a canonical element in the set containing x.
- UNION(x, y). Merge the sets containing x and y.
Union-Find example
- Initial sets: a = {6}, b = {2, 7, 9, 3}, c = {4, 5, 8}
- FIND(9) = 2
- MAKE-SET(1): creates a new set d = {1}
- UNION(2, 4): merges sets b and c into e = {2, 7, 9, 3, 4, 5, 8}
- Final sets: a = {6}, e = {2, 7, 9, 3, 4, 5, 8}, d = {1}
Union-Find data structure
Representation
Represent each set as a tree of elements.
- Each element has a parent pointer in the tree.
- The root serves as the canonical element.
- FIND(x). Find the root of the tree containing x.
- UNION(x, y). Make the root of one tree point to the root of the other tree.
[Figure: example tree with nodes d, a, c, e, b, f. Annotations: d is the root; the parent of e is c.]
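The representation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class name `DisjointSets`, the dictionary-based `parent` map, and the `connected` helper are assumptions made for the example, and no balancing is applied yet.

```python
class DisjointSets:
    """Naive union-find: a forest of parent pointers, no balancing."""

    def __init__(self):
        # element -> its parent; a root is stored as its own parent
        self.parent = {}

    def make_set(self, x):
        """Create a new set containing only x."""
        self.parent[x] = x

    def find(self, x):
        """Follow parent pointers until the root (canonical element)."""
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        """Make the root of x's tree point to the root of y's tree."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_x] = root_y

    def connected(self, p, q):
        """Two elements are connected iff they share a root."""
        return self.find(p) == self.find(q)


sets = DisjointSets()
for node in "abcdef":
    sets.make_set(node)
sets.union("e", "c")  # the parent of e becomes c
sets.union("c", "d")  # the parent of c becomes d, so d is the root
print(sets.find("e"))            # d
print(sets.connected("e", "a"))  # False
```

The `connected` query is exactly the "are p and q connected?" question from the opening slides: it compares the canonical elements returned by two FIND calls.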
Find operation
FIND(x). Find the root of the tree containing x.
[Figure: tree rooted at d with nodes a, c, e, b, g, f. The slides step through FIND(g) and then FIND(d), following parent pointers up to the root.]
Union operation
- Maintain an integer rank for each node, initially 0.
- Link the root of smaller rank to the root of larger rank; if there is a tie, increase the rank of the new root by 1.
Note. For now, rank = height.
Union by rank
[Figure: union(d, g) on two trees with ranks 2 and 1. The root of smaller rank is linked to the root of larger rank, and the resulting root has rank 2.]
[Figure: union(d, g) on two trees that both have rank 2. After linking, the new root has rank 3.]
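The linking rule can be sketched as follows. This is a hedged illustration under assumed names (`RankedSets`, integer elements); it is not taken from the slides, and on a tie it arbitrarily keeps the root of the first argument.

```python
class RankedSets:
    """Union-find with union by rank (no path compression yet)."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0  # every node starts with rank 0

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        r, s = self.find(x), self.find(y)
        if r == s:
            return r  # already in the same set
        if self.rank[r] < self.rank[s]:
            r, s = s, r  # make r the root of larger (or equal) rank
        self.parent[s] = r  # link the smaller-rank root below r
        if self.rank[r] == self.rank[s]:
            self.rank[r] += 1  # tie: the new root's rank grows by 1
        return r


ds = RankedSets()
for v in range(5):
    ds.make_set(v)
ds.union(0, 1)  # tie at rank 0: root 0 gets rank 1
ds.union(2, 3)  # tie at rank 0: root 2 gets rank 1
ds.union(0, 2)  # tie at rank 1: root 0 gets rank 2
ds.union(4, 0)  # rank 0 vs rank 2: 4 is linked below 0, rank stays 2
print(ds.find(3), ds.rank[0])  # 0 2
```

The last call mirrors the first figure (ranks differ, so the root rank is unchanged), while the third call mirrors the second figure (a tie increases the rank).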
Union by rank: analysis
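Lemma 1. Using union by rank, for every root node r,
$\mathrm{size}(r) \ge 2^{\mathrm{rank}(r)}$
Proof. [ by induction on the number of links ]
- Base case: a singleton tree has size 1 and rank 0.
- Inductive hypothesis: assume the claim holds after the first i links.
- Inductive step: suppose link i + 1 makes root s a child of root r. Primes denote values after this link.
- Case 1. [ rank(r) > rank(s) ] (the case rank(r) < rank(s) is symmetric, with the roles of r and s swapped)
$\mathrm{size}'(r) \ge \mathrm{size}(r) \ge 2^{\mathrm{rank}(r)} = 2^{\mathrm{rank}'(r)}$
[Figure: trees r and s for Case 1, labelled size = 8 (rank = 2) and size = 3 (rank = 1).]
- Case 2. [ rank(r) = rank(s) ]
$\mathrm{size}'(r) = \mathrm{size}(r) + \mathrm{size}(s)$
$\ge 2^{\mathrm{rank}(r)} + 2^{\mathrm{rank}(s)}$
$= 2 \times 2^{\mathrm{rank}(r)}$
$= 2^{\mathrm{rank}(r)+1}$
$= 2^{\mathrm{rank}'(r)}$
[Figure: trees r and s for Case 2, labelled size = 6 (rank = 2) and size = 3 (rank = 1).]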
Union by rank: analysis
Lemma 2. There are at most $n/2^k$ elements of rank k.
Proof. By Lemma 1, every node of rank k has at least $2^k$ elements in its subtree. Two different nodes of rank k have disjoint subtrees, and there are n elements in total, so at most $n/2^k$ nodes can have rank k.
Union by rank: analysis
Theorem. Using union by rank, each FIND operation takes $O(\log_2 n)$ time in the worst case, where n is the number of elements; each UNION operation takes constant time once the two roots are known.
Proof.
- The running time of each operation is bounded by the tree height.
- By Lemma 1 and rank = height, $n \ge \mathrm{size}(r) \ge 2^{\mathrm{height}}$, so the height is at most $\lfloor \log_2 n \rfloor$.
Improvement
Observation
- It is the height of the tree that affects the running time.
- When we find the root of the tree containing a given node, we touch every node on the path from that node to the root.
So...
- Why not make each of those nodes point directly to the root?
- That's the idea of path compression!
Path compression
- Just after computing the root of the target node, set the parent of each examined node to point to that root.
[Figure: find(j) on a tree rooted at a with nodes b, c, d, e, f, g, h, i, j. The tree has height 4 before the find and height 2 after path compression.]
Path compression: benefits
- The resulting tree is much flatter.
- If the target node is very deep, path compression may dramatically decrease the height of the tree.
- It speeds up future operations on all the nodes on the path and on the nodes referencing them, directly or indirectly.
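A two-pass find with path compression might look like the sketch below. The function name `find` and the hypothetical chain of parent pointers (a at the top, j at the bottom) are assumptions for illustration; the chain is not a transcription of the slide's tree.

```python
def find(parent, x):
    """Return the root of x and point every examined node at that root."""
    # Pass 1: walk up to the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Pass 2: re-walk the same path, rewiring each node to the root.
    while x != root:
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root


# Hypothetical chain of height 4: j -> g -> d -> b -> a (a is the root).
parent = {"a": "a", "b": "a", "d": "b", "g": "d", "j": "g"}
root = find(parent, "j")
print(root)    # a
print(parent)  # every node on the path now points directly at a
```

After the call, a later find on j, g, d, or b reaches the root in a single step, which is the speed-up described in the benefits list.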
Path compression: rank vs. height
- The rank of a tree does not change during path compression.
- ...but the height of a tree may decrease.
- Now it is possible that rank ≠ height!
Path compression: rank vs. height
- Example: apply the following operations to the forest below.
  - Union(a, g)
  - Find(f)
  - Find(j)
[Figure: forest of two trees rooted at a (rank 3) and g (rank 2), containing nodes a through j annotated with their ranks. After Union(a, g), Find(f), and Find(j), the ranks are unchanged but height(a) = 2 ≠ rank(a).]
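The same effect can be reproduced on a small forest. The sketch below is an assumption-laden illustration: it uses eight integer elements rather than the slide's nodes, and the helper names (`make_set`, `find`, `union`, `tree_height`) are invented for the example.

```python
parent, rank = {}, {}


def make_set(x):
    parent[x] = x
    rank[x] = 0


def find(x):
    """Find with path compression (ranks are never touched here)."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while x != root:
        parent[x], x = root, parent[x]
    return root


def union(x, y):
    """Union by rank, keeping the first root on a tie."""
    r, s = find(x), find(y)
    if r == s:
        return
    if rank[r] < rank[s]:
        r, s = s, r
    parent[s] = r
    if rank[r] == rank[s]:
        rank[r] += 1


def tree_height():
    """Longest parent-pointer path from any node to its root."""
    def depth(v):
        steps = 0
        while parent[v] != v:
            v = parent[v]
            steps += 1
        return steps
    return max(depth(v) for v in parent)


for v in range(8):
    make_set(v)
# Pairwise unions of equal-rank roots build one tree of rank 3.
for x, y in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    union(x, y)

height_before = tree_height()  # 3, equal to rank[0]
find(7)                        # compresses the path 7 -> 6 -> 4 -> 0
height_after = tree_height()   # 2, while rank[0] is still 3
print(height_before, height_after, rank[0])  # 3 2 3
```

Because `find` only rewrites parent pointers, the rank of the root stays at 3 even though the tree's height drops to 2.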
Time Complexity Analysis
- Without path compression: O(log n) per find operation in the worst case.
- With path compression: O(log log n) per find operation, amortized over at least n find operations.
Time Complexity Analysis
Lemma.
- There are at most $n/2^k$ nodes with rank k.
- rank(parent(x)) > rank(x) for every non-root node x.
- $\mathrm{rank}(\mathrm{root}) \le \log_2 n$.
Time Complexity Analysis
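Definition.
- f(t) = 2 × t
- Let a be the parent of b. If rank(a) > f(rank(b)), then b is a happy node and the edge from b to a is a long edge. Example: rank(b) = 2, rank(a) = 6.
- If rank(a) ≤ f(rank(b)), then b is a sad node and the edge from b to a is a short edge. Example: rank(b) = 2, rank(a) = 4.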
Time Complexity Analysis
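Observation. In one find operation, at most log log n long edges are traversed.
Proof.
[Figure: find path d, c, ..., b, r from the target node up to the root r, with rank ≥ 1 at the bottom and rank ≤ log n at the root.]
- Each long edge more than doubles the rank, and the ranks on the path lie between 1 and log n, so the path contains at most log log n long edges.
Observation. x is sad for at most rank(x) find operations.
Proof.
- rank(parent(x)) > rank(x)
- Each find operation that passes through x and gives it a new parent increases rank(parent(x)) by at least 1.
- After rank(x) such find operations, rank(parent(x)) > rank(x) + rank(x) = f(rank(x)), so x is happy.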
Time Complexity Analysis
$$\text{cost of all find operations} = \sum_{\text{find ops}} \left(\#\text{long edges} + \#\text{short edges}\right)$$
$$\sum_{\text{find ops}} \#\text{long edges} \le \log\log n \times \#\text{find ops}$$
$$\begin{aligned}
\sum_{\text{find ops}} \#\text{short edges} &= \sum_{e} \left(\#\text{find operations when } e \text{ is short}\right) \\
&\le \sum_{x} \mathrm{rank}(x) \\
&\le \sum_{k=0}^{\log n} \left(k \times \#\text{rank-}k\text{ nodes}\right) \\
&\le \sum_{k=0}^{\infty} k \times \frac{n}{2^k} \\
&\le 2n
\end{aligned}$$
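The final bound relies on the value of the series $\sum_k k/2^k$. A short derivation, using the standard geometric series (valid for $|x| < 1$) and evaluated at $x = 1/2$, is sketched here as a supplement to the slide:

```latex
\begin{align*}
\sum_{k=0}^{\infty} x^{k} &= \frac{1}{1-x} \qquad (|x| < 1) \\
\sum_{k=0}^{\infty} k\,x^{k} &= x \cdot \frac{d}{dx}\,\frac{1}{1-x} = \frac{x}{(1-x)^{2}} \\
\sum_{k=0}^{\infty} \frac{k}{2^{k}} &= \frac{1/2}{(1/2)^{2}} = 2
\quad\Longrightarrow\quad
\sum_{k=0}^{\infty} k \times \frac{n}{2^{k}} = 2n
\end{align*}
```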
Time Complexity Analysis
Conclusion. With path compression, m find operations cost at most m log log n + 2n in total. When m ≥ n, the time for one find operation is therefore:
O(log log n + 2) = O(log log n)
Improvement.
- f(t) = 2t2 → $\log^* n$
- Ackermann function → α(n)
Thanks!