are p and q connected? - sjtubasics.sjtu.edu.cn/~dominik/teaching/2016-cs214/... · nd ops #long...

Are p and q connected?

Network connectivity

Yes, they are connected!

Network connectivity

- Problem: Given a set of nodes N and a set of links L between pairs of nodes, decide whether node p and node q are connected (p ∈ N, q ∈ N).

Real World Application

Kruskal

The union-find data structure

Zhengtian Xu, Xiaoqing Geng, Lihua Qian, Ruxuan Zhang, Chen Feng

Department of Computer Science and Engineering, Shanghai Jiao Tong University

8th December 2016

Outline

Brief Introduction to the Union-Find Data Structure

Improvement

Time Complexity Analysis

Union-Find data structure type

Goal. Support three operations on a set of elements:

- MAKE-SET(x). Create a new set containing only element x.

- FIND(x). Return a canonical element in the set containing x.

- UNION(x, y). Merge the sets containing x and y.

Union-Find example

Initial sets: a = {6}, b = {2, 7, 9, 3}, c = {4, 5, 8}

FIND(9) = 2

MAKE-SET(1) creates a new set d = {1}

UNION(2, 4) merges b and c into e = {2, 7, 9, 3, 4, 5, 8}
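The three operations can be tried out on the example sets with a deliberately naive sketch in which every element stores the canonical element of its set directly. This is not the tree representation the slides develop next; the `label` dictionary and the `connected` helper are assumptions made for illustration, and which element ends up canonical after a union is a choice of this sketch.

```python
# Naive sketch: label[x] is the canonical element of the set containing x.
label = {}

def make_set(x):
    """Create a new set containing only x."""
    label[x] = x

def find(x):
    """Return the canonical element of the set containing x."""
    return label[x]

def union(x, y):
    """Merge the sets containing x and y by relabeling y's set."""
    keep, drop = find(x), find(y)
    if keep == drop:
        return
    for elem, lab in label.items():
        if lab == drop:
            label[elem] = keep

def connected(p, q):
    """p and q are connected exactly when they share a canonical element."""
    return find(p) == find(q)

# Build the sets {6}, {2, 7, 9, 3} and {4, 5, 8} from the example.
for x in (6, 2, 7, 9, 3, 4, 5, 8):
    make_set(x)
for y in (7, 9, 3):
    union(2, y)
for y in (5, 8):
    union(4, y)

print(find(9))          # 2, as in the example
make_set(1)
union(2, 4)
print(connected(9, 8))  # True after UNION(2, 4)
print(connected(6, 1))  # False: {6} and {1} are still separate
```

Here FIND takes constant time but UNION scans every element, which is one motivation for the tree representation.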

Union-Find data structure

Representation

Represent each set as a tree of elements.

- Each element has a parent pointer in the tree.

- The root serves as the canonical element.

- FIND(x). Find the root of the tree containing x.

- UNION(x, y). Make the root of one tree point to the root of the other tree.

[Figure: example tree on the nodes a to f with the root marked; the parent of e is c.]
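A minimal sketch of the parent-pointer representation follows. The tree shape is only loosely modeled on the figure (the slides confirm that the parent of e is c; the other links and the extra element `"z"` are assumptions), and a root is encoded as a node that is its own parent.

```python
# parent[x] is the parent of x; a root points to itself.
parent = {"d": "d", "a": "d", "c": "d", "e": "c", "b": "c"}

def find(x):
    """Follow parent pointers until reaching the root."""
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    """Make the root of y's tree point to the root of x's tree."""
    root_x, root_y = find(x), find(y)
    if root_x != root_y:
        parent[root_y] = root_x

parent["z"] = "z"   # MAKE-SET("z"): a new one-node tree
union("e", "z")     # the root of z's tree now points to d

print(find("e"), find("z"))  # d d
```

This version links trees in an arbitrary direction, so a tree can degenerate into a long chain; union by rank addresses that.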

Find operation

Representation

FIND(x). Find the root of the tree containing x.

[Figure: example tree on the nodes a to g, illustrating FIND(g) and FIND(d).]

Union operation

- Maintain an integer rank for each node, initially 0.

- Link the root of smaller rank to the root of larger rank; if there is a tie, increase the rank of the new root by 1.

Note. For now, rank = height.
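The linking rule can be sketched as follows. The dictionaries `parent` and `rank` and the element names are assumptions for illustration; on a tie this sketch keeps the root of the first argument, which the slides do not prescribe.

```python
parent = {}
rank = {}

def make_set(x):
    parent[x] = x
    rank[x] = 0          # every node starts with rank 0

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    """Link the root of smaller rank below the root of larger rank."""
    rx, ry = find(x), find(y)
    if rx == ry:
        return rx
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx  # make rx the root of larger (or equal) rank
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1    # tie: the new root's rank grows by 1
    return rx

for x in "abcde":
    make_set(x)
union("a", "b")  # tie at rank 0: a becomes the root with rank 1
union("c", "d")  # tie at rank 0: c becomes the root with rank 1
union("a", "c")  # tie at rank 1: a becomes the root with rank 2
union("e", "a")  # rank 0 vs rank 2: e is linked below a, rank unchanged

print(find("d"), find("e"), rank["a"])  # a a 2
```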

Union by rank

[Figure: union(d, g) in two cases. When the roots have ranks 2 and 1, the root of rank 1 is linked below the root of rank 2, and the new root keeps rank 2. When both roots have rank 2, one root is linked below the other, and the rank of the new root becomes 3.]

Union by rank: analysis

Lemma 1. Using union by rank, for every root node r:

$\mathrm{size}(r) \ge 2^{\mathrm{rank}(r)}$

Proof. [ by induction on the number of links ]

- Base case: a singleton tree has size 1 and rank 0.

- Inductive hypothesis: assume the claim is true after the first i links.

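Proof (continued). Consider the next link, between roots r and s, where r becomes the new root. Primes denote values after the link.

- Case 1. [ rank(r) > rank(s) ] or [ rank(r) < rank(s) ]. By symmetry, let r be the root of larger rank. Its rank does not change and its size only grows:

$\mathrm{size}'(r) \ge \mathrm{size}(r) \ge 2^{\mathrm{rank}(r)} = 2^{\mathrm{rank}'(r)}$

- Case 2. [ rank(r) = rank(s) ]. The rank of r increases by 1, and the inductive hypothesis applies to both roots:

$\mathrm{size}'(r) = \mathrm{size}(r) + \mathrm{size}(s) \ge 2^{\mathrm{rank}(r)} + 2^{\mathrm{rank}(s)} = 2 \times 2^{\mathrm{rank}(r)} = 2^{\mathrm{rank}(r)+1} = 2^{\mathrm{rank}'(r)}$

[Figures: for each case, the trees rooted at r and s annotated with their sizes and ranks.]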

Union by rank: analysis

Lemma 2. There are at most $n / 2^k$ elements of rank k.

Proof. By Lemma 1, a node of rank k is the root of a subtree with at least $2^k$ nodes. Ranks strictly increase along every path to the root, so two different nodes of rank k have disjoint subtrees. With n elements in total, there can be at most $n / 2^k$ nodes of rank k.


Theorem. Using union by rank, every FIND operation takes $O(\log_2 n)$ time in the worst case, where n is the number of elements; every UNION operation takes constant time once the two roots are known.

Proof.

- The running time of each operation is bounded by the tree height.

- By Lemma 1, a root of rank k has size at least $2^k$, and no tree has more than n nodes. Since rank = height here, height $\le \lfloor \log_2 n \rfloor$.
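Both bounds can be checked empirically. The sketch below performs random unions by rank without path compression, asserts Lemma 1 after every link, and asserts the height bound at the end. The function name `check`, the random workload and the seed are assumptions for illustration; passing the check is a sanity test, not a proof.

```python
import math
import random

def check(n, seed=0):
    """Random unions by rank on n elements; returns the maximum node depth."""
    rng = random.Random(seed)
    parent = list(range(n))
    rank = [0] * n
    size = [1] * n

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    def depth(x):
        steps = 0
        while parent[x] != x:
            x = parent[x]
            steps += 1
        return steps

    for _ in range(4 * n):
        rx, ry = find(rng.randrange(n)), find(rng.randrange(n))
        if rx == ry:
            continue
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]
        if rank[rx] == rank[ry]:
            rank[rx] += 1
        assert size[rx] >= 2 ** rank[rx]           # Lemma 1

    max_depth = max(depth(x) for x in range(n))
    assert max_depth <= math.floor(math.log2(n))   # height bound
    return max_depth

check(1000)
```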

Improvement

Observation

- It is the height of the tree that affects the running time.

- When we're trying to find the root of the tree containing a given node, we're touching all the nodes on the path from that node to the root.

So...

- Why not make each of those just point to the root?

- That's the idea of path compression!

Path compression

- Just after computing the root of the target node, set the parent of each examined node to point to that root.

[Figure: find(j) on a tree of height 4. Frame by frame, each node on the path from j to the root is re-pointed directly at the root; the resulting tree has height 2.]
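A two-pass sketch of find with path compression is shown below: the first pass locates the root, the second re-points every examined node at it. The chain a, b, d, g, j is a hypothetical path that borrows node names from the figure, not a transcription of the figure's tree.

```python
# A chain of depth 4: j -> g -> d -> b -> a (a is the root).
parent = {"a": "a", "b": "a", "d": "b", "g": "d", "j": "g"}

def find(x):
    """Return the root of x's tree and compress the path from x to it."""
    root = x
    while parent[root] != root:   # pass 1: locate the root
        root = parent[root]
    while x != root:              # pass 2: re-point each examined node
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root

print(find("j"))  # a
print(parent)     # every node on the path now points directly at a
```

A recursive one-liner is also common, but the iterative form avoids deep recursion on long paths.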

Path compression: benefits

- The resulting tree is much flatter.

- If the target node is very deep, path compression may dramatically decrease the height of the tree.

- It speeds up future operations on all the nodes on the path and on those referencing them, directly or indirectly.

Path compression: rank vs. height

- The rank of a tree does not change during path compression.

- ...but the height of a tree may decrease.

- Now it is possible that rank ≠ height!


- Example: Apply the following operations on the forest below.
  - Union(a,g)
  - Find(f)
  - Find(j)

[Figure: a forest of two trees with roots a (rank 3) and g (rank 2), every node labeled with its rank. Union(a,g) links g below a; Find(f) and Find(j) then compress their paths. The rank labels stay the same in every frame.]

height(a) = 2 ≠ rank(a)
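The same effect can be reproduced on a smaller, hypothetical forest. The sketch below builds one tree of rank 3 and height 3 from eight elements (the integer names and the order of unions are assumptions, not the figure's forest), then runs a single find with path compression.

```python
parent = list(range(8))
rank = [0] * 8

def find(x):
    root = x
    while parent[root] != root:
        root = parent[root]
    while x != root:                 # path compression
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1

def height():
    """Length of the longest path from any node to its root."""
    def depth(x):
        steps = 0
        while parent[x] != x:
            x = parent[x]
            steps += 1
        return steps
    return max(depth(x) for x in range(8))

# Every union below is called on roots, so no compression happens yet.
for x, y in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    union(x, y)

before = (rank[0], height())
find(7)                              # compresses the path 7 -> 6 -> 4 -> 0
after = (rank[0], height())

print(before, after)  # (3, 3) (3, 2)
```

The rank of the root stays 3 while the height drops to 2, so rank is only an upper bound on height once path compression is in use.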

Time Complexity Analysis



- Without path compression: O(log n) per find operation in the worst case.

- With path compression: O(log log n) amortized per find operation, assuming at least n find operations.


Lemma.

- There are at most $n / 2^k$ nodes with rank k.

- rank(parent(x)) > rank(x).

- rank(root) ≤ $\log_2 n$.


Definition.

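- f(t) = 2 × t

- Let a be the parent of b. If rank(a) > f(rank(b)), then b is a happy node and the edge from b to a is a long edge. Example: rank(a) = 6, rank(b) = 2.

- If rank(a) ≤ f(rank(b)), then b is a sad node and the edge from b to a is a short edge. Example: rank(a) = 4, rank(b) = 2.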


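Observation. In one find operation, at most log log n long edges are traversed.

Proof. Consider the path d, c, ..., b, r from the starting node d up to the root r, where rank(d) ≥ 1 and rank(r) ≤ log n. Ranks strictly increase along the path, and every long edge more than doubles the rank. Starting from a rank of at least 1, the rank can double at most log log n times before it exceeds log n.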

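Observation. x is sad for at most rank(x) find operations.

Proof.

- rank(parent(x)) > rank(x)

- Each find operation that passes through x re-points x to an ancestor of strictly larger rank (unless parent(x) is already the root), so rank(parent(x)) increases by at least 1 per such find operation.

- After rank(x) find operations, rank(parent(x)) > rank(x) + rank(x) = f(rank(x)), so x is happy.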


Cost of all find operations (m = number of find operations):

$$\text{cost of all find operations} = \sum_{\text{find ops}} \left( \#\text{long edges} + \#\text{short edges} \right)$$

$$\sum_{\text{find ops}} \#\text{long edges} \le \log\log n \times m$$

$$\sum_{\text{find ops}} \#\text{short edges} = \sum_{e} \left( \#\text{find operations when } e \text{ is short} \right) \le \sum_{x} \mathrm{rank}(x) \le \sum_{k=0}^{\log n} k \times \#(\text{rank-}k\text{ nodes}) \le \sum_{k=0}^{\infty} k \times \frac{n}{2^k} \le 2n$$
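The final bound of 2n rests on the value of the series $\sum_k k/2^k$. The slides do not spell this step out; one standard way to evaluate it, given here as a supplementary sketch, is to subtract half of the series from itself:

```latex
\[
S = \sum_{k=0}^{\infty} \frac{k}{2^k}
\quad\Longrightarrow\quad
S - \frac{S}{2}
  = \sum_{k=1}^{\infty} \frac{k - (k-1)}{2^k}
  = \sum_{k=1}^{\infty} \frac{1}{2^k}
  = 1
\quad\Longrightarrow\quad
S = 2,
\qquad\text{hence}\qquad
\sum_{k=0}^{\infty} k \times \frac{n}{2^k} = 2n .
\]
```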


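Conclusion. With path compression, m find operations cost at most m × log log n + 2n in total. If the number of find operations m is at least n, the amortized time for one find operation is:

O(log log n + 2) = O(log log n)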

Improvement.

- $f(t) = 2^t$ → $\log^* n$

- Ackermann function → α(n)
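To give a feel for how slowly the log* bound grows, here is a sketch of the iterated logarithm under its usual definition: the number of times the base-2 logarithm must be applied before the value drops to 1 or below. The slides do not define log*, so this definition and the function name `log_star` are assumptions.

```python
import math

def log_star(n):
    """Count how many times log2 must be applied until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print(log_star(2))      # 1
print(log_star(16))     # 3  (16 -> 4 -> 2 -> 1)
print(log_star(65536))  # 4  (65536 -> 16 -> 4 -> 2 -> 1)
```

Under this definition the value stays in the single digits for any input size that fits in a computer's memory.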

Thanks!
