assignment 1 - research school of computer science...bucket sort •assumption: sorting numbers in...

53
Assignment 1 Due: 23 Aug 23:59 Grace period until 24 Aug 13:00. No submission will be accepted after the grace period

Upload: others

Post on 09-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Assignment 1• Due: 23 Aug 23:59• Grace period until 24 Aug 13:00. No submission

will be accepted after the grace period

Page 2: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Last 4 WeeksüThe ProblemüAnalysis of AlgorithmsüModel of ComputationüAsymptotic NotationsüDivide-and-Conquer and Recurrence AnalysisüRandomized Algorithm and Probabilistic AnalysisüEmpirical AnalysisüCorrectness

From today: Algorithm Design Techniques

Page 3: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Design Techniques• Helps in designing new algorithms• Brute Force: When stuck, start here and improve• Divide & Conquer: Discussed a bit in recurrence analysis• Randomized Algorithm: Discussed a bit in prob. analysis • Decrease & Conquer: Very similar to divide & conquer,

usually reduced to a smaller sub-problem (e.g., Insertion Sort)• Transform & Conquer• Dynamic Programming• Greedy• Iterative Improvement

Page 4: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Design Techniques• Helps in designing new algorithms• Brute Force: When stuck, start here and improve• Divide & Conquer: Discussed a bit in recurrence analysis• Randomized Algorithm: Discussed a bit in prob. analysis • Decrease & Conquer: Very similar to divide & conquer, but

reduce to 1 smaller sub-problem (e.g., Insertion Sort)• Transform & Conquer• Dynamic Programming• Greedy• Iterative Improvement

Page 5: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Design Techniques• Helps in designing new algorithms• Brute Force: When stuck, start here and improve• Divide & Conquer: Discussed a bit in recurrence analysis• Randomized Algorithm: Discussed a bit in prob. analysis • Decrease & Conquer: Very similar to divide & conquer, but

reduce to 1 smaller sub-problem (e.g., Insertion Sort)• Transform & Conquer• Dynamic Programming• Greedy• Iterative Improvement

Page 6: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Design Techniques• Helps in designing new algorithms• Brute Force: When stuck, start here and improve• Divide & Conquer: Discussed a bit in recurrence analysis• Randomized Algorithm: Discussed a bit in prob. analysis • Decrease & Conquer: Very similar to divide & conquer, but

reduce to 1 smaller sub-problem (e.g., Insertion Sort)• Transform & Conquer: Today + next 1-2 weeks• Dynamic Programming• Greedy• Iterative Improvement

Page 7: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

COMP3600/6466 – Algorithms Transform & Conquer 1[Lev sec. 6.1] + [CLRS ch. 8]

+ [CLRS sec. 12.1, 12.2. 12.3]

Hanna Kurniawati

https://cs.anu.edu.au/courses/comp3600/

Page 8: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Topics• What is Transform & Conquer in Algorithm Design?• Instance Simplification• Common method: Pre-sorting• A bit more about sorting• Representation Change• Supporting Data Structure: • Binary Search Tree • AVL Tree• Red-Black Tree• Heaps

• Examples• Problem Reduction

Page 9: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Today• What is Transform & Conquer in Algorithm Design?• Instance Simplification• Common method: Pre-sorting• A bit more about sorting• Representation Change• Supporting Data Structure: • Binary Search Tree • AVL Tree• Red-Black Tree• Heaps

• Examples• Problem Reduction

Page 10: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

What is it?• Transform & conquer design technique consists

of 2 stages:• Transform the problem• Solve the transformed problem• Of course, the goal is that the transformation

should make solving easier. • There’s 3 types of transformations:• Instance simplification• Representation change• Problem reduction

Page 11: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

TodayüWhat is Transform & Conquer in Algorithm Design?• Instance Simplification• Common method: Pre-sorting• A bit more about sorting• Representation Change• Supporting Data Structure: • Binary Search Tree • AVL Tree• Red-Black Tree• Heaps

• Examples• Problem Reduction

Page 12: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

What is Instance Simplification?• Transform a problem to a simpler version of the

same problem• Commonly used method: Pre-sorting• Finding solutions in a list of sorted numbers are

usually easier than in an unsorted one• This is why sorting is considered an important problem• Examples of how it’s used: Tutorial-1 Q4 (finding a

particular value), check if all elements in an array are unique (i.e., the array contains no duplicates)

Page 13: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Example: Check for uniqueness• Input: A[0, …, n]• Output: True if all numbers in A appear only once

in A, False otherwise• Brute force: Compare every pair• Worst case #comparisons (and hence, run-time): Θ(𝑛$)• Transform & conquer:

1. Sort: Some algorithms have run-time 𝑂(𝑛 log 𝑛)2. Check whether two consecutive elements are the

same or not: Θ(𝑛)

Page 14: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

TodayüWhat is Transform & Conquer in Algorithm Design?• Instance SimplificationüCommon method: Pre-sorting• A bit more about sorting• Representation Change• Supporting Data Structure: • Binary Search Tree • AVL Tree• Red-Black Tree• Heaps

• Examples• Problem Reduction

Page 15: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

A bit more about Sorting• So far, we’ve seen 4 different sorting methods• Insertion sort, Bubble sort, Merge sort,

Randomized Quick sort• All of the above are comparison-based• Meaning: The sorted order is based only on

comparisons between the input elements • Comparison-based sort requires Ω(𝑛 log 𝑛)

comparisons. Hence, their run-time complexity must be Ω(𝑛 log 𝑛) too. Therefore, those methods whose run-time complexity is Θ(𝑛 log 𝑛) is (close to) optimal

Page 16: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Why #comparisons = Ω(𝑛 log 𝑛)• Let’s first represent possible comparisons that might

happen in sorting any permutation of n numbers as a decision tree. • A decision tree here refers to a full binary tree where• Each non-leaf node represents comparison between a pair of

elements to be sorted• Each leaf node represents a particular permutation of the

numbers• Edges represent subsequent comparison• Note: Full binary tree means a tree where all nodes aside

from the leaves have 2 children

Page 17: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Why #comparisons = Ω(𝑛 log 𝑛)• Illustration of a decision tree to sort 3 elements [𝑎-, 𝑎$, 𝑎/]

<2,1,3>

1:2

2:3 2:3

≤ >

≤ > ≤ >

1:3

≤ >

<1,2,3> <3,2,1>1:3

≤ >

<1,3,2> <3,1,2> <2,3,1>

<…>are the indices of the sorted numbers (e.g., <3,2,1>means the sorted order should be [𝑎/, 𝑎$, 𝑎-])

Page 18: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Why #comparisons = Ω(𝑛 log 𝑛)• The decision tree that represents the problem of

sorting n elements would have n! leaf nodes• A path from the root to a leaf node of the tree

represents comparison for sorting a particular permutation of n numbers. • In the worst case, the number of comparisons would

be the same as the height of the tree.• Since there’s n! leaf nodes in the tree, the height of the

tree is log 𝑛! = ∑<=-> log 𝑖 = 𝑛 log 𝑛• Therefore, any comparison-based sorting requires Ω(𝑛 log 𝑛) in the worst case

Page 19: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Can we sort without comparing?• Yes, we can!• In fact, run-time complexity can be better

without comparison• Examples: • Counting Sort, Radix Sort, Bucket Sort

Page 20: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Counting Sort • Assumption: Sorting integers of a small range• If the #elements to be sorted (i.e., n) is large but

the range is small, there must be many elements with the same values• Counting sort uses this observation to sort the

elements by counting elements of the same values• The result of counting is used to get a rough idea of the

position of each element once the numbers are sorted

Page 21: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Counting Sort

Page 22: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Counting Sort: Illustration

• A

• Set C as #elements in A whose values are = C’s index

• Set C as #elements in A whose values are ≤ C’ index

Index 1 2 3 4 5 6Value 3 4 0 2 0 2

Index 0 1 2 3 4Value 2 0 2 1 1

Index 0 1 2 3 4Value 2 2 4 5 6

Page 23: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Counting Sort: Illustration

• Output B by assigning values of A at index based on values of C & update C• B:

• C:

• Step-2• B:

• C:

Index 1 2 3 4 5 6Value 2

Index 0 1 2 3 4Value 2 2 3 5 6

Index 1 2 3 4 5 6Value 0 2

Index 0 1 2 3 4Value 1 2 3 5 6

Page 24: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Counting Sort

Page 25: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Counting Sort: Running-time• The running-time complexity: Θ(𝑛 + 𝑘) where 𝑛 is the

input size and the input ranges from 0 to 𝑘• Since counting sort is often used for problems where 𝑘 is

much smaller than 𝑛, the complexity is often stated as Θ(𝑛)• However, if counting sort is applied to problems where the

range of numbers (i.e., (𝑘 − 0)) is much larger than 𝑛, then the complexity is Θ(𝑘). And if both 𝑛 and 𝑘 are similar, then the complexity should be stated as Θ(𝑛 + 𝑘)

• Note that elements with the same values appear in the output array in the same order as they appear in the input array. This property is called stable and is quite useful (e.g., in Radix Sort).

Page 26: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Radix Sort

• Sort d-digit numbers

• i=1 is the least significant digit –the last digit of the numbers • Usually, line 2 uses Counting Sort• Example:

• Complexity: Θ(𝑑(𝑛 + 𝑘)), assuming Counting Sort is used

325457657839431

Input431325457657839

i=1325431839457657

i=2325431457657839

i=3

Page 27: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort • Assumption: Sorting numbers in the interval [0,1)

that are drawn independently from a uniform distribution • The idea is to transform the problem of sorting n

elements into n problems of sorting k elements, where k is much smaller than n. • It starts by assigning similar elements to the same

bucket, and then sort the elements within the bucket

Page 28: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort

Page 29: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort: Illustration

0.780.170.390.260.720.940.210.120.230.68

123456789

10

A0123456789

B

0.78

0.17

0.390.26

0.72

Page 30: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort: Illustration

Page 31: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort: Running-time• 𝑇 𝑛 = Θ 𝑛 + ∑<=F>G-𝑂(𝑛<$)

• Average case:𝐸[𝑇 𝑛 ] = 𝐸[Θ 𝑛 + ∑<=F>G-𝑂 𝑛<$ ]

Using linearity of expectation, we have𝐸[𝑇 𝑛 ] = 𝐸[Θ 𝑛 ] + ∑<=F>G-𝐸[𝑂 𝑛<$ ]𝐸[𝑇 𝑛 ] = Θ 𝑛 + ∑<=F>G-𝑂 𝐸[𝑛<

$]

Cost of Insertion Sort in bucket-𝑖

Page 32: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort: Running-time• Now, we need to compute 𝐸[𝑛<$ ] where 𝑛< is the

expected number of elements in bucket- 𝑖• Use Indicator Random Variable

• 𝑋<,J = 𝐼 𝐴 𝑗 𝑖𝑛 𝑏𝑢𝑐𝑘𝑒𝑡 𝑖 = S1 𝐴 𝑗 𝑖𝑛 𝑏𝑢𝑐𝑘𝑒𝑡 𝑖0 𝑂𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

• Since the input are independent and uniformly distributed samples, 𝑃(𝑋<,J = 1) = -

>

Page 33: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort: Running-time• Now, we can compute: 𝐸[𝑛<

$ ] = 𝐸 ∑J=-> 𝑋<,J$

= 𝐸 ∑J=-> ∑Y=-> 𝑋<,J𝑋<,Y= 𝐸 ∑J=-> 𝑋<,J$ + ∑J=-> ∑-ZYZ>,Y[J 𝑋<,J𝑋<,Y= ∑J=-> 𝐸[𝑋<,J$ ] + ∑J=-> ∑-ZYZ>,Y[J 𝐸[𝑋<,J𝑋<,Y]

𝐸 𝑋<,J$ = 1$. ->+ 0$ 1 − -

>= -

>

𝐸 𝑋<,J𝑋<,Y = 𝐸 𝑋<,J 𝐸 𝑋<,J = ->->= -

>]

Page 34: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Non-comparison based Sorting: Bucket Sort: Running-time• Now, we can plug the expected values back to

𝐸[𝑛<$ ] = ∑J=-> 𝐸[𝑋<,J$ ] + ∑J=-> ∑-ZYZ>,Y[J 𝐸[𝑋<,J𝑋<,Y]

= ∑J=-> ->+ ∑J=-> ∑-ZYZ>,Y[J

->]

= 𝑛 ->+ 𝑛 𝑛 − 1 -

>]= 1 + (>G-)

>= 2 − -

>

• Returning to 𝑇 𝑛 :𝑇 𝑛 = Θ 𝑛 +∑<=F>G-𝑂(𝑛<$)

= Θ 𝑛 + 𝑛.𝑂 2 − ->

= Θ 𝑛

Page 35: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Notes-1• Comparison-based methods can be used to sort any

type of data, as long as there’s a comparison function• Its running-time complexity is asymptotically lower bounded

by 𝑛 log 𝑛 --that is, Ω(𝑛 log 𝑛), where n is the input size• Non comparison-based can run in linear time. But, it

assumes the data to be sorted is of a certain type.• Tips: More efficient algorithms can be developed by

focusing on a sub-class of a problem sub-class (in the sorting case, a particular type of input).

Page 36: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Notes-2• All the non-comparison based methods we

discussed are not in-place, which means it takes more memory• In-place Sorting Algorithm: Re-arrange the elements inside

the array, with only a constant number of input array being stored outside the array at any given time

• There are comparison-based algorithms that are not in-place (e.g., merge sort whose worst case run-time is Θ(𝑛 log 𝑛)• Insertion Sort, Bubble Sort, and Rand Quick Sort are in-

place. But, the worst case run-time of all these in-place algorithms are Θ(𝑛$)

Page 37: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Notes-2• Tips: Pay attention to Speed vs Memory tradeoff• Usually we can make an algorithm faster by using more

memory and vice versa• Tips: Randomization sometimes help speed-up

while keeping memory requirement low• Example: Rand Quick Sort allows average case running-

time to be Θ(𝑛 log 𝑛) and in practice tend to be close to this average case rather than the worst case Θ(𝑛$)

Page 38: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

TodayüWhat is Transform & Conquer in Algorithm Design?üInstance SimplificationüCommon method: Pre-sortingüA bit more about sorting• Representation Change• Supporting Data Structure: • Binary Search Tree • AVL Tree• Red-Black Tree• Heaps

• Examples• Problem Reduction

Page 39: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

What is Representation Change?• Recall that algorithms transform input to output. This

usually means the algorithm will perform certain operations on the input data, so as to transform them into the desired output• The way these data are represented in the algorithm

could often influence the efficiency of the algorithm• Representation Change technique transforms the way

the data are being represented, so that solving becomes more efficient• This is particularly useful when the data is dynamic, i.e., the

input might change (the input size might increase, decrease, or the content changes) during run-time

Page 40: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Abstract Data Structures• The way these data are represented are often called

Abstract Data Structure• Abstract Data Structures can be thought of as a

mathematical model for data representation. An Abstract Data Structure consists of two components:• A container that holds the data • A set of operations on the data. These operations are

defined based on their behavior, rather than their exact implementation

Page 41: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

What is Binary Search Tree (BST)?• An abstract data structure that represents the data as

a binary tree with the following property:• Each node in the tree contains the data itself, as key +

satellite data, and pointers to its left child, right child, and parent node, usually denoted as left, right, p• Key: Data used for BST operations. In BST, all keys are

distinct• Satellite data: Data carried around but not used in the

implementation• The root node is the only node with p = NIL

• Suppose 𝑥 is a node of the BST. If 𝑦 is the left sub-tree of 𝑥then, 𝑦. 𝑘𝑒𝑦 ≤ 𝑥. 𝑘𝑒𝑦. If 𝑦 is the right sub-tree of 𝑥 then, 𝑦. 𝑘𝑒𝑦 ≥ 𝑥. 𝑘𝑒𝑦.

• Provide operations Search, Min, Max, Successor, Predecessor, Insert, Delete in 𝑂 ℎ time, where ℎ is the height of the tree

Page 42: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree (BST)• Its property allows us to easily output all the keys in a

sorted order in Θ(𝑛) via inorder tree walk

• What will the output be if we swap lines 2 & 3?• What will the output be if we swap lines 3 & 4?

Page 43: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The
Page 44: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree: Search Operation• To search for a key whose value is 𝑘:

Page 45: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree : Min & Max•Where’s the Min in a BST?•Where’s the Max in a BST?

Page 46: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree: Successor & Predecessor

• Successor of a node 𝑥 is the node with the smallest key greater than 𝑥. 𝑘𝑒𝑦 –that is, the node visited right after 𝑥 in inorder tree walk• If 𝑥 has a non-empty right sub-tree, its successor is the

minimum of its right sub-tree• Otherwise, 𝑥’s successor is the lowest ancestor of 𝑥 whose

left child is also an ancestor of 𝑥• Predecessor of a node 𝑥 is the node with the largest

key smaller than 𝑥. 𝑘𝑒𝑦 –that is, the node visited right before 𝑥 in inorder tree walk• If 𝑥 has a non-empty left sub-tree, its predecessor is the

maximum of its left sub-tree• Otherwise, 𝑥’s successor is the lowest ancestor of 𝑥 whose

right child is also an ancestor of 𝑥

Page 47: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree: Insertion• Suppose we want to insert a new node 𝑧,where 𝑧. 𝑘𝑒𝑦 = 𝑣, 𝑧. 𝑙𝑒𝑓𝑡 = 𝑁𝐼𝐿, 𝑧. 𝑟𝑖𝑔ℎ𝑡 = 𝑁𝐼𝐿• Traverse the tree until the right position of 𝑧 is found• Add 𝑧 to the tree

Page 48: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree: Insertion

Page 49: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Binary Search Tree: Deletion• Suppose we want to delete 𝑧. There’s 3 cases:• When 𝑧 has no children: Remove 𝑧 and modify its parent to

replace 𝑧 with NIL• When 𝑧 has only 1 child: Elevate the child to take 𝑧’s position

in the tree by modify its parent to replace 𝑧 with 𝑧’s child• When 𝑧 has 2 children, consider its successor 𝑦 (since 𝑧 has

2 children, 𝑦 must be in its right subtree) : • If 𝑦 is 𝑧’s right child, replace 𝑧 with 𝑦• If 𝑦 is not 𝑧’s right child, first replace 𝑦 by its right child and

then replace 𝑧 with 𝑦

Page 50: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The
Page 51: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Some Notes on Run-Time Complexity• Aside from the tree walk, all operations

discussed before have run-time complexity of Θ(ℎ) where ℎ is the tree’s depth• The question is, what is ℎ with respect to the

input size 𝑛? If the tree is a perfect binary tree, ℎ = Θ log 𝑛 . But if we’re not careful, we have ℎ = 𝑛.• A binary tree is perfect if it is full and all leaves

node have the same depth • How to ensure that ℎ is as close to Θ log 𝑛 as

possible? Tree rebalancing

Page 52: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

TodayüWhat is Transform & Conquer in Algorithm Design?üInstance SimplificationüCommon method: Pre-sortingüA bit more about sorting• Representation Change• Supporting Data Structure:

üBinary Search Tree • AVL Tree• Red-Black Tree• Heaps

• Examples• Problem Reduction

Page 53: Assignment 1 - Research School of Computer Science...Bucket Sort •Assumption: Sorting numbers in the interval [0,1) that are drawn independently from a uniform distribution •The

Next Week: AVL Tree + Red-black Tree + Heaps