data structure and algorithmskeddiyan.com/files/dsa/lecture1.pdf · data structure and algorithms...
TRANSCRIPT
Data Structure and Algorithms
Yan Ke
China Jiliang University
Course Mark Distribution
--Final Exam: 50%
--Take-home assignments: 20%
--Labs: 20%
-- Attendance: 10%
Course Syllabus
--In total 10-11 topics, including (possibly):
• Brute Force Algorithms
• Divide and Conquer Algorithms
• Lower Bound Sorting
• Dynamic Programming
• Greedy Algorithms
• Heaps
• NP Completeness
• 3DM
• Iterative Improvement
• Branch and Bound
• Final Answer sketches
Class arrangement -- Class webpage: http://keddiyan.com/Algo.html
-- Please check the webpage regularly for class arrangement updates.
--Lectures only on Wednesday!
-- Monday/Friday classes are shifted to every Wednesday 1:30 pm to 3:05 pm Labs.
--Monday and Friday classes are mainly for assignments and labs (Please do it on your own)
--Lab arrangement: every week (week 2- 16) from 1:30 pm to 3:05 pm at Cyber North 304 (Saibo North 304).
-- Assignments and labs can be done under/not under my supervision (at home or in my office)
Algorithms
An Algorithm is a sequence of unambiguous instructions forsolving a problem, i.e., for obtaining a required output forany legitimate input in a finite amount of time.
– p. 5/55
Algorithms
Input, what are the legitimate input
Output, what are the valid outputs based on the input;most often unique output depending on the input.
Unambiguous and Executable: The steps (instructions)are precisely and unambiguously stated; Instructionscan be “executed", followed by a “machine”, clear howto implement them
*Deterministic: Execution of each step is unique andgives a result determined only by the input to the step.[Randomness, nondeterminism, ..., Quantum]
*Termination: The algorithm terminates after finitelymany instructions are executed
– p. 6/55
Generality: Applicable to a wide range of inputs
Correctness:
Performance/Efficiency/Complexity:Time and Space (Memory)(Depends on input size and content, machine speed,....)
Number of Basic Operations (correlated with time)
Asymptotic Analysis
Worst Case
Average Case
– p. 7/55
Pseudocode
We can give a program. This is usually unambiguous andclear.However, this is tedious. Sometimes difficult to understand.
Pseudocode.Slightly informal.Still precise enough to understand exactly what steps aretaken.
– p. 8/55
Example: GCD (Greatest Common Divisor)Input: Two integers, at least one of them 6= 0Output: greatest natural number > 0 which divides both ofthem
Notation: N denotes the set of natural numbers 0, 1, 2, . . .Notation: Z denotes the set of integers. . . ,−2,−1, 0, 1, 2, . . .Notation: a mod b denotes the remainder when a is dividedby b.
– p. 9/55
GCD(m,n) = GCD(m,−n)GCD(m, 0) = m, for m > 0
GCD(m,n)m,n are natural numbers > 0.Let t = min(m,n).Loop
If t divides both m and n, then output tElse t = t− 1
Forever
– p. 10/55
Euclid’s Algorithm
Uses: If m = qn+ r, where m,n, q, r are natural numbers,then gcd(m,n) = gcd(n, r)
GCD(m,n)(* m,n are natural numbers, at least one of them > 0 *)
While n > 0 Dor = m mod nm = nn = r
EndWhileOutput m
– p. 11/55
Why does it work?
If m < n, then first iteration will swap them.
Each step after that will make n smaller than previousround.
Actually it works ‘much faster’ than going down by 1 ineach iteration.
On termination, m is the GCD.
– p. 12/55
Algorithms
Possible inputs need to be specified clearly
Each step must be clearly defined and executable
Many possible algorithms may exist
Algorithm Efficiency
Data Structures needed for doing the operations
– p. 13/55
Algorithms
Understand the problem
Design a method to solve
Convert to an algorithm, step by step process to solvethe problem
Choose Data Structures to implement the algorithm
Proving correctness of the algorithm/program
Analyzing the algorithm
Time, Space, Simplicity, Generality
Program: Coding the algorithm
– p. 14/55
Some Important Problems
Sorting: Stable sorting, In place sorting
Searching
String Processing
A string is sequence of symbols: aba, a4bca, . . .
string matching, sub-string searching, etc
Graph problems: graph traversal, shortest path, etc.
Geometric problems: closest pair, convex hull, etc.
Numerical problems: real numbers, approximations, etcfrom engineering/science
– p. 15/55
Design Strategies
Brute Force
Divide and Conquer (decrease and conquer, transformand conquer)
Greedy technique
Dynamic Programming
Iterative improvement
Backtracking
Branch and bound
P vs NP
– p. 16/55
Data Structures
Background Information: You should know about these.
Way of organizing Data for easy manipulation
Arrays, Linked Lists (singly linked list, doubly linked list),linear list, Stacks, Queues, Priority Queues, heaps
Static memory allocation/dynamic memory allocation
Directed/Undirected Graphs,
Trees: rooted, ordered, binary trees
Sets, Dictionaries
– p. 17/55
Data Structures
Array is a sequence of n items (usually of the sametype): refered to by A(0) to A(n− 1) or A(1) to A(n)depending on context.
Linked list is a sequence of zero of more items. Theymay have variable number of elements/members.
Constructed using pointers.
Starting element pointer.
Next element pointer from each element. Nullpointer denotes end of list.
Doubly linked list allows traversal both ways; pointersto starting and ending element of the list.
– p. 18/55
Data Structures
Linear list is an abstract concept, with insertion, deletionetc. It may be implemented using arrays or linked list.
Stack is a list which is "last in first out". Operationspush() and pop() to put elements in and take elementsout of the list.
Queue is a list which is "first in first out". Operations areenqueue() (to add to the end of the list) and dequeque()(to remove from front of the list).
Priority Queue: Elements are put in the queue basedon priority. Implemented using Heaps.
– p. 19/55
Graphs
Graph is a set of points (vertices) and edges connectingvertices.
G = (V,E). V is the set of vertices. E ⊆ V × V .
Directed graph.
Undirected graph (u, v) ∈ E iff (v, u) in E.
Self loops: edges of type (u, u).
Edges may be presented using adjacency matrixA(u, v) = 1 iff (u, v) is an edge. Edges representedusing adjacency lists: for each vertex u, use a linked listof edges v : (u, v) is an edge .
– p. 20/55
Complete graphs Kn. Complete Bipartite Graphs Km,n.
Weighted graphs: each edge has some weight, denotedby A(u, v) = w.
Path: v1, v2, v3, . . . , vk, where (vi, vi+1) is an edge in thegraph for all i. Path from v1 to vk.
Simple path (without vertices being repeated)
Length of a path (each edge counts for 1 or weight ofthe edge).
– p. 21/55
Cycle: paths where the first and last vertices are same.Simple cycle: only repetition in the path is the first andlast vertex.
Often, unless specified otherwise, we usually only talkabout simple paths, cycles.
Acyclic graph: no simple cycles
Connected components: path from each vertex to everyother vertex in the component (when viewed as"undirected graph").
– p. 22/55
Trees
Tree is a connected acyclic graph
Forest is a set of trees
In a tree |E| = |V | − 1
If in a connected graph, |E| = |V | − 1, then it is a tree.
Rooted tree: a designated node. Leads to naturalhierarchy based on distance from the root.
Parent, Children, Ancestors, Descendants, Siblings,subtrees.
Leaves of the tree: nodes without any children.
– p. 23/55
Level/Depth of a node: distance from the root. Root isat Level 0.
Depth/Height of a tree: maximum depth of any node.
Height of a node: Leaves at Height 0.
Ordered tree: Children are ordered (left, right; first,second, third, ...)
– p. 24/55
Binary tree: at most two children of each node.
Binary search tree: each node has a ‘key’
Maximum Height of a binary tree with n nodes is n− 1.Minimum height of a binary tree with n vertices is⌊log2 n⌋. To see this:
h−1∑
i=0
2i + 1 ≤ n ≤
h∑
i=0
2i
Which gives
2h ≤ n < 2h+1
– p. 25/55
Sets, Dictionaries
Set is a collection of unordered, distinct elements(members).
Sets maybe represented using bit vectors (when theuniversal set is known) or lists, or other ways where onecan do union, find (check if an element is member of aset), and other operations.
Data structure representing a set which supportsinsertion, deletion, and searching is called a dictionary.Dictionary maybe represented using array, balancedsearch tree, hashing etc.
– p. 26/55
Abstract Data Types
Various Data Types might be used to implementing analgorithm. If efficiency is not an issue, one ignores theexact way in which data types are implemented.
Abstract Data type: where only operations on the datatypes are specified, and implementation is not specified.
– p. 27/55
Finding maximum value in an array
Input: Array A (A[1], A[2], . . . , A[n])Output: Largest element in the array.Algorithm:
Findlargest(A)m = A[1].For i = 2 to n
If A[i] > m, then m = A[i]
End
– p. 28/55
Number of operations:n− 1 comparisons, n assignments (worst case)n− 1 assignments to the loop variable, . . .Roughly takes time proportional to n or about C ∗ n.
– p. 29/55
Analysis of Algorithms
Induction
Recurrence Relations
Mathematical Tools
Invariants, Loop Invariants
– p. 30/55
Some Basic notations and results
log, lg (base 2), ln (base e=2.718....)
logb a = n such that bn = a.
logb a = log alog b
log xy= log x− log y
open interval (a, b), semi-open interval (a, b] and closedinterval [a, b].
Alphabet Σ = a, b, . . .
Strings (word) over the alphabet ababba; null string ǫ
Sequences
Boolean expressions: A and (B or ¬C)
– p. 31/55
binomial coefficients:(x+ y)n =(
n0
)
x0yn +(
n1
)
x1yn−1 + . . .(
ni
)
xiyn−i + . . . +(
nn
)
xny0.
Also use the notation nCi =(
ni
)
bn+1−an+1
b−a= Σn
i=0aibn−i
Consider (b− a) ∗ Σni=0a
ibn−i.
(Σni=0a
ibn+1−i)− (Σni=0a
i+1bn−i).
= bn+1 + (Σni=1a
ibn+1−i)− (an+1 + Σn−1i=0 a
i+1bn−i).
= bn+1 − an+1 + (Σni=1a
ibn+1−i)− (Σni=1a
ibn+1−i).
= bn+1 − an+1
Thus, if 0 ≤ a < b, then
bn+1−an+1
b−a= Σn
i=0aibn−i ≤ (n+ 1)bn
– p. 32/55
Limit, Infimum and Supremum of a series
limn→∞ f(n)limm→∞min f(n) : n ≥ mlimm→∞max f(n) : n ≥ m
Lower Bounds and Upper BoundsWorst Case:Lower Bound f(n): For each n, for some input of size n,algorithm takes time at least f(n).Upper Bound f(n): For each n, for all inputs of size n,algorithm takes time at most f(n).
– p. 33/55
Asymptotic Upper Bounds
f ∈ O(g), f ∈ Ω(g), f ∈ Θ(g).f and g are functions from natural numbers to naturalnumbers.
f ∈ O(g), iff there exist constants c > 0 and n0 such thatfor all n ≥ n0, f(n) ≤ cg(n)Also common to use: f = O(g) or f is O(g) rather thanf ∈ O(g).
– p. 34/55
Some people also use the variation:f ∈ O(g) iff there exists constants c > 0 and d > 0, such thatfor all xf(x) ≤ cg(x) + dFor most cases, both are same/similar, though sometimesthey differ when we want to use "small complexity"
– p. 35/55
– Example:
f(n) = 50n2 + 200n.
g(n) = n3.Then f(n) ≤ 51g(n), for n ≥ 200.Thus, f(n) ∈ O(g(n)).
– p. 36/55
Asymptotic Lower Bounds:
f ∈ Ω(g), iff there exist constants c > 0 and n0 such thatfor all n ≥ n0, f(n) ≥ cg(n)
Asymptotic tight bounds
f ∈ Θ(g) iff f ∈ O(g) and f ∈ Ω(g).– Example
f(n) = aknk + ak−1n
k−1 + . . .+ a1n+ a0Then, f(n) ∈ Θ(nk).
– p. 37/55
Theorem: Suppose limn→∞
f(n)g(n)
= c.
(a) If c = 0, then f(n) ∈ O(g(n)), but f(n) 6∈ Ω(g(n)).(b) If c = ∞, then f(n) ∈ Ω(g(n)), but f(n) 6∈ O(g(n)).(c) If 0 < c < ∞, then f(n) ∈ Θ(g(n)).
Proof: By definition of limit, there exists an m such that for
all n > m,f(n)g(n) ≤ 1.
Then, for n ≥ m, f(n) ≤ g(n).Rest: Exercise
– p. 38/55
f = o(g) if for all c > 0, there exists a x0 such thatfor all x > x0, f(x) ≤ cg(x).f = ω(g) if for all c > 0, there exists a x0 such thatfor all x > x0, f(x) ≥ cg(x).
– p. 39/55
L’ Hospital’s RuleIf limn→∞ f(n) and limn→∞ g(n) are both 0 or both ∞,then
limn→∞
f(n)g(n)
= limn→∞
f ′(n)g′(n)
.
Showing f ∈ O(g)
Explicit bounds as in definition
Using limit theorem
You can use either, unless explicitly asked to use one ofthe methods ....
Misuse notation and say f(n) = O(g(n)) etc.
Recurrence relations
– p. 40/55
Example: Finding maximum value in an array
Input: Array A (A[1], A[2], . . . , A[n])Output: Largest element in the array.Algorithm:
Findlargest(A[1:n])If n = 1, then return A[1].Else,
let m = Findlargest(A[1 : n− 1]).If m > A[n], then return mElse return A[n].
End
T (n) ≤ T (n− 1) + C1.T (1) = C2
– p. 41/55
Finding element in sorted array
FindElement(A, i, j, key)If i > j, return 0
Let k = i+j2 .
If A[k] = key, then return 1Else, if A[k] < key, then return FindElement(A,k+1,j,key)Else return FindElement(A,i,k-1,key)
End
T (n) ≤ T (n/2) + C.
– p. 42/55
Some further recurrence examples:T (n) = T (n− 1) + n for n ≥ 2 and T (1) = 1
T (n) = 2T (⌈n/2⌉) + n, for n ≥ 2, and T (1) = 1.
T (n) = 2T (n− 1) + 1, for n ≥ 2, and T (1) = 1.
T (n) = 9 + Σn−1i=1 T (i), for n ≥ 2, and T (1) = 7.
– p. 43/55
Solve T (n) = T (n− 1) + 1, for n ≥ 2T (1) = 1.
T (n) = T (n− 1) + 1
= T (n− 2) + 1 + 1
= T (n− 2) + 2
= T (n− 3) + 3
= · · ·
= T (1) + n− 1
= n
– p. 44/55
T (n) = T (n/2) + n, for n ≥ 2T (1) = 1Solve T (n) for n being power of 2.
T (n) = T (n/2) + n
= T (n/4) + n/2 + n
= T (n/8) + n/4 + n/2 + n
= · · ·
= T (1) + 2 + 4 + · · · + n/4 + n/2 + n
= 1 + 2 + 4 + · · · + n/4 + n/2 + n
= 2n− 1
– p. 45/55
Need not want exactly, but only in order.T (n) = n+ T (n− 1), T (1) = 1T (n) = n+ (n− 1) + (n− 2) + . . .+ 1
T (n) ≤ n ∗ n = O(n2).
– p. 46/55
Solving Recurrence Relations by Induction
Guess the answer
Prove by induction
Example:T (n) = T (n/3) + T (2n/3) + nT (n) ≤ C2n log n
– p. 47/55
Main Recurrence Theorem (Master Theorem)– Useful for solving many cases (does not always work).Theorem: Suppose a ≥ 1, b ≥ 2 are constants.Upper Bound:
If T (n) ≤ aT (n/b) + f(n), where f(n) = O(nk), then
T (n) =
O(nk), if a < bk;
O(nk log n), if a = bk;
O(nlogb a), if a > bk;
– p. 48/55
Lower Bound:If T (n) ≥ aT (n/b) + f(n), where f(n) = Ω(nk), then
T (n) =
Ω(nk), if a < bk;
Ω(nk log n), if a = bk;
Ω(nlogb a), if a > bk;
– p. 49/55
Exact Bound:If T (n) = aT (n/b) + f(n), where f(n) = Θ(nk), then
T (n) =
Θ(nk), if a < bk;
Θ(nk log n), if a = bk;
Θ(nlogb a), if a > bk;
The theorem also holds if b > 1.
– p. 50/55
Proof Idea: Consider the case of n = br for some naturalnumber r. Let bk = d in the following.
T (br) ≤ aT (br−1) + c[(br)k]
T (br) ≤ aT (br−1) + c[dr]
≤ a[aT (br−2) + c[dr−1]] + c[dr]
≤ a2T (br−2) + c[dr + a1dr−1]
. . .
≤ arT (br−r) + c[
r∑
i=1
[ar−idi]]
≤ c′[
r∑
i=0
[ar−idi]]
– p. 51/55
If d > a, then
T (br) ≤ c′dr+1 − ar+1
d− a
= O(dr) = O(nk)
If d < a, then
T (br) ≤ c′ar+1 − dr+1
a− d
= O(ar) = O(br logb a) = O(nlogb a)
– p. 52/55
If d = a, then
T (br) ≤ c′[(r + 1) ∗ dr]
= O(nk log n)
– p. 53/55
Consider H(n) = aH(n/b) + cnk; H(1) = T (1).
Note that T (n) ≤ H(n).H(n) is increasing function.The bound we obtained above would give,H(n) ≤ ...., for n = br.
If br−1 < n′ < br, thenT (n′) ≤ H(n′) ≤ H(br) ≤ .....
For example, if we have H(n) ≤ c′ ∗ nk, for n of the form br,
then for br−1 < n′ < br = n, we have
T (n′) ≤ H(n′) ≤ H(n) ≤ c′ ∗ nk ≤ c′ ∗ bk(n′)k ≤ c′′(n′)k.
– p. 54/55
Example:T (n) = T (n/2) + n.a = 1, b = 2, k = 1So a < bk
T (n) = O(n)
Example: T (n) = 2T (n/2) + na = 2, b = 2, k = 1
So a = bk
T (n) = O(n log n)
– p. 55/55