data structure and algorithmskeddiyan.com/files/dsa/lecture1.pdf · data structure and algorithms...

Data Structure and Algorithms

Yan Ke

China Jiliang University

Course Mark Distribution

--Final Exam: 50%

--Take-home assignments: 20%

--Labs: 20%

-- Attendance: 10%

Course Syllabus

--In total 10-11 topics, including (possibly):

• Brute Force Algorithms

• Divide and Conquer Algorithms

• Lower Bound Sorting

• Dynamic Programming

• Greedy Algorithms

• Heaps

• NP Completeness

• 3DM

• Iterative Improvement

• Branch and Bound

• Final Answer sketches

Class arrangement -- Class webpage: http://keddiyan.com/Algo.html

-- Please check the webpage regularly for class arrangement updates.

--Lectures only on Wednesday!

-- Monday/Friday classes are shifted to every Wednesday 1:30 pm to 3:05 pm Labs.

--Monday and Friday classes are mainly for assignments and labs (Please do it on your own)

--Lab arrangement: every week (week 2- 16) from 1:30 pm to 3:05 pm at Cyber North 304 (Saibo North 304).

-- Assignments and labs can be done under/not under my supervision (at home or in my office)

http://keddiyan.com/Algo.html

http://keddiyan.com/Algo.html

Algorithms

An Algorithm is a sequence of unambiguous instructions forsolving a problem, i.e., for obtaining a required output forany legitimate input in a finite amount of time.

– p. 5/55

Algorithms

Input, what are the legitimate input

Output, what are the valid outputs based on the input;most often unique output depending on the input.

Unambiguous and Executable: The steps (instructions)are precisely and unambiguously stated; Instructionscan be “executed", followed by a “machine”, clear howto implement them

*Deterministic: Execution of each step is unique andgives a result determined only by the input to the step.[Randomness, nondeterminism, ..., Quantum]

*Termination: The algorithm terminates after finitelymany instructions are executed

– p. 6/55

Generality: Applicable to a wide range of inputs

Correctness:

Performance/Efficiency/Complexity:Time and Space (Memory)(Depends on input size and content, machine speed,....)

Number of Basic Operations (correlated with time)

Asymptotic Analysis

Worst Case

Average Case

– p. 7/55

Pseudocode

We can give a program. This is usually unambiguous andclear.However, this is tedious. Sometimes difficult to understand.

Pseudocode.Slightly informal.Still precise enough to understand exactly what steps aretaken.

– p. 8/55

Example: GCD (Greatest Common Divisor)Input: Two integers, at least one of them 6= 0Output: greatest natural number > 0 which divides both ofthem

Notation: N denotes the set of natural numbers 0, 1, 2, . . .Notation: Z denotes the set of integers. . . ,−2,−1, 0, 1, 2, . . .Notation: a mod b denotes the remainder when a is dividedby b.

– p. 9/55

GCD(m,n) = GCD(m,−n)GCD(m, 0) = m, for m > 0

GCD(m,n)m,n are natural numbers > 0.Let t = min(m,n).Loop

If t divides both m and n, then output tElse t = t− 1

Forever

– p. 10/55

Euclid’s Algorithm

Uses: If m = qn+ r, where m,n, q, r are natural numbers,then gcd(m,n) = gcd(n, r)

GCD(m,n)(* m,n are natural numbers, at least one of them > 0 *)

While n > 0 Dor = m mod nm = nn = r

EndWhileOutput m

– p. 11/55

Why does it work?

If m < n, then first iteration will swap them.

Each step after that will make n smaller than previousround.

Actually it works ‘much faster’ than going down by 1 ineach iteration.

On termination, m is the GCD.

– p. 12/55

Algorithms

Possible inputs need to be specified clearly

Each step must be clearly defined and executable

Many possible algorithms may exist

Algorithm Efficiency

Data Structures needed for doing the operations

– p. 13/55

Algorithms

Understand the problem

Design a method to solve

Convert to an algorithm, step by step process to solvethe problem

Choose Data Structures to implement the algorithm

Proving correctness of the algorithm/program

Analyzing the algorithm

Time, Space, Simplicity, Generality

Program: Coding the algorithm

– p. 14/55

Some Important Problems

Sorting: Stable sorting, In place sorting

Searching

String Processing

A string is sequence of symbols: aba, a4bca, . . .

string matching, sub-string searching, etc

Graph problems: graph traversal, shortest path, etc.

Geometric problems: closest pair, convex hull, etc.

Numerical problems: real numbers, approximations, etcfrom engineering/science

– p. 15/55

Design Strategies

Brute Force

Divide and Conquer (decrease and conquer, transformand conquer)

Greedy technique

Dynamic Programming

Iterative improvement

Backtracking

Branch and bound

P vs NP

– p. 16/55

Data Structures

Background Information: You should know about these.

Way of organizing Data for easy manipulation

Arrays, Linked Lists (singly linked list, doubly linked list),linear list, Stacks, Queues, Priority Queues, heaps

Static memory allocation/dynamic memory allocation

Directed/Undirected Graphs,

Trees: rooted, ordered, binary trees

Sets, Dictionaries

– p. 17/55

Data Structures

Array is a sequence of n items (usually of the sametype): refered to by A(0) to A(n− 1) or A(1) to A(n)depending on context.

Linked list is a sequence of zero of more items. Theymay have variable number of elements/members.

Constructed using pointers.

Starting element pointer.

Next element pointer from each element. Nullpointer denotes end of list.

Doubly linked list allows traversal both ways; pointersto starting and ending element of the list.

– p. 18/55

Data Structures

Linear list is an abstract concept, with insertion, deletionetc. It may be implemented using arrays or linked list.

Stack is a list which is "last in first out". Operationspush() and pop() to put elements in and take elementsout of the list.

Queue is a list which is "first in first out". Operations areenqueue() (to add to the end of the list) and dequeque()(to remove from front of the list).

Priority Queue: Elements are put in the queue basedon priority. Implemented using Heaps.

– p. 19/55

Graphs

Graph is a set of points (vertices) and edges connectingvertices.

G = (V,E). V is the set of vertices. E ⊆ V × V .

Directed graph.

Undirected graph (u, v) ∈ E iff (v, u) in E.

Self loops: edges of type (u, u).

Edges may be presented using adjacency matrixA(u, v) = 1 iff (u, v) is an edge. Edges representedusing adjacency lists: for each vertex u, use a linked listof edges v : (u, v) is an edge .

– p. 20/55

Complete graphs Kn. Complete Bipartite Graphs Km,n.

Weighted graphs: each edge has some weight, denotedby A(u, v) = w.

Path: v1, v2, v3, . . . , vk, where (vi, vi+1) is an edge in thegraph for all i. Path from v1 to vk.

Simple path (without vertices being repeated)

Length of a path (each edge counts for 1 or weight ofthe edge).

– p. 21/55

Cycle: paths where the first and last vertices are same.Simple cycle: only repetition in the path is the first andlast vertex.

Often, unless specified otherwise, we usually only talkabout simple paths, cycles.

Acyclic graph: no simple cycles

Connected components: path from each vertex to everyother vertex in the component (when viewed as"undirected graph").

– p. 22/55

Trees

Tree is a connected acyclic graph

Forest is a set of trees

In a tree |E| = |V | − 1

If in a connected graph, |E| = |V | − 1, then it is a tree.

Rooted tree: a designated node. Leads to naturalhierarchy based on distance from the root.

Parent, Children, Ancestors, Descendants, Siblings,subtrees.

Leaves of the tree: nodes without any children.

– p. 23/55

Level/Depth of a node: distance from the root. Root isat Level 0.

Depth/Height of a tree: maximum depth of any node.

Height of a node: Leaves at Height 0.

Ordered tree: Children are ordered (left, right; first,second, third, ...)

– p. 24/55

Binary tree: at most two children of each node.

Binary search tree: each node has a ‘key’

Maximum Height of a binary tree with n nodes is n− 1.Minimum height of a binary tree with n vertices is⌊log2 n⌋. To see this:

h−1∑

i=0

2i + 1 ≤ n ≤

h∑

i=0

2i

Which gives

2h ≤ n < 2h+1

– p. 25/55

Sets, Dictionaries

Set is a collection of unordered, distinct elements(members).

Sets maybe represented using bit vectors (when theuniversal set is known) or lists, or other ways where onecan do union, find (check if an element is member of aset), and other operations.

Data structure representing a set which supportsinsertion, deletion, and searching is called a dictionary.Dictionary maybe represented using array, balancedsearch tree, hashing etc.

– p. 26/55

Abstract Data Types

Various Data Types might be used to implementing analgorithm. If efficiency is not an issue, one ignores theexact way in which data types are implemented.

Abstract Data type: where only operations on the datatypes are specified, and implementation is not specified.

– p. 27/55

Finding maximum value in an array

Input: Array A (A[1], A[2], . . . , A[n])Output: Largest element in the array.Algorithm:

Findlargest(A)m = A[1].For i = 2 to n

If A[i] > m, then m = A[i]

End

– p. 28/55

Number of operations:n− 1 comparisons, n assignments (worst case)n− 1 assignments to the loop variable, . . .Roughly takes time proportional to n or about C ∗ n.

– p. 29/55

Analysis of Algorithms

Induction

Recurrence Relations

Mathematical Tools

Invariants, Loop Invariants

– p. 30/55

Some Basic notations and results

log, lg (base 2), ln (base e=2.718....)

logb a = n such that bn = a.

logb a = log alog b

log xy= log x− log y

open interval (a, b), semi-open interval (a, b] and closedinterval [a, b].

Alphabet Σ = a, b, . . .

Strings (word) over the alphabet ababba; null string ǫ

Sequences

Boolean expressions: A and (B or ¬C)

– p. 31/55

binomial coefficients:(x+ y)n =(

n0

)

x0yn +(

n1

)

x1yn−1 + . . .(

ni

)

xiyn−i + . . . +(

nn

)

xny0.

Also use the notation nCi =(

ni

)

bn+1−an+1

b−a= Σn

i=0aibn−i

Consider (b− a) ∗ Σni=0a

ibn−i.

(Σni=0a

ibn+1−i)− (Σni=0a

i+1bn−i).

= bn+1 + (Σni=1a

ibn+1−i)− (an+1 + Σn−1i=0 a

i+1bn−i).

= bn+1 − an+1 + (Σni=1a

ibn+1−i)− (Σni=1a

ibn+1−i).

= bn+1 − an+1

Thus, if 0 ≤ a < b, then

bn+1−an+1

b−a= Σn

i=0aibn−i ≤ (n+ 1)bn

– p. 32/55

Limit, Infimum and Supremum of a series

limn→∞ f(n)limm→∞min f(n) : n ≥ mlimm→∞max f(n) : n ≥ m

Lower Bounds and Upper BoundsWorst Case:Lower Bound f(n): For each n, for some input of size n,algorithm takes time at least f(n).Upper Bound f(n): For each n, for all inputs of size n,algorithm takes time at most f(n).

– p. 33/55

Asymptotic Upper Bounds

f ∈ O(g), f ∈ Ω(g), f ∈ Θ(g).f and g are functions from natural numbers to naturalnumbers.

f ∈ O(g), iff there exist constants c > 0 and n0 such thatfor all n ≥ n0, f(n) ≤ cg(n)Also common to use: f = O(g) or f is O(g) rather thanf ∈ O(g).

– p. 34/55

Some people also use the variation:f ∈ O(g) iff there exists constants c > 0 and d > 0, such thatfor all xf(x) ≤ cg(x) + dFor most cases, both are same/similar, though sometimesthey differ when we want to use "small complexity"

– p. 35/55

– Example:

f(n) = 50n2 + 200n.

g(n) = n3.Then f(n) ≤ 51g(n), for n ≥ 200.Thus, f(n) ∈ O(g(n)).

– p. 36/55

Asymptotic Lower Bounds:

f ∈ Ω(g), iff there exist constants c > 0 and n0 such thatfor all n ≥ n0, f(n) ≥ cg(n)

Asymptotic tight bounds

f ∈ Θ(g) iff f ∈ O(g) and f ∈ Ω(g).– Example

f(n) = aknk + ak−1n

k−1 + . . .+ a1n+ a0Then, f(n) ∈ Θ(nk).

– p. 37/55

Theorem: Suppose limn→∞

f(n)g(n)

= c.

(a) If c = 0, then f(n) ∈ O(g(n)), but f(n) 6∈ Ω(g(n)).(b) If c = ∞, then f(n) ∈ Ω(g(n)), but f(n) 6∈ O(g(n)).(c) If 0 < c < ∞, then f(n) ∈ Θ(g(n)).

Proof: By definition of limit, there exists an m such that for

all n > m,f(n)g(n) ≤ 1.

Then, for n ≥ m, f(n) ≤ g(n).Rest: Exercise

– p. 38/55

f = o(g) if for all c > 0, there exists a x0 such thatfor all x > x0, f(x) ≤ cg(x).f = ω(g) if for all c > 0, there exists a x0 such thatfor all x > x0, f(x) ≥ cg(x).

– p. 39/55

L’ Hospital’s RuleIf limn→∞ f(n) and limn→∞ g(n) are both 0 or both ∞,then

limn→∞

f(n)g(n)

= limn→∞

f ′(n)g′(n)

.

Showing f ∈ O(g)

Explicit bounds as in definition

Using limit theorem

You can use either, unless explicitly asked to use one ofthe methods ....

Misuse notation and say f(n) = O(g(n)) etc.

Recurrence relations

– p. 40/55

Example: Finding maximum value in an array

Input: Array A (A[1], A[2], . . . , A[n])Output: Largest element in the array.Algorithm:

Findlargest(A[1:n])If n = 1, then return A[1].Else,

let m = Findlargest(A[1 : n− 1]).If m > A[n], then return mElse return A[n].

End

T (n) ≤ T (n− 1) + C1.T (1) = C2

– p. 41/55

Finding element in sorted array

FindElement(A, i, j, key)If i > j, return 0

Let k = i+j2 .

If A[k] = key, then return 1Else, if A[k] < key, then return FindElement(A,k+1,j,key)Else return FindElement(A,i,k-1,key)

End

T (n) ≤ T (n/2) + C.

– p. 42/55

Some further recurrence examples:T (n) = T (n− 1) + n for n ≥ 2 and T (1) = 1

T (n) = 2T (⌈n/2⌉) + n, for n ≥ 2, and T (1) = 1.

T (n) = 2T (n− 1) + 1, for n ≥ 2, and T (1) = 1.

T (n) = 9 + Σn−1i=1 T (i), for n ≥ 2, and T (1) = 7.

– p. 43/55

Solve T (n) = T (n− 1) + 1, for n ≥ 2T (1) = 1.

T (n) = T (n− 1) + 1

= T (n− 2) + 1 + 1

= T (n− 2) + 2

= T (n− 3) + 3

= · · ·

= T (1) + n− 1

= n

– p. 44/55

T (n) = T (n/2) + n, for n ≥ 2T (1) = 1Solve T (n) for n being power of 2.

T (n) = T (n/2) + n

= T (n/4) + n/2 + n

= T (n/8) + n/4 + n/2 + n

= · · ·

= T (1) + 2 + 4 + · · · + n/4 + n/2 + n

= 1 + 2 + 4 + · · · + n/4 + n/2 + n

= 2n− 1

– p. 45/55

Need not want exactly, but only in order.T (n) = n+ T (n− 1), T (1) = 1T (n) = n+ (n− 1) + (n− 2) + . . .+ 1

T (n) ≤ n ∗ n = O(n2).

– p. 46/55

Solving Recurrence Relations by Induction

Guess the answer

Prove by induction

Example:T (n) = T (n/3) + T (2n/3) + nT (n) ≤ C2n log n

– p. 47/55

Main Recurrence Theorem (Master Theorem)– Useful for solving many cases (does not always work).Theorem: Suppose a ≥ 1, b ≥ 2 are constants.Upper Bound:

If T (n) ≤ aT (n/b) + f(n), where f(n) = O(nk), then

T (n) =

O(nk), if a < bk;

O(nk log n), if a = bk;

O(nlogb a), if a > bk;

– p. 48/55

Lower Bound:If T (n) ≥ aT (n/b) + f(n), where f(n) = Ω(nk), then

T (n) =

Ω(nk), if a < bk;

Ω(nk log n), if a = bk;

Ω(nlogb a), if a > bk;

– p. 49/55

Exact Bound:If T (n) = aT (n/b) + f(n), where f(n) = Θ(nk), then

T (n) =

Θ(nk), if a < bk;

Θ(nk log n), if a = bk;

Θ(nlogb a), if a > bk;

The theorem also holds if b > 1.

– p. 50/55

Proof Idea: Consider the case of n = br for some naturalnumber r. Let bk = d in the following.

T (br) ≤ aT (br−1) + c[(br)k]

T (br) ≤ aT (br−1) + c[dr]

≤ a[aT (br−2) + c[dr−1]] + c[dr]

≤ a2T (br−2) + c[dr + a1dr−1]

. . .

≤ arT (br−r) + c[

r∑

i=1

[ar−idi]]

≤ c′[

r∑

i=0

[ar−idi]]

– p. 51/55

If d > a, then

T (br) ≤ c′dr+1 − ar+1

d− a

= O(dr) = O(nk)

If d < a, then

T (br) ≤ c′ar+1 − dr+1

a− d

= O(ar) = O(br logb a) = O(nlogb a)

– p. 52/55

If d = a, then

T (br) ≤ c′[(r + 1) ∗ dr]

= O(nk log n)

– p. 53/55

Consider H(n) = aH(n/b) + cnk; H(1) = T (1).

Note that T (n) ≤ H(n).H(n) is increasing function.The bound we obtained above would give,H(n) ≤ ...., for n = br.

If br−1 < n′ < br, thenT (n′) ≤ H(n′) ≤ H(br) ≤ .....

For example, if we have H(n) ≤ c′ ∗ nk, for n of the form br,

then for br−1 < n′ < br = n, we have

T (n′) ≤ H(n′) ≤ H(n) ≤ c′ ∗ nk ≤ c′ ∗ bk(n′)k ≤ c′′(n′)k.

– p. 54/55

Example:T (n) = T (n/2) + n.a = 1, b = 2, k = 1So a < bk

T (n) = O(n)

Example: T (n) = 2T (n/2) + na = 2, b = 2, k = 1

So a = bk

T (n) = O(n log n)

– p. 55/55

data structure and algorithmskeddiyan.com/files/dsa/lecture1.pdf · data structure and algorithms...

Documents