Introduction and Basics
Topics: Introduction, Analyzing Algorithms, Growth of Functions, Sorting Techniques, Recurrences, Brute-Force, Greedy Algorithms, Divide and Conquer, Dynamic Programming, Back-Tracking, Graph Algorithms, Computational Geometry, String Matching Algorithms, NP-Completeness
7th Week: ◦ 2 Exams (20%) + 2 Quizzes (10%)
12th Week: ◦ 2 Exams (10%) + 1 Quiz (5%) + 1 Assignment (5%)
Final Exam (40%)
Sheets and Assignments (10%)
Late assignments are graded out of 75% (1 week late), 50% (2 weeks late), 25% (3 weeks late), and zero afterwards.
Theoretical importance
◦ the core of computer science
Practical importance
◦ A practitioner’s toolkit of known algorithms
◦ Framework for designing and analyzing algorithms for new problems
How to design algorithms
How to prove your algorithm is correct
How to analyze algorithm efficiency
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate input in a finite amount of time.
[Diagram: a problem is solved by an algorithm, which a "computer" executes to transform an input into an output.]
Recipe, process, method, technique, procedure, routine, … with the following requirements:
1. Finiteness: terminates after a finite number of steps
2. Definiteness: rigorously and unambiguously specified
3. Input: valid inputs are clearly specified
4. Output: can be proved to produce the correct output given a valid input
5. Effectiveness: steps are sufficiently simple and basic
Problem: Find gcd(m,n), the greatest common divisor of two nonnegative, not both zero integers m and n
Examples: gcd(60,24) = 12, gcd(60,0) = 60
Euclid's algorithm is based on repeated application of the equality gcd(m,n) = gcd(n, m mod n) until the second number becomes 0, which makes the problem trivial.
Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12
Step 1: If n = 0, return m and stop; otherwise go to Step 2
Step 2: Divide m by n and assign the remainder to r
Step 3: Assign the value of n to m and the value of r to n. Go to Step 1.
while n ≠ 0 do
r ← m mod n
m← n
n ← r
return m
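The pseudocode above translates directly into C; a minimal sketch (the function name `gcd` is my own choice):

```c
/* Euclid's algorithm: gcd(m, n) for nonnegative m and n, not both zero.
   Repeatedly applies gcd(m, n) = gcd(n, m mod n) until n becomes 0. */
int gcd(int m, int n) {
    while (n != 0) {        /* Step 1: stop when n = 0 */
        int r = m % n;      /* Step 2: r <- m mod n */
        m = n;              /* Step 3: m <- n, n <- r */
        n = r;
    }
    return m;
}
```

Here gcd(60, 24) returns 12 and gcd(60, 0) returns 60, matching the examples above.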
How to design algorithms
How to express algorithms
Proving correctness
Efficiency
◦ Theoretical analysis
◦ Empirical analysis
Optimality
Algorithms are procedural solutions to problems
Steps for designing and analyzing an algorithm
◦ Understand the problem
◦ Ascertain the capabilities of a computational device
◦ Choose between exact and approximate problem solving
◦ Decide on appropriate data structures
Brute force
Greedy approach
Divide and conquer
Dynamic programming
Backtracking
Others
Methods of specifying an algorithm
◦ Pseudocode (commonly used)
◦ Flowchart
Proving an algorithm's correctness
◦ Mathematical induction for recursion
◦ Methods of proof
◦ Approximation algorithms are more difficult
How good is the algorithm?
◦ Correctness
◦ Time efficiency
◦ Space efficiency
Does there exist a better algorithm?
◦ Lower bounds
◦ Optimality
sorting
searching
string processing
graph problems
geometric problems
numerical problems
Statement of problem:
◦ Input: A sequence of n numbers <a1, a2, …, an>
◦ Output: A reordering <a′1, a′2, …, a′n> of the input sequence such that a′i ≤ a′j whenever i < j
Instance: The sequence <5, 3, 2, 8, 3>
Algorithms:
◦ Selection sort
◦ Insertion sort
◦ Merge sort
◦ (many others)
Input: array a[1], …, a[n]
Output: array a sorted in non-decreasing order
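The algorithm body itself is not preserved in this transcript; as one illustrative choice from the algorithms listed above, selection sort can be sketched in C (using 0-based indexing rather than a[1..n]):

```c
/* Selection sort: repeatedly select the smallest remaining element
   and swap it into the next position of the sorted prefix. O(n^2). */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;                    /* index of smallest so far */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        int tmp = a[i];                 /* swap a[i] and a[min] */
        a[i] = a[min];
        a[min] = tmp;
    }
}
```

On the instance <5, 3, 2, 8, 3> this produces <2, 3, 3, 5, 8>.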
Lists
Stacks
Queues
Graphs
Trees
Hash Tables
Predicting the resources that the algorithm requires:
◦ Computational running time
◦ Memory usage
◦ Communication bandwidth
The running time of an algorithm is the number of primitive operations executed on a particular input. It depends on:
◦ Input size (e.g., 60 elements vs. 70000)
◦ The input itself (e.g., partially sorted input for a sorting algorithm)
Example: some sorting routines may require as few as N − 1 comparisons and as many as N(N − 1)/2.
Types of analyses:
◦ Best-case: what is the fastest an algorithm can run for a problem of size N?
◦ Average-case: on average, how fast does an algorithm run for a problem of size N?
◦ Worst-case: what is the longest an algorithm can run for a problem of size N?
Computer scientists mostly use worst-case analysis.
Which is better, 31N^3 + 50N^2 + 24N + 15 or 4·3^N + 3N^2 + N + 21? The answer depends on the value of N:

N    31N^3 + 50N^2 + 24N + 15    4·3^N + 3N^2 + N + 21
1              120                        37
2              511                        71
3             1374                       159
4             2895                       397
5             5260                      1073
6             8655                      3051
7            13266                      8923
8            19279                     26465
9            26880                     79005
10           36255                    236527
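The crossover in the table can be checked directly. A small sketch evaluating both sums (the names `f` and `g` are mine; the formulas are reconstructed to match the tabulated values):

```c
/* f(N) = 31N^3 + 50N^2 + 24N + 15   (polynomial)
   g(N) = 4*3^N + 3N^2 + N  + 21     (exponential) */
long f(long N) { return 31*N*N*N + 50*N*N + 24*N + 15; }

long g(long N) {
    long p = 1;                      /* 3^N by repeated multiplication */
    for (long i = 0; i < N; i++)
        p *= 3;
    return 4*p + 3*N*N + N + 21;
}
```

For N ≤ 7 the exponential is still smaller (g(7) = 8923 < f(7) = 13266), but from N = 8 on it dominates (g(8) = 26465 > f(8) = 19279).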
N    4·3^N + 3N^2 + N + 21    4·3^N     % of total
1             37                 12        32.4
2             71                 36        50.7
3            159                108        67.9
4            397                324        81.6
5           1073                972        90.6
6           3051               2916        95.6
7           8923               8748        98.0
8          26465              26244        99.2
9          79005              78732        99.7
10        236527             236196        99.9

◦ One term (4·3^N) dominates the sum.
Function    N=10     N=100     N=1000     N=10^4      N=10^5
log2 N      3        6         9          13          16
N           10       100       1000       10^4        10^5
N log2 N    30       664       9965       ≈10^5       ≈10^6
N^2         10^2     10^4      10^6       10^8        10^10
N^3         10^3     10^6      10^9       10^12       10^15
2^N         ≈10^3    ≈10^30    ≈10^301    ≈10^3010    ≈10^30103
Measure speed with respect to the part of the sum that grows quickest.
Ordering: 1 < log2 N < N < N log2 N < N^2 < N^3 < 2^N < 3^N
For the two example sums, the quickest-growing parts are 31N^3 (in 31N^3 + 50N^2 + 24N + 15) and 4·3^N (in 4·3^N + 3N^2 + N + 21).
Furthermore, simply ignore any constant in front of the dominant term and report the general class of the term:
31N^3 + 50N^2 + 24N + 15 grows proportionally to N^3
4·3^N + 3N^2 + N + 21 grows proportionally to 3^N
(Likewise, a dominant term such as 15 N log2 N belongs to the class N log2 N.)
When comparing algorithms, determine formulas to count the operation(s) of interest, then compare the dominant terms of the formulas.
If algorithm A requires time proportional to f(N), the algorithm is said to be of order f(N), written O(f(N)).
Definition: an algorithm is said to take time proportional to O(f(N)) if there is some constant C such that, for all but a finite number of values of N, the time taken by the algorithm is less than C·f(N).
If an algorithm is O(f(N)), f(N) is said to be the growth-rate function of the algorithm.
Examples:
31N^3 + 50N^2 + 24N + 15 is O(N^3)
4·3^N + 3N^2 + N + 21 is O(3^N)
Big-O notation: f(n) = O(g(n)) ◦ f(n) grows no faster than g(n) (worst case)
Little-o notation: f(n) = o(g(n)) ◦ g(n) grows much faster than f(n)
Big-Theta notation: f(n) = Θ(g(n)) ◦ f(n) grows as fast as g(n)
Big-Omega notation: f(n) = Ω(g(n)) ◦ g(n) is a lower bound of f(n) (best case)
In general, a function f(n) is Ω(g(n)) if there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0.
A function f(n) is Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
Equivalently, f(n) is Θ(g(n)) iff f(n) is both O(g(n)) and Ω(g(n)).
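The Θ definition can be sanity-checked numerically; in this sketch the function f, the witness constants c1 = 3, c2 = 4, and the threshold n0 = 10 are all my own choices for illustration:

```c
/* Witness that f(n) = 3n^2 + 10n is Theta(n^2):
   with c1 = 3, c2 = 4, n0 = 10 we have
   c1*g(n) <= f(n) <= c2*g(n) for all n >= n0, where g(n) = n^2. */
int theta_witness_holds(long n) {
    long fn = 3*n*n + 10*n;
    long gn = n*n;
    return (3*gn <= fn) && (fn <= 4*gn);
}
```

Below n0 the upper bound fails (at n = 9, f(9) = 333 > 4·81 = 324), which is exactly why the definition only demands the inequalities for all n ≥ n0.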
[Plots: growth of f(n) = log(n), n, n log(n), n^2, n^3, and 2^n for n = 1 to 20, shown at y-axis cutoffs of 250, 500, 1000, and 5000, and again on a logarithmic scale for n up to 65536. As the scale increases, the faster-growing functions dominate.]
1. 3n^3 + 90n^2 − 2n + 5 = O(n^3)
2. 2n^2 + 3n + 1000000 = Θ(n^2)
3. 2n = o(n^2)
4. 3n^2 = O(n^2); tighter: Θ(n^2)
5. n log n = O(n^2)
6. True or false:
– n^2 = O(n^3)
– n^3 = O(n^2)
– 2^(n+1) = O(2^n)
– (n+1)! = O(n!)
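For the last two true/false items, a short argument (my working, not from the slides) shows why a constant factor is harmless but a growing one is not:

```latex
2^{\,n+1} = 2\cdot 2^{\,n} \le c\cdot 2^{\,n}\ \text{for } c = 2,
\quad\text{so } 2^{\,n+1} = O(2^{\,n}).
\qquad
\frac{(n+1)!}{n!} = n+1 \to \infty,
\quad\text{so no constant } c \text{ satisfies } (n+1)! \le c\cdot n!,
\ \text{hence } (n+1)! \ne O(n!).
```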
1 < n < n log n < n^2 < n^k < (3/2)^n < 2^n < n! < (n+1)!
Rule 1 – For loops: the running time of a for loop is at most the running time of the statements inside the loop (including tests) times the number of iterations.
Rule 2 – Nested loops: analyze these inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops.
Example 1:
sum = 0;
for (i = 1; i <= n; i++)        /* n iterations of O(1) work: O(n) */
    sum += n;

Example 2:
sum = 0;
for (j = 1; j <= n; j++)        /* outer loop: n iterations */
    for (i = 1; i <= j; i++)    /* inner loop: j iterations */
        sum++;                  /* executes 1 + 2 + … + n times: O(n^2) */
for (k = 0; k < n; k++)         /* separate (non-nested) loop: O(n) */
    A[k] = k;
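Rule 2 can be spot-checked by running Example 2 and counting how often sum++ executes; the wrapper function below is my own packaging of the nested loops, which run the inner statement 1 + 2 + … + n = n(n+1)/2 times:

```c
/* Example 2 as a function: returns how many times sum++ executed.
   Inner loop runs j times for each j = 1..n, so the total is n(n+1)/2. */
int nested_loop_count(int n) {
    int sum = 0;
    for (int j = 1; j <= n; j++)
        for (int i = 1; i <= j; i++)
            sum++;
    return sum;
}
```

For n = 10 this gives 10·11/2 = 55, confirming the O(n^2) bound from the product of the loop sizes.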
Algorithm 1: Sequential Search
int search(int A[], int N, int Num) {
int index = 0;
while ((index < N) && (A[index] < Num))
index++;
if ((index < N) && (A[index] == Num))
return index;
else
return -1;
}
Operation to count: how many times Num is compared to a member of the array.
Best case: the number is at the first position in the array (1 + 1 = 2 comparisons): O(1)
Average case: the number is, on average, half-way down the array (sometimes longer, sometimes shorter) (N/2 + 1 comparisons): O(N)
Worst case: Num must be compared to every element in the array (N + 1 comparisons): O(N)
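These counts can be verified by instrumenting the search. The counting version below is my own rewrite of the function above; it preserves the original comparison order (bounds test, then element test) while tallying every comparison of Num against an array element:

```c
/* Sequential search over a sorted array, counting comparisons of
   Num against A[...]. Returns the index of Num, or -1 if absent. */
int search_counting(const int A[], int N, int Num, int *comps) {
    int index = 0;
    *comps = 0;
    while (index < N) {
        (*comps)++;                 /* comparison: A[index] < Num */
        if (!(A[index] < Num))
            break;
        index++;
    }
    if (index < N) {
        (*comps)++;                 /* comparison: A[index] == Num */
        if (A[index] == Num)
            return index;
    }
    return -1;
}
```

Searching a sorted 10-element array for its last element costs 11 = N + 1 comparisons (the worst case); finding the first element costs 2, matching the best case above.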
int search(int A[], int N, int Num) {
    int first = 0;
    int last = N - 1;
    int mid = (first + last) / 2;
    /* Test the bounds before reading A[mid], so an empty array (N == 0)
       is handled without an out-of-bounds access. */
    while ((first <= last) && (A[mid] != Num)) {
        if (A[mid] > Num)
            last = mid - 1;
        else
            first = mid + 1;
        mid = (first + last) / 2;
    }
    if ((first <= last) && (A[mid] == Num))
        return mid;
    else
        return -1;
}
One comparison after the loop; two comparisons each time through the loop. The first time through, toss half of the array (2 comparisons); the second time, half the remainder (1/4 of the original, 2 comparisons); the third time, half of that (1/8 of the original, 2 comparisons); and so on.

Loop iteration    Remaining elements
1                 N/2
2                 N/4
3                 N/8
4                 N/16
…                 …
X                 1

How long to get down to 1? Looking at the problem in reverse: how many doublings does it take to get from 1 up to N? Set N = 2^X and solve for X:
log2 N = log2 (2^X) = X
With two comparisons per iteration plus one comparison at the end, binary search takes 2·log2 N + 1 comparisons in the worst case.
Binary search is worst-case O(log2 N); sequential search is worst-case O(N).
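The logarithmic bound can be checked empirically; in this sketch the iteration-counting wrapper is mine, and each loop iteration corresponds to the (at most) two key comparisons counted above:

```c
/* Binary search that counts loop iterations. For N = 2^k, an
   unsuccessful search runs the loop at most k + 1 times. */
int bsearch_iters(const int A[], int N, int Num, int *iters) {
    int first = 0, last = N - 1;
    *iters = 0;
    while (first <= last) {
        (*iters)++;
        int mid = (first + last) / 2;
        if (A[mid] == Num)
            return mid;              /* found */
        if (A[mid] > Num)
            last = mid - 1;          /* discard right half */
        else
            first = mid + 1;         /* discard left half */
    }
    return -1;                       /* not found */
}
```

With N = 1024 = 2^10 and an absent key, the loop runs at most 11 times, i.e. roughly 2·log2 N comparisons in total, versus up to N + 1 for sequential search.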