Introduction and Basics
Topics: Introduction, Analyzing Algorithms, Growth of Functions, Sorting Techniques, Recurrences, Brute-Force, Greedy Algorithms, Divide and Conquer, Dynamic Programming, Back-Tracking, Graph Algorithms, Computational Geometry, String Matching Algorithms, NP-Completeness
7th Week: ◦ 2 Exams (20%) + 2 Quizzes (10%)
12th Week: ◦ 2 Exams (10%) + 1 Quiz (5%) + 1 Assignment (5%)
Final Exam (40%)
Sheets and Assignments (10%)
Late assignments are graded out of 75% (1 week late), 50% (2 weeks late), 25% (3 weeks late), and zero afterwards.
Theoretical importance
◦ the core of computer science
Practical importance
◦ A practitioner’s toolkit of known algorithms
◦ Framework for designing and analyzing algorithms for new problems
How to design algorithms
How to prove your algorithm is correct
How to analyze algorithm efficiency
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate input in a finite amount of time.
[Diagram: a problem is solved by an algorithm, which a "computer" executes to transform an input into an output.]
Recipe, process, method, technique, procedure, routine, … with the following requirements:
1. Finiteness: terminates after a finite number of steps
2. Definiteness: rigorously and unambiguously specified
3. Input: valid inputs are clearly specified
4. Output: can be proved to produce the correct output given a valid input
5. Effectiveness: steps are sufficiently simple and basic
Problem: Find gcd(m,n), the greatest common divisor of two nonnegative, not both zero integers m and n
Examples: gcd(60,24) = 12, gcd(60,0) = 60
Euclid's algorithm is based on repeated application of the equality gcd(m,n) = gcd(n, m mod n) until the second number becomes 0, which makes the problem trivial.
Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12
Step 1: If n = 0, return m and stop; otherwise go to Step 2
Step 2: Divide m by n and assign the remainder to r
Step 3: Assign the value of n to m and the value of r to n. Go to Step 1.
while n ≠ 0 do
r ← m mod n
m← n
n ← r
return m
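The pseudocode above translates directly into C; a minimal sketch (the function name `gcd` is my own choice):

```c
/* Euclid's algorithm: gcd(m, n) for nonnegative m and n, not both zero.
   Repeatedly applies gcd(m, n) = gcd(n, m mod n) until n becomes 0. */
int gcd(int m, int n) {
    while (n != 0) {        /* Step 1: stop when n = 0 */
        int r = m % n;      /* Step 2: r <- m mod n */
        m = n;              /* Step 3: m <- n, n <- r */
        n = r;
    }
    return m;
}
```

Here gcd(60, 24) returns 12 and gcd(60, 0) returns 60, matching the examples above.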
How to design algorithms
How to express algorithms
Proving correctness
Efficiency
◦ Theoretical analysis
◦ Empirical analysis
Optimality
Algorithms are procedural solutions to problems
Steps for designing and analyzing an algorithm
◦ Understand the problem
◦ Ascertain the capabilities of a computational device
◦ Choose between exact and approximate problem solving
◦ Decide on appropriate data structures
Brute force
Greedy approach
Divide and conquer
Dynamic programming
Backtracking
Others
Methods of specifying an algorithm
◦ Pseudocode (commonly used)
◦ Flowchart
Proving an algorithm's correctness
◦ Mathematical induction for recursion
◦ Methods of proof
◦ Approximation algorithms are more difficult
How good is the algorithm?
◦ Correctness
◦ Time efficiency
◦ Space efficiency
Does there exist a better algorithm?
◦ Lower bounds
◦ Optimality
sorting
searching
string processing
graph problems
geometric problems
numerical problems
Statement of problem:
◦ Input: A sequence of n numbers <a1, a2, …, an>
◦ Output: A reordering <a′1, a′2, …, a′n> of the input sequence such that a′i ≤ a′j whenever i < j
Instance: The sequence <5, 3, 2, 8, 3>
Algorithms:
◦ Selection sort
◦ Insertion sort
◦ Merge sort
◦ (many others)
Input: array a[1], …, a[n]
Output: array a sorted in non-decreasing order
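The algorithm body itself is not preserved in this transcript; as one illustrative choice from the algorithms listed above, selection sort can be sketched in C (using 0-based indexing rather than a[1..n]):

```c
/* Selection sort: repeatedly select the smallest remaining element
   and swap it into the next position of the sorted prefix. O(n^2). */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;                    /* index of smallest so far */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        int tmp = a[i];                 /* swap a[i] and a[min] */
        a[i] = a[min];
        a[min] = tmp;
    }
}
```

On the instance <5, 3, 2, 8, 3> this produces <2, 3, 3, 5, 8>.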
Lists
Stacks
Queues
Graphs
Trees
Hash Tables
Predicting the resources that the algorithm requires:
◦ Computational running time
◦ Memory usage
◦ Communication bandwidth
The running time of an algorithm is the number of primitive operations executed on a particular input. It depends on:
◦ Input size (e.g., 60 elements vs. 70000)
◦ The input itself (e.g., partially sorted input for a sorting algorithm)
Example: some sorting routines may require as few as N − 1 comparisons and as many as N(N − 1)/2.
Types of analyses:
◦ Best-case: what is the fastest an algorithm can run for a problem of size N?
◦ Average-case: on average, how fast does an algorithm run for a problem of size N?
◦ Worst-case: what is the longest an algorithm can run for a problem of size N?
Computer scientists mostly use worst-case analysis.
Which is better, 31N^3 + 50N^2 + 24N + 15 or 4·3^N + 3N^2 + N + 21? The answer depends on the value of N:

N    31N^3 + 50N^2 + 24N + 15    4·3^N + 3N^2 + N + 21
1              120                        37
2              511                        71
3             1374                       159
4             2895                       397
5             5260                      1073
6             8655                      3051
7            13266                      8923
8            19279                     26465
9            26880                     79005
10           36255                    236527
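The crossover in the table can be checked directly. A small sketch evaluating both sums (the names `f` and `g` are mine; the formulas are reconstructed to match the tabulated values):

```c
/* f(N) = 31N^3 + 50N^2 + 24N + 15   (polynomial)
   g(N) = 4*3^N + 3N^2 + N  + 21     (exponential) */
long f(long N) { return 31*N*N*N + 50*N*N + 24*N + 15; }

long g(long N) {
    long p = 1;                      /* 3^N by repeated multiplication */
    for (long i = 0; i < N; i++)
        p *= 3;
    return 4*p + 3*N*N + N + 21;
}
```

For N ≤ 7 the exponential is still smaller (g(7) = 8923 < f(7) = 13266), but from N = 8 on it dominates (g(8) = 26465 > f(8) = 19279).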
N    4·3^N + 3N^2 + N + 21    4·3^N     % of total
1             37                 12        32.4
2             71                 36        50.7
3            159                108        67.9
4            397                324        81.6
5           1073                972        90.6
6           3051               2916        95.6
7           8923               8748        98.0
8          26465              26244        99.2
9          79005              78732        99.7
10        236527             236196        99.9

◦ One term (4·3^N) dominates the sum.
Function    N=10     N=100     N=1000     N=10^4      N=10^5
log2 N      3        6         9          13          16
N           10       100       1000       10^4        10^5
N log2 N    30       664       9965       ≈10^5       ≈10^6
N^2         10^2     10^4      10^6       10^8        10^10
N^3         10^3     10^6      10^9       10^12       10^15
2^N         ≈10^3    ≈10^30    ≈10^301    ≈10^3010    ≈10^30103
Measure speed with respect to the part of the sum that grows quickest.
Ordering: 1 < log2 N < N < N log2 N < N^2 < N^3 < 2^N < 3^N
For the two example sums, the quickest-growing parts are 31N^3 (in 31N^3 + 50N^2 + 24N + 15) and 4·3^N (in 4·3^N + 3N^2 + N + 21).
Furthermore, simply ignore any constant in front of the dominant term and report the general class of the term:
31N^3 + 50N^2 + 24N + 15 grows proportionally to N^3
4·3^N + 3N^2 + N + 21 grows proportionally to 3^N
(Likewise, a dominant term such as 15 N log2 N belongs to the class N log2 N.)
When comparing algorithms, determine formulas to count the operation(s) of interest, then compare the dominant terms of the formulas.
If algorithm A requires time proportional to f(N), the algorithm is said to be of order f(N), written O(f(N)).
Definition: an algorithm is said to take time proportional to O(f(N)) if there is some constant C such that, for all but a finite number of values of N, the time taken by the algorithm is less than C·f(N).
If an algorithm is O(f(N)), f(N) is said to be the growth-rate function of the algorithm.
Examples:
31N^3 + 50N^2 + 24N + 15 is O(N^3)
4·3^N + 3N^2 + N + 21 is O(3^N)
Big-O notation: f(n) = O(g(n)) ◦ f(n) grows no faster than g(n) (worst case)
Little-o notation: f(n) = o(g(n)) ◦ g(n) grows much faster than f(n)
Big-Theta notation: f(n) = Θ(g(n)) ◦ f(n) grows as fast as g(n)
Big-Omega notation: f(n) = Ω(g(n)) ◦ g(n) is a lower bound of f(n) (best case)
In general, a function f(n) is Ω(g(n)) if there exist positive constants c and n0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0.
A function f(n) is Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
Equivalently, f(n) is Θ(g(n)) iff f(n) is both O(g(n)) and Ω(g(n)).
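The Θ definition can be sanity-checked numerically; in this sketch the function f, the witness constants c1 = 3, c2 = 4, and the threshold n0 = 10 are all my own choices for illustration:

```c
/* Witness that f(n) = 3n^2 + 10n is Theta(n^2):
   with c1 = 3, c2 = 4, n0 = 10 we have
   c1*g(n) <= f(n) <= c2*g(n) for all n >= n0, where g(n) = n^2. */
int theta_witness_holds(long n) {
    long fn = 3*n*n + 10*n;
    long gn = n*n;
    return (3*gn <= fn) && (fn <= 4*gn);
}
```

Below n0 the upper bound fails (at n = 9, f(9) = 333 > 4·81 = 324), which is exactly why the definition only demands the inequalities for all n ≥ n0.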
[Plots: growth of f(n) = log(n), n, n log(n), n^2, n^3, and 2^n for n = 1 to 20, shown at y-axis cutoffs of 250, 500, 1000, and 5000, and again on a logarithmic scale for n up to 65536. As the scale increases, the faster-growing functions dominate.]
1. 3n^3 + 90n^2 − 2n + 5 = O(n^3)
2. 2n^2 + 3n + 1000000 = Θ(n^2)
3. 2n = o(n^2)
4. 3n^2 = O(n^2); tighter: Θ(n^2)
5. n log n = O(n^2)
6. True or false:
– n^2 = O(n^3)
– n^3 = O(n^2)
– 2^(n+1) = O(2^n)
– (n+1)! = O(n!)
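For the last two true/false items, a short argument (my working, not from the slides) shows why a constant factor is harmless but a growing one is not:

```latex
2^{\,n+1} = 2\cdot 2^{\,n} \le c\cdot 2^{\,n}\ \text{for } c = 2,
\quad\text{so } 2^{\,n+1} = O(2^{\,n}).
\qquad
\frac{(n+1)!}{n!} = n+1 \to \infty,
\quad\text{so no constant } c \text{ satisfies } (n+1)! \le c\cdot n!,
\ \text{hence } (n+1)! \ne O(n!).
```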
1 < n < n log n < n^2 < n^k < (3/2)^n < 2^n < n! < (n+1)!
Rule 1 – For loops: the running time of a for loop is at most the running time of the statements inside the loop (including tests) times the number of iterations.
Rule 2 – Nested loops: analyze these inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops.
Example 1:
sum = 0;
for (i = 1; i <= n; i++)        /* n iterations of O(1) work: O(n) */
    sum += n;

Example 2:
sum = 0;
for (j = 1; j <= n; j++)        /* outer loop: n iterations */
    for (i = 1; i <= j; i++)    /* inner loop: j iterations */
        sum++;                  /* executes 1 + 2 + … + n times: O(n^2) */
for (k = 0; k < n; k++)         /* separate (non-nested) loop: O(n) */
    A[k] = k;
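Rule 2 can be spot-checked by running Example 2 and counting how often sum++ executes; the wrapper function below is my own packaging of the nested loops, which run the inner statement 1 + 2 + … + n = n(n+1)/2 times:

```c
/* Example 2 as a function: returns how many times sum++ executed.
   Inner loop runs j times for each j = 1..n, so the total is n(n+1)/2. */
int nested_loop_count(int n) {
    int sum = 0;
    for (int j = 1; j <= n; j++)
        for (int i = 1; i <= j; i++)
            sum++;
    return sum;
}
```

For n = 10 this gives 10·11/2 = 55, confirming the O(n^2) bound from the product of the loop sizes.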
Algorithm 1: Sequential Search
int search(int A[], int N, int Num) {
int index = 0;
while ((index < N) && (A[index] < Num))
index++;
if ((index < N) && (A[index] == Num))
return index;
else
return -1;
}
Operation to count: how many times Num is compared to a member of the array.
Best case: the number is at the first position in the array (1 + 1 = 2 comparisons): O(1)
Average case: the number is, on average, half-way down the array (sometimes longer, sometimes shorter) (N/2 + 1 comparisons): O(N)
Worst case: Num must be compared to every element in the array (N + 1 comparisons): O(N)
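These counts can be verified by instrumenting the search. The counting version below is my own rewrite of the function above; it preserves the original comparison order (bounds test, then element test) while tallying every comparison of Num against an array element:

```c
/* Sequential search over a sorted array, counting comparisons of
   Num against A[...]. Returns the index of Num, or -1 if absent. */
int search_counting(const int A[], int N, int Num, int *comps) {
    int index = 0;
    *comps = 0;
    while (index < N) {
        (*comps)++;                 /* comparison: A[index] < Num */
        if (!(A[index] < Num))
            break;
        index++;
    }
    if (index < N) {
        (*comps)++;                 /* comparison: A[index] == Num */
        if (A[index] == Num)
            return index;
    }
    return -1;
}
```

Searching a sorted 10-element array for its last element costs 11 = N + 1 comparisons (the worst case); finding the first element costs 2, matching the best case above.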
int search(int A[], int N, int Num) {
    int first = 0;
    int last = N - 1;
    int mid = (first + last) / 2;
    /* Test the bounds before reading A[mid], so an empty array (N == 0)
       is handled without an out-of-bounds access. */
    while ((first <= last) && (A[mid] != Num)) {
        if (A[mid] > Num)
            last = mid - 1;
        else
            first = mid + 1;
        mid = (first + last) / 2;
    }
    if ((first <= last) && (A[mid] == Num))
        return mid;
    else
        return -1;
}
One comparison after the loop; two comparisons each time through the loop. The first time through, toss half of the array (2 comparisons); the second time, half the remainder (1/4 of the original, 2 comparisons); the third time, half of that (1/8 of the original, 2 comparisons); and so on.

Loop iteration    Remaining elements
1                 N/2
2                 N/4
3                 N/8
4                 N/16
…                 …
X                 1

How long to get down to 1? Looking at the problem in reverse: how many doublings does it take to get from 1 up to N? Set N = 2^X and solve for X:
log2 N = log2 (2^X) = X
With two comparisons per iteration plus one comparison at the end, binary search takes 2·log2 N + 1 comparisons in the worst case.
Binary search is worst-case O(log2 N); sequential search is worst-case O(N).
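The logarithmic bound can be checked empirically; in this sketch the iteration-counting wrapper is mine, and each loop iteration corresponds to the (at most) two key comparisons counted above:

```c
/* Binary search that counts loop iterations. For N = 2^k, an
   unsuccessful search runs the loop at most k + 1 times. */
int bsearch_iters(const int A[], int N, int Num, int *iters) {
    int first = 0, last = N - 1;
    *iters = 0;
    while (first <= last) {
        (*iters)++;
        int mid = (first + last) / 2;
        if (A[mid] == Num)
            return mid;              /* found */
        if (A[mid] > Num)
            last = mid - 1;          /* discard right half */
        else
            first = mid + 1;         /* discard left half */
    }
    return -1;                       /* not found */
}
```

With N = 1024 = 2^10 and an absent key, the loop runs at most 11 times, i.e. roughly 2·log2 N comparisons in total, versus up to N + 1 for sequential search.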