Data Structures and Algorithms
Lecture 5 and 6
Instructor: Quratulain
Date: 15th and 18th September, 2009
Faculty of Computer Science, IBA
Algorithms
An algorithm is any well-defined procedure that takes some value, or set of values, as input and produces some value, or set of values, as output.
Input: Output:
CSE 246 Data Structures and Algorithms Fall2009 Quratulain
Describing Algorithms
A complete description of an algorithm consists of three parts:
1. the algorithm (expressed in whatever way is clearest and most concise; this can be English and/or pseudocode)
2. a proof of the algorithm’s correctness
3. a derivation of the algorithm’s running time
Correctness proof
Use a loop invariant to understand why an algorithm gives the correct answer.
Loop invariant: a loop invariant is a relation among program variables that is true when control enters a loop, remains true each time the program executes the body of the loop, and is still true when control exits the loop.
Understanding loop invariants can help us analyze programs, check for errors, and derive programs from specifications.
Correctness proof
To prove correctness with a loop invariant we need to show three things:
Initialization: the invariant is true prior to the first iteration of the loop.
Maintenance: if the invariant is true before an iteration of the loop, it remains true before the next iteration.
Termination: when the loop terminates, the invariant (usually along with the reason that the loop terminated) gives us a useful property that helps show that the algorithm is correct.
Example
    public static int sum(int a[]) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            // s is the sum of the first i array elements
            // s == a[0] + .. + a[i-1]
            s = s + a[i];
        }
        return s;
    }
The comments in the loop body describe the invariant.
The invariant expresses a relation between the three variables s, i, and a that remains true even when the values of s and i change: s is always the sum of the first i elements in a.
When control enters the loop, i is 0, so s is the sum of no array elements. When control exits the loop, i is a.length, so s is the sum of all of the array elements. This is the intended result.
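The invariant for sum can also be checked while the program runs. The sketch below is one possible way to do that, not part of the original slides: the class name LoopInvariantDemo and the helper prefixSum are hypothetical names introduced only for illustration. It recomputes the sum of the first i elements at the top of every iteration and compares it with s.

```java
class LoopInvariantDemo {
    // Hypothetical helper: sums the first 'count' elements of a.
    // It exists only so the invariant can be checked independently.
    static int prefixSum(int[] a, int count) {
        int total = 0;
        for (int k = 0; k < count; k++) {
            total += a[k];
        }
        return total;
    }

    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            // Initialization and maintenance: s == a[0] + .. + a[i-1]
            if (s != prefixSum(a, i)) {
                throw new IllegalStateException("invariant broken at i = " + i);
            }
            s = s + a[i];
        }
        // Termination: the loop stopped because i == a.length,
        // so the invariant says s is the sum of all elements.
        if (s != prefixSum(a, a.length)) {
            throw new IllegalStateException("invariant broken at exit");
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {3, 1, 4, 1, 5})); // prints 14
    }
}
```

The checks make the method much slower than the original, so they are useful for understanding and testing rather than for production code.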
Analyzing Algorithms
In this section, we examine a means of analyzing the performance of an algorithm.
Usually we are interested in the amount of memory required by an algorithm – its space complexity – and the time taken for an algorithm to run – its time complexity.
In this course we will consider only the time complexity of algorithms.
Introduction to Time Complexity
An important question is: how efficient is an algorithm or piece of code? Efficiency covers lots of resources, including:
◦ CPU (time) usage
◦ memory usage
◦ disk usage
◦ network usage
Introduction to Time Complexity
Be careful to differentiate between:
◦ Performance: how much time/memory/disk/... is actually used when a program is run. This depends on the machine, compiler, etc. as well as the code.
◦ Complexity: how the resource requirements of a program or algorithm scale, i.e., what happens as the size of the problem being solved gets larger.
Complexity affects performance but not the other way around.
Introduction to Time Complexity
To measure running time exactly, we should count the number of CPU cycles it takes to perform each operation in the algorithm.
◦ Needless to say, this would be a bit tedious and is not a very practical approach.
Instead we will make a simplification: we will count the number of times simple statements are executed.
By a simple statement we mean a statement that is executed in a constant amount of time T.
We are usually interested in the worst case: what is the maximum number of operations that might be performed for a given problem size? (The other cases are the best case and the average case.)
Analysis of Algorithms
Dilemma: you have two (or more) methods to solve a problem; how do you choose the BEST?
One approach: implement each algorithm in C and test how long each takes to run.
Problems:
◦ Different implementations may cause an algorithm to run faster/slower
◦ Some algorithms run faster on some computers
◦ Algorithms may perform differently depending on data (e.g., sorting often depends on what is being sorted)
Analysis of Algorithms
Analysis
◦ Concept of "best": what to measure
◦ Types of "best": best-, average-, worst-case
◦ Comparison methods: Big-O analysis
◦ Examples: sequential search and binary search
Analysis: A Better Approach
Idea: characterize performance in terms of key operation(s)
◦ Sorting: count the number of times two values are compared; count the number of times two values are swapped
◦ Search: count the number of times the value being searched for is compared to values in the array
◦ Recursive function: count the number of recursive calls
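As an illustration of counting key operations for sorting, here is a minimal sketch that instruments a sort with two counters. The slides do not name a sorting algorithm at this point, so bubble sort is an assumption chosen for simplicity, and the class and field names are hypothetical.

```java
class KeyOperationCounter {
    static long comparisons = 0; // times two values are compared
    static long swaps = 0;       // times two values are swapped

    // Plain bubble sort (assumed example), instrumented to count key operations.
    static void bubbleSort(int[] a) {
        comparisons = 0;
        swaps = 0;
        for (int pass = 0; pass < a.length - 1; pass++) {
            for (int j = 0; j < a.length - 1 - pass; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swaps++;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] reversed = {5, 4, 3, 2, 1};
        bubbleSort(reversed);
        // For 5 elements in reverse order: 10 comparisons and 10 swaps.
        System.out.println(comparisons + " comparisons, " + swaps + " swaps");
    }
}
```

The counts depend only on the algorithm and the input, not on the machine or compiler, which is the point of measuring key operations instead of elapsed time.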
Analysis in General
We want to comment on the “general” performance of the algorithm
◦ Measure for several examples, but what does this tell us in general?
Instead, assess performance in an abstract manner
Idea: analyze performance as the size of the problem grows
Examples:
◦ Sorting: how many comparisons for an array of size N?
◦ Searching: how many comparisons for an array of size N?
It may be difficult to discover a reasonable formula
Analysis where Results Vary
Types of analyses:
◦ Best-case: what is the fastest an algorithm can run for a problem of size N?
◦ Average-case: on average how fast does an algorithm run for a problem of size N?
◦ Worst-case: what is the longest an algorithm can run for a problem of size N?
How to Compare Formulas?
Which is better:
$31N^3 + 50N^2 + 24N + 15$
or
$3N^2 + N + 21 + 4 \cdot 3^N$
Answer depends on value of N:

N     $31N^3 + 50N^2 + 24N + 15$     $3N^2 + N + 21 + 4 \cdot 3^N$
1     120                            37
2     511                            71
3     1374                           159
4     2895                           397
5     5260                           1073
6     8655                           3051
7     13266                          8923
8     19279                          26465
9     26880                          79005
10    36255                          236527
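The table values can be reproduced with a few lines of code. This is a small sketch assuming the two formulas are 31N^3 + 50N^2 + 24N + 15 and 3N^2 + N + 21 + 4·3^N, which are the forms that match every row of the table; the class and method names are hypothetical.

```java
class CompareFormulas {
    // First formula: 31N^3 + 50N^2 + 24N + 15
    static long cubic(long n) {
        return 31 * n * n * n + 50 * n * n + 24 * n + 15;
    }

    // Second formula: 3N^2 + N + 21 + 4 * 3^N
    static long exponential(long n) {
        long pow3 = 1;
        for (long k = 0; k < n; k++) {
            pow3 *= 3; // builds 3^N by repeated multiplication
        }
        return 3 * n * n + n + 21 + 4 * pow3;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 10; n++) {
            String smaller = cubic(n) < exponential(n) ? "first" : "second";
            System.out.println(n + "\t" + cubic(n) + "\t" + exponential(n)
                    + "\tsmaller: " + smaller);
        }
    }
}
```

Up to N = 7 the second formula gives the smaller value; from N = 8 onward the first one does, so which formula is "better" depends on N.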
What Happened?
◦ One term dominated the sum

N     $3N^2 + N + 21 + 4 \cdot 3^N$     $4 \cdot 3^N$     % of total
1     37                                12                32.4
2     71                                36                50.7
3     159                               108               67.9
4     397                               324               81.6
5     1073                              972               90.6
6     3051                              2916              95.6
7     8923                              8748              98.0
8     26465                             26244             99.2
9     79005                             78732             99.7
10    236527                            236196            99.9
As N Grows, Some Terms Dominate
Function          10        100        1000         10000         100000
$\log_2 N$        3         6          9            13            16
$N$               10        100        1000         10000         100000
$N \log_2 N$      30        664        9965         $10^5$        $10^6$
$N^2$             $10^2$    $10^4$     $10^6$       $10^8$        $10^{10}$
$N^3$             $10^3$    $10^6$     $10^9$       $10^{12}$     $10^{15}$
$2^N$             $10^3$    $10^{30}$  $10^{301}$   $10^{3010}$   $10^{30103}$
Order of Magnitude Analysis
Measure speed with respect to the part of the sum that grows quickest
$31N^3 + 50N^2 + 24N + 15$
$3N^2 + N + 21 + 4 \cdot 3^N$
Ordering:
$1 < \log_2 N < N < N \log_2 N < N^2 < N^3 < 2^N < 3^N$
Order of Magnitude Analysis (cont)
Furthermore, simply ignore any constants in front of the term (for example, in $31N^3$, $4 \cdot 3^N$, or $15N \log_2 N$) and report only the general class of the term:
$31N^3 + 50N^2 + 24N + 15$ grows proportionally to $N^3$
$3N^2 + N + 21 + 4 \cdot 3^N$ grows proportionally to $3^N$
When comparing algorithms, determine formulas to count the operation(s) of interest, then compare the dominant terms of the formulas
Analyzing Running Time
T(n), or the running time of a particular algorithm on input of size n, is taken to be the number of times the instructions in the algorithm are executed. The following pseudocode algorithm illustrates the calculation of the mean (average) of a set of n numbers:

1. n = read input from user
2. sum = 0
3. i = 0
4. while i < n
5.     number = read input from user
6.     sum = sum + number
7.     i = i + 1
8. mean = sum / n

Statement    Number of times executed
1            1
2            1
3            1
4            n+1
5            n
6            n
7            n
8            1

The computing time for this algorithm in terms of input size n is: T(n) = 4n + 5.
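The count T(n) = 4n + 5 can be checked by instrumenting the algorithm. The sketch below is an illustration under two assumptions: the numbers come from an array instead of being read from the user, and the class name MeanStepCounter and the steps counter are hypothetical. Each increment of steps is labelled with the pseudocode statement it stands for.

```java
class MeanStepCounter {
    static long steps; // number of statement executions in the last call

    static double mean(int[] input) {
        steps = 0;
        int n = input.length;      steps++; // statement 1
        double sum = 0;            steps++; // statement 2
        int i = 0;                 steps++; // statement 3
        while (true) {
            steps++;                        // statement 4: the test runs n + 1 times
            if (!(i < n)) {
                break;
            }
            int number = input[i]; steps++; // statement 5
            sum = sum + number;    steps++; // statement 6
            i = i + 1;             steps++; // statement 7
        }
        double mean = sum / n;     steps++; // statement 8
        return mean;
    }

    public static void main(String[] args) {
        double m = mean(new int[] {2, 4, 6, 8});
        // For n = 4 the count is 4 * 4 + 5 = 21.
        System.out.println("mean = " + m + ", steps = " + steps);
    }
}
```

The loop test is counted once more than the loop body because the final, failing test still executes; that is where the n + 1 in the table comes from.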
Big-O Notation
Computer scientists like to categorize algorithms using Big-O notation.
If you tell a computer scientist that they can choose between two algorithms:
◦ one whose time complexity is $O(N)$
◦ another that is $O(N^2)$,
then the $O(N)$ algorithm will likely be chosen. Let’s find out why…
Big-O Notation
Let’s consider a function of N, T(N). N is a measure of the size of the problem we are trying to solve.
Definition: T(N) is O(g(N)) if there exist positive constants $C$ and $N_0$ such that $T(N) \le C \cdot g(N)$ for all $N \ge N_0$.
So, if T(N) is O(g(N)) then T(N) grows no faster than g(N).
Example 1
Suppose, for example, that T is the time taken to run an algorithm to process data in an array of length N. We expect that T will be a function of N and hence write T = T(N).
If we now say that T(N) is O(N), we are saying that T(N) grows no faster than a constant multiple of N.
This is good news: it tells us that if we double the size of the array, we can expect that the time taken to run the algorithm will at most double (roughly speaking).
Example
Suppose $f(n) = n^2 + 3n - 1$. We want to show that $f(n) = O(n^2)$.
$f(n) = n^2 + 3n - 1$
$< n^2 + 3n$ (subtraction makes things smaller so drop it)
$\le n^2 + 3n^2$ (since $n \le n^2$ for all integers n)
$= 4n^2$
Therefore, if C = 4, we have shown that $f(n) = O(n^2)$. Notice that all we are doing is finding a simple function that is an upper bound on the original function. Because of this, we could also say that
$f(n) = O(n^3)$ since $n^3$ is an upper bound on $n^2$
This would be a much weaker description, but it is still valid.
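The bound with C = 4 can be spot-checked numerically. The following sketch tests f(n) <= C * n^2 for n from 1 up to a chosen limit; a finite check like this is only a sanity test and does not replace the algebraic argument above. The class and method names are hypothetical.

```java
class UpperBoundCheck {
    // f(n) = n^2 + 3n - 1, the function from the example
    static long f(long n) {
        return n * n + 3 * n - 1;
    }

    // Returns true if f(n) <= c * n^2 for every n in [1, maxN].
    static boolean boundedBy(long c, long maxN) {
        for (long n = 1; n <= maxN; n++) {
            if (f(n) > c * n * n) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundedBy(4, 1000)); // prints true
        System.out.println(boundedBy(1, 1000)); // prints false, since f(1) = 3 > 1
    }
}
```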
Example 1: Searching Sorted Array
Algorithm 1: Sequential Search
    int search(int A[], int N, int Num) {
        int index = 0;
        while ((index < N) && (A[index] < Num))
            index++;
        if ((index < N) && (A[index] == Num))
            return index;
        else
            return -1;
    }
Analyzing Search Algorithm 1
Operations to count: how many times Num is compared to a member of the array
One comparison after the loop each time, plus ...
Best-case: find the number we are looking for at the first position in the array (1 + 1 = 2 comparisons) O(1)
Average-case: find the number on average half-way down the array (sometimes longer, sometimes shorter) (N/2 + 1 comparisons) O(N)
Worst-case: have to compare Num to every element in the array (N + 1 comparisons) O(N)
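The best-case and worst-case counts can be confirmed by instrumenting the sequential search. The sketch below restructures the loop slightly so that each comparison of Num with an array element is counted explicitly; the class name and the comparisons counter are hypothetical, and the search logic is intended to behave like Algorithm 1.

```java
class SequentialSearchCounter {
    static int comparisons; // comparisons of num against array elements

    static int search(int[] a, int n, int num) {
        comparisons = 0;
        int index = 0;
        while (index < n) {
            comparisons++;              // counts the test a[index] < num
            if (!(a[index] < num)) {
                break;
            }
            index++;
        }
        if (index < n) {
            comparisons++;              // counts the test a[index] == num
            if (a[index] == num) {
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40, 50};
        search(a, a.length, 10);
        System.out.println("best case: " + comparisons);  // prints best case: 2
        search(a, a.length, 50);
        System.out.println("worst case: " + comparisons); // prints worst case: 6
    }
}
```

With N = 5 the worst case gives N + 1 = 6 comparisons, matching the analysis above.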
Search Algorithm 2: Binary Search
    int search(int A[], int N, int Num) {
        int first = 0;
        int last = N - 1;
        int mid = (first + last) / 2;
        // Test first <= last before reading A[mid], so an empty array is never indexed
        while ((first <= last) && (A[mid] != Num)) {
            if (A[mid] > Num)
                last = mid - 1;
            else
                first = mid + 1;
            mid = (first + last) / 2;
        }
        if ((first <= last) && (A[mid] == Num))
            return mid;
        else
            return -1;
    }
Analyzing Binary Search
One comparison after loop
First time through loop, toss half of array (2 comps)
Second time, half remainder (1/4 original) 2 comps
Third time, half remainder (1/8 original) 2 comps
…
Loop Iteration    Remaining Elements
1                 N/2
2                 N/4
3                 N/8
4                 N/16
…
??                1
How long to get to 1?
Analyzing Binary Search (cont)
Looking at the problem in reverse, how long to double the number 1 until we get to N?
$N = 2^X$ and solve for X:
$\log_2 N = \log_2(2^X) = X$
Two comparisons for each iteration, plus one comparison at the end: binary search takes about $2\log_2 N + 1$ comparisons in the worst case
Binary search is worst-case $O(\log_2 N)$
Sequential search is worst-case $O(N)$
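To see the logarithmic behaviour concretely, the sketch below counts how many times a binary search halves the remaining range in the worst case. It is an illustration, not the code from the slides: the midpoint is computed inside the loop, and the class, counter, and helper names are hypothetical. For an array of N elements the worst case is floor(log2 N) + 1 halvings, and each halving costs at most two comparisons of Num against an array element.

```java
class BinarySearchCounter {
    static int iterations; // halving steps performed by the last search

    static int search(int[] a, int n, int num) {
        iterations = 0;
        int first = 0;
        int last = n - 1;
        while (first <= last) {
            int mid = first + (last - first) / 2;
            if (a[mid] == num) {
                return mid;
            }
            iterations++; // the remaining range is about to be halved
            if (a[mid] > num) {
                last = mid - 1;
            } else {
                first = mid + 1;
            }
        }
        return -1;
    }

    // floor(log2(n)) for n >= 1
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    // Tries every present and absent target and returns the largest iteration count.
    static int worstCaseIterations(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = 2 * i; // sorted even numbers, so odd targets are absent
        }
        int worst = 0;
        for (int target = -1; target <= 2 * n; target++) {
            search(a, n, target);
            worst = Math.max(worst, iterations);
        }
        return worst;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 1000}) {
            System.out.println("N = " + n + ": worst case " + worstCaseIterations(n)
                    + " iterations, floor(log2 N) + 1 = " + (floorLog2(n) + 1));
        }
    }
}
```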
Big-Oh Operations
Summation Rule
Suppose T1(n) = O(f1(n)) and T2(n) = O(f2(n)). Further, suppose that f2 grows no faster than f1, i.e., f2(n) = O(f1(n)). Then, we can conclude that T1(n) + T2(n) = O(f1(n)). More generally, the summation rule tells us O(f1(n) + f2(n)) = O(max(f1(n), f2(n))).
Proof:
Suppose that C and C' are constants such that T1(n) <= C * f1(n) and T2(n) <= C' * f2(n). Let D = the larger of C and C'. Then,
T1(n) + T2(n) <= C * f1(n) + C' * f2(n)
<= D * f1(n) + D * f2(n)
= D * (f1(n) + f2(n))
so T1(n) + T2(n) = O(f1(n) + f2(n)).
Big-Oh Operations
Product Rule
Suppose T1(n) = O(f1(n)) and T2(n) = O(f2(n)). Then, we can conclude that T1(n) * T2(n) = O(f1(n) * f2(n)).
The Product Rule can be proven using a strategy similar to that of the Summation Rule proof.
Analyzing Some Simple Programs (with No Sub-program Calls)
General Rules:
• All basic statements (assignments, reads, writes, conditional testing, library calls) run in constant time: O(1).
• The time to execute a loop is the sum, over all times around the loop, of the time to execute all the statements in the loop, plus the time to evaluate the condition for termination. Evaluation of basic termination conditions is O(1) in each iteration of the loop.
• The complexity of an algorithm is determined by the complexity of the most frequently executed statements. If one set of statements has a running time of $O(n^3)$ and the rest are $O(n)$, then the complexity of the algorithm is $O(n^3)$. This is a result of the Summation Rule.
Question
Suppose you have two algorithms (Algorithm A and Algorithm B) and that both algorithms are O(N). Does this mean that they both take the same amount of time to run?
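One way to explore the question is to count steps for two loops that are both linear in N. The two methods below are hypothetical stand-ins for Algorithm A and Algorithm B, made up for this sketch: both are O(N), but B performs five counted steps per element where A performs one.

```java
class SameBigODifferentCost {
    // Hypothetical Algorithm A: one counted step per element.
    static long stepsA(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            steps++;
        }
        return steps;
    }

    // Hypothetical Algorithm B: five counted steps per element.
    static long stepsB(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 5; k++) {
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000}) {
            System.out.println("N = " + n + ": A = " + stepsA(n) + ", B = " + stepsB(n));
        }
    }
}
```

Doubling N doubles both counts, which is what O(N) describes, yet B always does five times the work of A. Big-O hides the constant factor, so equal Big-O classes do not imply equal running times.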
Example
We will now look at various Java code segments and analyze the time complexity of each:
    int count = 0;
    int sum = 0;
    while (count < N) {
        sum += count;
        System.out.println(sum);
        count++;
    }
Total number of constant time statements executed...?
Example
    int count = 0;
    int sum = 0;
    while (count < N) {
        int index = 0;
        while (index < N) {
            sum += index * count;
            System.out.println(sum);
            index++;
        }
        count++;
    }
Total number of constant time statements executed ….?
Example
    int count = 0;
    int sum = 0;  // declaration added; the other segments declare sum the same way
    while (count < N) {
        int index = 0;
        while (index < count + 1) {
            sum++;
            index++;
        }
        count++;
    }
Total number of constant time statements executed…?
Example
    int count = N;
    int sum = 0;
    while (count > 0) {
        sum += count;
        count = count / 2;
    }
Total number of constant time statements executed…?
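A practical way to check an answer to these questions is to count how often the innermost loop body runs in each of the four segments. The sketch below mirrors the loop structure of each segment but replaces the body with a counter; the class and method names are hypothetical. The counts it reports are N, N * N, N(N + 1)/2, and floor(log2 N) + 1 respectively, so the segments are O(N), O(N^2), O(N^2), and O(log N).

```java
class LoopIterationCounts {
    // Segment 1: a single loop from 0 to N - 1.
    static long singleLoop(int n) {
        long body = 0;
        int count = 0;
        while (count < n) {
            body++;
            count++;
        }
        return body;
    }

    // Segment 2: the inner loop always runs N times.
    static long squareLoop(int n) {
        long body = 0;
        int count = 0;
        while (count < n) {
            int index = 0;
            while (index < n) {
                body++;
                index++;
            }
            count++;
        }
        return body;
    }

    // Segment 3: the inner loop runs count + 1 times.
    static long triangularLoop(int n) {
        long body = 0;
        int count = 0;
        while (count < n) {
            int index = 0;
            while (index < count + 1) {
                body++;
                index++;
            }
            count++;
        }
        return body;
    }

    // Segment 4: count is halved until it reaches 0.
    static long halvingLoop(int n) {
        long body = 0;
        int count = n;
        while (count > 0) {
            body++;
            count = count / 2;
        }
        return body;
    }

    public static void main(String[] args) {
        int n = 8;
        System.out.println(singleLoop(n));     // prints 8
        System.out.println(squareLoop(n));     // prints 64
        System.out.println(triangularLoop(n)); // prints 36
        System.out.println(halvingLoop(n));    // prints 4
    }
}
```

Each body execution stands for a fixed number of constant-time statements, so the total statement count is a constant multiple of these values plus lower-order terms for the loop tests.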