data structures and algorithms lecture 5 and 6 instructor: quratulain date: 15 th and 18 th...

36
Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

Upload: anthony-cross

Post on 29-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

Data Structures and Algorithms

Lecture 5 and 6

Instructor: Quratulain

Date: 15th and 18th September, 2009

Faculty of Computer Science, IBA

Page 2: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

2

AlgorithmsAn algorithm is any well-defined

procedure that take some value, or set values and produce output.

Input: Output:

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 3: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

3

Describing AlgorithmsA complete description of an

algorithm consists of three parts:

1.the algorithm(expressed in whatever way is clearest and most concise, can be English and / or pseudocode)

2.a proof of the algorithm’s correctness

3.a derivation of the algorithm’s running time

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 4: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

4

Correctness proofUse a loop invariant to understand why an

algorithm gives the correct answer.

Loop invariant A loop invariant is a relation among program

variables that is true when control enters a loop, remains true each time the program executes the body of the loop, and is still true when control exits the loop.

Understanding loop invariants can help us analyze programs, check for errors, and derive programs from specifications.

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 5: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

5

Correctness proof To proof correctness with a loop invariant we need to

show three things:

InitializationInvariant is true prior to the first iteration of the loop.

MaintenanceIf the invariant is true before an iteration of the loop, it remains true before the next iteration.

TerminationWhen the loop terminates, the invariant (usually along with the reason that the loop terminated) gives us a useful property that helps show that the algorithm is correct.

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 6: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

6

Examplepublic static int sum(int a[]) { int s = 0; for (int i = 0; i < a.length; i++) { // s is the sum of the first i array elements // s == a[0] + .. + a[i-1] The comments in the

loop body describe the invariant. s = s + a[i]; } return s; } The invariant expresses a relation between the three variables

s, i, and a that remains true even when the values of s and i change: s is always the sum of the first i elements in a.

When control enters the loop, i is 0 so s is the sum of no array elements. When control exits the loop, i is a.length, so s is the sum of all of the array elements. This is the intended result. CSE 246 Data Structures and Algorithms Fall2009

Quratulain

Page 7: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

7

Analyzing AlgorithmIn this section, we examine a means of

analyzing the performance of an algorithm.

Usually we are interested in the amount of memory required by an algorithm – its space complexity – and the time taken for an algorithm to run – its time complexity.

In this course we will consider only the time complexity of algorithms.

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 8: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

8

Introduction to Time Complexity

An important question is: How efficient is an algorithm or piece of code? Efficiency covers lots of resources, including:

CPU (time) usage memory usage disk usage network usage

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 9: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

9

Introduction to Time Complexity

Be careful to differentiate between:◦ Performance: how much

time/memory/disk/... is actually used when a program is run. This depends on the machine, compiler, etc. as well as the code.

◦ Complexity: how do the resource requirements of a program or algorithm scale, i.e., what happens as the size of the problem being solved gets larger.

Complexity affects performance but not the other way around.CSE 246 Data Structures and Algorithms Fall2009

Quratulain

Page 10: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

10

Introduction to Time ComplexityTo do this exactly, we should count the

number of CPU cycles it takes to perform each operation in the algorithm.◦ Needless to say, this would be a bit tedious

and is not a very practical approach.

Instead we will make a simplification. We will count the number of times simple statements are executed.

By simple statement we mean a statement that is executed in an amount of time T.

we are usually interested in the worst case: what are the max operations that might be performed for a given problem size (other cases -- best case and average case)

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 11: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

11

Analysis of AlgorithmsDilemma: you have two (or more)

methods to solve problem, how to choose the BEST?

One approach: implement each algorithm in C, test how long each takes to run.

Problems:◦ Different implementations may cause an

algorithm to run faster/slower◦ Some algorithms run faster on some

computers◦ Algorithms may perform differently

depending on data (e.g., sorting often depends on what is being sorted)

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 12: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

12

Analysis of AlgorithmsAnalysis

Concept of "best"What to measure

Types of "best”best-, average-, worst-case

Comparison methodsBig-O analysis

ExamplesSequential search and binary search

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 13: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

13

Analysis: A Better ApproachIdea: characterize performance in terms

of key operation(s)◦ Sorting:

count number of times two values compared count number of times two values swapped

◦ Search: count number of times value being searched for

is compared to values in array

◦ Recursive function: count number of recursive calls

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 14: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

14

Analysis in GeneralWant to comment on the “general”

performance of the algorithm◦ Measure for several examples, but what

does this tell us in general?

Instead, assess performance in an abstract manner

Idea: analyze performance as size of problem grows

Examples:◦ Sorting: how many comparisons for array

of size N?◦ Searching: #comparisons for array of size

N

May be difficult to discover a reasonable formula

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 15: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

15

Analysis where Results VaryTypes of analyses:

◦ Best-case: what is the fastest an algorithm can run for a problem of size N?

◦ Average-case: on average how fast does an algorithm run for a problem of size N?

◦ Worst-case: what is the longest an algorithm can run for a problem of size N?

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 16: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

16

How to Compare Formulas?Which is better:

or

Answer depends on value of N:N1 120 372 511 713 1374 1594 2895 3975 5260 10736 8655 30517 13266 89238 19279 264659 26880 79005

10 36255 236527

15243150 32 NNNNNN 34213 2

15243150 32 NNN NNN 34213 2

Page 17: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

17

What Happened?

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

N1 37 12 32.42 71 36 50.73 159 108 67.94 397 324 81.65 1073 972 90.66 3051 2916 95.67 8923 8748 98.08 26465 26244 99.29 79005 78732 99.710 236527 236196 99.9

◦ One term dominated the sum

NNN 34213 2 N34 ofTotal%

Page 18: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

18

As N Grows, Some Terms Dominate

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Function 10 100 1000 10000 100000

N2log 3 6 9 13 16

N 10 100 1000 10000 100000

NN 2log 30 664 9965 510 610

2N 210 410 610 810 10103N 310 610 910 1210 1510N2 310 3010 30110 301010 3010310

Page 19: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

19

Order of Magnitude Analysis

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Measure speed with respect to the part of the sum that grows quickest

Ordering:

15243150 32 NNN

NNN 34213 2

NNNNNNNN 32loglog1 3222

Page 20: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

20

Order of Magnitude Analysis (cont)

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Furthermore, simply ignore any constants in front of term and simply report general class of the term:

grows proportionally to

grows proportionally toWhen comparing algorithms, determine

formulas to count operation(s) of interest, then compare dominant terms of formulas

331N N34 NN 2log15

15243150 32 NNN

NNN 34213 2

3N

N3

Page 21: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

21

Analyzing Running Time

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

1. n = read input from user2. sum = 03. i = 04. while i < n 5. number = read input from user6. sum = sum + number7. i = i + 18. mean = sum / n

T(n), or the running time of a particular algorithm on input of size n, is taken to be the number of times the instructions in the algorithm are executed. Pseudo code algorithm illustrates the calculation of the mean (average) of a set of n numbers:

Statement Number of times executed1 12 13 14 n+15 n6 n7 n8 1

The computing time for this algorithm in terms on input size n is: T(n) = 4n + 5.

Page 22: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

22

Big-O Notation Computer scientists like to

categorize algorithms using Big-O notation.

If you tell a computer scientist that they can choose between two algorithms:◦one whose time complexity is O( N )◦another that is O( N2 ),

then the O( N ) algorithm will likely be chosen. Let’s find out why…CSE 246 Data Structures and Algorithms Fall2009

Quratulain

Page 23: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

23

Big-O Notation Let’s consider a function on N, T(N) N is a

measure of the size of the problem we try to solve, e.g., Definition: T(N) is O( g(N) ) if: So, if T(N) is O( g(N) ) then T(N) grows no faster than g(N).

Example 1Suppose, for example, that T is the time

taken to run an algorithm to process data in an array of length N. We expect that T will be a function of N and hence write:

If we now say that T(N) is O(N) we are saying that T(N) grows no faster than:

This is good news – it tells us that if we double the size of the array, we can expect that the time taken to run the algorithm will double (roughly speaking).

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 24: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

24

Example

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Suppose f(n) = n2 + 3n - 1. We want to show that f(n) = O(n2). f(n) = n2 + 3n - 1

< n2 + 3n (subtraction makes things smaller so drop it) <= n2 + 3n2 (since n <= n2 for all integers n)

= 4n2

Therefore, if C = 4, we have shown that f(n) = O(n2). Notice that all we are doing is finding a simple function that is an upper bound on the original function. Because of this, we could also say that

This would be a much weaker description, but it is still valid.

f(n) = O(n3) since (n3) is an upper bound on n2

Page 25: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

25

Example 1 Searching Sorted ArrayAlgorithm 1: Sequential Search

int search(int A[], int N, int Num) { int index = 0;

while ((index < N) && (A[index] < Num)) index++;

if ((index < N) && (A[index] == Num)) return index; else return -1;}

Page 26: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

26

Analyzing Search Algorithm 1Operations to count: how many times Num is

compared to member of arrayOne after the loop each time plus ...Best-case: find the number we are looking for at the

first position in the array (1 + 1 = 2 comparisons) O(1)

Average-case: find the number on average half-way down the array (sometimes longer, sometimes shorter)

(N/2+1 comparisons) O(N)

Worst-case: have to compare Num to very element in the array (N + 1 comparisons) O(N)

Page 27: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

27

Search Algorithm 2: Binary Searchint search(int A[], int N, int Num) { int first = 0; int last = N - 1; int mid = (first + last) / 2; while ((A[mid] != Num) && (first <= last)) {

if (A[mid] > Num) last = mid - 1; else first = mid + 1; mid = (first + last) / 2; } if (A[mid] == Num) return mid; else return -1;}

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

Page 28: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

28

Analyzing Binary SearchOne comparison after loopFirst time through loop, toss half of array (2

comps)Second time, half remainder (1/4 original) 2

compsThird time, half remainder (1/8 original) 2 comps…Loop Iteration Remaining Elements

1 N/2 2 N/4 3 N/8 4 N/16… ?? 1How long to get to 1?

Page 29: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

29

Analyzing Binary Search (cont)

Looking at the problem in reverse, how long to double the number 1 until we get to N?

and solve for X

two comparisons for each iteration, plus one comparison at the end -- binary search takes in the worst case

Binary search is worst-caseSequential search is worst-case

XN 2XN X )2(loglog 22

)(log2 NO)(NO

Page 30: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

30

Big-Oh Operations

Summation RuleSuppose T1(n) = O(f1(n)) and T2(n) = O(f2(n)). Further, suppose that f2 grows no faster than f1, i.e., f2(n) = O(f1(n)). Then, we can conclude that T1(n) + T2(n) = O(f1(n)). More generally, the summation rule tells us O(f1(n) + f2(n)) = O(max(f1(n), f2(n))).

Proof:

Suppose that C and C' are constants such that T1(n) <= C * f1(n) and T2(n) <= C' * f2(n). Let D = the larger of C and C'. Then,

T1(n) + T2(n) <= C * f1(n) + C' * f2(n)<= D * f1(n) + D * f2(n)<= D * (f1(n) + f2(n))<= O(f1(n) + f2(n))

Page 31: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

CSE 246 Data Structures and Algorithms Fall2009 Quratulain

31

Big-Oh Operations

Product RuleSuppose T1(n) = O(f1(n)) and T2(n) = O(f2(n)). Then, we can conclude thatT1(n) * T2(n) = O(f1(n) * f2(n)).

The Product Rule can be proven using a similar strategy as the Summation Rule proof.

Analyzing Some Simple Programs (with No Sub-program Calls)

General Rules:

• All basic statements (assignments, reads, writes, conditional testing, library calls) run in constant time: O(1).

• The time to execute a loop is the sum, over all times around the loop, of the time to execute all the statements in the loop, plus the time to evaluate the condition for termination. Evaluation of basic termination conditions is O(1) in each iteration of the loop.

• The complexity of an algorithm is determined by the complexity of the most frequently executed statements. If one set of statements have a running time of O(n3) and the rest are O(n), then the complexity of the algorithm is O(n3). This is a result of the Summation Rule.

Page 32: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

QuestionSuppose you have two

algorithms (Algorithm A and Algorithm B) and that both algorithms are O(N). Does this mean that they both take the same amount of time to run?

Page 33: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

ExampleWe will now look at various Java code segments

and analyze the time complexity of each:

int count = 0;int sum = 0;while( count < N ){sum += count;System.out.println( sum );count++;}

Total number of constant time statements executed...?

Page 34: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

Exampleint count = 0;int sum = 0;

while( count < N ){int index = 0;while( index < N ){sum += index * count;System.out.println( sum );index++;}count++;}

Total number of constant time statements executed ….?

Page 35: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

Exampleint count = 0;

while( count < N ){int index = 0;while( index < count + 1 ){sum++;index++;}count++;}

Total number of constant time statements executed…?

Page 36: Data Structures and Algorithms Lecture 5 and 6 Instructor: Quratulain Date: 15 th and 18 th September, 2009 Faculty of Computer Science, IBA

Example

int count = N;int sum = 0;

while( count > 0 ){sum += count;count = count / 2;}

Total number of constant time statements executed…?