
Object (Data and Algorithm) Analysis

Cmput 115 - Lecture 5

Department of Computing Science

University of Alberta

©Duane Szafron 1999

Some code in this lecture is based on code from the book: Java Structures by Duane A. Bailey, or the companion structure package.

Revised 12/31/99


About This Lecture

In this lecture we will learn how to compare the relative space and time efficiencies of different software solutions.


Outline

Abstraction

Implementation choice criteria

Time and space complexity

Asymptotic analysis

Worst, best and average case complexity


Abstraction in Software Systems

A software system is an abstract model of a real-world phenomenon.

It is abstract since some of the properties of the real world are ignored in the system.

For example, a banking system does not usually model the height and weight of its customers, even though it represents its customers in the software.

The Customer class is only an abstraction of a real customer, with details missing.


Design Choices

When a software system is designed, the designer has many choices about what abstractions to make.

For example, in a banking system, the designer can choose to design a Customer object that knows its client number and password.

Alternately, each Customer object could know its BankCard object which knows its client number and password.


Implementation Choices

After a software system is designed, the programmer has many choices to make when implementing classes.

For example, in a banking system, should an Account object remember an Array of Transactions or a Vector of Transactions?

For example, should a sort method use a selection sort or a merge sort?

In this course, we will focus on choices at the class and method level.


Choice Criteria

What criteria should be used to make a choice at the implementation level?

Historically, these two criteria have been considered the most important:

– The amount of space that an object uses and the amount of temporary storage used by its methods.

– The amount of time that a method takes to run.


Other Choice Criteria

However, we should also consider:
– The simplicity of an object representation.
– The simplicity of an algorithm that is implemented as a method.

Why? These criteria are important because they directly affect:
– The amount of time it takes to implement a class.
– The number of defects (bugs) in classes.
– The amount of time it takes to update a class when requirements change.


Comparing Methods

What is better, a selection sort or a merge sort?

– We can implement each sort as a method.
– We can count the amount of storage space each method requires for sample data.
– We can time the two methods on sample data and see which one is faster.
– We could time how long it takes a programmer to write the code for each method.
– We can count the defects in the two implementations.


Comparing Average Methods

Unfortunately, our answers would depend on the sample data and the individual programmer chosen.

To circumvent this problem, we could compute averages over many sample data sets and over many programmers.

This would be a long and expensive process.

We need a way to predict the answers by analysis rather than experimentation.


Time and Space Complexity

The minimum time and space requirements of a method can be determined from the algorithm.

Since these values depend on the data size, we often describe them as a function of the data size (referred to as n).

These functions are called, respectively, the time complexity, time(n), and the space complexity, space(n), of the algorithm.


Space Complexity

It is easy to compute the space complexity for many algorithms:

– A selection sort of one-word elements has space complexity: space(n) = n + 1 words
  • It takes n words to store the elements.
  • It takes one extra word for swapping elements.

– A merge sort of one-word elements has space complexity: space(n) = 2n words
  • It takes n words to store the elements.
  • It takes n words for swapping elements during merges.


Time Complexity - Operations

The time complexity of an algorithm is more difficult to compute:
– The time depends on the number and kind of machine instructions that are generated.
– The time depends on the time it takes to run the machine instructions.

One approach is to pick some basic operation and count these operations:
– For example, if we count addition operations in an algorithm that adds the elements of a list, the time complexity for a list with n elements is n.
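This counting approach can be sketched in Java (the class and counter names below are illustrative, not from the lecture's structure package):

```java
// Counts the basic operation (addition) performed while summing a list.
// Illustrative sketch: names are assumptions, not the lecture's code.
public class OperationCount {
    public static int additions; // number of element additions performed

    public static int sum(int[] list) {
        additions = 0;
        int total = 0;
        for (int i = 0; i < list.length; i++) {
            total = total + list[i];
            additions++; // one basic operation per element
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {42, 23, 74, 11};
        System.out.println(sum(data));   // 150
        System.out.println(additions);   // 4 = n, the count of basic operations
    }
}
```

For a list with n elements the counter always ends at n, matching the time complexity stated above.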


Data Size is Important

If we are sorting 100 values, it probably doesn’t matter which sort we use:
– The time and space complexities are irrelevant.
– However, the simplicity may be a factor.

If we are sorting 20,000 values, it may matter (SS = selection sort, MS = merge sort):
– 1 sec (SS) versus 16 sec (MS) on current hardware.
– 20,001 words (SS) versus 40,000 words (MS).
– The simplicity difference is still the same.

If we are sorting 1,000,000 values, it does matter:
  n^2 = 1,000,000,000,000
  n*log2(n) = 1,000,000 * 20 = 20,000,000


Asymptotic Analysis

We are often only interested in the time and space complexities for large data sizes, n.

That is, we are interested in the asymptotic behavior of time(n) and space(n) in the limit as n goes to infinity.

A function f(n) is O(g(n)) ("big-O") if and only if:

  limit as n → ∞ of f(n)/g(n) = c, where c is a constant

If you are uncomfortable with limits, the textbook has an alternate definition.
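As a concrete check of the definition (the function and numbers are my own, not from the slides), f(n) = 3n + 1 is O(n) because the ratio f(n)/n approaches the constant c = 3 as n grows:

```java
// Numerically evaluates f(n)/g(n) for f(n) = 3n + 1 and g(n) = n.
// The ratio approaches the constant c = 3 as n grows, so f(n) is O(n).
public class BigOCheck {
    public static double ratio(long n) {
        return (3.0 * n + 1.0) / n;
    }

    public static void main(String[] args) {
        System.out.println(ratio(10));       // 3.1
        System.out.println(ratio(1000000));  // ≈ 3.000001
    }
}
```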


Constant Time - O(1) Algorithms

The time to assign a value to an array element is constant:

  myArray[0] = 1;

– It takes a constant time, say c1.
– time(n) is O(1) since limit as n → ∞ of c1/1 = c1.

The time to add two integers stored in memory is constant:

  first + second

– Accessing each int takes a constant time, say c1.
– Adding the two ints takes a constant time, say c2.
– time(n) is O(1) since limit as n → ∞ of (c1 + c2)/1 = c1 + c2.


Linear Time - O(n) Algorithms

The time to add the values in an array is linear:

  sum = 0;
  for (i = 0; i < list.length; i++)
      sum = sum + list[i];

– Initializing the sum and index to zero takes c1.
– Each addition and store takes c2.
– Each loop index increment and comparison to the stop index takes c3.
– time(n) is O(n) since limit as n → ∞ of (c1 + n*c2 + n*c3)/n = c2 + c3.


Polynomial Time - O(n^c) Algorithms

The time to add up the elements in a square matrix of size n is quadratic (n^c with c = 2):

  sum = 0;
  for (row = 0; row < aMatrix.height(); row++)
      for (column = 0; column < aMatrix.height(); column++)
          sum = sum + aMatrix.elementAt(row, column);

– Initializing the running sum to zero takes c1.
– Each addition takes c2.
– Each outer loop index modification takes c3.
– Each inner loop index modification takes c3.
– time(n) is O(n^2) since limit as n → ∞ of (c1 + n^2*c2 + (n + n^2)*c3)/n^2 = c2 + c3.


Exponential Time - O(c^n) Algorithms

The time to print each element of a full binary tree of depth n is exponential (c^n with c = 2); the number of nodes in the tree is 2^n - 1:

  System.out.println(aNode.value());
  leftNode = aNode.left();
  if (leftNode != null) leftNode.print();
  rightNode = aNode.right();
  if (rightNode != null) rightNode.print();

– Each print takes c1.
– Each left node and right node access takes c2.
– Each stop check takes c3.
– time(n) is O(2^n) since limit as n → ∞ of (c1*(2^n - 1) + c2*(2^n - 2) + c3*(2^n - 2))/2^n = c1 + c2 + c3.
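The fragment above assumes a node object that answers value(), left(), right() and print(). A minimal self-contained sketch (the Node class below is an assumption, not the structure package's API):

```java
// Minimal binary tree node supporting the slide's recursive print.
// Names (Node, value(), left(), right(), print()) are illustrative.
public class Node {
    private final int value;
    private final Node left, right;

    public Node(int value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    public int value() { return value; }
    public Node left() { return left; }
    public Node right() { return right; }

    // Visits every node once; a full tree of depth n has 2^n - 1 nodes,
    // so printing the whole tree is O(2^n).
    public void print() {
        System.out.println(value());
        if (left() != null) left().print();
        if (right() != null) right().print();
    }

    // Counts the nodes in the tree rooted here (2^n - 1 for a full tree).
    public int count() {
        int total = 1;
        if (left != null) total += left.count();
        if (right != null) total += right.count();
        return total;
    }

    public static void main(String[] args) {
        // Full tree of depth 2: 2^2 - 1 = 3 nodes.
        Node root = new Node(1, new Node(2, null, null), new Node(3, null, null));
        root.print(); // prints 1, 2, 3
    }
}
```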


Variable Time Algorithms

The time complexity of many algorithms depends not only on the size of the data (n), but also on the specific data values:

– The time to find a key value in an indexed container depends on the location of the key in the container.

– The time to sort a container depends on the initial order of the elements.


Worst, Best and Average Complexity

We define three common cases.

– The best case complexity is the smallest value for any problem of size n.

– The worst case complexity is the largest value for any problem of size n.

– The average case complexity is the weighted average of all problems of size n, where the weight of a problem is the probability that the problem occurs.


Worst Case Complexity Example

Assume we are doing a sequential search for the key 35 on a container of size n:

  Index: 0   1   2   3   4   5   6   7   8   9
  Value: 25  50  10  95  75  30  70  55  60  80

The worst case is when the element is not in the container: the key 35 is compared against every element.

The worst case complexity is O(n).


Best Case Complexity Example

Assume we are doing a sequential search for the key 25 on a container of size n:

  Index: 0   1   2   3   4   5   6   7   8   9
  Value: 25  50  10  95  75  30  70  55  60  80

The best case is when the element is the first one: the key 25 is found after a single comparison.

The best case complexity is O(1).
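Both cases can be seen in a sequential search method such as this sketch (the class and the name indexOf are assumptions, not the lecture's code):

```java
// Sequential search: scan left to right until the key is found.
// Worst case (key absent): n comparisons, O(n).
// Best case (key at index 0): 1 comparison, O(1).
public class SequentialSearch {
    public static int indexOf(int[] values, int key) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == key) return i; // found after i + 1 comparisons
        }
        return -1; // not found: all n elements were examined
    }

    public static void main(String[] args) {
        int[] values = {25, 50, 10, 95, 75, 30, 70, 55, 60, 80};
        System.out.println(indexOf(values, 35)); // -1: worst case, key absent
        System.out.println(indexOf(values, 25)); //  0: best case, key first
    }
}
```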


Average Case Complexity Example

To compute the average case complexity, we must make assumptions about the probability distribution of the data sets:
– Assume a sequential search on a container of size n.
– Assume a 50% chance that the key is in the container.
– Assume equal probabilities that the key is at any position.

The probability that the key is at any particular location in the container is (1/2) * (1/n) = 1/(2n).

The average case complexity is:

  1*(1/(2n)) + 2*(1/(2n)) + … + n*(1/(2n)) + n*(1/2)
  = (1/(2n))*(1 + 2 + … + n) + n/2
  = (1/(2n))*(n*(n+1)/2) + n/2
  = (n+1)/4 + n/2
  = (3n+1)/4
  = O(n)
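The closed form (3n+1)/4 can be double-checked by summing the weighted cases directly, under the same assumptions (50% hit rate, equally likely positions); the class below is a sketch, not part of the lecture:

```java
// Averages sequential-search comparison counts under the slide's model:
// probability 1/(2n) that the key sits at position i (costing i comparisons
// for i = 1..n), plus probability 1/2 that the key is absent (costing n).
public class AverageCase {
    public static double averageComparisons(int n) {
        double avg = 0.0;
        for (int i = 1; i <= n; i++) {
            avg += i * (1.0 / (2.0 * n)); // key found at position i
        }
        avg += n * 0.5; // key not in container: all n elements examined
        return avg;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println(averageComparisons(n));  // ≈ 75.25
        System.out.println((3.0 * n + 1.0) / 4.0);  // 75.25, the closed form
    }
}
```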


Trading Time and Space

It is often possible to construct two different algorithms, where one is faster but takes more space.

For example:
– A selection sort has worst case time complexity O(n^2) while using n + 1 words of space.
– A merge sort has worst case time complexity O(n*log(n)) while using 2n words of space.

Analysis lets us choose the best algorithm based on our time and space requirements.


Principles

Big-O analysis is generally only applicable for large n!


Trace of Selection Sort

  Unsorted: 42 23 74 11 65 58 94 36
  Pass 1:   11 23 74 42 65 58 94 36
  Pass 2:   11 23 74 42 65 58 94 36
  Pass 3:   11 23 36 42 65 58 94 74
  Pass 4:   11 23 36 42 65 58 94 74
  Pass 5:   11 23 36 42 58 65 94 74
  Pass 6:   11 23 36 42 58 65 94 74
  Pass 7:   11 23 36 42 58 65 74 94  (sorted)

On each pass i, the smallest remaining element is swapped into position i; on passes 2, 4 and 6 it is already in place.

space(n) = n + 1
time(n) = n + (n-1) + (n-2) + … + 1 = n(n+1)/2 = O(n^2)
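The trace corresponds to a selection sort along these lines (a sketch, not the lecture's exact code); note the single extra word of space, the swap temporary:

```java
import java.util.Arrays;

// Selection sort: on pass i, swap the smallest remaining element into
// position i. Space: n + 1 words; time: O(n^2) comparisons.
public class SelectionSort {
    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j; // index of smallest in a[i..n-1]
            }
            int temp = a[i]; // the single extra word of space
            a[i] = a[min];
            a[min] = temp;
        }
    }

    public static void main(String[] args) {
        int[] data = {42, 23, 74, 11, 65, 58, 94, 36}; // the slide's data
        sort(data);
        System.out.println(Arrays.toString(data));
        // [11, 23, 36, 42, 58, 65, 74, 94]
    }
}
```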


Trace of Merge Sort

Decompose (split the list into halves until each piece has one element):

  42 23 74 11 65 58 94 36
  42 23 74 11 | 65 58 94 36
  42 23 | 74 11 | 65 58 | 94 36
  42 | 23 | 74 | 11 | 65 | 58 | 94 | 36

Merge (repeatedly merge sorted pieces into sorted pieces of twice the size):

  23 42 | 11 74 | 58 65 | 36 94
  11 23 42 74 | 36 58 65 94
  11 23 36 42 58 65 74 94  (sorted)

space(n) = 2n
time(n) = [1 + 2 + 4 + … + 2^(log2(n) - 1)] + [n*1 + (n/2)*2 + (n/4)*4 + … + 1*n] = n + n*log2(n) = O(n*log(n))

Recall: x = 2^n => log2(x) = n; for example, 8 = 2^3 => log2(8) = 3.
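The decompose and merge passes above can be sketched as a standard top-down merge sort (a sketch, not the lecture's exact code); the extra n-word buffer accounts for the 2n words of space:

```java
import java.util.Arrays;

// Merge sort: recursively decompose into halves, then merge sorted runs.
// Space: 2n words (the input plus an n-word buffer); time: O(n*log(n)).
public class MergeSort {
    public static void sort(int[] a) {
        mergeSort(a, new int[a.length], 0, a.length - 1);
    }

    private static void mergeSort(int[] a, int[] buffer, int lo, int hi) {
        if (lo >= hi) return;               // a run of length 1 is sorted
        int mid = (lo + hi) / 2;
        mergeSort(a, buffer, lo, mid);      // decompose the left half
        mergeSort(a, buffer, mid + 1, hi);  // decompose the right half
        merge(a, buffer, lo, mid, hi);      // merge the two sorted halves
    }

    private static void merge(int[] a, int[] buffer, int lo, int mid, int hi) {
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) buffer[k] = a[j++];           // left run exhausted
            else if (j > hi) buffer[k] = a[i++];       // right run exhausted
            else if (a[i] <= a[j]) buffer[k] = a[i++]; // take smaller head
            else buffer[k] = a[j++];
        }
        System.arraycopy(buffer, lo, a, lo, hi - lo + 1);
    }

    public static void main(String[] args) {
        int[] data = {42, 23, 74, 11, 65, 58, 94, 36}; // the slide's data
        sort(data);
        System.out.println(Arrays.toString(data));
        // [11, 23, 36, 42, 58, 65, 74, 94]
    }
}
```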