
Page 1:

CS503: Fourteenth Lecture, Fall 2008
Amortized Analysis, Sets

Michael Barnathan

Page 2:

A Preliminary Note

• The withdrawal deadline is Nov. 4.
• THAT IS ONE WEEK FROM TODAY.

• It’s also election day, so if you wish to both vote and withdraw from a course, you may wish to plan ahead.

Page 3:

Here’s what we’ll be learning:
• Theory:
  – Amortized Analysis.
  – More complex recurrences.
• Data Structures:
  – Disjoint sets and unions (very quick overview).
• Java:
  – Sets and Multisets.

Page 4:

Traditional Asymptotic Analysis
• Looks at the behavior of one operation.
  – One insertion takes O(n) time…
  – One search takes O(log n) time…
  – One, one, one.
• If every operation takes the same amount of time, this is perfectly fine. We can figure out the cost of the sequence.
  – What is the total complexity of n operations, each taking time proportional to O(n)?
• However, this is not always the case.
  – What about Vectors, which increase in size when they are filled?
  – Each insertion at the end takes O(1) time, until the array is full, upon which the next insertion takes O(n) time.

Page 5:

Vector Doubling in Traditional Analysis

• Suppose we perform n insertions on a vector that employs the doubling strategy.

• In traditional analysis, every operation has the same cost. So what is the worst-case cost of insertion into an array?
  – O(n), because in the worst case, we double.
  – This is despite the fact that most insertions take O(1) time, because the majority do not double.
• We perform n insertions, each taking O(n) time.
• What is the bound upon the complexity?
  – n * O(n) = O(n²).
• This is clearly not a tight bound.

Page 6:

Amortization
• Amortized analysis analyzes worst-case performance over a sequence of operations.
• It is not an average-case analysis; it is an “average of worst cases”.
• Going back to Vector insertion:
  – If we perform 6 insertions into a Vector of size 5, 5 of those insertions will take 1 unit of time. The sixth will take 6 units. (Both in the worst case.)
  – Since all 6 insertions will take 11 units of time, one insertion contributes roughly 2 time units, not on average, but in the worst case.
  – 2 is a constant. We would expect constant-time behavior on insert.
  – An individual insertion may take longer (the sixth insertion takes 6 units of time, for example), but it will make up for it by preparing subsequent insertions to run quickly (by doubling the array).

Page 7:

Methods of Amortization
• There are three commonly employed amortized analysis techniques.
• From least to most formal:
  – The Aggregate Method:
    • Count the number of time units across a sequence of operations and divide by the number of operations.
  – The Accounting Method:
    • Each operation “deposits” time, which is then used to “pay for” expensive operations.
  – The Potential Method:
    • A “potential function” φ is defined based on the change in state brought about by each operation, and the difference in potential is added to the total cost (this difference may be negative).
    • We won’t go into too much detail on this method.
• Each method has its limitations.

Page 8:

The Aggregate Method
• This is the simplest method of analysis.
• Simply add up the worst-case cost of each operation in a sequence, then divide by the total number of operations in the sequence:

$$\text{AmortizedCost} = \frac{1}{n}\sum_{i=1}^{n} \text{Cost}(i)$$

• The cost of each operation is very often defined asymptotically, not as a number.
• But that’s OK; O(n) means “a linear function of n”.
• So O(n) / n = O(1), O(n²) / n = O(n), and so forth.

Page 9:

The Aggregate Method – Example:

Data:  1     2     3
Cost:  O(1)  O(1)  O(1)

So far, the amortized cost is [n * O(1)] / n = O(1).

Data:  1     2     3     4     5     6     7
Cost:  O(1)  O(1)  O(1)  O(n)  O(1)  O(1)  O(1)

The fourth insertion doubles the array, an O(n) operation.

Now the amortized cost is [(n-1) * O(1) + O(n)] / n = O(1) + O(1) = O(1).

Page 10:

Caveats

• The lack of formalism in the aggregate method has some consequences.

• Specifically, when using the aggregate method, be careful with your asymptotics!

• O(n) at the 4th insertion is very different from O(n) at the 32768th insertion!

• It is thus sometimes useful to define the elementary cost of inserting without doubling as simply “1” and to use exact numbers.

Page 11:

Again, with numbers.

Data:  1  2  3
Cost:  1  1  1

So far, the amortized cost is [3 * 1] / 3 = 1.

Data:  1  2  3  4  5  6  7
Cost:  1  1  1  4  1  1  1

The fourth insertion doubles the array, an O(n) operation: it copies the 3 existing elements and then performs the insertion, for a cost of 4.

Now the amortized cost is [6 * 1 + 4] / 7 = 10 / 7 ≈ 1.43.
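A quick sanity check of the aggregate method, as a minimal Java sketch (not from the original slides; it assumes the array starts at capacity 1 and doubles when full): it simulates n insertions, charges 1 unit per insertion plus 1 unit per copied element on each doubling, and prints the total cost divided by n. The result stays below 3 for any n, matching the analysis on the next slide.

    public class DoublingCost {
        public static void main(String[] args) {
            int n = 1_000_000;   // number of insertions to simulate (arbitrary)
            long capacity = 1;   // assumed starting capacity
            long size = 0;       // elements stored so far
            long totalCost = 0;  // accumulated time units
            for (int i = 0; i < n; i++) {
                if (size == capacity) {   // array is full: double it
                    totalCost += size;    // copying the existing elements costs 'size' units
                    capacity *= 2;
                }
                totalCost += 1;           // the insertion itself costs 1 unit
                size++;
            }
            // Aggregate (amortized) cost per insertion.
            System.out.println((double) totalCost / n);  // about 2.05 for this n; always below 3
        }
    }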

Page 12:

How it Converges
• It turns out that this is always constant-time, no matter how many insertions you do.
• For a sequence of n operations, the algorithm will double at n, n/2, n/4, n/8, …
• The number of elements in the array at each doubling is the cost of that doubling step (because it’s O(n)).
• So the cost is defined by a convergent series:

$$\text{Cost} = \underbrace{n}_{\text{cost of insertion}} + \underbrace{n + \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots}_{\text{cost of doubling}} \le n\left(1 + 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots\right) = 3n$$

• So, at worst, it will take you thrice as long to use a doubling array as a preallocated one.
• 3 * O(1) = O(1); this is still constant-time.

Page 13:

The Accounting Method
• The accounting method begins by assigning each elementary operation a cost of $1.
  – The cost your analysis returns, of course, is then in terms of how long those operations take.
• Each operation will then pay for:
  – The actual cost of the operation, and
  – The future cost of keeping that element maintained (for example, copying it in the array).
• We save in advance so we have something to “spend” when we double.
• We call the saved money “the bank”.
  – The bank balance never goes negative; there are no subprime loans.
• This is somewhat difficult because it requires us to look ahead to see what happens when we double.

Page 14:

What’s the cost?
• Each element we insert costs $1 immediately.
• When doubling:
  – We will have to move each element to a new array; each move costs $1.
  – We will have to create a new slot for each existing element (because we’re doubling the size). Something is eventually going to fill this spot as well. This will cost $1.
• So the total cost is $3 per insertion.
  – $1 for now, $2 for the future.
• This is the same answer we received using the aggregate method.
• But it requires careful inspection to arrive at.

Page 15:

Does it work?

• Remember, the bank must never go negative.
• Doubling costs $n+1: $n to copy the n elements, plus $1 for the insertion that immediately follows.
• Each insertion pays $3 and costs $1, so $2 goes into the bank at each non-doubling step.
• And each doubling costs $n+1.

Page 16:

Yes, it does.

i        1    2    3    4    5    6
size     1    2    4    4    8    8
Deposit  $3   $3   $3   $3   $3   $3
Cost     $1   $2   $3   $1   $5   $1
Profit   $2   $1   $0   $2  -$2   $2
Bank     $2   $3   $3   $5   $3   $5

(In the original slide, red fields marked insertions 2, 3, and 5, which cause the array to double.)
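To see the same thing on a larger scale, here is a small Java sketch (not from the slides; it assumes a $3 deposit per insertion and a starting capacity of 1, as in the table above) that extends the table to a million insertions and checks that the bank never goes negative:

    public class AccountingCheck {
        public static void main(String[] args) {
            long capacity = 1;   // starting capacity, as in the table
            long size = 0;       // elements stored so far
            long bank = 0;       // accumulated credit, in dollars
            for (int i = 1; i <= 1_000_000; i++) {
                bank += 3;               // every insertion deposits $3
                long cost = 1;           // the insertion itself costs $1
                if (size == capacity) {  // doubling step: pay $1 per copied element
                    cost += size;
                    capacity *= 2;
                }
                bank -= cost;
                size++;
                if (bank < 0) {          // the accounting argument says this never happens
                    System.out.println("Bank went negative at i = " + i);
                    return;
                }
            }
            System.out.println("Bank never went negative.");
        }
    }

Stepping through the first six iterations by hand reproduces the Bank row of the table ($2, $3, $3, $5, $3, $5).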

Page 17:

Potential Method

• Instead of “saving” and paying the cost later, the potential method measures the “potential difference” between two adjacent operations.

• This is defined by a potential function φ.
  – φ(0) = 0.
  – φ(i) ≥ 0 for all i.
• The amortized cost of operation i is determined by the actual cost plus the difference in potential:
  – ac_i = c_i + [φ(i) − φ(i−1)]
• The total cost is the sum of these individual costs (the potential terms telescope, leaving only φ(n) − φ(0)):

$$\sum_{i=1}^{n} ac_i = \sum_{i=1}^{n} \bigl[\, c_i + \varphi(i) - \varphi(i-1) \,\bigr] = \sum_{i=1}^{n} c_i + \bigl[\, \varphi(n) - \varphi(0) \,\bigr]$$

• Since φ(n) ≥ 0 = φ(0), the total amortized cost is an upper bound on the total actual cost.

Page 18:

Potential Method Example

• For the array doubling problem, the potential function is

$$\varphi(i) = 2i - 2^{\lceil \lg i \rceil}$$

• That is, the difference between the amount that the array size would double to and the least power of 2 greater than or equal to the current size of the array.
• If i−1 is a power of 2, c = i + 2 − (i − 1) = 1 + 2 = 3.
• If i−1 is not a power of 2, ⌈lg i⌉ = ⌈lg(i−1)⌉, the potential terms cancel, and c = 1 + 2 − 0 = 3.
• So we get the same answer as in the other methods.
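Expanding the two cases above with the definitions already given (this derivation is not on the original slide; it is just the algebra written out):

$$
\begin{aligned}
\text{If } i-1 = 2^{k}:\quad ac_i &= \underbrace{i}_{\text{copy } i-1 \text{ elements, then insert}} + \bigl(2i - 2^{k+1}\bigr) - \bigl(2(i-1) - 2^{k}\bigr) = i + 2 - 2^{k} = i + 2 - (i-1) = 3.\\
\text{Otherwise:}\quad ac_i &= 1 + \bigl(2i - 2^{\lceil \lg i \rceil}\bigr) - \bigl(2(i-1) - 2^{\lceil \lg (i-1) \rceil}\bigr) = 1 + 2 = 3.
\end{aligned}
$$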

Page 19:

Sets

• A set is a data structure that can be used to store records in which the key is the same as the value, keeping all elements in the set unique.

• There are two types in Java:
  – TreeSets: Sorted Unique Containers.
  – HashSets: Unsorted Unique Containers.

• A multiset, or bag, is like a set, but without the uniqueness constraint.

Page 20:

Special Properties
• Elements in a set class are guaranteed unique.
• Attempting to insert an element that already exists will not modify the set at all and will cause add() to return false.
• Sets can be split and merged.
  – You can get the entire set of elements greater than or less than a target, for example.
  – Or you can merge two disjoint sets together.
    • This is called a union operation.
• TreeSets are implemented using Binary Search Trees in Java, providing O(log n) insertion, access, and update and guaranteeing sorted order (remember to implement Comparable in your classes).
• HashSets are implemented using hash tables, providing average-case O(1) insertion, access, and deletion, but not guaranteeing sorted order.
• They are thus appropriate data structures to use for operations such as picking out the unique words in a book and outputting them in sorted order (see the sketch below).
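For instance, here is a minimal sketch of that last use case (the file name "book.txt" and the simple whitespace splitting are assumptions for illustration): a TreeSet keeps the words unique and sorted as they are added.

    import java.io.IOException;
    import java.nio.file.Paths;
    import java.util.Scanner;
    import java.util.TreeSet;

    public class UniqueWords {
        public static void main(String[] args) throws IOException {
            // TreeSet: unique elements, kept in sorted order (String implements Comparable).
            TreeSet<String> words = new TreeSet<>();
            try (Scanner in = new Scanner(Paths.get("book.txt"))) {  // hypothetical input file
                while (in.hasNext()) {
                    words.add(in.next().toLowerCase());  // duplicate words are simply ignored
                }
            }
            for (String word : words) {  // iteration visits the words in sorted order
                System.out.println(word);
            }
        }
    }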

Page 21:

Methods

• Has some of the usual ones: add(), remove(), size(), addAll()…

• But also some exotic ones that return elements or subsets greater than or less than an element:
  – higher(Object): Returns the least element > Object.
  – lower(Object): Returns the greatest element < Object.
  – floor(Object): Returns the greatest element ≤ Object.
  – ceiling(Object): Returns the least element ≥ Object.
  – headSet(Object): Returns the portion of the set < Object.
  – tailSet(Object): Returns the portion of the set ≥ Object.
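A short sketch of these methods on a TreeSet of integers (the values are made up for illustration):

    import java.util.TreeSet;

    public class NavigationDemo {
        public static void main(String[] args) {
            TreeSet<Integer> set = new TreeSet<>();
            for (int v : new int[] {10, 20, 30, 40, 50}) {
                set.add(v);
            }
            System.out.println(set.higher(30));   // 40: least element strictly greater than 30
            System.out.println(set.lower(30));    // 20: greatest element strictly less than 30
            System.out.println(set.floor(35));    // 30: greatest element <= 35
            System.out.println(set.ceiling(35));  // 40: least element >= 35
            System.out.println(set.headSet(30));  // [10, 20]: elements strictly less than 30
            System.out.println(set.tailSet(30));  // [30, 40, 50]: elements >= 30
        }
    }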

Page 22:

Special Set Operations
• Disjoint sets are usually represented as trees.
• The union algorithm merges two sets by attaching the smaller set/tree to the larger tree. This is determined by the set’s rank.
• The rank of the set, also defined as the Horton-Strahler number of its tree, is as follows:
  – A set with one element has a rank of 0.
  – The result of a union between two sets of the same rank r is a set of rank r+1.
• The optimal find algorithm utilizes a strategy called path compression, which traverses up the tree and makes each node it encounters a child of the root. The amortized running time of this approach is O(α(n)), where α(n) is the inverse Ackermann function. This function grows extraordinarily slowly, so find effectively runs in amortized constant time. (A sketch follows below.)

Page 23:

Recurrences, part 2.

• If we have a recurrence of the form T(n) = a*T(n/b) + f(n), the solution can be found using the Master Method.

• However, what if we have something like T(n) = T(n/3) + T(n/4) + O(1)?

• Then we need to use a different method.

Page 24:

Solving Complex Recurrences
• Isolate the recursive terms:
  – T(n) = T(n/3) + T(n/4) + O(1).
  – The recursive part, T(n) = T(n/3) + T(n/4), is called the homogeneous recurrence.
• Guess a general-form solution for this part of the recurrence.
  – Generally, these recurrences will be polynomial, so guess c*n^a.
• Plug your guess into the homogeneous recurrence:
  – c*n^a = c*(n/3)^a + c*(n/4)^a.
• Solve for a (or at least get a bound on a):
  – c*n^a = c*n^a/3^a + c*n^a/4^a.
  – 1 = 1/3^a + 1/4^a.
  – Does a = 1 work? 1 > 1/3 + 1/4, so it’s too high.
  – a = 0.5 is too small, but close.
  – a ≈ 0.56 works.
• So the solution is O(n^0.56).
• If the solution were of the same order as the driving function, we would still need to multiply by log n.
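One way to pin down that exponent numerically (a small sketch, not part of the lecture): bisect on f(a) = (1/3)^a + (1/4)^a − 1, which decreases as a grows, so its single root is the exponent we want.

    public class RecurrenceExponent {
        // f(a) = (1/3)^a + (1/4)^a - 1; we want the a at which this equals zero.
        static double f(double a) {
            return Math.pow(1.0 / 3.0, a) + Math.pow(1.0 / 4.0, a) - 1.0;
        }

        public static void main(String[] args) {
            double lo = 0.5, hi = 1.0;      // f(0.5) > 0 and f(1.0) < 0, so the root lies between
            for (int i = 0; i < 60; i++) {  // bisection: halve the interval each step
                double mid = (lo + hi) / 2;
                if (f(mid) > 0) {
                    lo = mid;               // exponent still too small
                } else {
                    hi = mid;
                }
            }
            System.out.println(lo);         // prints approximately 0.56
        }
    }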

Page 25:

Performance on a Sequence

• We covered amortized analysis and sets today, plus a bit on recurrences.

• Next time, we will discuss graphs – the root data structure from which most others derive.

• We will also have a somewhat theoretical assignment on analyzing the performance of hashes next time.

• The lesson:
  – Plan for the future. Plan your current actions to make your future efforts easier.