o() analysis of methods and data structures reasonable vs. unreasonable algorithms using o()...
TRANSCRIPT
O() Analysis of Methods and Data Structures
Reasonable vs. UnreasonableAlgorithms
Using O() Analysis in Design
Now Available Online!
http://www.coursesurvey.gatech.edu
O() Analysis of Methods and Data Structures
The Scenario
• We’ve talked about data structures and methods to act on these structures…– Linked lists, arrays, trees– Inserting, Deleting, Searching, Traversal,
Sorting, etc.
• Now that we know about O() notation, let’s discuss how each of these methods perform on these data structures!
Recipe for Determining O()
• Break algorithm down into “known” pieces
– We’ll learn the Big-Os in this section
• Identify relationships between pieces
– Sequential is additive
– Nested (loop / recursion) is multiplicative• Drop constants• Keep only dominant factor for each variable
Array Size and Complexity
How can an array change in size?
MAX is 30ArrayType definesa array[1..MAX] of Num
We need to know what N is in advance to declare an array, but for analysis of complexity, we can still use N as a variable for input size.
Traversals
• Traversals involve visiting every element in a collection.
• Because we must visit every node, a traversal must be O(N) for any data structure.
– If we visit less than N elements, then it is not a traversal.
Comparing Data Structures and Methods
Data Structure Traverse
Unsorted L List N
Sorted L List N
Unsorted Array N
Sorted Array N
Binary Tree N
BST N
F&B BST N
LB
Searching for an Element
Searching involves determining if an element is a member of the collection.
• Simple/Linear Search:– If there is no ordering in the data structure– If the ordering is not applicable
• Binary Search:– If the data is ordered or sorted– Requires non-linear access to the elements
Simple Search
• Worst case: the element to be found is the Nth element examined.
• Simple search must be used for:– Sorted or unsorted linked lists– Unsorted array– Binary tree– Binary Search Tree if it is not full and balanced
Comparing Data Structures and Methods
Data Structure Traverse Search
Unsorted L List N N
Sorted L List N N
Unsorted Array N N
Sorted Array N
Binary Tree N N
BST N N
F&B BST N
?
LB
Balanced Binary Search Trees
If a binary search tree is not full, then in the worst case it takes on the structure and characteristics of a sorted linked list with N/2 elements.
7
11
14
42
58
Binary Search TreesIf a binary search tree is not full or balanced, then in
the worst case it takes on the structure and characteristics of a sorted linked list.
7
11
14
42
58
Example: Linked List
• Let’s determine if the value 83 is in the collection:
5 19 35 42 \\
83 Not Found!
Head
Simple/Linear Search Algorithm
cur <- head
loop exitif(cur = NIL) OR (cur^.data = target) cur <- cur^.next
endloop
if(cur <> NIL) then print( “Yes, target is there” )
else print( “No, target isn’t there” )
endif
Pre-Order Search Traversal Algorithm
• As soon as we get to a node, check to see if we have a match
• Otherwise, look for the element in the left sub-tree
• Otherwise, look for the element in the right sub-tree
14
Left ??? Right ???
94
36
6722 14 9
3
Find 9
curIf I have to watchone more of theseI think I’m going to die.
LB
Big-O of Simple Search
• The algorithm has to examine every element in the collection– To return a false– If the element to be found is the Nth
element
• Thus, simple search is O(N).
Binary Search
• We may perform binary search on– Sorted arrays– Full and balanced binary search trees
• Tosses out ½ the elements at each comparison.
Full and Balanced Binary Search Trees
Contains approximately the same number of elements in all left and right sub-trees (recursively) and is fully populated.
34
25
21 29
45
41 52
Binary Search Example
7 12 42 59 71 86 104 212
Looking for 89
Binary Search Example
7 12 42 59 71 86 104 212
Looking for 89
Binary Search Example
7 12 42 59 71 86 104 212
Looking for 89
Binary Search Example
7 12 42 59 71 86 104 212
89 not found – 3 comparisons
3 = Log(8)
Binary Search Big-O
• An element can be found by comparing and cutting the work in half.– We cut work in ½ each time– How many times can we cut in half?
– Log2N
• Thus binary search is O(Log N).
What?
N
2
4
8
16
32
64
128
256
512
LB
Searches
1
2
3
4
5
6
7
8
9
What?
log2(N)
log2(2)
log2(4)
log2(8)
log2(16)
log2(32)
log2(64)
log2(128)
log2(256)
log2(512)
LB
Searches
1
2
3
4
5
6
7
8
9
=
=
=
=
=
=
=
=
=
CS Notation
lg(N)
lg(2)
lg(4)
lg(8)
lg(16)
lg(32)
lg(64)
lg(128)
lg(256)
lg(512)
LB
Searches
1
2
3
4
5
6
7
8
9
=
=
=
=
=
=
=
=
=
Recall
LB
log2 N = k • log10 N
k = 0.30103...
So: O(lg N) = O(log N)
Comparing Data Structures and Methods
Data Structure Traverse Search
Unsorted L List N N
Sorted L List N N
Unsorted Array N N
Sorted Array N Log N
Binary Tree N N
BST N N
F&B BST N Log N
LB
Insertion
• Inserting an element requires two steps:– Find the right location– Perform the instructions to insert
• If the data structure in question is unsorted, then it is O(1)– Simply insert to the front – Simply insert to end in the case of an array– There is no work to find the right spot and
only constant work to actually insert.
Comparing Data Structures and Methods
Data Structure Traverse Search Insert
Unsorted L List N N 1
Sorted L List N N
Unsorted Array N N 1
Sorted Array N Log N
Binary Tree N N 1
BST N N
F&B BST N Log N
LB
Insert into a Sorted Linked List
Finding the right spot is O(N)– Recurse/iterate until found
Performing the insertion is O(1)– 4-5 instructions
Total work is O(N + 1) = O(N)
Inserting into a Sorted Array
Finding the right spot is O(Log N)– Binary search on the element to insert
Performing the insertion – Shuffle the existing elements to make
room for the new item
Shuffling Elements
5 12 35 77 101
Note – we must have at least one empty cell
Insert 29
Big-O of Shuffle
5 12
Worst case: inserting the smallest number
1017735
Would require moving N elements…Thus shuffle is O(N)
Big-O of Inserting into Sorted Array
Finding the right spot is O(Log N)
Performing the insertion (shuffle) is O(N)
Sequential steps, so add:
Total work is O(Log N + N) = O(N)
Comparing Data Structures and Methods
Data Structure Traverse Search Insert
Unsorted L List N N 1
Sorted L List N N N
Unsorted Array N N 1
Sorted Array N Log N N
Binary Tree N N 1
BST N N
F&B BST N Log N
LB
Inserting into a Full and Balanced BST
Always insert when current = NIL.
Find the right spot – Binary search on the element to insert
Perform the insertion – 4-5 instructions to create & add node
Full and Balanced BST Insert
12
3
7
41
98
Add 4
352
Full and Balanced BST Insert
12
3
7
41
98352
4
Big-O of Full & Balanced BST Insert
Find the right spot is O(Log N)
Performing the insertion is O(1)
Sequential, so add:
Total work is O(Log N + 1) = O(Log N)
Comparing Data Structures and Methods
Data Structure Traverse Search Insert
Unsorted L List N N 1
Sorted L List N N N
Unsorted Array N N 1
Sorted Array N Log N N
Binary Tree N N 1
BST N N
F&B BST N Log N Log N
LB
What about a BST
Not full & balanced?
LB
Comparing Data Structures and Methods
Data Structure Traverse Search Insert
Unsorted L List N N 1
Sorted L List N N N
Unsorted Array N N 1
Sorted Array N Log N N
Binary Tree N N 1
BST N N N
F&B BST N Log N Log N
LB
Two Sorting Algorithms
• Bubblesort– Brute-force method of sorting– Loop inside of a loop
• Mergesort– Divide and conquer approach– Recursively call, splitting in half– Merge sorted halves together
Bubblesort Review
Bubblesort works by comparing and swapping values in a list
512354277 101
1 2 3 4 5 6
Bubblesort Review
Bubblesort works by comparing and swapping values in a list
77123542 5
1 2 3 4 5 6
101
Largest value correctly placed
procedure Bubblesort(A isoftype in/out Arr_Type) to_do, index isoftype Num to_do <- N – 1
loop exitif(to_do = 0) index <- 1 loop exitif(index > to_do) if(A[index] > A[index + 1]) then Swap(A[index], A[index + 1]) endif index <- index + 1 endloop to_do <- to_do - 1 endloopendprocedure // Bubblesort
to_doN-1
Analysis of Bubblesort
• How many comparisons in the inner loop?– to_do goes from N-1 down to 1, thus– (N-1) + (N-2) + (N-3) + ... + 2 + 1– Average: N/2 for each “pass” of the
outer loop.
• How many “passes” of the outer loop?– N – 1
Bubblesort Complexity
Look at the relationship between the two loops:– Inner is nested inside outer – Inner will be executed for each iteration of
outer
Therefore the complexity is:
O((N-1)*(N/2)) = O(N2/2 – N/2) = O(N2)
LB
Graphically
N-1
N-1
2N-1
2N-1
LB
O(N2) Runtime Example
Assume you are sorting 250,000,000 items
N = 250,000,000
N2 = 6.25 x 1016
If you can do one operation per nanosecond (10-9 sec) which is fast!
It will take 6.25 x 107 seconds
So 6.25 x 107
60 x 60 x 24 x 365
= 1.98 years
674523 14 6 3398 42
674523 14 6 3398 42
4523 1498
2398 45 14
676 33 42
676 33 42
23 98 4514 676 4233
14 23 45 98 6 33 42 67
6 14 23 33 42 45 67 98
Mergesort
Log N
Log N
Analysis of MergesortPhase I
– Divide the list of N numbers into two lists of N/2 numbers
– Divide those lists in half until each list is size 1
Log N steps for this stage.
Phase II– Build sorted lists from the decomposed lists– Merge pairs of lists, doubling the size of the
sorted lists each time
Log N steps for this stage.
Analysis of the Merging
2398 45 14 676 33 42
23 98 4514 676 4233
Merge MergeMergeMerge
14 23 45 98 6 33 42 67
MergeMerge
6 14 23 33 42 45 67 98
Merge
Mergesort Complexity
Each of the N numerical values is compared or copied during each pass– The total work for each pass is O(N).– There are a total of Log N passes
Therefore the complexity is:
O(Log N + N * Log N) = O (N * Log N)
Break apart Merging
O(NLogN) Runtime Example
Assume same 250,000,000 items
N*Log(N) = 250,000,000 x 8.3
= 2, 099, 485, 002
With the same processor as before
2 seconds
Summary• You now have the O() for basic methods on
varied data structures.• You can combine these in more complex
situations.
• Break algorithm down into “known” pieces
• Identify relationships between pieces
– Sequential is additive
– Nested (loop/recursion) is multiplicative• Drop constants• Keep only dominant factor for each variable
Questions?
Reasonable vs. UnreasonableAlgorithms
Reasonable vs. Unreasonable
Reasonable algorithms have polynomial factors– O (Log N)– O (N)– O (NK) where K is a constant
Unreasonable algorithms have exponential factors– O (2N)– O (N!)– O (NN)
Algorithmic Performance Thus Far
• Some examples thus far:– O(1) Insert to front of linked list– O(N) Simple/Linear Search– O(N Log N) MergeSort– O(N2) BubbleSort
• But it could get worse:– O(N5), O(N2000), etc.
An O(N5) Example
For N = 256
N5 = 2565 = 1,100,000,000,000
If we had a computer that could execute a million instructions per second…
• 1,100,000 seconds = 12.7 days to complete
But it could get worse…
The Power of Exponents
A rich king and a wise peasant…
The Wise Peasant’s Pay
Day(N) Pieces of Grain
1 2
2 4
3 8
4 16
...
2N
63 9,223,000,000,000,000,000
64 18,450,000,000,000,000,000
How Bad is 2N?
• Imagine being able to grow a billion (1,000,000,000) pieces of grain a second…
• It would take– 585 years to grow enough grain
just for the 64th day– Over a thousand years to fulfill
the peasant’s request!
?
So the King cut off the peasant’s head.
LB
The Towers of Hanoi
A B C
Goal: Move stack of rings to another peg– Rule 1: May move only 1 ring at a time– Rule 2: May never have larger ring on top of
smaller ring
Original State Move 1
Move 2 Move 3
Move 4 Move 5
Move 6 Move 7
Towers of Hanoi: Solution
Towers of Hanoi - Complexity
For 3 rings we have 7 operations.
In general, the cost is 2N – 1 = O(2N)
Each time we increment N, we double the amount of work.
This grows incredibly fast!
Towers of Hanoi (2N) Runtime
For N = 64
2N = 264 = 18,450,000,000,000,000,000
If we had a computer that could execute a million instructions per second…
• It would take 584,000 years to complete
But it could get worse…
The Bounded Tile Problem
Match up the patterns in thetiles. Can it be done, yes or no?
The Bounded Tile Problem
Matching tiles
Tiling a 5x5 Area
25 availabletiles remaining
Tiling a 5x5 Area
24 availabletiles remaining
Tiling a 5x5 Area
23 availabletiles remaining
Tiling a 5x5 Area
22 availabletiles remaining
Tiling a 5x5 Area
2 availabletiles remaining
Analysis of the Bounded Tiling Problem
Tile a 5 by 5 area (N = 25 tiles)1st location: 25 choices2nd location: 24 choicesAnd so on…
Total number of arrangements:– 25 * 24 * 23 * 22 * 21 * .... * 3 * 2 * 1– 25! (Factorial) =
15,500,000,000,000,000,000,000,000Bounded Tiling Problem is O(N!)
Tiling (N!) Runtime
For N = 25
25! = 15,500,000,000,000,000,000,000,000
If we could “place” a million tiles per second…
• It would take 470 billion years to complete
Why not a faster computer?
A Faster Computer
• If we had a computer that could execute a trillion instructions per second (a million times faster than our MIPS computer)…
• 5x5 tiling problem would take 470,000 years
• 64-ring Tower of Hanoi problem would take 213 days
Why not an even faster computer!
The Fastest Computer Possible?
• What if:– Instructions took ZERO time to execute– CPU registers could be loaded at the speed of
light
• These algorithms are still unreasonable!• The speed of light is only so fast!
Where Does this Leave Us?
• Clearly algorithms have varying runtimes.
• We’d like a way to categorize them:
– Reasonable, so it may be useful – Unreasonable, so why bother running
Performance Categories of Algorithms
Sub-linear O(Log N)
Linear O(N)
Nearly linear O(N Log N)
Quadratic O(N2)
Exponential O(2N)
O(N!)
O(NN)
Po
lyn
om
ial
Reasonable vs. Unreasonable
Reasonable algorithms have polynomial factors– O (Log N)– O (N)– O (NK) where K is a constant
Unreasonable algorithms have exponential factors– O (2N)– O (N!)– O (NN)
Reasonable vs. Unreasonable
Reasonable algorithms• May be usable depending upon the input size
Unreasonable algorithms• Are impractical and useful to theorists• Demonstrate need for approximate solutions
Remember we’re dealing with large N (input size)
Two Categories of Algorithms
2 4 8 16 32 64 128 256 512 1024Size of Input (N)
1035
1030
1025
1020
1015
trillionbillionmillion100010010
N
N5
2NNN
Unreasonable
Don’t Care!
Reasonable
Ru
nti
me
Summary
• Reasonable algorithms feature polynomial factors in their O() and may be usable depending upon input size.
• Unreasonable algorithms feature exponential factors in their O() and have no practical utility.
Questions?
Using O() Analysis in Design
Air Traffic Control
Coast, add, delete
Conflict Alert
Problem Statement
• What data structure should be used to store the aircraft records for this system?
• Normal operations conducted are:– Data Entry: adding new aircraft entering the
area– Radar Update: input from the antenna– Coast: global traversal to verify that all aircraft
have been updated [coast for 5 cycles, then drop]
– Query: controller requesting data about a specific aircraft by location
– Conflict Analysis: make sure no two aircraft are too close together
Air Traffic Control System
Program Algorithm Freq 1. Data Entry / Exit Insert 15 2. Radar Data Update N*Search 12 3. Coast / Drop Traverse 60 4. Query Search 1 5. Conflict Analysis Traverse*Search 12
LLU LLS AU AS BT F/B BST
#1 1 N 1 N 1 LogN
#2 N^2 N^2 N^2 NlogN N^2 NlogN
#3 N N N N N N
#4 N N N LogN N LogN
#5 N^2 N^2 N^2 NlogN N^2 NlogN
Questions?