Today, we will cover
◦ Typing
◦ Induction and Recursion
◦ Asymptotic Complexity
◦ Data Structures
◦ Abstract Data Types and Implementing ADTs
◦ Searching and Sorting
◦ Graphs
For GUIs, you are fine if you can do the practice problems (just do it!)
Do not worry about
◦ Threads and concurrency
◦ Recurrences
◦ Java virtual machine
◦ How to balance trees (AVL trees)
    But do know the difference between a balanced and unbalanced tree
◦ Software engineering (sort of)
    Don’t break every known rule of software engineering when asked to write code
    We may use a design pattern on the final, but you won’t have to memorize them
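Primitive Types
◦ boolean, int, double, etc…
◦ Test equality with == and !=
◦ Compare with <, <=, >, and >=
void f(int x) {
    x--;
}
int x = 10;
f(x);
// x == 10
[Figure: call stack. The frame for f holds its own copy of x, which changes from 10 to 9; the frame for main still holds x = 10]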
Reference types
◦ Actual object is stored elsewhere
◦ Variable contains a reference to the object
◦ == tests equality for the reference
◦ equals() tests equality for the object
    Two different references (!=) may exist to two objects with the same value (equals())
◦ Can compare objects of type T with compareTo() if the Comparable<T> interface is implemented
void f(ArrayList<Integer> l) {
    l.add(2);
    l = new ArrayList<Integer>();
}
ArrayList<Integer> l = new ArrayList<Integer>();
l.add(1);
f(l);
// l contains 1, 2
[Figure: call stack and heap. main's l refers to a list that goes from {1} to {1, 2}; f's l starts out referring to that same list and ends up referring to a new empty list {}]
We know that type B can implement/extend A
◦ B is a subtype of A; A is a supertype of B
The real type of the object is its dynamic type
◦ This type is known only at run-time
Any object can act like the supertype of its dynamic type
◦ But it cannot act like a subtype of its dynamic type
Variables and function arguments of type A can also accept any subtype of A
◦ Type A is a supertype of the dynamic type
The static type is the type an expression (such as a variable) has in the code when it is compiled
◦ Objects do not have a static type; expressions do
◦ The dynamic type might be a subtype of the static type
Casting changes neither the static type of the expression being cast nor the dynamic type of the object
◦ The cast expression itself is a new expression whose static type is the target type
Upcasts are always safe
◦ Always cast to a supertype of the dynamic type
Downcasts may not be safe
◦ Can downcast to a supertype of the dynamic type
◦ Can downcast to the dynamic type itself
◦ Cannot downcast to a subtype of the dynamic type
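The casting rules above can be tried out with a small sketch. The classes Animal, Dog, and Puppy are hypothetical names invented for this illustration; they do not come from the slides.

```java
// Hypothetical hierarchy for illustration: Puppy extends Dog extends Animal.
class Animal {}
class Dog extends Animal {}
class Puppy extends Dog {}

class CastDemo {
    // Returns true if the downcast to Puppy succeeds at run time.
    static boolean canCastToPuppy(Animal a) {
        try {
            Puppy p = (Puppy) a;   // checked against the dynamic type at run time
            return true;
        } catch (ClassCastException e) {
            return false;          // dynamic type was not Puppy (or a subtype)
        }
    }

    public static void main(String[] args) {
        Animal a = new Dog();      // static type Animal, dynamic type Dog
        Dog d = (Dog) a;           // downcast to the dynamic type: succeeds
        Animal up = (Animal) d;    // upcast: always safe
        System.out.println(canCastToPuppy(a));            // false
        System.out.println(canCastToPuppy(new Puppy()));  // true
    }
}
```

The failing case is a downcast to a subtype of the dynamic type, which compiles but throws ClassCastException when it runs.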
If B extends A, and B and A both have function foo, which foo gets called?
◦ Answer depends on the dynamic type
If the dynamic type is B, B’s foo will be called even if foo is invoked inside a function of A
Exception: static functions
◦ Static functions are not associated with any object
◦ Thus, there is no dynamic type to dispatch on; the call is resolved at compile time
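A minimal sketch of the dispatch rule, using hypothetical classes Base and Derived in the roles of A and B. The method names callFoo and bar are also made up for this example.

```java
class Base {
    String foo() { return "Base.foo"; }

    // Defined in Base, but the call to foo() is still dispatched on the dynamic type.
    String callFoo() { return foo(); }

    static String bar() { return "Base.bar"; }
}

class Derived extends Base {
    @Override
    String foo() { return "Derived.foo"; }

    // Static methods are hidden, not overridden: no dynamic dispatch happens.
    static String bar() { return "Derived.bar"; }
}

class DispatchDemo {
    public static void main(String[] args) {
        Base b = new Derived();            // static type Base, dynamic type Derived
        System.out.println(b.foo());       // Derived.foo
        System.out.println(b.callFoo());   // Derived.foo
        System.out.println(Base.bar());    // Base.bar
        System.out.println(Derived.bar()); // Derived.bar
    }
}
```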
Recursion
◦ Basic examples
    Factorial: n! = n(n-1)!
    Combinations
    Pascal’s triangle
◦ Recursive structure
    Tree (tree t = root with right/left subtree)
    Depth first search
◦ Don’t forget base case (in proof, in your code)
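Two of the basic examples can be sketched as follows, with the base cases marked. The class and method names are illustrative only, and the combinations method uses the Pascal's triangle recurrence, which is easy to read but not efficient.

```java
class RecursionDemo {
    // n! = n * (n-1)!, with base case 0! = 1.
    static long factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        if (n == 0) return 1;                 // base case
        return n * factorial(n - 1);          // recursive case
    }

    // Entry (n, k) of Pascal's triangle: C(n, k) = C(n-1, k-1) + C(n-1, k).
    static long choose(int n, int k) {
        if (k < 0 || k > n) return 0;         // outside the triangle
        if (k == 0 || k == n) return 1;       // base cases: the edges of the triangle
        return choose(n - 1, k - 1) + choose(n - 1, k);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));     // 120
        System.out.println(choose(5, 2));     // 10
    }
}
```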
Induction
◦ Can do induction on previous recursive problems
    Algorithm correctness proof (DFS)
    Math equation proof
    Prelim 2 questions
Step 1
◦ Base case
Step 2
◦ Suppose n is the variable you’re going to do induction on. Suppose the equation holds when n = k
◦ Strong induction: suppose it holds for all n <= k
Step 3
◦ Prove that when n = k+1, the equation still holds, by making use of the assumptions in Step 2.
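As an illustration of the three steps, here is a short proof of a standard summation formula. The equation is an example chosen for this note, not one taken from the slides.

```latex
\textbf{Claim.} For all $n \ge 1$: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.

\textbf{Step 1 (base case).} For $n = 1$: $\sum_{i=1}^{1} i = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Step 2 (inductive hypothesis).} Suppose the equation holds when $n = k$:
$\sum_{i=1}^{k} i = \frac{k(k+1)}{2}$.

\textbf{Step 3 (inductive step).} For $n = k + 1$, using the hypothesis from Step 2:
\[
\sum_{i=1}^{k+1} i
  = \frac{k(k+1)}{2} + (k+1)
  = \frac{k(k+1) + 2(k+1)}{2}
  = \frac{(k+1)(k+2)}{2}.
\]
```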
f(n) is O(g(n)) if ∃ (c, n0) such that ∀ n ≥ n0, f(n) ≤ c⋅g(n)
◦ ∃ - there exists; ∀ - for all
◦ (c, n0) is called the witness pair
Once you have a correct witness pair, you can probably use induction to prove it is correct
f(n) is O(g(n)) means that the function f(n) is roughly less than or equal to g(n)
Big-O notation is a model for running time
◦ Models usually, but not always, work in real life
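A worked example of finding a witness pair may help. The functions below are chosen for illustration, and the witness pair (3, 1) is one of many that work.

```latex
\textbf{Claim.} $f(n) = n^3 + 2n^2$ is $O(n^3)$, with witness pair $(c, n_0) = (3, 1)$.

For every $n \ge 1$ we have $n^2 \le n^3$, so
\[
f(n) = n^3 + 2n^2 \le n^3 + 2n^3 = 3n^3 = c \cdot g(n)
\qquad \text{for all } n \ge n_0 = 1 .
\]
```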
Meaning of n0
◦ We can compare one integer to another
◦ How can we tell if one function is less than or equal to another?
◦ Answer is which function grows faster
    One function could also start ahead of the other and grow at the same rate, staying ahead
◦ 60-mph car with no headstart will eventually overtake a 40-mph car with a headstart
◦ At what time does the faster car/function take over? n0
Meaning of c
◦ Suppose we cannot get a precise integer value
    897 is less than or equal to 899, but maybe due to some errors the real numbers were 892 and 884
    E.g.: ballot counts in Minnesota recount
◦ Idea: Compare order of magnitude
    Compare numbers by the number of digits
    897 and 899 have the same number of digits
    Difference between 42 and 482 is far bigger
    Gives us some room for error
Meaning of c
◦ What is the difference between n^3 + 1 and n^3?
◦ What about n^3, n^3 + 2n^2, and 2n^3?
◦ We can be off by a constant factor, c
◦ If f(n) is only twice as large as g(n), setting c to 2 or greater makes c⋅g(n) at least as large as f(n)
◦ A constant factor cannot account for the difference between n and n^2, log n and n, or 2^n and 3^n
◦ There are three common types of growth
    Logarithmic, polynomial, and exponential growth
Linked Lists
◦ Singly-linked/doubly-linked
◦ Sorted/unsorted
◦ Add, delete elements
Arrays
◦ Sorted/unsorted
◦ Add, delete elements
Search Tree
◦ Balanced and unbalanced
Search for an element in array/list/tree
◦ Sorted arrays and balanced search trees: O(log n)
◦ Linked lists (sorted/unsorted): O(n)
◦ Other unsorted/unbalanced structures: O(n)
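The O(log n) bound for sorted arrays comes from binary search, which halves the remaining range at every step. A minimal sketch (class and method names are illustrative):

```java
class BinarySearchDemo {
    // Returns the index of target in the sorted array a, or -1 if it is absent.
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;      // avoids overflow of (lo + hi)
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1; // target can only be in the right half
            else hi = mid - 1;                 // target can only be in the left half
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9};
        System.out.println(binarySearch(sorted, 7));  // 3
        System.out.println(binarySearch(sorted, 4));  // -1
    }
}
```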
Trees
◦ Traversal
◦ Search
    Similar to binary search in an array: O(log n) when the search tree is balanced
Heap
◦ Min/max heap: heap order invariant
    Every node smaller/larger than its immediate children
◦ Add an element (see lecture notes): O(log n)
◦ Delete an element (see lecture notes): O(log n)
◦ Implemented with either a binary tree or an array
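The array-backed min-heap can be sketched as below. This is one common way to write it, not necessarily the version in the lecture notes; the class name and the int-only element type are assumptions made to keep the example short. The children of index i live at indices 2i+1 and 2i+2.

```java
import java.util.ArrayList;

class MinHeapSketch {
    private final ArrayList<Integer> a = new ArrayList<>();

    int size() { return a.size(); }

    // Add at the end, then bubble up until the parent is no larger: O(log n).
    void add(int x) {
        a.add(x);
        int i = a.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (a.get(parent) <= a.get(i)) break;  // heap order restored
            swap(i, parent);
            i = parent;
        }
    }

    // The root is the minimum: O(1). Throws if the heap is empty.
    int peekMin() { return a.get(0); }

    // Move the last element to the root, then bubble down: O(log n).
    int extractMin() {
        int min = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
                if (left < a.size() && a.get(left) < a.get(smallest)) smallest = left;
                if (right < a.size() && a.get(right) < a.get(smallest)) smallest = right;
                if (smallest == i) break;          // both children are no smaller
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        int t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }
}
```

Adding 5, 3, 8, 1 and then extracting repeatedly returns the elements in increasing order.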
Motivation
◦ Sort n numbers between 0 and 2n – 1
◦ Instead of sorting abstract comparable objects, we are sorting integers within a certain range
    The general Ω(n log n) lower bound for comparison-based sorting may not apply
◦ Can be done in O(n) time with counting sort
    Create an array of size 2n
    The ith entry counts all the numbers equal to i
    For each number, increment the correct entry
◦ Can also find a given number in O(1) time
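The counting sort steps above can be sketched as follows. The parameter range stands for the size of the counts array (2n in the slide); the names are illustrative.

```java
class CountingSortDemo {
    // Sorts values that are all known to lie in [0, range).
    static int[] countingSort(int[] values, int range) {
        int[] counts = new int[range];
        for (int v : values) counts[v]++;          // the ith entry counts values equal to i
        int[] out = new int[values.length];
        int k = 0;
        for (int i = 0; i < range; i++)            // emit each value as often as it occurred
            for (int c = 0; c < counts[i]; c++)
                out[k++] = i;
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = countingSort(new int[] {3, 0, 5, 3, 1}, 10);
        System.out.println(java.util.Arrays.toString(sorted));  // [0, 1, 3, 3, 5]
    }
}
```

Checking whether a number x is present is then a single array read, counts[x] > 0, which is the O(1) lookup mentioned above.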
Cannot do this with arbitrary data types
◦ The integer type alone can have over 4 billion possible values; no array should be that big
For a hashtable, create an array of size m
◦ Hash function maps each object to an array index between 0 and m – 1 (in O(1) time)
    Hash function makes sorting impossible, but we can still look up an element in expected O(1) time
◦ Quality of hash function is based on how many elements map to the same index in the hashtable
    Need to expect O(1) collisions
Dealing with collisions
◦ In counting sort, one array entry contains only elements of the same value
◦ The hash function can map different objects to the same index of the hashtable
Chaining
◦ Each entry of the hashtable is a linked list
Linear Probing
◦ If h(x) is taken, try h(x) + 1, h(x) + 2, h(x) + 3, … (wrapping around modulo m)
◦ Quadratic probing: h(x) + 1, h(x) + 4, h(x) + 9, …
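A minimal sketch of chaining, assuming a fixed table size m and String keys to keep it short. The class name is hypothetical, and mapping hashCode() to an index with floorMod is a simplification, not the formula java.util.HashMap uses.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSetSketch {
    // Each entry of the table is a linked list (a "chain") of keys.
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedHashSetSketch(int m) {
        for (int i = 0; i < m; i++) buckets.add(new LinkedList<>());
    }

    // Maps a key to an index between 0 and m - 1; floorMod handles negative hash codes.
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    // Returns false if the key was already present.
    boolean add(String key) {
        LinkedList<String> chain = buckets.get(indexFor(key));
        if (chain.contains(key)) return false;   // uses equals() within the chain
        chain.add(key);
        return true;
    }

    boolean contains(String key) {
        return buckets.get(indexFor(key)).contains(key);
    }
}
```

With only 4 buckets and more than 4 keys, some keys must share a chain, yet every lookup still succeeds because the chain is searched with equals().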
Table Size
◦ If too large, we waste space
◦ If too small, everything collides with each other
    Probing falls apart if number of items (n) is almost the size of the hashtable (m)
◦ Typically have a load factor 0 < λ ≤ 1
    Resize table when n/m exceeds λ
◦ Resizing changes m; we have to reinsert everything with a new hash function
Table Size
◦ What if we double the size every time we exceed our load factor?
    Must double the number of items to exceed the load factor again
    Worst case is when we just doubled the hashtable
    Consider all prior times we doubled the table
    n + n/2 + n/4 + n/8 + … < 2n
◦ With table doubling, we can insert n items in O(n) total time, i.e., O(1) per insert on average (amortized)
    Some individual operations still take O(n) time
◦ This also works for growing an ArrayList
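The doubling argument can be checked with a small growable array that counts how many element copies its resizes perform. This is a sketch of the idea only; it is not how java.util.ArrayList is actually implemented, and the class name and the copies counter are made up for the experiment.

```java
class GrowableIntArray {
    private int[] data = new int[1];
    private int size = 0;
    long copies = 0;   // total elements copied during all resizes so far

    void add(int x) {
        if (size == data.length) {                 // full: double the capacity
            int[] bigger = new int[2 * data.length];
            for (int i = 0; i < size; i++) {
                bigger[i] = data[i];
                copies++;
            }
            data = bigger;
        }
        data[size++] = x;
    }

    int get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return data[i];
    }

    int size() { return size; }

    public static void main(String[] args) {
        GrowableIntArray arr = new GrowableIntArray();
        int n = 1000;
        for (int i = 0; i < n; i++) arr.add(i);
        // Resizes copy 1 + 2 + 4 + ... + 512 = 1023 elements, which is below 2n.
        System.out.println(arr.copies);  // 1023
    }
}
```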
Java, hashCode() and equals()
◦ Java uses hashCode() in its hash function
    hashCode() assigns each item an integer value
    Java has a special formula to map this integer to some number between 0 and m – 1
◦ If one object equals() another, they should have the same hashCode()
    Cannot insert an object with one hashCode() and then look the same object up with a different hashCode()
    If you override equals(), you must also override hashCode() to preserve this property
◦ Different objects can have the same hashCode()
    If this happens too often, we have too many collisions
    Only equals() can determine if they are equal
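A sketch of a class that overrides both methods consistently. PointKey is a hypothetical class invented for this example; Objects.hash is one convenient way to combine fields, not the only acceptable one.

```java
import java.util.HashSet;
import java.util.Objects;

final class PointKey {
    private final int x, y;

    PointKey(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointKey)) return false;
        PointKey other = (PointKey) o;
        return x == other.x && y == other.y;
    }

    // Built from the same fields as equals(), so equal objects get equal hash codes.
    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class HashCodeDemo {
    // Inserts one object and looks it up through a different but equal object.
    static boolean lookupWorks() {
        HashSet<PointKey> set = new HashSet<>();
        set.add(new PointKey(1, 2));
        return set.contains(new PointKey(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookupWorks());  // true
    }
}
```

If hashCode() were left at its default while equals() is overridden, the two equal PointKey objects would usually land in different buckets and the lookup would fail.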
Lists
Stacks
◦ LIFO
Queues
◦ FIFO
Sets
Dictionaries (Maps)
Priority Queues
Java API
◦ E.g.: ArrayList is an ADT list backed by an array
Priority Queue
◦ Implement as List (sorted/unsorted): O(n)
◦ Implement as heap
    PeekMin: look at heap root, O(1)
    ExtractMin: heap “delete” op, O(log n)
    Insert: heap “add” op, O(log n)
Insertion Sort
Selection Sort
Merge Sort
Quick Sort
Heap Sort
◦ Best/worst case
    Average case for quicksort
◦ Asymptotic complexity
Inheritance/Interfaces Abstract classes Meaning of static
A graph has vertices
A graph has edges between two vertices
n – number of vertices; m – number of edges
Directed vs. undirected graph
◦ Directed edges can only be traversed one way
◦ Undirected edges can be traversed both ways
Weighted vs. unweighted graph
◦ Edges could have weights/costs assigned to them
What makes a graph special?
◦ Cycles!!!
What is a graph without a cycle?
◦ Undirected graphs
    Trees (when connected)
◦ Directed graphs
    Directed acyclic graph (DAG)
Topological sort is for directed graphs
Indegree: number of edges entering a vertex
Outdegree: number of edges leaving a vertex
Topological sort algorithm
◦ Delete a vertex with an indegree of 0
    Delete its outgoing edges, too
◦ Repeat until no vertices have an indegree of 0
[Figure: example directed graph with vertices A, B, C, D, E; order shown: A, B, E, D, C]
What is the only thing a topological sort cannot delete?
◦ Cycles!!!
If a graph is a DAG, a topological sort will delete the entire graph
If a topological sort deletes the entire graph, the graph is a DAG
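The deletion procedure and the DAG test can be sketched together. Instead of physically deleting vertices, this version decrements indegree counts, which has the same effect. The graph representation (a map from each vertex to its successors) and all names are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopoSortSketch {
    // graph maps each vertex to the vertices its outgoing edges point to.
    // Returns the deletion order, or null if some vertices (a cycle) cannot be deleted.
    static List<String> topoSort(Map<String, List<String>> graph) {
        Map<String, Integer> indegree = new HashMap<>();
        for (String v : graph.keySet()) indegree.put(v, 0);
        for (List<String> successors : graph.values())
            for (String w : successors) indegree.merge(w, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();   // vertices with indegree 0
        for (Map.Entry<String, Integer> e : indegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);                           // "delete" v
            for (String w : graph.getOrDefault(v, Collections.emptyList())) {
                // Deleting v's outgoing edge lowers w's indegree by one.
                if (indegree.merge(w, -1, Integer::sum) == 0) ready.add(w);
            }
        }
        // Everything deleted means the graph is a DAG; leftovers mean a cycle.
        return order.size() == indegree.size() ? order : null;
    }
}
```

For a graph with edges A→B, A→C, B→D, C→D the result starts with A and ends with D; for a graph with edges X→Y and Y→X it returns null.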
Works on directed and undirected graphs
You have a start vertex which you visit first
You want to visit all vertices reachable from the start vertex
◦ For directed graphs, depending on your start vertex, some vertices may not be reachable
You can traverse an edge from an already visited vertex to another vertex
Why is choosing any path on a graph risky?
◦ Cycles!!!
◦ Could traverse a cycle forever
Need to keep track of vertices already visited
◦ No cycles if you do not visit a vertex twice
Might also help to keep track of all unvisited vertices you can visit from a visited vertex
Add the start vertex to the collection of vertices to visit
Pick a vertex from the collection to visit
◦ If you have already visited it, do nothing
◦ If you have not visited it:
    Visit that vertex
    Follow its edges to neighboring vertices
    Add unvisited neighboring vertices to the set to visit
    (You may add the same unvisited vertex twice)
Repeat until there are no more vertices to visit
Running time analysis
◦ Visit each vertex only once
◦ When you visit a vertex, you traverse its edges
    You traverse all edges once on a directed graph
    Twice on an undirected graph
◦ At worst, you add a new vertex to the collection to visit for each edge (collection has size of O(m))
◦ Running time is at least proportional to n + m
    Actual result depends on the cost to add/delete vertices to/from the collection of vertices to visit
Depth-first search and breadth-first search are two graph searching algorithms
DFS pushes vertices to visit onto a stack
◦ Examines a vertex by popping it off the stack
BFS uses a queue instead
Both have O(n + m) running time
◦ Push/enqueue and pop/dequeue have O(1) time
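The generic search can be sketched with a single method in which the only difference between DFS and BFS is which end of the collection the next vertex is taken from. The graph representation and names are assumptions for this example, and the sample graph in the usage note is made up rather than taken from the slides' figures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GraphSearchSketch {
    // Returns vertices in the order they are visited, starting from start.
    // depthFirst = true treats the collection as a stack, false as a queue.
    static List<String> search(Map<String, List<String>> graph, String start, boolean depthFirst) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        List<String> order = new ArrayList<>();
        toVisit.addLast(start);
        while (!toVisit.isEmpty()) {
            String v = depthFirst ? toVisit.removeLast() : toVisit.removeFirst();
            if (!visited.add(v)) continue;          // already visited: do nothing
            order.add(v);                           // visit v
            for (String w : graph.getOrDefault(v, Collections.emptyList()))
                if (!visited.contains(w)) toVisit.addLast(w);  // may be added twice
        }
        return order;
    }
}
```

On a graph with edges A→B, A→C, B→D, C→D and start vertex A, the BFS order is A, B, C, D and the DFS order is A, C, D, B (the stack pops the most recently added neighbor first).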
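[Figure: DFS example on a graph with vertices A, B, C, D, E. Stack entries (previous vertex-vertex): ∅-A, A-B, B-C, B-E, E-D. Visit order shown: A, B, E, D, C]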
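[Figure: BFS example on a graph with vertices A, B, C, D, E. Queue entries (previous vertex-vertex): ∅-A, A-B, B-C, B-E, C-D, E-D. Vertex labels shown: A, B, E, D, C]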
MSTs (minimum spanning trees) apply to undirected graphs
Take only some of the edges in the graph
◦ Spanning – all vertices connected together
◦ Tree – connected, with no cycles
For all spanning trees, m = n – 1 (counting only the edges of the tree)
◦ In an unweighted graph, every spanning tree is an MST
Need to find MST for a weighted graph
A connected component has a path between all vertices in that component.
Idea: find two unconnected components; connect them
Pick the smallest edge between two unconnected components
◦ This is a greedy strategy, but it somehow works
Start with a graph with no edges
◦ n connected components, n trees
Add edges between unconnected components
◦ Forms a bigger tree
What if you add an edge between two vertices in the same component?
◦ Cycles!!!
Kruskal’s algorithm
◦ Process edges from least to greatest
◦ Either an edge connects two different components or it connects a component to itself
    Add an edge only in the former case
◦ Picks smallest edge between two components
◦ O(m log m) time to sort the edges
    Also need the union-find structure to keep track of components, but it does not change the running time
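A compact sketch of Kruskal's algorithm with a simple union-find. Vertices are numbered 0 to n – 1 and each edge is an int array {u, v, weight}; that representation and the names are assumptions for the example. The union-find here uses path halving only, which is enough for a sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KruskalSketch {
    // Finds the representative of x's component, shortening paths along the way.
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path halving
            x = parent[x];
        }
        return x;
    }

    // Returns the chosen edges; each edge is {u, v, weight}.
    static List<int[]> mst(int n, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));  // least to greatest
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // n components, n trees
        List<int[]> chosen = new ArrayList<>();
        for (int[] e : sorted) {
            int ru = find(parent, e[0]), rv = find(parent, e[1]);
            if (ru != rv) {          // edge connects two different components
                parent[ru] = rv;     // union: they become one component
                chosen.add(e);
            }                        // otherwise it would create a cycle: skip it
        }
        return chosen;
    }

    static int totalWeight(List<int[]> edges) {
        int sum = 0;
        for (int[] e : edges) sum += e[2];
        return sum;
    }
}
```

As a usage example, take the edge list that can be read off the Prim trace later in this transcript (A-B 5, A-C 2, B-C 4, C-D 3, B-E 7, D-E 8, D-F 10, D-G 12, E-G 9, F-G 1), with A to G numbered 0 to 6. Treating that as the example graph is an inference from the trace. Kruskal then picks F-G, A-C, C-D, B-C, B-E, and E-G, for a total weight of 26.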
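[Figure: weighted undirected example graph for Kruskal’s algorithm with vertices A, B, C, D, E, F, G and edge weights 1, 2, 3, 4, 5, 7, 8, 9, 10, 12]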
Prim’s algorithm
◦ Graph search algorithm, builds up a spanning tree from one root vertex
◦ Like BFS, but it uses a priority queue
    Priority is the weight of the edge to the vertex
    Also need to keep track of which edge we used
◦ Always picks smallest edge to an unvisited vertex
◦ Size of heap is O(m); running time is O(m log m)
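A sketch of Prim's algorithm using java.util.PriorityQueue. Each queue entry is {vertex, weight of the edge to it, vertex the edge comes from}, which also records which edge was used. The adjacency-list representation, the undirected helper, and all names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PrimSketch {
    // Builds adjacency lists from undirected edges {u, v, weight}; entries are {neighbor, weight}.
    static List<List<int[]>> undirected(int n, int[][] edges) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }
        return adj;
    }

    // Returns the tree edges as {from, to, weight}, grown from the root vertex.
    static List<int[]> mstEdges(List<List<int[]>> adj, int root) {
        boolean[] visited = new boolean[adj.size()];
        // Priority is the weight of the edge leading to the vertex.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[] {root, 0, -1});            // the root has no incoming edge
        List<int[]> tree = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (visited[u]) continue;               // a smaller edge already reached u
            visited[u] = true;
            if (top[2] != -1) tree.add(new int[] {top[2], u, top[1]});
            for (int[] e : adj.get(u))
                if (!visited[e[0]]) pq.add(new int[] {e[0], e[1], u});
        }
        return tree;
    }
}
```

On the edge list read off the trace in this transcript (A-B 5, A-C 2, B-C 4, C-D 3, B-E 7, D-E 8, D-F 10, D-G 12, E-G 9, F-G 1, with A to G numbered 0 to 6), starting from A yields six tree edges with total weight 26.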
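[Figure: Prim’s algorithm example on a weighted graph with vertices A, B, C, D, E, F, G, starting from A. Edge weights: 1, 2, 3, 4, 5, 7, 8, 9, 10, 12. Priority queue entries (edge, weight): ∅-A 0, A-C 2, A-B 5, C-B 4, C-D 3, D-E 8, D-F 10, B-E 7, D-G 12, E-G 9, G-F 1]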
Works on directed and undirected graphs
What is the shortest path from one vertex (the source) to another (the sink)?
◦ (Hint: the answer is not cycles)
If edges have positive weights, we can use Dijkstra’s algorithm
Dijkstra’s algorithm is a graph search algorithm
If it visits a vertex, it knows the shortest path to that vertex
It will eventually hit the sink vertex and know the shortest path to it
Requires positive edge weights to work
Dijkstra’s algorithm is similar to Prim’s
◦ Uses a priority queue
◦ Also has O(m log m) time
Difference lies in the priority
◦ Priority is the length of shortest path to a visited vertex + cost of edge to unvisited vertex
◦ We know the shortest path to every visited vertex
On unweighted graphs, BFS gives us the same result as Dijkstra’s algorithm
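A sketch of Dijkstra's algorithm in the same style as a priority-queue graph search. Queue entries are {vertex, length of the path to it}. The graph representation, the undirected helper, and the names are assumptions for the example; the code also assumes path lengths fit in an int.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class DijkstraSketch {
    // Builds adjacency lists from undirected edges {u, v, weight}; entries are {neighbor, weight}.
    static List<List<int[]>> undirected(int n, int[][] edges) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }
        return adj;
    }

    // Returns shortest path lengths from source; unreachable vertices keep Integer.MAX_VALUE.
    // Requires positive edge weights.
    static int[] shortestDistances(List<List<int[]>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        boolean[] visited = new boolean[adj.size()];
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[] {source, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (visited[u]) continue;        // a shorter path to u was already found
            visited[u] = true;
            dist[u] = top[1];                // first visit fixes the shortest path to u
            for (int[] e : adj.get(u))
                // Priority: shortest path to the visited vertex + cost of the edge.
                if (!visited[e[0]]) pq.add(new int[] {e[0], top[1] + e[1]});
        }
        return dist;
    }
}
```

On the edge list implied by the trace in this transcript (A-B 5, A-C 2, B-C 4, C-D 3, B-E 7, D-E 8, D-F 10, D-G 15, E-G 9, F-G 1, treated as undirected, with A to G numbered 0 to 6), the distances from A are A 0, B 5, C 2, D 5, E 12, F 15, G 16.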
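[Figure: Dijkstra’s algorithm example on a weighted graph with vertices A, B, C, D, E, F, G, starting from A. Edge weights: 1, 2, 3, 4, 5, 7, 8, 9, 10, 15. Priority queue entries (edge, path length): ∅-A 0, A-B 5, A-C 2, C-B 6, C-D 5, B-E 12, D-E 13, D-F 15, D-G 20, F-G 16, E-G 21]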