csce350 algorithms and data structure lecture 17 jianjun hu department of computer science and...

26
CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11.

Upload: myra-carpenter

Post on 12-Jan-2016

214 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

CSCE350 Algorithms and Data Structure

Lecture 17

Jianjun Hu

Department of Computer Science and Engineering

University of South Carolina

2009.11.

Page 2: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Midterm2

Average 84

>90 12 students

Page 3: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

B-Trees

Organize data for fast queries

Index for fast search

For datasets of structured records, B-tree indexing is used

Page 4: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

B-Trees 4

Motivation (cont.)

Assume that we use an AVL tree to store about 20 million records

We end up with a very deep binary tree with lots of different disk accesses; log2 20,000,000 is about 24, so this takes about 0.2 seconds

We know we can’t improve on the log n lower bound on search for a binary tree

But, the solution is to use more branches and thus reduce the height of the tree!

• As branching increases, depth decreases

Page 5: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

B-Trees 5

Constructing a B-tree

17

3 8 28 48

1 2 6 7 12 14 16 52 53 55 6825 26 29 45

Page 6: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

B-Trees 6

Definition of a B-tree

A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:

1. the number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree

2. all leaves are on the same level3. all non-leaf nodes except the root have at least m / 2

children4. the root is either a leaf node, or it has from two to m

children5. a leaf node contains no more than m – 1 keys

The number m should always be odd

Page 7: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

B-Trees 7

Comparing Trees

Binary trees• Can become unbalanced and lose their good time complexity

(big O)• AVL trees are strict binary trees that overcome the balance

problem• Heaps remain balanced but only prioritise (not order) the keys

Multi-way trees• B-Trees can be m-way, they can have any (odd) number of

children• One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a

permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations

Page 8: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Chapter 8: Dynamic Programming

Dynamic Programming is a general algorithm design technique

Invented by American mathematician Richard Bellman in the 1950s to solve optimization problems

“Programming” here means “planning”

Main idea: • solve several smaller (overlapping) subproblems• record solutions in a table so that each subproblem is only

solved once• final state of the table will be (or contain) solution

Page 9: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Example: Fibonacci numbers

Recall definition of Fibonacci numbers:

f (0) = 0f (1) = 1f (n) = f (n-1) + f (n-2)

Computing the nth Fibonacci number recursively (top-down):

f(n)

f(n-1) + f(n-2)

f(n-2) + f(n-3) f(n-3) + f(n-4)

...

Page 10: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Example: Fibonacci numbers

Computing the nth Fibonacci number using bottom-up iteration: f(0) = 0 f(1) = 1 f(2) = 0+1 = 1 f(3) = 1+1 = 2 f(4) = 1+2 = 3 f(5) = 2+3 = 5 … f(n-2) = f(n-1) = f(n) = f(n-1) + f(n-2)

Page 11: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Examples of Dynamic Programming Algorithms

Computing binomial coefficients

Optimal chain matrix multiplication

Constructing an optimal binary search tree

Warshall’s algorithm for transitive closure

Floyd’s algorithms for all-pairs shortest paths

Some instances of difficult discrete optimization problems:

• travelling salesman• knapsack

Page 12: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

The problem: check if all pair have a valid path

a

c d

b

You can run depth first search from each vertex…But can u do better?

Page 13: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Warshall’s Algorithm: Transitive Closure

• Computes the transitive closure of a relationComputes the transitive closure of a relation

• (Alternatively: all paths in a directed graph)(Alternatively: all paths in a directed graph)

• Example of transitive closure:Example of transitive closure:

3

42

1

0 0 1 01 0 0 10 0 0 00 1 0 0

0 0 1 01 1 11 1 10 0 0 011 1 1 11 1

3

42

1

Page 14: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Warshall’s Algorithm

Main idea: a path exists between two vertices i, j, iff•there is an edge from i to j; or•there is a path from i to j going through vertex 1; or•there is a path from i to j going through vertex 1 and/or 2; or•there is a path from i to j going through vertex 1, 2, and/or 3; or•...•there is a path from i to j going through any of the other vertices

Page 15: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Warshall’s Algorithm

3

42

13

42

13

42

13

42

1

R0

0 0 1 01 0 0 10 0 0 00 1 0 0

R1

0 0 1 01 0 11 10 0 0 00 1 0 0

R2

0 0 1 01 0 1 10 0 0 01 1 1 1 11 1

R3

0 0 1 01 0 1 10 0 0 01 1 1 1

R4

0 0 1 01 11 1 10 0 0 01 1 1 1

3

42

1

Page 16: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Warshall’s Algorithm

In the kth stage determine if a path exists between two vertices i, j using just vertices among 1,…,k

R(k-1)[i,j] (path using just 1 ,…,k-1) R(k)[i,j] = or

(R(k-1)[i,k] and R(k-1)[k,j]) (path from i to k and from k to i using just 1 ,…,k-1)

i

j

k

kth stage

{

Page 17: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Pseudocode of Warshall’s Algorithm

)(

)1()1()1()(

)0(

]),[],[(],[],[

1

1

1

])..1,..1[(

n

kkkk

R

jkRkiRjiRjiR

nj

ni

nk

AR

nnAWarshall

return

and or

do to for

do to for

do to for

ALGORITHM

Page 18: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Example

Time efficiency?

a

c d

b

Page 19: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Intuitive understanding of Warshall algo

Can the above Warshall algorithm find all possible paths?

Does Warshall algorithm avoid duplicate updates?

i=6 9 12 3 7 5=j

69

912

123

37

75=j

Page 20: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Floyd’s Algorithm: All pairs shortest paths

In a weighted graph, find shortest paths between every pair of vertices

Same idea: construct solution through series of matrices D(0), D(1), … using an initial subset of the vertices as intermediaries.

2

43

12

3 6

7

1064

1073

022

301

4321

091664

10773

65022

431001

4321

Page 21: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Similar to Warshall’s Algorithm

in is equal to the length of shortest path among all paths from the ith vertex to jth vertex with each intermediate vertex, if any, numbered not higher than k

)(kijd )(kD

v i v j

v k

di k

(k-1)d

k j

(k-1)

di j

(k-1)

ijijk

kjk

ikk

ijk

ij wdkdddd )0()1()1()1()( ,1},min{ for

Page 22: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Pseudocode of Floyd’s Algorithm

The next matrix in sequence can be written over its predecessor

D

jkDkiDjiDjiD

nj

ni

nk

WD

nnWFloyd

return

do to for

do to for

do to for

ALGORITHM

]},[],[],,[min{],[

1

1

1

])..1,..1[(

Page 23: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Example

Time efficiency?

1

3 4

22

37

6

1

Page 24: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Principle of Optimality

An optimal solution to any instance of a optimization problem is composed of optimal solution to its subinstance

e.g.

Shortest path between a and b… then any distance of two points S, T on the path from a to b must also be shortest….

Why?

Otherwise, we can choose another path S’ and T’!

Page 25: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Key issues in DP algorithm design

Derive a recurrence relationship that expresses a solution to an instance of the problem in terms of solutions to its smaller subinstances

Page 26: CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina 2009.11

Design Algo for Sequence Alignment Problem using DP

ACTGGAGGCCA

AGCTAGAGAAG

Score= Sum(si)match: +3

Mismatch:-2gap-X -1