outline


Upload: tricia

Post on 09-Feb-2016


Page 1: Outline

Outline

1. General Design and Problem Solving Strategies

2. More about Dynamic Programming
   – Example: Edit Distance

3. Backtracking (if there is time)
   – Another Strategy for the Knapsack Problem

Page 2: Outline

Design Strategies

Dynamic Programming Design Strategy
– Solve an “easy” sub-problem
– Store the solution
– Use the stored solution to solve a more difficult sub-problem
– Repeat until you solve the “big” hard problem

Other Strategies
– Divide and Conquer
– Brute Force
– Greedy

Page 3: Outline

Design Strategies

Dynamic Programming is not divide and conquer.

Consider Floyd’s algorithm
– At no point did we break the input into two parts
– What is Floyd’s algorithm really doing?

Page 4: Outline

Design Strategies

What is Floyd’s algorithm really doing?
– STEP 1: Find all the shortest paths allowing a hop through vertex A, store the shortest paths
– STEP 2: Now, use that answer to find the shortest paths allowing a hop through vertex B

The algorithm exploits what was already computed, so STEP 2 really finds the shortest paths allowing hops through vertices A and B.

Page 5: Outline

More General Design Strategies

Top Down
– See the big picture first
– Break it into parts
– Analyze each part
– Continue breaking down sub-parts into solvable tasks

Quicksort is a classic example.

Page 6: Outline

More General Design Strategies

Top Down - Quicksort
– See the big picture first
    Need to put items in the correct sorted position
– Break it into parts
    Put the pivot in the correct position and partition the list into two parts
– Analyze each part
    Pick a pivot for each part…
– Continue breaking down sub-parts into solvable tasks
    Continue recursively until sub-parts are lists of size 1
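The four top-down steps above can be sketched in C roughly as follows. This is only an illustration: the function names and the choice of the last element as pivot (Lomuto partitioning) are assumptions, not something the slides specify.

```c
/* Swap two ints in place. */
static void swap_ints(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* "Break it into parts": move the pivot (assumed here to be the last
   element) to its final sorted position and return that position.
   Everything left of it is smaller, everything right is not. */
static int partition(int a[], int lo, int hi) {
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            swap_ints(&a[i], &a[j]);
            i++;
        }
    }
    swap_ints(&a[i], &a[hi]);
    return i;
}

/* Top down: place one pivot, then recurse on the two parts until each
   part holds at most one element. Sorts a[lo..hi] inclusive. */
void quicksort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quicksort(a, lo, p - 1);
    quicksort(a, p + 1, hi);
}
```

Calling `quicksort(a, 0, n - 1)` sorts an array of `n` ints in place.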

Page 7: Outline

More General Design Strategies

Bottom Up
– Use the solution to larger and larger problems to solve the BIG problem and see the big picture
– Use solution to small tasks to solve larger problems
– Identify easily solvable tasks

Mergesort is a classic example

Page 8: Outline

More General Design Strategies

Bottom Up - Mergesort
– Use the solution to larger and larger problems to solve the BIG problem and see the big picture
    Merging the final two sorted lists
– Use solution to small tasks to solve larger problems
    Merging sorted lists
– Identify easily solvable tasks
    Sorting lists of size 2
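Read from the last bullet upward, the slide describes an iterative mergesort. A minimal sketch in C, assuming a scratch buffer of the same size and hypothetical function names, might look like this. It starts from runs of width 1, so its first pass is exactly the "sorting lists of size 2" step.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int src[], int dst[], int lo, int mid, int hi) {
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Bottom up: runs of width 1 are already sorted; merge neighbours into
   runs of width 2, 4, 8, ... until a single run covers the array.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int mergesort_bottom_up(int a[], int n) {
    if (n < 2) return 0;
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    for (int width = 1; width < n; width *= 2) {
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = lo + width < n ? lo + width : n;
            int hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
        memcpy(a, tmp, (size_t)n * sizeof *a);  /* next pass reads from a */
    }
    free(tmp);
    return 0;
}
```

The final pass of the outer loop is the "merging the final two sorted lists" step from the first bullet.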

Page 9: Outline

More General Design Strategies

Is Floyd’s Algorithm Top Down or Bottom Up?

Page 10: Outline

More General Design Strategies

Divide and Conquer can be either Top Down or Bottom Up

Dynamic Programming tends to only be Bottom Up.

Page 11: Outline

More General Design Strategies

Consider Bottom-up Strategies
– Divide and Conquer usually merges two smaller sub-problems into a large problem
    (N/2 + N/2) → N
– Dynamic Programming usually extends the solution in some way
    N-2 → N-1
    N_simple_version → N_harder_version

Page 12: Outline

More about Dynamic Programming

How does Floyd’s Algorithm extend the solution?
– N-2 → N-1
    Does it consider a smaller graph and then extend the solution to a larger graph?
– N_simple_version → N_harder_version
    Does it consider a simpler shortest path problem and extend it to a more complex shortest path problem?

Page 13: Outline

More about Dynamic Programming

In a graph, it is really easy to solve the shortest path problem if you do not allow any hops (intermediate vertices)
– The adjacency matrix stores all the shortest paths (direct hop)

Page 14: Outline

More about Dynamic Programming

It is also easy to solve the problem if you only allow a hop through vertex x

if (M[a][x] + M[x][b] < M[a][b]) then
– update the distance

O(N^2) is required to update all the cells

Then, just repeat this process N times; one for each intermediate vertex.

O(N^3) total time
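Putting the update rule inside the "repeat N times" loop gives the usual three nested loops. The sketch below is one way to write it in C; the fixed size `N`, the `INF` sentinel, and the function name are assumptions made for the example.

```c
#define N 4            /* number of vertices (assumed for this example) */
#define INF 1000000    /* stands for "no edge"; INF + INF still fits in an int */

/* M starts as the adjacency matrix, i.e. the shortest paths with no hops.
   After pass x, M[a][b] holds the shortest a-to-b distance that uses only
   vertices 0..x as intermediate hops. */
void floyd(int M[N][N]) {
    for (int x = 0; x < N; x++)              /* allow one more hop vertex   */
        for (int a = 0; a < N; a++)          /* O(N^2) cell updates per pass */
            for (int b = 0; b < N; b++)
                if (M[a][x] + M[x][b] < M[a][b])
                    M[a][b] = M[a][x] + M[x][b];
}
```

Each pass reuses the matrix produced by the previous pass, which is the "store the solution, then extend it" pattern from the earlier slides.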

Page 15: Outline

Top Down vs. Bottom Up

Top Down
– Rethinking the design of existing ideas/inventions
– Managing projects that are underway
– Works really well in a Utopian world

Bottom Up
– Designing totally new ideas
– Putting together projects from scratch
– Seen more often in the real world

Page 16: Outline

Bottom-up Design

Top Down
– Let’s build a flying carriage; what are the parts?
– Lift, propulsion, steering, etc.

Let’s build a steering mechanism; what are the parts?
– We need a steering control
– Umm? Wait, we need to know how the other parts work first.

Let’s build a lift mechanism; how do we do this?
– ???

Let’s build a propulsion mechanism

Page 17: Outline

Bottom-up Design

Bottom Up
– Discoveries:
    This shape produces lift
    A spinning propeller creates propulsion in the air
    Canvas with a wood frame is light enough
– Next Step:
    Perhaps we can build a stable, controllable flying thing.

Page 18: Outline

Bottom-up Design

Before we can analyze the big picture, we have to
– Look at some of the initial smaller problems
– See how they were solved
– See how they led to new discoveries

Page 19: Outline

Another Dynamic Programming Algorithm

Problem:
– Find the Edit Distance Between Two Strings

Solutions:
– Brute Force – O(K^N)
– Greedy – No Optimal Algorithms yet
– Divide & Conquer – None discovered yet
– Dynamic Programming – O(N^2)
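To make the brute-force entry concrete, here is a minimal recursive sketch in C. It assumes the edit model used later in the deck (only insertions and deletions, each costing 1); the function name is hypothetical. Nothing is stored, so the same sub-problems are recomputed over and over and the running time is exponential in the worst case.

```c
#include <string.h>

/* Brute-force edit distance, insertions and deletions only (cost 1 each).
   Assumption for illustration: no table, no memoisation. */
int edit_distance_brute(const char *s, const char *t) {
    if (*s == '\0') return (int)strlen(t);   /* only deletions from t remain */
    if (*t == '\0') return (int)strlen(s);   /* only deletions from s remain */
    if (*s == *t)                            /* first symbols match: no cost */
        return edit_distance_brute(s + 1, t + 1);
    int drop_s = edit_distance_brute(s + 1, t) + 1;  /* delete from s */
    int drop_t = edit_distance_brute(s, t + 1) + 1;  /* delete from t */
    return drop_s < drop_t ? drop_s : drop_t;
}
```

Dynamic programming computes the same values but stores each sub-problem's answer once in a table.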

Page 20: Outline

Edit Distance

How many edits are needed to exactly match the Target with the Pattern?

Target:  TCGACGTCA
Pattern: TGACGTGC

Page 21: Outline

Edit Distance

How many edits are needed to exactly match the Target with the Pattern?

Target:  TCGACGT CA
Pattern: T GACGTGC

Three:
– By deleting C and A from the Target, and by deleting G from the Pattern

Page 22: Outline

Edit Distance

Applications:
– Approximate String Matching
– Spell checking
– Google – finding similar word variations
– DNA sequence comparison
– Pattern Recognition

Page 23: Outline

Edit Distance – Dynamic Programming

Rows: Pattern (TGACGTGC); columns: Target (TCGACGTCA)

          T  C  G  A  C  G  T  C  A
       0  1  2  3  4  5  6  7  8  9
   T   1  0  1  2  3  4  5  6  7  8
   G   2  1  2  1  2  3  4  5  6  7
   A   3  2  3  2  1  2  3  4  5  6
   C   4  3  2  3  2  1  2  3  4  5
   G   5  4  3  2  3  2  1  2  3  4
   T   6  5  4  3  4  3  2  1  2  3
   G   7  6  5  4  5  4  3  2  3  4
   C   8  7  6  5  6  5  4  3  2  3

Callouts:
– Optimal edit distance for TG and TCG
– Optimal edit distance for TG and TCGA
– Optimal edit distance for TGA and TCG
– Optimal edit distance for TGA and TCGA
– Final Answer (bottom-right cell)

Page 24: Outline

Edit Distance

    int matrix[n+1][m+1];
    for (x = 0; x <= n; x++) matrix[x][0] = x;
    for (y = 1; y <= m; y++) matrix[0][y] = y;
    for (x = 1; x <= n; x++)
        for (y = 1; y <= m; y++)
            if (seq1[x-1] == seq2[y-1])    /* strings are 0-indexed */
                matrix[x][y] = matrix[x-1][y-1];
            else                           /* cheapest of the two single deletions */
                matrix[x][y] = min(matrix[x][y-1] + 1, matrix[x-1][y] + 1);
    return matrix[n][m];
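For reference, a self-contained version of this table fill might look like the sketch below. It assumes C99 variable-length arrays, NUL-terminated strings, and the insert/delete-only cost model implied by the recurrence; the names `edit_distance` and `min_int` are illustrative, not taken from the slides.

```c
#include <string.h>

/* Smaller of two ints. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Edit distance counting insertions and deletions only (cost 1 each).
   matrix[x][y] is the distance between the first x symbols of seq1
   and the first y symbols of seq2. */
int edit_distance(const char *seq1, const char *seq2) {
    int n = (int)strlen(seq1);
    int m = (int)strlen(seq2);
    int matrix[n + 1][m + 1];            /* C99 variable-length array */

    for (int x = 0; x <= n; x++) matrix[x][0] = x;   /* delete x symbols */
    for (int y = 0; y <= m; y++) matrix[0][y] = y;   /* delete y symbols */

    for (int x = 1; x <= n; x++)
        for (int y = 1; y <= m; y++)
            if (seq1[x - 1] == seq2[y - 1])          /* match costs nothing */
                matrix[x][y] = matrix[x - 1][y - 1];
            else
                matrix[x][y] = min_int(matrix[x][y - 1] + 1,
                                       matrix[x - 1][y] + 1);

    return matrix[n][m];
}
```

With the Pattern and Target from the earlier slides, `edit_distance("TGACGTGC", "TCGACGTCA")` gives 3, the value in the bottom-right cell of the table.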


Page 25: Outline

Edit Distance

    int matrix[n+1][m+1];
    for (x = 0; x <= n; x++) matrix[x][0] = x;
    for (y = 0; y <= m; y++) matrix[0][y] = y;
    for (x = 1; x <= n; x++)
        for (y = 1; y <= m; y++)
            if (seq1[x-1] == seq2[y-1])    /* strings are 0-indexed */
                matrix[x][y] = matrix[x-1][y-1];
            else                           /* cheapest of the two single deletions */
                matrix[x][y] = min(matrix[x][y-1] + 1, matrix[x-1][y] + 1);
    return matrix[n][m];

How many times is this comparison performed?

How many times is this assignment performed?

How many times is this assignment performed?

How many times is this assignment performed?

Page 26: Outline

Edit Distance – Dynamic Programming

Rows: Pattern (TGACGTGC); columns: Target (TCGACGTCA)

          T  C  G  A  C  G  T  C  A
       0  1  2  3  4  5  6  7  8  9
   T   1  0  1  2  3  4  5  6  7  8
   G   2  1  2  1  2  3  4  5  6  7
   A   3  2  3  2  1  2  3  4  5  6
   C   4  3  2  3  2  1  2  3  4  5
   G   5  4  3  2  3  2  1  2  3  4
   T   6  5  4  3  4  3  2  1  2  3
   G   7  6  5  4  5  4  3  2  3  4
   C   8  7  6  5  6  5  4  3  2  3

n = 8

Callouts:
– To derive the value 7, we need to know that we can match two T’s. In the worst case, this may take n comparisons.
– To derive the value 6, we need to know that we can match two C’s after matching two T’s.
– To derive the value 5, we need to know that we can match two G’s after already matching two C’s and previously matching two T’s.

Page 27: Outline

Edit Distance – Dynamic Programming

Rows: Pattern (TGACGTGC); columns: Target (TCGACGTCA)

          T  C  G  A  C  G  T  C  A
       0  1  2  3  4  5  6  7  8  9
   T   1  0  1  2  3  4  5  6  7  8
   G   2  1  2  1  2  3  4  5  6  7
   A   3  2  3  2  1  2  3  4  5  6
   C   4  3  2  3  2  1  2  3  4  5
   G   5  4  3  2  3  2  1  2  3  4
   T   6  5  4  3  4  3  2  1  2  3
   G   7  6  5  4  5  4  3  2  3  4
   C   8  7  6  5  6  5  4  3  2  3

Callouts:
– Given our previous matches, there is no way we can match two A’s. Thus, the edit distance is increased.
– Luckily, we can match these two C’s. But now we’ve matched the last symbol.
– We can’t do any more matching (period!)

Page 28: Outline

Lesson to learn

There is no way to compute the optimal (minimum) edit distance without considering all possible matching combinations.

The only way to do that is to consider all possible sub-problems.

This is the reason the entire table must be considered.

If you can compute the optimal (minimum) edit distance using fewer than O(nm) computations, then you will be renowned!