precept 4: traveling salesman problem, hierarchical clustering...a naive algorithm to find a tsp...

Post on 28-May-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Precept 4: Traveling Salesman Problem, Hierarchical Clustering

Qian Zhu2/23/2011

Agenda

• Assignment: Traveling salesman problem• Hierarchical clustering

– Example– Comparisons with K-means

TSP

• TSP: Given the coordinates for a set of vertices,• Find an order of connecting the vertices, such that 1) each

vertex is visited once (except for the start vertex which is visited twice), and 2) the total distance is minimum. Start vertex and end vertex must be the same.

Can you spot a TSP touramong these possible answers?

TSP

• Smaller problems (n=50): humans can do very well.

• Larger problems: not so much… • Machine: there is no general way to solve TSP.

Computationally hard. – Challenge: find a polynomial time algorithm to

solve this problem, and you will win a million bucks!

A naive algorithm to find a TSP tour

• Exhaustive search among all possible tours, and pick the minimal tour.

• How many possibilities?– N cities– N x (N-1) x (N-2) x … x 1 – N!

• Obviously not very efficient.

TSP

• Let’s say no polynomial time algorithm exists to solve this problem.

• We can use heuristic methods to derive a reasonably good answer in reasonable time.

– Heuristics: no guarantee that answer is optimal

– Modern methods can solve much larger TSP (involving millions of nodes) in reasonable time.

Assignment goals

• Use heuristic methods to estimate answer for a computationally hard problem

• Visualize the answer to see how well the heuristics perform.

• Implement a linked list structure to represent a tour in Java.– You cannot use LinkedList from Java standard library.

• We will use two heuristics:• Nearest neighbor• Minimize total tour distance

– Extra credit: implement your own heuristic

Tour representation

• Circular linked list

private class Node{Point P;Node next;

}

Node A

A.p

A.next

size-1

Node A

A.p

A.next

Node B

B.p

B.next

size-2

Inner class: better encapsulation

Tour traversal

• Start from the head node• Iterate through the list until the head node is

reached again.

Adding a node to the listNode A

A.p

A.next

Node B

B.p

B.next

Node C

C.p

C.next

1 2

Add:

Node A

A.p

A.next

Node B

B.p

B.next

Node A

A.p

A.next

Node B

B.p

B.next

Node C

C.p

C.next

To:

Insert here

Nearest neighbor (NN)

• Read in the next point, and add it to the current tour after the point to which it is closest.

t_list = list of points in current tourp_list = list of pointsfor each p in p_list:

x = closest point in t_listadd p to t_list, after x

A C

B

E

Current point

Current tour

Compare E with each node in the tour.Pick the node that is closest to E. Connect E after this node.

Nearest neighbor (NN)

A C

B

E

Current point

Current tour

Let’s say B is closest to E. Add E after E:

A C

B

E

Smallest increase heuristic

• Read in the next point, add it to the current tour after the point where it results in the least possible increase in tour length.

t_list = list of points in current tourp_list = list of pointsfor each p in p_list:

x = point where if p is inserted after x, smallest increase in tour length

add p to t_list, after x

Need to calculate every possible increase in order to know which is smallest.

How to calculate increase in tour length?

A

C

BE

5

A

C

B

4

E4

Increase in tour length = (4+4) – 5 = 3

Assume…

• The order of inserting nodes is given. (use the order of nodes in the input file)

• Does order matter?– Try randomize the order and note whether you

get a better answer.

Other problems similar to TSP

• Easy problems:– Shortest path problem:

• Assume: vertices and edges are known• Given two vertices. Find the shortest connected path between them.• Algorithm: Dijkstra (O(|V|2))• Application: route planning

– Chinese postman problem:• A mailman wants to deliver mail to a certain neighborhood. The mailman

is lazy, and he wants to find the shortest route through neighborhood, that meets these criteria:

– It is a closed circuit– He needs to go through every street at least once

• Algorithm: Edmond (O(|V|3))• Application: UPS delivery route planning

Edges weighted

Edges weighted

Other problems similar to TSP

• Easy problems:– Minimum spanning tree problem

• Given: a connected graph G with weighted edges.• Find: a connected subgraph that touches every vertex in the graph. The

sum of weight of edges must be minimal.• Algorithm: Prim and Kruskal (O(|E| log |E|))• Application: A cable TV company is laying cable to a new neighborhood. It

is constrained to bury cable only along certain paths. Find a spanning tree that is a subset of those paths that connects every house and incur minimum laying cost. Edges weighted

Other problems similar to TSP

• Easy problems (continued):– Euler Trail problem:

Seven bridges of Konigsberg -

Take a walk through the city that would cross each bridge once and only once.

Is there a solution?

Other problems similar to TSP

• Euler Trail problem (continued):– Let’s convert it to a graph problem:

• Define a trail: a path that visits every edge exactly once.• Open trail: a trail where start != end.• Closed trail: a trail where start = end.• Find an open trail or a closed trail in the graph. Edges unweighted

Can you find an Euler trail in Konigsberg?

Other problems similar to TSP

• Euler Trail problem (continued):– Modern day Konigsberg (after bombing during

WWII destroyed two bridges):

Other problems similar to TSP

• Hard problems:– Find a Hamiltonian circuit (a path that visits every vertex in

the graph exactly once and return to starting vertex)

Edges unweighted

Hierarchical clustering

0.1 0.2 0.3 0.4

-0.4

-0.2

0.0

0.2

0.4

0.6

0.8

x[,1]

x[,2

]

2

3

4

5

1

1 2 3 4

2 0.62

3 0.42 0.19

4 1.32 0.81 0.93

5 1.02 0.48 0.61 0.33

Hierarchical clustering (single-linkage)1 2 3 4

2 0.62

3 0.42 0.19

4 1.32 0.81 0.93

5 1.02 0.48 0.61 0.33

node nodemin distmin

1 3 0.422 3 0.193 2 0.194 5 0.335 4 0.33

Join 2 and 3

1 2 3 4

2 0.62

3 0.42 0.19

4 1.32 0.81 0.93

5 1.02 0.48 0.61 0.33

1 2,3 4

2,3 0.42

4 1.32 0.81

5 1.02 0.48 0.33

Iteration 1

1 2,3 4

2,3 0.42

4 1.32 0.81

5 1.02 0.48 0.33

node nodemin distmin

1 2,3 0.422,3 1 0.424 5 0.335 4 0.33

1 2,3 4

2,3 0.42

4 1.32 0.81

5 1.02 0.48 0.33

Join 4 and 5

1 2,3

2,3 0.42

4,5 1.02 0.48

Iteration 2

1 2,3

2,3 0.42

4,5 1.02 0.48

node nodemin distmin

1 2,3 0.422,3 1 0.424,5 2,3 0.48

1 2,3

2,3 0.42

4,5 1.02 0.48

Join 1 and 2,3

1,2,3

4,5 0.48

Iteration 3

node nodemin distmin

1,2,3 4,5 0.484,5 1,2,3 0.48

1,2,3

4,5 0.48

Join 1,2,3 and 4,5

Final Tree:

{ { {2, 3}, 1}, {4, 5} }4 5

1

2 30.15

0.20

0.25

0.30

0.35

0.40

0.45

Cluster Dendrogram

hclust (*, "single")dist(x)

Hei

ght

Note that the height corresponds to the minimum distance at each iteration

Iteration 4

Exercise

-0.5 0.0 0.5 1.0

-0.5

0.0

0.5

1.0

1.5

x[,1]

x[,2

]

67

8

9

10

3

1

24

5

1 2 3 4 5 6 7 8 9

2 0.46

3 1.02 1.43

4 0.19 0.37 1.06

5 0.76 0.88 0.90 0.62

6 1.37 1.83 0.94 1.52 1.70

7 1.42 1.88 1.00 1.58 1.76 0.06

8 1.92 2.38 1.35 2.07 2.19 0.55 0.50

9 1.23 1.68 0.86 1.38 1.58 0.14 0.19 0.69

10 2.18 2.62 1.78 2.35 2.57 0.87 0.81 0.49 0.99

Pairwise distances

Cluster these points using single-linkage method, using complete-linkage method

Answers9

6 7

8 10

3

5

2

1 4

0.0

0.5

1.0

1.5

2.0

2.5

Cluster Dendrogram

hclust (*, "complete")dist(x)

Hei

ght 5

2

1 4

3

9

6 7

8 10

0.0

0.2

0.4

0.6

0.8

Cluster Dendrogram

hclust (*, "single")dist(x)

Hei

ght

Complete linkage Single linkage

Hierarchical vs K-means

• Run-time analysis. Which clustering algorithm runs faster?

• What’s the bottleneck step in hierarchical clustering?

• What’s the disadvantage of K-means?• What’s the disadvantage of hierarchical

clustering?

top related