DEALING WITH NP-C PROBLEMS
Randomized Algorithm
NP-Complete Problem
A problem for which, right now, we need exhaustive search
Examples: SAT, TSP, Vertex Cover, etc.
Travelling Salesman Problem (TSP)
Find a sequence of cities to visit that minimizes the cost of travel
Example
Routes: A B C D, A B D C, A C B D, A C D B, A D B C, A D C B (costs on the next slide)
Example
Route      Cost
A B C D    20 + 30 + 12 + 35 = 97
A B D C    20 + 34 + 12 + 42 = 108
A C B D    42 + 30 + 34 + 35 = 141
A C D B    42 + 12 + 34 + 20 = 108
A D B C    35 + 34 + 30 + 42 = 141
A D C B    35 + 12 + 30 + 20 = 97
General Search Algorithm
min_cost = infinity                    // best cost found so far
while (has_unexplored_solution()) {
    Solution sol = gen_another_solution()
    int cost = evaluate(sol)
    if (cost < min_cost) {
        min_cost = cost
        min_sol = sol
    }
}
Time = (number of solutions) × (time to evaluate one solution)
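As a concrete instance of this loop: a minimal exhaustive TSP search in Python, using the distance matrix recovered from the example slides (cities A, B, C, D numbered 0 to 3):

from itertools import permutations

# Exhaustive search for the 4-city example: try every tour, keep the cheapest.
dist = [[0, 20, 42, 35],
        [20, 0, 30, 34],
        [42, 30, 0, 12],
        [35, 34, 12, 0]]

def tour_cost(tour):
    # cost of the cycle: consecutive legs plus the edge back to the start
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

# fix city A (0) as the start; enumerate every order of the rest
best = min(((0,) + p for p in permutations((1, 2, 3))), key=tour_cost)
print(best, tour_cost(best))   # (0, 1, 2, 3) 97

With N cities this enumerates (N−1)! orders, which motivates the next slide.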
TSP Solution Space
N cities: (N−1)! / 2 distinct tours
(assuming a symmetric graph)
What to do if N is too large?
Relaxation
In practice, we don’t really need the “best” solution, but we need it now
Something close to now (1 min, 1 hour, etc., but not 100 years)
Just something “near” the best solution
One who tries to optimize everything is bound to be unhappy
Approximation Algorithm
Bounded solution
If solving for the “optimum” solution requires exponential time, what could polynomial time give us?
What can we say about the solution we get from polynomial time?
Approximation Ratio
For a problem instance I:
OPT(I) = the “best” solution for I
A(I) = our algorithm’s solution
Approximation ratio r: cost(A(I)) ≤ r · cost(OPT(I))
The ratio is an upper bound on how sub-optimal our solution can be
Approximation Algorithm
An algorithm that runs fast (in polynomial time) and gives a good approximation ratio
A reasonable choice when dealing with NP-complete problems
Clustering
Input: a set of points X, an integer k
Output: a partition of the points into k sets, such that the diameter of each set is minimized
Metric Property
A function d(x,y) such that:
1. d(x,y) >= 0
2. d(x,y) = 0 if and only if x = y
3. d(x,y) = d(y,x)
4. d(x,y) <= d(x,z) + d(z,y)   (the triangle inequality)
Example
Approximated Version
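The algorithm itself did not survive in the transcript; the guarantee and analysis below match the standard greedy farthest-first traversal (Gonzalez's algorithm), so here is a minimal sketch under that assumption, with illustrative names:

import math

# Farthest-first traversal: repeatedly pick the point farthest from the
# centers chosen so far, then assign each point to its nearest center.
def farthest_first(points, k, d):
    # d(x, y) must satisfy the metric properties above
    centers = [points[0]]                  # arbitrary first center
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(d(p, c) for c in centers))
        centers.append(nxt)
    clusters = {i: [] for i in range(k)}
    for p in points:
        idx = min(range(k), key=lambda j: d(p, centers[j]))
        clusters[idx].append(p)
    return centers, clusters

# usage with the Euclidean metric on 2-D points
centers, clusters = farthest_first([(0, 0), (1, 0), (9, 9), (10, 9)], 2, math.dist)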
Guarantee
Ratio = 2, i.e., the resulting diameter is no more than twice the optimal one
Why?
Let p be the point in X that is farthest from the centers μ1, μ2, μ3, …, μk
Let r be the distance from p to its closest center
Then, by the triangle inequality, every cluster has diameter at most 2r
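In symbols: by the choice of p, every point is within r of its closest center, so for any two points x, y in the cluster of center μi,

$$ d(x, y) \le d(x, \mu_i) + d(\mu_i, y) \le r + r = 2r $$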
So what?
How does r relate to the optimal solution?
There are k + 1 points, μ1, μ2, μ3, …, μk, p, such that each is at least r away from all the others
So any partition into k sets must have some set that contains at least two of them
That set must have diameter at least r
Hence the optimal diameter is at least r, while ours is at most 2r: ratio = 2
Approximated Euclidean TSP
TSP such that the distance between two cities is a metric
What is closely related to TSP and can be easily computed?
MST and TSP
Given an answer for TSP: it is a cycle
Remove one edge from the answer: the result is a path that is also a spanning tree (not necessarily minimal)
Let p be that path
So cost(MST) <= cost(p) <= cost(optimal TSP tour)
From MST to TSP
Given an MST, do a DFS: the order of visits is a closed walk that visits every vertex
(it also visits some vertices multiple times)
From MST to TSP
The length of that walk is at most twice the MST, hence at most twice the best TSP tour
Fix the walk into a TSP tour: simply skip any vertex that is about to be revisited and move on to the next unvisited vertex in the list
Fixing the path: by the triangle inequality, the new (shortcut) path is no longer than the walk, so the tour costs at most twice the optimum
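Putting the three steps together (build an MST, walk it with DFS, shortcut the revisits): a minimal sketch in Python, where the DFS preorder is exactly the walk with revisited vertices skipped:

# 2-approximation for metric TSP: MST -> DFS walk -> shortcut revisits.
def mst_tsp(dist):
    n = len(dist)
    # Prim's algorithm: grow the tree, recording each vertex's parent
    in_tree = [False] * n
    parent = [0] * n
    best = [float('inf')] * n
    best[0] = 0
    children = {v: [] for v in range(n)}
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    # DFS preorder = the doubled walk with repeated vertices shortcut away
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour   # cycle cost is at most twice the optimal tour

dist = [[0, 20, 42, 35], [20, 0, 30, 34], [42, 30, 0, 12], [35, 34, 12, 0]]
print(mst_tsp(dist))   # [0, 1, 2, 3]: cost 97, the optimum for this example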
Approximated 0-1 Knapsack
Input
A number W, the capacity of the sack
n pairs of weights and prices: ((w1,p1),(w2,p2),…,(wn,pn))
wi = weight of the i-th item, pi = price of the i-th item
Output
A subset S of {1,2,3,…,n} such that Σ_{i∈S} pi is maximum, subject to Σ_{i∈S} wi ≤ W
Guarantee
Pick any ε > 0
The resulting value is at least (1 − ε) times the maximum value
Approximated 01-Knapsack
Knapsack can be solved using dynamic programming in O(nW) time, where W is the capacity
We can derive a similar algorithm using O(nV) time, where V = the sum of all values
O(nV) knapsack
Let K(v) be the minimal weight when the sum of the selected values is exactly v
If the i-th item were in that solution: K(v) = K(v − pi) + wi
But we don’t really know whether the i-th item is in the optimal solution, so we try everything:
K(v) = min over 1 ≤ i ≤ n of ( K(v − pi) + wi )
The answer is the largest v with K(v) ≤ W
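A minimal sketch of this table in Python. One detail the recurrence above glosses over: to use each item at most once, the sketch processes items one at a time and scans the values downward:

# O(nV) knapsack DP: K[v] = minimal weight whose total value is exactly v.
def knapsack_min_weight(items, W):
    # items: list of (weight, price) pairs; W: capacity of the sack
    V = sum(p for _, p in items)
    INF = float('inf')
    K = [INF] * (V + 1)
    K[0] = 0
    for w, p in items:
        for v in range(V, p - 1, -1):   # downward scan: each item used once
            if K[v - p] + w < K[v]:
                K[v] = K[v - p] + w
    # the answer: the largest value whose minimal weight still fits
    return max(v for v in range(V + 1) if K[v] <= W)

print(knapsack_min_weight([(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)], 15))   # 15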
Approximation
Since it is O(nV): can we reduce V to improve the running time?
Value scaling
Scale each price down by a constant: p̂i = ⌊pi · n / (ε · pmax)⌋
The resulting prices are at most n/ε
Thus, the running time is O(n³/ε)
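A sketch of the rescaling step; the scaling constant K = ε·pmax/n is the standard FPTAS choice (an assumption, since the slide only says "a constant"), and knapsack_min_weight is the sketch from the previous slide:

import math

# FPTAS wrapper: shrink the prices so V becomes polynomial in n and 1/eps,
# then run the exact O(nV) DP above on the rescaled instance.
def knapsack_fptas(items, W, eps):
    n = len(items)
    p_max = max(p for _, p in items)
    K = eps * p_max / n                          # scaling constant
    scaled = [(w, math.floor(p / K)) for w, p in items]
    # scaled prices are at most n/eps, so the DP takes O(n^3 / eps);
    # recovering the chosen subset (and its true value) needs back-pointers
    return knapsack_min_weight(scaled, W)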
The optimal solution
Let S be the subset selected by the optimal solution, and let K* be its (maximum) total value
Now find the result on the rescaled input
Approximation Ratio
Let Ŝ be the subset selected on the rescaled input
Rewrite the value of Ŝ in terms of K*
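The slide’s equations were lost in the transcript; the standard chain of inequalities, using p̂i = ⌊pi/K⌋ with K = ε·pmax/n and the usual assumption that every single item fits in the sack (so pmax ≤ K*), is

$$ \sum_{i \in \hat{S}} p_i \ \ge\ K \sum_{i \in \hat{S}} \hat{p}_i \ \ge\ K \sum_{i \in S} \hat{p}_i \ \ge\ \sum_{i \in S} (p_i - K) \ \ge\ K^* - nK \ =\ K^* - \varepsilon\, p_{\max} \ \ge\ (1 - \varepsilon) K^* $$

so the value of the set chosen on the rescaled input is at least (1 − ε) times the optimum, matching the guarantee above.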
Search with bounded resources
RANDOM SEARCH
General Search Algorithm
min_cost = infinity                    // best cost found so far
while (time_not_exceed()) {
    Solution sol = random_a_solution()
    int cost = evaluate(sol)
    if (cost < min_cost) {
        min_cost = cost
        min_sol = sol
    }
}
Time is bounded, but the best solution is not guaranteed
Does it work?
If we have “enough” time…
Eventually, we will hit the “right” answer
It’s OK to be greedy
Hill Climbing
Can we improve on random search? Is anything better than randomly generating a new answer?
0-1 Knapsack Problem
Pick a combination of items
Solution space: 00000 (0), 00001 (1), 00010 (2), 00011 (3), 00100 (4), …, 11111 (31)
32 solutions in total
Evaluation Function
[Diagram: the evaluation function maps each solution (1), (2), (3), … to its eval value]
Neighbor of Solution
In the solution space, 00100 (4) is close to 00011 (3) and 00101 (5)
Hill climbing search: generate only the neighbor solutions, and move to the best neighbor
Solution sol = random_a_solution()
int min_cost = evaluate(sol)
while (time_not_exceed()) {
    Solution nb[] = gen_all_neighbors(sol)
    Solution nb_sol
    int nb_min = infinity               // best neighbor found this round
    for all x in nb {
        int cost = evaluate(x)
        if (cost < nb_min) {
            nb_min = cost
            nb_sol = x
        }
    }
    if (nb_min < min_cost) {            // move only if a neighbor improves
        min_cost = nb_min
        sol = nb_sol
    }
}
If the cost does not improve, we might stop
There are several possible definitions of "neighbor"
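For example, for bit-string solutions like the knapsack encoding above, one common choice (an illustrative definition, different from the value-adjacency neighborhood used in the earlier slide) is all strings at Hamming distance 1:

# One possible neighbor definition: flip exactly one bit.
def gen_all_neighbors(sol):
    # sol is a bit string such as "00100"
    return [sol[:i] + ('1' if b == '0' else '0') + sol[i+1:]
            for i, b in enumerate(sol)]

print(gen_all_neighbors("00100"))
# ['10100', '01100', '00000', '00110', '00101']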
Best problem for hill climbing: unimodal
Bad problem for hill climbing: multimodal
Local Minima
Hill climbing is a local search (the solver can define its own notion of "local")
It can get stuck at a local minimum and needs something to fix that
If stuck: randomly generate another solution and start again from it
O mother nature, I worship thee
Simulated Annealing
Annealing
A material (a metal) is heated and then slowly cooled down
This lets the atoms rearrange; without the heating, atoms stay stuck in irregular positions
That is just like a "local minimum"
Simulated Annealing
Just like hill climbing, but the solution is allowed to move to a worse position
With some probability, inversely proportional to the elapsed time
This helps escape from local minima
Simulated Annealing
Solution sol = random_a_solution()
int cur_cost = evaluate(sol)
while (time_not_exceed()) {
    Solution nb = gen_one_neighbor(sol)
    int cost = evaluate(nb)
    if (cost < cur_cost) {
        sol = nb
        cur_cost = cost
    } else if (random() < chance_at_time(t)) {   // sometimes accept a worse move
        sol = nb
        cur_cost = cost
    }
}
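The slides do not define chance_at_time; a hedged sketch of one classic choice follows. It combines Metropolis-style acceptance (it also takes the cost gap delta, unlike the one-argument call above) with a temperature that falls as the time budget runs out, so the chance decays with elapsed time t, as the slide describes:

import math

# Acceptance probability that decays as time t approaches the budget t_max.
def chance_at_time(t, delta, t_max, T0=1.0):
    # delta = how much worse the neighbor is; T shrinks linearly over time
    T = T0 * (1.0 - t / t_max) + 1e-9
    return math.exp(-delta / T)

print(chance_at_time(t=1, delta=0.5, t_max=100))    # ~0.60: exploratory early on
print(chance_at_time(t=99, delta=0.5, t_max=100))   # ~0.00: almost greedy at the end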
Fitness Function
Randomized Search for Decision Problems
Decision Problem
We look for “the solution”, not the best one; the answer is either yes or no
SAT Problem
Find an assignment to x, y, z that makes this expression true
evaluate(sol) is either “true” or “false”
Fitness of SAT
Use another function for hill climbing: a function that is maximal when evaluate(sol) is true
It should return something near the maximum when sol is near the “correct” answer
Example for SAT
Fitness = the number of clauses that are true
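A minimal sketch of that fitness function for CNF formulas; the clause encoding here (a list of clauses, each a list of signed variable indices) is an assumption, not from the slides:

# Fitness for SAT hill climbing: the number of satisfied clauses.
def fitness(formula, assignment):
    # +i means variable i, -i means its negation; assignment: index -> bool
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

# (x or not y) and (y or z) and (not x or not z)
formula = [[1, -2], [2, 3], [-1, -3]]
print(fitness(formula, {1: True, 2: True, 3: False}))   # 3: all clauses true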
GA? SA? Random? Hill climbing? Lots more…
What should we use?
No Free Lunch Theorem
Is there an algorithm A that outperforms algorithm B on average?
For example, is Hill Climbing >>> Random?
Here comes the "No Free Lunch" theorem: no such A and B exist
Definition: black-box algorithm
An oracle-based model
The algorithm does not ask the same question twice
[Diagram: the algorithm sends queries x1, x2, … to the oracle and receives f(x1), f(x2), … in return]
Definition: algorithm
A sample d_m is the sequence of m (query, value) pairs seen so far
D_m = all samples of size m; D = all samples = D_1 ∪ D_2 ∪ …
An optimization algorithm a is a mapping from samples to the next (unvisited) query:
a : d ∈ D → { x | x ∉ d^x }
Given an algorithm a, m oracle calls, and a cost function f:
Performance = P(d^y_m | f, m, a)
with a performance measure Φ(d^y_m), e.g., minimization
Definition: performance
[Diagram: performance is judged relative to the desired output, the problem at hand, the resources, and the algorithm]
The NFL theorem
For any two algorithms a1, a2
The chance of getting any particular d^y_m, P(d^y_m | f, m, a),
is equal when summed over all possible cost functions:
Σ_f P(d^y_m | f, m, a1) = Σ_f P(d^y_m | f, m, a2)
Why?
Let X = a finite search space and Y = a finite space of cost values
An optimization problem is a function f : X → Y
The set of all possible functions is Y^X, of size |Y|^|X|
Large, but finite
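Because both spaces are finite, the theorem can be checked by brute force on a toy instance. A hypothetical demonstration (not from the slides): two deterministic non-repeating algorithms, represented here by fixed query orders, see exactly the same histogram of value sequences once we sum over every possible function f:

from itertools import product
from collections import Counter

# Toy NFL check: X = {0,1,2}, Y = {0,1}, so there are |Y|^|X| = 8 functions.
X = [0, 1, 2]
orders = {"a1": [0, 1, 2], "a2": [2, 0, 1]}    # two fixed query orders
hist = {name: Counter() for name in orders}

for f_vals in product([0, 1], repeat=len(X)):  # every function f: X -> Y
    f = dict(zip(X, f_vals))
    for name, order in orders.items():
        d_y = tuple(f[x] for x in order)       # the sample d^y_m the algorithm sees
        hist[name][d_y] += 1

print(hist["a1"] == hist["a2"])   # True: identical distributions over all f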
Implication of NFL
NFL says there are no universally good algorithms
If A does better than B on some problems, B must do better than A on the other problems
E.g., if A beats random search on some set of problems,
then A must perform worse than random search on the rest of the problems
Implication of NFL
If someone writes a paper saying:
"We design A for one particular problem family F"
"We compare it with B on test problems f1, f2, f3"
"Results show that A is better than B"
"So my algorithm should be better than B on all of F"
That last step might be wrong…
Another view of NFL
Is there a better indicator of performance?
NFL is proven when P(f) is flat
A flat P(f) is sufficient for NFL, but not necessary; in real life, P(f) is not flat
However, some knowledge must be put into the algorithm: the structure of the problem alone does not justify the choice
Conclusion
Randomized algorithms work well in practice
But you need to put at least some knowledge into them
Given a problem, if you really have no idea, try them first…