9/16 Scan by Kalpesh Shah


What is needed:
-- A neighborhood function: the larger the neighborhood you consider, the less myopic the search (but the more costly each iteration)
-- A "goodness" function: it needs to give a value to non-solution configurations too. For 8-queens: the negative of the number of pair-wise conflicts
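As a concrete illustration, here is a minimal sketch of such a goodness function and neighborhood for 8-queens, assuming a state is a tuple of column positions with one queen per row (the representation and helper names are illustrative, not from the slides):

```python
def goodness(state):
    """Negative of the number of pair-wise conflicts (0 = solution)."""
    conflicts = 0
    n = len(state)
    for i in range(n):
        for j in range(i + 1, n):
            same_col = state[i] == state[j]
            same_diag = abs(state[i] - state[j]) == abs(i - j)
            if same_col or same_diag:
                conflicts += 1
    return -conflicts

def neighbors(state):
    """Neighborhood: move one queen to any other column in its row."""
    n = len(state)
    for row in range(n):
        for col in range(n):
            if col != state[row]:
                yield state[:row] + (col,) + state[row + 1:]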


Applying min-conflicts based hill-climbing to 8-puzzle

Local Minima


Problematic scenarios for hill-climbing

When the state-space landscape has local minima, any search that moves only in the greedy direction cannot be (asymptotically) complete

Random walk, on the other hand, is asymptotically complete

Idea: Put random walk into greedy hill-climbing

Ridges

Solution(s):
-- Random restart hill-climbing
-- Do the non-greedy thing with some probability p > 0
-- Use simulated annealing


Making Hill-Climbing Asymptotically Complete

• Random restart hill-climbing
– Keep some bound B. When you have made more than B moves, reset the search with a new random initial seed and start again.
• Getting a random new seed in an implicit search space is non-trivial!
– In 8-puzzle, if you generate a random state by making random moves from the current state, you are still not truly random (as you will continue to be in one of the two components of the state space)
• "Biased random walk": avoid being greedy when choosing the seed for the next iteration (sketched below)
– With probability p, choose the best child; but with probability (1 − p) choose one of the children randomly
• Use simulated annealing
– Similar to the previous idea, except that the probability p itself is increased asymptotically to one (so you are more likely to tolerate a non-greedy move in the beginning than towards the end)

With the random-restart or biased-random-walk strategies, we can solve very large problems (e.g., million-queens problems) in minutes!
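A minimal sketch of the biased random walk, reusing the hypothetical neighbors()/goodness() helpers from the 8-queens example (the parameter names p and max_moves are illustrative):

```python
import random

def biased_hill_climb(start, neighbors, goodness, p=0.9, max_moves=10_000):
    """With probability p take the best neighbor; otherwise a random one."""
    current = start
    for _ in range(max_moves):
        if goodness(current) == 0:          # goal: no conflicts left
            return current
        children = list(neighbors(current))
        if random.random() < p:
            current = max(children, key=goodness)   # greedy move
        else:
            current = random.choice(children)       # non-greedy escape
    return None  # caller may restart with a fresh random seed
```

Setting p = 1 recovers pure greedy hill-climbing; simulated annealing would instead raise the effective p over time.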


Announcements

• Sreelakshmi says: Recitation in Rm 409, 10:30-11:30 AM

(note changed timing)

• The Homework 2 socket will be closed by Friday, and it will be due the following Thursday.


"Beam search" for Hill-climbing

• Hill climbing, as described, uses one seed solution that is continually updated
– Why not use multiple seeds?
• Stochastic hill-climbing uses multiple seeds (k seeds, k > 1). In each iteration, the neighborhoods of all k seeds are evaluated. From the combined neighborhood, k new seeds are selected probabilistically
– The probability that a seed is selected is proportional to how good it is.
– Not the same as running k hill-climbing searches in parallel
• Stochastic hill-climbing is "almost" close to the way evolution seems to work, with one difference
– Define the neighborhood in terms of the combination of pairs of current seeds (sexual reproduction; crossover)
• The probability that a seed from the current generation gets to "mate" to produce offspring in the next generation is proportional to the seed's goodness
• To introduce "randomness", do mutation over the offspring
– Genetic algorithms limit the number of matings to keep the number of seeds the same
– This type of stochastic beam-search hill-climbing algorithm is called a Genetic algorithm.
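A minimal sketch of the stochastic beam-search step described above, again assuming the hypothetical neighbors()/goodness() helpers (the shift that makes the selection weights positive is an implementation detail, not from the slides):

```python
import random

def stochastic_beam_search(seeds, neighbors, goodness, iterations=1000):
    k = len(seeds)
    for _ in range(iterations):
        pool = [n for s in seeds for n in neighbors(s)]  # union of all k neighborhoods
        best = max(pool, key=goodness)
        if goodness(best) == 0:
            return best
        # Select k new seeds with probability proportional to goodness.
        # Goodness is <= 0 for 8-queens, so shift the weights to be positive.
        worst = min(goodness(s) for s in pool)
        weights = [goodness(s) - worst + 1 for s in pool]
        seeds = random.choices(pool, weights=weights, k=k)
    return max(seeds, key=goodness)
```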


Illustration of Genetic Algorithms in Action

Very careful modeling is needed so that the things emerging from crossover and mutation are still potential seeds (and not monkeys typing Hamlet). Is the "genetic" metaphor really buying anything?
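For 8-queens, the one-column-per-row representation gives that careful modeling for free: anything crossover or mutation produces is still a legal configuration. A minimal sketch (the function names and mutation rate are illustrative):

```python
import random

def crossover(parent1, parent2):
    """Splice two parent states at a random cut point."""
    cut = random.randint(1, len(parent1) - 1)
    return parent1[:cut] + parent2[cut:]   # still one queen per row

def mutate(state, rate=0.1):
    """Randomly reassign each queen's column with a small probability."""
    state = list(state)
    for row in range(len(state)):
        if random.random() < rate:
            state[row] = random.randrange(len(state))
    return tuple(state)
```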


Hill-climbing in “continuous” search spaces

• Gradient descent (that you study in calculus of variations) is a special case of hill-climbing search applied to continuous search spaces

– The local neighborhood is defined in terms of the “gradient” or derivative of the error function.

• Since the error function gradient will be zero near the minimum, and higher farther from it, you tend to take smaller steps near the minimum and larger steps farther away from it. [just as you would want]

• Gradient descent is guaranteed to converge to the global minimum if alpha (see the update rule below) is small, and the error function is "uni-modal" (i.e., has only one minimum).

– Versions of gradient-descent algorithms will be used in neural-network learning.

• Unfortunately, the error function is NOT unimodal for multi-layer neural networks. So, you will have to augment gradient descent with ideas such as "simulated annealing" to increase the chance of reaching the global minimum.

Example: cube-root finding using the Newton-Raphson-style approximation

[Figure: plot of Err(x) = |x³ − a| against x; the minimum is at x = a^(1/3), and the search starts from x₀]

Update rule: x₁ = x₀ − alpha · d/dx[Err(x₀)]

-- The negative sign in front of d/dx shows that you are supposed to step in the direction opposite to that of the gradient
-- alpha is a constant that adjusts the step size
-- The larger the alpha, the faster the convergence, but also the higher the chance of oscillation
-- The smaller the alpha, the slower the convergence, but the lower the chance of oscillation (around the minimum)

Tons of variations exist based on how alpha is set
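A minimal sketch of this cube-root example with a fixed alpha (the parameter values and step count are illustrative; note that because |x³ − a| is V-shaped at the minimum, the iterate ends up oscillating in a band whose width scales with alpha, just as the notes above warn):

```python
def cube_root(a, x0=1.0, alpha=0.001, steps=10_000):
    """Gradient descent on Err(x) = |x**3 - a|."""
    x = x0
    for _ in range(steps):
        err = x**3 - a
        # d/dx |x^3 - a| = sign(x^3 - a) * 3x^2 (away from the minimum)
        grad = (1.0 if err > 0 else -1.0) * 3.0 * x**2
        x = x - alpha * grad   # minus sign: step against the gradient
    return x

print(cube_root(27.0))   # ~3.0, up to an alpha-sized oscillation
```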


Origins of gradient descent: Newton-Raphson applied to function minimization

• The Newton-Raphson method is used for finding roots of a polynomial
– To find roots of g(x), we start with some value of x and repeatedly do
• x ← x − g(x)/g′(x)
– To minimize a function f(x), we need to find the roots of the equation f′(x) = 0
• x ← x − f′(x)/f′′(x)
– If x is a vector X, then
• X ← X − H_f(X)⁻¹ ∇f(X), where ∇f(X) is the gradient and H_f(X) is the Hessian

Because the Hessian is costly to compute (it has n² second-derivative entries for an n-dimensional vector), we try approximations
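A minimal one-dimensional sketch of both uses of Newton-Raphson named above (the example functions, starting points, and iteration count are illustrative):

```python
def newton_root(g, g_prime, x, iters=50):
    """Find a root of g by repeating x <- x - g(x)/g'(x)."""
    for _ in range(iters):
        x = x - g(x) / g_prime(x)
    return x

# Root finding: cube root of 27 as the root of g(x) = x^3 - 27.
print(newton_root(lambda x: x**3 - 27, lambda x: 3 * x**2, x=1.0))  # ~3.0

# Minimization: minimizing f(x) = (x - 5)^2 means finding the root of
# f'(x) = 2(x - 5), with f''(x) = 2 playing the role of g'.
print(newton_root(lambda x: 2 * (x - 5), lambda x: 2.0, x=0.0))     # 5.0
```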


The middle ground between hill-climbing and systematic search

• Hill-climbing has a lot of freedom in deciding which node to expand next. But it is incomplete, even for finite search spaces.
– Good for problems which have solutions, but the solutions are non-uniformly clustered.
• Systematic search is complete (because its search tree keeps track of the parts of the space that have been visited).
– Good for problems where solutions may not exist,
• or the whole point is to show that there are no solutions (e.g., the propositional entailment problem, to be discussed later),
– or the state space is densely connected (making repeated exploration of states a big issue).

Smart idea: Try the middle ground between the two?


Between Hill-climbing and systematic search

• You can reduce the freedom of hill-climbing search to make it more complete
– Tabu search
• You can increase the freedom of systematic search to make it more flexible in following local gradients
– Random restart search


Tabu Search

• A variant of hill-climbing search that attempts to reduce the chance of revisiting the same states
– Idea:
• Keep a "tabu" list of states that have been visited in the past.
• Whenever a node in the local neighborhood is found in the tabu list, remove it from consideration (even if it happens to have the best "heuristic" value among all neighbors)
– Properties:
• As the size of the tabu list grows, hill-climbing will asymptotically become "non-redundant" (won't look at the same state twice)
• In practice, a reasonably sized tabu list (say 100 or so) improves the performance of hill-climbing in many problems
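A minimal sketch of this tabu variant, reusing the hypothetical neighbors()/goodness() helpers; the list size of 100 echoes the slide, the rest is illustrative:

```python
from collections import deque

def tabu_hill_climb(start, neighbors, goodness, tabu_size=100, max_moves=10_000):
    current = start
    tabu = deque(maxlen=tabu_size)   # oldest entries fall off automatically
    tabu.append(current)
    for _ in range(max_moves):
        if goodness(current) == 0:
            return current
        # Remove tabu states from consideration, even if they look best.
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            return None              # boxed in; the caller could restart
        current = max(candidates, key=goodness)
        tabu.append(current)
    return None
```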


Random restart search

Variant of depth-first search where

• When a node is expanded, its children are first randomly permuted before being introduced into the open list
– The permutation may well be a "biased" random permutation
• Search is "restarted" from scratch anytime a "cutoff" parameter is exceeded
– The "cutoff" may be in terms of the number of backtracks, the number of nodes expanded, or the amount of time elapsed

• Because of the "random" permutation, every time the search is restarted, you are likely to follow different paths through the search tree. This allows you to recover from bad initial moves.

• The higher the cutoff value, the lower the number of restarts (and thus the lower the "freedom" to explore different paths).

• When the cutoff is infinity, random restart search is just normal depth-first search: it will be systematic and complete.
• For smaller values of the cutoff, the search has higher freedom, but no guarantee of completeness.

• A strategy to guarantee asymptotic completeness: start with a low cutoff value, but keep increasing it as time goes on.

• Random restart search has been shown to be very good for problems that have a reasonable percentage of "easy to find" solutions (such problems are said to exhibit the "heavy-tail" phenomenon). Many real-world problems have this property.
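A minimal sketch of this randomized, restarting depth-first search; the problem interface (children(), is_goal()) and the doubling cutoff schedule are illustrative assumptions:

```python
import random

def restart_dfs(root, children, is_goal, cutoff=100):
    while True:
        expanded = 0
        stack = [root]
        while stack and expanded < cutoff:
            node = stack.pop()
            expanded += 1
            if is_goal(node):
                return node
            kids = list(children(node))
            random.shuffle(kids)     # random permutation of the children
            stack.extend(kids)
        if not stack:
            return None              # finite tree exhausted: no solution
        cutoff *= 2                  # growing cutoff: asymptotic completeness
```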


Leaving goal-based search…

• Looked at
– Systematic Search
• Blind search (BFS, DFS, Uniform cost search, IDDFS)
• Informed search (A*, IDA*; how heuristics are made)
– Local search
• Greedy (hill climbing)
• Asymptotically complete (hill climbing with random restart; biased random walk or simulated annealing)
• Multi-seed hill-climbing
– Genetic algorithms…


MDPs as Utility-based problem solving agents


[can generalize to have action costs C(a,s)]

If the transition matrix M_ij is not known a priori, then we have a reinforcement learning scenario.


Think of these as h*() values… called the value function U*

Think of these as related to h* values


9/23


What does a solution to an MDP look like?

• The solution should tell the optimal action to do in each state (called a "policy")
– A policy is a function from states to actions
– Not a sequence of actions anymore
• Needed because of the non-deterministic actions
– If there are |S| states and |A| actions that we can do at each state, then there are |A|^|S| policies
• How do we get the best policy?
– Pick the policy that gives the maximal expected reward
– For each policy (see the sketch below):
• Simulate the policy (take actions suggested by the policy) to get behavior traces
• Evaluate the behavior traces
• Take the average value of the behavior traces
• Qn: Is there a simpler way than having to evaluate |A|^|S| policies?
– Yes…
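A minimal sketch of the brute-force idea above: evaluating one policy by averaging over simulated behavior traces. The interface here (a step() function sampling from the transition model, a per-state reward(), a discount gamma) is an illustrative assumption:

```python
def evaluate_policy(policy, step, reward, start, gamma=0.9,
                    horizon=100, traces=1000):
    """Average discounted reward over sampled behavior traces."""
    total = 0.0
    for _ in range(traces):
        s, discount, value = start, 1.0, 0.0
        for _ in range(horizon):
            value += discount * reward(s)
            s = step(s, policy[s])   # sample the next state from M(s, a, .)
            discount *= gamma
        total += value
    return total / traces
```

The "simpler way" the slides go on to develop is value/policy iteration, which avoids enumerating all |A|^|S| policies.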




Policies change with rewards..





[Figure: transition model with probabilities 0.8, 0.1, 0.1]


Why are values coming down first? Why are some states reaching optimal value faster?

Updates can be done synchronously OR asynchronously
-- convergence is guaranteed as long as each state is updated infinitely often

[Figure: value-iteration traces for the 0.8/0.1/0.1 transition model]
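A minimal sketch of synchronous value iteration as described above. The MDP is given through illustrative dictionaries: R[s] is the reward of state s, and P[(s, a)] lists (probability, next_state) pairs, playing the role of the M_ij model:

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        new_U = {}
        for s in states:
            best = max(sum(p * U[s2] for p, s2 in P[(s, a)])
                       for a in actions)
            new_U[s] = R[s] + gamma * best
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U                    # synchronous: all states updated at once
        if delta < eps:
            return U
```

An asynchronous variant would instead update U[s] in place, one state at a time; as the slide notes, it still converges as long as every state keeps being updated.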


Policies converge earlier than values

Given a utility vector U_i, we can compute the greedy policy π_i

The policy loss of π_i is ||U^{π_i} − U*|| (the max-norm difference of two vectors is the maximum amount by which they differ on any dimension)

So: search in the space of policies


We can either solve the linear equations exactly, or solve them approximately by running value iteration a few times (the update won't have the max factor)
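A minimal sketch of policy iteration built from these two steps, using the same illustrative states/actions/P/R interface as the value-iteration sketch; policy evaluation is done approximately here, with a few max-free update sweeps:

```python
def policy_iteration(states, actions, P, R, gamma=0.9, eval_sweeps=20):
    pi = {s: actions[0] for s in states}     # arbitrary initial policy
    U = {s: 0.0 for s in states}
    while True:
        # Approximate policy evaluation: Bellman updates without the max.
        for _ in range(eval_sweeps):
            U = {s: R[s] + gamma * sum(p * U[s2] for p, s2 in P[(s, pi[s])])
                 for s in states}
        # Policy improvement: extract the greedy policy with respect to U.
        new_pi = {s: max(actions,
                         key=lambda a: sum(p * U[s2] for p, s2 in P[(s, a)]))
                  for s in states}
        if new_pi == pi:                     # policies converge before values
            return pi, U
        pi = new_pi
```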


Incomplete observability (the dreaded POMDPs)

• To model partial observability, all we need to do is to look at the MDP in the space of belief states (belief states are fully observable even when the world states are not)
– The policy maps belief states to actions
• In practice, this causes (humongous) problems
– The space of belief states is "continuous" (even if the underlying world is discrete and finite).
– Even approximate policies are hard to find (PSPACE-hard).
• Problems with a few dozen world states are currently hard to solve
– "Depth-limited" exploration (such as that done in adversarial games) is the only option…
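As a small illustration of the belief-state view, here is a minimal sketch of the standard belief update after taking action a and seeing observation obs; the observation model O[(obs, s)] is an illustrative assumption alongside the P transition dictionaries used earlier:

```python
def belief_update(belief, a, obs, P, O):
    """b'(s') is proportional to O[(obs, s')] * sum_s P(s' | s, a) * b(s)."""
    new_b = {}
    for s2 in belief:
        pred = sum(p * belief[s]
                   for s in belief
                   for p, nxt in P[(s, a)] if nxt == s2)
        new_b[s2] = O[(obs, s2)] * pred
    z = sum(new_b.values()) or 1.0   # normalize (guard against all-zero)
    return {s: v / z for s, v in new_b.items()}
```

Each update is over a probability vector, which is why the belief space is continuous even when the underlying world is discrete.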


LALA Land -- Don't look below.