oregon state university – cs430 intro to ai (c) 2003 thomas g. dietterich and devika subramanian1...

19
(c) 2003 Thomas G. Dietteri ch and Devika Subramanian 1 Oregon State University – CS430 Intro to AI Search-Based Agents Appropriate in Static Environments where a model of the agent is known and the environment allows prediction of the effects of actions evaluation of goals or utilities of predicted states Environment can be partially- observable, stochastic, sequential, continuous, and even multi-agent, but it must be static ! We will first study the deterministic, discrete, single-agent case.

Post on 20-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

(c) 2003 Thomas G. Dietterich and Devika Subramanian

1

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Search-Based Agents

Appropriate in Static Environments where a model of the agent is known and the environment allows prediction of the effects of actions evaluation of goals or utilities of predicted states

Environment can be partially-observable, stochastic, sequential, continuous, and even multi-agent, but it must be static!

We will first study the deterministic, discrete, single-agent case.

(c) 2003 Thomas G. Dietterich and Devika Subramanian

2

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Computing Driving Directions

Arad

Oradea

Zerind

Timisoara

Lugoj

Dobreta

Sibiu

Rimnicu Vilcea

Fagaras

Pitesti

Mehadia

Craiova

Bucharest

Giurgia

Urzicenl

Neamt

Iasi

Vasini

Hirsova

Eforie

71

75

118

140

151

99

80

111

70

75

120

146

97

138

101

211

85

90

142

92

87

98

86

You are here

You want to be here

(c) 2003 Thomas G. Dietterich and Devika Subramanian

3

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Search Algorithms

Breadth-First Depth-First Uniform Cost A* Dijkstra’s Algorithm

(c) 2003 Thomas G. Dietterich and Devika Subramanian

4

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI

Oradea (146)

71

Breadth-First

Arad (0)

75

Zerind (75)

118

Timisoara (118)

Lugoj (229)

111

Mehadia (299)

70

75Dobreta

(486)120

140Sibiu (140)

151

Rimnicu Vilcea (220)

80

Fagaras (239)99

Craiova (366)

146

Pitesti (317)97

138 Bucharest (450)

211

101

Detect duplicate path (291)

Detect new shorter path (418)

Detect duplicate path (455)

Detect duplicate path (504)

Detect new shorter path (374)

Detect duplicate path (494)

Detect duplicate path (197)

Breadth-FirstOradea (146)

71

Arad (0)

75

Zerind (75)

118

Timisoara (118)

Lugoj (229)

111

Mehadia (299)

70

75Dobreta

(486)120

140Sibiu (140)

151

Rimnicu Vilcea (220)

80

Fagaras (239)99

Craiova (366)

146

Pitesti (317)97

138 Bucharest (450)

211

101

Arad (0)

Sibiu (140)Zerind (75) Timisoara (118)

Arad (150) Oradea (146) Arad (280) Fagaras (239) Oradea (291) Arad (236) Lugoj (229)

Sibiu (197) Sibiu (338) Bucharest (450)

Rimnicu Vilcea (220)

Timisoara (340) Mehadia (299)Sibiu (300) Pitesti (317) Craiova (366)

Lugoj (369) Dobreta (374)Craiova (455)Rimnicu Vilcea (317) Bucharest (418) Rimnicu Vilcea (482) Dobreta (486) Pitesti (504)

Medhadia (449) Craiova (494)

Zerind (217)

(c) 2003 Thomas G. Dietterich and Devika Subramanian

6

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Formal Statement of Search

Problems

State Space: set of possible “mental” states cities in Romania

Initial State: state from which search begins Arad

Operators: simulated actions that take the agent from one mental state to another traverse highway between two cities

Goal Test: Is current state Bucharest?

(c) 2003 Thomas G. Dietterich and Devika Subramanian

7

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI General Search Algorithm

Strategy: first-in first-out queue (expand oldest leaf first)

(c) 2003 Thomas G. Dietterich and Devika Subramanian

8

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Leaf Selection Strategies

Breadth-First Search: oldest leaf (FIFO) Depth-First Search: youngest leaf (LIFO) Uniform Cost Search: cheapest leaf (Priority

Queue) A* search: leaf with estimated shortest total

path length g(x) + h(x) = f(x) where g(x) is length so far and h(x) is estimate of remaining length (Priority Queue)

(c) 2003 Thomas G. Dietterich and Devika Subramanian

9

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI A* Search

Let h(x) be a “heuristic function” that gives an underestimate of the true distance between x and the goal state Example: Euclidean distance

Let g(x) be the distance from the start to x, then g(x) + h(x) is an lower bound on the length of the optimal path

(c) 2003 Thomas G. Dietterich and Devika Subramanian

10

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Euclidean Distance Table

Arad 366 Mehadia 241

Bucharest 0 Neamt 234

Craiova 160 Oradea 380

Dobreta 242 Pitesti 100

Eforie 161 Rimnicu Vilcea 193

Fagaras 176 Sibiu 253

Giurgiu 77 Timisoara 329

Hirsova 151 Urziceni 80

Iasi 226 Vaslui 199

Lugoj 244 Zerind 374

(c) 2003 Thomas G. Dietterich and Devika Subramanian

11

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI A* Search

Arad (0+366=366)

Sibiu (140+253=393)Zerind (75+374=449) Timisoara (118+329=447)

Fagaras (239+176=415) Oradea (291+380=671)

Bucharest (450+0=450)

Rimnicu Vilcea (220+193=413)

Pitesti (317+100=417) Craiova (366+160=526)

Craiova (455+160=615)Bucharest (418+0=418)

All remaining leaves have f(x) ¸ 418, so we know they cannot have shorter paths to Bucharest

(c) 2003 Thomas G. Dietterich and Devika Subramanian

12

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Dijkstra’s Algorithm

Works backwards from the goal Each node keeps track of the shortest

known path (and its length) to the goal Equivalent to uniform cost search

starting at the goal No early stopping: finds shortest path

from all nodes to the goal

(c) 2003 Thomas G. Dietterich and Devika Subramanian

13

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Local Search Algorithms

Keep a single current state x Repeat

Apply one or more operators to x Evaluate the resulting states according to an

Objective Function J(x) Choose one of them to replace x (or decide

not to replace x at all)

Until time limit or stopping criterion

(c) 2003 Thomas G. Dietterich and Devika Subramanian

14

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Hill Climbing

Simple hill climbing: apply a randomly-chosen operator to the current state

If resulting state is better, replace current state

Steepest-Ascent Hill Climbing:

Apply all operators to current state, keep state with the best value

Stop when no successors state is better than current state

(c) 2003 Thomas G. Dietterich and Devika Subramanian

15

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Gradient Ascent

In continuous state spaces, x = (x1, x2, …, xn) is a vector of real values

Continuous operator: x := x+ x for any arbitrary vector x (infinitely many operators!)

Suppose J(x) is differentiable. Then we can compute the direction of steepest increase of J by the first derivative with respect to x, the gradient:

(c) 2003 Thomas G. Dietterich and Devika Subramanian

16

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Gradient Descent Search

Repeat Compute Gradient rJ Update x := x + rJ

Until rJ ¼ 0

is the “step size”, and it must be chosen carefully

Methods such as conjugate gradient and Newton’s method choose automatically

(c) 2003 Thomas G. Dietterich and Devika Subramanian

17

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Visualizing Gradient Ascent

If is too large, search may overshoot and miss the maximum or oscillate forever

(c) 2003 Thomas G. Dietterich and Devika Subramanian

18

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Problems with Hill Climbing

Local optima

Flat regions

Random restarts can give good results

(c) 2003 Thomas G. Dietterich and Devika Subramanian

19

Ore

gon

Sta

te U

nive

rsit

y –

CS

430

Intr

o to

AI Simulated Annealing

T = 100 (or some large value) Repeat

Apply randomly-chosen operator to x to obtain x′. Let E = J(x′) – J(x) If E > 0, /* J(x′) is better */

switch to x′ Else /* J(x′) is worse */

switch to x′ with probability exp [E/T] /* large negative steps are less likely */ T := 0.99 * T */ “cool” T */

Slowly decrease T (“anneal”) to zero Stop when no changes have been accepted for many moves Idea: Accept “down hill” steps with some probability to help

escape from local minima. As T ! 0 this probability goes to zero.