Exploration Strategies for Learned Probabilities in Smart Terrain
Dr. John R. Sullins
Youngstown State University
Problem Definition
• Agent given a map of world
• Map gives locations where goals may possibly be
• Different categories of locations have different probabilities
Learned Probabilities
• Problem: Agent does not know these probabilities
• Agent must learn them from examples [a, b] of that category
– ai = number of past examples of category Ci where goal has been present
– bi = number of past examples of category Ci where goal has not been present
Learning with Costs
• Agent must physically move to a target to know whether it meets goal
• Cost usually proportional to distance traveled
Learning with Costs
• Tradeoff: knowledge gained by exploring target vs. cost of exploring target
• Requires a rational strategy for exploration
Outline
• Learning as reducing future costs
• Beta functions and probabilistic smart terrain
• Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
• Creating an influence map for agent movement
• Benchmark and empirical testing
Exploration Strategy
• Main idea: Exploration now reduces travel time in future
– t1 is instance of category C1 with prior knowledge [a1, b1]
– t2 is instance of category C2 with prior knowledge [a2, b2]
(Figure: agent with targets t1 and t2, each at distance d)
Value of Information
• Rational action: Move to target in more probable category first
• Problem: Agent must estimate probabilities from examples
• Fewer examples → greater likelihood estimate is wrong
(Figure: agent with targets t1 and t2, each at distance d)
Value of Information
• Probabilities estimated from limited data: p1(estimate) = 0.15, p2(estimate) = 0.2
– Agent will move towards t2
• Suppose actual probabilities different: p1(actual) = 0.25, p2(actual) = 0.1
• Would have been better to move to t1 first
(Figure: agent with targets t1 and t2)
Value of Information
• Agent will have to backtrack to t1 if goal not met by t2
• Expected distance traveled will be greater than if moved towards t1 first
• Better estimates of probabilities → less travel time
(Figure: agent with targets t1 and t2)
Beta Distribution
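• Estimate of probability θ that a category meets the goal, given examples [a, b] of that category:
$$\text{beta}[a, b](\theta) = \alpha\, \theta^{a-1} (1 - \theta)^{b-1}$$
where α is the normalizing constant that makes the distribution integrate to 1
• “Likelihood” that the actual probability is θ, given [a, b]
• Best estimate of actual probability = Exp(beta[a, b](θ)) = a / (a + b)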
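A minimal Python sketch of the density and its expected value, assuming a and b may be any positive reals (the fractional update used later in the talk produces non-integer counts). The names `beta_pdf` and `beta_mean` are illustrative, not taken from the original work; α is computed as Γ(a + b) / (Γ(a) Γ(b)).

```python
import math


def beta_pdf(theta: float, a: float, b: float) -> float:
    """beta[a, b](theta) = alpha * theta^(a-1) * (1 - theta)^(b-1) on the open interval (0, 1).

    alpha = Gamma(a + b) / (Gamma(a) * Gamma(b)) normalizes the density; it is
    computed in log space with math.lgamma to avoid overflow for large counts.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    log_alpha = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_alpha + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))


def beta_mean(a: float, b: float) -> float:
    """Exp(beta[a, b](theta)) = a / (a + b): the best point estimate of theta."""
    return a / (a + b)


# Category with 2 past examples where the goal was present and 6 where it was not
print(beta_mean(2, 6))                  # 0.25
print(round(beta_pdf(0.25, 2, 6), 3))   # 2.492
```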
Beta Distribution
• “Narrows” as more examples explored
• More examples → less error in estimate of θ
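The narrowing can be checked with the standard variance of a beta distribution, ab / ((a + b)²(a + b + 1)). This sketch (function name illustrative) keeps the estimated probability fixed at 0.25 while scaling up the number of examples.

```python
import math


def beta_std(a: float, b: float) -> float:
    """Standard deviation of beta[a, b]: sqrt(a*b / ((a+b)^2 * (a+b+1)))."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Same estimated probability (0.25) backed by more and more examples
for a, b in [(1, 3), (2, 6), (10, 30), (25, 75)]:
    print(f"[{a}, {b}]  mean = {a / (a + b):.2f}  std = {beta_std(a, b):.3f}")
```

The standard deviation falls from about 0.194 for [1, 3] to about 0.043 for [25, 75], so the same point estimate is far less likely to be badly wrong.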
Probabilistic Smart Terrain
• Agent movement in worlds where targets have probability of meeting goal
– pi : probability target i meets goal
– di : distance (in moves) from agent to target i
– Based on targets within dmax moves
• For each adjacent tile, computes expected distance to some target that meets goal
Probabilistic Smart Terrain
• Expected number of moves character must travel from x to target that meets goal:
$$\mathrm{Dist}(x) = \sum_{d=1}^{d_{max}} \prod_{i:\, d_i < d} (1 - p_i)$$
– Product: probability no target within d moves of x meets goal (assumption of conditional independence)
– Sum: over all distances up to some maximum dmax (otherwise sum could be infinite)
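A direct Python transcription of Dist(x), assuming the sum starts at d = 1: each term is then the probability that at least d moves are needed, so the truncated sum is an expected distance. The `(p_i, d_i)` input format and the function name are illustrative assumptions.

```python
from typing import Sequence, Tuple


def expected_distance(targets: Sequence[Tuple[float, int]], d_max: int) -> float:
    """Dist(x) = sum_{d=1}^{d_max} prod_{i: d_i < d} (1 - p_i).

    targets: (p_i, d_i) pairs, the goal probability of each target within
    d_max moves of tile x and its distance in moves. Each term is the
    probability that no target closer than d moves meets the goal, assuming
    targets meet the goal independently of one another.
    """
    total = 0.0
    for d in range(1, d_max + 1):
        p_none = 1.0
        for p_i, d_i in targets:
            if d_i < d:
                p_none *= 1.0 - p_i
        total += p_none
    return total


print(expected_distance([(1.0, 3)], d_max=10))   # 3.0: a certain target 3 moves away
print(expected_distance([(0.5, 2)], d_max=4))    # 3.0
print(expected_distance([], d_max=10))           # 10.0: capped at d_max
```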
Probabilistic Smart Terrain
• Compute expected distance Dist(x) for all tiles x
• Agent moves to adjacent tile with lowest Dist(x)
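Choosing the next move is then an argmin over neighbouring tiles. This sketch assumes 4-connected movement and a precomputed dictionary from walkable tiles to Dist(x); both are illustrative choices, since the slides do not specify grid connectivity.

```python
from typing import Dict, Optional, Tuple

Tile = Tuple[int, int]


def next_move(dist: Dict[Tile, float], pos: Tile) -> Optional[Tile]:
    """Return the adjacent tile with the lowest Dist(x), or None if boxed in.

    dist maps each walkable tile to its precomputed Dist(x); walls and
    off-map tiles are simply absent from the dictionary.
    """
    x, y = pos
    neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    walkable = [n for n in neighbours if n in dist]
    if not walkable:
        return None
    return min(walkable, key=lambda n: dist[n])


dist = {(0, 0): 5.0, (1, 0): 3.2, (-1, 0): 4.1, (0, 1): 6.0}
print(next_move(dist, (0, 0)))  # (1, 0)
```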
Simple Two-target Case
• Simple case where agent must “choose” between two targets to explore
– ti is instance of category Ci with prior knowledge [ai, bi]
– tj is instance of category Cj with prior knowledge [aj, bj]
• Targets equidistant at distance d
• d is average distance between targets in world
(Figure: agent with targets ti and tj, each at distance d)
Estimating Distance Traveled
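• Assume ti has higher estimated probability: Exp(beta[ai, bi](θi)) > Exp(beta[aj, bj](θj))
• Expected distance traveled:
$$\mathrm{Dist}(\theta_i, \theta_j) = d + 2d(1 - \theta_i) + (d_{max} - 3d)(1 - \theta_i)(1 - \theta_j)$$
– d: move to ti
– 2d(1 - θi): backtrack to tj if ti does not meet goal
– (dmax - 3d)(1 - θi)(1 - θj): case where neither ti nor tj meets goal
(Figure: agent with targets ti and tj, each at distance d)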
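The two-target formula can be evaluated directly. This sketch plugs in the actual probabilities from the earlier example (p1 = 0.25, p2 = 0.1) with d = 10 and an assumed dmax = 100; the function name is illustrative.

```python
def two_target_distance(theta_first: float, theta_second: float,
                        d: float, d_max: float) -> float:
    """Dist(theta_i, theta_j) when the agent visits t_i first, then t_j.

    d                                      : walk to the first target
    2d(1 - theta_i)                        : walk back past the start to the second target
    (d_max - 3d)(1 - theta_i)(1 - theta_j) : both fail, so the total becomes d_max
    """
    return (d
            + 2 * d * (1 - theta_first)
            + (d_max - 3 * d) * (1 - theta_first) * (1 - theta_second))


d, d_max = 10, 100  # d_max = 100 is an assumed value for illustration
print(two_target_distance(0.25, 0.10, d, d_max))  # t1 first: about 72.25
print(two_target_distance(0.10, 0.25, d, d_max))  # t2 first: about 75.25
```

Visiting t1 first saves 3 moves on average, which equals 2d(0.25 - 0.1).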
Defining an Error Function
• θi, θj may take on many values
– Likelihood of a particular θ defined by beta[a, b](θ)
• Moving to ti first is an error in cases where θi < θj
(Figure: beta distributions of Ci and Cj, with values θi and θj marked)
Defining an Error Function
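• Amount of error for given (θi, θj) defined as expected distance if moving to ti first minus expected distance if moving to tj first:
$$\mathrm{ErrDist}(\theta_i, \theta_j) = \begin{cases} \mathrm{Dist}(\theta_i, \theta_j) - \mathrm{Dist}(\theta_j, \theta_i) = 2d(\theta_j - \theta_i) & \text{if } \theta_j > \theta_i \\ 0 & \text{otherwise} \end{cases}$$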
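Expanding both visiting orders shows why only the backtracking term survives: the (dmax - 3d) term is symmetric in θi and θj and cancels.

$$
\begin{aligned}
\mathrm{Dist}(\theta_i, \theta_j) - \mathrm{Dist}(\theta_j, \theta_i)
&= \big[d + 2d(1 - \theta_i) + (d_{max} - 3d)(1 - \theta_i)(1 - \theta_j)\big] \\
&\quad - \big[d + 2d(1 - \theta_j) + (d_{max} - 3d)(1 - \theta_j)(1 - \theta_i)\big] \\
&= 2d(1 - \theta_i) - 2d(1 - \theta_j) \\
&= 2d(\theta_j - \theta_i)
\end{aligned}
$$

This difference is positive exactly when θj > θi, which is why ErrDist is zero otherwise.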
Defining an Error Function
• Error weighted by likelihood of θi, θj (as defined by beta function)
• Total error possible given these examples of Ci and Cj, summed over all possible combinations of θi, θj weighted by their likelihoods:
$$\mathrm{ErrPair}([a_i, b_i], [a_j, b_j]) = \int_0^1 \int_0^1 \mathrm{ErrDist}(\theta_i, \theta_j)\, \text{beta}[a_i, b_i](\theta_i)\, \text{beta}[a_j, b_j](\theta_j)\, d\theta_i\, d\theta_j$$
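A numerical approximation is the simplest way to evaluate ErrPair in a sketch. The version below uses a midpoint grid over [0, 1]²; the grid size `n` and the function names are assumptions, and `first` is the category the agent would visit first (the one with the higher estimated probability).

```python
import math


def beta_pdf(theta: float, a: float, b: float) -> float:
    """beta[a, b](theta), normalized with log-gamma; theta must lie in (0, 1)."""
    log_alpha = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_alpha + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))


def err_pair(first: tuple, second: tuple, d: float, n: int = 400) -> float:
    """Midpoint-rule approximation of ErrPair([a_i, b_i], [a_j, b_j]).

    Integrates ErrDist(theta_i, theta_j) = 2d * max(0, theta_j - theta_i)
    weighted by beta[a_i, b_i](theta_i) * beta[a_j, b_j](theta_j).
    """
    h = 1.0 / n
    grid = [(k + 0.5) * h for k in range(n)]
    w_first = [beta_pdf(t, *first) * h for t in grid]
    w_second = [beta_pdf(t, *second) * h for t in grid]
    total = 0.0
    for t_i, w_i in zip(grid, w_first):
        for t_j, w_j in zip(grid, w_second):
            if t_j > t_i:  # visiting t_i first was the wrong choice
                total += 2 * d * (t_j - t_i) * w_i * w_j
    return total


wide = err_pair((4, 4), (2, 6), d=10)        # few examples: broad, overlapping betas
narrow = err_pair((40, 40), (20, 60), d=10)  # same means, ten times the examples
print(wide, narrow)
```

The second value is much smaller than the first: with the same estimated probabilities, more examples shrink the overlap and therefore the expected extra distance.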
Value of Information
• Additional values of [a, b] narrow the beta distributions
• Narrow distributions allow less error
• P(θi < θj) much smaller
(Figure: narrower beta distributions for Cj and Ci)
Value of Information
• Categories with similar [a, b] may still overlap
• However, θi and θj will likely be similar even if θi < θj
• ErrDist(θi, θj) will be very small
(Figure: overlapping beta distributions for Cj and Ci, with similar values θi and θj)
Category Prevalence
• Prioritize instances of more prevalent categories
– ti category Ci with |Ci| instances in world
– tj category Cj with |Cj| instances in world
– |Ci| >> |Cj| (many more instances of Ci)
• More benefit to be gained by exploring ti
(Figure: agent with targets ti and tj)
Category Pair Likelihood
• Agent is between two targets in different categories
• What is likelihood those categories are Ci and Cj?
$$L(C_i, C_j) = \frac{|C_i|\,|C_j|}{|C_{total}|\,(|C_{total}| - |C_i|)} + \frac{|C_i|\,|C_j|}{|C_{total}|\,(|C_{total}| - |C_j|)}$$
• Ctotal = total number of targets in all categories
(Figure: agent with targets ti and tj, each at distance d)
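A small sketch of L(Ci, Cj), reading the two terms as the two possible orders: the first target is from Ci (probability |Ci| / |Ctotal|) and the other, from a different category, is from Cj (probability |Cj| / (|Ctotal| - |Ci|)), or the reverse. That reading, the category sizes, and the function name are assumptions for illustration. Under this reading the likelihoods over all unordered category pairs sum to 1.

```python
from itertools import combinations


def pair_likelihood(n_i: int, n_j: int, n_total: int) -> float:
    """L(C_i, C_j) from category sizes |C_i|, |C_j| and |C_total|."""
    return (n_i * n_j / (n_total * (n_total - n_i))
            + n_i * n_j / (n_total * (n_total - n_j)))


# Illustrative category sizes
counts = {"A": 2, "B": 2, "C": 3, "D": 3}
n_total = sum(counts.values())
likelihoods = {(ci, cj): pair_likelihood(counts[ci], counts[cj], n_total)
               for ci, cj in combinations(counts, 2)}
for pair, value in likelihoods.items():
    print(pair, round(value, 3))
print(round(sum(likelihoods.values()), 10))  # 1.0
```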
Category Error Measure
• Total error measure for category Ci based on relationship to all other categories Cj:
– Error ErrPair([ai, bi], [aj, bj]) relative to that category (based on overlap of their beta functions)
– Likelihood L(Ci, Cj) agent must choose between two targets in those categories
$$\mathrm{ErrCat}(C_i, [a_i, b_i]) = \sum_{j \neq i} \mathrm{ErrPair}([a_i, b_i], [a_j, b_j])\, L(C_i, C_j)$$
Defining Information Gain
• Information gain from exploring instance of Ci: how incrementing [ai, bi] would decrease ErrCat(Ci, [ai, bi]) by narrowing the beta function
$$\mathrm{Gain}(C_i, [a_i, b_i]) = \mathrm{ErrCat}(C_i, [a_i, b_i]) - \mathrm{ErrCat}(C_i, [a_i', b_i'])$$
– First term: current error before target explored
– Second term: estimated error if target were explored
Defining Information Gain
• Problem: Do not know whether given target meets goal until explored
– Do not know whether it would increment ai or bi
• Solution: Estimate from current expected value Exp(beta[ai, bi](θi))
$$[a_i', b_i'] = \big[a_i + \mathrm{Exp}(\text{beta}[a_i, b_i](\theta_i)),\; b_i + (1 - \mathrm{Exp}(\text{beta}[a_i, b_i](\theta_i)))\big]$$
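Putting the pieces together, a Python sketch of Gain. Assumptions beyond the slides: ErrPair is approximated on a midpoint grid; within each pair, the category with the higher estimated probability is treated as the one visited first (as in the two-target case); category sizes are passed in as counts; and all function names are illustrative. It is not claimed to reproduce the numbers in the talk's example.

```python
import math


def beta_pdf(theta, a, b):
    """beta[a, b](theta) for theta in (0, 1); a and b may be fractional."""
    log_alpha = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_alpha + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta))


def err_pair(first, second, d, n=200):
    """Midpoint approximation of ErrPair; 'first' is the category visited first."""
    h = 1.0 / n
    grid = [(k + 0.5) * h for k in range(n)]
    w1 = [beta_pdf(t, *first) * h for t in grid]
    w2 = [beta_pdf(t, *second) * h for t in grid]
    return sum(2 * d * (tj - ti) * wi * wj
               for ti, wi in zip(grid, w1)
               for tj, wj in zip(grid, w2) if tj > ti)


def pair_likelihood(n_i, n_j, n_total):
    """L(C_i, C_j) from category sizes."""
    return (n_i * n_j / (n_total * (n_total - n_i))
            + n_i * n_j / (n_total * (n_total - n_j)))


def err_cat(i, knowledge, sizes, d):
    """ErrCat(C_i, [a_i, b_i]) = sum over j != i of ErrPair * L(C_i, C_j)."""
    n_total = sum(sizes.values())
    a_i, b_i = knowledge[i]
    total = 0.0
    for j, (a_j, b_j) in knowledge.items():
        if j == i:
            continue
        # Assumption: the category with the higher estimate is visited first.
        if a_i / (a_i + b_i) >= a_j / (a_j + b_j):
            e = err_pair((a_i, b_i), (a_j, b_j), d)
        else:
            e = err_pair((a_j, b_j), (a_i, b_i), d)
        total += e * pair_likelihood(sizes[i], sizes[j], n_total)
    return total


def gain(i, knowledge, sizes, d):
    """Gain(C_i, [a_i, b_i]) = ErrCat before minus ErrCat after the expected update."""
    a, b = knowledge[i]
    m = a / (a + b)                                   # Exp(beta[a_i, b_i](theta_i))
    updated = {**knowledge, i: (a + m, b + (1 - m))}  # [a_i', b_i']
    return err_cat(i, knowledge, sizes, d) - err_cat(i, updated, sizes, d)


knowledge = {"X": (2, 6), "Y": (4, 4)}
sizes = {"X": 1, "Y": 1}   # equal prevalence
for c in knowledge:
    print(c, round(gain(c, knowledge, sizes, d=10), 4))
```

The expected update leaves a / (a + b) unchanged while adding one example's worth of weight, so the gain reflects narrowing alone rather than a shift in the estimate.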
Example of Information Gain
• Example: Information gain for [2, 6] and [4, 4]
– Same prevalence, average distance = 10

| New examples | Category [4, 4] | Category [2, 6] |
|---|---|---|
| 1 | 1.941 | 2.119 |
| 2 | 1.597 | 1.727 |
| 3 | 1.336 | 1.435 |
| 4 | 1.133 | 1.212 |
| 5 | 0.972 | 1.038 |
Prior Category Knowledge
• More existing examples → less valuable future examples become
• Preference given to categories about which less is known
Influence Maps
• Targets influence nearby agents
– Influence = information gain of target category
• Influence decreases with distance from target
• Agent moves in direction of increasing influence
Falloff Function
• Inverse function used to decrease influence over distance
$$\mathrm{Influence}(t) = \frac{\mathrm{Gain}(C_i, [a_i, b_i])}{1 + t/d}$$
– t = distance in tiles
– d = average distance between targets
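The inverse falloff is a one-liner; a target's influence is its full information gain on its own tile and exactly half of it at the average inter-target distance d. The function name is illustrative, and 2.119 is one of the gain values from the talk's information-gain example.

```python
def influence(gain: float, t: float, d: float) -> float:
    """Influence(t) = Gain / (1 + t / d) at a tile t moves from the target."""
    return gain / (1 + t / d)


d = 10  # average distance between targets
for t in (0, 5, 10, 20, 40):
    print(t, round(influence(2.119, t, d), 3))
```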
Combining Influences
• Question: How should influences from multiple targets be combined?
• Goal: Prioritize exploring groups of targets
– |Ci| ≈ |Cj| ≈ |Ck|
– |[ai, bi]| ≈ |[aj, bj]| ≈ |[ak, bk]|
– Prior information and prevalence similar
• Can quickly explore both ti and tk by moving left
(Figure: agent with targets ti, tj, and tk)
Additive Combined Influences
• Influences from targets in different categories added to compute total influence at a tile
• Inverse falloff function chosen to minimize possibility of local maxima in influence map
Influences in Single Category
• Information gain decreases for each target explored in same category
• Decrease must be factored into influence map
(Figure: agent and three targets ti1, ti2, ti3 of category Ci, annotated with gains 2.119, 1.727, and 1.435)
Computing Total Influence
• Influence at tile t from all targets in category Ci:
$$\mathrm{TotalInfluence}(t, i) = \sum_k \frac{\mathrm{Gain}(C_i, [a_i, b_i], k)}{1 + t_k/d}$$
– tk = distance to kth nearest target
– Gain(Ci, [ai, bi], k) = expected information gain from kth example
• Influence at tile t from targets in all categories:
$$\mathrm{TotalInfluence}(t) = \sum_i \sum_k \frac{\mathrm{Gain}(C_i, [a_i, b_i], k)}{1 + t_k/d}$$
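A sketch of TotalInfluence(t), assuming the per-example gains Gain(Ci, [ai, bi], k) have already been computed and pairing the kth nearest target of each category with the kth gain. The data layout and names are illustrative; the gains 2.119, 1.727, 1.435 are the single-category values from the talk.

```python
from typing import Dict, List


def total_influence(gains: Dict[str, List[float]],
                    distances: Dict[str, List[float]],
                    d: float) -> float:
    """TotalInfluence(t) = sum_i sum_k Gain(C_i, [a_i, b_i], k) / (1 + t_k / d).

    gains[c][k - 1] : expected gain from the k-th example of category c
    distances[c]    : distances from tile t to unexplored targets of category c
    The k-th nearest target of each category is paired with the k-th gain;
    zip stops early if a category has more targets than listed gains.
    """
    total = 0.0
    for category, dists in distances.items():
        for g_k, t_k in zip(gains[category], sorted(dists)):
            total += g_k / (1 + t_k / d)
    return total


# Three targets of one category at distances 0, 10 and 20 from tile t
gains = {"Ci": [2.119, 1.727, 1.435]}
print(round(total_influence(gains, {"Ci": [20, 0, 10]}, d=10), 4))  # 3.4608
```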
Updating the Influence Map
• Influence map computed for all tiles in area of agent
• Agent moves in direction of increasing influence until some target ti reached
• Agent determines whether target meets goal, and either increments ai or bi for category Ci
• Information gain recomputed for all categories
• Influence map recomputed (with ti removed)
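The per-target update step might look like this in a simulation where, as in the experiments, the true category probabilities decide the outcome. The data structures and names are assumptions; recomputing the gains and the influence map afterwards is left to the caller.

```python
import random
from typing import Dict, Tuple


def explore(target_id: int,
            targets: Dict[int, str],
            knowledge: Dict[str, Tuple[int, int]],
            actual_p: Dict[str, float],
            rng: random.Random) -> bool:
    """Visit a target, record whether it met the goal, and remove it from the map.

    targets   : unexplored target id -> category
    knowledge : category -> [a, b] counts
    actual_p  : category -> true probability (known only to the simulation)
    Returns True if the target met the goal.
    """
    category = targets.pop(target_id)          # target is removed from the map
    a, b = knowledge[category]
    met = rng.random() < actual_p[category]    # simulated outcome
    knowledge[category] = (a + 1, b) if met else (a, b + 1)
    return met


rng = random.Random(42)
targets = {1: "B", 2: "C"}
knowledge = {"B": (1, 3), "C": (1, 5)}
actual_p = {"B": 0.1, "C": 0.25}
met = explore(1, targets, knowledge, actual_p, rng)
print(met, knowledge["B"], targets)
# Next: recompute Gain for every category and rebuild the influence map.
```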
Prior Knowledge Benchmark
• Instance of category with knowledge [1, 2]
• Instance of category with knowledge [2, 4]
– Category prevalence similar
• Agent should move towards instance of category with less knowledge
Category Prevalence Benchmark
• Instance of category with two instances
• Instance of category with single instance
– Prior knowledge of both = [1, 2]
• Agent should move towards instance of category with greater prevalence
Much Closer Distance Benchmark
• Knowledge = [10, 15] and prevalence = 7
• Knowledge = [8, 12] and prevalence = 8
• Even though the farther target has better information gain and prevalence, agent should move towards significantly closer targets
Large-scale Testing
• 30 x 20 world (with obstacles)
• 4 categories of targets
• Targets placed randomly for each trial
• Probability tile contains target = 0.05
Category Data
| Category | Prevalence | Actual probability | Prior knowledge [a, b] |
|---|---|---|---|
| A | 0.2 | 0.1 | [10, 90] |
| B | 0.2 | 0.1 | [1, 3] |
| C | 0.3 | 0.25 | [1, 5] |
| D | 0.3 | 0.25 | [25, 75] |

(Slide annotations: “High priority due to information gain”; “Somewhat high priority due to category prevalence”)
Importance of Learning
• Limited category data can cause errors in estimated probabilities
• This can lead to incorrect decisions about which target to move to next
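| Category | Actual P | Prior knowledge |
|---|---|---|
| A | 0.1 | [10, 90] |
| B | 0.1 | [1, 3] |
| C | 0.25 | [1, 5] |
| D | 0.25 | [25, 75] |

• Overestimates probability of B (prior estimate 1/4 = 0.25 vs. actual 0.1): moves towards its instances too often
• Underestimates probability of C (prior estimate 1/6 ≈ 0.17 vs. actual 0.25): ignores its instances too often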
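The over- and underestimates follow directly from the prior counts, using Exp(beta[a, b](θ)) = a / (a + b). A short check (names illustrative):

```python
CATEGORIES = {
    # category: (actual probability, prior knowledge [a, b])
    "A": (0.10, (10, 90)),
    "B": (0.10, (1, 3)),
    "C": (0.25, (1, 5)),
    "D": (0.25, (25, 75)),
}


def prior_estimates(categories):
    """Return {category: (estimated p, actual p)} with estimate a / (a + b)."""
    return {name: (a / (a + b), actual)
            for name, (actual, (a, b)) in categories.items()}


for name, (est, actual) in prior_estimates(CATEGORIES).items():
    print(f"{name}: estimated {est:.3f}, actual {actual:.2f}, error {est - actual:+.3f}")
```

A and D, backed by 100 examples each, already match their actual probabilities; only the sparsely sampled B and C are off.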
Does the Learning Strategy Work?
• 100 trials with targets randomly placed
• For each trial, agent given 50 moves for learning
– Influence map generated
– Agent followed influence map to target
– Actual probabilities used to update [a, b] for that category
– Information gains updated and map recomputed
• Question: Which categories were explored most?
Does the Learning Strategy Work?
• Average number of each category explored per trial:
| Category | Average explored per trial |
|---|---|
| A | 1.17 |
| B | 2.52 |
| C | 3.66 |
| D | 2.10 |

(Slide annotations: “Greater information gain”; “Higher prevalence”)
Is the Learning Strategy Useful?
• Does the information gain strategy reduce future search time for targets that meet goals?
• Comparison of results to simpler “naïve” strategy
– During learning phase, simply move to closest target instead of computing information gains
Training and Testing
• Training phase:
– Learning strategy (information gain or naïve) used to move agent 50 moves
– Each time target in category Ci reached, update its [ai, bi] based on actual category probabilities
– Product of learning: estimated probabilities pi for each category computed as Exp(beta[ai, bi](θi))
Training and Testing
• Testing Phase:
– Agent placed at every location in world (536 non-wall tiles)
– Existing probabilistic smart terrain algorithm used to search for a target that meets goal from that point
• Based on estimated probabilities from training phase
• Question: How many moves were required on average to find a goal?
Results of Testing
• 100 trials using both naïve and information gain learning
• Information gain learning focused on categories about which less was known (B and C)
• More accurate estimated probabilities
• Less travel time due to moving to wrong targets

| Strategy | Average tiles explored until goal found |
|---|---|
| Information gain | 5.294 |
| Naive | 6.473 |
Ongoing Work
• Learning while acting to meet goals
– Agent must meet current needs (which presumably have some urgency)
– Agent must also explore to learn knowledge to better meet future needs
• Tradeoff: costs of not meeting current needs while exploring vs. costs of extra travel in future if exploration not done now
Ongoing Work
• Learning in hierarchical worlds
– Agent does not know exact location of all targets
– Agent only knows expected number in a given region
– Will not know what region actually contains until it moves there
(Figure: agent A and two unexplored regions, one with Exp(C1) = 3.2, Exp(C2) = 2.4 and the other with Exp(C1) = 1.7, Exp(C2) = 4.5)