TRANSCRIPT
Do Humans make Good Observers – and can they
Reliably Fuse Information?
Dr. Mark Bedworth, MV Concepts Ltd.
What we will cover:
• The decision making process
• The information fusion context
• The reliability of the process
• Where the pitfalls lie
• How not to get caught out
• Suggestions for next steps
What we will not cover:
• Systems design and architectures
• Counter-piracy specifics
• Inferencing frameworks
• Tracking
• Multi-class problems
• Extensive mathematics
• In fact… most of the detail!
Our objectives:
• Understanding of the context of data fusion for decision making
• Quantitative grasp of a few key theories
• Appreciation of how to put the theory into practice
• Knowledge of where the gaps in theory remain
Warning
This presentation contains audience participation experiments
Decision Making
• To make an informed decision:
– Obtain data on the relevant factors
– Reason within the domain context
– Understand the possible outcomes
– Have a method of implementation
Boyd Cycle
• This is captured more formally as a fusion architecture:
– Observe: acquire data
– Orient: form perspective
– Decide: determine course of action
– Act: put into practice
• Also called the OODA loop
OODA loop
[Diagram: the OODA cycle – Observe → Orient → Decide → Act]
Adversarial OODA Loops
[Diagram: two opposing OODA loops – own information and adversary information – both coupled through the physical world]
Winning the OODA Game
• To achieve dominance:
– Make better decisions
– In a more timely manner
– And implement more effectively
Dominance History
• Action dominance (-A)
– Longer range, more destructive, more accurate weapons
• Observation dominance (O-)
– Longer range, more robust, more accurate sensors
• Information dominance (-O-D-)
– More timely and relevant information with better support to the decision maker
Information Dominance
Part One: Orientation
“Having acquired relevant data; to undertake reasoning about the data within the domain context to form a perspective of the current situation; so that an informed decision can subsequently be made”
A number of approaches
• Fusion of hard decisions
– Majority rule
– Weighted voting
– Maximum a posteriori fusion
– Behaviour knowledge space
• Fusion of soft decisions
– Probability fusion
Reasoning Frameworks
• Boolean
– Truth and falsehood
• Fuzzy (Zadeh)
– Vagueness
• Evidential (Dempster-Shafer)
– Belief and ignorance
• Probabilistic (Bayesian)
– Uncertainty
Probability theory
• 0 ≤ P(H) ≤ 1
• If P(H) = 1 then H is certain to occur
• P(H) + P(~H) = 1
– Either H or not-H is certain to occur (negation rule)
• P(G,H) = P(G|H) P(H) = P(H|G) P(G)
– The joint probability is the conditional probability multiplied by the prior (conjunction rule)
Bayes’ Theorem
P(H|X) = P(X|H) P(H) / P(X)
(posterior probability = likelihood × prior probability / marginal likelihood)
Perspective Calculation
• Usually the marginal likelihood is awkward to compute
– But it is not needed since it is independent of the hypothesis
– Compute the products of the likelihoods and priors; then normalise over hypotheses
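A minimal Python sketch of this normalise-the-products step (not from the talk; the likelihood values below are invented for illustration):

```python
# Rough sketch: form the posterior by normalising likelihood x prior over
# the hypotheses, so the marginal likelihood P(X) is never computed.
def posterior(likelihoods, priors):
    """likelihoods[h] = P(X|h), priors[h] = P(h); returns P(h|X)."""
    unnorm = {h: likelihoods[h] * priors[h] for h in priors}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Illustrative numbers only: threat present 5% of the time, assumed likelihoods.
print(posterior({"threat": 0.9, "no threat": 0.2},
                {"threat": 0.05, "no threat": 0.95}))
```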
Human Fusion Experiment (1)
• A threat is present 5% of the time it is looked for
• Observers A and B both independently look for the threat
• Both report an absence of the threat with posterior probabilities 70% and 80%
• What is the fused probability that the threat is absent?
Human Fusion Experiment (2)
• Threat absent ≡ the hypothesis (H)
• P(~H) = 0.05
• P(H) = 0.95
• P(H|XA) = 0.70
• P(H|XB) = 0.80
• P(H|XA,XB) = ?
Human Fusion Experiment (3)
[Chart: probability of “no threat” on a scale up to H = 1.00 – Prior P(H) = 0.95; Report A P(H|XA) = 0.70; Report B P(H|XB) = 0.80]
Conditional Independence
• Assume the data to be conditionally independent given the class:
P(A,B|H) = P(A|H) P(B|H)
• Note that this does not necessarily imply:
P(A,B) = P(A) P(B)
Conditionally Independent
[Figure: scatter plots of sensor 2 measurement against sensor 1 measurement – two panels labelled “conditionally independent” and two labelled “not conditionally independent”]
Fusion: Product Rule (1)
• We require: P(H|A,B)
• From Bayes’ theorem:
P(H|A,B) = P(A,B|H) P(H) / P(A,B)
Fusion: Product Rule (2)
• We assume conditional independence so may write:
P(H|A,B) = P(A|H) P(B|H) P(H) / P(A,B)
Fusion: Product Rule (3)
• Applying Bayes’ theorem again:
P(H|A,B) = [P(H|A) P(A) / P(H)] [P(H|B) P(B) / P(H)] P(H) / P(A,B)
• And collecting terms:
P(H|A,B) = P(H|A) P(H|B) P(A) P(B) / [P(H) P(A,B)]
Fusion: Product Rule (4)
• We may drop the marginal likelihoods again and normalise:
P(H|A,B) ∝ P(H|A) P(H|B) / P(H)
(fused posterior probability ∝ posterior probability × posterior probability / prior probability)
Multisource Fusion Rule
• The generalisation of this fusion rule to multiple sources:
P(H|X) ∝ [ Π(i=1…N) P(H|xi) ] / P(H)^(N−1)
• This is commutative
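A small Python sketch of the multisource rule (not from the talk), checked against the two-observer threat example used earlier:

```python
# Hedged sketch of the product fusion rule: multiply the source posteriors,
# divide by the prior N-1 times, and renormalise over the hypotheses.
def fuse(posteriors, prior):
    """posteriors: list of dicts P(h|x_i); prior: dict P(h); returns P(h|X)."""
    n = len(posteriors)
    unnorm = {}
    for h in prior:
        prod = 1.0
        for p in posteriors:
            prod *= p[h]
        unnorm[h] = prod / prior[h] ** (n - 1)
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# The human fusion experiment: reports of 0.70 and 0.80 that the threat is
# absent, prior 0.95, fuse to roughly 0.33 (see the results slide below).
reports = [{"absent": 0.70, "present": 0.30},
           {"absent": 0.80, "present": 0.20}]
print(fuse(reports, {"absent": 0.95, "present": 0.05}))
```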
Commutativity of Fusion (1)
P(H|X) ∝ [ Π(i=1…N) P(H|xi) ] / P(H)^(N−1) = { [ Π(i=1…R) P(H|xi) ] / P(H)^(R−1) } × { [ Π(i=R+1…N) P(H|xi) ] / P(H)^(S−1) } / P(H)
(with the N sources split into two groups of R and S = N − R)
Commutativity of Fusion (2)
• The probability fusion rule commutes:
– It doesn’t matter what the architecture is
– It doesn’t matter if it is single stage or multi-stage
Experiment: Results
P(H|A,B) ∝ P(H|A) P(H|B) / P(H) = 0.70 × 0.80 / 0.95 = 0.59
P(~H|A,B) ∝ P(~H|A) P(~H|B) / P(~H) = 0.30 × 0.20 / 0.05 = 1.20
• Normalising gives: P(H|A,B) = 0.33, P(~H|A,B) = 0.67
Human Fusion Experiment (3)
[Chart: probability of “no threat” on a scale up to H = 1.00 – Prior P(H) = 0.95; Report A P(H|XA) = 0.70; Report B P(H|XB) = 0.80; Fusion A,B P(H|XA,XB) = 0.33]
Why was that so hard?
• Most humans find it difficult to intuitively fuse uncertain information
– Not because they are innumerate
– But because they cannot comfortably balance the evidence (likelihood) with their predisposition (prior)
Prior Sensitivity (1)
• If the issue is with the priors – do they matter?
• Can we ignore the priors?
• Do we get the same final decision if we change the priors?
Prior Sensitivity (2)
• If P(H|A) = P(H|B)
• What value of P(H) makes P(H|A,B) = 0.5?
P(H) = P(H|A)² / [ P(H|A)² + (1 − P(H|A))² ]
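A quick numerical check of this indifference-prior formula (illustrative values only, not from the talk):

```python
# For two equal reports P(H|A) = P(H|B), this prior makes the fused
# probability exactly 0.5 (the decision boundary).
def indifference_prior(p_report):
    return p_report ** 2 / (p_report ** 2 + (1.0 - p_report) ** 2)

for p in (0.6, 0.7, 0.8, 0.9):
    print(p, round(indifference_prior(p), 3))   # 0.692, 0.845, 0.941, 0.988
```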
Prior Sensitivity (3)
[Figure: P(H) plotted against P(H|A) = P(H|B)]
Prior Sensitivity (4)
• Between 0.2 < P(H|A) < 0.8 the prior has a significant effect
• Carefully define the domain over which the prior is evaluated
• Put effort into using a reasonable value
Sensitivity to Posterior Probability
• What about the posterior probabilities delivered to the fusion centre?
• Can we endure errors here?
• Which types of errors hurt most?
Probability Experiment (1)
• 10 estimation questions
• Write down lower and upper bound
• So that you are 90% sure it covers the actual value
• All questions relate to the highest point in various countries (in metres)
Probability experiment (2)
• Winner defined as:
– Person with most answers correct
– Tie-break decided by smallest sum of ranges (for all 10 questions)
• Pick a range big enough
• But not too big!
The questions:
1. Australia
2. Chile
3. Cuba
4. Egypt
5. Ethiopia
6. Finland
7. Hong Kong
8. India
9. Lithuania
10. Poland
The answers:
1. Australia (2228m)
2. Chile (6893m)
3. Cuba (1974m)
4. Egypt (2629m)
5. Ethiopia (4550m)
6. Finland (1324m)
7. Hong Kong (958m)
8. India (8586m)
9. Lithuania (294m)
10. Poland (2499m)
Overconfidence (1)
• Large trials show that most people get fewer than 40% correct
• Should be 90% correct!
• People are often overconfident (even when primed that they are being tested!)
Overconfidence (2)
[Figure: declared probability plotted against actual probability, with regions labelled “overconfident”, “underconfident” and “wrong”]
Confidence Amplification (1)
[Figure: fused class probability plotted against input class probability for 2, 3, 4 and 5 sensors]
Confidence Amplification (2)
[Figure: fused class probability plotted against input class probability]
Veto Effect
• If any local decision-maker outputs a probability of close to zero for a class, then the fused probability is close to zero
– even if all the other decision-makers output a high probability
– about 40% of the response surface for two sensors is either <0.1 or >0.9
– this rises to 50% for three sensors and nearly 60% for four
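A rough numerical illustration of the veto effect under the product fusion rule (flat priors and class labels are assumptions made for the example):

```python
# One near-zero report for a class suppresses that class in the fused
# result, however confident the other source is about it.
def fuse(posteriors, prior):
    unnorm = {h: 1.0 for h in prior}
    for p in posteriors:
        for h in unnorm:
            unnorm[h] *= p[h]
    for h in unnorm:
        unnorm[h] /= prior[h] ** (len(posteriors) - 1)
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

flat = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
vetoer = {"A": 0.001, "B": 0.499, "C": 0.500}    # near-zero for class A
believer = {"A": 0.980, "B": 0.010, "C": 0.010}  # very confident in class A
print(fuse([vetoer, believer], flat))            # class A ends up below 0.1
```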
Moderation of probabilities
• If we suspect that the posterior probabilities are overconfident then we should moderate them
– By building it into automatic techniques
– By allowing for it if this is not possible
Gaussian Moderation
• For Gaussian classifiers the Bayesian correction is analytically tractable
• By integrating over the mean and variance rather than taking the maximum likelihood value
Student t-distribution (1)
• For Gaussian data this is:
P(xi|D) = ∫₀^∞ ∫ P(xi|μ, σ²) P(μ, σ²|D) dμ dσ²
• Which is a “Student” t-distribution:
P(xi | μ̂, σ̂², N) = [ Γ(N/2) / ( Γ((N−1)/2) √( π (N−1) σ̂² (1 + 1/N) ) ) ] × [ 1 + (xi − μ̂)² / ( (N−1) σ̂² (1 + 1/N) ) ]^(−N/2)
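A minimal sketch of such a moderated likelihood, assuming the standard Student-t posterior predictive for Gaussian data with a noninformative prior (not necessarily the speaker's exact parameterisation):

```python
import numpy as np
from scipy import stats

def moderated_likelihood(x_new, data):
    """Student-t predictive density for a new measurement given past data."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    mean = data.mean()
    s = data.std(ddof=1)                  # unbiased sample standard deviation
    scale = s * np.sqrt(1.0 + 1.0 / n)    # predictive scale
    return stats.t.pdf(x_new, df=n - 1, loc=mean, scale=scale)

# With few samples the heavier tails make extreme measurements less
# surprising than a maximum-likelihood Gaussian would suggest.
print(moderated_likelihood(3.0, [0.1, -0.4, 0.3, 0.2]))
```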
Student t-distribution (2)
[Figure: likelihood of data plotted against measurement value]
Student t-distribution (3)
[Figure: probability of class 1 plotted against measurement value]
Approximate Moderation (1)
• We can get a similar effect at the fusion centre using the posteriors
– Convert back to “likelihoods” by dividing by the prior
– Add a constant to everything
– Convert back to “posteriors” by multiplying by the prior
– Renormalise
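A minimal Python sketch of this recipe (the additive constant `eps` is an assumed per-source tuning value, learned from data as the next slide suggests):

```python
# Approximate moderation of one source's posteriors at the fusion centre.
def moderate(posteriors, priors, eps):
    """posteriors[h] = P(h|x) from one source, priors[h] = P(h)."""
    # Convert back to (scaled) likelihoods by dividing by the prior
    likes = {h: posteriors[h] / priors[h] for h in posteriors}
    # Add a constant to soften overconfident values
    likes = {h: v + eps for h, v in likes.items()}
    # Convert back to posteriors by multiplying by the prior, then renormalise
    unnorm = {h: likes[h] * priors[h] for h in likes}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Illustrative numbers: an overconfident 0.99 report is pulled back to ~0.83.
print(moderate({"threat": 0.99, "no threat": 0.01},
               {"threat": 0.05, "no threat": 0.95}, eps=0.2))
```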
Approximate Moderation (2)
• How much to add depends on the source of the posterior probabilities– Correction factor for each source– Learned from data
Other Issues
• Conditional independence not holding
• Information incest
• Missing data
• Communication errors
• Asynchronous information
Information Dominance
Part Two: Decision
“Having reasoned about the data to form a perspective of the current situation; to make an informed decision which optimises the desirability of the outcome”
Deciding what to do
“Decision theory is trivial, apart from the details”
• Select an action that maximises the expected utility of the outcome
Utility functions?
• A utility function describes how desirable each possible outcome is
– People are sometimes irrational
– Desirability cannot be captured by a single valued function
– Allais paradox
Utility Experiment (1)
1. Guaranteed €1 million
2. 89% chance of €1 million; 10% chance of €5 million; 1% chance of nothing
Utility Experiment (2)
1. 89% chance of nothing; 11% chance of €1 million
2. 90% chance of nothing; 10% chance of €5 million
Utility Experiment (3)
• If you prefer 1 to 2 on the first slide, you should prefer 1 to 2 on the second slide as well
• If not, you are acting irrationally…
Decision Theory
• Assume we are able to construct a utility function (or at least get our superior to define one!)
• Enumerate the possible actions
– Use our fused probabilities to weight the utility of the possible outcomes
– Choose the action for which the expected utility of the outcome is greatest
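A small sketch of this expected-utility decision rule (the utility numbers, action names and hypotheses below are invented for illustration):

```python
# Choose the action whose expected utility under the fused probabilities
# is greatest.
def best_action(actions, p_fused, utility):
    """utility(action, hypothesis) -> desirability of that outcome."""
    def expected_utility(a):
        return sum(p * utility(a, h) for h, p in p_fused.items())
    return max(actions, key=expected_utility)

p_fused = {"threat": 0.67, "no threat": 0.33}      # from the fusion stage
utilities = {("intervene", "threat"): 10, ("intervene", "no threat"): -2,
             ("ignore", "threat"): -50, ("ignore", "no threat"): 0}
print(best_action(["intervene", "ignore"], p_fused,
                  lambda a, h: utilities[(a, h)]))  # -> "intervene"
```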
Timing the decision
• What about timing?
• When should the decision be made?
– If we wait then maybe the (fused) probabilities will be more accurate
– Or the action will be more effective
Explore versus Exploit
• By waiting you can explore the situation
• By stopping you can exploit the situation
• Stopping rule
– Sequential analysis
– SPRT
– Bayesian optimal stopping
Experiment with timing
• I will show you 20 numbers
• They are drawn from the same (uniform) distribution
• Select the highest value
• But no going back
• A bit like ¡Allá tú!
Experiment with timing (1–20)
The numbers, shown one at a time, were: 131, 16, 125, 189, 105, 172, 39, 94, 57, 133, 52, 69, 7, 242, 148, 163, 23, 139, 146, 211.
The answer…
• How many people chose 242?
• Balance between collecting data on how big the numbers might be (exploration) and actually picking a big number (exploitation)
The 1/e Law (1)
• Consider a rule of the form:
– Observe the first M values and remember the best value (V)
– Observe the remaining N−M values and pick the first that exceeds V
The 1/e Law (2)
• It can be shown that the optimum value for M is N/e
• And that for this rule the probability of selecting the maximum is at least 1/e
• Even for huge values of N
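A quick Monte Carlo check of this rule (a sketch, not from the talk):

```python
import math
import random

def one_over_e_rule(values):
    """Observe the first N/e values, then take the first that beats them."""
    m = max(1, round(len(values) / math.e))
    best_seen = max(values[:m])
    for v in values[m:]:
        if v > best_seen:
            return v
    return values[-1]                    # forced to accept the last value

N, trials, wins = 20, 100_000, 0
for _ in range(trials):
    vals = [random.random() for _ in range(N)]
    if one_over_e_rule(vals) == max(vals):
        wins += 1
print(wins / trials)                     # typically a little above 1/e ≈ 0.37
```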
Time Pressure (1)
• Individuals tend to make the decision too early
• Committees tend to leave the decision too late
Time Pressure (2)
• Lecturers tend to overrun their time slot!
Time Pressure (3)
• Apologies for skipping over so much of the detail
• Some of the other areas that warrant mention:
– Game theory
– Sensor management
– Graphical models
– Cognitive inertia
– Inattentional blindness
Please feel free to contact me
www.mv-concepts.com
Or just come and introduce yourself…
Thank you!
Questions…