TRANSCRIPT
-
Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu Center for Vision, Cognition, Learning, and Art
UCLA
1
-
Ideally: Learn Causality from Raw Video
2
-
Inference Using Learned Causal Structure
• Answer why events occurred
• Joint STC: Correct misdetections and infer hidden objects/actions
• Infer triggers, goals, and intents
3
[Figure: a) Input: Video, b) Event Parsing, c) STC-Parsing, d) Inference Over Time. Panels show agents, actions (e.g., drink, flip switch), fluents (light ON/OFF, agent THIRSTY/NOT THIRSTY), hidden fluents, and causal links over time.]
-
But…
OBSERVATION ≠ CAUSALITY
(generally)
4
-
SO…WHERE ARE WE NOW?
5
-
Vision Research and Causal Knowledge
• Use pre-specified causal relationships for action detection – E.g., PADS (Albanese et al. 2010)
– Model Newtonian mechanics (Mann, Jepson, and Siskind 1997)
• Use causal measures to aid action detection – E.g., Prabhakar et al. 2010
• Use infant perceptions of motion to learn causality – Using cognitive science (Brand 1997)
• Needed: Learn causality from video, integrating spatial-temporal (ST) learning strategies at the pixel level
6
-
Causality and Video Data: Often Disjoint
• Learning Bayesian networks – Constraint satisfaction (Pearl 2009)
– Bayesian formulations (Heckerman 1995)
– Intractable on vision sensors
• Commonsense reasoning (Mueller 2006) – First-order logic – Does not allow for ambiguity/probabilistic solutions
• MLNs (Richardson and Domingos 2006) – Intractable
– Used for action detection (Tran and Davis 2008)
• KB formulas not learned
7
-
MOVING FORWARD: OUR PROPOSED SOLUTION
8
-
Cognitive Science as a Gateway: Perceptual Causality
• Causal Induction from Observation in Infancy – Agentive actions are causes (Saxe, Tenenbaum, and Carey 2005)
– Co-occurrence of events and effects (Griffiths and Tenenbaum 2005)
– Temporal lag between the two is short (Carey 2009)
– Cause precedes effect (Carey 2009)
• Note: NOT the same as necessary and sufficient causes
9
Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu
Center for Vision, Cognition, Learning and Art (VCLA), University of California, Los Angeles
INTRODUCTION
Goal: Learn causality from raw video
Motivation: Increased capacity for inference.
[Figure: a) Input: Video, b) Event Parsing, c) STC-Parsing, d) Inference Over Time.]
1. Answer why events occur
2. Correct misdetections and infer hidden objects/actions
3. Infer triggers, goals, and intents
However:
OBSERVATION ≠ CAUSALITY
(generally)
CAUSALITY IN VISION
Works in vision that incorporate causality can be categorized as
1. Using pre-specified causal relationships for action detection
2. Using causal measures to aid action detection
3. Using a pre-specified grammar for learning causality
Learning causality from video is largely missing from the vision literature.
CAUSALITY AND VIDEO DATA
The focus of causality research is often disjoint from the needs of a vision system:
1. Learning causal networks via constraint satisfaction or Bayesian methods: Intractable on vision sensors
2. First-order logic: Not probabilistic
3. Markov logic networks: Intractable; formulas not learned
PERCEPTUAL CAUSALITY
Cognitive science research suggests infants use heuristics in judging causal relationships:
1. Agentive actions are causes
2. Measure co-occurrence, via the contingency table for a candidate relation cr:

            Action   ¬Action
   Effect     c0        c1
   ¬Effect    c2        c3

3. Temporal lag between the two is short: Time(Effect) − Time(Action) < ε
4. Cause precedes effect: Time(Effect) − Time(Action) > 0
To learn perceptual causality in video, we restrict co-occurrence of detected events and effects to these heuristics.
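These heuristics can be turned into a concrete filter over detected event streams. The sketch below is a hypothetical minimal example (the function name, the windowing scheme, and the lag bound ε = 2 s are assumptions, not the authors' implementation):

```python
EPSILON = 2.0  # assumed maximum lag (seconds) between cause and effect

def contingency(action_times, effect_times, windows, eps=EPSILON):
    """Count contingency cells c0..c3 for a candidate action/effect pair
    over observation windows, keeping only co-occurrences that satisfy the
    perceptual-causality heuristics: cause precedes effect, with short lag.
    windows: list of (start, end) observation intervals."""
    c0 = c1 = c2 = c3 = 0
    for start, end in windows:
        action = any(start <= t < end for t in action_times)
        effect = any(start <= t < end for t in effect_times)
        if action and effect:
            # enforce: 0 < Time(Effect) - Time(Action) < eps
            ok = any(0 < te - ta < eps
                     for ta in action_times if start <= ta < end
                     for te in effect_times if start <= te < end)
            if ok:
                c0 += 1
            else:
                c2 += 1   # action present, but no validly lagged effect
        elif effect:
            c1 += 1       # effect with no action
        elif action:
            c2 += 1       # action with no effect
        else:
            c3 += 1
    return c0, c1, c2, c3
```
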
CAUSES AND EFFECTS: THE CAUSAL AND-OR GRAPH
Effects: Fluents are time-varying statuses of objects.
[Figure: the Light fluent over time — ON/OFF segments with transitions: light on inertially, light turns off, light off inertially, light turns on.]
Causes: Actions suggest an And-Or representation.
[Figure: And-nodes compose single causes (e.g., approach switch, then flip switch); Or-nodes give alternatives (e.g., on inertially).]
Pairing cause and effect: Fluent changes are matched with corresponding causing actions. In the absence of change-inducing actions, fluent values are causally attributed to the inertial action.
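A minimal sketch of this attribution rule (the names and the lag bound are illustrative assumptions):

```python
EPS = 2.0  # assumed maximum causal lag (seconds)

def attribute_cause(change_time, detected_actions, eps=EPS):
    """Match a fluent change to the closest preceding detected action
    within eps; otherwise attribute it to the inertial action.
    detected_actions: list of (time, action_name)."""
    candidates = [(change_time - t, name)
                  for t, name in detected_actions
                  if 0 < change_time - t < eps]
    if not candidates:
        return "inertial"
    return min(candidates)[1]  # smallest lag wins
```
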
The Causal And-Or Graph
[Figure: the Causal And-Or Graph for the Light fluent — Or-nodes over the alternative causes of on/off, And-nodes composing their sub-actions.]
GROUNDING CAUSALITY ON VIDEO
Fluent changes in the Causal And-Or Graph are detected using classifiers. Actions are detected as
instances from the Temporal And-Or Graph, which are grounded on relationships between objects.
Objects are detected on the Spatial And-Or Graph using templates.
[Figure: The Temporal And-Or Graph — a T-AOG fragment for action A (parent(A), children(A), relations R(A)) linking the objects it involves and the fluents it terminates, connected to S-AOG and C-AOG fragments.]
[Figure: The Spatial And-Or Graph — an S-AOG fragment for object O (parent(O), children(O)), grounded on pixels via templates(O) and connected to the fluents and actions of T-AOG and C-AOG fragments.]
PRINCIPLED LEARNING
Information Projection: Starting from an initial distribution, repeatedly select the cause/effect relationship that maximizes the KL distance to the true contingency-table distribution f, then match the statistics of the contingency table:

E_{p1}[cr_1] = E_f[cr_1],  E_{p2}[cr_2] = E_f[cr_2], ...

KL(f || p) = KL(f || p+) + KL(p+ || p)

Model Pursuit: Incrementally pursue a model, adding a contingency table at each iteration:

p+(pg) = (1/z+) p(pg) exp(−⟨λ+, cr+⟩)

Prop. 1: Matching statistics on the model to the observed data, E_{p+}[cr+] = E_f[cr+], gives

λ+,i = log( (h_i / h_0) · (f_0 / f_i) )

Prop. 2: Adding causal relations:

cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)
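Proposition 2 reduces each pursuit step to an argmax of KL(f || h) over candidate relations. A toy sketch (the candidate tables below are fabricated for illustration; this is not the authors' code):

```python
import math

def kl(f, h):
    """KL divergence between two contingency-table distributions."""
    return sum(fi * math.log(fi / hi) for fi, hi in zip(f, h) if fi > 0)

def pursue(candidates):
    """candidates: {name: (f, h)} with f the observed cell frequencies and
    h the cell frequencies under the current model p. Returns the causal
    relation cr+ maximizing KL(f || h), per Prop. 2."""
    return max(candidates, key=lambda cr: kl(*candidates[cr]))

# Fabricated example: a relation far from the current model wins.
uniform = (0.25, 0.25, 0.25, 0.25)
cands = {
    "flip_switch->light_on": ((0.6, 0.1, 0.1, 0.2), uniform),
    "drink->light_on":       ((0.3, 0.25, 0.25, 0.2), uniform),
}
```
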
PRELIMINARY RESULTS
Pursuit orders of causal relations.
Hierarchical Example: The Locked Door
Confounded Example: The Elevator
Our method acquires true causes before non-causes, outperforming Hellinger's chi-square and treatment effect.
CONTACT INFORMATION
Web: http://www.stat.ucla.edu/~amyfire
Email: [email protected]
-
MODIFIED GOAL: LEARN AND INFER PERCEPTUAL CAUSALITY
10
-
What are the effects? Fluent changes.
11
[Figure: timelines of the Door fluent (OPEN/CLOSED: door opens, door closed inertially, door closes, door open inertially) and the Light fluent (ON/OFF: light on inertially, light turns off, light off inertially, light turns on).]
• Fluents are time-varying statuses of objects (Mueller, Commonsense Reasoning, 2006)
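Fluent changes, the candidate effects, can be read directly off a detected status sequence. A minimal sketch assuming per-frame detector output (names are illustrative):

```python
def fluent_changes(timeline):
    """timeline: list of (time, value) detections for one fluent, e.g. a
    light being ON/OFF. Returns change events (time, old, new); unchanged
    stretches between them are the inertial segments."""
    changes = []
    for (t0, v0), (t1, v1) in zip(timeline, timeline[1:]):
        if v1 != v0:
            changes.append((t1, v0, v1))
    return changes
```
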
-
What are the causes? Actions.
• Probabilistic Graphical Representation for Causality – And-Or Graph
12
[Figure: And-nodes compose single causes from multiple sub-actions (e.g., unlock door, then push door); Or-nodes give alternative causes (e.g., opened from the other side).]
-
Causal AOG
13
[Figure: the Causal And-Or Graph for the Light fluent — Or-nodes over the alternative causes of on/off, And-nodes composing their sub-actions.]
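One hypothetical way to represent such a Causal And-Or Graph in code (a sketch under assumed names, not the authors' data structure):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str = "leaf"          # "and" | "or" | "leaf"
    children: list = field(default_factory=list)

# Or-nodes list alternative causes; And-nodes compose sub-actions.
light_on = Node("light turns ON", "or", [
    Node("flip switch", "and", [Node("approach switch"), Node("flip")]),
    Node("on inertially"),
])

def alternatives(node):
    """Enumerate the alternative causes under an Or-node."""
    return [c.label for c in node.children] if node.kind == "or" else [node.label]
```
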
-
Connecting Temporal to Causal and Spatial
14
[Figure: a T-AOG fragment for action A (parent(A), children(A), relations R(A)) linking the objects it involves and the fluents it terminates, connected to S-AOG and C-AOG fragments.]
-
Grounding on Pixels: Connecting S/T/C-AOG
15
[Figure: an S-AOG fragment for object O (parent(O), children(O)), grounded on pixels via templates(O) and connected to the fluents and actions of T-AOG and C-AOG fragments.]
-
LEARNING PERCEPTUAL CAUSALITY: Preliminary Theory
16
-
Principled Approach: Information projection
Della Pietra, Della Pietra, and Lafferty 1997
Zhu, Wu, and Mumford 1997
17
Starting from an initial distribution p, approach the true contingency-table distribution f. Repeat:
Select the cause/effect relationship that makes p+ closest (KL) to p, to preserve learning history while maximizing information gain.
Match the statistics of the contingency table: E_{p1}[cr_1] = E_f[cr_1], E_{p2}[cr_2] = E_f[cr_2], ...

KL(f || p) = KL(f || p+) + KL(p+ || p)

max [ KL(f || p) − KL(f || p+) ] = max KL(p+ || p)
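The decomposition can be checked numerically on a toy example: once p+ matches f on the pursued statistic, the Pythagorean identity holds exactly (all numbers below are fabricated for illustration):

```python
import math

def kl(a, b):
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

# True distribution f and current model p over 4 contingency cells.
f = [0.5, 0.2, 0.2, 0.1]
p = [0.25, 0.25, 0.25, 0.25]

# Tilt p on one statistic (mass of cells {0,1}) until it matches f there,
# making p_plus the information projection for that statistic.
mass_f, mass_p = f[0] + f[1], p[0] + p[1]
w = (mass_f * (1 - mass_p)) / ((1 - mass_f) * mass_p)  # tilt weight
raw = [p[0] * w, p[1] * w, p[2], p[3]]
z = sum(raw)
p_plus = [x / z for x in raw]

# Pythagorean identity of information projection:
lhs = kl(f, p)
rhs = kl(f, p_plus) + kl(p_plus, p)
```
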
-
Learning Pursuit: Add Causal Relations
• Model Pursuit
• Proposition 1: Find parameters – Model formed by min KL (p+ || p), matching statistics
• Proposition 2: Pursue cr.
18
p = p_0 → p_1 → ... → p_k → f

E_{p+}[cr+] = E_f[cr+]

p+(pg) = (1/z+) p(pg) exp(−⟨λ+, cr+⟩)   (on the ST-AOG)

Contingency table for cr:

            Effect   ¬Effect
   Action     c0        c2
   ¬Action    c1        c3

λ+,i = log( (h_i / h_0) · (f_0 / f_i) ),  where h_i is c_i under p and f_i is c_i from f

cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)
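Proposition 1 can be verified on a single contingency table: substituting λ+,i = log((h_i/h_0)·(f_0/f_i)) into p+ ∝ h_i·exp(−λ+,i) reproduces f exactly (toy frequencies, for illustration only):

```python
import math

h = [0.4, 0.3, 0.2, 0.1]    # cell frequencies c_i under the current model p
f = [0.25, 0.25, 0.3, 0.2]  # cell frequencies c_i observed in the data

# Closed-form parameters from Prop. 1.
lam = [math.log((hi / h[0]) * (f[0] / fi)) for hi, fi in zip(h, f)]

# Re-weight the model cells and normalize: p_plus should equal f.
raw = [hi * math.exp(-li) for hi, li in zip(h, lam)]
z = sum(raw)
p_plus = [x / z for x in raw]
```
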
-
Selection from ST-AOG
[Figure: computing frequencies f_A of causes selected from the ST-AOG — combining child frequencies f_{A1}, f_{A2} of a node A in T(F) at Or-nodes and And-nodes.]
-
Performance vs. Treatment Effect (TE)
20
-
Performance vs. Hellinger χ2
21
-
Increasing Misdetections (Simulation)
22
0% misdetection
10% misdetection
20% misdetection
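A sketch of how such a simulation can be run (the clip counts and misdetection model are assumptions): each detected action/effect label flips with the misdetection probability, and the contingency table is recounted.

```python
import random

def simulate(pairs, miss_rate, rng):
    """pairs: list of (action_detected, effect_detected) booleans per clip.
    Each detection independently flips with probability miss_rate; returns
    the resulting contingency counts [c0, c1, c2, c3]."""
    c = [0, 0, 0, 0]
    for a, e in pairs:
        if rng.random() < miss_rate:
            a = not a
        if rng.random() < miss_rate:
            e = not e
        # c0: a&e, c1: ¬a&e, c2: a&¬e, c3: ¬a&¬e
        c[(not a) * 1 + (not e) * 2] += 1
    return c
```
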
-
STC-Parsing Demo
23
-
Looking Forward:
• Finish learning the C-AOG
• Increase reasoning capacity of the C-AOG
• Integrate experiment design
24