TRANSCRIPT
-
Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu Center for Vision, Cognition, Learning, and Art
UCLA
1
-
Ideally: Learn Causality from Raw Video
2
-
Inference Using Learned Causal Structure
• Answer why events occurred
• Joint STC: Correct misdetections and infer hidden objects/actions
• Infer triggers, goals, and intents
3
[Figure: a) Input: Video, b) Event Parsing, c) STC-Parsing, d) Inference Over Time. Panels show agents, actions (e.g., drink, flip switch), fluents (light ON/OFF, agent THIRSTY/NOT THIRSTY), hidden fluents, and causal links over time.]
-
But…
OBSERVATION ≠ CAUSALITY
(generally)
4
-
SO…WHERE ARE WE NOW?
5
-
Vision Research and Causal Knowledge
• Use pre-specified causal relationships for action detection – E.g., PADS (Albanese et al. 2010)
– Model Newtonian mechanics (Mann, Jepson, and Siskind 1997)
• Use causal measures to aid action detection – E.g., Prabhakar et al. 2010
• Use infant perceptions of motion to learn causality – Using cognitive science (Brand 1997)
• Needed: Learn causality from video, integrating spatial-temporal (ST) learning strategies at the pixel level
6
-
Causality and Video Data: Often Disjoint
• Learning Bayesian networks – Constraint satisfaction (Pearl 2009)
– Bayesian formulations (Heckerman 1995)
– Intractable on vision sensors
• Commonsense reasoning (Mueller 2006) – First-order logic – Does not allow for ambiguity/probabilistic solutions
• MLNs (Richardson and Domingos 2006) – Intractable
– Used for action detection (Tran and Davis 2008)
• KB formulas not learned
7
-
MOVING FORWARD: OUR PROPOSED SOLUTION
8
-
Cognitive Science as a Gateway: Perceptual Causality
• Causal Induction from Observation in Infancy – Agentive actions are causes (Saxe, Tenenbaum, and Carey 2005)
– Co-occurrence of events and effects (Griffiths and Tenenbaum 2005)
– Temporal lag between the two is short (Carey 2009)
– Cause precedes effect (Carey 2009)
• Note: NOT the same as necessary and sufficient causes
9
Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu
Center for Vision, Cognition, Learning and Art (VCLA), University of California, Los Angeles
INTRODUCTION
Goal: Learn causality from raw video
Motivation: Increased capacity for inference.
[Figure: a) Input: Video, b) Event Parsing, c) STC-Parsing, d) Inference Over Time.]
1. Answer why events occur
2. Correct misdetections and infer hidden objects/actions
3. Infer triggers, goals, and intents
However:
OBSERVATION ≠ CAUSALITY
(generally)
CAUSALITY IN VISION
Works in vision that incorporate causality can be categorized as
1. Using pre-specified causal relationships for action detection
2. Using causal measures to aid action detection
3. Using a pre-specified grammar for learning causality
Learning causality from video is largely missing from the vision literature.
CAUSALITY AND VIDEO DATA
The focus of causality research is often disjoint from the needs of a vision system:
1. Learning causal networks via constraint satisfaction or Bayesian methods: Intractable on vision sensors
2. First-order logic: Not probabilistic
3. Markov logic networks: Intractable; formulas not learned
PERCEPTUAL CAUSALITY
Cognitive science research suggests infants use heuristics in judging causal relationships:
1. Agentive actions are causes
2. Measure co-occurrence, via the contingency table for a candidate relation cr:

            Action   ¬Action
   Effect     c0        c1
   ¬Effect    c2        c3

3. Temporal lag between the two is short: Time(Effect) − Time(Action) < ε
4. Cause precedes effect: Time(Effect) − Time(Action) > 0
To learn perceptual causality in video, we restrict co-occurrence of detected events and effects to these heuristics.
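These heuristics can be turned into a concrete filter over detected event streams. The sketch below is a hypothetical minimal example (the function name, the windowing scheme, and the lag bound ε = 2 s are assumptions, not the authors' implementation):

```python
EPSILON = 2.0  # assumed maximum lag (seconds) between cause and effect

def contingency(action_times, effect_times, windows, eps=EPSILON):
    """Count contingency cells c0..c3 for a candidate action/effect pair
    over observation windows, keeping only co-occurrences that satisfy the
    perceptual-causality heuristics: cause precedes effect, with short lag.
    windows: list of (start, end) observation intervals."""
    c0 = c1 = c2 = c3 = 0
    for start, end in windows:
        action = any(start <= t < end for t in action_times)
        effect = any(start <= t < end for t in effect_times)
        if action and effect:
            # enforce: 0 < Time(Effect) - Time(Action) < eps
            ok = any(0 < te - ta < eps
                     for ta in action_times if start <= ta < end
                     for te in effect_times if start <= te < end)
            if ok:
                c0 += 1
            else:
                c2 += 1   # action present, but no validly lagged effect
        elif effect:
            c1 += 1       # effect with no action
        elif action:
            c2 += 1       # action with no effect
        else:
            c3 += 1
    return c0, c1, c2, c3
```
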
CAUSES AND EFFECTS: THE CAUSAL AND-OR GRAPH
Effects: Fluents are time-varying statuses of objects.
[Figure: the Light fluent over time — ON/OFF segments with transitions: light on inertially, light turns off, light off inertially, light turns on.]
Causes: Actions suggest an And-Or representation.
[Figure: And-nodes compose single causes (e.g., approach switch, then flip switch); Or-nodes give alternatives (e.g., on inertially).]
Pairing cause and effect: Fluent changes are matched with corresponding causing actions. In the absence of change-inducing actions, fluent values are causally attributed to the inertial action.
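A minimal sketch of this attribution rule (the names and the lag bound are illustrative assumptions):

```python
EPS = 2.0  # assumed maximum causal lag (seconds)

def attribute_cause(change_time, detected_actions, eps=EPS):
    """Match a fluent change to the closest preceding detected action
    within eps; otherwise attribute it to the inertial action.
    detected_actions: list of (time, action_name)."""
    candidates = [(change_time - t, name)
                  for t, name in detected_actions
                  if 0 < change_time - t < eps]
    if not candidates:
        return "inertial"
    return min(candidates)[1]  # smallest lag wins
```
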
The Causal And-Or Graph
[Figure: the Causal And-Or Graph for the Light fluent — Or-nodes over the alternative causes of on/off, And-nodes composing their sub-actions.]
GROUNDING CAUSALITY ON VIDEO
Fluent changes in the Causal And-Or Graph are detected using classifiers. Actions are detected as
instances from the Temporal And-Or Graph, which are grounded on relationships between objects.
Objects are detected on the Spatial And-Or Graph using templates.
[Figure: The Temporal And-Or Graph — a T-AOG fragment for action A (parent(A), children(A), relations R(A)) linking the objects it involves and the fluents it terminates, connected to S-AOG and C-AOG fragments.]
[Figure: The Spatial And-Or Graph — an S-AOG fragment for object O (parent(O), children(O)), grounded on pixels via templates(O) and connected to the fluents and actions of T-AOG and C-AOG fragments.]
PRINCIPLED LEARNING
Information Projection: Starting from an initial distribution, repeatedly select the cause/effect relationship that maximizes the KL distance to the true contingency-table distribution f, then match the statistics of the contingency table:

E_{p1}[cr_1] = E_f[cr_1],  E_{p2}[cr_2] = E_f[cr_2], ...

KL(f || p) = KL(f || p+) + KL(p+ || p)

Model Pursuit: Incrementally pursue a model, adding a contingency table at each iteration:

p+(pg) = (1/z+) p(pg) exp(−⟨λ+, cr+⟩)

Prop. 1: Matching statistics on the model to the observed data, E_{p+}[cr+] = E_f[cr+], gives

λ+,i = log( (h_i / h_0) · (f_0 / f_i) )

Prop. 2: Adding causal relations:

cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)
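Proposition 2 reduces each pursuit step to an argmax of KL(f || h) over candidate relations. A toy sketch (the candidate tables below are fabricated for illustration; this is not the authors' code):

```python
import math

def kl(f, h):
    """KL divergence between two contingency-table distributions."""
    return sum(fi * math.log(fi / hi) for fi, hi in zip(f, h) if fi > 0)

def pursue(candidates):
    """candidates: {name: (f, h)} with f the observed cell frequencies and
    h the cell frequencies under the current model p. Returns the causal
    relation cr+ maximizing KL(f || h), per Prop. 2."""
    return max(candidates, key=lambda cr: kl(*candidates[cr]))

# Fabricated example: a relation far from the current model wins.
uniform = (0.25, 0.25, 0.25, 0.25)
cands = {
    "flip_switch->light_on": ((0.6, 0.1, 0.1, 0.2), uniform),
    "drink->light_on":       ((0.3, 0.25, 0.25, 0.2), uniform),
}
```
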
PRELIMINARY RESULTS
Pursuit orders of causal relations.
Hierarchical Example: The Locked Door
Confounded Example: The Elevator
Our method acquires true causes before non-causes, outperforming Hellinger's chi-square and treatment effect.
CONTACT INFORMATION
Web: http://www.stat.ucla.edu/~amyfire
Email: [email protected]
-
MODIFIED GOAL: LEARN AND INFER PERCEPTUAL CAUSALITY
10
-
What are the effects? Fluent changes.
11
[Figure: timelines of the Door fluent (OPEN/CLOSED: door opens, door closed inertially, door closes, door open inertially) and the Light fluent (ON/OFF: light on inertially, light turns off, light off inertially, light turns on).]
• Fluents are time-varying statuses of objects (Mueller, Commonsense Reasoning, 2006)
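Fluent changes, the candidate effects, can be read directly off a detected status sequence. A minimal sketch assuming per-frame detector output (names are illustrative):

```python
def fluent_changes(timeline):
    """timeline: list of (time, value) detections for one fluent, e.g. a
    light being ON/OFF. Returns change events (time, old, new); unchanged
    stretches between them are the inertial segments."""
    changes = []
    for (t0, v0), (t1, v1) in zip(timeline, timeline[1:]):
        if v1 != v0:
            changes.append((t1, v0, v1))
    return changes
```
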
-
What are the causes? Actions.
• Probabilistic Graphical Representation for Causality – And-Or Graph
12
[Figure: And-nodes compose single causes from multiple sub-actions (e.g., unlock door, then push door); Or-nodes give alternative causes (e.g., opened from the other side).]
-
Causal AOG
13
[Figure: the Causal And-Or Graph for the Light fluent — Or-nodes over the alternative causes of on/off, And-nodes composing their sub-actions.]
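One hypothetical way to represent such a Causal And-Or Graph in code (a sketch under assumed names, not the authors' data structure):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str = "leaf"          # "and" | "or" | "leaf"
    children: list = field(default_factory=list)

# Or-nodes list alternative causes; And-nodes compose sub-actions.
light_on = Node("light turns ON", "or", [
    Node("flip switch", "and", [Node("approach switch"), Node("flip")]),
    Node("on inertially"),
])

def alternatives(node):
    """Enumerate the alternative causes under an Or-node."""
    return [c.label for c in node.children] if node.kind == "or" else [node.label]
```
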
-
Connecting Temporal to Causal and Spatial
14
[Figure: a T-AOG fragment for action A (parent(A), children(A), relations R(A)) linking the objects it involves and the fluents it terminates, connected to S-AOG and C-AOG fragments.]
-
Grounding on Pixels: Connecting S/T/C-AOG
15
[Figure: an S-AOG fragment for object O (parent(O), children(O)), grounded on pixels via templates(O) and connected to the fluents and actions of T-AOG and C-AOG fragments.]
-
LEARNING PERCEPTUAL CAUSALITY: Preliminary Theory
16
-
Principled Approach: Information projection
Della Pietra, Della Pietra, and Lafferty 1997
Zhu, Wu, and Mumford 1997
17
Starting from an initial distribution p, approach the true contingency-table distribution f. Repeat:
Select the cause/effect relationship that makes p+ closest (KL) to p, to preserve learning history while maximizing information gain.
Match the statistics of the contingency table: E_{p1}[cr_1] = E_f[cr_1], E_{p2}[cr_2] = E_f[cr_2], ...

KL(f || p) = KL(f || p+) + KL(p+ || p)

max [ KL(f || p) − KL(f || p+) ] = max KL(p+ || p)
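The decomposition can be checked numerically on a toy example: once p+ matches f on the pursued statistic, the Pythagorean identity holds exactly (all numbers below are fabricated for illustration):

```python
import math

def kl(a, b):
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

# True distribution f and current model p over 4 contingency cells.
f = [0.5, 0.2, 0.2, 0.1]
p = [0.25, 0.25, 0.25, 0.25]

# Tilt p on one statistic (mass of cells {0,1}) until it matches f there,
# making p_plus the information projection for that statistic.
mass_f, mass_p = f[0] + f[1], p[0] + p[1]
w = (mass_f * (1 - mass_p)) / ((1 - mass_f) * mass_p)  # tilt weight
raw = [p[0] * w, p[1] * w, p[2], p[3]]
z = sum(raw)
p_plus = [x / z for x in raw]

# Pythagorean identity of information projection:
lhs = kl(f, p)
rhs = kl(f, p_plus) + kl(p_plus, p)
```
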
-
Learning Pursuit: Add Causal Relations
• Model Pursuit
• Proposition 1: Find parameters – Model formed by min KL (p+ || p), matching statistics
• Proposition 2: Pursue cr.
18
p = p_0 → p_1 → ... → p_k → f

E_{p+}[cr+] = E_f[cr+]

p+(pg) = (1/z+) p(pg) exp(−⟨λ+, cr+⟩)   (on the ST-AOG)

Contingency table for cr:

            Effect   ¬Effect
   Action     c0        c2
   ¬Action    c1        c3

λ+,i = log( (h_i / h_0) · (f_0 / f_i) ),  where h_i is c_i under p and f_i is c_i from f

cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)
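Proposition 1 can be verified on a single contingency table: substituting λ+,i = log((h_i/h_0)·(f_0/f_i)) into p+ ∝ h_i·exp(−λ+,i) reproduces f exactly (toy frequencies, for illustration only):

```python
import math

h = [0.4, 0.3, 0.2, 0.1]    # cell frequencies c_i under the current model p
f = [0.25, 0.25, 0.3, 0.2]  # cell frequencies c_i observed in the data

# Closed-form parameters from Prop. 1.
lam = [math.log((hi / h[0]) * (f[0] / fi)) for hi, fi in zip(h, f)]

# Re-weight the model cells and normalize: p_plus should equal f.
raw = [hi * math.exp(-li) for hi, li in zip(h, lam)]
z = sum(raw)
p_plus = [x / z for x in raw]
```
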
-
Selection from ST-AOG
[Figure: computing frequencies f_A of causes selected from the ST-AOG — combining child frequencies f_{A1}, f_{A2} of a node A in T(F) at Or-nodes and And-nodes.]
-
Performance vs. Treatment Effect (TE)
20
-
Performance vs. Hellinger χ2
21
-
Increasing Misdetections (Simulation)
22
0% misdetection
10% misdetection
20% misdetection
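A sketch of how such a simulation can be run (the clip counts and misdetection model are assumptions): each detected action/effect label flips with the misdetection probability, and the contingency table is recounted.

```python
import random

def simulate(pairs, miss_rate, rng):
    """pairs: list of (action_detected, effect_detected) booleans per clip.
    Each detection independently flips with probability miss_rate; returns
    the resulting contingency counts [c0, c1, c2, c3]."""
    c = [0, 0, 0, 0]
    for a, e in pairs:
        if rng.random() < miss_rate:
            a = not a
        if rng.random() < miss_rate:
            e = not e
        # c0: a&e, c1: ¬a&e, c2: a&¬e, c3: ¬a&¬e
        c[(not a) * 1 + (not e) * 2] += 1
    return c
```
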
-
STC-Parsing Demo
23
-
Looking Forward:
• Finish learning the C-AOG
• Increase reasoning capacity of the C-AOG
• Integrate experiment design
24