Learning Perceptual Causality from Video
Amy Fire and Song-Chun Zhu
Center for Vision, Cognition, Learning, and Art, UCLA



  • Ideally: Learn Causality from Raw Video

    2

  • Inference Using Learned Causal Structure

    • Answer why events occurred

    • Joint STC: Correct misdetections and infer hidden objects/actions

    • Infer triggers, goals, and intents

    3

    [Figure: a) Input: Video; b) Event Parsing; c) STC-Parsing; d) Inference Over Time. Agent actions (Flip Switch, Drink) and fluents (Light ON/OFF/UNKNOWN, Agent THIRSTY/NOT) over time, with hidden fluents and causal links.]
  • But…

    OBSERVATION ≠ CAUSALITY

    (generally)

    4

  • SO…WHERE ARE WE NOW?

    5

  • Vision Research and Causal Knowledge

    • Use pre-specified causal relationships for action detection
      – E.g., PADS (Albanese et al. 2010)
      – Model Newtonian mechanics (Mann, Jepson, and Siskind 1997)

    • Use causal measures to aid action detection
      – E.g., Prabhakar et al. 2010

    • Use infant perceptions of motion to learn causality
      – Using cognitive science (Brand 1997)

    • Needed: Learn causality from video, integrating spatial-temporal (ST) learning strategies at the pixel level

    6

  • Causality and Video Data: Often Disjoint

    • Learning Bayesian networks
      – Constraint satisfaction (Pearl 2009)
      – Bayesian formulations (Heckerman 1995)
      – Intractable on vision sensors

    • Commonsense reasoning (Mueller 2006)
      – First-order logic
      – Does not allow for ambiguity or probabilistic solutions

    • Markov logic networks (MLNs) (Richardson and Domingos 2006)
      – Intractable
      – Used for action detection (Tran and Davis 2008)
      – KB formulas not learned

    7

  • MOVING FORWARD: OUR PROPOSED SOLUTION

    8

  • Cognitive Science as a Gateway: Perceptual Causality

    • Causal induction from observation in infancy
      – Agentive actions are causes (Saxe, Tenenbaum, and Carey 2005)
      – Co-occurrence of events and effects (Griffiths and Tenenbaum 2005)
      – Temporal lag between the two is short (Carey 2009)
      – Cause precedes effect (Carey 2009)

    • Note: NOT the same as necessary and sufficient causes

    9

    Learning Perceptual Causality from Video
    Amy Fire and Song-Chun Zhu

    Center for Vision, Cognition, Learning and Art (VCLA), University of California, Los Angeles

    INTRODUCTION

    Goal: Learn causality from raw video

    Motivation: Increased capacity for inference.

    [Figure: a) Input: Video; b) Event Parsing; c) STC-Parsing; d) Inference Over Time. Agent actions (Flip Switch, Drink) and fluents (Light ON/OFF/UNKNOWN, Agent THIRSTY/NOT) over time.]

    1. Answer why events occur

    2. Correct misdetections and infer hidden objects/actions

    3. Infer triggers, goals, and intents

    However:

    OBSERVATION ≠ CAUSALITY

    (generally)

    CAUSALITY IN VISION

    Works in vision that incorporate causality can be categorized as

    1. Using pre-specified causal relationships for action detection

    2. Using causal measures to aid action detection

    3. Using a pre-specified grammar for learning causality

    Learning causality from video is largely missing from the vision literature.

    CAUSALITY AND VIDEO DATA

    The focus of causality research is often disjoint from the needs of a vision system:

    1. Learning causal networks via constraint satisfaction or Bayesian methods: Intractable on vision sensors

    2. First-order logic: Not probabilistic

    3. Markov logic networks: Intractable, not learned

    PERCEPTUAL CAUSALITY

    Cognitive science research suggests infants use heuristics in judging causal relationships:

    1. Agentive actions are causes

    2. Measure co-occurrence in a contingency table cr:

                  Action   ¬Action
        Effect      c0        c1
        ¬Effect     c2        c3

    3. Temporal lag between the two is short:

        Time(Effect) − Time(Action) < ε

    4. Cause precedes effect:

        Time(Effect) − Time(Action) > 0

    To learn perceptual causality in video, we restrict co-occurrence of detected events and effects to these heuristics.
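To make heuristics 2–4 concrete, here is a minimal Python sketch (not the authors' implementation) that fills one candidate relation's contingency table over fixed observation slots; the event times, slot boundaries, and the window EPS are illustrative assumptions:

```python
# Minimal sketch: fill the 2x2 contingency table [c0, c1, c2, c3] for one
# candidate cause/effect pair under the perceptual-causality heuristics.
# Slot intervals, event times, and EPS are illustrative assumptions.

EPS = 2.0  # heuristic 3: maximal temporal lag between action and effect

def contingency(action_times, effect_times, slots):
    """slots: list of (t_start, t_end) observation intervals.
    Table layout:  Effect & Action -> c0,   Effect & no Action -> c1,
                   no Effect & Action -> c2,   neither -> c3."""
    c = [0, 0, 0, 0]
    for t0, t1 in slots:
        acts = [t for t in action_times if t0 <= t < t1]
        effs = [t for t in effect_times if t0 <= t < t1]
        # heuristic 4 (cause precedes effect) and heuristic 3 (short lag)
        caused = any(0 < te - ta < EPS for ta in acts for te in effs)
        if caused:
            c[0] += 1
        elif effs:      # effect without a temporally valid action
            c[1] += 1
        elif acts:      # action without an effect
            c[2] += 1
        else:
            c[3] += 1
    return c

# One action causes an effect in the first slot; later events are unpaired.
print(contingency([1, 11], [2, 16], [(0, 5), (5, 10), (10, 15), (15, 20)]))
# -> [1, 1, 1, 1]
```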

    CAUSES AND EFFECTS: THE CAUSAL AND-OR GRAPH

    Effects: Fluents are time-varying statuses of objects.

    [Figure: the light fluent over time t, alternating between ON and OFF: light on inertially, light turns off, light off inertially, light turns on.]

    Causes: Actions suggest an And-Or representation.

    [Figure: And-nodes compose single causes (e.g., approach switch, then flip switch); Or-nodes give alternatives (e.g., on inertially).]

    Pairing cause and effect: Fluent changes are matched with corresponding causing actions. In the absence of change-inducing actions, fluent values are causally attributed to the inertial action.
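A minimal sketch of this matching step (illustrative names and window, not the authors' code): each fluent change is attributed to the nearest preceding action within a short window, and to the inertial action otherwise.

```python
# Minimal sketch: attribute each detected fluent change to the nearest
# preceding agentive action within WINDOW seconds; otherwise attribute it
# to the "inertial" action. Times and WINDOW are illustrative assumptions.

WINDOW = 3.0  # seconds

def attribute_causes(fluent_changes, actions):
    """fluent_changes: list of (time, new_value); actions: list of (time, name).
    Returns a list of (change_time, new_value, cause_name)."""
    out = []
    for tc, val in fluent_changes:
        # actions that precede the change within the window
        cands = [(tc - ta, name) for ta, name in actions if 0 < tc - ta <= WINDOW]
        cause = min(cands)[1] if cands else "inertial"  # nearest in time
        out.append((tc, val, cause))
    return out

print(attribute_causes([(5.0, "ON"), (20.0, "OFF")], [(4.0, "flip_switch")]))
# -> [(5.0, 'ON', 'flip_switch'), (20.0, 'OFF', 'inertial')]
```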

    The Causal And-Or Graph

    [Figure: the light fluent's on and off values, each an Or-node over alternative causes, with And-nodes composing the causing actions a01.]
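A sketch of this structure in code, assuming a simple node type (the class and labels are illustrative, not the authors' implementation); enumerating the graph's alternative cause sequences for the light turning on:

```python
# Hypothetical Causal And-Or Graph fragment for the light fluent:
# Or-nodes list alternative causes, And-nodes compose sub-actions.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str                 # "AND", "OR", or "terminal"
    children: list = field(default_factory=list)

def causes(node):
    """Enumerate the alternative cause sequences encoded by the graph."""
    if node.kind == "terminal":
        return [[node.label]]
    if node.kind == "AND":    # compose sub-actions in order
        seqs = [[]]
        for ch in node.children:
            seqs = [s + t for s in seqs for t in causes(ch)]
        return seqs
    # OR: each child is an alternative
    return [s for ch in node.children for s in causes(ch)]

# Light turns on either by flipping the switch (approach, then flip)
# or it was already on inertially.
flip = Node("flip", "AND", [Node("approach_switch", "terminal"),
                            Node("flip_switch", "terminal")])
light_on = Node("light_on", "OR", [flip, Node("on_inertially", "terminal")])

print(causes(light_on))
# -> [['approach_switch', 'flip_switch'], ['on_inertially']]
```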

    GROUNDING CAUSALITY ON VIDEO

    Fluent changes in the Causal And-Or Graph are detected using classifiers. Actions are detected as instances from the Temporal And-Or Graph, which are grounded on relationships between objects. Objects are detected on the Spatial And-Or Graph using templates.

    [Figure: The Temporal And-Or Graph. A T-AOG fragment for action A, with parent(A) and children(A); A is grounded on relations R(A) between object1 and object2 (S-AOG fragment) and terminates fluents f1, f2 (C-AOG fragment).]

    [Figure: The Spatial And-Or Graph. An S-AOG fragment for object O, with parent(o), children(o), and templates(O) grounded on pixels; O links to fluents f1, f2, f3 and actions A1, A2, A3 in the T-AOG and C-AOG fragments.]

    PRINCIPLED LEARNING

    Information Projection: Move from the initial distribution toward the true contingency-table distribution. Repeat: select the cause/effect relationship that maximizes the KL distance, then match the statistics of its contingency table:

        p1 : E_{p1}[cr1] = E_f[cr1]
        p2 : E_{p2}[cr2] = E_f[cr2]

        KL(f || p) = KL(f || p+) + KL(p+ || p)

    Model Pursuit: Incrementally pursue a model, adding a contingency table at each iteration:

        p+(pg) = (1 / z+) p(pg) exp( −⟨λ+, cr+⟩ )

    Prop. 1: Matching statistics on the model to the observed data, E_{p+}(cr+) = E_f(cr+), gives

        λ+,i = log( (h_i / h_0) · (f_0 / f_i) )

    Prop. 2: Adding causal relations:

        cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)
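A toy sketch of one pursuit step under Prop. 2, with made-up counts: each candidate's observed table f is compared against the table h predicted by the current model, and the relation with maximal KL(f || h) is added first. The relation names and numbers are assumptions for illustration, not real data.

```python
# Toy pursuit step: pick the causal relation whose observed contingency
# table diverges most (in KL) from the current model's prediction.

import math

def normalize(c):
    s = sum(c)
    return [ci / s for ci in c]

def kl(f, h):
    """KL divergence between two normalized contingency tables."""
    return sum(fi * math.log(fi / hi) for fi, hi in zip(f, h) if fi > 0)

def pursue(observed, model):
    """Pick the causal relation maximizing KL(f || h) (Prop. 2)."""
    return max(observed, key=lambda cr: kl(normalize(observed[cr]),
                                           normalize(model[cr])))

observed = {"flip_switch->light_on": [40, 5, 5, 50],    # strongly contingent
            "drink->light_on":       [25, 25, 25, 25]}  # independent
model    = {cr: [25, 25, 25, 25] for cr in observed}    # current model h

print(pursue(observed, model))  # -> flip_switch->light_on
```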

    PRELIMINARY RESULTS

    Pursuit orders of causal relations.

    Hierarchical Example: The Locked Door

    Confounded Example: The Elevator

    Our method acquires true causes before non-causes, outperforming Hellinger's Chi-Square and Treatment Effect.

    CONTACT INFORMATION

    Web: http://www.stat.ucla.edu/~amyfire
    Email: [email protected]

  • MODIFIED GOAL: LEARN AND INFER PERCEPTUAL CAUSALITY

    10

  • What are the effects? Fluent changes.

    11

    [Figure: the door fluent (OPEN/CLOSED) and light fluent (ON/OFF) over time t: door opens, door closed inertially, door closes, door open inertially; light on inertially, light turns off, light off inertially, light turns on.]

    • Fluents are time-varying statuses of objects (Mueller, Commonsense Reasoning, 2006)
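As an illustration of detecting these effects, a minimal sketch (not the authors' detector) that converts per-frame fluent labels, e.g. from a classifier, into timestamped change events; the frame rate and labels are illustrative assumptions:

```python
# Minimal sketch: turn per-frame fluent labels into fluent change events.
# Frame rate and labels are illustrative assumptions.

def fluent_changes(status_per_frame, fps=30):
    """status_per_frame: one fluent label per video frame.
    Returns a list of (time_sec, old_value, new_value) change events."""
    changes = []
    for i in range(1, len(status_per_frame)):
        prev, cur = status_per_frame[i - 1], status_per_frame[i]
        if cur != prev:
            changes.append((i / fps, prev, cur))
    return changes

print(fluent_changes(["OFF"] * 3 + ["ON"] * 3, fps=30))
# -> [(0.1, 'OFF', 'ON')]
```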

  • What are the causes? Actions.

    • Probabilistic Graphical Representation for Causality – And-Or Graph

    12

    [Figure: And-nodes compose single causes from multiple sub-actions (e.g., unlock door, then push door); Or-nodes give alternative causes (e.g., opened from the other side).]

  • Causal AOG

    13

    [Figure: Causal And-Or Graph for the light fluent: Or-nodes over the on and off values, with And-nodes composing the causing actions a01.]

  • Connecting Temporal to Causal and Spatial

    14

    [Figure: a T-AOG fragment for action A, with parent(A) and children(A); A is grounded on relations R(A) between object1 and object2 (S-AOG fragment) and terminates fluents f1, f2 (C-AOG fragment).]

  • Grounding on Pixels: Connecting S/T/C-AOG

    15

    [Figure: an S-AOG fragment for object O, with parent(O), children(O), and templates(O) grounded on pixels; O links to fluents f1, f2, f3 and actions A1, A2, A3 in the T-AOG and C-AOG fragments.]

  • LEARNING PERCEPTUAL CAUSALITY: Preliminary Theory

    16

  • Principled Approach: Information Projection
    (Della Pietra, Della Pietra, and Lafferty 1997; Zhu, Wu, and Mumford 1997)

    17

    Move from the initial distribution toward the true contingency-table distribution f. Repeat:

    – For each candidate cause/effect relationship, take the model p+ closest (in KL) to p that matches the new statistic, preserving the learning history; select the candidate maximizing the information gain:

        KL(f || p) = KL(f || p+) + KL(p+ || p)

        max [ KL(f || p) − KL(f || p+) ] = max KL(p+ || p)

    – Match the statistics of the contingency table:

        p1 : E_{p1}[cr1] = E_f[cr1]
        p2 : E_{p2}[cr2] = E_f[cr2]

  • Learning Pursuit: Add Causal Relations

    • Model Pursuit (on the ST-AOG): p0 → p1 → ⋯ → pk → f

    • Proposition 1: Find parameters
      – Model formed by min KL(p+ || p), matching statistics:

        E_{p+}(cr+) = E_f(cr+)

        p+(pg) = (1 / z+) p(pg) exp( −⟨λ+, cr+⟩ )

        λ+,i = log( (h_i / h_0) · (f_0 / f_i) )

        where h_i is c_i under p and f_i is c_i from f, for the contingency table

        cr:            Effect   ¬Effect
            Action       c0        c2
            ¬Action      c1        c3

    • Proposition 2: Pursue cr:

        cr+ = argmax_cr KL(p+ || p) = argmax_cr KL(f || h)

    18
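A simplified numeric check of Proposition 1, restricted to the four table cells and using made-up frequencies: tilting the current model h by exp(−λ+,i), with λ+,i = log((h_i/h_0)·(f_0/f_i)), and renormalizing reproduces the observed frequencies f, so the matched-statistics condition E_{p+}(cr+) = E_f(cr+) holds on the table.

```python
# Simplified numeric check of Prop. 1, restricted to the four contingency
# cells: reweighting the model frequencies h by exp(-lambda_i) and
# normalizing recovers the observed frequencies f. Numbers are made up.

import math

def lambdas(f, h):
    """lambda_{+,i} = log((h_i / h_0) * (f_0 / f_i)), relative to cell 0."""
    return [math.log((h[i] / h[0]) * (f[0] / f[i])) for i in range(len(f))]

def project(h, f):
    """Reweight h by exp(-lambda_i) and renormalize."""
    lam = lambdas(f, h)
    p_plus = [h[i] * math.exp(-lam[i]) for i in range(len(h))]
    z = sum(p_plus)
    return [x / z for x in p_plus]

f = [0.40, 0.10, 0.10, 0.40]   # observed cell frequencies c0..c3
h = [0.25, 0.25, 0.25, 0.25]   # cell frequencies under the current model
```

Here project(h, f) returns f up to floating-point error, confirming that the tilted model matches the observed table statistics.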

  • Selection from ST-AOG

    [Figure: computing candidate-cause frequencies from the temporal parse tree T(F), root A with children A1, A2. An Or-node mixes branch frequencies with branch probabilities, f_A = p·f_{A1} + (1 − p)·f_{A2}; an And-node composes the frequencies of its sub-actions A1 and A2.]

  • Performance vs. TE

    20

  • Performance vs. Hellinger χ2

    21

  • Increasing Misdetections (Simulation)

    22

    [Plots for 0%, 10%, and 20% misdetection.]

  • STC-Parsing Demo

    23

  • Looking Forward:

    • Finish learning the C-AOG

    • Increase reasoning capacity of the C-AOG

    • Integrate experiment design

    24