Juergen Gall: Analyzing Human Behavior in Video Sequences (slides: cmp.felk.cvut.cz/cmp/events/colloquium-2017.10.05/gall-cmp_colloq …)

Juergen Gall

Analyzing Human Behavior in

Video Sequences


Analyzing Human Behavior

Videos

Low-level features, e.g., gradients, optical flow

Analyzing Human Behavior

Human Pose

09.10.2017 Juergen Gall – Institute of Computer Science III – Computer Vision Group


21 Actions from HMDB


928 clips, 33183 frames

HMDB51 (Kuehne et al., ICCV 2011)


Puppet Annotation


Joint-annotated HMDB (JHMDB)


[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]


Study with Annotated Data (2013)

• Large potential gain from pose features

• Not attainable with existing 2D human pose estimation methods


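[Figure: gain over the baseline when ground-truth (GT) annotations are given, for low-, mid-, and high-level features]

• Given puppet flow (low level): + ~11%

• Given puppet mask (mid level): + ~9%

• Given joint positions (high level, pose features): + ~20%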

[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]


CNNs for Pose Estimation

Stack CNNs:


[ S.-E. Wei et al. Convolutional Pose Machines. CVPR 2016 ]


Coupled Action Recognition and Pose

Estimation


[ U. Iqbal et al. Pose for Action – Action for Pose. FG 2017 ]


Pose Estimation in Videos

Video datasets for human pose in unconstrained videos do not exist.

Unconstrained means:

• Publicly available content from the Internet (e.g., YouTube)

• Multiple persons in a video (no assumption about position)

• Arbitrary number of visible joints (truncation and occlusion)

• Large scale variations (unknown scale)


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose-Track Dataset


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Joint-annotated HMDB (JHMDB)


[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]


Pose-Track Dataset


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Challenge ICCV 2017


[ http://posetrack.net/workshops/iccv2017 ]


Pose Track: Simultaneous Pose

Estimation and Tracking

Estimate pose + person association over time:

• Predict body joints (CNN trained on MPII Pose)

• Build a graph with temporal and spatial edges


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]
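The graph construction step can be illustrated with a minimal sketch. The `Detection` record, the `build_graph` helper, and the `max_frame_gap` parameter are hypothetical names introduced here for illustration. The sketch assumes that spatial edges link all detections within a frame and that temporal edges link detections of the same joint type in nearby frames; the slides do not spell out these rules.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    frame: int    # frame index
    joint: str    # joint type, e.g. "head"
    x: float      # image coordinates of the detection
    y: float
    score: float  # CNN confidence of the detection


def build_graph(detections, max_frame_gap=1):
    """Return (spatial_edges, temporal_edges) as lists of index pairs."""
    spatial, temporal = [], []
    for i, j in combinations(range(len(detections)), 2):
        a, b = detections[i], detections[j]
        gap = abs(a.frame - b.frame)
        if gap == 0:
            # Same frame: connect every pair of detections (assumption).
            spatial.append((i, j))
        elif gap <= max_frame_gap and a.joint == b.joint:
            # Nearby frames: connect same joint type only (assumption).
            temporal.append((i, j))
    return spatial, temporal


dets = [
    Detection(0, "head", 10.0, 10.0, 0.9),
    Detection(0, "neck", 10.0, 20.0, 0.8),
    Detection(1, "head", 12.0, 10.0, 0.9),
    Detection(1, "neck", 12.0, 20.0, 0.7),
]
spatial_edges, temporal_edges = build_graph(dets)
print(spatial_edges)   # pairs within a frame
print(temporal_edges)  # same-joint pairs across frames
```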


Pose Track: Simultaneous Pose

Estimation and Tracking

[Figure: frames f, f', f'']

[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose Track: Simultaneous Pose

Estimation and Tracking

Unaries: Confidences of detected joints


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose Track: Simultaneous Pose

Estimation and Tracking

Spatial binaries: Extract a square bounding box around each detection

Two cases:

• Different joint type:

• Logistic regression based on distance and orientation


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]
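For the case of different joint types, a logistic regression on distance and orientation could look roughly like the sketch below. The feature encoding (Euclidean distance plus cosine and sine of the offset angle) and all weights are hypothetical placeholders; in practice the weights would be learned, presumably per pair of joint types.

```python
import math


def pair_features(p, q):
    """Distance and orientation of the offset from detection p to detection q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)
    return dist, math.cos(angle), math.sin(angle)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def same_person_probability(p, q, bias=1.0, w_dist=-0.05, w_cos=0.0, w_sin=1.0):
    """Probability that p and q belong to the same person.

    The default weights are made up for illustration, not learned values.
    """
    dist, cos_a, sin_a = pair_features(p, q)
    return logistic(bias + w_dist * dist + w_cos * cos_a + w_sin * sin_a)


head = (100.0, 50.0)
near_neck = (100.0, 80.0)   # directly below the head (image y points down)
far_neck = (400.0, 80.0)    # far to the right, likely another person

near = same_person_probability(head, near_neck)
far = same_person_probability(head, far_neck)
print(near, far)
```

The resulting probability can then serve as the cost of the spatial edge between the two detections.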


Pose Track: Simultaneous Pose

Estimation and Tracking

Spatial binaries: Extract a square bounding box around each detection

Two cases:

• Same joint type:


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose Track: Simultaneous Pose

Estimation and Tracking

Temporal binaries: Compute optical flow (DeepMatching)

Logistic regression:

[Figure: frames f, f']

[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]
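The formula for the temporal term did not survive in this transcript. As a rough sketch under stated assumptions, one could propagate a detection from frame f into frame f' with its flow vector and feed the remaining distance into a logistic model. The single distance feature, the function name, and the weights below are hypothetical and not taken from the slides.

```python
import math


def temporal_probability(p, flow, q, bias=2.0, w_dist=-0.2):
    """Probability that detection p (frame f) and q (frame f') are the same joint.

    p, q: (x, y) locations; flow: (dx, dy) displacement of p from f to f'.
    bias and w_dist are illustrative values, not learned parameters.
    """
    px, py = p[0] + flow[0], p[1] + flow[1]    # p warped into frame f'
    dist = math.hypot(q[0] - px, q[1] - py)    # residual distance after warping
    return 1.0 / (1.0 + math.exp(-(bias + w_dist * dist)))


p = (50.0, 50.0)
flow = (5.0, 0.0)
matching = temporal_probability(p, flow, (55.0, 50.0))   # lands exactly on q
unrelated = temporal_probability(p, flow, (90.0, 50.0))  # 35 px away
print(matching, unrelated)
```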


Pose Track: Simultaneous Pose

Estimation and Tracking

Solve integer linear program:

[Figure: frames f, f', f'']

[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose Track: Simultaneous Pose

Estimation and Tracking

To obtain plausible poses, constraints are added:

• Spatial transitivity:

• Temporal transitivity:

• Spatio-temporal transitivity:


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]
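The constraint formulas themselves are not preserved in this transcript. A generic transitivity constraint on binary edge labels has the form x_ab + x_bc - 1 <= x_ac: if a is joined with b and b with c, then a must be joined with c. The sketch below checks this inequality for a given labeling. It treats spatial and temporal edges uniformly, which is a simplification, and the helper name is hypothetical.

```python
from itertools import permutations


def transitivity_violations(nodes, joined):
    """List triples (a, b, c) with x_ab + x_bc - 1 > x_ac.

    nodes: sortable node identifiers.
    joined: set of frozenset({u, v}) for edges labeled 1 (same person).
    """
    def x(u, v):
        return 1 if frozenset((u, v)) in joined else 0

    return [
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if a < c and x(a, b) + x(b, c) - 1 > x(a, c)  # a < c skips mirrored triples
    ]


nodes = ["d1", "d2", "d3"]
labeling = {frozenset(("d1", "d2")), frozenset(("d2", "d3"))}
print(transitivity_violations(nodes, labeling))   # d1-d3 is missing

consistent = labeling | {frozenset(("d1", "d3"))}
print(transitivity_violations(nodes, consistent))
```

In an integer linear program these inequalities would be added as linear constraints rather than checked after the fact.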


Pose Track: Simultaneous Pose

Estimation and Tracking

To obtain plausible poses, constraints are added:

Spatio-temporal consistency:


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose Track: Simultaneous Pose

Estimation and Tracking

Estimate pose + person association over time:

• Predict body joints (CNN trained on MPII Pose)

• Build a graph with temporal and spatial edges

• Partition spatio-temporal graph


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]


Pose Track: Evaluation

• Pose estimation accuracy (mAP)

• Person association (MOTA)


[ U. Iqbal et al. Pose-Track: Joint Multi-Person

Pose Estimation and Tracking. CVPR 2017 ]
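For reference, MOTA (multiple object tracking accuracy) is commonly defined as one minus the ratio of all errors (misses, false positives, identity switches) to the number of ground-truth objects, summed over all frames. The sketch below implements that common definition; how the counts are obtained for body joints is not described on the slide, so the inputs are assumed to be given.

```python
def mota(misses, false_positives, id_switches, num_ground_truth):
    """Multiple object tracking accuracy from error counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = misses + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Illustrative counts only, not results from the talk.
print(mota(misses=10, false_positives=5, id_switches=5, num_ground_truth=100))
```

Note that MOTA can become negative when the number of errors exceeds the number of ground-truth objects.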


Joint-annotated HMDB (JHMDB)


[ H. Jhuang et al. Towards Understanding Action Recognition. ICCV 2013 ]

[ http://jhmdb.is.tue.mpg.de ]


Video Analysis for Studying the

Behavior of Mice


Recurrent Neural Networks

• Gated units (LSTM/GRU)


Weakly Supervised Learning

• Fully supervised:

• Weakly supervised (transcripts)


[ A. Richard et al. Weakly Supervised Action Learning with RNN

based Fine-to-Coarse Modeling. CVPR 2017 ]


Weakly Supervised Learning


• Represent an activity $a$ like “spoon_powder” by latent sub-activities $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$

• Optimal number of sub-activities is unknown:

• Many sub-activities for long activities

• Few sub-activities for short activities

[Figure: sequence of sub-activities $s_1^{(a)}, s_2^{(a)}, \dots, s_6^{(a)}$]
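One simple way to make the number of sub-activities grow with activity length is to fix a target duration per sub-activity. The sketch below is a hypothetical heuristic: the slide only states that long activities get many sub-activities and short ones few, so the proportional rule and the value of 10 frames per sub-activity are assumptions.

```python
def num_sub_activities(mean_length_frames, frames_per_sub_activity=10, minimum=1):
    """Choose a sub-activity count proportional to the mean activity length.

    frames_per_sub_activity is an assumed target duration, not a value from the talk.
    """
    return max(minimum, round(mean_length_frames / frames_per_sub_activity))


print(num_sub_activities(120))  # long activity
print(num_sub_activities(18))   # short activity
print(num_sub_activities(4))    # very short activity still gets one sub-activity
```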


Model

• RNN with Gated Recurrent Units (GRU)
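For readers unfamiliar with GRUs, one common formulation of a gated recurrent unit is given below, with input $x_t$, hidden state $h_t$, update gate $z_t$, and reset gate $r_t$. The symbols are not taken from the slides, and conventions differ between references on which term the update gate multiplies.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```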


Model

• A Hidden Markov Model (HMM) enforces a fixed order of sub-activities: $s_1^{(a)}, s_2^{(a)}, s_3^{(a)}, \dots$

• The HMMs use the RNN output probabilities as input


Model

• Hidden Markov Model (HMM) for each activity


Model

• The transcripts define the order of activities:
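Given a transcript, aligning it to the frames can be sketched as a Viterbi-style dynamic program over a left-to-right state sequence. The sketch below is a simplified illustration, not the authors' implementation: it assumes frame-wise log-probabilities per state are already available (for example from the RNN), uses uniform transition scores, and omits any length model. The function name `align` is hypothetical.

```python
import math


def align(transcript, log_probs):
    """Assign one transcript entry to every frame, in transcript order.

    transcript: ordered list of state indices that must all occur.
    log_probs[t][s]: log-probability of state s at frame t.
    Returns the best-scoring state index per frame.
    """
    num_frames, num_steps = len(log_probs), len(transcript)
    if num_steps == 0 or num_frames < num_steps:
        raise ValueError("need at least one frame per transcript entry")
    neg_inf = -math.inf
    score = [[neg_inf] * num_steps for _ in range(num_frames)]
    back = [[0] * num_steps for _ in range(num_frames)]
    score[0][0] = log_probs[0][transcript[0]]
    for t in range(1, num_frames):
        for n in range(num_steps):
            stay = score[t - 1][n]                           # remain in entry n
            move = score[t - 1][n - 1] if n > 0 else neg_inf  # advance from n-1
            best, prev = (stay, n) if stay >= move else (move, n - 1)
            if best > neg_inf:
                score[t][n] = best + log_probs[t][transcript[n]]
                back[t][n] = prev
    # Backtrack from the last transcript entry at the last frame.
    path = [num_steps - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [transcript[n] for n in path]


# Two states; state 0 is likely in the first two frames, state 1 afterwards.
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
log_probs = [[math.log(p) for p in row] for row in probs]
alignment = align([0, 1], log_probs)
print(alignment)
```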


Weakly Supervised Learning


[ A. Richard et al. Weakly Supervised Action Learning with RNN

based Fine-to-Coarse Modeling. CVPR 2017 ]


Results

• Accuracy on unseen sequences (video without transcript)


[ A. Richard et al. Weakly Supervised Action Learning with RNN

based Fine-to-Coarse Modeling. CVPR 2017 ]


Results

• Accuracy on unseen sequences (video with transcript)


[ A. Richard et al. Weakly Supervised Action Learning with RNN

based Fine-to-Coarse Modeling. CVPR 2017 ]


Research Unit - Anticipating Human

Behavior

20.09.2016 Research Unit 2535 - Anticipating Human Behavior

[ https://pages.iai.uni-bonn.de/FOR2535 ]


Thank you for your attention.
