recognizing human activities at scale - microsoft.com filerecognizing human actions learning actions...

26

Upload: lamnga

Post on 05-Feb-2018

220 views

Category:

Documents


3 download

TRANSCRIPT

Recognizing Human Activities At Scale

Juan Carlos Niebleswww.niebles.net

@@jcniebles

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 3

An image is worth one thousand wordsA video is worth one million words

Source: YouTube

“a tiger attacking a person on a grass field”

“a man packing a suitcase in a store”

“someone unlocking a combination lock”

“the tiger is being playful” “the man is unpacking the suitcase”

“the person is attempting to pick the lock”

Image:

Video:

Video Generation and Consumption is Huge

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 4

Netflix: 100 million hours watched per day

YouTube: 400 hours uploaded per minute

Cisco: ~1 million minutes of video per second by 2020

~200 peta-pixels/second

Videos are about People

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 5Source: YouTube

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 6

Recognizing Human Actions

ActionDetection

Action Recognition in Videos

Short, pre-trimmed videos, only containing one action

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 7

KTH dataset [Schuldt, 2004] HOHA dataset [Laptev, 2008] UCF101 dataset [Soomro, 2012]

Action Recognition in Videos

Traditional action classification pipeline

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 8

Input Video(pre-trimmed)

Feature extraction(handcrafted/learne

d)

Featureencoding

YES

NO

Classifier

Source: YouTube

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 9

Temporal Detection of Actions

“Polishing shoes”Classifier

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 10

Temporal Detection of Actions

• Apply complex classifier at each temporal location frame• Exhaustive search• Repeat for all actions we want to detect• Questionable scalability

Long input video

… …

Generic ActivityProposal

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 11

Fast Activity Proposals for Action DetectionLong input video

… …

[Caba Heilbron, Niebles & Ghanem. CVPR 2016][Escorcia, Caba Heilbron, Niebles & Ghamen, ECCV 2016]

• Runs very quickly (>130 fps)• Find all temporal intervals that contain “any activity”• Retrieve action intervals with high recall

Generic ActivityProposal

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 12

Long input video

… …

Generic ActivityProposal

Generic ActivityProposal

“Polishing shoes”Classifier

“Polishing shoes”Classifier

“Polishing shoes”Classifier

[Caba Heilbron, Niebles & Ghanem. CVPR 2016][Escorcia, Caba Heilbron, Niebles & Ghamen, ECCV 2016]

Fast Activity Proposals for Action Detection

Generic ActivityProposal

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 13

Fast Activity Proposals for Action Detection

[Caba Heilbron, Niebles & Ghanem. CVPR 2016][Escorcia, Caba Heilbron, Niebles & Ghamen, ECCV 2016]

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 14

Recognizing Human Actions

Learning ActionsWithWeak

Supervision

ActionDetection

Temporal Action Labeling

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 15

TestVideo

LearnedModel

PredictionPan Fry Put to Plate

Fully Supervised Learning

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 16

TrainVideo

Model

LabelPan Fry Put to Plate

• Many training videos with per frame action labels• Costly to annotate!

Weakly-Supervised Learning

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 17

TrainVideo

Model

? ? ? ?Label ? ?

• Only use action ordering• Disambiguate by aggregating across with many

training videos

Pan Fry then Put to Plate

[Huang, Fei-Fei & Niebles, ECCV 2016]

Extended Connectionist Temporal Classification

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 18

• Extends the CTC framework• Explores space of frame-to-labels assignments efficiently• Incorporates pairwise frame similarities

[Huang, Fei-Fei & Niebles, ECCV 2016]

InputVideo

Pan Fry

Cut Bun

Put Plate

Evolution of Training Frame-to-Label correspondence

GroundTruth

ep1ep2ep3ep4ep5ep8ep11ep14ep17ep20Ours (weak)

Our approach starts without label correspondences for the training

videos and iteratively improves the localization of the actions.

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 20

Weakly Supervised Activity Segmentation Results

[Huang, Fei-Fei & Niebles, ECCV 2016]

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 21

Weakly Supervised Action Detection Results

0

1

Dri

ve C

ar

[Huang, Fei-Fei & Niebles, ECCV 2016]

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 22

0

1

[Huang, Fei-Fei & Niebles, ECCV 2016]

Dri

ve C

arWeakly Supervised Action Detection Results

Hierarchical Modeling of Composable Activities

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 23

[Lillo, Soto & Niebles, CVPR 2014]

[Lillo, Niebles & Soto, CVPR 2016]

Large-scaleBenchmark

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 24

ActionDetection

Learning ActionsWithWeak

Supervision

Recognizing Human Actions

ActivityNet – www.activity-net.org

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 25

[Caba Heilbron, Escorcia, Ghanem & Niebles, CVPR 2015]

7/13/16 Juan Carlos Niebles - Recognizing Human Actions at Scale 26

Thank you!Juan Carlos Niebles

www.niebles.net

@@jcniebles

ActionDetection

Learning Actions

WithWeak

Supervision

Large-scaleBenchmark