learning better object models using video dataasamir/cifar/flobject presentation.pdfmotivation...

18
Learning Better Object Models using Video Data Patrick Li, Inmar Givoni, Brendan Frey

Upload: others

Post on 10-Sep-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Learning Better Object Models using Video DataPatrick Li, Inmar Givoni, Brendan Frey

Page 2: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Motivation

Training on a collection of static monocular images is unnatural.

Labelled Training Images are hard to get. And the lack of is becoming a problem.

�ere is a wealth of video data available.

Page 3: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

First Attempt: Learning Bags of Features Modelsfor Image Classi�cation

Goal:

Represent Objects as Bags of SIFT Features

Use unsupervised learning to learn models of objects

Use learned models for image classi�cation

Page 4: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Image Classi�cationINPUT: OUTPUT:

TRAINING:

“Cow”

“Boat” “Car” “Sofa”

...

Page 5: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

PART 3

Overview of the TechniqueUnsupervised Training from Video

Supervised Training on Labelled Images

Testing

PART 1 PART 2 PART 3 PART 60

PART 2

...

“Cow”

PART 8PART 1 “Sofa”

Page 6: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Bags of Features Models

PART 1

PART 2

PART 60

...

Page 7: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Latent Dirichlet Allocation for Topic Modelling

SPORT POLITICS BANKINGBASEBALL

HITKICKSOCCER

LEADER

DEM

OC

RAC

YC

APITALISMSH

OU

TERS

MONEY

TRANSACTIONS

TRANSACTIONS

ANIM

ALS

CATDO

G

FROG

CAT

FROG

DO

G

BASEBALLSOCCER

CAPITALISM

LEADER

DEM

OC

RAC

Y

20% ANIMALS40% POLITICS39% BANKING1% SPORTSSingle Document

Page 8: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Latent Dirichlet Allocation for Topic Modelling

Corpus of Documents

? ? ? ?1 2 3 ... 60

Page 9: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Latent Dirichlet Allocation for Topic Modelling

Corpus of Documents

? ? ?1 2 3 ... 60

Money

Transactions

Page 10: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Latent Dirichlet Allocation for Object Modelling

Single Image

COW CAR BOAT

SOFT

DRI

NKS

90% SOFT DRINKS10% CORPORATE LOGOS

Page 11: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Latent Dirichlet Allocation for Object Modelling

Image Collection

? ? ? ?1 2 3 ... 60

Page 12: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Flow-LDA for Motion Modelling

COW CAR BOAT

VILLA

IN

50% SWORD50% VILLAIN

Pair of Consecutive Frame Pairs

Page 13: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Flow-LDA for Motion Modelling

? ? ? ?1 2 3 ... 60

Frame Pair Collection

Page 14: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Flow-LDA for Motion Modelling

Page 15: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Unsupervised Training from Video using FLDA

Training And Testing Images

Image Recognition

PART 1

PART 2

PART 60

...

0.8 Part 10.2 Part 2

0.7 Part 10.2 Part 30.1 Part 4

0.6 Part 20.2 Part 130.2 Part 24

Page 16: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

Initial Results

Naive Guesser: 8.6% ErrorSVM trained on SIFT histograms directly: 8.6% ErrorSVM trained using LDA model (no motion): 5.6% ErrorSVM trained using FLDA model (motion): 3.7% Error

Page 17: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

... to continue

Experiment on Real Dataset

Go beyond Bags of Features models -Hierarchical Models -Account for Spatial Relations -Account for temporal relations between more than 2 frames

Page 18: Learning Better Object Models using Video Dataasamir/cifar/Flobject Presentation.pdfMotivation Training on a collection of static monocular images is unnatural. Labelled Training Images

�ank you!