learning better object models using video dataasamir/cifar/flobject presentation.pdfmotivation...

Learning Better Object Models using Video DataPatrick Li, Inmar Givoni, Brendan Frey

Motivation

Training on a collection of static monocular images is unnatural.

Labelled Training Images are hard to get. And the lack of is becoming a problem.

�ere is a wealth of video data available.

First Attempt: Learning Bags of Features Modelsfor Image Classi�cation

Goal:

Represent Objects as Bags of SIFT Features

Use unsupervised learning to learn models of objects

Use learned models for image classi�cation

Image Classi�cationINPUT: OUTPUT:

TRAINING:

“Cow”

“Boat” “Car” “Sofa”

...

PART 3

Overview of the TechniqueUnsupervised Training from Video

Supervised Training on Labelled Images

Testing

PART 1 PART 2 PART 3 PART 60

PART 2

...

“Cow”

PART 8PART 1 “Sofa”

Bags of Features Models

PART 1

PART 2

PART 60

...

Latent Dirichlet Allocation for Topic Modelling

SPORT POLITICS BANKINGBASEBALL

HITKICKSOCCER

LEADER

DEM

OC

RAC

YC

APITALISMSH

OU

TERS

MONEY

TRANSACTIONS

TRANSACTIONS

ANIM

ALS

CATDO

G

FROG

CAT

FROG

DO

G

BASEBALLSOCCER

CAPITALISM

LEADER

DEM

OC

RAC

Y

20% ANIMALS40% POLITICS39% BANKING1% SPORTSSingle Document


Corpus of Documents

? ? ? ?1 2 3 ... 60


Corpus of Documents

? ? ?1 2 3 ... 60

Money

Transactions

Latent Dirichlet Allocation for Object Modelling

Single Image

COW CAR BOAT

SOFT

DRI

NKS

90% SOFT DRINKS10% CORPORATE LOGOS

Latent Dirichlet Allocation for Object Modelling

Image Collection

? ? ? ?1 2 3 ... 60

Flow-LDA for Motion Modelling

COW CAR BOAT

VILLA

IN

50% SWORD50% VILLAIN

Pair of Consecutive Frame Pairs


? ? ? ?1 2 3 ... 60

Frame Pair Collection

Unsupervised Training from Video using FLDA

Training And Testing Images

Image Recognition

PART 1

PART 2

PART 60

...

0.8 Part 10.2 Part 2

0.7 Part 10.2 Part 30.1 Part 4

0.6 Part 20.2 Part 130.2 Part 24

Initial Results

Naive Guesser: 8.6% ErrorSVM trained on SIFT histograms directly: 8.6% ErrorSVM trained using LDA model (no motion): 5.6% ErrorSVM trained using FLDA model (motion): 3.7% Error

... to continue

Experiment on Real Dataset

Go beyond Bags of Features models -Hierarchical Models -Account for Spatial Relations -Account for temporal relations between more than 2 frames

�ank you!