Video Concept Detection by Learning from Web Images

Shiai Zhu, Ting Yao, Chong-Wah Ngo
City University of Hong Kong

Why concepts for media remixing?

[Figure: an example of event recounting. Timeline axis: Time (seconds), with ticks 00:00, 00:10, 00:20. Annotations along the timeline: "Cake is in preparation", "Someone is talking", "Background music is playing", …]

Recounting using 21 concept classifiers

kitchen, outdoor/indoor, baseball field, crowd, cake, walking, running, squatting, standing, hand, batting, speech, music, clipping, cheering

Why is concept learning challenging?

Requires thousands of concepts for practical applications
Collecting training examples is always expensive

Economical solution: get training examples for free from the Internet

Thousands of new uploads per minute

[Figure: example user tags attached to uploaded media: ZoomWowUsa, TexasCritter, CreatureOlympusCloseup, CatJune2010, Animalpet]

[Figure: concept hierarchy with the nodes Building, Residence, Place of Worship, House, Country House, Temple, Church Building]

Approach I – Semantic Field (SF)

[Figure: concept hierarchy with the nodes Building, Residence, Place of Worship, House, Country House, Temple, Church Building]

Approach II – Semantic Pooling (SP)

Does it work practically?

[Figure: example TRECVID videos versus Flickr images for the concepts Dancing, Boy, and Ocean]

Transfer Learning

What is transferred, with a representative work for each:

• Knowledge of instances: Wenyuan Dai, ICML 2007 (TrAdaBoost)
• Feature representation: Kate Saenko, ECCV 2010 (shared representation)
• Parameters (model): Jun Yang, ACM MM 2007 (Adaptive-SVM)
• Relational knowledge: Yu-Gang Jiang, ACM MM 2009 (semantic context transfer)

Transfer Learning

Adaptive SVM: model-level learning

TrAdaBoost: instance-level learning

[Figure: target domain (video) data and source domain (image) data, with per-instance weights shown as 0.5, 0.8, and 0.3]
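To make the instance-level idea concrete, here is a minimal sketch of a single TrAdaBoost weight update in the spirit of Dai et al. (ICML 2007). It is an illustration, not the code used for these experiments: the function name `tradaboost_reweight`, the clipping constants, and the example inputs are assumptions. Source (image) instances that the current weak learner misclassifies lose weight, and misclassified target (video) instances gain weight.

```python
import numpy as np

def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost-style weight update (hypothetical helper).

    w_src, w_tgt     : current weights of source / target instances
    err_src, err_tgt : |h(x) - y| per instance in [0, 1] (1 = misclassified)
    n_rounds         : total number of boosting rounds N
    """
    w_src = np.asarray(w_src, dtype=float)
    w_tgt = np.asarray(w_tgt, dtype=float)
    err_src = np.asarray(err_src, dtype=float)
    err_tgt = np.asarray(err_tgt, dtype=float)

    # Weighted error of the weak learner, measured on target data only.
    eps = np.sum(w_tgt * err_tgt) / np.sum(w_tgt)
    # The update needs 0 < eps < 0.5; clipping here is an assumption.
    eps = min(max(eps, 1e-10), 0.499)

    beta_t = eps / (1.0 - eps)  # target factor, changes every round
    n = len(w_src)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))  # fixed source factor

    new_src = w_src * beta ** err_src        # misclassified source: weight shrinks
    new_tgt = w_tgt * beta_t ** (-err_tgt)   # misclassified target: weight grows
    return new_src, new_tgt

# Illustrative inputs only: every instance starts at weight 0.5.
src, tgt = tradaboost_reweight([0.5] * 4, [0.5] * 4,
                               [1, 0, 0, 0], [1, 0, 0, 0], n_rounds=10)
```

Correctly classified instances keep their weight, so after several rounds the source images that disagree with the video data contribute little to training.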

Painful Experience on TRECVID

[Figure: performance comparison on TRECVID. Legend: 64 runs of other SIN systems; TRECVID training data alone; cross-domain learning; Web images alone. Systems shown: Baseline detector, ASVM-SF, TrAdaBoost-SF, SP, SF]

Negative transfer!

Negative transfer happens when knowledge transfer hurts performance in the target domain
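One simple way to check for negative transfer is to compare, concept by concept, the score of the transfer-learned detector with the score of the baseline trained on target data alone. The sketch below assumes per-concept average precision values are already available; the function name `transfer_report` and the numbers in the example are hypothetical and are not results from this study.

```python
def transfer_report(baseline_ap, transfer_ap):
    """Per-concept transfer gain (hypothetical helper).

    baseline_ap, transfer_ap : dicts mapping concept name -> average precision
    Returns (gains, concepts with negative transfer, % of concepts improved).
    """
    gains = {c: transfer_ap[c] - baseline_ap[c] for c in baseline_ap}
    negative = sorted(c for c, g in gains.items() if g < 0)  # transfer hurt
    improved = sum(1 for g in gains.values() if g > 0)       # transfer helped
    pct_improved = 100.0 * improved / len(gains)
    return gains, negative, pct_improved

# Made-up scores for four placeholder concepts, for illustration only.
baseline = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
transfer = {"a": 0.25, "b": 0.2, "c": 0.1, "d": 0.3}
gains, negative, pct = transfer_report(baseline, transfer)
```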

Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2011   266,474        137,327       50/346                 1800

Positive or negative transfer?

Number of training examples?

Type of concept? People, object, scene, event

Change of data distribution?

[Figure: percentage of improved concepts versus number of positive training examples. Bins: < 500, 500~1000, 1001~2000, and > 2000 positive examples. Bar labels, in extraction order: 2/4, 1/13, 1/14, 2/19]

A case study on cross-domain learning

Target domain (Web videos)
– TRECVID 2012 dataset

Source domain (Web images)
– Semantic Field: 1000 positive examples per concept
– Semantic Pooling: SF + additional 1000 examples per concept; 16,367 concepts + 0.7 million images for pooling

Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2012   400,289        145,634       46/346                 1200

Basic Framework

[Figure: basic framework. Recoverable label: SIFT feature space]

Number of positive examples?

A-SVM-SF: Semantic Field + A-SVM
A-SVM-SP: Semantic Pooling + A-SVM
Baseline: learnt using TRECVID training examples

[Figure: MinfAP versus number of positive instances for Baseline, A-SVM-SF, and A-SVM-SP, with regions marked "- transfer" and "+ transfer"]
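The model-level idea behind A-SVM can be sketched as follows: keep the source classifier's score f_s(x) fixed and learn only a perturbation w·x on the target examples, so that the adapted score is f_s(x) + w·x. The code below is a simplified illustration under stated assumptions, not the setup used here: it uses a linear perturbation without a bias term and plain subgradient descent instead of the dual QP solver of the original Adaptive-SVM, and the function names and hyperparameters are hypothetical.

```python
import numpy as np

def fit_adaptive_svm(X, y, source_scores, C=1.0, lr=0.01, epochs=200):
    """Learn w for the adapted score f(x) = f_s(x) + w.x (hypothetical helper).

    X             : (m, d) target-domain features
    y             : labels in {-1, +1}
    source_scores : f_s(x_i), outputs of the fixed source classifier
    Minimises 0.5*||w||^2 + C * sum(hinge(y_i * f(x_i))) by subgradient descent.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    fs = np.asarray(source_scores, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (fs + X @ w)
        active = margins < 1  # examples that violate the margin
        grad = w - C * (y[active, None] * X[active]).sum(axis=0)
        w -= lr * grad
    return w

def predict_adaptive_svm(X, source_scores, w):
    """Adapted decision score: source score plus learned perturbation."""
    return np.asarray(source_scores, dtype=float) + np.asarray(X, dtype=float) @ w

# Toy data: the source classifier is wrong on every target example.
X = [[1, 0], [2, 0], [-1, 0], [-2, 0]]
y = [1, 1, -1, -1]
fs = [-0.5, -0.5, 0.5, 0.5]
w = fit_adaptive_svm(X, y, fs)
scores = predict_adaptive_svm(X, fs, w)
```

Because the regulariser pulls w toward zero, the adapted classifier stays close to the source model unless the target examples contradict it, which is why the quality of the source (image) classifier matters so much when few target examples are available.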

Pooling is a practical strategy to diversify the coverage of training examples

22/46 concepts improve if each concept only has 100 positive examples

Type of Concept?

[Figure: four MinfAP plots, one per concept type: Scene (15), Object (10), People (12), Event (8)]

Probably not a good idea to use images for learning event concepts

Change in Data Distribution?

Maximum Mean Discrepancy (MMD)

23 concepts with lower mismatch, 23 concepts with higher mismatch

[Figure: paired TRECVID and Flickr examples for the concepts Forest, Computer, Singing, Meeting, Kitchen, Motorcycle, Throwing, Stadium, and Chair. Break-even point labels: < 10, = 50, > 100]
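For reference, the squared MMD between two samples can be estimated from kernel averages: mean k(source, source) + mean k(target, target) - 2 mean k(source, target). The sketch below uses the biased estimator with an RBF kernel. The kernel choice, the bandwidth `gamma`, and the synthetic features are assumptions made for illustration; the slides do not state which kernel or features were used.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off

def mmd2(X_src, X_tgt, gamma=1.0):
    """Biased estimate of squared MMD between two samples (hypothetical helper)."""
    X_src = np.asarray(X_src, dtype=float)
    X_tgt = np.asarray(X_tgt, dtype=float)
    k_ss = rbf_kernel(X_src, X_src, gamma).mean()
    k_tt = rbf_kernel(X_tgt, X_tgt, gamma).mean()
    k_st = rbf_kernel(X_src, X_tgt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Synthetic features: "near" shares the distribution of "images", "far" is shifted.
rng = np.random.default_rng(0)
images = rng.normal(0.0, 1.0, size=(200, 5))
near = rng.normal(0.0, 1.0, size=(200, 5))
far = rng.normal(3.0, 1.0, size=(200, 5))
low_mismatch = mmd2(images, near)
high_mismatch = mmd2(images, far)
```

A larger value indicates a larger mismatch between the image and video feature distributions for a concept.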

Average is average

[Figure: scatter plot of break-even point (vertical axis, ticks 50 and 100) against MMD (horizontal axis, from lower mismatch to higher mismatch), annotated with "difficulty". Concepts plotted: Boat_Ship, Glasses, Singing, Airplane, Baby, Male_Person, Airplane_Flying, Instrumental_Musician, Oceans, Forest, Man_Wearing_A_Suit, Bridges, Military_Airplane, Fields, Stadium, Landscape, Skier, Press_Conference, Nighttime, Teenagers, Highway, Walking_Running, Lakes, Bicycling, Computers, Roadway_Junction, Apartments, Clearing, Girl, Civilian_Person, Kitchen, Motorcycle, Meeting, Female_Person, Government-Leader, Sitting_Down, Hill, Boy, Soldiers, Chair, Basketball, Throwing, Office, George_Bush, Scene_Text, Greeting]

Question?

Key ideas

• Using Web images to learn concept classifiers for the video (TRECVID) domain

• Feasibility of transfer learning?

Messages

• Transfer is most likely to help when the target domain has < 100 positive examples

• Event concepts might be difficult to transfer

• Data distribution can be a cue to predict the difficulty

• The pooling strategy has a better chance of achieving positive transfer
