Video Concept Detection by Learning from Web Images

Shiai Zhu, Ting Yao, Chong-Wah Ngo, City University of Hong Kong

Uploaded by: mediamixercommunity

Posted on 29-Jun-2015



TRANSCRIPT

Page 1: Video concept detection by learning from web images

Video Concept Detection by Learning from Web Images

Shiai Zhu, Ting Yao, Chong-Wah Ngo, City University of Hong Kong

Page 2

Why concepts for media remixing?

[Figure: timeline of a video from 00:00 to 00:20 (time in seconds), annotated with "Cake is in preparation", "Someone is talking" (two segments), and "Background music is playing"]

An example of event recounting

Page 3

Recounting using 21 concept classifiers

kitchen, outdoor/indoor, baseball field, crowd, cake, walking, running, squatting, standing, hand, batting, speech, music, clipping, cheering
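The slides do not say how classifier outputs become a timeline like the one on page 2. A minimal sketch, assuming each concept classifier emits one score per fixed-length segment and that a concept is "present" wherever its score reaches a threshold: consecutive detections are merged into (concept, start, end) intervals. The function name `recount`, the 0.5 threshold, and the segment length are illustrative assumptions, not the authors' settings.

```python
from typing import Dict, List, Tuple


def recount(scores: Dict[str, List[float]], threshold: float = 0.5,
            step: float = 1.0) -> List[Tuple[str, float, float]]:
    """Merge per-segment concept detections into (concept, start, end) intervals.

    Assumption (not from the slides): scores[c][i] is the score of concept c on
    segment i, every segment lasts `step` seconds, and a concept is present
    wherever its score reaches `threshold`.
    """
    intervals = []
    for concept, seq in scores.items():
        start = None
        for i, s in enumerate(seq):
            if s >= threshold and start is None:
                start = i  # a run of detections begins
            elif s < threshold and start is not None:
                intervals.append((concept, start * step, i * step))
                start = None  # the run has ended
        if start is not None:  # run lasts until the end of the video
            intervals.append((concept, start * step, len(seq) * step))
    # Order the recounting by start time, then by concept name.
    return sorted(intervals, key=lambda iv: (iv[1], iv[0]))
```

For example, `recount({"cake": [0.9, 0.8, 0.7, 0.2], "speech": [0.1, 0.6, 0.7, 0.9]}, step=5.0)` yields `[("cake", 0.0, 15.0), ("speech", 5.0, 20.0)]`.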

Page 4

Why is concept learning challenging?

• Practical applications require thousands of concepts
• Collecting training examples is always expensive

Page 5

Economical solution: get it free from the Internet

Thousands of new uploads per minute

[Figure: example web images labeled ZoomWowUsa, TexasCritter, CreatureOlympusCloseup, CatJune2010, and Animalpet]

Page 6

Approach I: Semantic Field (SF)

[Figure: concept hierarchy with Building, Residence, Place of Worship, House, Country House, Temple, and Church Building]

Page 7

Approach II: Semantic Pooling (SP)

[Figure: concept hierarchy with Building, Residence, Place of Worship, House, Country House, Temple, and Church Building]

Page 8

Does it work practically?

[Figure: TRECVID video frames and Flickr images for the concepts Dancing, Boy, and Ocean]

Page 9

Transfer Learning

Transfer learning, grouped by the kind of knowledge transferred:

• Knowledge of instance: Wenyuan Dai, ICML 2007 (TrAdaBoost)
• Feature representation: Kate Saenko, ECCV 2010 (shared representation)
• Parameter (model): Jun Yang, ACM MM 2007 (Adaptive-SVM)
• Relational knowledge: Yu-Gang Jiang, ACM MM 2009 (semantic context transfer)
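Adaptive SVM (parameter/model transfer above) keeps a classifier trained on the source domain fixed and learns a perturbation on target data, so the adapted classifier is f(x) = f_src(x) + Δf(x). The sketch below assumes a linear perturbation Δf(x) = w·x without bias and minimizes 0.5‖w‖² + C Σ max(0, 1 - y_i (f_src(x_i) + w·x_i)) by plain subgradient descent. That solver is a simplification, not the one used in the cited work, and the function names, learning rate, and toy data are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_delta(xs: List[Vector], ys: List[int], source_f: Callable[[Vector], float],
                C: float = 1.0, lr: float = 0.01, epochs: int = 300) -> List[float]:
    """Fit w so that source_f(x) + w.x separates target data with labels in {-1, +1}.

    Minimizes 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (source_f(x_i) + w.x_i))
    by full-batch subgradient descent (a simplification; hypothetical settings).
    """
    w = [0.0] * len(xs[0])
    offsets = [source_f(x) for x in xs]  # source-domain scores stay fixed
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5*||w||^2 regularizer
        for x, y, fa in zip(xs, ys, offsets):
            if y * (fa + dot(w, x)) < 1.0:  # margin violated: add hinge subgradient
                for j, xj in enumerate(x):
                    grad[j] -= C * y * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def adapted_score(x: Vector, w: List[float], source_f: Callable[[Vector], float]) -> float:
    """A-SVM style decision value: source classifier plus learned perturbation."""
    return source_f(x) + dot(w, x)
```

As a toy check, a source scorer that only looks at the first feature misfits target labels that depend on the second feature; after `train_delta`, `adapted_score` takes the correct sign on those points while the source scorer itself is left untouched.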

Page 10

Transfer Learning

• Adaptive SVM: model-level learning
• TrAdaBoost: instance-level learning

[Figure: instance weights for target domain (video) data and source domain (image) data; most instances carry weight 0.5, one carries 0.8, and three carry 0.3]
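The instance-level side can be illustrated with one reweighting round of TrAdaBoost as described by Dai et al. (ICML 2007). With labels in {0, 1}, a source instance the weak learner gets wrong is multiplied by β = 1/(1 + sqrt(2 ln n / N)) (n source instances, N boosting rounds), while a misclassified target instance is multiplied by 1/β_t, where β_t = ε_t/(1 - ε_t) and ε_t is the weighted error on target instances only. This sketch covers just that update; training the weak learner, normalizing weights, and the final vote are left out, and clamping ε_t is an added safeguard rather than part of the published algorithm.

```python
import math
from typing import List, Tuple


def tradaboost_step(w_src: List[float], w_tgt: List[float],
                    miss_src: List[int], miss_tgt: List[int],
                    n_rounds: int) -> Tuple[List[float], List[float], float]:
    """One TrAdaBoost reweighting round (after Dai et al., ICML 2007).

    miss_*[i] = |h_t(x_i) - y_i| in {0, 1} for this round's weak learner;
    src = source domain (web images), tgt = target domain (video).
    Weights are returned unnormalized.
    """
    n = len(w_src)
    # Fixed factor that shrinks misclassified source instances each round.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Weighted error is measured on target instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # added safeguard: keep beta_t in (0, 1)
    beta_t = eps / (1.0 - eps)
    new_src = [w * beta ** m for w, m in zip(w_src, miss_src)]  # source mistakes lose weight
    new_tgt = [w * beta_t ** (-m) for w, m in zip(w_tgt, miss_tgt)]  # target mistakes gain weight
    return new_src, new_tgt, beta_t
```

Starting from uniform weights of 0.5 with one misclassified instance in each domain and N = 10, the missed source image drops to about 0.33 and the missed target frame rises to 1.5 before normalization, so later rounds focus on the video data.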

Page 11

Painful Experience on TRECVID

[Figure: performance of the baseline detector (TRECVID training data alone), cross-domain learning (ASVM-SF, TrAdaBoost-SF), and web images alone (SP, SF), compared with 64 runs of other SIN systems]

Negative transfer!

Negative transfer happens when knowledge transfer has a negative impact on the target domain.

| Dataset | Training set | Testing set | # evaluated concepts | # positive instances |
|---|---|---|---|---|
| TRECVID 2011 | 266,474 | 137,327 | 50/346 | 1800 |

Page 12

Positive or negative transfer?

• Number of training examples?
• Type of concept? People, object, scene, event
• Change of data distribution?

[Figure: percentage of improved concepts versus number of positive training examples, in buckets of < 500, 500~1000, 1001~2000, and > 2000 positive examples; bar labels read 2/4, 1/13, 1/14, and 2/19 improved concepts]
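The slide's chart counts, within each bucket of positive training examples, how many concepts improve over the baseline. A minimal sketch of that bookkeeping, assuming per-concept records of (number of positives, baseline AP, transfer AP) and treating "improved" as strictly higher AP; the record format, `BUCKETS` bounds, and function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Buckets from the slide; bounds are inclusive and assume integer counts.
BUCKETS = [("< 500", 0, 499), ("500~1000", 500, 1000),
           ("1001~2000", 1001, 2000), ("> 2000", 2001, float("inf"))]


def improved_by_bucket(results: List[Tuple[int, float, float]]) -> Dict[str, Tuple[int, int]]:
    """results holds (num_positives, baseline_ap, transfer_ap) per concept.

    Returns {bucket: (improved, total)}; a concept counts as improved
    (positive transfer) only if transfer_ap > baseline_ap.
    """
    counts = {name: [0, 0] for name, _, _ in BUCKETS}
    for n_pos, ap_base, ap_transfer in results:
        for name, lo, hi in BUCKETS:
            if lo <= n_pos <= hi:
                counts[name][1] += 1
                if ap_transfer > ap_base:
                    counts[name][0] += 1
                break
    return {name: (c[0], c[1]) for name, c in counts.items()}
```

Dividing improved by total in each bucket gives the plotted percentage; a bucket with no concepts is better reported as empty than as 0%.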

Page 13

A case study on cross-domain learning

• Target domain (Web videos): TRECVID 2012 dataset
• Source domain (Web images):
  – Semantic Field: 1000 positive examples per concept
  – Semantic Pooling: SF + an additional 1000 examples per concept, pooled from 16,367 concepts and 0.7 million images

| Dataset | Training set | Testing set | # evaluated concepts | # positive instances |
|---|---|---|---|---|
| TRECVID 2012 | 400,289 | 145,634 | 46/346 | 1200 |

Page 14

Basic Framework

[Figure: framework diagram built around a SIFT feature space]

Page 15

Number of positive examples?

• A-SVM-SF: Semantic Field + A-SVM
• A-SVM-SP: Semantic Pooling + A-SVM
• Baseline: learned using TRECVID training examples

[Figure: MInfAP versus number of positive instances for Baseline, A-SVM-SF, and A-SVM-SP, with regions marked -transfer and +transfer]

Pooling is a practical strategy to diversify the coverage of training examples

22/46 concepts improve when each concept has only 100 positive examples
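Results here are reported as MInfAP, the mean over concepts of inferred average precision, which TRECVID estimates from sampled relevance judgments. The sketch below computes ordinary non-interpolated AP on a fully judged ranked list and averages it over concepts; it illustrates the quantity that infAP estimates, not the TRECVID sampling procedure, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP of one ranked list; labels are 1 (relevant) or 0.

    Assumes every relevant item appears in the list (fully judged).
    Ties in score keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, rel) in enumerate(ranked, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_concept: List[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Average AP over concepts (plain MAP, standing in for MInfAP)."""
    aps = [average_precision(s, l) for s, l in per_concept]
    return sum(aps) / len(aps)
```

For instance, relevant shots at ranks 1 and 3 give AP = (1/1 + 2/3) / 2 ≈ 0.83.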

Page 16

Type of concept?

[Figure: four panels plotting MInfAP versus number of positive instances, one each for Scene (15), Object (10), People (12), and Event (8) concepts]

Probably not a good idea to use images for learning events

Page 17

Change in Data Distribution?

Maximum Mean Discrepancy (MMD)

[Figure: concepts split into 23 concepts with lower mismatch and 23 concepts with higher mismatch]
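Maximum Mean Discrepancy compares two samples through their mean embeddings under a kernel: MMD² = E[k(x, x')] + E[k(y, y')] - 2E[k(x, y)]. A minimal biased estimator follows, assuming a Gaussian (RBF) kernel with a hand-picked `gamma` and features given as plain lists; the slides do not state which kernel, bandwidth, or features were used.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2(xs: List[Vector], ys: List[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between samples xs and ys.

    E.g. xs = features of a concept's web images, ys = features of its
    video keyframes (assumed setup; the kernel and gamma are illustrative).
    """
    def mean_k(a: List[Vector], b: List[Vector]) -> float:
        return sum(rbf(p, q, gamma) for p in a for q in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

Identical samples give 0, and the value grows as the two distributions drift apart, which is what lets concepts be ranked from lower to higher mismatch.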

Page 18

[Figure: TRECVID frames and Flickr images for Forest, Computer, Singing, Meeting, Kitchen, Motorcycle, Throwing, Stadium, and Chair, grouped by break-even point (< 10, = 50, > 100)]
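The slides do not define the break-even point formally. One plausible reading, used in this hypothetical sketch: for a concept, it is the smallest number of target-domain positive examples at which the baseline detector matches or beats the transfer model, given AP measured at increasing training sizes. The function name and inputs are assumptions.

```python
from typing import Optional, Sequence


def break_even_point(sizes: Sequence[int], ap_baseline: Sequence[float],
                     ap_transfer: Sequence[float]) -> Optional[int]:
    """Smallest training size at which the baseline matches or beats transfer.

    Hypothetical definition: sizes are numbers of target-domain positive
    examples in increasing order, with AP measured for both detectors at each.
    Returns None if transfer stays ahead at every size tried.
    """
    for n, base, transfer in zip(sizes, ap_baseline, ap_transfer):
        if base >= transfer:
            return n
    return None
```

Under this reading, a concept with a break-even point below 10 gains little from web images, while one above 100 keeps benefiting from transfer even with a fair amount of video training data.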

Page 19

Average is average

[Figure: concepts plotted by MMD (from lower mismatch to higher mismatch) against break-even point (ticks at 50 and 100), with arrows indicating increasing difficulty. Concepts shown: Boat_Ship, Glasses, Singing, Airplane, Baby, Male_Person, Airplane_Flying, Instrumental_Musician, Oceans, Forest, Man_Wearing_A_Suit, Bridges, Military_Airplane, Fields, Stadium, Landscape, Skier, Press_Conference, Nighttime, Teenagers, Highway, Walking_Running, Lakes, Bicycling, Computers, Roadway_Junction, Apartments, Clearing, Girl, Civilian_Person, Kitchen, Motorcycle, Meeting, Female_Person, Government-Leader, Sitting_Down, Hill, Boy, Soldiers, Chair, Basketball, Throwing, Office, George_Bush, Scene_Text, Greeting]
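To check whether mismatch is a usable cue for difficulty, one can rank-correlate each concept's MMD with its break-even point. The slides do not name a statistic; this sketch swaps in Spearman's rank correlation (Pearson correlation of average ranks, so ties are handled), with hypothetical function names. A positive value would mean that higher mismatch tends to go with a later break-even point.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

It would be called as `spearman(mmd_per_concept, break_even_per_concept)` on two hypothetical per-concept lists in the same concept order.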

Page 20

Questions?

Key ideas
• Using Web images to learn concept classifiers for the video (TRECVID) domain
• Feasibility of transfer learning?

Messages
• Transfer helps when the target domain has fewer than 100 positive examples
• Events might be difficult to transfer
• Data distribution can be a cue to predict the difficulty
• Pooling strategy has a better chance of achieving positive transfer