Video Concept Detection by Learning from Web Images
Shiai Zhu, Ting Yao, Chong-Wah Ngo
City University of Hong Kong
Why concepts for media remixing?
00:00 00:10 00:20 Time (seconds)
Cake is in preparation
Someone is talking
Background music is playing
…
An example of event recounting
Recounting using 21 concept classifiers
kitchen, outdoor/indoor, baseball field, crowd, cake, walking, running, squatting, standing, hand, batting, speech, music, clipping, cheering
Why is concept learning challenging?
Requires thousands of concepts for practical applications
Collecting training examples is always expensive
Economical solution: get the training examples for free from the Internet
Thousands of new uploads per minute
[Figure: example Web image with user tags: Zoom, Wow, Usa, Texas, Critter, Creature, Olympus, Closeup, Cat, June2010, Animal, pet]
[Figure: concept hierarchy with the nodes Building, Residence, Place of Worship, House, Country House, Temple, Church Building]
Approach I – Semantic Field (SF)
[Figure: concept hierarchy with the nodes Building, Residence, Place of Worship, House, Country House, Temple, Church Building]
Approach II – Semantic Pooling (SP)
Does it work practically?
[Figure: example keyframes for the concepts Dancing, Boy and Ocean, comparing TRECVID videos with Flickr images]
Transfer Learning
What is transferred, with an example method for each:
– Knowledge of instances: Wenyuan Dai, ICML 2007 (TrAdaBoost)
– Feature representation: Kate Saenko, ECCV 2010 (shared representation)
– Parameters (model): Jun Yang, ACM MM 2007 (Adaptive-SVM)
– Relational knowledge: Yu-Gang Jiang, ACM MM 2009 (semantic context transfer)
Transfer Learning
Adaptive SVM: model-level learning
TrAdaBoost: instance-level learning
[Figure: instance re-weighting illustration. Legend: target domain (video) data, source domain (image) data. Instance weights shown: 0.5, 0.8, 0.3]
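The instance-level re-weighting in TrAdaBoost can be sketched as follows. This is a minimal illustration of a single weight update in the style of Dai et al. (ICML 2007), not the code used in the talk. The function name `tradaboost_reweight`, its arguments and the clipping of the error are assumptions made for this example. Source (image) instances that the current classifier gets wrong are down-weighted, while misclassified target (video) instances are up-weighted.

```python
import math


def tradaboost_reweight(w_src, w_tgt, err_src, err_tgt, n_rounds):
    """One TrAdaBoost-style weight update (illustrative sketch).

    w_src, w_tgt: current instance weights for source and target data.
    err_src, err_tgt: per-instance errors |h(x) - y|, each 0 or 1.
    n_rounds: total number of boosting rounds planned.
    """
    n = len(w_src)
    # Weighted error of the current classifier on the target domain only.
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    # Assumption of this sketch: clip the error so the update stays defined.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source instances shrink by beta (< 1 when n > 1).
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]
    # Misclassified target instances grow by 1 / beta_t (> 1).
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]
    return new_src, new_tgt
```

The weights would normally be renormalised before training the next weak learner; that step is left out to keep the update itself visible.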
Painful Experience on TRECVID
[Figure: TRECVID results. Legend: 64 runs of other SIN systems; TRECVID training data alone (Baseline detector); cross-domain learning (ASVM-SF, TrAdaBoost-SF); Web images alone (SP, SF)]
Negative transfer!
Negative transfer happens when knowledge transfer has a negative impact on the target domain.
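One simple way to make this definition operational is to compare, per concept, the score of a detector trained with transfer against the baseline trained on target data alone. The sketch below assumes average precision as the score. The function names and the example numbers in the usage note are hypothetical and do not come from the slides.

```python
def transfer_gain(ap_baseline, ap_transfer):
    """Change in target-domain average precision caused by transfer."""
    return ap_transfer - ap_baseline


def is_negative_transfer(ap_baseline, ap_transfer):
    """True when the transferred model is worse than the baseline."""
    return transfer_gain(ap_baseline, ap_transfer) < 0
```

For instance, `is_negative_transfer(0.20, 0.15)` would flag a concept whose detector got worse after adding Web images.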
Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2011   266,474        137,327       50/346                 1800
Positive or negative transfer?
Number of training examples?
Type of concept? People, object, scene, event
Change of data distribution?
[Figure: percentage of improved concepts versus number of positive training examples. Bins: < 500, 500~1000, 1001~2000, > 2000 positive examples. Improved/total concepts as labelled on the plot: 2/4, 1/13, 1/14, 2/19]
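The fractions on this plot can be turned into percentages with a few lines of arithmetic. The sketch below takes the four improved/total pairs in the order they appear on the slide. Which bin each pair belongs to is not recoverable from the transcript, so no bin labels are attached. The totals add up to 50, which matches the number of evaluated concepts for TRECVID 2011.

```python
# (improved, total) pairs in the order they appear on the slide.
counts = [(2, 4), (1, 13), (1, 14), (2, 19)]


def percent_improved(improved, total):
    """Percentage of concepts that improved with cross-domain learning."""
    return 100.0 * improved / total


per_bin = [round(percent_improved(i, t), 1) for i, t in counts]
n_improved = sum(i for i, _ in counts)
n_total = sum(t for _, t in counts)
overall = percent_improved(n_improved, n_total)

print(per_bin)                       # [50.0, 7.7, 7.1, 10.5]
print(n_improved, n_total, overall)  # 6 50 12.0
```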
A case study on cross-domain learning
Target domain (Web videos)
– TRECVID 2012 dataset
Source domain (Web images)
– Semantic Field
  1000 positive examples per concept
– Semantic Pooling
  SF + additional 1000 examples per concept
  16,367 concepts + 0.7 million images for pooling
Dataset        Training set   Testing set   # evaluated concepts   # positive instances
TRECVID 2012   400,289        145,634       46/346                 1200
Basic Framework
[Figure: SIFT feature space]
Number of positive examples?
A-SVM-SF: Semantic Field + A-SVM
A-SVM-SP: Semantic Pooling + A-SVM
Baseline: learnt using TRECVID training examples
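The model-level adaptation behind the two A-SVM variants can be illustrated with a linear toy version. Adaptive SVM keeps the source classifier fixed and learns a perturbation on target data, so that f(x) = f_src(x) + Δf(x). The sketch below is a simplified primal variant trained by subgradient descent, not the kernelised formulation of the original method and not the setup used in the talk. The names `train_linear_asvm` and `asvm_score` and all hyperparameters are assumptions for illustration.

```python
def train_linear_asvm(X, y, f_src, C=1.0, lr=0.1, epochs=200):
    """Learn w so that f(x) = f_src(x) + w.x fits the target data.

    X: list of feature vectors; y: labels in {-1, +1};
    f_src: callable returning the fixed source classifier score.
    Minimises 0.5 * ||w||^2 + C * sum of hinge losses (subgradient descent).
    """
    dim = len(X[0])
    w = [0.0] * dim
    src_scores = [f_src(x) for x in X]  # the source model is never updated
    for _ in range(epochs):
        grad = list(w)  # gradient of the regulariser 0.5 * ||w||^2
        for x, label, s in zip(X, y, src_scores):
            score = s + sum(wi * xi for wi, xi in zip(w, x))
            if label * score < 1.0:  # hinge loss is active for this example
                for j in range(dim):
                    grad[j] -= C * label * x[j]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def asvm_score(x, w, f_src):
    """Adapted score: source score plus the learned perturbation."""
    return f_src(x) + sum(wi * xi for wi, xi in zip(w, x))
```

Because the regulariser penalises only the perturbation w, the adapted model stays close to the source classifier unless the target examples give enough evidence to move away from it.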
[Figure: MinfAP versus number of positive instances for Baseline, A-SVM-SF and A-SVM-SP, with regions marked -transfer and +transfer]
Pooling is a practical strategy to diversify the coverage of training examples
22/46 concepts improve if each concept only has 100 positive examples
Type of Concept?
[Figure: four panels of MinfAP versus number of positive instances: Scene (15), Object (10), People (12), Event (8)]
Probably not a good idea to use images for learning events
Change in Data Distribution?
Maximum Mean Discrepancy (MMD)
23 concepts with lower mismatch 23 concepts with higher mismatch
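As a rough illustration of how such a mismatch score can be computed, the sketch below implements the standard biased empirical estimate of squared MMD with an RBF kernel, plus a helper that splits concepts into a lower-mismatch and a higher-mismatch half. The kernel choice, the `gamma` value and the median split are assumptions of this example. The slides do not state which kernel, features or threshold were used.

```python
import math


def rbf(a, b, gamma):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd2(X, Y, gamma=1.0):
    """Biased empirical estimate of squared MMD between samples X and Y."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy


def split_by_mismatch(mmd_by_concept):
    """Sort concepts by MMD and split them into lower and higher halves."""
    ranked = sorted(mmd_by_concept, key=mmd_by_concept.get)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]
```

Here `X` would hold features of the video keyframes for one concept and `Y` the features of the Web images for the same concept. A larger value indicates a larger gap between the two domains.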
[Figure: example keyframes from TRECVID and Flickr for the concepts Forest, Computer, Singing, Meeting, Kitchen, Motorcycle, Throwing, Stadium and Chair]
[Figure: break-even point (axis ticks 50, 100) versus MMD. Legend: break-even point < 10, = 50, > 100]
Average is average
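The slides do not define the break-even point explicitly. One plausible reading, assumed here, is the smallest number of positive target examples at which the baseline trained on video data alone catches up with the transferred model. The sketch below implements that reading. The function name is hypothetical, and the numbers in the usage note are made up for illustration only.

```python
def break_even_point(n_positives, ap_baseline, ap_transfer):
    """Smallest training-set size at which the baseline matches transfer.

    n_positives: increasing numbers of positive target examples.
    ap_baseline, ap_transfer: scores of the two models at each size.
    Returns None if the baseline never catches up in the given range.
    """
    for n, base, trans in zip(n_positives, ap_baseline, ap_transfer):
        if base >= trans:
            return n
    return None
```

With sizes `[10, 50, 100, 200]`, baseline scores `[0.02, 0.05, 0.09, 0.12]` and transfer scores `[0.06, 0.07, 0.08, 0.09]`, the function returns 100: transfer helps below that size and hurts above it.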
[Figure: the 46 evaluated concepts arranged by difficulty and grouped into lower mismatch and higher mismatch. Concepts shown: Boat_Ship, Glasses, Singing, Airplane, Baby, Male_Person, Airplane_Flying, Instrumental_Musician, Oceans, Forest, Man_Wearing_A_Suit, Bridges, Military_Airplane, Fields, Stadium, Landscape, Skier, Press_Conference, Nighttime, Teenagers, Highway, Walking_Running, Lakes, Bicycling, Computers, Roadway_Junction, Apartments, Clearing, Girl, Civilian_Person, Kitchen, Motorcycle, Meeting, Female_Person, Government-Leader, Sitting_Down, Hill, Boy, Soldiers, Chair, Basketball, Throwing, Office, George_Bush, Scene_Text, Greeting]
Question?
Key ideas
• Using Web images to learn concept classifiers for the video (TRECVID) domain
• Feasibility of transfer learning?
Messages
• When positive examples in the target domain < 100
• Events might be difficult to transfer
• Data distribution can be a cue to predict the difficulty
• The pooling strategy has a better chance of achieving positive transfer