Fusing Semantic, Observability, Reliability and Diversity of Concept Detectors
for Video Search
Xiao-Yong WEI, Chong-Wah Ngo
Dept. of Computer Science
City University of Hong Kong
ACM Multimedia 2008, Vancouver, Canada
Query: Find shots of military personnel or soldiers together with military vehicles or weapons
Which concepts are related to this query?
Thinking (e.g., IS-A relation): armored car, armed person, tank
Observing (occur together): explosion, flag, (entertainment)
[Figure: example shots labeled explosion, military vehicle, soldiers]
Concepts named in the query: military personnel, soldier, military vehicle, weapon. What else?
How to model different types of relations among concepts?
Thinking: Semantic Space
Observing: Observability Space
Outline
Introduction
Semantic Space vs. Observability Space
Concept Selection and Fusion
Experimental Results
Conclusions
Introduction - Background
Video Search vs. Semantic Gap
User Level: queries in natural language
Multimedia Level: machine-computable low-level features (low-level representations of text, image, motion, audio)
Semantic Gap: between the user-level query and the low-level features
Concept-based Video Search
High-Level Concepts (Concept, Concept, Concept, …): high-level semantic layer between the query and the low-level features
Vocabularies Set (General Knowledge): general vocabularies between the query and the high-level concepts
[Figure: data flow across the User Level, general vocabularies, high-level concepts, and low-level features at the Multimedia Level; example: protest, with concepts Crowd … Banner]
Critical questions to answer
How many and which detectors should be developed?
Which concepts should be selected to describe the query?
How to answer the query with selected concepts?
How many and which concepts should be developed?
Large Scale Concept Ontology for Multimedia (LSCOM)
MediaMill – 101
TRECVID
Which concepts should be selected?
Query-to-concept mapping
Ontology reasoning: Resnik, JCN, WUP
  Ontology: Object > military personnel > soldier; Object > military vehicle > tank, armored car
  Queries: … military personnel or military vehicles …
  Concepts: animal, soldier, bus, tank, armored car, car, explosion
Comparing to text descriptions (definitions) of concepts
  Soldier: is a … military personnel
  Tank: is a … military vehicle
  Armored car: is a … military vehicle
  Bus: is a …
Statistic-based (e.g., by Internet)
  Explosion and military vehicle frequently occur together …
Example-based [C. G. M. Snoek, IEEE Trans. on Multimedia, 2007]
Vector-based (for image and video query examples) [John R. Smith, ICME’03]
None of the existing methods jointly considers semantics and observability
Problem of concept selection
  Semantics
  Observability
How to answer the query with selected concepts?
Most simply use linear fusion
  Semantics
  Reliability
  Observability?
  Diversity?
Example: person, face, police, newspaper (people-related)
Framework
[Figure: framework overview; output: relevant shot list]
Construction of Semantic Space
Semantic Space
Ontology
Ontology-enriched Semantic Space (OSS) - Global Consistency [X.-Y. Wei, MM07]
OSS - Global Consistency
Conventional Ontology Reasoning: local measure
Ontology: weapon > gun, tank, armored car
Query: tank
Sim(tank, gun) = Sim(tank, armored car)
Which is closer to tank: gun or armored car?
[Figure: alternative hierarchy with weapon > gun and vehicle > tank, armored car]
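The tie on the slide can be reproduced with a small sketch of Wu-Palmer (WUP) similarity, one of the ontology measures named in this talk. The two toy hierarchies mirror the slide's figures. The root node "object" and the helper names `path_to_root` and `wup` are assumptions added for illustration, not part of the original material.

```python
def path_to_root(parent, node):
    """Return the path from node up to the root, node first."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def wup(parent, a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)).

    Depth counts the nodes on the path to the root, so the root has depth 1.
    """
    path_a, path_b = path_to_root(parent, a), path_to_root(parent, b)
    on_path_a = set(path_a)
    # Lowest common subsumer: first ancestor of b that also subsumes a.
    lcs = next(n for n in path_b if n in on_path_a)
    depth_lcs = len(path_to_root(parent, lcs))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


# Child -> parent maps; "object" is a hypothetical root added for illustration.
flat = {"object": None, "weapon": "object",
        "gun": "weapon", "tank": "weapon", "armored car": "weapon"}
split = {"object": None, "weapon": "object", "vehicle": "object",
         "gun": "weapon", "tank": "vehicle", "armored car": "vehicle"}

for name, tree in (("flat", flat), ("split", split)):
    print(name,
          round(wup(tree, "tank", "gun"), 3),
          round(wup(tree, "tank", "armored car"), 3))
```

In the flat hierarchy both pairs share the same parent, so the local measure cannot separate gun from armored car. In the split hierarchy, tank and armored car share the deeper ancestor and score higher.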
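Construction of Semantic Space
WordNet: weapon > gun; vehicle > tank, armored car
Concept-by-concept similarity matrix (gun, tank, armored car, weapon, vehicle, …)
Minimize redundancy: concepts represented by their relations to weapon and vehicle
Space transformation: Ontology-enriched Semantic Space
[Figure: gun, armored car, tank, vehicle and weapon plotted in the space spanned by bases B1 and B2]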
Construction of Observability Space
LSCOM annotation
Concept-by-concept matrix (road, boat, water, car, vehicle, …) measured by Pearson product-moment (PM)
Minimize redundancy: concepts represented by their relations to road and water
Space transformation: Observability Space
[Figure: boat, car, vehicle, road and sky plotted in the space spanned by bases B1 and B2]
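As a rough sketch of the first step, the Pearson product-moment correlation between two concepts can be computed from their per-keyframe annotations. The binary annotation vectors below are made-up illustration data, not LSCOM annotations, and the function name `pearson` is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a concept that is always or never present carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical binary annotations: one entry per keyframe, 1 = concept present.
annotations = {
    "road":  [1, 1, 1, 0, 0, 0, 1, 0],
    "car":   [1, 1, 0, 0, 0, 0, 1, 0],
    "water": [0, 0, 0, 1, 1, 0, 0, 1],
    "boat":  [0, 0, 0, 1, 1, 0, 0, 0],
}

concepts = list(annotations)
# Full concept-by-concept PM matrix, stored as a dict keyed by concept pairs.
pm = {(a, b): pearson(annotations[a], annotations[b])
      for a in concepts for b in concepts}
print(round(pm[("car", "road")], 3), round(pm[("car", "water")], 3))
```

With this toy data, car correlates positively with road and negatively with water, which is the kind of co-occurrence evidence the Observability Space is built from.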
LSCOM and Concept Annotation
Observability: solving the problem of missing annotation
Vehicle is easily ignored by annotators when they annotate a keyframe in which a car is present. [J.R. Kender, ICME07]
[Figure: PM matrix over road, boat, water, car, vehicle, with the entry PM(car, vehicle) marked]
Minimize redundancy: car and vehicle are represented by road and water
When car and vehicle are represented by road and water, their observability relation is also transferred through the two concepts. This relation does not rely on PM(car, vehicle).
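A minimal sketch of this idea follows, assuming each concept is described only by its PM values against the basis concepts road and water, and that concepts are compared by cosine similarity. The coordinates are hypothetical, and cosine is an assumed choice of comparison, not something the slides specify.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical PM values of each concept against the basis (road, water).
# PM(car, vehicle) is deliberately absent: it is never consulted.
coords = {
    "car":     (0.80, -0.30),
    "vehicle": (0.65, -0.10),
    "boat":    (-0.25, 0.85),
}

print(round(cosine(coords["car"], coords["vehicle"]), 3))
print(round(cosine(coords["car"], coords["boat"]), 3))
```

Car and vehicle come out close because both relate to road and water in a similar way, even though their direct correlation was never used.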
Semantic Space vs. Observability Space
[Figure: dendrograms created by SS and OS; panels: Semantic Space, Observability Space]
Concept Selection
Anchor concepts: represent the semantic aspects of a query
Bridge concepts: represent the context of a query
Positive concepts: concepts frequently co-occur with the target concept
Negative concepts: concepts never co-occur with the target concept
Concept Selection – Query-to-Concept Semantic Mapping
Selecting Anchor concepts in SS
One concept to each query term
Representing the semantic aspect of the query
[Figure: concept vectors v1, v2, v3 and the vector of a query item in SS]
Query: Find vehicles on the way
Anchor concepts in SS: Vehicle, Road
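The "one concept to each query term" rule can be sketched as a nearest-neighbor lookup in SS. The 2-D coordinates, the cosine measure, and the name `select_anchors` are assumptions for illustration. The real space has hundreds of dimensions, and the slides do not state the distance measure.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_anchors(term_vectors, concept_vectors):
    """Map each query term to the single closest concept in the space."""
    return {
        term: max(concept_vectors,
                  key=lambda c: cosine(vec, concept_vectors[c]))
        for term, vec in term_vectors.items()
    }


# Hypothetical 2-D coordinates in SS, for illustration only.
concept_vectors = {
    "Vehicle": (0.9, 0.1),
    "Road": (0.2, 0.9),
    "Tennis": (-0.7, 0.3),
}
term_vectors = {"vehicles": (0.95, 0.05), "way": (0.3, 0.8)}

anchors = select_anchors(term_vectors, concept_vectors)
print(anchors)
```

For the query "Find vehicles on the way", this toy setup selects Vehicle for "vehicles" and Road for "way", matching the anchors on the slide.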
Concept Selection – Detector Mining in OS
Selecting Bridge Concepts in OS
Forming subspaces to represent the context of the query
Observability Gap between Anchor Concepts
  More specific concepts in the context (car)
  Latent concept not defined in SS (car_on_road)
Query: Find vehicles on the way
SS: Vehicle, Road
[Figure: OS containing Vehicle, Road, Car, Car_on_road, water, boat]
Concept Selection – Mining positive and negative concepts in OS
Target concept: Vehicle
Positive: Car, Road, Truck
Negative: Outer Space, Tennis
[Figure: Road, Car and vehicle in OS]
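One simple reading of "frequently co-occur" and "never co-occur" is a threshold on co-occurrence counts over annotated shots. The sketch below makes that reading concrete. The threshold `min_ratio`, the function name `mine_related`, and the shot annotations are all assumptions, and the talk's actual mining is done in OS rather than on raw counts.

```python
def mine_related(target, shot_labels, min_ratio=0.5):
    """Split the other concepts into positive and negative ones for `target`.

    positive: present in at least `min_ratio` of the shots containing `target`
    negative: never present in a shot containing `target`
    """
    target_shots = [labels for labels in shot_labels if target in labels]
    vocabulary = set().union(*shot_labels) - {target}
    positive, negative = [], []
    for concept in sorted(vocabulary):
        together = sum(concept in labels for labels in target_shots)
        if together == 0:
            negative.append(concept)
        elif together / len(target_shots) >= min_ratio:
            positive.append(concept)
    return positive, negative


# Hypothetical shot-level annotations (one set of concept labels per shot).
shots = [
    {"vehicle", "car", "road"},
    {"vehicle", "truck", "road"},
    {"vehicle", "car", "road", "sky"},
    {"vehicle", "truck"},
    {"tennis", "sky"},
    {"outer space"},
]

positive, negative = mine_related("vehicle", shots)
print(positive, negative)
```

Concepts such as sky, which co-occur with the target only occasionally, end up in neither list.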
Concept Fusion – Reliability-based fusion
Enrich target concepts with its positive concepts
Refine target concept's detector scores with its negative concepts (filters)
Target: Vehicle
Positive: Truck, Car, Road
Negative: Outer Space, Tennis
[Figure: vehicle + (car, truck, road), filtered by (outer space, tennis) = refined vehicle detector]
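The enrich-then-filter idea can be illustrated with a toy scoring rule. The formula below (average with the positive concepts, then scale by one minus the strongest negative score) is an assumption chosen for simplicity and is not the fusion formula from the paper. The detector scores are hypothetical.

```python
def reliability_fusion(scores, target, positive, negative):
    """Fuse per-shot detector scores (all in [0, 1]) for one target concept.

    Enrich: average the target score with the scores of its positive concepts.
    Filter: scale by (1 - strongest negative score), so a confident negative
    detector suppresses the shot.
    """
    enriched = sum(scores[c] for c in [target, *positive]) / (1 + len(positive))
    strongest_negative = max((scores[c] for c in negative), default=0.0)
    return enriched * (1.0 - strongest_negative)


# Hypothetical detector outputs for two shots.
street_shot = {"vehicle": 0.4, "car": 0.8, "truck": 0.1, "road": 0.9,
               "outer space": 0.0, "tennis": 0.1}
court_shot = {"vehicle": 0.4, "car": 0.1, "truck": 0.0, "road": 0.2,
              "outer space": 0.0, "tennis": 0.9}

positive = ["car", "truck", "road"]
negative = ["outer space", "tennis"]
street = reliability_fusion(street_shot, "vehicle", positive, negative)
court = reliability_fusion(court_shot, "vehicle", positive, negative)
print(round(street, 4), round(court, 4))
```

Both shots have the same raw vehicle score. After fusion the street shot ranks well above the tennis-court shot, which is the intended effect of using positive concepts as support and negative concepts as filters.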
Multi-level Detector Fusion – Observability-based Fusion (in OS)
Enrich anchor concepts with bridge concepts
Query: Find vehicle on the way
Anchor concepts selection in SS: Vehicle + Road
Bridge concepts selection in OS: Car, Car_on_road
[Figure: detector scores of vehicle combined with car and Car_on_road, shown step by step]
Answer the query with the reliability-improved and observability-enriched anchor concepts
Multi-level Detector Fusion – Semantic-based Fusion (in SS)
Query: Find vehicle on the way
Vehicle + Road
Semantic(vehicle, Vehicle)
Semantic(way, Road)
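One plausible reading of this slide is a linear fusion in which each anchor detector is weighted by the semantic similarity between the concept and the query term it was selected for. The sketch below assumes a normalized weighted average. The similarity values, the scores, and the name `semantic_fusion` are hypothetical.

```python
def semantic_fusion(shot_scores, anchor_weights):
    """Weighted average of anchor detector scores for one shot.

    anchor_weights maps each anchor concept to the semantic similarity
    between that concept and the query term it was selected for.
    Assumes at least one weight is positive.
    """
    total = sum(anchor_weights.values())
    return sum(w * shot_scores[c] for c, w in anchor_weights.items()) / total


# Hypothetical values for Semantic(vehicle, Vehicle) and Semantic(way, Road).
weights = {"Vehicle": 1.0, "Road": 0.6}
shot = {"Vehicle": 0.7, "Road": 0.5}

fused = semantic_fusion(shot, weights)
print(round(fused, 4))
```

A concept that matches its query term only loosely, such as Road for "way", contributes less to the final score than an exact match.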
Multi-level Detector Fusion – Diversity-based Fusion
Consider diversity of anchor concepts in concept fusion
Anchor concepts: person, face, police, newspaper
Clustering: person, face, police (people-related); newspaper
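A small sketch of why clustering matters follows, assuming each cluster contributes one averaged vote to the fused score. This two-stage averaging and the name `diversity_fusion` are assumptions for illustration, not the paper's formula. The clusters follow the slide's example, and the scores are made up.

```python
def diversity_fusion(shot_scores, clusters):
    """Average scores inside each cluster, then average across clusters.

    Each cluster of similar anchor concepts contributes one vote, so a
    group of near-duplicate concepts cannot dominate the fused score.
    """
    cluster_means = [sum(shot_scores[c] for c in members) / len(members)
                     for members in clusters]
    return sum(cluster_means) / len(cluster_means)


# Clusters follow the slide's example; the detector scores are hypothetical.
clusters = [["person", "face", "police"], ["newspaper"]]
shot = {"person": 0.9, "face": 0.8, "police": 0.7, "newspaper": 0.1}

flat_mean = sum(shot.values()) / len(shot)
diverse = diversity_fusion(shot, clusters)
print(round(flat_mean, 3), round(diverse, 3))
```

With plain linear fusion, the three people-related detectors outvote newspaper. With cluster-level fusion, a shot that shows people but no newspaper receives a lower score.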
Experimental Results – Dataset and Evaluation
Datasets from TRECVID 2005 to 2007 with more than 285 hours of video and 72 queries
VIREO-374 detectors trained using the TRECVID 2005 development set
Top 1000 shots in the returned list are evaluated by using Average precision (AP)
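For reference, non-interpolated average precision over a ranked list can be sketched as below. Normalizing by the total number of relevant shots is one common convention and is assumed here. The shot IDs, the ground truth, and the function name `average_precision` are hypothetical.

```python
def average_precision(ranked_shots, relevant, depth=1000):
    """Non-interpolated average precision over the top `depth` results."""
    hits, precision_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots[:depth], start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant) if relevant else 0.0


ranked = ["s3", "s7", "s1", "s9", "s4"]   # hypothetical returned list
relevant = {"s3", "s1", "s8"}             # hypothetical ground truth

ap = average_precision(ranked, relevant)
print(round(ap, 4))
```

Setting `depth` to 30, 50 or 100 gives a top-k variant of the same measure, and averaging AP over all queries gives MAP.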
Experimental Results – Space Construction
Concept selections by using SS and OS
SS: 572 concepts, WordNet, WUP -> 366 dimensions
OS: 374 concepts, LSCOM, PM -> 253 dimensions
[Figure: anchor concepts for the query "Find shots of a person walking or riding a bicycle"]
Experimental Results – Video Search Performance
Semantic-based fusion (S)
Reliability-based fusion (R)
Observability-based fusion (O)
Diversity-based fusion (D)
[Figure: top-k performance on TV07 dataset; x-axis: AP-30, AP-50, AP-100, AP-1000; series: S-only, S+O, S+OR, S+ORD]
Experimental Results – Video Search Performance
Performance based on Query Types
Event – 31 queries
Person or Thing (PT) – 19 queries
Place – 14 queries
Name Entity (NE) – 12 queries
[Figure: # of queries per query type (Event, PT, Place, NE)]
[Figure: MAP per query type (Event, PT, Place, NE) for S-only, S+O, S+OR, S+ORD; successive slides highlight observability-based, diversity-based and reliability-based fusion in turn]
Experimental Results – Comparison to Ontology Reasoning
[Figure: S+ORD, OSS, RES, JCN, WUP and Lesk compared on TV07, TV06 and TV05]
[Figure: S+ORD, OSS, RES, JCN, WUP and Lesk compared by query type (Event, PT, Place, NE)]
Experimental Results – Compare to TRECVID Submissions
Our runs are Visual-Only
[Figure: TV05 runs; our runs: S-only, S-O, S-OR, S-ORD]
[Figure: TV06 runs and TV07 runs; our runs: S-only, S-O, S-OR, S-ORD]
[Figure: TV08 runs (Type A); our runs: S-only, S-ORD]
Conclusion
Two spaces complement each other in concept selection
  SS provides model for semantic reasoning
  OS provides model for observability reasoning
  observability gap, bridge concepts
Multi-level concept fusion addresses different aspects of detectors
  Semantics
  Reliability (helpful for all types of queries)
  Observability (helpful for person+thing and place queries)
  Diversity (helpful for event related queries)
Future work
  Concept Frequency
  Causality
  Multi-modality fusion
Thanks!
Presented by Xiao-Yong WEI