fusing semantic, observability, reliability and diversity ...xiaoyong/papers/mm08.ppt.pdffusing...

Fusing Semantic, Observability, Reliability and Diversity of Concept Detectors for Video Search

Xiao-Yong WEI, Chong-Wah Ngo

Dept. of Computer Science, City University of Hong Kong

ACM Multimedia 2008, Vancouver, Canada

Query: Find shots of military personnel or soldiers together with military vehicles or weapons

Which concepts are related to this query?

Military personnel, soldier, military vehicle, weapon. What else?

Thinking (e.g., IS-A relation): armored car, armed person, tank
Observing (occur together): explosion, flag, (entertainment)

[Figure: example keyframes labelled explosion, military vehicle, soldiers.]

How to model different types of relations among concepts?

Thinking: Semantic Space
Observing: Observability Space

Outline

Introduction
Semantic Space vs. Observability Space
Concept Selection and Fusion
Experimental Results
Conclusions

Introduction - Background

Video Search vs. Semantic Gap

[Figure: queries at the user level (natural language) are separated by the semantic gap from low-level features at the multimedia level (text, image, motion, audio; machine computable).]

Concept-based Video Search

[Figure: high-level concepts (high-level semantics), supported by a vocabulary set of general knowledge, sit between the queries and the low-level features and carry the data flow across the semantic gap. Example: Crowd … Banner for the query "protest".]

Critical questions to answer

How many and which detectors should be developed?
Which concepts should be selected to describe the query?
How to answer the query with selected concepts?
How many and which concepts should be developed?

Large Scale Concept Ontology for Multimedia (LSCOM)
MediaMill – 101
TRECVID

Which concepts should be selected?

Query-to-concept mapping

Query-to-concept mapping

Ontology reasoning: Resnik, JCN, WUP

[Figure: an ontology rooted at Object, with military personnel above soldier and military vehicle above tank and armored car. The query terms "… military personnel or military vehicles …" are matched against the concepts animal, soldier, bus, tank, armored car, car, explosion.]

Query-to-concept mapping

Comparing to text descriptions (definitions) of concepts

Soldier: is a … military personnel
Tank: is a … military vehicle
Armored car: is a … military vehicle
Bus: is a …

[Figure: the query terms "… military personnel …" are compared with the descriptions of the concepts animal, soldier, bus, tank, armored car, car, explosion.]

Query-to-concept mapping

Statistic-based (e.g., by Internet)

Explosion and military vehicle frequently occur together…

[Figure: the query terms "… military personnel …" are related to the concepts animal, soldier, bus, tank, armored car, car, explosion.]

Query-to-concept mapping

Ontology reasoning: Resnik, JCN, WUP
Comparing to text descriptions (definitions) of concepts
Statistic-based (e.g., by Internet)
Example-based [C. G. M. Snoek, IEEE Trans. on Multimedia, 2007]
Vector-based (for image and video query examples) [John R. Smith, ICME’03]

None of the existing methods jointly considers semantics and observability.

Problem of concept selection

Semantics
Observability

How to answer the query with selected concepts?

Most methods simply use linear fusion
Semantics
Reliability
Observability?
Diversity?

Example: person, face, police, newspaper (people-related)

Framework

[Figure: framework overview; the output is a relevant shot list.]

Outline


Construction of Semantic Space

[Figure: Ontology → Semantic Space]

Ontology-enriched Semantic Space (OSS) - Global Consistency [X.-Y. Wei, MM07]

Conventional Ontology Reasoning

Ontology: weapon is the parent of gun, tank and armored car
Query: tank
Sim(tank, gun) = Sim(tank, armored car)
Which one should be selected: gun or armored car?
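The tie above can be reproduced with a minimal sketch of Wu-Palmer (WUP) similarity on a toy taxonomy. The `PARENT` table and the helper names are hypothetical, the root is counted as depth 1, and the code only illustrates why a purely local measure cannot separate gun from armored car; it is not the authors' implementation.

```python
# Toy taxonomy from the slide: weapon is the parent of gun, tank and armored car.
# The table and the helper names are illustrative assumptions.
PARENT = {
    "weapon": None,
    "gun": "weapon",
    "tank": "weapon",
    "armored car": "weapon",
}

def path_to_root(concept):
    """Return the list [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))

def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the
    # least common subsumer (LCS).
    lcs = next(c for c in path_to_root(b) if c in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

sim_gun = wup("tank", "gun")
sim_car = wup("tank", "armored car")
print(sim_gun, sim_car)
```

Both pairs share the same least common subsumer at the same depth, so the two scores are identical.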

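OSS - Global Consistency

Conventional Ontology Reasoning: local measure

[Figure: an ontology in which weapon is the parent of gun, and vehicle is the parent of tank and armored car.]

Construction of Semantic Space

[Figure: the WordNet ontology (weapon above gun; vehicle above tank and armored car) yields a pairwise concept matrix over gun, tank, armored car, weapon, vehicle, …. Redundancy is minimized so that concepts are described by a reduced set such as weapon and vehicle, and a space transformation produces the Ontology-enriched Semantic Space, where gun, tank, armored car, vehicle and weapon are plotted against the axes B1 and B2.]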

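Construction of Observability Space

[Figure: LSCOM annotation → Observability Space]

[Figure: the LSCOM annotation yields a pairwise Pearson product-moment (PM) matrix over road, boat, water, car, vehicle, …. Redundancy is minimized so that concepts are described by a reduced set such as road and water, and a space transformation produces the Observability Space, where boat, car, vehicle, road and sky are plotted against the axes B1 and B2.]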
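As a minimal sketch of the first step, the Pearson product-moment (PM) correlation between two concepts can be computed from their keyframe annotations. The `ANNOTATIONS` data below is invented for illustration (it is not LSCOM data), and the redundancy-minimization and space-transformation steps are not shown.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant annotation carries no co-occurrence signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical annotations: one entry per keyframe, 1 = concept present.
ANNOTATIONS = {
    "road":  [1, 1, 1, 0, 0, 0],
    "car":   [1, 1, 0, 0, 0, 0],
    "water": [0, 0, 0, 1, 1, 1],
    "boat":  [0, 0, 0, 1, 1, 0],
}

# Pairwise PM matrix, stored as a dictionary keyed by concept pairs.
concepts = list(ANNOTATIONS)
pm = {
    (a, b): pearson(ANNOTATIONS[a], ANNOTATIONS[b])
    for a in concepts
    for b in concepts
}
print(round(pm[("road", "car")], 3), round(pm[("road", "water")], 3))
```

In this toy data car correlates positively with road, while road and water never appear together and are perfectly anti-correlated.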

LSCOM and Concept Annotation: Observability

Solving the problem of missing annotation

[Figure: the PM matrix over road, boat, water, car, vehicle, …, with the entry PM(car, vehicle) highlighted; after minimizing redundancy, concepts are described by road and water.]

Vehicle is easily overlooked by annotators when they annotate a keyframe in which a car is present [J. R. Kender, ICME07].

When car and vehicle are both represented by road and water, their observability relation is also transferred through these two concepts. This relation does not rely on PM(car, vehicle).
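The transfer of the car-vehicle relation through road and water can be sketched as follows. Each concept is described only by its relation to the two retained concepts, and the vectors are then compared directly. The coordinates are invented, and cosine similarity is an assumption made for illustration; the slides do not state which measure is used in the space.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors given as equal-length lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical coordinates: each concept is described by its PM correlation
# with the two retained concepts (road, water). PM(car, vehicle) is never used.
OS_VECTORS = {
    "car":     [0.7, -0.5],
    "vehicle": [0.5, -0.3],
    "boat":    [-0.6, 0.8],
}

car_vehicle = cosine(OS_VECTORS["car"], OS_VECTORS["vehicle"])
car_boat = cosine(OS_VECTORS["car"], OS_VECTORS["boat"])
print(car_vehicle > car_boat)
```

Even if annotators never labelled vehicle on car keyframes, the two concepts still end up close because both relate to road and water in the same way.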

Semantic Space vs. Observability Space

[Figure: dendrograms created by SS and OS; panels: Semantic Space, Observability Space.]

Outline


Concept Selection

Anchor concepts: represent the semantic aspects of a query
Bridge concepts: represent the context of a query
Positive concepts: concepts that frequently co-occur with the target concept
Negative concepts: concepts that never co-occur with the target concept

Concept Selection – Query-to-Concept Semantic Mapping

Selecting anchor concepts in SS
One concept for each query term
Representing the semantic aspect of the query

[Figure: in SS, the vector of a query item is compared with the concept vectors v1, v2, v3.]

Query: Find vehicles on the way

Anchor concepts in SS: Vehicle, Road
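Anchor selection can be sketched as a nearest-concept lookup: each query term is mapped to the single concept whose vector is closest to it. The three-dimensional coordinates, the dictionary names and the use of cosine similarity are assumptions for illustration only; real SS vectors would come from the space construction described earlier.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors given as equal-length lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional SS coordinates for concepts and query terms.
CONCEPT_VECTORS = {
    "Vehicle": [0.9, 0.1, 0.0],
    "Road":    [0.1, 0.9, 0.1],
    "Boat":    [0.6, 0.0, 0.7],
}
QUERY_TERM_VECTORS = {
    "vehicles": [0.8, 0.2, 0.1],
    "way":      [0.2, 0.8, 0.0],
}

def select_anchors(query_terms, concepts):
    """Map each query term to its nearest concept (one anchor per term)."""
    anchors = {}
    for term, term_vec in query_terms.items():
        anchors[term] = max(
            concepts, key=lambda name: cosine(term_vec, concepts[name])
        )
    return anchors

anchors = select_anchors(QUERY_TERM_VECTORS, CONCEPT_VECTORS)
print(anchors)
```

With these toy vectors the query "Find vehicles on the way" yields the anchors Vehicle and Road, matching the slide.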

Concept Selection – Detector Mining in OS

Selecting bridge concepts in OS
Forming subspaces to represent the context of the query
Observability gap between anchor concepts
More specific concepts in the context (car)
Latent concepts not defined in SS (car_on_road)

[Figure: for the query "Find vehicles on the way", the anchor concepts Vehicle and Road selected in SS are located in OS together with Car, Car_on_road, water and boat.]

Concept Selection – Mining positive and negative concepts in OS

Target concept: Vehicle
Positive: Car, Road, Truck
Negative: Outer Space, Tennis

[Figure: Road, Car and Vehicle located in OS.]
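A rough sketch of the mining step, under loud assumptions: it works on raw keyframe co-occurrence rather than on distances in OS, the annotation data is invented, and the 0.6 threshold for "frequently" is arbitrary. Only the definitions (positive concepts frequently co-occur with the target, negative concepts never do) come from the slides.

```python
# Hypothetical keyframe annotations (1 = concept present in that keyframe).
ANNOTATIONS = {
    "vehicle":     [1, 1, 1, 1, 0, 0, 0, 0],
    "car":         [1, 1, 1, 0, 0, 0, 0, 0],
    "road":        [1, 1, 0, 1, 1, 0, 0, 0],
    "sky":         [1, 0, 0, 0, 1, 1, 0, 0],
    "outer space": [0, 0, 0, 0, 0, 0, 1, 0],
    "tennis":      [0, 0, 0, 0, 0, 1, 0, 1],
}

def cooccurrence_rate(target, other):
    """Fraction of keyframes containing `other` that also contain `target`."""
    both = sum(t and o for t, o in zip(ANNOTATIONS[target], ANNOTATIONS[other]))
    total = sum(ANNOTATIONS[other])
    return both / total if total else 0.0

def mine(target, positive_threshold=0.6):
    """Return (positive, negative) lists; concepts in neither list are neutral."""
    positive, negative = [], []
    for other in ANNOTATIONS:
        if other == target:
            continue
        rate = cooccurrence_rate(target, other)
        if rate >= positive_threshold:
            positive.append(other)   # frequently co-occurs with the target
        elif rate == 0.0:
            negative.append(other)   # never co-occurs with the target
    return positive, negative

positive, negative = mine("vehicle")
print(positive, negative)
```

In this toy data sky co-occurs with vehicle only occasionally, so it is neither positive nor negative.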

Concept Fusion – Reliability-based fusion

Enrich the target concept with its positive concepts
Refine the target concept’s detector scores with its negative concepts (filters)

[Figure: the vehicle detector output is combined with the positive concepts car, truck and road, and filtered by the negative concepts outer space and tennis, giving the refined vehicle result.]
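The enrich-then-filter idea can be sketched for a single shot as below. The plain averaging, the hard filter threshold of 0.5 and all detector scores are assumptions made for illustration; the slides state only that positive concepts enrich the target and negative concepts act as filters.

```python
def reliability_fusion(scores, target, positive, negative, filter_threshold=0.5):
    """Fuse detector scores for one shot.

    scores: dict mapping concept name -> detector score in [0, 1].
    """
    # Filtering: drop the shot if any negative concept fires strongly.
    if any(scores[c] >= filter_threshold for c in negative):
        return 0.0
    # Enrichment: average the target score with its positive concepts' scores.
    enriched = [scores[target]] + [scores[c] for c in positive]
    return sum(enriched) / len(enriched)

POSITIVE = ["car", "truck", "road"]
NEGATIVE = ["outer space", "tennis"]

# Two hypothetical shots with the same weak vehicle score.
street_shot = {"vehicle": 0.4, "car": 0.8, "truck": 0.3, "road": 0.9,
               "outer space": 0.05, "tennis": 0.1}
court_shot = {"vehicle": 0.4, "car": 0.2, "truck": 0.1, "road": 0.3,
              "outer space": 0.0, "tennis": 0.9}

street_score = reliability_fusion(street_shot, "vehicle", POSITIVE, NEGATIVE)
court_score = reliability_fusion(court_shot, "vehicle", POSITIVE, NEGATIVE)
print(street_score, court_score)
```

The street shot is lifted by its positive concepts, while the tennis-court shot is filtered out despite having the same raw vehicle score.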

Multi-level Detector Fusion – Observability-based Fusion (in OS)

Enrich anchor concepts with bridge concepts

Query: Find vehicle on the way
Anchor concepts selected in SS: Vehicle + Road
Bridge concepts selected in OS: Car, Car_on_road

[Figure: the detector outputs of vehicle, car and car on road are combined into an enriched vehicle result.]

Answer the query with the reliability-improved and observability-enriched anchor concepts

Multi-level Detector Fusion – Semantic-based Fusion (in SS)

Query: Find vehicle on the way

Vehicle + Road, using Semantic(vehicle, Vehicle) and Semantic(way, Road)
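One plausible reading of this slide is a weighted linear fusion in which each anchor detector is weighted by its semantic similarity to the query term it was selected for. The sketch below assumes that reading; the weight values, shot names and detector scores are invented, and normalizing by the weight sum is a design choice of the sketch.

```python
def semantic_fusion(detector_scores, semantic_weights):
    """Weighted linear fusion, normalized by the sum of the weights."""
    total_weight = sum(semantic_weights.values())
    weighted = sum(
        semantic_weights[c] * detector_scores[c] for c in semantic_weights
    )
    return weighted / total_weight

# Hypothetical values of Semantic(vehicle, Vehicle) and Semantic(way, Road).
weights = {"Vehicle": 1.0, "Road": 0.5}

# Hypothetical detector scores for two shots.
shots = {
    "shot_a": {"Vehicle": 0.9, "Road": 0.6},
    "shot_b": {"Vehicle": 0.2, "Road": 0.9},
}

fused = {name: semantic_fusion(scores, weights) for name, scores in shots.items()}
ranked = sorted(fused, key=fused.get, reverse=True)
print(ranked)
```

Because Vehicle matches its query term more closely than Road does, a strong vehicle score outweighs a strong road score in the ranking.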

Multi-level Detector Fusion – Diversity-based Fusion

Consider diversity of anchor concepts in concept fusion

Anchor concepts: person, face, police, newspaper

[Figure: clustering of the anchor concepts person, face, police and newspaper, with the people-related concepts grouped together.]
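A minimal sketch of why diversity matters when fusing person, face, police and newspaper: if the people-related detectors are simply averaged with everything else, they dominate the result. The cluster assignment below is written by hand (treating newspaper as its own cluster is an assumption), and averaging within and then across clusters is one simple way to limit that dominance, not necessarily the method used in the work presented here.

```python
def flat_fusion(detector_scores):
    """Plain average over all detectors, ignoring their redundancy."""
    return sum(detector_scores.values()) / len(detector_scores)

def diversity_fusion(detector_scores, clusters):
    """Average scores inside each cluster, then average the cluster scores."""
    cluster_scores = []
    for members in clusters:
        cluster_scores.append(
            sum(detector_scores[c] for c in members) / len(members)
        )
    return sum(cluster_scores) / len(cluster_scores)

# Hand-written clusters: three people-related detectors and one other.
CLUSTERS = [["person", "face", "police"], ["newspaper"]]

# Hypothetical shot showing people but no newspaper.
shot = {"person": 0.9, "face": 0.9, "police": 0.9, "newspaper": 0.1}

flat_score = flat_fusion(shot)
diverse_score = diversity_fusion(shot, CLUSTERS)
print(flat_score, diverse_score)
```

The flat average rewards the shot three times for the same evidence, while the clustered version counts the people-related group once.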

Outline


Experimental Results – Dataset and Evaluation

Datasets from TRECVID 2005 to 2007, with more than 285 hours of video and 72 queries
VIREO-374 detectors trained using the TRECVID 2005 development set
The top 1000 shots in the returned list are evaluated using average precision (AP)
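For reference, non-interpolated average precision over a ranked shot list can be sketched as below. Dividing by the total number of relevant shots is one common convention and is an assumption here, since the slides do not spell out the exact normalization; the shot identifiers are invented.

```python
def average_precision(ranked_shots, relevant, depth=1000):
    """Non-interpolated average precision over the top `depth` results."""
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots[:depth], start=1):
        if shot in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Hypothetical ranked list and ground truth.
ranked = ["s3", "s7", "s1", "s9", "s4"]
relevant = {"s3", "s1", "s8"}

ap = average_precision(ranked, relevant)
print(round(ap, 4))
```

Mean average precision (MAP) is then the mean of this value over all queries in a set.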

Experimental Results – Space Construction

Concept selection using SS and OS
SS: 572 concepts, WordNet, WUP -> 366 dimensions
OS: 374 concepts, LSCOM, PM -> 253 dimensions

Example query: Find shots of a person walking or riding a bicycle

[Figure: anchor concepts selected for the example query.]

Experimental Results – Video Search Performance

Semantic-based fusion (S)
Reliability-based fusion (R)
Observability-based fusion (O)
Diversity-based fusion (D)

[Figure: top-k performance on the TV07 dataset at AP-30, AP-50, AP-100 and AP-1000 (vertical axis from 0 to 0.3) for S-only, S+O, S+OR and S+ORD.]


Performance based on Query Types

Event – 31 queries
Person or Thing (PT) – 19 queries
Place – 14 queries
Named Entity (NE) – 12 queries

[Figure: number of queries per type (Event, PT, Place, NE).]

[Figure: MAP per query type (Event, PT, Place, NE) for S-only, S+O, S+OR and S+ORD, shown three times with the observability-based, diversity-based and reliability-based contributions highlighted in turn.]

Experimental Results – Comparison to Ontology Reasoning

[Figure: S+ORD, OSS, RES, JCN, WUP and Lesk compared on TV05, TV06 and TV07.]

[Figure: S+ORD, OSS, RES, JCN, WUP and Lesk compared by query type (Event, PT, Place, NE).]

Experimental Results – Compare to TRECVID Submissions

Our runs are Visual-Only

[Figure: TV05 runs, with our runs S-only, S-O, S-OR and S-ORD marked.]

[Figure: TV06 runs and TV07 runs, with our runs S-only, S-O, S-OR and S-ORD marked.]

[Figure: TV08 runs (Type A), with our runs S-only and S-ORD marked.]

Outline


Conclusion

The two spaces complement each other in concept selection
SS provides a model for semantic reasoning
OS provides a model for observability reasoning (observability gap, bridge concepts)

Multi-level concept fusion addresses different aspects of detectors
Semantics
Reliability (helpful for all types of queries)
Observability (helpful for person+thing and place queries)
Diversity (helpful for event-related queries)

Future work

Concept frequency
Causality
Multi-modality fusion

Thanks!

Presented by Xiao-Yong WEI
