ICASSP, May 21 2004
Arjen P. de Vries
Thijs Westerveld
Tzvetanka I. Ianeva
Combining Multiple Representations on the TRECVID Search Task
Introduction
• Video retrieval should take advantage of information from all available sources and modalities
  – …but so far ASR is best for almost any query
• LL11@TRECVID2003: Combining information sources
  – Different models/modalities
  – Multiple example images
Retrieval Models
• Calculate conditional probabilities of observing query samples given each model in the collection
[Figure: the Query is scored against every model in the collection: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)]
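A minimal sketch of this ranking step, under assumptions not stated on the slide: each model is a toy diagonal-covariance Gaussian mixture, query samples are treated as independent given the model, and the names `log_gaussian_diag`, `log_likelihood`, `rank_models` and the data are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_likelihood(samples, mixture):
    """log P(Q|M): sum of log mixture densities over the query samples.

    Assumes the samples are independent given the model.
    `mixture` is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in samples:
        terms = [math.log(w) + log_gaussian_diag(x, mu, var)
                 for w, mu, var in mixture]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def rank_models(samples, models):
    """Return model ids sorted by decreasing log P(Q|M)."""
    scores = {name: log_likelihood(samples, mix) for name, mix in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy collection: two one-component mixtures over 2-D features.
models = {
    "M1": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "M2": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
query = [[0.1, -0.2], [0.3, 0.0]]
ranking = rank_models(query, models)
```

Working in log space avoids underflow when many query samples are multiplied together.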
Static Model
• Indexing
  – Estimate a Gaussian Mixture Model from each keyframe (using EM)
  – Fixed number of components (C=8)
  – Feature vectors contain colour, texture, and position information from pixel blocks: <x,y,DCT>
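The <x,y,DCT> feature vectors might be built roughly as follows. This is only a sketch under assumptions the slide does not state: a single-channel image, 8x8 pixel blocks, block-centre positions normalised to [0,1], and three low-order DCT coefficients per block. The function names are hypothetical, and the EM fitting of the mixture is not shown.

```python
import math

def dct_coeff(block, u, v):
    """Unnormalised 2-D DCT-II coefficient (u, v) of a square block."""
    n = len(block)
    return sum(
        block[i][j]
        * math.cos(math.pi * (i + 0.5) * u / n)
        * math.cos(math.pi * (j + 0.5) * v / n)
        for i in range(n)
        for j in range(n)
    )

def block_features(image, block=8):
    """Return one <x, y, DCT...> vector per pixel block.

    `image` is a list of rows of intensities (one channel only, an
    assumption). x and y are block centres normalised to [0, 1].
    """
    rows, cols = len(image) // block, len(image[0]) // block
    features = []
    for by in range(rows):
        for bx in range(cols):
            pixels = [row[bx * block:(bx + 1) * block]
                      for row in image[by * block:(by + 1) * block]]
            # Keep the DC term plus the first AC term in each direction.
            dct = [dct_coeff(pixels, 0, 0),
                   dct_coeff(pixels, 0, 1),
                   dct_coeff(pixels, 1, 0)]
            features.append([(bx + 0.5) / cols, (by + 0.5) / rows] + dct)
    return features

# Hypothetical 16x16 constant image: four blocks, each with only a DC term.
flat_image = [[1.0] * 16 for _ in range(16)]
feats = block_features(flat_image)
```

A colour version would repeat the DCT step per channel and concatenate the coefficients.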
Dynamic Model
• Indexing:
  – GMM of multiple frames (N=29) around keyframe
  – Feature vectors extended with time-stamp in [0,1]: <x,y,t,DCT>
[Figure: time axis with ticks at 0, .5, 1]
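The time-stamp extension can be sketched as below, assuming the N frames are spaced evenly so that the first maps to 0, the middle frame (presumably the keyframe) to .5, and the last to 1. The linear mapping and both function names are assumptions for illustration.

```python
def frame_timestamps(n_frames=29):
    """Map frame indices 0..n_frames-1 linearly onto [0, 1]."""
    if n_frames < 2:
        raise ValueError("need at least two frames to span [0, 1]")
    return [i / (n_frames - 1) for i in range(n_frames)]

def add_time(features, t):
    """Turn <x, y, DCT...> vectors into <x, y, t, DCT...> vectors."""
    return [[f[0], f[1], t] + list(f[2:]) for f in features]

timestamps = frame_timestamps(29)
# Hypothetical single-block feature vector from the middle frame.
extended = add_time([[0.25, 0.25, 64.0]], timestamps[14])
```

All extended vectors from the 29 frames would then be pooled to estimate one mixture per shot.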
Dynamic Model Advantages
• More training data for models
• Reduced dependency upon selecting appropriate keyframe
• Some spatio-temporal aspects of shot are captured
  – (Dis-)appearance of objects
Experimental Set-up
• Build models for each shot
  – Static, Dynamic, Language
• Build queries from topics
  – Construct simple keyword text query
  – Select visual example
  – Rescale and compress example images to match video size and quality
Combining Modalities
• Independence assumption textual/visual
  – P(Qt,Qv|Shot) = P(Qt|LM) * P(Qv|GMM)
• Combination works if both runs useful [CWI:TREC:2002]
• Dynamic run more useful than static run

Run            MAP
ASR only       .130
Static only    .022
Static+ASR     .105
Dynamic only   .022
Dynamic+ASR    .132
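The independence assumption on this slide turns into a sum in log space: log P(Qt,Qv|Shot) = log P(Qt|LM) + log P(Qv|GMM). A minimal sketch follows; the shot ids and scores are hypothetical, and restricting the ranking to shots scored by both models is an assumption.

```python
def combine_scores(text_logp, visual_logp):
    """Rank shots by log P(Qt|LM) + log P(Qv|GMM).

    Both arguments map shot id -> log probability. Only shots scored
    by both models are ranked (an assumption of this sketch).
    """
    shared = text_logp.keys() & visual_logp.keys()
    combined = {s: text_logp[s] + visual_logp[s] for s in shared}
    # Sort by decreasing combined score; ties broken by shot id.
    return sorted(combined, key=lambda s: (-combined[s], s))

# Hypothetical log probabilities for three shots.
text_scores = {"shot1": -2.0, "shot2": -1.0, "shot3": -5.0}
visual_scores = {"shot1": -1.0, "shot2": -4.0, "shot3": -0.5}
combined_ranking = combine_scores(text_scores, visual_scores)
```

In this toy case shot1 ranks first even though it is top of neither single-modality list, because it scores reasonably in both.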
Merging Run Results
• Combining (conflicting) examples difficult [CWI:TREC:2002]
• Single example: misses relevant shots
• Round-Robin Merging
[Figure: two ranked lists (ranks 1 to 10 each) interleaved into a combined list: 1, 1, 2, 2, 3, 3, 4, 4, ...]
Examples   Visual only   +ASR
Single     .022          .132
All        .031          .149
Selected   .039          .151
Best       .050          .155
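Round-robin merging, as the 1, 1, 2, 2, 3, 3, ... interleaving suggests, takes the rank-1 result of every run, then the rank-2 results, and so on. A minimal sketch; skipping shots that were already emitted is an assumption, since the slides do not say how duplicates are handled, and the shot ids are hypothetical.

```python
from itertools import zip_longest

def round_robin(*runs):
    """Interleave ranked lists rank by rank, keeping first occurrences."""
    merged, seen = [], set()
    for tier in zip_longest(*runs):  # pads shorter runs with None
        for shot in tier:
            if shot is not None and shot not in seen:
                seen.add(shot)
                merged.append(shot)
    return merged

# Hypothetical ranked results for two example images of the same topic.
run_a = ["a", "b", "c"]
run_b = ["d", "b", "e"]
merged_run = round_robin(run_a, run_b)
```

Because only ranks are used, the runs need not have comparable scores, which suits results obtained from different example images.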
Conclusions
• For most topics, neither the static nor the dynamic visual model captures the user information need sufficiently…
• …averaged over 25 topics however, it is better to use both modalities than ASR only
Working hypothesis: Matching against both modalities gives robustness
Conclusions
• Dynamic captures visual similarity better
  – Thanks to spatio-temporal aspects?
    • Experiments with full covariance matrix for <x,y,t>-dims
• Static model of keyframe (KF) is too fragile
  – Dependency on single KF?
    • To be tested by ranking max(all I-frames in shot)
  – Not enough training data?