FXPAL Interactive Search for TRECVID 2004
John Adcock, Matthew Cooper, Andreas Girgensohn, Lynn Wilcox
FX Palo Alto Laboratory Inc. @ TRECVID 2004
Overview
• First time doing search
  – 2nd year of participation overall
• Emphasis on interface elements
  – Rich visualization of search results
  – Quick and easy exploration of results
• Straightforward search engine
  – Text search over ASR transcripts
    • Literal search with Lucene
    • Fuzzy search with LSS
  – Keyframe search by image similarity
    • Color correlograms
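Keyframe search ranks shots by color correlogram similarity. The slides do not give the correlogram variant, the color quantization, the distance set, or the comparison function, so the sketch below assumes a color autocorrelogram (same-color pixel pairs only) over already-quantized color indices, chessboard distances (1, 3, 5, 7), and a relative L1 distance. All function names are hypothetical.

```python
def ring_offsets(d):
    """Offsets (dy, dx) at chessboard distance exactly d from a pixel."""
    return [(dy, dx)
            for dy in range(-d, d + 1)
            for dx in range(-d, d + 1)
            if max(abs(dy), abs(dx)) == d]


def autocorrelogram(img, num_colors, distances=(1, 3, 5, 7)):
    """Color autocorrelogram of an image of quantized color indices.

    img is a 2D list whose entries are in range(num_colors). Entry [c][k]
    estimates the probability that a pixel at distance distances[k] from a
    pixel of color c also has color c.
    """
    h, w = len(img), len(img[0])
    hits = [[0] * len(distances) for _ in range(num_colors)]
    totals = [[0] * len(distances) for _ in range(num_colors)]
    rings = [ring_offsets(d) for d in distances]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            for k, ring in enumerate(rings):
                for dy, dx in ring:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # ignore neighbors off the image
                        totals[c][k] += 1
                        if img[yy][xx] == c:
                            hits[c][k] += 1
    return [[hits[c][k] / totals[c][k] if totals[c][k] else 0.0
             for k in range(len(distances))]
            for c in range(num_colors)]


def correlogram_distance(a, b):
    """Relative L1 distance between two correlograms; 0.0 means identical."""
    return sum(abs(p - q) / (1.0 + p + q)
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


blue = [[0] * 8 for _ in range(8)]  # 8x8 keyframe, every pixel color 0
red = [[1] * 8 for _ in range(8)]   # 8x8 keyframe, every pixel color 1
print(correlogram_distance(autocorrelogram(blue, 2), autocorrelogram(red, 2)))  # 4.0
```

Ranking would then sort candidate keyframes by ascending distance to the query image; the relative L1 form keeps frequent colors from dominating the comparison.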
Preprocessing
The unit of search retrieval is a “story”, but we don’t have reference story segmentation for the test set
• Group reference shots into “stories”
  – Bootstrap an LSS with common shot boundaries and ASR
  – Use a similarity-matrix method to find “story” boundaries
• Given new story boundaries
  – Generate text indices for stories and shots
  – Generate story-based LSS for search
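One common similarity-matrix method scores each candidate boundary by correlating a checkerboard kernel along the diagonal of a shot-by-shot similarity matrix and keeps the local novelty peaks. The sketch below is an illustration under assumptions: plain bag-of-words cosine similarity stands in for the bootstrap LSS, and the kernel width, threshold, and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def novelty(sim, width):
    """Checkerboard-kernel novelty for a boundary just before shot i.

    Within-side similarities count +1 and cross-boundary similarities -1.
    Only positions where the full 2*width window fits are scored; the rest
    stay 0.0 so truncated windows at the ends cannot create false peaks.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    n = len(sim)
    scores = [0.0] * n
    for i in range(width, n - width + 1):
        total = 0.0
        for a in range(i - width, i + width):
            for b in range(i - width, i + width):
                same_side = (a < i) == (b < i)
                total += sim[a][b] if same_side else -sim[a][b]
        scores[i] = total
    return scores


def story_boundaries(shot_texts, width=2, threshold=0.0):
    """Return indices of shots that start a new story (always includes 0)."""
    vectors = [Counter(text.lower().split()) for text in shot_texts]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    nov = novelty(sim, width)
    bounds = [0]
    for i in range(1, len(nov)):
        right = nov[i + 1] if i + 1 < len(nov) else float("-inf")
        # A boundary is a local novelty peak above the threshold.
        if nov[i] > threshold and nov[i] >= nov[i - 1] and nov[i] >= right:
            bounds.append(i)
    return bounds


shots = ["stock market shares", "market shares fall", "stock shares rise",
         "rain storm weather", "weather storm forecast", "storm rain cold"]
print(story_boundaries(shots))  # [0, 3]
```

Each detected story then groups the shots between consecutive boundaries, which is what the story-level text indices and story-based LSS would be built from.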
Preprocessing
[Figure: preprocessing pipeline. ASR and common shot reference → bootstrap LSS (shots) → similarity segmentation → story segments → LS index (stories), LS index (shots), Lucene index (stories), Lucene index (shots)]
Search Engine
• User specifies a combination of:
  – Text query
    • Literal query using Lucene or fuzzy query using LSS
  – Image examples
    • Any keyframe in the interface can be dragged onto the image example area
  – Text/image weighting is static and equal
  – Max image similarity of shot propagated to story
  – Text similarity of story propagated to shot
    • Averaged with shot-based text similarity
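The combination rules above can be written as a small scoring function. Equal static weights, max-over-shots image propagation to the story, and averaging the propagated story text score with the shot text score come from the slide. The data layout, the assumption that all scores are non-negative and already on a common scale, and the names `Shot` and `fuse_scores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: str
    story_id: str
    text_score: float   # shot-level text similarity (Lucene or LSS)
    image_score: float  # best similarity of this shot's keyframe to any query image


def fuse_scores(shots, story_text_scores, text_weight=0.5):
    """Return (story ids ranked by combined score, shot id -> combined score)."""
    image_weight = 1.0 - text_weight  # static weights; equal by default
    # Image: a story inherits the maximum image similarity of its shots.
    story_image = {}
    for s in shots:
        story_image[s.story_id] = max(story_image.get(s.story_id, 0.0), s.image_score)
    story_scores = {
        sid: text_weight * story_text_scores.get(sid, 0.0) + image_weight * img
        for sid, img in story_image.items()
    }
    # Text: a shot inherits its story's text similarity, averaged with its own.
    shot_scores = {}
    for s in shots:
        text = (s.text_score + story_text_scores.get(s.story_id, 0.0)) / 2
        shot_scores[s.shot_id] = text_weight * text + image_weight * s.image_score
    ranked_stories = sorted(story_scores, key=story_scores.get, reverse=True)
    return ranked_stories, shot_scores


shots = [Shot("a1", "A", 0.2, 0.9), Shot("a2", "A", 0.4, 0.1), Shot("b1", "B", 0.8, 0.0)]
ranked, per_shot = fuse_scores(shots, {"A": 0.2, "B": 0.6})
print(ranked)  # ['A', 'B']
```

Story A wins on the strength of one visually matching shot even though story B has the better transcript match, which is the effect max-propagation is meant to produce.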
Search Engine
[Figure: search engine. Query text → Lucene search or LSS search (searcher option); query images → color correlogram search; results combined into ranked stories]
Interface Elements
• Stories summarized in keyframe “quads”
• Navigate through stories to video timeline/shots
• Transparent icon overlays
  – Visited: grayed
  – Relevant: green
  – Irrelevant: red
• Query relevance shown with size and color
• Hotkeys for most actions
• Multi-select and drag and drop
[Figure: annotated interface screenshot. Labeled regions: text query box, image query box, TRECVID topic text, text search type, TRECVID topic images, query results area, gray visited overlay, relevant shots area, media player and zoom area, video timeline, expanded shots area, excluded overlay, included overlay, selected story]
Story Summary Quads
• Query-dependent story summary
  – Use the 4 highest-scoring shots in the story
  – Allocate space proportional to score
[Figure: story thumbnail composed of shot thumbnails]
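The quad rule (top 4 shots, space proportional to score) reduces to an area split. A minimal sketch, assuming scores are per-shot query scores for one story, that negative scores are clamped to zero, and that the thumbnail area is a hypothetical 100 × 100 pixels; the slides do not say how the areas were turned into rectangles.

```python
def quad_allocation(shot_scores, quad_area=100 * 100, k=4):
    """Split a story thumbnail's area among its k highest-scoring shots.

    shot_scores maps shot id -> query score for the shots of one story.
    Returns shot id -> allocated area (same units as quad_area).
    """
    top = sorted(shot_scores.items(), key=lambda item: item[1], reverse=True)[:k]
    weights = [(sid, max(score, 0.0)) for sid, score in top]  # clamp negatives
    if not weights:
        return {}
    total = sum(w for _, w in weights)
    if total == 0:
        # No positive evidence: fall back to equal shares.
        return {sid: quad_area / len(weights) for sid, _ in weights}
    return {sid: quad_area * w / total for sid, w in weights}


scores = {"s1": 4.0, "s2": 2.0, "s3": 1.0, "s4": 1.0, "s5": 0.5}
print(quad_allocation(scores, quad_area=800))
# {'s1': 400.0, 's2': 200.0, 's3': 100.0, 's4': 100.0}
```

Because the scores are query-dependent, the same story can show a different dominant keyframe for different queries.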
Building on searches
• Find similar
  – Use shot/story text for search
• Add related
  – Auto re-query with existing results
Expanded Story / Timeline Browsing
• Selecting a story expands the video at that point
  – Clickable video timeline with relevancy shading
  – Clickable story quad timeline
  – Shot thumbnails marked with relevancy
  – Overlay on shots marked (non)relevant
  – Mouse-overs zoom in the media player, and the tool-tip shows relevancy context
  – Double-clicks play video in the media player
Experiments
• 6 searchers answering 12 topics each in a Latin square
  – Pairs of orthogonal users grouped together
• Each topic answered 3 times
  – Searchers include 2 primary developers
    • 1 ended up in the best and 1 in the worst performing group
• Each of the 3 complete searcher runs goes through 3 “systems” (methods for filling out the shot list), yielding 9 total submissions
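The numbers above imply 6 × 12 / 3 = 24 topics. The slides do not give the actual assignment, so this is a hypothetical sketch of one design consistent with them: each group pairs two searchers with disjoint ("orthogonal") halves of the topic list, so every group yields one complete run, and the topic order is rotated cyclically between groups in the spirit of a Latin square.

```python
from collections import Counter


def assign_topics(num_topics=24, num_groups=3, per_searcher=12):
    """Map (group, searcher) -> ordered list of topic indices.

    Each group has two searchers with disjoint halves of the topic list, so
    every group covers all topics once; the start offset rotates per group.
    """
    if 2 * per_searcher != num_topics:
        raise ValueError("the two searchers in a group must cover all topics")
    plan = {}
    for g in range(num_groups):
        shift = g * num_topics // num_groups
        order = [(t + shift) % num_topics for t in range(num_topics)]
        plan[(g, "A")] = order[:per_searcher]
        plan[(g, "B")] = order[per_searcher:]
    return plan


plan = assign_topics()
counts = Counter(t for topics in plan.values() for t in topics)
print(len(plan), set(counts.values()))  # 6 {3}
```

The rotation changes which topics share a searcher in each group, so no single searcher's skill or learning effect is tied to the same topic subset in every run.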
System Types
• Type 1: re-issue the user queries and weight the results of each query by its precision against the user-labeled shots
• Type 2: take the text from all relevant shots and issue a single new LSS-based text query
• Type 3: take the text from each relevant shot in turn for an LSS-based query and apply query ranking as in system type 1
Shots marked as not relevant are excluded from system results
Every system type is preceded by bracketing the user-retrieved shots
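System type 1 can be sketched as follows. Weighting each query by its precision on the user-labeled shots and excluding not-relevant shots come from the slide; how results from different queries are merged is not stated, so the 1/rank discount and summation below are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict


def precision_weighted_requery(rankings, relevant, not_relevant):
    """Merge re-issued query results, weighting each query by its precision.

    rankings: one ranked list of shot ids per re-issued user query.
    relevant / not_relevant: sets of shot ids the user has labeled.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        labeled = [s for s in ranking if s in relevant or s in not_relevant]
        # Precision of this query, measured only on shots the user labeled.
        precision = (sum(s in relevant for s in labeled) / len(labeled)
                     if labeled else 0.0)
        for rank, shot in enumerate(ranking, start=1):
            if shot in not_relevant:
                continue  # shots marked not relevant never enter system results
            # Assumed merge rule: discount each hit by 1/rank within its query.
            scores[shot] += precision / rank
    return sorted(scores, key=scores.get, reverse=True)


rankings = [["s1", "s2", "s3", "s4"], ["s5", "s3", "s6"]]
ranked = precision_weighted_requery(rankings, {"s1", "s3"}, {"s2", "s5"})
print(ranked[:2])  # ['s1', 's3']
```

Queries that mostly returned shots the user rejected contribute little, so the system list leans toward the queries that actually worked for that topic.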
Submissions
[Figure: each submission = user-identified shots + bracketed shots + the output of one system: System 1 (Weighted), System 2 (LSA1), or System 3 (LSA2)]
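The slides do not define bracketing. A minimal sketch, assuming it means adding the shots immediately before and after each user-identified shot, with shots identified by hypothetical (video_id, shot_index) pairs. The assembly order follows the Submissions diagram (user-identified shots, then bracketed shots, then system results); the 1000-shot cap per topic is an assumed limit, and the function names are illustrative.

```python
def bracket(user_shots, shots_per_video, not_relevant=frozenset()):
    """Return the temporal neighbors (previous and next shot) of each user shot.

    shots_per_video maps video_id -> number of shots, so neighbors stay
    inside the video.
    """
    neighbors = []
    for video, idx in user_shots:
        for n in (idx - 1, idx + 1):
            shot = (video, n)
            if 0 <= n < shots_per_video[video] and shot not in not_relevant:
                neighbors.append(shot)
    return neighbors


def assemble_submission(user_shots, bracketed, system_ranked, not_relevant,
                        max_len=1000):
    """Concatenate the three sources in order, dropping duplicates and
    shots the user marked not relevant, up to max_len shots."""
    seen, result = set(), []
    for shot in [*user_shots, *bracketed, *system_ranked]:
        if shot in seen or shot in not_relevant:
            continue
        seen.add(shot)
        result.append(shot)
        if len(result) == max_len:
            break
    return result


user = [("v1", 5), ("v1", 6)]
excluded = {("v1", 7)}
bracketed = bracket(user, {"v1": 10}, excluded)
system = [("v2", 0), ("v1", 4), ("v1", 7), ("v2", 1)]
print(assemble_submission(user, bracketed, system, excluded))
# [('v1', 5), ('v1', 6), ('v1', 4), ('v2', 0), ('v2', 1)]
```

Putting the user's own judgments first means the system types can only change the tail of each submission, which fits the observation that the post-processing methods perform nearly the same.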
Results
• Ranks 3-6 and 9-13 in overall MAP
  – Strongly user dependent (user groups clump together)
  – Post-processing methods perform nearly the same
[Figure: MAP (0 to 0.4) of all submissions, FXPAL submissions vs. other contributors. FXPAL runs in chart order, clustered under user-group labels 2, 3, 1: I_A_1_AL_2_5, I_A_1_AL_1_4, I_A_1_AL_3_6, I_A_1_AL_1_7, I_A_1_AL_2_8, I_A_1_AL_3_9, I_A_1_AL_1_1, I_A_1_AL_2_2, I_A_1_AL_3_3]
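The MAP values above are means over topics of non-interpolated average precision. A minimal sketch of that computation, assuming each run is a ranked list of shot ids per topic and the ground truth is a set of relevant shot ids per topic; the names are illustrative and this is not the official trec_eval code.

```python
def average_precision(ranked, relevant):
    """Non-interpolated AP: mean of precision at each relevant hit,
    divided by the total number of relevant shots (missed ones count 0)."""
    hits, total = 0, 0.0
    for i, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """Average AP over all judged topics; a missing run scores 0 for that topic."""
    return sum(average_precision(runs.get(t, []), qrels[t]) for t in qrels) / len(qrels)


runs = {"t1": ["a", "b", "c", "d"], "t2": ["x", "y"]}
qrels = {"t1": {"a", "c"}, "t2": {"y", "z"}}
print(round(mean_average_precision(runs, qrels), 4))  # 0.5417
```

Because every relevant shot that is never retrieved contributes zero, AP rewards recall as well as ranking, which is why filling out the shot list after the user's picks can still move MAP.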
User vs. System
[Figure: MAP (0 to 0.35) by user group (Group 1, Group 2, Group 3) for each system summary: WEIGHTED, LSA1, LSA2, Bracketed, None]
User vs. System in Overall
[Figure: MAP (0 to 0.4) per submission, comparing complete submissions, user-selected shots only, and user-selected shots with bracketing against other contributors. Chart order: I_A_1_AL_2_5, I_A_1_AL_1_4, I_A_1_AL_3_6, fxpal_2_bracketed, I_A_1_AL_1_7, I_A_1_AL_2_8, I_A_1_AL_3_9, I_A_1_AL_1_1, I_A_1_AL_2_2, I_A_1_AL_3_3, fxpal_2_users, fxpal_1_bracketed, fxpal_3_bracketed, fxpal_3_users, fxpal_1_users]
Performance by Question
[Figure: per-topic MAP (0 to 1) showing the overall median, FXPAL average, and overall max for each topic: people on steps or stairs, pedestrians and vehicles, bicycles rolling, people moving a stretcher, umbrellas, fingers striking keyboard, buildings on fire, handheld weapon firing, golf ball into the hole, Bill Clinton, tennis player contacting ball, horses in motion, people and dogs, wheelchairs, signs at a protest, zooming in US Capitol dome, Benjamin Netanyahu, buildings with flood waters, Henry Hyde, Saddam Hussein, Sam Donaldson, hockey rink, Boris Yeltsin]
Directions
• More sophisticated:
  – Story segmentation
  – Image similarity / video features
• Simplify the user interface for non-power-users and more typical search and re-use tasks
• Handle multiple simultaneous media streams
  – Presentation slides
  – Multi-camera capture