Multimedia document processing using the WebLab platform: AXES project use case. OW2Con'15, …
TRANSCRIPT
© 2014 Airbus Defence and Space – All rights reserved. The reproduction, distribution and utilization of this document as well as the communication of its contents to others without express authorization is prohibited. Offenders will be held liable for the payment of damages. All rights reserved in the event of the grant of a patent, utility model or design.
AXES project use case
17th November 2015
Multimedia document processing using the WebLab platform
Yann Mombrun – Bruno Grilhères, Advanced Information Processing
Advanced Information Processing
• Airbus Defence and Space
• Airbus Group subsidiary
• With Airbus and Airbus Helicopters
• Advanced Information Processing team (TCOIC4)
• 10 engineers and Ph.D. students
• R&T/R&D for unstructured document analysis (and data fusion)
Unstructured document analysis
From unstructured data...
… to structured and actionable knowledge
• Complex processing chain: data gathering, data normalisation, translation, information extraction, information fusion, indexing and search…
• Data variability: media, format, language…
→ Need an integration platform
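The normalisation stage of such a chain can be sketched as a function mapping heterogeneous inputs into one common document model, so that downstream stages stay format-agnostic. This is an illustrative Python sketch only: the field names and the two supported formats are assumptions, not the WebLab exchange model.

```python
# Sketch: normalise heterogeneous inputs (different media/formats) into one
# common document representation. Field names are illustrative assumptions.
import html.parser


class _TextExtractor(html.parser.HTMLParser):
    """Collect the text content of an HTML document, dropping the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def normalise(raw: bytes, fmt: str) -> dict:
    """Return a common document model regardless of the input format."""
    if fmt == "html":
        parser = _TextExtractor()
        parser.feed(raw.decode("utf-8"))
        text = " ".join(" ".join(parser.chunks).split())
    elif fmt == "txt":
        text = " ".join(raw.decode("utf-8").split())
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return {"format": fmt, "text": text}


doc = normalise(b"<p>AXES  project</p>", "html")
```

A real platform would dispatch on MIME type and handle many more media (video containers, office documents, audio), but the contract is the same: one output model for every input.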
What is WebLab?
An integration platform:
• allowing the integration of a selection of software components (search engine, information extraction, translation, knowledge management, graphical representation using maps/networks, etc.)
• allowing the interoperation of the selected components
• based on recognised standards (SOA, Web Services, Semantic Web)
• enabling the composition of services inside complex processing chains
A set of media mining services for document processing to be reused in all the WebLab projects.
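The interoperation idea can be sketched in Python: heterogeneous components are wrapped behind one common `process(resource) -> resource` contract so the platform can compose them freely. This is a minimal in-process analogue under stated assumptions; real WebLab components are web services, and the component names below are invented for illustration.

```python
# Sketch of service interoperation: every component honours one contract,
# so chains can be composed from any selection. Names are hypothetical.

class Service:
    """Common contract every integrated component must honour."""

    def process(self, resource: dict) -> dict:
        raise NotImplementedError


class LanguageDetector(Service):
    """Toy language detection: annotate the resource with a language tag."""

    def process(self, resource):
        text = resource.get("text", "")
        resource["lang"] = "fr" if " le " in f" {text} " else "en"
        return resource


class Indexer(Service):
    """Toy indexing: inverted index from token to resource URIs."""

    def __init__(self):
        self.index = {}

    def process(self, resource):
        for token in resource.get("text", "").lower().split():
            self.index.setdefault(token, []).append(resource["uri"])
        return resource


def run_chain(chain, resource):
    """Compose services: the output resource of one stage feeds the next."""
    for service in chain:
        resource = service.process(resource)
    return resource


indexer = Indexer()
out = run_chain([LanguageDetector(), indexer],
                {"uri": "weblab://doc/1", "text": "WebLab integrates services"})
```

The design point is the shared resource model: because every service consumes and produces the same structure, adding or swapping a component never changes the chain runner.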
What is WebLab?
Internal studies
Customer oriented projects
A range of projects in the fields of media-mining, NLP & KM
Content providers and end-users
Technical partners
Administrative coordination
Focus on the AXES Project
FP7 ICT 2011-2014 http://www.axes-project.eu
Not just search:
Explore – Browse – Experience
Project Goal
The goal of AXES is to develop tools that provide various types of users with new ways to interact with audiovisual libraries, helping them discover, browse, navigate, search and enrich archives. […]

A user-centered approach
Our aim is to open up audiovisual digital libraries, increasing their cultural […] at an early stage. Targeted users are media professionals, educators, students, amateur researchers and home users.
Developing tools providing new engaging ways to interact with audiovisual libraries…
... for a multitude of end users...
• Media professionals
• Researchers
• Home users
• Intelligence officers
… building on state-of-the-art content analysis techniques…
Computer vision
Speech recognition /audio analysis
Search & navigation
Weakly-supervised methods
[Slide illustration: excerpt pages from a computer-vision paper on unsupervised face metric learning, duplicated by extraction. Recoverable content: a pipeline of Viola-Jones face detection on video frames, KLT-based linking of detections into face tracks (with false positives filtered by temporal support and gaps filled by least-squares estimation), SIFT descriptors extracted at nine facial feature points at three scales (feature vector f ∈ ℝ^D, D = 3 × 9 × 128 = 3456), and metric learning from automatically generated pairs: within-frame face pairs of the same track as positives, pairs from tracks co-occurring in a frame as negatives.]
… building on state-of-the-art content analysis techniques…
• People
• Places
• Object categories
• Events
AXES final architecture
• Normalisation
• Shot detection
• Text processing
• Speech processing
• Video content linking
• Messaging and distribution / orchestration
• Image indexing
• Text indexing
• Text, links and image indices
• User interfaces: AXES Pro UI, AXES Home UI, AXES Research UI
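The messaging-and-orchestration layer on this slide can be sketched as decoupled workers connected by queues: each stage consumes from its inbox and forwards to the next. This in-process Python analogue borrows the slide's stage names but is an illustration, not the AXES implementation.

```python
# Sketch: queue-based orchestration of processing stages. Each stage is a
# worker thread; a None "poison pill" shuts the chain down in order.
import queue
import threading


def worker(name, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:                # poison pill: propagate and stop
            outbox.put(None)
            return
        item["trace"].append(name)      # stand-in for the real processing
        outbox.put(item)


stages = ["normalisation", "shot-detection", "text-processing", "indexing"]
queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [threading.Thread(target=worker, args=(name, queues[i], queues[i + 1]))
           for i, name in enumerate(stages)]
for t in threads:
    t.start()

# Feed one document through the chain, then signal shutdown.
queues[0].put({"video": "news.mp4", "trace": []})
queues[0].put(None)

results = []
while True:
    item = queues[-1].get()
    if item is None:
        break
    results.append(item)
for t in threads:
    t.join()
```

Decoupling stages behind queues is what lets an architecture like this distribute heavy steps (speech processing, video analysis) onto separate machines without changing the chain definition.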
OpenAXES OW2 project
• Free / open-source version of AXES → OW2 WebLab subproject
• Proposed to the EC and requested by end users
• Demonstration purposes (by Airbus and AXES partners)
• Baseline for a follow-on project
• Uses partners' code, or replaces it with similar components
• Not fully open source yet (TEC and KUL)
• Complementary with the WebLab demonstration → focus on multimedia processing
OpenAXES
Free (or) Open Source Version of AXES Based on OW2 WebLab
Already available in the current version:
• Video normalisation (Airbus DS / FFmpeg)
• Shot/scene detection (TEC)
• Image concept extraction (KUL)
• Spoken word & metadata search (UT)
• Speech-to-text (EN & FR) (Airbus DS / Sphinx / LIUM)
• Similar search (Airbus DS / Pastec)
• On-the-fly search (UO)
• Favorites / Like / Most viewed / Video cutter (DCU)
• Recommendations (UT)
• Easy installation on Ubuntu 14.04, 15.10 and Mint 17
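The FFmpeg-based video normalisation step can be sketched as below: transcode arbitrary input into one predictable codec and resolution before analysis. The flag set is a plausible baseline, not OpenAXES's actual settings, and the file names are hypothetical.

```python
# Sketch: build and run an ffmpeg command that normalises any input video
# to H.264/AAC at 720p height. Parameters are illustrative assumptions.
import shutil
import subprocess


def normalise_cmd(src: str, dst: str) -> list:
    """Build the ffmpeg argument list for one normalisation run."""
    return [
        "ffmpeg", "-y",          # overwrite the output without asking
        "-i", src,               # input file, any container/codec
        "-vf", "scale=-2:720",   # normalise height, keep aspect ratio
        "-c:v", "libx264",       # one common video codec downstream
        "-c:a", "aac",           # one common audio codec
        dst,
    ]


def normalise(src: str, dst: str) -> None:
    """Run the transcode, failing loudly if ffmpeg is missing or errors."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(normalise_cmd(src, dst), check=True)


cmd = normalise_cmd("in.avi", "out.mp4")
```

Pinning one output format up front is what keeps the downstream analysers (shot detection, speech-to-text) free of per-format special cases.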
Demonstration
Questions?
The WebLab Platform
OpenAXES Links
• Hosted on OW2 Forge: http://forge.ow2.org/projects/openaxes/
• Deliverable versions: http://forge.ow2.org/project/showfiles.php?group_id=436
  – OpenAXES-1.1.0 2015-07-07
  – OpenAXES-1.0.1 2015-05-19
  – OpenAXES-1.0.0 2015-04-01
  – OpenAXES-0.6.0 2015-02-16
  – OpenAXES-0.5.0 2015-01-30
  – OpenAXES-0.2.0 2014-12-16
• SVN: svn://svn.forge.objectweb.org/svnroot/openaxes
• Continuous build: http://bamboo.ow2.org/browse/OPENAXES
• Bug tracking: http://jira.ow2.org/browse/OPENAXES
Visor: On-the-fly object detection and retrieval
AXES Datasets
• Year 1
  – Mix of NISV, DW and BBC content: 40h
• Year 2
  – NISV PRO: 400h
  – BBC EastEnders: 500h
  – BBC PRO: 480h
• Year 3
  – BBC Home: 1000h
  – DW RES: 100h
  – NISV RES: 1000h
• Year 4
  – NISV Home: 1000h
  – DW Home: 200h
  – BBC Benchmarks: 400h
  – CNN Internet Archive: 100h