Pergamon
Pattern Recognition, Vol. 30, No. 4, pp. 607-625, 1997
© 1997 Pattern Recognition Society. Published by Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0031-3203/97 $17.00+.00
PII: S0031-3203(96)00107-0
AUTOMATIC VIDEO INDEXING VIA OBJECT MOTION ANALYSIS

JONATHAN D. COURTNEY*

Texas Instruments Incorporated, 8330 LBJ Freeway, MS 8374, Dallas, Texas 75243, U.S.A.

(Received 12 June 1996; received for publication 30 July 1996)
Abstract-To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g., appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video. Application of this technique to surveillance video analysis is discussed. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.

Video indexing    Object tracking    Motion analysis    Content-based retrieval
1. INTRODUCTION

Advances in multimedia technology, including commercial prospects for video-on-demand and digital library systems, have generated recent interest in content-based video analysis. Video data offers users of multimedia systems a wealth of information; however, it is not as readily manipulated as other data such as text. Raw video data has no immediate handles by which the multimedia system user may analyse its contents. By annotating video data with symbolic information describing the semantic content, one may facilitate analysis beyond simple serial playback.
To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g., appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video.
We have developed a system that demonstrates this indexing technique in assisted analysis of surveillance video data. The Automatic Video Indexing (AVI) system allows the user to select a video sequence of interest, play it forward or backward, and stop at individual frames. Furthermore, the user may specify queries on video sequences and jump to events of interest to avoid tedious serial playback. For example, the user may select a person in a video sequence and specify the query "show me all objects that this person removed from the scene". In response, the AVI system assembles a set of video clips highlighting the query results. The user may select a clip of interest and proceed with further video analysis using queries or playback as before.

*E-mail: courtney@csc.ti.com.
The remainder of this paper is organized as follows: Section 2 discusses content-based video analysis. Section 3 presents a video indexing technique based on object motion analysis. Section 4 describes a system which implements this video indexing technique for scene monitoring applications. Section 5 presents experimental results using the system. Section 6 concludes the paper.
2. CONTENT-BASED VIDEO ANALYSIS

Video data poses unique problems for multimedia information systems that text does not. Textual data is a symbolic abstraction of the spoken word that is usually generated and structured by humans. Video, on the other hand, is a direct recording of visual information. In its raw and most common form, video data is subject to little human-imposed structure, and thus has no immediate handles by which the multimedia system user may analyse its contents.

For example, consider an online movie screenplay (textual data) and a digitized movie (video and audio data). If one were analysing the screenplay and interested in searching for instances of the word "horse" in the text, various text searching algorithms could be employed to locate every instance of this symbol as desired. Such analysis is common in online text databases. If, however, one were interested in searching for every scene in the digitized movie where a horse appeared, the task is much more difficult. Unless a human performs some sort of
If the region detected by the segmentation of image In is due to the motion of an object present in the reference image (i.e. due to "exposed background"), a high probability exists that the boundary of the segmented region will coincide with intensity edges detected in I0. If the region is due to the presence of a foreground object in the current image, a high probability exists that the region boundary will coincide with intensity edges in In. The test is implemented by applying an edge detection operator to the current and reference images and checking for coincident boundary pixels in the segmented region of Cn.(9) Figure 3 shows this process. If the test supports the hypothesis that the region in question is due to exposed background, the reference image is modified by replacing the object with its exposed background region (see Fig. 4).
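The edge-coincidence test can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the Sobel threshold, the 4-neighbour boundary definition, and all function names are assumptions made for illustration.

```python
import numpy as np

def sobel_edges(img, thresh=64.0):
    """Binary edge map from Sobel gradient magnitude (plain NumPy)."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[1:-1, 1:-1] = (img[1:-1, 2:] - img[1:-1, :-2]) * 2 + \
                     (img[:-2, 2:] - img[:-2, :-2]) + (img[2:, 2:] - img[2:, :-2])
    gy[1:-1, 1:-1] = (img[2:, 1:-1] - img[:-2, 1:-1]) * 2 + \
                     (img[2:, :-2] - img[:-2, :-2]) + (img[2:, 2:] - img[:-2, 2:])
    return np.hypot(gx, gy) > thresh

def boundary_pixels(mask):
    """Pixels of a binary region mask that touch a non-region 4-neighbour."""
    interior = mask.copy()
    interior[1:-1, 1:-1] &= (mask[:-2, 1:-1] & mask[2:, 1:-1] &
                             mask[1:-1, :-2] & mask[1:-1, 2:])
    return mask & ~interior

def is_exposed_background(region_mask, ref_img, cur_img):
    """True if the region boundary coincides more with intensity edges of
    the reference image I0 than with edges of the current image In."""
    b = boundary_pixels(region_mask)
    ref_hits = np.sum(b & sobel_edges(ref_img))
    cur_hits = np.sum(b & sobel_edges(cur_img))
    return ref_hits > cur_hits
```

For a region where an object was present in the reference image but has moved away, the boundary pixels coincide with the object's edges in I0, so the test reports exposed background.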
No motion segmentation technique is perfect. The following are errors typical of many motion segmentation techniques:

1. True objects will disappear temporarily from the motion segmentation record. This occurs when there is insufficient contrast between an object and an occluded background region, or if an object is partially occluded by a background structure (for instance, a tree or pillar present in the scene).
2. False objects will appear temporarily in the motion segmentation record. This is caused by light fluctuations or shadows cast by moving objects.
3. Separate objects will temporarily join together. This typically occurs when two or more objects are in close proximity or when one object occludes another object.
4. Single objects will split into multiple regions. This occurs when a portion of an object has insufficient contrast with the background it occludes.
Instead of applying incremental improvements to relieve the shortcomings of motion segmentation, the AVI technique addresses these problems at a higher level where information about the semantic content of the video data is more readily available. The object tracking and motion analysis stages described in Sections 3.3 and 3.4 employ object trajectory estimates and knowledge concerning object motion and typical motion segmentation errors to construct a more accurate representation of the video content.
3.3. Object tracking

The motion segmentation output is processed by the object tracking stage. Given a segmented image Cn with P uniquely-labeled regions corresponding to foreground objects in the video, the system generates a set of features to represent each region. This set of features is named a "V-object" (video-object), denoted V_n^p, p = 1, ..., P. A V-object contains the label, centroid, bounding box, and shape mask of its corresponding region, as well as object velocity and trajectory information generated by the tracking process.
V-objects are then tracked through the segmented video sequence. Given segmented images Cn and Cn+1 with V-objects Vn = {V_n^p : p = 1, ..., P} and Vn+1 = {V_{n+1}^q : q = 1, ..., Q}, respectively, the motion tracking process links V-objects V_n^p and V_{n+1}^q if their position and estimated velocity indicate that they correspond to the same real-world object appearing in frames Fn and Fn+1. This is determined using linear prediction of V-object positions and a mutual nearest neighbor criterion via the following procedure:
1. For each V-object V_n^p ∈ Vn, predict its position in the next frame using

   μ̂_n^p = μ_n^p + v_n^p (t_{n+1} − t_n),

where μ̂_n^p is the predicted centroid of V_n^p in C_{n+1}, μ_n^p the centroid of V_n^p measured in C_n, v_n^p the estimated (forward) velocity of V_n^p, and t_{n+1} and t_n are the timestamps of frames F_{n+1} and F_n, respectively. Initially, the velocity estimate is set to v_n^p = (0, 0).
2. For each V_n^p ∈ Vn, determine the V-object in the next frame with centroid nearest μ̂_n^p. This nearest neighbor is denoted N(V_n^p). Thus, N(V_n^p) = V_{n+1}^q such that ||μ̂_n^p − μ_{n+1}^q|| ≤ ||μ̂_n^p − μ_{n+1}^r|| for all r ≠ q.
3. For every pair (V_n^p, N(V_n^p) = V_{n+1}^q) for which no other V-object in Vn has V_{n+1}^q as a nearest neighbor, estimate the (forward) velocity of V_{n+1}^q as

   v_{n+1}^q = (μ_{n+1}^q − μ_n^p) / (t_{n+1} − t_n);    (1)

otherwise, set v_{n+1}^q = (0, 0).
These steps are performed for each Cn, n = 0, 1, ..., N − 2. Steps 1 and 2 find nearest neighbors in the subsequent frame for each V-object. Step 3 generates velocity estimates for V-objects that can be unambiguously tracked; this information is used in step 1 to predict V-object positions for the next frame.

Next, steps 1-3 are repeated for the reverse sequence, i.e. Cn, n = N − 1, N − 2, ..., 1. This results in a new set of predicted centroids, velocity estimates, and nearest neighbors for each V-object in the reverse direction. Thus, the V-objects are tracked both forward and backward through the sequence. The remaining steps are then performed:
4. V-objects V_n^p and V_{n+1}^q are mutual nearest neighbors if N(V_n^p) = V_{n+1}^q and N′(V_{n+1}^q) = V_n^p. (Here, N(V_n^p) is the nearest neighbor of V_n^p in the forward direction, and N′(V_{n+1}^q) is the nearest neighbor of V_{n+1}^q in the reverse direction.) For each pair of mutual nearest neighbors (V_n^p, V_{n+1}^q), create a primary link from V_n^p to V_{n+1}^q.

5. For each V_n^p ∈ Vn without a mutual nearest neighbor, create a secondary link from V_n^p to N(V_n^p) if the predicted centroid μ̂_n^p is within ε of N(V_n^p), where ε is some small distance.

6. For each V_{n+1}^q in Vn+1 without a mutual nearest neighbor, create a secondary link from N′(V_{n+1}^q) to V_{n+1}^q if the predicted centroid μ̂_{n+1}^q is within ε of N′(V_{n+1}^q).
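One forward step of the prediction and mutual-nearest-neighbour linking can be sketched as below. This is a deliberately simplified, single-step version: the reverse direction here reuses raw centroids rather than reverse-pass predictions, secondary links are omitted, and all function names are hypothetical.

```python
import numpy as np

def nearest(point, candidates):
    """Index of the candidate centroid nearest to `point`."""
    d = [np.hypot(*(point - c)) for c in candidates]
    return int(np.argmin(d))

def link_frames(cents_n, vels_n, cents_n1, t_n, t_n1):
    """One forward tracking step between frames n and n+1.

    cents_n / cents_n1 : lists of V-object centroids, np.array([x, y])
    vels_n             : current velocity estimates for frame n
    Returns (primary_links, vels_n1): mutual-nearest-neighbour pairs
    (p, q) and forward velocity estimates for frame n+1.
    """
    dt = t_n1 - t_n
    # Step 1: linear prediction of each centroid into frame n+1.
    preds = [c + v * dt for c, v in zip(cents_n, vels_n)]
    # Step 2: nearest neighbour of each predicted centroid.
    fwd = [nearest(p, cents_n1) for p in preds]
    # Reverse direction (simplified: raw frame-n centroids as candidates).
    rev = [nearest(c, cents_n) for c in cents_n1]
    # Step 4: mutual nearest neighbours become primary links.
    links = [(p, q) for p, q in enumerate(fwd) if rev[q] == p]
    # Step 3: unambiguous matches get a forward velocity estimate.
    vels_n1 = [np.zeros(2) for _ in cents_n1]
    for p, q in links:
        if fwd.count(q) == 1:
            vels_n1[q] = (cents_n1[q] - cents_n[p]) / dt
    return links, vels_n1
```

Running the step over the whole sequence, then again in reverse, yields the forward and backward estimates that steps 4-6 combine into primary and secondary links.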
Fig. 3. Exposed background detection. (a) Reference image I0. (b) Image In. (c) Region to be tested. (d) Edge image of (a), found using the Sobel operator. (e) Edge image of (b). (f) Edge image of (c), showing boundary pixels. (g) Pixels coincident in (d) and (f). (h) Pixels coincident in (e) and (f). The greater number of coincident pixels in (g) versus (h) supports the hypothesis that the region in question is due to exposed background.

Fig. 4. Reference image modified to account for the exposed background region detected in Fig. 3.

The object tracking procedure uses the mutual nearest neighbor criterion (step 4) to estimate frame-to-frame V-object trajectories with a high degree of confidence. Pairs of mutual nearest neighbors are connected using a primary link to indicate that they are highly likely to represent the same real-world object in successive video frames.
Steps 5-6 associate V-objects that are tracked with less confidence but display evidence that they might result from the same real-world object. Thus, these objects are joined by secondary links. These steps are necessary to account for the "split" and "join" type motion segmentation errors described in Section 3.2.

The object tracking process results in a list of V-objects and connecting links that form a directed graph (digraph) representing the position and trajectory of foreground objects in the video sequence. Thus, the V-objects are the nodes of the graph and the connecting links are the arcs. This motion graph is the output of the object tracking stage.
Figure 5 shows a motion graph for a hypothetical sequence of one-dimensional frames. Here, the system detects the appearance of an object at A and tracks it to the V-object at B. Due to an error in motion segmentation, the object splits at D and E, and joins at F. At G, the object joins with the object tracked from C due to occlusion. These objects split at H and I. Note that primary links connect the V-objects that were most reliably tracked.
3.4. Motion analysis

The motion analysis stage analyses the results of object tracking and annotates the motion graph with tags describing several events of interest. This process proceeds in two parts: V-object grouping and V-object indexing. Figure 6 shows an example motion graph for a hypothetical sequence of 1-D frames discussed in the following sections.
Fig. 5. The output of the object tracking stage for a hypothetical sequence of 1-D frames. The vertical lines labeled Fn represent frame number n. Primary links are shown as solid arcs; secondary links are shown as dashed arcs.
Fig. 6. An example motion graph for a sequence of 1-D frames.
3.4.1. V-object grouping. First, the motion analysis stage hierarchically groups V-objects into structures representing the paths of objects through the video data. Using graph theory terminology,(15) five groupings are defined for this purpose:

A stem M = {V_i : i = 1, 2, ..., N_M} is a maximal-size, directed path (dipath) of two or more V-objects containing no secondary links, meeting all of the following conditions:

• outdegree(V_i) = 1 for 1 ≤ i < N_M,
• indegree(V_i) = 1 for 1 < i ≤ N_M, and
• either

   ||μ_{i+1} − μ_i|| ≤ ε for 1 ≤ i < N_M    (2)

or

   ||μ_{i+1} − μ_i|| > ε for 1 ≤ i < N_M    (3)

where μ_i is the centroid of V-object V_i ∈ M.

Thus, a stem represents a simple trajectory of an object through two or more frames. Figure 7 labels V-objects from Fig. 6 belonging to separate stems with the letters "A" through "K".
Stems are used to determine the motion state of real-world objects, i.e. whether they are moving or stationary. If equation (2) is true, then the stem is classified as stationary; if equation (3) is true, then the stem is classified as moving. Figure 7 highlights stationary stems B, C, F and H; the remainder are moving.
A branch B = {V_i : i = 1, 2, ..., N_B} is a maximal-size dipath of two or more V-objects containing no secondary links, for which outdegree(V_i) = 1 for 1 ≤ i < N_B and indegree(V_i) = 1 for 1 < i ≤ N_B. Figure 8 labels V-objects belonging to branches with the letters "L" through "T". A branch represents a highly reliable trajectory estimate of an object through a series of frames.

If a branch consists entirely of a single stationary stem, then it is classified as stationary; otherwise, it is classified as moving. Branches N and Q in Fig. 8 (highlighted) are stationary; the remainder are moving.
A trail is a maximal-size dipath of two or more V-objects that contains no secondary links. This grouping represents the object tracking stage's best estimate of an object trajectory using the mutual nearest neighbor criterion. Figure 9 labels V-objects belonging to trails with the letters "U" through "Z".

Fig. 7. Stems. Stationary stems are highlighted.

Fig. 8. Branches. Stationary branches are highlighted.

Fig. 9. Trails.

A trail and the V-objects it contains are classified as stationary if all the branches it contains are stationary, and moving if all the branches it contains are moving. Otherwise, the trail is classified as unknown. Trail W in Fig. 9 is stationary; the remainder are moving.
A track K = {L_1, G_1, ..., L_{N_K−1}, G_{N_K−1}, L_{N_K}} is a dipath of maximal size containing trails {L_i : 1 ≤ i ≤ N_K} and connecting dipaths {G_i : 1 ≤ i < N_K}. For each G_i ∈ K there must exist a dipath H = {V_i^f, G_i, V_{i+1}^s} (where V_i^f is the last V-object in L_i, and V_{i+1}^s is the first V-object in L_{i+1}), such that every V_j ∈ H meets the requirement

   ||μ_i^f + v_i^f (t_j − t_i^f) − μ_j|| ≤ ε    (4)

where μ_i^f is the centroid of V_i^f, v_i^f the forward velocity of V_i^f, t_j − t_i^f the time difference between the frames containing V_j and V_i^f, and μ_j is the centroid of V_j. Thus, equation (4) specifies that the object must maintain a constant velocity through path H.
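A check of this constant-velocity requirement could look like the following sketch; the exact form of equation (4) is assumed from the surrounding description, and the function name is hypothetical.

```python
import numpy as np

def maintains_constant_velocity(mu_f, v_f, t_f, path, eps):
    """Check an (assumed) reading of equation (4): every V-object on the
    connecting dipath H must lie within eps of the position extrapolated
    at constant velocity from the last V-object of the preceding trail.

    mu_f, v_f, t_f : centroid, forward velocity, timestamp of that V-object
    path           : list of (centroid, timestamp) pairs along the dipath H
    """
    return all(
        np.linalg.norm(mu_f + v_f * (t_j - t_f) - mu_j) <= eps
        for mu_j, t_j in path
    )
```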
A track represents the trajectory estimate of an object that may cause or undergo occlusion one or more times in a sequence. The motion analysis stage uses equation (4) to attempt to follow an object through frames where an occlusion occurs. Figure 10 labels V-objects belonging to tracks with the letters "α", "β", "χ", "δ" and "ε". Note that track δ joins trails X and Y.

A track and the V-objects it contains are classified as stationary if all the trails it contains are stationary, and moving if all the trails it contains are moving. Otherwise, the track is classified as unknown. Track χ in Fig. 10 is stationary; the remaining tracks are moving.
A trace is a maximal-size, connected digraph of V-objects. A trace represents the complete trajectory of an object and all the objects with which it intersects. Thus, the motion graph in Fig. 6 contains two traces: one trace extends from F2 to F; the remaining V-objects form a second trace. Figure 11 labels V-objects on these traces with the numbers "1" and "2".
Note that the preceding groupings are hierarchical, i.e. for every trace E, there exists at least one track K, trail L, branch B, and stem M such that E ⊇ K ⊇ L ⊇ B ⊇ M. Furthermore, every V-object is a member of exactly one trace.
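The classification rules above can be illustrated with a short sketch. The stem test assumes a consecutive-displacement reading of equations (2) and (3); the group rule follows the all-stationary/all-moving/otherwise-unknown description given for trails and tracks. Both function names are hypothetical.

```python
import numpy as np

def classify_stem(centroids, eps):
    """Classify a stem from its V-object centroids: stationary if every
    successive displacement stays within the small distance eps (the
    reading of equations (2)/(3) assumed here), moving otherwise."""
    steps = [np.linalg.norm(b - a) for a, b in zip(centroids, centroids[1:])]
    return "stationary" if all(s <= eps for s in steps) else "moving"

def classify_group(child_states):
    """Propagate motion state up the hierarchy: a trail (or track) is
    stationary if all the branches (or trails) it contains are stationary,
    moving if all are moving, and unknown otherwise."""
    if all(s == "stationary" for s in child_states):
        return "stationary"
    if all(s == "moving" for s in child_states):
        return "moving"
    return "unknown"
```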
The motion analysis stage scans the motion graph generated by the object tracking stage and groups V-objects into stems, branches, trails, tracks, and traces.

Fig. 10. Tracks. The dipath connecting trails X and Y from Fig. 9 is highlighted.
Fig. 11. Traces.
Table 1. Conditions for annotating V-objects with each of the object-motion events, by V-object motion state

Appearance:     head of track; indegree(V) > 0 (moving) or indegree(V) = 0 (stationary)
Disappearance:  tail of track; outdegree(V) > 0 (moving) or outdegree(V) = 0 (stationary)
Entrance:       head of track; indegree(V) = 0 (moving) or indegree(V) > 0 (stationary)
Exit:           tail of track; outdegree(V) = 0
Deposit:        head of track; indegree(V) = 1 (stationary)
Removal:        tail of track; outdegree(V) = 1 (stationary)
Depositor:      adjacent to V-object with "deposit" tag
Remover:        adjacent from V-object with "removal" tag
Motion:         tail of stationary stem; head of moving stem
Rest:           tail of moving stem; head of stationary stem
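A rule-based annotator in the spirit of Table 1 might look like the sketch below. It covers only a subset of the events, and the exact indegree/outdegree conditions and the function signature are assumptions for illustration.

```python
def annotate(head_of_track, tail_of_track, indeg, outdeg, state):
    """Return event tags for a V-object from its position in a track, its
    degree in the motion graph, and its motion state. The conditions are
    assumed: entrances/exits are track heads/tails with no incoming or
    outgoing links; deposits/removals are stationary track heads/tails
    linked to exactly one other object."""
    tags = []
    if head_of_track and indeg == 0:
        tags.append("entrance")
    if head_of_track and indeg > 0 and state == "moving":
        tags.append("appearance")
    if tail_of_track and outdeg == 0:
        tags.append("exit")
    if tail_of_track and outdeg > 0 and state == "moving":
        tags.append("disappearance")
    if head_of_track and indeg == 1 and state == "stationary":
        tags.append("deposit")
    if tail_of_track and outdeg == 1 and state == "stationary":
        tags.append("removal")
    return tags
```

For example, a stationary object whose track head is linked from one other object would be tagged as a deposit, matching the briefcase scenario discussed later.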
Fig. 12. Annotation rules applied to Fig. 6.
Fig. 13. A high-level diagram of the AVI system.
it forward or backward and stop on individual frames. The system also provides a content-based retrieval mechanism by which the AVI system user may specify queries on a video sequence using spatial, temporal, event-, and object-based parameters. Thus, the user can jump to important points in the video sequence based on the query specification.

Figure 14 shows a picture of the playback portion of the AVI GUI. It provides familiar VCR-like controls (i.e. forward, reverse, stop, step-forward, step-back), as well as a system "clipboard" for recording intermediate video analysis results (i.e. video "clips"). For example, the clipboard shown in Fig. 14 contains three clips, the result of a previous query by the user.
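The clipboard stack described here (query results pushed on top, browsed with "up", "down", and "pop") can be sketched as a small class. The command names come from the interface description; the exact behaviour is an assumption.

```python
class Clipboard:
    """Minimal sketch of the AVI clipboard stack of video clips."""

    def __init__(self):
        self._clips = []      # stack of clips, top of stack at the end
        self._cursor = -1     # index of the currently displayed clip

    def push(self, clips):
        """Push the clip(s) resulting from a query onto the stack top."""
        self._clips.extend(clips)
        self._cursor = len(self._clips) - 1

    def up(self):
        """Move the cursor one clip toward the top of the stack."""
        self._cursor = min(self._cursor + 1, len(self._clips) - 1)

    def down(self):
        """Move the cursor one clip toward the bottom of the stack."""
        self._cursor = max(self._cursor - 1, 0)

    def pop(self):
        """Discard the top clip; the cursor moves to the new top."""
        clip = self._clips.pop()
        self._cursor = len(self._clips) - 1
        return clip

    @property
    def current(self):
        """The clip currently selected for playback or further queries."""
        return self._clips[self._cursor] if self._clips else None
```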
Fig. 14. The AVI system playback interface.

Fig. 15. The AVI system query interface.

The user may select one of these clips, play it forward and back, and pose a new query using it. The clip(s) resulting from the new query are then pushed onto the top of the clipboard stack. The user may also peruse the clipboard stack using the button-commands "up", "down", and "pop".

Figure 15 shows the query interface to the AVI system. Using the "Type" field, the user may specify any com-
Fig. 21. Frames from an example video sequence. Frame numbers are shown below each image.
room at that point, the person is defined as a different object.) The user returns to the original clip of Fig. 23(a) by popping the clipboard stack twice. Then the user applies the query "find removal events of this object" to the briefcase. The system responds with a single clip of the person removing the briefcase, as shown in Fig. 23(c).
Fig. 22. Clips from the video sequence of Fig. 21 satisfying the query "find all deposit events". Boxes highlight the objects contributing to the event.

Fig. 23. Advanced video analysis example. Clips show: (a) the briefcase being deposited, (b) the entrance of the person who deposits the briefcase, (c) the briefcase being removed, (d) the exit of the person who removes the briefcase.

Finally, the user specifies the query "find exit events of this object" to the person removing the briefcase. The system then responds with a single clip of the person as he leaves the room (with the briefcase), as shown in Fig. 23(d).

5. EXPERIMENTAL RESULTS

The video indexing technique described in this paper was tested using the AVI system on three video sequences containing a total of 900 frames, 18 objects, and 44
Fig. 24. Frames from Test Sequence 2.
which multimedia system users may navigate through video sequences. The video indexing technique described in this paper abstracts raw video information using motion segmentation, object tracking, and a hierarchical path construction method which enables annotation using several motion-based event tags. Efficient retrieval of
Fig. 25. Frames from Test Sequence 3.
video clips is facilitated by an event index into the abstracted video. Furthermore, a system employing this indexing technique for assisted analysis of surveillance video allows users to jump to points of interest in a video sequence via intuitive spatial, temporal, event-, and object-based queries.
Fig. 26. Appearance and exit of an individual pedestrian from Test Sequence 3. Frame F217 shows the pedestrian emerging from a car; frame F248 shows the pedestrian walking out of the field of view.
Acknowledgements-Thanks go to Dinesh Nair and Stephen Perkins for assisting in the design and implementation of the AVI system.
REFERENCES

1. HongJiang Zhang, Atreyi Kankanhalli and Stephen W. Smoliar, Automatic partitioning of full-motion video, Multimedia Systems 1(1), 10-28 (1993).
2. Akihito Akutsu, Yoshinobu Tonomura, Hideo Hashimoto and Yuji Ohba, Video indexing using motion vectors, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1522-1530, Boston, Massachusetts (November 1992).
3. Mikihiro Ioka and Masato Kurokawa, A method for retrieving sequences of images on the basis of motion analysis, in Image Storage and Retrieval Systems, Proc. SPIE 1662, pp. 35-46 (1992).
4. Suh-Yin Lee and Huan-Ming Kao, Video indexing: an approach based on moving object and track, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 25-36, San Jose, California (February 1993).
5. Glorianna Davenport, Thomas Aguierre Smith and Natalio Pincever, Cinematic primitives for multimedia, IEEE Comput. Graphics Appl., 67-74 (July 1991).
6. Masahiro Shibata, A temporal segmentation method for video sequences, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1194-1205, Boston, Massachusetts (November 1992).
7. Deborah Swanberg, Chiao-Fe Shu and Ramesh Jain, Knowledge guided parsing in video databases, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 13-24, San Jose, California (February 1993).
8. F. Arman, R. Depommier, A. Hsu and M-Y. Chiu, Content-based browsing of video sequences, in Proc. ACM Int. Conf. on Multimedia, San Francisco, California (October 1994).
9. Ramesh Jain, W. N. Martin and J. K. Aggarwal, Segmentation through the detection of changes due to motion, Comput. Graphics Image Process. 11, 13-34 (1979).
10. S. Yalamanchili, W. N. Martin and J. K. Aggarwal, Extraction of moving object descriptions via differencing, Comput. Graphics Image Process. 18, 188-201 (1982).
11. Dana H. Ballard and Christopher M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey (1982).
12. Robert M. Haralick and Linda G. Shapiro, Computer and Robot Vision, Vol. 2. Addison-Wesley, Reading, Massachusetts (1993).
13. Akio Shio and Jack Sklansky, Segmentation of people in motion, in IEEE Workshop on Visual Motion, pp. 325-332, Princeton, New Jersey (October 1991).
14. M. Irani and P. Anandan, A unified approach to moving object detection in 2D and 3D scenes, in Proc. Image Understanding Workshop, pp. 707-718, Palm Springs, California (February 1996).
15. Gary Chartrand and Ortrud R. Oellermann, Applied and Algorithmic Graph Theory. McGraw-Hill, New York (1993).
16. Stephen S. Intille and Aaron F. Bobick, Closed-world tracking, in Proc. Fifth Int. Conf. on Computer Vision, pp. 672-678, Cambridge, Massachusetts (June 1995).
About the Author-JONATHAN D. COURTNEY received the M.S. degree in Computer Science and the B.S. degree in Computer Engineering and Computer Science from Michigan State University. Mr Courtney is a Member of the Technical Staff in the Multimedia Systems Branch of Corporate Research and Development at Texas Instruments. His Master's thesis research, under the direction of Professor Anil K. Jain, concerned mobile robot localization using multisensor maps. His current research interests include multimedia information systems and virtual environments for cooperative work. Mr Courtney is a member of the IEEE.