![Page 1: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/1.jpg)
Extracting features from spatio-temporal volumes (STVs) for activity recognition
Dheeraj Singaraju
Reading group: 06/29/06
![Page 2: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/2.jpg)
Motivation for dealing with STVs
• Optical-flow-based methods capture only first-order motion.
• Methods that use HMMs deal with single-point trajectories that carry only motion information and no spatial information.
We aim for a direct scheme for event detection and classification that does not require feature tracking, segmentation or computation of optical flow.
We want to detect points in the space-time volume that have significant local variation in both space and time.
![Page 3: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/3.jpg)
Approaches that we shall discuss
• On Space-Time Interest Points; Ivan Laptev
– Local image features provide compact and abstract representations of images, e.g. corners
– Extend the concept of a spatial corner detector to a spatio-temporal corner detector
• Actions as Objects: A Novel Action Representation; Alper Yilmaz and Mubarak Shah
– Concepts of differential geometry: extract features from the STV based on local variations in curvatures of points on the volume
– The curvatures show invariance to rotation and translation
![Page 4: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/4.jpg)
Detecting interest points in space
• An image can be modeled by its linear scale-space representation as follows
• To look for interest points, one analyzes the matrix of second moments:
A more familiar form of the matrix
![Page 5: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/5.jpg)
Detecting interest points in space (contd.)
• We want to choose corners in the image since they have significant spatial variation.
• We therefore detect positive maxima of the following function
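The spatial detector described above can be sketched as follows. This is a minimal illustration, assuming the usual Harris form H = det(μ) − k·trace²(μ), where μ is the Gaussian-windowed second-moment matrix of the image gradients. The function name `harris_response`, the parameter values (`sigma_l`, `s`, `k`) and the use of SciPy are assumptions for illustration, not taken from the slides.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def harris_response(image, sigma_l=1.0, s=2.0, k=0.04):
    """Corner response H = det(mu) - k * trace(mu)^2 (assumed Harris form)."""
    image = np.asarray(image, dtype=float)
    # Gaussian first derivatives at the local scale sigma_l (axis 0 = y, axis 1 = x).
    Lx = gaussian_filter(image, sigma_l, order=(0, 1))
    Ly = gaussian_filter(image, sigma_l, order=(1, 0))
    # Integrate the gradient products over a Gaussian window of variance s * sigma_l^2.
    sigma_i = np.sqrt(s) * sigma_l
    mxx = gaussian_filter(Lx * Lx, sigma_i)
    mxy = gaussian_filter(Lx * Ly, sigma_i)
    myy = gaussian_filter(Ly * Ly, sigma_i)
    det = mxx * myy - mxy ** 2
    trace = mxx + myy
    return det - k * trace ** 2
```

Positive local maxima of the returned map are corner candidates. Along a straight edge only one eigenvalue of μ is large, so the response there is negative.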
How do we detect interest points in space-time?
![Page 6: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/6.jpg)
Results of detecting interest points in space
• Detecting interest points in space alone also yields interest points in the stationary background
• We want to find interest points that carry information in the spatial as well as the temporal domain.
![Page 7: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/7.jpg)
Detecting interest points in space-time
• A spatio-temporal image sequence can be modeled by its linear scale-space representation as follows
• Note that the spatial and temporal domains have separate scale parameters, i.e. σ² and τ² respectively
![Page 8: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/8.jpg)
Detecting interest points in space-time (contd.)
• To look for interest points, one analyzes the matrix of second moments:
• Space-time corners vary strongly along all three directions, so we look for the positive maxima of the following spatio-temporal corner function
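A sketch of the space-time corner function, assuming the form H = det(μ) − k·trace³(μ) used in Laptev's paper, with μ the 3×3 second-moment matrix of the gradients (L_x, L_y, L_t). The function name `st_corner_response`, the `(t, y, x)` axis order and the default parameter values are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def st_corner_response(volume, sigma_l=1.5, tau_l=1.5, s=2.0, k=0.005):
    """Space-time corner response for a volume indexed as (t, y, x)."""
    f = np.asarray(volume, dtype=float)
    local = (tau_l, sigma_l, sigma_l)  # separate temporal and spatial scales
    Lt = gaussian_filter(f, local, order=(1, 0, 0))
    Ly = gaussian_filter(f, local, order=(0, 1, 0))
    Lx = gaussian_filter(f, local, order=(0, 0, 1))
    # Integration scales: variances s * sigma_l^2 and s * tau_l^2.
    integ = tuple(np.sqrt(s) * v for v in local)
    grads = (Lx, Ly, Lt)
    mu = np.empty(f.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            m = gaussian_filter(grads[i] * grads[j], integ)
            mu[..., i, j] = m
            mu[..., j, i] = m
    det = np.linalg.det(mu)
    trace = np.trace(mu, axis1=-2, axis2=-1)
    return det - k * trace ** 3
```

For a static scene L_t vanishes, so det(μ) is zero and the response is never positive. This is why stationary background corners are no longer reported.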
![Page 9: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/9.jpg)
Results of detecting interest points in the STV
• Consider a synthetic sequence of a ball moving towards a wall and colliding with it
• An interest point is detected at the collision point
![Page 10: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/10.jpg)
Results of detecting interest points in the STV
• Consider a synthetic sequence of 2 balls moving towards each other
• Different interest points are detected at different spatial and temporal scales
coarser scale
![Page 11: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/11.jpg)
Effects of scales on interest point detection
Long temporal events are detected for large values of τ², while short events are detected for small values of τ²
Spatially large events are detected for large values of σ², while spatially small events are detected for small values of σ²
![Page 12: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/12.jpg)
Scale selection in space-time
• We consider a prototype event modeled by a spatio-temporal Gaussian blob
• The scale-space representation of f is hence given by
![Page 13: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/13.jpg)
Scale selection in space-time (contd.)
• We want to find a differential operator that assumes simultaneous extrema over spatial and temporal scales that are characteristic of this Gaussian prototype event
• To recover the spatio-temporal extent of f, we consider second-order derivatives of L normalized by the scales as:
• Requiring that the above normalized second-order derivatives assume their extrema at the scales σ² = σ₀² and τ² = τ₀² of the blob gives a = 1, b = 1/4, c = 1/2 and d = 3/4.
![Page 14: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/14.jpg)
Scale selection in space-time (contd.)
• We therefore define a normalized spatio-temporal Laplace operator as follows:
• The following plots show that the zero crossings correspond to the maxima that are detected at σ² = σ₀² and τ² = τ₀²
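The scale-selection property can be checked numerically. The sketch below assumes a Gaussian blob prototype with spatial variance σ₀² and temporal variance τ₀², and the normalized Laplacian σ²·τ^(1/2)·(L_xx + L_yy) + σ·τ^(3/2)·L_tt that follows from a = 1, b = 1/4, c = 1/2, d = 3/4. It evaluates the operator in closed form at the blob center. The function names and the scale grid are hypothetical.

```python
import math


def norm_laplacian_at_center(sigma2, tau2, sigma0_2, tau0_2):
    """Normalized Laplacian at the center of a Gaussian blob smoothed to (sigma2, tau2)."""
    a = sigma0_2 + sigma2  # spatial variance after smoothing
    b = tau0_2 + tau2      # temporal variance after smoothing
    g0 = 1.0 / ((2 * math.pi) ** 1.5 * a * math.sqrt(b))  # L at the center
    Lxx = Lyy = -g0 / a
    Ltt = -g0 / b
    sigma, tau = math.sqrt(sigma2), math.sqrt(tau2)
    return sigma ** 2 * tau ** 0.5 * (Lxx + Lyy) + sigma * tau ** 1.5 * Ltt


def select_scales(sigma2_grid, tau2_grid, sigma0_2, tau0_2):
    """Return the (sigma2, tau2) pair with the strongest normalized response."""
    pairs = [(s2, t2) for s2 in sigma2_grid for t2 in tau2_grid]
    return max(pairs, key=lambda p: abs(norm_laplacian_at_center(p[0], p[1], sigma0_2, tau0_2)))
```

Under these assumptions, searching a grid that contains the blob's own variances returns exactly those variances, which is the behaviour the normalization exponents were chosen to produce.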
![Page 15: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/15.jpg)
Scale adapted space time interest points
• So far we have found events that are local extrema in the space-time volume for a particular choice of spatial and temporal scales
• We would like to detect interest points that are extrema over the space-time volume and, simultaneously, extrema of the scale-normalized Laplace operator over scales
• The reason for doing so is that different events would in general have different spatial and temporal extents
![Page 16: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/16.jpg)
Algorithm for detecting interest points
![Page 17: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/17.jpg)
Results on a previously used synthetic example
Note that all the extrema are detected irrespective of their spatial and temporal extents
DOUBT
Why are these points not detected as interest points?
![Page 18: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/18.jpg)
Results of the algorithm on real seq.
Note that events of all spatial and temporal extents are captured.
The size of the circle shows the spatial extent of the event
![Page 19: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/19.jpg)
Results of interest pt. detection
Note that the regularity and extent of the spatio-temporal interest points are representative of the true events in time
![Page 20: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/20.jpg)
Classification of events
• Every interest point is described by its local spatio-temporal neighborhood, and we compare the neighborhoods of events to classify them
• The neighborhood of an interest point is described by evaluating the following event descriptors
This normalization guarantees the invariance of the derivative response to image scaling
![Page 21: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/21.jpg)
Classification of events (contd.)
• To compare two events, we compute the Mahalanobis distance between their descriptors as
• To detect similar events in the given data, we apply k-means clustering to the event descriptors and thus detect groups of interest points with similar spatio-temporal neighborhoods
• Once the cluster centers are estimated from the training data, given a new event, we evaluate its distance from each cluster center. If the distance from every center is above a threshold, we declare it a background event.
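The classification steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`mahalanobis_sq`, `kmeans`, `classify_event`), the explicit initial centers, and the use of plain Euclidean distance inside the clustering loop are assumptions made to keep the example short.

```python
import numpy as np


def mahalanobis_sq(d1, d2, cov_inv):
    """Squared Mahalanobis distance (d1 - d2) Sigma^{-1} (d1 - d2)^T."""
    diff = np.asarray(d1, float) - np.asarray(d2, float)
    return float(diff @ cov_inv @ diff)


def kmeans(X, init_centers, n_iter=100):
    """Plain Lloyd iterations (Euclidean) starting from the given centers."""
    X = np.asarray(X, float)
    centers = np.array(init_centers, float)
    for _ in range(n_iter):
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist2.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(len(centers)):
            if np.any(labels == c):
                new_centers[c] = X[labels == c].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def classify_event(desc, centers, cov_inv, threshold):
    """Index of the nearest center, or -1 (background) if all centers are too far."""
    d2 = [mahalanobis_sq(desc, c, cov_inv) for c in centers]
    best = int(np.argmin(d2))
    return best if d2[best] <= threshold else -1
```

A descriptor close to a learned center receives that center's class, while a descriptor far from every center is labelled -1, matching the background rule in the last bullet.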
![Page 22: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/22.jpg)
Results of classification
![Page 23: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/23.jpg)
Recognizing gaits
• We extract the following features from the spatio-temporal volume
– Positions of the interest points:
– The corresponding scales:
– The class of interest points:
• We introduce a state for the model determined by the vector X, where the variables are
– Position of the person in the image:
– His/her size:
– Frequency of the gait:
– Phase of the gait at the current moment:
– Temporal variations of
![Page 24: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/24.jpg)
Recognizing gaits (contd.)
• We then have the following model for walking
• Such a model helps handle translations as well as uniform rescaling in the image and the temporal domain
![Page 25: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/25.jpg)
Recognizing gaits (contd.)
• Given a model state X, a current time, a length of time window, and a set of data features detected within the most recent time window, the match between the model and the data is defined by a weighted sum of distances h between the model features and the data features.
• For each model feature, the data feature used is the one that minimizes the distance h, and the exponential weighting function has a variance parameter.
![Page 26: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/26.jpg)
Recognizing gaits (contd.)
• To find the best match between the model and the data, we search for the model state that minimizes
![Page 27: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/27.jpg)
Summary of the approach
• An interest point detector is developed that finds local image features that show high variation of the image values in space and in time
• The spatio-temporal extents of detected events can be estimated by using a normalized Laplacian operator
• The neighborhoods of the events are described using scale invariant spatio-temporal descriptors
• Different actions are then compared by checking for the matches between the event descriptors
![Page 28: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/28.jpg)
Actions as objects: Action sketches
• This method analyzes the spatio-temporal volume using differential-geometric surface properties such as peaks, pits, valleys and ridges
• The authors claim that these are important action descriptors as they capture both spatial and temporal properties
• These descriptors are related to the convex and concave parts of the object contours and/or to the maxima in the spatio-temporal curvature of a trajectory, and are hence view invariant.
![Page 29: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/29.jpg)
STV: a collection of contours
• In this approach the spatio-temporal volume is really a hollow solid object whose boundary is defined by the contours of a person in every image frame.
• It is assumed that the STV can be treated as a manifold, which allows us to regard a small neighborhood around a point as nearly flat.
• Since the STV is really the time evolution of a contour, we can define a 2D parametric representation by considering arc length s of the contour and time t.
![Page 30: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/30.jpg)
STV: a collection of contours (contd.)
Figure panels: t varying, s fixed; s varying, t fixed
The STV is a continuous representation in the normalized time scale, and it does not require any time warping for matching two sequences of different lengths.
![Page 31: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/31.jpg)
Action descriptors
• We want to compute action descriptors that correspond to changes in direction, speed and shape of parts of the contour
• Changes in these quantities are reflected on the surface of the STV and can be computed using differential geometry by identifying different landmarks.
• These landmarks can be classified on the basis of the local curvatures at points on the STV
![Page 32: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/32.jpg)
Action descriptors (contd.)
• Differential geometry gives us the concepts of Gaussian curvature K and mean curvature H, which can be evaluated at points on the manifold of the STV. These curvatures are invariant to rigid transformations such as translation and rotation.
• Local extrema of these curvatures can therefore be used to identify interest points for describing actions
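One way to evaluate K and H on a sampled surface is through the first and second fundamental forms. The sketch below assumes the STV surface is available as a regular grid of 3D points indexed by the two parameters (s, t) with uniform steps. The function name `surface_curvatures` and the finite-difference scheme are illustrative choices, not taken from the paper.

```python
import numpy as np


def surface_curvatures(X, ds, dt):
    """Gaussian (K) and mean (H) curvature of a surface sampled as X[s, t, :3]."""
    X = np.asarray(X, dtype=float)
    # First and second partial derivatives by central differences.
    Xs, Xt = np.gradient(X, ds, dt, axis=(0, 1))
    Xss, Xst = np.gradient(Xs, ds, dt, axis=(0, 1))
    _, Xtt = np.gradient(Xt, ds, dt, axis=(0, 1))
    # Unit normal; its orientation fixes the sign of H.
    n = np.cross(Xs, Xt)
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)

    def dot(a, b):
        return np.sum(a * b, axis=-1)

    E, F, G = dot(Xs, Xs), dot(Xs, Xt), dot(Xt, Xt)  # first fundamental form
    L, M, N = dot(Xss, n), dot(Xst, n), dot(Xtt, n)  # second fundamental form
    denom = E * G - F ** 2
    K = (L * N - M ** 2) / denom
    H = (E * N + G * L - 2 * F * M) / (2 * denom)
    return K, H
```

On a sphere of radius R parametrized by polar and azimuthal angles, this gives K close to 1/R² and H close to −1/R (outward normal), which is a convenient sanity check before applying it to an STV.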
![Page 33: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/33.jpg)
Action descriptors (contd.)
• The following table shows the different surface types and their associated curvatures
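The table itself did not survive in this transcript. The sketch below assumes it follows the standard sign convention for classifying surface types from the signs of H and K (peak, ridge, saddle ridge, flat, minimal, pit, valley, saddle valley). The function name `surface_type` and the tolerance `eps` are hypothetical.

```python
def surface_type(K, H, eps=1e-6):
    """Classify a surface point from the signs of Gaussian (K) and mean (H) curvature."""

    def sign(v):
        if abs(v) < eps:
            return 0
        return 1 if v > 0 else -1

    # Keys are (sign of H, sign of K); assumed standard convention.
    table = {
        (-1, 1): "peak",
        (-1, 0): "ridge",
        (-1, -1): "saddle ridge",
        (0, 0): "flat",
        (0, -1): "minimal surface",
        (1, 1): "pit",
        (1, 0): "valley",
        (1, -1): "saddle valley",
    }
    # H = 0 with K > 0 cannot occur on a real surface, since H^2 >= K.
    return table.get((sign(H), sign(K)), "undefined")
```

For example, a point with K = 0.25 and H = −0.5 is labelled a peak, and a point where both curvatures vanish is labelled flat.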
![Page 34: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/34.jpg)
Analysis of action descriptors
• We consider three types of contours: concave contours, convex contours and straight contours
• These contours generate the following typical landmarks in the spatio-temporal volume
– Straight contour: ridge, valley or flat surface
– Convex contour: peak, ridge or saddle ridge
– Concave contour: pit, valley or saddle valley
Shapes generated from straight contours
![Page 35: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/35.jpg)
STVs corresponding to hand motion
The STV generated by a hand that stays still. Such a motion (or lack of it) creates a ridge
![Page 36: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/36.jpg)
STVs corresponding to hand motion
The STV created by a hand that first moves downwards and then upwards. Note that a saddle ridge is created at the point of change of motion
![Page 37: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/37.jpg)
Properties of the event descriptors
• The landmarks discussed so far are essentially produced by stable motion or by a change in stable motion. The stability of motion ensures that the STV is smooth enough for valid local planar neighborhoods to exist at its points
• Some of the landmarks are related to the curvature of the point trajectories and body contours as follows
![Page 38: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/38.jpg)
View invariance of event descriptors
• Since the landmarks are associated with extrema of local curvatures, even when the view changes the transformed landmarks are extrema in the new STV
Derived formula relating curvatures of corresponding points in 2 different views
DOUBT: Not very confident about the derivation of the above
• Due to this view invariance, comparing two STVs is equivalent to checking whether there is a valid fundamental matrix relating the sets of event descriptors in the 2 given action volumes.
![Page 39: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/39.jpg)
Comparing two actions
• We check whether a linear system of the following kind is satisfied by the event descriptors in both actions
• This boils down to checking whether the smallest (last) singular value of A is 0. From the set of possible matches between the input action sketch and the known action sketches, we select the action with the minimum matching score
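The singular-value check can be sketched as follows. The linear system itself is not shown in this transcript, so the sketch assumes the usual epipolar form A f = 0, where each row of A is built from one pair of corresponding landmark positions (x, y) and (x', y') and f holds the nine entries of the fundamental matrix. The function name `fundamental_matrix_residual` is hypothetical.

```python
import numpy as np


def fundamental_matrix_residual(pts1, pts2):
    """Smallest singular value of the epipolar design matrix A (0 for a perfect match)."""
    p1 = np.asarray(pts1, dtype=float)
    p2 = np.asarray(pts2, dtype=float)
    if p1.shape != p2.shape or p1.ndim != 2 or p1.shape[1] != 2:
        raise ValueError("pts1 and pts2 must both have shape (n, 2)")
    if len(p1) < 9:
        raise ValueError("need at least 9 correspondences for a meaningful score")
    x, y = p1.T
    xp, yp = p2.T
    # One row per correspondence, from expanding [x', y', 1] F [x, y, 1]^T = 0.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, np.ones_like(x)])
    return float(np.linalg.svd(A, compute_uv=False)[-1])
```

With noisy landmarks the smallest singular value is never exactly zero, so in practice it serves as a matching score: the known action with the lowest score is selected.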
![Page 40: Extracting features from spatio-temporal volumes (STVs) for activity recognition Dheeraj Singaraju Reading group: 06/29/06](https://reader030.vdocument.in/reader030/viewer/2022032722/56649f455503460f94c66de5/html5/thumbnails/40.jpg)
Summary of the approach
• Using concepts of differential geometry, extract interest points (action sketches) that carry local spatio-temporal information by virtue of being local extrema of curvatures in space-time
• These event descriptors are associated with uniform motion or stable changes in uniform motion
• Since the action sketches are view invariant, comparing 2 actions is equivalent to checking if there is a valid Fundamental Matrix relating the positions of the action sketches for the individual actions.