TRANSCRIPT
Monitoring Creatures Great and Small: Computer Vision Systems for Looking at
Grizzly Bears, Fish, and Grasshoppers
Greg Mori, Maryam Moslemi, Andy Rova, Payam Sabzmeydani, Jens Wawerla
Simon Fraser University
VAIB workshop - December 7, 2008
Captivating Cinema
video: Prof. Larry Dill, SFU Biological Sciences
Computer Vision for Data Collection
• “Looking at Animals” problems
• Sifting through video to find animals
• Determining what the animals are up to
• Classifying species of animals
• Symbiotic relationship
• Natural scientists receive data
• Computer scientists receive
• real-world datasets
• ground truth for quantifiable success/failure
Outline
• Detection of animals in video
• Grizzly bears
• Analyzing animal behaviours
• Grasshoppers
• Recognizing animal species
• Fish
Grizzly Bear Monitoring
• New eco-tourism site on salmon spawning river
• Grizzly bears feed on salmon
• Will human presence negatively impact bears?
• “Bearcam” deployed to watch bears on-site in northern Yukon
Ni’iinlii Njik Park
Bearcam
• Bearcam system recorded approx. 4h video per day for 15 days
Bear Detection
• Bears have distinct shape and pattern of motion
• extract image gradients and background difference
• build classifier to detect bears
background difference
spatial gradients
Classifier
• Build bear detector using variant of AdaBoost (Viola-Jones)
• A set of weak learners is built from thresholded background subtraction and gradient features
h_t(x) = 1 if p_t f_t(x) < p_t θ_t, else 0
[Figure: example weak-learner features: positive and negative gradient responses, positive and negative background-subtraction responses]
Results
• Crop windows from video frames
• Training set
• 451 windows containing bears
• 45100 without bears
• Test set
• 400 bear windows
• 40000 without
[Figure: ROC curves, miss rate vs. false positives per window; average ROC over all runs for motion, shape, and combined motion-and-shape features]
Results on Frames
• Run classifier on entire frame, take highest response
• Same training set
• bootstrap negative set
• Test set
• 405 frames with at least 1 bear
• 16000 with none
• detect 76% at 0.001 FPPI
• detect 88% at 0.01 FPPI
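The bootstrapped negative set above is hard-negative mining: retrain on the false positives the current detector fires on bear-free frames. A minimal sketch; the `classifier_fit`/`score` callables and all parameter names are hypothetical stand-ins:

```python
import numpy as np

def bootstrap_negatives(classifier_fit, score, pos, neg, frames, rounds=2, thresh=0.0):
    """Hard-negative mining sketch.

    classifier_fit(X, y) -> model; score(model, X) -> confidence per window.
    pos, neg: initial positive / negative feature arrays, shape (n, d).
    frames:   list of (n_i, d) arrays, candidate windows from bear-free frames.
    """
    model = classifier_fit(np.vstack([pos, neg]),
                           np.r_[np.ones(len(pos)), np.zeros(len(neg))])
    for _ in range(rounds):
        # windows the detector wrongly fires on become new negatives
        hard = [w[score(model, w) > thresh] for w in frames]
        neg = np.vstack([neg] + hard)
        model = classifier_fit(np.vstack([pos, neg]),
                               np.r_[np.ones(len(pos)), np.zeros(len(neg))])
    return model, neg
```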
[Figure: ROC curves, miss rate vs. false positives per image; average ROC over all runs for bootstrap rounds 0, 1, and 2]
Outline
• Detection of animals in video
• Grizzly bears
• Analyzing animal behaviours
• Grasshoppers
• Recognizing animal species
• Fish
Understanding Insect Actions
• How are grasshoppers’ actions affected by spiders?
• Predator-prey relationship
• Environment variables
• Temperature
• Light
• Presence of food
• Collect data on grasshopper movement rates and actions
• Lab environment, glass case
• Calibrated stereo cameras
Tracking
• Background subtraction tracker in each camera
Top Camera Bottom Camera
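A per-camera background-subtraction tracker like the one above can be sketched with a running-average background and a centroid readout. A minimal sketch; the `alpha` and `thresh` values are assumptions, not the system's actual settings:

```python
import numpy as np

def track_centroids(frames, alpha=0.05, thresh=30.0):
    """Background-subtraction tracker sketch for one camera.

    frames: iterable of (H, W) grayscale arrays. The tracked position is the
    centroid of pixels whose absolute difference from the slowly adapting
    background exceeds `thresh`; None when nothing moves.
    """
    bg = None
    centroids = []
    for f in frames:
        if bg is None:
            bg = f.astype(float).copy()          # initialize background
        mask = np.abs(f - bg) > thresh
        if mask.any():
            rows, cols = np.nonzero(mask)
            centroids.append((rows.mean(), cols.mean()))
        else:
            centroids.append(None)
        bg = (1 - alpha) * bg + alpha * f        # slowly adapt background
    return centroids
```

With calibrated stereo cameras, the two per-camera centroids can then be triangulated into the 3D track used below.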
Clustering with Action Features
• Smooth the 3D track
• For each non-overlapping window of size w of the track, compute the difference between x(t) and x(t+∆t)
• Use spectral clustering on these features
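The windowed displacement features above can be sketched as follows. This is a minimal numpy sketch; the window stride and feature layout are assumptions about the slide's description:

```python
import numpy as np

def action_features(track, w, dt):
    """Displacement features per non-overlapping window of the smoothed track.

    track: (T, 3) array of smoothed 3D positions. For each window of length w,
    the feature vector stacks the differences x(t + dt) - x(t).
    """
    feats = []
    for start in range(0, len(track) - w - dt + 1, w):
        win = track[start:start + w + dt]
        feats.append((win[dt:] - win[:-dt]).ravel())   # x(t+dt) - x(t) per step
    return np.array(feats)

# The slide's spectral clustering step could then be applied to these features,
# e.g. with sklearn.cluster.SpectralClustering(n_clusters=3).fit_predict(feats).
```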
Clustering Results
• Cluster purity measured
• 3530 hand-labelled frames
[Figure 5: Effect of the number of added jump samples on the performance of detecting jump actions (correctness vs. number of added jump samples)]
[Figure 6: Impact of sampling from jumps on performance. Curves show the correctness of frames labelled as jumping, with and without samples from this class, as a function of the number of clusters]
This number is the fraction of actions that are correctly classified.
The average correctness for each action is shown in Figure 7 as a function of the number of clusters. The plot shows this fraction for each action and for all of them together. As the figure shows, our overall performance is above 80 percent, and the curve is nearly flat for K > 5.
Figure 6 shows the importance of having samples from rare activities. In our experiments jump frames are rare, and their features are very different from those of walking and standing still. We checked whether the jumps were sampled or not and plotted the correctness of recognized jump frames in both cases. As Figure 6 shows, performance changes substantially if we do not sample jump frames. In that case the computed eigenvectors, which form the embedding coordinates, do not lead to a good clustering: the eigenvectors of the whole affinity matrix are estimated from the samples, and if there are no samples of the unusual actions the Nyström extension cannot accurately reconstruct the eigenvectors.
We also ran experiments with different values of r to analyze the effect of this parameter on the performance of the algorithm. As Figure 5 shows, having more samples can yield slightly better performance, but the method is relatively stable across values. More importantly, if we have no samples from the rare actions we cannot recognize them correctly.
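The role of the Nyström extension described above can be sketched in a few lines: eigenvectors computed on the sampled affinity block are extended to the unsampled points, which is why omitting samples of a rare action such as jumps corrupts its embedding. A minimal numpy sketch; the function name and interface are assumptions:

```python
import numpy as np

def nystrom_eigvecs(K_ss, K_rs, k):
    """Nystrom extension sketch: approximate the top-k eigenvectors of the
    full affinity matrix from a sampled subset.

    K_ss: (s, s) affinities among the s sampled points.
    K_rs: (r, s) affinities between the r remaining points and the samples.
    Returns an (s + r, k) array of approximate eigenvectors (samples first).
    """
    vals, vecs = np.linalg.eigh(K_ss)        # exact eigenpairs of the sample block
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest eigenvalues
    vals, vecs = vals[idx], vecs[:, idx]
    ext = K_rs @ vecs / vals                 # extend eigenvectors to unsampled points
    return np.vstack([vecs, ext])
```

If no jump frame appears among the sampled points, no row of `K_ss` carries jump-like affinities, and the extended eigenvectors cannot separate the jump cluster.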
[Figure 7: Impact of the number of clusters on performance (correctness vs. number of clusters for standing still, walking, jumping, and overall)]
5 Conclusion
In this paper we have developed features from the 3D track of an object and applied spectral clustering to recognize the object's actions. The method uses sampled points from the data to cluster all of it, which improves performance. We show how to use the Nyström extension in a problem that involves small clusters, and that naive random sampling has a substantial effect on performance for these clusters.
Clustering Visualization
• Take all frames in “jump” cluster
• Show all such clips in one shorter video
• Minimize spatial/temporal overlap of clips
• Rav-Acha, Pritch, Peleg CVPR06
Outline
• Detection of animals in video
• Grizzly bears
• Analyzing animal behaviours
• Grasshoppers
• Recognizing animal species
• Fish
Counting Fish
• Biologists have many hours of underwater video footage
• Require count of fish by species
• Use as proxy for tiger shark count
• Currently, people must watch and manually identify/count
• Automatic system could save many hours of labour
Challenges
• Video has limited resolution and is interlaced
• Underwater lighting has shifts in intensity and color
• Plants and sediment can cause false positives when detecting movement
• Fish appear with arbitrary locations and poses
Method overview
1. Preprocess video frames to crop candidate subimages
2. Find correspondences between unknown images and known fish template images
3. Warp unknown images into alignment with the templates
4. Use support vector machines (SVMs) to classify the unknown images by fish species
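The four steps above can be sketched as a pipeline. Every helper here (`find_correspondences`, `warp`, the per-template SVM scorers) is a hypothetical stand-in for the components described in the text, not the authors' code:

```python
def classify_fish(frame_windows, templates, find_correspondences, warp, svms):
    """Pipeline sketch of the four steps.

    frame_windows: candidate subimages cropped from preprocessed frames (step 1)
    templates:     one known fish template image per species
    svms:          one scoring function per species, warped image -> score
    Returns the index of the winning species template for each window.
    """
    decisions = []
    for query in frame_windows:
        scores = []
        for template, svm in zip(templates, svms):
            pairs = find_correspondences(query, template)   # step 2
            aligned = warp(query, template, pairs)          # step 3
            scores.append(svm(aligned))                     # step 4
        decisions.append(max(range(len(svms)), key=lambda i: scores[i]))
    return decisions
```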
[Diagram: the query image is matched against each species template (find correspondences and warp); filter responses from each warped query are fed to per-template SVMs, which produce the classification decision]
Warping examples
SVM kernel   unwarped   warped
linear       84%        90%
polynomial   81%        86%
Table 1: Results of SVM classification
Also reported are the rates of correct classification for SVMs trained on texture features generated from the original, untransformed images.
For both the linear and polynomial SVM kernels, warping the images into alignment with a template prior to classification improved the results. Curiously, the RBF SVM kernel, which works well for many classification problems, produced negligible results in our experiments. The best performer was the linear kernel operating on warped image data, followed by the polynomial kernel, also trained on warped images.
(a) test image (b) template (c) warped test image
(d) test image (e) template (f) warped test image
(g) test image (h) template (i) warped test image
Figure 5: Warping examples: in each row, the rightmost column shows the result of warping the leftmost column into approximate alignment with the center column images, using an affine transformation estimated from the calculated point correspondences. The first two rows show reasonable transformations (5(f) and 5(c)), while the correspondences were not recovered well in the third row example and consequently 5(i) is distorted.
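The affine transformation estimated from point correspondences, as in the caption above, can be sketched with ordinary least squares. A minimal numpy sketch under that assumption, not the paper's exact estimator:

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine transform from matched points.

    src, dst: (n, 2) arrays of corresponding points, n >= 3.
    Returns a 2x3 matrix M such that dst ~ M @ [x, y, 1].
    """
    A = np.hstack([src, np.ones((len(src), 1))])     # homogeneous source points, (n, 3)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)      # solve A @ M = dst, M is (3, 2)
    return M.T                                       # (2, 3) affine matrix
```

The estimated matrix is then used to resample the test image into the template's frame; badly recovered correspondences (as in the third row) yield a distorted warp.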
5 Conclusion
In this paper we have demonstrated a novel application of two existing techniques – shape contexts and efficient dynamic programming-based correspondence using the distance transform – to solve a difficult deformable template matching problem. In feature-poor
Experimental results
SVM kernel   no warping   warped
linear       84%          90%
polynomial   81%          86%
Automatic classification of 320 hand-cropped video frames of two fish species
some misclassifications
Acknowledgements
• graduate students
• natural scientist collaborators
• Prof. Dill, Prof. Rothley, S. Marshall, G. Dutton
• funding from Canada Foundation for Innovation / BC Knowledge Development Fund
• Scientific Data Acquisition, Transmission, and Storage (SDATS) project
Thank you