TRANSCRIPT
Monitoring Creatures Great and Small: Computer Vision Systems for Looking at
Grizzly Bears, Fish, and Grasshoppers
Greg Mori, Maryam Moslemi, Andy Rova, Payam Sabzmeydani, Jens Wawerla
Simon Fraser University
VAIB workshop - December 7, 2008
Captivating Cinema
video: Prof. Larry Dill, SFU Biological Sciences
Computer Vision for Data Collection
• “Looking at Animals” problems
• Sifting through video to find animals
• Determining what the animals are up to
• Classifying species of animals
• Symbiotic relationship
• Natural scientists receive data
• Computer scientists receive
• real-world datasets
• ground truth for quantifiable success/failure
Outline
• Detection of animals in video
• Grizzly bears
• Analyzing animal behaviours
• Grasshoppers
• Recognizing animal species
• Fish
Grizzly Bear Monitoring
• New eco-tourism site on salmon spawning river
• Grizzly bears feed on salmon
• Will human presence negatively impact bears?
• “Bearcam” deployed to watch bears on-site in northern Yukon
Ni’iinlii Njik Park
Bearcam
• Bearcam system recorded approx. 4h video per day for 15 days
Bear Detection
• Bears have distinct shape and pattern of motion
• extract image gradients and background difference
• build classifier to detect bears
background difference
spatial gradients
Classifier
• Build bear detector using variant of AdaBoost (Viola-Jones)
• A set of weak learners is built from thresholded background subtraction and gradient features
h_t(x) = 1 if p_t f_t(x) < p_t θ_t, else 0
[Figure: example weak-learner features: positive and negative gradient responses, positive and negative background-subtraction responses]
Results
• Crop windows from video frames
• Training set
• 451 windows containing bears
• 45100 without bears
• Test set
• 400 bear windows
• 40000 without
[Figure: ROC curves, miss rate vs. false positives per window; average ROC over all runs for motion, shape, and combined motion-and-shape features]
Results on Frames
• Run classifier on entire frame, take highest response
• Same training set
• bootstrap negative set
• Test set
• 405 frames with at least 1 bear
• 16000 with none
• detect 76% at 0.001 FPPI
• detect 88% at 0.01 FPPI
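The bootstrapped negative set above is hard-negative mining: retrain on the false positives the current detector fires on bear-free frames. A minimal sketch; the `classifier_fit`/`score` callables and all parameter names are hypothetical stand-ins:

```python
import numpy as np

def bootstrap_negatives(classifier_fit, score, pos, neg, frames, rounds=2, thresh=0.0):
    """Hard-negative mining sketch.

    classifier_fit(X, y) -> model; score(model, X) -> confidence per window.
    pos, neg: initial positive / negative feature arrays, shape (n, d).
    frames:   list of (n_i, d) arrays, candidate windows from bear-free frames.
    """
    model = classifier_fit(np.vstack([pos, neg]),
                           np.r_[np.ones(len(pos)), np.zeros(len(neg))])
    for _ in range(rounds):
        # windows the detector wrongly fires on become new negatives
        hard = [w[score(model, w) > thresh] for w in frames]
        neg = np.vstack([neg] + hard)
        model = classifier_fit(np.vstack([pos, neg]),
                               np.r_[np.ones(len(pos)), np.zeros(len(neg))])
    return model, neg
```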
[Figure: ROC curves, miss rate vs. false positives per image; average ROC over all runs for bootstrap rounds 0, 1, and 2]
Outline
• Detection of animals in video
• Grizzly bears
• Analyzing animal behaviours
• Grasshoppers
• Recognizing animal species
• Fish
Understanding Insect Actions
• How are grasshoppers’ actions affected by spiders?
• Predator-prey relationship
• Environment variables
• Temperature
• Light
• Presence of food
• Collect data on grasshopper movement rates and actions
• Lab environment, glass case
• Calibrated stereo cameras
Tracking
• Background subtraction tracker in each camera
Top Camera Bottom Camera
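A per-camera background-subtraction tracker like the one above can be sketched with a running-average background and a centroid readout. A minimal sketch; the `alpha` and `thresh` values are assumptions, not the system's actual settings:

```python
import numpy as np

def track_centroids(frames, alpha=0.05, thresh=30.0):
    """Background-subtraction tracker sketch for one camera.

    frames: iterable of (H, W) grayscale arrays. The tracked position is the
    centroid of pixels whose absolute difference from the slowly adapting
    background exceeds `thresh`; None when nothing moves.
    """
    bg = None
    centroids = []
    for f in frames:
        if bg is None:
            bg = f.astype(float).copy()          # initialize background
        mask = np.abs(f - bg) > thresh
        if mask.any():
            rows, cols = np.nonzero(mask)
            centroids.append((rows.mean(), cols.mean()))
        else:
            centroids.append(None)
        bg = (1 - alpha) * bg + alpha * f        # slowly adapt background
    return centroids
```

With calibrated stereo cameras, the two per-camera centroids can then be triangulated into the 3D track used below.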
Clustering with Action Features
• Smooth the 3D track
• For each non-overlapping window of size w of the track, compute the difference between x(t) and x(t+∆t)
• Use spectral clustering on these features
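The windowed displacement features above can be sketched as follows. This is a minimal numpy sketch; the window stride and feature layout are assumptions about the slide's description:

```python
import numpy as np

def action_features(track, w, dt):
    """Displacement features per non-overlapping window of the smoothed track.

    track: (T, 3) array of smoothed 3D positions. For each window of length w,
    the feature vector stacks the differences x(t + dt) - x(t).
    """
    feats = []
    for start in range(0, len(track) - w - dt + 1, w):
        win = track[start:start + w + dt]
        feats.append((win[dt:] - win[:-dt]).ravel())   # x(t+dt) - x(t) per step
    return np.array(feats)

# The slide's spectral clustering step could then be applied to these features,
# e.g. with sklearn.cluster.SpectralClustering(n_clusters=3).fit_predict(feats).
```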
Clustering Results
• Cluster purity measured
• 3530 hand-labelled frames
[Figure 5: Effect of the number of added jump samples on the performance of detecting jump actions (correctness vs. number of added jump samples)]
[Figure 6: Impact of sampling from jumps on performance. Curves show the correctness of frames labelled as jumping, with and without samples from this class, as a function of the number of clusters]
This number is the fraction of actions that are correctly classified.
The average correctness for each action is shown in Figure 7 as a function of the number of clusters. The plot shows this fraction for each action and for all of them together. As the figure shows, our overall performance is above 80 percent, and the curve is nearly flat for K > 5.
Figure 6 shows the importance of having samples from rare activities. In our experiments jump frames are rare, and their features are very different from those of walking and standing still. We checked whether the jumps were sampled or not and plotted the correctness of recognized jump frames in both cases. As Figure 6 shows, performance changes substantially if we do not sample jump frames. In that case the computed eigenvectors, which form the embedding coordinates, do not lead to a good clustering: the eigenvectors of the whole affinity matrix are estimated from the samples, and if there are no samples of the unusual actions the Nyström extension cannot accurately reconstruct the eigenvectors.
We also ran experiments with different values of r to analyze the effect of this parameter on the performance of the algorithm. As Figure 5 shows, having more samples can yield slightly better performance, but the method is relatively stable across values. More importantly, if we have no samples from the rare actions we cannot recognize them correctly.
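The role of the Nyström extension described above can be sketched in a few lines: eigenvectors computed on the sampled affinity block are extended to the unsampled points, which is why omitting samples of a rare action such as jumps corrupts its embedding. A minimal numpy sketch; the function name and interface are assumptions:

```python
import numpy as np

def nystrom_eigvecs(K_ss, K_rs, k):
    """Nystrom extension sketch: approximate the top-k eigenvectors of the
    full affinity matrix from a sampled subset.

    K_ss: (s, s) affinities among the s sampled points.
    K_rs: (r, s) affinities between the r remaining points and the samples.
    Returns an (s + r, k) array of approximate eigenvectors (samples first).
    """
    vals, vecs = np.linalg.eigh(K_ss)        # exact eigenpairs of the sample block
    idx = np.argsort(vals)[::-1][:k]         # keep the k largest eigenvalues
    vals, vecs = vals[idx], vecs[:, idx]
    ext = K_rs @ vecs / vals                 # extend eigenvectors to unsampled points
    return np.vstack([vecs, ext])
```

If no jump frame appears among the sampled points, no row of `K_ss` carries jump-like affinities, and the extended eigenvectors cannot separate the jump cluster.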
[Figure 7: Impact of the number of clusters on performance (correctness vs. number of clusters for standing still, walking, jumping, and overall)]
5 Conclusion
In this paper we have developed features from the 3D track of an object and applied spectral clustering to recognize the object's actions. The method uses sampled points from the data to cluster all of it, which improves performance. We show how to use the Nyström extension in a problem that involves small clusters, and that naive random sampling has a substantial effect on performance for these clusters.
Clustering Visualization
• Take all frames in “jump” cluster
• Show all such clips in one shorter video
• Minimize spatial/temporal overlap of clips
• Rav-Acha, Pritch, Peleg CVPR06
Outline
• Detection of animals in video
• Grizzly bears
• Analyzing animal behaviours
• Grasshoppers
• Recognizing animal species
• Fish
Counting Fish
• Biologists have many hours of underwater video footage
• Require count of fish by species
• Use as proxy for tiger shark count
• Currently, people must watch and manually identify/count
• Automatic system could save many hours of labour
Challenges
• Video has limited resolution and is interlaced
• Underwater lighting has shifts in intensity and color
• Plants and sediment can cause false positives when detecting movement
• Fish appear with arbitrary locations and poses
Method overview
1. Preprocess video frames to crop candidate subimages
2. Find correspondences between unknown images and known fish template images
3. Warp unknown images into alignment with the templates
4. Use support vector machines (SVMs) to classify the unknown images by fish species
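The four steps above can be sketched as a pipeline. Every helper here (`find_correspondences`, `warp`, the per-template SVM scorers) is a hypothetical stand-in for the components described in the text, not the authors' code:

```python
def classify_fish(frame_windows, templates, find_correspondences, warp, svms):
    """Pipeline sketch of the four steps.

    frame_windows: candidate subimages cropped from preprocessed frames (step 1)
    templates:     one known fish template image per species
    svms:          one scoring function per species, warped image -> score
    Returns the index of the winning species template for each window.
    """
    decisions = []
    for query in frame_windows:
        scores = []
        for template, svm in zip(templates, svms):
            pairs = find_correspondences(query, template)   # step 2
            aligned = warp(query, template, pairs)          # step 3
            scores.append(svm(aligned))                     # step 4
        decisions.append(max(range(len(svms)), key=lambda i: scores[i]))
    return decisions
```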
[Diagram: the query image is matched against each species template (find correspondences and warp); filter responses from each warped query are fed to per-template SVMs, which produce the classification decision]
Warping examples
SVM kernel   unwarped   warped
linear       84%        90%
polynomial   81%        86%
Table 1: Results of SVM classification
Also reported are the rates of correct classification for SVMs trained on texture features generated from the original, untransformed images.
For both the linear and polynomial SVM kernels, warping the images into alignment with a template prior to classification improved the results. Curiously, the RBF SVM kernel, which works well for many classification problems, produced negligible results in our experiments. The best performer was the linear kernel operating on warped image data, followed by the polynomial kernel, also trained on warped images.
(a) test image (b) template (c) warped test image
(d) test image (e) template (f) warped test image
(g) test image (h) template (i) warped test image
Figure 5: Warping examples: in each row, the rightmost column shows the result of warping the leftmost column into approximate alignment with the center column images, using an affine transformation estimated from the calculated point correspondences. The first two rows show reasonable transformations (5(f) and 5(c)), while the correspondences were not recovered well in the third row example and consequently 5(i) is distorted.
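The affine transformation estimated from point correspondences, as in the caption above, can be sketched with ordinary least squares. A minimal numpy sketch under that assumption, not the paper's exact estimator:

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine transform from matched points.

    src, dst: (n, 2) arrays of corresponding points, n >= 3.
    Returns a 2x3 matrix M such that dst ~ M @ [x, y, 1].
    """
    A = np.hstack([src, np.ones((len(src), 1))])     # homogeneous source points, (n, 3)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)      # solve A @ M = dst, M is (3, 2)
    return M.T                                       # (2, 3) affine matrix
```

The estimated matrix is then used to resample the test image into the template's frame; badly recovered correspondences (as in the third row) yield a distorted warp.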
5 Conclusion
In this paper we have demonstrated a novel application of two existing techniques – shape contexts and efficient dynamic programming-based correspondence using the distance transform – to solve a difficult deformable template matching problem. In feature-poor
Experimental results
SVM kernel   no warping   warped
linear       84%          90%
polynomial   81%          86%
Automatic classification of 320 hand-cropped video frames of two fish species
some misclassifications
Acknowledgements
• graduate students
• natural scientist collaborators
• Prof. Dill, Prof. Rothley, S. Marshall, G. Dutton
• funding from Canada Foundation for Innovation / BC Knowledge Development Fund
• Scientific Data Acquisition, Transmission, and Storage (SDATS) project
Thank you