

[IEEE 2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC) - Nanchang, China (2012.08.26-2012.08.27)]

Fast Template Matching using Pruning Strategy with Haar-like Features

Vinh-Tiep Nguyen
John von Neumann Institute, VNU-HCM; University of Science, VNU-HCM
Ho Chi Minh City, Vietnam
[email protected]

Khanh-Duy Le
John von Neumann Institute, VNU-HCM; University of Science, VNU-HCM
Ho Chi Minh City, Vietnam
[email protected]

Minh-Triet Tran
University of Science, VNU-HCM
Ho Chi Minh City, Vietnam
[email protected]

Anh-Duc Duong
University of Information Technology, VNU-HCM
Ho Chi Minh City, Vietnam
[email protected]

Abstract —Template matching is one of the most popular problems in computer vision applications. Many methods have been proposed to enhance accuracy and performance in terms of processing time. With the rapid development of digital cameras/recorders and HD video, images captured by modern devices have much higher resolution than before, so novel real-time template matching algorithms are necessary. This motivates our proposal to improve the speed of matching an arbitrary given template, especially in a large image. Our key idea is a pruning strategy that removes certain unmatchable positions in a large image with only a few simple computational operations. Our method also gives users the flexibility to select among various dissimilarity measures to increase the accuracy of the system in practical applications. Experiments show that our algorithm is about 8 to 10 times faster than the standard FFT-based algorithm implemented in OpenCV.

Keywords - template matching, real-time, Haar-like feature, pruning strategy, high resolution.

I. INTRODUCTION

Template matching, or pattern matching, is a technique for finding a sub-image of a larger image that is similar to a given template. It is an important topic in computer vision and has many applications, such as image registration [18], object tracking [17], and object recognition [16]. The most common approach to template matching is to slide a window of the same size as the given template over all possible positions of a new image and to find patches that look like the template according to a dissimilarity measure. This algorithm, known as Full Search, is simple and easy to implement but has a high computational cost. Figure 1 shows an example of Full Search.

There are different criteria for evaluating a template matching algorithm, such as rotation and scale invariance and robustness to changes in viewpoint, contrast, occlusion, and noise. Moreover, computational time is also important in real-time applications. In this paper, we mainly focus on improving the speed of the algorithm because, in many practical cases, the setup of a system is fixed.

We propose a fast algorithm with two main phases. The first is the pruning phase, which skips as many unmatched candidates as possible with only a few simple operations, taking advantage of the high performance of computing with integral images [5]. The main objective of this phase is to avoid unnecessary computation at positions that, with high certainty, cannot match the template. The second is the matching phase, which calculates the dissimilarity between each candidate surviving the first phase and the given template. We design our method as a flexible framework in which the weak features and the dissimilarity measure can be replaced with new ones to improve accuracy and effectiveness and to adapt to particular problems.

Figure 1. An Example of Full Search Template Matching

Our experiments show that the proposed algorithm achieves accuracy comparable to the method implemented in the latest release of OpenCV while running about 8 to 10 times faster on a 640x480 query image. On higher resolution images, our algorithm also outperforms it in processing time.

2012 4th International Conference on Intelligent Human-Machine Systems and Cybernetics

978-0-7695-4721-3/12 $26.00 © 2012 IEEE

DOI 10.1109/IHMSC.2012.155



The remainder of this paper is organized as follows. In Section II we briefly summarize related work. In Section III, we describe our proposed algorithm. Experimental results are presented and discussed in Section IV, and the conclusion is in Section V.

II. BACKGROUND AND RELATED WORKS

A. Template matching methods

Template matching algorithms can be divided into two main approaches: area-based and feature-based methods. In the area-based approach, there are many distance measures for classifying templates, such as Sum of Squared Differences (SSD), Sum of Absolute Differences (SAD), and Cross-Correlation [4]. Area-based matching methods are simple and easy to implement. Moreover, this approach works efficiently with both simple and complex texture patterns. However, it is usually not robust to changes in scale, rotation, and viewpoint. As mentioned in Section I, the Full Search algorithm takes much time for computation. To reduce the computational cost, several approaches have been proposed, such as coarse-fine template matching [1], two-stage template matching [2], fast normalized cross-correlation using the FFT [4], and fast robust correlation [3]. In recent studies, some fast algorithms based on projection kernels have been proposed [13][14]. These algorithms adopt the SSD measure used in the Full Search method. Although SSD is simple, it does not measure distinctiveness well enough to classify patterns [15].

Feature-based approaches use local features such as edges [7][11], corners [8], and blobs [9][10], together with a similarity measure, to find the best matches between local features in a template image and a query image. Most of these methods are robust to scale, rotation, occlusion, and perspective. However, the feature-based approach only works on objects with contrast high enough to detect edges or texture rich enough to extract keypoints. Extracting and describing keypoints is complicated and requires much computation time, so it is not suitable for real-time applications.

Because of its computational simplicity and its ability to process both high- and low-textured templates, we propose a new fast template matching algorithm following the area-based approach. An efficient pruning strategy is used to significantly improve the performance of the proposed method while maintaining high accuracy.

B. Integral image

The integral image, first introduced by F. C. Crow in 1984 [6], is a quick and efficient structure for calculating the sum of values in a rectangular sub-matrix of a grid. However, it was not widely used in the computer vision community until its prominent use in the Viola-Jones object detection process [5]. Once the summed area table has been computed, we can compute the sum over any rectangle in constant time with just 4 references and 3 operations. This reduces the complexity of computing the sum of values in a rectangular region from $O(n^2)$ to $O(1)$. For example, in Figure 2 we have:

$$\sum_{\substack{A(x) < x' \le C(x) \\ A(y) < y' \le C(y)}} i(x', y') = I(A) + I(C) - I(B) - I(D)$$

where $I(P)$ is the sum over the rectangular area with $P$ as its bottom right corner.

Figure 2. Computing gray values in a rectangle area using Integral Image
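The constant-time rectangle sum described above can be sketched as follows (a minimal illustration; the function names integral_image and rect_sum are ours, not from the paper):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] with 4 references and 3 operations."""
    return ii[bottom, right] + ii[top, left] - ii[top, right] - ii[bottom, left]
```

Once the table is built in one pass, every rectangle query costs O(1), which is what makes the pruning phase cheap.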

C. Haar-like features

Haar-like features were proposed for object recognition by Viola and Jones [5]. Each Haar-like feature is divided into dark and bright rectangles, and its value is the difference between the sum of all pixels in the dark rectangles and the sum of pixels in the bright rectangles. Figure 3 shows examples of Haar-like features. Combined with integral images, Haar-like features can be computed with only a few operations. Haar-like features are weak features, since many different images may have the same Haar-like feature values. However, by combining many Haar-like features we can produce a strong feature. In this paper, the pruning phase takes advantage of this idea to reduce the cost of computation.

Figure 3. Examples of Haar-like features [5]
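As an illustration, a two-rectangle feature (dark left half, bright right half) can be evaluated with a handful of table lookups. This sketch assumes an exclusive summed-area table ii with ii[y, x] = img[:y, :x].sum(); the function name is our own:

```python
import numpy as np

def haar_two_rect(ii, top, left, h, w):
    """Two-rectangle Haar-like feature value: sum of the dark (left) half
    minus sum of the bright (right) half of an h-by-w window."""
    mid = left + w // 2
    dark = ii[top + h, mid] + ii[top, left] - ii[top, mid] - ii[top + h, left]
    bright = ii[top + h, left + w] + ii[top, mid] - ii[top, left + w] - ii[top + h, mid]
    return dark - bright
```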

III. PROPOSED ALGORITHM

Our proposed algorithm uses Haar-like features of a pattern image as weak features to prune certain unmatched regions and pass "potential" locations on to the next phase. After the pruning phase, we match the template image with the sub-image at each potential position using the normalized cross correlation measure [4]. Because a single Haar-like feature is a weak feature, we combine 6 Haar-like features (shown in Figure 3) to quickly filter out unnecessary regions in an input image.



Figure 4. Relation between the number of candidates (red points) passing step 1 and the tightness of the threshold. a) Template image. b) Candidates with a loose threshold. c) Candidates with a normal threshold. d) Candidates with a tight threshold (the green point is the correct matching position).

Let $f_i(u, v)$ be the value of the $i$-th Haar-like feature of the region of interest (ROI) at pixel $(u, v)$ of an input image I, and let $f_i^{(0)}$ be the value of the $i$-th Haar-like feature of the template image T. We have:

$$f_i(u, v) = g_i(u, v) - w_i(u, v), \qquad f_i^{(0)} = g_i^{(0)} - w_i^{(0)}$$

where

• $g_i(u, v)$ is the sum of pixels in the dark rectangular regions of the $i$-th Haar-like feature of the ROI at position $(u, v)$ of I;

• $g_i^{(0)}$ is the sum of pixels in the dark rectangular regions of the $i$-th Haar-like feature of the template image T;

• $w_i(u, v)$ is the sum of pixels in the white rectangular regions of the $i$-th Haar-like feature of the ROI at position $(u, v)$ of I;

• $w_i^{(0)}$ is the sum of pixels in the white rectangular regions of the $i$-th Haar-like feature of the template image T.

To check whether the template image T appears at a pixel (u, v) of an input image I, we process 3 steps: 2 steps of the pruning phase and 1 step of the matching phase, which computes the normalized cross correlation.

• Step 1: For the $i$-th Haar-like feature ($1 \le i \le 6$), check whether the ratio between the sum of values in the dark feature regions of the input image I at position $(u, v)$ and that of the template image T satisfies a threshold $\theta_i^{(1)}$. The same test is performed for the white feature regions of the $i$-th Haar-like feature:

$$H_1: \left| \frac{g_i(u, v)}{g_i^{(0)}} - 1 \right| < \theta_i^{(1)} \ \text{and}\ \left| \frac{w_i(u, v)}{w_i^{(0)}} - 1 \right| < \theta_i^{(1)}, \quad \text{for } 1 \le i \le 6$$

• Step 2: Check whether the ratio between the value of the $i$-th Haar-like feature of the input image I at the current position $(u, v)$ and that of the template image satisfies a specific threshold $\theta_i^{(2)}$:

$$H_2: \left| \frac{f_i(u, v)}{f_i^{(0)}} - 1 \right| < \theta_i^{(2)}$$

• Step 3: Compute the normalized cross correlation between the sub-region of the input image I at a potential candidate point $(u, v)$ and the template image T. We use normalized cross correlation to guarantee that the matching results are not affected by the brightness of the environment:

$$H_3: \gamma(u, v) > \theta^{(3)}$$

where:

$$\gamma(u, v) = \frac{\sum_{x, y} \left[ I(x, y) - \bar{I}_{u, v} \right] \left[ T(x - u, y - v) - \bar{T} \right]}{\sqrt{\sum_{x, y} \left[ I(x, y) - \bar{I}_{u, v} \right]^2 \sum_{x, y} \left[ T(x - u, y - v) - \bar{T} \right]^2}}$$

and $(x, y)$ ranges over the pixels of the ROI of the input image I at $(u, v)$, which has the same size as the template image.
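Step 3 is plain zero-mean normalized cross correlation. A minimal sketch, with a function name of our own choosing (it assumes neither array is constant, so the denominator is nonzero):

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross correlation of two same-size arrays.
    Returns 1.0 for a perfect match up to affine brightness changes."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    return (p * t).sum() / np.sqrt((p * p).sum() * (t * t).sum())
```

Because both signals are centered and normalized, an affine brightness change of the patch (p -> a*p + b) leaves the score unchanged, which is why the paper uses this measure to be robust to lighting.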

Figure 4 demonstrates the efficiency of our algorithm in eliminating points that certainly do not match the template. Each red point in an image is the top left corner of a candidate window that successfully passed step 1. With a higher (i.e. looser) threshold, more candidates pass the hypothesis test of step 1; with a lower (i.e. tighter) threshold, fewer candidates pass. Note that even with a very tight threshold, our proposed method correctly detects and preserves the correct matching position (illustrated by the green point in Figure 4d). Figure 5 shows the full process of our algorithm with a fixed threshold.



Figure 5. Full process of our algorithm. a) Template image. b) Candidates after step 1 (red points). c) Candidates after step 2 (green points) and the best matched position (blue rectangle).

Figure 6 describes the flow of using Haar-like features and the normalized correlation coefficient for template matching. Steps 1 and 2 quickly filter out positions that do not satisfy the criteria for potential points. The final step matches the template with the input image at the remaining potential points.

Figure 6. Flow of the algorithm
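As a rough sketch of this flow, the following combines a single illustrative two-rectangle Haar-like pruning test (step 1) with NCC scoring (step 3). The real method uses six features, a second pruning step, and tuned thresholds; the function names, the threshold value, and the assumption of nonzero template region sums here are our own:

```python
import numpy as np

def _integral(img):
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def _rsum(ii, y0, x0, y1, x1):
    return ii[y1, x1] + ii[y0, x0] - ii[y0, x1] - ii[y1, x0]

def match_template(img, tpl, theta1=0.15):
    """Prune with one Haar-like ratio test, then score survivors with NCC.
    Returns ((u, v), gamma) for the best matching top-left corner."""
    H, W = img.shape
    h, w = tpl.shape
    ii = _integral(img)
    ti = _integral(tpl)
    g0 = _rsum(ti, 0, 0, h, w // 2)   # template dark (left) half; assumed nonzero
    w0 = _rsum(ti, 0, w // 2, h, w)   # template white (right) half; assumed nonzero
    t = tpl.astype(np.float64) - tpl.mean()
    t_norm = np.sqrt((t * t).sum())
    best, best_pos = -1.0, None
    for v in range(H - h + 1):
        for u in range(W - w + 1):
            g = _rsum(ii, v, u, v + h, u + w // 2)
            wv = _rsum(ii, v, u + w // 2, v + h, u + w)
            # Step 1: prune unless both region-sum ratios are close to 1.
            if abs(g / g0 - 1.0) >= theta1 or abs(wv / w0 - 1.0) >= theta1:
                continue
            # Step 3: normalized cross correlation on the surviving candidate.
            p = img[v:v + h, u:u + w].astype(np.float64)
            p -= p.mean()
            gamma = (p * t).sum() / (np.sqrt((p * p).sum()) * t_norm)
            if gamma > best:
                best, best_pos = gamma, (u, v)
    return best_pos, best
```

The inner loop only reaches the expensive NCC computation for the few positions that survive the cheap integral-image ratio test, which is the essence of the pruning strategy.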

IV. EXPERIMENT

A. Dataset Description

We compared our proposed method with the algorithm using the Fast Fourier Transform and integral images implemented in the OpenCV library. There are 6 methods in the function cvMatchTemplate, but we only used the one based on normalized cross correlation (CV_TM_CCORR_NORMED) because it is the most similar to our method.

In this experiment, we use the dataset from MIT and NASA [19], which includes 30 different input images of each size and 10 different templates for each input image. This dataset also provides a tool to generate distortions with noise, JPEG compression, and blur filtering. The system used in the experiment runs Windows 7 on an Intel Core i5 at 2.27 GHz with 4 GB RAM. The results on this dataset are presented in parts B and C of this section. As mentioned before, our algorithm is especially strong on high resolution data, so we also created our own dataset for this case; the results are presented in part D.

B. Performance

Figure 7. Time of matching with a 640x480 input image and different template sizes

The system performance is measured by matching same-size input images against templates of various sizes. Figure 7 shows the running time in milliseconds on the dataset of 640x480-pixel images with 16x16, 32x32, and 64x64-pixel templates. The computation time of our algorithm is much lower than that of the standard algorithm implemented in the OpenCV library. Figure 7 also indicates that our algorithm is very stable with respect to changes in template size.



We also count the number of positions skipped by pruning to estimate the effectiveness of our weak features. With a 640x480 input image and a 64x64 template, the total number of positions to scan is (640-64)x(480-64) = 239,616. On average, approximately 52 positions remain, which means that our pruning phase eliminates up to 99.98% of the positions while preserving the correct ones.

C. Accuracy

Figure 8. Accuracy when matching with distorted input images

In experiments with undistorted images, both algorithms achieve perfect accuracy. For a more precise evaluation, we distort the input images with different degradations: blur filtering, Gaussian noise, and JPEG compression.

Figure 8 illustrates that our algorithm achieves accuracy comparable to the OpenCV algorithm. In the case of JPEG compression in particular, both algorithms reach 100% accuracy. Figure 11 shows an example of a good matching result on both the original and the distorted images.

D. Performance on high resolution images

We create 10 datasets corresponding to different resolutions: 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 megapixels. Each dataset contains 50 images. To evaluate the relationship between image resolution and matching time, we use templates of the same size in each experiment. For each dataset, we generate 100 templates of the same size by randomly cropping sub-images from that dataset. Figures 9 and 10 illustrate the average running time (in milliseconds) of the two experiments. The template size is 128x128 in experiment 1 and 256x256 in experiment 2.

Figure 9. Time of matching with 128x128 sized templates

Figure 10. Time of matching with 256x256 sized templates

From the experimental results, we can see that the running time increases linearly with the number of pixels. Moreover, the running time of our proposed method is 8-10 times lower than that of the standard template matching algorithm implemented in the latest version of OpenCV. Therefore, our method outperforms the OpenCV template matching algorithm in running time while achieving similar accuracy.



Figure 11. Matching results with 3 types of noise. Top left: undistorted image. Top right: Gaussian noise. Bottom left: blurred image. Bottom right: JPEG compression.

V. CONCLUSION

In this paper, a two-phase template matching algorithm is proposed that takes advantage of the high performance of integral images and the simple idea of Haar-like features. The pattern is divided into sub-regions and, using the sums of gray values in these regions, we skip certain unmatched candidates very quickly. In the final phase, we use a robust measure, normalized cross correlation, to classify the sub-images. Thanks to the flexibility of our method, the weak features and the dissimilarity measure can easily be changed for more robust matching. Experimental results demonstrate that our algorithm is suitable for efficient template matching in real-time systems, especially systems that process high resolution images.

ACKNOWLEDGMENT

This research is supported by the John von Neumann Institute (project B2012-42-01) and the University of Science, Vietnam National University - Ho Chi Minh City.

REFERENCES

[1] A. Rosenfeld and G. J. Vanderburg, "Coarse-Fine Template Matching," IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, pp. 104-107, 1977.

[2] G. J. VanderBrug and A. Rosenfeld, "Two-Stage Template Matching," IEEE Trans. Computers, vol. 26, 1977.

[3] A. Fitch, A. Kadyrov, W. Christmas, and J. Kittler, “Fast Robust Correlation,” in IEEE Trans Image Process, vol. 14, pp. 1063–1073, 2005.

[4] J. P. Lewis, "Fast Template Matching," Vision Interface 95, Canadian Image Processing and Pattern Recognition Society, Quebec City, Canada, pp. 120-123, 1995.

[5] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.

[6] F. Crow, “Summed-area tables for texture mapping,” in Proceedings of SIGGRAPH, vol. 18, pp. 207–212, 1984.

[7] A. Hofhauser, C. Steger and N. Navab, “Edge-Based Template Matching and Tracking for Perspectively Distorted Planar Objects,” In Proceedings of the 4th International Symposium on Advances in Visual Computing (ISVC '08), Springer-Verlag, Berlin, Heidelberg, pp. 104–107, 2008.

[8] J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, no. 6, pp. 679-698, 1986.

[9] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal Computer Vision, vol. 60, pp. 91–110, 2004.

[10] H. Bay, A. Ess, T. Tuytelaars and L. V. Gool, “Speeded-Up Robust Features (SURF),” Computer Vision Image Understanding, vol. 110, pp. 346–359, 2008.

[11] J. Shi and C. Tomasi, "Good Features to Track," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1994.

[12] M. Calonder, V. Lepetit, C. Strecha and P. Fua, “BRIEF: Binary Robust Independent Elementary Features,” 11th European Conference on Computer Vision (ECCV), LNCS Springer, 2010.

[13] W. Ouyang and W.K. Cham, “Fast algorithm for Walsh Hadamard transform on sliding windows,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 165–171, Jan. 2010.

[14] W. Ouyang, R. Zhang and W.K. Cham, “Fast pattern matching using orthogonal Haar transform”, In Proc. IEEE CVPR 2010.

[15] B. Girod, “What's Wrong With Mean Squared Error?” in A. B. Watson (ed.), Visual Factors of Electronic Image Communications, MIT Press, pp. 207-220, 1993.

[16] R. M. Dufour, E. L. Miller, and N. P. Galatsanos, “Template matching based object recognition with unknown geometric parameters,” IEEE Trans. Image Process., vol. 11, no. 12, pp. 1385–1396, Dec. 2002

[17] F. Jurie, M. Dhome, “Real Time Robust Template Matching,” in British Machine Vision Conference, pp. 123-132, 2002

[18] L. Ding, A. Goshtasby, and M. Satter, “Volume image registration by template matching,” in Image Visual Computing, vol. 19, no. 12, pp. 821–832, 2001.

[19] http://www.ee.cuhk.edu.hk/~wlouyang/PMEval
