
Video Stabilization based on Point Feature Matching Technique

Labeeb Mohsin Abdullah, Nooritawati Md Tahir* & Mustaffa Samad
Faculty of Electrical Engineering, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor
*Corresponding author: norita_tahir@yahoo.com

Abstract
This study proposes an algorithm to stabilize jittery videos directly, without the need to estimate camera motion. A stable output video is attained, free of the jitter caused by shaking the handheld camera during recording. First, salient points are identified in each frame of the input video and processed; the video is then optimized and stabilized. Optimization targets both the quality of the stabilization and a smaller unaligned area after stabilization. The method showed good results in terms of stabilization and the removal of distortion from output videos recorded under different circumstances. Initial results show that the proposed technique is suitable for use and provides a great deal of stabilization.

    Keywords: image processing, video stabilization, point feature matching, salient points, image quality measurement

I. INTRODUCTION
Recently, the handheld camera market has grown rapidly. However, video captured by non-professional users often suffers unanticipated effects, and many researchers have studied these drawbacks to enhance the quality of casual videos. Currently, hardware stabilizers attached to cameras are an effective solution. On one hand, pre-processing techniques such as nonlinear filters are applied to discard unwanted noise; on the other hand, using multiple pre- and post-processing stages can aggravate existing problems through error accumulation. There are also shortcomings in processing videos with complicated motion, such as multiple moving foreground objects [1].

Generally, the stabilization process goes through three phases, namely motion estimation, motion smoothing, and image composition [2]. The first phase estimates the motion between frames. The estimated motion parameters are then passed to motion compensation, which removes the high-frequency distortion and calculates the global transformation needed to stabilize the current frame. Finally, image composition warps the frame under processing [9]. This three-step framework underlies most video stabilization algorithms.

II. MATERIALS AND METHOD
This section presents an overview of the proposed methodology and its implementation, as depicted in Figure 1.

    Figure 1: Overview of the proposed method of video stabilization.

[Figure 1 is a flowchart. Its recoverable step labels, in order: Start; read frames from the recorded video sequence; identify salient points in each frame (Harris Corner Detection); find the initial points of two frames; select correspondences between points (sum of squared differences, SSD); estimate the transform from the noisy correspondences; apply a Gaussian filter; transform approximation and smoothing; color composite of the affine and S-R-T transform output; corrected frame sequence; End.]

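The paper does not detail the "apply Gaussian filter" and "transform approximation and smoothing" steps of Figure 1. The sketch below is one plausible reading, not the authors' code: per-frame motion parameters (the parameter layout and the use of SciPy's gaussian_filter1d are my assumptions) are accumulated into a camera trajectory, the trajectory is smoothed with a 1-D Gaussian filter, and the difference is fed back as a per-frame correction.

```python
# Sketch: Gaussian smoothing of an accumulated camera trajectory.
# Assumes each inter-frame motion was reduced to similarity (S-R-T)
# parameters, e.g. (dx, dy, rotation, scale); purely illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_motion(params, sigma=5.0):
    """params: (N, 4) array of per-frame motion parameters.
    Returns corrected per-frame parameters whose cumulative path
    follows a Gaussian-smoothed version of the raw camera trajectory."""
    trajectory = np.cumsum(params, axis=0)            # raw camera path
    smoothed = gaussian_filter1d(trajectory, sigma, axis=0)
    return params + (smoothed - trajectory)           # per-frame correction
```

Each corrected parameter set would then be converted back into a 2 x 3 matrix and applied during image composition (e.g. with an affine warp).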

A. Identification of salient points from each frame & Harris Corner Detection

The main goal of this step is to correct the distortion between the two frames by finding a transformation, which is done by applying an object system that returns an affine transform [3]. This stage must supply the object with a set of point correspondences between the two frames [4]. First, the desired points are identified in the two chosen frames, and the common correspondences between the frames are then selected. At this point, candidate points are identified for each frame; to ensure that these points have corresponding points in the second frame, it is necessary to find points around salient image features, such as corners. Thus, the Corner Detector System Object is used to find corners using Harris Corner Detection, one of the fastest corner detection algorithms.
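The paper's implementation uses the MATLAB Corner Detector System Object; as an illustrative stand-in only (OpenCV, with parameter values that are my assumptions), the Harris detector can be invoked as follows:

```python
# Sketch: Harris corner detection on one grayscale frame (OpenCV).
import cv2

frame = cv2.imread("frameA.png")          # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners = cv2.goodFeaturesToTrack(
    gray,
    maxCorners=200,       # cap on the number of returned corners
    qualityLevel=0.01,    # minimum accepted corner response, relative to best
    minDistance=8,        # minimum spacing between corners, in pixels
    useHarrisDetector=True,
    k=0.04,               # Harris detector free parameter
)
# corners has shape (N, 1, 2): one (x, y) location per detected corner.
```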

B. Select correspondences between points & SSD
After the salient points in each frame are obtained, the correspondences between the previously identified points must be picked [4]. For each point, the lowest-cost match between the points in frame A and frame B must be found. To this end, the frame images are divided into 9 x 9 blocks. The matching cost is the distance between frame A and frame B measured in pixels. To find this cost, the Sum of Squared Differences (SSD) is computed between consecutive frame images: each point in frame A is compared with the points in frame B to find the lowest matching cost, in other words the shortest distance between them in pixels.
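A minimal NumPy sketch of this matching step, assuming 9 x 9 pixel blocks around pre-detected integer corner coordinates (function and variable names are mine, not the paper's):

```python
# Sketch: match corners between frames A and B by minimum SSD over
# 9x9 grayscale blocks centred on each corner.
import numpy as np

def block(img, x, y, r=4):
    """(2r+1)x(2r+1) block around (x, y); points are assumed to lie
    at least r pixels away from the image border."""
    return img[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)

def match_ssd(gray_a, gray_b, pts_a, pts_b):
    """For each (x, y) in pts_a, return the index of the pts_b point
    whose surrounding block gives the lowest SSD cost."""
    matches = []
    for (xa, ya) in pts_a:
        blk_a = block(gray_a, xa, ya)
        costs = [np.sum((blk_a - block(gray_b, xb, yb)) ** 2)
                 for (xb, yb) in pts_b]
        matches.append(int(np.argmin(costs)))
    return matches
```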

III. RESULTS AND DISCUSSION
In this section, the results attained with the proposed methodology are discussed. Table I shows the basic characteristics of each video used as the database in this study; the size and the number of bytes of the salient points in each video are also tabulated.

A. Strong corner detection
First, an algorithm based on the Harris and Stephens corner detection algorithm [2] is developed to identify all salient points, or strong corners, in each frame. These points are considered the anchor points and serve as the benchmark for which points to consider. Samples of the points detected in two frames are demonstrated in Figure 2. It is observed that the detected points cover the same frame features in both frames, for instance the salient points along the trees, the corners of the sidewalk, and the moving object.

Figure 2: The strong corners detected in both frames, marked with green dots.

B. Corresponding points
Next, the initial correspondences between the points identified in the previous step are established. To pick a correspondence for each point, a 9 x 9 block is extracted around each point from the consecutive image frames. The key step is to match points by cost, computing the Sum of Squared Differences (SSD) between the corresponding image regions of consecutive frames; the lowest-cost matches are kept in the solution [8]. Figure 3 shows the initial corresponding points, in green, at the same positions in both frames.

    Figure 3: Corresponding points between frames.

However, not all of these correspondences are correct: many of them are redundant, and there is also a significant number of outliers. This shortcoming is addressed in the next step. The SSD stage finds, for each point in frame A, the minimum-cost matching point in frame B with the aid of features, i.e., a loop over the points of A searches for the best matches among the points of B.

C. Accurate correspondence
As mentioned above, several point correspondences are incorrect, but a robust estimate of the geometric transform between the two image frames can still be determined using the Random Sample Consensus algorithm (RANSAC) [5][6]. The algorithm searches the given set of point correspondences for those consistent with a valid linear transform, as shown in Figure 4.
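As an illustrative counterpart to this step (again OpenCV rather than the paper's MATLAB code), a scale-rotation-translation transform can be fitted with RANSAC; the threshold value is an assumption, and cv2.estimateAffine2D would fit a full affine transform instead:

```python
# Sketch: robust S-R-T transform estimation from noisy correspondences
# using RANSAC (OpenCV).
import cv2
import numpy as np

# pts_a, pts_b: matched corner locations, e.g. from the SSD step above.
pts_a = np.float32([[10, 12], [50, 48], [90, 91], [130, 133]])  # toy data
pts_b = np.float32([[12, 13], [52, 49], [92, 92], [131, 135]])

# Returns a 2x3 similarity transform and a 0/1 inlier mask.
M, inliers = cv2.estimateAffinePartial2D(
    pts_a, pts_b,
    method=cv2.RANSAC,
    ransacReprojThreshold=3.0,  # max reprojection error in pixels (assumed)
)
print("inlier ratio:", inliers.mean())
```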

    Figure 4: Correct correspondences according to RANSAC.

From Figure 4, it is observed that the inlier correspondences are concentrated in the image background, not in the foreground, which itself is not aligned. The reason is that the background features are far enough away to behave as if they lay on an infinitely distant plane. Assuming that this background plane is static and does not change dramatically between the first and second frames, the estimated transform captures the motion of the camera, and correcting for it stabilizes the video. This condition is maintained as long as the camera motion between frame A and frame B is small, or the video sampling rate is high enough. The RANSAC algorithm is run multiple times, and at each run the cost of the result is calculated by projecting frame B onto frame A and taking the Sum of Absolute Differences between the two image frames; the results attained are given in Table I.
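The per-run cost just described, projecting frame B onto frame A and taking the Sum of Absolute Differences, could be sketched as below (the 2 x 3 transform M is assumed to come from the RANSAC step):

```python
# Sketch: score a candidate transform by warping frame B into frame A's
# coordinates and computing the Sum of Absolute Differences (SAD).
import cv2
import numpy as np

def sad_cost(gray_a, gray_b, M):
    h, w = gray_a.shape
    projected = cv2.warpAffine(gray_b, M, (w, h))  # frame B onto frame A
    return np.sum(np.abs(gray_a.astype(np.float64) - projected))
```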

First, the number of corners influences the output: Vid1 has the highest matching-point values but the lowest SSD, followed by Vid3 and Vid2 respectively. This indicates that Vid2 comprises the maximum number of salient points to be handled, since the SSD attained for it is the highest.

D. Frames correction
Further, the mean of the raw video frames and the mean of the corrected frames are computed, as shown in Figure 5.

Figure 5: Corrected frames.

The left image shows the mean of the raw input frames, which resembles the distorted original video frames due to extreme jitter. On the right is the mean of the corrected frames, with much less distortion, proving that the stabilization algorithm works well. Several more samples of corrected video frames are depicted in Figure 6.
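The comparison in Figure 5 amounts to averaging the two frame stacks; a sketch follows (variable names assumed), where a sharper corrected mean indicates better-aligned output:

```python
# Sketch: mean frames as a visual check of stabilization quality.
import numpy as np

def mean_frame(frames):
    """frames: list of equally sized uint8 images; returns their mean."""
    return np.mean(np.stack(frames).astype(np.float64), axis=0).astype(np.uint8)

# mean_raw = mean_frame(raw_frames)              # blurred if video jitters
# mean_corrected = mean_frame(corrected_frames)  # sharper after stabilization
```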

E. Quality
The quality of the output video is also measured for the proposed method. It is evaluated using an SVD-based grayscale image quality value and a graphical measurement.

i. SVD-based grayscale image quality
Singular value decomposition (SVD) has been developed into a measurement that can express the quality of distorted images either graphically, as a 2D measurement, or numerically, as a scalar measurement, both near and above the visual threshold. The experiments here utilize the SVD-based measurement, which outperforms the commonly used PSNR [10]. Equation 1 gives the computed value:

M\text{-}SVD = \frac{\sum_{i=1}^{(k/n)^2} \left| D_i - D_{mid} \right|}{(k/n)^2}    (1)

where D_mid represents the midpoint of the sorted D_i's, k is the image size, n is the block size, and M-SVD is the resulting singular-value-decomposition measurement. Following [10], each D_i = \sqrt{\sum_j (s_j - \hat{s}_j)^2} is the distance between the singular values s_j of a block of the original image and \hat{s}_j of the corresponding block of the distorted image.

As an example, the output quality for Vid1 based on Equation 1, with k = 8, n = 1, and D_i and D_mid represented by a 256 x 256 matrix, attained an M-SVD of 22.20. The numerical quality values obtained for the three sample videos are tabulated in Table I. As visualized in Figure 6, Vid3 obtained the best quality with a calculated value of 40.50%, followed by Vid2 with 39.21% and Vid1 with 22.20%. This shows that Vid1 has the greatest distortion, whilst Vid3 is the least distorted.
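A sketch of the M-SVD computation of Equation 1, following the block-wise definition in [10]; the default block size and the assumption that n divides k are mine:

```python
# Sketch: SVD-based grayscale image quality measure (M-SVD) after [10].
import numpy as np

def m_svd(original, distorted, n=8):
    """original, distorted: k x k grayscale images; n: block size
    (n is assumed to divide k)."""
    k = original.shape[0]
    d = []
    for i in range(0, k, n):
        for j in range(0, k, n):
            s_orig = np.linalg.svd(original[i:i+n, j:j+n].astype(np.float64),
                                   compute_uv=False)
            s_dist = np.linalg.svd(distorted[i:i+n, j:j+n].astype(np.float64),
                                   compute_uv=False)
            d.append(np.sqrt(np.sum((s_orig - s_dist) ** 2)))  # block D_i
    d = np.sort(np.array(d))
    d_mid = d[len(d) // 2]             # midpoint of the sorted D_i values
    return np.mean(np.abs(d - d_mid))  # Equation 1: mean |D_i - D_mid|
```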

ii. Graphical measurement
The graphical quality of any image or frame can be measured as shown in Figure 7. The graphical measurement indicates the condition of the video with respect to distortion. As seen in the graphical results, Vid1 has the largest blocks among all frame blocks, indicating that it is the video that experienced the worst distortion, followed by Vid2 and Vid3. This result agrees with the SVD values calculated earlier.

IV. CONCLUSION
In conclusion, the video stabilization technique based on the proposed method showed remarkable results in terms of stabilizing highly jittery videos suffering from distortion. Initial results also proved that, owing to the fusion of the RANSAC algorithm, Gaussian filtering, Harris-Stephens corner detection, and SAD, the stabilization process succeeded efficiently, based on the output quality attained. Future work includes finding a better feature detector and overcoming the consequences of extreme handheld camera shake in a feasible real-time implementation of video stabilization.

Acknowledgment
Funding for presenting this study was supported by the Faculty of Electrical Engineering, UiTM Shah Alam, Selangor.

Figure 6: Three input videos (Vid1, Vid2, Vid3) requiring different degrees of stabilization, since they were recorded under different circumstances.

Figure 7: Graphical measurement for each sample video (Vid1, Vid2 & Vid3) as an indication of stabilization quality.


REFERENCES
[1] M. Gleicher and F. Liu, "Re-cinematography: Improving the camerawork of casual video," ACM Transactions on Multimedia Computing, Communications, and Applications, 5(1), pp. 1-28, 2008.
[2] C. Harris and M. J. Stephens, "A combined corner and edge detector," Proc. of the Alvey Vision Conference, pp. 147-152, 1988.
[3] A. Suneja and G. Kumar, "An experimental study of edge detection methods in digital image," Global Journal of Computer Science and Technology, 10(2), 2010.
[4] http://www.mathworks.com/products/computer-vision/demos.html?file=/products/demos/shipping/vision/videostabilize_pm.html
[5] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, 24(6), pp. 381-395, 1981.
[6] B. Tordoff and D. W. Murray, "Guided sampling and consensus for motion estimation," 7th European Conference on Computer Vision, 2002.
[7] J. Jin, Z. Zhu, and G. Xu, "Digital video sequence stabilization based on 2.5D motion estimation and inertial motion filtering," Real-Time Imaging, 7(4), pp. 357-365, 2001.
[8] http://siddhantahuja.wordpress.com/tag/sum-of-squared-differences/
[9] M. Pilu, "Video stabilization as a variational problem and numerical solution with the Viterbi method," Proceedings of Computer Vision and Pattern Recognition, pp. 625-630, 2004.
[10] A. Shnayderman, A. Gusev, and A. M. Eskicioglu, "An SVD-based grayscale image quality measure for local and global assessment," IEEE Transactions on Image Processing, 15(2), 2006.

Table I: Criteria of the sample videos and results (Type: RGB; Extension: AVI). First-frame sample images are omitted here.

Input   Salient points size   Bytes   Frames # & Length   SAD value   Computational time (s)   Quality value (M-SVD)
Vid1    2 x 139               1112    34 & 2              8.85e5      8.81                     22.20%
Vid2    2 x 66                528     73 & 3              0.0824      9.18                     39.21%
Vid3    2 x 128               1024    132 & 4             3.553e3     9.28                     40.50%
