
Depth Estimation from Monocular Vision using Image Edge Complexity

Sallehuddin Mohamed Haris, Muhammad Khalid Zakaria and Mohd Zaki Nuawi*

Abstract

Autonomous robotic arm motion requires a control system to prevent collisions with the targeted object. Generally, in translational motion, the degree of complexity of the edges of the object image changes as the camera approaches the object. This principle can be used to estimate the distance to a targeted object. This work introduces a novel statistical method, named Moment of Zoomed-Algorithm Kurtosis (MoZAK), based on the I-kaz method, as an indicator for motion system control. The MoZAK parameter Z_c, which represents the degree of complexity of the image edges, is used to indicate whether or not further actuation of the motor is required. The method is compared to conventional statistical methods (standard deviation and kurtosis). Results indicate that MoZAK is a viable distance estimator compared to these conventional methods.

1. INTRODUCTION

In this research, a robotic arm with one translational degree of freedom is used to autonomously reach a targeted object. Motion is constrained to the vertical direction and controlled using a single camera mounted on the end effector. The distance from the end effector (and hence the camera mounted on it) to the target needs to be continually estimated, so that the target can be reached safely, without any collisions. Monocular vision from a single robot-mounted camera suffers from an inherent problem: the distance to objects in the workspace cannot be determined accurately from the captured image. We present a method by which the distance from the camera to the object can be estimated, such that the robotic arm can safely reach the target without colliding with it or, conversely, missing it altogether.

*S.M. Haris, M.Z. Zakaria and M.Z. Nuawi are all with the Department of Mechanical & Materials Engineering, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia. [email protected], muh [email protected], [email protected]

2. Background

Distance estimation from visual data has previously been studied by various researchers, using either stereo or monocular vision. Stereo imaging systems that give good results have been developed [1, 2]. However, these systems are limited by the baseline distance between the two cameras. Many of them also fail in environments where the image contains little texture. This is illustrated in Fig. 1, where Figs. 1(a) and (b) are, respectively, the images captured by the left and right cameras of a stereo vision system. Fig. 1(c) shows the estimated distance, indicated by different colours: shades of blue indicate objects that lie furthest from the cameras, while objects at medium distance are shown in red, which progressively turns to yellow as the distance decreases. The black areas are regions where distance could not be ascertained, which can be attributed to the lack of texture in the captured images [2].

Figure 1. Depth estimation using binocular image [2]

A monocular vision system was developed in [2] using data from a laser camera as training datasets; pre-training is a prerequisite before that system can be used in practical applications. Similarly, in [3], 1-D distances were estimated using supervised learning. In [4], surface reconstruction from single images was performed for known, fixed objects. In [5], depth estimation requires knowledge of colours and textures, while the method in [6] requires scenes with uniform colours and textures.

Research on the use of images for robotic motion servo control has become increasingly significant due to the large amount of data that can be captured with relatively cheap and simple equipment. This is further reinforced by rapid advances in computing technology, which provide fast computation at relatively low prices [7].

In this work, the objective was to use single-camera monocular vision to control the translational motion of a 1-DOF robotic arm. We propose a method for depth estimation that can be used without a priori knowledge of the scene and does not require the system to be trained. Object edges are first identified using edge detection methods, and the complexity of the edges with respect to the whole image frame is used as a parameter to estimate depth.

3. The Depth Estimation Algorithm

Our method stems from the hypothesis that the image edges of a targeted object have higher complexity when the object is further away than when it is nearer. A parameter indicating the degree of complexity of the image edges should therefore decrease as the camera approaches the targeted object; ideally, the value of this parameter varies linearly with the distance between the object and the camera.

Referring to Figure 2, human beings would generally be able to differentiate the relative distance of the object in Figure 2(a) from that in Figure 2(b) just by visual observation. This is possible because data sets have been developed in the human brain through learning and experience. However, a long period of learning is required before such a perception function is fully developed.

Figure 2. Image edges at a distance of (a) 1 metre and (b) 0.1 metre from target

The flowchart in Figure 3 shows the algorithm used to obtain the image edge complexity coefficient. This coefficient is then used to control the vertical downward motion of the robotic arm towards the targeted object. The control system signals the motor to keep rotating until a certain threshold value of the coefficient is reached, indicating that the camera is now sufficiently close to the target.


Figure 3. Processing system flowchart

4. Object Detection

Objects in the image frame first need to be identified. This is achieved using the following image processing steps. The captured image is stored and processed in software. The image format is first changed to reduce computational cost: the captured image, recorded as an RGB (320 x 240 x 3) pixel image, is converted to a monochromatic image of (320 x 240) pixels.

The second stage is to detect object edges in the monochromatic image. Here, the Canny edge detection method is used. The processing is carried out using the OpenCV open source library [8]. Results from the edge detection operation can be seen in Figure 4, where Figure 4(a) is the monochrome image and Figure 4(b) is the image after passing through the Canny filter. These image processing steps are repeated for each frame captured by the camera.

Figure 4. (a) Image from camera, (b) Object edges obtained from the Canny filter
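As a minimal sketch, these two steps might look as follows in Python with the OpenCV bindings [8]; the Canny hysteresis threshold values are assumptions, since the paper does not report the values used.

    import cv2

    def detect_edges(frame):
        # Reduce computational cost: 320x240x3 colour frame -> 320x240 grayscale
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Canny edge detection; the two threshold values are illustrative only
        return cv2.Canny(gray, 100, 200)

The binary edge map returned here is the input to the statistical analysis described next.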

5. Statistical Analysis

The fourth moment statistical analysis method [9] is now used to analyse the data extracted from the image, and the result is used to control the robotic arm movement. Motion is restricted to within a plane normal to the image plane.

Two types of experiments were conducted to study the effectiveness of this method. The first experiment compared MoZAK to conventional statistical methods based on kurtosis and standard deviation. The second experiment studied the effect of varying the order of the statistical analysis. The two approaches were then compared against each other to determine the better analytical method for estimating the robot arm movement.

Data obtained from the object detection process was first extracted and transformed into one-dimensional data to simplify the statistical processing, and hence reduce processing time. The centroid and edges of the target object were used as the reference points in this transformation: the distance between the edges and the centroid was used as the input data for the statistical analysis. The centroid was obtained using the K-means clustering method [10], and the number of cluster centroids defines the order of the analysis. Order variation is discussed below, in the approach used to optimise the analysis results. The results are shown in Figure 5.
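A sketch of this transformation, assuming a plain NumPy implementation of K-means (the paper uses the standard K-means method of [10]; empty clusters and convergence checks are not handled here):

    import numpy as np

    def edge_distances(edges, c=1, iters=20, seed=0):
        # Coordinates of the edge pixels in the binary Canny output
        pts = np.column_stack(np.nonzero(edges)).astype(float)
        # Plain K-means with c centroids; c is the order of the analysis
        rng = np.random.default_rng(seed)
        centroids = pts[rng.choice(len(pts), size=c, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            centroids = np.array([pts[labels == j].mean(axis=0) for j in range(c)])
        # One-dimensional data: distance of each edge pixel to its own centroid
        x = np.linalg.norm(pts - centroids[labels], axis=1)
        return x, labels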

Figure 5. Clustering using K-means for one centroid

The distance-to-centroid data was analysed using the conventional statistical methods, namely kurtosis and standard deviation. The statistical values were calculated for each image sequence recorded by the camera. The MoZAK parameter, Z_c, is defined by the equation:

$Z_c = \frac{1}{n} \sum_{i=1}^{c} \mathrm{std}^4(x_i)\, k(x_i)$    (1)

where n is the number of pixels representing object edges, x_i is the distance data from the edges to the i-th centroid, k is the kurtosis, std is the standard deviation and c is the order of analysis.

c describes the order used to obtain Z_c. The fourth moment statistical analysis is performed on the data x_i according to (1). In brief, the fourth order moment in (1) is obtained from the kurtosis statistical moment (2), which follows from the r-th order moment equation (3):

$K = \frac{1}{n\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^4$    (2)

$M_r = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^r$    (3)
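Written out directly, (2) and (3) amount to the following short sketch:

    import numpy as np

    def central_moment(x, r):
        # r-th order central moment M_r of equation (3)
        return np.mean((x - x.mean()) ** r)

    def kurt(x):
        # Kurtosis K of equation (2): fourth central moment normalised by sigma^4
        return central_moment(x, 4) / x.std() ** 4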

Hence, (4) and (5) are, respectively, the equations used to obtain the first and second order MoZAK parameters:

$Z_1 = \frac{1}{n}\, \mathrm{std}^4(x_1)\, k(x_1)$    (4)

$Z_2 = \frac{1}{n} \left[ \mathrm{std}^4(x_1)\, k(x_1) + \mathrm{std}^4(x_2)\, k(x_2) \right]$    (5)
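A minimal sketch of (1), assuming the distance data and cluster labels from the K-means step above and SciPy's Pearson kurtosis (fisher=False), which matches (2); the paper does not state which kurtosis convention it uses.

    import numpy as np
    from scipy.stats import kurtosis

    def mozak(x, labels, c):
        # Z_c of equation (1): sum std^4(x_i) * k(x_i) over the c clusters,
        # divided by n, the total number of edge pixels
        n = len(x)
        total = 0.0
        for i in range(c):
            xi = x[labels == i]                       # distance data for centroid i
            total += xi.std() ** 4 * kurtosis(xi, fisher=False)
        return total / n

With c = 1 this reproduces Z_1 of (4), and with c = 2 it reproduces Z_2 of (5).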

Table 1. First order statistical analysis results

Image Sequence Number    Z1        Kurtosis    Standard Deviation
        1                0.0043     2.8710          0.3108
        2                0.0125    10.3600          0.3731
        3                0.0224    16.9636          0.3790
        4                0.0083     2.6655          0.3485
        5                0.0507    69.3038          0.3233
        6                0.0241     3.1571          0.4354

6. Results

6.1. Comparison between conventional statistical methods and MoZAK

Experiments were conducted to examine the effectiveness of using the MoZAK coefficient in controlling the robotic arm movement. Figure 6 shows a plot of the normalised statistical values, where Z_c is the parameter introduced in (1). It can be seen that Z_1 increases for the first three points, but the fourth and sixth points give values that seem to contradict the initial hypothesis. This is due to the reduction in the kurtosis and standard deviation values shown in Table 1; most significantly, the kurtosis dropped to 2.6655 from its previous value of 16.9636. Nevertheless, the overall trend suggests that Z_1 can be used to describe the degree of complexity of object edges in the image.

Figure 6. Statistical analysis results


Table 2. Second order MoZAK analysis results

Image Sequence Number    Z1       Z2
        1                0.0244   0.0228
        2                0.0326   0.0281
        3                0.0372   0.0390
        4                0.0450   0.0479
        5                0.0944   0.0970
        6                0.1687   0.1153

6.2. Varying the degree of statistical analysis

The K-means clustering was then modified by introducing two centroids that best fit the object edge data. The data was analysed as in Section 6.1, but using two reference points (centroids). Z_2 was calculated according to (5) to obtain a single second order MoZAK value; the results are shown in Figure 7. The MoZAK analysis results for each image frame recorded by the camera are given in Table 2.

Figure 7. Statistical analysis results

A comparison of the results obtained using the first and second order methods is shown in Figure 8. It can be seen that the change in Z_2 with the two-centroid system is more linear than the change in Z_1 with just one centroid.

Figure 8. Results of 1st (*) and 2nd (+) order MoZAK analysis
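Putting the pieces together, a hypothetical control loop might look as follows, reusing detect_edges, edge_distances and mozak from the sketches above; the camera source, the stopping threshold and the motor interface are all assumptions, not values from the paper. Since Table 2 shows Z_2 growing as the camera nears the target, the loop stops actuating once the threshold is exceeded.

    # Hypothetical end-to-end loop; Z2_THRESHOLD is illustrative only
    Z2_THRESHOLD = 0.1

    def approach_target(camera, motor):
        while True:
            ok, frame = camera.read()                # e.g. a cv2.VideoCapture
            if not ok:
                break
            edges = detect_edges(frame)
            x, labels = edge_distances(edges, c=2)   # two centroids -> order 2
            if mozak(x, labels, c=2) >= Z2_THRESHOLD:
                motor.stop()                         # target sufficiently close
                break
            motor.step_down()                        # continue vertical motion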

7. Discussion

The underlying assumption of the presented method is that, as the camera comes nearer to the object, the image frame becomes less cluttered. This is a reasonable assumption, since the target object will increasingly fill the image frame, pushing out other objects that clutter the scene. Consequently, as the camera approaches the object, the edges of the target object will dominate, and hence the standard deviation and kurtosis of the object edges with respect to the whole image will decrease. The MoZAK parameter has been defined such that a high MoZAK number indicates a less cluttered image than one with a lower MoZAK number. However, the assumption may not hold for all image sequences, as shown by sequence number 4 in the results obtained.

8. Conclusion

The analysis technique discussed in this paper provides an effective approach for controlling the translational motion of a robotic arm using a single eye-in-hand monocular camera. Unlike other works on depth estimation from monocular vision, this method requires neither pre-training nor a priori knowledge of the scene.

In general, the second order integrated statistical analysis method, Z_2, produces a parameter whose value changes linearly over the consecutive images recorded as the camera approaches the target object. In conclusion, Z_2 represents the degree of complexity of the object edges, and its value can be used as a measure of the distance between the camera and the target.

9. ACKNOWLEDGMENTS

The support of the Ministry of Science, Technology and Innovation, Malaysia, in providing research grant no. 03-01-02-SF0459 is gratefully acknowledged.

References

[1] D. Scharstein and R. Szeliski, A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, International Journal of Computer Vision, vol. 47, 2002, pp. 7-42.
[2] A. Saxena, J. Schulte, and A.Y. Ng, Depth Estimation using Monocular and Stereo Cues, in Proc. International Joint Conference on Artificial Intelligence, 2007, pp. 2197-2203.
[3] J. Michels, A. Saxena, and A.Y. Ng, High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning, in Proc. International Conference on Machine Learning, 2005.
[4] T. Nagai, T. Naruse, M. Ikehara, and A. Kurematsu, HMM-based Surface Reconstruction from Single Images, in Proc. IEEE International Conference on Image Processing, vol. 2, 2002, pp. 561-564.
[5] G. Gini and A. Marchi, Indoor Robot Navigation with Single Camera Vision, in Pattern Recognition in Information Systems, 2002, pp. 67-76.
[6] M. Shao, T. Simchony, and R. Chellappa, New Algorithms for Reconstruction of a 3-D Depth Map from One or More Images, in Proc. IEEE Computer Vision and Pattern Recognition, 1988, pp. 530-535.
[7] J. Campbell, R. Sukthankar, I. Nourbakhsh, and A. Pahwa, A Robust Visual Odometry and Precipice Detection System using Consumer-grade Monocular Vision, in Proc. IEEE International Conference on Robotics and Automation, 2005, pp. 3421-3427.
[8] G.R. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, 2008.
[9] M.Z. Nuawi, M.J.M. Nor, N. Jamaludin, S. Abdullah, F. Lamin, and C.K.E. Nizwan, Development of Integrated Kurtosis-Based Algorithm for Z-filter Technique, Journal of Applied Sciences, vol. 8, no. 8, 2008, pp. 1541-1547.
[10] J.B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, in Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281-297.