seabed image mosaicing for benthic species countingav/papers/bagheri10benthic.pdf · seabed image...

Seabed Image Mosaicing for Benthic Species Counting

Hamed Bagheri, Andrew Vardy, and Ralf Bachmayer

Faculty of Engineering and Applied Science

Memorial University of Newfoundland

St. John’s, Canada

Abstract—Counting benthic species from sea floor imagery is useful for evaluating the state of the local ecosystem. The images are taken by digital still cameras mounted on ROVs (Remotely Operated Vehicles) and capture animals on the seabed. The overlap between images must be detected in order to eliminate the possibility of counting the same individual more than once. To prevent such errors, a feature-based Mosaicing method using SIFT has been employed to find these overlapping regions. This paper presents some initial results on creating a mosaic from images of the seabed for counting purposes.

Key Words- Image Processing, Image Registration, Image Mosaicing, Homography Transformation.

I. INTRODUCTION

Monitoring the benthic habitat of marine environments has wide application in the oil and gas industries (e.g., population monitoring for environmental impact assessment), as well as in oceanographic research (e.g., population studies, habitat analysis) [19]. To use seabed imagery effectively, there is a need to develop means of extracting information from the raw images. Until recently this step has, with few exceptions, been done manually: researchers count the animals seen in images or video sequences for further study. Even assuming that automatic or manual counting works properly, there are still conditions under which an animal is likely to be counted more than once, e.g. when it appears in several images or more than once in a video. A common scenario is as follows. A submersible exploring the sea floor could follow either of the tracks shown in Fig. 1. In case (a), a specific area may appear in more than one image because neighbouring tracks are adjacent. In case (b), one area is explored multiple times because it is chosen as the starting point for several data-collection runs.

Figure 1. Typical ROV tracks for Mosaicing: (a) near tracks; (b) dense center.

In this paper, we aim to solve the problem of multiple

counting by using mosaics generated from overlapping images

of the sea floor. However, in this work we do not aim to

generate the map of the explored area.

Extracting the features of every image and trying to match them against all other collected images is time consuming and inefficient. GPS data on the image locations is therefore employed to reduce the processing load of the feature extraction and matching stage: from the GPS data it is possible to estimate which images cannot overlap a particular image and to exclude them from matching.

II. RELATED WORKS

The field of image Mosaicing is relatively old, with an extensive research literature. Photo Mosaicing methods in the literature mainly fall into two categories: direct methods [7][10][8] and feature-based methods [9][11][13][14]. Direct methods use all the available image data and can provide accurate results, but depend heavily on 'brightness constancy' and on initialization [15]. Feature-based methods use distinctive characteristics of an image, such as corners; recently developed feature-based methods use invariant features, which make the Mosaicing system more stable.

With the exception of a few papers [19][12], we are not aware of research on automating the population counting of animals for any purpose.

III. NAVIGATIONAL GPS DATA

Despite the inaccuracy of GPS data in underwater environments, it is still possible to take advantage of it to categorize images, e.g. by finding the images taken within a specific radius of a particular image. We also know that overlap is most probable between two images taken consecutively. For this purpose, latitude, longitude and altitude (depth) are converted to Earth-Centered Earth-Fixed (ECEF) Cartesian coordinates. Then the distance between the location of each image and that of the next image taken in time is calculated. All images within each radius are categorized in the same group. The next stage takes each of these groups of images and determines whether there is any overlap between images of the same group.
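The conversion and grouping steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes the WGS84 ellipsoid, treats depth as a negative ellipsoidal height, and uses a single fixed grouping radius. The names `geodetic_to_ecef`, `group_by_radius` and `radius_m` are hypothetical.

```python
import math

# WGS84 ellipsoid constants (assumed datum; the paper does not name one).
WGS84_A = 6378137.0                    # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, height_m):
    """Convert latitude/longitude [deg] and ellipsoidal height [m] to ECEF [m].

    For an underwater vehicle, pass the depth as a negative height.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + height_m) * math.sin(lat)
    return (x, y, z)


def group_by_radius(positions, radius_m):
    """For each image index, list the other images within radius_m of it."""
    groups = []
    for i, p in enumerate(positions):
        neighbours = [j for j, q in enumerate(positions)
                      if j != i and math.dist(p, q) <= radius_m]
        groups.append(neighbours)
    return groups
```

Only the images listed in `groups[i]` would then be passed to feature matching for image `i`. The all-pairs distance loop is quadratic in the number of images, which is acceptable for a data set of a few hundred images.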

Using this method, we can reduce the number of candidate images in the feature matching stage to 17 for the first image, 29 for the second image and 5 for the third image, instead of 775 for each of them, as illustrated in Fig. 2.

[Plot: Estimated ROV Path using GPS data; axes X [m] and Y [m].]

Figure 2. Categorizing images in neighbourhood. Red depicts center image, and Blue shows the neighbouring images.

IV. MOSAICING

A mosaic is a collection of images which have been stitched together to form a larger, single composite image. For ocean floor mosaics, each image is obtained by moving the camera over the ocean floor. This motion is not a pure translation: tilt of the camera and changes in the orientation of the camera or the vehicle also cause problems in Mosaicing. In order to combine overlapping images, we must find the geometric transformation between each pair and transform the second image into the view of the first image. After pairs are stitched, the corners and edges of each image have higher intensity in the final mosaic, so they should be blended to obtain a better-quality mosaic.

A. Feature Extraction and Matching

Most recent work on feature extraction has focused on local invariant features [3], with applications such as image stitching [1], 3D modeling, gesture recognition, object recognition [1] and robotic mapping [16]. In the context of local invariant features, the features of any object in an image represent interesting points of the object, ranging from complex features such as the object itself to simpler structures such as edges or points. These features can also be designed to be invariant to scale and orientation, and to be robust to changes in viewpoint, illumination, noise and blurring.

Consequently, a feature-based approach is attractive. To the best of our knowledge, the most popular available algorithm that realizes all the mentioned advantages is the Scale-Invariant Feature Transform (SIFT) [1].

SIFT detects and describes local features and possesses the invariance and robustness characteristics mentioned above. For feature detection, the SIFT algorithm first convolves the image with Gaussian masks at different scales, and then takes the differences of the blurred images. Maxima and minima of the difference of Gaussians (DoG) that occur across multiple scales form scale-invariant feature points. Based on local image gradient directions, an orientation is assigned to each point to achieve rotational invariance. Finally, a descriptor is computed that is highly distinctive and partially invariant to 3D viewpoint, illumination, etc. SIFT performs this computation by dividing the image region defining the feature into sub-regions. Within each sub-region, SIFT computes gradient magnitude and orientation values and forms histograms, combining them into a descriptor of 128 elements.
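For reference, the difference-of-Gaussians step can be written in the notation commonly used in the SIFT literature. The symbols I (input image), G (Gaussian kernel), L (blurred image), D (DoG response), σ (scale) and k (constant ratio between adjacent scales) are not defined in this paper and are introduced here only for illustration.

```latex
\[
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2}) / (2\sigma^{2})}
\]
\[
D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)
\]
```

Candidate feature points are the local extrema of D over both position and scale. In the standard SIFT configuration the descriptor uses 4 × 4 sub-regions with 8 orientation bins each, which gives the 4 · 4 · 8 = 128 elements mentioned above.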

The first step in our overlap-finding algorithm is extracting and matching SIFT features between all of the images in one group gathered from the previous step. Using SIFT features as-is, a preliminary system was designed for Mosaicing of underwater images. Figure 3 shows matched SIFT features for different viewpoints of a region.

Figure 3. SIFT features extracted and matched.
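Descriptor matching between two images of a group can be sketched as a brute-force nearest-neighbour search. The sketch below assumes the SIFT descriptors are already available as NumPy arrays with one row per feature. The ratio test used to discard ambiguous matches is a common companion to SIFT matching but is an assumption here, since this paper does not state how matches were filtered. The names `knn_match`, `ratio_test` and the threshold `ratio=0.8` are hypothetical.

```python
import numpy as np


def knn_match(desc_a, desc_b, k=2):
    """For each row of desc_a, find the k nearest rows of desc_b.

    Returns (idx, dist): indices into desc_b and Euclidean distances,
    both of shape (len(desc_a), k), sorted from nearest to farthest.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise squared distances, shape (len(a), len(b)).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff * diff, axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist


def ratio_test(idx, dist, ratio=0.8):
    """Keep a match only if its best distance is clearly below the second best.

    Expects the output of knn_match with k >= 2.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    keep = dist[:, 0] < ratio * dist[:, 1]
    return [(int(i), int(idx[i, 0])) for i in np.nonzero(keep)[0]]
```

The number of surviving matches per image pair could then serve as the score for selecting Mosaicing candidates. The dense distance matrix is memory-hungry for images with many features; a k-d tree or similar index would be the usual replacement at scale.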

The next stage in Mosaicing is Image Matching. At this

stage the goal is to find all overlapping images. Connected

overlapping images will be Mosaiced later and will be ready

for further object recognition analysis and counting.

Since each image could possibly be matched with every other image, this problem appears to be computationally expensive. By grouping the images using the GPS data as described in the previous section, however, we can reduce the number of image pairs that must be compared.

B. Homography Transform Estimation

From the feature matching step, we find the images with a large number of matched features between them. Then, in order to find the geometrically consistent matched features (inliers), RANSAC (RANdom SAmple Consensus) [17] is used. This algorithm is a robust estimation procedure that estimates image transformation parameters from minimal sets of randomly sampled correspondences. Projective effects exist in most images taken of the sea floor, because the vehicle has to be close to the scene owing to the limited light, and because the camera mounted on the vehicle is not necessarily looking straight down. In order to compensate for this effect, we estimate the homography matrix using the set of inliers as the fundamental features.

For each pair of images, the forward homography is estimated using at least 4 pairs of corresponding points. Let (X, Y) denote a point in the first image and (x, y) the corresponding point in the second image. Then, for each pair of corresponding points, we obtain:

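\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\]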

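Because the homography transform is written using homogeneous coordinates, the homography H is defined by 8 parameters plus a free 9th homogeneous scaling factor. Therefore, at least 4 point correspondences, providing 8 equations, are required to compute the homography. In practice, a larger number of correspondences is employed to obtain an over-determined linear system. By rewriting H in vector form as h = [h11, h12, h13, h21, h22, h23, h31, h32, h33]^T, n pairs of point correspondences enable the construction of a 2n × 9 linear system, which is expressed by:

\[
A h = 0, \qquad
A =
\begin{bmatrix}
X_1 & Y_1 & 1 & 0 & 0 & 0 & -x_1 X_1 & -x_1 Y_1 & -x_1 \\
0 & 0 & 0 & X_1 & Y_1 & 1 & -y_1 X_1 & -y_1 Y_1 & -y_1 \\
\vdots & & & & & & & & \vdots \\
X_n & Y_n & 1 & 0 & 0 & 0 & -x_n X_n & -x_n Y_n & -x_n \\
0 & 0 & 0 & X_n & Y_n & 1 & -y_n X_n & -y_n Y_n & -y_n
\end{bmatrix}
\tag{1}
\]

Here (X_i, Y_i) and (x_i, y_i) denote the i-th correspondence. Each pair of rows is obtained by dividing the first and second rows of the homography relation by its third row and clearing the denominator.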

Solving this linear system involves computing the Singular Value Decomposition (SVD) of A, that is, factoring it into the matrix product A = U D V^T. The solution h corresponds to the last column of the matrix V, the right singular vector associated with the smallest singular value. H is then obtained by reshaping h into a 3 × 3 matrix.
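The estimation procedure of this subsection can be sketched with NumPy as follows. The sketch builds the 2n × 9 system from point correspondences, takes the last right singular vector as h, and wraps this in a basic RANSAC loop over 4-point samples. It is an illustration under stated assumptions rather than the implementation used in this work: the coordinates are not normalized before the SVD, and the iteration count, the pixel tolerance `tol` and all function names are hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """Solve A h = 0 by SVD for n >= 4 correspondences.

    src holds (X, Y) points of the first image and dst the matching
    (x, y) points of the second image, each of shape (n, 2).
    """
    rows = []
    for (X, Y), (x, y) in zip(src, dst):
        rows.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y, -x])
        rows.append([0, 0, 0, X, Y, 1, -y * X, -y * Y, -y])
    A = np.asarray(rows, dtype=float)      # shape (2n, 9)
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 3)            # last column of V, as a 3x3 matrix


def project(H, pts):
    """Apply H to (n, 2) points and divide out the homogeneous coordinate."""
    pts = np.asarray(pts, dtype=float)
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]


def ransac_homography(src, dst, n_iter=500, tol=3.0, seed=0):
    """Return (H, inlier_mask) from repeated 4-point minimal samples."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(src), size=4, replace=False)
        H = estimate_homography(src[sample], dst[sample])
        # Degenerate samples can produce inf/nan errors; they never count as inliers.
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        mask = err < tol
        if mask.sum() > best_mask.sum():
            best_mask = mask
    if best_mask.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    # Refit on all inliers, giving the over-determined system.
    return estimate_homography(src[best_mask], dst[best_mask]), best_mask
```

The returned H is defined only up to scale, so callers would typically divide it by its bottom-right entry before comparing or storing it. When most matches are wrong, as discussed later for deep-sea images, far more iterations are needed before an all-inlier sample is drawn.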

C. Multi Band Blending

In reality, a scene point seen along a given ray has different intensities in different images. This issue is more pronounced in underwater images, as the submersible has to carry its own light source, and this artificial light makes the center of the image brighter than the corners. An object may appear at the edge or corner of one image and at the center of another, with a different intensity value each time. Another reason is the rapid attenuation of light in aquatic environments combined with perspective in the images, which causes closer objects to have higher intensity than more distant objects in a scene. These changes in intensity values make blending important for underwater images.

Multi-band blending, developed by Burt and Adelson [6], is used for this stage. The idea behind multi-band blending is to build Laplacian pyramids of the images, which split them into multiple frequency bands, and then to blend each frequency band separately. In our implementation we used a 4 band scheme. In this case a high pass image is formed with spatial frequency greater than 4 pixels and the low pass image with less than 2 pixels relative to the rendered image. Figure 4 shows a mosaic without multi-band blending, and Figure 6 shows the same Mosaic with the blending algorithm applied.

Figure 4. Mosaic without Multi-band Blending.
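The band-by-band blending idea can be illustrated with the following simplified sketch. It is not the Burt and Adelson pyramid itself: instead of a subsampled Laplacian pyramid it uses a stack of Gaussian blurs at full resolution, with each band formed as the difference between successive blur levels. It also assumes two already aligned grayscale images of equal size and a binary mask. The function names, the starting `sigma` and the doubling of sigma per band are assumptions made for illustration.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array with edge replication."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Horizontal pass, then vertical pass; "valid" removes the padding again.
    tmp = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, tmp)


def multiband_blend(img_a, img_b, mask, n_bands=4, sigma=1.0):
    """Blend two aligned images band by band.

    mask is 1.0 where img_a should dominate and 0.0 where img_b should.
    High-frequency bands use a sharp mask; lower bands use a smoother one.
    """
    low_a = np.asarray(img_a, dtype=float)
    low_b = np.asarray(img_b, dtype=float)
    weight = np.asarray(mask, dtype=float)
    result = np.zeros_like(low_a)
    for band in range(n_bands):
        s = sigma * (2 ** band)
        if band < n_bands - 1:
            blur_a, blur_b = gaussian_blur(low_a, s), gaussian_blur(low_b, s)
            band_a, band_b = low_a - blur_a, low_b - blur_b
            low_a, low_b = blur_a, blur_b
        else:
            band_a, band_b = low_a, low_b   # residual low-pass band
        result += weight * band_a + (1.0 - weight) * band_b
        weight = gaussian_blur(weight, s)   # smoother transition for the next band
    return result
```

Because the bands of each image sum back to the image itself, blending an image with itself returns it unchanged, which is a convenient sanity check. Fine detail switches abruptly at the seam while slow illumination differences are spread over a wide transition zone, which is the behaviour the text attributes to multi-band blending.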

V. IMPLEMENTATION

This section introduces some initial results of applying our approach to selected pairs of high quality images with sufficient overlap. The data set is seabed imagery comprising 775 images collected off the west coast of Canada by the Remotely Operated Platform for Ocean Sciences (ROPOS) [18].

The algorithm for this work can be described as follows:

Algorithm:

Input: n images with GPS data

I. Convert GPS coordinates to ECEF

II. For all images:

(i) Calculate the Euclidean distance to the next image location

(ii) Group the images within its neighbourhood

III. For all images in one group:

(i) Extract SIFT features from all images in the group

(ii) Find k nearest-neighbors for each feature

IV. For each image in a group:

(i) Using the maximum number of matched features to this image, select candidates for Mosaicing

(ii) Using RANSAC, compute the homography between pairs of images

(iii) Transform the matched image and stitch the two images together

(iv) Render the Mosaic using multi-band blending

Output: m images with no overlap

Figure 6 shows a sample output of this algorithm.

Pairs of images were selected because of the problems we faced during our implementation, which are discussed in the following two subsections.

A. Noise in GPS Data

Unlike the land and space regimes, for which GPS data is available, there is no global positioning measurement system for underwater environments. Ideas for such a system are being examined; however, the accuracy would be on the order of 100 meters [5]. This inaccuracy of position data in underwater environments causes errors such as two images that are far apart, with no overlap, being assigned identical GPS coordinates. In other cases, the locations of two images were estimated to be too far from each other, which in turn forced us to expand the radius of the circle used for grouping images. In a few cases, this problem required processing almost all of the images in the data set to find the overlap for a particular image.

B. Lack of features in underwater imagery

Common problems of underwater imagery, such as an insufficient number of features, poor-quality images with blurring or very low contrast, variation in light or color, and marine snow (particles in the water causing reflectance noise), make feature extraction and matching challenging. The principal issue in Mosaicing underwater imagery is the lack of features, which is far more severe than in outdoor images. Even the human visual system needs more time to analyze such images and find anything meaningful. Figure 5 shows an example of this situation.

Figure 5. SIFT matched features, illustrating mismatches.

As can be seen, SIFT has extracted many features from the image pair and matched them, but many of the matches are incorrect. This problem mainly exists in images of the deep sea floor, where the majority of the features come from concrete texture and sandy areas, and only a minority belong to interesting objects such as animals or bigger rocks, which are more suitable for Mosaicing because they have more distinctive features. The nature and pattern of these images therefore make feature matching challenging. This problem does not exist for our sample images of the shallow-water sea floor, which have more texture.

Also, as the proportion of mismatched features in this situation is higher than the proportion of correctly matched features, RANSAC is not able to find the inliers effectively. This is the main problem we face with this set of images.

VI. CONCLUSION AND FUTURE WORKS

A technique for solving the problem of multiple counting has been presented. Our approach uses navigational data to group images that are more likely to overlap. Images with a sufficiently large overlapping region are then stitched together.

In the future, techniques will be investigated to extract and match more interesting features from ocean floor images that have few features. This may be done with pre-processing, such as filtering the images to partially suppress the sandy areas, or by filtering out the range of extracted features that belong to this particular pattern.

VII. ACKNOWLEDGMENTS

The first author would like to express sincere thanks to his advisors Andrew Vardy and Ralf Bachmayer. In addition, the author is very appreciative of Adam Gobi for his invaluable motivation and support, and Kim Juniper for generously providing the Barkley Canyon image data. Finally, support from the NSERC Canadian Healthy Oceans Network is gratefully acknowledged.

REFERENCES

[1] M. Brown and D. Lowe. Automatic panoramic image stitching using

invariant features. International Journal of Computer Vision, 74(1):59–

73, 2007.

[2] K. Shepherd and S. Juniper. ROPOS: Creating a Scientific tool from an

industrial ROV. Marine Technology Society Journal, 31(3):48–54, 1997.

[3] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision, 3(3):177–280, 2008.

[4] Nowak, B.M.; Whitney, T.; Ackley, S.F.; , "Analysis of ROV video

imagery for krill identification and counting under Antarctic sea ice,"

Autonomous Underwater Vehicles, pages.1-9,2008

[5] R. Marks, S. Rock, and M. Lee, "Using visual sensing for control of an underwater robotic vehicle", IARP Second Workshop on Mobile Robots for Subsea Environments, May 1994.

[6] P. J. Burt and E. H. Adelson. A multiresolution spline with application

to image mosaics. ACM Transactions on Graphics, 2(4):217–236, 1983.

[7] Reddy, B.S., Chatterji, B.N., “A FFT-Based Technique for Translation,

Rotation, and Scale-Invariant Image Registration”, IEEE Trans. on

Image Processing, 5:8, August 1996.

[8] H. Shum and R. Szeliski. Construction of panoramic mosaics with

global and local alignment. International Journal of Computer Vision,

36(2):101–130, February 2000.

[9] M. Brown and D. Lowe. Automatic panoramic image stitching using

invariant features. IJCV, 74(1):59–73, 2007.

[10] R. Szeliski and S. Kang. Direct Methods for Visual Scene Reconstruction. In IEEE Workshop on Representations of Visual Scenes, pages 26–33, Cambridge, MA, 1995.

[11] A. Elibol, R. Garcia, O. Delaunoy, and N. Gracias. A New Global

Alignment Method for Feature Based Image Mosaicing. In Proceedings

of the 4th International Symposium on Advances in Visual Computing,

Part II, pages 257-266. Springer, 2008.

[12] Nowak, B.M.; Whitney, T.; Ackley, S.F.; , "Analysis of ROV video

imagery for krill identification and counting under Antarctic sea ice,"

Autonomous Underwater Vehicles, IEEE/OES , vol., no., pp.1-9, 13-14

Oct. 2008

[13] Kudzinava, M., Garcia, R., Marti, J.: Feature-Based Matching of Underwater Images. In: International Workshop on Marine Technology, pp. 96-97. (2007)

[14] Naoki CHIBA, Hiroshi KANO, Michihiko MINOH and Masashi YASUDA, “Feature-based image mosaicing”, IEICE, Japan D-II Vol, J82 No 10 pp 1589~1999 1999.10

[15] M. Brown and D. Lowe, “Recognising Panoramas,” Proc. Ninth Int’l

Conf. Computer Vision, pp. 1218-1227, 2003.

[16] S. Se, D. Lowe, and J. Little. Mobile robot localization and mapping with

uncertainty using scale-invariant visual landmarks. The International

Journal of Robotics Research, 21(8):735, 2002.

[17] Martin A. Fischler , Robert C. Bolles, Random sample consensus: a

paradigm for model fitting with applications to image analysis and

automated cartography, Communications of the ACM, v.24 n.6, p.381-

395, June 1981.

[18] K. Shepherd and S. Juniper. ROPOS: Creating a Scientific tool from an

industrial ROV. Marine Technology Society Journal, 31(3):48–54, 1997.

[19] A. F. Gobi, "Towards Generalized Benthic Species Recognition and

Quantification using Computer Vision," in Proceedings of the 4th

Pacific-Rim Symposium on Image and Video Technology (PSIVT2010),

Singapore, to appear, 2010.

Figure 6. Mosaic of two images with 4 level Multi-band blending.
