Seabed Image Mosaicing for Benthic Species
Counting
Hamed Bagheri, Andrew Vardy, and Ralf Bachmayer
Faculty of Engineering and Applied Science
Memorial University of Newfoundland
St. John’s, Canada
Abstract—Counting benthic species from sea floor imagery is useful for evaluating the state of the local ecosystem. Digital still cameras mounted on ROVs (Remotely Operated Vehicles) are used to capture images of animals on the seabed. The overlap between images must be detected in order to avoid counting the same individual more than once. To this end, a feature-based mosaicing method using SIFT is employed to find these overlapping regions. This paper presents some initial results on creating a mosaic from images of the seabed for counting purposes.
Key Words- Image Processing, Image Registration, Image Mosaicing,
Homography Transformation.
I. INTRODUCTION
Monitoring the benthic habitat of marine environments has
wide application in the oil and gas industries (e.g., population
monitoring for environmental impact assessment), as well
as in oceanographic research (e.g., population studies, habitat
analysis) [19]. To use the collected imagery effectively,
information must be extracted from the raw images. Until
recently this step has, with few exceptions, been done manually:
researchers count the number of animals seen in images or video
sequences for further study. Even assuming that the automatic
or manual counting itself works properly, there are still
conditions under which an animal is likely to be counted
multiple times, e.g., if an animal appears in several images or
more than once in a video, it will be counted several times.
A common scenario for this problem is as follows. A submersible
exploring the sea floor could follow either of the tracks shown
in Fig. 1. In case (a), a specific area may appear in more than
one image because neighboring tracks are adjacent. In case (b),
an area is explored multiple times because it is chosen as
the starting point for several data collection runs.
Figure 1. Typical ROV tracks for mosaicing: (a) near tracks; (b) dense center.
In this paper, we aim to solve the problem of multiple
counting by using mosaics generated from overlapping images
of the sea floor. We do not, however, aim to generate a map
of the explored area.
Extracting features from every image and trying to match them
against all other collected images is time consuming and
inefficient. The GPS location recorded for each image is
therefore used to reduce the processing load in the feature
extraction and matching stage: from the GPS data it is possible
to estimate which images cannot have any overlapping region
with a particular image.
II. RELATED WORKS
The field of image mosaicing is relatively old, with an
extensive research literature. Mosaicing methods mainly fall
into two categories: direct methods [7][10][8] and
feature-based methods [9][11][13][14].
Direct methods use all the available image data and can
provide accurate results, but depend heavily on 'brightness
constancy' and on initialization [15]. Feature-based methods use
distinctive characteristics of an image such as corners; recently
developed feature-based methods use invariant features, which
make the mosaicing system more stable.
With the exception of a few papers [19][12], we are not aware
of research on automating the population counting of animals
for any purpose.
III. NAVIGATIONAL GPS DATA
Despite the inaccuracy of GPS data in the underwater
environment, it can still be used to categorize images, e.g.,
to find the images taken within a specific radius of a
particular image. We also know that overlap is most likely
between two images taken consecutively. For this purpose,
latitude, longitude and altitude (depth) are converted to
Earth-Centered Earth-Fixed (ECEF) Cartesian coordinates. Then
the distance between the location of each image and that of the
next image taken in time is calculated. All images within each
such radius are placed in the same group. The next stage takes
each of these groups and determines whether there is any
overlap between images of the same group.
Using this method, we can reduce the number of images
considered in the feature matching stage to 17 for the first
image, 29 for the second image and 5 for the third image,
instead of 775 for each of them, as illustrated in Fig. 2.
[Plot: Estimated ROV Path using GPS data. Axes: X [m], Y [m].]
Figure 2. Categorizing images in neighbourhood. Red depicts
center image, and Blue shows the neighbouring images.
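The conversion and grouping steps described in this section can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes the WGS84 ellipsoid (the ellipsoid is not stated above), assumes that depth is passed as a negative altitude, and uses a hypothetical helper `neighbours_within` with a caller-supplied radius.

```python
import math

# WGS84 ellipsoid constants (assumption: the text does not name the ellipsoid).
WGS84_A = 6378137.0                    # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude/longitude [deg] and altitude [m] to ECEF (x, y, z) [m].

    Depth below the surface is assumed to be given as a negative altitude.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return (x, y, z)


def neighbours_within(positions, radius_m):
    """For each position, list the indices of the other positions within radius_m.

    Hypothetical helper: the radius is supplied by the caller.
    """
    groups = []
    for i, p in enumerate(positions):
        group = [j for j, q in enumerate(positions)
                 if j != i and math.dist(p, q) <= radius_m]
        groups.append(group)
    return groups
```

Only the images listed in a group need to be passed to feature matching for the corresponding center image, which is where the reduction from 775 candidates comes from.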
IV. MOSAICING
A mosaic is a collection of images that have been stitched
together to form a larger, single composite image. For ocean
floor mosaics, each image is obtained by moving the camera over
the ocean floor. This motion is not a pure translation; tilt of
the camera and changes in the orientation of the camera or the
vehicle also cause problems in mosaicing. To combine
overlapping images, we must find the geometric transformation
between each pair and transform the second image into the view
of the first. After stitching, the corners and edges of each
image have a higher intensity in the final mosaic, and these
regions should be blended to obtain a better quality mosaic.
A. Feature Extraction and Matching
Most recent work on feature extraction has focused on local
invariant features [3], with applications such as image stitching
[1], 3D modeling, gesture recognition, object recognition [1]
and robotic mapping [16]. In this context, the features of an
object in an image represent its interesting points, ranging
from complex features such as the object itself to simpler
structures such as edges or points. These features can also be
designed to be invariant to scale and orientation and to be
robust to changes in viewpoint, illumination, noise and blurring.
Consequently, a feature-based approach is attractive.
To the best of our knowledge, the most popular available
algorithm that realizes all of the mentioned advantages is the
Scale-Invariant Feature Transform (SIFT) [1].
SIFT detects and describes local features and possesses these
invariance and robustness properties. For feature detection,
the SIFT algorithm first convolves the image with Gaussian
masks at different scales and then takes the difference of
successive blurred images. Maxima and minima of the difference
of Gaussians (DoG) across scales form scale-invariant feature
points. An orientation is assigned to each point based on local
image gradient directions to achieve rotational invariance.
Finally, a highly distinctive descriptor that is partially
invariant to 3D viewpoint, illumination, etc. is computed.
SIFT performs this computation by dividing the image region
defining the feature into sub-regions. Within each sub-region,
SIFT computes gradient magnitude and orientation values and
forms histograms, which are combined into a descriptor of 128
elements.
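The detection step described above can be illustrated with a small sketch. It is not the SIFT implementation used in this work: it omits octaves, sub-pixel refinement, edge rejection, orientation assignment and descriptors, and it assumes a grayscale image with values in [0, 1]. The function names, the scale list and the contrast threshold are illustrative assumptions.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D array with edge replication."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Convolve along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def dog_extrema(img, sigmas, contrast=0.03):
    """Return (scale, row, col) of DoG extrema over space and scale.

    A point is kept if it is the maximum or minimum of its 3x3x3
    neighbourhood in the DoG stack and its response exceeds `contrast`.
    """
    blurred = np.stack([gaussian_blur(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]  # difference of successive blurred images
    keypoints = []
    for s in range(1, dog.shape[0] - 1):
        for r in range(1, dog.shape[1] - 1):
            for c in range(1, dog.shape[2] - 1):
                v = dog[s, r, c]
                if abs(v) < contrast:
                    continue  # reject low-contrast responses
                cube = dog[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
                if v >= cube.max() or v <= cube.min():
                    keypoints.append((s, r, c))
    return keypoints
```

On a synthetic image containing a single bright Gaussian blob, the detector returns a keypoint at the blob center, at the scale level closest to the blob size. On uniform regions such as flat sand no keypoints are returned, because the DoG response there stays below the contrast threshold.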
The first step in our overlap finding algorithm is extracting
and matching SIFT features between all of the images in one
group obtained from the previous step. Using SIFT features
as-is, a preliminary system was designed for mosaicing
underwater images. Figure 3 shows matched SIFT features
for different viewpoints of a region.
Figure 3. SIFT features extracted and matched.
The next stage in mosaicing is image matching. At this
stage the goal is to find all overlapping images. Connected
overlapping images will be mosaiced later and will then be ready
for further object recognition analysis and counting.
Since each image could potentially be matched with every other
image, this problem is expensive in terms of computational
load. However, by grouping the images using GPS data as
described in the previous section, we can reduce this
complexity.
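Feature matching between two images of a group can be sketched as a brute-force nearest-neighbor search over descriptors. The sketch below uses k = 2 and accepts a match only if the nearest descriptor is clearly closer than the second nearest. This ratio test and its threshold of 0.8 are assumptions borrowed from common SIFT practice; they are not stated in this paper. The function name is hypothetical.

```python
import numpy as np


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match rows of desc_a to rows of desc_b (one descriptor per row).

    Returns a list of (index_in_a, index_in_b) pairs. desc_b must contain
    at least two descriptors so that a second neighbour exists.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distances to all of B
        nearest, second = np.argsort(dist)[:2]     # the k = 2 nearest neighbours
        if dist[nearest] < ratio * dist[second]:   # keep unambiguous matches only
            matches.append((i, int(nearest)))
    return matches
```

The number of accepted matches per image pair can then be used to rank candidate pairs for mosaicing. An approximate nearest-neighbor index would be preferable for large descriptor sets; the exhaustive search here is chosen only for clarity.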
B. Homography Transform Estimation
From the feature matching step, we find the images with a
large number of matched features between them. Then, to find
the geometrically consistent matches (inliers), RANSAC
(RANdom SAmple Consensus) [17] is used. RANSAC is a robust
estimation procedure that estimates image transformation
parameters from minimal sets of randomly sampled
correspondences. Perspective effects exist in most images of
the sea floor, because the vehicle has to be close to the scene
due to limited light and the camera mounted on the vehicle is
not necessarily looking straight down. To compensate for this
effect, we estimate a homography matrix using the set of inliers
as the underlying features.
For each pair of images, the forward homography is
estimated using at least 4 pairs of corresponding points. Let
(X, Y) denote a point in the first image and (x, y) the
corresponding point in the second image. Then, for each pair
of corresponding points, we obtain:
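$$
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
$$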
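Because the homography transform is written in homogeneous
coordinates, the homography H is defined by 8 parameters plus
a free 9th homogeneous scaling factor. Therefore, at least
4 point correspondences, providing 8 equations, are required
to compute the homography. In practice, a larger number of
correspondences is used to obtain an over-determined linear
system. Writing H in vector form as
$h = [h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33}]^T$,
n pairs of point correspondences allow the construction of a
$2n \times 9$ linear system, which is expressed by:
$$
A h = 0, \qquad
A = \begin{bmatrix}
X_1 & Y_1 & 1 & 0 & 0 & 0 & -x_1 X_1 & -x_1 Y_1 & -x_1 \\
0 & 0 & 0 & X_1 & Y_1 & 1 & -y_1 X_1 & -y_1 Y_1 & -y_1 \\
\vdots & & & & & & & & \vdots \\
X_n & Y_n & 1 & 0 & 0 & 0 & -x_n X_n & -x_n Y_n & -x_n \\
0 & 0 & 0 & X_n & Y_n & 1 & -y_n X_n & -y_n Y_n & -y_n
\end{bmatrix}
\tag{1}
$$
where $(X_i, Y_i)$ and $(x_i, y_i)$ denote the i-th pair of corresponding points.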
This linear system is solved using the Singular Value
Decomposition (SVD). The matrix is factored into the product
$A = U D V^T$, and the solution h corresponds to the last
column of the matrix V. H is then determined from h.
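The estimation procedure of this subsection can be sketched with NumPy as follows. This is an illustrative sketch, not the code used for the reported results. It builds two rows per correspondence for the mapping from (X, Y) to (x, y), solves the system by SVD, and wraps the fit in a basic RANSAC loop. The iteration count, the pixel tolerance and the function names are assumptions, and coordinate normalization, which is commonly applied before the SVD, is omitted for brevity.

```python
import numpy as np


def fit_homography(src, dst):
    """Estimate H mapping src points (X, Y) to dst points (x, y) from n >= 4 pairs."""
    rows = []
    for (X, Y), (x, y) in zip(src, dst):
        rows.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y, -x])
        rows.append([0, 0, 0, X, Y, 1, -y * X, -y * Y, -y])
    A = np.asarray(rows, dtype=float)   # the 2n x 9 system matrix
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                          # last row of V^T = last column of V
    return h.reshape(3, 3)


def project(H, pts):
    """Apply H to an (n, 2) array of points, dividing out the homogeneous scale."""
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    q = pts_h @ H.T
    return q[:, :2] / q[:, 2:3]


def ransac_homography(src, dst, n_iter=500, tol=2.0, seed=0):
    """Return (H, inlier_mask), keeping the largest consensus set found."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)  # minimal sample
        H = fit_homography(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol             # degenerate samples give NaN -> False
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    H = fit_homography(src[best], dst[best])  # refit on all inliers
    return H / H[2, 2], best
```

The final division by `H[2, 2]` fixes the free scale factor and assumes that this entry is not close to zero, which holds for the moderate perspective changes expected between overlapping seabed images.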
C. Multi Band Blending
In practice, the same scene point appears with different
intensities in different images. This issue is more pronounced
in underwater images, because the submersible has to carry its
own light source, and this artificial light makes the center of
the image brighter than the corners. An object may appear at
the edge or corner of one image and at the center of another,
with a different intensity value each time. Further causes are
the rapid attenuation of light in water and the presence of
perspective in the images, which make closer objects appear
brighter than objects farther away in the scene.
These changes in intensity make blending important for
underwater images.
Multi-band blending, developed by Burt and Adelson [6], is
used at this stage. The idea behind multi-band blending is to
build Laplacian pyramids of the images to obtain multiple
frequency bands and then to blend each frequency band
separately. In our implementation we used a 4-band scheme. In
this case a high-pass image is formed with spatial frequency
greater than 4 pixels and the low-pass image with less than
2 pixels relative to the rendered image. Figure 4 shows a mosaic
without multi-band blending, and Figure 6 shows the same mosaic
with the blending algorithm applied.
Figure 4. Mosaic without Multi-band Blending.
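The idea of blending each frequency band separately can be sketched as follows. Note that this is a simplified variant and not the implementation used here: instead of a subsampled Laplacian pyramid, it splits two already aligned grayscale images into band-pass layers at full resolution by repeated Gaussian blurring, and it blurs the weight mask by the same amount for each band. Three blur levels give four bands. The function names and the blur widths are assumptions.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D array with edge replication."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def multiband_blend(img_a, img_b, mask_a, sigmas=(1.0, 2.0, 4.0)):
    """Blend two aligned images band by band.

    mask_a holds the weight of img_a in [0, 1]; img_b gets 1 - mask_a.
    High-frequency bands are mixed with a sharp mask and low-frequency
    bands with a progressively smoother one.
    """
    low_a = np.asarray(img_a, dtype=float)
    low_b = np.asarray(img_b, dtype=float)
    weight = np.asarray(mask_a, dtype=float)
    out = np.zeros_like(low_a)
    for sigma in sigmas:
        next_a = gaussian_blur(low_a, sigma)
        next_b = gaussian_blur(low_b, sigma)
        weight = gaussian_blur(weight, sigma)
        # Band-pass layer = current level minus its blurred version.
        out += weight * (low_a - next_a) + (1.0 - weight) * (low_b - next_b)
        low_a, low_b = next_a, next_b
    # Remaining low-pass band, blended with the smoothest mask.
    out += weight * low_a + (1.0 - weight) * low_b
    return out
```

Because the bands of each image sum back to the original and the two weights sum to one at every band, blending an image with itself returns the image unchanged. Across a seam, fine detail switches over a short distance while the slowly varying illumination is mixed over a wide region.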
V. IMPLEMENTATION
This section presents some initial results of applying
our approach to selected pairs of high quality images with
a sufficient overlapping region. The data set consists of 775
seabed images collected off the west coast of Canada by the
Remotely Operated Platform for Ocean Sciences (ROPOS) [18].
The algorithm for this work can be described as follows:
Algorithm:
Input: n images with GPS data
I. Convert GPS coordinates to ECEF
II. For all images:
(i) Calculate the Euclidean distance to the next image location
(ii) Group the images within its neighborhood
III. For all images in one group:
(i) Extract SIFT features from all images in the group
(ii) Find the k nearest neighbors for each feature
IV. For each image in a group:
(i) Using the maximum number of matched features to this image, select candidates for mosaicing
(ii) Using RANSAC, compute the homography between pairs of images
(iii) Transform the matched image and stitch the two images together
(iv) Render the mosaic using multi-band blending
Output: m images with no overlap
Figure 6 shows a sample output of this algorithm.
We restricted the experiments to selected pairs of images
because of problems encountered during implementation, which
are discussed in the following two subsections.
A. Noise in GPS Data
Unlike land and space regimes, for which GPS data is
available, there is no global positioning system for underwater
environments. Ideas for such a system are being examined;
however, the accuracy would be on the order of 100 meters [5].
This inaccuracy of the positioning data for underwater
environments causes errors such as two images that are far
apart and do not overlap being assigned identical GPS
coordinates. In other cases, the locations of two images were
estimated to be too far from each other, which in turn led to
an enlarged radius of the circle used for grouping images. In a
few cases, this problem required processing almost all of the
images in the data set to find the overlap for a particular
image.
B. Lack of features in underwater imagery
Common problems of underwater imagery, such as an
insufficient number of features, poor quality images with
blurring or very low contrast, variation in light or color,
and marine snow (particles in the water causing reflectance
noise), make feature extraction and matching challenging.
The principal issue in mosaicing underwater imagery is the
lack of features, which is far more severe than in outdoor
images. Even the human visual system needs more time to
analyze such images and find anything meaningful. Figure 5
shows an example of this situation.
Figure 5. SIFT matched features, illustrating mismatches.
As can be seen, SIFT has extracted many features from the
image pair and matched them, but many of the matches are
incorrect. This problem mainly occurs in images of the deep sea
floor, where the majority of the features come from concrete
texture and sandy areas, and only a minority belong to objects
of interest such as animals or larger rocks, which are more
suitable for mosaicing because they have more distinctive
features. The nature and pattern of these images therefore make
feature matching challenging. This problem does not occur in
our sample images of the shallow water sea floor, which have
more texture.
Also, because the proportion of mismatched features in this
situation is higher than the proportion of correctly matched
features, RANSAC is not able to find the inliers effectively.
This is the main problem with this set of images.
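The effect of a low inlier proportion on RANSAC can be quantified with the standard estimate of the number of random samples needed to draw at least one all-inlier sample with a given confidence. The sketch below evaluates this estimate for the 4-point samples used in homography estimation; the function name and the confidence level of 0.99 are illustrative assumptions.

```python
import math


def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Samples needed so that, with the given confidence, at least one
    sample of `sample_size` correspondences contains only inliers."""
    p_good = inlier_ratio ** sample_size   # chance that one sample is all inliers
    if p_good <= 0.0:
        return math.inf                    # no inliers: no number of samples suffices
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))
```

With half of the matches correct, fewer than a hundred samples are enough, whereas an inlier proportion of 0.2 already requires several thousand. A fixed iteration budget therefore fails quickly once mismatches dominate, as they do on the sandy deep sea images.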
VI. CONCLUSION AND FUTURE WORKS
A technique for solving the problem of multiple counting
has been presented. Our approach used navigational data
to group the images that are more likely to overlap. Images
with a sufficient overlapping region were then stitched
together.
In the future, techniques will be investigated to extract and
match more interesting features from ocean floor images
with few features. This may be done with pre-processing,
such as filtering the images to partially suppress the sandy
areas, or by filtering out the range of extracted features that
belong to this particular pattern.
VII. ACKNOWLEDGMENTS
The first author would like to express sincere thanks to his
advisors Andrew Vardy and Ralf Bachmayer. In addition, the
author is very appreciative of Adam Gobi for his invaluable
motivation and support, and Kim Juniper for generously pro-
viding the Barkley Canyon image data. Finally, support from
the NSERC Canadian Healthy Oceans Network is gratefully
acknowledged.
REFERENCES
[1] M. Brown and D. Lowe. Automatic panoramic image stitching using
invariant features. International Journal of Computer Vision, 74(1):59–
73, 2007.
[2] K. Shepherd and S. Juniper. ROPOS: Creating a Scientific tool from an
industrial ROV. Marine Technology Society Journal, 31(3):48–54, 1997.
[3] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: A
survey. Foundations and Trends in Computer Graphics and Vision,
3(3):177–280, 2008.
[4] Nowak, B.M.; Whitney, T.; Ackley, S.F.; , "Analysis of ROV video
imagery for krill identification and counting under Antarctic sea ice,"
Autonomous Underwater Vehicles, pages.1-9,2008
[5] R. Marks, S. Rock, and M. Lee, "Using visual sensing for control of an
underwater robotic vehicle", IARP Second Workshop on Mobile Robots
for Subsea Environments, May 1994.
[6] P. J. Burt and E. H. Adelson. A multiresolution spline with application
to image mosaics. ACM Transactions on Graphics, 2(4):217–236, 1983.
[7] Reddy, B.S., Chatterji, B.N., “A FFT-Based Technique for Translation,
Rotation, and Scale-Invariant Image Registration”, IEEE Trans. on
Image Processing, 5:8, August 1996.
[8] H. Shum and R. Szeliski. Construction of panoramic mosaics with
global and local alignment. International Journal of Computer Vision,
36(2):101–130, February 2000.
[9] M. Brown and D. Lowe. Automatic panoramic image stitching using
invariant features. IJCV, 74(1):59–73, 2007.
[10] R. Szeliski and S. Kang. Direct Methods for Visual Scene Reconstruc-
tion. In IEEE Workshop on Representations of Visual Scenes, pages
26–33, Cambridge, MA, 1995.
[11] A. Elibol, R. Garcia, O. Delaunoy, and N. Gracias. A New Global
Alignment Method for Feature Based Image Mosaicing. In Proceedings
of the 4th International Symposium on Advances in Visual Computing,
Part II, pages 257-266. Springer, 2008.
[12] Nowak, B.M.; Whitney, T.; Ackley, S.F.; , "Analysis of ROV video
imagery for krill identification and counting under Antarctic sea ice,"
Autonomous Underwater Vehicles, IEEE/OES , vol., no., pp.1-9, 13-14
Oct. 2008
[13] Kudzinava, M., Garcia, R., Marti, J.: Feature-Based Matching of Un-
derwater Images. In: International Workshop on Marine Technology, pp.
96-97. (2007)
[14] Naoki CHIBA, Hiroshi KANO, Michihiko MINOH and Masashi YASUDA,
“Feature-based image mosaicing”, IEICE, Japan D-II Vol, J82
No 10 pp 1589~1999 1999.10
[15] M. Brown and D. Lowe, “Recognising Panoramas,” Proc. Ninth Int’l
Conf. Computer Vision, pp. 1218-1227, 2003.
[16] S. Se, D. Lowe, and J. Little. Mobile robot localization and mapping with
uncertainty using scale-invariant visual landmarks. The International
Journal of Robotics Research, 21(8):735, 2002.
[17] Martin A. Fischler , Robert C. Bolles, Random sample consensus: a
paradigm for model fitting with applications to image analysis and
automated cartography, Communications of the ACM, v.24 n.6, p.381-
395, June 1981.
[18] K. Shepherd and S. Juniper. ROPOS: Creating a Scientific tool from an
industrial ROV. Marine Technology Society Journal, 31(3):48–54, 1997.
[19] A. F. Gobi, "Towards Generalized Benthic Species Recognition and
Quantification using Computer Vision," in Proceedings of the 4th
Pacific-Rim Symposium on Image and Video Technology (PSIVT2010),
Singapore, to appear, 2010.
Figure 6. Mosaic of two images with 4 level Multi-band blending.