International Journal of Software Engineering and Its Applications
Vol. 10, No. 12 (2016), pp. 407-418
http://dx.doi.org/10.14257/ijseia.2016.10.12.34
ISSN: 1738-9984 IJSEIA
Copyright ⓒ 2016 SERSC
Hand Gesture Recognition for Kinect v2 Sensor in the Near
Distance Where Depth Data Are Not Provided
Min-Soo Kim1 and Choong Ho Lee2
1Dept. of Info. and Comm. Eng., Hanbat National Univ., Daejeon-City, Rep. of Korea
2Graduate School of Info. and Comm. Eng., Hanbat National Univ., Daejeon-City, Rep. of Korea
1asdq200@naver.com, 2chlee@hanbat.ac.kr
Abstract
The Kinect v2 sensor does not provide depth information or a skeletal tracking function at near distances from the sensor. For this reason, most research on recognizing hand gestures focuses on skeletal tracking only inside the detection range. This paper proposes a method that can recognize hand gestures at distances of less than 0.5 meter when the Kinect v2 sensor is used, without conventional skeletal tracking. The proposed method does not use the depth sensor and infrared sensor information directly; instead, it detects the hand area and counts the number of isolated areas generated by drawing a circle at the center of the hand area. The method introduces new detectable gestures at low cost, so it can substitute for existing mouse-movement control and dynamic gesture recognition methods such as clicking a mouse, clicking and dragging, rotating an image with two hands, and scaling an image with two hands at near distance. The gestures are appropriate for the user interfaces of smart devices that employ interaction based on hand gestures at near distance.
Keywords: User interface, Kinect v2 sensor, Hand gesture recognition
1. Introduction
In recent years, hand gesture recognition has been actively studied as a form of human-computer interaction. Since gesture recognition can be used with various kinds of digital devices, such as smartphones and tablet computers as well as conventional desktop computers and laptop PCs, much attention has been devoted to related work [1].
To detect the hand area, existing methods use various color models such as YCbCr, HSV, and RGB. They determine thresholds by considering the illumination and background objects in the environment, but these thresholds are not consistent and are very sensitive to environmental factors. Since the colors of the face and other skin areas are very similar to the hand color, it is difficult to determine the hand area from color information alone; in practice, color-based methods perform poorly at discriminating the hand from the face when the two areas overlap. Furthermore, to improve detection performance, it is necessary either to wear gloves of a specific color [2-4] or to use only depth information without infrared information [5-7]. Without color information or special gloves, finger tracking methods are generally used, as in [5, 8, 9], but these rely on relatively complicated algorithms [9, 10] such as SVM, convex hull, or AdaBoost. Moreover, designing hand gesture recognition at low cost has become another important issue, as described in [11]. A face recognition technique using Kinect is reported in [12].
On the other hand, various sensors for such applications have been introduced to the market. Two kinds of sensors are commonly used: one kind for short distances, such as from 0.2 to 1.2 meters, and the other for relatively longer distances, for example from 0.8 to 4.0 meters or from 0.5 to 8.0 meters. The latter ranges correspond to Microsoft's Kinect v1 and v2 sensors [13], and the former to the Intel RealSense™ SR300 [14]. Both kinds of sensors provide skeletal recognition for hand gesture recognition and use depth information within their working ranges. However, the Kinect v2 sensor neither provides depth information nor detects infrared information at distances of less than 0.5 meter. Further, the recognition method of the Intel RealSense™ requires complicated conventional skeletal recognition. Figure 1 shows the depth sensor and IR (infrared) sensor of the Kinect v2. The range of the depth and IR sensors is 0.5 to 8.0 meters for the Kinect v2, compared with 0.8 to 4.0 meters for the Kinect v1. As Kinect sensors have developed, many studies have been conducted to recognize human movement using these sensors [9, 15-21]. A technique to improve Kinect skeleton estimation is reported in [15]. A method to correct the radial distortion of the RGB camera and to find the transformation matrix between the RGB and depth images of the Kinect v2 is described in [16]. Intuitive hand gestures for controlling the rotation of 3D digital objects are described in [18], and a conventional method that extracts the palm region using the RGB and depth sensors is reported in [19]. In addition, tracking and evaluation of human motion for medical applications is presented in [20], and a depth completion method using the Kinect v2 in [17]. Real-time processing with the Kinect v2 sensor is emphasized in [9, 21].
Figure 1. Kinect for Windows v2 Sensor which has a Color Camera, a Depth Sensor and an IR Sensor
However, no research has been reported that utilizes the Kinect v2 at near distances outside the detection range; almost all studies have been conducted inside the detection range. This paper presents a simple new method to recognize hand gestures at distances of less than 0.5 meter from the Kinect v2 sensor without using the data of the depth sensor and IR sensor directly. In other words, this method enables the Kinect v2 sensor to detect the hand area and recognize hand gestures at near distances of less than 0.5 meter, where the sensor provides neither depth information, infrared information, nor skeletal recognition. Furthermore, the newly developed simple gestures are also presented. This paper extends the research described in [22].
Section 2 explains the hand area detection method we devised for near distances, where the depth sensor cannot provide depth information. Section 3 describes the existing hand gesture recognition method used in our approach. The proposed method is presented in Section 4. Experimentation and the newly developed gestures follow in Section 5, including hand gestures that use the z-direction as well as the x-y directions, with one hand and with two hands; the analyses and discussions appear in the same section. Finally, concluding remarks are presented in Section 6.
2. Hand Area Detection in Near Distance
The Kinect v2 sensor provides a depth sensor and an infrared sensor that supply depth information and infrared information. At near distances of less than 0.5 meter, however, it does not provide this information. We propose a new method to detect the hand area that can be used at such near distances.
We use the characteristic of the Kinect v2 depth and infrared sensors that they do not give valid information, including depth information and infrared radiation information, at less than 0.5 meter. Specifically, we use the white areas of the two resulting images, caused as a side effect of the sensor limitations; Figure 2 shows these images. By conducting an AND operation on two of the images, for example (b) and (c), we can extract the hand area. The other pair, (e) and (f) in Figure 2, can be used to extract the hand area in a dark environment.
Figure 2. Hand Area Detection by AND Operation of Depth Image and Infrared Image. (a) Original Images. (b) Depth Images. (c) Infrared Images. (d) Original Image in Darkness. (e) Depth Image in Darkness. (f) Infrared Image in Darkness
Figure 3. The Process of Detecting Hand Region
Figure 3 illustrates the process of detecting the hand region. After extracting the hand area, we apply a morphological opening operation to remove noise from the image.
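As an illustration of these two steps, a minimal sketch using the OpenCV C++ API is given below. It assumes the depth and infrared frames have already been thresholded into binary images in which the invalid near-range regions appear white; the function name and the 3×3 structuring element are our own choices, not taken from the paper's implementation.

```cpp
#include <opencv2/opencv.hpp>

// Sketch: extract the hand area by ANDing the two "error" images
// (Section 2), then remove noise with a morphological opening.
cv::Mat extractHandArea(const cv::Mat& depthBinary, const cv::Mat& irBinary)
{
    // Keep only the pixels that are white in BOTH images,
    // e.g. Figure 2 (b) AND (c).
    cv::Mat hand;
    cv::bitwise_and(depthBinary, irBinary, hand);

    // Morphological opening with a small structuring element removes
    // isolated noise pixels, as described above.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::morphologyEx(hand, hand, cv::MORPH_OPEN, kernel);
    return hand;
}
```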
3. Hand Gesture Recognition
After the hand area has been detected, we recognize specific gestures. We split the area by drawing a black circle at the center of the hand and then count the number of isolated areas [6, 23].
3.1. Computation of Weight Centers for Hand Areas
We first obtain the center of the hand area, i.e., its weight center computed from image moments [24], with coordinates as described in [6, 23]. The moment $m_{p,q}$ of the hand image $f(x,y)$ is defined as (1).

$$m_{p,q} = \iint x^{p} y^{q} f(x,y)\,dx\,dy \quad (1)$$

The detected hand area, the 0th-order moment, can be expressed as (2).

$$m_{0,0} = \iint f(x,y)\,dx\,dy \quad (2)$$

Furthermore, the centers of weight, the first-order moments $m_{1,0}$ and $m_{0,1}$, can be obtained using (3) and (4), respectively, as follows:

$$m_{1,0} = \iint x\,f(x,y)\,dx\,dy \quad (3)$$

$$m_{0,1} = \iint y\,f(x,y)\,dx\,dy \quad (4)$$

The coordinates $x_c$ and $y_c$ of the center can then be denoted by (5) and (6), respectively.

$$x_c = \frac{m_{1,0}}{m_{0,0}} \quad (5)$$

$$y_c = \frac{m_{0,1}}{m_{0,0}} \quad (6)$$
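As a hedged sketch (not the authors' code), the weight center of a binary hand mask can be computed directly with OpenCV's moments function, which implements (1) to (6); the function name is our own.

```cpp
#include <opencv2/opencv.hpp>

// Sketch: compute the weight center (x_c, y_c) of a binary hand mask
// via image moments, per Eqs. (1)-(6).
cv::Point2d weightCenter(const cv::Mat& handMask)
{
    cv::Moments m = cv::moments(handMask, /*binaryImage=*/true);
    if (m.m00 == 0.0)                  // m_{0,0} is the area; zero means no hand
        return cv::Point2d(-1.0, -1.0);
    return cv::Point2d(m.m10 / m.m00,  // x_c = m_{1,0} / m_{0,0}  (5)
                       m.m01 / m.m00); // y_c = m_{0,1} / m_{0,0}  (6)
}
```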
3.2. Division of Hand Areas
Figure 4 shows the procedure, starting from the hand area detected as described in Section 2. Figure 4 (a) shows the detected area, (b) the computed locations of the hand area centers, (c) the circles drawn at the centers of the hand areas, and (d) the resulting areas divided by the black circles. Here, we count the number of divided areas and can thereby recognize various gestures.
Figure 4. Procedure to Recognize the Hand Area Region. (a) Hand Area Obtained. (b) Computation of Centers of Hand Areas. (c) Drawing Virtual Circles. (d) Filling the Circles with Black Color
4. The Proposed Method
We propose newly invented hand gestures recognized by counting the number of contours (separate areas) isolated by a black circle drawn at the weight center. After detecting the hand area, we draw a rectangle that bounds the hand area and draw a filled black circle whose diameter is 1/3 of the height of the rectangle. Using the number of separate areas and their movement, various gestures can be defined.
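A minimal sketch of this counting step follows. The circle radius of one sixth of the bounding-box height (i.e., a diameter of 1/3 of the height) follows the text, while the mask format and the function name are our assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: split the binary hand mask with a filled black circle at the
// weight center and count the resulting isolated areas (Section 4).
int countIsolatedAreas(const cv::Mat& handMask)
{
    cv::Moments m = cv::moments(handMask, true);
    if (m.m00 == 0.0) return 0;  // no hand area detected

    // Bounding rectangle of the hand area; the circle diameter is 1/3
    // of its height, so the radius is height / 6.
    cv::Rect box = cv::boundingRect(handMask);
    cv::Point center(static_cast<int>(m.m10 / m.m00),
                     static_cast<int>(m.m01 / m.m00));

    cv::Mat split = handMask.clone();
    cv::circle(split, center, box.height / 6, cv::Scalar(0), cv::FILLED);

    // Count the external contours that remain after the split.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(split, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return static_cast<int>(contours.size());
}
```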
4.1. Assumptions and Advantages
The proposed method does not use the depth and infrared information directly; rather, it uses a side effect (the error images) that occurs outside the detection range of the Microsoft Kinect v2 sensor. We assume that the hands are the nearest objects to the sensor and move within 0.5 meter of it. The method can thus be used at near distances outside the detection range. It is also simpler than existing methods because it does not use complicated algorithms, and it shows more stable detection performance, being independent of hand color, illumination, and background objects, unlike existing methods [1-3, 6, 10].
4.2. Recognition of Hand Gestures
We can discriminate various gestures by counting the number of isolated areas. Figure 5 shows how the specific hand gestures are recognized. In Figure 5, (a) to (c) are single-hand gestures, and (d) to (f) are two-hand gestures. Here, (a) and (d) mean ‘mouse release’; (b) and (e) mean ‘mouse click’; and (c) and (f) designate ‘mode change’. Further, (g) expresses ‘zoom in on an object’; zooming out is performed by moving the hands in the opposite direction. To change to the rotation mode, we close two fists as in (f), then open the fists as in (e) and move the two hands as in (h). The gestures are discriminated by counting the number of contours. For example, with one hand, (a) has five contours, (b) has two contours, and (c) has one contour; with two hands, (d) has ten contours, (e) has four contours, and (f) has two contours.
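To make the mapping concrete, a hypothetical classifier over these contour counts might look as follows; the enum and function names are ours, not the paper's.

```cpp
// Sketch: map contour counts to the gestures of Figure 5. The counts
// (5/2/1 for one hand, 10/4/2 for two hands) come from the text.
enum class Gesture { MouseRelease, MouseClick, ModeChange, Unknown };

Gesture classify(int contourCount, bool twoHands)
{
    if (twoHands) {
        switch (contourCount) {
            case 10: return Gesture::MouseRelease;  // Figure 5 (d)
            case 4:  return Gesture::MouseClick;    // Figure 5 (e)
            case 2:  return Gesture::ModeChange;    // Figure 5 (f)
        }
    } else {
        switch (contourCount) {
            case 5:  return Gesture::MouseRelease;  // Figure 5 (a)
            case 2:  return Gesture::MouseClick;    // Figure 5 (b)
            case 1:  return Gesture::ModeChange;    // Figure 5 (c)
        }
    }
    return Gesture::Unknown;
}
```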
Figure 5. The Proposed Hand Gestures. (a) Mouse Release. (b) Mouse Click. (c) Mode Change. (d) Mouse Release. (e) Mouse Click. (f) Mode Change. (g) Zoom-In. (h) Rotation
4.3. New Gestures Which Use Z-Axis Information
Since the Kinect v2 sensor cannot provide depth or infrared information at near distances, the depth is instead estimated from the radius of the circle drawn at the center of the hand area. Here, depth means the distance from the x-y plane of the hand area toward the Kinect v2 sensor. We call the direction of this distance the ‘z-axis’ because it is perpendicular to the x-y plane composed of the x and y in (5) and (6). Figure 6 shows four layers that divide the distance from the hand area toward the Kinect v2 sensor, together with the detected hand areas. Layers 1 to 4 are determined by the radii of the circles in the hand area. Note that the left figures are drawn from the human's point of view, while the right figures are drawn from the sensor's point of view; for example, the top-left figure depicts layer 1, the layer nearest to the hand area, and the hand area in the corresponding top-right figure is the smallest from the sensor's viewpoint. Specifically, the radius thresholds of the circles are 38, 48, 58, and 68 pixels for layers 1, 2, 3, and 4, respectively.
Figure 6. Z-Axis Values According to the Depth, i.e., the Distance from the X-Y Plane of the Hand Area toward the Kinect Sensor: Left Figures Show the Human's View; Right Figures Show the Sensor's View. (a) Layer 1: The Radius Is Larger than or Equal to 38 Pixels and Less than 48 Pixels. (b) Layer 2: The Radius Is Larger than or Equal to 48 Pixels and Less than 58 Pixels. (c) Layer 3: The Radius Is Larger than or Equal to 58 Pixels and Less than 68 Pixels. (d) Layer 4: The Radius Is Larger than or Equal to 68 Pixels
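Under the stated thresholds, the layer lookup reduces to a few comparisons; the sketch below is our reading of Figure 6, with the return convention our own.

```cpp
// Sketch: map the radius (in pixels) of the circle at the hand center
// to a z-axis layer (Figure 6). Thresholds 38/48/58/68 are from the
// paper; 0 means the radius is below layer 1.
int layerFromRadius(int radiusPx)
{
    if (radiusPx >= 68) return 4;  // largest hand image: nearest to the sensor
    if (radiusPx >= 58) return 3;
    if (radiusPx >= 48) return 2;
    if (radiusPx >= 38) return 1;  // smallest valid hand image: farthest away
    return 0;
}
```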
5. Experimentation
We used the Kinect v2 sensor and conducted the experiments at distances of less than 0.5 meter, where the sensor provides neither depth information nor infrared data. We built the user interface with openFrameworks, which is based on C++ and OpenGL, using several cross-platform libraries including its add-ons: ofxOpenCv and ofxCv, which enable OpenCV within openFrameworks, and ofxKinect2, which provides access to the Kinect v2 sensor. For the operating system, Microsoft Windows 10 was used, with Visual Studio 2015 Community installed.
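For orientation, a minimal openFrameworks application skeleton of the kind described above is sketched here; the class name is hypothetical, and the ofxKinect2/ofxCv calls are left as comments because their exact APIs are addon-specific.

```cpp
#include "ofMain.h"

// Sketch of the application structure only; not the authors' code.
class GestureApp : public ofBaseApp {
public:
    void setup() override {
        ofSetWindowTitle("Near-distance hand gestures");
        // Initialize the Kinect v2 device here via ofxKinect2 and
        // allocate buffers for the depth and infrared frames.
    }
    void update() override {
        // Grab the depth/IR frames, extract the hand mask (Section 2),
        // and classify the current gesture (Sections 3 and 4).
    }
    void draw() override {
        // Render the detected hand area and the recognized gesture.
    }
};

int main() {
    ofSetupOpenGL(1024, 768, OF_WINDOW);  // window size is arbitrary
    ofRunApp(new GestureApp());
}
```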
In Figure 7, (a) shows a correct posture for detecting the hand area with our method, while (b) shows a bad posture that incorrectly includes extra skin area from elbow to wrist; in case (b), the center of the hand area shifts toward the wrist, so the hand area is not extracted properly. We also tilted the palm of one hand in various ways, as in (c), and still obtained valid results. In (d), the left figure means ‘mouse release’, the middle figure means ‘hold the object’ by mouse click, and the right figure means ‘move an object’; when one hand is used, a dot is marked on the selected object. Figures (e) and (f) show the two-hand gestures. In (e), the left figure denotes ‘mouse release’, the center figure ‘hold the object’ by mouse click, and the right figure zoom-out. Similarly, in (f), the left figure denotes ‘mouse release’, the middle figure ‘hold the object’ by mouse click, and the right figure rotation. The two dots around an object indicate that the focus is on that object.
Figure 7. The Proposed Gestures. (a) A Correct Posture and the Detected Area. (b) An Incorrect Posture and the Detected Area. (c) Tilted Palms and the Detected Areas. (d) Mouse Release, Mouse Click and Moving an Object with One Hand. (e) Clicking an Object with Two Hands and Expanding an Image. (f) Clicking an Object with Two Hands and Rotation
We have confirmed that our method is valid in various situations. When the subject changes, the hand area changes; accordingly, the thresholds that determine the layers are adjusted according to the radii of the circles located at the centers of the hand areas. We experimented with three persons and confirmed that our method is stable over the ranges shown in Figure 6.
6. Conclusions
This paper has proposed a simple new method to recognize gestures at near distances of less than 0.5 meter, where the Kinect v2 sensor cannot provide depth information or infrared sensor data. The method tracks the hand area, counts the number of contours, and uses the direction in which the contours move. The proposed method is simpler than existing finger-tracking methods because it only checks the number of areas divided by a black circle at the center of the hand area, together with their moving direction. Further, it can be used to develop three-dimensional user interfaces, since it obtains z-axis information from the radius of the circle located at the center of the hand area. The proposed hand gestures can be used instead of mouse clicking, dragging and moving, releasing a mouse, rotating an image with two hands, and scaling an image with two hands. The method expands the usable range of the Kinect v2 sensor and can also be applied to the Kinect v1 sensor.
Acknowledgements
We thank Hanbat National University. This research was supported by the research fund of Hanbat National University in 2016. This paper is a revised and expanded version of a paper entitled “A Simple 3D Hand Gesture Interface Based on Hand Area Detection and Tracking” presented at MITA 2016 (The 12th International Conference on Multimedia Information Technology and Applications), Luang Prabang, Lao PDR, July 4-6, 2016.
References
[1] P. Premaratne, “Human Computer Interaction Using Hand Gestures: Cognitive Science and
Technology”, Springer-Verlag New York Inc., (2014).
[2] C.-H. Wu, W.-L. Chen and C. H. Lin, “Depth-Based Hand Gesture Recognition”, Multimedia Tools and Applications, vol. 75, no. 12, (2016), pp. 7065-7086.
[3] G. R. S. Murthy and R. S. Jadon, “A Review of Vision Based Hand Gesture Recognition”, International
Journal of Information Technology and Knowledge Management, vol. 2, no. 2, (2009), pp. 405-410.
[4] A. Abgottspon, “A Hand Gesture Interface for Investigating Real-Time Human-Computer Interaction”,
ECU098 Informatics, 300CDE Individual Project, Coventry Univ., UK, (2010).
[5] A.-M. Balazs, “Hand and Finger Detection Using JavaCV”, https://www.javacodegeeks.com/2012/12/hand-and-finger-detection-using-javacv.html, (2012).
[6] H. Park, “A Method for Controlling Mouse Movement Using a Real-Time Camera”, Master’s Thesis of
Brown Univ., Providence, RI, USA, (2010).
[7] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman and A. Blake, “Real-Time Human Pose Recognition in Parts from Single Depth Images”, Communications of the ACM, (2013), pp. 1-8.
[8] https://channel9.msdn.com/coding4fun/kinect/Kinect-v2-Finger-Tracking, (2016).
[9] R. M. Gurav and P. K. Kadbe, “Real Time Tracking and Contour Detection for Gesture Recognition
Using OpenCV”, 2015 International Conference on Industrial Instrumentation and Control (ICIC),
(2015), pp. 974-977.
[10] C. Zou, Y. Liu, J. Wang and H. Si, “Deformable Part Model Based Hand Detection against Complex
Backgrounds”, Advances in Images and Graphics Technologies, Springer Link, vol. 634 of the series
Comm. in Computer and Info. Science, (2016), pp. 149-159.
[11] J. Molina and J. M. Martínez, “A Synthetic Training Framework for Providing Gesture Scalability to
2.5D Pose-Based Hand Gesture Recognition Systems”, Machine Vision and Applications, vol. 25, issue
5, (2014), pp. 1309-1315.
[12] G. Goswami, M. Vatsa and R. Singh, “Face Recognition with RGB-D Images Using Kinect”, Face
Recognition Across the Imaging Spectrum, Springer Link, (2016), pp. 281-303.
[13] https://developer.microsoft.com/en-us/windows/kinect/hardware, (2016).
[14] https://software.intel.com/en-us/realsense/home, (2016).
[15] J. Valcik, J. Sedmidubsk and P. Zezula, “Improving Kinect-Skeleton Estimation”, Advanced Concepts
for Intelligent Vision Systems, Springer Link, vol. 9386 of Lecture Notes in Computer Science, (2015),
pp. 575-587.
[16] C. Kim, S. Yun, S.-W. Jung and C. S. Won, “Color and Depth Image Correspondence for Kinect v2”,
Advanced Multimedia and Ubiquitous Engineering, Springer Link, vol. 354 of the series Lecture Notes
in Electrical Engineering, (2015), pp. 333-340.
[17] W. Song, A.V. Le, S. Yun, S.-W. Jung and C. S. Won, “Depth Completion for Kinect v2 Sensor”,
Multimedia Tools and Applications, Springer Link, (2016), pp. 1-24.
[18] L.-C. Chen, Y.-M. Cheng, P.-Y. Chu and F. E. Sandnes, “The Common Characteristics of User-Defined
and Mid-Air Gestures for Rotating 3D Digital Contents”, Universal Access in Human-Computer
Interaction Techniques and Environments, Springer Link, vol. 9738 of the series Lecture Notes in
Computer Science, (2016), pp. 15-22.
[19] S. Samoil, and S. N. Yanushkevich, “Depth Assisted Palm Region Extraction using the Kinect v2
Sensor”, 2015 Sixth International Conference on Emerging Security Technologies, (2015), pp. 74-79.
[20] H. Alabbasi, A. Gradinaru, F. Moldoveanu and A. Moldoveanu, “Human Motion Tracking & Evaluation
using Kinect v2 Sensor”, The 5th IEEE International Conference on E-Health and Bioengineering,
(2015).
[21] Y. Lan, J. Li and Z. Ju, “Data Fusion-based Real-Time Hand Gesture Recognition with Kinect v2”, 2016
9th International Conference on Human System Interactions (HSI), (2016).
[22] M.-S. Kim and C. H. Lee, “A Simple 3D Hand Gesture Interface Based on Hand Area Detection and Tracking”, Proceedings of MITA 2016 (The 12th International Conference on Multimedia Information Technology and Applications), Luang Prabang, Lao PDR, (2016), pp. 131-133.
[23] B. Ionescu, D. Coquin, P. Lambert and V. Buzuloiu, “Dynamic Hand Gesture Recognition Using the
Skeleton of the Hand”, EURASIP Journal on Applied Signal Processing, vol. 13, (2005), pp. 2101-2109.
[24] J. Kilian, “Simple Image Analysis by Moments”,
http://breckon.eu/toby/teaching/dip/opencv/SimpleImageAnalyisbyMoments.pdf, (2001), pp. 1-8.
Authors
Min-Soo Kim received his B.E. degree in Computer and Information Engineering from Hanbat National University, Daejeon, Korea, in 2017. His current research interests include pattern recognition, digital image processing and human-computer interfaces.
Choong Ho Lee received his B.E. and M.E. degrees in Electronic Engineering from Yonsei University, Seoul, Korea, in 1985 and 1987, respectively. He received his Ph.D. in Information Sciences from Tohoku University, Sendai, Japan, in March 1998. From 1985 to 2000, he was with KT as a researcher. Since 2000, he has been a professor in the Graduate School of Information and Communication Engineering of Hanbat National University. His current research interests include pattern recognition, digital image processing and mobile robot control.