International Journal of Software Engineering and Its Applications
Vol. 10, No. 12 (2016), pp. 407-418
http://dx.doi.org/10.14257/ijseia.2016.10.12.34
ISSN: 1738-9984 IJSEIA
Copyright ⓒ 2016 SERSC
Hand Gesture Recognition for Kinect v2 Sensor in the Near
Distance Where Depth Data Are Not Provided
Min-Soo Kim1 and Choong Ho Lee2
1Dept. of Info. and Comm. Eng., Hanbat National Univ., Daejeon-City, Rep. of Korea
2Graduate School of Info. and Comm. Eng., Hanbat National Univ., Daejeon-City, Rep. of Korea
1asdq200@naver.com, 2chlee@hanbat.ac.kr
Abstract
The Kinect v2 sensor does not provide depth information or a skeletal tracking function at near distances from the sensor. For this reason, most research on recognizing hand gestures focuses on skeletal tracking only inside the detection range. This paper proposes a method that can recognize hand gestures at distances of less than 0.5 meter when the Kinect v2 sensor is used, without conventional skeletal tracking. The proposed method does not use the depth sensor and infrared sensor information directly; instead, it detects the hand area and counts the number of isolated areas generated by drawing a circle at the center of the hand area. The method introduces new detectable gestures at low cost, so it can substitute for existing mouse-movement control and dynamic gesture recognition methods such as clicking a mouse, clicking and dragging, rotating an image with two hands, and scaling an image with two hands at near distance. The gestures are appropriate for the user interfaces of smart devices that employ interaction based on hand gestures at near distance.
Keywords: User interface, Kinect v2 sensor, Hand gesture recognition
1. Introduction
In recent years, hand gesture recognition has been actively studied as a form of human-computer interaction. Since gesture recognition can be used with various kinds of digital devices, such as smartphones and tablet computers as well as conventional desktop computers and laptop PCs, much attention has been devoted to related work [1].
To detect the hand area, existing methods use various color models such as YCbCr, HSV, and RGB. They determine thresholds by considering the illumination and background objects in the environment, but these thresholds are not consistent and are very sensitive to environmental factors. Since the colors of the face and other skin areas are very similar to the hand color, it is difficult to determine the hand area from color information alone; in practice, color-based methods perform poorly at discriminating the hand from the face when the two areas overlap. Furthermore, to improve detection performance, it is necessary either to wear gloves of a specific color [2-4] or to use only depth information without infrared information [5-7]. Without color information or special gloves, finger tracking methods are generally used, as in [5, 8, 9], but these rely on relatively complicated algorithms [9, 10] such as SVM, convex hull, or AdaBoost. Moreover, designing hand gesture recognition at low cost has become another important issue, as described in [11]. A face recognition technique using Kinect is reported in [12].
On the other hand, various sensors for such applications have been introduced to the market. Two kinds of sensors are commonly used: one kind for short distances, such as from 0.2 to 1.2 meters, and the other for relatively longer distances, for example from 0.8 to 4.0 meters or from 0.5 to 8.0 meters. The latter ranges correspond to Microsoft's Kinect v1 and v2 sensors [13], and the former to the Intel RealSense™ SR300 [14]. Both kinds of sensors provide skeletal recognition for hand gesture recognition and use depth information within their working ranges. However, the Kinect v2 sensor neither provides depth information nor detects infrared information at distances of less than 0.5 meter. Further, the recognition method of the Intel RealSense™ requires complicated conventional skeletal recognition. Figure 1 shows the depth sensor and IR (infrared) sensor of the Kinect v2. The range of the depth and IR sensors is 0.5 to 8.0 meters for the Kinect v2, compared with 0.8 to 4.0 meters for the Kinect v1. As Kinect sensors have developed, many studies have been conducted to recognize human movement using these sensors [9, 15-21]. A technique to improve Kinect skeleton estimation is reported in [15]. A method to correct the radial distortion of the RGB camera and to find the transformation matrix between the RGB and depth images of the Kinect v2 is described in [16]. Intuitive hand gestures for controlling the rotation of 3D digital objects are described in [18], and a conventional method that extracts the palm region using the RGB and depth sensors is reported in [19]. In addition, tracking and evaluation of human motion for medical applications is presented in [20], and a depth completion method using the Kinect v2 in [17]. Real-time processing with the Kinect v2 sensor is emphasized in [9, 21].
Figure 1. Kinect for Windows v2 Sensor which has a Color Camera, a Depth Sensor and an IR Sensor
However, no research has been reported that utilizes the Kinect v2 at near distances outside the detection range; almost all studies have been conducted inside the detection range. This paper presents a simple new method to recognize hand gestures at distances of less than 0.5 meter from the Kinect v2 sensor without using the data of the depth sensor and IR sensor directly. In other words, this method enables the Kinect v2 sensor to detect the hand area and recognize hand gestures at near distances of less than 0.5 meter, where the sensor provides neither depth information, infrared information, nor skeletal recognition. Furthermore, the newly developed simple gestures are also presented. This paper extends the research described in [22].
Section 2 explains the hand area detection method we devised for near distances, where the depth sensor cannot provide depth information. Section 3 describes the existing hand gesture recognition method used in our approach. The proposed method is presented in Section 4. Experimentation and the newly developed gestures follow in Section 5, including hand gestures that use the z-direction as well as the x-y directions, with one hand and with two hands; the analyses and discussions appear in the same section. Finally, concluding remarks are presented in Section 6.
2. Hand Area Detection in Near Distance
The Kinect v2 sensor provides a depth sensor and an infrared sensor that supply depth information and infrared information. At near distances of less than 0.5 meter, however, it does not provide this information. We propose a new method to detect the hand area that can be used at such near distances.
We use the characteristic of the Kinect v2 depth and infrared sensors that they do not give valid information, including depth information and infrared radiation information, at less than 0.5 meter. Specifically, we use the white areas of the two resulting images, caused as a side effect of the sensor limitations; Figure 2 shows these images. By conducting an AND operation on two of the images, for example (b) and (c), we can extract the hand area. The other pair, (e) and (f) in Figure 2, can be used to extract the hand area in a dark environment.
Figure 2. Hand Area Detection by AND Operation of Depth Image and Infrared Image. (a) Original Images. (b) Depth Images. (c) Infrared Images. (d) Original Image in Darkness. (e) Depth Image in Darkness. (f) Infrared Image in Darkness
Figure 3. The Process of Detecting Hand Region
Figure 3 illustrates the process of detecting the hand region. After extracting the hand area, we apply a morphological opening operation to remove noise from the image.
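As an illustration of these two steps, a minimal sketch using the OpenCV C++ API is given below. It assumes the depth and infrared frames have already been thresholded into binary images in which the invalid near-range regions appear white; the function name and the 3×3 structuring element are our own choices, not taken from the paper's implementation.

```cpp
#include <opencv2/opencv.hpp>

// Sketch: extract the hand area by ANDing the two "error" images
// (Section 2), then remove noise with a morphological opening.
cv::Mat extractHandArea(const cv::Mat& depthBinary, const cv::Mat& irBinary)
{
    // Keep only the pixels that are white in BOTH images,
    // e.g. Figure 2 (b) AND (c).
    cv::Mat hand;
    cv::bitwise_and(depthBinary, irBinary, hand);

    // Morphological opening with a small structuring element removes
    // isolated noise pixels, as described above.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::morphologyEx(hand, hand, cv::MORPH_OPEN, kernel);
    return hand;
}
```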
3. Hand Gesture Recognition
After the hand area has been detected, we recognize specific gestures. We split the area by drawing a black circle at the center of the hand and then count the number of isolated areas [6, 23].
3.1. Computation of Weight Centers for Hand Areas
We first obtain the center of the hand area, i.e., its weight center computed from image moments [24], with coordinates as described in [6, 23]. The moment $m_{p,q}$ of the hand image $f(x,y)$ is defined as (1).

$$m_{p,q} = \iint x^{p} y^{q} f(x,y)\,dx\,dy \quad (1)$$

The detected hand area, the 0th-order moment, can be expressed as (2).

$$m_{0,0} = \iint f(x,y)\,dx\,dy \quad (2)$$

Furthermore, the centers of weight, the first-order moments $m_{1,0}$ and $m_{0,1}$, can be obtained using (3) and (4), respectively, as follows:

$$m_{1,0} = \iint x\,f(x,y)\,dx\,dy \quad (3)$$

$$m_{0,1} = \iint y\,f(x,y)\,dx\,dy \quad (4)$$

The coordinates $x_c$ and $y_c$ of the center can then be denoted by (5) and (6), respectively.

$$x_c = \frac{m_{1,0}}{m_{0,0}} \quad (5)$$

$$y_c = \frac{m_{0,1}}{m_{0,0}} \quad (6)$$
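As a hedged sketch (not the authors' code), the weight center of a binary hand mask can be computed directly with OpenCV's moments function, which implements (1) to (6); the function name is our own.

```cpp
#include <opencv2/opencv.hpp>

// Sketch: compute the weight center (x_c, y_c) of a binary hand mask
// via image moments, per Eqs. (1)-(6).
cv::Point2d weightCenter(const cv::Mat& handMask)
{
    cv::Moments m = cv::moments(handMask, /*binaryImage=*/true);
    if (m.m00 == 0.0)                  // m_{0,0} is the area; zero means no hand
        return cv::Point2d(-1.0, -1.0);
    return cv::Point2d(m.m10 / m.m00,  // x_c = m_{1,0} / m_{0,0}  (5)
                       m.m01 / m.m00); // y_c = m_{0,1} / m_{0,0}  (6)
}
```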
3.2. Division of Hand Areas
Figure 4 shows the procedure, starting from the hand area detected as described in Section 2. Figure 4 (a) shows the detected area, (b) the computed locations of the hand area centers, (c) the circles drawn at the centers of the hand areas, and (d) the resulting areas divided by the black circles. Here, we count the number of divided areas and can thereby recognize various gestures.
Figure 4. Procedure to Recognize the Hand Area Region. (a) Hand Area Obtained. (b) Computation of Centers of Hand Areas. (c) Drawing Virtual Circles. (d) Filling the Circles with Black Color
4. The Proposed Method
We propose newly invented hand gestures recognized by counting the number of contours (separate areas) isolated by a black circle drawn at the weight center. After detecting the hand area, we draw a rectangle that bounds the hand area and draw a filled black circle whose diameter is 1/3 of the height of the rectangle. Using the number of separate areas and their movement, various gestures can be defined.
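A minimal sketch of this counting step follows. The circle radius of one sixth of the bounding-box height (i.e., a diameter of 1/3 of the height) follows the text, while the mask format and the function name are our assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: split the binary hand mask with a filled black circle at the
// weight center and count the resulting isolated areas (Section 4).
int countIsolatedAreas(const cv::Mat& handMask)
{
    cv::Moments m = cv::moments(handMask, true);
    if (m.m00 == 0.0) return 0;  // no hand area detected

    // Bounding rectangle of the hand area; the circle diameter is 1/3
    // of its height, so the radius is height / 6.
    cv::Rect box = cv::boundingRect(handMask);
    cv::Point center(static_cast<int>(m.m10 / m.m00),
                     static_cast<int>(m.m01 / m.m00));

    cv::Mat split = handMask.clone();
    cv::circle(split, center, box.height / 6, cv::Scalar(0), cv::FILLED);

    // Count the external contours that remain after the split.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(split, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return static_cast<int>(contours.size());
}
```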
4.1. Assumptions and Advantages
The proposed method does not use the depth and infrared information directly; rather, it uses a side effect (the error images) that occurs outside the detection range of the Microsoft Kinect v2 sensor. We assume that the hands are the nearest objects to the sensor and move within 0.5 meter of it. The method can thus be used at near distances outside the detection range. It is also simpler than existing methods because it does not use complicated algorithms, and it shows more stable detection performance, being independent of hand color, illumination, and background objects, unlike existing methods [1-3, 6, 10].
4.2. Recognition of Hand Gestures
We can discriminate various gestures by counting the number of isolated areas. Figure 5 shows how the specific hand gestures are recognized. In Figure 5, (a) to (c) are single-hand gestures, and (d) to (f) are two-hand gestures. Here, (a) and (d) mean ‘mouse release’; (b) and (e) mean ‘mouse click’; and (c) and (f) designate ‘mode change’. Further, (g) expresses ‘zoom in on an object’; zooming out is performed by moving the hands in the opposite direction. To change to the rotation mode, we close two fists as in (f), then open the fists as in (e) and move the two hands as in (h). The gestures are discriminated by counting the number of contours. For example, with one hand, (a) has five contours, (b) has two contours, and (c) has one contour; with two hands, (d) has ten contours, (e) has four contours, and (f) has two contours.
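To make the mapping concrete, a hypothetical classifier over these contour counts might look as follows; the enum and function names are ours, not the paper's.

```cpp
// Sketch: map contour counts to the gestures of Figure 5. The counts
// (5/2/1 for one hand, 10/4/2 for two hands) come from the text.
enum class Gesture { MouseRelease, MouseClick, ModeChange, Unknown };

Gesture classify(int contourCount, bool twoHands)
{
    if (twoHands) {
        switch (contourCount) {
            case 10: return Gesture::MouseRelease;  // Figure 5 (d)
            case 4:  return Gesture::MouseClick;    // Figure 5 (e)
            case 2:  return Gesture::ModeChange;    // Figure 5 (f)
        }
    } else {
        switch (contourCount) {
            case 5:  return Gesture::MouseRelease;  // Figure 5 (a)
            case 2:  return Gesture::MouseClick;    // Figure 5 (b)
            case 1:  return Gesture::ModeChange;    // Figure 5 (c)
        }
    }
    return Gesture::Unknown;
}
```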
Figure 5. The Proposed Hand Gestures. (a) Mouse Release. (b) Mouse Click. (c) Mode Change. (d) Mouse Release. (e) Mouse Click. (f) Mode Change. (g) Zoom-In. (h) Rotation
4.3. New Gestures Which Use Z-Axis Information
Since the Kinect v2 sensor cannot provide depth or infrared information at near distances, the depth is instead estimated from the radius of the circle drawn at the center of the hand area. Here, depth means the distance from the x-y plane of the hand area toward the Kinect v2 sensor. We call the direction of this distance the ‘z-axis’ because it is perpendicular to the x-y plane composed of the x and y in (5) and (6). Figure 6 shows four layers that divide the distance from the hand area toward the Kinect v2 sensor, together with the detected hand areas. Layers 1 to 4 are determined by the radii of the circles in the hand area. Note that the left figures are drawn from the human's point of view, while the right figures are drawn from the sensor's point of view; for example, the top-left figure depicts layer 1, the layer nearest to the hand area, and the hand area in the corresponding top-right figure is the smallest from the sensor's viewpoint. Specifically, the radius thresholds of the circles are 38, 48, 58, and 68 pixels for layers 1, 2, 3, and 4, respectively.
Figure 6. Z-Axis Values According to the Depth, i.e., the Distance from the X-Y Plane of the Hand Area toward the Kinect Sensor: Left Figures Show the Human's View; Right Figures Show the Sensor's View. (a) Layer 1: The Radius Is Larger than or Equal to 38 Pixels and Less than 48 Pixels. (b) Layer 2: The Radius Is Larger than or Equal to 48 Pixels and Less than 58 Pixels. (c) Layer 3: The Radius Is Larger than or Equal to 58 Pixels and Less than 68 Pixels. (d) Layer 4: The Radius Is Larger than or Equal to 68 Pixels
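Under the stated thresholds, the layer lookup reduces to a few comparisons; the sketch below is our reading of Figure 6, with the return convention our own.

```cpp
// Sketch: map the radius (in pixels) of the circle at the hand center
// to a z-axis layer (Figure 6). Thresholds 38/48/58/68 are from the
// paper; 0 means the radius is below layer 1.
int layerFromRadius(int radiusPx)
{
    if (radiusPx >= 68) return 4;  // largest hand image: nearest to the sensor
    if (radiusPx >= 58) return 3;
    if (radiusPx >= 48) return 2;
    if (radiusPx >= 38) return 1;  // smallest valid hand image: farthest away
    return 0;
}
```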
5. Experimentation
We used the Kinect v2 sensor and conducted the experiments at distances of less than 0.5 meter, where the sensor provides neither depth information nor infrared data. We built the user interface with openFrameworks, which is based on C++ and OpenGL, using several cross-platform libraries including its add-ons: ofxOpenCv and ofxCv, which enable OpenCV within openFrameworks, and ofxKinect2, which provides access to the Kinect v2 sensor. For the operating system, Microsoft Windows 10 was used, with Visual Studio 2015 Community installed.
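For orientation, a minimal openFrameworks application skeleton of the kind described above is sketched here; the class name is hypothetical, and the ofxKinect2/ofxCv calls are left as comments because their exact APIs are addon-specific.

```cpp
#include "ofMain.h"

// Sketch of the application structure only; not the authors' code.
class GestureApp : public ofBaseApp {
public:
    void setup() override {
        ofSetWindowTitle("Near-distance hand gestures");
        // Initialize the Kinect v2 device here via ofxKinect2 and
        // allocate buffers for the depth and infrared frames.
    }
    void update() override {
        // Grab the depth/IR frames, extract the hand mask (Section 2),
        // and classify the current gesture (Sections 3 and 4).
    }
    void draw() override {
        // Render the detected hand area and the recognized gesture.
    }
};

int main() {
    ofSetupOpenGL(1024, 768, OF_WINDOW);  // window size is arbitrary
    ofRunApp(new GestureApp());
}
```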
In Figure 7, (a) shows a correct posture for detecting the hand area with our method, while (b) shows a bad posture that incorrectly includes extra skin area from elbow to wrist; in case (b), the center of the hand area shifts toward the wrist, so the hand area is not extracted properly. We also tilted the palm of one hand in various ways, as in (c), and still obtained valid results. In (d), the left figure means ‘mouse release’, the middle figure means ‘hold the object’ by mouse click, and the right figure means ‘move an object’; when one hand is used, a dot is marked on the selected object. Figures (e) and (f) show the two-hand gestures. In (e), the left figure denotes ‘mouse release’, the center figure ‘hold the object’ by mouse click, and the right figure zoom-out. Similarly, in (f), the left figure denotes ‘mouse release’, the middle figure ‘hold the object’ by mouse click, and the right figure rotation. The two dots around an object indicate that the focus is on that object.
Figure 7. The Proposed Gestures. (a) A Correct Posture and the Detected Area. (b) An Incorrect Posture and the Detected Area. (c) Tilted Palms and the Detected Areas. (d) Mouse Release, Mouse Click and Moving an Object with One Hand. (e) Clicking an Object with Two Hands and Expanding an Image. (f) Clicking an Object with Two Hands and Rotation
We have confirmed that our method is valid in various situations. When the subject changes, the hand area changes; accordingly, the thresholds that determine the layers are adjusted according to the radii of the circles located at the centers of the hand areas. We experimented with three persons and confirmed that our method is stable over the ranges shown in Figure 6.
6. Conclusions
This paper has proposed a simple new method to recognize gestures at near distances of less than 0.5 meter, where the Kinect v2 sensor cannot provide depth information or infrared sensor data. The method tracks the hand area, counts the number of contours, and uses the direction in which the contours move. The proposed method is simpler than existing finger-tracking methods because it only checks the number of areas divided by a black circle at the center of the hand area, together with their moving direction. Further, it can be used to develop three-dimensional user interfaces, since it obtains z-axis information from the radius of the circle located at the center of the hand area. The proposed hand gestures can be used instead of mouse clicking, dragging and moving, releasing a mouse, rotating an image with two hands, and scaling an image with two hands. The method expands the usable range of the Kinect v2 sensor and can also be applied to the Kinect v1 sensor.
Acknowledgements
We thank Hanbat National University. This research was supported by the research fund of Hanbat National University in 2016. This paper is a revised and expanded version of a paper entitled “A Simple 3D Hand Gesture Interface Based on Hand Area Detection and Tracking” presented at MITA 2016 (The 12th International Conference on Multimedia Information Technology and Applications), Luang Prabang, Lao PDR, July 4-6, 2016.
References
[1] P. Premaratne, “Human Computer Interaction Using Hand Gestures: Cognitive Science and
Technology”, Springer-Verlag New York Inc., (2014).
[2] C.-H. Wu, W.-L. Chen and C. H. Lin, “Depth-Based Hand Gesture Recognition”, Multimedia Tools and Applications, vol. 75, no. 12, (2016), pp. 7065-7086.
[3] G. R. S. Murthy and R. S. Jadon, “A Review of Vision Based Hand Gesture Recognition”, International
Journal of Information Technology and Knowledge Management, vol. 2, no. 2, (2009), pp. 405-410.
[4] A. Abgottspon, “A Hand Gesture Interface for Investigating Real-Time Human-Computer Interaction”,
ECU098 Informatics, 300CDE Individual Project, Coventry Univ., UK, (2010).
[5] A.-M. Balazs, “Hand and Finger Detection Using JavaCV”, https://www.javacodegeeks.com/2012/12/hand-and-finger-detection-using-javacv.html, (2012).
[6] H. Park, “A Method for Controlling Mouse Movement Using a Real-Time Camera”, Master’s Thesis of
Brown Univ., Providence, RI, USA, (2010).
[7] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman and A. Blake, “Real-Time Human Pose Recognition in Parts from Single Depth Images”, Communications of the ACM, (2013), pp. 1-8.
[8] https://channel9.msdn.com/coding4fun/kinect/Kinect-v2-Finger-Tracking, (2016).
[9] R. M. Gurav and P. K. Kadbe, “Real Time Tracking and Contour Detection for Gesture Recognition
Using OpenCV”, 2015 International Conference on Industrial Instrumentation and Control (ICIC),
(2015), pp. 974-977.
[10] C. Zou, Y. Liu, J. Wang and H. Si, “Deformable Part Model Based Hand Detection against Complex
Backgrounds”, Advances in Images and Graphics Technologies, Springer Link, vol. 634 of the series
Comm. in Computer and Info. Science, (2016), pp. 149-159.
[11] J. Molina and J. M. Martínez, “A Synthetic Training Framework for Providing Gesture Scalability to
2.5D Pose-Based Hand Gesture Recognition Systems”, Machine Vision and Applications, vol. 25, issue
5, (2014), pp. 1309-1315.
[12] G. Goswami, M. Vatsa and R. Singh, “Face Recognition with RGB-D Images Using Kinect”, Face
Recognition Across the Imaging Spectrum, Springer Link, (2016), pp. 281-303.
[13] https://developer.microsoft.com/en-us/windows/kinect/hardware, (2016).
[14] https://software.intel.com/en-us/realsense/home, (2016).
[15] J. Valcik, J. Sedmidubsk and P. Zezula, “Improving Kinect-Skeleton Estimation”, Advanced Concepts
for Intelligent Vision Systems, Springer Link, vol. 9386 of Lecture Notes in Computer Science, (2015),
pp. 575-587.
[16] C. Kim, S. Yun, S.-W. Jung and C. S. Won, “Color and Depth Image Correspondence for Kinect v2”,
Advanced Multimedia and Ubiquitous Engineering, Springer Link, vol. 354 of the series Lecture Notes
in Electrical Engineering, (2015), pp. 333-340.
[17] W. Song, A.V. Le, S. Yun, S.-W. Jung and C. S. Won, “Depth Completion for Kinect v2 Sensor”,
Multimedia Tools and Applications, Springer Link, (2016), pp. 1-24.
[18] L.-C. Chen, Y.-M. Cheng, P.-Y. Chu and F. E. Sandnes, “The Common Characteristics of User-Defined
and Mid-Air Gestures for Rotating 3D Digital Contents”, Universal Access in Human-Computer
Interaction Techniques and Environments, Springer Link, vol. 9738 of the series Lecture Notes in
Computer Science, (2016), pp. 15-22.
[19] S. Samoil, and S. N. Yanushkevich, “Depth Assisted Palm Region Extraction using the Kinect v2
Sensor”, 2015 Sixth International Conference on Emerging Security Technologies, (2015), pp. 74-79.
[20] H. Alabbasi, A. Gradinaru, F. Moldoveanu and A. Moldoveanu, “Human Motion Tracking & Evaluation
using Kinect v2 Sensor”, The 5th IEEE International Conference on E-Health and Bioengineering,
(2015).
[21] Y. Lan, J. Li and Z. Ju, “Data Fusion-based Real-Time Hand Gesture Recognition with Kinect v2”, 2016
9th International Conference on Human System Interactions (HSI), (2016).
[22] M.-S. Kim and C. H. Lee, “A Simple 3D Hand Gesture Interface Based on Hand Area Detection and Tracking”, Proceedings of MITA 2016 (The 12th International Conference on Multimedia Information Technology and Applications), Luang Prabang, Lao PDR, (2016), pp. 131-133.
[23] B. Ionescu, D. Coquin, P. Lambert and V. Buzuloiu, “Dynamic Hand Gesture Recognition Using the
Skeleton of the Hand”, EURASIP Journal on Applied Signal Processing, vol. 13, (2005), pp. 2101-2109.
[24] J. Kilian, “Simple Image Analysis by Moments”,
http://breckon.eu/toby/teaching/dip/opencv/SimpleImageAnalyisbyMoments.pdf, (2001), pp. 1-8.
Authors
Min-Soo Kim received his B.E. degree in Computer and Information Engineering from Hanbat National University, Daejeon, Korea, in 2017. His current research interests include pattern recognition, digital image processing and human-computer interfaces.
Choong Ho Lee received his B.E. and M.E. degrees in Electronic Engineering from Yonsei University, Seoul, Korea, in 1985 and 1987, respectively. He received his Ph.D. in Information Sciences from Tohoku University, Sendai, Japan, in March 1998. From 1985 to 2000, he was with KT as a researcher. Since 2000, he has been a professor in the Graduate School of Information and Communication Engineering of Hanbat National University. His current research interests include pattern recognition, digital image processing and mobile robot control.