Academic excellence for business and the professions

Hand gesture recognition using Kinect

Muhammad Asad

Table of Contents

1. Introduction to Kinect

I. What is Kinect?

II. How does it work?

III. Why Kinect?

IV. Does Kinect have any limitations?

2. Hand Gesture recognition

I. Types of Hand gestures

II. Distance-Invariant Segmentation of Hand

III. Feature Extraction

IV. Neural Network and HMM training

3. Guidance system for visually impaired

I. Feature extraction

II. Feature Selection

III. Guidance decision

IV. Test sequences

03/05/2013 2

What is Kinect?

• A sensor which is capable of providing

– Depth Image

– RGB(intensity) Image

– Audio from Multi-array microphone

• Real-time provision

• Low cost

• Operating range: 0.5m to 8m

• Originally developed for Microsoft Xbox 360 gaming console

• Now used in different computer vision research areas [1]

[1] Z. Zhang, “Microsoft Kinect sensor and its effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.

Figure 1. Kinect Sensor. Image taken from: http://en.wikipedia.org/wiki/Kinect

How does it work?

• Projection of a pattern of infrared points

• Image captured by the infrared camera

• Captured pattern correlated with a reference pattern at a known distance

• Real-time depth image (30 fps)

Figure 2. Inside Kinect Sensor. Image taken from [1].

How does it work?

Figure 3. How the Kinect Sensor Works

How does it work?

• Depth Image

– Grayscale image

– Normalized Range: 0 – 255

– Invalid Depth: 0

– Darker pixels = Less distance

– Brighter pixels = More distance

– Black pixels = no depth

Figure 4. Kinect Depth Map Visualization
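The depth-image conventions listed above can be illustrated with a minimal sketch. It assumes raw depth arrives in millimetres with 0 marking an invalid pixel, and it uses the 0.5 m to 8 m operating range quoted earlier as the normalisation limits. The function name and the choice to reserve grey levels 1 to 255 for valid depth are assumptions made for this example, not details from the slides.

```python
def normalize_depth(depth_mm, near_mm=500, far_mm=8000):
    """Map raw depth readings (assumed millimetres) to 8-bit grey levels."""
    span = far_mm - near_mm
    image = []
    for row in depth_mm:
        out_row = []
        for d in row:
            if d < near_mm or d > far_mm:
                out_row.append(0)  # invalid or out-of-range depth shows as black
            else:
                # Valid depth maps to 1..255: nearer is darker, farther is brighter.
                out_row.append(1 + round((d - near_mm) / span * 254))
        image.append(out_row)
    return image


# Hypothetical 2x3 frame of raw readings; 0 marks a pixel with no depth.
frame = [[0, 500, 4250],
         [8000, 9000, 1000]]
gray = normalize_depth(frame)
print(gray)
```

Keeping 0 exclusively for invalid pixels means a later segmentation step can tell "no reading" apart from "very close".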

Why Kinect?

• Depth Image

– 3D Shape Information

– Invariant to illumination/lighting changes

– Segmentation

• RGB (Intensity) Image

– Can be aligned to depth

– Details about texture/colour

• Audio from Multi-array microphone

– Source localization

– Ambient noise suppression

• Real-time

• Low cost

Why Kinect?

Taken from [1] and [2].

[2] T. Weise, S. Bouaziz, H. Li, and M. Pauly, “Realtime performance-based facial animation,” ACM Transactions on Graphics, vol. 30, no. 4, article 77, 2011.

Does Kinect have any limitations?

• Occlusion of projected pattern

• Small objects

• Light absorbing surfaces

• Scattering of infra-red pattern

• Noise with increased distance [3]

[3] K. Khoshelham and S.O. Elberink, “Accuracy and resolution of kinect depth data for indoor mapping applications,” Sensors, vol. 12, no. 2, pp. 1437–1454, 2012.

Does Kinect have any limitations?

Figure 5. Limitations of Kinect Sensor

Types of Hand gestures

• Static hand gestures

– Defined by hand shape, position and orientation only

– Example: symbolic gestures in sign language

• Dynamic hand gestures

– Temporal integration of static hand gestures

– Defined by hand shape, position, orientation, motion, acceleration and displacement

– More natural gestures can be modelled

– Examples: telepresence robotics, mapped gestures on mobile devices

OpenNI hand tracker

Distance-invariant segmentation

• OpenNI hand tracker used

Figure 6. Motivation behind Distance Invariant Segmentation

Distance-invariant segmentation

• Inverse relation between:

– distance Pz of hand

– side length S of segmented hand region

• Dataset collection:

– Varying distance Pz of hand

– Ground truth segmentation size S
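One way to turn the inverse relation into a usable rule is to fit a one-parameter model S = k / Pz to the collected (Pz, S) pairs. The sketch below assumes that model form; the slides only state that the relation is inverse. The sample values are synthetic and the function names are hypothetical.

```python
def fit_inverse_model(distances_mm, sides_px):
    """Least-squares estimate of k in the assumed model S = k / Pz."""
    inverse = [1.0 / pz for pz in distances_mm]
    numerator = sum(s * x for s, x in zip(sides_px, inverse))
    denominator = sum(x * x for x in inverse)
    return numerator / denominator


def side_length(k, pz_mm):
    """Predicted side length of the hand region at distance pz_mm."""
    return k / pz_mm


# Synthetic ground truth generated from k = 84000 (illustrative, not measured data).
distances = [700, 950, 1200, 1450, 1700]
sides = [84000 / d for d in distances]

k = fit_inverse_model(distances, sides)
print(round(side_length(k, 1000), 2))
```

With real ground-truth data the fit would not be exact, and a model with an additional offset term might describe the measurements better.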

Results: Distance-invariant segmentation

Figure 7. Distance-Invariant Segmentation vs Fixed size Segmentation

Feature Extraction

• Projection extraction:

where Cz = Pz – S/2

Feature Extraction

• Projection extraction

• Based on projection-based action recognition [4]

[4] W. Li, Z. Zhang, and Z. Liu, “Action recognition based on a bag of 3D points,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010, pp. 9–14.

Figure 8. Projection Mask extraction; (a) XY Projection (b) ZX Projection (c) ZY Projection
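The three projections can be sketched as binary occupancy masks of the 3D points that fall inside a cube around the hand. This example assumes the cube has edge length S and is centred on the tracked hand position, so its near corner on the Z axis is Cz = Pz – S/2 and the same offset is applied on X and Y. It also assumes all quantities share one unit. The function name, the grid size and the row/column layout of each mask are choices made for illustration.

```python
import math


def projection_masks(points, center, side, bins=64):
    """Binary XY, ZX and ZY masks of the points inside a cube around `center`."""
    # Near corner of the cube on each axis, e.g. Cz = Pz - S/2 (assumed for X and Y too).
    cx, cy, cz = (c - side / 2 for c in center)
    xy = [[0] * bins for _ in range(bins)]
    zx = [[0] * bins for _ in range(bins)]
    zy = [[0] * bins for _ in range(bins)]
    for x, y, z in points:
        i = math.floor((x - cx) / side * bins)
        j = math.floor((y - cy) / side * bins)
        k = math.floor((z - cz) / side * bins)
        if not all(0 <= n < bins for n in (i, j, k)):
            continue  # point lies outside the cube
        xy[j][i] = 1  # rows follow Y, columns follow X
        zx[i][k] = 1  # rows follow X, columns follow Z
        zy[j][k] = 1  # rows follow Y, columns follow Z
    return xy, zx, zy


# Hypothetical points around a hand centred at (0, 0, 1000); the last one is outside.
hand_points = [(0, 0, 1000), (40, -40, 960), (500, 0, 1000)]
xy_mask, zx_mask, zy_mask = projection_masks(hand_points, (0, 0, 1000), side=160, bins=8)
```

Because the cube scales with S, the masks keep roughly the same hand size regardless of how far the hand is from the sensor.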

Quantization Error in Projections

Figure 9. Quantization and Random Error noise in Projection Masks at varying distance from Kinect sensor; (a) 700mm (b) 950mm (c) 1200mm (d) 1450mm (e) 1700mm

Quantization Error Reduction [5]

• Morphological Operation

• Averaging based interpolation
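As a rough illustration of the averaging idea only, the sketch below fills each empty pixel with the mean of its valid 8-neighbours. It is not the method of [5], whose details are not given on the slide. The morphological step is not shown. The function name and the use of 0 as the "empty" marker are assumptions.

```python
def fill_holes(depth):
    """Replace each 0 pixel with the mean of its non-zero 8-neighbours."""
    height, width = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # read from the input, write to a copy
    for r in range(height):
        for c in range(width):
            if depth[r][c] != 0:
                continue
            neighbours = [depth[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, height))
                          for cc in range(max(c - 1, 0), min(c + 2, width))
                          if depth[rr][cc] != 0]
            if neighbours:  # leave the hole untouched if nothing valid surrounds it
                filled[r][c] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical 3x3 depth patch (millimetres) with one missing reading in the centre.
patch = [[700, 700, 700],
         [700, 0, 710],
         [710, 710, 710]]
repaired = fill_holes(patch)
```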

[5] M. Asad and C. Abhayaratne, “Kinect depth stream pre-processing for hand gesture recognition,” IEEE International Conference on Image Processing (ICIP'13), September 15–18, 2013 (accepted).

Feature Extraction

Figure 10. Contour feature extraction

Gesture Stages

• Swipe right and swipe left gestures

• Divided into four gesture stages

• Swipe left: 1->2->3->4

• Swipe right: 4->3->2->1
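The stage orderings above can be checked with a small deterministic matcher over per-frame stage labels. This is a simplified stand-in for illustration, not the neural network and HMM pipeline the presentation uses. It assumes a stage label from 1 to 4 is already available for every frame, and the function names are hypothetical.

```python
SWIPE_LEFT = [1, 2, 3, 4]
SWIPE_RIGHT = [4, 3, 2, 1]


def collapse_repeats(stages):
    """Drop consecutive duplicates, e.g. [1, 1, 2, 2] becomes [1, 2]."""
    collapsed = []
    for stage in stages:
        if not collapsed or collapsed[-1] != stage:
            collapsed.append(stage)
    return collapsed


def classify_swipe(stages):
    """Return the gesture name if the stage order matches a swipe, else None."""
    sequence = collapse_repeats(stages)
    if sequence == SWIPE_LEFT:
        return "swipe left"
    if sequence == SWIPE_RIGHT:
        return "swipe right"
    return None


print(classify_swipe([1, 1, 2, 2, 2, 3, 4, 4]))
print(classify_swipe([4, 3, 3, 2, 1]))
```

An exact matcher like this fails on a single mislabelled frame, which is one reason a probabilistic sequence model is attractive for the real system.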

Figure 11. Gesture stages for swipe gestures; (a) Stage 1 (b) Stage 2 (c) Stage 3 (d) Stage 4

Neural Network Training

• Number of Neurons

– 64x64x3

– 300

– 15

– 1

Figure 12. Neural Network Structure
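To give a sense of the model size, the listed neuron counts can be turned into a parameter count. The sketch assumes the layers are fully connected and that every non-input neuron has a bias term. Neither detail is stated on the slide.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network (assumed architecture)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Neuron counts from the slide: 64x64x3 inputs, then 300, 15 and 1 neurons.
layers = [64 * 64 * 3, 300, 15, 1]
total = count_parameters(layers)
print(total)
```

Almost all of the parameters sit between the 12288 inputs and the first 300 neurons, so the input resolution dominates the training cost.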

Neural Network Training

Neural Network Output with Varying Distance

Figure 13. Neural Network Response with varying distance from the sensor

Neural Network and HMM

Figure 14. Neural Network and HMM Response against time

Demo

Guidance system for Visually Impaired Person [6]

[6] M. Asad and W. Ikram, “Smartphone based guidance system for visually impaired person,” IEEE International Conference on Image Processing Theory, Tools and Applications (IPTA'12), October 15–18, 2012, Turkey.

Figure 15. Flowchart of the guidance system

Feature Extraction: Edge Detection

Figure 16. Edge Detection using (b) Canny (c) Method in [7]

[7] Y. Zhao, W. Gui, and Z. Chen, “Edge detection based on multi-structure elements morphology,” in The Sixth World Congress on Intelligent Control and Automation (WCICA 2006), vol. 2. IEEE, 2006, pp. 9795–9798.

Feature Selection: Hough Transform

Figure 17. Hough Transform of Fig. 16 with Hough Peaks

Feature Selection: Vanishing Point

Figure 18. Vanishing Point Extraction
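The slides do not spell out how the vanishing point is computed. One common approach, sketched here as an assumption, is to intersect two dominant lines taken from the Hough peaks. Each line is given in the usual Hough normal form x·cos(θ) + y·sin(θ) = ρ, and the function name is hypothetical.

```python
import math


def intersect_hough_lines(line_a, line_b, eps=1e-9):
    """Intersect two (rho, theta) lines; returns (x, y) or None if parallel."""
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    # Determinant of the 2x2 system formed by the two line equations.
    det = math.cos(theta1) * math.sin(theta2) - math.sin(theta1) * math.cos(theta2)
    if abs(det) < eps:
        return None  # parallel lines have no finite intersection
    x = (rho1 * math.sin(theta2) - rho2 * math.sin(theta1)) / det
    y = (rho2 * math.cos(theta1) - rho1 * math.cos(theta2)) / det
    return x, y


# Illustrative lines: the vertical line x = 3 and the horizontal line y = 4.
vanishing_point = intersect_hough_lines((3.0, 0.0), (4.0, math.pi / 2))
```

With more than two peaks, the pairwise intersections could be averaged or voted on to make the estimate less sensitive to a single spurious line.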

Guidance Decision

Figure 19. Guidance Decision

Test Sequences

Thank you!!
