
M. Kurosu (Ed.): Human-Computer Interaction, Part IV, HCII 2013, LNCS 8007, pp. 497–505, 2013. © Springer-Verlag Berlin Heidelberg 2013

Real Time Mono-vision Based Customizable Virtual Keyboard Using Finger Tip Speed Analysis

Sumit Srivastava and Ramesh Chandra Tripathi

SILP Lab, Indian Institute of Information Technology, Allahabad [email protected], [email protected]

Abstract. User interfaces are a growing field of research around the world, specifically for PDAs, mobile phones, tablets and other such gadgets. Among the many challenges involved are adaptability, size, cost and ease of use. This paper presents a novel mono-vision based touch-and-type method for a customizable keyboard drawn, printed or projected on a surface. The idea is to let the user decide the size, orientation, language and position of the keys: a fully user-customized keyboard. The proposed system also handles keyboards on uneven surfaces. Accurate results are obtained with the implementation of the proposed real time mono-vision based customizable virtual keyboard system. The paper exploits the observation that the finger tip intended to type moves fastest relative to the other fingers until it hits the surface.

Keywords: Virtual Keyboard, Image Processing, Single Camera, Mono Vision, Edge Detection, Quadrilateral Extraction, Character Recognition, Hand Segmentation, Fingertip Extraction, Customizable Keyboard.

1 Introduction

This paper introduces a method to untangle several issues related to keyboard input, such as portability, cost, shape and size of the keyboard, orientation, position of keys, and multilingual support. Multilingual support is one of the most important aspects given the large number of languages in use.

A significant amount of research is being done on user interfaces using computer vision, but only a few works address single-camera based techniques. Using a single camera brings down the resources required and thus yields a more cost-effective and portable device.

Habib et al. [1] proposed a mono-vision fuzzy-rule based technique which uses pre-defined rules to identify a particular finger gesture as the action of pressing a particular key. The method places an SDIO camera at the table top and records the movement of the finger tips and knuckles. It does so for a fixed number of frames and then decides which keystroke that particular hand gesture corresponds to. It pre-defines some 32 fuzzy rules for the different finger tip and knuckle positions. The method works well but is not flexible to different keyboard patterns: the rules are fixed and cannot be used for all language keyboards, owing to the varying numbers of alphabets and keyboard layouts.


Murase et al. [2] proposed a method in which the camera is placed on the table top so that its view is roughly parallel to the table surface. It looks for fingertips and identifies a finger tip movement as a keystroke if the tip descends below a particular level in the frame. It estimates finger depth using a Real AdaBoost machine learning algorithm that adopts HOG features of the user's hand image. This work has the limitation that the keyboard must be pre-defined with exact depth knowledge of each key. Again, it is not robust to arbitrary keyboards and is less user friendly, since the keyboard depth has to be defined by the user.

Adajania et al. [3] proposed a technique built on the idea that a finger tip and its shadow coincide when a key touch event happens. This technique falls short with respect to the light source: if the source is not directly in front of the hand, it fails. A problem also arises when two fingers are very close to each other and very near to the surface, so that the shadow of one is absorbed by the other.

In summary, most of these methods suffer from pre-defined keyboards: the key positions, orientation, language, size, distance from the camera and shape of the keyboard must be known in advance. Some are also completely dependent on the light source position.

We therefore attempt to solve all the above problems, aiming to make the system more robust and user friendly.

2 Proposed Methodology

The implementation flow graph is shown in Fig. 1. The first frame captured is used to localize the keys. In the next phase these identified possible key locations are passed

Fig. 1. Flow chart of the proposed methodology


through an optical character recognition procedure to identify the keys. Next, the user's hand is segmented using colour-based hand segmentation, and subsequently the fingertips are found. Then, key touch events are detected by tracking the fastest-moving finger tip across subsequent frames. Finally, the touch points are mapped to the key locations identified in the first phase. The following is a detailed discussion of each phase.

2.1 Key Localization

This phase deals with localization of the keys, i.e., identifying the position of each key. For this, the first frame from the camera is obtained with the keyboard drawn, printed or projected on a surface. Image contrast is enhanced using Gray-Level Grouping [4]. All contours are detected from the image by running the Canny edge detector at several pre-defined threshold levels and removing end points that have no connectivity [5]. From the large pool of contours obtained, those having four sides, an area within a specific range and a convex shape are filtered out. The angles between the sides of these filtered polygons are then measured, and polygons with any corner cosine greater than 0.6 are rejected. The allowed range of cosines is kept large because certain genuine quadrilaterals also have irregular shapes. Fig. 2 shows some results of this first phase obtained under several conditions.
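The quadrilateral filtering described above can be sketched as follows. The 0.6 cosine threshold comes from the text; the area bounds and helper names are illustrative assumptions of this sketch, and in practice the candidate contours would come from an edge/contour detector (e.g. OpenCV's Canny and findContours followed by polygon approximation).

```python
import numpy as np

def angle_cosine(p0, p1, p2):
    """Cosine of the angle at vertex p1 formed by edges p1->p0 and p1->p2."""
    d1, d2 = np.asarray(p0, float) - p1, np.asarray(p2, float) - p1
    return float(np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-10))

def polygon_area(pts):
    """Shoelace formula for the area of a simple polygon."""
    x, y = np.asarray(pts, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def is_key_candidate(pts, min_area=100.0, max_area=5000.0, max_cos=0.6):
    """Accept a 4-vertex contour as a possible key if its area lies in range
    and every corner is roughly right-angled (|cos| <= max_cos).
    Convexity is assumed to have been checked upstream."""
    pts = np.asarray(pts, dtype=float)
    if pts.shape != (4, 2):
        return False
    if not (min_area <= polygon_area(pts) <= max_area):
        return False
    cosines = [abs(angle_cosine(pts[i - 1], pts[i], pts[(i + 1) % 4]))
               for i in range(4)]
    return max(cosines) <= max_cos
```

A well-formed key cell passes, while a strongly skewed parallelogram (corner cosine about 0.71) or an undersized blob is rejected.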

[a]

[b]

Fig. 2. [a] Computer generated keyboard and its result, [b] Keys drawn on a newspaper and its result, [c] Keyboard printed on a sheet of paper and its result


[c]

Fig. 2. (Continued)

2.2 Key Identification

The key identification process is an implementation of optical character recognition. Several good OCR algorithms exist; here we use the method proposed by Gupta et al. [6] for handwritten character recognition using a neural network. The method classifies characters by two approaches, holistic and segmentation based. We use the holistic approach, since most keys are individual characters and no further segmentation is required; keys carrying more than one character are few in number and show only slight variation. Fourier descriptors are used as features, and classification is performed with several classifiers: MLP, RBF and SVM. As SVM gives the best results for the proposed approach, we use it directly.
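To make the feature step concrete, below is a minimal sketch of Fourier-descriptor extraction from a character's boundary contour. It is not the implementation of [6]: the descriptor count and normalisation choices are assumptions. A classifier (MLP, RBF or SVM) would then be trained on these feature vectors.

```python
import numpy as np

def fourier_descriptors(contour, n_descriptors=16):
    """Shape features from a closed boundary contour of shape (N, 2).
    Dropping the DC term gives translation invariance, dividing by the
    first magnitude gives scale invariance, and taking magnitudes alone
    discards phase, giving rotation/start-point invariance."""
    pts = np.asarray(contour, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]          # boundary as a complex signal
    spectrum = np.fft.fft(z)
    mags = np.abs(spectrum)[1:n_descriptors + 1]
    return mags / (mags[0] + 1e-12)
```

For example, a scaled and translated copy of the same contour yields (numerically) identical descriptors, which is exactly the property a key-glyph classifier needs.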

2.3 Hand Extraction

There are several moving-object extraction methods, such as background subtraction [7-8], temporal difference, optical flow analysis [9], Gaussian models, HOG, and the force field method [10]. Of these, only temporal filtering can readily be used in real-time systems, though modified versions of the other approaches are available that also give real-time results good enough to work with [11-13]. Here, however, we use simple HSV colour-space based skin segmentation [14], which does put some constraints on the usable environment; for instance, shadows can be a problem where the light source is not directly in front of the hand. Such issues can be handled by better segmentation techniques that remove shadow and ghost effects.

This segmentation is followed by the exclusion of unwanted areas segmented due to similar colour, done by discarding segments smaller than a particular size. A morphological closing operation is also applied as post-processing to enhance the segmented hand, as seen in Fig. 3[a] and 3[b].
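A minimal sketch of the colour-based segmentation step is given below, with illustrative HSV thresholds (the actual ranges in [14] may differ). The stdlib `colorsys` conversion is used per pixel for clarity; a real implementation would vectorize or use an image library, and the small-blob removal and closing described above would follow.

```python
import colorsys
import numpy as np

# Illustrative skin thresholds (assumed, not taken from [14]):
H_MAX = 50 / 360.0          # hue roughly in the 0-50 degree band
S_MIN, S_MAX = 0.23, 0.68   # moderate saturation

def skin_mask(rgb_image):
    """Binary mask: 1 where a pixel's hue and saturation fall in the
    assumed skin range. rgb_image is an (H, W, 3) uint8 array."""
    h, w, _ = rgb_image.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            r, g, b = (c / 255.0 for c in rgb_image[y, x])
            hue, sat, _val = colorsys.rgb_to_hsv(r, g, b)
            if hue <= H_MAX and S_MIN <= sat <= S_MAX:
                mask[y, x] = 1
    return mask
```

With these bounds, a typical skin tone is kept while a saturated blue background pixel is discarded.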


[a] [b]

Fig. 3. [a] Source hand image. [b] Segmented hand from the source image.

2.4 Finger Tip Extraction

Hand segmentation provides a rough estimate of the finger tips. To extract them, a process inspired by the star skeletonization method [16] is used. A horizontal scan of the segmented image records, for each position, the highest "lit" pixel on the y-axis, giving an outline of the segmented hand.

A DFT is then applied to the obtained curve, followed by a low-pass filter and an IDFT, yielding a smooth curve in which each finger tip appears as a local maximum. These maxima are then matched to the tips they correspond to. This phase gives decent results, with all tips being found.
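The scan-and-smooth procedure can be sketched as follows; the frequency cut-off `keep` and the helper names are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def hand_outline(mask):
    """For each column of a binary hand mask, take the topmost foreground
    row; flip the index so fingertips show up as local maxima."""
    h = mask.shape[0]
    cols = [np.flatnonzero(mask[:, x]) for x in range(mask.shape[1])]
    return np.array([h - c[0] if c.size else 0 for c in cols], dtype=float)

def smooth_lowpass(curve, keep=8):
    """DFT -> zero all but the lowest `keep` frequency bins -> inverse DFT."""
    spectrum = np.fft.rfft(curve)
    spectrum[keep:] = 0
    return np.fft.irfft(spectrum, n=len(curve))

def local_maxima(curve, min_height=1.0):
    """Indices where the smoothed curve peaks; each peak ~ one fingertip."""
    c = np.asarray(curve)
    return [i for i in range(1, len(c) - 1)
            if c[i] > c[i - 1] and c[i] >= c[i + 1] and c[i] > min_height]
```

On a synthetic outline with two bumps, the low-pass smoothing preserves both peaks and `local_maxima` recovers exactly two tip positions.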

[a] [b]

Fig. 4. [a] Rough finger outline, [b] smooth curve obtained after applying a DFT followed by a low-pass filter and an IDFT, with local maxima representing the tips.


2.5 Keystroke Detection

This phase builds on the previous one: tips are found in every frame, and for each tip we maintain its location in each frame along with other data such as the direction of its current motion, the last point at which its direction changed, and the base frame from which movement in a particular direction began.

All of this data is used to keep track of the speed of each tip in each frame, calculated by the following formula:

    Si = (CPi − PBFi) / N ,                                    (1)

where Si is the speed of the tip in the current frame, CPi the position of the tip in the current frame, PBFi the position of the tip in the base frame, and N the number of frames between the current frame and the base frame for that particular finger tip.

[a] [b]

[c] [d]

Fig. 5. [a]-[e]. Subsequent frames depicting the touch occurrence and detection scenario: [b] shows the ring finger moving fastest, [c] shows the actual touch event, and [e] shows the point of touch. Blue dots represent all extracted tips, the green dot the fastest tip in that particular frame, and the red dot the actual point of touch. Thus a delay of two frames occurs before the actual point of touch is found.


[e]

Fig. 5. [a]-[e]. (Continued)

This speed data is used to find the fastest-moving tip in the forward direction in each frame. Alongside, we record for how many frames the fastest tip moved in the forward direction and then in the backward direction. When both counts exceed a certain number of frames, the point of the direction change is taken as the point of touch.

This point of touch is then finally mapped to the key at that location.
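The per-tip bookkeeping and touch test described above can be sketched as a small state machine. The class and threshold names (`TipTrack`, `MIN_RUN`) are illustrative, not from the paper; the speed follows equation (1), and a forward run followed by a sustained backward run reports the turning point as the touch.

```python
from dataclasses import dataclass

MIN_RUN = 2  # frames of sustained motion required in each direction (assumed)

@dataclass
class TipTrack:
    base_pos: float          # position in the base frame (PBF_i)
    base_frame: int          # frame where the current motion run began
    direction: int = 0       # +1 towards the surface, -1 away from it
    run: int = 0             # frames moved in the current direction
    prev_pos: float = None
    forward_run: int = 0     # length of the last completed forward run
    turn_pos: float = None   # position where the direction changed
    speed: float = 0.0       # S_i per equation (1)

    def update(self, pos, frame):
        """Feed one frame's tip position; return the touch position once a
        forward run of >= MIN_RUN frames is followed by MIN_RUN backward
        frames, else None."""
        touch = None
        if self.prev_pos is not None:
            d = 1 if pos > self.prev_pos else -1
            if d == self.direction:
                self.run += 1
            else:  # direction change: remember a long-enough forward run
                if self.direction == 1 and self.run >= MIN_RUN:
                    self.forward_run, self.turn_pos = self.run, self.prev_pos
                self.direction, self.run = d, 1
                self.base_pos, self.base_frame = self.prev_pos, frame - 1
            if (self.direction == -1 and self.run >= MIN_RUN
                    and self.forward_run >= MIN_RUN):
                touch, self.forward_run = self.turn_pos, 0
            # equation (1): displacement over frames since the base frame
            n = max(frame - self.base_frame, 1)
            self.speed = abs(pos - self.base_pos) / n
        self.prev_pos = pos
        return touch
```

Feeding a descend-then-retract trajectory (positions 0, 2, 4, 6, 5, 3, 1) reports the touch at position 6 two frames after the actual turn, matching the two-frame delay noted in the Fig. 5 caption.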

3 Result and Discussion

The setup uses a single camera to extract the keys, watch the finger movements, and finally detect touch events and map them to key locations. The camera used is an i-ball Super-View at a resolution of 640x480, but the system should work with any camera supporting this resolution.

The overall procedure introduced here is new as a concept and provides a number of positive results. The idea of exploiting the natural phenomenon that the finger used to type moves fastest among all fingers in the preceding frames works well in practice.

The key localization module achieves an accuracy of 100% on ideal computer-generated keyboard images and up to 98% for keyboards printed on paper or hand drawn on any surface, provided the background is not too complex. The surface does not need to be flat, and the keys can be in any order. As visible in Fig. 2, it works efficiently under the several conditions mentioned above.

The hand segmentation module is quite effective under limited conditions, specifically when the surroundings do not have a colour similar to skin. The tip extraction method works very accurately and is independent of the object extracted by the hand segmentation module. Thus, if the hand segmentation module were converted to segment any moving object with tips, say a pen, rather than being specific to hands, it would still detect the tips.


The key touch detection method offers improvements over the other virtual keyboard methods discussed in the introduction, namely the fuzzy-rule based hand gesture method, the machine learning based approach and the finger tip shadow method.

Specifically for the shadow based method, consider the case shown in Fig. 6, where the index finger's shadow is overtaken by that of the middle finger. This would result in a faulty touch detection. Our method, however, clearly differentiates between the two tips and, being independent of the shadow, provides a better result.

[a] [b]

Fig. 6. Situation in which shadow of one finger gets behind another finger

4 Conclusion

The method discussed here introduces a new range of possibilities for virtual keyboards, with fully customizable keyboard support and real-time keystroke detection. However, many improvements can be made, such as better hand segmentation algorithms to make it more robust against various environments; the current one may struggle with a light source at an angle, a dim light source, or objects with a colour similar to skin, such as wood.

Since the key identification module was implemented only for English, its operation in other languages could not be verified, but the availability of several robust algorithms for multilingual optical character recognition [15] should help overcome this problem and make the system portable and accessible to everyone.

Another limitation of this work is that it does not yet support the multi-touch functionality of keyboards (e.g. ctrl+del).

Acknowledgment. I sincerely thank everyone who helped make this project a reality: the faculty members and my colleagues whose advice helped improve it, and the friends and family members who supported me throughout. Finally, I thank the IIIT-Allahabad authorities for providing all necessary facilities and equipment.


References

1. Habib, H.A., Mufti, M.: Real Time Mono Vision Gesture Based Virtual Keyboard System. IEEE Transactions on Consumer Electronics 52(4) (November 2006)

2. Murase, T., Moteki, A., Suzuki, G., Nakai, T., Hara, N., Matsuda, T.: Gesture Keyboard with a machine learning requiring only one camera. In: AH 2012 Proceedings of the 3rd Augmented Human International Conference, Geneva (2012), doi:10.1145/2160125.2160154

3. Adajania, Y., Gosalia, J., Kanade, A., Mehta, H., Shekokar, P.N.: Virtual Keyboard Using Shadow Analysis. In: Third International Conference on Emerging Trends in Engineering and Technology, November 19-21, pp. 163–165 (2011), doi:10.1109/ICETET.2010.115

4. ZhiYu, C., Abidi, B.R., Page, D.L., Abidi, M.A.: Gray-Level grouping (GLG): an automatic method for optimized contrast enhancement-part I: the basic method. IEEE Transactions on Image Processing 15(8), 2290–2302 (2006)

5. Kim, J., Han, Y., Hahn, H.: Character segmentation method for a License plate with topological transform. World Academy of Science, Engineering and Technology (2009)

6. Gupta, A., Srivastava, M., Mahanta, C.: Offline handwritten character recognition using neural network. In: 2011 IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), December 4-7, pp. 102–107 (2011)

7. Haritaoglu, I., Davis, L.S., Harwood, D.: W4: Who? When? Where? What? A real time system for detecting and tracking people. In: Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 222–227 (1998)

8. Wren, C., Azarbayehani, A., Darrell, T., Pentland, A.: PFinder: Real time tracking of human body. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 780–785 (1997)

9. Barron, J., Fleet, D., Beauchemin, S.: Performance of optical flow techniques. International Journal of Computer Vision 12(1), 42–77 (1994)

10. Gupta, R.K.: Comparative analysis of segmentation algorithms for Hand gesture recognition. In: Third International Conference on Computational Intelligence, Communication Systems and Networks, July 26-28, pp. 231–235 (2011)

11. Spruyt, V., Ledda, A., Geerts, S.: Real-time multi-colorspace hand segmentation. In: 17th IEEE International Conference on Image Processing, pp. 3117–3120 (September 2010)

12. Bilal, S., Akmeliawati, R., Salami, M.J.E., Shafie, A.A., Bouhabba, E.M.: A hybrid method using haar-like and skin-color algorithm for hand posture detection, recognition and tracking. In: International Conference on Mechatronics and Automation, pp. 934–939 (August 2010)

13. Gang-Zeng, M., Yi-Leh, W., Maw-Kae, H.: Real-time hand detection and tracking against complex background. In: Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 905–908 (September 2009)

14. Oliveira, V., Conci, A.: Skin detection using HSV Color space. In: Pedrini, H., de Carvalho, J.M. (eds.) Workshops of Sibgrapi 2009 – Posters, pp. 1–2. SBC, Rio de Janeiro (2009)

15. Kae, A., Smith, D.A., Miller, E.: Learning on the fly: A font free approach towards multilingual OCR. International Journal on Document Analysis and Recognition 14(3), 289–301 (2011)

16. Fujiyoshi, H., Lipton, A.J.: Real-time human motion analysis by image skeletonization. In: Fourth IEEE Workshop on Applications of Computer Vision, pp. 15–21 (October 1998)