

Available online at www.sciencedirect.com


Int. J. Human-Computer Studies 71 (2013) 590–607

www.elsevier.com/locate/ijhcs

Automatic recognition of object size and shape via user-dependent measurements of the grasping hand

Radu-Daniel Vatavu, Ionut Alexandru Zaiti

University Stefan cel Mare of Suceava, str. Universitatii nr. 13, 720229 Suceava, Romania

Received 4 July 2011; received in revised form 13 August 2012; accepted 3 January 2013

Communicated by E. Motta

Available online 24 January 2013

Abstract

An investigation is conducted on the feasibility of using the posture of the hand during prehension in order to identify geometric properties of grasped objects such as size and shape. A recent study of Paulson et al. (2011) already demonstrated the successful use of hand posture for discriminating between several actions in an office setting. Inspired by their approach and following closely the results in motor planning and control from psychology (MacKenzie and Iberall, 1994), we adopt a more cautious and punctilious approach in order to understand the opportunities that hand posture brings for recognizing properties of target objects. We present results from an experiment designed to investigate recognition of object properties during grasping in two different conditions: object translation (involving firm grasps) and object exploration (which includes a large variety of different hand and finger configurations). We show that object size and shape can be recognized with up to 98% accuracy during translation and up to 95% and 91% accuracy during exploration by employing user-dependent training. In contrast, experiments show lower accuracy (up to 60%) for user-independent training for all tested classification techniques. We also point out the variability of individual grasping postures resulting during object exploration and the need for using classifiers trained with a large set of examples. The results of this work can benefit psychologists and researchers interested in human studies and motor control by providing more insights on grasping measurements, pattern recognition practitioners by reporting recognition results of new algorithms, as well as designers of interactive systems that work on gesture-based interfaces by providing them with design guidelines issued from our experiment.

© 2013 Elsevier Ltd. All rights reserved.

Keywords: Hand posture; Grasping; Prehension; Object recognition; Experiment; Shape recognition; Size recognition; Measurements; Data glove; Gestures

1. Introduction

The human hand represents a remarkable instrument for grasping and manipulating objects but also for extracting useful information such as object weight, size, orientation, surface texture, and thermal properties. Hands therefore serve both executive and perceptive functions synchronously while they are employed to explore as well as to transform and change the environment. In MacKenzie and Iberall's own words: we use our hands as general purpose devices, to pick up objects, to point, to climb, to play musical instruments, to draw and sculpt, to communicate, to touch and feel, and to explore the world (MacKenzie and Iberall, 1994, p. 4). The understanding of the hand functions and the inner workings of the fine mechanisms for planning and controlling movements at both muscular and neural (central nervous system) levels still represents a very active field of study for psychologists interested in motor control (Jones and Lederman, 2006; MacKenzie and Iberall, 1994; Wing et al., 1996).

However, MacKenzie and Iberall's description of hands as instruments can prove to be extremely interesting to human–computer interaction: the hand becomes a specialized device that extracts information from the objects users are interacting with. This can be described as extrinsic-oriented exploration of the environment, in opposition to intrinsic-oriented exploration where objects share identification information explicitly through various technologies such as RFID (Kim et al., 2007; Tanenbaum et al., 2011; Ziegler and Urbas, 2011) and Bluetooth (Farella et al., 2008; Siegemund and Florkemeier, 2003). As the exploration is also voluntary and accomplished using the human hand, we can further refer to it as extrinsic-proprioceptive exploration, as the hand posture is the only available source of data from which information about target objects is inferred.

There are two main motivations in HCI for collecting and using hand posture information during object grasping and exploration. One of them is concerned with recognizing hand postures and designing proper interaction metaphors in order to build natural gesture-based interfaces. The goal in this case is to provide natural and intuitive ways for users to interact with computing systems by leveraging the information richness delivered by the hand pose (Baudel and Beaudouin-Lafon, 1993; Erol et al., 2007; Wachs et al., 2011). The other motivation is represented by recognizing activity patterns for context-aware applications, leading HCI developments towards ubiquitous computing. Within this direction, Paulson et al. (2011) showed that various activities in an office such as dialing a number, holding a mug, typing at the keyboard, or handling the mouse can be recognized using hand posture information alone.

However, the application domain can be much extended, and many advantages and interactive opportunities can be envisaged regarding the information obtained while the hand is grasping ambient objects. First, the hand as a measuring instrument relieves the costly need for embedding identification technology into practically every ambient object, as would be the case for RFID tags (Ilie-Zudor et al., 2011). This is especially important when such tag installations become impractical for some scenarios in terms of cost, performance, and reliability (e.g., identification problems do occur because of tag orientation, material type, and reader collision), and when social acceptance of RFID technology is hindered by ethical concerns (Want, 2006). Second, knowledge of how objects are manipulated can be exploited for enhancing existing interactions with everyday objects: a firm grip on the phone ends the current call; a firm grip on the door knob locks the office door on the way out while a gentle grasp simply closes it without locking; grasping the remote turns on the TV, etc. Video games can also benefit from enriched interaction experiences in the way that players can grab any object from the real world and use it inside the game. For example, grasping a simple stick informs a sports game that the user is holding the baseball bat; a toy pistol can be used in an action game that senses the hand in the "trigger" posture; grasping a ball can make the game character grab another snow ball and throw it at the opponent in a winter game. Traditional learning games for children such as wooden bricks that encourage hand motor development can become highly interactive while the hand posture is retrieved and analyzed: virtual guiding tutors that know when the child grasped the object with the right size and shape, or automated monitoring of the child's progress. All such interaction opportunities with everyday objects become viable once information can be inferred about the grasped object using the hand alone.

The idea of using hand measurements in order to identify properties of target objects has been investigated before in various forms and for various purposes outside the HCI community. For example, an early study of Buchholz et al. (1992) was concerned with proposing a model for the human hand in order to noninvasively predict postures as the hand grasps different sizes of objects. The model served to predict and evaluate the prehensile capacities of the hand when grasping ellipsoidal objects (Buchholz and Armstrong, 1992) in order to provide assistance for designers of tool handles. In the psychology line of motor control work, Santello and Soechting (1998) showed that it is possible to discriminate between concave and convex objects using the relative flexure between the index and little fingers of the grasping hand.

Such previous works suggest the potential of using the hand in order to extract information about the environment, and, more precisely, about the objects users are interacting with. The findings, although dispersed in specific contexts and research communities, suggest interesting opportunities for human–computer interaction. However, in order for the community to benefit from such a technique, solid evidence and analysis must be provided for researchers and practitioners to rely on when designing and evaluating their applications. Inspired by the well-grounded results in psychology concerning the grasping hand (Jones and Lederman, 2006; MacKenzie and Iberall, 1994; Wing et al., 1996) and following the results of Paulson et al. (2011), we explore the feasibility of employing hand posture to automatically extract object properties. However, in contrast to Paulson et al. (2011), who only show that a number of distinct office activities can be recognized, we adopt a more thorough procedure. By following closely the results from motor control theory concerning the act of prehension and grasping target objects, we designed an experiment for determining whether size, shape, and size and shape together can be reliably identified for objects with basic geometries. The analysis was carried out in two different scenarios: object translation, for which stable and firm hand grasps are used, and object exploration, for which a large variety of different hand postures and finger configurations are employed. The exploration scenario was specifically introduced in order to evaluate the technique independently of the intended use of the object and therefore to better understand its feasibility.

The contributions of our work include:

i. We show that object size and shape can be robustly inferred from measurements on the grasping hand with up to 98% accuracy during object translation and 95% and 91% accuracies during object exploration in the user-dependent training scenario.

ii. We report design guidelines for the implementation of the classification algorithm, the size of the training set, and the training procedure.

iii. We introduce new tools for analyzing the variability in hand posture and for measuring the amount of shared postures when grasping different objects.

2. Related work

We are interested in previous works that come from the field of psychology and motor control in order to understand the main results with respect to hand prehensile and grasping movements. We then use these results in order to inform the design of an experiment for acquiring hand measurements, as described in the next section. We are also interested in works from the pattern recognition and HCI communities that attempt to recognize hand postures and, consequently, we report results on acquisition and recognition technology. By relating to all these communities, we believe that this work could potentially benefit all of them by providing more insights on hand measurements, recognition results, as well as design guidelines for hand-based interfaces.

A vast amount of literature exists on hand motor control (Jones and Lederman, 2006; MacKenzie and Iberall, 1994; Wing et al., 1996) that provides a solid starting point for looking at the human hand as an instrument for extracting object properties. Also, this knowledge can be used in order to inform the design of recognition algorithms and interaction techniques for gesture-based interfaces, as motor theorists may have already (or partially) found answers to important questions:

• How does the hand adjust to objects during prehension and grasping?
• What postures do people use when exploring an object in order to learn about its properties?

2.1. The influence of object characteristics on hand posture

Objects with different geometries are grasped using different hand postures that depend on object size (Jakobson and Goodale, 1991), shape (Santello and Soechting, 1998), and intended use (Napier, 1956). Even more, the hand shape changes during the reach-to-grasp movement in accordance with the shape, dimensions, and other properties of the object (Jones and Lederman, 2006). For example, the amplitude of the maximum grip aperture of the hand was found to correlate with the size of the target object during the transport phase (Jakobson and Goodale, 1991; Gentilucci et al., 1991), after which the hand shape becomes distinctly resemblant to that of the object to be grasped. The shape of the grasped object imposes constraints on the posture being used so that the applied forces coordinate and therefore prevent the object from slipping by maintaining a stable grasp (Jenmalm et al., 1998). This stable grasp is attained by positioning fingers at contact points on the object surfaces. Therefore, object size and shape will also influence both the number of fingers as well as their contact locations during grasping (MacKenzie and Iberall, 1994). These findings on hand adaptation to the size and shape of target objects represent a solid base for suggesting the use of the hand as an automatic measurement device acting on the objects users are interacting with.

The intended use of the target object was also found to influence hand posture. For example, MacKenzie and Iberall (1994) note that: "when a person reaches out to grasp an object, the hand opens into some suitable shape for grasping and manipulation—suitable, in the sense that the person's understanding of the task influences the shape of the hand. As well, the hand posture adapted for drinking is dramatically different from that used to throw the mug at someone else." (p. 4). Therefore, a variety of postures are expected to be associated with a given object during manipulation. This has a huge impact on the algorithm used for recognition, suggesting that a nearest-neighbor approach as in Paulson et al. (2011) must be backed up with a properly sized set of training examples (as shown later in the Results section).

Researchers have found that people choose between different hand movement patterns when inspecting objects in order to learn about their various properties (such as surface texture, weight, temperature, etc.). These stereotypical movements have been coined as exploratory procedures in the study of Lederman and Klatzky (1987), and they include lateral motions for identifying texture, pressure for hardness, unsupported holding for weight, and contour following and enclosure for shape. The observation regarding the existence of such standard exploratory patterns is very interesting from an algorithmic perspective, as it informs on the postures people are likely to use when showing interest in some specific property of the target object.

Other researchers have tried to propose classificatory systems for the postures adopted by the hand while grasping (see MacKenzie and Iberall, 1994, pp. 19 and 20 for a review). Two postures occur frequently in most classifications: the power grip and the precision grip. The first one is used when the objective of the task is force (e.g., using a heavy tool such as a hammer). The posture in this case is characterized by a large contact area and little movement from the fingers, as the grasp needs to be stable. The precision grip is usually accomplished using the thumb, index, and sometimes middle fingers, so that the primary objective is precise control rather than force. The tip, palmar, and lateral pinches are examples of the precision grip (Cutkosky and Howe, 1990).

These findings represent a great starting point for understanding the hand as an instrument. We used knowledge from these works in order to inform the design of our experiment in accordance with the already established practices from psychology. Also, observations on hand grasp patterns were exploited in the Recognition and Results sections of this work in order to explore the design space of the Euclidean distance for computing the dissimilarity between hand postures.

2.2. Acquisition and recognition of hand postures

Many technologies exist today for capturing hand postures, including gloves with flexure sensors, special markers working with IR tracking systems, or video cameras. They differ in acquisition resolution in terms of finger position accuracy and data sampling rate. For example, sensor gloves allow fine detection of finger flexure and precise hand orientation and are able to capture all the degrees of freedom exhibited by the human hand at accurate resolutions. For this reason, they are the preferred equipment for motor control experiments (Santello and Soechting, 1998; Santello et al., 2002). Also, they have been employed extensively for interactive applications (Baudel and Beaudouin-Lafon, 1993; Paulson et al., 2011; Sturman and Zeltzer, 1994) as they allow rapid prototyping, enabling HCI researchers to concentrate on interaction techniques rather than on the acquisition technology itself. Color gloves and computer vision have also been shown recently to work remarkably well (Wang and Popovic, 2009). Rather than describing here the multiple options available today for acquiring measurements of the human hand for interactive purposes, we refer the reader to extensive surveys such as Erol et al. (2007) and Wachs et al. (2011).

When it comes to recognition algorithms, researchers have proposed different techniques depending on whether hand postures or gestures need to be recognized. Although advanced algorithms such as Hidden Markov Models (Chen et al., 2003) and neural networks (Tsai and Lee, 2011) are being used for recognizing dynamic hand gestures, classification of hand postures is frequently performed using simpler approaches such as the nearest-neighbor algorithm (Paulson et al., 2011; Wang and Popovic, 2009). The approach consists in comparing the hand posture to be classified to previously recorded samples available as a training set. The comparison is guided by the use of a distance such as the Euclidean metric. The candidate hand posture is recognized as belonging to the class of its closest sample or neighbor in the training set. Although simple, this approach has been demonstrated to work very well when the training set is properly sized in order to model adequately the distribution of each posture class (a data set of around 100,000 entries was reported in Wang and Popovic, 2009). Also, the nearest-neighbor classifier presents several advantages such as flexibility and adaptability, which encouraged its adoption for gesture recognition in the human–computer interaction community (Appert and Zhai, 2009; Li, 2010). The most notable advantage is represented by the fact that new or user-specific gestures can be added by simply providing training examples without the need of rethinking the structure or retraining the classifier.

Although hand posture recognition has been shown to work with high accuracy rates, a distinction must be explicitly made between using hand posture in order to send commands (for which the number of commands is usually small and limited by the human ability to learn and recall) and the attempt to classify all distinct hand postures used for grasping. For the latter, the psychology community has expressed concerns with regard to developing classificatory systems for grasping hand postures. According to Jones and Lederman (2006), researchers have found it challenging to develop hand function taxonomies for the purpose of predicting the hand posture during grasping by only considering function and object geometry. Such concerns must be carefully taken into consideration. First of all, from the recognition point of view, the training set needed for recognizing hand postures associated with the same object should probably contain a large number of examples, as we have already stated before. Second, the technique of inferring object properties from hand postures is very likely to have limitations which must be understood and investigated (and we show less accurate recognition results for user-independent training). Therefore, our research approach is to start investigating the technique with a thoroughly controlled experiment so that designers and practitioners of such an interactive technique would have a solid base to rely on. The challenge is to understand to what extent such a technique can be used reliably in practice. The design of our experiment described in the next section of the paper considers this aspect.

3. Experiment

3.1. Premises for experiment design

Findings from motor control theory show that the hand posture changes according to the size (Jakobson and Goodale, 1991) and the shape (Santello and Soechting, 1998) of the grasped object. However, variations in posture are also determined by the intended use of the object (Napier, 1956), which makes it difficult to create complete taxonomies of the grasping hand (Jones and Lederman, 2006) based on object geometry alone. However, even within this limitation, we hypothesize that object size and shape could still be robustly recognized if a large sample of data were available. Even if current hand taxonomies cannot be complete from this point of view (Jones and Lederman, 2006), we rely nonetheless on such classificatory systems as they represent the most reliable sources in order to inform the design of our experiment.

From all the existing taxonomies (see MacKenzie and Iberall, 1994, pp. 19 and 20 for a review), it is worth noting a very early and simple yet effective classification of prehensile postures into just six categories from Schlesinger, in a work from 1919 cited and described by MacKenzie and Iberall (1994, pp. 17 and 18). This minimal set was constructed by considering the specific functionality needed for grasping or holding various objects and is composed of the cylindrical, spherical, tip, hook, palmar, and lateral postures. As they are important to our experiment design, we briefly describe each of them below (see also Fig. 1):

1. Cylindrical: the open fist grip used for grasping tools, or a closed fist for thin objects.
2. Spherical: the fingers are spread and the palm arched while adapting to grasp spherical objects such as a ball.
3. Tip: fingers grasp sharp and small objects, such as a needle or pen.
4. Hook: used for heavy loads such as suitcases.
5. Palmar: used for flat and thick objects.
6. Lateral: the thumb is primarily used in order to grasp thin and flat objects such as a piece of paper.

Fig. 1. A very early (1919) and simple yet effective classification of prehensile postures into six categories from Schlesinger (MacKenzie and Iberall, 1994, pp. 17 and 18): (1) cylindrical, (2) spherical, (3) tip, (4) hook, (5) palmar, and (6) lateral hand postures.

Fig. 2. Objects used for the experiment consisting in six basic shapes (cube, parallelepiped, cylinder, sphere, pyramid, and surface) and three different sizes (small, medium, and large). All objects were made out of light wood.

Schlesinger's classification incorporates three critical notions: object shape (cylindrical, spherical), hand surface (tip, palmar, lateral), and hand shape (hook, closed fist, open fist). Therefore, by considering basic object geometries in accordance with these criteria, an informed investigation on the feasibility of using hand posture for object or action recognition can be reliably conducted. As a side discussion, a problem with this simple classification is that it does not take into account the actual task for which objects are being used, as suggested by Napier (1956). For example, the postures employed to pick up a pen and to write with it are different. Napier's classification argued therefore that the most important influence on the chosen posture is the goal of the task, although the influence on the posture comes from various sources (object shape, size, surface, etc.). However, for the purpose of this study we are interested in extracting object properties such as size and shape rather than inferring the intended use. We believe it is more rigorous to start with simpler research questions which, once answered, construct the base for more future work on the uses and applications of this technique.

We therefore selected six basic shapes for our experiment (cube, parallelepiped, pyramid, sphere, cylinder, and surface), as informed by the classification of prehensile postures of Schlesinger (MacKenzie and Iberall, 1994, pp. 17 and 18). For each shape, three different sizes were considered and labeled for our reference as small, medium, and large, for which the actual size references were 2, 4, and 8 cm, applied to each object in accordance with its specific geometry. The geometries of the 18 objects used for the experiment are illustrated in Fig. 2.

3.2. Participants

Twelve right-handed participants volunteered to take part in the experiment (ages between 18 and 24). None of them had worked with a data glove before. All subjects were naive as to the goals of the experiment.


3.3. Apparatus

Hand postures were captured using the 5DT Data Glove 14 Ultra (http://www.5DT.com) equipped with 14 optical sensors. The glove measures finger flexure (two sensors per finger placed at the first and second joints) as well as the abduction between the fingers (four sensors); see Fig. 8b and the 5DT glove user's manual (http://www.5dt.com/downloads/dataglove/ultra/5DTDataGloveUltraManualv1.3.pdf). The glove was connected to a desktop PC on the USB port. Data from the glove were calibrated for each user, with each sensor measurement being stored as a real value in [0..1] using 12 bits of precision. The material of the glove is stretch lycra so that it fits the user's hand and hence reduces the discomfort while wearing the equipment. Data were captured at a frequency of 60 Hz.
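To make the data format concrete, the sketch below shows the kind of per-user calibration described above: raw sensor readings (e.g., 12-bit integers) are mapped into [0..1] using the minimum and maximum values recorded for each of the 14 sensors during calibration. The function and array names are illustrative assumptions; the actual 5DT SDK calls are not shown.

```python
import numpy as np

def calibrate(raw_frame, per_user_min, per_user_max):
    """Map one frame of 14 raw sensor readings to [0..1] using
    per-user calibration bounds recorded for each sensor."""
    raw = np.asarray(raw_frame, dtype=float)
    lo = np.asarray(per_user_min, dtype=float)
    hi = np.asarray(per_user_max, dtype=float)
    return np.clip((raw - lo) / np.maximum(hi - lo, 1e-9), 0.0, 1.0)
```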

3.4. Tasks and design

The experiment consisted in two tasks for which hand postures were captured. For the first task we were interested in hand postures employed during object translation, while for the second we captured postures used during object exploration. The tasks are different with respect to the variation in hand posture that can be collected. For the translation task, the objects are likely to be held firmly in a stable grip while they are moved from one location to another, while for the second task a large variety of postures will presumably be employed while exploring the object. We therefore expect small hand posture variation for the first task and large variation for the second. The motivation of using the two tasks was to acquire a sufficient amount of samples that would be representative for grasping and manipulating a given object, as well as to compare the feasibility of the technique when either firm grasps or exploratory postures are being used.

3.4.1. Task 1: object translation

Participants stood in front of a table on which all the objects were placed at an easy arm-reach distance. The initial location of the objects was on the participants' right side of the table. Participants were asked to grab the objects placed on their right and move them to the left side. Only the dominant hand (right hand for all subjects) had to be used. The order in which objects had to be moved was randomized by a software application which displayed a text description concerning the current trial, e.g., "Move the small cylinder to your left". The experimenter was present during all the trials in order to record the grasping motion: the recording started with a key press of the experimenter at the moment when the participant first touched the object and ended (again with a key press) when the participant released the object. The objective of the task was to capture hand postures used when grabbing and maintaining a stable grip posture for reliable translation of objects. The task took approximately 5 min to perform.

3.4.2. Task 2: object exploration

For the second task, participants were asked to pick up each object and perform a series of explorations on it. Again, only the dominant hand had to be used. The exploration consisted in performing a predetermined sequence of rotations on each object, which would require fingers to be used frequently. The objective of the task was to capture as many different postures as possible that can be associated with a given object of a specified shape and size. The order in which objects were manipulated was randomized by the software application, as was the sequence of rotations that had to be performed. Prior to the experiment, six different labels were glued on each object representing the numbers from 1 to 6. For each trial, the application generated the rotations by randomly choosing a sequence of numbers that was displayed to the participant, e.g., "Grab the medium pyramid and search for the sequence 2–5–4–2–1–3–5–1". Participants had to search for each number in the sequence in order to complete the trial. Evidently, the sequence of numbers was not important: the main advantage of using it was that users were given the freedom to explore the object themselves (by looking for the next number in the sequence). The experimenter ensured that participants performed the entire sequence before moving to the next trial.

This second task was used in order to capture multiple measurements of postures that each participant chose to perform with a specific object. Participants were given the freedom of exploring the object with no instruction other than using one hand only. Postures were captured starting with the moment when the participant first touched the object until the object was released (as in the first task, the two events were explicitly marked by the experimenter with a press of a key). The task took around 15 min to complete.

3.5. Recognition and analysis apparatus

We describe below the definitions employed during our analysis. We introduce the definition for hand posture, the distance between two postures, and the main algorithm employed for classifying a new posture.

A hand posture p is represented as a 14-feature vector where each feature p_i encodes either the flexure or the abduction distance measured by the glove sensors in the interval [0..1], with 1 denoting maximum flexure/abduction (see the previous section for the technical description of the glove):

p = (p_1, p_2, \ldots, p_{14}) \in \mathbb{R}^{14}    (1)

This representation can be used directly in order to compute the dissimilarity between two hand postures in the form of the Euclidean distance:

d(p, q) = \|p - q\| = \left( \sum_{i=1}^{14} (p_i - q_i)^2 \right)^{1/2}    (2)

A small distance between two posture measurements indicates a strong similarity. The maximum value of the Euclidean distance between two postures having 14 normalized features is \left( \sum_{i=1}^{14} (1 - 0)^2 \right)^{1/2} = 14^{1/2} \approx 3.74.

Classification of a new hand posture p is computed using the nearest-neighbor rule (Paulson et al., 2011; Wang and Popovic, 2009). If T = \{ p_j \mid j = 1, \ldots, |T| \} represents the set of training samples, p is classified to the class of p_k for which

k = \arg\min_{j = 1, \ldots, |T|} \, d(p, p_j)    (3)
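As a minimal sketch (not the authors' code), the posture representation, the Euclidean distance of Eq. (2), and the nearest-neighbor rule of Eq. (3) can be written as follows; the optional weights parameter anticipates the weighted distance introduced later in Eq. (7).

```python
import numpy as np

def euclidean_distance(p, q, weights=None):
    """Dissimilarity between two 14-feature hand postures (Eq. 2).
    An optional weight vector anticipates the weighted variant (Eq. 7)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, dtype=float)
    return np.sqrt(np.sum(w * (p - q) ** 2))

def nn_classify(posture, training_postures, training_labels, weights=None):
    """Nearest-neighbor rule (Eq. 3): return the label of the closest training sample."""
    distances = [euclidean_distance(posture, t, weights) for t in training_postures]
    return training_labels[int(np.argmin(distances))]
```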

Fig. 3. Within-object variation in hand postures vs. object size. Error bars represent 95% CI.

4. Results

We try to understand whether hand posture can be reliably used in order to discriminate between grasped objects that exhibit fine shape and size differences. Also, we analyze whether size and shape can still be inferred during an exploration task for which multiple finger configurations are likely to be used. The problem is subtle and was recognized before as being difficult because of the large variability of the grasping hand (Jones and Lederman, 2006): "because the hand usually adopts different grasps during the performance of a task, it has been difficult to develop taxonomies of hand function that can predict hand grasps from a specification of task requirements and object geometry" (p. 140). However, previous work (Wang and Popovic, 2009; Wang et al., 2011) showed that nearest-neighbor classifiers work particularly well for recognizing hand postures when employing a large set of training samples, which gives hope for proposing a robust solution to the problem.

4.1. Data set

A data set of approximately 44,000 hand postures was collected from the 12 participants during the first task (object translation), with an average count of 200 postures per object (SD = 77), which is equivalent to firmly holding each object for approximately 3–4 s. For the second task, approximately 200,000 postures were collected, with an average of 925 postures per object (SD = 295), which is equivalent to approximately 15 s (SD = 4.9) of exploration time per object and participant.

4.2. Hand posture variation

We start by measuring the amount of variation in hand posture in relation to the variability in grasping objects observed in motor control theory (Jones and Lederman, 2006). For the first task, participants used a stable hand posture in order to hold the grasped object firmly while moving it to a new location. Therefore, the data acquired for each participant and each object should represent roughly the same posture, and we expect a small amount of variation. However, for the second task, which involved object exploration, we expect that variation in hand posture will increase significantly because of the different finger configurations employed.

Let A = \{a_1, a_2, \ldots, a_{|A|}\} be the set of all hand postures acquired for object A for a given participant during either task 1 or 2. We define the within-object posture variation as follows:

w(A) = \frac{1}{|A|} \sum_{i=1}^{|A|} \|a_i - \bar{a}\|^2    (4)

where \bar{a} represents the average posture for set A and \|\cdot\| the Euclidean distance between two postures defined in the previous section. The average posture \bar{a} of the set is obtained by averaging the values of each sensor for all postures in the set:

\bar{a} = \left( \frac{1}{|A|} \sum_{i=1}^{|A|} a_{i,1}, \; \frac{1}{|A|} \sum_{i=1}^{|A|} a_{i,2}, \; \ldots, \; \frac{1}{|A|} \sum_{i=1}^{|A|} a_{i,14} \right)    (5)

where each posture a_i is described by 14 measurements a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,14}) as provided by the 5DT data glove. The within-object variation w(A) measures how much each individual hand posture deviates from the center of the set with respect to the Euclidean distance. If hand postures are roughly the same, then we expect to find a small amount of variation.
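A short sketch of Eqs. (4) and (5), assuming the postures of one set are stacked into a NumPy matrix with one 14-feature row per sample:

```python
import numpy as np

def within_object_variation(postures):
    """Eqs. (4)-(5): mean squared Euclidean deviation of each posture from the
    average posture of the set. `postures` is an |A| x 14 matrix."""
    A = np.asarray(postures, dtype=float)
    mean_posture = A.mean(axis=0)                              # Eq. (5)
    return np.mean(np.sum((A - mean_posture) ** 2, axis=1))    # Eq. (4)
```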

We computed the values of the variation for both the translation and exploration tasks. As expected, variation was significantly larger for the exploration task (mean 0.20, SD = 0.11) than for the translation task (mean 0.10, SD = 0.08), as shown by a Wilcoxon signed-rank test (z = -10.084, p < 0.001, r = 0.49). Interestingly, there was a significant effect of object size on variation, as indicated by a Friedman test for both translation (χ²(2) = 6.778, p < 0.05) and exploration (χ²(2) = 57.000, p < 0.001). Fig. 3 illustrates the variation values for the two tasks. The findings show that the larger objects are, the more options for grabbing and holding them in a stable posture are possible during translation and especially during exploration. A significant effect of object shape was also found for both translation (χ²(5) = 22.984, p < 0.001) and exploration (χ²(5) = 43.095, p < 0.001). Fig. 4 illustrates the variation measured for each shape. Post-hoc tests revealed further insight on hand posture variation. Wilcoxon signed-rank tests (Bonferroni corrected) showed no significant differences between small and medium or medium and large sizes for the translation task. Also, nonsignificant differences were found between parallelepiped and cylinder or between cube and sphere (expected due to similar geometries). The differences were, however, significant for the exploration task.

Fig. 4. Within-object variation in hand postures vs. object shape. Error bars represent 95% CI.


4.3. Shape and size recognition accuracy

The measurements on variation confirmed the observations from motor control theory that grasping different sizes and shapes involves different hand postures. Also, different amounts of variation were found for the translation and exploration tasks. Under these conditions, we investigate in the following whether hand posture information is robust enough to discriminate reliably between the size and shape of the objects from our set.

In order to estimate recognition accuracy for object shape and size, the following approach was adopted. The hand posture data set, representing a continuous recording for each participant, object, and task type (translation or exploration), was divided into training and testing sets. Specifically, a fixed time window of w postures was randomly extracted from the recorded timeline while all the remaining postures were used for training (see Fig. 5). Recognizers employed the information contained in the testing set of w postures in different ways (see below) in order to deliver a classification decision on the object shape and size. In order to estimate recognition accuracy, we repeated the window extraction procedure 100 times for each object and each participant and counted how many times each recognizer was correct:

\text{Recognition accuracy} = \frac{\text{number of correct classifications}}{\text{100 random trials}} \times 100\%    (6)

This time window testing procedure corresponds exactly to how a recognition system would work in practice: posture data come as a continuous stream, with the recognizer buffering a number of w consecutive hand postures for which a classification decision needs to be delivered. When considering the high frequency response of today's data gloves (e.g., our 5DT glove delivers 60 measurements per second), the size of the window can be relatively large (e.g., up to 30–60 postures) for the classifier to take an informed decision by using more data. We performed trial tests with w = 15, 30, and 60 postures for the window size, which correspond to 0.25, 0.5, and 1 s, respectively, of continuous recording, and found similar recognition results. Therefore, in the following we only report results for w = 30 postures, which is equivalent to outputting a classification response every half second in a real-time working system.
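A sketch of the evaluation loop of Eq. (6), under the assumption that each recording is a chronologically ordered N x 14 matrix and that `classify_window` stands for any of the classifiers defined in the next subsection (the names are illustrative, not the authors' code):

```python
import numpy as np

def estimate_accuracy(recordings, classify_window, w=30, trials=100, seed=0):
    """Eq. (6) evaluation sketch. `recordings` maps each object label to a
    chronologically ordered N x 14 posture matrix for one participant/task."""
    rng = np.random.default_rng(seed)
    correct, total = 0, 0
    for label, rec in recordings.items():
        for _ in range(trials):
            start = rng.integers(0, len(rec) - w + 1)
            window = rec[start:start + w]                 # testing set
            # training set: everything outside the window, plus all other objects
            train_X, train_y = [], []
            for other_label, other_rec in recordings.items():
                data = (np.vstack([other_rec[:start], other_rec[start + w:]])
                        if other_label == label else other_rec)
                train_X.append(data)
                train_y.extend([other_label] * len(data))
            train_X = np.vstack(train_X)
            correct += classify_window(window, train_X, train_y) == label
            total += 1
    return 100.0 * correct / total
```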

We start our recognition analysis by using the results of Paulson et al. (2011), who tested different classifiers and feature spaces and found that nearest-neighbor classifiers working with either raw or principal component analysis features deliver the best recognition performance for the office tasks scenario. We therefore selected the nearest-neighbor approach for our own analysis as well. The technique of Paulson et al. (2011) was to average all the postures recorded while performing a task in the office (e.g., drinking from a cup) and use the averaged result as the training prototype for each class. For our problem, this led to all the training hand postures recorded during translation and exploration being averaged into a single representative posture. When we tried this approach we obtained recognition rates for shape and size of 64.3% (SD = 12.5%) for the first task and 42.9% (SD = 12.0%) for the second task (recognition rates were computed for each user and averaged values are reported), which are considerably lower than the results reported by Paulson et al. (2011) for recognizing the 12 office activities. However, this result is somewhat expected as we work with objects between which much finer differences exist, such as a cube and a sphere. Also, our data for the second task come from object exploration and therefore exhibit greater variation. Averaging out all these data practically ignores object specificity and, as shown by the results, does not capture accurately the structure of each class in the sample space. This also shows that when going beyond rather distinct activities that only require firm postures, the particular problems identified by motor theorists with respect to constructing classification taxonomies for hand postures (Jones and Lederman, 2006) start to reveal themselves for pattern recognition practitioners as well.

As the averaging technique did not output satisfactory results, we investigated additional enhancements for the nearest-neighbor classifier that would take into account all the measured variance in our data. Therefore, we tried multiple classifiers, which we describe below.

Fig. 5. The hand posture data set for a given participant, object, and task type is divided into training and testing. A randomly selected time window of size w makes up the testing set while all the remaining postures go to training.

We named the classifiers using the convention TRAINING-TESTING-TECHNIQUE, where TRAINING and TESTING reflect how the training and testing sets are used (e.g., whether we average all the postures as in Paulson et al. (2011) – AVG – or rather use every posture individually – RAW), while TECHNIQUE represents either the nearest-neighbor (NN) or the K-nearest-neighbor (KNN) classification approach. For example, the RAW–AVG–NN classifier employs the nearest-neighbor technique to classify a candidate hand posture obtained by averaging all the postures from the testing set (AVG) and compares the average posture against all the individual (RAW) data stored in the training set. The different versions of the classifiers are described below:

1. AVG–RAW–NN classifier: Training postures are averaged into a single posture for each object and task type, which serves as a prototype or representative pattern. A single posture is randomly selected from the testing set (the time window of size w) and classified by comparing it via the Euclidean distance against the prototypes of all objects in accordance with the nearest-neighbor technique. The random selection of the training/testing sets and the random selection of the tested posture are repeated 100 times in order to estimate the recognition accuracy of the technique as per Eq. (6).

2. AVG–RAW–KNN classifier: The training set is averaged while raw testing postures are used, as in the previous AVG–RAW–NN approach. However, this classifier uses the entire testing set (rather than one single posture) in order to deliver the classification result. The reported result corresponds to the most frequently detected class when classifying each individual posture in the entire testing window of size w. This approach uses the principle of K-nearest-neighbor classification in order to return a more informed decision. To make a simple comparison with the previous technique, the AVG–RAW–KNN classifier holds out its decision for w = 30 postures rather than classifying and reporting the result for each new posture the data glove is streaming. Again, the random selection of the training/testing sets is repeated 100 times.

3. RAW–AVG–NN classifier: All the postures are used for training (raw data) but we average the postures from the testing set into a single candidate that is submitted to classification. The averaged posture is compared against all the samples from the training set using the nearest-neighbor approach. The idea behind this approach is that the nearest-neighbor classifier will operate better as the sample space is more populated (i.e., the class distributions of each object type are more dense). This acts as a brute force implementation.

4. RAW–RAW–NN classifier: All the postures are used for both training and testing without computing any averages. One single posture from the testing set is randomly selected and submitted to classification. The process is repeated 100 times as in the previous approaches.

5. RAW–RAW–KNN classifier: All the postures are used for both training and testing, but we output the most frequently detected class for the entire time window w. This combines the brute force of RAW–RAW–NN with the group-weighted decision of KNN (see the sketch after this list). Training and testing sets are randomly selected 100 times in order to estimate recognition accuracy.

6. AVG–AVG–NN classifier: This is the classification procedure of Paulson et al. (2011) for which we already reported classification results. We include it here for completeness purposes.
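For illustration, the RAW–RAW–KNN variant can be sketched by reusing the `nn_classify` function from the Section 3.5 sketch and taking a majority vote over the buffered window (a sketch under those assumptions, not the authors' implementation):

```python
from collections import Counter

def raw_raw_knn_classify(window, training_postures, training_labels, weights=None):
    """Classify every raw posture in the buffered window with the nearest-neighbor
    rule against all raw training postures, then report the most frequent class."""
    votes = [nn_classify(p, training_postures, training_labels, weights) for p in window]
    return Counter(votes).most_common(1)[0][0]
```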

Besides these classifiers, which are directly derived from the nearest-neighbor approach, we also tested two state-of-the-art machine learning techniques that have been employed before for recognizing static hand postures (Kelly et al., 2010; Pizzolato et al., 2010; Rashid et al., 2010).

7. Multilayer perceptron (MLP): Two MLP architectures were used for the size and shape classification problems. Pretests were conducted in order to determine the optimal number of neurons in the hidden layer that would deliver the best classification result. The final MLP architectures for which we report results are 14 × 70 × 3 neurons for the size classification problem and 14 × 70 × 6 neurons for the shape problem. These architectures correspond to an input layer of 14 sensors, a hidden layer of 70 neurons, and a number of output neurons depending on the number of distinct classes (3 for the size and 6 for the shape classification problem).

8. Multiclass support vector machine (SVM): We used a single SVM architecture with a linear kernel with imperfect separation of classes and a C = 1 multiplier for outliers. The model was trained using a k-fold cross-validation technique and has one output coding the class to which the object belongs (3 sizes × 6 shapes). A k factor of 20 was found during pretests to deliver the best results.
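The paper does not state which implementations were used; a rough scikit-learn equivalent of the two architectures above (an assumption, with hypothetical training arrays X_train, y_shape, y_object) could look like this:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# MLP with 14 inputs, one hidden layer of 70 neurons, and one output per class
mlp_shape = MLPClassifier(hidden_layer_sizes=(70,), max_iter=1000)
mlp_shape.fit(X_train, y_shape)          # X_train: N x 14 calibrated postures

# Linear-kernel SVM with soft margin (C = 1) over the 18 size-shape classes
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_object)
```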

Results are displayed in Figs. 6 and 7 for both the translation and exploration tasks. The highest recognition rates were achieved when combining the classification results of individual raw postures across the entire time window (RAW–RAW–KNN). For the translation task, recognition rates were above 98% for both shape and size. For the exploration task, shape was recognized with 91% accuracy, size with 95.1%, while shape and size together were recognized with 90.1% accuracy. The Friedman test showed a significant difference between the recognition rates reported by the different techniques for both the translation (χ²(7) = 5635.862, p < 0.001) and exploration tasks (χ²(7) = 5956.323, p < 0.001).

Fig. 6. Recognition rates for object shape for different techniques based on the nearest-neighbor approach. Error bars represent 95% CI.

Fig. 7. Recognition rates for object size for different techniques based on the nearest-neighbor approach. Error bars represent 95% CI.

4.4. Recognizer design and findings from motor control theory

The large corpus of hand posture data that was acquired allowed us to verify existing observations from psychology on the independence of finger movements and preferred patterns in grasping objects, as well as to test whether these observations could help to increase the performance of our posture recognizers. We briefly summarize such findings under this section, report results on our own data, and devise new variants for the RAW–RAW–KNN recognizer as informed by these observations.

Although each finger taken alone can perform a wide range of motions, hand muscles act over many joints, which makes some fingers hard or impossible to control independently for some specific motions. This is referred to as co-activation (Schieber and Santello, 2004). The degree of independence of fingers in motor tasks has been quantified by Häger-Ross and Schieber (2000), who reported the thumb and index fingers as the most independent and the ring and middle fingers as the least independent. Correlations occur both for finger flexure and for abduction between fingers, as noted in the practical task of typing characters (Fish and Soechting, 1992).

We start by reporting results on a correlation analysis for finger movements in order to detect shared variance between sensor output values. Fig. 8(a and b) illustrates the color-coded Pearson correlation coefficients computed between all 14 sensors on the entire data set (N = 43,859 values for the translation task and N = 199,569 for the exploration task). About 96% of all coefficients (87 out of (13 × 14)/2 = 91) were significant at p = 0.01 (2-tailed). The sensor type and location are shown in Fig. 8(d) for convenience. Average correlations were 0.20 (SD = 0.15) for the translation task and 0.14 (SD = 0.16) for the exploration task. Fig. 8(c) illustrates the distribution of these coefficients, with the majority being less than 0.20, but also showing that 12% of the sensor pairings are highly correlated (between 0.40 and 0.80). The largest correlations were found between sensors 7 and 10 (r = 0.73 for translation, r = 0.66 for exploration), 10 and 13 (r = 0.61 and r = 0.75), 11 and 14 (r = 0.61 and r = 0.55), and 8 and 11 (r = 0.43 and r = 0.46). These results confirm observations from motor control such as the ring and middle fingers being the least independent (Häger-Ross and Schieber, 2000) (sensors 10 and 11 measure the ring finger and sensors 7 and 8 the middle finger).
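The correlation analysis and the complement-of-correlation weights used below in Eq. (7) can be sketched as follows; the `postures` matrix is a hypothetical N x 14 array for one task, and averaging absolute correlation values is an assumption about the exact procedure.

```python
import numpy as np

corr = np.corrcoef(postures, rowvar=False)            # 14 x 14 Pearson matrix
avg_corr = (np.abs(corr).sum(axis=0) - 1.0) / 13.0    # mean correlation with the other 13 sensors
weights = 1.0 - avg_corr                              # complement with respect to 1.0
```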

These findings can be used in order to inform a better design of the distance reporting dissimilarity between hand postures, one that takes into account shared variance between different sensors. The simplest way to achieve this is to adopt a weighting scheme, which makes the Euclidean distance become

d(p, q) = \|p - q\| = \left( \sum_{i=1}^{14} w_i \cdot (p_i - q_i)^2 \right)^{1/2}    (7)

where w_i \in [0..1] represents the normalized weight controlling the contribution of sensor i to the final result.

For each sensor we computed the average correlation with all the other 13 sensors (see Fig. 8(a and b)) and defined the weight for sensor i as the complement of the averaged correlation value with respect to 1.0. The resulting weights led to two new variants of the RAW–RAW–KNN classifier, as follows:

9. Translation-weighted RAW–RAW–KNN: The RAW–RAW–KNN technique is used together with the Euclidean distance employing weights derived by averaging the translation correlation results (see Fig. 8a):
w = {0.84, 0.85, 0.81, 0.80, 0.83, 0.78, 0.72, 0.78, 0.85, 0.73, 0.78, 0.79, 0.83, 0.85}

10. Exploration-weighted RAW–RAW–KNN: The RAW–RAW–KNN technique is used together with the Euclidean distance employing weights derived by averaging the exploration correlation results (see Fig. 8b):
w = {0.93, 0.93, 0.93, 0.82, 0.90, 0.88, 0.82, 0.88, 0.84, 0.78, 0.84, 0.81, 0.80, 0.84}

We continue our investigation and look further at studies that report on frequently used grasping patterns. For example, it was observed that people tend to use a three-finger grasp when reaching for most objects (the "tripod" grasp), while a pinch grip is sometimes employed for small objects (Jones and Lederman, 2006). Also, when the size or mass increases, four fingers are used to pick up the object (Cesari and Newell, 2000). Other researchers noted that the thumb position does not depend on the size of the object being grasped, which seems to influence mostly the position of the other two digits (index and middle fingers) (Gentilucci et al., 2003). The relative contribution of each individual finger force during grasping was studied by Kinoshita et al. (1995), who reported percentages of 42%, 27%, 18%, and 13% for the index, middle, ring, and little fingers, respectively, for five-finger grips.

Informed by these observations on grasping patterns (Cesari and Newell, 2000; Gentilucci et al., 2003; Jones and Lederman, 2006; Kinoshita et al., 1995), we considered three new variants for the RAW–RAW–KNN classifier, as follows:

11. 3-finger-weighted RAW-RAW-KNN: The Euclidean distance is weighted with 0/1 values in order to focus solely on three-finger grasps:

w = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}

12. 4-finger-weighted RAW-RAW-KNN: The Euclidean distance is weighted with 0/1 values in order to focus solely on four-finger grasps:

w = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0}

13. Force-weighted RAW-RAW-KNN: The Euclidean distance is weighted with force percentages as informed by the study of Kinoshita et al. (1995):

w = {0.0, 0.0, 0.0, 0.42, 0.42, 0.0, 0.27, 0.27, 0.0, 0.18, 0.18, 0.0, 0.13, 0.13}
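For illustration only, any of these weight vectors can be plugged into the weighted_euclidean() sketch given earlier; posture_a and posture_b below are hypothetical posture vectors, not data from the experiment.

# Hypothetical use of the weight vectors above with weighted_euclidean()
w_three_finger = [1.0] * 8 + [0.0] * 6
w_force = [0.0, 0.0, 0.0, 0.42, 0.42, 0.0, 0.27, 0.27,
           0.0, 0.18, 0.18, 0.0, 0.13, 0.13]
# d = weighted_euclidean(posture_a, posture_b, w_force)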


Fig. 8. Correlation analysis between the values reported by sensors installed at various locations: (a) Pearson correlation coefficients computed on the translation task data (N = 43,859; 96% of coefficients were significant at p = 0.01), darker colors show larger absolute values; (b) Pearson correlation coefficients computed on the exploration task data (N = 199,569; 97% of coefficients were significant at p = 0.01); (c) frequency histogram of individual correlation values showing that 12% of sensor pairs are highly correlated, between 0.40 and 0.80; and (d) actual location of the 14 sensors on the glove: 10 sensors measure finger flexion (shown in white) and four sensors measure abduction between fingers (shown in black). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


The recognition rates for object shape and size for the new classifiers are displayed in Figs. 9 and 10, respectively. The performance of the unweighted RAW-RAW-KNN technique is shown for comparison convenience. The Friedman test showed a significant difference between the recognition rates reported by the different techniques for both translation (χ²(5) = 351.851, p < 0.001) and exploration tasks (χ²(5) = 1098.048, p < 0.001).

When analyzing size recognition accuracies, post-hoc Wilcoxon signed-rank tests showed nonsignificant differences in the translation task between the unweighted and four-finger grasp techniques (Z = −0.991, n.s.) and between the unweighted and correlation-weighted techniques (Z = −0.964, n.s.), and a significant but small Cohen effect difference between the unweighted and three-finger grasp techniques (Z = −9.563, p < 0.001, r = 0.20). For the exploration task, Wilcoxon tests revealed a nonsignificant difference between unweighted and correlation-weighted (Z = −0.739, n.s.) and a significant yet small effect difference between unweighted and four-finger grasp (Z = −8.299, p < 0.001, r = 0.17). When analyzing shape recognition performance, tests showed nonsignificant differences between unweighted and four-finger grasp (Z = −0.790, n.s.) for the translation task and nonsignificant differences between unweighted and correlation-weighted for both translation and exploration tasks.


Fig. 9. Recognition rates for object shape using the RAW–RAW–KNN technique and weighted Euclidean distances. Error bars represent 95% CI.

Fig. 10. Recognition rates for object size using the RAW–RAW–KNN technique and weighted Euclidean distances. Error bars represent 95% CI.


These results show little influence of the ring and little fingers on the classification result for the size problem, while these fingers bring useful information for recognizing shape. In the end, the unweighted RAW-RAW-KNN technique using all data seems to deliver the best performance with low implementation overhead.
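For readers who wish to reproduce this kind of comparison, a minimal sketch using scipy is given below; the choice of scipy is our assumption, since the paper does not name its statistics software.

from scipy import stats

def compare_techniques(rates_per_technique):
    # rates_per_technique: list of equal-length sequences of recognition
    # rates, one sequence per classification technique (matched samples).
    chi2, p = stats.friedmanchisquare(*rates_per_technique)
    # Example post-hoc comparison between the first two techniques
    w_stat, p_pair = stats.wilcoxon(rates_per_technique[0],
                                    rates_per_technique[1])
    return chi2, p, w_stat, p_pair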

4.5. Improving real-time classification performance

The previous sections identified RAW-RAW-KNN as the technique delivering the best classification performance by employing raw training and testing data. However, the size of these sets can be quite large in practice, which may impact the real-time performance of the system. Fig. 11 illustrates the response time needed for each technique to produce a classification result. Times were measured on a 2.40 GHz Intel CoreDuo Quad CPU for an average training set size of approximately 3000 postures for the translation task and 16,000 postures for the exploration task. Both RAW-RAW classification techniques show considerably larger execution times due to their use of each individual sample from the training and testing sets. Even though classifications are reported within 0.63 s, which just meets the requirements of a real-time system, in the following we are interested in using a smaller training set for the same classification task (with the purpose of reducing both the memory needed to store samples and the execution time of classification). Therefore, we explore filtering the time line of postures associated to a given object by eliminating postures that are too similar with respect to some threshold ε. Procedure FILTER-TRAINING-SET(...) lists the pseudocode describing this process.

Algorithm 1. FILTER-TRAINING-SET (trainingSet).

reference ← trainingSet[0]
filteredSet ← {reference}
for i ← 1 to sizeof(trainingSet) − 1 do
    distance ← Euclidean-Distance(reference, trainingSet[i])
    if distance > ε then
        reference ← trainingSet[i]
        filteredSet ← filteredSet ∪ {reference}
    end if
end for
return filteredSet
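A compact Python rendering of this procedure, under our assumptions about the data layout (the training set is a chronologically ordered list of 14-element posture vectors):

import numpy as np

def filter_training_set(training_set, epsilon):
    # A posture is kept only when it lies farther than epsilon (Euclidean
    # distance) from the last posture that was kept, as in Algorithm 1.
    reference = np.asarray(training_set[0], dtype=float)
    filtered = [reference]
    for posture in training_set[1:]:
        posture = np.asarray(posture, dtype=float)
        if np.linalg.norm(posture - reference) > epsilon:
            reference = posture
            filtered.append(reference)
    return filtered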

The variation analysis for the first task from a previous subsection shows a mean value of 0.10, which represents the average squared Euclidean distance.


Fig. 11. Classification times for different techniques measured on a 2.40 GHz Intel CoreDuo Quad CPU.

Fig. 12. The effect of filtering on recognition rate. The 95% CI error bars are too small to display.

Fig. 13. The effect of filtering on the size of the training set. The 95% CI error bars are too small to display.


This suggests a filtering value of ε = √0.10 ≈ 0.32 to use between consecutive postures. However, in order to test the effect of filtering, we experimented with various thresholds running from 0.1 to 0.5 in increments of 0.1. We only provide results for the RAW-RAW-KNN approach, which showed the best performance in the previous test. Figs. 12 and 13 illustrate the effect of filtering on both recognition rate and training set size. As expected, recognition rates become lower as the size of the training set decreases. The Friedman ANOVA test showed significant differences for size and shape recognition rates for both tasks (at p < 0.001). However, recognition rates are still large enough for practical purposes for low ε filters. For the translation task, the size recognition rate is above 97% and the shape rate above 95% for ε = 0.1 and 0.2 (which reduces considerably the number of training postures from 3000 to 268 and 131, respectively). Also, the size rate is above 93% and the shape rate above 90% for the exploration task using ε = 0.1 (corresponding to a reduction in training set size from 16,000 to 4000 hand postures). Filtering beyond ε > 0.2 accentuates the degradation of the recognition rate. When compared to the unfiltered training set, the ε = 0.1 option seems a good compromise.

The classification time for the translation task dropped from 118 ms for the unfiltered set to 11 ms for the ε = 0.1 filtered set (see Fig. 14). For the exploration task, classification time was reduced from 629 ms to 160 ms. These results, even for the unfiltered training set, show the applicability of the classification technique to real-time scenarios.

4.6. User-independent recognition results

The above recognition results were reported for user-dependent training, for which both testing and training data come exclusively from the same user. However, interactive systems would greatly benefit from user-independent scenarios, which eliminate the need for training before using the system. We therefore performed another recognition test in order to see how the user-independent training scenario would work for this technique. We used the RAW-RAW-KNN enhancement of the nearest-neighbor algorithm running on training sets filtered with ε = 0.1. In this scenario, hand postures from one participant were considered for testing while the rest of the participants were used for training. We repeated the recognition test with each participant's data acting as the testing set.


Fig. 14. The effect of filtering on classification times measured on a 2.40 GHz Intel CoreDuo Quad CPU.


However, the size recognition rate for the translation task was only 57.0% and the shape rate was 26.0%. Surprisingly, rates were similar for the exploration task (we expected them to be lower, as in the user-dependent scenario from the previous section). A total of 3400 and 46,000 postures, respectively, were used for the training sets of the two tasks. At this point of the analysis, we can conclude that user-dependent training is needed for robust results, although a task-oriented classification may provide better results, as informed by the observations of Napier (1956), who considered task type to have more influence on posture than object characteristics alone.
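The evaluation protocol described here is a leave-one-participant-out cross-validation; a minimal sketch follows, in which the data structure and the classify() function are assumptions rather than the authors' code.

def leave_one_participant_out(data_by_participant, classify):
    # data_by_participant: dict mapping a participant id to a list of
    # (posture, label) pairs; classify(train_pairs, test_postures) returns
    # a list of predicted labels. Returns per-participant accuracy.
    accuracies = {}
    for held_out, test_pairs in data_by_participant.items():
        train_pairs = [pair for pid, pairs in data_by_participant.items()
                       if pid != held_out for pair in pairs]
        test_postures = [p for p, _ in test_pairs]
        test_labels = [label for _, label in test_pairs]
        predicted = classify(train_pairs, test_postures)
        correct = sum(1 for a, b in zip(predicted, test_labels) if a == b)
        accuracies[held_out] = correct / len(test_labels)
    return accuracies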

4.7. Shared postures

It is interesting to understand how many hand postures are shared by two different objects during either translation or exploration. By analyzing the results of our recognition tests involving different nearest-neighbor techniques, it is clear that a certain amount of shared postures does exist. Also, looking at the low rates of the first three NN approaches (see Figs. 6 and 7), we can even hypothesize that there must be a relatively large percentage of such shared postures.

In order to quantify the amount of shared postures, we must introduce an appropriate measure. Let A and B be two sets of postures, A = {a_1, a_2, ..., a_|A|} and B = {b_1, b_2, ..., b_|B|}, associated to objects A and B. The amount of shared postures would be the cardinality of the intersection A ∩ B. However, when two postures are compared, the Euclidean distance needs to be used, which does not provide a yes/no equality answer but a positive value corresponding to the dissimilarity of the two postures. Also, we are interested in this problem from the perspective of the recognition rate. Therefore, in order to count how many postures from B could also be part of A (or are shared by A), we compute the ordered list of distances from all postures a_i ∈ A to some reference posture (e.g., the average ā of A). We then compute the distance from ā for every posture b_j from B and count those for which the distance is less than a threshold. Equivalently put, we count how many postures from B are as close to the center of A as are the postures from A. The threshold could be chosen as max_{i=1..|A|} \lVert a_i - \bar{a} \rVert. However, in order not to be biased by outliers, we remove 5% of the largest distances (and denote the threshold D_{0.95}). The number of postures from B that are shared by A, which we denote g(A|B), is therefore

g(A|B) = \left| \{ b_j \in B : \lVert b_j - \bar{a} \rVert < D_{0.95},\ j = 1, \ldots, |B| \} \right|

We can similarly define g(B|A) and use the two measures for the final definition of shared postures between A and B (normalized in [0, 1]) as follows:

g(A, B) = \frac{g(A|B) + g(B|A)}{|A| + |B|}    (8)
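A sketch of this measure in Python (our rendering; computing D_0.95 as the 95th percentile of the within-A distances is an assumption about how the 5% trimming was implemented):

import numpy as np

def shared_one_way(A, B):
    # g(A|B): how many postures of B lie within the D_0.95 radius of the
    # mean posture of A, where D_0.95 is the 95th percentile of the
    # distances from A's own postures to that mean.
    A, B = np.asarray(A, float), np.asarray(B, float)
    center = A.mean(axis=0)
    d95 = np.percentile(np.linalg.norm(A - center, axis=1), 95)
    return int(np.sum(np.linalg.norm(B - center, axis=1) < d95))

def shared_postures(A, B):
    # Normalized amount of shared postures between two objects (Eq. (8)).
    return (shared_one_way(A, B) + shared_one_way(B, A)) / (len(A) + len(B))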

Using this measure, we computed the shared postures between every pair of objects from our set. We found an average of 22.3% shared postures (SD = 24.4%) for the translation task, while for the exploration task the percentage was 65.1% (SD = 22.4%). The Wilcoxon signed-rank test showed a significant effect of task on the percentage of shared postures (z = −48.555, p < 0.001) with a large effect size (r = 0.57). Fig. 15 visually illustrates the amount of shared postures for every pair of objects in our set.

5. Discussion and design guidelines

Our findings show that hand posture can be used to recognize the size and shape of grasped objects, provided that user training procedures are run first. In order to assist designers and practitioners in implementing this technique in their applications, as well as to understand the overhead of the user training procedure, we summarize below the training stage and the parameters of the RAW-RAW-KNN classification technique in the form of design guidelines.

Results showed that user training is mandatory for obtaining high recognition rates. If the requirement of the application is to simply recognize object size and shape from a stable grasp, then approximately 5 s of continuous recording at 60 Hz proved sufficient for our translation task. If object properties are to be inferred during object exploration, then a manipulation time of about 20 s per object, consisting of repetitive rotations with the fingers, is sufficient. We arrived at the 20 s threshold by looking at the average manipulation time for the exploration task, which was 19.2 s (SD = 6.1). Fig. 16 illustrates the various manipulation times. We also found that large objects took more time to manipulate (21.6 s, SD = 7.8 s) than small (18.1 s, SD = 4.4) or medium sized ones (18.0 s, SD = 5.1), as shown by an ANOVA test (F(2, 213) = 8.305, p < 0.001). However, the actual differences are small and post-hoc tests confirm this conclusion. For example, Bonferroni post-hoc tests showed no significant difference between the time needed to manipulate small and medium objects. Also, no significant difference was observed between manipulation times of different object types, F(5, 210) = 1.758, n.s.


Fig. 15. Shared postures between the objects used in the experiment for translation (top) and exploration (bottom) tasks. The percentage of shared postures between two objects is computed as per Eq. (8). A darker color indicates a higher percentage. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 16. Object manipulation time needed for acquiring the training set of hand postures: 5 s for translation vs. 20 s for exploration tasks, considering posture data delivered at 60 Hz.


In the end, as a rough guideline, approximately 20 s of continuous object exploration at a measurement rate of 60 Hz will allow building a training set large enough for robust recognition results.

Concerning the recognition technique, we found that the RAW-RAW-KNN approach worked best. This means using raw individual postures for both training and testing, but combining the classification results over a window of size w = 30 (half a second of recording). We also found that a considerable reduction in classification time (as well as in the memory occupied by the stored training samples) can be safely achieved by filtering the data stream of postures with the Euclidean distance and a threshold of ε = 0.1.
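The window-based combination can be read as a vote over the last w frame-level decisions; the sketch below assumes a simple majority vote, since the combination rule is not spelled out here.

from collections import Counter, deque

def windowed_decision(frame_labels, w=30):
    # Combine per-frame nearest-neighbor labels over a sliding window of
    # w frames (about half a second at 60 Hz); majority vote is our
    # assumption for the combination rule.
    window = deque(maxlen=w)
    decisions = []
    for label in frame_labels:
        window.append(label)
        decisions.append(Counter(window).most_common(1)[0][0])
    return decisions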

5.1. Public hand posture data set and source code

A large amount of data was acquired for this work. In order for other researchers to be able to replicate and advance



our results, we decided to make the data set public together with the source code used for the recognition analysis. We make available both the hand posture data files (sensor measurements and recording time lines) and the source code on the corresponding author's web page.

6. Conclusion and future work

The paper explored the use of hand posture for recognizing the size and shape of grasped objects during both translation and exploration tasks. The results of an experiment designed in relation to observations from motor control theory show that size and shape can be recognized with rates of 98% for firm grasping (translation) and of 95% and 91% during object exploration for user-dependent training. Recognition tests showed that large training sets are needed for accurate results. However, a practical procedure was described for delivering low classification times with fewer resources (filtering), as well as for simple training of the system (20 s of continuous finger-aided rotations). Additional definitions and measurements for hand posture variation and shared postures were introduced and discussed.

We believe that the results presented in this paper shed more light on the opportunity of using the hand as a device for automatic detection of object properties in terms of size and shape. Future work will consider designing new recognition experiments in accordance with other taxonomies from motor control theory that also take into account the intended use of an object (Napier, 1956). We believe that restricting classification to the context of a given task might improve the recognition rate in the user-independent case, as the range of possible finger movements will be constrained by the actual task. Future work on user-independent recognition could also consider employing additional sensors such as accelerometers. Also, it would be interesting to investigate whether object properties can be inferred even sooner than the moment when the hand actually touches the object, or whether other object properties can also be detected (e.g., texture, by recognizing the exploratory procedures described by Lederman and Klatzky, 1987). The experiment conducted in this work made use of a data glove as the most reliable technology today for retrieving fast and accurate hand pose and finger flexure data. As sensing technology advances, noninvasive acquisition devices such as Kinect⁴ could be explored.

In the end, we believe that the results presented in this paper can benefit practitioners from pattern recognition and human-computer interaction by providing more insights on recognition algorithms and design guidelines for using the hand as a measurement device, but they may also prove useful to researchers from psychology and motor control interested in new tools for hand posture analysis. We also hope that the large corpus of hand posture data we provide will help advance state-of-the-art techniques for inferring object properties from hand measurements, leading HCI developments in free-hand interfaces towards the future world of ubiquitous computing.

⁴ http://www.microsoft.com/en-us/kinectforwindows/

References

Appert, C., Zhai, S., 2009. Using strokes as command shortcuts: cognitive benefits and toolkit support. In: CHI '09, pp. 2289–2298.

Baudel, T., Beaudouin-Lafon, M., 1993. Charade: remote control of objects using free-hand gestures. Communications of the ACM 36 (July), 28–35.

Buchholz, B., Armstrong, T., 1992. A kinematic model of the human hand to evaluate its prehensile capabilities. Journal of Biomechanics 25 (2), 149–162.

Buchholz, B., Armstrong, T., Goldstein, S., 1992. Anthropometric data for describing the kinematics of the human hand. Ergonomics 35 (3), 261–273.

Cesari, P., Newell, K., 2000. Body-scaled transitions in human grip configurations. Journal of Experimental Psychology: Human Perception and Performance 26 (5), 1657–1668.

Chen, F.-S., Fu, C.-M., Huang, C.-L., 2003. Hand gesture recognition using a real-time tracking method and hidden Markov models. Image and Vision Computing 21 (8), 745–758.

Cutkosky, M.R., Howe, R.D., 1990. Human Grasp Choice and Robotic Grasp Analysis. Springer-Verlag New York, Inc., New York, NY, USA, pp. 5–31.

Erol, A., Bebis, G., Nicolescu, M., Boyle, R.D., Twombly, X., 2007. Vision-based hand pose estimation: a review. Computer Vision and Image Understanding 108 (October), 52–73.

Farella, E., Cafini, O., Benini, L., Ricco, B., 2008. A smart wireless glove for gesture interaction. In: ACM SIGGRAPH 2008 Posters. SIGGRAPH '08, pp. 44:1–44:1.

Fish, J., Soechting, J.F., 1992. Synergistic finger movements in a skilled motor task. Experimental Brain Research 91, 327–334. http://dx.doi.org/10.1007/BF00231666.

Gentilucci, M., Caselli, L., Secchi, C., 2003. Finger control in the tripod grasp. Experimental Brain Research 149, 351–360. http://dx.doi.org/10.1007/s00221-002-1359-3.

Gentilucci, M., Castiello, U., Corradini, M., Scarpa, M., Umilta, C., Rizzolatti, G., 1991. Influence of different types of grasping on the transport component of prehension movements. Neuropsychologia 29 (5), 361–378.

Häger-Ross, C., Schieber, M.H., 2000. Quantifying the independence of human finger movements: comparisons of digits, hands, and movement frequencies. Journal of Neuroscience 20, 8542–8550.

Ilie-Zudor, E., Kemeny, Z., van Blommestein, F., Monostori, L., van der Meulen, A., 2011. Survey paper: a survey of applications and requirements of unique identification systems and RFID techniques. Computers in Industry 62 (April (3)), 227–252.

Jakobson, L.S., Goodale, M.A., 1991. Factors affecting higher-order movement planning: a kinematic analysis of human prehension. Experimental Brain Research 86, 199–208. http://dx.doi.org/10.1007/BF00231054.

Jenmalm, P., Goodwin, A.W., Johansson, R.S., 1998. Control of grasp stability when humans lift objects with different surface curvatures. The Journal of Neurophysiology 79 (April (4)), 1643–1652.

Jones, L.A., Lederman, S.J., 2006. Human Hand Function. Oxford University Press.

Kelly, D., McDonald, J., Markham, C., 2010. A person independent system for recognition of hand postures used in sign language. Pattern Recognition Letters 31 (August (11)), 1359–1368.

Kim, J.G., Kim, B.G., Lee, S., 2007. Ubiquitous hands: context-aware wearable gloves with a RF interaction model. In: Proceedings of the 2007 Conference on Human Interface: Part II, pp. 546–554.


Kinoshita, H., Kawai, S., Ikuta, K., 1995. Contributions and co-ordination of individual fingers in multiple finger prehension. Ergonomics 38 (6), 1212–1230.

Lederman, S.J., Klatzky, R.L., 1987. Hand movements: a window into haptic object recognition. Cognitive Psychology 19 (3), 342–368.

Li, Y., 2010. Protractor: a fast and accurate gesture recognizer. In: CHI '10, pp. 2169–2172.

Makenzie, C.L., Iberall, T., 1994. The Grasping Hand. North-Holland, Elsevier Science B.V., Amsterdam, The Netherlands.

Napier, J., 1956. The prehensile movements of the human hand. Journal of Bone and Joint Surgery, vol. 38B.

Paulson, B., Cummings, D., Hammond, T., 2011. Object interaction detection using hand posture cues in an office setting. International Journal of Human-Computer Studies 69 (January–February (1–2)), 19–29.

Pizzolato, E.B., dos Santos Anjo, M., Pedroso, G.C., 2010. Automatic recognition of finger spelling for LIBRAS based on a two-layer architecture. In: Proceedings of the 2010 ACM Symposium on Applied Computing. SAC '10, pp. 969–973.

Rashid, O., Al-Hamadi, A., Michaelis, B., 2010. Utilizing invariant descriptors for finger spelling American Sign Language using SVM. In: Proceedings of the Sixth International Conference on Advances in Visual Computing, vol. Part I. ISVC '10, pp. 253–263.

Santello, M., Flanders, M., Soechting, J.F., 2002. Patterns of hand motion during grasping and the influence of sensory guidance. The Journal of Neuroscience 22 (February (4)), 1426–1435.

Santello, M., Soechting, J.F., 1998. Gradual molding of the hand to object contours. Journal of Neurophysiology 79 (3), 1307–1320.

Schieber, M.H., Santello, M., 2004. Hand function: peripheral and central constraints on performance. Journal of Applied Physiology 96, 2293–2300.

Siegemund, F., Florkemeier, C., 2003. Interaction in pervasive computing settings using Bluetooth-enabled active tags and passive RFID technology together with mobile phones. In: Proceedings of the First IEEE International Conference on Pervasive Computing and Communications. PERCOM '03, pp. 378–387.

Sturman, D.J., Zeltzer, D., 1994. A survey of glove-based input. IEEE Computer Graphics and Applications 14 (January), 30–39.

Tanenbaum, K., Tanenbaum, J., Antle, A.N., Bizzocchi, J., Seif el Nasr, M., Hatala, M., 2011. Experiencing the reading glove. In: Proceedings of the Fifth International Conference on Tangible, Embedded, and Embodied Interaction. TEI '11. ACM, New York, NY, USA, pp. 137–144.

Tsai, C.-Y., Lee, Y.-H., 2011. The parameters effect on performance in ANN for hand gesture recognition system. Expert Systems with Applications 38 (7), 7980–7983.

Wachs, J.P., Kolsch, M., Stern, H., Edan, Y., 2011. Vision-based hand-gesture applications. Communications of the ACM 54 (February), 60–71.

Wang, R., Paris, S., Popovic, J., 2011. 6D hands: markerless hand-tracking for computer aided design. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. UIST '11. ACM, New York, NY, USA, pp. 549–558.

Wang, R.Y., Popovic, J., 2009. Real-time hand-tracking with a color glove. ACM Transactions on Graphics 28 (3).

Want, R., 2006. An introduction to RFID technology. IEEE Pervasive Computing 5 (January (1)), 25–33.

Wing, A.M., Haggard, P., Flanagan, J.R., 1996. Hand and Brain: The Neurophysiology and Psychology of Hand Movements. Academic Press.

Ziegler, J., Urbas, L., 2011. Advanced interaction metaphors for RFID-tagged physical artefacts. In: IEEE International Conference on RFID-Technologies and Applications (RFID-TA). IEEE, pp. 73–80.