

A Comparative Analysis of 3D User Interaction: How to Move Virtual Objects in Mixed Reality

Hyo Jeong Kang*
University of Florida

Jung-hye Shin†
University of Wisconsin-Madison

Kevin Ponto‡
University of Wisconsin-Madison

*e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]

Figure 1: Example scenes of three interaction techniques

ABSTRACT

Using one's hands can be a natural and intuitive method for interacting with 3D objects in a mixed reality environment. This study explores three hand-interaction techniques, namely gaze and pinch, touch and grab, and worlds-in-miniature, for selecting and moving virtual furniture in a 3D scene. Overall, a comparative analysis reveals that worlds-in-miniature provided better usability and task performance than the other studied techniques. We also conducted in-depth interviews and analyzed participants' hand gestures to identify desired attributes for 3D hand-interaction design. Findings from the interviews suggest that, when it comes to enjoyment and discoverability, users prefer directly manipulating the virtual furniture over interacting with objects remotely or through indirect input such as gaze. Another insight this study provides is the critical role of a virtual object's visual appearance in designing natural hand interaction. Gesture analysis reveals that the shape of a furniture item, as well as its perceived features such as weight, largely determined participants' instinctive forms of hand interaction (e.g., lift, grab, push). Based on these findings, we present design suggestions that can help 3D interaction designers develop natural and intuitive hand interaction for mixed reality.

Index Terms: Human-centered computing—Human computer interaction (HCI)—Interaction paradigms—Virtual reality; Human-centered computing—Human computer interaction (HCI)—HCI design and evaluation methods—User studies

1 INTRODUCTION

The use of hand input has received increasing attention in the virtual reality (VR) community. A growing body of research has begun to cast light on hand interaction; yet much prior VR research has predominantly focused on 3D interaction using controllers. Using bare hands offers a distinctive user experience compared to using controllers. Whereas a controller often maps a single button to a specific action, human hands are capable of a wide variety of gestures. For example, to move a virtual object, the standard design practice for VR controllers is to hold a grip button. The dynamic nature of human hands, however, makes it challenging to find and pin down the most natural hand input among the possible gestures: lift, grab, push, point, pinch, and more. The fundamental difference lies in our familiarity. Unlike a controller, which typically requires the extra effort of learning, we are likely to fall back on our habitual ways of using our hands when interacting with a virtual object. Consequently, identifying the most natural hand interaction is a complicated process, since the idea of a "natural" hand interaction can vary depending on the context, and it requires an understanding of how we interact with objects in everyday life.

Prior 3DUI research has developed novel metaphors for hand gestures, such as finger pointing that works as a mouse cursor [10, 15, 27, 39], multimodal interaction that integrates finger pointing and gaze [21, 32, 42], and multiple gestures such as grabbing and pinching for complex tasks [16, 20]. While these streams of research provide valuable insights for designing innovative hand interactions, relatively little is known about the tradeoffs among existing techniques. Furthermore, the majority of prior work has focused on the usability aspects of hand input. For instance, 3DUI research abounds with discussions on efficient and accurate hand inputs that address technical concerns such as object occlusion and a small field of view [13, 17, 21, 32]. The question of which hand gesture would render a natural interaction, on the other hand, remains less explored.

To this end, this study aims to provide a comprehensive review of the tradeoffs among well-known 3D hand-interaction designs. We compare three 3D interactions: 1) multimodal interaction using gaze and a pinch gesture, 2) direct touch and grab interaction, and 3) worlds-in-miniature. In comparing these three interactions, we first observed participants' behavioral responses without giving instructions. This approach helps us examine the discoverability of each interaction design, thereby providing insights into natural hand interaction. We particularly focused on interactions that involve selecting and arranging furniture items. Architecture and interior design is one of the most promising application areas for MR; thus, findings from this study can have immediate real-world implications. In comparing these three interactions, we address the following research questions.

• Discoverability: which interaction design requires a greater time investment to understand the mechanism without instructions? In addition to the three studied interactions, are there other gestures participants try to use?

• Accuracy and Efficiency: which interaction mechanism offers the user greater efficiency and accuracy in completing the task?

• Naturalism and Usability: which interaction mechanism is more intuitive and easier to use?

The main contributions of this paper are twofold. First, our user study can help clarify the advantages and disadvantages of these three interactions, providing practical guidance for designers to combine the best features of these existing methods. Second, it is particularly instructive to examine the discoverability of widely known interaction designs. Discoverability is a critical concern for user experience, especially for users who are not proficient with modern technology and have difficulty learning a new interface. In this study, we observed how novice MR users responded to each interaction when no instruction was given. The results of this gesture analysis can help designers identify gestures that render more natural interactions than existing techniques.

2 RELATED WORK

2.1 Naturalism and Discoverability in 3D Interaction

The definition of the term naturalism varies widely depending on the field of study. In 3DUI research, the term describes user interaction that mimics its real-world counterpart [5]. Discoverability, coined by Don Norman [25], is a measure of how easily users can find and understand a new interface with a minimum level of instruction. Previous human-computer interaction and VR research shows that the major benefits of discoverability include minimizing the learning curve [5], increasing user satisfaction [33], and enhancing users' intention to use the system [8].

Discoverability and naturalism are closely related concepts; however, the mere fact that an interaction is easily discoverable does not always mean that the interaction is natural. For example, a swipe gesture is not how we navigate pages in the real world, but most smartphone users can easily discover the swiping feature without instructions. Discoverability reflects a user's prior knowledge and mental model of how the system should work [25]. As such, examining the discoverability of existing interfaces can be a key first step toward understanding users' expectations and beliefs about how hand-based mixed reality interaction should work. In this paper, we investigate the naturalism and discoverability of interaction designs by observing users' behavioral responses. An observational study with existing systems can help designers fill the gap between users' mental models and existing interaction methods.

2.2 Three Existing Techniques

This study compares three 3D interaction techniques: 1) multimodal interaction using gaze and a pinch gesture, 2) direct touch and grab interaction, and 3) worlds-in-miniature. These three interactions were chosen as the subject of the study based on their popularity as well as the comparative advantage each interaction provides.

First, gaze and pinch interaction is the most widely used interaction for mixed reality headsets such as the Microsoft HoloLens. This multimodal interaction integrates eye gaze or head pose with a gesture, enabling users to interact with remote objects. Using gaze input allows users to avoid false occlusion cues when hand tracking is not available, and it also deals with the constraints of a limited field of view. With gaze and gesture interaction, users can select the target object with their head movement or eye gaze and move the object by holding a pinch gesture.

Second, direct touch and grab interaction is the default interaction mechanism for many VR interfaces. The classical design approach for VR controllers (e.g., Oculus Touch, Vive) is to use a grip button for grabbing the object. When using hands, users can rely on natural hand movements, directly touching an object with their bare hands and physically moving it with their arms.

The relative merits and demerits of these two interaction techniques make it worthwhile to compare the gaze and pinch and the direct touch and grab techniques. While considered the most natural, the primary issue with direct touch-grab interaction is that users can only interact with objects within arm's reach. Gaze and pinch interaction can solve this issue because it enables users to interact with remote objects; however, a gaze-based interaction may seem less natural or intuitive.

Worlds-in-miniature (WIM) interaction combines the advantages of both gaze-pinch and direct touch-grab interaction [37]. While still allowing users a natural and direct interaction, WIM provides users with miniature copies of the actual objects so that they can interact with remote objects indirectly. There are other 3DUI techniques, such as go-go interaction [30] or ray-casting [3], that also support direct manipulation from a distance; yet we examined worlds-in-miniature given its growing popularity in VR home-design applications (e.g., TrueScale, Home VR).

2.3 Comparative Studies on 3D User Interaction

A 3D interaction technique refers to an interaction mechanism for how users can execute different types of tasks in a 3D environment [4]. In this study, we mainly address selection and manipulation tasks. Selection refers to the task of picking one or more objects from the environment, and manipulation refers to the task of changing a virtual object's orientation and position [5].

In evaluating 3D user interaction, accuracy and efficiency have been fertile research topics in previous HCI studies. As early as 1999, Bowman and Hodges [2] compared six different 3D manipulation techniques to find a more accurate and efficient way to interact with a remote object in a VR environment. In their comparative study, conventional raycasting offered more precise control than using a virtual arm. More recently, Kytö et al. [21] examined selection techniques for a head-worn AR device using a Microsoft HoloLens. Their findings indicated that using gaze was generally faster than a hand gesture or a hand-held device. In a similar vein, research has reported gaze input being faster than hand pointing on a large stereoscopic projected display [35]. Perceived usability is another critical dimension in evaluating 3D user interaction techniques. Bernardos, Gómez, and Casar [1] compared head-directed selection to hand-based pointing on a projection screen; in their study, users found hand-based pointing far more intuitive and usable than head-directed selection. McMahan et al. [23] compared three interaction techniques on a projected wall and found hand-based object manipulation to be more natural than using controllers.

In summary, much of the prior research showed that more naturalistic interaction, using hands rather than gaze or ray-casting, provides better usability. With respect to accuracy and efficiency, however, several works indicated that perceived naturalism is not a necessary component of efficient and precise task performance. While these prior findings provide insights into evaluating existing 3D user interactions, the differences across experimental settings preclude any generalizable conclusion. Prior work on 3D user interaction has predominantly focused on interaction with projected walls and head-mounted VR displays, and relatively little is known in the context of wearable mixed reality displays. Furthermore, most previous comparative studies tend to focus on a specific interaction mechanism for a single task, for example, comparing head tracking to eye tracking for a selection task [21], different ray-casting mechanisms for a pointing task [7, 9], or different finger-based projection techniques for a manipulation task [22].

In this study, we aim to provide a more comprehensive understanding of 3D user interaction by examining three interaction techniques across a series of tasks: selecting a menu, selecting an object, and positioning the object. We further examined the discoverability of the three interaction designs by observing participants' hand gestures. The contribution of this study lies in providing a broader comparison of widely known 3DUI techniques, thereby offering insights for designers to improve current 3D user interaction designs.

3 DESIGN IMPLEMENTATION

3.1 Device Setup

Mixed Reality Setup: In order to create a mixed reality environment, we used an Oculus Rift and a ZED Mini camera. The ZED Mini is an attachable depth camera featuring dual 2K image sensors, providing a mixed-reality experience with real-virtual occlusion for head-worn VR devices. The purpose of using the ZED Mini was to achieve a wider field of view: it offers a 110-degree field of view, which makes it easier to interact with large furniture objects. The ZED Mini also supports spatial mapping with depth occlusion, which is useful for collision detection, as it allows virtual objects to be placed in the real environment. The ZED Mini camera was mounted on an Oculus Rift headset. The tracking area was set up with two Oculus sensors, and its size was approximately 3 meters by 2.5 meters, the optimal play area for the Oculus.

Hand Tracking: We used a Leap Motion for hand tracking. The Leap Motion uses image processing with infrared stereoscopic video [36]. Unlike other gesture recognition devices such as the Kinect, the Leap Motion does not return a complete depth map but only a set of relevant hand points and some hand pose features; thus, we used the preset hand pose features of the Leap Motion: grabbing, pinching, and pointing. A virtual hand was always rendered in the scene, letting users know that their hands were being tracked. The Leap Motion was attached to the front of the Oculus Rift.
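
To illustrate how such preset pose features can be consumed at the application level, the following sketch classifies a tracked hand frame into the three poses used in the study. This is not the authors' implementation: the HandFrame fields and the threshold values are illustrative assumptions modeled on the normalized pinch and grab strengths that Leap Motion-style trackers typically expose.

from dataclasses import dataclass

@dataclass
class HandFrame:
    pinch_strength: float   # 0.0 (open hand) .. 1.0 (full pinch)
    grab_strength: float    # 0.0 (open hand) .. 1.0 (closed fist)
    index_extended: bool    # True when only the index finger is extended

def classify_pose(hand: HandFrame,
                  pinch_threshold: float = 0.8,
                  grab_threshold: float = 0.8) -> str:
    # Map a single tracked frame to one of the study's preset poses.
    if hand.grab_strength >= grab_threshold:
        return "grab"
    if hand.pinch_strength >= pinch_threshold:
        return "pinch"
    if hand.index_extended:
        return "point"
    return "open"

print(classify_pose(HandFrame(0.9, 0.2, False)))   # pinch
print(classify_pose(HandFrame(0.1, 0.95, False)))  # grab
print(classify_pose(HandFrame(0.0, 0.0, True)))    # point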

3.2 Gaze and Pinch Interaction

The gaze and pinch interaction uses head movement as a proxy for eye gaze. Although "head pose" would be more accurate, we use the term "gaze" because it is the more generally accepted word for describing this multimodal interaction [21]. The three tasks were implemented in the following ways. 1) Menu: a menu was always displayed in front of the user at a distance of 2 meters. The gaze cursor moved along with the user's head movement, and when the cursor pointed at an object, the object was highlighted in blue. 2) Selection: selection was triggered explicitly using a pinch, or "air-tap," gesture. When the user selected a virtual furniture item, it appeared in front of the user within a reachable range of 0.6 to 0.8 meters. 3) Manipulation: users could move the furniture by holding a pinch gesture; the virtual object moved along with the hand's movement, as seen in Figure 2 (top).
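
As a rough illustration of this mechanic, the per-frame logic below casts a head ray to pick a highlight target and attaches the gazed object to the hand while a pinch is held. It is a minimal sketch of the interaction logic described above, not the study's code; positions are plain 3D tuples and the 5-degree selection cone is an assumed value.

import math

def ray_hit(head_pos, head_dir, objects, max_angle_deg=5.0):
    # Return the object whose center lies closest to the head ray, if any.
    best, best_angle = None, max_angle_deg
    for name, pos in objects.items():
        to_obj = tuple(p - h for p, h in zip(pos, head_pos))
        dist = math.sqrt(sum(c * c for c in to_obj)) or 1e-9
        cos = sum(d * c for d, c in zip(head_dir, to_obj)) / dist
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
        if angle < best_angle:
            best, best_angle = name, angle
    return best

def update(state, head_pos, head_dir, hand_pos, pinching, objects):
    # state: {'held': object name or None, 'offset': hand-to-object offset}
    gazed = ray_hit(head_pos, head_dir, objects)             # highlight target
    if pinching and state["held"] is None and gazed:
        state["held"] = gazed                                 # pinch selects
        state["offset"] = tuple(o - h for o, h in zip(objects[gazed], hand_pos))
    elif pinching and state["held"]:
        objects[state["held"]] = tuple(                       # held pinch drags
            h + o for h, o in zip(hand_pos, state["offset"]))
    elif not pinching:
        state["held"] = None                                  # release
    return gazed

objs = {"chair": (0.0, 0.0, 2.0)}
st = {"held": None, "offset": (0.0, 0.0, 0.0)}
update(st, (0, 0, 0), (0, 0, 1), (0.1, -0.2, 0.5), True, objs)
update(st, (0, 0, 0), (0, 0, 1), (0.4, -0.2, 0.5), True, objs)
print(objs["chair"])  # the chair follows the hand while the pinch is held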

3.3 Direct Touch and Grab Interaction

The touch and grab interaction has been the standard way to interact with virtual objects on head-mounted VR devices such as the Oculus Rift and HTC Vive [18]. The three tasks were implemented in the following ways. 1) Menu: an attached-menu metaphor was used. The menu was attached to the user's left hand, and users could invoke it by flipping their left hand (seen in Figure 2, middle). We used this attached-menu metaphor for two reasons: first, it corresponds well to direct touch and grab interaction because the menu must be directly touched to be invoked; second, it minimizes unnecessary physical movement involved in menu selection, such as walking toward the menu, and thus enables a better comparison with the other conditions. 2) Selection: users could select a menu item by directly touching the button with their index finger. When users selected a virtual furniture item, it appeared in front of them within a range of 0.6 to 0.8 meters, reflecting the average length of human arms. For object selection, users could select an object by directly touching it with their bare hands and then grab it by making a fist. 3) Manipulation: users could move the object with their arm movements and had to physically walk around the room to place the virtual furniture.

Figure 2: Example scenes: gaze and pinch (top), direct touch and grab (middle), and worlds-in-miniature (bottom).
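
A comparable sketch of the touch-and-grab logic described in this subsection follows: an object can only be picked up while the hand is within touching distance, and a closed fist attaches it to the hand. Again, this is an assumed illustration rather than the study's implementation, and the 0.15-meter touch radius is an invented placeholder.

import math

TOUCH_RADIUS = 0.15  # assumed hand-to-object distance that counts as a touch

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def touch_grab_update(state, hand_pos, fist_closed, objects):
    # state: {'held': object name or None}; objects: name -> position
    if state["held"] is None:
        touched = next((n for n, p in objects.items()
                        if dist(hand_pos, p) <= TOUCH_RADIUS), None)
        if touched and fist_closed:        # touch first, then make a fist
            state["held"] = touched
    elif fist_closed:
        objects[state["held"]] = hand_pos  # the object follows the arm
    else:
        state["held"] = None               # opening the hand releases it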

3.4 Worlds-in-Miniature Interaction

The last interaction is the worlds-in-miniature (WIM) metaphor [37]. Unlike the previous two interactions, where users directly select and move the actual-scale object, WIM interaction provides users with a hand-held miniature representation of the objects. In order to create the miniature-size room, users were first asked to scan the surrounding environment; using the ZED Mini's depth camera, the scanned room was brought into the miniature environment. After the generation of the miniature floor, the three tasks were implemented in the following ways. 1) Menu: the menu was displayed in front of the user at a distance of 0.6 to 0.8 meters, within reach for an average user. 2) Selection: when users selected an item from the menu by pressing the button with their index finger, the miniature virtual object appeared. 3) Manipulation: by moving the miniature object, users moved the corresponding actual-size object, as seen in Figure 2 (bottom). Users could select and move the miniature object by directly manipulating it with the touch and grab interaction.
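
The core of the WIM mechanic is the mapping from miniature coordinates to room coordinates. The helper below is a minimal sketch of that mapping under an assumed uniform scale factor; the 1:20 scale and the origins in the example are illustrative values, not parameters reported in the paper.

def miniature_to_world(mini_pos, mini_origin, world_origin, scale):
    # Map a position inside the miniature to the corresponding room position.
    # scale = miniature size / room size (e.g., 0.05 for a 1:20 miniature).
    return tuple(w + (m - o) / scale
                 for m, o, w in zip(mini_pos, mini_origin, world_origin))

# Moving a miniature chair 10 cm corresponds to a 2 m move of the real-scale
# chair at an assumed 1:20 scale.
print(miniature_to_world((0.10, 0.0, 0.0), (0.0, 0.0, 0.0),
                         (0.0, 0.0, 0.0), 0.05))   # -> (2.0, 0.0, 0.0)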

4 USER STUDY

4.1 Participants

Twenty-one participants were recruited through mailing lists and flyers on campus. Participants' ages ranged from 18 to 44 with a mean of 24.2 (SD = 7.8), and the majority of them were university students. We specifically sought participants who had no prior experience with head-worn mixed reality devices or hand tracking devices. None of the participants had previous experience with mixed reality headsets or hand tracking devices, but nine participants had prior experience with head-worn VR devices such as the Oculus Rift or HTC Vive. Participants spent a significant amount of time working on a computer daily (5-8 hours a day: 47%; more than 8 hours: 28%).


Figure 3: Overview of experimental procedure

4.2 Tasks

With each interaction condition, participants were asked to complete three tasks: 1) select the menu to choose a virtual object; 2) move the object to the designated area; and 3) arrange multiple furniture items. Figure 3 shows an overview of the three tasks and the experimental procedure. A virtual object always appeared within a reachable range of 0.6 to 0.8 meters from the user, roughly the average length of a human arm.

The primary purpose of tasks 1 and 2 is to examine the discoverability of each interaction technique. Thus, only a minimal level of instruction was provided. For example, for gaze-pinch interaction, participants were told to "use a hand gesture to move the object," and no further information about what gesture to use was provided. For touch-grab interaction, participants were told to "use hands to move objects," with no further information about how to move the virtual furniture. For the worlds-in-miniature interaction, participants received the same instruction as in the touch-grab condition: "use hands to move objects."

After completing tasks 1 and 2, participants received detailed instructions on how to interact with the virtual objects before task 3. The goal of task 3 is to measure the accuracy and efficiency of each interaction. After a brief demonstration and trial, participants were asked to select two red stools, two black stools, and one table, then arrange the four stools around the table so that they are perpendicular to each other. The total time it took to complete the task was measured as an indicator of efficiency, and the angle between the stools was measured as an indicator of accuracy. After the completion of all three tasks, participants evaluated the interaction design. These three tasks were repeated for all three interactions.

4.3 Experiment Procedure

In order to examine participants' preferences and compare the three techniques, this study employed a within-subject design. The order of interaction conditions was fully counterbalanced to minimize learning effects. Participants experienced all three interactions in a randomized order and were asked to pick the one interaction design they preferred the most. Participants evaluated each interaction by answering questionnaires. Finally, they completed a final survey that included questions on demographic information and average computer usage. After the completion of the survey, participants were invited to a follow-up interview. The entire process took approximately 45 minutes.

4.4 Measures

4.4.1 Evaluation of three interaction designs

Discoverability: Discoverability was examined by measuring the time it took to complete tasks 1 and 2 [29]. In a pilot study with seven graduate students, we asked participants to notify us when they could not figure out the interaction and wanted to give up the task. The pilot study indicated that people feel a great level of frustration when they have failed to interact with the virtual objects for 1.5 minutes. Based on this result, when participants could not figure out the interaction within 2 minutes, we marked the task as "failed" and asked them to proceed to the next task.

Efficiency and Accuracy: The efficiency of an interaction design was examined by measuring the time it took to complete task 3. The accuracy of an interaction design was examined using the position data of five furniture items: four stools and one table. Participants were asked to arrange the four stools so that they are perpendicular to each other, and the average deviation from 90 degrees was calculated as the accuracy measure.
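
The deviation measure can be computed directly from the stools' positions, as in the sketch below: form a vector between each pair of stools that face each other and take the absolute difference between their crossing angle and 90 degrees. The coordinates in the example are made-up positions, not study data.

import math

def angle_deg(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def arrangement_error(stools):
    # stools: four (x, y) positions ordered so that stools 0/2 and 1/3 face
    # each other; returns the deviation of their crossing angle from 90 degrees.
    v1 = tuple(a - b for a, b in zip(stools[2], stools[0]))
    v2 = tuple(a - b for a, b in zip(stools[3], stools[1]))
    return abs(angle_deg(v1, v2) - 90.0)

print(round(arrangement_error([(0, 0), (1.1, -1.0), (0.1, -2.0), (-0.9, -1.0)]), 2))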

Usability: The perceived usability of each interaction design was evaluated using the System Usability Scale (SUS) [6]. The SUS includes ten items rated on a five-point scale from strongly agree to strongly disagree. Example items include "I found the system easy to use" and "I would imagine that most people would learn to use this system very quickly." In order to examine perceived task load, the NASA Task Load Index (NASA-TLX) questionnaire was also included in the survey [14].
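
For reference, standard SUS scoring sums the shifted item responses and rescales them to 0-100, as sketched below; the sample responses are invented for illustration.

def sus_score(responses):
    # responses: ten answers on a 1-5 scale, in questionnaire order.
    # Odd-numbered items contribute (response - 1), even-numbered items
    # contribute (5 - response); the sum is scaled by 2.5 to give 0-100.
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))   # 85.0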

Naturalism and Enjoyment: To measure the perceived level of naturalism, we used five-point Likert-scale questionnaires. The naturalism questionnaire includes three items: "interaction with a virtual object was similar to how I would interact with an object in the real world," "interaction with a virtual object felt familiar," and "interaction with a virtual object seems intuitive and natural." The enjoyment questionnaire was adapted from previous literature on the extended Technology Acceptance Model [11] and includes two items: "It was enjoyable to use this system" and "It was fun to use this system." Cronbach's alpha for both variables was above 0.7 (α = 0.78 for naturalism; α = 0.81 for enjoyment).

Control variables: Responsiveness of the system was included as a control variable. The questionnaire asked participants whether they experienced system lag while using each interaction design. Participants' demographic information, as well as their previous experience with VR and video games and their average daily computer use, was also measured.

4.4.2 Gesture analysis and follow-up interview

Gestures: Studying gestures can help 3DUI designers find interface features that correspond well to users' expectations [31]. Following Kendon's gesture analysis method [19], each session was video-recorded and changes of gesture were coded with a description of the movement. The purpose of this analysis is to identify the gestures participants use when they first encounter a virtual object; thus, the video recordings of the task 1 and task 2 sessions were mainly used for the analysis. Each hand movement is described with three main elements: the number of hands used, the specific gesture used, and the target object. Example descriptions include: pressing the button with an index finger, lifting the table using both hands, pushing the chair with one hand, and gripping the top of the chair.

User experience: In order to better understand participants' experience with the three interaction designs, a follow-up interview was conducted. The interview mainly included five questions: 1) which interaction did you prefer the most? 2) why do you prefer interaction X over the others? 3) what do you like or dislike about the other interactions? 4) what was the hardest part about the interactions you experienced? 5) how would you improve them? In addition to these five questions, we also asked questions based on our observations, for instance, about the specific gestures participants used and why.

Figure 4: Discoverability (top); accuracy and efficiency (bottom)

5 RESULTS

5.1 Evaluation of Three Interaction Designs

5.1.1 Discoverability

In order to examine how easy each interaction is to discover, we compared the time it took participants to complete the first task (menu selection) and the second task (object selection and manipulation). ANOVA shows a significant main effect for both tasks (menu selection: F(2, 51) = 11.13, p < .001, ηp² = .303; object selection and manipulation: F(2, 50) = 9.67, p < .001, ηp² = .283). Tukey's HSD post hoc test was conducted for pair-wise comparisons. The results of the analysis are shown in Figure 4.

For the menu selection task, gaze-pinch interaction took the longest time (M gaze-pinch = 54.88, SD gaze-pinch = 20.47; M touch-grab = 42.83, SD touch-grab = 25.57; M WIM = 23.69, SD WIM = 12.27). With the worlds-in-miniature interaction, it took significantly less time to select the menu than with gaze-pinch interaction (p < .001) and touch-grab interaction (p < .05). For the object selection and manipulation task, gaze-pinch interaction took significantly longer than the other two (M gaze-pinch = 49.20, SD gaze-pinch = 38.33; M touch-grab = 21.57, SD touch-grab = 14.50; M WIM = 15.92, SD WIM = 11.04). Six participants failed to figure out the gaze and pinch interaction within 2 minutes. There was no significant difference between touch-grab and worlds-in-miniature interaction (p = .76).
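
A sketch of the style of analysis reported in this section is shown below: a one-way ANOVA over per-condition completion times followed by Tukey's HSD pairwise comparisons, using SciPy and statsmodels. The arrays are randomly generated stand-ins for illustration, not the study's data.

import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
times = {
    "gaze-pinch": rng.normal(55, 20, 21),   # seconds to complete the task
    "touch-grab": rng.normal(43, 25, 21),
    "WIM":        rng.normal(24, 12, 21),
}

f_stat, p_value = f_oneway(*times.values())           # one-way ANOVA
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

values = np.concatenate(list(times.values()))
labels = np.repeat(list(times.keys()), [len(v) for v in times.values()])
print(pairwise_tukeyhsd(values, labels, alpha=0.05))   # pairwise comparisons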

Figure 5: Users' ratings on the NASA-TLX (top) and the System Usability Scale, naturalism, and enjoyment of each interaction design (bottom)

5.1.2 Accuracy and Efficiency

The object manipulation task involved arranging four stools around the table so that each stool is perpendicular to the others. Accuracy of manipulation was calculated by creating vectors between stools across from each other and comparing the angle between these two vectors to a right angle. The mean absolute angle was used to represent the accuracy of each interaction design. ANOVA shows a significant main effect of interaction design on accuracy (F(2, 54) = 8.58, p < .001, ηp² = .241). The result of the analysis is shown in Figure 5.

The pair-wise comparisons show that the mean manipulation error was significantly lower for the worlds-in-miniature interaction than for the other two techniques (M gaze-pinch = 20.36, SD gaze-pinch = 7.31; M touch-grab = 18.42, SD touch-grab = 6.85; M WIM = 12.01, SD WIM = 5.16). The efficiency of each interaction was measured as the total time it took to finish the object selection and manipulation task (task 3). ANOVA shows a significant main effect of interaction design on efficiency (F(2, 54) = 29.23, p < .001, ηp² = .51). Tukey's HSD pair-wise comparison shows that worlds-in-miniature was significantly faster than the other two techniques (M gaze-pinch = 5.90, SD gaze-pinch = 2.08; M touch-grab = 3.98, SD touch-grab = 1.56; M WIM = 2.44, SD WIM = 0.51).

5.1.3 Usability, Naturalism, and Enjoyment

Usability of the interface was measured with two indexes: the NASA Task Load Index (NASA-TLX) and the System Usability Scale (SUS). The ANOVA results for NASA-TLX are shown in Figure 6. There was a significant main effect for mental load (F(2, 53) = 10.04, p < .001, ηp² = .27), physical load (F(2, 54) = 4.84, p < .05, ηp² = .15), performance (F(2, 54) = 6.97, p < .01, ηp² = .20), effort (F(2, 54) = 38.29, p < .001, ηp² = .58), and frustration (F(2, 54) = 23.09, p < .001, ηp² = .46). Pair-wise comparisons between the three interaction designs indicated that the gaze-pinch interaction is significantly more demanding: it required a significantly higher level of effort than the touch-grab interaction (p < .05) and the worlds-in-miniature interaction (p < .001).

Significant main effects were also found for system usability (F(2, 54) = 20.13, p < .001, ηp² = .43) and perceived naturalism (F(2, 51) = 13.35, p < .001, ηp² = .34), yet no significant effect was found for enjoyment (F(2, 54) = 2.41, p = .09, ηp² = .08). The pairwise Tukey HSD test shows that the worlds-in-miniature interaction provides significantly better usability than the other two interactions (M gaze-pinch = 45.97, SD gaze-pinch = 18.39; M touch-grab = 57.61, SD touch-grab = 19.22; M WIM = 81.31, SD WIM = 13.92). In terms of perceived naturalism, both touch-grab and worlds-in-miniature interaction were rated significantly more natural than gaze-pinch interaction (M gaze-pinch = 2.25, SD gaze-pinch = 0.92; M touch-grab = 3.77, SD touch-grab = 0.97; M WIM = 3.41, SD WIM = 0.87), as shown in Figure 6.

Figure 6: Gestures used by participants for different tasks: interacting with the menu (top), a large object (middle), and a miniature object (bottom)

5.2 Gesture Analysis and Follow-up Interview

5.2.1 Gesture Analysis

We analyzed video recordings of participants' hand movements by systematically categorizing and labeling the gestures they used. Due to video recording errors, recordings of 17 participants were used for the analysis. Table 1 summarizes the findings from the gesture analysis. When participants first encountered the gaze-pinch interface, in which the menu was displayed far away from the user, the first action all participants tried was to press the menu button with an index finger. When the pressing gesture failed to invoke the menu selection, 14 participants walked or leaned toward the menu and tried to press the button again using the entire hand. The next action participants tried was a grabbing gesture. In the follow-up interview, many participants noted that they would not have been able to figure out the pinch gesture unless they were given instructions.

When participants first encountered the large-scale furniture items, in either the gaze-pinch or the touch-grab condition, grabbing was not the first gesture participants tried to use. The majority of participants tried to move the virtual table by "lifting" it. For the virtual stool, the first action many participants tried was "grabbing" the side of the stool using both hands; 88% of participants used both hands when interacting with large-scale objects such as a table. On the other hand, for the miniature-size objects, participants frequently used a grabbing gesture to pick up the virtual object, and the majority used only one hand.

Table 1: Findings from the gesture analysis. Percentages indicate the proportion of participants who used each gesture.

Menu selection:
• Press the menu button with an index finger (100%)
• Press the menu button with the entire hand (82%)
• Grabbing gesture with one hand (52%)
• Pinch or air-tap gesture with one hand (5%)

Object selection and manipulation (large size):
• Lift the table with both hands (88%)
• Grab the side of the chair with both hands (84%)
• Lift the chair with both hands (52%)
• Push the table using one hand (47%)
• Grabbing gesture with one hand (47%)
• Pinch or air-tap gesture with one hand (5%)

Object selection and manipulation (miniature):
• Grabbing gesture with one hand (82%)
• Push the object using one hand (29%)
• Grab the side of the object using both hands (29%)

5.2.2 Follow-Up Interview: User Experience

The follow-up interviews revealed the strengths and weaknesses of each interaction technique in greater depth. The majority of participants preferred the worlds-in-miniature interaction over the other two for its ease of use, and frequently described it as the "easiest" interaction. Although no statistically significant difference was found in the level of enjoyment, about one third of participants indicated that they would enjoy directly interacting with the real-size objects more than manipulating miniature objects. Furthermore, more than half of the interviewees said they preferred walking around the room to standing still when interacting with virtual objects.

When it comes to discoverability and ease of use, the gaze and pinch interaction was participants' least favorite design among the three. Many noted that gaze and pinch interaction was difficult to figure out without instruction. Three interviewees mentioned that they did not notice the gaze cursor or understand the mechanism of gaze input. While the majority of participants preferred the other two interaction designs, two participants said that they would like to use gaze and pinch interaction in the future because it allows easier interaction once learned. These two participants indicated that they especially liked gaze-pinch interaction because it requires less physical movement while still letting users directly interact with a life-scale object instead of moving miniature items.

“Even if it’s not real, you think it’s furniture. It feels like it should be heavy, and I should be using both hands”

More than 80 percent of participants used both hands when trying to lift a table or chair. The interviews reveal that perceived features of the furniture, such as its weight, influenced participants' instinctive forms of hand interaction. Even after we provided the instructions and participants learned that they could move an object using only one hand, they preferred using both hands when moving a large object.

Participants also shared their opinions on desired attributes for interaction design and made suggestions for improvement. Frequently mentioned topics include 1) dynamic interaction, 2) natural interaction, and 3) direct interaction. Based on these findings, we discuss design recommendations in the remainder of this paper. Figure 7 provides a summary of the follow-up interviews with sample verbatims.


User experience with each interaction design

Worlds in Miniature
Advantages: ease of use; efficient; more precise; natural
Disadvantages: not as fun as the other two; limited view
Sample verbatims: “If you are a perfectionist and you want everything in the right position, this is a good way to move furniture” / “You don’t really see the large one when you are arranging the small one” / “It was easier, but it’s more fun to interact with large one”

Touch and Grab
Advantages: direct interaction; fun to use; natural; physical movement
Disadvantages: difficult to figure out the grip gesture; physical movement
Sample verbatims: “It feels more real when you move and drag the chair around the floor” / “I think moving and walking is more intuitive. I don’t mind having to walk. It’s better that way” / “Interaction was actually not intuitive, I didn’t know what to do to make it move. I kept pushing it at first”

Gaze and Pinch
Advantages: efficient once figured out; requires less physical movement
Disadvantages: less intuitive; have to gaze first; difficult to select when multiple objects are aligned in the same direction
Sample verbatims: “If someone is not familiar with VR, using eye-gaze won’t be intuitive. I noticed it (cursor) after a while, but I prefer the other one that you can just use your hands. You don’t have to focus on things” / “Once you figure out, it is not difficult. I liked it because you can simply look at the object” / “It was not easy to use at first. I think you definitely need some instruction for this one”

Desired attributes for 3D UI interaction design

Dynamic interactions: “I hope there are more you can do. Like you can pull, push, or lift, basically do everything that you would do with the real chair and table.” / “You can combine all these three somehow or give options to choose. You can pinch, grab and have the mini-size one too.”

Natural interaction: “You don’t grab the table. That’s not how you move the table in a real-world. When you see a table like this you try to lift it. You put your hands under the top or you just push it. I wish I can do all that in AR too.” / “Pinching worked fine when moving the chair, but not for the menu. Can you click it with the finger? That’s what we do on our phone”

Direct and physical: “I liked when you can directly touch and click the menu. When you see the sci-fi movie, you have all these windows and a virtual control center in front of you. You can still touch it”

Figure 7: A summary of findings from interviews

6 DISCUSSION

This study examined hand interaction techniques for a head-worn MR device. The primary goal was to understand how to design 3D user interaction that is intuitive and easy to learn. Through a series of user studies, we compared three widely known interaction techniques: gaze and pinch interaction, touch and grab interaction, and worlds-in-miniature. The findings suggest two key implications.

First, the worlds-in-miniature interaction stands out among the three for providing better usability. Compared to the other two studied interactions, worlds-in-miniature allowed faster and more accurate manipulation of multiple objects. Unlike the gaze-pinch and touch-grab interactions, which only provide perspective views of the objects, having an additional elevated view from above makes it easier to see the entire layout and understand the spatial configuration [34].

Second, the findings from the gesture analysis suggest the critical role of a virtual object's visual appearance, or, more specifically, the affordance of an object, in enhancing discoverability [25]. That is, the visual characteristics of a virtual menu that make it look "pushable" and the shape of a virtual table that affords "lifting" largely determined participants' forms of interaction. Perceptual properties of an object, such as its weight, influence how users interact with it. This is well represented in one participant's comment: "It's furniture. It feels like it should be heavy."

7 DESIGN IMPLICATIONS

7.1 Natural Interaction is Dynamic and Complex

Designers often strive to create simple designs. Especially for an innovative interface such as a mixed reality system, one common design approach is to find a single input system that can work across different platforms [26]. Assigning a specific interaction to each input system, for example simulating mouse movement with gaze or using finger pointing as a mouse cursor, has been the classical design approach. This one-to-one mapping may be easier to implement; however, such an interaction mechanism is difficult to figure out without instructions.

Natural interaction is complex and dynamic. In our in-depth interviews, when we asked participants to choose the single interaction they liked the most, many indicated that they wanted to combine all three interactions rather than pick one. The desire for dynamic interaction is well expressed in one participant's comment: "I hope there are more you can do. Like you can pull, push, or lift, basically do everything that you would do with the real chair."

7.2 Natural Interaction is Direct and Physical

Natural interaction is direct and physical, which means it involves "touching" objects. Mid-air gestures have gained popularity with the release of motion-sensing devices [40]. Research defines a mid-air gesture as an interaction that involves touchless manipulation of digital content [38]; the term itself presumes indirect interaction. In our user study, participants frequently noted that a pinch gesture felt unnatural when they were selecting a menu item or manipulating an object. In the in-depth interviews, participants indicated that they would prefer a pushable menu and grabbable virtual objects with which they can have more direct interaction. Wobbrock et al. and Piumsoboon et al. reported similar findings [29, 41].

Figure 8: Findings from gesture analysis and design suggestions based on the study findings

Gestures are often referred to as a critical component of the natural user interface [24, 28]. Holding a hand in the air may help users manipulate remote objects easily; however, the lack of physical contact can make the entire experience feel virtual. In order to make an interaction more natural and intuitive, the interface should allow direct interaction with the object. Providing digital buttons and icons within reachable distance, so that users can virtually touch the object rather than use a mid-air gesture, can simulate a more realistic interaction.

7.3 Affordance and Visual Guidance

Coined by Gibson [12], the concept of affordance has become a core design principle in interaction design. Our gesture analysis confirmed that understanding a virtual object's affordance is vital for developing more natural and intuitive 3D user interaction. Depending on the virtual object's visual appearance, the user's form of interaction can vary widely: a button should be pushed, a table should be lifted, and a miniature object should afford being picked up.

Designers can take visual affordance into account and create more realistic simulations by developing physics-based interaction. However, creating accurate mesh colliders for individual objects and simulating realistic collision detection can be time-consuming and computationally expensive. Providing interaction guidance, such as visually presenting the possible forms of gestures and interaction, can be a simple solution that helps users quickly learn the interaction. A good example of visual guidance is the bounding box and app bar in the Microsoft HoloLens 2: the bounding box informs the user that the object is currently adjustable while also responding to the proximity of the user's fingers. This interactive visual feedback can communicate the possible interactions to users.

8 LIMITATIONS

The focus of our study was discoverability. Discoverability can be an important concern for users who have difficulty learning a new interface; however, such pure novice users are only a small part of the picture. The results of the study would likely have been very different had participants been given enough time to learn each interaction. Future work should address the progressive learning effect and further examine the usability of the interaction techniques over a longer term.

With the worlds-in-miniature interaction, it is challenging to pick up a virtual object that is smaller than the user's hand. Prior 3DUI work [16, 17] has tackled these issues and examined selection techniques in cluttered or dense VR environments. It would be interesting to explore how to design a WIM interface with these selection techniques.

Technical limitations are another important concern in interpreting the study results. In creating the mixed reality setting, we used the ZED Mini camera's video see-through features rather than an optical head-mounted display such as the Microsoft HoloLens. Video see-through images can suffer from lag and delayed responses, which can cause motion sickness. The Leap Motion sensor does not provide the most accurate hand tracking, nor does it support realistic object occlusion, which can result in a negative user experience.

Furthermore, we did not consider real-virtual occlusion in a larger space. In our study, we scanned the environment using the ZED Mini camera, but the scan was limited to a small room. In a larger, cluttered environment, gaze-pinch might outperform touch-grab and WIM: compared to the other two, it would make it easier to navigate the larger space and move objects. Future studies can explore various scenarios to address such concerns.

9 CONCLUSION

This study compared three widely known interaction techniques: gaze and pinch interaction, touch and grab interaction, and worlds-in-miniature. Overall, the majority of participants found the gaze and pinch interaction the most challenging to use, while the worlds-in-miniature interaction provided the most accurate, efficient, and easiest interaction for arranging multiple furniture items. The results of the gesture analysis and in-depth interviews reveal the critical role of virtual objects' visual affordances in determining participants' forms of interaction in mixed reality. We further provide design suggestions that can aid in developing more intuitive interaction for wearable mixed reality headsets.

Figure 8: Findings from gesture analysis and design suggestions based on the study findings

an pinch gesture felt unnatural when they are selecting a menuand manipulating an object. In the in-depth interviews, partici-pants indicated that they would prefer to have a pushable menuand grabbable virtual objects with which they can have more directinteraction. Wobbrock et al. and Piumsoboon et al. reported similarfindings [29, 41].

Gestures are often referred to as a critical component of thenatural user interface [24, 28]. Holding a hand in the air may helpusers to manipulate the remote objects easily; however, the lack ofphysical contact can make the entire experience feel virtual. In orderto make an interaction to be more natural and intuitive, the interfaceshould allow direct interaction with the object. Providing digitalbuttons and icons within the reachable distance, so that users canvirtually touch the object rather than using a mid-air gesture, cansimulates more realistic interaction.

7.3 Affordance and visual guidanceCoined by Gibson [12], the concept of affordance has become thecore design principle in interaction design. Our gesture analysisconfirmed that understanding a virtual object’s affordance is vitalfor developing a more natural and intuitive 3D user interaction.Depending on the virtual object’s visual appearance, the user’s formof interaction can largely vary. A button should be pushed, a tableshould be lifted, and a miniature object should afford to be pickedup.

Designers can take into account visual affordance and create morerealistic simulations by developing physics-based interaction. How-ever, creating an accurate mesh colliders for individual objects andsimulating realistic collision detection can often be time-consumingand computationally expensive. Providing interaction guidance suchas visually presenting the possible forms of gestures and interactioncan be a simple solution that can help users to quickly learn theinteraction. The great example of visual guidance comes from thebounding box and app bar in the Mirosoft Hololens 2. The bound-ing box shown in the Hololens 2 informs the user that the objectis currently adjustable while also responding to the user’s finger’sproximity. This interactive visual feedback can communicate thepossible interaction with users.

8 LIMITATION

The focus of our study was discoverability. Discoverability can bean important concern for the users who have difficulties in learning

a new interface; however, such a pure novice user is only a smallpart of the picture. The results of the study would likely have beenvery different when participants had enough time to learn each inter-action. Future work should address the progressive learning effectand further examine the usability of varying interaction techniquesin the longer term.

With the worlds-in-miniature interaction, it is challenging to pickup the virtual object that is smaller than their hand size. Prior3DUI works [16, 17] had tackled these issues and examine selectiontechniques in cluttered or dense VR environments. It would beinteresting to explore how to design a WIM interface with theseselection techniques.

The technical limitation is another important concern that hasto be addressed in interpreting study results. In creating a mixedreality setting, we used a Zed Mini camera’s video see-throughfeatures rather than using an optical head-mounted display such asa Microsoft Hololens. A video see-through images can suffer fromimage lagging and delays in responses, causing a motion sickness tothe users. Leap Motion sensor does not provide the most accuratehand tracking nor supports realistic object occlusions, which canresult in negative user experience.

Furthermore, we did not consider real-virtual occlusion in a larger space. In our study, we scanned the environment using a ZED Mini camera, but the scan was limited to a small room. In a larger, cluttered environment, gaze-pinch might outperform touch-grab and WIM: compared with the other two techniques, it would make it easier to navigate the larger space and move objects among clutter. Future studies can explore such scenarios to address these concerns.

9 CONCLUSION

This study compares three widely known interaction techniques: gaze and pinch, touch and grab, and worlds-in-miniature. Overall, the majority of participants found the pinch and grab interactions more challenging to use, while the worlds-in-miniature interaction provided the most accurate, efficient, and easiest means of arranging multiple furniture items. The results of the gesture analysis and in-depth interviews reveal the critical role of virtual objects' visual affordances in determining participants' forms of interaction in mixed reality. We further provide design suggestions that can aid in developing more intuitive interactions for wearable mixed reality headsets.


REFERENCES

[1] A. M. Bernardos, D. Gómez, and J. R. Casar. A comparison of head pose and deictic pointing interaction methods for smart environments. International Journal of Human–Computer Interaction, 32(4):325–351, 2016. doi: 10.1080/10447318.2016.1142054
[2] D. A. Bowman. Interaction Techniques for Common Tasks in Immersive Virtual Environments: Design, Evaluation, and Application. PhD thesis, Atlanta, GA, USA, 1999. AAI9953819.
[3] D. A. Bowman, J. L. Gabbard, and D. Hix. A survey of usability evaluation in virtual environments: Classification and comparison of methods. Presence: Teleoper. Virtual Environ., 11(4):404–424, Aug. 2002. doi: 10.1162/105474602760204309
[4] D. A. Bowman and L. F. Hodges. User interface constraints for immersive virtual environment applications. 1995.
[5] D. A. Bowman, R. P. McMahan, and E. D. Ragan. Questioning naturalism in 3D user interfaces. Commun. ACM, 55(9):78–88, Sept. 2012. doi: 10.1145/2330667.2330687
[6] J. Brooke. System usability scale (SUS): A quick-and-dirty method of system evaluation user information. Digital Equipment Co. Ltd., Reading, UK, pp. 1–7, 1986.
[7] V. Buchmann, S. Violich, M. Billinghurst, and A. Cockburn. Fingartips: Gesture based direct manipulation in augmented reality. In Proceedings of the 2nd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, GRAPHITE '04, pp. 212–221. ACM, New York, NY, USA, 2004. doi: 10.1145/988834.988871
[8] F. Cafaro, L. Lyons, and A. N. Antle. Framed guessability: Improving the discoverability of gestures and body movements for full-body interaction. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–12, 2018.
[9] A. Cockburn, P. Quinn, C. Gutwin, G. Ramos, and J. Looser. Air pointing: Design and evaluation of spatial target acquisition with and without visual feedback. Int. J. Hum.-Comput. Stud., 69(6):401–414, June 2011. doi: 10.1016/j.ijhcs.2011.02.005
[10] C. Colombo, A. Del Bimbo, and A. Valli. Visual capture and understanding of hand pointing actions in a 3-D environment. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 33(4):677–686, 2003.
[11] F. D. Davis, R. P. Bagozzi, and P. R. Warshaw. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.
[12] J. J. Gibson. The theory of affordances. Hilldale, USA, 1(2), 1977.
[13] J. Hales, D. Rozado, and D. Mardanbegi. Interacting with objects in the environment by gaze and hand gestures. In Proceedings of the 3rd International Workshop on Pervasive Eye Tracking and Mobile Eye-Based Interaction, pp. 1–9, 2013.
[14] S. G. Hart and L. E. Staveland. Development of NASA-TLX (task load index): Results of empirical and theoretical research. In Advances in Psychology, vol. 52, pp. 139–183. Elsevier, 1988.
[15] K. Hu, S. Canavan, and L. Yin. Hand pointing estimation for human computer interaction based on two orthogonal-views. In 2010 20th International Conference on Pattern Recognition, pp. 3760–3763. IEEE, 2010.
[16] Y. Jang, I. Jeong, T. Kim, and W. Woo. Metaphoric hand gestures for orientation-aware VR object manipulation with an egocentric viewpoint. IEEE Transactions on Human-Machine Systems, 47(1):113–127, Feb. 2017. doi: 10.1109/THMS.2016.2611824
[17] Y. Jang, H. Noh, T. Chang, W. Kim, and W. Woo. 3D finger CAPE: Clicking action and position estimation under self-occlusions in egocentric viewpoint. IEEE Transactions on Visualization and Computer Graphics, 21(4):501–510, April 2015. doi: 10.1109/TVCG.2015.2391860
[18] J. Jerald. The VR Book: Human-Centered Design for Virtual Reality. Morgan & Claypool, 2015.
[19] A. Kendon. Some relationships between body motion and speech. Studies in Dyadic Communication, 7(177):90, 1972.
[20] H. Kharoub, M. Lataifeh, and N. Ahmed. 3D user interface design and usability for immersive VR. Applied Sciences, 9(22):4861, 2019. doi: 10.3390/app9224861
[21] M. Kytö, B. Ens, T. Piumsomboon, G. A. Lee, and M. Billinghurst. Pinpointing: Precise head- and eye-based target selection for augmented reality. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, pp. 1–14. ACM, 2018. doi: 10.1145/3173574.3173655
[22] F. Matulic and D. Vogel. Multiray: Multi-finger raycasting for large displays. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, pp. 245:1–245:13. ACM, New York, NY, USA, 2018. doi: 10.1145/3173574.3173819
[23] R. P. McMahan, D. Gorton, J. Gresock, W. McConnell, and D. A. Bowman. Separating the effects of level of immersion and 3D interaction techniques. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, VRST '06, pp. 108–111. ACM, New York, NY, USA, 2006. doi: 10.1145/1180495.1180518
[24] D. Mortensen. Natural User Interfaces: What are they and how do you design user interfaces that feel natural. Interaction Design Foundation, 2019.
[25] D. A. Norman. Affordance, conventions, and design. Interactions, 6(3):38–43, 1999.
[26] T. Page. Skeuomorphism or flat design: Future directions in mobile device user interface (UI) design education. International Journal of Mobile Learning and Organisation, (2):130–142, 2014.
[27] M. Pfeiffer and W. Stuerzlinger. 3D virtual hand pointing with EMS and vibration feedback. In 2015 IEEE Symposium on 3D User Interfaces (3DUI), pp. 117–120. IEEE, 2015.
[28] T. Piumsomboon, D. Altimira, H. Kim, A. Clark, G. Lee, and M. Billinghurst. Grasp-Shell vs gesture-speech: A comparison of direct and indirect natural interaction techniques in augmented reality. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 73–82, Sep. 2014. doi: 10.1109/ISMAR.2014.6948411
[29] T. Piumsomboon, A. Clark, M. Billinghurst, and A. Cockburn. User-defined gestures for augmented reality. In CHI '13 Extended Abstracts on Human Factors in Computing Systems, CHI EA '13, pp. 955–960. ACM, New York, NY, USA, 2013. doi: 10.1145/2468356.2468527
[30] I. Poupyrev, M. Billinghurst, S. Weghorst, and T. Ichikawa. The go-go interaction technique: Non-linear mapping for direct manipulation in VR. In Proceedings of the 9th Annual ACM Symposium on User Interface Software and Technology, UIST '96, pp. 79–80. ACM, New York, NY, USA, 1996. doi: 10.1145/237091.237102
[31] S. S. Rautaray and A. Agrawal. Vision based hand gesture recognition for human computer interaction: A survey. Artif. Intell. Rev., 43(1):1–54, Jan. 2015. doi: 10.1007/s10462-012-9356-9
[32] M. J. Reale, S. Canavan, L. Yin, K. Hu, and T. Hung. A multi-gesture interaction system using a 3-D iris disk model for gaze estimation and an active appearance model for 3-D hand pointing. IEEE Transactions on Multimedia, 13(3):474–486, June 2011. doi: 10.1109/TMM.2011.2120600
[33] M. Seto, S. Scott, and M. Hancock. Investigating menu discoverability on a digital tabletop in a public setting. In Proceedings of the 2012 ACM International Conference on Interactive Tabletops and Surfaces, pp. 71–80, 2012.
[34] A. Shashua. Projective depth: A geometric invariant for 3D reconstruction from two perspective/orthographic views and for visual recognition. In 1993 (4th) International Conference on Computer Vision, pp. 583–590. IEEE, 1993.
[35] L. E. Sibert and R. J. K. Jacob. Evaluation of eye gaze interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, pp. 281–288. ACM, New York, NY, USA, 2000. doi: 10.1145/332040.332445
[36] S. Sridhar, F. Mueller, M. Zollhöfer, D. Casas, A. Oulasvirta, and C. Theobalt. Real-time joint tracking of a hand manipulating an object from RGB-D input. In European Conference on Computer Vision, pp. 294–310. Springer, 2016.
[37] R. Stoakley, M. J. Conway, and R. Pausch. Virtual reality on a WIM: Interactive worlds in miniature. In CHI '95, pp. 265–272, 1995.
[38] P. Vogiatzidakis and P. Koutsabasis. Gesture elicitation studies for mid-air interaction: A review. Multimodal Technologies and Interaction, 2(4):65, 2018.
[39] C. von Hardenberg and F. Bérard. Bare-hand human-computer interaction. In Proceedings of the 2001 Workshop on Perceptive User Interfaces, PUI '01, pp. 1–8. ACM, New York, NY, USA, 2001. doi: 10.1145/971478.971513
[40] I. Wang, P. Narayana, D. Patil, G. Mulay, R. Bangar, B. Draper, R. Beveridge, and J. Ruiz. Exploring the use of gesture in collaborative tasks. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '17, pp. 2990–2997. ACM, New York, NY, USA, 2017. doi: 10.1145/3027063.3053239
[41] J. O. Wobbrock, M. R. Morris, and A. D. Wilson. User-defined gestures for surface computing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '09, pp. 1083–1092. ACM, New York, NY, USA, 2009. doi: 10.1145/1518701.1518866
[42] Y. Zhang, S. Stellmach, A. Sellen, and A. Blake. The costs and benefits of combining gaze and hand gestures for remote interaction. In IFIP Conference on Human-Computer Interaction, pp. 570–577. Springer, 2015.
