

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail [email protected]. ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00

Small-Target Selection with Gaze Alone

Henrik Skovsgaard∗
IT University of Copenhagen

Julio C. Mateo†
Wright State University

John M. Flach‡
Wright State University

John Paulin Hansen§
IT University of Copenhagen

Abstract

Accessing the smallest targets in mainstream interfaces using gaze alone is difficult, but interface tools that effectively increase the size of selectable objects can help. In this paper, we propose a conceptual framework to organize existing tools and guide the development of new tools. We designed a discrete zoom tool and conducted a proof-of-concept experiment to test the potential of the framework and the tool. Our tool was as fast as and more accurate than the currently available two-step magnification tool. Our framework shows potential to guide the design, development, and testing of zoom tools to facilitate the accessibility of mainstream interfaces for gaze users.

CR Categories: H.5.1 [INFORMATION INTERFACES AND PRESENTATION]: Multimedia Information Systems—Evaluation/methodology

Keywords: gaze interaction, universal access, zoom interfaces

1 Introduction

Mainstream graphical user interfaces (GUIs) are generally designed with the mouse user in mind. As a consequence, users who rely on alternative input devices may encounter difficulties when accessing these GUIs. In this paper, we will focus on issues encountered by users of gaze tracking systems when selecting the smallest targets in mainstream GUIs. The limited accuracy of gaze pointing (when compared to mouse pointing) can make small-target selection very difficult for gaze-input users. Before discussing ways to address the limited accuracy of gaze input, we will briefly review how the gaze signal is processed and which factors affect gaze-pointing accuracy.

Point-and-select operations, such as pointing at an icon and clicking on it to open an application, are typical of mainstream GUIs. Mouse users physically move the mouse to point and press the mouse button to issue an activation (i.e., select). Pointing is straightforward for gaze-input users as well, but our eyes lack a selection mechanism. To identify when a user wants to issue an activation, gaze tracking systems divide eye movements into saccades and fixations.

∗e-mail: [email protected]
†e-mail: [email protected]
‡e-mail: [email protected]
§e-mail: [email protected]

Saccades are fast movements that cover relatively large spatial regions when users move their gaze from one location of interest to the next. Fixations are relatively slow movements performed in a limited spatial region when a user is inspecting an object of interest. Even during fixations, the eyes are continuously moving. This inherent eye jitter, combined with gaze tracker inaccuracies (e.g., offsets), negatively impacts gaze-pointing accuracy. To reliably identify fixations and saccades, gaze-tracking systems use algorithms based on velocity, dispersion, or a combination of both [Duchowski 2007]. For example, a velocity threshold can be set such that gaze velocities faster than this threshold are considered part of a saccade, whereas slower velocities are considered part of a fixation.
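
The velocity-threshold rule can be made concrete with a short sketch. This is a hypothetical illustration, not the algorithm of any particular tracker; the sampling rate, threshold, and pixels-to-degrees factor below are assumed values:

```python
# Hypothetical velocity-threshold classifier for a stream of gaze samples.
# Assumed values: 60 Hz sampling, a 100 deg/s threshold, and a fixed
# pixels-to-degrees conversion (in practice this depends on viewing distance).

def classify_samples(samples, rate_hz=60.0, threshold_deg_s=100.0, deg_per_px=0.03):
    """Label each (x, y) gaze sample 'fixation' or 'saccade' by inter-sample velocity."""
    labels = ['fixation']  # the first sample has no velocity estimate
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        dist_px = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        velocity = dist_px * deg_per_px * rate_hz  # degrees of visual angle per second
        labels.append('saccade' if velocity > threshold_deg_s else 'fixation')
    return labels
```

Dispersion-based algorithms instead check whether recent samples stay within a small spatial region; hybrid systems combine both tests.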

Gaze-tracking systems use detected fixations and saccades to break gaze movements into pointing and selection components. If a saccade is detected, it is assumed to belong to the pointing component. However, fixations can occur both during pointing and during selection. That is, users may look at an object because they want to inspect it further (i.e., inspection fixations) or because they want to select it (i.e., selection fixations). The most common method to distinguish inspection and selection fixations is to set a time threshold (i.e., dwell time). That is, fixations lasting longer than the dwell time are considered part of the selection component, whereas shorter fixations are considered part of the pointing component. In general, a selection fixation results in an activation at the cursor location and, if the cursor is on top of a target, a target selection.
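
In code, dwell-time selection reduces to a single comparison per detected fixation. The sketch below is illustrative; the 600 ms default matches the dwell time used for the two-step tool tested later in this paper, but the names and structure are our own:

```python
# Illustrative dwell-time rule: a fixation longer than the dwell threshold
# becomes a selection (activation at the cursor); shorter fixations are
# treated as inspection and produce no activation.

DWELL_TIME_MS = 600  # assumed default; matches the two-step tool tested below

def process_fixation(duration_ms, cursor_pos, dwell_ms=DWELL_TIME_MS):
    """Return the activation position for a selection fixation, else None."""
    if duration_ms >= dwell_ms:
        return cursor_pos  # selection fixation: activate at the cursor location
    return None            # inspection fixation: continue pointing
```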

Approaches to address the limited accuracy of gaze pointing in order to enhance the accessibility of mainstream GUIs can be grouped into two categories. Some approaches aim at reducing the noise in the input (gaze) signal, whereas others aim at increasing the tolerance of interfaces to noisy inputs. These two approaches are not mutually exclusive and, in fact, usually complement each other.

1.1 Reducing Noise in the Input Signal

The most common way to reduce the noise in the gaze signal is to smooth (i.e., low-pass filter) the signal to increase the steadiness of the cursor. Most commercial gaze trackers smooth the input signal before displaying the cursor. In fact, it is generally accepted that, given the jitter inherent to eye movements, some degree of smoothing is necessary to use gaze as an input signal. However, smoothing also reduces responsiveness to gaze movements (i.e., introduces a time delay); therefore, there is a tradeoff between cursor steadiness and responsiveness. In effect, cursor smoothing reduces the frame rate of the system by averaging across gaze samples.
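
The steadiness/responsiveness tradeoff is easy to see in a minimal moving-average smoother (one common form of low-pass filtering; actual trackers may use other filters). Averaging the last N samples steadies the cursor but makes it lag the raw gaze signal:

```python
from collections import deque

# Minimal moving-average smoother. A larger window gives a steadier cursor
# but a longer lag, since the reported position trails the raw gaze signal.

class GazeSmoother:
    def __init__(self, window=10):  # 10-sample averaging, as in the experiment below
        self.buf = deque(maxlen=window)

    def update(self, x, y):
        """Add a raw gaze sample and return the smoothed cursor position."""
        self.buf.append((x, y))
        n = len(self.buf)
        return (sum(p[0] for p in self.buf) / n,
                sum(p[1] for p in self.buf) / n)
```

After a sudden gaze jump, the smoothed cursor reaches the new position only once the window has refilled with post-jump samples, which is the delay the text refers to.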

Signal smoothing and fixation-detection algorithms are not independent of each other. On the one hand, the amount of smoothing applied to the gaze signal can affect the velocity threshold used in the fixation-detection algorithm: smoother signals need lower velocity thresholds than less smooth signals to reliably distinguish between fixations and saccades. On the other hand, the output of fixation-detection algorithms can be used to inform when smoothing is applied. For example, cursor smoothing can be stopped as soon as the algorithm detects a saccade and re-activated during fixations to increase cursor responsiveness.
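
The saccade-gated variant described above can be sketched by clearing the averaging window whenever a saccade is detected, so the cursor jumps immediately and smoothing resumes during the following fixation. This is an illustrative design, not a specific tracker's implementation:

```python
from collections import deque

# Fixation-gated smoothing sketch: drop the sample history on a saccade so the
# cursor responds instantly, then rebuild the average during the next fixation.

class GatedSmoother:
    def __init__(self, window=10):
        self.buf = deque(maxlen=window)

    def update(self, x, y, in_saccade):
        if in_saccade:
            self.buf.clear()  # forget old samples: no lag when the gaze jumps
        self.buf.append((x, y))
        n = len(self.buf)
        return (sum(p[0] for p in self.buf) / n,
                sum(p[1] for p in self.buf) / n)
```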

1.2 Increasing Interface Tolerance to Noise

An alternative approach to dealing with noisy inputs is to design GUIs that are tolerant to noise. For example, typing interfaces developed for gaze users display very large buttons (e.g., GazeTalk; [Hansen et al. 2003]) or provide other interface features to avoid the need to select small targets (e.g., Dasher; [Ward et al. 2000]).


Figure 1: Illustration of the different zoom tools. Row 1 depicts a target selection with dwell (i.e., no tool). Row 2 depicts how the continuous zoom tool gradually magnifies the target area. Row 3 depicts how n-step tools work. A two-step version would end before entering the Additional Magnification loop, a three-step version would go through the loop once, and so on. The shrinking red dots in rows 1 and 3 indicate dwell time.

The use of dedicated software allows developers to have full access to the information underlying the environment in which the user is acting (e.g., target locations). This information can be used to aid small-target selection (e.g., force fields; [Zhang et al. 2008]). However, the development of dedicated GUIs for gaze users does not address accessibility to mainstream GUIs.

A way to increase the tolerance of mainstream GUIs to noise is to develop tools that interface with these GUIs to effectively increase the size of selectable objects. These tools are generally more limited than dedicated GUIs due to their inability to access all information (e.g., target locations) underlying mainstream GUIs. The most common of these tools is two-step magnification [Lankford 2000], which is often available in commercial gaze trackers. This two-step tool divides the point-and-select task into two steps, each requiring a point-and-select operation. During the first step, the detection of a selection component does not result in an activation. Rather, a magnified (usually 2, 3, or 4x) version of the area surrounding the cursor pops up. During the second step, the detection of a selection component (on the magnified window) results in an activation. Assuming the target is within the magnified area, this tool effectively increases target size and, therefore, increases the GUI's tolerance to noise. Although helpful for small-target selection, the two-step tool slows down interaction and may feel unnatural to the user.
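
The two-step interaction can be summarized as a small state machine: the first selection fixation opens the magnified view instead of activating, and the second maps the cursor position in the magnified window back to screen coordinates and activates there. The coordinate convention below (cursor offset from the window center, in magnified pixels) is an assumption for illustration:

```python
# Hypothetical two-step magnification logic. The first detected selection
# opens a magnified view centered on the cursor; the second selection maps
# the offset inside that view back to screen coordinates and activates.

def make_two_step(magnification=4.0):
    state = {'magnified': False, 'origin': None}

    def on_selection(cursor):
        if not state['magnified']:
            state['magnified'] = True
            state['origin'] = cursor            # screen point being magnified
            return ('show_magnifier', cursor)   # step 1: pop up the magnified view
        ox, oy = state['origin']
        dx, dy = cursor                         # offset inside the magnified view
        state['magnified'] = False
        return ('activate', (ox + dx / magnification,
                             oy + dy / magnification))  # step 2: activate on screen

    return on_selection
```

Dividing the offset by the magnification is what makes the tool noise-tolerant: a cursor error inside the magnified window shrinks by that factor on the real screen.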

2 Unanticipated Limitations of Zoom Tools

In an attempt to address the limitations of the two-step tool, we developed a zoom tool to access mainstream GUIs. This tool was inspired by previous work with dedicated interfaces (e.g., StarGazer; [Hansen et al. 2008]), which showed that zooming could help with noisy input. Bates and Istance [2002] had also proposed the use of zooming interfaces to facilitate access to mainstream GUIs for gaze-input users. However, their tool magnified the whole screen and was controlled manually. In contrast, our gaze-controlled tool presented a smooth animation surrounding the cursor. When a short fixation was detected, the content in this window gradually increased in size (as if approaching the user) for the duration of a predetermined zoom time. After this time elapsed, an activation was issued at the cursor position (i.e., the center of this window). See row 2 of Figure 1 for an illustration.
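
The timing of this continuous zoom can be sketched as a scale that grows over a fixed zoom time and then triggers the activation. The linear growth, zoom time, and maximum scale below are illustrative assumptions, not the parameters of our implementation:

```python
# Illustrative continuous-zoom timing: once a short fixation starts the zoom,
# the window content looms for zoom_time_ms, then an activation is issued at
# the window center. Parameter values are assumed for illustration.

def zoom_scale(elapsed_ms, zoom_time_ms=1000.0, max_scale=8.0):
    """Current magnification, growing linearly from 1x to max_scale."""
    t = min(elapsed_ms / zoom_time_ms, 1.0)
    return 1.0 + t * (max_scale - 1.0)

def should_activate(elapsed_ms, zoom_time_ms=1000.0):
    """True once the zoom time has elapsed and the activation should fire."""
    return elapsed_ms >= zoom_time_ms
```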

We expected this zoom tool to have at least four advantages over the two-step tool. First, we expected its continuous looming appearance to feel more natural to the user. Second, we expected the user to be able to make online corrections to the cursor position as the target increased in size. Third, we expected target selection to be faster because the user would not need to perform two separate point-and-select operations. Fourth, we expected the maximum possible magnification level to be greater than with a two-step tool using a window of similar size, because the entire region around the cursor did not need to be magnified all at once.

In our previous experiment, we found that this zoom tool facilitated small-target selection when compared to no tool [Skovsgaard et al. 2008], but it did not compare favorably to a two-step tool. Rather, the two-step tool was more accurate and rated more favorably than the zoom tool. At least three factors might have contributed to the poor performance and ratings of the zoom tool. First, our zooming tool transformed a discrete point-and-select operation (with a still target) into a continuous tracking task (with a moving target). Second, once zooming started, the user could not control the rate at which content zoomed in. Third, the impact of the time delay resulting from processing and smoothing the gaze signal was amplified by the first two factors. As a result, users' corrections often led to instability (i.e., increasing error rather than reducing it). It is possible that performing a tracking task using gaze input would not be problematic without delay. However, some delay is inherent to all current gaze-tracking systems as a result of signal processing and smoothing. Therefore, tools developed to access mainstream GUIs must be tolerant to both noise and delay.

3 Re-evaluating the Design of Zoom Tools

In our first implementation, we did not anticipate how our continuous zoom tool would change the task or how delay would affect performance. Empirical results challenged our assumption that continuous interaction would always be more natural than discrete interaction. Instead, continuous interaction seemed unnatural with delayed feedback. In fact, the manual-control literature suggests that, in the presence of delays, users naturally adopt a move-and-wait strategy [Ferrell 1965]. That is, users transform the continuous task into a series of discrete components. Ironically, our attempt to make the task more natural backfired because, even though continuous interaction may be more natural in real-world situations, discrete interaction is more natural in the presence of time delays.

3.1 Discrete Zoom Tools

Figure 2: The zoom framework. Tools are ordered by number of steps along a discrete-to-continuous axis: dwell, two-step, discrete zoom, and continuous zoom.

Based on the results of our first study, we designed a discrete zoom tool, which is conceptually equivalent to an n-step tool, combining features of two-step and zoom tools (see row 3 of Figure 1 for an illustration). Because zooming occurs in discrete steps, we expected this tool to be more tolerant to delay than the continuous zoom tool. When compared to the two-step tool, we expected more steps to permit greater magnification levels because, after the first step, the content can be magnified further without increasing window size. Obviously, adding steps can also slow down performance. However, given that early steps require lower accuracy than the two-step tool, we expected discrete zoom to accommodate lower dwell times. We also expected the discrete zoom tool to provide more of a zooming sensation than two-step while giving users more control over the zooming rate than continuous zoom.
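
The n-step behavior can be sketched as a sequence of dwell-triggered events: each of the first n-1 steps multiplies the magnification, and the final step issues the activation. Splitting the total magnification evenly across steps is our simplifying assumption:

```python
# Sketch of an n-step discrete zoom. A two-step tool is n_steps=2; the
# three-step tools tested below are n_steps=3. Each dwell-triggered step
# multiplies the magnification; the last step activates instead of zooming.

def n_step_session(n_steps, total_magnification):
    per_step = total_magnification ** (1.0 / (n_steps - 1))  # even split (assumed)
    scale, events = 1.0, []
    for step in range(1, n_steps + 1):
        if step < n_steps:
            scale *= per_step
            events.append(('zoom', round(scale, 2)))      # magnify in same window
        else:
            events.append(('activate', round(scale, 2)))  # final dwell activates
    return events
```

With a 4x total, a three-step session zooms to 2x, then 4x, then activates, so no single step needs a larger pop-up window than the two-step tool's.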

3.2 The Zoom Framework

Based on our experience developing and testing tools to facilitate the selection of small targets using gaze alone, we created a conceptual framework to organize existing tools designed for small-target selection (Figure 2). All the tools in this framework increase the effective size of targets (i.e., zoom) to facilitate small-target selection. The framework organizes tools along a discrete-to-continuous continuum. The two-step and continuous zoom tools can be placed, respectively, at the discrete and continuous ends of this continuum. The two-step tool suddenly increases target size to its maximum magnification level, whereas continuous zoom increases target size in what could be considered an infinite number of infinitely small steps. Consistent with these two extremes, tools closer to the discrete end of the spectrum tend to have fewer steps of longer duration, whereas tools closer to the continuous end of the spectrum tend to have more steps of shorter duration. The theoretically shorter duration per step of tools with more steps (i.e., more continuous) results from their shorter dwell times when compared to tools with fewer steps (i.e., more discrete). Tools toward the continuous end of the spectrum tend to require the user to carry out a more tracking-like task, whereas tools toward the discrete end can be better characterized as a series of point-and-select operations. In addition, tools toward the continuous end of the spectrum tend to permit higher magnification levels because objects can increase in size within a window of constant size. Therefore, more continuous tools are less limited by the size of the zooming window.

In general, discrete zoom tools fall between these two extremes. The specific three-step version we test below falls closer to the discrete end (see Figure 2). Even though it is close to two-step, we argue that this three-step tool can facilitate the selection of very small targets and improve the naturalness of interaction when compared to two-step magnification. We also argue that this framework may facilitate comparisons among tools. By studying how tools vary along the continuum, the framework could provide insights into useful tool features and suggest ways in which future designs can combine these features.

4 Discrete Zoom Tools: Proof of Concept

In order to study the potential of discrete zoom tools, we conducted an experiment to compare different zoom tools. Participants included 2 male expert users (the first two authors) and 8 novices (2 males and 6 females). Novices had no previous experience with gaze interaction. We used an IG-30 eye tracker from Alea Technologies in a desktop setting. Participants were instructed to use a gaze-controlled cursor to point to the target present in the workspace as quickly and accurately as possible. Circular targets appeared one at a time at 1 of 16 possible locations equidistant (300 pixels) from the homing circle at the center. A trial started when a participant positioned the gaze cursor on the homing circle and ended as soon as the participant issued an activation using the corresponding method. A successful target selection was not required. Each participant completed 16 blocks of 16 trials, resulting in a total of 256 activations per participant. All independent variables were manipulated within participants and fixed within blocks.
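
For concreteness, the 16 equidistant target locations can be generated as evenly spaced points on a 300-pixel circle around the homing circle; the screen-center coordinate below is an assumed value, not taken from the experiment:

```python
import math

# Illustrative layout of the 16 target locations: points spaced evenly on a
# circle of radius 300 px around the central homing circle. The center
# coordinate is an assumption for illustration.

def target_positions(center=(512, 384), radius=300, n=16):
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]
```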

We manipulated zoom tool, target size, and smoothing. Zoom tool had 4 levels: dwell (no zoom), two-step tool, three-step tool, and optimized three-step tool. The magnification level (4x) and dwell time (600 ms) of the two-step tool were chosen based on available versions of this tool. In fact, we purposefully chose a relatively high level of magnification and a relatively short dwell time. The three-step tool had the same magnification level and dwell time as the two-step tool, whereas the optimized three-step tool had twice the magnification (8x) and half the dwell time (300 ms). Achieving 8x magnification with a two-step tool is virtually impossible with a magnified window of the size used in this experiment. The 2 levels of target size were 6- and 12-pixel diameters (representing some of the smallest targets in the environment). The 2 levels of smoothing (no smoothing and 10-sample average) were applied to the raw eye-tracker data, and velocity thresholds were adjusted accordingly. We measured hit rate, completion time, and subjective ratings. Data were analyzed with a repeated-measures ANOVA and LSD correction in the post-hoc tests.

We expected the three-step tool to: (a) feel more natural, (b) be more resistant to noisy input, and (c) enable reliable selection of smaller targets than the two-step tool. We did not expect discrete zoom to be faster than the two-step tool, but we did expect an optimized three-step version to achieve speeds similar to the two-step tool without sacrificing accuracy. This optimized version was expected to accommodate lower dwell times and greater magnification levels than current two-step tools.

Due to space limitations, we emphasize the results that are most relevant to the zoom framework. All data analyses were conducted on the data from novices; experts were used for comparison purposes. Target size, smoothing, and subjective-rating results will not be described in detail. Suffice it to say that target size affected hit rate but not completion time, whereas smoothing affected completion time but not hit rate. Hit rate was lower for smaller targets than for larger targets, F(1, 4) = 19.90, p < 0.05. Smoothing over 10 samples resulted in longer completion times than no smoothing, F(1, 4) = 11.06, p < 0.05. We found no evidence suggesting that no smoothing had a greater impact on the two-step than on the three-step tool. Therefore, this experiment did not support the hypothesis that a three-step tool is more resistant to noise than two-step. Preliminary analyses suggest that participants did not rate the three zoom tools differently from each other, but some differences were apparent between dwell and all three tools (i.e., dwell was rated as faster but less accurate than the zoom tools). We found no evidence of the three-step tool being perceived as more natural than the two-step tool.

Zoom tool had a significant effect on hit rate, F(3, 21) = 32.43, p < 0.05. Mean hit rate was lowest without zoom (M = 0.04, SD = 0.03). The hit rates of the two-step (M = 0.24, SD = 0.11) and three-step tools (M = 0.29, SD = 0.12) were not significantly different from each other, t(7) = 1.22, p > 0.05. The optimized three-step tool (M = 0.48, SD = 0.14) had a higher hit rate than the three-step tool, t(7) = 4.57, p < 0.05. These results are consistent with our hypothesis that better accuracy can be achieved with a three-step than with a two-step tool. Given the difference between three-step and optimized three-step, the accuracy advantage is probably due to the latter's greater magnification level. Mean hit rates across zoom tools show a similar pattern for novices and experts (Figure 3).

Figure 3: Mean hit rates for the 8 novices and the 2 experts as a function of zoom tool.

Figure 4: Mean completion times for the 8 novices and the 2 experts as a function of zoom tool.

Zoom tool also had a significant effect on completion time, F(3, 21) = 119.04, p < 0.05. Completion times were shortest without zoom (M = 1581 ms, SD = 192 ms). The two-step (M = 3193 ms, SD = 441 ms) and optimized three-step tools (M = 3152 ms, SD = 375 ms) were not significantly different from each other, t(7) = 0.39, p > 0.05. The three-step tool (M = 3905 ms, SD = 442 ms) took longer than the two-step tool, t(7) = 5.35, p < 0.05. These results are consistent with our hypothesis that a three-step tool can achieve speeds comparable to a relatively fast version of the two-step tool (given the shorter dwell time in the three-step tool). Again, the pattern of results was very similar for novices and experts (Figure 4).

Overall, the results of this experiment are promising. We found support for the possibility that discrete zoom tools can achieve similar speeds and greater accuracy than available two-step tools. Future research should explore whether this finding generalizes to situations in which distractors are present and to tasks in which successful target selection is required. Future studies should also explore whether a two-step tool could accommodate lower dwell times and whether having different dwell times for different steps could be beneficial. Our smoothing manipulation and subjective ratings did not support our hypothesis that three-step tools are more tolerant to noise and more natural than two-step tools. Research with a wider range of smoothing levels and subjective ratings could help determine whether this result is due to a lack of difference between tools or to a lack of sensitivity of the measures we used. Finally, even though mean values varied substantially, we found a similar pattern of results across a wide range of expertise levels. This result suggests that findings from novices may generalize to more experienced users and that novice-user data may be useful for evaluating interface tools.

5 Summary and Conclusions

Selecting the smallest targets in mainstream GUIs using gaze alone is not easy. Although some tools exist, there is little theoretical guidance for the development of tools to facilitate accessibility to mainstream GUIs for gaze users. Based on our previous work, we proposed a conceptual framework to categorize existing tools and guide the development of new tools. As a proof of concept, we designed a discrete zoom tool and generated hypotheses about how it would compare to other zoom tools based on this framework. We conducted an experiment in which the optimized three-step discrete zoom tool we proposed achieved better performance than a two-step tool modeled after existing tools. The results suggest that our framework holds potential to guide the development of zoom tools that enhance accessibility to mainstream GUIs for gaze users.

References

BATES, R., AND ISTANCE, H. 2002. Zooming interfaces!: Enhancing the performance of eye controlled pointing devices. In Proceedings of the Fifth International ACM Conference on Assistive Technologies, ACM, Edinburgh, Scotland, 119–126.

DUCHOWSKI, A. T. 2007. Eye tracking methodology. Springer.

FERRELL, W. 1965. Remote manipulation with transmission delay. IEEE Transactions on Human Factors in Electronics 6, 24–32.

HANSEN, J. P., JOHANSEN, A. S., HANSEN, D. W., ITOH, K., AND MASHINO, S. 2003. Command without a click: Dwell time typing by mouse and gaze selections. In INTERACT 2003, IOS Press, 121–128.

HANSEN, D. W., SKOVSGAARD, H. H. T., HANSEN, J. P., AND MØLLENBACH, E. 2008. Noise tolerant selection by gaze-controlled pan and zoom in 3D. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, ACM, Savannah, Georgia, 205–212.

LANKFORD, C. 2000. Effective eye-gaze input into Windows. In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, ACM, Palm Beach Gardens, Florida, United States, 23–27.

SKOVSGAARD, H., MATEO, J., AND HANSEN, J. P. 2008. How can tiny buttons be hit using gaze only? In COGAIN 2008, COGAIN, Prague, Czech Republic, vol. 4, 38–42.

WARD, D. J., BLACKWELL, A. F., AND MACKAY, D. J. C. 2000. Dasher - a data entry interface using continuous gestures and language models. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology, ACM, San Diego, California, United States, 129–137.

ZHANG, X., REN, X., AND ZHA, H. 2008. Improving eye cursor's stability for eye pointing tasks. In Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, ACM, Florence, Italy, 525–534.
