

Proceedings of the 1st Augmented Human International Conference

2010, Megève, France

AH ‘10

General Co-Chairs: Hideo Saito, Keio University, Japan & Jean-Marc Seigneur, University of Geneva, Switzerland

Program Co-Chairs: Guillaume Moreau, Ecole Centrale de Nantes, France & Pranav Mistry, MIT Media Lab, USA

Organisation Chair: Jean-Marc Seigneur, University of Geneva, Switzerland

Augmented/Mixed Reality Co-Chairs: Guillaume Moreau, Ecole Centrale de Nantes, France & Masahiko Inami, Keio University, Japan

Brain Computer Interface Co-Chairs: Karla Felix Navarro, University of Technology Sydney, Australia and Ed Boyden, MIT Media Lab, USA

Biomechanics and Human Performance Chair: Guillaume Millet, Laboratoire de Physiologie de l'Exercice de Saint-Etienne, France

Wearable Computing Chair: Bruce Thomas, University of South Australia

Security and Privacy Chair: Jean-Marc Seigneur, University of Geneva, Switzerland

Program Committee:

Peter Froehlich, Forschungszentrum Telekommunikation Wien, Austria

Pranav Mistry, MIT Media Lab, USA

Jean-Marc Seigneur, University of Geneva, Switzerland

Guillaume Moreau, Ecole Centrale de Nantes, France

Guillaume Millet, Laboratoire de Physiologie de l'Exercice de Saint-Etienne, France

Jacques Lefaucheux, JLX3D, France

Christian Jensen, Technical University of Denmark

Jean-Louis Vercher, CNRS et Université de la Méditerranée, France

Steve Marsh, National Research Council Canada

Didier Seyfried, INSEP, France

Hideo Saito, Keio University, Japan

Narayanan Srinivasan, University of Allahabad, India

Qunsheng Peng, Zhejiang University, China

Karla Felix Navarro, University of Technology Sydney, Australia

Brian Caulfield, University College Dublin, Ireland

Masahiko Inami, Keio University, Japan

Ed Boyden, MIT Media Lab, USA

Bruce Thomas, University of South Australia

Franck Multon, Université de Rennes 2, France

Yanjun Zuo, University of North Dakota, USA

Sponsors: Sporaltec, Megève

ACM International Conference Proceedings Series

ACM Press


The Association for Computing Machinery

2 Penn Plaza, Suite 701

New York, New York 10121-0701

ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or [email protected].

For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750-4470 (fax).

Notice to Past Authors of ACM-Published Articles

ACM intends to create a complete electronic archive of all articles and/or other material previously published by ACM. If you have written a work that was previously published by ACM in any journal or conference proceedings prior to 1978, or in any SIG Newsletter at any time, and you do NOT want this work to appear in the ACM Digital Library, please inform [email protected], stating the title of the work, the author(s), and where and when published.

ACM ISBN: 978-1-60558-825-4


Introduction

The first Augmented Human International Conference (AH '10) gathered scientific papers from many different disciplines: information technology, human-computer interfaces, brain-computer interfaces, sport and human performance, augmented reality, and more. This first edition is quite multidisciplinary for a research domain that calls for even more interdisciplinarity, since it touches the human person. Many papers concentrated on building human augmentation technologies, which is necessary for them to emerge in the real world. However, few papers investigated the ethical or safety issues of augmented human technologies. The next edition may bring more papers on this essential aspect, which must be taken into account for the long-term success of these technologies.

Acknowledgments

Many thanks to: the eHealth division of the European Commission, which circulated the call for papers in its official lists of events; the municipality of Megève and Megève Tourisme, which helped organise the conference; the EU-funded FP7-ICT-2007-2-224024 PERIMETER project, which partially funds the organisation chair, as well as the University of Geneva, where he is affiliated; the French association for virtual reality (AFRV), which organised the industrial and scientific session; ACM, which published the proceedings of the conference in its online library; the French "pôle de compétitivité" Sporaltec, which sponsored the best paper award; and all the program committee members, who reviewed the submitted papers and circulated the CFP to their contacts.


Table of Contents

Article 1: “ExoInterfaces: Novel Exosceleton Haptic Interfaces for Virtual Reality, Augmented Sport and Rehabilitation”, Dzmitry Tsetserukou, Katsunari Sato and Susumu Tachi.

Article 2: “PossessedHand: A Hand Gesture Manipulation System using Electrical Stimuli”, Emi Tamaki, Miyaki Takashi and Jun Rekimoto.

Article 3: “A GMM based 2-stage Architecture for Multi-Subject Emotion Recognition using Physiological Responses”, Yuan Gu, Su Lim Tan, Kai Juan Wong, Moon-Ho Ringo Ho and Li Qu.

Article 4: “Gaze-Directed Ubiquitous Interaction Using a Brain-Computer Interface”, Dieter Schmalstieg, Alexander Bornik, Gernot Mueller-Putz and Gert Pfurtscheller.

Article 5: “Relevance of EEG Input Signals in the Augmented Human Reader”, Inês Oliveira, Ovidiu Grigore, Nuno Guimarães and Luís Duarte.

Article 6: “Brain Computer Interfaces for Inclusion”, Paul McCullagh, Melanie Ware, Gaye Lightbody, Maurice Mulvenna, Gerry McAllister and Chris Nugent.

Article 7: “Emotion Detection using Noisy EEG Data”, Mina Mikhail, Khaled El-Ayat, Rana El Kaliouby, James Coan and John J.B. Allen.

Article 8: “World’s First Wearable Humanoid Robot that Augments Our Emotions”, Dzmitry Tsetserukou and Alena Neviarouskaya.

Article 9: “KIBITZER: A Wearable System for Eye-Gaze-based Mobile Urban Exploration”, Matthias Baldauf, Peter Fröhlich and Siegfried Hutter.

Article 10: “Airwriting Recognition using Wearable Motion Sensors”, Christoph Amma, Dirk Gehrig and Tanja Schultz.

Article 11: “Augmenting the Driver’s View with Real-Time Safety-Related Information”, Peter Fröhlich, Raimund Schatz, Peter Leitner, Stephan Mantler and Matthias Baldauf.

Article 12: “An Experimental Augmented Reality Platform for Assisted Maritime Navigation”, Olivier Hugues, Jean-Marc Cieutat and Pascal Guitton.

Article 13: “Skier-ski System Model and Development of a Computer Simulation Aiming to Improve Skier’s Performance and Ski”, François Roux, Gilles Dietrich and Aude-Clémence Doix.


Article 14: “T.A.C: Augmented Reality System for Collaborative Tele-Assistance in the Field of Maintenance through Internet”, Sébastien Bottecchia, Jean Marc Cieutat and Jean Pierre Jessel.

Article 15: “Learn complex phenomenon and enjoy interactive experiences in a Museum!”, Benedicte Schmitt, Cédric Bach and Emmanuel Dubois.

Article 16: “Partial Matching of Garment Panel Shapes with Dynamic Sketching Design”, Shuang Liang, Rong-Hua Li, George Baciu, Eddie C.L. Chan and Dejun Zheng.

Article 17: “Fur Interface with Bristling Effect Induced by Vibration”, Masahiro Furukawa, Yuji Uema, Maki Sugimoto and Masahiko Inami.

Article 18: “Evaluating Cross-Sensory Perception of Superimposing Virtual Color onto Real Drink: Toward Realization of Pseudo-Gustatory Displays”, Takuji Narumi, Munehiko Sato, Tomohiro Tanikawa and Michitaka Hirose.

Article 19: “The Reading Glove: Designing Interactions for Object-Based Tangible Storytelling”, Joshua Tanenbaum, Karen Tanenbaum and Alissa Antle.

Article 20: “Control of Augmented Reality Information Volume by Glabellar Fader”, Hiromi Nakamura and Homei Miyashita.

Article 21: “Towards Mobile/Wearable Device Electrosmog Reduction through Careful Network Selection”, Jean-Marc Seigneur, Xavier Titi and Tewfiq El Maliki.

Article 22: “Bouncing Star Project: Design and Development of Augmented Sports Application Using a Ball Including Electronic and Wireless Modules”, Osamu Izuta, Toshiki Sato, Sachiko Kodama and Hideki Koike.

Article 23: “On-line Document Registering and Retrieving System for AR Annotation Overlay”, Hideaki Uchiyama, Julien Pilet and Hideo Saito.

Article 24: “Augmenting Human Memory using Personal Lifelogs”, Yi Chen and Gareth Jones.

Article 25: “Aided Eyes: Eye Activity Sensing for Daily Life”, Yoshio Ishiguro, Adiyan Mujibiya, Takashi Miyaki and Jun Rekimoto.


ExoInterfaces: Novel Exosceleton Haptic Interfaces for Virtual Reality, Augmented Sport and Rehabilitation

Dzmitry Tsetserukou, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580 Japan, [email protected]

Katsunari Sato, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan, [email protected]

Susumu Tachi, Keio University, 4-1-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8526 Japan, [email protected]

ABSTRACT
We developed novel haptic interfaces, FlexTorque and FlexTensor, that enable realistic physical interaction with real and virtual environments. The idea behind FlexTorque is to reproduce the human muscle structure, which allows us to perform dexterous manipulation and to interact safely with the environment in daily life. FlexTorque suggests new possibilities for highly realistic, very natural physical interaction in virtual environments. There are no restrictions on arm movement, and it is not necessary to hold a physical object during interaction with objects in virtual reality. Because the system can generate strong forces while being lightweight, easily wearable, and intuitive, users experience a new level of realism as they interact with virtual environments.

ACM Classification Keywords
H5.2. Information interfaces and presentation: User Interfaces – haptic I/O, interaction styles, prototyping.

General Terms
Design, Experimentation, Performance.

Keywords
Exoskeleton, haptic display, haptic interface, force feedback, Virtual Reality, augmented sport, augmented games, rehabilitation, game controller.

1. INTRODUCTION
In order to realize haptic interaction (e.g., holding, pushing, and contacting objects) in a virtual environment, as well as mediated haptic communication with other humans (e.g., handshaking), force feedback is required. Recently there has been substantial need and interest in haptic displays that can provide realistic, high-fidelity physical interaction in virtual environments. The aim of our research is to implement a wearable haptic display that presents realistic feedback (kinesthetic stimuli) to the human arm. We developed a wearable device, FlexTorque, that induces forces on the human arm and does not require the user to hold any additional haptic interface in the hand. It is a completely new technology for virtual and augmented environments that allows the user to explore the surroundings freely. The concept of Karate (empty hand) Haptics that we propose is the opposite of conventional interfaces (e.g., the Wii Remote [11] or SensAble's PHANTOM [7]), which require holding a haptic interface in the hand and thus restrict the motion of the fingers in midair. Powered exoskeleton robots such as HAL [3] (weight of 23 kg) and the Raytheon Sarcos [8] (weight of about 60 kg), intended for power amplification of the wearer, can also be used for force presentation. However, they are heavy, require high power consumption, and pose a danger to the user because of their powerful actuators. Another class of exoskeletons is aimed at teleoperator systems. Most force-feedback master devices are similar in size to the slave robot and are equipped with powerful actuators. Such systems are dangerous for the human operator and, in case of failure during bilateral control, can cause harm. In recent years there have been several attempts to make force-feedback devices more compact, safe, and wearable. In [5], an exoskeleton-type master device was designed based on a kinematic analysis of the human arm, with pneumatic actuators generating torque feedback. The authors succeeded in making a lightweight and compact force-reflecting master arm; however, its force-reflection capability is not sufficient to present contact forces effectively. An artificial pneumatic muscle-type actuator was proposed in [4], and a wearable robotic arm with 7 DOF and high joint torques was developed. The robotic arm uses parallel mechanisms at the shoulder and wrist, similarly to the muscular structure of the human upper limb. It should be noted, however, that the dynamic characteristics of such pneumatic actuators show strong nonlinearity and load dependency, and thus a number of problems need to be resolved for their successful application. A compact string-based haptic device for bimanual interaction in virtual environments was described in [6]. Users of SPIDAR can intuitively manipulate objects and experience 6-DOF force feedback. A human-scale SPIDAR that enlarges the working space was designed in [9]. However, the wires moving in front of the user obstruct the user's vision. They also restrict arm motion in several directions, and the user has to pay attention not to get injured. Moreover, the user grasps the ball-shaped grip in such a way that the fingers cannot move.

In order to achieve a human-friendly and wearable haptic display design, we analyzed the amount of torque to be presented to the operator's arm. Generally, there are three cases in which torque feedback is needed. The first case occurs when haptic communication with a remote human has to be realized; for example, the person shakes hands with the slave robot and the joint torques are presented to the operator. Such interaction results in very small torque magnitudes (in the range of 0-1.5 Nm). The second situation occurs when a slave robot transports a heavy object. Here, the torque values are much higher than in the previous case, and the torque magnitude depends on the load weight. However, continuous presentation of high torques to the operator would result in muscle fatigue. We argue that a downscaled torque indicating the direction of the force would be informative enough. The third and worst case of contact, in terms of interactive force magnitude, is collision. The result of a collision with a fixed object (as is often the case) is the immediate discontinuation of the operator's arm motion; therefore, the power of the torque display only has to be sufficient to fixate the operator's arm. In the case of a collision with a movable obstacle, the haptic display should induce arm motion in the direction of the impact force, thus decreasing the possible damage.

2. DEVELOPMENT OF THE HAPTIC DISPLAY FlexTorque
The idea behind the novel torque display FlexTorque (a haptic display that generates Flexor and extensor Torque) is to reproduce the human muscle structure, which allows us to perform dexterous manipulation and to interact safely with the environment in daily life.

Figure 1. Structure and action of a skeletal muscle.

The main functions of muscles are contraction for locomotion and skeletal movement. A muscle generally attaches to the skeleton at both ends. The Origin is the muscle attachment point on the more stationary bone. The other attachment point, on the bone that moves as the muscle contracts, is the Insertion. The muscle is connected to the periosteum through a tendon (connective tissue in the shape of a strap or band). The muscle with the tendon in series acts like a rope pulling on a lever when the tendon is pulled to move the skeleton (Figure 1). When we hold a heavy object in the palm, its weight produces torques at the wrist, elbow, and shoulder joints. Each muscle generates a torque at a joint, equal to the product of its contractile force and its moment arm at that joint, to balance the gravity force as well as inertial and contact forces. Thus, we can feel the object's weight. Because muscles pull but cannot push, hinge joints (e.g., the elbow) require at least two muscles pulling in opposite directions (antagonistic muscles). The torque produced by each muscle at a joint is the product of its contractile force (F) and its moment arm at that joint (d). The net torque Tnet is the sum of the torques produced by each antagonistic muscle. Movement of human limbs is produced by the coordinated work of muscles acting on skeletal joints. The structure of the developed torque display FlexTorque is presented in Figure 2.

Figure 2. FlexTorque on the human’s arm surface.

FlexTorque is made up of two DC motors (muscles) fixedly mounted in a plastic Motor holder unit, Belts (tendons), and two Belt fixators (Insertions). The operation principle of the haptic display is as follows. When a DC motor is activated, it pulls the belt and produces a force Fflex generating the flexor torque Tflex. The oppositely placed DC motor generates the extensor torque Text. Therefore, the pair of antagonistic actuators produces a net torque Tnet at the operator's elbow joint. We placed the Insertion point near the wrist joint in order to develop a large torque at the elbow joint. The position of the operator's arm when flexor torque is generated is shown in Figure 3 (where θ stands for the angle of forearm rotation relative to the upper arm).

[Figure annotations: Tflex = Fflex × dflex, Text = Fext × dext, Tload = Fload × dload, Tnet = Tflex − Text; labels: Origin, Insertion, Muscle, Tendon.]


Figure 3. Positions of the human’s arm under flexor torque.

Let us consider the calculation procedure of the net torque value. The layout of the forces and torques applied to the forearm during flexion is given in Figure 4.

Figure 4. Diagram of applied forces and torques.

The tension force Ft of the belt can be derived from:

Ft = Tm × i / r,  (1)

where Tm is the motor torque, i is the gear ratio, and r is the shaft radius. The net torque Tn acting at the elbow joint is:

Tn = Fty × df = Ft × df × cos α,  (2)

where df is the moment arm.

The angle α varies according to the relative position of the forearm and upper arm. It can be found using the following equation:

α = cos⁻¹[(l² + df² − dt²) / (2 × l × df)],  (3)

where dt is the distance from the pivot to the Origin and l is the length of the belt, which can be calculated from the rotation angle of the motor shaft. The detailed view of FlexTorque is presented in Figure 5.
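To make equations (1)-(3) concrete, here is a minimal Python sketch that computes the belt tension of each motor and the resulting elbow torque, then the net torque Tnet = Tflex − Text. The numerical values (motor torque, gear ratio, shaft radius, belt length, moment arms) are illustrative assumptions, not parameters reported in the paper.

```python
import math

def belt_tension(motor_torque, gear_ratio, shaft_radius):
    """Eq. (1): belt tension produced by one motor, Ft = Tm * i / r."""
    return motor_torque * gear_ratio / shaft_radius

def belt_angle(belt_length, d_f, d_t):
    """Eq. (3): angle alpha from the belt/forearm/upper-arm triangle (law of cosines)."""
    cos_a = (belt_length**2 + d_f**2 - d_t**2) / (2 * belt_length * d_f)
    return math.acos(max(-1.0, min(1.0, cos_a)))

def elbow_torque(motor_torque, gear_ratio, shaft_radius, belt_length, d_f, d_t):
    """Eq. (2): torque at the elbow joint, Tn = Ft * df * cos(alpha)."""
    f_t = belt_tension(motor_torque, gear_ratio, shaft_radius)
    return f_t * d_f * math.cos(belt_angle(belt_length, d_f, d_t))

# Illustrative (assumed) values: motor torques in Nm, lengths in metres.
t_flex = elbow_torque(0.05, 10, 0.005, 0.25, 0.20, 0.10)
t_ext = elbow_torque(0.02, 10, 0.005, 0.25, 0.20, 0.10)
print(f"Tnet = Tflex - Text = {t_flex - t_ext:.2f} Nm")
```

With these assumed numbers the flexor-side motor receives the larger command and the net torque flexes the elbow; swapping the commands reverses the sign of Tnet.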

Figure 5. 3D exploded view of the driving unit of FlexTorque.

Each unit is compact and light in weight (60 grams). This was achieved through the use of plastic and duralumin materials in manufacturing the main components. The Supporter surface has a concave profile to match the curvature of the human arm (Figure 6).

Figure 6. Driving unit of FlexTorque.

The essential advantage of the FlexTorque structure is that the heaviest elements (DC motors, shafts, and pulleys) are located on the part of the upper arm nearest to the shoulder. Therefore, the operator's arm undergoes very little additional loading. The rest of the components (belts, belt fixators) are light in weight and do not load the operator's muscles considerably. We propose the term "Karate (empty hand) Haptics" for this kind of novel device, because such devices present forces to the human arm without requiring additional interfaces in the human hands. The developed apparatus features extremely safe force presentation to the human arm. In the case of overload, the belt is


physically disconnected from the motor, so the safety of the user is guaranteed. Vibration of the human arm (e.g., simulating driving a heavy truck) can be realized through alternating, repeated jerks of torque from the antagonistic motors. Thus, the operator can perceive the roughness of the road surface. FlexTorque also enables the creation of muscle stiffness: by contracting the belts before a perturbation occurs, we can increase the joint stiffness. For example, during a collision of the human hand with a moving object in the virtual environment, the tension of the belt of one driving unit drops abruptly and the tension of the belt pulling the forearm in the direction of the impact force increases quickly. Contact and collision with a virtual object can be presented through FlexTorque as well. In the case of collision, the limb must be at rest. In such a case, the net torque produced by the muscles is opposed by another equal but opposite torque Tload. Similarly to the human muscles, the net torque produced by the haptic display restrains further movement of the user's arm.
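As an illustration of these control patterns, the following minimal Python sketch encodes the alternating jerks (vibration) and the equal pre-tension of both belts (stiffness) as torque schedules for the two antagonistic motors. The paper does not define a control API, so the function names, units, and values here are hypothetical.

```python
def vibration_pattern(amplitude_nm, period_s, duration_s, dt=0.01):
    """Alternating jerks of the two antagonistic motors, e.g. to simulate road roughness."""
    samples = []
    t = 0.0
    while t < duration_s:
        phase = int(t / period_s) % 2          # which motor fires in this half-period
        flexor = amplitude_nm if phase == 0 else 0.0
        extensor = amplitude_nm if phase == 1 else 0.0
        samples.append((round(t, 3), flexor, extensor))
        t += dt
    return samples

def co_contraction(baseline_nm):
    """Pre-tension both belts equally: net torque stays zero, but joint stiffness rises."""
    return baseline_nm, baseline_nm

# Example: 0.3 Nm jerks alternating every 50 ms for half a second,
# on top of a 0.1 Nm co-contraction baseline (all values are assumptions).
base_flex, base_ext = co_contraction(0.1)
commands = [(base_flex + f, base_ext + e)      # torque commands for the two motors
            for _, f, e in vibration_pattern(0.3, 0.05, 0.5)]
```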

3. APPLICATIONS
The main features of FlexTorque are: (1) it presents high-fidelity kinesthetic sensations to the user according to the interactive forces; (2) it does not restrict the motion of the human arm; (3) it has a wearable design; (4) it is extremely safe in operation; (5) it does not require much storage space. These advantages allow a wide range of applications in virtual and augmented reality systems and introduce a new way of game playing. Here we summarize the possible applications of the haptic display FlexTorque:

1) Virtual and Augmented Environments (presentation of physical contact to the human arm, muscle stiffness, object weight, collision, etc.).

2) Augmented Sport and Games (enhancing the immersive experience of sports and games through force feedback).

3) Rehabilitation (users with physical impairments can easily control the torque applied to the arm/leg/palm while performing therapeutic exercises).

4) Haptic navigation for blind persons (an obstacle detected by a camera is translated into a force restricting arm motion in the direction of the object).

A number of games for augmented sport experiences, which provide a natural, realistic, and intuitive feeling of immersion in the virtual environment, can be implemented. An Arm Wrestling game that mimics the real physical experience is currently under development (Figure 7). The user wearing FlexTorque and a head-mounted display (HMD) can play either against a virtual character or against a remote friend for a more personal experience. The virtual representations of the players' arms are shown on the HMD. While playing against a friend, the user sees the motion of the arms and experiences the reaction force from the rival.

Figure 7. Augmented Arm Wrestling and Augmented Collision.

4. USER STUDY AND FUTURE RESEARCH
The FlexTorque haptic interface was demonstrated at SIGGRAPH ASIA 2009 [1,2,10]. To maintain the alignment of the extensor belt on the elbow and thus avoid slippage, the user wears a specially designed pad equipped with guides. We designed three games with haptic feedback. We developed the Gun Simulator game with recoil imitation (Figure 8). A quick single jerk of the forearm simulates the recoil force of a gun. A high-frequency series of impulsive forces exerted on the forearm imitates shooting with a machine gun. In this case the upper motor is supplied with short ramp impulses of current.

Figure 8. The Gun Simulator game.
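The recoil effect amounts to shaping the current supplied to the upper motor. The Python sketch below generates such a profile for a single shot and for automatic fire; the peak current, rise time, and firing rate are assumed values for illustration only and are not taken from the paper.

```python
def ramp_impulse(peak_current_a, rise_s, dt=0.001):
    """One short ramp impulse of motor current: linear rise, then immediate release."""
    steps = max(1, int(rise_s / dt))
    return [peak_current_a * k / steps for k in range(1, steps + 1)] + [0.0]

def gun_current_profile(mode, peak_current_a=1.0, rate_hz=10, shots=5, dt=0.001):
    """Current profile for the upper motor: one jerk per shot, or a burst for automatic fire."""
    single = ramp_impulse(peak_current_a, rise_s=0.03, dt=dt)
    if mode == "single":
        return single
    idle = [0.0] * max(0, int(1.0 / (rate_hz * dt)) - len(single))  # gap between shots
    return (single + idle) * shots

recoil = gun_current_profile("single")            # one quick jerk of the forearm
burst = gun_current_profile("auto", rate_hz=12)   # high-frequency series of impulses
```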

In the Teapot Fishing game, the player casts a line by quickly flicking the rod towards the water (Figure 9).

Figure 9. The Teapot Fishing game.

Once the user feels the tug at the forearm (and sees the float going down), he gives the fishing rod a quick jerk backward and up. If the jerk is late, the fish (teapot) gets off the hook. A ramp impulse of the motor torque generates the downward jerk of the forearm indicating that a fish has taken the hook. Such practice can help the user get a feel for real fishing. With the Virtual Gym game we can do strength training exercises at home in a playful manner (Figure 10). A virtual biceps curl exercise machine was designed. The belt tension creates the resistance force against the direction of the forearm motion. The user can adjust the weight easily.

Figure 10. The Virtual Gym game.

In total, more than 100 persons experienced the novel haptic interface FlexTorque. We got very positive feedback from users and companies. While discussing possible applications with visitors, games for physical exercise and rehabilitation were frequently mentioned. The majority of users reported that the device presented force feedback in a very realistic manner.

5. DESIGN OF THE MULTIPURPOSE HAPTIC DISPLAY FlexTensor
The motivation behind the development of FlexTensor (a haptic display that uses a Flexible belt to produce Tension force) was to achieve realistic feedback with a simple and easy-to-wear haptic display. Multipurpose application is realized by fixing different elements of FlexTensor (i.e., the middle of the belt or the Origin/Insertion points) in the particular application. The structure of FlexTensor is similar to the flexor part of the FlexTorque haptic display. The main differences are: (1) the belt connects movable points on the human arms; (2) both attachment points of the belt have embedded DC motors. In the haptic display FlexTorque, the function of each attachment point is predetermined (Figure 11). The configuration of FlexTensor allows each point to perform the function of Insertion or Origin depending on the purpose of the application (Figure 12). This greatly enlarges the range of FlexTensor applications in Virtual Reality.

Figure 11. Kinematic diagram of FlexTorque and human arm.

Figure 12. Kinematic diagram of FlexTensor and human arm.

In the configuration in which the middle of the belt is not fixed, FlexTensor presents an external force resisting the expansion of the human arms (basic configuration). This action can be used to simulate the breaststroke swimming technique, in which the swimmer sweeps the hands out in the water to their widest point (Figure 13). The configuration in which the middle of the belt is fixed by the user standing on the band with both (or one) feet enables presentation of object weight (Figure 14). The tension of the belt represents the magnitude of the gravity force acting on the human arms. The middle of the belt can also be fixed on the neck (for simulation of arm lifting) or on the waist (for simulation of environmental resistance in the direction of arm stretching, e.g., in the case of contact with a virtual wall).

Figure 13. Application of FlexTensor for swimming training.


Figure 14. Application of FlexTensor for the weight presentation and strength training exercise.

When the palm of one arm is placed on some part of the body (e.g., the waist or neck), this attachment point becomes the Origin. An action such as unsheathing a sword can then be simulated by stretching out the unfixed arm. FlexTensor can also interestingly augment a 3D archery game by presenting the tension force between the arms.

The illusion of simultaneous pulling of both hands can be implemented by exerting different forces Ft1 and Ft2 in the basic configuration (see Figure 12). The illusion of being pulled to the left side or to the right side can be achieved when Ft1 > Ft2 or Ft1 < Ft2, respectively.

The developed apparatus features extremely safe force presentation to the human arm. In the case of overload, physical disconnection of the belt from the motor protects the user from injury.

6. CONCLUSIONS
The novel haptic interfaces FlexTorque and FlexTensor suggest new possibilities for highly realistic, very natural physical interaction in virtual environments, augmented sport, and augmented game applications. A number of new games for sport experiences, which provide a natural, realistic, and intuitive feeling of physical immersion in the virtual environment, can be implemented (such as skiing, biathlon (skiing with rifle shooting), archery, tennis, sword dueling, a driving simulator, etc.). The future goal is the integration of an accelerometer and MEMS gyroscopes into the holder and fixator of FlexTorque and into FlexTensor for capturing complex movements and recognizing the user's gestures. The new version of FlexTorque and FlexTensor (ExoInterface) will combine the advantages of exoskeletons (strong force feedback) and the Wii Remote interface (motion-sensing capabilities).

We expect that FlexTorque and FlexTensor will support future interactive techniques in the field of robotics, virtual reality, sport simulators, and rehabilitation.

7. ACKNOWLEDGMENTS
The research is supported in part by the Japan Science and Technology Agency (JST) and the Japan Society for the Promotion of Science (JSPS). We would also like to acknowledge and thank Alena Neviarouskaya for valuable contributions and advice.

8. REFERENCES

[1] FlexTorque. Games presented at SIGGRAPH Asia 2009. http://www.youtube.com/watch?v=E6a5eCKqQzc

[2] FlexTorque. Innovative Haptic Interface. 2009. http://www.youtube.com/watch?v=wTZs_iuKG1A&feature=related

[3] Hayashi, T., Kawamoto, H., and Sankai, Y. 2005. Control method of robot suit HAL working as operator’s muscle using biological and dynamical information. In Proceedings of the International Conference on Intelligent Robots and Systems (Edmonton, Canada, August 02 - 06, 2005). IROS '05. IEEE Press, New York, 3063-3068.

[4] Jeong, Y., Lee, Y., Kim, K., Hong, Y-S., and Park, J-O. 2001. A 7 DOF wearable robotic arm using pneumatic actuators. In Proceedings of the International Symposium on Robotics (Seoul, Korea, April 19-21, 2001). ISR '01. 388-393.

[5] Lee, S., Park, S., Kim, W., and Lee, C-W. 1998. Design of a force reflecting master arm and master hand using pneumatic actuators. In Proceedings of the IEEE International Conference on Robotics and Automation (Leuven, Belgium, May 16-20, 1998). ICRA '98. IEEE Press, New York, 2574-2579.

[6] Murayama, J., Bougrila, L., Luo, Y., Akahane, K., Hasegawa, S., Hirsbrunner, B., and Sato, M. 2004. SPIDAR G&G: a two-handed haptic interface for bimanual VR interaction. In Proceedings of the EuroHaptics (Munich, Germany, June 5-7, 2004). Springer Press, Heidelberg, 138-146.

[7] PHANTOM OMNI haptic device. SensAble Technologies. http://www.sensable.com/haptic-phantom-omni.htm

[8] Raytheon Sarcos Exoskeleton. Raytheon Company. http://www.raytheon.com/newsroom/technology/rtn08_exoskeleton/

[9] Richard, P., Chamaret, D., Inglese, F-X., Lucidarme, P., and Ferrier, J-L. 2006. Human scale virtual environment for product design: effect of sensory substitution. The International Journal of Virtual Reality, 5(2), 37-34.

[10] Tsetserukou, D., Sato, K., Neviarouskaya, A., Kawakami, N., and Tachi, S. 2009. FlexTorque: innovative haptic interface for realistic physical interaction in Virtual Reality. In Proceedings of the 2nd ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Technologies in Asia (Yokohama, Japan, December 16-19, 2009), Emerging Technologies. ACM Press, New York, 69.

[11] Wii Remote. Nintendo Co. Ltd. http://www.nintendo.com/wii/what/accessories



PossessedHand: A Hand Gesture Manipulation System using Electrical Stimuli

Emi Tamaki, Interdisciplinary Information Studies, The University of Tokyo, Japan, [email protected]

Takashi Miyaki, Interfaculty Initiative in Information Studies, The University of Tokyo, Japan, [email protected]

Jun Rekimoto, Interfaculty Initiative in Information Studies, The University of Tokyo, Japan, [email protected]

ABSTRACT
Acquiring knowledge about the timing and speed of hand gestures is important for learning physical skills, such as playing musical instruments, performing arts, and making handicrafts. However, it is difficult to use devices that dynamically and mechanically control a user's hand for learning, because such devices are very large and hence unsuitable for daily use. In addition, since glove-type devices interfere with actions such as playing musical instruments, performing arts, and making handicrafts, users tend to avoid wearing these devices. To solve these problems, we propose PossessedHand, a device with a forearm belt, for controlling a user's hand by applying electrical stimuli to the muscles around the forearm of the user. The dimensions of PossessedHand are 10 × 7.0 × 8.0 cm, and the device is portable and suited for daily use. The electrical stimuli are generated by an electronic pulse generator and transmitted from 14 electrode pads. Our experiments confirmed that PossessedHand can control the motion of 16 joints in the hand. We propose an application of this device to help a beginner learn how to play musical instruments such as the piano and koto.

Categories and Subject Descriptors
B4.2 [Input/output and data communications]: Input/Output Devices

General Terms
Design

Keywords
Interaction device, output device, wearable, hand gesture, electrical stimuli


Figure 1: Interaction examples of PossessedHand. (a) A feedback system. (b) A navigation system.

1. INTRODUCTION
Although a number of input systems for hand gestures have been proposed, very few output systems have been proposed for hand gestures. If a computer system controls a user's hand, the system can also be used to provide feedback to various interaction systems such as systems for recognizing virtual objects (Fig. 1-a) and navigation (Fig. 1-b), assistant systems for playing musical instruments, and a substitute sensation system for the visually impaired and hearing impaired. In this paper, we propose PossessedHand, a device with a forearm belt, for controlling a user's hand by applying electrical stimuli to the muscles around the forearm.


2. PHASE OF DEVELOPMENT
There are four phases for controlling the hand posture. In this research, we confirm the phases for which PossessedHand can be used. Thereafter, we propose interaction systems based on PossessedHand.

• Phase 1: Although the user cannot visually confirm the hand motion, he/she feels the motion owing to his/her somatic sense. (e.g., providing feedback for recognizing virtual objects)

• Phase 2: The user can visually confirm the motion. (e.g., learning systems for performing arts)

• Phase 3: The user's fingers can be independently controlled to achieve grasping and opening motions. (e.g., assistant systems for musical performances and sport activities, navigation systems, and sensory substitution systems for the visually impaired and the hearing impaired)

• Phase 4: The user's hand can be controlled to achieve fine motions such as pinching using the thumb and index finger. (e.g., learning systems for finger languages and for making handicrafts)

Many devices that directly stimulate a user's fingers [6] have been proposed. However, users tend to avoid wearing devices placed on area A, shown in Figure 2, because area A is used to touch, hold, and pinch real objects. Covering area A interferes with playing musical instruments, performing arts, and making handicrafts. Glove-type devices that dynamically and mechanically control a user's hand are available. Such devices can control a user's hand for phases 1-4; however, they cover most of area A. Although a device that can be worn on the forearm has been proposed [16], it is too large for daily use. We propose a small device that can control a user's hand while avoiding area A.

3. RELATED WORK
Electrical muscle stimulation (EMS) has several applications. EMS is widely used in low-frequency therapeutic equipment and in devices for ergotherapy [9]. Akamatsu et al. applied EMS for performing arts [18].

Our goal is to control a user's hand by EMS, which is similar to functional electrical stimulation (FES) [8], [7], [17], [14]. In FES, electric currents are used to activate nerves innervating extremities that are affected by paralysis resulting from stroke or other neurological disorders, or from injuries of the spinal cord or head; FES can be used to restore functions in people with disabilities [19].

Watanabe et al. and Kruijff et al. proposed a technique in which a user's wrist can be controlled with two degrees of freedom by stimulating four muscles [14], [5]. They confirmed that they could control wrist motion by electrically stimulating a muscle, because such stimulation results in the motion of the tendon connected to the wrist. However, they did not consider the motion of finger joints, which is important for controlling the hand posture. Moreover, they used invasive electrodes embedded under the skin; such electrodes are not suitable for daily use. For enabling daily use, we need to use noninvasive electrodes. In addition, we need to avoid placing electrodes on the hands or fingers because they are used to hold or touch objects.

Figure 2: Area A: This area is involved in pinching, gripping, and holding motions. Area B: Electric stimuli are given in this area.

Figure 3: A prototype of PossessedHand (electronic pulse generator and electric pads).

In this paper, we propose PossessedHand, a device for controlling a user's hand by applying electrical stimuli to the muscles around the forearm through noninvasive electrode pads. The muscles involved in finger motions are clustered in the forearm [10]. PossessedHand has 14 electrode pads placed on the forearm to stimulate these muscles. The tendons connected to these muscles move the finger joints. There is no precedent research on how hand posture can be controlled by providing electrical stimulation only to the forearm. First, we conducted an experiment to identify which and how many finger joints can be controlled by PossessedHand. In this paper, we discuss the results on the basis of phases 1-4 discussed above. Thereafter, we propose interaction systems that can be realized by using PossessedHand.


Figure 4: Configuration.

4. SYSTEM CONFIGURATION

4.1 Muscles and Stimulations for Making Hand Postures
We use EMS [2], in which muscle contraction is achieved by using electric impulses, to control a user's hand. The impulses are generated by PossessedHand and are transmitted through electrode pads placed on the skin to the muscles that are to be stimulated. PossessedHand with the desired output energy and compact size can be realized by using EMS [12].

An electrical stimulus of PossessedHand is applied to the muscles in the forearm of a user because many muscles that control the fingers and the wrist are located here. We adopt a forearm belt for PossessedHand. The electrical stimuli are generated by an electronic pulse generator and transmitted from 14 electrode pads. The pads are arranged on the upper and lower parts of the forearm of a user (Fig. 3); eight pads are needed to stimulate the muscles that are used to bend the joint in a finger, and six other pads are needed to stimulate finger extension and wrist flexion. PossessedHand stimulates seven muscles (superficial flexor muscle, deep flexor muscle, long flexor muscle of the thumb, common digital extensor muscle, flexor carpi radialis muscle, long palmar muscle, and flexor carpi ulnaris muscle). These muscles are shown in area B in Figure 3. We can select a channel between a pad on the upper portion and one on the lower portion of the forearm. Thus, 7 × 7 channels are available.

4.2 A Prototype of PossessedHand
We built a prototype of PossessedHand using a pulse generator, a channel selector (Photo-MOS Relays Series AQV253), and 14 electrode pads (Fig. 3). The dimensions of PossessedHand are 10.0 × 7.0 × 8.0 cm, and it is portable and suited for daily use. Its configuration is shown in Figure 4. Pulse width is 0.2 ms, and voltage is in the range 17-29 V.

5. EXPERIMENTS
We confirmed that PossessedHand can control the motion of 16 joints in the hand. We conducted an experiment to confirm whether the finger joints can be appropriately moved to achieve desired hand postures. We selected an anode from the seven electrodes placed on the upper arm, and a ground electrode from the seven electrodes placed on the hand side. We tested 7-by-7 patterns of the electronic paths corresponding to each of three peak values of the pulse (17 V, 23 V, and 29 V); in other words, we performed 147 stimulations. We asked the subjects to eliminate strain in the hand.

Figure 5: Operable joints. Arrows and squares indicate independently operable joints. Circles indicate ganged operable joints.
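The protocol above is an exhaustive sweep over electrode pairs and pulse amplitudes. The short Python sketch below merely enumerates the 7 × 7 × 3 = 147 trials for clarity; the pad indices are hypothetical and no stimulation hardware is driven.

```python
from itertools import product

# Pad layout and pulse parameters as described in the prototype section.
UPPER_PADS = range(7)          # candidate anodes on the upper side of the forearm
LOWER_PADS = range(7)          # candidate ground electrodes on the hand side
PEAK_VOLTAGES = (17, 23, 29)   # tested peak values of the 0.2 ms pulse, in volts

def stimulation_trials():
    """Enumerate all (anode, ground, voltage) combinations: 7 x 7 x 3 = 147 trials."""
    for anode, ground, voltage in product(UPPER_PADS, LOWER_PADS, PEAK_VOLTAGES):
        yield {"anode": anode, "ground": ground, "peak_v": voltage, "pulse_ms": 0.2}

trials = list(stimulation_trials())
assert len(trials) == 147
```

Each trial is applied while observing which joints move, which is how the map of operable joints in Figure 5 was obtained.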


We have confirmed that PossessedHand can control seven independent and nine linked joints, i.e., a total of 16 joints. We have also confirmed that a clasped hand can be opened by stimulating the common digital extensor muscle. Further, we have confirmed that users can recognize the motion of their hands even with closed eyes. Figure 5 shows the results of our experiment. These results suggest that PossessedHand can control hand postures in phases 1-3 as discussed above. In the next section, we introduce the three interaction systems of PossessedHand, namely, the navigation system, the feedback system for recognizing virtual objects, and the assistant system for musical performance. These systems correspond to phases 1-3 of the hand posture, respectively.

6. INTERACTION SYSTEMS OF POSSESSEDHAND

6.1 Navigation System (Using Phases 1, 2, and 3)
We propose a navigation system for PossessedHand (Fig. 1-b). PossessedHand can be used to make hand gestures that point to the user's destination. This is advantageous because maps or announcements are not required when using PossessedHand.


Figure 6: manipulator

Watanabe et al. proposed a navigation system in which galvanic vestibular stimulation (GVS) [15], [20], [3] is used. Since GVS affects the user's sense of acceleration, the user's walking direction can be controlled by the proposed system. However, this system cannot provide detailed information such as direction and distance. We propose a navigation system that controls wrist flexion and hand posture and provides detailed information about direction and distance.

6.2 Feedback System for Recognizing Virtual Objects (Using Phase 1)
PossessedHand can be used as a feedback system that conveys the existence of a 3D virtual object in the real world (Fig. 1-a). Haptic feedback is necessary for receiving information on virtual objects in augmented reality and mixed reality spaces. PossessedHand provides haptic feedback by controlling hand posture, in addition to the visual feedback [4], [11] obtained using head-mounted displays or 3D displays.

6.3 Assistant System for Musical Performance (Using Phases 1, 2, and 3)
We propose an application of PossessedHand that helps a beginner learn how to play musical instruments such as the piano and koto. In such musical instruments, subtle differences in tones are achieved by fine finger movements. The koto is a traditional Japanese stringed musical instrument. A koto player uses three finger picks (on the thumb, index finger, and middle finger) to pluck the strings. An appropriate hand posture is important for playing such instruments well (Fig. 6). PossessedHand can assist the beginner in acquiring proper hand positions and postures. A hand-gesture recognition system with a camera [1] can also be used to identify whether the hand positions and postures are appropriate for the instrument. PossessedHand can help the beginner learn professional techniques that cannot be written in scores (Fig. 7). Furthermore, PossessedHand can help a distant learner move the fingers appropriately when playing musical instruments.

Figure 7: Hand postures for musical performances. (a) An incorrect posture for playing the piano. (b) A correct posture for playing the piano. (c) An incorrect posture for playing the koto. (d) A correct posture for playing the koto.

7. DISCUSSION
To extend the use of PossessedHand, we have to consider reaction rates, accuracy, and muscle fatigue [13], and realize automatic setup systems to control the voltage and the positions of the electrode pads. It currently takes 5 min to manually set the position of the pads and the voltage value. We have to develop an automatic setup system based on neural networks, which would provide rapid feedback on the position of the pads, the voltage value, and the joint angles. Thereafter, the use of PossessedHand can be extended to performing sports, learning finger languages, performing arts, and making handicrafts.

8. CONCLUSION
In this paper, we proposed the use of PossessedHand, a device used to control hand postures by an electrical stimulation technique. The electrical stimuli are transmitted from 14 noninvasive electrode pads placed over the forearm muscles of the user; these stimuli control the motions of a user's hand. Our experiments confirmed that PossessedHand can control the motion of 16 joints in the hand. The device can control the motion of seven independent joints and nine joints whose motions are linked with those of other joints. We confirmed that a clasped hand can be opened by stimulating the common digital extensor muscle. We also confirmed that users can recognize the motion of their hand even with their eyes closed. On the basis of the results of the experiments, we proposed three interaction systems, namely, a navigation system, a feedback system for recognizing virtual objects, and an assistant system for aiding musical performance.

9. ACKNOWLEDGMENTS
We thank Ken Iwasaki, who contributed time to this research.

10. REFERENCES

[1] T. Emi, M. Takashi, and R. Jun. A robust and accurate 3d hand posture estimation method for interactive systems. IPSJ, 51(2):1234–1244, 2010.

[2] H. Hummelsheim, M. Maier-Loth, and C. Eickhof. The functional value of electrical muscle stimulation for the rehabilitation of the hand in stroke patients. Scandinavian Journal of Rehabilitation Medicine, 29(1):3, 1997.

[3] J. Inglis, C. Shupert, F. Hlavacka, and F. Horak. Effect of galvanic vestibular stimulation on human postural responses during support surface translations. Journal of Neurophysiology, 73(2):896, 1995.

[4] D. Jack, R. Boian, A. Merians, S. V. Adamovich, M. Tremaine, M. Recce, G. C. Burdea, and H. Poizner. A virtual reality-based exercise program for stroke rehabilitation. In Assets '00: Proceedings of the Fourth International ACM Conference on Assistive Technologies, pages 56–63, New York, NY, USA, 2000. ACM.

[5] E. Kruijff, D. Schmalstieg, and S. Beckhaus. Using neuromuscular electrical stimulation for pseudo-haptic feedback. In VRST '06: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, pages 316–319, New York, NY, USA, 2006. ACM.

[6] S. Kuroki, H. Kajimoto, H. Nii, N. Kawakami, and S. Tachi. Proposal for tactile sense presentation that combines electrical and mechanical stimulus. In WHC '07: Proceedings of the Second Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pages 121–126, Washington, DC, USA, 2007. IEEE Computer Society.

[7] M. Poboroniuc and C. Stefan. A method to test FES-based control strategies for neuroprostheses. In ICAI'08: Proceedings of the 9th WSEAS International Conference on Automation and Information, pages 344–349, Stevens Point, Wisconsin, USA, 2008. World Scientific and Engineering Academy and Society (WSEAS).

[8] Y. Ryo, S. Yoshihiro, N. Yukio, H. Yasunobu, Y. Shimada, K. Shigeru, N. Akira, I. Masayoshi, and H. Nozomu. Analysis of hand movement induced by functional electrical stimulation in tetraplegic and hemiplegic patients. The Japanese Journal of Rehabilitation Medicine, 21(4):235–242, 1984.

[9] S. S and V. Gerta. Science and practice of strength training - EMS. The Journal of Physiology.

[10] M. Schuenke, U. Schumacher, E. Schulte, et al. Atlas of Anatomy: General Anatomy and Musculoskeletal System (Prometheus). Georg Thieme Verlag, 2005.

[11] Y. Shen, S. K. Ong, and A. Y. C. Nee. Hand rehabilitation based on augmented reality. In i-CREATe '09: Proceedings of the 3rd International Convention on Rehabilitation Engineering & Assistive Technology, pages 1–4, New York, NY, USA, 2009. ACM.

[12] S. Tachi, K. Tanie, and M. Abe. Effects of pulse height and pulse width on the magnitude sensation of electrocutaneous stimulus. Japanese Journal of Medical Electronics and Biological Engineering, 15(5):315–320, 1977.

[13] S. Takahiro, K. Toshiyuki, and I. Koji. Lower-limb joint torque and position controls by functional electrical stimulation (FES). IEICE Technical Report, ME and Bio Cybernetics, 104(757):25–28, 2005.

[14] W. Takashi, I. Kan, K. Kenji, and H. Nozomu. A method of multichannel PID control of 2-degree of freedom of wrist joint movements by functional electrical stimulation. The Transactions of the Institute of Electronics, Information and Communication Engineers, 85(2):319–328, 2002.

[15] Y. Tomofumi, A. Hideyuki, M. Taro, and W. Junji. Externalized sense of balance using galvanic vestibular stimulation. Association for the Scientific Study of Consciousness 12th Annual Meeting.

[16] D. Tsetserukou, K. Sato, A. Neviarouskaya, N. Kawakami, and S. Tachi. FlexTorque: innovative haptic interface for realistic physical interaction in virtual reality. In SIGGRAPH ASIA '09: ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation, pages 69–69, New York, NY, USA, 2009. ACM.

[17] S. H. Woo, J. Y. Jang, E. S. Jung, J. H. Lee, Y. K. Moon, T. W. Kim, C. H. Won, H. C. Choi, and J. H. Cho. Electrical stimuli capsule for control moving direction at the small intestine. In BioMed'06: Proceedings of the 24th IASTED International Conference on Biomedical Engineering, pages 311–316, Anaheim, CA, USA, 2006. ACTA Press.

[18] N. Yoichi, A. Masayuki, and T. Masaki. Development of bio-feedback system and applications for musical performances. IPSJ SIG Notes, 2002(40):27–32, 2002.

[19] D. Zhang, T. H. Guan, F. Widjaja, and W. T. Ang. Functional electrical stimulation in rehabilitation engineering: a survey. In i-CREATe '07: Proceedings of the 1st International Convention on Rehabilitation Engineering & Assistive Technology, pages 221–226, New York, NY, USA, 2007. ACM.

[20] R. Zink, S. Steddin, A. Weiss, T. Brandt, and M. Dieterich. Galvanic vestibular stimulation in humans: effects on otolith function in roll. Neuroscience Letters, 232(3):171–174, 1997.


A GMM based 2-stage architecture for multi-subject emotion recognition using physiological responses

Gu Yuan, School of Computer Engineering, Nanyang Technological University, Singapore 639798, [email protected]

Tan Su Lim, School of Computer Engineering, Nanyang Technological University, Singapore 639798, [email protected]

Wong Kai Juan, School of Computer Engineering, Nanyang Technological University, Singapore 639798, [email protected]

Moon-Ho Ringo Ho, School of Humanities and Social Science, Nanyang Technological University, Singapore 639798, [email protected]

Qu Li, School of Humanities and Social Science, Nanyang Technological University, Singapore 639798, [email protected]

ABSTRACT
There is a trend these days to add emotional characteristics as new features into human-computer interaction to equip machines with more intelligence when communicating with humans. Besides traditional audio-visual techniques, physiological signals provide a promising alternative for automatic emotion recognition. Ever since Dr. Picard and colleagues brought forward the initial concept of physiological-signal-based emotion recognition, various studies have been reported following the same system structure. In this paper, we implemented a novel 2-stage architecture for the emotion recognition system in order to improve the performance when dealing with a multi-subject context. This type of system is a more realistic practical implementation. Instead of directly classifying data from all the mixed subjects, one step was added ahead to transform a traditional subject-independent case into several subject-dependent cases by classifying each new incoming sample into an existing subject model using a Gaussian Mixture Model (GMM). For simultaneous classification of four affective states, the correct classification ratio (CCR) shows significant improvement from 80.7% to over 90%, which supports the feasibility of the system.

Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; G.3 [Probability and Statistics]: Multivariate statistics


1. INTRODUCTION
Emotion awareness has become one of the most innovative features in human-computer interaction, aiming to achieve more natural and intelligent communications. Among the various measures for automatic emotion recognition in engineering, numerous efforts have been devoted to the audiovisual channels, such as facial expressions [4, 6] or speech [2, 12, 16]. Recently, physiological signals, as an alternative channel for emotional communication, have gradually earned attention in the field of emotion recognition. Starting from the series of publications authored by Dr. Picard and her colleagues at the Massachusetts Institute of Technology (MIT) Laboratory [17, 18, 19], several interesting findings have been reported indicating that certain affective states can be recognized by means of heart rate (HR), skin conductivity (SC), temperature (Tmp), muscle activity (EMG), and respiration velocity (Rsp). They also elaborated a complete physiological-signal-based emotion recognition procedure which gave great inspiration to the studies that followed [15, 9, 24, 7, 10].

There is one particular issue that first appeared in the description of the affective data collection in [19] and turns out to be a major obstacle to the development of a general methodology for multi-subject emotion recognition using physiological signals. This issue, which we refer to as "individual differences", can be briefly explained as the intricate variety of individual behaviors among subjects. On one hand, the problem concerns the different interpretation of emotions across individuals within the same culture [19]. It may therefore complicate the signal processing and classification procedures when the goal is to examine whether subjects elicit similar physiological patterns for the same emotion. Fortunately, thanks to the vast body of studies in psychology, such as the proposal of six basic emotions by Ekman [4, 5] or the development of the International Affective Picture System (IAPS), this aspect of "individual differences" has been to some extent alleviated by the employment of a scientific categorization of emotions and the usage of standardized emotion elicitation facilities.

On the other hand, the problem, explained as the possibility of significant physiological signal variation across individuals, can be quite difficult to solve. The basic intent of using physiological signals for emotion recognition is to discover the inner trend of signal variation during a human's emotional variation. This relies on the fundamental assumption that the physical properties of all the signals should at least follow the same pattern. In other words, if signals from different subjects differ too much, the exact pattern of signal changes during emotional variation that we are looking for could be buried by the other distinct patterns brought about by the "individual differences". That is why some studies limited the experimental subject to a single person [19, 9, 7], to potentially remove the variability. This compromise of a single-subject approach might be considered in the early stages of affective recognition, since it is valuable for developing subject-dependent methods. However, the specific features and recognition results obtained from one single person may not be the same for other subjects [19]. Hence, the single-subject approach is always argued to have the fatal weakness that any developed recognition methods are not generally applicable. Therefore, some other researchers implemented various ways to realize the subject-independent approach (meaning all signals from different subjects are mixed together) [15, 24, 10, 8]. After a few not-so-successful attempts (a comparison of similar studies is shown in Table 1), it is commonly accepted that the subject-independent approach tends to perform worse than the subject-dependent (single-subject) approach due to the influence of "individual differences" [10]. Hence, Kim and Andre briefly suggested that it may be possible to improve the recognition rate by identifying each individual prior to the recognition phase, and then conducting the emotion classification in a subject-dependent way. However, they did not experimentally elaborate this issue. Besides, they were also concerned that this kind of recognition system may only be feasible for a limited number of subjects, who are supposed to be "known" to the system (corresponding data of each subject are cumulatively collected in a learning phase) [10].

In this paper, we introduce a complete physiological-signal-based 2-stage emotion recognition system using a Gaussian Mixture Model (GMM) for the 1st stage and Sequential Floating Forward Search (SFFS) with a kNN classifier (SFFS-kNN) for the 2nd stage. In the 1st stage, data from each subject are trained into separate GMM models. A new incoming sample is classified to one subject model using the Maximum Posterior Probability (MAP) rule, and then follows a traditional subject-dependent procedure, which is the 2nd stage. Note that a major difference in our understanding is that the 1st stage is not treated as a subject identification process, but rather as a similarity classification based on "known" subject models prepared by the system. Suppose there are c subject models M1 ... Mj ... Mc stored in the system; a new incoming sample xi will be classified to model Mj. This situation should be interpreted as the data xi showing the most similar characteristics to model Mj, although it may not necessarily come from subject j. In other words, the 2-stage system does not necessarily require the test subjects to be "known" to the system. Further elaboration is presented in a later section.

Figure 1: Experimental procedure of the actual test session.

2. DATA COLLECTION
One ProComp Infiniti unit from Thought Technology was employed as the data acquisition system. Two high-speed channels at 2048 Hz were used for the electrocardiogram (ECG) and blood volume pulse (BVP) sensors. Four low-speed channels at 256 Hz were occupied by the skin conductivity (SC) sensor, the respiration (Resp) sensor and two electromyography (EMG) sensors, one for the corrugator (EMGc) and one for the zygomaticus (EMGz). All the data collection was carried out in the same environment using the same set of equipment.

28 pictures from the International Affective Picture System (IAPS) were chosen for emotion elicitation. The IAPS is a standard emotion induction procedure developed by Lang et al. [11], which has been rated by a large number of participants in terms of valence and arousal. The pictures were selected based on the criterion that the distribution of ratings along pleasant/unpleasant (valence) and excitement/calm (arousal) should be relatively balanced. Since it is still unclear whether people from different cultural backgrounds would respond to the same emotional stimuli similarly, instead of following the original IAPS ratings from Lang's experiments, participants were required to rate pleasantness and emotional intensity on a nine-point scale.

Figure 1 shows the detailed process of the experiment. Before and after the 28 trials of the actual emotional induction phase, there were a PANAS questionnaire (to choose the words that best described the present mood state) and a 3-minute recording session of the physiological levels, respectively. Each trial consisted of displaying a fixation point ("+") for 6 secs, a picture (randomly chosen from the 28 pre-selected IAPS pictures) for 6 secs, and a black screen for 6 secs. The participant was asked to rate the viewed picture in terms of its arousal level and valence on a scale from 1-9 and to verbally speak out a single word of emotion that best described their feelings after viewing the picture. Then, the participant chose, from a list of emotional descriptors extracted from Tomkins's concept of eight basic emotions ("anger", "interest", "contempt", "disgust", "distress", "fear", "joy", "shame", "surprise" [22, 23]) plus "nothing", the one that best described their feelings after viewing the picture. Each trial was concluded with solving 5 simple mathematical problems so as to "wash out" the effect of the viewed picture on the subject before the next trial was administered.

3. METHODOLOGY
3.1 Signal Processing and Feature Extraction


Table 1: Comparison with similar studies (Exp: experiment settings, Classi: classifiers, Sel/Red: feature selection/reduction algorithms)

Author | Exp | Classi. | Sel/Red | Results
Picard et al. [19] | single-sub; 8 emotions using guided imagery | DFA and QDF | SFFS and Fisher | all classes: 81.25%
Haag et al. [9] | single-sub; aro/val using IAPS | MLP | none | aro: 96.58%, val: 89.93%
Gu et al. [7] | single-sub; aro/val using IAPS | SVM | none | aro: 85.71%, val: 78.57%
Wagner et al. [24] | single-sub; 6 emotions using music | kNN, LDF and MLP | SFFS, Fisher and ANOVA | no feat. red.: 80%; with feat. red.: 92%
Nasoz et al. [15] | multi-sub; 6 emotions using movie clips | kNN, DFA and MBG | none | kNN: 71.6%, DFA: 74.3%
Gu et al. [8] | multi-sub; aro/val using IAPS | kNN, fkNN, LDF and QDF | GA | no feat. red.: val: 64.2%, aro: 62.8%; with feat. red.: val: 76.1%, aro: 78%
Kim and Andre [10] | multi-sub; 4 EQs on aro/val plane using music | pLDA | SBS | for 4 classes: sub-indep: 70%, sub-dep: 95%

Collected data samples were first segmented into 28 data entries per subject. Each data entry covered 12 seconds of signals, trimmed from the beginning of the display of the picture stimulus and ending after the black screen was shown. Raw ECG signals were preprocessed with a series of high and low pass filters to remove noise and then down-sampled, together with the remaining signals (BVP, SC, EMGz, EMGc and Rsp), by a factor of 8 for further feature evaluation. Instead of directly using the ECG signals, the HR (heart rate) information was deduced from the intervals between successive QRS complexes (the most striking waveform within an ECG signal). In this study, QRS complexes were detected using a derivative-based algorithm and a moving-average filter for smoothing the output [21]. Subsequently, a simple peak-searching method was applied to locate the peak points which indicate the heart beats. The detailed procedure of the QRS detection can be found in our previous report [7].
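As a rough illustration of this step, the following sketch implements a generic derivative-based QRS detector with moving-average smoothing and a simple peak search; it is not the authors' exact implementation (which is detailed in [7] and [21]), and the window length, refractory distance and threshold are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def heart_rate_from_ecg(ecg, fs=256.0):
    """Rough derivative-based QRS detection followed by a peak search.

    ecg: 1-D array of preprocessed (band-pass filtered) ECG samples.
    fs:  sampling frequency in Hz after down-sampling (assumed here).
    Returns instantaneous heart rate (beats per minute) per R-R interval.
    """
    # Emphasise the steep QRS slopes with a first derivative, then square.
    slope = np.gradient(ecg)
    energy = slope ** 2

    # Moving-average filter (~150 ms window, an assumption) smooths the energy.
    win = max(1, int(0.150 * fs))
    smoothed = np.convolve(energy, np.ones(win) / win, mode="same")

    # Peak search: assume QRS peaks are at least ~0.3 s apart and above a
    # simple threshold (a fraction of the maximum of the smoothed energy).
    peaks, _ = find_peaks(smoothed,
                          distance=int(0.3 * fs),
                          height=0.3 * smoothed.max())

    rr_intervals = np.diff(peaks) / fs   # seconds between successive beats
    return 60.0 / rr_intervals           # instantaneous HR in bpm
```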

For a preliminary study on the proposed 2-stage emotion recognition system, we adopted the time-domain statistical feature sets proposed by Picard et al. [19], because these features have appeared in several previous studies and have shown the ability to classify affective states [19, 9, 7, 8, 14]. Six features were extracted from each physiological signal (HR, BVP, SC, EMGz, EMGc and Rsp) using the formulas depicted in Table 2. In all, there were 36 features prepared for each data entry.

3.2 Proposed 2-stage Emotion Recognition System

As discussed above, the basic idea of our recognition system is to transform a general subject-independent case into separate subject-dependent cases by classifying the incoming

Table 2: Formulas for feature extraction, where $\tilde{x}(n) = \frac{x(n) - \mu}{\sigma}$ refers to the normalized signal of x(n) (std: standard deviation, abs: absolute values)

The mean of x(n): $\mu = \frac{1}{N}\sum_{n=1}^{N} x(n)$

The std of x(n): $\sigma = \sqrt{\frac{1}{N-1}\sum_{n=1}^{N} (x(n)-\mu)^2}$

The mean of the abs of the 1st differences of x(n): $\delta = \frac{1}{N-1}\sum_{n=1}^{N-1} |x(n+1) - x(n)|$

The mean of the abs of the 1st differences of $\tilde{x}(n)$: $\tilde{\delta} = \frac{1}{N-1}\sum_{n=1}^{N-1} |\tilde{x}(n+1) - \tilde{x}(n)|$

The mean of the abs of the 2nd differences of x(n): $\gamma = \frac{1}{N-2}\sum_{n=1}^{N-2} |x(n+2) - x(n)|$

The mean of the abs of the 2nd differences of $\tilde{x}(n)$: $\tilde{\gamma} = \frac{1}{N-2}\sum_{n=1}^{N-2} |\tilde{x}(n+2) - \tilde{x}(n)|$
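A minimal sketch of this feature extraction, assuming each segment is available as a NumPy array and that the two pairs of difference features in Table 2 correspond to the raw and normalized signals respectively:

```python
import numpy as np

def picard_features(x):
    """Six time-domain statistics of one physiological signal segment,
    following the formulas in Table 2 (raw and normalized variants)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std(ddof=1)                      # std with the 1/(N-1) factor
    xt = (x - mu) / sigma                      # normalized signal x~(n)

    d1 = np.abs(np.diff(x)).mean()             # mean abs 1st difference
    d1_norm = np.abs(np.diff(xt)).mean()       # same, on the normalized signal
    d2 = np.abs(x[2:] - x[:-2]).mean()         # mean abs 2nd difference
    d2_norm = np.abs(xt[2:] - xt[:-2]).mean()  # same, on the normalized signal

    return np.array([mu, sigma, d1, d1_norm, d2, d2_norm])

# 36-dimensional vector for one data entry: 6 signals x 6 features
# (the dictionary keys below are illustrative names, not the authors' API).
# entry = {"HR": ..., "BVP": ..., "SC": ..., "EMGz": ..., "EMGc": ..., "Rsp": ...}
# features = np.concatenate([picard_features(entry[s])
#                            for s in ("HR", "BVP", "SC", "EMGz", "EMGc", "Rsp")])
```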


Figure 2: Diagram of the 2-stage emotion recognition system.

data into existing subject models prior to the actual emotion recognition procedure. Figure 2 illustrates the overall structure of the system.

3.2.1 The 1st stage
The process starts by learning a GMM probability distribution for each subject using only the 6 features from HR. By definition, a GMM is expressed as follows:

$$ p(x) = \sum_{g=1}^{G} \pi_g\, p_g(x) = \sum_{g=1}^{G} \pi_g\, N(x \mid \mu_g, \Sigma_g), \qquad (1) $$

where $G$ is the number of Gaussian components, $N(x \mid \mu_g, \Sigma_g)$ is a normal distribution with mean $\mu_g$ and covariance matrix $\Sigma_g$, and $\pi_g$ is the weight of component $g$ with the constraint $\sum_g \pi_g = 1$. The parameters of the GMM are estimated using the Expectation-Maximization (EM) algorithm [1], which yields a Maximum Likelihood (ML) estimate. Each iteration of the EM algorithm consists of the E-step (Expectation) and the M-step (Maximization). During the E-step, the missing data are estimated given the observed data and the current estimate of the model parameters. In the M-step, the likelihood function is maximized using the estimated missing data from the E-step in lieu of the actual missing data. Eq. (2) gives the estimation formulas for the model parameters:

$$ \mu'_m = \frac{\sum_{t=1}^{T} p_m(x_t)\, x_t}{\sum_{t=1}^{T} p_m(x_t)}, \qquad \Sigma'_m = \frac{\sum_{t=1}^{T} p_m(x_t)\,(x_t - \mu_m)^T (x_t - \mu_m)}{\sum_{t=1}^{T} p_m(x_t)}, \qquad \omega'_m = \frac{\sum_{t=1}^{T} p_m(x_t)}{\sum_{t=1}^{T} \sum_{g=1}^{G} p_g(x_t)}. \qquad (2) $$

Figure 3: Categories of affective states on the 2D emotion model: EQ1 = val rating > 5 and aro rating > 5, EQ2 = val rating <= 5 and aro rating > 5, EQ3 = val rating <= 5 and aro rating <= 5, and EQ4 = val rating > 5 and aro rating <= 5.

Let $c$ represent the number of subject classes. The posterior probabilities of the input data sample, $P = \{p_j\}_{j=1,2,\dots,c}$, are calculated based on the GMM generated for each corresponding subject. According to the MAP rule, the data sample is assigned to the subject class with the highest posterior probability, $S_j = \arg\max(P)$. From this point onwards, the original subject-independent problem is transformed into a normal within-subject case for classification of the affective states.
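The sketch below illustrates this first stage with scikit-learn's EM-based GaussianMixture; the number of mixture components and the equal-prior (pure likelihood) MAP assignment are our own assumptions, since the paper does not restate them:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_subject_models(hr_features_per_subject, n_components=2):
    """Fit one GMM per subject on its 6-D HR feature vectors.
    hr_features_per_subject: dict mapping subject id -> (n_samples, 6) array.
    n_components is an assumption, not a value from the paper."""
    return {subject: GaussianMixture(n_components=n_components,
                                     covariance_type="full",
                                     random_state=0).fit(X)
            for subject, X in hr_features_per_subject.items()}

def assign_subject(models, x):
    """MAP-style assignment with equal priors: pick the subject model
    with the highest log-likelihood for the new sample x."""
    x = np.atleast_2d(x)
    scores = {s: m.score_samples(x)[0] for s, m in models.items()}
    return max(scores, key=scores.get)
```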

3.2.2 The 2nd stage
This stage follows a general hybrid feature selection and classification method. In this study, Sequential Floating Forward Search (SFFS) and the k-Nearest Neighbor (kNN) rule are employed.

SFFS [20] is one of the most frequently used feature set search methods in the area of physiological-signal-based emotion recognition. It is an improved version of the traditional Sequential Forward Search (SFS) [25], following a "bottom up" procedure but introducing a "floating" characteristic, and it first appeared in affective computing in the work of Picard et al. [19]. Serving as a wrapper-mode feature selection procedure [13], SFFS is commonly applied with a pre-defined learning algorithm (here kNN) and uses its performance as the evaluation criterion.

The kNN rule [3] classifies a data sample by assigning it the label most frequently represented among the k nearest samples. In other words, a decision is made by examining the labels of the k nearest neighbors (by Euclidean distance) and taking a vote. The main reason for choosing kNN as the classification method in this study lies in the fact that kNN is able to achieve simultaneous multi-class classification.
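A compact, simplified sketch of the SFFS-kNN wrapper is given below: greedy forward inclusion with a conditional (floating) exclusion step, scored by cross-validated kNN accuracy. The cross-validation depth, stopping size and k are illustrative choices, not parameters taken from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_score(X, y, features, k=7, cv=5):
    """Wrapper criterion: cross-validated accuracy of a kNN classifier
    restricted to the candidate feature subset (cv=5 assumed here so the
    folds stay valid for small per-subject sample counts)."""
    clf = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(clf, X[:, features], y, cv=cv).mean()

def sffs(X, y, max_features=10, k=7):
    """Simplified sequential floating forward search: greedy inclusion
    followed by conditional exclusion steps."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Forward step: add the feature that improves the criterion most.
        score, f_add = max((knn_score(X, y, selected + [f], k), f)
                           for f in remaining)
        selected.append(f_add)
        remaining.remove(f_add)
        best = max(best, score)
        # Floating (backward) step: drop a feature if that strictly helps.
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for f in list(selected):
                subset = [g for g in selected if g != f]
                s = knn_score(X, y, subset, k)
                if s > best:
                    selected, best, improved = subset, s, True
                    remaining.append(f)
                    break
    return selected, best
```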

4. CLASSIFICATION RESULTS
As a preliminary study on the proposed system, we design and implement two classification tasks using data from 5 subjects. The first task, called the close-set experiment, learns GMM models based on training data coming from all 5 subjects, so the 1st stage of the proposed system actually becomes a subject identification process. The second task, an open-set experiment, trains only 3 GMM subject models but tests using data from all 5 subjects. This experiment is intended to support the idea that the incoming data are not required to come from subjects known to the system.


Table 3: Recognition results of the close-set experiment using the traditional subject-independent procedure and the proposed 2-stage system (k = 7 for kNN)

sub-indep | 0.807
2-stage system | sub A: 0.889, sub B: 1, sub C: 0.9, sub D: 0.891, sub E: 0.975 | average: 0.931

The classification goal is to simultaneously differentiate four types of affective states (EQ1 to EQ4 in Figure 3). All of the classification procedures are conducted under 10-fold cross-validation.

4.1 Task One: Close-set Experiment
Table 3 presents the correct classification ratio (CCR) obtained with the traditional subject-independent procedure (mixing all the data together and applying SFFS-kNN directly) and with the proposed 2-stage system. Since the proposed system turns the subject-independent task into separate subject-dependent cases, the final result of 0.931 is the average of the individual CCRs of subjects A to E. It clearly shows that, by using the 2-stage system, the CCR rises by about 13 percentage points over the result of the traditional procedure, which obtains 80.7% correctness.

Besides the significant improvement in CCR, an interesting phenomenon also appears in Table 4, where the confusion matrix of results from the 1st-stage process is presented. Notice that roughly 70% of the data samples from subjects B and C respectively are correctly classified as "sub B" and "sub C", while the other three subjects, especially subject E, show much higher classification error rates. For example, among all 28 samples from subject E, a major part (75%) is wrongly recognized as "sub C". One explanation for this situation is that the inner properties of the data from subjects C and E are so similar that the "sub C" model alone is representative enough for both subjects C and E.

Hence, the following open-set tasks were designed. The first open-set experiment uses data from subjects A, B and C for training the GMM subject models, and the second uses data from subjects B, C and D. Both experiments are tested using data from all 5 subjects.

4.2 Task Two: Open-set Experiment
Table 5 compares the results from the two experiments conducted for the open-set task. Testing data samples are classified into the three subject models learnt by the system and then fed into the SFFS-kNN procedure. Both CCRs show about a 10-percentage-point improvement over the traditional subject-independent method (80.7%), though they are a bit lower than the 93.1% obtained in the close-set case.

Table 4: Confusion matrix for results of subject recognition (CCR % = 98.21%)

sub | A | B | C | D | E | total | error
A | 13 | 5 | 0 | 9 | 1 | 28 | 0.536
B | 6 | 20 | 0 | 2 | 0 | 28 | 0.286
C | 0 | 0 | 19 | 0 | 9 | 28 | 0.321
D | 13 | 3 | 0 | 12 | 0 | 28 | 0.571
E | 0 | 0 | 20 | 1 | 7 | 28 | 0.75

Table 5: Recognition results of the open-set experiment using the proposed 2-stage system (k = 7 for kNN)

Experiment I: sub A 0.832, sub B 1, sub C 0.855; average 0.896
Experiment II: sub B 0.963, sub C 0.859, sub D 0.886; average 0.903

The performance in the open-set task suggests that the 2-stage system does not need to learn all the subjects that the testing data come from, given that the two open-set experiments achieve results quite comparable with the close-set task. However, the system should at least know certain subject models that are representative enough for the "unknowns". In our case, when switching the subject models from "sub A" to "sub D", the performance rises by about 0.007 in CCR (from 0.896 to 0.903), which indicates that it is rather important to select the "right" subjects for the system to learn.

5. CONCLUSION
This paper introduced a novel 2-stage system for physiological-signal-based emotion recognition. The 1st stage created a GMM model for each "known" subject and classified the incoming data sample into one of the subject models, so that a subject-independent case was transformed into several subject-dependent cases. The 2nd stage then followed a general hybrid feature selection and classification method to simultaneously classify four affective states.

As a preliminary study, we designed both close-set and open-set tasks using data from 5 subjects to investigate the effectiveness of the proposed system. The overall results show significant improvements over the traditional subject-independent procedure (CCR improved from 80.7% to over 90%). Besides, the comparison between the close-set and open-set experiments suggests that the 2-stage system does not need to learn all the subjects beforehand, as long as there are enough representative models known to the system. Hence, it is rather critical to define a criterion that can properly choose the "representative" learning data for the system. Also, since we only used 5 subjects to test the system, it is necessary to expand the data to a larger pool to further enhance the performance of the system. We believe more interesting findings will be discovered by focusing on those issues in the future.

6. REFERENCES
[1] J. A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, International Computer Science Institute, U.C. Berkeley, 1998.

[2] Z. J. Chuang and C. H. Wu. Emotion recognition using acoustic features and textual content. In 2004 IEEE International Conference on Multimedia and Expo, pages 53–56, 2004.

[3] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[4] P. Ekman. Are there basic emotions? Psychological Review, 99(3):550–553, 1992.

[5] P. Ekman. An argument for basic emotions. Cognition and Emotion, 6(3/4):169–200, 1992.

[6] P. Ekman. Emotions revealed: recognizing faces and feelings to improve communication and emotional life. New York: Henry Holt and Company, 2003.

[7] Y. Gu, S. L. Tan, K. J. Wong, M. H. R. Ho, and L. Qu. Emotion-aware technologies for consumer electronics. In IEEE International Symposium on Consumer Electronics, pages 1–4, Portugal, 2008.

[8] Y. Gu, S. L. Tan, K. J. Wong, M. H. R. Ho, and L. Qu. Using GA-based feature selection for emotion recognition from physiological signals. In International Symposium on Intelligent Signal Processing and Communication Systems, pages 1–4, Thailand, 2008.

[9] A. Haag, S. Goronzy, P. Schaich, and J. Williams. Emotion recognition using biosensors: first step towards an automatic system. In Affective Dialogue Systems, Tutorial and Research Workshop, pages 36–48, Kloster Irsee, Germany, June 2004.

[10] J. Kim and E. Andre. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(12):2067–2083, 2008.

[11] P. J. Lang, M. M. Bradley, and B. N. Cuthbert. International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-6, University of Florida, Gainesville, FL, 2005.

[12] Y. L. Lin and G. Wei. Speech emotion recognition based on HMM and SVM. In Proceedings of the 4th International Conference on Machine Learning and Cybernetics, pages 4898–4901, 2005.

[13] H. Liu and L. Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. on Knowledge and Data Engineering, 17(4), 2005.

[14] K. Mera and T. Ichimura. Emotion analyzing method using physiological state. In Knowledge-Based Intelligent Information and Engineering Systems, pages 195–201. Springer Berlin / Heidelberg, 2004.

[15] F. Nasoz, K. Alvarez, C. L. Lisetti, and N. Finkelstein. Emotion recognition from physiological signals for presence technologies. International Journal of Cognition, Technology and Work, Special Issue on Presence, 6(1), 2003.

[16] J. Nicholson, K. Takahashi, and R. Nakatsu. Emotion recognition in speech using neural networks. In Proceedings of the 6th International Conference on Neural Information Processing, pages 495–501, 1999.

[17] R. W. Picard. Affective computing. Technical Report No. 321, MIT Media Laboratory Perceptual Computing Section, 1995.

[18] R. W. Picard. Affective computing. Cambridge, Mass.: The MIT Press, 1997.

[19] R. W. Picard, E. Vyzas, and J. Healey. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1175–1191, 2001.

[20] P. Pudil, J. Novovicova, and J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15:1119–1125, 1994.

[21] R. M. Rangayyan. Biomedical signal analysis: A case-study approach (IEEE Press Series on Biomedical Engineering). Wiley-IEEE Press, 2001.

[22] S. S. Tomkins. Affect, imagery, consciousness, volume I, The positive affects. New York: Springer Publishing Company, Inc., 1962.

[23] S. S. Tomkins. Affect, imagery, consciousness, volume II, The negative affects. New York: Springer Publishing Company, Inc., 1963.

[24] J. Wagner, J. Kim, and E. Andre. From physiological signals to emotions: implementing and comparing selected methods for feature extraction and classification. In Proceedings of IEEE ICME International Conference on Multimedia and Expo, pages 940–943, 2005.

[25] A. W. Whitney. A direct method of nonparametric measurement selection. IEEE Transactions on Computers, 20:1100–1103, 1971.


Gaze-Directed Ubiquitous Interaction Using a Brain-Computer Interface

Dieter Schmalstieg, Graz University of Technology, Inffeldgasse 16, A-8010 Graz, Austria, [email protected]

Alexander Bornik, Ludwig Boltzmann Institute for Clinical-Forensic Imaging, Universitätsplatz 4, 2. Stock, A-8010 Graz, Austria, [email protected]

Gernot Müller-Putz, Graz University of Technology, Krenngasse 37/IV, A-8010 Graz, Austria, [email protected]

Gert Pfurtscheller, Graz University of Technology, Krenngasse 37/IV, A-8010 Graz, Austria, [email protected]

ABSTRACT
In this paper, we present a first proof-of-concept for using a mobile Brain-Computer Interface (BCI) coupled to a wearable computer as an ambient input device for a ubiquitous computing service. BCI devices, such as electroencephalogram (EEG) based BCI, can be used as a novel form of human-computer interaction device. A user can log into a nearby computer terminal by looking at its screen. This feature is enabled by detecting a user's gaze through the analysis of the brain's response to visually evoked patterns. We present the experimental setup and discuss opportunities and limitations of the technique.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces; K.6.5 [Management of Computing and Information Systems]: Security and Protection—Authentication

Keywords
Brain computer interface, gaze tracking, electroencephalogram, biometrics, object selection, authentication.

1. INTRODUCTION
As suggested in [4], brain-computer interface (BCI) technology can become a useful device for human-computer interaction. BCI is able to capture ambient properties of human activity rather than requiring active operation. This is not only useful for assistive technology, but also allows input for a computer system to be gathered without inducing cognitive load on the user. It is therefore suitable for contextual computing, such as activity recognition. We suggest the integration of BCI into the toolset of user interface designers, assuming that BCI will soon become sufficiently accessible and inexpensive.

In this paper, an electroencephalogram (EEG) based BCI is used to capture brain activity with a wearable computer. Unlike typical laboratory experiments, this wearable hardware setup allows a user's brain activities to be monitored whilst freely roaming an environment. The wearable device therefore enables the prototyping of ubiquitous computing services based on BCI.

To illustrate the potential of mobile BCI as an input device, we have created a system for secure login to a computer terminal within visual proximity of the user, by detection of the characteristic brain patterns evoked when looking at the blinking screen of the computer terminal. A proof-of-concept implementation of this type of interaction using real-world, secure remote desktop software was implemented. We present first experiences of how to build and operate a ubiquitous computing system involving BCI as a personal input modality. While our approach does not yet fully qualify as biometric identification in the sense that a user's identity is uniquely verified through physical means, it does provide verification of the presence of a digitally authorized user at a particular task location, and it has the potential to be upgraded to a full biometric identification system with enhanced BCI technology.

2. RELATED WORK
There are a large number of localization and object identification systems, using GPS for outdoor applications, or indoor beacon systems such as RFID, Bluetooth, infrared or ultrawideband radio (e.g., http://www.ubisense.net). Most of these wide-area systems can only determine position, but not viewing direction. In contrast, the ID CAM by Matsushita et al. [7] determines frequency-encoded patterns from blinking beacons in the environment observed by a camera. In this work, we present a related approach using the human visual system. Unlike location systems, the origins of BCI are not in human-computer interaction, but in assistive technology. However, there has been a lot of work in using BCI as assistive technology.

Recently there has been some interest in using variants of BCI technology for non-handicapped people, in order to control aspects of a user interface with little or no attention required from the user. For example, Mann [5] uses biosignal feedback processing for controlling the brightness of a head-mounted display, while Lee and Tan [4] use EEG for task classification. Chen and Vertegaal [2] use EEG for determining mental load with the aim of managing interruptions.

A similar goal is pursued by Vertegaal et al. [17]. They use a different sensor type, namely eye contact sensors, to control interruptions from cell phones. This work exploits gaze direction to derive information, an aspect that is shared with the work presented in this paper. Velichkovsky and Hansen [16] suggest a combination of eye sensing and BCI to control electronic devices. They state their paradigm as "Point with your eye and click with your mind". This suggestion is actually surprisingly close to our intention, and we believe that in this paper we present one of the first practical implementations of such control.

Using the EEG as a biometric is relatively new compared to other methods. Various types of signals can be measured from the EEG, and consequently several aspects have been investigated in terms of user recognition or authentication. Poulos et al. [15] used autoregressive parameters estimated from EEG signals containing only the alpha rhythm (eyes closed). Learning Vector Quantization neural networks were used for classification, with a 72-80% success rate. A similar approach was taken by Paranjape et al. [14], who also used autoregressive modeling. They applied discriminant analysis with a classification accuracy of 49% to 85%. Here, subjects were tested with eyes both open and closed. Visual evoked potentials (VEP) were used for biometrics by Palaniappan et al. [13] [11] [12]. In these studies, the authors investigated the gamma band range of 30-50 Hz from VEPs after visual stimuli. In the work by Marcel and Millan [6], the power spectral density (PSD) from 8-30 Hz was used for analyzing the repetitive imagination of either left hand movement, right hand movement or the generation of words beginning with the same random letter. A statistical framework based on Gaussian mixture models and maximum a posteriori model adaptation was used in these experiments. The authors conclude that some mental tasks are more appropriate than others, that performance degrades over days, and that using training data from two days increases performance.

3. BCI FOR GAZE-DIRECTED OBJECT SELECTION

3.1 Background
Biosignals, such as EEG, can be used to detect human gaze for object selection in the physical environment. This approach is similar to RFID tags or the ID CAM, but the mobile scanner is replaced by human perception. The approach is therefore essentially a form of gaze tracking. Gaze tracking is normally accomplished by observing a user's pupils with a computer-vision system. In our case, a user's gaze is tracked by detecting the activation patterns triggered in the brain when gazing at a specific object. Compared to beacon-based location determination, such as RFID, gaze-based selection has a wider range of operation and allows close objects to be distinguished based on their bearing.

One mental strategy for operating an EEG-based BCI is motor imagery; another is to focus gaze and/or visual attention on a flickering light source. In the latter case, either a late cognitive component with a latency of 300 ms (P300) after a rare or significant visual stimulus has to be detected, or the amplitude of the steady-state visual evoked potential (SSVEP) has to be measured. The SSVEP is a natural response of the brain evoked by flashing visual stimuli at specific frequencies between 6-30 Hz. SSVEP signals are enhanced when the user focuses selective attention (gaze) on a specific flashing light source [10].

While a P300-based BCI needs complex pattern recognition algorithms to check the absence or presence of the P300 component, an SSVEP-based BCI is simpler and can use a linear threshold algorithm to detect an amplitude increase in the SSVEP signal. A further advantage of the SSVEP-based BCI is its ease of use and the relatively short training time.

Today, SSVEP-based BCI is used to control a robotic hand [9], secondary cockpit functions [8], the display of geographic maps, or communication (spelling) systems [3]. The highest information transfer rate reported is between 60-70 bits/minute.

3.2 Gaze Tracking Procedure

Figure 1: Wearable BCI setup consisting of an EEG helmet and a mobile EEG amplifier, both connected to a UMPC. In the experiment, test screens show a blinking window as a screensaver and the desktop of the UMPC after a successful BCI-triggered login.

In our setup, a mobile user is equipped with a wearable computer (Sony Vaio UX280p) and a portable EEG amplifier (g.tec mobilab, http://www.gtec.at), as shown in Figure 1. Wearable computer and EEG amplifier communicate via a Bluetooth personal area network. The user wears a cap fitted with electrodes.

A characteristic blinking frequency of an observed object can be determined with the EEG. This allows multiple frequencies to be distinguished within a few seconds. By setting up physical objects to emit blinking patterns, for example using LEDs or computer screens in the environment, it is possible to identify these objects.

The most obvious way is to directly encode the id of the perceived object using any combination of frequency multiplexing and time multiplexing. For example, an IPv4 address has 32 bits, which once transmitted and decoded could be used to access a web service. However, using current BCI technology, the achievable bit rate is very low, requiring the user to wait too long for the data to be transmitted.

Therefore, the observed characteristic frequency is used as an index into a central directory service accessed wirelessly from the wearable computer. The directory server returns the actual object id or network id (IP address in the case of a computer terminal).

To increase the number of addressable objects, the search space is organized hierarchically using a second, complementary sensor system besides EEG. A sensor system (in our case Ubisense) provides coarse wide-area location. The location system is used to limit the search space to one room, and the BCI gaze detection selects one computer terminal within this room.
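The two-level lookup can be pictured with the hypothetical in-memory directory below; the room ids, blinking frequencies and terminal ids are invented for illustration and merely stand in for the actual directory service and Ubisense room resolution described above.

```python
# Hypothetical stand-in for the central directory service:
# (room id from the wide-area location system, registered blinking frequency)
# maps to a terminal / network id.
DIRECTORY = {
    ("room_42", 6.0): "terminal-A (10.0.0.17)",
    ("room_42", 9.0): "terminal-B (10.0.0.23)",
    ("room_43", 6.0): "terminal-C (10.0.0.31)",
}

def resolve_terminal(room_id, detected_freq, tolerance=0.5):
    """Return the terminal registered for this room whose blinking frequency
    is closest to the frequency detected in the EEG (None if nothing matches)."""
    candidates = [(abs(freq - detected_freq), terminal)
                  for (room, freq), terminal in DIRECTORY.items()
                  if room == room_id and abs(freq - detected_freq) <= tolerance]
    return min(candidates)[1] if candidates else None

# Example: the location system reports room_42, the SSVEP detector ~6.1 Hz.
# resolve_terminal("room_42", 6.1)  ->  "terminal-A (10.0.0.17)"
```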

4. REMOTE, SECURE DESKTOP ACCESS
In [1], a location system based on ultrasonic sensors worn by the users of a large office environment is used to implement various ubiquitous computing services based on the observation of user location.

For example, every computer terminal can be remotely accessed using the Virtual Network Computing (VNC) service. Likewise, incoming calls can be automatically routed by the telecom system to the office phone nearest to a roaming user.

In our setup, the wearable computer can be used to directly connect to a local computer terminal to use the input and output peripherals, while the displayed applications actually run on the wearable computer. In a conference or seminar room, the wearable computer could also be connected to a video projector to give a presentation. The overall procedure is shown on the right in Figure 2. For secure remote desktop access, the user is interested in determining and verifying the identity of the selected object at a particular location, to establish a secure communication channel.

The secure channel is based on CSpace (http://www.cspace.in), an open source secure communication framework. It uses public key cryptography (PKC) to allow distributed applications to communicate securely without burdening application developers with the details of establishing secure connections. CSpace registers a unique id, public key and current IP address in a global directory.

An application uses a local CSpace proxy object to obtain the user's public key and IP address from the global directory. Then a secure connection tunnel is established to the destination, which looks to the client application like an ordinary TCP connection.

Figure 2: Workflow for establishing a secure VNC connection to a computer terminal after the terminal has been identified using BCI: (1) localization service determines position, (2) user observes characteristic screen blinking, (3) the code is detected from the EEG signal, (4) code and position are transmitted to the translation server, (5) translation server returns the terminal id, (6) terminal id is sent to the CSpace directory, (7) CSpace directory returns the public key of the terminal, (8) secure VNC session is established with the terminal using the public key.

In our implementation, a user can connect the VNC session originating at the wearable computer to the computer terminal selected by gazing. The current position is determined from a Ubisense indoor location system which covers a large portion of our office space, and the screens of the computer terminals have been set up to run a screen saver emitting characteristic blink patterns that are picked up through the EEG. The combined position/frequency code is transmitted to a global translation service, which translates the code to a CSpace id. This step is necessary because CSpace ids are globally unique and cannot be chosen arbitrarily.

Since the CSpace directory itself is based on an existing peer-to-peer infrastructure (Kademlia), it cannot be extended with the map service directly. Therefore, the translation service was implemented as a new service using the CSpace communication infrastructure. Read access works as described above, while write access for updating the position or frequency of a particular computer terminal is secured using the private key associated with the terminal's CSpace id. A client tool for the translation service can be used to connect securely to the map service for updating the map, and it also launches the appropriate blinking widget used to trigger the EEG.

5. EXPERIMENTS
SSVEP was first tested using a setup consisting of two similar screens placed in front of the subjects, each presenting an individual stimulation pattern (flickering). Screen 1 repetitively showed code 1: pause (6 s) – f1 (4 s) – pause (1 s) – f2 (4 s), whereas screen 2 presented code 2: pause (6 s) – f2 (4 s) – pause (1 s) – f1 (4 s).

EEG was recorded bipolarly from one occipital position (O1 or O2, subject-specific) and digitized with a sampling frequency of 256 Hz. A lock-in amplifier system (LAS) was used to extract the SSVEP amplitudes of 2 specific frequencies (f1 = 6.25 Hz and f2 = 8.0 Hz) and their harmonics (up to 3). A simple one-versus-rest classifier was used to distinguish between those frequencies [7]. A correct login was performed when the pattern of the detected frequencies represented either code 1 or 2 (C1 or C2).
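For illustration only, the sketch below approximates this analysis with a Welch PSD estimate instead of a lock-in amplifier: it sums the power around each stimulation frequency and its harmonics and makes a one-versus-rest decision. The segment length, bandwidth and decision margin are assumptions, not the parameters used in the experiment.

```python
import numpy as np
from scipy.signal import welch

def ssvep_scores(eeg, fs=256.0, freqs=(6.25, 8.0), n_harmonics=3, bw=0.3):
    """Sum the PSD in a narrow band around each stimulation frequency and
    its harmonics; a stand-in for the lock-in amplifier analysis."""
    f, pxx = welch(eeg, fs=fs, nperseg=int(4 * fs))
    scores = {}
    for f0 in freqs:
        s = 0.0
        for h in range(1, n_harmonics + 1):
            band = (f >= h * f0 - bw) & (f <= h * f0 + bw)
            s += pxx[band].sum()
        scores[f0] = s
    return scores

def detect_frequency(eeg, fs=256.0, freqs=(6.25, 8.0), margin=2.0):
    """One-versus-rest decision: report a frequency only if its band power
    dominates the other candidates by the chosen margin, else report None."""
    scores = ssvep_scores(eeg, fs, freqs)
    best = max(scores, key=scores.get)
    rest = [v for k, v in scores.items() if k != best]
    return best if scores[best] > margin * max(rest) else None
```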

Four different runs (lasting a maximum of 5 min; a pause was defined as 30 s) were performed to validate the functionality of the login classification:

• run1: pause-C1-C2-pause-C2-C1-pause-C2-pause-C1

• run2: pause-C2-C1-pause-C1-C2-pause-C1-pause-C2

• run3: lasted 2min, reading newspaper, no login

• run4: pause-C1-C2-C2-C1-pause-C2-C1-C1-C2-pause

The results of the SSVEP-based login are shown in Table 1: TP (true positives, correctly logged in; max. 20 TPs in the whole experiment), FN (false negatives, incorrect logins), FP (false positives, logged in although no login was required).

A more practical experiment was carried out using the mobile setup from Figure 1. In this setup the UMPC was running a modified version of CSpace to connect to three different PCs. The screens of the test PCs showed a blinking pattern transmitting a 2-bit code registered with a translation service as shown in Figure 2, while the UMPC was running the signal processing routines for frequencies of 6 and 9 Hz. Whenever a valid machine code was detected by the BCI software, a remote login to the corresponding test PC, found using the translation service, was initiated.

The setup was tested with 2 participants. Participants had to perform the following sequence of remote login tasks: pause (no login, 30 s) – login P0 – login P2 – login P1 – pause (30 s) – login P2 – login P0 – login P1 – pause (30 s). Table 2 shows the results. Both users could successfully complete the tasks. However, we noticed errors. Login times ranged from 15 s up to three minutes, with a median of 27.5 s and a standard deviation of 51.2 s.

6. CONCLUSIONS
We have shown that it is possible in principle to use BCI for biometric communication useful in deploying ubiquitous computing services. However, significant improvements are required to make such services practical in terms of robustness and information transfer rate, which are currently very low. Higher rates can be achieved by more efficient BCI (such as the laser-based BCI currently being developed), by reducing the intervals of blinking stimuli/pauses, and by exploiting phase as well as amplitude information and more than two blinking frequencies. Even more exciting is the possibility of using subject-specific frequencies and sample positions, which may yield better efficiency but also allow the creation of unique signatures per human which cannot easily be forged and allow two-way authorization between human and environment.

7. ACKNOWLEDGMENTS
This work was sponsored by the European Union contract FP6-2004-IST-4-27731 (PRESENCCIA) and the Austrian Science Fund FWF contract Y193. Special thanks to g.tec for the equipment loan and assistance.

8. REFERENCES
[1] M. Addlesee, R. Curwen, S. Hodges, J. Newman, P. Steggles, A. Ward, and A. Hopper. Implementing a sentient computing system. Computer, 34(8):50–56, 2001.

[2] D. Chen and R. Vertegaal. Using mental load for managing interruptions in physiologically attentive user interfaces. In CHI '04: CHI '04 extended abstracts on Human factors in computing systems, pages 1513–1516, New York, NY, USA, 2004. ACM.

[3] X. Gao, D. Xu, M. Cheng, and S. Gao. A BCI-based environmental controller for the motion-disabled. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 11(2):137–140, June 2003.

[4] J. C. Lee and D. S. Tan. Using a low-cost electroencephalograph for task classification in HCI research. In UIST '06: Proceedings of the 19th annual ACM symposium on User interface software and technology, pages 81–90, New York, NY, USA, 2006. ACM.

[5] S. Mann, D. Chen, and S. Sadeghi. Hi-cam: Intelligent biofeedback processing. Wearable Computers, IEEE International Symposium, 0:178, 2001.

[6] S. Marcel and J. d. R. Millan. Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation. IEEE Trans. Pattern Anal. Mach. Intell., 29(4):743–752, 2007.


Table 1: Results of SSVEP-based login.

Subject | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9
TP | 20 | 9 | 20 | 16 | 19 | 17 | 14 | 20 | 12
FN | 6 | 11 | 1 | 11 | 11 | 12 | 9 | 1 | 9
FP | 9 | 5 | 0 | 2 | 6 | 5 | 7 | 0 | 3

Table 2: True positives (TP), false negatives (FN) and false positives (FP) measured for subjects T1 and T2.

Subject | TP (max. 6) | FN | FP | Total time [m:s]
T1 | 6 | 9 | 1 | 7:46
T2 | 6 | 3 | 1 | 5:25

[7] N. Matsushita, D. Hihara, T. Ushiro, S. Yoshimura, J. Rekimoto, and Y. Yamamoto. ID CAM: A smart camera for scene capturing and ID recognition. In ISMAR '03: Proceedings of the 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality, page 227, Washington, DC, USA, 2003. IEEE Computer Society.

[8] M. Middendorf, G. McMillan, C. G., and J. K. Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Trans. Rehabil. Eng., 8:211–214, 2000.

[9] G. Mueller-Putz and G. Pfurtscheller. Control of an electrical prosthesis with an SSVEP-based BCI. IEEE Transactions on Biomedical Engineering, 55:361–364, 2008.

[10] G. Mueller-Putz, R. Scherer, C. Brauneis, and G. Pfurtscheller. Steady-state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components. Journal of Neural Engineering, 2:123–130, 2005.

[11] R. Palaniappan. Utilizing gamma band to improve mental task based brain-computer interface design. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 14(3):299–303, Sept. 2006.

[12] R. Palaniappan and D. Mandic. Biometrics from brain electrical activity: A machine learning approach. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(4):738–742, April 2007.

[13] R. Palaniappan, R. Paramesran, S. Nishida, and N. Saiwaki. A new brain-computer interface design using fuzzy ARTMAP. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 10(3):140–148, Sept. 2002.

[14] R. Paranjape, J. Mahovsky, L. Benedicenti, and Z. Koles. The electroencephalogram as a biometric. In Electrical and Computer Engineering, 2001. Canadian Conference on, volume 2, pages 1363–1366, 2001.

[15] M. Poulos, M. Rangoussi, V. Chrissikopoulos, and A. Evangelou. Parametric person identification from the EEG using computational geometry. In Electronics, Circuits and Systems, 1999. Proceedings of ICECS '99. The 6th IEEE International Conference on, volume 2, pages 1005–1008, Sep 1999.

[16] B. M. Velichkovsky and J. P. Hansen. New technological windows into mind: there is more in eyes and brains for human-computer interaction. In CHI '96: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 496–503, New York, NY, USA, 1996. ACM.

[17] R. Vertegaal, C. Dickie, C. Sohn, and M. Flickner. Designing attentive cell phone using wearable eyecontact sensors. In CHI '02: CHI '02 extended abstracts on Human factors in computing systems, pages 646–647, New York, NY, USA, 2002. ACM.


Relevance of EEG Input Signals in the Augmented Human Reader

Inês Oliveira, CICANT, University Lusófona, Campo Grande 376, 1749-024 Lisbon, Portugal, [email protected]

Ovidiu Grigore, Nuno Guimarães, Luís Duarte, LASIGE/FCUL, University of Lisbon, Campo Grande, 1749-016 Lisbon, Portugal, ogrigore | [email protected]

ABSTRACT
This paper studies the discrimination of electroencephalographic (EEG) signals based on their capacity to identify silent attentive visual reading activities versus non-reading states. The use of physiological signals is growing in the design of interactive systems due to their relevance in improving the coupling between user states and application behavior. Reading is pervasive in visual user interfaces. In previous work, we integrated EEG signals in prototypical applications designed to analyze reading tasks. This work searches for the signals that are most relevant for reading detection procedures. More specifically, this study determines which features, input signals, and frequency bands are more significant for the discrimination between reading and non-reading classes. This optimization is critical for an efficient and real-time implementation of EEG processing software components, a basic requirement for future applications.

We use probabilistic similarity metrics, independent of the classification algorithm. All analyses are performed after determining the power spectral density of the delta, theta, alpha, beta and gamma rhythms. The results about the relevance of the input signals are validated against functional neurosciences knowledge.

The experiments have been performed in a conventional HCI lab, with non-clinical EEG equipment and setup. This is an explicit and voluntary condition. We anticipate that future mobile and wireless EEG capture devices will allow this work to be generalized to common applications.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces – user centered design, evaluation, interaction styles.

General Terms
Design, Experimentation, Human Factors, Measurement

Keywords
Reading Detection, HCI, EEG Processing and Classification, Similarity Metrics, Feature Relevance Measurement.

1. INTRODUCTION
The understanding and use of human physical and physiological states in computational systems increases the coupling between the user and the application behavior. The integration of physiological signals in applications is relevant in the design of universally-accessible interactive systems and will become more relevant as new computing paradigms such as ubiquitous computing [7] and ambient intelligence [1],[14] develop. The use of neurophysiological signals, and in particular electroencephalograms (EEG), has been widely reported in the context of an important example of coupled interaction systems: BCIs [4],[5],[16]. These interfaces explore the information at its source, the brain. EEG signals are frequently chosen because of their fine temporal resolution and non-invasiveness [9], and also due to the relatively low cost of their capture device settings.

Visual user interfaces often require reading skills. The users' reading flow is highly influenced by their concentration and attention while interacting with applications. The application's visual characteristics and the users' cognitive state can decrease readability and degrade the interaction. Augmented reading applications should adapt to the user's reading flow through the detection of reading and non-reading states. Reading flow analysis also improves the understanding of the users' cognitive state while interacting with applications and improves the current empirical style of usability testing [9]. In previous work, we integrated EEG signals in two prototypical applications, designed to analyze and assist reading tasks. These applications are briefly described further down in this paper.

This paper focuses on the discrimination of EEG signals based on their relevance with respect to the identification of silent attentive reading versus non-reading tasks, therefore finding the importance of each EEG signal for the reading detection procedure. The ultimate goal of this study is to allow a robust selection and weighting of input signals, which we deem critical for a feasible, efficient, and real-time implementation of EEG processing software components, our augmentation approach. The EEG processing literature generally refers to feature vectors of some size. We have dealt with data dimensionality reduction in the processing pipeline by using Principal Component Analysis [9]. PCA, however, considers neither the spatial distribution of the input signals nor functional neurosciences knowledge. Neurosciences map cognitive processes into skull areas. Quantifying the importance of each input signal in relation to reading detection will help verify which electrodes and frequency bands are more involved in the reading cognitive process, and builds on the functional neurosciences knowledge.



The analysis of EEG signal relevance is performed after determining the power spectral density (PSD) of the delta, theta, alpha, beta, and gamma rhythms (the known EEG frequency bands) in each of the captured EEG streams. We then apply probabilistic similarity measures [10], which are independent of the classification algorithm, to each of these streams to detect the main differences, and to discriminate between visual reading and non-reading activities. All results obtained about the importance of the input signals are provided and crossed against functional neurosciences knowledge.

Our experiments were performed in a conventional HCI lab, with non-clinical EEG capture equipment. This is not a limitation to overcome but rather a feature and an a priori requirement of our design. Even if the results can be further validated in clinical settings (in vitro), our goal is to address real-life situations (in vivo), which have harsher stability, noise and artifact conditions. We predict that future mobile and wireless EEG capture devices will allow the generalization and extension of this work to common tools and applications. The broader goal of this work is to design and develop usable and robust software components for integration in interactive systems that reach higher adaptation levels through this augmentation approach.

2. EXPERIMENTAL SETTINGS
EEG signals were captured using MindSet-1000, a simple digital system for EEG mapping with 16 channels, connected to a PC using a SCSI interface. These channels are connected through pure tin electrodes (sensors) to a cap made of elastic fabric, produced by Electro-Cap International.

Figure 1. MindSet-1000 and Electro-Cap Intl Cap.

Figure 2 shows the electrode mapping used in our study. The EEG signals are amplified differentially relative to the ear electrodes and are sampled at a frequency of 256 Hz. All requirements indicated by the suppliers and technicians were fulfilled [9]. These included "grounding" the subjects and keeping the impedance of each electrode below 6000 Ω, through the thorough application of conductive gel.

Figure 2. Mapping of used EEG electrodes (Int 10-20 method).

The first 5000 ms and the last 3000 ms of each trial are discarded to avoid possible artifacts caused by the start and end of the recording process. To assure the reliability of the capture procedure, the experiment was also tested using a professional medical capture device, in use in a hospital, whose setup was entirely prepared and tuned by expert technicians [9]. The results obtained with both capture devices were validated by an EEG specialist and a consistent set of sample results was produced.
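At the 256 Hz sampling rate this trimming amounts to dropping the first 1280 and the last 768 samples of every channel; a trivial sketch:

```python
FS = 256                      # sampling frequency in Hz
HEAD_MS, TAIL_MS = 5000, 3000

def trim_trial(samples):
    """Drop the first 5000 ms and the last 3000 ms of a recorded trial
    (one channel as a sample sequence) to avoid start/end artifacts."""
    head = (HEAD_MS * FS) // 1000   # 1280 samples
    tail = (TAIL_MS * FS) // 1000   # 768 samples
    return samples[head:-tail]
```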

2.1 Read and Not Read Experience
The capture experiments, the object of the relevance analysis described in this paper, were based on the presentation of alternating blank and text screens containing about 40 lines of daily news text. The duration of these screens differed according to the ability to keep subjects concentrated on the task [9]. Text screens were presented for longer periods (30 s) than blank screens (20 s). These two types of periods were interlaced: one reading text sample, followed by 2 watch-only blank screens, and again back to reading. All these periods were captured separately, allowing a small resting period, during which the signal was not recorded. Each capture trial included approximately 120 s of both sample classes. All data was recorded, without any previous or special training, from a right-handed female subject in her mid-thirties, without known vision disabilities (see the discussion on this choice in the final section).

2.2 Assisted Reading Prototypes
In the context of these experiments, we designed simple prototype tools. ReadingTester tests in real time "reading event scripts", sequences of events with a certain duration that are generated by the application. The subject is exposed to these events, and simultaneously the EEG is captured and analyzed. A detection performance report is built when the detection process stops.

Figure 3. Assisted Reading Prototypes.

ReadingScroller aims at controlling text scrolling through EEG signals: while the user is reading, the scrolling should occur; if the user stops reading, the scrolling should also stop. This is a trivial (from the functionality's viewpoint) Brain Computer Interface exploiting the reading detection capability, but with non-trivial design challenges not reported here.


3. RELEVANCE ANALYSIS
Relevance analysis is performed after determining the PSD of the delta (δ), theta (θ), alpha (α), beta1 (β1), and gamma (γ) rhythms in each of the 16 electrodes' input streams. This results in 16×5 PSD feature streams, each with reading and non-reading samples. We then determined probabilistic dissimilarity measures separately for each of these streams, in order to quantify the dissimilarity between the two sample classes. The most relevant streams are those revealing the largest significant differences between the reading and non-reading classes.
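A minimal sketch of this band-power computation using a Welch PSD estimate is shown below; the exact band limits and estimator used in the study are not restated here, so the values are conventional assumptions.

```python
import numpy as np
from scipy.signal import welch

# Conventional EEG band limits in Hz (assumed, not quoted from the paper).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta1": (13, 20), "gamma": (30, 45)}

def band_psd_features(eeg_channels, fs=256.0):
    """Return an (n_channels x n_bands) matrix of band power estimates,
    i.e. the 16 x 5 PSD feature streams for one sample window."""
    feats = np.zeros((len(eeg_channels), len(BANDS)))
    for i, ch in enumerate(eeg_channels):
        f, pxx = welch(ch, fs=fs, nperseg=int(2 * fs))
        for j, (lo, hi) in enumerate(BANDS.values()):
            feats[i, j] = pxx[(f >= lo) & (f < hi)].sum()
    return feats
```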

3.1 Probabilistic Dissimilarity Measures
Relative similarity is the relationship between two entities that share common characteristics to different degrees [10]: the larger it is, the greater the resemblance between the compared objects. Relative dissimilarity, on the other hand, focuses on the differences: the smaller it is, the greater the resemblance between the compared objects [10]. In our work, we compare the dissimilarity between the reading and non-reading sample sets. Both sets were approximated by Normal probability functions, since their samples result from discrete observations belonging to a large vector space. Table 1 summarizes the probabilistic dissimilarity measures that were tested [2]; µi and Σi are respectively the mean vector and covariance matrix of the Normal distribution Ni that approximates the sample set of class i, and DM is the squared Mahalanobis distance between their means. In all the presented formulas we assume Σ1 ≠ Σ2. For the sake of reproducibility, the remainder of this section briefly describes each of these measures.

3.1.1 Kullback-Leibler (KL) Divergence Based Measures
The Kullback-Leibler divergence is an asymmetric measure, also known as relative entropy or information gain. It quantifies, in bits, how close a distribution F1 is to a (model) distribution F2 [12] or, more precisely, the loss of information incurred by using F1 instead of F2 [8]. By definition, this measure between probability distributions p1(x) and p2(x) is given by [10],[15],[8]:

dKL(p1, p2) = ∫ p1(x) log( p1(x) / p2(x) ) dx

For two Normal distributions N1(x) and N2(x) it takes the closed form shown in Table 1. The KL divergence cannot be considered a metric because it is asymmetric, that is, dKL(p1, p2) ≠ dKL(p2, p1) [12],[8]. There are, however, measures such as the J-Coefficient and the Information Radius, which are symmetric versions of the KL divergence. The J-Coefficient (JC) [2] is calculated by applying the KL formula symmetrically:

JC(p1, p2) = dKL(p1, p2) + dKL(p2, p1)

The Information Radius (IR), also known as the Jensen-Shannon divergence, is a smoothed symmetric version: it is the average of the KL distances to the average distribution m = (p1 + p2)/2 [15]:

IR(p1, p2) = ½ dKL(p1, m) + ½ dKL(p2, m)
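
A small Python sketch of the KL-based measures follows, assuming each class sample set is approximated by a multivariate Normal as described above. The closed form used is the standard Gaussian KL divergence (in nats) and may differ in presentation from the expression in Table 1; the sample data are placeholders.

import numpy as np

def gaussian_kl(mu1, S1, mu2, S2):
    """KL(N1 || N2) for multivariate Normals (standard closed form, in nats)."""
    d = len(mu1)
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def j_coefficient(mu1, S1, mu2, S2):
    """Symmetrised KL divergence: J = KL(N1||N2) + KL(N2||N1)."""
    return gaussian_kl(mu1, S1, mu2, S2) + gaussian_kl(mu2, S2, mu1, S1)

# Fit one Normal per class to the samples of a single feature stream
# (placeholder data standing in for the reading / non-reading samples).
reading = np.random.randn(200, 2) + np.array([1.0, 0.5])
not_reading = np.random.randn(200, 2)
mu_r, S_r = reading.mean(axis=0), np.cov(reading, rowvar=False)
mu_n, S_n = not_reading.mean(axis=0), np.cov(not_reading, rowvar=False)
print(j_coefficient(mu_r, S_r, mu_n, S_n))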

[Table 1 gives, for two Normal distributions N1 and N2, the expressions of the Kullback-Leibler divergence, the J-Coefficient, the Information Radius, the χ² divergence, the Hellinger coefficient, the Chernoff coefficient, and the Bhattacharyya coefficient and distance.]

Table 1. Probabilistic dissimilarity measures.

3.1.2 χ² Divergence
The χ² divergence is an asymmetric measure between probability distributions p1(x) and p2(x), determined by [8]:

dχ²(p1, p2) = ∫ ( p1(x) − p2(x) )² / p2(x) dx

Convergence in χ² divergence implies convergence in KL divergence, but the converse is not true [8]. This is because the χ² divergence is strictly topologically stronger than the KL divergence, since KL(P,Q) ≤ χ²(P,Q).

3.1.3 Hellinger Coefficient (HC) Based Measures
The Hellinger coefficient (HC) of order t is a similarity measure between probability distributions p1(x) and p2(x), defined in [8] as:

HC_t(p1, p2) = ∫ p1(x)^t p2(x)^(1−t) dx


From this similarity-like measure, several dissimilarity coefficients have been derived. The Chernoff coefficient (CC) of order t is defined as [5]:

CC_t(p1, p2) = −log HC_t(p1, p2)

This measure is related to the KL divergence through its slope at t = 0; it is smaller than the KL divergence and less sensitive to outlier values [8]. There is also a special symmetric case for t = 1/2, named the Bhattacharyya Coefficient (BC), defined by [10]:

BC(p1, p2) = ∫ √( p1(x) p2(x) ) dx

BC measures the amount of overlap between two probability distributions.

3.1.4 Minkowski-Based Measures
The Minkowski Lp distance, with p = 1, 2, 3, …, is defined in [2],[5] as:

L_p(x, y) = ( Σ_i |x_i − y_i|^p )^(1/p)

All Minkowski measures are symmetric and differ only in the way they amplify the effect of outlier values. The Minkowski distances of first and second order, the L1 and L2 distances, are also known as the Manhattan and Euclidean distances respectively. The L2 measure is defined by [10]:

L_2(x, y) = ( Σ_i (x_i − y_i)² )^(1/2)

It is the distance between two points in Euclidean n-space, a real coordinate space with n real coordinates; in this case, our samples.
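
The following Python sketch evaluates the density-based measures of Sections 3.1.2 and 3.1.3 by numerical integration for two illustrative one-dimensional Normals; the example densities are placeholders and the Chernoff coefficient is taken, as is common, as the negative logarithm of the Hellinger coefficient.

import numpy as np
from scipy.stats import norm

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
p1 = norm.pdf(x, loc=1.0, scale=1.0)   # illustrative "reading" density
p2 = norm.pdf(x, loc=0.0, scale=1.5)   # illustrative "non-reading" density

chi2 = np.sum((p1 - p2) ** 2 / p2) * dx          # chi-square divergence of p1 from p2

def hellinger_coeff(t):
    """Hellinger coefficient of order t: integral of p1^t * p2^(1-t)."""
    return np.sum(p1 ** t * p2 ** (1 - t)) * dx

def chernoff_coeff(t):
    """Chernoff coefficient of order t, taken here as -log of the Hellinger coefficient."""
    return -np.log(hellinger_coeff(t))

bhattacharyya_coeff = hellinger_coeff(0.5)       # overlap between the two densities
bhattacharyya_dist = -np.log(bhattacharyya_coeff)

print(chi2, chernoff_coeff(0.3), bhattacharyya_coeff, bhattacharyya_dist)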

3.2 Relevance Measurement Method
We assume that relevance is directly proportional to the differences determined by the dissimilarity measures, so our procedure is based on ordering the 16x5 feature streams according to the calculated dissimilarities. The first step is to apply all the dissimilarity measures to the feature streams. For each measure this results in 16x5 (80) real values, one per stream, corresponding to the measured difference. In order to compare all these values, the streams are normalized and turned into percentages by applying two formulas.

The first formula normalizes the range of each difference to the interval [0,1]; the second weights it relative to the overall results obtained with that measure. At this stage, after inspecting all the produced graphics, the Minkowski-based measures were discarded, as their results diverged too much from those provided by the remaining metrics. The final 16x5 weights, which quantify the importance of each of the 16x5 streams, are the average over all remaining measures. These weights are then ranked from minimum (1) to maximum (80) importance, and these ranks are the results analyzed to determine signal relevance.
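
A plausible Python reading of this ranking procedure is sketched below (the paper's exact normalisation formulas are not reproduced here): min-max normalisation per measure, weighting as a percentage of that measure's total, averaging over measures, and ranking from 1 to 80.

import numpy as np
from scipy.stats import rankdata

def relevance_ranks(dissim: np.ndarray) -> np.ndarray:
    """Turn per-measure dissimilarities into a single 1..80 relevance ranking.

    dissim has shape (n_measures, 80): one row per dissimilarity measure,
    one column per feature stream (16 electrodes x 5 bands).
    """
    lo = dissim.min(axis=1, keepdims=True)
    hi = dissim.max(axis=1, keepdims=True)
    normed = (dissim - lo) / (hi - lo)                        # range -> [0, 1]
    pct = 100.0 * normed / normed.sum(axis=1, keepdims=True)  # weight vs. overall results
    weights = pct.mean(axis=0)                                # average over the measures kept
    return rankdata(weights)                                  # 1 = least, 80 = most relevant

ranks = relevance_ranks(np.random.rand(6, 80))
print(int(ranks.min()), int(ranks.max()))  # 1 80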

3.3 ANOVA Analysis
To statistically validate our conclusions we performed Analysis of Variance (ANOVA). It analyses the variation present in our experiments by statistically testing whether the statistical parameters of our groups of measures (bands, electrodes, etc.) are consistent, assuming that the sampled populations are normally distributed. If, for instance, this consistency holds for two electrodes or bands, then we can safely consider them correctly ranked. ANOVA results are presented as a graphic or a table (Figure 4). The center line in the graphic represents the mean of each group, the polygon lines above and below show the mean +/- variance, and the line segments delimit the confidence interval.

SV              SS     DF   MS     F      P          Crit.F
Between Groups  167.1    1   167.1  107.3  6.02E-08   4.6
Within Groups    21.8   14     1.6
Total           188.8   15

Figure 4. ANOVA for left (1) vs. right (2) Hemispheres.

The main ANOVA statistic is given by:

F = MS_between / MS_within = ( SS_between / DF_between ) / ( SS_within / DF_within )

where the numerator is the variance between groups and the denominator is the variance within groups. In the table of Figure 4, the Sum of Squares (SS) column holds the SS terms, the Degrees of Freedom (DF) column holds their denominators, and the Mean Squares (MS) column is SS/DF; the Total row is the sum of the columns. The critical F is obtained from the F distribution table, and P is the probability of an F value at least this large arising under the null hypothesis. For the values above, F is much greater than the critical F and P is very low, so we can state that the statistical parameters of our groups of measures are consistent.
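
For reproducibility, an illustrative one-way ANOVA of this kind can be run in Python with scipy; the hemisphere rank values below are placeholders, not the study's data.

import numpy as np
from scipy.stats import f_oneway

# Average feature ranks per electrode, grouped by hemisphere
# (placeholder values, not the study's data).
left_ranks = np.array([60.1, 55.3, 52.8, 58.0, 49.9, 57.2, 61.4, 53.6])
right_ranks = np.array([32.0, 35.5, 30.2, 28.9, 34.1, 31.7, 29.8, 33.3])

F, p = f_oneway(left_ranks, right_ranks)
print(F, p)  # a large F with a very small p indicates clearly separated groups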


4. PROCESSING AND ANALYSIS FRAMEWORK
All the processing functionality is encapsulated in the EEGLib framework, an object-oriented toolkit implemented in C++ and MATLAB [9],[3]. The framework provides tools for feature extraction and classification, as well as components for data modeling such as EEG streams, frames and iterators. EEGLib includes several common EEG feature extraction procedures, including wavelets, power spectral density (PSD), Event Related Synchronization (ERS) and other statistical measures. In the work described here we use the mean PSD of the Delta (δ, 1-4 Hz), Theta (θ, 4-8 Hz), Alpha (α, 8-13 Hz), Beta1 (β1, 13-17 Hz) and Gamma (γ, 51-99 Hz) rhythms in all 16 electrodes. The analysis thus considers feature vectors composed of 16x5 real values. The mean PSD is determined in 1000 ms frames with an overlap of 500 ms. Our framework also supports various standard learning methods, including neural networks, K-Nearest Neighbors (KNN), AdaBoost and Support Vector Machines. We have tested all of these tools but, for simplicity, the current reading detection procedures use the KNN provided in the SPRTOOL MATLAB toolbox.
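
As an illustrative stand-in for this processing chain (the actual implementation uses EEGLib and the SPRTOOL MATLAB KNN, not the code below), the following Python sketch trains a KNN classifier on 80-dimensional band-power feature vectors; the data are placeholders.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X: frames x 80 band-power features, y: 1 = reading, 0 = not reading
# (placeholder data standing in for the captured samples).
X = np.random.rand(400, 80)
y = np.random.randint(0, 2, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("hold-out accuracy:", clf.score(X_te, y_te))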

5. RESULTS AND DISCUSSION
This section presents and discusses the results of the relevance ordering of input signals and bands.

5.1 EEG Signals Relevance Ordering
The relevance measurement ranks of all bands were averaged for each electrode. Figure 5 presents the average values determined over all sample sets; the y-axis represents the average importance rank of all features of each electrode. For instance, the features of the O1 electrode have an average rank of 60 out of 80.

Figure 5. Average input signal relevance (ranks).

Figure 6 shows the locations of the highest ranked electrodes. The electrodes that are not marked have a rank below 27.

Figure 6. Average input signal relevance (locations).

It is clear that the main differences are dominant in the left hemisphere. This is in agreement with the study of reading tasks conducted by Bizas et al. [2], whose findings suggested that changes in PSD between reading tasks are restricted to the left hemisphere. This hemispheric specialization is also confirmed by functional neuroscience experiments: about 90% of the adult population has left-hemisphere dominance for language [13]. Broca's and Wernicke's areas, which are respectively responsible for speech production and language understanding, are located in the left hemisphere [13]. The influence of Wernicke's area is clear in our study: there is a visible importance elevation near electrodes T5, P3 and O1. We also expected a more visible influence of Broca's area in our results, near F7, C3 and T3, but this was inconclusive. The highest ranked electrodes are the frontal polar and occipital ones. The occipital lobe is where visual processing occurs [13], which supports our results regarding reading versus non-reading cognitive tasks. The frontal lobe is responsible for higher level processes, but we believe it is more likely that the differences there are due to eye artifacts.

5.2 Bands Relevance Ordering
The relevance measurement ranks of all electrodes were averaged for each band. Figure 7 shows the average results determined over all samples; the y-axis indicates the average importance rank of all features of each band. For instance, the features of the α band have an average rank of 50 out of 80.

Figure 7. Average band relevance.

The γ and δ bands are visibly lower ranked than θ, α and β1. Given previous related work, we were expecting more relevant differences between these two groups of bands. The γ rhythm is considered an important marker for attention [6]: it appears that the visual presentation of attended words induces a γ rhythm in the major brain regions associated with reading and that this effect is significantly attenuated for unattended words.


A possible justification for the poor γ band performance in our work is that our experiments focus on reading rather than on stressing attention; these results show the difference between visual reading and attention in cortical activity. Left hemisphere δ, θ and β1 rhythms have already been used to differentiate reading tasks [13], with significant differences reported between semantic tasks, the ultimate attentive reading activity, and visual, orthographic and phonological tasks. The differentiation of the θ rhythm was confirmed in our study, but this did not hold for the δ and β1 bands. This cannot be due to the averaging effect, since these results are consistent with the non-averaged values (see the next section). The α rhythm, related to the resting condition, showed a good performance in our study, despite not being referenced in related work as the other bands are. Probably our non-reading task behaves like a mental resting activity when compared to the reading task, thus causing the differentiation between the α bands of the two sample classes.

5.3 Total Features Ordering
The importance measurement values were averaged for each of the 16x5 features. Table 2 displays the 10 highest average results, determined over all sample sets.

Rank   Average Relevance Rank   Electrode   Band
1      79.4                     O1          Alpha
2      77.8                     P3          Alpha
3      74.9                     O1          Beta1
4      74.8                     P3          Theta
5      73.9                     O2          Alpha
6      73.5                     O1          Theta
7      72.4                     T5          Alpha
8      69.0                     O2          Beta1
9      68.1                     O2          Theta
10     66.3                     T5          Theta

Table 2. 10 highest average feature relevance.

This ranking reinforces the previous discussion, because most of these features are located in the left hemisphere and α and θ are the most frequent bands. It also shows that the averaging introduced in the previous analyses may understate the importance of certain electrodes, namely P3, which appears twice in the top 10.

5.4 ANOVA Analysis Results
We performed several ANOVA test runs with different groups of measures, namely: left versus right hemisphere, skull areas, bands, electrodes, and features. Figure 4 above (Section 3) presented the ANOVA graphic and table for the left and right hemispheres. These calculations were performed after averaging the ranks of all features related to each hemisphere. As stated before, the results in the table indicate that the statistical parameters of the analyzed groups are consistent. This conclusion is reinforced by the graphic, which shows that the average ranks of the two groups are statistically distinct with no possible overlap. We can also see that the importance of the left hemisphere is significantly higher than that of the right hemisphere.

Figure 8 shows the ANOVA result taking the five tested bands as groups: δ, θ, α, β1 and γ (in this order). These calculations were also performed after averaging the ranks of all features related to each band.

SV              SS      DF   MS      F    P          Crit.F
Between Groups  4158.5    4   1039.6  9.2  3.372E-05  2.6
Within Groups   3944.6   35    112.7
Total           8103.1   39

Figure 8. ANOVA for δ(1), θ(2), α(3), β1(4) and γ(5) bands.

The average ranks of θ and α were relatively higher and differentiated from the rest of the bands. The γ band performed poorly, showing the lowest rank and the widest variation. Following the previous reasoning about the ANOVA table results, we can also state that the statistical parameters of these groups are consistent, even though F is closer to its critical value than in the previous test. To further detail this analysis we performed Multiple Comparisons, a technique that complements ANOVA and looks for specific significant differences between pairs of groups by comparing their means. Figure 9 contains the multiple comparison results for delta, theta, alpha and beta1. Each line segment represents the comparison interval of each group.

Figure 9. Multiple Comparison for δ(1), θ(2), α(3) and β1 (4).

The comparison intervals of the δ and β1 bands were significantly different from those determined for the θ and α rhythms. This also means that the θ and α bands were significantly higher and distinct from the rest of the rhythms and, for this reason, they appear to be more relevant for classifying reading versus non-reading tasks.


Figure 10 displays the ANOVA result for specific skull areas: the frontal polar, frontal, central, temporal, occipital and parietal regions. These calculations were performed after averaging the ranks of all features related to each area.

SV              SS      DF   MS     F     P        Crit.F
Between Groups  4893.1    5   978.6  50.9  9.5E-17  2.4
Within Groups    807.4   42    19.2
Total           5700.5   47

Figure 10. ANOVA for frontal polar (1), frontal (2), central (3), temporal (4), occipital (5) and parietal (6) areas.

These groups' statistical parameters are also consistent, as in the previous tables, since F is significantly higher than its critical value and P is extremely small. In agreement with our previous results, the frontal polar and occipital areas obtained average ranks that were relatively higher and more distant from the remaining regions. We then repeated the ANOVA process for all input signals, using the average ranks of all features related to each electrode (see Figure 11). We did not discard any input signal at this stage, in order to check for the averaging effect possibly introduced in the previous calculations. These results confirmed the previous discussion about areas: the frontal polar and occipital electrodes revealed higher ranks than the remaining electrodes, although not distant enough, especially the frontal polar ones. The values in the table also confirm these rankings as statistically consistent: F is once more greater than its critical value and P is very small. We then applied multiple comparisons to better analyze the differences among electrodes (see Figure 12) and obtained approximately three groups: occipital, frontal polar and the remaining electrodes. Only for the occipital electrodes was the comparison interval significantly different from the remaining electrodes group. Finally, we applied ANOVA to individual features, reducing their number to 16 by applying the previous conclusions (see Figure 13): features were restricted to the frontal polar and occipital areas, and we also discarded the γ band. The table supports that these rankings are statistically consistent, although we obtained here the lowest F value; F is still greater than its critical value and P is very small. The δ band features of both occipital electrodes (9 and 13) performed poorly and showed great variability, but the remaining features of these input signals were very concentrated and showed a relative distance from the rest of the groups. The variation of the frontal polar related features (1 to 8) was more significant, especially for the δ and β1 bands.

SV              SS       DF    MS      F     P        Crit.F
Between Groups  15849.8   15    1056.7  31.1  5.9E-33  1.8
Within Groups    3810.4  112      34.0
Total           19660.2  127

Figure 11. ANOVA for FP1 (1), FP2 (2), F7 (3), F3 (4), F4 (5), F8 (6), T3 (7), C3 (8), C4 (9), T4 (10), T5 (11), P3 (12), P4 (13), T6 (14), O1 (15) and O2 (16).

Figure 12. Multiple Comparison for FP1 (1), FP2 (2), F7 (3), F3 (4), F4 (5), F8 (6), T3 (7), C3 (8), C4 (9), T4 (10), T5 (11), P3 (12), P4 (13), T6 (14), O1 (15) and O2 (16).

SV              SS       DF    MS      F    P         Crit.F
Between Groups  17535.6   15    1169.0  6.7  4.59E-10  1.8
Within Groups   19584.3  112     174.9
Total           37119.9  127

Figure 13. ANOVA for FP1 (1 to 4), FP2 (5 to 8), O1 (9 to 12) and O2 (13 to 16) with bands δ, θ, α and β1, respectively.


Figure 14. Multiple Comparison for FP1 (1 to 4), FP2 (5 to 8), O1 (9 to 12) and O2 (13 to 16) with bands δ, θ, α and β1, respectively.

Multiple comparisons (see Figure 14) revealed approximately three groups of comparison intervals: (I) features 10 to 12 and 15, with higher ranks and significantly different from the next group; (II) features 1, 4, 5, 8, 9 and 13, with lower ranks and significantly different from the previous group; and (III) the remaining features. Table 3 shows more detailed data about the first two groups.

Group   Feature   Electrode   Band
I       10        O1          Theta (θ)
I       11        O1          Alpha (α)
I       12        O1          Beta1 (β1)
I       15        O2          Alpha (α)
II       1        FP1         Delta (δ)
II       4        FP1         Beta1 (β1)
II       5        FP2         Delta (δ)
II       8        FP2         Beta1 (β1)
II       9        O1          Delta (δ)
II      13        O2          Delta (δ)

Table 3. Details of the two significantly different intervals.

Almost all of the O1 electrode comparison intervals are in the higher ranked group, revealing that this electrode appears to be consistently different between the reading and non-reading classes, since most of its bands were affected. The δ band intervals were consistently located in the lower ranked group, showing that this band seems to be less relevant than the remaining rhythms.

6. CONCLUSIONS AND FUTURE WORK
This paper presented a study of the relevance of different types of EEG input signals with respect to their ability to discriminate silent attentive visual reading from non-reading cognitive tasks. We have demonstrated that EEG input signals are not equally significant and that we can quantify their contributions to the distinction between reading and non-reading cognitive tasks. More than that, we outlined a systematic and quantitative method for relevance determination that can be applied to other cognitive tasks.

We presented results that reinforce that the left hemisphere is dominant in reading tasks: its input signals consistently revealed higher dissimilarities between reading and non-reading samples than their homologues in the right hemisphere. The results also indicated the frontal polar and occipital areas, especially the latter, as well as the α and θ band related features, as more relevant than the remaining values. In opposition to some related work [12],[13], the γ and δ band results consistently performed poorly. In summary, we can state that, for EEG-based silent reading detection, one should use mainly O1 (θ, α, β1) and O2 (α).

With this method we can now proceed to the design of focused applications that exploit this significantly reduced set of human physiological features. The above specific conclusions are a first step towards the exploitation of this reduced set of signals in interactive applications targeted at assisted reading (such as ReadingScroller, briefly mentioned above). Having a reduced set of signals, optimized for the cognitive task at hand, is a critical requirement for the optimization of real-time processing and for the use of future light and portable EEG devices, for which results justifying our expectations are being reported [11]. Our work elicits the following additional requirements and ideas to be explored next.

Calibration Procedures Design
Although our results were consistent with neuroscience knowledge and some of the existing related work, the presented analysis was performed with a single subject and a limited set of samples. This was a conscious choice at this stage, to minimize the set of variables and tune the method. The repetition of the procedure with a larger number of subjects will now evaluate the degree of generalization of these results2. Our experience indicates that user differences, such as skin conductance or hair type, will introduce some degree of diversity. In any case, differences are to be expected even for the same subject, due to the biorhythmic cycle, sleepiness or environmental conditions. We aim to compensate for these differences by designing adequate calibration procedures that adapt to individual user profiles and conditions.

Dimensionality Reduction
As said before, ordering EEG signals by their relevance for distinguishing reading from non-reading mental activities is indispensable for the use of future light and portable EEG devices. Signal ranking will allow a reduction in the number of sensors and make the way users interact with augmented reading applications simpler and more natural. In this context, we aim to include this knowledge in the current signal processing chain. A careful analysis of the impact of removing some of the less relevant features must be performed. Reducing feature vector dimensionality will ultimately reduce processing time and allow the development of more effective real-time applications.

2 We noted above that around 90% of the population shows left-hemisphere dominance for language [13].


Opportunities for Gamma Band Analysis
As stated before, the γ rhythm is considered an important marker for attention [13]. However, it performed poorly in our study. Possible reasons for this result, relative to those suggested in related work, are the use of a different type of feature (PSD instead of wavelets) or distinct cognitive goals (reading versus non-reading instead of attentive versus non-attentive reading). A better understanding of this effect may be achieved through the use of wavelet coefficients to analyze the γ band patterns in our experiments.

7. ACKNOWLEDGMENTS
This work was partially supported by Fundação para a Ciência e Tecnologia (FCT), Portugal, Grant SFRH/BD/30681/2006, and the Ciência 2007 Program.

8. REFERENCES

[1] Aarts, E., Encarnação, J., True Visions: The Emergence of Ambient Intelligence, Springer, 2006.
[2] Bizas, E., Simos, G., Stam, C. J., Arvanitis, S., Terzakis, D., Micheloyannis, S., EEG Correlates of Cerebral Engagement in Reading Tasks, Brain Topography, Vol. 12, 1999.
[3] Oliveira, I., Lopes, R., Guimarães, N. M., Development of a Biosignals Framework for Usability Analysis (Short Paper), ACM SAC'09 HCI Track, 2009.
[4] Wolpaw, J. R. et al., "Brain-Computer Interface Technology: A Review of the First International Meeting", IEEE Transactions on Rehabilitation Engineering, Vol. 8, 2000.
[5] Millán, J. R., "Adaptive Brain Interfaces", Communications of the ACM, 2003.
[6] Jung, J., Mainy, N., Kahane, P., Minotti, L., Hoffmann, D., Bertrand, O., Lachaux, J., "The Neural Bases of Attentive Reading", Human Brain Mapping, Vol. 29, Issue 10, pp. 1193-1206, 2008.
[7] Krumm, J. (ed.), Ubiquitous Computing Fundamentals, CRC Press, 2010.
[8] Malerba, D., Esposito, F., Monopoli, M., Comparing Dissimilarity Measures for Probabilistic Symbolic Objects, Data Mining III, Series Management Information Systems, WIT Press, Vol. 6, pp. 31-40, 2002.
[9] Oliveira, I., Grigore, O., Guimarães, N., Reading Detection Based on Electroencephalogram Processing, Proceedings of the WSEAS 13th International Conference on Computers, Rhodes, Greece, 2009.
[10] Pekalska, E., Duin, R., The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, Machine Perception and Artificial Intelligence, World Scientific Publishing Company, Ch. 5, pp. 215-254, 2005.
[11] Popescu, F., Fazli, S., Badower, Y., Blankertz, B., Müller, K., "Single Trial Classification of Motor Imagination Using 6 Dry EEG Electrodes", PLoS ONE 2(7): e637, 2007.
[12] Shlens, J., "Notes on Kullback-Leibler Divergence and Likelihood Theory", Systems Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, 2007.
[13] Sternberg, R. J., Cognitive Psychology, Thomson Wadsworth, 2003.
[14] Streitz, N., Kameas, A., Mavrommati, I., The Disappearing Computer: Interaction Design, System Infrastructures and Applications for Smart Environments, Springer, 2007.
[15] Topsøe, F., Jensen-Shannon Divergence and Norm-Based Measures of Discrimination and Variation, Technical Report, Department of Mathematics, University of Copenhagen, 2003.
[16] Keirn, Z. A., Aunon, J. I., "A New Mode of Communication Between Man and His Surroundings", IEEE Transactions on Biomedical Engineering, Vol. 37, 1990.


Brain Computer Interfaces for Inclusion

P. J. McCullagh, Computing & Engineering, University of Ulster, Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK. Tel: +44 (0)2890368873. [email protected]
M. P. Ware, Computing & Engineering, University of Ulster, Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK. Tel: +44 (0)28 90366045. [email protected]
G. Lightbody, Computing & Engineering, University of Ulster, Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK. [email protected]

ABSTRACT
In this paper we describe an intelligent graphical user interface (IGUI) and a User Application Interface (UAI) tailored to Brain Computer Interface (BCI) interaction, designed for people with severe communication needs. The IGUI has three components: a two-way interface for communication with BCI2000 concerning user events and event handling; an interface to user applications concerning the passing of user commands and associated device identifiers, and the receiving of notification of device status; and an interface to an extensible mark-up language (xml) file containing menu content definitions. The interface has achieved control of domotic applications. The architecture, however, permits control of more complex 'smart' environments and could be extended further for entertainment by interacting with media devices. Using components of the electroencephalogram (EEG) to mediate expression is also technically possible, but is much more speculative and without proven efficacy. The IGUI-BCI approach described could potentially find wider use in the augmentation of the general population, to provide alternative computer interaction, an additional control channel and experimental leisure activities.

General Terms
Experimentation

Keywords
Brain Computer Interfaces, user interface, domotic control, entertainment.

1. INTRODUCTION
Degenerative diseases or accidents can leave a person paralyzed yet with full mental function. There has been significant research into creating brain-mediated computer control [1] and assistive equipment that can be controlled by the brain, such as a wheelchair-mounted robotic arm system [2].

A Brain-Computer Interface (BCI) may be defined as a system that translates a subject's intent (thoughts) into a technical control signal without resorting to the classical neuromuscular communication channels [3]. The key components are signal acquisition to acquire the electroencephalogram (EEG), signal processing to extract relevant features, and translation software to provide appropriate commands to an application. Applications include computer and environmental control, but entertainment applications are also under investigation. It is of course possible that the application could provide some opportunity for self expression and creativity. Figure 1 illustrates some possibilities of BCI for augmenting the human: listening to music, controlling photographs, watching films, or influencing music or visual arts.

The ability to apply a BCI to the control of multiple devices has previously been explored [4]. It has also been demonstrated that BCI technology can be applied to many assistive technologies: wheelchair control [4],[5],[6], computer spellers [7],[8],[4], web browsers [9], environment control [10],[11],[12] and computer games [13],[14]. Smart homes technology is also an active area of research [15],[16], both for assistive applications and to enhance 'lifestyle'. This BCI paradigm offers the opportunity for automated control of domotic devices and sensor interaction, for the purpose of providing an integrated ambient assistive living space.

[Figure 1 is a diagram linking BCI and inclusion to an engaging environment: listening to music, watching films, personalisation (photo albums), BCI and multimodal control of music derived from sonified EEG, and BCI and multimodal augmentation of visual textures. Brain activity modulates the EEG channel, while the BCI channel controls the interface.]

Figure 1: Uses of BCI for inclusion and augmentation of disadvantaged citizens

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Augmented Human Conference, April 2-3, 2010, Megève, France.
Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.


The merging of these complex and emerging technologies is not a trivial issue. The European Union has addressed this with BRAIN (FP7-2007-ICT-2-7.2) and other BCI projects, which come under the "Future BNCI" umbrella [17]. BRAIN is dedicated to providing a solution for controlling multiple domotic devices. The project adopts a multi-disciplinary approach to address issues ranging from improving BCI interaction [18],[19] to integrating smart homes solutions. BRAIN's focus concerns the application architecture and modes of user interaction. It offers an application interface with a wrapper for integrating domotic standards and protocols, if required, known as the universal application interface (UAI). An intuitive graphical user interface (IGUI) provides an on-screen menu structure which interacts with the UAI to achieve device control. The IGUI also interacts with BCI components, currently implementing a high-frequency steady state visual evoked potential (SSVEP) BCI paradigm.

2. ARCHITECTURE DESIGN

Blankertz [7] comments that a 'major challenge in BCI research is intelligent front-end design'. To be effective, a BCI used within a domestic setting has to interact with multiple devices and software packages. Within BRAIN we have designed an architecture that accommodates modification, facilitates substitution of existing components and to which functionality can be added. This accommodates emerging domotic devices or upgrades in BCI technologies. The architecture is modular in approach, with each component being highly cohesive and exhibiting loose coupling to other modules, each with clearly defined interfaces. The human interface is intuitive, to unify principles of operation across devices, thereby reducing learning and the overhead of operation. This is crucial with BCI, which suffers very slow communication rates, with user-invoked error choices posing a major problem. The architecture of the application and the interface must also be capable of supporting context-aware technologies such as predictive commands and command sets based upon models of previous interactions.

The concept of a separate application working in conjunction with BCI technologies was proposed by Mason et al. [20]. This conforms to the end-user elements of the controller interface and assistive devices as defined in that framework. It is also in keeping with the design philosophy of BCI2000 [21]. The BRAIN architecture incorporates the flexible approach to the brain-computer interface of the BCI2000 general purpose platform. The platform is designed to incorporate signals, signal processing methods, output devices and operating protocols. The BCI2000 interface is minimal, consisting of User Datagram Protocol (UDP) data packets whose content can be specified in accordance with the signal processing being performed. This approach employs the packets which originate from signal processing activity [19]. Three main types of command interaction sequence have been identified: binary, analogue and hybrid (see Figure 2). Binary command interaction concerns the issuing of a single command to the UAI in a single instance. This is used to navigate menu structures and issue single autonomous instructions. Analogue command interactions concern the issuing of temporally continuous and progressive commands for the purpose of adjusting environmental controls, e.g. to increase the volume on a media device; these interactions are vulnerable to latency. A hybrid command sequence achieves the intent of an analogue command sequence but is implemented using a binary style of interaction. It offers pre-defined options: high, medium, low; and is enacted via a single command. It is not suitable for fine-tuning.

Figure 2: Operation of Binary, Analogue and Hybrid Command Sequences

2.1 Intuitive graphical user interface (IGUI)
An intuitive graphical user interface (IGUI) provides an on-screen menu display of application content. This interface is suited to operating in conjunction with BCI peripherals, transducers and the control interface. Furthermore, conforming to specified user requirements, the interface is defined in such a way as to be suited to operation under various BCI paradigms whilst maintaining common principles of operation. It is applicable to all devices in the device controller module, i.e. the universal application interface (UAI), or which may be envisaged at a future point. It is capable of updating content as the state or availability of the applications or devices changes over time. The user interface is also capable of handling modifications in display or operation according to user-defined preferences. It is anticipated that modules will be added to the interface architecture which will provide predictive capabilities with respect to user menu selections based upon context and past choices. The UAI acts as a wrapper for multiple device interaction standards and protocols; it provides a single control interface to the IGUI, hiding the complexity of interaction. It is expected that the UAI will facilitate a variety of protocols and standards with regard to domotic devices and that the number of available devices will expand.

The IGUI has three major interfaces. The first is a two-way interface for communication with BCI2000 concerning user events and event handling. The second is a two-way interface to the UAI concerning the passing of user commands and associated device identifiers, and the receiving of notification of device status. The third is an interface to an extensible mark-up language (xml) file containing menu content definitions. This file, although initially defined 'off-line', is thereafter written to by the UAI and read by the IGUI, dynamically.


Should the need arise, substitution of packages is possible, provided that the interface definitions are adhered to or an additional interface wrapper is implemented to ensure interface compatibility. In this manner the IGUI can support an alternative signal processing mechanism to BCI2000, and has already been interfaced to an 'OpenBCI' platform [18]. The IGUI can potentially support other forms of device interaction instead of the UAI, for instance a dedicated navigational application for a wheelchair, with distributed control, executing macro commands and feeding back state information. The IGUI can also support substitution of menu content by changing the definitions in the xml file.

The following sequence represents IGUI-UAI device interaction:
1. BCI2000 raises user commands to the IGUI via an incoming data packet. The command is given significance based upon the context of the user interface, which is obtained from the xml menu file.
2. The specific command and associated device identification is passed to the UAI, which handles commands according to the appropriate protocol or standard and instigates the actual device interaction.
3. The status of devices as they join or leave the smart home network is updated via the xml menu definition file, where devices are either enabled or disabled as appropriate. The IGUI is informed as device status is modified so that the menu can be re-parsed and the display updated accordingly.
4. For the purpose of receiving incoming messages the IGUI implements two listening threads: one dedicated to listening for incoming BCI2000 data packets, on thread UDPListener, and one dedicated to listening for incoming UAI redisplay events and unpackaged BCI2000 communications, on thread EventQueueListener. Clearly, it does not make sense to allow the user to issue a command while the menu display is being updated: the device that the user may wish to interact with may no longer be available; neither does it make sense for the menu to be redisplayed at the same time a user command is being processed, as the outcome may affect the menu display. For this reason mediation has to take place between events raised on either thread, and each event is processed sequentially.

Interaction with BCI2000 is based upon the reception and sending of data packets. The Internet Protocol (IP) address and communication port of a computer supporting BCI2000 are known to the IGUI. Using these, a thread is initiated for the purpose of listening for incoming packets. On packet reception, the data is unpacked and the nature of the incoming user command determined. The appropriate message is placed on the EventQueue. The UAI may also write an event to the EventQueue; essentially the UAI indicates when and how the menu should be re-parsed and redisplayed. The IGUI EventQueueListener monitors the EventQueue length. If an event is detected in the queue, the IGUI reads the event, instigates the appropriate processing and removes the event from the queue.
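
A minimal Python sketch of this two-thread pattern is shown below; the port number, packet decoding and handler are placeholders and do not reflect the actual BCI2000 packet format or the IGUI implementation.

import queue
import socket
import threading

events = queue.Queue()   # shared EventQueue; events are handled strictly one at a time

def udp_listener(host: str = "127.0.0.1", port: int = 20320) -> None:
    """Listen for incoming BCI2000-style data packets and queue a user event.

    The port number and the raw payload handling are placeholders; the real
    IGUI unpacks the packet to determine the nature of the user command.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, _addr = sock.recvfrom(1024)
        events.put(("user_command", data))

def event_queue_listener(handle_event) -> None:
    """Drain the queue sequentially, so a UAI redisplay event and a user
    command are never processed at the same time."""
    while True:
        event = events.get()
        handle_event(event)   # e.g. update the menu or forward a command to the UAI
        events.task_done()

threading.Thread(target=udp_listener, daemon=True).start()
threading.Thread(target=event_queue_listener, args=(print,), daemon=True).start()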

User commands passed through BCI2000 can either affect the menu items displayed or initiate the issuing of a device command through a call to the UAI. In either case, a data packet is sent to BCI2000 indicating that LED operation should be suspended; the appropriate processing is performed and a further data packet is sent to BCI2000 re-initiating LED operation. Where the user has indicated the issuing of a device command, the appropriate device identifier is read from the xml menu definition file together with the associated command. A command notification is raised against the UAI using these two parameters, and the UAI returns a device status indicator to the IGUI.

The purpose of the interface is to offer the ability to manipulate menu context and issue simple instructions as part of an ongoing sequence of communication. BCI is a low bandwidth form of communication; the interface must offer the maximum amount of control for the minimum amount of interaction. The interface must optimise engagement for the user, giving them a sense of grounding with the application domain, offering a pathway towards task completion and giving a sense of accomplishment and progression as each step in a sequence of actions is achieved.

The IGUI and the UAI share access to a common menu structure. This menu is implemented in static xml with a separate parsing module. The structure as implemented is hierarchical; however, for future implementations it is possible to declare traversal paths in order to provide short-cuts, for instance returning to a high level menu on completion of a sequence of tasks. The current menu details several locations: back garden, two bedrooms, bathroom, dining room, front garden, hall, kitchen, living room. Devices are grouped according to each room. Where devices are in a communal area, the user's menu declaration lists the communal room and the device. A user's menu declaration will not normally list menu items which are only of significance to another user. When the UAI detects that a device is available to the network, the device status in the menu declaration is updated to 'enabled'. Provision within the xml declaration has also been made such that, should a device be judged sufficiently significant, it can be made constantly available through the use of a 'sticky' status to indicate permanent on-screen display. It is also possible to use non-location-based groupings such as 'general' to collect devices and applications which do not have a single location, for example spellers and photo albums. Should the interface be used for some other purpose, it is possible to implement a different classification mechanism, for instance grouping by functionality, or if necessary no classification mechanism at all. This is done by simply replacing the xml declaration. Where devices or locations are to be added, the xml declaration can be expanded accordingly.

The sample xml declaration (below) lists two forms of item. 'Node' items have sub-item declarations (e.g. Bedroom1). 'Leaf' items are used to associate a specific physical device or software package with a device/package interface command, e.g. x10Light1. All menu items have an associated graphical display; the location of the graphics file is declared in the icon tag of the menu item in the xml declaration. Currently the menu implementation uses static xml. Provision has been made in the IGUI interface for the passing of an object containing a similar xml declaration for dynamic content; dynamic content of the same format can be parsed and displayed using the existing mechanisms. Dynamic xml is relevant where content may be subject to frequent change, such as listing available files on a media server (e.g. movie titles).

<menu_list_item>
  <label>Bedroom1</label>
  <enabled>True</enabled>
  <sticky>False</sticky>
  <icon>Bedroom1.jpg</icon>
  <on_selection>
    <menu_list_item>
      <label>Lighting</label>
      <device_Id>x10Light1</device_Id>
      <enabled>True</enabled>
      <sticky>False</sticky>
      <icon>Bedroom1/Lighting.jpg</icon>
      <on_selection>
        <command>BinaryLight.toggle_power</command>
      </on_selection>
    </menu_list_item>
  </on_selection>
</menu_list_item>
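
For illustration, the declaration above can be parsed into a simple in-memory menu structure as sketched below in Python; this is not the IGUI's own parsing module, and the dictionary layout is an assumption of the example.

import xml.etree.ElementTree as ET

SAMPLE = """<menu_list_item>
  <label>Bedroom1</label>
  <enabled>True</enabled>
  <sticky>False</sticky>
  <icon>Bedroom1.jpg</icon>
  <on_selection>
    <menu_list_item>
      <label>Lighting</label>
      <device_Id>x10Light1</device_Id>
      <enabled>True</enabled>
      <sticky>False</sticky>
      <icon>Bedroom1/Lighting.jpg</icon>
      <on_selection>
        <command>BinaryLight.toggle_power</command>
      </on_selection>
    </menu_list_item>
  </on_selection>
</menu_list_item>"""

def parse_item(node):
    """Map a <menu_list_item> to a dict: leaf items carry a device_Id and a
    command, node items carry a list of child items under on_selection."""
    return {"label": node.findtext("label"),
            "enabled": node.findtext("enabled") == "True",
            "sticky": node.findtext("sticky") == "True",
            "icon": node.findtext("icon"),
            "device_id": node.findtext("device_Id"),
            "command": node.findtext("on_selection/command"),
            "children": [parse_item(child)
                         for child in node.findall("on_selection/menu_list_item")]}

menu = parse_item(ET.fromstring(SAMPLE))
print(menu["label"], "->", menu["children"][0]["command"])  # Bedroom1 -> BinaryLight.toggle_power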

The xml menu declaration represents menu content. The user needs a mechanism for manipulating content in as effective a manner as possible, in order to traverse the menu hierarchy and to pass correctly formulated commands to the UAI. Currently the supported BCI paradigm implements high-frequency SSVEP as the mechanism for user interaction, but it is anticipated that other BCI paradigms ('oddball' stimulus and intended movement) will be supported over time. Studies have reported that up to 48 LEDs can be used, operated between 6-15 Hz at increments of 0.195 Hz [22]. However, making this many signals available to the user in a meaningful manner using a conventional screen interface requires a degree of mapping which may be beyond both the interface and the user's capabilities and inclinations. Similarly, many devices (cameras, mp3 players, printers) with restricted input capabilities use a four-way command mapping as the interface of choice. Using such a command interface it is possible to cycle through lists of menu items (left/right), to select commands or further menu options (down), and to reverse selections or exit the system (up). Using fewer than four commands places an exponential command burden upon the user, as cycle commands (left/right) increase and selection commands (down) cannot be applied in a single action. It was therefore decided that a four-way command interface would be optimal. The LEDs are placed at the periphery of the screen with command icons central to the display [23].
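
A minimal sketch of this four-way interaction model is given below (Python, using the item dictionaries from the parsing sketch above); it is an illustration of the mapping rather than the project's implementation.

class FourWayMenu:
    """Model of the four-way command mapping: left/right cycle through the
    current list of items, down selects (descends a node or issues a leaf
    command), up reverses the selection or exits a submenu."""

    def __init__(self, items):
        self.stack = [(items, 0)]              # (sibling list, highlighted index)

    def current(self):
        items, idx = self.stack[-1]
        return items[idx]

    def left(self):
        items, idx = self.stack[-1]
        self.stack[-1] = (items, (idx - 1) % len(items))

    def right(self):
        items, idx = self.stack[-1]
        self.stack[-1] = (items, (idx + 1) % len(items))

    def down(self):
        item = self.current()
        if item["children"]:                   # node item: descend one level
            self.stack.append((item["children"], 0))
            return None
        return item["device_id"], item["command"]   # leaf item: issue the command

    def up(self):
        if len(self.stack) > 1:                # reverse the selection
            self.stack.pop()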

The command interface (Figure 3) displays the icons relating to three menu items in the centre of the screen. The central icon represents the current menu item; as such it is highlighted using a lighter coloured frame. Icons to either side provide list orientation to the user, to suggest progression and to suggest alternative options. Under the current SSVEP paradigm the four command arrows presented on the screen point to four peripheral LEDs. For the purpose of interface testing, and to provide a potential support facility for the carers of users, the four command arrows can be activated using a standard mouse. Under different BCI paradigms arrows would still be present on the screen but they would function in a slightly different manner. Under P300 it is anticipated that an additional module would govern the time-sequenced animation of the arrows, thereby providing a synchronised stimulus for the 'odd-ball' effect. Alternatively, voluntary responses can be used to control cursor movement towards arrows (ERD/ERS intended movement paradigm).

Figure 3: Intuitive Graphical Command Interface for BRAIN Project Application

The screen displays location-level menu items. At this menu level the icons are photographs which, in a real world context, could relate to the user's own home. The use of photographs and real world images is intended to make the interface more intuitive to the user and to reduce the cognitive load of interacting with the menu structure. At a lower menu level, general concept images have been used, albeit still in a photographic format. Specifically, individual lights are not represented; instead a universal image of a light bulb is used. It is felt that at this menu level the user will already have grasped the intent of the interface. Furthermore, the concept of device interaction at the command level is made universal by this approach, whether a tangibly visible interaction such as turning a light on or an invisible interaction such as turning up the volume of a device. Once again the flexibility of the application is demonstrated: the intuitive feel of the interface can be modified by simply replacing the graphics files. For instance, a 'younger' look and feel can be obtained by replacing photographic representations with cartoon-style drawings. The on-screen display of icons is supported by associated labels, which are represented by tags in the xml menu declaration. The labels are used to make the meaning explicit; however, the interface has been devised in such a way as to ensure that literacy is not required.

3. APPLICATION INTERFACE
Smart Homes are environments equipped with technology that acts in a protective and proactive fashion to assist an inhabitant in managing their daily life according to their individual needs. Smart homes technology has been predominantly applied to assist with monitoring vulnerable groups, such as people with early stage dementia [24] and older people in general [25], by optimising the environment for safety. The link between BCI and Smart Homes is obvious, as it provides a way to interact with the environment using direct brain control of actuators. Our contribution uses a software engineering approach, building an architecture which connects the BCI channel to the standard interfaces used in Smart Homes so that control, when established, can be far reaching and tuned to the needs of the individual, be it for entertainment, assistive devices or environmental control. Thus a link to the standards and protocols used in Smart Home implementations is important.

A BCI-Smart Home channel could allow users to realize several common tasks: for instance, to switch lights or devices on/off, adjust thermostats, raise/lower blinds, and open/close doors and windows. Video cameras could be used to identify a caller at the front door, and to grant access if appropriate. The same functions achieved with a remote control could be realized (i.e. for a television: control the volume, change the channel, the 'mute' function). In a media system, the user could play desired music tracks.

3.1 Standards and Protocols
While the underlying transmission media and protocols are largely unimportant from a BCI user perspective, the number of standards provides an interoperability challenge for the software engineer. Open standards are preferred. A number of standards bodies are involved; the main authorities are the Institute of Electrical and Electronics Engineers (IEEE), the International Telecommunication Union (ITU, home networking) and the International Standards Organisation (ISO). Industry provides additional de-facto standards. Given the slow 'user channel', BCI interaction with the control aspects of domotic networks requires high reliability, with available bit-rate transmission being of much lesser importance.

Domotic standards for home automation are based on either wired or wireless transmission. Wired is the preferred mode for 'new build' Smart Homes, where an information network may be installed as a 'service' similar to electricity or mains water supply. Wireless networks can be used to retrofit existing buildings and are more flexible, but are more prone to domestic interference, overlap and 'black spots', where communication is not possible. Wireless networks normally work using radio frequency (RF) transmission and can use Industrial, Scientific and Medical (ISM) frequencies (2.4 GHz band) or proprietary frequencies and protocols. Infra-red uses higher frequencies which are short range and travel in straight lines (e.g. the remote control of a television).

The Universal Plug and Play (UPnP) architecture offers pervasive peer-to-peer network connectivity of PCs, intelligent appliances, and wireless devices. UPnP is a distributed, open networking architecture that uses TCP/IP and HTTP protocols to enable seamless proximity networking in addition to control and data transfer among networked devices in the home. UPnP does not specify or constrain the design of an API for applications running on control points; a web browser may be used to control a device interface. UPnP provides an interoperable specification, offering the possibility of 'wrapping' other technologies (e.g. where a device is not UPnP compliant). UPnP enables data communication between any two devices under the command of any control device on the network. UPnP technology can run on any medium (category 3 twisted pairs, power lines (PLC), Ethernet, Infra-red (IrDA), Wi-Fi, Bluetooth). No device drivers are used; common protocols are used instead.

The UPnP architecture supports zero-configuration, invisible networking and automatic discovery, whereby a device can dynamically join a network, obtain an IP address, announce its name, convey its capabilities upon request, and learn about the presence and capabilities of other devices. Dynamic Host Configuration Protocol (DHCP) and Domain Name System (DNS) servers are optional and are only used if they are available on the network. A device can leave a network smoothly and automatically without leaving any unwanted state information behind. UPnP networking is based upon IP addressing: each device has a DHCP client and searches for a server when it is first connected to the network. If no DHCP server is available, that is, the network is unmanaged, the device assigns itself an address. If, during the DHCP transaction, the device obtains a domain name, the device should use that name in subsequent network operations; otherwise, the device should use its IP address.

The Open Services Gateway initiative (OSGi) provides middleware for the Java platform. OSGi technology provides a service-oriented, component-based environment for developers and offers standardized ways to manage the software lifecycle. The OSGi platform allows applications to be built from components. Two (or more) components can interact through interfaces explicitly declared in configuration files (in xml). In this way, OSGi is an enabler of expanded modular development at runtime. Where modules exist in a distributed environment (over a network), web services may be used for implementation. The OSGi UPnP Service maps devices on a UPnP network to the Service Registry.

3.2 Interoperability with existing smart home interface
It is important that the architecture developed can interoperate with existing and future assistive technology. A BRAIN partner, The Cedar Foundation, has sheltered apartments (Belfast) which are enabled for non-BCI Smart Home control. Each apartment is fully networked with the European Installation Bus (EIB) for home and building automation [26]. Into this, peripherals are connected which can be operated via infra-red remote control [27]. These peripherals, when activated, carry out tasks that tenants are not physically able to perform. Examples include door access, window and blind control, heating and lighting control and access to entertainment. Whilst this was 'state of the art' technology at the time of development, KNX has since replaced EIB as the choice for open standard connectivity. This reinforces the need for interoperability within a modular architecture, if BCI is to be introduced to the existing configuration.


3.3 A Universal Application Interface
The Universal Application Interface (UAI) aims to interconnect heterogeneous devices from different technologies integrated in the home network, and to provide a common controlling interface for the rest of the system layers. Figure 4 illustrates how BCI2000, the IGUI, the menu definition and the UAI interface with each other. UAI control is based on the UPnP specification, which provides protocols for addressing and notifying, and provides transparency to high-level programming interfaces. The UAI maps requests to events, generates the response to the user's interaction, and advertises applications according to the device. The UAI infers the device type and services during the discovery process, including for non-UPnP devices, which can be wrapped as UPnP devices through the automatic deployment of device proxies.

Figure 4: Interaction of the Intuitive Graphical Command Interface and Universal Application Interface

The UAI is divided into three modules: 1) Devices Control

Module which interacts directly with the UPnP devices. It

consists of a Discovery Point and several Control Points for the

different types of UPnP services. It also includes the UPnP

wrappers needed to access the non UPnP devices. 2) Context

Aware Module, which is capable of triggering automatic actions

when certain conditions are met without the need of user

intervention. The module receives events from the applications

and devices and invokes the actions resulting from the

evaluation of a set of predefined rules. 3) Applications Layer,

which provides the interactive services the user will control

through the BCI. Applications include domotic control, which

allows the user to control simple devices; and entertainment,

which allows the user to control a multimedia server, e.g. to

watch movies.
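As an illustration of the Context Aware Module described above, the following minimal Python sketch pairs predefined conditions with actions and fires every matching action when an event arrives. The rules and device actions are hypothetical examples, not part of the BRAIN implementation.

    # Minimal sketch of rule evaluation in a context-aware module: each
    # predefined rule pairs a condition over an incoming event with an
    # action, and matching rules fire without user intervention.
    RULES = [
        (lambda e: e["type"] == "temperature" and e["value"] > 26,
         lambda e: print("Lowering blinds in", e["room"])),
        (lambda e: e["type"] == "doorbell",
         lambda e: print("Showing door camera on the user interface")),
    ]

    def handle_event(event):
        """Invoke every action whose condition matches the event."""
        for condition, action in RULES:
            if condition(event):
                action(event)

    handle_event({"type": "temperature", "value": 28, "room": "living room"})
    handle_event({"type": "doorbell"})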

The purpose of the UAI is to provide a uniform platform for

device interaction by masking the complexity of numerous domotic device standards and communication protocols. Flexibility and robustness in the design are evidenced by the fact that it is possible to substitute the two core components, or for the core components to interact with other hardware and software configurations if necessary. For instance, by simply

replacing a communication wrapper the IGUI could interface

with a different BCI package, or by replacing the UAI the IGUI

could be harnessed for other control purposes, for example

driving a robot. It is also possible for the IGUI to be substituted

and for a different command interface to call the services of the

UAI.

The UAI is also flexible: additional standards can be added

without modifying the core command processing and device

handling modules. By incorporating new standards it is possible

to interact with an increasing number of devices without

radically modifying other aspects of the application device

architecture. By presenting an architecture which facilitates the

upgrading of existing standards, it is also possible to interact

with existing devices in a more efficient manner.

4. BCI FOR CREATIVE EXPRESSION

The link between EEG and performance has been established by

Miranda and Brouse [28], who used a BCI to play notes on an

electric piano. The piano did not play specific songs as such, but rather rhythms assembled from similar notes. There was not much choice in what the application could do, but it created music

using purely the power of the mind. The system was difficult to

use but provided an outlet for disabled users. Even if not

particularly creative, music and art therapy can be used to

improve quality of life for severely disabled individuals.

Within the area of EEG there has been a level of interest in

sonification of the different frequency bands within the

waveforms. For some, the objective has been to create an

“auditory display” of the waveforms as a method to portray

information. For example a sonic representation of the heart rate

can be a powerful mechanism for conveying important

information, complementing the visualization of the data.

Biosignals such as EEG and electromyogram (EMG) or muscle

activity can also be used to generate music. Using EEG as a

medium for musical expression was first demonstrated in 1965 by

the composer Alvin Lucier through his recital named “Music for

Solo Performer” [29]. Here manipulation of alpha waves was

utilized to resonate percussion instruments. Rosenboom [30]

used biofeedback as a method of artistic exploration by

composers and performers.

There has been recent work within BCI applications for music

creation by researchers Miranda, Arslan, Brouse, Knapp and

Filatriau. Brouse and Miranda [31], working within the eNTERFACE (http://www.enterface.net/) initiative, developed two types of instrument. The first was referred to as the BCI-Piano and used

a music generator driven by the parameters of the EEG. The

second instrument was the InterHarmonium and enabled

geographically separated performers to combine the sonification

of their EEG in real time. In addition to generating music, EEG

synthesis of visual textures has also been investigated [32].

The importance of enabling a level of creativity and self-

expression to be given to highly physically disabled people

through the use of EEG and bio-signal sonification is

highlighted by the Drake music project [33]. Assistive

technology is used to enable composition by musicians with only a limited level of motor movement. Incorporating BCI within such a framework would extend self-expression to people with much more severe physical disabilities. Multi-modal interaction, allowing EMG for example

to augment the EEG signal, is the basis of the BioMuse system.

In addition, combining the generation of music and visual display, as demonstrated by Arslan et al. [34], creates enhanced opportunities for self-expression.

5. CONCLUSIONS

We describe an intelligent graphical user interface (IGUI)

tailored to Brain Computer Interface (BCI) interaction, designed

for people with severe communication needs. The interface has

achieved control of simple domotic applications via the universal application interface (UAI). The architecture described

however permits control of more complex ‘smart’ environments

and will be extended further for entertainment by interacting

with media devices. While the use and efficacy of BCI for

creative expression is still highly speculative, the technical

approach adopted with the IGUI can be easily adapted to the

generation of relevant features from the EEG. All that is required is agreement upon the syntax and content of UDP packets and

additional signal processing. The IGUI approach described

could potentially find wider use in the general population, to

provide alternative computer interaction, an additional control

channel and experimental leisure activities.
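As a purely hypothetical illustration of the kind of agreement mentioned above, the sketch below sends one command as a UDP datagram carrying a small JSON payload; the port number and field names are invented for the example and are not the project's actual packet syntax.

    import json
    import socket

    # Illustrative only: the agreed UDP packet syntax could be as simple as a
    # small JSON object sent to a known port. Port and fields are assumptions.
    COMMAND_PORT = 5555

    def send_command(command, value, host="127.0.0.1"):
        packet = json.dumps({"command": command, "value": value}).encode("utf-8")
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(packet, (host, COMMAND_PORT))

    send_command("select", "lights_on")   # e.g. a menu selection made via the BCI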

6. ACKNOWLEDGMENTS

The BRAIN consortium gratefully acknowledges the support of

the European Commission’s ICT for Inclusion Unit, under grant

agreement No. 224156.

7. REFERENCES

[1] Wolpaw, J. R, McFarland D. J. Control of a two-

dimensional movement signal by a non-invasive brain-

computer interface in humans, Published by Proceedings of

the National Academy of Sciences of the United States of

America, December 21, 2004 vol. 101 no. 51

[2] Laurentis, K.J., Arbel Y., Dubey R., Donchin E.

Implementation of a P-300 brain computer interface for the

control of a wheelchair mounted robotic arm system.

Published by ASME in the Proceedings of the ASME 2008

Summer Bioengineering Conference (SBC2008), June 25-

29, pp. 1-2 (2008).

[3] Blanchard, Gilles, Blankertz, Benjamin, (2003) BCI

Competition 2003—Data Set IIa: Spatial Patterns of Self-

Controlled Brain Rhythm Modulations, Published by IEEE.

[4] Millan J.R, ‘Adaptive Brain Interfaces’, Communications

of the ACM, March 2003, Vol 46 No 3, pp75-80.

[5] Valsan G, Grychtol B, Lakany H, Conway B.A, ‘The

Strathclyde Brain Computer Interface’, IEEE 31st. Annual

International Conference on Engineering in Medicine and

Biology Society, Sept 2009, pp606-609.

[6] Leeb R, Friedman D, Müller-Putz G.R, Scherer R, Slater

M, Pfurtscheller G, ‘Self-Paced (Asynchronous) BCI

Control of a Wheelchair in Virtual Environments: A Case

Study with a Tetraplegic', Journal of Computational

Intelligence and Neuroscience, Vol 2007.

[7] Blankertz B, Krauledat M, Dornhege G, Williamson J,

Murray-Smith, Müller K.R, ‘A Note on Brain Actuated

Spelling with the Berlin Brain-Computer Interface’,

Lecture Notes on Computer Science, 2007, Vol 4555,

pp759-768, Heidelberg, Springer.

[8] Felton E, Lewis N.L, Willis S.A, Radwin R.G, ‘Neural

Signal Based Control of the Dasher Writing System', IEEE

3rd. International EMBS Conference on Neural

Engineering, May 2007, pp366-370.

[9] Bensch M, Karim A.A, Mellinger J, Hinterberger T,

Tangermann M, Bogdan M, Rosenstiel W, Birbaumer N,

‘Nessi: An EEG-Controlled Web Browser for Severely

Paralyzed Patients’, Journal of Computational Intelligence

and Neuroscience, Vol 2007.

[10] Piccini L, Parini S, Maggi L, Andreoni G, ‘A Wearable

Home BCI System: Preliminary Results with SSVEP

Protocol’, Proceedings of the IEEE 27th. Annual

Conference Engineering in Medicine and Biology, 2005,

pp5384-5387.

[11] Teo E, Huang A, Lian Y, Guan C, Li Y, Zhang H, ‘Media

Communication Centre Using Brain Computer Interface’,

Proceedings of the IEEE 28th. Annual Conference

Engineering in Medicine and Biology, 2006, pp2954-2957.

[12] Bayliss J, ‘Use of the Evoked Potential P3 Component for

Control in a Virtual Apartment', IEEE Transactions on

Neural Systems and Rehabilitation Engineering Vol 11, No

2, 2003, pp113-116.

[13] Martinez P, Bakardjian H, Cichocki A, ‘Fully Online

Multicommand Brain-Computer Interface with Visual

Neurofeedback Using SSVEP Paradigm’, Journal of

Computational Intelligence and Neuroscience, Vol 2007.

[14] 'Berlin Brain-Computer Interface – The HCI Communication Channel for Discovery', International

Journal Human-Computer Studies, Vol 65, 2007 pp460-

477.

[15] Chan M, Estève D, Escriba C, Campo E, ‘A Review of

Smart Homes – Present State and Future Challenges’,

Computer Methods and Programs in Biomedicine, Vol 91,

2008, pp55-81.

[16] Poland, M.P, Nugent, C.D, Wang, H, Chen L, ‘Smart

Home Research: Projects and Issues’, International Journal

of Ambient Computing and Intelligence, Vol 1, No 4,

2009, pp32-45.

[17] Progress in Brain/Neuronal Computer Interaction (BNCI),

http://hmi.ewi.utwente.nl/future-bnci, accessed Jan 2010.

[18] Durka P, Kus R, Zygierewicz J, Milanowski P and Garcia

G, ‘High-frequency SSVEP responses parametrized by

multichannel matching pursuit’, Frontiers in

Neuroinformatics. Conference Abstract: 2nd INCF

Congress of Neuroinformatics. 2009.

[19] Garcia G, Ibanez D, Mihajlovic V, Chestakov D,

‘Detection of High Frequency Steady State Visual Evoked


Potentials for Brain-Computer Interfaces’, 17th European

Signal Processing Conference, 2009.

[20] Mason S.G, Moore Jackson M.M, Birch G.E, ‘A General

Framework for Characterizing Studies of Brain Interface

Technology’, Annuals of Biomedical Engineering, Vol 33,

No 11, November February 2005, pp1653-1670.

[21] Schalk G, McFarland D, Hinterberger T, Birbaumer N,

Wolpaw J.R, ‘BCI2000: A General-Purpose Brain-

Computer Interface (BCI) System’, IEEE Transactions on

Biomedical Engineering Vol 51, No 6, June 2004, pp1034-

1043.

[22] Gao X, Xu D, Cheng M, Gao S, ‘A BCI-Based

Environmental Controller for the Motion-Disabled’, IEEE

Transactions on Neural Systems and Rehabilitation

Engineering , Vol 11, No 2 June 2003, pp137-140.

[23] Parini S, Maggi L, Turconi A, 'A Robust and Self-Paced BCI

System Based on a Four Class SSVEP Paradigm:

Algorithms and Protocols for a High-Transfer-Rate Direct

Brain Communication’, Journal of Computational

Intelligence and Neuroscience, 2009.

[24] ENABLE project (2008). Can Technology help people with

Dementia? Retrieved July 7, from

http://www.enableproject.org/

[25] Kidd, C.D., Orr, R.J., Abowd, G.D., Atkeson, C.G., Essa,

I.A., MacIntyre, B., Mynatt, E., Starner, T.E., Newstetter,

W. (1999). The Aware home: a living laboratory for

ubiquitous computing research , Proceedings of 2nd

International workshop on cooperative buildings

Integrating Information, Organization, and Architecture,

191 - 198 .

[26] Koegler, M. 2006. Free Development Environment for bus

coupling units of the European Installation Bus. Available

at http://www.auto.tuwien.ac.at/~mkoegler/eib/sdkdoc-

0.0.2.pdf . [Accessed on 21.07.09]

[27] SICARE SENIOR PILOT, 2008. Available at:

http://www.reflectivepractices.co.uk/cms/index.php?option

=displaypage&Itemid=50&op=page&SubMenu=

[Accessed on 21.07.09].

[28] Miranda, E. R. and Brouse, A. (2005a). Interfacing the

Brain Directly with Musical Systems: On developing

systems for making music with brain signals, Leonardo,

38(4):331-336

[29] Lucier, A (1965). Music for the solo performer,

http://www.emfinstitute.emf.org/exhibits/luciersolo.html

last accessed July 2007.

[30] Rosenboom, D. (2003). Propositional Music from

Extended Musical Interface with the Human Nervous

System, Annals of the New York Academy of Sciences

999, pp 263.

[31] Brouse, A., Filatriau, J.-J., Gaitanis, K., Lehembre, R.,

Macq, B., Miranda, E. and Zenon, A. (2006). "An

instrument of sound and visual creation driven by

biological signals". Proceedings of ENTERFACE06,

Dubrovnik (Croatia). (Not peer-reviewed report.)

[32] Filatriau, J.J., Lehembre, R., Macq, B., Brouse, A. and

Miranda, E. R. (2007). From EEG signals to a world of

sound and visual textures. (Submitted to ICASSP07

conference).

[33] Drake music project (2007),

http://www.drakemusicproject.org/, last accessed July 2007

[34] Arslan, B., Brouse, A., Castet, J., Lehembre, R., Simon, C.,

Filatriau, JJ and Noirhomme, Q. (2006). A Real Time

Music Synthesis Environment Driven with Biological

Signals, IEEE International Conference on Acoustics,

Speech and Signal Processing, 2006. ICASSP 2006

Proceedings, vol. 2.

Additional authors: M.D. Mulvenna, H.G. McAllister, C.D. Nugent, Faculty of Computing and Engineering, University of Ulster,

Shore Road, Jordanstown, Co. Antrim BT37 0QB, UK


Emotion Detection using Noisy EEG Data

Mina Mikhail, Computer Science and Engineering Department, American University in Cairo, 113 Kasr Al Aini Street, Cairo, Egypt, [email protected]

Khaled El-Ayat, Computer Science and Engineering Department, American University in Cairo, 113 Kasr Al Aini Street, Cairo, Egypt, [email protected]

Rana El Kaliouby, Media Laboratory, Massachusetts Institute of Technology, 20 Ames Street, Cambridge MA 02139 USA, [email protected]

James Coan, Department of Psychology, University of Virginia, 102 Gilmer Hall, Charlottesville, VA, [email protected]

John J.B. Allen, Department of Psychology, University of Arizona, 1503 E University Blvd., Tucson, AZ 85721-0068, [email protected]

ABSTRACT

Emotion is an important aspect of the interaction between humans. It is fundamental to human experience and rational decision-making. There is great interest in detecting emotions automatically. A number of techniques have been employed for this purpose using channels such as voice and facial expressions. However, these channels are not very accurate because they can be affected by users' intentions. Other techniques use physiological signals along with electroencephalography (EEG) for emotion detection. However, these approaches are not very practical for real-time applications because they either ask the participants to reduce any motion and facial muscle movement or reject EEG data contaminated with artifacts. In this paper, we propose an approach that analyzes highly contaminated EEG data produced by a new emotion elicitation technique. We also use a feature selection mechanism, based on neuroscience findings, to extract features that are relevant to the emotion detection task. We reached an average accuracy of 51% for the joy emotion, 53% for anger, 58% for fear and 61% for sadness.

Categories and Subject Descriptors

I.5.2 [Pattern Recognition]: Design Methodology - Classifier design and evaluation, Feature evaluation and selection

Keywords

Affective Computing, Brain Signals, Feature Extraction, Support Vector Machines


1. INTRODUCTION

Over the past two decades, there has been an increasing interest in developing systems that will detect and distinguish people's emotions automatically. Emotions are fundamental to human experience, influencing cognition, perception, and everyday tasks such as learning, communication, and even rational decision-making. However, studying emotions is not an easy task, as emotions are both mental and physiological states associated with a wide variety of feelings, thoughts, and behaviors [15].

Many have attempted to capture emotions automatically. Developing computerized systems and devices that can automatically capture human emotional behavior is the purpose of affective computing. Affective computing attempts to identify physiological and behavioral indicators related to, arising from or influencing emotion or other affective phenomena [14]. It is an interdisciplinary field that requires knowledge of psychology, computer science and cognitive sciences.

Because of its many potential applications, affective computing is a rapidly growing field. For example, emotion assessment can be integrated in human-computer interaction systems in order to make them more comparable to human-human interaction. This could enhance the usability of systems designed to improve the quality of life for disabled people who have difficulty communicating their affective states. Another emerging application that makes use of emotional responses is to quantify customers' experiences. Automated prediction of the customer's experience is important because current evaluation methods, such as relying on customers' self reports, are very subjective. People do not always feel comfortable revealing their true emotions, and they may inflate their degree of happiness or satisfaction in self reports [21].

There are two main approaches for eliciting participants' emotions. The first method presents a provoking auditory or visual stimulus to elicit specific emotions. This method is used by almost all studies in the literature [9, 17, 2, 1, 18, 13, 19]. The second approach builds on the facial feedback paradigm, which shows that facial expressions are robust elicitors of emotional experiences. In the famous study by Strack, Martin and Stepper [20], the authors attempted to provide a clear assessment of the theory that voluntary facial expressions can result in an emotion. They devised a cover story that would ensure the participants adopted the desired facial posing without being able to perceive either the corresponding emotion or the researchers' real motive. Each participant was asked to hold a pen in the mouth in different ways that result in different facial poses. Participants who held the pen in a way that produced a smile reported a more positive experience than those who held the pen in a position that resulted in a frown. This study was followed by different psychologists, including Ekman et al. [6], who found that emotions generated with a directed facial action task result in a finer distinction between emotions. However, this approach contaminates brain signals with facial muscle artifacts, which is why it has not been adopted by computer scientists.

We decided to explore this second approach because it brings our system closer to actual real-time emotion detection systems, since there will be many facial muscle movements and other artifacts that contaminate the EEG data.

Our work extends existing research in three principal ways. First, we are the first in the computer science field to use voluntary facial expression as a means of eliciting emotions. Although this contaminates the EEG with noise, it allows us to test our approach in an unconstrained environment where the users were not given any special instructions about reducing head motions or facial expressions, which makes our dataset close to a real-time application. Second, we are using a new technique, based on neuroscience findings, for selecting features that are relevant to the emotion detection task. Finally, we tested our approach on a large dataset of 36 subjects and were able to differentiate between four different emotions with an accuracy that ranges from 51% to 61%, which is equal to or higher than other related work.

The paper is organized as follows: section 2 surveys related work on different channels used for emotion detection, especially those that use EEG. Section 3 discusses the corpus of EEG we are using and how the different emotions are elicited. Section 4 describes the different noise sources that contaminate EEG signals. Section 5 gives an overview of our methodology for emotion detection using EEG. Experimental evaluation and results are presented in section 6. Section 7 concludes the paper and outlines future directions in the area of emotion detection using EEG.

2. RELATED WORKS

There is much work done in the field of emotion and cognitive state detection by analyzing facial expressions and/or speech. Some of these systems have shown a lot of success, such as those discussed in [7, 10]. The system proposed by El Kaliouby and Robinson [7] uses automated inference of cognitive mental states from observed facial expressions and head gestures in video, whereas the system proposed by Kim et al. [10] makes use of multimodal fusion of different timescale features of the speech. They also make use of the meaning of the words to infer both the angry and neutral emotions. Although facial expressions are considered to be a very powerful means for humans to communicate their emotions [21], the main drawback of using facial expressions or speech recognition is the fact that they are not reliable

indicators of emotion because they can either be faked bythe user or may not be produced as a result of the detectedemotion.

Based on the cognitive theory of emotion, the brain is thecenter of every human action [17]. Consequently, emotionsand cognitive states can be detected through analyzing phys-iological signals that are generated from the central nervoussystem such as brain signals recorded using EEG. However,there is not much work done in this area of research. Thanksto the success of brain computer interface systems, a fewnew studies have been done to find the correlation betweendifferent emotions and EEG signals. Most of these studiescombine both EEG signals with other physiological signalsgenerated from the peripheral nervous system [1, 2].

One of the earliest attempts to prove that EEG signalscan be used for emotion detection is proposed by Chanel etal [2]. Chanel et al [2] tried to distinguish among excitement,neutral and calm signals. They compared the results of threeemotion detection classifiers. The first one was trained onEEG signals, the second classifier was trained on peripheralsignals such as body temperature, blood pressure and heartbeats. The third classifier was trained on both EEG andperipheral signals. In order to stimulate the emotion of in-terest, the user is seated in front of a computer and is viewedan image to inform him/her which type of emotion s/he hasto think of. They then captured the signals from 64 differ-ent channels that cover the whole scalp in order to capturesignals in all the rhythmic activity of the brain neurons. Asfor feature extraction, they transformed the signal into thefrequency domain and use the power spectral as the EEGfeatures. Finally, they used a Naive Bayes Classifier whichresulted in an average accuracy of 54% compared to only50% for a classifier trained on physiological signals. The ac-curacy of combining both types of signals resulted in a boostof accuracy that reached up to 72%. The problem with theresearch done by Chanel et al [2] is the idea of using 64 chan-nels for recording EEG as well as other electrodes to capturephysiological signals make this approach impractical to beused in real time situation.

Ansari et al [1] improved the work done by Chanel etal [2]. They proposed using Synchronization Likelihood (SL)method as a multichannel measurement which allowed themalong with anatomical knowledge to reduce the number ofchannels from 64 to only 5 with a slight decrease in accu-racy and huge improvement in performance. The goal wasto distinguish between three emotions which are exciting-positive, exciting-negative and calm. For signal acquisition,they acquired the signal from (AFz, F4, F3, CP5, CP6). Forfeature extraction, they used sophisticated techniques suchas Hjorth Parameters and Fractal Dimensions and they thenapplied Linear Discriminant Analysis (LDA) as their classi-fication technique. The results showed an average accuracyof 60% in case of using 5 channels compared to 65% in caseof using 32 channels.

A different technique was taken by Musha et al [13]. Theyused 10 electrodes (FP1,FP2, F3, F4, T3, T4, P3, P4, O1,and O2) in order to detect four emotions which are anger,sadness, joy and relaxation. They rejected frequencies lowerthan 5 Hz because they are affected by artifacts and fre-quencies above 20 Hz because they claim that the contribu-tions of these frequencies to detect emotions are small. Theythen collected their features from the theta, alpha and betaranges. They performed cross correlation on each channel

Figure 1: Muscle movements in the full face conditions: (a) joy, (b) fear, (c) anger, (d) sadness [4].

pairs. The output of this cross correlation is a set of 65 vari-ables that is linearly transformed to a vector of 1x4 usinga transition matrix. Each value indicates the magnitude ofthe presence of one of the four emotions. This means thatany testing sample is a linear combination of the four emo-tions. After that they applied certain threshold to infer theemotion of interest.

A good step towards real time applications that dependon EEG for emotion recognition is proposed by Schaaff andSchultz [19]. Schaaff and Schultz [19] used only 4 electrodes(FP1, FP2, F7, F8) for EEG recording. The main purposeof this research is to classify between positive, negative andneutral emotions. To reach their goal, they selected peakalpha frequency, alpha power, cross-correlation features andsome statistical features such as the mean of the signal, thestandard deviation. They used Support Vector Machines forclassification and they reached an average accuracy of 47%.

Other researches imply a multimodal technique for emo-tion detection. One of these studies was done by Savran etal [18]. They propose using EEG, functional near-infraredimaging (fNIRS) and video processing. fNIRS detects thelight that travels through the cortex tissues and is used tomonitor the hemodynamic changes during cognitive and/oremotional activity. Savran et al [18] combined EEG withfNIRS along with some physiological signals in one systemand fNIRS with video processing in another system. Theydecided not to try video processing with EEG because facialexpressions inject noise into EEG signals. Also, when theyrecorded both EEG and fNIRS, they excluded the signalscaptured from the frontal lobe because of the noise producedby the fNIRS recordings. For experimentation, they showedthe participant images that will induce the emotion of inter-est and then recorded fNIRS, EEG and video after showingthese images. The fusion among the different modalities isdone on the decision level and not on the feature level. Theproblem with this research is that they are excluding datathat are contaminated with facial expressions which is notpractical since most emotions are accompanied with somesort of facial expressions. This makes their approach im-practical in lifetime situations.

3. EEG STUDY

In this research, we are using the database of EEG signals collected at the University of Arizona by Coan et al. [4]. Tin electrodes in a stretch-lycra cap (Electrocap, Eaton, Ohio) were placed on each participant's head. EEG was recorded at 25 sites (FP1, FP2, F3, F4, F7, F8, Fz, FTC1, FTC2, C3, C4, T3, T4, TCP1, TCP2, T5, T6, P3, P4, Pz, O1, O2, Oz, A1, A2) and referenced online to Cz.

3.1 Participants

This database contains EEG data recorded from thirty-six participants (10 men and 26 women). All participants were right handed. The age of the participants ranged from 17 to 24 years, with a mean age of 19.1. The ethnic composition of the sample was 2.7% African American, 2.7% Asian, 18.9% Hispanic, and 75.7% Caucasian.

3.2 Procedure

According to Coan et al. [4], the experimenter informed participants that they were taking part in a methodological study designed to identify artifacts in the EEG signal introduced by muscles on the face and head. Participants were further told that accounting for these muscle movement effects would require them to make a variety of specific movements designed to produce certain types of muscle artifact. The presence of such muscle artifacts makes the problem of emotion detection using EEG very difficult, because the EEG signals are contaminated with muscle artifacts; this brings the setting close to real-time applications, where there is no control over the facial muscles or other sources of noise.

Participants were led to believe that they were engagedin purposely generating error-muscle artifact. It was hopedthat although participants might detect the associations be-tween the directed facial action tasks and their respectivetarget emotions, they would not think of the target emo-tions as being of interest to the investigators. After partici-pants were prepared for psychophysiological recording withEEG and facial EMG electrodes , participants sat quietlyfor 8 min during which resting EEG was recorded duringa counterbalanced sequence of minute-long eyes-open andeyes-closed segments.

For the facial movement task, participants were seated ina sound-attenuated room, separate from the experimenter.The experimenter communicated with participants via mi-crophone, and participants’ faces were closely monitored atall times via video monitor. Participant facial expressionswere recorded onto videotape, as were subsequent verbalself-reports of experience. The experimenter gave explicitinstructions to participants concerning how to make eachfacial movement, observing participants on the video moni-tor to ensure that each movement was performed correctly.

Participants were asked to perform relatively simple move-ments first, moving on to more difficult ones. For example,the first movement participants were asked to perform isone that is part of the expression of anger. This movementengages the corrugator muscle in the eyebrow and foreheaddrawing the eyebrows down and together. Subjects wereasked to make the movement in the following manner: ”moveyour eyebrows down and together.” This was followed bytwo other partial faces, making three partial faces in all. Nocounterbalancing procedure was used for the control faces,as they were all considered to be a single condition.

One of the approaches that describes facial movements and their relation to different emotions is the Facial Action Coding System (FACS) [5], a catalogue of 44 unique action units (AUs) that correspond to each independent motion of the face. It also includes several categories of head and eye movements. FACS enables the measurement and scoring of facial activity in an objective, reliable and quantitative way. The expressions used were joy (AUs 6 + 12 + 25), anger (AUs 4 + 5 + 7 + 23/24), fear (AUs 1 + 2 + 4 + 5 + 15 + 20) and sadness (AUs 1 + 6 + 15 + 17), as shown in Fig. 1. These kinds of action units are used to elicit the corresponding emotions.

After completing the directed facial action task of a par-ticular face, each participant was asked each of the follow-ing: (1) While making that face, did you experience anythoughts? (2) While making that face, did you experienceany emotions? (3) While making that face, did you experi-ence any sensations? (4) While making that face, did youfeel like taking any kind of action, like doing anything? Ifanything was reported, participants were then asked to rateits intensity on a scale of 1 to 7 (1 = no experience at all; 7= an extremely intense experience).

For each participant, we have four files, each corresponding to one of the four emotions. Each file has two minutes of recording. These two minutes do not fully represent the emotion. Human coders were used to code the start and the end of each emotion, and we used one minute of recording between the start and the end. In order to have more than one sample per emotion and per participant, we worked on two 30-second epochs.

4. TYPES OF NOISE

4.1 Technical Artifacts

Technical artifacts are usually related to the environment where the signals are captured. One source of technical noise is the electrodes themselves [12]. If the electrodes are not properly placed over the surface of the scalp, or if the resistance between the electrode and the surface of the scalp exceeds 5 kOhm, the result is heavy contamination of the EEG. Another source of technical artifact is line noise. This noise occurs due to A/C power supplies, which may contaminate the signal with a 50/60 Hz component if the acquisition electrodes are not properly grounded. Our EEG database is contaminated with the 60 Hz frequency.

4.2 Physiological Artifacts

Another source of noise is physiological artifacts. These include eye blinking, eye movements, electromyographic (EMG) activity, motion, pulse and sweat artifacts [12].

The problem with eye blinks is that they produce a signal with a high amplitude that is usually much greater than the amplitude of the EEG signals of interest. Eye movement artifacts are similar to, or even stronger than, eye blinks.

The EMG or muscle activation artifact can occur due to muscle activity such as movement of the neck or of facial muscles. This can affect the data coming from some channels, depending on the location of the moving muscles.

As for the motion artifact, it takes place if the subject is moving while the EEG is being recorded. The data obtained can be corrupted due to the signals produced while the person is moving, or due to the possible movement of electrodes.

Other involuntary types of artifacts are pulse and sweat artifacts. The heart beats continuously, causing the vessels to expand and contract; so if the electrodes are placed

Figure 2: Multistage approach for emotion detection using EEG (voluntary facial expression → EEG recording → signal preprocessing: offline average reference, downsampling to 256 Hz, band-pass filter 3-30 Hz → feature extraction: Fast Fourier Transform, extracting the alpha band → classification).

near blood vessels, the data coming from them will be affected by the heartbeat.

Sweat artifacts can affect the impedance of the electrodes used in recording the brain activity; subsequently, the data recorded can be noisy or corrupted. These different types of noise make the processing of EEG a difficult task, especially in real-time environments where there is no control over the environment or the subject.

Our dataset is largely contaminated with facial muscle artifacts. Despite this highly noisy dataset, we are trying to achieve reasonable detection accuracy for the four emotions (anger, fear, joy and sadness) and a low false positive rate, so that we can integrate our emotion detection approach into real-time affective computing systems.

5. APPROACH FOR EMOTION DETECTION USING EEG

As shown in Fig. 2, we use a multilevel approach for analyzing EEG to infer the emotions of interest. The recorded EEG signals are first passed through the signal preprocessing stage, in which they go through a number of filters for noise removal. After that, relevant features are extracted from the signals, and finally we use support vector machines for classification.

5.1 Signal Preprocessing

Fig. 2 shows the three stages that the EEG data are passed through during the signal preprocessing stage. Our EEG data are referenced online to Cz. Following the recommendation of Reid et al. [16], who pointed out that this reference scheme did not correlate particularly well, an offline average reference is computed for the data by subtracting from each site the average activity of all scalp sites. To reduce the amount of data to be analyzed, our data are downsampled from 1024 Hz to 256 Hz.

Our dataset is largely contaminated with facial muscle and eye blink artifacts. Moreover, there are segments that are highly contaminated with artifacts and are marked for removal. Instead of rejecting such segments, we included them

in our analysis so that our approach can be generalized to real-time applications. Since most of the previously mentioned artifacts appear in low frequencies, we used a band-pass finite impulse response filter that removed the frequencies below 3 Hz and above 30 Hz.

Figure 3: Applying FFT to overlapping windows.
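The preprocessing chain described above can be sketched as follows. This is a minimal illustration using NumPy/SciPy, assuming one epoch stored as a channels-by-samples array recorded at 1024 Hz; it is not the authors' code, and the filter order and implementation details are assumptions.

    import numpy as np
    from scipy import signal

    def preprocess(eeg, fs_in=1024, fs_out=256, band=(3.0, 30.0)):
        # 1) Offline average reference: subtract the mean over all scalp sites.
        eeg = eeg - eeg.mean(axis=0, keepdims=True)
        # 2) Downsample from 1024 Hz to 256 Hz (factor 4, with anti-alias filtering).
        eeg = signal.decimate(eeg, fs_in // fs_out, axis=1)
        # 3) Band-pass FIR filter keeping roughly 3-30 Hz.
        taps = signal.firwin(numtaps=257, cutoff=band, pass_zero=False, fs=fs_out)
        return signal.filtfilt(taps, [1.0], eeg, axis=1)

    epoch = np.random.randn(25, 30 * 1024)   # 25 channels, 30 s at 1024 Hz (stand-in data)
    clean = preprocess(epoch)
    print(clean.shape)                        # (25, 7680) -> 30 s at 256 Hz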

5.2 Feature Extraction

Our approach divides each 30-second data epoch into 29 windows, 2 seconds wide with a 1-second overlap between consecutive windows. Each window is converted into the frequency domain using the Fast Fourier Transform (FFT), as shown in Fig. 3. The frequency descriptors of the power bands, i.e. the theta, alpha and beta rhythms, are extracted. This resulted in a huge feature set of 146025 features.
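A minimal sketch of this windowing and FFT step is given below, assuming data already downsampled to 256 Hz; the band limits are the conventional theta/alpha/beta ranges and the exact descriptors used in the paper may differ.

    import numpy as np

    FS = 256
    BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}   # assumed limits

    def window_fft_features(epoch):
        """epoch: (channels x samples) array covering 30 s at 256 Hz."""
        win, hop = 2 * FS, 1 * FS
        n_windows = (epoch.shape[1] - win) // hop + 1        # 29 for a 30 s epoch
        freqs = np.fft.rfftfreq(win, d=1.0 / FS)
        features = []
        for w in range(n_windows):
            segment = epoch[:, w * hop:w * hop + win]
            power = np.abs(np.fft.rfft(segment, axis=1)) ** 2
            for lo, hi in BANDS.values():
                mask = (freqs >= lo) & (freqs < hi)
                features.append(power[:, mask].mean(axis=1))  # mean band power per channel
        return np.concatenate(features)

    print(window_fft_features(np.random.randn(25, 30 * FS)).shape)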

5.2.1 Feature Reduction Using the Alpha Band

We made use of the study by Kostyunina et al. [11] in order to reduce our feature set. Kostyunina et al. [11] showed that emotions such as joy, aggression and intention result in an increase in alpha power, whereas emotions such as sorrow and anxiety result in a decrease in alpha power. Based on this conclusion, we focused our feature extraction on the power and phase of the alpha band only, which ranges from 8 Hz to 13 Hz, for the 25 channels. We used additional features such as the mean phase, the mean power, the peak frequency, the peak magnitude and the number of samples above zero. Making use of the study by Kostyunina et al. [11] helped to decrease the number of features from 146025 to 10150.
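The alpha-band descriptors listed above could be computed per channel roughly as in the following sketch; the precise definitions (for example, of the mean phase) are assumptions made for illustration.

    import numpy as np

    FS = 256

    def alpha_features(window):
        """window: (channels x samples) array, 2 s at 256 Hz."""
        freqs = np.fft.rfftfreq(window.shape[1], d=1.0 / FS)
        spectrum = np.fft.rfft(window, axis=1)
        alpha = (freqs >= 8) & (freqs <= 13)
        power = np.abs(spectrum[:, alpha]) ** 2
        phase = np.angle(spectrum[:, alpha])
        peak = power.argmax(axis=1)
        return np.column_stack([
            power.mean(axis=1),           # mean alpha power
            phase.mean(axis=1),           # mean alpha phase
            freqs[alpha][peak],           # peak alpha frequency
            power.max(axis=1),            # peak magnitude
            (window > 0).sum(axis=1),     # number of samples above zero
        ])

    print(alpha_features(np.random.randn(25, 2 * FS)).shape)   # (25, 5)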

5.2.2 Feature Reduction Using EEG Scalp Asymmetries

Another important piece of research that we made use of in order to reduce our feature set is the work by Coan et al. [4]. Coan et al. [4] showed that positive emotions are associated with relatively greater left frontal brain activity, whereas negative emotions are associated with relatively greater right frontal brain activity. They also showed that the change in activation in other regions of the brain, such as the central, temporal and mid-frontal regions, was smaller than in the frontal region. This domain-specific knowledge helped us to decrease the number of features from 10150 to only 3654.

The asymmetry feature between electrodes i and j at frequency n is obtained using the following equation:

c(n, i, j) = Xi(fn)−Xj(fn)

in which Xi(fn) is the frequency power at electrode i in the nth frequency bin. This equation is applied only to scalp-symmetric electrode pairs, such as (C3, C4), (FP1, FP2), etc.
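A short sketch of this asymmetry computation is shown below; the list of symmetric electrode pairs is an assumed subset of the montage and is given only for illustration.

    import numpy as np

    SYMMETRIC_PAIRS = [("FP1", "FP2"), ("F3", "F4"), ("F7", "F8"),
                       ("C3", "C4"), ("T3", "T4"), ("P3", "P4"), ("O1", "O2")]

    def asymmetry_features(band_power, channel_index):
        """band_power: (channels x frequency bins) power array for one window;
        channel_index: dict mapping electrode name -> row in band_power."""
        diffs = [band_power[channel_index[left]] - band_power[channel_index[right]]
                 for left, right in SYMMETRIC_PAIRS]
        return np.concatenate(diffs)

    channels = ["FP1", "FP2", "F3", "F4", "F7", "F8", "C3", "C4",
                "T3", "T4", "P3", "P4", "O1", "O2"]
    index = {name: i for i, name in enumerate(channels)}
    print(asymmetry_features(np.random.randn(len(channels), 11), index).shape)  # (77,)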

5.3 Support Vector Machines (SVMs)

For classification, we used support vector machines (SVMs). SVM is a supervised learning technique. Given a training set

Figure 4: A comparison of the classification accuracy of the joy emotion using a linear SVM kernel on two different feature selection criteria (series: alpha linear kernel, alpha + asymmetry linear kernel; y-axis: percentage, x-axis: run number 1-20).

Figure 5: A comparison of the classification accuracy of the joy emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry) (series: overall, absence of joy, presence of joy; y-axis: percentage, x-axis: run number 1-20).

of feature vectors, SVMs attempt to find a hyperplane such that the two classes are separable; given a new feature vector, the SVM predicts to which class this new feature vector belongs.

SVMs view the input data, the FFT features, as two sets of vectors in an n-dimensional space. The SVM constructs a separating hyperplane in that space that maximizes the margin between the two data sets; a good hyperplane is one that has the largest distance to the points of the different classes [8]. We built eight different binary classifiers. For each emotion, we used two different classifiers: the first classifier is trained on the features extracted from the alpha band only, and the second classifier is trained on the scalp asymmetry features. For each classifier, we tried linear, polynomial and radial kernels.
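The per-emotion binary classification scheme, together with the repeated random 90%/10% split evaluation described in section 6, might be sketched as follows using scikit-learn; the feature matrix is a random stand-in, and the stratified split is a simplification added here for stability rather than part of the paper's protocol.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X = np.random.randn(265, 3654)                       # stand-in alpha + asymmetry features
    labels = np.random.choice(["joy", "anger", "fear", "sadness"], size=265)

    def evaluate_emotion(X, labels, emotion, kernel="linear", runs=20):
        y = (labels == emotion).astype(int)              # 1 = presence, 0 = absence
        scores = []
        for run in range(runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, test_size=0.1, stratify=y, random_state=run)
            pred = SVC(kernel=kernel).fit(X_tr, y_tr).predict(X_te)
            scores.append([(pred == y_te).mean(),            # overall accuracy
                           (pred[y_te == 1] == 1).mean(),    # presence accuracy
                           (pred[y_te == 0] == 0).mean()])   # absence accuracy
        return np.mean(scores, axis=0)

    print(evaluate_emotion(X, labels, "joy"))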

6. EXPERIMENTAL RESULTS

The experiment included 36 participants with 265 samples (66 samples representing joy, 64 representing sadness, 65 representing fear and 70 representing anger). We started by building a joy emotion classifier, for which all the samples representing joy are considered positive samples and all other samples are negative samples.

Six different classifiers were built: two classifiers with a linear kernel (one for each set of features), two with a radial kernel and two with a polynomial kernel. The SVM classifiers with the polynomial kernel did not converge, whereas the classifiers with the radial kernel resulted in a very low accuracy of almost 0%.

To test our classifiers, we used 20-fold cross validation, in which we divided our 265 samples into testing samples (10%) and training samples (90%), which means that the samples we used for training are different from those used for testing. We repeated this approach 20 times, during which the testing and training samples were selected randomly, and we made sure that the training and testing samples are different in

Figure 6: A comparison of the classification accuracy of the anger emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry) (series: overall, absence of anger, presence of anger; y-axis: percentage, x-axis: run number 1-20).

the 20 trials. Since we have only two samples per emotion and per subject, and the training and testing samples are selected randomly, our approach is user independent. Fig. 4 compares the true positive, false negative and overall detection accuracy results of the two different classifiers. We found that the use of the alpha band combined with the EEG scalp differences resulted in better detection accuracy than using the alpha band only. This again suggests that using neuroscience findings in feature selection helps to decrease the size of the feature set and results in better classification accuracies. We also found that the radial kernel, for both types of features, resulted in 0% accuracy for joy and in a very high classification accuracy of almost 100% for the not-joy class. The average detection accuracy is 51% and 83% for the presence and absence of joy respectively, using the linear kernel.

Fig. 5 shows the average overall detection accuracy, the average detection accuracy for the presence of joy and the average detection accuracy for the absence of joy using a linear SVM kernel. The average overall detection accuracy represents the number of correctly classified samples (joy or not joy) divided by the total number of testing samples, which is 27 samples, 10% of the total number of samples. The average detection accuracy for the presence of joy is the number of correctly classified joy samples divided by the total number of joy samples in the testing set. Finally, the average detection accuracy for the absence of joy is the number of correctly classified not-joy samples divided by the total number of not-joy samples in the testing set. From the graph, it can be deduced that the accuracy for the absence of joy is in the range of 77% to 95%, which means that the false positive rate is very low, in the range of 5% to 23%.

We applied the same approach to build classifiers for the anger, fear and sadness emotions. Fig. 6, Fig. 7 and Fig. 8 show the classification accuracies of the linear SVM kernel for the second set of features (alpha + asymmetry) for the anger, fear and sadness emotions respectively.

Figure 7: A comparison of the classification accuracy of the fear emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry) (series: overall, absence of fear, presence of fear; y-axis: percentage, x-axis: run number 1-20).

Figure 8: A comparison of the classification accuracy of the sadness emotion using a linear SVM kernel on the second set of features (alpha band + asymmetry) (series: overall, absence of sadness, presence of sadness; y-axis: percentage, x-axis: run number 1-20).

in 48% of the samples. In this work, we did not ignore thesamples for which self reports did not match the elicitedemotions. It may have increased the accuracy if we used thesamples for which the participants have felt and reportedthe same emotion as the intended one. Also, the accuracymay be affected if the samples used are the ones that theparticipants reported the emotions with high intensities.

Table 2 shows a comparison of the average detection accu-racy for the four emotions. For each emotion, we are report-ing the results of the linear SVM kernel on two feature sets,using the alpha band only and using the alpha band alongwith scalp asymmetries. For each feature set, the percentageof presence of joy, for instance, is computed as:

(∑_{i=1}^{N} F(i)) × 100 / N

where F(i) is 1 if joy sample number i was correctly classified and 0 otherwise, and N is the total number of joy samples across the 20 different runs. The overall accuracy is the number of correctly classified samples, whether joy or not joy, divided by the total number of samples in the 20 different runs.

It is observed that the accuracy of the linear kernel with the second feature set (alpha + asymmetry) is higher than that with the first feature set (alpha band only) for the joy, anger and sadness emotions, whereas the detection accuracy of the linear kernel with the first feature set (alpha band only) is higher for the fear emotion than that with the second feature set (alpha + asymmetry).

7. CONCLUSION

The goal of this research is to study the possibility of classifying four different emotions from brain signals that were elicited by voluntary facial expressions. We proposed an approach that is applied to a noisy database of brain signals. Testing on a large corpus of 36 subjects and using two different feature extraction techniques that rely on


Table 1: Self Report Rates by Emotion. The rate column reflects the percentage of cases in which self reports were the same as the target emotion.

Emotion            Rate
Anger              65.7%
Fear               61.8%
Joy                50.0%
Sadness            30.6%
Overall Average    52.0%

Table 2: Results of emotion classification using linear SVM kernels on two different feature sets: using the alpha band only, and using the alpha band with scalp asymmetries.

Emotion     Alpha (presence / overall)    Alpha + asymmetry (presence / overall)
Anger       38% / 73%                     53% / 74%
Fear        58% / 79%                     38% / 77%
Joy         38% / 73%                     51.2% / 74%
Sadness     48% / 77%                     61% / 79%

domain knowledge, we reached an accuracy of 53%, 58%, 51% and 61% for the anger, fear, joy and sadness emotions respectively.

7.1 Future Directions

One of the areas where we can enhance this study is to reduce the number of features. This can be done by reducing the number of channels. We will work on studying the effect of reducing the number of channels on the classification accuracy. Reducing the number of channels will help us reduce the processing time and make the classification task more portable; hence, it can be used in real-time applications.

Another way to achieve better classification results is to improve our preprocessing stage. This can be done by using Independent Component Analysis (ICA). ICA is a computational method that can extract the different components of the signals; for instance, ICA can separate the EEG from physiological noise in the recorded signals.
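As a brief illustration of this suggestion, the sketch below applies FastICA to a channels-by-samples array, zeroes out components assumed to be artifacts, and reconstructs the signal; the component indices are placeholders, since real artifact selection would rely on criteria such as correlation with EOG/EMG channels.

    import numpy as np
    from sklearn.decomposition import FastICA

    eeg = np.random.randn(25, 30 * 256)            # channels x samples stand-in

    ica = FastICA(n_components=25, random_state=0, max_iter=500)
    sources = ica.fit_transform(eeg.T)             # samples x components
    artifact_components = [0, 3]                   # placeholder indices only
    sources[:, artifact_components] = 0.0          # drop assumed artifact components
    cleaned = ica.inverse_transform(sources).T     # back to channels x samples
    print(cleaned.shape)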

Finally, it will be interesting to compare the results achieved with our methodology against an emotion detection system that relies on facial expressions.

8. REFERENCES

[1] K. Ansari-Asl, G. Chanel, and T. Pun. A channel selection method for EEG classification in emotion assessment based on synchronization likelihood. In Eusipco 2007, 15th Eur. Signal Proc. Conf.

[2] G. Chanel, J. Kronegg, D. Grandjean, and T. Pun.Emotion assessment: Arousal evaluation using EEG’sand peripheral physiological signals. Lecture Notes inComputer Science, 4105:530, 2006.

[3] J. Coan and J. Allen. Varieties of emotionalexperience during voluntary emotional facialexpressions. Ann. NY Acad. Sci, 1000:375–379, 2003.

[4] J. Coan, J. Allen, and E. Harmon-Jones. Voluntaryfacial expression and hemispheric asymmetry over thefrontal cortex. Psychophysiology, 38(06):912–925, 2002.

[5] P. Ekman, W. Friesen, and J. Hager. Facial Action Coding System. 1978.

[6] P. Ekman, R. Levenson, and W. Friesen. Autonomicnervous system activity distinguishes among emotions.Science, 221(4616):1208–1210, 1983.

[7] R. El Kaliouby and P. Robinson. Mind readingmachines: automated inference of cognitive mentalstates from video. In 2004 IEEE InternationalConference on Systems, Man and Cybernetics,volume 1.

[8] S. Gunn. Support Vector Machines for Classificationand Regression. ISIS Technical Report, 14, 1998.

[9] K. Kim, S. Bang, and S. Kim. Emotion recognitionsystem using short-term monitoring of physiologicalsignals. Medical and biological engineering andcomputing, 42(3):419–427, 2004.

[10] S. Kim, P. Georgiou, S. Lee, and S. Narayanan.Real-time emotion detection system using speech:Multi-modal fusion of different timescale features. InIEEE 9th Workshop on Multimedia Signal Processing,2007. MMSP 2007, pages 48–51, 2007.

[11] M. Kostyunina and M. Kulikov. Frequencycharacteristics of EEG spectra in the emotions.Neuroscience and Behavioral Physiology,26(4):340–343, 1996.

[12] J. Lehtonen. EEG-based brain computer interfaces. Helsinki University of Technology, 2002.

[13] T. Musha, Y. Terasaki, H. Haque, and G. Ivamitsky.Feature extraction from EEGs associated withemotions. Artificial Life and Robotics, 1(1):15–19,1997.

[14] R. Picard. Affective computing. MIT press, 1997.

[15] R. Plutchik. A general psychoevolutionary theory ofemotion. Theories of Emotion, 1, 1980.

[16] S. Reid, L. Duke, and J. Allen. Resting frontalelectroencephalographic asymmetry in depression:Inconsistencies suggest the need to identify mediatingfactors. Psychophysiology, 35(04):389–404, 1998.

[17] D. Sander, D. Grandjean, and K. Scherer. A systemsapproach to appraisal mechanisms in emotion. Neuralnetworks, 18(4):317–352, 2005.

[18] A. Savran, K. Ciftci, G. Chanel, J. Mota, L. Viet,B. Sankur, L. Akarun, A. Caplier, and M. Rombaut.Emotion detection in the loop from brain signals andfacial images. Proc. of the eNTERFACE 2006, 2006.

[19] K. Schaaff and T. Schultz. Towards an EEG-basedEmotion Recognizer for Humanoid Robots. The 18thIEEE International Symposium on Robot and HumanInteractive Communication, pages 792–796, 2009.

[20] F. Strack, L. Martin, and S. Stepper. Inhibiting andfacilitating conditions of the human smile: Anonobtrusive test of the facial feedback hypothesis.Journal of Personality and Social Psychology,54(5):768–777, 1988.

[21] F. Strack, N. Schwarz, B. Chassein, D. Kern, and D. Wagner. The salience of comparison standards and the activation of social norms: consequences for judgments of happiness and their communication. 1989.


World’s First Wearable Humanoid Robot that Augments Our Emotions

Dzmitry Tsetserukou Toyohashi University of Technology

1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580 Japan

[email protected]

Alena Neviarouskaya University of Tokyo

7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan

[email protected]

ABSTRACT

In this paper we propose a conceptually novel approach to reinforcing (intensifying) one's own feelings and reproducing (simulating) the emotions felt by the partner during online communication through a wearable humanoid robot. The core component, the Affect Analysis Model, automatically recognizes nine emotions from text. The detected emotion is then stimulated by innovative haptic devices integrated into the robot. The implemented system can considerably enhance the emotionally immersive experience of real-time messaging. Users can not only exchange messages but also emotionally and physically feel the presence of the communication partner (e.g., a family member, friend, or beloved person).

ACM Classification Keywords H5.3 [Information interfaces and presentation (e.g., HCI)]: Group and organization interfaces – web-based interaction. H5.2. [Information interfaces and presentation (e.g., HCI)]: User Interfaces – haptic I/O, interaction styles, prototyping. I.2.7 [Artificial Intelligence]: Natural Language Processing – language parsing and understanding, text analysis. I.2.9 [Artificial Intelligence]: Robotics.

General Terms Design, Experimentation.

Keywords Wearable humanoid robot, affective user interfaces, haptic display, tactile display, haptic communication, online communication, Instant Messaging, 3D world.

1. INTRODUCTION

"All emotions use the body as their theater…" (Antonio Damasio)

Computer-mediated communication allows interaction between people who are not physically sharing the same space. Nowadays, companies providing media for remote online communication place great importance on live communication and immersive technologies. Along with widely used Instant Messengers (such as Yahoo IM, AOL AIM, Microsoft Windows Live Messenger, Google talk), new web services such as Twitter and Google Wave are gaining notability and popularity worldwide. Such applications allow keeping in touch with friends in real time over multiple networks and devices. Recently, mobile communication companies launched Instant Messenger services on cellular phones (e.g., AIM on iPhone). 3D virtual worlds (e.g., Second Life, OpenSim) are also embedded with chat and instant messengers. Such systems encourage people to establish or strengthen interpersonal relations, to share ideas, to gain new experiences, and to feel genuine emotions accompanying all adventures of virtual reality. However, conventional mediated systems usually (1) support only simple textual cues like emoticons; (2) lack visual emotional signals such as facial expressions and gestures; (3) support only manual control of the expressiveness of graphical representations of users (avatars); and (4) completely ignore such an important channel of social communication as the sense of touch. Tactile interfaces could allow users to enhance their emotional communication abilities by adding a whole new dimension to mobile communication [4,15]. Besides emotions conveyed through text, researchers developed an additional modality for communicating emotions in Instant Messenger (IM) through tactile interfaces with vibration patterns [18,24,26]. However, in


Figure 1. User communicating through iFeel_IM!. The devices worn on the body enhance experienced emotions.


the proposed methods users have to memorize the vibration or pin-matrix patterns and cognitively interpret the communicated emotional state. Demodulation of haptically coded emotion is not natural for human-human communication, and direct evocation of emotion cannot be achieved in such systems. Moreover, users have to place their fingers on the tactile display in order to maintain contact with the tactors, which interrupts the typing process while Instant Messaging. The remaining shortcomings of conventional communication systems still hold.

2. AFFECTIVE HAPTICS: AN EMERGING FRONTIER

Everything we know about the world entered our minds through the senses of sight, hearing, taste, touch, and smell. Our feelings are a rich and continuous flow of changing percepts. All our senses play a significant role in the recognition of the emotional state of a communication partner. Human emotions can be easily evoked by different cues, and the sense of touch is one of the most emotionally charged channels.

Affective Haptics is the emerging area of research focused on the design of devices and systems that can elicit, enhance, or influence the emotional state of a human by means of the sense of touch. We distinguish four basic haptic (tactile) channels governing our emotions: (1) physiological changes (e.g., heart beat rate, body temperature), (2) physical stimulation (e.g., tickling), (3) social touch (e.g., a hug), and (4) emotional haptic design (e.g., the shape of a device, its material and texture). Driven by the motivation to enhance social interactivity and the emotionally immersive experience of real-time messaging, we pioneered the idea of reinforcing (intensifying) one's own feelings and reproducing (simulating) the emotions felt by the partner through the specially designed wearable humanoid robot iFeel_IM! (Figure 1). The philosophy behind iFeel_IM! (intelligent wearable humanoid robot for Feeling enhancement powered by affect-sensitive Instant Messenger) is “I feel [therefore] I am!”. Emotion elicited by physical stimulation might imbue our communication with passion and increase emotional intimacy, the ability to be close, loving, and vulnerable. Interpersonal relationships and the ability to express empathy grow stronger when people become emotionally closer through disclosing thoughts, feelings, and emotions for the sake of understanding. In this work, we focus on the implementation of an innovative system that includes haptic devices for the generation of physical stimulation aimed at conveying the emotions experienced during online conversations. We attempt to influence human emotions through physiological changes, physical stimulation, social touch, and emotional design.

3. ARCHITECTURE OF THE WEARABLE HUMANOID ROBOT A humanoid robot is an electro-mechanical machine whose overall appearance is based on that of the human body, with artificial intelligence allowing complex interaction with tools and the environment. The field of humanoid robotics is advancing rapidly (e.g., ASIMO, HRP-4C). However, such robots have not yet found practical application in our homes (high price, large size, safety problems, etc.). Recent science fiction movies explore a future vision of the co-existence of human beings and robots. In the movie “Surrogates”, humans have withdrawn from everyday life almost

completely. Instead, they stay at home safely and comfortably and control robotic replicas of themselves. The human is able to feel and see what the surrogate feels and sees. However, surrogates are physically stronger, move faster, and look more beautiful than human beings. “Terminator Salvation” offers a definition of what separates man from machine: the human heart (the repository and indicator of our emotions). “Avatar” suggests the possibility of controlling the behavior of a sapient humanoid by the brain waves of a genetically matched human operator. Despite the different ideas presented in these movies, the main message is that humans must not rely heavily on their substitutes (surrogates, cyborgs, avatars, etc.) and should live their own lives.

The paradigm of wearable humanoid robotics is to augment human abilities rather than to substitute for them. Such robots are characterized by a wearable design, a structure based on that of the human body, embedded devices for influencing and enhancing our emotional state, health, physical strength, etc., and artificial intelligence allowing communication and sensing of the user and the environment.

Figure 2. Wearable humanoid robot iFeel_IM!.

The structure of the wearable humanoid robot iFeel_IM! is shown in Figure 2. As can be seen, the structure is based on that of the human body and includes such parts as the head, brain, heart, hands, chest, back, abdomen, and sides. In iFeel_IM!, great importance is placed on the automatic sensing of emotions conveyed through textual messages in the 3D virtual world Second Life (artificial intelligence), the visualization of the detected emotions by avatars in the virtual environment, the enhancement of the user's affective state, and the reproduction of the feeling of social touch (e.g., a hug) by means of haptic stimulation in the real world. The architecture of iFeel_IM! is presented in Figure 3.



As a medium for communication, we employ Second Life, which allows users to flexibly create their online identities (avatars) and to play various animations (e.g., facial expressions and gestures) of the avatars by typing special abbreviations in a chat window. The control of the conversation is implemented through a Second Life object called EmoHeart (invisible in the 'neutral' state) attached to the avatar's chest. In addition to communicating with the system for textual affect sensing (the Affect Analysis Model), EmoHeart is responsible for sensing symbolic cues or keywords of the 'hug' communicative function conveyed by text, and for the visualization (triggering of the related animation) of 'hugging' in Second Life. The results from the Affect Analysis Model (dominant emotion and intensity) and EmoHeart ('hug' communicative function) are stored along with the chat messages in a file on the local computer of each user. The Haptic Devices Controller analyzes these data in real time and generates control signals for a Digital/Analog converter (D/A), which then feeds the Driver Box of the haptic devices with control cues. Based on the transmitted signal, the corresponding haptic device

(HaptiHeart, HaptiHug, HaptiButterfly, HaptiTickler, HaptiTemper, and HaptiShiver) worn by the user is activated.
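To make the data flow concrete, the following minimal Python sketch illustrates how a controller of this kind could poll the chat log and dispatch device commands. The log format, the emotion-to-device mapping, and the send_to_dac placeholder are our own illustrative assumptions and do not reproduce the actual iFeel_IM! implementation.

```python
import time

# Assumed mapping from sensed emotions to the devices that should be driven;
# the actual controller logic is not published at this level of detail.
EMOTION_TO_DEVICES = {
    "joy":     ["HaptiButterfly", "HaptiTickler", "HaptiTemper"],
    "sadness": ["HaptiHeart"],
    "anger":   ["HaptiHeart", "HaptiTemper"],
    "fear":    ["HaptiHeart", "HaptiShiver", "HaptiTemper"],
}

def send_to_dac(device, intensity):
    """Placeholder for writing a control signal to the D/A converter,
    which in turn feeds the driver box of the haptic devices."""
    print(f"D/A <- {device}: intensity {intensity:.2f}")

def process_log_line(line):
    """Parse one chat-log record of the assumed form
    'emotion:intensity:hug_flag' and activate the matching devices."""
    emotion, intensity, hug = line.strip().split(":")
    intensity = float(intensity)
    if hug == "1":
        send_to_dac("HaptiHug", intensity)
    for device in EMOTION_TO_DEVICES.get(emotion, []):
        send_to_dac(device, intensity)

def controller_loop(log_path):
    """Poll the chat log written on the local computer and react in near real time."""
    with open(log_path) as log:
        while True:
            line = log.readline()
            if line:
                process_log_line(line)
            else:
                time.sleep(0.1)  # wait for new chat messages
```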

4. AFFECT RECOGNITION FROM TEXT The Affect Analysis Model [20] senses nine emotions conveyed through text ('anger', 'disgust', 'fear', 'guilt', 'interest', 'joy', 'sadness', 'shame', and 'surprise'). The affect recognition algorithm, which takes into account the specific style and evolving language of online conversation, consists of five main stages: (1) symbolic cue analysis; (2) syntactical structure analysis; (3) word-level analysis; (4) phrase-level analysis; and (5) sentence-level analysis. The Affect Analysis Model was designed based on the compositionality principle, according to which the emotional meaning of a sentence is determined by composing the pieces that correspond to lexical units or other linguistic constituents, governed by the rules of aggregation, propagation, domination, neutralization, and intensification at various grammatical levels. By analyzing each sentence in sequential stages, this method is capable of processing sentences of different complexity, including simple, compound, complex (with complement and relative clauses), and complex-compound sentences.
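As a purely schematic illustration of such a staged, rule-based pipeline, the Python sketch below chains toy versions of the five stages; the lexicons and the max-based composition rule are simplifying assumptions and are not the resources or rules of the published Affect Analysis Model.

```python
# Schematic outline of a five-stage, rule-based affect pipeline.
# All lexicons and rules here are toy placeholders, not the AAM resources.

EMOTICONS = {":)": ("joy", 0.8), ":(": ("sadness", 0.7)}          # stage 1: symbolic cues
WORD_LEXICON = {"happy": ("joy", 0.9), "afraid": ("fear", 0.8)}   # stage 3: word level

def symbolic_cue_analysis(sentence):
    return [EMOTICONS[tok] for tok in sentence.split() if tok in EMOTICONS]

def syntactical_analysis(sentence):
    # Stage 2 would call an external parser (Connexor or Stanford);
    # here we fall back to naive whitespace tokenisation.
    return sentence.lower().split()

def word_level_analysis(tokens):
    return [WORD_LEXICON[t] for t in tokens if t in WORD_LEXICON]

def sentence_level_analysis(annotations):
    # Stages 4-5: compose word/phrase annotations into a sentence label.
    # A real implementation applies aggregation, propagation, domination,
    # neutralization, and intensification rules; we simply take the maximum.
    if not annotations:
        return ("neutral", 0.0)
    return max(annotations, key=lambda a: a[1])

def analyse(sentence):
    annotations = symbolic_cue_analysis(sentence)
    tokens = syntactical_analysis(sentence)
    annotations += word_level_analysis(tokens)
    return sentence_level_analysis(annotations)

print(analyse("I am so happy to see you :)"))  # -> ('joy', 0.9)
```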

Figure 3. Architecture of the iFeel_IM!. In order to communicate through iFeel_IM!, users have to wear innovative affective haptic devices (HaptiHeart, HaptiHug, HaptiButterfly, HaptiTickler, HaptiTemper, and HaptiShiver) developed by us.

To measure the accuracy of the proposed emotion recognition algorithm, we extracted 700 sentences from a collection of diary-like blog posts provided by BuzzMetrics (http://www.nielsenbuzzmetrics.com). We focused on online diary and personal blog entries, which are typically written in a free style and are rich in emotional colouration. Three independent annotators labelled the sentences with one of the nine emotions (or neutral) and a corresponding intensity value. We developed two versions of the Affect Analysis Model (AAM), differing in the syntactic parser employed during the second stage of the affect recognition algorithm: (1) AAM with the commercial parser Connexor Machinese Syntax (http://www.connexor.eu) (AAM-CMS); (2) AAM with the GNU GPL licensed Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) (AAM-SP). The performance of AAM-CMS and AAM-SP was evaluated against two 'gold standard' sets of sentences: (1) 656 sentences on which two or three of the human raters completely agreed; (2) 249 sentences on which all three human raters completely agreed. An empirical evaluation of the AAM algorithm showed promising results regarding its capability to accurately classify affective information in text from an existing corpus of informal online communication (AAM-CMS achieves an accuracy of 81.5 %).

5. EmoHeart

Once attached to the avatar in Second Life, the EmoHeart object (1) listens to each message of its owner, (2) sends it to the web-based interface of the AAM, (3) receives the result (dominant emotion and intensity), and (4) visually reflects the sensed affective state through the animation of the avatar's facial expression, the EmoHeart texture (indicating the type of emotion), and the size of the texture

(indicating the strength of the emotion, namely 'low', 'middle', or 'high'). If no emotion is detected in the text, the EmoHeart remains invisible and the avatar's facial expression remains neutral. Examples of avatar facial expressions and EmoHeart textures are shown in Figure 4. During a two-month period (December 2008 – January 2009), 89 Second Life users became owners of EmoHeart, and 74 of them actually communicated using it. Text messages along with the results from the AAM were stored in an EmoHeart log database. Of all sentences, 20 % were categorized as emotional by the AAM and 80 % as neutral (Figure 5). We observed that the percentage of sentences annotated with positive emotions ('joy', 'interest', 'surprise') essentially prevailed (84.6 %) over sentences annotated with negative emotions ('anger', 'disgust', 'fear', 'guilt', 'sadness', 'shame'). We believe that this dominance of positivity expressed through text is due to the nature and purpose of online communication media. We analysed the distribution of emotional sentences from the EmoHeart log data according to the fine-grained emotion labels from our Affect Analysis Model (Figure 6).

Figure 6. Distribution of emotional sentences according to the fine-grained emotion labels of the Affect Analysis Model.

We found that the most frequent emotion conveyed through text messages is ‘joy’ (68.8 % of all emotional sentences), followed by ‘surprise’, ‘sadness’ and ‘interest’ (9.0 %, 8.8 %, and 6.9 %, respectively).
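The visualization step described at the beginning of this section can be summarized in a short sketch. The Python code below maps an AAM result onto hypothetical EmoHeart display parameters; the texture naming scheme and the intensity thresholds for 'low', 'middle', and 'high' are invented for illustration (EmoHeart itself is a scripted Second Life object).

```python
def emoheart_visualisation(emotion, intensity):
    """Map an AAM result (dominant emotion, intensity in [0, 1]) to
    hypothetical EmoHeart display parameters."""
    if emotion == "neutral" or intensity <= 0.0:
        return {"visible": False}               # EmoHeart stays invisible
    if intensity < 0.34:
        size = "low"
    elif intensity < 0.67:
        size = "middle"
    else:
        size = "high"
    return {
        "visible": True,
        "texture": f"emoheart_{emotion}.png",   # assumed texture naming scheme
        "size": size,
        "facial_animation": emotion,            # trigger the avatar's expression
    }

print(emoheart_visualisation("joy", 0.8))
```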

6. AFFECTIVE HAPTIC DEVICES According to the James-Lange theory [13], the conscious experience of emotion occurs after the cortex receives signals about changes in physiological state. Researchers have argued that feelings are preceded by certain physiological changes: when we see a venomous snake, we feel fear because our cortex has received signals about our racing heart, knocking knees, etc. Damasio [5] distinguishes primary and secondary emotions. Both involve

Figure 5. Percentage distribution of sentences.

Figure 4. Avatar facial expressions and EmoHeart textures (joy, sadness, anger, fear). The motivation behind using the heart-shaped object as an additional channel for visualization was to represent the communicated emotions in a vivid and expressive way.


changes in bodily states, but the secondary emotions are evoked by thoughts. Recent empirical studies support non-cognitive theories of the nature of emotions. It has been shown that we can easily evoke our emotions by something as simple as changing facial expression (e.g., a smile brings on a feeling of happiness) [29]. Moreover, it is believed that some of our emotional responses are mediated by direct pathways from perceptual centers in the temporal cortex and the thalamus to the amygdala [17]. In order to support affective communication, we implemented several novel haptic gadgets embedded in iFeel_IM!. They make up three groups: the first group is intended for implicit emotion elicitation (HaptiHeart, HaptiButterfly, HaptiTemper, and HaptiShiver), the second evokes affect in a direct way (HaptiTickler), and the third uses the sense of social touch (HaptiHug) to influence the mood and provide some sense of physical co-presence. These devices produce different senses of touch involving the kinesthetic and cutaneous channels [14]. Kinesthetic stimulations, which are produced by forces exerted on the body, are sensed by mechanoreceptors in the tendons and muscles. This channel is highly involved in sensing the stimulus produced by the HaptiHug device. On the other hand, mechanoreceptors in the skin layers are responsible for the perception of cutaneous stimulation. Different types of tactile corpuscles allow us to sense the thermal properties of an object (HaptiTemper), pressure (HaptiHeart, HaptiHug), vibration frequency (HaptiButterfly, HaptiTickler, and HaptiShiver), and stimulus location (localization of the stimulating device enables association with a particular physical contact). The affective haptic devices worn on a human body and their 3D models are presented in Figure 7.

Figure 7. Affective haptic devices worn on a human body.

6.1 HaptiHug: Realistic Hugging Over a Distance 6.1.1 Development of the haptic hug display Online interactions rely heavily on the senses of vision and hearing, and there is a substantial need for mediated social touch [9]. Among the many forms of physical contact, the hug is the most emotionally charged one; it conveys warmth, love, and affiliation. DiSalvo et al. [7] introduced “The Hug” interface. When a person desires to communicate a hug, he/she squeezes the pillow, and this action results in vibration and temperature changes in the partner's device. The Hug Shirt allows people who are

missing each other to send the physical sensation of a hug over a distance [12]. The shirt, embedded with actuators and sensors, can be worn in everyday life. However, these interfaces are unable to resemble the natural hug sensation and, hence, to elicit a strong affective experience (only slight pressure is generated by vibration actuators) [10]; they lack a visual representation of the partner, which adds ambiguity (hugging in real life involves both visual and physical experience); and they do not consider the power of the social pseudo-haptic illusion (i.e., a hugging animation is not integrated). Recently, there have been several attempts to improve the force feeling of haptic displays. Mueller et al. [19] proposed an air-inflatable vest with an integrated compressor for the presentation of a hug over a distance. An air pump inflates the vest and thus generates light pressure around the upper torso. The hug display Huggy Pajama is also actuated by air inflation [27]; its air compressor is placed outside the vest, allowing the use of a more powerful actuator. However, pneumatic actuators exhibit strong nonlinearity, load dependency, and time lag in response, and they produce loud noise [19]. Our goal is to develop a wearable haptic display generating forces similar to those of a human-human hug. Such a device should be lightweight, compact, low in power consumption, comfortable to wear, and aesthetically pleasing. When people hug, they simultaneously generate pressure on each other's chest and back with their hands. The key feature of the developed HaptiHug is that it physically reproduces a hug pattern similar to that of human-human interaction. The hands of the HaptiHug are sketched from a real human and made from soft material so that hugging partners can realistically feel the social presence of each other. A pair of oppositely rotating motors (Maxon RE 10 1.5 with gearhead GP 10 A 64:1) is incorporated into the holder placed on the user's chest area. The Soft Hands, which are aligned horizontally, contact the back of the user. Once a 'hug' command is received, the pair of motors tensions the belt, thus pressing the Soft Hands and the chest part of the HaptiHug against the human body (Figure 8).

Figure 8. Structure of the wearable HaptiHug device.



The duration and intensity of the hug are controlled by the software in accordance with the emoticon or keyword detected in the text. For the presentation of a plain hug level (e.g., '(>^_^)>', '<h>'), a big hug level (e.g., '>:D<'), and a great big hug level (e.g., 'gbh'), different levels of pressure with different durations are applied to the user's back and chest. The Soft Hands are made from a compliant rubber-sponge material. The contour profile of a Soft Hand is sketched from a male human and has a front-face area of 155.6 cm2. Two identical 5 mm thick pieces of Soft Hand were sandwiched by narrow belt slots and connected by plastic screws. Such a structure provides enough flexibility to fit tightly to the surface of the human back while being pressed by the belt. Moreover, the belt can move loosely inside the Soft Hands during tension. The dimensions and structure of the Soft Hands are presented in Figure 9.

Figure 9. Left: Soft Hand dimensions. Right: sandwiched structure of Soft Hands.
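The keyword-to-hug-level mapping described above can be sketched as follows. The duty-cycle and duration values are placeholders chosen only to mirror the three levels; they are not the parameters used in the actual HaptiHug controller.

```python
# Assumed hug levels; duty cycle and duration values are illustrative only.
HUG_LEVELS = {
    "plain":     {"duty_cycle": 0.4, "duration_s": 1.0},
    "big":       {"duty_cycle": 0.7, "duration_s": 1.5},
    "great_big": {"duty_cycle": 1.0, "duration_s": 2.0},
}

HUG_CUES = {
    "(>^_^)>": "plain", "<h>": "plain",
    ">:D<": "big",
    "gbh": "great_big",
}

def hug_command(message):
    """Return motor parameters for the first hug cue found in the message,
    or None if the message contains no hug cue."""
    for cue, level in HUG_CUES.items():
        if cue in message:
            return HUG_LEVELS[level]
    return None

print(hug_command("missing you so much, gbh!"))  # -> great big hug parameters
```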

6.1.2 Social pseudo-haptic touch We developed a hugging animation and integrated it into Second Life (Figure 10).

Figure 10. Snapshots of hugging animation in Second Life.

During the animation, the avatars approach and embrace each other with their hands. The significance of our idea to realistically reproduce hugging lies in the integration of the active-haptic device HaptiHug with pseudo-haptic touch simulated by the hugging animation. Thus, a high level of immersion into the physical contact of the hugging partners is achieved. In [16], the effect of pseudo-haptic feedback on the experienced force was demonstrated. We expect that the hugging animation will likewise increase the force sensation.

6.1.3 Hug measurement Since there have so far been no attempts to measure the pressure and duration of a hug, we conducted our own experiments. A total of 3 pairs of subjects (3 males and 3 females), with no previous knowledge of the experiment, was examined. Their ages varied from 24 to 32. They were asked to hug each other three times with three different intensities (plain hug, big hug, and great big hug levels). The subjects' chest and upper back were covered

with a Kinotex tactile sensor, which measures pressure intensity through the amount of backscattered light falling on a photodetector [23]. The taxels, spaced 21.5 mm apart in the X direction and 22 mm apart in the Y direction, make up a 6x10 array. The Kinotex sensitivity range is from 500 N/m2 to 8,000 N/m2. Examples of the pressure patterns on the back and on the chest of the user are given in Figure 11 and Figure 12, respectively.

Figure 11. Example of pressure distribution on the back of the user. The highest pressure corresponds to 4800 N/m2.

Figure 12. Example of pressure distribution on the chest of the user. The highest pressure corresponds to 5900 N/m2.

The measured average pressures are listed in Table 1. The experimental results show that males produce more force on the partner's back than females. Interestingly, the pressure on the chest changes nonlinearly. The probable cause is that, while experiencing the great big hug level, humans protect the heart, a vitally important part of the body, from overloading.

Table 1. Experimental findings.

              Plain hug, kN/m2   Big hug, kN/m2   Great big hug, kN/m2
Male Back           1.4               2.5                 5.05
Female Back         1.7               2.9                 6.4
Chest               2.3               3.5                 5.9

The developed HaptiHug device can achieve the force level of a plain hug (the generated pressure is higher than that of the other hug displays). We consider that there is no reason to produce very strong forces (which would require more powerful motors) that sometimes result in unpleasant sensations.


Based on the experimental results, we designed the control signals in such a way that the resulting pressure intensity, pattern, and duration are similar to those of a human-human hug. We summarize the technical specifications of the hug displays in Table 2 (O means the characteristic is present, – that it is absent).

Table 2. Specifications of the hug displays.

                                   HaptiHug    The Hug        Hug Shirt      Hug vest     Huggy Pajama
Weight, kg                         0.146       >1.0           0.160          >2.0         >1.2
Overall size, Height m × Width m   0.1 × 0.4   0.5 × 0.6      0.4 × 0.5      0.4 × 0.55   0.3 × 0.45
Wearable design                    O           –              O              O            O
Generated pressure, kPa            4.0         –              –              0.5          2.7
Actuators                          DC motors   Vibro-motors   Vibro-motors   Air pump     Air pump
Visual representation of partner   O           –              –              –            –
Social pseudo-touch                O           –              –              –            –
Based on human-human hug           O           –              –              –            –

The developed HaptiHug is capable of generating strong pressure while remaining lightweight and compact. Such features of the haptic hug display as visual representation of the partner, social pseudo-haptic touch, and pressure patterns similar to those of human-human interaction greatly increase the immersion into the physical contact of the hugging partners.

6.2 HaptiHeart: Enhancing User Emotions Each emotion is characterized by a specific pattern of physiological changes. We selected four distinct emotions having strong physical features [28]: 'anger', 'fear', 'sadness', and 'joy'. The precision of the AAM in recognizing these emotions ('anger' – 92 %, 'fear' – 91 %, 'joy' – 95 %, 'sadness' – 88 %) is considerably higher than for the other emotions. Table 3 shows which emotion each haptic device induces. Of the bodily organs, the heart plays a particularly important role in our emotional experience. The ability of false heart rate feedback to change our emotional state was reported in [6]. Research on the interplay between heart rate and emotions revealed that different emotions are associated with distinct patterns of heart rate variation [2].

Table 3. Each affective haptic device is responsible for the stimulation of particular emotions.

                 Joy   Sadness   Anger   Fear   Social touch
HaptiHeart        –       V        V      V         –
HaptiButterfly    V       –        –      –         –
HaptiShiver       –       –        –      V         –
HaptiTemper       V       –        V      V         –
HaptiTickler      V       –        –      –         –
HaptiHug          V       –        –      –         V

The heart sounds are generated by the beating heart and the flow of blood through it. Two major sounds are heard in the normal heart, often described as a 'lub' and a 'dub' (the 'lub-dub' sound occurs in sequence with each heartbeat). The first heart sound (lub), commonly termed S1, is caused by the sudden block of reverse blood flow due to the closure of the mitral and tricuspid valves at the beginning of ventricular contraction. The second heart sound (dub), or S2, results from the sudden block of reversing blood flow at the end of ventricular contraction [3]. We developed the heart imitator HaptiHeart to produce specific heartbeat patterns according to the emotion to be conveyed or elicited (sadness is associated with a slightly intense heartbeat, anger with a quick and violent heartbeat, fear with an intense heart rate). We take advantage of the fact that our heart naturally synchronizes with the heart of a person we hold or hug. Thus, the heart rate of a user is influenced by the haptic perception of the beat rate of the HaptiHeart. Furthermore, false heartbeat feedback can be directly interpreted as a real heartbeat, so it can change emotional perception. The HaptiHeart consists of two modules: a flat speaker (FPS 0304) and a speaker holder. The flat speaker dimensions (66.5 x 107 x 8 mm) and rated input power of 10 W allowed us to design a powerful and relatively compact HaptiHeart device. It is able to produce a realistic heartbeat sensation with high fidelity. The 3D model of the HaptiHeart is presented in Figure 13.

Figure 13. HaptiHeart layout.



A pre-recorded low-frequency sound signal generates the pressure on the human chest through the vibration of the speaker surface.
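As an illustration of how emotion-specific heartbeat patterns could be parameterized for such a speaker, the sketch below synthesizes a crude 'lub-dub' envelope. The beat rates, amplitudes, and waveform shape are illustrative guesses, not the signals used in HaptiHeart.

```python
import math

# Illustrative beat rates (bpm) and amplitudes per emotion; not measured values.
HEARTBEAT_PARAMS = {
    "sadness": {"bpm": 55,  "amplitude": 0.5},
    "fear":    {"bpm": 110, "amplitude": 0.8},
    "anger":   {"bpm": 130, "amplitude": 1.0},
}

def lub_dub_samples(emotion, seconds=2.0, sample_rate=8000):
    """Generate a crude low-frequency 'lub-dub' waveform for one emotion.
    Each beat consists of two short decaying pulses (S1 and S2)."""
    params = HEARTBEAT_PARAMS[emotion]
    beat_period = 60.0 / params["bpm"]
    samples = []
    for i in range(int(seconds * sample_rate)):
        t = (i / sample_rate) % beat_period
        # S1 at the start of the beat, S2 roughly a third of the way through.
        pulse = math.exp(-40 * t) + 0.6 * math.exp(-40 * abs(t - beat_period / 3))
        samples.append(params["amplitude"] * pulse * math.sin(2 * math.pi * 40 * t))
    return samples

wave = lub_dub_samples("fear")
print(len(wave), max(wave))
```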

6.3 Butterflies in the Stomach (HaptiButterfly) and Shivers down the Body's Spine (HaptiShiver/HaptiTemper) HaptiButterfly was developed with the aim of evoking the emotion of joy. The idea behind this device is to reproduce the effect of “butterflies in the stomach” (the fluttery or tickling feeling in the stomach felt by people experiencing love) by means of arrays of vibration motors attached to the abdomen area of a person (Figure 14).

Figure 14. Structure of HaptiButterfly.

We conducted an experiment aimed at investigating the patterns of vibration motor activation that produce the most pleasurable and natural sensations on the abdomen area. Based on the results, we employ 'circular' and 'spiral' vibration patterns. Temperature symptoms are good indicators of the differences between emotions. Empirical studies [28] showed that (1) fear and, to a lesser degree, sadness are characterized as 'cold' emotions, (2) joy is the only emotion experienced as 'warm', while (3) anger is a 'hot' emotion. Fear is characterized by the most physiological changes in the human body: driven by fear, blood that is shunted from the viscera to the rest of the body transfers heat, prompting perspiration to cool the body. In order to boost the emotion of fear physically, we designed an interface that sends “shivers down/up the body's spine” by means of a row of vibration motors (HaptiShiver) and “chills down/up the body's spine” through both cold airflow from a DC fan and the cold side of a Peltier element (HaptiTemper). The structure of the HaptiShiver/HaptiTemper device is shown in Figure 15.

Figure 15. Structure of HaptiShiver/HaptiTemper.
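The 'circular' and 'spiral' activation patterns can be pictured as orderings over the motor array, as in the sketch below; the 2x2 grid, motor indices, and pulse timings are assumptions made only for illustration.

```python
import time

# Assume a 2x2 grid of vibration motors on the abdomen (indices are placeholders);
# the real array layout and timings are not specified at this level of detail.
MOTOR_GRID = [[0, 1],
              [3, 2]]
CLOCKWISE = [MOTOR_GRID[0][0], MOTOR_GRID[0][1], MOTOR_GRID[1][1], MOTOR_GRID[1][0]]

def activate(motor_id, duration_s):
    """Placeholder for switching a single vibration motor on, then off."""
    print(f"motor {motor_id} on for {duration_s:.2f} s")
    time.sleep(duration_s)

def circular_pattern(cycles=2, pulse_s=0.15):
    """Run the motors one after another around the perimeter of the array."""
    for _ in range(cycles):
        for motor in CLOCKWISE:
            activate(motor, pulse_s)

def spiral_pattern(cycles=3, pulse_s=0.15):
    """Same ordering, but with pulses that shorten on every cycle to suggest
    an inward 'spiral'; purely an illustrative interpretation of the pattern."""
    for c in range(cycles):
        for motor in CLOCKWISE:
            activate(motor, pulse_s / (c + 1))
```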

6.4 HaptiTickler: Device for Positive Emotions Two different types of tickling are recognized. The first type, knismesis, refers to a feather-like (light) type of tickling. It is elicited by a light touch or by a light electrical current at almost any part of the body [25]. It should be emphasized that this type of tickling does not evoke laughter and is generally accompanied by an itching sensation that creates the desire to rub the tickled part of the body. The second type of tickling, gargalesis, is evoked by a heavier touch to particular areas of the body such as the armpits or ribs. This kind of stimulus usually results in laughter and squirming. In contrast to knismesis, one cannot produce gargalesis in oneself. Two explanations have been suggested for this inability to self-tickle. Scientists supporting the interpersonal explanation argue that tickling is a fundamentally interpersonal experience and thus requires another person as the source of the touch [8]. On the other side of the debate is the reflex view, suggesting that tickling requires an element of unpredictability or uncontrollability. The experimental results in [11] support the latter view and reveal that ticklish laughter evidently does not require that the stimulation be attributed to another person. However, social and emotional factors in ticklishness greatly affect the tickle response. We developed HaptiTickler with the purpose of evoking positive affect (the emotion of joy) in a direct way by tickling the ribs of the user. The device includes four vibration motors reproducing stimuli that are similar to human finger movements during rib tickling (Figure 16).

Figure 16. HaptiTickler device.

The uniqueness of our approach lies in (1) the combination of unpredictability and uncontrollability of the tickling sensation through random activation of the stimuli, and (2) the high involvement of social and emotional factors in the process of tickling (a positively charged online conversation potentiates the tickle response).
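A toy sketch of such random activation is given below; the motor indices and timing ranges are assumptions, not the parameters of the actual HaptiTickler.

```python
import random
import time

TICKLER_MOTORS = [0, 1, 2, 3]  # two large + two small vibration motors (assumed indices)

def tickle_burst(duration_s=3.0):
    """Activate the four motors in an unpredictable order and rhythm,
    imitating finger movements during rib tickling."""
    end = time.time() + duration_s
    while time.time() < end:
        motor = random.choice(TICKLER_MOTORS)
        on_time = random.uniform(0.05, 0.2)      # unpredictable pulse length
        print(f"motor {motor} on for {on_time:.2f} s")
        time.sleep(on_time)
        time.sleep(random.uniform(0.0, 0.1))     # unpredictable pause
```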

7. EMOTIONAL HAPTIC DESIGN Aesthetically pleasing objects appear to the user to be more effective by virtue of their sensual appeal [22]. The affinity the user feels for an object results from the formation of an emotional connection with that object. Recent findings show that attractive things make people feel good, which in turn makes them think more creatively. The importance of tactile experience in producing an aesthetic response is underlined in [21]. We propose the concept of Emotional Haptic Design. The core idea is to make the user feel affinity for the device by means of (1) appealing shapes evoking the desire to touch and haptically explore them, (2) the use of materials that are pleasurable to touch, and (3) the pleasure anticipated through wearing.



The designed devices are pleasurable to look at and to touch (colorful velvet material was used to decorate them) and have personalized features (in particular, the Soft Hands of the HaptiHug can be sketched from the hands of the real communication partner). The essence of the emotional, moral, and spiritual aspects of a human being has long been depicted by the heart-shaped symbol. The heart-shaped HaptiHeart was designed with the primary objective of emotionally connecting the user with the device, as the heart is mainly associated with love and emotional experience. The HaptiButterfly, through its shape and activated vibration motors, induces the association with a real butterfly lightly touching the human body as it spreads its wings. We paid great attention to the comfort of wearing the garment. Devices such as HaptiButterfly, HaptiTickler, and HaptiShiver have inner sides made from foam rubber. While contacting the human body, the surface shape adjusts itself to fit the particular contour of the body, so any uncomfortable pressure is avoided. All of the designed devices have a flexible and intuitive system of buckles and fasteners that enables the user to easily adjust the devices to his or her body shape.

8. USER STUDY Recently, we demonstrated the wearable humanoid robot iFeel_IM! at conferences such as INTETAIN 2009, ACII 2009, and ASIAGRAPH 2009 (Figure 17). In total, more than 300 people have experienced our system. Most of them commented that the haptic modalities (e.g., the heartbeat feeling, the hug sensation) were very realistic. Subjects were highly satisfied with wearing the HaptiHug. The simultaneous observation of the hugging animation and experience of the hugging sensation evoked surprise and joy in many participants. However, there were some remarks about the necessity of designing unique haptic stimuli for each particular user (e.g., the heart rate of each user while experiencing the same emotion is different).

Figure 17. Demonstration of iFeel_IM! at ASIAGRAPH 2009.

From our personal observations we noticed that, while joy was being stimulated through the HaptiButterfly, many participants were smiling. Participants expressed anxiety when fear was evoked through a fast and intense heartbeat. Taking our observations into account, we will work further to improve the emotionally immersive experience of online communication. The atmosphere between the participants and exhibitors became more relaxed and joyful during the iFeel_IM! demonstrations, which suggests that the wearable humanoid robot was successful at emotion elicitation. Also, despite users varying greatly in size, the device was capable of fitting everyone.

9. CONCLUSIONS While developing the iFeel_IM! system, we attempted to bridge the gap between mediated and face-to-face communication by enabling and enriching the spectrum of senses such as vision and touch along with cognition and inner personal state. In this paper we described the architecture of iFeel_IM! and the development of novel haptic devices: HaptiHeart, HaptiHug, HaptiTickler, HaptiButterfly, HaptiShiver, and HaptiTemper. The emotional brain of our robot, the Affect Analysis Model, can sense emotions and their intensity with high accuracy. Moreover, the AAM is capable of processing messages written in the informal and expressive style of IM (e.g., “LOL” for “laughing out loud”, “CUL” for “see you later”, “<3” for love, etc.). The haptic devices were designed with particular emphasis on natural and realistic representation of the physical stimuli, modular expandability, and ergonomic, human-friendly design. The user can perceive intensive emotions during online communication, use the desired type of stimuli, and comfortably wear and easily detach the devices from the torso. The significance of our idea to realistically reproduce hugging lies in the integration of the active-haptic device HaptiHug with pseudo-haptic touch simulated by the hugging animation; thus, a high level of immersion into the physical contact of the hugging partners is achieved. Preliminary observation has revealed that the developed devices are capable of strongly influencing our emotional state. Users were captivated by chatting while simultaneously experiencing the emotional arousal caused by the affective haptic devices. Our primary goal for future research is to conduct an extensive user study of the wearable humanoid robot iFeel_IM!. Additional modalities aimed at intensifying affective states will be investigated as well. For example, it is well known that the duration of emotional experiences differs for joy, anger, fear, and sadness. Another embodiment of iFeel_IM! aims at sending emotional messages to the partner in real time, so that he/she can perceive our emotions and feel empathy. The heartbeat pattern of the communicating persons can be recorded in real time and exchanged during conversation. The iFeel_IM! system has great potential to impact communication in 'sociomental' (rather than 'virtual') online environments, which facilitate contact with others and affect the nature of social life in terms of both interpersonal relationships and the character of community. It is also well known that our emotional state and our health are strongly linked. We believe that the integration of Affective Haptics and artificial intelligence technologies into online communication systems will provide such an important channel as the sensual and non-verbal connection of partners along with textual and visual information.

10. ACKNOWLEDGMENTS The research is supported in part by the Japan Science and Technology Agency (JST) and Japan Society for the Promotion of Science (JSPS).


11. References [1] Anderson, C.A. 1989. Temperature and aggression:

ubiquitous effects of heat on occurrence of human violence. Psychological Bulletin 106, 1, 74-96.

[2] Anttonen, J., and Surakka, V. 2005. Emotions and heart rate while sitting on a chair. In Proceedings of the ACM Conference on Human Factors in Computing Systems (Portland, USA, April 01 - 07, 2005). CHI '05. ACM Press, New York, 491-499.

[3] Bickley, L.S. 2008. Bates’ Guide to Physical Examination and History Taking. Philadelphia: Lippincott Williams & Wilkins.

[4] Chang, A., O’Modhrain, S., Jacob, R., Gunther, E., and Ishii, H. 2002. ComTouch: design of a vibrotactile communication device. In Proceedings of the ACM Designing Interactive Systems Conference (London, UK, June 25 - 28, 2002). DIS '02. ACM Press, New York, 312-320.

[5] Damasio, A. 2000. The feeling of what happens: body, emotion and the making of consciousness. London: Vintage.

[6] Decaria, M.D., Proctor, S., and Malloy, T.E. 1974. The effect of false heart rate feedback on self reports of anxiety and on actual heart rate. Behaviour Research and Therapy, 12, 251-253.

[7] DiSalvo, C., Gemperle, F., Forlizzi, J., and Montgomery, E. 2003. The Hug: an exploration of robotic form for intimate communication. In Proceedings of the IEEE Workshop on Robot and Human Interactive Communication (Millbrae, USA, Oct. 31 - Nov. 2, 2003). RO-MAN '03, IEEE Press, New York, 403-408.

[8] Foot, H.C., and Chapman, A.J. 1976. The social responsiveness of young children in humorous situations. In Chapman, A.J. & Foot, H.C. (Eds.), Humour and Laughter: Theory, Research, and Applications. London: Wiley, 187-214.

[9] Haans, A., and Ijsselsteijn, W. I. 2006. Mediated social touch: a review of current research and future directions. Virtual Reality, 9, 149-159.

[10] Haans, A., Nood, C., and Ijsselsteijn, W.A. 2007. Investigating response similarities between real and mediated social touch: a first test. In Proceedings of the ACM Conference on Human Factors in Computing Systems (San Jose, USA, Apr. 28 - May 3, 2007). CHI '07. ACM Press, New York, 2405-2410.

[11] Harris, C.R., and Christenfeld, N. 1999. Can a machine tickle? Psychonomic Bulletin & Review 6, 3, 504-510.

[12] Hug Shirt. CuteCircuit Company. http://www.cutecircuit.com

[13] James, W. 1884. What is an Emotion? Mind, 9, 188-205.

[14] Kandel, E. R., Schwartz, J. H., and Jessel, T. M. 2000. Principles of Neural Science. New York: McGraw-Hill.

[15] Keinonen, T. and Hemanus, J. 2005. Mobile emotional notification application. Nokia Co. US Patent No. 6 959 207.

[16] Lecuyer, A., Coquillart, S., Kheddar, A., Richard, P., and Coiffet, P. 2000. Pseudo-haptic feedback: Can isometric input devices simulate force feedback? In Proceedings of IEEE Virtual Reality (New Brunswick, USA, March 18 - 22, 2000). VR '00. IEEE Press, New York, 403-408.

[17] LeDoux, J.E. 1996. The Emotional Brain. New York: Simon & Schuster.

[18] Mathew, D. 2005. vSmileys: imagine emotions through vibration patterns. In Proceedings Alternative Access: Feeling and Games.

[19] Mueller, F.F., Vetere, F., Gibbs, M.R., Kjeldskov, J., Pedell, S., and Howard, S. 2005. Hug over a distance. In Proceedings of the ACM Conference on Human Factors in Computing Systems 2005 (Portland, USA, April 01 - 07, 2005). CHI '05. ACM Press, New York, 1673-1676.

[20] Neviarouskaya, A., Prendinger, H., and Ishizuka, M. 2009. Compositionality principle in recognition of fine-grained emotions from text. In Proceedings of the International AAAI Conference on Weblogs and Social Media (San Jose, USA, May 17 - 20, 2009). ICWSM 2009. AAAI Press, Menlo Park, 278-281.

[21] Noe, A. 2004. Action in Perception. Cambridge: MIT Press.

[22] Norman, D.A. 2004. Emotional design. Why we love (or hate) everyday things. New York: Basic Book.

[23] Optic Fiber Tactile Sensor Kinotex. Nitta Corporation. http://www.nitta.co.jp/english/product/mechasen/sensor/kinotex_top.html

[24] Rovers, A.F., and Van Essen H.A. 2004. HIM: a framework for haptic Instant Messaging. In Proceedings of the ACM Conference on Human Factors in Computing Systems 2004 (Vienna, Austria, April 24 - 29, 2004). CHI '04, ACM Press, New York, 1313-1316.

[25] Sato, Y., Sato, K., Sato, M., Fukushima, S., Okano, Y., Matsuo, K., Ooshima, S., Kojima, Y., Matsue, R., Nakata, S., Hashimoto, Y., Kajimoto, H. 2008. Ants in the pants – ticklish tactile display using rotating brushes. In Proceedings of the International Conference on Instrumentation, Control and Information Technology (Tokyo, Japan, August 20 - 22, 2008). SICE '08, 461-466.

[26] Shin, H., Lee, J., Park, J., Kim, Y., Oh, H., and Lee, T. 2007. A tactile emotional interface for Instant Messenger chat. In Proceedings of the International Conference on Human-Computer Interaction (Beijing, China, July 22 - 27, 2007). HCII '07. Springer Press, Heidelberg, 166-175.

[27] Teh, J.K.S., Cheok, A.D., Peiris, R.L., Choi, Y., Thuong, V., and Lai, S. 2008. Huggy Pajama: a mobile parent and child hugging communication system. In Proceedings of the International Conference on Interaction Design and Children (Chicago, USA, June 11 - 13, 2008). IDC 2008. ACM Press, New York, 250-257.

[28] Wallbott, H.G., and Scherer, K.R. 1988. How universal and specific is emotional experience? Evidence from 27 countries on five continents. In Scherer, K.R. (Ed.), Facets of Emotion: Recent Research. Hillsdale, NJ: Lawrence Erlbaum, 31-56.

[29] Zajonc, R.B., Murphy, S.T., and Inglehart, M. 1989. Feeling and facial efference: implications of the vascular theory of emotion. Psychological Review, 96, 395-416.


KIBITZER: A Wearable System for Eye-Gaze-based Mobile Urban Exploration

Matthias Baldauf FTW

Donau-City-Strasse 1 1220 Vienna, Austria +43-1-5052830-47

[email protected]

Peter Fröhlich FTW

Donau-City-Strasse 1 1220 Vienna, Austria +43-1-5052830-85

[email protected]

Siegfried Hutter FTW

Donau-City-Strasse 1 1220 Vienna, Austria +43-1-5052830-10

[email protected]

ABSTRACT Due to the vast amount of available georeferenced information, novel techniques for interacting with such content more intuitively and efficiently are increasingly required. In this paper, we introduce KIBITZER, a lightweight wearable system that enables the browsing of urban surroundings for annotated digital information. KIBITZER exploits its user's eye-gaze as a natural indicator of attention to identify objects-of-interest and offers speech and non-speech auditory feedback. Thus, it provides the user with a 6th sense for digital georeferenced information. We present a description of our system's architecture and interaction technique and outline experiences from first functional trials.

Categories and Subject Descriptors H.5.2 [Information Interfaces and Presentation]: User Interfaces – Interaction styles

General Terms Experimentation, Human Factors

Keywords Wearable Computing, Mobile Spatial Interaction, Eye-gaze

1. INTRODUCTION Computers are becoming a pervasive part of our everyday life, and they increasingly provide us with information about the ambient environment. Smartphones guide us through unfamiliar areas, revealing information about the surroundings, and helping us share media with others about certain places. While such location-based information has traditionally been accessed with a very limited set of input devices, usually just a keyboard and audio, multimodal interaction paradigms are now emerging that take better advantage of the user’s interactions with space.

The research field of Mobile Spatial Interaction [7] breaks with the conventional paradigm of displaying nearby points-of-interest

(POIs) as icons on 2D maps. Instead, MSI research aims to develop new forms of sensing the user's bodily position in space and to envision new interactions with the surrounding world through gesture and movement.

Key interaction metaphors for recently implemented MSI systems are the ‘magic wand’ (literally pointing the handheld at objects of interest to access further information [16]), the ‘smart lens’ (superimposing digital content directly on top of the recognized real-world object [14]), or the ‘virtual peephole’ (virtual views aligned to the current physical background, e.g. displaying a “window to the past” [15]). These interaction techniques have been successfully evaluated in empirical field studies, and first applications on mass market handhelds have attracted much end-user demand on the market [20].

Metaphors like the magic wand, smart lens, and virtual peephole incorporate the handheld as the gateway for selecting physical objects in the user's surroundings and for viewing related digital content. However, it may not always be preferable to devote full attention to the mobile device, for example when walking along a crowded pedestrian walkway or when both hands are busy.

We present KIBITZER, a gaze-directed MSI system that enables the hands-free exploration of the environment. The system enables users to select nearby spatial objects ‘with the blink of an eye’ and presents related information by using speech- and non-speech auditory output.

In the following section, we provide an overview of relevant previous work in the area of eye-based interaction. We then describe KIBITZER’s system architecture and the realized user interaction technique, and we demonstrate its operation with photos and screenshots. We conclude with experiences from first functional trials and plans for further research.

2. EYE-BASED APPLICATIONS Applications making use of eye movement or eye gaze patterns can be broadly categorized as diagnostic and interactive applications [4].

In diagnostic use cases, an observer's eye movements are captured to assess her attentional processes over a given stimulus. User tests incorporating eye tracking techniques are a popular method in HCI, e.g., to evaluate the usability of user interfaces and improve product design. In the 1940s, pioneering work in this field was done by Fitts et al. [6], who were the first to use cameras to capture and analyze observers' eye movements. They

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Augmented Human Conference, April 2–3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.


collected data from pilots during landing to propose a more efficient arrangement and design of the cockpit instruments.

The investigation of interactive eye tracking applications started in the 1980s. In contrast to diagnostic systems, where the recorded data are mostly evaluated after the test, eye movements in interactive scenarios are exploited in real time as input parameters for a computer interface. Bolt [2] first introduced the idea of eye movements acting as a complementary input possibility for computers. The first studies investigating the use of eye movements for common desktop computer tasks were presented by Ware [19] and Jacob [9]. Ware identified eye input as a fast technique for object selection; Jacob found it to be effective for additional tasks such as moving windows. Recently, eye gaze as a natural input has been revisited in the context of so-called attentive user interfaces [18], which generally consider intuitive user reactions such as gestures and body postures to facilitate human-computer interaction. Commercial interactive eye tracking applications are currently focused on military use cases and on tools for people with severe physical disabilities [11]. A comprehensive survey of eye tracking applications can be found in [4].

Mobile applications in the field of Augmented or Virtual Reality that allow the visual exploration of real or virtual worlds are mainly restricted to the awareness of head movement. For example, Feiner et al. [5] and, more recently, Kooper et al. [10] and Reitmayr et al. [13] presented wearable systems that support object selection by gaze estimated from head pose only. Bringing the object-of-interest into the center of a head-worn see-through display selects it or triggers another specified action. The exploitation of a user's detailed eye gaze through suitable trackers is a rarely considered aspect in Augmented Reality. Very recent examples include the integration of eye tracking in a stationary AR videoconferencing system [1] and a wearable AR system combining a head-mounted display and an eye tracker to virtually interact with a real-world art gallery [12].

Depending on the underlying technology, two basic types of eye tracking systems can be distinguished. Systems based on so-called electro-oculography exploit the electrostatic field around a human's eyes. Electrodes placed next to the observer's eyes measure changes in this field as the eyes move. As the eye's position can only be estimated with this technique, electro-oculography is applied for activity recognition rather than for gaze detection [3].

In contrast, video-based eye tracking approaches make use of one or several cameras recording one or both eyes of the user. By analyzing reflections in the captured eye using computer vision techniques, the eye's position and thus the eye gaze can be determined. The cameras can either be placed near the object-of-interest (usually a computer display) to remotely record the user's head in a non-intrusive way, or mounted on a headpiece worn by the user. Despite the disadvantage of intrusiveness, head-mounted eye trackers offer higher accuracy than non-intrusive systems and can also be applied in mobile use cases [11] such as our exploration scenario.

3. SYSTEM SETUP This section introduces the technical setup for the realization of our KIBITZER system. First, we present the used hardware, and

then we explain the involved software components and their functionality.

3.1 Mobile Equipment The core hardware component of our setup is an iView X HED system, a latest-generation mobile eye tracker from SensoMotoric Instruments GmbH (SMI). It includes two cameras to record both an eye's movement and the current scene from the user's perspective. For the best possible stability, the equipment is mounted on a bicycle helmet (Figure 1). Via USB the tracker is connected to a laptop computer (worn in a backpack) where the video stream is processed and the current eye-gaze is calculated.

To augment a user’s relative gaze direction with her global position as well as her head’s orientation and tilt, we use a G1 phone powered by Android (Figure 2). This smartphone contains all necessary sensors such as a built-in GPS receiver, a compass and accelerometers. With a custom-made fixation the G1 device is mounted on top of the bicycle helmet (Figure 3).

Figure 1. iView X HED, a mobile eye-gaze-tracker [17]

Figure 2. G1 phone powered by Android

Figure 3. The combined head-mounted device worn during a functional test.

3.2 Software Components Figure 4 gives an overview of the software components of our system architecture and their communication.


The aforementioned eye tracker system comes with a video analyzer application that is installed on the laptop computer. This iView X HED application offers a socket-based API interface via Ethernet to inform other applications about calculated eye data. For our scenario, we implemented a small component that connects to this interface and forwards the fetched gaze position in pixels (with regard to the scene camera’s picture) via Bluetooth.

Our mobile application installed on the attached smartphone receives the gaze position and (based on a prior calibration) converts these pixel values into corresponding horizontal and vertical deviations in degrees with respect to a straight-ahead gaze. These values are continuously written to a local log file together with the current location and the head's orientation and tilt. Adding the horizontal gaze deviation to the head's orientation and the vertical gaze deviation to the head's tilt yields the global eye gaze vector. To provide auditory feedback, the mobile application makes use of an integrated text-to-speech engine.
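A condensed sketch of this conversion and logging step is given below; the degrees-per-pixel factors and the comma-separated log format are assumptions made for illustration.

```python
import time

# Conversion factors (degrees per pixel) determined during the custom
# calibration described in Section 3.3; the numbers here are placeholders.
DEG_PER_PX_X = 0.05
DEG_PER_PX_Y = 0.05

def global_gaze_vector(gaze_px, head_orientation_deg, head_tilt_deg):
    """Combine the relative gaze position (pixels in the scene image,
    relative to the calibrated straight-ahead point) with the head pose."""
    gaze_x_px, gaze_y_px = gaze_px
    azimuth = head_orientation_deg + gaze_x_px * DEG_PER_PX_X
    elevation = head_tilt_deg + gaze_y_px * DEG_PER_PX_Y
    return azimuth % 360.0, elevation

def log_sample(logfile, location, gaze_px, orientation, tilt, query_flag=False):
    """Append one sample to the tour log (assumed comma-separated format)."""
    azimuth, elevation = global_gaze_vector(gaze_px, orientation, tilt)
    lat, lon = location
    logfile.write(f"{time.time():.3f},{lat},{lon},{azimuth:.1f},{elevation:.1f},{int(query_flag)}\n")
```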

Figure 4. Overview of our system's software components (gaze detection, Bluetooth sender, location and orientation detection, visibility detection, text-to-speech engine) and their communication.

The mobile application may invoke a remote visibility detection service via a 3G network. This service takes the user's current view into account: by passing a location and an orientation (in our case the global eye gaze vector) to this HTTP service, a list of the POIs currently visible in this direction is returned. The engine makes use of a 2.5D block model, i.e., each building in the model is represented by a two-dimensional footprint polygon, which is extruded by a height value. Based on this model, POIs with a clear line-of-sight to the user and POIs located inside visible

buildings can be determined. The resulting list contains the matching POIs’ names and locations as well as the relative angles and distances with regard to the passed user position and orientation. More details about the used visibility detection engine can be found in [16].
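A sketch of such a query is shown below. The endpoint URL, parameter names, and JSON response format are invented for the example, since the service interface itself is described in [16] and may differ.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameter names; the actual visibility detection
# service interface is described in [16] and may differ.
VISIBILITY_SERVICE_URL = "http://example.org/visibility"

def query_visible_pois(lat, lon, azimuth_deg, elevation_deg):
    """Ask the remote 2.5D visibility engine which POIs lie in the gaze direction."""
    params = urllib.parse.urlencode({
        "lat": lat,
        "lon": lon,
        "azimuth": azimuth_deg,
        "elevation": elevation_deg,
    })
    with urllib.request.urlopen(f"{VISIBILITY_SERVICE_URL}?{params}", timeout=5) as response:
        # Assume a JSON list of {name, lat, lon, relative_angle, distance} records.
        return json.loads(response.read().decode("utf-8"))
```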

3.3 Initial Calibration Before the equipment can be used it needs to be calibrated. This procedure starts with a nine-point calibration of the eye tracker using the iView X HED application. For this purpose, nine markers must be arranged on a nearby wall in a 3x3 grid, with the marker in the center placed at the user's eye height. The calibration points can then be set up in the application via the delivered scene video and mapped point by point to the corresponding gaze direction. The standard procedure is extended with a custom calibration to later map gaze positions in pixels to gaze deviations in degrees. By turning and tilting the head towards the calibration points while now keeping the eye gaze straight ahead, conversion factors for horizontal and vertical gaze positions can be determined based on the fetched compass and accelerometer data.
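The custom calibration amounts to estimating degrees-per-pixel conversion factors; the sketch below illustrates one possible computation under the assumption that head angles and gaze pixel positions are recorded for each marker.

```python
def conversion_factors(samples):
    """Estimate degrees-per-pixel conversion factors from calibration data.

    Each sample pairs the gaze position (in scene-image pixels) recorded when
    looking at a marker with the head straight, with the compass/accelerometer
    head angles recorded when facing the same marker with the gaze straight
    ahead. The first sample is the centre marker and serves as the reference."""
    ref_az, ref_tilt, ref_x, ref_y = samples[0]
    fx, fy = [], []
    for az, tilt, x, y in samples[1:]:
        if x != ref_x:
            fx.append((az - ref_az) / (x - ref_x))
        if y != ref_y:
            fy.append((tilt - ref_tilt) / (y - ref_y))
    return sum(fx) / len(fx), sum(fy) / len(fy)   # horizontal, vertical deg/px
```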

4. USER INTERACTION After the calibration process, the presented equipment is ready for outdoor usage. As previously mentioned, our goal is the gaze-sensitive exploration of an urban environment, providing the user with a 6th sense for georeferenced information.

When to trigger which action in an eye-gaze-based system is a commonly investigated and discussed issue known as the 'Midas Touch' problem. A good solution must not undermine the intuitive interaction approach of such an attentive interface by increasing the user's cognitive load or disturbing her gaze pattern. At the same time, the unintended invocation of an action must be avoided.

The task of object selection on a computer screen investigated by Jacob [9] might seem related to our scenario of mobile urban exploration, where we want to select real-world objects to learn more about annotated POIs. Jacob suggests either using a keyboard to explicitly execute the selection of a viewed item via a key press or, preferably, applying a dwell time to detect a focused gaze and fire the action thereafter. In Jacob's experiment, users were provided with visual feedback about the current selection and were therefore able to easily correct errors.

Due to our mobile scenario, we want to keep the involved equipment as lightweight as possible, sparing an additional keyboard or screen. Therefore, we rely on an explicit eye-based action to trigger a query for the currently viewed object. As though the user were memorizing the desired object, closing her eyes for two seconds triggers the selection. In technical terms, the spatial query is executed for the last known global gaze direction if the user's tracked eye could not be detected during the last two seconds. An invocation of the query engine is marked in the log file with a special status flag.

The names of the POIs returned by the visibility detection service are then extracted and fed into the text-to-speech engine for voice output. If a new query is triggered during the output, the text-to-speech engine is interrupted and restarted with the new results. The auditory output is possible either via the mobile's built-in loudspeakers or via attached earphones.
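The blink-based trigger can be sketched as a simple timeout over the eye-tracker samples. The two-second threshold follows the description above, while the sample format, the POI query function, and the text-to-speech callback are assumptions.

```python
import time

BLINK_TRIGGER_S = 2.0  # eyes closed (no tracked pupil) for two seconds

class BlinkSelector:
    """Fire a spatial query when the tracked eye has been lost for two seconds."""

    def __init__(self, query_pois, speak):
        self.query_pois = query_pois        # e.g. query_visible_pois from the earlier sketch
        self.speak = speak                  # text-to-speech callback
        self.last_seen = time.time()
        self.last_gaze = None               # last known global gaze vector
        self.triggered = False

    def update(self, eye_detected, gaze_vector, location):
        now = time.time()
        if eye_detected:
            self.last_seen = now
            self.last_gaze = gaze_vector
            self.triggered = False
        elif (not self.triggered and self.last_gaze is not None
              and now - self.last_seen >= BLINK_TRIGGER_S):
            self.triggered = True           # avoid repeated queries per blink
            azimuth, elevation = self.last_gaze
            pois = self.query_pois(location[0], location[1], azimuth, elevation)
            self.speak(", ".join(p["name"] for p in pois))
```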


5. TOUR ANALYSIS During the usage of KIBITZER, all sensor values are continuously recorded to a log file. These datasets, annotated with corresponding time stamps, enable a complete reconstruction of the user's tour for later analysis.

To efficiently visualize a log file's content, we implemented a converter tool that generates a KML file from a passed log file. KML is an XML-based format for geographic annotations and visualizations with support for animations. The resulting tour video can be played using Google Earth [8] and shows the user's orientation and gaze from an exocentric ('third person') perspective (Figure 5). The displayed human model is oriented according to the captured compass values; its gaze ray is corrected by the calculated gaze deviations. The invocation of the visibility detection service, i.e., the gaze-based selection of an object, is marked by a different-colored gaze ray.
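A minimal sketch of such a converter is given below; it emits plain KML placemarks from the assumed log format of the earlier sketch and omits the animation features of the actual tool.

```python
KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
{placemarks}
</Document>
</kml>"""

def log_to_kml(log_lines):
    """Convert 'timestamp,lat,lon,azimuth,elevation,query_flag' records into
    a bare-bones KML document (no animation, unlike the actual tour videos)."""
    placemarks = []
    for line in log_lines:
        ts, lat, lon, azimuth, elevation, flag = line.strip().split(",")
        color = "ff0000ff" if flag == "1" else "ffffffff"  # mark gaze-based selections
        placemarks.append(
            f"<Placemark><name>t={ts} az={azimuth} el={elevation}</name>"
            f"<Style><IconStyle><color>{color}</color></IconStyle></Style>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
        )
    return KML_TEMPLATE.format(placemarks="\n".join(placemarks))
```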

Figure 5. Screenshot of a KML animation reconstructed from the logged tour data.

Figure 6. Screenshot of the video taken by the helmet-mounted scene camera. The red cross represents the current eye gaze.

As the scene camera’s video stream can be recorded via the iViewX HED application, the reconstructed tour animation can be compared to the actually captured scene video (Figure 6). The

scene video is overlaid with a red cross representing the user's current gaze and can thus be used to evaluate our system's accuracy. Furthermore, when combined with the visibility detection engine, the tour reconstruction can be used to automatically identify areas of interest or to compile further statistics.

6. CONCLUSIONS AND OUTLOOK In this paper, we introduced KIBITZER, a wearable gaze-sensitive system for the exploration of urban surroundings, and presented related work in the field of eye-based applications. Wearing our proposed headpiece, the user's eye-gaze is analyzed to implicitly scan her visible surroundings for georeferenced digital information. Offering speech-auditory feedback via loudspeakers or earphones, the user is unobtrusively informed about POIs in her current gaze direction. Additionally, we offer tools to reconstruct a user's recorded tour, visualizing her eye-gaze. These animations are not only useful for accuracy tests during development but also aim at later automated tour analysis, e.g., to identify areas of interest.

Experiences from first functional tests and reconstructed tour videos showed that the proposed system’s overall accuracy is sufficient for determining POIs in the user’s gaze. However, in some trials the built-in compass was heavily influenced by magnetic fields, resulting in wrong POI selections. This problem could be solved by complementing the system with a more robust external compass.

During these tests we observed some minor limitations of the chosen vision-based gaze tracking approach and the blinking interaction. In rare cases, unfavorable reflections caused by direct sunlight prevented a correct detection of the user’s pupil and therefore interfered with the gaze tracking. Obviously, at night the use of such a vision-based system is not feasible without an artificial light source.

Our proposed research prototype is a first step towards the exploitation of a user’s eye-gaze in mobile urban exploration scenarios and is therefore deliberately designed for experimentation. The current system, built from off-the-shelf hardware components, provides a complete framework to study possible gaze-based interaction techniques. With the future arrival of smart glasses or even intelligent contact lenses, the required equipment is expected to become more comfortable to wear, if not almost unnoticeable.

Applying the presented system, we will evaluate the usability and effectiveness of eye-gaze-based mobile urban exploration in upcoming user tests. We will place special focus on the acceptance of the currently implemented ‘blinking’ action and on the investigation of alternative interaction techniques. Inspired by ‘mouse-over’ events known from Web sites, such as switching an image when the mouse cursor moves over a sensitive area, implicit gaze feedback is conceivable: when a user glances at an object, she might be notified about the availability of annotated digital information by a beep or tactile feedback. The combination of our gaze-based system with a brain-computer interface to estimate a gaze’s intention and thus trigger a corresponding action is another promising direction for future research.


7. ACKNOWLEDGMENTS
This work has been carried out within the projects WikiVienna and U0, which are financed in part by Vienna’s WWTF funding program, by the Austrian Government and by the City of Vienna within the competence center program COMET.

8. REFERENCES
[1] Barakonyi, I., Prendinger, H., Schmalstieg, D., and Ishizuka, M. 2007. Cascading Hand and Eye Movement for Augmented Reality Videoconferencing. In Proc. of 3D User Interfaces, 71-78.

[2] Bolt, R.A. 1982. Eyes at the Interface. In Proc. of Human Factors in Computer Systems Conference, 360-362.

[3] Bulling, A., Ward, J.A., Gellersen, H., and Tröster, G. 2009. Eye Movement Analysis for Activity Recognition. In Proc. of the 11th International Conference on Ubiquitous Computing, 41-50.

[4] Duchowski, A.T. 2002. A breadth-first survey of eye tracking applications. In Behavior Research Methods, Instruments, & Computers (BRMIC), 34(4), 455-470.

[5] Feiner, S., MacIntyre, B., Höllerer T., and Webster, A. 1997. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. In Personal and Ubiquitous Computing, Vol. 1, No. 4, 208-217.

[6] Fitts, P. M., Jones, R. E., and Milton, J. L. 1950. Eye movements of aircraft pilots during instrument-landing approaches. In Aeronautical Engineering Review 9(2), 24–29.

[7] Fröhlich, P., Simon, R., and Baillie, L. 2009. Mobile Spatial Interaction. Personal and Ubiquitous Computing, Vol. 13, No. 4, 251-253.

[8] Google Earth. http://earth.google.com. Accessed January 7 2010.

[9] Jacob, R.J.K. 1990. What you look at is what you get: eye movement-based interaction techniques. In Proc. of the SIGCHI conference on Human factors in computing systems, 11-18.

[10] Kooper, R., and MacIntyre, B. 2003. Browsing the Real-World Wide Web: Maintaining Awareness of Virtual Information in an AR Information Space. In International Journal of Human-Computer Interaction, Vol. 16, No. 3, 425-446.

[11] Morimoto, C.H., and Mimica, M.R.M. 2005. Eye gaze tracking techniques for interactive applications. In Computer Vision and Image Understanding, Vol. 98, No. 1, 4-24.

[12] Park, H.M., Lee, S.H., and Choi, J.S. 2008. Wearable Augmented Reality System using Gaze Interaction. In Proc. of the 7th IEEE/ACM international Symposium on Mixed and Augmented Reality, 175-176.

[13] Reitmayr, G., and Schmalstieg, D. 2004. Collaborative Augmented Reality for Outdoor Navigation and Information Browsing. In Proc. of Symposium on Location Based Services and TeleCartography, 31-41.

[14] Schmalstieg, D., and Wagner, D. 2007. The World as a User Interface: Augmented Reality for Ubiquitous Computing. In Proc. of Symposium on Location Based Services and TeleCartography, 369-391.

[15] Simon, R. 2006. The Creative Histories Mobile Explorer - Implementing a 3D Multimedia Tourist Guide for Mass-Market Mobile Phones. In Proc. of EVA.

[16] Simon, R., and Fröhlich, P. 2007. A Mobile Application Framework for the Geo-spatial Web. In Proc. of the 16th International World Wide Web Conference, 381-390.

[17] SMI iView X™ HED. http://www.smivision.com/en/eye-gaze-tracking-systems/products/iview-x-hed.html. Accessed January 07 2010.

[18] Vertegaal, R. 2002. Designing Attentive Interfaces. In Proc. of the 2002 Symposium on Eye Tracking Research & Applications, 23-30.

[19] Ware, C., and Mikaelian, H.T. 1987. An evaluation of an eye tracker as a device for computer input. In Proc. of the ACM CHI + GI-87 Human Factors in Computing Systems Conference, 183-188.

[20] Wikitude. http://www.mobilizy.com/wikitude.php. Accessed January 07 2010.


Airwriting Recognition using Wearable Motion Sensors

Christoph Amma, Dirk Gehrig, Tanja Schultz
Cognitive Systems Lab (CSL), Karlsruhe Institute of Technology, Germany

(christoph.amma,dirk.gehrig,tanja.schultz)@kit.edu

ABSTRACT
In this work we present a wearable input device which enables the user to input text into a computer. The text is written into the air via character gestures, as if using an imaginary blackboard. To allow hands-free operation, we designed and implemented a data glove equipped with three gyroscopes and three accelerometers to measure hand motion. Data is sent wirelessly to the computer via Bluetooth. We use HMMs for character recognition and concatenated character models for word recognition. As features we apply normalized raw sensor signals. Experiments on single character and word recognition are performed to evaluate the end-to-end system. On a character database with 10 writers, we achieve an average writer-dependent character recognition rate of 94.8% and a writer-independent character recognition rate of 81.9%. Based on a small vocabulary of 652 words, we achieve a single-writer word recognition rate of 97.5%, a performance we deem sufficient for many applications. The final system is integrated into an online word recognition demonstration system to showcase its applicability.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: Input Devices and Strategies; I.5.1 [Computing Methodologies]: Models—Statistical

General Terms
Design, Algorithms, Human Factors

1. INTRODUCTION
The field of Human-Computer Interaction has a long tradition of exploring and implementing numerous devices and interfaces. However, the requirements and conditions for such interfaces have shifted drastically over the past years due to the changing demands of our global and mobile information-based society. Today’s electronic devices and digital assistants fit into the smallest pocket, are ubiquitous, always


online and accessible. Although everybody savours the small size of hand-helds, the operation of these devices becomes more challenging: only small keys fit onto the device, if at all. This makes complex operations like the input of text very cumbersome: it requires good eye-sight and hand-eye coordination, it keeps hands and eyes busy, and it is difficult if the device or the person is moving during operation. Further considering future augmented reality applications with displays integrated in glasses, there is an obvious need for new input devices offering more natural interaction possibilities. Future wearable computer systems will lose many of their advantages if the interface still relies on an extra text input device held in hand.

Fortunately, the technology of sensors is advancing significantly, allowing for very small, body-worn, wearable sensors that foster the design of unobtrusive, mobile, and robust interfaces suitable for intuitive wearable computing systems. Traditionally used sensors, which are small and come at very reasonable costs, are accelerometers, gyroscopes and magnetometers. Based on the sensor signals, pattern classification techniques can be applied to recognize the performed gestures. Current research on inertial sensor-based gesture recognition concentrates on the recognition of a small set of very simple predefined single gestures. These gestures can then be mapped to certain functions to build gesture-based interfaces. Kim et al. [7] use a pen-like accelerometer-based device to recognize single Roman and Hangul characters written in the air.

While predefined gestures are easier to recognize for the machine, they place more burden on the users, since they have to learn and memorize these gestures. Therefore, we aim at a user-adaptable approach that does not make any assumptions on the used gestures. Furthermore, our approach reaches beyond [7] by introducing a hands-free device to write Roman characters and even whole words in the air. We developed a data glove instead of a pen, which is less obtrusive than an additional device. We implemented an HMM-based recognizer for whole words by concatenating character models, allowing complex text input. For the purpose of this study we limited the input character set to capital Roman letters. However, since we apply statistical models, our system can deal with any kind of one-handed gestures.

2. RELATED WORK
The field of gesture recognition has been extensively studied in the past. Two main approaches can be identified, external


Figure 1: Prototypical CSL data glove.

systems and internal systems, with the latter using body-mounted sensors, including hand-held devices. The former type is traditionally based on non-wearable video-based systems and thus is not applicable to our envisioned scenario.

2.1 Sensors and Devices
Different sensing techniques are applied to gesture recognition based on body-worn sensors. Brashear et al. [3] use a hybrid system, consisting of a head-mounted camera and accelerometers worn at the wrist, for sign language recognition. While such a hybrid system seems promising for sign language recognition, the number and positioning of sensors, particularly those positioned at the head, limit the users’ acceptance. Kim et al. [7] use a wireless pen device, equipped with accelerometers and gyroscopes, for airwriting character recognition. Their good recognition results indicate that airwriting recognition based on accelerometers and gyroscopes is feasible.

Sensors are often integrated into a device, which has to be carried around and held in hand for operation. Therefore, external devices are more obtrusive. Instead, data gloves have been proposed for gesture recognition, for example by Hofmann et al. [6] and Kim and Chien [8]. Usually, these gloves contain a magnetic sensor to determine the global position through an externally induced magnetic field, prohibiting mobile or outdoor applications. These gloves also incorporate many sensors delivering information on finger movement and position, which makes them quite bulky. Alternatively, an accelerometer-equipped bracelet for gesture recognition was proposed by Hein et al. [5] and Amft et al. [1], who used accelerometers included in a watch to control certain functions of the watch by gestures. In our work, we combine the advantages of data gloves with the convenience of small hand-held devices by designing and implementing a very slim and unobtrusive interface based on a data glove, as depicted in Figure 1. One should be aware that this is a first prototype and that current technology already allows further miniaturization.

2.2 Single Character Classification
Cho et al. [4] and Kim et al. [7] use a pen-type device for single digit and single character recognition using Bayesian Networks. Kim et al. [7] introduce a ligature model based on Bayesian Networks to recognize pen-up and pen-down

strokes and therefore are able to use traditional handwriting training data for their recognizer. Both works rely on the reconstruction of the 3D trajectory. Since errors accumulate quickly over time in this approach, it will only give reasonable results for short periods of time and is thus difficult to apply to the continuous recognition of words. There has also been work on single gesture recognition using a range of other classification methods, such as Naive Bayes, Multilayer Perceptrons, Nearest Neighbour classifiers (Rehm et al. [14]), and SVMs (Wu et al. [15]). Oh et al. [11] use Fisher Discriminant Analysis to recognize single digits. The mentioned methods might perform well on single gesture recognition but are not applicable to recognizing a continuous sequence of gestures.

2.3 Continuous Gesture Recognition
Hidden Markov Models have been applied to gesture and handwriting recognition, since they can be easily concatenated to model continuous sequences. A good tutorial introduction in the context of speech recognition was written by Rabiner [13]. Liang et al. [9] make use of HMMs for continuous sign language recognition, but they use desk-based Polhemus magnetic sensors. Kim and Chien [8] also used such a sensor to recognize complex gestures based on concatenated HMMs of single strokes. Starner and McGuire [10] use HMMs for continuous mobile American Sign Language recognition with a 40-word vocabulary, modelling each word by one HMM. Plamondon and Srihari [12] give a survey on traditional on-line and off-line handwriting recognition, where HMMs are applied with success for the on-line case.

It seems that few attempts have been made towards continuous gesture recognition solely with inertial sensors. Furthermore, the set of gestures is normally limited to a small number of 10 to 20 gestures, which are often chosen and defined for ease of discrimination. In our work, we introduce a system capable of modeling sequences of primitive gestures and evaluate it with character gestures.

3. AIRWRITING CHALLENGES
In comparison to conventional online handwriting recognition, typically used on tablet PCs, we have to deal with some peculiarities when using inertial sensors for writing in the air. In traditional online handwriting recognition, the 2D trajectory of the written strokes is available and classification is done based on this data. In contrast, we get the linear acceleration and the angular velocity of the sensor. Figure 2 shows the raw sensor signals received by writing the character A. The two main differences of our approach from traditional handwriting recognition are the absence of the trajectory information and of the pen-up and pen-down movements. The missing pen-up and pen-down movements result in a single continuous stroke for the whole writing, i.e. we get a continuous data stream from the sensors which lacks any segmentation information. If a writer produces several sentences, the result would be one single stroke. This makes the task more difficult, since pen-up and pen-down movements automatically give a segmentation of the data. While this is not a segmentation into characters, it gives useful information that we are missing. Also the motion between consecutive characters, which would normally be pen-up, will be represented in the signals we get from the inertial sensors. We face this problem by introducing a repositioning model between the


individual characters of words. The experimental results on word recognition show that this modification of the HMM is suitable.

We also do not get the 3D trajectory easily. While it is theoretically possible to reconstruct the trajectory from a 6 DOF sensor like the one we use by applying a strapdown inertial navigation algorithm, it is practically a hard task because of sensor drift and noise. A standard strapdown algorithm integrates the angular rate once to obtain the attitude of the sensor, then the gravitational acceleration can be subtracted from the acceleration signals, and finally double integration of the acceleration yields the position. This triple integration leads to a summation of errors caused by sensor drift and noise, and after a few seconds the error in position is so high that no reconstruction close to the real trajectory is possible. Bang et al. [2] propose an algorithm to reconstruct the trajectory of text written in the air, which works well for isolated characters or short words (in the sense of writing time). Since our goal is a system that allows for arbitrarily long input, we avoid this problem by directly working on the sensor signals without estimating the 3D trajectory. The experimental results show that this is feasible in practice.
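The following sketch (not part of the original system) illustrates the error growth numerically: a small constant accelerometer bias, doubly integrated at the glove’s sampling rate, already produces a position error far larger than the 10–20 cm characters after a few seconds. The bias value is an assumption chosen for illustration.

    import numpy as np

    fs = 819.2                          # sampling rate in Hz, as reported for the glove
    bias = 0.005 * 9.81                 # hypothetical constant accelerometer bias (5 mg) in m/s^2
    t = np.arange(0, 5, 1.0 / fs)       # five seconds of samples

    velocity_error = np.cumsum(np.full_like(t, bias)) / fs   # first integration
    position_error = np.cumsum(velocity_error) / fs          # second integration

    # Grows roughly as 0.5 * bias * t^2, i.e. about 0.6 m after 5 s.
    print(f"position error after {t[-1]:.1f} s: {position_error[-1]:.2f} m")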

Gravitation is another problem we have to deal with, even without estimating the trajectory. Earth acceleration is always present in the measured acceleration signals and can only be subtracted when the exact attitude of the sensor is known. Due to the mentioned difficulties concerning sensor drift, gravitational acceleration is compensated by subtracting the signal mean. This approximation amounts to the assumption that the sensor attitude is constant over the time of one recording. The variance of the signals is also normalized to reduce the effect of different writing speeds.
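A minimal sketch of this per-recording normalization, assuming the six sensor channels of one recording are available as an (N, 6) NumPy array; the function name is an assumption.

    import numpy as np

    def normalize_recording(samples):
        """Mean/variance normalization of one recording, channel by channel.

        Subtracting the per-channel mean removes the (assumed constant) gravity
        component from the acceleration channels; dividing by the standard
        deviation reduces the effect of different writing speeds.
        """
        mean = samples.mean(axis=0)
        std = samples.std(axis=0)
        std[std == 0] = 1.0              # guard against constant channels
        return (samples - mean) / std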

4. DATA GLOVE
Our system consists of a thin glove and a wristlet. The glove holds the sensor and the wristlet holds the controller board and the power supply. Figure 1 shows a picture of the system. A microcontroller reads the data from the sensor and sends it over a Bluetooth link to a computer, where the recognition is performed. The power supply consists of two rechargeable micro cells allowing operation for more than 4 hours. As sensor, we use an Analog Devices ADIS16364

Inertial Measurement Unit (www.analog.com). This sensor contains three orthogonal accelerometers for measuring translational acceleration and three orthogonal gyroscopes for measuring angular velocity. The sensor is cubic with an edge length of 23 mm; this is due to the 3-dimensional orientation of the gyroscopes, whereas 3-axis accelerometers are available as standard flat integrated circuits. The measurement range of the accelerometers is -5 g to 5 g with a resolution of 1 mg. The gyroscope measurement range is ±300 deg/s with a resolution of 0.05 deg/s. The sampling rate is 819.2 Hz. The sensor data is read out by a Texas Instruments MSP430F2132 microcontroller with 8 KB Flash memory and 512 Byte RAM, operating at up to 16 MHz. The microcontroller sends the data over an Amber Wireless AMB2300 class 2 Bluetooth module to a laptop.
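For illustration, a hypothetical reader for such a Bluetooth serial stream is sketched below. The frame layout (six little-endian 16-bit values), port name and baud rate are assumptions; the actual firmware protocol is not described in the paper.

    import struct
    import serial  # pyserial

    FRAME_FMT = "<6h"                      # assumed layout: ax, ay, az, gx, gy, gz as int16
    FRAME_SIZE = struct.calcsize(FRAME_FMT)

    def read_frames(port="/dev/rfcomm0", baudrate=115200):
        """Yield (acceleration, angular_rate) tuples from the glove's serial link."""
        with serial.Serial(port, baudrate, timeout=1) as link:
            while True:
                raw = link.read(FRAME_SIZE)
                if len(raw) < FRAME_SIZE:
                    continue               # timeout or partial frame; try again
                ax, ay, az, gx, gy, gz = struct.unpack(FRAME_FMT, raw)
                yield (ax, ay, az), (gx, gy, gz)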


Figure 2: Raw sensor signals for the character A. The upper plot shows accelerometer data, the lower plot shows gyroscope data.

5. DATA ACQUISITION
We have collected character data from 10 subjects, five male and five female. One of the subjects was left-handed, the other nine were right-handers. Every subject wrote the alphabet 25 times, resulting in 650 characters per writer and 6500 characters in the dataset in total. In the first recordings, the alphabet was recorded in alphabetical order; in the later recordings, every character was recorded in the context of every other. For every subject, data was recorded in one recording session. Word data was collected from one test person, who contributed 652 English words to the data set. The words were chosen at random from the list of the “1000 most frequent words in English“ (http://www.wortschatz.uni-leipzig.de/html/wliste/html) from the corpus of the University of Leipzig. Every word was recorded once. Table 1 gives a summary of the recorded data.

The subjects were sitting on a chair and were asked to write in front of them as if writing on an imaginary blackboard. The writing hand was approximately at eye level. Writers were asked to write characters between 10 and 20 cm in height and to keep the wrist fixed while writing. Furthermore, the subjects were told to write in place, which means the horizontal position should be approximately the same for every character written.

All writing in our experiments was done in capital block letters. Since every writer has their own writing style even for block letters, this led to several writing variants for some letters. These variants are referred to as allographs in handwriting recognition. For example, the letter E has five main allographs even in our small data set. The subjects were asked to be as consistent as possible in the way they write



Figure 3: System overview: Motion data is gathered by the sensors on the glove. The raw data is sent to a computer and preprocessed, resulting in a feature vector. The feature vectors are classified using an HMM decoder in combination with a language model. The recognized character or word is the output of the system.

Writer   Char.   Samples    Time
Character recordings
A        650     555359     11m 18s
B        650     964185     19m 37s
C        650     733603     14m 56s
D        650     924252     18m 48s
E        650     717878     14m 36s
F        650     682580     13m 53s
G        650     594320     12m 05s
H        650     1122828    22m 51s
I        650     604499     12m 18s
J        650     779136     15m 51s
Word recordings
A        3724    2332410    47m 27s

Table 1: Overview of the collected data recordings. The samples column gives the total amount of sensor data samples and the time column the corresponding recording time.

the letters, i.e. stick to one writing variant. When making mistakes, the subjects were able to correct themselves; they were not observed all the time. For that purpose, the recording software allows repeating characters which were not written or segmented correctly.

The segmentation of the recordings was done manually by the subject while recording. The subject had to press a key before starting and after finishing a character or word. This key press was performed with the hand not used for writing. All subjects were told to have their writing hand in start position for the first stroke of the individual character or word and to hold it still when pressing the start key. After pressing the start key, the character or word was written in the air. When the end point of the writing motion was reached, the subjects were to hold their hand still and press the stop key. This kind of segmentation is not as accurate as using a video stream recorded in parallel but avoids manual postprocessing. The motion between two characters was also recorded for initialization and training of the repositioning HMMs used for word modeling. Table 1 shows large differences in recording time, which is probably not only caused by differences in writing speed, but also by differences in segmentation quality.

6. RECOGNITION SYSTEM OVERVIEW
Our system consists of a glove equipped with sensors to measure linear acceleration and angular velocity of hand motion. Sensor data is sent to a computer via Bluetooth. A user can perform 3-dimensional gestures, which are recognized by an HMM classifier based on the delivered signals. Basically any kind of gesture sequence can be modeled. We use character gestures and whole words written in the air to show the potential of our system. Figure 3 shows the basic functional blocks in a diagram.

6.1 Modeling
The HMM modeling was done using the Janus Recognition Toolkit, developed at the Karlsruhe Institute of Technology (KIT) and Carnegie Mellon University in Pittsburgh. As features, we used the accelerometer and gyroscope data, which was normalized by mean and variance, resulting in a six-dimensional feature vector per sample. The sampling rate was always set to 819.2 Hz and no filtering was applied to the signals, since the signals are already very smooth. We experimented with a moving average filter, which showed no impact. A 0-gram language model was used, which allows the recognition of single instances of characters or words out of a defined vocabulary.

6.2 Character Recognition System
For all characters, HMMs with the same topology were used. We always used left-right models with self transitions. Hofmann et al. [6] use HMMs for accelerometer-based gesture recognition with a data glove. They evaluate the performance of ergodic and left-right models and find no significant difference for the task of gesture recognition. The output probability function is modeled by Gaussian Mixtures. The number of states and the number of Gaussians per state was varied on a per-experiment basis. That means for one experiment, every character model has the same number of states and Gaussians. Before training, the Gaussians are initialized by k-means clustering of the training data. All training samples are linearly split into as many sample sequences of equal length as the HMM model has states. By that, all samples are initially assigned to exactly one HMM state. All samples corresponding to the same state are then collected and clustered into as many clusters as the number of Gaussians in the Gaussian Mixture. Mean and variance of the clusters are then used as initial values for the GMM. Afterwards the models are trained to maximize their likelihood on the training data using Viterbi training.
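A minimal sketch of this flat-start initialization, assuming the training samples of one character are available as (T, 6) NumPy arrays and using scikit-learn’s k-means; the actual initialization and Viterbi training are performed inside the Janus Recognition Toolkit.

    import numpy as np
    from sklearn.cluster import KMeans

    def flat_start_init(training_seqs, n_states, n_gaussians):
        """Linear split plus k-means: initial GMM parameters per HMM state."""
        init = []
        for state in range(n_states):
            frames = []
            for seq in training_seqs:
                bounds = np.linspace(0, len(seq), n_states + 1, dtype=int)
                frames.append(seq[bounds[state]:bounds[state + 1]])
            frames = np.vstack(frames)                      # all frames assigned to this state
            km = KMeans(n_clusters=n_gaussians, n_init=10).fit(frames)
            variances = np.array([frames[km.labels_ == k].var(axis=0)
                                  for k in range(n_gaussians)])
            init.append((km.cluster_centers_, variances))   # initial means and variances
        return init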


Figure 4: Context model for the word “TO”. It consists of the independent models for the graphemes “T” and “O” with 7 states and the 2-state model for the repositioning in between.

Different writing variants of one and the same character are modeled by one HMM. There is always one model for one character.

6.3 Word Recognition
Word recognition was performed by concatenating character HMMs to form word models. For a given vocabulary, word models were built from the existing character models. This enables the use of an arbitrary vocabulary. Normally, one has to move the hand from the endpoint of a character to the start point of the next character. To model these repositioning movements, a repositioning HMM is inserted between the character models. We use one left-right HMM to model all repositioning movements. Figure 4 shows an example word model. The initialization and training of these models is the same as for character models. Training data for the repositioning models was collected along with the data for the character models.
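The construction can be sketched as a simple concatenation; here character and repositioning models are represented just as lists of states, which is an abstraction of the actual toolkit data structures.

    def build_word_model(word, char_models, repos_model):
        """Concatenate character HMMs with a shared repositioning HMM in between,
        e.g. "TO" -> [T] [repos] [O] (compare Figure 4)."""
        states = []
        for i, ch in enumerate(word):
            if i > 0:
                states.extend(repos_model)    # hand movement between characters
            states.extend(char_models[ch])    # left-right character model
        return states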

6.4 Evaluation Method
On the collected data, two main kinds of experiments were performed: writer-dependent recognition and writer-independent recognition. For the experiments on writer-dependent recognition, data from one writer was taken and divided into a training, development and test set. On the development set, the parameters, namely the number of HMM states and the number of Gaussians, were optimized. All writer-dependent results given are from the final evaluation on the test set. Writer-independent recognition was evaluated using a leave-one-out cross validation. We took the data from one writer as test set and the data from the other writers as training data. This is done once for every writer. Again, different parameters for the number of states and Gaussians were used, and the parameter set that gives the best average recognition rate over all writers was chosen. No independent test set was used here, due to the small data collection. The performance of all systems is given by the recognition rate, which is defined as the percentage of correctly classified references out of all references. In the case of character recognition, a reference is one character; in the case of word recognition, a reference is one word.
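The evaluation scheme can be summarized by the following sketch; the training and classification functions are placeholders for the HMM toolkit, and only the leave-one-writer-out logic and the recognition-rate definition follow the description above.

    def recognition_rate(hypotheses, references):
        """Percentage of correctly classified references."""
        correct = sum(h == r for h, r in zip(hypotheses, references))
        return 100.0 * correct / len(references)

    def leave_one_writer_out(data_by_writer, train_fn, classify_fn):
        """Train on all other writers, test on the held-out writer, average the rates."""
        rates = {}
        for held_out, test_set in data_by_writer.items():
            train_set = [sample for writer, samples in data_by_writer.items()
                         if writer != held_out for sample in samples]
            models = train_fn(train_set)
            hypotheses = [classify_fn(models, features) for features, _ in test_set]
            references = [label for _, label in test_set]
            rates[held_out] = recognition_rate(hypotheses, references)
        return rates, sum(rates.values()) / len(rates)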

7. EXPERIMENTS AND RESULTS
7.1 Character Recognition

Writer-Dependent. We evaluated writer-dependent recognition performance by testing the recognizer on each writer’s data individually. For each writer, the data was divided randomly into a training (40%), development (30%) and test (30%) set. The character frequencies were balanced over the sets. A number of 6, 8, 10 and 12 HMM states and 1 to 5

Figure 5: Results of the writer-dependent character recognition for 3 different HMM topologies and numbers of Gaussians. The worst and best performing systems are shown by black and white bars. To avoid the risk of overspecification, the system represented by the grey bar was used for evaluation.

Gaussians per mixture was investigated. In Figure 5 the results of three exemplary systems with different numbers of states and Gaussians are shown. The worst performing system, with 6 states and 1 Gaussian per state, reached an average recognition rate of 93.7%, and the best performing system, with 10 states and 5 Gaussians per state, reached 95.3%. Since the data set is small, overspecification of models is an issue. To avoid the possibility of overspecification, we took a well performing system with few Gaussians in total. Our final evaluation of the writer-dependent system was done with a 10-state model with 2 Gaussians per state. The performance of this system is represented by the grey bar in Figure 5. Table 2 shows the breakdown of results for all writers in the column named WD. An average recognition rate of 94.8% was reached for the writer-dependent case.

Writer-Independent. As described in Section 6.4, a leave-one-out cross validation was performed on the collected character data set to investigate the performance of a writer-independent system. The cross validation was done on all data and on the right-hander data only. When models are trained on solely right-hander data and tested on the left-hander data, results are very poor. We repeated all experiments without the left-hander data to exclude this effect. We used HMMs with 8, 10, 12 and 15 states and 2 to 6 Gaussians per state. Parameters were optimized on the data of the left-out writer in each cross validation fold. Figure 6 shows the recognition rate dependent on the total number of Gaussians of the system for the right-handers-only test. If more than one parameter combination resulted in the same total amount of Gaussians, the range of achieved recognition rates is illustrated by a vertical bar. One can clearly see the performance gain from increasing the number of Gaussians. We used models with 15 states and 6 Gaussians per state for the writer-independent evaluation. Table 2 shows the breakdown of the individual results for the right-hander-only evaluation and the evaluation with all ten writers. A recognition rate of 81.9% was reached for the writer-independent


Writer    WD     WI(RH)   WI(ALL)
A (RH)    97.9   75.4     73.5
B (RH)    98.5   83.5     84.3
C (LH)    96.4   -        46.8
D (RH)    98.0   90.5     89.8
E (RH)    89.2   86.9     85.4
F (RH)    91.3   77.2     77.1
G (RH)    94.9   80.6     78.8
H (RH)    91.8   72.9     73.8
I (RH)    96.4   86.0     87.8
J (RH)    93.9   84.3     84.6
Average   94.8   81.9     78.2

Table 2: Results of character recognition experiments (recognition rate in %), writer-dependent and writer-independent. The second column of the table shows the writer-dependent results. The third column shows the writer-independent results when leaving out the left-hander; the fourth column shows the writer-independent results for all writers.

case on the right-hander data.

It is not surprising that the recognition performance drops when testing on the left-handed person, since the writing style of this person differs in a fundamental way from the writing of the right-handed test persons. All horizontal strokes are written in the opposite direction and all circular letters are also written in the opposite direction.

The main problems of the right-hander-only systems are ambiguities in characters and in the writing variants of different writers. Figure 7 shows the confusion matrix for the cross validation on the right-handed writers. First of all, there are problems with similar graphemes like the pairs (P, D) and (X, Y). The similarities are obvious. In the case of P and D, the only difference is the length of the second stroke (the arc). In the case of X and Y, depending on how people write it, the only difference is the length of the first stroke (from upper left to lower right). One should notice that the writers did not get any kind of visual feedback on their writing. This probably leads to even more ambiguous characters than when writing with visual feedback. The pair (N, W) is also subject to frequent misclassification. The reason becomes obvious when considering the way the character N was written by some test persons. Four of the nine right-handers started the N in the upper left corner. They moved down to the lower left corner and up to the upper left corner again before writing the diagonal stroke. An N written this way has the same stroke sequence as a W. Figure 8 illustrates this ambiguity.

We see that most classification errors arise from the differences in writing style between the individual writers. The test persons do write characters in different ways even under the constraint of block letters. Some of the variants have a very similar motion sequence to variants of different characters observed by other writers. This leads to more ambiguities than in the writer-dependent case. On the single character level, it is hard to solve this problem. But when switching to recognition of whole words, the context

Figure 6: Results of the writer-independent character recognition on right-handers dependent on the total amount of Gaussians per HMM. If different parameter combinations had the same total amount of Gaussians, the performance range is shown as a vertical bar. A polynomial fitted on the data illustrates the tendency.

Figure 7: Accumulated confusion matrix for the cross validation of the right-handed writers. The confusion matrices of the tests on each writer were summed together.


Figure 8: Writing variants of N (a), (b) compared to the stroke sequence of W (c). The allographs (b) and (c) have the same basic stroke sequence.

Data   Features      States   GMM   Rec. Rate
RH     axyz          15       6     76.8
RH     axyz, gxyz    15       6     81.9

Table 3: Comparison of sensor configurations on writer-independent character recognition. Accelerometer and gyroscope features are compared to accelerometer-only features.

information should help dealing with these ambiguities.

We also investigated the effect of using only accelerometer data as features. We would be able to keep the sensor flat and cheaper if we did not use gyroscopes. We compared the results of using accelerometers (axyz) and gyroscopes (gxyz) to the results of using only accelerometers. Table 3 shows the results of this comparison. We see that accelerometer-only performance is worse than with the full sensor setup, but depending on the application, this might still be acceptable.

7.2 Word Recognition
The word data was recorded from test person A, who also contributed to the character data set. To build a writer-dependent word recognizer, we took the character data from this test person to initialize the character models. We used the trained models from the experiments on writer-dependent character recognition, i.e. HMMs with 10 states and 2 Gaussians per state. The repositioning models were trained on the repositioning data from the character recording session. We used models with 7 states and 2 Gaussians per state. We randomly split the word data set into two sets (Set A and Set B) of equal size, taking each of the two once as training and once as test set. The vocabulary always contained all 652 words from the recording; that means the recognizer had models for all these words and chose one out of this set as hypothesis. The words were all recorded only once and there were no duplicates in the set. That means no word in the test set appeared in the training set before. We evaluated the recognizer without any additional training on word recordings and after 10 training iterations on the word training set. Table 4 shows the results of the experiments. Word training boosts performance significantly. The reason is probably two-fold. First, the repositioning models from the character experiments are presumably not very good, since they were trained with the movements occurring in the artificial pauses between consecutive character recordings. Second, the manual segmentation in the character recording

Train     Test      Word Training Iterations   Recognition Rate
Set A     Set B     0                          74.2
Set B     Set A     0                          71.8
Average             0                          73.0
Set A     Set B     10                         97.5
Set B     Set A     10                         97.5
Average             10                         97.5

Table 4: Results of the writer-dependent word recognition. The character models were already trained on character data. The number of training iterations in the table corresponds only to further training on words.

is inaccurate, so the character models themselves can also profit from word training. We can see that word recognition performance is in the same range as character recognition performance for this writer. Typical misclassifications occur by confusing words that barely differ, like “as” and “was” or “job” and “jobs”.

8. DEMONSTRATOR
Finally, we built an online demo system to showcase the applicability of the recognizer. The trained models from the word recognition experiments were taken and a small vocabulary was used. The system can recognize the words necessary to form the sentences “Welcome to the CSL” and “This is our Airwriting System”. The demonstration system uses a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM for the data recording and the recognition. The system runs stably and few recognition errors occur. A demonstration video of the system can be seen on our website (http://csl.ira.uka.de/fileadmin/Demo-Videos/airwriting-chris.mpg).

9. CONCLUSION AND FUTURE WORK
We designed and implemented a wearable computer device for gesture-based text input. The device consists of a slim data glove with inertial sensors and the ability to transfer sensor data wirelessly. This enables hands-free operation. We made experiments on character and word recognition. For the task of writer-dependent character recognition, we reached an average recognition result of 94.8%. For the writer-independent case, a recognition rate of 81.9% was reached. We identified ambiguities between characters caused by individual writing style as the main reason for the significantly lower recognition rate in the writer-independent case. It will be hard to solve these problems on the single character level. Since our main goal is not single character recognition, but recognition of text, context information should help to solve most of the ambiguities. We implemented a word recognizer using the existing character models from character recognition and reached an average writer-dependent recognition rate of 97.5% for a single test person on a vocabulary of 652 words. We bypass the problems arising from sensor inaccuracies when performing a trajectory reconstruction by working directly on the acceleration and angular velocity values. We show that the introduction of repositioning models between characters is a suitable way to



deal with the lack of pen-up and pen-down information. We implemented a demo system and showed that online operation is feasible.

We plan to extend the word recognition system to be writer-independent by building a word database with different writers. Then we will analyze whether context information really solves most of the ambiguities arising in writer-independent character recognition. We also plan to use a more complex language model, which will enable us to recognize whole sentences.

We will further miniaturize the device by reducing power consumption and, with this, the size of the batteries. We will also further investigate the possibility of abandoning the gyroscopes.

10. ACKNOWLEDGMENTS
We would like to thank Michael Mende for developing the circuit layout of the electronic components for us and Wolfgang Rihm for the soldering of the components.

11. REFERENCES
[1] O. Amft, R. Amstutz, A. Smailagic, D. Siewiorek, and G. Troster. Gesture-controlled user input to complete questionnaires on wrist-worn watches. In Human-Computer Interaction. Novel Interaction Methods and Techniques, volume 5611 of Lecture Notes in Computer Science, pages 131–140. Springer Berlin / Heidelberg, 2009.
[2] W.-C. Bang, W. Chang, K.-H. Kang, E.-S. Choi, A. Potanin, and D.-Y. Kim. Self-contained spatial input device for wearable computers. Seventh IEEE International Symposium on Wearable Computers, 2003. Proceedings., pages 26–34, 2003.
[3] H. Brashear, T. Starner, P. Lukowicz, and H. Junker. Using multiple sensors for mobile sign language recognition. Seventh IEEE International Symposium on Wearable Computers, 2003. Proceedings., pages 45–52, 2003.
[4] S.-J. Cho and J. Kim. Bayesian network modeling of strokes and their relationships for on-line handwriting recognition. Pattern Recognition, 37(2):253–264, 2004.
[5] A. Hein, A. Hoffmeyer, and T. Kirste. Utilizing an accelerometric bracelet for ubiquitous gesture-based interaction. In Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments, volume 5615 of Lecture Notes in Computer Science, pages 519–527. Springer Berlin / Heidelberg, 2009.
[6] F. Hofmann, P. Heyer, and G. Hommel. Velocity profile based recognition of dynamic gestures with discrete hidden Markov models. In Gesture and Sign Language in Human-Computer Interaction, volume 1371 of Lecture Notes in Computer Science, pages 81–95. Springer Berlin / Heidelberg, 1998.
[7] D. Kim, H. Choi, and J. Kim. 3D space handwriting recognition with ligature model. In Ubiquitous Computing Systems, volume 4239/2006 of Lecture Notes in Computer Science, pages 41–56. Springer Berlin / Heidelberg, 2006.
[8] I.-C. Kim and S.-I. Chien. Analysis of 3D hand trajectory gestures using stroke-based composite hidden Markov models. Applied Intelligence, 15(2):131–143, 2001.
[9] R.-H. Liang and M. Ouhyoung. A real-time continuous gesture recognition system for sign language. In Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998. Proceedings., pages 558–567, 1998.
[10] R. McGuire, J. Hernandez-Rebollar, T. Starner, V. Henderson, H. Brashear, and D. Ross. Towards a one-way American sign language translator. In Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings., pages 620–625, 2004.
[11] J. Oh, S.-J. Cho, W.-C. Bang, W. Chang, E. Choi, J. Yang, J. Cho, and D. Kim. Inertial sensor based recognition of 3-D character gestures with an ensemble classifiers. Ninth International Workshop on Frontiers in Handwriting Recognition, 2004. IWFHR-9 2004., pages 112–117, 2004.
[12] R. Plamondon and S. Srihari. Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):63–84, 2000.
[13] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, pages 257–286, 1989.
[14] M. Rehm, N. Bee, and E. Andre. Wave like an Egyptian: accelerometer based gesture recognition for culture specific interactions. In BCS-HCI ’08: Proceedings of the 22nd British HCI Group Annual Conference on HCI 2008, pages 13–22, Swinton, UK, 2008. British Computer Society.
[15] J. Wu, G. Pan, D. Zhang, G. Qi, and S. Li. Gesture recognition with a 3-D accelerometer. In Ubiquitous Intelligence and Computing, volume 5585 of Lecture Notes in Computer Science, pages 25–38. Springer Berlin / Heidelberg, 2009.


AUGMENTING THE DRIVER’S VIEW WITH REALTIME SAFETY-RELATED INFORMATION

Peter Fröhlich, Raimund Schatz, Peter Leitner, Matthias Baldauf

Telecommunications Research Center (FTW) Donaucity-Str. 1, 1220 Vienna, Austria

froehlich, schatz, [email protected]

Stephan Mantler

VRVis Research Center Donaucity-Str. 1, 1220 Vienna, Austria

[email protected]

ABSTRACT

In the last couple of years, in-vehicle information systems have advanced in terms of design and technical sophistication. This trend manifests itself in the current evolution of navigation devices towards advanced 3D visualizations as well as real-time telematics services. We present important constituents for the design space of realistic visualizations in the car and introduce realization potentials in advanced vehicle-to-infrastructure application scenarios. To evaluate this design space, we conducted a driving simulator study, in which the in-car HMI was systematically manipulated with regard to its representation of the outside world. The results show that in the context of safety-related applications, realistic views provide higher perceived safety than traditional visualization styles, despite their higher visual complexity. We also found that the more complex the safety recommendation the HMI has to communicate, the more drivers perceive a realistic visualization as a valuable support. In a comparative inquiry after the experiment, we found that egocentric and bird’s eye perspectives are preferred to top-down perspectives for in-car safety information systems.

Author Keywords

User studies, Telematics, Realistic Visualization

ACM Classification Keywords

H.5.1. Information Interfaces and Presentation: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2. Information Interfaces and Presentation: User Interfaces—GUI

INTRODUCTION

In-vehicle information systems, such as personal navigation devices, built-in driver assistance units and Smartphones, have become standard equipment in today’s cars - and their capabilities are quickly evolving. The most obvious advances are related to the visual presentation at the in-vehicle human-machine interface (HMI). On the consumer mass market, we see a clear trend towards increasingly realistic representations of the driver’s outside world, including textured 3D renderings of highway junctions, road details, mountains, and buildings [14]. Arrows and icons are exactly overlaid over the virtual representation of the driver’s field of view to aid in navigation tasks. This development towards realistic visualization is further strengthened by the advent of augmented reality navigation systems on market-available handheld devices (e.g. [12]).

Up to now, such realistic visualizations are mostly applied to navigation. However, with emerging co-operative vehicle-to-infrastructure or vehicle-to-vehicle communications technology [4,16,20,18], they will also become relevant for delivering more advanced safety-related services. For example, drivers could be notified about sudden incidents and provided with recommendations on how to react accordingly. In this context, the major challenge is the fact that the driver actions required can be fairly unusual and unexpected, and thus might not be adequately understood or implemented. For example, drivers may be asked to stop before a tunnel on the emergency lane due to an accident ahead.

In this application context, realistic visualization could represent both merit and demerit: information attached to a quasi-realistic mapping of the outside reality might be recognized more quickly than with today’s schematic visualizations, but on the other hand the wealth of details might as well hamper the identification of task-relevant information. It should be clear that the effects of realistic visualizations on usability and user experience must be fully understood before recommending their use in millions of cars. In order to achieve this goal, systematic and reflective user-oriented research is needed.



In this paper, we present an experimental study to evaluate the influence of realistic visualizations on perceived driving safety and satisfaction. We were interested in finding out whether realistic visualizations provide an added value in terms of safety and user experience, or whether they are just “eye-candy” that could even endanger the driver and other traffic participants. We first specify the basic elements and characteristics of realistic visualizations. Departing from this, we formulate a set of research questions and describe the method of an experimental study to address them. We finally provide a detailed description of the results and suggestions for further research.

SPECIFYING REALISTIC VISUALIZATIONS

The extent of realism of an HMI’s real-world representation can be described by a number of constituents (see Figure 1): map representation, viewing perspective, environmental objects, and augmentation. The possible properties of these constituents are now briefly described, and then prototypical combinations are presented.

Constituents of realistic visualizations

Map representation: Digital 2D maps have long been the standard way to present the outside world on the HMI. Meanwhile, however, the global availability of environmental models from map providers like Navteq Inc. has motivated the integration of 3D spatial representations also in portable navigation devices. Starting from basic 2.5D building representations and schematic landscape models, we witness a gradual increase in fidelity towards fully textured scenes with complete buildings.

Viewing perspective: Most HMIs provide dynamic map displays that automatically align themselves towards the driver’s surroundings, based on orientation information derived by a compass or by the sequence of GPS coordinates. Another common feature is a “bird’s eye” view on the road situation, with the camera being positioned slightly behind and above the virtual vehicle. When 3D maps with detailed objects are displayed, often also a fully egocentric view is provided, which matches the driver’s field of view. This view can then be of use for the display of complex junctions, for example.

Environmental objects: Due to current developments towards realistic visualizations, an increasing number of environmental objects is displayed to the user. This includes navigation-related objects, such as turn indications on the road and direction signs. High-fidelity representations of surrounding objects are also used to provide location-based search and purchase, such as when a user is looking for the next gas station or shopping mall. Furthermore, significant architectural landmarks are shown in more detail.

Augmented information: To provide the actual recommendations on the HMI, the above-described scene representations are overlaid with virtual objects and elements that indirectly refer or point to aspects of the environment. In

current systems, such additional virtual information typically relates to route indications, congestion information, as well as information on points of interest. Typical means of augmentation are color-coded lines or arrows, icons, text and numbers. Visualization approaches range from colored overlays over the road to virtual “follow-me cars” implicitly indicating the speed and direction (compare [13,8]).

Constituents            Characteristics
Map representation      Schematic 2D | Untextured 3D | Textured 3D
Viewing perspective     Top-Down | Bird’s Eye | Egocentric
Environmental objects   None | Selected | All
Augmentation            Text | Icons | Arrows and lines
(Characteristics are ordered from left to right by increasing match with the real view of the outside world.)

Figure 1: Constituents of Realistic Visualizations

Prototypical visualization styles

Each one of the aforementioned constituents is necessary to describe the extent of realism of an outside-world representation on the HMI. However, the constituents also must be seen in combination, as their properties exert mutual influence on each other. The following combinations can be regarded as prototypical variants within the design space of realistic visualizations:

Conventional view: In most of the navigation systems currently available, a schematic 2D map of the outside world is presented from a bird’s eye perspective, with occasional display of a few important points of interest in the environment. The recommended route is visualized by a schematic overlay over the map, and for recommendations and warnings, icons are georeferenced on the map.

Realistic view: An idealized “reality view” of an in-car HMI would be a quasi-realistic 3D map representation, dynamically presented from the driver’s own viewing perspective, including all environmental objects. This visualization would be augmented by accurately spatially-referenced overlay lines and arrows, as well as by 3D-spatially referenced icons. Realistic views may also be realized by augmented reality, provided that accuracy problems with aligning the virtual guidance information with the real scene are solved.

Conventional and realistic interleaved: Most contemporary navigation systems also feature dynamic switching between different visualization modes, depending on the current context. For example, in non-critical situations, a conventional bird’s eye view is presented as default (with varying accuracy of environmental objects). However, when approaching critical points (such as highway junctions), the device might switch to a realistic view, in order to avoid ambiguous situations and reduce the potential for misinterpretation and navigation errors by the driver.


RESEARCH ISSUES

The key purpose of realistic visualizations is to reduce the amount of abstract symbolization. This way, map use is reduced to “looking rather than reading” [15]. In the car, realistic views could potentially make visual processing easier and enable better concentration on the driving task. Inferring from earlier results in cognitive psychology [5], one might argue that the more realistically a virtual representation (of the road situation) is presented, the easier a mapping to the real situation based on perceptual features becomes. Especially in complex driving situations, this could result in increased driving safety. Furthermore, a higher realism of visualizations may promise higher usage satisfaction and appeal to customers than standard visualizations.

On the other hand, problematic aspects of realistic visualizations in cars also need to be taken into account. First, it may take more time to identify task-relevant information in realistic displays, which would limit a faster mapping between virtual and real environment. This may lead to serious restrictions and poor compliance with international car safety standards, such as the ‘European Statement of Principles’ [6]. Furthermore, it is not clear which features of realistic views really help the user to match with the real road situation. They could as well just be “eye candy”: nice to look at, but without any major safety benefit.

The general challenge in this regard is therefore to identify the safety impact of an increased realism of visualizations in selected realtime safety-related traffic telematics scenarios. When designing the user interface for in-vehicle safety information systems, a basic question is whether or not the visualization capabilities of today’s in-car information systems should be exploited.

Recommendations provided in these scenarios vary in their level of urgency. When driving along a prescribed route without any incidents, the information on the HMI must be monitored from time to time, but urgent reactions are not necessary. However, when the system calculates a detour (e.g., due to congestion), the driver needs to be notified and given detailed instructions on how to change direction. Recommendations become more urgent when a user is asked to use a certain lane on the road, due to temporary roadwork.

Unfortunately, existing research studies on the effect of visual presentations in the car may not fully apply, as these concern textual, iconic and simple spatial representations. Realistic representations have not yet been subject to rigorous examination in the open research community. While there are some approaches on the use of augmented head-up displays, the use of reality views on head-down displays is only beginning to be researched (compare [11]).

A generic approach to evaluate the added value of realistic visualizations is to compare the prototypical extreme variants of visualizing the real world (as described in the previous section) with regard to their support for the driver. Based on these considerations, the following research questions have been formulated:

1. To what extent does any visualization of the real world support drivers while following safety-related HMI recommendations, as compared to no real-world visualization?

2. To what extent does a realistic visualization support drivers while following safety-related HMI recommendations, as compared to a conventional visualization?

3. To what extent does an interleaved presentation of a conventional visualization (in non-critical situations) and a realistic visualization (in critical situations) support drivers while following safety-related HMI recommendations, as compared to the continuous display of a realistic visualization?

4. When considered in isolation, how are the constituents of realistic visualizations (map representation and perspective) likely to support the driver?

5. Does the urgency of the safety scenario influence the extent to which a realistic visualization can support a driver in following an HMI recommendation?

METHOD

We conducted a driving simulator study with potential future users of advanced traffic telematics systems.

Participants

28 participants, 16 male and 12 female, took part in the study. Their mean age was 32.7 years, ranging from 18 to 59. 70% were frequent drivers, and 60% of the users owned a navigation device. As remuneration, each subject received a voucher for a consumer electronics store.

Simulation environment

A simulation rather than a field environment was chosen, because the investigated scenarios would be harmful for the involved drivers and impracticable with the currently installed telematics infrastructure. A number of simulation environments for driving have been developed, many of them dedicated to the purpose of in-car HMI evaluation. The fidelity of these simulators varies strongly, ranging from highly advanced moving-base simulators involving physical motion to single computer screens with game controller settings [3,19]. However, to the best of our knowledge, not even the most advanced prototypes provide dedicated simulation features with regard to realistic HMI visualizations.


[Figure 2 (diagram): scenario data feed a shared 3D visualization and rendering component, which drives both the windscreen simulation (scene representation and overlays, e.g. arrows) and the IVIS/HMI device presentation (texts, icons, controls); cockpit controls (pedals, steering wheel) connect to the driving simulator. The sample HMI warning shown reads, translated from German: "DO NOT ENTER THE TUNNEL! FOLLOW THE ARROW! WARNING: FIRE IN TUNNEL (2.8 km, 0h01')".]

Figure 2: Realistic visualization rendering environment

To overcome this shortcoming, we have developed a versatile simulation environment that employs highly detailed geospatial models of currently existing and future highways (see Figure 2). These models were originally created for construction planning and for the visualization of construction alternatives to facilitate public discussion.

As such, the models may provide a higher validity than usual tests with abstracted highway simulations, due to a higher degree of user familiarity with the "look-and-feel" of a country's (here: Austria) road infrastructure.

Both the "outside reality" (windscreen simulation) and the HMI display are rendered with the same rendering engine and based on the same spatial model. This architecture enables systematic and fine-grained variations of scene representations on the HMI display. Both the windscreen and the HMI simulations were rendered in realtime at 25 fps.

Our laboratory simulator was running on a Windows PC with a powerful graphics adaptor. Users were sitting on a driving seat and in front of a dashboard, both taken from a real car. They were operating a steering wheel and gas and brake pedals (the clutch was not used; automatic transmission was assumed). The windshield view was displayed on a large 42" TFT screen, covering about 75 degrees of participants' field of view. Our setup also follows the guidelines for the placement of personal navigation devices [7]: the in-car information system was modeled by a second 8" TFT screen (landscape format) mounted to the lower left side of the windscreen, next to the simulation car's left A-pillar.
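The reported field of view follows from simple screen geometry. The sketch below is our own illustration, assuming a 16:9 panel (the paper states neither the aspect ratio nor the viewing distance); it shows the approximate distance at which a 42" screen subtends roughly 75 degrees horizontally.

```python
import math

# Illustrative geometry only (assumed 16:9 panel; the viewing distance is not given in the paper).
diagonal_m = 42 * 0.0254                               # 42 inches in metres
width_m = diagonal_m * 16 / math.hypot(16, 9)          # horizontal width of a 16:9 screen

fov_deg = 75.0
# A flat screen of width w seen from distance d subtends 2 * atan((w / 2) / d) horizontally.
distance_m = (width_m / 2) / math.tan(math.radians(fov_deg / 2))
print(f"screen width = {width_m:.2f} m, viewing distance for ~75 deg = {distance_m:.2f} m")
```

Under these assumptions the screen would be viewed from roughly 0.6 m, which is plausible for a driving seat placed directly in front of the display.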

The screen layout of the HMI was designed to be consistent with contemporary in-car navigation systems, involving a large 'map area' to represent the outside world, with an information area below that displays textual and iconic recommendations, as well as the current speed. At the very bottom, navigation elements were displayed; these were not active, as their functionality was not needed for the experiment.

Navigation (urgency: Low): Driving on the highway and following the instructions of the navigation system. Not far from a highway exit, a new route is recommended, requiring the driver to react and leave the highway. Possibilities for realistic visualization: realistic representations of complex junctions, highlighting of dynamic route changes, e.g. integrated into today's navigation HMI styles.

Lane utilization (urgency: Medium): Driving on the highway and following the instructions of the HMI. Suddenly, the system warns the driver of an accident ahead and instructs the driver to use the right (or left) lane. Possibilities for realistic visualization: lane utilization information marked directly on the scene representation, with an overlaid route projection.

Urgent incident (urgency: High): Driving on the highway and following the instructions of the navigation system. Suddenly, the system warns the driver of an accident behind the curve and instructs the driver to stop on the emergency lane at a certain position. Possibilities for realistic visualization: highlighting of where to drive or stop in urgent cases, with an overlaid route projection and an arrow indicating the destination.

Table 1: Typical application scenarios for safety-related traffic telematics services, their urgency and related opportunities for realistic visualization.


Experimental application scenarios

The test users were exposed to three safety-related application scenarios, as specified in Table 1: navigation with unexpected route change, lane utilization, and urgent incident warning. The dramaturgical design of these scenarios followed a three-phase structure: the initial phase, the critical moment, and the final phase.

In the initial phase, users were driving for about 1 km along the highway, following the routing instructions of their in-car information system. Then, when entering a predefined zone, a warning was presented to the user, consisting of a short audio signal, a text message and an icon (see Figure 2). The first line of the text message recommends an action to the driver, together with an indication of distance. The second line provides information on the cause for the given recommendation.

The critical phase was between the point of the warning reception and the point at which the action requested in the respective scenario (the respective turn, lane selection, or emergency stop) should have been performed at the latest.

The final phase mostly served as a way to let users naturally finish their driving task. For example, in the lane change scenarios, the driver passed the partly blocked road section and was then told about the scenario end.

Experimental visualization styles

Four visualization variants were specified: 'none' (as a control condition), 'conventional', 'realistic', and 'interleaved'. Each visualization style was then realized for the three application scenarios, resulting in 12 different combinations. Figure 3 illustrates the realization of the conventional and the realistic view for navigation, lane change, and urgent incident. In the 'none' variant, the map area was filled with grey color. In the interleaved variant, the conventional view was shown in the initial and the final phase, and the realistic view in the critical phase.

Procedure and measures

The overall duration of the test (from the participant entering to leaving the test room) was approximately two hours. A test assistant was present to conduct the interview, to provide task instructions, and to note specific observations made during the experiment. Each individual test consisted of an introduction phase, in which the test persons were briefed about the goals and procedure of the test, and data on demographics and previous experiences was gathered. Then, the participants were enabled to familiarize themselves with the driving simulator and with the HMI. To minimize a potential habituation effect, it was assured that the users were informed about and had actively used each visualization and each application scenario. The subsequent phases of the study will now be described in detail.

[Figure 3 (screenshots): for each safety scenario, the conventional and the realistic HMI view are shown side by side. Scenarios: Navigation (route following with unexpected route change), Lane utilization (use a certain lane because of an accident ahead), Urgent incident (stop at a certain lane at a certain position).]

Figure 3: HMI screenshots from the IVIS simulation for different application scenarios


Experimental part

The two independent factors of the experimental part were visualization and safety scenario. Each participant drove 12 conditions, the product of 4 visualization variants and 3 scenario types. (This way, participants encountered every possible combination of visualization and scenario type.) In order to avoid order effects, the sequence of conditions was varied systematically. At the start of each condition, the car was "parked" at the emergency lane of a highway. The participant was instructed to drive along the highway and to follow the instructions on the HMI as accurately as possible.
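As an illustration of this within-subject design, the 12 conditions can be generated as the product of the two factors and their order rotated per participant. This is a sketch of one possible counterbalancing scheme, not the authors' actual randomization procedure, which is only described as "varied systematically".

```python
import itertools

# Hypothetical sketch of the 4 x 3 within-subject design (illustration only).
visualizations = ["none", "conventional", "realistic", "interleaved"]
scenarios = ["navigation", "lane utilization", "urgent incident"]

conditions = list(itertools.product(visualizations, scenarios))   # 12 combinations

def order_for(participant_index: int) -> list:
    """Rotate the base sequence so that each participant starts at a different condition."""
    shift = participant_index % len(conditions)
    return conditions[shift:] + conditions[:shift]

for p in (0, 1, 2):
    print(p, order_for(p)[:3], "...")
```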

In the critical phase of each driving condition, the experimenter assessed task completion. Task completion was given if the subjects generally followed the system instructions (taking the correct exit, selecting the correct lane, or making the emergency stop on the correct lane). Furthermore, the test facilitator noted incidents that occurred during the driving situation.

To capture the immediate driving- and HMI-related impressions, the participants filled out a questionnaire after each of the 12 conditions. The first question aimed at understanding the general support perceived in the driving situation. The two subsequent questions were designed to understand the visualization's support for identifying the driving-task relevant information (a potential problem area of detailed realistic visualizations) and its support for finding matches between the road situation and the HMI display (a potential advantage of realistic visualizations).

Final interview

The final interview aimed at gathering the participants' overall reflections on the driving situations experienced in the different conditions. The first two questions directly addressed the potential strengths, by asking "Did realistic visualizations support you in finding accordances between the road situation and the HMI display?", and the potential weaknesses, by asking "Did realistic visualizations deter you from identifying the task-relevant details in the necessary time span?"

Due to the realistic nature of the test, the 12 visualization variants tested represent specific prototypical combinations of constituents. In order to also obtain a rough understanding of the impact of the constituents of realistic visualizations in isolation, a systematic comparison was performed, based on an illustrated questionnaire. Due to their importance, 'map representation' and 'viewing perspective' were selected as the constituents of interest in the interview. Regarding the 'viewing perspective', the users were shown three clusters of screenshots of 2D and 3D views in navigation, lane change and urgent incident warning scenarios, one cluster only including the top-down, the second only the bird's eye, and the third only the egocentric perspective. The participants were then asked to provide a ranking of the three different perspectives, with regard to their assumed support in the driving situations. The same principle was applied for 'map representation'.

RESULTS

In this section, the results from the experimental part, the post-experimental inquiry, and the comparative inquiry are described. The statistical analysis was based on the data from 28 participants. Mean differences were calculated with non-parametric techniques for dependent samples (Friedman and Wilcoxon tests). In all figures, the error bars represent 95% confidence intervals. Throughout the measures used in the study we did not find age- or gender-specific differences.
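For illustration, such dependent-sample comparisons could be computed as in the sketch below, which uses dummy data and SciPy rather than whatever statistics package the authors actually used.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Dummy ratings for 28 participants and four visualization styles (illustration only).
rng = np.random.default_rng(42)
styles = ["none", "conventional", "realistic", "interleaved"]
ratings = {s: rng.integers(1, 21, size=28) for s in styles}

# Omnibus comparison across the four related samples (Friedman test).
chi2, p = friedmanchisquare(*(ratings[s] for s in styles))
print(f"Friedman: chi2 = {chi2:.2f}, p = {p:.3f}")

# Pairwise follow-up for two styles (Wilcoxon signed-rank test for dependent samples).
w, p = wilcoxon(ratings["conventional"], ratings["realistic"])
print(f"Wilcoxon (conventional vs. realistic): W = {w:.1f}, p = {p:.3f}")
```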

Experimental part

Task completion

Our results are characterized by a very high task completion ratio across all test conditions: 99.4% of the navigation, lane utilization and urgent stop recommendations were generally followed. We found no significant differences between the different visualization styles.

Post-condition Questionnaire

Figure 4 presents an overview of participants' mean ratings of the four different visualization styles on three scales: perceived general support in the respective driving situation, support for identifying the relevant details, and support for matching with the outside real world. On all three scales, participants rated the visualizations without a real-world representation worse than all others. Participants consistently judged the realistic view as more supportive than the conventional view (all differences significant, p < .05). On none of the three scales could any difference be found between the realistic and the interleaved visualizations.

Figure 4: Mean post-condition ratings on the visualization styles, with regard to perceived general support in the driving situation, the support for identifying relevant details and for matching the virtual representation with the real world


Figure 5 again shows the perceived overall support in the driving situation, but here separated by the three safety scenarios. The ratings are mostly consistent throughout all safety scenarios. A notable exception was observed when looking at the difference between the conventional and the realistic view: the conventional view was rated significantly lower than the realistic view in the urgent incident and lane utilization scenarios, but not in the navigation scenario (p < .001, p < .004, p = .082). When directly comparing the rating values for the conventional visualizations between the different scenarios, the conventional visualization was rated better in the navigation than in the urgent incident scenario (p < .01). The mean ratings in the lane utilization scenario also tended to be lower, but the difference did not reach significance (p = .065).

Figure 5: Mean post-condition ratings on the visualization styles, with regard to perceived general support in the driving situation, separated by the three safety scenarios.

Observations

The main observations of incidents that had been noted during the test conditions were as follows:

No visualization: When being confronted with on-screen navigation instructions, drivers did not encounter notable problems. In the two other scenarios, subjects often appeared to be confused about how they should behave correctly. They were unsure about where exactly to change lanes or where to stop (but, as indicated above, the vast majority stopped on the correct lane). Several users also got noticeably excited after receiving a warning and looked very attentively at the road situation, searching for the announced incident.

Conventional: During navigation, no notable problems were observed. However, in the other two scenarios many users were unsure about where to stop or which lane to take. This was obviously due to the rather schematic visualization on the 2D map.

Realistic: In the realistic view conditions, we noticed that users tried to follow the indicated arrow as closely as possible. In the urgent incident scenario, this attitude sometimes resulted in driving significantly slower in order to stop exactly at the indicated location. However, this behavior was mostly observed the first and second time a realistic view was used.

Interleaved: The switch from the conventional view to the realistic view was noticed well by the drivers. In general, the observations made in the critical moment were similar to the ones made for the realistic visualization.

Participant impressions

The participants' comments provided after using the visualizations were as follows:

No visualization: The vast majority of users stated that without a real-world visualization it was difficult to follow the lane utilization and urgent incident recommendations on the HMI. Real-world visualizations were basically regarded as a standard feature of every form of navigation device.

A few participants stated that in principle it could suffice to provide safety warnings without a real-world representation, but that in this case a combination with audio output would be necessary. Furthermore, they wished the icon were placed at a more prominent position on the screen (interestingly, many participants only took notice of the icon in the no-visualization condition).

Conventional: The majority of participants complained about the difficulties they experienced in interpreting the lines and icons overlaid on the schematic 2D map when following lane utilization and emergency stop recommendations. Furthermore, users of the latest navigation systems criticized the relatively low number of displayed details on the map and the lack of a car position item. What was often positively valued was the good foresight provided by the bird's eye perspective.

Realistic: Many participants stated that they felt safe when using the realistic visualization. A very often mentioned reason was that the "1:1" match with the outside world improved orientation. They would have liked to see even more spatially-referenced annotations, such as a blocking icon placed directly on the respective lane. The display of many details was not seen as distracting from the relevant information. The few critical remarks were related to less foresight, as compared to the conventional view.

Interleaved: Participants provided similar comments with regard to the interleaved as to the realistic view. The switch was not seen as an added value by the participants. Many stated that they would have preferred a continuously displayed realistic view.


Final Interview

The participants widely stated that realistic visualizations had enabled them to find a match between the HMI and the real road situation (mean rating of 16.11 on a 20-point scale, SD = 3.8). Similarly, many participants stated that realistic visualizations had not hindered them in finding the relevant details on the screen display (mean rating of 5.5 on a 20-point rating scale, SD = 4.4).

Comparative inquiry

Figure 6 shows the ranking results from the comparative inquiry on the perspectives top-down, bird's eye, and egocentric, with regard to their assumed support in the driving situations. Overall, the top-down perspective was rated significantly lower than the other perspectives (both p < .001). The ratings for the bird's eye and the egocentric perspective did not differ significantly from each other. However, the navigation scenario again differed from the other two scenarios: here the bird's eye view was preferred to the egocentric perspective (Z = -2.05, p < .05).

Figure 6: Mean interview rankings on perspectives, with regard to perceived general support in the driving situation, separated by the three scenarios.

The comparative inquiry on the map representation revealed a strong preference for 3D over 2D (77.4% vs. 22.6%; Z = -3.49, p < .001). Again, the navigation scenario differed from lane utilization and urgent incident: here no difference between 3D and 2D could be found.

CONCLUSIONS

In the following, the results are summarized with regard to the research questions:

Q1: Real-world visualization in general (baseline)

The results suggest that an HMI is perceived to support a driver better in following safety-related recommendations if it displays a real-world visualization, as compared to a purely textual and iconic message. A map appears to be regarded as a standard HMI feature, and it helps drivers to better orient themselves. The added value of such a real-world representation is consistently supported by user ratings and comments. On the other hand, our task completion results show that the pure display of text and an icon obviously suffices to correctly follow a recommendation, at least in low-complexity driving situations.

Q2: Realistic vs. conventional visualization

We found that realistic visualizations are perceived as an added value when presenting safety-related recommendations on the HMI, as compared to conventional visualizations. This is a result that was not easily predictable: in principle, the many 'irrelevant' details shown in realistic visualizations could as well have been assumed to be disturbing. We also found that realistic views do not decrease task completion, at least in simple scenarios.

Q3: Interleaving conventional and realistic visualization

Switching between a conventional visualization (shown in non-critical situations) and a realistic visualization (shown in critical situations) does not provide an added benefit, as compared to the continuous display of a realistic visualization.

Q4: Constituents of realistic visualization

Regarding the main constituents of realistic visualizations, we found that, when considered in isolation, 3D representations are preferred to schematic 2D representations on the HMI. Regarding the viewing perspective, the top-down alternative appears not to be well suited for in-vehicle safety information systems. This is based not only on the comparative inquiry results, but also on frequent comments made throughout the test conditions.

Q5: Influence of safety scenarios

Throughout the study, we found that drivers felt even more supported by realistic visualizations when they had to follow urgent and non-standard instructions in the urgent incident and lane utilization scenarios. While drivers in principle followed the general instructions correctly, they often felt insecure when choosing the right lane or place to stop.


DISCUSSION

The experiment presented in this paper is the first comprehensive evaluation of the suitability of different visualization styles and their constituents for safety-related in-car information applications. The goal was to overcome the current scarcity of prescriptive knowledge on this important and safety-relevant topic.

Our simulator study results show that realistic HMI visualization styles have a significant positive impact on the user experience. In comparison to other visualization styles, realistic views provided added value in terms of driver support and perceived safety, beyond a purely aesthetic function as visual enhancement or "eye candy". These utilitarian benefits materialized particularly in the more acute safety-critical scenarios, which required effective and timely action by the driver. Furthermore, we did not find any evidence for a negative impact of realistic views on participants, e.g. in terms of diminished task performance, distraction by visual clutter or reduced safety. Our findings may thus challenge conventional recommendations which postulate the simplification and reduction of visual HMI designs [6]. In the light of our results, the application of realistic views in safety contexts should be reconsidered on a broader level. We therefore suggest further systematic research on the merits and demerits of realistic visualizations for in-vehicle navigation and safety applications.

Our results also show that, compared to traditional navigation, safety scenarios have different properties and consequently different visualization requirements: in the navigation scenario, users saw no additional benefit of realistic views over conventional, schematic ones. However, with rising urgency of the scenarios, participants found realistic views to be significantly more useful. This shows not only that reality views provide tangible benefits for the driver, but also that safety-related HMI represents an application class distinct from pure navigation, requiring dedicated user experience research.

Our study participants were only exposed to relatively simple environments (highway) and tasks (such as stopping at the emergency lane). This may explain the observed insensitivity of users' (near to perfect) task completion rate to visualization style. Thus, our results should not be generalized towards more challenging high-complexity scenarios. Under high strain and cognitive load, users might change preferences and perform better with other or even without HMI visualizations. Future studies should extend and validate the design space towards such higher complexity demands.

In this study, we were deliberately interested in understanding the effects of certain prototypical extreme variants (no visualization, conventional, realistic and interleaved views). Obviously, further visualization variants are possible in this context. Most importantly, we want to stress the fact that these styles represent idealized variants highly suitable for experimental testing, but which in practice are rather encountered as downgraded or simplified implementations. For example, visualizations currently marketed as "reality views" actually still have many aspects of schematic representations: in many cases they do not display the current situation, but only display 3D templates or 2D images of prototypical junctions. To advance towards safe and satisfactory realistic visualizations in the car, the results clearly encourage the scientific advancement and understanding of the design space for realistic visualizations.

ACKNOWLEDGMENTS

This work has been carried out within the projects REALSAFE and U0, which are financed in part by ASFiNAG AG, Kapsch TrafficCom AG, nast consulting, the Austrian Government and the City of Vienna within the competence center program COMET.

REFERENCES

1. Allen, R. W., Cook, M. L., Rosenthal, T. J. (2007). Application of driving simulation to road safety. Special Issue in Advances in Transportation Studies 2007.

2. Böhm, M., Fuchs, S., Pfliegl, R. (2009). Driver Behavior and User Acceptance of Cooperative Systems based on Infrastructure-to-Vehicle Communication. Proc. TRB 88th Annual Meeting.

3. Burnett, G. (2008). Designing and Evaluating In-Car User Interfaces. In: J. Lumsden (Ed.), Handbook of Research on User Interface Design and Evaluation for Mobile Technology. Idea Group Inc (IGI), 2008.

4. COOPERS project: http://www.coopers-ip.eu/

5. Crampton, J. (1992). A cognitive analysis of wayfinding expertise. Cartographica 29(3): 46-65.

6. European Commission. Commission Recommendation of 22 December 2006 on safe and efficient in-vehicle information and communication systems: Update of the European Statement of Principles (ESOP) on human machine interface. Commission document C (2006) 7125 final, Brussels.

7. Janssen, W. (2007). Proposal for common methodologies for analysing driver behavior. EU-FP6 project HUMANIST. Deliverable 3.2.

8. Levy, M., Dascalu, S., Harris, F.C. ARS VEHO: Augmented Reality System for VEHicle Operation. Proc. Computers and Their Applications, 2005.

9. Martens, M.H., Oudenhuijzen, A.J.K., Janssen, W.H., and Hoedemaeker, M. (2006). Expert evaluation of the TomTom device: location, use and default settings. TNO memorandum TNO-DV3 2006 M048.

10. McDonald, M., Piao, J., Fisher, G., Kölbl, R., Selhofer, A., Dannenberg, S., Adams, C., Richter, T., Leonid, E., Bernhard, N. (2007). Summary Report on Safety Standards and Indicators to Improve the Safety on Roads, Report D5-2100. COOPERS project.

11. Medenica, Z., Palinko, O., Kun, O., and Paek, T. (2009). Exploring In-Car Augmented Reality Navigation Aids: A Pilot Study. EA Ubicomp.

12. Mobilizy: http://www.mobilizy.com/drive

13. Narzt, W., Pomberger, G., Ferscha, A., Kolb, D., Müller, R., Wieghardt, J., Hörtner, H., Lindinger, C. (2006). Augmented reality navigation systems. Universal Access in the Information Society 4(3): 177-187.

14. Navigon: www.navigon.com

15. Patterson, T. (2002). Getting Real: Reflecting on the New Look of National Park Service Maps. Proc. Mountain Cartography Workshop of the International Cartographic Association; www.mountaincartography.org/mt_hood/pdfs/patterson1.pdf

16. REALSAFE project: https://portal.ftw.at/projects/all/realsafe/

17. Ruddle, R.A., Payne, S.J., Jones, D.M. (1997). Navigating buildings in "desk-top" virtual environments: Experimental investigations using extended navigational experience. Journal of Experimental Psychology: Applied 3(2): 143-159.

18. TomTom HD Traffic: www.tomtom.com

19. Wang, Y., Zhang, W., Wu, S., and Guo, Y. (2007). Simulators for Driving Safety Study – A Literature Review. In: R. Shumaker (Ed.): Virtual Reality, Proc. HCII 2007, LNCS 4563, pp. 584–593, 2007.

20. Vehicle Infrastructure Integration (VII) initiative: http://www.vehicle-infrastructure.org


An Experimental Augmented Reality Platform for Assisted Maritime Navigation

Olivier Hugues
MaxSea – ESTIA Recherche
Bidart, France
+33 5 59 41 70 96
[email protected]

Jean-Marc Cieutat
ESTIA Recherche
Bidart, France
+33 5 59 43 84 75
[email protected]

Pascal Guitton
University Bordeaux 1 (LaBRI) & INRIA
Bordeaux, France
+33 5 40 00 69 18
[email protected]

ABSTRACT

This paper deals with integrating a vision system, comprising an efficient thermal camera and a classical camera, into maritime navigation software based on a virtual environment (VE). We then present an exploratory field of augmented reality (AR) in situations of mobility and the different applications linked to work at sea provided by adding this functionality. This work was carried out thanks to a CIFRE agreement within the company MaxSea Int.

Categories and Subject Descriptors H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems — Artificial, augmented and virtual realities; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual reality.

General Terms Experimentation, Human Factor, Security.

Keywords Augmented Reality, Mixed Environment, Image Processing, Human Factor, Combining exteroceptive data.

1. INTRODUCTION

The continuous progress of new technologies has led to a proliferation of increasingly smart and powerful portable devices. The capabilities of devices on board a ship now enable crews to be offered a processing quality and volume of information until now unrivalled. In a hostile environment such as the sea, users need a relevant flow of information. Computer-assisted vessel management is therefore increasingly widespread and digitalisation is an inescapable development. The three main aims are as follows:

1. Improved safety (property, environment and people)
2. Increased gains from productivity (fishing, etc.)
3. The representations required for environmental control (orientation, location and direction)

These aims have led maritime software publishers to develop increasingly sophisticated platforms, offering very rich virtual environments and real-time information updates. There are many companies on the embedded maritime navigation software market. They can be separated into two categories. The first category includes those that develop applications enabling embedded sensors (radar, depth-finder, GPS, etc.) to be taken advantage of, such as Rose Point [4], publisher of the Coastal Explorer software (Figure 3), and MaxSea International [14], publisher of the MaxSea TimeZero software (Figure 2). Other companies offer hardware platforms in addition to their software applications, like Furuno [10] (Figure 4) and Garmin [11] (Figure 1).

Figure 1. Garmin Figure 2. MaxSea

Figure 3. Coastal Explorer Figure 4. Furuno

These environments enable navigation to be greatly improved by only showing the necessary information, e.g. by combining satellite photos of the earth and nautical charts, like PhotoFusion (Figure 5) proposed by MaxSea [14].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Augmented Human Conference, April 2–3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.


Figure 5. PhotoFusion: mixture of satellite photos and nautical charts (MaxSea) [14]

There are many different types of data whose relevance depends on the context in which they are used. Furthermore, the loss of reference points due to bad weather also has to be taken into account when weather conditions deteriorate (mist, fog, rough sea, etc.). Emotional reactions are even more intense in extreme conditions, whether at sea or in the mountains.

Firstly, we describe how we incorporated a vision system into maritime navigation software and the various ways of remotely controlling the camera. Consequently, we are able to augment the video flow using information from the assisted maritime navigation application. We used work on the LookSea project by Technology Systems, Inc. [20] as a basis and did not propose a system with a Head Mounted Display (HMD). The proposed augmented reality functionalities open up new exploratory fields for GPS, such as handling virtual entities in the real world.

On the other hand, the concept of augmented reality itself is not purely technological: it is also based on aspects linked to human perception, which also depends on the genetic and cultural heritage of individuals [6]. It is therefore necessary to take into account the human factor and the stress caused by difficult sea conditions, which are likely to increase the risk of accidents [12], and to adapt to both users and context of use [18, 7].

This paper presents one of the platform's major applications, at the crossroads between various different technologies, i.e. following targets.

2. VIRTUAL ENVIRONMENT

The aim of virtual reality is to provide a person (or people) with a sensory-motor and cognitive activity in a digitally created artificial world, which may be imaginary, symbolic or a simulation of certain aspects of the real world [16]. Moreover, in 1994 Paul Milgram defined the concept of Mixed Reality [15] (Figure 6), which provides a continuum linking the real world (RW) to Virtual Reality (VR) via Augmented Reality (AR) and Augmented Virtuality (AV).

[Figure 6 shows the Mixed Reality continuum: Real Environment (RE), Augmented Reality (AR), Augmented Virtuality (AV), Virtual Environment (VE).]

Figure 6. Mixed Reality Continuum [15]

In our context, users on board vessels find themselves alternately in three of the four situations of Mixed Reality:

1. The Real World, in which they are naturally immersed.
2. Augmented Reality, by adding useful "field specific information" through the vision system detailed later on.
3. A Virtual Environment adapted to the use explained hereafter.

2.1 The navigation tool

The main navigation tool centralizes the data produced by all the embedded hardware (radar, depth-sounder, GPS, etc.) and combines them with nautical, coastal or river chart databases.

Figure 7. Information flow

The navigation tool informs the user of the state of the environment through a visualization that combines these different pieces of information according to the user's commands via the Human Computer Interface (Figure 7). Although based on 2D nautical charts, current software very often proposes 3D views in order to increase realism and thus optimize the orientation required by the user. In this tool, we commonly find the following objects, directly derived from the field of navigation:

1. WayPoints (WP): Object representing a buoy signalling a specific geographical position

2. Route: Succession of points that the user needs in order to plan a route


3. Trace: Succession of points where the vessel has already sailed

4. Targets: Two major families of targets: ARPA and AIS

o ARPA: Automatic Radar Plotting Aid targets (moving or not) from an echo radar.

o AIS: Automatic Identification System: System for automatically exchanging messages between vessels.

o Man Overboard.

All these objects are present in navigation software and are displayed on charts. In Figure 8, we can see a vessel (red), two WayPoints (yellow with a black star) and a Route (red) added to the chart's display. These objects, drawn vectorially, can be moved by the user (except the boat, whose location depends on the GPS position).

Figure 8. View of the Virtual Environment

3. PILOTING THE VISION SYSTEM

We propose, in a single application, two distinct but complementary presentations (AR and VE) for the user.

We placed a motorized video camera on a vessel. This camera is motorized on two axes: a 360° azimuth rotation axis and an elevation rotation axis of approximately 90°. It includes a dual-axis gyroscope to compensate for the boat's movements on the water. This camera also has the particularity of having a classical black and white vision mode (daylight/lowlight), making it possible to see during daylight and with little exposure, and an ultra-sensitive thermal vision mode in the mid-infrared (wavelength 2.5–25 µm). The video flow is embedded in the software's main screen, as can be seen in Figures 14 and 15.

The motorized camera can also be piloted in four different ways. The first consists of using explicit commands. When the cursor is placed in the video flow window a menu appears enabling the user to act directly on the camera’s axes as well as the zoom.

The second possibility is implicit piloting, which does not require the camera's degrees of freedom to be directly handled. From the chart's contextual menu (or that of any object drawn on the screen) it is possible to ask the camera to point in the direction of the object or area in question, as illustrated in Figure 9, by simply clicking. We have called this functionality "Clicked Reality", after the "Clickable Reality" highlighted by Laurence Nigay in [13].

It is possible to follow a target (whether moving or not); we shall detail this functionality in Section 5. Once the user has chosen the target, the camera's orientation can be locked on this target from a contextual menu in the virtual environment (Figure 9), so that the target does not leave its field of vision. Technically, this locking works by updating the camera's orientation in real time according to the target's position as well as the boat's position and orientation.
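A minimal sketch of this kind of update follows. It uses our own flat-earth approximation and hypothetical names; the actual MaxSea implementation is not described in this detail. It converts the boat's GPS position and heading plus the target's georeferenced position into the azimuth and elevation commands for the motorized camera.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def camera_angles(boat_lat, boat_lon, heading_deg, target_lat, target_lon, camera_height_m=5.0):
    """Azimuth (relative to the bow) and elevation to aim the camera at a sea-level target."""
    # Local flat-earth offsets from boat to target (adequate for short ranges).
    north = math.radians(target_lat - boat_lat) * EARTH_RADIUS_M
    east = math.radians(target_lon - boat_lon) * EARTH_RADIUS_M * math.cos(math.radians(boat_lat))

    bearing = math.degrees(math.atan2(east, north)) % 360.0    # true bearing to the target
    azimuth = (bearing - heading_deg) % 360.0                  # relative to the boat's heading
    distance = math.hypot(north, east)
    elevation = -math.degrees(math.atan2(camera_height_m, distance))  # slightly below horizontal
    return azimuth, elevation

# Boat heading east, target roughly north-east of it, about 1.4 km away.
print(camera_angles(43.48, -1.56, 90.0, 43.49, -1.55))
```

In the locked mode described above, such a computation would simply be re-evaluated each time a new target fix or boat pose arrives.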

Figure 9. Left: Radar + Boat + WP. Right: Contextual Menu

The third camera-piloting mode is based on using WayPoints, which represent a position in the real environment. WayPoints are created by users, who can modify their position (amongst other things) as they please. If the user decides to lock a WayPoint type target with the camera, it will then be possible to pilot it by moving the WayPoint in the virtual environment.

Finally, the fourth possibility for piloting the camera is the automatic supervision mode illustrated in Figure 10, where an object floating on the surface of the water is visible thanks to the camera's thermal mode. This information, once processed, may trigger an alarm in the navigation platform, like the SAFRAN Unidentified Floating Object Detection Module [24].

Figure 10. Supervision enabling obstacles to be highlighted

4. EXAMPLE OF ENRICHMENT

We propose using the video flow to enrich information from the assisted maritime navigation application's virtual environment. This is an augmented reality system with a video monitor (indirect vision), where the virtual world is augmented by the video flow (the fifth category of the taxonomy introduced by Milgram in 1994 [15]). Technology Systems' LookSea project [20] underlines the fact that the current state of direct vision technology is not compatible with use at sea, especially because of the difficulty caused and the reduction in the field of vision. It is, however, worth observing that the camera is not fitted to the visualization device, but placed somewhere on the vessel such that the camera's position with regard to the GPS antenna is known. As we are in a mobile context, augmenting the video flow faces certain difficulties [23] in calculating the camera's pose. It is impossible to equip all the real-world elements with artificial markers [21, 19, 3]. We must therefore implement a calculation of the camera's pose using a "markerless" system. Such techniques use natural characteristics existing in the real scene, such as corners, contours and line segments [22, 5, 9], which are invariant to certain transformations. We have opted for integrating a georeferenced video flow into a virtual environment where all the elements are themselves georeferenced, as in [17, 1].
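Once the camera pose is known, drawing a georeferenced annotation reduces to a standard pinhole projection. The sketch below is our own simplification with assumed intrinsic parameters, not the MaxSea rendering code; it projects a point already expressed in the camera frame onto the video image.

```python
import numpy as np

def project(point_cam, fx=800.0, fy=800.0, cx=640.0, cy=360.0):
    """Project a 3D point (metres, camera frame, z forward) to pixel coordinates, or None if behind."""
    x, y, z = point_cam
    if z <= 0.0:
        return None                       # behind the camera: nothing to overlay
    u = fx * x / z + cx                   # assumed focal lengths and principal point
    v = fy * y / z + cy
    return u, v

# A georeferenced buoy 3 m to starboard, 1 m below the optical axis, 120 m ahead.
print(project(np.array([3.0, 1.0, 120.0])))
```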

Using Fuchs & Moreau's functional taxonomy [16], we shall show in four points how our assisted maritime navigation augmented reality platform relates to accepted concepts and how it proposes new exploratory fields.

4.1 Documented Reality

Like Fuchs & Moreau's Documented Reality functionality, our video flow can be enriched with information identifying what is visible from the camera, but without alignment between the real and virtual worlds (Figure 11). We can see that this functionality does not respect the third property of augmented reality systems proposed by Azuma [2]. In Figure 12 we illustrate tide information where the height is represented by filling a small "capsule" and the tide's movement represented by a directional arrow.

Figure 11. Simple non-registered synthetic information: documented scene

Figure 12. Capsule showing the height of the tide and its dynamics

4.2 Reality with Augmented Comprehension (or Visibility)

This involves overlaying semantic information, such as the function or reference of real objects, on the real scene's images [8]. We use the georeferenced data of objects to inform users of their position, as in Figure 13, for example, where we can see information showing where port entrances are located with an onboard view. In this functionality, it is possible to take into account the thermal part of our vision system, whose flow is also augmented, as presented above in Figure 10.

Figure 13. Registered synthetic information enables the scene to be read more effectively

4.3 Combining Real & Virtual

This functionality represents virtual objects added to the real scene, or replacing real objects by virtual objects like 3D representations of the coast or seabed. In addition to the problem of aligning two worlds we are also faced with the problem of masking the real world with the virtual world or vice-versa. This problem of concealing certain elements can be partially solved thanks to image segmentation techniques (thresholding, cutting into regions, etc.), enabling the sea and the sky to be dissociated from other elements in the real scene. Given field-specific hypotheses, priority is given to adding virtual elements.

To align the two worlds, we use an inertial unit located on the vessel that takes into account the boat's roll, pitch and yaw. This unit has a three-axis accelerometer, a three-axis gyroscope and two distant GPS receivers whose phase difference enables the vessel's angular degrees of freedom to be calculated. The boat's movements are thus compensated by directly applying these angular displacements to the 3D environment virtual camera.
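A minimal sketch of this compensation follows; the rotation convention is our own, and the actual axis conventions of the inertial unit may differ. It builds a rotation from the measured roll, pitch and yaw and applies its inverse to the virtual camera.

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """Z-Y-X rotation built from yaw, pitch and roll (radians); convention assumed, not the vendor's."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])    # roll about x
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])    # pitch about y
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])    # yaw about z
    return Rz @ Ry @ Rx

# Attitude measured by the inertial unit (dummy values, in degrees).
boat_R = rotation_matrix(*np.radians([3.0, -1.5, 0.5]))
camera_correction = boat_R.T    # inverse rotation applied to the virtual camera keeps the overlay aligned
print(np.round(camera_correction, 3))
```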

4.4 Virtualized Reality

We go from one environment to another. On the one hand, our platform consists of a VE fully representing the real environment (charts, weather, sea-bed, coasts, nearby vessels, etc.), which can be used to substitute reality. On the other hand, we also have the possibility of augmenting the video flow with virtual vectorial objects which have no physical reality (WayPoints, Routes, Traces, Buoys, etc.), as in Figure 15.

4.5 Visualization Modes

Our platform's design offers the user two visualization modes. The first visualization mode integrates a thumbnail image of the augmented video flow in the virtual environment, as in Figure 14. The second visualization mode, as presented in Figure 15, enables the augmented video flow to be visualized in the main part of the screen, with a thumbnail image representing the virtual environment. We also allow users to modify the video flow's transparency value. This enables the virtual environment's chart to show through the video flow. In Figure 16, the top image's transparency value has been changed so as to show the chart.
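The transparency control amounts to a standard alpha blend of the video frame over the chart rendering. The following is our own illustration, assuming 8-bit RGB frames of the same size, not the product's code.

```python
import numpy as np

def blend(chart_rgb: np.ndarray, video_rgb: np.ndarray, transparency: float) -> np.ndarray:
    """transparency = 0.0 shows the video opaquely; 1.0 lets the chart show through completely."""
    alpha = 1.0 - transparency
    mixed = alpha * video_rgb.astype(np.float32) + (1.0 - alpha) * chart_rgb.astype(np.float32)
    return mixed.astype(np.uint8)

chart = np.full((720, 1280, 3), 200, dtype=np.uint8)   # dummy chart rendering
video = np.zeros((720, 1280, 3), dtype=np.uint8)       # dummy video frame
print(blend(chart, video, 0.3).mean())                 # 30% of the chart shows through
```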


Figure 14. VE augmented video flow thumbnail image

Figure 15. VE thumbnail image in the augmented video flow

Figure 16. Combining charts and video flows

5. CONCLUSION

In a context of major developments in Human Computer Interfaces (HCI) for mobile systems, we explore the possibilities offered by creating a new mixed environment by integrating, in a single application, a rich virtual environment and an augmented reality environment. We try to satisfy users' needs, which vary according to sailing conditions. The functionalities provided by augmented reality must therefore differ according to people and weather conditions, hence the need to provide contextual information.

6. FUTURE WORK

Given the exploratory nature of this platform, we consider several fields of work important. These can be divided into two categories. The first refers to the technology and the second relates to the human factor.

From the technological point of view, aligning the real world and virtual world remains a challenge, which the boat's movements do not facilitate. The precision of GPS data and recognizing shapes in image analysis are complex issues which still need to be dealt with.

Concerning the human factor, we propose determining the extent to which this platform helps users to satisfy their orientation needs. Under which conditions is it more natural to use a VE or AR to navigate, and to what extent is it possible to contextualize the information?

Secondly, we would like to extend our platform to enable it to be generalized as an AR platform with a VE which can be used both at sea (on boats) and on land (by car or on foot).

7. REFERENCES

[1] Behzadan, A. H., and Kamat, V. R. Georeferenced Registration of Construction Graphics in Mobile Outdoor Augmented Reality. Journal of Computing in Civil Engineering 21, 4 (July 2007), 247–258.

[2] Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., and MacIntyre, B. Recent advances in augmented reality. IEEE Computer Graphics and Applications 21, 6 (2001), 34–47.

[3] Cho, Y., Lee, J., and Neumann, U. A Multi-ring Color Fiducial System and an Intensity-Invariant Detection Method for Scalable Fiducial-Tracking Augmented Reality. In IWAR (1998).

[4] Coastal Explorer. http://rosepointnav.com/default.htm, October 2009.

[5] Comport, A., Marchand, F., and Chaumette, F. A Real-time Tracker for Markerless Augmented Reality. In ACM/IEEE Int. Symp. on Mixed and Augmented Reality, ISMAR'03 (Tokyo, Japan, October 2003), 36–45.

[6] Damasio, A. Descartes' Error. BasicBooks, 1983.

[7] Dey, A., and Abowd, G. Towards a Better Understanding of Context and Context-awareness. In Proceedings of the 2000 Conference on Human Factors in Computing Systems (The Hague, The Netherlands, April 2000).

[8] Didier, J.-Y. Contribution à la dextérité d'un système de réalité augmentée mobile appliquée à la maintenance industrielle. PhD thesis, Université d'Evry, December 2005.

[9] Drummond, T., and Cipolla, R. Real-time Tracking of Complex Structures for Visual Servoing. In Workshop on Vision Algorithms (1999), 69–84.

[10] Furuno. http://www.furuno.fr/, October 2009.

[11] Garmin. http://www.garmin.com/garmin/cms/site/fr, October 2009.

[12] Goleman, D. Emotional Intelligence. New York: Bantam Books, 1995.

[13] Nigay, L., Renevier, P., Marchand, T., Salembier, P., and Pasqualetti, L. La réalité cliquable : Instrumentation d'une activité de coopération en situation de mobilité. Conférence IHM-HCI 2001, Lille (2001), 147–150.

[14] MaxSea. http://www.maxsea.fr, October 2009.

[15] Milgram, P., and Kishino, F. Taxonomy of mixed reality visual displays. IEICE Transactions on Information Systems E77-D, 12 (December 1994), 1–15.

[16] Fuchs, P. Les interfaces de la réalité virtuelle. La Presse de l'Ecole des Mines de Paris, ISBN 2-9509954-0-3 (1996).

[17] Schall, G., Mendez, E., Kruijff, E., Veas, E., Junghanns, S., Reitinger, B., and Schmalstieg, D. Handheld augmented reality for underground infrastructure visualization. Personal and Ubiquitous Computing, Special Issue on Mobile Spatial Interaction 13, 4 (May 2008), 281–291.

[18] Schilit, B., Adams, N., and Want, R. Context-aware Computing Applications. 1st International Workshop on Mobile Computing Systems and Applications (1994), 85–90.

[19] State, A., Hirota, G., Chen, D., Garrett, W., and Livingston, M. Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking. Computer Graphics (Annual Conference Series) 30 (1996), 429–438.

[20] Technology Systems Inc. Augmented reality for marine navigation. Tech. rep., LookSea, 2001.

[21] Hoff, W. A., Nguyen, K., and Lyon, T. Computer Vision-Based Registration Techniques for Augmented Reality. Intelligent Robots and Computer Vision XV, in Intelligent Systems and Advanced Manufacturing, SPIE 2904 (November 1996), 538–548.

[22] Wuest, H., Vial, F., and Stricker, D. Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality. In ISMAR'05: Proceedings of the Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality (Washington, DC, USA, IEEE Computer Society, 2005), 62–69.

[23] Zendjebil, I. M., Ababsa, F., Didier, J., Vairon, J., Frauciel, L., and Guitton, P. Outdoor Augmented Reality: State of the Art and Issues. 10th ACM/IEEE Virtual Reality International Conference (VRIC 2008), Laval, France.

[24] SAFRAN, UFO Detection. http://www.safran-group.com


Skier-ski system model and development of a computer simulation aiming to improve skier’s performance and ski

François ROUX
INSEP, Laboratoire d'Informatique Appliquée au Sport
11 avenue du Tremblay, 75012 Paris, FRANCE
+33 492 212 477
[email protected]

Gilles DIETRICH
UFR STAPS, Laboratoire Action, Mouvement et Adaptation
1 rue Lacretelle, 75015 Paris, FRANCE
+33 156 561 245
[email protected]

Aude-Clémence DOIX
INSEP, Laboratoire d'Informatique Appliquée au Sport
11 avenue du Tremblay, 75012 Paris, FRANCE
+33 674 711 601
[email protected]

ABSTRACT

Background. Based on personal experience of ski teaching, ski training and ski competing, we have noticed that some gaps exist between classical models describing body-techniques and the actual motor acts made by performing athletes. The evolution of new parabolic shaped skis with new mechanical and geometric characteristics increases these differences even more. Generally, scientific research focuses on situations where skiers are considered separately from their skis, and many specialized magazines, handbooks and papers print articles with a similar epistemology. In this paper, we describe the development of a three-dimensional analysis to model the skier-skis system. We subsequently used the model to propose an evaluation template to coaches that includes eight techniques and three observable consequences, in order to make objective evaluations of their athletes' body-techniques. Once the system is modeled, we can develop a computer simulation in the form of a jumping jack, respecting the degrees of freedom of the model. We can manipulate the movement of each body segment or the characteristics of the skis and gear to detect performance variations. The purpose of this project is to elaborate assumptions to improve performance and to propose experimental protocols to coaches to enable them to evaluate performance. This computer simulation is also relevant to board and wheeled sports.

Methods. Eleven elite alpine skiers participated. Video cameras were used to observe motor acts in alpine skiers in two tasks: slalom and giant slalom turns. Kinematic data were input into the 3D Vision software. Two on-board balances were used to measure the six components of the ski boot-to-ski torques. All data sources were then synchronized.

Findings. We found correlations between the force and torque measurements, the progression of the center of pressure and the eight body-techniques. Based on these results, we created a technological model of the skier-ski system. We then made a reading template and a model to coach young alpine skiers in clubs and World Cup alpine skiers, and we have obtained results demonstrating the usefulness of our research.

Interpretation. These results suggest that it is now possible to create a three-dimensional simulator of an alpine skier. This tool is able to compare competitors' body-techniques to detect the most effective body-techniques. Additionally, it is potentially helpful to consider and evaluate new techniques and ski characteristics.

General Terms
Measurement, Performance, Experimentation.

Keywords Skier-ski system; Computer simulation; Techniques Reading template; Elite Skiing.

1. INTRODUCTION

There are gaps between classical models describing body-techniques judged to be efficient and the motor acts performed by athletes that we have observed during ski racing competitions. Therefore, we decided to launch a research study to analyse alpine skiing kinematics.

We believe that analysis of slalom or giant slalom turn movements in alpine skiing is necessary to improve the current understanding of elite alpine skiing performance. Several studies have already been carried out on alpine skiing biomechanics: analysis of the carving turn with a combination of kinematic, electromyographic and pressure measurement methods [1]; analysis of the influence of ski stiffness and snow conditions on the turn radius [2]. Research in ski racing has used motion capture with inertial measurement units and GPS to collect biomechanical data and improve performance [3].

A three-dimensional movement analysis of the alpine skier while turning is a first step. The modeling of the skier and his/her technique will allow the development of a computer simulation in which every body segment will be modeled and potentially manipulated to interpret some parameters.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Augmented Human Conference, April 2–3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.


The last decade has seen the transformation of alpine ski equipment, notably in terms of the geometric and mechanical characteristics of skis, due to new materials as well as the arrival of snowboarders. Research devices based on micro-computing and video cameras have also considerably improved.

On account of this evolution, we have developed a motion analysis system as a tool for alpine skiing research. The scientific literature is generally dedicated to the study of the alpine skier’s body, independently of the equipment [2]. Our proposal is to consider skier and skis as a system divided into several sub-systems: skis, lower limbs, trunk/head and upper limbs, which all interact at the coxal joint.

Modeling allows us to propose a reading template to coaches and instructors, which can be useful for analysing and understanding champions’ performance. It comprises eleven observation benchmarks: eight specific body-techniques and three mechanical consequences. The goal is to use and control these body-techniques simultaneously to optimize efficiency according to the possible trajectory on the race layout and the relief of the slope.

We assume that the simulator will be a tool for coaches to improve alpine skier performance and to support the development of new skis.

The purpose of this article is three-fold. First, a 3D analysis of eleven elite alpine skiers performing a giant slalom turn or two slalom turns in a real skiing situation: we have built a three-dimensional model of the skier, highlighting each body segment. Second, a computer simulation in which we can manipulate the movement of each body segment or the characteristics of the ski equipment to detect performance variations. Third, the eight body-techniques (with their three consequences), used as a reading template for coaches to improve skiers’ efficiency; we provide details of one of them as an example.

2. SKIER-SKIS’ SYSTEM MODELING

2.1 Methods

The first experiment took place in Les Saisies during an ISF (International Ski Federation) race, with nine skiers. The second experiment took place on the Grande Motte Glacier in Tignes, and the last one on the Mont de Lans Glacier in Les Deux-Alpes. Eleven French elite alpine skiers (World Cup level) participated. During the first experiment, we worked with the FFS (French Ski Federation) and videotaped nine athletes during a Giant Slalom ISF race. The French teams’ director obtained the agreement of the race referee to put video cameras on the sides of the slope. The aim was to measure kinematic data of the skiers’ techniques in a real race situation, to find out how alpine skiers start a turn and pilot an efficient trajectory. The distance between two Giant Slalom gates is 20 meters. We used two video cameras and oriented the optical field of the second camera to make a 30-degree angle with the optical field of the first. The common optical field of both video cameras was oriented towards the place where the alpine skiers started to turn. The distance between the first camera and the slope was 20 meters, and the distance between the second camera and the slope was 30 meters, both with maximal zoom. Data from the video were then collected.

The second experiment involved one female alpine skier performing the Giant Slalom. We created a mechanical model of the skier-ski system thanks to Rossignol R&D, who provided us with an on-board balance to measure the ski-boot→ski and snow→ski torques. The balance was inserted between the ski boot and the ski (figure 1). The skier had to carry the data acquisition units of the on-board balance herself (figure 1). Data were sampled at 936 Hz. Six video cameras were located on the sides of the analyzed turn, three on the left side and three on the right side.

Nineteen focal points, made of fluorescent yellow and black squares, tennis balls or scotch-tape, were placed on the skis and on the subject (anatomical landmarks) on each side: one on the tip of the ski; one on the tail of the ski; one on the front binding; one on the ski boot at the ankle; one on the knee; one on the hip; one on the elbow; one on the hand; one on the shoulder; one on the top of the helmet. Data from the subject were input into the 3D Vision software [4] and treated by a DLT (Direct Linear Transformation) algorithm [5].

Figure 1. Unit data acquisition, focal points and on-board balance


Every 1/25th of a second, the position of each focal point located on the subject was calculated. That way, the skier-ski system is located in the experiment space. The six video cameras were connected to a synchronization device. To synchronize the kinematic data (from the video cameras) with the on-board balance, we used a remote control which sent a signal to the data acquisition units (on-board balance data) and which switched on a diode in the cameras’ field of vision. The time-code value matching the lighting of the diode is the time origin of the kinematic data. The signal recorded in the data acquisition units is the time origin of the torque data. We matched the two origins (common event: the remote control signal) and reduced the on-board balance frequency to 25 Hz to synchronize all data. We kept the data from when the subject skied through the experiment space: from the internal piquet of the upper gate to the internal piquet of the lower gate. Data from the focal points on the subject and from the on-board balance were input into the 3D Vision software and treated by a DLT algorithm.

Finally, three male alpine skiers and one female alpine skier participated in our last experiment. They skied the Giant Slalom and Slalom. Kinematic data were acquired in the same way, except that the video cameras were put on scaffolding to avoid the spray of snow from the ski turns, which prevented the lower focal points from being seen. We used two Kistler on-board force plates to measure the ski-boot→ski and snow→ski torques. The force plates were inserted between the ski boot and the ski. The skiers had to carry the data acquisition units of the on-board balance. Data were sampled at 936 Hz. To collect 3D kinematic data, the measurement area was first calibrated. The area marking was realized with focal points made of fluorescent yellow and black squares and tennis balls impaled on wooden sticks jabbed into the snow in the sight-line of the video cameras. Focal points were also put on the bottom of the internal piquet of the upper gate and on the bottom of the internal piquet of the lower gate. Focal point data from the slalom gates and from the subjects were input into the 3D Vision software and treated by a DLT algorithm.

We calculated both the position and the progression of the center of mass for each experiment. To determine the center of mass position, we had to calculate the mass, m1, m2, m3… mn, of each segment of the skier and of each piece of ski equipment, then determine the position of the center of gravity M1, M2, M3… Mn of each body segment and piece of ski equipment in a space defined according to a frame. The position of the skier-ski system’s center of gravity is calculated from the positions PM1, PM2, PM3… PMn of the centers of gravity of the body segments and ski equipment.
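Concretely, the system’s center of gravity is the mass-weighted mean of the segmental centers; a minimal statement of the formula implied above is:

P_G = \frac{\sum_{i=1}^{n} m_i \, P_{M_i}}{\sum_{i=1}^{n} m_i}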

3. COMPUTER SIMULATOR

3.1 Tridimensional modeling

We consider the skier-ski system as an articulated system divided into several sub-systems (figure 2). The first one, where the snow→ski torque is applied, is itself made of the sub-systems skis and lower limbs. The second one, which is articulated with the previous one at each coxal joint, is constituted by the trunk, the head and the upper limbs.

Figure 2. Skier-skis’ system and sub-systems

We assume that we can better improve ski development and training without separating skier and skis, contrary to a study of the effects of ski and snow cover on the alpine skiing turn which did not study the skier and used a sledge in place of the skier [2].

For the kinematic study, the skier system is represented as a mechanical structure made of many stiff, rigid segments articulated with each other. Their movements are determined by the degrees of freedom allowed by skeletal anatomy. The torques applied at the joints have a muscular origin.

The ski-boot→ski torque is made up of the three components of the support reaction (along the three axes) and of the three moments composing the resulting torque, which result from the distance between the application point of the support reaction and the origin of the binding mounting mark (engraved on the topsheet by the ski manufacturer).

3.2 Computer representation

From the first model obtained with the 3D Vision software (figure 3), we have determined the position of some segmental centers of mass, using an anthropometric model [6][7]. Then, we calculated the global center of mass. Those segments have been chosen because they can produce the eight fundamental techniques that the skier uses to control him/herself.


The software works with a DLT (“Direct Linear Transformation”) algorithm. It connects the real coordinates (according to a known frame) of the focal points which are in the common optical field of the video cameras with the coordinates, collected on the computer screen, of each focal point recorded by each video camera. This calculation makes it possible to rebuild the position of the moving point in the real experimentation space. The point which moves on the screen then owns the 3D characteristics of the real moving point’s kinematics.
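As an illustration of how such a reconstruction can be computed, the sketch below implements the standard 11-parameter DLT (our own minimal version, not the 3D Vision implementation; function names are ours):

    import numpy as np

    def calibrate_dlt(xyz, uv):
        # Estimate the 11 DLT parameters of one camera from >= 6 control
        # points with known 3D coordinates (xyz, Nx3) and screen coordinates (uv, Nx2).
        A, b = [], []
        for (x, y, z), (u, v) in zip(xyz, uv):
            A.append([x, y, z, 1, 0, 0, 0, 0, -u*x, -u*y, -u*z]); b.append(u)
            A.append([0, 0, 0, 0, x, y, z, 1, -v*x, -v*y, -v*z]); b.append(v)
        L, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
        return L  # parameters L1..L11

    def reconstruct_point(dlt_params, uv_per_camera):
        # Rebuild one 3D point from its screen coordinates in two or more
        # calibrated cameras (least-squares solution of the DLT equations).
        A, b = [], []
        for L, (u, v) in zip(dlt_params, uv_per_camera):
            A.append([L[0]-u*L[8], L[1]-u*L[9], L[2]-u*L[10]]); b.append(u - L[3])
            A.append([L[4]-v*L[8], L[5]-v*L[9], L[6]-v*L[10]]); b.append(v - L[7])
        xyz, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
        return xyz

With two cameras calibrated this way, each focal point seen in both image planes yields one 3D position per video frame.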

Figure 3. First modeling

The 3D computer representation distinguishes each body segment, the coxal joint, the progression of the center of gravity, the feet trajectories, and each ski-boot→ski torque (represented by arrows, figure 4).

The computer simulator makes it possible to manipulate the model in order to measure disturbances on the system: trajectory changes; a joint order; a ski-boot→ski torque transformation which corresponds to a modification of a mechanical or geometric characteristic of a given ski. Thus, we will be able to discover, in the short term, new ways to improve performance.

Figure 4. Jumping jack, computer simulator

With ID3D, the latest version of the 3D Vision software, we can pilot the progression of each segment of the jumping jack with the kinematics captured from the analysis of the skier’s movements. We can give the jumping jack anthropometric characteristics modeled by Hanavan and apply the contextual torques that constrain it. We can then measure data.

With the computer simulator, we aim at two goals. The first is didactic: to show coaches, trainees or athletes (beginners or experts) some biomechanical causes which affect the skier-skis’ system and whose understanding is useful either to give instructions or to act. The second is technological: it consists in imposing technical instructions on the computer jumping jack, or modifying some equipment characteristics, to measure their consequences. The goal is then to make assumptions about the evolution of techniques, ski modeling, and the relations between the evolution of the ski structure and the torques produced in a specific situation.

4. RESULTS

Figure 4 shows the segmental modeling with the resultant forces of the ski→ski-boot torques, and also the direction of the global acceleration of the center of mass. We calculated the variations of joint angles to underline the technique used by each subject to pilot himself. We made statistical comparisons which make our empirical model objective. The next two graphs (figures 5 and 6) show in 2D the variations of lateral knee inclination relative to the on-edge angle. The graphs show a dispersion in size and timing but also a similar shape, showing that the action is realized by every racer. So, it is a fundamental technique to make the on-edge angle vary and to build our biomechanical and technological models.

Figure 5. Angle right shin tip up


In that way, contrary to the classical model, and comparing our results with the skeletal determinisms of our species described by physiologists [8], we have shown that the most efficient skiers use the technique of lower-limb plane rotation: the plane defined by the axes from the middle of the ski boot to the knee and from the knee to the coxal joint rotates around the axis from the middle of the ski boot to the coxal joint, through a femoral adduction on the pelvis, creating a lateral inclination of the outer knee. The aim is to vary the on-edge angle without moving the center of mass onto the inner foot, in order to create a radial component at the snow→ski contact that sets off the steering change the skier wants to make.

5. EXAMPLE OF A BODY-TECHNIQUE

Eight techniques have been highlighted by a ski motion analysis [9]. Because of their relevance both for world-level coaches and for physical theory, they have been defined from these two points of view. For example, the body-technique called “shin tip up” became, for coaches, “lateral knee inclination”. The first definition follows the physicist’s logic. The second translates it into a technique useful to coaches, because they can directly discern and describe it with the words used in their professional field, while the biomechanical conception holds for the mathematical definition, keeping the same body frames.

According to the laws of mechanics, and because of the mechanical and geometric characteristics of skis, three conditions are required to produce a directional effect: sliding (gravity is the skier’s engine); ski loading (a deforming effort on the ski); and an on-edge angle. To create and control this directional effect, skiers produce eight techniques (actually nine, but one of them is not an action but a consequence: the lunge [9]). These eight techniques serve these three aims, but during skiing they constantly interact and form a system. In the following, we focus on one technique: the lateral knee inclination.

5.1 Lateral knee inclination

This technique controls the variations of the on-edge ski angle while loading the external foot more and leaning the ski. The observation is made by watching the skier from a front or rear view.

This technique is evaluated by the size of the angle formed between the axis from the middle of the ski boot to the knee and the slope’s plane (figure 7). It is the only biomechanical way to increase the external on-edge ski angle without moving the center of gravity inwards. It creates a directional effect and determines the future upper ski. It is only possible when the external knee bend is about 120° [10].

The variations of lateral knee inclination relative to the on-edge angle are measured in the moving frame (o, xr, yr, zr) (figure 8). The binding mounting mark designed by the ski constructor is the origin (OS1), and it becomes the middle of the ski boot for the coach. This is a negligible approximation (for the observer): the ski boot is held by the ski bindings, where the middle of the ski boot matches the binding mounting mark but is separated from it by the thickness of the ski / ski-binding interface.
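For illustration, this angle can be computed from the reconstructed marker coordinates as follows (a sketch with hypothetical marker names and a known slope normal, not the exact processing chain of the study):

    import numpy as np

    def lateral_knee_inclination(boot_middle, knee, slope_normal):
        # Angle (in degrees) between the "middle of ski boot -> knee" axis
        # and the slope plane: the arcsine of the projection of the unit
        # axis onto the unit slope normal.
        axis = np.asarray(knee, float) - np.asarray(boot_middle, float)
        n = np.asarray(slope_normal, float)
        s = abs(np.dot(axis, n)) / (np.linalg.norm(axis) * np.linalg.norm(n))
        return np.degrees(np.arcsin(np.clip(s, 0.0, 1.0)))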

Figure 7. Lateral knee inclination

Figure 8. Variation knee inclination

Figure 6. Angle left shin tip up


5.2 Observation for coaches

We have transposed this biomechanical conception to the coaches’ field, completing the technique’s description with a cause which makes sense when giving an instruction: the lateral outer knee inclination is the only way to vary, within joint limits, the on-edge angle of the outer ski while always keeping a predominant load on the outer ski in order to create a directional effect. This technique becomes possible when the skier’s knee flexion is around 120°, because it releases the second degree of freedom of the knee, which allows the leg to be fixed [8][10]. We propose below two synchronized pictures (figures 9 and 10) illustrating the words invented to pass on our evaluations and our technological conceptions, comparing a technique used by the winner with the one used by our athlete.

We can notice that the weight, which can be measured by scales, changes if the skier performs flexing, extension and/or segmental movements. Those load variations are due to the speed of the altitude variations of the center of gravity or of the moving body segments (accelerations).

This causes load variations which deform the skis and change the trajectory radius according to the demands of the race layout (interactions between loading, ski edge angle and ski characteristics). The on-board balance inserted between the ski boot and the ski, measuring the ski→ski-boot torque, highlights the efforts applied to the articulated system which is the skier’s body. Those data are necessary for the dynamical modeling of the skier system and of the ski system. Moreover, the forward-backward hip inclination, with the knee bent at about 120° and combined with the forward-backward trunk inclination, aims to keep the pressure on the bearings (ankle core) whatever the friction force between snow and skis.
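As a reminder of the mechanics behind this observation (our formulation, not stated explicitly in the paper), the vertical load transmitted to the skis follows the vertical acceleration of the center of gravity:

F_z \approx m \, (g + \ddot{z}_G)

so any flexing or extension that accelerates the center of gravity changes the measured load even though the skier’s mass is constant.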

The observation of this technique only makes sense if the coach is able to connect these elements and think about the mechanical consequences (modeling). The amplitude, the rhythm and the timing determine the trajectory. The only goal is performance.

6. DISCUSSION

The modeling of the system, from this experimental method, now seems possible. This study went further than results obtained with 3D motion analysis alone [3] because it has highlighted some factors to improve performance. Nevertheless, the investigation must be pursued into the interactions between the skis’ mechanical and geometric characteristics, the ski→ski-boot torque and the torque applied to the ski by the snow cover. Snow cover properties change the on-edge angle in the steering phase and the loading on the ski [2].

Let us recall that the technique is defined according to articular or material marks taken from the body or the equipment. It corresponds to a technique seen as pertinent for performance with current skis, and it also corresponds to a goal of varying the ski-snow efforts or the aerodynamic effort characteristics.

This technical model of the skier is a tool for the coach and the skier to improve training. The body-technique described is referred to the middle of the ski boot of the same side because the articulated skier-skis’ system is mostly guided by the efforts at the ski-snow contact. Mostly, because the aerodynamic strain, which depends on the speed and the skier’s shape, also affects the guiding of the system, but more weakly than the ski-snow efforts. It has also been shown that the sagittal balance (which we call forward-backward inclination) is an important factor for performance [1].

It is still hard to predict loads on skis from the skier by electromyographic study, because estimations from EMG are barely reliable [11].

With the computer simulator, it is possible to apply to the ski the measured ski→ski-boot torque and the torque applied to the ski by the snow cover. It is also possible to apply the measured on-edge angle. That way, the static load distribution is known. The computer simulator can determine the dynamic torques of the skis and bindings (on-board balance measurements). Then, we can link the loads applied to the ski by the skier and by the snow to the ski materials and the ski structure. The simulator manipulates torques and skier-skis’ system characteristics. We can thus find out what structure/material of the ski improves the skier’s performance.

Figure 9. Picture of the winner

Figure 10. Picture of our athlete


Without replacing the 3D motion analysis of skiers in their contexts, or the measurement of the external constraints applied to skiers, the computer simulator will make it possible to impose univocal constraints on the skier-ski system, reducing experimental uncertainties, and to formulate assumptions easily. In that way, with coaches, ski constructors and researchers, studies will be led to build and evaluate experimental protocols to improve ski development.

Let us add that the computer simulator is not only intended for alpine skiing, but also for board and wheeled sports. We can manipulate the movement of each body segment (on the jumping jack) or the characteristics of the ski equipment to detect and even simulate performance variations.

7. ACKNOWLEDGMENTS

Our thanks go first to the memory of Alain Durey; to Rossignol Ltd. for providing us with devices; and to the French Skiing Federation for allowing us to work with athletes.

8. REFERENCES

[1] Müller, E., Schwameder, H. 2003. Biomechanical Aspects of New Techniques in Alpine Skiing and Ski-jumping. Journal of Sports Sciences, 21, 679-692.

[2] Nachbauer, W., Kaps, P., Heinrich, D., Mössner, M., Schindelwig, K., Schretter, H. 2006. Effects of Ski and Snow Properties on the Turning of Alpine Skis – A computer Simulation. Journal of Biomechanics, 39, Suppl.1, 6900.

[3] Brodie, M., Walmsley, A., Page, W. 2008. Fusion Motion Capture: A Prototype System Using Inertial Measurement Units and GPS for the Biomechanical Analysis of Ski Racing. Journal of Sports Technology, 1, 17-28.

[4] Maesani, M., Dietrich, G., Hoffmann, G., Laffont, I.,

Hanneton, S., Roby-Brami, A. 2006. Inverse Dynamics for 3D Upper Limb Movements - A Critical Evaluation from Electromagnetic 6D data obtained in Quadriplegic Patients. Ninth Symposium on 3D Analysis of Human Movement. Valenciennes.

[5] Abdel-Aziz, Y.I., Karara, H. M. 1971. Direct linear Transformation from Comparator Coordinates into Object Space Coordinates in Close-Range Photogrammetry.

[6] Hanavan, E. P. 1964. A mathematical Model of the Human Body. AMRL-TR-64-102, AD-608-463. Aerospace Medical Research Laboratories, Wright-Patterson Air Force Base, Ohio

[7] Miller, D.I., Morrison, W. 1975. Prediction of Segmental Parameters using the Hanavan Human Body Model. Med. Sci. Sports 7, 207-212.

[8] Kapandji, I.A. 1982. Physiologie Articulaire. Maloine S.A. éditeur. Paris.

[9] Roux F. 2000. Actualisation des Savoirs technologiques pour la Formation des Entraîneurs de Ski Alpin de Compétition. Doctoral Thesis. University of Paris Orsay XI.

[10] Cotelli, C. 2008. Sci Moderno. Mulatero Editore.

[11] Buchanan, T.S., Lloyd, D.G., Manal, K., Besier, T.F. 2005. Estimation of Muscle Forces and Joint Moments Using a Forward-Inverse Dynamics Model. Official Journal of the American College of Sports Medicine. 1911-1916.


T.A.C: Augmented Reality System for Collaborative Tele-Assistance in the Field of Maintenance through Internet

Sébastien Bottecchia, ESTIA RECHERCHE - IRIT, Technopôle Izarbel, 64210 Bidart (France), (+33)5 59 43 85 11, [email protected]

Jean-Marc Cieutat, ESTIA RECHERCHE, Technopôle Izarbel, 64210 Bidart (France), (+33)5 59 43 84 75, [email protected]

Jean-Pierre Jessel, IRIT, 118, Route de Narbonne, 31000 Toulouse (France), (+33)5 61 55 63 11, [email protected]

ABSTRACT

In this paper we shall present the T.A.C. (Télé-Assistance-Collaborative) system, whose aim is to combine remote collaboration and industrial maintenance. T.A.C. enables the copresence of parties within the framework of a supervised maintenance task to be remotely "simulated" thanks to augmented reality (AR) and audio-video communication. To support such cooperation, we propose a simple way of interacting through our P.O.A. paradigm and AR goggles specially developed for the occasion. The handling of 3D items to reproduce gestures and an additional knowledge management tool (e-portfolio, feedback, etc.) also enable this solution to satisfy the new needs of industry.

Categories and Subject Descriptors
H.5.3 [Information Interface and Presentation]: Group and Organization Interfaces – Synchronous interaction, Computer-supported cooperative work, Web-based interaction.
K.4.3 [Management of Computing and Information Systems]: System Management – Quality assurance.

General Terms
Performance, Reliability, Experimentation, Human Factors.

Keywords
Augmented Reality – TeleAssistance – Collaboration – Computer Vision – Cognitive Psychology.

1. INTRODUCTION

Over the last few years the world of industry has held great

expectations with regard to integrating new technological

assistance tools using augmented reality. This need shows the

difficulties encountered by maintenance technicians currently

faced with a wide variety of increasingly complex

mechanical/electronic systems and the increasingly rapid renewal

of ranges.

The compression of training periods and the multiplication of

maintenance procedures favor the appearance of new constraints

linked to the activity of operators, eg. a lack of "visibility" in

the system to be maintained and the uncertainty of operations to

be carried out. These constraints often mean that mechanics have

to be trained "on the job ", which can in the long term involve a

greater number of procedural errors and therefore increase

maintenance costs as well as lead to a considerable loss of time.

In this highly competitive globalised context, the demand of

industrialists to increase the performance of technical support and

maintenance tasks requires the integration of new communication

technologies. When an operator working alone needs help, it is

not necessarily easy to find the right person with the required

level of skill and knowledge. Thanks to the explosion of

bandwidth and the World Wide Web, real time teleassistance is

becoming accessible. This collaboration between an expert and an

operator is beneficial in many ways, such as with regard to quality

control and feedback, although a system enabling remote

interactions to be supported is needed. With AR, we can now

envisage a remote collaboration system enabling an expert to be

virtually copresent with the operator. By allowing the experts to

see what the operators see, they are able to interact with operators

in real time using an adequate interaction paradigm.

2. A.R. FOR MAINTENANCE & TELE-ASSISTANCE

We shall firstly take a brief look at existing systems and see that

there are two major types which are quite separate. We shall then

study the basic aspects which led us to build our solution.

2.1 Current systems

Amongst the AR systems aimed at assisting maintenance tasks,

the KARMA prototype [8] is certainly the most well-known

because it was at the origin of such a concept as far back as 1993.

The aim of this tool was to guide operators when carrying out

maintenance tasks on laser printers. Later other systems followed

like those of the Fraunhofer Institute [20] and Boeing [18] in

1998. The purpose of the first was to teach workers specific

gestures in order to correctly insert car door bolts. The second was

aimed at assisting the assembly of electric wiring in planes.

Following these systems, industry became increasingly interested

in using such AR devices in their fields of activity. We then saw



the creation of more ambitious projects like ARVIKA [1] whose

purpose was to introduce AR in the life cycle of industrial

product, Starmate [22] to assist an operator during maintenance

tasks on complex mechanical systems, and more recently ARMA

[7] which aims to implement an AR mobile system in an

industrial setting. Even more recently, Platonov [19] presented

what can be described as a full functional AR system aimed at

repairs in the car industry. This system stands out from others

because it proposes an efficient technique enabling visual markers

to be avoided.

The vocation of all of these systems is to support operators in the

accomplishment of their tasks by providing contextualized (visual

or sound) information in real time. Both of these conditions

should reduce the risks of running errors according to Neumann's

work [18].

Another common point is the importance placed on transparency

in interaction with the machine. This is effectively a key point of

AR in this field. Users must be able to devote their attention to the

task in hand and not have to concentrate on how to use the tool

itself, hence the different strategies of each project in creating

prototypes. Also, the choice of the display device is important

because the objective may be to reduce the need for resorting to

classical supports (paper), thus leaving operator's hands free [24].

However, certain contradictory studies [25][10] are not

conclusive with regard to the efficiency of AR compared to paper

supports.

Finally, all these systems are particularly pertinent when tasks are

governed by rules which allocate specific actions to specific

situations, ie. within the framework of standard operational

procedures. In this case we talk about explicit knowledge,

although accessing this knowledge is not necessarily sufficient to

know how to use it, which is known as tacit (or implicit)

knowledge. This belongs to the field of experience, aptitude and

know-how. This type of knowledge is personal and difficult to

represent.

Thus, current AR systems for maintenance are of little use when

an unforeseen situation occurs in which case it is sometimes

necessary to resort to a remote person who has the required level

of qualification.

It is only very recently that systems which support remote

collaborative work for industrial maintenance have begun to

appear. However, greater importance is given to the collaborative

aspect than to maintenance. In [26] Zhong presents a prototype

which enables operators, equipped with an AR display device to

be able to "share" their view with a remote expert. The operator

can handle virtual objects in order to be trained in a task which is

supervised by an expert. However, the expert can only provide

audio indications to guide the operator. Concerning [21], Sakata

says that the expert should be able to remotely interact in the

operator's physical space. This operator has a camera fitted with a

laser pointer, and the entire system is motorized and remotely

teleguided by the expert who can therefore see the operator's work

space and point to objects of interest using the laser. The

interaction here is therefore limited to being able to name objects

(in addition to audio capabilities). There are other systems like [6]

which enable the expert to give visual indications to an operator

with an AR display device fitted with a camera. What the camera

sees is sent to the expert who can "capture" an image from the

video flow, add notes, then send back the enriched image to the

operator's display device. Here the expert is able to enrich real

images to ensure the operator fully understands the action to be

carried out.

2.2 Motivation/Issue

In the paragraph above we saw that existing systems are either

very maintenance-oriented with a single operator with a device or

collaboration-oriented which do not necessarily enable direct

assistance to be provided for the task in hand.

Our work is therefore based on the possibility of remote

collaboration enabling both efficient and natural interaction as in

a situation of copresence, whilst taking advantage of the

possibilities offered by AR in the field of maintenance. Although

in [14] Kraut shows us that a task can be carried out more

efficiently when the expert is physically present, his study also

shows that remote assistance provides better results than working

alone, as confirmed by Siegel and Kraut in [23]. Other studies like

[15] even show that a task can be accomplished more quickly and

with less error when assisted rather than alone with a manual.

However, communication mechanisms and the context play an

important role when both the operator and expert share the aim:

They share the same visual space. In remote

collaboration, the expert does not necessarily have a

spatial relation with objects [14] and must therefore be

able to have a peripheral visual space so as to better

apprehend the situation. This will directly affect

coordination with the operator's actions and enable the

expert to permanently know the status of work [9]. The

lack of peripheral vision in remote collaboration

therefore reduces the efficiency of communication when

accomplishing a task [11].

They have the possibility of using ostensive references,

ie. deixis ("That one!", "There!") associated with

gestures to name an object. Much research as in [14]

and [4] suggests the importance of naming an object in

collaborative work or not. This type of interaction is

directly related to the notion of shared visual space

referred to above.

These characteristics provided by a collaborative relationship of

copresence are symmetrical [2], ie. those involved have the same

possibilities. On the contrary, remote collaboration systems

introduce asymmetries in communication. Billinghurst [3]

highlights three main asymmetries which can hinder

collaboration:

implementation asymmetry: the physical properties of

the material are not identical (eg. different resolutions in

display modes)

functional asymmetry: an imbalance in the functions

(eg. one using video, the other not)

social asymmetry: the ability of people to communicate

is different (eg. only one person sees the face of the

other)

Remote collaboration between an operator and an expert must be

considered from the point of view of the role of each party,

therefore necessarily introducing asymmetries, eg. due to the fact

that the operator does not need to see what the expert sees.

However, Legardeur [16] shows that the collaboration process is

unforeseeable and undetermined, which means that experts may


have at their disposal possibilities for interaction close to those of

operators as well as those which are available in real life, ie. the

ability to name and mime actions. Finally, the underlying element

with regard to collaboration in the field of tele-assistance is the

notion of synchronism: collaboration may be synchronous or

asynchronous. This shows the need for a real time interaction

method between parties.

3. THE T.A.C. SYSTEM

3.1 Principle

To propose a solution combining remote collaboration and

maintenance thanks to augmented reality, we have chosen two

basic aspects:

The mode of interaction between parties: This is the

way expert can "simulate" their presence with operators

The shared visual space: This is about being able to

show the expert the operator's environment AND the

way in which the operator is able to visualize the

expert's information

Through these aspects we also suggest that our system is able to

support synchronous collaboration between parties.

To implement this, we propose the following principle of use

(figure 1): the operator is equipped with a specific AR display

device. Its design enables it to capture a video flow of what the

carrier's eye exactly sees (flow A) and a wide angle video flow

(flow B). Amongst the two video flows which the expert will

receive, there is the possibility of augmenting flow A thanks to our interaction paradigm (cf. paragraph 3.3). The augmentations

are then sent in real time to the operator's AR display.

Figure 1. How the T.A.C. system works. The operator's view is

sent to the expert who enhanced it in real time by simply

clicking on it.

Hereafter we shall examine in greater detail our interaction

paradigm and the visualization system supported by it as well as

other functionalities.

3.2 Perceiving the environment

For each AR system developed, its type of display should be

specifically chosen. Within the framework of maintenance, we

must therefore take into account the constraints imposed by the

operator's work. The many different aspects of using an AR

system in working conditions linked to a manual activity poses

certain problems. Furthermore, we must take into account how the

situation is seen by the expert who must effectively apprehend the

operator's environment as if he or she were there in person. In [5]

we presented our visualization system carried by the operator and

which is responsible for providing an exact vision of part of what

is seen to the expert. This specific HMD, known as MOVST

(Monocular Orthoscopic Video See-Through) satisfies the criteria

of our application. The first of these criteria was that the operator

must be able to easily apprehend the environment, without being

immersed and keep as natural a field of vision as possible, ie.

having the impression of seeing what can be seen with the naked

eye (eg. orthoscopic).

Figure 2. Simulation of the operator's field of vision carrying

our MOVST. At the top a classic display (inside the red

rectangle). At the bottom an orthoscopic display.

Figure 3. Prototype of our AR goggles known as MOVST.

Figure 4. Expert interface. The orthoscopic view (inside the

red rectangle) is placed in the panoramic view.


In order not to overload the operator's visual field with virtual

elements, the choice of a monocular system has the advantage of

only being able to be partly augmented. Finally, the "Video See-

Through (VST)" principle was chosen for two reasons. Firstly,

because it has an orthoscopic system, with a VST it is easier to

implement the carrier's point of view. Secondly, it is possible to

switch between orthoscopic display and classic display (figure 2).

The advantages of the classic display lie in the fact that it can be

used like any screen. It is therefore possible to present videos,

technical plans, etc.

This so-called classic information is essential because it

characterizes the "visibility" of the overall system subject to

maintenance. Mayes in [17] distinguishes, amongst other things,

the importance for the user of conceptualizing the task thanks to

this type of information. However, the previous model of our

MOVST only enabled the expert to see the "augmentable" part of

the operator's field of vision, ie. approximately 30˚. In order to

take into account the lack of peripheral vision as mentioned in

2.2, adding a second wide angle camera on the MOVST enables

this problem to be solved (figure 3).

With regard to the expert's interface (figure 4), this gives a

panoramic video of the scene in which the orthoscopic video is

incrusted (PiP or Picture in Picture principle).

3.3 The P.O.A. interaction paradigm

In [5] we presented a new interaction paradigm based on the

ability of a person to assist another in a task. Generally, when

physically present together, the expert shows how to carry out the

task before the operator in turn attempts to do so (learning

through experience). To do this, the expert does not only provide

this information orally as can be found in manuals, but uses more

naturally ostensive references (since the expert and the operator

are familiar with the context). Our P.O.A. (Picking Outlining

Adding) paradigm is inspired by this and is based on three points:

"Picking": the simplest way to name an object

"Outlining": the way to maintain attention on the object

of the discussion whilst being able to provide adequate

information about it

"Adding": or how to illustrate actions usually expressed

using gestures

In order to implement these principles, we propose simply

clicking on the video flow received from the operator.
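To make the idea concrete, the expert's clicks can be serialized into simple annotation messages streamed to the operator's display. The sketch below is our own illustration of such a message structure; the field names and the send_to_operator helper are hypothetical, not the actual T.A.C. implementation:

    import json
    from dataclasses import dataclass, asdict
    from typing import Optional, Tuple

    @dataclass
    class POAAnnotation:
        # One expert action, replayed as an augmentation on the operator's view.
        mode: str                        # "picking", "outlining" or "adding"
        target: Tuple[float, float]      # click position in the orthoscopic video (normalized)
        part_id: Optional[str] = None    # 3D-modelled part selected for outlining/adding
        note: Optional[str] = None       # e.g. a temperature or a drill diameter
        animation: Optional[str] = None  # catalogue entry used by the "adding" mode

    def send_to_operator(channel, annotation: POAAnnotation) -> None:
        # Hypothetical transport: any reliable real-time channel would do here.
        channel.send(json.dumps(asdict(annotation)).encode("utf-8"))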

The first mode, "Picking", therefore enables an element belonging

to a work scene to be quickly named. This is equivalent to

physically pointing to an object. The visual representation can be

modelised in different ways like simple icons (circles, arrows,

etc). Thus, the expert, by simply clicking on the mouse on an

element of interest in the video, enables the operator to see the

associated augmentation (figure 5). This provides experts with an

efficient way of remotely simulating their physical presence in a

more usual way and saying: "take this object and ...".

Figure 5. Operator's augmented view after a "Picking"

operation. Here we clearly see the advantage of being able to

discriminate an important element by showing it rather than

describing it.

The second mode, known as "Outlining", uses the idea of

sketching the elements of a scene using the hands to highlight

them. These gestures support the verbal description. The

principles of AR mean that we have the possibility of

retranscribing this visually for the operator. Elements in the scene

which require the operator's attention can be highlighted by drawing the

contours or the silhouette of these objects (figure 6).

Figure 6. Operator's augmented view after "Outlining". The

expert has selected the elements of interest and has given the

temperature of an object.

With regard to the expert, this is done by clicking on the

interesting parts whose 3D modeling is known by the system. We

also have the possibility of adding characteristic notes (eg.

temperature of a room, drill diameter).

The final mode, known as "Adding", replaces the miming of an

action using adequate 3D animations. The expert has a catalogue

of animations directly related to the system subject to

maintenance. According to the state of progress of the task and the

need, the expert can select the desired animation and point to the

element to which it refers. Eg. (figure 7) the virtual element is

placed directly where it should be.


Figure 7. Operator’s augmented view after "Adding". The

expert shows the final assembly using a 3D virtual animation

placed on the real element.

3.4 Other functionalities

From the point of view of interaction by the system to support

collaboration, P.O.A. interaction may be completed by the

expert's ability to handle virtual elements. "Adding" enables

actions expressed using gestures via animations to be illustrated,

but this is only meaningful within the framework of a formal and

therefore modelised process. This is not the case in unforeseen

situations. For these, we are currently taking advantage of the

formidable development of miniaturized inertial units. This works

by handling this interactor associated with a 3D virtual element in

the expert interface. The unit's position and orientation is

retranscribed on the 3D element. The operator sees the virtual part

handled just like if the expert had done so using the real part

whilst using a tangible interface. However there is the problem of

the expert not being able to handle at the same time both 3D

interactors and the keyboard to provide important information. To

support the transfer of implicit knowledge between the expert and

operator, it is more efficient to add a "speech to text" type man-

machine interaction mode.
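As an illustration of the kind of retranscription involved (our sketch, assuming the inertial unit streams a unit orientation quaternion; this is not the actual T.A.C. code):

    import numpy as np

    def apply_imu_orientation(quat_wxyz, vertices):
        # Rotate the vertices of the virtual part by the orientation quaternion
        # streamed from the expert's handheld inertial unit.
        w, x, y, z = quat_wxyz
        q = np.array([x, y, z], float)
        rotated = []
        for v in vertices:
            v = np.asarray(v, float)
            t = 2.0 * np.cross(q, v)
            rotated.append(v + w * t + np.cross(q, t))
        return np.array(rotated)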

The T.A.C. system, with its simulation of copresence, enables us

to support a tool in full development in the world of work: the e-

portfolio. This tool aims to manage a career path and validate

acquisitions. In sum, this is a database enabling a person's skills to

be capitalized. Thus, the T.A.C. system can be seen as a

monitored system providing the possibility of recording images

from different operations carried out with a view to an e-

qualification. Work and qualifications can therefore be more

easily combined.

Regarding the expert, recording images from different operations

is first and foremost a quality control system. Since maintenance

tasks in industry are highly formalized (set of basic operations),

their supervision in the event of problems thanks to the synoptic

view of operations carried out enables the cause to be analyzed.

Its feedback can also be capitalized on to be used when designing

future products and new maintenance procedures.

4. INITIAL RESULTS

4.1 Preliminary tests

We tested the T.A.C. system using two examples to verify its

use within the framework of remote assistance. Operators do not

have specific knowledge in the field of mechanical maintenance.

The expert is someone who has received training in how to

carry out maintenance on a helicopter turboshaft engine. The first

example is not a real problem since it is simply a question of

assembling an electrically controlled engine in an order pre-

defined by the expert (A, B, C, and D in figure 8). This simple

example was initially chosen because 3D modeling and the

associated animations were easy to create. Currently, the implementation of our system is based on the ARToolKit [13] and

OpenCV [12] libraries for 3D recognition. To establish

connection between two computers (voice and video session), we

used the SIP signaling protocol implemented in SophiaSIP

library. Transfer data is ensured by SDP and RTP protocols of the

Live555 C++ library.

The second example concerns measuring the wear of blades in a

helicopter turboshaft engine (E, F, and G in figure 9). This

requires the use of a specific instrument which needs to be

inserted in a precise location. The checking of measurements is

supervised by the expert (this operation can prove delicate for

beginners).

4.2 Discussion

During experiments, it became clear that our system provided

easier and more natural interaction than other systems which

provide traditional audio and video communication. The

possibility for synchronous interactions by the expert vis-à-vis the

operator stimulates exchanges and offers a strong feeling of being

physically present which in the end leads to greater efficiency.

This is due to the ability to act in unforeseen situations thanks to

"Picking" and "Outlining" and well determined processes thanks

to "Adding". Technical feasibility is extremely important with the

increasing calculation capacities of laptops and the explosion of

the bandwidth of communication networks. However, in

experimental conditions the expert preferred it when the video

offered a resolution of at least 640x480, which was not always

possible because of our network's limited bandwidth. Most often,

we were, depending on the time of day, forced to use a resolution of

320x240, enabling us to highlight this problem. It is therefore

necessary to currently look at an exclusive communication

solution between the expert and the operator. It also became clear

that the expert would himself have liked to control the virtual

objects supported by "Adding" instead of simple animations. We

are currently working on this taking inspiration from interaction

modes and virtual reality. Finally, the operator expressed the wish

to be able to control switching from classic to orthoscopic

displays in the MOVST and more generally have greater

possibilities for controlling the display system.

5. CONCLUSION

In this paper we have presented a system enabling two remote

parties to be able to collaborate in real time in order to

successfully carry out a mechanical maintenance task. This system

is based on our P.O.A. interaction paradigm enabling the expert's

presence to be simulated with an operator in a situation of

assistance. This prototype was tested on simple cases, but which

were representative of certain real maintenance tasks and it

showed that it was able to support both defined and undefined

interaction processes. However, we must provide the means for

greater interaction between parties and carry out a more in-depth

study of the real benefits of such a system.


Figure 8. Example of collaboration

A: "Take this stator and put it on the red support"

B: "That's how the rotor and the case are put together"

C: "Turn the carter in this direction until you hear it click"

D: "Put the screws here and here with this torque"

Figure 9. Other examples of assistance.

E: "Undo this cap so you can then turn the shaft"

F: "Place the instrument in hole no. 1, that one there"

G: "Look over here, the small needle says 2 tenths, that's ok"

6. ACKNOWLEDGMENTS

We would like to thank LCI, which specializes in turboshaft

engine maintenance, Christophe Merlo for his help on the

knowledge of collaborative processes and Olivier Zéphir for his

various contributions with regard to cognitive psychology.


7. REFERENCES

[1] Arvika. Augmented reality for development, production,

servicing. http://www.arvika.de, URL.

[2] Bauer, M., Heiber, T., Kortuem, G. and Segall, Z. 1998. A

collaborative wearable system with remote sensing. ISWC

’98: Proceedings of the 2nd IEEE International Symposium

on Wearable Computers, page 10.

[3] Billinghurst, M., Kato, H., Bee, S. and Bowskill, J. 1999.

Asymmetries in collaborative wearable interfaces. ISWC '99,

pages 133–140, 1999.

[4] Bolt, R.1980. ‘put-that-there’: Voice and gesture at the

graphics interface. SIGGRAPH ’80: Proceedings of the 7th

annual conference on Computer graphics and interactive

techniques, pages 262–270.

[5] Bottecchia, S., Cieutat, J., and Merlo, C. 2008. A new AR

interaction paradigm for collaborative TeleAssistance

system: The P.O.A. International Journal on Interactive

Design and Manufacturing, N°2.

[6] Couedelo, P. Camka system. http://www.camka.com, URL.

[7] Didier, J. and Roussel, D. 2005. Amra: Augmented reality

assistance in train maintenance tasks. Workshop on

Industrial Augmented Reality (ISMAR’05).

[8] Feiner, S., Macintyre, B. and Seligmann, D. 1993.

Knowledge-based augmented reality. Commun. ACM,

36(7):53–62.

[9] Fussell, S., Setlock, L., Setlock, L.D. and Kraut, R. 2003.

Effects of head-mounted and scene-oriented video systems

on remote collaboration on physical tasks. CHI ’03:

Proceedings of the SIGCHI conference on Human factors in

computing systems, pages 513–520.

[10] Haniff, D. and Baber, C. 2003. User evaluation of augmented

reality systems. IV ’03: Proceedings of the Seventh

International Conference on Information Visualization, page

505.

[11] Heath, C. and Luff, P. 1991. Disembodied conduct:

Communication through video in a multi-media office

environment. CHI 91: Human Factors in Computing Systems

Conference, pages 99–103.

[12] INTEL. OpenCV. http://sourceforge.net/projects/opencv/

[13] Kato, H. and Billinghurst, M. ARToolkit.

http://www.hitl.washington.edu/artoolkit/, URL.

[14] Kraut, P., Fussell, S. and Siegel, J. 2003. Visual information

as a conversational resource in collaborative physical tasks.

Human-Computer Interaction, 18:13–49.

[15] Kraut, R., Millerand, M. and Siegel, J. 1996. Collaboration

in performance of physical tasks: effects on outcomes and

communication. CSCW ’96: Proceedings of the 1996 ACM

conference on Computer supported cooperative work, pages

57–66.

[16] Legardeur, J. and Merlo, C. 2004. Empirical Studies in

Engineering Design and Health Institutions, chapter Methods

and Tools for Co-operative and Integrated Design, pages pp.

385–396. KLUWER Academic Publishers.

[17] Mayes, J. and Fowler, C. 1999. Learning technology and

usability: a framework for understanding courseware.

Interacting with Computers, 11:485–497.

[18] Neumann, U. and Majoros, A. 1998. Cognitive, performance,

and systems issues for augmented reality applications in

manufacturing and maintenance. VRAIS ’98: Proceedings of

the Virtual Reality Annual International Symposium, page 4.

[19] Platonov, J., Heibel, H., Meyer, P. and Grollmann, B. 2006.

A mobile markerless AR system for maintenance and repair.

Mixed and Augmented Reality (ISMAR’06), pages 105–108.

[20] Reiners, D., Stricker, D., Klinker, G. and Muller, S. 1999.

Augmented reality for construction tasks: doorlock assembly.

IWAR ’98: Proceedings of the international workshop on

Augmented reality: placing artificial objects in real scenes,

pages 31–46.

[21] Sakata, N., Kurata, T., Kato, T., Kourogi, M. and Kuzuoka,

H. 2006. Visual assist with a laser pointer and wearable

display for remote collaboration. CollabTech06, pages 66–

71.

[22] Schwald, B. 2001. Starmate: Using augmented reality

technology for computer guided maintenance of complex

mechanical elements. eBusiness and eWork Conference

(e2001), Venice.

[23] Siegel, J., Kraut, R., John, B.E. and Carley, K.M. 1995. An

empirical study of collaborative wearable computer systems.

CHI ’95: Conference companion on Human factors in

computing systems, pages 312–313.

[24] Ward, K. and Novick, D. 2003. Hands-free documentation.

SIGDOC ’03: Proceedings of the 21st annual international

conference on Documentation, pages 147–154.

[25] Wiedenmaier, S. and Oehme, O. 2003. Augmented reality for

assembly processes: design and experimental evaluation.

International Journal of Human-Computer Interaction,

16:497–514.

[26] Zhong, X. and Boulanger, P. 2002. Collaborative augmented

reality: A prototype for industrial training. 21st Biennial

Symposium on Communication, Canada.


Designing and Evaluating Advanced Interactive Experiences to increase Visitor’s Stimulation in a Museum

Bénédicte Schmitt (1), (2), Cedric Bach (1), (3), Emmanuel Dubois (1), Francis Duranthon (4) [email protected], [email protected], [email protected], [email protected]

(1) University of Toulouse, IRIT 118 Route de Narbonne 31062 Toulouse Cedex 4

France

(2) Global Vision Systems 10 Avenue de l'Europe

31520 Ramonville Saint Agne France

(3) Metapages 12 rue de Nazareth

31000 Toulouse France

(4) LECP/Muséum d’Histoire Naturelle de Toulouse 35 Allées Jules Guesde

31000 Toulouse France

ABSTRACT

In this paper, we describe the design and a pilot study of two

Mixed Interactive Systems (MIS), interactive systems combining

digital and physical artifacts. These MIS aim at stimulating

visitors of a Museum of Natural History about a complex

phenomenon. This phenomenon is the pond eutrophication that is

a breakdown of a dynamical equilibrium caused by human

activities: this breakdown results in a pond unfit for life. This

paper discusses the differences between these two MIS

prototypes, the design process that led to their implementation

and the dimensions used to evaluate these prototypes: user

experience (UX), usability of the MIS and the users’

understanding of the eutrophication phenomenon.

Categories and Subject Descriptors: H.5.2 [User Interface]: Prototyping, Evaluation/Methodology, User-centered design, Theory and methods.

General Terms

Design, Experimentation, Human Factors.

Keywords Mixed Interactive Systems, Advanced Interactive Experience, co-

design, museology, eutrophication

1. INTRODUCTION

During the past years, an increasing number of cultural interactive experiences have been produced, in particular in museology. A major goal of this

trend is to increase the involvement of visitors during a visit, in

order to make them actors of their own museum experience.

Different attempts have been introduced in museums. Guides [23]

are used to provide additional information about the exhibit

objects by means of digital comments. Games [23], [25] can

propose challenges to visitors, encouraging them to learn through

play and adding new interests on the exhibition. More advanced

forms of Interactive Systems, called Mixed Interactive Systems

(MIS) [6] have also been developed to serve this goal. Mixed

Interactive Systems combine digital and physical artifacts.

Examples of MIS include Augmented Reality (AR), Mixed

Reality (MR) and tangible user interfaces (TUI). The interest of

such advanced interactive experiences is that rather than

manipulating technological devices, visitors handle physical

objects related to the exhibits, such as wooden blocks to create

programs to control a robot on display [9], or physical objects to

trigger different phenomena on the environment [19], and tightly

coupled to the display and animation of digital artifacts carrying a

predefined knowledge. Users can explore the advanced interactive

experience created by the system and discover its content. Limits

of such approaches mainly lie in the fact that they do not propose

clear challenges: proposed tasks are open-ended and users can

terminate them whenever they want. However involving physical

artifacts in an interactive experience is strongly in line with the

most recent trends of museology: indeed Wagensberg [22]

develops an approach for modern museum in which it is

recommended to maintain real objects or phenomena at the center

of exhibits. Among the above systems, only advanced interactive

experiences prompt visitors to manipulate real objects or real

phenomena to stimulate visitors.

But who are museum visitors? Most of the research about learning in museums is dedicated to children. However, Wagensberg [22] points out the universality of the museum audience. In addition, a recent study by Hornecker [10] shows a high interest in using Tangible User Interfaces (TUI) in museums: they are universal and can engage a range of visitor profiles. Investigating the use of interactive systems for adults therefore appears as a required complement to existing studies related to children (Figure 1). TUI are thus good candidates for providing a fun experience while enhancing the teaching of complex natural phenomena to adult visitors.

Nevertheless, introducing such technology also raises the problem of evaluation. In such a context, usability evaluation methods (UEM) are still required to study the usability of the application, i.e. how efficient, effective and easy to learn it is [11]. In addition, evaluating the user experience (UX) is equally important because visiting a museum is a leisure activity rather than a working task. Evaluating such advanced interactive experiences therefore requires studying how visitors feel about their experience [12].

In this paper, we focus on the introduction and evaluation of Mixed Interactive Systems in a Museum of Natural History. These MIS are used to illustrate and teach a complex phenomenon: pond eutrophication.

Figure 1. "Mixed Interactive Systems for Eutrophication with Palette" (MISEP): an example of a TUI for museums.

We first motivate the use of MIS in this context and briefly review

the goals of usability and UX evaluations. We then present the

basis of the pond eutrophication and the principles of the co-

design process we applied to design and implement two different

MIS. These MIS, namely MISE and MISEP, are introduced,

illustrated and finally compared in order to assess their adequacy

with the eutrophication context. Results of a pilot study focusing

on their usability and UX with these prototypes are also presented.

2. RELATED WORK

As previously shown, advanced interactive systems have been developed in domains such as cultural activities and informal learning. Here we discuss the limitations of these systems, but also the advantages that explain their introduction.

2.1 Use, limitations and advantages of MIS

Complex or abstract concepts, as found in informal learning, are often difficult to explain or to understand. MIS provide tools to simulate natural phenomena and make these concepts reachable by users. Manipulating physical and digital artifacts is one of the most interesting characteristics of MIS, making these concepts easier to understand and to perceive, as users can experience them [18] [21]. A study by Kim and Maher [14] shows that MIS support the designer's cognitive activities and spatial cognition by giving a "sense of presence". MIS make use of three-dimensional interfaces to provide a sense of reality that other systems cannot offer.

MIS have some limitations. The first one is that physical objects do not provide a trace of actions [15]: once an action is performed on an object, the object itself is not able to provide any information about its previous state to the user or even to the computer system. A second limitation with the use of physical objects in MIS is to ensure that their planned use is perceivable and understandable by the users: by essence, physical objects have no means to express how to act on them. It is therefore required to base their use on actions, behaviors and representations that afford them. Involving multiple physical objects is another limitation, because the user has to grab and release several objects and find a place to put them. In addition, technological limitations exist, such as the detection and localization of the required physical artifacts.

However, MIS also have the advantage of attracting people, as the technology is embedded into the physical environment [7]. Users can experience a new concept with less reluctance and without feeling constrained. They can manipulate objects without the barriers that separate the digital and physical worlds. Another advantage of MIS is their affordance, as objects provide some common representations. Furthermore, they provide opportunities not present in desktop interfaces [15]. Users can explore the physical objects and evaluate which actions to perform. This can also enable several groups of users, e.g. novices and children, to use MIS more intuitively. In other words, MIS are potentially more universal than desktop interfaces [16]. Universality and accessibility are two major advantages of MIS which can help users familiarize themselves with complex or abstract concepts. We hypothesize that museums could benefit from these advantages to engage visitors with their exhibits. The impact of MIS on users is therefore an interesting topic to measure, as they engage users. The next section presents the reasons that encourage us to use both UEM and UX evaluation methods.

2.2 Usability and UX evaluations

In recent years, a new concept has emerged: user experience. As it is recent, user experience (UX) has no consensual definition yet. In particular, Bevan [3] studies the difference between usability and UX evaluation methods and first highlights different possible interpretations of UX in the literature: the goal of UX can be (a) to improve human performance or (b) to improve user satisfaction in terms of use and appropriation of the interactive system. Hassenzahl points out the role of hedonic and pragmatic goals; on this basis, UX can be considered as the subjective aspects of a system [8]. Moreover, according to the results of a survey by Wechsung et al. [24], interest in usability focuses mainly on designing better products, whereas UX is generally more linked to concepts with emotional content (e.g. fun and joy). Finally, in the context of informal learning systems, speed and accuracy are no longer the only goals; such systems should also bring new knowledge and stimulate users' emotions, which is precisely what UX methods evaluate: UX evaluations are expected to extract users' feelings and opinions about the system. In short, one can say that usability measurements reveal problems related to the system's behavior, while UX measurements highlight additional perspectives for understanding its impact.

3. MUSEOGRAPHIC CONSIDERATIONS

The aim of our collaboration with the Natural History Museum of Toulouse is to make visitors aware of the eutrophication process by explaining this phenomenon. It seems essential to show that a pond is not just a body of water, but a complex and dynamic system (Figure 2). A pond can live 100 years or more and fills up naturally over the years. However, this filling can be slowed or accelerated by human activities. These activities impact parameters involved in the eutrophication process: if humans add weed killer or pump water, the pond disappears faster; if humans remove mud, the pond disappears more slowly. These parameters include, for example, water temperature, oxygen rate, water level and mud level.

Most visitors are unaware of the effects of their activities on a pond, and making them actors of an advanced interactive experience can make them aware of eutrophication. However, visitors cannot experience the real phenomenon with real objects, since its lifetime is long and it is impractical to install a real pond in a museum. As demonstrated previously, MIS represent a fitting solution for museums. We decided to simulate the real phenomenon on a digital pond and to put forward physical objects to represent human activities. We faced two different solutions: either all human activities are performed physically, or a physically manipulated human figurine selects digital human activities and impacts the digital environment. The interest of MIS is also that visitors can interact with a realistic pond to better observe the evolution of the ecosystem.

Figure 2. An oligotrophic pond which becomes a eutrophic pond [26].

Through this advanced interactive experience, the Museum wants to deliver three main messages to visitors: (1) a dead pond is a filled pond; (2) a pond is a system that produces its own filling; (3) humans can accelerate or slow the eutrophication process. The design process, described in the next section, takes these museographic requirements into consideration.

4. DESIGN PROCESS

We use a specific co-design process [1], as our collaboration with the museum involves a multidisciplinary team composed of museographic experts, ergonomists and designers. This co-design process facilitates communication between the participants and is adapted to Mixed Interactive Systems, the kind of systems we decided to design. Furthermore, compared with a classical software design process, the present process focuses more on pedagogic, museographic and visitors' requirements than on engineering the software, as the question of technology is postponed to the end of the development cycle. Finally, in contrast with traditional HCI processes, this design process primarily supports the exploration of initial expectations rather than just user requirements. It then turns these expectations into interaction considerations and finally iterates to finalize the application.

This co-design process consists of four phases, which we define and illustrate below: preliminary analysis, analysis of interactive principles, optimization and production (Figure 3). This process has the advantage of guiding the design team throughout these phases, and in particular of facilitating the transformation of the requirements into an interactive experience.

In the preliminary analysis phase, we analyzed the museologic domain to list all its activities, constraints and targeted users. The objective of this first phase is to extract some generic activities that can be applied to other themes, in order to respect the consistency of an interactive exhibit. For our prototypes, the generic activity is "making an action on an environment has a perceptible consequence on many different objects of this environment".

The aim of the analysis of interactive principles phase is to define how to make one of the needs of the domain interactive. All the elements listed in the previous phase are taken into account. The minimal functions of the system necessary to make the generic activity interactive are identified in this phase. For our prototypes, the minimal functions can be expressed as: "to show that an action on an environment has a perceptible consequence on many different objects of this environment, the system should allow an action to be performed, present a flexible environment, distinguish the impacted objects from the environment, and present the attributes of these objects". A set of recommendations to stage the minimal functions is also listed, for example for our prototypes: "to put across that the impacted parameters are constituents of the environment, the system should mark the link between these parameters and the pond".

The elements of these two phases are the same for both prototypes, as museographic and visitors' requirements have to be well understood before addressing interaction techniques and technical questions.

The optimization phase aims at designing the interaction with the system. Our prototypes result from two different optimization phases, as the analyses of this phase do not deal with the same problems: the second prototype should interact directly with the digital 3D pond and put forward fewer devices. The main concept of this phase is participatory design involving end users. This iterative phase improves different dimensions of the ongoing prototype, such as its social or educational dimensions. Users can participate in designing the prototype through creative methods such as brainstorming and focus groups, and can assess it by taking part in user tests.

Figure 3. Overview of the co-design process for Museums


The last phase leads to the production of one of the designed prototypes.

5. PROTOTYPES

The eutrophication process is complex to explain, so we designed two alternative prototypes that differ in their interaction space and in the form of coupling between the two worlds. These aspects, detailed in the next sections, can impact the understanding of the process, and our objective is to determine their role.

5.1 MISE (Mixed Interactive Systems for

Eutrophication) For this first prototype, visitors manipulate objects of a physical

scale model which represents a natural environment around a

pond: a set of houses with gardens, a field and a forest. This

physical scale model is used as an input and gives no feedback to

visitors. Manipulations of physical objects in this scale model

have a direct impact on a digital environment that includes: a 3D

representation of a pond, a timeline representing the life

expectancy of the pond, and a representation of environmental

parameters relevant to the pond eutrophication such as water

temperature, oxygen rate, water level, mud level (Figure 4).

Figure 4. MISE: the data visualization and the scale model

Through these manipulations, visitors can simulate some human

activities and observe their effect on the digital representation of

the pond and associated parameters. Human activities that can be

simulated include: pumping water by turning a tap in a private

garden or using a field hose over a field, adding weed killer with a

crop duster over a field, or removing mud from the pond with a shovel. Each human activity is therefore activated through the use of two different forms of objects: those adopting a global representation and those adopting an individual representation of the activities. For example, visitors can add weed killer either to a field, by driving a tractor across it, or to a private garden, by shaking a weed-killer bag.

5.2 MISEP (Mixed Interactive Systems for

Eutrophication with Palette) For this prototype, visitors manipulate two physical objects: a

palette and a human figurine (Figure 5). The palette presents all

simulated human activities to visitors: adding weed killer,

pumping water and removing mud. The human figurine is a

metaphor of the Human acting on the pond. These two physical

objects are used as an input. Visitors observe the effect of their

actions on the digital environment, which includes the elements

found in MISE and, additionally, a garden and a field.

Figure 5. MISEP: the physical objects used to interact with

digital objects.

Visitors manipulate the palette to select an activity that activates

the corresponding digital object on the 3D environment. Visitors

manipulate the human figurine to move this object in order to

place it where the human activity should be performed. Visitors can thus directly interact with the visualization, as the digital objects follow the human figurine's position.

5.3 Characteristics comparison

A comparison of the two prototypes allows us to formulate hypotheses about which one better enhances learning and understanding of the eutrophication phenomenon. Some studies have tried to define characteristics of MIS [5] [18], even if the domain is recent. We focus on characteristics, drawn from both Dubois' [5] and Price et al.'s [18] studies, that best distinguish our prototypes. These prototypes can be qualified by two parameters: interaction space and the form of coupling between the two worlds.

Interaction space is composed of an input interaction space (e.g. physical artifacts) and an output interaction space (e.g. digital artifacts). Here we can define the devices, the input focus and the location of the interaction spaces (separate, contiguous, embedded).

The devices are different, as MISE involves more physical objects than MISEP (eight and two, respectively). We will investigate whether these differences influence user behavior and impact learning of the eutrophication process. Moreover, for MISEP, visitors can control their actions, as they can press a button to release them. We believe this can impact the perception of the ecosystem's evolution.

The input focus is greater for the first prototype. The scale model catches the visitor's attention more than the two physical objects held by users in MISEP. For MISE, users have to make specific manipulations for each physical object. Moreover, users are passive behind the visualization, while they can directly interact with the visualization in MISEP. We want to discover whether visitors focus more on the objects they manipulate or on the activities these objects represent.

The location is also different. MISE has separate input and output interaction spaces, as its input has no direct link with the output and visitors cannot directly interact with the visualization. On the contrary, MISEP has contiguous spaces, since digital objects follow the position of the physical objects manipulated by users. We want to investigate whether this difference impacts understanding of the phenomenon, and whether visitors make a link between their manipulations and the visualization feedback.

Forms of coupling between the two worlds define the relation

between physical and digital worlds in terms of content and

behavior. Content is all the representations of the two worlds and

their consistency. Behavior is either the manipulations or the

movements made in the physical environment. Content and

behavior can refer respectively to semantic and articulatory

distances of Norman [17].

For MISE, physical artifacts have no overlap with digital

artifacts, as the scale model and the visualization have no

similar representations. So visitors cannot make a link

between representations of physical and digital artifacts as

easily as for MISEP. Actually, for this prototype, visitors can

manipulate digital objects, which have a match with physical

representations. They can select a tap on the palette and

move a tap model on the 3D scene. We think that this could

have an effect on understanding of the phenomenon.

Moreover, the devices of MISE require varied manipulations: users can turn a tap or shake a fertilizer bag, etc. Even though we try to propose manipulations in common use, this could have an impact on the learning curve and on the efficiency of the different devices.

These characteristics will help us discover whether the prototypes have an impact on users during the experience and an effect on stimulation, as we expect them to arouse more questions about this phenomenon after the experience.

6. EXPERIMENTAL DESIGN We have conducted a pilot study to test our experimental settings

and to highlight some technical problems of the prototypes and

experiment instrumentation.

6.1 Method Our objective is to study three dimensions: usability, user

experience and museographic considerations. We designed a user

test protocol to assess these dimensions.

Before the evaluation, a profile questionnaire is proposed to the

participants. Some questions deal with their experience of

Museums and their basic knowledge about eutrophication.

A post-test questionnaire is also proposed to assess the usability of the system and to measure UX. We built this post-test usability questionnaire using all the SUS (System Usability Scale) items [4], together with some questions from the SUMI (Software Usability Measurement Inventory) [27] and the IBM Computer Usability Questionnaire. This questionnaire uses 5- or 7-point Likert scales. We also use the SAM (Self-Assessment Manikin) method [13] (Figure 6) to measure users' experience and emotions on a 5-point scale.
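As an illustration of how the SUS part of this questionnaire can be scored, the following sketch applies the standard SUS scoring rule (odd items contribute the score minus 1, even items contribute 5 minus the score, and the sum is multiplied by 2.5 to reach a 0-100 scale); the function name and the example responses are ours, not data from the study.

def sus_score(responses):
    """Compute the 0-100 SUS score from the ten 5-point Likert responses
    (standard SUS scoring; item numbering starts at 1)."""
    assert len(responses) == 10
    total = 0
    for item, score in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (score - 1) if item % 2 == 1 else (5 - score)
    return total * 2.5

# Hypothetical example: one participant's ten answers.
print(sus_score([4, 2, 4, 3, 3, 2, 4, 3, 3, 2]))  # -> 65.0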

The SAM method is interesting because no personal assistance for users is required, and it lets us measure the dynamics of four dimensions: pleasure-displeasure, degree of arousal, dominance-submissiveness and presence. We gather data with the SAM method before and after the user test to measure whether the system affects the emotions of the participants.

Figure 6. Example of two dimensions of the SAM method.

At the end of the post-test questionnaire, participants answer some questions on the museographic considerations to assess their understanding of the eutrophication phenomenon, e.g. "What are the impacts of our activities on the environment?", "What can reduce the pond lifetime?" or "What can extend the pond lifetime?".

During the test, we record log files to collect metrics: the length of the experience, the actions performed, the error rate, the cancel rate, the success rate, the time to select the first activity, the time between activities, the rate of selected activities and the number of fillings.
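A minimal sketch of how such metrics could be derived from a timestamped interaction log is given below; the event format (timestamp, event kind, label) is a hypothetical one of ours, not the log format actually produced by the prototypes.

def session_metrics(events):
    """events: list of (timestamp_in_seconds, kind, label) tuples, where kind is
    one of 'start', 'select', 'error', 'cancel', 'success', 'end'."""
    times = [t for t, _, _ in events]
    start, end = min(times), max(times)
    selects = sorted(t for t, k, _ in events if k == 'select')
    actions = [e for e in events if e[1] in ('select', 'error', 'cancel', 'success')]
    count = lambda kind: sum(1 for _, k, _ in events if k == kind)
    gaps = [b - a for a, b in zip(selects, selects[1:])]
    return {
        'length_s': end - start,
        'actions': len(actions),
        'error_rate': count('error') / max(len(actions), 1),
        'cancel_rate': count('cancel') / max(len(actions), 1),
        'success_rate': count('success') / max(len(actions), 1),
        'time_to_first_activity_s': selects[0] - start if selects else None,
        'mean_time_between_activities_s': sum(gaps) / len(gaps) if gaps else None,
    }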

6.2 Participants

The participants of the pilot study match a user profile we defined in order to have a homogeneous panel. Six participants took part in this pilot study (5 male, 1 female), aged from 26 to 40. They are regular computer users and are not familiar with the eutrophication phenomenon.

6.3 Experimental conditions

We conducted the pilot study in a user lab and participants had no time constraints. They could begin the experience and stop it at any time. No training session was provided, as we test whether the systems are easy to learn and want to simulate the museum context. Each participant tested only one of the two systems, so 3 of the 6 users assessed one prototype and the other 3 assessed the other one. We believe participants could rationalize their understanding of eutrophication if they tested both prototypes, and we want to avoid this since we measure this parameter.

6.4 Quantitative measures

Quantitative measures mainly concern usability and UX. Usability is measured in terms of effectiveness, efficiency, satisfaction and ease of learning, using logs and questionnaire results. UX is measured in terms of the participants' emotions with the SAM method, which gives us non-verbal measures of the participants' feelings before and after the test.

6.5 Qualitative measures

Qualitative measures concern the understanding of the museographic considerations and some details on the participants' feelings about their experience. We investigate whether participants are stimulated by the advanced interactive experience and whether they enjoy it. An objective of the systems is that participants ask themselves more questions about eutrophication and wish to learn more about it.

7. EARLY RESULTS

The pilot study aims at assessing our test protocol and our prototypes, for example to detect technical issues. Some results have been gathered for the three dimensions, but they are not statistically significant due to the small number of participants per prototype condition. We therefore do not compare the prototypes using these results; this comparison will be made with the results of the full user tests, once the protocol and the systems have been improved. However, we can already extract some benefits and shortcomings of the current prototypes.

Stimulation and eutrophication understanding

Users' first impressions of the MIS were positive and they seemed engaged and stimulated by the prototypes, as shown by comments collected during the post-interview: "Manipulations involve the users during the experience", "The system is playful and stimulating". Moreover, users had more questions after the experience than before (between 6 and 10). Another main point was that users had a different perception of a pond after the experience. Before the experience, they believed that a pond is "stagnant water" or "a big puddle"; the experience made them understand that a pond is "a stretch of water exposed to direct or indirect environmental modifications" or "a living organism impacted by human activity". So users perceived the impacts of their actions on the pond, which died faster because of these actions. But users did not perceive the action of time on the pond, so they were not able to understand that a pond is a system that produces its own filling.

Usability

The SUS results for the two prototypes show that they can be characterized as "OK" according to [2], since the average scores are 52.5 for MISE and 51.2 for MISEP. The analysis reveals that the systems could be considered "marginal", but that they could become "acceptable" with some improvements. The results of the SUMI and IBM questionnaires support this analysis. One interesting result about efficiency is that, for both prototypes, users spent less than 1 minute selecting the first activity. Nonetheless, the effectiveness of the prototypes was constrained by several technical issues and by the lack of feedback about their appropriate use. This point is confirmed by the error rates observed during use of the systems (57% for MISEP and 42% for MISE). We plan to improve the prototypes and to evaluate them further with a larger number of users.

User Experience

The results of the SAM method (Table 1) indicate that users

enjoyed their experience. Results for pleasure with MISEP are

relatively high, even if the score after the experiment was lower

than the one before. This result can be explained by results for

dominance. It seems that users felt dominated by the system.

These results are consistent with usability results, in particular

effectiveness, and with users’ comments about technical

problems.

Table 1: Average results of the SAM method

                       MISE             MISEP
                    Before  After    Before  After
pleasure              2.67   3.67      4.00   3.33
degree of arousal     3.33   3.00      2.67   2.00
dominance             4.00   2.33      3.67   2.00
presence              3.00   2.67      4.00   3.33

Results concerning users’ feeling of presence are encouraging as

the corresponding score after the experiment is relatively close to

the one before.

Improvements of the protocol and the prototypes

This pilot study assessed the evaluation protocol and helped us

detect several interesting areas of improvement. Some questions

were not well understood by users, such as "Can you tell me if, in our everyday life, some actions can be close to this experience?". We therefore need to reformulate them. We also noticed that a major issue was not assessed by the questionnaire: the impact of time on the pond. None of the questions assessed whether users understood that a pond is a machine that produces its own filling. Moreover, we do not know whether users understood the match between filling and death well, although the results show that they realized that the pond dies or fills.

Finally, this pilot study revealed some technical and graphics

problems in the prototypes. Users suggested interesting

modifications in the user interface and they pointed out the lack of

feedback. In future work, we need to improve user feedback with

respect to the needs of this particular audience.

8. DISCUSSION

This pilot study reveals the importance of the co-design process and of involving a multidisciplinary team. During the preliminary analysis phase and the analysis of interactive principles, this co-design process enables designers or non-experts to grasp different aspects of the museum domain and the minimal functions needed to make these aspects interactive. Designers and non-experts then have the main elements to design a system and to evaluate it, in order to propose systems that meet museum needs and user satisfaction.

Besides, this pilot study highlights that Mixed Interactive Systems

are complex to assess. MIS evaluations involve multiple

dimensions and multiple metrics: the systems must stimulate and

engage users while enhancing the learning of a real complex

phenomenon. First we use both usability and UX evaluation

methods to gather complete data about usability issues and users’

feelings. For this last dimension, we noticed that UX evaluation methods generally require experts and time. We decided to apply the SAM method, as it is rather easy to use and to analyze. We observe that this method brings us additional data about the prototypes that we cannot collect with common usability evaluation methods, e.g. feelings about presence. Subsequently, we use a questionnaire to collect users' answers about their understanding of the phenomenon, and we note how complex this dimension is to assess. Such a measure requires many questions, but these questions should neither lead users to rationalize their experience nor give hints about the answers.

Finally, the formalization of MIS will certainly involve the characterization of such systems. This can provide designers with recommendations about the systems and about the impact of certain dimensions on the design or on users. Moreover, evaluation tools can be provided to measure these dimensions if the results are significant. Some studies have already been conducted to compare

some characteristics of MIS, like the impact of direct and indirect

multi-touch input [20] or the effect of representation location on

interaction [18]. These studies deliver first results and contribute

to the maturation of the domain.


9. CONCLUSION

In this paper we have presented the design and the pilot study of two Mixed Interactive Systems, systems that combine digital and physical artifacts. These prototypes aim to make the eutrophication phenomenon accessible to visitors of the Natural History Museum of Toulouse.

Our objective is to propose to museum visitors the most usable and enjoyable system, and the one that best enhances the learning of this complex phenomenon. We therefore use a co-design process to better understand the messages to deliver to museum visitors about eutrophication. We also combine two evaluation methods to detect usability problems and users' feelings about our prototypes: usability evaluation methods and user experience evaluation methods. Following the pilot study, our prototypes will be improved based on both types of data and on the answers to our questionnaire. The test protocol will also be improved based on our observations during the pilot study and on its early results. We then plan laboratory user tests to determine which system to install in the Museum.

We will also try to extract, from the results of the user tests, some recommendations about the characteristics of such systems, i.e. interaction space and the coupling between the worlds, and their effects on visitors' perception and understanding. These recommendations can guide future designs of Mixed Interactive Systems.

10. REFERENCES [1] Bach C., Salembier P. and Dubois E. 2006. Co-conception

d'expériences interactives augmentées dédiées aux situations

muséales. In Proceedings of IHM'06 (Canada). IHM’06.

ACM Press, New York, NY, 11-18.

[2] Bangor, A., Kortum, P. and Miller, J. 2009. Determining

what individual SUS scores mean: Adding an adjective rating

scale, Journal of Usability Studies, 4(3), 114-123.

[3] Bevan N. 2009. What is the difference between the purpose

of usability and user experience evaluation methods?

UXEM'09 Workshop, INTERACT 2009, (Uppsala, Sweden).

[4] Brooke, J. 1996. SUS: a "quick and dirty" usability scale. In

P. W. Jordan, B. Thomas, B. A. Weerdmeester & A. L.

McClelland (eds.) Usability Evaluation in Industry. London:

Taylor and Francis.

[5] Dubois E. 2009. Conception, Implémentation et Evaluation

de Systèmes Interactifs Mixtes : une Approche basée

Modèles et centrée sur l'Interaction. Habilitation à diriger des

recherches, Université de Toulouse.

[6] Dubois, E., Nigay, L., Troccaz, J. 2001. Consistency in

Augmented Reality Systems, Proceedings of EHCI'2001,

Springer Verlag, 111-122.

[7] Fails, J., Druin, A., Guha, M. L., Chipman, G., and Simms,

S. 2005. Child’s play: A comparison of desktop and physical

interactive environments. In Proceedings of Interaction

Design and Children (IDC’2005). Bolder, CO.

[8] Hassenzahl. M. 2008. User Experience (UX): Towards an

Experiential Perspective on Product Quality. In Proceedings

of IHM’08 (Metz). IHM’08. Keynote presentation.

[9] Horn, M., Solovey, E. T. and Jacob, R.J.K. 2008. Tangible

Programming and Informal Science Learning: Making TUIs

Work for Museums, In Proceedings of Conference on

Interaction Design for Children (IDC’2008).

[10] Hornecker, E. and Stifter, M. 2006. Learning from

Interactive Museum Installations About Interaction Design

for Public Settings. In Proceedings Australian Computer-

Human Interaction Conference OZCHI'06.

[11] ISO 9241-11. 1998. Ergonomic requirements for office work

with visual display terminals (VDTs) -- Part 11: Guidance on

usability

[12] ISO 9241-210. 2007. Ergonomics of human-system

interaction -- Part 210: Human-centred design for interactive

systems

[13] Isomursu, M., Tähti, M., Väinämö, S., and Kuutti, K. 2007.

Experimental evaluation of five methods for collecting

emotions in field settings with mobile applications. IJHCS:

Volume 65 (Issue 4), 404--418.

[14] Kim, M.J. and Maher, M.L. 2008. The Impact of Tangible User Interfaces on Designers' Spatial Cognition. Human-Computer Interaction, 23(2), 101-137.

[15] Manches, A., O'Malley, C., AND Benford, S. 2009. Physical

Manipulation: Evaluating the Potential for Tangible

Designs. In Proceedings of the 3rd International Conference

on TEI’09 (Cambridge, UK, February 16-18, 2009).

[16] Marshall, P., 2007. Do tangible interfaces enhance learning?

In: TEI '07: Proceedings of the 1st international conference

on Tangible and embedded interaction. ACM, New York,

NY, USA, pp. 163-170.

[17] Norman, D. A. and Draper, S. W. 1986. User Centered System Design: New Perspectives on Human-Computer Interaction. Pp. 31-61.

[18] Price, S. Falcao, TP. Sheridan, J and Roussos, G. 2009. The

effect of representation location on interaction in a tangible

learning environment. In: Proceedings of the 3rd

International Conference on TEI’09 (Cambridge, UK,

February 16-18, 2009). ACM Press, New York, NY, 82-92.

[19] Rizzo, F. and Garzotto, F. 2007. "The Fire and The Mountain": Tangible and Social Interaction in a Museum Exhibition for Children. In Proceedings of IDC '07. ACM Press.

[20] Schmidt, D. Block, F. and Gellersen, H. 2009. A

Comparison of Direct and Indirect Multi-Touch Input for

Large Surfaces. In: INTERACT 2009, 12th IFIP TC13

Conference in Human-Computer Interaction (Uppsala,

Sweden August 26-28, 2009).

[21] Schkolne S, Ishii H, Schröder P: Immersive Design of DNA

Molecules with a Tangible Interface. IEEE Visualization

2004: 227-234

[22] Wagensberg, J. 2005. The "total" museum, a tool for social change. História, Ciências, Saúde - Manguinhos, v.12 (supplement), 309-21.

[23] Wakkary, R., Hatala, M., Muise, K., Tanenbaum, K.,

Corness, G., Mohabbati, B. and Budd, J. 2009. Kurio: a

museum guide for families. In Proceedings of the 3rd

International Conference on Tangible and Embedded

Interaction, ACM Press, New York, NY, 215-222.

[24] Wechsung, I., Naumann, A.B., and Schleicher, R. 2008.

Views on Usability and User Experience: from Theory and

Practice.


[25] Yiannoutsou, N., Papadimitriou, I., Komis, V., and Avouris,

N. 2009. "Playing with" museum exhibits: designing

educational games mediated by mobile technology. In

Proceedings of the 8th international Conference on

interaction Design and Children (Como, Italy, June 03 - 05,

2009). IDC '09. ACM, New York, NY, 230-233.

[26] http://www.rappel.qc.ca/lac/eutrophisation.html

[27] http://sumi.ucc.ie/whatis.html


Partial Matching of Garment Panel Shapes with Dynamic Sketching Design

Shuang Liang, Rong-Hua Li, George Baciu, Eddie C.L. Chan, Dejun Zheng
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
cssliang, csrhli, csgeorge, csclchan, [email protected]

ABSTRACT

Over the past decade, the fashion industry and textile manufacturing have been starting to reapply enhanced intelligent CAD process technologies. In this paper, we propose a partial panel matching system to facilitate the typical garment design process. This process provides recommendations to the designer during panel design and performs partial matching of garment panel shapes. There are three main parts in our partial matching system. First, we make use of a Bezier-based sketch regularization to pre-process the panel sketch data. Second, we propose a set of bi-segment panel shape descriptors to describe and enrich the local features of the shape for partial matching. Finally, based on our previous work, we add an interactive sketching input environment to design garments. Experimental results show the effectiveness and efficiency of the proposed system.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Query formulation, search process; H.5.2 [Information Interfaces and Presentation]: User Interfaces; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling.

General Terms
Algorithms, Design, Experimentation

1. INTRODUCTION

In recent years, garment computer-aided design systems have been rapidly developed and have become the basis for the clothing design process. Typical commercial design platforms are built towards rigid shape generation and editing. There are also some systems [4][11][13] that incorporate a sketching interface to provide design flexibility and freedom. However, these systems require a complete sketch input in order to operate, which may cost time and efficiency throughout the design process. This may be addressed by adding, on top of the existing systems, a non-rigid garment design stage with dynamic prediction or suggestion, in which professional designers, researchers, as well as manufacturers will certainly be interested [14]. This requires partial recognition and matching of incomplete garment panels. Figure 1 illustrates the designer's working scenario in the garment panel design process.

In this paper, we propose a partial matching system with dynamic prediction and non-rigid sketching features for garment panel design. It is a Google-like design system, which dynamically returns possible solutions along with the shape drawing process. Our experimental results show that the proposed system provides an effective and efficient garment design platform with partial panel matching.

The rest of the paper is organized as follows. Section 2 presents related work on shape matching. Section 3 describes the framework of the proposed panel design system. Sections 4, 5 and 6 present the methodology of sketch data regularization, panel shape representation and the partial matching algorithm, respectively. Sections 7 and 8 give the experimental setup, performance evaluations and discussions. Finally, Section 9 concludes our work.

2. RELATED WORK

In this section, we look at related work on shape matching algorithms. Shape matching has been well investigated in computer vision in the last few decades [1][2][5][8][9][10][12][15]. The partial matching problem we meet in the garment design process falls into partial-complete matching (PCM). PCM refers to finding the shapes which contain a part that is similar to the query shape, i.e., matching some part of shape B as closely as possible to the complete shape A.

Partial shape matching, the task of matching sub-parts or regions, is the key technique for developing a partial matching system for the garment design process. These parts are not predefined; they can be any sub-shape of a larger part, in any orientation or scale. Many local shape descriptors have been presented to deal with PCM problems.

Ozcan et al. [8] used a genetic algorithm to perform partial matching based on attributed strings. Their approach is claimed to be fast, but it cannot guarantee the optimal result. Berretti et al. [1] proposed a local shape descriptor which partitions a shape into tokens and represents each token by a set of perceptually salient attributes with orientation and curvature information, but this method only considers geometric features and neglects topological features. Tanase et al. [10] and Chen et al. [1] used the turning function of two polylines and the distance across the shape (DAS) to represent local shape information, but these may not be suitable for garment panel shapes. Chi et al. [3] proposed a primitive-based descriptor following the laws of Gestalt theory, which is effective and efficient for partial object retrieval in cluttered environments; however, their method only considers two types of primitives, arcs and lines, and cannot represent complex shapes. Moreover, these PCM algorithms are mostly based on static geometric information and fail to make use of the dynamic features of the shape generation process.

Figure 1: Illustration of the working scenario for the garment panel design process. (a) Garment design workspace; (b) Sketching interface; (c) Designed panel shape.

3. OVERVIEW OF GARMENT PANEL DESIGN SYSTEM

The dynamic garment panel design process is a real-time interaction between the user and the computer. An efficient, flexible, natural and convenient shape generation process is the ultimate goal of the system. To this end, we propose a partial matching system with a sketching interface which returns possible panel shape solutions dynamically during the garment design process.

The proposed system works in three phases. First, we provide a freehand sketching interface to support the non-rigid, flexible and natural design behavior of generating garment panel shapes. Second, a prior-knowledge-based panel database is pre-collected to support the partial matching process. Third, the proposed system makes use of our partial matching algorithm to return the matched panel shape solutions dynamically from the partial sketch input, based on domain knowledge.

The framework of our online garment panel design system is shown in Figure 2. It mainly contains three key modules: sketch regularization, panel feature extraction and partial matching.

Figure 2: Framework of the online garment panel design system.

The sketch regularization module aims to build a natural and flexible panel design system, compatible with different numbers of input strokes and drawing orders. The diversity of the original sketch data makes it difficult to enumerate all possible stroke combinations in both the spatial and temporal dimensions. After the raw sketch data is captured by the input device, we regularize the sketch data into approximated primitive shapes, namely lines and curves.

The feature extraction module then extracts both topological and geometric features from the regularized sketched panels. These regularized panels are then described with bi-segment shape descriptors and compared against a pre-collected panel shape database.

In the partial matching module, a partial matching algorithm is developed to calculate the similarity for dynamic panel shape design. This similarity matching is asymmetric and partial among the panel candidates in the database. There are generally three cases. First, if the incomplete shape is part of a candidate panel, they are considered highly similar, because the partial input shape can be completed later. Second, if the incomplete shape contains components that do not exist in the candidate panel, or contains more components than the candidate panel, the candidate panel is considered not to conform to the user's intention, and the similarity value should be very low no matter how similar the corresponding parts are. Finally, if the incomplete shape is part of two or more candidate panels, the candidate panel with the fewest components will have the highest similarity.
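The following sketch shows one possible way to encode these three cases when ranking candidate panels; the helper matches_part_of() and the scoring rule are illustrative assumptions of ours, not the scoring actually implemented in the system.

def rank_candidates(query, candidates, matches_part_of):
    """query: the incomplete bi-segment sequence drawn so far.
    candidates: complete bi-segment sequences from the panel database.
    matches_part_of(query, panel): hypothetical predicate, True when every
    bi-segment of the query can be aligned with some part of the panel."""
    scored = []
    for panel in candidates:
        if len(query) > len(panel) or not matches_part_of(query, panel):
            score = 0.0               # case 2: extra or unknown components
        else:
            score = 1.0 / len(panel)  # cases 1 and 3: fewer components rank higher
        scored.append((score, panel))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)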

4. BÉZIER-BASED SKETCH REGULARIZATION

In our system, we perform a primitive-based sketch regularization to preprocess the sketch data, which involves two steps: stroke segmentation and primitive fitting, as shown in Figure 3. The raw sketch data varies with user behavior, drawing informality and ambiguity, and must be regularized consistently to obtain a standard representation.

Figure 3: Sketch regularization procedure. (a) Raw sketch input data; (b) Stroke segmentation results; (c) Primitive fitting results.

First, we decompose the original strokes into geometric primitive shapes, lines and curves, based on pen speed and curvature information [6]. We monitor the user's drawing activity and split strokes where the pen speed reaches a local minimum or where the curvature changes markedly. Since drawings can be composed of a group of basic primitive shapes (lines and curves) in a fixed way, stroke segmentation can reduce sketch diversity and computational complexity.

Second, we make use of lines and Bezier curves to smooth the segments. Quadratic and cubic Bezier curves are the most common; higher-degree curves are more expensive to evaluate. We choose a fourth-order Bezier form to smooth a curve. Thereby, we reduce computational complexity while approximating most curves in panel shapes. Figure 4 illustrates some complex curves contained in panel shapes. As we can see, a second-order Bezier is able to fit the curve from the front panel shape in Figure 4(a), while it is not capable of approximating the curve from the sleeve panel in Figure 4(b).

Figure 4: Illustration of Bezier approximation of garment panels. (a) Front panel shape; (b) Sleeve panel shape.

There are three major steps to approximate a curve in Bezier form. Assume a segment S is represented by a point set S = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. The two end points are regarded as two of the control points, and the objective of the Bezier fitting is to estimate the other three control points. The detailed procedure is described in Algorithm 1.

5. BI-SEGMENT PANEL SHAPE DESCRIPTORS

In this section, we propose a bi-segment panel descriptor to represent the panel shape content. The bi-segment feature descriptor consists of the geometric features of a pair of connected segments together with the topological relation between them.

Algorithm 1. Bezier-based sketch regularization
Input: A sketch segment S = (x_1, y_1), ..., (x_n, y_n)
Output: A Bezier curve (x'_1, y'_1), ..., (x'_n, y'_n)

Step 1: Estimate t. We calculate the accumulated curve length at each point by

  c_1 = 0,  c_i = \sum_{k=2}^{i} \sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2},  2 <= i <= n.

Then we estimate t by

  t_i = c_i / \sum_{k=2}^{n} \sqrt{(x_k - x_{k-1})^2 + (y_k - y_{k-1})^2},  1 <= i <= n.

Step 2: Estimate the control points. We estimate the X-coordinates of the control points by minimizing the following non-linear objective function:

  \min_{P_i, i=0,1,...,4} \sum_{k=1}^{n} [ x_k - \sum_{i=0}^{4} \binom{4}{i} P_i (1 - t_k)^{4-i} t_k^{i} ]^2

where x_k denotes the X-coordinate of the k-th point, P_i denotes the X-coordinate of the i-th control point, and t_k, 1 <= k <= n, is calculated in Step 1. The objective function can be solved by the Levenberg-Marquardt algorithm [7]. The Y-coordinates of the control points are calculated in the same way.

Step 3: Compute the output points. We calculate the X-coordinates of the output points by

  x'_k = \sum_{i=0}^{4} \binom{4}{i} P_i (1 - t_k)^{4-i} t_k^{i},  1 <= k <= n.

A rounding operator is then applied to obtain integer coordinates. Likewise, the Y-coordinates are obtained in the same way.

A good local shape descriptor for partial panel shape matching should satisfy the following requirements: 1) the descriptor should encode rich local features of the shape, i.e. it should describe the shape correctly and possess strong discriminative ability to differentiate various local shapes; 2) the local shape descriptor representation should be conveniently processed by the subsequent partial matching algorithm; 3) the features contained in the shape descriptor should be invariant to scaling, translation and rotation so that similar panel shapes can be found; 4) the features contained in the shape descriptor should not be greatly influenced by the user's drawing style. Therefore, we first divide a panel shape into several segments according to the vertices of the panel. We then build a bi-segment model to represent the shape by a sequence of bi-segments and their encoding attributes.

5.1 Topological Descriptors

For topological shape descriptors, we consider the topological relations that encode the types of the segments/primitives. In this paper we call this the binary topological relation. We define the topological relations between line and curve primitives as follows.

Definition 1 (Binary topological relation): Assume the primitive set is of type \Sigma_T = {T_line, T_curve}; then the binary topological relation R between two adjacent primitives P_1 and P_2 is specified as:

  R(P_1, P_2) = R_{l,l} if P_1^T = P_2^T = T_line;  R_{l,c} if P_1^T \neq P_2^T;  R_{c,c} if P_1^T = P_2^T = T_curve.

Figure 5 shows the three types of binary topological relation between panel primitives. As we can see, these topological shape descriptors consider the relations between the two primitives and reflect the structural characteristics of the shape.

Figure 5: Types of binary topological relations.

5.2 Geometric Descriptors

For geometric features, we consider four types of shape descriptor derived from the vertex and its adjacent primitives in the bi-segment data, including the inner angle of the vertex, the ratio of primitive lengths, and turning angles. The detailed description of these four descriptors is as follows:
(1) Inner Angle (A): the inner angle formed by the two primitives at the vertex.
(2) Primitive Ratio (δ): the ratio between the lengths of the two connecting primitives/edges. Note that the ratio is calculated as the length of the shorter edge divided by that of the longer edge, to achieve rotation invariance.
(3) Turning Angle Vector 1 (θ^(1)): the first turning angle vector consists of the angles between the tangent at the vertex and each corresponding sample point on edge 1.
(4) Turning Angle Vector 2 (θ^(2)): the second turning angle vector consists of the angles between the tangent at the vertex and each corresponding sample point on edge 2.

In the following, we explain the calculation of the turning angle vectors. Figure 6 illustrates the turning angle vectors in the case where both connecting primitives are curves. Turning Angle Vector 1 and Turning Angle Vector 2 can be represented as θ^(1) = (θ^(1)_1, θ^(1)_2, θ^(1)_3, θ^(1)_4) and θ^(2) = (θ^(2)_1, θ^(2)_2, θ^(2)_3, θ^(2)_4) respectively, where each θ_i denotes the inner angle between the tangent at the vertex (blue dot) and the tangent at the i-th sample point (red dot) shown in Figure 6. Here, the sample points are generated by equidistant sampling along the curves; we sample four points along each curve in Figure 6. Note that the number of sample points can differ across shapes. Obviously, if the primitive is a line, all the turning angles are zero.

Figure 6: Illustration of turning angle vectors in a two-curves case.

5.3 Bi-segment Panel Model

A panel shape is typically a closed shape that can be decomposed into lines and curves. Therefore, we model the panel shape by a bi-segment sequence with its corresponding intrinsic characteristics. By combining the topological and geometric features together, we get the following definition of the bi-segment model.

Definition 2 (Bi-segment model): A bi-segment model B is a (2i + 3)-tuple with i sample points on each segment:

  B = (R, A, δ, θ^(1), θ^(2))

where R is the binary topological relation of the two edges connecting at the vertex, A is the inner angle between the two edges, δ is the primitive length ratio, and θ^(1) and θ^(2) are the turning angle vectors of the two edges respectively, each consisting of i elements.

With this definition of the bi-segment model, a panel shape P can be represented by an ordered bi-segment sequence, which we call the bi-segment panel descriptor in this paper. More specifically, a shape with n bi-segments is described by P = (B_1, B_2, ..., B_n), where B_i is the bi-segment model defined above. Figure 7 shows a panel shape represented by a sequence of bi-segments.


Figure 7: Bi-segment representation of a panel shape P = (B_1, B_2, ..., B_9).
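To make the structure of this descriptor concrete, the sketch below shows one possible in-memory representation of a bi-segment and the computation of its attributes from two sampled edges; the class, the function names and the finite-difference tangent estimation are our own illustrative choices, not the exact implementation used by the system.

import numpy as np
from dataclasses import dataclass

@dataclass
class BiSegment:
    relation: str          # binary topological relation R: 'll', 'lc' or 'cc'
    inner_angle: float     # A, inner angle at the shared vertex (radians)
    ratio: float           # delta, shorter edge length / longer edge length
    turning1: np.ndarray   # theta^(1), turning angles sampled along edge 1
    turning2: np.ndarray   # theta^(2), turning angles sampled along edge 2

def _angle(u, v):
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def turning_angles(edge, vertex_tangent, n_samples=5):
    """Angles between the tangent at the vertex and finite-difference tangents
    at n_samples roughly equidistant points along 'edge', an (m, 2) array of
    points starting at the shared vertex (all zero for a straight line)."""
    idx = np.linspace(1, len(edge) - 1, n_samples).astype(int)
    tangents = edge[idx] - edge[idx - 1]
    return np.array([_angle(vertex_tangent, tg) for tg in tangents])

def make_bisegment(edge1, edge2, kind1, kind2, n_samples=5):
    """edge1 ends at the shared vertex, edge2 starts at it; kind1/kind2 are
    'l' (line) or 'c' (curve)."""
    v1 = edge1[-2] - edge1[-1]        # direction leaving the vertex along edge 1
    v2 = edge2[1] - edge2[0]          # direction leaving the vertex along edge 2
    l1 = np.sum(np.linalg.norm(np.diff(edge1, axis=0), axis=1))
    l2 = np.sum(np.linalg.norm(np.diff(edge2, axis=0), axis=1))
    relation = kind1 + kind2 if kind1 == kind2 else 'lc'
    return BiSegment(relation, _angle(v1, v2), min(l1, l2) / max(l1, l2),
                     turning_angles(edge1[::-1], v1, n_samples),
                     turning_angles(edge2, v2, n_samples))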

The advantages of this bi-segment panel descriptor are twofold: 1) it captures both topological features, which reflect structural properties, and geometric features, which describe shape information close to the human visual perception of local panel shapes; 2) it is easily processed by the partial shape matching task, as it is purely numerical.

Therefore, by introducing the bi-segment panel shape descriptor, we reach a more comprehensive and efficient representation of panel shapes and features. In the next section, we make use of the bi-segment panel descriptor in the subsequent partial shape matching algorithm.

6. BI-SEGMENT PARTIAL MATCHING

In this section, we propose a partial shape matching algorithm for garment panel design and present the similarity measurement between two bi-segments.

Assume two bi-segments B_1 = (R_1, A_1, δ_1, α^(1), α^(2)) and B_2 = (R_2, A_2, δ_2, β^(1), β^(2)). The dissimilarity between B_1 and B_2 is defined in Equation 1:


  dis(B_1, B_2) = w_1 \lambda(R_1, R_2) + w_2 f(A_1, A_2) + w_3 g(\delta_1, \delta_2) + w_4 h(\alpha, \beta)    (1)

where w_i (1 <= i <= 4) denotes a weight coefficient, and

  \lambda(R_1, R_2) = 0 if R_1 = R_2, and 1 otherwise    (2)

  f(A_1, A_2) = |A_1 - A_2|    (3)

  g(\delta_1, \delta_2) = |\delta_1 - \delta_2|    (4)

  h(\alpha, \beta) = \min\{ \sum_{i=1}^{s} [ (\alpha^{(1)}_i - \beta^{(1)}_i)^2 + (\alpha^{(2)}_i - \beta^{(2)}_i)^2 ],  \sum_{i=1}^{s} [ (\alpha^{(1)}_i - \beta^{(2)}_i)^2 + (\alpha^{(2)}_i - \beta^{(1)}_i)^2 ] \}    (5)

where s is the number of sample points. Note that h(α, β) denotes the minimum calibration distance between the two sets of turning angles of the primitives. The similarity between two bi-segments can then be derived from the above dissimilarity distance as follows:

  sim(B_1, B_2) = 0 if dis(B_1, B_2) > \sigma, and 1 - dis(B_1, B_2)/\sigma otherwise    (6)

where σ is a threshold determined from our experiments to normalize the similarity and make it fall into the range [0, 1].
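Equations (1)-(6) translate almost directly into code, as sketched below for the BiSegment objects introduced in the earlier sketch; the weight vector and the threshold σ are free parameters of the method, and the default values used here are placeholders rather than the values chosen by the authors.

import numpy as np

def dissimilarity(b1, b2, w=(1.0, 1.0, 1.0, 1.0)):
    """Equations (1)-(5): weighted sum of the topological, inner-angle,
    ratio and turning-angle terms between two bi-segments."""
    lam = 0.0 if b1.relation == b2.relation else 1.0                      # Eq. (2)
    f = abs(b1.inner_angle - b2.inner_angle)                              # Eq. (3)
    g = abs(b1.ratio - b2.ratio)                                          # Eq. (4)
    h = min(                                                              # Eq. (5)
        np.sum((b1.turning1 - b2.turning1) ** 2 + (b1.turning2 - b2.turning2) ** 2),
        np.sum((b1.turning1 - b2.turning2) ** 2 + (b1.turning2 - b2.turning1) ** 2))
    return w[0] * lam + w[1] * f + w[2] * g + w[3] * h                    # Eq. (1)

def similarity(b1, b2, sigma=1.0, w=(1.0, 1.0, 1.0, 1.0)):
    """Equation (6): map the dissimilarity into a similarity in [0, 1]."""
    d = dissimilarity(b1, b2, w)
    return 0.0 if d > sigma else 1.0 - d / sigma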

7. EXPERIMENTAL SETUP
We carried out a partial matching experiment to verify the effectiveness and efficiency of the proposed method. The experiment used 200 panel shapes collected from ten experienced panel designers. The machine used in our experiment was an HP TouchSmart with an AMD Turion X2 RM-74 (2.2 GHz) CPU and 2 GB of memory. We implemented our system in Visual C++ on Microsoft Windows Vista.

The experiment was first carried out by collecting sketch

samples with traditional CAD software. As mentioned, we invited garment designers to freehand-sketch usual, common, and standard panel shapes. We collected 20 sample sketches from each designer, which established a panel database of 200 panel shapes. Figure 8 shows some panel shape examples from our database.

Second, we built a feature database from the panel database. As can be seen in Figure 8, different panel shapes have different features. We apply the proposed bi-segment shape descriptor to extract features from the panel shapes. We sample 5 points on each segment in our experiment, which yields a 13-dimensional numerical vector for each bi-segment.

Finally, we evaluate the effectiveness of the proposed partial matching. Following the main performance metrics of interest in general information retrieval, we measure the effectiveness of partial matching by recall and precision. Recall is defined as the ratio between the number of relevant returned shapes and the total number of relevant shapes, while precision is defined as the ratio between the number of relevant returned shapes and the total number of returned shapes.
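For reference, recall and precision as defined here can be computed with a generic sketch such as the following (not the authors' evaluation code; the example shape IDs are invented).

    def recall_precision(returned_ids, relevant_ids):
        # recall = relevant retrieved / all relevant; precision = relevant retrieved / all retrieved
        returned, relevant = set(returned_ids), set(relevant_ids)
        hits = len(returned & relevant)
        recall = hits / len(relevant) if relevant else 0.0
        precision = hits / len(returned) if returned else 0.0
        return recall, precision

    # Example: 20 shapes returned, 18 of the 20 relevant shapes among them -> recall 0.9, precision 0.9
    print(recall_precision(list(range(18)) + [100, 101], range(20)))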

Figure 8: Samples collected in panel shape database.

Obviously, when more items are returned, recall increases but precision decreases. In a recall/precision graph, a higher curve signifies better recall/precision performance. We further describe and analyze the results in the next section.

8. RESULTS AND ANALYSIS
In this section we evaluate the effectiveness of the proposed partial matching in terms of recall and precision. In Section 8.1, we compare the recall rate with partial shapes varying from one to four bi-segments. In Section 8.2, we compare the precision rate of four different descriptors: the attributed strings descriptor [8], a geometric descriptor, a topological descriptor, and our proposed bi-segment descriptor. In Section 8.3, we test the response time of our partial matching garment design system.

8.1 Recall Rate
Figure 9 shows the relationship between the number of retrieved shapes and the recall rate, using partial shapes with different numbers of bi-segments. As the drawing process proceeds toward completion, the partial matching recall rate increases gradually: the more complete the input panel shape, the more clearly the user's intention is expressed. The most important point is that a high recall rate means the partial matching system returns the wanted shapes. Inputs with four bi-segments consistently outperform those with one bi-segment at every setting. When returning 20 shapes, an input partial shape with four bi-segments achieves a 90% recall rate.

8.2 Precision Rate
Figure 10 depicts the precision-recall relationship for four different descriptors: attributed strings, a geometric descriptor, a topological descriptor, and our proposed bi-segment descriptor. We average the results and plot the four curves corresponding to the different features used in partial matching. As can be seen in Figure 10, the attributed strings descriptor reaches only 73% precision; because it neglects topological characteristics, it cannot fully express the panel content. Our proposed bi-segment descriptor clearly has the best performance and achieves on average 20% higher precision than the other three descriptors.

8.3 Response Time


Figure 9: Performance of partial matching system with increasing integrity of input shapes (x-axis: number of retrieved shapes; y-axis: recall; curves: one, two, three, and four bi-segments).

Figure 10: Precision-Recall graph of partial matching systems (x-axis: recall; y-axis: precision; curves: bi-segment descriptor, attributed strings, geometric descriptor, topological descriptor).

Finally, the time cost of the partial matching process is also a major evaluation factor for real-time interactive performance. The response time should be as small as possible, and generally it should be less than a second. In our experiments, the average response time of our proposed descriptor is about 31.2 ms for partial matching. The performance of the proposed descriptor therefore fulfills the time requirement for real-time interaction.

9. CONCLUSIONS
In this paper we present a dynamic partial matching system for garment panel shape design with interactive sketching input. First, we provide intelligent panel sketch processing based on automatic referential search and feature matching. Second, we propose to represent the garment panel shape with a bi-segment local shape descriptor that incorporates both topological and geometric features. This bi-segment shape descriptor is invariant to rotation, scaling, and translation. Third, a partial matching algorithm is presented that solves the panel matching problem with low computational complexity. Finally, we conduct experiments based on our pre-collected panel database to evaluate the proposed approach. The experiments show the encouraging matching accuracy of the proposed method.

For future work, it is promising to extend the application domains of our shape descriptor, e.g., to mechanical drawing.

10. ACKNOWLEDGEMENT
This work is supported by the Research Grants Council of the Hong Kong Special Administrative Region, under RGC Earmarked Grants (Project No. G-U432).

11. REFERENCES
[1] S. Berretti, A. D. Bimbo, and P. Pala. Retrieval by shape similarity with perceptual distance and effective indexing. IEEE Transactions on Multimedia, 2(4), 2000.
[2] L. Chen, R. Feris, and M. Turk. Efficient partial shape matching using the Smith-Waterman algorithm. 2008.
[3] Y. Chi and M. Leung. Part-based object retrieval in cluttered environment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):890–895, 2007.
[4] P. Decaudin, D. Julius, J. Wither, L. Boissieux, A. Sheffer, and M.-P. Cani. Virtual garments: A fully geometric approach for clothing design. 25, 2006.
[5] L. J. Latecki, V. Megalooikonomou, Q. Wang, and D. Yu. An elastic partial shape matching technique. Pattern Recognition, 40:3069–3080, 2007.
[6] S. Liang and Z. Sun. Sketch retrieval and relevance feedback with biased SVM classification. Pattern Recognition Letters, 29(12):1733–1741, 2008.
[7] D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math., 11(2):431–441, 1963.
[8] E. Ozcan and C. K. Mohan. Partial shape matching using genetic algorithms. Pattern Recognition Letters, 18(10), 1997.
[9] E. Saber, Y. Xu, and A. M. Tekalp. Partial shape recognition by sub-matrix matching for partial matching guided image labeling. Pattern Recognition, 38:1560–1573, 2005.
[10] M. Tanase and R. C. Veltkamp. Part-based shape retrieval. 2005.
[11] E. Turquin, J. Wither, L. Boissieux, M.-P. Cani, and J. F. Hughes. A sketch-based interface for clothing virtual characters. IEEE Computer Graphics and Applications, 27(1):72–81, 2007.
[12] R. C. Veltkamp and M. Hagedoorn. State of the art in shape matching. Principles of Visual Information Retrieval, 2000.
[13] C. C. L. Wang, Y. Wang, and M. M. F. Yuen. Feature based 3D garment design through 2D sketches. Computer-Aided Design, 35(7):659–672, 2002.
[14] J. Wang, G. Lu, W. Li, L. Chen, and Y. Sakaguti. Interactive 3D garment design with constrained contour curves and style curves. Computer-Aided Design, 41:614–625, 2009.
[15] D. Zhang and G. Lu. Review of shape representation and description techniques. Pattern Recognition, 37:1–19, 2004.


Fur Interface with Bristling Effect Induced by Vibration

Masahiro Furukawa

The University of Electro-Communications 1-18-14, Chofugaoka, Chofu, Tokyo, JAPAN

[email protected]

Yuji Uema, Maki Sugimoto, Masahiko Inami Graduate School of Media Design, Keio University

4-1-1 Hiyoshi, Kohoku-ku, Yokohama-city, Kanagawa, JAPAN, {uema, sugimoto, inami}@kmd.keio.ac.jp

ABSTRACT Wearable computing technology is one of the methods that can augment the information processing ability of humans. However, in this area, a soft surface is often necessary to maximize the comfort and practicality of such wearable devices. Thus in this paper, we propose a soft surface material, with an organic bristling effect achieved through mechanical vibration, as a new user interface. We have used fur in order to exhibit the visually rich transformation induced by the bristling effect while also achieving the full tactile experience and benefits of soft materials. Our method needs only a layer of fur and simple vibration motors. The hairs of fur instantly bristle with only horizontal mechanical vibration. The vibration is provided by a simple vibration motor embedded below the fur material. This technology has significant potential as garment textiles or to be utilized as a general soft user interface.

Categories and Subject Descriptors H.5.2 [Information Interfaces and Presentation]: User Interfaces – Haptic I/O, Input devices and strategies, Interaction styles.

General Terms Soft User Interface, Pet Robot, Visual and Haptic Design, Computational Fashion

Keywords Physical Computer Interfaces, Computational Clothing.

1. Introduction The important part of wearable technology is its role as an interface between information devices and humans [1]. There have been many improvements in terms of wearable input devices. A lot of research done on them has focused on the context and physical characteristics of the user interface, such as the material and texture. However, improvements to the physical aspects of user interfaces have usually involved only the reduction of thickness and greater flexibility [4][5]. There have

also been many improvements with output devices. Head mounted displays are often used to provide textural information [1], while vibration motors have mainly been used to provide non-textural information [6]. The first advantage of a vibration motor is its small size, making it easy to embed in garments [7][21]. This feature makes it possible to keep the unique textures of the garments, fully maintaining their tactile and visual characteristics, while simultaneously making this output device wearable. By combining the physical features of both fur and these vibration motors, we develop an interface that can be used as both an input and output device.

For example, a cat's body is covered in a coat of fur, which it uses not only to maintain its body temperature but also to express its affection by bristling its fur [23], as shown in Figure 1 (a). The hair erection of chimpanzees is also known to have a social role as a means of visual communication [24]. Thus the soft body hair of these animals performs an equally important role as an output interface. Accordingly, it is important

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Augmented Human Conference, April 2-3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.

Figure 1. Soft and Flexible User Interface suitable for Wearable Computing Inspired by Hair Erection.

Figure 2. Bristling Effect with Horizontal Vibration Provided from Vibration Motor.

(a)

(b)


to create a user interface that has a soft and flexible surface, while also exhibiting both visual and tactile messages.

1.1 Purpose of Research In this paper, we propose a novel texture control method that uses natural fur as a wearable interface. This method, based on the bristling effect, allows the texture of natural fur to change suddenly, as shown in Figure 2. The technology, which needs only a simple vibration motor and furry material, has the additional benefit of being a soft and flexible user interface.

1.2 Contributions The method with bristling effect by horizontal vibration has the following advantages:

This method only requires commonly available materials and actuators.

This method can be easily applied to clothing materials, making it compatible as wearable computer technology.

This method can be used on the surface material of Pet robots, providing a natural and believable platform to express the internal state of these robots

This method can be used on the soft surface portions of information devices including those found on common household furniture such as couches and cushions that are connected to home electronics.

2. Related Work Previous related works on soft and flexible wearable interfaces are described as the following.

2.1 Material for Garment One of these wearable interfaces makes use of visual changes in the surface itself in order to provide information to the user. For example, Wakita proposed Fabcell, which is a novel type of garment [9]. Fabcell is a non-luminous garment material which is made of conductive yarn inter-woven with fibers dyed with liquid crystal ink. When a voltage is applied to the fabric, there would be a change in temperature, which in turn changes the color of the fabric. In another similar project, the “Huggy Pajama”, they also proposed a wearable display with color changing properties [10].

These works are in some ways similar to our system as these interfaces are also non-luminous and change visually to convey information. In addition, they are also soft, flexible and wearable. Nevertheless, the difference is their visual changes are much slower. This is so because the speed of molecular changes caused by temperature changes is much slower than mechanical changes. Examples of interfaces that work based on mechanical changes are described below.

2.2 Artificial Fur In this work, “Tabby” by Ueki, artificial fur is used as a tactile interface that gives users a soft feeling. Tabby is a lamp that has an animal like soft body covered by this artificial fur and it aims to stimulate communication between people within a public space [16]. Tabby’s artificial fur has a form that changes its

shape as if it is breathing and this is controlled by a pressure fan within it. There is also an incandescent lamp underneath the fur, whose luminescence captures the people’s attention in public. Together with its soft tactile surface, it would encourage people to touch Tabby. Thus, we can suppose that an artificial fur surface plays an important role in communication. Soft and furry materials are also known to provide psychological comfort [8]. For example, Wada proposed the seal-shaped Mental Commitment Robot that is covered by artificial fur, in order to stabilize the mental state of the mentally ill and to reduce the burden of workers who serve in care facilities [19]. The artificial fur of this robot gives a psychological sense of cuteness and comfort to people. Moreover, Steve Yohaman also proposed the Haptic Creature Project, which involves a rabbit-shaped robot covered by artificial fur, similar to the Mental Commitment Robot [17].

As mentioned above, artificial fur can acts as a soft tactile interface and also provide haptic comfort. Moreover, it is possible to give the user an impression of holding a living mammal using haptics. For example, Hashimoto worked on an Emotional Touch project, which has voice coils to control the air pressure between one’s hands and the voice coils in real time, and thus providing a novel haptic impression of holding a living animal [20]. However, this project does not provide changing visual effects, which the artificial fur can.

Beyond the merits of haptic and visual feedback, furry material offers another visual effect that makes use of the changing attitude (inclination) of each hair. Because furry material has many hairs on its base, its appearance changes when the attitude of the hairs changes. This feature can therefore convey information such as event reports or status changes while keeping the surface soft. Technologies that aim to control this attitude of hair are described in the following.

2.3 Static Electricity with Dense Fur The electrostatic effect is known to affect the “standing” or inclination angle of furs, and thus changing the shape of the furs. The Van de Graaff generator is used to generate static electricity, which makes one’s hair stand up when the person touches the electrode of this generator. Thus, it is possible to apply this same principle in order to control the inclination of the furs and produce a bristling effect. However, the Van de Graaff generator is relatively large and so is not very portable for users.

Circuits of high voltage also produce static electricity and are smaller in size than the Van de Graaff generator. Philips Electronics has patented this method known as the Fabric Display [11]. Nevertheless, there is the possibility of an electric shock when the user comes in direct contact with the electrodes, which makes it dangerous to be used.

2.4 Electromagnetic Forces, Shape-Memory Alloy with Sparse Fur On the other hand, there are works that are based on controlling the inclination angle of individual hair, for example, Raffle’s Super Cilia Skin [13]. This method uses electromagnetic forces to control the inclination of stick-like protrusions, which have



permanent magnets fixed under them. These protrusions are distributed on an elastic membrane and are controlled by the electric magnet array arranged below the elastic membrane. The distribution density of these protrusions is much lesser than the density of hair on animal fur. Thus the appearance of the surface using this method still differs from that of animal fur.

In another work, Kushiyama proposed the Fur-fly, which uses servomotors to control the inclination angle of each batch of artificial feathers [12]. These artificial feathers can provide a soft haptic interface, but as the servomotor used is relatively complex and large in structure, it is not possible to control this interface at a finer and more precise resolution.

Lastly, shape-memory alloy (SMA) is known to change shape through electrical rather than mechanical control, and previous works that use SMA actuation include Sprout I/O by Coelho [14] and Shutters [15]. Both works are kinetic textiles that use wool yarn and felt with SMAs attached to them. The latter not only has physical shutters made of felt with attached SMA; the shutters also form a matrix that can display characters through its cast shadow, so the matrix can function like a dot matrix display to provide text visuals. Nonetheless, SMA has a relatively low response speed and moves slowly.

3. Prototyping and Implementation Arrector pili muscle is known to make body hair bristle up [22]. Figure 3 shows the process of the bristling effect and the positional relations of body hair, epitheca, arrector pili muscle and hair root. The arrector pili muscle is a muscle located around the hair root and its relaxed, initial state is shown in Figure 3 (a). As it contracts by parasympathetic activation, traction power is generated that makes the body hair bristles up as shown in Figure 3 (b).

However, industrially available natural fur does not have functioning arrector pili muscles, so an alternative method is necessary to make the body hair bristle. We found that vibration is useful as such an alternative method for producing the bristling effect. The vibration motor, of the kind typically embedded in a mobile phone, is inexpensive and also electrically safe. Disk-shaped vibration motors were therefore attached to the reverse side of the epitheca of natural fur, as shown in Figure 4, and were then supplied with current to generate vibration. The size of this natural fur is approximately 2 cm in width and 25 cm in length. In our tests, the bristling effect was observed with several kinds of natural fur. The bristling effect is shown in Figure 2: Figure 2 (a) shows the initial state of the fur, while Figure 2 (b) shows the appearance of the fur after the bristling effect has occurred. As Figure 2 shows, this bristling effect causes an apparent visual change. Details of this effect are described in the following.

A prototype of the mechanism that produces the bristling effect consists of natural fur and a disk-shaped vibration motor, as shown in Figure 5 (a). The bristling effect described in this paper is such that the hair of the natural fur stands up when the fur is vibrated by the vibration motor, as shown in Figure 5 (b). Opossum natural fur and the FM34F disk-shaped vibration motor from T.P.C. (see footnote 1) are used in this system.

The bristling effect is realized by the following procedure. First, the body hair is stroked with one's hand in order to be compressed, as shown in Figure 5 (a). The appearance of this state is as shown in Figure 2 (a). This state remains as such under the condition that no external force is exerted. Second, the body hair bristles up when mechanical vibration, generated by the vibration motor, is applied to the fur, as shown in Figure 5 (b). The appearance of this state is as shown in Figure 2 (b). A standard voltage of 3.0 V is used to drive the motor. The direction of vibration depends on the direction of the internal weight rotation, as shown in Figure 4, and is parallel to the plane surface of the epitheca. Thus this mechanical vibration acts like the action of an arrector pili muscle.

3.1 Selection of Material This bristling effect is not observed in all types of animal fur. In order to ensure that the conditions of the test are consistent throughout, a fur that would always bristle consistently have to be selected. Thus, a test is conducted in order to select a material which can produce this bristling effect consistently.

Needless to say, artificial fur is a good choice because it can be mass produced, readily available, and has uniform characteristics. Additionally, using artificial fur in place of natural fur is more desirable from the point of view of animal protection. Unfortunately, results from our preliminary study showed that artificial fur was not able to produce consistent bristling effects.

1 Specification of Vibration Motor (T.P.C FM34F): Standard Voltage 3.0V / Standard Speed 13,000rpm / Standard Current 100mA or less / Vibration Quantity 1.8G

Figure 3. Bristling Effect with Arrector Pili Muscle.

Figure 4. Disk-shaped Vibration Motor to Provide Horizontal Vibration (arrow: direction of internal weight rotation).


Thus we focus on using natural fur. The results of the tests on natural furs and the response time of the natural fur’s bristling effect are described below.

The natural furs used for this evaluation are shown in Figure 6. The evaluation includes four materials which have relatively uniform hair types. Their appearances are shown in Figure 6; these furs are from (a) an opossum, (b) a rabbit, (c) a mink and (d) a Tibetan lamb.

Table 1. Mechanical Characteristics and Responses (unit: mm)

Material | Thickness | Diameter | Length | Reproducibility
(a) Opossum | 0.33 | 0.028 | 48 | O
(b) Rabbit | 0.53 | 0.026 | 36 | X
(c) Mink | 0.78 | - | 7 | X
(d) Tibetan lamb | 0.65 | 0.047 | 55 | X

A test on the reproducibility of the bristling effect is conducted in the following procedure, which is similar to the previous one. First, one disk shaped vibration motor was attached to the reverse side of the epitheca fur with double-stick tape as shown in Figure 5 (a). Next, the fur was put on velour and the body hair was stroked with one’s hand in order to be compressed as shown in Figure 5 (a). Then a standard 3.0[V] voltage was applied to the motor. Lastly, the reproducibility of the bristling effect was estimated by visual judgment.

The results for the reproducibility of the bristling effect are shown in Table 1. Reproducibility is marked 'O' for high repeatability and 'X' for no repeatability. Opossum fur is the only natural fur that shows high repeatability of bristling. When the vibration is provided by one motor, approximately 10 cm of the strip of natural fur bristles.

It can be supposed that this bristling effect is due to the mechanical structure of the natural fur. The thickness of the epitheca and the diameter and length of the body hair were therefore measured in order to reveal the mechanical characteristics; the diameter was measured with a micrometer. The results are shown in Table 1. The values in the table are averages of 10 readings. There is no value for the diameter of the mink's body hair as it is too small to be measured.

The results show that the epitheca of the opossum is thinner than that of the other three animals. In addition, although the Tibetan lamb has the greatest hair length, its hair is all bundled together, as shown in Figure 6 (d), which makes it more difficult for the bundled hair to stand up. On the other hand, the hair of the opossum and the rabbit is not bundled, which makes it easier for their hair to stand up. Therefore, we can suppose that the thin epitheca and non-bundled fur of the opossum produce the highest repeatability of the bristling effect.

3.2 Response time of Bristling Effect One of the characteristics of the bristling effect that we have found is the bristling speed. In terms of mechanical engineering, mechanical findings are necessary for technological application. Thus, as basic findings, the response time of this bristling effect is measured. This response time is the duration of transition from the initial state as shown in Figure 5 (a) to the bristling state as shown in Figure 5 (b). Thus this response time can also be defined as the duration from the start time of supplying voltage to the vibration motor, and to the time of the hair reaching a stable state just after the bristling effect.

3.2.1 Experimental Setup The response time was measured with a digital high-speed camera (Casio EXILIM EX-F1). The experimental setup is shown in Figure 7 and Figure 8. A grid pattern and an array of LEDs are set up behind the fur material. The grid is used to calibrate the displacements of the feature points; each cell is 5 mm x 5 mm in size. In the LED array, the LEDs blink one after another in sequence at intervals of 10 ms in order to confirm the frame rate of the captured video. The LED array was controlled with an Arduino Duemilanove (ATmega168) and starts to blink when a voltage across the vibration motor is detected. An incandescent lamp was used for lighting. As indicated above, a standard voltage of 3.0 V was used, and one disk-shaped vibration motor was attached to the reverse side of the fur material, which was placed on velour (Figure 7).

3.2.2 Measurement Procedure Five recordings were conducted under the same conditions; each recording started before the vibration motor was activated and ended after the fur had reached its stable state. The recording specifications were as follows: frame rate 600 fps and video size 432 x 192 pixels, in landscape mode. The bristling effect was observed in every recording, and analysis was conducted after the recordings. Three feature points of observation were

Figure 6. Natural Fur Used for Experiment

Figure 5. State Transition of Bristling Effect.


selected on the fur surface, as shown in Figure 8. A pyramidal implementation of the Lucas-Kanade method is used to track the feature points [25]. During the test, these feature points moved toward the left side of Figure 8, with their movement directions and trajectories shown as arrows in Figure 8. The length of each arrow indicates the displacement of the feature point, and the positions of the points are described with the x and y axes shown in Figure 8. The actual displacement was obtained after calibration, and the transformation coefficients were (x, y) = (0.435, 0.385) millimeters per pixel.
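A minimal sketch of this kind of tracking using OpenCV's pyramidal Lucas-Kanade implementation and the stated millimeter-per-pixel calibration; the video file name and the initial point coordinates are placeholders, not values from the paper.

    import cv2
    import numpy as np

    MM_PER_PIXEL = np.array([0.435, 0.385])      # calibration from the 5 mm grid (x, y)

    cap = cv2.VideoCapture("bristling_600fps.avi")            # placeholder file name
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    # Three feature points chosen on the fur surface (placeholder pixel coordinates).
    pts = np.array([[[120, 60]], [[150, 80]], [[180, 100]]], dtype=np.float32)

    trajectory_mm = [pts.reshape(-1, 2) * MM_PER_PIXEL]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal Lucas-Kanade optical flow between consecutive frames.
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                  winSize=(21, 21), maxLevel=3)
        trajectory_mm.append(pts.reshape(-1, 2) * MM_PER_PIXEL)
        prev_gray = gray

    trajectory_mm = np.stack(trajectory_mm)      # shape: (frames, 3 points, x/y in mm)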

3.2.3 Result and Discussion The temporal changes of the trajectories of the three feature points are shown in Figure 9. The results of the five trials show a similar trend for the bristling effect. Details of one of these five trials are as follows. Time 0 is the time when a voltage across the vibration motor was detected, and the trajectories are plotted from this time. The left graph of Figure 9 shows the trajectory along the x axis, while the right graph of Figure 9 shows the trajectory along the y axis. Figure 10 is a larger-scale graph that shows the displacements between 0 and 500 ms. The left graph of Figure 9 shows that the duration of the bristling effect is approximately less than 500 ms. The left graph of Figure 10 shows that the transition may be finished in approximately 300 ms.

As there is no displacement of the featured points in the duration of time before the vibration motor starts, it is supposed that the bristling effect is caused by the mechanical vibration actuated by the disk shaped vibration motor.

The oscillation of the feature point was observed as shown in Figure 10. After comparison with the captured video, we deduced that this is not a tracking error but is instead an oscillation of the fur surface, which continued after 500 ms. Frequency analysis was then conducted with the Fast Fourier Transform (FFT): 2,048 sampling points were taken from 500 ms onwards, and a Hanning window was applied before the FFT. The result is shown in Figure 11. The left graph of Figure 11 shows the normalized power spectrum over the range up to the Nyquist frequency. The close-up in the right graph of Figure 11 shows the peak frequency at approximately 57 Hz. This peak is presumed to be the oscillation frequency of the fur surface.
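The spectral analysis described here can be reproduced in outline as follows; the displacement trace below is a synthetic stand-in (a 57 Hz sinusoid plus noise), since the measured data are not available, and only the windowing and FFT steps mirror the paper.

    import numpy as np

    FS = 600.0                                    # camera frame rate [frames/s]
    t = np.arange(4096) / FS
    # Synthetic stand-in for the tracked vertical displacement of one feature point.
    displacement_mm = 0.2 * np.sin(2 * np.pi * 57.0 * t) + 0.01 * np.random.randn(t.size)

    start = int(0.5 * FS)                         # analyse from 500 ms onwards
    y = displacement_mm[start:start + 2048]
    y = y - np.mean(y)                            # remove the DC offset

    window = np.hanning(len(y))
    spectrum = np.abs(np.fft.rfft(y * window))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / FS)   # 0 ... Nyquist (300 Hz)

    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the zero-frequency bin
    print(f"dominant fur-surface oscillation: {peak_hz:.1f} Hz")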

4. Discussion Our experiments showed that it is impossible to flatten the body hair using vibration; the bristling effect is not reversible with mechanical vibration alone. However, manual stroking smooths out the fur easily. If this interface serves not only as an output device but also as an input device, the user will naturally stroke the fur. Thus, in such an interactive system, the human reflex of instinctively reacting to the fur interface resolves this reversibility issue of the bristling effect. Furthermore, it is relatively easy for capacitive sensors to detect the touch of a hand on the fur.

As mentioned above, the visible physical changes caused by the bristle effect serve as visual information presentation. On the other hand, the mechanical vibration serves as a tactile presentation method [21]. One of the goals of this technology is to increase the desirability and comfort of wearable technology for the user. It is not difficult to present tactile stimulation to

Figure 10. Initial Shifting of Measured Points After Providing Horizontal Vibration to Epitheca

Figure 11. Result of FFT of Vertical Shifting Measured from Feature Point Tracking

Figure 9. Displacement of Measured Points from Initial Position with High-Speed Camera at 600fps.

Figure 8. Recording Setup and Displacement of Feature Points Used for High-Speed Tracking.

Figure 7. High-Speed Photography Setup


humans, but through this method, using a single actuator, one can simultaneously provide two modes of sensation: both visual and tactile. This is another advantage of our method.

5. Conclusion In this paper, we proposed a texture control method using natural fur and a simple vibration motor as a wearable interface. This method, based on the bristling effect, allows the texture of natural fur to change instantly, and additionally serves as a soft user interface. The results of our evaluation experiments demonstrated that opossum fur has the best response, with the body hair bristling within less than 500 ms and an oscillation frequency of 57 Hz. Since our current prototype requires the use of natural fur, the achieved bristling effect is highly dependent on the mechanical properties of the natural fur. We will therefore further explore the mechanism of the bristling effect in the future using other materials.

6. References [1] S. Mann, Wearable Computing: A First Step toward

Personal Imaging, Computer, vol. 30, no. 2, pp. 25-32, 1997 [2] J. Rekimoto, K. Nagao, The world through the computer:

computer augmented interaction with real world environments, Proceedings of the 8th annual ACM symposium on User Interface and Software Technology, pp. 29-36, 1995

[3] H. Ishii, B. Ullmer, Tangible Bits: Towards Seamless Interfaces between People, Bits, and Atoms. Proceedings of CHI '97, pp. 234-241, 1997

[4] M. Orth, R. Post, E. Cooper, Fabric computing interfaces, CHI 98 conference summary on Human Factors in Computing Systems, pp. 331-332, 1998

[5] D. De Rossi, F. Carpi, F. Lorussi, A. Mazzoldi, R. Paradiso, E.P. Scilingo and A. Tognetti, Electroactive fabrics and wearable biomonitoring devices, AUTEX Research Journal, vol. 3, no. 4, 2003

[6] T. Amemiya, J. Yamashita, K. Hirota, M. Hirose, Virtual Leading Blocks for the Deaf-Blind:A Real-Time Way-Finder by Verbal-Nonverbal Hybrid Interface and High-Density RFID Tag Space, Virtual Reality Conference, IEEE, pp. 165, IEEE Virtual Reality Conference 2004 (VR 2004), 2004

[7] R.W. Lindeman, Y. Yanagida, H. Noma, K. Hosaka, Wearable vibrotactile systems for virtual contact and information display. Virtual Reality, 9, 203-213, 2006

[8] H.F. Harlow, R.R. Zimmerman. Affectional responses in the infant monkey. Science, 130, 1959

[9] A. Wakita, M. Shibutani, Mosaic textile: wearable ambient display with non-emissive color-changing modules. In: Proceedings of the international conference on advances in computer entertainment technology (ACE), pp.48-54, 2006

[10] J. K. Soon Teh, A. D. Cheok, R. L. Peiris, Y. Choi, V. Thuong, S. Lai, Huggy Pajama: a mobile parent and child hugging communication system, Proceedings of the 7th international conference on Interaction Design and Children, pp. 250-257, 2008

[11] Philips Electronics N.V. FABRIC DISPLAY. United States Patent, No. US 7,531,230 B2, 2009

[12] K. Kushiyama. Fur-fly. Leonardo, Vol. 42, No. 4, pp. 376-377, 2009

[13] H. Raffle, M.W. Joachim, and J. Tichenor. Super cilia skin: An interactive membrane. In CHI Extended Abstracts on Human Factors in Computing Systems, 2003

[14] M. Coelho, P. Maes. Sprout I/O: A Texturally Rich Interface. Tangible and Embedded Interaction, pp. 221-222, 2008

[15] M. Coelho, P. Maes. Shutters: a permeable surface for environmental control and communication. Tangible and embedded interaction (TEI ’09), pp. 13-18, 2009

[16] A. Ueki, M. Kamata, M. Inakage. Tabby: designing of coexisting entertainment content in everyday life by expanding the design of furniture. in Proc. of the Int. Conf. on Advances in computer entertainment technology, Vol. 203, pp. 72-78, 2007

[17] S. Yohanan, K. E. MacLean. The Haptic Creature Project: Social Human-Robot Interaction through Affective Touch. ACM SIGGRAPH 2007 Emerging Technologies, p. 3, 2007

[18] S. Yohanan, K. E. MacLean. The Haptic Creature Project: Social Human-Robot Interaction through Affective Touch. In Proceedings of the AISB 2008 Symposium on the Reign of Catz & Dogs: The Second AISB Symposium on the Role of Virtual Creatures in a Computerized Society, vol. 1, pp 7-11, 2008

[19] K. Wada, T. Shibata, T. Saito, K. Sakamoto and K. Tanie, Psychological and Social Effects of One Year Robot Assisted Activity on Elderly People at a Health Service Facility for the Aged, Proceedings of the 2005 IEEE, International Conference on Robotics and Automation, 2005

[20] Y. Hashimoto, H. Kajimoto, Emotional touch: a novel interface to display "emotional" tactile information to a palm, ACM SIGGRAPH 2008 New Tech Demos, 2008

[21] A. Toney, L. Dunne, B. H. Thomas, S. P. Ashdown, A Shoulder Pad Insert Vibrotactile Display, Proceedings of the Seventh IEEE International Symposium on Wearable Computers, pp 35-44, 2003

[22] C. Porth, K.J. Gaspard, G. Matfin, Essentials of pathophysiology: Concepts of altered health states, Lippincott Williams & Wilkins, Chapter 60, 2006

[23] J.A. Helgren, Rex cats: everything about purchase, care, nutrition, behavior, and housing, Barrons Educational Series Inc, 2001

[24] F. B. M. de Waal, Reconciliation and consolation among chimpanzees, Behavioral Ecology and Sociobiology, vol. 5, issue 1, pp. 55-66, 1979

[25] J.Y. Bouguet, et al., Pyramidal implementation of the lucas kanade feature tracker description of the algorithm, Intel Corporation, Microprocessor Research Labs, OpenCV Documents, 1999


Evaluating Cross-Sensory Perception of Superimposing Virtual Color onto Real Drink: Toward Realization of Pseudo-Gustatory Displays

Takuji Narumi, Graduate School of Engineering, The University of Tokyo / JSPS, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, [email protected]
Munehiko Sato, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, [email protected]
Tomohiro Tanikawa, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, [email protected]
Michitaka Hirose, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan, 81-3-5841-6369, [email protected]

ABSTRACT
In this research, we aim to realize a gustatory display that enhances our experience of enjoying food. However, generating a sense of taste is very difficult because the human gustatory system is quite complicated and is not yet fully understood. This is so because gustatory sensation is based on chemical signals whereas visual and auditory sensations are based on physical signals. In addition, the brain perceives flavor by combining the senses of gustation, smell, sight, warmth, memory, etc. The aim of our research is to apply the complexity of the gustatory system in order to realize a pseudo-gustatory display that presents flavors by means of visual feedback. This paper reports on the prototype system of such a display that enables us to experience various tastes without changing their chemical composition through the superimposition of virtual color. The fundamental thrust of our experiment is to evaluate the influence of cross-sensory effects by superimposing virtual color onto actual drinks and recording the responses of subjects who drink them. On the basis of experimental results, we concluded that visual feedback sufficiently affects our perception of flavor to justify the construction of pseudo-gustatory displays.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces - Theory and methods.

General Terms
Experimentation, Human Factors.

Keywords
Gustatory Display, Pseudo-Gustation, Cross-sensory Perception.

1. INTRODUCTION Because it has recently become easy to manipulate visual and

auditory information on a computer, many research projects have

used computer-generated virtual reality to study the input and

output of haptic and olfactory information in order to realize more

realistic applications [1]. Few of these studies, however, have dealt

with gustatory information, and there have been rather few display

systems that present gustatory information. One reason for this is

that gustatory sensation is based on chemical signals while visual

and auditory sensation are based on physical signals, which

introduces difficulties to the presentation of a wide variety of

gustatory information.

Moreover, in the human brain's perception of flavor, the sense of

gustation is combined with the sense of smell, sight, warmth,

memory and so on. Because the gustatory system is so complicated,

the realization of a stable and reliable gustatory display is also

difficult.

Our hypothesis is that the complexity of the gustatory system can

be applied to the realization of a pseudo-gustatory display that

presents the desired flavors by means of a cross-modal effect. In a

cross-modal effect, our perception of a sensation through one sense

is changed due to other stimuli that are simultaneously received

through other senses. The McGurk effect [2] is a well-known

example of a cross-modal effect. The visual input from the

articulatory movements of the lips saying “gaga” was dubbed over

by auditory input saying “baba”. Subjects who were asked to report

what they heard reported that they hear “dada”, which shows that

seeing the movement of the lips can interfere with the process of

phoneme identification.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Augmented Human Conference, April 2-3, 2010, Megève, France. Copyright © 2010 ACM 978-1-60558-825-4/10/04...$10.00.

By using this effect, we may induce people to experience different flavors when they taste the same chemical substance. For example, although every can of "Fanta" (produced by the Coca-Cola Company) contains almost the same chemical substances in almost

the same combination, we appreciate the different flavors of orange,

grape, and so on. It is thus conceivable that the color and scent of a

drink have a crucial impact on our interpretation of flavor, which is

not based entirely on the ingredients of the drink.

Therefore, for the realization of a novel gustatory display system,

we need to establish a method that permits people to experience a

variety of flavors not by changing the chemical substances they

ingest, but by changing only the other sensory information that

accompanies these substances.

In this paper, we first introduce the knowledge from conventional

studies about the influence that other senses have on gustation.

Next, based on this knowledge, we propose a method that changes

the flavor that people experience from a drink by controlling the

color of the drink with a LED. We then report the results of an

experiment that investigates how people experience the flavor of a

drink with color superimposed upon it, by using the proposed

method and the same drink colored with dye. Finally, we evaluate

the proposed method by comparing these results and discuss its

validity and future avenues of research.

2. GUSTATORY SENSATION AND CROSS-

SENSORY EFFECT The fundamental tastes are considered the basis of a visual system

for the presentation of various tastes such as RGB. There are

several theories regarding the number of fundamental tastes. The

four fundamental tastes theory includes sweet, salty, sour and

bitter, while the five fundamental tastes theory adds a fifth taste

sensation, umami, to these four tastes. Moreover, a number of

research reports have indicated that "fundamental tastes do not

exist because gustation is a continuum", or "the acceptor sites of

sweetness or bitterness are not located in one place [3]." It can thus

be said of this crucial idea that there is no definition of fundamental

tastes that is accepted by all researchers [4].

In any case, what is commonly called taste signifies a perceptual

experience that involves the integration of various sensations. We

perceive "taste" not just as the simple sensation of gustatory cells

located on our tongue, but rather, as a complex, integrated

perception of our gustatory, olfactory, visual, and thermal

sensations, as well as our sense of texture and hardness, our

memory of other foods, and so on. When we use the common word

flavor, then, we are in fact referring to what is a quite multi-faceted

sensation.

It is therefore difficult to perceive gustatory sensation in isolation

from any other sensation unless we take special training or have a

remarkably developed capacity. This suggests, however, that it is

possible to change the flavor that people experience from foods by

changing the feedback they receive thorough another modality.

While it is difficult to present various tastes through a change in

chemical substances, it is possible to induce people to experience

various flavors without changing the chemical ingredients, but by

changing only the other sensory information that they experience.

The reason for this is that the olfactory sense, above all other

senses, is most closely related to our perception of taste. This

relationship between gustatory and olfactory sensation is

commonly known, as illustrated by our pinching our nostrils when

we eat food that we find displeasing. We also know that without

olfaction, we hardly experience any taste at all. Moreover, as

Prescott explained, the smells we experience through our nose

stimulates gustatory sensation as much as the tasting by our tongue

[5]. Rozin reported that when people were provided with olfactory

stimulation, they said that the sensation evoked in their mouth was

a gustatory sensation, even if the stimulation itself did not evoke

such a sensation [6]. Furthermore, there is a report that 80% of

what we call "tastes" have their roots in olfactory sensation [7].

On the other hand, it is well known that humans have a robust

tendency to rely upon visual information more than other forms of

sensory information under many conditions. As in the

abovementioned study of gustation, many studies have explored the

effect of visual stimuli on our perception of "palatability". Kazuno

examined whether the color of a jelly functioned as a perceptual cue

for our interpretation of its taste [8]. His survey suggests that the

color of food functions as a perceptual cue more strongly than its

taste and smell.

These studies, then, indicate the possibility of changing the flavor

that people experience with foods by changing the color of the

foods. It is not difficult to quickly change the color of a food, and

the three primary colors, which can be blended to create all colors,

are well-known. Thus, if we can change the experience of taste by

changing the color of a food, this is the key to the creation of a

pseudo-gustation display, because it is easy to present visual

information. Our research, therefore, focuses on a technological

application of the influence of colors on gustatory sensation.

3. PSEUDO-GUSTATORY DISPLAY

BASED ON CROSS-SENSORY EFFECT

EVOKED BY SUPERIMPOSITION OF

VIRTUAL COLOR ONTO ACTUAL

DRINKS In this paper, we propose a method that can induce people to

experience various tastes only through the controlled

superimposition of color upon the same drink by means of a LED.

To do this, we invented the Aji (Aji means taste in Japanese) Bag

and Coloring Device (ABDC) (Fig. 1) as a means of changing the

color of a drink without changing its chemical composition. In the

ABCD method, a small plastic bag filled with a liquid to be drunk

Figure 1: Aji Bag and Coloring Device method (ABCD method)


is attached to a straw, and then the plastic bag is put into white-

colored water into which color is superimposed by means of a

wireless LED node.

The Particle Display System [9] proposed by Sato et al. is used as a

coloring device. This system can be installed by distributing

physically separated pixels into a large and complicated

environment. This system consists of hundreds of full-color and

wireless LED nodes. The wireless capability allows each node to be

freely moved without the distance limitation involved when wire

cables are utilized. Users are therefore able to design a uniquely

arranged pattern in full-color in the real world by distributing and

controlling the smart nodes. A wireless LED node (Fig. 2), which

as a pixel of Particle Display System was used as a coloring device

by putting it into the water in a waterproof pack or putting it in a

glass with a lid on it.

This LED node consisted of a wireless communication module

(SNODE2, by Ymatic Ltd.), a full-color LED, and a

microcontroller (PIC). The LED node can be connected to input

devices such as acceleration sensors. It works as an autonomous

processing system that changes color by means of interaction with

users when data from the connected input devices are processed

within it.

Because certain liquids are too clear to diffuse the light, we could

not change the color of the liquid to be drunk by direct exposure to

the LED. We therefore used white-colored water and took a small

plastic bag filled with the liquid to be drunk, attached the bag to a

straw, and then put the bag into the white-colored water. This

white-colored water served as a medium for the diffusion of light

and allowed the appearance of the drink to be changed to arbitrary

colors. Water with coffee cream was used as the white-colored

drink in our implementation because it would be safe if our

subjects happened to ingest it. Our prototype system of a pseudo-

gustatory display using the ABCD method is shown in Figure 3.

4. EVALUATING CROSS-SENSORY

PERCEPTION OF SUPERIMPOSED

VIRTUAL COLOR

4.1 Purpose of Experiment To evaluate the proposed method’s effectiveness at inducing

people to experience various flavors, we performed an experiment

to investigate how people experience flavor in a drink with

superimposed color, by comparing the results of the proposed

method with those in which the drink was colored with dye.

We formulated a middle taste beverage whose taste was midway

between the taste of two commercially sold drinks. We asked the

subjects to drink the middle taste beverage and a middle taste

beverage that had been colored. The purpose of this experiment

was to examine how the subjects would interpret different flavors

when the color was changed, and to examine how they would

interpret a scented drink and a colored drink.

4.2 Experiment Procedure

For the beverage used, we obtained the data for its sugar content

and its ratio of acid to sugar from that given in the Dictionary of

Fresh Fruit Juices and Fruit Beverages for orange juice, apple juice

and grape juice [10]. According to a questionnaire regarding these

three kinds of juices which were given to 23 people, there was no

specific difference between the relative gustatory images of orange

juice and apple juice. For this reason, we chose orange juice and

apple juice as the objects of imitation.

For this experiment, we created a drink whose level of sweetness

and sourness was midway between that of orange juice and apple

juice, which we called "the intermediate drink." We made this

intermediate drink from sucrose and citric acid, using a sugar

content of 12% and a citric acid concentration of 0.43%. Orange

juice, apple juice, and the intermediate drink all had approximately

the same degree of sweetness, though the orange juice was the

sourest. We then prepared three kinds of scented drinks and three

kinds of colored drinks based upon the intermediate drink.

After subjects had drunk the intermediate drink and the three kinds

of colored drinks, they were asked to compare each colored drink

with the intermediate drink and to plot their experience of the taste

of the colored juice on plotting paper. The plotting paper had two

axes: one for sweetness and one for sourness. We defined the origin

of the plotting paper as the taste of the intermediate drink. We

prepared three kinds of colored drinks as objects of comparison: an

imitation apple juice drink, an imitation orange juice drink, and a

drink which had an unfamiliar color. To eliminate any effect of the

order in which the juices were drunk, the order was randomly

assigned by the experimenters. In addition, subjects drank water in

the intervals between their drinking of the experimental drinks.

Figure 2: Wireless LED node.

Figure 3: Prototype System of Pseudo-Gustatory Display.


In addition, subjects were asked to interpret the flavor they

experienced for each drink as they were drinking them. To prevent

subjects from knowing too much about the purpose of the

experiment, they were given their survey form after they had drunk

all the drinks. An experiment using our method with colored drinks

was performed with 19 subjects, and an experiment using drinks

colored with dye was performed with another 19 subjects.

4.3 Pseudo-Gustation with Dyes To evaluate the influence of color on the interpretation of flavor,

three kinds of colored drinks were prepared that used the

intermediate drink with colored water as objects for comparison.

We selected three colors: orange, yellow and green (Fig. 4). The

familiar colors orange and yellow were selected when orange juice

and apple juice were the objects of imitation for the intermediate

drink, and green was selected for one of the drinks because of its

unfamiliar color.

We used dyes for the addition of color to the drinks. Because the

taste, sweetness, and sourness of the intermediate drink could

change when it was mixed with dye, a technique was needed which

would change the appearance of the drinks without the addition of

dyes. To this end, we invented the Aji Bag and the Colored Water

method (ABC method) (Fig. 5). In the ABC method, a small plastic

bag filled with the liquid to be drunk is attached to a straw and put

into a plastic bag inside the water that is colored with dye.

4.4 Pseudo-Gustation with Proposed Method Three kinds of colored drinks that used the intermediate drink with

the ABCD method were prepared as objects of comparison. An

experimenter adjusted the color of the LED so it would resemble

the water colored with dyes in 4.3 and then superimposed that color

onto the white-colored water (Fig. 6).

Table 1: Flavors of the Drink Colored with Dyes Felt by Subjects (the number of answers is given in parentheses)

Yellow: Lemon (8), Pineapple (3), Other (8)
Orange: Orange (15), Other (4)
Green: Unknown (9), Sour (3), Other (8)

Table 2: Flavors of the Colored Drink with the Proposed Method Felt by Subjects (the number of answers is given in parentheses)

Yellow: Lemon (9), Pineapple (3), Other (7)
Orange: Orange (8), Peach (3), Other (8)
Green: Melon (12), Apple (2), Other (5)

Figure 4: Colored Drinks by ABC Method (Orange, Yellow, Green)

Figure 5: Aji Bag and Colored Water method (Left: Outline of method, Center: Aji Bag, Right: Outside of Drink with Bag in)

Figure 6: Colored Drinks by ABCD Method (Orange, Yellow, Green)

Figure 7: Average Scores of Sweetness / Sourness When Subjects Took Colored Drinks with Dyes (Orange, Yellow, Green; Sweetness and Sourness axes)

Figure 8: Average Scores of Sweetness / Sourness When Subjects Took Colored Drinks with Proposed Method (Orange, Yellow, Green; Sweetness and Sourness axes)


4.5 Results
Because the subjects plotted the sweetness and the sourness of the drinks according to their subjective standards, there was no point in using the distances between plane coordinates in our evaluation. We therefore assigned scores based on the direction from the origin to the points that were plotted by subjects. In particular, we assigned a value of +1 to a score for sweetness/sourness when the point plotted by a subject was in a positive direction from the origin on the sweetness/sourness axis, and we assigned a value of -1 when a point was plotted in a negative direction on that same axis. We assigned a value of 0 to a sweetness/sourness score when a point was plotted directly on the axis of sweetness/sourness.
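A small sketch of this sign-based scoring (our illustration; the example coordinates are invented):

    def score(value):
        # +1 for a positive direction from the origin, -1 for negative, 0 on the axis itself.
        return (value > 0) - (value < 0)

    def score_plot(sweetness, sourness):
        return score(sweetness), score(sourness)

    # Example: a point plotted as sweeter and less sour than the intermediate drink.
    print(score_plot(+2.5, -1.0))    # -> (1, -1)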

4.5.1 Colored Drink with Dyes When subjects tasted the drinks colored with dye, they responded

that the colored drink had the same flavor as the intermediate juice

in 5 out of 57 trials (8.8%). The variances of the evaluated scores

for the change in sweetness/sourness were around 0.8 and there

were no significant differences between the groups (Fig. 7). These

are very large variances. In addition, this particular tendency was

not found in the relative relationships among the three plotted

points.

In response to the question "What was the drink that you just had?", 15 subjects replied "Orange" after they drank the orange colored drink. After drinking the yellow colored drink, 8 subjects replied "Lemon" and 3 subjects replied "Pineapple". After drinking the green colored drink, 9 subjects replied "Unknown" and 3 subjects replied "Sour" (Table 1). Many

subjects could not offer a definite answer after tasting the green

drink, whose color is unfamiliar in drinks. On the other hand, many

subjects said that they tasted a specific flavor after tasting the

drinks whose color only imitated that of a well known juice. These

results show that visual feedback is able to evoke a pseudo-

gustatory sensation of flavors, even if it is not able to change one’s

sensitivity to fundamental tastes.

4.5.2 Colored Drink with Proposed Method

On the other hand, when subjects tasted the colored drink with the

proposed method, they responded that it had the same flavor as the

intermediate juice 4 times out of 57 trials (7.2%). The variances of

the evaluated scores for the change in sweetness/sourness were

around 0.7 and there were no significant differences between the

groups (Fig. 8). These, too, are very large variances. In addition,

this specific tendency was not found in the relative relationships

among the three plotted points.

Concerning the question "What was the drink that you just had?,"

after tasting the orange colored drink, 8 subjects replied "Orange"

and 3 subjects replied "Peach". After tasting the yellow colored

drink, 9 subjects replied "Lemon" and 3 subjects replied

"Pineapple". After tasting the green colored drink, 12 subjects

replied "Melon” and 2 subjects replied "Apple" (Table 2). Many

subjects experienced a specific flavor after tasting the drinks

colored with the LED. These results show that the method of

coloration does not have much impact on the quality of the cross-

sensory effect in the interpretation of flavor.

4.6 Discussion

Only a few subjects replied that the colored drinks tasted the same as the intermediate juice; that is, most subjects experienced a changed flavor from the same liquid. This clearly showed that the taste that people experience can be changed by altering the color of what they drink. Also, the method of coloration does not have much

of an impact on the quality of the cross-sensory effect in the

interpretation of flavor. These results confirmed that the proposed

method is able to evoke the cross-sensory effect in the

interpretation of flavor, and that it can be used in the realization of

a pseudo-gustatory display system.

While 79% of the subjects experienced an orange flavor from a

drink that was colored orange with dyes, only 42% of the subjects

experienced an orange flavor from a drink colored orange by means

of a LED. In addition, almost all the subjects could not identify the

flavor of the drink that was colored green with dyes, and 63% of

them experienced the flavor of melon from a drink that was colored

green by means of a LED. This result is attributed to the turbidity

of the liquid. Orange juice is normally turbid, and the drink that

was colored orange with dye was also cloudy. The drink that was

colored orange with a LED, however, had a higher degree of

transparency than the orange juice. Similarly, there is no turbid

green drink in the marketplace, and melon soda (which is popular

in Japan) is a well-known green drink that is clear. Multiple

subjects commented that they thought the drink colored with a LED

was a carbonated drink. The drink colored with the LED tended to

be associated with a carbonated drink that is normally transparent.

It is difficult to mimic a liquid with a low degree of transparency

using white-colored water and a LED. We consider that these

differences in visual appearance led to the differences in the

experimental result.

Because the LED node consumed battery power too quickly, on occasion the color of the drink changed in front of a subject in the middle of the experiment. The LED node can sustain the original color for 40 minutes, but the color of the node changes to red when the battery is low. For this reason, the battery life of the LED node needs to be improved.

5. CONCLUSION

In this research, we propose a novel pseudo-gustatory display that

can induce people to experience the same drink as having a variety

of tastes. This is done without changing the drink’s chemical

composition, but rather through the superimposition of virtual

color onto the drink; this method enhances our experience of

enjoying food. We evaluated the cross-sensory effect on flavor

interpretation that was evoked by our prototype system that

employs visual feedback with a full-color LED. The results of our

experiments show that we cannot change how people experience

fundamental tastes by means of visual feedback. However, the

results also show that visual feedback can influence the manner in

which people interpret the flavors they experience and that the

proposed system works well as a pseudo-gustatory display.

Because the coloring of drinks with dyes is different from coloring

them with a LED in terms of turbidity, the results confirmed that

the proposed method is not good for imitating certain types of

drinks. However, the results also show that the coloring method

does not have a great impact on the quality of the cross-sensory

effect during flavor interpretation. From these results, we concluded

that the proposed method is able to evoke a cross-sensory effect


during flavor interpretation and can be used to realize a pseudo-

gustatory display system.

Future work in this area will include improvement of the battery supply of the LED coloring device and development of a technique that will enable a pseudo-gustatory display to change tastes interactively.

6. ACKNOWLEDGMENT

This work was supported by a Grant-in-Aid for Young Scientists

(A) (21680011).

7. REFERENCES

[1] T. Nakamoto and H. P. D. Minh: “Improvement of olfactory display using solenoid valves,” Proc. IEEE Virtual Reality 2007, pp. 179-186 (2007)

[2] H. McGurk and J. MacDonald, Hearing lips and seeing voices. Nature, 264, 746-748, 1976.

[3] Damak S, Rong M, Yasumatsu K, Kokrashvili Z, Varadajan

V, Zou S, Jiang P, Ninomiya Y, and Margolskee R. Detection

of sweet and umami taste in the absence of taste receptor t1r3.

Science, No. 301, pp. 850-853, 2003.

[4] Sakai N., Saito S., Mikaku-Kyukaku [Gustation & Olfaction],

Section 2, Koza Kankaku Chikaku no Kagaku [Lecture on

Science of Perception], Asakura Publishing, pp. 72-114.

[5] Prescott J., Johnstone V., Francis J.: Odor–Taste Interactions:

Effects of Attentional Strategies during Exposure, Chemical

Senses 29 (2004), pp.331–340.

[6] Rozin P.: “Taste-smell confusion” and the duality of the

olfactory sense. Perception and Psychophysics, Vol. 31

(1982), pp. 397-401.

[7] Ichikawa K., Kankaku Kakuron [the Particulars about

Sensation]: Aji Mikaku, Kan’nou Kensa Seminar Textbook

[Textbook for Sensory Testing of Gustation Seminar], 1960.

[8] KAZUNO C., WATABE E., FUJITA A., MASUO Y.,

Effects of Color on the Taste Sense of Fruit Flavored Jelly,

Bulletin of Jissen Women's University, Faculty of Human Life

Sciences 13413244 Jissen Women's University 2006.

[9] Munehiko SATO, Yasuhiro SUZUKI, Atsushi HIYAMA,

Tomohiro TANIKAWA, Michitaka HIROSE: “Particle

Display System – A Large Scale Display for Public Space – ”

ICAT 2009, Lyon, France, December, 2009

[10] Yoshimura I., et al., The Dictionary of Fresh Fruit Juices and

Fruit Beverages 1, 2. Asakura Publishing (1997), Nihon Kaju

Kyoukai.


The Reading Glove: Designing Interactions for Object-Based Tangible Storytelling

Joshua Tanenbaum, Karen Tanenbaum, Alissa Antle

School of Interactive Arts + Technology

Simon Fraser University

350 - 13450 102 Avenue

Surrey, BC V3T 0A3 Canada

joshuat, ktanenba, [email protected]

ABSTRACT

In this paper we describe a prototype Tangible User Interface (TUI) for interactive storytelling that explores the semantic properties of tangible interactions using the fictional notion of psychometry as inspiration. We propose an extension of Heidegger’s notions of “ready-to-hand” and “present-at-hand”, which allows them to be applied to the narrative and semantic aspects of an interaction. The Reading Glove allows interactors to extract narrative “memories” from a collection of ten objects using natural grasping and holding behaviors via a wearable interface. These memories are presented in the form of recorded audio narration. We discuss the design process and present some early results from an informal pilot study intended to refine these design techniques for future tangible interactive narratives.

Categories and Subject Descriptors

H.5.2 [Information Systems]: User Interfaces – input devices and strategies.

General Terms

Design

Keywords

Interactive Narrative, Tangible User Interfaces, Wearable Computing, Object Stories

1. INTRODUCTION

Abe Sapien picks up a discarded weapon from the wreckage. From across the room, Agent Manning snaps at him “Hey, Fish-Stick! Don’t touch anything!” Abe regards him with bemused tolerance.

“But I need to touch it,” he says, “to see.”

“To see what?”

Abe runs his hand along the blade. “The past, the

future…whatever this object holds.”

-Transcribed and paraphrased from Hellboy [7]

In the 2004 film Hellboy, the character of Abe Sapien possesses the ability to read the “memories” of objects by touching them with his hands. This paranormal ability, known as psychometry or object reading, has numerous occurrences in films, novels, comics, and games. The idea of being able to extract the history and future of everyday objects is a compelling one, with potent narrative implications. Imagine being able to experience the history of a fragment of the Berlin Wall or the spacesuit worn by Neil Armstrong during his first moonwalk. While this notion remains largely relegated to the realm of fiction, tangible user interfaces (TUIs) make it possible to author interactive stories that draw on the idea of psychometry as a metaphorical context for interaction.

In this paper we describe the Reading Glove: a prototype wearable user interface for interacting with Radio Frequency Identification (RFID) tagged objects in a tangible interactive narrative system. The Reading Glove extends the sensory apparatus of the interactor into a realm of meaning and association, simulating the experience of revealing the hidden “memories” of tagged objects by triggering digital events that have been associated with them. An interactor augmented with the Reading Glove need only touch a tagged object in order to experience a narrative tapestry of its past uses.

Previous work combining tangible computing with interactive narrative has emphasized the technical and design challenges of the hardware, while providing relatively little insight into the experience of narrative when mediated by a collection of objects. In this study, we explore the potential of tangible interactions to increase a reader’s awareness of story objects as narratively meaningful. We first consider the relationship between objects and narrative, before discussing the ways in which existing prototype tangible storytelling systems have used objects. The central theoretical construct of our work is the notion of semantically present objects. To explicate this idea we propose a new interpretation of Heidegger’s notions of “present-at-hand” and “ready-to-hand”. We then discuss the design challenges of constructing the Reading Glove system. We close with a discussion of a pilot user study and consider the implications of this work for future tangible storytelling systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Augmented Human Conference, April 2–3, 2010, Megève, France.

Copyright © 2010 ACM 978-1-60558-825-4/10/04…$10.00.


2. OBJECTS AS STORIES

Every object in our lives has a story to tell. The relationship between objects and stories is one with a rich history. People use collections of books, movies, artwork, and other objects to communicate and define their identities and personalities. Kleine et al. write:

Possessions to which there is attachment help narrate a person’s life story; they reflect “my life.” One kind of strong attachment reflects a person’s desirable connections with others. For example, one person’s photographs signify “people who were important to me at one time in my life,” a daughter’s ring portrays her mother’s love, and another person’s piece of furniture reflects his family heritage. Another kind of attachment portrays key aspects of a person’s individuality…In this way, attachments help narrate the development of a person’s life story [11].

People use possessions and personal artifacts to construct personal narratives [10]. Objects also allow people to communicate across social, cultural, and linguistic divides. In sociology there is a notion of boundary objects: artifacts that exist between two different worldviews. Boundary objects are sites of negotiation between opposing perspectives, and allow members of different groups to translate between a familiar view and an alien one [16].

In cultural heritage and museum studies, collections of artifacts are assembled as touchstones for preserving historical knowledge. Personal objects are often used for memory elicitation in the preservation of cultural knowledge. The Australian Migration Heritage Center encourages the aging members of post-war immigrant families to construct personal stories out of their meaningful objects and documents [18]. These “object stories” are part of a broader exploration of movable heritage which they define as “any natural or manufactured object of heritage significance” [18]. By using objects from their lives, participants are able to communicate and preserve personal stories that might otherwise be lost.

Object stories have artistic and entertainment significance as well. Myst [6], one of the most significant early narrative games, revealed its story through meaningful collections of objects and narratively rich environments. Artist and writer Nick Bantock has written several books investigating the narrative implications of collections of esoteric items. In The Museum at Purgatory, Bantock uses unusual objects to conjure an image of a possible afterlife, while in The Egyptian Jukebox he composes assemblages of tantalizing objects as clues to an extended riddle [1-2]. In 2008, Rob Walker and Joshua Glenn started the Significant Objects Project. They hypothesized that investing an object with fictional meaning would increase its material value. To test this theory they purchased inexpensive objects from thrift stores, and invited a group of volunteer writers to compose a piece of fiction for each object. Each object and story was then auctioned off online [20]. In this project the objects and stories existed in a dialogue with each other, with fiction arising from objects and imbuing them with shades of meaning.

In each of these cases, objects are more than simply utilitarian items with a functional purpose. Instead, they are gateways into a web of human associations and meanings. The above

examples indicate the potential of object-based stories to evoke deeply personal narrative associations, in effect triggering unconsciously embedded narrative scripts. Newman argues that humans are predisposed to understand things in terms of narrative [15]. He describes this predilection for narrative in terms of a set of species wide archetypal narrative scripts embedded in the human psyche [15].

It is the objects themselves that are central to the creation of rich narrative meanings in these stories. We contend that any narrative system seeking to use object associations to evoke a story needs to foreground the objects as semantically meaningful. Stories told through objects have the potential to engage senses not ordinarily invoked in traditional storytelling experiences. Touch, taste, and smell are currently underutilized for the telling of stories and their potential as additional channels for narrative information remains unexplored.

3. PREVIOUS WORK

3.1 Other Systems

There have been several attempts to merge research in interactive narrative with research in tangible interaction. One popular approach has been to distribute narrative “lexia” – modular fragments of a larger story or stories – across a series of tangible objects. Holmquist et al. describe an object-based tangible storytelling system in which readers used a barcode scanner to retrieve video clips in a narrative puzzle [9]. This system only had five short video clips: two associated with specific objects from the story, and three associated with generic tokens. The authors claim that the goal of the interaction was to heighten the user’s sense of involvement in the story, but indicate that the small number of story fragments was a severely limiting factor.

Mazalek et al. created a tangible narrative system called genieBottles in which readers open glass bottles to “release” trapped storytellers (genies) which reveal fragments of narrative information [14]. As with the work of Holmquist et al., the authors stated that the goal of the research was to allow computer stories to bridge the gap from the digital into the physical environment. However, physical interaction was limited to opening and closing the tops of three glass bottles and it is unclear what role, if any, these served in the story beyond being containers for the narrators.

Both of these systems reduce their objects to the role of generic event triggers. In some contexts, the use of more generic tokens allows the reader to imagine her own story within the system. Budd and Madej designed PageCraft, a tangible narrative system in which children created animated digital stories using RFID tagged blocks on a physical game board [4, 12]. In their prototype, the tangible objects took a generic form in order to prevent their design from interfering with the creative process of the children using them. The system allowed children and parents to tell their own stories using the physical tokens to “record” the narrative into a digital animated sequence.

Mazalek et al. made a similar design decision when creating the graspable “pawns” for their Tangible Viewpoints project. They write “the abstract manner in which these figures evoke the human form allows them to take on different character roles given different sets of story content” [13]. In the Tangible Viewpoints project, these abstracted pawns were used to access


different character perspectives in a multi-viewpoint story. Each pawn represented a specific character, which would be surrounded by projected segments of associated narrative information. Interactors could access this information through the use of a small “lens-like” tangible object. In both PageCraft and Tangible Viewpoints, the objects themselves were designed to be abstract representations of the system’s digital information.

In other tangible narrative systems, the relationship between the physical interactive items and their associated digital representations is less clear. The RENATI project places the bulk of the physical representation into a large “statue”. The interactor stands in front of the statue while experiencing video clips associated with three different colored RFID tags [5]. Interaction with RENATI involves placing specific tags on an RFID reader (embedded in a clear acrylic hand) when prompted by the system. If the interactor selects the wrong tag, then the system presents a montage of conflicting perspectives on the story. In this case, the interaction is limited to deciding to obey the system or not, and is accomplished by essentially pushing a button.

These prototypes all focus on the mapping of tangible object to system outcome, which tends to emphasize the system function of the object rather than the narrative meaning of the object. In each of these examples, the link between the narrative information and the tangible objects is primarily utilitarian. Whether by design, or by designer oversight, the objects in these prototypes are functional first, aesthetic second, and semantic a distant third (or not at all). It appears that the objects in these prototypes really just function as physical buttons, activating narrative information that is often only loosely connected to the objects themselves.

We contend that one of the unique affordances of an object-based tangible narrative is the ability to emphasize each object as a site for embodied narrative meaning. In each of the examples above, the objects are gateways to meaning, rather than loci of meaning. This is in part due to the limitations of the technology employed in their creation and in part due to a failure to frame the interactions with the objects in a way that emphasized their physicality or their specific role within the narrative.

3.2 Theoretical Background

In this paper we propose a new approach to tangible object-based narratives that more closely couples the meaning of the object with the meaning of the story. This involves rendering the tangible objects semantically present. To understand what we mean by this, it is necessary to look at some of the theoretical and philosophical underpinnings of tool use and tangible interaction.

In Where the Action Is, Paul Dourish discusses Heidegger’s notions of ready-to-hand and present-at-hand [8]. Dourish interprets the notion of present-at-hand to refer to situations in which tools “breakdown”, suddenly becoming the focus of our attention. He contrasts this against the notion of ready-to-hand, wherein tools disappear from our perceptions and serve as invisible extensions of ourselves. The canonical example of these ideas is of a carpenter using a hammer. As long as the interaction is proceeding smoothly the hammer is considered

ready-to-hand, seamlessly augmenting the ability of the carpenter to perform the task. However, should the carpenter slip and miss the nail or hit his thumb, the hammer “surfaces” and becomes present-at-hand: an awkward tool which is not performing properly and thus becomes the object of its user’s attention.

To put this in a different context, it is possible to productively map Heidegger’s notions onto Bolter and Grusin’s concepts of transparent immediacy and hypermediation [3]. In their writing, interactions with mediated experiences exist in a state of immediacy, unless something happens to jolt the viewer into an awareness of the mediated nature of the experience, which they term hypermediation. Therefore, immediacy is a form of being ready-to-hand while hypermediation is akin to present-at-hand. This oscillation between two binary levels of awareness is sufficient for understanding functional tools, and for understanding passively mediated interactions, but tangible interactions – particularly those in which the tangible interface is a site of meaning – do not fit cleanly into this model.

We contend that it is necessary to reexamine these notions when attempting to understand the workings of tangible and embodied interfaces. In particular, we think that these notions do not account for the ways in which objects exist at an intersection of potential meanings. The two states described represent functional extremes: either invisibly functioning or presently malfunctioning. We think that there is a third, related mode of interacting with objects that is differentiated along semantic lines instead of functional lines. For the sake of discussion, let us call this notion “present-at-mind”.

This idea of present-at-mind encompasses the ways in which we slip between different associative awarenesses while interacting with an object or tool. We argue that this notion of present-at-mind may be used to describe any situation in which an awareness of the tool as a locus of meaning occurs.

Thus, from a first-person perspective, I can use a hammer to drive nails and as long as I do not slip or hit myself it will remain invisibly ready-to-hand. But what if I become aware of the wear of the hammer’s grip, which in turn puts me in mind of my father, to whom the hammer once belonged? What if this calls my attention to a place where he carved his initials in the handle? The hammer has not broken down as a functional tool, but is no longer an invisible extension of my hand. It has shifted into a state of being present-at-mind, due to a web of associative entanglements in which it exists, rather than to a breakdown of functionality. These entanglements are unique to this particular tool: a different hammer would not evoke the same reaction. In this case the hammer is not just a stand-in for any hammer or an extension of the body, but instead a specific hammer with a specific story to tell.

This awareness does not exist in isolation from the other two Heideggerian conditions. Certain types of breakdown can trigger this awareness: the roughness of the hammer grip wearing against the palm is sufficient to interrupt the flow of the work, but once that interruption occurs, the mind is free to explore a range of awareness and associations surrounding the tool. In this case, we would suggest that one of the roles of breakdown is as a possible gateway into a present-at-mind awareness that extends beyond the moment of breakdown.


In TUI research, one of the canonical properties of tangibles is a meaningful coupling of physical and digital representations [19]. In this case, the binary notions of ready-to-hand and present-at-hand become problematic as the operation of the tangible object as an interface device often involves paying attention specifically to the object. The incorporation of a third semantic vector allows this model to account for the relationship between physical and digital representations in a tangible interface. When the tangible is present-at-mind, it exists in the mind of the reader as a meaningful physical representation; however, as an interface device it remains ready-to-hand as a functional physical stand-in for its associated digital representations.

4. DESIGN PROCESS

In order to explore these theoretical ideas within a design space, we developed the Reading Glove. The intent of this system was to create an interactive object-based narrative and an interface that leveraged natural exploratory behaviors. These behaviors support the present-at-mind awareness of the relationship between the objects and their associated narrative information.

4.1 Selecting the Objects

We had several high-level design goals for the narrative. One of our central critiques of previous object-based narrative systems is a broad tendency toward using generic objects with few intrinsic narrative associations of their own. To address this, we resolved to write a narrative that existed in both a textual form and within a specific collection of meaningful objects. We set out to write a story that required the objects themselves in order for it to be complete; a story that could not be communicated purely through language. We thus chose to begin with the objects themselves, in order to help ground the writing within what would ultimately be the medium of its communication.

Figure 1. The 12 Narrative Objects

We had some rough criteria for object selection:

• Objects should invite touch. This might mean pleasing material textures or complex objects that could not be apprehended without physical handling.

• Objects should be mechanically interactive. We favored objects with moving parts wherever possible, or objects that opened and closed.

• Objects should fit together as a collection. We looked for objects with similar color schemes, and for objects that could conceivably come from the same place and time.

• Objects should support a wide range of uses, associations, and imaginings. This was a largely subjective criterion, but we wanted objects that could conceivably tell an abundance of stories.

• Objects should appear to have a history to them. For this reason, we looked for older items, with evidence of a lifetime of use.

After several weeks of collecting and assembling, we settled on a set of 12 objects (see Figure 1). These included (top to bottom and left to right) an antique camera, an antique telegraph key, a pair of silver goblets, a top hat, a leather mask, a coffee grinder, antique goggles, a wrought metal rose, a glass vase on a metal stand, a ceramic bottle, an antique scale, and a bookend with a globe on it.

4.2 Authoring the Narrative

The full narrative creation process using these objects is a subject for another paper, currently submitted, but here we provide a brief overview. With the objects selected, we explored the different possible narrative uses for each of them and categorized these narrative possibilities into loose themes. Next, we constructed a sequence of events that could be told entirely through object associations within one of these themes. Knowing the events and objects that would comprise the narrative, we sat down and wrote out the background and setting for a central character and narrative situation around which this story would revolve.

For each object’s occurrence in the plot, we wrote a short piece of narration centered on that object. These narrative “lexia”, when strung together, form a single short story, told through objects. Four of these objects had only a single occurrence in the storyline, while six of them occurred twice, for a total of sixteen different narrative lexia. These were all written in a first person past tense narration, and were recorded as sixteen separate audio files. These varied in duration with the shortest running 17 seconds and the longest lasting 38 seconds. The entire narration was 7 minutes long. In order to help the reader isolate each narrative lexia from the others, a distinctive chime was placed at the beginning of each sound file.

We wanted the story to make sense regardless of the order in which participants engaged the objects. We resolved to write a story about a spy who is betrayed by his own agency for political reasons and has to flee for his life. By structuring the plot as a puzzle which is being pieced together by the central character/narrator, we were able to reflect the fragmentary nature of the interaction within the form of the story. Like a puzzle, we designed each narrative lexia with “conceptual


hyperlinks” that served as subtle guides to unraveling the mystery. Thus, when a reader selects the camera, she learns about a roll of film which was hidden inside a coffee grinder. Each lexia also includes a direct reference to its associated object.

4.3 Designing the Technology

Psychometry, when it occurs in fiction, often requires that an object be held or touched in order to reveal its “memories”. We wanted to simulate this “hands-on” interaction with our system. As with the narrative design, we established several high level goals for the creation of this system:

• Interactors needed to be free to move around unencumbered by cables or other technology.

• Interactors needed to be able to use both of their hands freely, without the need for additional overt interactive “tools” or other interface devices.

• The interaction needed to encourage participants to physically handle the objects in the narrative, without interfering with the experience of the objects.

A glove-based wearable interface had the potential to address most of these goals, provided it could be made unobtrusive enough to prevent it from interfering with the tactile experience of the objects. After investigating several different sensing technologies, we settled on Radio Frequency Identification (RFID), which would allow us to tag each object individually and discretely. In order to read the information on these tags we designed and built a portable RFID reader which could be embedded in a soft fabric glove. The Reading Glove hardware is comprised of an Arduino Lilypad microcontroller, an Innovations ID-12 RFID reader, and an Xbee Series 2 wireless radio. These components, along with a power supply, are built into a fingerless glove (see Figure 2).

Figure 2. The Reading Glove (large image), and components (top row, left to right): Arduino Lilypad Microcontroller,

Xbee Wireless Radio, Innovations ID12 RFID Reader

The RFID reader is located in a small pocket on the palm of the glove, while the remaining components are secured within a pouch on the back of the hand. The glove has an adjustable

wrist strap and no fingers, which allows it to fit most hands comfortably. Figure 3 shows how these components are connected to each other.

Figure 3. Circuit Diagram for the Reading Glove hardware

The glove wirelessly transmits RFID tag information to a laptop computer running Max MSP, a programming environment which allows for easy prototyping of audio and video interactivity (see Figure 4). The signal is routed to a state switch in Max which triggers the playback of any associated media assigned to each tag.

Figure 4. Reading Glove Program in Max MSP
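As an illustrative sketch of this data path only (the authors' system routes the tag IDs in Max MSP), the Python fragment below shows how tag IDs arriving over the XBee serial link could be mapped to associated audio. It uses the pyserial package; the serial port name, the tag IDs, the file names, and the play_audio() helper are placeholders, not values from the actual system.

import serial  # pyserial

# Hypothetical tag IDs and audio file names, for illustration only.
TAG_TO_AUDIO = {
    "4F00A1B2C3": "camera_1.wav",
    "4F00D4E5F6": "coffee_grinder_1.wav",
}

def play_audio(path):
    # Stand-in for triggering playback of the media assigned to a tag.
    print("playing", path)

def listen(port_name="/dev/ttyUSB0"):
    # The XBee radio appears to the laptop as a serial port; the ID-12 reader
    # reports each detected tag as a short ASCII identifier.
    with serial.Serial(port_name, 9600, timeout=1) as port:
        while True:
            tag_id = port.readline().strip().decode("ascii", errors="ignore")
            if tag_id in TAG_TO_AUDIO:
                play_audio(TAG_TO_AUDIO[tag_id])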

As the hardware reached completion we needed to make some decisions about the interaction logic of the system. The RFID reader transmitted a tag’s ID every time it detected it, which


meant that an interactor holding an object or turning it over in her hands could generate multiple activations from the same tag. We felt that re-triggering the audio every time the tag was detected would frustrate the interactor, and ultimately discourage physical play with the objects. However, the audio clips required between 17 and 38 seconds to listen to, which meant that a simple delay between activations was not a satisfactory solution. A delay had the potential to make the system feel unresponsive, or non-functional. To solve this problem we chose to “lock out” any given tag after the initial detection event, rendering it inert until a new tag was triggered. This meant that if an interactor wanted to interact with an object multiple times, he would need to switch to a second object, and then return to the first.

For objects with multiple lexia, we were faced with the dilemma of how much authorial control we wanted to exert over the reader’s experience of the different fragments. If we configured the system to play these in chronological order we would be structuring the way in which the story was presented, at least at an intra-object level. We were concerned that doing this would discourage interactors from exploratory interactions with the objects by quickly revealing the limitations of the available options. We made the decision to instead have the associated lexia presented at random, knowing that this was not a perfect solution. The random triggering of the lexia on an object meant that it was much more likely that an interactor would miss a fragment of the story; however it rewarded sustained interaction and exploration.
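The interaction logic just described (ignore repeated reads of the currently active tag, unlock it only when a different tag is read, and choose one of an object's lexia at random) can be summarized in a short sketch. This is our reconstruction of the behavior described above, not the authors' Max patch, and the tag IDs and file names are hypothetical.

import random

class LexiaSelector:
    """Reconstruction of the lock-out and random-selection policy described above."""

    def __init__(self, tag_to_lexia):
        self.tag_to_lexia = tag_to_lexia  # tag ID -> list of audio lexia
        self.active_tag = None            # the tag currently "locked out"

    def on_tag_read(self, tag_id):
        # Repeated reads of the active tag are ignored, so holding an object
        # or turning it over does not restart its audio.
        if tag_id == self.active_tag or tag_id not in self.tag_to_lexia:
            return None
        self.active_tag = tag_id          # reading a new tag unlocks the previous one
        # Objects with multiple lexia present one of them at random.
        return random.choice(self.tag_to_lexia[tag_id])

# Hypothetical tags: the camera has two lexia, the rose has one.
selector = LexiaSelector({
    "TAG_CAMERA": ["camera_1.wav", "camera_2.wav"],
    "TAG_ROSE": ["rose_1.wav"],
})
print(selector.on_tag_read("TAG_CAMERA"))  # one of the camera lexia
print(selector.on_tag_read("TAG_CAMERA"))  # None: the camera is now locked out
print(selector.on_tag_read("TAG_ROSE"))    # rose_1.wav, and the camera unlocks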

One final design challenge was discovered during our initial testing of the finished Reading Glove. We had initially set out to make the tags on the objects as unobtrusive as possible, in order to avoid interactions with the tags as “buttons” instead of with the objects themselves. This meant finding creative ways to disguise the tags on the objects without interfering with their ability to be read by the glove. Unfortunately, it quickly became evident that this was going to be impossible. The passive RFID tags work through principles of induction: when the electromagnetic field generated by the reader is intercepted by the antenna on the tag it induces a small current in the tag, which is enough to power a tiny transceiver attached to a tiny piece of memory containing the tag’s identification code. The effective range of this system is ordinarily a few inches, however, when the tag is placed in proximity to a metal object this range drops substantially or disappears entirely, depending on the metal. During the object selection phase we were unaware of this constraint and so 4 of the 10 objects used in the story were comprised of enough metal to render any tags in direct contact with them inoperative. This forced us to abandon our initial goal of disguising the tags entirely.

Instead, for the four problem objects – the metal rose, the antique camera, the silver goblets, and the telegraph key – we located the RFID tags on paper tags, wrapped in brown duct tape to blend in with the color-scheme of the objects. The remaining objects were tagged directly, using the same brown duct tape as a visual indicator of the tag’s presence. One participant remarked that the paper tags made the objects feel like “artifacts from a museum collection”. However, this meant that each tag had a clear visual indicator of its presence on an object.

4.4 Testing the Technology

We have not yet performed a formal study of this work, but we have run a set of informal user trials, intended to interrogate some of the above design decisions in preparation for a more extensive study.

Participants, selected from the graduate student population, were asked to interact with the Reading Glove story for as long as they liked. Each participant was given the same set of instructions, including information about the functioning of the glove. Each session was videotaped for future review and analysis. A short video of the studies may be viewed online [17]. Two of the seven participants did not speak English as their first language, which we were concerned would problematize their experience of the audio narration; however, only one of these participants experienced any difficulty with the story, which we discuss below.

We structured this study to focus on several questions intended to explore the functioning of the objects as semantically meaningful artifacts and the operation of the glove as a natural interface:

1. Could participants successfully piece together and recount the basic story?

2. Could participants map specific objects to specific narrative information and themes?

3. Was there a correlation between time spent engaged with the objects and the comprehension of the narrative?

4. Did the glove-based interaction qualitatively change how interactors approached the objects compared to a non-wearable version?

To test the first three questions, we asked participants to re-tell the story to us, and asked targeted questions about specific objects. To test the fourth question we split the participant group in half randomly. One group interacted with the objects while wearing the glove and the other group was instructed to leave the glove palm-up on the table and scan the objects over it. Due to time limitations, only seven participants were able to complete the study, with four wearing the glove and three scanning objects over a stationary glove. With such a small study population we cannot draw generalizable conclusions; however the anecdotal evidence and critiques from the participants provided valuable insight into certain aspects of the Reading Glove’s design.

One concern with this study was that the population from which the participants were drawn was not wholly representative of the general public. Participants in this study were “tech-savvy” graduate level researchers, many of whom had a direct interest in games, narrative, tangibility, and interaction. In our experience, graduate students interacting with research prototypes tend to get caught up in trying to second-guess the technology. Given that our goals for this study were to critique the design of the prototype this was not necessarily a drawback in this case. The pilot study ended up bearing a close resemblance to a process of expert review. At this stage in the design, we believe that this is a suitable and valid mechanism for critiquing the work.

Our biggest concern with this first prototype was that participants would allow the novelty of the interaction to


distract them from the narrative content. The system as designed is meant to be read rather than played with, and we worried that participants would grow impatient with the length of the audio files, or that the oral nature of the story would prove inaccessible to participants accustomed to visual and textual narratives. We were pleasantly surprised when six of the seven participants took the time to thoroughly “read” the story. Unsurprisingly, there seemed to be a direct correlation between time spent engaged with the prototype and overall narrative comprehension, across both conditions. Table 1 shows the time each participant spent interacting with the system before deciding to stop reading.

Of the seven participants, six were able to successfully recount the central details of the story. Only Participant 4 was unable to reconstruct the sequence of events when asked to. To a certain extent this was likely due to language comprehension issues, as Participant 4 was not entirely confident in her English language abilities. This might also account for her taking less time to interact with the system than the other participants, who all had a greater mastery of the language.

Table 1. Participants' Reading Time

Condition          Participant     Time Spent Reading
Wearing Glove      Participant 1   12 min 26 sec
                   Participant 2   10 min 46 sec
                   Participant 3   12 min 12 sec
                   Participant 4   7 min 3 sec
Not Wearing Glove  Participant 5   12 min 58 sec
                   Participant 6   11 min 59 sec
                   Participant 7   12 min 53 sec

When asked to describe the role of specific objects in the story or specific object themes, all participants were able to make meaningful connections, regardless of which group they belonged to.

4.4.1.1 Touching & Triggering

Interestingly, for at least one of the participants the glove-based interaction interfered with her ability to engage with the objects to the extent that she desired. When asked what she liked about the interaction, Participant 1 said “I like that I could touch things…I love touching things! When I go to a museum I suffer because I can’t touch things.” This excitement over touching the objects interfered at first with her ability to access the narrative information, because she would pick up an object and trigger an event, and then would set the object down and want to play with other objects while listening to the first event. Unfortunately, picking up new objects triggered new events, interrupting the previous lexia before she had finished listening to it. She expressed frustration over the pacing of the system saying “Even though I was able to touch I couldn’t really touch them as I wanted…I can touch, but I have to wait so it was really slow when I had to wait, and I wanted to keep touching things and inspect them, but I wasn’t able to fully finish inspecting them until I was finished hearing [the audio triggered by the initial touch].”

Although she was not happy with the ways in which the glove limited her exploration of the objects, after the first time that she inadvertently triggered an event, she learned to only handle objects when she wanted to learn more about their role in the narrative. This raises a design question: had she been able to interact freely with any object without triggering responses, would she have been able to maintain a coherent mapping of which lexia were related to which object? We discussed how the interaction could be changed to better satisfy her expectations, ultimately concluding that, had she been in the second group that was not wearing the glove, she would have had a more enjoyable interaction.

We can apply the terminology introduced above to this situation to gain a better understanding of what was going on. When Participant 1 first picked up an object and received the audio feedback the object was ready-to-hand, or transparently immediate. In this situation, the object operated as the instantiating point for the narrative event. When she set this object down, however, and picked up a new object, the associated narrative event interrupted this immediacy, creating a moment of breakdown where she was forced to grapple with the objects as interactive instruments, rendering them present-at-hand or hypermediated. In order to correct for this unwanted behavior, she was forced to re-engage with the first object, and to stay engaged with it while experiencing the associated lexia. This creates conditions that foster a present-at-mind experience of the object, by encouraging the interactor to linger on details of the object that might otherwise be passed over.

4.4.1.2 Memories & Objects

Most of the participants commented that they enjoyed the way in which the story fit together like a puzzle, and many of them commented on the ways in which the objects served as external referents for the story content. Participant 2 remarked that “it was interesting how I could tie specific memories to specific objects.”

Participant 3 said “I really like the fact that in addition to the audio you have these, sort-of touchstones, so like you can go back and listen to that part of the story, you have like…a visual. Just like in real life if you’re remembering something, like if you’re looking around your room and you see…‘I remember getting that statue at GenCon’ or something. So having that visual touchstone as a memory holder I think is a cool thing.” Participant 7 also enjoyed the objects, and also remarked on his general enjoyment of non-linear narrative. In these cases we see evidence of the participants engaging the objects at a semantic level, which we frame as present-at-mind.

This non-linearity presented far fewer problems than we had initially anticipated. Participant 2, for example, never listened to several important pieces of the story. However, when asked to recount the chain of events he was able to fill in the gaps in the story based on his understanding of the lexia on either side of the missed pieces. Aside from Participant 4, Participant 6 had the most difficulty constructing a picture of the narrative. When asked about his experience he said that he was considering each narrative lexia as an isolated “allegory”, and that he felt the overall message was “too subtle” for him to grasp. This may have been in part due to the path that he took through the objects, although further analysis of each


interactor’s “navigation” of the story is needed before this can be fully understood.

In observations of the relationships between the participants and the objects across the two groups, it was clear that the group wearing the glove spent much more time handling the objects, playing with them, and generally engaging with their physicality. The three participants in the second group all exhibited the same interaction pattern. They would pick up an object, scan it over the glove, and then set it back down on the table while they listened to the associated audio clip. We do not have enough data to conclude whether or not this had a measurable impact on the participants’ narrative comprehension, however. This initial study suggests that the glove based interaction may well afford a richer experience of the tangible objects.

5. CONCLUSIONS & FUTURE WORK

The initial testing of the Reading Glove indicates that it has the ability to communicate a rich and detailed non-linear narrative experience that is largely grounded in physical artifacts. More time needs to be spent with the video data of the pilot study before any further work can be done on this project; however, an obvious next step is a more formal controlled experiment. In particular, it would be interesting to compare a version of the story with the objects against a version using generic tokens.

Our observations of the initial round of interactions have suggested possible quantitative measures which may be used to triangulate both the observations of the interactors and the analysis of the interview data. In particular, we think it will be very interesting to combine coded video data with system logs in order to get a clear picture of how long each participant is interacting with each object, and in what order the participants are encountering the narrative lexia.

We would also like to put this system in the hands of a less tech-savvy population. These initial studies helped us to learn where the system broke down, what things interactors found confusing, and what information should be provided to the participants before beginning. We intend to use the knowledge gleaned from this study to construct a more formal protocol to further investigate this system.

In this paper we have presented a new wearable interface for tangible interactive storytelling, inspired by the paranormal notion of psychometry. Psychometry represents an extension of the human sensory system into an external realm of meaning and association. Our system augments the semantic perceptions of the interactor, revealing a stratum of memory encoded in a collection of compelling objects.

One goal of this system was to author an object-based story where the objects were loci of narrative meaning. In order to understand this, we proposed an extension of the Heideggerian notions of present-at-hand and ready-to-hand, which have been used in HCI to understand the ways in which tools are more or less “visible” at a functional level. We argue that in order to understand tangible interfaces at a narrative level it is necessary to consider a third vector: present-at-mind. In order to explore a semantically present tangible interface in greater detail, we designed the Reading Glove system, which uses a new authoring methodology to couple story events and associations

with physical artifacts. The iterative design process of this system demonstrates an integrative approach to tangible storytelling, and the initial success of the prototype indicates the value of this method. We believe that for tangible storytelling there needs to be a close relationship between the content of the system and the design of the interaction and tangibility. In order to accomplish this, the design process needs to be able to address both of these concerns in dialogue with each other.

Our initial testing of the Reading Glove, via an informal expert review process, indicates that it is possible to communicate a rich narrative experience along audio, visual, and tactile modalities. The pleasure which our interactors displayed in their interactions with the Reading Glove is encouraging, as was the ease with which they adapted to the wearable interface. We believe that by designing systems to be present-at-mind it is possible to author richly meaningful interactive experiences.

6. ACKNOWLEDGMENTS

We would like to thank Aaron Levisohn for his support of this project via the IAT 884 Tangible Computing course. We would also like to thank Greg Corness and Andrew Hawryshkewich for their invaluable assistance in the development of the hardware and software for this project. Finally, we want to acknowledge the excellent work of photographer Beth Tanenbaum, seen in Figure 1.

7. REFERENCES

[1] Bantock, N. The Egyptian Jukebox: A Conundrum. Viking Adult, 1993.

[2] Bantock, N. The Museum at Purgatory. Harper Perennial, 2001.

[3] Bolter, J. D. and Grusin, R. Immediacy, Hypermediacy, and Remediation. The MIT Press, Cambridge, Mass, USA, 1999.

[4] Budd, J., Madej, K., Stephens-Wells, J., de Jong, J. and Mulligan, L. PageCraft: Learning in context: A tangible interactive storytelling platform to support early narrative development for young children. In Proceedings of the IDC'07 (Aalborg, Denmark, June 6-8, 2007). ACM Press, 2007.

[5] Chenzira, A., Chen, Y. and Mazalek, A. RENATI: Recontextualizing narratives for tangible interfaces. In Proceedings of the Tangible and Embedded Interaction (TEI'08) (Bonn, Germany, 2008). ACM Press, 2008.

[6] Cyan Worlds Myst. Broderbund, 1993.

[7] del Toro, G. Hellboy. Sony Pictures Entertainment (SPE), USA, 2004.

[8] Dourish, P. Where the Action Is: The Foundations of Embodied Interaction. MIT Press, Cambridge, 2001.

[9] Holmquist, L. E., Helander, M. and Dixon, S. Every object tells a story: Physical interfaces for digital storytelling. In Proceedings of the NordiCHI2000 (2000), 2000.

[10] Hoskins, J. Biographical Objects: How Things Tell the Stories of People's Lives. Routledge, New York, 1998.

[11] Kleine, S. S., Kleine, R. E. and Allen, C. T. How is a possession "me" or "not me"? Characterizing types and an


antecedent of material possession attachment. Journal of Consumer Research, 22, 3 (1995), 327.

[12] Madej, K. Characteristics of Early Narrative Experience: Connecting Print and Digital Game. PhD Thesis, Simon Fraser University, Surrey, BC, 2007.

[13] Mazalek, A., Davenport, G. and Ishii, H. Tangible viewpoints: A physical approach to multimedia stories. In Proceedings of the Multimedia (Juan-les-Pins, France, 2002). ACM Press, 2002.

[14] Mazalek, A., Wood, A. and Ishii, H. genieBottles: An interactive narrative in bottles. In Proceedings of the ACM SIGGRAPH Conference (August 12-17, 2001). ACM Press, 2001.

[15] Newman, K. The case for the narrative brain. In Proceedings of the Second Australasian Conference on Interactive Entertainment (Sydney, Australia, 2005). Creativity & Cognition Studios Press, 2005.

[16] Star, S. L. and Griesemer, J. R. Institutional ecology, 'translations' and boundary objects: Amateurs and professionals

in Berkeley's Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19, 3 (1989), 387-420.

[17] Tanenbaum, J. Handy Transparency: Unobtrusive Interfaces for Distributed Object-Based Tangible Interactions. http://www.youtube.com/watch?v=xUiBgPgvTNU, accessed on December 6, 2010

[18] Thompson, S. Writing Object Stories. http://www.migrationheritage.nsw.gov.au/objects-through-time/documenting/writing-object-stories/, accessed on December 06, 2009

[19] Ullmer, B. and Ishii, H. Emerging Frameworks for Tangible User Interfaces. J. M. Carroll (ed.), Human-Computer Interaction in the New Millennium, 2001, 579-601.

[20] Walker, R. and Glenn, J. About the Significant Objects Project. http://significantobjects.com/about/, accessed on December 06, 2009


Control of Augmented Reality Information Volume by Glabellar Fader

Hiromi Nakamura and Homei Miyashita

Meiji University

1-1-1, Higashimita, Tama-ku, Kawasaki City, Kanagawa 214-8571

+81-44-934-7171

ce97409,[email protected]

ABSTRACT

In this paper, we propose a device for controlling the volume of augmented reality information by glabellar movement. Our purpose is to avoid increasing the total amount of information during the perception of "Real Space + Augmented Reality" by means of intuitive and seamless control. For this, we focused on the movement of the glabella (between the eyebrows) when the user stares at objects as a trigger of information presentation. The system detects the movement of the eyebrows from the amount of light reflected by a photo-reflector, and controls the information volume or the transparency of objects in augmented reality space.

Categories and Subject Descriptors

B.4.2 [Input/Output Devices]: Channels and controllers; H.3.3

[Information Search and Retrieval]: Information filtering; H.5.1

[Multimedia Information Systems]: Artificial, augmented, and

virtual realities;

General Terms

Human Factors

Keywords

glabellar, photo reflector, information volume

1. INTRODUCTION

The term AR refers both to the technique of superimposing virtual

information on real-world visual information and to the resulting

visual state. In the realm of mixed reality (MR) formed by this

combination of real and virtual information, the volume of

information presented is inherently greater than in the ordinary

visual realm, and this has raised concerns that the dramatic

increase in information volume may impede awareness of the real-

world environment.

This has led us to the concept of an information volume fader,

analogous to sound volume faders in music production mixers, for

fading in and out the information volume in the AR presentation.

In this concept, a “reality mixer” enables seamless fade-in and

fade-out of AR information superposed on a real-world

presentation. It may also enable smooth crossfade and linking of

different virtual worlds. The operating interface is hands-free,

enabling control while both hands are being used for other

purposes. Both arbitrary and intentional control of presented

information is desirable, for natural operation.

We have applied this concept to development of the “FAR

Vision” (“Fader of Augmented Reality Vision”) interface for

controlling the volume of the AR information added to

presentations. The fader operation is controlled by glabellar

movement[1]. Experimental trials have been performed to verify

the accuracy of glabellar movement detection and optimize

detection points, as two prerequisites for intuitive, seamless

control of the presentation.

2. RELATED WORK One information-adding system which is currently distributed as

an iPhone application program is the “Sekai Camera” [2]. It

enables on-screen perusal of added information, referred to as an

“Air Tag”, relating to the geographical location of the camera the

user is holding. Users can also upload information to the system.

Since its introduction, however, the Air Tags in some locations

have quickly become so voluminous that they impede real-space

observation. Restriction of consumer-generated media (CGM) is

undesirable, and yet its voluminous on-screen addition can make

it difficult to observe not only the real-space view but also the

added information itself, and may ultimately cause an aversion to

using the application.

A method of added-information display discrimination or

information volume control is necessary, to avoid this problem. In

one type of discrimination, only information of interest to the user

is added. The Sekai Camera system performs this type of

discrimination based on the direction in which the iPhone is

pointed. This method is effective in concept, but has been found

lacking in intuitive feel and on-demand response. As it requires

holding and pointing, moreover, it is rather unsuitable for use

during work and other activities involving use of the hands. Head-

mounted displays (HMDs) can provide a continuous “hands-free”

view of added information while worn, but in their present

configuration require temporary removal to turn off the added-

information display.



The Sekai Camera system provides one type of added-information

volume control named "Air Filter", in the form of a slider that

filters out certain information based on dates, distances, and other

parameters but includes no capability for dynamic information-

depth control.

Various studies have been reported on the control of added-

information “volume” in AR. In one such study, Tenmoku et al.

discussed information volume control and performed on-screen

highlighting in accordance with distance[3]. The methods

described in this study, however, are not applicable to systems,

such as Sekai Camera, in which the added information is

concentrated within a specific domain, and means of turning the

information addition off and on are not discussed.

3. SYSTEM In the system proposed herein, the depth and permeability of the

AR information is controlled in a stepwise manner by the fader,

which is operated by changing the glabellar inclination. In the

absence of any applied inclination, or “brow knitting”, mapping is

performed for added-information exclusion. When the user

“peers”, the accompanying change in glabellar inclination results

in a “paranormal” effect, in which the AR comes into view. The

glabellar movement, unlike that of the iris, can be either

intentional or unconscious. The name of the system, “FAR

Vision”, was accordingly chosen to connote both “far vision” and

“Fader of AR Vision”, and thus convey its mixed-reality (MR)

effect, a mixing of real-world and AR-world imagery.

A photo-reflector (Rohm RPR-220) is used to detect the glabellar

movement and thus exert control of the fader, and may be

attached to spectacles, an HMD, or other such devices. The skin

surface of the glabella is illuminated by infrared light (IR) and

changes in the reflected IR intensity are monitored with an IR

sensor, for non-contact detection of glabellar movement.

An analogous device has been reported for detection of temporal

movement[4], but is inherently limited to conscious operation by

the nature of temple movement. The glabella-based detection, in

contrast, enables operation based on unconscious emotional

response as well as conscious intention.

Fig. 1. Glabellar fader on spectacles and on HMD.
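To make the fader control described in this section concrete, the following minimal sketch (written in Java as a host-side illustration; the class name, calibration values, and the simple linear mapping are our own assumptions and not the authors' implementation) shows one plausible way to turn a raw photo-reflector reading into an information-volume fader level that could drive the transparency of AR objects.

public class GlabellarFader {
    private final int relaxedValue;  // sensor reading with the eyebrows relaxed (assumed calibration)
    private final int knitValue;     // sensor reading with the eyebrows fully knit (assumed calibration)

    public GlabellarFader(int relaxedValue, int knitValue) {
        this.relaxedValue = relaxedValue;
        this.knitValue = knitValue;
    }

    // Maps a raw photo-reflector reading to a fader level in [0, 1]:
    // 0 = AR information excluded, 1 = full AR information shown.
    public double faderLevel(int rawReading) {
        double level = (double) (rawReading - relaxedValue) / (knitValue - relaxedValue);
        return Math.max(0.0, Math.min(1.0, level)); // clamp to the valid range
    }

    // Example use: treat the fader level as the alpha (opacity) of the AR overlay.
    public float overlayAlpha(int rawReading) {
        return (float) faderLevel(rawReading);
    }
}

In such a sketch the relaxed and knit calibration values would be measured per user, in the spirit of the per-subject calibration explored in the experiment below.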

4. EXPERIMENT For the proposed system it is essential to establish the detection

accuracy and detection target position, to heighten the level of

intuitive, seamless operation of the glabellar fader. We therefore

investigated the “width”, as defined below, between fader output

values at multiple detection points on the glabella of seven

participants.

For each subject, the measurements were performed at ten points

in 2 mm intervals in one direction along a horizontal baseline

extending between the inner tips of the eyebrows, beginning with

the center of the baseline as 0 mm. With the device held stationary

by hand, three measurement sets were performed at each

measurement point for each subject, with each set consisting of

measurement with the eyebrows relaxed, then fully knit, and then

again relaxed. The maximum and minimum output values in each

set were selected, and from among these, the largest and smallest

values were taken as the “largest maximum output” and “smallest

minimum output”, respectively. The difference between these two

output values was defined as the “output width” for that subject.
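For clarity, the following small sketch (our own illustration, not the authors' analysis code) shows how the "output width" defined above could be computed from the three measurement sets recorded at one detection point.

import java.util.List;

public class OutputWidth {
    // Each set holds the fader output values recorded while the eyebrows
    // go relaxed -> fully knit -> relaxed at a single detection point.
    public static int outputWidth(List<int[]> measurementSets) {
        int largestMaximum = Integer.MIN_VALUE;
        int smallestMinimum = Integer.MAX_VALUE;
        for (int[] set : measurementSets) {
            int max = Integer.MIN_VALUE;
            int min = Integer.MAX_VALUE;
            for (int v : set) {              // per-set maximum and minimum output
                max = Math.max(max, v);
                min = Math.min(min, v);
            }
            largestMaximum = Math.max(largestMaximum, max);    // "largest maximum output"
            smallestMinimum = Math.min(smallestMinimum, min);  // "smallest minimum output"
        }
        return largestMaximum - smallestMinimum;               // the "output width"
    }
}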

5. RESULTS AND DISCUSSION

Table 1. Experimental results

Subject | A | B | C | D | E | F | G
(a) | 14 | 16 | 14 | 16 | 14 | 16 | 18
(b) | 49 | 89 | 32 | 32 | 71 | 26 | 80
(c) | 18 | 30 | 13 | 26 | 23 | 10 | 29
(d) | 2.72 | 2.97 | 2.46 | 1.23 | 3.09 | 2.60 | 2.76

(a) point of maximum output width, in mm from the 0-mm centerline; (b) output width at point (a); (c) output width at the centerline; (d) ratio between (b) and (c)

As shown in Table 1, for the seven participants in this experiment,

the output width was generally largest in the region between 14

and 18 mm from the centerline. Some differences were found

among the subjects in relation to distance from the centerline and

maximum output width, but for most of the subjects the output

width was uniformly small throughout the region from near the

centerline up to the point just before the 14- to 18-mm region

where it tended to exhibit the maximum values, as shown for

Subject A in Fig. 2.

Fig. 2. Glabellar output widths found for Subject A.

These results may be attributed to a characteristic feature of

glabellar skin movement. When the brow is knit, the central

region tends to protrude and a crease forms near the inner tip of

each eyebrow. The skin movement is far more pronounced in the

regions near the eyebrow tips than in the central region,

presumably resulting in a greater change in reflection intensity

and thus in a greater output width in those regions.

Variation was also found in the number of distinguishable output

steps that could be detected in the output width, but as shown in Table 1, the number was at least 26 and ranged up to 89, and was

generally at least twice as large as the number of detectable steps

in the central region. In terms of information volume percentage,

one step corresponds to 1.15% to 4% of the total information volume.

This is considered quite sufficient for control of information

volume as envisioned for the proposed system, even though the


requirements will naturally vary with the image and display

conditions. It should therefore be possible to obtain comparatively

seamless information volume control, using detection points just

to the side of the eyebrow tip, 14 to 18 mm from the glabellar

centerline.

6. CURRENT AND FUTURE OUTLOOK Current plans call for experimental evaluations directed toward improving operating ease, mapping each individual's subjective degree of eyebrow knitting to the displayed content, and enabling more intuitive operation. Development efforts are also envisioned for other interfaces and systems centering on information volume control.

7. REFERENCES [1] Hiromi Nakamura, Homei Miyashita. A Glabellar Interface for Presentation of Augmented Reality, Proceedings of Entertainment Computing 2009 (in Japanese), pp. 187-188, 2009.

[2] Sekai Camera: http://support.sekaicamera.com/en

[3] Ryuhei Tenmoku, Masayuki Kanbara, N. Yokoya. Intuitive

annotation of user-viewed objects for wearable AR systems,

Proceedings of IEEE, International Symposium on Wearable

Computers’05, pp. 200-201, 2005.

[4] Kazuhiro Taniguchi, Atsushi Nishikawa, Seiichiro Kawanishi, Fumio Miyazaki. "KOMEKAMI switch: A novel wearable input device using movement of temple," Journal of Robotics and Mechatronics, Vol. 20, No. 2, pp. 260-272, 2008.


Towards Mobile/Wearable Device Electrosmog Reduction through Careful Network Selection

Jean-Marc Seigneur

University of Geneva 7, route de Drize

Carouge, Switzerland

[email protected]

Xavier Titi

University of Geneva 7, route de Drize

Carouge, Switzerland

[email protected]

Tewfiq El Maliki HES-SO Geneva 4, rue de la Prairie

Genève, Switzerland

[email protected]

ABSTRACT

There is some concern regarding the effect of smart phones and

other wearable devices using wireless communication and worn

by the users very closely to their body. In this paper, we propose a

new network switching selection model and its algorithms that

minimize the non-ionizing radiation of these devices during use.

We validate the model and its algorithms with a proof-of-concept

implementation on the Android platform.

Categories and Subject Descriptors

C.1.2 [Network Architecture and Design]: Wireless

Communication. H.1.2 [User/Machine Systems]: Human

Factors. K.4.1 [Public Policy Issues]: Human Safety. K.6.2

[Installation Management]: Performance and usage

measurement.

General Terms

Algorithms, Management, Measurement, Performance, Human

Factors

Keywords

electrosmog, wireless hand-over.

1. INTRODUCTION There are more and more wireless products carried by users, from widely used mobile phones to more specific devices such as cardio belts and watches that monitor heart rates whilst

practicing sport. These devices use different wireless technologies

to communicate between each other and their Internet remote

servers, for example, to store the sport session data. Those devices

bring interesting benefits to the users. However, there is some rising concern about the effect of the non-ionizing

electromagnetic radiations of the wireless devices on the user’s

health. Those electromagnetic radiation exposures are generally

coined “electrosmog”. Non-ionizing radiations mean that they do

not carry enough energy per quantum to remove an electron from

an atom or molecule.

Section 2 presents the related work. Section 3 surveys the main

different wireless technologies used by mobile devices from the

point of view of their electromagnetic radiated emission. In

Section 4, we propose our networks switching selection model

and algorithms to minimize exposures to electrosmog and we

validate them via discussing a proof-of-concept implementation.

Section 5 concludes the paper.

2. RELATED WORK The potential harmful effects of electrosmog have been

researched on many occasions and there are still doubts regarding these effects beyond the transformation of electromagnetic energy into thermal energy in tissues. However, even a sceptical recent survey [1] underlines that the precautionary principle, meaning that efforts should be made to minimize exposure, should be followed,

especially for teenagers.

One of the first means to reduce exposure, besides stopping using

it or using it only when needed and in good conditions (close to

the base station...), is to use a mobile phone with low Specific

Absorption Rate (SAR). However, as the SAR indicated on the

mobile phones is measured at their full power strength, some

phones with higher SAR may better manage their power strength

and end up emitting less than phones with lower SAR that emit

more often at full power even if it is not needed. In the USA, the

FCC has set a SAR limit of 1.6 W/kg, averaged over a volume of

1 gram of tissue in the head and in any 6 minute period. In

Europe, the ICNIRP limit is 2 W/kg, averaged over a volume of

10 grams of tissue in the head and in any 6 minute period.

Interestingly, the iPhone user manual underlines that it may give a

higher SAR than the regulation if used in direct contact with the

body: “for body-worn operation, iPhone’s SAR measurement may

exceed the FCC exposure guidelines if positioned less than 15

mm (5/8 inch) from the body” [2].

As it is less common to stay very close to a mobile phone mast for

a long time, working on reducing the phone emission that is

carried all day long very close to the human body should have

more effect for most users. However, Crainic et al. [3] have

investigated parallel cooperative meta-heuristics to reduce

exposure to electromagnetic fields generated by mobile phones

antennas at planning time whilst still meeting coverage and

service quality constraints. It is different than our approach that



focuses on reducing electromagnetic fields generated by the

users’ devices at use time.

Algorithms and systems to seamlessly switch between the

available networks to remain always best connected are being

researched [4-7]. They mainly focus on quality of service (QoS)

for decision-making. In this paper, we add another dimension to

the network selection issue and underline that electrosmog

exposure should also be taken into account.

3. WIRELESS TECHNOLOGIES SURVEY There are different wireless technologies that can be used for

communication by the devices carried out by the users. In this

section, we survey the main ones including those that are less

well-known by the general public but are still important regarding

the increased number of wearable communicating sport/health

devices products.

3.1 WI-FI The most widespread wireless network technology on portable

computer devices is Wi-Fi. Wi-Fi is unlicensed. There are

different types of Wi-Fi networks, for example, Wi-Fi IEEE

802.11b and 802.11g use the 2.4 GHz band and 802.11a uses the

5 GHz band. 5 GHz signals are absorbed more readily by solid

objects in their path due to their smaller wavelength and for the

same power they propagate less far than 2.4 GHz signals. The

average Wi-Fi range is between 30m and 100m. Mobile computer

devices that integrate Wi-Fi are not made to seamlessly switch

between nearby available Wi-Fi networks. There are a number of

security issues with Wi-Fi: WEP encryption has been broken for a

while; WPA encryption creates separate channels for each user

but most public Wi-Fi access points only ask for authentication

and do not encrypt afterwards... Although privacy is beyond the

scope of the paper, it is important to note that for privacy’s sake

and the sensitive aspects of some communicated information such

as heart rate profiles, a secure network should be considered.

WiMAX is different from Wi-Fi: it is more dedicated to long-range systems covering kilometers and is rarely integrated in mobile

devices for now. The peak power of Wi-Fi 802.11b/g is 100 mW

and 802.11a is 1 W. Kang and Gandhi [8] found that near-field exposure to a 100 mW Wi-Fi patch antenna radiating from a laptop computer placed 10 mm below a planar phantom yields 2.82 W/kg 1g SAR and 1.61 W/kg 10g SAR at 2.45 GHz, and 1.64 W/kg 1g SAR and 0.53 W/kg 10g SAR at 5.25 GHz. A French

organization study found that all Wi-Fi 2.4 GHz cards studied are

under 2W/kg 10g SAR limit from 0.017 to 0.192 W/kg [9] at less

than 12.5 cm.

3.2 GSM, UMTS/GPRS, 3G… Regarding mobile phones, although more and more smartphones

integrate Wi-Fi, their most widespread wireless network

technology remains the one provided by their telecom operator:

GSM (around 900 MHz or 1800 MHz; maximum distance to cell

tower from 1 km to 20 km [10]), GPRS, EDGE, UMTS 3G

(around 2GHz; from 144 kB/s in moving vehicles to more than 2

MB/s for stationary users [11]; maximum distance to cell tower

from 0.5 km to 5 km [10])… The telecom operators have paid for licenses to be able to use them. Encryption is applied separately for each user within a cell. Mobile phones switch seamlessly

between GSM/3G cells and more and more mobile phones

integrate now Wi-Fi. However, only a few phones and telecom

providers allow the users to start a phone call with Wi-Fi and

switch seamlessly to GSM/3G when the user leaves the Wi-Fi

zone. It is also difficult for users to switch to other networks than

the ones provided by their telecom provider. From an energy

consumption point of view, according to Balasubramanian et al.

[12], for 10 kB data size, GSM consumes around 3.25 times less

than 3G and 1.5 times less than Wi-Fi (if the cost of scan and

transfer is taken into account). However, for 500 kB+ data size,

GSM consumes as much as 3G and twice as much as Wi-Fi (even

if the cost of scan and transfer is taken into account). 3G

consumes around 1.9 times more than Wi-Fi (if the cost of scan

and transfer is taken into account). It is worth noting that the

energy spent for GSM/3G networks can vary a lot depending on

the distance between the user and the network antenna as the

distance can be quite long compared to Wi-Fi: for example for

GSM 900, a phone power output may be reduced by 1000 times if

it is close to the base station with good signal. The peak handset

power limit for GSM 900 is 2 Watts and 1 Watt for GSM 1800.

The peak power of 3G UMTS is 125 mW at 1.9 GHz. It has been

found that in rural area, the highest power level for GSM was

used about 50% of the time, while the lowest power was used

only 3% of the time. The corresponding numbers for the city area

were approximately 25% and 22%. The results showed that high

mobile phone output power is more frequent in rural areas

whereas the other factors (length of call, moving or stationary,

indoor or outdoor) were of less importance [13]. Factors that may

influence the power control are the distance between hand set and

base station and attenuation of the signal, the length of calls (the

phone transmits on the maximum allowed power level at the onset

of each call), and change of connecting base station, ‘‘hand-over’’

(the phone will temporarily increase output power when

connecting to a new base station). Hand-overs will be made when

the mobile phone is moved from one cell covered by one base

station to another cell, but may also occur on demand from the

base station owing to a high load on the network at busy hours.

The iPhone 3G user guide indicates that 10g SAR is 0.235 W/kg

for GSM 900, 0.780 W/kg for GSM 1800, 0.878 for UMTS 2100;

and 0.371 for Wi-Fi. It was worse for the iPhone with 1.388 W/kg

1g SAR for UMTS 1900 and 0.779 W/kg 1g SAR for Wi-Fi.

Combined with the fact that its user guide mentioned that it might

be higher at closer than 1.5 cm and that both 3G and Wi-Fi may

be enabled at the same time, it means that the iPhone can have a

1g SAR much higher than the 1.6 W/kg limit: above 1.388 +

0.779 = 2.167 W/kg.

3.3 Bluetooth, Zigbee, ANT… Bluetooth, based on IEEE 802.15.1, also uses the 2.4 GHz band

with a data rate of around 1 MB/s. A large number of mobile

phones integrate Bluetooth. Discovery and association of

Bluetooth devices are not designed to be seamless. Bluetooth

2.1+ pairing uses a form of public key cryptography and is less

prone to Wi-Fi types of attacks [14]. The peak power of Bluetooth

ranges from 1 mW to 2.5 mW. The normal range of Bluetooth is

around 10m, which is lower than Wi-Fi. With lower distances,

Bluetooth has lower consumption than Wi-Fi: around 3 to 5 times less according to Su Min et al. [15]. However, for resource

constrained wearable devices such as heart belts and cardio/GPS

watches, Bluetooth is still consuming too much energy. It is the

reason that a new Bluetooth specification called “Bluetooth low

energy” has been released recently and would consume between

1% to 50% of normal Bluetooth depending on the application

[16]. “Bluetooth low energy” is more seamless: it can support


connection setup and data transfer as low as 6ms. “Bluetooth low

energy” can use AES-128 encryption. As Bluetooth consumed too

much energy for resource constrained devices, other networking

technologies have been used. Zigbee based on IEEE 802.15.4-

2003 runs at 868 MHz in Europe, 915 MHz in the USA and

Australia, and 2.4 GHz in most other places. Zigbee consumes

around 10 to 14 times less than Wi-Fi according to Su Min et al.

[15]. The downside of Zigbee is that it has a much lower data rate

from 20 kB/s to 250 kB/s. Another main network technology that

was used in many sport/health monitoring devices is ANT, which

is proprietary. ANT and Zigbee can send data in less than 10ms.

However, ANT can send bigger files faster as its transmission is 1

MB/s, which means lower energy to submit large files than

Zigbee [17]. Fyfe reports even lower energy consumption for

ANT compared to Zigbee for small data size (8 bytes) [18].

Anyway, Zigbee and ANT are not available on mobile phones.

“Bluetooth low energy” seems a good candidate to replace ANT

and Zigbee due to its openness and number of products already

using Bluetooth. Martínez-Búrdalo et al. [19] have found that

Bluetooth generates a very low 10g SAR of around 0.037 W/kg. Unfortunately, none of these networking technologies is considered a main connectivity technology, perhaps because of their limited range.

3.4 Comparison Summary

Table 1. Networking Technologies Comparison

Network Technology | Frequency (GHz) | Energy Consumption* (Wi-Fi reference) | 10g SAR, 3G iPhone (W/kg) | Seamless Mobility# | Data Rate (MB/s) | Maximum Distance (m) | Openness# | Security#
Wi-Fi | [2.4; 5] | 1 | 0.371 | L | [11; 54] | [30; 150] | H | L
GSM | [0.9; 1.8] | [0.67; 2] | [0.235; 0.78] | H | 0.0096 | [1000; 20000] | L | M
3G | 2 | [1.8; 2] | 0.878 | H | [0.144; 2] | [500; 5000] | L | M
Bluetooth | 2.4 | 0.25 | ~0.037 | L | [1; 3] | 10 | H | M
Zigbee | [0.8; 2.4] | 0.085 | n/a | M | [0.02; 0.25] | 100 | M | M
ANT | 2.4 | 0.017 | n/a | M | 1 | 30 | L | M
Bluetooth Low Energy | 2.4 | [0.01; 0.125] | n/a | M | 1 | 10 | H | M

*: The energy consumption comparison is roughly derived from the results and information given in the references cited in this paper.
#: L: Low; M: Medium; H: High / ~: estimated based on [19]
n/a: not available

4. CAREFUL NETWORK SELECTION Our goal is to minimise the exposure of the mobile user to

electromagnetic radiations while still allowing the users to benefit

from the communication of their devices with enough quality of

service.

Based on the networking technologies that we have surveyed in

the previous sections, the exposure can be significantly reduced

by choosing among the different networking technologies

available. On recent mobile phones, there are 4 main choices:

GSM, 2G, 3G and Wi-Fi. However, it may be cumbersome for the

user to learn which networking technology to choose depending

on what they are doing with their phone and to constantly

manually switch from one network to another. Fortunately, recent

mobile phone operating systems such as Android provide an

Application Programming Interface (API) that allows third-party

applications to switch from one networking technology to

another. In this section, we first describe our network selection

model and its algorithms and then we explain how we have

validated our approach with a proof-of-concept application

implemented on an Android phone.

4.1 Network Switching Selection Model and

Algorithms We define Ni as network i among a set of n available networks, with i in [1; n]. Each network Ni is associated with a 10g SAR in W/kg, defined as SARi, for the specific device carried by the user. The related

work surveyed above has underlined that different mobile devices

have different SARs.

In this case, the optimal policy to minimise the electromagnetic

radiation from the mobile device is to select the network with the

lower SAR. The algorithm in pseudo-code is:

Nchosen = N1

for (int i=1; i<=n; i++)

if (SARi < SARchosen) Nchosen = Ni

That type of policy works well for voice call activities since the

duration time depends on the length of the conversation.

However, for other activities that can be carried out faster with

faster networking technologies, such as data exchange (file

download, health data monitoring transmission...), the data rate

of the network should be taken into account. We define the data

rate of Ni as DRi in MB/s.

In this case of data exchange activities, the optimal policy to

minimize electromagnetic radiation is different. The file size of

the data to be exchanged is the same for all networks. If we define

the time of exposure with Ni as Ti and FS the file size of the data,

we have:

Ti = FS / DRi

If we define the exposure during that data exchange with Ni as Ei,

for the optimal policy that would choose Ni, we want Ei <= Ej for

all j different than i in [1; n].

Ei = SARi * Ti

Ei <= Ej

SARi * Ti <= SARj * Tj

SARi * (FS / DRi) <= SARj * (FS / DRj)

SARi <= (DRi / DRj) * SARj

The corresponding pseudo-code is then:

Nchosen = N1

for (int i=1; i<=n; i++)

if (SARi < ((DRi / DRchosen) * SARchosen)) Nchosen = Ni



The related work surveyed in the previous sections has underlined

that some networking technologies may have much lower SARs

than the maximum SAR measured and reported in their

specification depending on the context of use. For example, for

GSM 900, a phone power output may be reduced by 1000 times if

it is close to the base station with good signal. Let us define this

attenuation depending on the context and each networking

technology as Ai for Ni. We assume that the activity happens in

the same context from its start to its end. For example, the user is

not moving during the activity, hence the distance between the

mobile device and the base station does not change.

In this case, the pseudo-code for “voice call” activity optimal

policy becomes:

Nchosen = N1

for (int i=1; i<=n; i++)

if (SARi < ((Achosen/Ai) * SARchosen)) Nchosen = Ni

The pseudo-code for “data exchange” activity optimal policy

becomes:

Nchosen = N1

for (int i=1; i<=n; i++)

if (SARi < ((DRi / DRchosen) * (Achosen/Ai) * SARchosen)) Nchosen = Ni

If seamless hand-over between networking technologies were possible, i.e., if the activity were not stopped when the current network becomes unavailable and the next network must be used, and if the user moved during the activity, our selection algorithm would be carried out again at the time of each new hand-over.
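To make the two policies concrete, the following self-contained Java sketch implements both the "voice call" policy (minimize the attenuated SAR) and the "data exchange" policy (minimize the attenuated SAR weighted by transfer time, i.e., SARi * Ai * FS / DRi, where FS cancels out). The class structure, field names, and example values are our own illustrative assumptions, not part of the proof-of-concept prototype described in the next section.

import java.util.List;

public class NetworkSelector {

    // A candidate network with its 10g SAR (W/kg), data rate (MB/s)
    // and context-dependent attenuation factor Ai.
    public static class Network {
        final String name;
        final double sar;          // SARi
        final double dataRate;     // DRi in MB/s
        final double attenuation;  // Ai (1.0 = worst case, e.g. 0.01 close to the base station)

        Network(String name, double sar, double dataRate, double attenuation) {
            this.name = name;
            this.sar = sar;
            this.dataRate = dataRate;
            this.attenuation = attenuation;
        }
    }

    // Voice call: exposure grows with conversation length, so minimize the attenuated SAR.
    public static Network selectForVoiceCall(List<Network> networks) {
        Network chosen = networks.get(0);
        for (Network n : networks) {
            if (n.sar * n.attenuation < chosen.sar * chosen.attenuation) chosen = n;
        }
        return chosen;
    }

    // Data exchange: exposure Ei = SARi * Ai * (FS / DRi); the file size FS is the same
    // for all networks, so it is enough to compare SARi * Ai / DRi.
    public static Network selectForDataExchange(List<Network> networks) {
        Network chosen = networks.get(0);
        for (Network n : networks) {
            double cost = n.sar * n.attenuation / n.dataRate;
            double chosenCost = chosen.sar * chosen.attenuation / chosen.dataRate;
            if (cost < chosenCost) chosen = n;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Example values loosely based on Table 1 (10g SAR for the 3G iPhone) and the fixed
        // data rates used in the proof-of-concept; the attenuation factors are illustrative.
        List<Network> networks = List.of(
                new Network("Wi-Fi", 0.371, 11.0, 1.0),
                new Network("3G", 0.878, 0.5, 0.01),
                new Network("GSM", 0.235, 0.0096, 0.01));
        System.out.println("Voice call: " + selectForVoiceCall(networks).name);
        System.out.println("Data exchange: " + selectForDataExchange(networks).name);
    }
}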

4.2 Proof-of-Concept Validation of the Model

and its Algorithms In order to validate our previous algorithms, we have investigated

how to implement them in an Android Google phone application.

As SARs differ from phone to phone, our application must first be configured with the phone's SAR values, which phone vendors must provide by law (at least in the US and in Europe) with the specifications of their phones. This is done manually for the proof-of-concept, but it could be automated by fetching that information from remote servers publishing phones' SAR values, because the phone's model can be obtained programmatically via the Android API and used to fetch the right values.

Then, our application asks the user which activity is going to be

carried out: “voice call” or “data exchange”. Depending on that

next activity, the right policy is chosen, i.e., “voice call” policy or

“data exchange” policy. The next user activity may be

automatically inferred and when a new activity is detected the

networking selection could be automatically triggered but this is

beyond the scope of the proof-of-concept prototype.

There are four main networking technologies that can be chosen

on current smart phones: 3G, 2G, GSM and Wi-Fi. The Android

API facilitates getting information about the different networks

available in proximity:

http://developer.android.com/intl/fr/reference/android/net/ConnectivityManager.html#getAllNetworkInfo()

http://developer.android.com/intl/fr/reference/android/net/wifi/WifiManager.html#getScanResults()

Few phones and telecom providers allow the users to make phone

calls directly through Wi-Fi. So we assume that Wi-Fi is not

possible in our proof-of-concept prototype for “voice call”

activity. In addition, the Android API does not have an API to

force switching to one or another of the available GSM, 2G or 3G

networks. So in the “voice call” activity case, we can only display

a message to the users explaining that they should manually select

a GSM network or, if that is not possible, enable the “Use Only 2G Networks” setting.

Concerning the “data exchange” activity, we use the data rate in

MB/s returned by the following method:

http://developer.android.com/intl/fr/reference/android/net/wifi/WifiInfo.html#getLinkSpeed()

As no Android API returns the data rate of GSM, 2G or 3G

networks, we use a 3G fixed data rate of 0.5 MB/s and of 0.0096

MB/s for GSM. Future versions of the API may provide more

information about the type of network found and we could use

finer-grained data rates from this information, for example, it may

return a dynamically inferred 2G data rate.

If the outcome of the algorithm suggests using one of the Wi-Fi

networks, the Android API facilitates programmatically switching

to this Wi-Fi network thanks to the following methods:

http://developer.android.com/intl/fr/reference/android/net/wifi/WifiManager.html#enableNetwork(int, boolean)

If the suggestion is to use GSM, 2G or 3G, as there is no API to

switch to these networks, a message can be displayed to the user

who can manually select a GSM network, set the “Use Only 2G

Networks” settings or select the potential 3G network.

Concerning the attenuation factor depending on the context, we

mainly rely on the distance to the network antenna. Wi-Fi does

not change its transmitting power and we assume it can transmit

as long as it is listed in the Wi-Fi networks discovered by the

Android WifiManager. We could refine that assumption by only

selecting Wi-Fi networks with a Received Signal Strength

Indication (RSSI) above a threshold by using the following

method:

http://developer.android.com/intl/fr/reference/android/net/wifi/WifiInfo.html#getRssi()

As the distances to the GSM, 2G and 3G antennas have a

significant impact on the “real” SAR, we have used the signal

strength returned by the Android API:

http://developer.android.com/reference/android/telephony/NeighboringCellInfo.html#getRssi()

If the signal strength returned is good, which corresponds to the

returned value 31, i.e., -51 dBm or greater, we use a 0.01

attenuation factor for the network SAR. Another approach may be

to use the GPS locations of the user as given by the mobile phone

GPS and of the antenna as given by third-parties information.

Further work would also be required to fine tune the SAR

diminution depending on the distance and the networking

technology.
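As a small illustration of the attenuation handling above, the sketch below maps the cellular signal-strength value returned by the Android API (0-31, where 31 means -51 dBm or greater) to an attenuation factor. Only the endpoint (31 -> 0.01) comes from our description above; the treatment of intermediate values and the exponential interpolation are assumptions added purely for illustration.

public final class AttenuationEstimator {

    private AttenuationEstimator() {}

    // Returns the attenuation factor Ai applied to the network's nominal SAR.
    public static double cellularAttenuation(int signalStrength) {
        if (signalStrength >= 31) return 0.01;   // -51 dBm or better: very close to the base station
        if (signalStrength <= 0) return 1.0;     // unknown or very weak signal: assume full-power SAR
        // Assumed exponential interpolation between 1.0 (value 0) and 0.01 (value 31).
        double fraction = signalStrength / 31.0;
        return Math.pow(10.0, -2.0 * fraction);
    }

    public static void main(String[] args) {
        System.out.println(cellularAttenuation(31)); // 0.01
        System.out.println(cellularAttenuation(15)); // roughly 0.11 (illustrative)
    }
}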

5. CONCLUSION It is still not certain whether the average level of electrosmog experienced by users is harmful. However, as the

precautionary principle is to minimize exposure and the major

source remains the electromagnetic radiations emitted by the


user’s mobile devices, it is worthwhile trying to minimize their

radiations.

With new smartphones, it is possible to switch to one or another

of the networking technologies available. We have used this

possibility to select the networking technologies depending on

their characteristics and the context (user’s next activity, distance

between the user and the base stations...) in order to minimize

these radiations.

There are still a few functionalities to fine-tune and automate but

such a network selection switching approach should be considered

by the manufacturers of mobile communicating devices worn

close to the human body if they want to be able to say that they have taken the precautionary principle into account.

6. ACKNOWLEDGMENTS This work is sponsored by the European Union, which funds the

FP7-ICT-2007-2-224024 PERIMETER project [5].

7. REFERENCES [1] J. Vanderstraeten, "Champs et ondes GSM et santé:

revue actualisée de la littérature ". Bruxelles, BELGIQUE:

Association des médecins anciens étudiants de l'Université libre

de Bruxelles, 2009.

[2] Apple. (2009, accessed on 04/12/2009). Guide

d'informations iPhone 3GS. Available:

http://manuals.info.apple.com/fr_FR/iPhone_3GS_informations_i

mportantes_F.pdf

[3] T. G. Crainic, B. Di Chiara, M. Nonato and L.,

"Tackling electrosmog in completely configured 3G networks by

parallel cooperative meta-heuristics," Wireless Communications,

IEEE, vol. 13, pp. 34-41, 2006.

[4] Q. Song and J. Abbas, "Network selection in an

integrated wireless LAN and UMTS environment using

mathematical modeling and computing techniques," Wireless

Communications, IEEE, vol. 12, pp. 42-48, 2005.

[5] PERIMETER. (accessed on 04/12/2009). Available:

http://www.ict-perimeter.eu

[6] H. Liu, C. Maciocco, V. Kesavan and A. L. Y. Low,

"IEEE 802.21 Assisted Seamless and Energy Efficient Hand-

overs in Mixed Networks," ed, 2009, pp. 27-42.

[7] M. Kassar, B. Kervella and G. Pujolle, "An overview of

vertical hand-over decision strategies in heterogeneous wireless

networks," Computer Communications, vol. 31, pp. 2607-2620,

2008.

[8] K. Gang and O. P. Gandhi, "Effect of dielectric

properties on the peak 1-and 10-g SAR for 802.11 a/b/g

frequencies 2.45 and 5.15 to 5.85 GHz," Electromagnetic

Compatibility, IEEE Transactions on, vol. 46, pp. 268-274, 2004.

[9] Supélec, "Etude « RLAN et Champs

électromagnétiques » : synthèse des études conduites par

Supélec," 2006.

[10] G. Maile, "Impact of UMTS," in Conference on Mobile

Networks & the Environment, 2000.

[11] Wikipedia. (accessed on 04/12/2009). 3G. Available:

http://en.wikipedia.org/wiki/3G

[12] N. Balasubramanian, A. Balasubramanian and

A.Venkataramani, "Energy consumption in mobile phones: a

measurement study and implications for network applications,"

presented at the Proceedings of the 9th ACM SIGCOMM

conference on Internet measurement conference, Chicago,

Illinois, USA, 2009.

[13] L. Hillert, A. Ahlbom, D. Neasham, M. Feychting, L.

Jarup, R. Navin and P. Elliott, "Call-related factors influencing

output power from mobile phones," J Expos Sci Environ

Epidemiol, vol. 16, pp. 507-514, 2006.

[14] T. G. Xydis and S. Blake-Wilson, "Security

Comparison: BluetoothTM Communications vs. 802.11," ed:

Bluetooth Security Experts Group, 2002.

[15] S. M. Kim, J. W. Chong, B. H. Jung, M. S. Kang and D.

K. Sung, "Energy-Aware Communication Module Selection

through ZigBee Paging for Ubiquitous Wearable Computers with

Multiple Radio Interfaces," in Wireless Pervasive Computing 2nd

International Symposium, 2007.

[16] Wikipedia. (accessed on 04/12/2009). Bluetooth low

energy. Available:

http://en.wikipedia.org/wiki/Bluetooth_low_energy

[17] R. Morris. (2008, accessed on 04/12/2009). Comparing

ANT and ZigBee. Available:

http://www.embedded.com/products/softwaretools/206900253

[18] K. Fyfe. (2009, accessed on 04/12/2009). Low-power

wireless technologies for medical applications. Available:

http://acamp.ca/alberta-micro-nano/.../Ken-Fyfe-HealthMedical-

Dec09.pdf

[19] M. Martínez-Búrdalo, A. Martín, A. Sanchis and R.

Villar, "FDTD assessment of human exposure to electromagnetic

fields from WiFi and bluetooth devices in some operating

situations," Bioelectromagnetics, vol. 30, pp. 142-151, 2009.


Bouncing Star Project: Design and Development of

Augmented Sports Application

Using a Ball Including Electronic and Wireless Modules

Osamu Izuta Graduate School of Electro-

Communications University of

Electro-Communications JP

[email protected]

Toshiki Sato Graduate School of

Information Systems University of

Electro-Communications

JP

[email protected]

Sachiko Kodama University of

Electro-Communications JP

[email protected]

Hideki Koike University of

Electro-Communications JP

[email protected]

ABSTRACT In our project, we created a new ball, “Bouncing Star” (Hane-Boshi in Japanese), comprised of electronic devices. We also created an augmented sports system using Bouncing Star and a computer program to support an interface between the digital and the physical world. This program is able to recognize the ball’s state of motion (static, rolled, thrown, bound, etc.) by analyzing data received through a wireless module. The program also tracks the ball's position through image recognition techniques. On this system, we developed augmented sports applications which integrate real-time dynamic computer graphics and responsive sounds synchronized with the ball's characteristics of motion. Our project's goal is to establish a new dynamic form of entertainment which can be realized through the combination of the ball and digital technologies.

Categories and Subject Descriptors

H.5.3 [Group and Organization Interfaces] Computer-supported cooperative work.

General Terms

Algorithms, Design, Experimentation, Human Factors

Keywords ball interface, wireless module, sensing technology, image recognition, augmented sports, interactive surface, computer– supported cooperative play

1. INTRODUCTION In Ishii’s “PingPongPlus”[1], the pioneering research of Augmented Sports, the use of 8 microphones beneath a table to sense the location of a ping pong ball created a novel athletics driven, tangible, computer-augmented interface that incorporated

sensing, sound, and projection technology. After Ishii’s research, several athletic-tangible interfaces which use a ball have been created. Some years later, Moeller and Agamanolis devised an experiment to play ‘sports over a distance’ through a life-size video conference screen using an unmodified soccer ball [2]. Rudorf and Brunnett also developed a table tennis application which could achieve real-time tracking of the high-speed movement of a ball [3]. In 2006, Iyoda developed a VR application for pitching, in which a player could pitch a ball, which included a wireless acceleration sensor, into a “screen shaped” split curtain equipped with IR sensors [4]. That same year, Sugano presented an augmented sports game named “Shootball” which used a ball equipped with a shock sensor and wireless module to experiment with a novel, goal-based sports game. Their system used multiple cameras and multiple projectors in the field [5].

In a more artistic field, Kuwakubo created a ball device, “HeavenSeed,” which, by means of an acceleration sensor and wireless module, generated sounds when it was thrown [6]. Torroja also created a ball-shaped sound instrument [7] that generates music when it is thrown or rolled, using similar techniques to Kuwakubo’s.

As stated above, there have been many projects which have developed new ball sensing interfaces, yet there has not been a ball interface which activates the full potential of the ball itself. During the playing of a ball-based sport, the ball itself has a variety of states such as rolling, bouncing, being thrown, etc., as well as a rapid change of position. If we identify such states and connect these inputs directly to graphical and acoustic output, we can create new dynamic ball-based sports in which players can dynamically move their bodies, and the audience can enjoy a complete synthesis of the scene (players, the ball, and a dynamic interactive play field).

Goal of our “Bouncing Star” Project Our goal is to develop a new system for the creation of new ball-based sports and to realize augmented sports applications using the system. To this end, we specify three necessary functions for our ball. The three functions are:

Precise detection of the ball's bounce

Precise tracking of the ball's position

Durability of the ball against shock

We named our ball "Bouncing Star", aiming for it to have these three functions, and started the Bouncing Star Project to create new ball-based sports systems and applications using the developed ball.

2. SYSTEM OVERVIEW In our system, the ball's characteristic of motion is synchronized with real time dynamic computer graphics and sounds. A high-speed camera and a projector are used. This camera is fixed at a place where it can capture the whole playing field and acquires images of the infrared lights within the ball on the play field at a frame rate 200fps. The camera is equipped with an IR filter that detects only infrared lights. Corresponding computer graphics are generated in a PC and are projected onto the floor. Sounds generated in a separate application are played through two separate stereo speakers. Figure 1 shows a layout overview of the system, and Figure 2 is a photograph of the application of the system.

Figure 1. Overview Diagram of the System

Figure 2. Playing field of the SPACE BALL Application

Developed Using the Bouncing Star System

3. IMPLEMENTATION TECHNOLOGY

3.1 Design of the Ball: "Bouncing Star" "Bouncing Star" is a simple ball which houses electronic devices inside, yet is strong enough to be thrown and bounced off walls and floors. It can recognize its various states, "thrown", "bounced", "rolled" using built-in sensors. Furthermore, the ball is equipped

with infrared LED's, so the system can detect the dynamic position of the ball using a high-speed camera. In addition, the ball itself emits LED light and can change its color and emission speed based on the ball’s status. We produced two different ball-based game applications using these characteristics of the Bouncing Star. In these applications, real-time CG and sounds are generated in the playing field, and changes in the color and flicker speed of the light are directly linked to the movement of the ball.

The Bouncing Star ball is composed of a core component and cover around the core. The Core is comprised of electric circuits housed in a spherically shaped plastic. The Cover serves first as protection for the core.

3.2 The Core The Core is composed of a PIC micro controller, a three axis acceleration sensor, a sound sensor, a ZigBee wireless communications module, a lithium ion battery, 6 full color LED's, and 6 infrared LED's.

The weight is 170g. The acceleration sensor is able to acquire accelerations between -6 G and +6 G on each of the three axes (X-axis, Y-axis, and Z-axis). The sound sensor is able to interpret sounds that occur within the ball. The data from both sensors are calculated by a micro controller (PIC16F88) in 256 steps approximately 160 times per second. The wireless module uses a “ZigBee” platform which allows reliable connectivity over approximately 30 meters. Wireless communications with the environment, PCs, or another Bouncing Star, etc., use the RS-232C serial protocol.

Figure 3. Inside of a Core

3.3 Shock Absorbing Mechanism The Cover component of the ball demands a material which is strong enough for use with applications in real ball games, yet it must have enough transparency to transmit LED light through the outside of the ball.

Our first attempt was to make a structure that covers the core with a spherical shell made of silicone rubber. This cover had good transparency and good elasticity, allowing the ball to bounce well. However, the weight was 380g for the cover alone.

For the second cover, we made a special beach ball type cover which could open and close to allow for removal of the core. The weight of the cover lowered to 120g and the shock against the core when it was bounced became very small because of the beach ball’s air cushion. But a problem arose in that the surface of the beach ball was not strong and could be damaged allowing the air to leak. In addition, the problem of keeping the core at the center of the ball proved difficult.


Figure 4. Three Different Types of Covers

Above Left: Beach Ball, Above Right: Rubber Ball

Below: Sepak Takraw Ball

For the third option we used a Sepak Takraw ball which is knit from six boards of polypropylene. This cover was both lightweight and strong enough to serve as a protective cover. The weight was 80g. This material could easily both fix the core at the ball's center and allow for easy removal of the core. We measured the reflection coefficient for each ball (Table 1).

Table 1. Specs of Three Different Types of the Ball

Ball Type | Material | Diameter | Weight | Reflection Coefficient
Rubber Ball | Silicone Rubber | 98 mm | 550 g | 0.70
Beach Ball | Vinyl | 220 mm | 290 g | 0.34
Sepak Takraw Ball | Polypropylene | 135 mm | 250 g | 0.43

3.4 Bounce Detection Algorithm We developed a new algorithm to recognize the ball's states (static, rolled, thrown, caught, etc.) using information from the acceleration sensor and the sound sensor. Since the ball spins irregularly during play, we combine the acceleration values of the X-, Y-, and Z-axes and calculate the total acceleration value A (1):

A = X² + Y² + Z² (1)

When the ball collides with something, the sound sensor detects the value S of the sound occurring within the ball. We compare the acceleration value A and the sound value S against constant thresholds As and Ss, respectively. By combining these comparisons, it becomes possible to recognize the states of the ball (Table 2).

Table 2. Algorithm to Recognize Ball’s States

 | A > As | A < As
S > Ss | Bound | (Nothing)
S < Ss | Thrown | Static / Rolled / Floating

Table 3. Modes of Emission Pattern of Light inside Ball

Mode Name | Description
Gradation | Change color gradationally when the ball is rolled, and change color slowly when the ball bounces
Pulse | The light pulses; the cycle of the pulse depends on the acceleration value "A"
Skip | Change color quickly when the ball bounces or is rolled
Burning | As the acceleration value "A" increases, change the ball’s color from blue to green, to yellow, and finally to red
Vanishing | Turn off the light for a few seconds when the ball is thrown

Moreover, the detection of the three different states of the ball (static, rolled, floating) uses the gravitational acceleration value. When the ball is on the ground, we detect A as a constant gravitational acceleration value, and the static state is recognized if this value does not change. The rolled state is recognized by analyzing the gravitational acceleration component on each of the X, Y, and Z axes. The floating state is recognized by A = 0, because when the ball is thrown and floats in the air, the acceleration sensor cannot detect any acceleration. The PIC inside the ball performs these operations, and the state of the ball is continuously transmitted to the ball-state recognition program installed on the PC.

The thresholds As and Ss can easily be changed from the PC interface to suit the environment of the play field or the specific application's demands on the ball. The changed threshold values are retained even if the ball is switched off, because the threshold value is saved in the PIC's memory whenever it changes.
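The following Java sketch (a host-side illustration under our own assumptions about types and threshold values; the actual implementation runs on the PIC inside the ball) shows how the comparisons of Table 2, plus the gravity-based check described above, could be combined into a single state classifier.

public class BallStateClassifier {

    public enum State { BOUND, THROWN, STATIC, ROLLED, FLOATING, NONE }

    private final double accelThreshold;  // As
    private final double soundThreshold;  // Ss
    private final double gravity = 1.0;   // |A| at rest, assuming normalized sensor units

    public BallStateClassifier(double accelThreshold, double soundThreshold) {
        this.accelThreshold = accelThreshold;
        this.soundThreshold = soundThreshold;
    }

    // a: total acceleration A computed from the three axes (equation (1));
    // s: sound level S detected inside the ball;
    // gravityChanging: whether the per-axis gravity components are changing,
    //                  used to tell "rolled" apart from "static".
    public State classify(double a, double s, boolean gravityChanging) {
        if (a > accelThreshold) {
            return (s > soundThreshold) ? State.BOUND : State.THROWN;
        }
        if (s > soundThreshold) return State.NONE;      // sound spike without acceleration: ignored
        if (a < 0.1 * gravity) return State.FLOATING;   // near free fall: A close to 0
        return gravityChanging ? State.ROLLED : State.STATIC;
    }
}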

In the event that we cannot use a projector and a high-speed camera in the system, the LED light emission inside the ball continues to be linked to the movement of the ball, and we can play new sports with the ball alone. Five different kinds of full-color LED emission patterns were programmed into the ball's PIC (Table 3).

3.5 Position Tracking Algorithm We developed an image recognition program to recognize the position of the ball by a camera using the infrared lights housed within the ball. In a real demonstration environment, we first calibrate the field coordinates from the camera coordinates using projective transformation. Second, the image captured by the

camera is binarized with a constant threshold and labeled, and the positions of the infrared lights are recognized. When several light sources are detected closer to each other than a set distance threshold, the ball is considered to be located at the center position of those light sources. Because six infrared LEDs are mounted inside the ball, several light sources may be detected for a single ball, depending on its orientation.
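As an illustration of this grouping step (our own sketch with assumed data structures; the actual program also performs the projective calibration and binarization described above), nearby light sources can be merged and the ball placed at their centroid as follows.

import java.util.ArrayList;
import java.util.List;

public class BallTracker {

    public record Point(double x, double y) {}

    // Merges all detected light sources within 'distanceThreshold' of the first one
    // and returns their centroid as the estimated ball position (null if none detected).
    public static Point estimateBallPosition(List<Point> lightSources, double distanceThreshold) {
        if (lightSources.isEmpty()) return null;
        Point seed = lightSources.get(0);
        List<Point> cluster = new ArrayList<>();
        for (Point p : lightSources) {
            double dx = p.x() - seed.x();
            double dy = p.y() - seed.y();
            if (Math.hypot(dx, dy) <= distanceThreshold) cluster.add(p);
        }
        double sumX = 0, sumY = 0;
        for (Point p : cluster) {
            sumX += p.x();
            sumY += p.y();
        }
        return new Point(sumX / cluster.size(), sumY / cluster.size());
    }
}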

3.6 Creation and Projection of Graphics and

Sound The graphics are created in accordance with the ball tracking information (location and timing of bounce). They are written in Visual C++ with OpenGL or DirectX. We used an NVIDIA GeForce 6800 GT graphics board.

A projector is suspended 10 m above the play field surface where the graphics are to be projected. We used a BenQ SP870 projector for experiments at the University of Electro-Communications and for public demonstrations held at three different places (National Art Center in Tokyo [8], Laval Virtual 2008 [9], SIGGRAPH 2008 [10]).

Environmental sound effects are created in accordance with the timing of a bounce. Sound effects are designed so as to match the context of both graphics and the player’s emotion on that scene.

4. APPLICATIONS

4.1 Simple Graphics Effect for Ball Play In Ishii’s “PingPongPlus”, the existing sport of ping pong was adapted to their new application. But in our study, we had to go upstream to the origin of ball play; that is, to question how we play with a ball when it has effective interactive graphics. Taking a very bottom-up approach, we first created simple three-dimensional graphics which could be used as graphic effects for the movement of the ball itself.

We created three different two dimensional modes (Fluid mode, Star mode, Spotlight mode) and then we made a three dimensional graphics mode in which people can hit virtual 3d objects (spheres and boxes) seen under the floor using Bouncing Star ball, and see physical interaction between the real ball and virtual 3D objects.

Fluid Mode: Two dimensional fluid motion graphics are generated slowly at the point where the ball bounces.

Star Mode: Many particle star shapes spread quickly from the point where the ball is located. The stars themselves rotate.

Figure 5. Star Mode

Figure 6. Spotlight Mode

Spotlight Mode: A bright white spotlight moves according to the ball’s position. The spotlight is always above the ball; as a result, the player is highlighted too.

3D Shape Mode: People can use the real ball to hit CG images of cubes and spheres projected on the floor. These CG shapes move according to a real-time physics simulation; as a result, people feel they are naturally interacting with the virtual physical world using their ball.

Figure 7. 3D Shape Mode

4.2 Augmented Sports Applications After making these simple graphic modes in which people fundamentally enjoy playing the ball with the interactive graphics on the floor, we proceeded to create more complicated sports game applications, in which people more dynamically move their bodies and can compete or collaborate in context of sporting game scenario.

Space Ball 1 We developed an application named "SPACE BALL 1" at a stage when our system had no wireless communication module or microphone in the ball. Therefore, the information about the ball was derived from two high-speed cameras; this was the only information that our application on the PC could acquire. The information detected by the acceleration sensor was used only to change the light emitted from the ball itself.

A CG projection of 10 × 10 squares was spread across the field (Figure 8). Players could get points by hitting the ball on these


panels, and two players could compete by flipping these panels for points. Our challenge was to recognize the ball's bounce by image recognition alone with the second high-speed camera, and to display CG effects, such as many scattered stars, when the ball bounces. However, bounce identification with the second camera resulted in many false judgments.

Figure 8. Space Ball 1 (Laval Virtual 2008)

Space Ball 2 In “SPACE BALL 2”, we use the ball which has the sound sensor and the wireless communication module inside it. This application generates dynamic CG effects on the play field that change in sync with the ball's characteristic motion as detected by the ball-state recognition program. We prepared three different ball states, "bounce", "rolled", and "flying", which are detected by the program. The program then uses the position information of the ball (obtained through the high-speed camera) as a parameter to decide the direction of the game. Table 4 and Figure 9 show how the direction of the game is determined using the ball's information. This application is designed as a multi-player cooperative game. There is a time limit of 60 seconds per game. A player can score by hitting the ball on a target CG spot projected on the field. The targets, all of the same color, are displayed on the play field. Their color and placement can be changed when a player bounces the ball outside the field (at this moment the color of the ball also changes). The players can choose their favorite placement of the target spots, making it easier to get high scores by changing the ball's color through dribbling it on the floor with their hands. Hitting a target in one bounce, or rolling the ball along a line of targets, generates higher scores. Figure 9 shows the rules of how to get points and how to make new targets, as well as how to make the time limit longer.

Sound effects have an important role in Space Ball 2. We applied up-beat music as a basic BGM during the play. This up-beat music is aimed at making people excited during the game. On the continuous BGM, we added four different sound effects in accordance with the ball's bounce. Sounds differ based on the context of the scene, letting players know what happened in their game (ex. They changed the targets coordinates, they got points by hitting the target; the ball simply bounced inside the play field but failed to hit the target). Each sound was designed and recorded beforehand and plays when a bounce occurs with no delay.

Table 4. Direction of "SPACE BALL 2" Using Ball's Information

Ball's information                     | Direction in "SPACE BALL 2"
Ball bounced outside of the play field | Change the color of the ball; change the targets' placement
Ball bounced inside of the play field  | Display the shock wave effect; hitting surrounding targets randomly generates several new targets of different colors
Rolled                                 | Extend the remaining time according to the number and interval of hit targets, if several targets are hit in one throw
Flying in the air                      | No points can be scored while the ball moves above the targets
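Read as pseudocode, Table 4 is a dispatch from the recognized ball state and bounce position to a game action. The following Python sketch illustrates one way such a dispatch could be written; the BallEvent fields, the game callbacks and the field-boundary test are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the Table 4 dispatch: ball state + position -> game action.
# The ball states ("bounce", "rolled", "flying") follow the paper; everything else
# (class names, callbacks, field geometry) is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class BallEvent:
    state: str          # "bounce", "rolled" or "flying"
    x: float            # projected floor position of the ball
    y: float
    hit_targets: list   # targets touched since the last event

class SpaceBall2Rules:
    def __init__(self, field_w, field_h):
        self.field_w, self.field_h = field_w, field_h

    def inside_field(self, x, y):
        return 0.0 <= x <= self.field_w and 0.0 <= y <= self.field_h

    def dispatch(self, ev, game):
        if ev.state == "bounce":
            if self.inside_field(ev.x, ev.y):
                game.show_shock_wave(ev.x, ev.y)          # CG effect at the bounce point
                if ev.hit_targets:                        # hitting targets spawns new ones
                    game.spawn_new_targets()
            else:
                game.change_ball_color()                  # bounce outside the field
                game.shuffle_target_placement()
        elif ev.state == "rolled" and len(ev.hit_targets) > 1:
            game.extend_time_limit(len(ev.hit_targets))   # rolling over a line of targets
        elif ev.state == "flying":
            pass                                          # no points while the ball is airborne
```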

Figure 9. Description of Rules to Get Points and Make New Targets in Space Ball 2

Figure 10. Floor Coordination

Figure 11. Shock Wave Effect at Bound in “SPACE BALL 2”


5. DISCUSSION

5.1 Discussion about Play in Applications From 2008 to 2009, we carried out many experiments in both outdoor and indoor fields on the campus of the University of Electro-Communications. We also gave multiple public demonstrations in three different places. Hundreds of people, including small children, experienced our applications and enjoyed the interaction in practice.

In some cases, we gave people only the Bouncing Star ball without projected images. In these cases, we found that the "vanishing ball mode" was favorably received by many people. In this mode the ball seems to have really disappeared. Because the mode is used in a dark place, players feel a thrill in catching the disappearing ball, which made the mode popular.

In "SPACE BALL 1" there were small gaps between the ball's movement and the CG scene of the game, and detection accuracy was relatively low using only the ball's position information, because at that time the application could not yet use the acceleration and sound information from the ball.

Therefore we added the wireless communication module and the combination of sound and acceleration sensors to the ball for "SPACE BALL 2". This made it possible to synchronize the CG with various ball movements by obtaining both the ball's states and the ball's position information, and it enabled real-time bounce detection for the game. In addition, through analysis of the acceleration information, we achieved identification of the "rolled" and "flying" states. Through the development of this software we realized several ways of recognizing the states of the ball that were unavailable in past projects, which relied on electronic devices alone.
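The text above states that bounces are detected from the sound sensor while the "rolled" and "flying" states are distinguished by analysing the acceleration signal. One plausible, purely illustrative way to do this is to threshold the sound envelope for bounces and to inspect the magnitude of the acceleration vector otherwise: a magnitude near 0 g suggests free fall ("flying"), while a magnitude near 1 g suggests rolling on the floor. The thresholds, window format and function name below are assumptions, not values from the paper.

```python
# Illustrative ball-state classifier; thresholds and window size are assumptions.
import math

G = 9.81  # gravitational acceleration in m/s^2

def classify_state(accel_window, sound_level,
                   bounce_sound_thresh=0.6, freefall_thresh=0.3 * G, roll_tol=0.3 * G):
    """accel_window: list of (ax, ay, az) samples; sound_level: 0..1 envelope."""
    if sound_level > bounce_sound_thresh:
        return "bounce"                      # impact detected by the sound sensor
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in accel_window]
    mean_mag = sum(mags) / len(mags)
    if mean_mag < freefall_thresh:
        return "flying"                      # near 0 g: the ball is in free fall
    if abs(mean_mag - G) < roll_tol:
        return "rolled"                      # near 1 g with the ball on the floor
    return "unknown"
```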

However, the Bouncing Star system cannot yet handle every kind of ball movement. For example, if the ball were used in baseball, the current system could not distinguish between the different curves of a pitched ball. We are therefore developing a prototype that includes a gyro sensor and are running experiments on roll-direction identification. This presents some difficulties because the ball rolls too fast for the gyro sensor to detect the roll direction precisely. We think this problem can be solved by combining multiple hardware components with software similar to that used for ball-state identification.

5.2 Mix of Ball and Digital Technologies We exhibited Space Ball 2 at the SIGGRAPH 2008 New Tech Demos. It was striking that, in addition to the players, many spectators cheered and said the game was beautiful. We think the combination of the CG application, the sounds, and the light of the LED ball, working together with a rule-based sports game, increases the excitement of both players and spectators. We would like to investigate a design methodology for increasing the pleasure and excitement of spectators using our system.

6. CONCLUSION In this research project we developed a ball, "Bouncing Star", which has a plastic core containing electronic devices (a sound sensor, a 3-axis acceleration sensor, and IR and full-color LEDs) covered in material that protects the core against shocks during bouncing. We tested three different cover materials for the ball and currently use a cover of silicone rubber, which proved the easiest to bounce, though it is still heavy for small children. The Sepak Takraw type cover was the toughest, but it did not bounce as well as a rubber ball. Every type can be used in our applications, and almost all participants enjoyed playing with the silicone rubber ball. People often showed concentration and enthusiasm for their games. Players used full-body movements, and some played continuously for 15 minutes, perspiring during their workout. Beyond the players themselves, we also observed many spectators enjoying the whole game scene around the play field.

The fusion of the ball and digital technology began only in the past decade, but we believe the ball interface for augmented sports has great potential to open a new phase of human communication and entertainment. Our next step is to make our tools an interface that connects the large, real physical world to virtual information resources smoothly and intuitively, approaching a seamless, natural state for human beings.

7. ACKNOWLEDGMENTS

We would like to thank all the members who worked on this project in our laboratories, and the Division of Technical Staff at the University of Electro-Communications for their mechanical engineering support. This development and research was supported by the CREST project of the Japan Science and Technology Corporation (JST).



On-line Document Registering and Retrieving System for AR Annotation Overlay

Hideaki Uchiyama, Julien Pilet and Hideo Saito
Keio University
3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan

uchiyama,julien,[email protected]

ABSTRACT
We propose a system that registers and retrieves text documents to annotate them on-line. The user registers a text document captured from a nearly top view and adds virtual annotations. When the user thereafter captures the document again, the system retrieves and displays the appropriate annotations, in real-time and at the correct location. Registering and deleting documents is done by user interaction. Our approach relies on LLAH, a hashing based method for document image retrieval. At the on-line registering stage, our system extracts keypoints from the input image and stores their descriptors computed from their neighbors. After registration, our system can quickly find the stored document corresponding to an input view by matching keypoints. From the matches, our system estimates the geometrical relationship between the camera and the document for accurately overlaying the annotations. In the experimental results, we show that our system can achieve on-line and real-time performance.

Categories and Subject Descriptors
H.5.1 [INFORMATION INTERFACES AND PRESENTATION]: Multimedia Information Systems - Artificial, augmented, and virtual realities; I.4.8 [IMAGE PROCESSING AND COMPUTER VISION]: Scene Analysis - Tracking

General Terms
Algorithms

Keywords
Augmented reality, Document retrieval, LLAH, Feature matching, Pose estimation

1. INTRODUCTION
Augmented reality is one of the popular research categories in computer vision and human computer interaction.


AR applications are widely developed for games, education, industry, communication, and so on. They usually need to estimate the geometrical relationship between the camera and the real world to overlay virtual objects with geometrical consistency. One of the traditional approaches to estimating this geometry is to use fiducial markers [6]. In recent years, AR research has been moving towards natural features in order to reduce the limitations on practical use [16].

Nowadays, augmenting documents is gaining in popularity and is called Paper-Based Augmented Reality [5]. The purpose of this research is to enlarge the usage of a physical document. For example, a user can click words on a physical document through a mobile phone. This enables a paper document to become a new tangible interface connecting the physical and digital worlds. Hull et al. have proposed a clickable document, which has some colored words as printed hyperlinks [5]. When reading the document, the user can click a colored word to connect to the Internet. Also, the user can watch a movie instead of the printed picture on the document. Their application was designed for extending the usage of existing newspapers or magazines.

As a novel application for document based AR, we propose a system that registers a document image with some user annotations for later retrieval and augmentation on new views. Our system is composed of a single camera mounted on a handheld display, such as a mobile phone, and of the physical documents the user selects. No other special equipment such as a fiducial marker is necessary. The user captures the document, writes some annotations on the document through the display, and registers them in our system. When the user captures the registered document again, the annotations of the document are retrieved from the database and overlaid at the position selected by the user beforehand. Our system can be useful when the user does not want to write annotations directly on valuable documents such as ancient books.

The rest of the paper is organised as follows: we review keypoint matching based registration for augmented reality in the next section. In Section 3, we introduce the usage of our system. Then, we explain the detailed algorithm of our system in Section 4. We evaluate the way of capturing documents and the processing time in Section 5, and conclude in Section 6.

2. RELATED WORKS
The process of registration by keypoint matching between two images can be divided into three parts: extraction, description and matching.

As a first step, we extract keypoints which have a distinctive appearance, different from other pixels in each image. By using these distinctive keypoints, it is easier to establish correspondences between two images. The Harris corner [4] and the FAST corner [13] are widely used and keep the repeatability of the extraction under different viewpoints.

Next, these keypoints are described as high dimensional vectors for robust and stable matching. The vector is usually computed from the local neighbor region of the keypoint. Well-known descriptors such as SIFT [8] and SURF [2], with 128 dimensional vectors, are well designed to be invariant to illumination, rotation and translation changes. Since the computational cost of SIFT is too high for real-time processing, several attempts to reduce the cost have been made [14, 16].

Matching of descriptors can be addressed as a nearest neighbor searching problem. KD-tree based approaches [1] and hashing schemes [3] are typical approximate nearest neighbor searches. Though the true nearest neighbor cannot always be found, the computational cost is drastically reduced compared to exhaustive search. Nister and Stewenius have proposed a recursive k-means tree as a tree structure for quick retrieval [12]. Lepetit et al. have proposed another approach that treats keypoint matching as a classification problem [7].

Descriptors such as SIFT and SURF are well suited to match keypoints with rich texture patterns. However, documents generally contain repetitive patterns composed of text. Since the local regions of documents may be similar and not distinctive, these descriptors do not work well. Instead, descriptors based on the geometrical relationship of keypoints have been proposed [5, 11].

As a descriptor for a document, Hull et al. have proposed the horizontal connectivity of word lengths [5]. Nakai et al. have proposed LLAH (Locally Likely Arrangement Hashing), which uses the local arrangement of word centers [11]. Uchiyama and Saito extended the framework of LLAH to wider range tracking [15]. LLAH has been applied to the extraction of annotations written in physical documents [9] and extended to a framework for augmented reality [15].

Since LLAH can achieve real-time processing thanks to its hashing scheme [11, 15], we develop our system based on LLAH as described in Section 4.

3. PROPOSED SYSTEM
The configuration of the system is only a camera mounted on a handheld display such as a tablet PC or a mobile phone. The user prepares the text documents on which the user wants to write some annotations electronically. No other special equipment is used.

To register a document into our system, the user captures the document from a nearly top view as shown in Figure 1. While our system shows the captured document on the display, the user can write annotations on the document through the display. We prepare two modes: text and highlighter. In the text mode, the user can write several sentences at specified positions on the document as shown in Figure 1(a). This mode works as memos and can be replaced with handwriting. In the highlighter mode, the user can highlight text on the document as shown in Figure 1(b). Since the highlighted areas are semi-transparent, this mode can be considered as a virtual color highlighter pen.

After the registration, the retrieval stage starts. When the same document is captured again, the annotations are overlaid at the specified position. While rotating and translating the camera, the user can watch the overlaid annotations as if they were written on the document. Since many documents can be registered in our system, our system can identify which document is currently captured and overlay its annotations.

The operations for registering and deleting documents are performed by the user's clicks. First, our system starts the capturing process. If the user wants to register a document, the user clicks a button. Then, our system switches to the registration stage and waits for the user's annotation input. After the input, the user clicks the button again to switch to the retrieval stage. During the retrieval stage, the user can watch the annotations on the captured document. To delete a document from the database, the user clicks the button while watching the annotations of that document. This operation is designed to avoid registering the same document twice. By using these user interactions, the users can register and delete documents.

Our system is designed for people who do not want to write annotations on documents directly, and can also be considered as an electronic bookmark. In previous related works, the document database was prepared from digital documents [5, 11, 15]. Since it is difficult to prepare digital versions of books and newspapers, our system can be easier and more practical to use because it works with the physical documents the user has on hand.

4. DETAILS

4.1 LLAH
LLAH is a document image retrieval method [11]. Since our system relies on it, we briefly describe the method here for completeness.

First, the center of each word is extracted from the captured document image as a keypoint. The image is blurred by using a Gaussian filter and binarized by using adaptive thresholding as shown in Figure 2. Since the filter size of both processing steps affects the result, we discuss their effects in Section 5.2.

Next, descriptors are computed for each keypoint. In Figure 3, x is a target keypoint. First, the n nearest points of the target are selected as abcdefg (n = 7). Next, m points out of the n points are selected as abcde (m = 5). From the m points, a descriptor is computed as explained in the next paragraph. Since the number of possible selections is ${}_{n}C_{m} = \frac{n!}{m!\,(n-m)!}$, one keypoint has ${}_{n}C_{m}$ descriptors.

From the m points, 4 points are selected as abcd. From these 4 points, we compute the ratio of the areas of two triangles. Since the number of possible selections is ${}_{m}C_{4}$, the dimension of the descriptor is ${}_{m}C_{4}$.

For quick retrieval in keypoint matching, the descriptor is transformed into an index by using the following hash function:

$$\mathrm{Index} = \left( \sum_{i=0}^{{}_{m}C_{4}-1} r(i)\,k^{i} \right) \bmod H_{size} \qquad (1)$$

where $r(i)$ $(i = 0, 1, \ldots, {}_{m}C_{4}-1)$ is the quantized ratio of two triangles, $k$ is the quantization level and $H_{size}$ is the hash size.
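Equation (1) maps the ${}_{m}C_{4}$ quantized area ratios of one descriptor to a hash index. The sketch below illustrates this computation in Python; the choice of the two triangles within each 4-point subset and the quantization function are simplified placeholders (the paper uses the quantization method of [11]), and only the combinatorial structure and the hash of Eq. (1) follow the text.

```python
# Sketch of the LLAH index computation of Eq. (1). The quantizer and the pairing
# of the two triangles inside each 4-point subset are assumptions, not from the paper.
from itertools import combinations

def triangle_area(p, q, r):
    return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0

def quantize_ratio(ratio, k):
    # Placeholder quantizer: clamp the ratio into one of k integer bins.
    return min(int(ratio), k - 1)

def llah_index(m_points, k=32, h_size=(1 << 23) - 1):
    """m_points: the m keypoints selected around the target keypoint."""
    ratios = []
    for a, b, c, d in combinations(m_points, 4):
        # One possible pairing of two triangles built from the 4 points (assumed).
        r = triangle_area(a, b, c) / max(triangle_area(a, c, d), 1e-9)
        ratios.append(quantize_ratio(r, k))
    # Eq. (1): Index = (sum_i r(i) * k^i) mod Hsize
    index = 0
    for i, r in enumerate(ratios):
        index = (index + r * pow(k, i, h_size)) % h_size
    return index
```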


Figure 1: Annotation Overlay. (a) Red text is written as a memo. (b) A semi-transparent rectangle is highlighted as if written by a color marker pen.

These descriptors allow matching keypoints of an input image with those of a reference image in the database.

4.2 Document registration
When the user captures a document, our system extracts its keypoints and computes their descriptors. For each document, our system stores keypoints in a table as follows:

Document ID | Keypoint ID | (x, y) | Descriptors

The document ID is numbered in captured order. The keypoint ID is likewise numbered in the order of extraction from the image. (x, y) is the coordinate in the image. This allows our system to estimate the geometrical relationship between the coordinate system of the stored image and that of the input image, making accurate annotation overlay possible. Previous methods do not store descriptors [11, 15]. In contrast, we need to keep them for the deletion process described in Section 4.4.

Figure 2: Keypoint extraction. (a) The document is captured from a nearly top viewpoint. (b) The white regions represent the extracted word regions. The keypoint is the center of each region.

Figure 3: Descriptor. (1) Selection of n points. (2) Selection of m points out of n. (3) Selection of 4 points out of m. (4) Computation of the two triangles' area ratio.

For document retrieval, our system has a descriptor database as follows:

Descriptor | (Document ID + Keypoint ID), ...

As described in Section 4.1, the descriptor is an index. At each index, the set of document ID and keypoint ID is stored. In our system, we use 16 bits for the document ID and 16 bits for the keypoint ID, and store them as a 32-bit integer. Since the same descriptor can be computed more than once, we use a list structure to store several sets of document ID and keypoint ID at each index.

The descriptor database was generated as a hash table in previous works [11, 15]. If the database can be generated beforehand as in [11, 15], the hash size can be optimized by using all document images. Since our system starts from an empty database, it is difficult to determine the appropriate hash size. To avoid large empty spaces in the hash table, we use a sparse tree structure for the descriptor database. Even though the computational cost of searching a binary tree is O(log2 N) compared with O(1) for a hash table, it is sufficient for our purpose, as discussed in Section 5.3.
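The bookkeeping described above (a per-document keypoint table, plus a descriptor database whose entries pack a 16-bit document ID and a 16-bit keypoint ID into one 32-bit integer) can be sketched as follows. For brevity the sketch uses a Python dict where the paper uses a sparse binary tree, and the class and method names are illustrative; the delete method anticipates the deletion process of Section 4.4.

```python
# Sketch of the registration / deletion bookkeeping of Sections 4.2 and 4.4.
# A dict stands in for the paper's sparse tree; all names are illustrative.
class DescriptorDatabase:
    def __init__(self):
        self.index_table = {}   # descriptor index -> list of packed (doc ID, keypoint ID)
        self.documents = {}     # doc ID -> list of (keypoint ID, (x, y), [descriptor indices])
        self.next_doc_id = 0

    @staticmethod
    def pack(doc_id, kp_id):
        return (doc_id << 16) | (kp_id & 0xFFFF)   # 16 bits each, one 32-bit integer

    def register(self, keypoints):
        """keypoints: list of ((x, y), [descriptor indices]) for one captured document."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        table = []
        for kp_id, (xy, indices) in enumerate(keypoints):
            table.append((kp_id, xy, indices))
            for idx in indices:
                self.index_table.setdefault(idx, []).append(self.pack(doc_id, kp_id))
        self.documents[doc_id] = table
        return doc_id

    def delete(self, doc_id):
        # Because the descriptor indices are kept at registration time, the packed
        # entries can be removed by visiting exactly those indices.
        for kp_id, _xy, indices in self.documents.pop(doc_id):
            packed = self.pack(doc_id, kp_id)
            for idx in indices:
                entries = self.index_table.get(idx, [])
                if packed in entries:
                    entries.remove(packed)
```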

4.3 Document retrieval
In this process, keypoints are extracted, and their descriptors and indices are computed as described in Section 4.1. For each keypoint, several sets of document ID and keypoint ID are retrieved from the descriptor database. If the retrieval is reasonably successful, the same set of document ID and keypoint ID appears often for a keypoint. By selecting the most frequently counted set, one set (document ID and keypoint ID) is assigned to each keypoint.

After assigning one set to each keypoint, we count the document IDs assigned to the keypoints in order to determine which document image is currently captured. The captured document is identified by selecting the document ID with the maximum count.

To verify that the selected document is correct, we compute geometrical constraints such as the fundamental matrix and the homography matrix. Since the paper is placed on a table, we can use RANSAC based homography computation for the verification [15].

From the computed homography, we can overlay AR annotations at the specified positions on the document. Document retrieval and annotation overlay can be done simultaneously in the same process.
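The retrieval stage is thus a two-level vote (first per keypoint over the retrieved (document ID, keypoint ID) sets, then over document IDs) followed by a RANSAC homography check. The sketch below illustrates that flow under the assumption that the per-keypoint candidate sets have already been fetched from the descriptor database; OpenCV's findHomography is used for the RANSAC step, and all names and thresholds are illustrative.

```python
# Sketch of the two-level voting and RANSAC verification of Section 4.3.
from collections import Counter
import numpy as np
import cv2

def retrieve_document(frame_keypoints, candidates, stored_coords, min_inliers=10):
    """
    frame_keypoints: list of (x, y) in the input image.
    candidates: per keypoint, a list of (doc_id, kp_id) candidates from the database.
    stored_coords: dict (doc_id, kp_id) -> (x, y) in the registered image.
    """
    # 1) Per-keypoint vote: keep the (doc_id, kp_id) pair that occurs most often.
    assigned = []
    for pt, cands in zip(frame_keypoints, candidates):
        if cands:
            assigned.append((pt, Counter(cands).most_common(1)[0][0]))
    if not assigned:
        return None, None

    # 2) Document-level vote over the assigned document IDs.
    doc_id = Counter(d for _, (d, _) in assigned).most_common(1)[0][0]

    # 3) Geometric verification: homography between stored and input coordinates.
    src = np.float32([stored_coords[(d, k)] for _, (d, k) in assigned if d == doc_id])
    dst = np.float32([pt for pt, (d, _) in assigned if d == doc_id])
    if len(src) < 4:
        return None, None
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None or int(mask.sum()) < min_inliers:
        return None, None
    return doc_id, H   # H warps annotation positions from the stored image to the frame
```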

4.4 Document deletion
As described in Section 3, the users can delete a document while watching its annotations. This means that the user deletes the currently retrieved document. When a document is deleted, its document data, such as the sets of document ID and keypoint ID and their descriptors, should be deleted. First, we delete the sets of document ID and keypoint ID from the descriptor database. Since we keep the descriptors (indices) of each keypoint at registration, we can delete the sets by accessing each index. After deleting the sets, we delete the other document data.

5. EXPERIMENTS

5.1 Setting
The parameters in LLAH affect the performance and accuracy of document image retrieval. Since the influence of the parameters has already been discussed in [11], we do not discuss it here and fix the parameters throughout our experiments. Instead, we discuss the way of capturing a document and the processing time for our purpose.

The LLAH parameters described in Section 4.1 are n, m, k and $H_{size}$. Since we set n = 6 and m = 5, the number of descriptors for one keypoint is ${}_{6}C_{5} = 6$. The quantization level is k = 32 and the hash size is $H_{size} = 2^{23} - 1$. As described in Section 4.2, the hash size is used only for computing descriptors. Each descriptor is stored in a binary tree structure. The quantization method of the descriptors is the same as in [11].

In our current implementation, we use a laptop with a FireWire camera as the device. The laptop has an Intel Core 2 Duo 2.2 GHz and 3 GB RAM. The size of the input image is 640 × 480 pixels, and the size for the keypoint extraction is 320 × 240 pixels for fast computation. The focal length of the lens is fixed at 6 mm.

5.2 Image capture
In LLAH, the keypoint extraction is composed of smoothing with a Gaussian filter and binarization by adaptive thresholding. The filter size of both methods needs to be determined beforehand.

Since the filter size affects the result of keypoint extraction, we have tested the keypoint extraction on images captured from different positions, as shown in Figure 4. The Gaussian filter is 3 × 3 and the filter for adaptive thresholding is 11 × 11. The character size is 10 pt, written on an A4 paper in a two column format.

If the camera is as close to the document as 3 cm, each character is extracted individually, as shown in Figure 4(a). On the other hand, the word regions cannot be extracted from an image captured far from the document (20 cm), as shown in Figure 4(b). The word regions are extracted as desired in the case of Figure 4(c).

The result of keypoint extraction can be influenced by the image size, the character size in the physical document, the distance between the camera and the document, the two filter sizes and the lens. These parameters should be optimized by considering the use of the application. In our application, examples of a captured image are shown in Figure 1. The user captures an A4 paper with 10 pt characters from a height of around 10 cm.
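With the filter sizes quoted above (a 3 × 3 Gaussian and an 11 × 11 adaptive-threshold window), the word-centre keypoint extraction of Section 4.1 can be approximated with OpenCV as sketched below. The constant offset passed to the adaptive threshold and the minimum region area are assumptions, since the paper does not give them.

```python
# Rough reproduction of the word-centre keypoint extraction (Gaussian blur,
# adaptive thresholding, connected components). The threshold offset C is assumed.
import cv2

def extract_word_centers(gray, blur_ksize=3, thresh_block=11, C=5, min_area=4):
    blurred = cv2.GaussianBlur(gray, (blur_ksize, blur_ksize), 0)
    binary = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, thresh_block, C)
    # Each white connected region is taken as one word; its centroid is the keypoint.
    n, _labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    return [tuple(centroids[i]) for i in range(1, n)          # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= min_area]

# Example usage (the image path is a placeholder):
# gray = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)
# gray = cv2.resize(gray, (320, 240))   # the paper extracts keypoints at 320x240
# keypoints = extract_word_centers(gray)
```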

5.3 Processing time
We have measured the processing time with 200 small parts of documents. The size of each small part is as shown in Figure 1. In this region, the average number of keypoints was around 180.

The average processing time of each process is shown in Table 1. The document registration without the user's annotation took 1 msec. The document deletion also took 1 msec. From these results, the user interactions can be performed without stress.

Regarding the document retrieval including the annotation overlay, the average time was 30 msec. Compared with the previous related work [15], the computational cost was reduced because the number of keypoints in a smaller image was lower. Even though we use a tree structure for searching, we can still achieve about 30 fps, which is enough processing time for AR.

Table 1: Processing time

Process      | msec
Registration | 1
Retrieval    | 30
Deletion     | 1

6. CONCLUSIONS AND FUTURE WORKS
In this paper, we presented an on-line AR annotation system for text documents. The user can register text documents with annotations virtually written on the document. Then, the user can watch the annotations through AR while capturing the same document again. Our system provides user interactions for registering and deleting documents. The algorithm of our system is based on LLAH. Our system stores the keypoints of the captured image together with their descriptors. By using LLAH, our system can quickly identify which document is captured and overlay its annotations. In the experiments, we showed that our system works in real-time.

In our current system, the target documents are European documents in languages such as English and French. As future work, we will extend the system to other languages by changing the keypoint extraction method depending on the language [10]. Also, multiple documents may be detected for showing many annotations simultaneously. For handling large scale changes, keypoint extraction on an image pyramid may be another direction.

Figure 4: Keypoint extraction at a distance. (a) The camera is set near the document (3 cm). (b) The camera is set far from the document (20 cm). (c) The distance is between (a) and (b) (10 cm).

7. ACKNOWLEDGMENT
This work is supported in part by a Grant-in-Aid for the GCOE for High-Level Global Cooperation for Leading-Edge Platform on Access Spaces from the Ministry of Education, Culture, Sport, Science, and Technology in Japan and a Grant-in-Aid for JSPS Fellows.

8. REFERENCES
[1] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. of the ACM, 45:891-923, 1998.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. SURF: Speeded up robust features. CVIU, 110:346-359, 2008.
[3] M. Datar, P. Indyk, N. Immorlica, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proc. SCG, pages 253-262, 2004.
[4] C. Harris and M. Stephens. A combined corner and edge detector. In Proc. AVC, pages 147-151, 1988.
[5] J. Hull, B. Erol, J. Graham, Q. Ke, H. Kishi, J. Moraleda, and D. Van Olst. Paper-based augmented reality. In Proc. ICAT, pages 205-209, 2007.
[6] H. Kato and M. Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proc. IWAR, 1999.
[7] V. Lepetit, J. Pilet, and P. Fua. Point matching as a classification problem for fast and robust object pose estimation. In Proc. CVPR, pages 244-250, 2004.
[8] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91-110, 2004.
[9] T. Nakai, K. Iwata, and K. Kise. Accuracy improvement and objective evaluation of annotation extraction from printed documents. In Proc. DAS, pages 329-336, 2008.
[10] T. Nakai, K. Iwata, and K. Kise. Real-time retrieval for images of documents in various languages using a web camera. In Proc. ICDAR, pages 146-150, 2009.
[11] T. Nakai, K. Kise, and K. Iwata. Camera based document image retrieval with more time and memory efficient LLAH. In Proc. CBDAR, pages 21-28, 2007.
[12] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, pages 2161-2168, 2006.
[13] E. Rosten and T. Drummond. Machine learning for high speed corner detection. In Proc. ECCV, pages 430-443, 2006.
[14] S. Sinha, J. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. In Proc. EDGE, 2006.
[15] H. Uchiyama and H. Saito. Augmenting text document by on-line learning of local arrangement of keypoints. In Proc. ISMAR, pages 95-98, 2009.
[16] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg. Pose tracking from natural features on mobile phones. In Proc. ISMAR, pages 125-134, 2008.


Augmenting Human Memory using Personal Lifelogs

Yi Chen
Centre for Digital Video Processing
Dublin City University, Dublin 9, Ireland

[email protected]

Gareth J. F. Jones
Centre for Digital Video Processing
Dublin City University, Dublin 9, Ireland

[email protected]

ABSTRACT Memory is a key human facility to support life activities, including social interactions, life management and problem solving. Unfortunately, our memory is not perfect. Normal individuals will have occasional memory problems which can be frustrating, while those with memory impairments can often experience a greatly reduced quality of life. Augmenting memory has the potential to make normal individuals more effective, and to give those with significant memory problems a higher general quality of life. Current technologies are now making it possible to automatically capture and store daily life experiences over an extended period, potentially even over a lifetime. This type of data collection, often referred to as a personal life log (PLL), can include data such as continuously captured pictures or videos from a first person perspective, scanned copies of archival material such as books, electronic documents read or created, and emails and SMS messages sent and received, along with context data such as time of capture and access, and location via GPS sensors. PLLs offer the potential for memory augmentation. Existing work on PLLs has focused on the technologies of data capture and retrieval, but little work has been done to explore how these captured data and retrieval techniques can be applied in actual use by normal people to support their memory. In this paper, we explore normal people's needs for augmenting human memory based on the psychology literature on the mechanisms behind memory problems, and discuss the possible functions that PLLs can provide to support these memory augmentation needs. Based on this, we also suggest guidelines for data capture, retrieval needs and computer-based interface design. Finally we introduce our work-in-progress prototype PLL search system in the iCLIPS project to give an example of augmenting human memory with PLLs and computer based interfaces.

Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Search and Retrieval - Search process, Query formulation, H.5.2 [User Interfaces (D.2.2, H.1.2, I.3.6)]: Graphical user interfaces (GUI), Prototyping, User-centered design

General Terms Algorithms, Design, Human Factors.

Keywords Augmented Human Memory, Context-Aware Retrieval, Lifelogs, Personal Information Archives

1. INTRODUCTION Memory is a key human facility inextricably integrated in our ability to function as humans. Our functioning as humans is dependent to a very significant extent on our ability to recall information relevant to our current context, be it a casual chat with a friend, remembering where you put something, the time of the next train or some complex theory you need to solve a problem in the laboratory. Our effectiveness at performing many tasks relies on our efficiency and accuracy in reliably recalling the relevant information. Unfortunately humans are frequently unable to reliably recall the correct information when needed. People with significant memory problems (e.g. amnesic patients) usually face considerable difficulty in functioning as happy integrated members of society. Other people, although having much less noticeable memory problems compared with the amnesic patients, may also experience some degree of difficulties in learning and retrieving information from their memory for various reasons. In this paper, we use the phrase normal people to refer to individuals with normal memory and normal lifestyles, as opposed to amnesic or mentally impaired patients. The desirability of a reliable and effective memory means that augmenting memory is a potentially valuable technology for many classes of people. Normal individuals might use a memory augmentation tool to look up partially remembered details from events from their life in many private, social or work situations. The augmented memory application itself might proactively monitor their context and bring to their attention information from their previous life experiences which may be of assistance or interest to their current situation. Details from these experiences could be integrated into personal narratives for use either in self reflection or to enable experiences to be shared with friends [1]. Sufficiently powerful augmented memories could not just support their users, but actually extend the user’s capabilities to enable them to perform new tasks or existing tasks more efficiently or faster. In order to provide augmented memory applications however we need some means to capture, store and then access personal



information from a person's life experiences to form an augmented memory. The emerging area of digital life logging is beginning to provide the technologies needed to support these applications. Personal lifelogs (PLLs) aim to digitally record and store many features of an individual's life experiences. These can include details of visual and audio experiences, documents created or read, the user's location, etc. While lifelog technologies fall short of genuinely mimicking the complexities and processes of human memory, they are already offering the promise of life enhancing human augmentation, especially for episodic memory impaired patients, that is, people who have problems remembering their daily experiences [2]. Existing studies on lifelogs have concentrated primarily on the physical capture and storage of data. One of the major activities in this area which has explored this topic in detail relates to Gordon Bell's experiences of digitizing his life, described in [3]. Bell explores the topic of "e-memories" and "Total Recall" technologies through his own experiences of digitizing his life. While his work provides significant insights into the potential and issues of digital memories, it focuses very much on the technologies of capture and storage, and potential applications. Our work in the iCLIPS project in the Centre for Digital Video Processing (CDVP) at Dublin City University (http://www.cdvp.dcu.ie/iCLIPS) is exploring the capture and search of personal information archives, looking not just at data capture, but also at the development of effective content indexing and search, and, importantly in relation to this paper, at the processes of human memory, the form and impact of memory failures, and how these might be overcome using search of PLLs. In our work, we are concentrating on memory failures typically encountered by normal people, and using these to guide the development of a prototype interface to access a PLL as an augmented memory. The remainder of this paper is organised as follows. In Section 2 we examine PLLs and related work on digital memory aid tools in a little more detail. Section 3 then looks at theoretical models of memory from the psychology literature, reviews some existing empirical studies of normal people's memory problems and memory support needs in their daily life, and discusses the possible functions that PLLs may be able to provide for augmenting human memory. In Section 4 we postulate guidelines for developing PLL systems to augment human memory, giving suggestions on computer based interface design, the types of data to be captured, and the retrieval techniques required. Finally, in Section 5, we introduce the iCLIPS project and our prototype application.

2. BACKGROUND

2.1 Personal Lifelogs (PLLs)
PLLs are typically captured using a range of software and hardware devices. A separate lifelog is captured for each individual. In our work we use software applications to log all activity on the individual's desktop and laptop computers. This involves recording files created, edited and opened, logging web pages accessed, and archiving emails sent and received. Peripheral devices are used to continuously record other potentially significant data streams. These include visual information recorded using a wearable camera, in our case the

Microsoft SenseCam (http://research.microsoft.com/en-us/um/cambridge/projects/sensecam), or camcorders. Some projects also use audio recording to record conversations, the most important source of our conventional daily communication. Due to privacy concerns related to the continuous capture of audio data, we do not capture audio data in our work. However, other communication sources such as SMS messages and Twitter feeds can be monitored and included in a PLL. In addition there is a wide range of context information that can also be recorded. For example, location can be monitored using GPS sensors, with named locations then looked up in gazetteers; individuals present can often be inferred by monitoring nearby Bluetooth enabled devices; and date and time can easily be captured and are very powerful context data for searching information. Another interesting source of data is biometrics. Research has shown a correlation between measurable biometric responses such as heart rate, skin conductance and temperature, personal arousal, and the memorability of events [4]. Thus capturing these biometric features can potentially be used to help locate events in a PLL of potential personal significance to its owner.
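The capture channels listed above (computer activity, wearable camera images, location and named places, nearby Bluetooth devices, time, and biometric readings) suggest a simple unified record per logged event. The schema below is purely illustrative; the field names are assumptions and are not taken from the iCLIPS system.

```python
# Illustrative schema for one lifelog event; field names are assumptions, not iCLIPS'.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class LifelogEvent:
    timestamp: datetime                      # when the item was captured
    source: str                              # e.g. "sensecam", "desktop", "email", "sms"
    content_ref: str                         # path or URI of the captured item
    location: Optional[tuple] = None         # (latitude, longitude) from GPS, if available
    place_name: Optional[str] = None         # named location looked up in a gazetteer
    nearby_bluetooth: list = field(default_factory=list)  # device IDs seen at capture time
    heart_rate: Optional[float] = None       # biometric channels, if worn sensors are used
    skin_conductance: Optional[float] = None
```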

2.2 Related Works There have been a number of studies on developing memory-supporting applications. But most of current research claiming to use PLL data in supporting human memory is limited to presenting streams of captured episodes (e.g. video or audio records of certain episode) to the user, to have them “re-experience” the past, to look up required information or “re-encode” information encountered during that period and consolidate memory of it (e.g.[2, 5, 6]). While these applications appear to have promising results in clinical psychology studies with severe amnesic patients, that is, people who suffer from serious memory disorders (e.g. [2]), such applications may not be equally useful for people who have normal memory abilities. For example, in the study of [2], the subject (patient) can hardly recall anything that happened to her even after one day’s delay. Therefore, a simple review and consolidation of past events can be very helpful for them to maintain necessary episodic memory. A patient’s lifestyle can be very different from that of normal working people, in that they have enough time to review their experiences day by day. Therefore the “rehearsal” type memory aid (e.g. [5, 6]) is less likely to be favoured by normal people, unless it contains some important information which is difficult to remember or if there is some specific information they can’t recall. For example, it is not unusual that we need to find an object or a document but don’t remember where it is, or we meet someone we saw before but can’t recall the person’s name. Ubiquitous Memories [7] is a tool designed to automatically retrieve video clips which were captured when an object was previously presented. The developers also argue it to be a tool to help people find physical objects. VAM [8] was designed to automatically retrieve personal information such as the name of a currently encountered person by automatically detecting the face of the person. Audio life logs such as iRemember [9] are usually used to recover information one learned from audio conversations. Forget-Me-Not [10] helps people find documents by searching for actions in which the document is involved. The cues it presents to trigger memories of the target document related action also include other actions in the day which are presented



by iconized attributes, including people, location, actions on documents and time stamps, then allow filtering/searching for an action on the document. Most work in PLL capture to date has focused on short term studies of a few days or a week or two of data. To support research exploring technologies to truly augment human memory it is our belief that much longer term PLLs archives are needed for research. As should be obvious, capturing PLLs using current technologies requires a considerable investment by the subject capturing their own PLL. Software must be monitored and data carefully archived, more demandingly though the batteries on peripheral devices must be charged regularly and important data uploaded to secure reliable storage locations. The iCLIPS project has so far gathered PLLs of 20 months duration from 3 subjects. Our experiences in capturing and archiving this data are described in detail in [11].

3. MEMORY SUPPORT NEEDS Since people will only turn to use memory aid tools when they feel unconfident or incapable of retrieving a piece of information from their memory, we believe that a sound understanding of the memory problems people usually encounter in their daily life will provide a guide of the functionality of memory aid tools.

In this section we first explain memory problems and the mechanisms which cause problems based on psychology research, we then review existing studies in exploring normal people’s memory failures and needs for memory aid tools in daily life, and finally we discuss the possible functions that PLLs may be able to provide for augmenting human memory.

3.1 Theoretical Review Memory is a cognitive ability to encode, store and retrieve information. Encoding is the process of converting sensors received external stimuli into signals which the neuron system in the brain can interpret, and then absorbing the newly received information into long term storage, termed long term memory (LTM). Retrieval is the process of bringing back information from the LTM storage. Different types of retrieval approaches are used for different types of memory. The two basic categories of memory systems are procedural memory and declarative memory. Procedural memory is also called implicit memory, meaning that it is usually retrieved without explicit awareness or mental effort. Examples include memory of motor skills, oral language, and memory of some types of routines. Procedural memory usually requires minimum cognitive resource and is very durable. It has been found that even people with serious global memory impairments have preserved procedural memory. For this reason, memory aids for procedural memory are not explored in this paper. Declarative memory as opposed to procedural memory, usually involves explicit awareness during encoding and retrieval. There are two major types of declarative memory: semantic memory, meaning memory of facts, and episodic memory, referring to the memory of experiences, which is usually related to temporal context. Most of our memory problems are declarative memory problems. Although most memory problems can only be observed during retrieval, since current techniques are not advanced enough to know what’s happening in the human mind, failures at any stage can cause problems in memory. For example, failure to encode encountered information makes the information unavailable in

one’s memory. In the Seven Sins of Memory [12], Schacter characterizes seven daily memory problems including: transience, absent-mindedness, blocking, misattribution, suggestibility, bias, and persistence. These sins can generally fall into three categories of memory problems namely: forgetting (transience, absent-mindedness, blocking), false memory (misattribution, suggestibility, bias), and the inability of forgetting (persistence). In the remainder of this section, we explain the mechanisms for these memory sins (problems), and discuss the possible solutions that PLLs can offer.

Table 1. Seven Sins of Memory

Sin               | Meaning
transience        | the gradual loss of memory over time
absent-mindedness | inability to retrieve a memory due to the lack of attention while encoding the information
blocking          | the failure to retrieve encoded information from memory due to the interference of similar information retrieved or encoded before (proactive) or after it (retroactive)
misattribution    | remembering information without correctly recollecting where this information is from
suggestibility    | reconstructing a set of information with false elements, which come from the cues suggested at the time of retrieval
bias              | a person's currently retrieved or reconstructed memory is influenced by current emotions or knowledge
persistence       | inability to forget things which one wants to forget

Encoding newly encountered information or thoughts needs to process them in a short term memory (STM) system, which is called working memory (WM). The WM system is comprised of subsystems including separate short term storage channels for visual spatial and acoustic (sound) information, and an episodic buffer which links newly incoming information with what is already in long term storage. WM also has a central executive module which assigns cognitive resource (especially attention) to the channels [13, 14]. Thus the absence of attention can reduce the encoding efficiency or even cause encoding failure of some information input at that time (this is the so-called “absent-mindedness” in the seven sins of memory). And information which was paid more attention to is more likely to be better encoded and therefore more likely to be better remembered. It has been suggested that emotion can often influence attention at encoding, and therefore influence the memory of items. Regarding LTM, it has been argued that information in human memory exists in an associative network, the activation of one piece of information (by external output, e.g. presenting that information again) can trigger the memory of its linked nodes [15]. The stronger the link, the more likely the trigger is going to happen. This is why recall is easier when a cue is presented (cued recall) than when there is not (free recall). It has been suggested by many psychology studies that it is the lack of proper links to information, rather than the loss of the memory of information itself that cause “forgetting”. Since one node of memory may be linked to several other nodes, it is important that only the required


information be triggered. Thus, inhibition is an important function of human memory. However, it may also induce ‘blocking’. A classic example is the ‘tip of the tongue' (TOT) phenomenon, where one is unable to recall the name of some well remembered information, feeling that the memory is being temporarily blocked. False memory, meaning memory errors or inaccurate recollection, also arises due to the structure of the associative memory network. According to Loftus [16], every time a piece of memory is retrieved, it is actually reconstructed with associated small nodes of information. False memories can bring various problems in daily life. For example, “Misattribution” of witnesses can cause serious legal problems if a witness does not know whether the source is from reality or was in a dream or on TV or even imagined. As for the sin of persistence, this is actually a problem of mental well-being and cognitive problems with memory. The reason for persistence is that unwanted and sometimes even traumatic memories are so well encoded, rehearsed and consolidated, that they may not be buried or erased. According to theories of forgetting, these memories can be “blocked” if the external cues can form strong link with memories of other experiences, ideally happy experiences. Therefore, having people rehearsing more happy memories may find these helpful to replace their memories of traumatic experiences. The question of which pieces of happy memory to present is beyond the scope of our work, and is left to clinical psychologists. In summary, there are two main reasons for difficulty in retrieving a memory, namely: absence of the memory due to failure at encoding, or the lack of proper and strong cues to link to and access the correct pieces of memory. For memory problems arising from both causes, PLLs may have the potential to provide supplements. Data in PLLs can provide some details which one failed to encode due to “Absent-mindedness”, or which have faded in one’s memory over time. It can also provide cues for memories which have been “blocked”.

3.2 Empirical Studies
In this section, we further explore the needs for memory aids through some documented empirical studies, and use the results of this work to focus our investigation. In [17], Elsweiler et al. explored people's daily memory problems with a diary study in working settings with 25 participants from various backgrounds. They concluded that the participants' diary entries can be split into 3 categories of memory problem: Retrospective Memory problems (47% of their data entries), Prospective Memory (29%), and Action Slips (24%), the last of which are also a type of prospective memory failure caused by firmly routine actions rooted in procedural memory. Since prospective memory failures and action slips usually happen before the person is made aware of them by experiencing the consequent error caused by the problem, it is unlikely that people will actively seek help from memory aids in these cases, unless the memory aid is proactive and intelligent enough to understand what is going on. Lamming et al. [18] also conducted a diary study to explore possible memory problems during work, and found that the most frequently occurring memory problems include: forgetting a person's name, forgetting a paper document's location, and forgetting a word or phrase. Prospective memory problems were also found to be frequent and usually severe.

The diary study by Hayes et al. [19] took a more direct approach and explored the situations in which people wanted to use their memory aid tool, a mobile audio recording device called Audio Loop, to recollect the recorded past. The questions in their diary study not only covered memory failures, but also how much time the participants would be willing to spend on recalling such content. Their results showed that for neutral events, people would spend an average of 336 seconds (σ = 172) to find the required information from voice records. 62% of the reported reasons for returning to an audio recording were "cannot remember": 33% of that 62% were transience type retrieval failures, while 29% were due to failure of encoding (e.g. absent-mindedness). Another 26% of their reasons for searching recorded audio were to present the records themselves to other people. Finally, 12% of recordings were marked as important before recording. While the reasons for rehearsing these predicted important records were not described, these results indicate that important events are likely to be reviewed, and that people may want to "rehearse" recordings of important parts to consolidate their memory of information encountered during the period. Due to limitations of the information they record (selective audio recording), and the specific tool they use, the scenarios in which people may need memory aids might be limited. For example, when the experience is largely made of visual memory, audio records may not be helpful or desired.

3.3 Summary While all of the above studies successfully discovered some daily memory problems, the non-monitored self-reporting approach is limited in that the people can only report their needs for memory support when they are aware of a difficulty in retrieving a memory. While it is true that people may only seek help for specific parts of their memory when they realize that they have problem in recollecting these pieces of information from their memory, they are not always very clear as to what they actually want to retrieve until they bring back the piece of memory. For example, sometimes people just want to review (mentally) some past episodes for fun or because of nostalgia. They usually look at some photos or objects which are related to past events, and which bring them more vivid memories of past experiences. Due to the richness of data, lifelogs can provide more details about the past than any physical mementos can do.

In short, lifelogs are a good resource for supporting retrospective memory problems, including memories we have gradually forgotten, distorted, or missed while encoding, and for consolidating memory of useful information. Lifelogs can also be used to provide digital copies of episodes (e.g. when we need to give a video record of a meeting to someone who failed to attend), or to provide memory cues that trigger a person's organic memory of the original information, experiences, emotions, or even thoughts. Lifelogs might also be able to improve a subject's memory capability by training them to elaborate on or associate pieces of information. Indeed, supporting people's memory is not only a matter of finding the missing or mistaken parts of memory for them but also of improving their long term memory capabilities. It has been argued that better memory is often related to the ability to associate things and to decide which information to retrieve. For example, older people usually have less elaborated memories [20].


In the study reported in [21], psychologists found a tendency for people with highly elaborated daily schemas to recall activities from the previous week better than people with poorly elaborated schemas. Therefore, memory-supporting tools may be able to assist people in associating things in order to elaborate and consolidate their memories, which can facilitate retrieval by strengthening the links between memories and the cues that lifelog systems can provide, and potentially enhance their efficiency at performing various tasks.

4. GUIDELINES FOR DEVELOPING LIFE LOG APPLICATIONS Based on the previous sections, lifelogs should be able to provide the following:

• Memory cues, rather than external copies of episodic memory.

• Information or items themselves: semantic memory support, when one needs exact details about previously encountered information, or when one needs the original digital item, e.g. a document.

Whether what is needed is the information itself or a memory triggered by the retrieved target, it is important that these items or this information can be retrieved when needed, and that relevant retrieved results can be recognized by the user. Indeed, what to retrieve, and even what to capture and store in lifelogs, depends on what needs to be presented to the user to serve the desired memory aid functions.

4.1 Presenting There are basically two rules for presenting information:

1. Provide useful information as memory cues

When items are presented to the user, it is desirable that the information shown can be recognized by the user as what they want. If the retrieval targets are cues that are expected to be useful triggers for the user's own memory of something which cannot be copied digitally, e.g. an experience, it is also essential that the retrieved targets are good memory cues for the memory that the user wants to recall, e.g. the memory of that experience.

Lamming et al. [18] suggested that memory supporting tools should not only provide the episodes or information one forgets, but also episodic cues, including other episodes with the temporal relationships among them, together with information about the characteristics of these episodes. It is suggested in [8] that the features usually visible in episodic memory cues are: who (a face, any people in the background), where (a room, objects and landmarks in the environment), when (time stamps, light conditions, season, clothing and hair styles), and what (any visible actions, the weather, etc.).

2. Avoid information overload

It is also necessary to avoid information overload when presenting material as a memory aid. In [22], it was found that their memory aid application achieved its best results when unnecessary information was reduced and important parts of the information were played more slowly. We suggest that text or static images which can be used as a summary of events can also be good at reducing information overload compared to viewing videos (e.g. [10]). This requires the system either to detect important parts, or to digitize and textualize describable features of physical-world entities or events to facilitate retrieval. The term digitize in this paper means representing the existence of a physical-world entity as a digital item, e.g. an image or a row of data in a database. These items can then be searched directly using certain features (cues), rather than via the features of the episodes in which such information was encountered, e.g. the features of a person and a corresponding profile. Overall, appropriate cues really depend on what people tend to remember. Therefore it is important to explore the question of what people usually remember about the target.
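
To make the two rules above concrete, a minimal sketch of a compact cue record is shown below. It is illustrative only: the field names and the rendering format are assumptions, not part of any system described here; it simply keeps the who/where/when/what features suggested in [8] plus one keyframe and a short textual summary instead of the full captured media.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class EpisodeCue:
    """Compact cue for one episode: enough to trigger recall, not a full copy."""
    when: datetime                # time stamp of the episode
    where: str                    # place name or room, e.g. "office"
    who: List[str]                # people logged as present
    what: str                     # one-line textual summary of the visible action
    keyframe: str = ""            # path to a single representative image
    source_items: List[str] = field(default_factory=list)  # full media, shown only on demand

def render_cue(cue: EpisodeCue) -> str:
    """Render a short, low-overload summary line for a result list."""
    people = ", ".join(cue.who) or "nobody logged"
    return f"{cue.when:%a %d %b %Y %H:%M} | {cue.where} | with {people} | {cue.what}"
```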

4.2 Data Capture In principle, the more information that is captured and stored in lifelogs, the greater the chance that the required information can be found in the data collection. However, the more data that is collected, the more the noise level may also increase, imposing a greater burden on the life logger. In order for a PLL to support the above memory augmenting functions, the following data channels are needed:

1. Visual

For the majority of individuals, most information is taken in via the eyes, therefore it is important that encountered visual information be captured. While video can capture almost every moment when it is recording, watching video streams imposes a heavy information load, whereas browsing static images or photos is a much easier job. Some automatic capture cameras have been shown to provide rich memory cues [23]. The Microsoft SenseCam is one such wearable camera, which automatically captures images throughout the wearer's day. It takes VGA-quality images at up to 10 images per minute. Image capture can be triggered either by a sensed change in the environment or by a fixed timeout. Other examples include the EyeTap [24] and The Other Brother [25].

2. Speech

Another important source of information in daily life comes from audio. For example, much useful information comes from conversations. However, as mentioned previously, continuous audio recording has been argued to be intrusive and unacceptable to surrounding people. For this reason, it is difficult to carry out continuous audio recording. Some existing studies, such as [9] discussed earlier, record audio for limited significant periods; however, we chose not to do this since it requires active decisions about when to begin and end capture, and careful choices of when to do this to avoid privacy problems. We prefer continuous and passive capture modes which are non-intrusive. An alternative source of much of the information traditionally conveyed in spoken conversation is now appearing in digital text communications, as described in the next section.

3. Textual (especially from digital-born items):

Nowadays, we communicate more and more with digital messages (email, instant messages, and text messages). These content sources contain an increasing portion of the information used in daily life which used to come from spoken conversations.


These digital resources, usually in the form of text, contain less noise from the surrounding environment and irrelevant people, and are therefore less likely to intrude on a third person's privacy. Text extracted from communication records (e.g. emails, text messages) can even be used to help narrate events and to represent computer activities in order to trigger related episodic memory (e.g. [10]).

4. Context

As mentioned earlier, context information such as location and the people present can provide important memory cues for events [26]. It is therefore both important for presenting events and useful for retrieving items related to events.
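
Purely as an illustration of how these four channels might sit together in a single captured item, the following sketch uses hypothetical field names; it is not the schema of iCLIPS or of any other system described here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class LifelogItem:
    """One captured moment, combining the four data channels discussed above."""
    timestamp: datetime
    image_path: Optional[str] = None        # visual channel: e.g. a SenseCam-style photo
    speech_text: Optional[str] = None       # speech channel, if any audio is transcribed
    digital_text: Optional[str] = None      # textual channel: email/IM/SMS or document content
    location: Optional[str] = None          # context: place name or GPS-derived label
    people_nearby: List[str] = field(default_factory=list)  # context: e.g. Bluetooth device owners
    weather: Optional[str] = None           # context: weather at capture time

# Example: a moment captured while reading email in the office on a sunny day.
item = LifelogItem(
    timestamp=datetime(2009, 5, 12, 14, 30),
    image_path="sensecam/2009-05-12/14-30-05.jpg",
    digital_text="Re: AH 2010 camera-ready deadline ...",
    location="office",
    people_nearby=["Alice"],
    weather="sunny",
)
```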

4.3 Retrieval The final and possibly most challenging component of an augmented memory application built on a PLL is retrieval. It is essential that useful information be retrieved efficiently and accurately from a PLL archive in response to the user's current information needs. To be used most efficiently by the user, retrieval must have a high level of precision so as not to overload the user's working memory. It is recognized that a key feature of good problem solving is the ability of an individual to retrieve highly relevant information, so that they do not have to expend effort on selecting pertinent information from among related information which is of no direct use in the current situation; being able to filter out non-relevant information is thus an important feature of good problem solving.

Finding relevant information in such enormous data collections to serve a user's needs is very challenging. The characteristics of PLLs mean that they present a number of challenges for retrieval which are different to those in more familiar search scenarios such as search of the World Wide Web. Among these features are that: items will often not have formal textual descriptions; many items will be very similar, repeatedly covering common features of the user's life; related items will often not be joined by links; and the archive will contain much non-useful data that the user will never wish to retrieve. The complex and heterogeneous nature of these archives means that we can consider them to be a labyrinth of partially connected related information [27]. The challenge for PLL search is to guide the owner to elements which are pertinent to their current context, in the same way as their own biological memory does, though in a more complex and integrated fashion. Traditional retrieval methods require users to generate a search query to seek the desired information. Thus they rely on the user's memory to recall information related to the target in order to form a suitable search query. Often, however, the user may have a very poor recollection of the item from their past that they wish to locate. In this case, the system should provide search options for features that people tend to remember. For example, the location and the people attending an event may be well remembered, so the search engine should enable search using this information. In fact, the user may not even be aware of or remember that an item was captured and is available for retrieval, or even that a particular event occurred at all, so they will not even look for this item without assistance.

We can illustrate some of the challenges posed by PLL retrieval using an example. Consider a scenario where someone is looking for a particular photo from her PLL archive. All she remembers about the picture is that, the last time she viewed it, the sun was glaring in the window and she was talking on the phone to her friend Jack. Conventional search techniques would not be capable of retrieving the correct photo based on these context criteria, which are unrelated to its contents. Use of the remembered context would enable her to search for pictures viewed while speaking with Jack while the weather was sunny. The notion of using context to aid retrieval in this and other domains is not new. Context is a crucial component of memory for recollection of items we wish to retrieve from a PLL. In previous work we examined the use of forms of context data, or combinations of them, for retrieval from a PLL [28]. This work illustrated that in some situations a user can remember context features, such as time and location, much better than the exact content of a search item, and that incorporating this information in the search process can improve retrieval accuracy when looking for partially remembered items.
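
A context filter for a scenario like this one could be sketched as follows; the record keys ('people', 'weather', 'activity', 'viewed') and the toy archive are assumptions made only for this example, not a real system's schema.

```python
from typing import Dict, Iterable, List, Optional

def filter_by_context(items: Iterable[Dict], person: Optional[str] = None,
                      weather: Optional[str] = None,
                      activity: Optional[str] = None) -> List[Dict]:
    """Keep only items whose stored context matches every remembered feature."""
    hits = []
    for item in items:
        if person is not None and person not in item.get("people", []):
            continue
        if weather is not None and item.get("weather") != weather:
            continue
        if activity is not None and item.get("activity") != activity:
            continue
        hits.append(item)
    return hits

# A toy two-item archive: the first photo was viewed during a sunny phone call with Jack.
archive = [
    {"viewed": "IMG_0042.jpg", "people": ["Jack"], "weather": "sunny", "activity": "phone call"},
    {"viewed": "IMG_0107.jpg", "people": ["Mary"], "weather": "rain", "activity": "meeting"},
]
print(filter_by_context(archive, person="Jack", weather="sunny", activity="phone call"))
```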

Ideally, as argued by Rhodes [29], a memory augmentation system should provide information proactively according to the user's needs in their current situation. Many studies in ubiquitous computing have been devoted to detecting such events, for example retrieving a recording related to an object when someone touches that object, with the sensor information passed to the retrieval system as a query. Another system, Ubiquitous Memories [7], automatically retrieves video recordings related to a target object, the recordings having been automatically tagged to the object when it was touched. Face detection techniques are used in [8] to tag a person related to a memory, and to enable automatic retrieval of personal information triggered by detection of the face.

Satisfying the need for high precision retrieval from PLLs discussed earlier requires search queries to be as rich as possible, including as much information as possible about the user's information need, and then exploiting this information to achieve the highest possible effectiveness in the search process. Our underlying search system is based on the BM25F extension to the standard Okapi probabilistic information retrieval model [30]. BM25F is designed to combine multiple fields from documents (content and context) in a theoretically well-motivated way for improved retrieval accuracy; BM25F was originally developed for search of web-type documents which, as outlined above, have very different characteristics to a lifelog. Thus we are also interested in work such as [31] which explores ways of combining multiple fields for retrieval in the domain of desktop search. Our current research is extending our earlier work, e.g. [28], to investigate retrieval behaviour using our experimental PLL collections and to explore new retrieval models specifically developed for this data. In addition, PLL search can also include features such as biometric measures to help locate highly relevant information [4].
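
For reference, the core of the simple BM25F formulation in [30] merges per-field term frequencies into one weighted, length-normalised pseudo-frequency before applying BM25-style saturation. The sketch below shows that scoring step only; the field names, weights and parameter values are illustrative assumptions rather than the configuration used in our system.

```python
from typing import Dict, List

def bm25f_score(query_terms: List[str],
                doc_fields: Dict[str, List[str]],   # e.g. {"content": [...], "location": [...], "people": [...]}
                field_weights: Dict[str, float],    # per-field boost
                field_b: Dict[str, float],          # per-field length normalisation (0..1)
                avg_field_len: Dict[str, float],    # average field length over the collection
                idf: Dict[str, float],              # inverse document frequency per term
                k1: float = 1.2) -> float:
    """Score one fielded document against a query with a simple BM25F formulation."""
    score = 0.0
    for term in query_terms:
        # 1. Combine field term frequencies into one weighted pseudo-frequency.
        pseudo_tf = 0.0
        for f, tokens in doc_fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            length_norm = 1.0 - field_b[f] + field_b[f] * (len(tokens) / avg_field_len[f])
            pseudo_tf += field_weights[f] * tf / length_norm
        # 2. Apply BM25-style saturation and weight by the term's IDF.
        if pseudo_tf > 0.0:
            score += idf.get(term, 0.0) * pseudo_tf / (k1 + pseudo_tf)
    return score
```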

5. iCLIPS - A PROTOTYPE PLL SEARCH SYSTEM The iCLIPS project at DCU is developing technologies for effective search of PLLs. To support this research, three researchers are carrying out long term lifelog data collection. As outlined in Section 2, these collections already include 20 months of data, including visual capture of physical world events with Microsoft SenseCams [32], full indexing of information accessed on computers and mobile phones, and context data including location via GPS and people via Bluetooth.


The Microsoft SenseCam also captures sensor information such as light status and movement (accelerometer). Our system indexes every computer activity and SenseCam image with time stamps and context data including location, people, and weather. It enables search of these files by textual content and by the above context. Part of our work continues to focus on the development of novel effective search algorithms to best exploit content and context for PLL search. The other focus of the project is the development of a prototype system to explore user interaction with a PLL to satisfy the user's desire for information derived from their previous life experiences.

One of the reasons for the success of popular established search engines such as Google is that their interface is simple to use. Once a few concepts have been understood, users are able to use these search engines to support their information search activities. However, simple interfaces to existing collections work well to a large extent due to the features of the data being searched and the background of the users. In the case of web search engines the domain knowledge, search experience and technical background of searchers is very varied. However, the size of the collection being searched, with its inherent redundancy of data and with information often repeated in different forms in multiple documents, means that pieces of information are accessible from different sources using a wide range of queries from users with differing linguistic sophistication and knowledge of the domain. Additionally, in the case of the web, link structures generated by the community of web authors can be exploited to direct searchers to authoritative or popular pages. In the case of specialised collections such as medical or legal collections, users are typically domain experts who will use a vocabulary well matched to that of documents in the collection. As outlined in Section 4.3, the characteristics of PLL collections are quite different to conventional search collections. An interface to search a PLL collection requires that the user can enter queries using a range of content and context features. The memory association between partially remembered life events means that more sophisticated interfaces supporting browsing of the PLL using different facets are likely to be needed to support the satisfaction of user information needs. Essentially, users need an interface that enables them to explore the labyrinth of their memory using different recalled facets of their experiences.

Figure 1. Sample iCLIPS interface

Figure 1 shows our prototype interface for using a PLL as a daily memory aid for the general population. In particular, it aims to serve the functions of: providing specific information or digital items to supplement the parts of memory which are not available to be retrieved; and providing cues to specific episodes to assist the user to rehearse the experiences of that period. It also seeks to assist users in improving memory capability through repeatedly associating events or information. This interface requires user effort to look for or choose the information to be presented, thus both searching and browsing panels are included.

Search

The interface provides a range of search options to cater for the different types of information people may be able to recall about the episodes or items, such as location, people present, weather conditions and date/time. We understand the burden of trying to recall and enter all of these details for a single search, so we adopt the virtues of navigation and put more weight on the presentation and browsing of results.


This is particularly important in cases where overly general search queries may bring back too many results for easy presentation. For example, sometimes people just want to have a look at what happened during certain periods, e.g. when they were in middle school, and enter a time-based query such as "year 1998"; this may result in a huge amount of result data being retrieved which must then be explored by the user.

Navigation

To avoid information overload when a large number of items are returned as results, and to provide instant memory cues at each small step, we adopt the advantages of location-based hierarchical folder structures to let users navigate and browse search results which are grouped either temporally or by attributes such as location or the core people who attended. Based on the psychology literature, we believe that when, where and who are well-remembered features of episodes; therefore, grouping items based on these features makes it easier for users to remember and know where their target is. It also enables them to jump to other results which have similar attributes (e.g. in the same location, or with the same group of people). By doing so we also expect the system to help people remember more context data for each event or item, generating more useful associations in their memory and elaborating them.
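
One way such hierarchical grouping could be realised is sketched below; the grouping keys and the flat result representation are assumptions used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable

def group_results(results: Iterable[Dict], keys=("location", "people", "date")):
    """Group a flat list of retrieved items into nested 'folders' by the given keys."""
    results = list(results)
    if not keys:
        return results                      # leaf level: the items themselves
    key, rest = keys[0], keys[1:]
    groups = defaultdict(list)
    for item in results:
        value = item.get(key)
        # People may be a list; build a stable, readable folder label.
        label = ", ".join(sorted(value)) if isinstance(value, list) else str(value)
        groups[label].append(item)
    return {label: group_results(items, rest) for label, items in groups.items()}
```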

Presenting results

While presenting the results, we provide context cues to help people recognize their target and related folders more easily. Since temporally adjacent activities are argued to be good episodic memory cues, the system enables a preview of folders by presenting landmark events or computer activities (if there are any) on a timeline. A "term cloud" (a group of selected keywords, similar to a conventional "tag cloud") of the computer activities is also presented as text below the timeline; by clicking a word, its frequency of appearance is displayed. Again, this is designed to provide more memory cues for recalling what the user was doing with documents which contain such keywords. For example, one may remember that the needed target was previously encountered during a period when he/she read a lot about "SenseCam". The names of locations and people are also included in the term clouds for the same reason.
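
A term cloud of this kind can be derived from the text of the computer activities in the selected period with a simple frequency count, for example as follows; the tokenisation and stop-word list below are simplifying assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def term_cloud(texts: Iterable[str], top_n: int = 30) -> List[Tuple[str, int]]:
    """Return the top_n (term, frequency) pairs for display; clicking a term in the
    interface would simply show its stored frequency."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS and len(token) > 2:
                counts[token] += 1
    return counts.most_common(top_n)
```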

Due to the complex functions provided in the interface, it is not suitable for portable or wearable devices. Thus it is not aimed at solving memory problems which need an urgent solution while the person is away from a computer. Alternative interfaces, potentially automatically taking account of the current user context (location, associates nearby, and time), would be needed for mobile interaction; this is planned as part of our further study.

We are currently undertaking user studies to evaluate the prototype system. These evaluations include the reliability with which episodes in the results can be recognized from the features presented to the searcher, whether users feel that it is easy to recall at least one piece of information required by the search fields, and the effectiveness of the retrieval algorithms. Once these functions are fully working, we can explore how life loggers prefer to use these data in supporting their memory, and what functions they may want to use in different situations, with our system and our data collection.

6. CONCLUSIONS AND FURTHER WORK In conclusion, developments in digital collection and storage technologies are enabling the collection of very large long term personal information archives in the form of PLLs storing details of an individual’s life experiences. Combining these with effective tools for retrieval and presentation provides the potential for memory aid tools as part of the augmented human.

Effective solutions will enable user’s to confirm partially remembered facts from their past, and be reminded of things they have forgotten about. Applications include recreational and social situations (e.g. sharing details of a life event), being reminded of information in a work situation (e.g. previous meetings with an individual, being provided with materials encountered in the past), and potentially for more effective problem solving. Integrating these technologies to really support and augment humans requires that we understand how memory is used (and how it fails), and to identify opportunities for supporting individuals in their life activities via memory aids. The iCLIPS project is seeking to address these issues by developing technologies and protocols for collection and management, and for effective search and interaction with PLLs.

Our current work is concentrated on completing our prototype system to explore memory augmentation using long-term PLL archives. Going forward we are seeking methods for closer integration between PLLs, the search process and human use of memory, possibly involving mobile applications and presentation using emerging display technologies such as head-up displays and augmented reality.

7. ACKNOWLEDGMENTS This work is supported by a grant from the Science Foundation Ireland Research Frontiers Programme 2006, Grant No. 06/RFP/CMS023.

8. REFERENCES [1] Byrne, D. and Jones, G., "Creating Stories for Reflection from Multimodal Lifelog Content: An Initial Investigation," in Designing for Reflection on Experience, Workshop at CHI 2009, Boston, MA, U.S.A., 2009.

[2] Berry, E., et al., "The use of a wearable camera, SenseCam, as a pictorial diary to improve autobiographical memory in a patient with limbic encephalitis: A preliminary report," Neuropsychological Rehabilitation: An International Journal, vol. 17, pp. 582 - 601, 2007.

[3] Bell, G. and Gemmell, J., Total Recall. Dutton, 2009.

[4] Kelly, L. and Jones, G., "Examining the Utility of Affective Response in Search of Personal Lifelogs," in 5th Workshop on Emotion in HCI, British HCI Conference, Cambridge, U.K., 2009.

[5] Devaul, R. W., "The memory glasses: wearable computing for just-in-time memory support," Massachusetts Institute of Technology, 2004.

[6] Lee, H., et al., "Constructing a SenseCam visual diary as a media process," Multimedia Systems, vol. 14, pp. 341-349, 2008.


[7] Kawamura, T., et al., "Ubiquitous Memories: a memory externalization system using physical objects," Personal Ubiquitous Comput., vol. 11, pp. 287-298, 2007.

[8] Farringdon, J. and Oni, V., "Visual Augmented Memory (VAM)," presented at the Proceedings of the 4th IEEE International Symposium on Wearable Computers, 2000.

[9] Vemuri, S., et al., "iRemember: a personal, long-term memory prosthesis," presented at the Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences, Santa Barbara, California, USA, 2006.

[10] Lamming, M. and Flynn, M., "Forget-me-not: intimate computing in support of human memory," in Proceedings FRIEND21 Symposium on Next Generation Human Interfaces, Tokyo Japan, 1994.

[11] Byrne, D., et al., "Multiple Multimodal Mobile Devices: Lessons Learned from Engineering Lifelog Solutions," in Handbook of Research on Mobile Software Engineering: Design, Implementation and Emergent Applications, ed: IGI Publishing, 2010.

[12] Schacter, D. L., The seven sins of memory. Boston: Houghton Mifflin, 2001.

[13] Baddeley, A., "The episodic buffer: a new component of working memory?," Trends in Cognitive Sciences, vol. 4, pp. 417-423, 2000.

[14] Baddeley, A. D., et al., "Working Memory," in Psychology of Learning and Motivation, vol. 8, ed: Academic Press, 1974, pp. 47-89.

[15] Anderson, J. and Bower, G., Human associative memory: A brief edition: Lawrence Erlbaum, 1980.

[16] Loftus, E., "Memory Distortion and False Memory Creation," vol. 24, ed, 1996, pp. 281-295.

[17] Elsweiler, D., et al., "Towards memory supporting personal information management tools," J. Am. Soc. Inf. Sci. Technol., vol. 58, pp. 924-946, 2007.

[18] Lamming, M., et al., "The Design of a Human Memory Prosthesis," The Computer Journal, vol. 37, pp. 153-163, 1994.

[19] Hayes, G. R., et al., "The Personal Audio Loop: Designing a Ubiquitous Audio-Based Memory Aid," ed, 2004, pp. 168-179.

[20] Rankin, J. L. and Collins, M., "Adult Age Differences in Memory Elaboration," J Gerontol, vol. 40, pp. 451-458, 1985.

[21] Eldridge, M. A., et al., "Autobiographical memory and daily schemas at work," Memory, vol. 2, pp. 51-74, 1994.

[22] Hirose, Y., "iFlashBack: A Wearable Electronic Mnemonics to Retain Episodic Memory Visually Real by Video Aided Rehearsal," presented at the Proceedings of the 2005 IEEE Conference on Virtual Reality, 2005.

[23] Sellen, A. J., et al., "Do life-logging technologies support memory for the past?: an experimental study using sensecam," presented at the Proceedings of the SIGCHI conference on Human factors in computing systems, San Jose, California, USA, 2007.

[24] Mann, S., "Continuous lifelong capture of personal experience with EyeTap," presented at the Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences, New York, New York, USA, 2004.

[25] Helmes, J., et al., "The other brother: re-experiencing spontaneous moments from domestic life," presented at the Proceedings of the 3rd International Conference on Tangible and Embedded Interaction, Cambridge, United Kingdom, 2009.

[26] Tulving, E., Elements of episodic memory: Oxford University Press New York, 1983.

[27] Kelly, L. and Jones, G. J. F., "Venturing into the labyrinth: the information retrieval challenge of human digital memories," presented at the Workshop on Supporting Human Memory with Interactive Systems, Lancaster, UK, 2007.

[28] Kelly, L., et al., "A study of remembered context for information access from personal digital archives," presented at the Proceedings of the second international symposium on Information interaction in context, London, United Kingdom, 2008.

[29] Rhodes, B. J., "The wearable remembrance agent: a system for augmented memory," presented at the Proceedings of the 1st IEEE International Symposium on Wearable Computers, 1997.

[30] Robertson, S., et al., "Simple BM25 extension to multiple weighted fields," presented at the Proceedings of the thirteenth ACM international conference on Information and knowledge management, Washington, D.C., USA, 2004.

[31] Kim, J., et al., "A Probabilistic Retrieval Model for Semistructured Data," presented at the Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, Toulouse, France, 2009.

[32] Gemmell, J., et al., "Passive capture and ensuing issues for a personal lifetime store," presented at the Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences, New York, New York, USA, 2004.


Aided Eyes: Eye Activity Sensing for Daily Life

Yoshio Ishiguro†, Adiyan Mujibiya†, Takashi Miyaki‡, and Jun Rekimoto‡,§

†Graduate School of Interdisciplinary Information Studies, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan

‡Interfaculty Initiative in Information Studies, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, Japan

§Sony Computer Science Laboratories, 3-14-13 Higashigotanda, Shinagawa, Tokyo, Japan
ishiy, adiyan, miyaki, [email protected]

ABSTRACT Our eyes collect a considerable amount of information when we use them to look at objects. In particular, eye movement allows us to gaze at an object and shows our level of interest in the object. In this research, we propose a method that involves real-time measurement of eye movement for human memory enhancement; the method employs gaze-indexed images captured using a video camera that is attached to the user's glasses. We present a prototype system with an infrared-based corneal limbus tracking method. Although existing eye tracker systems track eye movement with high accuracy, they are not suitable for daily use because the mobility of these systems is incompatible with a high sampling rate. Our prototype has small phototransistors, infrared LEDs, and a video camera, which make it possible to attach the entire system to the glasses. Additionally, the accuracy of this method is compensated by combining image processing methods and contextual information, such as eye direction, for information extraction. We develop an information extraction system with real-time object recognition in the user's visual attention area by using the prototype of an eye tracker and a head-mounted camera. We apply this system to (1) fast object recognition by using a SURF descriptor that is limited to the gaze area and (2) descriptor matching against a past-images database. Face recognition using Haar-like object features and text logging using OCR technology are also implemented. The combination of a low-resolution camera and a high-resolution, wide-angle camera is studied for high daily usability. The possibility of gaze-guided computer vision is discussed in this paper, as is the topic of communication via the phototransistor in the eye tracker and the development of a sensor system that has high transparency.

Categories and Subject Descriptors H.5.2 [Information Interfaces and Presentation]: User Interfaces—Theory and methods


General Terms Information extraction for lifelogs

Keywords Eye tracking, Lifelog computing, Gaze information

1. INTRODUCTION Lifelog systems have been a topic of considerable research [3]. The development of a lifelog computing system has led to the hope that the human memory can be augmented. For extracting beneficial information from augmented human memory, we consider the "five W's and one H" (Who, What, When, Where, Why, and How). These provide very important contextual information. Location estimation methods can answer "Where" [12], and a wearable camera can provide the other information. However, we cannot accurately detect a person's actions by using only image information. According to visual lifelog research, it is definitely necessary to extract important parts of life events, such as the people, objects, and texts that we pay attention to, from enormous amounts of data. Therefore, we consider using eye activity for obtaining contextual information.

Eye tracking has been extensively studied in medical, psychological, and user interface (UI) research [5, 6] for more than a century. The study of eye tracking has provided us with a considerable amount of information such as the gazed object, stress, concentration ratio, and degree of interest in the objects [4]. Interaction research using eye tracking has also been studied; in particular, wearable computing research has actively used eye movements (gaze information) because wearable devices allow intuitive and hands-free control [17].

Even though the current eye tracking methods were developed several decades ago, they still involve the use of headgear with horn-like mounts holding an embedded camera, or require electrodes to be pasted on the user's face, and/or other large-scale systems to be used for a psychological experiment. In other words, such systems currently cannot be used for daily activities. In this context, a "daily usable system" means a commonly acceptable system that can be used in public in daily life. Moreover, making the system accurate as well as portable is a complicated task. A daily usable system for eye activity sensing could be utilized in many research areas such as wearable computing.


Figure 1: Concept of the eye enhanced lifelog computing

In this research, for human memory enhancement, we examine a method to extract significant information from large-scale lifelog data by using eye activity information. We develop a new method that involves real-time measurement of eye movements for automated information extraction. The method makes use of gaze-indexed images captured by a video camera and an eye tracker with low accuracy but high wearability; both the camera and the eye tracker are attached to the user's glasses.

2. EYE-ENHANCED LIFELOG COMPUTING A lifelog with a surrounding video image can give us a considerable amount of information. On the other hand, humans obtain surrounding information from their eyes and gaze at interesting objects. However, it is impossible to record this type of information by using only a camera. Consequently, first, the gazed information is detected from a video. Then, the gazed objects and the user's state are extracted from the video and the eye movement. After this, related information is retrieved using the extracted information. Finally, the results are added to the lifelog, as shown in Figure 1. For these reasons, we need to record three types of eye activity — gaze direction, eye movement, and eye blink frequency — for use in lifelog and UI methodology. Details of each type of eye activity are explained in this section.

2.1 Gaze Direction It is difficult to extract significant information from the video images of large-scale personal lifelog data. For example, omnidirectional-camera video images contain a considerable amount of information that is not related to human memories; the camera image may not relate to the gazed object. Therefore, it is difficult to know which objects are being focused on from the images alone. In this research, obtaining a video lifelog with gaze information is our objective. Gazed objects such as faces and texts are extracted from a video lifelog, and this information is used for understanding whom you met, what you saw, and what you were interested in.

Gaze direction is used for pointing in the UI research area; however, it is well known that gaze pointing suffers from the "Midas touch" problem and that it is difficult to use gaze direction without a trigger such as a key input [8].

2.2 Eye Movement Not only the gaze direction but also the eye movement has meaning. In particular, microsaccades indicate the target of one's potential interest [4]. A microsaccade is a very quick eye movement, almost 600°/s, and it is a spontaneous movement that occurs when the eye gazes at stable targets. The frequency and direction of this movement change depending on the person's interest in the target. The measurement of this movement makes it possible to know human susceptibilities. The holding time on a gazed object is a conscious behaviour, while the saccadic movement is unconscious. Therefore, it is possible to extract more information about a susceptible mind by the measurement of saccadic movements.

2.3 Eye Blink Frequency Previous research shows that eye movements can provide information about a person's condition. For example, the eye blink frequency shows the person's degree of concentration on his/her work [15].

The eye blink frequency decreases when a person concentrates on his/her work. In contrast, the frequency increases when he/she is not very focused on his/her work. Therefore, measurement and detection of the eye blink frequency can estimate the person's level of concentration. The eye blink has several characteristics. It is a fast motion lasting approximately 150 ms. An involuntary eye blink is an automatic eye blink that has a shorter motion time than the voluntary eye blink, which is a conscious eye blink.
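
Under the assumption stated above that a lower blink rate accompanies higher concentration, a rough sliding-window estimate could be sketched as follows; the window length and threshold are illustrative values, not figures reported in this paper.

```python
from typing import List

def blink_rate(blink_times: List[float], now: float, window_s: float = 60.0) -> float:
    """Blinks per minute over the last window_s seconds (blink_times in seconds)."""
    recent = [t for t in blink_times if now - window_s <= t <= now]
    return len(recent) * 60.0 / window_s

def is_concentrating(blink_times: List[float], now: float,
                     threshold_bpm: float = 15.0) -> bool:
    """Heuristic: treat a blink rate below the threshold as a concentrated state."""
    return blink_rate(blink_times, now) < threshold_bpm
```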

3. DESIGN OF EYEGLASS-EMBEDDED EYE ACTIVITY SENSOR

3.1 Requested Specification for Eye Sensing The capability requirements were discussed in Section 2. Eye movements are typically classified as ductions, versions, or vergences.


Figure 2: Prototype eye gaze recognizer with camera for lifelog

Eye movement has several speeds, and there are several types of high-speed eye movements. For example, measuring microsaccades requires a sampling rate of more than 1000 Hz, and the eye blink duration is around 150 ms. The method must distinguish precisely between eye movements and blinks for an accurate detection of eye movements. Further, the human view angle is almost 160° for each eye. Therefore, a 5° resolution is sufficient for information extraction, because this system aims not only at high accuracy but also at extracting information with a daily usable system that combines eye activity information and image processing methods.

3.2 Eye-tracking Technology Candidates There are several types of eye trackers. In this study, we consider four different trackers:

Camera-based system: Video-based systems [9, 11] can capture a gaze texture image. This is the most commonly used tracker; however, it requires an extremely sophisticated optics system having a light source, lenses, and half mirrors. Additionally, it requires a large (table-top size) measurement system for quick eye movements (over 1000 Hz). Scale-wise, it is possible to develop a smaller system; however, currently, such a system cannot measure high-speed eye movements.

Search coil and optical lever: These methods [13, 18] are used for laboratory experiments in a certain region of space. However, these methods are not user friendly, as the users are expected to wear special contact lenses that are held on their eyes by negative pressure.

Electrooculogram (EOG): Eyes have a steady electric potential field, and this electric signal can be derived by using two pairs of contact electrodes that are placed on the skin around one eye. This is a very lightweight approach [2] and can work even if the eyes are closed. However, it requires a separate eye blink detection method and has other issues; for example, electrodes are required, and the signal is affected by electrical noise.

Infrared corneal limbus tracker: An infrared corneal limbus tracker [14] is also a very lightweight tracker. It can be built using a light source (infrared LED) and light sensors (phototransistors) and requires only very low computational power. This approach is also affected by noise from environmental light. However, it is a very simple approach; no electrodes are required, and it can sufficiently detect eye blinks. Therefore, it is well suited to construction for daily use.

Therefore, we use an "infrared corneal limbus tracker" in our study. This method has a lower accuracy than the search coil and optical lever methods. However, our purpose is to extract significant information; hence, the accuracy of this method can be enhanced by combining image processing methods and contextual information such as eye direction.

3.3 Prototype of Eye Activity Sensor Four phototransistors and two infrared LEDs are mounted on the eyeglasses, as shown in Figure 2. A small camera is mounted on the glasses for recording surrounding information, not for eye tracking. An infrared LED and four phototransistors are mounted on the inside of the glasses.

The infrared light is reflected by the eye surface and received by the phototransistors. The sensor values pass through an instrumentation amplifier and analog/digital (AD) conversion, and are then input to the microprocessing unit (MPU). In this study, an ATmega128 from Atmel is used for the MPU and AD conversion. The MPU clock frequency is 16 MHz, and the AD conversion time is 16 μs per channel.

Before measurement, the head position and the display are fixed for calibration, and the display then shows the targets to be gazed at during calibration (Figure 3). The sensor wearer gazes at each target on the display, and the MPU records the sensor values. The calibration target set has 240 points (20 points wide x 12 points high), and each point is gazed at for 1 second. After calibration, the system estimates the gaze direction using the recorded data: the recorded calibration data and the current sensor values are compared first, and the center of gravity is then calculated from the result in order to estimate the gaze direction. This simple method is sufficient for this research because only the gaze area in the picture needs to be known by the information extraction system.
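
The estimation step described here can be sketched as a nearest-neighbour comparison against the calibration grid followed by a centre-of-gravity calculation. The distance measure and the number of neighbours below are assumptions of the sketch, not details reported for the prototype.

```python
from typing import Dict, Sequence, Tuple

# calibration: grid point (x, y) on the display (20 x 12 points) -> the four
# phototransistor readings recorded while the wearer gazed at that point.
Calibration = Dict[Tuple[int, int], Sequence[float]]

def estimate_gaze(sensor: Sequence[float], calibration: Calibration,
                  k: int = 4) -> Tuple[float, float]:
    """Estimate gaze as the centre of gravity of the k calibration points whose
    recorded sensor values are closest to the current reading."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))   # squared Euclidean distance

    nearest = sorted(calibration.items(), key=lambda kv: dist(sensor, kv[1]))[:k]
    xs = [point[0] for point, _ in nearest]
    ys = [point[1] for point, _ in nearest]
    return sum(xs) / len(xs), sum(ys) / len(ys)
```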

3.4 Life Events Extracting System When the infrared limbus tracking method is used, the sensor value changes rapidly during an eye blink, which lasts approximately 150 ms, as shown in Figure 4. Therefore, the system can easily distinguish between blinks and other eye movements. Further, the system extracts information such as faces, texts, and pre-registered objects. Pre-registered objects are recognized in real time within the user's visual attention area.


Figure 3: A calibration method for the gaze recognizer system. The head position and the display are fixed for calibration, and then the display shows targets. The user gazes at a target object on the display, and the MPU records the sensor values.

Figure 4: An example of a fluctuation in the sensor data by an eye blink

Figure 5: Image feature extraction by SURF [1], for real-time object recognition

Figure 6: An example graph of eye blink frequency

We use fast object recognition by applying the SURF [1] descriptor, limited to the gazed area, to match images against the past-images database (Figure 5).
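
The gaze-limited matching step can be approximated in OpenCV as below. Note that this sketch substitutes ORB features for SURF (SURF is patented and not always available in OpenCV builds), and the crop size and match thresholds are arbitrary assumptions.

```python
import cv2

def recognize_gazed_object(frame, gaze_xy, db_descriptors,
                           half_size=100, min_matches=20):
    """Match features from the region around the gaze point against a database of
    pre-registered objects; db_descriptors maps object_id -> ORB descriptor array."""
    x, y = gaze_xy                                   # gaze position in pixel coordinates
    h, w = frame.shape[:2]
    crop = frame[max(0, y - half_size):min(h, y + half_size),
                 max(0, x - half_size):min(w, x + half_size)]
    if crop.ndim == 3:                               # feature detection wants grayscale
        crop = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)

    orb = cv2.ORB_create()
    _, descriptors = orb.detectAndCompute(crop, None)
    if descriptors is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_id, best_count = None, 0
    for object_id, db_desc in db_descriptors.items():
        matches = matcher.match(descriptors, db_desc)
        good = [m for m in matches if m.distance < 50]   # arbitrary distance threshold
        if len(good) > best_count:
            best_id, best_count = object_id, len(good)
    return best_id if best_count >= min_matches else None
```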

Face recognition using Haar-like features with the OpenCV Library1 is implemented for logging "when I meet someone." This method first extracts the human face, and the system then records the time, location, and face image.

Additionally, text logging with the OCR technology tesseract-ocr2 is implemented. The system clips the gazed area from the head-mounted camera image and attempts to extract text from these clipped images. Finally, the extracted text is recorded along with time and location data for life logging.
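
A minimal version of this gaze-clipped OCR step, using the pytesseract wrapper around tesseract-ocr, might look like the following; the crop size is an arbitrary assumption.

```python
import cv2
import pytesseract

def ocr_gazed_text(frame, gaze_xy, half_size=120):
    """Clip the region around the gaze point from the head-mounted camera image
    and run tesseract-ocr on it; returns the extracted text (possibly empty)."""
    x, y = gaze_xy
    h, w = frame.shape[:2]
    crop = frame[max(0, y - half_size):min(h, y + half_size),
                 max(0, x - half_size):min(w, x + half_size)]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)    # OCR generally works better on grayscale
    return pytesseract.image_to_string(gray).strip()
```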

4. CASE STUDY USING PROTOTYPE SYSTEM

4.1 Preliminary Experiment An infrared limbus tracker is a commonly used tracker; therefore, the details of the hardware evaluation experiment are omitted. We checked the specifications of the proposed prototype system. More than 99% of eye blinks were detected over 3 minutes; very slow eye blinks caused the 1% of detection failures. The gaze direction resolution was 5°, and the processing rate was set to 160 Hz in the preliminary experiments.

4.2 Concentration State Measurements Our system can detect eye blinks with high accuracy. We recorded the detected eye blinks and the user's tasks for approximately 1 hour, as shown in Figure 6. The results showed that the eye blink frequency changed with a change in tasks; the frequency was lower when the user concentrated on the objects. Therefore, the system could infer the user's concentration state, which we consider useful for human interface techniques such as display and annotation.

4.3 Life Event Extraction The proposed method extracts pre-registered objects, human faces, and characters by using images and eye gaze information. Figures 7 and 8 show the extraction of objects such as posters. In this situation, the user observes each of the 100 pre-registered posters in the room.

1 http://opencv.willowgarage.com/wiki/
2 http://code.google.com/p/tesseract-ocr/


Figure 7: Photographs of experimental environment

Figure 8: Object recognition scene with the proposed system. This figure shows that the object recognition system can identify two different objects next to each other.

The IDs of these extracted objects are logged together with the time, the actual images, and the eye direction when the system detects the pre-registered objects, as shown in Figure 9. Figure 10 shows the optical character reading of the gazed information: an image of the gazed area is clipped, and characters are extracted from the clipped image. Additionally, the face image is extracted along with the actual time, as shown in Figure 11. Usually, when multiple people stand in front of the camera, such as in a city or a meeting room, the normal recorded video image does not tell you who you are looking at. However, this method can pick out who you are looking at by using gaze information. Our system can handle multiple objects that show up in the head-mounted camera view. Finally, these three pieces of data are logged automatically.

5. HIGHER REALIZATION OF DAILY USABILITY AND FUTURE POSSIBILITIES

From these case studies, it is concluded that information extraction by means of image processing requires the use of a wide-angle, high-resolution camera to provide more accurate information. However, it is difficult to mount such a device on a person's head. Moreover, the prototype of the infrared limbus tracker is very small, but the phototransistors obstruct the user's view. In this section, a combination of a wide-angle, high-resolution camera and a head-mounted camera is discussed, along with a limbus tracker structure that keeps the phototransistors out of the user's view.

5.1 Combination with a High-Resolution Wide-Angle Camera

Figure 9: Gaze direction and extraction results (ID 0 means no object was extracted)

Figure 10: An example image of OCR extraction from an image clipped by gaze information, using tesseract-ocr

Figure 11: An example image of face extraction. Faces are extracted from the clipped image of the head-mounted camera by gaze information.

Figure 12: An example image of the head-mounted camera's viewpoint within the wide-angle camera. The gazed position in the head-mounted camera is known; thus it is possible to project the gaze position onto the high-resolution camera image by using the positional relation of the two images.

Having a large-size camera, such as a commercially available USB camera, mounted on the head interferes with daily communication. Therefore, we embed a very small, low-resolution camera for capturing the surrounding information in the eye tracker. Hence, this camera can be integrated into the user's eyeglasses and can capture the user's actual view. On the other hand, the small camera has very poor performance, and it is difficult to obtain a high frame rate and a high resolution with such a camera. Therefore, the image processing of the information extraction methods is at times not possible. Consequently, we consider a strap-on camera (such as the SenseCam [7], which can be dangled around one's neck) that has fewer problems than a head-mounted camera. Strap-on cameras do not disturb communication and can be attached to the body more easily than a head-mounted camera. Therefore, we can use a high-resolution camera with a wide-angle lens. This prototype system compares SURF descriptors between the head-mounted camera and the strap-on camera and then calculates the homography matrix. From the results, we can identify the focus of the head-mounted camera within the strap-on camera's images. As a result, a high-resolution image can be used for the information extraction, as shown in Figure 12.
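
The projection of the gaze point from the head-mounted camera into the strap-on camera image can be sketched with OpenCV as follows. Again ORB features are used here as a freely available stand-in for SURF, and all thresholds are assumptions of the sketch.

```python
import cv2
import numpy as np

def project_gaze_to_wide_camera(head_img, wide_img, gaze_xy):
    """Estimate the homography from the head-mounted camera image to the
    high-resolution strap-on camera image and map the gaze point through it."""
    def to_gray(img):
        return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(to_gray(head_img), None)
    kp2, des2 = orb.detectAndCompute(to_gray(wide_img), None)
    if des1 is None or des2 is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    if len(matches) < 4:                 # a homography needs at least four correspondences
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    point = np.float32([[gaze_xy]])                  # shape (1, 1, 2)
    projected = cv2.perspectiveTransform(point, H)
    return tuple(projected[0, 0])                    # gaze position in the wide-angle image
```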

5.2 Improving Transparency of the Eye Tracker Developing a new system for daily use that is so comfortable that the user is not even aware of wearing it is our long-term objective. The infrared limbus tracker has a very simple mechanism; therefore, it offers more scope for modification than a camera-based system. This tracker requires neither a lens nor a fixed focal distance. A camera-based system can use a half mirror to view the eye image; however, the system has to be in front of the eyes, as shown in Figure 13.

Because of the above-mentioned reasons, we consider a transmissive sensor system. Unlike a camera, the infrared limbus tracker does not have a focal point, and it is easy to design the light path, as explained in Figure 13. In this figure, acrylic boards (refractive index = 1.49) are chamfered at approximately 30°, and an infrared reflection filter is placed in between. The infrared light reflected by the eye is totally reflected inside the acrylic material, and the light is then received by the phototransistor, which is placed out of the user's view.

Figure 13: Illustrations of the transparent infrared corneal limbus tracker

5.3 Modulated Light for Robustness Improvement and for Information Transmission Since the infrared corneal limbus tracker is affected by environmental light, the method needs to be devised such that the infrared light is modulated for a lock-in amplifier (also known as a phase-sensitive detector) [16]. In other words, this tracker also allows the measurement of environmental light reflected from the eye surface. In fact, the embedded phototransistor received the modulated backlight of an ordinary display in the user's view during the experiments. With a lock-in amplifier, this phenomenon can be used to isolate the light reflected from the modulated tracker light source, which measures eye movements, from the modulated environmental light. It is also possible to obtain information from objects when the user gazes at light sources, as studied in [10].
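
The lock-in (phase-sensitive) detection idea can be illustrated in a few lines of synchronous demodulation; the modulation frequency, sampling rate and simulated signal below are arbitrary values used only to show the principle.

```python
import numpy as np

def lock_in_amplitude(signal, fs, f_mod):
    """Recover the amplitude of the component of `signal` modulated at f_mod Hz,
    sampled at fs Hz, by multiplying with quadrature references and averaging
    (a simple software phase-sensitive detector)."""
    t = np.arange(len(signal)) / fs
    ref_i = np.sin(2 * np.pi * f_mod * t)
    ref_q = np.cos(2 * np.pi * f_mod * t)
    i = np.mean(signal * ref_i)          # in-phase component (averaging acts as a low-pass)
    q = np.mean(signal * ref_q)          # quadrature component
    return 2.0 * np.hypot(i, q)          # amplitude of the modulated component

# Example: a 1 kHz-modulated LED reflection buried in unmodulated ambient light.
fs, f_mod = 20000.0, 1000.0
t = np.arange(0, 0.1, 1 / fs)
sensor = 0.3 * np.sin(2 * np.pi * f_mod * t) + 2.0 + 0.1 * np.random.randn(t.size)
print(lock_in_amplitude(sensor, fs, f_mod))   # close to the true amplitude of 0.3
```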

6. CONCLUSIONS In this research, we have described an infrared corneal limbus tracker system that measures eye activity to provide contextual information for information extraction from the lifelog database. It is possible to use the proposed method in daily life. In fact, we combined the low-accuracy, high-wearability eye tracker with image processing methods in our system. In the case study, we could detect eye blinks with high accuracy and estimate the participant's concentration state. We then combined this tracker with image processing methods such as face detection, OCR, and object recognition. Our eye tracking system and eye activity information successfully extracted significant information from the lifelog database.

Finally, we discussed the possibility of developing a transmissive sensor system with an infrared corneal limbus tracker and two cameras with different resolutions for our long-term objective of designing a system suitable for daily use. In addition, since the eyes follow objects even when the user's body moves, information about the eye direction can be used for image stabilization, and it can be effectively utilized in image extraction methods.


We believe this research can contribute to the utilization of augmented human memory.

7. ACKNOWLEDGMENTS This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for JSPS Fellows, 21-8596, 2009.

8. REFERENCES [1] H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded up robust features. In 9th European Conf. on Computer Vision, May 2006.

[2] A. Bulling, D. Roggen, and G. Troster. Wearable EOG goggles: eye-based interaction in everyday environments. In Proc. of the 27th int. conf. extended abstracts on Human factors in computing systems, pages 3259–3264, 2009.

[3] B. P. Clarkson. Life Patterns: structure from wearable sensors. Ph.D. thesis, 2002.

[4] S. M. Conde and S. L. Macknik. Windows on the mind. Scientific American, 297(2):56–63, 2007.

[5] A. Duchowski. Eye Tracking Methodology. Springer, 2007.

[6] J. M. Findlay and I. D. Gilchrist. Active Vision: The Psychology of Looking and Seeing. Oxford University Press, 2003.

[7] J. Gemmell, G. Bell, and R. Lueder. MyLifeBits: a personal database for everything. Commun. ACM, 49(1):88–95, 2006.

[8] R. J. K. Jacob. Eye movement-based human-computer interaction techniques: Toward non-command interfaces. In Advances in Human-Computer Interaction, pages 151–190. Ablex Publishing Co, 1993.

[9] D. Li, J. Babcock, and D. J. Parkhurst. openEyes: a low-cost head-mounted eye-tracking solution. In Proc. of the 2006 symp. on Eye tracking research & applications, pages 95–100, 2006.

[10] Y. Mitsudo. A real-world pointing device based on an optical communication system. In Proc. of the 3rd Int. Conf. on Virtual and Mixed Reality, pages 70–79, Berlin, Heidelberg, 2009. Springer-Verlag.

[11] T. Ohno. FreeGaze: a gaze tracking system for everyday gaze interaction. Proc. of the symposium on Eye tracking research & applications, 2002.

[12] J. Rekimoto, T. Miyaki, and T. Ishizawa. LifeTag: WiFi-based continuous location logging for life pattern analysis. In 3rd Int. Symp. on Location- and Context-Awareness, pages 35–49, 2007.

[13] D. Robinson. A method of measuring eye movement using a scleral search coil in a magnetic field. IEEE Trans. on Bio-Medical Electronics, number 10, pages 137–145, 1963.

[14] W. M. Smith and P. J. Warter, Jr. Eye movement and stimulus movement; new photoelectric electromechanical system for recording and measuring tracking motions of the eye. J. Opt. Soc. Am., 50(3):245, 1960.

[15] J. A. Stern, L. C. Walrath, and R. Goldstein. The endogenous eyeblink. Psychophysiology, 21(1):22–33, 1983.

[16] P. A. Temple. An introduction to phase-sensitive amplifiers: An inexpensive student instrument. American Journal of Physics, 43(9):801–807, 1975.

[17] D. J. Ward and D. J. C. MacKay. Artificial intelligence: Fast hands-free writing by gaze direction. Nature, 418:838, 2002.

[18] A. Yarbus. Eye movements and vision. Plenum Press, 1967.