
Computer Engineering Program Master’s Thesis, 20p, D-level

Minority Report System

Gesture Recognition with Senseboard for Siblings Data Surface Interface Paradigm Prototype (DSPrototype)

Ronah Kabagambe [email protected]

Mälardalen University Department of Computer Science and Electronics (IDE) Västerås, Sweden 2007

Supervisor: Rikard Lindell

Examiner: Lars Asplund


Abstract

This thesis was partly inspired by Steven Spielberg's science fiction movie "Minority Report", in which the lead actor, wearing advanced gloves and using gestures, controls digital information on a virtual screen without a keyboard or mouse. Could such technologically advanced ideas be implemented and used in real applications today? We live in a rapidly developing computing environment, where research and use are moving towards smaller devices, e.g. pocket computers, and towards controlling objects at a distance using Bluetooth.

This thesis introduces an interactive multi-modal prototype system called the Minority Report System. It comprises the Senseboard, a handheld Bluetooth Human Interface Device (HID); the Data Surface Interface Paradigm Prototype (DSPrototype), a navigation tool whose information content is presented on an infinitely large two-dimensional surface; and a prototype program, the MRProgram, which connects via Bluetooth to the Senseboard and via sockets to the DSPrototype, and performs gesture recognition. The DSPrototype is used for graphical user interfaces and is currently applied to music creativity improvisation; since the system has no windows, icons, menus, files or applications, content is navigated by cursor movement and typed commands such as play and zoom. The purpose of the thesis was to use the Senseboard alone, instead of keyboard and mouse, to send such commands.

The Minority Report System interacts with the user by recognizing gestures from earlier input data and uses the following scenario: using the Senseboard, the user, who sees and manipulates the information content of the DSPrototype, signs a gesture, e.g. play, which the MRProgram receives as data signals. These signals are interpreted into a command by gesture recognition, through signal processing and the use of an Artificial Intelligence algorithm. The command is passed on to the DSPrototype, which executes it by playing a short music piece.

The result is a functioning software prototype system that implements five sets of commands, including cursor movement. Usability testing with five test subjects gave qualitative conclusions; the collective assessment was that, with further development, the Senseboard together with the MRProgram would definitely be of great use in, e.g., teaching, studying and music creativity.

Sammanfattning

This thesis was partly inspired by Steven Spielberg's science fiction movie "Minority Report", in which the actor, with advanced gloves and gestures, controls digital information in front of a virtual screen without keyboard or mouse. Could these technically advanced ideas be implemented and used in real applications today? We live in an environment of rapid computer-technology development, where research and use are quickly moving towards smaller devices, e.g. handheld computers, and towards controlling objects at a distance with Bluetooth. The thesis introduces a multi-modal software prototype called the Minority Report System, which includes the Senseboard, a hand-operated Bluetooth HID (Human Interface Device); the navigation tool Data Surface Interface Paradigm Prototype (DSPrototype), whose information content is presented on an infinitely large two-dimensional surface; and the prototype program Minority Report Program (MRProgram), which connects via Bluetooth to the Senseboard and via sockets to the DSPrototype, and also performs gesture recognition. The DSPrototype, which is used for graphical user interfaces, is currently applied to musical creativity, and since this system has neither windows, icons, menus, files nor applications, its content is navigated with the mouse cursor and written commands such as play and zoom. The goal of the thesis is to use the Senseboard instead, and alone, to send such commands. The Minority Report System interacts with the user through gesture recognition from earlier input data and uses the following scenario: using the Senseboard, the user, who sees and controls the DSPrototype's information content, signs a gesture, e.g. play, which is received by the MRProgram as data signals. The signals are interpreted into a command via gesture recognition, through signal processing and the use of an Artificial Intelligence (AI) algorithm. The command is sent to the DSPrototype, which executes it by playing a short music piece. The result of the thesis is a functioning software prototype system that implements five sets of commands, including control of the mouse cursor. During development, five test users evaluated the usability, and their collective qualitative assessment is that the Senseboard together with the MRProgram, with further development, should definitely find important uses in teaching, studying and musical creativity.

Key words: Gesture recognition, Senseboard, DSPrototype, Human Computer Interaction, Bluetooth, Mac OS X


Acknowledgements

In the course of this thesis many different people helped by explaining, supporting and giving advice. Without their scientific skill, experience and guidance, as well as the opportunity and means they provided, this project would not have been possible. I therefore take this opportunity to thank: my supervisor Rikard Lindell and examiner Professor Lars Asplund, for their tireless efforts, critical comments and constructive suggestions, and for seeing to it that the thesis was well done and finished; Richard Bonner, Senior Lecturer at the Department of Mathematics and Physics, for explaining the mathematics behind the artificial intelligence (AI) algorithms; doctoral student Tommy Gunnarsson and Ph.D. student Peder Norin at IDE, for taking the time to explain the basics and use of Bluetooth on Mac OS X, as well as connections between different Bluetooth devices; and Gunilla Alsiö, President & CEO of Senseboard Technologies AB in Stockholm, as well as Patrik Dai Javad at Sony Ericsson Mobile Communications AB in Stockholm, for the opportunity to use their devices. Above all, I am eternally grateful to my family and friends for their love and support, but mainly for the patience you have all shown. To my parents and the teachers who shaped and directed the course of my professional future, no words can begin to describe the gratitude I feel. To the interviewees and everyone who participated in the user studies and responded to the project: you were an inspiring example, and I thank you from the bottom of my heart. Last but not least, thanks to me, for the courage, strength, patience, stubbornness and will, despite hurdles, to follow through. \\Ronah Kabagambe


Table of Contents

1 Background .......................................................................... 6
2 Purpose .............................................................................. 7
3 Thesis Plan .......................................................................... 8
4 Problem formulation .................................................................. 8
  4.1 Limitations ...................................................................... 8
5 Method ............................................................................... 8
6 Related work ......................................................................... 9
  6.1 Gesture recognition techniques ................................................... 9
  6.2 Different inventions using a glove ............................................... 9
7 Research survey ..................................................................... 10
  7.1 Mac OS X and Bluetooth .......................................................... 11
    7.1.1 Mac OS X .................................................................... 11
      7.1.1.1 Mac OS X Architecture as layers ......................................... 11
      7.1.1.2 Sockets ................................................................. 14
    7.1.2 Bluetooth ................................................................... 15
      7.1.2.1 How Bluetooth Works ..................................................... 15
    7.1.3 Bluetooth on Mac OS X ....................................................... 15
      7.1.3.1 Mac OS X Bluetooth Protocol Stack ....................................... 15
      7.1.3.2 Mac OS X Bluetooth Profiles and Applications ............................ 17
      7.1.3.3 The Mac OS X Bluetooth API Overview ..................................... 17
      7.1.3.4 Bluetooth Classes ....................................................... 18
  7.2 Simple DirectMedia Layer ........................................................ 18
  7.3 Senseboard ...................................................................... 19
  7.4 The DSPrototype ................................................................. 20
    7.4.1 A collaborative tool ........................................................ 20
    7.4.2 A database surface .......................................................... 20
    7.4.3 The DSIP approach ........................................................... 21
  7.5 Sign-, body language and gesture study .......................................... 23
    7.5.1 The history of sign language ................................................ 23
    7.5.2 Body language ............................................................... 24
    7.5.3 Sign language recognition ................................................... 24
  7.6 Signal processing ............................................................... 24
  7.7 Recognition Algorithms .......................................................... 25
    7.7.1 Hidden Markov Model ......................................................... 25
      7.7.1.1 Definition .............................................................. 25
      7.7.1.2 Three problems of HMMs .................................................. 25
    7.7.2 Self-Organizing Maps ........................................................ 26
      7.7.2.1 Architecture ............................................................ 26
    7.7.3 Growing Hierarchical Self-Organizing Map .................................... 27
      7.7.3.1 Architecture ............................................................ 27
    7.7.4 Brief Summary ............................................................... 28
  7.8 Human-Computer Interaction ...................................................... 29
8 Realization ......................................................................... 30
  8.1 Determining signs/gestures ...................................................... 31
  8.2 Technological studies ........................................................... 32
    8.2.1 Bluetooth communication ..................................................... 32
    8.2.2 Socket connection ........................................................... 33
    8.2.3 Gesture recognition and learning ............................................ 34
    8.2.4 MRProgram interface ......................................................... 34
    8.2.5 Changes to the DSPrototype .................................................. 35
  8.3 Setting up the interaction ...................................................... 35
  8.4 An example of using the thesis prototype system ................................. 35
9 Evaluation .......................................................................... 36
  9.1 Users ........................................................................... 36
  9.2 Studies ......................................................................... 36
    9.2.1 Questionnaire ............................................................... 37
  9.3 Compilation of User Responses ................................................... 37
10 Results ............................................................................ 38
11 Conclusions ........................................................................ 39
  11.1 Future works ................................................................... 41
12 References ......................................................................... 42
  12.1 Literature ..................................................................... 42
  12.2 White papers ................................................................... 42
  12.3 Links .......................................................................... 43
13 Appendices ......................................................................... 44
  13.4 User Questionnaire – Minority Report ........................................... 44
  13.5 Feedback from Questionnaire .................................................... 45

Table of Figures

Figure 1: Project tools ......................................................................................................................................7

Figure 2: Mac OS X Architectural Layers ....................................................................................................12

Figure 3: Quartz and the graphics and windowing environment ...........................................................13

Figure 4: The Mac OS X Bluetooth protocol stack and the Bluetooth Protocol Stack .......................16

Figure 5: Bluetooth classes in the Bluetooth protocol stack ...................................................18

Figure 6: The Senseboard .............................................................................................................................19

Figure 7: Overview (zoomed out) of the data surface ..............................................................................22

Figure 8: The zoom action. ............................................................................................................................22

Figure 9: Architecture of a 7x7 SOM ............................................................................................................26

Figure 10: Architecture of a trained GHSOM ..............................................................................................28

Figure 11: Minority Report System ..............................................................................................................31

Figure 12: Sign/gesture commands .............................................................................................................32

Figure 13: User Interface ................................................................................................................................35


1 Background

The name of this Master's thesis, "Minority Report System, Gesture Recognition with Senseboard for Siblings Data Surface Interface Paradigm Prototype (DSPrototype)", is partly inspired by Steven Spielberg's science fiction movie "Minority Report". The work was carried out at Mälardalen University in Västerås, Sweden. Reading the specification of this thesis was also reminiscent of the movie "Johnny Mnemonic" (1995, directed by Robert Longo and starring Keanu Reeves), in which the lead character, using advanced gloves, eye shades and gestures, accesses and moves digital information in a three-dimensional world.

A human interface device (HID) is a computer device that interacts directly with, and takes input from, users, e.g. a mouse, keyboard, PDA, printer or the Senseboard. When using a computer, the keyboard and mouse are the two most commonly used devices for accessing and controlling programs and information. The company Senseboard Technologies AB [1] wanted to examine how their Senseboard, a handheld Bluetooth HID, could be used in place of the keyboard and mouse to control a program via Bluetooth. The program of choice was a navigation tool called the Data Surface Interface Paradigm Prototype (DSPrototype) [2], a software prototype from the Siblings interface paradigm project [3] used for Graphical User Interfaces (GUIs), currently applied to music creativity improvisation and manipulated through commands such as play and stop using keyboard and mouse. The user sees this tool as information content presented on an infinitely large two-dimensional surface. By developing a system that includes the Senseboard, used to send sign/gesture commands; a prototype program that connects through Bluetooth to the Senseboard and through sockets to the DSPrototype, and that performs gesture recognition; and finally the DSPrototype itself, which executes the commands, the Senseboard can be used to manipulate the DSPrototype without the keyboard and mouse, but in the same way they had been used. A Bluetooth mobile phone (Ericsson P910i) from Sony Ericsson [4] helped in acquiring knowledge of the Bluetooth connection between an already established product and a computer.

Universal design [5] is about designing systems so that they can be used by anyone in any circumstance; that is, designing for diversity, including people with sensory, physical or cognitive impairments, people of different ages, and people from different cultures and backgrounds. There are five senses: sight, hearing, touch, taste and smell, and in computing the visual channel is the predominant channel for communication, through graphics, text, video and animation. Sound keeps us aware of our surroundings, e.g. by making us react to sudden noises such as beeps, and can have an emotional effect, especially music. Touch provides important information such as tactile feedback, which plays a major role in the operation of common tools, e.g. cars, instruments, pens, the Senseboard, and anything that requires holding or moving. Multi-modal systems [5] are those that use more than one human input channel in the interaction, e.g. speech, non-speech sound, touch, handwriting and gestures.

This thesis introduces a multi-modal interactive software prototype system called the Minority Report System (referred to in the rest of the report simply as the system), which interacts with the user by recognizing gestures from earlier input data and uses ideas similar to those depicted in the movies. The user sees information on the navigation tool that she/he wishes to manipulate, e.g. to play a sound, and using the Senseboard signs/gestures the play command; the gesture is recognized by the system's prototype program, the Minority Report Program (MRProgram), and sent to the DSPrototype, which reacts by playing a short music piece for the user to listen to. This report elaborates on the areas and technologies covered, the methods used, and the results reached during the life cycle of the thesis.


The tools used in the thesis project, shown in Figure 1 below, include an Apple PowerBook computer, an Ericsson mobile phone, the Senseboard, and a Bluetooth module used as a gateway for communication with the external Bluetooth devices, since the computer was not equipped with built-in Bluetooth.

Figure 1: Project tools (Apple PowerBook, Bluetooth module, Ericsson P910i mobile phone, Senseboard)

2 Purpose

The purpose of this thesis was to develop a software prototype system in which the MRProgram receives sign/gesture input signals from the Senseboard via Bluetooth, performs gesture recognition and learning through signal processing and the use of Artificial Intelligence (AI) algorithms, and sends the results as commands via sockets to control the navigation tool DSPrototype, without the use of keyboard or mouse but in the same way they had been used. The system interacts with the user by recognizing gestures from earlier input data, and the MRProgram was furthermore intended to update its understanding of gestures it already knows in an online, interactive manner.

The following scenario, a description of a possible interaction, was to be used by the system: on an Apple PowerBook computer, the MRProgram establishes a Bluetooth connection to the Senseboard. The user, drawing on sign language, body language and commonly used gestures in combination with the Senseboard, signs a gesture, e.g. play, which is received by the MRProgram and interpreted into a command through gesture recognition. Via sockets the command is passed on to the DSPrototype, which executes it by playing a short music piece.

Furthermore, changes to the DSPrototype were to be made in order to effect the execution of commands, and user studies were to be conducted with the aim of determining the usability of the Senseboard with the MRProgram, whether sign language, body language and commonly used gestures are a good way of interacting with computers, and the users' experience of and attitude towards the system. Since the DSPrototype was already written for the Mac OS X operating system, the thesis system was developed in the same environment to ease connection and communication.
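As a rough illustration of this scenario, the following is a minimal sketch, in C, of the intended processing pipeline with all three stages stubbed out. Every type and function name here is a hypothetical placeholder invented for this report; none are taken from the MRProgram's actual code.

    #include <stdio.h>

    /* Imaginary Senseboard frame: a hypothetical placeholder type */
    typedef struct { float channel[3]; } Sample;

    /* Stubbed-out stages; real versions would use Bluetooth input,
       signal processing plus an AI classifier, and a socket. */
    static Sample read_senseboard(void)       { Sample s = {{0, 0, 0}}; return s; }
    static int    recognize_gesture(Sample s) { (void)s; return 1; /* e.g. PLAY */ }
    static void   send_command(int cmd)       { printf("socket -> command %d\n", cmd); }

    int main(void)
    {
        for (int i = 0; i < 3; i++) {        /* a few iterations for the sketch  */
            Sample s = read_senseboard();    /* 1. gesture data via Bluetooth    */
            int cmd = recognize_gesture(s);  /* 2. recognition via AI algorithm  */
            if (cmd != 0)
                send_command(cmd);           /* 3. command to the DSPrototype    */
        }
        return 0;
    }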


3 Thesis Plan

The thesis work was originally intended for more than two people with different expertise in, among other things, signal processing, AI and programming, but was downsized to two people, extending over a period of 20 weeks for each person. The first 6 weeks were for research studies in subjects and techniques related to the thesis project, such as Artificial Intelligence (AI) algorithms, the Senseboard, the DSPrototype, sign language and gestures, sign language and gesture recognition techniques, the Mac OS X operating system environment, Bluetooth, Kernel Extensions, USB, BSD Unix, signal processing, sockets, SDL, OpenGL and usability engineering. In addition, a weekly report was kept on the progress of what had been studied, in order to gather key information and references for future use, e.g. in the final report. The next 10 weeks were to be focused on designing, implementing, testing, debugging and connecting the MRProgram to the latest stable DSPrototype, as well as performing user tests on the system. The four weeks before the deadline were to be dedicated to writing the thesis report and preparing the demonstration and presentation of the work done for the thesis project. We were on schedule to begin with; however, the situation changed 11 weeks into the project when one person quit, leaving a tight work schedule for the remaining one. In order to complete the thesis successfully, re-evaluation was inevitable, which meant reducing some areas of research and implementation, such as the possibility for the MRProgram to update gestures online while interacting with users.

4 Problem formulation

Since the purpose of this thesis was to develop a software prototype system that uses the MRProgram to receive sign/gesture input signals from the Senseboard via Bluetooth, perform gesture recognition, and send the results as commands via sockets to control the DSPrototype, as well as to make changes to the DSPrototype and conduct user studies, the problems to be solved are:
♦ How do the Senseboard and DSPrototype work?
♦ How does the Senseboard connect to the Apple PowerBook computer, and how is the information sent from the Senseboard accessed?
♦ How do we determine what sign/gesture commands are to be used with the Senseboard?
♦ How is gesture recognition and learning of gestures performed?
♦ How does the socket connection between the Minority Report Program and the DSPrototype work?
♦ How do we modify the DSPrototype so that it executes the commands?
♦ How will users respond to the use of the Senseboard, the sign/gesture commands and the system?

4.1 Limitations

Prototypes are artifacts that simulate or animate some, but not all, features of the intended system [5]. The Minority Report System was intended as a research prototype implemented with limited functionality, as a proof of concept and a foundation for analysis and testing of the usability of the Senseboard and the interactive system, not as a finished product.

5 Method

In order to answer the questions stated in the problem formulation of this thesis and realize its purpose, the work was split into different categories, and the following methods, somewhat adapted to specific needs, were used:
♦ Conceptual studies were conducted, covering related work and relevant theories.
♦ A research survey was done to gather information from a number of sources, in order to understand the different parts, subjects and tools of the thesis project. Based on the research survey, the following objectives were realised:
o Determining the signs/gestures used to manipulate the contents of the DSPrototype.
o Conducting technological studies by designing and implementing the MRProgram for Bluetooth connection, signal processing, gesture recognition and socket connection, as well as making changes to the DSPrototype for socket connection and command execution.


♦ In the field of Human-Computer Interaction (HCI), empirical studies were conducted through user studies to ascertain the usability of the Senseboard with the MRProgram and the system, user experience and attitude, and whether using sign language and gestures is a good way of interacting with computers. These studies involve different techniques, such as interviews, observations, questionnaires, test subjects, material, data processing etc.

6 Related work

Gesture has become a subject of attention in multi-modal systems: controlling the computer with certain hand movements would be advantageous in many situations where typing is not possible or where other senses are fully occupied, or as communication support for people who have hearing loss, if signing could be translated into speech or vice versa [5]. Like speech, gesture is user dependent and subject to variation and co-articulation, and the technology for capturing gestures is expensive, using either computer vision or a special dataglove (a 3D input device consisting of a lycra glove with optical fibres laid along the fingers, with vast potential in gesture recognition and sign language interpretation). The main focus in this thesis is on the glove, as it is the closest comparison to the Senseboard. Specific areas in which glove gesture recognition is used include Robotics, Virtual Reality, Computer Vision, Neural Networks and Hidden Markov Models, and 3D Animation.

6.1 Gesture recognition techniques

There has been a lot of research in gesture recognition, with many different models and algorithms to choose from in order to achieve the best interpretation. Which model and algorithm one chooses will depend on the desired outcome of one's work: does it have to be fast, or is it crucial that the sign is interpreted with as few errors as possible? A very common approach is the Hidden Markov Model (HMM), which, in combination with the Baum-Welch algorithm, gives fast online gesture recognition. Another approach is the Self-Organizing Map (SOM), an unsupervised-learning neural network. The most common gesture-recognition input devices on the market are cameras and cyber gloves. For further reading, refer to the white paper "Online, interactive learning of gestures for human/robot interfaces" by Lee, C. and Yangsheng Xu [6].
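To make the HMM evaluation step concrete, the following is a minimal sketch in C of the standard forward algorithm for a discrete-observation HMM, which computes how well an observation sequence matches a model. The model sizes and names are illustrative only and are not taken from the thesis implementation.

    #include <stdio.h>

    #define N 3  /* hidden states */
    #define M 4  /* observation symbols */
    #define T 5  /* length of the observation sequence */

    /* Returns P(observation sequence | model), using the forward variable
       alpha[t][i] = P(o_1..o_t, state at time t = i). */
    double forward(const double pi[N], const double A[N][N],
                   const double B[N][M], const int obs[T])
    {
        double alpha[T][N];

        /* Initialisation: alpha[0][i] = pi_i * b_i(o_1) */
        for (int i = 0; i < N; i++)
            alpha[0][i] = pi[i] * B[i][obs[0]];

        /* Induction: alpha[t][j] = (sum_i alpha[t-1][i] * a_ij) * b_j(o_t) */
        for (int t = 1; t < T; t++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int i = 0; i < N; i++)
                    sum += alpha[t - 1][i] * A[i][j];
                alpha[t][j] = sum * B[j][obs[t]];
            }

        /* Termination: sum over the final states */
        double p = 0.0;
        for (int i = 0; i < N; i++)
            p += alpha[T - 1][i];
        return p;
    }

A recogniser built this way typically trains one HMM per gesture with Baum-Welch and classifies an incoming sequence by picking the model with the highest forward probability.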

6.2 Different inventions using a glove

In 1988, a model called Dexter II [7] could, for the first time, interpret fingerspelling for the deaf-blind, including the wrist movement of the hand, which developers had previously had problems solving. In 1994 came Ralph (Robotic ALPHabet) [8], a much faster and more natural interpreter than predecessors such as Dexter II. With Virtual Reality, one can use capture gloves with sensors that capture both isolated and continuous signs, and see the hand movements, which in turn are interpreted into written or spoken language. James Kramer developed the Cyber glove, which has 18–22 sensors and can capture entire signed words and not just the alphabet [9]. Computer vision [10] is a way of capturing hand movements through the use of cameras. Early approaches combined the camera with a glove with markings on top of every finger, or with different-coloured rings on each joint of the glove, but eventually no glove was needed. It is more difficult to use a camera for capturing signs, since problems arise: the lighting has to be right, and much more complex calculations are required. 3D animation uses images for signing; one development is the use of avatars (also known as synthesized signers or personal digital signers), virtual 3D-animated figures that can sign words and sentences. They are often used for educational purposes for deaf students [11]; for example, comprehension of a story increased from 17% to 67% when the students saw it signed, the result of a research study at the Florida School for the Deaf and Blind [12]. The advantages of avatars compared to signs shown in videos are that avatars can be manipulated in terms of signing speed and angle, they use less computing space, and they can automatically transliterate text, for example into signed English [9].


7 Research survey

There was a lot of reading to do in a wide range of areas, all of which were interrelated. Information was gathered through literature, the Internet, Senseboard Technologies AB, white papers, and from the supervisor, doctoral students, lecturers and the examiner. This chapter briefly summarizes what is relevant to this thesis in each of these areas of research. Most of the information in the following chapters is considered common knowledge, and therefore no references are given except where needed. The areas covered are:
♦ Chapter 7.1, Mac OS X and Bluetooth:

An operating system (OS), e.g. Microsoft Windows, Unix, BSD or Mac OS X, is a set of computer programs whose purpose is to ease the use of the computer by managing its hardware and software resources. Mac OS X is the environment in which the DSPrototype is written; to ease the connection to it, the Minority Report System, including the MRProgram, uses the same environment.

A Kernel Extension (KE), or device driver, via which the Senseboard connects to Mac OS X, is also known as a software driver: a specialized hardware-dependent computer program that allows the OS to interact with hardware devices.

Sockets, which provide the connection between the MRProgram and the DSPrototype, are a method for virtual communication between a client program and a server program in a network or on the same computer.

Open Graphics Library (OpenGL), on which the DSPrototype is built. Bluetooth technology, a wireless short-range radio technology that provides a way to connect and exchange information between mobile devices such as the Senseboard, mobile phones, laptops, PCs, printers and digital cameras.

Bluetooth on Mac OS X, for access to the information such as data signals (any time-varying quantity) sent by the Bluetooth devices.

Universal Serial Bus (USB), an interface used to connect external devices such as HIDs (Bluetooth module, Senseboard, mobile phone, mouse, keyboard) to a computer. Both data and electric current can be sent through a USB cable.

♦ Chapter 7.2: Simple DirectMedia Layer (SDL), on which the DSPrototype is also built; studied so that the prototype could later be modified to execute the sign/gesture commands.

♦ Chapter 7.3: the Senseboard, a handheld HID from Senseboard Technologies AB that sends information as data signals via Bluetooth to the computer.

♦ Chapter 7.4: Publications in the Siblings project, to understand the platform and part of the goals of the DSPrototype, currently used for GUIs (methods of interacting with a computer through direct manipulation of graphical images and text), as well as the commands it uses. An interface defines the communication boundary between two entities, such as a piece of software, a hardware device, or a user. The DSPrototype is the user interface (the interface between a user and a computer): it provides means of input, allowing the user to manipulate the system, e.g. by receiving the recognized sign/gesture commands via sockets, and means of output, allowing the system to show the effects of the user's actions, e.g. executing the "PLAY" command by playing a short music piece for the user to hear.

♦ Chapter 7.5: Study of the sign language alphabet and commonly used gestures, to analyse and determine what signs and gestures were needed for the sign/gesture commands; furthermore, gesture control and sign language recognition, as well as the interaction designs and techniques to be applied.

♦ Chapter 7.6: Signal processing that may be required in processing the data signals from the Senseboard.

♦ Chapter 7.7: In-depth research on AI algorithms, specifically those relevant to sign language and gesture recognition.

♦ Chapter 7.8: The field of HCI to ascertain the usability of the Senseboard and MRProgram as well as the interactive system, by studying different techniques that can be used.


7.1 Mac OS X and Bluetooth

The task of an OS is partly to manage and supervise the rest of the programs in use so that they can share common resources, such as terminals, keyboards and memory, and partly to give the user a way of interacting with the computer. The Senseboard connects through Bluetooth, via a Kernel Extension (KE) or device driver, to the Mac OS X operating system and then to the thesis MRProgram, which in turn connects through sockets to the DSPrototype, which is built on SDL and OpenGL.

7.1.1 Mac OS X

Mac OS X was first released on 16 March 1999, as Mac OS X Server 1.0 and as Darwin 0.1. It is a Unix-like OS from the Apple Computer Company based on FreeBSD [13]. Apple, famous for the Apple Macintosh, is a Californian company founded by Steve Jobs and Steve Wozniak in 1976 that makes personal computers (PCs) and peripherals.
♦ FreeBSD is a descendant of the Unix-like OS 4.4 Berkeley Software Distribution (BSD), developed at the University of California, Berkeley, and is released under a free license, which means that it is available free of charge and comes with full source code.

♦ Unix, created in 1970, is an OS with qualities that make it useful as a server and different from regular PC operating systems such as Microsoft Windows: several people can use the computer simultaneously (multi-user), and the graphical window system runs as just another program (multi-tasking). In practice, the term UNIX is also applied to other multi-user Portable Operating System Interface (POSIX)-based systems that do not seek UNIX branding, e.g. Mac OS X, FreeBSD, GNU/Linux, NetBSD and OpenBSD.

♦ BSD is a component that uses FreeBSD as the primary reference codebase and provides or supports, e.g., the process model (process IDs, signals), security policies such as file permissions, threading support, and networking support (BSD sockets).

♦ Mac OS, the first commercially successful OS that used a GUI, was Apple's primary OS from 1984. To produce Mac OS X, technologies from the then-existing Mac OS were combined with OPENSTEP, an OS with an open platform comprising several Application Programming Interfaces (APIs) and frameworks, built on technology developed by the NeXT Computer Company, purchased by Apple in December 1996.
♦ An API is a set of definitions of the ways in which one piece of computer software communicates with another. A framework is a hierarchical directory that contains shared resources, such as a dynamic shared library, .nib files (GUI pieces in Apple applications), image files, localized strings, header files and reference documentation, in a single package that can be shared simultaneously by multiple applications.

7.1.1.1 Mac OS X Architecture as layers

The central characteristic of the Mac OS X architecture [14] is the layering of system software, with each layer having dependencies on, and interfaces with, the layer beneath it, as seen in Figure 2 below; the result is the user experience provided by the applications and technologies.
♦ The Core OS layer [15] contains the kernel, device drivers and low-level BSD commands; all other technologies are built on top of it, as it provides the foundation on which to develop software. The kernel is the fundamental part of an OS: a piece of software responsible for, among other things, starting and stopping programs, handling the file system and, most importantly, providing the various computer programs with secure access to the machine's hardware. The following basic concepts relevant to the thesis derive from this layer:
o Darwin is the open source UNIX-based foundation beneath the Mac OS X interface. Open source commonly refers to any software with publicly available source code, regardless of its license. Most of the technologies in this layer are referred to as Darwin, a complete OS integrating technologies such as networking facilities, support for multiple integrated file systems, and Apple technologies. Darwin also includes command-line tools, can be used to create kernel extensions, and its modular design makes it possible to dynamically add things such as device drivers, network extensions and new file systems.


Figure 2: Mac OS X Architectural Layers (programmers' code; the application environments Classic, Carbon, Cocoa, Java and BSD; User Experience — Aqua, Accessibility, Bundles and Packages, etc.; Application Services — HTML rendering, disc recording, speech synthesis, recognition, etc.; Graphics and Multimedia — Quartz, QuickDraw, QuickTime, OpenGL, Core Audio, etc.; Core Services — Core Foundation, Carbon Core, Apple Events, data formatting, memory management, stream-based I/O, network communication, etc.; Core OS — BSD, device drivers, file systems, the kernel, Mach, Java support, networking, etc.)

o Mach is a component responsible for various low-level aspects of the system, e.g. pre-emptive multitasking, protected memory, BSD system calls (mechanisms used by an application to request service from the OS drivers that control the hardware input and output directly), real-time support, etc.

o Device drivers in Mac OS X are created with the I/O Kit, a framework that offers an object-oriented programming model and takes into account underlying features such as virtual memory, memory protection and pre-emption. The kernel environment also includes a number of ready-made device drivers and features such as plug and play, dynamic device management, and power management for both desktops and portables.

o File systems: there is support for many different file systems and volume formats. The file system is based on extensions to BSD and an enhanced Virtual File System (VFS) design. VFS is a set of standard, internal file-system interfaces and utilities for building such extensions. Some of the features included are: permissions on removable media, access control lists and long filenames.

o Networking: there is built-in support for different media types, for standard network protocols such as Domain Name Services (DNS) and File Transfer Protocol (FTP), and for services standard in the computing industry. The network protocol stack is based on BSD, and the architecture provided by Network Kernel Extensions (NKEs) facilitates the creation of modules implementing new or existing protocols that can be added to the stack. NKEs can extend the networking infrastructure of the kernel dynamically, without recompiling and re-linking.

♦ The Core Services layer [16] includes low-level features such as data formatting, memory management and low-level network communication, most of which are included in the Core Foundation framework.
o Core Foundation [17] is a library set of C-based programming interfaces derived from the Foundation framework of the Cocoa object layer, with benefits such as an increased ability to share code and data among frameworks, libraries and applications in different environments and layers, as well as an architecture and corresponding API for plug-ins. The Core Foundation framework provides basic data management features; among the data types that can be manipulated are strings, dates and times, preferences, streams, ports and sockets (a minimal usage sketch follows below).
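As a small illustration of these C-based interfaces, the following sketch creates, inspects and releases a Core Foundation string. It is a minimal example written for this report, not code from the MRProgram; on Mac OS X it would be compiled with: gcc cf_demo.c -framework CoreFoundation

    #include <stdio.h>
    #include <CoreFoundation/CoreFoundation.h>

    int main(void)
    {
        /* Create an immutable CFString from a C string */
        CFStringRef cmd = CFStringCreateWithCString(kCFAllocatorDefault,
                                                    "PLAY",
                                                    kCFStringEncodingUTF8);

        printf("command has %ld characters\n", (long)CFStringGetLength(cmd));
        CFShow(cmd);     /* prints a description of the object to stderr */

        /* Objects obtained from Create functions are owned by the caller */
        CFRelease(cmd);
        return 0;
    }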



♦ The Application Services layer: an application, by its very nature, must display a GUI and allow users to manipulate its windows and controls. This layer offers the services [16] most relevant to developers in having programmatic interfaces or an impact on writing software, for all application environments. This includes features such as address book and font management, speech synthesis and recognition, etc. The layer also includes various other component frameworks:
o The Process Manager manages all processes in Mac OS X and controls access to shared resources, e.g. CPU time, by managing the scheduling and execution of applications.
o The Carbon Event Manager dispatches events to the appropriate event handler, based on the type of event and the destination application environment.
o An Apple event is a high-level event that applications can send to other applications on the same computer, on a remote computer, or even to themselves. Apple events are the primary mechanism in Mac OS X for inter-application communication. Applications typically use them to request services and information from other applications, or to provide services and information in response to such requests.

♦ The Graphics and Multimedia layer implements services for audio, video, and the rendering of two- and three-dimensional (2D/3D) graphics. Technologies and services [18] in this layer can be integrated into programmers' applications, some of which are:
o Core Audio, technology for managing high-quality audio software and hardware.
o Quartz, the core of the windowing environment and also the primary technology for 2D rendering and window management, consists of two entities:
- Quartz 2D, a client API and 2D graphics rendering library used by applications to draw primitive shapes and text in their windows, and
- Quartz Compositor, a window server that provides services to clients through the Quartz client API but performs no rendering itself, as well as low-level services such as event routing and window and cursor management.
Figure 3 below shows Quartz and the graphics and windowing environment.

Figure 3: Quartz and the graphics and windowing environment

(Source: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html, under System-Level Technologies > Graphics, Imaging and Multimedia)
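To give a feel for the Quartz 2D client API described above, here is a minimal sketch that draws a filled rectangle into an offscreen bitmap context. It is an illustrative example written for this report, not thesis code; compile on Mac OS X with: gcc quartz_demo.c -framework ApplicationServices

    #include <stdlib.h>
    #include <ApplicationServices/ApplicationServices.h>

    int main(void)
    {
        const size_t w = 64, h = 64;
        void *buf = calloc(h, w * 4);   /* 32-bit RGBA pixels */

        CGColorSpaceRef rgb = CGColorSpaceCreateDeviceRGB();
        CGContextRef ctx = CGBitmapContextCreate(buf, w, h, 8, w * 4, rgb,
                                                 kCGImageAlphaPremultipliedLast);

        /* Quartz 2D drawing calls: set a fill colour, then fill a rectangle */
        CGContextSetRGBFillColor(ctx, 0.2, 0.4, 0.8, 1.0);
        CGContextFillRect(ctx, CGRectMake(8.0, 8.0, 48.0, 48.0));

        CGContextRelease(ctx);
        CGColorSpaceRelease(rgb);
        free(buf);
        return 0;
    }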

o QuickTime is a graphics and application environment with features for interactive multimedia: manipulating, streaming, storing and enhancing video, sound, animation, graphics, text, music and VR (Virtual Reality, an environment simulated by a computer).
o Open Graphics Library (OpenGL) is used in combination with Simple DirectMedia Layer (SDL) to build the DSPrototype. OpenGL is the system API and library for rendering 2D and 3D images: an industry-wide adopted standard for developing portable 2D and 3D graphics applications, specifically designed for games, animation, medical imaging, and applications that need a robust framework for visualizing shapes and other special effects. Each OpenGL command directs a drawing action or causes a special effect, and developers can create lists of these commands for repetitive effects. Several libraries, such as SDL, are built on top of or beside OpenGL to provide features not available in OpenGL itself, such as rudimentary cross-platform windowing and mouse functionality, and can easily be downloaded and added to a development environment if not already available. OpenGL was designed to be graphics-output-only, providing only rendering functions: the core API has no concept of windowing systems, audio, printing to the screen, or input event handling from keyboard, mouse and other input devices. This allows the rendering code to be completely independent of the OS it runs on, enabling cross-platform development; however, some integration with the native windowing system is required for clean interaction with the host system, and this is performed through add-on APIs such as Core OpenGL (CGL) for Mac OS X, with better integration with Mac OS X's application frameworks provided by APIs layered on top of CGL. Additionally, the SDL library provides functionality for basic windowing using OpenGL in a portable manner (a minimal sketch follows below).
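As an illustration of how SDL supplies the window while OpenGL does the rendering, here is a minimal sketch against the SDL 1.2 API of the period. It is an example written for this report, not code from the DSPrototype.

    #include <SDL/SDL.h>
    #include <OpenGL/gl.h>   /* Mac OS X header path; <GL/gl.h> on other systems */

    int main(int argc, char *argv[])
    {
        if (SDL_Init(SDL_INIT_VIDEO) < 0)
            return 1;

        /* Ask SDL for a double-buffered, OpenGL-capable window */
        SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
        if (SDL_SetVideoMode(640, 480, 32, SDL_OPENGL) == NULL) {
            SDL_Quit();
            return 1;
        }

        /* Render one frame with plain OpenGL calls */
        glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);
        SDL_GL_SwapBuffers();

        SDL_Delay(2000);   /* keep the window visible briefly */
        SDL_Quit();
        return 0;
    }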

♦ The Application Environment layer [19] consists of the frameworks, libraries and services (along with the associated APIs) necessary for the runtime execution of programs developed with those APIs. This environment is where most developers start their projects: Carbon for porting applications from other platforms or working solely in C or C++, Cocoa for new developers, and the Java and X11 environments for developers whose applications must run on multiple platforms. This layer has dependencies on all underlying layers of system software:
o Carbon is a set of programming interfaces, derived from earlier procedural C-based Mac OS APIs, used to create applications for all types of users. The Carbon environment supports all standard Aqua user interface elements, such as windows, controls and menus, which can be designed using Interface Builder. It also provides an infrastructure for handling events, managing data and using system resources.
- Aqua is a set of guidelines that describe the appearance as well as the consistent and familiar behaviour of Mac OS X applications.
o Cocoa is an advanced object-oriented API for rapidly developing applications; the environment is also suited for Objective-C and C++ developers. Objects in the Cocoa framework handle behaviour such as menu management, window management, document management, Open and Save dialogs, and pasteboard (clipboard) behaviour. Using Interface Builder, the application interface is created graphically rather than programmatically.

♦ The User Experience layer is a concept that identifies methodologies for creating applications, where each methodology presents itself as a set of guidelines, recommended technologies, or a combination of the two. An important part of the user experience is how third-party applications support features that users need or have come to expect. Besides Aqua [20], the technologies supporting such features include, among others, Accessibility, which represents both a technology and a set of guidelines to support assistive technology devices, such as screen readers, for people with some type of disability or special need; other built-in support includes zoom features, speech recognition, text-to-speech, etc.

7.1.1.2 Sockets

Network-aware applications are becoming more important as the computer world becomes more networked. Linux therefore provides a standard networking API called the Berkeley socket API, also known as the BSD socket API [21], which is designed as a gateway to multiple protocols, since Linux already supports many protocols such as TCP/IP, AppleTalk and IPX. This API also comprises a library for developing applications, in the C programming language, that perform Inter-Process Communication (IPC). IPC is a set of techniques for exchanging data between two or more threads in one or more processes, on one or more computers connected by a network. The most important protocol available through the socket implementation is TCP/IP, which drives the Internet.

A Unix Domain Socket (UDS), or Inter-Procedure Call socket (IPC socket), is a virtual socket, similar to an Internet socket, that is used for IPC and is restricted to a single machine, meaning the connection goes from the local computer to itself. UDSs do not work across networks, but they are the point of interest here, since they are the ones implemented in this thesis project. Sockets are created through the socket() system call, which returns a file descriptor used, after proper initialisation, for read() and write() requests. A system call is the mechanism used by an application program to request service from the OS. The socket is closed with close() when a process is finished with it, so as to free the resources it uses. UDSs are implemented through file abstraction, like most Unix resources: information is passed by reading from and writing to files. The addresses (socket files representing Unix domain addresses) are pathnames, created in the file system when a socket is bound to a pathname through bind(). To establish a connection, both a server process and a client process must interact with sockets. The sockets are connection-oriented, meaning that each connection to the socket results in a new communication channel; since the server process may handle many connections simultaneously, it has a different file descriptor for each. For sending information, the Unix domain offers both datagram (fixed-length, destination-addressed messages) and stream (bi-directional) interfaces; in this thesis the latter is used. Sockets provide the connection between the MRProgram and the DSPrototype, and through them the commands are passed. Sockets are described in detail in [21].
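The following is a minimal sketch of a Unix domain stream-socket client in C, of the kind that could carry commands from the MRProgram to the DSPrototype. The socket path "/tmp/dsprototype.sock" and the command string are illustrative placeholders, not names from the thesis software.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void)
    {
        /* socket() returns a file descriptor for a stream-oriented UDS */
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/dsprototype.sock",
                sizeof(addr.sun_path) - 1);   /* hypothetical path */

        /* The server must already have bind()ed and listen()ed on this path */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            return 1;
        }

        /* The descriptor now behaves like a file: write a command to it */
        const char *cmd = "PLAY\n";
        write(fd, cmd, strlen(cmd));

        close(fd);   /* free the resources held by the socket */
        return 0;
    }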


7.1.2 Bluetooth Bluetooth is a wireless short-range radio technology for connecting mobile devices first developed mainly by the Ericsson Company, and is now standardised by the Bluetooth Special Interest Group (SIG). ♦ Ericsson [22] is a phoneshare company founded by Lars Magnus Ericsson in 1876 in Sweden, and

among others manufactures equipment for telephony, tele- and datacommunications. ♦ Bluetooth SIG [23] is a privately held trade association driving the development of Bluetooth and

comprises of leaders, among which Ericsson is a promoter member, in the telecommunications, computing, automotive, industrial automation and network industries.

Bluetooth provides a wireless solution for reducing the cable clutter of peripherals (stationary and mobile), making it possible to transmit both data and voice signals over short distances of about 5–10 meters between usually battery-powered devices, e.g. cell phones, mice, computers, personal data assistants (PDAs, a type of handheld computer), and HIDs such as the Senseboard, thereby simplifying communication and synchronization between devices [24]. Although excelling in low-bandwidth data transfer, Bluetooth is not intended as a replacement for high-bandwidth cabled peripherals such as external hard drives or video cameras [25]. More applications of Bluetooth include:
♦ Transfer of files, contact details, calendar appointments and reminders between devices.
♦ Wireless control of game consoles.
♦ Replacement of traditional wired serial communications in test equipment, GPS receivers, medical equipment and traffic control devices.
♦ Wireless networking between PCs in a confined space requiring little bandwidth.
♦ Wireless control of and communication between a cell phone and a hands-free headset or car kit.
♦ Wireless communication with PC input and output devices, most commonly mouse, keyboard and printer.

7.1.2.1 How Bluetooth Works
Bluetooth devices operate [26] at 2.4 GHz in the license-free, globally available Industrial, Scientific, and Medical (ISM) radio band. The advantage is worldwide compatibility, but a potential disadvantage is that Bluetooth devices share this band with many other Radio Frequency (RF) emitters, which include automobile security systems, other wireless communications standards and ordinary noise sources (such as microwave ovens). To remedy this, Bluetooth uses a fast frequency-hopping scheme, hopping from frequency to frequency, and uses shorter packets than other standards in the ISM band. Since the devices use a radio communications system, they do not have to be in line of sight of each other, and can even be in other rooms, as long as the received transmission is powerful enough. Wireless communications present special security challenges, so Bluetooth has different built-in levels of security, such as frequency hopping and the publicly available cipher algorithm known as SAFER+, used to authenticate a device's identity in the device-pairing process. This process involves creating a special link used to create and exchange a link key that, once verified, is used to negotiate the encryption mode the devices will use for their communication.

7.1.3 Bluetooth on Mac OS X
Apple's Bluetooth support, integrated into Mac OS X version 10.2, provides managers and abstractions that transparently perform Bluetooth connection–oriented tasks for many types of applications and devices without requiring any Bluetooth-specific code. The Bluetooth API frameworks may therefore never be needed, but there are some exceptions, such as applications that need to access Bluetooth-specific attributes and messages. [27]

7.1.3.1 Mac OS X Bluetooth Protocol Stack
The foundation of Mac OS X Bluetooth support is Apple's implementation of the Bluetooth protocol stack [28], which defines how the technology works and includes both in-kernel and user-level portions. Devices are identified by unique 6-byte addresses, as in Ethernet. Shown on the left side of Figure 4 below are the layers of the Mac OS X Bluetooth protocol stack with the Bluetooth profiles built into Mac OS X; shown on the right for comparison is the Bluetooth protocol stack at the heart of the Bluetooth specification, which provides robust guidelines that ensure interoperability of Bluetooth devices and compatibility of Bluetooth technology.

Figure 4: The Mac OS X Bluetooth protocol stack and the Bluetooth Protocol Stack

http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html; -> Bluetooth on Mac OS X -> The Mac OS X Bluetooth Profiles and Applications

http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html; -> Bluetooth Technology Basics -> Bluetooth Architecture

The in-kernel implementations of the Mac OS X Bluetooth protocol stack are:
♦ The Bluetooth module is the hardware component, which neither an application nor even the host has access to; it sits at the bottom of the stack and implements the Bluetooth radio, baseband, and link manager protocols found in the Bluetooth protocol stack.

The radio layer describes the physical characteristics a Bluetooth device's receiver-transmitter component must have, e.g. modulation characteristics, radio frequency tolerance, and sensitivity level. The radio module is responsible for the modulation and demodulation of data into RF signals for transmission in the air.

The baseband and link controller layer: the baseband portion is responsible for formatting data for transmission to and from the radio layer as well as handling the synchronization of links, whereas the link controller portion is responsible for executing commands from, as well as establishing and maintaining the link specified by, the link manager.

The link manager translates the Host Controller Interface (HCI) commands it receives into baseband-level operations, and among other things establishes and configures links and manages power-change requests.

♦ The HCI layer transmits data and commands from the layers above to the Bluetooth module below and vice versa. To implement the functions of this layer, an in-kernel object, AppleBluetoothUSBHCIController, provides support for Bluetooth USB, an interface designed to improve plug-and-play capabilities by allowing devices to be hot-swapped, meaning connected or disconnected without powering down or rebooting the computer. When a device is first connected, the computer recognizes it and loads the device driver it needs. Therefore, any hardware that supports the USB HCI specification should work with the Bluetooth implementation on Mac OS X.

♦ The Logical Link Control and Adaptation Protocol (L2CAP) layer provides transport for the higher-level protocols and profiles. As the primary communication gateway between two Bluetooth-enabled devices, this layer implements the ability to register as a client of an L2CAP channel and write data to the channel. Using the L2CAP layer's multiplexing feature, it is possible to send and receive data to and from the Radio Frequency Communication (RFCOMM) layer and the Service Discovery Protocol (SDP) layer at the same time.

♦ The RFCOMM protocol layer's mission is to make a data channel appear as an RS-232 serial port; it implements the ability to create and destroy RFCOMM channels as well as to control the speed of the channel.

Apple implements the L2CAP and RFCOMM layers in the kernel although applications can use objects in the user-level L2CAP and RFCOMM layers to access the corresponding in-kernel objects.

The layers above the user-kernel boundary of the Mac OS X Bluetooth protocol stack are accessible to applications, and their implementations are:
♦ The L2CAP and RFCOMM layers in user space are not duplicates of the in-kernel L2CAP and RFCOMM layers; they represent the APIs an application uses to communicate with the corresponding in-kernel layers.

♦ The SDP layer is more of a service than a protocol; it uses an L2CAP channel to communicate with remote Bluetooth devices and discover available services. Apple provides an SDP API used to discover what services a device supports. The Bluetooth specification defines a service as any feature usable by another (remote) Bluetooth device.

♦ The Object Exchange (OBEX) protocol layer, like the Hyper Text Transfer Protocol (HTTP), supports the transfer of simple objects, such as files, between devices using an RFCOMM channel.

7.1.3.2 Mac OS X Bluetooth Profiles and Applications
Mac OS X also implements several Bluetooth profiles, as shown in the Mac OS X Bluetooth protocol stack on the left of Figure 4 above, where each profile defines a particular usage of the protocols and is built on top of particular protocols [29]. Some of the available profiles are:
♦ The HID profile supports Bluetooth-enabled HID-class devices, such as keyboards, mice, mobile phones and the Senseboard, so that they work transparently with a Mac OS X system. In addition, the HID Manager API can be used to access a Bluetooth device.

♦ The serial port profile provides a bridge from the RFCOMM protocol to the built-in serial port driver.
♦ The synchronization profile supports synchronization of data between a computer and a device such as a Bluetooth-enabled PDA.
♦ The object push profile allows the transfer of small files, no more than several hundred kilobytes in size, between Bluetooth-enabled devices, e.g. mobile phone and computer.
♦ FTP allows a Bluetooth device to be treated as a remote file system, which can be browsed to get directory listings and transfer files.

Mac OS X also provides Bluetooth-specific applications, available in /Developer/Applications/Utilities/Bluetooth, to guide users through various set-up procedures, such as configuring new Bluetooth devices and setting up serial-port communication; to use them, a computer must include a Bluetooth module. Some of the Bluetooth applications are:
♦ Bluetooth File Exchange uses the FTP profile to support the exchange of files, e.g. images or documents, between two connected Bluetooth devices such as PDAs and mobile phones.
♦ Bluetooth Serial Utility allows the set-up of a serial-port emulation.
♦ Bluetooth Setup Assistant guides the user through the configuration of a new Bluetooth device, setting it up to work with system services, such as iSync.
♦ Packet Logger monitors, and saves to a log file, all Bluetooth traffic being transmitted on the computer, and can be used to help debug problems in applications or with Bluetooth hardware.
♦ Bluetooth Explorer allows for:
   Verification that a new Bluetooth service is properly registered
   Viewing of a computer's Bluetooth hardware information
   Performing inquiries and viewing results of discovered devices
   Viewing active Bluetooth connections
   Selection of different Bluetooth hardware attached to the computer

7.1.3.3 The Mac OS X Bluetooth API Overview
The Mac OS X Bluetooth API [30] consists of two frameworks found in /System/Library/Frameworks, IOBluetooth.framework and IOBluetoothUI.framework, which provide all the methods and functions an application needs to use Bluetooth-specific functionality. The frameworks' APIs are written in both C and Objective-C and follow a naming convention that shows the similarities, and eases reading, between the two versions.
♦ The Bluetooth framework contains the API used to perform Bluetooth-specific tasks: methods and functions that, for example, create and destroy connections to remote devices, discover services on a remote device, perform data transfers over various channels, and receive Bluetooth-specific status codes or messages.
♦ The Bluetooth UI framework contains the API used to provide a consistent user interface in applications. This API provides Aqua-compliant panels an application can present to the user, which help to perform tasks such as creating connections, pairing with remote devices and discovering services.

7.1.3.4 Bluetooth Classes
Applications can be developed in C or Objective-C, and one can also access the classes and objects in a C or C++ application using references of the form ObjectNameRef. The Bluetooth framework contains 11 classes, some of which are base classes that provide useful subclasses. Other classes are accessible only through instances that are created by intermediate objects. The Bluetooth framework classes all, directly or indirectly, inherit from NSObject, the root class of most Objective-C class hierarchies. Figure 5 below shows the Bluetooth classes [31] that applications can use and how they fit into the Mac OS X Bluetooth protocol stack.

Figure 5: Bluetooth classes in the Bluetooth protocol stack

http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html; -> Bluetooth on Mac OS X -> The Bluetooth Classes

7.2 Simple DirectMedia Layer
The SDL library was created by Sam Lantinga and was first released in 1998 by a company called Loki Software. SDL [32] is a cross-platform multimedia library designed to provide low-level access to audio, keyboard, mouse, joystick, file access, event handling, timing, threading, 3D hardware via OpenGL, and the 2D video framebuffer, and is used by MPEG playback software, emulators, and many popular games. SDL supports, among others, Linux, Windows, MacOS, Mac OS X, and FreeBSD. It is written in C but works with C++ natively, and has bindings to several other languages, including Ada, C#, Haskell, Java, Lisp, Objective-C, Pascal, Perl and Smalltalk. SDL is open source and free to use as long as one links with the dynamic library, which makes it a common choice for many multimedia applications. SDL itself acts as a cross-platform wrapper and is often used with OpenGL to provide fast 3D rendering. The syntax of SDL is function-based, meaning all operations are done by passing parameters to functions; special structures are also used to store the specific information SDL needs. The library is divided into several subsystems, namely the Video (which handles both surface functions and OpenGL), Audio, CD-ROM, Joystick and Timer subsystems.
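To give a feel for this function-based syntax, the following is a minimal sketch of an SDL program that initialises the Video subsystem and opens an OpenGL-capable window with an event loop. It assumes the SDL 1.2 API that was current at the time; it is illustrative only and is not code from the DSPrototype:

    /* Sketch: an OpenGL-capable SDL window (SDL 1.2 API assumed). */
    #include <SDL/SDL.h>

    int main(int argc, char *argv[])
    {
        SDL_Event event;
        int running = 1;

        if (SDL_Init(SDL_INIT_VIDEO) < 0)            /* start the Video subsystem */
            return 1;

        SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1); /* request double-buffered GL */
        if (SDL_SetVideoMode(640, 480, 0, SDL_OPENGL) == NULL) {
            SDL_Quit();
            return 1;
        }

        while (running) {
            while (SDL_PollEvent(&event))            /* event handling */
                if (event.type == SDL_QUIT)
                    running = 0;
            /* ... OpenGL drawing calls would go here ... */
            SDL_GL_SwapBuffers();                    /* present the frame */
        }

        SDL_Quit();
        return 0;
    }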

7.3 Senseboard
Information from Senseboard Technologies AB [1] and the Senseboard manual, as well as from Gunilla Alsiö and Lars Asplund, was gathered and studied in order to understand how the Senseboard works. It is a two-part device whose two parts are identical; they are placed on the hands as shown in Figure 6 below, in the curve between thumb and forefinger, and behind the knuckles. To achieve the goals of this project, however, only one of the devices was needed.

Figure 6: The Senseboard

Photos taken from the Senseboard manual

The Senseboard as seen in Figure 6 has different parts to it:

1. A Light-Emitting Diode (LED, a semiconductor device that emits incoherent narrow-spectrum light) that lights up blue to denote Bluetooth activity.

2. An LED that has two functions: if it lights up green, the Senseboard is active for use; if it lights up red, the battery is charging.

3. A USB port via which the battery can be recharged from a computer. The Senseboard has to be switched off in order to do this.

4. A button used to change data values or reset the Senseboard, referred to as the value button throughout this thesis.

5. The ON/OFF button.

The Senseboard connects to the computer through Bluetooth technology, via the Bluetooth module, and sends data signals as text strings, one string at a time, with z, x and y coordinates ordered from the most significant byte to the least significant byte (MSB to LSB). The values in a string depend on which hand the Senseboard is on, and are the opposite of the other hand's values. Each string is 18 characters per line of hexadecimal (hex) digits 0-9 A-F, starting with the number 1 or 0 depending on whether the button (number 4 in Figure 6) is pressed, then a space or a minus sign for negative numbers, then four hex z-values, then a space or a minus sign, then four hex x-values, then a space or a minus sign, then four hex y-values, then a space, and lastly a carriage return or line feed, for example [1 098A-0215 BCFD CR] or [0 -0123 ABCD 1E4F CR].
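As an illustration of this string format, a small C parsing routine might look as follows. The function names and return conventions are hypothetical; only the field layout follows the description above:

    /* Sketch: parsing one Senseboard data string such as "1 098A-0215 BCFD\r".
       Fills in the button state and the signed z, x and y values; returns 0
       on success. (Helper names are hypothetical, not from the thesis code.) */
    #include <stdlib.h>
    #include <string.h>

    static long hex4(const char *p, int negative)
    {
        char buf[5];
        memcpy(buf, p, 4);
        buf[4] = '\0';
        long v = strtol(buf, NULL, 16);   /* four hex digits, MSB first */
        return negative ? -v : v;
    }

    int parse_senseboard(const char *s, int *button, long *z, long *x, long *y)
    {
        if (strlen(s) < 16) return -1;    /* too short to hold all fields */
        *button = (s[0] == '1');          /* value button pressed or not  */
        *z = hex4(&s[2],  s[1]  == '-');  /* separator is space or minus  */
        *x = hex4(&s[7],  s[6]  == '-');
        *y = hex4(&s[12], s[11] == '-');
        return 0;
    }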

7.4 The DSPrototype
The MRProgram establishes a connection via sockets to the DSPrototype and then sends the sign/gesture commands, which are subsequently executed. What the user sees is the two-dimensional surface of the DSPrototype, whose contents he or she manipulates. Therefore, in order to understand it and make the changes that would implement the commands, we not only had to study OpenGL and SDL but also read three publications from the Siblings interface paradigm project [3], which gave a background understanding of this navigation tool.

The DSPrototype, a software prototype from the Siblings project led by Rikard Lindell, is used for GUIs and is currently applied to music creativity improvisation. According to its members, the Siblings project is leaving today's desktop behind and instead shifting towards a database that contains structured information. It is searching for a general interaction paradigm for multi-modal user interfaces that will remedy some of the perceived shortcomings of file system GUIs, such as poor support for collaboration and scaling, and incomplete support for managing large flows and deposits of information. What the Siblings project proposes is an interface paradigm where, in order to search for information, the file system and graphical user interface are exchanged for a database that holds the information, and a data surface interaction where content is visualised on an infinitely large two-dimensional surface. The Siblings project intends for this paradigm to be scalable, support collaboration, and be searchable, reachable, and accessible at all times, so that the user should not have to think about which element of information goes with which application.

The following chapters summarize the research and evaluations described in the papers "Users Say: We Do Not Like to Talk to Each Other" [33], "When Information Navigation Divorces File Systems - Database Surface Prototype Results" [34] and "The Data Surface Interaction Paradigm" [35], which were carried out between 2003 and 2005 by Rikard Lindell and Thomas Larsson (a Lecturer and Ph.D. student at the Department of Computer Science and Engineering at Mälardalen University), and resulted in the construction of the DSPrototype.

7.4.1 A collaborative tool
In the paper "Users Say: We Do Not Like to Talk to Each Other" [33], Lindell, 2003 states that current music technology is moving towards software emulation of music hardware equipment, which runs natively on a computer's processor and has the advantage of letting users save all parameters to disk, saving space and money. With software, however, the advantages a vintage synthesiser of the 70s had, direct audible and tactile feedback as well as precision, are gone, and user interaction through the desktop metaphor interface, designed for one-on-one interaction, complicates collaboration. Hardware units were accessible to multiple users, whereas the computer is accessible to only one user, making the creation of music, which used to be a social activity, a solitary one.

The basis of the experiment by Lindell, 2003 was to introduce a possible way of collaborating in computer-supported composition. A collaborative music improvisation test was performed using a music software program and three PowerBook computers connected and synchronised through MIDI to create a common acoustic space. The experiment would serve as requirements input for the design of a collaborative user interface metaphor for information navigation and manipulation of multi-modal content. The users' common denominator was that though they enjoyed creating music together, they did not like to talk to each other while improvising music, which Lindell, 2003 found unacceptable for collaboration in the long run. For the design of a collaborative tool that simplifies collaboration and communication, and gives users some degree of control over the elements of a song put there by another participant, the users suggested a common visual workspace with feedback on what everybody else was doing. They also suggested spatially and semantically organised groups according to characteristics of genre, tempo, timbre, and key in order to solve the problem of navigating sound files.

7.4.2 A database surface
According to Lindell, 2003 in the paper "When Information Navigation Divorces File Systems - Database Surface Prototype Results" [34], the interface paradigm design of the desktop metaphor, found on various platforms and in many application areas, is very similar to the original design of the Star system at Xerox PARC in the late 70s for office applications and desktop publishing. Computational resources available then were weak, with limited memory and storage space, but since then the flow and storage of information that users have to handle have vastly increased. Due to the availability of the Internet, people have a new stage on which to meet, share information and collaborate, yet the tools and interfaces of computers are still unsatisfactory; collaborating in the creation of a content-rich document such as music, animations or movies is still limited to sending files attached to emails or placing them on shared file servers. Furthermore, many of the most devastating user errors are mode errors, which come from the user not recognising the mode of the system.

Some feel that it is impossible to create a modeless interface, but rather than yield to modes, Lindell, 2003 believes in getting to the root of the problem, which is the file, because files have two modes: open and closed. In the closed state, users have to navigate their files with a file management tool that exposes only a few attributes, most commonly name and file extension, providing the user with a mere hint of the file's content; and the application programs that extend the computer's system abilities are distributed as files, making it the user's task to assign an application program to a file type. The conviction of Lindell, 2003 is that the spine of the computer system should be changed from a file system to a database, which also contains all the service components needed to manipulate its contents. Since more and more services are database driven, e.g. web news pages and corporate business systems, all content users normally locate on a computer's file system volume should be put in this database. Database concurrency enables collaboration, sharing and communication, allowing many users to access the same data, e.g. simultaneously working on a document across networks. Furthermore, an immediate consequence of database persistence is the modelessness of the contents, which means the user does not have to invoke open, close or save commands. Database systems are also well suited for information access and search tools because, with the vast storage capacity of current systems, powerful information query mechanisms are vital in database servers.

The big issue for Lindell, 2003 was to find a general visualisation technique for the content of this database, with its vast range of different types of objects and information. Instead of using existing visualisation techniques, e.g. local scope zoom, a global zoom interface paradigm for the entire database was proposed, in which the database is visualised as an infinitely large two-dimensional surface. Due to the layout, users know the kind of information elements from their position, and with no size restrictions on the database surface the information space can grow indefinitely. To test the idea of using the zoom interface paradigm to visualise the contents of a database, Lindell, 2003 carried out a few experiments to, among other things, show whether a database surface navigated by a zoom interface paradigm could replace a file system in music creation, and whether fluent zoom, utilising a grab-and-move metaphor to pan the database surface, yielded better results than discrete zoom, utilising a trajectory zoom method that allows both zoom and pan in one action. Fluent zoom was found more satisfactory than discrete zoom according to the test subjects; however, discrete zoom could not be rejected as a navigation method for the database surface in the global scope. Due to the support found in user evaluations for the basic idea of using a database surface instead of a file system, the next step for Lindell, 2003 was to implement a prototype for music creation with emphasis on live music creativity, collaboration, and concurrency. A database was to be at its core, allowing collaboration and music creation across the Internet.

7.4.3 The DSIP approach
In contrast to the desktop metaphor, and based on the research and evaluations described in the papers "Users Say: We Do Not Like to Talk to Each Other", 2003 and "When Information Navigation Divorces File Systems - Database Surface Prototype Results", 2003, a different, content-centric data surface interaction paradigm (DSIP) for graphical user interfaces was constructed by Lindell et al., 2005 and is described in the paper "The Data Surface Interaction Paradigm", 2005 [35].

In short, the DSIP is applied to music creativity improvisation and is designed to work for large deposits and flows of information, user collaboration, open-ended creative tasks, and multi-modal interfaces. It is content-centric, meaning small embedded software plug-in components implement the functionality, so users do not have to conduct any explicit file management; the system takes care of the information, and all content remains visually present in its context and its surrounding set of elements. All content is visualized on a flat, infinitely large two-dimensional surface with varied scaling that permits hierarchical relations between different content information elements. Content is navigated by graphical trajectory zoom and pan, which visualize all the details even in the de-zoomed state (i.e. overview), and navigation is further aided by incremental search. Feedback is immediate for each keystroke and for search condition satisfaction, reminding users of content locus, so information that is not of interest becomes transparent. The search also selects the content for command invocation and manipulation, which is provided by the content of the selected component; therefore context help and text completion aid users in quickly finding the suitable command. A single model sustains recognition for learning and recollection for efficient use, which is advantageous for creative tasks and actions.

In the data surface environment, shared synchronized surface areas favor collaboration, because users can work together more easily on a project on the shared flat surface, which is also suitable for creative and open-ended collaborative tasks. Visual feedback makes users aware of each other's actions and provides an external referent for negotiations. The interaction styles favor multi-modal user interfaces instead of pixel precision and window manipulation, and by removing the WIMP components (Windows, Icon bars, Menus and Pointing devices), the paradigm allows other interaction techniques such as eye gazing, gestures, handwriting and speech recognition. [35]

To test the DSIP approach, a prototype tool called the Data Surface Interface Paradigm Prototype (DSPrototype), which has a set of unique available components, was designed and implemented by Lindell et al., 2005. The components are: a text component that allows users to write text in any empty region of the data surface, a sound component that plays and displays streamed sound contents, a sound controller component used for manipulation of sound length, pitch, volume level and pan, and finally a song arrangement matrix component that helps in the arrangement of sound controller components to form a song. Figure 7 below shows an overview of the visual appearance of the content for which these music tool components were created. In the figure, (1) in the top left corner shows all the sound loops used in the experiments, (2) in the bottom half shows the song arrangement component, and (3) shows that loop components copied to the arrangement become embedded in a loop controller component.

Figure 7: Overview (zoomed out) of the data surface

Rikard Lindell and Thomas Larsson (2005), The Data Surface Interaction Paradigm; in proceedings of TPCG ’05, Eurographics Association [35]

Navigation of the content through trajectory zoom interaction is done with the scroll wheel of the mouse, as visualized by the example in Figure 8 below. The figure shows: (a) the state before the zoom action is applied, where the user aims the cursor at the position marked with a white cross in the upper left corner; (b) the user having started to turn the scroll wheel; and (c) the user having reached the desired level of zoom. The surface is in Swedish because the user interviews performed by Lindell et al., 2005 were conducted in that language.

Figure 8: The zoom action (panels a, b and c).

Rikard Lindell and Thomas Larsson (2005), The Data Surface Interaction Paradigm; in proceedings of TPCG ’05, Eurographics Association [35]

The idea behind designing a text-based command invocation method by Lindell et al., 2005, was that users should not have to look at the tool, only at the content information. The number of available commands is thus limited by the selected content, and for each key press a feedback help list displays the commands (e.g. help, play, stop), which enables users to investigate the results of their intended action in advance. Real-time synchronization of data surface contents on different devices supports collaboration, mutual awareness of action, and mutual modifiability, because all actions and commands are echoed to the other collaborating participant's machine. The appearance of the visual and acoustic contents is exactly the same on all machines, and users can comment on the contents by typing messages. The data surface information content is linked to a database containing components arranged in hierarchical scene graphs, providing the components' relative position and scale. SDL was used in the implementation since it makes for easy setup of an OpenGL context.

The prototype was tested by Lindell et al., 2005 on ten user subjects, followed up by debriefing interviews focusing on the aspects of navigation, layout and design, command invocation, note writing, and collaboration. According to these evaluations, the DSPrototype showed very good results as a loop-based improvisation and live music creativity tool. Users were very pleased with the navigation, as the approach supports their creativity for the task. Although the command model was not fully embraced, the users enjoyed the music tool implemented based on the DSIP.

Using the Senseboard to send commands to the DSPrototype eliminates the use of the keyboard or mouse, and using the Mac OS X environment eases connection to the MRProgram as well as setting up the thesis system.

7.5 Sign language, body language and gesture study
Face-to-face contact is the most primitive form of communication in terms of technology, although considering the style and interplay between different channels it is the most sophisticated mechanism available, because it involves not just speech and hearing but also the subtle use of body language and eye gaze [5]. We gesture using our body, though mainly our hands, to indicate items of interest, consciously or unconsciously, by pointing to the item, a slight wave of the hand, or the alignment of the body.

To analyse what signs and gestures were needed for the sign/gesture commands used in combination with the Senseboard required studying the sign language alphabet, sign language and gestures, and the relationship between sign language and spoken language from a cognitive perspective, as well as how this modality can be used in this thesis and how it has been used to control computers. Research was also done on gesture control and sign language recognition, and on what interaction designs and techniques have been applied to these. Other arbitrary body language and gesture conventions are found in, for example, choreography, classic ballet, mime dance and flight attendants' signal language. Of particular interest for this thesis were Human Hand Gesture Research and Sign Language Recognition, in order to examine the signs or gestures required when using the Senseboard to send the following commands to manipulate the DSPrototype content:

• Play / Stop / Mute sound
• Zoom in / out
• Increase / Decrease volume
• Write text
• Help
• Pan

7.5.1 The history of sign language
The origin of sign language [36] is an ongoing discussion, and there are different opinions as to where it all started. The most likely explanation is that it has always existed, since the first deaf person. Some people argue that sign language has its origin in France in 1770, when a Frenchman called Abbé de l'Epée started a school for the deaf in Paris. L'Epée taught teachers from other countries how to teach sign language to the deaf, but he could only teach the technique and not the language itself, since it varies from country to country. The existence of different sign languages is another indication that the deaf communicated with signs, body language and mimicry long before there were schools for the deaf. In Germany, Samuel Heinicke, a pedagogue working with the deaf, was critical of sign language because he thought the deaf would be better off learning the so-called speech method, that is, the spoken language. He went as far as forbidding sign language at his school, and today there are several schools for the deaf that teach the speech method. In Sweden, the first written evidence of sign language is in a document dated 1759 from the academy of sciences, part of a description of the parish of Ålhem composed by the senior master of Kalmar high school.

It stated that a man called Lars Nilsson had made himself understood by signing, because he could not speak. In 1808, Aron Borg started the first Swedish school for the deaf and blind, and although he was not a student of l'Epée, he was later influenced by his work. Borg used the students' own signs and started group exercises to develop a uniform sign language and mimicry among the students. This school, called Manillaskolan, still exists today in Stockholm. Some examples of sign languages are American Sign Language (ASL), British Sign Language (BSL) and Swedish Sign Language. More is described in Signs of the Time: A Review of the Impact of Artificial Intelligence on Sign Language and Deaf Education Past, Present and Future by Becky Sue Parton, Dr. Mark Mortensen and Dr. Ennis-Cole [36].

7.5.2 Body language
Through body language we express our feelings and emotions, both deliberately and instinctively. Even infants express their feelings this way, though mostly instinctively, and they quickly learn to use it to get what they want. Body language also differs between countries, epochs and cultures in the way of communicating and expressing oneself. Head movements are usually important in body language; for example, when pointing out a direction, pointing while your head is looking the same way makes the pointing stronger than if your head is turned the opposite way. [37]

Dance is also part of body language, and different dances follow certain codes. The classical dances have a very strict code that they follow, while the modern dances vary more but are more receptive to new movements. In oriental dancing, for example in Bali and Java, the dancers express themselves through body movements, and the most important element of the dance is the position of the hands and fingers. These hand positions are called mudras; they have very strict codes, and the gestures can be read like a book. Further details are described in Brita Bergman's Tecknad Svenska [37].

7.5.3 Sign language recognition
Sign languages are just like spoken languages in that they vary from country to country and have their own grammar and rules. Sign language is a well-defined language and therefore convenient to use when interpreting gestures. The problems lie in the usage of the signs: not everyone makes exactly the same movement, and it is also difficult to tell the computer when a sign starts and stops. Another issue is how fast a person can sign while still allowing the computer to interpret correctly. These are difficult issues that must be controlled in order to make gesture recognition work properly. A person using sign language does not only use the hands to make him- or herself understood, but also mimicry, that is, the use of the mouth, facial expressions and head movements. Of course one could have a full-body suit or a camera to capture every movement of a person, but needless to say this would make interpretation even harder than for sign language alone [38]. For further reading, refer to the white paper Machine recognition of Auslan signs using PowerGloves: Towards large-lexicon recognition of sign language by M. Waleed Kadous [38].

7.6 Signal processing
A signal [39] is defined as any variable that carries or contains some kind of information that can, for example, be conveyed, displayed or manipulated: speech encountered in telephony and radio, biomedical signals such as the electroencephalogram (brain signals), sound and music reproduced by the CD player, video and images which most people watch on TV, and radar signals used to determine the range and bearing of distant objects. Signal processing [39] is the analysis, interpretation and manipulation of signals via hardware or software to obtain the spectrum of the data or to transform the signal into a more suitable form. This includes storage and reconstruction, separation/filtering of information to remove or reduce interference and noise (e.g. aircraft identification by radar or Bluetooth identification of a particular device), compression (e.g. image compression) and feature extraction (e.g. speech-to-text conversion). In this thesis, data signals are sent as text strings via Bluetooth by the Senseboard to the MRProgram and are processed by extracting the desired information as well as performing operations on the signals for gesture recognition.
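As a simple illustration of such an operation on the signals, the sketch below shows a moving-average filter, a basic noise-reduction technique. The window size and data layout are assumptions; the routine is not taken from the MRProgram:

    /* Sketch: moving-average filter over the last 'window' samples
       (a generic smoothing operation; parameters are illustrative). */
    void moving_average(const long *in, long *out, int n, int window)
    {
        int i, j;
        for (i = 0; i < n; i++) {
            long sum = 0;
            int count = 0;
            for (j = i - window + 1; j <= i; j++) {  /* clip at the start */
                if (j >= 0) { sum += in[j]; count++; }
            }
            out[i] = sum / count;                    /* mean of the window */
        }
    }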

7.7 Recognition Algorithms
The three AI algorithms possibly relevant to this thesis are: the Hidden Markov Model (HMM) [40], Self-Organizing Maps (SOM) [41] and the Growing Hierarchical Self-Organizing Map (GHSOM) [42]. The aim was to find out whether one of them could be used in the implementation of the MRProgram, separately or combined, to perform gesture recognition. This entailed studying the mathematics behind them, such as probability theory and the associated problems, as well as talking to Richard Bonner, Senior Lecturer at the Department of Mathematics and Physics, who explained the mathematics behind these AI algorithms.

7.7.1 Hidden Markov Model
The HMM is the result of attempting to statistically model speech generation. During the past several years it has become the most successful speech model used in Automatic Speech Recognition (ASR), the main reason for this success being its ability to characterize the speech signal in a mathematically tractable way. The basic task of ASR is to derive a sequence of words from a stream of acoustic information. HMMs can also be used to interactively recognize gestures, perform online learning of new gestures, and iteratively update the model of a gesture.

7.7.1.1 Definition
An HMM is a representation of a Markov process that cannot be directly observed, a "doubly stochastic" system [40]. A Markov process is a finite state machine, or finite state automaton, with probabilities for each transition, that is, a probability that the next state is sj given that the current state is si. Finite state machines move through a series of states and produce output either when the machine has reached a particular state or when it is moving from state to state. The HMM is a finite set of states, each of which is associated with a probability distribution and produces an output with a certain probability. Transitions among the states are governed by a set of probabilities called transition probabilities. The HMM also has an output alphabet, output probabilities, and initial state probabilities. In a particular state, an outcome or observation can be generated according to the associated probability distribution. It is only the outcome, not the current state, that is visible to an external observer; the states are therefore "hidden" to the outside, hence the name Hidden Markov Model. An HMM is also an AI algorithm, a search algorithm based on chains of probability, where the probability of a specific event depends on the previous events.

7.7.1.2 Three problems of HMMs
There are three problems commonly associated with HMMs [40]:
♦ The Evaluation Problem: How does one determine the probability with which a given sequence of observation symbols would be generated by an HMM, i.e. the problem of recognizing a gesture from a given set of input data? There is also the problem of ambiguity between two or more gestures. The algorithm commonly used as a solution to this problem is the forward algorithm, sketched below after this list.
♦ The Decoding Problem: How does one determine the most likely sequence of internal states in a given HMM that produced the given observations? The algorithm commonly used as a solution is the Viterbi algorithm, where the whole state sequence with the maximum likelihood is found.
♦ The Learning Problem: How do we adjust the HMM parameters so that a given set of observations (called the training set) is represented by the HMM in the best way, i.e. developing the HMM which will be associated with a gesture? The algorithm commonly used as a solution is the Baum-Welch (BW) algorithm.
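As an illustration of the solution to the evaluation problem, the following sketch implements the forward algorithm in C for a discrete HMM. The model sizes and parameter names are illustrative assumptions, not taken from the thesis implementation:

    /* Sketch of the forward algorithm (evaluation problem), assuming T >= 1.
       A[i][j]: transition probability from state i to state j.
       B[j][k]: probability that state j emits observation symbol k.
       pi[i]:   initial state probabilities.
       Returns P(observation sequence | model). */
    #define N_STATES 4   /* hypothetical model size */
    #define N_SYMS   8

    double forward(const double A[N_STATES][N_STATES],
                   const double B[N_STATES][N_SYMS],
                   const double pi[N_STATES],
                   const int *obs, int T)
    {
        double alpha[2][N_STATES];   /* rolling buffer over time steps */
        double p = 0.0;
        int t, i, j;

        for (i = 0; i < N_STATES; i++)               /* initialisation */
            alpha[0][i] = pi[i] * B[i][obs[0]];

        for (t = 1; t < T; t++) {                    /* induction */
            for (j = 0; j < N_STATES; j++) {
                double sum = 0.0;
                for (i = 0; i < N_STATES; i++)
                    sum += alpha[(t - 1) & 1][i] * A[i][j];
                alpha[t & 1][j] = sum * B[j][obs[t]];
            }
        }
        for (i = 0; i < N_STATES; i++)               /* termination */
            p += alpha[(T - 1) & 1][i];
        return p;
    }

In a gesture recognizer, one such model could be trained per gesture, and the command whose model yields the highest probability for the observed data would be chosen.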

Further reading can be found in a 1996 Master’s thesis report, A hybrid ANN-HMM ASR system with NN based adaptive preprocessing by N. D. Warakagoda [40].

7.7.2 Self-Organizing Maps
The SOM is a neural network recognition algorithm invented by Professor Teuvo Kohonen in the early 1980s as a data visualization technique that reduces the dimensions of data [41]. The first application area of the SOM was speech recognition, more precisely speech-to-text transformation. SOMs reduce dimensions by producing a map that projects the data, usually onto a 1- or 2-dimensional display, and plots similarities in the data by grouping similar data items together. SOMs thus accomplish two things at the same time: reducing the dimensions of data by clustering, and displaying similarities.

The basic SOM can be visualized as a network of 1- or 2-dimensional arrays, the cells (nodes or neurons) of which become specifically tuned to various input signal patterns or classes of patterns in an orderly fashion. The learning process is competitive and unsupervised, meaning that no teacher is needed to define the correct output (or, in reality, the cell into which the input is mapped) for an input. When an input arrives, the neuron that is best able to represent it wins the competition and is allowed to learn it even better; basically, corresponding to each input, only one map node (the winner) at a time is activated. The competitive learning algorithm can be generalized so that not only the winning neuron but also its neighbours on the map are allowed to learn. Neighbouring neurons will gradually specialize to represent similar inputs, and the representations will become ordered on the map. This is the essence of the SOM algorithm.

The learning process of SOMs may thus be described as a competition among the units to represent the input patterns. The unit whose weight vector is closest to the presented input pattern in terms of the input space wins the competition. The weight vector of the winner, as well as of units in the vicinity of the winner, is adapted in such a way as to resemble the input pattern more closely.
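To make the competition and adaptation concrete, the following is a sketch in C of a single SOM training step with a Gaussian neighbourhood function. The map size, input dimensionality and parameter names are illustrative assumptions, not taken from any thesis code:

    /* Sketch: one SOM training step (competition + neighbourhood adaptation). */
    #include <math.h>

    #define MAP_W 7
    #define MAP_H 7
    #define DIM   3   /* hypothetical input dimensionality */

    void som_train_step(double w[MAP_H][MAP_W][DIM], const double x[DIM],
                        double alpha /* learning rate */, double radius)
    {
        int r, c, d, br = 0, bc = 0;
        double best = 1e300;

        /* competition: the unit whose weight vector is closest to x wins */
        for (r = 0; r < MAP_H; r++)
            for (c = 0; c < MAP_W; c++) {
                double dist = 0.0;
                for (d = 0; d < DIM; d++)
                    dist += (w[r][c][d] - x[d]) * (w[r][c][d] - x[d]);
                if (dist < best) { best = dist; br = r; bc = c; }
            }

        /* adaptation: the winner and its map neighbours move towards x,
           with a strength that decays with map distance from the winner */
        for (r = 0; r < MAP_H; r++)
            for (c = 0; c < MAP_W; c++) {
                double md2 = (r - br) * (r - br) + (c - bc) * (c - bc);
                double h = alpha * exp(-md2 / (2.0 * radius * radius));
                for (d = 0; d < DIM; d++)
                    w[r][c][d] += h * (x[d] - w[r][c][d]);
            }
    }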

7.7.2.1 Architecture Figure 9 below is a graphical representation of self-organizing maps and the learning process. The map consists of an output space with a square arrangement of 7x7 neural processing elements, i.e. units, shown as circles on the right hand side of the figure. The black circle indicates the unit that was selected as the winner, c, for the presentation of input pattern x(t). The neuron vector of the winner, mc(t), is moved towards the input pattern and thus, mc(t+1) is closer to x(t) than mc(t) was. Similar, yet less strong, adaptation is performed with a number of units in the vicinity of the winner. These units are marked as shaded circles. The degree of shading corresponds to the strength of adaptation. Thus, the neuron vectors of units shown with darker shading are moved closer to x than units shown with lighter shading.

Figure 9: Architecture of a 7x7 SOM

http://www.ifs.tuwien.ac.at/~andi/ghsom/description.html

As a result of the training process, similar input data are mapped onto neighbouring regions of the map.

The advantages of SOMs include the following. They are very easy to understand, because it is simple to read a map from the colour coding of the neurons, so anyone can quickly pick up how to use them effectively. They also classify data well, and their quality can easily be evaluated, since one can actually calculate how good a map is and how strong the similarities between objects are.

The disadvantages include getting the right data, which is not easy: one needs a value for each dimension of each sample in order to generate a map, which is sometimes simply not possible and often very difficult, a limiting feature often referred to as missing data. Furthermore, every SOM is different and finds different similarities among the sample vectors. SOMs organize sample data so that in the final product the samples are usually surrounded by similar samples; however, similar samples are not always near each other, and therefore many maps need to be constructed in order to get one final good map. Finally, a major drawback is that SOMs are very computationally expensive: as the dimensionality of the data increases, dimension-reduction visualization techniques become more important, but the time to compute them also increases. For calculating a similarity map like the one in Figure 9, the more neighbours used to calculate the distance, the better the similarity map will be, but the number of distances the algorithm needs to compute increases exponentially. More of the above is described in the 1999 course material by Tom Germano [41].

7.7.3 Growing Hierarchical Self-Organizing Map
The GHSOM [42] is proposed as an extension to the SOM, mainly because of two shortcomings of the SOM:
♦ The SOM has a fixed network architecture, i.e. the number of units to use as well as the layout of the units has to be determined before training.
♦ Dynamically growing variants of the SOM tend to produce huge maps that are hard to handle, and input data that is hierarchical in nature should be represented in a hierarchical manner for clarity of representation.

The key idea of the GHSOM is to use a hierarchical structure of multiple layers, where each layer consists of a number of independent SOMs; in some respects the GHSOM is therefore an incrementally growing version of the SOM. The GHSOM grows both vertically, in a hierarchical way according to the data distribution, allowing hierarchical decomposition and navigation in sub-parts of the data, and horizontally, meaning that the size of each individual map adapts itself to the requirements of the input space. This provides a convenient interface for navigating large digital libraries, as it closely follows the model of conventional libraries, which are also structured by, for example, different floors, sections and shelves. The structure is similar to a tree, where a SOM at each level branches out to more SOMs at the next level.

7.7.3.1 Architecture
As shown in Figure 10 below, layer 0 consists of a single-unit SOM; layer 1, which provides a rough organization of the main clusters in the input data, consists of a 2x2 SOM, and for each unit in this layer's SOM an additional SOM may be added to the next layer of the hierarchy. The three independent maps in layer 2 offer a more detailed view of the data, and five units from two of the second-layer maps have further been expanded into third-layer maps to provide a sufficiently granular representation of the input data. The same structure applies to lower-level SOMs, and in addition the training algorithm determines the size of each lower-level SOM.

Figure 10: Architecture of a trained GHSOM

http://www.ifs.tuwien.ac.at/~andi/ghsom/description.html

As mentioned before, the GHSOM grows in two dimensions: in width (by increasing the size of each SOM horizontally) and in depth (by increasing the number of SOM levels in a hierarchical way). For growing in width, each SOM attempts to modify its layout and increase its total number of units systematically, so that each unit does not cover too large a part of the input space. The starting point of the growth process is layer 0, which consists of only one single-unit SOM. The weight vector of this unit is initialised as the average of all input data, and the deviation of the input data, i.e. the Mean Quantization Error (MQE) of this single unit, is computed. The MQE, a criterion that guides the training process, is calculated from the distances between the weight vector of a unit i and the input vectors mapped onto this unit; the MQE for an entire SOM is the mean of all its units' quantization errors. Training of the GHSOM then starts with a small map of units in layer 1.

As for deepening the hierarchy of the GHSOM, the general idea is to keep checking whether the lowest-level SOMs have achieved sufficient coverage of the underlying input data. The training process and unit insertion procedure then continue with the newly established SOMs. The major difference from the training process of the second-layer map is that now only that fraction of the input data is selected for training which is represented by the corresponding first-layer unit. The strategy for row or column insertion, as well as the termination criterion, is essentially the same as for the first-layer map, and the same procedure is applied for any subsequent layers of the GHSOM. The growth of the hierarchy is terminated when no further units are available for, or require, further expansion.

The advantages of the GHSOM include providing a convenient way to self-organize inherently hierarchical data into layers, and giving users the ability to choose the granularity of the representation at the different levels; at the same time the GHSOM algorithm determines the number of SOM levels automatically, which is an improvement over the SOM. The disadvantages include that, depending on the fraction and threshold chosen, the structure of the GHSOM does not necessarily lead to a balanced hierarchy, i.e. a hierarchy with equal depth in each branch. The depth of the hierarchy will reflect the diversity in the input data distribution that should be expected in real-world data collections. A high fraction and threshold will probably generate a flat GHSOM with large SOMs; on the other hand, a low fraction with a low threshold will probably turn out a deep hierarchy with small maps, and the resulting grouping may not be too meaningful. In extreme cases one may end up with only one large map, so some knowledge about the input data is useful, and some trial and error is needed to experiment with the values for fraction and threshold.
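As an illustration of the MQE described above, a small C routine might look as follows. The names and the array layout are assumptions; the per-unit value is computed here as a mean, matching the term's name:

    /* Sketch: mean quantization error (MQE) of one unit, i.e. the mean
       Euclidean distance between the unit's weight vector and the input
       vectors mapped onto it (layout and names are illustrative). */
    #include <math.h>

    double unit_mqe(const double *weight, const double *inputs,
                    int n_mapped, int dim)
    {
        double total = 0.0;
        int i, d;
        for (i = 0; i < n_mapped; i++) {     /* each input mapped to this unit */
            double dist2 = 0.0;
            for (d = 0; d < dim; d++) {
                double diff = inputs[i * dim + d] - weight[d];
                dist2 += diff * diff;
            }
            total += sqrt(dist2);            /* Euclidean distance */
        }
        return n_mapped > 0 ? total / n_mapped : 0.0;
    }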

7.7.4 Brief Summary
After much research on these three algorithms, the algorithm that was easiest to understand, least complicated, and most suitable for our implementation was the HMM, albeit in a much simpler version to suit our needs, because developing maps for either the SOM or the GHSOM proved time consuming and more complex. The problem of recognizing a gesture from a given set of input data, as well as the ambiguity between two or more gestures, is an example of the Evaluation problem. Developing the HMM to be associated with a gesture is an example of the Learning problem.

7.8 Human-Computer Interaction HCI [5] the area of shared interest between Computer Science and Cognitive Science, is the study of the interaction between a person/user or people/users and computers, concerning the physical, psychological and theoretical aspects of this process. The user being whoever is trying to get the job done using technology, the computer meaning any technology ranging from a desktop computer to a process control system or non-computerized parts, and interaction meaning any communication between user and computer that is direct (dialog with feedback and control throughout performance of the task) or indirect (batch processing or intelligent sensors controlling the environment). The importance is the user interacting with the computer in order to accomplish something. For designing a system, HCI is a major part of the design process involving the design, implementation and evaluation of interactive systems such as the thesis interactive system, in the context of the user’s task and work. “HCI is undoubtedly a multi-disciplinary subject. The ideal designer of an interactive system would have expertise in a range of topics: psychology and cognitive science to give her knowledge of the user’s perceptual cognitive and problem-solving skills; ergonomics for the user’s physical capabilities; sociology to help her understand the wider context of the interaction; computer science and engineering to be able to build the necessary technology; business to be able to market it; graphic design to produce an effective interface presentation; technical writing to produce manuals, and so it goes on. There is obviously too much expertise here to be held by one person (or indeed four)…” [5] Since people use computers to accomplish work the major issues of concern are: the people, the computers, the tasks performed, and the usability of the system in supporting the user’s task meaning that if the system forces the user to adopt an unacceptable mode of work then it is not usable. In order to ascertain success and usability of the thesis interactive system, user experience and attitude towards this system as well as if using sign language and gestures is a good way of interacting with computers, the system has to be: useful in accomplishing what it is required namely gesture recognition; usable in order to do it easily and naturally i.e. without danger of too much error; used making people want to use it i.e. be attractive, engaging, fun etc and finally provide feedback on performance such as execution of a command The system needs to be tested to ensure that it actually behaves as expected and meets user requirements therefore evaluation tests the usability, functionality and acceptability of an interactive system and takes place in the laboratory or in the field. The evaluation techniques used are expert analysis, which is done by the designer or a usability expert and is useful for minimizing the cost of early design errors and prototypes, and user participation, which studies actual use of the system and normally requires a working prototype or implementation. In this thesis the latter is used. 
“Evaluation has three main goals: to assess the extent and accessibility of the system’s functionality, to assess users’ experience of the interaction and to identify any specific problems with the system.” [5]

The functionality of the system has to meet the user’s requirements, meaning the design should make it easier for the user to perform the intended tasks by matching the use of the system to the user’s expectations of the task. Evaluation also includes measuring the user’s performance with the system, to assess the effectiveness of the system in supporting the task. Assessing the user’s experience of the interaction and its impact on the user involves aspects such as ease of learning the system, system usability and user satisfaction in terms of enjoyment and emotional response. Finally, evaluation identifies specific problems involving unexpected results or confusion amongst users, trouble spots that can then be rectified.

“User participation in evaluation tends to occur in the later stages of development when there is at least a working prototype of the system in place. This may range from a simulation of the system’s interactive capabilities, without its underlying functionality…through a basic functional prototype to a fully implemented system.” [5]

Once a prototype has been developed, evaluation through user participation can be done using different methods: empirical or experimental methods, which involve working with users and gathering data to be analyzed; observational methods, which involve watching the user interacting with the system while recording their actions; and query techniques, which involve questionnaires and interviews. These methods can be used to get both qualitative and quantitative data [43]. Qualitative evaluation looks at how the user feels about the system, problems experienced, changes they feel might be needed, etc., and can be categorized but not reduced to numerical measurements, whereas quantitative evaluation takes measurements.


8 Realization
The relevance and importance of chapter 7 lay in gaining general background knowledge of:
♦ The Mac OS X environment in which the thesis system was to operate, through studying its architectural layers, including the services provided, such as:
o Connections using device drivers, allowing plug and play when adding devices such as mice, keyboards and Bluetooth modules and devices,
o File systems used during socket connections,
o Core services that include libraries and frameworks, as well as the application services provided when programming interfaces or writing software, useful when designing, implementing, debugging and testing the MRProgram as well as the system,
o How graphics are handled with OpenGL, to understand the DSPrototype,
o Application environments that were used to develop the thesis system,
o Sockets that provide the connection between the MRProgram and the DSPrototype.
♦ Bluetooth technology and how it is used in today’s society, as well as its relevance to the thesis.
♦ Bluetooth on Mac OS X, in connection to the Bluetooth HIDs, i.e. the computer, the Senseboard and the mobile phone, and how these connections are made using a Bluetooth module inserted in a USB port and device drivers; furthermore, what frameworks and applications are used when implementing the MRProgram so as to access the data signals from the Senseboard in order to perform gesture recognition.
♦ SDL, for a better understanding of the DSPrototype, so as to alter it in order to execute commands sent via sockets from the MRProgram.
♦ The Senseboard, which connects to the computer through Bluetooth technology, via the Bluetooth module, and sends continuous data signals as text strings.

♦ The DSPrototype, which was developed in the Mac OS X environment through research and evaluations, as described in three publications from the Siblings interface paradigm project. It is a software prototype used for GUIs, currently applied to music creativity improvisation. In order to search for information, the file system and graphical user interface are exchanged for a database that holds the information and a data surface interaction, seen by the user as a two-dimensional surface whose contents he/she manipulates via keyboard and mouse, using commands such as Play sound, Stop sound, Zoom in, Zoom out, Increase/Decrease volume and Mute sound. Using the Senseboard to send commands to the DSPrototype eliminates the use of the keyboard or mouse: the MRProgram establishes a connection via sockets to the DSPrototype and then sends the sign/gesture commands, which are then executed. Hence it was important to understand this navigation tool in order to later make the changes that would effect the commands, and using the Mac OS X environment eases the connection to the MRProgram as well as setting up the thesis system.

♦ Sign-, body language and gesture study, to analyse what signs and frequently used gestures were needed for the sign/gesture commands used in combination with the Senseboard. This required studying the sign language alphabet, sign language and gestures, as well as the relationship between sign language and speech language from a cognitive perspective. Since we gesture using our body, and through body language we express our feelings and emotions both deliberately and instinctively, it was studied how this modality can be used in this thesis and how it has been used to steer computers. Research was also done on gesture control and sign language recognition, as well as on what interaction designs and techniques have been applied to these. Since sign language is a well-defined language, it is easy to use when interpreting gestures, though the problems lie in the usage of the signs: not everyone makes exactly the same movement, it is difficult to tell the computer when a sign starts and stops, and it is uncertain how fast a person can sign and still be interpreted correctly by the computer. These are difficult issues that have to be controlled in order to make gesture recognition work properly.

♦ Signal processing, the analysis, interpretation and manipulation of signals such as sound, images, radar signals etc.; in this thesis those of interest are the data signals sent as strings by the Senseboard. These signals have to be filtered and processed for gesture recognition.

♦ Three AI algorithms relevant to this thesis that can be used to perform gesture recognition:
Hidden Markov Model (HMM), a representation of a Markov process which cannot be directly observed, a “doubly stochastic” system, which can be used to interactively recognize gestures, perform online learning of new gestures and iteratively update its model of a gesture. The HMM is also an AI search algorithm based on chains of probability, where the probability of a specific event depends on the previous events.


Self-Organizing Maps (SOM), a neural network recognition algorithm invented as a data visualization technique that reduces the dimensions of data, with speech recognition, more precisely speech-to-text transformation, as its first application area. SOMs accomplish two things that can be used at the same time: reducing the dimensions of data by clustering, and displaying similarities.

Growing Hierarchical Self-Organizing Map (GHSOM), proposed as an extension to the SOM, mainly because of shortcomings of the SOM. The GHSOM is an incrementally growing version of the SOM that grows both hierarchically and horizontally according to the data distribution. The structure is similar to a tree, where a SOM at each level branches out to more SOMs at the next level.

The aim was to find out if one, two or all of them could be used in the implementation of the MRProgram, separately or combined, to perform gesture recognition.

♦ HCI, the study of the interaction between one or more users and computers, concerning the physical, psychological and theoretical aspects of this process. In order to ascertain the success and usability of the thesis interactive system, the user experience and attitude towards it, and whether sign language and gestures are a good way of interacting with computers, the system has to be: useful, accomplishing what is required of it, namely gesture recognition; usable, doing so easily and naturally, i.e. without danger of too much error; used, making people want to use it, i.e. be attractive, engaging, fun etc; and finally it must provide feedback on performance, such as the execution of a command.

Based on the research survey, the thesis interactive prototype system, the Minority Report System shown in Figure 11 below, was realised with the following objectives:
♦ Determining the signs/gestures that will be used to manipulate the contents of the DSPrototype,
♦ Conducting technological studies by designing and implementing the MRProgram interface, which includes Bluetooth connection, signal processing, gesture recognition and socket connection, as well as making changes to the DSPrototype for socket connection and command execution.

Figure 11: Minority Report System. The figure shows the Senseboard, the Bluetooth module, the Apple PowerBook with Mac OS X (device drivers/kernel extensions), the MRProgram and the DSPrototype: 1) the user signs/gestures; 2) data signals are sent via the Bluetooth connection; 3) signal processing and gesture recognition; 4) gesture commands are sent through the socket connection; 5) execution of the commands.

8.1 Determining signs/gestures
There are many commands for manipulating the DSPrototype, such as Play sound, Stop sound, Zoom in, Zoom out, Increase/Decrease volume, Help, Mute sound, Write text, Pan etc., but in this project only a few were chosen. Adapting sign language to the Senseboard, and taking into account the commands that manipulate the DSPrototype, resulted in the sign/gesture commands play, stop, zoom in and zoom out, as seen in Figure 12 below. Furthermore, free gesture movement of the hand performs cursor manipulation.



Figure 12: Sign/gesture commands

Play: Pointing upwards, the user moves the finger downwards to point in front of them.

Stop: Fingers upwards and palm facing forward, the user moves the arm forwards, away from them.

Zoom in: Palm facing up and arm extended away from them, the user moves the arm towards them while closing the palm.

Zoom out: The opposite gesture of Zoom in. Half-closed palm facing up, the user moves the arm, extending it away from them, while opening the palm.

8.2 Technological studies
The following steps were taken to implement the thesis interactive prototype system:
♦ Implementing the MRProgram, which includes:
o Bluetooth communication between the mobile phone/Senseboard and the computer, and then the MRProgram,
o Socket connection between the MRProgram and the DSPrototype so as to send commands,
o Gesture recognition and learning with one or more AI algorithms,
o Designing an interface for the MRProgram,
♦ Making changes to the DSPrototype to execute commands.

8.2.1 Bluetooth communication
Another essential framework needed besides Carbon, CoreServices and ApplicationServices is the IOBluetooth.framework. In order for any Bluetooth device and the computer to communicate, there has to be a Bluetooth pairing. Bluetooth pairing happens when two Bluetooth enabled devices agree to communicate with one another and join what is called a trusted pair. When one device recognizes another device in an established trusted pair, each device automatically accepts communication, without the discovery and authentication process that usually happens during Bluetooth interactions.


The Bluetooth Setup Assistant program is used when pairing is set up, and this is what happens when setting up the computer and the Senseboard/mobile phone:
♦ The computer searches for other Bluetooth enabled devices in the area and finds the Senseboard, which has a setting that makes it discoverable when other Bluetooth devices search, announcing its willingness to communicate. After detection, the Senseboard broadcasts its Bluetooth Device Name, which is SenseB_MdH.
♦ The computer asks the user to enter a Passkey or PIN. A passkey (or PIN) is similar to a simple made-up password that is shared and entered by both devices to prove that both users agree to be part of the trusted pair. The passkey is entered on the spot when asked and can be anything, but often it is a zero.
♦ The computer sends the passkey to the Senseboard for comparison, and the Senseboard, using its own standard, unchanging passkey, sends this back to the computer. If the Senseboard’s passkey is the same as the one entered on the computer, a trusted pair is automatically formed.

Once a trusted pair is formed, communication between the two devices becomes relatively seamless and does not require the standard authentication process above, which occurs between two devices that are strangers. In System Preferences, under Bluetooth, one can see all the Bluetooth devices that are or have been connected to the computer: whether the devices are paired or connected, their device names and addresses etc. In the MRProgram:
♦ Using the Bluetooth device address, a device reference is created, whereby there is only one reference within a single application for a given remote address.
♦ A baseband connection to the device is then opened and, if it is successful, an RFCOMM channel is opened. If this is also successful, a callback is registered for events generated by the RFCOMM channel, in this case the data signals sent by the Senseboard. If one of the above steps fails, the whole connection is closed and the process starts over.
♦ When disconnecting the Bluetooth connection, the RFCOMM channel is closed first, which starts an inactivity timer that will close the baseband connection if no other channels (L2CAP or RFCOMM) are open after a set period of time.
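
This connect-and-teardown logic amounts to a small state machine. As an illustration only, the C sketch below captures that control flow; the helper names (open_baseband, open_rfcomm, register_callback, close_all) are hypothetical stand-ins, not the real IOBluetooth API calls:

    /* Control-flow sketch of the connect/retry logic described above.
     * The helpers are hypothetical stubs standing in for the real
     * Bluetooth calls, which are not reproduced here. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool open_baseband(void)     { return true; }
    static bool open_rfcomm(void)       { return true; }
    static bool register_callback(void) { return true; }
    static void close_all(void)         { puts("connection closed"); }

    /* Bring up the full connection; on any failure, tear everything
     * down and report failure so the caller can start over. */
    bool connect_senseboard(void)
    {
        if (!open_baseband() || !open_rfcomm() || !register_callback()) {
            close_all();            /* one step failed: close and retry */
            return false;
        }
        return true;                /* RFCOMM events now reach the callback */
    }

    int main(void)
    {
        while (!connect_senseboard())
            ;                       /* the whole sequence restarts on failure */
        puts("Senseboard connected");
        return 0;
    }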

8.2.2 Socket connection
The MRProgram acts as the server side and uses the following system calls [21]:
♦ socket() to create a socket, which returns a file descriptor,
♦ bind() to attach an address to the socket, i.e. “./sample-socket”,
♦ listen() to establish connections to that socket.

Information can now be passed through the socket by the read() and write() functions. A four-character message, i.e. ‘StoC’, is sent using the write() function to ensure that the socket is working, and the read() function is called so as to receive information from a client connection, in this case the DSPrototype. The socket is now ready, or pending, to be connected to the DSPrototype. If any of the above steps fail, the whole connection is closed and the process has to start from the beginning. When the DSPrototype as a client process wants to connect to the server, it creates a socket in a similar fashion to the server side, with the exception that it tells the system the address earlier established by the server that it wants to connect to, i.e. “//Users//build//sample-socket”, and then through connect() tries to connect to the desired address. A message, i.e. ‘CtoS’, is also sent, and further communication is the same as on the server side. When the server accept()s the client connection, a communication channel is established, and when the messages are read by both sides, it is acknowledgment that the connection has been formed. Thereafter information can be sent and received by both processes. A socket is close()d when a process is finished with it.
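
As an illustration of these system calls, the following C sketch shows a minimal server side, assuming a UNIX domain socket at the path and with the four-character handshake messages mentioned above; it is a sketch, not the thesis source code:

    /* Minimal sketch of the server-side (MRProgram) socket setup. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    #define SOCK_PATH "./sample-socket"   /* address used in the thesis */

    int main(void)
    {
        struct sockaddr_un addr;
        char buf[5] = {0};

        int server_fd = socket(AF_UNIX, SOCK_STREAM, 0);  /* create the socket */
        if (server_fd < 0) { perror("socket"); exit(1); }

        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, SOCK_PATH, sizeof(addr.sun_path) - 1);
        unlink(SOCK_PATH);                                /* remove a stale socket file */

        if (bind(server_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); exit(1); }
        if (listen(server_fd, 1) < 0) { perror("listen"); exit(1); }

        int client_fd = accept(server_fd, NULL, NULL);    /* wait for the DSPrototype */
        if (client_fd < 0) { perror("accept"); exit(1); }

        write(client_fd, "StoC", 4);      /* handshake: server-to-client */
        read(client_fd, buf, 4);          /* expect “CtoS” back */
        printf("handshake reply: %s\n", buf);

        write(client_fd, "PLAY", 4);      /* a four-character gesture command */

        close(client_fd);
        close(server_fd);
        return 0;
    }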


8.2.3 Gesture recognition and Learning
The Hidden Markov Model algorithm was chosen since it was easier to grasp and understand, though a simpler version of it is used in this thesis. Given that there are three problems associated with HMMs [39], the ones relevant to this thesis are the first one, namely the Evaluation problem, for gesture recognition, and the third one, namely the Learning problem, for generating the HMMs used in gesture recognition.
♦ The Evaluation problem: recognizing a gesture from a given set of input data. Signal processing is first done on the signals sent by the Senseboard, and then it is determined which of a set of HMMs, each modelling a different gesture, is most likely to have generated that sequence. In addition, the problem of ambiguity between two or more gestures has to be solved.
♦ The Learning problem: how do we adjust the HMM parameters so that the given set of observations, the training set, is represented by the HMM in the best way, i.e. developing the HMM which will be associated with a gesture.
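
In the standard HMM formulation, the Evaluation problem amounts to computing, for each gesture model $\lambda_g = (A, B, \pi)$, the probability that it generated the observation sequence $O = o_1, \ldots, o_T$, and picking the most likely gesture:

    P(O \mid \lambda) = \sum_{q_1, \ldots, q_T} \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t), \qquad g^{*} = \arg\max_{g} P(O \mid \lambda_g)

Here $A$ holds the state transition probabilities, $B$ the observation probabilities and $\pi$ the initial state distribution. The simplified version used in this thesis replaces this computation with intervals of parameters per gesture, as described below.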

Since the data signals from the Senseboard to the MRProgram are sent as strings of hex numbers, signal processing involves extracting and performing operations on the relevant information in a string, which is the first character and the z, x and y coordinates, for gesture recognition. The strings are thus read one string and one character at a time. The button on the Senseboard that changes the values of the data signals when pressed denotes two types of gestures, depending on the first character of the string. If the value button on the Senseboard is not pressed, the first character in the string is “1”, i.e. [1 098A-0215 BCFD CR], which denotes a sign/gesture for cursor movement; otherwise the first character is “0”, i.e. [0 –0123 ABCD 1E4F CR], denoting another type of sign/gesture such as play and stop. I chose to call these modes cursor mode and gesture mode.

In cursor mode, the extracted string values are processed into decimal numbers, which are then converted into x and y screen coordinates (positive integers) used to determine a gesture from the direction of the cursor: moving the hand from one side to the other is a gesture that moves the cursor in that direction, string by string. Even though cursor movement is a gesture, no HMMs are generated for its recognition. The coordinates are sent directly in a string of four characters, i.e. 1234, through the socket to the DSPrototype for execution (manipulation of the onscreen cursor).

In gesture mode, all the input signal strings with the first character 0 are read and stored until the first character is 1. The stored strings are then processed into decimal numbers and evaluated by a set of HMMs, in this case intervals of parameters, each modelling a different gesture, in order to perform gesture recognition. To ease recognition and resolve ambiguity between gestures, the first strings determine where the sign began and the last string where it ended, hence the direction of the gesture. Adjusting and updating the HMMs is done manually after each gesture, for optimisation. When a gesture is validated by the HMMs, a string of four characters, such as “PLAY”, “STOP”, “ZMIN” or “ZOUT”, is sent through the socket connection to the DSPrototype for execution.

Due to time limitations and constraints, online learning of the gestures was not implemented, so learning of the gestures was done offline with every user, to further improve recognition. The MRProgram is written in Objective-C++, and I tested it to determine its ability to accurately recognize gestures as well as to optimise the HMM parameters.
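
To make the two modes concrete, the C sketch below shows the dispatch on the first character and the interval matching; the exact field layout of the Senseboard strings, the accumulation of gesture-mode strings and the interval values are simplified assumptions for illustration only:

    /* Sketch of the mode dispatch and interval-based matching described
     * above. Field layout and intervals are illustrative assumptions. */
    #include <stdio.h>
    #include <string.h>

    typedef struct { const char *cmd; long lo, hi; } GestureInterval;

    /* Hypothetical per-gesture parameter intervals, tuned offline per user. */
    static const GestureInterval gestures[] = {
        { "PLAY", 0x0000, 0x03FF },
        { "STOP", 0x0400, 0x07FF },
        { "ZMIN", 0x0800, 0x0BFF },
        { "ZOUT", 0x0C00, 0x0FFF },
    };

    /* Process one Senseboard string, e.g. "1 098A 0215 BCFD". */
    void process_line(const char *line)
    {
        char mode;
        unsigned long z, x, y;
        if (sscanf(line, "%c %lx %lx %lx", &mode, &z, &x, &y) != 4)
            return;                           /* malformed string: skip */

        if (mode == '1') {
            /* Cursor mode: convert the hex fields into screen coordinates
             * and forward them, e.g. as the four-character string "1234". */
            printf("cursor -> x=%lu y=%lu\n", x % 1024, y % 768);
        } else {
            /* Gesture mode: match the processed parameter against the
             * per-gesture intervals (the simplified HMM evaluation). In
             * the thesis, strings are first accumulated until a string
             * starting with 1 arrives; that step is omitted here. */
            for (size_t i = 0; i < sizeof(gestures)/sizeof(gestures[0]); i++)
                if ((long)z >= gestures[i].lo && (long)z <= gestures[i].hi)
                    printf("recognized -> %s\n", gestures[i].cmd);
        }
    }

    int main(void)
    {
        process_line("1 098A 0215 BCFD");  /* cursor movement */
        process_line("0 0123 ABCD 1E4F");  /* a gesture such as play */
        return 0;
    }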

8.2.4 MRProgram interface
Using Xcode, a Carbon Application (nib-based) project was created, and using Interface Builder a simple interface, shown in Figure 13 below, was created for the gesture recognition program, the MRProgram, for testing and debugging. This interface is not seen by the user and is only used to set up the thesis system.


Figure 13: User Interface

When the connect button is pressed, Bluetooth communication with the Senseboard commences, a server socket connection is established, and the program is ready for cursor management and gesture recognition. The disconnect button stops the Bluetooth communication, but one can reconnect.

8.2.5 Changes to the DSPrototype
In order for the DSPrototype to receive the information, or commands, sent by the MRProgram, a client socket is created from which incoming data can be read. This data is interpreted and results in the immediate execution of zoom actions, while the rest of the commands are converted into an event, SDL_USEREVENT. The x and y coordinates of where the cursor is hovering are also saved, to know whether a command can be carried out on a valid location, i.e. playing a tune and not just empty space or outside the application window, and then the event is put in the event queue. Actions are executed depending on what events are in the event queue, i.e. play, stop etc.
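
As an illustration of this mechanism, the following C sketch (SDL 1.2-era API, matching the system’s time frame) queues a received command as an SDL user event and handles it in the event loop; the command codes and the saved-coordinate structure are assumptions, not the DSPrototype’s actual code:

    /* Sketch of turning a received command string into an SDL user event.
     * Assumes SDL has already been initialised by the application. */
    #include "SDL.h"
    #include <string.h>

    enum { CMD_PLAY, CMD_STOP, CMD_ZMIN, CMD_ZOUT };

    typedef struct { int x, y; } CursorPos;   /* where the cursor was saved */

    /* Convert a four-character command from the socket into a queued event. */
    void enqueue_command(const char *cmd, CursorPos *pos)
    {
        SDL_Event event;
        memset(&event, 0, sizeof(event));
        event.type = SDL_USEREVENT;
        if      (strncmp(cmd, "PLAY", 4) == 0) event.user.code = CMD_PLAY;
        else if (strncmp(cmd, "STOP", 4) == 0) event.user.code = CMD_STOP;
        else if (strncmp(cmd, "ZMIN", 4) == 0) event.user.code = CMD_ZMIN;
        else if (strncmp(cmd, "ZOUT", 4) == 0) event.user.code = CMD_ZOUT;
        else return;                          /* unknown command: ignore */
        event.user.data1 = pos;               /* the command acts at this location */
        SDL_PushEvent(&event);
    }

    /* Inside the main loop, act on queued commands. */
    void handle_events(void)
    {
        SDL_Event event;
        while (SDL_PollEvent(&event)) {
            if (event.type == SDL_USEREVENT) {
                CursorPos *pos = (CursorPos *)event.user.data1;
                switch (event.user.code) {
                case CMD_PLAY: /* play the item under (pos->x, pos->y) */ break;
                case CMD_STOP: /* stop playback */                       break;
                /* in the thesis, zoom commands were executed immediately
                 * rather than queued */
                }
                (void)pos;
            }
        }
    }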

8.3 Setting up the interaction
The following steps are used to set up the entire interaction:
♦ The MRProgram is run first, which shows the application interface window.
♦ Pressing the connect button initiates the Bluetooth connection to the Senseboard and creates a server socket connection. Thereby the program is ready to be used for gesture recognition and cursor management.
♦ The DSPrototype is then run, resulting in the two-dimensional surface of the navigation tool and the creation of the client socket connection.
♦ Messages can then be sent to and from both programs, acknowledging that the socket connection has been formed.

Users can now use the Senseboard to manipulate the contents of the navigation tool using gestures.

8.4 An example of using the thesis prototype system
What the user sees is the information presented on the two-dimensional surface of the DSPrototype, which he/she manipulates through the Senseboard using gestures. Cursor mode is the standard mode unless the value button on the Senseboard is pressed. With the help of the Senseboard, the user moves the cursor to an item he/she wants to manipulate and presses a button on the Senseboard to stop the cursor from moving and to save the coordinates. Then, while pressing the button again, the user signs/gestures the command, i.e. play, which is recognized by the MRProgram as a valid gesture and interpreted into a play command. The message ‘PLAY’ is sent through the socket to the DSPrototype, which reads it from the socket event queue and executes the command, hence playing the item over which the cursor was standing at the time the coordinates were saved.


9 Evaluation
In the field of HCI, empirical studies were conducted through user studies to ascertain and evaluate the success and usability of the Senseboard, the commands (that is, whether using sign language and gestures is a good way of interacting with computers), the MRProgram and the interactive system, as well as user experience and attitude, in terms of satisfaction, towards all of these. The evaluation technique I used is the user participation technique [43], which studies the actual use of the interactive system and includes observation techniques, gathering information by observing users interacting with the system, and query techniques, based on asking the users about the system directly so as to get their detailed views on needs, preferences, impressions, experience, requirements and tasks. This was done on a working prototype implementation to gather qualitative data, which resulted in improvements made to the interaction after the evaluation of each test subject, and to ascertain whether the interactive system was useful, usable and used, and provides feedback on performance.

9.1 Users
Initially there was supposed to be a more extensive user study, with a minimum of 20 users/test subjects between the ages of 13 and 70. However, only 5 were used, because it was more difficult than first thought to find the time and place to conduct the evaluations. The users were 2 men and 3 women between the ages of 20 and 65, who did not require specific knowledge of Bluetooth or the use of a computer, although knowing how to handle a computer, that is, using one at least every other day, makes for an easy introduction to the Senseboard. Regarding the types of users [43], 2 were expert users, proficient in the use of computers; one was a mandatory user, who has to use the system as part of their job; and the rest were discretionary users, who do not have to use the computer system as part of their job. Testing on subjects between 20 and 65 may have influenced the results, since younger people tend to have more computer use and experience.

9.2 Studies
The evaluation tests took place in the laboratory [43], whereby the users were taken out of their normal work environment to take part in controlled experiments. They were carried out in a quiet room at school as well as at home, in order to observe the users in an interruption-free environment and get their individual responses without outside influence. Query techniques have the advantage of possibly revealing issues not previously considered, and to gain qualitative information the query techniques I used were debrief interviews, carried out while the users were experiencing the interactive system, and a questionnaire after the experience. The evaluation was a solo user experiment whereby one user at a time was interviewed in front of a computer screen. I also wanted to be able to address problems that could arise during the interaction, as well as get feedback on problems I may not have anticipated.

Beginning the experiment, the users received a code of conduct [43] that involved a brief oral introduction, so that they understood the purpose of the experiment, what was expected in terms of tasks to carry out to grasp the system, and that the system was being tested, not them. They were given further instructions on the sign/gesture commands used, as well as shown how to sign them using the Senseboard. Then they were given the freedom to get acquainted with the system and test the commands for themselves.

The observation techniques, whereby I watched and recorded user actions, emotional expressions and reactions, included the think aloud method, which allows for verbal communication, whereby the users were asked to think aloud while they carried out activities with the system; the protocol method [5] of paper and pencil, which allowed me to take manual notes on interpretations and events as they occurred; and computer logging, where the system automatically records user actions and reports what the user is doing on the system, in this case recording the different values from individual user signs/gestures in order to update the HMMs and improve gesture recognition. Think aloud involves the user commenting on usability problems, satisfaction, bugs, impressions and the different aspects of the system, and the evaluator asking the user questions or clarifying problems as they occur, to maximize the effectiveness of identifying problem areas. I asked them to comment spontaneously, aloud and honestly about what they thought of the Senseboard, how it could be used, how it felt and looked, anything positive or negative, and to comment on the commands and their execution, as well as the interactive system as a whole. I carefully noted their answers and reactions so as to evaluate them later.


Conducting lab studies instead of field studies (conducted in the user’s work environment) may have resulted in recording situations that might never arise in the real world, as well as missing observations because of the unnatural situation. Although observation is a good way of finding out how tasks are done and seeing what the problems are, it is important that the observer does not interfere with the workflow, because people being watched can act in a different way [43]. With computer logging, the huge quantities of data generated were somewhat tedious and time consuming to deal with, and with paper and pencil, the level of detail captured was limited by writing speed. With the think aloud method, valuable insight was gained about how the users operated the system and their strategies for carrying out tasks, though some did require encouragement through questions and comments. However, by using a combination of different methods and techniques, they complemented one another.

9.2.1 Questionnaire
A questionnaire is a method of querying the user with questions fixed in advance, and is likely to be less probing. The questionnaire was self-administered [43], where the users read and filled in the answers without assistance, although this can result in poor control over the answers. It was done afterwards, so that the test subjects were more aware of the thesis system and the use of the Senseboard, and had a general picture of the system as a whole. The styles of questions [5] on the questionnaire were general questions, which helped establish the background of the user in terms of age, gender and occupation, and open-ended questions, which asked the user to freely provide his/her own unprompted opinion, i.e. by making suggestions on improvements.

The users were given questionnaires consisting of 11 questions to fill out, with instructions that only reinforced that I needed anything positive or negative they could think of, or possibly imagine, about the thesis system, because this would help me make improvements. I tried to make the questionnaire as self-explanatory as possible, so as not to influence its outcome in case the test subjects should ask for more explanations, but I was nearby to answer any questions that could arise. I felt that the questionnaire was an excellent way of eliciting information about what to revise and redo. On the one hand, as an afterthought, I should have made some initial attempts with the questionnaire during the research study, so as to make changes in order to receive optimal information; on the other hand, I obtained rather precise information from it after the test subjects had used the Senseboard and experienced the interaction. Open-ended questions can produce too much data, not easily analyzed because of its diversity, but they helped in identifying errors and produced suggestions I had not considered.

9.3 Compilation of User Responses
The answers given in the interviews as well as in the questionnaires were analysed and processed to draw relevant qualitative conclusions. Common to all subjects was a positive attitude towards using the Senseboard to execute commands whose effects could be seen and experienced directly, as well as excitement about how it works and the opportunity to use it. The most common comments were “Wow, I can sit in my bed or sofa and control anything at home”, which means less clutter, “This is good because am lazy sometimes” and “This is fun and cool for music and games”, as well as “This is useful for the deaf too”. The majority felt that it definitely was of use if developed further, and that it could be used in anything from teaching to studying to playing music etc. The signs were easy to understand and remember, as well as to relate to in regards to what they were meant to do. The greatest disadvantage was the irritation of having to repeat a sign many times before it was executed, and difficulties in moving the cursor, even though gesture recognition was improved and updated after each subject’s evaluation. This helped immensely in correcting bugs, decreasing faults and improving gesture recognition in the system, and the difficulties experienced by the last subject were fewer than those of the first.


10 Results
The results of the thesis were more or less realized in accordance with the specifications and aims given at the beginning. The first part of the project, the research, turned out as expected, with a few minor adjustments, in the form of weekly reports. There was, however, a great deal of information to sift through, and the problem became how to handle such an overflow. Not being able to meet a deaf person, so as to gain more insight into sign language, was a problem, but the library helped. Learning the Mac OS X system and environment, as well as finding applications and programming languages that were compatible, took time, but the knowledge gained was invaluable.

The application program MRProgram had a rough start in implementing functions that work, but that was later rectified, even though much time was lost. A simpler version of the HMM algorithm was used for gesture recognition, and the information gathered on all of the AI algorithms was invaluable for understanding what can be used and developed further in the future. Socket communication also took some time to figure out, but works well with the kind of information sent at a fast pace. However, there was no time to deepen knowledge of the inner workings of the DSPrototype, so that part of the project remained mainly superficial, enough to get the commands working. Unfortunately, there was no time to implement more signs and gestures than the 5 chosen, namely: cursor movement, play, stop, zoom in and zoom out.

Using the user participation technique for evaluation, user studies were carried out in a laboratory on a working prototype implementation to gather qualitative data, which resulted in improvements and re-evaluation of the interactive system for better gesture recognition, so that the commands would be executed quickly without too many repetitions. The use of test subjects helped in narrowing down the differences in how different people sign, which in turn improved the gesture recognition. The users deemed the interactive system useful, since it accomplishes what is required of it, namely gesture recognition; usable, with further development, in order to sign/gesture as well as move the cursor easily and naturally without danger of too much error; used, since they could envision many uses, i.e. the Senseboard and MRProgram for teaching the deaf or controlling their environment, and found the system engaging and fun; and finally it provides feedback on performance by executing commands such as play, stop, zoom in and zoom out. The results were encouraging for updating and further improving the gesture recognition in the system.

The final result of this thesis is a multi-modal interactive prototype system run in the Mac OS X operating system environment, whereby the MRProgram connects via Bluetooth, a Bluetooth module and device drivers to the Bluetooth HID Senseboard; connects to the Siblings navigation tool DSPrototype via sockets; performs gesture recognition, via signal processing and then AI algorithm application, on the input signals from the Senseboard; converts them into commands; and sends them to the DSPrototype, where they are executed. This works well, even though it still takes time and a few tries for a command to be interpreted correctly. Unfortunately, learning of gestures is done offline, but that can also be rectified with further development. Through the qualitative evaluations I found strong indications that the approach of using the Senseboard and MRProgram can be useful in teaching, playing music etc.


11 Conclusions
The company Senseboard Technologies AB wanted to examine how their Senseboard could be used, in place of the keyboard and mouse, to control the navigation tool DSPrototype via Bluetooth. This was to be realized by developing a software prototype system, which includes the Senseboard, used to send sign/gesture commands as data signals; a program that connects through Bluetooth to the Senseboard and through sockets to the DSPrototype, and performs gesture recognition; and lastly the DSPrototype, which executes the commands. The user sees the DSPrototype as information content presented on an infinitely large two-dimensional surface and, by using the Senseboard, can manipulate it through commands such as play and stop. Hence the purpose of this thesis, Minority Report System, Gesture Recognition with Senseboard for Siblings Data Surface Interface Paradigm Prototype (DSPrototype), was to develop this software prototype system, which interacts with the user by recognizing gestures from earlier input data through the system’s prototype program, and which furthermore was to update its understanding of gestures it already knows in an online, interactive manner.

This thesis report describes how a functioning multi-modal interactive software prototype system was developed, used and can be applied. The system uses the program Minority Report Program (MRProgram) to connect to the Senseboard, perform gesture recognition on the input data signals, and send commands through a socket connection to the DSPrototype, which executes them. In order to answer the questions stated in the problem formulation of this thesis and realize its purpose, the work was split into different categories and different methods were used:

♦ Conceptual studies were conducted, covering related work and relevant theories in gesture control and sign language, and looking at what has been done earlier in the fields of Human Hand Gesture Research and Sign Language Recognition.

♦ A research survey was conducted in order to understand the different parts, subjects and tools of the thesis, as well as the relevance and importance of having general background knowledge of: the Mac OS X environment in which the thesis system was to operate; sockets, used for communication between the MRProgram and the DSPrototype; Bluetooth technology and Bluetooth on Mac OS X, for the connections between the Bluetooth HIDs, as well as the frameworks and applications used when implementing the MRProgram; the Open Graphics Library (OpenGL), used in combination with the Simple DirectMedia Layer (SDL) to build the DSPrototype; the Senseboard, used to send commands to manipulate the DSPrototype; the DSPrototype, in order to later make the changes to it that would effect the commands; sign-, body language and gesture study, to analyse what signs and frequently used gestures were needed for the sign/gesture commands; signal processing, for the data signals sent as strings by the Senseboard; Artificial Intelligence (AI) algorithms relevant to this thesis that can be used to perform gesture recognition; and finally Human-Computer Interaction (HCI), in order to ascertain the success and usability of the thesis interactive system.

♦ Based on the research survey, the following objectives were realised: determining the signs/gestures that were used to manipulate the DSPrototype, which resulted in the sign/gesture commands play, stop, zoom in, zoom out and cursor manipulation; and conducting technological studies by designing and implementing the MRProgram and making changes to the DSPrototype to execute commands.
♦ In the field of HCI, empirical studies were conducted through user studies to ascertain the usability of the Senseboard with the MRProgram and the system, user experience and attitude, as well as whether using sign language and gestures is a good way of interacting with computers.

Using this information and knowledge gained from the research, the questions posed in the problem formulation chapter are answered as seen below:

♦ How do the Senseboard and DSPrototype work? The Senseboard is a two-part device whereby both parts are identical and are placed on the hands, but only one was needed for this thesis. It connects to the computer through Bluetooth technology, via the Bluetooth module, and sends data signals with z, x and y coordinates as text strings in hex. The DSPrototype is a navigation tool seen by the user as information content presented on an infinitely large two-dimensional surface, manipulated through commands by keyboard and mouse, and later by the Senseboard through sign/gesture commands.

♦ How does the Senseboard connect to the Apple PowerBook computer, and how is the information sent from the former accessed? Bluetooth is a wireless short-range radio technology for connecting mobile devices, and since the Apple PowerBook computer runs the Mac OS X operating system, Apple Bluetooth support is integrated into it, transparently performing Bluetooth connection-oriented tasks. Via a Bluetooth module and device drivers, the computer establishes a connection to the Senseboard, whereby a trusted pair is formed between them. The MRProgram receives information from the Senseboard through this connection.

♦ How do we determine what sign/gesture commands are to be used with the Senseboard? Using the information gained in the research of the sign-, body language and gesture study, 5 signs/gestures were determined as commands, namely: play, stop, zoom in, zoom out and cursor manipulation.

♦ How are gesture recognition and learning of gestures performed? A simpler version of the Hidden Markov Model (HMM) AI algorithm was used, involving the use of the first character in the string, as well as intervals of parameters to establish the different commands. Signal processing, which involves extracting and performing operations on the relevant information in a string, is first done on the input data signals, and then it is determined which of a set of HMMs, each modelling a different gesture, is most likely to have generated that sequence. The learning of gestures is done offline, by adjusting and updating the HMMs manually with every user to further improve recognition.

♦ How does the socket connection between the Minority Report Program and the DSPrototype work? Sockets are a method for virtual communication between a client program and a server program in a network or on the same computer. Through sockets, information such as the commands that manipulate the DSPrototype is passed between the thesis program, which is the server program, and the DSPrototype program. A string of four characters, such as “PLAY”, “STOP”, “ZMIN” or “ZOUT”, is sent through the socket connection to the DSPrototype for execution.

♦ How do we modify the DSPrototype so as to execute the commands? In order for the DSPrototype to receive the information, or commands, sent by the MRProgram, a client socket is created from which incoming data can be read. This data is interpreted and results in the execution of actions such as play and stop.

♦ How will users respond to the use of the Senseboard, the sign/gesture commands and the system? The user responses were mainly positive towards using the Senseboard, and the sign/gesture commands were found understandable. The users also felt that, for further use, there should be development in areas such as cursor manipulation and the sign/gesture commands, so as to decrease the irritation of having to repeat a sign a few times before it is executed, and to make manipulation of the cursor easier. Overall, they were excited about how it works, and how it will work when developed further, since they found it fun and had ideas as to how to use it.

The users found the visual representation appealing and took delight when a command was seen to be executed. Sign/gesture information can be read from the Senseboard, and commands sent to the DSPrototype are executed without delay, although the gestures have to be repeated a number of times before they are recognized, and cursor manipulation is slow because of the way the Senseboard has to be handled in order to move the cursor. Therefore more research and work on gesture recognition and learning is needed, in terms of signal processing and AI algorithms, together with people experienced in these fields, for accurate reading and manipulation of the data as well as the cursor. This would make for easier and faster use of the Senseboard.

Though the set of commands implemented was minimized and testing was done with fewer test subjects than initially suggested, the idea of using signs/gestures, without a keyboard or mouse, to control the DSPrototype was accomplished. Test subjects helped in narrowing down the differences in how different people sign, and also in re-evaluating some of the implementation in order for the commands to be executed quicker. What is needed is more test subject evaluations for precision in gesture recognition, so that the AI algorithms can learn and interpret the sign/gesture commands quicker, and also in an online, interactive manner, and for feedback on what signs/gestures can be used, which are logical, etc.


11.1 Future works
This Master’s thesis project covers a broad range of topics, such as Operating Systems, Mac OS X, Device drivers, Computer Graphics, OpenGL, SDL, Bluetooth, the Senseboard, Sign- and body language, Gestures, Signal processing, Artificial Intelligence, Gesture Recognition, Human-Computer Interaction etc., and there is great potential for further development in all these fields. There are many ways of developing the use of the Senseboard, i.e. controlling other programs or devices, adding more sign/gesture commands, improving precision in gesture recognition, or using the Senseboard for drawing purposes, for presentations, as a virtual keyboard or mouse, for the handicapped etc. Since this is a cutting-edge area, there are no limitations except imagination.


12 References
All links were accessed and checked to be valid on 30th April 2007.

12.1 Literature:
[5] Abowd Gregory D., Beale Russell, Dix Alan, Finlay Janet, (2004), “Human-Computer Interaction, Third Edition”, Pearson Education Limited, England.
[21] Johnson Michael K., Troan Erik W., (1998), “Linux Application Development”, Addison Wesley Longman, Chapter 16.
[36] Dr. Ennis-Cole, Dr. Mortensen Mark, Parton Becky Sue, (Year ????), “Signs of the Time: A Review of the Impact of Artificial Intelligence on Sign Language and Deaf Education Past, Present and Future” (Publisher name and address ????).
[37] Bergman Brita, (Year ????), ”Tecknad Svenska”, TUFF – Teckenspråksutbildning för föräldrar, Utbildningsdepartementet (Publisher name and address ????).
[39] Ifeachor C. Emmanuel, Jervis W. Barrie, (1993), “Digital Signal Processing: A Practical Approach”, Addison-Wesley Publishing Company.
[43] Faulkner Xristine, (2000), “Usability Engineering”, Grassroots Series, Macmillan Press Ltd.

12.2 White papers:
[2] Larsson Thomas, Lindell Rikard, (2005), “The Data Surface Interface Paradigm”, in proceedings of TPCG ’05, Eurographics Association, Vol. 27, No. 3, pp 289-350.
[6] Lee C., Yangsheng Xu, (1996), “Online, interactive learning of gestures for human/robot interfaces”, Proceedings of the 1996 IEEE International Conference on Robotics and Automation, Volume 4, pp 2982-2987.
[7] Meade Alexander, (1987), “Dexter – A finger-spelling hand for the deaf-blind”, International Business Machines Corporation, Information Products Division, Charlotte, NC, Proceedings of the IEEE International Conference on Robotics and Automation, Volume 4, pp 1192-1195.
[8] Jaffe David L., (1994), “RALPH, a fourth generation fingerspelling hand”, Rehabilitation Research and Development Center, 1994 Report, VA Medical Center, pp 32.
[9] Kramer James, Leifer Larry J., (1990), “A ‘Talking Glove’ for nonverbal deaf individuals”, Technical Report CDR TR 1990 0312, Center for Design Research, Stanford University.
[10] Huang T., Pavlovic V., Sharma R., (1995), “Visual interpretation of hand gestures for human-computer interaction: A review”, Technical Report UIUC-BI-AI-RCV-95-10, University of Central Florida.
[11] Sims M., Wideman C., (1998), “Signing avatars”, Technology & Persons with Disabilities Conference.
[33] Lindell Rikard, (2003), “Users Say: We Do Not Like to Talk to Each Other”, Graphical Communication Workshop ’03, Queen Mary University of London.
[34] Lindell Rikard, (2003), “When Information Navigation Divorces File Systems - Database Surface Prototype Results”, The Good, the Bad, and the Irrelevant, The user and future of information and communication technologies, COST Action 269, University of Art and Design Helsinki.
[35] Larsson Thomas, Lindell Rikard, (2005), “The Data Surface Interaction Paradigm”, in proceedings of TPCG ’05, Eurographics Association.
[38] Kadous M. Waleed, (1996), “Machine recognition of Auslan signs using PowerGloves: Towards large-lexicon recognition of sign language”, Proceedings of the Workshop on the Integration of Gesture in Language and Speech, Wilmington, DE, pp 165-174.
[40] Warakagoda N. D., (1996), “A hybrid ANN-HMM ASR system with NN based adaptive preprocessing”, Master’s thesis report, Institutt for Teleteknikk, Transmisjonsteknikk, http://jedlik.phy.bme.hu/~gerjanos/HMM/hoved.html.
[42] Dittenbach M., Merkl D., Rauber A., (2000), “The Growing Hierarchical Self-Organizing Map”, in Proceedings of the International Joint Conference on Neural Networks 2000 (IJCNN’2000), pp 24-27, Como, Italy.


12.3 Links:
[1] Senseboard Technologies AB: http://www.senseboard.se
[3] The Siblings project: http://www.mrtc.mdh.se/index.phtml?choice=publications&year=any&project=0035
[4] Sony Ericsson Mobile Communications AB: http://www.sonyericsson.com/spg.jsp?cc=se&lc=sv&ver=4002&template=ph1&zone=ph
[12] Research study at the Florida school for the Deaf and Blind: http://jdsde.oxfordjournals.org/cgi/content/full/11/1/94
[13] FreeBSD: http://www.freebsd.org/
[14] Mac OS X Architecture: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html; Mac OS X Architectural Overview -> A Layered Look at the Mac OS X Architecture.
[15] Core OS: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html; System-Level Technologies -> Core OS.
[16] Application-Level Technologies: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html -> Application-Level Technologies.
[17] Core Foundation: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html; Application-Level Technologies -> Core Foundation.
[18] Graphics, Imaging and Multimedia: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html -> System-Level Technologies -> Graphics, Imaging and Multimedia.
[19] Application Environments: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html -> Software Development Overview -> Application Environments.
[20] User Experience: http://developer.apple.com/documentation/MacOSX/Conceptual/OSX_Technology_Overview/index.html -> System-Level Technologies -> User Experience.
[22] Ericsson Company: http://www.ericsson.com/about/compfacts/history/index.shtml
[23] Bluetooth SIG: http://www.Bluetooth.com/about/
[24] http://www.ericsson.com/technology/tech_articles/Bluetooth.shtml
[25] Bluetooth overview: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html; Bluetooth Technology Basics -> Bluetooth Overview.
[26] How Bluetooth works: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html; Bluetooth Technology Basics -> How Bluetooth Works.
[27] Bluetooth on Mac OS X: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html -> Bluetooth on Mac OS X.
[28] Mac OS X Bluetooth protocol stack: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html -> Bluetooth on Mac OS X -> The Mac OS X Bluetooth Protocol Stack.
[29] Mac OS X Bluetooth profiles and applications: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html -> Bluetooth on Mac OS X -> The Mac OS X Bluetooth Profiles and Applications.
[30] Mac OS X Bluetooth API Overview: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html -> Bluetooth on Mac OS X -> The Mac OS X Bluetooth API Overview – Two Frameworks.
[31] Bluetooth classes in Mac OS X: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/Bluetooth/index.html -> Bluetooth on Mac OS X -> The Bluetooth Classes.
[32] SDL: http://www.libsdl.org/
[41] Self-organizing maps: http://davis.wpi.edu/~matt/courses/soms/, Tom Germano, March 23, 1999.


13 Appendices

13.1 User Questionnaire – Minority Report

1. Gender, Age and Occupation?
2. Are you a frequent user of the computer?
3. Did you know anything about Bluetooth before?
4. What do you think is advantageous about the use of the Senseboard? Why?
5. What do you think is disadvantageous about the use of the Senseboard? Why?
6. Do you think it’s useful? Why and for what, or why not?
7. Would you use it yourself? Why and for what, or why not?
8. What do you think about the different functionalities, i.e. play, stop, zoom etc?
9. What do you think about the signs, i.e. easy, hard, useful, understandable etc?
10. Are there other signs you can think of that would suit better?
11. Anything else, i.e. suggestions about changes, areas of use?


13.2 Feedback from Questionnaire
Answers from two of the test subjects are reproduced below.

1. Gender, Age and Occupation?
Subject 1: Female, 33, Receptionist
Subject 2: Female, 28, Student

2. Are you a frequent user of the computer?
Subject 1: Yes
Subject 2: Yes

3. Did you know anything about Bluetooth before?
Subject 1: Not at all
Subject 2: Yes, somewhat

4. What do you think is advantageous about the use of the Senseboard? Why?
Subject 1: Perfect for teaching purposes, i.e. zooming in for more information so you don’t click around
Subject 2: Being able to control appliances without being in sight of them, i.e. the computer

5. What do you think is disadvantageous about the use of the Senseboard? Why?
Subject 1: Learning different signs if they become many, and for now moving the cursor or signing properly
Subject 2: Might be too many signs, signing takes time, and the cursor, but I guess it can be corrected

6. Do you think it’s useful? Why and for what, or why not?
Subject 1: Yes it is, as a remote control when controlling sound, the computer etc, because you don’t have to point at them or be near them
Subject 2: Yes it is, when doing something on the computer without actually having to be in front of it

7. Would you use it yourself? Why and for what, or why not?
Subject 1: Yes I would, as above, and also zooming information. I work at a hotel and it would help when there is lots to do with guests while you have to use the internet as well
Subject 2: Yes, doing presentations at school, or looking for information on the computer that is in hierarchies, or writing information without using the keyboard, just commands

8. What do you think about the different functionalities, i.e. play, stop, zoom etc?
Subject 1: They are fun to play with and show what you can do with the computer. I liked zooming the best
Subject 2: They give an idea as to how you can put in more. They are practical and useful when using a computer

9. What do you think about the signs, i.e. easy, hard, useful, understandable etc?
Subject 1: They are hard to sign, especially left-handed as now, since I am right-handed, and also so the computer recognizes them, but they are easy to do and remember. They should be used for both hands too
Subject 2: They are easy to understand, sign and remember, but it takes long for the computer to recognize them. Oh, they do make sense too

10. Are there other signs you can think of that would suit better?
Subject 1: Not at the moment, no
Subject 2: No, I don’t think so, but it would be fun with more signs

11. Anything else, i.e. suggestions about changes, areas of use?
Subject 1: This is for frequent users of the computer, especially looking for information, but also those that love to do music. The Senseboard has to be used frequently in order to remember signs, but also so as to get used to signing in a way the computer remembers. But if the price was affordable, why not, I would buy it. Would be useful in schools
Subject 2: Well, less time in signing the same sign. One for each hand. Could be useful in schools, like I said, for presentations, looking for information, or at home actually playing music, drawing pictures etc