LeanMES:
Human-Machine Interaction Review
Theory and Technologies
Eeva Järvenpää & Changizi Alireza,
Tampere University of Technology
Date: 15.5.2015
MANU LeanMES Project Documentation
Document Information
Document number T3.3.
Document title Human-Machine Interaction – Theory and Technologies
Delivery date M22
Main Author(s) Eeva Järvenpää & Changizi Alireza, Tampere University of Technology
Participants Minna Lanz & Ville Toivonen, Tampere University of Technology
Main Task T3.3: Novel Human-Machine Interaction
Task Leader Harri Nieminen, Fastems
Publicity level PU = Public
Version V1.2
Revision History
Revision | Date | Author | Organization | Description
V1.1 | 1.4.2015 | Eeva Järvenpää, Changizi Alireza | TUT | Version for LeanMES consortium commenting
V1.2 | 15.5.2015 | Eeva Järvenpää | TUT | FINAL
Executive Summary
The transformation towards digital manufacturing is under way. Manufacturing IT systems allow real-time data to be collected from the factory floor and displayed to those who need it, when they need it. However, the human factor plays an important role in manufacturing, as the involvement of humans introduces uncertainty into the process. Therefore, specific attention should be paid to human-friendly user interfaces in order to improve productivity and data reliability, and to make workplaces more attractive for future generations.
The purpose of this report is twofold. First, it gives an introduction to the human aspects that affect the design of technical systems, and especially their user interfaces (UI) (Section 2), and provides guidelines for user-centric, human-friendly interface design (Section 3). This theoretical part of the report is not targeted at any specific user interface; it is general and can be applied to any type of user interface, e.g. when designing UIs for Manufacturing Operations Management Systems (MOMS). Second, the report reviews different existing and emerging human-machine interaction technologies and gives examples of their applications in industrial contexts in Section 4. The discussed technologies fall into four categories: 1) Direct and indirect input devices, which are used to transfer user commands to the machine; 2) Mobile interfaces and remote sensors, such as tablets, smart phones, smart watches, and sensors used to collect data from user activities; 3) Virtual and augmented reality, which refers to mixing the virtual and real worlds together; 4) Gesture and speech control, which are used to control the system by body motions and voice commands.
From the human perspective, whether a system can be described as usable or not depends on four factors, namely anthropometrics, behavior, cognition and social factors. Anthropometrics refers to the physical characteristics, such as body type and size, of the intended users. Behavior refers to the perceptual and motivational characteristics of users, looking at what people can perceive and why they do what they do. Behavioral characteristics are mostly related to sensation with the basic senses (sight, hearing, touch, smell and taste) and to interpretation of the sensed stimuli. Cognitive factors include learning, attention, memory and other aspects of cognition that influence how users think, what they know and what knowledge they can acquire. Social factors consider how groups of users behave, and how to support them through design. (Ritter et al. 2014.)
The usability of a user interface always depends on three aspects: 1) the specific user and their characteristics; 2) the task that is being done with the designed HMI; and 3) the context and environment of use of the designed interface. Therefore, no universal rules for user-centric design can be given. However, several authors have given guidelines and heuristic principles for designing user interfaces with good usability. The most relevant guidelines, collected from Nielsen (1995), Ritter et al. (2014) and Hedge (2003), are listed below:
● Usage of terms and language: The system should speak the user’s language and use words they already know and which are relevant for their context. The interface should exhibit consistency and standards, so that the same terms always mean the same thing. Consistent use of words improves the chances of later successfully retrieving these words from memory.
● Use recognition rather than recall: Systems that allow users to recognize the actions they want to perform are initially easier to use than those that require users to recall a command.
● Favour words over icons: Instead of displaying icons, words may be better, because retrieving names from memory is faster than naming objects.
● Information reliability and quality: The user should not be provided with false,
misleading, or incomplete information at any time.
● Show only information which is needed: The system should be aesthetic and follow a minimalist design, i.e. do not clutter the interface with irrelevant information.
● Provide feedback for the user: The current system status should always be readily
visible to the user.
● Make available actions visible: Make the actions the user can (and should) perform
easier to see and to do.
● Allow flexibility for different users: The system should offer flexibility and efficiency of use across a range of users, e.g. through keyboard shortcuts for advanced users.
● Ensure that critical system conditions are recoverable: The user should have the
control and freedom to undo and redo functions that they mistakenly perform.
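In practice, such guidelines are often applied through heuristic evaluation, where reviewers record violations and rate their severity. The sketch below assumes Nielsen's 0–4 severity scale and uses purely hypothetical findings; it is an illustration, not a method prescribed by this report:

```python
# Minimal heuristic-evaluation tally: each finding records the
# violated guideline and a severity rating (0 = not a problem,
# 4 = usability catastrophe), following Nielsen's severity scale.
findings = [
    {"heuristic": "Usage of terms and language", "severity": 3},
    {"heuristic": "Provide feedback for the user", "severity": 4},
    {"heuristic": "Show only information which is needed", "severity": 2},
]

def worst_problems(findings, threshold=3):
    """Return the heuristics violated at or above the given
    severity, worst first, to prioritise redesign work."""
    serious = [f for f in findings if f["severity"] >= threshold]
    return [f["heuristic"] for f in
            sorted(serious, key=lambda f: -f["severity"])]
```

A call such as `worst_problems(findings)` would then surface the missing-feedback problem before the terminology problem, giving the design team a ranked worklist.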
While designing user interfaces, three selections need to be made: 1) selection of the modality, which refers to the sensory channel the human uses to send and receive a message (e.g. auditory, visual, touch); 2) selection of the medium, which refers to how the message is conveyed to the human (e.g. picture, diagram, video, alarm sound); and 3) selection of the technology that delivers the message (e.g. smart phone or AR glasses). Multimodal interfaces, which combine multiple modalities (and also media and technologies), are emerging. For example, augmented reality interfaces usually utilize multiple modalities, such as vision, speech and touch, and are built by combining multiple technologies, such as different visual displays, speech recognition and haptic devices.
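As a rough illustration, the three selections above can be sketched as a lookup from the context of use to a (modality, medium, technology) triple. The contexts and mappings below are hypothetical examples chosen for the sketch, not recommendations from this report:

```python
# Hypothetical mapping from context of use to a
# (modality, medium, technology) triple -- illustrative only.
CONTEXT_TO_INTERFACE = {
    "noisy_shop_floor": ("visual", "diagram", "tablet"),
    "hands_busy_assembly": ("auditory", "speech prompt", "headset"),
    "maintenance_inspection": ("visual+touch", "graphical overlay", "AR glasses"),
}

def select_interface(context):
    """Return a (modality, medium, technology) triple for a known
    context, falling back to a conventional visual display."""
    return CONTEXT_TO_INTERFACE.get(context, ("visual", "text", "terminal"))
```

A real selection process would of course weigh the user's characteristics and the task as well, as the report stresses; the table only captures the context dimension.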
Even though the most common UIs, at least in Finnish manufacturing environments, are still pen and paper, it is believed that the transformation towards digitalization, for example the implementation of MES systems, will open doors for the adoption of novel user interfaces on the factory floor. Adoption of new technologies in the manufacturing industry is usually quite slow, but recent years show signs that emerging UI technologies are finding their way onto factory floors. This report introduces existing and emerging UI technologies that could be used on factory floors in the future. By first discussing the human characteristics important for design, and giving general guidelines for interfaces with good usability, the aim is to emphasize that human behavior and cognitive capabilities always need to be considered when selecting and designing the media and technologies. The user, task and context of use will affect the optimal technology selection.
Table of contents
Executive Summary
Table of contents
1. Introduction
2. Human aspects in user-interface design
2.1. Introduction to user-centric design
2.2. ABCS Framework for user-centric design
2.2.1. Anthropometrics
2.2.2. Behavior
2.2.3. Cognition
2.2.4. Social cognition and teamwork
2.3. Human actions
2.4. Input and output modalities of user interfaces
2.4.1. Multimodal interfaces
2.4.2. Theoretical principles of user-computer multimodal interaction
2.5. Adaptive system interfaces
3. Guidelines for designing user-centric, human-friendly interfaces
3.1. User characteristics relevant for system design
3.2. Task analysis
3.3. Heuristic principles for designing interfaces with good usability
3.4. System characteristics and cognitive dimensions
3.5. Design of multimodal interfaces
3.6. Design for Errors
3.7. Display Designs
3.7.1. Thirteen principles of display design
3.7.2. Visual design principles for good design
4. Human-machine interaction technologies
4.1. Direct and indirect input devices
4.2. Mobile Interfaces and Remote Sensors
4.2.1. Mobile Device and Remote Sensor Technologies
4.2.2. Mobile Devices and Remote Sensors Applications
4.3. Virtual and Augmented Reality
4.3.1. Technologies for Augmented Reality
4.3.2. AR application examples
4.4. Gesture and Speech Control
4.4.1. Technologies for Gesture and Speech control
4.4.2. Gesture and Speech Control Application Examples
5. Conclusions
References
1. Introduction
Human factors play a crucial role in the production environment. The push towards more agile and responsive manufacturing requires that real-time information on production status is always visible to those who might need it. This, in turn, requires that the information is, on the one hand, collected from the production processes and, on the other hand, displayed to the workers in a human-friendly way. As noticed during the interviews conducted in the 1st period of the LeanMES project (Järvenpää et al. 2015), the involvement of humans introduces uncertainty into the process. This problem was especially visible in information inputting and searching. The current manual practices in information inputting, e.g. re-typing information from paper documents into IT systems, neither allow real-time transparency into the operations nor provide reliable data. As the transformation towards digital manufacturing is finally starting in many companies, the information previously provided to the factory floor operator on paper documents (e.g. job lists and work instructions) could now be displayed by a multitude of different UI technologies in a digital, easily editable format. The same applies to information collection from the factory floor.
In order to mitigate the problems relating to human perceptual and cognitive capabilities, as well as behavior, special attention should be paid to the design and selection of good and intuitive user interfaces and interaction technologies. Novel ways of working on the factory floor should not only improve the efficiency and quality of operations, but also be pleasurable for the workers. To attract future operators, the manufacturing sector should target social sustainability and adopt new UI technologies in order to be more appealing and accessible to young people who have grown up in a digital world.
The purpose of this report is twofold. First, it gives an introduction to the human aspects that affect the design of technical systems, and especially their user interfaces (Section 2), and provides guidelines for user-centric, human-friendly interface design (Section 3). Second, the report reviews different existing and emerging human-machine interaction technologies and gives examples of their application in industrial contexts (Section 4).
2. Human aspects in user-interface design
2.1. Introduction to user-centric design
When one reads a book or research article about user-centric (or human-friendly) design, it is usually highlighted that no generic rules for user-centric design can be written, because the characteristics of good design depend on the task, context and users of the designed technology (e.g. Ritter et al. 2014; Smith et al. 2012; Courage et al. 2012). For instance, Ritter et al. (2014) state: “User-centered design is about considering particular people doing particular tasks in a particular context.” Watzman and Re (2012) have a similar viewpoint: “The most important principle to remember, when thinking about design, is that there are no rules, only guidelines. Everything is context sensitive. Always consider and respect the user.”
According to Courage et al. (2012), the users should be analysed by answering questions such as: Who are they? What characteristics relevant to the design do they have? What do they know about the technology? What do they know about the domain? How motivated are they? What mental models do they have of the activities the designed product covers? For understanding the task the user is trying to accomplish, the following questions can be considered: What is the goal of the user? What steps are involved in achieving the goal? How is the task currently done, in which sequence and by which methods? The analysis of the users’ environments or context should clarify the physical situation in which the tasks occur and the technology available to the users, as well as social, cultural and language considerations. (Courage et al. 2012)
Two terms often used when discussing user-centric design are “usability” and “user experience”. These terms are sometimes mixed up, even though their meanings are different. As stated by Ritter et al. (2014), usability focuses on the task-related aspects and getting the job done. User experience, on the other hand, focuses on the user’s feelings, emotions, values, and their immediate and delayed responses. Three factors influence usability and user experience: the system itself; the user and their characteristics; and the context of use of the technology or system. From the user’s perspective, whether a system can be described as usable or not depends on (Ritter et al. 2014):
● Shape and size of the users (anthropometric factors)
● External body functioning and simple sensory-motor concerns, and motivation
(behavioral factors)
● Internal mental functioning (cognitive factors)
● External mental functioning (social and organizational factors)
As the usability of a system is an inherent requirement for good user experience, this section will mainly focus on the aspects that directly affect the usability of a UI.
2.2. ABCS Framework for user-centric design
Ritter et al. (2014) presented the ABCS framework, in which the design-relevant human
characteristics are divided into four categories:
● Anthropometrics (A) - The shape of the body and how it influences what is
designed: consideration of the physical characteristics of intended users such as
what size they are, what muscle strength they have and so on.
● Behavior (B) – Perceptual and motivational characteristics, looking at what
people can perceive and why they do what they do.
● Cognition (C) – Learning, attention, memory, and other aspects of cognition and
how these processes influence design: users defined by how they think and what
they know and what knowledge they can acquire.
● Social factors (S) – How groups of users behave, and how to support them
through design: users defined by where they are – their context broadly defined
including their relationships to other people.
In the following sections, these four categories are discussed in more detail.
2.2.1. Anthropometrics
The physical attributes of the user will affect how they use a particular artifact. The
physical aspects of interaction relate to the posture and load bearing of the human body.
Relating to physical aspects the designer has to consider whether the human can reach
the controls, operate the lever, push the buttons and so on. Supporting correct posture
will affect to the well-being of the user. The load bearing is important to consider
especially when using portable or wearable devices (e.g. phones, tablets and head-
mounted displays). The human has to support the weight of the interface during the
interaction, but normally also during the whole day. (Ritter et al. 2014)
The perception of touch is divided into three types of tactual perception: tactile, kinesthetic and haptic perception. Tactile perception is mediated solely by changes in cutaneous stimulation, i.e. when the skin is stimulated. Kinesthetic perception is mediated by variations in kinesthetic stimulation, i.e. awareness of static and dynamic body posture based on information coming from muscles and joints. Haptic perception involves using information from the cutaneous sense and kinesthesis to understand and interpret objects and events in the environment. Haptics is the most common type of tactual perception: most common input technologies, e.g. physical keyboards, touch screens and pointing devices (mouse, trackpad, trackball, etc.), use some sort of haptic feedback to inform the user about the performed actions. (Ritter et al. 2014)
2.2.2. Behavior
Behavioral characteristics are mostly related to the sensation and perception. People
have five basic senses: sight, hearing, touch, smell and taste. Sensation occurs when the
sense organs are stimulated and they generate some form of coding of the stimuli.
Perception occurs when this coded information is further interpreted using knowledge of the current context (physical, physiological, psychological, and so on) to add meaning.
The process of perception is subjective. This implies that simply presenting designed stimuli in such a way that they will be sensed accurately does not necessarily mean that they will be perceived in the way the designer intended. (Ritter et al. 2014)

Most user interfaces use vision as the major sense. One of the most useful applications of vision to interface design is to take advantage of how the eye searches. Certain stimuli ”pop out” from other stimuli and can therefore be used to draw attention to important things. Ritter et al. (2014) stated that, for example, highlighting, using a different color, or making an object move or blink makes it “pop out” from the others. Colors should be used to emphasize things that are important. However, as advised by Ritter et al. (2014), redundant information should be used in order to help people with red-green color vision deficiency.
It is often important to consider how different sensory modalities can be used together to provide further information for the user, e.g. in difficult conditions such as lack of light, or for persons with impaired vision or hearing. Also, if visually similar elements on a display (such as the same shape in slightly different colors) should be processed differently, they should be made distinct by separating one or more dimensions of their appearance by several JNDs (just noticeable differences), e.g. several shades of color. (Ritter et al. 2014) Further details about the design of visual displays are discussed in Section 3.7.2.
As discussed above, vision has an important role in most user interfaces. Welsh et al. (2012) stated that people are more accurate and less variable under conditions in which they have vision of the environment than when they do not. Furthermore, they noted that ballistic actions, such as a keypress, do not require a continual source of visual target information and feedback during execution, because online corrections cannot be made. On the other hand, aiming movements, such as pointing at a certain icon on the display, need a continual and stable source of visual information about the effector and the target for efficient feedback-based corrections and movement accuracy. (Welsh et al. 2012) Fitts’s law (Fitts 1954), relating to perceptual-motor interaction, is often used as a predictive model of the time to engage a target. The law indicates that the time to point at an object is related to the distance to the object and inversely related to the size of the object. It implies that larger objects lead to faster pointing times than smaller objects, and that shorter distances likewise lead to faster pointing times. (Ritter et al. 2014; Welsh et al. 2012)
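Fitts’s law can be illustrated with a small computational sketch. The Shannon formulation of the index of difficulty is used here, and the coefficients a and b are illustrative placeholders; in practice they must be fitted empirically for each device and user population:

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predict pointing time (seconds) with Fitts's law, using the
    Shannon formulation of the index of difficulty:
        MT = a + b * log2(D / W + 1)
    distance: distance to target centre; width: target size (same
    units). a and b are device/user-specific constants -- the
    defaults here are only illustrative placeholders."""
    index_of_difficulty = math.log2(distance / width + 1.0)  # bits
    return a + b * index_of_difficulty

# Larger targets and shorter distances yield faster pointing:
t_small_far = fitts_movement_time(distance=800, width=20)
t_large_near = fitts_movement_time(distance=200, width=80)
```

With any positive coefficients, the model predicts that a large nearby button is hit faster than a small distant one, matching the qualitative statement above.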
In addition to perceptual capabilities, motivation also affects the behavior of a human. Szalma et al. (2012) listed three organismic elements that are essential for facilitating intrinsic motivation for task activity: competence, autonomy (personal agency, not independence per se), and relatedness. Ritter et al. (2014) named these elements mastery, autonomy and purpose. Based on Szalma et al. (2012), three factors support autonomy: 1) meaningful rationales for doing a task; 2) acknowledgement that the task might not be interesting; 3) an emphasis on choice rather than control.
2.2.3. Cognition
Cognition refers to the mental capabilities of users relating to memory, attention and learning. As stated by Ritter et al. (2014), users’ cognition is limited: for example, working memory and attentional resources are limited, which affects how much information a human can process at a time.
Memory
The way people use a system will be greatly influenced by how well they can retrieve commands and locations of objects from memory. There are different types of memory that are used for different purposes (Ritter et al. 2014):
● Short-term memory: Is often used to store lists or sets of items to work with.
For unrelated objects, users can remember around seven meaningful items (+/-
2).
● Long-term memory: Information which is meaningful, and whose meaning is processed at encoding time, is easier to remember.
● Declarative memory: Facts and statements about the world
● Procedural memory: Includes acts, or sequences of steps that describe how to
do particular tasks.
● Implicit memory: Cannot be reported. Most procedural information is implicit
in that the precise details are not reportable. Information gets put into implicit
memory when the user works without a domain theory and learns through trial
and error.
● Explicit memory: Can be reported. Most declarative information is explicit in
that it can be reported. Users can perform tasks more robustly, and because they
can describe how to do the task, they can help others more readily. Users can be
encouraged to store information in explicit memory by helping them develop a
mental model of a task, and by providing them with time to reflect on their
learning.
Ritter et al. (2014) highlighted a few mnemonics and aids to memory. For instance, recognition is a useful aid to recall: recognition memory is more robust than recall memory. This implies that “it is easier to recognize something that you have previously seen than to recall what it was you saw”. Many interfaces take advantage of recognition memory by putting objects or actions in a place where they can be recognized, instead of requiring the user to recall them. In addition, anomalous or interesting things are better retrieved from memory than something which did not draw the user’s attention in the first place. (Ritter et al. 2014)
In the case of lists, certain things affect how well the information in a list can be retrieved (Ritter et al. 2014):
● Primacy – items appearing at the start of a list are more easily retrieved from memory.
● Distinctive items in a list are better retrieved.
● Items in a list that make sense (e.g. MES, ERP) are better retrieved than items that do not have associations for everybody.
● Recency – items appearing last in the list are better retrieved.
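As a hypothetical application of these serial-position effects, a designer might place the most critical items at the start and end of a menu, where primacy and recency favour retrieval. The helper below is an illustrative sketch, not a method from the report:

```python
def order_for_retrieval(items, key_items):
    """Order list items so the most important ones occupy the
    primacy (start) and recency (end) positions. A hypothetical
    application of the serial-position effects above."""
    important = [i for i in items if i in key_items]
    rest = [i for i in items if i not in key_items]
    half = (len(important) + 1) // 2  # split across head and tail
    return important[:half] + rest + important[half:]
```

For a menu of four operations where "Start job" and "Emergency stop" matter most, the sketch would place one at each end and push the less critical items into the middle.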
Attention
According to Ritter et al. (2014), attention refers to “the selective aspects of perception, which function so that at any instant a user focuses on particular features of the environment to the relative (but not complete) exclusion of others”. Welsh et al. (2012) listed three important characteristics of attention: 1) attention is selective and allows only a specific subset of information to enter the limited processing system; 2) the focus of attention can be shifted from one source of information to another; 3) attention can be divided, such that, within certain limitations, one may selectively attend to more than one source of information at a time.
As discussed by Welsh et al. (2012), shifts of attention that are driven by stimuli are known as exogenous, or bottom-up, shifts of attention. They are considered to be automatic in nature and thus, for the most part, outside of cognitive influences. Exogenous shifts of attention are typically caused by a dynamic change in the environment, such as the sudden, abrupt appearance (onset) or disappearance (offset) of a stimulus, a change in the luminance or color of a stimulus, or the abrupt onset of object motion. Performer-driven, or endogenous, shifts of attention are under complete voluntary control. This type of attention shift can be guided by a wide variety of stimuli, such as symbolic cues like arrows, numbers or words. In this way, users can be cued to locations or objects in the scene with more subtle or permanent information than the dynamic changes that are required for exogenous shifts. However, the act of interpreting the cue requires a portion of the limited information-processing capacity. Furthermore, as stated by Welsh et al. (2012), it seems that “automatic” attentional capture depends on the expectations of the user. Therefore, the designer of the interface has to consider the perceptual expectations of the user. (Welsh et al. 2012)
Proctor & Vu (2012) stated that many studies have shown that it is easier to perform two tasks together when they use different stimulus or response modalities than when they use the same modalities. Performance is also better when one task is verbal and the other visuospatial than when they are of the same type. According to multiple resource
models, different attentional resources exist for different sensory-motor modalities and
coding domains. (Proctor & Vu 2012) Therefore, dual tasks that use different perceptual
buffers will interfere less with each other. For instance, people can learn to drive and
talk at the same time in normal weather conditions, because driving does not use a lot of
audio cues. (Ritter et al. 2014)
Mental models and learning
Mental models are used to understand systems and to interact with systems. When the
user’s mental models are inaccurate, systems are hard to use. The model the user brings
to the task will influence how they use the system, what strategies they will most likely
employ, and what errors they are likely to make. It is therefore important to design the
system in such a way that the user can develop an accurate mental model of it. (Ritter et
al. 2014)
A mental model can be considered a representation of some part of the world, which can include the structures of the world (the ontology of the relevant objects), how they interact, and how the user can interact with them (Ritter et al. 2014). Payne (2012) simplified the meaning of mental models to “what users know and believe about the systems they use”. If the user’s mental model accurately matches the system, the user can better use the mental model to perform their task, to troubleshoot the system and to teach others about the task or system (Ritter et al. 2014).
The designer of the system must have an accurate mental model of how people will use it. This requires understanding how people will use it, the tasks they will perform with the system, and their normal working context. Making the system compliant with the user’s mental model will almost certainly help reduce the time it takes to perform a task, reduce learning time, and improve the acceptability of the system. Good interfaces help users develop appropriate levels of confidence in their representations and decisions. Often this means providing information to support learning, including feedback on task performance, as well as information for building a mental model. It is important to keep the human in the loop. This means keeping the users aware of what the computer is doing, by providing them with feedback about the system’s state. They can use this feedback to detect errors, to update their own mental model of how the system is working, and to anticipate when they need to take action. If users do not get feedback, their calibration of how well they are doing will be poor to non-existent. When it is not clear to the user what to do next, problem solving is used. Problem solving uses mental models and forms a basis for learning. (Ritter et al. 2014)
One important concept, which aids in building the correct mental model of the system,
and therefore easing its usage, is the stimulus-response (S-R) compatibility. This means
that there should be clear and appropriate mappings between the task/action and the
response. This is typically achieved by having the physical aspects of an interface (e.g. buttons)
and its displays match the world they represent. For example, the button for calling an
elevator to go up should be placed above the one for calling it to go down. (Welsh et al. 2012;
Ritter et al. 2014)
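As a purely hypothetical illustration (not from the source), the S-R compatibility principle can be expressed as a simple check that the spatial layout of controls matches the direction of the response they trigger; the elevator-panel example follows the text above, while the function and position values are assumptions:

```python
# Hypothetical sketch of stimulus-response (S-R) compatibility:
# a control layout is compatible when the control's physical position
# matches the direction of the response it triggers (e.g. the "up"
# button sits above the "down" button).

def is_sr_compatible(controls):
    """controls: dict mapping action -> vertical position on the panel
    (larger value = physically higher). The 'call_up' action should be
    triggered by the physically higher control."""
    return controls["call_up"] > controls["call_down"]

# Elevator panel with the up-button above the down-button: compatible.
good_panel = {"call_up": 2, "call_down": 1}
# Reversed panel: violates S-R compatibility.
bad_panel = {"call_up": 1, "call_down": 2}

print(is_sr_compatible(good_panel))  # True
print(is_sr_compatible(bad_panel))   # False
```

The same idea generalizes to any mapping between display/control geometry and response direction.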
2.2.4. Social cognition and teamwork
Social processes – how people interact with each other – are important, because they
affect how systems and interfaces are used. Workplace systems are socio-technical
systems, meaning technical systems that are designed for and shaped by people
operating in social contexts. Two especially important social responsibility effects,
presented by Ritter et al. (2014), should be considered. These are diffusion of social
responsibility and pluralistic ignorance. The diffusion of social responsibility indicates
that a person is less likely to take responsibility for an action or inaction when they
think someone else will take the action. For instance, this can happen when an email is
sent to many recipients and nobody takes responsibility for responding. Pluralistic
ignorance refers to the fact that people, especially inexperienced ones, often base their
interpretation of a situation on how other people interpret it. For example, if other people do not react to an alarm
sound, the rest will interpret it as “not important” as well. (Ritter et al. 2014)
2.3. Human actions
Based on Welsh et al. (2012), three basic processes can be distinguished in human
information processing: stimulus identification, which is associated with the processes
responsible for the perception of information; response selection, which pertains to the
translation between stimuli and responses; and response programming, which is
associated with the organization of the final output. (Welsh et al. 2012) When a human
takes an action, several stages are involved. Norman (1988) defined seven stages of user
activities. The process should be seen as a cyclic rather than a linear
sequence of activities:
● Establish the goal
● Form the intention to take some action
● Specify the action sequence
● Execute the action
● Perceive the system state
● Interpret the system state
● Evaluate the system state with respect to the goals and intentions
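The cyclic nature of the seven stages can be sketched, purely as an illustration, as a loop that repeats until the evaluation matches the goal; the stage names follow the list above, while the termination logic is a simplifying assumption, not Norman's formulation:

```python
# Illustrative sketch of Norman's seven stages as a cyclic process.
# The stage names come from the list above; the termination logic
# (repeat until the goal is met) is a simplifying assumption.

STAGES = [
    "establish_goal",
    "form_intention",
    "specify_action_sequence",
    "execute_action",
    "perceive_system_state",
    "interpret_system_state",
    "evaluate_against_goal",
]

def action_cycle(goal_met_after):
    """Run the cycle until the evaluation succeeds; returns the full
    trace of stages visited. `goal_met_after` = number of passes
    through the loop needed before the evaluation matches the goal."""
    trace = []
    cycles = 0
    while True:
        trace.extend(STAGES)
        cycles += 1
        if cycles >= goal_met_after:  # evaluation matches the goal
            return trace

# Two passes through the loop before the goal is met: 14 stages visited.
trace = action_cycle(goal_met_after=2)
print(len(trace))  # 14
```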
Ritter et al. (2014) discussed the “gulfs of evaluation and execution”, originally
defined by Norman (1988). In the evaluation and execution phases the user has to make
mappings between psychological and physical concepts. In the evaluation phase this
means the following: when the user perceives the state of the system, it will be in
terms of physical concepts (usually variables and values) that the user has to
translate into a form that is compatible with their mental model of how the system
operates. The gap between the physical and the psychological concepts is called
the gulf of evaluation. In the execution phase, the user’s goals and intentions
(psychological concepts) need to be translated into physical concepts, which are usually
actions that can be executed in the system. The gap between the goals and intentions
and the physical actions is called the gulf of execution. Some interfaces show where
details and feedback on the state of the system can be difficult to interpret, and where it
can be difficult to work out what actions are available and how to execute them. In these
cases the gulfs of evaluation and execution are large. (Ritter et al. 2014)
The above-mentioned gulfs lead to the following implications for design (Ritter et al. 2014):
● Good design involves making sure that information that is crucial to task
evaluation and performance is made clearly visible to the user. What counts as
appropriate information will vary across tasks, sometimes across users, and
even across contexts of use.
● Appropriate consideration should be given to:
○ Feedback – helps to reduce the gulf of evaluation because it shows the
effect of performing a particular task.
○ Consistency – helps users to help themselves (e.g. by applying knowledge
of other systems, such as the placement of buttons).
○ Mental models – design should facilitate the development of appropriate
mental models, and support the use of those models by making the
appropriate information visible to users at the right time in the right
place.
● Critical systems should not be ”too easy to use”. Users must pay attention to
what they are doing.
2.4. Input and output modalities of user interfaces
Sutcliffe (2012) described the difference between medium and modality. A message is
conveyed by a medium and received through a modality. A modality is a sensory
channel that a human uses to send and receive messages to and from the world;
essentially, the senses. The two principal modalities used in human-computer
communication are vision and hearing. (Sutcliffe 2012) As the vision modality has been
widely covered in other sections of this report, this section concentrates mainly on
hearing, namely speech and non-speech auditory modalities. Touch will also be briefly
covered. Smell and taste are not discussed here, as their use in UIs is not yet common.
Non-speech auditory output refers to auditory stimuli that are not spoken language,
e.g. alarm or warning sounds. Hoggan & Brewster (2012) listed the advantages of
non-speech feedback (including other than auditory feedback, such as touch):
● Vision and hearing are interdependent; they work well together (e.g. “our ears
tell our eyes where to look”).
● Hearing and touch have amodal properties, which relate to space and time and
involve points along a continuum (e.g. location), intervals within a continuum (e.g.
duration), patterns of intervals (e.g. rhythm), rates of patterns (e.g. tempo), or
changes of rate (e.g. texture gradients).
● Sound has superior temporal resolution.
● Sound and touch reduce the overload from large displays.
● Sound and touch reduce the amount of information needed on the screen.
● Sound reduces demands on visual attention.
● Sound is attention grabbing.
● Touch is subtle and private.
● Spatial resolution of tactile stimuli is high.
● Auditory or tactile form makes computers more usable by visually disabled
people.
On the other hand, Hoggan and Brewster (2012) (originally by (Kramer 1994)), brought
out some disadvantages of non-speech feedback:
● Sound has low resolution. Using sound volume or tactile amplitude, only a very
few different values can be unambiguously presented.
● Presenting absolute data is difficult.
● There is a lack of orthogonality: changing one attribute of a sound or tactile cue
may affect the others.
● Auditory feedback (or input) may annoy other people nearby.
Hoggan & Brewster (2012) highlighted that non-speech auditory or tactile feedback is
useful in mobile devices. As the devices are small, there is a very limited amount of
screen space for displaying information. Also, if users perform their tasks on the move,
e.g. while walking or driving, they cannot devote all of their visual attention to the mobile device. (Hoggan & Brewster, 2012)
Speech is characterised by its transient nature, while graphics are persistent. While a
graphical interface typically stays on the screen until the user performs some action, the
message carried by speech is gone immediately after it has been said. Listening to
speech taxes the user’s short-term memory, and if the message is long, something may be
forgotten. Therefore, in general, transience means that speech is not a good medium for
delivering large amounts of information. However, as people can look and listen at the
same time, speech may be good for grabbing attention or for providing an alternate
mechanism for feedback. (Karat et al. 2012)
Speech is also invisible. The lack of visibility makes it difficult to communicate the
functional boundaries of an application to the user. Because there is no visible menu or
other screen elements, it is much more challenging to indicate to the users what actions
they may perform and what words and phrases they must say to perform those actions.
It is also problematic when the speaker is not in a private environment, or when there
are other voices in the background that might interfere with the speech recognition.
(Karat et al. 2012)
In the future, multimodal interfaces are expected to become more common, and
they will also use other modalities, such as haptic (sense of touch),
kinesthetic (sense of body posture and balance), gustation (taste) and olfaction (smell)
(Oviatt, 2012). Multimodal interfaces are discussed in the next section.
2.4.1. Multimodal interfaces
Multimodal interfaces are becoming more common in human-machine interaction.
According to Dumas et al. (2009), multimodal systems are computer systems endowed
with multimodal capabilities for human/machine interaction and able to interpret
information from various sensory and communication channels. Multimodal interfaces
process two or more combined user input modes, like speech, pen, touch, manual
gesture, gaze or body movements, in a coordinated manner with multimedia system
output. Compared to unimodal interfaces, multimodal interfaces aim to provide a more
“human” way to interact with the computer by using richer and more natural means of
communication, such as speech, gestures and other modalities, and more generally all
five senses. However, it has to be noted that the terms “natural interaction” and
“natural UI” are often used when talking about new UIs. Hinckley & Wigdor (2012) gave
an operational definition for a natural UI: “the experience of using a system matches
expectations such that it is always clear to the user how to proceed and only a few steps
(with minimum of physical and cognitive effort) are required to complete common tasks.”
Therefore, one cannot state that one interaction technology would be more natural than
another. It is always dependent on the task that is supposed to be performed with the
technology.
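As a concrete but hypothetical illustration of processing “two or more combined user input modes in a coordinated manner”, the sketch below fuses speech and touch events that occur within a short time window; the window size, event format and example commands are assumptions, and real fusion engines are considerably more sophisticated:

```python
# Hypothetical sketch of time-window-based multimodal fusion:
# a spoken command ("delete that") is combined with a touch event
# (pointing at an object) if the two occur close enough in time.

FUSION_WINDOW = 1.0  # seconds; an assumed threshold

def fuse(speech_events, touch_events, window=FUSION_WINDOW):
    """Pair each speech event with the nearest-in-time touch event
    within the window. Events are (timestamp, payload) tuples."""
    fused = []
    for t_s, command in speech_events:
        candidates = [(abs(t_s - t_t), target)
                      for t_t, target in touch_events
                      if abs(t_s - t_t) <= window]
        if candidates:
            _, target = min(candidates)
            fused.append((command, target))
    return fused

speech = [(2.0, "delete that"), (9.0, "move this")]
touch = [(2.3, "file_42"), (15.0, "folder_7")]
print(fuse(speech, touch))  # [('delete that', 'file_42')]
```

The second speech event finds no touch event within the window, so it produces no fused command; a real system would fall back to unimodal interpretation.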
Oviatt (1997) showed that, compared to unimodal interfaces, multimodal
interfaces can improve error handling and reliability, provide greater expressive
power, and offer improved support for users’ preferred interaction styles.
Multimodal interfaces can support a broad range of users and contexts of use, since the
availability of multiple modalities provides flexibility. For example, the same user may
benefit from speech input in quiet conditions when the hands are occupied, while in a
noisy environment touch input may be more efficient. Flexible personalization of the
interaction mode, based on the user and context, is especially useful for people with
impaired vision, hearing or motor abilities. (Dumas et al. 2009)
According to Dumas et al. (2009), findings in cognitive psychology indicate that
humans are able to process modalities partially independently and, thus, presenting
information with multiple modalities increases effective working memory. Therefore,
presenting information in a dual-mode form, rather than a purely visual one, could
expand human processing capabilities.
2.4.2. Theoretical principles of user-computer multimodal interaction
When a human interacts with a machine, his/her communication can be divided into four
different states (see Figure 1): decision, action, perception and
interpretation. The machine has four similar states. In the decision state, the
content of the communication message is prepared, consciously for an intention, or
unconsciously for attentional content or emotions. After that, in the action state, the
means of communication for transmitting the message (e.g. speech or gesture) are selected.
When the human communicates his/her message, the machine, in its perception state, uses
one or multiple sensors to capture as much information as possible from the user. During the
interpretation state, the system tries to give a meaning to the different pieces of information
collected in the previous state. In the computational state, action is taken following the
business logic and dialogue-manager rules defined by the developer. Finally, in the action
state, the machine generates the answer based on the meaning extracted in the interpretation
state. A fission engine determines the most relevant modalities for returning the message,
depending on the context of use and the profile of the user. (Dumas et al. 2009)
Figure 1. A representation of multimodal man-machine interaction loop (Dumas et al. 2009).
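The fission step at the end of the loop can be sketched as a simple rule-based choice of output modalities; the rules, context fields and profile fields below are illustrative assumptions made for this sketch, not Dumas et al.'s actual engine:

```python
# Illustrative rule-based "fission engine": choose the output
# modalities for a message based on the context of use and a simple
# user profile. All rules and field names are assumptions.

def select_output_modalities(context, profile):
    """context: dict with 'noise_level' (0..1) and 'hands_busy' (bool).
    profile: dict with 'visually_impaired' (bool)."""
    modalities = []
    if profile.get("visually_impaired"):
        modalities.append("speech")
    elif context.get("noise_level", 0.0) > 0.7:
        modalities.append("visual")  # too noisy for reliable audio output
    else:
        modalities.append("visual")
        modalities.append("speech")  # redundant dual-mode output
    if context.get("hands_busy"):
        modalities.append("tactile")  # vibration works eyes- and hands-free
    return modalities

quiet_office = {"noise_level": 0.1, "hands_busy": False}
noisy_floor = {"noise_level": 0.9, "hands_busy": True}
print(select_output_modalities(quiet_office, {}))  # ['visual', 'speech']
print(select_output_modalities(noisy_floor, {}))   # ['visual', 'tactile']
```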
Oviatt (2012) highlighted that commercially available multimodal interfaces have primarily been developed for mobile use, including cell phones, small PDA handhelds, and
new digital pens. The commercial solutions have avoided co-processing and interpreting
the linguistic meaning of two or more natural input streams. In this regard, they lag
substantially behind far more powerful research-level prototypes, and have yet to reach
their most valuable commercial potential. In some cases, these systems simply have
emphasized capture and reuse of synchronized human communication signals (e.g.,
verbatim speech, pen ink), rather than interpretation and processing of linguistic
meaning at all. (Oviatt 2012)
As stated by Oviatt (2012) there is a growing interest in designing multimodal interfaces
that incorporate vision-based technologies, such as interpretation of gaze, facial
expression, head nodding, gesturing and large body movements. These technologies
unobtrusively or passively monitor user behavior and do not require an explicit user
command to the computer. This contrasts with active input modes, such as speech or pens,
which the user deploys intentionally as a command issued to the system. Although
passive modes may be “attentive” and less obtrusive, active modes generally are more
reliable indicators of user intent. As vision-based technologies mature, one important
future direction will be the development of blended multimodal interfaces that combine
both passive and active modes. (Oviatt 2012).
2.5. Adaptive system interfaces
Jameson & Gajos (2012) defined user-adaptive system as “an interactive system that
adapts its behavior to individual users on the basis of processes of user model
acquisition and application that involve some form of learning, inference, or decision
making”. User-adaptive systems are different from adaptable systems, which offer the user
an opportunity to configure or otherwise influence the system’s longer-term behavior,
e.g. by choosing options that determine the appearance of the user interface. Jameson &
Gajos (2012) stated that often a carefully chosen combination of adaptation and
adaptability works best.
Jameson & Gajos (2012) discussed suitable functions for adaptive systems:
Supporting system use
● Offering help adaptively, e.g. by suggesting commands the user could
use next.
● Taking over parts of routine tasks, e.g. sorting or filtering e-mail and scheduling
appointments and meetings. Systems of this sort can actually take over two
types of work from the user: 1) choosing what particular action is to be
performed (e.g. which folder a file should be saved in); and 2) performing the
mechanical steps necessary to execute that action.
● Adapting the interface to individual task and usage, i.e. adapting the
presentation and organization of the interface so that it fits better with the user’s
task and usage patterns.
● Adapting the interface to individual abilities. This is useful not only for people
with impairments: environmental factors also matter, e.g. temperature
may temporarily impair a person’s dexterity, a low level of illumination will
reduce reading speed, and ambient noise will affect hearing ability. Especially
with mobile devices, it would be good to adapt to the momentary effective abilities
of users.
Supporting information acquisition
● Helping users to find information, including support for browsing and query-
based search and spontaneous provision of information. The system can e.g.
suggest news articles based on the user’s previous clicks on other articles.
● Recommending products
● Tailoring information presentation. The properties of users that may be taken
into account in the tailoring of documents include: the user’s degree of interest
in particular topics; the user’s preference or need for particular forms of
information presentation; and the display capabilities of the user’s computing
device.
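A minimal sketch of the first function above, adaptive command suggestion, assuming a simple bigram frequency model over the user's command history (the model choice and command names are our assumptions, not from Jameson & Gajos):

```python
from collections import Counter

# Minimal sketch of adaptive help: suggest the commands the user is
# most likely to want next, based on which commands most often
# followed the current one in their history. A simple bigram
# frequency model is assumed for this sketch.

def suggest_next(history, current, top_n=2):
    """history: chronological list of commands the user has issued."""
    followers = Counter(
        history[i + 1]
        for i in range(len(history) - 1)
        if history[i] == current
    )
    return [cmd for cmd, _ in followers.most_common(top_n)]

history = ["open", "edit", "save", "open", "edit", "save",
           "open", "print", "save"]
print(suggest_next(history, "open"))  # ['edit', 'print']
```

A real user-adaptive system would combine such usage statistics with context and an explicit user model, as the section describes.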
3. Guidelines for designing user-centric, human-friendly
interfaces
As stated in the beginning of the report, there are no universal rules for good user-
centric design. However, as the previous sections showed, human behavior and
cognitive capabilities are, in general, not totally unpredictable, and therefore some
guidelines for good user interface design can be given. The guidelines given in this
chapter are general and can be applied to any user interface design, including the
planner and operator interfaces of manufacturing operations management systems.
Watzman & Re (2012) listed audit questions for usable interfaces (see Table 1). The
audit questions in part A are meant for figuring out the purpose and context of use of the
interface, while the questions in part B are targeted more towards finding the most
efficient way to perform the task that the designed interface is meant to support.
Table 1. Audit questions for designing usable interfaces (Watzman & Re, 2012).
Audit questions A
● Who are the product users?
● How will this product be used?
● When will this product be used?
● Why will this product be used?
● Where will this product be used?
● How will the process evolve to support this product as it evolves?
Audit questions B
● What is the most efficient, effective way for a user to accomplish a set of tasks
and move on to the next set of tasks?
● How can the information required for product ease of use be presented most
efficiently and effectively?
● How can the design of this product be done to support ease of use and transition
from task to task as a seamless, transparent and even pleasurable experience?
● What are the technical and organizational limits and constraints?
3.1. User characteristics relevant for system design
The human characteristics relevant for design were thoroughly covered at a general
level in Section 2. Here, a few relevant characteristics of the specific person who will be
using the system, from Ritter et al. (2014), are summarized:
● Physical characteristics, limitations and disabilities
● Perceptual abilities, strengths, and weaknesses
● Frequency of product use
● Past experience with same/similar product
● Activity ”mental set” (the attitude towards and level of motivation you have for
the activity)
● Tolerance for error
● Patience and motivation for learning
● Culture/language/population expectations and norms.
3.2. Task analysis
Task analysis provides a way to describe the users’ task and subtasks, the structure and
hierarchy of these tasks, and the knowledge they already have or need to acquire to
perform the tasks. Prescriptive analyses show how the user should carry out the task
(associated with normative behavior). Descriptive analyses, in contrast, show how users
really carry out the task, and are hence associated with actual behavior. (Ritter et al.
2014) Courage et al. (2012) highlighted that task analysis requires watching, listening to,
and talking with users. Other people, such as managers and supervisors, and other
information sources, such as print or online documentation, are only secondarily useful for a task analysis. Relying on them may lead to a false understanding.
In addition to analyzing the users, their characteristics, expectations and level of
experience, it is crucial also to consider the context and environment where the system
is used. Sutcliffe (2012) states that it is important to gather information on the
location of the use (office, factory floor, public/private space, and hazardous locations),
pertinent environmental variables (ambient light, noise levels, and temperature), usage
conditions (single user, shared use, broadcast), and expected range of locations
(countries, languages and cultures).
Different task analysis methods include (Courage et al. 2012; Ritter et al. 2014):
● Hierarchical task analysis (HTA)
● Task-Action Grammar (TAG)
● Cognitive task analysis
● GOMS (Goals, Operations, Methods, and Selection rules)
● The keystroke level model
As stated by Courage et al. (2012), efficiency-oriented, detailed task analyses such as
TAG and GOMS have a place especially in evaluating products for which efficiency
on the order of seconds saved is important.
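As an illustration of the keystroke-level model mentioned above, a task's expert execution time can be estimated by summing per-operator times; the operator times below are the commonly cited approximations from Card, Moran and Newell, and the example task sequence is hypothetical:

```python
# Keystroke-level model (KLM) sketch: estimate expert task time by
# summing per-operator times. The operator times are the commonly
# cited approximations (in seconds); the example task is hypothetical.

OPERATOR_TIMES = {
    "K": 0.2,   # keystroke (average skilled typist)
    "P": 1.1,   # point with a mouse to a target
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_estimate(operators):
    """operators: sequence of KLM operator codes, e.g. 'MPK'."""
    return sum(OPERATOR_TIMES[op] for op in operators)

# Hypothetical task: think (M), point to a field (P), home hands to
# the keyboard (H), type five characters (KKKKK).
task = "MPH" + "K" * 5
print(round(klm_estimate(task), 2))  # 3.85
```

Comparing such estimates for two candidate interaction sequences is exactly the "seconds saved" evaluation the paragraph refers to.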
Courage et al. (2012) listed different types of granularity levels for the task analysis:
● Analysis of a person’s typical day or week
● Job analysis: All the goals and tasks that someone does in a specific role – daily,
monthly, or over longer periods
● Workflow analysis: Process analysis, cross-user analysis, how work moves from
person to person
● High-level task analysis: The work needed to accomplish a large goal broken
down into sub-goals and major tasks.
● Procedural analysis: The specific steps and decisions the user takes to
accomplish a task.
For presenting the data of the task analysis, several methods can be applied, such as
affinity diagrams, artifacts, flow diagrams, personas, scenarios, sequence diagrams, user
need tables and user/task matrices. The user/task matrix becomes a major input to a
communication plan – to answer the question of what tasks to include in documentation
for people in different roles. (Courage et al. 2012)
As a result of task analysis, function allocation can be carried out. Function allocation is
done to identify the list of functions that the system (including both the human and the
machine) has to perform. These functions can then be allocated to either human or
machine, e.g. based on Fitt’s list, which is also referred to as the MABA-MABA (Men are
better at, Machines are better at) approach. However, as Ritter et al. brought out, the
designers often allocate all the tasks that they know how to automate, to the technology,
and leave the human to carry out all the others. (Ritter et al. 2014) This may not lead to
task allocation, which would optimize the capability utilization of both human and
machine. Also, if it is e.g. important for the user to learn about the task in order to be
able to take control of it in case of machine failure, it may not be wise to automate the
task completely, as it doesn’t facilitate learning.
3.3. Heuristic principles for designing interfaces with good usability
In this section, the heuristic principles for good UI design, presented by multiple
authors, will be discussed.
Nielsen (1995) listed 10 general usability principles, or heuristics, for user interface
design. They are summarized in the following list:
1. The current system status should always be readily visible to the user.
2. There should be a match between the system and the user’s world: the system
should speak the user’s language.
3. The user should have the control and freedom to undo and redo functions that
they mistakenly perform.
4. The interface should exhibit consistency and standards so that the same terms
always mean the same thing.
5. Errors should be prevented where possible.
6. Use recognition rather than recall in order to minimize mental workload of the
users.
7. The system should have flexibility and efficiency of use across a range of users,
e.g. through keyboard short-cuts for advanced users.
8. The system should be aesthetic and follow a minimalist design, i.e. do not clutter
up the interface with irrelevant information.
9. Users should be helped to manage errors: not all errors can be prevented so
make it easier for the users to recognize, diagnose and recover.
10. Help and documentation should be readily available and structured for ease of
use.
Grice’s (1975) maxims of conversation are often used as a guideline for evaluating what
kind of information should be displayed to the user:
● Maxim of quantity - The message should be made as informative as required,
but not more informative than is required.
● Maxim of quality - Information that is believed to be false or for which there is
no adequate evidence, should not be displayed.
● Maxim of relevance - Only relevant information should be displayed.
● Maxim of manner - Obscurity of expression and ambiguity should be avoided.
The message should be brief (avoid unnecessary prolixity) and orderly.
Implications of human memory for system design (Ritter et al. 2014):
● Use words that the users know.
● Use the words consistently to strengthen the chances of later successfully
retrieving these words from the memory.
● Instead of displaying icons, words may be better. This is because retrieving
names from memory is faster than naming objects.
● Systems that allow users to recognize the actions they want to perform will
initially be easier to use than those that require users to recall commands. There
is a trade-off, however, when users become experts.
● Once something has been learned and stored in long-term memory, it takes
time to un-learn it. Therefore, the user should not be allowed to learn
incorrect knowledge, as correcting such errors takes a long time.
Principles for design to avoid exasperating users (Hedge 2003):
● Clearly define the system goals and identify potentially undesirable system states
● Provide the user with appropriate procedural information at all times
● Do not provide the user with false, misleading, or incomplete information at any
time
● Know the user
● Build redundancy into the system
● Ensure that critical system conditions are recoverable
● Provide multiple possibilities for workarounds
● Ensure that critical systems personnel are fully trained
● Provide system users with all of the necessary tools
The Gulfs of Evaluation and Execution were discussed in section 2.3. In the following list,
the design principles for making these gulfs narrower are discussed (Norman 1988;
Ritter et al. 2014):
1. Use both the knowledge in the world and the knowledge in the head. Provide
information in the environment to help the user determine the system state and
to perform actions, such as explicit displays of system state, and affordances on
the system controls.
2. Simplify the structure of tasks. Require less of the user by automating sub-tasks,
using displays that describe information without being asked, or providing
common actions more directly. However, do not reduce tasks below their natural
level of abstraction.
3. Make the relevant objects and feedback on actions visible. Bridge the Gulf of
Evaluation. Make the state of the system easier to interpret.
4. Make the available actions visible. Bridge the Gulf of Execution. Make the actions
the user can (and should) perform easier to see and to do.
5. Get the mappings correct from objects to actions. Make the actions that the user
can apply natural.
6. Exploit the power of constraints, both natural and artificial, to support bridging
each Gulf. Make interpretations of the state and of possible actions easier by
removing actions that are not possible in the current state and reducing the
complexity of the display for objects that are not active or available.
7. Design for error. Users will make errors, so you should expect them and be
aware of their effects. Where errors cannot be prevented, try to mitigate their
effects. Help the users see errors and provide support for correcting them.
8. When all else fails, standardize. If the user does not know what to do, allow them
to apply their knowledge of existing standards and interfaces.
3.4. System characteristics and cognitive dimensions
Systems are often evaluated based on seven characteristics (Ritter et al. 2014):
functionality, usability, learnability, efficiency, reliability, maintainability, and
utility/usefulness. Usability has been the focus of this report. Another important
characteristic is efficiency. When designing user interfaces, it is important to remember
that maximal efficiency is not always desired. As stated by Ritter et al. (2014), efficiency
must be calculated in terms of technical efficiency that matches the user’s efficiency
expectations for the task at hand. For instance, one-click payments in e-markets, without
asking the user to review the order before payment, may be too efficient.
Ritter et al. (2014) presented 14 cognitive dimensions. Their goal was to provide a fairly
small, representative set of labeled dimensions that describe critical ways in which
interfaces, systems and environments can vary from the perspective of usability. The
cognitive dimensions help to discuss and compare alternative designs. These
dimensions focus on the cognitive aspects of interfaces and don’t address design trade-
offs related to the other aspects of users – anthropometric, behavioral and social
aspects. Below are listed the cognitive dimensions from Ritter et al. (2014):
1. Hidden dependencies: How visible are the relationships between components?
2. Viscosity: How easy is it to change objects in the interface?
3. Role-expressiveness: How clear are the mappings of the objects to their
functions?
4. Premature commitment: How soon does the user have to decide about something?
5. Hard mental operations: How hard are the mental operations needed to use the
interface?
6. Secondary notation: The ability to add extra semantics.
7. Abstraction: How abstract are the operations and the system?
8. Error-proneness: How easy is it to err?
9. Consistency: How uniform is the system (in various ways, including action
mapping)?
10. Visibility: Whether required information is accessible without work by the user.
11. Progressive evaluation: Whether the user can stop in the middle of creating
some notation and check what has been done so far.
12. Provisionality: Whether the user can sketch out ideas without being too exact.
13. Diffuseness: How verbose is the language?
14. Closeness of mapping: How close is the representation in the interface (also
called the notation) to the end result being described?
Hidden dependencies are common, for instance, in spreadsheets, which show the
user formulae in one direction only; that is, which cells are used to compute the value in
a cell, but not which cells use a given cell’s value. Another example is that files may be
depended on by applications other than the one that created them, e.g. graphics
embedded in reports. Usually these dependencies are not visible, and deleting the
depended-on files may be hazardous. Therefore, all dependencies that may be relevant to the user’s task should be represented. (Ritter et al. 2014)
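The spreadsheet example can be made concrete with a sketch that stores the dependency graph in both directions, so that "which cells use this cell's value" is as visible as "which cells this cell uses"; the cell names and formulae are hypothetical:

```python
# Sketch of making hidden dependencies visible: keep the dependency
# graph queryable in both directions, so the reverse question ("which
# cells use B1's value?") can be answered as easily as the forward
# one. Cell names and formulae are hypothetical.

# Forward view, as a spreadsheet shows it: cell -> cells it reads.
uses = {
    "C1": ["A1", "B1"],  # C1 = A1 + B1
    "D1": ["B1"],        # D1 = B1 * 2
}

def used_by(uses, cell):
    """Reverse lookup: which cells depend on `cell`?"""
    return sorted(c for c, inputs in uses.items() if cell in inputs)

print(used_by(uses, "B1"))  # ['C1', 'D1']
print(used_by(uses, "A1"))  # ['C1']
```

An interface that surfaces the `used_by` view before a deletion would expose exactly the dependencies the paragraph warns about.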
A viscous system is resistant to change; even small changes can require substantial
effort, for example manually changing the numbering of every picture (and the text
referencing it) in a Word document. Sometimes viscosity can be beneficial, e.g. it
encourages reflective action and explicit learning. When it is easy to make changes,
many small, unnecessary changes may be made. Viscosity is especially important in
safety-critical applications, or in applications where an incorrect action is expensive in
time or money. Viscosity can be implemented, e.g., by asking the user to confirm the
action: “Do you really want to do this action?”. (Ritter et al. 2014)
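The confirmation example can be sketched as a wrapper that adds deliberate friction only to actions flagged as irreversible or expensive; the action names and the flag values are assumptions made for this sketch:

```python
# Sketch of deliberately added viscosity: destructive or expensive
# actions require an explicit confirmation, cheap ones do not.
# Action names and the `irreversible` flags are assumptions.

def perform(action, irreversible, confirm):
    """confirm: callable returning True/False, standing in for a
    'Do you really want to do this action?' dialog."""
    if irreversible and not confirm():
        return f"{action}: cancelled"
    return f"{action}: done"

# A stand-in confirmation dialog that always answers "no":
always_no = lambda: False

print(perform("rename file", irreversible=False, confirm=always_no))
# rename file: done  (no confirmation asked)
print(perform("delete all records", irreversible=True, confirm=always_no))
# delete all records: cancelled
```

Applying the friction selectively keeps routine actions fluid while protecting the expensive ones, which is the trade-off the paragraph describes.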
Role-expressiveness describes the extent to which a system reveals the goals of the
designer to the user. The purpose of each component of the system is understandable to
the user, e.g. buttons of the interface should be clearly recognizable as buttons that can
be pressed. A classic problem occurs when two similar-looking features achieve different
functions, or when two different-looking features achieve similar effects. (Ritter et al.
2014)
Some mental operations are harder than others. For instance, operations that
contradict normal mental models are difficult: having to mentally change the size
of an object (which is normally considered a relatively constant property of an object) is
harder than applying simple rules of behavior. Also, mentally rotating objects is
slower with large objects than with small ones. Hard mental operations are easy to
implement computationally, but troublesome for users. They can be
addressed at several levels, either by avoiding the problem through understanding the
relative difficulty of operations, or by providing tools to assist in these operations.
(Ritter et al. 2014)
3.5. Design of multimodal interfaces
Human cognitive capacity is limited. Sometimes the limited resources may lead to
multimedia usability problems, discussed by Sutcliffe (2012):
● Capacity overflow may happen when too much information is presented in a
short period, swamping the user’s limited working memory and cognitive
processor’s capability to comprehend, chunk, and then memorize or use the
information. The implication is that users should be given control over the
pace of information delivery.
● Integration problems arise when the message on two media is different, making
integration in working memory difficult; this leads to the thematic congruence
principle.
● Contention problems are caused by conflicting attention between dynamic
media, and when two inputs compete for the same cognitive resources. For
example, speech and text both require language understanding.
● Comprehension is related to congruence; we understand the world by making
sense of it with our existing long-term memory. Consequently, if multimedia
content is unfamiliar, we cannot make sense of it.
● Multitasking makes further demands on our cognitive processes, so we will
experience difficulty in attending to multimedia input while performing output
tasks.
In task-driven applications, the information requirements are derived from the task
model. In information-provision applications, such as websites with an informative role,
information analysis involves categorization and the architecture generally follows a
hierarchical model. In the third class, explanatory or thematic applications, analysis is
concerned with the story or argument, that is, with how the information should be
explained or delivered. (Sutcliffe, 2012)
Sutcliffe (2012) presented the following classification of information components:
● Physical items relating to tangible observable aspects of the world
● Spatial items relating to geography and location in the world
● Conceptual-abstract information, facts, and concepts related to language
● Static information which does not change: objects, entities, relationships, states,
and attributes
● Dynamic, or time-varying information: events, actions, activities, procedures,
and movements
● Descriptive information, attributes of objects and entities
● Values and numbers
● Causal explanations
Sutcliffe (2012) suggested the following heuristics, collected from multiple sources, for
appropriate media selection:
● To convey detail, use static media, for example, text for language-based content,
diagrams for models, or still image for physical detail of objects.
● To engage the user and draw attention, use dynamic media, e.g. video, animation,
or speech.
● For spatial information, use diagrams, maps, with photographic images to
illustrate detail, and animations to indicate pathways.
● For values and quantitative information, use charts and graphs for overviews
and trends, supplemented by tables for detail.
● Abstract concepts, relationships, and models should be illustrated with diagrams
explained by text captions and speech to give supplementary information.
● Complex actions and procedures should be illustrated as a slideshow of images
for each step, followed by a video of the whole sequence to integrate the steps.
Text captions on the still images and speech commentary provide
supplementary information, and text with bullet points summarizes the steps at the
end. In practice, media choices may be constrained by cost and quality considerations.
● To explain causality, still and moving image media need to be combined with
text.
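As a compact, purely illustrative way to operationalize such heuristics in a UI toolkit, they could be encoded as a lookup from information type to candidate media. The category keys and media lists below paraphrase the heuristics above; they are assumptions, not Sutcliffe’s exact taxonomy.

```python
# Sketch: Sutcliffe-style media-selection heuristics as a simple lookup table.
MEDIA_HEURISTICS = {
    "detail":       ["text", "diagram", "still image"],
    "attention":    ["video", "animation", "speech"],
    "spatial":      ["diagram", "map", "photo", "animation"],
    "quantitative": ["chart", "graph", "table"],
    "abstract":     ["diagram", "text caption", "speech"],
    "procedure":    ["image slideshow", "video", "text caption", "speech"],
}

def suggest_media(information_type):
    """Return candidate media for an information type; default to plain text."""
    return MEDIA_HEURISTICS.get(information_type, ["text"])

print(suggest_media("quantitative"))  # ['chart', 'graph', 'table']
```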
Payne (2012) referred to further research on multimedia instruction (Mayer & Moreno
2002), from which the following principles were summarized:
● The multiple presentation principle states that explanations in words and
pictures will be more effective than explanations that use only words. When
words only are presented, learners may find it difficult to construct an
appropriate mental image, and this difficulty may block effective learning.
Studies have offered support for the general idea that learners will acquire
richer knowledge from narration and animation than from narration alone.
● The contiguity principle is the claim that simultaneous, as opposed to successive,
presentation of visual and verbal materials is preferred.
● The chunking principle refers to a situation in which visual and verbal
information must be presented successively, or alternately (against the
contiguity principle). It states that learners will demonstrate better learning
when such alternation takes place in short rather than long segments. The
reasoning is straightforward, given the assumptions of the framework: working
memory may become overloaded by having to hold large chunks before
connections can be formed.
3.6. Design for Errors
Errors often arise as a combination of factors at the anthropomorphic, behavioral,
cognitive and social levels in the ABCS framework. Each of the components – people,
technology, context – can give rise to errors. There are different types of errors: “slips”
are errors that occur when someone knows the right thing to do but accidentally does
something different, e.g. pressing the wrong button while typing; “mistakes” are errors
that occur when the action is taken on the basis of an incorrect plan. One specific type is
the post-completion error, which arises when the goal of the task has been completed
but the goals of its subtasks have not. A good example of such a situation is getting
money from an ATM but leaving the card in the machine. Good interface design can
help reduce the errors that may happen
while interacting with the interface. The first step in “design for error” is to identify the
situations that can lead to erroneous performance. Secondly, appropriate mechanisms
must be put in place to either prevent the errors, or at least mitigate the adverse
consequences arising from those errors. For example, in order to avoid post-completion
errors, the system should discourage the user from believing that they have
completed the task until all the important sub-parts are done, and should put the most
important goal last, where technology and the situation permit. Good design can help
provide more feedback on performance, and could also provide education along the way
about how to correct problems. (Ritter et al. 2014)
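As an illustrative sketch of this guideline (the class and the ATM step names are hypothetical, not from Ritter et al.), a system can refuse to report the main task as finished until every subtask is done, with the primary goal ordered last:

```python
# Sketch: preventing post-completion errors by tracking subtask completion
# and ordering the primary goal (taking the cash) after the easily forgotten
# one (taking the card).
class Task:
    def __init__(self, name, subtasks):
        self.name = name
        self.done = {s: False for s in subtasks}

    def complete(self, subtask):
        self.done[subtask] = True

    def is_finished(self):
        """The task counts as finished only when every subtask is done."""
        return all(self.done.values())

atm = Task("withdraw cash", ["enter PIN", "take card", "take cash"])
atm.complete("enter PIN")
atm.complete("take cash")
print(atm.is_finished())  # False -> the card is still in the machine
atm.complete("take card")
print(atm.is_finished())  # True
```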
3.7. Display Designs
Displays are human-made artefacts designed to support the perception of relevant
system variables and to facilitate further processing of that information. A user must be
able to process whatever information a system generates and displays; therefore, the
information must be displayed in a manner that supports perception, situation
awareness, and understanding. The term “display” does not refer only to visual
displays, but includes all media that are used to provide information to the users (e.g.
audio and haptic devices). (Wickens et al. 2004)
3.7.1. Thirteen principles of display design
Wickens et al. (2004) defined 13 principles of display design. These principles of human
perception and information processing can be utilized to create an effective display
design. The potential benefits of applying these principles are expected to be, for
instance: a reduction in errors, a reduction in required training time, an increase in
efficiency, and an increase in user satisfaction. It has to be noted that not all of the
principles are applicable to all displays or situations, and some may even conflict with
each other; the principles should be tailored to the specific situation.
Perceptual principles
1. Make displays legible (or audible). A display’s legibility is critical and necessary
for designing a usable display. If the characters or objects being displayed cannot
be discerned, the operator cannot effectively make use of them.
2. Avoid absolute judgment limits. Do not ask the user to determine the level of a
variable on the basis of a single sensory variable (e.g. color, size, loudness).
These sensory variables can contain many possible levels.
3. Top-down processing. Signals are likely perceived and interpreted in accordance
with what is expected based on a user’s past experience. If a signal is presented
contrary to the user’s expectation, more physical evidence of that signal may
need to be presented to assure that it is understood correctly.
4. Redundancy gain. If a signal is presented more than once, it is more likely that it
will be understood correctly. This can be done by presenting the signal in
alternative physical forms (e.g. color and shape, voice and print, etc.), as
redundancy does not imply repetition. A traffic light is a good example of
redundancy, as color and position are redundant.
5. Similarity causes confusion: Use discriminable elements. Signals that appear
similar will likely be confused; the degree of similarity is determined by the ratio
of similar features to different features. For example, A423B9 is more similar to
A423B8 than 92 is to 93. Unnecessarily similar features should be removed and
dissimilar features should be highlighted.
Mental model principles
6. Principle of pictorial realism. A display should look like the variable that it
represents (e.g. high temperature on a thermometer shown as a higher vertical
level). If there are multiple elements, they can be configured in a manner that
looks like they would in the represented environment.
7. Principle of the moving part. Moving elements should move in a pattern and
direction compatible with the user’s mental model of how it actually moves in
the system. For example, the moving element on an altimeter should move
upward with increasing altitude.
Principles based on attention
8. Minimizing information access cost. When the user’s attention is diverted from
one location to another to access necessary information, there is an associated
cost in time or effort. A display design should minimize this cost by allowing for
frequently accessed sources to be located at the nearest possible position.
However, adequate legibility should not be sacrificed to reduce this cost.
9. Proximity compatibility principle. Divided attention between two information
sources may be necessary for the completion of one task. These sources must be
mentally integrated and are defined to have close mental proximity. Information
access costs should be low, which can be achieved in many ways (e.g. proximity,
linkage by common colors, patterns, shapes, etc.). However, close display
proximity can be harmful by causing too much clutter.
10. Principle of multiple resources. A user can more easily process information across
different resources. For example, visual and auditory information can be
presented simultaneously rather than presenting all visual or all auditory
information.
Memory principles
11. Replace memory with visual information: knowledge in the world. A user should
not need to retain important information solely in working memory or retrieve
it from long-term memory. A menu, checklist, or another display can aid the user
by easing the use of their memory. However, the use of memory may sometimes
benefit the user by eliminating the need to reference some type of knowledge in
the world (e.g. an expert computer operator would rather use direct commands
from memory than refer to a manual). The use of knowledge in a user’s head and
knowledge in the world must be balanced for an effective design.
12. Principle of predictive aiding. Proactive actions are usually more effective than
reactive actions. A display should attempt to eliminate resource-demanding
cognitive tasks and replace them with simpler perceptual tasks to reduce the use
of the user’s mental resources. This will allow the user to not only focus on
current conditions, but also think about possible future conditions. An example
of a predictive aid is a road sign displaying the distance to a certain destination.
13. Principle of consistency. Old habits from other displays will easily transfer to
support processing of new displays if they are designed in a consistent manner.
A user’s long-term memory will trigger actions that are expected to be
appropriate. A design must accept this fact and utilize consistency among
different displays.
3.7.2. Visual design principles for good design
The universal principles of visual communication and organization are (Watzman & Re,
2012):
● Harmony - Refers to grouping of related parts, so that all the elements combine
logically to make a unified whole. In interface design this is achieved when all
design elements work in unity.
● Balance - Offers equilibrium or rest by providing the equivalent of a center of
gravity that grounds the page. Without balance, the page collapses: all elements
are seen as dispersed, and the content is lost. Balance can be achieved using
symmetry or asymmetry.
● Simplicity - Is the embodiment of clarity, elegance and economy. Involves
distillation: every element is indispensable; if one is removed, the composition
falls apart. Two common guidelines for achieving simplicity are ”less is more”
and ”when in doubt, leave it out”.
Several things have to be considered when designing visual communications, such as
web pages, visual displays, or dashboards. These include aspects such as typography,
color, field of vision, page layout design, graphs and charts, and the amount of
information on display.
Typography
Typographic choice affects legibility and readability, meaning the ability to easily see
and understand what is on a page. Legibility, the speed at which letters and the words
built from them can be recognised, refers to perception. Readability, the facility and ease
with which text can be read, refers to comprehension. Regardless of the medium,
legibility and readability depend on variables such as point size, letter pairing, word
spacing, line length and leading, resolution, color, and organisational strategies such as
text clustering. Type size is also dependent on the resolution offered by output and
viewing devices, color usage, context, and other design issues. In choosing a typeface, its
style, size, spacing and leading, the designer should think about the final output medium
and examine this technology’s effect on legibility. Low quality monitors and poor
lighting have a major impact: serifs sometimes disappear, letters in small bold type fill in
and colored type may disappear altogether. Line spacing matters: when there is
greater space between the words than between the lines, the reader’s eye naturally
falls to the closest word, which may be below instead of across the line. White
on black (or light type on a dark background) is generally regarded as less legible and
much more difficult to read over large areas than dark type on a light background.
(Watzman & Re, 2012)
Color
The appropriate use of color can make it easier for users to absorb large amounts of
information and differentiate information types and hierarchies. Color is often used to
show qualitative differences, act as a guide through information, attract attention or
highlight key data, indicate quantitative changes, and depict physical objects
accurately. For color to be effective, it should be used as an integral part of the design
program, to reinforce meaning and not simply as decoration. One important thing to
remember is that at least 9% of the population, mostly male, is color-deficient to some
degree, so color shouldn’t be used as the only cue. This is especially important in critical
situations, such as warnings. Therefore color should be used as a redundant cue when
possible. (Watzman & Re, 2012)
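A minimal sketch of this redundant-cue guideline: each status is rendered with a symbol and a label as well as a color, so color-deficient users still receive the message. The status names, symbols, and colors below are illustrative assumptions, not from Watzman & Re.

```python
# Sketch: color used only as a redundant cue; symbol + label carry the meaning.
STATUS_STYLES = {
    "ok":      {"color": "green",  "symbol": "OK",  "label": "RUNNING"},
    "warning": {"color": "yellow", "symbol": "!",   "label": "WARNING"},
    "alarm":   {"color": "red",    "symbol": "X",   "label": "ALARM"},
}

def render_status(status):
    s = STATUS_STYLES[status]
    # Color reinforces, but the symbol and label are sufficient on their own.
    return f"[{s['symbol']}] {s['label']} (shown in {s['color']})"

print(render_status("alarm"))  # [X] ALARM (shown in red)
```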
Field of vision
Field of vision refers to what a user can see on a page with little or no eye movement. A
good design places key elements in the primary field of vision, reflecting and reinforcing
the information hierarchy. Size, contrast, grouping, relationships, and movement are
tools that create and reinforce field of vision. The user first sees what is visually
strongest, not necessarily what is largest or highest. Animated cues, such as blinking
cursors, and other implied structural elements, like handles around selected areas,
become powerful navigational tools if intuitively understood and predictably applied.
(Watzman & Re, 2012)
Page design
Two important functions of page design are motivation and accessibility. A well-
designed page is inviting, drawing the eye into the information. Motivation and
accessibility are accomplished by providing the reader with ways to quickly understand
the information hierarchy. At a glance, the page design should reveal easy navigation
and clear, intuitive paths to discovering additional details and information. This is called
visual mapping. A visually mapped product has:
● An underlying visual structure or grid, organizational landmarks, graphic cues
and other reader aids;
● Distinctly differentiated information types;
● Clearly structured examples, procedures, and reference tools;
● Well-captioned and annotated diagrams, matrices, charts, and graphics.
A grid enables a user to navigate a page quickly and easily by specifying placement for
all visual elements: the user anticipates where a button will appear or how help is
accessed. A well-designed page should hint at all topics contained in the site,
provide high-level information about these topics, and suggest easy paths to access this
information. Consistent use of type, page structure, and graphic and navigational
elements creates a visual language that decreases the amount of effort it takes to read
and understand a communication piece. (Watzman & Re, 2012) The Gestalt principles,
illustrated in Figure 2, shows how different objects should be placed on the display, if
they should be regarded as a group by the user (Ritter et al. 2014).
Figure 2. Gestalt principles of visual grouping (Ritter et al. 2014).
Charts, diagrams, graphics and icons
People don’t have time to read. Therefore, in general, users prefer well-designed
charts, diagrams, and illustrations that quickly and clearly communicate complex ideas
and information. It is very difficult to create an icon that, without explanation,
communicates a concept across cultures. If an icon must be labeled, it is really an
illustration, and the icon’s value as visual shorthand is lost. It is better to use a word or
short phrase rather than a word and an image when screen space is at a minimum.
(Watzman & Re, 2012) On the other hand, Ritter et al. (2014) suggested that the use of
icons can be eased by tooltip text that appears over the icon when the mouse hovers near it.
A photograph can easily represent an existing object, but issues relating to resolution
and cross-media publishing can make it unintelligible. Illustrations make it possible to
present abstract concepts or objects that do not exist, and they can help focus the
viewer’s attention on a certain detail. Graphics are invaluable tools for promoting
additional learning and action, because they reinforce the message, increase
information retention and shorten comprehension time. Different people learn through
different cognitive modes or styles, so it may be wise to use various modes, such as text,
charts, and photos, or to allow the mode to be customized. (Watzman & Re, 2012)
Amount of information
Hand-held devices in particular have very limited space for presenting information.
When evaluating how much information should be presented on the screen, the
demands from the cognitive and visual perspectives may be contradictory. Schlick et al.
(2012) stated that presenting little information on a screen at a time helps to avoid
visibility problems resulting from high information density. On the other hand,
presenting as much information on screen as possible allows users to have maximum
foresight (cognitive preview) of other functions on the menu, which should benefit
information access from a cognitive point of view and minimize disorientation. (Schlick
et al. 2012)
4. Human-machine interaction technologies
As discussed by Danielis (2014), after industry has already undergone three
revolutions in the form of mechanization, electrification, and informatization, the
Internet of Things and Services is predicted to find its way into the factory as the fourth
industrial revolution. For this development, the term “Industry 4.0” has been coined,
e.g., in Germany. The vision is the so-called Smart Factory with a novel production
logic: the products are intelligent and can be identified unambiguously, constantly
located, and are aware of their current state. These embedded production systems shall
be interconnected with economic processes vertically and combined into a distributed
real-time (RT) capable network horizontally. (Danielis 2014) An important role will be played by the paradigm
shift in human-technology and human-environment interaction brought about by
Industrie 4.0, with novel forms of collaborative factory work that can be performed
outside of the factory in virtual, mobile workplaces. Employees will be supported in
their work by smart assistance systems with multimodal, user-friendly user interfaces.
(INDUSTRIE 4.0. 2013)
The ongoing transformation towards digital manufacturing paves the way for adoption
of novel user interfaces for factory floor operators. While many of the technologies for
instance for augmented reality have been there for quite some time, their use in
industrial context has been rare to date (Nee et al. 2012). Adoption of manufacturing IT-
systems, such as MES (Manufacturing Execution System), will support the real time data
collection from the manufacturing operations in a digital format. This data, earlier non-
existent, can then be used throughout the organization for better and more
synchronized management and control of the operations. Such digitalization will also
allow the relevant real-time information to be displayed to the factory workers through
a multitude of different user interface technologies.
This chapter will introduce some of the available and emerging human-machine
interaction technologies and show some examples of their applications. It will start by
discussing direct and indirect input devices in general, after which it introduces
specific technologies, such as mobile devices, augmented reality, and speech and
gesture recognition, in more detail. Each technology will be evaluated based on its
technology readiness level (TRL), as commonly used in the European Commission’s
(EC) Horizon 2020 program. The evaluation is done based on the material available on
the technologies; the focus in this report is mainly on technologies between the
prototype and commercial stages. The technology readiness levels, as defined by the EC, are the following:
TRL 0: Idea. Unproven concept, no testing has been performed.
TRL 1: Basic research. Principles postulated and observed but no experimental proof
available.
TRL 2: Technology formulation. Concept and application have been formulated.
TRL 3: Applied research. First laboratory tests completed; proof of concept.
TRL 4: Small scale prototype built in a laboratory environment ("ugly" prototype).
TRL 5: Large scale prototype tested in intended environment.
TRL 6: Prototype system tested in intended environment close to expected
performance.
TRL 7: Demonstration system operating in operational environment at pre-
commercial scale.
TRL 8: First of a kind commercial system. Manufacturing issues solved.
TRL 9: Full commercial application, technology available for consumers.
4.1. Direct and indirect input devices
In human-computer interaction, the human has to be able to give commands and
input information to the computer in some way. Input devices can be either direct
or indirect. This section will give examples of devices belonging to these two
categories and introduce their characteristics.
A direct input device has a unified input and display surface. An indirect input device
does not provide input in the same physical space as the output. Examples of direct
input devices are touch screens and display tablets operated with a pen (or other
stylus). In contrast, a mouse is an indirect input device, because the user must move
the mouse on one surface (the desk) to indicate a point on another surface (the screen).
(Hinckley & Wigdor, 2012) Welsh et al. (2012) stated that even though mouse, keyboard
and joystick devices will continue to dominate for the near future, embodied, gestural
and tangible interfaces – where individuals use their body to directly manipulate
information objects – are rapidly changing the computing landscape. An example is the
touchscreen, which allows the user, instead of pointing and clicking with a mouse, to
directly pull, push, grab, pinch, squeeze, crush and throw virtual objects. The user
doesn’t need to use dissociated (mouse) and/or arbitrary (keyboard and joystick)
sensorimotor mappings to achieve his/her goals. These new modes of interaction allow
a more direct mapping of the user’s movements onto the workspace. (Welsh et al. 2012)
The touch screen is the most common example of a direct input device. Touch screens
are used for instance in tablet devices, mobile phones, laptop screens and large
wall-mounted displays. There exist different kinds of touch screen technologies
(Hinckley & Wigdor 2012; Schlick et al. 2012):
● Resistive touch screens - React to pressure generated by finger or stylus.
Require pressure and may be fatiguing to use, but can be used by operators
wearing gloves.
● Capacitive touch screens - A human touch on the screen’s surface alters the
human body’s electrostatic field, which is measured as a change in capacitance.
Require contact from bare fingers in order for the touch to be sensed; however, a
soft touch is enough.
● Surface acoustic wave touch screens - Use ultrasonic waves created by a
fingertip on the surface.
● Optical touch screen - Use several optical sensors around the corners of the
screen to identify the location of the movement or touch.
● Dispersive signal touch screens - Detect the mechanical load created by a
touch.
● Strain gauge touch screens - Also known as force panel technology. The screen
is spring-mounted in every corner; the deflection caused when the screen is
touched identifies and locates the touch.
Touch screens can also be divided into single-touch and multi-touch screens. Single-
touch interfaces are able to detect only one touch point at a time; they resemble a
mouse and are good for pointing. A multi-touch interface is able to detect multiple fingers (i.e.
touch points) simultaneously and can thus be used e.g. for “pinch to zoom”. Capacitive
screens, optical (infrared) screens, and most recently resistive screens can be used for
multi-touch purposes. (Schlick et al. 2012)
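As an illustrative sketch of what multi-touch enables, a “pinch to zoom” factor can be derived from how the distance between two simultaneous touch points changes. The coordinates and function names below are assumptions for illustration only.

```python
# Sketch: deriving a pinch-to-zoom factor from two simultaneous touch points,
# a gesture only a multi-touch screen can support.
import math

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def zoom_factor(touch_start, touch_now):
    """Ratio of finger spread now vs. at gesture start; > 1 zooms in."""
    return distance(*touch_now) / distance(*touch_start)

start = ((100, 100), (200, 100))  # fingers 100 px apart
now = ((50, 100), (250, 100))     # fingers spread to 200 px apart
print(zoom_factor(start, now))    # 2.0
```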
Hinckley and Wigdor (2012) highlighted that direct input on wall-mounted displays is
commonplace, but the constant physical movement required can become burdensome.
Interacting with portions of the display that are out of view or beyond arm’s length
may also raise challenges. As stated by Hinckley & Wigdor (2012), indirect input scales
better to large interaction surfaces, because it requires less body movement and also
allows interaction at a distance from the display.
Input technologies that use gestures and other body input are also categorized as
direct input devices. Gestures are considered a natural way of interacting with
machines. However, in gesture-based interaction the main challenge is to correctly
identify when a gesture, as opposed to an identical but unintended hand movement,
starts and stops: it is not always clear when the user is actually trying to interact with
the machine and when not. A similar challenge exists with speech interfaces. Gesture
and other body input may also cause fatigue, e.g. if one’s arms have to be extended for
long periods of time.
Indirect input devices can be divided into absolute and relative input devices. An
absolute input device senses the position of an input and passes this message to the
operating system. Relative devices sense only changes in position. Absolute mode is
generally preferable for tasks such as drawing, handwriting and tracing, whereas
relative mode may be preferable for traditional desktop graphical user interaction
tasks, such as selecting icons or navigating through menus. (Hinckley & Wigdor, 2012)
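The contrast between the two modes can be sketched as follows: an absolute device (e.g. a touch screen) reports a position directly, while a relative device (e.g. a mouse) reports only deltas that the system accumulates into a cursor position. The function names and coordinates are illustrative.

```python
# Sketch: absolute vs. relative input devices.
def absolute_input(position):
    """Absolute device: the sensed position IS the cursor position."""
    return position

def relative_input(start, deltas):
    """Relative device: only changes are sensed; the cursor integrates them."""
    x, y = start
    for dx, dy in deltas:
        x, y = x + dx, y + dy
    return (x, y)

print(absolute_input((120, 80)))                          # (120, 80)
print(relative_input((0, 0), [(5, 0), (5, 3), (-2, 1)]))  # (8, 4)
```

This integration step is also why a relative device needs clutching: lifting and repositioning the device changes nothing, since no delta is reported.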
Common indirect input devices, in addition to mice and keyboards, are touchpads,
trackballs and joysticks (Hinckley & Wigdor, 2012):
● Touchpads are small, touch-sensitive tablets, which are often used in laptop
computers. They usually use relative mode for cursor control, because they are
too small to map to an entire screen. The small size of the touchpad necessitates
frequent clutching, and dragging may require a second hand to hold down the button.
● A trackball senses the relative motion of a partially exposed ball in two degrees
of freedom. Trackballs may require frequent clutching movements because users
must lift and reposition their hand after rolling the ball through a short distance.
● Joysticks: An isometric joystick is a force-sensing joystick that returns to the
center when released; an isotonic joystick senses the angle of deflection.
Keyboards are either indirect or direct input devices. The graphical keyboards of touch
screens are direct input devices. Many factors influence typing performance with
keyboards, including key size, key shape, activation force, key travel distance and the
tactile and auditory feedback provided by striking the keys. Touch screens’ graphical
keyboards require significant visual attention, because the user must look at the screen
to press the correct key. The quality of tactile feedback is poor compared with a
physical keyboard, because the user cannot feel the key boundaries. A graphical
keyboard (as well as the user’s hand) occludes a significant portion of a device’s screen,
resulting in less space for the document itself. Furthermore, because the user typically
cannot rest his/her fingers in contact with the display (as one can with mechanical keys)
and must also carefully keep other fingers pulled back so as not to
accidentally touch keys other than the intended ones, extended use of touch-screen
keyboards can be fatiguing. (Hinckley & Wigdor, 2012)
4.2. Mobile Interfaces and Remote Sensors
Typical consumers use mobile devices, such as smart phones and tablets, for media
consumption, picture/video capture, social collaboration, web browsing,
communication, games, mapping and route planning. Recently, industry has also found
mobile devices useful, although the changes are happening slowly. Future
manufacturing operator tools will be based on mobile communication, decision support
and IT, enhancing operator capability. The Operator of the Future project in Sweden
has developed and tested concepts relying on mobile technologies, such as adaptive
work instructions, dynamic checklists, logbooks, reporting, localization, remote
support, decision support, statistics, and remote monitoring and control.
The global market requires that decisions are made as quickly as possible, even when the responsible people are outside their company premises. The ability to access critical information anywhere and anytime with mobile devices is therefore indispensable. For example, as stated by Moran (2013), with MES mobile applications data is made available on demand regardless of physical location, providing real-time insight into operational and business performance. In
manufacturing, abnormal operating events that require action can occur at any time and
it is important that the right resources are aware of these events as near to real-time as
possible to minimize the impact on profitability. (Moran 2013)
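The alert flow Moran describes can be sketched as a simple routing rule from event attributes to subscriber roles. The roles, areas and severity scale below are hypothetical illustrations, not taken from the cited source.

```python
# Hypothetical subscription table: which roles want to hear about
# events from which areas, and from which severity upward.
SUBSCRIPTIONS = {
    "line_supervisor": {"areas": {"assembly"}, "min_severity": 2},
    "plant_manager":   {"areas": {"assembly", "packaging"}, "min_severity": 4},
}

def route_event(event, subscriptions=SUBSCRIPTIONS):
    """Return the roles whose mobile devices should be notified of this event."""
    return sorted(
        role for role, rule in subscriptions.items()
        if event["area"] in rule["areas"]
        and event["severity"] >= rule["min_severity"]
    )
```

A severity-4 event in the assembly area would then reach both the line supervisor and the plant manager, while a minor packaging event reaches nobody.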
Researchers have recently been interested in using mobile devices for remote access to HMIs. Using web services is one way to achieve this integration. Cavalcanti (2009) described the architecture of a system that provides access to factory-floor information from cell phones, i.e. remote monitoring. The system uses communication technologies such as OPC and Web Services, enabling critical information such as setpoints, alarms and thresholds to be viewed on the cell phone from anywhere. (Cavalcanti 2009)
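A minimal sketch of the remote-monitoring idea: a snapshot of factory-floor tags serialized as JSON for a phone client. The tag names, values and alarm convention are invented for illustration; a real system would read the values over OPC as Cavalcanti describes.

```python
import json

# Hypothetical tag table standing in for values read over OPC.
TAGS = {
    "furnace1.setpoint": 850.0,
    "furnace1.temperature": 863.5,
    "furnace1.alarm_threshold": 860.0,
}

def snapshot(tags=TAGS):
    """Serialize current values, flagging measurements above their threshold."""
    alarms = [
        name for name, value in tags.items()
        if name.endswith(".temperature")
        and value > tags.get(name.replace(".temperature", ".alarm_threshold"),
                             float("inf"))
    ]
    return json.dumps({"tags": tags, "alarms": alarms})
```

A web service would return this JSON to the phone, which only needs to render it; the alarm logic stays on the server side.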
Moreover, wireless technologies have been used to make the interfaces more flexible, simplify installation, and reduce costs. The connections must remain highly reliable even in severe environments, and the wireless system must interoperate with upper-layer systems as well as with the other sensors in the design; suitable communication protocols are therefore necessary. Previous research has aimed at developing core technologies for industrial wireless use, including: 1) creating reliable low-power mesh networks, focusing on how mesh nodes decide to connect and how to minimize the number of routing tables required in an IP network (reliability also requires time synchronization of data transmission); 2) providing redundant routes between the wireless system and upper layers and avoiding congestion at the gateways; 3) creating seamless communication by applying IPv6 technologies on the nodes; and 4) making the modules ultra-low-power. In this way the interface becomes fast, reliable and easy to access via the surrounding network. (Yamaji 2008)
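The routing concerns in points 1) and 2) can be illustrated with a toy topology: each mesh node ranks its neighbors by hop count to the gateway and keeps the two best as primary and backup next hops, so routing tables stay small and a redundant route exists. The topology and function names are hypothetical.

```python
# Hypothetical mesh topology: node -> neighbors it can hear.
LINKS = {
    "gw": ["a", "b"],
    "a": ["gw", "b", "c"],
    "b": ["gw", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}

def hop_counts(links, gateway="gw"):
    """Breadth-first hop count from every node to the gateway."""
    hops, frontier = {gateway: 0}, [gateway]
    while frontier:
        nxt = []
        for node in frontier:
            for nb in links[node]:
                if nb not in hops:
                    hops[nb] = hops[node] + 1
                    nxt.append(nb)
        frontier = nxt
    return hops

def parents(node, links=LINKS, gateway="gw"):
    """Primary and backup next hop: the two neighbors closest to the gateway."""
    hops = hop_counts(links, gateway)
    ranked = sorted(links[node], key=lambda nb: hops[nb])
    return ranked[:2]
```

Node "c", for example, keeps "a" (one hop from the gateway) as its primary parent and "d" as a backup route.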
1 Operator of the Future by Chalmers 2015. [Available in: http://www.chalmers.se/hosted/frop-en]
4.2.1. Mobile Device and Remote Sensor Technologies
Tablet and smart phone devices
Tablets are familiar devices from home use and are slowly finding their way into industrial use as well. Most such devices have a touch-screen display operated with fingers or a stylus. One example is the Motorola ET1 Enterprise2 (TRL 9), released in 2011 (Figure 3), which is designed especially for use in manufacturing companies. Features of this tablet include dual user log-in, an integrated optical barcode scanner, swappable battery packs and a multi-touch panel. Motorola's mobile devices run Android, Windows or Windows CE; the ET1 is equipped with WLAN, GPS, and Android 4.1.1 as its operating system.
Figure 3. Motorola ET1 Enterprise Tablet. (Figure from Motorola America ET1 Enterprise page
2015).
Smart Watches
A smart watch is a watch with capabilities beyond timekeeping. Modern smart watches run an operating system similar to, or sometimes the same as, that of a mobile phone. Such devices can have features like a camera, accelerometer, thermometer, altimeter, barometer, compass, cell phone, touch screen, GPS navigation, speaker, map display, watch, mass storage recognizable by a computer, and a rechargeable
2 Motorola ET1 Enterprise Tablet 2015. [Available in: http://www.motorolasolutions.com/US-EN/Business+Product+and+Services/Tablets/ET1+Enterprise+Tablet]
battery. Companies such as Samsung, LG, Asus, Sony, Motorola, Apple, Pebble,
Qualcomm, and Exetch have made their smartwatch products. (Melon 2012; Trew 2013)
Much has been written about smartwatches lately. However, valuable use cases are still
unclear. Independent research company Smartwatch Group has done an in-depth
analysis on what will be the most relevant application areas for smartwatches in 2020.
These are listed in Table 2.
Table 2. Smartwatch Group ranking for applications of smartwatches in 2020. (Smartwatch Group
2015).
Application | Key Benefits | 2020 Ranking
Personal assistance | Highly efficient, context-aware management of calendar, tasks, and information needs | 1
Medical/health | Basis for huge improvements in therapy for various patient groups; tool to manage medical records | 2
Wellness | Higher body awareness, more movement, better nutrition, less stress, improved sleep | 3
Personal Safety | Prevention of emergencies; auto-detection and fast support in case it happens | 4
Corporate Solutions | Simpler, more efficient, safer and cheaper business processes | 5
Other wireless interfaces
3Dconnexion SpaceMouse Wireless3
3Dconnexion presented the SpaceMouse Wireless (TRL 9), a wireless 3D mouse and a new solution for industrial integration (Figure 4). The 3D mouse is designed as an input device that helps the engineer navigate a 3D CAD environment in six degrees of freedom. The SpaceMouse Module addresses the joystick market and is designed as an alternative to a conventional joystick for use in industrial environments. The components are provided in an open housing with a standard metric screw and slimline mount for easy integration, and the module is available with a serial or USB interface. KUKA uses the 3Dconnexion industry module in a robot programming controller, where each robot is taught how to move its arm (Figure 4). The conventional way would be to program each axis separately, but with the industry module integrated in the KUKA SMARTPAD it is possible to move the arm freely in six degrees of freedom. The movement is recorded and can easily be incorporated into the robot's program.
3 CadRelations Youtube Channel. 2014. Video: HMI 2014: 3Dconnexion, - programing industry robots gets easier.
[Available in: https://www.youtube.com/watch?v=oIbXW3BVaAI]
Figure 4. 3Dconnexion SpaceMouse Wireless used in KUKA robot controller panel. (Screenshot from
HMI 2014: 3Dconnexion 2014).
Electronic Paper
Electronic paper is a technology that tries to show screens like ordinary paper. The
difference with backlight papers is the trial to reflect light and empty pixels like normal
papers. Use cases for electronic paper are wrist watches, eBooks, newspapers, displays
embedded in smart cards, status displays, mobile phones, and electronic shelf labels.
Moreover, electronic papers can also be used in in production environment as easily
updateable Kanban cards. (Dilip 2010)
An electronic shelf label (ESL), used for displaying the price or quantity of a product, is an interesting case for warehouses and the shop floor (Figure 5). A communication network allows the display to be updated automatically whenever a product's price or warehouse quantity changes. This communication network is the true differentiator and what really makes ESL a viable solution. The wireless communication must support reasonable range, speed, battery life, and reliability, and can be based on radio, infrared or even visible-light communication. Currently the ESL market leans heavily towards radio-frequency-based solutions. Automated ESL systems reduce the labor costs of pricing management, improve pricing accuracy and allow dynamic pricing, a concept in which retailers fluctuate prices to match demand, online competition, inventory levels and the shelf life of items, and to create promotions. (Dilip 2010)
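Dynamic pricing of the kind described can be sketched as a simple rule that discounts short-shelf-life or overstocked items and emits an update message for one label. The thresholds, discount factors and message format are illustrative assumptions, not taken from Dilip (2010).

```python
def dynamic_price(base_price, days_left, stock, target_stock):
    """Return a price discounted for short shelf life and overstock."""
    price = base_price
    if days_left <= 2:
        price *= 0.70          # clear items about to expire
    if stock > 2 * target_stock:
        price *= 0.90          # gentle push on overstock
    return round(price, 2)

def esl_update(label_id, price):
    """Message broadcast to one shelf label (format is illustrative)."""
    return {"label": label_id, "price": price}
```

The store system recomputes prices centrally and broadcasts only the changed labels, which keeps the radio traffic, and therefore the labels' battery drain, low.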
Figure 5. An electronic shelf label (ESL). (Screenshot smarttag from vmsd online page September
2013).
Remote sensors – Example: Irisys people counting System4
InfraRed Integrated Systems Ltd. has made an infrared system called Irisys People Counting (TRL 9), whose sensors detect the heat emitted as infrared radiation by people passing underneath (Figure 6). The units contain imaging optics, sensors, signal processing and interfacing electronics, all within a discreetly designed moulded housing. Up to eight virtual counting lines are defined by an operator using a portable PC setup tool, and people are counted as they pass each line in a defined direction. Mounting heights between 2.2 m and 4.8 m can be accommodated with the standard lens; other lens options are available for higher mounting heights.
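The virtual counting lines can be sketched geometrically: a person is counted when their tracked position changes sides of a directed line, with the direction of the crossing given by the sign of a 2D cross product. The coordinates and names below are illustrative; Irisys' actual algorithms are not public.

```python
def side(line, point):
    """Negative/positive depending on which side of the directed line the point is."""
    (x1, y1), (x2, y2) = line
    px, py = point
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)

def count_crossings(line, track):
    """Count entries: transitions from the negative to the positive side."""
    entries = 0
    for prev, cur in zip(track, track[1:]):
        if side(line, prev) < 0 and side(line, cur) > 0:
            entries += 1
    return entries
```

A track that crosses the line the other way is simply not counted, which is how a single sensor can distinguish people entering from people leaving.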
4 Irisys People Counting 2015. [Available in: http://www.irisys.net/people-counting]
Figure 6. Irisys People Counter. (Screenshot from Irisys People Counting 2015).
4.2.2. Mobile Devices and Remote Sensors Applications
The main industrial application environments reported for mobile devices include warehouses, the military, emergency services, and construction work. On the factory floor and construction sites, tablets have to be ruggedized and protected against water and dust ingress. Although such tablets and protection are available, adoption has remained slow. One reason could be the difficulty of supporting all of a company's applications on one mobile device. Armies, however, have been able to use smartphones and to develop modified versions of various platforms that allow access to email, documents, and a partitioned ecosystem of apps and other enterprise apps at the necessary high level of security. (IQMS 2011) In the following, a couple of application examples for mobile devices and remote sensors are shown.
Running Enterprise Resource Planning (ERP) on Mobile Devices
Innowera presented an application for running SAP on mobile devices using the Innowera Web and Mobile Server5. The application has built-in offline capabilities and offers device management, user management, and back-office integration. It can be installed on iOS and Android without writing a new app for each platform, and it can be hosted on Microsoft Azure, AWS or HP Cloud. The InnoweraApp can be downloaded from Apple iTunes or Google Play, after which the user proceeds to the Innowera Web and Mobile Server (IWMS). If required, published processes can be modified with any HTML5 editor.
5 Innowera Mobile 2013. [Available in: http://innowera.com/web-and-mobile-server-for-sap.php]
IQMS EnterpriseIQ mobile technology6
EnterpriseIQ mobile technology (TRL 9) extends manufacturing ERP functionality with real-time manufacturing, MES and ERP information on the go via smart phones, PDAs, and tablets. IQMS' ERP software allows the production process to be checked and recorded in real time, aiming at full integration with the ERP system. Strong data encryption, as well as user-defined security roles, keeps data secure while taking advantage of options such as CRM, document control, lot number changes, production and reject reporting, quick inspections, and real-time work-center monitoring.
Pro-face Remote HMI7
Pro-face is software for developing human-machine interfaces (HMIs). Pro-face Remote (TRL 9) is an HMI designed for tablets and smartphones. Systems integrators on the factory floor may use it to check I/Os, review what has happened in the system, or follow the machine's steps and movements (Figure 7). System status monitoring may be synchronous or asynchronous. System alarms can be viewed on the mobile device, and in critical cases it is easy to reach the contact information of the right person to take proper action. Snapouts and remote monitoring are other available features.
Figure 7. Checking the machine movement with tablet device by Proface Remote HMI. (Screenshot
from Pro-face Remote HMI intro video 2013).
Tablets on factory floor and warehouse
Companies such as Cheer Packs North America8 use the Microsoft Surface Pro tablet (TRL 9) for office staff, the warehouse and quality management. In Figure 8, a quality specialist is auditing the factory floor, entering information and adding pictures into the quality management software with a Surface Pro device. The user can capture evidence of possible problems, send it to someone else or save it for later processing. Based on employee feedback, the device has improved time efficiency, as operators, supervisors and quality inspectors no longer need to walk between different screens for monitoring and data input.
6 IQMS Mobile ERP Apps for Manufacturing Companies 2015. [Available in: http://www.iqms.com/products/mobile-erp-software.html]
7 Pro-face Remote HMI 2015. [Available in: http://www.profaceamerica.com/en-US/content/remote-hmi]
8 Surface Pro Youtube Channel. 2014. Video: Cheer Pack North America gains efficiency with Surface on the factory floor. [Available in: https://www.youtube.com/watch?v=EFdYqhIezig]
Figure 8. Quality specialist taking picture with Surface Pro. (Screenshot from Surface Pro intro on
factory floor Cheer Packs North America 2014).
QueVision System for Traffic Control9
QueVision combines infrared sensors over store doors and cash registers, predictive analytics, and real-time data feeds from point-of-sale systems in a faster-checkout initiative (Figure 9). Kroger's QueVision technology is powered by the Irisys intelligent queue management solution. It uses infrared sensors and predictive analytics to arm store front-end managers with real-time data to make sure registers are open when customers need them. Across the Kroger family of stores, the solution has reduced the average time a customer waits in line to check out from four minutes before QueVision to less than 30 seconds today.
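One plausible staffing rule of the kind such a system could apply (the actual QueVision analytics are proprietary): estimate the offered load from door-sensor arrival counts and open enough registers to keep per-register utilization below a cap.

```python
import math

def registers_needed(arrivals_per_min, avg_service_min, utilization_cap=0.8):
    """Open enough registers to keep utilization under the cap."""
    offered_load = arrivals_per_min * avg_service_min   # Erlangs of checkout work
    return max(1, math.ceil(offered_load / utilization_cap))
```

With two customers arriving per minute and a two-minute average checkout, five open registers keep utilization at or below 80%; the door sensors provide the arrival forecast minutes before those customers reach the registers.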
Figure 9. The Kroger traffic control system aims to provide customers with a faster checkout. (Figure from Kroger mobile innovations 2014).
9 Kroger Co’s QueVision for Traffic Control 2015. [Available in: http://ir.kroger.com/Mobile/file.aspx?IID=4004136&FID=22999227]
4.3. Virtual and Augmented Reality
Immersive Virtual Reality (VR) is a technology that enables users to enter into computer
generated 3D environments and interact with them. In VR technologies, the human body
movements are monitored by using different tracking devices. This enables intuitive
participation with and within the virtual world. Head mounted displays (HMDs) are a
commonly used display device for VR, using the closed view and non-see-through mode.
(Schlick et al. 2012)
Augmented reality (AR) is characterized by the visual fusion of 3D virtual objects into a 3D real environment in real time. Compared to VR, AR supplements reality, rather than
replacing it. With AR, developers create virtual models with which users can interact while still distinguishing between the virtual and the real world. An AR system includes a processor, sensors, a display and input devices. The display system can be a monitor or screen mounted in the workplace, a head-mounted display, or eyeglasses.
(Graham 2012)
Even though AR technologies have existed for some years, their implementation in real industrial environments has been rare (Nee et al. 2012). The emergence of manufacturing IT solutions that can collect and manage manufacturing information is expected to pave the way for more AR implementations.
Furthermore, as stated by Schlick et al. (2012), recent advances in wearable computer displays, which incorporate miniature TFT LCDs directly into conventional eyeglasses or helmets, should simplify ergonomic design and further reduce the weight of VR and AR technologies.
The most common usage contexts reported for AR are conceptual product design, education and training, visual tracking and navigation, work instructions, and remote help centers (Nee et al. 2012; Graham 2012). The following sections discuss the technologies and application examples of augmented reality. The focus is on head-mounted displays, as the other technologies, such as mobile devices, gesture control and speech recognition, are discussed in other sections of this report.
4.3.1. Technologies for Augmented Reality
Head-mounted displays (HMDs) are a common technology for overlaying the real world with virtual information in augmented reality applications. The overlaying can be done in two ways: by using an HMD in see-through mode, or by using an HMD in non-see-through mode, called video-see-through. The latter approach optically isolates the user completely from the surrounding environment, so the system must use video cameras to obtain a view of the real world. In an optical see-through HMD the user sees the real scene through optical combiners and no video channel is needed. (Schlick et al. 2012)
The HMDs can generally be divided into the following categories (Schlick et al. 2012):
● Monocular - Single display source, which provides the image to one eye only.
● Binocular (2D) - Two displays with separate screens and optical paths, enabling
both eyes to see the same image simultaneously.
● Binocular (3D) - Allow stereoscopic viewing with 3D depth perception. This is
produced by presenting two spatially slightly incongruent images to the left and
right eyes.
As discussed by Welsh et al. (2012), an HMD can assist with target detection, because it overlays critical cue information on the actual environment, reducing the scanning time required to sample and attend to both the display and the environment. In the following, a few existing HMD products are introduced.
Google Glass10
Google Glass is a smart wearable glass developed by Google (TRL 7). Sales of the Google Glass beta version have stopped, but development is still proceeding, with the goal of releasing a refined version of the glasses. Google Glass projects the rendered image through a lens onto the retina. Figure 10 shows the projector and prism working together.
Figure 10. A projector and a prism working together in Google Glass (Figure from techlife 2013).
The result is that the user perceives a small translucent screen hovering at about arm's-length distance, extended up and outward from the right eye. Since the colors cycle very quickly, the user perceives a full-color video stream. The touch pad installed on the side of the glasses makes it possible to switch between menus and to search among past and current events, and tapping it opens an application. The camera can take photos and record 720p videos. (Glass Help 2015) Figure 11 shows the different parts of the glasses and Figure 12 a user wearing Google Glass.
10 Glass Help 2015. [Available in: https://www.google.com/glass/help]
Figure 11. Google glass structure including list of sensors and location of the processor. (Figure
from elsevier-promo online page 2015).
Figure 12. Google glass image preview. (Figure from Cult of Android online page October 2013).
EyeTap: The eye itself as display and camera11
EyeTap (TRL 7) is a device which allows, in a sense, the eye itself to function as both a
display and a camera. EyeTap is at once the eye piece that displays computer
information to the user and a device which allows the computer to process and possibly
alter what the user sees. That which the user looks at is processed by the EyeTap. This
allows the EyeTap to, under computer control, augment, diminish, or otherwise alter a
user's visual perception of their environment, which creates a Computer Mediated
Reality. Furthermore, ideally, EyeTap displays computer-generated information at the
appropriate focal distance, and tonal range. Figure 13 depicts and describes the basic
functional principle of EyeTap. Note from the diagram that the rays of light from the
environment are collinear with the rays of light entering the eye (denoted by the dotted
lines) which are generated by a device known as the aremac. "Aremac" is the word
11 Eyetap research project. [Available in: http://www.eyetap.org/]
camera spelled backwards and is the device which generates a synthetic ray of light
which is collinear with an incoming ray of light. Ideally, the aremac will generate rays of
light to form an image which appears to be spatially aligned and with the same focus as the real-world scene. (EyeTap Research Project Page 2015)
Figure 13. Basic functional principle of EyeTap. (Figure from eyetap online page 2015).
Canon Mixed Reality headset12
Canon's Mixed Reality (MREAL) headset (TRL 9) delivers augmented reality and is pitched as a high-end tool for product designers in the automotive, construction, manufacturing, and research fields. The system works differently from Google Glass: MREAL's bulky-looking headset positions two cameras in front of the eyes, which display a combination of video from the surroundings and computer-generated graphics (Figure 14). Canon created MREAL to allow designers to interact with simple mock-ups of their products, which appear as highly detailed objects through the glasses thanks to the headset's computer-powered augmented reality. Basically, it allows designers to interact with intricate, computer-generated versions of their ideas in a 3D environment. The head-mounted display is linked to a controller, which is connected to a computer that generates the video of the user's surroundings.
12 Canon Mixed Reality (MREAL) headset [Available in: http://usa.canon.com/cusa/office/standard_display/Mixed_Reality_Overview]
Figure 14. Canon Mixed Reality (MREAL) headset system architecture using augmented reality
(Figure from Canon Mixed Reality headset online page 2015).
Microsoft HoloLens13
Microsoft's HoloLens (TRL 5) wraps around the user's head but does not isolate the user from the world. It has a built-in Intel SoC and a custom Holographic Processing Unit. The digital world is projected not just around the user but on top of the real world. The user can see and talk to the person standing next to them, avoid walking into walls and chairs, and still look at a computer screen, because HoloLens detects the screen's edges and does not project over them, so there is no need to keep taking the device on and off during work. One can take notes or answer email on a computer with a keyboard or a pen instead of trying to force gestures and gaze. The HoloLens projected screen moves as the user moves their head. The user can control the apps either with voice commands or with the "air tap", the equivalent of a mouse click. Making a Skype call from HoloLens is a good way to try out the voice and gesture commands: it is possible to search for the person to call in the address book and then air-tap to connect. The other party does not require a HoloLens and can see in Skype what the HoloLens user is looking at and, for example, draw diagrams on the video that appear in the user's view (Figure 15).
13 Microsoft 2015. Microsoft HoloLens. [Available in: http://www.microsoft.com/microsoft-hololens/en-us]
Figure 15. Hololens example application for customer service purposes (Figure from Microsoft
Hololens online page teach and learn 2015).
4.3.2. AR application examples
Many companies and research groups have recently started to develop methods for using AR. This section introduces some industrial and non-industrial application examples.
Google Glass Applications14
Google has designed basic Glassware applications for taking photos, recording video, finding directions, and searching Google, although it takes time to get used to wearing the glasses. Applications are also available in the Glass app store and from third parties. For instance, the Tesco grocery Glassware lets the user browse, view nutritional information, and add items to the shopping basket hands-free. Another example is Magnify, which lets users zoom in on objects located in front of them: users with limited vision can zoom in and out to see objects at a closer range with a voice command. Magnify runs for 30 seconds, and users have the option to extend the time. Currently IFTTT15 (If This Then That) is also available for Google Glass; this service automates the tasks users regularly perform across a wide range of popular apps and services.
Augmented reality applications from SAP16
SAP is working with smart eyewear company Vuzix to bring augmented reality and
smart glasses into industrial environments. The applications are targeted especially to
field technicians and warehouse workers, where hands-free computing can aid in data
collection and operations. The two applications launched are the SAP AR Warehouse
Picker and the SAP AR Service Technician (TRL 9). Both applications utilize visualization
and voice recognition to receive instructions via the M100 Smart Glasses to complete
14 Glassware Apps Online Page [Available in: https://glass.google.com/u/0/glassware]
15 About IFTTT. [Available in: https://ifttt.com]
16 SAP. Augmented Reality Apps. [Available in: http://www.sap.com/pc/tech/mobile/software/lob-apps/augmented-reality-apps/index.html]
daily tasks without hand-held devices or instructions. The aim is to make operations faster, more efficient and of better quality (with fewer mistakes).
SAP AR Warehouse Picker17
SAP AR Warehouse Picker (Figure 16) aims to instruct the warehouse worker in the
picking operations, and to collect the information of the picked items. With the
application the users are able to scan barcodes and QR codes for handling units,
locations, products, stations and any other required scans. It is also possible to give
voice input for quantity confirmation. The usage of smart glasses and AR technology
eliminates the need for hand-held scanners, which have been making the picking
operations difficult by occupying one hand. The hands-free functionality reduces the
time the workers must spend interacting with handheld scanners and devices. To get
started, workers connect the smart glasses with the organization’s back-end or gateway
system and load warehouse picking tasks. Pickers are then guided through tasks
according to the steps required for each item to be picked. Voice-recognition and
visualization functionality drive task completion and accuracy with prompts and step-
by-step directions. Operators can navigate through software options and enter data (e.g.
completion of tasks) with voice command. The smart glasses include speakers for the
audio prompts, as well as built-in scanning functionality. For example, the application
can give workers audio prompts to scan a particular item with the smart glasses, pick an
item up off the shelf, or enter an item quantity. The authentication of users is verified by
scanning a unique QR code through the smart glasses.
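The picking flow described above can be sketched as a loop over tasks, each confirmed by a barcode scan and a spoken quantity before the picker is guided to the next item. The task fields and callback names are hypothetical, not SAP's API.

```python
def run_picks(tasks, scan, speak_quantity):
    """Walk the pick list; return (completed items, discrepancies)."""
    completed, discrepancies = [], []
    for task in tasks:
        code = scan()                      # built-in scanner in the glasses
        if code != task["barcode"]:
            discrepancies.append((task["item"], "wrong item scanned"))
            continue
        qty = speak_quantity()             # voice input for confirmation
        if qty != task["quantity"]:
            discrepancies.append((task["item"], "quantity mismatch"))
            continue
        completed.append(task["item"])
    return completed, discrepancies
```

The scan and voice callbacks stand in for the smart-glasses hardware; the back-end only sees confirmed picks and flagged discrepancies.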
Figure 16. SAP AR Warehouse Picker application guiding the worker in the picking operations
(screenshot from SAP Enterprise Mobile 2013).
17 SAP Enterprise Mobile. 2013. Video: SAP & Vuzix Bring you Augmented Reality Solutions for the Enterprise. [Available
in: https://www.youtube.com/watch?v=9Wv9k_ssLcI]
SAP AR Service Technician18
SAP AR Service Technician (Figure 17) aims to instruct the technician in service
operations. With the application, users have access to 3D visual enterprise models of
their workplace and the use of an expert calling feature, which allows a remote expert to
give directions to a colleague while streaming a visual from the head set. The application
supports voice-activated commands and audio-note functionality. The hands-free
functionality allows the operator to concentrate on the skilled and precise hand tasks.
To get started, technicians need to sync the smart glasses with a tablet or laptop to
retrieve all necessary data and any new voice notes from SAP Work Manager, left by
other workers and stakeholders. They can scan the QR code and select from a list of
procedures. Once the information for the current job is loaded into the smart glasses,
workers can navigate the software with voice-activated commands. They can browse 3D
visualizations and information including instructions, operational steps, and parts lists.
They can drill into details for more information on a specific part, listen to equipment
voice notes, and record new voice notes. Browsing through procedure steps happens by
commands such as “Next,” “Previous,” and “Step.” For each step, the 3D model of the part
or item will animate, and audio and textual instructions can be provided if available in
the visual enterprise model. In order to get “over-the-shoulder” expert assistance, the
field technician can use voice commands to select from a list of available experts and
make the call. The expert can see in real-time what the technician sees through the
camera in the smart glasses, and the technician can see the expert in the smart glasses.
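Step browsing with "Next", "Previous" and "Step" can be sketched as a small state machine over the procedure's step list; the class and step texts are illustrative, not SAP's implementation.

```python
class Procedure:
    """Voice-navigable list of service steps (illustrative sketch)."""

    def __init__(self, steps):
        self.steps, self.index = steps, 0

    def command(self, word):
        """Apply one recognized voice command and return the current step."""
        if word == "Next" and self.index < len(self.steps) - 1:
            self.index += 1
        elif word == "Previous" and self.index > 0:
            self.index -= 1
        # "Step" (or any unrecognized word) re-reads the current step
        return self.steps[self.index]
```

Bounds checks at both ends keep a stray repeated command from running off the procedure, which matters when recognition occasionally fires twice.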
Figure 17. SAP AR Service Technician application guiding the service operator (screen shot from
SAP 2014).
AstroVAR19
AstroVAR is a projected augmented reality system and a product from Delta Sygni Labs
(TRL 9). It enables visual communication between the remote expert and the on-site
personnel. Experts can see the situation and help from the office by using a laser pointer
showing visual instructions on workpieces and devices. With the expert's knowledge delivered
18 SAP. 2014. Video: SAP and Vuzix bring you the future of Field Service. [Available in: https://www.youtube.com/watch?v=UlpGDrSmg38]
19 Delta Sygni Labs AstroVAR product [Available in: http://deltacygnilabs.com]
straight to the point, the on-site personnel can fix the problem; the equipment is back in service and an on-site visit is avoided. Notable features include wireless operation, no need for glasses, and ease of use (Figure 18).
Figure 18. Delta Sygni Labs AstroVAR product for technical support (Delta Sygni Labs online page
2014).
Simulo Engineering AR help platform20
Assembly tasks, disassembly, diagnosis routines and pre-assembly operations are examples of AR use cases improved by Simulo Engineering (TRL 9). The AR implementation for the work instructions of an arm loader is a good example of teaching inexperienced workers new tasks (Figure 19).
Figure 19. The assembly process for a manipulator using AR guides on a screen installed in the environment. (Screenshot from Simulo Engineering industrial application of AR, 2012)
20 Simulo Engineering. AR industrial Applications. [Available in: http://www.simulo.it/]
4.4. Gesture and Speech Control
Gesture and Speech control are often used in augmented reality applications. They are
becoming more common with the emergence of multimodal interfaces. As highlighted
by Karat et al. (2012), speech technology, like other recognition technologies, lacks 100% accuracy. This is because individuals speak differently from each other, and because recognition accuracy depends on an audio signal that can be distorted by many factors. Accuracy depends on the choice of the underlying speech technology and on making the best match between the technology, the task, the users, and the context of use. Automatic speech recognition can have explicitly defined
rule-based grammars or it can use statistical grammars such as a language model.
Usually a transactional system uses explicitly defined grammars, while dictation systems
or natural language understanding (NLU) systems use statistical models. (Karat et al.
2012)
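The distinction can be illustrated with a toy sketch: an explicitly defined rule-based grammar accepts only the utterances it enumerates, whereas a statistical language model would score arbitrary word sequences. The command vocabulary below is purely illustrative, not taken from any cited system.

```python
# Toy rule-based grammar for a transactional system: each slot lists
# the exact words the recognizer is allowed to accept at that position.
GRAMMAR = [
    {"start", "stop"},                 # slot 1: action
    {"conveyor", "spindle", "pump"},   # slot 2: device
]

def matches_grammar(utterance: str) -> bool:
    """Return True only if every word falls inside its grammar slot."""
    words = utterance.lower().split()
    if len(words) != len(GRAMMAR):
        return False
    return all(word in slot for word, slot in zip(words, GRAMMAR))
```

Out-of-vocabulary words ("open conveyor") are simply rejected, which is exactly why rule-based grammars suit constrained transactional dialogs but not free dictation.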
In general, it is effective to use speech applications for situations when speech can
enable a task to be done more efficiently, for instance, when a user’s hands and eyes are
busy doing another task (Karat et al. 2012).
The dialog styles in speech recognition systems include: directed dialog (system-
initiated), in which the user is instructed or “directed” what to say at each prompt; user-
initiated, in which the system is passive and the user is not prompted for specific
information; and mixed initiative, in which the system and the user take turns initiating
the communication depending on the flow of the conversation and the status of the task.
(Karat et al. 2012)
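A minimal sketch of the directed (system-initiated) style, where the system prompts for one slot at a time; the prompt texts and slot names below are illustrative assumptions, not from any cited system:

```python
# Directed (system-initiated) dialog: the system prompts for one slot at a
# time, and the user is expected to answer exactly that prompt.
PROMPTS = [
    ("part_id", "Say the part number."),
    ("quantity", "Say the quantity to pick."),
]

def run_directed_dialog(answers):
    """Pair each system prompt with the user's (simulated) spoken answer."""
    filled = {}
    for (slot, _prompt), answer in zip(PROMPTS, answers):
        # A real system would speak _prompt and run the recognizer here.
        filled[slot] = answer
    return filled
```

A user-initiated or mixed-initiative system would instead have to parse an unprompted utterance and decide which slots it fills, which is where statistical language understanding becomes necessary.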
Hinckley & Wigdor (2012) brought out some limitations relating to speech recognition.
First of all, it can only succeed for a limited vocabulary. The error rates increase as the
vocabulary grows and the complexity of the grammar increases, if the quality of the audio
signal from the microphone is not good enough, or if users employ out-of-vocabulary
words. Speech is inherently non-private in public situations, and can also be distracting
for persons nearby. Spatial locations are not easily referred to by speech, which means
that speech cannot eliminate the need for pointing. (Hinckley & Wigdor 2012)
In recent years, the robustness of speech recognition in noisy environments has been improved
by speech/lip movement integration. This kind of work has included classification of
human lip movements (visemes) and the viseme-phoneme mappings that occur during
articulated speech. (Dumas et al. 2009)
As stated by Hinckley & Wigdor (2012), for computers to embed themselves naturally
within the flow of human activities, they must be able to sense and reason about people
and their intentions, e.g. to know when the user is trying to interact with the system, and
when he/she is talking or interacting (e.g. waving) with other people. This issue applies
to both gesture and speech control.
4.4.1. Technologies for Gesture and Speech control
Kinect
Kinect (codenamed in development as Project Natal and currently with TRL 9) is a line
of motion sensing input devices by Microsoft for video game consoles and Windows PCs.
Based around a webcam-style add-on peripheral, it enables users to control and interact
with their console/computer without the need for a game controller, through a natural
user interface using gestures and spoken commands (Project Natal 2009). The body
position is estimated in 2 steps. First the device draws a depth map by using structured
light, and then finds body position by machine learning. Inside the sensor case21, a
Kinect for Windows sensor (Figure 20) contains firstly an RGB camera that stores three
channel data in a 1280x960 resolution. This makes capturing a color image possible. It
also contains an infrared (IR) emitter and an IR depth sensor. The emitter emits infrared
light beams and the depth sensor reads the IR beams reflected back to the sensor. The
reflected beams are converted into depth information measuring the distance between
an object and the sensor. This makes capturing a depth image possible. Third is a multi-
array microphone, which contains four microphones for capturing sound. Because there
are four microphones, it is possible to record audio as well as find the location of the
sound source and the direction of the audio wave. Finally it includes a 3-axis
accelerometer configured for a 2G range, where G is the acceleration due to gravity. It is
possible to use the accelerometer to determine the current orientation of the Kinect.
Figure 20. Kinect sensor components. (Figure from Kinect for Windows Sensor Components and
Specifications 2015.).
The first-generation Kinect was first introduced in November 2010 in an attempt to
broaden Xbox 360's audience beyond its typical gamer base. Microsoft released the
Kinect software development kit for Windows 7 on June 16, 2011 (Knies 2011). This
SDK was meant to allow developers to write Kinecting apps in C++/CLI, C#, or Visual
Basic .NET (Stevens 2015).
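The conversion from a depth image to 3D positions, as performed after the IR depth measurement described above, can be sketched with a standard pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) below are illustrative values, not official Kinect calibration data:

```python
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (in meters) to a 3D point cloud
    using the pinhole camera model: x = (u - cx) * z / fx, etc."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth_m
    x = (u - cx) * z / fx      # horizontal offset scales with depth
    y = (v - cy) * z / fy      # vertical offset scales with depth
    return np.stack([x, y, z], axis=-1)   # shape (h, w, 3)

# Illustrative example: a tiny 2x2 depth image with assumed intrinsics.
depth = np.array([[1.0, 1.0], [2.0, 2.0]])
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
```

Skeleton fitting (the machine-learning step) then operates on this kind of point cloud rather than on raw pixels.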
SHADOW Motion Capture22
SHADOW motion capture system (TRL 9) uses inertial measurement units sealed in
neoprene fabric (Figure 21). The flexible sensors are small, lightweight, and
comfortable to wear. Inertial sensors measure rotation, not position. Shadow includes
software to estimate position based on the skeletal pose, pressure sensor data, and a
kinematic simulation. The position estimate updates in real time and streams to the
viewing and recording systems with the current pose.
21 Kinect for Windows Sensor Components and Specifications 2015. [Available in: https://msdn.microsoft.com/en-us/library/jj131033.aspx]
22 SHADOW motion capture system online page. [Available in: http://www.motionshadow.com/]
Shadow skeleton data is viewable
in real time and compatible with most 3D digital content creation applications. The
software provides export to the industry standard FBX, BVH, and C3D animation and
mocap file formats. The Software Development Kit (SDK) supports network based
streaming of all synchronized pose data. The SDK is open source and available in many
popular programming languages. In 2013 a release of the Shadow full-body inertial
motion capture system was presented, which builds on and extends the existing
hardware and software platform.
Motion Shadow software requires a computer with Wi-Fi. The Motion Viewer and
Monitor applications are available only on the Windows platform; Motion Monitor also
runs on a Wi-Fi enabled mobile device. Shadow also operates in standalone mode, with a
Wi-Fi enabled mobile device serving as a remote control. The Motion User Interface
works on Apple iOS (iPhone, iPad, iPod Touch), Android, and Windows Phone, with no
software or app required.
Figure 21. Shadow - a full body wearable sensor network for motion capture. (Figure from Motion
Node Channel 2013).
Thalmic Labs MYO armband23
MYO armband (TRL 7) senses muscle movements for Minority Report-style motion
control. MYO is an armband that translates the muscles' electrical activity into motion
controls (Figure 22). The sensor inside the armband has enough sensitivity to pick up
individual finger movements. Developers will be able to program for the controller as
well. To prevent accidental input, users must activate the motion control with a unique
gesture that is unlikely to occur normally. The armband will supposedly be one size fits
23 Thalmic Labs MYO gesture control armband 2014. [Available in: https://www.thalmic.com/en/myo/]
all, and uses Bluetooth 4.0. While MYO is built for Windows and Mac, developers can
also integrate the device with their Android and iOS apps.
Figure 22. The MYO armband senses muscle activity to detect hand gestures. Developers can
define controls based on common hand gestures. (Figure from Thalmic Labs MYO gesture control armband 2014).
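The accidental-input safeguard described above, where a unique activation gesture unlocks motion control, can be sketched as a simple gesture lock. The gesture names below are illustrative, not the actual MYO API:

```python
class GestureLock:
    """Require a distinctive 'unlock' gesture before accepting motion
    commands, so ordinary arm movements are not misread as input."""

    def __init__(self, unlock_gesture="double_tap"):
        self.unlock_gesture = unlock_gesture
        self.active = False

    def handle(self, gesture):
        if not self.active:
            # Ignore everything until the unlock gesture is seen.
            self.active = (gesture == self.unlock_gesture)
            return None
        return gesture   # pass commands through once unlocked
```

The unlock gesture is chosen to be unlikely during normal movement, which is exactly the design rationale stated for the armband.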
Haptic Interfaces
Haptic devices (or haptic interfaces) are mechanical devices that mediate
communication between the user and the computer. Haptic devices allow users to touch,
feel and manipulate three-dimensional objects in virtual environments and tele-
operated systems. Most common computer interface devices, such as basic mice and
joysticks, are input only devices, meaning that they track a user's physical manipulations
but provide no manual feedback. As a result, information flows in only one direction,
from the peripheral to the computer. Haptic devices are input-output devices, meaning
that they track a user's physical manipulations (input) and provide realistic touch
sensations coordinated with on-screen events (output). Examples of haptic devices
include consumer peripheral devices equipped with special motors and sensors (e.g.,
force feedback joysticks and steering wheels) and more sophisticated devices designed
for industrial, medical or scientific applications (e.g., PHANTOM device). (Mimic
Technologies Inc. 2003)
Haptic interfaces are relatively sophisticated devices. As a user manipulates the end
effector, grip or handle on a haptic device, encoder output is transmitted to an interface
controller at very high rates. Here the information is processed to determine the
position of the end effector. The position is then sent to the host computer running a
supporting software application. If the supporting software determines that a reaction
force is required, the host computer sends feedback forces to the device. Actuators
(motors within the device) apply these forces based on mathematical models that
simulate the desired sensations. For example, when simulating the feel of a rigid wall
with a force feedback joystick, motors within the joystick apply forces that simulate the
feel of encountering the wall. As the user moves the joystick to penetrate the wall, the
motors apply a force that resists the penetration. The farther the user penetrates the
wall, the harder the motors push back to force the joystick back to the wall surface. The
end result is a sensation that feels like a physical encounter with an obstacle. (Mimic
Technologies Inc. 2003) Figure 23 shows an example for a haptic glove.
Figure 23. A Haptic Glove gives user the ability to touch virtual objects. (Figure from Digital Trends
October 2014).
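The rigid-wall simulation described above is commonly implemented as a penalty (spring) model: the reaction force grows with penetration depth. A minimal sketch, with an assumed stiffness value:

```python
def wall_force(position, wall_x=0.0, stiffness=800.0):
    """Penalty-based haptic rendering of a rigid wall at x = wall_x.

    The deeper the end effector penetrates the wall, the stronger the
    opposing force pushing it back toward the wall surface (Hooke's law).
    """
    penetration = position - wall_x
    if penetration <= 0.0:           # not touching the wall: no force
        return 0.0
    return -stiffness * penetration  # F = -k * x, opposing penetration
```

In a real device this function would run inside the high-rate control loop, with the actuators applying the returned force each cycle.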
Speaker separation HARK24
HARK, developed by Kyoto University, was introduced in 2010 for sound source
separation to be implemented on robots. The test demo available online shows its
capability of distinguishing the voices of 4 different talkers (Figure 24).
Figure 24. HARK by Kyoto University. (Screenshot from Willow Garage ROS video 2010).
24Audition for Robots with Kyoto University (HARK). [Available in: http://www.hark.jp/]
4.4.2 Gesture and Speech Control Application Examples
Robotic control by gesture recognition
A research example of gesture control for industrial robots using Kinect cameras was
carried out at the Department of Information Technology & System Management at FH
Salzburg25. The task involved positioning and picking different parts by following the
user's hand and applying gesture control (Figure 25).
Figure 25. Control an industrial robot by hand using gesture control. (Screenshot from gesture
control for industrial manipulator intro in department of Information Technology & System Management in FH Salzburg 2014).
Material handling by gesture recognition
Many flows of materials and goods at factories and workshops are handled manually. A
mobile machine that is controlled by natural gestures, relieves workers of heavy loads,
and transports those loads independently can therefore be useful. The assistance system
FiFi of Karlsruhe Institute of Technology (KIT) aims for this purpose (Phys-engineering
2014). FiFi is an assistance system developed to support humans in their direct
environment through contact-free control (Figure 26). The mobile platform, equipped
with a camera system, is particularly suited for dynamic material flows at factories and
workshops. These flows require high flexibility and are usually handled by humans.
Typical examples are high-bay warehouses for car spare parts, consumer products of big
online traders, or deliveries of goods between departments of big companies. Via the
camera system, the machine acquires the user's gestures three-dimensionally and
executes his/her commands. No contact is required for moving the platform or switching
between the different modes of operation. It follows the user and may approach him/her
up to an arm's length for loading. When the user points to a line on the floor, it
independently moves along the line to the next station, where it is unloaded by the next
user. A safety laser scanner prevents it from colliding with objects or people and allows
for safe operation. A lifting system can be adjusted to various working heights by a
gesture.
25 fhsits Youtube Channel. 2014. Video: Control an Industrial Robot by Hand! - Gesture Control. [Available in:
https://www.youtube.com/watch?v=evSqu-d16Oo]
Figure 26. Mobile machine using gesture control for load carrying. (Figure from Phys engineering August 2014).
Jennifer by Lucas Systems26
Jennifer (available since 2012) is a voice picking system for mobile work in warehouses
(TRL 9). Workers use a handheld scanner to read barcodes and receive voice
information, as well as give voice commands about a specific product (Figure 27). The
worker then knows whether the location is correct and how many items of the product
to pick into the basket. Workers may also give voice commands to confirm that the
chosen place for a moved product is correct, and the system can inform the user about
other product details such as the expiration date.
Figure 27. Jennifer voice picking system for mobile work in warehouses.
(Screenshot from Introduction to Voice Picking with Jennifer 2012).
26 Jennifer voice picking by Lucas Systems. [Available in: http://www.lucasware.com/jennifer-mobile/]
Hotel staffed by robots
A hotel staffed by robots will open in July 2015 in Huis Ten Bosch, a
Japanese theme park. The two-story, 72-room Henn-na Hotel, which is slated to open on
July 17, will be staffed by ten robots that will greet guests, carry their luggage and clean
their rooms. According to The Telegraph (Bridge 2015), the robots, created by robotics
company Kokoro, will be an especially humanoid model known as an "actroid". Actroid
robots (Figure 28) are generally based on young Japanese women, and they can speak
fluent Japanese, Chinese, Korean and English, as well as mimic body language and
human behaviors such as blinking and hand gestures. Three actroids will staff the front
desk, dealing with customers as they check in to the hotel. Four will act as porters,
carrying guests' luggage, while another group will focus on cleaning the hotel. The hotel
itself will also feature some high-tech amenities (Kaplan 2015), such as facial
recognition software that will allow guests to enter locked rooms without a key, and
room temperatures monitored by a panel that detects a guest's body heat.
Figure 28. Robots to serve guests in Japanese hotel. (Screenshot from washingtonpost 2015).
5. Conclusions
Human-friendly interface design is crucial when aiming for efficient operations. Whether a system can be described as usable or not depends on four factors, namely anthropometrics, behavior, cognition and social factors. This report discussed user-centric design and the characteristics of human behavior and cognition that need to be taken into account when designing HMIs (human-machine interfaces). As stated in this report, no generic design rules for usable HMI design can be given, because usability always depends on three aspects: 1) The specific user and his/her characteristics; 2) The task that is being done with the designed HMI; and 3) The context and environment of use of the designed interface. However, several guidelines for human-friendly user interface design were reported.
While designing user interfaces, three selections need to be made. These include: 1)
Selection of the modality, which refers to the sensory channel that human uses to send
and receive a message (e.g. auditory, visual, touch); 2) Selection of the medium, which
refers to how the message is conveyed to the human (e.g. picture, diagram, video, alarm
sound); and 3) Selection of the technology to deliver the message (e.g. smart phone or
AR glasses). Multimodal interfaces, which use multiple different modalities (and
also media and technologies), are emerging. For example, augmented reality
interfaces often utilize multiple modalities, such as vision, speech and touch, and are built
by combining multiple technologies, such as different visual displays, speech recognition
and haptic devices.
Several existing and emerging HMI technologies, including mobile devices, augmented
reality, as well as gesture and speech recognition technologies were introduced and
examples of their applications were given in this report. Even though the most common
user interface, at least in Finnish manufacturing environments, is still pen and paper, it
is believed that the transformation towards digitalization, for example the
implementation of MES systems, will open doors for the adoption of novel user
interfaces on the factory floor. However, when implementing these novel interface
technologies, one always has to consider the technology's suitability for the specific task
and context of use. Is the fancy technology actually helping the human perform his/her
task more efficiently, or is it just fancy technology? Is the complex, colorful
visualization eye-catching without necessarily improving the understanding of the
specific task at hand? As stated by
Watzman and Re (2012):
”Good design does not needlessly draw attention to itself. It just works.”
References
Banerjee, A., Bommu, N., 2013. Design of Manufacturing Execution System for FMCG Industries.
International Journal of Engineering and Technology (IJET). Vol. 5, No. 3, ISSN: 0975-4024, p. 2366.
Cavalvanti, A.L.O., de Souza, A.J., Silva, D., Rocha, G., Filho, LF.S.L., 2009. Integrating Mobile
Devices and Industrial Automation through Web Services. 7th IEEE International Conference on
Industrial Informatics, pp. 173-176.
Courage, C., Jain, J., Redish, J. & Wixon, D. 2012. Task Analysis. In: Jacko, J.A. (Ed.). The Human-
Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 956-982.
Danielis, P., Skodzik, J., Altmann, V., Schweissguth, E.B., Golatowski, F., Timmermann, D., Schacht,
J., 2014. Survey on real-time communication via ethernet in industrial automation environments.
IEEE Emerging Technology and Factory Automation (ETFA), pp. 1-8.
Soman, D. & N-Marandi, S. 2010. Managing Customer Value: One Stage at a Time. Singapore:
World Scientific. ISBN 9789812838285, p. 275.
Dumas, B., Lalanne, D. & Oviatt, S. 2009. Multimodal Interfaces: A Survey of Principles, Models
and Frameworks. In: Lalanne, D. & Kohlas, J. (Eds.): Human Machine Interaction, LNCS 5440, pp.
3-26, 2009. Springer-Verlag Berlin Heidelberg
Fitts, P.M. 1954. The information capacity of the human motor system in controlling the
amplitude of movement. J. Exp. Psychol. 47:381-391.
Graham, M., Zook, M., and Boulton, A. Augmented reality in urban places: contested content and
the duplicity of code. Transactions of the Institute of British Geographers, DOI: 10.1111/j.1475-
5661.2012.00539.x 2012.
Grice, H. P. 1975. Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics
III: Speech acts. New York, NY: Academic Press.
Hedge, A. 2003. 10 principles to avoid XP-asperation. Ergonomics in Design, 11(3), pp. 4-9.
Hinckley, K. & Wigdor, D. 2012. Input Technologies and Techniques. In: Jacko, J.A. (Ed.). The
Human-Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 95-132.
Hoggan, E. & Brewster, S. 2012. Nonspeech Auditory and Crossmodal Output. In: Jacko, J.A. (Ed.).
The Human-Computer Interaction Handbook - Fundamentals, Evolving Technologies and
Emerging Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 211-235.
Jameson, A. & Gajos, K.Z. 2012. Systems That Adapt to Their Users. In: Jacko, J.A. (Ed.). The
Human-Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 431-456.
Järvenpää, E., Lanz, M., Tokola, H., Salonen, T. Koho, M. 2015. Production planning and control in
Finnish manufacturing companies – Current state and challenges. Proceedings of the 25th
International Conference on Flexible Automation and Intelligent Manufacturing, FAIM2015, 23rd
– 26th June, 2015, Wolverhampton, UK. 8 p.
Karat, C.-M., Lai, J., Stewart, O. & Yankelovich, N. 2012. Speech and Language Interfaces,
Applications, and Technologies. In: Jacko, J.A. (Ed.). The Human-Computer Interaction Handbook
- Fundamentals, Evolving Technologies and Emerging Applications. 3rd Edition. CRC Press. ISBN
978-1-4398-2944-8, pp.367-386.
Knies, R. (February 21, 2011). Academics, Enthusiasts to Get Kinect SDK. [Accessed: 23.3.2015].
Kramer, G. 1994. An introduction to auditory display. In Auditory Display, ed. G. Kramer, 1–77.
Reading, MA: Addison-Wesley.
Nee, A.Y.C., Ong, S.K., Chryssolouris, G. & Mourtzis, D. 2012. Augmented reality applications in
design and manufacturing. CIRP Annals – Manufacturing Technology, Vol. 61, pp. 657-679.
Elsevier.
Norman, D.A. 1988. The psychology of everyday things. NY:Basic Books.
Orland, Kyle (February 21, 2011). News - Microsoft Announces Windows Kinect SDK For Spring
Release. Gamasutra. [Accessed: 23.3.2015].
Oviatt, S.L. 1997. Multimodal interactive maps: Designing for human performance. Human-
Computer Interaction 12, 93-129.
Payne, S.J. 2012. Mental Models in Human-Computer Interaction. In: Jacko, J.A. (Ed.). The Human-
Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 41-55.
Proctor, R.W. & Vu, K-P.I. 2012. Human Information Processing – An Overview for Human-
Computer Interaction. In: Jacko, J.A. (Ed.). The Human-Computer Interaction Handbook -
Fundamentals, Evolving Technologies and Emerging Applications. 3rd Edition. CRC Press. ISBN
978-1-4398-2944-8, pp. 21-40.
Ritter, F.E., Baxter, G.D. & Churchill, E.F. 2014. Foundations for designing user-centered systems -
What System Designers need to know about people. Springer. 442 p. ISBN 978-1-4471-5133-3.
Stevens, T., Kinect for Windows SDK beta launches, wants PC users to get a move on. Web article.
[Available in: http://www.engadget.com/2011/06/16/microsoft-launches-kinect-for-windows-
sdk-beta-wants-pc-users-t/] [Accessed: 23.3.2015]
Szalma, J.L., Hancock, G.M. & Hancock, P.A. 2012. Task Loading and Stress in Human-Computer
Interaction – Theoretical Frameworks and Mitigation Strategies. In: Jacko, J.A. (Ed.). The Human-
Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 55-75.
Smith, P.J., Beatty, R., Hayes, C.C., Larson, A., Geddes, N.D. & Dorneich, M.C. 2012. Human-Centered
Design of Decision-Support Systems. In: Jacko, J.A. (Ed.). The Human-Computer Interaction
Handbook - Fundamentals, Evolving Technologies and Emerging Applications. 3rd Edition. CRC
Press. ISBN 978-1-4398-2944-8, pp. 589-621.
Schlick, C.M., Winkelholz, C., Ziefle, M. & Mertens, A. 2012. Visual Displays. In: Jacko, J.A. (Ed.). The
Human-Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 157-191.
Sutcliffe, A. 2012. Multimedia User Interface Design. In: Jacko, J.A. (Ed.). The Human-Computer
Interaction Handbook - Fundamentals, Evolving Technologies and Emerging Applications. 3rd
Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 387-404.
Watzman, S. & Re, M. 2012. Visual Design Principles for Usable Interfaces - Everything is
Designed: Why We Should Think before Doing. In: Jacko, J.A. (Ed.). The Human-Computer
Interaction Handbook - Fundamentals, Evolving Technologies and Emerging Applications. 3rd
Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 315-340.
Welsh, T.N., Chandrasekharan, S., Ray, M., Neyedli, H., Chua, R. & Weeks, D.J. 2012. Perceptual-
Motor Interaction – Some Implications for Human-Computer Interaction. In: Jacko, J.A. (Ed.). The
Human-Computer Interaction Handbook - Fundamentals, Evolving Technologies and Emerging
Applications. 3rd Edition. CRC Press. ISBN 978-1-4398-2944-8, pp. 3-20.
Wickens, C.D., Lee, J.D., Liu, Y. & Gordon Becker, S.E. 2004. An Introduction to Human Factors
Engineering. Second ed. Upper Saddle River, NJ: Pearson Prentice Hall.
Yamaji, M., Ishii, Y., Shimamura, T., & Yamamoto, S., 2008. Wireless Sensor Network for
Industrial Automation. International Conference on Networked Sensing Systems, IEEE, p. 253.
Web sources
Audition for Robots with Kyoto University (HARK). [Available in:
http://www.hark.jp/][Accessed: 23.3.2015].
Bridge, A., Robots to serve guests in Japanese hotel. February 2015 [Available in:
http://www.telegraph.co.uk/travel/destinations/asia/japan/11387330/Robots-
to-serve-guests-in-Japanese-hotel.html] [Accessed: 26.3.2015].
CadRelations Youtube Channel. 2014. Video: HMI 2014: 3Dconnexion, - programing industry
robots gets easier. [Available in: https://www.youtube.com/watch?v=oIbXW3BVaAI] [Accessed:
23.3.2015].
Canon Mixed Reality (MREAL) headset [Available in:
http://usa.canon.com/cusa/office/standard_display/Mixed_Reality_Overview] [Accessed:
23.3.2015].
Chalmers. Operator of the Future. [Available in: http://www.chalmers.se/hosted/frop-en]
[Accessed: 26.3.2015].
Cult of android. Google Glass User Gets A Ticket For ‘Driving With Monitor Visible To Driver’.
2013. [Available in: http://www.cultofandroid.com/43993/google-glass-user-gets-a-ticket-for-
driving-with-monitor-visible-to-driver/] [Accessed: 29.3.2015].
Delta Cygni Labs AstroVAR product [Available in: http://deltacygnilabs.com] [Accessed:
23.3.2015].
Digital Trends October 2014. [Available in: http://www.digitaltrends.com/cool-tech/dexmo-
exoskeleton-glove-lets-feel-virtual-objects-hand/] [Accessed: 26.3.2015].
EC Horizon 2020 technology readiness level (TRL). [Available in:
http://ec.europa.eu/research/participants/data/ref/h2020/wp/2014_2015/annexes/h2020-
wp1415-annex-g-trl_en.pdf] [Accessed: 10.2.2015].
Elsevier-promo 2015. Google Glass Animation. [Available in: http://www.elsevier-
promo.com/glasses/animation.html] [Accessed: 29.3.2015].
Eyetap research project. [Available in: http://www.eyetap.org/][Accessed: 23.3.2015].
Glass Help 2015. [Available in: https://www.google.com/glass/help] [Accessed: 29.3.2015].
fhsits Youtube Channel. 2014. Video: Control an Industrial Robot by Hand! - Gesture Control.
[Available in: https://www.youtube.com/watch?v=evSqu-d16Oo] [Accessed: 23.3.2015].
Glassware Apps Online Page [Available in: https://glass.google.com/u/0/glassware] [Accessed:
23.3.2015].
IFTTT. [Available in: https://ifttt.com] [Accessed: 23.3.2015].
Innowera Mobile 2013. [Available in: http://innowera.com/web-and-mobile-server-for-sap.php]
[Accessed: 23.3.2015].
IQMS Mobile ERP Apps for Manufacturing Companies 2015. [Available in:
http://www.iqms.com/products/mobile-erp-software.html] [Accessed: 23.3.2015].
IQMS Mobility in the Manufacturing Workplace 2011. [Available in:
http://www.iqms.com/products/brochures/Mobility_in_the_Manufacturing_Workplace.pdf]
[Accessed 22.2.2015].
Irisys People Counting 2015. [Available in: http://www.irisys.net/people-counting] [Accessed:
23.3.2015].
Jennifer voice picking by Lucas Systems. [Available in: http://www.lucasware.com/jennifer-
mobile/][Accessed: 23.3.2015].
Kroger Co’s QueVision for Traffic Control 2015. [Available in:
http://ir.kroger.com/Mobile/file.aspx?IID=4004136&FID=22999227] [Accessed: 23.3.2015].
Kaplan, S., Futuristic Japanese hotel will be run almost entirely by robots. February 2015
[Available in: http://www.washingtonpost.com/news/morning-mix/wp/2015/02/06/futuristic-
japanese-hotel-will-be-run-almost-entirely-by-robots/] [Accessed: 26.3.2015].
Kinect for Windows Sensor Components and Specifications 2015. [Available in:
https://msdn.microsoft.com/en-us/library/jj131033.aspx] [Accessed: 23.3.2015].
Kinect Windows Team. Web article. [Available in:
http://blogs.msdn.com/b/kinectforwindows/archive/2012/01/09/kinect-for-windows-
commercial-program-announced.aspx] [Accessed: 23.3.2015].
MESA. MES Explained: A High Level Vision. [Available in: http://www.mesa.org/] [Accessed
05.03.2015].
Microsoft 2015. Microsoft HoloLens. [Available in:http://www.microsoft.com/microsoft-
hololens/en-us] [Accessed: 23.3.2015].
Mimic Technologies Inc. White Paper. 2003. [Available in: http://goo.gl/gkz3aS] [Accessed
25.03.2015].
Molen, Brad (2014-02-22). Samsung Gear 2 smartwatches coming in April with Tizen OS.
[Available in: http://www.engadget.com/2014/02/22/samsung-gear-2/] [Accessed: 23.3.2015].
Motion Node Channel 2013. [Available in: www.youtube.com/watch?v=5fOxM0uxTDo]
[Accessed: 29.3.2015].
Moran, M., Improving Manufacturing Performance with MES Mobile Applications. AspenTech. June
2013 [Available in: http://www.aspentech.com/] [Accessed 05.03.2015].
Motorola ET1 Enterprise Tablet 2015. [Available in: http://www.motorolasolutions.com/US-
EN/Business+Product+and+Services/Tablets/ET1+Enterprise+Tablet] [Accessed: 23.3.2015].
Nielsen J. 1995. 10 Usability Heuristics for User Interface Design. Web article. [Available in:
http://www.nngroup.com/articles/ten-usability-heuristics/] [Accessed: 2.3.2015].
Operator of the Future by Chalmers 2015. [Available in: http://www.chalmers.se/hosted/frop-
en] [Accessed: 23.3.2015].
Phys engineering. Gesture-controlled, autonomous vehicles may be valuable helpers in logistics
and trans-shipment centers. August 2014. [Available in: http://phys.org/news/2014-08-gesture-
controlled-autonomous-vehicles-valuable-helpers.html] [Accessed: 29.3.2015].
Pro-face Remote HMI 2015. [Available in: http://www.profaceamerica.com/en-
US/content/remote-hmi] [Accessed: 23.3.2015].
Recommendations for implementing the strategic initiative INDUSTRIE 4.0. 2013 p. 23.
[Available in: http://goo.gl/9vka6d] [Accessed: 24.2.2015].
SAP. Augmented Reality Apps. [Available in:
http://www.sap.com/pc/tech/mobile/software/lob-apps/augmented-reality-apps/index.html]
[Read 18.2.2015].
SAP Enterprise Mobile. 2013. Video: SAP & Vuzix Bring you Augmented Reality Solutions for the Enterprise. [Available in: https://www.youtube.com/watch?v=9Wv9k_ssLcI] [Accessed 18.2.2015].
SAP. 2014. Video: SAP and Vuzix bring you the future of Field Service. [Available in:
https://www.youtube.com/watch?v=UlpGDrSmg38] [Accessed 18.2.2015].
SHADOW motion capture system online page. [Available in:
http://www.motionshadow.com/][Accessed: 23.3.2015].
Simulo Engineering. AR industrial Applications. [Available in: http://www.simulo.it/] [Accessed:
23.3.2015].
SmartTag. ST Media Group International 2013 [Available in: http://vmsd.com/content/smarttag]
[Accessed: 29.3.2015].
Surface Pro Youtube Channel. 2014. Video: Cheer Pack North America gains efficiency with
Surface on the factory floor. [Available in: https://www.youtube.com/watch?v=EFdYqhIezig]
[Accessed: 23.3.2015].
Techlife 2013; How does Google Glass work?. [Available in:
http://www.techlife.net/2013/07/how-does-google-glass-work.html] [Accessed: 29.3.2015].
Thalmic Labs MYO gesture control armband 2014. [Available in:
https://www.thalmic.com/en/myo/][Accessed: 23.3.2015].
Trew, James. (2013-10-26) "Sony SmartWatch 2 review". [Available
in:http://www.engadget.com/2013/10/26/sony-smartwatch-2-review] [Accessed: 23.3.2015].
Willow garage ROS video 2010. [Available in: http://wiki.ros.org/hark] [Accessed: 26.3.2015].