


Measuring the Impact of Multimodal Behavioural Feedback Loops on Social Interactions

Ionut Damian, Human Centered Multimedia, Augsburg University, Augsburg, Germany, [email protected]

Tobias Baur, Human Centered Multimedia, Augsburg University, Augsburg, Germany, [email protected]

Elisabeth André, Human Centered Multimedia, Augsburg University, Augsburg, Germany, [email protected]

ABSTRACT

In this paper we explore the concept of automatic behavioural feedback loops during social interactions. Behavioural feedback loops (BFL) are rapid processes which analyse the behaviour of the user in realtime and provide the user with live feedback on how to improve the behaviour's quality. In this context, we implemented an open source software framework for designing, creating and executing BFL on Android-powered mobile devices. To get a better understanding of the effects of BFL on face-to-face social interactions, we conducted a user study comparing four different BFL types spanning three modalities: tactile, auditory and visual. For the study, the BFL were designed to improve the users' perception of their speaking time in an effort to create more balanced group discussions. The study yielded valuable insights into the impact of BFL on conversations and how humans react to such systems.

CCS Concepts

•Applied computing → Computer-assisted instruction; •Human-centered computing → HCI theory, concepts and models; Ubiquitous computing;

Keywords

feedback loop; behaviour analysis; live feedback; social signal processing; realtime; social augmentation

1. INTRODUCTION

With the arrival of new technologies, human-to-human communication has never been easier. We can poke, tweet, share and like with a single tap on the screen of our smartphone and, in doing so, interact with hundreds, if not thousands, of other persons in an instant. However, in spite of these changes to our social life (or, perhaps, because of them), a large part of the population still experiences fear of face-to-face social interactions [15]. This is especially true for key life situations such as public speaking, where recent

Figure 1: The behavioural feedback loop featuring its two main components: behaviour analysis and feedback generation.

estimates suggest that as much as 75% of the population experiences some form of anxiety or nervousness [7]. In such situations we often wish someone (or something) would give us tips on how to behave or what to say.

Automated realtime behavioural feedback loops (BFL) offer great promise in such scenarios by assisting the user in improving the quality of her or his behaviour and achieving more beneficial social interaction outcomes [8]. A BFL first analyses the behaviour of the user in realtime and then uses the result to generate and deliver feedback to the user on how to improve that behaviour (see Figure 1). This differs from common behavioural training approaches, which usually involve external persons (i.e. coaches) providing feedback or reinforcement. The feedback loop itself is not a new mechanism; humans have always used environmental stimuli to adjust and adapt their behaviour in order to locate food, avoid danger or find a mating partner. However, only the recent boom in miniaturized sensing hardware made the automation of this process possible.

Previous implementations of BFL, while few, have already demonstrated them to be a viable mechanism for assisting users during critical situations such as public speaking [6, 16] and job interviews [5]. However, since BFL are meant to continuously augment the user's social interactions, it is only natural to ask how well this process can integrate within face-to-face interaction practices. Such social interactions have been mostly unaffected by the introduction of ever more diverse and ubiquitous technologies and ever more intrusive media channels: we still look each other in the eyes and communicate using gestures and spoken language. Diverting from this archaic set of rules (e.g. texting while

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to ACM.

ICMI'16, November 12–16, 2016, Tokyo, Japan
ACM. 978-1-4503-4556-9/16/11...$15.00
http://dx.doi.org/10.1145/2993148.2993174



talking, being a "glasshole") is usually considered rude and impolite. Thus, one of the greatest challenges for BFL lies in minimizing the distraction effect of the technology and seamlessly blending into the interaction.

To investigate this issue, we conducted an exploratory user study with 54 participants who were engaged in a group discussion. During the discussion, each participant was supported by a BFL which assisted her or him in controlling speaking time. The BFL's main aim was to improve the balance, and thus the overall quality, of the group discussion. During the study, we compared the effectiveness and disruptiveness of four different BFL types: auditory, tactile, visual (head-mounted) and visual (remote display). The BFL were implemented using a custom-built framework for designing and running BFL. At its core, the framework is composed of two components: first, a behaviour analysis tool named SSJ, which is capable of analysing and classifying social signals over multiple channels in realtime; second, a feedback manager, which is able to deliver multimodal feedback with the help of various output devices such as the Google Glass, the Myo armband or bone conduction headphones.

Before presenting the implementation of BFL and the user study, we will first take a look at related work, as well as discuss the general concept of BFL and what impact it can have on the user's attention. Finally, we will discuss the results of the user study and conclude the paper by summarizing the main contributions and providing an outlook.

2. RELATED WORK

From a conceptual point of view, BFL share the goal of more common computer-assisted training mechanisms: to help the user improve their behaviour. However, whereas in most such systems this improvement happens during dedicated training sessions prior to the actual interaction (a priori training) or afterwards (a posteriori training), a BFL attempts to enhance the user's skills during the actual event (e.g. a real job interview or when actually speaking in public) with the help of live feedback. It is thus neither a priori nor a posteriori training. This makes the disruption effect much more critical for BFL. During a training session, it is not a priority for the system to be comfortable or for the feedback not to distract, as training sessions are meant to be repeated multiple times and the user can learn to ignore such shortcomings. However, when using a BFL during a real job interview, for instance, even slightly distracting feedback may negatively impact the outcome of the interview and the chances of getting the job.

Our literature survey revealed relatively few examples of such systems. Works such as Schneider and colleagues' Presentation Trainer [11], Barmaki and Hughes's TeachLivE [2] and van der Linden's MusicJacket [17], despite being theoretically capable of providing live feedback, have been designed with a priori training in mind, and little attention has been given to the distraction effect of the systems.

A notable exception is the system introduced by Tanveer and colleagues [16]. Using the Google Glass, they analyse the audio signal of the user for speech rate and loudness, and display corrective visual feedback to the user in realtime on the Glass's display. The aim of the system is to improve voice modulation and reduce monotonicity while speaking in public. Unfortunately, the system's evaluation yielded ambiguous results both in terms of performance in altering the users' behaviours and distraction. More precisely, their analysis of the ratings of Mechanical Turk workers did not yield any significant effects in perceived behaviour quality or distraction. However, their self-report questionnaire data revealed that users preferred concrete what-to-do verbal instructions over more abstract analytical graph-like feedback in terms of helpfulness, understandability and distraction.

Figure 2: The three main phases of a feedback loop from the user's point of view: perception, reflection (relevance, consequences) and action.

A similar system proposed by Damian and colleagues [6] is able to deliver realtime feedback for public speaking scenarios using the visual or tactile [4] modality. However, despite promising initial results in terms of the effect on behaviour quality, their evaluation is not exhaustive. Similarly to the work by Tanveer et al., they only measured the distraction effects using subjective questionnaires and semi-structured interviews. Although their visual feedback method differed from the ones implemented by Tanveer et al., they also report a high perceived helpfulness and a low degree of perceived distraction from the primary task. However, a direct comparison between different feedback methods is missing.

A more objective measurement of the distraction effect of live feedback was put forward by Ofek and colleagues [10]. In a user study they investigated which information delivery mode is better suited for providing users with secondary information during a human-to-human conversation. They tested three delivery modes, two visual and one auditory. Their results suggest that the audio modality is the most disruptive and least effective, and that delivering information visually in batches (rather than one item at a time) is the least disruptive and most effective.

In this paper we aim to further explore the concept of automatic behavioural feedback loops and, for the first time, compare different feedback methods spanning multiple modalities. We place our focus on measuring the distraction effect of the feedback loop on the user, the user's interlocutors and the social interaction itself.

3. BEHAVIOURAL FEEDBACK LOOPS

The feedback loop lies at the core of various psychological models, such as observational learning [12], operant conditioning [14] or social cognitive theory [1]. A feedback loop involves three main phases: perception, reflection and action. In the perception phase, the organism acquires information from an internal or external source. This information is then processed to evaluate its relevance, as well as any possible consequences this information may have for the organism.



Finally, an action is executed to place the organism in a more desirable state. This progression is illustrated in Figure 2.

One very successful example of such automated feedback loops is the dynamic traffic speed display. These technology-enhanced traffic signs include a radar and a display to give drivers realtime feedback on their speed. They were first used in Garden Grove, California, out of frustration over the poor success rate of conventional methods to change driving behaviour near schools. To the surprise of everyone, despite their simplicity and redundancy (they provide the drivers with information they already know), average speeds decreased by roughly 10%. In this example, in the perception phase, the driver sees the display and extracts the presented information. The driver then analyses the information, recognizes that it refers to her or his behaviour, therefore deeming it relevant, and then remembers the consequences the information can have for her or him (e.g. a possible accident or speeding ticket). Finally, the driver executes an action (reduces the speed of the vehicle) to change to a more desirable state (safety).

In our case, we look at behavioural feedback loops (BFL) relative to their power to change social behaviour by increasing the user's awareness of her or his own behaviour, providing information on the quality of the current behaviour and clarifying what actions are needed to shift to a more desirable behavioural state. To achieve these effects, we propose a two-step pipeline (see Figure 1) where the behaviour of the user is first analysed in realtime and then, based on the user's current behaviour quality, feedback describing the current behavioural state is automatically generated and delivered to the user. The feedback should be subtle enough to avoid disrupting the social interaction itself, but detailed enough to allow the user to evaluate its relevance and correctly extract the possible consequences.
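The two-step pipeline can be illustrated with a minimal sketch. All names and thresholds below are invented for illustration; this is not the authors' implementation, merely the analyse-then-generate progression under the assumption that behaviour quality is reduced to speaking time:

```python
# Minimal sketch of a behavioural feedback loop (hypothetical, not the SSJ API):
# step 1 analyses the user's behaviour in realtime, step 2 generates feedback.

def analyse_behaviour(samples):
    """Toy analysis step: estimate speaking activity from audio frame energy."""
    energy = sum(s * s for s in samples) / len(samples)
    return {"speaking": energy > 0.01}

def generate_feedback(state, spoken_seconds, budget_seconds=120):
    """Toy generation step: map the behavioural state to a feedback message."""
    if state["speaking"] and spoken_seconds > budget_seconds:
        return "time exceeded - let others speak"
    return None  # behaviour within bounds: stay silent

# One iteration of the loop over a synthetic audio frame
frame = [0.2, -0.3, 0.25, -0.1]
state = analyse_behaviour(frame)
print(generate_feedback(state, spoken_seconds=130))
```

In a real deployment the loop would run continuously at the sensor's sample rate, which is where the attention and distraction considerations discussed below come in.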

From a psychological point of view, a BFL represents a secondary cognitive task which is executed in parallel to a primary task, i.e. the social interaction itself. The quality of the execution of the primary and secondary tasks is governed by how attention is distributed among them. Thus, we can say that the BFL is distracting the user and disrupting the social interaction if the execution of the feedback perception task draws too much attention from the primary task.

According to distributive attention models [9, 19], tasks can be carried out in parallel without quality degradation as long as enough processing resources are available. Thus, in order to reduce the amount of distraction caused by the behavioural feedback loop, the feedback delivery mechanism needs to be economical in its consumption of resources. More specifically, the amount of consumed resources, the frequency of processing and the resource overlap with the primary task need to be minimized. One way to achieve this is to deliver simple feedback at a low frequency using modalities which are not in use by the primary task. For example, in the case of social interactions, which rely heavily on visual and audio tasks, tangible feedback might be less disruptive.

4. BFL FRAMEWORK

We implemented an Android-based software framework which allows the design, creation and execution of BFL. It consists of two main components: behaviour analysis and feedback management.

4.1 Behaviour Analysis

The first phase of a BFL is analysing the user's behaviour. To this end, we implemented SSJ, an open source Java library for recording, analysing and classifying social signals in realtime, specifically designed for the Android platform. SSJ is able to run on virtually all devices in the Android ecosystem, including common smartphones, smart glasses (e.g. Google Glass, Lumus DK-40, Epson Moverio) and smart watches, allowing unprecedented flexibility and mobility for doing social signal processing.

SSJ has been inspired by Wagner and colleagues' Social Signal Interpretation framework [18] for Windows, and thus borrows several design principles from it. More specifically, SSJ is built around a very modular design, with each processing step being encapsulated in a distinct component. There are three major types of components: sensors, transformers and consumers.

A sensor handles the connection to and the data retrieval from various sensor hardware. Currently, SSJ supports several sensing devices, including standard Android inertial measurement units (accelerometer, gyroscope and magnetometer), GPS, standard Android cameras and microphones, the Empatica armband, the Myo armband and the Microsoft Band 2. Transformers are signal processing steps which transform the incoming data stream. Some examples of transformers currently in SSJ are: audio pitch, audio loudness, movement energy, physiological arousal, moving average and butfilter. For the audio processing transformers, SSJ makes use of TarsosDSP [13] and PRAAT [3]. The consumers are end-of-signal components, often entrusted with either signal classification or communication tasks.
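The sensor/transformer/consumer split can be sketched as follows. The class names and the wiring are hypothetical (SSJ's actual Java API will differ); the sketch only mirrors the described division of labour:

```python
# Hypothetical sketch of the three component types described above:
# sensors produce samples, transformers reshape the stream, consumers
# terminate it. Names are invented, not taken from SSJ.

class MicSensor:
    """Fake sensor replaying pre-recorded audio frames."""
    def __init__(self, frames):
        self.frames = iter(frames)
    def read(self):
        return next(self.frames)

class EnergyTransformer:
    """Transformer: reduce an audio frame to its mean energy."""
    def transform(self, frame):
        return sum(x * x for x in frame) / len(frame)

class ThresholdConsumer:
    """Consumer: collect an event whenever energy exceeds a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.events = []
    def consume(self, value):
        if value > self.threshold:
            self.events.append(value)

# Wire one pipeline: sensor -> transformer -> consumer
mic = MicSensor([[0.0, 0.0], [0.5, -0.5]])
energy = EnergyTransformer()
vad = ThresholdConsumer(threshold=0.01)
for _ in range(2):
    vad.consume(energy.transform(mic.read()))
print(len(vad.events))  # 1: only the second frame is loud enough
```

Encapsulating each step in its own component is what makes such pipelines easy to recombine for new sensors and new behavioural cues.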

SSJ applications are called pipelines and are formed out of a sequence of components through which data streams flow from component to component. Whereas the general principle of a pipeline is to maintain a constant data flow from left to right, parallel data streams (e.g. generated by multiple sensors) are also possible. In this case, all data streams are synchronized to a central clock. This way, even if a sensor misses frames or is not able to maintain the required sample rate, the data streams will still be consistent and comparable. For example, if a BFL requires both a heart rate sensor and an accelerometer to determine whether the user is stressed, SSJ will keep the data streams coming from these two sensors synchronized so that any event recorded by the heart rate sensor can be matched to an event from the accelerometer. Furthermore, SSJ also allows multiple pipelines on different devices to be synchronized over the network. For instance, data recorded by smart glasses on the user's head can be synchronized with the data recorded by a smart watch.
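The matching of events across unevenly sampled streams can be illustrated with a timestamp lookup against a shared clock. This is an illustrative sketch of the general idea, not SSJ's actual synchronization mechanism:

```python
# Illustrative stream synchronization: each sample carries a timestamp from a
# shared clock, so an event in one stream can be matched to the nearest
# sample in another, even if a sensor dropped or delayed frames.

def nearest_sample(stream, t):
    """Return the (timestamp, value) pair from `stream` closest to time t."""
    return min(stream, key=lambda s: abs(s[0] - t))

# Heart rate at 1 Hz; accelerometer with an irregular rate (frame near t=2
# arrived late). Values are synthetic.
heart_rate = [(0.0, 72), (1.0, 75), (2.0, 90)]
accel      = [(0.0, 0.1), (0.9, 0.2), (2.1, 1.4)]

# Match the heart-rate spike at t=2.0 to the closest accelerometer reading
print(nearest_sample(accel, 2.0))  # (2.1, 1.4)
```

Because both streams reference the same clock, the lookup stays valid even when sample rates drift, which is the consistency property described above.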

Despite its all-Java core, SSJ is able to run complex signal processing pipelines in realtime at high update rates (e.g. 60 Hz) and thus ensures minimal output latency. SSJ is already being used in multiple research projects and is freely available for download1.

4.2 Feedback Management

The delivery of the feedback to the user is handled by the feedback manager. It receives behaviour events from SSJ and, based on a predefined feedback strategy, delivers feedback to the user with the help of various output devices.

1http://hcmlab.github.io/ssj/



Similarly to SSJ, the feedback manager is also tailored to the Android platform and is thus able to work with various output devices, including smartphones, smart glasses or smart watches.

The feedback manager is able to deliver both unimodal and multimodal feedback over three modalities: visual, auditory and tactile. Visual feedback can be rendered on smartphones, tablets or head-mounted displays such as the Google Glass, and takes the form of functional icons which can change in shape and colour. The tactile feedback consists of vibrations delivered through the Myo armband, whereas the audio feedback uses audio files to play back sounds.

The feedback manager models a feedback event as a reaction to a behavioural cue according to the current feedback strategy. The strategy is defined using a simple XML file, allowing easy adjustments of the feedback mechanisms without requiring a rebuild of the app. A sample feedback strategy is illustrated below:

<feedback>
  <class type="tactile">
    <behaviour name="SpeechDuration" sum="true"
               ev="VoiceActivity" sender="SSJ"/>
    <event from="90" to="999" duration="500"
           intensity="70" lock="10000" multiplier="1.2"/>
  </class>
</feedback>

In this example, the manager will trigger a tactile event (i.e. a vibration) with a duration of 500 ms and an intensity of 70 when the total speech duration of the user exceeds 90 seconds. A feedback strategy can also be configured to track the user's response to a feedback event and, if necessary, adapt. In the example above, the feedback is configured to increase in intensity every 10 seconds if the user continues to speak.
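One plausible reading of how such a strategy is evaluated can be sketched as follows. The field names are taken from the XML above, but the evaluation logic is my interpretation for illustration, not the feedback manager's actual code:

```python
# Hypothetical evaluation of the strategy above: fire only inside the
# [from, to] window, respect the lock interval (ms) between events, and
# escalate intensity by the multiplier on repeated events.

def evaluate(strategy, total_speech_s, s_since_last_feedback, current_intensity):
    """Return the vibration intensity to deliver now, or None."""
    if not (strategy["from"] <= total_speech_s <= strategy["to"]):
        return None  # speaking time still below 90 s: no feedback
    if s_since_last_feedback * 1000 < strategy["lock"]:
        return None  # locked: at least 10 s must pass between events
    return round(current_intensity * strategy["multiplier"])

strategy = {"from": 90, "to": 999, "lock": 10000, "multiplier": 1.2}
print(evaluate(strategy, total_speech_s=95,
               s_since_last_feedback=12, current_intensity=70))  # 84
```

Keeping this logic in a declarative XML file, as the authors do, means such thresholds can be tuned per study condition without rebuilding the app.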

The architecture of the component is modular and easily expandable to support new output devices or new behaviour cues. Furthermore, similarly to SSJ, the feedback manager is also open source and freely available for download2.

5. EFFECTS OF BFL ON SOCIAL INTERACTIONS

To get a better grasp of what impact a BFL can have on human-human interactions, we conducted a user study with 54 participants. For this, we designed and implemented an exemplary BFL aimed at improving the balance of group discussions by providing feedback to the participants regarding their speaking duration. The BFL was deliberately chosen to be simple in an effort to reduce the impact of personal variations in feedback interpretation and behaviour adaptation. The main goal of the study was to investigate the degree to which the BFL disrupts the social interaction, both in terms of objective paralinguistic statistics and subjective questionnaire-based measurements. Furthermore, we explored the impact of not only the presence of the BFL, but also that of four different feedback delivery methods: auditory, tactile, visual (head-mounted) and visual (remote display). To this end, we formulated the following research questions:

2http://hcm-lab.de/logue/

RQ1. Does the BFL improve the social interaction? How do the feedback methods compare?
RQ2. Does the BFL disturb the user? How do the feedback methods compare?
RQ3. Does the BFL disturb the other interlocutors? How do the feedback methods compare?

5.1 Procedure

The user study featured a 4x2 mixed design. Table 1 summarizes the study design. Each evaluation session consisted of two group discussions between four participants, held on the same day back to back. During both discussions, each participant was given one of four output devices. Each participant from one group was given a different device, which was chosen by the experimenter prior to the session and accounted for sensory impairments (e.g. colour-blind persons were not given visual feedback devices).

During the first discussion, i.e. the control condition, the BFL were inactive. The feedback loops were activated before the start of the second group discussion, i.e. the experimental condition. The participants were told that the BFL was active either in the first or the second discussion, based on chance. The group discussions were mostly non-moderated. The experimenter intervened only at the beginning of each discussion round, to propose a conversation topic (chosen based on the interests of the participants), as well as in case a discussion ended prematurely, in which case the experimenter would propose a new topic. Each group discussion was designed to last 10 minutes. If the duration of the discussion exceeded 10 minutes, the experimenter would again intervene and stop it.

During each discussion round, each participant had three tasks. First, she or he was asked to participate in the discussion in a normal fashion while attempting to keep the discussion balanced, i.e. all participants should talk for roughly two minutes. Second, they were told that the devices would help them maintain this balance by giving them feedback on how much time they had left to talk. Third, similarly to the study design of Ofek and colleagues [10], each participant was given a wireless clicker which they had to press every time they noticed one of their peers receiving feedback while talking. From a psychological point of view, the first task, the discussion, represents the primary task. The secondary task is either the feedback task (when the participant was talking) or the clicker task (when not talking). To achieve this, feedback was only delivered when the participant was talking, and the clicker was only allowed to be used when not talking. Prior to the first discussion round, each participant received an extensive introduction to the study tasks and the function of the output device.

5.2 Measures

After each session, the participants filled out a questionnaire with items on the perceived disturbance of their own feedback device as well as the devices of the others. All items consisted of a 7-point Likert scale where 1 was labelled "I do not agree" and 7 "I agree fully". In total, we analysed 13 items, with 4-5 items for each research question (Qx-y = item y of RQx). The table in Figure 5 lists all the items of the questionnaire.

In addition to the questionnaires, video and audio data were recorded for each session. For this, a webcam was installed in the room and each participant wore a close-talk



Figure 3: Setup of user study showing four participants, each wearing an output device: Myo armband (A), Aftershokz Bluez 2S bone conduction headphones (B), Google Glass (C), Microsoft Surface 2 Pro (D).

Table 1: Participant distribution across conditions and devices

                          between-subject
                  tactile   audio   hmd   monitor
within-     CC      13       13     14      14
subject     EC      13       13     14      14

microphone. The aim of this data, especially the high-quality audio recordings, was to allow us to analyse whether the behaviour of the participants changed between conditions. We decided against installing any more sensors to avoid artificially altering the behaviour of the participants and to keep the interaction as natural as possible.

5.3 Participants

A total of 54 persons (mean age = 24.6, M/F = 40/14), split over 14 sessions, participated in the user study. Due to last-minute drop-outs, two sessions had to be carried out with only 3 participants. The gender distribution across sessions was as follows: MMMM = 4, MMMF = 7, MMFF = 1, MFF = 2. The distribution of participants over the conditions is detailed in Table 1. Most participants were students of either computer science, physics, mathematics, law or economics. Users were compensated with snacks and an online shop gift card raffle.

The participants were recruited two to three weeks prior to the evaluation sessions from various sources and, in most cases, had no prior connections to each other. The recruitment consisted of an online questionnaire which included items on demographics, session scheduling and interests. The data gathered from the recruitment questionnaire was used to create discussion groups balanced in terms of interests and gender.

5.4 Apparatus

For the evaluation study, we used four output devices: Myo armband (tactile feedback, Figure 3-A), Aftershokz Bluez 2S bone conduction headphones (auditory feedback, Figure 3-B), Google Glass (head-mounted visual feedback, Figure 3-C) and Microsoft Surface 2 Pro (remote visual feedback, Figure 3-D). We built a BFL for each device. Each BFL tracked the amount of time a user had spoken using a head-worn close-talk microphone and a simple signal-to-noise ratio-based voice activity detector. Based on this information, feedback containing the remaining speaking time was delivered to the user. The total speaking time for each participant per discussion was set at 120 seconds.

To keep the feedback consistent across the devices, feedback was delivered at the same intervals for every device. More precisely, feedback was delivered once the user reached 75%, 50%, 25% and 10% remaining speaking time. In the case of the auditory and the tactile devices, the feedback was discrete, i.e. it was only delivered at the aforementioned intervals. For the visual devices, the feedback was provided continuously while the user was talking; however, it was only updated at the specified intervals.
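The threshold-crossing logic described above can be sketched as a small scheduler that fires each feedback event exactly once; the class and method names are illustrative, not from the published framework:

```python
THRESHOLDS = [0.75, 0.50, 0.25, 0.10]  # remaining-time fractions

class FeedbackScheduler:
    """Fire one feedback event the first time each threshold is crossed."""

    def __init__(self, total_seconds=120.0):
        self.total = total_seconds
        self.pending = list(THRESHOLDS)

    def update(self, spoken_seconds):
        """Return the list of thresholds crossed since the last call."""
        remaining = max(self.total - spoken_seconds, 0.0) / self.total
        fired = [t for t in self.pending if remaining <= t]
        self.pending = [t for t in self.pending if remaining > t]
        return fired
```

Polling `update()` with the accumulated speaking time yields the 75%, 50%, 25% and 10% events in order, each at most once.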

The tactile feedback was generated by the Myo armband and consisted of various vibration patterns delivered to the user's forearm. More precisely, one short vibration was delivered at 75% remaining speaking time, two short vibrations at 50%, three short vibrations at 25% and one long vibration at 10%. If the user passed 0%, the Myo would deliver one long vibration every time the user started an utterance.
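The pattern scheme above reduces to a small lookup plus a special case for overtime. A hedged sketch (the Myo SDK calls themselves are omitted; this only encodes the mapping as we read it from the text):

```python
# (pulse length, pulse count) per remaining-time threshold, per the paper
VIBRATION_PATTERNS = {
    0.75: ("short", 1),
    0.50: ("short", 2),
    0.25: ("short", 3),
    0.10: ("long", 1),
}

def pattern_for(remaining_fraction, utterance_started=False):
    """Pick the vibration pattern for a threshold crossing; past 0%,
    one long vibration is issued at the start of each new utterance."""
    if remaining_fraction <= 0.0:
        return ("long", 1) if utterance_started else None
    return VIBRATION_PATTERNS.get(remaining_fraction)
```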

To provide auditory feedback, we used a pair of bone conduction headphones. The largest advantage of these headphones is their ability to propagate sound waves through the skull without disrupting the normal hearing process, thus allowing the user to participate normally in the conversation. The auditory feedback was verbal for the 75%, 50%, 25% and 10% marks and consisted of a simple read-out of the remaining speaking time. Once the remaining speaking time was close to zero, the verbal feedback was replaced by a beeping sound which continued after the time had elapsed if the user proceeded to speak.

For both visual devices, the feedback was delivered using simple colour and text coding. More precisely, a feedback event consisted of the remaining time displayed in text form in the middle of the screen. The text was displayed on a coloured background which varied from green to red on a YUV spectrum according to the remaining speaking time. In order to deliver the visual feedback only to the user who was talking (see Section 5.1), the display was only turned on when the user was speaking. The participants were positioned in a way which permitted only one user to see the display of the tablet. For the Google Glass, the outside of the display was covered with thick paper.
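The green-to-red mapping is not specified beyond "a YUV spectrum". One plausible sketch interpolates between pure green and pure red in BT.601 YUV space and converts the result back to RGB; the exact endpoints and colour matrix are our assumptions:

```python
def feedback_colour(remaining_fraction):
    """Interpolate the background colour from green (all speaking time
    left) to red (none left) in BT.601 YUV space, then convert to RGB."""
    green_yuv = (0.587, -0.289, -0.515)  # YUV of pure RGB green
    red_yuv = (0.299, -0.147, 0.615)     # YUV of pure RGB red
    t = 1.0 - max(0.0, min(1.0, remaining_fraction))  # 0 = green, 1 = red
    y, u, v = (g + t * (r - g) for g, r in zip(green_yuv, red_yuv))
    # YUV -> RGB (BT.601 inverse), clamped to [0, 1]
    rgb = (y + 1.140 * v, y - 0.395 * u - 0.581 * v, y + 2.032 * u)
    return tuple(max(0.0, min(1.0, c)) for c in rgb)
```

Interpolating in YUV rather than RGB keeps perceived luminance changing smoothly along the scale.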



[Figure 4: per-session (1-14) ranges of speaking duration in seconds (0-240), plotted for conditions CC and EC]

Figure 4: Speaking duration ranges (min - max) for all groups between conditions.

5.5 Results

Objective Measurements. After the completion of the study, the audio data of each participant was processed using the PRAAT [3] toolbox to extract speaking duration, speech rate, loudness and pitch information. Overall, each participant spoke on average for 112.80 seconds during the control condition (CC) and 105.98 seconds during the experimental condition (EC). This difference was, however, not statistically significant. Nevertheless, when looking at the standard deviation (SD) of speaking duration within each group, we found significant differences between the conditions. More precisely, a paired sample t-test revealed that the SD is significantly (p = 0.017) smaller in EC (M = 30.92) than in CC (M = 45.07). Similarly, the mean maximum speaking duration over all discussion groups is significantly (p = 0.016) smaller in the EC (M = 143.53) when compared with the CC (M = 175.60). These effects can be seen in Figure 4, where the duration ranges (min - max) for all discussion groups are plotted. One-way ANOVAs revealed no significant differences between the devices for either condition in terms of speaking duration, speech rate, loudness and pitch.
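The group-balance analysis pairs one speaking-duration SD per group and condition. With SciPy, such a paired sample t-test can be sketched as follows; the SD values below are invented for illustration only, since the per-group data is not published:

```python
import numpy as np
from scipy import stats

# Hypothetical per-group SDs of speaking duration (seconds), one value
# per discussion group under each condition -- illustrative data only.
sd_cc = np.array([52.1, 38.4, 49.0, 61.2, 44.8, 40.3, 50.6,
                  39.9, 47.2, 36.5, 42.0, 41.1, 35.8, 51.9])
sd_ec = np.array([30.1, 28.7, 35.2, 33.0, 29.8, 27.4, 36.1,
                  25.9, 34.4, 31.2, 28.0, 30.5, 26.8, 35.7])

# Paired sample t-test: each group contributes one CC and one EC value
t_stat, p_value = stats.ttest_rel(sd_cc, sd_ec)
```

A paired test is appropriate here because the CC and EC measurements come from the same discussion groups.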

The click data of the participants was also recorded and statistically analysed. When comparing between CC and EC, a paired sample t-test revealed a significant (p = .014) increase in clicks per discussion (CC: M = .34, Sum = 24; EC: M = .97, Sum = 68). However, ANOVA tests revealed no significant differences between devices for either condition.

Subjective Measurements. Statistical analysis of the questionnaire data also revealed interesting effects. When comparing the participants' ratings against the scale's midpoint (i.e. four), one-sided t-tests with Bonferroni-Holm correction yielded eight significant differences. In particular, for the first research question, where we investigated the effectiveness of the BFL, one item (M1-2 = 4.72, SD1-2 = 1.73) was rated significantly above the midpoint and two items (M1-1 = 3.22, SD1-1 = 1.63; M1-3 = 2.89, SD1-3 = 1.25) were rated significantly below it. None of the items meant to measure the disruption effect on the user (RQ2) received ratings significantly different from the midpoint. On the other hand, all items of RQ3, where we measured the disruption effect on the other users, scored significantly below the midpoint (M3-1 = 3.02, SD3-1 = 1.89; M3-2 = 3.39, SD3-2 = 1.93; M3-3 = 2.07, SD3-3 = 1.31; M3-4 = 1.46, SD3-4 = 1.04; M3-5 = 1.91, SD3-5 = 1.48).

A multivariate ANOVA revealed significant differences between the devices for Pillai's Trace, Wilks' Lambda, Hotelling's Trace and Roy's Largest Root. We followed this with univariate tests for each dependent variable. These yielded significant differences for the items Q3-1 (Mtactile = 2.46, Maudio = 2.15, Mhmd = 4.21, Mmonitor = 3.14) and Q3-2 (Mtactile = 4.00, Maudio = 2.15, Mhmd = 4.29, Mmonitor = 3.07), with participants rating the Google Glass as most disruptive. Significant differences between devices were also found for the items Q2-2 (Mtactile = 5.15, Maudio = 3.00, Mhmd = 4.07, Mmonitor = 3.57) and Q2-3 (Mtactile = 2.36, Maudio = 2.07, Mhmd = 4.20, Mmonitor = 3.40).

Furthermore, when looking at how participants rated the devices of the other interlocutors, a multivariate ANOVA revealed significant differences between the devices. Follow-up univariate tests showed significant differences for the item "The mere presence of the other person's {DEVICE} disturbed me" (Q3-5) when comparing between devices (p = .001). Participants felt that the Google Glass worn by the other interlocutors was the most disturbing (Mtactile = 1.39, Maudio = 1.70, Mhmd = 2.57, Mmonitor = 1.98). The mean values of each questionnaire item are displayed in Figure 5.

5.6 Discussion

Overall, the study yielded some valuable insights into how BFL integrate with social interactions. The main findings are summarized in Table 2.

RQ1: Does the BFL improve the social interaction? How do the feedback methods compare?

To answer the first research question, we first investigated whether the behaviour of the participants in the experimental condition (system on) differed from that in the control condition (system off). This way we were able to observe that the behavioural feedback loops did have a positive effect on the behaviour of the users during the group discussions. More precisely, the group discussions in the experimental condition were more balanced in terms of speaking time distribution between users than the discussions in the control condition. This can be observed in Figure 4, where the speaking duration range for EC is "narrower" than for CC. Upon closer inspection, we found that the BFL had a larger impact on the more active persons, significantly reducing the maximum speaking duration across all sessions. However, no such effect can be observed on the minimum speaking duration, suggesting that the less active persons were not particularly affected by the feedback. Interestingly, this effect was not noticed by the participants themselves. Only one (Q1-2) of RQ1's four items was rated significantly above the middle value (four), and two were rated significantly below it.

When comparing between devices, we were not able to find any significant differences in terms of either objective or subjective measurements. This leads us to believe that all feedback delivery mechanisms, regardless of modality, were similarly effective.



[Figure 5: bar chart of mean ratings (scale 1-7) per questionnaire item, grouped by device (Tactile, Audio, Visual HMD, Visual Remote). Error bars: 2x SEM; * sig. diff. between devices, ** sig. diff. to midpoint. Items: RQ1 (Q1-1 to Q1-4): "The device supported me during the discussion", "The feedback gave me a better perception of my speaking time", "The discussion improved because of my feedback", "Because of the feedback I was able to better control my speaking time". RQ2 (Q2-1 to Q2-4): "The feedback disturbed my conversation flow", "I would have preferred to talk without the feedback", "Regardless of the type of feedback, the device itself disturbed me", "I would have preferred to talk without the feedback device". RQ3 (Q3-1 to Q3-5): "I think it was observable on my behaviour when I received feedback", "I think others noticed when I received feedback", "I think others were disturbed by me receiving feedback", "The feedback of the other person's [DEVICE] disturbed me", "The mere presence of the other person's [DEVICE] disturbed me". DEVICE = Myo/Headphones/Google Glass/Monitor.]

Figure 5: Mean values for post-session questionnaire.

RQ2: Does the BFL disturb the user? How do the feedback methods compare?

For RQ2 we first looked at the paralinguistic features and how they changed between conditions. If the BFL had disturbed the participants, we would expect some differences in terms of speaking rate or voice modulation. However, we were unable to find any evidence of such an effect.

Looking at the differences between devices, we were unable to find any significant differences in terms of paralinguistic features. However, the analysis of the questionnaire data did reveal that the participants found the Google Glass to be the most disturbing device and the tablet the least. This is not very surprising and can be explained by the high intrusion factor and field-of-view obstruction of the Google Glass and the low intrusion factor and ordinariness of the tablet. Our analysis also revealed that the vibrotactile feedback was the one participants would most have preferred to talk without (Q2-3). This effect is more interesting as it goes against our expectations. More precisely, since in a social interaction we rely most heavily on our vision and hearing, we expected the tactile feedback to be the least disruptive and the visual and auditory feedback to be the most. Our results may be explained by the fact that the users were not accustomed to this type of feedback and were startled by the vibrations. Also of interest is the fact that audio feedback was at the opposite end of Q2-3. This stands in contrast to the results reported by Ofek et al. [10], who suggested that during vocal tasks, audio feedback is particularly disruptive. However, it is important to point out that in our scenario, audio feedback events were delivered in most cases only 4-5 times over a period of 2 minutes and consisted of a single short word. In Ofek's study, up to 100 words were delivered during a 5 minute window. Furthermore, Ofek did not use bone conduction speakers. For our future work we plan to look closer at these effects and compare between multiple feedback strategies for each modality.

RQ3: Does the BFL disturb the other interlocutors? How do the feedback methods compare?

To measure how disturbing the device is for the others, we looked at the clicker data and the questionnaires. While the clicker data is not directly able to tell us whether the BFL disturbs others, it does tell us whether other persons are aware of the BFL, as the participants were instructed to press the clicker every time they thought the speaker received feedback. Looking at this data, we noticed an increase in the overall amount of clicks in the experimental condition, which shows that the participants were in fact able to detect the presence of the BFL. We can thus state that the BFL's impact on the participants' behaviour was sufficient to be observable from the outside. However, the questionnaire data analysis yielded that this effect was not considered disruptive, since all five items were rated significantly below the scale's midpoint. Here we do want to note that the overall amount of clicks was quite low and during some sessions the clickers were not used at all. When confronted with this fact, most participants said that they were so engaged in the discussion that they forgot about the clicking task. To this end, although successfully used by Ofek et al. [10], we were disappointed by this metric and thus find it difficult to recommend it for other natural social interaction studies.

Although, generally speaking, the BFL were not considered disturbing, we found that some devices were rated less disturbing and others more. In particular, we found significant differences in how disruptive participants believed the feedback was (Q3-1 and Q3-2). Here, users generally thought that the audio feedback was least noticeable by others, whereas the feedback delivered on the Google Glass was the most noticeable. The participants rated the tactile and auditory feedback as least likely to cause any unwanted noticeable changes in behaviour. When asking the opinion of the other users (i.e. the participants not wearing the device in question), we found that they were significantly more disturbed by the Google Glass than by the other devices (Q3-5). These results are in line with the "glasshole" phenomenon, according to which persons who use the Google Glass in public may be perceived as "creepy or rude". This term attracted so much interest in 2014 that Google was forced to release guidelines on how to behave when wearing the Google Glass3.

Table 2: Summary of user study findings
RQ1. General effects: discussions were more balanced when BFL were active. Between feedback methods: all feedback methods performed similarly well.
RQ2. General effects: BFL were not considered particularly disturbing or undisturbing by users. Between feedback methods: the HMD was rated the most disturbing device, the tablet the least; tactile feedback was rated the most disturbing feedback method, audio the least.
RQ3. General effects: the presence of BFL was observable by others (but was generally considered not disturbing). Between feedback methods: the HMD and its visual feedback were most disturbing to others.

3 https://sites.google.com/site/glasscomms/glass-explorers

6. CONCLUSION

The main contribution of this work lies in the exposition of behavioural feedback loops (BFL) as automated realtime multimodal processes able to assist the user in improving her or his behaviour during social interactions. Furthermore, in an effort to encourage further research into BFL, we created an open source software framework for designing, building and executing BFL in a mobile environment. We then conducted a user study with 54 participants in which we investigated the impact of four different BFL, spanning three modalities, on face-to-face human interactions. The BFL were designed to assist the users in keeping a balanced discussion, i.e. all participants should speak for roughly the same amount of time. The feedback was delivered using either tactile (Myo armband), auditory (bone conduction headphones), visual head-mounted (Google Glass) or visual remote (common monitor) stimulation. The study showed that, although the BFL were objectively effective, humans are sceptical towards the usefulness of such systems. Moreover, our participants found the Google Glass as a feedback delivery device to be the most disturbing. Interestingly, when rating the feedback methods themselves, participants found the vibrotactile feedback to be the most disturbing.

As part of our future work, we intend to explore different feedback strategies for each modality, e.g. other visualisations for visual feedback or other vibration patterns for tactile feedback. Furthermore, we would also like to test more advanced feedback delivery mechanisms, such as dynamically adapting the feedback strategy to the user's reaction, simultaneously sending feedback over multiple channels, or more intelligently controlling when the feedback is delivered based on current activity and cognitive state.

Acknowledgments

The work was partially funded by the German Federal Ministry of Education and Research (BMBF) under the project "GLASSISTANT" (FKZ 16SV7267K). We thank the students Christian Brummer, Tim Buchwald, Yang Peng Liu, Davin Geiger, Veli Hasanca, Sekip Tuncer, Nicola Hauptmann, Philipp Nowak, Yulia Khlyaka, Matthias Landsgesell, Dominik Brindl and Marziye Tanha for their work on the user study.

7. REFERENCES

[1] A. Bandura. Social foundations of thought and action: A social cognitive theory, 1986.
[2] R. Barmaki and C. E. Hughes. Providing real-time feedback for student teachers in a virtual rehearsal environment. In Proc. ICMI. ACM, 2015.
[3] P. Boersma and D. Weenink. Praat: doing phonetics by computer. http://www.praat.org/, retrieved 2013.
[4] I. Damian and E. André. Exploring the potential of realtime haptic feedback during social interactions. In Proc. TEI. ACM, 2016.
[5] I. Damian, T. Baur, B. Lugrin, P. Gebhard, G. Mehlmann, and E. André. Games are better than books: In-situ comparison of an interactive job interview game with conventional training. In Proc. AIED, LNCS, pages 84-94. Springer, 2015.
[6] I. Damian, C. S. Tan, T. Baur, J. Schöning, K. Luyten, and E. André. Augmenting social interactions: Realtime behavioural feedback using social signal processing techniques. In Proc. CHI, pages 565-574, 2015.
[7] C. Hamilton. Communicating for results: A guide for business and the professions. Thomson/Wadsworth, Belmont, Calif., 9th edition, 2011.
[8] J. G. Hollandsworth, R. Kazelskis, J. Stevens, and M. E. Dressel. Relative contributions of verbal, articulative, and nonverbal communication to employment decisions in the job interview setting. Personnel Psychology, 32(2):359-367, 1979.
[9] D. Navon and D. Gopher. On the economy of the human-processing system. Psychological Review, 86(3):214-255, 1979.
[10] E. Ofek, S. T. Iqbal, and K. Strauss. Reducing disruption from subtle information delivery during a conversation: Mode and bandwidth investigation. In Proc. CHI, pages 3111-3120. ACM, 2013.
[11] J. Schneider, D. Börner, P. van Rosmalen, and M. Specht. Presentation trainer, your public speaking multimodal coach. In Proc. ICMI. ACM, 2015.
[12] S. J. Shettleworth. Cognition, Evolution, and Behavior. Oxford University Press, 2009.
[13] J. Six, O. Cornelis, and M. Leman. TarsosDSP, a real-time audio processing framework in Java, 2014.
[14] B. F. Skinner. The behavior of organisms: an experimental analysis, 1938.
[15] M. B. Stein and D. J. Stein. Social anxiety disorder. The Lancet, 371(9618):1115-1125, 2008.
[16] M. I. Tanveer, E. Lin, and M. Hoque. Rhema: A real-time in-situ intelligent interface to help people with public speaking. In Proc. IUI, pages 286-295, 2015.
[17] J. van der Linden, R. Johnson, J. Bird, Y. Rogers, and E. Schoonderwaldt. Buzzing to play: Lessons learned from an in-the-wild study of real-time vibrotactile feedback. In Proc. CHI, pages 533-542. ACM, 2011.
[18] J. Wagner, F. Lingenfelser, T. Baur, I. Damian, F. Kistler, and E. André. The social signal interpretation (SSI) framework - multimodal signal processing and recognition in real-time. In Proc. ACM MULTIMEDIA, 2013.
[19] C. D. Wickens. Multiple resources and performance prediction. Theoretical Issues in Ergonomics Science, 3(2):159-177, 2002.
