
SignalShare: A Secondary Two-Way Mediated Interaction Tool

Iris Chang
University of British Columbia
School of Biomedical Engineering
Vancouver, Canada
[email protected]

Andie Buenconsejo
University of British Columbia
School of Biomedical Engineering
Vancouver, Canada
[email protected]

Eric Easthope
University of British Columbia
Electrical and Computer Engineering
Vancouver, Canada
[email protected]

Figure 1: Setup of the (a) abstract cue condition and the (b) contextual cue condition. During the one-on-one video conference, participants and testers completed a jigsaw puzzle while having the abstract and contextual interfaces by their desktop display.

ABSTRACT
We prototype a web-based mobile interface for users to easily see, create, and re-create shared real-time secondary auditory and tactile environmental cues in the context of a one-on-one desktop video call. Two types of secondary cues are assessed in terms of their effects on a video caller's sense of immersion and presence.

CCS CONCEPTS
• Human-centered computing → Empirical studies in interaction design; Interface design prototyping; Collaborative interaction; Web-based interaction.

KEYWORDS
interaction design, interface design, video, teleconferencing, real-time, web-based, WebRTC, immersion, presence

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Human Interface Technologies '21 (HIT2021), April 20, 2021, Vancouver, BC, Canada
© 2021 Copyright held by the owner/author(s).

ACM Reference Format:
Iris Chang, Andie Buenconsejo, and Eric Easthope. 2021. SignalShare: A Secondary Two-Way Mediated Interaction Tool. In Human Interface Technologies '21 (HIT2021), April 20, 2021, Vancouver, BC. ACM, New York, NY, USA, 8 pages.

1 INTRODUCTION
Video calls, which are becoming common, do not encompass environmental and non-verbal cues typically experienced by people interacting in the same space. These ambient and non-verbal cues may affect the quality of a conversation, and so simulating them in a shared way may be able to extend or enhance the video calling experience by increasing interaction between callers.

We prototype an interface for sharing and synchronizing environmental cues in an interpretable way between video callers in order to simulate ambient and non-verbal experiences of in-person interactions. We draw inspiration from the idea of using external cues and non-verbal interactions to extend or enhance the video calling experience, and also from related work that has sought to create and remotely re-create shared interactions and experiences between participants, physically or virtually. Our prototype takes secondary environmental cues typical of in-person interactions, and virtually re-creates part of the corresponding visual, auditory, or tactile immersion of such interactions in a shared way between callers. These secondary cues are then assessed in terms of their effects on a video caller's sense of immersion and presence.


2 BACKGROUND & RELATED WORK
Measurements of user engagement and sentiment from video have been previously explored. In particular, emotion recognition has been applied to video teleconferencing platforms to assess caller sentiment through computer vision methods by detecting facial features [21]. Eye-tracking has been used to recognize emotions and engagement [3, 16], as have voice characteristics such as pitch and intensity [13]. User engagement has also been predicted from body language [22]. Going further, telepresence robots have been used to capture and exaggerate body motion cues of remote teleconferencing participants [12]. However, while technologies like these are able to extend human motion cues, overall there seems to be less research on detecting and sharing ambient features of a participant's environment, especially in video calling contexts.

Shared haptics are one way in which remote participants can have shared interactions. The MIT Tangible Media Group has contributed inTouch, which enables distanced touch interactions using force-feedback [4], but does not go as far as enabling touch interactions over arbitrary distances using wireless technology or the like. Further investigations into force-feedback have found that haptic-based remote interactions negatively affect user attitudes in cooperative settings [5], and force-feedback interactions with shared physical or virtual objects in teleconferencing scenarios have also been explored [23].

More recently, Apple Inc.'s Digital Touch feature makes it possible to wirelessly share impromptu sketches, simple tap-based haptic interactions, and even heartbeats between users. However, these wireless interactions are based on an asynchronous send-and-receive communication model. Similar projects have been undertaken in the context of long distance relationships, such as vests that send virtual hugs and LumiTouch, a photo frame that lights up at the touch of the remote partner's hand [17]. Instead of having users intentionally initiate interactions, several projects such as ComSlipper [6] and remotely warmed beds [8] explored the remote synchronous sharing of experiences through embedded hardware in furniture, which resembles ambient intelligence.

In larger research spaces, the McGill Ultra-Videoconferencing Research Group simulated the environment and presence of a remote video call partner with an adaptive videoconference environment consisting of high-quality cameras, speakers, and plasma screens. Their environment also allowed for floor vibration transmission [7, 11]. While they have achieved good results, they require the use of high-quality equipment that is not easily accessible due to its high cost and space requirements. Augmented Reality (AR) and Mixed Reality (MR) also allow users to add virtual objects to a perceived environment to enhance teleconferencing [1, 15], and the addition of virtual objects to AR and MR collaboration tasks as spatial cues has been shown to be favourable over physical object cues for lowering subjective workload and encouraging verbal communication [18]. However, these methods require specialized equipment capable of AR or MR, which remains costly.

Ambient intelligence (AmI), as mentioned in the context of sharing experiences, is another relevant research area, where others have embedded communication and computation capabilities into everyday objects in order to support and influence user behaviour. Implicit environmental cues, for example, have been applied to goal-oriented activities, drawing upon inspiration from collective natural systems [20]. Social awareness and context-aware computing are also recurring themes in related research, where mobile and online applications are used to promote social context, awareness, and support [2, 10]. Examples include a mobile application that shares user-selected colour cues to convey user emotions, an "emotional climate map" of a museum space that aggregates inputs from multi-user activities and stylus interactions, and a campus map where students can contribute their experiences as content tags. These AmI, social awareness, and context-aware computing tools focus on increasing and improving user awareness and social connections to others, but do not usually consider the sharing of cues as part of shared social experiences. To our knowledge, sharing common context cues between mobile phones, specifically with the purpose of enhancing or extending an existing video calling interaction, remains unexplored.

3 METHODOLOGY
3.1 Prototype Design
During a desktop video call, callers access a shared virtual "room" through a webpage-based interface in their respective mobile browsers, where they both enable permissions for microphone access. Microphone data is not transmitted between callers in any way, and is only used to detect secondary cues. Callers then place their mobile phone upright on the same surface as their desktop computer so that it is visible and within reach.

Ambient auditory sounds, as well as direct tactile interactions that either caller makes with their mobile phone display, are then detected and processed. During processing, these detected cues are abstracted into simple digital tokens representing cues, which are synchronized in near real-time between callers in the same virtual room through a shared visual interface (Figure 2). This synchronization and interface are run independently of the video calling interface itself. Also, we simplify input cues before synchronization in order to reduce bandwidth use, avoid transmitting microphone data, and prevent unwanted lag.

In this way each mobile phone acts as a real-time simultaneous sender and receiver of ambient audio and tactile cues between callers in the same virtual room. However, callers do not share audio directly, as this would make the accompanying video call redundant. Instead, client-side code on each caller's mobile phone simplifies audio and tactile input cues prior to synchronizing and displaying them between callers' phones.
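To make this concrete, the sketch below shows one way this client-side simplification could be implemented with standard browser APIs: the Web Audio API reduces microphone input to a coarse loudness level, and touch events are reduced to tap and drag tokens. The token shape and the loudness threshold are illustrative assumptions, not the prototype's exact scheme.

// Sketch only: reduce raw microphone audio and touch input to small cue tokens.
// The token format and threshold below are assumptions for illustration.
async function watchAudioCues(onCue) {
  // Request microphone access; audio is analyzed locally and never transmitted.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Uint8Array(analyser.frequencyBinCount);
  setInterval(() => {
    analyser.getByteFrequencyData(samples);
    // Collapse the spectrum into a single 0-1 loudness estimate.
    const level = samples.reduce((sum, v) => sum + v, 0) / (samples.length * 255);
    if (level > 0.15) {                      // assumed threshold for an ambient sound cue
      onCue({ kind: 'audio', level, t: Date.now() });
    }
  }, 100);                                   // coarse 10 Hz polling keeps payloads small
}

function watchTouchCues(element, onCue) {
  // A tap becomes a discrete cue; a drag becomes a continuous cue with path points.
  element.addEventListener('touchstart', () => onCue({ kind: 'tap', t: Date.now() }));
  element.addEventListener('touchmove', (e) => {
    const { clientX: x, clientY: y } = e.touches[0];
    onCue({ kind: 'drag', x, y, t: Date.now() });
  });
}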

To compare a user's sense of immersion and presence, we represent identical input cues with two distinct visual interfaces: one where we represent cues abstractly with changes in colour and brightness, and another where we represent cues contextually with skeuomorphic elements meant to virtually resemble nearby physical objects in a caller's environment. For the contextual interface, our implementation displays a container of water that ripples and changes based on audio and tactile input (Figure 3). A user test is conducted to compare the abstract and contextual interfaces, which is discussed in Experimental Design. We summarize these cue types, inputs, and outputs in Table 1.


Table 1: Cue types, inputs, and outputs.

Cue Types     Inputs                               Outputs
Abstract      (1) Audio (sound near microphone)    (1) Continuous colour changes
              (2) Tactile (touch and drag)         (2) Sudden flashes of colour
Contextual    (1) Audio (sound near microphone)    (1) Continuous skeuomorphic graphical changes
              (2) Tactile (touch and drag)         (2) Sudden, localized skeuomorphic graphical changes with visible path

Figure 2: Secondary interactions between video callers are synchronized in real-time through a shared visual interface on their mobile phones. Two early-stage concepts are shown: (1) synchronization of tapping and other repetitive ambient audio cues, and (2) synchronization of on-screen interactions with shared virtual objects.

3.2 Prototype Implementation
The prototype is written using the HTML, CSS, and JavaScript web languages, and is hosted online using Vercel as a mobile-friendly webpage-based visual interface. Two interfaces are developed: (1) an abstract interface that displays continuous and sudden flashes of colour, and (2) a contextual interface that displays realistic surface water effects like rippling. We use a Viridis colour map for the abstract interface as it is red-green colour-blind (deuteranopia) safe [19].
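As a sketch of how a detected cue could drive the abstract display, a normalized cue intensity can be mapped onto the Viridis colour map. The use of d3-scale-chromatic and the element id below are assumptions for illustration; the paper only specifies that a Viridis colour map is used.

// Sketch: map a normalized cue intensity (0-1) onto the Viridis colour map.
// d3-scale-chromatic and the 'abstract-panel' element id are assumptions;
// any Viridis lookup table would serve the same purpose.
import { interpolateViridis } from 'd3-scale-chromatic';

function renderAbstractCue(cue) {
  const panel = document.getElementById('abstract-panel');
  if (cue.kind === 'audio') {
    // Continuous colour change driven by ambient loudness.
    panel.style.backgroundColor = interpolateViridis(cue.level);
  } else {
    // Tactile cues produce a sudden flash at the bright end of the map.
    panel.style.backgroundColor = interpolateViridis(1);
    setTimeout(() => { panel.style.backgroundColor = interpolateViridis(0); }, 300);
  }
}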

Heroku is used to configure a fallback signalling server for the synchronization of secondary cues, and source code files for both the interface and signalling server are hosted on GitHub [9]. Google Chrome for Android and Safari for iOS are supported.

Client-side code (code that runs on each caller's device) processes auditory and tactile inputs, and strings and arrays are synced between callers accessing the webpage in real-time using WebRTC (an open-source real-time communication API) with Yjs [14]. Yjs instantiates shared virtual rooms, and waits for caller interaction events, which are synchronized between devices using WebRTC.
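A minimal sketch of this synchronization layer is shown below, assuming the y-webrtc provider and a shared Y.Array of cue tokens; the room name, array name, and signalling URL are placeholders rather than the prototype's actual values, and the rendering callback reuses renderAbstractCue from the earlier sketch.

// Sketch: synchronize simplified cue tokens between two callers with Yjs over WebRTC.
// Room name, shared-array name, and signalling URL are illustrative placeholders.
import * as Y from 'yjs';
import { WebrtcProvider } from 'y-webrtc';

const doc = new Y.Doc();
// The provider holds the peer connection open for the shared virtual room.
const provider = new WebrtcProvider('signalshare-demo-room', doc, {
  signaling: ['wss://example-signalling.herokuapp.com'],  // fallback signalling server
});

// Both callers append their local cue tokens to the same shared array.
const cues = doc.getArray('cues');

function shareCue(cue) {
  cues.push([cue]);  // Y.Array.push takes an array of items
}

// Render any cue appended by either caller, in near real-time.
cues.observe((event) => {
  event.changes.delta.forEach((change) => {
    (change.insert || []).forEach((cue) => renderAbstractCue(cue));
  });
});

Wiring watchAudioCues(shareCue) and watchTouchCues(document.body, shareCue) from the earlier sketch would complete the send path on each device.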

4 EXPERIMENTAL DESIGN
The user study focused on evaluating the abstract and contextual interfaces in terms of their ability to enhance the aspects of immersion and presence in the video call experience. Immersion pertains to the sense of sharing the same physical space and feeling physically close together, while presence pertains to the sense of sharing the same experience with others. Our prototype requirements and design led us to investigate the following evaluation and research questions (labelled RQ1-RQ3 below):

Evaluation Question: Using mobile phones, how can secondary environmental cues be synchronously captured, simplified, and shared to positively impact the videoconferencing experience of two users in terms of immersion and presence?

• RQ1: Can interactive contextual cues in the environment improve the video conference experience in terms of immersion?

• RQ2: Can interactive abstract cues in the environment improve the video conference experience in terms of presence?

• RQ3: Are interactive contextual and abstract cues equivalently effective at improving the video conference experience?

4.1 Participant Recruitment
Participants who were at least 18 years of age were recruited for this study through convenience sampling based on existing social connections. The target demographic of this study was individuals who regularly use video conferencing platforms for their occupation, education, or social activities. We chose this recruitment strategy due to its convenience given the limited timeframe for user testing. Additionally, participants were required to have a mobile phone and desktop computer with a stable internet connection to be able to run the web interfaces and engage in a video call with a tester. Informed consent was obtained from each participant prior to each session, and the sessions were recorded with express permission.

4.2 Experiment Procedure
The experiment was conducted remotely and one-on-one, involving the participant and one research team member as a tester. The study consisted of three test conditions based on cue types, all of which were tested with each participant:

• No shared cues, i.e. a regular video conference call
• Shared abstract cues
• Shared contextual cues

During the experiment, the participant and the team member used video conferencing to complete a shared online jigsaw puzzle under each condition. The "no shared cues" condition was tested first and served as the control condition to establish a baseline and to assess the participant's normal behaviour in a regular video conference call. The order of the abstract and contextual cue conditions was randomized across the participant population.

Figure 3: Interfaces for real-time synchronized abstract (left) and contextual (right) cues from caller interactions, each shown over time: an instantaneous touch input triggers a flash of colour (1) or a sudden skeuomorphic graphical change (not shown due to poor visibility in print) on the other caller's device, a continuous touch input (touch and drag) triggers a continuous colour or skeuomorphic graphical change (2 and 4), and an ambient audio input triggers a different colour or skeuomorphic graphical change (3 and 5). All cues are visible on both callers' devices.

4.3 Test Setup
Prior to each test session, participants were asked to set their phones to Do Not Disturb mode to eliminate notifications and to adjust their screens to time out after 30 minutes of inactivity. Each test participant connected with a team member (the tester) via video call on Zoom. The participants were then notified when video recording on Zoom was initiated. The test setups for the abstract and contextual cue conditions are shown in Figure 1.

For each experiment condition, participants were asked to complete a shared online jigsaw puzzle from jigsawexplorer.com together with the tester. A jigsaw puzzle activity was chosen due to its potential for high interactivity and engagement with the tester while keeping the video conference atmosphere casual. The puzzles were set to have 40 pieces to ensure puzzle completion within five to ten minutes. Images of the puzzles were colourful quilt patterns. Pictures of scenery or specific objects were avoided to prevent participants from linking the puzzles to the interfaces. Different puzzles were assigned for each cue condition.

For the abstract and contextual cue conditions, the participants were asked to set up virtual Zoom backgrounds to simulate sharing the same space as the tester. When setting up for the shared abstract cue condition, both the tester and participant set their Zoom virtual backgrounds to a plain black image (Figure 1, left). To set up the contextual cue condition, participants were asked to change their virtual Zoom backgrounds to images of bedrooms with rainfall outside a window. The tester changed their background to something similar (Figure 1, right).

Once the Zoom backgrounds were set, the participants were given a website link to the interface for the cue condition being tested, which they were asked to open on either Google Chrome for Android or Safari for iOS. The participants were prompted to provide microphone access to the web application in order to initiate audio-based cues. Testers used a modified version of the interface in which a red rectangle was displayed whenever a cue was triggered by the tester.

Participants were guided through the interactions they could input and receive on the interface. After the participants explored the interfaces, they were asked to keep their phones upright and within their line of sight by their desktop screens. Once the participant was ready to begin the jigsaw puzzle, the tester initiated a phone screen recording to archive interface interactions between the participant and themselves during the puzzle.

4.4 Participant Survey
After each experiment condition, the participants were asked to complete a post-task questionnaire (Table 2), which was provided on Google Forms. For the no cue condition, questions IQ1, IQ2, and IQ7 were omitted as the phone interfaces were not used for this component of the test. The questions in the post-task questionnaire are 5-point Likert scale questions, where for each statement the participant selected a value ranging from "strongly agree", with a score of 5, to "strongly disagree", with a score of 1.

Table 2: Questions included in the post-task questionnaire.

ID     Interface Question                                                 Focus
IQ1    "I understood what was displayed on my phone."                     Understanding
IQ2    "My phone was distracting."                                        Attention
IQ3    "I felt close to the other person on the video call."              Immersion
IQ4    "I felt that I shared the same space as the other person."         Immersion
IQ5    "I felt like I could directly interact with the other person."     Presence
IQ6    "I felt like I was present during the video call."                 Presence
IQ7    "The phone interface improved my video calling experience."        Quality of Video Conference

Finally, the participants were asked to complete a post-test questionnaire on their demographics and technology use behaviour to provide further insight into their responses:

• DBQ1: Which age group do you fall under? (options: "18-25," "25-30," "30-40," "40-60," "60+," "prefer to not answer")

• DBQ2: What is your gender? (options: "male," "female," "other," "prefer to not answer")

• DBQ3: What video calling software do you use? (enter text)
• DBQ4: Roughly, how many hours per day do you do video calls? (options: "0-1 hour", "2-5 hours", "5+ hours")

• DBQ5: Do you ever use multiple displays at the same time, such as your smartphone and computer together? (options: "yes", "no", "unsure")

• DBQ6: If so, how many displays do you usually use at a time? (enter number)

This questionnaire was also provided on Google Forms, and these questions were asked after all tasks to avoid priming the participants.

After all questionnaires were answered, participants were interviewed to gain their insights about the interfaces. Testers also asked questions regarding each participant's survey and interview answers, especially if there were outliers or interesting responses. For the interview, we asked:

• What does being "present" mean to you in this video call?
• Was the ambience appropriate for a video call (virtual background images)?

• Were the interactions appropriate for a video call (abstract versus contextual cues)?

• What did you think about the different interfaces? Any preference for one versus the other (abstract versus contextual cues)?

4.5 Analysis
Response distributions of the post-task Likert questions that concern immersion and presence (IQ3 to IQ6) were compared to answer the proposed research questions. To further understand the answers to RQ1, RQ2, and RQ3, we analyzed qualitative observations and interview results.
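Because the Likert responses are ordinal, this comparison reduces to tallying response frequencies per condition; the sketch below illustrates such a tally on hypothetical response data (the data structure and sample values are assumptions for illustration only).

// Sketch: tally Likert response frequencies per test condition for the
// immersion and presence items (IQ3-IQ6). The sample data is hypothetical.
const LABELS = ['strongly disagree', 'disagree', 'neutral', 'agree', 'strongly agree'];

function tallyByCondition(responses) {
  // responses: array of { condition, item, label }
  const counts = {};
  for (const { condition, label } of responses) {
    counts[condition] = counts[condition] || Object.fromEntries(LABELS.map((l) => [l, 0]));
    counts[condition][label] += 1;
  }
  return counts;
}

// Hypothetical example responses for IQ3/IQ4 across conditions.
console.log(tallyByCondition([
  { condition: 'contextual', item: 'IQ3', label: 'agree' },
  { condition: 'contextual', item: 'IQ4', label: 'strongly agree' },
  { condition: 'abstract', item: 'IQ3', label: 'neutral' },
  { condition: 'none', item: 'IQ4', label: 'agree' },
]));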

Participant answers to interview questions were analyzed to determine their interest in and understanding of the interface and their ability to interact with it. The types of cues participants expected were identified, and we gauged participant reception of each cue. Additionally, participant responses provided insights on what kind of abstract or contextual cues they appreciate and might use in personal video calls. Besides determining participant experience during the experiment, we identified areas for improvement to the interface, especially if this work is extended to further experiments.

5 RESULTS
5.1 Demographics
Overall, six participants were recruited: four females and two males. All participants were in the age range of 18 to 29. All participants had used at least one video calling software, with universal usage of "Zoom" out of the six video calling platform options provided. Five of the six participants reported spending 0 to 1 hour per day on video calls, while one participant (P6) reported spending 2 to 5 hours. All but one participant (P4) had used multiple displays at the same time, usually 2 displays. Overall, the recruited participants were considered to be part of the target audience of the proposed interface, since they used video calling platforms in daily life and were generally familiar with simultaneous use of multiple displays.

5.2 Interaction & Gaze Observations
We observed that most participants did not actually interact with the phone interfaces once they started on the puzzle task. The participants reported being too focused and drawn into the puzzle task, either forgetting about the phone interface or electing to ignore it since the interface interactions did not assist with puzzle making. Before the tasks started, most participants expressed interest in the interfaces and tested them by interacting through touch and audio. Out of six participants, only one (P6) interacted with the interfaces regularly throughout the puzzle tasks by reciprocating interactions and initiating input, mostly through touch. P6 reported the interactions to be fun and enjoyed the mystery of trying to figure out what the other user was trying to communicate. Participant gaze was observed by the tester through video recordings. Similar to the interface interaction results, most participants did not shift their gaze to objects or interfaces other than the puzzle screen. P6, who regularly interacted with the phone interfaces in both test conditions, shifted their gaze 10 times to the abstract cues interface and 15 times to the contextual cues interface.


5.3 Post-Task Questionnaire & Interview Results
Since the post-task questionnaire provided ordinal data, the data was analyzed based on frequency. The interview results were analyzed for themes in user experience and interface-specific feedback in the context of our research questions:

• RQ1: Can interactive contextual cues in the environment improve the video conference experience in terms of immersion?
To answer RQ1, the responses to post-task questionnaire items associated with immersion (IQ3 and IQ4) were grouped for comparison across test conditions. Figure 4 shows that the combined frequency of "agree" and "strongly agree" responses to statements that positively affirm the immersion experience is highest in the contextual cues condition, followed by abstract cues and no cues, respectively.

• RQ2: Can interactive abstract cues in the environment improve the video conference experience in terms of presence?
To answer RQ2, the responses to post-task questionnaire items associated with presence (IQ5 and IQ6) were grouped for comparison across test conditions. Figure 4 shows that the combined frequency of "agree" and "strongly agree" responses to statements that positively affirm the presence experience is highest in the contextual cues condition, followed by abstract cues and no cues, respectively. It is interesting to note that the frequency of "strongly agree" responses is actually highest in the no cues condition, indicating that, for some users, the interactive interface may actually have negatively impacted the presence experience.

• RQ3: Are interactive contextual and abstract cues equivalently effective at improving the video conference experience?
IQ7 on the post-task questionnaire directly addresses the improvement posed by the tested interface, and its response frequency is shown in Figure 4. In the case of abstract cues, half of the participants selected either "strongly disagree" or "disagree" for the positively-worded statement. In the contextual cues condition, half of the participants were "neutral" about the statement, with the other half split towards the opposite ends of the spectrum.

More information regarding this research question can be gleaned from responses to other questionnaire and interview questions. Drawing from the performance of the two cue types in terms of immersion and presence, the interface with contextual cues is associated with higher frequencies of affirming responses in both immersion and presence. Based on the responses to IQ1, both interfaces were understandable to the same degree. The research question can also be approached from a preference perspective. From interview feedback, four out of six participants (P1, P2, P3, P5) stated a personal preference for the contextual cues interface, while the other two (P4 and P6) preferred the abstract cues interface. Preference for the contextual cues interface was based on entertainment, graphic detail, output subtlety, and realism. Preference for the abstract cues interface was based on salience of output, simplicity, and speed of reaction.

5.4 Distraction Effect of Interface
IQ2 of the post-task questionnaire was designed to understand whether the interactive interfaces may be considered distracting when used in the video calling context. As described earlier, most participants did not pay attention to the phone interface and exclusively focused on the puzzle task. Some participants naturally maintained their attention on the puzzle while one participant (P2) intentionally ignored the interface in favour of concentrating on the puzzle. This trend is reflected in the responses to IQ2, as shown in Figure 4.

The issue of distraction was further explored in the interviews. P6 found the contextual cues interface to be more distracting due to the need to pay close attention and scrutinize the phone screen to detect output. Being unable to observe and interact with the contextual cues interface within their peripheral vision distracted them from the puzzle task and video conference, since they had to shift their full attention to the phone. However, the majority of users found the abstract cues interface to be distracting due to the bright flashing colours, as they were harder to ignore than the ripples on the contextual cues interface.

6 DISCUSSION
The user test questionnaire results suggest that the interactive contextual cues improved the video calling experience both in terms of immersion and presence to some degree, and that interactive contextual cues are more effective at improving the video calling experience than abstract cues. However, these results are contrasted by the observation that most participants did not interact with the phone interface, regardless of interactive cue type, during the shared puzzle-making tasks. There are several ways to interpret this. One, the participants may have enjoyed the puzzle-making tasks too much to pay attention to the phone interface. Two, the sensory demand of the puzzle task was too similar to the interface sensory demands, so participants elected to only pay attention to puzzle-making as the designated primary task. Alternatively, the interactive interface may be unsuitable for the video calling use case regardless of assigned participant task.

Most participants (P1, P2, P3, P4, P5) stated they were personally invested in the completion of the puzzles, with feelings of competitiveness, despite the tester explaining that their puzzle-making performance was not measured. The puzzle interface employed in the user test showed a stopwatch timer, which may have contributed to participants' race-like experience. When analyzing the puzzle-making task in terms of sensory workload, there was a strong overlap of visual demand, since puzzle-making requires detailed colour and pattern searching, leaving few visual resources to be shared with monitoring the interactive interface or video conference. There was also an overlap of tactile demand, as the puzzle interface requires pointing and dragging, reducing tactile resources for interaction with the phone interface. P5 suggested that a shared task with more individual downtime, such as a turn-based game, may encourage usage of the interface. All of these findings indicate that the interactive interface is not suited to cases where the video-calling users are intently involved in visually and tactilely demanding tasks.


Figure 4: Likert response frequency of presence-themed questions (IQ5 and IQ6, top-left), Likert response frequency of immersion-themed questions (IQ3 and IQ4, top-right), Likert response frequency of IQ2 (bottom-right), and Likert response frequency of IQ7 (bottom-left).

The participants showed varied interpretation of and reaction to the virtual background component of the test conditions. Two participants (P3, P6) had technical difficulties in setting up the virtual backgrounds due to facial detection issues, which may be associated with their use of older MacBook laptops. Upon seeing the virtual backgrounds for the contextual cues test case, P3 immediately understood and pointed out the intent to create a "shared space". On the other hand, P4 recognized the intent but interpreted it as "two different places". In general, however, participants did not notice the intent to share space or connect the contextual cues virtual background with the design of the contextual interface. Most participants noticed the similarity between the black virtual background in the abstract cues condition and the black background design of the abstract interface. Interestingly, P5 used language suggesting that they saw the background as part of their physical environment, particularly in reference to the black background during the abstract condition: "I don't think I even glanced at it over there."

The group of participants recruited presented a unique opportunity to observe how the interface designs perform for people with ADHD and for people with colour deficiency. P2, who has ADHD, found the abstract interface particularly distracting while completing the jigsaw puzzle. They provided the following feedback: "You're trying to look for the matching colours... in the puzzle, and it's, like, really distracting when you see another colour flash out of the corner of your eye. I forgot which [puzzle piece] colour I was looking for." When prompted for their thoughts on the output design of the abstract interface, they said: "I don't like the flashing colours; that's really distracting. It was really distinct, and it's very eye-catching. Too eye-catching... I don't... I don't like that." Other participants reported mixed opinions on the use of flashing colours. As for P6, the participant with colour deficiency, they reported some confusion when trying to interpret the abstract interface interactions. P6 had difficulty associating the type of input (i.e. audio or touch) with output colour, a difficulty not reported by any other participant.

While most participants did not demonstrate interest in the interfaces during user tasks, two participants (P3 and P6) said they saw the appeal of the interactive interfaces in video call settings. Both participants described the additional tactile interaction provided by the interface as "fun" and said that the sense of touch adds a level of closeness with the other party. P3 described being able to see the physical aspect of touch, and that "it does feel like I am touching fingers with you from the screen." The same participant stated, "I'd totally use it for my friends and loved ones, like if they're in other countries. If I could feel like I am physically interacting with my little sister, for example, I think that would be adorable. I might cry, actually." P6 noted when using the contextual interface, "It felt like a new thing and like noticing something happening on the side. Felt like a new way of interacting. The mystery of it was interesting, like, 'What is the other person trying to communicate?'" It can be theorized from these comments that there are user scenarios where the interactive interfaces are better suited, such as video calls between users who are emotionally close but physically distant, where the task or goal is more communication-based.

6.1 Limitations
A clear limitation of the user test conducted is its small sample size and the relative similarity of the recruited participants. Also, very few in-task user interactions with the interface were captured, likely due to the task design. While this helped determine that the interactive interface is not compatible with visually and tactilely demanding shared tasks, it leaves a gap in understanding how users would interact with the interface in other, more suitable scenarios. Lastly, the tester-participant experiment design may have limited organic interactions that could arise between paired participants unconstrained by a tester role, since the interactive interface is designed for two-way interactions.

7 CONCLUSION & FUTURE WORK
We have prototyped a web-based mobile interface for users to easily see, create, and re-create shared real-time secondary auditory and tactile environmental cues in the context of a one-on-one desktop video call. We have assessed two types of secondary cues in terms of their effects on a video caller's sense of immersion and presence. Through the user test we also collected informative feedback on user preference between the two cue types, general interface design, and interface suitability to a specific type of shared task.

This work is a precursor to further development and evaluation of secondary interactive interfaces that can be used in conjunction with conventional video conferencing tools. In the case of interfaces with contextual cues, there are many different visual and interaction designs that can be explored to express the interactive context (e.g. other reactive materials, animated animals, etc.), which can be compared to determine their impact on user immersion, presence, or other aspects of the calling experience. The water ripple context tested in this work also has room for further testing and improvement, such as output salience and user identification.

REFERENCES
[1] Istvan Barakonyi, Helmut Prendinger, Dieter Schmalstieg, and Mitsuru Ishizuka. 2007. Cascading Hand and Eye Movement for Augmented Reality Videoconferencing. In 2007 IEEE Symposium on 3D User Interfaces. https://doi.org/10.1109/3DUI.2007.340777
[2] Jakob E. Bardram and Thomas R. Hansen. 2004. The AWARE Architecture: Supporting Context-Mediated Social Awareness in Mobile Cooperation. In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work (Chicago, Illinois, USA) (CSCW '04). Association for Computing Machinery, New York, NY, USA, 192–201. https://doi.org/10.1145/1031607.1031639
[3] Roman Bednarik, Shahram Eivazi, and Michal Hradis. 2012. Gaze and Conversational Engagement in Multiparty Video Conversation: An Annotation Scheme and Classification of High and Low Levels of Engagement. In Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction (Santa Monica, California) (Gaze-In '12). Association for Computing Machinery, New York, NY, USA, Article 10, 6 pages. https://doi.org/10.1145/2401836.2401846
[4] Scott Brave and Andrew Dahley. 1997. inTouch: A Medium for Haptic Interpersonal Communication. In CHI '97 Extended Abstracts on Human Factors in Computing Systems (Atlanta, Georgia) (CHI EA '97). Association for Computing Machinery, New York, NY, USA, 363–364. https://doi.org/10.1145/1120212.1120435
[5] Scott Brave, Clifford Nass, and Erenee Sirinian. 2001. Force-Feedback in Computer-Mediated Communication. 145–149.
[6] Chun-Yi Chen, Jodi Forlizzi, and Pamela Jennings. 2006. ComSlipper: An Expressive Design to Support Awareness and Availability. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (Montréal, Québec, Canada) (CHI EA '06). Association for Computing Machinery, New York, NY, USA, 369–374. https://doi.org/10.1145/1125451.1125531
[7] Jeremy R. Cooperstock. 2011. Multimodal Telepresence Systems. IEEE Signal Processing Magazine 28, 1 (2011), 77–86. https://doi.org/10.1109/MSP.2010.939040
[8] Chris Dodge. 1997. The Bed: A Medium for Intimate Communication. In CHI '97 Extended Abstracts on Human Factors in Computing Systems (Atlanta, Georgia) (CHI EA '97). Association for Computing Machinery, New York, NY, USA, 371–372. https://doi.org/10.1145/1120212.1120439
[9] Eric Easthope. 2021. SignalShare, GitHub. Retrieved March 9, 2021 from https://github.com/ericeasthope/signal-share
[10] Geri Gay. 2009. Context-Aware Mobile Computing: Affordances of Space, Social Awareness, and Social Influence. https://doi.org/10.2200/S00135ED1V01Y200905HCI004
[11] McGill Ultra-Videoconferencing Research Group. 2005. ULTRA-Videoconferencing. Retrieved February 1, 2021 from http://canarie.mcgill.ca/demos/SC05_Seattle_Flyer.pdf
[12] Komei Hasegawa and Yasushi Nakauchi. 2014. Telepresence Robot That Exaggerates Non-Verbal Cues for Taking Turns in Multi-Party Teleconferences. In Proceedings of the Second International Conference on Human-Agent Interaction (Tsukuba, Japan) (HAI '14). Association for Computing Machinery, New York, NY, USA, 293–296. https://doi.org/10.1145/2658861.2658945
[13] Rongqing Huang and Changxue Ma. 2006. Toward a Speaker-Independent Real-Time Affect Detection System. In Proceedings of the 18th International Conference on Pattern Recognition - Volume 01 (ICPR '06). IEEE Computer Society, USA, 1204–1207. https://doi.org/10.1109/ICPR.2006.1127
[14] Kevin Jahns. 2014. Yjs: peer-to-peer shared types, GitHub. Retrieved February 11, 2021 from https://github.com/yjs/yjs
[15] Tuomas Kantonen, Charles Woodward, and Neil Katz. 2010. Mixed Reality in Virtual World Teleconferencing. In 2010 IEEE Virtual Reality Conference (VR). 179–182. https://doi.org/10.1109/VR.2010.5444792
[16] Jia Zheng Lim, James Mountstephens, and Jason Teo. 2020. Emotion Recognition Using Eye-Tracking: Taxonomy, Review and Current Challenges. Sensors 20, 8 (2020). https://doi.org/10.3390/s20082384
[17] Danielle Lottridge, Nicolas Masson, and Wendy Mackay. 2009. Sharing Empty Moments: Design for Remote Couples. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI '09). Association for Computing Machinery, New York, NY, USA, 2329–2338. https://doi.org/10.1145/1518701.1519058
[18] Jens Müller, Roman Rädle, and Harald Reiterer. 2016. Virtual Objects as Spatial Cues in Collaborative Mixed Reality Environments: How They Shape Communication Behavior and User Task Load. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI '16). Association for Computing Machinery, New York, NY, USA, 1245–1249. https://doi.org/10.1145/2858036.2858043
[19] Tamara Munzner. 2014. Visualization Analysis and Design. A.K. Peters Visualization Series. CRC Press.
[20] Christoph Obermair, Bernd Ploderer, Wolfgang Reitberger, and Manfred Tscheligi. 2006. Cues in the Environment: A Design Principle for Ambient Intelligence. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (Montréal, Québec, Canada) (CHI EA '06). Association for Computing Machinery, New York, NY, USA, 1157–1162. https://doi.org/10.1145/1125451.1125669
[21] Victor Shaburov and Yurii Monastyrshyn. 2017. Emotion Recognition in Video Conferencing. Patent No. US9576190B2, Filed Mar. 18, 2015, Issued Feb. 21, 2017.
[22] Janez Zaletelj and Andrej Košir. 2017. Predicting Students' Attention in the Classroom from Kinect Facial and Body Features. EURASIP Journal on Image and Video Processing 2017 (2017), 1–12.
[23] Zhengyou Zhang, Xuedong D. Huang, Jin Li, Rajesh Kutpadi Hegde, Kori Marie Quinn, Michel Pahud, and Jayman Dalal. 2012. Force-Feedback Within Telepresence. Patent No. US8332755B2, Filed May 28, 2009, Issued Dec. 11, 2012.