seeing unfamiliar voices: does visualization of spatial position enhance voice identification?
Ryan Kilgore, Mark Chignell
University of Toronto
IBM CAS, KMDI
03 | 22 | 06
Kilgore & Chignell, Seeing Unfamiliar Voices | HFT 2006
presentation overview
• Voice collaboration and spatial audio
• Visualizing audio spaces
• Experimental methodology
• Results
• Discussion
problems with voice collaboration
• Traditional methods of synchronous communication do not adequately support large groups
• Monaural audio, lack of visual feedback, and poor audio quality make it difficult to determine:
  • Who is present?
  • Who is speaking?
  • What is being said?
spatial audio | overview
[Figure: two panels, Free-Field Acoustics and Virtual Acoustics, each showing sound sources S1 and S2]
spatial audio | benefits (1 of 2)
• Reduction in masking; facilitation of auditory scene analysis (Bregman, 1990; Gilkey & Anderson, 1997)
• Increased speech intelligibility in noisy environments (Ericson & McKinley, 1997)
• Increased speech intelligibility in multi-talker listening tasks (Drullman & Bronkhorst, 2000; Abouchacra, 2001; Bolia, 2001)
spatial audio | benefits (2 of 2)
• Distinct voice locations aid in cognition of audio conference events (Baldis, 2001; Kilgore et al., 2003)
• Significantly preferred to traditional, monaural voice presentation
• Reduced perception of attention required for speaker identification task
• Increased speaker identification performance, particularly for ‘personalized’ audio spaces
Vocal Village interface
visualization | audio spaces
• Early Vocal Village field trials indicated that users want a GUI for monitoring and controlling the audio space
• Participants in audio-only field trials have highlighted the difficulty of knowing who was present in the audio space (Singer et al., 1999)
• The visual modality can convey awareness-supporting information in parallel with audio communication
Will increased awareness of voice locations aid listeners in learning to identify completely unfamiliar voices?
visualization | previous studies
• Spatially arranged photos of speakers showed no performance benefit, though users preferred them (Baldis, 2001)
• A graphic insert with voice names and locations showed no benefit to voice identification in an ATC task (MacDonald, 2002)
• HOWEVER: These studies used familiar collaborators, or were limited to only four voices
experiment | overview
Determine whether a visual representation of voice locations will aid listeners in learning to recognize voices that are completely unfamiliar
Dependent variables:
• Accuracy and response time for voice identification task
• Confidence in voice identification task performance
• Mental workload (NASA-TLX) (Hart & Staveland, 1988)
experiment | methodology (1 of 2)
• Modified Coordinate Response Measure (CRM) listening task (Bolia et al., 2000)
  “Ready [call sign], go to [color] [number] now”
• 4 male, 4 female voices
• Response to target with color, number, speaker’s name
• 27 participants, no voice training
• Provided performance feedback (with correct answer)
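As a rough illustration of the CRM sentence frame and the three-part response described on this slide, the sketch below builds one trial and scores a response. The call signs, colors, and number range follow the published CRM corpus (Bolia et al., 2000); whether the modified task used the same vocabulary is an assumption. The talker names and the helpers `make_trial` and `score_response` are hypothetical.

```python
import random

# Assumption: vocabulary of the published CRM corpus (Bolia et al., 2000).
# The modified task in this study may have used a different set.
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))

# Hypothetical talker names; the slides do not list the names used.
TALKERS = ["Talker A", "Talker B", "Talker C", "Talker D"]


def crm_phrase(call_sign, color, number):
    """Fill the CRM sentence frame shown on the slide."""
    return f"Ready {call_sign}, go to {color} {number} now"


def make_trial(rng, talkers=TALKERS):
    """Draw one stimulus: a talker plus a random call sign, color, and number."""
    trial = {
        "talker": rng.choice(talkers),
        "call_sign": rng.choice(CALL_SIGNS),
        "color": rng.choice(COLORS),
        "number": rng.choice(NUMBERS),
    }
    trial["phrase"] = crm_phrase(trial["call_sign"], trial["color"], trial["number"])
    return trial


def score_response(trial, color, number, talker):
    """Score each part of the response (color, number, speaker's name) separately."""
    return {
        "color": color == trial["color"],
        "number": number == trial["number"],
        "talker": talker == trial["talker"],
    }


if __name__ == "__main__":
    rng = random.Random(0)
    trial = make_trial(rng)
    print(trial["phrase"])
    print(score_response(trial, trial["color"], trial["number"], "Talker A"))
```

Scoring the three response parts separately keeps voice identification accuracy distinct from color and number accuracy.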
experiment | methodology (2 of 2)
• Two independent variables:
  • Presentation Format: Mono, Spatial, Spatial+Visual (varied between subjects)
  • Number of Voices: 4 Voices, 8 Voices (varied between subjects)
• 4 experimental blocks
• 40 stimuli per block (160 total)
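The design on this slide can be enumerated in a few lines. This is only a sketch: the condition labels and counts are taken from the slide, while `conditions` and `block_of_trial` are hypothetical helper names, and mapping trials to blocks in consecutive runs of 40 is an assumption.

```python
from itertools import product

FORMATS = ["Mono", "Spatial", "Spatial+Visual"]
VOICE_COUNTS = [4, 8]
BLOCKS = 4
STIMULI_PER_BLOCK = 40


def conditions():
    """All combinations of presentation format and number of voices."""
    return list(product(FORMATS, VOICE_COUNTS))


def block_of_trial(trial):
    """Map a 1-based trial number to its 1-based block (assumes consecutive blocks)."""
    total = BLOCKS * STIMULI_PER_BLOCK
    if not 1 <= trial <= total:
        raise ValueError(f"trial must be in 1..{total}")
    return (trial - 1) // STIMULI_PER_BLOCK + 1


if __name__ == "__main__":
    print(len(conditions()))            # formats x voice counts
    print(BLOCKS * STIMULI_PER_BLOCK)   # total stimuli per participant
    print(block_of_trial(120), block_of_trial(121))
```

Under this mapping, trials 1-120 cover blocks 1 to 3 and trials 121-160 make up block 4.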
experiment | stimuli
experiment 3 | results (1 of 3)
Correct Voice Identifications by Experimental Block
Experimental Block: F[3, 30] = 61.15, p < .001
Number of Voices: F[1, 30] = 68.21, p < .001
Format: F[2, 30] = 1.39, p = .27
Number × Format: F < 1
[Figure: two panels (4 Voices, 8 Voices); x-axis: Block 1 to Block 4; y-axis: Percent Correct Voice Identifications by Block (0.0 to 1.0); series: Mono, Spatial, Spatial+Visual]
experiment 3 | results (2 of 3)
Removed data for low-learning participants:
• Excluded subjects who showed no improvement in voice identification over the duration of the experiment
• 2 Mono participants removed
• 3 Spatial participants removed
• 3 Spatial+Visual participants removed
experiment 3 | results (3 of 3)
Correct Voice Identifications (low-learning subjects removed)
4 Voices, Format × Block: F < 1
8 Voices, Format × Block: F[2, 10] = 5.43, p = .025
[Figure: two panels (4 Voices, 8 Voices); x-axis: Mono, Spatial, Spatial+Visual; y-axis: % Correct Responses (0.0 to 1.0); series: Trials 1-120, Trials 121-160]
discussion
• Simple visual representation of voice locations improves the learning of completely unfamiliar voices in larger audio spaces (8 talkers)
• Visualizations continue to support identification as voices become increasingly familiar
• Spatial presentation of voice, coupled with low-cost visualization methods, may be particularly useful in supporting:
  • Large collaborative groups
  • Groups with limited familiarity
current work – visual scale
questions?
fin
references (1 of 2)
Abouchacra, K. (2001). Binaural Helmet: Improving speech recognition in noise with spatialized sound. Human Factors, 43(4), 584.
Baldis, J. (2001). Effects of spatial audio on memory, comprehension, and preference during desktop conferences. Proceedings of the SIGCHI conference on human factors in computing systems, Vol. 3, 166-173.
Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge: MIT Press.
Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communication research. J. Acoust. Soc. Am., 107(2), 1065-1066.
Bolia, R. (2001). Asymmetric performance in the cocktail party effect: implications for the design of spatial audio displays. Human Factors, 43(2), 208.
Drullman, R., & Bronkhorst, A. W. (2000). Multichannel speech intelligibility and talker recognition using monaural, binaural, and three-dimensional auditory presentation. J. Acoust. Soc. Am., 107(4), 2224-2235.
Ericson, M. A., & McKinley, R. L. (1997). The intelligibility of multiple talkers separated spatially in noise. In R. H. Gilkey & T. R. Anderson (Eds.), Binaural and Spatial Hearing in Real and Virtual Environments. NJ: Lawrence Erlbaum Associates, 701-724.
references (2 of 2)
Gilkey, R. H., & Anderson, T. R. (Eds.). (1997). Binaural and Spatial Hearing in Real and Virtual Environments. New Jersey: Lawrence Erlbaum Associates.
Hart, S. G., & Staveland, L. E. (1988). Development of the NASA-TLX (Task Load Index): results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. North Holland: Elsevier Science Publishers, 139-183.
Kilgore, R. M., Chignell, M., & Smith, P. W. (2003). Spatialized audioconferencing: what are the benefits? Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative Research, 111-120.
MacDonald, J. (2002). Intelligibility of speech in a virtual 3-D environment. Human Factors, 44(2), 272.
Singer, A., Hindus, D., Stifelman, L., & White, S. (1999). Tangible progress: less is more in Somewire audio spaces. Proceedings of the SIGCHI conference on human factors in computing systems, 104-111.
spatial audio | explanation
• Perception of relative differences between signals picked up by the left and right ears
• Allows people with binaural hearing to locate sound sources in three-dimensional space
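• Product of multiple interaural cues: interaural intensity differences (IID), interaural time differences (ITD), and head-related transfer functions (HRTFs)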
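To give a feel for the size of one of these cues, the sketch below estimates the interaural time difference with Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ), for a distant source at azimuth θ. This formula does not appear in the slides; it is a standard textbook approximation, and the head radius and speed of sound used here are assumed typical values. `woodworth_itd` is a hypothetical function name.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed typical adult head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at about 20 degrees C


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Interaural time difference in seconds for a far-field source.

    azimuth_deg is measured from straight ahead (0) toward one ear (90).
    The approximation is used here only for 0 <= azimuth_deg <= 90.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"{az:>2} deg: {woodworth_itd(az) * 1e6:.0f} microseconds")
```

With these assumed values the ITD grows from zero for a source straight ahead to roughly two thirds of a millisecond for a source directly to one side.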
experimental interface
Vocal Village interface
experiment 4 | visual stimuli (1 of 2)
experiment 4 | visual stimuli (2 of 2)
[Figure: dimension labels 25.5”, 10”, 2”]
experiment 4 | audio stimuli
Full-Scale (24°) Half-Scale (12°)
thanks
• UofT Interactive Media Lab and the Vocal Village development team
• My committee (Mark Chignell, Greg Jamieson, Ron Baecker)
• IBM Centre for Advanced Studies, Toronto
• Knowledge Media Design Institute