Audio Scene Analysis and Music Cognitive Elements of Music Listening
Kevin D. Donohue, Databeam Professor
Electrical and Computer Engineering, University of Kentucky
What is Music?
1 a : the science or art of ordering tones or sounds in succession, in combination, and in temporal relationships to produce a composition having unity and continuity b : vocal, instrumental, or mechanical sounds having rhythm, melody, or harmony
Merriam-Webster Online Dictionary:
http://www.m-w.com/dictionary/music
Auditory Scene: Input
Sensory organs (ears) separate acoustic energy into frequency bands and convert band energy into neural firings.
The auditory cortex receives the neural responses and abstracts an auditory scene.
http://hyperphysics.phy-astr.gsu.edu/hbase/sound/hearcon.html
[Figure: time-frequency plot; axes: Time (0 to 0.1), Frequency]
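The band-splitting step described above can be illustrated with a rough sketch. It measures the energy of a signal at a few band center frequencies, frame by frame, using the Goertzel recurrence. This is only a crude stand-in for the ear's frequency analysis, not a cochlear model; the function names, band centers, frame length, and sample rate are all assumptions chosen for illustration.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Squared DFT magnitude of `samples` at the bin nearest `freq` (Hz)."""
    n = len(samples)
    k = round(n * freq / sample_rate)          # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def band_energies(samples, sample_rate, centers, frame_len):
    """One row per non-overlapping frame, one column per band center."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rows.append([goertzel_power(frame, sample_rate, f) for f in centers])
    return rows

# Hypothetical input: 0.2 s of a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1600)]
energies = band_energies(tone, sample_rate, centers=[500, 1000, 2000], frame_len=400)
```

Each row of `energies` is a coarse "band energy" snapshot for one frame; for this input, nearly all the energy falls in the 1000 Hz column.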
Auditory Scene: Perception
Perception derives a useful representation of reality from sensory input.
Auditory Stream refers to a perceptual unit associated with a single happening (A.S. Bregman, 1990).
[Diagram: Acoustic to Neural Conversion -> Organize into Auditory Streams -> Representation of Reality]
Auditory Stream Experiment
Bregman & Campbell (1971)
Streams tend to form by grouping notes close in time and frequency (similarity and proximity).
Click on spectrograms to play tone sequence. Identify changes in tone grouping based on separation in time and frequency.
http://www.psych.mcgill.ca/labs/auditory/demo2.html
http://www.psych.mcgill.ca/labs/auditory/demo3.html
Note the change in grouping/phrasing from inserting a pair of closely spaced tones around the lower tone.
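The proximity principle above can be sketched as a toy grouping rule: a new tone joins the existing stream whose last tone is closest in frequency, provided it is also close enough in time; otherwise it starts a new stream. This is not the model or the stimulus set from the experiment. The thresholds (`max_semitones`, `max_gap`) and the tone frequencies are hypothetical values picked for illustration.

```python
import math

def semitones(f1, f2):
    """Absolute distance between two frequencies in semitones."""
    return abs(12.0 * math.log2(f2 / f1))

def group_into_streams(tones, max_semitones=5.0, max_gap=0.5):
    """Greedy grouping of (onset_seconds, frequency_hz) pairs, sorted by onset."""
    streams = []
    for onset, freq in tones:
        best = None  # (distance, stream) of the closest eligible stream
        for stream in streams:
            last_onset, last_freq = stream[-1]
            dist = semitones(last_freq, freq)
            if onset - last_onset <= max_gap and dist <= max_semitones:
                if best is None or dist < best[0]:
                    best = (dist, stream)
        if best is None:
            streams.append([(onset, freq)])   # nothing close enough: new stream
        else:
            best[1].append((onset, freq))
    return streams

# Hypothetical sequence alternating between a high and a low register.
sequence = [(0.0, 2000.0), (0.1, 400.0), (0.2, 2200.0),
            (0.3, 440.0), (0.4, 2000.0), (0.5, 400.0)]
streams = group_into_streams(sequence)
```

With these values the rapidly alternating sequence splits into a high stream and a low stream, mirroring the grouping by frequency proximity. The rule is deliberately simple and does not capture the tempo dependence of real stream formation.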
Circularity in Pitch Judgement
Shepard’s Scale (1964)
(Auditory Demonstrations CD, from the Acoustical Society of America)
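A Shepard-style tone can be sketched as a sum of octave-spaced sinusoids whose amplitudes follow a fixed bell-shaped envelope over log-frequency. The sketch below is a minimal illustration of that idea, not a reproduction of the 1964 stimuli: the base frequency, number of octaves, raised-cosine envelope, and sample rate are all assumptions.

```python
import math

def shepard_partials(step, steps_per_octave=12, base_hz=27.5, n_octaves=8):
    """(frequency, weight) pairs of the octave-spaced partials for one scale step."""
    partials = []
    for octave in range(n_octaves):
        # Position in octaves above base_hz; wraps after one full octave of steps.
        position = octave + (step % steps_per_octave) / steps_per_octave
        freq = base_hz * 2.0 ** position
        # Raised-cosine weight over log-frequency: zero at both edges of the range.
        weight = 0.5 - 0.5 * math.cos(2.0 * math.pi * position / n_octaves)
        partials.append((freq, weight))
    return partials

def shepard_tone(step, duration=0.1, sample_rate=16000):
    """Samples of one Shepard-style tone, normalized by the total weight."""
    partials = shepard_partials(step)
    total = sum(w for _, w in partials)
    n = int(duration * sample_rate)
    return [sum(w * math.sin(2.0 * math.pi * f * t / sample_rate)
                for f, w in partials) / total
            for t in range(n)]

scale = [shepard_tone(step) for step in range(13)]   # 12 steps up, then wrap
```

Because the envelope is fixed while the partials slide underneath it, step 12 uses exactly the same partials as step 0, which is the source of the circularity: the scale seems to keep rising yet returns to its starting point.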
Perceptual Organization
Organization properties:
Belongingness – a sensory element belongs to an organization (or stream) of which it is a part.
Exclusive allocation – a sensory element cannot belong to more than one organization at a time.
Bregman & Rudnicky (1975)
Click on spectrogram to listen to tone sequence. Note that in the first case the later tonal group sounds as one stream due to time proximity. In the second case, flanking the lower tones with a sequence at the same frequency separates the lower tone from the upper tones, creating 2 separate streams.
Perceptual Organization
Organization properties:
Closure – perceived continuity, a tendency to close strong perceptual forms, a response to missing evidence.
Click on time waveform plots to listen. In the first case a low-level tone is playing and then stops, but the gap is covered by a white noise mask. Most will hear the tone playing through the mask.
[Figure panels: Tone pattern first spectrogram; White noise only, used in masking]
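A stimulus of the kind described above can be sketched as a quiet tone with a gap in the middle, where the gap is either left silent or filled with louder white noise. The durations, amplitudes, tone frequency, and the function name `continuity_stimulus` are assumptions for illustration; they are not taken from the demonstration.

```python
import math
import random

def continuity_stimulus(fill_gap_with_noise, sample_rate=8000, tone_hz=440.0,
                        tone_amp=0.1, noise_amp=0.8, seed=0):
    """Tone (0.25 s), gap (0.25 s), tone (0.25 s); the gap is silence or noise."""
    rng = random.Random(seed)          # seeded so the example is repeatable
    segment = sample_rate // 4
    out = []
    for i in range(3 * segment):
        in_gap = segment <= i < 2 * segment
        if not in_gap:
            out.append(tone_amp * math.sin(2.0 * math.pi * tone_hz * i / sample_rate))
        elif fill_gap_with_noise:
            out.append(noise_amp * rng.uniform(-1.0, 1.0))   # uncorrelated samples
        else:
            out.append(0.0)
    return out

silent_gap = continuity_stimulus(fill_gap_with_noise=False)
masked_gap = continuity_stimulus(fill_gap_with_noise=True)
```

In both signals the tone is physically absent during the gap. Closure predicts that listeners tend to hear the tone continue only in the masked version, where the noise is loud enough that it could have hidden the tone.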
Sequential and Spectral Integration
Sequential Integration
Grouping sensory elements over time, or events at different times, and considering them as coming from the same source.
Examples: melody, rhythm
Spectral Integration
Fusing simultaneous sensory elements over frequency into one.
Examples: timbre, harmony
Timbre and Spectral Integration
The harmonic structure (spectral envelope) and the time envelope give rise to the timbre of the sound.
Click on spectra to hear sound. Note the impact of the spectral and time envelopes.
[Figures: three sounds, each shown as a spectrum (dB, -60 to 0, vs. Hertz, 0 to 12000) and a time waveform (Amplitude vs. Seconds, 0 to 1)]
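The two ingredients of timbre named above can be illustrated with a small additive-synthesis sketch: a list of harmonic amplitudes stands in for the spectral envelope, and an attack/decay curve stands in for the time envelope. The function `synth_tone`, the envelope shape (linear attack, exponential decay), and all parameter values are assumptions for illustration and are not the sounds used in the slides.

```python
import math

def synth_tone(f0, harmonic_amps, attack, decay_rate, duration=0.5, sample_rate=8000):
    """Sum of harmonics of f0 shaped by a simple attack/decay time envelope."""
    n = int(duration * sample_rate)
    out = []
    for i in range(n):
        t = i / sample_rate
        # Time envelope: linear rise during the attack, exponential decay after it.
        env = t / attack if t < attack else math.exp(-decay_rate * (t - attack))
        # Spectral envelope: harmonic k+1 of f0 gets amplitude harmonic_amps[k].
        sample = sum(a * math.sin(2.0 * math.pi * f0 * (k + 1) * t)
                     for k, a in enumerate(harmonic_amps))
        out.append(env * sample)
    return out

# Same pitch (220 Hz), two hypothetical timbres.
mellow = synth_tone(220.0, [1.0, 0.3, 0.1], attack=0.1, decay_rate=2.0)
bright = synth_tone(220.0, [1.0, 0.8, 0.7, 0.6, 0.5], attack=0.005, decay_rate=8.0)
```

Both tones share the same fundamental, so they have the same pitch; only the harmonic weights and the envelope differ, which is what changes the perceived timbre.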
Timbre and Spectral Integration
Simultaneous tones grouped by timbre
Click on spectrograms to play sounds. Note that different spectral bands do not sound like different streams. Just one stream is heard.
[Figure: two spectrograms (Hertz, 0 to 5000, vs. Seconds, 0.1 to 0.4); panel titles: Same Note (A); 2 Notes (F and A)]
Auditory Scene Organization
Primitive Stream Segregation
Inherent constraints in auditory scene analysis (perceptual organization demonstrated by infants/children)
Music: Organization of musical sensory units
Schema-based Segregation
Learned constraints in auditory scene analysis (differences in perceptual organization resulting from training and culture)
Music: Differences between musicians and non-musicians
Music: Differences resulting from acculturation
(A.S. Bregman, Auditory Scene Analysis, MIT Press 1990, pp. 1-45)
Music Related Terms
Pitch – Perceived frequency/fundamental tone (20 Hz to 20 kHz range)
Melody – Pattern of tones identified by the intervals between consecutive pitches
Contour – Shape of the melody without regard to intervals
Loudness – Perceived intensity of sound (0 dB to 120 dB)
Timbre – Nature of a sound defined mostly by its harmonic structure and time envelope
Rhythm – Repeated pattern of strong and weak sounds
Tempo – Rate of the rhythm
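The difference between melody (intervals) and contour (shape only) can be made concrete with a short sketch. Representing pitches as MIDI note numbers is an assumption made here for convenience, and the example tunes are hypothetical.

```python
def intervals(midi_notes):
    """Signed semitone steps between consecutive pitches."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

def contour(midi_notes):
    """Direction of each step only: +1 up, -1 down, 0 repeated note."""
    return [(d > 0) - (d < 0) for d in intervals(midi_notes)]

tune = [60, 60, 67, 67, 69, 69, 67]           # C C G G A A G
transposed = [n + 5 for n in tune]            # same melody, shifted up in pitch
same_shape = [60, 60, 64, 64, 65, 65, 62]     # same contour, different intervals

print(intervals(tune) == intervals(transposed))   # True: intervals survive transposition
print(contour(tune) == contour(same_shape))       # True: the shape is shared
print(intervals(tune) == intervals(same_shape))   # False: the melodies differ
```

Under these definitions a transposed tune keeps its interval pattern, while two tunes can share a contour and still be different melodies.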
Melody Invariance
A melody can typically be recognized over changes in pitch, loudness, timbre, tempo, spatial location, and reverberations.
Contours are typically recalled better than actual melodies (intervals) for unfamiliar tunes (Massaro, Kallman, and Kelly, 1980).
(Daniel J. Levitin, Memory for Musical Attributes, in Music Cognition and Computerized Sound, ed. P.R. Cook, MIT Press, 1999, pp. 209-227)
Primitive Musical Perception
Distinguish between cognitive components present at an early age and those resulting from acculturation.
Infant: Grasp of musical structures
Adult: Develop cognitive strategies for applying musical structures
(W. Jay Dowling, The Development of Music Perception and Cognition, in The Psychology of Music, Academic Press, 1999, pp. 603-625)
Summary
Innate perceptual organization separates sounds from different sources. Grouping by pitch, contour, rhythm (phrasing), and timbre is exhibited by infants.
Acculturation refines melody distinctions and their relationship to harmonies and rhythms based on cultural scales and patterns.
Melodic memory is enhanced for melodies following the notes of a known scale.
Auditory scene analysis operations apply broadly to all sounds (speech, noise, music). Why some auditory streams become pleasurable/stimulating/interesting (music), while others are simply used to form a perception of reality, is still not clear.
How many streams are there?
[Figure: Tell Me Ma - Spectrogram in dB; axes: Hertz (1000 to 8000) vs. Seconds (0 to 15)]
Interesting Websites
Mind, Music, and Machine
http://www.nici.kun.nl/mmm/
Auditory Scene Analysis
http://www.psych.mcgill.ca/labs/auditory/introASA.html
Joe Wolfe’s Web Page
http://www.phys.unsw.edu.au/~jw/Joe.html