communication acoustics karjalainen


Post on 10-Apr-2015


DESCRIPTION

Communication Acoustics, Prof. Matti Karjalainen, http://www.acoustics.hut.fi/teaching/S-89.3320/

TRANSCRIPT

Page 1: Communication Acoustics Karjalainen

M. Karjalainen1

• General introduction

• Communication by sound and voice

– Examples of communication situations

• Systems approach to communication

• Modeling and theory formation in research

Chapter 1: Introduction

Page 2: Communication Acoustics Karjalainen

M. Karjalainen2

Information Transmission by Sound

Environmental orientation by sound

Page 3: Communication Acoustics Karjalainen

M. Karjalainen3

Communication by Speech

Speech communication via acoustic medium

Page 4: Communication Acoustics Karjalainen

M. Karjalainen4

Communication by Music

Music via acoustic medium

Page 5: Communication Acoustics Karjalainen

M. Karjalainen5

Communication by Music

Origins of speech and music?

Speech has been important in evolution, but what about music?

Role of music: just a side product or an important factor?

- Charles Darwin: Important for mating etc.

Two interesting recent books:

Steven Mithen: “The Singing Neanderthals: The Origins of Music, Language, Mind, and Body”, Harvard University Press, 2006

Daniel J. Levitin: “This Is Your Brain on Music: The Science of a Human Obsession”, Plume, 2006

Page 6: Communication Acoustics Karjalainen

M. Karjalainen6

Speech Transmission

Speech communication via electronic medium

Page 7: Communication Acoustics Karjalainen

M. Karjalainen7

Virtual Acoustic Reality

Virtual instrument in virtual space

Page 8: Communication Acoustics Karjalainen

M. Karjalainen8

Man-Machine Communication by Speech

Speech synthesis and recognition

Page 9: Communication Acoustics Karjalainen

M. Karjalainen9

A Black-Box Approach

Input-output relationship

Page 10: Communication Acoustics Karjalainen

M. Karjalainen10

A Systems Approach

A multi-level system

Page 11: Communication Acoustics Karjalainen

M. Karjalainen11

Systemic Concepts

• Element (part of a whole, entity)

• Relation / property

• Structure (relatively permanent properties of a system)

• Function(ality) (relatively variant properties of a system)

• Event (a relatively discrete change, typically in time)

• State

• Object

• Type (class)

• System

• Control

• Process

• Organization

• Hierarchy / heterarchy

• Data / information / knowledge (communication, language)

Page 12: Communication Acoustics Karjalainen

M. Karjalainen12

Abstraction in Modeling and Theory Formation

Abstraction hierarchy

Page 13: Communication Acoustics Karjalainen

M. Karjalainen13

Communication by Sound and Voice

hardware

software

functionware

contentware

Physics

Cognition

Signals

Information

Analysis Synthesis

Page 14: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 2: Acoustics

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.

Page 15: Communication Acoustics Karjalainen

M. Karjalainen2

Chapter 2: Acoustics

Sound as physical phenomenon

When a tree in a forest falls, and there is no one to listen, does it make a sound?

• Vibration – generation of sound

• Sound radiation

• Sound propagation

• Reflection, absorption,

• Diffraction, refraction

• Standing waves

• Resonance, resonators

Page 16: Communication Acoustics Karjalainen

M. Karjalainen3

Vibrating systems

• Simple vibration: mass–spring system

Page 17: Communication Acoustics Karjalainen

M. Karjalainen4

Vibrating systems

Undamped and damped oscillation

Page 18: Communication Acoustics Karjalainen

M. Karjalainen5

Resonance

Mass-spring resonator

Helmholtz resonator

Page 19: Communication Acoustics Karjalainen

M. Karjalainen6

Two-mass vibrating system

Transversal and longitudinal vibration of a two-mass system

Page 20: Communication Acoustics Karjalainen

M. Karjalainen7

Vibration modes of a string

Page 21: Communication Acoustics Karjalainen

M. Karjalainen8

Wave propagation

Wave equation:

D’Alembert:

Page 22: Communication Acoustics Karjalainen

M. Karjalainen9

Sound pressure, sound pressure level, decibel

Sound pressure: p [Pa]

Sound pressure level:

Reference:
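The formulas on this slide did not survive extraction. As a hedged illustration, the sketch below uses the standard definition L_p = 20 log10(p / p0) with the conventional reference pressure p0 = 20 µPa in air. The function names `spl_db` and `pressure_from_spl` are illustrative and not taken from the course material.

```python
import math

# Conventional reference sound pressure in air (20 micropascals); assumed here.
P_REF = 20e-6


def spl_db(p_rms: float) -> float:
    """Sound pressure level in dB re 20 uPa for an RMS pressure given in Pa."""
    return 20.0 * math.log10(p_rms / P_REF)


def pressure_from_spl(level_db: float) -> float:
    """Inverse mapping: RMS pressure in Pa for a given level in dB."""
    return P_REF * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for p in (20e-6, 0.02, 1.0):
        print(f"{p:g} Pa -> {spl_db(p):.1f} dB SPL")
```

Because the scale is logarithmic, a tenfold increase in pressure always adds 20 dB, regardless of the starting level.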

Page 23: Communication Acoustics Karjalainen

M. Karjalainen10

Wave phenomena: spherical wave

Sound velocity in the air:

Spherical wave:

Page 24: Communication Acoustics Karjalainen

M. Karjalainen11

Wave phenomena: planar wave

Planar wave in a tube:

Reflection (and transmission):

Page 25: Communication Acoustics Karjalainen

M. Karjalainen12

Lowest resonance modes in a tube

Open ends One end closed

Page 26: Communication Acoustics Karjalainen

M. Karjalainen13

Spectral content of string vibration

Page 27: Communication Acoustics Karjalainen

M. Karjalainen14

Bar and membrane modes

Bar

Membrane

Page 28: Communication Acoustics Karjalainen

M. Karjalainen15

Reflection and refraction (bending)

Page 29: Communication Acoustics Karjalainen

M. Karjalainen16

Diffraction

Page 30: Communication Acoustics Karjalainen

M. Karjalainen17

Sound propagation paths in a room

Page 31: Communication Acoustics Karjalainen

M. Karjalainen18

Sound field decay in a room

Tapiola-sali

Page 32: Communication Acoustics Karjalainen

M. Karjalainen19

Sound field in a room, Computer simulation

Page 33: Communication Acoustics Karjalainen

M. Karjalainen20

Sound field level in a reverberant room

Page 34: Communication Acoustics Karjalainen

M. Karjalainen21

Modal behavior in a room

L_i = dimensions of a rectangular room

n_i = integer indices 0, 1, 2, ...

measured magnitude response in a room
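The mode formula itself is missing from the transcript. A minimal sketch, assuming the usual expression for a rigid-walled rectangular room, f = (c/2) sqrt((n_x/L_x)^2 + (n_y/L_y)^2 + (n_z/L_z)^2), and a sound speed of 343 m/s. The function names and the example room dimensions are hypothetical.

```python
import math
from itertools import product


def room_mode_frequency(n, dims, c=343.0):
    """Mode frequency in Hz for integer indices n = (nx, ny, nz) and room dimensions in m."""
    return (c / 2.0) * math.sqrt(sum((ni / li) ** 2 for ni, li in zip(n, dims)))


def lowest_modes(dims, count=5, max_index=4, c=343.0):
    """Return the `count` lowest (frequency, indices) pairs, skipping the trivial (0, 0, 0)."""
    modes = []
    for n in product(range(max_index + 1), repeat=3):
        if n == (0, 0, 0):
            continue
        modes.append((room_mode_frequency(n, dims, c), n))
    modes.sort()
    return modes[:count]


if __name__ == "__main__":
    for freq, n in lowest_modes((5.0, 4.0, 3.0)):
        print(n, f"{freq:.1f} Hz")
```

At low frequencies the modes are sparse, which is why a measured magnitude response in a small room shows distinct peaks there.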

Page 35: Communication Acoustics Karjalainen

M. Karjalainen22

Sound propagation by image source model

Solid line = real path; dotted line = virtual path

Page 36: Communication Acoustics Karjalainen

M. Karjalainen23

Electroacoustics: Loudspeaker

principle driver structure enclosure

Dynamic loudspeaker

Page 37: Communication Acoustics Karjalainen

M. Karjalainen24

Electroacoustics: Microphone

principle construction

Condenser microphone

Page 38: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 3: Sound and Voice as Signals

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.

Page 39: Communication Acoustics Karjalainen

M. Karjalainen2

Sound and Voice as Signals

In signal representations a physical or abstract variable is typically represented as a function of time, such as:

• Signal as a mathematical function:

– Pure tone:

– Random signal:

• Discrete-time numeric sequence

Continues ...

Page 40: Communication Acoustics Karjalainen

M. Karjalainen3

Sound and Voice as Signals

Continues ...

• Graphical presentations:

sine wave, random noise

speech waveform, sample sequence

unit impulse, unit pulse

Page 41: Communication Acoustics Karjalainen

M. Karjalainen4

Linear and time-invariant (LTI) systems

Properties of LTI systems:

• Any (stable) LTI system can be fully represented by its impulse response

• Output cannot include any frequencies that are not in the input (no nonlinear distortion)

• Any bandlimited LTI system can be approximated by digital filters with arbitrary accuracy (theoretically)

Page 42: Communication Acoustics Karjalainen

M. Karjalainen5

Convolution

Signal processing algorithms

Fourier analysis

Page 43: Communication Acoustics Karjalainen

M. Karjalainen6

Signal processing algorithms

Fourier synthesis

Convolution vs. Fourier transform
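The relation between convolution and the Fourier transform can be checked numerically. The sketch below is only an illustration with a naive O(N^2) DFT from the standard library: it convolves two short sequences directly and then again by multiplying their zero-padded DFTs. All function names are illustrative.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (fine for short sequences)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def convolve(x, h):
    """Direct linear convolution y[n] = sum_k x[k] h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def convolve_via_dft(x, h):
    """Same result through the frequency domain: multiply the spectra, then invert."""
    n = len(x) + len(h) - 1  # zero-pad so that circular convolution equals linear convolution
    xp = list(x) + [0.0] * (n - len(x))
    hp = list(h) + [0.0] * (n - len(h))
    Y = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [v.real for v in idft(Y)]


if __name__ == "__main__":
    print(convolve([1.0, 2.0, 3.0], [1.0, -1.0]))
    print([round(v, 9) for v in convolve_via_dft([1.0, 2.0, 3.0], [1.0, -1.0])])
```

In practice an FFT replaces the naive DFT, which is what makes frequency-domain filtering attractive for long impulse responses.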

Page 44: Communication Acoustics Karjalainen

M. Karjalainen7

Decomposition of sawtooth waveform
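The figure for this slide is not in the transcript. As a sketch of what such a decomposition looks like, the code below sums the first harmonics of the standard Fourier series of a sawtooth rising from -1 to 1 over each period, x(t) = (2/pi) sum_k (-1)^(k+1) sin(2 pi k f0 t) / k. The function names and the chosen f0 are illustrative assumptions.

```python
import math


def sawtooth_partial_sum(t, f0, n_harmonics):
    """Fourier synthesis of a sawtooth from its first n_harmonics sine components."""
    return (2.0 / math.pi) * sum(
        (-1) ** (k + 1) * math.sin(2.0 * math.pi * k * f0 * t) / k
        for k in range(1, n_harmonics + 1)
    )


def ideal_sawtooth(t, f0):
    """Ideal sawtooth in [-1, 1) that crosses zero at t = 0."""
    phase = (t * f0 + 0.5) % 1.0
    return 2.0 * phase - 1.0


if __name__ == "__main__":
    f0 = 100.0
    t = 0.001  # one tenth of a period
    for n in (1, 5, 50, 200):
        print(n, round(sawtooth_partial_sum(t, f0, n), 4), "ideal:", round(ideal_sawtooth(t, f0), 4))
```

The harmonic amplitudes fall off as 1/k, so the approximation improves slowly and overshoots near the discontinuity (Gibbs phenomenon).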

Page 45: Communication Acoustics Karjalainen

M. Karjalainen8

Spectrum analysis

Magnitude spectrum

Phase spectrum

Group delay

Phase delay

Page 46: Communication Acoustics Karjalainen

M. Karjalainen9

Fourier analysis with windowing

• Rectangular window

• Hamming window

• Hann(ing) window

• Kaiser window

• Blackman (Blackman-Harris) window

Page 47: Communication Acoustics Karjalainen

M. Karjalainen10

Spectrum analysis using Fourier analysis with windowing

Sine wave

Sine wave windowed synchronously

Sine wave windowed non-synchronously

Sine wave, Hamming-windowed
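The four cases listed on this slide can be reproduced with a short experiment. This is a sketch under stated assumptions: a 64-point naive DFT, a sine with 8 full cycles in the frame (synchronous) or 8.5 cycles (non-synchronous), and the common symmetric Hamming definition 0.54 - 0.46 cos(2 pi k / (N - 1)). The helper `leakage` and its bin range are hypothetical choices for illustration.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitudes of DFT bins 0..N/2 (naive DFT, adequate for short frames)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)))
            for m in range(n // 2 + 1)]


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def sine(cycles, n):
    """Sine wave with the given number of cycles inside an n-sample frame."""
    return [math.sin(2.0 * math.pi * cycles * k / n) for k in range(n)]


def leakage(mag, lo=18, hi=30):
    """Summed magnitude in bins far away from the sine (bins lo..hi)."""
    return sum(mag[lo:hi + 1])


N = 64
sync = dft_magnitude(sine(8.0, N))       # rectangular window, whole number of cycles
nonsync = dft_magnitude(sine(8.5, N))    # rectangular window, frame cuts the sine mid-cycle
w = hamming(N)
nonsync_hamming = dft_magnitude([wk * xk for wk, xk in zip(w, sine(8.5, N))])

if __name__ == "__main__":
    print("synchronous leakage:      ", leakage(sync))
    print("non-synchronous leakage:  ", leakage(nonsync))
    print("Hamming-windowed leakage: ", leakage(nonsync_hamming))
```

The tapered window trades a wider main lobe for much lower side lobes, which is why leakage far from the sine frequency drops.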

Page 48: Communication Acoustics Karjalainen

M. Karjalainen11

Vowel spectra

Page 49: Communication Acoustics Karjalainen

M. Karjalainen12

Time-frequency representations: Spectrogram

Word: /kaksi/

Page 50: Communication Acoustics Karjalainen

M. Karjalainen13

Auto- and cross-correlation

Cross-correlation

Autocorrelation

Page 51: Communication Acoustics Karjalainen

M. Karjalainen14

Cepstrum

• Compute Fourier transform

• Logarithm of (power) spectrum

• Inverse Fourier transform
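The three steps above can be sketched directly. The example assumes the power-spectrum variant (logarithm of |X|^2) and a naive DFT. The small floor added before the logarithm and the echo test signal are illustrative choices, not part of the slides.

```python
import cmath
import math


def real_cepstrum(x):
    """Cepstrum as inverse DFT of the log power spectrum of x."""
    n = len(x)
    # Step 1: Fourier transform
    X = [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
         for m in range(n)]
    # Step 2: logarithm of the power spectrum (tiny floor avoids log(0))
    log_power = [math.log(abs(v) ** 2 + 1e-12) for v in X]
    # Step 3: inverse Fourier transform (the result is real for real input)
    return [sum(log_power[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)).real / n
            for k in range(n)]


if __name__ == "__main__":
    # Impulse plus an echo delayed by 10 samples: the echo shows up as a peak at quefrency 10.
    x = [0.0] * 64
    x[0] = 1.0
    x[10] = 0.5
    c = real_cepstrum(x)
    peak = max(range(1, 33), key=lambda k: c[k])
    print("cepstral peak at quefrency", peak)
```

The same mechanism separates the pitch period of voiced speech (a peak at high quefrency) from the smooth vocal tract envelope (low quefrencies).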

Page 52: Communication Acoustics Karjalainen

M. Karjalainen15

Digital signal processing: DSP systems

• Analog-to-digital (A/D) converter

• Digital signal processor (+ software)

• Digital-to-analog (D/A) converter

Page 53: Communication Acoustics Karjalainen

M. Karjalainen16

Signal quantization: A/D conversion

• Linear quantization (PCM-coding)

• Discrete levels: 2^n (n = number of bits)

• 16–24 bits/sample in audio (about 96 dB SNR or more)

• Sample rate: 44100 or 48000 samples/sec
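The 96 dB figure follows from roughly 6 dB per bit. The sketch below is an illustration with assumed helper names: it computes the level count and the 20 log10(2^n) dynamic range, and measures the SNR of a uniformly quantized full-scale sine. The well-known approximation for that case is 6.02 n + 1.76 dB.

```python
import math


def quantization_levels(bits):
    """Number of discrete levels of an n-bit linear quantizer."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)


def quantize(x, bits):
    """Uniform rounding quantizer for signals in [-1, 1]; clipping is not modeled."""
    step = 2.0 / (2 ** bits)
    return step * round(x / step)


def measured_sine_snr_db(bits, n_samples=20000):
    """SNR of a quantized full-scale sine, estimated from the quantization error power."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(0.1234567 * n)  # arbitrary frequency that avoids short repeating patterns
        e = quantize(x, bits) - x
        signal_power += x * x
        noise_power += e * e
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    print(quantization_levels(16), "levels,", round(dynamic_range_db(16), 2), "dB")
    print("measured 8-bit sine SNR:", round(measured_sine_snr_db(8), 2), "dB")
```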

Page 54: Communication Acoustics Karjalainen

M. Karjalainen17

Z-transform

Linear transform of sequence x(n) :

Unit delay as building element:

Digital filtering can be expressed as a rational function (or polynomial) of z^-1

Page 55: Communication Acoustics Karjalainen

M. Karjalainen18

Digital filtering: FIR filters

FIR = finite impulse response filter

Page 56: Communication Acoustics Karjalainen

M. Karjalainen19

Digital filtering: IIR filters

IIR = infinite impulse response filter
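The block diagrams for the FIR and IIR slides are not in the transcript. As a hedged sketch, both filter types can be written as one difference equation, y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], corresponding to H(z) = B(z)/A(z) in powers of z^-1. The function name `lti_filter` and the example coefficients are illustrative.

```python
def lti_filter(b, a, x):
    """Direct-form difference equation; a = [1.0] gives an FIR filter, longer a gives IIR."""
    y = []
    for n in range(len(x)):
        # Feedforward part: weighted sum of current and past inputs
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: weighted sum of past outputs (absent for FIR)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # Two-tap averaging FIR: the impulse response ends after two samples
    print(lti_filter([0.5, 0.5], [1.0], impulse))
    # One-pole IIR: the impulse response decays geometrically and never reaches exactly zero
    print(lti_filter([1.0], [1.0, -0.5], impulse))
```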

Page 57: Communication Acoustics Karjalainen

M. Karjalainen20

Linear prediction (AR-modeling)

Modeling of signal generation with flat spectrum excitation (impulse or noise) and IIR (all-pole) filter. Speech example:

Signal

Windowed

FFT-spectrum

LP-spectra
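A minimal sketch of linear prediction, assuming the autocorrelation method with the Levinson-Durbin recursion (the slides do not say which estimation method was used) and omitting the windowing step for brevity. The predictor convention is x_hat[n] = sum_k a[k] x[n-k], and all function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns (predictor coefficients a[1..order], residual energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # prediction error shrinks at every stage
    return a[1:], err


def lpc(x, order):
    """Linear prediction coefficients of x for the given model order."""
    return levinson_durbin(autocorrelation(x, order), order)


if __name__ == "__main__":
    # Impulse response of the all-pole filter 1 / (1 - 0.9 z^-1): the model should recover 0.9.
    x = [0.9 ** n for n in range(200)]
    coeffs, err = lpc(x, 2)
    print([round(c, 6) for c in coeffs], round(err, 6))
```

For speech, the all-pole filter found this way describes the spectral envelope (formants), and the residual plays the role of the flat-spectrum excitation.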

Page 58: Communication Acoustics Karjalainen

M. Karjalainen21

Neural networks

MLF = multilayer feedforward network = multilayer perceptron

Input layer + hidden and output layer nodes with sigmoidal nonlinearity

Backpropagation algorithm for training

Page 59: Communication Acoustics Karjalainen

M. Karjalainen22

Hidden Markov models (HMM)

For probabilistic modeling of state sequences

Used especially in speech recognition

Page 60: Communication Acoustics Karjalainen

M. Karjalainen23

Audio reproduction: loudspeaker response

Magnitude response of a non-ideal loudspeaker

Page 61: Communication Acoustics Karjalainen

M. Karjalainen24

Group delay response of a loudspeaker

Page 62: Communication Acoustics Karjalainen

M. Karjalainen25

Reproduction quality: Distortion and SNR

Nonlinearity results in distortion: Sine wave input results in generation of harmonic components A(i)

Distortion (usually given in %):

Signal-to-noise ratio (SNR):

Distortion in general is discussed in later chapters
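The distortion and SNR formulas are missing from the transcript. The sketch below assumes one common convention, harmonic distortion as sqrt(A(2)^2 + A(3)^2 + ...) / A(1) in percent, and SNR as 20 log10 of the RMS ratio; the slide may have used a different normalization. The quadratic nonlinearity and all names are illustrative.

```python
import cmath
import math


def harmonic_amplitudes(period_samples, n_harmonics):
    """Amplitudes A(1)..A(n) of the harmonics of one exact period of a signal."""
    n = len(period_samples)
    amps = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(period_samples[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m in range(n))
        amps.append(2.0 * abs(coeff) / n)
    return amps


def thd_percent(amps):
    """Harmonic distortion relative to the fundamental A(1), in percent (assumed convention)."""
    return 100.0 * math.sqrt(sum(a * a for a in amps[1:])) / amps[0]


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from RMS values."""
    return 20.0 * math.log10(signal_rms / noise_rms)


N = 256
sine = [math.cos(2.0 * math.pi * m / N) for m in range(N)]
# Mild quadratic nonlinearity: adds a DC offset and a second harmonic of amplitude 0.05
distorted = [s + 0.1 * s * s for s in sine]
amps = harmonic_amplitudes(distorted, 5)

if __name__ == "__main__":
    print("harmonic amplitudes:", [round(a, 4) for a in amps])
    print("distortion:", round(thd_percent(amps), 3), "%")
```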

Page 63: Communication Acoustics Karjalainen

M. Karjalainen26

Response equalization

Non-flat magnitude response can be equalized (flattened) by digital filtering.

Example by so-called frequency-warped filters

Page 64: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 4: Speech and Music

• Speech communication

• Speech production:

– Speech production mechanism

– Vocal cords – phonation

– Vocal and nasal tract – articulation

– Units and notation of speech: vowels, consonants

– Prosody of speech

– Modeling of speech production

• Singing voice

• Speech processing: analysis, synthesis, coding, recognition

• Musical instruments as sound sources

• Music signal processing

– Sound synthesis techniques

– Physical modeling

– Digital audio vs. music

Page 65: Communication Acoustics Karjalainen

M. Karjalainen2

Speech communication chain

Page 66: Communication Acoustics Karjalainen

M. Karjalainen3

Speech production mechanism

Page 67: Communication Acoustics Karjalainen

M. Karjalainen4

Phonation and articulation

• Vocal cords (vocal folds): phonation

– Generation and controlling of voiced sound at glottis

• Vocal tract and nasal tract: articulation

– Controlling of voice features by articulation organs

• Concepts:

– Glottis (vocal cord opening)

– Voiced / unvoiced / combined

– Constriction

– Formant (and antiformant)

– Vowel / consonant

– Prosodic features

Page 68: Communication Acoustics Karjalainen

M. Karjalainen5

Units and notation of speech – Phonetics

• Phonetics: study and description of spoken language

• Languages and language families

– Indo-European, Finno-Ugric, …

• Phonetic alphabet:

– IPA (International Phonetic Alphabet)

– Computerized: SAMPA, Worldbet, ...

• Units of spoken language:

– Phoneme (smallest linguistic unit), abstract unit class

– Allophone (variant of a phoneme)

– Phone (äänne in Finnish), a concrete unit of speech

– Diphone (from the middle of a phone via the transition to the middle of the next one)

– Triphone (similar combination of three successive phones)

– Speech segment (typically subunit of a phone)

Page 69: Communication Acoustics Karjalainen

M. Karjalainen6

Vowels (Finnish)

• Front–back (etisyys: etu–taka)

• Open–closed (suppeus: väljä–suppea)

• Rounded–unrounded (pyöreä–lavea)

Page 70: Communication Acoustics Karjalainen

M. Karjalainen7

Consonants (Finnish)

• Articulation place (ääntämispaikka):

– Labial, dental, palatal, velar, laryngeal

• Articulation manner (ääntämistapa)

– Stop consonant (klusiili), fricative (frikatiivi), nasal (nasaali), tremulant (tremulantti), lateral (lateraali), semivowel (puolivokaali)

Page 71: Communication Acoustics Karjalainen

M. Karjalainen8

Prosody (suprasegmental features)

• Intonation (intonaatio)

– Primarily by fundamental frequency trajectory

• Stress (paino)

– Primarily by intensity (loudness) of pronunciation

• Timing (ajoitus)

– Rhythmic pattern (primarily by segment durations)

Page 72: Communication Acoustics Karjalainen

M. Karjalainen9

Modeling of speech production

• Simplification of the speech production mechanism

– Acoustic model

Page 73: Communication Acoustics Karjalainen

M. Karjalainen10

Circuit model (transmission-line model)

• Glottal oscillator

– Varying cross-section between vocal cords

• Vocal tract as a transmission line

– Two-directional wave propagation

• Lip radiation (acoustic load)

• Variables: pressure and volume velocity

Page 74: Communication Acoustics Karjalainen

M. Karjalainen11

Signal model = Source-Filter model

• Source = excitation

– (a) voiced = quasiperiodic excitation

– (b) unvoiced = noise-like excitation

• Filter = vocal and nasal tract

Page 75: Communication Acoustics Karjalainen

M. Karjalainen12

Glottal oscillation

• Phonation = vibration of vocal folds

– Glottal opening is a function of time:

• Open phase, closed phase

• Glottal closure event generates the main excitation to the vocal tract

Page 76: Communication Acoustics Karjalainen

M. Karjalainen13

Formants (tract resonances)

• Example: resonances of a homogeneous tube

– Volume velocity transfer function

– 17 cm tube corresponds to typical male vocal tract

– quarter-wavelength resonator with resonances at
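The resonance formula that followed on the slide is missing. As a sketch, a uniform tube closed at one end (glottis) and open at the other (lips) resonates at odd multiples of c / (4 L). The sound speed of 340 m/s and the function name are assumptions made for this illustration.

```python
def quarter_wave_resonances(length_m, count=3, c=340.0):
    """Resonances f_k = (2k - 1) c / (4 L) of a tube closed at one end, in Hz."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, count + 1)]


if __name__ == "__main__":
    # A 17 cm tube gives resonances near 500, 1500 and 2500 Hz.
    print([round(f) for f in quarter_wave_resonances(0.17)])
```

These values are close to the formants of a neutral vowel, which is why the uniform tube is the usual starting point before area profiles for specific vowels are considered.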

Page 77: Communication Acoustics Karjalainen

M. Karjalainen14

Vocal tract transfer functions: vowel /i/

• Inhomogeneous vocal tract area profile /i/

– Constriction in frontal tract

– Cavity in the rear part of tract

– First formant down from neutral position

– Second formant up from neutral position

Page 78: Communication Acoustics Karjalainen

M. Karjalainen15

Radiation directivity of speech

• Omnidirectional at low frequencies

• Increased frontal directivity at high frequencies

Azimuth Elevation

Page 79: Communication Acoustics Karjalainen

M. Karjalainen16

Singing voice

• Classical singing style

– ‘Singer’s formant’ around 3 kHz makes voice more audible

– In soprano singing the high fundamental frequency or a harmonic component should match a formant

• Singing in popular music

– Style and way of voice production is free since amplification makes it loud anyway

– Personality of voice is important

Page 80: Communication Acoustics Karjalainen

M. Karjalainen17

Speech processing

• Speech analysis

– Feature analysis of speech signals

• Speech synthesis

– Typically synthesis from text

• Speech recognition

– From speech to text or commands

• Speech coding

– Compression for transmission or storage

• Speech enhancement

– Improving degraded speech signals

Page 81: Communication Acoustics Karjalainen

M. Karjalainen18

Formant synthesis models

• Cascaded and parallel filter models

Page 82: Communication Acoustics Karjalainen

M. Karjalainen19

Synthesis by waveform concatenation

• Overlap-add reconstruction of voiced speech

– Fundamental frequency (pitch) can be changed

Page 83: Communication Acoustics Karjalainen

M. Karjalainen20

Text-to-speech synthesis

• Transforming text to speech signal

– Language-dependent text processing

– Speech signal production quite language-independent

Page 84: Communication Acoustics Karjalainen

M. Karjalainen21

Text-to-speech synthesis

Page 85: Communication Acoustics Karjalainen

M. Karjalainen22

Speech coding

• Speech signal analysis

– Typically model-based (linear prediction) where source and filter parameters are analyzed from speech signal

• Quantization of the parameters (bit compression)

• Transmission or storage of parametrized speech

• Reconstruction of parameters

• Reconstruction of speech signal

• Encoding -> transmission -> decoding

Page 86: Communication Acoustics Karjalainen

M. Karjalainen23

Speech recognition

• Feature analysis of signal

– Typically mel cepstral coefficients

– Compression of data & redundancy removal

• Pattern recognition

– Comparison to speech units

– Typically by Hidden Markov Models (HMM)

• Possible postprocessing

– Language modeling

• Formal grammar

• Unlimited text is difficult

Page 87: Communication Acoustics Karjalainen

M. Karjalainen24

Musical instrument sounds

• String instruments

– Plucked string instruments

– Struck string instruments

– Bowed string instruments

• Wind instruments

– Brass instruments

– Woodwind instruments

• Percussion instruments

– Drums etc.

Page 88: Communication Acoustics Karjalainen

M. Karjalainen25

Modeling of musical instruments (string modeling)

• String model

– Two-directional waveguide (transmission line)

– Excitation (pluck) inserted to both delay lines

– Wave reflections at terminations modeled as filters

– Output is taken at bridge or pickup, sum of both lines

– The same model is applicable to wind instrument bores (but there is a nonlinear oscillating feedback in them)

Page 89: Communication Acoustics Karjalainen

M. Karjalainen26

Simplified string modeling

• String model reduction (signal model)

– Two delay lines can be combined to one

– Filters in the loop can be combined to a single loop filter

– Computation is more efficient

– So-called Karplus-Strong model is a simplified case where an initial random noise is inserted in the delay line before synthesis and the loop filter is a simple two-tap FIR filter
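The Karplus-Strong case described above is small enough to sketch in full. This is an illustrative version, not the course's implementation: the delay line is filled with uniform random noise, and the loop filter is the two-tap average y[n] = 0.5 (y[n-N] + y[n-N-1]). The function name, the seed handling, and the parameter values are assumptions.

```python
import random
from collections import deque


def karplus_strong(f0, fs, duration, seed=0):
    """Plucked-string tone: noise-filled delay line with a two-tap averaging loop filter."""
    rng = random.Random(seed)
    period = int(round(fs / f0))          # delay line length sets the pitch
    line = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    out = []
    prev = 0.0
    for _ in range(int(duration * fs)):
        first = line.popleft()            # oldest sample leaves the delay line
        out.append(first)
        line.append(0.5 * (first + prev))  # averaged sample is fed back into the loop
        prev = first
    return out


if __name__ == "__main__":
    y = karplus_strong(200.0, 8000, 0.5)
    start = sum(v * v for v in y[:40])
    end = sum(v * v for v in y[-40:])
    print(f"energy of first period: {start:.3f}, last period: {end:.6f}")
```

The averaging filter attenuates high frequencies on every pass through the loop, so the tone starts bright and decays towards a mellow, nearly sinusoidal tail, much like a plucked string.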

Page 90: Communication Acoustics Karjalainen

M. Karjalainen27

Impulse response of a simple string model

• Impulse and magnitude responses of the previous model

Page 91: Communication Acoustics Karjalainen

M. Karjalainen28

Body response modeling

• String instrument body works like an LTI system (filter)

Impulse

response

Magnitude

response

(low frequencies)

Page 92: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 5: Structure and Function of Hearing

• Peripheral hearing

– External ear

– Middle ear

– Inner ear (cochlea)

• Basilar membrane

• Hair cells

• Auditory nerve

• Active cochlea and nonlinearities

• Higher levels of the auditory system

• Basic properties of human hearing

– Effective hearing area (level vs. frequency)

– Equal loudness curves

– Technical measures related to hearing

• Sound level and frequency weighting functions

Page 93: Communication Acoustics Karjalainen

M. Karjalainen2

Approaches to hearing research

• Anatomy of hearing

– The structure of hearing organs is studied

• Physiology of hearing

– The (physiological) responses of hearing to physical sound stimuli are studied

• Psychology of hearing

– Functional properties of auditory perception are studied as subjects’ reactions to physical sound stimuli

• The main interest here is ’Engineering psychoacoustics’ and computational models of auditory functions

Page 94: Communication Acoustics Karjalainen

M. Karjalainen3

Peripheral hearing

• External ear (outer ear) Middle ear Inner ear

Page 95: Communication Acoustics Karjalainen

M. Karjalainen4

Schematic of peripheral hearing

• External ear (outer ear) Middle ear Inner ear

Page 96: Communication Acoustics Karjalainen

M. Karjalainen5

External ear and ear canal transmission

• Transfer functions

– Frontal sound source to the eardrum (solid line)

– Entrance of ear canal to the eardrum (dotted line)

• Head-related transfer functions (HRTFs) discussed later

Page 97: Communication Acoustics Karjalainen

M. Karjalainen6

Middle ear: Bone conduction

• Ossicles

– Malleus (hammer-shaped bone)

– Incus (anvil-shaped bone)

– Stapes (stirrup-shaped bone)

• Impedance match from air to liquid (1:3000)

Page 98: Communication Acoustics Karjalainen

M. Karjalainen7

Animations of middle ear function

Animations: University of Wisconsin http://www.neurophys.wisc.edu/~ychen/auditory/fs-auditory.html

Page 99: Communication Acoustics Karjalainen

M. Karjalainen8

Middle ear conduction and features

• Signal transfer function is a bandpass filter

• Other middle ear features:

– Acoustic reflex

– Eustachian tube

Page 100: Communication Acoustics Karjalainen

M. Karjalainen9

Inner ear: the cochlea

• The cochlea is a spiral-shaped, liquid-filled tube with about 2.7 turns and a length of about 35 mm

• Stapes vibration enters the cochlea through the oval window

• The other window to the middle ear is called the round window

• Basilar membrane divides the cochlea into two parts

Cochlea linearized

Page 101: Communication Acoustics Karjalainen

M. Karjalainen10

Cross-section of the cochlea

• Basilar membrane between bony shelves

– Division to scala vestibuli and scala tympani

• Reissner’s membrane separates scala media

• Organ of Corti: hair cells

• Tectorial membrane

Page 102: Communication Acoustics Karjalainen

M. Karjalainen11

Basilar membrane motion: traveling waves

• Basilar membrane is a nonhomogeneous transmission line:

– Wider and more massive towards apex

– Sound pressure entering the liquid of cochlea generates a traveling wave along the basilar membrane

– The position of the maximum vibration amplitude of the traveling wave depends on the frequency of the wave (characteristic frequency = C.F.)

– High frequencies resonate close to the oval window and low frequencies close to helicotrema

Page 103: Communication Acoustics Karjalainen

M. Karjalainen12

Animation of basilar membrane motion

Page 104: Communication Acoustics Karjalainen

M. Karjalainen13

Basilar membrane response to a square-wave signal

• Time–position–amplitude pattern of basilar membrane movement as a response to square-wave signal

Page 105: Communication Acoustics Karjalainen

M. Karjalainen14

Hair cells

• Inner hair cells, in one row

• Outer hair cells, in 3-5 rows

• Together about 15000 – 16000 hair cells

• Each hair cell is equipped on top with a u-, v-, or w-shaped bundle of filaments called stereocilia

• Neural fibers are connected to hair cells

Page 106: Communication Acoustics Karjalainen

M. Karjalainen15

Hair cells in the organ of Corti

Page 107: Communication Acoustics Karjalainen

M. Karjalainen16

Stereocilia (= ’hair bundles’ of hair cells)

Page 108: Communication Acoustics Karjalainen

M. Karjalainen17

Movement of the organ of Corti

Page 109: Communication Acoustics Karjalainen

M. Karjalainen18

Movement and activation of hair cells

Page 110: Communication Acoustics Karjalainen

M. Karjalainen19

Hair cells: neural conduction

• Vibration of the basilar membrane causes bending of stereocilia and this opens ion channels which modulates potential within the cell

• Activation of the cell releases neurotransmitter to synaptic junctions between hair cell and neural fibers of the auditory nerve

• A neural spike is generated that propagates in the auditory nerve fiber

• Next spike possible only after at least 1 ms

Page 111: Communication Acoustics Karjalainen

M. Karjalainen20

Activation and inhibition of hair cells

• Asymmetrical effect of stereocilia bending on firing rate

• Cochlear potentials

Page 112: Communication Acoustics Karjalainen

M. Karjalainen21

Phase-locking and synchrony of neural firing

• Statistically phase-locked within half cycle

• Statistical synchrony of neural firing

Page 113: Communication Acoustics Karjalainen

M. Karjalainen22

Passive vs. active cochlea

• Georg von Békésy found basilar membrane behavior by experimentation with ears from dead animals => reduced frequency resolution

• Explanation: second filter needed

• Now it is known that the cochlea is active:

– Especially at low signal levels the outer hair cells amplify basilar membrane motion

• Outer hair cells receive many efferent neural fibers from higher neural levels

• Outer hair cells are able to change their length very rapidly (in synchrony with high audio frequencies)

• Otoacoustic emission (cochlear echo) as a response to external stimulus, recordable in the ear canal, is related to this phenomenon

Page 114: Communication Acoustics Karjalainen

M. Karjalainen23

Auditory nerve responses: firing rate

• Steady-state firing rate is a saturating function with spontaneous rate (= without sound excitation)

• There are fibers with different sensitivity (and spontaneous rate)

Page 115: Communication Acoustics Karjalainen

M. Karjalainen24

Poststimulus time histogram (PST)

• Firing rate overshoot and undershoot with onset and offset of excitation

– Works like automatic gain control

Page 116: Communication Acoustics Karjalainen

M. Karjalainen25

PST with steady-state sinusoidal excitation

• Statistically, half-wave rectification appears along with automatic gain control

Page 117: Communication Acoustics Karjalainen

M. Karjalainen26

Firing rate saturation for a vowel excitation

• For increasing level of excitation, the firing rate profile (’neural activation spectrum’) saturates

Page 118: Communication Acoustics Karjalainen

M. Karjalainen27

Tuning curves for constant firing level

• If the firing rate of a neural fiber is kept constant for varying excitation frequency, a tuning curve is obtained

• This characterizes the frequency selectivity of cochlea

Page 119: Communication Acoustics Karjalainen

M. Karjalainen28

Effects of active cochlea

• Low-level signals are amplified substantially by active cochlea:

– Sensitivity of hearing is increased

– Due to AGC-like compression, the narrow dynamic range (about 25 dB) of hair cells is expanded to more than 100 dB

• Selectivity (frequency resolution) is increased (especially at low signal levels) due to active function

• If outer hair cells are damaged, the active amplification is degraded or disappears

– Loss of auditory sensitivity

– Tuning curves are broadened

– Otoacoustic emissions disappear

Page 120: Communication Acoustics Karjalainen

M. Karjalainen29

Cochlear nonlinearity: Two-tone suppression

• Addition of another tone (shaded area in figure below) suppresses the activation due to probe tone at its characteristic frequency (= kind of masking)

Page 121: Communication Acoustics Karjalainen

M. Karjalainen30

Cochlear nonlinearity: Combination tones

• Nonlinear interaction of two tones generates new tones that are perceived:

– Difference tone: f_diff = f2 – f1

• E.g.: 1.1 kHz and 1.0 kHz => 100 Hz

– Cubic difference tone: f_cubic = 2f1 – f2

• E.g.: 1.0 kHz and 1.1 kHz => 900 Hz

• Appears already at low level of excitation
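The two example frequencies can be reproduced with a generic memoryless nonlinearity. To be clear about the assumption: the polynomial y = x + 0.1 x^2 + 0.1 x^3 below is not a model of the cochlea, only a demonstration that a quadratic term creates f2 - f1 and a cubic term creates 2 f1 - f2. Function names, sample rate, and coefficients are illustrative.

```python
import cmath
import math


def difference_tone(f1, f2):
    """Quadratic difference tone f2 - f1."""
    return f2 - f1


def cubic_difference_tone(f1, f2):
    """Cubic difference tone 2 f1 - f2."""
    return 2.0 * f1 - f2


def component_amplitude(y, freq, fs):
    """Amplitude of the sinusoidal component at freq (exact when freq fits the frame)."""
    n = len(y)
    coeff = sum(y[m] * cmath.exp(-2j * cmath.pi * freq * m / fs) for m in range(n))
    return 2.0 * abs(coeff) / n


FS = 8000
F1, F2 = 1000.0, 1100.0
# 0.1 s of a two-tone signal; every component is a multiple of 100 Hz and fits the frame exactly
x = [math.cos(2.0 * math.pi * F1 * m / FS) + math.cos(2.0 * math.pi * F2 * m / FS)
     for m in range(800)]
# Generic polynomial nonlinearity (an assumption for illustration, not a cochlear model)
y = [v + 0.1 * v * v + 0.1 * v ** 3 for v in x]

if __name__ == "__main__":
    for f in (difference_tone(F1, F2), cubic_difference_tone(F1, F2)):
        print(f"{f:.0f} Hz: input {component_amplitude(x, f, FS):.4f}, "
              f"output {component_amplitude(y, f, FS):.4f}")
```

Neither frequency is present in the input, so both components in the output are created purely by the nonlinearity.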

Page 122: Communication Acoustics Karjalainen

M. Karjalainen31

Central auditory system

• Higher-level functions not known well.

• Cochlear nucleus has specific cells such as ’chopper cells’ that do temporal processing. Spectral information is recovered unsaturated.

• Binaural hearing starts at superior olive level.

• Auditory cortex is the center for processing perceptions and integrating the sound scene.

• Interaction with other senses (vision) strong.

Page 123: Communication Acoustics Karjalainen

M. Karjalainen32

Dynamic range of hearing

Sound level ’thermometer’

6 dB steps

3 dB steps

1 dB steps

Page 124: Communication Acoustics Karjalainen

M. Karjalainen33

Equal loudness curves and threshold of hearing

• Equal loudness level perception, unit phon = SPL at 1 kHz

Page 125: Communication Acoustics Karjalainen

M. Karjalainen34

Sound level and frequency weighting curves

• Weighting filters for sound level measurement (A most common)

Page 126: Communication Acoustics Karjalainen

M. Karjalainen35

Recommended frequencies and bands

• Recommended frequencies and frequency bands for measurements and technical applications:

• Octave = 2:1

• 1/2 octave

• 1/3 octave

Page 127: Communication Acoustics Karjalainen

M. Karjalainen36

Filtered noise demo

• White noise

• Low-pass filtered noise, decreasing cutoff frequency

• High-pass filtered noise, increasing cutoff frequency

• 1/3 octave noise, increasing center frequency

• White and pink noise

Page 128: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 6: Fundamentals of Psychoacoustics

• Psychoacoustics = auditory psychophysics

• Sound events vs. auditory events

– Sound stimuli types, psychophysical experiments

– Psychophysical functions

• Basic phenomena and concepts

– Masking effect

• Spectral masking, temporal masking

– Pitch perception and pitch scales

• Different pitch phenomena and scales

– Loudness formation

• Static and dynamic loudness

– Timbre

• as a multidimensional perceptual attribute

– Subjective duration of sound

Page 129: Communication Acoustics Karjalainen

M. Karjalainen2

Psychophysical experimentation

• Sound events (s_i) = physical (objective) events

• Auditory events (h_i) = subject’s internal events

– Need to be studied indirectly from reactions (b_i)

• Psychophysical function h = f(s)

• Reaction function b = f(h)

Page 130: Communication Acoustics Karjalainen

M. Karjalainen3

Sound events: Stimulus signals

• Elementary sounds

– Sinusoidal tones

– Amplitude- and frequency-modulated tones

– Sinusoidal bursts

– Sine-wave sweeps, chirps, and warble tones

– Single impulses and pulses, pulse trains

– Noise (white, pink, uniform masking noise)

– Modulated noise, noise bursts

– Tone combinations (consisting of partials)

• Complex sounds

– Combination tones, noise, and pulses

– Speech sounds (natural, synthetic)

– Musical sounds (natural, synthetic)

– Reverberant sounds

– Environmental sounds (nature, man-made noise)

Page 131: Communication Acoustics Karjalainen

M. Karjalainen4

Sound generation and experiment environment

• Reproduction techniques

– Natural acoustic sounds (repeatability problems)

– Loudspeaker reproduction

– Headphone reproduction

• Reproduction environment

– Not critical in headphone reproduction

– Anechoic chamber (free field)

• Room effects minimized

• Not a natural environment

– Listening room

• Carefully designed, relatively normal acoustics

– Reverberation chamber

• Special experiments with diffuse sound field

Page 132: Communication Acoustics Karjalainen

M. Karjalainen5

Psychophysical functions

• Sound event property to auditory event property mapping

$h = a \log(s)$ (Weber, Weber-Fechner law)

$h = c\, s^{k}$ (power law, e.g., loudness)
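The two mappings can be compared numerically. The following is a minimal sketch, not taken from the slides: the reference magnitude `s0`, the base-10 logarithm, and the exponent `k = 0.3` are illustrative assumptions.

```python
import math

def log_law(s, a=1.0, s0=1.0):
    """Weber-Fechner type law h = a * log10(s / s0).

    s0 is an assumed reference magnitude; the slide writes h = a log(s).
    """
    return a * math.log10(s / s0)

def power_law(s, c=1.0, k=0.3):
    """Power law h = c * s**k; k = 0.3 is an assumed, loudness-like exponent."""
    return c * s ** k

# A tenfold step in stimulus magnitude adds a constant amount under the log law
# and multiplies the percept by a constant factor under the power law.
for s in (1.0, 10.0, 100.0, 1000.0):
    print(f"s = {s:7.1f}  log law h = {log_law(s):.2f}  power law h = {power_law(s):.2f}")
```

With an exponent near 0.3, a tenfold increase in the stimulus roughly doubles the percept, which is why the power law is often used for loudness.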

Page 133: Communication Acoustics Karjalainen

M. Karjalainen6

Experimental concepts: Thresholds

• Threshold values

– Absolute thresholds (e.g., threshold of hearing)

– Difference thresholds (just noticeable difference, JND)

Example: Threshold of perception:

– 50%, 75%, etc. thresholds
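The 50% and 75% thresholds can be read from a psychometric function. The sketch below assumes a logistic shape for a yes/no task; the function names and the `midpoint_db` and `slope_db` parameters are hypothetical and not from the slides.

```python
import math

def psychometric(level_db, midpoint_db, slope_db):
    """Assumed logistic detection probability for a yes/no task."""
    return 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))

def threshold(p, midpoint_db, slope_db):
    """Level (dB) at which the detection probability equals p (0 < p < 1)."""
    return midpoint_db + slope_db * math.log(p / (1.0 - p))

# Thresholds for an illustrative listener with midpoint 10 dB and slope 2 dB.
for p in (0.5, 0.75):
    print(f"{int(p * 100)}% threshold: {threshold(p, 10.0, 2.0):.2f} dB")
```

The 50% threshold coincides with the midpoint of the curve, while stricter criteria such as 75% move the threshold to a higher level.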

Page 134: Communication Acoustics Karjalainen

M. Karjalainen7

Experimental concepts

• Comparison of percepts

– Magnitude estimation

– Magnitude production

• Probe tone method

– Generation of a probe tone to make test tone audible/noticeable

– Modulation, canceling, interference

• Classification and scaling of percepts

– Nominal scale (rough, sharp, reverberant, …)

– Ordinal scale (percepts have ordering)

– Interval scale (numeric scale, no zero point defined)

– Ratio scale (numeric scale, zero point defined)

• Multidimensional scaling

– Semantic differentials: low – high, dull – sharp, ...

Page 135: Communication Acoustics Karjalainen

M. Karjalainen8

Psychoacoustic experiments

• Description of auditory events

– Oral or written description

• Method of adjustment

– Adjusting a stimulus to correspond to a reference

• Selection methods

– Forced choice methods (select one!):

• Two alternative forced choice (TAFC, 2AFC)

• Method of tracking

– Tracking with varying stimulus

• Békésy audiometry

• Bracketing method

– Descending and ascending bracketing

• Yes/no answering

• Reaction time measurement

– Indicates the difficulty of decision task

Page 136: Communication Acoustics Karjalainen

M. Karjalainen9

Békésy audiometry

• Slow frequency sweep and level tracking

Page 137: Communication Acoustics Karjalainen

M. Karjalainen10

Typical psychoacoustical test types

• AB test

– Set in preference order / select one

– AB hidden reference (one must be recognized)

• AB scale test

– As AB but assign numeric values for A and B

• ABC test

– A is fixed reference (anchor point) for assigning values for B and C

• ABX test

– Which one, A or B, is equal to X?

• TAFC (2AFC)

– Two alternative forced choice

• Formation of a listening test panel

• Formation of a description language
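Results of forced-choice tests such as ABX or 2AFC are often checked against pure guessing. The sketch below assumes a one-sided binomial test with a guessing probability of 0.5; the function name is hypothetical and the slides do not prescribe this analysis.

```python
from math import comb

def guessing_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials` by guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Illustrative case: 12 correct answers out of 16 ABX trials.
print(f"p = {guessing_p_value(12, 16):.4f}")
```

A small value suggests that the listener really heard a difference between A and B rather than guessing.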

Page 138: Communication Acoustics Karjalainen

M. Karjalainen11

Masking effect

• "A loud sound makes a weaker sound imperceptible"

• Categories and aspects of masking

– Frequency masking

– Temporal masking

– Time-frequency masking

– Frequency selectivity of the auditory system

– Psychophysical tuning curves

– Critical band

• Bark bandwidth

• ERB bandwidth

• Masking tone and test tone

Page 139: Communication Acoustics Karjalainen

M. Karjalainen12

Frequency masking

• Masking by white noise

Page 140: Communication Acoustics Karjalainen

M. Karjalainen13

Frequency masking

• Masking by narrow-band noise (0.25, 1, 4 kHz)

Page 141: Communication Acoustics Karjalainen

M. Karjalainen14

Frequency masking

• Frequency masking as a function of masker level

Page 142: Communication Acoustics Karjalainen

M. Karjalainen15

Frequency masking

• Frequency masking by lowpass and highpass noise

Page 143: Communication Acoustics Karjalainen

M. Karjalainen16

Frequency masking

• Frequency masking by 1 kHz sinusoidal signal

Page 144: Communication Acoustics Karjalainen

M. Karjalainen17

Frequency masking

• Frequency masking by a complex tone (harmonic complex)

Page 145: Communication Acoustics Karjalainen

M. Karjalainen18

Temporal masking

• Masking before and after a noise signal

Page 146: Communication Acoustics Karjalainen

M. Karjalainen19

Temporal masking

• Beginning of postmasking

Page 147: Communication Acoustics Karjalainen

M. Karjalainen20

Temporal masking

• Postmasking as a function of time

– For 200 ms long masker

– For 5 ms long masker

Page 148: Communication Acoustics Karjalainen

M. Karjalainen21

Time-frequency masking

• Masking of a tone burst in time and frequency by a time-frequency block of noise

Page 149: Communication Acoustics Karjalainen

M. Karjalainen22

Temporal masking

• Masking due to an impulse train

Page 150: Communication Acoustics Karjalainen

M. Karjalainen23

Frequency selectivity of hearing

• Masking curves tell much about auditory selectivity

• Psychophysical tuning curves match with physiological curves

Page 151: Communication Acoustics Karjalainen

M. Karjalainen24

Critical band experiment

• Experiment: loudness vs. bandwidth of noise

Page 152: Communication Acoustics Karjalainen

M. Karjalainen25

Critical band

• Loudness vs. bandwidth of noise

– Loudness increases when bandwidth exceeds a critical band

Page 153: Communication Acoustics Karjalainen

M. Karjalainen26

Critical band (Bark band) vs. frequency

• Critical band (Bark band) fG vs. mid frequency

• Ref: just noticeable tone frequency change vs. frequency

Page 154: Communication Acoustics Karjalainen

M. Karjalainen27

Critical band: 24 Bark bands (Zwicker)

Page 155: Communication Acoustics Karjalainen

M. Karjalainen28

ERB band experiment

• ERB = Equivalent Rectangular Bandwidth

• Loudness of a tone is measured as a function of frequency gap in masking noise around the test tone

• ERB band is narrower than Bark band, especially at low frequencies

Page 156: Communication Acoustics Karjalainen

M. Karjalainen29

Pitch scales

• Pitch = subjective measure of tone height

• Mel scale

• Bark scale

• ERB scale

[Slide formulas for these scales, including alternative forms and inverse functions, are not preserved in this transcript.]
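For orientation, the mel, Bark, and ERB scales can be computed with widely used textbook approximations. These are assumptions in the sense that they are not necessarily the formulas shown on the slide: a common mel formula, a Zwicker-Terhardt style Bark approximation, and the Glasberg-Moore ERB-rate formula. All function names are illustrative.

```python
import math

def hz_to_mel(f):
    """Common mel approximation (assumed, not necessarily the slide's form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz_to_bark(f):
    """Zwicker-Terhardt style approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    """Glasberg-Moore style ERB-rate approximation."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f):
    """Equivalent rectangular bandwidth in Hz at center frequency f."""
    return 24.7 * (0.00437 * f + 1.0)

for f in (100.0, 1000.0, 4000.0):
    print(f"{f:6.0f} Hz: {hz_to_mel(f):7.1f} mel, {hz_to_bark(f):5.2f} Bark, "
          f"{hz_to_erb_rate(f):5.2f} ERB, ERB width {erb_bandwidth(f):6.1f} Hz")
```

All three scales are roughly linear at low frequencies and roughly logarithmic at high frequencies, which is the behavior that place coding on the basilar membrane would suggest.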

Page 157: Communication Acoustics Karjalainen

M. Karjalainen30

Logarithmic pitch scale

• Logarithmic scale used in music and audio

• Frequency ratios more important than absolute frequencies

• Octave and ratios of small integers important

Page 158: Communication Acoustics Karjalainen

M. Karjalainen31

Comparison of pitch scales

• Pitch scales are related to place coding on the basilar

membrane, although they are measured by psychoacoustic

experiments

Page 159: Communication Acoustics Karjalainen

M. Karjalainen32

Comparison of pitch scales

• Comparison (log reference) of:

– logarithmic scale

– ERB scale

– Bark scale

– linear scale

Page 160: Communication Acoustics Karjalainen

M. Karjalainen33

Comparison of pitch scales

• Comparison (linear reference) of:

– logarithmic scale

– ERB scale

– Bark scale

– linear scale

Page 161: Communication Acoustics Karjalainen

M. Karjalainen34

Pitch

• Continues in file KA6b

Page 162: Communication Acoustics Karjalainen

M. Karjalainen1

Pitch phenomena (cont’d from file 6a)

• Pitch of a pure tone as a function of amplitude

– Individually varying property

Page 163: Communication Acoustics Karjalainen

M. Karjalainen2

JND of frequency modulation

• Frequency modulation JND threshold

– As a function of carrier frequency

– As a function of modulation frequency

– About 4 Hz modulation most easily perceivable

Page 164: Communication Acoustics Karjalainen

M. Karjalainen3

Minimum duration of a tone for pitch percept

• Duration to make pitch perceivable

– Duration in milliseconds

– Duration of two cycles as a reference

Page 165: Communication Acoustics Karjalainen

M. Karjalainen4

JND pitch change vs. tone duration

• Threshold of perceived pitch variation increases below 200 ms duration

Page 166: Communication Acoustics Karjalainen

M. Karjalainen5

Pitch strength

• How strong or weak is a pitch percept?

Page 167: Communication Acoustics Karjalainen

M. Karjalainen6

Pitch phenomena and theories

• Place (spectral) pitch vs. temporal pitch theories

• Spectral pitch (due to spectral peak)

• Temporal pitch (periodicity)

• Missing fundamental

• Virtual pitch

• Repetition pitch

• Pitch of inharmonic signals

• Absolute pitch (memory)

Page 168: Communication Acoustics Karjalainen

M. Karjalainen7

Loudness

• Loudness is the perceived subjective ’strength’ (’volume’, ’intensity’, etc.) of a sound

– Subjective scale defined in relation to physical scale

– Unit is the sone: 1 sone corresponds to a 1 kHz tone at 40 dB SPL

Page 169: Communication Acoustics Karjalainen

M. Karjalainen8

Loudness of a sinusoidal tone

• Loudness N vs. SPL of a 1 kHz tone

– Power law found to match best

Power law:

More precisely:

Loudness vs.

loudness level :
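The relation between loudness in sones and loudness level in phons is commonly written as N = 2^((L_N - 40)/10) for levels above about 40 phon. The sketch below assumes this standard textbook relation, which may differ in detail from the slide's formula; the function names are illustrative.

```python
import math

def phon_to_sone(loudness_level_phon):
    """Loudness N in sones; assumed valid above roughly 40 phon."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)

def sone_to_phon(loudness_sone):
    """Loudness level in phons for a loudness given in sones."""
    return 40.0 + 10.0 * math.log2(loudness_sone)

for level in (40.0, 50.0, 60.0, 80.0):
    print(f"{level:.0f} phon -> {phon_to_sone(level):.1f} sone")
```

Each 10 phon increase doubles the loudness, so 1 sone at 40 phon becomes 2 sones at 50 phon and 4 sones at 60 phon.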

Page 170: Communication Acoustics Karjalainen

M. Karjalainen9

Partial loudness (by noise masking)

• Partial loudness of 1 kHz tone in presence of masking noise

– As a function of tone level and masking noise level

Page 171: Communication Acoustics Karjalainen

M. Karjalainen10

Loudness example: two tones

• Loudness of a pair of tones as a function of frequency difference

– Slow beat range: loudness due to peaks (6 dB over 60 dB)

– Medium rate fluctuation: power doubled => 3 dB increase

– Fast fluctuation: wideband signal => loudness doubled (10 dB)

Page 172: Communication Acoustics Karjalainen

M. Karjalainen11

Loudness computation (Zwicker formulation)

• Excitation signal => power spectral density on the Bark scale

• Spreading function B(z), such as

• Convolution by spreading function

• Loudness density

• Total loudness
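The steps above can be sketched in simplified form. Everything numeric here is an assumption rather than the slide's content: the spreading function is the Schroeder-type formula often used in perceptual models, the loudness density uses a plain compressive exponent of 0.23, and the threshold in quiet and absolute sone calibration are omitted.

```python
import math

def spreading_db(dz):
    """Assumed Schroeder-type spreading function B(z) in dB; dz in Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def excitation(band_power, dz=1.0):
    """Convolve band powers (linear units, uniform Bark grid) with B(z)."""
    result = []
    for i in range(len(band_power)):
        total = 0.0
        for j, power in enumerate(band_power):
            # (i - j) * dz is the distance of band i above the source band j.
            total += power * 10.0 ** (spreading_db((i - j) * dz) / 10.0)
        result.append(total)
    return result

def loudness_density(exc, exponent=0.23):
    """Uncalibrated loudness density: compressive power of the excitation."""
    return [e ** exponent for e in exc]

def total_loudness(density, dz=1.0):
    """Integrate the loudness density over the Bark scale."""
    return sum(density) * dz

# Same total power, concentrated in one band or spread over eight bands.
narrow = [0.0] * 24
narrow[12] = 1.0
wide = [0.0] * 24
for band in range(8, 16):
    wide[band] = 1.0 / 8.0

loud_narrow = total_loudness(loudness_density(excitation(narrow)))
loud_wide = total_loudness(loudness_density(excitation(wide)))
print(f"narrowband: {loud_narrow:.2f}, wideband: {loud_wide:.2f} (uncalibrated units)")
```

Because the loudness density is compressive, spreading the same power over more critical bands gives a larger total, in line with the critical band experiment.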

Page 173: Communication Acoustics Karjalainen

M. Karjalainen12

Loudness computation, examples

• Left: excitation level for sinusoidal tone and white noise

• Right: loudness density for sinusoidal and white noise

Page 174: Communication Acoustics Karjalainen

M. Karjalainen13

Loudness graphically

• Graphical chart determination of loudness (Zwicker)

Page 175: Communication Acoustics Karjalainen

M. Karjalainen14

JND of loudness level

• Just noticeable difference by amplitude modulation

– Modulation of 1 kHz tone

– Modulation of white noise

– Modulation frequency 4 Hz

Page 176: Communication Acoustics Karjalainen

M. Karjalainen15

JND of loudness level

• Just noticeable difference by amplitude modulation

– As a function of modulation frequency

– Modulation of 1 kHz tone

– Modulation of white noise

Page 177: Communication Acoustics Karjalainen

M. Karjalainen16

Modulation detection

• Detection of amplitude and frequency modulation

– Amplitude modulation easily detectable by ’off-band listening’ (loudness modulated due to upper spreading slope variation)

– No slope variation in frequency modulation

Page 178: Communication Acoustics Karjalainen

M. Karjalainen17

Loudness vs. duration

• Temporal integration of loudness for duration < 200 ms

– Loudness level decreases 10 phon for a 10-fold decrease in duration

Page 179: Communication Acoustics Karjalainen

M. Karjalainen18

Loudness formation temporally

• Loudness formation for different durations of a tone burst

– Peak value of total loudness is tracked in time-varying cases

Page 180: Communication Acoustics Karjalainen

M. Karjalainen19

Timbre (perceived ’sound color’)

• Timbre is a multidimensional attribute of sound

– For stationary sounds:

• Spectrum (loudness spectrum)

• Periodicity (periodic, multiperiodic, noise-like)

• Repetitiveness (reflections, reverberation, spatialness)

– For time-varying signals

• Amplitude envelope important

– Amplitude envelope at each critical band

– For transients and onsets

• Changes are more prominent than steady-state parts, especially onsets

Page 181: Communication Acoustics Karjalainen

M. Karjalainen20

Subjective duration

• Subjective vs. objective duration

Page 182: Communication Acoustics Karjalainen

M. Karjalainen21

Auditory Demonstrations 1

1 Cancelled harmonics

2-6 Critical bands by masking

7 C.B. by loudness comparison

8-11 The decibel scale

12-16 Filtered noise

17-18 Frequency response of the ear

19-20 Loudness scaling

21 Temporal integration

22 Asymmetry of masking by pulsed tones

23-25 Backward and forward masking

26 Pulsation threshold

Page 183: Communication Acoustics Karjalainen

M. Karjalainen22

Auditory Demonstrations 2

27-28 Dependence of pitch on intensity

29 Pitch salience and tone duration

30 Influence of masking noise on pitch

31 Octave matching

32 Stretched and compressed scales

33 Frequency difference limen

34-35 Log and lin frequency scales

36 Pitch streaming

37 Virtual pitch (missing fundamental)

38-39 Shift of virtual pitch

40-42 Masking spectral and virtual pitch

Page 184: Communication Acoustics Karjalainen

M. Karjalainen23

Auditory Demonstrations 3

43-45 Virtual pitch with random harmonics

46-47 Strike note of chime

48 Analytic vs synthetic pitch

49-51 Scales with repetition pitch

52 Circularity in pitch judgment

53 Effect of spectrum on timbre

54-56 Effect of tone envelope on timbre

57 Change in timbre with transposition

58-61 Tones and tuning with stretched partials

62-63 Primary and secondary beats

Page 185: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 7: Other psychoacoustic concepts

• Sharpness

– Spectral center of gravity

• Fluctuation strength

– Perception of slow modulations (beats)

• Impulsiveness

• Roughness

– Perception of fast modulations

• Tonality

– Periodic vs. random excitation

• Sensory pleasantness

• Psychoacoustic concepts and music

– Sensory consonance and dissonance

– Intervals, scales, and tunings

– Rhythm, tempo, bar, measure

• Perceptual organization of sound

Page 186: Communication Acoustics Karjalainen

M. Karjalainen2

Sharpness

• Perceived sharpness is proportional to spectral center of gravity

• Unit of sharpness is 1 acum, corresponding to 1 Bark wide noise centered at 1 kHz with a level of 60 dB

• Sharpness for 1 Bark wide noise, lowpass noise, and highpass noise

• Increase of level from 30 dB to 90 dB doubles the sharpness

Bandpass noises:

Page 187: Communication Acoustics Karjalainen

M. Karjalainen3

Computation of sharpness

• Sharpness can be estimated (without level effect) from

where the weighting function is defined by the curve:

Page 188: Communication Acoustics Karjalainen

M. Karjalainen4

Fluctuation strength

• Perception of relatively slow modulations: fluctuation strength

• Highest sensitivity to modulation at 4 Hz

• Unit of fluctuation strength is 1 vacil, corresponding to 100 % modulation at 4 Hz of a 1 kHz tone at 60 dB

• Figure: (a) AM broadband noise, (b) AM sinusoidal tone, (c) FM sinusoidal tone (examples at 1 Hz, 4 Hz, and 16 Hz)

Page 189: Communication Acoustics Karjalainen

M. Karjalainen5

Fluctuation strength

• Left: fluctuation strength for AM (4 Hz) wideband noise (60 dB)

• Right: sine tone, 1.5 kHz, 70 dB, modulated at 4 Hz, as a function of FM deviation

Page 190: Communication Acoustics Karjalainen

M. Karjalainen6

Fluctuation strength

• Fluctuation strength computation:

Page 191: Communication Acoustics Karjalainen

M. Karjalainen7

Impulsiveness

• There is no clearly defined psychoacoustic concept of impulsiveness

• Impulsiveness is related to rapid onsets in signal

• If the repetition rate of impulses is > 10–15 Hz, roughness is perceived

• In noise control, impulsiveness is considered to increase hearing

damage risk compared to non-impulsive sound of same energy

Page 192: Communication Acoustics Karjalainen

M. Karjalainen8

Roughness

• Fast (> 15 Hz) modulation is perceived as roughness

• Addition of two tones of different frequencies creates envelope fluctuation

• When the frequency difference increases, tones start to segregate

• When the frequency difference is larger than a critical band, roughness disappears

• Examples: 1 kHz + f, with f = 7 Hz, 70 Hz, and 300 Hz

Page 193: Communication Acoustics Karjalainen

M. Karjalainen9

Roughness

• Unit of roughness is 1 asper, corresponding to a 1 kHz tone at 60 dB, 100 % amplitude modulated at 70 Hz.

• Towards lower and higher modulation frequencies the roughness decreases

Page 194: Communication Acoustics Karjalainen

M. Karjalainen10

Roughness

• Roughness for different carrier frequencies as a function of AM

modulation frequency with 100 % modulation.

7 Hz

70 Hz

300 Hz

1 kHz+ f

Page 195: Communication Acoustics Karjalainen

M. Karjalainen11

Tonality

• Tonality (tonalness) = sound exhibits voiced component(s), periodicity

• Non-tonal sound is noise-like, non-periodic

• Non-tonal (noisy) signal masks a tonal one more easily than vice versa

• For tonality index α and critical band index i, the masking threshold is:

– (α = 0: non-tonal, α = 0.5: half-tonal, α = 1: fully tonal)

• Tonality with varying modal density, log. distribution of frequencies (approx. per critical band): 80/CB, 40/CB, 20/CB, 10/CB
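One widely cited way to make the masking threshold depend on tonality comes from Johnston's perceptual coding model, where the threshold lies below the band energy by an offset that interpolates between a tonal and a noise-like masker. The slide's own formula is not preserved here, so the constants below (14.5 + i dB for a tonal masker, 5.5 dB for a noise masker) and the function names are assumptions.

```python
def masking_offset_db(tonality, band_index):
    """Assumed offset (dB) between band energy and masking threshold.

    tonality is 0.0 for a noise-like masker and 1.0 for a fully tonal masker.
    """
    return tonality * (14.5 + band_index) + (1.0 - tonality) * 5.5

def masking_threshold_db(band_energy_db, tonality, band_index):
    """Masking threshold (dB) in a critical band with the given energy."""
    return band_energy_db - masking_offset_db(tonality, band_index)

for tonality in (0.0, 0.5, 1.0):
    threshold = masking_threshold_db(60.0, tonality, band_index=10)
    print(f"tonality {tonality:.1f}: threshold {threshold:.1f} dB")
```

The noise-like masker gives the highest threshold, which matches the statement that a noisy signal masks a tonal one more easily than vice versa.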

Page 196: Communication Acoustics Karjalainen

M. Karjalainen12

Sensory pleasantness

• Sensory pleasantness (example by Zwicker):

– P = sensory pleasantness

– S = sharpness

– R = roughness

– T = tonality

– N = loudness

– Product sound quality measures are often constructed by similar techniques.

Page 197: Communication Acoustics Karjalainen

M. Karjalainen13

Sensory consonance and dissonance

• Consonance and dissonance are closely related to roughness

• Consonance vs. dissonance of two partials:

Page 198: Communication Acoustics Karjalainen

M. Karjalainen14

Consonance and dissonance of harmonic tones

• Roughness due to interaction of partials in a sound contributes to dissonance

• Ratios of small integers are most consonant (just intonation)

• Consonance vs. dissonance of two harmonic complexes:

Page 199: Communication Acoustics Karjalainen

M. Karjalainen15

Examples of intervals

• Pythagoras noticed that intervals 2:1, 3:2, and 4:3 sound "pleasant"

• Consonant intervals (decreasing order of consonance):

– 2:1 octave

– 3:2 perfect fifth

– 4:3 perfect fourth

– 5:3 major sixth

– 5:4 major third

– 8:5 minor sixth

– 6:5 minor third

– 16/15 (dissonant)

– 40/27 (dissonant)

• Equally tempered intervals: fifth 1.4983, third 1.2599

Page 200: Communication Acoustics Karjalainen

M. Karjalainen16

Examples of intervals

• Log and lin uniformly spaced scales

• Which one is the best octave ?

• Stretched and compressed scales

Octave and its partitioning

Circularity of pitch

• Shepard effect

Page 201: Communication Acoustics Karjalainen

M. Karjalainen17

Intervals, scales, tuning

• Just intonation, Pythagorean scale, (equally) tempered scale

• On a tempered scale a semitone is 1:1.05946

• 1 cent is 1/100 of a semitone
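The tempered semitone and the cent follow directly from dividing the octave into 12 equal ratios. The short sketch below uses illustrative function names and compares tempered intervals with their just-intonation counterparts.

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)  # equally tempered semitone ratio

def tempered_ratio(semitones):
    """Frequency ratio of an equally tempered interval."""
    return SEMITONE ** semitones

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

print(f"semitone ratio: {SEMITONE:.5f}")
for name, semitones, just in (("fifth", 7, 3 / 2), ("major third", 4, 5 / 4)):
    deviation = cents(tempered_ratio(semitones)) - cents(just)
    print(f"{name}: tempered {tempered_ratio(semitones):.4f}, "
          f"just {just:.4f}, deviation {deviation:+.1f} cents")
```

The tempered fifth is very close to 3:2, while the tempered major third is noticeably wider than 5:4.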

Page 202: Communication Acoustics Karjalainen

M. Karjalainen18

Non-western scales and tunings

• The (tempered) western scale is adapted to a multitude of

harmonic timbres of western instruments

• For example the Balinese gamelan music is quite different

– W. A. Sethares: Tuning, Timbre, Spectrum, Scale. Springer 1998

• Example of tuning where octave is a very dissonant interval!

• Tunings and musical scales are strongly bound with spectral

properties of musical instruments

Page 203: Communication Acoustics Karjalainen

M. Karjalainen19

Temporal structures in music: Rhythm, tempo

• Rhythm: periodicity and repeated structure in music

• Tempo: rate of main events in music

• Beat: positioning of emphasis on some events

• Measure: basic rhythmic sequence

• Duration of a note or another basic unit

Page 204: Communication Acoustics Karjalainen

M. Karjalainen20

Perception of magnitude and phase spectrum

• Magnitude

– 1 dB deviation per critical band is noticeable in direct comparison. Even smaller deviations can be noticed by trained "golden ears"

– Even ±3...5 dB deviations are not easy to perceive when there is no immediate reference (except for well-trained listeners)

– Magnitude response deviations = spectral coloration

• Phase and time differences

– The auditory system is relatively insensitive to phase (Helmholtz). In general, the magnitude spectrum is more important than the phase spectrum, but sometimes phase is important

– Phase functions from Fourier analysis are circular and difficult to analyze and interpret

– Group delay (negative derivative of phase with respect to frequency) is a relatively good perceptual measure which describes the delay of modulation (not the carrier)

Page 205: Communication Acoustics Karjalainen

M. Karjalainen21

Perception of phase: extreme cases

• Special phase effects:

– The following two signals have the same magnitude spectrum but sound (as well as look) different

This is how the response looks in a single critical band

Page 206: Communication Acoustics Karjalainen

M. Karjalainen22

Perceptual organization of sound

• Streaming (sequential grouping) of pitch sequences:

– Slow repetition: one stream perceived

– Fast repetition: segregation into two separate streams

Figure: tone sequence A to F over time, (a) heard as one stream, (b) heard as two streams (B, D, F and A, C, E)

Page 207: Communication Acoustics Karjalainen

M. Karjalainen23

Perceptual organization of sound

• Streaming may change also the perceived rhythm:

– Large separation: B-D-F vs. A-C-E

– Small separation: B-D vs. A-C-E-F

Figure: two panels (time axes) showing the tones A to F grouped into an upper stream and a lower stream

Page 208: Communication Acoustics Karjalainen

M. Karjalainen24

Perceptual organization of sound

• Streaming with increasing tempo

Figure labels: increasing tempo or frequency difference; segregation of multiple streams; timbre/texture; time axis

Page 209: Communication Acoustics Karjalainen

M. Karjalainen25

Perceptual organization of sound

• Streaming or segregation as a function of frequency

difference and repetition period

Figure: regions "always coherent", "separated or coherent", and "always separated" plotted against repetition period (0 to 500 msec, horizontal axis) and a vertical axis from 0 to 20

Page 210: Communication Acoustics Karjalainen

M. Karjalainen26

Auditory scene analysis

• Auditory scene analysis

– Bregman: Auditory scene analysis (MIT Press, 1990)

• Sequential integration and segregation

– Spectral vs. temporal relations

– Spatial cues in segregation

• Integration and segregation of simultaneous auditory components

– Spectral vs. temporal relations

– The ”old-plus-new” heuristics

– Spatial cues in segregation

• Primitive auditory organization

– Built-in and low-level mechanisms

• Schema-based auditory organization

– Learning of stream integration and segregation

Page 211: Communication Acoustics Karjalainen

M. Karjalainen27

Computational auditory scene analysis (CASA)

• Computational auditory scene analysis (CASA) is an attempt to

computationally simulate and model human auditory scene analysis

– Sound source segregation (separation)

– Multipitch signal analysis of harmonic sound mixtures

– Bottom-up vs. top-down driven processing

– Prediction-driven processing

– Spatial source separation (cocktail-party effect)

– Applications:

• Audio content analysis and content-based coding

• Automatic music transcription

• Speech recognition

Page 212: Communication Acoustics Karjalainen

Spatial hearing

Ville Pulkki

Laboratory of Acoustics and Audio Signal Processing

Helsinki University of Technology (Teknillinen korkeakoulu), Espoo, Finland

http://www.acoustics.hut.fi/

Ville [email protected]

Page 213: Communication Acoustics Karjalainen

TKK, Laboratory of Acoustics and Audio Signal Processing, 26.3.2002

Sound in space

Ville Pulkki ([email protected]) page 3

Page 214: Communication Acoustics Karjalainen

Spatial hearing

Directional hearing

• Accuracy of directional hearing

• Theory of directional hearing

Distance hearing

Perception of space

Spatial sound reproduction

Page 215: Communication Acoustics Karjalainen

Transfer function from the sound source to the ear canal

Head Related Impulse Response (HRIR)

Head Related Transfer Function (HRTF)

© Duda: http://interface.cipic.ucdavis.edu/CIL tutorial/

Page 216: Communication Acoustics Karjalainen

Measuring HRTFs

© Algazi et al.: http://interface.cipic.ucdavis.edu/

Page 217: Communication Acoustics Karjalainen

Dependence of the HRTF on the direction of the sound source

Figure: head-related impulse responses (amplitude −0.2 to 0.2, time 0 to 2 ms) at the left and right ear, panels a) to c), for directions (ϕ, δ) = (0, 0), (60, 0), and (0, 60)

© M. Karjalainen

Page 218: Communication Acoustics Karjalainen

Dependence of the HRTF on the azimuth of the sound source

© Algazi et al.: http://interface.cipic.ucdavis.edu/

Page 219: Communication Acoustics Karjalainen

Dependence of the HRTF on the elevation of the sound source

© Algazi et al.: http://interface.cipic.ucdavis.edu/

Page 220: Communication Acoustics Karjalainen

Dependence of the HRTF on the direction of the sound source

Figure: HRTF magnitude responses (0 to −40 dB, 10^2 to 10^4 Hz) at the left and right ear, panels a) to c), for directions (ϕ, δ) = (0, 0), (60, 0), and (0, 60)

© M. Karjalainen

Page 221: Communication Acoustics Karjalainen

Accuracy of directional hearing in the horizontal plane

Direction of the sound event → direction of the auditory event (ϕ):

• 0° → 359° ± 3.6°

• 90° → 80.7° ± 9.2°

• 180° → 179.3° ± 5.5°

• 270° → 281.6° ± 10°

© M. Karjalainen

Page 222: Communication Acoustics Karjalainen

Accuracy of directional hearing in the median plane

Figure: direction of the sound event (δ = 0°, 36°, and 90°, in front at ϕ = 0° and behind at ϕ = 180°) versus direction of the auditory event; values shown in the figure: ±9°, +30° ± 10°, +74° ± 13°, +68° ± 22°, +27° ± 15°

© M. Karjalainen

Page 223: Communication Acoustics Karjalainen

Lateralization experiments

Figure: a) signal fed to the two ears through delay circuits (τ_ph1, τ_ph2), b) signal fed to the two ears through attenuators (a_1, a_2)

© M. Karjalainen

Page 224: Communication Acoustics Karjalainen

Lateralization experiments, time delay

Figure: perceived lateral position (scale 6 to 0 to 6, between left and right) as a function of the interaural phase delay τ_ph (−1500 to 1500 μs; left ear earlier vs. left ear later)

© M. Karjalainen

Page 225: Communication Acoustics Karjalainen

Lateralization experiments, properties

Advantages:

• Any ITD-ILD combination can be produced freely

• Basic results

Problems:

• Unnaturalness

• Localization inside the head

• Reproduction of high frequencies differs between listening sessions

Page 226: Communication Acoustics Karjalainen

Directional hearing

Cues:

• Binaural cues

– Interaural time difference

– Interaural level difference

• Monaural spectrum

• Effect of head turning on binaural cues

• Suppression of reflections

Page 227: Communication Acoustics Karjalainen

Binaural cues

• Interaural Time Difference (ITD)

Page 228: Communication Acoustics Karjalainen

Frequency dependence of ITD

Figure: left-ear and right-ear signals with the ITD marked

• Low frequencies (about 200 to 1600 Hz): time delay of the carrier

• High frequencies (above about 1600 Hz): time delay of the envelope

Page 229: Communication Acoustics Karjalainen

Modeling of ITD

Figure: delay-line model with delay elements τ, fed from the left ear and from the right ear; outputs labeled right, center, left

© M. Karjalainen

Page 230: Communication Acoustics Karjalainen

Modeling of ITD

Figure: block diagram with GTFB (one per ear), half-wave rectification, low-pass filtering, and IACC blocks; outputs labeled ITD, IACC spectrum, and composite IACC
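The cross-correlation idea behind such ITD models can be illustrated on a single broadband channel. This is a simplified sketch under stated assumptions: there is no filter bank, rectification, or low-pass filtering, the signals are synthetic, and the function names are hypothetical.

```python
import random

def cross_correlation(left, right, lag):
    """Sum of left[n] * right[n + lag] over the overlapping samples."""
    total = 0.0
    for n, value in enumerate(left):
        m = n + lag
        if 0 <= m < len(right):
            total += value * right[m]
    return total

def estimate_itd(left, right, sample_rate, max_lag):
    """ITD in seconds; positive when the right-ear signal lags the left."""
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: cross_correlation(left, right, lag))
    return best_lag / sample_rate

# Synthetic example: the right-ear signal is the left-ear noise delayed by 5 samples.
rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(400)]
right = [0.0] * 5 + left[:-5]
itd = estimate_itd(left, right, sample_rate=48000, max_lag=40)
print(f"estimated ITD = {itd * 1e6:.1f} microseconds")
```

In a fuller model the same lag search would be run per auditory channel after the filter bank, which gives the ITD as a function of frequency.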

Page 231: Communication Acoustics Karjalainen

Cross-correlation on ERB channels

Figure: band cross correlation functions (values 0 to 1, lag axis −1 to 1) for channels from 200 Hz to 21 kHz (labels at 200 Hz, 800 Hz, 1.5 kHz, 3 kHz, 10 kHz, 21 kHz), with directions from 20° to 90° marked

Page 232: Communication Acoustics Karjalainen

Frequency dependence of ITD

Figure: ITD [ms] (−1 to 1) as a function of direction [degree] (−90 to 90) and frequency [kHz] (0.2 to 18.2)

Page 233: Communication Acoustics Karjalainen

Binaural cues

• Interaural Level Difference (ILD)
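As a simple broadband illustration, an ILD can be computed as the ratio of the RMS levels of the two ear signals in decibels. This sketch ignores the frequency dependence and the loudness-based modeling used in auditory models; the function names and the sign convention (positive when the left ear is louder) are assumptions.

```python
import math

def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def ild_db(left, right):
    """Broadband ILD in dB; positive when the left-ear signal is stronger."""
    return 20.0 * math.log10(rms(left) / rms(right))

# Synthetic example: the right-ear signal has half the amplitude of the left.
left = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(200)]
right = [0.5 * v for v in left]
print(f"ILD = {ild_db(left, right):.2f} dB")
```

Halving the amplitude at one ear corresponds to a level difference of about 6 dB.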

Page 234: Communication Acoustics Karjalainen

Modeling of ILD

Figure: block diagram with GTFB (one per ear) and loudness level spectrum blocks; signals labeled LL, CLL, ILD, and composite

Page 235: Communication Acoustics Karjalainen

Frequency dependence of ILD

Figure: ILD [phon] (−60 to 60) as a function of direction [degree] (−90 to 90) and frequency [kHz] (0.2 to 18.2)

Page 236: Communication Acoustics Karjalainen

Cone of confusion

Figure: cone of confusion with the sound source and the angles ϕcc and θcc

• ITD and ILD determine on which cone of confusion the sound source lies

– effect of the pinna and the body

– head turning

Page 237: Communication Acoustics Karjalainen

Effect of head turning on binaural cues

Figure: head rotation with three source positions

• ITD & ILD change a lot

• ITD & ILD constant

• ITD & ILD change a lot in the opposite direction

– a coarse cue

Page 238: Communication Acoustics Karjalainen

Effect of the body

Pinna, head, body

Spectrum changes, ILD changes

Page 239: Communication Acoustics Karjalainen

Effect of the pinna

• The cavities of the pinna color the sound depending on the direction of arrival

Page 240: Communication Acoustics Karjalainen

Effect of elevation on the spectrum

Figure (panels 1 and 2): loudness level spectrum [phon] (−30 to 30) as a function of frequency [kHz] (0.2 to 18.2) and elevation [degrees] (−30 to 90)

Auditory spectrum in the median plane, with the direction-independent part averaged out.

Page 241: Communication Acoustics Karjalainen

Effect of elevation on the spectrum

Figure (panels 3 and 4): loudness level spectrum [phon] (−30 to 30) as a function of frequency [kHz] (0.2 to 18.2) and elevation [degrees] (−30 to 90)

Auditory spectrum in the median plane, with the direction-independent part averaged out.

Page 242: Communication Acoustics Karjalainen

Reliability of cues

If the cues conflict:

• Signal spectrum below about 1000 Hz

– ITD usually the strongest

– ILD weak, trading?

• Higher frequencies

– ITD and ILD both strong

– ILD sometimes stronger

• The more consistent cue wins [Wightman]

• Several direction percepts may arise

• Size of the sound source

• Individuality

Page 243: Communication Acoustics Karjalainen

Physiology of directional hearing

© Kalat 1998

Page 244: Communication Acoustics Karjalainen

Precedence effect

The cues are relevant only when the direct sound dominates

Page 245: Communication Acoustics Karjalainen

Precedence effect

[Figure: direction ϕ of the auditory event (between ϕ = −40° and ϕ = 40°, with ϕ = 0° in the middle) as a function of the delay τph of the sound S_T relative to S_O, for α = 80°; time axis 0 to 2 ms and 20 to 50 ms; labels: first auditory event, echo, echo threshold]

Page 246: Communication Acoustics Karjalainen

Echo detection thresholds

[Figure: level difference L_ST − L_SO (−40 to 40 dB) as a function of the delay of S_T (0 to 100 ms), with four curves:]

– First auditory event no longer distinguishable (primary sound suppressed) (≥ 6 subjects)

– First auditory event and echo equally loud (≥ 6 subjects)

– Echo disturbing (80 subjects)

– Masking threshold (1-2 subjects)

Page 247: Communication Acoustics Karjalainen

Effect of loudness in the free field

[Figure: distance of the auditory event / m (0 to 8) as a function of the distance of the sound source / m (0 to 10); average of five subjects]

© M. Karjalainen

Page 248: Communication Acoustics Karjalainen

Distance perception

Cues

• Loudness

• Binaural cues

• Ratio of direct sound to the reverberant field

• Spectrum

Page 249: Communication Acoustics Karjalainen

Spatial sound reproduction methods

• Conventional reproduction

– Monophony

– Stereophony

– Multichannel 2-D

– Multichannel 3-D

• Binaural reproduction

– Headphones

– Loudspeakers, crosstalk cancellation

Page 250: Communication Acoustics Karjalainen

Monophonic reproduction

Page 251: Communication Acoustics Karjalainen

Stereophonic reproduction

Page 252: Communication Acoustics Karjalainen

“Surround” reproduction

Page 253: Communication Acoustics Karjalainen

3-D multichannel reproduction

Page 254: Communication Acoustics Karjalainen

Binaural reproduction

[Figure: binaural recording and reproduction chain with the signals x_m, y_l, y_r, ŷ_l, ŷ_r and the transfer functions H_l, H_r, H_i, H_c]

© M. Karjalainen

In its simplest form, a dummy-head or real-head recording is listened to over headphones.

Page 255: Communication Acoustics Karjalainen

Binaural reproduction

[Figure: block diagrams (a)–(d) with panel labels mono, stereo, binaural and transaural; signals x_m, x_l, x_r, y_l, y_r, ŷ_l, ŷ_r; blocks built from the transfer functions H_l, H_r, H_i and H_c, including the sum and difference terms H_i + H_c, H_i − H_c, 1/(H_i + H_c), 1/(H_i − H_c), H_l + H_r and H_l − H_r]

© M. Karjalainen

In its simplest form, a dummy-head or real-head recording is listened to over headphones.

Page 256: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 9: Auditory modeling

• Simple psychoacoustic models

– Psychoacoustic spectrum and spectrogram

– Mel spectrum and cepstrum

– Perceptual linear prediction

– Examples of auditory spectra

• Auditory filter bank models

– Gammatone filterbanks

– Inner ear simulation models

– Temporal dynamics and masking

• Cochlear models

– Basilar membrane models

– Hair cell models

• Modeling of higher level functions

– Pitch and periodicity analysis

– Speech specific models

– Computational auditory scene analysis

• Binaural auditory modeling

Page 257: Communication Acoustics Karjalainen

M. Karjalainen2

Simple psychoacoustic modeling

• Problems with Fourier spectrum from auditory perception viewpoint:

– Linear frequency scale vs. critical band scale

– Level (dB) vs. loudness scaling

– Frequency bins vs. spreading and masking

– Flat response vs. equal loudness sensitivity

– Windowing vs. temporal integration and masking

– Temporal adaptation in auditory perception
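The first mismatch in the list above, a linear frequency scale versus the critical band scale, can be illustrated with a frequency-to-Bark conversion. This is a minimal sketch that assumes the Zwicker and Terhardt closed-form approximation; the slides do not state which formula the course uses, and the function name `hz_to_bark` is illustrative only.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate in Bark for a frequency in Hz.

    Assumption: Zwicker-Terhardt closed-form approximation.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    # Equal steps in Hz shrink on the Bark scale as frequency grows.
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:6d} Hz -> {hz_to_bark(f):5.2f} Bark")
```

On this scale the step from 100 Hz to 500 Hz spans several Bark, while the same 400 Hz step near 10 kHz spans only a small fraction of one Bark.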

Page 258: Communication Acoustics Karjalainen

M. Karjalainen3

Auditory spectrum through FFT

Page 259: Communication Acoustics Karjalainen

M. Karjalainen4

Examples of psychoacoustic spectra

• Auditory spectra

– Sinewave (400 Hz)

– White noise

Page 260: Communication Acoustics Karjalainen

M. Karjalainen5

Examples of psychoacoustic spectra

• Vowel /a/ and fricative /s/

– Fourier spectrum vs. auditory spectrum

Page 261: Communication Acoustics Karjalainen

M. Karjalainen6

Mel frequency cepstral coefficients

• MFCC computation

– FFT, mel warping, logarithm, inverse cosine transform
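The four MFCC steps named above can be sketched for a single frame as follows. This is an illustrative sketch, not the course's implementation: the mel formula (2595 log10(1 + f/700)), the Hamming window, the triangular filters, the DCT-II used for the cosine transform step, and all function names and default parameter values are assumptions. A naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum for bins 0..N/2 using a naive DFT (short frames only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose centre frequencies are equally spaced in mel."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            f = k * fs / n_fft
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(frame, fs, n_filters=20, n_coeffs=12):
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = power_spectrum(windowed)                      # step 1: spectrum
    energies = [sum(w * p for w, p in zip(filt, spectrum))   # step 2: mel warping
                for filt in mel_filterbank(n_filters, n, fs)]
    log_e = [math.log(e + 1e-12) for e in energies]          # step 3: logarithm
    # step 4: cosine transform (DCT-II) of the log mel energies
    return [sum(le * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, le in enumerate(log_e))
            for c in range(n_coeffs)]


if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(256)]
    print([round(c, 2) for c in mfcc(frame, fs)])
```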

Page 262: Communication Acoustics Karjalainen

M. Karjalainen7

Filterbank auditory models

• General principle of an auditory filterbank model

Page 263: Communication Acoustics Karjalainen

M. Karjalainen8

Response of a filterbank model (Bark-bank)

• Simple Bark-filterbank by warped filters (Karjalainen)

Page 264: Communication Acoustics Karjalainen

M. Karjalainen9

Gammatone filterbank

• Temporal and magnitude response of one channel

• Filterbank
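The temporal response of one gammatone channel, and the spacing of channels in a filterbank, can be sketched as below. The slide figures are not reproduced in this transcript, so the details are assumptions: a fourth-order gammatone impulse response t^(n−1) e^(−2πbt) cos(2πf_c t), the Glasberg and Moore ERB approximation with b = 1.019 ERB, and centre frequencies spaced evenly on the ERB-rate scale. All function names are hypothetical.

```python
import math


def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Peak-normalised gammatone impulse response for centre frequency fc."""
    b = 1.019 * erb_hz(fc)  # bandwidth parameter (assumed scaling)
    ir = []
    for i in range(int(duration * fs)):
        t = i / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def erb_spaced_centres(f_lo, f_hi, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    def erb_rate(f):
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def inverse(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = erb_rate(f_lo), erb_rate(f_hi)
    return [inverse(lo + (hi - lo) * k / (n - 1)) for k in range(n)]


if __name__ == "__main__":
    fs = 16000
    centres = erb_spaced_centres(100.0, 6000.0, 8)
    bank = [gammatone_ir(fc, fs) for fc in centres]
    print([round(fc) for fc in centres], len(bank[0]))
```

Filtering a signal with each impulse response in `bank` (by convolution) would give the channel outputs of a simple filterbank model.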

Page 265: Communication Acoustics Karjalainen

M. Karjalainen10

Neural adaptation

• Neural adaptation model by Dau et al.

– Automatic gain control feedbacks

Page 266: Communication Acoustics Karjalainen

M. Karjalainen11

Temporal processing

• Adaptation, temporal integration, and masking model (Karjalainen)

– Neural feedback model

– Adaptation (AGC)

– Loudness (level) computation

– Temporal masking effect

Page 267: Communication Acoustics Karjalainen

M. Karjalainen12

Responses

• Excitation, firing rate response, and loudness level response

Page 268: Communication Acoustics Karjalainen

M. Karjalainen13

Basilar membrane traveling wave model

• Principle of approximating basilar membrane traveling wave propagation

Page 269: Communication Acoustics Karjalainen

M. Karjalainen14

Meddis hair cell model

• Processing of neurotransmitter in the hair cell

Page 270: Communication Acoustics Karjalainen

M. Karjalainen15

Periodicity analysis (Meddis)

• Computation of sum autocorrelation function (SACF)
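The summing step of the SACF can be sketched as follows. This is a simplified illustration, not the Meddis model itself: the filterbank and hair cell stages are replaced by two hand-made half-wave rectified channel signals, and the function names and the peak-picking rule are assumptions.

```python
import math


def autocorr(x, max_lag):
    """Autocorrelation for lags 0..max_lag over a fixed-length window."""
    window = len(x) - max_lag
    return [sum(x[i] * x[i + lag] for i in range(window)) for lag in range(max_lag + 1)]


def sacf(channels, max_lag):
    """Sum the per-channel autocorrelation functions lag by lag."""
    acfs = [autocorr(ch, max_lag) for ch in channels]
    return [sum(acf[lag] for acf in acfs) for lag in range(max_lag + 1)]


def pitch_from_sacf(summary, fs, min_lag):
    """Pick the lag of the largest SACF value at or above min_lag."""
    best = max(range(min_lag, len(summary)), key=lambda lag: summary[lag])
    return fs / best


if __name__ == "__main__":
    fs = 8000
    # Stand-ins for two channel outputs: harmonics 2 and 3 of a 100 Hz tone,
    # half-wave rectified as a crude hair cell model (assumption).
    channels = [
        [max(0.0, math.sin(2.0 * math.pi * f * i / fs)) for i in range(800)]
        for f in (200.0, 300.0)
    ]
    summary = sacf(channels, max_lag=120)
    print(pitch_from_sacf(summary, fs, min_lag=20))
```

Neither channel contains energy at 100 Hz, but the two autocorrelation functions share a peak at the common period of 10 ms, so the summed function points to the missing fundamental.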

Page 271: Communication Acoustics Karjalainen

M. Karjalainen16

Periodicity analysis example

• Signal, filterbank responses, cochleagrams, sum autocorrelation for speech

Page 272: Communication Acoustics Karjalainen

M. Karjalainen17

Auditory spectrum vs. auditory formant spectrum

• Example of vowel /ä/ and fricative /s/

Page 273: Communication Acoustics Karjalainen

M. Karjalainen18

Auditory representation of speech

• Example of vowel transitions /...iaiai.../

– Auditory spectrogram

– Auditory formant spectrogram

Page 274: Communication Acoustics Karjalainen

M. Karjalainen19

Applications of auditory modeling

• Audio coding

– Psychoacoustic or perceptual models of masking

• Sound quality modeling

– Modeling of perceived differences

– Criteria for audio reproduction

– Binaural audio quality

• Speech recognition

– Advanced front-end models

• Advanced hearing aids

– Cochlear implants

Page 275: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 10: Sound quality

• Effects of sound:

– Physical effects (generally meaningless)

– Physiological effects (hearing loss)

– Information and knowledge effects (communication)

– Esthetic and emotional effects (communication)

• Concept of quality in general:

– Quality as contrast to quantity (categorical dissimilarity)

– Quality on scale low-Q vs. high-Q (measure of preference)

• Speech intelligibility and quality

• Sound quality of concert halls and auditoria

• Sound quality in audio reproduction

• Noise quality

• Product sound quality

Page 276: Communication Acoustics Karjalainen

M. Karjalainen2

Evaluation and measurement of sound quality

• Sound quality is a fundamentally subjective (perceptual) concept, but it can be approximated by objective and computational criteria

• Subjective quality can be evaluated by listening experiments, for example:

– Compare to a ’perfect quality’ reference to find out whether any degradation can be noticed

– Compare two or more sounds and sort them by quality preference

– Characterize sound quality by conceptual description (such as not annoying, slightly annoying, annoying, very annoying)

– Give an overall quality rating on a numerical scale

– Give a rating for a specific quality factor (numerical scale)

– Give quality ratings for several different quality factors (multidimensional scaling)

• Based on subjective experimentation, a computational (objective) measure and model can be derived to simulate the perceived quality

– Objective measures are less laborious and yield high repeatability

– It is important to check the validity range of a model

Page 277: Communication Acoustics Karjalainen

M. Karjalainen3

Development of sound quality models and theories

Theories and models in general

Computational models

Computational models with reference

Page 278: Communication Acoustics Karjalainen

M. Karjalainen4

Intelligibility and quality of speech

• Intelligibility of speech in general depends on:

– the ability of a speaker to produce intelligible message and clear speech

– quality of speech transmission medium (acoustic or technical)

– the ability of a listener to analyze and conceive the message

• Technical concept of speech intelligibility:

– related to the quality of transmission channel

– developed since 1920’s (Harvey Fletcher, Bell Labs)

• Articulation

– score of correct recognition of phones and (nonsense) phone sequences

– articulation index is a measure that is additive from frequency bands (like loudness adds from critical band specific loudnesses)

• Speaker identification score

– quality of channel to convey speaker identity

• Naturalness of speech

– particularly in speech synthesis (and coding)

Page 279: Communication Acoustics Karjalainen

M. Karjalainen5

Speech quality: subjective measures and methods

• Articulation tests and articulation score

– /CV/ or /CVC/ sequences used to measure recognition percentage

• Intelligibility test and intelligibility score

– recognition percentage using meaningful words or sentences

• Rhyme tests (RT)

– using ’rhyme’ words or syllables (in Finnish: /patti/, /tatti/, /katti/)

• Diagnostic rhyme tests (DRT)

– modifying single distinctive feature at a time (nasality, voicing, etc.) in RT

• Speech interference tests (find the level of disturbing noise that gives 50% articulation)

• Quality comparison method, including pairwise comparison methods

– ordering of sound examples by overall or specific quality factor

• Mean opinion score (MOS)

– overall rating on 1–5 scale

• Other methods

– Indirect judgement tests (PARM, QUART)

– Communicability tests (communicate a drawing task, measure the difficulty)

– Task recall tests (memorizing ability)

– Analytic measures (multidimensional scaling)

Page 280: Communication Acoustics Karjalainen

M. Karjalainen6

Speech quality: objective measures and methods

• Articulation index (AI)

– for measuring a (linear) speech transmission channel with additive noise

– articulation loss is assumed to be additive from 20 frequency band AI values

• Percentage articulation loss of consonants (%ALcons)

– measure of speech intelligibility, can be estimated from acoustic properties of a room

• Room acoustical indices, see below

• Speech transmission index (STI, RASTI)

– based on modulation transfer function, see below

• Signal-to-noise ratio (SNR)

– ratio of speech vs. noise (power) level (in dB)

– segmental SNR (SNRseg) based on short-time segmental SNRs

• Spectral distance measures (distance measures in the frequency domain)

• Auditory sound quality measures (based on auditory modeling)

• Other methods

– weighted spectral slope distance

– LPC (linear prediction) distance measure
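Of the objective measures listed above, SNR and segmental SNR are the simplest to write down. The sketch below treats the difference between a clean and a degraded signal as the noise. The frame length and the clamping of per-frame values to the range −10...35 dB are common practice but are assumptions here, as are the function names.

```python
import math


def snr_db(clean, degraded):
    """Overall SNR in dB; the noise is taken to be clean minus degraded."""
    signal = sum(s * s for s in clean)
    noise = sum((s - d) ** 2 for s, d in zip(clean, degraded))
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(clean, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs, each clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        d = degraded[start:start + frame_len]
        signal = sum(s * s for s in c)
        noise = sum((s - x) ** 2 for s, x in zip(c, d))
        if signal == 0.0 or noise == 0.0:
            continue  # skip silent or error-free frames
        frame_snr = 10.0 * math.log10(signal / noise)
        values.append(min(max(frame_snr, floor_db), ceil_db))
    if not values:
        raise ValueError("no usable frames")
    return sum(values) / len(values)


if __name__ == "__main__":
    fs = 8000
    clean = [math.sin(2.0 * math.pi * 200.0 * i / fs) for i in range(1600)]
    degraded = [1.1 * s for s in clean]  # 10 % amplitude error in every sample
    print(snr_db(clean, degraded), segmental_snr_db(clean, degraded))
```

Because the per-frame values are averaged in dB, segmental SNR weights quiet passages as heavily as loud ones, which the overall SNR does not.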

Page 281: Communication Acoustics Karjalainen

M. Karjalainen7

MOS (mean opinion score)

• A very popular technique to quantify overall quality in speech and audio

• Combines a quantitative scale and qualitative categorizations

• Three sorts of MOS measures used:

– MOS = (direct) evaluation on 1–5 scale

– DMOS = degradation MOS (how much signal is degraded)

– CMOS = comparative MOS (typically scale -3...+3)

• Sometimes a scale of 1–10 by step of 0.1 is used instead

• Basic MOS scaling:

Rating   Quality (MOS)   Degradation (DMOS)

5        Excellent       Not noticeable

4        Good            Just noticeable, not disturbing

3        Fair            Noticeable, slightly disturbing

2        Poor            Disturbing but tolerable

1        Bad             Very disturbing

Page 282: Communication Acoustics Karjalainen

M. Karjalainen8

Modulation transfer function

• The auditory system analyzes signals by critical bands

• Each band is analyzed by signal level, i.e., modulation envelope

• More important than the exact transfer function is the modulation transfer function, i.e., how signal modulations in each critical band are transmitted

• The auditory system is most sensitive to modulations of about 4 Hz

• Modulation transfer is degraded by:

– Reverberation (lowpass of modulation)

– Background noise (reduction of relative modulation)

– These effects are multiplicative (cascaded)

• Modulation transfer function is a mathematically motivated approximation of auditorily relevant signal transfer analysis
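The two degradations listed above and their cascading can be sketched numerically. The expressions below are the commonly cited ones for an ideal exponential reverberant decay and for stationary noise; they are assumed here, since the slide figures are not reproduced in this transcript, and the function names are illustrative.

```python
import math


def mtf_reverberation(f_mod, rt60):
    """Lowpass effect of reverberation on a modulation frequency f_mod (Hz).

    Assumes an ideal exponential decay with reverberation time rt60 (s).
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)


def mtf_noise(snr_db):
    """Reduction of relative modulation caused by stationary background noise."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def mtf(f_mod, rt60, snr_db):
    """The two effects are multiplicative (cascaded)."""
    return mtf_reverberation(f_mod, rt60) * mtf_noise(snr_db)


if __name__ == "__main__":
    for f_mod in (0.63, 1.0, 2.0, 4.0, 8.0, 12.5):
        print(f"{f_mod:5.2f} Hz  m = {mtf(f_mod, rt60=1.5, snr_db=10.0):.3f}")
```

With these expressions a longer reverberation time lowers m mainly at high modulation frequencies, while noise lowers m equally at all modulation frequencies.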

Page 283: Communication Acoustics Karjalainen

M. Karjalainen9

Modulation transfer function (2)

Page 284: Communication Acoustics Karjalainen

M. Karjalainen10

Modulation transfer function (3)

Page 285: Communication Acoustics Karjalainen

M. Karjalainen11

Modulation transfer function (4): STI

• Total effect on modulation transfer function

• Apparent SNRapp vs. modulation reduction

• Speech transmission index (STI), for each band:

– STI = 1.0 for SNRapp ≥ 15 dB

– STI = 0.0 for SNRapp ≤ −15 dB

– otherwise STI = m, see also next figure

– (Weighted) average of SNRapp values of bands is computed and converted to total STI
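The conversion from modulation reduction to a single index can be sketched as below. The sketch assumes the standard relations: apparent SNR = 10 log10(m / (1 − m)), clipping to ±15 dB, and a linear mapping (SNR + 15) / 30 onto the range 0...1. Band weights are left to the caller, and the function names are hypothetical.

```python
import math


def apparent_snr_db(m):
    """Apparent SNR in dB for a modulation reduction factor m (0 < m < 1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    return 10.0 * math.log10(m / (1.0 - m))


def clip_db(snr_db, limit=15.0):
    return max(-limit, min(limit, snr_db))


def sti_from_modulation(m_values, weights=None):
    """(Weighted) mean of clipped apparent SNRs, mapped linearly to 0..1."""
    if weights is None:
        weights = [1.0] * len(m_values)
    snrs = [clip_db(apparent_snr_db(m)) for m in m_values]
    mean_snr = sum(w * s for w, s in zip(weights, snrs)) / sum(weights)
    return (mean_snr + 15.0) / 30.0


if __name__ == "__main__":
    print(sti_from_modulation([0.9, 0.8, 0.6, 0.5, 0.4]))
```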

Page 286: Communication Acoustics Karjalainen

M. Karjalainen12

Modulation transfer function (5)

Page 287: Communication Acoustics Karjalainen

M. Karjalainen13

STI vs. speech intelligibility

Page 288: Communication Acoustics Karjalainen

M. Karjalainen14

RASTI vs. STI

• RASTI = Rapid STI

• Partial evaluation of frequency bands & modulation bands used

• Specific RASTI instrument available for speech acoustics evaluation

Page 289: Communication Acoustics Karjalainen

M. Karjalainen15

Percentage articulation loss of consonants (%ALcons)

• Estimate of speech intelligibility

• %ALcons can be estimated

• where

– r = distance of source and listener

– RT = reverberation time

– V = room volume

– Q = directivity of a sound source

– k = constant (for individual listener) = 1.5 ... 12.5 %

• %ALcons can also be estimated from room measurements

• %ALcons up to 25...30% can be tolerated in meaningful speech due to information redundancy
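An estimate from the quantities r, RT, V, Q and k defined above can be sketched as follows. The sketch assumes the commonly cited metric form of the Peutz formula, %ALcons = 200 r² RT² / (V Q) + k, with the room term capped at 9 RT at large distances. Whether this is exactly the expression intended on the slide is not confirmed by the transcript, and the function name and default k are illustrative.

```python
def alcons_percent(r_m, rt_s, volume_m3, q, k_percent=1.5):
    """Estimated percentage articulation loss of consonants.

    Assumption: Peutz-type formula in metric units; the room term is
    capped at 9 * RT, its commonly quoted far-distance limit.
    """
    room_term = 200.0 * r_m ** 2 * rt_s ** 2 / (volume_m3 * q)
    return min(room_term, 9.0 * rt_s) + k_percent


if __name__ == "__main__":
    # Hypothetical room: 3000 m^3, RT = 1.5 s, source directivity Q = 2.
    for r in (5.0, 10.0, 20.0):
        print(r, round(alcons_percent(r, 1.5, 3000.0, 2.0), 1))
```

The quadratic dependence on both distance and reverberation time is the main point: halving either one cuts the room term to a quarter.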

Page 290: Communication Acoustics Karjalainen

M. Karjalainen16

Sound quality in concert halls (and performing spaces)

• Esthetic effects very important

– communication by esthetic and emotional factors

• ’Good acoustics’ depends on type of music

– for example tempo, mixture of instruments (size of orchestra)

• Many factors to be taken into account

– multidimensional scaling of quality needed

• Different proposed theories and models exist

– no full agreement upon indices and factors of quality

• Visual factors also very prominent in concert halls

– a concert is a multimodal experience to most listeners

• It is not only the audience but also the musicians

– stage acoustics is important as well

• Theaters and other performing spaces

– may require different acoustics

• Active (electroacoustically created or enhanced) acoustics

– used increasingly except for classical acoustic music

Page 291: Communication Acoustics Karjalainen

M. Karjalainen17

Sound quality in concert halls: (1) subjective indices

• Intimacy or presence

• Reverberation (subjective)

• Spaciousness (apparent source width, listener envelopment)

• Clarity (separation of sounds and sources)

• Warmth (level and reverberation at low frequencies)

• Loudness

• Acoustic glare (walls should not reflect like mirrors)

• Brilliance (due to long reverberation at high frequencies)

• Balance (how sound sources (instruments) are balanced)

• Blend (how instruments are mixed harmonically)

• Ensemble (how musicians can play together)

• Immediacy of response (from the hall back to musicians)

• Texture (how early reflections arrive to listeners)

• Freedom from echo (discrete echoes are highly undesirable)

• Dynamic range (useful range of playing levels)

• Extraneous effects on tonal quality (no extra sounds desired)

• Uniformity of sound (quality should be equal in all positions)

Page 292: Communication Acoustics Karjalainen

M. Karjalainen18

Sound quality in concert halls: (2) objective measures

• Loudness

– Gmid (sound level at mid frequencies)

• Reverberation time

– RT60 (decay time of 60 dB for full hall)

– EDT (early decay time, 0–10 dB scaled to correspond to 60 dB)

• Clarity

– Early vs. late energy ratio C80 (empty hall)

• Spaciousness

– IACCearly (interaural cross-correlation, early)

– LFearly (lateral energy fraction, early)

• Envelopment

– IACClate and visual inspection of surface irregularity

• Intimacy

– ITDG (initial time delay gap)

• Warmth

– BR (bass ratio, full hall)

• Stage support

– Early energy (20-100 ms), sound source on the stage 1 m from the microphone

Page 293: Communication Acoustics Karjalainen

M. Karjalainen19

Objective sound quality in concert halls: definitions

• Interaural cross-correlation function IACFt(τ) from pressure signals of left and right ears

• Interaural cross-correlation, max of IACFt(τ)

• Lateral energy fraction (LF or LEF)

• Gain factor (level vs 10 m free field distance level)

• Bass ratio

• Stage support

Page 294: Communication Acoustics Karjalainen

M. Karjalainen20

Early vs. late ratios

Clearness

Centertime
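Both kinds of measure named on this slide can be computed from a measured room impulse response. The sketch below assumes the usual definitions: an early-to-late energy ratio with the split at 80 ms (C80, in dB) and the centre time as the energy-weighted mean arrival time. The function names are illustrative, and the impulse response in the example is a synthetic exponential decay, not measured data.

```python
import math


def clarity_c80_db(h, fs):
    """Early-to-late energy ratio in dB with the split at 80 ms."""
    split = int(round(0.080 * fs))
    early = sum(v * v for v in h[:split])
    late = sum(v * v for v in h[split:])
    return 10.0 * math.log10(early / late)


def centre_time_s(h, fs):
    """Energy-weighted mean arrival time (centre of gravity of h squared)."""
    energy = [v * v for v in h]
    return sum((i / fs) * e for i, e in enumerate(energy)) / sum(energy)


if __name__ == "__main__":
    fs = 8000
    rt60 = 1.0
    # Synthetic decay: the amplitude envelope falls 60 dB in rt60 seconds.
    h = [math.exp(-3.0 * math.log(10.0) * n / (fs * rt60)) for n in range(2 * fs)]
    print(round(clarity_c80_db(h, fs), 2), "dB")
    print(round(1000.0 * centre_time_s(h, fs), 1), "ms")
```

For an ideal exponential decay both values follow from the reverberation time alone; real halls differ because of their early reflection patterns, which is why these measures are reported separately from RT60.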

Page 295: Communication Acoustics Karjalainen

M. Karjalainen21

Audio sound quality

• HiFi (High Fidelity) vs. professional reproduction

• Good quality is defined indirectly, by the absence of degradations

• Degradations & distortions:

– Linear distortion

– Nonlinear distortion

– Transient distortion

– Noise & quantization noise (SNR)

– Spatially poor reproduction

Page 296: Communication Acoustics Karjalainen

M. Karjalainen22

Perception of audio reproduction

• Phase in audio reproduction

– Group delay differences of about 1 ms are noticed in extreme cases

• In high-Q-value cases even much lower differences

– Group delay differences of about 2 ms become noticeable in critical listening (about 60 cm of propagation distance difference)

– 5-10 ms group delay differences may start to be disturbing

– Even 50-100 ms group delay errors may be tolerable sometimes

– In spatial sound perception (Chapter 8): precedence effect

• Perception of distortion

– Linear distortion = magnitude and phase distortion

– Nonlinear distortion = new spectral components are produced

Page 297: Communication Acoustics Karjalainen

M. Karjalainen23

Nonlinear distortion

• Nonlinear distortion

– In a nonlinear system a sine wave generates harmonics:

– If total rms level is:

– Then harmonic distortion (HM):

– HM is not a particularly good measure from a perceptual point of view

– Low-order HM may improve perceived quality

– JND: 1% for 2nd, 0.3% for 3rd, 0.1-0.3% for 4th harmonic
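A harmonic distortion figure of this kind can be computed from the amplitudes of the harmonics in the output of a system driven by a sine wave. The sketch assumes the common definition in which the rms of harmonics 2, 3, ... is divided by the total rms of all components including the fundamental. The slide's own equations are not reproduced in this transcript, so this is an assumed form, and the function names are hypothetical.

```python
import math


def harmonic_amplitudes(x, fs, f0, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics of f0 by single-bin correlation.

    Assumes x contains a whole number of periods of f0.
    """
    n = len(x)
    amplitudes = []
    for h in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * h * f0 / fs
        re = sum(v * math.cos(w * i) for i, v in enumerate(x))
        im = sum(v * math.sin(w * i) for i, v in enumerate(x))
        amplitudes.append(2.0 * math.hypot(re, im) / n)
    return amplitudes


def harmonic_distortion(amplitudes):
    """rms of harmonics 2.. divided by the total rms of all components."""
    total = math.sqrt(sum(a * a for a in amplitudes))
    harmonics = math.sqrt(sum(a * a for a in amplitudes[1:]))
    return harmonics / total


if __name__ == "__main__":
    fs, f0 = 8000, 100.0
    w = 2.0 * math.pi * f0 / fs
    # Test signal: fundamental plus 1 % second and 0.3 % third harmonic.
    x = [math.sin(w * i) + 0.01 * math.sin(2 * w * i) + 0.003 * math.sin(3 * w * i)
         for i in range(800)]
    amps = harmonic_amplitudes(x, fs, f0, 5)
    print(f"{100.0 * harmonic_distortion(amps):.2f} %")
```

The example signal sits at the JND values quoted above for the 2nd and 3rd harmonics, which illustrates why a single percentage is a weak perceptual measure: the same figure can be reached with harmonics of very different audibility.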

Page 298: Communication Acoustics Karjalainen

M. Karjalainen24

Audio distortion mechanisms

• Other distortion mechanisms:

– Intermodulation distortion (IM)

• Sine waves of frequencies f1 and f2 generate f1 – f2, f1 + f2, etc.

• IM describes perceived distortion better than HM

– Transient intermodulation distortion (TIM)

• Distortion that is created in fast transients but not in steady state signals

– Quantization noise in digital signal processing

• Perceived as distortion if correlated with the signal

• Perceived as noise if not correlated

– Pre-echo in audio coding

• Temporal spreading of a signal in time ”backwards”

– Perceptual criteria needed in digital audio instead of simple distortion and SNR measures

Page 299: Communication Acoustics Karjalainen

M. Karjalainen25

Perceptual (objective) sound quality models for audio

• Schroeder et al.:

• Karjalainen:

Page 300: Communication Acoustics Karjalainen

M. Karjalainen26

PAQM (perceptual audio quality measure)

Page 301: Communication Acoustics Karjalainen

M. Karjalainen27

Product sound quality

• Minimize negative effects and maximize positive effects of product sound

• Examples:

– Cars and work machines

– Home appliances

– Office equipment

– Personal devices

• Computational models of product sound quality

Page 302: Communication Acoustics Karjalainen

M. Karjalainen1

Chapter 11: Technical audiology

• How do we hear ? (discussed already)

• What if we don’t hear?

– Why don’t we hear? (mechanisms)

– How to measure ? (audiometry)

– How to improve hearing? (hearing aids)

• Technical devices:

– Audiometric equipment

– Hearing aids

– Cochlear implants

Page 303: Communication Acoustics Karjalainen

M. Karjalainen2

Hearing degradation I

• Hearing disabled population

– WHO: 270 million hearing disabled in the world (5 %)

– In Finland: ~740 000 with hearing degradation

– 14 000 new hearing device fittings per year

• Categories of handicap

– Disease (sairaus)

– Impairment (vaurio)

– Disability (toimintavajavuus)

– Handicap (haitta)

• Hearing disorders: social classification

– Hard-of-hearing persons (huonokuuloinen)

– Deafened persons (kuuroutunut)

– Deaf persons (kuuro)

Page 304: Communication Acoustics Karjalainen

M. Karjalainen3

Hearing degradation II

• Medical classification of hearing impairments

– Conductive hearing loss (äänen johtumisvika)

• External and middle ear problems

• Attenuation of loudness

– Sensorineural hearing loss

• Inner ear and retrocochlear problems

• Attenuation or recruitment

• Tinnitus

– Central hearing loss

• Higher neural levels

• Problems in sound separation or speech analysis

• Problems in localization (spatial separation)

• Tinnitus

– Psychic hearing problems

• No clear physiological reason

Page 305: Communication Acoustics Karjalainen

M. Karjalainen4

Hearing threshold change

Page 306: Communication Acoustics Karjalainen

M. Karjalainen5

Audiometry

Audiometer and calibrated headphones

Page 307: Communication Acoustics Karjalainen

M. Karjalainen6

Audiogram behavior

Loud noise effect

(impulse noise)

Effect of age

(presbyacusis)

Page 308: Communication Acoustics Karjalainen

M. Karjalainen7

Degrees of hearing impairment

• Measure of hearing degradation

– Average of threshold values at 500, 1000, 2000, 4000 Hz
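The averaging rule above can be written out directly. This is a minimal sketch: the audiogram is represented as a hypothetical dictionary from frequency in Hz to hearing threshold in dB, and the example values are made up for illustration.

```python
PTA_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def pure_tone_average(audiogram):
    """Mean hearing threshold (dB) over 500, 1000, 2000 and 4000 Hz."""
    return sum(audiogram[f] for f in PTA_FREQUENCIES_HZ) / len(PTA_FREQUENCIES_HZ)


if __name__ == "__main__":
    # Illustrative thresholds only; frequencies outside the four are ignored.
    audiogram = {250: 10, 500: 20, 1000: 30, 2000: 40, 4000: 50, 8000: 60}
    print(pure_tone_average(audiogram), "dB")
```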

Page 309: Communication Acoustics Karjalainen

M. Karjalainen8

Other hearing impairment problems

• Other effects of impairment

– Sound separation problems, particularly in noise and reverberation

– Speech communication problems

– Tinnitus

• Source at different levels

• No good treatment known

• Often like sinusoidal tone, but can be hum, broadband noise, pulsation, etc.

Page 310: Communication Acoustics Karjalainen

M. Karjalainen9

Ear drum impedance measurement

Page 311: Communication Acoustics Karjalainen

M. Karjalainen10

Noise and causes of hearing loss

• Noise measurement

– A-weighted equivalent level

– 85 dB long-term daily exposure limit

• Other factors:

– Vibration

– Smoking

– Drugs

– Diseases

– Genetic effects

– Combined = often more than their sum
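The equivalent level mentioned under noise measurement is an energy average, not an arithmetic mean of decibel values. The sketch below assumes a list of A-weighted levels measured over equal-length intervals; the function names are illustrative, and only the 85 dB limit comes from the slide.

```python
import math

DAILY_LIMIT_DB = 85.0  # long-term daily exposure limit quoted on the slide


def equivalent_level_db(levels_db):
    """Energy-average of levels (dB) measured over equal-length intervals."""
    mean_energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


def exceeds_daily_limit(levels_db, limit_db=DAILY_LIMIT_DB):
    return equivalent_level_db(levels_db) > limit_db


if __name__ == "__main__":
    # Hypothetical working day: seven quiet hours and one loud hour.
    hourly_levels = [75.0] * 7 + [95.0]
    print(round(equivalent_level_db(hourly_levels), 1), exceeds_daily_limit(hourly_levels))
```

A single loud hour dominates the result: the arithmetic mean of these values is 77.5 dB, but the equivalent level comes out above the 85 dB limit.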

Page 312: Communication Acoustics Karjalainen

M. Karjalainen11

Inner ear damage

Inner hair cell

damage

Outer hair cell

partial damage

Outer hair cell

full damage

Page 313: Communication Acoustics Karjalainen

M. Karjalainen12

Temporary threshold shift

Page 314: Communication Acoustics Karjalainen

M. Karjalainen13

Hearing protectors

Ear plugs, ear muffs, attenuation

Page 315: Communication Acoustics Karjalainen

M. Karjalainen14

Hearing aid types

Page 316: Communication Acoustics Karjalainen

M. Karjalainen15

Hearing aid response

Typical frequency response

of a traditional hearing aid

Multichannel digital hearing aids:

- each frequency channel programmed separately

Page 317: Communication Acoustics Karjalainen

M. Karjalainen16

Hearing aid gain control

Linear gain + limiter Automatic gain control
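The difference between the two gain control schemes named above can be sketched as static input-output curves in dB. Every number below (gain, maximum output, compression threshold and ratio) is a hypothetical value chosen for illustration, and the function names are not taken from the course material; a real AGC also has attack and release dynamics that this sketch ignores.

```python
def linear_with_limiter(input_db, gain_db=30.0, max_output_db=100.0):
    """Constant gain up to a hard output ceiling."""
    return min(input_db + gain_db, max_output_db)


def agc_output(input_db, gain_db=30.0, threshold_db=50.0, ratio=3.0):
    """Constant gain below the threshold, compressed growth above it."""
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio


if __name__ == "__main__":
    print("input  limiter  AGC")
    for level in range(30, 101, 10):
        print(f"{level:5d}  {linear_with_limiter(level):7.1f}  {agc_output(level):5.1f}")
```

With these example values the limiter leaves levels untouched until the ceiling and then clips, whereas the AGC starts reducing gain earlier and more gradually.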

Page 318: Communication Acoustics Karjalainen

M. Karjalainen17

Hearing aid AGC control

Feedback control Feedforward control

Page 319: Communication Acoustics Karjalainen

M. Karjalainen18

Hearing aid output waveforms

Page 320: Communication Acoustics Karjalainen

M. Karjalainen19

Other issues in hearing aids

• Directional microphones

• Binaural processing

• Noise cancellation

• Wind noise cancellation

• Feedback cancellation

• Speech enhancement

Page 321: Communication Acoustics Karjalainen

M. Karjalainen20

Cochlear implants

• Electronic stimulation of auditory nerve

Page 322: Communication Acoustics Karjalainen

M. Karjalainen21

Cochlear implants II

• ~100 000 units fitted worldwide

• For deafened adults and deaf-born children

• Price about 50 000 $ in USA

• Multielectrode devices nowadays

– (e.g. 24 channels)

– Speech from microphone is divided to channels

– Inductive coupling through skin

– Multielectrode in the cochlea

– Different pulse modulations used