Spectrogram analysis and trackplot


HCS 7367 Speech Perception Lab

Dr. Peter Assmann, Fall 2013

Class web page

http://www.utdallas.edu/~assmann/hcs7367/

• Course information

• Lab details

• Speech demos

• Matlab programs used for class assignments

• Additional resources

Matlab background

Kermit Sigmon, MATLAB Primer, 2nd Edition:

http://www.fi.uib.no/Fysisk/Teori/KURS/WRK/mat/singlemat.html

Getting started with Matlab (The MathWorks): http://www.mathworks.com/help/techdoc/learn_matlab/bqr_2pl.html

UTD IR – Matlab and Simulink: Resources for Getting Started

http://www.utdallas.edu/ir/how-to/ml_help/index.html

Praat : doing phonetics by computer

Download Praat: http://www.fon.hum.uva.nl/praat/

Praat tutorial: http://www.fon.hum.uva.nl/praat/manual/Intro.html

Wavesurfer

Download Wavesurfer: www.speech.kth.se/wavesurfer

Wavesurfer User Manual: www.speech.kth.se/wavesurfer/man.html

Starting with Matlab

Interactive MATLAB Tutorial: http://www.mathworks.com/help/techdoc/learn_matlab/f0-11759.html

http://www.mathworks.com/academia/student_center/tutorials/ml_onramp/player.html?slide=1

Start Matlab and type doc at the Matlab prompt. Click on “Getting Started”; this launches a video in your browser.


Dates for lab assignments

Lab assignment 1: Sept 19

Lab assignment 2: Oct 10

Lab assignment 3: Oct 31

Lab assignment 4: Nov 21

• 3-page reports (with figures) on lab projects

Term project: important dates

Sept 5: Submit project topics

Sept 26: Turn in project outline

Oct 3: Preliminary project presentations

Nov 14/21: Oral presentations

Dec 12: Final project paper due

Examples of topics

• Acoustic analysis and intelligibility of children’s speech

• Neural network models of vowel recognition

• Simulating distortions introduced by hearing loss

• Noise reduction algorithms for hearing aid processors

• Production and perception of foreign accents

• Contribution of prosody to connected speech intelligibility

• Effects of noise, reverberation on speech communication

• Monaural vs. binaural speech understanding in noise

• Development of speech perception in infants

• Models of speech coding in the auditory cortex

Initial stages

• Identify a topic area and read the relevant papers

• Refine your topic; choose a manageable problem

• Set specific goals and define evaluation metric

• Identify the approach to solve the problem

• Start right away.

Finding papers

PubMed search engine:

http://www.ncbi.nlm.nih.gov/entrez/


Find more papers

Find free full-text articles


Finding papers

Journal of the Acoustical Society of America:

http://scitation.aip.org/jasa/

Fundamental frequency (F0)

Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.

Audio demo: the source signal

Source signal for an adult male voice

Source signal for an adult female voice

Source signal for a 10-year-old child

Harmonics are integer multiples of F0 and are evenly spaced in frequency

Harmonicity and Periodicity

• Period: regularly repeating pattern in the waveform

Period duration T0 = 6 ms

F0 = 1 / T0 = 1 / (0.006 s) ≈ 167 Hz

[Figure: waveform (one period, T0 = 6 ms, marked) and amplitude spectrum; amplitude (dB) vs. frequency (kHz)]
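The relation F0 = 1 / T0 and the statement that harmonics are integer multiples of F0 can be checked numerically. This is a minimal sketch in Python (standard library only) rather than the course's Matlab; `f0_from_period` and `harmonic_frequencies` are hypothetical helper names introduced for illustration.

```python
def f0_from_period(t0_ms):
    """Return F0 in Hz for a period T0 given in milliseconds (F0 = 1 / T0)."""
    return 1000.0 / t0_ms


def harmonic_frequencies(f0, max_freq):
    """List the harmonics (integer multiples k * F0, k = 1, 2, ...) up to max_freq Hz."""
    harmonics = []
    k = 1
    while k * f0 <= max_freq:
        harmonics.append(k * f0)
        k += 1
    return harmonics


f0 = f0_from_period(6.0)                 # T0 = 6 ms, as in the waveform example
print(round(f0, 1))                      # 166.7
print(harmonic_frequencies(200, 1000))   # [200, 400, 600, 800, 1000]
```

Because every harmonic is k * F0, neighbouring harmonics are always exactly F0 apart, which is why the lines in the source spectrum are evenly spaced.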

Source properties

In voiced sounds the glottal source spectrum contains a series of lines called harmonics.

The lowest one is called the fundamental frequency (F0).

[Figure: amplitude spectrum of the glottal source; relative amplitude (dB) vs. frequency (Hz), with F0 marked]


Filter properties

The vocal tract resonances (called formants) produce peaks in the spectrum envelope.

Formants are labeled F1, F2, F3, ... in order of increasing frequency.

[Figure: amplitude spectrum (with superimposed LPC spectral envelope); amplitude in dB vs. frequency (kHz), with formants F1, F2, F3 and F4 marked]

Demo: harmonic synthesis

Additive harmonic synthesis: vowel /i/ (.wav)

Cumulative sum of harmonics: vowel /i/ (.wav)

Additive synthesis: “wheel” (.wav)

Cumulative sum of partials (.wav)
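Additive synthesis builds a periodic signal by summing sinusoids at integer multiples of F0. The Python sketch below (standard library only, not the program used to make the demo files) sums cosine harmonics with caller-supplied linear amplitudes; the amplitude values in the usage line are hypothetical and are not the amplitudes used for the /i/ or “wheel” demos.

```python
import math


def additive_synthesis(f0, amplitudes, fs, duration_s):
    """Sum cosine harmonics at k * f0 (k = 1, 2, ...) with the given linear amplitudes.

    Harmonics at or above the Nyquist frequency (fs / 2) are skipped to avoid aliasing.
    """
    n_samples = int(round(fs * duration_s))
    signal = [0.0] * n_samples
    for k, amp in enumerate(amplitudes, start=1):
        freq = k * f0
        if freq >= fs / 2:
            break  # this and all higher harmonics would alias
        for n in range(n_samples):
            signal[n] += amp * math.cos(2 * math.pi * freq * n / fs)
    return signal


# Hypothetical amplitudes for the first five harmonics of a 125 Hz voice
wave = additive_synthesis(125.0, [1.0, 0.8, 0.5, 0.3, 0.2], fs=10000, duration_s=0.5)
print(len(wave))  # 5000
```

A “cumulative sum of harmonics” demo corresponds to calling the function repeatedly with `amplitudes[:1]`, `amplitudes[:2]`, and so on, so that each version adds one more harmonic.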

Vocal tract properties

Resonating tube model

– approximation for neutral vowel (schwa), [ə]

– closed at one end (glottis); open at the other (lips)

– uniform cross-sectional area

– curvature is relatively unimportant

[Figure: uniform tube model (schwa, /ə/): a tube of length L from the glottis to the lips]

[Figure: American English vowel space; height (high, mid, low; axis labeled F1 →) vs. advancement (front, center, back; axis labeled ← F2). Vowels: i “heed”, ɪ “hid”, e “hayed”, ɛ “head”, æ “had”, ʌ “hut”, ɑ “hod”, ɔ “hawed”, o “hoed”, ʊ “hood”, u “who’d”, ə “schwa”]

Acoustic vowel space

[Figure: second formant, F2 frequency (Hz) vs. first formant, F1 frequency (Hz), showing i “heed”, ɑ “hod”, u “who’d” and ə]


Vocal tract model

Quarter-wave resonator:

Fn = (2n – 1) c / (4L)

– Fn is the frequency of formant n in Hz

– c is the velocity of sound in air (about 35000 cm/sec)

– L is the length of the vocal tract (about 17.5 cm for an adult male)

Vocal tract model


Fn = (2n – 1) c / (4L)

– F1 = (2(1) –1)*35000/(4*17.5) = 500 Hz

– F2 = (2(2) –1)*35000/(4*17.5) = 1500 Hz

– F3 = (2(3) –1)*35000/(4*17.5) = 2500 Hz

Note that the vowel /ə/ (‘schwa’) has formants at odd multiples of F1: 500, 1500 and 2500 Hz are 1, 3 and 5 times 500 Hz.
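The quarter-wave formula is easy to turn into a reusable function. This is a Python sketch (standard library only) of the same calculation as the worked example; `quarter_wave_formants` is a hypothetical name, and units follow the notes (c in cm/s, L in cm).

```python
def quarter_wave_formants(c_cm_per_s, length_cm, n_formants=3):
    """Formants of a uniform tube closed at the glottis and open at the lips.

    Fn = (2n - 1) * c / (4 * L), for n = 1 .. n_formants.
    """
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]


print(quarter_wave_formants(35000, 17.5))  # [500.0, 1500.0, 2500.0]
print(quarter_wave_formants(35000, 14.0))  # [625.0, 1875.0, 3125.0]
```

The second call uses a hypothetical 14 cm tube: shortening L raises every formant by the same factor (17.5 / 14 = 1.25), which is the tube-model counterpart of the rule that all formants drop when the vocal tract is lengthened.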

Helium speech

The speed of sound in a helium/oxygen mixture at 20°C is about 93000 cm/s, compared to 35000 cm/s in air. This increases the resonance frequencies but has relatively little effect on F0.

In helium speech, the formants are shifted up but the pitch stays the same.

Helium speech

Using Matlab as a calculator, find the frequencies of F1, F2 and F3 for a 17.5 cm vocal tract producing the vowel /ə/ in a helium/oxygen mixture (velocity c ≈ 93000 cm/s).

Fn = (2n – 1) c / (4L)

F1 = (2*(1) - 1)*93000/(4*17.5) ≈ 1329 Hz

F2 = (2*(2) - 1)*93000/(4*17.5) ≈ 3986 Hz

F3 = (2*(3) - 1)*93000/(4*17.5) ≈ 6643 Hz
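Since every Fn in the quarter-wave formula is proportional to c, the helium formants are simply the air formants multiplied by c_helium / c_air. The Python sketch below (standard library only) uses the two velocities quoted in these notes and the schwa formants from the worked example in air; it models only the tube, so F0, which is set by the vocal folds, is not part of the calculation.

```python
C_AIR = 35000.0     # cm/s, speed of sound in air (value used in these notes)
C_HELIOX = 93000.0  # cm/s, helium/oxygen mixture (value used in these notes)

air_formants = [500.0, 1500.0, 2500.0]  # schwa, 17.5 cm tract, in air

scale = C_HELIOX / C_AIR  # Fn is proportional to c, so all formants scale alike
helium_formants = [f * scale for f in air_formants]

print(round(scale, 2))                      # 2.66
print([round(f) for f in helium_formants])  # [1329, 3986, 6643]
```

The rounded values match the hand calculation above, and the single scale factor explains why helium speech keeps its formant pattern (and pitch) but moves the whole envelope upward.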

Helium speech

Audio demos

– Speech in air

– Speech in helium

– Pitch in air

– Pitch in helium

http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html

[Figure: spectrogram of speech in air; frequency (kHz) vs. time (ms)]

[Figure: spectrogram of speech in helium; frequency (kHz) vs. time (ms)]

Vocal tract model


Fn = (2n – 1) c / (4L), where Fn is the frequency of formant n, c is the velocity of sound (about 35000 cm/sec) and L is the vocal tract tube length (about 17.5 cm for an adult male).

Perturbation Theory

The first formant (F1) frequency is lowered by a constriction in the front half of the vocal tract (as in /u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /ɑ/.

[Figure: change in F1 (ΔF1) as a function of constriction location, from glottis to lips]

Perturbation Theory

The second formant (F2) is lowered by a constriction near the lips or just above the pharynx; in /u/ both of these regions are constricted. F2 is raised when the constriction is behind the lips and teeth, as in the vowel /i/.

[Figure: change in F2 (ΔF2) as a function of constriction location, from glottis to lips]

Perturbation Theory

The third formant (F3) is lowered by a constriction at the lips, at the back of the mouth or in the upper pharynx. This occurs in /r/ and /r/-colored vowels like American English /ɚ/ (as in “herd”).

[Figure: change in F3 (ΔF3) as a function of constriction location, from glottis to lips]

Perturbation Theory

F3 is raised when the constriction is behind the lips and teeth or near the upper pharynx.

[Figure: change in F3 (ΔF3) as a function of constriction location, from glottis to lips]


Perturbation Theory

All formants tend to drop in frequency when the vocal tract length is increased or when a constriction is formed at the lips.

[Figure: vocal tract from glottis to lips]

Perturbation Theory

F1 frequency is correlated with jaw opening (and inversely related to tongue height).

[Figure: amplitude spectrum; amplitude in dB vs. frequency (kHz)]

Perturbation Theory

F2 frequency is correlated with tongue advancement (front-back dimension).

[Figure: amplitude spectrum; amplitude in dB vs. frequency (kHz)]

Spectral analysis

Amplitude spectrum: sound pressure levels associated with the different frequency components of a signal, expressed as

– power or intensity

– amplitude or magnitude

– log units and decibels (dB)

Phase spectrum: relative phases associated with the different frequency components, expressed in

– degrees or radians
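The two decibel conventions differ by a factor of two in the multiplier: amplitude (magnitude) ratios use 20·log10, power (intensity) ratios use 10·log10, so both give the same dB value for the same signal change. A short Python sketch (standard library only; the function names are hypothetical):

```python
import math


def amplitude_to_db(amplitude, reference=1.0):
    """Amplitude (magnitude) ratio in dB: 20 * log10(A / Aref)."""
    return 20.0 * math.log10(amplitude / reference)


def power_to_db(power, reference=1.0):
    """Power (intensity) ratio in dB: 10 * log10(P / Pref)."""
    return 10.0 * math.log10(power / reference)


print(amplitude_to_db(10.0))            # 20.0
print(power_to_db(10.0))                # 10.0
print(round(amplitude_to_db(2.0), 2))   # 6.02
```

Doubling the amplitude quadruples the power, and `amplitude_to_db(2.0)` equals `power_to_db(4.0)` (about 6 dB), which is why the two formulas are consistent.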

Spectral analysis of speech

Why perform a frequency analysis of speech?

– The ear and brain carry out a form of frequency analysis

– Relevant features of speech are more readily visible in the amplitude spectrum than in the raw waveform


Spectral analysis of speech

But: the ear is not a spectrum analyzer.

– Auditory frequency selectivity is best at low frequencies and gets progressively worse at higher frequencies.

Short-term amplitude spectrum

[Figure: short-term amplitude spectrum; amplitude (dB) vs. frequency (kHz), with F1 = 281 Hz, F2 = 2196 Hz and F3 = 2755 Hz marked]

Speech spectrograms

What is a speech spectrogram?

– Display of the amplitude spectrum at successive instants in time ("running spectra")

– How can three dimensions (time, frequency, amplitude) be represented on a two-dimensional display?

Gray-scale spectrogram

Waterfall plots

Animation

Speech spectrograms

Why are speech spectrograms useful?

– Shows dynamic properties of speech

– Incorporates frequency analysis

– Related to speech production

– Helps to visually identify speech cues


[Figure: “The watchdog”: waveform and spectrogram, frequency (kHz) vs. time, with F1, F2 and F3 marked]

Digital representations of signals

[Figure: amplitude vs. time, illustrating sampling and quantization]

Digital representations of signals

• Sampling frequency (e.g., 44.1 kHz)

– Nyquist frequency = half the sampling frequency (22.05 kHz for a 44.1 kHz rate)

– Effects of discrete-time sampling on bandwidth

• Quantization resolution (e.g., 16 bits)

– 16 bits = 2^16 = 65536 quantization steps

– Effects of discrete-level quantization on dynamic range
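The sampling and quantization figures above follow from two simple formulas: the Nyquist frequency is fs / 2, and an N-bit quantizer has 2^N levels. A common rule of thumb (an approximation, not a full noise analysis) puts the dynamic range at 20·log10(2^N), about 6 dB per bit. A Python sketch with hypothetical helper names:

```python
import math


def nyquist_frequency(fs_hz):
    """Highest frequency representable at sampling rate fs: fs / 2."""
    return fs_hz / 2.0


def quantization_steps(bits):
    """Number of discrete amplitude levels for a given bit depth: 2 ** bits."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Rule-of-thumb dynamic range in dB: 20 * log10(2 ** bits), about 6.02 dB per bit."""
    return 20.0 * math.log10(2 ** bits)


print(nyquist_frequency(44100))        # 22050.0
print(quantization_steps(16))          # 65536
print(round(dynamic_range_db(16), 1))  # 96.3
```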

In-class assignment

Use the wavesurfer program on your laptop as a sound recording device. Left-click the red button to record the vowels “ee”, “ah” and “oo” in quick sequence.

In-class assignment

Use wavesurfer to make a spectrogram of the vowels. Right-click on the waveform plot to add a spectrogram and formant tracks.


In-class assignment

Left-click and drag the mouse to select the desired region in the signal. Then right-click and select “Statistics”.

In-class assignment

This will display the formant frequencies (mean and standard deviation across n = 13 frames, in this example).

In-class assignment

For this vowel (“ee”) the estimated formant frequencies are F1 = 174 Hz, F2 = 1849 Hz, F3 = 2492 Hz, F4 = 3392 Hz.

In-class assignment

Now measure the formants in your productions of the vowels “ee”, “ah”, “oo”. Make a table of F1, F2 and F3 frequencies:

Vowel        F1 (Hz)   F2 (Hz)   F3 (Hz)
i “heed”
ɑ “hod”
u “who’d”

In-class assignment

Save the waveform as vowels.wav. Load it into Praat and Matlab and repeat the assignment (instructions to follow).

Vector representation of speech

In Matlab, speech signals are represented as row or column vectors (e.g., N rows x 1 column, where N is the number of samples in the waveform).

>> [y,fs]=wavread('wheel.wav'); % load waveform (newer Matlab releases use audioread instead)

>> size( y )

ans =

3200 1

The variable ‘y’ has 3200 rows x 1 column (column vector).

The variable ‘fs’ has 1 row x 1 column (scalar).
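Outside Matlab, the same "vector of samples plus a scalar sampling rate" representation can be obtained with Python's standard `wave` module. This is a minimal sketch that assumes mono 16-bit PCM files; the helper names are hypothetical, and the scaling by 2^15 mirrors the [-1, 1) range that wavread returns. `time_axis_ms` builds the millisecond time axis used for waveform plots.

```python
import struct
import wave


def write_wav_mono16(path, samples, fs):
    """Write samples in [-1, 1) as mono 16-bit PCM (values outside the range are clipped)."""
    ints = [max(-32768, min(32767, int(round(s * 32768)))) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(fs)
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))


def read_wav_mono16(path):
    """Read a mono 16-bit PCM WAV file; return (samples scaled to [-1, 1), fs)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        fs = wf.getframerate()
        n = wf.getnframes()
        raw = wf.readframes(n)
    ints = struct.unpack("<%dh" % n, raw)  # little-endian signed 16-bit integers
    return [v / 32768.0 for v in ints], fs


def time_axis_ms(n_samples, fs):
    """Time of each sample in ms, with the first sample at t = 0."""
    return [n * 1000.0 / fs for n in range(n_samples)]
```

Usage (assuming wheel.wav is a mono 16-bit file): `y, fs = read_wav_mono16("wheel.wav")`, after which `len(y)` plays the role of `size(y)`. For 3200 samples at an assumed fs of 8000 Hz, the last time value is 399.875 ms, consistent with the 400 ms axis limit used for the waveform plot.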


Vector representation of speech

Load the waveform and plot it:

>> [y,fs]=wavread('wheel.wav'); % load waveform

>> t=(0:length(y)-1) ./ (fs/1000); % set up time axis in ms (first sample at t = 0)

>> plot( t, y ); % use plot command

>> axis( [ 0 400 -1 1 ] ); % set axis limits

>> xlabel('Time (ms)'); % x-axis label

>> ylabel('Amplitude'); % y-axis label

>> title('Waveform plot'); % axis title

Spectral analysis in Matlab

Fourier spectrum of a vector:

>> X= fft (y);

>> help fft

FFT Discrete Fourier transform.

FFT(X) is the discrete Fourier transform (DFT) of vector X. If the length of X is a power of two, a fast radix-2 fast-Fourier transform algorithm is used. If the length of X is not a power of two, a slower non-power-of-two algorithm is employed. For matrices, the FFT operation is applied to each column.

FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.

Spectral analysis in Matlab

Log magnitude (amplitude) spectrum:

>> X= fft (y);

>> m = 20 * log10 ( abs ( X ) );

>> help abs

ABS Absolute value.

ABS(X) is the absolute value of the elements of X. When X is complex, ABS(X) is the complex modulus (magnitude) of the elements of X.

Spectral analysis in Matlab

Log magnitude (amplitude) spectrum:

>> plot(20*log10(abs(fft(y))))

[Figure: log magnitude (dB) of fft(y) plotted against sample (bin) index]
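What `20*log10(abs(fft(y)))` computes can be spelled out with a direct discrete Fourier transform. This Python sketch (standard library only) uses the O(N^2) definition rather than a fast FFT, so it is only suitable for short test signals; a small floor replaces exact zeros, where Matlab would return -Inf.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform: X[k] = sum_n x[n] * exp(-j 2 pi k n / N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def log_magnitude_db(x, floor=1e-12):
    """20 * log10(|X|), with a small floor so exact zeros do not give -inf."""
    return [20.0 * math.log10(max(abs(v), floor)) for v in dft(x)]


# A unit cosine that completes exactly 4 cycles in 32 samples
N = 32
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
spectrum = log_magnitude_db(x)
peak_bin = max(range(N // 2), key=lambda k: spectrum[k])
print(peak_bin)  # 4
```

The peak lands in bin 4 with magnitude N/2 = 16 (about 24.1 dB), and the second half of the spectrum mirrors the first, which is why plots normally keep only bins 0 to N/2.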

Plotting amplitude spectra

» help fp

FP: function to compute & plot amplitude spectrum

Usage: [a,f]=fp(wave,rate,window);

wave: input waveform

rate: sample rate in Hz (default 10000 Hz)

window options: 'hann', 'hamm', 'kais', or 'rect' (default=hamming)

[a,f]: log magnitude (dB re: 1), frequency (kHz)

Plotting amplitude spectra

» [a,f]=fp(wave,rate,window);

» [a,f]=fp(y,fs,'hann');

[Figure: amplitude spectrum computed by fp; amplitude (dB) vs. frequency (kHz)]

Assignment 1

Part 1: (Matlab code, plots, brief summary)

• Make a set of digital recordings (WAV files) of the 12 vowels of American English:

/i/ "heed" /ɪ/ "hid" /e/ "hayed" /ɛ/ "head"

/æ/ "had" /ʌ/ "hud" /ɑ/ "hod" /ɔ/ "hawed"

/o/ "hoed" /ʊ/ "hood" /u/ "who’d" /ɚ/ "herd"

Assignment 1

• Load waveforms into Matlab; make 12 subplots of the amplitude spectra of the vowels, sampled near the midpoint.

» [ y, fs ] = wavread ('heed.wav');

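» subplot (3,4,1);

» mid = floor ( length (y) / 2 ); % vowel midpoint (integer sample index)

» start = mid - 255;

» stop = mid + 256; % 512 samples in total

» fp ( y ( start : stop ) , fs, 'hamm'); % arguments: waveform, sample rate, window

» title ('heed');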

Assignment 1

• Plot the amplitude spectra of the vowels. Place all 12 plots in a single figure window using the subplot command:

>> subplot ( 3, 4, 1);

>> plot ( x, y );

>> subplot ( 3, 4, 2);

>> plot ( x, y );

/i/ "heed" /ɪ/ "hid" /e/ "hayed" /ɛ/ "head"

/æ/ "had" /ʌ/ "hud" /ɑ/ "hod" /ɔ/ "hawed"

/o/ "hoed" /ʊ/ "hood" /u/ "who’d" /ɚ/ "herd"

Assignment 1

• Step 1: Make a list of the filenames as a character array:

>> filenames = char ( 'heed', 'hid', 'hayed', ...

'head', 'had', 'hud', 'herd', 'hod', ...

'hawed', 'hoed', 'hood', 'whod' ) ;

>> deblank ( filenames ( 3, : ) )

ans =

hayed

Assignment 1

• Step 2: Load the waveform of each vowel from the disk:

>> for i=1:12,

>> [ y, rate ] = wavread ( [ deblank( filenames ( i , : ) ) '.wav' ] );

>> y = y * 2^15; % scale signal to 16-bit range (±2^15)

>> % insert plot commands here

>> end;

Assignment 1

• Step 2: extract the middle part from the waveform

>> % extract samples that lie between start and stop:

>> y = y( start : stop ); % but how do we select start and stop?

[Figure: waveform with the start and stop samples marked]

Exercise 1

• Find out various properties of the waveform:

» length ( y ) % vector length

» min ( y ) % minimum value

» max ( y ) % maximum value

» mean ( y ) % mean value

» plot ( y ) % inspect waveform

» sound ( y, rate ) % listen to waveform

Exercise 1

• Step 3: Find the vowel midpoint; define a range of sample points to extract from the waveform.

» nfft = 512;

» mid = floor ( length (y) / 2 ); % integer midpoint (avoids fractional indices)

» start = mid - (nfft/2 - 1);

» stop = mid + nfft/2; % stop - start + 1 = nfft samples

% y ( start : stop )
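The start/stop arithmetic is easy to get off by one, because Matlab indexes from 1 and ranges include both ends. This Python sketch (hypothetical helper, standard library only) extracts the same nfft samples using 0-based, end-exclusive slicing: for a 3200-sample vowel, Matlab's y(1345:1856) corresponds to Python's y[1344:1856].

```python
def middle_segment(y, nfft=512):
    """Return nfft samples centred on the midpoint of y (nfft assumed even).

    Equivalent to Matlab's mid = floor(length(y)/2); y(mid-(nfft/2-1) : mid+nfft/2),
    translated to 0-based, end-exclusive Python indexing.
    """
    if len(y) < nfft:
        raise ValueError("signal is shorter than nfft")
    mid = len(y) // 2
    start = mid - nfft // 2
    return y[start:start + nfft]


segment = middle_segment(list(range(3200)), 512)
print(len(segment), segment[0], segment[-1])  # 512 1344 1855
```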

Exercise 1

• Step 4: Use the function fp.m to compute and plot the amplitude spectrum of the vowel segment:

» fp ( y( start : stop ) , fs , 'Hamm' );

Input arguments: input vector (waveform segment), sample rate, type of window function.

Function M-file: fp.m

• There are two types of M-files: scripts and functions. To display the contents of an M-file, type the following:

» type fp.m

• Function M-files start with a function statement (see next page) and a series of comment lines. The comment lines are included to provide online help and are optional (but very useful!). The next five slides illustrate and explain the contents of the function fp.m

Function M-file: fp.m

% FP: function to compute & plot amplitude spectrum

% Usage: [a,f]=fp(wave,rate,window);

% wave: input waveform

% rate: sample rate in Hz (default 10000 Hz)

% window options: 'hann', 'hamm', 'kais', 'rect' (default=hamming)

% [a,f]: log magnitude, frequency

function [ a, f ] = fp ( x, rate, window ) ;

Callouts: comment lines; function statement; optional output arguments (a = log magnitude spectrum, f = corresponding frequencies).


% set reasonable defaults for optional variables

if ~exist ( 'rate' , 'var' ) ,

rate=10000;

end;

if ~exist ( 'window' , 'var' ),

window = 'hamm' ;

end;

set defaults



x = x ( : ) ; % convert x to column vector

n = length ( x ) ; % length of data vector

Variables defined inside a function are “local.” In other words, they are not accessible on the command line, outside the function itself.

Function M-file: fp.m

% illustration of if-else statements:

window=lower(window); % window must be lower case

% strcmp is used because == compares strings character by character and fails if lengths differ

if strcmp(window,'rect'), % rectangular window = [1 1 1 1 1]

x=x.*ones(n,1); % multiplying x by 1 does nothing!

elseif strcmp(window,'hamm'),

x=x.*hamming(n); % multiply x by Hamming window

elseif strcmp(window,'hann'),

x=x.*hanning(n); % multiply x by Hanning window

else,

x=x.*hamming(n); % default case: Hamming window

end;

Function M-file: fp.m

m=fft(x,n); % Fast Fourier Transform (fastest if n = power of 2)

no2=round(n/2); % n/2 samples: FFT is symmetrical

a=20*log10( abs ( m ) / n); % convert linear magnitude to dB

f=rate/n*(0:no2)/1000; % frequency scale in kHz: DC = 0 to fs/2

freq = f (1:no2); % retain only the first n/2 samples

amp = a (1:no2); % retain only the first n/2 samples

a = amp; f = freq; % return the one-sided spectrum, so a and f have equal lengths

Function M-file: fp.m

% plot amplitude spectrum: frequency vs. amplitude

plot ( freq , amp ) ; % frequency = x-axis, amplitude=y-axis

axis( [ 0 rate/2000 -Inf Inf ] ) ; % axis range: [ xl xh yl yh ]

% ****** End of function fp.m ******
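For readers who want to check fp.m's numbers outside Matlab, here is a Python sketch (standard library only) of the same steps: window the segment, take the DFT, convert |X|/n to dB and keep the first n/2 bins with frequencies in kHz. It is not a port of the actual file. The names `make_window` and `fp_spectrum` are hypothetical, it does not plot, the 'kais' option is not implemented (it falls back to Hamming), and the 'hann' branch follows the definition of Matlab's `hanning(n)`, which omits the zero-valued end points.

```python
import cmath
import math


def make_window(name, n):
    """Window coefficients: 'rect', 'hann' (like Matlab hanning(n)), else symmetric Hamming."""
    name = name.lower()
    if name == "rect":
        return [1.0] * n
    if name == "hann":
        return [0.5 * (1 - math.cos(2 * math.pi * k / (n + 1))) for k in range(1, n + 1)]
    # 'hamm' and default case: symmetric Hamming window, like Matlab hamming(n)
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def fp_spectrum(x, rate=10000, window_name="hamm"):
    """One-sided log-magnitude spectrum: a = 20*log10(|X|/n) in dB, f in kHz."""
    n = len(x)
    w = make_window(window_name, n)
    xw = [xi * wi for xi, wi in zip(x, w)]
    no2 = round(n / 2)
    amp, freq = [], []
    for k in range(no2):  # only the first n/2 bins: the spectrum of a real signal is symmetric
        X = sum(xw[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        amp.append(20 * math.log10(max(abs(X) / n, 1e-12)))
        freq.append(rate / n * k / 1000)
    return amp, freq


# 1 kHz sine, 64 samples at 8000 Hz: bin spacing is 125 Hz, so the peak is bin 8
x = [math.sin(2 * math.pi * 1000 * m / 8000) for m in range(64)]
amp, freq = fp_spectrum(x, rate=8000, window_name="hamm")
peak = max(range(len(amp)), key=lambda k: amp[k])
print(peak, freq[peak])  # 8 1.0
```

With a rectangular window the same sine gives |X|/n = 0.5 at bin 8 (about -6 dB re: 1); the Hamming window lowers that peak and spreads it over neighbouring bins, in exchange for much lower leakage far from the peak.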

Exercise 1

• Annotate graph:

>> xlabel ( 'Frequency (kHz)' ); % x-axis label

>> ylabel ( 'Amplitude (dB)' ); % y-axis label

>> title ( filenames ( i , : ) ); % graph title

Turn off the axis labels by inserting an empty string:

>> ylabel ( '' ); % null (empty) axis label

Exercise: modify fp.m

% modify fp.m to compute phase spectrum

phase = angle ( m(1:no2) ) ; % wrapped phase in radians, first n/2 bins

p = phase * 180 / pi; % convert from radians to degrees

% plot phase spectrum: frequency vs. phase

plot ( freq , p ) ; % frequency = x-axis, phase (deg) = y-axis

axis( [ 0 rate/2000 -180 180 ] ) ; % unwrap(angle(...)) would exceed this range
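The phase spectrum uses the angle of each complex DFT value, converted from radians to degrees with the factor 180/π. The Python sketch below (standard library only, hypothetical function name) returns the wrapped phase of the first n/2 bins; a cosine with a known phase offset at an exact bin frequency is a convenient check. Phase values in bins where the magnitude is essentially zero are numerical noise and should not be interpreted.

```python
import cmath
import math


def phase_spectrum_deg(x):
    """Wrapped phase (degrees, -180 to 180) of the first n/2 DFT bins of x."""
    n = len(x)
    phases = []
    for k in range(n // 2):
        X = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        phases.append(math.degrees(cmath.phase(X)))  # cmath.phase returns radians
    return phases


# Cosine with a 45 degree phase offset, exactly 3 cycles in 16 samples
N = 16
x = [math.cos(2 * math.pi * 3 * m / N + math.pi / 4) for m in range(N)]
ph = phase_spectrum_deg(x)
print(round(ph[3], 6))  # 45.0
```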



Modifying axes properties

• Modify default axes properties:

>> gca % “get current axes” = axes handle

>> set ( gca, 'XLim', [ 0 4 ] ); % x-axis range

>> set ( gca, 'YLim', [ -20 40 ] ); % y-axis range

>> set ( gca, 'TickDir', 'Out' ); % tick mark dir

[Figure: amplitude spectrum; amplitude (dB) vs. frequency (kHz)]

[Figure: phase spectrum; phase (deg) vs. frequency (kHz)]

Speech spectrograms in Matlab

» help specgram

SPECGRAM Calculate spectrogram from signal.

B = SPECGRAM(A,NFFT,Fs,WINDOW,NOVERLAP)

calculates the spectrogram for the signal in vector A.

SPECGRAM splits the signal into overlapping

segments, windows each with the WINDOW vector

and forms the columns of B with their zero-padded,

length NFFT discrete Fourier transforms.

Speech spectrograms in Matlab

» help sp

sp: create gray-scale spectrogram

Usage: h=sp(wave,rate,nfft,nsampf,nhop,pre,drng);

wave: input waveform

rate: sample rate in Hz (default 8000 Hz)

nfft: FFT window length (default: 256 samples)

nsampf: number of samples per frame (default: 60)

nhop: number of samples to hop to next frame

(default: 5 samples)

pre: preemphasis factor (0-1) (default: 1)

drng: dynamic range in dB (default: 80)

title: title for graph (default: none)
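A spectrogram is a sequence of short-term amplitude spectra. The Python sketch below (standard library only) frames the signal the way the `sp` help text describes: nsampf samples per frame, a hop of nhop samples, and an nfft-point DFT per frame, keeping nfft/2 bins in dB. The default values are taken from that help text, but the Hamming window is an assumption (the help does not name the window), and pre-emphasis (`pre`), dynamic-range clipping (`drng`) and plotting are not implemented. The function name `spectrogram_db` is hypothetical.

```python
import cmath
import math


def spectrogram_db(wave, nfft=256, nsampf=60, nhop=5):
    """Return a list of frames (time), each a list of nfft // 2 dB magnitudes (frequency).

    Each frame takes nsampf samples and applies a Hamming window (an assumption).
    Zero-padding to nfft is implicit: the DFT is evaluated on an nfft-point frequency
    grid using only the nsampf non-zero samples.
    """
    if nfft < nsampf:
        raise ValueError("nfft must be at least nsampf")
    w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (nsampf - 1)) for k in range(nsampf)]
    frames = []
    for start in range(0, len(wave) - nsampf + 1, nhop):
        seg = [wave[start + k] * w[k] for k in range(nsampf)]
        row = []
        for b in range(nfft // 2):
            X = sum(seg[m] * cmath.exp(-2j * math.pi * b * m / nfft) for m in range(nsampf))
            row.append(20 * math.log10(max(abs(X), 1e-12)))
        frames.append(row)
    return frames


# 1 kHz tone at 8000 Hz: with nfft = 64 the bin spacing is 125 Hz, so energy sits in bin 8
fs = 8000
sig = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(400)]
frames = spectrogram_db(sig, nfft=64, nsampf=32, nhop=16)
print(len(frames), len(frames[0]))  # 24 32
```

Displaying `frames` as an image, with time on the x-axis, frequency on the y-axis and dB mapped to gray level, gives the gray-scale spectrogram. A short nsampf (a few ms) gives a wideband spectrogram that resolves formants; a longer one resolves individual harmonics.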


Making spectrograms

>> load wheel % Load pre-recorded waveform

>> sp (wheel, 8000); % Use defaults for other variables

>> colormap(hot); % determines image color scheme

>> axis tight; % set axis limits to the range of the data

Making spectrograms

[Figure: spectrogram produced by sp (panel label “hod”); frequency (kHz) vs. time (ms)]

TrackDraw: a graphical speech synthesizer

TrackDraw program

Provides a graphical interface for controlling a speech synthesizer (cascade formant synthesis, Klatt, 1980)

Allows for successive iterations of hand-tracking, synthesizing and listening to the results

Assmann, P., Ballard, W., Bornstein, L., and Paschall, D. (1994). Track-Draw: A graphical interface for controlling the parameters of a speech synthesizer. Behavior Research Methods, Instruments and Computers 26, 431-436.

Using TrackDraw

» load wheel

» y=wheel;

» specsynth;

The “Spectral Slice” Display


Fundamental Frequency (F0) window

Amplitude of voicing (AV) window

TrackDraw: finished tracks

Saving, printing and re-loading “tracks”

>> specsynth; % when finished tracking, click on the exit button

>> savetr % save tracks in file; enter name xxheedtr

% savetr will append the .mat extension

>> load xxheedtr.mat % to re-load track files and run statistics

>> plottracks

>> print -Pljhd
