Spectrogram and trackplot analysis
Computing, signal processing
HCS 7367 Speech Perception Lab
Dr. Peter Assmann, Fall 2013
Class web page
http://www.utdallas.edu/~assmann/hcs7367/
• Course information
• Lab details
• Speech demos
• Matlab programs used for class assignments
• Additional resources
Matlab Background
Kermit Sigmon, MATLAB Primer, 2nd Edition:
http://www.fi.uib.no/Fysisk/Teori/KURS/WRK/mat/singlemat.html
Getting started with Matlab (The MathWorks):
http://www.mathworks.com/help/techdoc/learn_matlab/bqr_2pl.html
UTD IR – Matlab and Simulink: Resources for Getting Started
http://www.utdallas.edu/ir/how-to/ml_help/index.html
Praat: doing phonetics by computer
Download Praat: http://www.fon.hum.uva.nl/praat/
Praat tutorial: http://www.fon.hum.uva.nl/praat/manual/Intro.html
Wavesurfer
Download Wavesurfer: www.speech.kth.se/wavesurfer
Wavesurfer User Manual: www.speech.kth.se/wavesurfer/man.html
Starting with Matlab
Interactive MATLAB Tutorial:
http://www.mathworks.com/help/techdoc/learn_matlab/f0-11759.html
http://www.mathworks.com/academia/student_center/tutorials/ml_onramp/player.html?slide=1
Start Matlab, type doc, select Matlab and click on “Getting Started”. This launches a video in your browser.
Dates for lab assignments
Lab assignment 1: Sept 19
Lab assignment 2: Oct 10
Lab assignment 3: Oct 31
Lab assignment 4: Nov 21
• 3-page reports (with figures) on lab projects
Term project: important dates
Sept 5: Submit project topics
Sept 26: Turn in project outline
Oct 3: Preliminary project presentations
Nov 14/21: Oral presentations
Dec 12: Final project paper due
Examples of topics
• Acoustic analysis and intelligibility of children’s speech
• Neural network models of vowel recognition
• Simulating distortions introduced by hearing loss
• Noise reduction algorithms for hearing aid processors
• Production and perception of foreign accents
• Contribution of prosody to connected speech intelligibility
• Effects of noise, reverberation on speech communication
• Monaural vs. binaural speech understanding in noise
• Development of speech perception in infants
• Models of speech coding in the auditory cortex
Initial stages
• Identify a topic area and read the relevant papers
• Refine your topic; choose a manageable problem
• Set specific goals and define an evaluation metric
• Identify the approach to solve the problem
• Start right away.
Finding papers
PubMed search engine:
http://www.ncbi.nlm.nih.gov/entrez/
• Find more papers
• Find free full-text articles
Finding papers
Journal of the Acoustical Society of America:
http://scitation.aip.org/jasa/
Fundamental frequency (F0)
Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.
Audio demo: the source signal
Source signal for an adult male voice
Source signal for an adult female voice
Source signal for a 10-year-old child
Harmonics are integer multiples of F0 and are evenly spaced in frequency
Harmonicity and Periodicity
• Period: regularly repeating pattern in the waveform
Period duration T0 = 6 ms
F0 = 1 / T0
F0 = 1000 / 6 ≈ 167 Hz
[Figure: waveform (top) and amplitude spectrum (bottom; amplitude in dB vs. frequency in kHz)]
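As a quick check of the relation between period and F0, the sketch below uses Matlab as a calculator to convert the 6 ms period into F0 and to list the first few harmonics. The variable names (T0_ms, harmonics, spacing) are illustrative and are not part of the course code.

```matlab
T0_ms = 6;                  % period duration in ms (value from the slide)
F0 = 1000 / T0_ms;          % fundamental frequency in Hz (1/T0 with T0 in seconds)
k = 1:5;                    % harmonic numbers (illustrative choice)
harmonics = k * F0;         % harmonics are integer multiples of F0
spacing = diff(harmonics);  % spacing between neighboring harmonics equals F0
fprintf('F0 = %.1f Hz\n', F0);
```

The constant spacing is what makes the harmonics appear as evenly spaced lines in the amplitude spectrum.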
Source properties
In voiced sounds the glottal source spectrum contains a series of lines called harmonics.
The lowest one is called the fundamental frequency (F0).
[Figure: amplitude spectrum of the source; relative amplitude (dB) vs. frequency (0–1000 Hz), with F0 marked]
Filter properties
The vocal tract resonances (called formants) produce peaks in the spectrum envelope.
Formants are labeled F1, F2, F3, ... in order of increasing frequency.
[Figure: amplitude spectrum (amplitude in dB vs. frequency, 0–4 kHz) with superimposed LPC spectral envelope; peaks labeled F1–F4]
Demo: harmonic synthesis
Additive harmonic synthesis: vowel /i/
Cumulative sum of harmonics: vowel /i/
Additive synthesis: “wheel”
Cumulative sum of partials:
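Additive harmonic synthesis can be sketched in a few lines: sinusoids at integer multiples of F0 are summed one at a time. The sample rate, duration, F0, number of harmonics and the 1/k amplitude roll-off below are all assumptions made for illustration. They give a buzz-like source, not the vowel /i/, which would need harmonic amplitudes that follow the vowel's spectrum envelope.

```matlab
fs = 8000;                       % sample rate in Hz (assumed)
dur = 0.3;                       % duration in seconds (assumed)
F0 = 125;                        % fundamental frequency in Hz (assumed)
nHarm = 20;                      % number of harmonics; 20*125 = 2500 Hz < fs/2
t = (0:round(dur*fs)-1)' / fs;   % time axis in seconds (column vector)
y = zeros(size(t));
for k = 1:nHarm
    % harmonic k is a sinusoid at k*F0; the 1/k amplitude is an arbitrary roll-off
    y = y + (1/k) * sin(2*pi*k*F0*t);
end
y = 0.9 * y / max(abs(y));       % normalize to avoid clipping on playback
% sound(y, fs);                  % uncomment to listen
```

Plotting y after each pass through the loop shows the cumulative sum becoming more pulse-like while the period (fs/F0 = 64 samples) stays the same.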
Vocal tract properties
Resonating tube model
– approximation for neutral vowel (schwa), [ə]
– closed at one end (glottis); open at the other (lips)
– uniform cross-sectional area
– curvature is relatively unimportant
[Figure: uniform tube model for schwa /ə/, closed at the glottis and open at the lips, with length L]
American English vowel space
i “heed”
ɩ “hid”
e “hayed”
ɛ “head”
æ “had”
ʌ “hut”
ɑ “hod”
ɔ “hawed”
o “hoed”
ʊ “hood”
u “who’d”
ə “schwa”
[Figure: vowel chart arranged by advancement (front, center, back; ←F2) and height (high, mid, low; F1→)]
Acoustic vowel space
[Figure: second formant, F2 frequency (Hz, 0–3000) plotted against first formant, F1 frequency (Hz, 0–1000), showing i “heed”, ɑ “hod”, u “who’d” and ə]
Vocal tract model
Quarter-wave resonator:
Fn = (2n - 1) c / (4L)
– Fn is the frequency of formant n in Hz
– c is the velocity of sound in air (about 35000 cm/s)
– L is the length of the vocal tract in cm (about 17.5 cm for an adult male)
For L = 17.5 cm:
– F1 = (2(1) - 1)*35000/(4*17.5) = 500 Hz
– F2 = (2(2) - 1)*35000/(4*17.5) = 1500 Hz
– F3 = (2(3) - 1)*35000/(4*17.5) = 2500 Hz
Note that the vowel /ə/ (“schwa”) has formants at odd multiples of F1.
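The quarter-wave formula is easy to evaluate for several formants at once. The sketch below wraps it in an anonymous function (the name quarterwave is illustrative). The 14 cm tract length in the second call is a hypothetical shorter vocal tract chosen only to show the effect of length; it is not a value from the slides.

```matlab
% Quarter-wave resonator: Fn = (2n - 1) c / (4L)
quarterwave = @(n, c, L) (2*n - 1) * c ./ (4*L);

c = 35000;                            % velocity of sound in air, cm/s
n = 1:3;                              % formant numbers
F_adult = quarterwave(n, c, 17.5);    % 17.5 cm tract: 500 1500 2500 Hz
F_short = quarterwave(n, c, 14);      % hypothetical 14 cm tract: 625 1875 3125 Hz
disp(F_adult);
disp(F_short);
```

Shortening the tube raises every formant by the same factor (17.5/14 = 1.25 here), which is the same length dependence stated in the formula.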
Helium speech
The speed of sound in a helium/oxygen mixture
at 20°C is about 93000 cm/s, compared to
35000 cm/s in air. This increases the resonance
frequencies but has relatively little effect on F0.
In helium speech, the formants are shifted up
but the pitch stays the same.
Helium speech
Using Matlab as a calculator, find the
frequencies of F1, F2 and F3 for a 17.5 cm
vocal tract producing the vowel /ə/ in a
helium/air mixture (velocity c ≈ 93000 cm/s)
Fn = (2n - 1) c / (4L)
F1 = (2*(1) - 1)*93000/(4*17.5) = 1329 Hz
F2 = (2*(2) - 1)*93000/(4*17.5) = 3986 Hz
F3 = (2*(3) - 1)*93000/(4*17.5) = 6643 Hz
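The same calculation can be done for air and helium side by side, which makes the size of the shift explicit. This is a sketch using the values quoted on the slides; the variable names are illustrative.

```matlab
c_air = 35000;                        % velocity of sound in air, cm/s
c_he  = 93000;                        % velocity of sound in the helium mixture, cm/s
L = 17.5;                             % vocal tract length, cm
n = 1:3;                              % formant numbers
F_air = (2*n - 1) * c_air / (4*L);    % formants in air, Hz
F_he  = (2*n - 1) * c_he  / (4*L);    % formants in helium, Hz
ratio = F_he ./ F_air;                % same factor for every formant
disp(round(F_he));                    % 1329 3986 6643
fprintf('Formants scale by c_he/c_air = %.3f\n', ratio(1));
```

Because L is unchanged, every formant is multiplied by the ratio of the sound velocities (about 2.66), while F0, set by vocal fold vibration, is not part of this formula.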
Helium speech
Audio demos
– Speech in air
– Speech in helium
– Pitch in air
– Pitch in helium
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html
[Figure: spectrogram of speech in air; frequency (0–4 kHz) vs. time (ms)]
[Figure: spectrogram of speech in helium; frequency (0–4 kHz) vs. time (ms)]
Vocal tract model
Quarter-wave resonator:
Fn = (2n - 1) c / (4L)
– Fn is the frequency of formant n
– c is the velocity of sound (about 35000 cm/s)
– L is the vocal tract tube length (17.5 cm for adult male)
Perturbation Theory
The first formant (F1) frequency is lowered by a constriction in the front half of the vocal tract (/u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /ɑ/.
[Figure: ΔF1 as a function of constriction location, from glottis to lips]
Perturbation Theory
The second formant (F2) is lowered by a constriction near the lips or just above the pharynx; in /u/ both of these regions are constricted. F2 is raised when the constriction is behind the lips and teeth, as in the vowel /i/.
[Figure: ΔF2 as a function of constriction location, from glottis to lips]
Perturbation Theory
The third formant (F3) is lowered by a constriction at the lips or at the back of the mouth or in the upper pharynx. This occurs in /r/ and /r/-colored vowels like American English /ɚ/ (as in “herd”).
[Figure: ΔF3 as a function of constriction location, from glottis to lips]
Perturbation Theory
F3 is raised when the constriction is behind the lips and teeth or near the upper pharynx.
[Figure: ΔF3 as a function of constriction location, from glottis to lips]
Perturbation Theory
All formants tend to drop in frequency when the vocal tract length is increased or when a constriction is formed at the lips.
[Figure: vocal tract schematic, from glottis to lips]
Perturbation Theory
F1 frequency is correlated with jaw opening (and inversely related to tongue height).
[Figure: amplitude spectrum; amplitude in dB vs. frequency (0–4 kHz)]
Perturbation Theory
F2 frequency is correlated with tongue advancement (front-back dimension)
[Figure: amplitude spectrum; amplitude in dB vs. frequency (0–4 kHz)]
Spectral analysis
Amplitude spectrum: sound pressure levels associated with different frequency components of a signal
– Power or intensity
– Amplitude or magnitude
– Log units and decibels (dB)
Phase spectrum: relative phases associated with different frequency components
– Degrees or radians
Spectral analysis of speech
Why perform a frequency analysis of speech?
– Ear+brain carry out a form of frequency analysis
– Relevant features of speech are more readily visible
in the amplitude spectrum than in the raw waveform
Spectral analysis of speech
But: the ear is not a spectrum analyzer.
– Auditory frequency selectivity is best at low
frequencies and gets progressively worse at higher
frequencies.
Short-term amplitude spectrum
[Figure: short-term amplitude spectrum; amplitude (dB) vs. frequency (0–4 kHz), with formant peaks marked at F1 = 281 Hz, F2 = 2196 Hz, F3 = 2755 Hz]
Speech spectrograms
What is a speech spectrogram?
– Display of amplitude spectrum at successive instants in time ("running spectra")
– How can 3 dimensions be represented on a two-dimensional display?
  Gray-scale spectrogram
  Waterfall plots
  Animation
Speech spectrograms
Why are speech spectrograms useful?
– Shows dynamic properties of speech
– Incorporates frequency analysis
– Related to speech production
– Helps to visually identify speech cues
[Figure: waveform and spectrogram (0–5 kHz) of “The watchdog”, with formants F1, F2 and F3 labeled]
Digital representations of signals
[Figure: amplitude vs. time, illustrating sampling and quantization]
Digital representations of signals
Sampling frequency (e.g. 44.1 kHz)
– Nyquist frequency
– Effects of discrete-time sampling on bandwidth
Quantization rate (16 bits)
– 16 bits = 2^16 quantization steps
– Effects of discrete-level quantization on dynamic range
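The two effects named above can be put into numbers. The sketch below computes the Nyquist frequency for a 44.1 kHz sampling rate and the approximate dynamic range for 16-bit quantization, using the common 20*log10(number of steps) estimate. The 30 kHz test frequency is an arbitrary value chosen to illustrate aliasing.

```matlab
fs = 44100;                        % sampling frequency in Hz
nyquist = fs / 2;                  % highest frequency that can be represented: 22050 Hz
nbits = 16;                        % bits per sample
nsteps = 2^nbits;                  % 65536 quantization steps
drange_dB = 20 * log10(nsteps);    % dynamic range estimate, about 96.3 dB

% A component above the Nyquist frequency is aliased to a lower frequency
f_in = 30000;                                      % arbitrary test frequency in Hz
f_alias = abs(f_in - fs * round(f_in / fs));       % appears at 14100 Hz

fprintf('Nyquist = %d Hz, dynamic range = %.1f dB\n', nyquist, drange_dB);
fprintf('%d Hz sampled at %d Hz appears at %d Hz\n', f_in, fs, f_alias);
```

Each additional bit doubles the number of steps and adds about 6 dB of dynamic range.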
In-class assignment
Use the wavesurfer program on your laptop as a sound recording device. Left-click the red button to record the vowels “ee”, “ah” and “oo” in quick sequence.
In-class assignment
Use wavesurfer to make a spectrogram of the vowels. Right-click on the waveform plot to add spectrogram + formant tracks.
In-class assignment
Left-click and drag the mouse to select the desired region in the signal. Then right-click and select “Statistics”.
In-class assignment
This will display the formant frequencies (mean and standard deviation across n=13 frames, in this example).
In-class assignment
For this vowel (“ee”) the estimated formant frequencies are F1=174 Hz, F2=1849 Hz, F3=2492 Hz, F4=3392 Hz.
In-class assignment
Now measure the formants in your productions of the vowels “ee”, “ah”, “oo”. Make a table of F1, F2 and F3 frequencies.
      i “heed”    ɑ “hod”    u “who’d”
F1
F2
F3
In-class assignment
Save the waveform as vowels.wav. Load it into Praat and Matlab and repeat the assignment (instructions to follow).
Vector representation of speech
In Matlab speech signals are represented as row or column vectors (e.g., N rows x 1 column, where N is the number of samples in the waveform).
>> [y,fs]=wavread('wheel.wav'); % load waveform
>> size( y )
ans =
3200 1
The variable ‘y’ has 3200 rows x 1 column (column vector).
The variable ‘fs’ has 1 row x 1 column (scalar).
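Vector orientation matters for later element-wise operations, so it can help to check it with size and to force a column with y(:). The sketch below uses a synthetic 100 Hz tone as a stand-in for wheel.wav; the 8000 Hz rate and 3200-sample length are assumptions chosen to match the size shown above.

```matlab
fs = 8000;                               % sample rate in Hz (assumed)
yrow = sin(2*pi*100*(0:3199)/fs);        % synthetic tone, built as a 1 x 3200 row vector
size_row = size(yrow);                   % [1 3200]
ycol = yrow(:);                          % (:) always returns a column vector
size_col = size(ycol);                   % [3200 1]
dur_ms = 1000 * length(ycol) / fs;       % duration in ms: 3200 samples at 8000 Hz = 400 ms
disp(size_row); disp(size_col); disp(dur_ms);
```

length(y) gives the number of samples for either orientation, which is why it is the safer choice when building a time axis.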
Vector representation of speech
Load the waveform and plot it:
>> [y,fs]=wavread('wheel.wav'); % load waveform
>> t=(1:length(y) ) ./ (fs/1000); % set up time axis
>> plot( t, y ); % use plot command
>> axis( [ 0 400 -1 1 ] ); % set axis limits
>> xlabel('Time (ms)'); % x-axis label
>> ylabel('Amplitude'); % y-axis label
>> title('Waveform plot'); % axis title
Spectral analysis in Matlab
Fourier spectrum of a vector:
>> X= fft (y);
>> help fft
FFT Discrete Fourier transform.
FFT(X) is the discrete Fourier transform (DFT) of vector X. If the length of X is a power of two, a fast radix-2 fast-Fourier transform algorithm is used. If the length of X is not a power of two, a slower non-power-of-two algorithm is employed. For matrices, the FFT operation is applied to each column.
FFT(X,N) is the N-point FFT, padded with zeros if X has less than N points and truncated if it has more.
Spectral analysis in Matlab
Log magnitude (amplitude) spectrum:
>> X= fft (y);
>> m = 20 * log10 ( abs ( X ) );
>> help abs
ABS Absolute value.
ABS(X) is the absolute value of the elements of X. When X is complex, ABS(X) is the complex modulus (magnitude) of the elements of X.
Spectral analysis in Matlab
Log magnitude (amplitude) spectrum:
>> plot(20*log10(abs(fft(y))))
[Figure: log magnitude spectrum plotted against FFT bin index (0–1000); vertical axis 40–140 dB]
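Plotting the FFT output directly puts bin index on the x-axis and shows both halves of the symmetric spectrum. A frequency axis in Hz can be added as sketched below. The 1 kHz test tone, the 8000 Hz sample rate and the 512-point length are assumptions chosen so that the tone falls exactly on an FFT bin.

```matlab
fs = 8000;                          % sample rate in Hz (assumed)
n = 512;                            % number of samples (power of 2)
t = (0:n-1)' / fs;                  % time axis in seconds
y = sin(2*pi*1000*t);               % 1 kHz test tone (falls exactly on bin 64)
X = fft(y);                         % complex spectrum
m = 20 * log10(abs(X) + eps);       % log magnitude in dB; eps avoids log10(0)
f = (0:n-1)' * fs / n;              % frequency of each FFT bin in Hz
half = 1:(n/2 + 1);                 % keep DC up to the Nyquist frequency
[~, imax] = max(m(half));           % index of the largest component
fpeak = f(imax);                    % frequency of the peak in Hz
% plot(f(half)/1000, m(half)); xlabel('Frequency (kHz)'); ylabel('Amplitude (dB)');
```

The bin spacing is fs/n (15.625 Hz here), so a longer analysis window gives finer frequency resolution.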
Plotting amplitude spectra
» help fp
FP: function to compute & plot amplitude spectrum
Usage: [a,f]=fp(wave,rate,window);
wave: input waveform
rate: sample rate in Hz (default 10000 Hz)
window options: 'hann', 'hamm', 'kais', or 'rect' (default=hamming)
[a,f]: log magnitude (dB re:1), frequency (Hz)
Plotting amplitude spectra
» [a,f]=fp(wave,rate,window);
» [a,f]=fp(y,fs,'hann');
[Figure: amplitude spectrum produced by fp; amplitude (dB) vs. frequency (0–4 kHz)]
Assignment 1
Part 1: (Matlab code, plots, brief summary)
• Make a set of digital recordings (WAV files) of
the 12 vowels of American English:
/i/ "heed" /ɪ/ "hid" /e/ "hayed" /ɛ/ "head"
/æ/ "had" /ʌ/ "hud" /ɑ/ "hod" /ɔ/ "hawed"
/o/ "hoed" /ʊ/ "hood" /u/ "who’d" /ɚ/ "herd"
Assignment 1
• Load waveforms into Matlab; make 12
subplots of the amplitude spectra of the vowels,
sampled near the midpoint.
» [ y, fs ] = wavread ('heed.wav');
» subplot (4,3,1);
» mid = floor ( length (y) / 2 ); % integer index of the midpoint
» start = mid - 255;
» stop = mid + 256; % 512 samples in total
» fp ( y ( start : stop ), fs, 'hamm' ); % arguments: wave, rate, window
Assignment 1
• Plot the amplitude spectra of the vowels. Place
all 12 plots in a single figure window using the
subplot command:
>> subplot ( 3, 4, 1);
>> plot ( x, y );
>> subplot ( 3, 4, 2);
>> plot ( x, y );
/i/ "heed" /ɪ/ "hid" /e/ "hayed" /ɛ/ "head"
/æ/ "had" /ʌ/ "hud" /ɑ/ "hod" /ɔ/ "hawed"
/o/ "hoed" /ʊ/ "hood" /u/ "who’d" /ɚ/ "herd"
Assignment 1
• Step 1: Make a list of the filenames as a character
array:
>> filenames = char ( 'heed', 'hid', 'hayed', ...
'head', 'had', 'hud', 'herd', 'hod', ...
'hawed', 'hoed', 'hood', 'whod' ) ;
>> deblank ( filenames ( 3, : ) )
ans =
hayed
Assignment 1
• Step 2: Load the waveform of each vowel from
the disk:
>> for i=1:12,
>> [ y, rate ] = wavread ( deblank ( filenames ( i , : ) ) );
>> y = y * 2^15; % scale signal to 16-bit range (±2^15)
>> % insert plot commands here
>> end;
Assignment 1
• Step 2: extract the middle part from the waveform
>> % extract samples that lie between start and stop:
>> y = y( start : stop ); % but how do we select start and stop?
[Figure: waveform with the start and stop points of the extracted segment marked]
Exercise 1
• Find out various properties of the waveform:
» length ( y ) % vector length
» min ( y ) % minimum value
» max ( y ) % maximum value
» mean ( y ) % mean value
» plot ( y ) % inspect waveform
» sound ( y, rate ) % listen to waveform
Exercise 1
• Step 3: Find vowel midpoint; define a range
of sample points to extract from the waveform.
» nfft = 512;
» mid = floor ( length (y) / 2 ); % integer midpoint, also for odd lengths
» start = mid - (nfft/2 - 1);
» stop = mid + nfft/2;
% y ( start : stop ) now contains nfft samples
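Indices must be whole numbers, so the midpoint should be rounded down before it is used to index the waveform. The sketch below checks the segment length on a stand-in vector with an odd number of samples; the vector contents and its length of 3201 are arbitrary assumptions.

```matlab
y = (1:3201)';                     % stand-in waveform with an odd number of samples
nfft = 512;                        % number of samples to extract
mid = floor(length(y) / 2);        % integer midpoint (1600 here)
start = mid - (nfft/2 - 1);        % first sample of the segment
stop  = mid + nfft/2;              % last sample of the segment
if start < 1 || stop > length(y)
    error('Waveform is too short for a %d-sample segment', nfft);
end
seg = y(start:stop);               % segment centered on the midpoint
fprintf('start = %d, stop = %d, length = %d\n', start, stop, length(seg));
```

Without floor, length(y)/2 would be 1600.5 for this vector and the indexing expression would fail.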
Exercise 1
• Step 4: Use the function fp.m to compute and
plot the amplitude spectrum of the vowel
segment:
» fp ( y( start : stop ) , fs , 'Hamm' );
Input arguments: input vector (waveform segment), sample rate, type of window function
Function M-file: fp.m
• There are two types of M-files: scripts and functions. To
display the contents of an M-file, type the following:
» type fp.m
• Function M-files start with a function statement (see next
page) and a series of comment lines. The comment lines
are included to provide online help and are optional (but
very useful!). The next five slides illustrate and explain
the contents of the function fp.m
Function M-file: fp.m
% FP: function to compute & plot amplitude spectrum
% Usage: [a,f]=fp(wave,rate,window);
% wave: input waveform
% rate: sample rate in Hz (default 10000 Hz)
% window options: 'hann', 'hamm', 'kais', 'rect' (default=hamming)
% [a,f]: log magnitude, frequency
function [ a, f ] = fp ( x, rate, window ) ;
Comment lines: the lines beginning with % provide the online help.
Function statement: the line beginning with “function”.
Optional output arguments: a = log magnitude spectrum, f = corresponding frequencies.
Function M-file: fp.m
% set reasonable defaults for optional variables
if ~exist ( 'rate' , 'var' ) ,
rate=10000;
end;
if ~exist ( 'window' , 'var' ),
window = 'hamm' ;
end;
Function M-file: fp.m
x = x ( : ) ; % convert x to column vector
n = length ( x ) ; % length of data vector
Variables defined inside a function are “local.” In other words, they are not accessible on the
command line, outside the function itself.
Function M-file: fp.m
% illustration of if-else statements:
window=lower(window); % window must be lower case
if strcmp(window,'rect'), % rectangular window = [1 1 1 1 1]
x=x.*ones(n,1); % multiplying x by 1 does nothing!
elseif strcmp(window,'hamm'),
x=x.*hamming(n); % multiply x by Hamming window
elseif strcmp(window,'hann'),
x=x.*hanning(n); % multiply x by Hanning window
else,
x=x.*hamming(n); % default case: Hamming window
end;
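When testing a window name, note that == compares character arrays element by element and requires both to have the same length, whereas strcmp returns a single true or false for any pair of strings. The sketch below illustrates the difference; the input 'Hamming' is a hypothetical argument that is longer than the four-letter window names.

```matlab
window = lower('Hamming');              % 7 characters, unlike 'hamm'
elementwise = ('hamm' == 'hann');       % one result per character: [1 1 0 0]
try
    tmp = (window == 'rect');           % lengths differ (7 vs. 4): raises an error
    failed = false;
catch
    failed = true;                      % the comparison could not be evaluated
end
isRect = strcmp(window, 'rect');        % false, and no error
isHamm = strncmp(window, 'hamm', 4);    % true: compares only the first 4 characters
disp(elementwise); disp(failed); disp(isRect); disp(isHamm);
```

Using strcmp (or strncmp to accept longer spellings) lets unrecognized names fall through to the default case without stopping the function.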
Function M-file: fp.m
m=fft(x,n); % Fast Fourier Transform (fastest if n = power of 2)
no2=round(n/2); % n/2 samples: FFT is symmetrical
a=20*log10( abs ( m ) / n); % convert linear magnitude to dB
f=rate/n*(0:no2)/1000; % frequency scale: DC = 0 to fs/2
freq = f (1:no2); % retain only the first n/2 samples
amp = a (1:no2); % retain only the first n/2 samples
Function M-file: fp.m
% plot amplitude spectrum: frequency vs. amplitude
plot ( freq , amp ) ; % frequency = x-axis, amplitude=y-axis
axis( [ 0 rate/2000 -Inf Inf ] ) ; % axis range: [ xl xh yl yh ]
% ****** End of function fp.m ******
Exercise 1
• Annotate graph:
>> xlabel ( 'Frequency (kHz)' ); % x-axis label
>> ylabel ( 'Amplitude (dB)' ); % y-axis label
>> title ( filenames ( i , : ) ); % graph title
Turn off the axis labels by inserting an empty string:
>> ylabel ( ' ' ); % null axis label
Exercise: modify fp.m
% modify fp.m to compute phase spectrum
phase = unwrap ( angle (m) ) ; % phase in radians
p = phase * 180 / pi; % convert from radians to degrees
p = p (1:no2); % retain only the first n/2 samples
% plot phase spectrum: frequency vs. phase
plot ( freq , p ) ; % frequency = x-axis, phase=y-axis
axis( [ 0 rate/2000 -180 180 ] ) ;
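The radians-to-degrees conversion can be checked on a signal whose phase is known in advance. In the sketch below, a cosine with exactly one cycle in 8 samples and a starting phase of 45 degrees is analyzed with fft; the signal length and phase are arbitrary test values.

```matlab
n = 8;                                   % number of samples (arbitrary)
k = 1;                                   % cycles per n samples, i.e. FFT bin k
phi = pi/4;                              % starting phase in radians (45 degrees)
x = cos(2*pi*k*(0:n-1)'/n + phi);        % test signal
m = fft(x);                              % complex spectrum
phase_rad = angle(m(k+1));               % phase at bin k (Matlab indices start at 1)
phase_deg = phase_rad * 180 / pi;        % radians to degrees: multiply by 180/pi
fprintf('Phase at bin %d = %.1f degrees\n', k, phase_deg);
```

angle returns values between -pi and pi, so the converted phase lies between -180 and 180 degrees unless unwrap is applied.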
Modifying axes properties
• Modify default axes properties:
>> gca % “get current axes” = axes handle
>> set ( gca, 'XLim', [ 0 4 ] ); % x-axis range
>> set ( gca, 'YLim', [ -20 40 ] ); % y-axis range
>> set ( gca, 'TickDir', 'Out' ); % tick mark dir
Amplitude spectrum
[Figure: amplitude (dB) vs. frequency (0–4 kHz)]
Phase spectrum
[Figure: phase (deg) vs. frequency (0–4 kHz)]
Speech spectrograms in Matlab
» help specgram
SPECGRAM Calculate spectrogram from signal.
B = SPECGRAM(A,NFFT,Fs,WINDOW,NOVERLAP)
calculates the spectrogram for the signal in vector A.
SPECGRAM splits the signal into overlapping
segments, windows each with the WINDOW vector
and forms the columns of B with their zero-padded,
length NFFT discrete Fourier transforms.
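The procedure described in the help text (split into overlapping segments, window each one, take a zero-padded FFT, store the result as a column) can be sketched with base Matlab functions only. This is an illustration under stated assumptions, not the code of specgram or of the course function sp: the chirp test signal, the frame length, the hop size and the hand-written Hamming window are all choices made for this example.

```matlab
fs = 8000;                                     % sample rate in Hz (assumed)
t = (0:fs/2-1)' / fs;                          % 0.5 s time axis
y = sin(2*pi*(500*t + 1000*t.^2));             % test chirp rising from 500 to 1500 Hz
nfft = 256; nwin = 64; nhop = 16;              % FFT length, frame length, hop (assumed)
w = 0.54 - 0.46*cos(2*pi*(0:nwin-1)'/(nwin-1));   % Hamming window written out by hand
nframes = floor((length(y) - nwin)/nhop) + 1;  % number of complete frames
B = zeros(nfft/2 + 1, nframes);                % one column per frame
for k = 1:nframes
    idx = (k-1)*nhop + (1:nwin)';              % samples belonging to frame k
    X = fft(y(idx) .* w, nfft);                % windowed, zero-padded FFT
    B(:, k) = abs(X(1:nfft/2 + 1));            % keep DC up to the Nyquist frequency
end
BdB = 20*log10(B + eps);                       % log magnitude in dB
f = (0:nfft/2)' * fs / nfft;                   % frequency axis in Hz
tf = ((0:nframes-1)*nhop + nwin/2) / fs * 1000;   % frame centers in ms
[~, i1] = max(B(:, 1));   f_first = f(i1);     % peak frequency in the first frame
[~, iN] = max(B(:, end)); f_last  = f(iN);     % peak frequency in the last frame
% imagesc(tf, f/1000, BdB); axis xy; colormap(flipud(gray));
% xlabel('Time (ms)'); ylabel('Frequency (kHz)');
```

A short frame (64 samples, 8 ms) gives good time resolution but smears frequency; a longer frame does the opposite.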
Speech spectrograms in Matlab
» help sp
sp: create gray-scale spectrogram
Usage: h=sp(wave,rate,nfft,nsampf,nhop,pre,drng);
wave: input waveform
rate: sample rate in Hz (default 8000 Hz)
nfft: FFT window length (default: 256 samples)
nsampf: number of samples per frame (default: 60)
nhop: number of samples to hop to next frame
(default: 5 samples)
pre: preemphasis factor (0-1) (default: 1)
drng: dynamic range in dB (default: 80)
title: title for graph (default: none)
Making spectrograms
>> load wheel % Load pre-recorded waveform
>> sp (wheel, 8000); % Use defaults for other variables
>> colormap(hot); % determines image color scheme
>> axis tight; % extends plot to axis limits
Making spectrograms
[Figure: spectrogram of “hod”; frequency (0–6 kHz) vs. time (0–600 ms)]
TrackDraw: a graphical speech synthesizer
TrackDraw program
• Provides a graphical interface for controlling a speech synthesizer (cascade formant synthesis, Klatt, 1980)
• Allows for successive iterations of hand-tracking, synthesizing and listening to the results
Assmann, P., Ballard, W., Bornstein, L., and Paschall, D. (1994). Track-Draw: A graphical interface for controlling the parameters of a speech synthesizer. Behavior Research Methods, Instruments and Computers 26, 431-436.
Using TrackDraw
» load wheel
» y=wheel;
» specsynth;
The “Spectral Slice” Display
Fundamental Frequency (F0) window
Amplitude of voicing (AV) window
TrackDraw: finished tracks
Saving, printing and re-loading “tracks”
>> specsynth; % when finished tracking click on exit button
>> savetr % save tracks in file; enter name xxheedtr
% savetr will append the .mat extension
>> load xxheedtr.mat % to re-load track files and run statistics
>> plottracks
>> print -Pljhd