
Algorithms for NLP - Carnegie Mellon University · tbergkir/11711fa17/FA17 11-711 lecture 6... · 2017. 9. 14.

Acoustic Models

Taylor Berg-Kirkpatrick – CMU

Slides: Dan Klein – UC Berkeley

Algorithms for NLP

Speech Signals

Speech in a Slide

§ Frequency gives pitch; amplitude gives volume

§ Frequencies at each time slice processed into observation vectors

[Figure: "speech lab" (s p ee ch l a b) plotted by amplitude; each time slice becomes an observation vector … x12 x13 x12 x14 x14 …]

Articulation

Articulatory System

[Figure: sagittal section of the vocal tract (Techmer 1880), labeling the nasal cavity, oral cavity, pharynx, vocal folds (in the larynx), trachea, and lungs]

Text from Ohala, Sept 2001, from Sharon Rose slide

Space of Phonemes

§ Standard International Phonetic Alphabet (IPA) chart of consonants

Place

Places of Articulation

[Figure: places of articulation along the vocal tract: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]

Figure thanks to Jennifer Venditti

Labial place

[Figure: bilabial and labiodental places of articulation]

Figure thanks to Jennifer Venditti

Bilabial: p, b, m

Labiodental: f, v

Coronal place

[Figure: dental, alveolar, and post-alveolar/palatal places of articulation]

Figure thanks to Jennifer Venditti

Dental: th/dh

Alveolar: t/d/s/z/l/n

Post: sh/zh/y

Dorsal Place

[Figure: velar, uvular, and pharyngeal places of articulation]

Figure thanks to Jennifer Venditti

Velar: k/g/ng

Space of Phonemes

§ Standard International Phonetic Alphabet (IPA) chart of consonants

Manner

Manner of Articulation

§ In addition to varying by place, sounds vary by manner

§ Stop: complete closure of articulators, no air escapes via mouth
  § Oral stop: soft palate is raised (p, t, k, b, d, g)
  § Nasal stop: oral closure, but soft palate is lowered (m, n, ng)

§ Fricatives: substantial closure, turbulent (f, v, s, z)

§ Approximants: slight closure, sonorant (l, r, w)

§ Vowels: no closure, sonorant (i, e, a)

Space of Phonemes

§ Standard International Phonetic Alphabet (IPA) chart of consonants

Vowels

Vowel Space

Acoustics

“She just had a baby”

§ What can we learn from a wavefile?
  § No gaps between words (!)
  § Vowels are voiced, long, loud
  § Length in time = length in space in waveform picture
  § Voicing: regular peaks in amplitude
  § When stops closed: no peaks, silence
  § Peaks = voicing: .46 to .58 (vowel [iy]), from second .65 to .74 (vowel [ax]) and so on

§ Silence of stop closure (1.06 to 1.08 for first [b], or 1.26 to 1.28 for second [b])

§ Fricatives like [sh]: intense irregular pattern; see .33 to .46

Time-Domain Information

[Figure: waveforms of "bad", "pad", "spat", and "pat"]

Example from Ladefoged

Simple Periodic Waves of Sound

[Figure: a sine wave, amplitude from −0.99 to 0.99, over time 0 to 0.02 s]

• Y axis: Amplitude = amount of air pressure at that point in time
• Zero is normal air pressure, negative is rarefaction

• X axis: Time
• Frequency = number of cycles per second
• 20 cycles in .02 seconds = 1000 cycles/second = 1000 Hz
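The arithmetic on this slide (frequency = cycles ÷ seconds) can be checked on a synthesized sine wave. This is a minimal sketch under our own assumptions: a 16 kHz sample rate, and cycles counted as upward zero crossings, with a small negative starting phase so that the first cycle's crossing falls inside the window. The function names are hypothetical.

```python
import math

def frequency_hz(cycles: float, seconds: float) -> float:
    """Frequency = number of cycles per second."""
    return cycles / seconds

def sine_wave(freq: float, duration: float, sample_rate: int = 16000,
              phase: float = -0.1) -> list[float]:
    """Sample a unit-amplitude sine (air pressure relative to normal pressure).

    The small negative starting phase puts the first upward zero crossing
    inside the window, so every cycle in the window gets counted.
    """
    n = int(round(duration * sample_rate))
    return [math.sin(2 * math.pi * freq * t / sample_rate + phase) for t in range(n)]

def count_cycles(samples: list[float]) -> int:
    """Count upward zero crossings (rarefaction -> compression): one per cycle."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

# This slide: 20 cycles in .02 s; the [ae] in "had" later: 9 cycles in .036 s.
for freq, dur in [(1000, 0.02), (250, 0.036)]:
    cycles = count_cycles(sine_wave(freq, dur))
    print(freq, cycles, frequency_hz(cycles, dur))
```

The same helper covers the [ae] example from "had": 9 cycles in .036 seconds gives 250 Hz.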

Complex Waves: 100 Hz + 1000 Hz

[Figure: sum of a 100 Hz and a 1000 Hz sine wave; amplitude from −0.9654 to 0.99 over time 0 to 0.05 s]

Spectrum

[Figure: coefficient (y-axis) vs. frequency in Hz (x-axis), with spikes at 100 and 1000 Hz]

Frequency components (100 and 1000 Hz) on x-axis
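To connect the complex wave to its spectrum, here is a sketch that builds a 100 Hz + 1000 Hz wave and recovers its two components with a discrete Fourier transform. We write a naive O(N²) DFT using only the standard library so every step is visible; real systems use an FFT (for example `numpy.fft.rfft`), which computes the same values faster. The 8 kHz sample rate and the equal 0.5 amplitudes are our assumptions, not values read off the figure.

```python
import cmath
import math

def dft_magnitudes(x: list[float]) -> list[float]:
    """Naive DFT: |X[k]| for k = 0 .. N/2 (the non-negative frequencies)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def top_frequencies(x: list[float], sample_rate: int, how_many: int) -> list[float]:
    """Frequencies (Hz) of the largest spectral peaks, in increasing order."""
    mags = dft_magnitudes(x)
    best = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:how_many]
    return sorted(k * sample_rate / len(x) for k in best)   # bin k sits at k * fs / N Hz

fs = 8000     # assumed sample rate (Hz)
n = 400       # 0.05 s of audio, as on the slide -> 20 Hz per DFT bin
wave = [0.5 * math.sin(2 * math.pi * 100 * t / fs) +
        0.5 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
print(top_frequencies(wave, fs, 2))   # [100.0, 1000.0]
```

Both components complete a whole number of cycles in 0.05 s, so each lands exactly on one DFT bin; for real speech, energy spreads across neighboring bins.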

Part of [ae] waveform from “had”

§ Note complex wave repeating nine times in figure
§ Plus smaller waves which repeat 4 times for every large pattern
§ Large wave has frequency of 250 Hz (9 times in .036 seconds)
§ Small wave roughly 4 times this, or roughly 1000 Hz
§ Two little tiny waves on top of peak of 1000 Hz waves

[Figure: amplitude vs. time]

Spectrum of Actual Speech

[Figure: spectrum, coefficient (0 to 40) vs. frequency (0 to 5000 Hz)]

Spectrograms

[Figure: one time slice of the amplitude waveform passed through an FFT to give a spectrum (coefficient vs. frequency, 0 to 5000 Hz)]

Spectrograms

[Figure: a sequence of spectra (frequency 0 to 5000 Hz, coefficient 0 to 40), one per time slice of the amplitude waveform]

Spectrograms

[Figure: spectrogram (frequency vs. time) aligned with the waveform (amplitude vs. time)]

Types of Graphs

[Figure: three views of a signal: waveform (amplitude vs. time), spectrum (coefficient vs. frequency, 0 to 5000 Hz), and spectrogram (frequency vs. time)]

Back to Spectra

§ Spectrum represents these freq components
§ Computed by Fourier transform, algorithm which separates out each frequency component of wave.

§ x-axis shows frequency, y-axis shows magnitude (in decibels, a log measure of amplitude)

§ Peaks at 930 Hz, 1860 Hz, and 3020 Hz.

Source/Filter

Why these Peaks?

§ Articulation process:
  § The vocal cord vibrations create harmonics
  § The mouth is an amplifier
  § Depending on shape of mouth, some harmonics are amplified more than others

Vowel [i] at increasing pitches

[Figure: spectra of vowel [i] at pitches F#2, A2, C3, F#3, A3, C4 (middle C), and A4]

Figures from Ratree Wayland

Resonances of the Vocal Tract

§ The human vocal tract as an open tube:
  § Air in a tube of a given length will tend to vibrate at resonance frequency of tube.
  § Constraint: Pressure differential should be maximal at (closed) glottal end and minimal at (open) lip end.

[Figure: tube with a closed end (glottis) and an open end (lips); length 17.5 cm]

Figure from W. Barry

From Sundberg

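Computing the 3 Formants of Schwa

§ Let the length of the tube be $L$ (here $L = 17.5$ cm), and let $c = 35{,}000$ cm/s be the speed of sound
  § $F_1 = c/\lambda_1 = c/(4L) = 35{,}000/(4 \times 17.5) = 500$ Hz
  § $F_2 = c/\lambda_2 = c/(\tfrac{4}{3}L) = 3c/(4L) = 3 \times 35{,}000/(4 \times 17.5) = 1500$ Hz
  § $F_3 = c/\lambda_3 = c/(\tfrac{4}{5}L) = 5c/(4L) = 5 \times 35{,}000/(4 \times 17.5) = 2500$ Hz

§ So we expect a neutral vowel to have 3 resonances at 500, 1500, and 2500 Hz

§ These vowel resonances are called formants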
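The three formant calculations above follow one pattern: a tube closed at one end and open at the other only supports odd quarter-wavelengths, $\lambda_n = 4L/(2n-1)$, so $F_n = (2n-1)c/(4L)$. A minimal sketch of that formula; the function name and the 14 cm comparison tract are illustrative assumptions.

```python
def tube_formants(length_cm: float, how_many: int = 3,
                  speed_of_sound_cm_s: float = 35_000.0) -> list[float]:
    """Resonances of a tube closed at the glottis and open at the lips.

    Only odd quarter-wavelengths fit: wavelength_n = 4L / (2n - 1),
    so F_n = (2n - 1) * c / (4L).
    """
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, how_many + 1)]

print(tube_formants(17.5))   # [500.0, 1500.0, 2500.0]  (schwa, as on the slide)
print(tube_formants(14.0))   # [625.0, 1875.0, 3125.0]  (hypothetical shorter tract)
```

A shorter tube raises every formant by the same factor, one reason formant values differ between speakers.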

From Mark Liberman

Seeing Formants: the Spectrogram

Vowel Space

Seeing Formants: the Spectrogram

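American English Vowel Space

[Figure: vowel chart with FRONT to BACK on the horizontal axis and HIGH to LOW on the vertical axis, placing iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, and ux]

Figures from Jennifer Venditti, H. T. Bunnell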

Spectrograms

How to Read Spectrograms

§ [bab]: closure of lips lowers all formants: so rapid increase in all formants at beginning of “bab”

§ [dad]: first formant increases, but F2 and F3 slight fall
§ [gag]: F2 and F3 come together: this is a characteristic of velars. Formant transitions take longer in velars than in alveolars or labials

From Ladefoged “A Course in Phonetics”

“She came back and started again”

1. lots of high-freq energy
3. closure for k
4. burst of aspiration for k
5. ey vowel; faint 1100 Hz formant is nasalization
6. bilabial nasal
7. short b closure, voicing barely visible.
8. ae; note upward transitions after bilabial stop at beginning
9. note F2 and F3 coming together for “k”

From Ladefoged “A Course in Phonetics”

Dialect Issues

§ Speech varies from dialect to dialect (examples are American vs. British English)
  § Syntactic (“I could” vs. “I could do”)
  § Lexical (“elevator” vs. “lift”)
  § Phonological
  § Phonetic

§ Mismatch between training and testing dialects can cause a large increase in error rate

[Figure: American vs. British pronunciations of “all” and “old”]

Speech Recognition

The Noisy Channel Model

Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions

Language model: Distributions over sequences of words (sentences)
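The way these two models combine follows from Bayes' rule. Writing $w$ for a word sequence and $a$ for the acoustic observation sequence, the denominator $P(a)$ does not depend on $w$ and drops out of the maximization:

```latex
\hat{w} = \arg\max_{w} P(w \mid a)
        = \arg\max_{w} \frac{P(a \mid w)\,P(w)}{P(a)}
        = \arg\max_{w} \underbrace{P(a \mid w)}_{\text{acoustic model}}\;
                       \underbrace{P(w)}_{\text{language model}}
```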

Speech Model

[Figure: words w1 w2, sound types s1 … s7, and acoustic observations a1 … a7; the language model covers the words and the acoustic model links sound types to acoustic observations]

Acoustic Model

[Figure: sound types s1 … s7 generating acoustic observations a1 … a7 through the acoustic model]

Frame Extraction

§ A frame (25 ms wide) extracted every 10 ms

[Figure: overlapping 25 ms windows shifted by 10 ms, producing observations a1, a2, a3]

Figure: Simon Arnfield

Preview of feature extraction for each frame:
1) DFT (Spectrum)
2) Log (Calibrate?)
3) another DFT (!!??)
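Frame extraction can be sketched as slicing the sample array with a hop of 10 ms and a window of 25 ms. We assume a 16 kHz sample rate, which makes a frame 400 samples and the hop 160 samples; the function name and the choice to drop a trailing partial frame are ours (toolkits differ on padding, and they usually apply a tapering window such as Hamming before the DFT).

```python
def extract_frames(signal: list[float], sample_rate: int = 16000,
                   frame_ms: float = 25.0, shift_ms: float = 10.0) -> list[list[float]]:
    """Cut the signal into overlapping frames, one observation a_t per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    shift = int(sample_rate * shift_ms / 1000)       # 160 samples at 16 kHz
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // shift   # drop a trailing partial frame
    return [signal[i * shift: i * shift + frame_len] for i in range(n_frames)]

one_second = [0.0] * 16000
frames = extract_frames(one_second)
print(len(frames), len(frames[0]))   # 98 400
```

Because the 25 ms frames advance by only 10 ms, consecutive observations overlap by 15 ms, so one second of audio yields roughly 100 observation vectors.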

Feature Extraction

Digitizing Speech

Figure: Bryan Pellom

Source/Filter

§ Articulation process:
  § The vocal cord vibrations create harmonics
  § The mouth is an amplifier
  § Depending on shape of mouth, some harmonics are amplified more than others

Problem with Raw Spectrum

Figures from Ratree Wayland

Deconvolution / Liftering

Deconvolution / Liftering

[Figure: a speech spectrum s separated into an excitation component e and a filter component f]

Deconvolution / Liftering

[Figure: taking logs turns the components into a sum: $\log(s) = \log(e) + \log(f)$]

Deconvolution / Liftering

Graphs from Dan Ellis

$s = e \times f$

$\log(s) = \log(e) + \log(f)$

$\mathrm{IDFT}(\log(s))$
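A sketch of the three lines above: in time the signal is the excitation convolved with the filter, so its magnitude spectrum is a product; the log makes that product a sum, and the inverse DFT of the log spectrum (the real cepstrum) keeps it a sum. Liftering then keeps only low quefrencies to recover a smooth, filter-like envelope. The naive DFT, the 20-coefficient cutoff, and the random "source" and decaying "filter" signals are our own illustrative assumptions.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) DFT (an FFT computes the same values faster)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_cepstrum(frame):
    """IDFT(log |DFT(frame)|); the tiny epsilon guards against log(0)."""
    log_mag = [math.log(abs(v) + 1e-10) for v in dft(frame)]
    return [c.real for c in idft(log_mag)]

def lifter_envelope(frame, keep=20):
    """Smoothed log spectrum: keep low quefrencies (and their mirror images) only."""
    c = real_cepstrum(frame)
    n = len(c)
    low = [c[q] if q < keep or q > n - keep else 0.0 for q in range(n)]
    return [v.real for v in dft(low)]

def circular_convolve(e, f):
    """s = e (*) f in time; in frequency this is the product S = E x F."""
    n = len(e)
    return [sum(e[j] * f[(t - j) % n] for j in range(n)) for t in range(n)]

random.seed(0)
n = 64
e = [random.gauss(0.0, 1.0) for _ in range(n)]   # assumed noisy "source"
f = [0.8 ** t for t in range(n)]                 # assumed decaying "filter" response
s = circular_convolve(e, f)

# Convolution in time becomes addition in the cepstral domain.
lhs = real_cepstrum(s)
rhs = [a + b for a, b in zip(real_cepstrum(e), real_cepstrum(f))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))   # only floating-point error remains
```

We use circular convolution so that the product rule holds exactly on a finite frame; with real, windowed speech frames it holds only approximately.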

Mel Freq. Cepstral Coefficients

§ Do FFT to get spectral information
  § Like the spectrogram we saw earlier

§ Apply Mel scaling (New)
  § Models human ear; more sensitivity in lower freqs
  § Approx linear below 1 kHz, log above, equal samples above and below 1 kHz

§ Take Log
§ Do discrete cosine transform

[Graph: Wikipedia]
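The four MFCC steps can be sketched end to end for a single frame. Several details are our assumptions rather than the slide's: a Hamming window, the common mel formula $m = 2595 \log_{10}(1 + f/700)$ (roughly linear below 1 kHz, logarithmic above), 26 triangular filters, an unnormalized DCT-II, and dropping coefficient 0 so that 12 coefficients remain. A naive DFT stands in for the FFT to keep the sketch dependency-free.

```python
import cmath
import math

def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-windowed |DFT|^2 for bins 0 .. N/2 (naive DFT for clarity)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    return [abs(sum(w[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank

def dct_ii(x, n_out):
    """Unnormalized DCT-II; returns the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=12):
    """FFT -> mel filterbank -> log -> DCT; keep coefficients 1..n_ceps."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spec), sample_rate)
    log_energies = [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10) for filt in bank]
    return dct_ii(log_energies, n_ceps + 1)[1:]

fs = 16000
frame = [math.sin(2 * math.pi * 440 * t / fs) for t in range(400)]   # one 25 ms frame
print([round(c, 2) for c in mfcc(frame, fs)])
```

The final DCT is the "another DFT (!!??)" from the frame-extraction preview: applied to the log mel energies, it plays the role of the IDFT in the liftering slides, and keeping only the first few coefficients keeps the smooth spectral envelope.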

Final Feature Vector

§ 39 (real) features per 10 ms frame:
  § 12 MFCC features
  § 12 delta MFCC features
  § 12 delta-delta MFCC features
  § 1 (log) frame energy
  § 1 delta (log) frame energy
  § 1 delta-delta (log frame energy)

§ So each frame is represented by a 39D vector
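A sketch of how the 39 dimensions are assembled: 12 MFCCs plus log energy give 13 base features per frame, and deltas and delta-deltas triple that. We assume the widely used regression formula $d_t = \sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n}) / (2\sum_{n=1}^{N} n^2)$ with $N = 2$ and repeated edge frames; the slide does not specify how deltas are computed.

```python
def deltas(frames: list[list[float]], window: int = 2) -> list[list[float]]:
    """Regression-based time derivative of every feature, repeating edge frames."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            num = sum(n * (frames[min(t + n, T - 1)][d] - frames[max(t - n, 0)][d])
                      for n in range(1, window + 1))
            row.append(num / denom)
        out.append(row)
    return out

def feature_vectors(mfccs: list[list[float]], log_energy: list[float]) -> list[list[float]]:
    """12 MFCCs + log energy (13), then their deltas (13) and delta-deltas (13)."""
    base = [c + [e] for c, e in zip(mfccs, log_energy)]
    d1 = deltas(base)
    d2 = deltas(d1)
    return [b + x + y for b, x, y in zip(base, d1, d2)]

# Toy input: every MFCC rises by 1 per frame, log energy rises by 2 per frame.
mfccs = [[float(t)] * 12 for t in range(10)]
energy = [2.0 * t for t in range(10)]
feats = feature_vectors(mfccs, energy)
print(len(feats[0]))   # 39
```

On this linear ramp, interior deltas equal the slope (1 for the MFCCs, 2 for energy) and interior delta-deltas are 0, which is a quick sanity check on the formula.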

Emission Model

HMMs for Continuous Observations

§ Before: discrete set of observations

§ Now: feature vectors are real-valued

§ Solution 1: discretization
§ Solution 2: continuous emissions
  § Gaussians
  § Multivariate Gaussians
  § Mixtures of multivariate Gaussians

§ A state is progressively
  § Context independent subphone (~3 per phone)
  § Context dependent phone (triphones)
  § State tying of CD phone

Vector Quantization

§ Idea: discretization
  § Map MFCC vectors onto discrete symbols
  § Compute probabilities just by counting

§ This is called vector quantization or VQ

§ Not used for ASR any more

§ But: useful to consider as a starting point
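A minimal VQ sketch: quantize each feature vector to its nearest codeword, then estimate each state's emission distribution over symbols by counting. How the codebook is built is our assumption here (plain k-means initialized from the first k vectors); the toy 2-D vectors and state names stand in for 39-D MFCC frames and real HMM states.

```python
from collections import Counter

def nearest_code(x: list[float], codebook: list[list[float]]) -> int:
    """Index of the closest codeword under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def kmeans(data: list[list[float]], k: int, iters: int = 20) -> list[list[float]]:
    """Build a codebook with k-means (naive init: the first k vectors)."""
    codebook = [list(v) for v in data[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            clusters[nearest_code(x, codebook)].append(x)
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def emission_probs(frames_by_state, codebook):
    """P(symbol | state), estimated just by counting quantized frames."""
    probs = {}
    for state, frames in frames_by_state.items():
        counts = Counter(nearest_code(x, codebook) for x in frames)
        probs[state] = {sym: c / len(frames) for sym, c in counts.items()}
    return probs

data = [[0.0, 0.0], [5.0, 5.0], [0.1, -0.1], [5.2, 4.9], [-0.2, 0.1], [4.8, 5.1]]
codebook = kmeans(data, 2)
aligned = {"s1": [[0.0, 0.0], [0.1, -0.1], [5.0, 5.0]], "s2": [[5.2, 4.9], [4.8, 5.1]]}
probs = emission_probs(aligned, codebook)
print(probs)
```

All ambiguity is settled by the hard `nearest_code` decision before the HMM ever sees the frame, which is the weakness the next slide points out.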

Gaussian Emissions

§ VQ is insufficient for top-quality ASR
  § Hard to cover high-dimensional space with codebook
  § Moves ambiguity from the model to the preprocessing

§ Instead: assume the possible values of the observation vectors are normally distributed.
  § Represent the observation likelihood function as a Gaussian?

From bartus.org/akustyk

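Gaussians for Acoustic Modeling

§ P(x):

[Figure: bell curve of P(x) against x; P(x) is highest at the mean and low far from the mean]

A Gaussian is parameterized by a mean $\mu$ and a variance $\sigma^2$:

$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$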

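Multivariate Gaussians

§ Instead of a single mean $\mu$ and variance $\sigma^2$:
  § Vector of means $\mu$ and covariance matrix $\Sigma$
  § For a $D$-dimensional $x$: $P(x) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$

§ Usually assume diagonal covariance (!)
  § This isn’t very true for FFT features, but is less bad for MFCC features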
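With a diagonal $\Sigma$, the determinant is the product of the variances and $\Sigma^{-1}$ just divides each dimension by its variance, so the log density splits into a sum of independent 1-D terms. This sketch checks that the two forms agree; the function names and the example vectors are ours. Scores are kept in log space because multiplying densities across many frames quickly underflows.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """log N(x; mean, diag(var)) as a sum of independent 1-D Gaussian log densities."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def full_formula_logpdf(x, mean, var):
    """General formula specialized to diagonal Sigma:
    -1/2 [D log(2 pi) + log|Sigma| + (x - mu)^T Sigma^{-1} (x - mu)]."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)   # |Sigma| = product of the variances
    quad = sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + quad)

x, mu, var = [1.0, 2.0, -0.5], [0.0, 1.5, 0.0], [1.0, 4.0, 0.25]
print(diag_gaussian_logpdf(x, mu, var), full_formula_logpdf(x, mu, var))
```

Besides being cheaper (D terms instead of a D×D inverse), the diagonal form needs only 2D parameters per Gaussian, which matters when there are thousands of them.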

Gaussians: Size of $\Sigma$

§ Three panels, each with $\mu = [0\ 0]$: $\Sigma = I$, $\Sigma = 0.6I$, $\Sigma = 2I$
§ As $\Sigma$ becomes larger, Gaussian becomes more spread out; as $\Sigma$ becomes smaller, Gaussian more compressed

Text and figures from Andrew Ng

Gaussians: Shape of $\Sigma$

§ As we increase the off-diagonal entries, more correlation between value of x and value of y

Text and figures from Andrew Ng
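Both effects of $\Sigma$ can be checked numerically in 2-D. This sketch (our own helper names) evaluates the full-covariance density using the closed-form 2×2 inverse, showing that the peak at the mean drops as $\Sigma = sI$ grows, and draws samples through the Cholesky factor to show that a positive off-diagonal entry produces correlated x and y. The 0.8 off-diagonal value, sample size, and seed are illustrative choices.

```python
import math
import random

def gaussian2d_pdf(x, mean, cov):
    """Density of a 2-D Gaussian with covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # (x - mu)^T Sigma^{-1} (x - mu) via the closed-form 2x2 inverse
    quad = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def sample2d(mean, cov, n, rng):
    """Draw n points as mean + L z, where Sigma = L L^T (Cholesky) and z ~ N(0, I)."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return points

def correlation(points):
    """Empirical correlation between the x and y coordinates."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxy / math.sqrt(sxx * syy)

# Size: the peak height at the mean shrinks as Sigma = sI grows.
for s in (0.6, 1.0, 2.0):
    print(s, gaussian2d_pdf((0.0, 0.0), (0.0, 0.0), [[s, 0.0], [0.0, s]]))

# Shape: a positive off-diagonal entry makes x and y move together.
points = sample2d((0.0, 0.0), [[1.0, 0.8], [0.8, 1.0]], 20000, random.Random(0))
print(correlation(points))   # close to 0.8
```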

But we’re not there yet

§ Single Gaussians may do a bad job of modeling a complex distribution in any dimension

§ Even worse for diagonal covariances

§ Solution: mixtures of Gaussians

From openlearn.open.ac.uk

Mixtures of Gaussians

§ Mixtures of Gaussians: $P(x) = \sum_{m=1}^{M} w_m\, \mathcal{N}(x; \mu_m, \Sigma_m)$, with $\sum_{m} w_m = 1$

From robots.ox.ac.uk; http://www.itee.uq.edu.au/~comp4702

GMMs

§ Summary: each state has an emission distribution P(x|s) (likelihood function) parameterized by:
  § M mixture weights
  § M mean vectors of dimensionality D
  § Either M covariance matrices of D×D or M D×1 diagonal variance vectors

§ Like soft vector quantization after all
  § Think of the mixture means as being learned codebook entries
  § Think of the Gaussian densities as a learned codebook distance function
  § Think of the mixture of Gaussians like a multinomial over codes
  § (Even more true given shared Gaussian inventories, cf. next week)
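A sketch of one state's GMM emission score $\log P(x \mid s)$ with diagonal covariances, computed with log-sum-exp so that very small component densities do not underflow. The `responsibilities` helper makes the "soft vector quantization" reading concrete: it is the posterior over components, a soft assignment of $x$ to learned codebook entries. All names and the two-component example are illustrative.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """log N(x; mean, diag(var))."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the max."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def component_scores(x, weights, means, variances):
    return [math.log(w) + diag_gaussian_logpdf(x, mu, var)
            for w, mu, var in zip(weights, means, variances)]

def gmm_logpdf(x, weights, means, variances):
    """log P(x | s) = log sum_m w_m N(x; mu_m, diag(var_m))."""
    return logsumexp(component_scores(x, weights, means, variances))

def responsibilities(x, weights, means, variances):
    """Posterior over components: a soft assignment of x to codebook entries."""
    scores = component_scores(x, weights, means, variances)
    total = logsumexp(scores)
    return [math.exp(s - total) for s in scores]

weights = [0.3, 0.7]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_logpdf([0.5, 0.5], weights, means, variances))
print(responsibilities([0.0, 0.0], weights, means, variances))
```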

State Model

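State Transition Diagrams

§ Bayes Net: HMM as a Graphical Model

[Figure: chain of hidden states w w w, each emitting an observation x x x]

§ State Transition Diagram: Markov Model as a Weighted FSA

[Figure: weighted FSA over words such as “the”, “cat”, “chased”, “dog”, “has”]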

ASR Lexicon

Figure: J&M

Lexical State Structure

Figure: J&M

Adding an LM

Figure from Huang et al., page 618

State Space

§ State space must include
  § Current word (|V| on order of 20K+)
  § Index within current word (|L| on order of 5)
  § E.g. (lec[t]ure) (though not in orthography!)

§ Acoustic probabilities only depend on phone type
  § E.g. P(x|lec[t]ure) = P(x|t)

§ From a state sequence, can read a word sequence
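A sketch of this state space: a state is a (word, phone index) pair, its emission score is looked up through the phone alone, and a word sequence can be read off a frame-level state sequence. The two-word lexicon and its phone strings are hypothetical illustrations, and reading a new word at each entry into index 0 assumes no word is immediately repeated with a single-phone pronunciation; real decoders track word boundaries explicitly.

```python
# Hypothetical toy lexicon: states are positions in pronunciations, not letters.
LEXICON = {
    "the": ["dh", "ax"],
    "lecture": ["l", "eh", "k", "ch", "er"],
}

def state_space_size() -> int:
    """One state per (word, index): roughly |V| x |L| for a real vocabulary."""
    return sum(len(phones) for phones in LEXICON.values())

def phone_of(state) -> str:
    word, index = state
    return LEXICON[word][index]

def emission_logprob(state, x, phone_logprob) -> float:
    """Acoustic score depends only on the phone type: P(x | (word, i)) = P(x | phone)."""
    return phone_logprob(phone_of(state), x)

def read_words(state_seq) -> list[str]:
    """A new word starts each time we enter index 0; repeats are self-loops."""
    words, prev = [], None
    for state in state_seq:
        if state != prev and state[1] == 0:
            words.append(state[0])
        prev = state
    return words

seq = [("the", 0), ("the", 0), ("the", 1), ("lecture", 0), ("lecture", 1),
       ("lecture", 1), ("lecture", 2), ("lecture", 3), ("lecture", 4)]
print(read_words(seq), state_space_size())
```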

State Refinement

Phones Aren’t Homogeneous

Need to Use Subphones

Figure: J&M

A Word with Subphones

Figure: J&M

Modeling phonetic context

[Examples: the vowel iy after different consonants: w iy, r iy, m iy, n iy]

“Need” with triphone models

Figure: J&M

Lots of Triphones

§ Possible triphones: 50 × 50 × 50 = 125,000

§ How many triphone types actually occur?

§ 20K word WSJ Task (from Bryan Pellom)
  § Word internal models: need 14,300 triphones
  § Cross word models: need 54,400 triphones

§ Need to generalize models, tie triphones
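The gap between word-internal and cross-word counts comes from where contexts are allowed to come from. This sketch generates triphone labels in the HTK-style `left-phone+right` notation (our choice of notation); word-internal models take context only from inside each word, while cross-word models let context span word boundaries. The pronunciations of "the" and "need" are illustrative.

```python
def triphones(phones: list[str]) -> list[str]:
    """Label each phone with its neighbors: left-phone+right (edges get no context)."""
    out = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        out.append(left + p + right)
    return out

def word_internal_triphones(words: list[list[str]]) -> list[str]:
    """Contexts never cross a word boundary."""
    return [t for w in words for t in triphones(w)]

def cross_word_triphones(words: list[list[str]]) -> list[str]:
    """Contexts may cross word boundaries: flatten the phone sequence first."""
    return triphones([p for w in words for p in w])

the_need = [["dh", "ax"], ["n", "iy", "d"]]
print(word_internal_triphones(the_need))
print(cross_word_triphones(the_need))
print(50 ** 3)   # 125000 possible triphones for a 50-phone inventory
```

Every pair of adjacent words contributes new boundary contexts (here `ax-n+iy`), which is why cross-word systems need several times more triphone types.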

State Tying / Clustering

§ [Young, Odell, Woodland 1994]

§ How do we decide which triphones to cluster together?

§ Use phonetic features (or ‘broad phonetic classes’)
  § Stop
  § Nasal
  § Fricative
  § Sibilant
  § Vowel
  § lateral

Figure: J&M

State Space

§ State space now includes
  § Current word: |W| is order 20K
  § Index in current word: |L| is order 5
  § Subphone position: 3
  § E.g. (lec[t-mid]ure)

§ Acoustic model depends on clustered phone context
  § But this doesn’t grow the state space

§ But, adding the LM context for trigram+ does
  § (after the, lec[t-mid]ure)
  § This is a real problem for decoding

Decoding

Inference Tasks

Most likely word sequence: d-ae-d

Most likely state sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
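Reading the phone sequence off the state sequence above amounts to stripping each state's numeric id and merging runs of the same phone. A minimal sketch (the helper name is ours); note that this naive merge would wrongly fuse two genuinely adjacent identical phones, which is why real decoders keep phone and word boundaries in the state itself.

```python
import re
from itertools import groupby

def phones_from_states(state_seq: str) -> list[str]:
    """Strip the numeric state id (d6 -> d), then merge consecutive states of one phone."""
    phones = [re.sub(r"\d+$", "", s) for s in state_seq.split("-")]
    return [phone for phone, _ in groupby(phones)]

print(phones_from_states("d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5"))   # ['d', 'ae', 'd']
```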

Viterbi Decoding

Figure: Enrique Benimeli

Viterbi Decoding

Figure: Enrique Benimeli
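A compact log-space Viterbi sketch for a generic HMM: at each frame, every state keeps only its best-scoring predecessor (the dynamic-programming max), and backpointers recover the best path at the end. The dictionary-based interface and the toy two-state model with discrete emissions are our own illustration; in an acoustic model, `log_emit` would be a GMM score on a 39-D feature vector.

```python
import math

NEG_INF = float("-inf")

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely state sequence for `obs` under an HMM, computed in log space.

    log_init[s] and log_trans[s][s2] are log probabilities (missing = impossible);
    log_emit(s, x) returns log P(x | s). Assumes some path has nonzero probability.
    """
    v = {s: log_init.get(s, NEG_INF) + log_emit(s, obs[0]) for s in states}
    backpointers = []
    for x in obs[1:]:
        new_v, ptr = {}, {}
        for s2 in states:
            # Best predecessor of s2: this max is the dynamic-programming step.
            best_prev = max(states, key=lambda s: v[s] + log_trans.get(s, {}).get(s2, NEG_INF))
            new_v[s2] = (v[best_prev] + log_trans.get(best_prev, {}).get(s2, NEG_INF)
                         + log_emit(s2, x))
            ptr[s2] = best_prev
        v = new_v
        backpointers.append(ptr)
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], v[last]

log = math.log
states = ["a", "b"]                                   # a left-to-right toy model
log_init = {"a": 0.0}
log_trans = {"a": {"a": log(0.5), "b": log(0.5)}, "b": {"b": 0.0}}
emit = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}}
path, score = viterbi(["x", "x", "y", "y"], states, log_init, log_trans,
                      lambda s, x: log(emit[s][x]))
print(path)   # ['a', 'a', 'b', 'b']
```

Each frame costs O(|S|²) here, which is exactly why the following slides add emission caching, prefix tries, and beam pruning.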

Emission Caching

§ Problem: scoring all the P(x|s) values is too slow
§ Idea: many states share tied emission models, so cache them
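A sketch of per-frame emission caching: each state points to a tied emission model, and within one frame each model is scored at most once, however many states share it. The class name, the state-to-model mapping, and the stand-in scoring functions are hypothetical; a real system would map states to tied GMMs.

```python
class EmissionCache:
    """Per-frame cache: states that share (tie) an emission model share one score."""

    def __init__(self, state_to_model, models):
        self.state_to_model = state_to_model   # state -> tied model id
        self.models = models                   # model id -> scoring function
        self.frame = None
        self.scores = {}
        self.evaluations = 0                   # how many real model evaluations ran

    def score(self, state, frame_index, x):
        if frame_index != self.frame:          # new frame: old scores are stale
            self.frame, self.scores = frame_index, {}
        model_id = self.state_to_model[state]
        if model_id not in self.scores:
            self.scores[model_id] = self.models[model_id](x)
            self.evaluations += 1
        return self.scores[model_id]

# Hypothetical tying: the "ch" states of two words share one emission model.
state_to_model = {("lecture", 3): "ch", ("church", 0): "ch", ("the", 0): "dh"}
models = {"ch": lambda x: -1.0 * x, "dh": lambda x: -2.0 * x}
cache = EmissionCache(state_to_model, models)
for state in state_to_model:
    cache.score(state, 0, 1.0)
print(cache.evaluations)   # 2 evaluations for 3 states
```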

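Prefix Trie Encodings

§ Problem: many partial-word states are indistinguishable
§ Solution: encode word production as a prefix trie (with pushed weights)

§ A specific instance of minimizing weighted FSAs [Mohri, 94]

Figure: Aubert, 02

[Figure: pronunciations n i d (0.04), n i t (0.02), and n o t (0.01) merged into a prefix trie with weights pushed toward the root: 0.04 on the arc into n, 0.25 on n→o, 0.5 on i→t, and 1 on the remaining arcs]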
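A sketch of a prefix trie with max-pushed weights, which should reproduce the figure's arc weights if we read it correctly: each node stores the best word probability reachable below it, and each arc carries best(child) / best(parent), so the product along a path recovers the word's probability while the best reachable score is visible as early as possible. The word names "need", "neat", and "not" are our guesses for the figure's pronunciations; the function names are hypothetical.

```python
def build_pushed_trie(lexicon):
    """lexicon: word -> (phones, probability). Nodes record the best probability below."""
    def new_node():
        return {"children": {}, "best": 0.0, "word": None}
    root = new_node()
    for word, (phones, prob) in lexicon.items():
        node = root
        node["best"] = max(node["best"], prob)
        for p in phones:
            node = node["children"].setdefault(p, new_node())
            node["best"] = max(node["best"], prob)
        node["word"] = word
    return root

def arc_weights(node, parent_best=1.0, prefix=""):
    """(path label, pushed arc weight) for every arc: best(child) / best(parent)."""
    out = []
    for p, child in node["children"].items():
        out.append((prefix + p, child["best"] / parent_best))
        out.extend(arc_weights(child, child["best"], prefix + p))
    return out

lexicon = {                      # pronunciations and probabilities from the figure
    "need": (["n", "i", "d"], 0.04),
    "neat": (["n", "i", "t"], 0.02),
    "not": (["n", "o", "t"], 0.01),
}
weights = dict(arc_weights(build_pushed_trie(lexicon)))
print(weights)
```

For example, the path n, i, t multiplies 0.04 × 1 × 0.5 = 0.02, the probability of the second word.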

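Beam Search

§ Problem: trellis is too big to compute v(s) vectors
§ Idea: most states are terrible, keep v(s) only for top states at each time

§ Important: still dynamic programming; collapse equiv states

[Figure: partial hypotheses (the b., the m., and then., at then.) are extended (the ba., the be., the bi., the ma., the me., the mi., the na., the ne., the ni.), and only the top ones are kept (the ba., the be., the ma., the na.)]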
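A sketch of beam pruning inside one Viterbi time step. States whose paths meet are still recombined with a max (the dynamic-programming part), and then only promising states survive, either the top `beam_size` (histogram pruning) or those within `threshold` of the best log score. Both parameter names and the toy transitions are our own; real decoders typically combine both criteria.

```python
import heapq
import math

def prune(scores, beam_size=None, threshold=None):
    """Keep only promising states; scores maps state -> log Viterbi score v(s)."""
    if not scores:
        return {}
    best = max(scores.values())
    kept = {s: v for s, v in scores.items() if threshold is None or v >= best - threshold}
    if beam_size is not None and len(kept) > beam_size:
        kept = dict(heapq.nlargest(beam_size, kept.items(), key=lambda kv: kv[1]))
    return kept

def beam_step(active, log_trans, log_emit, x, beam_size=None, threshold=None):
    """Extend only surviving states by one frame, then prune again."""
    new = {}
    for s, v in active.items():
        for s2, lt in log_trans.get(s, {}).items():
            score = v + lt + log_emit(s2, x)
            if score > new.get(s2, float("-inf")):   # recombine: best path into s2
                new[s2] = score
    return prune(new, beam_size, threshold)

print(prune({"a": -1.0, "b": -5.0, "c": -2.0}, beam_size=2))
active = beam_step({"a": 0.0}, {"a": {"a": math.log(0.6), "b": math.log(0.4)}},
                   lambda s, x: 0.0, None, beam_size=1)
print(active)
```

Pruning can discard the state that would eventually have won, so beam search trades a small risk of search errors for a large speedup.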

LM Factoring

§ Problem: Higher-order n-grams explode the state space
§ (One) Solution:
  § Factor state space into (word index, lm history)
  § Score unigram prefix costs while inside a word
  § Subtract unigram cost and add trigram cost once word is complete

[Figure: the pushed prefix trie (arc weights 0.04, 0.25, 0.5, 1, 1, 1) paired with the LM history “the”]
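The cost bookkeeping can be sketched in negative-log (cost) space. While inside the word, the search pays the pushed unigram arc costs from the trie; when the word completes, it removes the word's full unigram cost and adds the trigram cost given the LM history, so the final total is exactly the trigram cost. The trigram probability 0.1 below is a hypothetical value, and the function names are ours.

```python
import math

def cost(p: float) -> float:
    """Negative log probability."""
    return -math.log(p)

def score_word_in_context(prefix_probs, unigram_p, trigram_p):
    """prefix_probs: pushed unigram arc weights along the word's trie path
    (their product is the word's unigram probability).
    Returns (running costs inside the word, total cost after the correction)."""
    running, total = [], 0.0
    for p in prefix_probs:
        total += cost(p)
        running.append(total)
    # Word complete: swap the unigram cost for the trigram cost given the history.
    total = total - cost(unigram_p) + cost(trigram_p)
    return running, total

# Path n, i, t from the trie (0.04 x 1 x 0.5 = 0.02); hypothetical P(word | ..., the) = 0.1.
running, total = score_word_in_context([0.04, 1.0, 0.5], unigram_p=0.02, trigram_p=0.1)
print(running, total)
```

The unigram prefix costs act as a cheap lookahead that lets pruning work inside the word, without multiplying the trie by every trigram history.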

LM Reweighting

§ Noisy channel suggests

§ In practice, want to boost LM

§ Also, good to have a “word bonus” to offset LM costs

§ These are both consequences of broken independence assumptions in the model
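The slide's formulas did not survive extraction. A commonly used form, which we assume here, scores a hypothesis $w$ for acoustics $a$ as $\log P(a \mid w) + \alpha \log P(w) + \beta\,|w|$, where $\alpha > 1$ boosts the LM and $\beta$ is a per-word bonus; $\alpha = 1, \beta = 0$ gives back the plain noisy channel. The parameter names and all scores below are hypothetical.

```python
def hypothesis_score(acoustic_logp, lm_logp, n_words, lm_weight=1.0, word_bonus=0.0):
    """log P(a|w) + lm_weight * log P(w) + word_bonus * |w|."""
    return acoustic_logp + lm_weight * lm_logp + word_bonus * n_words

# Hypothetical (acoustic log prob, LM log prob, number of words) for one utterance.
hyps = {
    "recognize speech": (-120.0, -9.0, 2),
    "wreck a nice beach": (-115.0, -12.0, 4),
}

def best(lm_weight, word_bonus):
    return max(hyps, key=lambda h: hypothesis_score(*hyps[h], lm_weight, word_bonus))

print(best(1.0, 0.0))   # plain noisy channel: the acoustically better hypothesis wins
print(best(3.0, 0.0))   # boosted LM: the more probable word sequence wins
```

Because every extra word adds another LM cost, a large `lm_weight` favors short hypotheses; the word bonus pushes back against that bias. In practice both values are tuned on held-out data.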

Training

Training Mixture Models

§ Input: wav files with unaligned transcriptions

§ Forced alignment
  § Computing the “Viterbi path” over the training data (where the transcription is known) is called “forced alignment”
  § We know which word string to assign to each observation sequence.
  § We just don’t know the state sequence.
  § So we constrain the path to go through the correct words (by using a special example-specific language model)
  § And otherwise run the Viterbi algorithm

§ Result: aligned state sequence
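A minimal forced-alignment sketch: the known transcription becomes a left-to-right chain of states, and Viterbi may only stay in the current state or advance to the next one, starting in the first state and ending in the last. We simplify to one state per phone and uniform transitions, and we use made-up 1-D frames with hypothetical Gaussian means per phone; a real system would use subphone states and GMM scores. The sketch assumes at least as many frames as states.

```python
def forced_align(frame_scores, n_states):
    """frame_scores[t][j]: log P(x_t | j-th state in the transcription's chain).
    Each frame stays in its state or advances by one; the path starts in state 0
    and ends in the last state. Returns the aligned state index for each frame."""
    NEG = float("-inf")
    T = len(frame_scores)
    v = [[NEG] * n_states for _ in range(T)]
    back = [[0] * n_states for _ in range(T)]
    v[0][0] = frame_scores[0][0]
    for t in range(1, T):
        for j in range(n_states):
            stay = v[t - 1][j]
            advance = v[t - 1][j - 1] if j > 0 else NEG
            prev = j - 1 if advance > stay else j
            v[t][j] = max(stay, advance) + frame_scores[t][j]
            back[t][j] = prev
    path = [n_states - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

phones = ["d", "ae", "d"]                        # transcription "dad"
means = {"d": 0.0, "ae": 5.0}                    # hypothetical 1-D phone means
frames = [0.1, -0.2, 4.8, 5.1, 5.3, 0.2]
scores = [[-(x - means[p]) ** 2 / 2 for p in phones] for x in frames]
alignment = forced_align(scores, len(phones))
print([phones[j] for j in alignment])            # ['d', 'd', 'ae', 'ae', 'ae', 'd']
```

The aligned frames per state are then what the Gaussian (or GMM) parameters are re-estimated from.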

State Tying

§ Creating CD phones:
  § Start with monophone, do EM training
  § Clone Gaussians into triphones
  § Build decision tree and cluster Gaussians
  § Clone and train mixtures (GMMs)

§ General idea:
  § Introduce complexity gradually
  § Interleave constraint with flexibility
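One common way to "clone and train mixtures", assumed here rather than taken from the slide, is mixture splitting: each Gaussian becomes two copies with means nudged by ±ε standard deviations and half the weight, and EM re-estimation (not shown) then pulls the copies apart. The function name and ε = 0.2 are illustrative choices.

```python
import math

def split_components(weights, means, variances, eps=0.2):
    """Double the mixture size: each Gaussian becomes two copies with means shifted
    by +/- eps standard deviations and half the original weight."""
    new_w, new_mu, new_var = [], [], []
    for w, mu, var in zip(weights, means, variances):
        offset = [eps * math.sqrt(v) for v in var]
        for sign in (1.0, -1.0):
            new_w.append(w / 2)
            new_mu.append([m + sign * o for m, o in zip(mu, offset)])
            new_var.append(list(var))
    return new_w, new_mu, new_var

w, mu, var = split_components([1.0], [[0.0, 0.0]], [[1.0, 4.0]])
print(w, mu, var)   # 1 -> 2 components; repeat (with EM in between) for 4, 8, ...
```

Growing the mixture gradually, with training between splits, follows the slide's general idea of introducing complexity step by step.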

Standard subphone/mixture HMM

[Figure: temporal structure (subphone states) with Gaussian mixture emissions]

Model | Error rate
HMM Baseline | 25.1%

An Induced Model

[Figure: the standard model compared with an induced model using single Gaussians and fully connected states]

[Petrov, Pauls, and Klein, 07]

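Hierarchical Split Training with EM

[Figure: error rate over successive split rounds; values shown: 32.1%, 28.7%, 25.6%, 23.9%, 21.4%]

Model | Error rate
HMM Baseline | 25.1%
5 Split rounds | 21.4%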

Refinement of the /ih/-phone

Refinement of the /ih/-phone

Refinement of the /ih/-phone

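HMM states per phone

[Figure: bar chart (y-axis 0 to 35) of the number of HMM states per phone for ae, ao, ay, eh, er, ey, ih, f, r, s, sil, aa, ah, ix, iy, z, cl, k, sh, n, vcl, ow, l, m, t, v, uw, aw, ax, ch, w, th, el, dh, uh, p, en, oy, hh, jh, ng, y, b, d, dx, g, zh, epi]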