Post on 21-Dec-2015
By the Novel Approaches team:
Nelson Morgan, ICSI
Hynek Hermansky, OGI
Dan Ellis, Columbia
Kemal Sonmez, SRI
Mari Ostendorf, UW
Hervé Bourlard, IDIAP/EPFL
George Doddington, NA-sayer
EARS Kickoff Meeting: “Pushing the Envelope”
Modern ASR Systems
• From 50,000 ft, all ASR systems the same:
- compute local spectral envelope
- determine likelihoods of speech sounds
- search for most likely HMMs
• Spectral envelope distorted by many things
- Alternatives often are bad fits to the statistical models
ASR is half-deaf
• Phonetic classification very poor
• Success due to constraints (domain, speaker, noise-canceling mic, etc.)
• These constraints can mask the underlying weakness of the technology
Who gets to try to fix it?
“Y'see, they just find out who complains the loudest about the cooking, and he gets to be the cook.”
- Utah Phillips
Rethinking Acoustic Processing for ASR
• Escape dependence on spectral envelope
• Use multiple front ends across time/freq
• Modify statistical models to accommodate new front ends
• Design optimal combination schemes for multiple models
The Two EARS-NA Tasks
• Signal processing: Replacing the spectral envelope with long-time and short-time (multirate) probabilistic functions of the spectro-temporal plane.
• Statistical modeling: Modifying the statistical models, both to incorporate these new multirate front ends and to explicitly handle areas of missing information.
Task 1: Pushing the Envelope (aside)
• Problem: Spectral envelope is a fragile information carrier
• Solution: Probabilities from multiple time-frequency patches
[Figure: OLD: a single 10 ms frame gives the estimate of sound identity. PROPOSED: i-th, k-th, and n-th estimates, taken from time-frequency patches spanning up to 1 s, pass through information fusion to give the estimate of sound identity.]
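The slide does not say how the per-patch estimates are fused. One common baseline, shown here only as a sketch, is a normalized product of the posterior vectors (a sum in the log domain). The function name, the probability floor, and the example posteriors are all assumptions made for illustration.

```python
import math

def fuse_posteriors(estimates, floor=1e-10):
    """Fuse per-patch class posteriors by a normalized product (sum of logs)."""
    n_classes = len(estimates[0])
    log_sum = [0.0] * n_classes
    for est in estimates:
        for c, p in enumerate(est):
            # Floor each probability so a single zero cannot veto a class
            log_sum[c] += math.log(max(p, floor))
    peak = max(log_sum)  # subtract the peak for numerical stability
    unnorm = [math.exp(v - peak) for v in log_sum]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical posteriors over three sound classes from three patches
patches = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
fused = fuse_posteriors(patches)
best = max(range(len(fused)), key=fused.__getitem__)
```

A product rule treats the patches as independent given the class, which is a strong assumption; weighted or learned combinations are alternatives.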
Multiple time-frequency tradeoffs
• Temporal trajectories of narrow subbands
• Optimal search for more general patches
• Data-driven broad class probabilities
[Figure: i-th, k-th, and n-th estimates from time-frequency patches spanning up to 1 s along the time axis]
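As a rough illustration of the first bullet, the sketch below cuts one temporal trajectory per subband out of a matrix of log energies. The 10 ms frame step, the edge handling (repeating the first or last frame), and the function name are assumptions, not details taken from the slides.

```python
def subband_trajectories(log_energies, center, half_width):
    """Return one temporal trajectory per subband around frame `center`.

    log_energies: list of frames, each a list of per-band log energies.
    Frames beyond the edges are filled by repeating the first/last frame.
    """
    n_frames = len(log_energies)
    n_bands = len(log_energies[0])
    trajectories = []
    for band in range(n_bands):
        traj = []
        for t in range(center - half_width, center + half_width + 1):
            t_clamped = min(max(t, 0), n_frames - 1)
            traj.append(log_energies[t_clamped][band])
        trajectories.append(traj)
    return trajectories

# Toy input: 200 frames (2 s at an assumed 10 ms step), 3 bands
frames = [[float(t + 100 * b) for b in range(3)] for t in range(200)]
# 50 frames on each side gives 101 frames, about 1 s of context
trajs = subband_trajectories(frames, center=100, half_width=50)
```

Each trajectory could then feed its own per-band classifier, whose output is one of the estimates to be fused.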
Pitch-related features
• Current recognizers have no use for pitch
• Listeners benefit from pitch
• Correlogram estimates spectrum of pitch
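A full correlogram computes a short-time autocorrelation in each channel of an auditory filterbank. The sketch below is a much simpler stand-in: a single-channel autocorrelation that picks the strongest lag in a plausible pitch range. The search range (60 to 400 Hz), the sample rate, and the test tone are assumptions chosen for the example.

```python
import math

def autocorr_pitch(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) as the lag with the largest autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        val = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag

# Synthetic 100 Hz tone, 0.1 s at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100.0 * n / sample_rate) for n in range(800)]
pitch = autocorr_pitch(tone, sample_rate)
```

Real speech would need windowing, voicing decisions, and protection against octave errors, none of which are handled here.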
Principled multistream
• Not just different, but useful in combination
- minimizing relative entropy between error signals
- minimizing conditional information of posterior signals
• Choosing categories for per-stream probabilistic functions (e.g., broad classes)
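The slide names relative entropy as a selection criterion but does not give the computation. For reference, a minimal sketch of relative entropy (KL divergence) between two discrete distributions follows; the two example distributions and the floor on q are assumptions, and how the team would derive distributions from error signals is not specified in the source.

```python
import math

def relative_entropy(p, q, floor=1e-12):
    """KL divergence D(p || q) in bits between two discrete distributions."""
    return sum(
        pi * math.log2(pi / max(qi, floor))
        for pi, qi in zip(p, q)
        if pi > 0.0  # terms with p = 0 contribute nothing
    )

# Hypothetical distributions from two streams
stream_a = [0.5, 0.25, 0.25]
stream_b = [0.25, 0.25, 0.5]
d_ab = relative_entropy(stream_a, stream_b)
d_aa = relative_entropy(stream_a, stream_a)
```

Relative entropy is asymmetric, so D(a || b) and D(b || a) generally differ.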
Task 2: Beyond Frames…
• Problem: Features & models interact, new features may require different models
• Solution: Advanced features require advanced models, not limited by fixed-frame-rate paradigm
[Figure: OLD: short-term features with a conventional HMM. PROPOSED: advanced features with a multi-rate / dynamic scale classifier.]
Multirate Models
• Goal: Model features that span different time scales and dependence across scales/streams
[Figure: advanced features with a multirate classifier]
Multirate Models (ctd)
• Why multirate vs. redundant features?
- Redundant features violate independence assumptions, lead to poor confidence (posterior) estimates
- Redundancy adds unnecessary computation
• Important research issues:
- Acoustically driven rate mixing and/or variable alignment
- Discriminative learning of dependence across streams
Partial information techniques
• Can integrate across unknown dimensions
• particularly simple for diagonal Gaussians
• e.g. Spectral masks: Skip missing dimensions
• Hard part is identifying the bad data
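Why the diagonal Gaussian case is simple: the density factors across dimensions, so integrating out a missing dimension just drops its factor. A minimal sketch, assuming the mask of reliable dimensions is already given (the slide notes that finding it is the hard part); the function name and example values are illustrative.

```python
import math

def masked_diag_gauss_loglik(x, mean, var, mask):
    """Log-likelihood of x under a diagonal Gaussian, using only dimensions
    where mask is True. Missing dimensions integrate to 1 and drop out."""
    total = 0.0
    for xi, mi, vi, present in zip(x, mean, var, mask):
        if present:
            total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

# Hypothetical 3-dimensional observation; dimension 1 is corrupted
x = [1.0, 50.0, -1.0]
mean = [0.0, 0.0, 0.0]
var = [1.0, 1.0, 1.0]
full = masked_diag_gauss_loglik(x, mean, var, [True, True, True])
masked = masked_diag_gauss_loglik(x, mean, var, [True, False, True])
```

With the corrupted dimension skipped, the outlier value no longer dominates the score.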
Multistream statistics
• All possible combinations of individual streams
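To make the size of that set concrete, the sketch below enumerates every non-empty subset of a list of streams. The stream names are placeholders, and whether the intended set includes the empty combination is an assumption the slide does not settle.

```python
from itertools import combinations

def stream_subsets(streams):
    """Return every non-empty combination of the given streams."""
    subsets = []
    for size in range(1, len(streams) + 1):
        subsets.extend(combinations(streams, size))
    return subsets

# Hypothetical stream labels
streams = ["band1", "band2", "band3"]
subsets = stream_subsets(streams)
```

N streams yield 2^N - 1 non-empty combinations (7 for the three streams here), so the count grows quickly with the number of streams.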
Multistream statistics (ctd)
• Statistical modeling in both frequency and time: HMM2
Evaluation
• For greatest and most reliable progress, need frequent internal evaluations
• Most importantly, need to define helpful evaluation tasks – to guide the research
• Other considerations beyond the task:
- definition of performance measures
- choice of corpora
- establishment of an evaluation process
Task and corpus, initial plan
• Evaluation tasks – Recognition of words and syllables
• Cross-corpus testing
- training on Hub 5, Macrophone
- testing on OGI numbers for quick turnaround, debugging
• Testing on Hub 5 in due course
• Rescoring SRI decoder output (N-best or lattice)
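N-best rescoring can be sketched generically as follows: each hypothesis keeps its decoder score and gains a weighted score from the new model, then the list is re-sorted. The data layout, the weight of 0.5, and the example scores are assumptions; nothing here reflects the actual SRI decoder output format.

```python
def rescore_nbest(nbest, new_scores, weight=0.5):
    """Re-rank an N-best list by adding a weighted new log score.

    nbest: list of (hypothesis, decoder_log_score) pairs.
    new_scores: dict mapping hypothesis -> log score from the new model.
    """
    rescored = [
        (hyp, base + weight * new_scores[hyp])
        for hyp, base in nbest
    ]
    # Highest combined log score first
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical two-entry N-best list with made-up log scores
nbest = [("nine one one", -10.0), ("nine won one", -9.5)]
new_scores = {"nine one one": -2.0, "nine won one": -6.0}
ranked = rescore_nbest(nbest, new_scores)
```

In this toy case the new model's score reverses the decoder's original ranking. In practice the weight would be tuned on a development set.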
Metrics and diagnostics
• Word and syllable error statistics
• Detection statistics and error distribution across speakers (and other conditions that are deemed to be important)
• Comparison to human performance
• Running scores on dev sets within group, held-out evals at least annually (NA-sayer wants weekly)
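Word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the reference length. A minimal sketch with whitespace tokenization follows; the example strings are made up, and a syllable error rate would follow the same pattern with syllable tokens.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes a non-empty reference; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("two" -> "too") and one deletion ("four")
wer = word_error_rate("one two three four", "one too three")
```

Per-speaker breakdowns of this figure would give the error distribution mentioned in the second bullet.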
Connection to RT evals
• Rescore output of SRI system
• In later years work more closely with RT team to transfer most successful ideas
• Feedback from RT experience (error diagnostics) is also important
Summary
• An alternative view of acoustic processing for ASR for features+models
• Pushing the envelope … aside
• Matching new front end characteristics with appropriate statistical models
• Diagnostic evaluations a key feature
Closing Thought
“When you come to a fork in the road, take it.”
- Yogi Berra