Crowd-sourcing platform for large-scale
speech data collection
João Freitas, António Calado, Daniela Braga, Pedro Silva,
Miguel Sales Dias
Outline
• Motivation
• Crowd-sourcing
• System description
• Quiz Game and Personalized TTS
• Media and user feedback
• Results
• Conclusions and Future work
Motivation
• ASR systems based on statistical models require vast
amounts of speech data
• Corpora are expensive
• Quality issues in existing databases:
– poor recording conditions
– inconsistent sample rates
– missing transcriptions
– etc.
Previous Data Collections
YourSpeech
• What is YourSpeech?
– A platform that aims to collect desktop speech data for any language
at negligible cost.
– Users receive an entertainment-based reward in exchange for their speech.
Crowd-sourcing
• Act of outsourcing tasks to a community (crowd)
• Collaborative model
• An entity publishes a problem; the crowd finds the solution
• Reward
• Task characteristics:
– Hard to automate
– Vast
– Expensive
Crowd-sourcing examples
Quiz Game
Personalized TTS
System description
[Architecture diagram. Labels: Internet; YourSpeech Server; YourSpeech Database; Recording Services; HTTPS Handler; Handle wave files; Handle sessions; TTS Generation Server; TTS Queue; Recording Platform; Recording Application; ActiveX Recording Control]
Personalized TTS (2)
Media and User feedback
• Dissemination and advertisement are essential
• Positive feedback
• People in general liked the initiative
• Media coverage: MSN site, national TV, tech magazine, tech blogs
Results
| | Quiz Game | Personalized TTS | Total |
|---|---|---|---|
| Pure speech (hours) | 3.87 | 21.4 | 25.27 |
| Total audio (hours) | 11.9 | 48 | 59.9 |
| Completed sessions | 473 | 94 | 567 |
| Incomplete sessions | 205 | 223 | 428 |
| Utterances | 18300 | 9463 | 27763 |
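To make the table easier to compare across the two applications, the session and audio figures can be turned into ratios. This is a minimal sketch, not part of the original slides: it assumes that started sessions equal completed plus incomplete sessions, and the names `results`, `completion_rate`, and `speech_share` are illustrative only.

```python
# Figures copied from the results table above.
results = {
    "Quiz Game": {
        "pure_speech_h": 3.87, "total_audio_h": 11.9,
        "completed": 473, "incomplete": 205,
    },
    "Personalized TTS": {
        "pure_speech_h": 21.4, "total_audio_h": 48.0,
        "completed": 94, "incomplete": 223,
    },
}


def completion_rate(row):
    """Share of started sessions that were completed.

    Assumption: started sessions = completed + incomplete.
    """
    return row["completed"] / (row["completed"] + row["incomplete"])


def speech_share(row):
    """Share of the recorded audio that is pure speech."""
    return row["pure_speech_h"] / row["total_audio_h"]


for name, row in results.items():
    print(f"{name}: {completion_rate(row):.1%} of sessions completed, "
          f"{speech_share(row):.1%} of audio is pure speech")
```

Under these assumptions, the Quiz Game has the higher share of completed sessions, while Personalized TTS yields a larger share of pure speech per hour of recorded audio.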
Results (2)
| | Quiz Game | Personalized TTS | Total |
|---|---|---|---|
| Words | 2010 | 40119 | 42129 |
| Insertions | 79 | 46 | 125 |
| Deletions | 92 | 103 | 195 |
| Substitutions | 36 | 47 | 83 |
| WER | 10.3% | 0.05% | 1% |
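The WER row can be checked against the error counts using the standard definition WER = (insertions + deletions + substitutions) / reference words. This is a minimal sketch, assuming the "Words" row is the number of reference words (the slides do not state this explicitly); the function and variable names are illustrative.

```python
def word_error_rate(insertions, deletions, substitutions, reference_words):
    """Return WER = (I + D + S) / N as a fraction."""
    if reference_words <= 0:
        raise ValueError("reference_words must be positive")
    return (insertions + deletions + substitutions) / reference_words


# (words, insertions, deletions, substitutions) copied from the table above.
counts = {
    "Quiz Game": (2010, 79, 92, 36),
    "Personalized TTS": (40119, 46, 103, 47),
    "Total": (42129, 125, 195, 83),
}

# WER per column, expressed as a percentage.
wer_percent = {
    name: 100 * word_error_rate(ins, dels, subs, words)
    for name, (words, ins, dels, subs) in counts.items()
}

for name, value in wer_percent.items():
    print(f"{name}: {value:.2f}%")
```

Under that assumption, the Quiz Game and Total columns come out close to the reported 10.3% and 1%. The Personalized TTS counts give roughly 0.49% rather than the 0.05% shown, so that cell may contain a typo or may use a different denominator.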
Ongoing campaigns
• “Doar a voz”:
http://www.doaravoz.com/
• YourSpeech deployment in 10
other languages
Future work
• Platform expansion to other languages
• Transcribe and annotate all the collected corpora
• Retrain existing acoustic models with the collected data
• Measure any resulting change in ASR accuracy
• Increase the number of questions available in the quiz
• Improve UX
• Create content-specific games
– Focus on certain groups of words (e.g. city names, numbers,
etc.) in order to have acoustic models specialized in specific
grammar types
Conclusions
• Crowd-sourcing can be used to expand speech
resources at negligible cost
• Motivation (reward) and good dissemination are
essential
• Media coverage and users create a “snowball” effect
• Games can be used to attract users
• Personalized TTS acts as a “quid pro quo” service for
speech technology
• Positive results (1% total WER)
Thank you very much for your attention!
Crowd-sourcing platform for large-scale
speech data collection
www.microsoft.com/portugal/mldc
Questions?
FALA 2010
12th November 2010, Vigo, Spain
PT-pt YourSpeech: www.pt.yourspeech.net