A Framework for Bangla Text to Speech Synthesis
DESCRIPTION
My conference presentation slides for my paper at the 16th ICCIT conference, 2013.
TRANSCRIPT
![Page 1: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/1.jpg)
A Framework for Bangla Text to Speech Synthesis
Authors
K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi
Presented By
Sanjoy Dutta
Department of Computer Science & Engineering
Khulna University of Engineering and Technology, Khulna, Bangladesh.
![Page 2: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/2.jpg)
Contents
• Problem Statement
• Factors for Speech Synthesis in Bangla
• Proposed Framework
  • Rules and Structure Development
  • Syllable Parser Development
• Audio File Selection and Normalization
• Experimental Analysis & Results
• Conclusion
![Page 3: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/3.jpg)
Problem Statement
• Develop a framework for Bangla Text to Speech Synthesis.
![Page 4: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/4.jpg)
![Page 5: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/5.jpg)
Factors for Speech Synthesis in Bangla
• Sequential flow of diphones
A diphone is a pair of adjacent phonemes in which the transition between the two is modelled, usually from the middle of the first phoneme to the middle of the second.
A phoneme is a sound, or a group of different sounds, perceived to have the same function by speakers of the language or dialect in question. For example, in English the /k/ phoneme occurs in both "skill" and "school", although it is spelled differently.
• Position vs. Pronunciation
Consonants and vowels occur in three kinds of position:
Consonant Vowel (CV)
Vowel Consonant (VC)
Vowel Consonant Vowel (VCV)
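The two factors above can be illustrated with a minimal sketch. It assumes a romanized phoneme list and a hypothetical vowel set, neither of which comes from the paper; the function names are illustrative only. It shows how a phoneme sequence yields a sequential flow of diphones, and how the same sequence can be labelled with a CV, VC, or VCV pattern.

```python
# Hypothetical romanized vowel inventory, used only for illustration.
VOWELS = {"a", "i", "u", "e", "o"}


def diphones(phonemes):
    """Pair each phoneme with its successor, preserving order."""
    return list(zip(phonemes, phonemes[1:]))


def position_pattern(phonemes):
    """Label every phoneme as C (consonant) or V (vowel)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


print(diphones(["k", "a", "l"]))
print(position_pattern(["a", "k", "a"]))
```

A real system would work on Bangla graphemes and their pronunciations rather than on romanized symbols.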
![Page 6: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/6.jpg)
![Page 7: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/7.jpg)
Proposed Framework Structure and Rules
• Text Normalization:
Transforming text into a single standard form.
Needed in text-to-speech conversion to handle numbers, dates, acronyms, and abbreviations.
Text Normalization for Position vs. Pronunciation.
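As a rough illustration of the idea, the sketch below expands abbreviations and digits into speakable words. The lookup tables, the romanized digit words, and the digit-by-digit reading are all assumptions made for this example; the slides do not show the framework's actual normalization rules, and a year would normally be read as a number rather than digit by digit.

```python
# Hypothetical lookup tables with romanized placeholder words.
DIGIT_WORDS = ["shunno", "ek", "dui", "tin", "char",
               "panch", "chhoy", "sat", "at", "noy"]
ABBREVIATIONS = {"Dr.": "Doctor"}


def normalize(text):
    """Rewrite abbreviations and digit strings as plain words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif token.isdecimal():
            # int() also accepts Bangla digits such as U+09E8.
            out.extend(DIGIT_WORDS[int(ch)] for ch in token)
        else:
            out.append(token)
    return " ".join(out)


print(normalize("Dr. 2013"))
```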
![Page 8: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/8.jpg)
Normalization rules for ‘ ’
![Page 9: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/9.jpg)
Normalization rules for ‘ - - -’
![Page 10: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/10.jpg)
Syllable Parser Development
![Page 11: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/11.jpg)
Syllable Parser In Action
![Page 12: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/12.jpg)
![Page 13: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/13.jpg)
Audio File Selection and Normalization
Bangla has 39 consonants and 11 vowels in total.
After reduction:
28 independent consonants
8 vowels (the vowel ‘ ’ is the exception)
![Page 14: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/14.jpg)
Audio File Selection and Normalization
Finally, 224 (28*8) audio files for the syllables.
28 consonants against 5 vowels generate 140 (28*5) diphones.
In summary, 401 audio files need to be created (9 vowels, 28 consonants, 224 syllables and 140 diphones).
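The counts on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones stated above.

```python
consonants = 28        # independent consonants after reduction
syllable_vowels = 8    # vowels combined with each consonant
diphone_vowels = 5     # vowels used to build diphones
vowel_files = 9        # stand-alone vowel recordings

syllables = consonants * syllable_vowels   # 28 * 8
diphones = consonants * diphone_vowels     # 28 * 5
total = vowel_files + consonants + syllables + diphones

print(syllables, diphones, total)
```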
![Page 15: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/15.jpg)
![Page 16: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/16.jpg)
Experimental Analysis and Results
Strategy of Analysis:
Sample Input Test: Various News Articles from News Portals
Listener Selection: Anonymous Persons Chosen Randomly
Accuracy Analysis:
$$\text{Accuracy} = \frac{\text{Words listeners were able to hear clearly on 1st attempt} \times 100}{\text{Total no. of words in every sample}}$$
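The accuracy measure can be expressed as a small helper. This is a sketch of the formula only; the function name and the sample figures in the usage line are hypothetical and are not results from the paper.

```python
def accuracy(clear_words, total_words):
    """Percentage of words heard clearly on the first attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return clear_words * 100 / total_words


# Hypothetical sample: 45 of 50 words heard clearly.
print(accuracy(45, 50))
```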
![Page 17: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/17.jpg)
Experiment Result
Listening Factors:
• Duration Synchronization and Merging
• Numerical values such as years
Constraints in Sample 1:
, , ,
, , ,
Constraints in Sample 2:
, , , , ,
,
![Page 18: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/18.jpg)
Limitations and Future Works
Detect noun and adjective words, namely
( ) noun and
( ) adjective.
Both words should follow rule 3(a),
but they do not, and their pronunciations differ.
![Page 19: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/19.jpg)
CONCLUSION
We believe the proposed framework can be useful for Bangla TTS development, handling Bangla words with a minimal number of audio files.
![Page 20: A framework for bangla text to speech synthesis](https://reader033.vdocument.in/reader033/viewer/2022052900/5562c3f1d8b42a595e8b52ed/html5/thumbnails/20.jpg)
Thank You !!!