Use Amazon Polly to Create Apps that Talk - April 2017 AWS Online Tech Talks
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rafal Kuklinski, Amazon Text-to-Speech
Josiah Jordan, Steve Suhy, Amazon Rapids
04/10/2017
Use Amazon Polly to Create
Apps that Talk
What to Expect from the Session
Amazon Polly
• What is Amazon Polly?
• Polly out of the box
• How to get the most out of Polly
• Polly use cases
• Polly levels of complexity
Amazon Rapids
• What is Amazon Rapids?
• Integrating with Polly
• Best Practices
• Polly/Rapids in Action
• Lessons learned
Q & A
What is Polly?
• A service that converts text into lifelike speech
• 47 voices, 24 languages
• You can store, replay and distribute generated speech
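As a rough illustration of the "convert, then store and replay" idea, here is a minimal sketch that assumes the boto3 SDK is installed and AWS credentials are already configured. The helper names (`build_request`, `synthesize_to_file`) are hypothetical; only `synthesize_speech` and its `Text`, `VoiceId` and `OutputFormat` parameters come from the Polly API.

```python
def build_request(text, voice_id="Joanna", output_format="mp3"):
    """Return keyword arguments for Polly's SynthesizeSpeech operation."""
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text, path, voice_id="Joanna"):
    """Synthesize `text` and store the audio at `path` so it can be replayed."""
    # Imported lazily so build_request() can be used without the AWS SDK.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, voice_id))
    # AudioStream is a streaming body; read it fully and write it to disk.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path
```

Calling `synthesize_to_file("Hello from Polly.", "hello.mp3")` would write an MP3 that can be distributed or replayed without another request to the service.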
Polly out of the box
• Natural sounding speech: A subjective measure of how close TTS output is to human speech.
• Accurate text processing: The system interprets common text formats such as abbreviations, numerical sequences, homographs, etc.
  “1 PT 8 OZ (24FL OZ) 710 mL.”
  “St. Mary’s Church is on 226 St. Mary’s St.”
• Highly intelligible: A measure of how comprehensible speech is.
  “Peter Piper picked a peck of pickled peppers.”
How to get the most out of Polly
• March 27th Webinar
• Punctuation example (Commas/periods example)
• Lexicon example (First name example)
• SSML Tags (Speech rate example)
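The speech rate and lexicon bullets above can be sketched as follows. This is an illustration, not the examples shown in the March 27th webinar: the function names are hypothetical, and the sample name and alias in the usage note are made up. The lexicon follows the W3C Pronunciation Lexicon Specification (PLS) format that Polly accepts.

```python
from xml.sax.saxutils import escape, quoteattr

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def ssml_with_rate(text, rate="slow"):
    """Wrap plain text in SSML that changes the speech rate."""
    return "<speak><prosody rate={}>{}</prosody></speak>".format(
        quoteattr(rate), escape(text)
    )


def alias_lexicon(grapheme, alias, lang="en-US"):
    """Build a one-entry PLS lexicon that reads `grapheme` as `alias`."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0" xmlns="{ns}" alphabet="ipa" xml:lang="{lang}">\n'
        "  <lexeme><grapheme>{g}</grapheme><alias>{a}</alias></lexeme>\n"
        "</lexicon>"
    ).format(ns=PLS_NAMESPACE, lang=lang, g=escape(grapheme), a=escape(alias))
```

For example, `ssml_with_rate("Welcome back.", "slow")` yields SSML to send with `TextType="ssml"`, and `alias_lexicon("Kaja", "Kaya")` yields lexicon content that could be uploaded with boto3's `put_lexicon` before being referenced in a synthesis request.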
Polly Use Cases
• Contact Center
• Training materials
• Education/Elearning
• Content Creation
• Gaming/Entertainment
• Internet of Things
Polly Levels of Complexity
Feature | Simple | Complex
Languages | One (e.g. US English) | Many (US English, Spanish…)
Voices | One (e.g. Joanna) | Many (Joanna, Salli, Miguel…)
Lexicons | None | One+ (e.g. medical terms)
SSML tags | None (out-of-the-box) | Many (i.e. speech optimization)
Automation | No (Console - small volume) | Yes (API - speech at scale)
Audio Storage | No (Regenerate speech) | Yes (Cache and reuse speech)
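The "Cache and reuse speech" option in the Audio Storage row can be sketched like this. It is one possible approach under stated assumptions: audio is cached on the local file system, the cache key is a hash of voice, format and text, and `synthesize` is a hypothetical callable that returns audio bytes (for instance a thin wrapper around the Polly API).

```python
import hashlib
import os


def cache_path(cache_dir, text, voice_id, output_format="mp3"):
    """Derive a stable file name from everything that affects the audio."""
    key_material = "\n".join([voice_id, output_format, text]).encode("utf-8")
    digest = hashlib.sha256(key_material).hexdigest()
    return os.path.join(cache_dir, digest + "." + output_format)


def get_or_create(cache_dir, text, voice_id, synthesize):
    """Return the cached audio path, calling `synthesize` only on a cache miss.

    `synthesize(text, voice_id)` is assumed to return the audio as bytes.
    """
    path = cache_path(cache_dir, text, voice_id)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as audio_file:
            audio_file.write(synthesize(text, voice_id))
    return path
```

Hashing the voice together with the text means the same sentence spoken by Joanna and by Salli is stored as two separate files.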
Introducing Amazon Rapids
Feature | Simple | Complex
Languages | One (e.g. US English) | Many (US English, UK English…)
Voices | One (e.g. Joanna) | Many (Joanna, Salli, Amy…)
Lexicons | None | One+ (e.g. medical terms)
SSML tags | None (out-of-the-box) | Many (i.e. speech optimization)
Automation | No (Console - small volume) | Yes (API - speech at scale)
Audio Storage | No (Regenerate speech) | Yes (Cache and reuse speech)
What is Amazon Rapids?
• Subscription-based reading app for kids 12 and under
• Original short stories, perfect for kids on the go
Amazon Rapids Intro Video
How Rapids uses Polly
Read Along
• Introduced to help younger readers
• Complaints: “Sounds too robotic”, “My kids don’t like the computer voice”
• Our needs:
  • Scalable
  • Platform agnostic
  • 2-4 speakers per story, 500+ stories (and counting)
  • Entertaining
Implementing a UI
1. Upload manuscript to admin tool
   Automatic conversion process
2. Add art for story
3. Fill out story metadata
4. Assign voices to characters
5. Generate speech files
6. Proof-listen to story
7. Tweaks and customizations
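The "assign voices to characters" and "generate speech files" steps could be sketched as below. This is not the Rapids admin tool's code: the manuscript representation (a list of `(character, text)` pairs), the file naming scheme and the function name are all assumptions made for illustration.

```python
def plan_speech_jobs(story_id, cast, lines):
    """Turn a parsed manuscript into one synthesis job per spoken line.

    cast  -- dict mapping character name to a Polly voice ID
    lines -- list of (character, text) pairs in reading order
    """
    jobs = []
    for index, (character, text) in enumerate(lines, start=1):
        if character not in cast:
            # Fail early so an editor can assign a voice before generation.
            raise KeyError("no voice assigned to character: " + character)
        jobs.append(
            {
                "file": "{}_{:04d}.mp3".format(story_id, index),
                "request": {
                    "Text": text,
                    "VoiceId": cast[character],
                    "OutputFormat": "mp3",
                },
            }
        )
    return jobs
```

With `cast = {"Mom": "Kendra", "Boy": "Justin"}`, each job's `request` dictionary can be passed to a Polly `synthesize_speech` call, and regenerating a single tweaked line only requires rerunning that one job.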
Implementing a UI
• Existing tool for managing stories
• Process manuscripts, add images, assign metadata
• Goals
• Incorporate voice integration without breaking the flow
• Eliminate developer involvement
• Automate as much as possible
• Enable rapid iteration on generated speech
Empowering Content Creators
Ask yourself…
1. Who: Who is your customer?
2. What: What does your customer expect?
3. Where: Gauge Emotional vs Informational presentation
4. What: What does your target output sound like?
5. How: Develop an integration percentage breakdown
Polly/Rapids Relationship
Our goals…
1. Customer: Readers ages 5-12 with access to a mobile device
2. Customer Expectations: Reading is both educational and entertaining
3. Presentation: 70% Emotional, 30% Informational
4. Audio Example:
5. Development: 90% Polly out of the box, 10% customizations
Polly/Rapids Best Practices
• Identify ‘ideal cast’ for roles
• Written presentation ≠ verbal presentation
• Customizations
  • SSML
  • Punctuation
  • Variations in pronunciation
  • Combination phrases (“About you” vs “Aboutju” vs “Aboutchoo”)
  • Phonetic creativity
Polly/Rapids In Action
• “PreCoffee” Example
• Default Justin voice
• <speak>But I thought I wouldn't like it at first, remember? You made me try it, and now I love it.</speak>
Voice Cast:
• Kendra (mom), Justin (Boy)
“Coffee!!!” by Carl Bowen
Polly/Rapids In Action
• “On Coffee” Example
• Faster rate, higher pitch
• <speak><prosody rate="130%"><prosody pitch="20%">Am I talking loud? I can't even tell! I should get dressed! I'll be right back! Bye Mom!</prosody></prosody></speak>
Voice Cast:
• Kendra (mom), Justin (Boy)
“Coffee!!!” by Carl Bowen
Polly/Rapids In Action
• “Coffee Crash” Example
• Slower rate, lower pitch
• <speak><prosody rate="-40%"><prosody pitch="-20%">Oh man.</prosody></prosody> There was so much I wanted to do today.</speak>
Voice Cast:
• Kendra (mom), Justin (Boy)
“Coffee!!!” by Carl Bowen
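The three examples differ only in their prosody settings, so the markup could be generated by a small helper. The sketch below is an assumption about how that might look, not the Rapids implementation. The names `prosody` and `speak` are hypothetical, and the helper puts rate and pitch on a single `<prosody>` element rather than nesting two as the slides do. It checks that the result is well-formed XML but does not check that Polly accepts the attribute values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate=None, pitch=None):
    """Escape `text` and wrap it in <prosody> if a rate or pitch is given."""
    attrs = ""
    if rate is not None:
        attrs += " rate=" + quoteattr(rate)
    if pitch is not None:
        attrs += " pitch=" + quoteattr(pitch)
    body = escape(text)
    return "<prosody{}>{}</prosody>".format(attrs, body) if attrs else body


def speak(*fragments):
    """Join SSML fragments into one <speak> document and verify it parses."""
    ssml = "<speak>" + " ".join(fragments) + "</speak>"
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    return ssml
```

The “Coffee Crash” line, with the values copied from the slide, would then be written as `speak(prosody("Oh man.", rate="-40%", pitch="-20%"), prosody("There was so much I wanted to do today."))`.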
Lessons Learned
• Integrating Polly and building a UI to support it was easy
• With voice modifications we were able to support multi-character conversations
• Proof listening worked wonders for us with static content
• We’d still be in free tier with our usage, even with hundreds of stories.
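A back-of-the-envelope check of the free tier point might look like the sketch below. It rests on two assumptions that should be verified against current Polly pricing: the free tier allows about 5 million characters per month, and SSML tags are not counted as billed characters. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def spoken_characters(ssml):
    """Count the characters of text inside an SSML document, ignoring tags."""
    return len("".join(ET.fromstring(ssml).itertext()))


def fits_in_allowance(ssml_documents, monthly_allowance=5_000_000):
    """Return True if the total spoken characters stay within the allowance.

    The default allowance is an assumed free-tier figure, not a guarantee.
    """
    total = sum(spoken_characters(doc) for doc in ssml_documents)
    return total <= monthly_allowance
```

Because generated audio can be stored and reused, only new or regenerated lines count toward the monthly total.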
• Contact us with any questions about this webinar or Polly in general
• Contact us about Amazon Rapids
https://rapids.amazon.com/
• Introducing Amazon Polly at re:Invent 2016
https://www.youtube.com/watch?v=zjMqimHis3U&t=2s
• Other Polly webinars:
https://www.youtube.com/user/AWSwebinars/search?query=polly
• Amazon Polly/Rapids video:
https://www.youtube.com/watch?v=Q8lGMQDR_zI
Next Amazon Polly Webinar (June 19th): Title – Coming soon
Thank You!