evan macmillan, gridspace // how machines enter the conversation
TRANSCRIPT
How Machines Enter The Conversation
Evan Macmillan CEO and Co-founder
Product & Engineering team
Downloaded the talks from YouTube
250K spoken words
DataDriven talks included in sample set
What past speakers covered
• Data and data sources! (860 mentions / 1MM words)
– Hooray, we can “use these new data sources…”
– Can you imagine “all these different data sources…”
• Infrastructure for scale (723 mentions / 1MM words)
– Are you ready for “massive scale?”
– This is how you “operate infrastructure…”
…And what they didn’t cover
• Sales and marketing (<90 mentions / 1MM words)
– These terms were used 1/10th as much as data
– Nobody was talking about competition
• Applied machine learning (12.7 mentions / 1MM words)
– We are “cleverly building machine learning algorithms…”
– We need “training data to do good machine learning…”
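The "mentions / 1MM words" figures above are normalized rates: raw counts scaled to a corpus of one million words, so talks of different lengths can be compared. The slides do not say how terms were grouped or matched, so the following is only a minimal sketch of one way to compute such a rate. The function name `mentions_per_million`, the tokenizer, and the toy transcript are all assumptions made for illustration.

```python
import re


def mentions_per_million(text: str, phrase: str) -> float:
    """Occurrences of `phrase` per 1,000,000 words of `text` (case-insensitive)."""
    # Simple tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's tokens match the transcript's tokens.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits / len(words) * 1_000_000


# Toy transcript, made up for illustration (not from the sample set).
sample = "We can use these new data sources. Data matters at scale."
rate = mentions_per_million(sample, "data")
```

On a real corpus, `text` would be the concatenated talk transcripts, and related terms (for example "data" and "data sources") would need an explicit grouping rule before their rates could be compared.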
How Machines Enter the Conversation
• Why voice data is unique
• Evolution of voice processing
• Task-oriented interfaces
• Collaborating with machines
15,942 spoken words per day
161 emailed words per day
Voice data is abundant
Just averages; some folks type and talk way more…
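To put the two daily averages side by side, here is a quick worked calculation using only the figures quoted above; the variable names are illustrative.

```python
spoken_per_day = 15_942   # spoken words per day (from the slide)
emailed_per_day = 161     # emailed words per day (from the slide)

# How many spoken words there are for each emailed word.
ratio = spoken_per_day / emailed_per_day

# Share of the combined daily word count that is spoken.
spoken_share = spoken_per_day / (spoken_per_day + emailed_per_day)

summary = f"{ratio:.0f}x, {spoken_share:.1%}"
print(summary)  # prints "99x, 99.0%"
```

By these averages, roughly 99 words are spoken for every word emailed.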
And it comes with lots of labels…
The closed captioning on ‘Kathy Lee Live’
Professional transcripts
These speaker IDs help with contextual dictionaries
Speaker identities
Really support the recognition task
Business outcomes
The rise of voice processing
Abundant voice recordings
Lots of labeled voice data
Enterprise computing capacity
But… hard processing pipelines!
           | Satellite photo of corn field | Audio of corn crop report
Output     | 100 corns/pixel               | Sentences or figures
Boundaries | Most corn is in the NE field  | Ambiguous
Signal     | Clouds, optical distortions   | Echo, background noise
Pipeline   | CNN, spreadsheet              | Transcription, NLP, ???
Analyzing corn.jpg vs. corn.wav
Voice processing recipe:
• Transcriptions (speech -> words)
• Natural language processing (words -> meaning)
• Human Software Interfaces (meaning -> users)
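The three-step recipe above can be read as a chain of functions, each consuming the previous stage's output. The sketch below only illustrates that shape. Every stage is a hypothetical stand-in (there is no real speech recognizer here), and the names `run_pipeline`, `fake_transcribe`, `keyword_understand`, `render`, and `Meaning` are assumptions, not anything from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Meaning:
    intent: str
    keywords: List[str]


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    understand: Callable[[str], Meaning],
    present: Callable[[Meaning], str],
) -> str:
    words = transcribe(audio)      # speech -> words
    meaning = understand(words)    # words -> meaning
    return present(meaning)        # meaning -> users


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stub: a real system would run an ASR model here.
    # This one just pretends the "audio" bytes are already UTF-8 text.
    return audio.decode("utf-8")


def keyword_understand(words: str) -> Meaning:
    # Hypothetical stub for the NLP step: a one-rule intent plus long words as keywords.
    tokens = words.lower().split()
    intent = "crop_report" if "corn" in tokens else "unknown"
    return Meaning(intent=intent, keywords=[t for t in tokens if len(t) > 4])


def render(meaning: Meaning) -> str:
    # Hypothetical stub for the interface step: a one-line summary for a user.
    return f"[{meaning.intent}] " + ", ".join(meaning.keywords)


summary = run_pipeline(
    b"most corn is in the northeast field",
    fake_transcribe,
    keyword_understand,
    render,
)
print(summary)  # prints "[crop_report] northeast, field"
```

Passing the stages in as callables keeps each step swappable, which fits the recipe's framing of transcription, NLP, and the interface as separate problems.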
Loud and clear progress
• Progress on ASR
– Bell Labs detects one speaker saying numbers in the 1930s
– New statistical methods in the 1980s
– HMM -> DNN-GMM -> End-to-end DNNs (future?)
• Progress on NLP and HCI
– Recalibrating the bar for ASR
– Helping companies adopt and train new systems
Output representations
From Wordlens NLP visualizer
A modern ox is a couple of feet in front of the hay wall. It is cloudy. The ground is shiny grass. The huge hamburger is on the ox. An enormous gold chicken is behind the wall.
Single voice interaction?
…Or many voice interactions?
Thanks!