evan macmillan, gridspace // how machines enter the conversation

How Machines Enter The Conversation

Evan Macmillan, CEO and Co-founder

Product & Engineering team

Downloaded the talks from YouTube

250K spoken words

DataDriven talks included in sample set

What past speakers covered

•  Data and data sources! (860 mentions / 1MM words)

–  Hooray, we can “use these new data sources…”

–  Can you imagine “all these different data sources…”

•  Infrastructure for scale (723 mentions / 1MM words)

–  Are you ready for “massive scale?”

–  This is how you “operate infrastructure…”

…And what they didn’t cover

•  Sales and marketing (<90 mentions / 1MM words)

–  These terms were used 1/10th as much as data

–  Nobody was talking about competition

•  Applied machine learning (12.7 mentions / 1MM words)

–  We are “cleverly building machine learning algorithms…”

–  We need “training data to do good machine learning…”
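
The "mentions / 1MM words" figures above are counts normalized by corpus size. A minimal sketch of that arithmetic follows. The helper names `mentions_per_million` and `rate_from_counts` are hypothetical, and the matching rule (case-insensitive, whole-phrase) is an assumption, since the slides do not say how mentions were counted.

```python
import re


def rate_from_counts(hits: int, total_words: int) -> float:
    """Convert a raw mention count into mentions per 1,000,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return hits * 1_000_000 / total_words


def mentions_per_million(text: str, phrase: str) -> float:
    """Count whole-phrase, case-insensitive matches of `phrase` in `text`
    and normalize by the number of words in `text`.

    The tokenization below (letters, digits, apostrophes) is an assumption.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0
    pattern = r"\b" + re.escape(phrase) + r"\b"
    hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return rate_from_counts(hits, len(words))
```

If the 250K-word sample is the whole corpus, a rate of 860 per 1MM words would correspond to about 215 raw mentions, since `rate_from_counts(215, 250_000)` gives 860.0.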

How Machines Enter the Conversation

•  Why voice data is unique

•  Evolution of voice processing

•  Task-oriented interfaces

•  Collaborating with machines

15,942 spoken words per day

161 emailed words per day

Voice data is abundant

Just averages, some folks type and talk way more…

And it comes with lots of labels…

•  Professional transcripts

–  The closed captioning on ‘Kathy Lee Live’

•  Speaker identities

–  These speaker IDs help with contextual dictionaries

•  Business outcomes

–  Really support the recognition task

The rise of voice processing

Abundant voice recordings

Lots of labeled voice data

Enterprise computing capacity

But… hard processing pipelines!

Analyzing corn.jpg vs. corn.wav

           | Satellite photo of corn field | Audio of corn crop report
Output     | 100 corns/pixel               | Sentences or figures
Boundaries | Most corn is in the NE field  | Ambiguous
Signal     | Clouds, optical distortions   | Echo, background noise
Pipeline   | CNN, spreadsheet              | Transcription, NLP, ???

Voice processing recipe:

•  Transcriptions (speech -> words)

•  Natural language processing (words -> meaning)

•  Human Software Interfaces (meaning -> users)
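
The three-step recipe above can be read as a chain of functions, each consuming the previous step's output. The sketch below only illustrates that shape. Every name in it is hypothetical, the "transcription" step is a lookup table standing in for a real speech recognizer, the transcript text is invented for the example, and the "meaning" step is a toy word count rather than real NLP.

```python
from collections import Counter

# Stand-in for an ASR engine: maps an audio file name to a made-up transcript.
# A real system would decode audio here; this table is purely illustrative.
_FAKE_TRANSCRIPTS = {
    "corn.wav": "corn yield is up and corn prices are up",
}


def transcribe(audio_path: str) -> str:
    """Step 1 (speech -> words): return a transcript for the audio file."""
    return _FAKE_TRANSCRIPTS.get(audio_path, "")


def extract_meaning(transcript: str) -> dict:
    """Step 2 (words -> meaning): a toy summary built from word counts."""
    words = transcript.lower().split()
    return {
        "word_count": len(words),
        "top_terms": Counter(words).most_common(3),
    }


def present(meaning: dict) -> str:
    """Step 3 (meaning -> users): format the summary for a person to read."""
    terms = ", ".join(term for term, _ in meaning["top_terms"])
    return f"{meaning['word_count']} words; top terms: {terms}"


def run_pipeline(audio_path: str) -> str:
    """Chain the three steps in the order given by the recipe."""
    return present(extract_meaning(transcribe(audio_path)))
```

Calling `run_pipeline("corn.wav")` runs all three stages. Swapping any stage for a real component keeps the same interface as long as it accepts and returns the same types.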

Loud and clear progress

•  Progress on ASR

–  Bell Labs detects one speaker saying numbers in the 1950s

–  New statistical methods in the 1980s

–  HMM -> DNN-GMM -> End-to-end DNNs (future?)

•  Progress on NLP and HCI

–  Recalibrating the bar for ASR

–  Helping companies adopt and train new systems

Output representations

From Wordlens NLP visualizer

A modern ox is a couple of feet in front of the hay wall. It is cloudy. The ground is shiny grass. The huge hamburger is on the ox. An enormous gold chicken is behind the wall.

Single voice interaction?

…Or many voice interactions?

How Machines Enter The Conversation

Thanks!

[email protected]
