![Page 1: Hello GPU: High-Quality, Real-Time Speech Recognition on ... · Hello GPU: High-Quality, Real-Time Speech Recognition on Embedded GPUs Kshitij Gupta UC Davis [/shi/ /tij/]](https://reader031.vdocument.in/reader031/viewer/2022041415/5e1b26850d85e9597406f816/html5/thumbnails/1.jpg)
Hello GPU: High-Quality, Real-Time Speech
Recognition on Embedded GPUs
Kshitij Gupta
UC Davis
[/shi/ /tij/]
www.KshitijGupta.com
Three Trends
Trend #1:
Technology
Transistor Personal
Computing Internet Search
Wireless Comm.
Embedded Consumer
Elec.
Smart phones
•Internet (Web 2.0)
•Music
•Video
•Games
•Cell phones
•GPS navigation
Embedded/Mobile & Entertainment
Mobile + Convergence
Trend #2:
User Interface
Switches Keypads Mice Scroll Wheel
Touch Gestures Speech
•Eyes free
•Hands free •Eyes free
•Eyes required
•Hands required
10 years
User Interface has proven to be a key enabler
?
Trend #3(a):
Processor Architecture (Desktop)
CPU
•Scalar •SMT •Multi-core
GPU
•Fixed-function •Semi programmable •GPGPU
•Parallel (many-core)
•App independent (prog.)
CPUs doing graphics and GPUs doing GPP!
•CPU to run Aero-class graphics on Windows
•GPU evolving from “kernels” to “applications”
Trend #3(b):
Processor Architecture (Embedded)
CPU
•Scalar •SMT •Multi-core
GPU
•Fixed-function •Semi programmable •GPGPU
EPU
•Atom •Tegra •OMAP
•CPU •DSP •Si •GPU
•Parallel (many-core)
•App independent (prog.)
•Graphics/Visual Computing Platforms
Looking Ahead…
Mobile
+
UI
+
Parallel, programmable
Introduction
Motivation
Overview & Characterization
Design Goals & Principles
Acoustic Modeling Lookahead
Future Directions
Outline
Why so hard?
The Holy Grail…
accurate
real-time
continuous
naturally spoken
noisy conditions
large set of words
speaker-independent
real-time!
Hard limit: Real-time response
Soft(er) limit: Accuracy!
A few examples of 'continuous' speech
thisnewdisplaywillrecognizespeech
This new display will recognize speech
This nudist play will wreck a nice beach
greytape
Grey tape
Great ape
hesgone
He's gone.
He's gone?
Let's not go, ummm, ok, errr, fine, let's do this!
Was that a 'yes' or a 'no'?
What's the context here?
Variability, variability, v-a-r-i-a-b-i-l-i-t-y!
Dialect: southern, western, penn.
Gender: male, female
Age: child, 10-19, 20-29, 30-49, 40-69, 70+
Words: Cheetah, Jaguar, Panther, Tiger, Leopard
Phonemes: /AE/, /ER/, /HH/, /NG/, /SH/, /ZH/
Good speech models are BIG!
Automatic Speech Recognition:
A high-level view
[Figure: Training → Knowledge Base (Acoustics, Semantic); Speech → Decoder → •Text •Action]
$$W_{best} = \arg\max_{W} \; P(O/W) \cdot P(W)$$
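The decoding rule on this slide, picking the word sequence W that maximizes P(O/W) * P(W), can be illustrated with a toy example. This is a minimal sketch and not the decoder from the talk: `best_hypothesis` is a hypothetical helper, and every probability below is a made-up value chosen only to show how the language prior can settle an acoustic near-tie. Scores are added in the log domain, a common way to avoid numerical underflow.

```python
import math

def best_hypothesis(acoustic, language):
    """Return the word sequence W that maximizes P(O/W) * P(W).

    acoustic: dict mapping W -> P(O/W)  (illustrative values)
    language: dict mapping W -> P(W)    (illustrative values)
    """
    def log_score(w):
        # log P(O/W) + log P(W) has the same argmax as the product.
        return math.log(acoustic[w]) + math.log(language[w])
    return max(acoustic, key=log_score)

# Made-up numbers: both transcriptions sound almost alike,
# so the language model prior decides between them.
acoustic = {
    "this new display will recognize speech": 0.40,
    "this nudist play will wreck a nice beach": 0.45,
}
language = {
    "this new display will recognize speech": 1e-6,
    "this nudist play will wreck a nice beach": 1e-9,
}
print(best_hypothesis(acoustic, language))
```

With these numbers the acoustically slightly worse hypothesis wins because its prior is three orders of magnitude larger.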
ASR:
Knowledge-Base View
Language: One, Two, Three, Five, Seven between s and /s
P(1/s), P(2/1,s), P(3/1,s), P(3/2,1), P(5/3,1), P(7/3,1), P(s/3,2), P(s/5,3), P(s/7,3)
Words: W AH N
Phonemes: S0 S1 S2
Sizes: Language > 1M, Words > 100k, Phonemes > 50k, Acoustics 4k-8k (inner-most loop)
ASR:
Knowledge-Base View (GMM)
Acoustics (4k-8k) * Mixtures (8-128) * Dimensions (39) * Equation (2) = 2M – 80M
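The 2M – 80M range on this slide is the product of the four factors listed. The arithmetic can be checked with a short sketch; `gmm_parameter_count` is a hypothetical helper, and reading the factor of 2 as two parameters per dimension (for example a mean and a variance) is an interpretation, not something the slide states.

```python
def gmm_parameter_count(num_gmms, mixtures, dimensions=39, params_per_dim=2):
    """Multiply the four factors shown on the slide."""
    return num_gmms * mixtures * dimensions * params_per_dim

low = gmm_parameter_count(4_000, 8)      # smallest model on the slide
high = gmm_parameter_count(8_000, 128)   # largest model on the slide
print(low, high)  # 2496000 79872000
```

Both ends round to the figures quoted on the slide: roughly 2M and 80M parameters.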
ASR:
Block Diagram View
[Figure: pipeline of Input, Feature Extraction, Acoustic Modeling (GMM), Phonetic Modeling (HMM), Word Modeling (Lexicon), Language Modeling (Syn/Sem), Application. The Backend stages draw on the Knowledge Base.]
Annotations: SIMD-friendly, Thread-friendly; 16 kHz, 100 Hz
Knowledge Base illustrations:
•HMM states S0 S1 S2
•Lexicon tree starting at K AE for CALEDONIA, CALIFORNIA, CAMDEN, CAMDEN'S, CAMPBELL, CAMPBELL'S, CAN, CANADA
•Language-model word graph w0, w1, w2, ... wN with successors such as w0,0 and w0,0,1
ASR:
State-of-the-art, Today
Offload processing to the 'cloud'
Drawbacks: Latency, Accuracy, Power
NSR/DSR are the only solution today for supporting ASR on embedded devices
[Figure: the block diagram from the previous slide, annotated with NSR, DSR, and ESR (The Challenge)]
Characterization of ASR algorithms
| | Feature Extraction (Frontend) | Acoustic Modeling (Backend) | Language Modeling (Backend) |
|---|---|---|---|
| Core kernels | FFT, DCT | GMM computation & HMM state traversal | Layered graph search |
| Memory: Footprint | Very small ++ | Medium + | Very large - - |
| Memory: Bandwidth | Low ++ | Very high - - | Medium + |
| Memory: Access pattern | N/A | Spatial locality (for mini-datasets) + | Temporal locality (non-sequential) + |
| Compute | Very low ++ | Very High - - | Low ++ |
| Data-structure | N/A | Regular: Dense + | H. irregular: Sparse - - |
| Time (System) | < 1% | 50-90% | 10-50% |

Acoustic Modeling: Bottleneck, Focus of this talk
Application Domains for ASR
| | Server | Desktop | Embedded* |
|---|---|---|---|
| | Off-line & On-line | On-line & Off-line | On-line |
| Real-Time constraint | N/A & Soft | Soft | Hard |
| Application domain | Transcription | Desktop control | Search |
| | Data mining | Dictation | Dictation |
| | Customer support | Game consoles | SMS/Chatting |
| | Distributed Speech Recognition | Home automation (multi-stream) | Command & Control |
| | | Data mining | Automotive |
| Hardware: # | 10s-1,000s + CPU/GPU | CPU + GPU | CPU + GPU + acc. Si |
| Hardware: Compute | PFLOP | TFLOP | GFLOP |
| Hardware: Memory | ~ (TB/PB)/s | ~ GB/s | ~ (GB/MB)/s |
| Vocabulary size | 1M + | ~ 50k | 10+ |

*anything not plugged into the power socket
The Challenge
[Table: same Server / Desktop / Embedded* comparison as on the previous slide, with Embedded Compute shown as GFLOP/MFLOP]
"Desktop-class ASR on Embedded devices"
*anything not plugged into the power socket
The Challenge:
Desktop v/s Embedded System Architectures
[Figure: Desktop System Architecture: CPU, North Bridge, GPU, Mem. Embedded System Architecture: CPU, GPU, DSP, CTRL, Mem (UMA). Legend: Processor, Cache, Memory.]
Vastly different architectures & constraints: Memory & Compute resources are limited

| | Desktop (480GTX) | Embedded (9400M) |
|---|---|---|
| # of SMs | 16 x 32 | 2 x 8 |
| Compute | TFLOP | GFLOP |
| Memory | ~ 100's of GB/s | < 10 GB/s |
| | Discrete | Integrated |
Design Goals
Target: GeForce 9400M
# of SMs: 2
Shared memory: 16kB/SM
Register file: 8k/SM
Compute Capability 1.1
Stringent memory coalescing constraints
OpenCL-capable
Speed: Faster than real-time
Accuracy: Any optimizations should impact accuracy 'marginally'
HOW? Re-visit traditional ASR pipeline
Extract intra-module parallelism!
Design Principles:
CPU v/s GPU (1)
#1
CPU: Dynamism is fine; remove every state that is not needed
GPU: Regular structure, consistency important; extra work OK
Compute is cheap, main memory accesses are expensive
Static; memory allocation/de-allocation user-managed
#2
CPU: Branches are fine; HW support
GPU: Branches may lead to serialization
Carefully organize your data-structures
Avoid branches and reduce access to branch-able code
Design Principles:
CPU v/s GPU (2)
#3
CPU: Repetitive computation over time is OK
GPU: Repetitive computation staggered over time has a huge cost
Small/non-existent on-chip memories
Increase 'arithmetic intensity' of computations
#4
CPU: Multiple optimization layers are fine
GPU: Hand-pick few optimizations that map well to the arch.
Task List: Brute Force
Feed-forward Loop
Feature Extraction
Compute Acoustics
Compute Phonemes
Compute Words
Compute Language
1 frame
Score
Hypothesized words
Speech input
Bottleneck
Task List: Prune, prune, p-r-u-n-e
Feedback Loop
Feature Extraction
Activate Words
Activate Phonemes
Activate Acoustics
Compute Acoustics
Compute Phonemes
Compute Words
Compute Language
Generate active lists
Score
Initialization
Hypothesized words
Speech input
1 frame
Active Acoustics
[Figure: GMM IDs vs. Time (Frame)]
Memory bandwidth intensive
Active Acoustics:
Observation (1)
"show locations and c-ratings for all deployed subs that were in their home ports april five"
Active Acoustics:
Observation (2)
Solution:
Feedback (w/ intra-module parallelism)
Feature Extraction
Activate Words
Activate Phonemes
Activate Acoustics
Compute Acoustics
Compute Phonemes
Compute Words
Compute Language
Compute Acoustics … Compute Acoustics …
Generate active lists
Score
Initialization
Hypothesized words
Speech input
N frames
Acoustic Model Look-ahead:
Frame #1
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Acoustic Model Look-ahead:
Frame #1
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Acoustic Model Look-ahead:
Frame #2
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Acoustic Model Look-ahead:
Frame #3
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Acoustic Model Look-ahead:
Frame #4 (do nothing)
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Acoustic Model Look-ahead:
Frame #5
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Acoustic Model Look-ahead:
All Frames
[Figure: GMM IDs vs. Time (Frame, Chunk)]
Result:
Significant savings in Memory Bandwidth
Acoustic Model Look-ahead (#1)
[Figure: Activate Acoustics, Compute Acoustics, Compute Phonemes; Compute Acoustics is expanded into AML, Scan & Compact, and GMM Compute, with buffers in, new, and buf]
new = in AND (NOT(buf))
buf = new OR (buf)
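The two buffer updates on this slide, followed by the Scan & Compact stage, can be sketched on the CPU as follows. This is an illustrative sketch under stated assumptions, not the talk's GPU kernels: the function names are hypothetical, flags are plain 0/1 lists, and clearing `buf` at the start of each chunk is an assumption. `in_flags` marks the GMMs requested by the current frame, and `buf` marks those already computed for the current chunk.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] = sum(flags[:i]); also returns the total."""
    out, total = [], 0
    for f in flags:
        out.append(total)
        total += f
    return out, total

def compact(flags):
    """Scan & compact: list the indices whose flag is 1."""
    offsets, count = exclusive_scan(flags)
    ids = [0] * count
    for i, f in enumerate(flags):
        if f:
            ids[offsets[i]] = i  # scatter each set index to its scanned slot
    return ids

def aml_step(in_flags, buf):
    """One frame of look-ahead bookkeeping within a chunk."""
    new = [i & (1 - b) for i, b in zip(in_flags, buf)]  # new = in AND (NOT(buf))
    buf = [n | b for n, b in zip(new, buf)]             # buf = new OR (buf)
    return compact(new), buf

buf = [0] * 6  # assumed to be cleared at the start of each chunk
frames = [
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],  # nothing new: no GMM work for this frame
    [0, 1, 0, 0, 1, 0],
]
for t, in_flags in enumerate(frames, start=1):
    to_compute, buf = aml_step(in_flags, buf)
    print(t, to_compute)
```

Only GMMs that are new to the chunk reach the compute stage, so a GMM's parameters are fetched at most once per chunk in this sketch, which is the source of the bandwidth savings the talk reports.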
Results
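| Chunk | WER | Comp. Ovrd (%) | BW Saved (%) | RTF 260 GTX | RTF 9400M (ION) |
|---|---|---|---|---|---|
| 1 | 6.86 | 0 | 0 | 14.38 | 1.50 |
| 2 | | 3.46 | 43.76 | 20.30 | 2.70 |
| 4 | | 9.76 | 67.46 | 25.34 | 3.27 |
| 8 | | 20.64 | 79.90 | 32.36 | 3.96 |

Annotations: 360 MB, 70 MB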
Context-Independent Acoustics
[Figure: CI-GMM, GMM]
Context-Independent Acoustics:
Lifetime
[Figure: CI-GMM IDs vs. Time]
Context-Independent Acoustics:
Chunk-based processing
[Figure: CI-GMM IDs vs. Time]
Context-Independent Acoustics:
Chunk-based processing
[Figure: CI-GMM IDs vs. Time, with labels 1, 2, 2, 1, 3]
Context-Independent Acoustics:
Chunk-based processing
[Figure: GMM IDs and CI-GMM IDs vs. Time]
Acoustic Model Look-ahead (#2)
[Figure: Activate Acoustics, in, AML(b), Compact, GMM Compute (CI), CI-GMM Process, AML(c), Compact, GMM Comp. / Back-off, Compute Phonemes; buffers new and buf]
Annotations: CI-phase only at chunk boundary; Computed every frame
• Compute CI-GMMs
• Compute Maximum for beam pruning
• If (CI-GMM > CIGMM Threshold) {
score corresponding GMMs
}
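The three bullets above can be sketched as a selection step. This is a minimal sketch under assumptions the slide does not confirm: the threshold is taken relative to the best CI-GMM score (a beam), scores are log-likelihoods, and a GMM that fails the test backs off to its CI parent's score. `ci_gmm_select`, `ci_of_gmm`, and `beam` are hypothetical names.

```python
def ci_gmm_select(ci_scores, ci_of_gmm, beam):
    """Split GMMs into those to score fully and those that back off.

    ci_scores: log-likelihood of each context-independent (CI) GMM
    ci_of_gmm: for each full GMM, the index of its CI parent (assumed mapping)
    beam:      how far below the best CI score a parent may fall (assumed)
    """
    threshold = max(ci_scores) - beam  # maximum used for beam pruning
    to_score, backed_off = [], {}
    for gmm_id, ci_id in enumerate(ci_of_gmm):
        if ci_scores[ci_id] > threshold:
            to_score.append(gmm_id)                 # score corresponding GMM
        else:
            backed_off[gmm_id] = ci_scores[ci_id]   # assumed back-off value
    return to_score, backed_off

ci_scores = [-10.0, -25.0, -12.0]   # illustrative values only
ci_of_gmm = [0, 0, 1, 1, 2]
to_score, backed_off = ci_gmm_select(ci_scores, ci_of_gmm, beam=5.0)
print(to_score, backed_off)
```

In this toy run the two GMMs whose CI parent scores poorly are never evaluated, which corresponds to the compute savings listed in the results that follow.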
Results
| Chunk | CI-GMM Thresh | WER | Comp. Saved (%) | BW Saved (%) | RTF 260 GTX | RTF 9400M (ION) |
|---|---|---|---|---|---|---|
| 4 | 1 | 7.27 | 24.04 | 79.47 | 23.52 | 4.32 |
| 4 | 2 | 7.72 | 36.81 | 82.95 | 24.93 | 4.85 |
| 4 | 3 | 8.67 | 48.81 | 86.21 | 26.58 | 5.40 |
| 8 | 1 | 7.23 | 11.78 | 86.05 | 33.23 | 4.95 |
| 8 | 2 | 7.31 | 23.57 | 87.75 | 34.68 | 5.37 |
| 8 | 3 | 7.81 | 34.05 | 89.27 | 36.25 | 6.18 |

Annotation: 36 MB
Faster than real-time; with savings in both compute & memory bandwidth
In Summary
High-end & Low-end systems vastly different in
•Architectures
•Constraints
Re-visit traditional application pipeline
Memory is a key bottleneck
•Extraction of temporal locality is critical
Acoustic Modeling Look-ahead is 'critical' in …
•Enabling faster than real-time performance
•Saving bandwidth
•Saving compute
… at a marginal loss in accuracy
Future Directions
We're just getting started…
Multi-stream Speech Recognition: Home automation
Transcription: Minutes of meetings
Language Translation: Tour guides
Today's killer-app: Dictation!
…
The Final Frontier in Speech Recognition...
The Holy Grail
accurate
real-time
continuous
naturally spoken
noisy conditions
large set of words
speaker-independent
Using speech recognition not just for a few selective, non-critical tasks, but for all tasks, including 'mission-critical' ones.
HAL 9000
"Perfect" voice-driven interfaces are not possible with today's algorithms
Switches Keypads Mice Scroll Wheel
Speech
Gestures
Touch
The Future: 'Complementary' UIs!
Thank You