core technologies for multimodal interactions (ii)
TRANSCRIPT
Multimodal Interaction Technology (MIT)
• OCR: Optical Character Recognition
• HWR: Handwriting Recognition
• Speech Enhancement
HWR
• Online Handwritten Character Recognition
• Offline Handwritten Character Recognition
• Online Handwritten Text Recognition
• Offline Handwritten Text Recognition
Motivation
• Requirement of WP8 HWR in China Markets – Support GB18030 (27533 China Characters) – Robust to rotational distortion
• The Status of HWR Product Team – The original team is reorganized to STC – The new team have no capability to build the engine
Project: Snake: a WP8 HWR engine
Jun Du, Qiang Huo, Kai Chen “Designing compact classifiers for rotation-free recognition of large
vocabulary online handwritten Chinese characters,” ICASSP, 2012
Training Stage
Rotation Normalization
Feature Extraction
Training Samples
LBG Clustering
SSM-MCE (Rprop)
Split VQ Compression
Building Fast Match Tree
Run-Time Resources
Split VQ Compression • Split D-dimensional prototype vectors into Q streams • Sub-vectors in different streams are quantized by VQ with different codebooks
Fast-Match Tree
Recognition Stage
Rotation Normalization
Feature Extraction
Input character
Level-1 Recognition Using Fast Match Tree Resources
Level-2 Recognition Using Candidate List Final Results
A Magic for Rotation Normalization
Starting point of each stroke
Ending point of each stroke
Averaged starting point over all strokes
Averaged ending point over all strokes
Rotation-Sensitive vs. Rotation-Free
Recognizer #1: rotation-sensitive Recognizer #2: rotation-free
Footprint: 3.4M Footprint: 8.8M
Accuracy (%
)
Accuracy (%
)
Comparison with Mango (WP7.5) System
Current System
Mango System
Vocabulary
27533 Characters
6763 Characters
Footprint
6MB+
2MB
Accuracy
94.5%
94.2%
Recognition Time
Per Character
7ms
11ms
Summary
• A Chinese HWR solution for WP8 Apollo release – Rotation-free – Fast – Compact – Accurate – Large vocabulary
Multimodal Interaction Technology (MIT)
• OCR: Optical Character Recognition
• HWR: Handwriting Recognition
• Speech Enhancement
Project: Speech Enhancement using
Deep Neural Networks
Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee, “An Experimental Study on Speech Enhancement Based on Deep Neural Networks ,” IEEE Signal Processing Letter, 2014
Speech Signal
Speech Enhancement
Utterance-based approach Based on additive noise model Wiener filtering, Spectral subtraction, Log-MMSE Musical noise after enhancement
New perspective: data-driven approach Mapping function between clean and noisy data Learning using deep neural networks (DNNs)
System Overview
Feature Extraction
Clean/Noisy Samples
DNN Training
Enhancement Stage
Noisy Samples
Feature Extraction
DNN Enhancement
Wave Reconstruction
Enhanced Speech
Training Stage
DNN Training
White Noise
Noisy speech
DNN approach Traditional approach
Machine Gun Noise
Noisy speech
DNN approach Traditional approach
Speech from Movie <Forrest Gump>
Noisy speech: I remember the bus ride on the first day of school very well
DNN approach Traditional approach