Intel Nervana Artificial Intelligence Meetup 11/30/16
Post on 16-Apr-2017
A fast, scalable deep learning platform
End-to-end speech recognition with neon
Anthony Ndirango & Tyler Lee
MAKING MACHINES SMARTER.
now part of
Proprietary and confidential. Do not distribute.
Nervana Systems Proprietary
Outline
Deep learning synopsis
Large vocabulary continuous speech recognition
End-to-end speech recognition systems
Integrating weighted finite-state transducers for decoding
Back-propagation, End-to-end, Resnet, ImageNet, Word2Vec, Regularization, Convolution, Unrolling, RNN, Generalization, hyperparameters, Video recognition, dropout, Pooling, LSTM, AlexNet, Speech recognition
Download neon! https://github.com/NervanaSystems/neon
git clone git@github.com:NervanaSystems/neon.git
Nervana's deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/
We are hiring! https://www.nervanasys.com/careers/
What is deep learning?
A method for extracting features at multiple levels of abstraction
Features are discovered from data
Performance improves with more data
Network can express complex transformations
High degree of representational power
What can deep learning do today?
Nervana in action
Healthcare: Tumor detection
Automotive: Speech interfaces
Finance: Time-series search engine
Agricultural Robotics
Oil & Gas
Proteomics: Sequence analysis
Large Vocabulary Continuous Speech Recognition
State-of-the-art ASR pipeline
[Figure: ASR pipeline diagram; the example output is "quick brown fox"]
Emphasize that we are just replicating DS2's acoustic model
Processing data using aeon
Download aeon! https://github.com/NervanaSystems/aeon
Train directly from raw audio, extracting spectral features on-the-fly
Handles arbitrarily large datasets
Loads data from disk to device with minimal latency
Also supports image and video data
Acoustic models in neon
Complete source available at https://github.com/NervanaSystems/deepspeech
CTC (connectionist temporal classification)
The basic problem to be solved involves mapping a sequence of audio features to a sequence of characters, with no obvious relationship between the lengths of the sequences.
CTC works around this problem by first defining a collapse function.
Definition by example: Collapse(_NNN_ _EE_ _R_ _VVV_AAA_N_AAAA_) = NERVANA
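The collapse function defined by example above can be sketched in a few lines of Python. This is an illustration only, not code from the talk: the name `collapse` and the choice of `_` as the blank symbol follow the slide, and the spaces in the slide's example are assumed to be layout only, so they are left out of the input string.

```python
from itertools import groupby

BLANK = "_"  # blank symbol, written as "_" on the slide


def collapse(path, blank=BLANK):
    """Merge runs of repeated symbols, then drop the blanks."""
    merged = [symbol for symbol, _run in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


# The slide's example with the layout spaces removed.
print(collapse("_NNN__EE__R__VVV_AAA_N_AAAA_"))  # NERVANA
```

Repeats are merged before blanks are removed, which is why a blank between two identical letters keeps both of them: `collapse("A_A")` gives `AA`, while `collapse("AA")` gives `A`.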
For each utterance, the model outputs a matrix of frame-wise character probabilities.
Given the ground truth transcript, the CTC algorithm:
finds all paths which collapse onto the ground truth
uses the probability matrix to weight each path
Example
input audio with 5 frames
ground truth: CAB
find all strings of length 5, including blank characters, that collapse onto CAB
CTC Cost
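The 5-frame CAB example is small enough to enumerate by brute force. The sketch below is an assumption-laden illustration rather than the implementation used in the talk: the four-symbol alphabet, the uniform probability matrix, and the helper names `valid_paths` and `ctc_cost` are all hypothetical. It lists every length-5 path that collapses onto CAB, weights each path by the product of its frame-wise probabilities, and takes the negative log of the sum as the cost.

```python
from itertools import groupby, product
from math import log

BLANK = "_"
ALPHABET = [BLANK, "A", "B", "C"]  # toy alphabet, assumed for this example


def collapse(path):
    """Merge repeated symbols, then drop blanks."""
    merged = [symbol for symbol, _run in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != BLANK)


def valid_paths(target, num_frames):
    """All frame-level paths of the given length that collapse onto target."""
    return ["".join(p) for p in product(ALPHABET, repeat=num_frames)
            if collapse(p) == target]


def ctc_cost(probs, target):
    """Negative log of the summed path probabilities.

    probs[t][symbol] is the model's probability of symbol at frame t.
    """
    total = 0.0
    for path in valid_paths(target, len(probs)):
        path_prob = 1.0
        for t, symbol in enumerate(path):
            path_prob *= probs[t][symbol]
        total += path_prob
    return -log(total)


# Uniform probabilities stand in for a real acoustic model's output.
uniform = [{symbol: 0.25 for symbol in ALPHABET} for _ in range(5)]
paths = valid_paths("CAB", 5)
print(len(paths))                # 28
print(ctc_cost(uniform, "CAB"))
```

Enumeration grows exponentially with the number of frames, so it is only useful for tiny examples like this one; practical CTC implementations compute the same sum with a forward-backward dynamic program.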
Inference
do an argmax for each column
concatenate the resulting characters to obtain a string
collapse the string to get the output
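The three inference steps above can be sketched as follows. The probability columns and the four-symbol alphabet are made up for illustration, and `greedy_decode` is a hypothetical name, not a neon function.

```python
from itertools import groupby

BLANK = "_"


def greedy_decode(prob_columns, alphabet, blank=BLANK):
    """Argmax each column, concatenate the characters, then collapse."""
    # Step 1: pick the most probable symbol in each frame (column).
    best = [alphabet[max(range(len(column)), key=lambda k: column[k])]
            for column in prob_columns]
    # Step 2: concatenate the per-frame symbols into one string.
    raw = "".join(best)
    # Step 3: collapse by merging repeats and dropping blanks.
    merged = [symbol for symbol, _run in groupby(raw)]
    return "".join(symbol for symbol in merged if symbol != blank)


alphabet = ["_", "A", "B", "C"]
# One column per frame; entries follow the order of `alphabet`.
columns = [
    [0.1, 0.1, 0.1, 0.7],  # C
    [0.6, 0.2, 0.1, 0.1],  # _
    [0.1, 0.7, 0.1, 0.1],  # A
    [0.2, 0.6, 0.1, 0.1],  # A
    [0.1, 0.1, 0.7, 0.1],  # B
]
print(greedy_decode(columns, alphabet))  # CAB
```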
Examples of argmax-decoded outputs
decoded output: younited presidentiol is a lefe in surance company
ground truth: united presidential is a life insurance company

decoded output: that was sertainly true last week
ground truth: that was certainly true last week

decoded output: we're now ready to say we're intechnical default a spokesman said
ground truth: we're not ready to say we're in technical default a spokesman said
Decoding: from characters to language
So we have probabilities of each character at each frame. Now what?
If CER (character error rate) is nearly perfect:
We're pretty much set. Just use the best character at each frame.
If CER is too high:
We should enforce some rules from the language, e.g.:
All words must be valid
Favor likely word sequences
WFSTs efficiently enforce language constraints
Weighted finite state transducers:
Automata whose state transitions map a sequence of input symbols to a sequence of output symbols
Directed graph structure
States enforce language structure
Transitions choose amongst valid symbols
For an in-depth review, see Mohri, Pereira & Riley, 2008
Why do we use them?
A lot of decoding concepts map nicely to FSTs. CTC, lexicon (vocabulary) and grammar (language model) can all be easily represented.
Efficient algorithms exist to combine FSTs, giving a single decoding graph.
[Figure: the CTC graph, lexicon graph, and grammar graph are combined into the decoding graph]
CTC is easily implemented as an FST
Removes repeated characters and blanks (_)
Input: C _ A A A _ B B _
Output: C A B
[Figure: the FST consumes "C _", then "A A A _", then "B B _", emitting C, then C A, then C A B]
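One way to read the slide's FST is as a transition function: the state remembers the last input symbol, and an arc emits an output only when a new non-blank symbol arrives. The sketch below is a hand-rolled illustration of that idea, not the graph used in the talk; `ctc_transition` and `transduce` are hypothetical names, and no FST library is involved.

```python
BLANK = "_"
EPSILON = ""  # "emit nothing" on this arc


def ctc_transition(state, symbol):
    """One arc of the transducer: (state, input) -> (next state, output)."""
    if symbol == BLANK:
        return BLANK, EPSILON   # a blank resets the state and emits nothing
    if symbol == state:
        return state, EPSILON   # a repeated character emits nothing
    return symbol, symbol       # a new character is emitted once


def transduce(symbols):
    """Run the transducer over the input symbols and collect its outputs."""
    state = BLANK  # start state
    outputs = []
    for symbol in symbols:
        state, out = ctc_transition(state, symbol)
        if out != EPSILON:
            outputs.append(out)
    return outputs


print(" ".join(transduce("C _ A A A _ B B _".split())))  # C A B
```

Written this way, the machine needs one state per character plus one for the blank, which is why the CTC graph stays tiny compared with the lexicon and grammar graphs.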
A vocabulary (lexicon) is also easy to implement as an FST
Maps a sequence of characters or phonemes to words
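A lexicon transducer can be pictured as a character trie whose word-final nodes emit the word. The sketch below assumes a three-word vocabulary and a space as the word separator; `build_lexicon` and `spell_to_words` are hypothetical helpers written for this illustration, and a real system would build the graph with an FST toolkit and attach weights.

```python
WORD = None  # trie key that stores the word ending at a node


def build_lexicon(words):
    """Build a character trie; each node is a dict keyed by character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[WORD] = word
    return root


def spell_to_words(lexicon, chars, sep=" "):
    """Map a character sequence to words, or None if a word is not valid."""
    words = []
    node = lexicon
    for ch in list(chars) + [sep]:       # trailing separator flushes the last word
        if ch == sep:
            if node is lexicon:
                continue                 # ignore empty tokens
            if WORD not in node:
                return None              # stopped in the middle of a word
            words.append(node[WORD])
            node = lexicon
        elif ch in node:
            node = node[ch]
        else:
            return None                  # no arc for this character
    return words


lexicon = build_lexicon(["quick", "brown", "fox"])
print(spell_to_words(lexicon, "quick brown fox"))  # ['quick', 'brown', 'fox']
print(spell_to_words(lexicon, "quik brown fox"))   # None
```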
WFSTs have a few drawbacks
Less end-to-end: a large number of parameters are learned completely separately from the acoustic model
Memory issues with large vocabularies or complex language models

Graph               # States      # Arcs
CTC                 31            91
Lexicon             30,629        40,516
Trigram             3,538,579     10,213,039
Composed Trigram    26,817,696    54,104,686
WFSTs greatly improve word error rate

Reference                     CER (no LM)   WER (no LM)   WER (trigram LM)   WER (trigram LM w/ enhancements)
Hannun, et al. (2014)         10.7          35.8          14.1               N/A
Graves-Jaitly (2014)          9.2           30.1          N/A                8.7
Hwang-Sung (2016)             10.6          38.4          8.88               8.1
Miao et al. (2015) [Eesen]    N/A           N/A           9.1                7.3
Bahdanau et al. (2016)        6.4           18.6          10.8               9.3
Nervana-Speech                8.64          32.5          8.4                N/A

decoded output: younited presidentiol is a lefe in surance company
ground truth: united presidential is a life insurance company

decoded output: that was sertainly true last week
ground truth: that was certainly true last week

decoded output: we're now ready to say we're intechnical default a spokesman said
ground truth: we're not ready to say we're in technical default a spokesman said
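CER and WER are conventionally computed as an edit distance divided by the length of the ground truth, over characters and words respectively. The sketch below applies that standard definition to two of the example pairs; it is an illustration with hypothetical function names, and the exact scoring script behind the table is not given in the slides.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(truth, decoded):
    """Word-level edit distance divided by the number of ground truth words."""
    ref = truth.split()
    return edit_distance(ref, decoded.split()) / len(ref)


def char_error_rate(truth, decoded):
    """Character-level edit distance divided by the ground truth length."""
    return edit_distance(truth, decoded) / len(truth)


truth = "that was certainly true last week"
decoded = "that was sertainly true last week"
print(word_error_rate(truth, decoded))  # 1 wrong word out of 6
print(char_error_rate(truth, decoded))  # 1 wrong character out of 33
```

A single wrong character already costs a whole word, which is one reason the no-LM WER column is so much higher than the CER column.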
More info
Nervana's deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/
Acoustic model source available at: https://github.com/NervanaSystems/deepspeech
Github page: https://github.com/NervanaSystems/neon
For more information, contact: info@nervanasys.com
neon on
Model zoo
github.com/NervanaSystems/ModelZoo: model files, parameters
GoogLeNet, Alexnet, VGG, Deep Residual Net, bAbI Q&A, imdb Sentiment Analysis, Video Activity Detection, Deep Reinforcement Learning, LSTM Image Captioning, Fast-RCNN Object Localization, AllCNN
Don't have to make a model from scratch: many examples of pre-trained models
Mention Yinyin (Fast-RCNN), Sathish (C3D), bAbI
THANK YOU!
QUESTIONS?
Convolution
input (3x3):
0 1 2
3 4 5
6 7 8
filter (2x2):
0 1
2 3
output (2x2):
19 25
37 43
Each element in the output is the result of a dot product between two vectors. For example, the input patch (0, 1, 3, 4) and the filter (0, 1, 2, 3) give 19.
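The slide's numbers can be reproduced with a few lines of plain Python. This sketch assumes stride 1, no padding, and no kernel flip (the operation deep learning libraries usually call convolution); `conv2d_valid` is a hypothetical name, not a neon function.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is one dot product."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Flatten the patch under the kernel into a vector.
            patch = [image[i + di][j + dj]
                     for di in range(kh) for dj in range(kw)]
            # Dot product between the patch vector and the kernel vector.
            out_row.append(sum(p * w for p, w in zip(patch, flat_kernel)))
        output.append(out_row)
    return output


image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
kernel = [[0, 1],
          [2, 3]]
print(conv2d_valid(image, kernel))  # [[19, 25], [37, 43]]
```

The top-left output is 0*0 + 1*1 + 3*2 + 4*3 = 19, matching the slide.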
To understand convolutional networks, we should first understand the convolution operation and then see how that operation is implemented in a network structure.
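Convolution layer
[Figure: the 3x3 input (0 to 8) and the 2x2 filter (0, 1, 2, 3) drawn as a network layer; the same four filter weights are repeated on the connections into each of the four outputs 19, 25, 37, 43]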
Bi-directional RNN (BiRNN)