Intel Nervana artificial intelligence meetup, 11/30/16

A fast, scalable deep learning platform

End-to-end speech recognition with neon

Anthony Ndirango & Tyler Lee

MAKING MACHINES SMARTER.

Now part of Intel

Proprietary and confidential. Do not distribute.

Outline

- Deep learning synopsis
- Large vocabulary continuous speech recognition
- End-to-end speech recognition systems
- Integrating weighted finite-state transducers for decoding


[Word cloud: back-propagation, end-to-end, ResNet, ImageNet, Word2Vec, regularization, convolution, unrolling, RNN, generalization, hyperparameters, video recognition, dropout, pooling, LSTM, AlexNet, speech recognition]

Download neon! https://github.com/NervanaSystems/neon

git clone git@github.com:NervanaSystems/neon.git

Nervana's deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/

We are hiring! https://www.nervanasys.com/careers/

What is deep learning?

- A method for extracting features at multiple levels of abstraction
- Features are discovered from data
- Performance improves with more data
- The network can express complex transformations
- High degree of representational power

What can deep learning do today?

Healthcare: Tumor detection

Automotive: Speech interfaces

Finance: Time-series search engine


Agricultural Robotics

Oil & Gas


Proteomics: Sequence analysis


Nervana in action

Large Vocabulary Continuous Speech Recognition

State-of-the-art ASR pipeline

[Figure: speech audio decoded into phonemes and then into the words "quick brown fox"]

Speaker note: emphasize that we are only replicating DS2's (Deep Speech 2) acoustic model.

Processing data using aeon

Download aeon! https://github.com/NervanaSystems/aeon

- Trains directly from raw audio, extracting spectral features on the fly
- Handles arbitrarily large datasets
- Loads data from disk to device with minimal latency
- Also supports image and video data
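The on-the-fly feature extraction above can be sketched in plain NumPy as a log-power spectrogram. This is not aeon's API or its exact feature pipeline: the function name `log_spectrogram`, the 20 ms Hann window, the 10 ms stride and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(audio, sample_rate=16000, window_ms=20, stride_ms=10, eps=1e-10):
    """Return a (num_frames, num_bins) log-power spectrogram of a 1-D signal."""
    win = int(sample_rate * window_ms / 1000)   # samples per analysis window
    hop = int(sample_rate * stride_ms / 1000)   # samples between window starts
    if len(audio) < win:
        raise ValueError("audio is shorter than one analysis window")
    num_frames = 1 + (len(audio) - win) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(win)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = audio[idx] * np.hanning(win)        # taper each frame to reduce leakage
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)                   # eps avoids log(0)

# Usage: one second of a 1 kHz tone.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t), sample_rate=sr)
print(spec.shape)  # (99, 161)
```

With a 320-sample window at 16 kHz each FFT bin spans 50 Hz, so a pure 1 kHz tone peaks in bin 20.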

Acoustic models in neon

Complete source available at https://github.com/NervanaSystems/deepspeech

CTC

The basic problem is mapping a sequence of audio features to a sequence of characters, with no obvious relationship between the lengths of the two sequences. CTC (connectionist temporal classification) works around this problem by first defining a collapse function, which merges repeated characters and then removes blanks (_).

Definition by example: Collapse(_NNN__EE__R__VVV_AAA_N_AAAA_) = NERVANA
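A minimal sketch of the collapse function, assuming "_" is the blank symbol: merge runs of repeated symbols first, then drop blanks. The order matters, because a blank between two identical letters is what lets CTC emit a genuine double letter (the "AA" in "A_A").

```python
from itertools import groupby

def collapse(path, blank="_"):
    """CTC collapse: merge runs of repeated symbols, then remove blanks."""
    merged = [symbol for symbol, _ in groupby(path)]     # "AAA_A" -> A, _, A
    return "".join(s for s in merged if s != blank)

print(collapse("_NNN__EE__R__VVV_AAA_N_AAAA_"))  # NERVANA
print(collapse("A_A"), collapse("AAA"))            # AA A
```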

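For each utterance, the model outputs a matrix of frame-wise character probabilities. Given the ground-truth transcript, the CTC algorithm:

- finds all paths that collapse onto the ground truth
- uses the probability matrix to weight each path
- sums the weighted paths to get the probability of the transcript; the CTC cost is the negative log of this sum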
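A brute-force sketch of these steps, usable only for tiny inputs: enumerate every frame-wise path, keep the ones that collapse onto the transcript, and sum their probabilities (each path's probability is the product of its per-frame probabilities). The function name and the uniform probability matrix are illustrative assumptions; the example uses a 5-frame input and the transcript CAB.

```python
import itertools
import numpy as np

def collapse(path, blank="_"):
    """Merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return "".join(out)

def ctc_prob_brute_force(probs, alphabet, target, blank="_"):
    """Sum P(path) over every frame-wise path that collapses onto `target`.

    probs: (T, K) array; row t is a distribution over the K symbols of `alphabet`.
    Cost is O(K**T), so this is only for checking tiny examples.
    """
    T = probs.shape[0]
    total, paths = 0.0, []
    for idx in itertools.product(range(len(alphabet)), repeat=T):
        path = "".join(alphabet[i] for i in idx)
        if collapse(path, blank) == target:
            paths.append(path)
            # Path probability = product of the chosen symbol's probability per frame.
            total += float(np.prod(probs[np.arange(T), list(idx)]))
    return total, paths

# 5 frames, symbols C, A, B and blank, uniform probabilities (illustrative only).
p, paths = ctc_prob_brute_force(np.full((5, 4), 0.25), "CAB_", "CAB")
print(len(paths))  # 28
```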

Example

Input audio with 5 frames; ground truth: CAB. Find all strings of length 5, including blank characters, that collapse onto CAB.

CTC Cost
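Enumeration grows as K^T paths, so in practice the CTC cost is computed with a forward (alpha) recursion over the blank-interleaved transcript. The sketch below is a plain, non-log-space NumPy version for illustration, not neon's implementation; real implementations work in log space or rescale to avoid underflow. On a uniform 5-frame, 4-symbol input it gives -log(28/1024), matching the 28 paths that collapse onto CAB.

```python
import numpy as np

def ctc_neg_log_likelihood(probs, target, blank):
    """CTC cost -log P(target | x) via the forward (alpha) recursion.

    probs:  (T, K) per-frame symbol probabilities.
    target: list of symbol indices, without blanks.
    blank:  index of the blank symbol.
    """
    T = probs.shape[0]
    ext = [blank]
    for s in target:
        ext += [s, blank]                 # blank-interleaved target: _ y1 _ y2 ... _
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]        # start with a blank ...
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]    # ... or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]           # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]  # advance by one position
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end on the final label or the trailing blank.
    p = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(p)

# Uniform 5-frame example with symbols C, A, B, blank = indices 0, 1, 2, 3.
cost = ctc_neg_log_likelihood(np.full((5, 4), 0.25), [0, 1, 2], blank=3)
print(np.isclose(cost, -np.log(28 / 1024)))  # True
```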

Inference

- Do an argmax for each column
- Concatenate the resulting characters to obtain a string
- Collapse the string to get the output
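The three steps above (often called best-path or greedy decoding) as a short NumPy sketch. It stores frames as rows of a (T, K) array, so the slide's per-column argmax becomes `argmax(axis=1)`; the alphabet and probabilities are made up for illustration.

```python
import numpy as np

def greedy_decode(probs, alphabet, blank="_"):
    """Best-path decoding: argmax per frame, concatenate, then collapse."""
    best = np.argmax(probs, axis=1)             # most likely symbol at each frame
    path = "".join(alphabet[i] for i in best)   # frame-wise string, e.g. "CC_AB"
    out, prev = [], None
    for s in path:                              # collapse: drop repeats, then blanks
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return "".join(out)

example_probs = np.array([
    [0.7, 0.1, 0.1, 0.1],    # frame 0: C
    [0.6, 0.2, 0.1, 0.1],    # frame 1: C
    [0.1, 0.1, 0.1, 0.7],    # frame 2: blank
    [0.1, 0.8, 0.05, 0.05],  # frame 3: A
    [0.1, 0.1, 0.7, 0.1],    # frame 4: B
])
print(greedy_decode(example_probs, "CAB_"))  # CAB
```

Greedy decoding picks the single most likely path, which is not always the most likely collapsed string, and it knows nothing about spelling or word order.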

Examples of argmax-decoded outputs

Decoded output: younited presidentiol is a lefe in surance company
Ground truth: united presidential is a life insurance company

Decoded output: that was sertainly true last week
Ground truth: that was certainly true last week

Decoded output: we're now ready to say we're intechnical default a spokesman said
Ground truth: we're not ready to say we're in technical default a spokesman said

Decoding: from characters to language

So we have the probability of each character at each frame. Now what?

- If the CER (character error rate) is nearly perfect, we're pretty much set: just use the best character at each frame.
- If the CER is too high, we should enforce some rules from the language, e.g.:
  - All words must be valid
  - Favor likely word sequences
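To make the two rules concrete, here is a toy n-best rescoring sketch in plain Python, a deliberate simplification rather than the WFST decoding this talk uses: candidates containing out-of-vocabulary words are discarded, and the rest are scored by acoustic log-probability plus a weighted bigram language-model score. The lexicon, bigram probabilities, acoustic scores and `lm_weight` are all hypothetical.

```python
import math

# Hypothetical toy lexicon and bigram log-probabilities, for illustration only.
LEXICON = {"we're", "now", "not", "ready", "to", "say", "in", "technical", "default"}
BIGRAM_LOGP = {("we're", "not"): math.log(0.2), ("we're", "now"): math.log(0.05)}
UNSEEN_LOGP = math.log(0.1)  # crude stand-in for a real smoothing/backoff scheme

def lm_logp(words):
    """Bigram language-model log-probability of a word sequence."""
    return sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(words, words[1:]))

def rescore(candidates, lm_weight=1.0):
    """Return the best candidate text from (text, acoustic_logp) pairs, or None."""
    best, best_score = None, -math.inf
    for text, acoustic_logp in candidates:
        words = text.split()
        if not all(w in LEXICON for w in words):   # rule 1: all words must be valid
            continue
        score = acoustic_logp + lm_weight * lm_logp(words)  # rule 2: favor likely sequences
        if score > best_score:
            best, best_score = text, score
    return best

candidates = [  # hypothetical acoustic log-probabilities
    ("we're now ready to say we're intechnical default", -3.0),
    ("we're now ready to say we're in technical default", -3.5),
    ("we're not ready to say we're in technical default", -3.7),
]
print(rescore(candidates))  # we're not ready to say we're in technical default
```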

WFSTs efficiently enforce language constraints

Weighted finite-state transducers:

- Automata whose state transitions map a sequence of input symbols to a sequence of output symbols
- Directed graph structure
- States enforce language structure
- Transitions choose among valid symbols

For an in-depth review, see Mohri, Pereira & Riley, 2008
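A minimal sketch of a WFST as a lookup table of arcs, assuming a deterministic machine with no epsilon transitions (real decoding graphs have both, and are searched rather than walked). Weights are negative log-probabilities combined by addition, as in the tropical semiring. The tiny grammar, its states and probabilities are hypothetical; because input and output labels are identical it is effectively a weighted acceptor, which is how grammar FSTs are commonly built.

```python
import math

# Tiny hypothetical grammar WFST: (state, input) -> (next_state, output, weight).
# Weights are negative log-probabilities, added along a path (tropical semiring).
ARCS = {
    (0, "the"): (1, "the", 0.0),
    (1, "cat"): (2, "cat", -math.log(0.6)),
    (1, "dog"): (2, "dog", -math.log(0.4)),
    (2, "sat"): (3, "sat", 0.0),
}
FINAL = {3: 0.0}  # final state -> final weight

def transduce(inputs, arcs=ARCS, final=FINAL, start=0):
    """Walk a deterministic WFST; return (outputs, weight) or None if rejected."""
    state, outputs, weight = start, [], 0.0
    for symbol in inputs:
        if (state, symbol) not in arcs:
            return None                  # no transition: violates the language structure
        state, out, w = arcs[(state, symbol)]
        outputs.append(out)
        weight += w
    if state not in final:
        return None                      # stopped in a non-final state
    return outputs, weight + final[state]

print(transduce(["the", "cat", "sat"]))  # lower weight = more likely
print(transduce(["cat", "the", "sat"]))  # None
```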

Why do we use them?

- Many decoding concepts map nicely to FSTs: CTC, the lexicon (vocabulary) and the grammar (language model) can all be represented easily.
- Efficient algorithms exist to combine FSTs into a single decoding graph.

[Figure: decoding graph built from the CTC graph, the lexicon graph and the grammar graph]

CTC is easily implemented as an FST

Removes repeated characters and blanks (_)

Input: C _ A A A _ B B _
Output: C A B
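One way to build such a CTC FST, sketched in plain Python without weights: a start state plus one state per character meaning "the last frame emitted this character". A blank returns to the start state and emits nothing, a repeated character emits nothing, and a new character emits itself. This construction and the helper names are assumptions for illustration, not necessarily the topology used in the talk.

```python
def build_ctc_fst(alphabet, blank="_"):
    """Unweighted CTC collapse transducer.

    Returns (start_state, arcs, final_states); arcs maps (state, input) to
    (next_state, output), where output None stands for epsilon (no output).
    """
    start = "<start>"
    states = [start] + list(alphabet)   # state c means "last frame emitted c"
    arcs = {}
    for s in states:
        arcs[(s, blank)] = (start, None)   # blank: emit nothing, forget last char
        for c in alphabet:
            if c == s:
                arcs[(s, c)] = (c, None)   # repeat of the same char: emit nothing
            else:
                arcs[(s, c)] = (c, c)      # new char: emit it
    return start, arcs, set(states)        # every state is final

def run_fst(fst, inputs):
    """Apply the deterministic transducer to a sequence of input symbols."""
    start, arcs, finals = fst
    state, out = start, []
    for sym in inputs:
        state, label = arcs[(state, sym)]
        if label is not None:
            out.append(label)
    if state not in finals:
        raise ValueError("input ended in a non-final state")
    return "".join(out)

fst = build_ctc_fst("ABC")
print(run_fst(fst, "C_AAA_BB_"))  # CAB
```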

A vocabulary (lexicon) is also easy to implement as an FST

Maps a sequence of characters or phonemes to words
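A sketch of a lexicon FST as a character trie, assuming words are separated by a space character: each character follows one arc, and reaching a word-final state at a space outputs that word. Inputs that leave the trie are rejected, so every output word is guaranteed to be in the vocabulary. The word list and helper names are hypothetical, and emitting the word at the end is a simplification; lexicon FSTs are often built to emit the word label on the first arc instead.

```python
def build_lexicon_fst(words):
    """Character trie used as a transducer.

    Returns (arcs, word_at): arcs maps (state, char) -> next_state, and
    word_at maps a word-final state to the word it outputs. State 0 is the root.
    """
    arcs, word_at, next_state = {}, {}, 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in arcs:      # shared prefixes share states
                arcs[(state, ch)] = next_state
                next_state += 1
            state = arcs[(state, ch)]
        word_at[state] = word
    return arcs, word_at

def chars_to_words(chars, fst, space=" "):
    """Map a character string to lexicon words, or None if any word is invalid."""
    arcs, word_at = fst
    state, words = 0, []
    for ch in chars + space:                 # trailing space flushes the last word
        if ch == space:
            if state not in word_at:
                return None                  # prefix is not a complete word
            words.append(word_at[state])
            state = 0                        # return to the root for the next word
        elif (state, ch) in arcs:
            state = arcs[(state, ch)]
        else:
            return None                      # no arc: not in the lexicon
    return words

lexicon = build_lexicon_fst(["a", "in", "insurance", "life", "company"])
print(chars_to_words("a life insurance company", lexicon))  # ['a', 'life', 'insurance', 'company']
print(chars_to_words("a lefe in surance company", lexicon))  # None
```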

WFSTs have a few drawbacks

- Less end-to-end: a large number of parameters are learned completely separately from the acoustic model

- Memory issues with large vocabularies or complex language models

Graph | # States | # Arcs
CTC | 31 | 91
Lexicon | 30,629 | 40,516
Trigram | 3,538,579 | 10,213,039
Composed Trigram | 26,817,696 | 54,104,686

WFSTs greatly improve word error rate

Reference | CER (no LM) | WER (no LM) | WER (trigram LM) | WER (trigram LM w/ enhancements)
Hannun et al. (2014) | 10.7 | 35.8 | 14.1 | N/A
Graves-Jaitly (2014) | 9.2 | 30.1 | N/A | 8.7
Hwang-Sung (2016) | 10.6 | 38.4 | 8.88 | 8.1
Miao et al. (2015) [Eesen] | N/A | N/A | 9.1 | 7.3
Bahdanau et al. (2016) | 6.4 | 18.6 | 10.8 | 9.3
Nervana-Speech | 8.64 | 32.5 | 8.4 | N/A
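CER and WER are usually defined as the edit (Levenshtein) distance between hypothesis and reference, divided by the reference length, in characters or words respectively. A minimal sketch; text normalization (case, punctuation, whether spaces count as characters) varies between papers and is an assumption here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))        # distances from an empty ref prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution, or match at no cost
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("we're not ready to say we're in technical default a spokesman said",
          "we're now ready to say we're intechnical default a spokesman said"))  # 0.25
```

For that argmax-decoded example, "now" for "not" plus "intechnical" for "in technical" costs three word edits out of twelve reference words.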


More info

Nervana's deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/

Acoustic model source available at: https://github.com/NervanaSystems/deepspeech

GitHub page: https://github.com/NervanaSystems/neon

For more information, contact: info@nervanasys.com


Model zoo

github.com/NervanaSystems/ModelZoo: model files, parameters

- GoogLeNet
- AlexNet
- VGG
- Deep Residual Net
- bAbI Q&A
- IMDB Sentiment Analysis
- Video Activity Detection
- Deep Reinforcement Learning
- LSTM Image Captioning
- Fast-RCNN Object Localization
- AllCNN

Speaker note: you don't have to build a model from scratch; there are many examples of pre-trained models.

Speaker note: mention Yinyin (Fast-RCNN), Sathish (C3D), bAbI.

THANK YOU!

QUESTIONS?


Convolution

[Figure: a 3×3 input holding the values 0 to 8 is convolved with a 2×2 filter to give a 2×2 output holding 19, 25, 37 and 43; panels labeled input, filter, output]

Each element in the output is the result of a dot product between two vectors.
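The slide's numbers can be reproduced with a direct loop over output positions. The filter values are not in the extracted slide; a 2×2 filter holding 0 to 3 is an assumption that yields exactly the outputs 19, 25, 37 and 43. As in most deep learning libraries, the kernel is not flipped (strictly, this is cross-correlation).

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D convolution without kernel flip (cross-correlation)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            # Each output element is a dot product between the flattened filter
            # and the flattened input patch under it.
            out[i, j] = np.dot(x[i:i + kh, j:j + kw].ravel(), w.ravel())
    return out

x = np.arange(9).reshape(3, 3)   # input values 0..8
w = np.arange(4).reshape(2, 2)   # assumed filter values 0..3
print(conv2d_valid(x, w).tolist())  # [[19, 25], [37, 43]]
```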

Speaker note: to understand convolutional networks, we should first understand the convolution operation, and then see how that operation is implemented in a network structure.

Convolution layer

[Figure: the same example (input values 0 to 8, output values 19, 25, 37, 43) drawn as a network layer]
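In a convolution layer the same filter weights are shared across every input position, and a layer applies several filters at once. One common way to compute this, sketched below with the hypothetical helper `conv_layer` (single input channel, stride 1, no padding), is im2col: copy every input patch into a row of a matrix so that one matrix multiply performs all the dot products. This is a generic technique, not a description of neon's kernels.

```python
import numpy as np

def conv_layer(x, filters, bias):
    """Convolution layer via im2col.

    x: (H, W) single-channel input; filters: (K, kh, kw); bias: (K,).
    Returns (K, H - kh + 1, W - kw + 1).
    """
    K, kh, kw = filters.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # Each row of `patches` is one flattened input patch (one output position).
    patches = np.array([x[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])   # (oh*ow, kh*kw)
    # One matrix multiply computes every dot product for every filter.
    out = patches @ filters.reshape(K, -1).T + bias               # (oh*ow, K)
    return out.T.reshape(K, oh, ow)

x = np.arange(9).reshape(3, 3)
filters = np.stack([np.arange(4).reshape(2, 2), np.ones((2, 2))])  # illustrative filters
print(conv_layer(x, filters, np.zeros(2)).shape)  # (2, 2, 2)
```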

Bi-directional RNN (BiRNN)
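A bi-directional RNN runs one recurrent pass forward in time and a second, separately parameterized pass backward, so the output at each frame depends on both past and future audio. A minimal NumPy sketch with plain tanh units; the parameter shapes and sizes are assumptions, and concatenating the two directions is one common choice (summing them is another).

```python
import numpy as np

def rnn_pass(xs, W_x, W_h, b):
    """Simple tanh RNN over xs of shape (T, D); returns hidden states (T, H)."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        hs.append(h)
    return np.stack(hs)

def birnn(xs, fwd_params, bwd_params):
    """Bi-directional RNN: concatenate forward and backward hidden states per frame."""
    h_fwd = rnn_pass(xs, *fwd_params)                # reads frames 0 .. T-1
    h_bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]    # reads T-1 .. 0, then re-align in time
    return np.concatenate([h_fwd, h_bwd], axis=1)    # (T, 2H)

rng = np.random.default_rng(0)
T, D, H = 6, 4, 3   # illustrative sizes
fwd = (rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H))
bwd = (rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H))
xs = rng.normal(size=(T, D))
print(birnn(xs, fwd, bwd).shape)  # (6, 6)
```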
