Deep Learning Methods for Natural Language Processing
Garrett Hoffman
Director of Data Science @ StockTwits
Talk Overview
Learning Distributed Representations of Words with Word2Vec
Sparse Representation
Sparse Representation Drawbacks
Distributed Representation
Word2Vec
“Distributed Representations of Words and Phrases and their Compositionality”, Mikolov et al. (2013)
Word2Vec - Generating Data
McCormick, C. (2016, April 19). Word2Vec Tutorial - The Skip-Gram Model.
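The skip-gram setup turns raw text into (center word, context word) training pairs by sliding a window over the corpus. A minimal sketch of that step, assuming a window size of 2 (the function name and corpus are my own illustration):

    # Generate (center, context) skip-gram training pairs from a token list.
    def skipgram_pairs(tokens, window=2):
        pairs = []
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, tokens[j]))
        return pairs

    print(skipgram_pairs("the quick brown fox jumps".split()))
    # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]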
Word2Vec - Skip-gram Network Architecture
McCormick, C. (2016, April 19). Word2Vec Tutorial - The Skip-Gram Model.
Word2Vec - Embedding Layer
McCormick, C. (2016, April 19). Word2Vec Tutorial - The Skip-Gram Model.
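The embedding layer is just a table lookup: multiplying a one-hot input vector by the weight matrix selects a single row, so no full matrix multiply is ever needed. A NumPy sketch with made-up dimensions:

    import numpy as np

    vocab_size, embed_dim = 10000, 300
    W = np.random.randn(vocab_size, embed_dim)    # embedding weight matrix

    word_id = 42                                  # vocabulary index of the input word
    one_hot = np.zeros(vocab_size)
    one_hot[word_id] = 1.0

    # The one-hot matmul and a direct row lookup give the same embedding vector.
    assert np.allclose(one_hot @ W, W[word_id])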
Word2Vec - Output Layer
McCormick, C. (2016, April 19). Word2Vec Tutorial - The Skip-Gram Model.
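The output layer scores the hidden vector (the center word's embedding) against every vocabulary word and normalizes with a softmax. A sketch under the same made-up dimensions:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())                      # shift for numerical stability
        return e / e.sum()

    vocab_size, embed_dim = 10000, 300
    W_out = np.random.randn(vocab_size, embed_dim)   # output weight matrix
    h = np.random.randn(embed_dim)                   # hidden layer activation

    probs = softmax(W_out @ h)   # P(context word | center word) over the whole vocabulary
    assert np.isclose(probs.sum(), 1.0)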
Word2Vec - Intuition
McCormick, C. (2017, January 11). Word2Vec Tutorial Part 2 - Negative Sampling.
Word2Vec - Negative Sampling
McCormick, C. (2017, January 11). Word2Vec Tutorial Part 2 - Negative Sampling.
https://www.tensorflow.org/tutorials/word2vec
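The full softmax touches all vocab_size output rows on every update, so negative sampling instead updates the true context word plus a handful of sampled "negative" words, drawn from the unigram distribution raised to the 3/4 power (Mikolov et al., 2013). A NumPy sketch of the noise distribution and per-pair loss (counts, k, and dimensions are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    counts = np.array([100.0, 50.0, 10.0, 5.0, 1.0])  # toy unigram counts
    p_neg = counts ** 0.75
    p_neg /= p_neg.sum()                              # noise distribution over the vocabulary

    def neg_sampling_loss(v_center, u_context, U_neg):
        pos = -np.log(sigmoid(u_context @ v_center))      # push the true pair together
        neg = -np.log(sigmoid(-U_neg @ v_center)).sum()   # push k negative pairs apart
        return pos + neg

    k, d = 5, 300
    rng = np.random.default_rng(0)
    neg_ids = rng.choice(len(counts), size=k, p=p_neg)    # sample k negative word ids
    loss = neg_sampling_loss(rng.standard_normal(d), rng.standard_normal(d),
                             rng.standard_normal((k, d)))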
Word2Vec - Results
Pre-Trained Word Embeddings
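Rather than training embeddings from scratch, published vectors (word2vec, GloVe, etc.) can be loaded and reused. A hedged sketch of parsing a GloVe-format text file, where each line is a word followed by its vector components (the filename is hypothetical):

    import numpy as np

    embeddings = {}
    with open("glove.6B.300d.txt", encoding="utf-8") as f:   # hypothetical local file
        for line in f:
            word, *vec = line.rstrip().split(" ")
            embeddings[word] = np.asarray(vec, dtype=np.float32)

    def cosine(a, b):
        # Cosine similarity is the usual way to compare word vectors.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))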
“Distributed Representations of Sentences and Documents”, Le & Mikolov (2014)
Doc2Vec
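Doc2Vec (paragraph vectors) trains a per-document vector alongside the word vectors, so whole sentences or documents get a fixed-length representation. One way to fit it is gensim; treat this as a sketch, since parameter and attribute names vary across gensim versions:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        TaggedDocument(words=["deep", "learning", "for", "nlp"], tags=[0]),
        TaggedDocument(words=["word", "vectors", "capture", "meaning"], tags=[1]),
    ]
    model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=20)
    doc_vec = model.dv[0]    # learned vector for document 0 (docvecs in older gensim)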
Recurrent Neural Networks and their Variants
Sequence Models
Recurrent Neural Networks (RNNs)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
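An RNN applies one shared set of weights at every timestep, mixing the current input with the previous hidden state. A minimal NumPy sketch of a single recurrent step (dimensions illustrative):

    import numpy as np

    d_in, d_h = 300, 128
    rng = np.random.default_rng(0)
    W_xh = rng.standard_normal((d_h, d_in)) * 0.01   # input-to-hidden weights
    W_hh = rng.standard_normal((d_h, d_h)) * 0.01    # hidden-to-hidden weights
    b_h = np.zeros(d_h)

    def rnn_step(x_t, h_prev):
        return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

    h = np.zeros(d_h)
    for x_t in rng.standard_normal((10, d_in)):      # a length-10 input sequence
        h = rnn_step(x_t, h)                         # h now summarizes the sequence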
Long-Term Dependency Problem
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short-Term Memory (LSTM)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM - Forget Gate
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM - Learn Gate
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM - Update Gate
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM - Output Gate
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
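Putting the four gate slides together, one LSTM step in NumPy. I keep the slides' gate names; in most references the "learn gate" is the input gate and the "update gate" step is the cell-state update (dimensions illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        z = np.concatenate([x_t, h_prev])
        f = sigmoid(W["f"] @ z + b["f"])    # forget gate: what to erase from the cell state
        i = sigmoid(W["i"] @ z + b["i"])    # learn (input) gate: what to write
        g = np.tanh(W["g"] @ z + b["g"])    # candidate cell values
        c_t = f * c_prev + i * g            # update gate step: new cell state
        o = sigmoid(W["o"] @ z + b["o"])    # output gate: what to expose as the hidden state
        h_t = o * np.tanh(c_t)
        return h_t, c_t

    d_in, d_h = 300, 128
    rng = np.random.default_rng(0)
    W = {k: rng.standard_normal((d_h, d_in + d_h)) * 0.01 for k in "figo"}
    b = {k: np.zeros(d_h) for k in "figo"}
    h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h), W, b)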
Gated Recurrent Unit (GRU)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
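The GRU merges the cell and hidden state and gets by with two gates; a matching self-contained sketch:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x_t, h_prev, W, b):
        z_in = np.concatenate([x_t, h_prev])
        u = sigmoid(W["u"] @ z_in + b["u"])    # update gate: how much state to replace
        r = sigmoid(W["r"] @ z_in + b["r"])    # reset gate: how much old state to use
        h_cand = np.tanh(W["h"] @ np.concatenate([x_t, r * h_prev]) + b["h"])
        return (1 - u) * h_prev + u * h_cand   # interpolate between old and candidate state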
Types of RNNs
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
LSTM Network Architecture
Learning Embeddings End-to-End
Dropout
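Dropout randomly zeroes activations during training so the network cannot lean on any single unit, which regularizes the model. A sketch of the inverted-dropout convention (keep probability illustrative):

    import numpy as np

    def dropout(h, keep_prob=0.5, training=True):
        if not training:
            return h                                   # identity at inference time
        mask = np.random.rand(*h.shape) < keep_prob    # keep each unit with prob keep_prob
        return h * mask / keep_prob                    # rescale to preserve expected value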
Bidirectional LSTM
http://colah.github.io/posts/2015-09-NN-Types-FP/
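A bidirectional LSTM runs one LSTM forward and one backward over the sequence and concatenates the two hidden states, so each position sees both left and right context. In Keras this is a one-line wrapper (layer sizes illustrative):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10000, output_dim=300),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),  # forward + backward pass
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])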
Convolutional Neural Networks for Language Tasks
Computer Vision Models
Convolutional Neural Networks (CNNs)
http://colah.github.io/posts/2014-07-Conv-Nets-Modular/
CNNs - Convolution Function
Input Matrix:
0 0 0 0 0 0
0 1 2 1 1 2
0 1 1 1 1 1
1 0 0 0 0 0
0 0 1 1 1 0
0 1 1 1 1 1
Kernel / Filter:
0 0 0
1 0 0
0 2 0
Output Matrix (the 3x3 filter slides across the input with stride 1; each output value is the element-wise product of the filter and the input patch it covers, summed):
2 3 4 3
0 1 1 1
1 2 2 2
2 2 3 3
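A NumPy sketch that reproduces the output above (strictly this is a cross-correlation, which is what deep learning libraries implement under the name "convolution"):

    import numpy as np

    x = np.array([[0, 0, 0, 0, 0, 0],
                  [0, 1, 2, 1, 1, 2],
                  [0, 1, 1, 1, 1, 1],
                  [1, 0, 0, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0],
                  [0, 1, 1, 1, 1, 1]])
    k = np.array([[0, 0, 0],
                  [1, 0, 0],
                  [0, 2, 0]])

    def conv2d_valid(x, k):
        kh, kw = k.shape
        oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
        out = np.empty((oh, ow), dtype=x.dtype)
        for i in range(oh):
            for j in range(ow):
                out[i, j] = (x[i:i+kh, j:j+kw] * k).sum()   # slide, multiply, sum
        return out

    print(conv2d_valid(x, k))    # [[2 3 4 3] [0 1 1 1] [1 2 2 2] [2 2 3 3]]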
CNNs - Max Pooling Function
Input Matrix (the convolution output above):
2 3 4 3
0 1 1 1
1 2 2 2
2 2 3 3
Output Matrix (2x2 max pooling with stride 2; each output value is the maximum of the 2x2 input block it covers):
3 4
2 3
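And a matching sketch of 2x2 max pooling with stride 2, reproducing the output above:

    import numpy as np

    x = np.array([[2, 3, 4, 3],
                  [0, 1, 1, 1],
                  [1, 2, 2, 2],
                  [2, 2, 3, 3]])

    def max_pool(x, size=2, stride=2):
        oh = (x.shape[0] - size) // stride + 1
        ow = (x.shape[1] - size) // stride + 1
        out = np.empty((oh, ow), dtype=x.dtype)
        for i in range(oh):
            for j in range(ow):
                out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
        return out

    print(max_pool(x))    # [[3 4] [2 3]]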
Convolutional Neural Networks (CNNs)
CNN Architecture for Text
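For text the convolution is one-dimensional: filters a few words wide slide over the sequence of word embeddings, and max pooling over time keeps the strongest response per filter. A hedged Keras sketch of this architecture (all sizes illustrative):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10000, output_dim=300),
        tf.keras.layers.Conv1D(filters=100, kernel_size=3, activation="relu"),  # 3-word filters
        tf.keras.layers.GlobalMaxPooling1D(),   # max pooling over the sequence dimension
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])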
State of the Art in NLP - Generalized Language Models
Generalized Language Modeling
Types of RNNs
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
P(w_n | w_1, …, w_{n-1})
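A language model assigns this conditional probability to every possible next word; the crudest estimate just counts n-grams. A toy sketch (corpus made up):

    from collections import Counter

    tokens = "the cat sat on the mat the cat ran".split()
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)

    def p_next(prev, word):
        # P(word | prev) estimated from bigram counts
        return bigrams[(prev, word)] / unigrams[prev]

    print(p_next("the", "cat"))    # 2/3: "the" appears 3 times, followed by "cat" twice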
Generalized Language Modeling
Current SOTA
ULMFiT
http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html
ULMFiT - GLM Pre-Training
AWD-LSTM
ULMFiT - Refine GLM for Target Task
Discriminative Fine-Tuning
Slanted Triangular Learning Rates (STLR)
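Both tricks are short formulas. Discriminative fine-tuning gives each layer its own learning rate (Howard & Ruder divide by 2.6 per layer going down), and STLR warms the learning rate up linearly for a small fraction of training, then decays it linearly. A sketch following the formulas in the ULMFiT paper:

    def discriminative_lrs(lr_top, n_layers, factor=2.6):
        # Lower layers hold more general features, so they get smaller learning rates.
        return [lr_top / factor ** (n_layers - 1 - l) for l in range(n_layers)]

    def stlr(t, T, cut_frac=0.1, ratio=32, lr_max=0.01):
        # Slanted triangle: rise for cut_frac of the T updates, then a long decay
        # down to lr_max / ratio.
        cut = int(T * cut_frac)
        p = t / cut if t < cut else 1 - (t - cut) / (cut * (1 / cut_frac - 1))
        return lr_max * (1 + p * (ratio - 1)) / ratio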
ULMFiT - Target Task Classification Training
Concat Pooling
Gradual Unfreezing
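Concat pooling builds the classifier input from the last hidden state concatenated with the max- and mean-pooled hidden states over all timesteps, and gradual unfreezing then thaws one layer group per epoch, starting from the top. A NumPy sketch of the pooling (shapes illustrative):

    import numpy as np

    H = np.random.randn(50, 128)    # hidden states for a 50-step document, d_h = 128

    features = np.concatenate([
        H[-1],           # last hidden state
        H.max(axis=0),   # max pool over time
        H.mean(axis=0),  # mean pool over time
    ])                   # shape (3 * 128,), fed to the classifier head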
BERT / GPT-2 - Transformer Model
Transformer Model
Attention Mechanism
http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html
Transformer Model
http://mlexplained.com/2017/12/29/attention-is-all-you-need-explained/
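At the Transformer's core is scaled dot-product attention: every position queries every other position and takes a weighted average of their values, with no recurrence at all. A NumPy sketch (sequence length and dimensions illustrative):

    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # scale by sqrt(d_k)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)         # row-wise softmax
        return w @ V                               # weighted average of the values

    n, d = 5, 64    # 5 positions, 64-dim queries/keys/values
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out = attention(Q, K, V)    # shape (5, 64)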
Practical Considerations for Modeling with Your Data
Practical Considerations