
Deep Learning for Natural Language Processing

The Transformer model

Richard Johansson

[email protected]


drawbacks of recurrent models

▶ even with GRUs and LSTMs, it is difficult for RNNs to preserve information over long distances

▶ we introduced attention as a way to deal with this problem

▶ can we skip the RNN and just use attention?


attention models: recap

▶ first, compute an “energy” $e_i$ for each state $h_i$

▶ for the attention weights, we apply the softmax:

$$\alpha_i = \frac{\exp e_i}{\sum_{j=1}^{n} \exp e_j}$$

▶ finally, the “summary” is computed as a weighted sum (sketched in code below):

$$s = \sum_{i=1}^{n} \alpha_i h_i$$
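a minimal NumPy sketch of this recap, assuming the states $h_i$ are stacked as rows of a matrix H and the energies $e_i$ are already given as a vector e (the names attention_summary, H and e are only for illustration):

```python
import numpy as np

def attention_summary(H, e):
    """H: (n, d) matrix of states h_1..h_n; e: (n,) vector of energies e_i."""
    e = e - e.max()                      # for numerical stability; does not change the softmax
    alpha = np.exp(e) / np.exp(e).sum()  # attention weights alpha_i
    return alpha @ H                     # weighted sum s = sum_i alpha_i h_i

# toy example: three 4-dimensional states
H = np.random.randn(3, 4)
e = np.array([0.1, 2.0, -1.0])
s = attention_summary(H, e)              # shape (4,)
```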


the Transformer

▶ the Transformer (Vaswani et al., 2017) is an architecture that uses attention for information flow: “Attention is all you need”

▶ it was originally designed for machine translation and has two parts:

  ▶ an encoder that “summarizes” an input sentence

  ▶ a decoder (a conditional LM) that generates an output, based on the input

▶ let’s consider the encoder

illustration of a Transformer block

[figure: illustration of a Transformer block; images not included in the transcript]

multi-head attention

▶ in each layer, the Transformer applies several attention models (“heads”) in parallel

▶ intuitively, the heads are “looking” for different types of information

▶ each attention head computes a scaled dot product attention (see the code sketch below):

$$e_{ij} = \frac{1}{\sqrt{d}}\, q_i \cdot k_j \qquad\qquad \alpha = \mathrm{softmax}(e)$$

where $q_i$ and $k_j$ are linear transformations of the input at positions $i$ and $j$
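a minimal PyTorch sketch of a single scaled dot-product attention head applied to an input sequence X of shape (n, d_model); the class name AttentionHead and the sizes d_model and d_head are illustrative assumptions rather than the exact formulation of Vaswani et al. (2017):

```python
import torch
import torch.nn as nn

class AttentionHead(nn.Module):
    def __init__(self, d_model, d_head):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_head)   # q_i = W_q x_i
        self.W_k = nn.Linear(d_model, d_head)   # k_j = W_k x_j
        self.W_v = nn.Linear(d_model, d_head)   # values that the weights are applied to
        self.scale = d_head ** 0.5

    def forward(self, X):
        Q, K, V = self.W_q(X), self.W_k(X), self.W_v(X)
        E = Q @ K.transpose(-2, -1) / self.scale   # energies e_ij = q_i · k_j / sqrt(d)
        A = torch.softmax(E, dim=-1)               # attention weights, one row per position i
        return A @ V                               # weighted sums, shape (n, d_head)

X = torch.randn(5, 16)                           # toy input: 5 positions, d_model = 16
out = AttentionHead(d_model=16, d_head=8)(X)     # shape (5, 8)
```

in the full model, several such heads run in parallel and their outputs are concatenated and projected back to d_model dimensions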


a layer in the Transformer encoder

▶ after each application of multi-head attention, a 2-layer feedforward model (with ReLU activation) is applied

▶ residual connections (“shortcuts”) and layer normalization (Ba et al., 2016) are added for robustness and to facilitate training

▶ the Transformer encoder consists of a stack of this type of block (sketched below)
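a sketch of one such encoder block, assuming PyTorch’s built-in nn.MultiheadAttention for the multi-head self-attention and an assumed feedforward size d_ff; normalization is placed after each residual connection, as in the original Transformer:

```python
import torch
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(                 # 2-layer feedforward model with ReLU
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)       # layer normalization (Ba et al., 2016)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, X):
        attn_out, _ = self.attn(X, X, X)         # multi-head self-attention
        X = self.norm1(X + attn_out)             # residual connection + layer norm
        return self.norm2(X + self.ff(X))        # feedforward, residual, layer norm

# the encoder is a stack of this type of block:
encoder = nn.Sequential(*[TransformerEncoderBlock(64, 4, 256) for _ in range(6)])
X = torch.randn(2, 10, 64)                       # a batch of 2 sentences, 10 positions each
out = encoder(X)                                 # same shape as the input
```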


what do the attention heads look at?

▶ see (Vig, 2019)


pros and cons

+ short path length for information flow

– quadratic complexity (every position attends to every other position, so the attention weights form an n × n matrix)


the road ahead

▶ the full Transformer is an effective model for machine translation

▶ we’ll return to it when we discuss encoder–decoder architectures

▶ for now, let’s use it simply as a pre-trained representation


reading

The Illustrated Transformer


references

J. Ba, J. Kiros, and G. Hinton. 2016. Layer normalization. arXiv:1607.06450.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In NIPS 30.

J. Vig. 2019. Visualizing attention in transformer-based language representation models. arXiv:1904.02679.