“Deep” Learning


TRANSCRIPT

  • “Deep” Learning

  • 2

    natural language analyzer

    Big picture: natural language analyzers

    Natural language input signal:
    - Web page
    - Questions …

  • 3  

    sentiment …

  • 4  

    sentiment …

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 6  

    do  

    classification …

  • 7  

    classification …

  • 8  

    How to define f(l, d): linear models

    Linear models: f(l, d) = w · g(l, d)

    [Figure: a sparse binary feature vector g(l, d) = (0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, …) paired with a learned weight vector w = (0.4, -1.2, 0.2, 0.2, -0.4, -1.0, 5.1, 1.1, 2.3, 0.8, -0.1, …); the label "Number of …" is truncated.]

  • 9  

    How to define f(l, d): linear models

    Linear models: f(l, d) = w · g(l, d)
    - Easy to implement
    - Easy to optimize …
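    To make this concrete, a minimal numpy sketch of the linear scorer; the hashed (label, word) feature map, the label set, and the feature-space size are illustrative assumptions, not from the slides:

        import numpy as np

        NUM_FEATURES = 2 ** 16             # illustrative feature-space size
        LABELS = ["positive", "negative"]  # illustrative label set

        def g(label, document):
            """Sparse binary feature vector g(l, d): hashed (label, word) indicators."""
            x = np.zeros(NUM_FEATURES)
            for word in document.split():
                x[hash((label, word)) % NUM_FEATURES] = 1.0
            return x

        def f(label, document, w):
            """Linear score: f(l, d) = w . g(l, d)."""
            return w @ g(label, document)

        def predict(document, w):
            """argmax_l f(l, d)."""
            return max(LABELS, key=lambda l: f(l, document, w))

        w = np.random.default_rng(0).standard_normal(NUM_FEATURES) * 0.1  # would normally be learned
        print(predict("a truly great movie", w))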

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 11  

    Linear models: f(l, d) = w · g(l, d) = w(l) · x(d)
    e.g., y1 = x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1 = w(1) · x(d)

    [Figure: the weight matrix W, entries w1,1 through w5,3, mapping a 5-dimensional input x to a 3-dimensional output y; the label "Number of …" is truncated.]

  • 12  

    neural network v1.0: linear model

    Linear models: f(l, d) = w · g(l, d) = w(l) · x(d)
    e.g., y1 = x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1 = w(1) · x(d)

    [Figure: the same model drawn as a one-layer network, y = W x, with W the same weight matrix as before.]

    similar words s…
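    As a quick check of the matrix form, a small numpy sketch; the 5-input, 3-output shapes follow the slide's example, and the random weights are stand-ins for learned ones:

        import numpy as np

        rng = np.random.default_rng(0)
        W = rng.standard_normal((3, 5))  # 3 outputs x 5 inputs; W[0, i] plays the slide's w_i,1
        x = rng.standard_normal(5)

        y = W @ x  # the whole linear model is one matrix-vector product

        # y1 written out as on the slide: x1 w1,1 + x2 w2,1 + x3 w3,1 + x4 w4,1 + x5 w5,1
        y1 = sum(x[i] * W[0, i] for i in range(5))
        assert np.isclose(y[0], y1)
        print(y)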

  • 13  

    neural network v2.0: representations …

  • 14  

    neural network v2.1: representations …

  • 15  

    neural network v3.0: complex functions

  • 16  

    neural network v3.0: complex functions

  • 17  

    neural network v3.0: complex functions

  • 18  

    neural network v3.0: complex functions

  • 19  

    neural network v3.5: “deeper” networks

    [Figure: a three-layer network, x → W1 → h1 → W2 → h2 → W3 → y.]

    y = W3 h2 = W3 a2( W2 a1( W1 x ) )

    Wait, but why do we need more layers?

  • 20  

    neural network v3.5: “deeper” networks

    [Figure: the same three-layer network, x → W1 → h1 → W2 → h2 → W3 → y.]

    y = W3 h2 = W3 a2( W2 a1( W1 x ) )
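    A numpy sketch of that forward pass; the layer sizes and the use of tanh for both a1 and a2 are illustrative stand-ins for whatever the slides chose:

        import numpy as np

        rng = np.random.default_rng(0)
        W1 = rng.standard_normal((8, 5))  # illustrative layer sizes: 5 -> 8 -> 8 -> 3
        W2 = rng.standard_normal((8, 8))
        W3 = rng.standard_normal((3, 8))

        def forward(x):
            """y = W3 a2( W2 a1( W1 x ) ), with tanh standing in for a1 and a2."""
            h1 = np.tanh(W1 @ x)   # a1
            h2 = np.tanh(W2 @ h1)  # a2
            return W3 @ h2

        print(forward(rng.standard_normal(5)))

    Note that if a1 and a2 were the identity, W3 W2 W1 would collapse into a single matrix and the whole stack would still be a linear model; the nonlinearities are what the extra layers buy.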

  • 21  

    neural network v4.0: recurrent neural networks

    Big idea: use hidden layers to represent sequential …

  • 22  

    neural network v4.0: recurrent neural networks

    Figure credits: Christopher Olah

    How to compute the hidden layers?
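    One standard answer is the vanilla RNN recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t + b); a numpy sketch with illustrative sizes (the slides may parameterize it differently):

        import numpy as np

        rng = np.random.default_rng(0)
        d_in, d_h = 4, 6  # illustrative sizes
        W_xh = rng.standard_normal((d_h, d_in)) * 0.1
        W_hh = rng.standard_normal((d_h, d_h)) * 0.1
        b = np.zeros(d_h)

        def rnn(xs):
            """h_t = tanh(W_hh h_{t-1} + W_xh x_t + b): the same weights at every step."""
            h = np.zeros(d_h)
            hs = []
            for x in xs:
                h = np.tanh(W_hh @ h + W_xh @ x + b)
                hs.append(h)
            return hs

        hs = rnn(rng.standard_normal((7, d_in)))  # a length-7 input sequence
        print(hs[-1])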

  • 23  

    neural network v4.1: output sequences

    Figure credits: Andrej Karpathy

  • 24  

    neural network v4.1: output sequences

    Figure credits: Andrej Karpathy

    Example: character-level language models
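    To make the character-level idea concrete, a toy sketch that predicts a distribution over next characters and samples from it; it uses bigram counts instead of an RNN purely to keep the example short, so it illustrates the interface rather than the model on the slide:

        import numpy as np
        from collections import defaultdict

        # Toy character-level language model: count next-character frequencies,
        # then repeatedly sample "the next character" to generate text.
        text = "the quick brown fox jumps over the lazy dog. " * 50  # stand-in corpus
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1

        rng = np.random.default_rng(0)

        def sample(prev, n=60):
            out = [prev]
            for _ in range(n):
                chars = list(counts[out[-1]])
                probs = np.array([counts[out[-1]][c] for c in chars], dtype=float)
                out.append(rng.choice(chars, p=probs / probs.sum()))
            return "".join(out)

        print(sample("t"))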

  • 25  

    neural network v4.1: output sequences

    Credits: Andrej Karpathy

    Sample output: Copyright was the succession of independence in the slop of Syrian influence that was a famous German movement based on a more popular servicious, non-doctrinal and sexual power post. Many governments recognize the military housing of the [[Civil Liberalization…

  • 26  

    neural network v4.2: Long Short-Term Memory

    Figure credits: Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/

    LSTMs

    Regular RNNs

  • 27  

    neural network v4.2: Long Short-Term Memory

    Figure credits: Christopher Olah, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
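    For reference, the standard LSTM step described in the cited post, sketched in numpy; the sizes and initialization are illustrative:

        import numpy as np

        rng = np.random.default_rng(0)
        d_in, d_h = 4, 6
        # one weight matrix per gate, each acting on the concatenation [h_{t-1}; x_t]
        Wf, Wi, Wo, Wc = (rng.standard_normal((d_h, d_h + d_in)) * 0.1 for _ in range(4))
        bf, bi, bo, bc = (np.zeros(d_h) for _ in range(4))

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def lstm_step(x, h, c):
            """One LSTM step: gates decide what to forget, write, and expose."""
            z = np.concatenate([h, x])
            f = sigmoid(Wf @ z + bf)              # forget gate
            i = sigmoid(Wi @ z + bi)              # input gate
            o = sigmoid(Wo @ z + bo)              # output gate
            c = f * c + i * np.tanh(Wc @ z + bc)  # cell state: forget old, add new
            h = o * np.tanh(c)                    # hidden state exposed to the next layer
            return h, c

        h, c = np.zeros(d_h), np.zeros(d_h)
        for x in rng.standard_normal((7, d_in)):
            h, c = lstm_step(x, h, c)
        print(h)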

  • 28  

    neural network v4.3: bidirectional …

  • 29  

    neural network v4.4: attention …

  • 30  

    neural network v5: convolutional …

  • 31  

    neural network v5: convolutional …

  • 32  

    neural network v5: convolutional …

  • convolutional …
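    A sketch of the operation as commonly used in NLP: 1-D convolution over a sequence of word vectors followed by max-pooling; all sizes here are illustrative and the slides' exact variant may differ:

        import numpy as np

        rng = np.random.default_rng(0)
        d_emb, width, d_out, T = 4, 3, 5, 9  # illustrative sizes
        X = rng.standard_normal((T, d_emb))  # a sentence: T word vectors
        F = rng.standard_normal((d_out, width * d_emb)) * 0.1  # filter bank

        # slide over the sequence: the same filters are applied at every position
        windows = np.stack([X[t:t + width].ravel() for t in range(T - width + 1)])
        feature_maps = windows @ F.T             # (T - width + 1, d_out)
        sentence_vec = feature_maps.max(axis=0)  # max-pool over positions
        print(sentence_vec)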

  • 34  

    neural network v5.1: recursive NNs

  • 35  

    neural network v6: dropout
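    A sketch of (inverted) dropout applied to a hidden layer; the keep probability of 0.8 is an illustrative choice:

        import numpy as np

        rng = np.random.default_rng(0)

        def dropout(h, p_keep=0.8, train=True):
            """Inverted dropout: zero each unit with prob 1 - p_keep, rescale at train time."""
            if not train:
                return h  # no-op at test time
            mask = rng.random(h.shape) < p_keep
            return h * mask / p_keep

        h = rng.standard_normal(6)
        print(dropout(h))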

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • How to train NN models?
    • argmax_l f(l, d) only tells us which label to predict.
    • Supervised learning (need input/output pairs)
    • Loss function …
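    One standard concrete choice for the truncated "Loss function" bullet is softmax cross-entropy, sketched here; it is a common pick, not necessarily the loss the slides used:

        import numpy as np

        def cross_entropy(scores, gold):
            """Negative log-likelihood of the gold label under softmax(scores)."""
            scores = scores - scores.max()  # for numerical stability
            log_probs = scores - np.log(np.exp(scores).sum())
            return -log_probs[gold]

        print(cross_entropy(np.array([2.0, -1.0, 0.5]), gold=0))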

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …

  • How to optimize …
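    The standard answer is stochastic gradient descent; below, a sketch of the update rule on a toy least-squares objective (the objective, data, and learning rate are illustrative stand-ins):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.standard_normal((100, 5))  # toy least-squares problem
        true_w = rng.standard_normal(5)
        y = X @ true_w

        w = np.zeros(5)
        lr = 0.05                               # illustrative learning rate
        for epoch in range(30):
            for i in rng.permutation(len(X)):   # stochastic: one example at a time
                grad = (w @ X[i] - y[i]) * X[i] # gradient of 0.5 * (w.x - y)^2
                w -= lr * grad                  # the SGD update: w <- w - lr * grad
        print(np.linalg.norm(w - true_w))       # should be near 0 after training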

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 46  

    Major results: language modeling

  • 47

    Major results: image classification

    Krizhevsky et al. (2012)

  • 48  

    Major results: ImageNet

    Krizhevsky et al. (2012): posi…

  • 49  

    Major results: ImageNet

    Krizhevsky et al. (2012): sample convolutional …

  • 50  

    Major results: speech recognition

  • 51  

    Major results: translation

  • 52  

    Major results: translation

  • 53  

    Major results: dependency parsing

    Chen and Manning (2014)

  • 54  

    Major results: dependency parsing

    Dyer et al. (2015)

  • Important things we didn’t cover

    • Dark knowledge
    • Connec…

  • Agenda

    • Big picture

    • Why deep learning?

    • Building blocks of a deep neural network

    • How to train deep neural networks

    • Important results

  • 57  

    sentiment …