NLU lecture 5: Word representations and morphology
Adam Lopez
• Essential epistemology
• Word representations and word2vec
• Word representations and compositional morphology
Reading: Mikolov et al. 2013, Luong et al. 2013
Essential epistemology

             Exact sciences              Empirical sciences     Engineering
Deals with   Axioms & theorems           Facts & theories       Artifacts
Truth is     Forever                     Temporary              It works
Examples     Mathematics, C.S. theory,   Physics, Biology,      Many, including applied
             F.L. theory                 Linguistics            C.S. (e.g. NLP, MT)
Linguistics deals in both facts (e.g. the morphological properties of words) and theories (e.g. Optimality Theory). Optimality Theory is finite-state, so we can represent the morphological properties of words with finite-state automata.
![Page 10: NLU lecture 5: Word representations and morphology · Summary • Deep learning is not magic and will not solve all of your problems, but representation learning is a very powerful](https://reader033.vdocument.in/reader033/viewer/2022060307/5f09e4517e708231d428ffee/html5/thumbnails/10.jpg)
Remember the bandwagon
Word representations
Feedforward model

$$p(e) = \prod_{i=1}^{|e|} p(e_i \mid e_{i-n+1}, \ldots, e_{i-1})$$

[Figure: feedforward architecture for $p(e_i \mid e_{i-n+1}, \ldots, e_{i-1})$: each context word $e_{i-3}, e_{i-2}, e_{i-1}$ is looked up in a shared embedding matrix $C$; the concatenated embeddings pass through a hidden layer ($W$, tanh) and an output layer ($V$, softmax) that scores every vocabulary word as the next word $e_i$.]

• Every word is a vector (a one-hot vector). The concatenation of these vectors is an n-gram.
• Word embeddings are vectors: continuous representations of each word.
• n-grams are vectors: continuous representations of n-grams (or, via recursion, larger structures).
• A discrete probability distribution over V outcomes is a vector: V non-negative reals summing to 1.
• No matter what we do in NLP, we’ll (almost) always have words… Can we reuse these vectors?
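To make the architecture concrete, here is a minimal NumPy sketch of the forward pass; the vocabulary size, dimensions, and variable names are illustrative assumptions, and bias terms are omitted.

```python
import numpy as np

V, d, h, n_ctx = 10_000, 50, 100, 3  # vocab size, embedding dim, hidden dim, context length

rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(V, d))          # shared word-embedding matrix
W = rng.normal(scale=0.1, size=(h, n_ctx * d))  # hidden-layer weights
V_out = rng.normal(scale=0.1, size=(V, h))      # output-layer weights

def next_word_probs(context_ids):
    """p(e_i | e_{i-3}, e_{i-2}, e_{i-1}) for one context, given as word ids."""
    x = np.concatenate([C[j] for j in context_ids])  # look up and concatenate embeddings
    hidden = np.tanh(W @ x)                          # hidden layer
    logits = V_out @ hidden                          # one score per vocabulary word
    exps = np.exp(logits - logits.max())             # numerically stable softmax
    return exps / exps.sum()                         # note the O(V) cost of normalizing

probs = next_word_probs([12, 7, 391])
assert probs.shape == (V,) and abs(probs.sum() - 1.0) < 1e-6
```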
Design a POS tagger using an RNNLM

• What are some difficulties with this?
• What limitation do you have in learning a POS tagger that you don’t have when learning a LM?
• One big problem: LIMITED DATA
“You shall know a word by the company it keeps”
–John Rupert Firth (1957)
Learning word representations using language modeling
• Idea: we’ll learn word representations using a language model, then reuse them in our POS tagger (or any other thing we predict from words); a sketch follows this list.
• Problem: the Bengio language model is slow. Imagine computing a softmax over 10,000 words!
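A minimal sketch of the transfer idea, reusing the embedding matrix C from the feedforward sketch above as frozen input features for a window-based tagger; the tagger itself and all dimensions are illustrative assumptions, not from the slides.

```python
import numpy as np

V, d, n_tags, win = 10_000, 50, 45, 3  # vocab, embedding dim, tagset size, window size
rng = np.random.default_rng(1)

C = rng.normal(scale=0.1, size=(V, d))  # stand-in for embeddings pretrained by the LM
W_tag = rng.normal(scale=0.1, size=(n_tags, win * d))  # only these need tagged data

def tag_scores(window_ids):
    """Scores over POS tags for the middle word of a 3-word window."""
    x = np.concatenate([C[j] for j in window_ids])  # pretrained features, kept frozen
    return W_tag @ x

print(tag_scores([12, 7, 391]).argmax())  # index of the best-scoring tag
```

Because C was trained on unlabeled text, only W_tag has to be learned from the (limited) tagged data.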
Continuous bag-of-words (CBOW)

[Figure: CBOW architecture]
Skip-gram

[Figure: skip-gram architecture]

Learning skip-gram

[Figure: learning skip-gram]
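The architectures themselves appeared only in figures, so here is a minimal NumPy sketch of skip-gram trained with negative sampling, one of the objectives from Mikolov et al. 2013 (CBOW instead predicts the centre word from its averaged context). The dimensions and update loop are illustrative assumptions.

```python
import numpy as np

V, d, lr, k = 10_000, 100, 0.05, 5  # vocab size, embedding dim, learning rate, negatives
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, d))   # centre-word embeddings (the ones we keep)
W_out = rng.normal(scale=0.1, size=(V, d))  # context-word ("output") embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(centre, context):
    """One SGD step: pull the true context word closer, push k random words away."""
    negatives = rng.integers(0, V, size=k)
    for word, label in [(context, 1.0)] + [(int(n), 0.0) for n in negatives]:
        score = sigmoid(W_in[centre] @ W_out[word])
        grad = score - label              # gradient of the logistic loss w.r.t. the score
        g_in = grad * W_out[word].copy()  # save before W_out is updated
        W_out[word] -= lr * grad * W_in[centre]
        W_in[centre] -= lr * g_in

# e.g. from the window "the cat sat": centre "cat" predicts contexts "the" and "sat"
train_pair(42, 7)
train_pair(42, 99)
```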
Word representations capture some world knowledge
Continuous Word Representations

[Figure: 2D projection of word embeddings. Semantic regularity: man → woman parallels king → queen. Syntactic regularity: walk → walks parallels read → reads.]
Will it learn this?
(Additional) limitations of word2vec
• Closed vocabulary assumption
• Cannot exploit functional relationships in learning
Is this language?

What our data contains:
A Lorillard spokeswoman said, “This is an old story.”

What word2vec thinks our data contains:
A UNK UNK said, “This is an old story.”
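The UNK line reflects standard closed-vocabulary preprocessing: every word below a frequency cutoff is replaced by a single UNK token before training. A minimal sketch (the cutoff and the tiny vocabulary are arbitrary illustrations):

```python
from collections import Counter

def build_vocab(corpus_tokens, min_count=5):
    """Keep only words seen at least min_count times; the rest will become UNK."""
    counts = Counter(corpus_tokens)
    return {w for w, c in counts.items() if c >= min_count}

def apply_unk(tokens, vocab):
    return [t if t in vocab else "UNK" for t in tokens]

# In a large corpus, "Lorillard" and "spokeswoman" are rare; simulate that here.
vocab = {"A", "said", "This", "is", "an", "old", "story"}
print(apply_unk("A Lorillard spokeswoman said This is an old story".split(), vocab))
# ['A', 'UNK', 'UNK', 'said', 'This', 'is', 'an', 'old', 'story']
```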
Is it ok to ignore words?
What we know about linguistic structure
Morpheme: the smallest meaningful unit of language
“loves”
• segmentation: love +s
• root/stem: love
• affix: -s
• morphological analysis: 3rd.SG.PRES
What if we embed morphemes rather than words?
Basic idea: compute the representation of a word recursively from the representations of its children.

[Figure: recursive composition of morpheme vectors into a word vector]

• Vectors in green are morpheme embeddings (parameters).
• Vectors in grey are computed as above (functions).
• f is an activation function (e.g. tanh).
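A minimal sketch of this recursive composition in the style of Luong et al. 2013, where each node combines its two children as f(W·[left; right] + b); the dimensions, vectors, and segmentations are illustrative assumptions.

```python
import numpy as np

d = 50  # morpheme embedding dimension
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))  # composition weights, shared at every node
b = np.zeros(d)

# Green vectors: morpheme embeddings (learned parameters).
embed = {m: rng.normal(size=d) for m in ["love", "s", "un", "fortunate", "ly"]}

def compose(left, right, f=np.tanh):
    """Grey vector: computed from its two children."""
    return f(W @ np.concatenate([left, right]) + b)

# "loves" = love +s
loves = compose(embed["love"], embed["s"])

# "unfortunately" = ((un + fortunate) + ly), composed bottom-up
unfortunately = compose(compose(embed["un"], embed["fortunate"]), embed["ly"])
print(loves.shape, unfortunately.shape)  # both (50,): words and morphemes share a space
```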
Train the compositional morpheme model by minimizing the distance to a reference vector.
• Target output: reference vector $p_r$
• Constructed vector: $p_c$
• Minimize the squared Euclidean distance $\| p_r - p_c \|^2$ (the loss used by Luong et al. 2013)
Or, train in context using backpropagation
• Vectors in green are morpheme embeddings (parameters); vectors in grey are computed as above (functions).
• Vectors in blue are word or n-gram embeddings (parameters).
• (Basically a feedforward LM.)
Where do we get morphemes?
• Use an unsupervised morphological analyzer (we’ll talk about unsupervised learning later on).
• How many morphemes are there?
New stems are invented every day!
fleeking, fleeked, and fleeker are all attested…
Representations learned by compositional morphology model
Summary
• Deep learning is not magic and will not solve all of your problems, but representation learning is a very powerful idea.
• Word representations can be transferred between models.
• Word2vec trains word representations using an objective based on language modeling, so it can be trained on unlabeled data.
• Sometimes called unsupervised, but the objective is supervised!
• Vocabulary is not finite.
• Compositional representations based on morphemes bring our models closer to an open vocabulary.