SP20 CS288 -- Language Models (1)
TRANSCRIPT
Language Models
Dan Klein, John DeNero
UC Berkeley
Acoustic Confusions
the station signs are in deep in english  -14732
the stations signs are in deep in english  -14735
the station signs are in deep into english  -14739
the station 's signs are in deep in english  -14740
the station signs are in deep in the english  -14741
the station signs are indeed in english  -14757
the station 's signs are indeed in english  -14760
the station signs are indians in english  -14790
Noisy Channel Model: ASR
§ We want to predict a sentence given acoustics:
§The noisy-channel approach:
Acoustic model: score fit between sounds and words
Language model: score plausibility of word sequences
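The two scores combine in the standard noisy-channel decomposition (the slide's equation did not survive extraction; reconstructed from the description above):

$$ w^\ast \;=\; \operatorname*{argmax}_w P(w \mid a) \;=\; \operatorname*{argmax}_w \underbrace{P(a \mid w)}_{\text{acoustic model}}\;\underbrace{P(w)}_{\text{language model}} $$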
Noisy Channel Model: Translation
“Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography—methods which I believe succeed even when one does not know what language has been coded—one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
Warren Weaver (1947)
Perplexity
§ How do we measure LM “goodness”?
§ The Shannon game: predict the next word
When I eat pizza, I wipe off the _________
§ Formally: test set log likelihood
§ Perplexity: “average per word branching factor” (not per-step)
perp(X, θ) = exp( −log P(X | θ) / |X| )
grease 0.5
sauce 0.4
dust 0.05
….
mice 0.0001
….
the 1e-100
3516 wipe off the excess
1034 wipe off the dust
547 wipe off the sweat
518 wipe off the mouthpiece
…
120 wipe off the grease
0 wipe off the sauce
0 wipe off the mice
-----------------
28048 wipe off the *
log P(X | θ) = Σ_{w∈X} log P(w | θ)
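The two formulas above can be computed in a few lines. A minimal sketch, using a hypothetical toy unigram model (the distribution `p` below is made up, not from the slides):

```python
import math

def perplexity(test_words, prob):
    """perp(X, theta) = exp(-(1/|X|) * sum_{w in X} log P(w|theta))."""
    log_like = sum(math.log(prob[w]) for w in test_words)
    return math.exp(-log_like / len(test_words))

# Hypothetical toy distribution
p = {"the": 0.5, "dog": 0.25, "ran": 0.25}
print(perplexity(["the", "dog", "ran", "the"], p))
```

A uniform distribution over k words gives perplexity exactly k, which is why perplexity is read as an effective branching factor.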
N-Gram Models
N-Gram Models
§ Use chain rule to generate words left-to-right
§ Can’t condition atomically on the entire left context
§ N-gram models make a Markov assumption
P(??? | The computer I had put into the machine room on the fifth floor just)
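In symbols, the chain rule and the order-n Markov assumption read:

$$ P(w_1,\ldots,w_m) \;=\; \prod_{i=1}^{m} P(w_i \mid w_1,\ldots,w_{i-1}) \;\approx\; \prod_{i=1}^{m} P(w_i \mid w_{i-n+1},\ldots,w_{i-1}) $$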
Empirical N-Grams
§ Use statistics from data (examples here from Google N-Grams)
§ This is the maximum likelihood estimate, which needs modification
Training Counts:
198015222 the first
194623024 the same
168504105 the following
158562063 the world
…
14112454 the door
-----------------
23135851162 the *
Increasing N-Gram Order
§ Higher orders capture more correlations
198015222 the first
194623024 the same
168504105 the following
158562063 the world
…
14112454 the door
-----------------
23135851162 the *
197302 close the window
191125 close the door
152500 close the gap
116451 close the thread
87298 close the deal
-----------------
3785230 close the *
Bigram Model: P(door | the) = 0.0006
Trigram Model: P(door | close the) = 0.05
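The slide's probabilities fall out directly from the counts shown above via the maximum likelihood estimate P(w | context) = count(context, w) / count(context, *):

```python
# Maximum likelihood estimates computed from the slide's Google N-Grams counts
def mle(count_ngram, count_context):
    return count_ngram / count_context

p_bigram = mle(14112454, 23135851162)   # P(door | the)
p_trigram = mle(191125, 3785230)        # P(door | close the)
print(round(p_bigram, 4), round(p_trigram, 2))  # → 0.0006 0.05
```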
Increasing N-Gram Order
What’s in an N-Gram?
§ Just about every local correlation!
§ Word class restrictions: “will have been ___”
§ Morphology: “she ___”, “they ___”
§ Semantic class restrictions: “danced a ___”
§ Idioms: “add insult to ___”
§ World knowledge: “ice caps have ___”
§ Pop culture: “the empire strikes ___”
§ But not the long-distance ones
§ “The computer which I had put into the machine room on the fifth floor just ___.”
Linguistic Pain
§ The N-Gram assumption hurts your inner linguist
§ Many linguistic arguments that language isn’t regular
§ Long-distance dependencies
§ Recursive structure
§ At the core of the early hesitance in linguistics about statistical methods
§ Answers
§ N-grams only model local correlations… but they get them all
§ As N increases, they catch even more correlations
§ N-gram models scale much more easily than combinatorially-structured LMs
§ Can build LMs from structured models, eg grammars (though people generally don’t)
Structured Language Models
§ Bigram model:
§ [texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
§ [outside, new, car, parking, lot, of, the, agreement, reached]
§ [this, would, be, a, record, november]
§ PCFG model:
§ [This, quarter, ‘s, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .]
§ [It, could, be, announced, sometime, .]
§ [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 12, stocks, .]
N-Gram Models: Challenges
Sparsity
3380 please close the door
1601 please close the window
1164 please close the new
1159 please close the gate
…
0 please close the first
-----------------
13951 please close the *
Please close the first door on the left.
Smoothing
§ We often want to make estimates from sparse statistics:
§ Smoothing flattens spiky distributions so they generalize better:
§ Very important all over NLP, but easy to do badly
P(w | denied the), observed counts:
3 allegations
2 reports
1 claims
1 request
7 total

P(w | denied the), smoothed:
2.5 allegations
1.5 reports
0.5 claims
0.5 request
2 other
7 total

[Figure: bar charts of the two distributions over allegations, reports, claims, charges, request, motion, benefits, …]
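The slide's smoothed numbers come from an unspecified scheme; add-k smoothing is one simple way to flatten a spiky count distribution, sketched here over the "denied the" counts (vocabulary size is a made-up parameter):

```python
# Add-k smoothing: pretend every vocabulary word got an extra k counts,
# which moves probability mass from frequent words to rare/unseen ones.
def add_k_probs(counts, vocab_size, k=0.5):
    total = sum(counts.values()) + k * vocab_size
    return {w: (c + k) / total for w, c in counts.items()}

counts = {"allegations": 3, "reports": 2, "claims": 1, "request": 1}
probs = add_k_probs(counts, vocab_size=10)   # vocab_size is hypothetical
```

Note the seen words now share only 9/12 of the mass; the remaining 3/12 covers the six unseen vocabulary words.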
Back-off
Please close the first door on the left.
3380 please close the door
1601 please close the window
1164 please close the new
1159 please close the gate
…
0 please close the first
-----------------
13951 please close the *
198015222 the first
194623024 the same
168504105 the following
158562063 the world
…
-----------------
23135851162 the *
197302 close the window
191125 close the door
152500 close the gap
116451 close the thread
…
8662 close the first
-----------------
3785230 close the *
4-Gram: P(first | please close the) = 0.0 (specific but sparse)
3-Gram: P(first | close the) = 0.002
2-Gram: P(first | the) = 0.009 (dense but general)
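One concrete way to combine these orders is "stupid backoff" (Brants et al., 2007, cited later in these slides): if the highest-order n-gram is unseen, back off to a shorter context, scaling the score by a constant. A sketch using the counts shown above:

```python
def stupid_backoff(word, context, counts, totals, alpha=0.4):
    """Score a word given a context tuple, backing off to shorter contexts."""
    if not context:
        return 0.0  # a real system would fall back to a unigram estimate
    ngram = context + (word,)
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / totals[context]
    # unseen at this order: drop the leftmost context word, scale by alpha
    return alpha * stupid_backoff(word, context[1:], counts, totals, alpha)

counts = {("close", "the", "first"): 8662, ("the", "first"): 198015222}
totals = {("close", "the"): 3785230, ("the",): 23135851162}
# "please close the first" is unseen, so we back off to "close the first"
score = stupid_backoff("first", ("please", "close", "the"), counts, totals)
```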
Discounting
§ Observation: N-grams occur more in training data than they will later
§ Absolute discounting: reduce counts by a small constant, redistribute “shaved” mass to a model of new events
Count in 22M Words Future c* (Next 22M)
1 0.45
2 1.25
3 2.24
4 3.23
5 4.21
Empirical Bigram Counts (Church and Gale, 91)
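The table motivates subtracting roughly the same constant from every count. A minimal sketch of absolute discounting over a single context (the counts and the discount d = 0.75 are illustrative, not from the slide):

```python
# Absolute discounting: shave a constant d off each observed count and
# reserve the shaved mass for a model of new (unseen) events.
def absolute_discount(counts, d=0.75):
    total = sum(counts.values())
    probs = {w: (c - d) / total for w, c in counts.items()}
    reserved = d * len(counts) / total   # mass handed to the new-event model
    return probs, reserved

probs, reserved = absolute_discount({"door": 3, "window": 2, "gap": 1})
```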
Fertility
§ Shannon game: “There was an unexpected _____”
§ Context fertility: number of distinct context types that a word occurs in
§ What is the fertility of “delay”?
§ What is the fertility of “Francisco”?
§ Which is more likely in an arbitrary new context?
§ Kneser-Ney smoothing: new events proportional to context fertility, not frequency [Kneser & Ney, 1995]
§ Can be derived as inference in a hierarchical Pitman-Yor process [Teh, 2006]
P(w) ∝ |{w′ : c(w′, w) > 0}|
delay? Francisco?
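The continuation probability above can be computed by counting distinct left-contexts per word. A sketch over a made-up bigram list: "Francisco" may be frequent, but it almost always follows "San", so its fertility is low.

```python
from collections import defaultdict

def continuation_probs(bigrams):
    """P(w) proportional to the number of distinct contexts w follows."""
    contexts = defaultdict(set)
    for prev, word in bigrams:
        contexts[word].add(prev)
    total = sum(len(s) for s in contexts.values())
    return {w: len(s) / total for w, s in contexts.items()}

# Hypothetical data: "delay" follows 3 distinct words, "francisco" only 1
bigrams = [("unexpected", "delay"), ("long", "delay"), ("slight", "delay"),
           ("san", "francisco"), ("san", "francisco")]
p = continuation_probs(bigrams)
```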
Better Methods?
[Figure: test entropy (y-axis, roughly 5.5 to 10) vs. n-gram order (x-axis, 1 to 20), with one curve per combination of smoothing method (Katz vs. Kneser-Ney) and training-set size (100,000 / 1,000,000 / 10,000,000 / all words).]
More Data?
[Brants et al, 2007]
Storage
…
searching for the best 192593
searching for the right 45805
searching for the cheapest 44965
searching for the perfect 43959
searching for the truth 23165
searching for the “ 19086
searching for the most 15512
searching for the latest 12670
searching for the next 10120
searching for the lowest 10080
searching for the name 8402
searching for the finest 8171
…
Storage
Slide: Greg Durrett
Graveyard of Correlations
§ Skip-grams
§ Cluster models
§ Topic variables
§ Cache models
§ Structural zeros
§ Dependency models
§ Maximum entropy models
§ Subword models
§ …
Entirely Unseen Words
§ What about totally unseen words?
§ Classical real world option: systems are actually closed vocabulary
§ ASR systems will only propose words that are in their pronunciation dictionary
§ MT systems will only propose words that are in their phrase tables (modulo special models for numbers, etc)
§ Classical theoretical option: build open vocabulary LMs
§ Models over character sequences rather than word sequences
§ N-Grams: back-off needs to go down into a “generate new word” model
§ Typically if you need this, a high-order character model will do
§ Modern approach: syllable-sized subword units (more later)
Neural LMs: Preview
A GPT2 Continuation
§ The computer I had put into the machine room on the fifth floor just outside our landing was taken by a lot of people. It was going to be recovered from outside the machine room as soon as we could, but after the machine room was shut down, we had to open all of the windows and doors to save ourselves, and the computer would have been lost. Luckily we are prepared for this. We had five steel doors on each side of the landing to hold any rain and wind that might have hit, so we weren't affected by the storm.
The situation was really grim for days. A few days before, the Apache at Qush Tefah had been shot. This was the most damaged machine in the Armies Corps. The command was only meant to repair Qush
§ The computer I had put into the machine room on the fifth floor just
A GPT2 Continuation
§ I shall go seek the traitor Gloucester, when he hath declared his vengeance. Cunegund Shall I back her to kill my cousin in my stead? Witbane Far first shall the brother take his Queen's soul and that in the flames. Clotilda For to slay the King of Scotland with, the blood of my cousin, lie true; And she shall have the right to my feather." Sefton leapt into the rushes, and stole along in the sunlight: the small figure flashed like lightning in the west. In the likeness of a snare he had laid to catch the rushes and made of them a snares, a road to flee from his pursuers; but he now came to an oak where the branches were wreathed in an oak-
§ I shall go seek the traitor Gloucester,
Words: Clusterings and Embeddings
Stuffing Words into Vector Spaces?
Cartoon: Greg Durrett
Distributional Similarity
§ Key idea in clustering and embedding methods: characterize a word by the words it occurs with (cf Harris’ distributional hypothesis, 1954)
§ “You can tell a word by the company it keeps.” [Firth, 1957]
§ Harris / Chomsky divide in linguistic methodology
“the president said that the downturn was over”
president ↔ governor, said ↔ reported, the ↔ a
[Figure: matrix M of context counts, one row per word w]
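Building the word-by-context count matrix is the shared starting point for both clusterings and embeddings. A minimal sketch with a ±1-word window (toy sentences, not real data):

```python
from collections import Counter, defaultdict

def cooccurrence(sentences, window=1):
    """Map each word to a Counter over the words seen within the window."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][sent[j]] += 1
    return counts

M = cooccurrence([["the", "president", "said", "that"],
                  ["the", "governor", "said", "that"]])
# "president" and "governor" end up with identical context rows
```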
Clusterings
Clusterings
§ Automatic (Finch and Chater 92, Schütze 93, many others)
§ Manual (e.g. thesauri, WordNet)
“Vector Space” Methods
§ Treat words as points in R^n (eg Schütze, 93)
§ Form matrix of co-occurrence counts
§ SVD or similar to reduce rank (cfLSA)
§ Cluster projections
§ People worried about things like: log of counts, U vs US
§ This is actually more of an embedding method (but we didn’t want that in 1993)
[Figure: M ≈ U S Vᵀ; row w of M holds context counts. Cluster the 50-200 dim rows of US instead.]
Models: Brown Clustering
§ Classic model-based clustering (Brown et al, 92)
§ Each word starts in its own cluster
§ Each cluster has co-occurrence stats
§ Greedily merge clusters based on a mutual information criterion
§ Equivalent to optimizing a class-based bigram LM
§ Produces a dendrogram (hierarchy) of clusters
Embeddings
Most slides from Greg Durrett
Embeddings
§ Embeddings map discrete words (eg |V| = 50k) to continuous vectors (eg d = 100)
§ Why do we care about embeddings?
§ Neural methods want them
§ Nuanced similarity possible; generalize across words
§ We hope embeddings will have structure that exposes word correlations (and thereby meanings)
Embedding Models
§ Idea: compute a representation of each word from co-occurring words
§ We’ll build up several ideas that can be mixed-and-matched and which frequently get used in other contexts
the dog bit the man
Token-Level Type-Level
word2vec: Continuous Bag-of-Words
word2vec: Skip-Grams
word2vec: Hierarchical Softmax
word2vec: Negative Sampling
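Only this slide's title survived extraction. As a reminder of the technique: skip-gram with negative sampling updates a word vector by pulling it toward an observed context vector and pushing it away from a few randomly sampled "negative" words. A from-scratch sketch with made-up numbers (context vectors held fixed for brevity):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_step(w, c_pos, c_negs, lr=0.1):
    """One SGD step on log sigma(w.c_pos) + sum_neg log sigma(-w.c_neg)."""
    g = 1.0 - sigmoid(dot(w, c_pos))                 # pull toward true context
    new_w = [wi + lr * g * ci for wi, ci in zip(w, c_pos)]
    for c in c_negs:                                  # push away from negatives
        gn = sigmoid(dot(w, c))
        new_w = [wi - lr * gn * ci for wi, ci in zip(new_w, c)]
    return new_w

w = sgns_step([0.1, 0.1], [1.0, 0.0], [[0.0, 1.0]])
```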
fastText: Character-Level Models
GloVe
§ Idea: Fit co-occurrence matrix directly (weighted least squares)
§ Type-level computations (so constant in data size)
§ Currently the most common word embedding method
Pennington et al, 2014
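The weighted least squares objective from Pennington et al, 2014, fit over the co-occurrence matrix X:

$$ J \;=\; \sum_{i,j=1}^{|V|} f(X_{ij})\,\bigl(w_i^\top \tilde w_j + b_i + \tilde b_j - \log X_{ij}\bigr)^2 $$

where f is a weighting function that downweights very rare and very frequent co-occurrences.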
Bottleneck vs Co-occurrence
§ Two main views of inducing word structure§ Co-occurrence: model which words occur in similar contexts
§ Bottleneck: model latent structure that mediates between words and their behaviors
§ These turn out to be closely related!
Structure of Embedding Spaces
§ How can you fit 50K words into a 64-dimensional hypercube?
§ Orthogonality: Can each axis have a global “meaning” (number, gender, animacy, etc)?
§ Global structure: Can embeddings have algebraic structure (eg king – man + woman = queen)?
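The analogy question is usually tested by nearest-neighbor search under cosine similarity. A toy sketch with made-up 2-d vectors chosen so the analogy works (real embeddings are far noisier):

```python
import math

def cos(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

# Hypothetical vectors: x-axis ~ "royalty", y-axis ~ "gender"
vecs = {"man": [1, 0], "woman": [0, 1], "king": [2, 0], "queen": [1, 1]}
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w != "king"), key=lambda w: cos(target, vecs[w]))
print(best)  # → queen
```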
Bias in Embeddings
§ Embeddings can capture biases in the data! (Bolukbasi et al 16)
§ Debiasing methods (as in Bolukbasi et al 16) are an active area of research
Debiasing?
Neural Language Models
Reminder: Feedforward Neural Nets
A Feedforward N-Gram Model?
Early Neural Language Models
Bengio et al, 03
§ Fixed-order feed-forward neural LMs
§ Eg Bengio et al, 03
§ Allow generalization across contexts in more nuanced ways than prefixing
§ Allow different kinds of pooling in different contexts
§ Much more expensive to train
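A shape-level sketch of such a fixed-order model in the style of Bengio et al, 03: embed each of the n−1 context words, concatenate, pass through a hidden layer, then softmax over the vocabulary. All sizes are made up, weights are random, and there is no training loop:

```python
import math, random

V, d, h, n = 10, 4, 8, 3                 # vocab, embed dim, hidden dim, order
rng = random.Random(0)
E = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(V)]            # embeddings
W = [[rng.gauss(0, 0.1) for _ in range((n - 1) * d)] for _ in range(h)]  # hidden layer
U = [[rng.gauss(0, 0.1) for _ in range(h)] for _ in range(V)]            # output layer

def next_word_probs(context_ids):
    x = [v for i in context_ids for v in E[i]]                # concat embeddings
    hid = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in W]  # ReLU
    logits = [sum(ui * hi for ui, hi in zip(row, hid)) for row in U]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]                  # stable softmax
    s = sum(exps)
    return [e / s for e in exps]

p = next_word_probs([3, 7])   # distribution over the next word, given 2 context ids
```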
Using Word Embeddings?
Using Word Embeddings
Limitations of Fixed-Window NN LMs?
§ What have we gained over N-Grams LMs?
§ What have we lost?
§ What have we not changed?
Recurrent NNs
Slides from Greg Durrett / UT Austin, Abigail See / Stanford
RNNs
General RNN Approach
RNN Uses
Basic RNNs
Training RNNs
Problem: Vanishing Gradients
§ Contribution of earlier inputs decreases if the recurrent matrices are contractive (largest eigenvalue < 1), the non-linearities are squashing, etc.
§ Gradients can be viewed as a measure of the effect of the past on the future
§ That’s a problem for optimization, but it also means that information naturally decays quickly, so the model will tend to capture only local information
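A minimal numerical sketch of the decay the bullets above describe (not from the slides): if the recurrent matrix is contractive, the gradient norm shrinks as it is backpropagated through time. The matrix, scale factor, and step count here are illustrative choices, and the squashing non-linearities are ignored (they only shrink the gradient further).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 10))
# Rescale so the largest eigenvalue magnitude is 0.9 (a contractive matrix).
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

grad = np.ones(10)  # gradient arriving at the last time step
norms = []
for t in range(50):
    grad = W.T @ grad        # one step of backprop through time
    norms.append(np.linalg.norm(grad))

# The norm decays roughly like 0.9**t, so early inputs barely register.
print(norms[0], norms[-1])
```

With the largest eigenvalue above 1 instead, the same loop would show the gradient blowing up, which is the exploding-gradient problem on the next slide.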
Next slides adapted from Abigail See / Stanford
Core Issue: Information Decay
Problem: Exploding Gradients
§ Gradients can also be too large
§ Leads to overshooting / jumping around the parameter space
§ Common solution: gradient clipping
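A sketch of the gradient clipping the slide mentions, in the common global-norm form: if the combined gradient norm exceeds a threshold, rescale all gradients so the norm equals the threshold. The function name and the threshold of 5.0 are illustrative choices, not from the slides.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

big = [np.full(4, 100.0)]          # global norm 200, far above the threshold
clipped = clip_gradients(big)
print(np.linalg.norm(clipped[0]))  # rescaled down to the 5.0 threshold
```

Clipping changes the step size but not the step direction, which is why it tames the "jumping around" without distorting where the update points.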
Key Idea: Propagated State
§ Information decays in RNNs because it gets multiplied by the recurrent weights at each time step
§ Idea: have a channel called the cell state that by default just gets propagated (the “conveyor belt”)
§ Gates make explicit decisions about what to add / forget from this channel
Image: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
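A sketch of the "conveyor belt" idea in LSTM-style form, assuming pre-computed gate pre-activations (all names here are for illustration): the forget gate decides what to keep from the old cell state, and the input gate decides how much new information to write.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cell_step(c_prev, forget_pre, input_pre, candidate_pre):
    f = sigmoid(forget_pre)           # forget gate: what to keep from c_prev
    i = sigmoid(input_pre)            # input gate: how much new info to add
    c_tilde = np.tanh(candidate_pre)  # candidate values to write
    return f * c_prev + i * c_tilde   # the "conveyor belt" update

c = np.array([1.0, -2.0])
# Forget gate saturated open, input gate saturated shut: the cell state
# is propagated essentially unchanged, however long the sequence is.
c_next = cell_step(c,
                   forget_pre=np.array([50.0, 50.0]),
                   input_pre=np.array([-50.0, -50.0]),
                   candidate_pre=np.array([3.0, 3.0]))
print(c_next)  # ≈ [1.0, -2.0]
```

Because the default path is elementwise gating and addition rather than repeated matrix multiplication, information (and gradient) along the cell state does not have to decay the way it does in a basic RNN.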
Cell State Gating
RNNs
LSTMs
What about the Gradients?
Gated Recurrent Units (GRUs)
Uses of RNNs
Slides from Greg Durrett / UT Austin
Reminder: Tasks for RNNs
§ Sentence Classification (e.g. Sentiment Analysis)
§ Transduction (e.g. Part-of-Speech Tagging, NER)
§ Encoder/Decoder (e.g. Machine Translation)
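The three task types above differ mainly in which hidden states get used. A shapes-only sketch with a basic (untrained) RNN, where all dimensions and weight initializations are illustrative:

```python
import numpy as np

T, d_in, d_h = 6, 8, 16                 # sequence length, input dim, hidden dim
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(d_in, d_h))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))

xs = rng.normal(size=(T, d_in))         # stand-in for T word embeddings
h = np.zeros(d_h)
hs = []
for x in xs:                            # run the RNN over the sequence
    h = np.tanh(x @ W_xh + h @ W_hh)
    hs.append(h)
hs = np.stack(hs)                       # (T, d_h): one hidden state per token

sentence_repr = hs[-1]                  # classification: one vector for the sentence
per_token = hs                          # transduction: one vector per token to tag
encoder_state = hs[-1]                  # encoder/decoder: seed the decoder with this
print(sentence_repr.shape, per_token.shape)
```

In practice the classification vector feeds a softmax over labels, the per-token states feed a softmax per position, and the encoder state initializes a second RNN that generates the output sequence.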
Encoder / Decoder Preview
Multilayer and Bidirectional RNNs
Training for Sentential Tasks
Training for Transduction Tasks
Example Sentential Task: NL Inference
SNLI Dataset
Visualizing RNNs
Slides from Greg Durrett / UT Austin
LSTMs Can Model Length
LSTMs Can Model Long-Term Bits
LSTMs Can Model Stack Depth
LSTMs Can Be Completely Inscrutable