CS388: Natural Language Processing

Lecture 7: Word Embeddings

Greg Durrett
Administrivia
‣ Mini 1 grades out tonight or tomorrow
‣ Project 1 due Tuesday
Clarification: Forward-Backward
‣ Lecture 5 notes updated with F-B on CRFs
‣ Forward-backward slides showed forward-backward in the HMM case (emission scores were probabilities P(x_i | y_i))
‣ For CRFs: use transition/emission potentials (computed from features + weights) instead of probabilities
Recall: Feedforward NNs

P(y|x) = softmax(W g(V f(x)))

‣ f(x): n features; V: d x n matrix; z = V f(x): d hidden units; g: nonlinearity (tanh, relu, …); W: num_classes x d matrix; softmax yields num_classes probs
Recall: Backpropagation

P(y|x) = softmax(W g(V f(x)))

‣ Errors propagate backward through the graph: ∂L/∂W is computed from err(root) and z; ∂L/∂V is computed from err(z) and f(x)
This Lecture
‣ Word representations
‣ word2vec/GloVe
‣ Evaluating word embeddings
‣ Training tips
Training Tips
Batching
‣ Batching data gives speedups due to more efficient matrix operations
‣ Need to make the computation graph process a batch at the same time
‣ Batch sizes from 1-100 often work well

def make_update(input, gold_label):
    # input is [batch_size, num_feats]
    # gold_label is [batch_size, num_classes] (one-hot)
    probs = ffnn.forward(input)  # [batch_size, num_classes]
    loss = torch.sum(torch.neg(torch.log(probs)) * gold_label)
    ...
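As a toy illustration (not the lecture's code), the batched loss above can be sketched in pure Python; the helper name `batched_nll_loss` is hypothetical, and gold labels are taken as class indices rather than one-hot vectors:

```python
import math

def batched_nll_loss(probs, gold_labels):
    # total negative log-likelihood of the gold class over the batch;
    # probs is [batch_size][num_classes], gold_labels holds class indices
    return -sum(math.log(p[y]) for p, y in zip(probs, gold_labels))

probs = [[0.7, 0.3], [0.2, 0.8]]  # e.g. what ffnn.forward(input) would produce
loss = batched_nll_loss(probs, [0, 1])
```

Summing over the batch in one expression is what lets the framework batch the underlying matrix operations.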
Training Basics
‣ Basic formula: compute gradients on batch, use first-order optimization method (SGD, Adagrad, etc.)
‣ How to initialize? How to regularize? What optimizer to use?
‣ This lecture: some practical tricks. Take deep learning or optimization courses to understand this further
How does initialization affect learning?

P(y|x) = softmax(W g(V f(x)))

‣ How do we initialize V and W? What consequences does this have?
‣ Nonconvex problem, so initialization matters!
How does initialization affect learning?
‣ Nonlinear model… how does this affect things?
‣ If cell activations are too large in absolute value, gradients are small
‣ ReLU: larger dynamic range (all positive numbers), but can produce big values, can break down if everything is too negative
Initialization
1) Can't use zeroes for parameters to produce hidden layers: all values in that hidden layer are always 0 and have gradients of 0, never change
2) Initialize too large and cells are saturated
‣ Can do random uniform/normal initialization with appropriate scale
‣ Glorot initializer: U[-sqrt(6 / (fan-in + fan-out)), +sqrt(6 / (fan-in + fan-out))]
‣ Want variance of inputs and gradients for each layer to be the same
‣ Batch normalization (Ioffe and Szegedy, 2015): periodically shift + rescale each layer to have mean 0 and variance 1 over a batch (useful if net is deep)
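A minimal sketch of the Glorot uniform initializer above in plain Python (the helper name `glorot_uniform` is made up for illustration):

```python
import math
import random

def glorot_uniform(fan_in, fan_out):
    # draw each weight from U[-b, +b] with b = sqrt(6 / (fan_in + fan_out)),
    # keeping activation/gradient variance roughly constant across layers
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]

W = glorot_uniform(100, 50)  # a 50 x 100 weight matrix
```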
Dropout
‣ Probabilistically zero out parts of the network during training to prevent overfitting, use whole network at test time
‣ Form of stochastic regularization
‣ Similar to benefits of ensembling: network needs to be robust to missing signals, so it has redundancy
‣ One line in PyTorch/TensorFlow

Srivastava et al. (2014)
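In PyTorch the one line is `torch.nn.Dropout(p)`. A hand-rolled sketch of the mechanism (inverted dropout, assumed here so that test time needs no rescaling; the function name is hypothetical):

```python
import random

def dropout(x, p=0.5, train=True):
    # training: zero each unit with prob p, rescale survivors by 1/(1-p)
    # so expected activations match; test time: use the whole network
    if not train:
        return x
    return [0.0 if random.random() < p else v / (1 - p) for v in x]

h = [0.3, -1.2, 0.8, 0.5]
out = dropout(h, p=0.5)
```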
Optimizer
‣ Adam (Kingma and Ba, ICLR 2015): very widely used. Adaptive step size + momentum
‣ Wilson et al. NIPS 2017: adaptive methods can actually perform badly at test time (Adam is in pink, SGD in black)
‣ One more trick: gradient clipping (set a max value for your gradients)
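In PyTorch, `torch.nn.utils.clip_grad_norm_` implements this. A sketch of norm-based clipping on a flat gradient vector (a toy illustration; the helper name is made up):

```python
import math

def clip_gradients(grads, max_norm):
    # rescale the gradient vector so its L2 norm never exceeds max_norm;
    # guards against occasional huge updates destabilizing training
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [g * max_norm / norm for g in grads]
    return grads
```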
Word Representations
Word Representations
‣ Neural networks work very well at continuous data, but words are discrete
‣ Continuous model <-> expects continuous semantics from input
‣ "You shall know a word by the company it keeps" Firth (1957)

slide credit: Dan Klein
Discrete Word Representations
‣ Brown clusters: hierarchical agglomerative hard clustering (each word has one cluster, not some posterior distribution like in mixture models)
‣ Maximize P(w_i | w_{i-1}) = P(c_i | c_{i-1}) P(w_i | c_i)
‣ Useful features for tasks like NER, not suitable for NNs

[figure: binary tree over the vocabulary with 0/1 branches; leaves include good, enjoyable, great, … fish, cat, dog, … is, go]

Brown et al. (1992)
Word Embeddings
‣ Part-of-speech tagging with FFNNs
‣ Word embeddings for each word form input: for "… Fed raises interest rates in order to …",
    f(x) = [emb(raises), emb(interest), emb(rates)]  (previous word, curr word, next word)
    + other words, feats, etc.
‣ What properties should these vectors have?

Botha et al. (2017)
Word Embeddings
‣ Want a vector space where similar words have similar embeddings:
    the movie was great  ~~  the movie was good
    [figure: good, enjoyable, great close together; bad, dog, is elsewhere]
‣ Goal: come up with a way to produce these embeddings
‣ For each word, want "medium" dimensional vector (50-300 dims) representing it
word2vec/GloVe
Continuous Bag-of-Words
‣ Predict word from context:  the dog bit the man

P(w | w-1, w+1) = softmax(W(c(w-1) + c(w+1)))

‣ Sum the d-dimensional context embeddings (dog + the), multiply by W (size |V| x d), softmax; gold label = bit, no manual labeling required!
‣ Parameters: d x |V| (one d-length context vector per voc word), |V| x d output parameters (W)

Mikolov et al. (2013)
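The forward computation above can be sketched with plain lists (a toy illustration with a 3-word vocabulary and 2-dim embeddings; all names and values are hypothetical):

```python
import math

def softmax(scores):
    # numerically stable softmax over a score vector
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cbow_predict(context_vecs, W):
    # sum the d-dim context vectors, multiply by the |V| x d output
    # matrix W, softmax over the vocabulary
    h = [sum(vals) for vals in zip(*context_vecs)]
    scores = [sum(wi * hi for wi, hi in zip(row, h)) for row in W]
    return softmax(scores)

probs = cbow_predict([[1.0, 0.0], [0.0, 1.0]],
                     [[1.0, 1.0], [0.5, 0.0], [0.0, 0.5]])
```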
Skip-Gram
‣ Predict one word of context from word:  the dog bit the man

P(w' | w) = softmax(W e(w))

‣ Multiply bit's embedding by W, softmax; gold = dog
‣ Another training example: bit -> the
‣ Parameters: d x |V| vectors, |V| x d output parameters (W) (also usable as vectors!)

Mikolov et al. (2013)
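Generating the (word, context) training examples described above can be sketched as follows (a toy illustration, not the word2vec implementation; the function name is hypothetical):

```python
def skipgram_pairs(tokens, window=1):
    # emit one (center, context) pair per context word in the window;
    # each pair is a separate training example
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

pairs = skipgram_pairs("the dog bit the man".split())
```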
Hierarchical Softmax
‣ Matmul + softmax over |V| is very slow to compute for CBOW and SG:
    P(w | w-1, w+1) = softmax(W(c(w-1) + c(w+1)))
    P(w' | w) = softmax(W e(w))
‣ Standard softmax: [|V| x d] x d matmul, |V| x d parameters
‣ Hierarchical softmax: log(|V|) dot products of size d, log(|V|) binary decisions
‣ Huffman encode vocabulary, use binary classifiers to decide which branch to take

Mikolov et al. (2013)
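The product of binary decisions can be sketched like this (a toy illustration under the assumption that each tree node has a vector and the word's Huffman code picks a branch at each node; names are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hs_word_prob(h, node_vecs, code):
    # P(word | h) = product of binary decisions along the word's Huffman
    # path: log(|V|) dot products instead of a |V|-way softmax
    p = 1.0
    for v, bit in zip(node_vecs, code):
        s = sigmoid(sum(vi * hi for vi, hi in zip(v, h)))
        p *= s if bit == 1 else 1.0 - s
    return p

h = [0.5, -0.3]
node = [[0.2, 0.1]]
```

With a single node, the two one-bit codes partition the probability mass, so the leaf probabilities sum to 1.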
Skip-Gram with Negative Sampling
‣ Take (word, context) pairs and classify them as "real" or not. Create random negative examples by sampling from unigram distribution:
    (bit, the) => +1    (bit, cat) => -1
    (bit, a) => -1      (bit, fish) => -1

P(y = 1 | w, c) = e^{w·c} / (e^{w·c} + 1)

‣ Objective = log P(y = 1 | w, c) + (1/k) Σ_{i=1}^{n} log P(y = 0 | w_i, c)   (sampled)
‣ d x |V| vectors, d x |V| context vectors (same # of params as before)
‣ words in similar contexts select for similar c vectors

Mikolov et al. (2013)
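The objective for one observed pair plus sampled negatives can be sketched numerically (a toy illustration with hypothetical names and vectors, not word2vec's code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_objective(w, c, neg_ws, k):
    # log P(y=1|w,c) for the observed pair, plus (1/k) times the sum of
    # log P(y=0|w_i,c) over the sampled negative words w_i
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    pos = math.log(sigmoid(dot(w, c)))
    neg = sum(math.log(1.0 - sigmoid(dot(nw, c))) for nw in neg_ws)
    return pos + neg / k

obj = sgns_objective([1.0, 0.0], [0.8, 0.1], [[-0.5, 0.2], [0.1, -0.9]], k=2)
```

A word vector aligned with its real context (and anti-aligned with negatives) scores higher, which is exactly what pushes words in similar contexts toward similar c vectors.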
Connections with Matrix Factorization
‣ Skip-gram model looks at word-word co-occurrences and produces two types of vectors:
    [figure: |V| x |V| word pair counts matrix ≈ (|V| x d word vecs) x (d x |V| context vecs)]
‣ Looks almost like a matrix factorization… can we interpret it this way?

Levy et al. (2014)
Skip-Gram as Matrix Factorization
‣ If we sample negative examples from the uniform distribution over words, the skip-gram objective exactly corresponds to factoring this |V| x |V| matrix:

M_ij = PMI(w_i, c_j) - log k      (k = num negative samples)

PMI(w_i, c_j) = log [P(w_i, c_j) / (P(w_i) P(c_j))] = log [(count(w_i, c_j)/D) / ((count(w_i)/D)(count(c_j)/D))]

‣ …and it's a weighted factorization problem (weighted by word freq)

Levy et al. (2014)
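One entry of this shifted-PMI matrix can be computed directly from counts (a toy illustration; the helper name and counts are hypothetical):

```python
import math

def shifted_pmi(count_wc, count_w, count_c, D, k):
    # M_ij = PMI(w_i, c_j) - log k, the matrix the SGNS objective
    # implicitly factors; D is the total number of pairs
    pmi = math.log((count_wc / D) / ((count_w / D) * (count_c / D)))
    return pmi - math.log(k)

# if w and c co-occur exactly as often as independence predicts, PMI = 0
m = shifted_pmi(count_wc=10, count_w=100, count_c=100, D=1000, k=1)
```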
GloVe (Global Vectors)
‣ Also operates on the |V| x |V| word pair counts matrix: weighted regression on the log co-occurrence matrix

‣ Objective = Σ_{i,j} f(count(w_i, c_j)) (w_i^T c_j + a_i + b_j - log count(w_i, c_j))^2

‣ Constant in the dataset size (just need counts), quadratic in voc size
‣ By far the most common word vectors used today (5000+ citations)

Pennington et al. (2014)
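One summand of the objective can be sketched as follows (a toy illustration; `x_max` and `alpha` are the paper's default weighting-function constants, shown here as assumed values):

```python
import math

def glove_term(w_i, c_j, a_i, b_j, count, x_max=100.0, alpha=0.75):
    # one term of the objective: f(count) * (w_i . c_j + a_i + b_j - log count)^2;
    # f caps the influence of very frequent pairs
    f = (count / x_max) ** alpha if count < x_max else 1.0
    dot = sum(x * y for x, y in zip(w_i, c_j))
    return f * (dot + a_i + b_j - math.log(count)) ** 2
```

The term is zero exactly when the dot product plus biases reproduces the log co-occurrence count, which is the regression target.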
fastText: Sub-word Embeddings
‣ Same as SGNS, but break words down into n-grams with n = 3 to 6:
    where:  3-grams: <wh, whe, her, ere, re>
            4-grams: <whe, wher, here, ere>
            5-grams: <wher, where, here>
            6-grams: <where, where>
‣ Replace w · c in the skip-gram computation with (Σ_{g ∈ n-grams} w_g) · c
‣ Advantages?

Bojanowski et al. (2017)
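Extracting the n-gram inventory with the boundary markers shown above can be sketched as (a toy illustration; the function name is made up):

```python
def char_ngrams(word, n_min=3, n_max=6):
    # add boundary markers, then collect every n-gram for n in [n_min, n_max];
    # the word's vector is the sum of its n-gram vectors
    w = "<" + word + ">"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

grams = char_ngrams("where")
```

Because rare and unseen words share n-grams with common ones, their vectors are no longer learned from scratch.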
Using Word Embeddings
‣ Approach 1: learn embeddings as parameters from your data. Often works pretty well
‣ Approach 2: initialize using GloVe, keep fixed. Faster because no need to update these parameters
‣ Approach 3: initialize using GloVe, fine-tune. Works best for some tasks
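For approaches 2 and 3, the pretrained vectors first have to be parsed; GloVe distributes plain text with one "word v1 v2 … vd" per line. A sketch of the parse (the helper name and in-memory input are hypothetical; real files would be read line by line):

```python
def load_glove(lines):
    # map each word to its float vector from GloVe-format text lines
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

vecs = load_glove(["good 0.1 0.2", "great 0.1 0.3"])
```

The resulting table would then initialize the embedding layer, frozen for approach 2 or trained further for approach 3.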
Preview: Context-dependent Embeddings
‣ How to handle different word senses? One vector for balls:
    they hit the balls  /  they dance at balls
‣ Train a neural language model to predict the next word given previous words in the sentence, use its internal representations as word vectors
‣ Context-sensitive word embeddings: depend on rest of the sentence
‣ Huge improvements across nearly all NLP tasks over GloVe

Peters et al. (2018)
Compositional Semantics
‣ What if we want embedding representations for whole sentences?
‣ Skip-thought vectors (Kiros et al., 2015), similar to skip-gram generalized to a sentence level (more later)
‣ Is there a way we can compose vectors to make sentence representations? Summing?
‣ Will return to this in a few weeks as we move on to syntax and semantics
Evaluation
Evaluating Word Embeddings
‣ What properties of language should word embeddings capture?
    [figure: vector space with clusters good/enjoyable/great, cat/dog/wolf/tiger, is/was, and bad off on its own]
‣ Similarity: similar words are close to each other
‣ Analogy:
    Paris is to France as Tokyo is to ???
    good is to best as smart is to ???
Similarity
‣ SVD = singular value decomposition on PMI matrix
‣ GloVe does not appear to be the best when experiments are carefully controlled, but it depends on hyperparameters + these distinctions don't matter in practice

Levy et al. (2015)
Analogies
    (king - man) + woman = queen
‣ Why would this be? woman - man captures the difference in the contexts that these occur in
‣ Dominant change: more "he" with man and "she" with woman; similar to difference between king and queen
‣ Can evaluate on this as well
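The standard analogy evaluation takes the nearest neighbor of (b - a + c) by cosine similarity, excluding the three query words. A sketch on hand-picked toy vectors (all names and values hypothetical):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(vecs, a, b, c):
    # nearest neighbor of (b - a + c) by cosine, excluding the query words
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    return max((w for w in vecs if w not in (a, b, c)),
               key=lambda w: cosine(vecs[w], target))

toy = {"man": [1.0, 0.0], "woman": [1.0, 1.0],
       "king": [2.0, 0.0], "queen": [2.0, 1.0], "dog": [0.0, -1.0]}
```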
What can go wrong with word embeddings?
‣ What's wrong with learning a word's "meaning" from its usage?
‣ What data are we learning from?
‣ What are we going to learn from this data?
What do we mean by bias?
‣ Identify she-he axis in word vector space, project words onto this axis
‣ Nearest neighbor of (b - a + c)

Bolukbasi et al. (2016)
Manzini et al. (2019)
Debiasing
‣ Identify gender subspace with gendered words (she, he; woman, man)
‣ Project words onto this subspace
‣ Subtract those projections from the original word: homemaker -> homemaker'

Bolukbasi et al. (2016)
Hardness of Debiasing
‣ Not that effective… and the male and female words are still clustered together
‣ Bias pervades the word embedding space and isn't just a local property of a few words

Gonen and Goldberg (2019)
Takeaways
‣ Lots to tune with neural networks
‣ Training: optimizer, initializer, regularization (dropout), …
‣ Hyperparameters: dimensionality of word embeddings, layers, …
‣ Word vectors: learning word -> context mappings has given way to matrix factorization approaches (constant in dataset size)
‣ Lots of pretrained embeddings work well in practice, they capture some desirable properties
‣ Even better: context-sensitive word embeddings (ELMo)
‣ Next time: RNNs and CNNs