TRANSCRIPT

Distributional semantics, embeddings (et dong)

felipe@iro.umontreal.ca
RALI, Dépt. Informatique et Recherche Opérationnelle
Université de Montréal

V0.1, last compiled November 24, 2018
Plan

(Before Deep) the vector-space model

And then came the "Deep"
  Word2Vec
  Analogy
  Meta-embeddings
  Evaluation
  Interesting ideas
  The bilingual case

Evaluation
- "If A and B have almost identical environments we say that they are synonyms" (Harris, 1954)
- "you shall know a word by the company it keeps" (Firth, 1957)
- "words which are similar in meaning occur in similar contexts" (Rubenstein & Goodenough, 1965)
- "In other words, difference of meaning correlates with difference of distribution" (Harris, 1970, p. 786)
- "words with similar meanings will occur with similar neighbors if enough text material is available" (Schütze & Pedersen, 1995)
- "a representation that captures much of how words are used in natural context will capture much of what we mean by meaning" (Landauer & Dumais, 1997)
- "in the proposed model, it will so generalize because 'similar' words are expected to have a similar feature vector, and because the probability function is a smooth function of these feature values, a small change in the features will induce a small change in the probability" (Bengio et al., 2003)
The vector-space model

- read [Turney and Pantel, 2010] for an introduction
- read [Baroni and Lenci, 2010] for a generalization (tensor)

1. a matrix of co-occurrence "counts"
2. a weighting scheme (PMI, LLR, etc.)
3. a dimensionality-reduction policy:
   - singular value decomposition [Golub and Van Loan, 1996]
   - non-negative matrix factorization [Lee and Seung, 1999]
   - none (a very good baseline)
   - etc.

- DISSECT covers steps 2 and 3
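The three steps above can be sketched with NumPy; the toy corpus, the ±2 window, and the target dimension are illustrative assumptions, not choices made in the slides:

```python
import numpy as np

# 1. a matrix of co-occurrence "counts": term x term, within a +/-2 word window
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - 2), min(len(corpus), i + 3)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

# 2. a weighting scheme: positive PMI
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# 3. dimensionality reduction: truncated SVD
U, S, Vt = np.linalg.svd(ppmi)
d = 3
embeddings = U[:, :d] * S[:d]   # one d-dimensional row per word
print(embeddings.shape)
```

Step 3 could be swapped for non-negative matrix factorization, or skipped entirely, as the slide notes.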
term x document co-occurrence matrix

- document similarity
- bag-of-words hypothesis: if a query and a document have similar representations (columns), then they convey the same information [Salton, 1975]
- implemented (for instance) in Lucene

Taken from [Jurafsky and Martin, 2015]
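The bag-of-words hypothesis can be checked on a toy term x document matrix (the counts and the query are invented for illustration):

```python
import numpy as np

# toy term x document count matrix (rows: terms, columns: documents)
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# the query lives in the same term space as the document columns
query = np.array([2.0, 0.0, 1.0])
scores = [cos(query, A[:, j]) for j in range(A.shape[1])]
best = int(np.argmax(scores))
print(best)   # the document whose column is most similar to the query
```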
term x term co-occurrence matrix

- term similarity
- distributional hypothesis: if two terms have similar representations (rows), then they are similar

Taken from [Jurafsky and Martin, 2015]
term x relation co-occurrence matrix

- term similarity
- distributional hypothesis: if two terms have similar representations (rows), then they are similar

Taken from [Jurafsky and Martin, 2015]
term-pair x pattern co-occurrence matrix

- relation similarity
- hypothesis: if two word pairs have similar representations (rows), then they are similar: X of Y, Y of X, X for Y, Y for X, X to Y, and Y to X
- a list of 64 words such as of, for, or to
- yielding 128 patterns (columns) containing the pair (X, Y)

[Turney, 2005]
On the menu

- a star model: Word2Vec [Mikolov et al., 2013a]
- properties of the embeddings [Mikolov et al., 2013d, Mikolov et al., 2013c]
- results: glory [Baroni et al., 2014], moderation [Levy et al., 2015]
- cool works [Faruqui and Dyer, 2015, Faruqui et al., 2015b, Faruqui et al., 2015a]
- bilingual models [Mikolov et al., 2013b, Chandar et al., 2014, Gouws et al., 2015, Coulmance et al., 2016]
A revolution among the "distributionalists": Word2Vec [Mikolov et al., 2013a]

- a fast toolkit implementing two models:
  - https://code.google.com/archive/p/word2vec
  - https://radimrehurek.com/gensim/models/word2vec.html
  - https://github.com/dav/word2vec
- embeddings available, trained on 6B words of Google News (180K words), dimension = 300
- directly usable in many applications
The two models in Word2Vec [Mikolov et al., 2013a]

- Skip-gram is the more popular one (more reliable on "small" corpora)
- CBOW is faster (good for large corpora)
Skip-gram [Mikolov et al., 2013a]

- C: a training corpus, i.e. a set D of pairs (w, c) where w is a word of C and c is a word seen in its context; note: the model represents context words and vocabulary words differently
- let p(D = 1 | w, c; θ) be the probability that a pair (w, c) belongs to C
- optimized by gradient descent:

      L = argmax_θ  ∏_{(w,c) ∈ D} p(D = 1 | w, c; θ)  ×  ∏_{(w,c) ∈ D'} (1 − p(D = 1 | w, c; θ))

  where v_c (resp. v_w) is the vector of c (resp. w)
- D' is built by randomly picking k pairs according to the unigram distributions (over words and over context words)
Skip-gram [Mikolov et al., 2013a]

- setting σ(x) = 1/(1 + e^{−x}) and p(D = 1 | w, c; θ) = σ(v_c · v_w), then

      L = argmax_θ  Σ_{(w,c) ∈ D} log σ(v_c · v_w)  +  Σ_{(w,c) ∈ D'} log σ(−v_c · v_w)

- contexts are defined by a window centered on the word w under consideration, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (randomly removed from C) and rare words are eliminated (cut-off)
- it works! (read [Levy and Goldberg, 2014] for an explanation)
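One gradient step on the negative-sampling objective above can be sketched in NumPy. This is a toy check, not the word2vec implementation: the vocabulary, dimensions, pair (w, c), and negatives are all made up, and as in the slides the context words get a table separate from the word vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 10, 8, 0.1                      # vocab size, dimension, learning rate
W = rng.normal(scale=0.1, size=(V, d))     # word vectors v_w
Ctx = rng.normal(scale=0.1, size=(V, d))   # context vectors v_c (a separate table)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective(w, c, negatives):
    """log σ(v_c·v_w) + Σ_n log σ(−v_n·v_w) for one positive pair."""
    return (np.log(sigmoid(Ctx[c] @ W[w]))
            + sum(np.log(sigmoid(-Ctx[n] @ W[w])) for n in negatives))

def sgns_step(w, c, negatives):
    """One gradient-ascent step on the objective above."""
    g = 1.0 - sigmoid(Ctx[c] @ W[w])       # coefficient for the positive pair
    grad_w = g * Ctx[c]
    grad_c = g * W[w]
    neg_grads = [-sigmoid(Ctx[n] @ W[w]) * W[w] for n in negatives]
    for n, n_grad in zip(negatives, neg_grads):
        grad_w += -sigmoid(Ctx[n] @ W[w]) * Ctx[n]
        Ctx[n] += lr * n_grad              # push negatives away from w
    Ctx[c] += lr * grad_c                  # pull the true context toward w
    W[w] += lr * grad_w

before = objective(1, 3, [5, 7])
sgns_step(1, 3, [5, 7])
after = objective(1, 3, [5, 7])
print(before, after)   # the objective increases after the update
```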
Other pre-trained embeddings

- Polyglot [Al-Rfou et al., 2013]
  - 100 languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al., 2016]
  - 294 languages (Wikipedia)
  - skip-gram where words are represented by bags of (character) n-grams, so an embedding can be computed even for an unknown word
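The subword idea can be sketched as follows; the bucket count, dimension, and hashing scheme are illustrative assumptions (FastText also adds a token for the full word itself, omitted here):

```python
import zlib
import numpy as np

def char_ngrams(word, nmin=3, nmax=6):
    """Bag of character n-grams with boundary markers < and >."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(nmin, nmax + 1)
            for i in range(len(w) - n + 1)]

B, d = 1000, 4                         # hash-bucket count and dimension (toy values)
rng = np.random.default_rng(0)
table = rng.normal(size=(B, d))        # one vector per bucket

def embed(word):
    """Average the vectors of the word's (hashed) n-grams."""
    grams = char_ngrams(word)
    return np.mean([table[zlib.crc32(g.encode()) % B] for g in grams], axis=0)

print(char_ngrams("where", 3, 3))      # ['<wh', 'whe', 'her', 'ere', 're>']
print(embed("unseenword").shape)       # an embedding even for an out-of-vocabulary word
```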
Other pre-trained embeddings

- GloVe [Pennington et al., 2014]
  - glove.6B.zip (Wikipedia + GigaWord 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Analogical arithmetic on representations [Mikolov et al., 2013d]

- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- lets us solve analogy equations [x, y, z]:
  1. compute t = vec(y) − vec(x) + vec(z), the target vector
  2. search V for the word t* closest to t:

      t* = argmax_w  ( vec(w) · vec(t) ) / ( ||vec(w)|| × ||vec(t)|| )
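The two steps above fit in a few lines; the embedding table is a hand-made toy in which the analogy holds by construction, not trained vectors:

```python
import numpy as np

# toy embedding table; real vectors would come from a trained model
E = {
    "madrid": np.array([1.0, 0.1, 0.9]),
    "spain":  np.array([1.0, 0.9, 0.1]),
    "paris":  np.array([0.1, 0.1, 0.9]),
    "france": np.array([0.1, 0.9, 0.1]),
    "rome":   np.array([0.9, 0.1, 0.8]),
}

def solve_analogy(x, y, z):
    """Return the word w maximizing cos(vec(w), t) for t = vec(y) - vec(x) + vec(z)."""
    t = E[y] - E[x] + E[z]
    best, best_cos = None, -np.inf
    for w, v in E.items():
        if w in (x, y, z):             # the query words are excluded, as is standard
            continue
        c = (v @ t) / (np.linalg.norm(v) * np.linalg.norm(t))
        if c > best_cos:
            best, best_cos = w, c
    return best

print(solve_analogy("spain", "madrid", "france"))   # "paris"
```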
[Mikolov et al., 2013d]

- RNN trained on 320M words (V = 82k)
- test set of 8k analogies involving the most frequent words
[Mikolov et al., 2013c]

- 6B words of Google News, 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]
[Mikolov et al., 2013c]

- comparison with other proposed models
[Mikolov et al., 2013c]

- Big Data (more data, higher dimension)
Meta-embeddings

- idea: can we combine several vector representations to create new, more effective ones?
- two simple but nonetheless useful approaches (better results than the individual representations):
  - concatenate the representations [Bollegala and Bao, 2018]
  - average them (normalize, and pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
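Both combinations are one-liners; the two input vectors below are toy stand-ins for vectors from two different source models:

```python
import numpy as np

def concat_meta(u, v):
    """Meta-embedding by concatenation."""
    return np.concatenate([u, v])

def avg_meta(u, v):
    """Meta-embedding by averaging: L2-normalize, zero-pad the shorter one."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    d = max(len(u), len(v))
    u = np.pad(u, (0, d - len(u)))
    v = np.pad(v, (0, d - len(v)))
    return (u + v) / 2.0

a = np.array([1.0, 2.0, 3.0])     # e.g. a 3-d vector from one source model (toy)
b = np.array([4.0, 5.0])          # e.g. a 2-d vector from another source model (toy)
print(concat_meta(a, b).shape)    # (5,)
print(avg_meta(a, b).shape)       # (3,)
```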
Don't count, predict! [Baroni et al., 2014]

- many tasks, a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]

- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

(binary) features induced for film:
  SYNSET.FILM.V.01, SYNSET.FILM.N.01,
  HYPO.COLLAGEFILM.N.01, HYPER.SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- supersenses: for nouns, verbs, and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word-colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- note: hard to do for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA: obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage the new vectors to stay close to the learned representation while moving closer to their neighbours in the resource (encoded as a graph)
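This trade-off can be optimized by simple iterative averaging. The sketch below follows the Jacobi-style update of Faruqui et al. (2015a) under the simplifying assumption of uniform α and β weights, on a hypothetical two-word synonym graph:

```python
import numpy as np

def retrofit(Q_hat, edges, iters=10, alpha=1.0, beta=1.0):
    """Pull each vector toward its lexicon neighbours while staying
    close to its original embedding Q_hat[i]."""
    Q = Q_hat.copy()
    for _ in range(iters):
        for i, neighbours in edges.items():
            if not neighbours:
                continue   # isolated words keep their original vector
            Q[i] = (alpha * Q_hat[i] + beta * sum(Q[j] for j in neighbours)) \
                   / (alpha + beta * len(neighbours))
    return Q

# toy setup: words 0 and 1 are synonyms in the resource, word 2 is isolated
Q_hat = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
edges = {0: [1], 1: [0], 2: []}
Q = retrofit(Q_hat, edges)

d_before = np.linalg.norm(Q_hat[0] - Q_hat[1])
d_after = np.linalg.norm(Q[0] - Q[1])
print(d_before, d_after)   # the synonyms end up closer together
```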
A community getting organized [Faruqui and Dyer, 2014]

- already-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain that the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
Mikolov strikes again [Mikolov et al., 2013b]

- one can learn a linear transformation (rotation + scaling) from one space to another, given a bilingual lexicon (x_i, z_i):

      Ŵ = argmin_W  Σ_i ||W x_i − z_i||²

  where x_i and z_i denote the source-side vector of x_i and the target-side vector of z_i, respectively
- W optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, translate a word x by z*:

      z* = argmax_z  cos(z, W x)
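Both formulas can be checked on synthetic data. The paper optimizes W by gradient descent; for a toy check, the least-squares problem also has a closed-form solution, used below. The vocabularies, dimensions, and the hidden "true" mapping are all fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n = 4, 3, 50
X = rng.normal(size=(n, d_src))           # source-side vectors of the lexicon pairs
W_true = rng.normal(size=(d_tgt, d_src))  # hidden "true" mapping (toy setup)
Z = X @ W_true.T                          # target-side vectors

# min_W sum_i ||W x_i - z_i||^2, solved in closed form by least squares
W, *_ = np.linalg.lstsq(X, Z, rcond=None)
W = W.T

def translate(x, target_vocab):
    """Pick the target word whose vector maximizes cos(z, W x)."""
    y = W @ x
    scores = {w: v @ y / (np.linalg.norm(v) * np.linalg.norm(y))
              for w, v in target_vocab.items()}
    return max(scores, key=scores.get)

vocab = {f"w{i}": Z[i] for i in range(n)}
print(translate(X[0], vocab))   # recovers the paired entry, here "w0"
```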
Mikolov strikes again [Mikolov et al., 2013b]

- the 6K most frequent source words, translated with Google Translate
- first 5K entries used to compute W
- next 1K used for testing
- baselines: edit distance, ε-Rapp
Mikolov strikes again [Mikolov et al., 2013b]
More data (Google News)

- same split: 5K train, 1K test
On the difficulty of unbiased evaluation [Levy et al., 2015]

- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- study their parameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- bottom line:
  - a draw in performance (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
On the difficulty of unbiased evaluation [Levy et al., 2015]
Example of an observation [Levy et al., 2015]

- in the co-occurrence matrix approach, a word w and its context c are scored by

      PMI(w, c) = log [ p(w, c) / ( p(w) p(c) ) ]

- one common approach is to set PMI values to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples (smoothed with α = 0.75):

      PMI_α(w, c) = log [ p(w, c) / ( p(w) P_α(c) ) ]   with   P_α(c) = #(c)^α / Σ_c #(c)^α
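The two Skip-Gram-inspired adaptations combine naturally; a minimal NumPy sketch on an invented count matrix (k and α as on the slide):

```python
import numpy as np

C = np.array([[10.0, 0.0, 2.0],
              [3.0,  1.0, 0.0],
              [0.0,  4.0, 6.0]])   # toy word x context count matrix

total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
alpha, k = 0.75, 5
ctx = C.sum(axis=0)
p_ctx_alpha = (ctx ** alpha) / (ctx ** alpha).sum()   # smoothed context distribution

with np.errstate(divide="ignore"):
    pmi_a = np.log((C / total) / (pw * p_ctx_alpha))  # PMI with context smoothing
sppmi = np.maximum(pmi_a - np.log(k), 0.0)            # shifted positive PMI
print(sppmi)
```

Zero counts give PMI = −∞, which the max(·, 0) clips to 0, matching the common convention on the slide.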
[Schnabel et al., 2015]

- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]

- word2vec skip-gram re-run several times with the same parameters
What about rare words? [Jakubina and Langlais, 2017]
What about rare words?

                1k-low                  1k-high
            TOP1  TOP5  TOP20      TOP1  TOP5  TOP20
embedding    2.2   6.1   11.9      21.7  34.2   44.9
context      2.0   4.3    7.6      19.0  32.7   44.3
document     0.7   2.3    5.0        —     —      —
oracle       4.6    —    19.0      31.8    —    57.6

- Wikipedia dump of June 2013 (EN: 3.5M, FR: 1.3M articles)
- V_EN = 7.3M, V_FR = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183-192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107-119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238-247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Comput. Linguist., 36(4):673-721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650-1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.
Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding: computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194-198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493-2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.
Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605-611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177-2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211-225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97-106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298-307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141-188.
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
I If A and B have almost identical environments we say thatthey are synonyms (Harris 1954)
I you shall know a word by the company it keeps (Firth 1957)I words which are similar in meaning occur in similar contexts
(Rubenstein amp Goodenough 1965)I In other words difference of meaning correlates with
difference of distribution (Harris 1970 p786)I words with similar meanings will occur with similar neighbors if
enough text material is available (Schutze amp Pedersen 1995)I a representation that captures much of how words are used in
natural context will capture much of what we mean by meaning(Landauer amp Dumais 1997)
I in the proposed model it will so generalize because ldquosimilarrdquowords are expected to have a similar feature vector andbecause the probability function is a smooth function of thesefeature values a small change in the features will induce a smallchange in the probability (Bengio et al 2003)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Modele vectoriel (Vector Space model)
I lire [Turney and Pantel 2010] pour une introduction
I lire [Baroni and Lenci 2010] pour une generalisation (tenseur)
1 une matrice de ldquocomptesrdquo de co-occurences2 un schema de ponderation (PMI LLR etc)3 une politique de reduction de dimensionnalite
singular value decomposition [Golub and Van Loan 1996]non-negative matrix factorization [Lee and Seung 1999]aucune (tres bon baseline)etc
I DISSECT offre les etapes 2 et 3
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesdocument
I similarite de documents
I hypothese bag of word si une requete et un document ont desrepresentations (colonnes) similaires alors ils vehiculent lameme information [Salton 1975]
I implemente (par exemple) dans Lucene
Pris de [Jurafsky and Martin 2015]felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesterme
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesrel
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termestimespatron
I similarite de relations
I hypothese si deux paires de mots ont des representations(lignes) similaires alors elles sont similaires X of Y Y of X X forY Y for X X to Y et Y to X
I une liste de 64 mots comme of for ou toI formant 128 patrons (colonnes) contenant la paire (XY)
[Turney 2005]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Au menu
I un modele vedette Word2Vec [Mikolov et al 2013a]
I proprietes des embeddings [Mikolov et al 2013d Mikolov et al 2013c]
I des resultats glory [Baroni et al 2014] moderation[Levy et al 2015]
I cool works [Faruqui and Dyer 2015 Faruqui et al 2015bFaruqui et al 2015a]
I modeles bilingues [Mikolov et al 2013b Chandar et al 2014Gouws et al 2015 Coulmance et al 2016]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Meta-embeddings
I idea: can several vector representations be combined into new, more effective ones?
I 2 simple but nonetheless useful approaches (better results than the individual representations):
I concatenate the representations [Bollegala and Bao, 2018]
I average them (normalize, and pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
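Both approaches fit in a few lines; the sketch below normalizes each source embedding and, for averaging, zero-pads the shorter ones to the largest dimension (the two toy source vectors are illustrative):

```python
import numpy as np

def concat_meta(embs):
    """Meta-embedding by concatenating the (L2-normalized) source embeddings."""
    return np.concatenate([e / np.linalg.norm(e) for e in embs])

def avg_meta(embs):
    """Meta-embedding by averaging: normalize each source embedding,
    zero-pad the lower-dimensional ones, then take the mean."""
    d = max(len(e) for e in embs)
    padded = [np.pad(e / np.linalg.norm(e), (0, d - len(e))) for e in embs]
    return np.mean(padded, axis=0)

# two hypothetical source embeddings of one word, of different dimensions
e1 = np.array([1.0, 0.0, 0.0])       # e.g. from word2vec, d = 3
e2 = np.array([0.0, 2.0])            # e.g. from GloVe, d = 2
print(concat_meta([e1, e2]).shape)   # (5,)
print(avg_meta([e1, e2]).shape)      # (3,)
```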
Don't count, predict! [Baroni et al., 2014]
I plenty of tasks; a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]
I cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]
"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
I built from linguistic resources (WordNet, PTB, FrameNet, etc.)
I very sparse vectors
I comparable in performance to state-of-the-art distributional models trained on billions of words
I vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
(binary) features induced for film:
SYNSET.FILM.V.01, SYNSET.FILM.N.01,
HYPO:COLLAGE.FILM.N.01, HYPER:SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
color: a word–color lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
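Feature lists like these can be assembled into the sparse binary vectors the paper describes; the lexicon below is a toy stand-in reusing the slide's examples, not the real resources:

```python
# Sketch: binary vectors from a word -> linguistic-features lexicon.
lexicon = {
    "lioness":  {"SS.NOUN.ANIMAL"},
    "blood":    {"COLOR.RED"},
    "cannibal": {"POL.NEG", "EMO.DISGUST", "EMO.FEAR"},
    "love":     {"PTB.NOUN", "PTB.VERB"},
}
# one dimension per distinct feature seen in the lexicon
features = sorted({f for fs in lexicon.values() for f in fs})
f_idx = {f: i for i, f in enumerate(features)}

def binary_vector(word):
    """Very sparse 0/1 vector: 1 on each feature the resource assigns to word."""
    v = [0] * len(features)
    for f in lexicon.get(word, ()):
        v[f_idx[f]] = 1
    return v

print(binary_vector("cannibal"))
```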
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
I note: hard to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
I Skip-gram pre-trained on 300B words [Mikolov et al., 2013a]
I GloVe pre-trained on 6B words [Pennington et al., 2014]
I LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
I Ling Dense: dimensionality reduction with SVD
I tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]
I a post-processing step applicable to any word vector representation
I fast (5 seconds for 100k words at dimension 300)
I idea: use the lexico-semantic information of a resource to improve an existing representation
I how: encourage each word vector to stay close to the learned representation while being close to its neighbors in the representation induced from the resource (encoded as a graph)
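A minimal sketch of the iterative retrofitting update, assuming the common setting α_i = 1 and β_ij = 1/deg(i); the graph and vectors below are toy data:

```python
import numpy as np

def retrofit(Q_hat, edges, iters=10, alpha=1.0):
    """Retrofitting sketch (after Faruqui et al., 2015a): pull each vector
    toward its neighbors in the lexical graph while staying close to the
    original vector Q_hat[i].  edges: dict word index -> neighbor indices."""
    Q = Q_hat.copy()
    for _ in range(iters):
        for i, nbrs in edges.items():
            if not nbrs:
                continue                      # no resource info: keep as is
            beta = 1.0 / len(nbrs)            # beta_ij = 1 / deg(i)
            num = alpha * Q_hat[i] + beta * sum(Q[j] for j in nbrs)
            Q[i] = num / (alpha + beta * len(nbrs))
    return Q

# toy example: words 0 and 1 are synonyms in the resource, word 2 is isolated
Q_hat = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
Q = retrofit(Q_hat, {0: [1], 1: [0], 2: []})
```

After retrofitting, the two synonyms end up closer together while the isolated word is untouched.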
A community getting organized [Faruqui and Dyer, 2014]
I already-trained embeddings
I a suite of runnable tests (similarity, analogy, completion, etc.)
I a visualization interface
I note: not certain the site is very popular (or still updated) at the moment
I http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
Mikolov strikes again [Mikolov et al., 2013b]
I a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):
Ŵ = argmin_W Σ_i ||W x_i − z_i||²
where x_i and z_i denote the source-side and target-side vector representations of x_i and z_i, respectively
I W is optimized by gradient descent on a lexicon of about 5k word pairs
I at test time, a word x is translated by ẑ:
ẑ = argmax_z cos(z, W x)
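The objective above also has a closed-form least-squares minimizer, so a sketch can skip the gradient descent used in the paper; `learn_mapping` and `translate` are hypothetical helpers and the data is synthetic:

```python
import numpy as np

def learn_mapping(X, Z):
    """Solve min_W sum_i ||W x_i - z_i||^2 in closed form.
    Rows of X and Z are the paired source/target vectors."""
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)   # X @ B ~ Z
    return B.T                                   # so that z ~ W @ x

def translate(x, W, Z_vocab):
    """Index of the target word whose vector is closest (cosine) to W x."""
    q = W @ x
    Zn = Z_vocab / np.linalg.norm(Z_vocab, axis=1, keepdims=True)
    return int(np.argmax(Zn @ (q / np.linalg.norm(q))))

# synthetic check: recover a known linear map from 50 "dictionary" pairs
rng = np.random.default_rng(0)
W_true = np.array([[2.0, 0.0], [0.0, 3.0]])
X = rng.normal(size=(50, 2))
Z = X @ W_true.T
W = learn_mapping(X, Z)
```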
Mikolov strikes again [Mikolov et al., 2013b]
I the 6K most frequent source words, translated with Google Translate
I the first 5K entries are used to compute W
I the next 1K for testing
I baselines: edit distance, ε-Rapp
Mikolov strikes again [Mikolov et al., 2013b]
More data (Google News)
I same split: 5K train, 1K test
Plan
(Before Deep): the vector-space model
And then came the "Deep": Word2Vec; Analogy; Meta-embeddings; Evaluation; Interesting ideas; The bilingual case
Evaluation
On the difficulty of unbiased evaluation [Levy et al., 2015]
I compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-gram and GloVe
I studies their parameters in detail
I adapts choices made in Skip-gram to the other methods where possible
I Takeaway:
I a performance tie (no clear advantage of one approach over another)
I Skip-gram behaves better (time/memory) than the other approaches
On the difficulty of unbiased evaluation [Levy et al., 2015]
Example observation [Levy et al., 2015]
I in the co-occurrence-matrix approach, a word w and its context c are scored by:
PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]
I a common approach is to set the PMI to 0 when #(w, c) = 0 (rather than −∞)
I another is to take PPMI(w, c) = max(PMI(w, c), 0)
I adaptations of choices made in Skip-gram:
I shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
I sampling of the k negative examples (smoothed with α = 0.75):
PMI_α(w, c) = log [ p(w, c) / (p(w) p_α(c)) ]  with  p_α(c) = #(c)^α / Σ_c' #(c')^α
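These variants can be sketched over a raw count matrix; `ppmi` below is a hypothetical helper covering PPMI, the shifted variant (k > 1), and context-distribution smoothing (α < 1):

```python
import numpy as np

def ppmi(C, shift_k=1.0, alpha=1.0):
    """(Shifted, smoothed) PPMI matrix from a word x context count matrix C."""
    total = C.sum()
    pw = C.sum(axis=1, keepdims=True) / total        # p(w)
    ctx = C.sum(axis=0) ** alpha                     # smoothed context counts
    pc = (ctx / ctx.sum())[np.newaxis, :]            # p_alpha(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C / total) / (pw * pc))
    pmi[~np.isfinite(pmi)] = 0.0                     # zero cells where #(w,c) = 0
    return np.maximum(pmi - np.log(shift_k), 0.0)    # clip negatives / shift

# tiny 2-word x 2-context example
C = np.array([[8.0, 0.0], [2.0, 2.0]])
M = ppmi(C)
```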
[Schnabel et al., 2015]
I recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]
I word2vec skip-gram re-run several times with the same parameters
And what about low-frequency words? [Jakubina and Langlais, 2017]
And what about low-frequency words?

             1k-low                 1k-high
             TOP1   TOP5   TOP20    TOP1   TOP5   TOP20
embedding    2.2    6.1    11.9     21.7   34.2   44.9
context      2.0    4.3    7.6      19.0   32.7   44.3
document     0.7    2.3    5.0      —      —      —
oracle       4.6    —      19.0     31.8   —      57.6

I Wikipedia dump of June 2013 (EN: 3.5M, FR: 1.3M articles)
I V_EN = 7.3M, V_FR = 3.6M
I 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
I rare = freq < 26 (92% of the words of V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word–emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
I If A and B have almost identical environments we say thatthey are synonyms (Harris 1954)
I you shall know a word by the company it keeps (Firth 1957)I words which are similar in meaning occur in similar contexts
(Rubenstein amp Goodenough 1965)I In other words difference of meaning correlates with
difference of distribution (Harris 1970 p786)I words with similar meanings will occur with similar neighbors if
enough text material is available (Schutze amp Pedersen 1995)I a representation that captures much of how words are used in
natural context will capture much of what we mean by meaning(Landauer amp Dumais 1997)
I in the proposed model it will so generalize because ldquosimilarrdquowords are expected to have a similar feature vector andbecause the probability function is a smooth function of thesefeature values a small change in the features will induce a smallchange in the probability (Bengio et al 2003)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Modele vectoriel (Vector Space model)
I lire [Turney and Pantel 2010] pour une introduction
I lire [Baroni and Lenci 2010] pour une generalisation (tenseur)
1 une matrice de ldquocomptesrdquo de co-occurences2 un schema de ponderation (PMI LLR etc)3 une politique de reduction de dimensionnalite
singular value decomposition [Golub and Van Loan 1996]non-negative matrix factorization [Lee and Seung 1999]aucune (tres bon baseline)etc
I DISSECT offre les etapes 2 et 3
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesdocument
I similarite de documents
I hypothese bag of word si une requete et un document ont desrepresentations (colonnes) similaires alors ils vehiculent lameme information [Salton 1975]
I implemente (par exemple) dans Lucene
Pris de [Jurafsky and Martin 2015]felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesterme
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesrel
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termestimespatron
I similarite de relations
I hypothese si deux paires de mots ont des representations(lignes) similaires alors elles sont similaires X of Y Y of X X forY Y for X X to Y et Y to X
I une liste de 64 mots comme of for ou toI formant 128 patrons (colonnes) contenant la paire (XY)
[Turney 2005]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Au menu
I un modele vedette Word2Vec [Mikolov et al 2013a]
I proprietes des embeddings [Mikolov et al 2013d Mikolov et al 2013c]
I des resultats glory [Baroni et al 2014] moderation[Levy et al 2015]
I cool works [Faruqui and Dyer 2015 Faruqui et al 2015bFaruqui et al 2015a]
I modeles bilingues [Mikolov et al 2013b Chandar et al 2014Gouws et al 2015 Coulmance et al 2016]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
I If A and B have almost identical environments we say thatthey are synonyms (Harris 1954)
I you shall know a word by the company it keeps (Firth 1957)I words which are similar in meaning occur in similar contexts
(Rubenstein amp Goodenough 1965)I In other words difference of meaning correlates with
difference of distribution (Harris 1970 p786)I words with similar meanings will occur with similar neighbors if
enough text material is available (Schutze amp Pedersen 1995)I a representation that captures much of how words are used in
natural context will capture much of what we mean by meaning(Landauer amp Dumais 1997)
I in the proposed model it will so generalize because ldquosimilarrdquowords are expected to have a similar feature vector andbecause the probability function is a smooth function of thesefeature values a small change in the features will induce a smallchange in the probability (Bengio et al 2003)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Modele vectoriel (Vector Space model)
I lire [Turney and Pantel 2010] pour une introduction
I lire [Baroni and Lenci 2010] pour une generalisation (tenseur)
1 une matrice de ldquocomptesrdquo de co-occurences2 un schema de ponderation (PMI LLR etc)3 une politique de reduction de dimensionnalite
singular value decomposition [Golub and Van Loan 1996]non-negative matrix factorization [Lee and Seung 1999]aucune (tres bon baseline)etc
I DISSECT offre les etapes 2 et 3
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesdocument
I similarite de documents
I hypothese bag of word si une requete et un document ont desrepresentations (colonnes) similaires alors ils vehiculent lameme information [Salton 1975]
I implemente (par exemple) dans Lucene
Pris de [Jurafsky and Martin 2015]felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesterme
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesrel
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termestimespatron
I similarite de relations
I hypothese si deux paires de mots ont des representations(lignes) similaires alors elles sont similaires X of Y Y of X X forY Y for X X to Y et Y to X
I une liste de 64 mots comme of for ou toI formant 128 patrons (colonnes) contenant la paire (XY)
[Turney 2005]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Au menu
I un modele vedette Word2Vec [Mikolov et al 2013a]
I proprietes des embeddings [Mikolov et al 2013d Mikolov et al 2013c]
I des resultats glory [Baroni et al 2014] moderation[Levy et al 2015]
I cool works [Faruqui and Dyer 2015 Faruqui et al 2015bFaruqui et al 2015a]
I modeles bilingues [Mikolov et al 2013b Chandar et al 2014Gouws et al 2015 Coulmance et al 2016]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
Modele vectoriel (Vector Space model)
I lire [Turney and Pantel 2010] pour une introduction
I lire [Baroni and Lenci 2010] pour une generalisation (tenseur)
1 une matrice de ldquocomptesrdquo de co-occurences2 un schema de ponderation (PMI LLR etc)3 une politique de reduction de dimensionnalite
singular value decomposition [Golub and Van Loan 1996]non-negative matrix factorization [Lee and Seung 1999]aucune (tres bon baseline)etc
I DISSECT offre les etapes 2 et 3
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesdocument
I similarite de documents
I hypothese bag of word si une requete et un document ont desrepresentations (colonnes) similaires alors ils vehiculent lameme information [Salton 1975]
I implemente (par exemple) dans Lucene
Pris de [Jurafsky and Martin 2015]felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesterme
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termetimesrel
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termestimespatron
I similarite de relations
I hypothese si deux paires de mots ont des representations(lignes) similaires alors elles sont similaires X of Y Y of X X forY Y for X X to Y et Y to X
I une liste de 64 mots comme of for ou toI formant 128 patrons (colonnes) contenant la paire (XY)
[Turney 2005]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Au menu
I un modele vedette Word2Vec [Mikolov et al 2013a]
I proprietes des embeddings [Mikolov et al 2013d Mikolov et al 2013c]
I des resultats glory [Baroni et al 2014] moderation[Levy et al 2015]
I cool works [Faruqui and Dyer 2015 Faruqui et al 2015bFaruqui et al 2015a]
I modeles bilingues [Mikolov et al 2013b Chandar et al 2014Gouws et al 2015 Coulmance et al 2016]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Other pre-trained embeddings

- Polyglot [Al-Rfou et al., 2013]
  - 100 languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al., 2016]
  - 294 languages (Wikipedia)
  - skip-gram where words are represented as bags of character n-grams, so an embedding can be computed even for an unknown word
Other pre-trained embeddings

- GloVe [Pennington et al., 2014]
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Analogical arithmetic on representations [Mikolov et al., 2013d]

- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- makes it possible to solve analogy equations [x : y :: z : ?]
  1. compute t = vec(y) − vec(x) + vec(z), the target vector
  2. search V for the word t* closest to t:

$$t^* = \arg\max_{w} \frac{vec(w) \cdot vec(t)}{\|vec(w)\| \times \|vec(t)\|}$$
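In code, solving [x : y :: z : ?] is a nearest-neighbor search under cosine similarity. A toy sketch; the 2-d embedding table E below is illustrative, not the pre-trained vectors:

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def solve_analogy(E, x, y, z):
    """Return the word w maximizing cosine(vec(w), vec(y) - vec(x) + vec(z)).
    The query words themselves are excluded, as is standard practice."""
    t = E[y] - E[x] + E[z]
    candidates = [w for w in E if w not in {x, y, z}]
    return max(candidates, key=lambda w: cosine(E[w], t))

# toy vectors chosen so the "capital-of" offset is consistent
E = {"madrid": np.array([1.0, 1.0]), "spain":  np.array([1.0, 0.0]),
     "paris":  np.array([2.0, 1.0]), "france": np.array([2.0, 0.0]),
     "berlin": np.array([0.0, 1.0])}
```

Here `solve_analogy(E, "spain", "madrid", "france")` computes t = [2, 1] and ranks "paris" first.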
[Mikolov et al 2013d]
- RNN trained on 320M words (|V| = 82k)
- test set of 8k analogies involving the most frequent words
[Mikolov et al 2013c]
- 6B words of Google News, 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]
[Mikolov et al 2013c]
- comparison with other proposed models
[Mikolov et al 2013c]
- Big Data (more data, higher dimensionality)
Meta-embeddings

- idea: can several vector representations be combined into new, more effective ones?
- 2 simple yet useful approaches (better results than the individual representations):
  - concatenate the representations [Bollegala and Bao, 2018]
  - average them (normalize, pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
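Both combinations are a few lines each. A sketch, assuming two source embeddings u and v of the same word; padding and normalization follow the description above:

```python
import numpy as np

def concat_meta(u, v):
    """Meta-embedding by concatenation [Bollegala and Bao, 2018]."""
    return np.concatenate([u, v])

def avg_meta(u, v):
    """Meta-embedding by averaging [Coates and Bollegala, 2018]:
    L2-normalize each source, zero-pad the shorter one, then average."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    d = max(len(u), len(v))
    u = np.pad(u, (0, d - len(u)))
    v = np.pad(v, (0, d - len(v)))
    return (u + v) / 2.0
```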
Don't count, predict! [Baroni et al., 2014]

- many tasks, a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]

- cnt = count vector, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

(binary) features induced for film:
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO:COLLAGE_FILM.N.01, HYPER:SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: word-color lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- note: hard to do for all languages
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- Skip-gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any word vector representation
- fast (5 seconds for 100k words of dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage the new vectors to stay close to the learned representation while also being close to the representation induced by the resource (encoded as a graph)
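The method admits a simple iterative update: each vector moves toward the average of its neighbors in the lexicon graph while being pulled back to its original value. A sketch in that spirit; the uniform weights alpha and beta are a simplification of the paper's formulation:

```python
import numpy as np

def retrofit(Q_hat, edges, iters=10, alpha=1.0, beta=1.0):
    """Retrofitting sketch in the spirit of [Faruqui et al., 2015a].
    Q_hat: dict word -> original vector; edges: dict word -> neighbor list."""
    Q = {w: v.copy() for w, v in Q_hat.items()}
    for _ in range(iters):
        for w, nbrs in edges.items():
            nbrs = [n for n in nbrs if n in Q]
            if not nbrs:
                continue
            # closed-form update of the convex objective for one vector:
            # weighted average of the original vector and the neighbors
            num = alpha * Q_hat[w] + beta * sum(Q[n] for n in nbrs)
            Q[w] = num / (alpha + beta * len(nbrs))
    return Q
```

For two synonyms with original vectors (0, 0) and (2, 0), the iteration converges to (2/3, 0) and (4/3, 0): the vectors are drawn together without collapsing onto each other.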
A community getting organized [Faruqui and Dyer, 2014]

- pre-trained embeddings
- a suite of runnable tests (similarity, analogy, completion, etc.)
- a visualization interface
- note: not sure the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al 2013b]
- a linear transformation (rotation + scaling) from one space to the other can be learned from a bilingual lexicon (x_i, z_i):

$$W^* = \arg\min_{W} \sum_i \|W x_i - z_i\|^2$$

where x_i and z_i denote, respectively, the source-side vector of x_i and the target-side vector of z_i

- W is optimized by gradient descent over a lexicon of about 5k word pairs
- at test time, a word x is translated as z*:

$$z^* = \arg\max_{z} \cos(z, W x)$$
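The mapping W also has a closed-form least-squares solution, which this sketch uses instead of the paper's gradient descent; the vectors below are toys, but any real source/target embeddings plug in the same way:

```python
import numpy as np

def learn_mapping(X, Z):
    """Least-squares W minimizing sum_i ||W x_i - z_i||^2.
    X: (n, d_src) source vectors; Z: (n, d_tgt) target vectors."""
    A, *_ = np.linalg.lstsq(X, Z, rcond=None)  # solves X @ A ~= Z
    return A.T                                 # so that W @ x ~= z

def translate(x, W, tgt_words, tgt_vecs):
    """Translate x as the target word maximizing cos(z, W x)."""
    y = W @ x
    sims = (tgt_vecs @ y) / (np.linalg.norm(tgt_vecs, axis=1) * np.linalg.norm(y))
    return tgt_words[int(np.argmax(sims))]
```

When the two spaces are identical, the learned W is (numerically) the identity and each word translates to its own nearest neighbor on the target side.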
Mikolov strikes again [Mikolov et al 2013b]
- the 6K most frequent source words, translated with Google Translate
- the first 5K entries used to compute W
- the next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)

- same split: 5K train, 1K test
Plan
(Before Deep) the vector space model

And then came the "Deep": Word2Vec; Analogy; Meta-embeddings; Evaluation; Interesting ideas; The bilingual case

Evaluation
On the difficulty of unbiased evaluation [Levy et al., 2015]

- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-gram and GloVe
- study their parameters in detail
- adapt choices made in Skip-gram to the other methods where possible
- Takeaways:
  - a performance tie (no clear advantage of one approach over the others)
  - Skip-gram behaves better (time/memory) than the other approaches
Example observation [Levy et al., 2015]

- in the co-occurrence-matrix approach, the association between a word w and a context c is written:

$$PMI(w,c) = \log \frac{p(w,c)}{p(w)\,p(c)}$$

- a common practice is to set PMI values to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adapting choices made in Skip-gram:
  - $SPPMI(w,c) = \max(PMI(w,c) - \log k,\ 0)$
  - sampling of the k negative examples (smoothed with α = 0.75):

$$PMI_{\alpha}(w,c) = \log \frac{p(w,c)}{p(w)\,P_{\alpha}(c)} \quad \text{with} \quad P_{\alpha}(c) = \frac{\#(c)^{\alpha}}{\sum_{c'} \#(c')^{\alpha}}$$
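These variants are straightforward to compute from a count matrix. A sketch of (shifted) PPMI, where C[i, j] counts word i with context j; k = 1 gives plain PPMI:

```python
import numpy as np

def ppmi(C, k=1.0):
    """Shifted PPMI: max(PMI(w, c) - log k, 0) from a word-context count matrix C."""
    total = C.sum()
    pw = C.sum(axis=1, keepdims=True) / total  # p(w), column vector
    pc = C.sum(axis=0, keepdims=True) / total  # p(c), row vector
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C / total) / (pw * pc))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts: 0 instead of -inf
    return np.maximum(pmi - np.log(k), 0.0)
```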
[Schnabel et al., 2015]

- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]

- word2vec skip-gram re-run several times with the same parameters
And what about rare words? [Jakubina and Langlais, 2017]
           |      1k-low        |      1k-high
           | TOP1  TOP5  TOP20  | TOP1  TOP5  TOP20
embedding  |  2.2   6.1   11.9  | 21.7  34.2   44.9
context    |  2.0   4.3    7.6  | 19.0  32.7   44.3
document   |  0.7   2.3    5.0  |   —     —      —
oracle     |  4.6    —    19.0  | 31.8    —    57.6
- Wikipedia dump from June 2013 (EN: 3.5M, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar, A. P. S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.
Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.
Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2, Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
term × document co-occurrence matrix

- document similarity
- bag-of-words hypothesis: if a query and a document have similar (column) representations, then they convey the same information [Salton, 1975]
- implemented (for example) in Lucene

Taken from [Jurafsky and Martin, 2015]
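A toy illustration of the idea, with made-up counts rather than the textbook's example: documents are columns, a query is folded into the same term space, and cosine between columns ranks the documents.

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy term-by-document count matrix: rows = terms, columns = documents
#              d1  d2  d3
M = np.array([[2,  0,  1],    # "embedding"
              [1,  0,  2],    # "vector"
              [0,  3,  0]])   # "library"

query = np.array([1, 1, 0])   # a query mentioning "embedding" and "vector"
scores = [cosine(query, M[:, j]) for j in range(M.shape[1])]
best = int(np.argmax(scores))  # d2 shares no terms with the query and scores 0
```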
term × term co-occurrence matrix

- term similarity
- distributional hypothesis: if two terms have similar (row) representations, then they are similar

Taken from [Jurafsky and Martin, 2015]
term × rel co-occurrence matrix

- term similarity
- distributional hypothesis: if two terms have similar (row) representations, then they are similar

Taken from [Jurafsky and Martin, 2015]
word-pair × pattern co-occurrence matrix

- relational similarity
- hypothesis: if two word pairs have similar (row) representations, then the pairs are similar; patterns: X of Y, Y of X, X for Y, Y for X, X to Y and Y to X
- a list of 64 words such as of, for or to
- forming 128 patterns (columns) containing the pair (X, Y)

[Turney, 2005]
Plan
(Before Deep) the vector space model

And then came the "Deep": Word2Vec; Analogy; Meta-embeddings; Evaluation; Interesting ideas; The bilingual case

Evaluation
On the menu

- a star model: Word2Vec [Mikolov et al., 2013a]
- properties of the embeddings [Mikolov et al., 2013d, Mikolov et al., 2013c]
- results: glory [Baroni et al., 2014], moderation [Levy et al., 2015]
- cool works [Faruqui and Dyer, 2015, Faruqui et al., 2015b, Faruqui et al., 2015a]
- bilingual models [Mikolov et al., 2013b, Chandar et al., 2014, Gouws et al., 2015, Coulmance et al., 2016]
A revolution among the "distributionalists": Word2Vec [Mikolov et al., 2013a]

- a fast toolkit implementing the two models
  - https://code.google.com/archive/p/word2vec
  - https://radimrehurek.com/gensim/models/word2vec.html
  - https://github.com/dav/word2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
term×term co-occurrence matrix

- term similarity
- distributional hypothesis: if two terms have similar representations (rows), then they are similar

(figure taken from [Jurafsky and Martin, 2015])
term×rel co-occurrence matrix

- term similarity
- distributional hypothesis: if two terms have similar representations (rows), then they are similar

(figure taken from [Jurafsky and Martin, 2015])
terms×pattern co-occurrence matrix [Turney, 2005]

- relation similarity
- hypothesis: if two word pairs have similar representations (rows), then they are similar
- patterns: X of Y, Y of X, X for Y, Y for X, X to Y, and Y to X
- a list of 64 words such as of, for, or to, forming 128 patterns (columns) containing the pair (X, Y)
Plan

(Before Deep): the vector space model

And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case

Evaluation
On the menu

- a star model: Word2Vec [Mikolov et al., 2013a]
- properties of the embeddings [Mikolov et al., 2013d, Mikolov et al., 2013c]
- results: glory [Baroni et al., 2014] and moderation [Levy et al., 2015]
- cool works [Faruqui and Dyer, 2015, Faruqui et al., 2015b, Faruqui et al., 2015a]
- bilingual models [Mikolov et al., 2013b, Chandar et al., 2014, Gouws et al., 2015, Coulmance et al., 2016]
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al., 2013a]

- C, a training corpus, i.e. a set D of pairs (w, c) where w is a word of C and c is a word seen in its context; note: the model represents context words separately from vocabulary words
- let p(D = 1 | w, c; θ) be the probability that the pair (w, c) comes from C
- optimized by gradient descent:

  L = argmax_θ  Π_{(w,c) ∈ D} p(D = 1 | w, c; θ)  ×  Π_{(w,c) ∈ D'} (1 − p(D = 1 | w, c; θ))

  where v_c (resp. v_w) denotes the vector of c (resp. w)
- D' is built by drawing k pairs at random according to the unigram distributions (of words and of context words)
Skip-gram [Mikolov et al., 2013a]

- setting σ(x) = 1 / (1 + e^{−x}) and p(D = 1 | w, c; θ) = σ(v_c · v_w), then

  L = argmax_θ  Σ_{(w,c) ∈ D} log σ(v_c · v_w)  +  Σ_{(w,c) ∈ D'} log σ(−v_c · v_w)

- contexts are defined by a window centered on the word w under consideration, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (randomly removed from C) and rare words are eliminated (cut-off)
- it works! (read [Levy and Goldberg, 2014] for an explanation)
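The summed objective can be illustrated for a single (w, c) pair with its k sampled negatives. This is a sketch with names of my choosing, not the word2vec implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_pair_loss(v_w, v_c, neg_ctx):
    """Negative-sampling objective for one (word, context) pair.

    v_w: word vector; v_c: context vector (kept in a separate table,
    as noted on the slide); neg_ctx: list of k context vectors drawn
    from the (smoothed) unigram distribution, playing the role of D'.
    Returns the term log σ(v_c·v_w) + Σ_n log σ(−v_n·v_w) to maximize.
    """
    pos = np.log(sigmoid(v_c @ v_w))
    neg = sum(np.log(sigmoid(-v_n @ v_w)) for v_n in neg_ctx)
    return pos + neg
```

With a zero word vector, every σ(±v·v_w) is 0.5, so the objective is (1 + k) log 0.5; training pushes it upward by separating true pairs from sampled ones.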
Other pre-trained embeddings

- Polyglot [Al-Rfou et al., 2013]:
  - 100+ languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al., 2016]:
  - 294 languages (Wikipedia)
  - skip-gram where words are represented by bags of character n-grams, so an embedding can be computed even for an unknown word
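The bag-of-n-grams idea can be sketched as follows (a simplified extractor; FastText additionally hashes n-grams into a fixed number of buckets):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Bag of character n-grams as in FastText: '<' and '>' mark the
    word boundaries, and the full delimited word is kept as one extra
    unit. The word vector is then the sum of its n-gram vectors, which
    is how an out-of-vocabulary word still gets an embedding."""
    w = f"<{word}>"
    grams = {w[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)
    return grams
```

For example, with n = 3 only, "where" yields the trigrams <wh, whe, her, ere, re> plus the special unit <where>.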
Other pre-trained embeddings

- GloVe [Pennington et al., 2014]:
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Analogical arithmetic on representations [Mikolov et al., 2013d]

- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- allows solving analogy equations [x : y :: z : ?]:
  1. compute t = vec(y) − vec(x) + vec(z), the target vector
  2. search V for the word t* closest to t:

     t* = argmax_w  (vec(w) · t) / (||vec(w)|| × ||t||)
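Steps 1 and 2 can be sketched as a brute-force cosine search (the function name and toy vectors are illustrative, and the three query words are excluded from the candidates, as is standard in analogy evaluation):

```python
import numpy as np

def solve_analogy(E, x, y, z):
    """Return the word w maximizing cosine(vec(w), vec(y) - vec(x) + vec(z)).

    E: dict mapping word -> 1-D numpy vector.
    """
    t = E[y] - E[x] + E[z]          # target vector
    t = t / np.linalg.norm(t)       # normalize once; cosine = dot / ||vec(w)||
    best, best_sim = None, -np.inf
    for w, v in E.items():
        if w in (x, y, z):          # skip the query words themselves
            continue
        sim = (v @ t) / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = w, sim
    return best
```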
[Mikolov et al., 2013d]

- RNN trained on 320M words (V = 82k)
- test set of 8k analogies involving the most frequent words
[Mikolov et al., 2013c]

- 6B words of Google News, 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]
[Mikolov et al., 2013c]

- comparison with other proposed models
[Mikolov et al., 2013c]

- Big Data (more data, higher dimensionality)
Meta-embeddings

- idea: can several vector representations be combined to create new, more effective ones?
- 2 simple yet useful approaches (better results than the individual representations):
  - concatenating the representations [Bollegala and Bao, 2018]
  - averaging them (normalize, then pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
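The averaging variant can be sketched as follows, assuming L2 normalization and zero-padding as described on the slide (the function name is mine):

```python
import numpy as np

def avg_meta_embedding(vectors):
    """Average several source embeddings of the same word: each vector is
    L2-normalized, zero-padded to the largest dimensionality, then the
    padded vectors are averaged component-wise."""
    d = max(len(v) for v in vectors)
    padded = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        v = v / np.linalg.norm(v)            # normalize each source space
        padded.append(np.pad(v, (0, d - len(v))))  # pad with 0s up to d
    return np.mean(padded, axis=0)
```

Concatenation, the other option, simply stacks the (normalized) source vectors into one longer vector.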
Don't count, predict! [Baroni et al., 2014]

- many tasks, a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]

- cnt = count vector, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- (binary) features induced for film:
  SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO:COLLAGE.FILM.N.01, HYPER:SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- supersenses: for nouns, verbs, and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word-colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST, EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- note: hard to build for all languages
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA, obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words and dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage each word vector to stay close to its learned representation while moving closer to its neighbours in the resource (encoded as a graph)
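A minimal sketch of such a retrofitting update, with in-place (Gauss-Seidel-style) iterations and uniform weights rather than the degree-dependent ones used in the paper:

```python
import numpy as np

def retrofit(Q_hat, edges, n_iters=10, alpha=1.0, beta=1.0):
    """Pull each vector toward the average of its lexicon neighbours
    while staying close to its original (distributional) vector.

    Q_hat: dict word -> original vector; edges: dict word -> list of
    neighbour words from the lexical resource; alpha/beta weight the
    distributional and graph terms (uniform here, a simplification of
    [Faruqui et al., 2015a]).
    """
    Q_hat = {w: np.asarray(v, dtype=float) for w, v in Q_hat.items()}
    Q = {w: v.copy() for w, v in Q_hat.items()}   # initialize at Q_hat
    for _ in range(n_iters):
        for w, neigh in edges.items():
            nb = [Q[u] for u in neigh if u in Q]
            if not nb:
                continue
            # closed-form minimizer of the local quadratic objective
            Q[w] = (alpha * Q_hat[w] + beta * sum(nb)) / (alpha + beta * len(nb))
    return Q
```

On a toy graph where two synonyms start far apart, a few iterations visibly pull their vectors together without collapsing them onto each other.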
A community getting organized [Faruqui and Dyer, 2014]

- pre-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]

- one can learn a linear transformation (rotation + scaling) from one space to another, given a bilingual lexicon (x_i, z_i):

  W* = argmin_W  Σ_i ||W x_i − z_i||²

  where x_i and z_i denote the source representation of x_i and the target representation of z_i, respectively
- W optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, translate a word x by z*:

  z* = argmax_z  cos(z, W x)
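A sketch of learning the mapping and of test-time translation. The slide optimizes W by gradient descent; this sketch uses the closed-form least-squares solution instead, which reaches the same minimum of the same objective (names and toy vocabulary are mine):

```python
import numpy as np

def learn_mapping(X, Z):
    """Least-squares linear map W minimizing sum_i ||W x_i - z_i||^2.

    X: (n, d_src) source vectors of the seed lexicon;
    Z: (n, d_tgt) corresponding target vectors.
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)  # solves X @ W ~= Z row-wise
    return W.T                                  # so that z ~= W @ x

def translate(x, W, tgt_vocab):
    """Pick the target word whose vector has the highest cosine with W x."""
    q = W @ x
    q = q / np.linalg.norm(q)
    return max(tgt_vocab,
               key=lambda z: (tgt_vocab[z] @ q) / np.linalg.norm(tgt_vocab[z]))
```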
Mikolov strikes again [Mikolov et al., 2013b]

- the 6K most frequent source words, translated with Google Translate
- the first 5K entries used to compute W
- the next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)

- same split: 5K train, 1K test
Plan

(Before Deep): the vector space model

And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case

Evaluation
On the difficulty of unbiased evaluation [Levy et al., 2015]

- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- study their parameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- Summary:
  - a tie in performance (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
matrice de co-occurence termetimesrel
I similarite de termes
I hypothese distributionnelle si deux termes ont desrepresentations (lignes) similaires alors il sont similaires
a
a Pris de [Jurafsky and Martin 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
matrice de co-occurence termestimespatron
I similarite de relations
I hypothese si deux paires de mots ont des representations(lignes) similaires alors elles sont similaires X of Y Y of X X forY Y for X X to Y et Y to X
I une liste de 64 mots comme of for ou toI formant 128 patrons (colonnes) contenant la paire (XY)
[Turney 2005]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Au menu
I un modele vedette Word2Vec [Mikolov et al 2013a]
I proprietes des embeddings [Mikolov et al 2013d Mikolov et al 2013c]
I des resultats glory [Baroni et al 2014] moderation[Levy et al 2015]
I cool works [Faruqui and Dyer 2015 Faruqui et al 2015bFaruqui et al 2015a]
I modeles bilingues [Mikolov et al 2013b Chandar et al 2014Gouws et al 2015 Coulmance et al 2016]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Don't count, predict! [Baroni et al., 2014]

- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

(binary) features induced for "film":
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO.COLLAGE_FILM.N.01, HYPER.SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word-color lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- note: hard to replicate for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how? encourage each word's vector to stay close to its learned (distributional) representation while also being close to its neighbors in the resource (encoded as a graph)
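The trade-off above has a simple iterative solution. The sketch below implements the commonly described coordinate update; the alpha/beta weights and the toy synonym graph are illustrative assumptions, not values taken from the slides:

```python
import numpy as np

def retrofit(E, graph, iters=10, alpha=1.0, beta=1.0):
    """Sketch of retrofitting [Faruqui et al., 2015a]: each vector is pulled
    toward the average of its lexicon neighbors while staying anchored to
    its original (distributional) position.
    E: dict word -> vector; graph: dict word -> list of neighbor words.
    alpha/beta are illustrative weights (the paper uses related heuristics)."""
    Q = {w: v.astype(float).copy() for w, v in E.items()}
    for _ in range(iters):
        for w, nbrs in graph.items():
            nbrs = [n for n in nbrs if n in Q]
            if not nbrs:
                continue
            # closed-form update of the quadratic objective for word w:
            # weighted mean of the original vector and the current neighbors
            Q[w] = (alpha * E[w] + beta * sum(Q[n] for n in nbrs)) / (
                alpha + beta * len(nbrs))
    return Q

E = {"big": np.array([1.0, 0.0]), "large": np.array([0.0, 1.0])}
graph = {"big": ["large"], "large": ["big"]}  # one synonym edge from the lexicon
Q = retrofit(E, graph)
# the two synonyms end up closer than in the original space
```

Because every update is a convex combination, the procedure converges quickly, which is consistent with the "5 seconds for 100k words" figure quoted above.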
A community getting organized [Faruqui and Dyer, 2014]

- pre-trained embeddings
- a suite of ready-to-run tests (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
Mikolov strikes again [Mikolov et al., 2013b]

- a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):

  W* = argmin_W  Σ_i ||W x_i - z_i||²

  where x_i and z_i denote the source-language vector of x_i and the target-language vector of z_i, respectively
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by the word z*:

  z* = argmax_z  cos(vec(z), W vec(x))
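A least-squares sketch of the mapping. The synthetic vectors and the exactly linear cross-lingual relation are assumptions, used only to check that the estimator recovers W; the paper optimizes the same objective by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 seed translation pairs (x_i, z_i) in 4-d spaces,
# generated so that the true cross-lingual relation IS a linear map W_true.
X = rng.normal(size=(50, 4))      # source-language vectors x_i
W_true = rng.normal(size=(4, 4))
Z = X @ W_true.T                  # target-language vectors z_i = W_true x_i

# Closed-form least-squares solution of min_W sum_i ||W x_i - z_i||^2
# (here the optimum coincides with what gradient descent would find).
W = np.linalg.lstsq(X, Z, rcond=None)[0].T

def translate(x, target_vocab, target_E):
    """Translate x by the target word z maximizing cos(vec(z), W x)."""
    m = W @ x
    sims = (target_E @ m) / (np.linalg.norm(target_E, axis=1) * np.linalg.norm(m))
    return target_vocab[int(np.argmax(sims))]

word0 = translate(X[0], ["z0", "z1"], Z[:2])
```

In this noise-free setting the recovered W matches W_true; with real embeddings the fit is only approximate, which is why the cosine-based retrieval step matters.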
Mikolov strikes again [Mikolov et al., 2013b]

- the 6K most frequent source words, translated with Google Translate
- the first 5K entries are used to compute W
- the next 1K are used for testing
- baselines: edit distance, ε-Rapp
Mikolov strikes again [Mikolov et al., 2013b]
More data (Google News)

- same split: 5K train, 1K test
Plan

- (Before Deep) the vector-space model
- And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case
- Evaluation
On the difficulty of evaluating without bias [Levy et al., 2015]

- compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- studies their parameters in detail
- adapts choices made in Skip-Gram to the other methods where possible
- takeaway:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
On the difficulty of evaluating without bias [Levy et al., 2015]
An example observation [Levy et al., 2015]

- in the co-occurrence-matrix approach, a word w and its context c are scored by

  PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]

- one common practice is to set the PMI value to 0 when count(w, c) = 0 (rather than -∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) - log k, 0)
  - sampling of the k negative examples, smoothed with α = 0.75:

    PMI_α(w, c) = log [ p(w, c) / (p(w) p_α(c)) ]   with   p_α(c) = count(c)^α / Σ_c' count(c')^α
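All of these variants can be computed in a few lines from a raw count matrix; the tiny matrix below is illustrative:

```python
import numpy as np

def ppmi(C, k=1, alpha=1.0):
    """(Shifted, smoothed) positive PMI from a word-by-context count matrix C.
    k = 1, alpha = 1.0 gives plain PPMI; k > 1 gives SPPMI = max(PMI - log k, 0);
    alpha = 0.75 smooths the context distribution as in SGNS negative sampling."""
    total = C.sum()
    pw = C.sum(axis=1, keepdims=True) / total     # p(w)
    pc = C.sum(axis=0, keepdims=True) ** alpha    # count(c)^alpha ...
    pc = pc / pc.sum()                            # ... normalized to p_alpha(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C / total) / (pw * pc))
    pmi[C == 0] = 0.0  # convention: 0 rather than -inf for unseen pairs
    return np.maximum(pmi - np.log(k), 0.0)

C = np.array([[10.0, 0.0],
              [2.0, 8.0]])  # tiny illustrative count matrix
M = ppmi(C)
```

Setting k to the number of negative samples makes the count-based matrix mimic the objective that SGNS implicitly factorizes [Levy and Goldberg, 2014].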
[Schnabel et al., 2015]

- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]

- word2vec skip-gram re-run several times with the same parameters
What about infrequent words? [Jakubina and Langlais, 2017]
What about infrequent words?

            |       1k-low        |       1k-high
            | TOP1  TOP5  TOP20   | TOP1  TOP5  TOP20
  embedding |  2.2   6.1   11.9   | 21.7  34.2  44.9
  context   |  2.0   4.3    7.6   | 19.0  32.7  44.3
  document  |  0.7   2.3    5.0   |   -     -     -
  oracle    |  4.6    -    19.0   | 31.8    -   57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
References

Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183-192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107-119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238-247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673-721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650-1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding: Computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194-198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493-2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605-611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177-2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211-225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97-106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In EMNLP, pages 298-307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141-188.
A term×pattern co-occurrence matrix [Turney, 2005]

- relational similarity
- hypothesis: if two word pairs have similar representations (rows), then the pairs are similar; patterns such as X of Y, Y of X, X for Y, Y for X, X to Y and Y to X
- a list of 64 words such as of, for or to
- forming 128 patterns (columns) containing the pair (X, Y)
Plan

- (Before Deep) the vector-space model
- And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case
- Evaluation
On the menu

- a star model: Word2Vec [Mikolov et al., 2013a]
- properties of the embeddings [Mikolov et al., 2013d, Mikolov et al., 2013c]
- results: glory [Baroni et al., 2014], then moderation [Levy et al., 2015]
- cool works [Faruqui and Dyer, 2015, Faruqui et al., 2015b, Faruqui et al., 2015a]
- bilingual models [Mikolov et al., 2013b, Chandar et al., 2014, Gouws et al., 2015, Coulmance et al., 2016]
A revolution among the "distributionalists": Word2Vec [Mikolov et al., 2013a]

- a fast toolkit implementing two models:
  - https://code.google.com/archive/p/word2vec
  - https://radimrehurek.com/gensim/models/word2vec.html
  - https://github.com/dav/word2vec
- pre-trained embeddings available, trained on 6B words of Google News (180K words), dimension = 300
- directly usable in many applications
The 2 models of Word2Vec [Mikolov et al., 2013a]

- Skip-gram is the most popular (more reliable on "small" corpora)
- CBOW is faster (well suited to large corpora)
Skip-gram [Mikolov et al., 2013a]

- C: a training corpus, i.e. a set D of pairs (w, c) where w is a word of C and c is a word seen in its context. Note: the model represents context words separately from vocabulary words
- is a pair (w, c) drawn from D? let p(D = 1 | w, c; θ) be the associated probability
- optimized by gradient descent:

  L = argmax_θ  Π_{(w,c) ∈ D} p(D = 1 | w, c; θ)  ×  Π_{(w,c) ∈ D'} [1 - p(D = 1 | w, c; θ)]

  where v_c (resp. v_w) is the vector of c (resp. w)
- D' is built by sampling k pairs at random according to the unigram distributions (of words and of context words)
Skip-gram [Mikolov et al., 2013a]

- setting σ(x) = 1 / (1 + e^(-x)) and p(D = 1 | w, c; θ) = σ(v_c · v_w), we get

  L = argmax_θ  Σ_{(w,c) ∈ D} log σ(v_c · v_w)  +  Σ_{(w,c) ∈ D'} log σ(-v_c · v_w)

- contexts are defined by a window centered on the word w under consideration, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (randomly removed from C) and infrequent words are eliminated (cut-off)
- it works! (read [Levy and Goldberg, 2014] for an explanation)
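The per-pair term of this objective is easy to write down. The vectors below are synthetic; this is a sketch of the negative-sampling loss, not of the full word2vec training loop:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_term(vw, vc, neg_contexts):
    """Contribution of one (w, c) pair to the objective L:
    log sigma(v_c . v_w) + sum over the k sampled negatives of log sigma(-v_c' . v_w)."""
    pos = np.log(sigmoid(vc @ vw))
    neg = sum(np.log(sigmoid(-v @ vw)) for v in neg_contexts)
    return pos + neg

rng = np.random.default_rng(1)
vw = rng.normal(size=8)                 # word vector
good_c = vw + 0.1 * rng.normal(size=8)  # a plausible (observed) context vector
bad_c = -vw                             # a clearly wrong context vector
negs = [rng.normal(size=8) for _ in range(5)]  # k = 5 sampled negatives
# the objective (a sum of log-probabilities, hence <= 0) is maximized,
# so it prefers contexts that actually co-occur with w
```

Maximizing this term pushes v_w toward its observed contexts and away from the k sampled negatives, which is exactly the sigmoid formulation of L above.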
Other pre-trained embeddings

- Polyglot [Al-Rfou et al., 2013]
  - 100 languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al., 2016]
  - 294 languages (Wikipedia)
  - skip-gram in which words are represented by bags of character n-grams, so an embedding can be computed even for an unknown word
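The subword inventory of a fastText word can be sketched as follows (boundary markers < and > as in the paper; fixing n = 3 is just for the example):

```python
def char_ngrams(word, nmin=3, nmax=6):
    """Character n-grams with boundary markers, fastText-style; the word
    itself (with boundaries) is also kept as a special feature."""
    w = "<" + word + ">"
    grams = {w}
    for n in range(nmin, nmax + 1):
        for i in range(len(w) - n + 1):
            grams.add(w[i:i + n])
    return grams

# the vector of a word is the sum of the vectors of its n-grams, so an
# out-of-vocabulary word still gets an embedding from its subwords
grams = char_ngrams("where", 3, 3)
```

Note how the boundary markers let the model distinguish the trigram "her" inside "where" from the standalone word "<her>".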
Other pre-trained embeddings

- GloVe [Pennington et al., 2014]
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of NAACL-HLT 2013.

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141–188.
Plan
- (Before Deep) the vector-space model
- And then came the "Deep": Word2Vec; analogy; meta-embeddings; evaluation; interesting ideas; the bilingual case
- Evaluation
On the menu
- a star model: Word2Vec [Mikolov et al., 2013a]
- properties of the embeddings [Mikolov et al., 2013d, Mikolov et al., 2013c]
- results: glory [Baroni et al., 2014], moderation [Levy et al., 2015]
- cool works [Faruqui and Dyer, 2015, Faruqui et al., 2015b, Faruqui et al., 2015a]
- bilingual models [Mikolov et al., 2013b, Chandar et al., 2014, Gouws et al., 2015, Coulmance et al., 2016]
A revolution among the "distributionalists": Word2Vec [Mikolov et al., 2013a]
- a fast toolkit implementing two models:
  - https://code.google.com/archive/p/word2vec
  - https://radimrehurek.com/gensim/models/word2vec.html
  - https://github.com/dav/word2vec
- pre-trained embeddings available, trained on 6B words of Google News (180K words), dimension = 300
- directly usable in many applications
The two Word2Vec models [Mikolov et al., 2013a]
- Skip-gram is the most popular (more reliable on "small" corpora)
- CBOW is faster (a good fit for large corpora)
Skip-gram [Mikolov et al., 2013a]
- C: a training corpus, i.e. a set D of pairs (w, c), where w is a word of C and c is a word seen in its context. Note: the model represents context words separately from vocabulary words.
- For a pair (w, c), let p(D = 1 | w, c; θ) be the probability that it was observed in C.
- Optimized by gradient descent:

$$\mathcal{L} = \arg\max_{\theta} \prod_{(w,c) \in D} p(D=1 \mid w,c;\theta) \prod_{(w,c) \in D'} \bigl(1 - p(D=1 \mid w,c;\theta)\bigr)$$

where v_c (resp. v_w) denotes the vector of c (resp. w).
- D' is built by sampling k pairs at random according to the unigram distributions (of words and of context words).
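To make the construction of D' concrete, here is a small Python sketch (a toy of my own, not the original word2vec C code) that draws k negative context words from the unigram distribution; the 0.75 smoothing exponent is the value commonly used with Skip-gram:

```python
import numpy as np

# Toy sketch: draw k negative context words from the unigram distribution,
# smoothed with the usual 0.75 exponent.
rng = np.random.default_rng(0)
unigram_counts = np.array([100.0, 10.0, 1.0])  # counts for a 3-word vocabulary
p = unigram_counts ** 0.75
p /= p.sum()                                   # smoothed sampling distribution
k = 5
negatives = rng.choice(len(unigram_counts), size=k, p=p)
```

Sampling from the smoothed distribution raises the chance of picking rarer words as negatives compared with raw unigram sampling.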
Skip-gram [Mikolov et al., 2013a]
- setting σ(x) = 1/(1 + e^{-x}) and p(D = 1 | w, c; θ) = σ(v_c · v_w), this becomes

$$\mathcal{L} = \arg\max_{\theta} \sum_{(w,c) \in D} \log \sigma(v_c \cdot v_w) + \sum_{(w,c) \in D'} \log \sigma(-v_c \cdot v_w)$$

- contexts are defined by a window centered on the considered word w, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (removed from C at random) and rare words are eliminated (cut-off)
- it works! (read [Levy and Goldberg, 2014] for an explanation)
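As a sanity check of the formula, here is a minimal numpy sketch (vocabulary size, dimensions and pair indices are arbitrary toy choices) that evaluates the negated objective on a few observed and sampled pairs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
V, d = 5, 10
W = rng.normal(scale=0.1, size=(V, d))   # one vector v_w per vocabulary word
C = rng.normal(scale=0.1, size=(V, d))   # separate vectors v_c for context words

def sgns_neg_objective(pos_pairs, neg_pairs):
    """Negated Skip-gram objective: -(sum over D of log sigma(v_c.v_w)
    + sum over D' of log sigma(-v_c.v_w)); lower is better."""
    pos = sum(np.log(sigmoid(C[c] @ W[w])) for w, c in pos_pairs)
    neg = sum(np.log(sigmoid(-C[c] @ W[w])) for w, c in neg_pairs)
    return -(pos + neg)

loss = sgns_neg_objective(pos_pairs=[(0, 1), (0, 2)], neg_pairs=[(0, 3), (0, 4)])
```

In a real implementation this quantity is minimized by stochastic gradient steps over (w, c) pairs rather than evaluated in full.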
Other pre-trained embeddings
- Polyglot [Al-Rfou et al., 2013]
  - 100 languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al., 2016]
  - 294 languages (Wikipedia)
  - skip-gram where words are represented by bags of character n-grams, so an embedding can be computed even for an unknown word
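The subword idea can be sketched in a few lines (the helper name is mine; the boundary markers `<` and `>` follow the FastText paper's description, and this is not the library's code). The embedding of an out-of-vocabulary word is then obtained by combining, e.g. summing, the vectors of its n-grams:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the boundary-marked word <word>, FastText-style."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

grams = char_ngrams("where", n_min=3, n_max=4)
```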
Other pre-trained embeddings
- GloVe [Pennington et al., 2014]
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Analogical arithmetic on representations [Mikolov et al., 2013d]
- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- this makes it possible to solve analogy equations [x : y :: z : ?]:
  1. compute t = vec(y) − vec(x) + vec(z), the target vector
  2. search V for the word t̂ closest to t:

$$\hat{t} = \arg\max_{w} \frac{vec(w) \cdot t}{\lVert vec(w) \rVert \times \lVert t \rVert}$$
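A minimal sketch of this resolution procedure on a hand-crafted toy vocabulary (the vectors are contrived so that the classic man : woman :: king : ? example works out; x, y and z are excluded from the search, as is standard):

```python
import numpy as np

# Hand-crafted toy embeddings: the "royal" and "female" directions are explicit.
emb = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
}

def solve_analogy(x, y, z, emb):
    """Return the word (excluding x, y, z) whose vector has the highest
    cosine with the target t = vec(y) - vec(x) + vec(z)."""
    t = emb[y] - emb[x] + emb[z]
    def cos(w):
        v = emb[w]
        return (v @ t) / (np.linalg.norm(v) * np.linalg.norm(t))
    return max((w for w in emb if w not in {x, y, z}), key=cos)

answer = solve_analogy("man", "woman", "king", emb)
```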
[Mikolov et al., 2013d]
- an RNN trained on 320M words (V = 82k)
- a test set of 8k analogies involving the most frequent words

[Mikolov et al., 2013c]
- 6B words of Google News, the 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]

[Mikolov et al., 2013c]
- comparison with other proposed models

[Mikolov et al., 2013c]
- Big Data (more data, higher dimension)
Meta-embeddings
- idea: can several vector representations be combined to create new, more effective ones?
- two simple yet useful approaches (better results than the individual representations):
  - concatenating the representations [Bollegala and Bao, 2018]
  - averaging them (normalize, then pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
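Both recipes fit in a few lines of numpy; this is a sketch under stated assumptions (L2-normalization before averaging, zero-padding to the largest dimension), in the spirit of the cited papers rather than their exact procedures:

```python
import numpy as np

def concat_meta(vecs):
    """Meta-embedding by concatenation (in the spirit of [Bollegala and Bao, 2018])."""
    return np.concatenate(vecs)

def average_meta(vecs):
    """Meta-embedding by averaging (in the spirit of [Coates and Bollegala, 2018]):
    L2-normalize each source vector, zero-pad to the largest dimension, average."""
    d = max(len(v) for v in vecs)
    padded = [np.pad(v / np.linalg.norm(v), (0, d - len(v))) for v in vecs]
    return np.mean(padded, axis=0)

v1 = np.array([3.0, 4.0])        # a 2-d embedding of some word
v2 = np.array([1.0, 0.0, 0.0])   # a 3-d embedding of the same word
cat = concat_meta([v1, v2])      # 5-dimensional
avg = average_meta([v1, v2])     # 3-dimensional
```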
Don't count, predict! [Baroni et al., 2014]
- many tasks, a study of the meta-parameters of each method
- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture"
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- (binary) features induced for film: SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO:COLLAGEFILM.N.01, HYPER:SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word–colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
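These feature types translate directly into sparse 0/1 vectors; a toy sketch (the feature names follow the slide's examples, the helper function is mine):

```python
# Toy non-distributional lexicon: one binary dimension per linguistic feature.
LEXICON = {
    "lioness": {"SS.NOUN.ANIMAL"},
    "blood":   {"COLOR.RED"},
    "love":    {"PTB.NOUN", "PTB.VERB"},
}
FEATURES = sorted({f for feats in LEXICON.values() for f in feats})

def binary_vector(word):
    """0/1 vector of a word over the global feature inventory."""
    return [1 if f in LEXICON[word] else 0 for f in FEATURES]

vec_love = binary_vector("love")   # active on PTB.NOUN and PTB.VERB only
```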
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- note: hard to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA: obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing (local (phone company) versus (local phone) company)
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]
- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage words that are neighbours in the resource (encoded as a graph) to have close vectors, while staying close to the learned representation
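The published method minimizes a quadratic objective balancing these two pulls; below is a simplified Jacobi-style iteration (the uniform weights α = β = 1 are my simplification, not the authors' exact settings):

```python
import numpy as np

def retrofit(q_hat, neighbours, alpha=1.0, beta=1.0, n_iters=10):
    """Pull each vector toward its lexicon neighbours while staying close to
    the original embeddings q_hat (V x d). neighbours: index -> neighbour indices."""
    q = q_hat.copy()
    for _ in range(n_iters):
        q_new = q.copy()
        for i, nbrs in neighbours.items():
            if nbrs:
                q_new[i] = (alpha * q_hat[i] + beta * q[nbrs].sum(axis=0)) \
                           / (alpha + beta * len(nbrs))
        q = q_new
    return q

q_hat = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
graph = {0: [1], 1: [0]}      # words 0 and 1 are synonyms in the resource
q = retrofit(q_hat, graph)    # vectors 0 and 1 move together; vector 2 is untouched
```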
A community getting organized [Faruqui and Dyer, 2014]
- pre-trained embeddings
- a suite of runnable tests (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
- a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):

$$\hat{W} = \arg\min_{W} \sum_i \lVert W x_i - z_i \rVert^2$$

where x_i and z_i denote, respectively, the source-language vector of x_i and the target-language vector of z_i.
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by ẑ:

$$\hat{z} = \arg\max_{z} \cos(z, W x)$$
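A numpy sketch of the same idea on synthetic data, using a closed-form least-squares solve instead of the paper's gradient descent (all names and dimensions are toy choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_src, d_tgt = 50, 4, 3
X = rng.normal(size=(n, d_src))          # source vectors x_i, one per row
W_true = rng.normal(size=(d_tgt, d_src))
Z = X @ W_true.T                          # target vectors z_i = W_true x_i

# min_W sum_i ||W x_i - z_i||^2, solved in closed form.
W = np.linalg.lstsq(X, Z, rcond=None)[0].T

def translate(x, W, Z):
    """Index of the target vector with the highest cosine to W x."""
    q = W @ x
    sims = (Z @ q) / (np.linalg.norm(Z, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

pred = translate(X[7], W, Z)
```

Because the synthetic data are exactly linearly related, the recovered W matches W_true and every test word maps back to its own translation; with real embeddings the relation only holds approximately.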
Mikolov strikes again [Mikolov et al., 2013b]
- the 6K most frequent source words, translated with Google Translate
- the first 5K entries are used to compute W
- the next 1K are used for testing
- baselines: edit distance, ε-Rapp

More data (Google News)
- same split: 5K train, 1K test
Plan
- (Before Deep) the vector-space model
- And then came the "Deep"
- Evaluation
On the difficulty of evaluating without bias [Levy et al., 2015]
- they compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- they study their parameters in detail
- they adapt choices made in Skip-Gram to the other methods where possible
- Takeaway:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
An example observation [Levy et al., 2015]
- in the co-occurrence-matrix approach, a word w and its context c are scored by

$$PMI(w, c) = \log \frac{p(w, c)}{p(w)\,p(c)}$$

- a common practice is to set the PMI to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adapting choices made in Skip-Gram:
  - shifting: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples (smoothed with α = 0.75):

$$PMI_{\alpha}(w, c) = \log \frac{p(w, c)}{p(w)\,P_{\alpha}(c)} \quad \text{with} \quad P_{\alpha}(c) = \frac{\#(c)^{\alpha}}{\sum_{c'} \#(c')^{\alpha}}$$
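These variants fit in a short numpy function over a toy count matrix (illustrative code and data of my own, not the authors'):

```python
import numpy as np

def sppmi(counts, k=1.0, alpha=1.0):
    """Shifted positive PMI matrix: max(PMI_alpha(w,c) - log k, 0).
    alpha < 1 smooths the context distribution: P_alpha(c) proportional to #(c)^alpha."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    ctx = counts.sum(axis=0) ** alpha
    p_c = ctx / ctx.sum()
    with np.errstate(divide="ignore"):          # log 0 -> -inf, clipped below
        pmi = np.log(p_wc / (p_w * p_c))
    return np.maximum(pmi - np.log(k), 0.0)     # zero counts end up at 0, not -inf

counts = np.array([[10.0, 0.0, 2.0],            # toy word-context counts
                   [1.0,  5.0, 0.0]])
M = sppmi(counts, k=1.0)
```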
[Schnabel et al., 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings

[Antoniak and Mimno, 2018]
- word2vec skip-gram re-run several times with the same parameters
What about rare words? [Jakubina and Langlais, 2017]

                 1k-low                  1k-high
            TOP1  TOP5  TOP20      TOP1  TOP5  TOP20
embedding    2.2   6.1   11.9      21.7  34.2   44.9
context      2.0   4.3    7.6      19.0  32.7   44.3
document     0.7   2.3    5.0        —     —      —
oracle       4.6    —    19.0      31.8    —    57.6

- Wikipedia, June 2013 dump (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = frequency < 26 (92% of the words of V_EN)
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Au menu
I un modele vedette Word2Vec [Mikolov et al 2013a]
I proprietes des embeddings [Mikolov et al 2013d Mikolov et al 2013c]
I des resultats glory [Baroni et al 2014] moderation[Levy et al 2015]
I cool works [Faruqui and Dyer 2015 Faruqui et al 2015bFaruqui et al 2015a]
I modeles bilingues [Mikolov et al 2013b Chandar et al 2014Gouws et al 2015 Coulmance et al 2016]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une revolution chez les ldquodistributionnalistesrdquo Word2Vec [Mikolov et al 2013a]
I un toolkit rapide implementant deux modeles
I httpscodegooglecomarchivepword2vecI https
radimrehurekcomgensimmodelsword2vechtmlI httpsgithubcomdavword2vec
I des embeddings disponibles entraınes sur 6B de mots deGoogle News (180K mots) - dimension = 300
I directement utilisable dans de nombreuses applications
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
On the difficulty of unbiased evaluation [Levy et al., 2015]
- compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- studies their parameters in detail
- adapts choices made in Skip-Gram to the other methods where possible
- takeaway:
  - a tie in performance (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
On the difficulty of unbiased evaluation [Levy et al., 2015]
Example observation [Levy et al., 2015]
- in the co-occurrence matrix approach, the association between a word w and a context c is scored by

  PMI(w, c) = log p(w, c) / (p(w) p(c))

- one common choice is to set the PMI value to 0 when #(w, c) = 0 (rather than −∞)
- another is to use PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI, where k is the number of negative samples:

    SPPMI(w, c) = max(PMI(w, c) − log k, 0)

  - sampling of the k negative examples (smoothed with α = 0.75):

    PMI_α(w, c) = log p(w, c) / (p(w) P_α(c))   with   P_α(c) = #(c)^α / Σ_{c′} #(c′)^α
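The three variants above can be computed directly from a co-occurrence count matrix. A minimal sketch on a tiny made-up count matrix (the counts and k are illustrative, not from the paper):

```python
import numpy as np

# Toy word-context co-occurrence counts (rows: words, columns: contexts).
counts = np.array([[4., 0., 1.],
                   [1., 2., 0.],
                   [0., 3., 5.]])

total = counts.sum()
p_wc = counts / total
p_w = p_wc.sum(axis=1, keepdims=True)   # marginal p(w)
p_c = p_wc.sum(axis=0, keepdims=True)   # marginal p(c)

with np.errstate(divide="ignore"):      # log 0 -> -inf, clipped by max() below
    pmi = np.log(p_wc / (p_w * p_c))

ppmi = np.maximum(pmi, 0.0)             # PPMI(w,c) = max(PMI(w,c), 0)

k = 5                                   # number of negative samples (assumed)
sppmi = np.maximum(pmi - np.log(k), 0.0)  # shifted PPMI

# Context-distribution smoothing with alpha = 0.75, as in SGNS sampling.
alpha = 0.75
counts_c = counts.sum(axis=0)
p_c_alpha = counts_c**alpha / (counts_c**alpha).sum()
with np.errstate(divide="ignore"):
    pmi_alpha = np.log(p_wc / (p_w * p_c_alpha[None, :]))
```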
[Schnabel et al., 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]
- word2vec skip-gram re-run several times with the same parameters
And what about infrequent words? [Jakubina and Langlais, 2017]
And what about infrequent words?
              1k-low                    1k-high
              TOP1   TOP5   TOP20      TOP1   TOP5   TOP20
  embedding    2.2    6.1   11.9       21.7   34.2   44.9
  context      2.0    4.3    7.6       19.0   32.7   44.3
  document     0.7    2.3    5.0        —      —      —
  oracle       4.6     —    19.0       31.8    —     57.6
- Wikipedia dump from June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
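The TOP-1/TOP-5/TOP-20 scores above are top-k retrieval accuracies. A minimal sketch of how such a score is computed, on hypothetical ranked candidate lists (the words and rankings below are made up):

```python
def topk_accuracy(ranked, gold, k):
    """Fraction of test words whose reference translation appears
    among the top-k ranked candidates."""
    hits = sum(1 for w, cands in ranked.items() if gold[w] in cands[:k])
    return hits / len(ranked)

# Hypothetical ranked translation candidates and reference translations.
ranked = {"cat": ["chat", "chien", "felin"],
          "dog": ["loup", "chien", "chat"]}
gold = {"cat": "chat", "dog": "chien"}

print(topk_accuracy(ranked, gold, 1))  # -> 0.5
print(topk_accuracy(ranked, gold, 5))  # -> 1.0
```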
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
A revolution among the "distributionalists": Word2Vec [Mikolov et al., 2013a]
- a fast toolkit implementing two models
  - https://code.google.com/archive/p/word2vec
  - https://radimrehurek.com/gensim/models/word2vec.html
  - https://github.com/dav/word2vec
- pre-trained embeddings available, trained on 6B words of Google News (180K words), dimension = 300
- directly usable in many applications
The two Word2Vec models [Mikolov et al., 2013a]
- Skip-gram is the more popular (more reliable for "small" corpora)
- CBOW is faster (well suited to large corpora)
Skip-gram [Mikolov et al., 2013a]
- C: a training corpus, i.e. a set D of pairs (w, c) where w is a word of C and c is a word seen in its context; note: the model represents context words differently from vocabulary words
- for a pair (w, c), let p(D = 1 | w, c; θ) be the probability that it comes from the corpus
- optimized by gradient descent:

  L = argmax_θ ∏_{(w,c) ∈ D} p(D = 1 | w, c; θ) × ∏_{(w,c) ∈ D′} (1 − p(D = 1 | w, c; θ))

  where v_c (resp. v_w) denotes the vector of c (resp. w)
- D′ is built by randomly sampling k pairs according to the unigram distributions (of words and of context words)
Skip-gram [Mikolov et al., 2013a]
- letting σ(x) = 1 / (1 + e^{−x}) and p(D = 1 | w, c; θ) = σ(v_c · v_w), this becomes

  L = argmax_θ Σ_{(w,c) ∈ D} log σ(v_c · v_w) + Σ_{(w,c) ∈ D′} log σ(−v_c · v_w)

- contexts are given by a window centered on the word w under consideration, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (randomly removed from C) and infrequent words are dropped (cut-off)
- it works! (read [Levy and Goldberg, 2014] for an explanation)
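The negative-sampling objective above can be evaluated directly for a handful of pairs. A minimal sketch with tiny random embedding tables (all sizes and pairs below are made up; a real implementation would also take gradients of this quantity):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# Hypothetical embedding tables: separate vectors for words (v_w) and
# for contexts (v_c), as in the model.
V_word = rng.normal(scale=0.1, size=(10, d))
V_ctx = rng.normal(scale=0.1, size=(10, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_objective(pos_pairs, neg_pairs):
    """Skip-gram negative-sampling objective:
    sum over D of log sigma(v_c . v_w) + sum over D' of log sigma(-v_c . v_w)."""
    obj = 0.0
    for w, c in pos_pairs:
        obj += np.log(sigmoid(V_ctx[c] @ V_word[w]))
    for w, c in neg_pairs:
        obj += np.log(sigmoid(-(V_ctx[c] @ V_word[w])))
    return obj

pos = [(0, 1), (0, 2)]          # observed (word, context) pairs D
neg = [(0, 7), (0, 8), (0, 9)]  # k sampled negative pairs D'
print(sgns_objective(pos, neg))
```

Training raises this objective toward 0 by pushing observed pairs' dot products up and sampled pairs' dot products down.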
Other pre-trained embeddings
- Polyglot [Al-Rfou et al., 2013]
  - 100 languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al., 2016]
  - 294 languages (Wikipedia)
  - a skip-gram model where words are represented by bags of (character) n-grams, so an embedding can be computed even for an unknown word
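A minimal sketch of the character n-gram decomposition behind this (boundary markers and the 3-to-6 range follow FastText's convention; the word vector of an out-of-vocabulary word would then be, e.g., the sum of its n-gram vectors from a hypothetical n-gram table):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers, as in FastText;
    the word itself (with markers) is also kept as a feature."""
    w = f"<{word}>"
    grams = {w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)
    return grams

print(sorted(char_ngrams("where", 3, 4)))
```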
Other pre-trained embeddings
- GloVe [Pennington et al., 2014]
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
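These archives contain plain-text files with one `word v1 v2 ... vd` line per word. A minimal loader sketch (the `vocab` filter is an illustrative convenience, not part of the distribution):

```python
import numpy as np

def load_glove(path, vocab=None):
    """Parse GloVe's plain-text format: one `word v1 v2 ... vd` line per word.
    Optionally keep only words in `vocab` to limit memory use."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word, *vals = line.rstrip().split(" ")
            if vocab is None or word in vocab:
                vectors[word] = np.asarray(vals, dtype=np.float32)
    return vectors
```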
Analogical arithmetic on representations [Mikolov et al., 2013d]
- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- this makes it possible to solve analogy equations [x : y :: z : ?]:
  1. compute t = vec(y) − vec(x) + vec(z), the target vector
  2. search V for the word t* closest to t:

     t* = argmax_w  (vec(w) · t) / (‖vec(w)‖ × ‖t‖)
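The two steps above can be sketched on a toy embedding table (the vectors below are hand-picked to make the analogy come out; real analogies of course need trained embeddings, and excluding the three query words is standard practice):

```python
import numpy as np

# Toy, hand-crafted embedding table (illustrative only).
E = {"king":  np.array([0.9, 0.8, 0.1]),
     "man":   np.array([0.8, 0.1, 0.1]),
     "woman": np.array([0.1, 0.2, 0.8]),
     "queen": np.array([0.2, 0.9, 0.8]),
     "apple": np.array([0.9, 0.1, 0.9])}

def solve_analogy(x, y, z, E):
    """Return argmax_w cos(vec(w), vec(y) - vec(x) + vec(z)),
    excluding the three query words."""
    t = E[y] - E[x] + E[z]
    def cos(v):
        return (v @ t) / (np.linalg.norm(v) * np.linalg.norm(t))
    candidates = {w: v for w, v in E.items() if w not in (x, y, z)}
    return max(candidates, key=lambda w: cos(candidates[w]))

print(solve_analogy("man", "king", "woman", E))  # -> queen
```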
[Mikolov et al., 2013d]
- an RNN trained on 320M words (|V| = 82k)
- a test set of 8k analogies involving the most frequent words
[Mikolov et al., 2013c]
- 6B words of Google News, the 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]
[Mikolov et al., 2013c]
- comparison with other proposed models
[Mikolov et al., 2013c]
- Big Data (more data, higher dimensionality)
Meta-embeddings
- idea: can several vector representations be combined to create new, more effective ones?
- 2 simple yet useful approaches (better results than the individual representations):
  - concatenate the representations [Bollegala and Bao, 2018]
  - average them (normalize, and pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
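Both combinations are a few lines each. A minimal sketch on two made-up source embeddings of different dimensions for the same word:

```python
import numpy as np

# Two hypothetical source embeddings of the same word, different dimensions.
a = np.array([0.3, -0.1, 0.7])   # d = 3
b = np.array([0.5, 0.5])         # d = 2

def concat_meta(*vecs):
    """Meta-embedding by concatenation."""
    return np.concatenate(vecs)

def average_meta(*vecs):
    """Meta-embedding by averaging: L2-normalize each source vector,
    zero-pad the shorter ones to the largest dimension, then average."""
    d = max(v.size for v in vecs)
    padded = [np.pad(v / np.linalg.norm(v), (0, d - v.size)) for v in vecs]
    return np.mean(padded, axis=0)

print(concat_meta(a, b).shape)   # -> (5,)
print(average_meta(a, b).shape)  # -> (3,)
```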
Don't count, predict! [Baroni et al., 2014]
- many tasks, with a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]
- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

(binary) features induced for film:
- SYNSET.FILM.V.01
- SYNSET.FILM.N.01
- HYPO:COLLAGE_FILM.N.01
- HYPER:SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word–colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR, COLOR.RED
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- note: hard to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- Skip-Gram: pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe: pre-trained on 6B words [Pennington et al., 2014]
- LSA: obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]
- a post-processing step applicable to any word vector representation
- fast (5 seconds for 100k words of dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how? encourage each word to stay close to its learned representation while moving closer to its neighbors in the resource (encoded as a graph)
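A minimal sketch of the iterative update this leads to, on toy vectors and a tiny synonymy graph (all data is made up; α_i = 1 and β_ij = 1/degree(i) are the defaults reported by Faruqui et al.):

```python
import numpy as np

# Toy learned embeddings q_hat and a small synonymy graph (hypothetical data).
q_hat = {"happy": np.array([1.0, 0.0]),
         "glad":  np.array([0.0, 1.0]),
         "sad":   np.array([-1.0, 0.0])}
graph = {"happy": ["glad"], "glad": ["happy"], "sad": []}

def retrofit(q_hat, graph, iters=10):
    """Iterative retrofitting update (sketch), with alpha_i = 1 and
    beta_ij = 1/degree(i):
    q_i <- (alpha_i * q_hat_i + sum_j beta_ij * q_j) / (alpha_i + sum_j beta_ij)."""
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(iters):
        for w, nbrs in graph.items():
            if not nbrs:
                continue  # no resource neighbors: the vector is left unchanged
            beta = 1.0 / len(nbrs)
            num = q_hat[w] + beta * sum(q[n] for n in nbrs)
            q[w] = num / (1.0 + beta * len(nbrs))
    return q

q = retrofit(q_hat, graph)
# "happy" and "glad" are pulled toward each other; "sad" stays where it was.
```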
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Les 2 modeles de Word2Vec[Mikolov et al 2013a]
I Skip-gram est le plus populaire (plus fiable pour les ldquopetitsrdquocorpus)
I CBOW est plus rapide (bien pour les grands corpus)felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I C un corpus drsquoentraınement aka un ensemble D de paires(w c) ou w est un mot de C et c est un mot vu dans un contextenote le modele represente differemment les mots de contextedes mots du vocabulaire
I Soit (w c) appartient-elle a C p(D = 1|w c θ) la probabiliteassociee
I Optimise par descente de gradient
L = argmaxθ
prod(wc)isinD
p(D = 1|w c θ)prod
(wc)isinDprime1minus p(D = 1|w c θ)
ou vc (resp vw) est le vecteur de c (resp w)
I Dprime est construit en choisissant k paires aleatoirement selon lesdistributions unigrammes (des mots et des mots de contextes)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]
- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]
(binary) features induced for film:
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO.COLLAGE-FILM.N.01, HYPER.SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]
- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: word-color lexicon built by crowdsourcing [Mohammad 2011]; e.g. blood ⇒ COLOR.RED
- emotion: lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]
- note: difficult to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]
- Skip-Gram pre-trained on 300B words [Mikolov et al 2013a]
- GloVe pre-trained on 6B words [Pennington et al 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al 2015a]
- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage each word's new vector to stay close to its learned representation while also being close to its neighbors in the resource (encoded as a graph)
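A minimal sketch of the resulting iterative update (assuming uniform weights: α_i = 1 and β_ij = 1/deg(i), a common choice; the toy lexicon below is mine, not the paper's data):

```python
import numpy as np

def retrofit(q_hat, graph, iters=10):
    """Retrofitting sketch: pull each vector toward its neighbors in the
    lexical resource while staying close to the original vector.
    q_hat: {word: vector}, graph: {word: [neighbor words]}."""
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(iters):
        for w, neighbors in graph.items():
            neighbors = [n for n in neighbors if n in q]
            if not neighbors:
                continue
            beta = 1.0 / len(neighbors)            # assumed weighting
            num = q_hat[w] + beta * sum(q[n] for n in neighbors)
            q[w] = num / (1.0 + beta * len(neighbors))  # alpha_i = 1
    return q

vecs = {"happy": np.array([1.0, 0.0]),
        "glad":  np.array([0.0, 1.0])}
syn = {"happy": ["glad"], "glad": ["happy"]}   # toy synonymy graph
new = retrofit(vecs, syn)
# "happy" and "glad" end up closer to each other than before
```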
A community getting organized [Faruqui and Dyer 2014]
- already-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (nor kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al 2013b]
- one can learn a linear transformation (rotation + scaling) from one space to another using a bilingual lexicon (x_i, z_i):

  W* = argmin_W Σ_i ||W x_i − z_i||²

  where x_i and z_i denote, respectively, the source-language vector of x_i and the target-language vector of z_i
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by ẑ:

  ẑ = argmax_z cos(z, W x)
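A sketch with synthetic data (the paper optimizes W by stochastic gradient descent; ordinary least squares reaches the same optimum of this objective, so it is used here for brevity):

```python
import numpy as np

# toy bilingual lexicon: rows of X are source vectors x_i,
# rows of Z are the target vectors z_i of their translations
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 3))
X = rng.normal(size=(50, 3))
Z = X @ W_true.T                  # pretend the mapping is exactly linear

# min_W sum_i ||W x_i - z_i||^2, solved in closed form
W = np.linalg.lstsq(X, Z, rcond=None)[0].T

def translate(x, target_vecs):
    """Index of the target vector maximizing cos(z, Wx)."""
    p = W @ x
    sims = target_vecs @ p / (np.linalg.norm(target_vecs, axis=1)
                              * np.linalg.norm(p))
    return int(np.argmax(sims))

print(translate(X[0], Z))  # recovers index 0 when the mapping is exact
```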
Mikolov strikes again [Mikolov et al 2013b]
- the 6K most frequent source words, translated by Google Translate
- the first 5K entries used to compute W
- the next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)
- same split: 5K train, 1K test
Plan
- (Before Deep): the vector-space model
- And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case
- Evaluation
On the difficulty of evaluating without bias [Levy et al 2015]
- compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- studies their parameters in detail
- adapts choices made in Skip-Gram to the other methods when possible
- Takeaways:
  - a draw in performance (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al 2015]
- in the co-occurrence-matrix approach, a word w and its context c are scored by

  PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]

- a common approach is to set the PMI value to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifting by the number k of negative samples: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - smoothed sampling of the k negative examples (with α = 0.75):

    PMI_α(w, c) = log [ p(w, c) / (p(w) P_α(c)) ], with P_α(c) = #(c)^α / Σ_c #(c)^α
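The PPMI variants above in code (a sketch over a small dense count matrix; real vocabularies call for sparse matrices):

```python
import numpy as np

def ppmi(C, k=1, alpha=None):
    """PPMI matrix from a word-context count matrix C.
    k > 1 gives the shifted variant SPPMI = max(PMI - log k, 0);
    alpha (e.g. 0.75) smooths the context distribution."""
    total = C.sum()
    pw = C.sum(axis=1) / total                 # p(w)
    cc = C.sum(axis=0)                         # #(c)
    pc = (cc ** alpha) / (cc ** alpha).sum() if alpha else cc / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C / total) / np.outer(pw, pc))
    pmi[~np.isfinite(pmi)] = 0.0               # zero counts -> 0, not -inf
    return np.maximum(pmi - np.log(k), 0.0)

C = np.array([[10.0, 0.0],
              [ 2.0, 8.0]])
M = ppmi(C)  # unseen pair (row 0, col 1) gets 0, not -inf
```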
[Schnabel et al 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno 2018]
- word2vec skip-gram re-run several times with the same parameters
And what about low-frequency words? [Jakubina and Langlais 2017]
              1k-low                     1k-high
            TOP1   TOP5   TOP20       TOP1   TOP5   TOP20
embedding    2.2    6.1   11.9        21.7   34.2   44.9
context      2.0    4.3    7.6        19.0   32.7   44.3
document     0.7    2.3    5.0          —      —      —
oracle       4.6     —    19.0        31.8     —    57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
Skip-gram [Mikolov et al 2013a]
- C: a training corpus, i.e. a set D of pairs (w, c) where w is a word of C and c is a word seen in its context. Note: the model represents context words differently from vocabulary words
- for a pair (w, c), does it belong to C? Let p(D = 1 | w, c; θ) be the associated probability
- optimized by gradient descent:

  L = argmax_θ ∏_{(w,c)∈D} p(D = 1 | w, c; θ) ∏_{(w,c)∈D′} (1 − p(D = 1 | w, c; θ))

  where v_c (resp. v_w) denotes the vector of c (resp. w)
- D′ is built by picking k pairs at random according to the unigram distributions (of words and of context words)
Skip-gram [Mikolov et al 2013a]
- setting σ(x) = 1/(1 + e^{−x}) and p(D = 1 | w, c; θ) = σ(v_c · v_w), then

  L = argmax_θ Σ_{(w,c)∈D} log σ(v_c · v_w) + Σ_{(w,c)∈D′} log σ(−v_c · v_w)

- contexts are defined by a window centered on the word w under consideration, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (removed at random from C) and infrequent words are eliminated (cut-off)
- it works (read [Levy and Goldberg 2014] for an explanation)
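The negative-sampling objective can be written down directly (a toy sketch; real implementations update vectors pair by pair with SGD rather than evaluating the full sum; all names and toy data are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_objective(v_w, v_c, pos_pairs, neg_pairs):
    """Sum of log sigma(v_c . v_w) over observed pairs D plus
    log sigma(-v_c . v_w) over the sampled negative pairs D'."""
    L = sum(np.log(sigmoid(v_c[c] @ v_w[w])) for w, c in pos_pairs)
    L += sum(np.log(sigmoid(-(v_c[c] @ v_w[w]))) for w, c in neg_pairs)
    return L

# toy setup: separate matrices for word and context vectors, since the
# model represents context words differently from vocabulary words
rng = np.random.default_rng(1)
v_w = rng.normal(scale=0.1, size=(5, 4))   # word vectors
v_c = rng.normal(scale=0.1, size=(5, 4))   # context vectors
D  = [(0, 1), (0, 2)]                      # observed (w, c) pairs
Dp = [(0, 3), (0, 4)]                      # negatives (unigram sampling)
print(sgns_objective(v_w, v_c, D, Dp))     # a log-likelihood to maximize
```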
Other pre-trained embeddings
- Polyglot [Al-Rfou et al 2013]
  - 100 languages (Wikipedia)
  - trained to score corpus sentences higher than sentences in which one word has been replaced
- FastText [Bojanowski et al 2016]
  - 294 languages (Wikipedia)
  - skip-gram in which words are represented as bags of (character) n-grams, so an embedding can be computed even for an unknown word
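The subword trick behind FastText can be illustrated in a few lines (a sketch; FastText uses n from 3 to 6 by default and also keeps the whole word wrapped in boundary markers):

```python
def char_ngrams(word, nmin=3, nmax=6):
    """Character n-grams of a word wrapped in the boundary
    markers '<' and '>', plus the full wrapped word itself."""
    w = f"<{word}>"
    grams = {w[i:i + n] for n in range(nmin, nmax + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)
    return grams

# an out-of-vocabulary word still gets a vector by summing/averaging
# the embeddings of its n-grams (not shown here)
print(sorted(char_ngrams("where", 3, 3)))
```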
Other pre-trained embeddings
- GloVe [Pennington et al 2014]
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Analogical arithmetic on representations [Mikolov et al 2013d]
- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- allows solving analogy equations [x : y :: z : ?]:
  1. compute the target vector t = vec(y) − vec(x) + vec(z)
  2. search V for the word t̂ closest to t:

     t̂ = argmax_w [ vec(w) · t / (||vec(w)|| × ||t||) ]
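The two steps above, sketched on a toy space where the country → capital offset is exact (the vocabulary and vectors are made up; query words are usually excluded from the search, as in the standard evaluation):

```python
import numpy as np

def solve_analogy(x, y, z, vocab, vecs, exclude=()):
    """Solve x : y :: z : ? by nearest cosine neighbor of
    t = vec(y) - vec(x) + vec(z)."""
    t = vecs[vocab.index(y)] - vecs[vocab.index(x)] + vecs[vocab.index(z)]
    sims = vecs @ t / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(t))
    for w in exclude:                    # never return a query word
        sims[vocab.index(w)] = -np.inf
    return vocab[int(np.argmax(sims))]

# toy 2-d space with a consistent country -> capital offset
vocab = ["Spain", "Madrid", "France", "Paris"]
vecs = np.array([[1.0, 0.0], [1.0, 1.0],
                 [2.0, 0.0], [2.0, 1.0]])
print(solve_analogy("Spain", "Madrid", "France", vocab, vecs,
                    exclude=("Spain", "Madrid", "France")))  # Paris
```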
[Mikolov et al 2013d]
- RNN trained on 320M words (V = 82k)
- test set of 8k analogies involving the most frequent words
[Mikolov et al 2013c]
- 6B words of Google News, the 1M most frequent words
- the syntactic test is the same as in [Mikolov et al 2013d]
[Mikolov et al 2013c]
- comparison with other proposed models
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Skip-gram [Mikolov et al 2013a]
I en posant σ(x) = 11+eminusx p(D = 1|w c θ) = σ(vcvw) alors
L = argmaxθ
sum(wc)isinD
log σ(vcvw) +sum
(wc)isinDprimelog σ(minusvcvw)
I les contextes sont definis par une fenetre centree autours dumot w considere et dont la taille est tiree aleatoirement (etuniformement sur un intervalle fixe)
I les mots les plus frequents sont sous-echantillonnes (retiresaleatoirement de C) et les mots peu frequents sont elimines(cut-off)
I ca marche (lire [Levy and Goldberg 2014] pour une explication)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
Analogical arithmetic on representations [Mikolov et al 2013d]

- vec(Madrid) − vec(Spain) ≈ vec(Paris) − vec(France)
- this makes it possible to solve analogy equations [x : y :: z : ?]:
  1. compute t = vec(y) − vec(x) + vec(z), the target vector
  2. search V for the word t̂ closest to t:

  t̂ = argmax_w  [vec(w) · vec(t)] / (||vec(w)|| × ||vec(t)||)
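The two steps above can be sketched directly. A minimal illustration over toy 2-d vectors; the embeddings are invented for the example (real analogy solving also excludes the three query words, as done here):

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return num / (nu * nv)

def solve_analogy(vec, x, y, z):
    """Return the word maximizing cos(vec(w), vec(y) - vec(x) + vec(z)),
    excluding the three query words themselves."""
    t = [b - a + c for a, b, c in zip(vec[x], vec[y], vec[z])]
    candidates = [w for w in vec if w not in (x, y, z)]
    return max(candidates, key=lambda w: cosine(vec[w], t))

# Toy embeddings (illustrative values only).
vec = {
    "Spain":  [1.0, 0.0],
    "Madrid": [1.0, 1.0],
    "France": [0.0, 0.2],
    "Paris":  [0.1, 1.1],
    "banana": [-1.0, -1.0],
}
best = solve_analogy(vec, "Spain", "Madrid", "France")
```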
[Mikolov et al 2013d]

- RNN trained on 320M words (|V| = 82k)
- test set of 8k analogies involving the most frequent words
[Mikolov et al 2013c]

- 6B words of Google News, the 1M most frequent words
- the syntactic test is the same as in [Mikolov et al 2013d]
[Mikolov et al 2013c]

- Comparison to other proposed models
[Mikolov et al 2013c]

- Big Data (more data, higher dimension)
Meta-embeddings

- idea: can we combine several vector representations to create new, more effective ones?
- 2 simple but nonetheless useful approaches (better results than the individual representations):
  - concatenate the representations [Bollegala and Bao 2018]
  - average them (normalize, and pad the lower-dimensional representations with 0s) [Coates and Bollegala 2018]
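Both combinations above fit in a few lines. A sketch under the stated assumptions (per-source normalization, which the averaging paper also applies, is omitted for brevity; the vectors are invented):

```python
def concat_meta(embs):
    """Concatenation meta-embedding: stack the source representations."""
    out = []
    for e in embs:
        out.extend(e)
    return out

def avg_meta(embs):
    """Averaging meta-embedding: zero-pad every source to the largest
    dimension, then average componentwise."""
    d = max(len(e) for e in embs)
    padded = [e + [0.0] * (d - len(e)) for e in embs]
    return [sum(col) / len(padded) for col in zip(*padded)]

# Two hypothetical source embeddings of a word, of different dimensions.
e1 = [1.0, 2.0]
e2 = [3.0, 4.0, 5.0]
```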
Don't count, predict! [Baroni et al 2014]

- many tasks, a study of the meta-parameters of each method
Don't count, predict! [Baroni et al 2014]

- cnt = count vector, pre = word2vec, dm = [Baroni and Lenci 2010], cw = [Collobert et al 2011]
Don't count, predict! [Baroni et al 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture"
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]

- (binary) features induced for film:
  SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO.COLLAGE_FILM.N.01, HYPER.SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]

- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word–colour lexicon built by crowdsourcing [Mohammad 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
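Turning such feature lists into sparse binary vectors is mechanical. A minimal sketch, assuming a word-to-feature-set map as input (the two example words and their features come from the slide above; the function name is ours):

```python
def binarize(word_features, all_features=None):
    """Map each word to a binary vector over a shared feature
    inventory: component i is 1 iff the word carries feature i."""
    if all_features is None:
        all_features = sorted({f for fs in word_features.values() for f in fs})
    index = {f: i for i, f in enumerate(all_features)}
    vectors = {}
    for w, fs in word_features.items():
        v = [0] * len(index)
        for f in fs:
            v[index[f]] = 1
        vectors[w] = v
    return vectors, all_features

features = {
    "lioness": {"SS.NOUN.ANIMAL"},
    "love": {"PTB.NOUN", "PTB.VERB"},
}
vecs, inventory = binarize(features)
```

In practice the inventory has hundreds of thousands of features and the vectors are stored sparsely, not as dense lists.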
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]

- note: hard to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer 2015]

- Skip-Gram pretrained on 300B words [Mikolov et al 2013a]
- Glove pretrained on 6B words [Pennington et al 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing (local (phone company) versus (local phone) company)
Retrofitting vectors to a lexico-semantic resource [Faruqui et al 2015a]

- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage the new vectors to stay close to the learned representation while words linked in the resource (encoded as a graph) end up close to each other
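The iterative update behind retrofitting can be sketched compactly. A minimal illustration with uniform weights (the real method weights neighbours by their degree; the toy vectors and graph are invented):

```python
def retrofit(vectors, graph, iterations=10, alpha=1.0, beta=1.0):
    """Retrofitting sketch (after Faruqui et al., 2015a): repeatedly move
    each vector toward the average of its lexicon neighbours while
    staying anchored to its original distributional vector."""
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for w, neighbours in graph.items():
            nbs = [new[n] for n in neighbours if n in new]
            if w not in vectors or not nbs:
                continue
            denom = alpha + beta * len(nbs)
            for d in range(len(new[w])):
                s = alpha * vectors[w][d] + beta * sum(v[d] for v in nbs)
                new[w][d] = s / denom
    return new

# Two toy vectors that the lexicon says are synonyms.
vecs = {"happy": [1.0, 0.0], "glad": [0.0, 1.0]}
graph = {"happy": ["glad"], "glad": ["happy"]}
out = retrofit(vecs, graph, iterations=50)
```

After retrofitting, the two synonyms are closer than before while neither collapses onto the other.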
A community getting organized [Faruqui and Dyer 2014]

- pretrained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al 2013b]

- a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):

  Ŵ = argmin_W  Σ_i ||W x_i − z_i||²

  where x_i and z_i denote, respectively, the source-side vector representation of x_i and the target-side representation of z_i
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by ẑ:

  ẑ = argmax_z  cos(z, W x)
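Learning W by gradient descent, as described above, fits in a short sketch. Everything here is a toy illustration: 2-d "source" and "target" spaces where the target is just the source with its axes swapped, and a hand-rolled SGD loop (a least-squares solver would do the same job):

```python
def learn_mapping(xs, zs, lr=0.1, epochs=500):
    """Learn a linear map W minimizing sum_i ||W x_i - z_i||^2
    by per-example gradient descent on a seed lexicon."""
    d_out, d_in = len(zs[0]), len(xs[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        for x, z in zip(xs, zs):
            # Forward: y = W x, then gradient of ||y - z||^2 w.r.t. W.
            y = [sum(W[r][c] * x[c] for c in range(d_in)) for r in range(d_out)]
            err = [yi - zi for yi, zi in zip(y, z)]
            for r in range(d_out):
                for c in range(d_in):
                    W[r][c] -= lr * 2 * err[r] * x[c]
    return W

# Hypothetical seed lexicon: target space = source space with axes swapped.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
zs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
W = learn_mapping(xs, zs)
```

Translation then picks, for a new source vector x, the target word whose vector has the highest cosine with W x.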
Mikolov strikes again [Mikolov et al 2013b]

- the 6K most frequent source words, translated with Google Translate
- the first 5K entries used to compute W
- the next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)

- same split: 5K train, 1K test
Plan

- (Before Deep) the vector space model
- And then came the "Deep" (Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case)
- Evaluation
On the difficulty of unbiased evaluation [Levy et al 2015]

- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- study their parameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- Takeaway:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al 2015]

- in the co-occurrence matrix approach, a word w and its context c are scored by

  PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]

- a common choice is to set PMI values to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples (smoothed with α = 0.75):

    PMI_α(w, c) = log [ p(w, c) / (p(w) P_α(c)) ]   with   P_α(c) = #(c)^α / Σ_c #(c)^α
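The PPMI/SPPMI variants above can be computed directly from raw counts. A minimal sketch, assuming co-occurrence counts stored as a dict of (word, context) pairs (the counts are invented; k = 1 and α = 1 reduce to plain PPMI):

```python
import math

def ppmi_variants(counts, k=1, alpha=1.0):
    """Shifted, smoothed PPMI from (word, context) -> count:
    max(log[p(w,c) / (p(w) P_alpha(c))] - log k, 0)."""
    total = sum(counts.values())
    w_tot, c_tot = {}, {}
    for (w, c), n in counts.items():
        w_tot[w] = w_tot.get(w, 0) + n
        c_tot[c] = c_tot.get(c, 0) + n
    c_smooth_total = sum(n ** alpha for n in c_tot.values())
    out = {}
    for (w, c), n in counts.items():
        p_wc = n / total
        p_w = w_tot[w] / total
        p_c_alpha = (c_tot[c] ** alpha) / c_smooth_total
        pmi = math.log(p_wc / (p_w * p_c_alpha))
        out[(w, c)] = max(pmi - math.log(k), 0.0)  # SPPMI; k=1 gives PPMI
    return out

counts = {("dog", "barks"): 8, ("dog", "the"): 2, ("cat", "the"): 2}
scores = ppmi_variants(counts)
```

Unobserved (w, c) pairs simply get score 0, matching the common zero-for-missing choice above.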
[Schnabel et al 2015]

- recommend not using an extrinsic task to evaluate pretrained embeddings
[Antoniak and Mimno 2018]

- word2vec skip-gram rerun several times with the same parameters
What about rare words? [Jakubina and Langlais 2017]
What about rare words?

                 1k-low                1k-high
            TOP1  TOP5  TOP20    TOP1  TOP5  TOP20
 embedding   2.2   6.1   11.9    21.7  34.2   44.9
 context     2.0   4.3    7.6    19.0  32.7   44.3
 document    0.7   2.3    5.0      —     —      —
 oracle      4.6    —    19.0    31.8    —    57.6

- Wikipedia dump of June 2013 (EN 3.5M, FR 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Polyglot [Al-Rfou et al 2013]I 100 langues (Wikipedia)I entraıne a scorer des phrases du corpus mieux que des phrases
dans lesquelles ont a remplace un mot
I FastText [Bojanowski et al 2016]I 294 langues (Wikipedia)I skip-gram ou les mots sont representes par des sacs de n-grams
(caractere) Un embedding pour un mot inconnu peut donc etrecalcule
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Autres embeddings pre-entraınes
I Glove [Pennington et al 2014]
glove6Bzip (Wikipedia+GigaWord 2014 |V |=400Kd isin 50 100 200 300 822Mo)
glove42B300dzip (Common Crawl |V |=19M uncasedd = 300 175 Go)
glove840B300dzip (Common Crawl |V |=22M casedd = 300 203 Go)
glovetwitter27Bzip (2B tweets |V |=12M uncasedd isin 25 50 100 200 142 Go)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Other pre-trained embeddings

- GloVe [Pennington et al., 2014]
  - glove.6B.zip (Wikipedia + Gigaword 2014, |V| = 400K, d ∈ {50, 100, 200, 300}, 822 MB)
  - glove.42B.300d.zip (Common Crawl, |V| = 1.9M, uncased, d = 300, 1.75 GB)
  - glove.840B.300d.zip (Common Crawl, |V| = 2.2M, cased, d = 300, 2.03 GB)
  - glove.twitter.27B.zip (2B tweets, |V| = 1.2M, uncased, d ∈ {25, 50, 100, 200}, 1.42 GB)
Analogical arithmetic on representations [Mikolov et al., 2013d]

- vec(Madrid) - vec(Spain) ≈ vec(Paris) - vec(France)
- lets us solve analogy equations [x : y :: z : ?]
  1. compute the target vector t = vec(y) - vec(x) + vec(z)
  2. search V for the word t̂ closest to t:

     t̂ = argmax_w  (vec(w) · t) / (||vec(w)|| × ||t||)
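The two steps above can be sketched directly with cosine similarity over a toy vocabulary (the 2-d vectors are made-up values for illustration, not real embeddings):

```python
import numpy as np

def solve_analogy(vecs, x, y, z, exclude_inputs=True):
    """Return the word w maximizing cos(vec(w), vec(y) - vec(x) + vec(z))."""
    t = vecs[y] - vecs[x] + vecs[z]
    t = t / np.linalg.norm(t)
    best, best_sim = None, -np.inf
    for w, v in vecs.items():
        if exclude_inputs and w in (x, y, z):
            continue  # standard practice: never return a query word
        sim = np.dot(v, t) / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Toy 2-d embeddings (hypothetical values):
vecs = {
    "Spain":  np.array([1.0, 0.0]),
    "Madrid": np.array([1.0, 1.0]),
    "France": np.array([0.0, 0.2]),
    "Paris":  np.array([0.0, 1.2]),
}
print(solve_analogy(vecs, "Spain", "Madrid", "France"))  # → Paris
```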
[Mikolov et al., 2013d]

- RNN trained on 320M words (V = 82k)
- test set of 8k analogies involving the most frequent words
[Mikolov et al., 2013c]

- 6B words of Google News, 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]
[Mikolov et al., 2013c]

- comparison to other proposed models
[Mikolov et al., 2013c]

- Big Data (more data, higher dimensionality)
Meta-embeddings

- idea: can we combine several vector representations to create new, more effective ones?
- 2 simple yet useful approaches (better results than the individual representations):
  - concatenate the representations [Bollegala and Bao, 2018]
  - average them (normalize, and pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
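A minimal numpy sketch of the zero-pad-and-average option: each source vector is L2-normalized, the shorter ones are padded with zeros to the largest dimension, and the result is the mean (one reasonable reading of the averaging method of [Coates and Bollegala, 2018]):

```python
import numpy as np

def avg_meta_embedding(sources):
    """Average several embeddings of one word: L2-normalize each source
    vector, zero-pad the lower-dimensional ones, then take the mean."""
    d = max(len(v) for v in sources)
    padded = []
    for v in sources:
        v = np.asarray(v, dtype=float)
        v = v / np.linalg.norm(v)            # put all sources on the unit sphere
        padded.append(np.pad(v, (0, d - len(v))))  # zero-pad to dimension d
    return np.mean(padded, axis=0)

meta = avg_meta_embedding([[3.0, 4.0], [1.0, 0.0, 0.0]])
# 3-d vector: mean of (0.6, 0.8, 0.0) and (1.0, 0.0, 0.0)
```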
Don't count, predict! [Baroni et al., 2014]

- many tasks, and a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]

- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]

"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture"
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

Binary features induced for "film":
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO.COLLAGE-FILM.N.01, HYPER.SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- supersenses: for nouns, verbs and adjectives
  e.g., lioness ⇒ SS.NOUN.ANIMAL
- color: word–colour lexicon built by crowdsourcing [Mohammad, 2011]
  e.g., blood ⇒ COLOR.RED
- emotion: lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]
  e.g., cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags
  e.g., love ⇒ PTB.NOUN, PTB.VERB
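The scheme above amounts to representing each word as a set of binary features; a toy sketch (the word entries and feature names here are illustrative, not taken from the released vectors):

```python
# Toy binary "non-distributional" vectors: each word is the set of
# linguistic features that fire for it (illustrative entries only).
features = {
    "lioness":  {"SS.NOUN.ANIMAL", "PTB.NOUN"},
    "blood":    {"COLOR.RED", "PTB.NOUN"},
    "cannibal": {"POL.NEG", "EMO.DISGUST", "EMO.FEAR", "PTB.NOUN"},
}

def jaccard(w1, w2):
    """Similarity of two binary vectors = |A ∩ B| / |A ∪ B|."""
    a, b = features[w1], features[w2]
    return len(a & b) / len(a | b)

print(jaccard("lioness", "blood"))  # shared feature: PTB.NOUN
```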
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- note: hard to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage each word to stay close to its learned representation while moving closer to its neighbours in the representation induced from the resource (encoded as a graph)
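A sketch of the iterative update behind retrofitting: each vector is repeatedly replaced by a weighted average of its original vector and the current vectors of its neighbours in the resource graph (the uniform weights `alpha` and `beta` are a simplification of the paper's per-edge weights):

```python
import numpy as np

def retrofit(vectors, graph, alpha=1.0, beta=1.0, iters=10):
    """Retrofitting sketch (after Faruqui et al., 2015a): pull each vector
    toward its original value (weight alpha) and toward its neighbours in
    the lexical graph (weight beta)."""
    q_hat = {w: np.asarray(v, dtype=float) for w, v in vectors.items()}
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(iters):
        for w, neighbours in graph.items():
            if w not in q:
                continue
            nbrs = [n for n in neighbours if n in q]
            if not nbrs:
                continue
            num = alpha * q_hat[w] + beta * sum(q[n] for n in nbrs)
            q[w] = num / (alpha + beta * len(nbrs))
    return q

# Hypothetical vectors; "movie" and "film" are synonyms in the resource.
vecs = {"movie": np.array([1.0, 0.0]), "film": np.array([0.0, 1.0])}
graph = {"movie": ["film"], "film": ["movie"]}
new = retrofit(vecs, graph)  # the two vectors are pulled toward each other
```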
A community getting organised [Faruqui and Dyer, 2014]

- pre-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualisation interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
Mikolov strikes again [Mikolov et al., 2013b]

- a linear transformation (rotation + scaling) from one space to the other can be learned from a bilingual lexicon (xi, zi):

  Ŵ = argmin_W  Σ_i ||W xi − zi||²

  where xi and zi denote the source and target vector representations of the i-th lexicon pair
- W is optimised by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated as ẑ:

  ẑ = argmax_z  cos(z, W x)
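The paper fits W by gradient descent; as a sketch, the same least-squares objective can also be solved in closed form (a simplification of the original training procedure, not a reproduction of it):

```python
import numpy as np

def learn_mapping(X, Z):
    """Find W minimizing sum_i ||W x_i - z_i||^2.
    X: (n, d_src) source vectors; Z: (n, d_tgt) target vectors."""
    # Z ≈ X W^T, so W^T is the least-squares solution of X B = Z.
    Wt, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return Wt.T

def translate(x, W, tgt_vecs):
    """Return the target word whose vector is closest (cosine) to W x."""
    wx = W @ x
    wx = wx / np.linalg.norm(wx)
    scores = {w: np.dot(v, wx) / np.linalg.norm(v) for w, v in tgt_vecs.items()}
    return max(scores, key=scores.get)
```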
Mikolov strikes again [Mikolov et al., 2013b]

- the 6K most frequent source words, translated with Google Translate
- first 5K entries used to compute W
- next 1K used for testing
- baselines: edit distance, ε-Rapp
Mikolov strikes again [Mikolov et al., 2013b]
More data (Google News)

- same split: 5K train, 1K test
Plan

- (Before Deep) the vector space model
- And then came the "Deep"
  - Word2Vec
  - Analogy
  - Meta-embeddings
  - Evaluation
  - Interesting ideas
  - The bilingual case
- Evaluation
On the difficulty of unbiased evaluation [Levy et al., 2015]

- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- study their parameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- Takeaway:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
On the difficulty of unbiased evaluation [Levy et al., 2015]
Example of an observation [Levy et al., 2015]

- in the co-occurrence matrix approach, a word w and a context c are scored as:

  PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]

- a common choice is to set the PMI to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptation of choices made in Skip-Gram:
  - shifted PPMI, where k is the number of negative samples:

    SPPMI(w, c) = max(PMI(w, c) − log k, 0)

  - sampling of the k negative examples (smoothed with α = 0.75):

    PMI_α(w, c) = log [ p(w, c) / (p(w) P_α(c)) ]   with   P_α(c) = #(c)^α / Σ_c #(c)^α
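The PPMI and shifted-PPMI variants above can be computed from a raw word–context count matrix; a small numpy sketch (the 2×2 count matrix is a made-up example):

```python
import numpy as np

def ppmi(counts, shift_k=1):
    """PPMI matrix from a word-context count matrix.
    shift_k > 1 gives the shifted variant SPPMI = max(PMI - log k, 0)."""
    total = counts.sum()
    pw = counts.sum(axis=1, keepdims=True) / total   # p(w)
    pc = counts.sum(axis=0, keepdims=True) / total   # p(c)
    pwc = counts / total                             # p(w, c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(pwc / (pw * pc))
    pmi[counts == 0] = 0.0        # convention: 0 instead of -inf
    return np.maximum(pmi - np.log(shift_k), 0.0)

counts = np.array([[4.0, 0.0],
                   [2.0, 2.0]])
M = ppmi(counts)
```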
[Schnabel et al., 2015]

- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]

- word2vec skip-gram re-run several times with the same parameters
What about infrequent words? [Jakubina and Langlais, 2017]
What about infrequent words?

                 1k-low                  1k-high
            TOP1  TOP5  TOP20      TOP1  TOP5  TOP20
embedding    2.2   6.1   11.9      21.7  34.2  44.9
context      2.0   4.3    7.6      19.0  32.7  44.3
document     0.7   2.3    5.0        —     —     —
oracle       4.6    —    19.0      31.8    —   57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
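The TOP1/TOP5/TOP20 columns are top-k accuracies: the fraction of test words whose reference translation appears among the first k ranked candidates. A small sketch with hypothetical candidate lists:

```python
def top_k_accuracy(ranked_candidates, gold, ks=(1, 5, 20)):
    """Fraction of test words whose gold translation appears in the
    top-k candidates, for each k (as in the TOP1/TOP5/TOP20 columns)."""
    out = {}
    for k in ks:
        hits = sum(1 for w, cands in ranked_candidates.items()
                   if gold[w] in cands[:k])
        out[k] = hits / len(ranked_candidates)
    return out

# Hypothetical ranked candidate lists for two source words:
ranked = {"chat": ["cat", "dog", "hat"], "chien": ["wolf", "dog", "cat"]}
gold = {"chat": "cat", "chien": "dog"}
acc = top_k_accuracy(ranked, gold, ks=(1, 2))
# TOP1 = 0.5 (only "chat" is right at rank 1), TOP2 = 1.0
```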
References

Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2, Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Arithmetique analogique des representations[Mikolov et al 2013d]
I vec(Madrid) - vec(Spain) vec(Paris) - vec(France)
I permet de resoudre des equations analogiques [x y z ]
1 calculer t = vec(y)minus vec(x) + vec(z) le vecteur cible2 rechercher dans V le mot t le plus proche de t
t = argmaxw
vec(w)vec(t)
||vec(w)|| times ||vec(t)||
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013d]
I RNN entraıne sur 320M de mots (V = 82k)
I test set de 8k analogies impliquant les mots les plus frequents
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
[Mikolov et al., 2013c]
- comparison to other proposed models
[Mikolov et al., 2013c]
- Big Data (more data, higher dimensionality)
Meta-embeddings
- idea: can several vector representations be combined to create new, more effective ones?
- 2 simple yet useful approaches (better results than the individual representations):
- concatenating the representations [Bollegala and Bao, 2018]
- averaging them (normalize, then pad the lower-dimensional representations with 0s) [Coates and Bollegala, 2018]
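The two meta-embedding recipes above can be sketched in a few lines; the vectors here are made-up toy values, and the helper names are illustrative, not from either paper.

```python
import numpy as np

# Two embeddings of the same word, living in spaces of different dimension
w_a = np.array([0.3, -1.2, 0.5])        # 3-d source embedding
w_b = np.array([1.0, 0.1, -0.4, 0.8])   # 4-d source embedding

# 1) concatenation [Bollegala and Bao, 2018]: a 7-d meta-embedding
meta_concat = np.concatenate([w_a, w_b])

# 2) averaging [Coates and Bollegala, 2018]: L2-normalize each source,
#    zero-pad the shorter one to the common dimension, then average
def normalize(v):
    return v / np.linalg.norm(v)

d = max(len(w_a), len(w_b))
def pad(v):
    return np.pad(normalize(v), (0, d - len(v)))

meta_avg = (pad(w_a) + pad(w_b)) / 2

print(meta_concat.shape)  # (7,)
print(meta_avg.shape)     # (4,)
```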
Don't count, predict! [Baroni et al., 2014]
- many tasks; a study of the meta-parameters of each method
Don't count, predict! [Baroni et al., 2014]
- cnt = count vectors; pre = word2vec; dm = [Baroni and Lenci, 2010]; cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]
"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
(binary) features induced for film:
- SYNSET.FILM.V.01, SYNSET.FILM.N.01
- HYPO:COLLAGEFILM.N.01, HYPER:SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: word-colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- note: hard to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing (local (phone company) versus (local phone) company)
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]
- a post-processing step applicable to any word vector representation
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage words that are close in the learned representation to also be close to the representation induced from the resource (encoded as a graph)
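A minimal sketch of the iterative retrofitting update of [Faruqui et al., 2015a]: each vector q_i is pulled toward its original embedding and toward its neighbors in the resource graph. The three-word lexicon, vectors, and weights below are made up for illustration.

```python
import numpy as np

# Original embeddings q_hat (toy values) and a tiny synonymy graph
q_hat = {"car": np.array([1.0, 0.0]),
         "auto": np.array([0.0, 1.0]),
         "bank": np.array([-1.0, 0.0])}
edges = {"car": ["auto"], "auto": ["car"], "bank": []}

q = {w: v.copy() for w, v in q_hat.items()}  # retrofitted vectors
alpha, beta = 1.0, 1.0                        # attachment / graph weights

for _ in range(10):  # a few sweeps suffice in practice
    for w, neighbors in edges.items():
        if not neighbors:
            continue  # isolated words keep their original vector
        # closed-form update: weighted mean of q_hat[w] and the neighbors
        num = alpha * q_hat[w] + beta * sum(q[n] for n in neighbors)
        q[w] = num / (alpha + beta * len(neighbors))

# "car" and "auto" end up closer than before; "bank" is unchanged
```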
A community getting organized [Faruqui and Dyer, 2014]
- pre-trained embeddings
- a suite of runnable tests (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
- a linear transformation (rotation + scaling) from one space to the other can be learned from a bilingual lexicon (x_i, z_i):

  W* = argmin_W Σ_i ||W x_i − z_i||²

  where x_i and z_i denote the source-language vector representation of x_i and the target-language representation of z_i, respectively
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by z*:

  z* = argmax_z cos(z, W x)
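The mapping above can be sketched on synthetic data. Here W is obtained in closed form by least squares rather than the gradient descent used in the paper, and the "embeddings" are random vectors for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n_pairs = 4, 3, 50

# Synthetic bilingual lexicon: target vectors are an exact linear
# image of the source vectors (so the recovered W should match W_true)
W_true = rng.normal(size=(d_tgt, d_src))
X = rng.normal(size=(n_pairs, d_src))   # source vectors x_i
Z = X @ W_true.T                        # target vectors z_i

# Solve min_W sum_i ||W x_i - z_i||^2 via least squares on X W^T = Z
B, *_ = np.linalg.lstsq(X, Z, rcond=None)
W = B.T

# Translate a new source word: pick the target vector maximizing cos(z, W x)
def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

x_new = rng.normal(size=d_src)
best = int(np.argmax([cos(z, W @ x_new) for z in Z]))
```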
Mikolov strikes again [Mikolov et al., 2013b]
- the 6K most frequent source words, translated with Google Translate
- first 5K entries used to compute W
- next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)
- same split: 5K train, 1K test
Plan
- (Before Deep) vector space model
- And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case
- Evaluation
On the difficulty of unbiased evaluation [Levy et al., 2015]
- compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- studies their parameters in detail
- adapts choices made in Skip-Gram to the other methods when possible
- Takeaway:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al., 2015]
- in the co-occurrence matrix approach, a word w and its context c are scored by

  PMI(w, c) = log p(w, c) / (p(w) p(c))

- a common approach is to set the PMI value to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples (smoothed with α = 0.75):

    PMI_α(w, c) = log p(w, c) / (p(w) P_α(c)), with P_α(c) = #(c)^α / Σ_c #(c)^α
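The PPMI and shifted-PPMI variants above can be sketched directly from a toy word-context count matrix (the counts are made up for illustration):

```python
import numpy as np

# Toy co-occurrence counts: rows = words w, columns = contexts c
counts = np.array([[10.0, 0.0, 2.0],
                   [3.0,  5.0, 0.0],
                   [0.0,  1.0, 4.0]])

total = counts.sum()
p_wc = counts / total                       # joint p(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)       # marginal p(w)
p_c = p_wc.sum(axis=0, keepdims=True)       # marginal p(c)

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
pmi[counts == 0] = 0.0                      # 0 instead of -inf for unseen pairs

ppmi = np.maximum(pmi, 0.0)                 # PPMI(w, c) = max(PMI, 0)
k = 5                                       # number of negative samples
sppmi = np.maximum(pmi - np.log(k), 0.0)    # shifted PPMI
```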
[Schnabel et al., 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]
- word2vec skip-gram re-run several times with the same parameters (the induced word similarities vary across runs)
And what about infrequent words? [Jakubina and Langlais, 2017]
And what about infrequent words?

                1k-low                  1k-high
            TOP1  TOP5  TOP20      TOP1  TOP5  TOP20
  embedding  2.2   6.1   11.9      21.7  34.2  44.9
  context    2.0   4.3    7.6      19.0  32.7  44.3
  document   0.7   2.3    5.0        —     —     —
  oracle     4.6    —    19.0      31.8    —   57.6

- Wikipedia dump of June 2013 (EN: 3.5M, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.
Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.
Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.
Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.
Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.
Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.
Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.
Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.
Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.
Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.
Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.
Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.
Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.
Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605–611.
Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).
Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.
Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.
Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).
Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.
Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.
Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.
Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.
Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I 6B de mots de Google News 1M de mots les plus frequents
I le test syntaxique est le meme que dans [Mikolov et al 2013d]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Comparaison a drsquoautres modeles proposes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
[Mikolov et al 2013c]
I Big Data (plus de donnees dimension plus elevee)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Embeddings meta
I idee peut-on combiner plusieurs representations vectoriellespour en creer de nouvelles plus efficaes
I 2 approches simples mais neanmoins utiles (meilleurs resultatsque les representations isolees)
I concatener les representations [Bollegala and Bao 2018]I les moyenner (normaliser padder les representations de plus
faible dimension avec des 0) [Coates and Bollegala 2018]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
(Binary) features induced for film:
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO:COLLAGE.FILM.N.01, HYPER:SHEET.N.06
- supersenses: for nouns, verbs and adjectives. Ex.: lioness ⇒ SS.NOUN.ANIMAL
- color: word-colour lexicon built by crowdsourcing [Mohammad 2011]. Ex.: blood ⇒ COLOR.RED
- emotion: lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney 2013]. Ex.: cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags. Ex.: love ⇒ PTB.NOUN, PTB.VERB
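A non-distributional vector is simply a sparse binary vector over a global feature inventory. A minimal sketch, with a made-up feature table mimicking the examples above:

```python
# Toy feature table; the feature names imitate the slide's examples and are
# not taken from the released resource.
features = {
    "lioness": {"SS.NOUN.ANIMAL"},
    "blood":   {"COLOR.RED"},
    "love":    {"PTB.NOUN", "PTB.VERB"},
}

# Global feature inventory: one dimension per distinct feature.
inventory = sorted(set().union(*features.values()))
index = {f: i for i, f in enumerate(inventory)}

def binary_vector(word):
    # 1 in the dimensions of the word's features, 0 everywhere else.
    vec = [0] * len(inventory)
    for f in features[word]:
        vec[index[f]] = 1
    return vec

print(binary_vector("love"))
```

With a real resource the inventory has tens of thousands of dimensions, almost all of them zero for any given word, hence the very sparse vectors mentioned above.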
- Note: hard to do for all languages.
- Skip-Gram pre-trained on 300B words [Mikolov et al 2013a]
- GloVe pre-trained on 6B words [Pennington et al 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel 2010]
- Ling Dense: dimensionality reduction with SVD.
- Tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al 2015a]
- A post-processing step applicable to any word vector representation.
- Fast (5 seconds for 100k words at dimension 300).
- Idea: use the lexico-semantic information of a resource to improve an existing representation.
- How: encourage each word vector to stay close to its learned representation while moving closer to its neighbours in the representation induced from the resource (encoded as a graph).
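A minimal sketch of the iterative update described in [Faruqui et al 2015a], on made-up toy vectors and a two-word synonymy graph; the uniform weights (alpha = 1, beta = 1/degree) follow the paper's suggestion, but this is an illustration, not the authors' code:

```python
import numpy as np

# Hypothetical learned vectors q_hat and a tiny lexical graph.
q_hat = {"happy": np.array([1.0, 0.0]),
         "glad":  np.array([0.0, 1.0]),
         "sad":   np.array([-1.0, 0.0])}
graph = {"happy": ["glad"], "glad": ["happy"], "sad": []}

q = {w: v.copy() for w, v in q_hat.items()}  # retrofitted vectors, updated in place
alpha = 1.0
for _ in range(10):
    for w, neighbours in graph.items():
        if not neighbours:          # isolated words keep their original vector
            continue
        beta = 1.0 / len(neighbours)
        # Each vector is pulled toward its original value (weight alpha)
        # and toward its graph neighbours (weight beta each).
        num = alpha * q_hat[w] + beta * sum(q[n] for n in neighbours)
        q[w] = num / (alpha + beta * len(neighbours))

print(q["happy"])  # pulled toward "glad", away from its original position
```

Words with no neighbours in the resource are left untouched, which is why the step is safe to apply to an entire vocabulary.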
A community getting organized [Faruqui and Dyer 2014]
- Pre-trained embeddings.
- A suite of tests that can be run (similarity, analogy, completion, etc.)
- A visualization interface.
- Note: not certain the site is very popular (or kept up to date) at the moment.
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al 2013b]
- One can learn a linear transformation (rotation + scaling) from one space to another using a bilingual lexicon (x_i, z_i):

    \hat{W} = \arg\min_W \sum_i \| W x_i - z_i \|^2

  where x_i and z_i denote the source-language and target-language vector representations, respectively.
- W is optimized by gradient descent on a lexicon of about 5k word pairs.
- At test time, a word x is translated by the word z such that:

    \hat{z} = \arg\max_z \cos(z, W x)
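A minimal sketch of this mapping on made-up random vectors; here the least-squares optimum is computed in closed form rather than by the gradient descent used in the paper (both reach the same minimizer for this objective):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical seed lexicon: 5 (source, target) vector pairs.
X = rng.normal(size=(5, 3))       # source-language embeddings x_i
W_true = rng.normal(size=(4, 3))  # unknown mapping to recover (toy ground truth)
Z = X @ W_true.T                  # target-language embeddings z_i

# min_W sum_i ||W x_i - z_i||^2 is an ordinary least-squares problem.
A, *_ = np.linalg.lstsq(X, Z, rcond=None)
W = A.T

def translate(x, target_vocab):
    # At test time: return the target word whose vector maximizes cos(z, Wx).
    mapped = W @ x
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(target_vocab, key=lambda w: cos(target_vocab[w], mapped))

vocab = {"word%d" % i: Z[i] for i in range(5)}
print(translate(X[0], vocab))  # recovers "word0", the pair used for x
```

In the noiseless toy setup the recovered W matches the generating matrix exactly; with real embeddings the map is only approximate, hence the ranked-candidate evaluation on the next slide.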
- The 6K most frequent source words, translated with Google Translate.
- The first 5K entries are used to compute W.
- The next 1K are used for testing.
- Baselines: edit distance, ε-Rapp.
More data (Google News)
- Same split: 5K train, 1K test.
On the difficulty of unbiased evaluation [Levy et al 2015]
- Compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe.
- Study their parameters in detail.
- Adapt choices made in Skip-Gram to the other methods when possible.
- Takeaways:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al 2015]
- In the co-occurrence matrix approach, the association between a word w and a context c is scored by:

    PMI(w, c) = \log \frac{p(w, c)}{p(w)\, p(c)}

- A common practice is to set the PMI to 0 when \#(w, c) = 0 (rather than -\infty).
- Another is to take PPMI(w, c) = \max(PMI(w, c), 0).
- Adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = \max(PMI(w, c) - \log k, 0)
  - sampling of the k negative examples, smoothed with \alpha = 0.75:

      PMI_\alpha(w, c) = \log \frac{p(w, c)}{p(w)\, P_\alpha(c)} \quad \text{with} \quad P_\alpha(c) = \frac{\#(c)^\alpha}{\sum_{c'} \#(c')^\alpha}
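These quantities are straightforward to compute from a count matrix; a toy numpy sketch (the counts are made up for illustration):

```python
import numpy as np

# Toy word-context count matrix #(w, c): rows are words, columns are contexts.
counts = np.array([[10.0, 0.0, 2.0],
                   [ 3.0, 5.0, 0.0]])

total = counts.sum()
p_wc = counts / total
p_w = p_wc.sum(axis=1, keepdims=True)
p_c = p_wc.sum(axis=0, keepdims=True)

with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))    # -inf where #(w, c) = 0

ppmi = np.maximum(pmi, 0)               # PPMI(w,c) = max(PMI(w,c), 0)

k = 5                                   # number of negative samples in SGNS
sppmi = np.maximum(pmi - np.log(k), 0)  # SPPMI(w,c) = max(PMI(w,c) - log k, 0)

# Context-distribution smoothing with alpha = 0.75.
alpha = 0.75
c_alpha = counts.sum(axis=0) ** alpha
p_c_alpha = (c_alpha / c_alpha.sum())[None, :]
with np.errstate(divide="ignore"):
    pmi_alpha = np.log(p_wc / (p_w * p_c_alpha))

print(np.round(ppmi, 2))
```

Taking the max with 0 removes both the -inf entries for unseen pairs and the negative associations, which is exactly the practice the slide describes.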
[Schnabel et al 2015]
- They recommend not using an extrinsic task to evaluate pre-trained embeddings.
[Antoniak and Mimno 2018]
- word2vec skip-gram re-run several times with the same parameters.
And what about infrequent words? [Jakubina and Langlais 2017]
                  1k-low                  1k-high
            TOP1  TOP5  TOP20       TOP1  TOP5  TOP20
  embedding  2.2   6.1   11.9       21.7  34.2   44.9
  context    2.0   4.3    7.6       19.0  32.7   44.3
  document   0.7   2.3    5.0          –     –      –
  oracle     4.6     –   19.0       31.8     –   57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles).
- |V_EN| = 7.3M, |V_FR| = 3.6M.
- Two test sets: 1k-low (1k rare words) and 1k-high (1k non-rare words).
- rare = freq < 26 (92% of the words in V_EN).
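The TOP-k scores in the table are precision@k over ranked translation candidates: a test word counts as solved if its reference translation appears among the k top-ranked candidates. A minimal sketch with made-up candidate lists:

```python
def top_k_accuracy(ranked_candidates, references, k):
    """Percentage of test words whose reference appears in the top k candidates."""
    hits = sum(ref in cands[:k]
               for cands, ref in zip(ranked_candidates, references))
    return 100.0 * hits / len(references)

# Hypothetical ranked candidates for two English test words, with their
# French references.
candidates = [["chat", "chien", "maison"], ["voiture", "chien", "chat"]]
refs = ["chat", "chien"]

print(top_k_accuracy(candidates, refs, 1))  # 50.0
print(top_k_accuracy(candidates, refs, 5))  # 100.0
```

The score is monotone in k, which is why TOP20 dominates TOP5 and TOP1 in every row of the table.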
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar, A. P. S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding: computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I plein de taches une etude des meta-parametres de chaquemethode
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
Don't count, predict! [Baroni et al., 2014]
- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [Collobert et al., 2011]
Don't count, predict! [Baroni et al., 2014]
"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture"
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
(binary) features induced for "film":
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO.COLLAGE-FILM.N.01, HYPER.SHEET.N.06
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- supersenses: for nouns, verbs and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
- color: a word-colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
- emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
- pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
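A toy sketch of how such sparse binary vectors can be assembled: each (resource, feature) pair becomes one binary dimension. The mini-resources below are invented for illustration, not the real WordNet/PTB data used in the paper.

```python
def binary_vectors(lexicons):
    """Build sparse binary word vectors from lexical resources
    (in the spirit of Faruqui & Dyer 2015; toy resources here).

    lexicons: dict resource_name -> dict word -> set of feature names
    Returns (word -> 0/1 list, ordered list of (resource, feature) dims).
    """
    # One dimension per distinct (resource, feature) pair, in sorted order.
    dims = sorted({(res, f) for res, lex in lexicons.items()
                   for feats in lex.values() for f in feats})
    index = {d: i for i, d in enumerate(dims)}
    words = sorted({w for lex in lexicons.values() for w in lex})
    vecs = {}
    for w in words:
        v = [0] * len(dims)
        for res, lex in lexicons.items():
            for f in lex.get(w, ()):
                v[index[(res, f)]] = 1    # word has this feature
        vecs[w] = v
    return vecs, dims
```

With a color lexicon for "blood" and a POS lexicon for "blood" and "love", `binary_vectors` yields 3-dimensional 0/1 vectors; real resources yield tens of thousands of (very sparse) dimensions.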
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- note: difficult to build for every language
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing (local (phone company) versus (local phone) company)
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]
- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage each word vector to stay close to its learned representation while moving closer to the representation induced from the resource (encoded as a graph)
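A minimal sketch of the retrofitting iteration, assuming the simple uniform weighting of the paper (α = 1 for the original vector, β = 1/deg for each graph edge); the toy vectors and lexicon below are made up:

```python
import numpy as np

def retrofit(vectors, lexicon, n_iters=10, alpha=1.0):
    """Retrofit word vectors to a lexical resource (after Faruqui et al., 2015).

    vectors: dict word -> np.ndarray, the pre-trained embeddings (kept fixed)
    lexicon: dict word -> list of neighbour words (the resource graph)
    Each update pulls a word towards the average of its graph neighbours
    while keeping it close to its original vector.
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(n_iters):
        for w, nbrs in lexicon.items():
            nbrs = [n for n in nbrs if n in new]
            if w not in new or not nbrs:
                continue                      # only words in the resource move
            beta = 1.0 / len(nbrs)            # weight of each graph edge
            num = alpha * vectors[w] + beta * sum(new[n] for n in nbrs)
            new[w] = num / (alpha + beta * len(nbrs))
    return new
```

After retrofitting, two resource-linked words (e.g. synonyms) end up closer in cosine terms, and words absent from the resource are left untouched.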
A community getting organized [Faruqui and Dyer, 2014]
- already-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain that the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
- a linear transformation (rotation + scaling) from one space to the other can be learned from a bilingual lexicon (x_i, z_i):

    W = argmin_W Σ_i ||W x_i − z_i||²

  where x_i and z_i denote, respectively, the source-side and target-side vector representations of the i-th lexicon entry
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by z = argmax_z cos(z, W x)
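This objective fits in a few lines. As a sketch, W is obtained here with a closed-form least-squares solve rather than the paper's gradient descent (same objective), and the "translation" step scores cosine similarity against a toy target vocabulary:

```python
import numpy as np

def learn_mapping(X, Z):
    """Least-squares solution of min_W sum_i ||W x_i - z_i||^2.

    X: (n, d_src) source vectors of the seed lexicon entries
    Z: (n, d_tgt) target vectors of the same entries
    """
    # Solve X W^T = Z in the least-squares sense, so W^T = lstsq(X, Z).
    Wt, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return Wt.T

def translate(x, target_vecs, W):
    """Return the target word whose vector is most cosine-similar to W x."""
    wx = W @ x
    wx = wx / np.linalg.norm(wx)
    best, best_sim = None, -2.0
    for word, z in target_vecs.items():
        sim = float(z @ wx) / np.linalg.norm(z)
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```

On synthetic data where the target space really is a rotation of the source space, the recovered W matches the rotation and retrieval is exact; with real embeddings the map is only approximate, hence the TOP-k evaluation on the next slide.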
Mikolov strikes again [Mikolov et al., 2013b]
- the 6K most frequent source words, translated with Google Translate
- first 5K entries used to compute W
- next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)
- same split: 5K train, 1K test
Plan
- (Before Deep) the vector-space model
- And then came the "Deep": Word2Vec / Analogy / Meta-embeddings / Evaluation / Interesting ideas / The bilingual case
- Evaluation
On the difficulty of evaluating without bias [Levy et al., 2015]
- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- study their parameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- Takeaways:
  - a performance draw (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al., 2015]
- in the co-occurrence matrix approach, the association between a word w and its context c is written

    PMI(w, c) = log [ p(w, c) / ( p(w) p(c) ) ]

- a common choice is to set PMI values to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifting by the number k of negative examples:

      SPPMI(w, c) = max(PMI(w, c) − log k, 0)

  - smoothed sampling of the k negative examples (α = 0.75):

      PMI_α(w, c) = log [ p(w, c) / ( p(w) P_α(c) ) ], with P_α(c) = #(c)^α / Σ_c' #(c')^α
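These variants are straightforward to compute directly from a word-context count matrix; a sketch (dense numpy for clarity, real matrices would be sparse):

```python
import numpy as np

def sppmi(counts, k=1, alpha=1.0):
    """Shifted positive PMI matrix from a word-context count matrix.

    counts: (n_words, n_contexts) co-occurrence counts
    k:      number of negative samples (shift of log k, Levy et al. 2015)
    alpha:  context-distribution smoothing exponent (0.75 in word2vec)
    """
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    c_counts = counts.sum(axis=0) ** alpha            # smoothed context counts
    p_c = (c_counts / c_counts.sum())[np.newaxis, :]  # P_alpha(c)
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[counts == 0] = 0.0                   # zero counts -> 0, not -inf
    return np.maximum(pmi - np.log(k), 0.0)  # shift by log k, clip at 0
```

With k = 1 and α = 1 this reduces to plain PPMI; increasing k only lowers the surviving cells, mirroring word2vec's negative sampling.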
[Schnabel et al., 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]
- word2vec skip-gram re-run several times with the same parameters: the resulting word similarities vary
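One way to make this observation concrete is to re-train several times and measure how much a word's nearest neighbours overlap across runs. A simplified version of that measurement (`runs` would hold the {word: vector} dicts from each training run; the toy vectors in the usage below are invented):

```python
import numpy as np

def nearest(vecs, word, k):
    """Top-k cosine neighbours of `word` in a {word: vector} dict."""
    q = vecs[word] / np.linalg.norm(vecs[word])
    sims = {w: float(v @ q) / np.linalg.norm(v)
            for w, v in vecs.items() if w != word}
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def neighbour_stability(runs, word, k=3):
    """Mean pairwise Jaccard overlap of `word`'s top-k neighbour sets
    across several embedding runs (same corpus, same hyper-parameters)."""
    sets = [nearest(r, word, k) for r in runs]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

Identical runs score 1.0; the lower the score, the less one should trust qualitative claims based on a single run's neighbour lists.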
And what about rare words? [Jakubina and Langlais, 2017]
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Comput. Linguist., 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and language processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic information and library processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
I cnt = count vector pre = word2Vec dm =[Baroni and Lenci 2010] cw = [Collobert et al 2011]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Donrsquot count predict [Baroni et al 2014]
we set out to conduct this study because we were annoyed bythe triumphalist overtones often surrounding predict modelsdespite the almost complete lack of proper comparison to countvectors Our secret wish was to discover that it is all hype andcount vectors are far superior to their predictive counterparts we found that the predict models are so good that while thetriumphalist overtones still sound excessive there are verygood reasons to switch to the new architecture
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding: computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
Don't count, predict! [Baroni et al., 2014]
"we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."
Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]
- built using linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English): https://github.com/mfaruqui/non-distributional
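The construction is easy to sketch: each word gets a binary vector over the union of its linguistic features. The lexicon below is a toy stand-in with hypothetical feature sets, loosely in the style of the paper's WordNet/PTB-derived features, not the actual resource:

```python
import numpy as np

# Toy lexicon: word -> set of binary linguistic features (hypothetical entries).
LEXICON = {
    "film":    {"SYNSET.FILM.N.01", "SYNSET.FILM.V.01", "HYPER.SHEET.N.06",
                "PTB.NOUN", "PTB.VERB"},
    "movie":   {"SYNSET.FILM.N.01", "PTB.NOUN"},
    "lioness": {"SS.NOUN.ANIMAL", "PTB.NOUN"},
}

# Build the (very sparse) binary word-feature matrix.
features = sorted(set.union(*LEXICON.values()))
index = {f: j for j, f in enumerate(features)}
M = np.zeros((len(LEXICON), len(features)))
for i, word in enumerate(LEXICON):
    for f in LEXICON[word]:
        M[i, index[f]] = 1.0

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

words = list(LEXICON)
print(cosine(M[words.index("film")], M[words.index("movie")]))    # share 2 features
print(cosine(M[words.index("film")], M[words.index("lioness")]))  # share 1 feature
```

Despite there being no corpus at all, word similarity falls out of shared features, which is what makes these vectors competitive on similarity benchmarks.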
(binary) features induced for film:
SYNSET.FILM.V.01, SYNSET.FILM.N.01, HYPO.COLLAGE_FILM.N.01, HYPER.SHEET.N.06
supersenses: for nouns, verbs, and adjectives; e.g. lioness ⇒ SS.NOUN.ANIMAL
color: a word-colour lexicon built by crowdsourcing [Mohammad, 2011]; e.g. blood ⇒ COLOR.RED
emotion: a lexicon associating a word with its polarity (positive/negative) and with emotions (joy, fear, sadness, etc.), built by crowdsourcing [Mohammad and Turney, 2013]; e.g. cannibal ⇒ POL.NEG, EMO.DISGUST and EMO.FEAR
pos: PTB part-of-speech tags; e.g. love ⇒ PTB.NOUN, PTB.VERB
- note: hard to build for every language
- Skip-Gram pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe pre-trained on 6B words [Pennington et al., 2014]
- LSA obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing ((local (phone company)) versus ((local phone) company))
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]
- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage the vectors to stay close to the learned representation while bringing words that are linked in the resource (encoded as a graph) close to one another
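The retrofitting objective has a simple closed-form coordinate update. The sketch below is a simplification of Faruqui et al.'s formulation, with uniform weights (alpha = beta = 1) and a hypothetical two-edge synonymy graph:

```python
import numpy as np

def retrofit(Q_hat, edges, iters=10):
    """Simplified retrofitting: pull each vector toward its original value
    and toward its neighbours in the resource graph (uniform weights)."""
    Q = Q_hat.copy()
    neighbors = {i: [j for a, b in edges for i2, j in ((a, b), (b, a)) if i2 == i]
                 for i in range(len(Q_hat))}
    for _ in range(iters):
        for i, nbrs in neighbors.items():
            if nbrs:  # closed-form update of the convex objective
                Q[i] = (Q_hat[i] + Q[nbrs].sum(axis=0)) / (1 + len(nbrs))
    return Q

rng = np.random.default_rng(0)
Q_hat = rng.normal(size=(4, 5))            # 4 words, dimension 5
edges = [(0, 1), (1, 2)]                   # hypothetical synonymy links
Q = retrofit(Q_hat, edges)

d = lambda A, i, j: np.linalg.norm(A[i] - A[j])
print(d(Q, 0, 1) < d(Q_hat, 0, 1))         # linked words end up closer
```

Words with no link in the graph (here word 3) are left untouched, so the method only moves vectors where the resource has something to say.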
A community getting organized [Faruqui and Dyer, 2014]
- already-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]
- one can learn a linear transformation (rotation + scaling) from one space to the other, given a bilingual lexicon (x_i, z_i):

      W = argmin_W Σ_i ||W x_i − z_i||²

  where x_i and z_i denote the source-side vector of x_i and the target-side vector of z_i, respectively
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by ẑ:

      ẑ = argmax_z cos(z, W x)
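On synthetic data the whole pipeline fits in a few lines. This sketch uses made-up random embeddings and solves for W in closed form by least squares rather than the gradient descent of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for embeddings: source vectors X and target vectors Z,
# generated so that a true linear map relates the two spaces.
n, d_src, d_tgt = 500, 30, 40
X = rng.normal(size=(n, d_src))                        # source vectors x_i
W_true = rng.normal(size=(d_tgt, d_src))
Z = X @ W_true.T + 0.01 * rng.normal(size=(n, d_tgt))  # target vectors z_i

# Learn W minimizing sum_i ||W x_i - z_i||^2 (closed-form least squares).
B, *_ = np.linalg.lstsq(X, Z, rcond=None)
W = B.T

def translate(x, Z):
    """Index of the target word z maximizing cos(z, W x)."""
    q = W @ x
    scores = (Z @ q) / (np.linalg.norm(Z, axis=1) * np.linalg.norm(q))
    return int(np.argmax(scores))

print(translate(X[0], Z))                              # recovers index 0
```

With a nearly linear relation between the spaces, the learned map sends each source vector next to its true translation, which is exactly the behaviour the paper exploits.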
- the 6K most frequent source words, translated with Google Translate
- first 5K entries used to compute W
- next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)
- same split: 5K train, 1K test
Plan

(Before Deep) vector space model

And then came the "Deep": Word2Vec · Analogy · Meta-embeddings · Evaluation · Interesting ideas · The bilingual case

Evaluation
On the difficulty of evaluating without bias [Levy et al., 2015]
- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- study their parameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- takeaways:
  - a draw in performance (no clear advantage of one approach over the others)
  - Skip-Gram behaves better (time/memory) than the other approaches
An example observation [Levy et al., 2015]
- in the co-occurrence-matrix approach, a word w and its context c are scored by

      PMI(w, c) = log( p(w, c) / (p(w) p(c)) )

- a common approach is to set PMI to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples, smoothed with α = 0.75:

      PMI_α(w, c) = log( p(w, c) / (p(w) P_α(c)) )   with   P_α(c) = #(c)^α / Σ_c #(c)^α
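These variants are easy to compute from a toy count matrix; a numpy sketch (the counts are made up):

```python
import numpy as np

# Toy word-context count matrix #(w, c): rows are words, columns contexts.
C = np.array([[10., 0., 2.],
              [ 0., 5., 3.],
              [ 4., 1., 0.]])

p_wc = C / C.sum()
p_w = C.sum(axis=1, keepdims=True) / C.sum()
p_c = C.sum(axis=0, keepdims=True) / C.sum()

with np.errstate(divide="ignore"):         # log 0 -> -inf, clipped below
    pmi = np.log(p_wc / (p_w * p_c))
    # context-distribution smoothing with alpha = 0.75
    alpha = 0.75
    counts_c = C.sum(axis=0)
    p_c_alpha = counts_c**alpha / (counts_c**alpha).sum()
    pmi_alpha = np.log(p_wc / (p_w * p_c_alpha))

ppmi = np.maximum(pmi, 0)                  # PPMI(w,c) = max(PMI(w,c), 0)
k = 5
sppmi = np.maximum(pmi - np.log(k), 0)     # shifted PPMI (SGNS with k negatives)

print(ppmi[0, 1], sppmi[0, 1])             # zero count -> clipped to 0
```

The shift by log k makes the matrix sparser, which is one way the count-based side can mimic the implicit objective of skip-gram with negative sampling [Levy and Goldberg, 2014].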
[Schnabel et al., 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I en utilisant des ressources linguistiques (WordNet PTBFrameNet etc)
I vecteurs tres creux
I comparables en performance aux modeles distributionnels etatde lrsquoart entraınes sur des billions de mots
I vecteurs disponibles (pour lrsquoanglais) httpsgithubcommfaruquinon-distributional
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
Retrofitting vectors to a lexico-semantic resource [Faruqui et al 2015a]
- a post-processing step applicable to any word vector representation
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage words that are close in the learned representation to also be close in the representation induced from the resource (encoded as a graph)
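As a rough illustration, the iterative update behind retrofitting can be sketched as follows. This is a minimal sketch on toy data, not the authors' released code; the weights `alpha` and `beta` stand in for the per-word weights of the paper's objective:

```python
import numpy as np

def retrofit(X, graph, alpha=1.0, beta=1.0, iters=10):
    """Retrofitting sketch: pull each vector toward its graph neighbours
    while staying close to its original embedding.
    X: (n, d) original vectors; graph: {i: [neighbour indices]}."""
    Q = X.copy()
    for _ in range(iters):
        for i, nbrs in graph.items():
            if not nbrs:
                continue
            # Coordinate update of the retrofitting objective:
            # weighted average of the original vector and the neighbours.
            Q[i] = (alpha * X[i] + beta * Q[nbrs].sum(axis=0)) \
                   / (alpha + beta * len(nbrs))
    return Q

# Toy example: words 0 and 1 are synonyms in the resource; word 2 is isolated.
X = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
Q = retrofit(X, {0: [1], 1: [0], 2: []})
```

After a few iterations the two synonyms are pulled toward each other, while the isolated word keeps its original vector, which is the behaviour the slide describes.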
A community getting organized [Faruqui and Dyer 2014]
- pre-trained embeddings
- a suite of runnable tests (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al 2013b]
- a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):

  W* = argmin_W Σ_i ‖W x_i − z_i‖²

  where x_i and z_i denote, respectively, the source-side vector representation of x_i and the target-side representation of z_i
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated as ẑ:

  ẑ = argmax_z cos(z, W x)
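A minimal sketch of this mapping on toy data. The paper fits W by stochastic gradient descent; here a closed-form least-squares solution is used instead, which reaches the same optimum of the objective above. The rotation `R` and the tiny vocabularies are illustrative assumptions:

```python
import numpy as np

def learn_mapping(X, Z):
    """Least-squares solution of min_W sum_i ||W x_i - z_i||^2.
    X: (n, d_src) source vectors, Z: (n, d_tgt) target vectors."""
    A, *_ = np.linalg.lstsq(X, Z, rcond=None)  # solves X @ A ~ Z
    return A.T  # so that W @ x maps source -> target

def translate(x, W, Z_vocab):
    """Index of the target word maximizing cos(z, W x)."""
    q = W @ x
    sims = (Z_vocab @ q) / (np.linalg.norm(Z_vocab, axis=1)
                            * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(sims))

# Toy check: the target space is the source space rotated by 90 degrees.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = X @ R.T          # row i is R @ x_i
W = learn_mapping(X, Z)
```

With an exact linear relation between the spaces, the recovered W equals the rotation, and cosine retrieval returns each word's true translation.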
- the 6K most frequent source words, translated with Google Translate
- first 5K entries used to compute W
- next 1K used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)
- same split: 5K train, 1K test
Plan
(Before Deep) vector space model
And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case
Evaluation
On the difficulty of unbiased evaluation [Levy et al 2015]
- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram and GloVe
- study their hyperparameters in detail
- adapt choices made in Skip-Gram to the other methods where possible
- Takeaways:
  - a performance tie (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al 2015]
- in the co-occurrence matrix approach, a word w and its context c are scored with

  PMI(w, c) = log [ p(w, c) / (p(w) p(c)) ]

- a common choice is to set PMI to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples, smoothed with α = 0.75:

    PMI_α(w, c) = log [ p(w, c) / (p(w) P_α(c)) ]   with   P_α(c) = #(c)^α / Σ_c #(c)^α
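The formulas above fit in a few lines of numpy. This is a sketch on toy counts: `k=1, alpha=1` gives plain PPMI, while `k>1` and `alpha=0.75` give the shifted, context-smoothed variant discussed by Levy et al. (2015):

```python
import numpy as np

def ppmi(counts, k=1, alpha=1.0):
    """(S)PPMI from a word-context count matrix (rows: words, cols: contexts)."""
    total = counts.sum()
    pw = counts.sum(axis=1, keepdims=True) / total          # p(w)
    pc_a = counts.sum(axis=0, keepdims=True) ** alpha
    pc_a = pc_a / pc_a.sum()                                # P_alpha(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts / total) / (pw * pc_a))
    pmi[counts == 0] = 0.0        # convention: 0 instead of -inf
    return np.maximum(pmi - np.log(k), 0.0)                 # shift + clip

C = np.array([[10.0, 0.0],
              [ 2.0, 8.0]])
M = ppmi(C)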
[Schnabel et al 2015]
- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno 2018]
- word2vec skip-gram relaunched several times with the same parameters
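One way to quantify the resulting (in)stability, sketched here on toy matrices rather than real word2vec runs, is the overlap of a word's nearest-neighbour sets across two trainings, in the spirit of the diagnostic used by Antoniak and Mimno (2018):

```python
import numpy as np

def knn(E, i, k):
    """Indices of the k nearest neighbours of row i by cosine similarity."""
    sims = E @ E[i] / (np.linalg.norm(E, axis=1) * np.linalg.norm(E[i]))
    order = [int(j) for j in np.argsort(-sims) if j != i]
    return set(order[:k])

def stability(E1, E2, i, k=3):
    """Jaccard overlap of word i's neighbour sets across two runs."""
    a, b = knn(E1, i, k), knn(E2, i, k)
    return len(a & b) / len(a | b)

# Two toy "runs": identical up to a small perturbation of one vector.
rng = np.random.default_rng(0)
E1 = rng.normal(size=(6, 4))
E2 = E1.copy()
E2[5] += 0.01 * rng.normal(size=4)
```

An overlap of 1.0 means the word's neighbourhood is identical across runs; values well below 1.0 on real retrainings are what makes single-run similarity results fragile.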
And what about infrequent words? [Jakubina and Langlais 2017]
                1k-low                  1k-high
           TOP1  TOP5  TOP20      TOP1  TOP5  TOP20
embedding   2.2   6.1   11.9      21.7  34.2  44.9
context     2.0   4.3    7.6      19.0  32.7  44.3
document    0.7   2.3    5.0        —     —     —
oracle      4.6    —    19.0      31.8    —   57.6

- Wikipedia dump of June 2013 (EN: 3.5M, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words in V_EN)
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding: computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
features (binaires) induitspour film
SYNSETFILMV01SYNSETFILMN01
HYPOCOLLAGEFILMN01HYPER SHEETN06
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
supersenses pour les noms les verbes et les adjectifsex lioness rArr SSNOUNANIMAL
color lexique mot-couleur elabore par crowdsourcing[Mohammad 2011]ex blood rArr COLORRED
emotion lexique associant un mot a sa polarite(positifnegatif) et aux emotions (joie peurtristesse etc) elabore par crowdsourcing[Mohammad and Turney 2013]ex cannibal rArr POLNEG EMODISGUST etEMOFEARCOLORRED
pos PTB part-of-speech tagsex loverArr PTBNOUN PTBVERB
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I note difficile a faire pour toutes les langues
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2, Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and language processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic information and library processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141–188.
- (Before Deep) vector space model
- And then came the "Deep"
  - Word2Vec
  - Analogy
  - Meta-embeddings
  - Evaluation
  - Interesting ideas
  - The bilingual case
- Evaluation
BD Deep Eval W2V Ana Meta Eval Cool Bi
Binary (non-distributional) word vector representations [Faruqui and Dyer, 2015]

- Skip-Gram, pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe, pre-trained on 6B words [Pennington et al., 2014]
- LSA, obtained from a co-occurrence matrix computed over 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing (local (phone company) versus (local phone) company)
- note: hard to build for every language
Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any word vector representation
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage the retrofitted vectors to stay close to the learned representation while also staying close to the representation induced from the resource (encoded as a graph)
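The retrofitting objective admits a simple iterative update. The sketch below is illustrative, not the authors' released code, and assumes uniform weights (alpha for staying close to the original vector, beta per graph edge); `Q_hat` holds the pre-trained vectors and `edges` lists the word pairs linked in the resource:

```python
import numpy as np

def retrofit(Q_hat, edges, n_iters=10, alpha=1.0, beta=1.0):
    """Retrofitting in the style of Faruqui et al. (2015a): each vector is
    pulled toward its original value (weight alpha) and toward its
    neighbors in the lexical resource (weight beta per edge)."""
    Q = Q_hat.copy()
    neighbors = {i: [] for i in range(len(Q_hat))}
    for i, j in edges:                      # resource graph is undirected
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(n_iters):                # converges quickly in practice
        for i, nbrs in neighbors.items():
            if not nbrs:                    # words absent from the resource stay put
                continue
            Q[i] = (alpha * Q_hat[i] + beta * Q[nbrs].sum(axis=0)) \
                   / (alpha + beta * len(nbrs))
    return Q
```

Each update is the closed-form minimizer for one vector with the others fixed, which is why a handful of sweeps suffices.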
A community getting organized [Faruqui and Dyer, 2014]

- pre-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not sure the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]

- a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):

      W* = argmin_W Σ_i ||W x_i − z_i||²

  where x_i and z_i denote the source-language and target-language vector representations, respectively
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by the word z* such that:

      z* = argmax_z cos(z, W x)
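The mapping can be sketched in a few lines of numpy. Note this sketch solves the least-squares objective in closed form rather than by gradient descent as in the paper (the minimizer is the same); `learn_mapping` and `translate` are illustrative names:

```python
import numpy as np

def learn_mapping(X, Z):
    """Least-squares estimate of W minimizing sum_i ||W x_i - z_i||^2.
    X: n x d_src source vectors, Z: n x d_tgt target vectors."""
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # solves X @ W ~ Z
    return W.T                                   # so that z ~ W @ x

def translate(x, W, Z_vocab):
    """Index of the target-vocabulary vector most cosine-similar to W x."""
    q = W @ x
    sims = (Z_vocab @ q) / (np.linalg.norm(Z_vocab, axis=1)
                            * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(sims))
```

In practice `Z_vocab` would hold the full target-side embedding matrix, and the argmax would be restricted to the k best candidates for reranking.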
Mikolov strikes again [Mikolov et al., 2013b]

- the 6K most frequent source words, translated with Google Translate
- the first 5K entries are used to compute W
- the next 1K are used for testing
- baselines: edit distance, ε-Rapp
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
Plan

(Before Deep) vector space model

And then came the "Deep": Word2Vec, Analogy, Meta-embeddings, Evaluation, Interesting ideas, The bilingual case

Evaluation
On the difficulty of unbiased evaluation [Levy et al., 2015]

- compare 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- study their hyperparameters in detail
- adapt design choices made in Skip-Gram to the other methods where possible
- Takeaway:
  - a draw in performance (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al., 2015]

- in the co-occurrence-matrix approach, a word w and its context c are scored as

      PMI(w, c) = log [ p(w, c) / ( p(w) p(c) ) ]

- a common choice is to set PMI values to 0 when #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adapting choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples (smoothed with α = 0.75):

      PMI_α(w, c) = log [ p(w, c) / ( p(w) P_α(c) ) ]   with   P_α(c) = #(c)^α / Σ_c' #(c')^α
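A toy implementation of these PMI variants, assuming a small dense word-context count matrix `C` (a real vocabulary would require sparse matrices):

```python
import numpy as np

def ppmi_variants(C, k=1, alpha=1.0):
    """PPMI and shifted PPMI (SPPMI) from a word-context count matrix C,
    with optional context-distribution smoothing P_alpha(c) ~ #(c)^alpha
    (alpha = 0.75 mimics word2vec's negative-sampling distribution)."""
    total = C.sum()
    p_wc = C / total
    p_w = C.sum(axis=1, keepdims=True) / total
    ctx = C.sum(axis=0) ** alpha
    p_c = ctx / ctx.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[C == 0] = 0.0                       # common convention: 0 instead of -inf
    ppmi = np.maximum(pmi, 0.0)
    sppmi = np.maximum(pmi - np.log(k), 0.0)
    return ppmi, sppmi
```

With k = 1 and alpha = 1 this reduces to plain PPMI; increasing k shifts every cell down before clipping, which is the matrix-factorization view of negative sampling [Levy and Goldberg, 2014].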
[Schnabel et al., 2015]

- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]

- word2vec skip-gram, re-run several times with the same parameters, yields noticeably different word similarities from run to run
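One way to quantify this instability, in the spirit of Antoniak and Mimno, is to compare the nearest-neighbor sets a word receives in two training runs. The Jaccard-overlap sketch below is illustrative (function names and the protocol are simplified, not the paper's exact setup):

```python
import numpy as np

def top_k_neighbors(E, i, k=10):
    """Indices of the k nearest neighbors of word i by cosine, excluding i."""
    q = E[i]
    sims = (E @ q) / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-12)
    sims[i] = -np.inf                       # never return the word itself
    return set(np.argsort(-sims)[:k])

def neighbor_overlap(E1, E2, i, k=10):
    """Jaccard overlap of word i's k-NN sets across two embedding runs:
    1.0 = identical neighborhoods, 0.0 = disjoint."""
    a, b = top_k_neighbors(E1, i, k), top_k_neighbors(E2, i, k)
    return len(a & b) / len(a | b)
```

Averaging this overlap across the vocabulary (and across many seeds) gives a stability curve; low overlap is a warning against drawing conclusions from a single run.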
What about infrequent words? [Jakubina and Langlais, 2017]

               1k-low                1k-high
           TOP1  TOP5  TOP20    TOP1  TOP5  TOP20
embedding   2.2   6.1   11.9    21.7  34.2  44.9
context     2.0   4.3    7.6    19.0  32.7  44.3
document    0.7   2.3    5.0      —     —     —
oracle      4.6    —    19.0    31.8    —   57.6

- Wikipedia dump of June 2013 (EN: 3.5M, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)
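TOP-k figures of this kind are precision-at-k over a test lexicon: the score counts how often a reference translation appears among the first k ranked candidates. A minimal sketch with hypothetical inputs:

```python
def precision_at_k(ranked_candidates, gold, ks=(1, 5, 20)):
    """ranked_candidates: {source_word: [candidate translations, best first]},
    gold: {source_word: set of reference translations}.
    Returns, for each k, the fraction of source words whose reference
    appears among the top-k candidates."""
    scores = {}
    for k in ks:
        hits = sum(1 for w, cands in ranked_candidates.items()
                   if gold.get(w, set()) & set(cands[:k]))
        scores[k] = hits / max(len(ranked_candidates), 1)
    return scores
```

An oracle row like the one above reports the same metric over the union of several candidate sources, i.e. an upper bound for any reranker combining them.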
BD Deep Eval W2V Ana Meta Eval Cool Bi
Representation vectorielle binaire (nondistributionnelle) [Faruqui and Dyer 2015]
I Skip-Gram pre-entraıne sur 300B de mots[Mikolov et al 2013a]
I Glove pre-entraıne sur 6B de mots [Pennington et al 2014]I LSA obtenue a partir drsquoune matrice de co-occurrence calculee
sur 1B de mots de Wikipedia [Turney and Pantel 2010]I Ling Dense reduction de dimensionnalite avec SVDI taches similarite sent analysis (positifnegatif) NP-bracketing
(local (phone company) versus (local phone) company )felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Retrofitting de vecteurs a une ressourcelexico-semantique [Faruqui et al 2015a]
I etape de post-traitement applicable a nrsquoimporte quellerepresentation vectorielle de mots
I rapide (5 secondes pour 100k mots et dimension 300)
I idee utiliser les informations lexico-semantiques drsquouneressource pour ameliorer une representation existante
I comment encourager que les mots de distance similaire dansla representation apprise soit proche de la representation induitede la ressource (encodee sous forme de graphe)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Une communaute qui srsquoorganise[Faruqui and Dyer 2014]
I des embeddings deja entraınes
I une suite de tests qui peuvent srsquoexecuter (similarite analogiecompletion etc)
I une interface de visualisation
I note pas certain que le site soit tres populaire (ni mis a jour)pour le moment
I httpwordvectorsorgdemophp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
- (Before Deep) the vector space model
- And then came the "Deep"
  - Word2Vec
  - Analogy
  - Meta-embeddings
  - Evaluation
  - Interesting ideas
  - The bilingual case
- Evaluation
A community getting organized [Faruqui and Dyer, 2014]

- pre-trained embeddings, ready to download
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: it is not certain the site is very popular (or kept up to date) at the moment
- http://wordvectors.org/demo.php
Mikolov strikes again [Mikolov et al., 2013b]

- a linear transformation (rotation + scaling) from one space to another can be learned from a bilingual lexicon (x_i, z_i):

      W* = argmin_W Σ_i ||W x_i − z_i||²

  where x_i and z_i denote, respectively, the source-side vector representation of x_i and the target-side representation of z_i
- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, a word x is translated by the target word z* such that

      z* = argmax_z cos(z, W x)
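As a sketch of the two steps above (my illustration with NumPy, not Mikolov et al.'s implementation; the learning rate, epoch count, and toy data are assumptions):

```python
import numpy as np

def learn_mapping(X, Z, lr=0.1, epochs=300):
    """Fit W minimizing (1/n) * sum_i ||W x_i - z_i||^2 by gradient descent.

    X: (n, d_src) source-side vectors of the lexicon entries x_i.
    Z: (n, d_tgt) target-side vectors of their translations z_i.
    """
    n, d_src = X.shape
    W = np.zeros((Z.shape[1], d_src))
    for _ in range(epochs):
        # gradient of the mean squared loss with respect to W
        grad = (2.0 / n) * (X @ W.T - Z).T @ X
        W -= lr * grad
    return W

def translate(x, W, Z_vocab):
    """Index of the target vector z maximizing cos(z, W x)."""
    q = W @ x
    sims = (Z_vocab @ q) / (np.linalg.norm(Z_vocab, axis=1) * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(sims))
```

In the setting of the slide, W would be fit on the 5K training pairs and `translate` scored on the held-out 1K.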
Mikolov strikes again [Mikolov et al., 2013b]

- the 6K most frequent source words, translated with Google Translate
- the first 5K entries are used to compute W
- the next 1K are used for testing
- baselines: edit distance, ε-Rapp
More data (Google News)

- same split: 5K for training, 1K for testing
On the difficulty of evaluating without bias [Levy et al., 2015]

- compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- studies their hyperparameters in detail
- ports design choices made in Skip-Gram to the other methods where possible
- Takeaways:
  - a performance tie (no clear advantage of one approach over the others)
  - Skip-Gram behaves better (time/memory) than the other approaches
Example observation [Levy et al., 2015]

- in the co-occurrence matrix approach, a word w and its context c are scored by

      PMI(w, c) = log( p(w, c) / (p(w) p(c)) )

- a common choice is to set PMI to 0 when the count #(w, c) = 0 (rather than −∞)
- another is to take PPMI(w, c) = max(PMI(w, c), 0)
- adaptations of choices made in Skip-Gram:
  - shifted PPMI: SPPMI(w, c) = max(PMI(w, c) − log k, 0)
  - sampling of the k negative examples (smoothed with α = 0.75):

      PMI_α(w, c) = log( p(w, c) / (p(w) P_α(c)) ),  with P_α(c) = #(c)^α / Σ_c' #(c')^α
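A minimal sketch of these variants (my illustration, not Levy et al.'s code), starting from a raw word-context count matrix:

```python
import numpy as np

def shifted_ppmi(counts, k=1, alpha=1.0):
    """Compute max(PMI_alpha(w, c) - log k, 0) from co-occurrence counts.

    counts: (|Vw|, |Vc|) matrix of word-context counts #(w, c).
    k=1, alpha=1 gives plain PPMI; k>1 applies the SPPMI shift;
    alpha=0.75 smooths the context distribution as in negative sampling.
    """
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)
    smoothed = counts.sum(axis=0, keepdims=True) ** alpha   # #(c)^alpha
    p_c = smoothed / smoothed.sum()                         # P_alpha(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[counts == 0] = 0.0   # convention: zero count -> PMI := 0, not -inf
    return np.maximum(pmi - np.log(k), 0.0)
```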
[Schnabel et al., 2015]

- recommend not using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno, 2018]

- word2vec skip-gram re-run several times with the same parameters (to measure the stability of the resulting embedding-based word similarities)
And what about rare words? [Jakubina and Langlais, 2017]

                  1k-low                  1k-high
             TOP1  TOP5  TOP20      TOP1  TOP5  TOP20
  embedding   2.2   6.1   11.9      21.7  34.2  44.9
  context     2.0   4.3    7.6      19.0  32.7  44.3
  document    0.7   2.3    5.0       --    --    --
  oracle      4.6    --   19.0      31.8   --   57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words) and 1k-high (1k non-rare words)
- rare = frequency < 26 (92% of the words in V_EN)
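The TOP1/TOP5/TOP20 figures above are precision-at-k scores over ranked translation candidates; a generic helper (my sketch, not the authors' evaluation script) could look like:

```python
def precision_at_k(ranked_candidates, gold, ks=(1, 5, 20)):
    """Fraction of test words whose reference translation appears
    among the top-k ranked candidates, for each k in ks."""
    assert len(ranked_candidates) == len(gold)
    return {
        k: sum(g in cands[:k] for cands, g in zip(ranked_candidates, gold)) / len(gold)
        for k in ks
    }
```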
References

Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding: Computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2, Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. J. Artif. Int. Res., 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I on peut apprendre une transformation lineaire (rotation +scaling) drsquoun espace vers un autre avec un lexique bilingue(xi zi)
W = minW
Σi Wxi minus zi2
ou xi et zi designent respectivement la representationvectorielle source de xi et cible de zi
I W optimisee par descente de gradient sur un lexique drsquoenviron5k paires de mots
I au moment du test traduire un mot x par z
z = argmaxz
cos(z Wx)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
I 6K des most sources lesplus frequents traduits parGoogleTrans
I premieres 5K entreespour calculer W
I 1K suivantes pour lestests
I baselines edit-distanceεminusRapp
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Mikolov strikes again [Mikolov et al 2013b]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval W2V Ana Meta Eval Cool Bi
Plus de donnees (Google News)
I meme split 5K train 1Ktest
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Plan
(Before Deep) modele vectoriel
And then came the ldquoDeeprdquoWord2VecAnalogieMeta-embeddingsEvaluationIdees interessantesLe cas bilingue
Evaluation
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
[Schnabel et al 2015]
I recommend against using an extrinsic task to evaluate pre-trained embeddings
[Antoniak and Mimno 2018]
I word2vec skip-gram re-trained several times with the same parameters, to measure the stability of the resulting word similarities
And what about infrequent words? [Jakubina and Langlais, 2017]
And what about infrequent words?

                     1k-low                   1k-high
             TOP1   TOP5   TOP20      TOP1   TOP5   TOP20
  embedding   2.2    6.1    11.9      21.7   34.2   44.9
  context     2.0    4.3     7.6      19.0   32.7   44.3
  document    0.7    2.3     5.0        —      —      —
  oracle      4.6     —     19.0      31.8     —     57.6
I Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
I |V_EN| = 7.3M, |V_FR| = 3.6M
I 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
I rare = frequency < 26 (92% of the words in V_EN)
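The TOP-k numbers in the table can be computed with a small helper; the function and the example data below are a hedged sketch (invented word pairs), not the paper's evaluation code:

```python
# TOP-k metric: a test word counts as correct if its reference
# translation appears among the first k ranked candidates.
def topk_accuracy(candidates, gold, k):
    hits = sum(1 for w, ranked in candidates.items() if gold[w] in ranked[:k])
    return hits / len(candidates)

# Invented example: ranked FR candidates for two EN test words.
candidates = {"cat": ["chien", "chat", "chapeau"],
              "dog": ["chien", "loup", "chat"]}
gold = {"cat": "chat", "dog": "chien"}

print(topk_accuracy(candidates, gold, 1))  # 0.5: only "dog" is right at k=1
print(topk_accuracy(candidates, gold, 5))  # 1.0: both found within 5
```

The oracle row in the table corresponds to an upper bound over the individual sources, which is what motivates reranking their combined candidate lists.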
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.
Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.
Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.
Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.
Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.
Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.
Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.
Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.
Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.
Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.
Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.
Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.
Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.
Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, pages 605–611.
Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).
Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.
Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.
Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.
Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.
Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).
Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.
Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.
Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.
Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.
Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
I comparent 4 approches matrice de co-occurrence (PMI) SVDSkip-Gram et GloVe
I etudient leurs parametres en detail
I adaptent des choix faits dans Skip-Gram a drsquoautres methodeslorsque possible
I Bilan
I match nul en performance (pas drsquoavantage clair drsquoune approchesur une autre)
I Skip-Gram se comporte mieux (tempsmemoire) que les autresapproches
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
And what about rare words? [Jakubina and Langlais, 2017]
              1k-low                  1k-high
           TOP1  TOP5  TOP20     TOP1  TOP5  TOP20
embedding   2.2   6.1   11.9     21.7  34.2   44.9
context     2.0   4.3    7.6     19.0  32.7   44.3
document    0.7   2.3    5.0       —     —      —
oracle      4.6    —    19.0     31.8    —    57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words in V_EN)
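The TOPk columns report whether the reference translation appears among the k highest-ranked candidates, i.e. precision@k over the test set. A minimal sketch with hypothetical candidate lists (the names and data are mine, not the paper's):

```python
def precision_at_k(ranked_candidates, references, k):
    """Fraction of test words whose gold translation is among the top k candidates.

    ranked_candidates: dict mapping each source word to its ranked candidate list.
    references: dict mapping each source word to its gold translation.
    """
    hits = sum(1 for w, gold in references.items()
               if gold in ranked_candidates.get(w, [])[:k])
    return hits / len(references)

# Hypothetical EN->FR test set of three words.
ranked = {"dog":   ["chien", "chat", "loup"],
          "house": ["batiment", "maison", "toit"],
          "tree":  ["fleur", "herbe", "branche"]}
gold = {"dog": "chien", "house": "maison", "tree": "arbre"}

print(precision_at_k(ranked, gold, 1))   # 1/3 — only "dog" is correct at TOP1
print(precision_at_k(ranked, gold, 5))   # 2/3 — "house" is recovered within the top 5
```

The oracle row of the table corresponds to taking, for each word, the best-ranked candidate list among the three individual sources.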
Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
Sur la difficulte drsquoevaluer sans biais[Levy et al 2015]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
Exemple drsquoobservation [Levy et al 2015]
I dans lrsquoapproche matrice de co-occurences un mot w et soncontexte c est note
PMI(w c) = logp(w c)
p(w)p(c)
I une approche courante est de mettre a 0 les valeurs de PMIlorsque (w c) = 0 (plutot que minusinfin)
I une autre est de prendre PPMI(w c) = max(PMI(w c) 0)
I adaptation de choix faits dans Skip-Gram
I
SPPMI(w c) = max(PMI(w c)minus logk 0)I sampling des k examples negatifs (lisses avec α = 075)
PMIα(w c) = logP (w c)
p(w)Pα(c)avec Pα(c) =
(c)αsumc(c)α
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
[Schnabel et al 2015]
I recommandent de ne pas utiliser une tache extrinseque pourevaluer des embeddings pre-entraınes
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Jakubina L and Langlais P (2017)Reranking translation candidates produced by several bilingualword similarity sourcesIn 15th Conference of the European Chapter of the Associationfor Computational Linguitics volume 2 Short Papers pages605ndash611
Jurafsky D and Martin J H (2015)Speech and language processing(3rd ed draft)
Lee D D and Seung H S (1999)Learning the parts of objects by non-negative matrixfactorizationNature 401(6755) 788ndash791
Levy O and Goldberg Y (2014)Neural word embedding as implicit matrix factorizationIn Advances in Neural Information Processing Systems 27pages 2177ndash2185
BD Deep Eval
Levy O Goldberg Y and Dagan I (2015)Improving distributional similarity with lessons learned from wordembeddingsTransactions of the Association for Computational Linguistics3 211ndash225
Mikolov T Chen K Corrado G and Dean J (2013a)
Efficient estimation of word representations in vector spaceCoRR abs13013781
Mikolov T Le Q V and Sutskever I (2013b)Exploiting similarities among languages for machine translationCoRR abs13094168
Mikolov T Sutskever I Chen K Corrado G andDean J (2013c)Distributed representations of words and phrases and theircompositionalityCoRR abs13104546
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
[Antoniak and Mimno 2018]
I word2vec skipgram relance plusieurs fois avec les memesparametres
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents[Jakubina and Langlais 2017]
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Et pour les mots peu frequents
1k-low 1k-highTOP1 TOP5 TOP20 TOP1 TOP5 TOP20
embedding 22 61 119 217 342 449context 20 43 76 190 327 443document 07 23 50 mdash mdash mdash
oracle 46 mdash 190 318 mdash 576
I Wikipedia dump de juin 2013 (EN 35M FR 13M articles)
I VEN = 73M VFR = 36M
I 2 test sets 1k-low (1k mots rares) 1k-high (1k mots non rares)
I rare = freq lt 26 (92 des mots de VEN)
felipeiroumontrealca Semantique distributionnelle embeddings (et dong)
BD Deep Eval
Al-Rfou R Perozzi B and Skiena S (2013)Polyglot Distributed word representations for multilingual nlpIn Proceedings of the Seventeenth Conference onComputational Natural Language Learning pages 183ndash192Sofia Bulgaria Association for Computational Linguistics
Antoniak M and Mimno D (2018)Evaluating the stability of embedding-based word similaritiesTransactions of the Association for Computational Linguistics6 107ndash119
Baroni M Dinu G and Kruszewski G (2014)Donrsquot count predict a systematic comparison ofcontext-counting vs context-predicting semantic vectorsIn Proceedings of the 52nd Annual Meeting of the Associationfor Computational Linguistics (Volume 1 Long Papers) pages238ndash247 Baltimore Maryland Association for ComputationalLinguistics
Baroni M and Lenci A (2010)
BD Deep Eval
Distributional memory A general framework for corpus-basedsemanticsComput Linguist 36(4) 673ndash721
Bojanowski P Grave E Joulin A and Mikolov T(2016)Enriching word vectors with subword informationarXiv preprint arXiv 160704606
Bollegala D and Bao C (2018)Learning word meta-embeddings by autoencodingIn Proceedings of the 27th International Conference onComputational Linguistics pages 1650ndash1661 Association forComputational Linguistics
Chandar A P S Lauly S Larochelle H KhapraM M Ravindran B Raykar V C and Saha A (2014)An autoencoder approach to learning bilingual wordrepresentationsCoRR
Coates J and Bollegala D (2018)
BD Deep Eval
Frustratingly easy meta-embedding ndash computingmeta-embeddings by averaging source word embeddingsIn Conference of the North American Chapter of the Associationfor Computational Linguistics Human Language TechnologiesVolume 2 (Short Papers) pages 194ndash198
Collobert R Weston J Bottou L Karlen MKavukcuoglu K and Kuksa P (2011)Natural language processing (almost) from scratchJournal of Machine Learning Research 12 2493ndash2537
Coulmance J Marty J Wenzek G and BenhalloumA (2016)Trans-gram fast cross-lingual word-embeddingsCoRR abs160102502
Faruqui M Dodge J Jauhar S K Dyer C Hovy Eand Smith N A (2015a)Retrofitting word vectors to semantic lexiconsIn Proceedings of NAACL
Faruqui M and Dyer C (2014)
BD Deep Eval
Community evaluation and exchange of word vectors atwordvectorsorgIn Proceedings of ACL System Demonstrations
Faruqui M and Dyer C (2015)Non-distributional word vector representationsIn Proceedings of ACL
Faruqui M Tsvetkov Y Yogatama D Dyer C andSmith N A (2015b)Sparse overcomplete word vector representationsIn Proceedings of ACL
Golub G H and Van Loan C F (1996)Matrix Computations (3rd Ed)Johns Hopkins University Press
Gouws S Bengio Y and Corrado G (2015)Bilbowa Fast bilingual distributed representations without wordalignmentsIn ICML
BD Deep Eval
Plan
- (Before Deep) the vector-space model
- And then came the "Deep"
  - Word2Vec
  - Analogy
  - Meta-embeddings
  - Evaluation
  - Interesting ideas
  - The bilingual case
- Evaluation
And for infrequent words? [Jakubina and Langlais, 2017]

            1k-low                1k-high
          TOP1  TOP5  TOP20    TOP1  TOP5  TOP20
embedding  2.2   6.1   11.9    21.7  34.2   44.9
context    2.0   4.3    7.6    19.0  32.7   44.3
document   0.7   2.3    5.0      —     —      —
oracle     4.6    —    19.0    31.8    —    57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M articles)
- |V_EN| = 7.3M, |V_FR| = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words in V_EN)
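The TOP1/TOP5/TOP20 scores above are precision@k for bilingual lexicon induction: rank every target-language word by cosine similarity to the source word's representation, and count a hit when a reference translation appears among the top k candidates. A minimal sketch of that metric, with function names and toy data of my own (not from [Jakubina and Langlais, 2017]):

```python
import numpy as np

def topk_accuracy(src_vecs, tgt_vecs, tgt_words, gold, ks=(1, 5, 20)):
    """Fraction of source words whose gold translation is ranked in the top k.

    src_vecs: dict mapping source word -> vector (already in the target space)
    tgt_vecs: array of shape (|V_tgt|, d), row i is the vector of tgt_words[i]
    gold:     dict mapping source word -> set of reference translations
    """
    # L2-normalise target rows so a dot product equals cosine similarity
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    hits = {k: 0 for k in ks}
    for word, vec in src_vecs.items():
        v = vec / np.linalg.norm(vec)
        sims = tgt @ v                        # cosine with every target word
        order = np.argsort(-sims)             # best candidates first
        ranked = [tgt_words[i] for i in order[:max(ks)]]
        for k in ks:
            if gold[word] & set(ranked[:k]):  # any reference translation in top k?
                hits[k] += 1
    n = len(src_vecs)
    return {k: hits[k] / n for k in ks}
```

The "oracle" row corresponds to taking, for each source word, the best-ranked list among the three candidate sources, which is why it upper-bounds the individual rows.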
References

Al-Rfou, R., Perozzi, B., and Skiena, S. (2013). Polyglot: Distributed word representations for multilingual NLP. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183–192, Sofia, Bulgaria. Association for Computational Linguistics.

Antoniak, M. and Mimno, D. (2018). Evaluating the stability of embedding-based word similarities. Transactions of the Association for Computational Linguistics, 6:107–119.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bollegala, D. and Bao, C. (2018). Learning word meta-embeddings by autoencoding. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1650–1661. Association for Computational Linguistics.

Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coates, J. and Bollegala, D. (2018). Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 194–198.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd Ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jakubina, L. and Langlais, P. (2017). Reranking translation candidates produced by several bilingual word similarity sources. In 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 605–611.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177–2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., Yih, W.-t., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Mohammad, S. (2011). Colourful language: Measuring word-colour associations. In 2nd Workshop on Cognitive Modeling and Computational Linguistics, CMCL '11, pages 97–106.

Mohammad, S. and Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. CoRR.

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, NJ.

Schnabel, T., Labutov, I., Mimno, D. M., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., and Marton, Y., editors, EMNLP, pages 298–307. The Association for Computational Linguistics.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188.
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
Mikolov T tau Yih W and Zweig G (2013d)Linguistic regularities in continuous space word representationsIn Proceedings of the 2013 Conference of the North AmericanChapter of the Association for Computational Linguistics Human Language Technologies (NAACL-HLT-2013)
Mohammad S (2011)Colourful language Measuring word-colour associationsIn 2Nd Workshop on Cognitive Modeling and ComputationalLinguistics CMCL rsquo11 pages 97ndash106
Mohammad S and Turney P D (2013)Crowdsourcing a word-emotion association lexiconCoRR
Pennington J Socher R and Manning C D (2014)Glove Global vectors for word representationIn Empirical Methods in Natural Language Processing (EMNLP)pages 1532ndash1543
Salton G (1975)
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-
BD Deep Eval
Dynamic information and library processing Gerard SaltonPrentice-Hall Englewood Cliffs NJ
Schnabel T Labutov I Mimno D M and JoachimsT (2015)Evaluation methods for unsupervised word embeddingsIn Marquez L Callison-Burch C Su J Pighin D andMarton Y editors EMNLP pages 298ndash307 The Associationfor Computational Linguistics
Turney P D (2005)Measuring semantic similarity by latent relational analysisCoRR
Turney P D and Pantel P (2010)From frequency to meaning Vector space models of semantics
J Artif Int Res 37(1) 141ndash188
- (Before Deep) modegravele vectoriel
- And then came the ``Deep
-
- Word2Vec
- Analogie
- Meta-embeddings
- Eacutevaluation
- Ideacutees inteacuteressantes
- Le cas bilingue
-
- Eacutevaluation
-