uncertainty in neural network word embedding: exploration of threshold for similarity
TRANSCRIPT
![Page 1: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/1.jpg)
Navid Rekabsaz ([email protected]) Mihai Lupu ([email protected])
Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity
NeuIR at SIGIR, 21st July 2016
Navid Rekabsaz, Mihai Lupu, Allan Hanbury
@NRekabsaz [email protected]
![Page 2: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/2.jpg)
Similar Terms
![Page 3: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/3.jpg)
Similar Terms
books 0.82foreword 0.77author 0.74published 0.73preface 0.69republished 0.68reprinted 0.68afterword 0.67memoir 0.67
bookcorpulent 0.44hideous 0.43unintelligent
0.42wizened 0.42catoblepas 0.42creature 0.42humanoid 0.41grotesquely 0.41tomtar 0.41
dwarfish
![Page 4: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/4.jpg)
TopN:
Similar Terms
books 0.82foreword 0.77author 0.74published 0.73preface 0.69republished 0.68reprinted 0.68afterword 0.67memoir 0.67
bookcorpulent 0.44hideous 0.43unintelligent
0.42wizened 0.42catoblepas 0.42creature 0.42humanoid 0.41grotesquely 0.41tomtar 0.41
dwarfish
![Page 5: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/5.jpg)
TopN:
Similar Terms
Using threshold in SIGIR 2016, tuned by brute-force search: Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising.Mihajlo Grbovic et al.Robust and Collective Entity Disambiguation through Semantic Embeddings.Stefan Zwicklbauer et al.
books 0.82foreword 0.77author 0.74published 0.73preface 0.69republished 0.68reprinted 0.68afterword 0.67memoir 0.67
bookcorpulent 0.44hideous 0.43unintelligent
0.42wizened 0.42catoblepas 0.42creature 0.42humanoid 0.41grotesquely 0.41tomtar 0.41
dwarfish
![Page 6: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/6.jpg)
Contribution
![Page 7: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/7.jpg)
Our Contribution • Analytical exploration of a general threshold on term similarity
• Defined for all the terms in the lexicon • Tested on Ad Hoc retrieval
• Showing the advantage of threshold-based rather than TopN-based approach
Contribution
![Page 8: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/8.jpg)
Roadmap • Uncertainty in NN-based word embeddings • Similarity distribution • Proposing threshold • Experiments
Our Contribution • Analytical exploration of a general threshold on term similarity
• Defined for all the terms in the lexicon • Tested on Ad Hoc retrieval
• Showing the advantage of threshold-based rather than TopN-based approach
Contribution
![Page 9: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/9.jpg)
Observations
![Page 10: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/10.jpg)
Observations
W2V
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 11: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/11.jpg)
Observations
W2V
BookDwarfish
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 12: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/12.jpg)
Observations
W2V
BookDwarfish
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 13: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/13.jpg)
Observations
W2V
W2V
BookDwarfish
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 14: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/14.jpg)
Observations
W2V
W2V
BookDwarfish
BookDwarfish
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 15: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/15.jpg)
Observations
W2V
W2V
BookDwarfish
BookDwarfish
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 16: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/16.jpg)
Observations
W2V
W2V
BookDwarfish
BookDwarfish X 4
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 17: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/17.jpg)
Observations
Similarity values: Base
W2V
W2V
BookDwarfish
Average of similarity values of 5 models:
Avg
BookDwarfish X 4
W2V: Word2Vec SkipGram with 200 Dimensions
![Page 18: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/18.jpg)
Uncertainty
![Page 19: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/19.jpg)
Uncertainty
Uncertainty:
![Page 20: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/20.jpg)
Uncertainty• Arbitrary term: Average of 100 representative terms [Schnabel et al. 2015]
![Page 21: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/21.jpg)
Uncertainty• Arbitrary term: Average of 100 representative terms [Schnabel et al. 2015]
• Less uncertainty in higher similarity values • Less uncertainty in higher dimensions
![Page 22: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/22.jpg)
Similarity Probability Distribution• Similarity between terms as probability distribution instead
of concrete values • Assumption: normal distribution based on 5 observed
similarities
![Page 23: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/23.jpg)
Mixture of Cumulative Similarity Distributions
• Y axes: number of neighbors, located in the space between the X value and the term
![Page 24: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/24.jpg)
Mixture of Cumulative Similarity Distributions
• Y axes: number of neighbors, located in the space between the X value and the term
![Page 25: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/25.jpg)
Mixture of Cumulative Similarity Distributions
• Y axes: number of neighbors, located in the space between the X value and the term
![Page 26: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/26.jpg)
Mixture of Cumulative Similarity Distributions
• Y axes: number of neighbors, located in the space between the X value and the term
![Page 27: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/27.jpg)
Mixture of Cumulative Similarity Distributions
• Y axes: number of neighbors, located in the space between the X value and the term
![Page 28: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/28.jpg)
Filtering NeighborsWhat is the best threshold for filtering the related terms?
![Page 29: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/29.jpg)
Filtering NeighborsWhat is the best threshold for filtering the related terms?
Hypothesis: it can be estimated based on the average number of synonyms over the terms
![Page 30: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/30.jpg)
Filtering NeighborsWhat is the best threshold for filtering the related terms?
Hypothesis: it can be estimated based on the average number of synonyms over the terms
What is the expected number of synonyms for a word in English?
![Page 31: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/31.jpg)
Filtering NeighborsWhat is the best threshold for filtering the related terms?
Hypothesis: it can be estimated based on the average number of synonyms over the terms
What is the expected number of synonyms for a word in English?
# of terms: Average # of synonyms per term:
Standard deviation :
147306 1.6 3.1
![Page 32: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/32.jpg)
Threshold• Proposed Threshold: cumulative frequency equal to 1.6
![Page 33: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/33.jpg)
Threshold• Proposed Threshold: cumulative frequency equal to 1.6
![Page 34: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/34.jpg)
Threshold• Proposed Threshold: cumulative frequency equal to 1.6
![Page 35: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/35.jpg)
Threshold• Proposed Threshold: cumulative frequency equal to 1.6
![Page 36: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/36.jpg)
Experiments Setup• Translation Language Model with Dirichlet smoothing • Use word embedding similarity value for translation probability
Zuccon et al. [2015] • Apply the proposed threshold to select the similar words • Brute-force search to find the optimal threshold
![Page 37: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/37.jpg)
Experiments Setup• Translation Language Model with Dirichlet smoothing • Use word embedding similarity value for translation probability
Zuccon et al. [2015] • Apply the proposed threshold to select the similar words • Brute-force search to find the optimal threshold
• Test collections:
![Page 38: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/38.jpg)
Experiments Setup• Translation Language Model with Dirichlet smoothing • Use word embedding similarity value for translation probability
Zuccon et al. [2015] • Apply the proposed threshold to select the similar words • Brute-force search to find the optimal threshold
• Test collections:
• Baseline: language model (Dirichlet smoothing) • Significance Test : T-Test p<0.05
![Page 39: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/39.jpg)
Experiments Results• Gain of MAP over baseline, averaged on four collections. • Conclusion1: Optimal threshold is either the same or in the
confidence interval of the proposed threshold.
![Page 40: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/40.jpg)
Experiments Results• Gain of MAP over baseline, averaged on four collections. • Conclusion1: Optimal threshold is either the same or in the
confidence interval of the proposed threshold.
![Page 41: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/41.jpg)
Threshold vs. TopN• Conclusion2: Threshold outperforms TopN
Threshold-based TopN
![Page 42: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/42.jpg)
Take Home Message
• Uncertainty in neural network word embeddings: • depends on similarity value • depends on dimensionality
• Threshold to filter unrelated terms : • Proposed threshold as good as optimal threshold • Threshold approach much better than Top-N approach
![Page 43: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/43.jpg)
@NRekabsaz [email protected]
Questions?
Ideas!
Follow-up paper in CIKM 2016: Generalizing Translation Models in the Probabilistic Relevance Framework Navid Rekabsaz, Mihai Lupu, Allan Hanbury
![Page 44: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/44.jpg)
Dimensionality
![Page 45: Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity](https://reader035.vdocument.in/reader035/viewer/2022062412/588890911a28ab3e658b69fd/html5/thumbnails/45.jpg)
Proposed vs. Optimal Threshold