A Similarity Measure Based on Semantic and Linguistic Information
![Page 1: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/1.jpg)
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
A Similarity Measure Based on Semantic and Linguistic Information
Nitish Aggarwal
DERI, NUI Galway
Wednesday, 15th June 2011
DERI Reading Group
![Page 2: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/2.jpg)
Based On:
“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”
Authors: Giuseppe Pirrò and Jérôme Euzenat
Published: International Semantic Web Conference, 2010
“SyMSS: A syntax-based measure for short-text semantic similarity”
Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias
Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011
![Page 3: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/3.jpg)
Overview
Introduction
Classical Approaches
Ontology-based Similarity
  Set of relations
  Information Content
SyMSS (Syntax-based)
  Deep Parsing
  Influence of adjectives and adverbs
Conclusion
![Page 4: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/4.jpg)
Introduction & Motivation
Short-text Similarity
  Lack of Semantics and Linguistics
Applications
  Semantic Annotation
  Semantic Search
  Information Retrieval and Extraction
![Page 5: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/5.jpg)
Classical Approaches
String Similarity
  Levenshtein distance, Dice coefficient
Corpus-based
  ESA, Google distance, Vector-Space Model
Ontology-based
  Path distance, Information content
Syntax Similarity
  Word order, Part of speech
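To make the string-similarity family concrete, here is a minimal sketch of Levenshtein distance and a Dice coefficient. The function names and the use of character bigrams for Dice are assumptions made for this illustration, not details from the talk.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def dice_coefficient(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (assumed tokenisation)."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        return 1.0  # two strings without bigrams are treated as identical
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))
```

Both functions look only at characters, which is why such measures cannot tell that two differently spelled words mean the same thing.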
![Page 6: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/6.jpg)
First Paper:
“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”
Authors: Giuseppe Pirrò and Jérôme Euzenat
Published: International Semantic Web Conference, 2010
![Page 7: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/7.jpg)
Ontology-based - Overview
Features
  Whole set of semantic relations defined in an ontology
Resnik’s Information Content
  IC(c) = -log p(c)
Intrinsic Information Content
  Avoids the need to analyse large corpora
Extended Information Content
  Maps the feature-based model to the information-theoretic domain
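A minimal sketch of Resnik's IC(c) = -log p(c), assuming p(c) is estimated as the relative frequency of the concept in a corpus. The function and parameter names are hypothetical.

```python
import math


def information_content(concept_count: int, total_count: int) -> float:
    """Resnik-style IC(c) = -log p(c), with p(c) = concept_count / total_count."""
    if not 0 < concept_count <= total_count:
        raise ValueError("count must be positive and no larger than the total")
    return -math.log(concept_count / total_count)
```

Rare concepts get a high IC and very frequent ones approach zero. The estimate needs corpus counts, which is the dependency the intrinsic variant removes.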
![Page 8: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/8.jpg)
Ontology-based - Why whole set?
Eyes Ears
Relation: Part of
![Page 9: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/9.jpg)
Ontology-based - model
Tversky’s feature-based similarity model
  Similarity increases with the features two concepts share
  Similarity decreases with the extra features that only one concept has
Ratio-based formulation of Tversky’s model
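As a rough illustration, the standard ratio form of Tversky's model divides the shared features by the shared features plus weighted distinctive features. The sketch below assumes the salience function is plain set cardinality. The weights `alpha` and `beta` and the feature sets in the usage note are hypothetical.

```python
def tversky_ratio(features_a: set, features_b: set,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Shared features over shared plus weighted distinctive features."""
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)   # features unique to a
    only_b = len(features_b - features_a)   # features unique to b
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0
```

For example, with invented feature sets `{"wheels", "engine", "vehicle"}` and `{"wheels", "pedals", "vehicle"}`, two features are shared and one is unique on each side, so the score with the default weights is 2 / 4 = 0.5.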
![Page 10: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/10.jpg)
Ontology-based - Mapping
Mapping between feature-based and information theoretic similarity models
MSCA: Most Specific Common Abstraction
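One way to read such a mapping is sketched below. It assumes that common features correspond to IC(msca(c1, c2)) and that each concept's distinctive features correspond to IC(c) minus IC(msca(c1, c2)), plugged into the ratio model with equal weights. This correspondence and the function name are assumptions for illustration.

```python
def ic_ratio_similarity(ic_c1: float, ic_c2: float, ic_msca: float) -> float:
    """Ratio model with features replaced by information content values."""
    common = ic_msca               # assumed stand-in for shared features
    only_c1 = ic_c1 - ic_msca      # assumed stand-in for features unique to c1
    only_c2 = ic_c2 - ic_msca      # assumed stand-in for features unique to c2
    denom = common + only_c1 + only_c2
    return common / denom if denom > 0 else 0.0
```

Under these assumptions the score simplifies to IC(msca) / (IC(c1) + IC(c2) - IC(msca)), which is 1 when both concepts coincide with their MSCA and 0 when the MSCA carries no information.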
![Page 11: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/11.jpg)
Ontology-based - Example
T1: Car
T2: Bicycle
Example of Concept Feature
![Page 12: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/12.jpg)
![Page 13: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/13.jpg)
Ontology-based - Framework
Intrinsic information content (iIC)
  Defined from sub(c), the number of sub-concepts of a given concept c
Extended information content (eIC)
  Defined from EIC(c), a relatedness coefficient that uses all kinds of relations
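A hedged sketch of both quantities. It assumes iIC follows the usual intrinsic form 1 - log(sub(c) + 1) / log(max_concepts), where `max_concepts` (the total number of concepts in the ontology) is a name introduced here. It also assumes that eIC is a weighted sum of iIC(c) and a coefficient that averages, per relation type, the iIC of the concepts reachable through that relation. The weights `zeta` and `eta` and the input layout are hypothetical.

```python
import math
from typing import Dict, List


def intrinsic_ic(num_subconcepts: int, max_concepts: int) -> float:
    """iIC(c) = 1 - log(sub(c) + 1) / log(max_concepts)."""
    if max_concepts < 2 or num_subconcepts < 0:
        raise ValueError("need max_concepts >= 2 and num_subconcepts >= 0")
    return 1.0 - math.log(num_subconcepts + 1) / math.log(max_concepts)


def extended_ic(iic_c: float, related_iic: Dict[str, List[float]],
                zeta: float = 1.0, eta: float = 1.0) -> float:
    """Assumed form: zeta * iIC(c) + eta * sum of per-relation average iIC."""
    eic = sum(sum(values) / len(values)
              for values in related_iic.values() if values)
    return zeta * iic_c + eta * eic
```

A leaf concept (no sub-concepts) gets iIC = 1, and a root that subsumes every other concept gets iIC = 0, with no corpus statistics involved.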
![Page 14: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/14.jpg)
Ontology-based – Evaluation of Similarity
DataSet: 65 human-evaluated pairs
Correlation values:
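Evaluations of this kind compare a measure's scores with human ratings over the same pairs. The sketch below assumes Pearson's coefficient is the correlation used. The function name is hypothetical and no data from the talk is reproduced.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the similarity scores produced by a measure and `ys` the human judgements for the same word pairs.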
![Page 15: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/15.jpg)
Ontology-based – Evaluation of Relatedness
DataSet: WordSim-353
Correlation value:
![Page 16: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/16.jpg)
Ontology-based - Summary
Intrinsic similarity measure
  Ontology-based similarity
  Outperforms corpus-based measures
Limitation
  No support for short text
  Model-based
    – E.g. only concepts in the ontology are considered (e.g. car accident)
![Page 17: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/17.jpg)
Second paper (SyMSS)
“SyMSS: A syntax-based measure for short-text semantic similarity”
Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias
Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011
![Page 18: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/18.jpg)
SyMSS - Overview
SyMSS = “syntax-based measure for short-text semantic similarity”
Syntactic Information
  Not only word order
  Deep parsing
  Parts of speech
Semantic Information
  WordNet similarity
  Different ontology-based similarity measures
![Page 19: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/19.jpg)
SyMSS - Semantic Information
Path-based measures
  Shortest path
  Hirst and St-Onge (HSO)
Information Content
  Resnik measure
  Jiang and Conrath measure
  Lin measure
Gloss-based measures
  Gloss overlap and gloss vector
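The three information-content measures listed here differ only in how they combine the IC of the two concepts with the IC of their most specific common subsumer. A minimal sketch in their standard forms, taking precomputed IC values as input (the function names are hypothetical):

```python
def resnik(ic_lcs: float) -> float:
    """Resnik: the IC of the most specific common subsumer."""
    return ic_lcs


def lin(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Lin: shared information relative to the total information."""
    denom = ic_c1 + ic_c2
    return 2 * ic_lcs / denom if denom > 0 else 0.0


def jiang_conrath_distance(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Jiang and Conrath: a distance, so smaller means more similar."""
    return ic_c1 + ic_c2 - 2 * ic_lcs
```

Note that Jiang and Conrath define a distance, so it has to be inverted or rescaled before it can be used alongside the other two as a similarity.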
![Page 20: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/20.jpg)
SyMSS - Syntactic Information
Parse tree phrases
  Head of phrases
Head similarity
  Heads of phrases which have the same syntactic function
Penalization factor
  Non-shared phrases
![Page 21: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/21.jpg)
SyMSS - Model
My brother has a dog with four legs
My brother has four legs
Sim(has, has) = 1
Sim(brother, brother) = 1
Sim(dog, leg) = 0.1414
PF = 0.03
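A sketch of how these numbers could combine. It assumes the sentence score is the mean of the head similarities over phrases with the same syntactic function, minus the penalization factor for each non-shared phrase. It also assumes that exactly one phrase ("with four legs") is non-shared in this example. The function and parameter names are hypothetical.

```python
from typing import Sequence


def symss_score(head_sims: Sequence[float], unshared_phrases: int,
                penalization_factor: float = 0.03) -> float:
    """Mean head similarity minus a fixed penalty per non-shared phrase."""
    if not head_sims:
        return 0.0
    mean_sim = sum(head_sims) / len(head_sims)
    return mean_sim - unshared_phrases * penalization_factor


# Head similarities from the example: has/has, brother/brother, dog/leg.
score = symss_score([1.0, 1.0, 0.1414], unshared_phrases=1)
```

Under these assumptions the score is (1 + 1 + 0.1414) / 3 - 0.03, which is about 0.684.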
![Page 22: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/22.jpg)
SyMSS - Evaluation
DataSet: 30 pairs out of 65 human evaluated pairs
Correlation values:
![Page 23: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/23.jpg)
SyMSS - Effect of adverbs and adjectives
Sentence 1: “I have a big dog”
Sentence 2: “I have a little dog”
8.68% gain in SyMSS with HSO
![Page 24: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/24.jpg)
SyMSS - Summary
Syntax-based similarity considers…
  Nouns and verbs
  Influence of adjectives and adverbs
Limitation
  Depends on the parsed structure
    – E.g. sentences that are not grammatically correct
  Depends on word similarity
![Page 25: A similarity measure based on semantic and linguistic information](https://reader033.vdocument.in/reader033/viewer/2022061209/54875e1bb47959fb0c8b5450/html5/thumbnails/25.jpg)
Conclusion
No established method for short text
  Parsing of phrases is difficult
Concept similarity depends on the model
  Weak model
    – E.g. xebr: Extraordinary Income and xebr: Other Operating Income -> Pathlength = 0.2 and Expert = 0.8
Need a syntactic similarity for concept tags (word or phrase)