![Page 1: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/1.jpg)
Word Meaning and Similarity
Word Senses and Word Relations
Slides are adapted from Dan Jurafsky
![Page 2: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/2.jpg)
Reminder: lemma and wordform
• A lemma or citation form
  • Same stem, part of speech, rough semantics
• A wordform
  • The “inflected” word as it appears in text

| Wordform | Lemma |
|---|---|
| banks | bank |
| sung | sing |
| duermes | dormir |
![Page 3: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/3.jpg)
Lemmas have senses
• One lemma “bank” can have many meanings:
  • Sense 1: “…a bank can hold the investments in a custodial account…”
  • Sense 2: “…as agriculture burgeons on the east bank the river will shrink even more”
• Sense (or word sense)
  • A discrete representation of an aspect of a word’s meaning.
• The lemma bank here has two senses
![Page 4: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/4.jpg)
Homonymy
Homonyms: words that share a form but have unrelated, distinct meanings:
• bank¹: financial institution, bank²: sloping land
• bat¹: club for hitting a ball, bat²: nocturnal flying mammal
1. Homographs (bank/bank, bat/bat)
2. Homophones:
   1. write and right
   2. piece and peace
![Page 5: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/5.jpg)
Homonymy causes problems for NLP applications
• Information retrieval
  • “bat care”
• Machine Translation
  • bat: murciélago (animal) or bate (for baseball)
• Text-to-Speech
  • bass (stringed instrument) vs. bass (fish)
![Page 6: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/6.jpg)
Polysemy
• 1. The bank was constructed in 1875 out of local red brick.
• 2. I withdrew the money from the bank
• Are those the same sense?
  • Sense 2: “A financial institution”
  • Sense 1: “The building belonging to a financial institution”
• A polysemous word has related meanings
  • Most non-rare words have multiple meanings
![Page 7: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/7.jpg)
Metonymy or Systematic Polysemy: A systematic relationship between senses
• Lots of types of polysemy are systematic
  • School, university, hospital
  • All can mean the institution or the building.
• A systematic relationship:
  • Building ↔ Organization
• Other such kinds of systematic polysemy:
  • Author (Jane Austen wrote Emma) ↔ Works of Author (I love Jane Austen)
  • Tree (Plums have beautiful blossoms) ↔ Fruit (I ate a preserved plum)
![Page 8: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/8.jpg)
How do we know when a word has more than one sense?
• The “zeugma” test: Two senses of serve?
  • Which flights serve breakfast?
  • Does Lufthansa serve Philadelphia?
  • ?Does Lufthansa serve breakfast and San Jose?
• Since this conjunction sounds weird,
  • we say that these are two different senses of “serve”
![Page 9: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/9.jpg)
Synonyms
• Words that have the same meaning in some or all contexts.
  • filbert / hazelnut
  • couch / sofa
  • big / large
  • automobile / car
  • vomit / throw up
  • Water / H2O
• Two lexemes are synonyms
  • if they can be substituted for each other in all situations
  • If so they have the same propositional meaning
![Page 10: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/10.jpg)
Synonyms
• But there are few (or no) examples of perfect synonymy.
  • Even if many aspects of meaning are identical
  • Substitution still may not preserve acceptability, because of differences in politeness, slang, register, genre, etc.
• Example:
  • Water / H2O
  • Big / large
  • Brave / courageous
![Page 11: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/11.jpg)
Synonymy is a relation between senses rather than words
• Consider the words big and large
  • Are they synonyms?
    • How big is that plane?
    • Would I be flying on a large or small plane?
• How about here:
  • Miss Nelson became a kind of big sister to Benjamin.
  • ?Miss Nelson became a kind of large sister to Benjamin.
• Why?
  • big has a sense that means being older, or grown up
  • large lacks this sense
![Page 12: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/12.jpg)
Antonyms
• Senses that are opposites with respect to one feature of meaning
• Otherwise, they are very similar!
  dark/light, short/long, fast/slow, rise/fall, hot/cold, up/down, in/out
• More formally: antonyms can
  • define a binary opposition or be at opposite ends of a scale
    • long/short, fast/slow
  • Be reversives:
    • rise/fall, up/down
![Page 13: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/13.jpg)
Hyponymy and Hypernymy
• One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
  • car is a hyponym of vehicle
  • mango is a hyponym of fruit
• Conversely hypernym/superordinate (“hyper is super”)
  • vehicle is a hypernym of car
  • fruit is a hypernym of mango

| Superordinate / hypernym | vehicle | fruit | furniture |
|---|---|---|---|
| Subordinate / hyponym | car | mango | chair |
![Page 14: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/14.jpg)
Hyponymy more formally
• Extensional:
  • The class denoted by the superordinate extensionally includes the class denoted by the hyponym
• Entailment:
  • A sense A is a hyponym of sense B if being an A entails being a B
• Hyponymy is usually transitive
  • (A hypo B and B hypo C entails A hypo C)
• Another name: the IS-A hierarchy
  • A IS-A B (or A ISA B)
  • B subsumes A
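The transitivity and subsumption points above can be sketched in a few lines of Python. This is a minimal illustration, not WordNet itself: the `HYPERNYM` table, the `artifact` node, and the helper names `hypernym_chain` and `is_a` are all assumptions made up for the example.

```python
# Toy IS-A links (child -> direct hypernym). Illustrative only: the
# "artifact" node and these direct links are assumptions, not WordNet data.
HYPERNYM = {
    "car": "vehicle",
    "mango": "fruit",
    "chair": "furniture",
    "vehicle": "artifact",
    "furniture": "artifact",
}


def hypernym_chain(sense):
    """Return every ancestor of `sense`, nearest first."""
    chain = []
    while sense in HYPERNYM:
        sense = HYPERNYM[sense]
        chain.append(sense)
    return chain


def is_a(a, b):
    """True if a is a (possibly indirect) hyponym of b, i.e. b subsumes a."""
    return b in hypernym_chain(a)


print(is_a("car", "vehicle"))    # direct IS-A link
print(is_a("car", "artifact"))   # follows from transitivity
print(is_a("vehicle", "car"))    # hyponymy is not symmetric
```

Walking up the chain of hypernyms is what makes the relation transitive here: any node reachable by repeated IS-A steps subsumes the starting sense.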
![Page 15: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/15.jpg)
Hyponyms and Instances
• WordNet has both classes and instances.
• An instance is an individual, a proper noun that is a unique entity
  • San Francisco is an instance of city
• But city is a class
  • city is a hyponym of municipality ... location ...
![Page 16: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/16.jpg)
![Page 17: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/17.jpg)
Word Meaning and Similarity
WordNet and other Online Thesauri
![Page 18: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/18.jpg)
Applications of Thesauri and Ontologies
• Information Extraction
• Information Retrieval
• Question Answering
• Bioinformatics and Medical Informatics
• Machine Translation
![Page 19: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/19.jpg)
WordNet 3.0
• A hierarchically organized lexical database
• On-line thesaurus + aspects of a dictionary
  • Some other languages available or under development
    • (Arabic, Finnish, German, Portuguese…)

| Category | Unique Strings |
|---|---|
| Noun | 117,798 |
| Verb | 11,529 |
| Adjective | 22,479 |
| Adverb | 4,481 |
![Page 20: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/20.jpg)
Senses of “bass” in WordNet
![Page 21: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/21.jpg)
How is “sense” defined in WordNet?
• The synset (synonym set), the set of near-synonyms, instantiates a sense or concept, with a gloss
• Example: chump as a noun with the gloss:
  “a person who is gullible and easy to take advantage of”
• This sense of “chump” is shared by 9 words:
  chump¹, fool², gull¹, mark⁹, patsy¹, fall guy¹, sucker¹, soft touch¹, mug²
• Each of these senses has this same gloss
  • (Not every sense; sense 2 of gull is the aquatic bird)
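One way to picture a synset is as a record holding a gloss and the word senses that share it. The sketch below is a toy data structure under that assumption; the `Synset` class, the `gloss_of` helper, and the short gloss used for the second sense of gull are hypothetical and do not reflect how WordNet stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A toy synset: one gloss shared by several (lemma, sense number) pairs."""
    gloss: str
    senses: tuple


# The chump synset, with the nine senses and the gloss quoted on the slide.
CHUMP = Synset(
    gloss="a person who is gullible and easy to take advantage of",
    senses=(("chump", 1), ("fool", 2), ("gull", 1), ("mark", 9), ("patsy", 1),
            ("fall guy", 1), ("sucker", 1), ("soft touch", 1), ("mug", 2)),
)

# Placeholder synset for the other sense of gull; the gloss text is a stand-in.
GULL_BIRD = Synset(gloss="aquatic bird", senses=(("gull", 2),))

SYNSETS = [CHUMP, GULL_BIRD]


def gloss_of(lemma, sense_number):
    """Look up the gloss of one specific sense of a lemma."""
    for synset in SYNSETS:
        if (lemma, sense_number) in synset.senses:
            return synset.gloss
    raise KeyError((lemma, sense_number))


print(gloss_of("gull", 1))   # same gloss as chump, sense 1
print(gloss_of("gull", 2))   # a different synset, so a different gloss
```

The point of the structure is that the gloss belongs to the synset, not to the word, so two senses of the same lemma can resolve to different glosses.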
![Page 22: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/22.jpg)
WordNet Hypernym Hierarchy for “bass”
![Page 23: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/23.jpg)
WordNet Noun Relations
![Page 24: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/24.jpg)
WordNet 3.0
• Where it is:
  • http://wordnetweb.princeton.edu/perl/webwn
• Libraries
  • Python: WordNet from NLTK
    • http://www.nltk.org/Home
  • Java:
    • JWNL, extJWNL on sourceforge
![Page 25: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/25.jpg)
MeSH: Medical Subject Headings thesaurus from the National Library of Medicine
• MeSH (Medical Subject Headings)
  • 177,000 entry terms that correspond to 26,142 biomedical “headings”
• Hemoglobins
  • Entry Terms (the synset): Eryhem, Ferrous Hemoglobin, Hemoglobin
  • Definition: The oxygen-carrying proteins of ERYTHROCYTES. They are found in all vertebrates and some invertebrates. The number of globin subunits in the hemoglobin quaternary structure differs between species. Structures range from monomeric to a variety of multimeric arrangements
![Page 26: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/26.jpg)
The MeSH Hierarchy
![Page 27: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/27.jpg)
Uses of the MeSH Ontology
• Provide synonyms (“entry terms”)
  • E.g., glucose and dextrose
• Provide hypernyms (from the hierarchy)
  • E.g., glucose ISA monosaccharide
• Indexing in MEDLINE/PubMed database
  • NLM’s bibliographic database:
    • 20 million journal articles
    • Each article hand-assigned 10-20 MeSH terms
![Page 28: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/28.jpg)
![Page 29: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/29.jpg)
Word Meaning and Similarity
Word Similarity: Thesaurus Methods
![Page 30: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/30.jpg)
Word Similarity
• Synonymy: a binary relation
  • Two words are either synonymous or not
• Similarity (or distance): a looser metric
  • Two words are more similar if they share more features of meaning
• Similarity is properly a relation between senses
  • The word “bank” is not similar to the word “slope”
  • Bank¹ is similar to fund³
  • Bank² is similar to slope⁵
• But we’ll compute similarity over both words and senses
![Page 31: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/31.jpg)
Why word similarity
• Information retrieval
• Question answering
• Machine translation
• Natural language generation
• Language modeling
• Automatic essay grading
• Plagiarism detection
• Document clustering
![Page 32: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/32.jpg)
Word similarity and word relatedness
• We often distinguish word similarity from word relatedness
  • Similar words: near-synonyms
  • Related words: can be related in any way
    • car, bicycle: similar
    • car, gasoline: related, not similar
![Page 33: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/33.jpg)
Two classes of similarity algorithms
• Thesaurus-based algorithms
  • Are words “nearby” in hypernym hierarchy?
  • Do words have similar glosses (definitions)?
• Distributional algorithms
  • Do words have similar distributional contexts?
![Page 34: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/34.jpg)
Path based similarity
• Two concepts (senses/synsets) are similar if they are near each other in the thesaurus hierarchy
  • = have a short path between them
  • concepts have path 1 to themselves
![Page 35: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/35.jpg)
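Refinements to path-based similarity
• $\mathrm{pathlen}(c_1,c_2)$ = 1 + number of edges in the shortest path in the hypernym graph between sense nodes $c_1$ and $c_2$
• $\mathrm{sim}_{\mathrm{path}}(c_1,c_2) = \dfrac{1}{\mathrm{pathlen}(c_1,c_2)}$
  • ranges from 0 to 1 (identity)
• $\mathrm{wordsim}(w_1,w_2) = \max\limits_{c_1 \in \mathrm{senses}(w_1),\, c_2 \in \mathrm{senses}(w_2)} \mathrm{sim}_{\mathrm{path}}(c_1,c_2)$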
![Page 36: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/36.jpg)
Example: path-based similarity
$\mathrm{sim}_{\mathrm{path}}(c_1,c_2) = 1/\mathrm{pathlen}(c_1,c_2)$
• simpath(nickel, coin) = 1/2 = .5
• simpath(fund, budget) = 1/2 = .5
• simpath(nickel, currency) = 1/4 = .25
• simpath(nickel, money) = 1/6 = .17
• simpath(coinage, Richter scale) = 1/6 = .17
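The path-based measure can be sketched with a breadth-first search over a small hypernym graph. The hierarchy fragment below is an assumption: the figure is not in the extracted text, so the links were chosen only so that the path lengths match the values in the example above. The helper names (`neighbours`, `pathlen`, `simpath`, `wordsim`) are illustrative, and `wordsim` takes explicit lists of senses because this toy graph has no word-to-sense table.

```python
from collections import deque

# Hypothetical hierarchy fragment (child -> hypernym), reconstructed so that
# the path lengths agree with the worked example; not taken from WordNet.
HYPERNYM = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "coinage",
    "coinage": "currency",
    "currency": "medium of exchange",
    "money": "medium of exchange",
    "fund": "money",
    "budget": "fund",
    "medium of exchange": "standard",
    "scale": "standard",
    "Richter scale": "scale",
}


def neighbours(node):
    """Nodes one edge away: the hypernym (if any) plus all direct hyponyms."""
    up = [HYPERNYM[node]] if node in HYPERNYM else []
    down = [child for child, parent in HYPERNYM.items() if parent == node]
    return up + down


def pathlen(c1, c2):
    """1 + number of edges on the shortest path between c1 and c2."""
    queue = deque([(c1, 0)])
    seen = {c1}
    while queue:
        node, edges = queue.popleft()
        if node == c2:
            return 1 + edges
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    raise ValueError(f"no path between {c1!r} and {c2!r}")


def simpath(c1, c2):
    """Path similarity: 1 / pathlen, so a concept scores 1 with itself."""
    return 1 / pathlen(c1, c2)


def wordsim(senses1, senses2):
    """Word similarity: the best simpath over all pairs of senses."""
    return max(simpath(a, b) for a in senses1 for b in senses2)


for pair in [("nickel", "coin"), ("fund", "budget"), ("nickel", "currency"),
             ("nickel", "money"), ("coinage", "Richter scale")]:
    print(pair, round(simpath(*pair), 2))
```

Breadth-first search is used because every edge counts the same, which is exactly the uniform-distance assumption of the basic measure.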
![Page 37: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/37.jpg)
Problem with basic path-based similarity
• Assumes each link represents a uniform distance
  • But nickel to money seems to us to be closer than nickel to standard
  • Nodes high in the hierarchy are very abstract
• We instead want a metric that
  • Represents the cost of each edge independently
  • Words connected only through abstract nodes
    • are less similar
![Page 38: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/38.jpg)
Information content similarity metrics
• Let’s define P(c) as:
  • The probability that a randomly selected word in a corpus is an instance of concept c
  • Formally: there is a distinct random variable, ranging over words, associated with each concept in the hierarchy
    • for a given concept, each observed noun is either
      • a member of that concept with probability P(c)
      • not a member of that concept with probability 1-P(c)
• All words are members of the root node (Entity)
  • P(root)=1
• The lower a node in hierarchy, the lower its probability

Resnik 1995. Using information content to evaluate semantic similarity in a taxonomy. IJCAI
![Page 39: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/39.jpg)
Information content similarity
• Train by counting in a corpus
  • Each instance of hill counts toward frequency of natural elevation, geological formation, entity, etc
  • Let words(c) be the set of all words that are children of node c
    • words(“geo-formation”) = {hill, ridge, grotto, coast, cave, shore, natural elevation}
    • words(“natural elevation”) = {hill, ridge}

$$P(c) = \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N}$$

[Figure: fragment of the hierarchy with the nodes entity, …, geological-formation, natural elevation, shore, cave, hill, ridge, grotto, coast]
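A small sketch of the counting step may help. Everything numeric here is made up: the counts, the corpus size `N`, and the extra `artifact` node are assumptions, and the child-to-parent links are only one plausible arrangement of the node names listed for the figure. One further assumption is flagged in the code: `prob` also counts occurrences of the concept's own word, so that leaf concepts such as hill receive non-zero probability.

```python
# Hypothetical child -> parent links for the nodes named in the figure.
PARENT = {
    "hill": "natural elevation",
    "ridge": "natural elevation",
    "natural elevation": "geological-formation",
    "grotto": "cave",
    "cave": "geological-formation",
    "coast": "shore",
    "shore": "geological-formation",
    "geological-formation": "entity",
    "artifact": "entity",  # extra node so that not everything is a formation
}

# Made-up corpus counts; N is the total number of counted word tokens.
COUNT = {"hill": 4, "ridge": 2, "natural elevation": 1, "grotto": 1,
         "cave": 3, "coast": 5, "shore": 4, "artifact": 20}
N = sum(COUNT.values())


def words(c):
    """All words below concept c in the hierarchy (its descendants)."""
    children = [w for w, parent in PARENT.items() if parent == c]
    found = set(children)
    for child in children:
        found |= words(child)
    return found


def prob(c):
    """P(c): share of corpus tokens that are instances of concept c.

    Assumption: tokens of c's own word are counted along with words(c).
    """
    total = COUNT.get(c, 0) + sum(COUNT.get(w, 0) for w in words(c))
    return total / N


print(sorted(words("natural elevation")))
print(prob("hill"), prob("natural elevation"), prob("geological-formation"))
print(prob("entity"))
```

With these toy numbers the probabilities grow on the way up the hierarchy and reach 1 at the root, matching the two properties stated for P(c).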
![Page 40: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/40.jpg)
Information content similarity
• WordNet hierarchy augmented with probabilities P(c)

D. Lin. 1998. An Information-Theoretic Definition of Similarity. ICML 1998
![Page 41: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/41.jpg)
Information content: definitions
• Information content: $IC(c) = -\log P(c)$
• Most informative subsumer (Lowest common subsumer)
  • $LCS(c_1,c_2)$ = The most informative (lowest) node in the hierarchy subsuming both $c_1$ and $c_2$
![Page 42: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/42.jpg)
Using information content for similarity: the Resnik method
• The similarity between two words is related to their common information
• The more two words have in common, the more similar they are
• Resnik: measure common information as:
  • The information content of the most informative (lowest) subsumer (MIS/LCS) of the two nodes
  • $\mathrm{sim}_{\mathrm{resnik}}(c_1,c_2) = -\log P(LCS(c_1,c_2))$

Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. IJCAI 1995.
Philip Resnik. 1999. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. JAIR 11, 95-130.
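The Resnik measure can be sketched as "find the common ancestors, keep the one with the highest information content, return that IC". The probabilities for hill, coast and geological-formation below are the ones used in the Lin worked example later in these slides, and P(entity) = 1 follows from P(root) = 1. The parent links and the use of the natural logarithm are assumptions made for this sketch, as are the helper names.

```python
import math

# Assumed child -> parent links; only the ancestor sets matter here.
PARENT = {
    "hill": "natural elevation",
    "natural elevation": "geological-formation",
    "coast": "shore",
    "shore": "geological-formation",
    "geological-formation": "entity",
}

# P(c) for the nodes this example needs (values from the Lin worked example).
P = {
    "hill": 0.0000189,
    "coast": 0.0000216,
    "geological-formation": 0.00176,
    "entity": 1.0,
}


def ancestors(c):
    """c itself followed by all of its hypernyms up to the root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def ic(c):
    """Information content IC(c) = -log P(c), natural log assumed."""
    return -math.log(P[c])


def lcs(c1, c2):
    """Most informative common subsumer of c1 and c2."""
    shared = set(ancestors(c2))
    common = [a for a in ancestors(c1) if a in shared]
    return max(common, key=ic)


def sim_resnik(c1, c2):
    """Resnik similarity: the IC of the most informative subsumer."""
    return ic(lcs(c1, c2))


print(lcs("hill", "coast"))
print(round(sim_resnik("hill", "coast"), 2))
```

Note that the score depends only on the subsumer, so any two concepts whose lowest common subsumer is geological-formation get the same Resnik similarity.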
![Page 43: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/43.jpg)
Dekang Lin method
• Intuition: Similarity between A and B is not just what they have in common
• The more differences between A and B, the less similar they are:
  • Commonality: the more A and B have in common, the more similar they are
  • Difference: the more differences between A and B, the less similar
• Commonality: IC(common(A,B))
• Difference: IC(description(A,B)) - IC(common(A,B))

Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. ICML
![Page 44: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/44.jpg)
Dekang Lin similarity theorem
• The similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are

$$\mathrm{sim}_{\mathrm{Lin}}(A,B) \propto \frac{IC(\mathrm{common}(A,B))}{IC(\mathrm{description}(A,B))}$$

• Lin (altering Resnik) defines IC(common(A,B)) as 2 x information of the LCS

$$\mathrm{sim}_{\mathrm{Lin}}(c_1,c_2) = \frac{2 \log P(LCS(c_1,c_2))}{\log P(c_1) + \log P(c_2)}$$
![Page 45: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/45.jpg)
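Lin similarity function

$$\mathrm{sim}_{\mathrm{Lin}}(c_1,c_2) = \frac{2 \log P(LCS(c_1,c_2))}{\log P(c_1) + \log P(c_2)}$$

$$\mathrm{sim}_{\mathrm{Lin}}(\text{hill},\text{coast}) = \frac{2 \log P(\text{geological-formation})}{\log P(\text{hill}) + \log P(\text{coast})} = \frac{2 \ln 0.00176}{\ln 0.0000189 + \ln 0.0000216} = .59$$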
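The arithmetic of the hill/coast example can be checked directly. The function below is a minimal sketch that takes the three probabilities as arguments; the name `sim_lin` and its argument order are choices made for this illustration.

```python
import math


def sim_lin(p_c1, p_c2, p_lcs):
    """Lin similarity from P(c1), P(c2) and P(LCS(c1, c2))."""
    return 2 * math.log(p_lcs) / (math.log(p_c1) + math.log(p_c2))


# Probabilities from the worked example: hill, coast, geological-formation.
value = sim_lin(0.0000189, 0.0000216, 0.00176)
print(round(value, 2))  # 0.59
```

Because the measure is a ratio of logarithms, the base of the logarithm cancels out, and a concept compared with itself scores exactly 1.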
![Page 46: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/46.jpg)
The (extended) Lesk Algorithm
• A thesaurus-based measure that looks at glosses
• Two concepts are similar if their glosses contain similar words
  • Drawing paper: paper that is specially prepared for use in drafting
  • Decal: the art of transferring designs from specially prepared paper to a wood or glass or metal surface
• For each n-word phrase that’s in both glosses
  • Add a score of n²
  • Paper and specially prepared for 1 + 2² = 5
• Compute overlap also for other relations
  • glosses of hypernyms and hyponyms
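The gloss-overlap score can be sketched as follows: repeatedly take the longest phrase shared by both glosses, add the square of its length, and remove it so it cannot be counted again. This greedy longest-first strategy, the whitespace tokenization, and the function names are assumptions of this sketch; the published extended Lesk measure also discards overlaps made up only of function words, which is not modelled here.

```python
def longest_common_phrase(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared word run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def overlap_score(gloss_a, gloss_b):
    """Sum of n squared over the shared n-word phrases, longest first."""
    a = gloss_a.lower().split()
    b = gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_phrase(a, b)
        if n == 0:
            return score
        score += n * n
        # Replace each matched phrase by a unique marker so it cannot match
        # again and cannot join its neighbours into a longer phrase.
        a[i:i + n] = [object()]
        b[j:j + n] = [object()]


drawing_paper = "paper that is specially prepared for use in drafting"
decal = ("the art of transferring designs from specially prepared paper "
         "to a wood or glass or metal surface")
print(overlap_score(drawing_paper, decal))  # 5
```

For the two glosses above, "specially prepared" contributes 2² = 4 and "paper" contributes 1, giving the total of 5 from the slide.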
![Page 47: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/47.jpg)
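Summary: thesaurus-based similarity

$$\mathrm{sim}_{\mathrm{path}}(c_1,c_2) = \frac{1}{\mathrm{pathlen}(c_1,c_2)}$$

$$\mathrm{sim}_{\mathrm{resnik}}(c_1,c_2) = -\log P(LCS(c_1,c_2))$$

$$\mathrm{sim}_{\mathrm{lin}}(c_1,c_2) = \frac{2 \log P(LCS(c_1,c_2))}{\log P(c_1) + \log P(c_2)}$$

$$\mathrm{sim}_{\mathrm{jiangconrath}}(c_1,c_2) = \frac{1}{2 \log P(LCS(c_1,c_2)) - \log P(c_1) - \log P(c_2)}$$

$$\mathrm{sim}_{\mathrm{eLesk}}(c_1,c_2) = \sum_{r,q \in \mathrm{RELS}} \mathrm{overlap}(\mathrm{gloss}(r(c_1)), \mathrm{gloss}(q(c_2)))$$

Note on the Jiang-Conrath line: the denominator is the distance $IC(c_1) + IC(c_2) - 2\,IC(LCS(c_1,c_2))$ written with $IC(c) = -\log P(c)$, which keeps the score positive.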
![Page 48: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/48.jpg)
Libraries for computing thesaurus-based similarity
• NLTK
  • http://nltk.github.com/api/nltk.corpus.reader.html?highlight=similarity-nltk.corpus.reader.WordNetCorpusReader.res_similarity
• WordNet::Similarity
  • http://wn-similarity.sourceforge.net/
  • Web-based interface:
    • http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi
![Page 49: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/49.jpg)
Evaluating similarity
• Extrinsic (task-based, end-to-end) Evaluation:
  • Question Answering
  • Spell Checking
  • Essay grading
• Intrinsic Evaluation:
  • Correlation between algorithm and human word similarity ratings
    • Wordsim353: 353 noun pairs rated 0-10. sim(plane,car)=5.77
  • Taking TOEFL multiple-choice vocabulary tests
    • Levied is closest in meaning to: imposed, believed, requested, correlated
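The intrinsic evaluation by correlation can be sketched with a rank correlation. The slide does not say which correlation is meant; Spearman's rank correlation is used here as one common choice, and that is an assumption. Apart from the 5.77 rating quoted for plane/car, all numbers below are made up for illustration, and the simple formula assumes no tied scores.

```python
def ranks(values):
    """Rank positions (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equally long, tie-free lists."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Human ratings on a 0-10 scale for four word pairs. Only 5.77 (plane, car)
# comes from the slide; the other three ratings are invented.
human = [5.77, 9.1, 1.2, 7.4]
# Invented scores from some similarity algorithm for the same four pairs.
model = [0.4, 0.9, 0.1, 0.3]

print(round(spearman(human, model), 2))  # 0.8
```

A rank correlation only asks whether the algorithm orders the pairs the way people do, so it does not matter that the two score scales differ.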
![Page 50: Word Meaning and Similarity - ecology lab•sim resnik (c 1,c 2) = -log P( LCS(c 1,c 2) ) Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy](https://reader033.vdocument.in/reader033/viewer/2022051806/5fff33effd8828617014a076/html5/thumbnails/50.jpg)