YAGO2: A spatially and temporally enhanced knowledge base...
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia (J. Hoffart et al., 2012)
presented by Gabe Radovsky
YAGO2: overview
• temporally/spatially anchored knowledge base
• built automatically from Wikipedia, GeoNames, and WordNet
• contains 447 million facts about 9.8 million entities
• human evaluators judged 97.8% of facts correct
The (original) YAGO knowledge base
• introduced in 2007
• automatically constructed from Wikipedia
  – each article in Wikipedia became an entity
• about 100 manually defined relations
  – e.g. wasBornOnDate, locatedIn, hasPopulation
• used SPO (subject, predicate, object) triples to represent facts
  – reification: every fact given an identifier, e.g. wasFoundIn(fact, Wikipedia)
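A minimal sketch of reified SPO triples, assuming a simple in-memory store. The class name `FactStore` and the `#n` identifier scheme are illustrative assumptions, not YAGO's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class FactStore:
    """Toy store in which every fact receives an identifier (reification)."""
    facts: Dict[str, Triple] = field(default_factory=dict)

    def add(self, subject: str, predicate: str, obj: str) -> str:
        fact_id = f"#{len(self.facts) + 1}"  # hypothetical id scheme
        self.facts[fact_id] = (subject, predicate, obj)
        return fact_id


store = FactStore()
born = store.add("LeonardCohen", "wasBornOnDate", "1934-09-21")
# Because the fact has an identifier, it can be the subject of another fact.
source = store.add(born, "wasFoundIn", "Wikipedia")
```

Giving each fact an identifier is what lets provenance (and, later, time and location) be attached to the fact itself rather than to the entity.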
YAGO2: motivation
• WordNet/other lexical resources:
  – manually compiled
  – knows that ‘musician’ is a hyponym of ‘human’; doesn’t know that Leonard Cohen is a musician
• Wikipedia/GeoNames
  – very large collections of (semi-)structured data
  – advances in information extraction make them easier to mine
YAGO2: motivation
• “...current state-of-the-art knowledge bases are mostly blind to the temporal dimension” (29).
• e.g. (knowing that Abraham Lincoln was born in 1809 and died in 1865) != (knowing that Abraham Lincoln was alive in 1850)
www.wolframalpha.com
www.wolframalpha.com
YAGO2: contribution
• top-down ontology “with the goal of integrating entity-relationship-oriented facts with the spatial and temporal dimensions” (29).
• new representation model: SPOTL tuples
  – (SPO [subject, predicate, object] + time + location)
• frameworks for extracting knowledge from structured or unstructured text
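One way to picture a SPOTL tuple is as an SPO triple with two optional extra fields. This is a sketch only: the field names, the string encoding of the time, and the choice of Memphis as the location of the example fact are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class SPOTL(NamedTuple):
    subject: str
    predicate: str
    obj: str
    time: Optional[str] = None      # e.g. "1977-08-16"; None if unknown
    location: Optional[str] = None  # name of a geo-entity; None if unknown


# A fact with both extra dimensions filled in (location assumed here).
fact = SPOTL("ElvisPresley", "diedIn", "Memphis",
             time="1977-08-16", location="Memphis")

# A plain SPO fact is still valid: time and location simply stay empty.
spo_only = SPOTL("LeonardCohen", "type", "musician")
```

Keeping time and location optional matches the point made on later slides that not every fact can be anchored.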
Extraction architecture for YAGO2
• factual rules
  – “declarative translations of all the manually defined exceptions and facts that the previous YAGO code contained” (30)
• implication rules
  – e.g. if relation b is a sub-property of relation a, all instances of b are also instances of a
• replacement rules
  – “\{\{USA\}\}” replace “[[United States]]”
  – eliminate Wikipedia administrative categories, e.g. “Articles to be cleaned up”
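The replacement and implication rules above can be sketched roughly as follows. This is not the YAGO2 extractor: the relation names `bornInCity` and `bornInPlace` are hypothetical, and rules are held in plain Python structures rather than in YAGO's declarative rule format.

```python
import re

# Replacement rules: (pattern, replacement) pairs applied to raw article text.
REPLACEMENTS = [
    (re.compile(r"\{\{USA\}\}"), "[[United States]]"),
]

# Hypothetical sub-property declarations: child relation -> parent relation.
SUB_PROPERTY_OF = {"bornInCity": "bornInPlace"}


def apply_replacements(text):
    """Rewrite the text with every replacement rule, in order."""
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


def apply_implications(facts):
    """Add (s, parent, o) for every (s, child, o) until nothing new appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(derived):
            parent = SUB_PROPERTY_OF.get(p)
            if parent and (s, parent, o) not in derived:
                derived.add((s, parent, o))
                changed = True
    return derived
```

The loop runs to a fixed point so that chains of sub-properties (b under a, a under c) are also followed.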
Giving YAGO a temporal dimension
• YYYY-MM-DD format for dates
  – YYYY-##-## if only year is known
• entities
  – given a time span
• facts
  – time point for instantaneous events, time span for events with extended duration
• not all entities/facts could be temporally annotated
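A wildcard date such as `YYYY-##-##` can be read as the interval of calendar days it is compatible with. The sketch below assumes that reading; the function name `to_interval` and the handling of an unknown day with a known month are illustrative additions, since the slides only mention the year-only case.

```python
import calendar
import re
from datetime import date

DATE_RE = re.compile(r"^(\d{4})-(\d{2}|##)-(\d{2}|##)$")


def to_interval(yago_date):
    """Return (earliest, latest) calendar dates compatible with the string."""
    match = DATE_RE.match(yago_date)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD style date: {yago_date!r}")
    year, month, day = match.groups()
    y = int(year)
    if month == "##":
        # Only the year is known: the whole year is possible.
        return date(y, 1, 1), date(y, 12, 31)
    m = int(month)
    if day == "##":
        # Year and month known: the whole month is possible.
        last_day = calendar.monthrange(y, m)[1]
        return date(y, m, 1), date(y, m, last_day)
    exact = date(y, m, int(day))
    return exact, exact
```

With this reading a time point is just an interval whose two ends coincide, so points and spans can be compared in the same way.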
Entities and time
• people
  – wasBornOnDate, diedOnDate
• groups, artifacts
  – wasCreatedOnDate, wasDestroyedOnDate
  – some have unbounded end points, e.g. pieces of music, scientific theories
• events
  – startedOnDate, endedOnDate, happenedOnDate (for punctual events)
• entities w/o defined start or end point
  – e.g. numbers, mythological figures, virus strains
  – not assigned temporal information
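The relations listed above are enough to derive an existence span for an entity, which is what answers a question like "was Abraham Lincoln alive in 1850?". The sketch below assumes facts are plain (subject, predicate, object) tuples with date strings; the function names are illustrative, and treating a missing end date as "still existing" is a simplification.

```python
START_RELATIONS = {"wasBornOnDate", "wasCreatedOnDate", "startedOnDate", "happenedOnDate"}
END_RELATIONS = {"diedOnDate", "wasDestroyedOnDate", "endedOnDate", "happenedOnDate"}

EXAMPLE_FACTS = [
    ("AbrahamLincoln", "wasBornOnDate", "1809-02-12"),
    ("AbrahamLincoln", "diedOnDate", "1865-04-15"),
]


def existence_span(entity, facts):
    """Return (start, end) date strings; None marks a missing or open bound."""
    start = end = None
    for s, p, o in facts:
        if s != entity:
            continue
        if p in START_RELATIONS:
            start = o
        if p in END_RELATIONS:
            end = o
    return start, end


def existed_in_year(entity, year, facts):
    """True/False if the span decides the question, None if there is no span."""
    start, end = existence_span(entity, facts)
    if start is None:
        return None  # no temporal information, so no answer
    start_year = int(start[:4])
    end_year = int(end[:4]) if end else None
    return start_year <= year and (end_year is None or year <= end_year)


alive_1850 = existed_in_year("AbrahamLincoln", 1850, EXAMPLE_FACTS)
```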
Facts and time
• facts with an extracted time
  – ElvisPresley diedOnDate 1977-08-16
• facts with a deduced time
  – ([ElvisPresley diedIn Memphis] 1977-08-16)
• extraction time of facts is also included
  – e.g. extractedFrom Wikipedia on YYYY-MM-DD
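A deduced time can be sketched as a lookup: a fact such as `diedIn` borrows its time from the date fact about the same subject. The `TIME_SOURCE` mapping below is a hypothetical illustration of that idea, not the rule set used by YAGO2.

```python
# Hypothetical mapping: relation -> date relation on the same subject that dates it.
TIME_SOURCE = {"diedIn": "diedOnDate", "wasBornIn": "wasBornOnDate"}

ELVIS_FACTS = [
    ("ElvisPresley", "diedOnDate", "1977-08-16"),
    ("ElvisPresley", "diedIn", "Memphis"),
]


def deduce_time(fact, facts):
    """Return the date that anchors `fact`, or None if it cannot be deduced."""
    subject, predicate, _ = fact
    date_relation = TIME_SOURCE.get(predicate)
    if date_relation is None:
        return None
    for s, p, o in facts:
        if s == subject and p == date_relation:
            return o
    return None


deduced = deduce_time(("ElvisPresley", "diedIn", "Memphis"), ELVIS_FACTS)
```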
Giving YAGO a spatial dimension
• YAGO2 “concerned with entities that have a permanent spatial extent on Earth” (34)
  – e.g. countries, cities, mountains, rivers
  – original YAGO, WordNet have no geographical super-class
• new class: yagoGeoEntity
  – type yagoGeoCoordinates stores latitude/longitude pair
• only coordinates, no polygons
  – city center, not exhaustive boundaries
Harvesting geo-entities
• harvested from Wikipedia and GeoNames
• assigned only one class
  – Berlin = “capital of a political entity”
• hierarchical
  – Berlin is located in Germany is located in Europe
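The located-in hierarchy can be pictured as a chain of parent links over geo-entities that each carry a single coordinate pair. In this sketch the dictionaries and the function name `containers` are illustrative, and the Berlin coordinates are approximate values for the city center.

```python
from typing import Dict, List, Tuple

# One latitude/longitude pair per geo-entity (approximate, for illustration).
COORDINATES: Dict[str, Tuple[float, float]] = {"Berlin": (52.52, 13.40)}

# Each geo-entity points at the geo-entity that directly contains it.
LOCATED_IN: Dict[str, str] = {"Berlin": "Germany", "Germany": "Europe"}


def containers(entity: str) -> List[str]:
    """Follow located-in links upward and return every enclosing geo-entity."""
    chain: List[str] = []
    seen = {entity}
    current = LOCATED_IN.get(entity)
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = LOCATED_IN.get(current)
    return chain


berlin_containers = containers("Berlin")
```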
Assigning a location
• given to both entities and facts when “ontologically reasonable” (36)
• locations are themselves geo-entities
Entities and location
• events
  – if specific location, e.g. battles and sports competitions
  – happenedIn relation
• groups
  – company headquarters, university campus
  – isLocatedIn relation
• artifacts
  – Mona Lisa in the Louvre
  – isLocatedIn relation
(Con-)textual data in YAGO2
• non-ontological information from Wikipedia (take strings as arguments)
  – hasWikipediaAnchorText (visible text in hyperlink)
  – hasWikipediaCategory
  – hasCitationTitle (from references list)
• multilingual information
  – extracted from inter-language links in articles
  – e.g. [BattleAtWaterloo isCalled SchlachtBeiWaterloo] with associated fact [inLanguage German]
YAGO2: evaluation
• formed one pool for each relation
  – e.g. wasBornOnDate, hasGDP
  – randomly selected test data from each pool
• used 26 human judges
  – judge presented with fact, along with original Wikipedia article to assess its accuracy
    • accuracy of Wikipedia not assessed
  – continued evaluating each pool until confidence interval was smaller than ±5%, to assure statistical significance
• 97.8% of facts were judged correct
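The stopping rule for each pool can be sketched as "keep judging until the interval half-width drops below 0.05". The slides only state the ±5% target, so the Wilson score interval at 95% confidence (z = 1.96) used here is an assumption, as are the function names.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval (center, half_width) for a binomial proportion."""
    if total == 0:
        raise ValueError("need at least one judged fact")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center, half


def judge_until_narrow(judgements, max_half_width=0.05):
    """Consume True/False judgements until the interval is narrow enough."""
    correct = total = 0
    for is_correct in judgements:
        total += 1
        correct += bool(is_correct)
        center, half = wilson_interval(correct, total)
        if half < max_half_width:
            return center, half, total
    return None  # ran out of judgements before reaching the target width
```

Pools whose sampled facts are almost all correct reach the target width after far fewer judgements than pools with mixed results, so a rule like this concentrates judging effort on the noisier relations.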
YAGO2: evaluation
p. 42
Task-based evaluation: Jeopardy
p. 46
Task-based evaluation: Jeopardy
p. 55
Task-based evaluation: Jeopardy
p. 56
Our project
• were originally planning to attempt a hierarchical ontology based on Wikipedia
• new project: hierarchical classification of social science journal articles
• mine text of articles with Python NLTK
  – plain-text n-grams for n = 1 to 5
  – stemmed/POS-tagged unigrams
  – possibly named entities
• run different clustering algorithms on entities (article titles with features mined from text)
• attempt to automatically generate reasonable names for clusters
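As a rough illustration of the n-gram features mentioned above, the sketch below counts word n-grams for n = 1 to 5 in plain Python. It stands in for the NLTK-based pipeline rather than reproducing it: the regex tokenizer and the function names are assumptions, and stemming, POS tagging and named-entity extraction are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, n_values=range(1, 6)):
    """Count every word n-gram in `text` for each n in `n_values`."""
    tokens = tokenize(text)
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = ngram_counts("Social science journal articles")
```

Each article's counter can then be turned into a feature vector for whichever clustering algorithm is being tried.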