putting ontologies to work in nlpjohn.mccr.ae/papers/langonto2-invited-talk.pdf · putting...

41
Putting ontologies to work in NLP The lemon model and its future John P. McCrae — National University of Ireland, Galway

Upload: others

Post on 03-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Putting ontologies to work in NLP

The lemon model and its future

John P. McCrae — National University of Ireland, Galway

Page 2: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Introduction

• In natural language processing we are doing three mainthings

• Understanding natural language• Generating natural language• Tranformation (translation, summarization)

• These can be typed as:• NL→ Representation• Representation→ NL• NL→ NL

Putting ontologies to work in NLP Slide 2

Page 3: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Representation

• We can think of representations as falling into two largeclasses1. Symbolic representations2. Numeric representations

• For example: ‘‘John sent her a text’’1. sent(John, x ,m, SMS) u female(x) uMessage(m)2. (0.664, 0.059, 0.557, 0.906, 0.031)T

Putting ontologies to work in NLP Slide 3

Page 4: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Symbolic versus numeric representations

• Numeric representations are:• Easy-to-learn from plain text• Robust• General

• Symbolic representations are:• Easier to understand• Can make complex inferences• Fine-grained

Putting ontologies to work in NLP Slide 4

Page 5: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

What is an ontology?

• A natural language has a lexicon:• A set of words• That are combined with rules (syntax)

• A symbolic representation has an ontology:• An set of symbols• That are combined with rules (logic)

• What is the ontology of a numeric representation?

Putting ontologies to work in NLP Slide 5

Page 6: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Ontology-Lexica

• An ontology-lexicon is amodel that is both anontology and a lexicon

• Since 2009 we have beendeveloping lemon—TheLexicon Model forOntologies

• Now (May 2016!) releasedby the W3C OntologyLexicon Community Groupas a W3C Vocabulary

• https://www.w3.org/2016/05/ontolex/

Putting ontologies to work in NLP Slide 6

Page 7: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Resources for ontology-lexica

Putting ontologies to work in NLP Slide 7

Page 8: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Existing resources

Lexicon: Princeton WordNetSemantic Network: DBpediaOntology: SUMO

Putting ontologies to work in NLP Slide 8

Page 9: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Is WordNet an Ontology?

• Provides symbols

• Supports inference, e.g.,inverse/symmetricproperties

• No frame semantics:• WordNet can say

‘‘Canberra is a captialcity’’

• Cannot say ‘‘Canberra isthe capital of Australia’’

• Words are definedprimarily by text

Putting ontologies to work in NLP Slide 9

Page 10: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

WordNet and Word Sense Disambiguation

• The sequence of annotations is a formal representation• Canberra[i83245] is a capital_city[i82619]

• WordNet alone has proven useful for word sensedisambiguation (Personalized PageRank - Agirre andSoroa, 2009)

• Produces good performance about 50-70%

Putting ontologies to work in NLP Slide 10

Page 11: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

DBpedia

• Derived fromWikipedia,so very large

• Has an ‘‘ontology’’ in OWL• DBpedia can say:

• ‘‘Canberra is a capitalcity’’

• ‘‘Canberra is the capitalof Australia’’

• ‘‘Canberra is the secondlargest city in Australia’’

Putting ontologies to work in NLP Slide 11

Page 12: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

SUMO

• Suggested Upper MergedOntology

• Based on KIF language

• Has definitions in terms ofresults and consequents

(subclass EuropeanNationNation)

(=>(instance ?N EuropeanNation)(part ?N Europe))

Putting ontologies to work in NLP Slide 12

Page 13: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Comparison of these resources

Ontology Symbols Links Ave. Degree

Princeton WordNet 117,791 285,668 2.43DBpedia-OWL 3,955 4,154 1.05DBpedia (Infobox EN) 2,866,327 18,328,273 6.39SUMO c.25,000 c.80,000 3.2

Putting ontologies to work in NLP Slide 13

Page 14: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Does number of symbols matter?

1 2 5 10 20 50 100 200

0.0

0.2

0.4

0.6

0.8

1.0

Frequency

WS

D P

reci

sion

Yes, but exponentially less.

Putting ontologies to work in NLP Slide 14

Page 15: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Does number of symbols matter?

1 2 5 10 20 50 100 200

0.0

0.2

0.4

0.6

0.8

1.0

Frequency

WS

D P

reci

sion

Yes, but exponentially less.

Putting ontologies to work in NLP Slide 14

Page 16: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Does degree matter?

1 2 5 10 20 50 100 200

0.5

0.6

0.7

0.8

0.9

Degree

WS

D P

reci

sion

Yes, quite a lot!

Putting ontologies to work in NLP Slide 15

Page 17: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Does degree matter?

1 2 5 10 20 50 100 200

0.5

0.6

0.7

0.8

0.9

Degree

WS

D P

reci

sion

Yes, quite a lot!

Putting ontologies to work in NLP Slide 15

Page 18: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Is one of these resources the best?

• DBpedia is the biggest and densest

• Many basic concepts are missing, e.g., beautiful

• Other collaborative resources (Wiktionary) are of lowerdensity with structural issues

• Combining resources is another approach, e.g., BabelNet,UBY, etc.

Putting ontologies to work in NLP Slide 16

Page 19: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

The Lexical Gap

Putting ontologies to work in NLP Slide 17

Page 20: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

The Lexical Gap

• The primary issues with applying ontologies is the lexicalgap:1. We don’t know all the ways to express the concept in

languages2. We cannot easily map linguistic structures to formal

expressions3. These concepts are often insufficiently defined

Putting ontologies to work in NLP Slide 18

Page 21: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Lexical Gap 1: Synonym discovery

• Most approaches are based on textual similarity

• Recent models, such as word2vec are showing strongperformance on term similarity

• Maybe solved soon?

Putting ontologies to work in NLP Slide 19

Page 22: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Lexical Gap 2: Mapping

• Word meaning is not exact

• Arguments

• Lexical semantics is not always computable

Putting ontologies to work in NLP Slide 20

Page 23: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Systematic polysemy

• ‘‘I went to the school’’

• ‘‘He painted the school’’

• ‘‘The school announced major changes’’

Putting ontologies to work in NLP Slide 21

Page 24: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Linguistic Mapping

Putting ontologies to work in NLP Slide 22

Page 25: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Frames and Correspondence

• The verb ‘‘know’’ is meaningless by itself• ‘‘John knows Fahad’’

• Similarly foaf:knows is only used in a triple• insight:jmccrae foaf:knows cnr:fkhan

• It is necessary to state how these corresponds

Putting ontologies to work in NLP Slide 23

Page 26: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Frames and Correspondence

• Linguistically we define each word as having asubcategorization frame

• e.g., ‘‘X knows Y’’

• Each RDF property has two arguments• Subject• Object

• We need to state the correspondence of syntacticarguments and semantic arguments

Putting ontologies to work in NLP Slide 24

Page 27: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Frames and Correspondence

Putting ontologies to work in NLP Slide 25

Page 28: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Correspondence to Adjectives

• ‘‘Many beautiful linguistic theories fail decidely when itcomes to adjectives’’ (Bankston, 2003)

• Especially scalar adjectives, such as ‘‘high’’.• Scalar adjectives are:

• Context-sensitive• Fuzzy• Non-monotonic

Putting ontologies to work in NLP Slide 26

Page 29: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Putting ontologies to work in NLP Slide 27

Page 30: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Lexcial Gap 3: Defining concepts

• OWL is not a sufficient ontology model

• Interlinkage (graph density) is very important

• We do not need to capture every ‘shade’ of a sense• Minimum definition of a definition:

• Given only the machine-readable definition of a concept• It should be possible to uniquely distinguish this node

Putting ontologies to work in NLP Slide 28

Page 31: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Building resources

Putting ontologies to work in NLP Slide 29

Page 32: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Improving an existing resource - Princeton WordNet

• PWN 3.0 was released in 2006.• Not in PWN 3.0: netbook, social media, steampunk,

Sriracha, hoverboard, fanbase, binge watch, relatable, text(v), spoiler (new sense), trope (new synonym)

• PWN has a low degree

• PWN is only English

Putting ontologies to work in NLP Slide 30

Page 33: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Social Media WordNet

• We are working on extending PWN with neologisms

• Analyzing term frequency on Twitter relative to baselinecorpus

• Term types:• General• Novel: affluenza, unboxing• Vulgar: chaturbate• Abbreviation/Misspelling: finna, idk• Names/Non-Lexical: zayn, i love you

Putting ontologies to work in NLP Slide 31

Page 34: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

The Princeton WordNet gloss corpus

• The adjective ‘Slovenian’ has one link (pertains to‘Slovenia’)

• But the definition is more detailed and has been tagged:• of or relating to or characteristic of Slovenia or its people

or language.

• Could we improve the density of WordNet this way?

Putting ontologies to work in NLP Slide 32

Page 35: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Multilingual WordNets

• WordNets have been translated into many languages

• Not always easy to translate, e.g., ‘teacher’

Lehrer A (male) teacherLehrerin A female teacher

• New languages introduce new concepts

Putting ontologies to work in NLP Slide 33

Page 36: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

The WordNet Interlingual Index

• Each synset now has a Interlingual Identifier• http://globalwordnet.org/ili/i16907

• Any WordNet can propose a new synset:• English definition• At least one link

Putting ontologies to work in NLP Slide 34

Page 37: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Building a new resource - Lemon Design Patterns

• Many entries have common descriptions• Name• Class Nouns• Object Property Noun• Relational Nouns• State Verbs• Consequence Verbs• Intersective Adjectives• Relational Adjectives• Scalar Adjectives• . . .

Putting ontologies to work in NLP Slide 35

Page 38: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Lemon Design Patterns

ScalarAdjective("hoch",[ ontology:elevation > 50 for

ontology:Building ]) with comparative "höher"

Putting ontologies to work in NLP Slide 36

Page 39: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Lemon DBpedia

• For 4 Languages: English, German, Spanish, Japanese

• Covers 353 classes and 300 properties

• Finding usage in question answering, ontologyengineering

Putting ontologies to work in NLP Slide 37

Page 40: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Summary

Putting ontologies to work in NLP Slide 38

Page 41: Putting ontologies to work in NLPjohn.mccr.ae/papers/LangOnto2-invited-talk.pdf · Putting ontologies to work in NLP Slide 5. Ontology-Lexica An ontology-lexicon is a model that is

Summary

• Ontologies are still a relevant target for natural languageunderstanding

• Detail is more important than coverage• More semantics• Lexical-ontological mapping• More models like OntoLex-Lemon

Putting ontologies to work in NLP Slide 39