ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI)

Roberto Navigli

http://lcl.uniroma1.it

BabelNet, Babelfy and Beyond: AI*IA 2014 Tutorial, 12th December 2014

Roberto Navigli

[email protected]

ERC Starting Grant MultiJEDI No. 259234


Tutorial Outline

1. Foundations in Semantic Processing

2. BabelNet: the largest multilingual semantic resource

3. Babelfy: Multilingual Word Sense Disambiguation and Entity Linking

4. Beyond: what comes next?

BabelNet, Babelfy and Beyond – AI*IA 2014 Tutorial

Roberto Navigli


Projects thanks to which this tutorial exists

MultiJEDI (1.3M euros): ERC Starting Grant

LIDER (1.5M euros): EU CSA

Google Focused Research Award (200k$)

Also starring

Simone Ponzetto

Tiziano Flati

David Jurgens

Andrea Moro

Daniele Vannella

Taher Pilehvar

Francesco Cecconi

Part 1: Foundations

Understanding a simple phrase

Barack Obama peruses the internet.

Natural language is ambiguous

Listen to some rock!

Multilingual Semantic Processing with BabelNet – LREC 2014 Tutorial

Roberto Navigli and David Jurgens

I cannot hear anything…

BabelNet goes to the Multilingual Semantic Web. Roberto Navigli and David Jurgens.

The Multilingual, Big-Picture Goal

[Figure: “Underground rock concert” / “언더그라운드락콘서트” and “Underground rock formation” / “지하암석” → Black Box → [semantic representation] → NLP Applications]

The General Problem

POLYSEMY

• The most frequent words have several meanings!

• Our job: model meaning from a computational perspective

Monosemous vs. Polysemous words

• Monosemous words have only one meaning

– Examples: plant life, internet

• Polysemous words have more than one meaning

– Example: bar

– “a room or establishment where alcoholic drinks are served”

– “a counter where you can obtain food or drink”

– “a rigid piece of metal or wood”

– “musical notation for a repeating pattern of musical beats”


How do we represent and encode semantics?

[Figure: “Underground rock concert” / “언더그라운드락콘서트” and “Underground rock formation” / “지하암석” → Black Box → [semantic representation] → NLP Applications]

What comes out of the black box?

How do we represent and encode semantics?

• Thesauri

– Group words according to similar meaning

– Relations between groups (e.g., narrower meanings)

– Roget’s Thesaurus (1911)

• Machine Readable Dictionaries

– Enumerate all meanings of a word

– Include definitions, morphology, example usages, etc.

– Oxford Dictionary of English, LDOCE, Collins, etc.

• Computational Lexicons

– Repositories of structured knowledge about a word's semantics and syntax

– Include relations like hypernymy, meronymy, or entailment

– WordNet


Senses and Relations in WordNet

• Each meaning is encoded as a synset (synonym set), which is a collection of synonymous senses

• Semantic relations between synsets

– Hypernymy (car#n#1 is-a motor vehicle#n#1)

– Meronymy (car#n#1 has-a car door#n#1)

– Entailment, similarity, attribute, etc.

• Lexical relations between word senses

– Antonymy (good#a#1 antonym of bad#a#1)

– Pertainymy (dental#a#1 pertains to tooth#n#1)

– Nominalization (service#n#2 nominalizes serve#v#4)


[Figure: excerpt of the WordNet graph. Nodes are concepts (synsets); edges are semantic relations labeled is-a or has-part.]

Synsets shown: {wheeled vehicle}, {self-propelled vehicle}, {motor vehicle}, {tractor}, {car, auto, automobile, machine, motorcar}, {convertible}, {air bag}, {golf cart, golfcart}, {wagon, waggon}, {accelerator, accelerator pedal, gas pedal, throttle}, {car window}, {locomotive, engine, locomotive engine, railway locomotive}, {brake}, {wheel}, {splasher}

WordNet [Miller et al., 1990; Fellbaum, 1998]
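
A minimal Python sketch of how such a synset graph can be traversed. The edge assignments below are one reading of the figure (the transcript lost the edge layout), so treat them as illustrative rather than as a dump of WordNet; the names `IS_A`, `HAS_PART`, `hypernym_chain` and `inherited_parts` are hypothetical.

```python
# Illustrative edges only: the figure's layout was lost in extraction,
# so this is an assumed reading, keyed by one lemma per synset.
IS_A = {
    "convertible": "car",
    "car": "motor vehicle",
    "golf cart": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "tractor": "self-propelled vehicle",
    "locomotive": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wagon": "wheeled vehicle",
}
HAS_PART = {
    "car": ["air bag", "accelerator", "car window"],
    "wheeled vehicle": ["brake", "wheel", "splasher"],
}


def hypernym_chain(concept):
    """Follow is-a edges upward and return the ancestors, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def inherited_parts(concept):
    """Collect has-part targets of the concept and of all its hypernyms."""
    parts = list(HAS_PART.get(concept, []))
    for ancestor in hypernym_chain(concept):
        parts.extend(HAS_PART.get(ancestor, []))
    return parts
```

For instance, `hypernym_chain("convertible")` walks up to `wheeled vehicle`, and `inherited_parts("convertible")` picks up parts attached to both `car` and `wheeled vehicle`.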


Wordnets in other Languages

• EuroWordNet (Vossen, 1998)

• BalkaNet (Tufis et al., 2004)

• Multilingual Central Repository (Atserias et al., 2003)

• GermaNet (Hamp and Feldweg, 1997)

• SloWNet (Fišer and Sagot, 2008)

• WOLF (Sagot and Fišer, 2008)

• Hungarian WN (Miháltz et al., 2008)

• Japanese WN (Isahara et al., 2008)

• …

• Currently 73 unique wordnets: http://globalwordnet.org/wordnets-in-the-world/

[Figure labels: WordNet, MultiWordNet, WOLF, MCR, GermaNet, BalkaNet]

An ideal resource for Multilingual Semantic Processing

• Capable of representing the meaning of a piece of text as word senses in any language

– broad coverage of different senses, including language-specific senses

– currently problematic for many language-specific wordnets

• Encodes semantic and syntactic relationships between the synsets

– Highly beneficial for NLP applications

• Encodes definitions and usages for synsets

Motivations at the intersection of NLP

• Question Answering

• Semantic Information Retrieval

• Cross-lingual Document Retrieval

• Semantically-enhanced Machine Translation

• Computer-assisted translation

• Language learning/teaching

• Linguistically-grounded Multilingual Knowledge Representation

• Semantic annotation

• (Linguistic) Linked Open Data

• Computer vision: Vision & Language

Part 2a: Making Multilingual Knowledge

Objective and motivation

Goal:

• A large repository of knowledge in a multilingual setting

Motivations:

• A common ground for language technologies that brings together:

– Multilinguality

– Encyclopedic knowledge

– Lexicographic knowledge

– Semantic relations

– Textual definitions

– Domain information

– …

How many meanings for «balloon»?

[Figure: the senses of “balloon” in WordNet and in Wikipedia]

Core Challenges

1. Integrating and unifying heterogeneous resources

2. Managing many different languages

3. Having a wide range of semantic relations between concepts and named entities

4. Maintaining high accuracy

This is where the ERC (and our project) comes into play

A 5-year ERC Starting Grant (2011-2016) on Multilingual Word Sense Disambiguation

Key Objective 1: create knowledge for all languages

Multilingual Joint Word Sense Disambiguation (MultiJEDI)

[Figure labels: WordNet, MultiWordNet, WOLF, MCR, GermaNet, BalkaNet]

Goal: Creating a Multilingual Semantic Network

Start from two large complementary resources:

WordNet: full-fledged taxonomy

Wikipedia: multilingual and continuously updated

[Figure: excerpt of the WordNet graph around {car, auto, automobile, machine, motorcar}, with is-a and has-part edges]

Get the best from both worlds

[Figure: the WordNet graph excerpt around {car, auto, automobile, machine, motorcar}; nodes are concepts, edges are semantic relations]

WordNet [Miller et al., 1990; Fellbaum, 1998]

Playing with senses

[Figure: Wikipedia pages (filled with placeholder text) as concepts, connected by (unspecified) semantic relations]

Wikipedia [The Web Community, 2001-today]

BabelNet: concepts and semantic relations (1)

Concepts and relations in BabelNet are harvested from WordNet and Wikipedia:

WordNet → BabelNet:

• synsets → concepts

• lexico-semantic relations → semantic relations

Wikipedia → BabelNet:

• pages → concepts

• hyperlinks → semantic relations

An example of mapping

Creation of the Wikipedia disambiguation contexts

The context of a Wikipage starts empty and is filled from three sources in turn:

ctx(Balloon (aircraft)) = { }

• sense label: ctx(Balloon (aircraft)) = { aircraft }

• hyperlinks: ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola }

• categories: ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola, ballooning, hydrogen, aeronautics }
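
The construction of the disambiguation context can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the page's hyperlinks and categories are passed in as plain lists (no Wikipedia dump is parsed), and the function names `sense_label` and `disambiguation_context` are hypothetical.

```python
import re


def sense_label(title):
    """Return the parenthetical label of a page title, e.g. 'aircraft', or None."""
    match = re.search(r"\(([^)]+)\)\s*$", title)
    return match.group(1).lower() if match else None


def disambiguation_context(title, hyperlinks, categories):
    """Union of the sense label, the outgoing hyperlinks and the page categories."""
    context = set()
    label = sense_label(title)
    if label:
        context.add(label)
    context.update(link.lower() for link in hyperlinks)
    context.update(category.lower() for category in categories)
    return context


# Inputs copied from the slide example; the elided items ("…") are simply left out.
ctx = disambiguation_context(
    "Balloon (aircraft)",
    hyperlinks=["aerostat", "buoyancy", "airship", "gondola"],
    categories=["ballooning", "hydrogen", "aeronautics"],
)
```

A set is used because the context is treated as a bag of distinct lemmas; a term that occurs both as a hyperlink and as a category is counted once.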

Building BabelNet: Mapping Wikipedia to WordNet

Given a Wikipage w and its disambiguation context ctx(w):

For each WordNet sense s of w, calculate score(s, w) as follows:

The Wikipedia page context in the WordNet graph

ctx(Balloon (aircraft)) = { aircraft, aerostat, buoyancy, airship, …, gondola }

Candidate sense: balloon#n#1

Context senses located in the graph: airship#n#1, aerostat#n#1, aircraft#n#1, buoyancy#n#1, gondola#n#1

Paths from balloon#n#1 to the context senses:

balloon#n#1 -> aircraft#n#1

balloon#n#1 -> aircraft#n#1 -> airship#n#1

balloon#n#1 -> gondola#n#1

balloon#n#1 -> gondola#n#1 -> flight#n#1 -> buoyancy#n#1

balloon#n#1 -> aerostat#n#1

Value shown on the final slide: 0.35
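
The scoring formula itself did not survive in the transcript. The Python sketch below therefore assumes one plausible path-based score: each path from the candidate sense to a context sense contributes e^{-(length-1)}, where length is the number of edges. This weighting is an assumption for illustration, and the raw total is not normalized, so it is not expected to reproduce the 0.35 shown on the slide.

```python
import math

# The five paths listed on the slide, as sequences of WordNet senses.
PATHS = [
    ["balloon#n#1", "aircraft#n#1"],
    ["balloon#n#1", "aircraft#n#1", "airship#n#1"],
    ["balloon#n#1", "gondola#n#1"],
    ["balloon#n#1", "gondola#n#1", "flight#n#1", "buoyancy#n#1"],
    ["balloon#n#1", "aerostat#n#1"],
]


def path_weight(path):
    """Assumed weighting: e^{-(edges - 1)}, so direct neighbours count 1.0."""
    edges = len(path) - 1
    return math.exp(-(edges - 1))


def raw_score(paths):
    """Sum of the path weights; longer paths contribute exponentially less."""
    return sum(path_weight(path) for path in paths)


score = raw_score(PATHS)
```

Under this assumption the three one-edge paths dominate the total, which matches the intuition that a sense directly connected to many context words is the better mapping candidate.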


BabelNet: concepts and semantic relations (2)

We encode knowledge as a labeled directed graph:

• Each vertex is a Babel synset, e.g. { balloon (EN), Ballon (DE), aerostato (ES), aerostato (IT), pallone aerostatico (IT), mongolfière (FR) }

• Each edge is a semantic relation between synsets:

– is-a (balloon is-a aircraft)

– part-of (gasbag part-of balloon)

– instance-of (Einstein instance-of physicist)

– unspecified/relatedness (balloon related-to flight)
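
As a rough illustration of this data model (not the BabelNet API), a labeled directed graph with multilingual vertices can be sketched in Python as follows. The class names `BabelSynset` and `SemanticNetwork` and the string identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BabelSynset:
    synset_id: str                                # hypothetical id, not a real BabelNet id
    senses: list = field(default_factory=list)    # (language, lemma) pairs

    def lemmas(self, language):
        """All lexicalizations of this concept in one language."""
        return [lemma for lang, lemma in self.senses if lang == language]


@dataclass
class SemanticNetwork:
    edges: list = field(default_factory=list)     # (source_id, relation, target_id)

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def targets(self, source, relation):
        """Targets reachable from source through edges with the given label."""
        return [t for s, r, t in self.edges if s == source and r == relation]


balloon = BabelSynset("balloon", [
    ("EN", "balloon"), ("DE", "Ballon"), ("ES", "aerostato"),
    ("IT", "aerostato"), ("IT", "pallone aerostatico"), ("FR", "mongolfière"),
])

network = SemanticNetwork()
network.add_edge("balloon", "is-a", "aircraft")
network.add_edge("gasbag", "part-of", "balloon")
network.add_edge("Einstein", "instance-of", "physicist")
network.add_edge("balloon", "related-to", "flight")
```

Keeping the lexicalizations on the vertex and the relation label on the edge mirrors the two things the slide distinguishes: what a concept is called in each language, and how concepts relate to one another.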


Building BabelNet: Translating Babel synsets

1. Exploiting Wikipedia interlanguage links

[Figure: interlanguage links from the page Balloon (aircraft) to “pallone aerostatico”, “globo aerostático” and “Ballon”]

2. Filling the lexical translation gaps using a Machine Translation system to translate the English lexicalizations of a concept

English source (Wikipedia markup): On August 27, 1783 in Paris, Franklin witnessed the world's first hydrogen [[Balloon (aircraft)|balloon]] flight.

French output (Google Translate): Le 27 Août, 1783 à Paris, Franklin vu le premier vol en ballon d'hydrogène.

For each word sense s, we translate:

• sentences from SemCor (a corpus annotated with WordNet senses) which contain s

• sentences from Wikipedia linked to the Wikipage of s

The most frequent translation of s is selected for each target language
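
The selection step can be sketched in Python as a simple majority vote over the translations observed for one sense. The toy data is modeled on the wikification example used in this deck; the function name `most_frequent_translation` is hypothetical, and real input would come from aligning machine-translated sentences.

```python
from collections import Counter


def most_frequent_translation(translations):
    """Return the translation observed most often, or None if there are none."""
    counts = Counter(t.lower() for t in translations)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# How the term "wikification" was rendered in six translated Italian contexts.
observed_it = [
    "wikificazione", "wikification", "wikificazione",
    "wikificazione", "wikification", "wikificazione",
]
best = most_frequent_translation(observed_it)
```

Voting over many sentences makes the choice robust to individual translation errors, such as contexts where the English term was left untranslated.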


The most frequent translation of a word in a given meaning

left context | term | right context

| wikification | may refer to: the…

geoinformatics services' and ' | wikification | of GIS by the masses'

the process may be called | wikification | (as in ...

which is then called " | wikification | and to the related problem

reason needs copyediting, | wikification | , reduction of POV, work on references

huge amount of cleanup, | wikification | , etc. Version of 12 Nov

left context | term | right context

| wikificazione | potrebbe riferirsi a: il…

servizi geoinformatici' e ' | wikification | di GIS dalle masse'

il processo chiamato | wikificazione | (come in ...

che è quindi chiamato | wikificazione | e al problema correlato…

ragione richiede copyediting, | wikification | , riduzione di POV, lavoro su reference

grandi quantità di pulizia, | wikificazione | , ecc. Versione del 12 Novembre


BabelNet [Navigli and Ponzetto, AIJ 2012]

A wide-coverage multilingual semantic network including both encyclopedic (from Wikipedia) and lexicographic (from WordNet) entries

• Concepts from WordNet

• NEs and specialized concepts from Wikipedia

• Concepts integrated from both resources

Integrating WordNet with Wikipedia…

Is that all?!?


Open Multilingual WordNet [Bond and Foster, 2013]

• http://compling.hss.ntu.edu.sg/omw/

• 22 languages

• Mappings to the Princeton WordNet synsets

• More than 600,000 lexicalizations

Francis Bond and Kyonghee Paik. 2012. A survey of wordnets and their licenses. In Proc. of GWC 2012.

Francis Bond and Ryan Foster. 2013. Linking and extending an open multilingual wordnet. In Proc. of ACL.

OmegaWiki (http://www.omegawiki.org)

• Hundreds of languages

• About 50,000 entries («synsets»)

Some statistics for OmegaWiki

Wiktionary (http://www.wiktionary.org)

• A collaborative dictionary!

• Hundreds of languages

• About 3.7M entries

Some statistics for Wiktionary

Wikidata (http://www.wikidata.org)

• A collaborative knowledge base!

• Hundreds of languages

• About 15M entries

But how to integrate all these resources?

Alignment Approaches

Usually measure the similarity of two concepts

[Figure: two candidate concepts, both labeled plant#n#1, one of them from WordNet]

And align two concepts if their similarity exceeds a threshold
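
A minimal Python sketch of threshold-based alignment. The similarity scores and the Wikipedia-style labels below are made-up toy values, and picking only the best-scoring candidate before applying the threshold is a design choice of this sketch, not something stated on the slide.

```python
def align(concepts_a, concepts_b, similarity, threshold):
    """Map each concept in A to its most similar concept in B, if above threshold."""
    alignments = {}
    for a in concepts_a:
        best = max(concepts_b, key=lambda b: similarity(a, b))
        if similarity(a, best) > threshold:
            alignments[a] = best
    return alignments


# Hypothetical, hand-assigned similarity scores for illustration only.
SCORES = {
    ("plant#n#1", "Factory"): 0.82,
    ("plant#n#1", "Plant (organism)"): 0.10,
    ("plant#n#2", "Factory"): 0.07,
    ("plant#n#2", "Plant (organism)"): 0.35,
}


def toy_similarity(a, b):
    return SCORES.get((a, b), 0.0)


result = align(
    ["plant#n#1", "plant#n#2"],
    ["Factory", "Plant (organism)"],
    toy_similarity,
    threshold=0.5,
)
```

With the threshold at 0.5 only the first sense is aligned; the second stays unaligned because even its best candidate scores 0.35, which shows how the threshold trades coverage for precision.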


SemAlign: Cross-resource Concept Alignment

[Pilehvar and Navigli, ACL 2014]

We combine two different similarity measures:

Alignment Approaches

Gloss similarity (definition similarity)

• Strong baseline

• Falls short when

– totally different wordings are used for the same concepts

– we lack quality glosses

Example: two glosses for the same concept

– An area within a building enclosed by walls and floor and ceiling.

– A room is any distinguishable space within a structure.
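
To see why gloss similarity falls short on this example, here is a small Python sketch that uses Jaccard overlap of word sets as a stand-in for a gloss similarity measure. Jaccard is chosen here only for simplicity; it is not claimed to be the measure used in SemAlign.

```python
import re


def tokens(gloss):
    """Lowercased set of alphabetic word tokens in a gloss."""
    return set(re.findall(r"[a-z]+", gloss.lower()))


def jaccard(gloss_a, gloss_b):
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    a, b = tokens(gloss_a), tokens(gloss_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


gloss_1 = "An area within a building enclosed by walls and floor and ceiling."
gloss_2 = "A room is any distinguishable space within a structure."
similarity = jaccard(gloss_1, gloss_2)
```

The two glosses describe the same concept but share only the words "a" and "within", so the overlap is 2 out of 17 distinct words, far too low to support an alignment on wording alone.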


SemAlign: structural similarity

Structural similarity

[Figure: the gloss “1. paper -- a material made of cellulose pulp derived mainly from wood or rags or certain grasses.” and the related nodes sheet, cellulose, fiber and material, shown in the Wikipedia semantic network and in the WordNet semantic network]

SemAlign: Core

Structural Similarity with Personalized PageRank [Pilehvar and Navigli, ACL 2014]

Personalized PageRank
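
The slides for this part are mostly figures, so here is a generic Python sketch of Personalized PageRank by power iteration. It is the textbook algorithm rather than the authors' implementation: the toy graph reuses node names from the paper example, the damping factor 0.85 and the iteration count are conventional assumptions, and every node is assumed to have at least one neighbour.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Random walk with restart: teleport back to the seed nodes with prob. 1 - damping."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            # Spread this node's mass evenly over its neighbours.
            share = damping * rank[node] / len(graph[node])
            for neighbor in graph[node]:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


# Toy undirected graph (adjacency lists); edges are illustrative.
GRAPH = {
    "paper": ["material", "cellulose", "sheet"],
    "material": ["paper", "fiber"],
    "cellulose": ["paper", "fiber"],
    "sheet": ["paper"],
    "fiber": ["cellulose", "material"],
}

signature = personalized_pagerank(GRAPH, seeds={"paper"})
```

The resulting probability distribution over nodes can serve as a semantic signature of the seed concept: nodes close to the seed in the graph receive more mass than distant ones.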

SemAlign: signature comparison

Structural similarity

Semantic Signature Comparison

Comparing Semantic Signatures: Weighted Overlap [Pilehvar et al., ACL 2013]

• We calculate the following formula:

• where r_i^k is the ranking of the i-th element in vector k
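
The formula image is missing from the transcript. As far as I recall from Pilehvar et al. (2013), Weighted Overlap sums 1 / (r_i^1 + r_i^2) over the dimensions H shared by both signatures and divides by the best attainable value, the sum of 1 / (2i) for i = 1..|H|. The Python sketch below implements that reading; treat the exact form as an assumption to be checked against the paper.

```python
def ranks(signature):
    """Rank dimensions by descending weight; the heaviest dimension gets rank 1."""
    ordered = sorted(signature, key=signature.get, reverse=True)
    return {dim: position for position, dim in enumerate(ordered, start=1)}


def weighted_overlap(sig1, sig2):
    """Rank-based similarity in [0, 1]; agreement on top-ranked dimensions counts most."""
    r1, r2 = ranks(sig1), ranks(sig2)
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    numerator = sum(1.0 / (r1[d] + r2[d]) for d in shared)
    denominator = sum(1.0 / (2 * i) for i in range(1, len(shared) + 1))
    return numerator / denominator


sig_a = {"x": 0.5, "y": 0.3, "z": 0.2}
sig_b = {"x": 0.2, "y": 0.3, "z": 0.5}   # same dimensions, reversed order

same = weighted_overlap(sig_a, sig_a)
reversed_order = weighted_overlap(sig_a, sig_b)
```

Because only ranks enter the computation, the measure is insensitive to the absolute scale of the two signatures, which is convenient when they come from graphs of very different sizes.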

BabelNet 3.0 is now out: http://babelnet.org

Key fact! BabelNet goes at a faster pace than I can cope with

Anatomy of BabelNet 3.0

• 271 languages covered (including Latin!); list at http://babelnet.org/stats

• 13.8M Babel synsets (6.4M concepts, 7.4M Named Entities)

• 117M word senses

• 355M semantic relations (26 edges per synset on avg.)

• 11M synset-associated images

• 40M textual definitions

New 3.0 version out!

• Seamless integration of:

– WordNet 3.0

– Wikipedia

– Wikidata

– Wiktionary

– OmegaWiki

– Open Multilingual WordNet [Bond and Foster, 2013]

• Translations for all open-class parts of speech

• 1.1B RDF triples available via SPARQL endpoint

BabelNet 1.1.1 → 2.0 → 2.5 → 3.0

1. From six to 50 to 271 languages;

2. From two resources to six;

3. From 5M to 9.3M to 13.8M synsets;

4. From 50M to 68M to 117M word senses;

5. From 140M to 262M to 355M semantic relations.

• WordNet + Open Multilingual WordNet + Wikipedia + …

• + OmegaWiki + automatic translations…

• + textual definitions

• More definitions + Wikipedia categories + …

• + images

Evaluations: I (might) have to go fast here!


WordNet-Wikipedia mapping accuracy

Overall quality of the mapping: ~90%

Note: this concerns only those 50k synsets in the intersection



We are not alone in the (resource) universe!

• DBpedia [Bizer et al. 2009]: a resource obtained from structured information in Wikipedia
o «Describes 3.77M things»
o Core of the Linked Open Data Cloud

• YAGO [Suchanek et al. 2007]
o «Contains 10M entities and 120M facts about these entities»
o Links Wikipedia categories to WordNet synsets

• MENTA [de Melo and Weikum, 2010]
o A «multilingual taxonomy with 5.4M entities»

• WikiNet [Nastase and Strube, 2013]
o Semantic network connecting Wikipedia entities
o «3M concepts and 38+M relations»

• Freebase (http://freebase.com): collaborative effort
o Structured data; started from Wikipedia, MusicBrainz, ChefMoz, etc.


Hands-on Session: the BabelNet Java API 3.0


Part 2b:

Structuring Knowledge


[Figure: an excerpt of the WordNet noun network. Synsets shown: {wheeled vehicle}, {self-propelled vehicle}, {motor vehicle}, {tractor}, {car, auto, automobile, machine, motorcar}, {convertible}, {air bag}, {golf cart, golfcart}, {wagon, waggon}, {accelerator, accelerator pedal, gas pedal, throttle}, {car window}, {locomotive, engine, locomotive engine, railway locomotive}, {brake}, {wheel}, {splasher}, connected by is-a and has-part edges.]

(The nominal part of) WordNet is structured as a taxonomy!
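Because the nominal part is a taxonomy, the generalizations of a synset can be read off by following is-a edges upward. The sketch below is a hand-built toy, not the WordNet or BabelNet API: the class name `HypernymChain`, the single-label synset names and the is-a edges (entered by hand and to be checked against WordNet) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HypernymChain {
    /** Toy is-a edges (child -> parent); hand-entered, to be checked against WordNet. */
    static Map<String, String> toyIsA() {
        Map<String, String> isA = new HashMap<>();
        isA.put("convertible", "car");
        isA.put("car", "motor vehicle");
        isA.put("golf cart", "motor vehicle");
        isA.put("motor vehicle", "self-propelled vehicle");
        isA.put("tractor", "self-propelled vehicle");
        isA.put("self-propelled vehicle", "wheeled vehicle");
        return isA;
    }

    /** Follows is-a edges upward from the given synset until no parent is left. */
    static List<String> chain(Map<String, String> isA, String synset) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // guards against accidental cycles
        String current = synset;
        while (current != null && seen.add(current)) {
            path.add(current);
            current = isA.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // Prints the synset followed by its successive generalizations.
        System.out.println(chain(toyIsA(), "convertible"));
    }
}
```

A single parent per synset keeps the toy short; real noun hierarchies can have several hypernyms per synset, in which case the map would hold lists of parents.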


The Wikipedia structure

Article pages: ~4M

Category pages: ~700K

Two noisy graphs with no explicit hypernym relation.


The Wikipedia structure: an example

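[Figure: a graph of Wikipedia pages and a graph of Wikipedia categories, with links between them. Node labels: Mickey Mouse, Funny Animal, Superman, Cartoon, Donald Duck, Disney comics characters, Disney comics, Disney character, Fictional characters by medium, Comics by genre, Fictional characters, The Walt Disney Company.]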


Our goal

To automatically create a Wikipedia Bitaxonomy for Wikipedia pages and categories in a simultaneous fashion.

KEY IDEA: The page and category levels are mutually beneficial for inducing a wide-coverage and fine-grained integrated taxonomy.


The Wikipedia Bitaxonomy: an example

[Figure: the page graph and the category graph with is-a edges added. Node labels: Mickey Mouse, Funny Animal, Superman, Cartoon, Donald Duck, Disney comics characters, Disney comics, Disney character, Fictional characters by medium, Comics by genre, Fictional characters, The Walt Disney Company.]


A 3-phase method

Starting from two noisy graphs


1. The WiBi Page taxonomy


Assumption

• The first sentence of a page is a good definition (also called gloss)


The WiBi Page taxonomy

1. [Syntactic step] Extract the hypernym lemma from a page definition using a syntactic parser;

2. [Semantic step] Apply a set of linking heuristics to disambiguate the extracted lemma.

Example: "Scrooge McDuck is a character […]"

[Figure: dependency parse of the example sentence, with arcs labelled nn, nsubj and cop. The syntactic step yields the hypernym lemma "character", which is then passed to the semantic step.]
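The syntactic step can be illustrated on an already parsed definition. In the sketch below the dependency arcs are written by hand in a Stanford-style relation(governor, dependent) form, which is an assumption; no parser is invoked, and the class `CopulaHypernym` and its rule (take the word that governs both a cop and an nsubj arc) are a simplified stand-in for whatever extraction rules WiBi actually applies.

```java
import java.util.Arrays;
import java.util.List;

final class CopulaHypernym {
    /** One dependency arc, relation(governor, dependent). */
    static final class Arc {
        final String relation;
        final String governor;
        final String dependent;

        Arc(String relation, String governor, String dependent) {
            this.relation = relation;
            this.governor = governor;
            this.dependent = dependent;
        }
    }

    /** Hand-written parse of "Scrooge McDuck is a character" (assumed, not parser output). */
    static List<Arc> exampleArcs() {
        return Arrays.asList(
                new Arc("nn", "McDuck", "Scrooge"),
                new Arc("nsubj", "character", "McDuck"),
                new Arc("cop", "character", "is"),
                new Arc("det", "character", "a"));
    }

    /** Returns the word governing both a "cop" and an "nsubj" arc, or null if there is none. */
    static String hypernymLemma(List<Arc> arcs) {
        for (Arc cop : arcs) {
            if (!cop.relation.equals("cop")) {
                continue;
            }
            for (Arc other : arcs) {
                if (other.relation.equals("nsubj") && other.governor.equals(cop.governor)) {
                    return cop.governor;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(hypernymLemma(exampleArcs()));
    }
}
```

The returned word is only a lemma; deciding which page or sense it denotes is the job of the semantic step.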


The story so far

1. Noisy page graph → Page taxonomy


2. The Bitaxonomy algorithm


The Bitaxonomy algorithm

The information available in the two taxonomies is mutually beneficial

• At each step exploit one taxonomy to update the other and vice versa
• Repeat until convergence
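The alternating update can be sketched as a fixed-point loop. Everything in the block below is an illustrative assumption rather than the published algorithm: the class `BitaxonomySketch`, the majority vote through cross links, the alphabetical tie-breaking and the toy cross links are all made up to show the control flow of "update one side from the other until nothing changes".

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class BitaxonomySketch {
    /** Key with the highest count; ties go to the alphabetically first key. */
    static String best(Map<String, Integer> votes) {
        String winner = null;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (winner == null || e.getValue() > votes.get(winner)) {
                winner = e.getKey();
            }
        }
        return winner;
    }

    /** Toy cross links (page -> its categories); hypothetical data. */
    static Map<String, Set<String>> exampleLinks() {
        Map<String, Set<String>> links = new TreeMap<>();
        links.put("Real Madrid F.C.", new TreeSet<>(Collections.singleton("Football clubs in Madrid")));
        links.put("Atl\u00e9tico Madrid", new TreeSet<>(Collections.singleton("Football clubs in Madrid")));
        links.put("Football team", new TreeSet<>(Collections.singleton("Football teams")));
        return links;
    }

    /** Fills in missing hypernyms on both sides, in place, until a pass changes nothing. */
    static void induce(Map<String, Set<String>> pageToCats,
                       Map<String, String> pageHyper,
                       Map<String, String> catHyper) {
        Set<String> allCats = new TreeSet<>();
        for (Set<String> cats : pageToCats.values()) {
            allCats.addAll(cats);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Page taxonomy -> category taxonomy: vote for the categories of the pages' hypernyms.
            for (String c : allCats) {
                if (catHyper.containsKey(c)) continue;
                Map<String, Integer> votes = new TreeMap<>();
                for (Map.Entry<String, Set<String>> e : pageToCats.entrySet()) {
                    String h = e.getValue().contains(c) ? pageHyper.get(e.getKey()) : null;
                    if (h == null) continue;
                    for (String hc : pageToCats.getOrDefault(h, Collections.<String>emptySet())) {
                        if (!hc.equals(c)) votes.merge(hc, 1, Integer::sum);
                    }
                }
                String w = best(votes);
                if (w != null) { catHyper.put(c, w); changed = true; }
            }
            // Category taxonomy -> page taxonomy: vote for the pages found in the hypernym categories.
            for (String p : pageToCats.keySet()) {
                if (pageHyper.containsKey(p)) continue;
                Map<String, Integer> votes = new TreeMap<>();
                for (String c : pageToCats.get(p)) {
                    String hc = catHyper.get(c);
                    if (hc == null) continue;
                    for (Map.Entry<String, Set<String>> e : pageToCats.entrySet()) {
                        if (e.getValue().contains(hc) && !e.getKey().equals(p)) {
                            votes.merge(e.getKey(), 1, Integer::sum);
                        }
                    }
                }
                String w = best(votes);
                if (w != null) { pageHyper.put(p, w); changed = true; }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> pageHyper = new TreeMap<>();
        pageHyper.put("Real Madrid F.C.", "Football team"); // seed from the page taxonomy
        Map<String, String> catHyper = new TreeMap<>();
        induce(exampleLinks(), pageHyper, catHyper);
        System.out.println(pageHyper);
        System.out.println(catHyper);
    }
}
```

Starting from a single page-level edge, the toy run first derives a category-level edge and then uses it to give the second club page a hypernym, which mirrors the mutual enrichment described on the slide.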


The Bitaxonomy algorithm: an example

[Figure, repeated at every step: a page graph and a category graph joined by cross links. Node labels: Real Madrid F.C., Atlético Madrid, Football team, Football teams, Football clubs, Football clubs in Madrid.]

• Starting from the page taxonomy
• Exploit the cross links to infer hypernym relations in the category taxonomy
• Take advantage of cross links to infer back is-a relations in the page taxonomy
• Use the relations found in previous step to infer new hypernym edges
• Mutual enrichment of both taxonomies until convergence


The story so far

2


3. The WiBi category taxonomy refinement


Category taxonomy refinement

Some categories are affected by structural problems.

[Figure: categories Comics characters by protagonist, Comics characters and Garfield characters, with the annotation "No pages associated!"]


Category taxonomy evaluation: coverage

+50% categories covered!

[Chart; recoverable labels: SUP, SUB, SUPER]


WiBi: Experimental Setup

We created 2 datasets:

o 1000 randomly sampled pages;

o 1000 randomly sampled categories.

Each item was annotated with the most suitable generalization (lemma+page or category).


Other resources to compare with

WikiNet

MENTA

WikiTaxonomy


Page Taxonomy Comparison


Category Taxonomy Comparison

“Football in Catalonia” is-a “entity#n#1”

“Human height” is-a “entity#n#1”


Part 3:

Addressing ambiguity


Motivation

• Web content is available in many languages
• Information should be extracted and processed independently of the source/target language
• This could be done automatically by means of high-performance multilingual text understanding


One of the key challenges of multilingual text understanding is the effective treatment of one of the fundamental aspects of language: Ambiguity!


Word Sense Disambiguation and Entity Linking

Example: "Thomas and Mario are strikers playing in Munich"

Entity Linking: the task of discovering mentions of entities within a text and linking them to a knowledge base.

WSD: the task of assigning meanings to word occurrences within a text.


Word Sense Disambiguation in a Nutshell

[Diagram: the target word "strikers", its context "Thomas and Mario are strikers playing in Munich" and knowledge are the inputs of a WSD system, which outputs the sense of the target word.]


Main references

A complete survey of the field:
Navigli R. Word Sense Disambiguation: a Survey. ACM Computing Surveys, 41(2), ACM Press, 2009, pp. 1-69.

WSD book:
Agirre E. and Edmonds P. Word Sense Disambiguation: Algorithms and Applications, New York, USA, Springer, 2006.

Another survey from last decade:
Ide N. and Véronis J. Word Sense Disambiguation: The State of The Art. Computational Linguistics, 24(1), 1998, pp. 1-40.


WSD: main approaches

• Supervised WSD
o Frames the problem as a classification task
o Relies on hand-labeled training sets

• Knowledge-based WSD
o Uses knowledge resources to identify the best senses for words in context
o Typically, it does not need a training phase and relies on an existing inventory of senses

• Word Sense Discrimination / Induction
o Unsupervised WSD: clustering
o Does not need manually-tagged datasets
o Can make the task more difficult to evaluate


Supervision: labeled data vs. knowledge


Entity Linking in a Nutshell

[Diagram: the target mention "Thomas", its context "Thomas and Mario are strikers playing in Munich" and knowledge are the inputs of an EL system, which outputs a Named Entity.]


Entity Linking

EL encompasses a set of similar tasks:

• Named Entity Disambiguation, that is, the task of linking entity mentions in a text to a knowledge base
• Wikification, that is, the automatic annotation of text by linking its relevant fragments to the appropriate Wikipedia articles.


Entity Linking

State-of-the-art approaches are based on the following concepts:

• Collective disambiguation of mentions vs. independent disambiguation of mentions;
• Enforcing semantic coherence among the chosen named entities;
• Efficiency: there are orders of magnitude between the number of word senses and the number of named entities!


State-of-the-art EL systems

• AIDA (Hoffart et al., 2011): a graph-based framework for the exploitation of similarity measures between candidate entities;
• KORE (Hoffart et al., 2012): a graph-based similarity measure integrated with key phrases contained within the context to disambiguate entities;
• Tagme (Ferragina and Scaiella, 2012): a combination of the Milne-Witten measure (hyperlinks similarity on Wikipedia) with the commonness of an entity;
• Wikifier (Cheng and Roth, 2013): a global and local approach based on the TF-IDF score combined with hyperlinks in Wikipedia;
• DBpedia Spotlight (Mendes et al., 2011): a generative model based on counts obtained from manually disambiguated Wikipedia hyperlinks (high prec., low recall).


The multilingual aspect of disambiguation

• In both tasks, WSD and EL, knowledge-based approaches have been shown to perform well
• What about multilinguality?
• What kinds of resources are available out there?

[Figure: Open Multilingual WordNet]


BabelNet as a Multilingual Inventory for:

Concepts: Calcio in Italian can denote different concepts, e.g. kick, calcium or soccer.

Named Entities: The text Mario can be used to represent different things, such as the video game character, a soccer player (Gomez) or even a music album.


Calcio / Kick in BabelNet 2.5


Calcio / Calcium in BabelNet 2.5


Calcio / Soccer in BabelNet 2.5


Disambiguation and Entity Linking together!

BabelNet is a huge multilingual inventory for both word senses and named entities!


Multilingual Joint Word Sense Disambiguation (MultiJEDI)

Key Objective 2: use all languages to disambiguate one


So what?


Babelfy: A Joint approach to WSD and EL [Moro et al., TACL 2014]

• Based on Personalized PageRank, the state-of-the-art method for graph-based WSD. However, it cannot be run for each new input on huge graphs.
• Idea: Precompute semantic signatures for the nodes!
• Semantic signatures are the most relevant nodes for a given node in the graph, computed by using random walk with restart

Andrea Moro, Alessandro Raganato and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2.

http://babelfy.org

1. Precompute semantic signatures;
2. Given an input text, select all the possible candidate meanings from BabelNet by matching mentions with BabelNet lexicalizations;
3. Connect all the candidate meanings by using semantic signatures;
4. Extract a dense subgraph containing semantically coherent candidates;
5. Select the most connected candidate for each fragment of text.


Step 1: Semantic Signatures

a. Start from one target vertex of the semantic network;
b. Randomly select a neighbor of the current vertex or restart from the target vertex;
c. Keep the counts of hitting frequencies;
d. Take the most visited vertices.
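Steps a to d can be sketched as a short simulation. The block below is a minimal illustration under stated assumptions, not Babelfy's implementation: the class `SemanticSignature`, the toy graph, the restart probability, the number of steps and the fixed random seed are all chosen here for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class SemanticSignature {
    /** Adds an undirected edge to the adjacency lists. */
    static void link(Map<String, List<String>> g, String a, String b) {
        g.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        g.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Toy semantic network; the edges are hypothetical. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new HashMap<>();
        link(g, "striker", "soccer player");
        link(g, "striker", "offside");
        link(g, "soccer player", "athlete");
        link(g, "athlete", "sport");
        link(g, "novel", "book"); // unrelated component, never reached from "striker"
        return g;
    }

    /** Random walk with restart from target; returns the topK most visited other vertices. */
    static List<String> signature(Map<String, List<String>> graph, String target,
                                  double restartProb, int steps, int topK, long seed) {
        Random rnd = new Random(seed);
        Map<String, Integer> hits = new HashMap<>();
        String current = target;                                   // (a) start from the target
        for (int i = 0; i < steps; i++) {
            List<String> nbrs = graph.getOrDefault(current, Collections.<String>emptyList());
            if (nbrs.isEmpty() || rnd.nextDouble() < restartProb) {
                current = target;                                  // (b) restart
            } else {
                current = nbrs.get(rnd.nextInt(nbrs.size()));      // (b) random neighbor
                if (!current.equals(target)) {
                    hits.merge(current, 1, Integer::sum);          // (c) count the visit
                }
            }
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(topK, ranked.size()))); // (d)
    }

    public static void main(String[] args) {
        System.out.println(signature(exampleGraph(), "striker", 0.15, 10000, 3, 42L));
    }
}
```

Because the signatures depend only on the network and not on any input text, they can be computed once for every vertex and stored, which is the point of precomputing them.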


Step 1: Semantic Signatures

[Figure: the vertex striker and the related vertices offside, athlete, sport and soccer player.]


Step 2: Find all possible meanings of words

1. Exact Matching (good for WSD, bad for EL)

"Thomas and Mario are strikers playing in Munich"

[Figure: candidate entries labelled Thomas, Norman Thomas, Seth; they both have Thomas as one of their lexicalizations.]


2. Partial Matching (good for EL)

"Thomas and Mario are strikers playing in Munich"

[Figure: the entries labelled Thomas, Norman Thomas, Seth, plus Thomas Müller, which has Thomas as a subsequence of one of its lexicalizations.]
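The difference between the two matching modes can be shown on a tiny inventory. This is a sketch under assumptions: the class `CandidateMatching`, the inventory contents and the reading of "subsequence" as a contiguous run of whitespace-separated tokens are illustrative choices, and the real lookup goes through BabelNet rather than an in-memory map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CandidateMatching {
    /** Toy inventory (entry label -> lexicalizations); hypothetical data. */
    static Map<String, List<String>> exampleInventory() {
        Map<String, List<String>> inv = new LinkedHashMap<>();
        inv.put("Norman Thomas", Arrays.asList("Thomas", "Norman Thomas"));
        inv.put("Seth Thomas", Arrays.asList("Thomas", "Seth Thomas"));
        inv.put("Thomas M\u00fcller", Arrays.asList("Thomas M\u00fcller"));
        inv.put("Munich (City)", Arrays.asList("Munich", "M\u00fcnchen"));
        return inv;
    }

    /** True if the mention's tokens occur as a contiguous run inside the lexicalization. */
    static boolean containsTokens(String lexicalization, String mention) {
        List<String> lex = Arrays.asList(lexicalization.toLowerCase(Locale.ROOT).split("\\s+"));
        List<String> men = Arrays.asList(mention.toLowerCase(Locale.ROOT).split("\\s+"));
        return Collections.indexOfSubList(lex, men) >= 0;
    }

    /** Entries whose lexicalizations match the mention exactly or, if partial is set, partially. */
    static Set<String> candidates(Map<String, List<String>> inventory, String mention, boolean partial) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : inventory.entrySet()) {
            for (String lex : entry.getValue()) {
                boolean match = partial ? containsTokens(lex, mention) : lex.equalsIgnoreCase(mention);
                if (match) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inv = exampleInventory();
        System.out.println("exact:   " + candidates(inv, "Thomas", false));
        System.out.println("partial: " + candidates(inv, "Thomas", true));
    }
}
```

Partial matching only adds candidates, so it raises recall for entity mentions at the cost of a larger and more ambiguous candidate set for the later steps to resolve.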


Candidate meanings found for "Thomas and Mario are strikers playing in Munich":

• Thomas: Thomas (novel), Seth Thomas, Thomas Müller
• Mario: Mario Gómez, Mario (Album), Mario (Character)
• strikers: Striker (Movie), Striker (Video Game), striker (Sport)
• Munich: Munich (City), FC Bayern Munich, Munich (Song)


Ambiguity!


Step 3: Connect all the candidate meanings

Thomas and Mario are strikers playing in Munich


Step 4: Extract a dense subgraph

Thomas and Mario are strikers playing in Munich


Step 5: Select the most reliable meanings

• We take into account both the lexical coherence, in terms of the number of fragments a candidate relates to, and the semantic coherence, using a graph centrality measure among the candidate meanings.
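A stripped-down version of this selection is sketched below. It is not Babelfy's scoring function: plain vertex degree stands in for the centrality measure, lexical coherence is left out, and the class `MostConnected`, the candidate lists and the edges of the subgraph are hypothetical data for the running example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MostConnected {
    /** Candidate meanings per text fragment (toy data). */
    static Map<String, List<String>> exampleCandidates() {
        Map<String, List<String>> c = new LinkedHashMap<>();
        c.put("Thomas", Arrays.asList("Thomas (novel)", "Seth Thomas", "Thomas M\u00fcller"));
        c.put("Mario", Arrays.asList("Mario G\u00f3mez", "Mario (Album)", "Mario (Character)"));
        c.put("strikers", Arrays.asList("Striker (Movie)", "Striker (Video Game)", "striker (Sport)"));
        c.put("Munich", Arrays.asList("Munich (City)", "FC Bayern Munich", "Munich (Song)"));
        return c;
    }

    /** Edges of the extracted subgraph; hypothetical connections. */
    static String[][] exampleEdges() {
        return new String[][] {
            {"Thomas M\u00fcller", "Mario G\u00f3mez"},
            {"Thomas M\u00fcller", "striker (Sport)"},
            {"Thomas M\u00fcller", "FC Bayern Munich"},
            {"Mario G\u00f3mez", "striker (Sport)"},
            {"Mario G\u00f3mez", "FC Bayern Munich"},
            {"striker (Sport)", "FC Bayern Munich"},
            {"Mario (Character)", "Striker (Video Game)"},
            {"Munich (City)", "FC Bayern Munich"},
        };
    }

    /** For each fragment, picks the candidate with the highest degree (first one wins ties). */
    static Map<String, String> select(Map<String, List<String>> candidatesByFragment, String[][] edges) {
        Map<String, Integer> degree = new HashMap<>();
        for (String[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }
        Map<String, String> chosen = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> fragment : candidatesByFragment.entrySet()) {
            String best = null;
            for (String candidate : fragment.getValue()) {
                if (best == null || degree.getOrDefault(candidate, 0) > degree.getOrDefault(best, 0)) {
                    best = candidate;
                }
            }
            if (best != null) {
                chosen.put(fragment.getKey(), best);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        System.out.println(select(exampleCandidates(), exampleEdges()));
    }
}
```

On this toy subgraph the football-related meanings reinforce one another, so each fragment ends up with its football reading even though no fragment is resolved in isolation.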


Experimental Results:

Fine-grained (Multilingual) Disambiguation

Datasets: Senseval-3, SemEval-2007 task 17, SemEval-2013 task 12


Experimental Results:

Coarse-grained Word Sense Disambiguation

SemEval-2007 task 7 dataset:


Experimental Results: Entity Linking


http://babelfy.org


Babelfy: RESTful API

Babelfy bfy = Babelfy.getInstance(AccessType.ONLINE);
String inputText = "hello world, I'm a computer scientist";

// "key" is a placeholder for the access key; partial matching, English input
Annotation annotations = bfy.babelfy("key", inputText, Matching.PARTIAL, Language.EN);

System.out.println("inputText: " + inputText);
System.out.println("annotations:");

// Print each annotated fragment, then the ID and content of its BabelSynset
for (BabelSynsetAnchor annotation : annotations.getAnnotations()) {
    System.out.println(annotation.getAnchorText());
    System.out.println("\t" + annotation.getBabelSynset().getId() + "\t" + annotation.getBabelSynset());
}


Hands-on Session: Babelfy


Annotating with BabelNet: all in one!

Key fact! Annotating with BabelNet implies annotating with WordNet and Wikipedia (now also OmegaWiki, Open Multilingual WordNet, Wiktionary and Wikidata!)


Publishing Structured Data as Linked Data


Hands-on Session: RDF & SPARQL

Go to:

http://babelnet.org:8084/sparql/

Retrieve all the RDF information of a synset

● For instance, given the synset http://babelnet.org/2.0/s00000356n:

DESCRIBE <http://babelnet.org/2.0/s00000356n>

Retrieve the senses of a given lemma for a certain language

● Given a word, e.g. home, retrieve all its senses and the corresponding synsets. The language tag on the label (here @EN) restricts the match to entries in that language:

SELECT DISTINCT ?sense ?synset WHERE {
  ?entries a lemon:LexicalEntry .
  ?entries lemon:sense ?sense .
  ?sense lemon:reference ?synset .
  ?entries rdfs:label "home"@EN .
}
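
The query above hard-codes the lemma and the language tag. A minimal Java sketch of parameterising it might look as follows. The class and method names (`LemmaSenseQuery`, `build`, `escapeLiteral`) are illustrative assumptions and not part of any BabelNet API. The only logic added to the slide's query is the escaping of the lemma as a SPARQL string literal and a syntactic check on the language tag.

```java
public class LemmaSenseQuery {

    /** Escapes backslashes and double quotes so the lemma is a valid SPARQL string literal. */
    public static String escapeLiteral(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the slide's query for an arbitrary lemma and language tag, e.g. ("home", "EN"). */
    public static String build(String lemma, String languageTag) {
        // Same shape as the SPARQL LANGTAG production: letters, then optional "-" subtags.
        if (!languageTag.matches("[A-Za-z]+(-[A-Za-z0-9]+)*")) {
            throw new IllegalArgumentException("Not a valid language tag: " + languageTag);
        }
        return "SELECT DISTINCT ?sense ?synset WHERE {\n"
             + "  ?entries a lemon:LexicalEntry .\n"
             + "  ?entries lemon:sense ?sense .\n"
             + "  ?sense lemon:reference ?synset .\n"
             + "  ?entries rdfs:label \"" + escapeLiteral(lemma) + "\"@" + languageTag + " .\n"
             + "}";
    }

    public static void main(String[] args) {
        // Reproduces the query shown on the slide.
        System.out.println(build("home", "EN"));
    }
}
```

Whether the endpoint matches language tags case-sensitively is not stated on the slide, so the sketch passes the tag through unchanged.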

Retrieve the translations of a given sense

● Given a sense, we want to obtain all its translations:

● For instance, given the sense:

– http://babelnet.org/2.0/home_EN/s00044488n

SELECT ?translation WHERE {
  <http://babelnet.org/2.0/home_EN/s00044488n> lexinfo:translation ?translation .
}

Retrieve license information about a sense

● For instance, given the sense:

– http://babelnet.org/2.0/home_EN/s00044488n

SELECT ?license WHERE {
  <http://babelnet.org/2.0/home_EN/s00044488n> dcterms:license ?license .
}

Retrieve textual definitions in all languages

● For instance, given the synset http://babelnet.org/2.0/s00000356n:

SELECT DISTINCT ?language ?gloss ?license ?sourceurl WHERE {
  <http://babelnet.org/2.0/s00000356n> bn-lemon:definition ?definition .
  ?definition lemon:language ?language .
  ?definition bn-lemon:gloss ?gloss .
  ?definition dcterms:license ?license .
  ?definition dc:source ?sourceurl .
}

Retrieve a synset’s hypernyms

● For instance, given the synset http://babelnet.org/2.0/s00000356n:

SELECT ?broader WHERE {
  <http://babelnet.org/2.0/s00000356n> skos:broader ?broader
}
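
The queries in this hands-on session use prefixed names (lemon:, rdfs:, skos:, dcterms:, ...) without declaring them, which only works if the endpoint predefines those prefixes. The Java sketch below shows one way to prepend explicit PREFIX declarations and to build a GET request URL using the `query` parameter of the standard SPARQL protocol. The class and method names are illustrative assumptions. Only the W3C and Dublin Core namespaces are filled in; the namespace URIs for lemon, lexinfo and bn-lemon are deliberately left for the caller to supply from the BabelNet documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class SparqlRequest {

    /** Well-known namespaces; project-specific ones (lemon, lexinfo, bn-lemon) must be added by the caller. */
    public static Map<String, String> standardPrefixes() {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        prefixes.put("skos", "http://www.w3.org/2004/02/skos/core#");
        prefixes.put("dcterms", "http://purl.org/dc/terms/");
        prefixes.put("dc", "http://purl.org/dc/elements/1.1/");
        return prefixes;
    }

    /** Heuristic: true if "name:" occurs in the body and is not the tail of a longer prefix such as "bn-lemon:". */
    public static boolean usesPrefix(String body, String name) {
        return Pattern.compile("(?<![\\w-])" + Pattern.quote(name) + ":").matcher(body).find();
    }

    /** Prepends a PREFIX line for every known prefix that the query body uses. */
    public static String withPrefixes(String body, Map<String, String> prefixes) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : prefixes.entrySet()) {
            if (usesPrefix(body, entry.getKey())) {
                query.append("PREFIX ").append(entry.getKey())
                     .append(": <").append(entry.getValue()).append(">\n");
            }
        }
        return query.append(body).toString();
    }

    /** Builds a SPARQL-protocol GET URL by URL-encoding the query into the "query" parameter. */
    public static String getUrl(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String body = "SELECT ?broader WHERE {\n"
                    + "  <http://babelnet.org/2.0/s00000356n> skos:broader ?broader\n"
                    + "}";
        String query = withPrefixes(body, standardPrefixes());
        System.out.println(query);
        System.out.println(getUrl("http://babelnet.org:8084/sparql/", query));
    }
}
```

The sketch only builds the request string; sending it and parsing the result format are left out because the slides do not specify them.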

Conclusion

To summarize

• I have taken you through a tour of:

– A very large multilingual semantic network: BabelNet

– A state-of-the-art WSD and EL system: Babelfy

Multilingual Semantic Processing with BabelNet – LREC 2014 Tutorial (Roberto Navigli and David Jurgens)

Actually there’s much much much more!

Fei, thanks for this crazy photo!

Next feature in BabelNet! Semantic Predicates (SPred) [Flati and Navigli, ACL 2013]

cup of *

Earl Grey tea, Green tea, Indian tea, Black tea, Tea, Water, Seawater, Coffee, Turkish coffee, Drip coffee, Espresso, Cappuccino, Caffè latte, Decaffeinated coffee, Wine, Sack, White wine, Red wine, Claret, Kosher wine, Madeira wine, Wine in China, …

Classes sorted by relevance!

wine, coffee, beverage, water

[Images: Bored worker; Enthusiastic player; Enthusiastic worker]

We want people to play, play, and play!

Real videogames «with a purpose»

Having fun with annotations

[Vannella et al. ACL 2014; Jurgens and Navigli, TACL 2014]

Case Study #1: Validate and Extend Semantic Relations in BabelNet [Vannella et al., ACL 2014]

• Given a pair of concepts, decide if they are related

• “doctor” and “medicine”

• “doctor” and “USA”

• used for new and existing relations

Game #1: Data Design

• Pick a target BabelNet synset

• Pick a related synset and a lemma from that synset (validation case)

• Show the gloss of the target synset as a clue

• We know what synset the other word comes from

• Generate true negative data automatically by picking random words related to other synsets

• Low probability of being related

• Lets us measure player accuracy
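
The negative-control idea in the list above can be sketched in Java as follows. This is a toy illustration under stated assumptions, not the game's actual implementation: the relation map, the vocabulary and the names `ControlItems`, `sampleNegative` and `accuracy` are all hypothetical. Unlike the slide, which relies on random words having a low probability of being related, the sketch also excludes lemmas already known to be related to the target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class ControlItems {

    /** Picks a random lemma that is neither the target nor one of the target's known related lemmas. */
    public static String sampleNegative(String target, Map<String, Set<String>> related,
                                        List<String> vocabulary, Random rng) {
        Set<String> known = related.getOrDefault(target, Collections.<String>emptySet());
        List<String> candidates = new ArrayList<>();
        for (String lemma : vocabulary) {
            if (!lemma.equals(target) && !known.contains(lemma)) {
                candidates.add(lemma);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("No unrelated lemma available for " + target);
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }

    /** Fraction of control items on which the player's answer matches the expected answer. */
    public static double accuracy(boolean[] expected, boolean[] answered) {
        if (expected.length != answered.length || expected.length == 0) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
        int correct = 0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] == answered[i]) {
                correct++;
            }
        }
        return (double) correct / expected.length;
    }

    public static void main(String[] args) {
        // Toy data: "doctor" is related to "medicine" and "hospital" only.
        Map<String, Set<String>> related = new HashMap<>();
        related.put("doctor", new HashSet<>(Arrays.asList("medicine", "hospital")));
        List<String> vocabulary = Arrays.asList("doctor", "medicine", "hospital", "USA", "balloon");

        String negative = sampleNegative("doctor", related, vocabulary, new Random());
        System.out.println("Negative control item for doctor: " + negative);

        // expected[i] says whether control item i is truly related; answered[i] is the player's choice.
        boolean[] expected = {true, false, false, true};
        boolean[] answered = {true, false, true, true};
        System.out.println("Player accuracy: " + accuracy(expected, answered));
    }
}
```

Because the expected answer is known for every control item, a player's accuracy on those items can be used to judge how much to trust the same player's answers on items where the relation is unknown.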

Game #1: Infection (a zombie survival game)

Game #1: Presenting a clue to the player

Game #1: Gameplay

Game #1: Results

• ~250 players made 6.5K annotations in a two-week period

• Better performance than crowdsourcing, with zero cost*

• Gamers spotted 67.8% of true positive relations, compared with 16.9% on CrowdFlower

Players were very accurate at spotting false negative items!

Case Study #2: Validate and Extend Image-Sense Associations in BabelNet [Vannella et al., ACL 2014]

• Given a concept and an image, decide if the image depicts the concept

Game #2: The Knowledge Towers (Action RPG)

Game #2: Showing the concept hint

Game #2: Gameplay

Game #2: Results

• ~200 players made 6.3K annotations in a two-week period

• Better performance than crowdsourcing, with zero cost*

• Gamers spotted 82.5% of true positive images, compared with 59.5% on CrowdFlower

Players were very accurate at spotting false negative items!

Thanks or… (grazie)

http://lcl.uniroma1.it

http://babelnet.org

http://babelfy.org

Google group: babelnet-group