TRANSCRIPT
Tiziano Flati and Roberto Navigli
SPred: Large-scale Harvesting of Semantic Predicates
Cup of
“Over 2.25 billion cups of coffee are consumed in the world every day”
2 SPred: Large-scale Harvesting of Semantic Predicates, Flati, Navigli
cup of *
Objective:
cup of *
liquid_n^1
dairy product_n^1
country_n^1
Challenge #1: discovering representative arguments
Challenge #2: inferring semantic classes
cup of *
liquid_n^1
dairy product_n^1
country_n^1
SELECTIONAL PREFERENCES
[Resnik ‘96, Erk ‘07, Chambers & Jurafsky ‘10]
EAT: MEAT, GAS, FISH, ICE CREAM
LEXICAL PATTERNS
[Hearst ‘92, Kozareva & Hovy ‘10, Wu & Weld ‘10]
X such as Y
Challenge #1: discovering representative arguments
Challenge #2: inferring semantic classes
SPred
CONTRIBUTION #1: Capturing concepts for long-tail arguments using a novel wikification procedure
CONTRIBUTION #2: Inferring WordNet semantic classes from a distribution of Wikipedia pages
SPred
METHODOLOGY
HARVESTING ARGUMENTS FROM WIKIPEDIA
LINKING ARGUMENTS TO WIKIPEDIA AND WORDNET
LINKING ARGUMENTS FROM WORDNET TO SEMANTIC CLASSES
…
cup of *
* was designed by
the biggest * in 1987
a very big *
…
…
cup of [Beverage]
[Structure] was designed by
the biggest [Event] in 1987
a very big [Phenomenon]
…
cup of *
LEXICAL PREDICATE
Cup of coffee
LEXICAL PREDICATE
cup of *
* was designed by
the biggest * in 1987
a very big *
…
cup of coffee
FILLING ARGUMENT
FILLING ARGUMENT
cup of coffee
cup of red wine
cup of Italy
dress was designed by
bridge was designed by
a very big artist
a very big hotel
…
cup of [Beverage]
SEMANTIC PREDICATE
[Liquid]
[Milk] [Alcohol] [Coffee]
[Irish coffee]
Example output
SEMANTIC PREDICATE
cup of [Beverage]
cup of [Country]
[Clothing] was designed by
[Platform] was designed by
a very big [Artist]
a very big [Building]
…
HARVESTING ARGUMENTS FROM WIKIPEDIA
cup of *
coffee
tea
Italy
milk
yeast
…
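A minimal sketch of how the filling arguments of a lexical predicate such as cup of * could be collected from raw sentences. The function name `harvest_arguments`, the regex matching, and the single-token slot are illustrative assumptions; the slides do not show how SPred scans Wikipedia.

```python
import re
from collections import Counter

def harvest_arguments(predicate, sentences):
    """Count the strings that fill the '*' slot of a lexical predicate.

    Assumption: the predicate contains exactly one '*' and the slot is
    filled by a single token (real arguments may span several words).
    """
    left, right = (re.escape(part.strip()) for part in predicate.split("*"))
    slot = r"(\w+)"
    # Keep only the non-empty pieces, so both "cup of *" and
    # "* was designed by" are handled.
    pieces = [piece for piece in (left, slot, right) if piece]
    pattern = re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)

    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts

# Toy sentences, invented for illustration.
sentences = [
    "He drank a cup of coffee.",
    "A cup of tea, please.",
    "Another cup of coffee arrived.",
    "The bridge was designed by an engineer.",
]
cup_args = harvest_arguments("cup of *", sentences)
designed_args = harvest_arguments("* was designed by", sentences)
```

Here `cup_args` counts coffee twice and tea once, and `designed_args` contains bridge.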
LINKING ARGUMENTS TO WIKIPEDIA AND WORDNET
LINK! cup of Earl grey tea → Earl grey tea
MAP! Earl grey tea → tea
Research question #1: How to determine which Wikipedia page best corresponds to
an argument?
… and drank over twenty
cups of coffee each day…
?
Wikipedians will occasionally link the arguments for us
William G. McGowan
He was also a three-pack-a-day smoker and drank over twenty cups of coffee each day until his first heart attack. As leader of MCI, he labored for several years to gain the financing and …
For free!
Problem #1: Not many arguments are linked
All instances of ‘coffee’
4 linked
113 ?
How to link these instances?
1st heuristic: One sense per page
If the argument text has been linked somewhere else in the article, use that link’s page
Health effects of caffeine
Manually linked: the greatest benefits were observed in those who drank coffee for a long period in their lifetime.
[…]
One sense per page: roughly 80 to 100 cups of coffee for an average adult taken within a limited time…
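The one-sense-per-page rule can be sketched as a lookup over the links an article already contains. The function name, the `(anchor_text, target_page)` representation of links, and the choice to return `None` when the same anchor points to several pages are assumptions made for illustration.

```python
def one_sense_per_page(argument, article_links):
    """Reuse a manual link found elsewhere in the same article.

    article_links: list of (anchor_text, target_page) pairs that editors
    linked by hand in the article (hypothetical representation).
    Returns the target page, or None if the argument text was never
    linked or was linked to more than one page (assumption).
    """
    targets = {
        page
        for anchor, page in article_links
        if anchor.lower() == argument.lower()
    }
    return next(iter(targets)) if len(targets) == 1 else None

# Toy links for an article such as "Health effects of caffeine".
links = [("coffee", "Coffee"), ("caffeine", "Caffeine")]
linked_page = one_sense_per_page("coffee", links)
```

With these toy links, the unlinked mention in "cups of coffee" would inherit the page Coffee.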
2nd heuristic: Trust the inventory
If there’s only one page for that argument text, link to that page
his days in the library with a cup of Earl Grey tea. The main character of the…
1 sense only!
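A minimal sketch of the trust-the-inventory rule, assuming the sense inventory is available as a dictionary from surface text to candidate page titles. The dictionary layout, the function name, and the toy entries are hypothetical.

```python
def trust_the_inventory(argument, inventory):
    """Link the argument only when the inventory lists exactly one page.

    inventory: dict mapping lowercased argument text to a set of
    candidate page titles (hypothetical layout).
    """
    candidates = inventory.get(argument.lower(), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # unknown or ambiguous text is left unlinked

# Toy inventory, invented for illustration.
inventory = {
    "earl grey tea": {"Earl Grey tea"},
    "water": {"Water", "Water (classical element)"},
}
earl_grey_page = trust_the_inventory("Earl Grey tea", inventory)
```

An ambiguous text such as water is left to other evidence, since more than one page is listed for it.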
Problem #2: Same argument linked to multiple pages
All instances of ‘water’
42 linked
78 linked
?
100%
Research question #2: How to determine which WordNet concepts best represent Wikipedia pages?
cup of *
water
tea
Italy
BabelNet: a mapping from Wikipedia pages to concepts
[Navigli & Ponzetto, 2012]
www.babelnet.org
NEs and specialized concepts from Wikipedia
Concepts from WordNet
Concepts integrated from both resources
Argument mapping
Coffee
Coffee is a brewed beverage with a distinct aroma and flavor, prepared from the roasted seeds…
μ(Coffee) = coffee
Argument mapping
The vast majority of Wikipedia pages [4M+] do not have a corresponding concept in WordNet [117K+]
μ( ) = ?
Argument mapping: hypernym extraction
Earl Grey tea
Definitional sentence: Earl Grey tea is a tea with a distinctive flavour and aroma derived from the addition of oil extracted from the rind of the bergamot orange, a fragrant citrus fruit. Traditionally, the term "Earl Grey"…
Target lemma: Earl Grey tea
Hypernym extracted by WCL: tea
WCL + link
Tea
Tea is an aromatic beverage commonly prepared by pouring hot or boiling water…
[Navigli & Velardi, 2010]
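To make the hypernym-extraction step concrete, here is a deliberately simplified stand-in. WCL is a learned definition and hypernym extractor; the regular expression and stop-word list below are NOT WCL, only a toy "X is a Y" heuristic with hypothetical names, shown to illustrate the input and output of the step.

```python
import re

# Words that end the noun phrase following "is a/an/the" (toy list).
STOP_WORDS = {"with", "commonly", "that", "which", "of", "prepared", "made"}

def extract_hypernym(lemma, definition):
    """Return the head word of the phrase after '<lemma> is a/an/the'.

    Toy heuristic standing in for WCL: the head is taken to be the last
    word before the first stop word.
    """
    match = re.match(
        re.escape(lemma) + r"\s+is\s+(?:a|an|the)\s+(.*)",
        definition,
        re.IGNORECASE,
    )
    if not match:
        return None  # not a definitional sentence for this lemma
    phrase = []
    for token in re.findall(r"[A-Za-z-]+", match.group(1)):
        if token.lower() in STOP_WORDS:
            break
        phrase.append(token.lower())
    return phrase[-1] if phrase else None

earl_grey_hypernym = extract_hypernym(
    "Earl Grey tea",
    "Earl Grey tea is a tea with a distinctive flavour and aroma",
)
tea_hypernym = extract_hypernym(
    "Tea",
    "Tea is an aromatic beverage commonly prepared by pouring hot water",
)
```

On the two definitional sentences from the slide, the toy heuristic yields tea and beverage respectively.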
Argument mapping: an example
In literature, the main character in Haruki Murakami's Kafka on the Shore starts his days in the library with a cup of Earl Grey tea. The main character of the…
Trust the inventory: Earl Grey tea
Earl Grey tea is a tea with a distinctive flavour and aroma derived from…
WCL: Tea
Tea is an aromatic beverage commonly prepared by pouring hot or boiling water…
BabelNet: tea
We can thus synergistically map to WordNet more than 500K pages!
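The example above chains a direct page-to-concept mapping with a hypernym fallback. A minimal sketch of that chain follows, assuming both resources are available as plain dictionaries. The names `map_to_concept`, `direct_mapping`, and `hypernym_page` are hypothetical, and following the fallback for more than one hop is an assumption; the slide shows a single hop.

```python
def map_to_concept(page, direct_mapping, hypernym_page):
    """Map a Wikipedia page to a WordNet concept, with a hypernym fallback.

    direct_mapping: page -> concept (stand-in for the BabelNet mapping).
    hypernym_page: page -> page of its extracted hypernym (stand-in for
    WCL plus the link on the hypernym).
    """
    seen = set()
    while page is not None and page not in seen:
        if page in direct_mapping:
            return direct_mapping[page]
        seen.add(page)  # guard against cycles in the fallback chain
        page = hypernym_page.get(page)
    return None

# Toy data mirroring the Earl Grey tea example.
direct_mapping = {"Tea": "tea"}
hypernym_page = {"Earl Grey tea": "Tea"}
earl_grey_concept = map_to_concept("Earl Grey tea", direct_mapping, hypernym_page)
```

Earl Grey tea has no direct entry in the toy mapping, so it reaches the concept tea through the page Tea.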
LINKING ARGUMENTS FROM WORDNET TO SEMANTIC CLASSES
Research question #3: How to generalize WordNet concepts associated with arguments?
coffee → [Beverage]
Generalization to semantic classes
CORE CONCEPTS
3K+ most frequent concepts freely downloadable
Core concepts of {}
Semantic Class of {}
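One way to picture the generalization step is a climb up the hypernym hierarchy until a core concept is reached. The sketch below assumes exactly that: a breadth-first search over a toy hypernym dictionary, stopping at the nearest core concept. The slides do not spell out the selection rule, so the nearest-ancestor choice, the function name, and the toy hierarchy are assumptions.

```python
from collections import deque

def semantic_class(concept, hypernyms, core_concepts):
    """Return the nearest core concept reachable through hypernym edges.

    hypernyms: concept -> list of direct hypernyms (toy stand-in for the
    WordNet hierarchy). A concept that is itself core is its own class.
    """
    queue = deque([concept])
    seen = {concept}
    while queue:
        current = queue.popleft()
        if current in core_concepts:
            return current
        for parent in hypernyms.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None  # no core concept above this concept

# Toy hierarchy and core set, invented for illustration.
hypernyms = {
    "irish coffee": ["coffee"],
    "coffee": ["beverage"],
    "tea": ["beverage"],
    "beverage": ["liquid"],
}
core_concepts = {"beverage", "liquid"}
tea_class = semantic_class("tea", hypernyms, core_concepts)
```

In this toy hierarchy both tea and irish coffee generalize to beverage, because beverage is the first core concept met on the way up.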
Generalization to semantic classes
• By repeating the same procedure for all the arguments of a lexical predicate we discover clusters of arguments for each semantic class
In literature, the main character in Haruki Murakami's Kafka on the Shore starts his days in the library with a cup of Earl Grey tea. The main character of the…
Trust the inventory: Earl Grey tea
Earl Grey tea is a tea with a distinctive flavour and aroma derived from…
WCL: Tea
Tea is an aromatic beverage commonly prepared by pouring hot or boiling water…
BabelNet: tea
Semantic class: [Beverage]
Classes sorted by frequency!
[beverage_n^1]: earl grey tea, tea, …
[water_n^1]: water, seawater, …
[coffee_n^1]: coffee, cappuccino, …
[wine_n^1]: wine, white wine, …
cup of *
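The clustering and ranking shown on this slide can be sketched as a group-by followed by a sort. The sketch assumes that the frequency of a class is the sum of the counts of its arguments; the slides only say that classes are sorted by frequency. All names and counts below are invented for illustration.

```python
from collections import defaultdict

def cluster_by_class(argument_counts, argument_to_class):
    """Group arguments by semantic class and rank classes by frequency.

    Assumption: class frequency = total count of the arguments in it.
    Returns a list of (class, arguments) pairs, most frequent class
    first, with each argument list sorted by count.
    """
    clusters = defaultdict(dict)
    for argument, count in argument_counts.items():
        semantic_class = argument_to_class.get(argument)
        if semantic_class is not None:  # unmapped arguments are skipped
            clusters[semantic_class][argument] = count
    ranked = sorted(
        clusters.items(),
        key=lambda item: sum(item[1].values()),
        reverse=True,
    )
    return [
        (semantic_class, sorted(args, key=args.get, reverse=True))
        for semantic_class, args in ranked
    ]

# Toy counts for "cup of *" (not taken from the evaluation).
argument_counts = {"coffee": 5, "cappuccino": 1, "tea": 3, "water": 2}
argument_to_class = {
    "coffee": "coffee",
    "cappuccino": "coffee",
    "tea": "beverage",
    "water": "water",
}
ranking = cluster_by_class(argument_counts, argument_to_class)
```

With these toy counts the class coffee comes first, followed by beverage and water.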
EVALUATION
1st Evaluation: Semantic class ranking quality
Experimental Setup
DATASET 1: 50 random lexical predicates from the Oxford Advanced Learner's Dictionary
Lexical predicate     Argument
provide *             minerals
give birth to *       child
publish *             review
build *               suspense
* collide             car
get stuck in *        traffic jam
reduce *              pollution
…                     …
Precision @ K
Semantic classes by importance: [Wine], [Feeling], [Coffee], [Water], [Dairy product], [Country], …
Top K semantic classes
P@K = # correct / K
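The measure above is straightforward to compute once a ranked list of classes and a set of classes judged correct are available. In the sketch below the function name and the correctness judgments are illustrative assumptions, not data from the evaluation.

```python
def precision_at_k(ranked_classes, correct_classes, k):
    """P@K = (# correct classes among the top K) / K."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_classes[:k]
    return sum(1 for cls in top_k if cls in correct_classes) / k

# Toy ranking for "cup of *"; the set of correct classes is invented.
ranked = ["Wine", "Feeling", "Coffee", "Water"]
correct = {"Wine", "Coffee", "Water"}
p_at_1 = precision_at_k(ranked, correct, 1)
p_at_2 = precision_at_k(ranked, correct, 2)
p_at_4 = precision_at_k(ranked, correct, 4)
```

Dividing by K rather than by the number of classes returned keeps the value comparable across predicates with short rankings.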
Results for dataset 1
[Plot: Precision@K (0.6 to 1) against K (semantic classes), K = 1 to 20; series: SPred]
Experimental Setup
DATASET 2: 24 lexical patterns from Kozareva & Hovy 2010
Lexical predicate
work for *
* work for
fly to *
* fly to
go to *
* go to
* celebrate
* dress
…
Results for dataset 2
[Plot: Precision@K (0.6 to 0.95) against K (semantic classes), K = 1 to 20; series: SPred, K&H]
2nd Evaluation: Argument disambiguation quality
Experimental Setup
DATASET
• ~ 800 lexical predicates sampled from the Oxford Advanced Learner’s Dictionary
• 3,245 items manually annotated with the most suitable semantic class
Lexical predicate     Argument
provide *             minerals
give birth to *       child
publish *             review
build *               suspense
* collide             car
get stuck in *        traffic jam
reduce *              pollution
…                     …
Results
[Bar chart: Precision, Recall, F1 (performance, 0 to 90); series: SPred, Random]
SPred: a novel approach to large-scale harvesting of semantic predicates
Contributions
• Novel heuristics for linking arguments
• High performance argument classifier
• Freely available dataset of semantic predicates
http://lcl.uniroma1.it/spred/
~ 1500 predicates
Thanks or…
m i
Tiziano Flati
Linguistic Computing Laboratory
http://lcl.uniroma1.it
Joint work with Roberto Navigli