Do we need lexicographers? Prospects for automatic lexicography
Adam Kilgarriff
Lexical Computing Ltd
University of Leeds
UK
Bolzano, May 2012 Adam Kilgarriff 2
Outline
– Precision and recall
– Between corpus and dictionary
– Shopping list
– Conclusions
Find me all the fat cats
a request for information
High recall
– Lots of responses
– Maybe not all good
High precision
– Fewer hits
– Higher confidence
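To make the two notions concrete, here is a minimal Python sketch using the standard set-based definitions: precision is the share of returned hits that are relevant, recall is the share of relevant items that were returned. The document IDs and the gold-standard set are hypothetical.

```python
# Minimal sketch: precision and recall for one query.
# All document IDs below are hypothetical.

def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of retrieved items that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of relevant items that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical gold standard: the items that really are about fat cats.
relevant = {"d1", "d2", "d3", "d4"}

# High recall: lots of responses, maybe not all good.
high_recall = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# High precision: fewer hits, higher confidence.
high_precision = {"d1", "d2"}

for name, hits in [("high recall", high_recall), ("high precision", high_precision)]:
    print(f"{name}: P={precision(hits, relevant):.2f} R={recall(hits, relevant):.2f}")
# high recall: P=0.50 R=1.00
# high precision: P=1.00 R=0.50
```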
Information-seeking
             Recall   Precision
Computers    good     bad
People       bad      good
Cyborg: part-human, part-computer
Treat your computer with respect. You and it can do great things
together.
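The division of labour in the table above can be sketched as a two-stage pipeline: the computer runs a deliberately loose search (high recall) and a person keeps only the hits they judge relevant (high precision). In this sketch the corpus lines, the regular expression and the `human_judgment` labels are hypothetical stand-ins; in practice the second stage is a lexicographer reading concordance lines.

```python
import re

# Hypothetical corpus lines (not from the talk).
corpus = [
    "Bankers and other fat cats got huge bonuses.",
    "Our fat cat sleeps on the sofa all day.",
    "The vet put the fat cat on a diet.",
    "Voters are angry at City fat cats.",
    "Fat-cat salaries rose again this year.",
    "The dog chased a thin cat.",
]

# Stage 1, the computer: a loose pattern that aims for high recall.
pattern = re.compile(r"\bfat[\s-]cats?\b", re.IGNORECASE)
candidates = [line for line in corpus if pattern.search(line)]

# Stage 2, the person: hypothetical hand labels for each candidate
# (True = the "overpaid rich person" sense the user was asking about).
human_judgment = {
    "Bankers and other fat cats got huge bonuses.": True,
    "Our fat cat sleeps on the sofa all day.": False,
    "The vet put the fat cat on a diet.": False,
    "Voters are angry at City fat cats.": True,
    "Fat-cat salaries rose again this year.": True,
}
accepted = [line for line in candidates if human_judgment[line]]

# Precision before and after the human pass (True counts as 1).
machine_precision = sum(human_judgment[line] for line in candidates) / len(candidates)
final_precision = sum(human_judgment[line] for line in accepted) / len(accepted)

print(f"computer: {len(candidates)} candidates, precision {machine_precision:.2f}")
print(f"person: {len(accepted)} kept, precision {final_precision:.2f}")
# computer: 5 candidates, precision 0.60
# person: 3 kept, precision 1.00
```

The machine pass never misses a relevant line here, so recall stays the same while the human pass raises precision.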
Lexicography: finding facts about words
Shopping list
– collocations
– grammatical patterns
– examples
– synonyms
– labels
   – region
   – domain
   – register
– translations
– meanings
Szeged, Jan 2008 Kilgarriff, Global WordNet 9
What is a word sense (1)? SFIP
– Sufficiently frequent, insufficiently predictable
– (a glass of) whisky vs. (a glass of) tequila
What is a word sense (2)?
– homonymy
– analogy
– polysemy rules
– collocation
What is a word sense (3)? A cluster
– of instances of use, operationalised as corpus lines
– clustered by lexicographers
The cluster view makes sense of
– overlapping senses
– different dictionaries, different senses
– lumping and splitting
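One way to read the cluster view is as a plain data structure: a sense is a set of corpus-line IDs, overlapping senses share lines, and lumping merges sets. The sketch below is illustrative only; the line IDs, sense names and the helper functions `overlaps` and `lump` are hypothetical.

```python
# Illustrative sketch: a word sense as a set of corpus-line IDs.
# All IDs and sense names are hypothetical.

# Dictionary A splits the lines into three senses.
dict_a = {
    "sense_1": {1, 2, 3},
    "sense_2": {4, 5},
    "sense_3": {5, 6},  # line 5 is in two clusters: overlapping senses
}

# Dictionary B lumps the same lines into two senses.
dict_b = {
    "sense_1": {1, 2, 3},
    "sense_2": {4, 5, 6},
}


def overlaps(senses: dict[str, set[int]]) -> list[tuple[str, str]]:
    """Pairs of senses that share at least one corpus line."""
    names = sorted(senses)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if senses[a] & senses[b]]


def lump(senses: dict[str, set[int]], a: str, b: str, new_name: str) -> dict[str, set[int]]:
    """Merge senses a and b into one cluster, as a 'lumper' would."""
    merged = {k: v for k, v in senses.items() if k not in (a, b)}
    merged[new_name] = senses[a] | senses[b]
    return merged


print(overlaps(dict_a))  # [('sense_2', 'sense_3')]
lumped = lump(dict_a, "sense_2", "sense_3", "sense_2")
print(lumped == dict_b)  # True: A, once lumped, matches B
```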
I don’t believe in word senses
Believe in:
– resurrection, ghost, witch, vampire, god, miracle, fairy
Philosophy:
– Ontological commitment
– (same meaning, different register)
– “good entities to build belief systems on”
But I’m an NLP person
Automatic clustering?
Inspiration:
– Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
– You can get semantic sense from corpora + statistics
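The automatic-clustering idea can be sketched in a few lines: represent each corpus line for a target word by the words around it, and group lines whose contexts are similar. This is not the method of any of the works cited above; it swaps in a deliberately simple bag-of-words cosine similarity with single-pass "leader" clustering, and the example lines, stopword list and threshold are assumptions.

```python
import math
from collections import Counter


def context_vector(line: str, target: str, stop: set[str]) -> Counter:
    """Bag of context words for one corpus line, minus the target and stopwords."""
    words = [w.strip(".,;:!?").lower() for w in line.split()]
    return Counter(w for w in words if w and w != target and w not in stop)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(vectors: list[Counter], threshold: float) -> list[list[int]]:
    """Single pass: join the first cluster whose leader is similar enough, else start a new one."""
    clusters: list[tuple[Counter, list[int]]] = []
    for i, vec in enumerate(vectors):
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical concordance lines for "bank".
lines = [
    "She paid the cheque into the bank account.",
    "The bank raised its interest rate on the account.",
    "We had a picnic on the river bank.",
    "Fish swam near the muddy river bank.",
]
stop = {"the", "a", "on", "into", "its", "we", "she", "had", "near"}

vectors = [context_vector(line, "bank", stop) for line in lines]
clusters = leader_cluster(vectors, threshold=0.2)
print(clusters)  # [[0, 1], [2, 3]]
```

On real corpus lines, bag-of-words contexts are sparse and noisy, which is one reason the slides that follow report clustering as hard.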
First attempt: Longman, 1994
Abject failure
– No grammar
– Corpus too small and noisy
– Naïve clustering
– Useless programmer
Second attempt: SENSEVALs 1998, 2001, 2004…
Mitigated failure
– Rarely over two-thirds correct
Third attempt: SADD (semi-automatic dictionary drafting), 2008, with Pavel Rychlý
I thought I knew what I was doing, but
– Probably a failure
Collocations are easy
– Most words don’t go with most other words
Then build on what we can do well (metaphor, analogy, homonymy and rules are all much harder)
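As an illustration of why collocations are the easy case, the sketch below ranks the neighbours of a node word with logDice, an association score defined as 14 + log2(2·f_xy / (f_x + f_y)); it reaches its maximum of 14 when the two words only ever occur together. Counting co-occurrence in a fixed token window, skipping non-alphabetic tokens, and the toy token list are all assumptions for the sketch; word-sketch style tools count co-occurrence within grammatical relations instead.

```python
import math
from collections import Counter


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def collocates(tokens: list[str], node: str, window: int = 2) -> list[tuple[str, float]]:
    """Rank words co-occurring with `node` within +/- `window` tokens by logDice."""
    freq = Counter(tokens)
    pair: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the node itself and punctuation tokens.
            if j != i and tokens[j].isalpha():
                pair[tokens[j]] += 1
    scores = [(w, log_dice(f, freq[node], freq[w])) for w, f in pair.items()]
    return sorted(scores, key=lambda p: p[1], reverse=True)


# Hypothetical toy corpus.
tokens = "strong tea . strong tea . powerful car . strong coffee . powerful engine . strong tea".split()
ranked = collocates(tokens, "strong", window=1)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
# tea     13.78
# coffee  12.68
```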
Lexicography: finding facts about words
Shopping list
– collocations: Yes
– grammatical patterns: Yes
– examples: Yes
– synonyms: Yes
– labels: Yes
   – region: Yes
   – domain: Yes
   – register: Yes
– translations: ?
– meanings: No
Thank you
http://www.sketchengine.co.uk