Do we need lexicographers? Prospects for automatic lexicography
Adam Kilgarriff
Lexical Computing Ltd
University of Leeds
UK
Bolzano, May 2012 Adam Kilgarriff 2
Outline
– Precision and recall
– Between corpus and dictionary
– Shopping list
– Conclusions
Find me all the fat cats
a request for information
High recall
– Lots of responses
– Maybe not all good
High precision
– Fewer hits
– Higher confidence
Information-seeking
            Recall   Precision
Computers   good     bad
People      bad      good
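The trade-off on this slide can be made concrete with a small worked example. This is a minimal sketch, not taken from the talk: the function name `precision_recall` and the toy item names are hypothetical, and it assumes the standard definitions (precision = correct hits / all hits returned, recall = correct hits / all relevant items).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: all the fat cats that exist.
fat_cats = {"fat_cat_1", "fat_cat_2", "fat_cat_3", "fat_cat_4"}

# Hypothetical "computer" answer: finds every fat cat, plus a lot of noise.
computer_hits = fat_cats | {"thin_cat_1", "thin_cat_2", "fat_dog_1", "fat_dog_2"}

# Hypothetical "person" answer: only sure cases, but misses half of them.
person_hits = {"fat_cat_1", "fat_cat_2"}

computer_scores = precision_recall(computer_hits, fat_cats)
person_scores = precision_recall(person_hits, fat_cats)
print("computer:", computer_scores)
print("person:  ", person_scores)
```

In this toy data the "computer" finds every fat cat but half of its hits are wrong, while the "person" returns only correct hits but misses half of the fat cats.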
Cyborg: part-human, part-computer
– Treat your computer with respect. You and it can do great things together.
Lexicography: finding facts about words
Shopping list
– collocations
– grammatical patterns
– examples
– synonyms
– labels
  – region
  – domain
  – register
– translations
– meanings
Szeged, Jan 2008 Kilgarriff, Global WordNet 9
What is a word sense (1)
– SFIP
  – Sufficiently frequent, insufficiently predictable
– (a glass of) whisky x (a glass of) tequila
What is a word sense (2)
homonymy
analogy polysemy rules
collocation
What is a word sense (3)
– A cluster
  – Of instances of use
    – Operationalised as: corpus lines
  – Clustered by lexicographers
– Makes sense of
  – Overlapping senses
  – Different dictionaries, different senses
  – Lumping and splitting
I don’t believe in word senses
– Believe in:
  – resurrection, ghost, witch, vampire, god, miracle, fairy
– Philosophy:
  – Ontological commitment
  – (same meaning, different register)
– “good entities to build belief systems on”
But I’m an NLP person
– Automatic clustering?
– Inspiration:
  – Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
  – You can get semantic sense from corpora + stats
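The claim that corpus statistics carry semantic information can be illustrated with a tiny distributional-similarity sketch. This is a generic illustration under stated assumptions, not the method of any of the papers cited above: the toy corpus, the function names `context_vector` and `cosine`, and the window size are all hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vec[tokens[j]] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus, already tokenised.
tokens = (
    "drink a glass of whisky . "
    "drink a glass of tequila . "
    "read a page of prose ."
).split()

whisky = context_vector(tokens, "whisky")
tequila = context_vector(tokens, "tequila")
prose = context_vector(tokens, "prose")

sim_drinks = cosine(whisky, tequila)
sim_mixed = cosine(whisky, prose)
print("whisky ~ tequila:", round(sim_drinks, 3))
print("whisky ~ prose:  ", round(sim_mixed, 3))
```

Words that share contexts ("a glass of ...") come out as more similar than words that do not, which is the intuition behind grouping words, or corpus lines, by their contexts. A real system would need a large corpus, grammatical relations, and a proper weighting scheme.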
First attempt
– Longman 1994
– Abject failure
  – No grammar
  – Corpus too small and noisy
  – Naïve clustering
  – Useless programmer
Second attempt
– SENSEVALS 1998, 2001, 2004…
– Mitigated failure
  – Rarely over two thirds correct
Third attempt
– SADD (semi-automatic dictionary drafting) 2008
– With Pavel Rychly
– I thought I knew what I was doing but
  – Probably a failure
Collocations
– Easy
  – Most words don’t go with most other words
– Then build on what we can do well (metaphor, analogy, homonymy, rules: all much harder)
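Why collocations are the easy part can be sketched in a few lines: count which words occur next to a node word and rank them with an association score. The slide does not name a statistic, so the choice of logDice here (14 + log2 of the Dice coefficient) is an assumption, as are the function names `log_dice` and `collocates`, the adjacent-word window, and the toy corpus.

```python
import math
from collections import Counter


def log_dice(pair_freq, freq_x, freq_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * pair_freq / (freq_x + freq_y))


def collocates(tokens, node, window=1):
    """Rank words found within `window` tokens of `node` by logDice."""
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_freq[tokens[j]] += 1
    scores = {
        word: log_dice(freq, word_freq[node], word_freq[word])
        for word, freq in pair_freq.items()
    }
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus, already tokenised.
tokens = (
    "fat cat sat on the mat the fat cat ate "
    "the fat rat the thin cat slept"
).split()

ranked = collocates(tokens, "cat")
for word, score in ranked:
    print(f"{word:6s} {score:.2f}")
```

Most words in the corpus never appear next to "cat" at all, so they receive no score; the few that do can be ranked and handed to a lexicographer to check.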
Lexicography: finding facts about words
Shopping list
– collocations: Yes
– grammatical patterns: Yes
– examples: Yes
– synonyms: Yes
– labels: Yes
  – region: Yes
  – domain: Yes
  – register: Yes
– translations: ?
– meanings: No
Thank you
http://www.sketchengine.co.uk