Labels: automation
Adam Kilgarriff
Auckland 2012 Kilgarriff / Labels: automation 2
Which words are:
Most distinctive of business English?
Most often in plural? (e.g. for English nouns)
Most often used in gerund? (e.g. for Spanish verbs)
Common issue for lexicographers
Ordinary cases: no need to say anything in the dictionary
Extreme cases (“most X”): needs saying
Not hard in principle
Given the right corpus
For each word:
Count, under condition 1 (e.g. plural instances)
Count, under condition 2 (e.g. all instances)
Compute ratio
Sort all words according to ratio
Words at top of list are most X
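The counting-and-ranking recipe above can be sketched in a few lines of Python. This is only an illustration on a toy, in-memory list of (lemma, tag) pairs, not how Sketch Engine does it; the function name `rank_by_ratio`, the `min_count` cutoff and the toy data are assumptions made for the example.

```python
from collections import Counter

def rank_by_ratio(tokens, condition, min_count=1):
    """Rank lemmas by (count under condition 1) / (count of all instances).

    tokens: iterable of (lemma, tag) pairs (hypothetical toy corpus format).
    condition: function taking a tag and returning True for condition 1.
    min_count: ignore lemmas with fewer total instances than this.
    """
    total = Counter()     # condition 2: all instances of the lemma
    matching = Counter()  # condition 1: e.g. plural instances only
    for lemma, tag in tokens:
        total[lemma] += 1
        if condition(tag):
            matching[lemma] += 1
    ratios = {
        lemma: matching[lemma] / n
        for lemma, n in total.items()
        if n >= min_count
    }
    # Highest ratio first: the words at the top are "most X".
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for illustration only.
toy = [
    ("scissors", "NN2"), ("scissors", "NN2"),
    ("dog", "NN1"), ("dog", "NN2"), ("dog", "NN1"),
    ("information", "NN1"),
]
ranking = rank_by_ratio(toy, lambda tag: tag == "NN2")
for lemma, ratio in ranking:
    print(f"{lemma}\t{ratio:.2f}")
```

On a real corpus a frequency threshold such as `min_count` matters, because a word seen once in the plural would otherwise score a perfect ratio.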
In practice
Programming task
Big corpora: big and slow
Slightly different each time
Very rarely done (except Keywords in WordSmith)
Now: automated in Sketch Engine (demo)
FindX specification file: type 1
=plural
    name
Q1 [lempos="%s" & tag="NN2"]
Q2 [lempos="%s" & tag="NN[10]"]
    the two queries to compare frequencies for
    lempos="%s" means the list we want is a list of lempos (lemma + pos)
RE ^[a-z]+-n$
    items matching this RegExp only (here: all-lower-case nouns)
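To make the roles of `%s` and `RE` concrete, here is a minimal Python sketch of how a candidate lempos might be filtered by the regular expression and then substituted into the two query templates. It only builds the query strings; it does not run them against a corpus. The helper name `build_queries` and the candidate list are assumptions for illustration, and the way FindX itself performs the substitution is not shown in the slides.

```python
import re

# Query templates and item filter copied from the type 1 specification.
Q1 = '[lempos="%s" & tag="NN2"]'
Q2 = '[lempos="%s" & tag="NN[10]"]'
ITEM_RE = re.compile(r"^[a-z]+-n$")  # all-lower-case nouns

def build_queries(lempos):
    """Return the (Q1, Q2) query pair for one lempos, or None if filtered out.

    Hypothetical helper: the %s placeholder is filled with the lempos.
    """
    if not ITEM_RE.search(lempos):
        return None
    return Q1 % lempos, Q2 % lempos

# Invented candidate items for illustration.
candidates = ["dog-n", "London-n", "run-v"]
queries = {c: build_queries(c) for c in candidates}
for item, pair in queries.items():
    print(item, "->", pair)
```

Only `dog-n` passes the filter here: `London-n` contains an upper-case letter and `run-v` is not a noun.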
FindX specification file: type 2
=passive
    name
HR passives
    human-readable name (optional)
WS passive
    use the word sketch relation ‘passive’
RE -v$
    only for items matching this RegExp (here: only verbs) (optional)
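As a rough illustration of how the two kinds of entry differ, the sketch below reads specification entries into dictionaries. The exact file layout is an assumption: it supposes one field per line, a line beginning with `=` starting a new entry, and every other line consisting of a key (`Q1`, `Q2`, `HR`, `WS`, `RE`) followed by its value. The function `parse_spec` is hypothetical and is not part of Sketch Engine.

```python
def parse_spec(text):
    """Parse FindX-style entries into a list of dicts (assumed layout)."""
    specs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("="):
            # "=name" opens a new entry.
            current = {"name": line[1:].strip()}
            specs.append(current)
        elif current is not None:
            # "KEY value": split on the first space only.
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return specs

sample = """\
=plural
Q1 [lempos="%s" & tag="NN2"]
Q2 [lempos="%s" & tag="NN[10]"]
RE ^[a-z]+-n$

=passive
HR passives
WS passive
RE -v$
"""

specs = parse_spec(sample)
for spec in specs:
    # Type 1 entries compare two queries; type 2 entries use a word sketch relation.
    kind = "type 2 (word sketch)" if "WS" in spec else "type 1 (two queries)"
    print(spec["name"], "->", kind)
```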