Transcript
Page 1: ComPair: compare and visualise the usage of language, David Beavan, University of Glasgow, DH2011

ComPair: Compare and Visualise the Usage of Language

David Beavan, University of Glasgow, @DavidBeavan

Page 2:

‘You shall know a word by the company it keeps’

Firth, John R., 1957. Modes of meaning. Oxford: Oxford University Press.

Page 3:

Collocation

•  Words which go together
•  More than by chance, they show an association

•  Take a corpus
•  Search for a term (node word)
•  Examine words in a window (e.g. 5) either side of node
•  Aggregate these co-occurring words
•  Rank (e.g. by frequency or collocational strength)
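
The steps above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the corpus is already a flat list of lower-cased tokens; the function name `collocates` and the toy sentence are invented for the example and are not taken from the talk or from the BNC tools it describes.

```python
from collections import Counter

def collocates(tokens, node, window=5):
    """Count words found within `window` tokens either side of each hit of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Words to the left and right of this occurrence of the node word
        counts.update(tokens[max(0, i - window):i])
        counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Toy "corpus" (hypothetical data, for illustration only)
tokens = "strong tea and strong coffee but powerful computer and strong tea".split()
counts = collocates(tokens, "strong", window=1)

# Rank by raw frequency; ranking by collocational strength would need a
# measure such as Mutual Information instead.
ranked = counts.most_common()
```

With a window of 1, only the immediate neighbours of "strong" are aggregated; widening the window to 5 would pull in more distant words at the cost of more noise.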

Page 4:

‘Stanford’ collocate search via Davies, Mark. (2004-) BYU-BNC: The British National Corpus. Available online at http://corpus.byu.edu/bnc.

Collocates

Page 5:

Collocate Cloud

‘Stanford’ search via Beavan, David. (2008-) BNC Collocate Cloud. Available online at http://www.scottishcorpus.ac.uk/corpus/bnc/collocatecloud.php

Page 6:

Collocate Cloud properties

•  100 most frequent collocates listed alphabetically
•  Font size shows frequency of word
•  Brightness shows collocational strength of word
•  Interactively create new clouds

•  Best New Idea for Improving a Current Web-Based Tool, 2008 TADA Research Evaluation eXchange (T-REX)
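
One way to turn the cloud properties listed above into display values is sketched below. It is an assumption-laden illustration, not the Collocate Cloud's actual code: the linear scaling, the point-size range, the function name `cloud_entries` and the sample figures are all hypothetical.

```python
def cloud_entries(collocates, top_n=100, min_pt=10, max_pt=40):
    """Map {word: (frequency, strength)} to (word, font size, brightness) tuples.

    The top_n most frequent words are kept and returned alphabetically.
    Font size scales with frequency, brightness (0..1) with strength.
    """
    top = sorted(collocates.items(), key=lambda kv: kv[1][0], reverse=True)[:top_n]
    if not top:
        return []
    freqs = [f for _, (f, _) in top]
    strengths = [s for _, (_, s) in top]

    def scale(value, lo, hi):
        # Linear rescale to 0..1 (an assumed choice; a log scale is also plausible)
        return 0.5 if hi == lo else (value - lo) / (hi - lo)

    entries = []
    for word, (freq, strength) in sorted(top, key=lambda kv: kv[0]):
        size = min_pt + scale(freq, min(freqs), max(freqs)) * (max_pt - min_pt)
        bright = scale(strength, min(strengths), max(strengths))
        entries.append((word, round(size), round(bright, 2)))
    return entries

# Invented figures, for illustration only
sample = {"university": (120, 6.0), "california": (40, 8.0), "professor": (80, 4.0)}
entries = cloud_entries(sample)
```

Here a rare but strongly associated collocate comes out small and bright, while a frequent but weakly associated one comes out large and dim, which is the contrast the two visual channels are meant to expose.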

Page 7:

Comparison

•  Investigate and compare word usage
   –  Expose attitudes and cultures
   –  Investigate degrees of synonymy

•  Semantic prosody
   –  How synonymous words can actually take on positive or negative connotations

•  Applications for language learning
   –  Examine real-world usage of words

Page 8:
Page 9:
Page 10:
Page 11:

ComPair properties

•  Visualise usage of two node words
•  Distribute 150+ collocates on a continuum
•  Colour shows attraction to node
•  Brightness shows degree of collocational attraction

•  Currently uses British National Corpus
•  Can be applied to any corpus or dataset (in progress)

Page 12:

ComPair how-to

•  Take two collocate word lists
   –  Same corpus, different node words
   –  Different corpora, same node word

•  Calculate collocational strength towards each node
   –  Mutual Information etc.

•  Place collocates on continuum between node words
   –  Those with attraction to a single node appear near that node
   –  Those with little attraction to either node appear central and dim
   –  Those with attraction to both nodes appear central and bright
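
The how-to above can be sketched as follows. The slides do not give ComPair's formulas, so everything here is an assumption: `mutual_information` uses the common pointwise form log2(f_pair × N / (f_node × f_collocate)), and `place` is one hypothetical rule that reproduces the three behaviours listed (position 0.0 is the first node, 1.0 the second).

```python
import math

def mutual_information(f_pair, f_node, f_collocate, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence frequency."""
    return math.log2(f_pair * corpus_size / (f_node * f_collocate))

def place(strength_a, strength_b, max_strength):
    """Return (position, brightness) for one collocate between nodes A and B.

    position: 0.0 = at node A, 1.0 = at node B, 0.5 = centre.
    brightness: 0..1, driven by the stronger of the two attractions.
    Assumes max_strength > 0; negative strengths count as no attraction.
    """
    a = max(strength_a, 0.0)
    b = max(strength_b, 0.0)
    total = a + b
    position = 0.5 if total == 0 else b / total
    brightness = min(max(a, b) / max_strength, 1.0)
    return position, brightness

# Invented counts: the pair co-occurs four times more often than chance predicts
mi = mutual_information(f_pair=32, f_node=100, f_collocate=80, corpus_size=1000)

single = place(6.0, 0.0, max_strength=8.0)   # attracted to node A only
neither = place(0.5, 0.5, max_strength=8.0)  # little attraction to either node
both = place(6.0, 6.0, max_strength=8.0)     # attracted to both nodes
```

Under this rule a collocate attracted to one node sits at that node's end, one with weak attraction to both sits in the centre and dim, and one with strong attraction to both sits in the centre and bright. Other strength measures could be substituted for Mutual Information without changing the placement step.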

Page 13:

ComPair: http://www.scottishcorpus.ac.uk/corpus/bnc/compair.php

David Beavan, University of Glasgow, @DavidBeavan

