Page 1: Examining the GloWbe corpus for lexicographic evidence in ...aelinco.blogs.uva.es/files/2015/03/Vincent-Ooi.pdf · Davies M, and R Fuchs. 2015. Expanding horizons in the study of

Examining the GloWbe corpus for lexicographic evidence in SgE, MyE and Hk English

Vincent B Y Ooi

National University of Singapore

Outline

A. Using the Internet as a lexicographic resource

B. Summary and application of Davies and Fuchs’ (2015) findings to Singapore, Malaysia and Hong Kong

C. Findings beyond Davies and Fuchs: Singapore, Malaysian and Hong Kong English

D. Evaluation of the GloWbe corpus for lexicographic evidence

A. Using the Internet as a lexicographic resource

- (Fuertes-Olivera 2012) “I am using the concept of corpus in a lexicographical way, i.e., a lexicographical corpus is any collection of texts where lexicographers can find inspiration for completing the dictionary structures they need when making a real dictionary… I will focus on ways of exploiting and exploring the Internet as a lexicographical corpus, i.e., the virtual space in which lexicographers can easily access data they might need.”

A. Sinclair (2004) On the Web…

- “The World Wide Web is not a corpus, because its dimensions are unknown and constantly changing, and because it has not been designed from a linguistic perspective. At present it is quite mysterious, because the search engines, through which the retrieval programs operate, are all different, none of them are comprehensive, and it is not at all clear what population is being sampled.”

A. Sinclair, on curating the Web

- “It is important to know precisely what is actually copied or downloaded from a web page. This is not always obvious, and quite often it is not at all the document that is required…The cheerful anarchy of the Web thus places a burden of care on a user, and slows down the process of corpus building. The organisation and discipline has to be put in by the corpus builder.”

B. English World-Wide 36:1 (Feb 2015)

- “The Internet as a lexicographical resource” can be reified, especially for those interested in varieties of English, in the 1.9 billion-word Global Web-based English Corpus (GloWbe)
- Davies and Fuchs (2015), in the February issue of English World-Wide
- Responses to D&F by:
  i) Christian Mair, for Nigerian English etc.
  ii) Joybrato Mukherjee, for South Asian English
  iii) Pam Peters, for Australian English etc.
  iv) Gerald Nelson, for the ICE corpus

B. Davies and Fuchs on the ICE corpus

- ICE corpus
  - 1m words each (600,000 spoken; 400,000 written)
  - 14 varieties of English (all of which GloWbe also covers), a total of merely 12.2 million words
- However, ICE is limited in size
  - Enough data for high-frequency syntactic constructions only
  - Not so useful for lexical variation, which needs more data examples (for lexicographic evidence)
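
The size argument can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes a hypothetical lexical item occurring at a uniform rate of 0.5 per million words (real items are spread far less evenly across varieties and texts), and it uses the corpus sizes quoted on these slides.

```python
def expected_hits(rate_per_million: float, corpus_size: int) -> float:
    """Expected number of occurrences of an item in a corpus of the given size."""
    return rate_per_million * corpus_size / 1_000_000


# Corpus sizes in words, as quoted on the slides.
CORPORA = {
    "one ICE component": 1_000_000,
    "ICE total (as quoted)": 12_200_000,
    "GloWbe": 1_900_000_000,
}

# Hypothetical rate, chosen for illustration only.
RATE_PER_MILLION = 0.5

for name, size in CORPORA.items():
    hits = expected_hits(RATE_PER_MILLION, size)
    print(f"{name}: about {hits:.1f} expected occurrences")
```

Under that assumption, the item would be expected less than once in a single ICE component, which is too little for lexicographic evidence, but several hundred times in GloWbe.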

B. Motivation for GloWbe

- Need for a larger corpus to study World Englishes
- The GloWbe corpus
  - 1.9b words
  - 20 different countries (6 inner circle, 14 outer circle)
  - Notice that Expanding Circle countries (e.g. Japan, China, Korea) are excluded
  - Strength: compare the frequency of a word, phrase or grammatical construction across these different varieties of English → mapping different varieties of English
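
Since the country sections are not the same size, raw counts cannot be compared directly across varieties; the usual remedy is to normalise each count to a rate per million words. A minimal sketch follows, in which the section sizes and hit counts are hypothetical placeholders and not GloWbe figures.

```python
def per_million(hits: int, corpus_size: int) -> float:
    """Return the frequency of an item per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size


# Hypothetical section sizes (in words) and raw hit counts for one item.
section_sizes = {"SG": 40_000_000, "MY": 40_000_000, "HK": 40_000_000, "GB": 400_000_000}
raw_hits = {"SG": 120, "MY": 80, "HK": 4, "GB": 40}

normalized = {
    code: per_million(raw_hits[code], section_sizes[code])
    for code in section_sizes
}

# Rank varieties by normalised frequency, highest first.
for code, rate in sorted(normalized.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {rate:.2f} per million words")
```

In this made-up example the GB section has ten times as many raw hits as the HK section, yet the two end up with the same normalised rate because the GB section is ten times larger.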

B. D&F on data collection

- Genre balance
  - Between formal & informal language (like ICE corpora)
  - ~60% from informal blogs, ~40% from other formal genres & text types
- Accuracy in identifying dialect
  - Google “Advanced Search”, with the search limited by region (← we’ll revisit this later)

B. Size by country (VO – note the uneven sizes)

B. D&F – Lexical variation freak out

B. D&F – Concordance for freak out
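
For readers unfamiliar with concordance displays, the idea can be sketched as a simple keyword-in-context (KWIC) routine. This is an illustrative toy and not the GloWbe interface; the sample sentences are invented.

```python
import re


def kwic(text: str, phrase: str, width: int = 30) -> list:
    """Return one keyword-in-context line per match of `phrase` in `text`."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, for illustration only.
sample = (
    "I did freak out when the results came in. "
    "My parents told me not to freak out over small things."
)

for line in kwic(sample, "freak out"):
    print(line)
```

Aligning the search phrase in a central column makes it easy to scan the words that habitually occur to its left and right.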

B. D&F – Lexical variation fortnight

- More British English than U.S. English

B. D&F – Lexical variation banjaxed

- Irish English (‘ruined’, ‘screwed up’)

B. D&F – Lexical variation eve teas

- “Public sexual harassment”

B. D&F – Lexical variation handphone

- Mobile / cell phone

B. D&F lexical variation: equipments

B. D&F lexical variation: equipments

B. D&F phraseology [keep in] view

B. D&F phraseology [discuss] about

B. D&F (be) different to

B. D&F (be) different than

B. D&F – had + {gotten/got}

B. D&F – the quotative “like” construction (May Wong on HkE)

B. Singular/plural agreement: Each of them is/are (“innovative” plural)

B. The “way” construction (not typically HkE – May Wong)

C. Singapore: killer litter

C. Dictionary entry for killer litter (this is not Singlish)

- killer litter /…./ noun (uncount; Singapore and Malaysian English)
- Killer litter is something heavy, e.g. a television, that is disposed of by being thrown from the higher storeys of a building, putting passers-by below at risk of injury: The throwing of killer litter is irresponsible and highly dangerous.

C. GloWbe: killer litter

C. GloWbe concordance: killer litter

C. Google Advanced search

C. Google adv search: killer litter (Au)

C. GloWbe: lepak (Malaysian and Singapore English; not in HkE)

C. Concordance of lepak (MyE; SgE)

C. Google adv search: lepak (MyE)

C. Oxforddictionaries.com: lepak

C. GloWbe: shroff

C. Concordance for shroff

C. shroff in oxforddictionaries.com

C. Dictionary defn for shroff/shroffing

C. Measuring diglossia – kiasu, the most prototypical “Singlish” item

Measuring diglossia – kiasu for Sri Lanka?!

(TCEED2 Appendix entry for kiasu)

- kiasu / … /: adjective
- (of a person) afraid to lose out.

kiasu in Oxforddictionaries.com

D. Evaluating GloWbe for lexicographic evidence

- What does GloWbe represent? “Whatever is found on the web…[so] it may include very little from certain genres, such as students’ academic writing, fiction and business letters.” (D&F responding to Nelson)
- “Blogs are not the same as spontaneous spoken conversation” (D&F). This may pose an issue for capturing informal/colloquial Malaysian, Singapore and Hong Kong English (“Singlish”, for instance, is inherently spoken in nature). Still, GloWbe is remarkable in capturing quite a number of the sociocultural features characteristic of the informal varieties, e.g. kiasu, the most prototypical Singlish item.

D. Evaluating GloWbe for lexicographic evidence

- Mair asks whether blogs constitute a recognizable genre in the first place.
- While this is a fair question, the 60% proportion of blogs may mean that everyday topics and everyday values are represented in the personal blog. However, D&F haven’t disclosed the proportion of different types of blogs (e.g. travel blogs). There’s also the question of ‘blog death’, so it would be good to know how the sampling is done.

D. Evaluating GloWbe for lexicographic evidence

- Gerry Nelson and J Mukherjee suggest that some writers from a particular country domain may not actually be from the country in question. D&F say that they provide the original URLs for each of the 1.8 million pages. Users may want to examine the original pages in doubtful cases.
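
One crude way to decide which original pages to examine first is to compare each page's country-code domain with the country it has been assigned to. The sketch below is an assumption-laden illustration: the (URL, country) pairs are hypothetical, the slides do not describe how GloWbe stores its URL list, and a country-code domain says where a site is registered and not where its writer comes from, so the output is only a reading list for manual checking.

```python
from typing import Optional
from urllib.parse import urlparse


def country_code_tld(url: str) -> Optional[str]:
    """Return the final two-letter domain label of a URL (e.g. 'sg'), else None."""
    host = urlparse(url).hostname
    if not host or "." not in host:
        return None
    last_label = host.rsplit(".", 1)[-1].lower()
    if len(last_label) == 2 and last_label.isalpha():
        return last_label
    return None


def needs_manual_check(url: str, assigned_country: str) -> bool:
    """True when the URL's country-code domain does not confirm the assigned country."""
    return country_code_tld(url) != assigned_country.lower()


# Hypothetical (url, assigned country code) pairs, for illustration only.
pages = [
    ("http://blog.example.com.sg/post/1", "sg"),
    ("http://www.example.com/travel/singapore", "sg"),
    ("http://forum.example.org.my/thread/42", "sg"),
]

to_review = [url for url, country in pages if needs_manual_check(url, country)]
print(to_review)
```

Pages on generic domains such as .com are flagged too, since the domain alone neither confirms nor contradicts the assigned country.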

D. Evaluating GloWbe for lexicographic evidence

- In conclusion, GloWbe is a welcome additional “toolbox” for researchers of world Englishes. It should be triangulated with the ICE corpus and other available sources of data.
- GloWbe still allows us to confirm many of our intuitions and provisional findings on varieties of English. For Stephanie Horch (Mair’s student), “GloWbe is the best source of data: free, fast, vast.”

D. Evaluating GloWbe for lexicographic evidence

- Disadvantages
  - No actual spoken material
  - Each website is assigned to a particular country, but the speaker’s origin is not checked
- Davies & Fuchs encourage us to use the various corpora available in a combinational & complementary way (my emphasis!)

References

- Bolton, K. 2003. Chinese Englishes: A Sociolinguistic History. Cambridge: Cambridge University Press.
- Davies, M. and R. Fuchs. 2015. Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English corpus (GloWbe). In English World-Wide 36:1, pp. 1-29.
- Fuertes-Olivera, P. 2012. Lexicography and the Internet as a (Re-)source. In Lexicographica 28:1.
- Kilgarriff, A. and G. Grefenstette. 2003. Web as corpus. URL: http://www.kilgarriff.co.uk/Publications/2003-KilgGrefenstette-WACIntro.pdf
- Sinclair, J. 2004a. Corpus and text – basic principles. In Developing Linguistic Corpora: A Guide to Good Practice. URL: http://www.ahds.ac.uk/creating/guides/linguistic-corpora/chapter1.htm
- Sinclair, J. 2004b. Appendix – how to build a corpus. URL: http://www.ahds.ac.uk/creating/guides/linguistic-corpora/appendix.htm

Thank You for your kind attention!