corpus linguistics in sociolinguistics

Upload: ken-harvey-cenas-famor

Post on 11-Dec-2015

DESCRIPTION

This is my report in sociolinguistics for my MA Class in Ateneo de Davao University

TRANSCRIPT

Page 1: Corpus Linguistics in Sociolinguistics

CORPUS LINGUISTICS IN SOCIOLINGUISTICS

Ken Harvey C. Famor

Page 2: Corpus Linguistics in Sociolinguistics

INTRODUCTION

This chapter outlines a range of ways that methodological techniques from the field of corpus linguistics can be used in order to conduct sociolinguistic analyses.

Corpus linguistics is a method of analysis that relies on large collections of naturally occurring data, stored in electronic form, which can be analyzed with the help of computer software.

Page 3: Corpus Linguistics in Sociolinguistics

CORPUS LINGUISTICS

The word corpus is Latin for “body,” and in linguistics a corpus is a body (or large collection) of texts, carefully chosen so that they are representative of a particular language or language variety.

Corpora can consist of millions or even billions of words.

The texts in a corpus are stored in electronic form and are analyzed with specialist computer software, which can count and perform calculations on very large amounts of language data quickly and consistently.

Page 4: Corpus Linguistics in Sociolinguistics

CORPUS LINGUISTICS

A frequency list can be useful when comparing the frequencies of different words.
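As a rough illustration of what a frequency list is, the sketch below counts word tokens in a short invented sample. The `tokenize` and `frequency_list` helpers and the sample sentence are assumptions made for this example; real corpus tools apply more careful tokenization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(text):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokenize(text)).most_common()

# Invented sample text, used only to show the shape of the output.
sample = "Well, yes, I was there. Yes, I was."
freq = frequency_list(sample)

for word, count in freq:
    print(f"{word}\t{count}")
```

Sorting by count makes it easy to put two words side by side and see which one is used more often.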

Corpus programs can compare two frequency lists and then identify all of the words that are significantly more frequent, in statistical terms, in one list than in the other. These “significant” words are referred to as keywords in corpus linguistics, and are helpful because they can direct researchers to aspects of language in a corpus that they may not have realized were especially important.

Page 5: Corpus Linguistics in Sociolinguistics

CORPUS LINGUISTICS

Another aspect of frequency is collocation.

A collocate is a word which frequently occurs next to or near another word.

An analysis of collocates helps to reveal something about how words acquire meanings.

Example: elderly collocates with ill, infirm, and disabled.
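A minimal sketch of how collocates can be counted, assuming a simple window of a few tokens on either side of the search word. The `collocates` function, the window size, and the sample sentence are invented for this example. Corpus tools typically go further and rank collocates with an association statistic rather than raw counts.

```python
from collections import Counter

def collocates(tokens, node, window=3):
    """Count the words that occur within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            # Words to the left and right of the node, excluding the node itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sentence, used only to illustrate the counting.
tokens = "the elderly and infirm were helped while the elderly and disabled waited".split()
near_elderly = collocates(tokens, "elderly", window=2)
print(near_elderly.most_common())
```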

Page 6: Corpus Linguistics in Sociolinguistics

CORPUS LINGUISTICS

Concordance analysis: looking at a table that shows every occurrence of a word or phrase in a corpus, with a few words on either side. This makes it easier to spot similar uses of the search word.
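A concordance table can be sketched in a few lines. The `concordance` and `format_lines` helpers, the context width, and the sample sentence below are assumptions for illustration, not the behaviour of any particular corpus tool.

```python
def concordance(tokens, node, width=4):
    """Return (left context, keyword, right context) for every occurrence of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def format_lines(lines):
    """Right-align the left context so that the keywords line up in one column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  [{keyword}]  {right}" for left, keyword, right in lines]

# Invented sentence, used only to show the layout.
tokens = "I am sorry about that and sorry for the delay".split()
kwic = concordance(tokens, "sorry", width=2)
for line in format_lines(kwic):
    print(line)
```

Lining up the search word in a single column is what makes repeated patterns on either side easy to see.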

Page 7: Corpus Linguistics in Sociolinguistics

CORPUS LINGUISTICS

One benefit of using a corpus approach is that it helps us to make claims that are:

1. based on actual language use

2. based on a lot of language use

Studies that use introspection can result in inaccurate claims due to various cognitive biases that all humans have.

Studies which use a small amount of data are more difficult to generalize from.

Page 8: Corpus Linguistics in Sociolinguistics

BUILDING A CORPUS

In building a corpus, it is often helpful to begin by thinking of the sorts of research questions that you want to answer, and then deciding what constraints can be placed on data collection.

We should also try to balance as many factors as we can when we carry out sampling. Sometimes the job of building a perfectly balanced corpus proves to be difficult, and so we have to make compromises, settling for what we can get rather than what we would like to have, and we may need to adjust our research questions to reflect that.

Page 9: Corpus Linguistics in Sociolinguistics

BUILDING A CORPUS

Once collected, corpus data then need to be saved electronically.

Page 10: Corpus Linguistics in Sociolinguistics

RESEARCH QUESTIONS

One type of research question that corpus linguists ask is based on testing an existing theory: “Does the age of a speaker affect the way that they apologize?”

This type of question belongs to corpus-based research.

Another approach is corpus-driven. It is exploratory in that the analyst may not really know in advance what they should be analyzing. Their analysis of the frequencies drives them toward certain lines of investigation: “What is distinctive about this corpus?”

Page 11: Corpus Linguistics in Sociolinguistics

COMPARING THE SPEECH OF YOUNGER AND OLDER ADULTS

Category                       Key in the 25-34 group   Key in the 60+ group
Codes and names                Tim                      [gap:name], [gap:address]
Fillers and discourse markers  Okay, erm, yea, right    Er, yes, aye, well, mm
Auxiliary verbs                's, is, 're, are, do     Was, were, had
Modal verbs                    Can, 'll, need           'd

Page 12: Corpus Linguistics in Sociolinguistics

QUAGMIRES AND TROUBLESHOOTING

Corpus linguistics methods might not always be the best option to use.

The problem of getting hold of corpus data in the first place.

Be careful not to overgeneralize about language differences based on one type of distinction.

How widely is a feature distributed across the speakers from the same social group?

How do we interpret information from corpus data?

Page 13: Corpus Linguistics in Sociolinguistics

TIPS

Go online to see if you can use an existing corpus or collection of transcribed texts, rather than building your own from scratch.

If you are tagging your own corpus, bear in mind that some corpus tools use characters like the asterisk *, the full stop, and brackets as special symbols to allow you to carry out complex searches, so try to avoid these characters as codes when transcribing your spoken data.

Don’t save the files as Word or PDF documents. Most corpus tools work best with files saved as plain text (.txt extension).

If you are collecting data from the Web, make sure you strip out the “noise.”
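What counts as “noise” depends on the source. The sketch below assumes it means HTML markup, scripts, and style rules, and removes only those using Python's standard `html.parser` module. The `TextExtractor` class and `strip_noise` function are names invented for this example. Menus, adverts, and repeated page furniture would need further, site-specific cleaning.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text and ignore the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside a script or style element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def strip_noise(html):
    """Return the text of an HTML page with tags, scripts, and styles removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Invented page, used only to show the effect.
page = ("<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
        "<body><p>Hello <b>world</b></p></body></html>")
print(strip_noise(page))
```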

Page 14: Corpus Linguistics in Sociolinguistics

TIPS

Read the Help documentation that often comes with online corpora or corpus tools.

Check to see whether your corpus tool is capable of uncovering the linguistic feature that you want to study.

Do not assume that a word/phrase is always used in a consistent way.

Sometimes concordance lines need to be expanded so you can see an entire utterance.

If you are comparing frequencies of a feature between two or more groups of speakers, bear in mind that the total number of words spoken by each group may not be the same.
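One common way to allow for unequal totals is to convert raw counts to a rate, such as occurrences per million words. The `per_million` helper and the figures below are invented for illustration.

```python
def per_million(count, total_words):
    """Convert a raw count into a frequency per million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * 1_000_000 / total_words

# Invented figures: group B has the higher raw count, but spoke twice as many words.
group_a = per_million(150, 500_000)
group_b = per_million(200, 1_000_000)
print(group_a, group_b)
```

With these invented numbers the group with the lower raw count turns out to use the feature at the higher rate, which is the comparison that matters.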

Page 15: Corpus Linguistics in Sociolinguistics

TIPS

Consider the dispersion of a linguistic feature. Even if it is very frequent in one group, is this because only one or two people use this feature a lot?
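A simple way to check dispersion is to count the feature separately for each speaker and see how much of the total comes from the heaviest user. The `dispersion` and `summarize` helpers, the speaker labels, and the utterances below are assumptions made for this sketch. Dedicated dispersion measures exist, but this is not one of them.

```python
from collections import Counter

def dispersion(utterances, feature):
    """Count occurrences of `feature` per speaker.

    `utterances` is a list of (speaker, tokens) pairs.
    """
    counts = Counter()
    for speaker, tokens in utterances:
        # Adding zero still registers speakers who never use the feature.
        counts[speaker] += sum(1 for token in tokens if token.lower() == feature)
    return counts

def summarize(counts):
    """Report how many speakers use the feature and the share of the top user."""
    total = sum(counts.values())
    users = sum(1 for n in counts.values() if n > 0)
    top_share = max(counts.values()) / total if total else 0.0
    return {"speakers": len(counts), "users": users, "top_share": top_share}

# Invented utterances from three hypothetical speakers.
utterances = [
    ("A", ["aye", "aye", "well", "aye"]),
    ("B", ["yes"]),
    ("C", ["aye"]),
]
per_speaker = dispersion(utterances, "aye")
print(per_speaker, summarize(per_speaker))
```

A `top_share` close to 1 suggests that the feature reflects one speaker's habit rather than the group as a whole.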

Page 16: Corpus Linguistics in Sociolinguistics

THANK YOU! =)