Corpus linguistics and language teaching
The next nexus?
Doug Biber
Northern Arizona University
Goals of the talk
• Introduce corpus linguistics
• Present case studies illustrating the surprising findings that emerge from corpus-based research
• Discuss the application of corpus research to classroom teaching and materials development
What is corpus linguistics?
• A research approach for describing language use:
How do speakers and writers actually use the vocabulary and grammar resources available in a language?
What is a corpus?
• A large, principled collection of ‘natural’ texts stored on computer
• A corpus should ‘represent’ particular language varieties or registers (e.g., conversation or university textbooks)
– Design is important: texts must be sampled from particular target registers
– Size is equally important: Some language features are rare but still have systematic patterns of use
Characteristics of corpus-based analysis (I)
• Relies on computer-assisted techniques
– Concordancers (‘KWIC’ displays = ‘Key Word In Context’)
– Computer programs: automatic (e.g., grammatical ‘taggers’) or interactive (to code grammatical variants)
Example of concordance output (from MonoConc)
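The concordance screenshot is not reproduced here. The sketch below shows the basic idea behind a KWIC display: every hit for a search word is printed with a fixed amount of left and right context, so the keyword lines up in one column. It is a minimal illustration using Python's standard `re` module and is not how MonoConc itself is implemented; the function name `kwic` and the `width` parameter are hypothetical.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one Key Word In Context line per whole-word match of `keyword`.

    Matching is case-insensitive. Each line shows up to `width` characters
    of left and right context, so the keyword lines up in a single column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Slice the context windows and flatten line breaks for display.
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}}  {match.group(0)}  {right}")
    return lines


if __name__ == "__main__":
    # Toy text: the first two sentences are adapted from the LDOCE examples
    # for "concerned"; the third is invented for illustration.
    sample = (
        "We reached an agreement with all concerned. "
        "I am concerned about how little I eat. "
        "The people concerned were not told."
    )
    for line in kwic(sample, "concerned", width=25):
        print(line)
```

Sorting these lines by the word to the right or left of the keyword is what lets an analyst spot recurring patterns at a glance.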
Characteristics of corpus-based analysis (II)
• Analyses are empirical
• Uses both quantitative and qualitative / interpretive techniques
• Meaningful analyses must be motivated by linguistic research questions (not simply by the availability of a corpus)
So what is corpus linguistics?
• A research approach
– A way of thinking about language
– Shines the spotlight on language use: registers and language for specific purposes
– Allows investigation of language choice: Why does a speaker use a particular word or grammatical form rather than alternatives?
– Allows investigation of meaning in context: why synonyms are usually not interchangeable
– Allows investigation of language preference: What forms are rare? What is especially common?
Corpus descriptions capture the complexities of actual use
– Language use is often systematic but complex
– Corpus-based studies can consider the range of relevant factors and the interactions among factors
– Corpus analysis describes the patterns of use, but it cannot directly determine how those findings are relevant for language learning
– That is, corpus analyses provide the basis for informed decisions by teachers – not necessarily the immediate content of our language teaching
Case studies
• Vocabulary
• Grammar
• Lexico-grammar
Corpus-based descriptions of vocabulary: Selected reference works
• Learner dictionaries based on corpora: Longman Dictionary of Contemporary English (LDOCE); Collins COBUILD English Dictionary
• Vocabulary textbooks based on corpora: McCarthy and O’Dell, Basic Vocabulary in Use; Thornbury, Natural Grammar
• Academic studies of collocation: Sinclair 1991; Partington 1998
Case studies on vocabulary
• Corpus-based dictionaries
• Collocation
• Semantic prosody
Case studies on vocabulary (1): Corpus-based dictionaries
• The order of meanings reflects use
e.g., LDOCE entry for concerned:
Meaning 1: ‘involved in something’ (reach an agreement with all concerned)
Meaning 2: ‘worried’ (concerned about how little I eat)
• Identifies common words and register differences
Words moderately common in speech (not writing; LDOCE): flood, hopefully, messy, potato, shave, underneath
Words moderately common in writing (not speech; LDOCE): focus, glance, moreover, pollution, scope, underlying
Case studies on vocabulary (2): Collocations
Synonyms: large, great, and big
For example:
Large: number(s), scale, proportion, amount (‘quantity’)
versus
Great: deal (of), importance, majority (‘impressive’)
(see Firth 1957; Sinclair 1991; Partington 1998; Biber, Conrad, Reppen 1998)
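Collocational preferences like these come from counting which words co-occur with a node word across a corpus. As a minimal sketch, the function below counts only the word immediately to the right of each occurrence of a node word; real collocation studies typically use wider windows and association measures. The function name and the toy text are invented for illustration and are not data from the studies cited.

```python
import re
from collections import Counter


def right_collocates(text: str, node: str) -> Counter:
    """Count the word immediately following each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        tokens[i + 1]
        for i, token in enumerate(tokens[:-1])
        if token == node
    )


if __name__ == "__main__":
    # Invented toy text; a real study would use a large, register-specific corpus.
    text = (
        "A large number of students attended. A large proportion of the data was lost. "
        "The project ran on a large scale. This is of great importance. "
        "The great majority agreed, and a great deal of work remains. "
        "A large amount of money was spent on a large number of projects."
    )
    for node in ("large", "great"):
        print(node, right_collocates(text, node).most_common())
```

On this toy text, large is followed by quantity nouns (number, proportion, scale, amount), while great is followed by importance, majority, and deal, mirroring the contrast on the slide.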
Case studies on vocabulary (3): Semantic prosody
Copular verbs that mean ‘become’:
turn black, red, white, pale
come alive, loose, true, unstuck
go crazy, mad, wrong, bad
(Longman Grammar of Spoken and Written English, 444-445)
(cf. Partington 1998)
Corpus-based studies of grammar
• Demonstrative pronouns: this versus that
• Word classes: nouns, verbs, pronouns
• Dependent clauses: that-clauses versus to-clauses
• (From the Longman Grammar of Spoken and Written English)
Case studies on grammar (1)
The grammar of individual words: Demonstrative pronouns this versus that
• The traditional description of the difference:
– This refers to a thing near the speaker
– That refers to something that is not near the speaker
The grammar of individual words (cont.): Demonstrative pronouns that versus this
[Figure: frequency per million words of the demonstrative pronouns that and this in conversation and academic writing]
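The chart reports frequencies per million words rather than raw counts, because sub-corpora such as conversation and academic writing differ in size. The normalization is simply rate = count × basis / total words. A minimal sketch follows; the function name and all counts are hypothetical. Note also that counting that as a demonstrative pronoun in practice requires a grammatically tagged corpus, since the same word form also works as a determiner, relativizer, and complementizer.

```python
def normalized_rate(count: int, corpus_words: int, basis: int = 1_000_000) -> float:
    """Return the frequency of a feature per `basis` words of running text.

    Multiplying before dividing keeps round results like the ones below exact.
    """
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return count * basis / corpus_words


if __name__ == "__main__":
    # Hypothetical counts: the same raw count in sub-corpora of different sizes.
    print(normalized_rate(500, 2_000_000))        # 250.0 per million words
    print(normalized_rate(500, 4_000_000))        # 125.0 per million words
    print(normalized_rate(30, 200, basis=1_000))  # 150.0 per 1,000 words
```

The same raw count can therefore mean very different rates of use, which is why register comparisons are always made on normalized figures.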
Demonstrative pronouns that versus this (cont.)
• Examples of that in conversation (vague or situational reference):
That was delicious.
A: I was, I was flat on my back. B: Uh, I can't sleep like that.
• Examples of this in academic writing (text deixis):
GAAP requires that a business use the accrual basis. This means that the accountant records revenues as they are earned…
Case studies on grammar (2)
The register distribution of grammatical classes:
Nouns, verbs, personal pronouns
Distribution of nouns, verbs, and pronouns across four registers
[Figure: frequency per 1,000 words of nouns, verbs, and personal pronouns in conversation, classroom teaching, textbooks, and academic prose]
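Comparisons like this one rest on a part-of-speech-tagged corpus: each word carries a grammatical tag, and tag counts are normalized per 1,000 words within each register. The sketch below assumes the text is already tagged; the three-letter tagset (N, V, PRO, X) and the function name are invented for illustration, and the auxiliary "are" is tagged as a verb for simplicity.

```python
from collections import Counter


def rates_per_thousand(
    tagged: list[tuple[str, str]],
    tags: tuple[str, ...] = ("N", "V", "PRO"),
) -> dict[str, float]:
    """Rate per 1,000 words for each tag in `tags`, given (word, tag) pairs."""
    if not tagged:
        raise ValueError("no tokens to count")
    counts = Counter(tag for _, tag in tagged)
    return {tag: counts[tag] * 1000 / len(tagged) for tag in tags}


if __name__ == "__main__":
    # Hypothetical hand-tagged snippets; tags: N = noun, V = verb,
    # PRO = personal pronoun, X = anything else.
    conversation = [("I", "PRO"), ("think", "V"), ("we", "PRO"),
                    ("picked", "V"), ("it", "PRO"), ("up", "X")]
    academic = [("The", "X"), ("accountant", "N"), ("records", "V"),
                ("revenues", "N"), ("as", "X"), ("they", "PRO"),
                ("are", "V"), ("earned", "V")]
    print("conversation", rates_per_thousand(conversation))
    print("academic", rates_per_thousand(academic))
```

Real studies compute these rates over whole registers of millions of words; two sentences only show the arithmetic.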
Case studies on grammar (3)
Syntactic features
Dependent clauses are common in writing but rare in speech?
Contrasting intuitions with actual use
That-clauses and to-clauses in conversation vs. academic prose
[Figure: frequencies of verb + that-clause, "extraposed" that-clause, verb + to-clause, and "extraposed" to-clause in conversation and academic prose]
• Verb + that-clause in conversation:
I know (that) I told you.
I think (that) we picked it up.
• Extraposed to-clauses in academic prose:
It is important to specify the states …
It is difficult to maintain a consistent level …
It is impossible to liquefy a gas …
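Extraposed to-clauses follow a recognizable surface frame: it + be + adjective + to-infinitive. As a rough sketch, a regular expression can retrieve candidates and the adjective that controls them. This is only an approximation using Python's `re` module: it also matches non-extraposed strings such as "It is going to rain", so a real study would check hits against a tagged or parsed corpus. The pattern and function name are hypothetical.

```python
import re

# Rough surface pattern: "it" + is/was + optional -ly adverb + one word + "to" + verb.
# Over-matches (e.g., "It is going to rain") and misses other forms of BE.
EXTRAPOSED_TO = re.compile(
    r"\bit\s+(?:is|was)\s+(?:\w+ly\s+)?(\w+)\s+to\s+\w+",
    re.IGNORECASE,
)


def extraposed_to_adjectives(text: str) -> list[str]:
    """Return the word in the adjective slot of each candidate match."""
    return [m.group(1).lower() for m in EXTRAPOSED_TO.finditer(text)]


if __name__ == "__main__":
    # The academic-prose fragments from the slide, left truncated as in the original.
    prose = (
        "It is important to specify the states … "
        "It is difficult to maintain a consistent level … "
        "It is impossible to liquefy a gas …"
    )
    print(extraposed_to_adjectives(prose))
```

Tallying the returned adjectives across a corpus shows which ones (important, difficult, impossible, and so on) most often control this construction.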
Corpus-based studies of lexico-grammar
Case studies from the Longman Grammar of Spoken and Written English:
– The grammatical ‘patterns’ of individual words: tell and promise
(cf. Hunston and Francis 2000; Thornbury 2004)
– Passive verbs: common and rare
– Common verbs with that-clauses in conversation
Case studies on lexico-grammar (1)
The grammar of words: tell versus promise
• Both verbs have identical valency patterns:
– They can occur as monotransitive verbs (with a direct object)
– or as ditransitive verbs (with a direct object and an indirect object)
Grammatical patterns for tell and promise in newspaper language
[Figure: frequencies of TELL and PROMISE by pattern: V + direct object, V + clause, V + indirect object + clause]
• Example of TELL in newspapers – expressing both the addressee AND the content of the message:
Cheney told [Navy Secretary H. Lawrence Garrett] [that he would cancel the $50 billion project] …
• Example of PROMISE in newspapers – expressing only the content of the promise:
The company promised [to donate about $500,000 to the cause] …
Case studies on lexico-grammar (2)
The words of grammar:
Verbs with passive voice
Verbs with passive voice
• Selected verbs that almost always occur with passive voice in academic prose (over 70% of the time):
– Verbs of scientific methodology: be analyzed, be calculated, be collected, be measured, be tested
– Example: Their occurrence is measured in a few parts per million.
– Verbs expressing logical relations and interpretations: be based (on), be associated (with), be attributed (to), be interpreted (as), be regarded (as)
– Example: Their presence must be regarded as especially undesirable.
Verbs with passive voice (2)
• Selected transitive verbs that almost never occur in the passive voice:
agree, guess, have, like, love, quit, reply, try, want, watch, wish, wonder
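Findings like "over 70% of the time" come from dividing a verb's passive occurrences by all of its occurrences. The sketch below approximates this with a crude surface heuristic: an occurrence counts as passive when the past participle directly follows a form of be, optionally with one -ly adverb in between. It ignores get-passives and cannot tell verbal from adjectival participles, so it illustrates the calculation rather than a reliable tagger. The function name and toy sentences are hypothetical.

```python
import re

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def passive_share(text: str, forms: set[str], participle: str) -> float:
    """Share of a verb's occurrences that look passive.

    `forms` lists every inflected form of the verb; an occurrence counts as
    passive when it is the past participle and directly follows a form of BE,
    optionally with one intervening -ly adverb ("are carefully analyzed").
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    total = passive = 0
    for i, token in enumerate(tokens):
        if token not in forms:
            continue
        total += 1
        if token != participle:
            continue
        prev = tokens[i - 1] if i >= 1 else ""
        prev2 = tokens[i - 2] if i >= 2 else ""
        if prev in BE_FORMS or (prev.endswith("ly") and prev2 in BE_FORMS):
            passive += 1
    return passive / total if total else 0.0


if __name__ == "__main__":
    # Invented toy sentences; real figures need a large corpus of academic prose.
    text = (
        "The samples were analyzed twice. The data are carefully analyzed before release. "
        "We analyze the results. They analyzed the errors."
    )
    share = passive_share(text, {"analyze", "analyzes", "analyzed", "analyzing"}, "analyzed")
    print(f"passive share: {share:.0%}")  # 2 of 4 occurrences -> 50%
    print("mostly passive (> 70%)?", share > 0.7)
```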
Case studies on lexico-grammar (3)
Verbs controlling that-clauses versus to-clauses
That-clauses and to-clauses in conversation vs. academic prose
[Figure: frequencies of verb + that-clause, "extraposed" that-clause, verb + to-clause, and "extraposed" to-clause in conversation and academic prose]
Verbs that control that-clauses
• Almost 200 verbs attested in the LSWE Corpus (e.g., feel, realize, hear, assume, suggest, ensure, indicate, imply, propose)
• Only 4 verbs are extremely common in conversation:
think, say, know, guess
Verbs controlling that-clauses in conversation
[Figure: frequencies of THINK, SAY, KNOW, GUESS, and all other verbs controlling that-clauses in conversation]
Applications of corpus-based research
Language for specific purposes
• Language use is mediated by register
• That is, notions like ‘common’, ‘rare’, and ‘typical’ are usually not meaningful for general English.
• Rather, language features and patterns are typical of particular registers.
• Case study of modal verbs in university registers
Modal verb classes across specialized university registers
[Figure: frequencies of possibility, necessity, and prediction modals in classroom teaching, classroom management, textbooks, and syllabi, etc.]
Why are there so many prediction modals in class management?
These usually serve (indirect) directive functions:
• I'd like you to review your quizzes
• I would encourage you to add this to your stack of materials
• and then assignment six will be due Tuesday
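A register profile of modal classes can be sketched by mapping modal forms to classes and normalizing counts per 1,000 tokens. The grouping below is a simplified assumption in the spirit of the slide's three classes (semi-modals such as have to or be going to are left out), and the function name is hypothetical. Contractions matter here: "I'd" contributes a prediction modal, although 'd can also stand for "had", which this sketch does not disambiguate.

```python
import re
from collections import Counter

# Assumed, simplified grouping of single-word modals into the slide's three classes.
# 'd and 'll are counted as prediction modals without checking for "had" readings.
MODAL_CLASSES = {
    "possibility": {"can", "could", "may", "might"},
    "necessity": {"must", "should"},
    "prediction": {"will", "would", "shall", "'ll", "'d"},
}


def modal_profile(text: str) -> tuple[Counter, int, dict[str, float]]:
    """Return (class counts, token total, rates per 1,000 tokens).

    Contracted forms such as 'd are split off and counted as separate tokens.
    """
    normalized = text.lower().replace("\u2019", "'")  # curly to straight apostrophe
    tokens = re.findall(r"[a-z]+|'[a-z]+", normalized)
    counts = Counter()
    for token in tokens:
        for cls, forms in MODAL_CLASSES.items():
            if token in forms:
                counts[cls] += 1
    total = len(tokens)
    rates = {cls: (counts[cls] * 1000 / total if total else 0.0) for cls in MODAL_CLASSES}
    return counts, total, rates


if __name__ == "__main__":
    # The three classroom-management examples from the slide.
    text = (
        "I'd like you to review your quizzes. "
        "I would encourage you to add this to your stack of materials. "
        "And then assignment six will be due Tuesday."
    )
    counts, total, rates = modal_profile(text)
    print(dict(counts), total, rates)
```

Counting forms is only the first step: the slide's point is that these prediction modals mostly serve indirect directive functions, which requires reading the concordance lines, not just counting them.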
Students using corpora in the classroom
• The student as researcher: Data-driven learning (e.g., article use) (Johns – e.g., 1991, ELR Journal)
• LSP applications: student concordancing based on a specialized corpus (see, e.g., Donley and Reppen 2001, TESOL Journal; Gavioli and Aston 2001)
• Do students benefit? Yes: enhances vocabulary learning and transfer of word knowledge (Cobb 1997, System; 1999, CALL)
General considerations for curricula, materials development, and lesson planning
• What language features and grammatical topics to include / exclude
• What vocabulary to include
• Sequencing
• Providing meaningful practice
Using corpus-based materials in the classroom: Issues (1)
• How to adapt corpus-based research findings?
• What kinds of corpus findings are useful for learners?
• How to adapt natural text for classroom use?
• What kinds of gains in proficiency should we expect from corpus-based materials?
Developing corpus-based materials for the classroom: Issues (2)
• How important is frequency / typicality? What about representation of specific target registers?
• Difficulty and learnability of the construction; inter-language sequences – natural order of acquisition.
• To what extent are current practices actually informed by research on acquisition?
• Unreliability of intuitions
Future research directions
• Need for empirical research on the translation of corpus research findings to classroom materials:
– Overall distribution of grammatical features: issues of inclusion and sequencing
– Collocation and lexico-grammatical patterns: issues of word choice and practice within a lesson
– Discourse factors influencing grammatical variation and choice: presentation and practice within a lesson
• What kinds of gains in proficiency, in response to what kinds of materials?