Using Corpora to Teach Vocabulary
•Helping Students Help Themselves
What are Corpora?
Large free computerized databases of natural language
• Corpus of Contemporary American English (COCA)
• MICASE (Michigan Corpus of Academic Spoken English)
• MICUSP (Michigan Corpus of Upper-Level Student Papers)
• British National Corpus
Corpus Linguistics = Methodology
Bennett (2010)
– Corpus-influenced materials
• Textbooks, materials based on frequency & patterns
– Corpus-cited texts
• Dictionaries (Collins COBUILD)
• Grammar books (Real Grammar: A Corpus-Based Approach to English)
– Corpus-designed materials
• Learner or teacher-created using a corpus
Corpus Learning 101
Pre-made Materials
Vocabulary Based on Corpus Studies
Frequency Lists
• West’s General Service List (first ~2000 most frequent words)
• Academic Word List (570 word families; 3000 words)
LexTutor’s VocabProfiler
• Insert your own texts to assess vocabulary level
West’s General Service List
1 the, 2 be, 3 of, 4 and, 5 a, 6 to, 7 in, 8 he, 9 have, 10 it, 11 that, 12 for, 13 they, 14 I, 15 with, 16 as
17 not, 18 on, 19 she, 20 at, 21 by, 22 this, 23 we, 24 you, 25 do, 26 but, 27 from, 28 or, 29 which, 30 one, 31 would
AWL
abandon, abstract, academy, access, accommodate, accompany
accumulate, accurate, achieve, acknowledge, acquire, adapt
AWL
Analyse (head word)
analysers, analyses, analysing, analysis (most common), analyst, analysts
analytic, analytical, analytically, analyze, analyzed, analyzes, analyzing
General English
VocabProfiler
Why?
• Materials development
• Check vocabulary levels of webpages
• Decide on vocabulary to focus on
How?
• Create a .txt document
• In Word (save as, then select .txt)
• Copy the text
• Paste the text into the VocabProfile site
• Double click on proper nouns to exclude
• Click Submit
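The idea behind a vocabulary profile can be sketched in a few lines of Python. This is only an illustration of the concept, not how LexTutor's VocabProfiler is implemented: the ten-word list, the `profile` function, and the sample sentence are all hypothetical, and the sketch matches surface forms only, so it does not group word families the way a real profiler does.

```python
import re
from collections import Counter

# Hypothetical miniature word list standing in for a real frequency list
# such as the General Service List; a real profile would load the full list.
HIGH_FREQUENCY_WORDS = {"the", "be", "of", "and", "a", "to", "in", "he", "have", "it"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def profile(text, word_list, exclude=()):
    """Return (coverage_percent, off_list_counts) for a text.

    Words in `exclude` (for example proper nouns) are skipped entirely,
    mirroring the step of excluding proper nouns before submitting.
    """
    excluded = {w.lower() for w in exclude}
    tokens = [t for t in tokenize(text) if t not in excluded]
    if not tokens:
        return 0.0, Counter()
    on_list = sum(1 for t in tokens if t in word_list)
    off_list = Counter(t for t in tokens if t not in word_list)
    return 100.0 * on_list / len(tokens), off_list

sample = "Hansel and Gretel have to walk in the forest"
coverage, off_list = profile(sample, HIGH_FREQUENCY_WORDS, exclude=["Hansel", "Gretel"])
print(f"{coverage:.1f}% of tokens are on the list")
print("Off-list words:", sorted(off_list))
```

The off-list words are the candidates a teacher might choose to pre-teach or gloss.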
MS Office Shortcuts
Ctrl + A select all
Ctrl + C copy
Ctrl + V paste
Ctrl + X cut
Ctrl + Z undo
VocabProfiler
Using a Corpus to Teach Vocabulary
Data-Driven Learning
Knowing a Word (Nation, 2001)
Metalinguistic awareness = dictionary definition +
• spelling
• morphology
• part of speech
• pronunciation
• variant meanings
• collocations
• specific uses
• register
Data Driven Learning (Johns, 1991)
Learners become “language detectives” (Johns, 1991)
Uses authentic examples and encourages “noticing” or “awareness-raising” (Romer, 2008)
Using a Corpus
Pros
Natural Language
Practice analytical skills/verify choices
Creates self-sufficient learners
Contexts rich, varied
Focus on accuracy
Cons
Significant teacher training needed
Few ready-made exercises and challenging to design
Lexical information vast/confusing
Contexts incomplete
No focus on fluency
Data-Driven Learning: The Corpus of Contemporary American English
COCA
• 450 million words
• 20 million words added yearly (1990-2012)
• 90 million spoken words
• Academic and general
• Spoken
• Fiction
• Magazines
• Newspapers
• Academics
Academic Genres
• Education
• Geography/Social Science
• Law/Philosophy
• Humanities
• Philosophy/Religion
• Science/Technology
• Medicine
• Miscellaneous
Training Yourself to Use the COCA
Brief Five-Minute Tour
Class Use
Sign up for group access at least 2 days prior to use
– http://corpus.byu.edu/groupAccess.asp
Notice the group limits
– One active request at a time
– Four hour limit
– Teacher must be a registered user
COCA Search Screen
COCA Corpus Search
Parts of Speech with KWIC (Key Words in Context)
They certainly will not grow as learners without opportunity to analyze their strengths and weaknesses.
Language Development
• KWIC search
– Parts of speech color coded
• Students code nearby words
• Students code a 100-word sample
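For teachers who want to see what a KWIC display does under the hood, here is a minimal Python sketch. It is an assumption-laden illustration, not COCA's search engine: the `kwic` function and its `width` parameter are hypothetical names, and it does no part-of-speech tagging or color coding. It only lines up each hit with a fixed amount of context on either side.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword sits in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

text = ("They certainly will not grow as learners without opportunity "
        "to analyze their strengths and weaknesses.")
for line in kwic(text, "analyze"):
    print(line)
```

Students could then label the words immediately to the left and right of the bracketed keyword, as in the coding activity above.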
Language Development
Frequency searches (easiest)
• Reading fluency
– Should you memorize dawdle, meander, or drift?
Phrasal Verb Frequencies
Intermediate Class
– Explain what phrasal verbs are with examples (mess around, use up, call on, wrap up)
– Use COCA to find sample sentences
High beginning writing class
– Check spelling and non-English words on 30-minute timed writing
– Students look for words that might be misspelled
• Use COCA
• If frequency below 10, circle the word (e.g., speciel)
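The "frequency below 10" rule can be sketched as a simple lookup. The counts below are invented for illustration and are not real COCA figures; `CORPUS_FREQUENCY` and `words_to_circle` are hypothetical names. In class, students would look each word up in COCA by hand.

```python
import re

# Hypothetical frequency counts standing in for numbers looked up in COCA;
# these values are invented for illustration only.
CORPUS_FREQUENCY = {
    "my": 900000, "friend": 80000, "is": 5000000,
    "very": 400000, "special": 60000, "speciel": 2,
}

def words_to_circle(text, frequencies, threshold=10):
    """Return words whose corpus frequency is below the threshold, in text order."""
    flagged = []
    for word in re.findall(r"[a-z']+", text.lower()):
        # Words missing from the table count as frequency 0.
        if frequencies.get(word, 0) < threshold and word not in flagged:
            flagged.append(word)
    return flagged

print(words_to_circle("My friend is very speciel", CORPUS_FREQUENCY))
```

A very low count suggests a misspelling or a non-English word, though rare but correct words will also be flagged, so the circled words still need a second look.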
COCA for Morphology
• Transport
– transportation
– transported
– transports
Wildcard* Searches
Circle the word not related in meaning
clar*        *note
clarify      connote
clarinet     denote
clarity      keynote
clark
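To show how a wildcard pattern selects words, here is a small Python sketch using the standard-library `fnmatch` module, where `*` stands for any run of characters. The word list and the `wildcard_search` helper are hypothetical and stand in for a corpus query; this is not COCA's query syntax or engine.

```python
from fnmatch import fnmatch

# Small hypothetical word list; a corpus query would search millions of tokens.
WORDS = ["clarify", "clarinet", "clarity", "clark", "connote", "denote",
         "keynote", "transport", "transportation", "transported", "transports"]

def wildcard_search(pattern, words):
    """Return the words that match a pattern where * stands for any characters."""
    return [w for w in words if fnmatch(w, pattern)]

print(wildcard_search("clar*", WORDS))
print(wildcard_search("*note", WORDS))
print(wildcard_search("transport*", WORDS))
```

The same kind of pattern serves both activities: `transport*` gathers the forms of one word for morphology work, while `clar*` returns words that merely share spelling, which is what makes the "circle the unrelated word" exercise possible.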
What are Concordancers?
• Computer programs used to analyze text
• LexTutor
– VocabProfiler
• AntConc
– Create specialized corpora for ESP classes
Websites of Interest
ELT Resource Training Wiki (with Amber Warren)
• http://eltresourcetraining.pbworks.com
AWL
• http://englishvocabularyexercises.com
VocabProfiler
• http://www.lextutor.ca/vp/
Grimm’s Fairy Tales in .txt
• http://www.cs.cmu.edu/~spok/grimmtmp/
Contact Information
Debra S. Lee
Vanderbilt University English Language [email protected]
Twitter: dleetn
Google+: dleetn