Post on 20-Dec-2015

Talking about your homework

• News story?
– What made you choose…?

• One of your words?
– What made you choose…?

• (Give your vocabulary books to another student. He or she will test you. Give clues!)
– What does ___ mean?
– How many senses does ___ have?
– What words does ___ collocate with?
– How many grammar frames does it occur in?
– Can you remember the example sentence?

Today

• Review and discuss homework
• Introduction to language corpus study
• Start on Food and diet

Giving and following instructions

• Student 1: Give step-by-step instructions for finding the top collocations and best sentences

• Student 2: Use a laptop computer and follow the instructions exactly

Introduction to corpus linguistics

Simon Smith & Adam Kilgarriff

Plan for today

• Short review of corpus basics
• 4 ages of corpus research
– From pre-computer age, to SkE
• Functions of SkE
• Demonstration of SkE in use

Quiz

• What’s a (linguistic) corpus?
• What does the Latin word mean?
• What are corpora?
• What’s the BNC?
• How big is the British National Corpus?
• What is the advantage of having a very big corpus?
• What can corpora be used for?

5 major uses for linguistic corpora

• Language learning and teaching
• Theoretical research on Language and Linguistics
• Literary research and analysis
• Language technology
• Lexicography (= dictionary making)
– Cobuild, Longman, …
– All learner dictionaries now use corpora

How do you make a dictionary? (What sources can you use?)

• Use your own knowledge of words
• Ask all your friends for their knowledge
• Consult other dictionaries
– and copy them
• Read thousands of books
– and take lots of notes
• Use a corpus

Taiwan, Dec 2006

Four ages of corpus research (in lexicography)

Kilgarriff, Lexical Computing

• Age 1: Pre-computer
• Age 2: KWIC concordance (KWIC=?)
• Age 3: Corpus query tools
– e.g. Sketch Engine

Age 1:Pre-computer

First Oxford English Dictionary (1860):

• 20 million index cards
– a word (usually rare) and a citation

Age 2: KWIC Concordance

 1 arity, which will be used to take a party of under-privileged children to D
 2  from outside. You are invited to a party and after a couple of drinks you d
 3 tion, we believe politicians of all parties will listen to our views. &equo
 4 ould be reaching agreement with all parties concerned, as to which events,
 5  lack people. I have certainly been party to one or two discussions amongst
 6 . These should be discussed by both parties before entering into the relatio
 7 presents They had hosted a cocktail party at Kensington palace, for example
 8 akes. By midnight the end-of-course party is in full swing, but most cadet
 9 e should be a right for the injured party to terminate the contract. A mana
10  by the Safran Peoples ' Liberation Party. This presents the powerful neigh
11 s. Ahead I could see the rest of my party plodding towards the final slope t
12  cial ethic. The two main political parties - the Tories and the Liberals -
13 ritish successes in Perth The small party of British players competing in th
14  to help control. One member of the party went to summon the rescue team and
15  rket society fashion magazine. The party was held at his flat which was a l
16   security and secrecy than any Tory Party Conference : it seems that bootleg

Age 2 (~1980-1990): KWIC Concordances

• Using computers
• List of lines which contain a keyword
• The keyword is in the middle
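The idea of a KWIC concordance can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular concordancer works: the function name `kwic`, the `width` parameter and the sample text are all made up for this example, and the search matches one exact word form only (so "party" will not find "parties").

```python
import re

def kwic(text, keyword, width=35):
    """Return concordance lines with the keyword starting at column `width`.

    Hypothetical helper for illustration: matches the exact word form only,
    ignoring upper/lower case.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Up to `width` characters of context on each side of the keyword.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so every keyword lines up.
        lines.append(f"{left:>{width}}{m.group()}{right}")
    return lines

# Made-up sample text, loosely based on the concordance lines in these slides.
sample = ("You are invited to a party and after a couple of drinks you dance. "
          "The party was held at his flat.")
for line in kwic(sample, "party", width=20):
    print(line)
```

Because the left context is cut off after a fixed number of characters, words at the edges get chopped, just as in the concordance lines shown earlier.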

1 political association
2 social event
3 group of people
4 person in an agreement/dispute
5 to be party to something...

The coloured pens method

Age 2: limitations

As corpora get bigger: too much data

• 50 lines for a word: read all
• 500 lines: could read all, takes a long time
• 5000 lines: impossible

Why do corpora keep getting bigger? (anyone?)

• Improvements in technology
– Price of storage is going down
– Speed of access is going up

• Representativeness
– A small corpus may have many examples of common words
– But not enough examples of unusual words

Lexical distribution

• What’s the most common word in English?
• What % does it make up of a whole corpus?
• The 100 most common words make up __% of all the words in a corpus?
• The 7500 most common words make up __%?
• Answers:
– The, 5%, 45% and 90%
• So:
– you need massive corpora, if you want to really represent rare words properly
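The coverage figures in this quiz can be checked against any text you have to hand. The sketch below is an illustration only: `tokenize` and `coverage` are hypothetical helper names, the tokenizer is deliberately crude (lower-cased runs of letters and apostrophes), and the percentages you get will depend on the corpus, so they need not match the quiz answers.

```python
from collections import Counter
import re

def tokenize(text):
    """Very rough tokenizer: lower-cased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(tokens, n):
    """Percentage of all tokens accounted for by the n most common words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    top_total = sum(count for _, count in counts.most_common(n))
    return 100.0 * top_total / len(tokens)

# Tiny made-up text; replace it with a real corpus for meaningful numbers.
text = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(text)
print(Counter(tokens).most_common(1))       # the most common word and its count
print(round(coverage(tokens, 1), 1), "%")   # share of the single most common word
print(round(coverage(tokens, 3), 1), "%")   # share of the 3 most common words
```

On a large corpus, calling `coverage(tokens, 100)` and `coverage(tokens, 7500)` gives the two figures asked for in the quiz.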

Limitation of KWIC analysis

• As corpora get bigger: too much data
– 50 lines for a word: read all
– 500 lines: could read all, takes a long time
– 5000 lines: no

• Instead, look at a Word Sketch from Sketch Engine
– a statistical summary of word usage
– shows most common collocates
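To see what "most common collocates" means in practice, here is a minimal sketch that counts the words appearing within a few positions of a chosen word. It is an assumption-laden simplification, not Sketch Engine's method: a real word sketch groups collocates by grammatical relation and ranks them with a statistical score, whereas this hypothetical `collocates` function uses raw counts in a fixed window.

```python
from collections import Counter

def collocates(tokens, node, window=3, top=5):
    """Count words occurring within `window` positions of `node`.

    Hypothetical helper: raw co-occurrence counts only, no grammar
    and no association statistic.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts.most_common(top)

# Made-up token list for illustration.
tokens = "strong tea strong tea strong coffee".split()
print(collocates(tokens, "tea", window=1))
```

With raw counts, very frequent words such as "the" tend to dominate the list, which is one reason corpus tools rank collocates with a statistic rather than a simple count.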

Functions of SkE

• KWIC concordance
– Sorting, filtering etc

• Word sketch
• Automatic thesaurus
• Sketch difference
– discriminate near-synonyms

Lexical approach to language learning

• Lewis (1993) and Schmitt (2000) say
– vocabulary is stored in the brain in collocations
– bacon is stored near eggs
– 蛋 (egg) is stored near 炒飯 (fried rice)
– scotch is stored with whisky

• Saying strong car or powerful tea or broken house seems very “foreign”

From www.teachingenglish.org - a lexical approach activity, based on a story text

Unit 7

Food and diet

Fruit and veg

• What fruit and veg do you like?
• How many servings of fruit and veg do you eat each day?
• Is that enough?
• Do you have a good diet?
– What other kinds of good food do you often eat?
– What junk food do you eat?

Food pyramid (p 119)

• Label the pyramid in your book
• Look at http://mypyramid.gov
• Compare the two
– See http://www.mypyramid.gov/downloads/MyPyramid_Anatomy.pdf
• Use the website to see how much you should be eating from each food group.
– Now, comment on your diet!

Genetically modified food

• (picture)
• Guess answers to p 123a
• Read