Recent change in American and British English, a corpus...

Post on 10-Oct-2020


Why British and American English?

• The “special relationship”

• Two nations divided by a common language

• To what extent is the British “lag” in evidence?

• Are the varieties getting more similar or different?

• Which forms of English are likely to influence global usage?

• Quite a bit of disagreement or uncertainty (Juola, Hebblethwaite, Leech, Finegan)

• A lot of studies have focussed on existing hypotheses or questions about specific types of language (e.g. corpus-based).

The Brown Family

• 8 matched corpora, each 1 million words, 500 samples of 2000 words each

• 15 genres represented

• American and British written published English

• Sampling points 1931, 1961, 1991/2, 2006

Sampling framework

Text category letter and description | Files

A Press: Reportage 44

B Press: Editorial 27

C Press: Reviews 17

D Religion 17

E Skills, Trades and Hobbies 36

F Popular Lore 48

G Belles Lettres, Biographies, Essays 75

H Miscellaneous: Government documents, industrial reports etc 30

J Academic prose in various disciplines 80

K General Fiction 29

L Mystery and Detective Fiction 24

M Science Fiction 6

N Adventure and Western 29

P Romance and Love story 29

R Humour 9

Health warnings

• Findings relate only to standard published English (which is not at the coalface of innovation)

• 1 million words is relatively small – so I have focussed on high frequency patterns

• With four sampling points we must take care not to assume straightforward linear changes (we can only infer what was happening at other points)

Average Word length

[Figure: line chart of average word length in American and British English, 1920 to 2020; y-axis from 4.4 to 4.8]

Average Sentence length

[Figure: line chart of average sentence length in American and British English, 1920 to 2020; y-axis from 17.2 to 19.4]

Proportion of different words used (type token ratio)

[Figure: line chart of type token ratio in American and British English, 1920 to 2020; y-axis from 43 to 46.5]
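As a rough illustration of the measure on this slide, a type token ratio can be computed as sketched below. The slides do not say which variant was used. The standardised version here (a mean over fixed-size chunks) is an assumption, chosen because a raw ratio falls as texts get longer. The names `standardised_ttr` and `chunk_size` are hypothetical.

```python
def type_token_ratio(tokens):
    """Distinct words (types) as a percentage of running words (tokens)."""
    if not tokens:
        raise ValueError("need at least one token")
    return 100 * len(set(tokens)) / len(tokens)


def standardised_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of chunk_size tokens.

    Assumption: chunking is only a way to make texts of different
    lengths comparable; the chunk size is illustrative.
    """
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


# Toy input, not corpus data.
sample = "the cat sat on the mat and the dog sat on the rug".split()
print(type_token_ratio(sample))
print(standardised_ttr(sample, chunk_size=4))
```

Lower-casing and tokenisation choices change the result, so figures are only comparable when every corpus is processed in the same way.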

Features examined

• Single words (e.g. money)

• 2, 3, 4 and 5 word clusters (on the other hand)

• Single part of speech tags (NN1 = singular common noun)

• 2, 3, 4 and 5 word sequences of POS tags (NP1 NP1 = proper noun, proper noun)

• Semantic tags via Wmatrix (G3: defence and warfare)

• 2, 3, 4 and 5 letter sequences within words (-ology, -fess-)
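The cluster and letter-sequence features listed above can be extracted with a few lines of code. This is a minimal sketch, not the tooling used for the study. The function names are hypothetical and the input is a toy example.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """All contiguous n-item clusters in a list of words (or POS tags)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def letter_ngrams(word, n):
    """All contiguous n-letter sequences inside a single word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Toy input, not corpus data.
tokens = "on the other hand it was on the other side".split()
cluster_counts = Counter(word_ngrams(tokens, 3))

letter_counts = Counter(
    gram for word in ["biology", "geology"] for gram in letter_ngrams(word, 5)
)

# The same function handles sequences of POS tags.
tag_pairs = word_ngrams(["NP1", "NP1", "VVD"], 2)

print(cluster_counts.most_common(1))
print(letter_counts.most_common(1))
print(tag_pairs)
```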

Two methods

• 4 sets of 2-way keyword comparisons (e.g. AE 1931 vs BE 1931)…

Tag 1931 1961 1991 2006

AT article (the, no) ✓

DDQ wh-determiner (which, what) ✓ ✓ ✓

EX existential there ✓ ✓ ✓

NNB preceding noun of title (e.g. Mr) ✓ ✓

RG degree adverb (very, so, too) ✓ ✓ ✓

RR general adverb ✓ ✓
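A 2-way keyword comparison of the kind mentioned above needs a keyness statistic. The slides do not name one, so the sketch below assumes log-likelihood, which is widely used for keyword analysis. The frequencies in the example are invented for illustration.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood keyness of one item across two corpora.

    freq_a, freq_b: how often the item occurs in each corpus.
    size_a, size_b: total number of tokens in each corpus.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts: 100 hits vs 50 hits in two 1-million-word corpora.
score = log_likelihood(100, 50, 1_000_000, 1_000_000)
print(round(score, 2))
```

An item ticked for a given year in a table like the one above would be one whose score in that year's comparison clears whatever threshold the analyst has chosen.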

…and the Coefficient of Variation

• CV = the standard deviation of a word’s frequencies across the four sampling points, divided by their mean, multiplied by 100

• around: 110, 245, 407, 630 (CV = 64)

• money: 306, 325, 306, 332 (CV = 4)
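The two CV figures above can be reproduced directly. The sketch assumes the sample standard deviation (n - 1 in the denominator). The slides do not state which form was used, but this one matches both quoted values, whereas the population form gives about 56 for around.

```python
from statistics import mean, stdev


def coefficient_of_variation(freqs):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(freqs) / mean(freqs)


# Frequencies quoted on the slide, in sampling-point order.
around = [110, 245, 407, 630]
money = [306, 325, 306, 332]

print(round(coefficient_of_variation(around)))  # 64
print(round(coefficient_of_variation(money)))   # 4
```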

around (high CV) and money (low CV)

[Figure: line chart of the frequencies of around and money, 1920 to 2020; y-axis from 0 to 700]

Cut-offs

• I am interested in relatively high frequency phenomena: for most features, I considered items that occurred at least 1000 times in either the 4 American or the 4 British corpora (for single words I also went down to 100 times)

• Focus mostly on items which showed a constant increase or decrease over time.

• Rather than look at every word, I have concentrated on those with the highest and lowest CVs or keyness scores (usually the top 10, 20 or 50 cases)
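The cut-offs above amount to two simple filters, sketched below. Reading "constant increase or decrease" as strictly rising or falling at every sampling point is an assumption, and the function names and the `rare_item` counts are hypothetical.

```python
def is_monotonic(freqs):
    """True if the counts strictly rise, or strictly fall, at every step."""
    pairs = list(zip(freqs, freqs[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def passes_cutoff(ame_freqs, bre_freqs, minimum=1000):
    """True if the item reaches the minimum total in either variety."""
    return sum(ame_freqs) >= minimum or sum(bre_freqs) >= minimum


# Counts for around and money are the ones quoted earlier in the slides.
around = [110, 245, 407, 630]
money = [306, 325, 306, 332]
rare_item = [10, 12, 9, 11]  # invented counts for a low-frequency item

print(is_monotonic(around), is_monotonic(money))
print(passes_cutoff(around, rare_item), passes_cutoff(rare_item, rare_item))
```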

Densification

American English | British English | Both

Increasing

that’s, children’s,

Mom, ensure

*ism

phone

I’m, onto, Mum

UK, NHS, BBC

it’s, didn’t, don’t,

The ‘s’-genitive

Dad kids, TV

*ology

Et al

NN1 NN1 NN1

NN1 NN1 NN2

[Mean word length]

Decreasing

let us

automobile

upon, cannot, need not

more or less, at any rate

two or three, in view of

on the other hand

from time to time

II31 II32 II33 AT

Great Britain

The of-genitive

any one

on the part of

[Mean sentence length]

Colloquialisation

American English | British English | Both

Increasing

guys

okay

I

my

your

don’t want to

a kind of

I’m going to

VVGK

guy

a bit of

I have to

apostrophe use

Dad

kids

TV

then

you have to

Taboo language

Decreasing

I do not

Sexual and excretory swear words

[Figure: line chart of the frequency of sexual and excretory swear words in American and British English, 1920 to 2020; y-axis from 0 to 400]

Profane uses of language

[Figure: line chart of the frequency of profane uses of language in American and British English, 1920 to 2020; y-axis from 0 to 300]

Democratisation

American English | British English | Both

Increasing

feminist

of women in

might

gender access to

support for

could

can

Decreasing

men

of man

Mr and Mrs

Colonel

*fess

NNB

Sir

Rev.

shall

must

Mr

Mrs

NNB NP1 NP1

Spelling

Pattern (American/British) | American spelling example | British spelling example

or/our | color | colour

er/re | center | centre

ize/ise | organize | organise

ization/isation | civilization | civilisation

yze/yse | analyze | analyse

og/ogue | catalog | catalogue

e/ae | anemia | anaemia

e/oe | fetal | foetal

se/ce | defense | defence

l/ll | canceled | cancelled

ction/xion | connection | connexion

-/e | aging | ageing

toward/towards | toward | towards

-/st | while | whilst

gray/grey | gray | grey

-ize vs -ise

[Figure: chart of -ize vs -ise preference in American and British English at 1931, 1961, 1991 and 2006; scale from -100 to 100, with the two ends labelled "Preference for -ise" and "Preference for -ize"]
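The slides do not explain how the -100 to 100 preference scale is calculated. One plausible reading, offered purely as an assumption, is the difference between the two variants' frequencies as a percentage of their combined frequency. The function name, the sign convention and the counts below are all hypothetical.

```python
def preference_score(count_a, count_b):
    """Score from -100 (only variant b) to +100 (only variant a)."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no occurrences of either variant")
    return 100 * (count_a - count_b) / total


# Invented counts: 95 tokens of one spelling against 5 of the other.
print(preference_score(95, 5))
```

On this reading a score near 0 means the two spellings are used about equally often, and a score near either end means almost total adherence to one spelling.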

Summary of findings

• American English – almost 100% adherence in 9 out of 17 cases

• British English – almost 100% adherence in 6 out of 17 cases

• British English – weakening grasp on colour, practise as verb, travelling, queueing

• American English – switched to amoeba, weakening grasp on queuing

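Summary of all spellings

[Figure: chart of overall spelling preference in American and British English on a scale from -100 to 100. Data labels for 1931, 1961, 1991 and 2006: -88.9, -85.4, -85.8, -86.4 for one variety and 89.6, 93.1, 92.3, 91.9 for the other]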

Is automation of language responsible?

• Spell-check can be set to American or British English.

• Grammar and style checks may also influence writing e.g. avoid clichéd language: on the part of, the fact that, so far as, as to the, the spirit of, for the most part

• Avoid passive sentences – big decreases in taken, given and made (in passive cases), increases in I. Also decreases in dummy pronouns it and there as well as decreases in BE, VVN

Part of Speech tags

• Decreasing modality and hedging – less use of modal verbs, gradable adverbs (AE leading)

• 96% of RG consists of: very, so, as, too, about, quite, over, rather, far and pretty

• Rather – often used to ‘understate’ a negative evaluation: rather unseemly, rather unsightly, rather disappointing

• Very – to strengthen positive evaluation: good, great, useful, important etc.

• This could be a move towards densification too?

Gradable adverbs

[Figure: line chart of the frequency of gradable adverbs, 1920 to 2020; y-axis from 0 to 6000; series: American (CV=11), British (CV=17)]

Translation Table (Telegraph 2 September 2013)

What the British say | What the British mean | What foreigners understand

That is a very brave proposal | You are insane | He thinks I have courage

Quite good | A bit disappointing | Quite good

I would suggest | Do it or be prepared to justify yourself | Think about the idea, but do what you like

Very interesting | That is clearly nonsense | They are impressed

You must come for dinner | It’s not an invitation, I’m just being polite | I will get an invitation soon

Could we consider some other opinions | I don’t like your idea | They have not yet decided

Decline of DDQ (which/what)

[Figure: line chart of the frequency of DDQ in American and British English, 1931 to 2001; y-axis from 0 to 8000]

Are we using relative clauses differently?

• Theory 1 – ‘which’ is declining because of other ways of writing relative clauses (e.g. that or the zero relative)

• Theory 2 – ‘which’ is declining because relative clauses are declining generally

• Theory 3 – ‘which’ is declining despite changes to relative clauses

• *_AT* (_{A})? *_N* (_{N})? (that|which)
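The search pattern above is written in a corpus-tool query syntax. A rough Python equivalent over text in word_TAG format is sketched below. It assumes CLAWS-style tags in which article tags begin with AT, adjective tags with J and noun tags with N. The tagged sentence is an invented example, and `count_relativisers` is a hypothetical helper.

```python
import re
from collections import Counter

RELATIVE = re.compile(
    r"""
    (?<!\S) \S+_AT\S*            # article: any tag starting with AT
    (?: \s+ \S+_J\S* )?          # optional adjective (tag starting with J)
    \s+ \S+_N\S*                 # noun: any tag starting with N
    (?: \s+ \S+_N\S* )?          # optional second noun
    \s+ ([Tt]hat|[Ww]hich)_\S+   # the relativiser, captured
    """,
    re.VERBOSE,
)


def count_relativisers(tagged_text):
    """Count that/which directly following a short noun phrase."""
    return Counter(m.group(1).lower() for m in RELATIVE.finditer(tagged_text))


# Invented example with assumed tags.
text = ("the_AT old_JJ house_NN1 which_DDQ stood_VVD near_II "
        "the_AT man_NN1 that_CST I_PPIS1 saw_VVD")
counts = count_relativisers(text)
print(counts)
```

Like the original query, this only catches relativisers that directly follow a short noun phrase, so it undercounts and will still need manual checking.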

The “which” relative clause

[Figure: line chart of the frequency of “which” relative clauses in American and British English, 1931 to 2001; y-axis from 0 to 2500]

The “that” relative clause

[Figure: line chart of the frequency of “that” and “which” relative clauses, 1931 to 2001; y-axis from 0 to 2500; series: American that, British that, American which, British which]

Finding zero relative clauses

_A* (_{A})* (_{N})* _N* (he|she|they|you|it|we)

False positives (around 10%)

• Shake the hand of a squaddie They deserve our thanks

• To a certain extent it is horribly dangerous

The majority replace that, not which

• Apart from anything else , there was a feeling he had become a joke figure

• by the time you get this Mary will be turned three
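The zero relative query can be approximated in the same way, and the false positives quoted above make a useful check. In this sketch the tags attached to the slide examples are assumed CLAWS-style tags, and the pattern is one reading of the query: a tag beginning with A, then any adjectives, then one or more nouns, then a subject pronoun.

```python
import re

PRONOUNS = r"[Hh]e|[Ss]he|[Tt]hey|[Yy]ou|[Ii]t|[Ww]e"

ZERO_RELATIVE = re.compile(
    rf"""
    (?<!\S) \S+_A\S*         # determiner-like word: tag starting with A
    (?: \s+ \S+_J\S* )*      # any number of adjectives
    (?: \s+ \S+_N\S* )+      # one or more nouns
    \s+ ({PRONOUNS})_\S+     # subject pronoun, captured
    """,
    re.VERBOSE,
)

# Slide examples with assumed tags; the first two are false positives.
false_positive_1 = "of_IO a_AT1 squaddie_NN1 They_PPHS2 deserve_VV0"
false_positive_2 = "a_AT1 certain_JJ extent_NN1 it_PPH1 is_VBZ"
true_hit = "a_AT1 feeling_NN1 he_PPHS1 had_VHD become_VVN"

for example in (false_positive_1, false_positive_2, true_hit):
    match = ZERO_RELATIVE.search(example)
    print(match.group(1) if match else None)
```

All three examples match, which shows why the hits have to be checked by hand: the pattern has no way of telling a sentence boundary or an adverbial phrase from a genuine zero relative.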

The zero relative clause

[Figure: line chart of the frequency of zero relative clauses in American and British English, 1931 to 2001; y-axis from 0 to 2500]

Grammar/style checkers

Key Semantic tags

Key in American English 1931 1961 1991 2006

G2.1 Law and Order ✓ ✓ ✓ ✓

G3 Warfare, defence and army; weapons ✓ ✓ ✓ ✓

S5+ Belonging to a group ✓ ✓ ✓

Y2 Information, technology and computing ✓ ✓ ✓ ✓

Key in British English 1931 1961 1991 2006

A3+ Existing ✓ ✓ ✓ ✓

A13.3 Degree: Boosters ✓ ✓ ✓

A13.5 Degree: Compromisers ✓ ✓ ✓

Qualitative analysis

• Information, technology and computing is key in AmE because of the high frequency of program(s), which is tagged this way, whereas the British spelling programme(s) is not.

• Warfare – common in the press – Vietnam, Gulf War, War on Terror. The firearms debate in the US.

G2.1 Law and Order

[Figure: line chart of the frequency of the G2.1 Law and Order tag in American and British English, 1931 to 2001; y-axis from 0 to 6000]

S5+ Belonging to a group

[Figure: line chart of the frequency of the S5+ Belonging to a group tag in American and British English, 1931 to 2001; y-axis from 0 to 6000]

Conclusions

• General trends in densification, democratisation, colloquialisation

• More cases where American English seems to have led change, especially for grammatical tags and tag sequences

• Moves towards convergence in the latter time period.

• But spelling differences are likely to hold in the future (word processing options?)

Future predictions

• Further densification: e.g. “zero” forms (“This.”, “Because science”), missing apostrophes, acronyms, emojis

• Vanishing titles and references to gender: gender-neutral pronouns.

• Less hedging, more active forms, more first and second person pronouns, but more nominalisation.

• Taboo language more common (except for religious taboo)

• Wider vocabulary

Thank you

• Baker, P. (2011) 'Times may change but we'll always have money: a corpus driven examination of vocabulary change in four diachronic corpora.' Journal of English Linguistics 39: 65-88.

• Finegan, E. (2004) American English and its distinctiveness. In E. Finegan and J. R. Rickford (eds) Language in the USA. Cambridge: Cambridge University Press, pp. 18-38.

• Leech, Geoffrey. 2002. Recent grammatical change in English: data, description, theory. In Karin Aijmer & Bengt Altenberg (eds.), Proceedings of the 2002 ICAME Conference, 61-81. Gothenburg.

• Potts, A. and Baker, P. (2012) 'Does semantic tagging identify cultural change in British and American English?' International Journal of Corpus Linguistics 17:3 295-324.

• Smith, Nicholas. 2002. Ever moving on? The progressive in recent British English. In Pam Peters, Peter Collins & Adam Smith (eds.), New frontiers of corpus research, 317-330. Amsterdam: Rodopi.
