Post on 10-Oct-2020
Recent change in American and British English: a corpus-driven approach
Paul Baker
Lancaster University
Why British and American English?
• The “special relationship”
• Two nations divided by a common language
• To what extent is the British “lag” in evidence?
• Are the varieties getting more similar or different?
• Which forms of English are likely to influence global usage?
• Quite a bit of disagreement or uncertainty (Juola, Hebblethwaite, Leech, Finegan)
• A lot of studies have focussed on existing hypotheses or questions about specific types of language (e.g. corpus-based).
The Brown Family
• 8 matched corpora, each 1 million words, 500 samples of 2000 words each
• 15 genres represented
• American and British written published English
• Sampling points 1931, 1961, 1991/2, 2006
Sampling framework
Text category letter and description    Files
A Press: Reportage 44
B Press: Editorial 27
C Press: Reviews 17
D Religion 17
E Skills, Trades and Hobbies 36
F Popular Lore 48
G Belles Lettres, Biographies, Essays 75
H Miscellaneous: Government documents, industrial reports etc 30
J Academic prose in various disciplines 80
K General Fiction 29
L Mystery and Detective Fiction 24
M Science Fiction 6
N Adventure and Western 29
P Romance and Love story 29
R Humour 9
Health warnings
• Findings relate only to standard published English (which is not at the coalface of innovation)
• 1 million words is relatively small – so I have focussed on high frequency patterns
• With four sampling points we must take care not to assume straightforward linear changes (we can only infer what was happening at other points)
Average word length
[Figure: line chart of average word length (y-axis 4.4 to 4.8) against year (x-axis 1920 to 2020); series: American, British]
Average sentence length
[Figure: line chart of average sentence length (y-axis 17.2 to 19.4) against year (x-axis 1920 to 2020); series: American, British]
Proportion of different words used (type token ratio)
[Figure: line chart of type token ratio (y-axis 43 to 46.5) against year (x-axis 1920 to 2020); series: American, British]
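The slides do not say how the type token ratio was calculated. As a minimal sketch, assuming the ratio is the number of distinct word forms expressed as a percentage of all tokens, and assuming (this is not stated in the talk) that it is averaged over fixed-size chunks so that texts of different lengths stay comparable, it could be computed like this. The function names and the default chunk size of 1000 are illustrative choices, not the author's settings.

```python
def type_token_ratio(tokens):
    """Distinct word forms as a percentage of all tokens (case-insensitive)."""
    if not tokens:
        raise ValueError("need at least one token")
    types = {t.lower() for t in tokens}
    return 100 * len(types) / len(tokens)


def standardised_ttr(tokens, chunk_size=1000):
    """Mean type token ratio over consecutive full chunks.

    chunk_size is an assumed parameter; a trailing partial chunk is ignored.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


sample = "the cat sat on the mat".split()  # toy sentence, not corpus data
print(round(type_token_ratio(sample), 1))
```

The toy sentence has five distinct forms among six tokens, so the plain ratio is about 83 percent; real corpus samples are far longer and give much lower values per chunk.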
Features examined
• Single words (e.g. money)
• 2, 3, 4 and 5 word clusters (on the other hand)
• Single part of speech tags (NN1 = singular common noun)
• 2, 3, 4 and 5 word sequences of POS tags (NP1 NP1 = proper noun, proper noun)
• Semantic tags via Wmatrix (G3: defence and warfare)
• 2, 3, 4 and 5 letter sequences within words (-ology, -fess-)
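Word clusters and letter sequences of the kind listed above are both n-grams, over tokens in one case and over characters in the other. The sketch below shows one simple way to extract them; the helper names are hypothetical and the example phrase and word are only there to illustrate the idea, not taken from the corpora.

```python
from collections import Counter


def word_clusters(tokens, n):
    """All contiguous n-word clusters in a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def letter_sequences(word, n):
    """All contiguous n-letter sequences inside a single word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


tokens = "on the other hand".split()
print(word_clusters(tokens, 4))          # the single 4-word cluster
print(Counter(word_clusters(tokens, 2)))  # frequency count of 2-word clusters
print(letter_sequences("biology", 5))     # includes the sequence "ology"
```

Counting the output with `Counter` across a whole corpus gives the frequency lists that the later cut-offs and comparisons would operate on.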
Two methods
• 4 sets of 2-way keyword comparisons (e.g. AE 1931 vs BE 1931)…
Tag 1931 1961 1991 2006
AT article (the, no) ✓
DDQ wh-determiner (which, what) ✓ ✓ ✓
EX existential there ✓ ✓ ✓
NNB preceding noun of title (e.g. Mr) ✓ ✓
RG degree adverb (very, so, too) ✓ ✓ ✓
RR general adverb ✓ ✓
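The slides do not name the keyness statistic behind these comparisons. A common choice in corpus linguistics is the log-likelihood statistic, so the sketch below assumes that measure; it is an illustration, not a statement of the author's method. The function name and the example counts are hypothetical.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood keyness of an item in corpus A versus corpus B.

    freq_a, freq_b: raw frequencies of the item in each corpus.
    size_a, size_b: total token counts of each corpus.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        score += freq_b * math.log(freq_b / expected_b)
    return 2 * score


# Hypothetical counts in two corpora of one million words each.
print(round(log_likelihood(100, 50, 1_000_000, 1_000_000), 2))
```

An item with the same relative frequency in both corpora scores 0; larger scores mean a bigger difference between the two varieties at that sampling point.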
…and the Coefficient of Variation
• CV = a word’s standard deviation divided by its mean, multiplied by 100
• around: 110, 245, 407, 630 (CV = 64)
• money: 306, 325, 306, 332 (CV = 4)
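The two examples above can be checked directly. The sketch below assumes the sample standard deviation (dividing by n - 1), since that is what reproduces the CV of 64 quoted for around; the function name is illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(freqs):
    """Sample standard deviation divided by the mean, multiplied by 100."""
    return 100 * stdev(freqs) / mean(freqs)


around = [110, 245, 407, 630]  # frequencies from the slide
money = [306, 325, 306, 332]   # frequencies from the slide

print(round(coefficient_of_variation(around)))  # 64
print(round(coefficient_of_variation(money)))   # 4
```

A high CV marks an item whose frequency moved a lot across the four sampling points; a low CV marks a stable one.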
around (high CV) and money (low CV)
[Figure: line chart of frequency (y-axis 0 to 700) against year (x-axis 1920 to 2020); series: around, money]
Cut-offs
• I am interested in relatively high frequency phenomena: for most features, I considered items that occurred at least 1000 times in either the 4 American or the 4 British corpora (for single words I also went down to 100 times)
• Focus mostly on items which showed a constant increase or decrease over time.
• Rather than look at every word, I have concentrated on those with the highest and lowest CVs or keyness scores (usually the top 10, 20 or 50 cases)
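The "constant increase or decrease" criterion above can be read as a strictly monotonic trend across the four sampling points. The sketch below applies that reading, which is an assumption about how the filter was applied, to the two words whose frequencies were given earlier; the function name is hypothetical.

```python
def trend(freqs):
    """Classify a frequency series as increasing, decreasing or mixed."""
    pairs = list(zip(freqs, freqs[1:]))
    if all(earlier < later for earlier, later in pairs):
        return "increasing"
    if all(earlier > later for earlier, later in pairs):
        return "decreasing"
    return "mixed"


frequencies = {
    "around": [110, 245, 407, 630],  # 1931, 1961, 1991, 2006
    "money": [306, 325, 306, 332],
}

for word, freqs in frequencies.items():
    print(word, trend(freqs))
```

Under this reading around counts as a constant increase, while money rises and falls and would be set aside as mixed.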
Densification
American English
British English
Both
Increasing
that’s, children’s,
Mom, ensure
*ism
phone
I’m, onto, Mum
UK, NHS, BBC
it’s, didn’t, don’t,
The ‘s’-genitive
Dad kids, TV
*ology
Et al
NN1 NN1 NN1
NN1 NN1 NN2
[Mean word length]
Decreasing
let us
automobile
upon, cannot, need not
more or less, at any rate
two or three, in view of
on the other hand
from time to time
II31 II32 II33 AT
Great Britain
The of-genitive
any one
on the part of
[Mean sentence length]
Colloquialisation
American English
British English
Both
Increasing
guys
okay
I
my
your
don’t want to
a kind of
I’m going to
VVGK
guy
a bit of
I have to
apostrophe use
Dad
kids
TV
then
you have to
Taboo language
Decreasing
I do not
Sexual and excretory swear words
[Figure: line chart of frequency (y-axis 0 to 400) against year (x-axis 1920 to 2020); series: American, British]
Profane uses of language
[Figure: line chart of frequency (y-axis 0 to 300) against year (x-axis 1920 to 2020); series: American, British]
Democratisation
American English
British English
Both
Increasing
feminist
of women in
might
gender access to
support for
could
can
Decreasing
men
of man
Mr and Mrs
Colonel
*fess
NNB
Sir
Rev.
shall
must
Mr
Mrs
NNB NP1 NP1
Spelling
Pattern (American/British) | American spelling example | British spelling example
or/our | color | colour
er/re | center | centre
ize/ise | organize | organise
ization/isation | civilization | civilisation
yze/yse | analyze | analyse
og/ogue | catalog | catalogue
e/ae | anemia | anaemia
e/oe | fetal | foetal
se/ce | defense | defence
l/ll | canceled | cancelled
ction/xion | connection | connexion
-/e | aging | ageing
toward/towards | toward | towards
-/st | while | whilst
gray/grey | gray | grey
-ize vs -ise
[Figure: chart of spelling preference (y-axis -100 to 100, with the two ends labelled "Preference for -ise" and "Preference for -ize") at 1931, 1961, 1991 and 2006; series: American, British]
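The slides do not give the formula behind the -100 to 100 preference scale. One plausible reading, offered here only as an assumption, is the difference between the counts of the two variants as a percentage of their combined count. The function name and the counts in the example are hypothetical, not figures from the corpora.

```python
def preference_score(count_a, count_b):
    """Score from -100 (only variant B) to 100 (only variant A).

    Assumed formula: 100 * (A - B) / (A + B).
    """
    total = count_a + count_b
    if total == 0:
        raise ValueError("neither variant occurs")
    return 100 * (count_a - count_b) / total


# Hypothetical counts of -ize and -ise spellings in one corpus.
print(preference_score(90, 10))   # 80.0: strong preference for variant A
print(preference_score(0, 50))    # -100.0: variant B used exclusively
```

A score near zero would mean that both spellings are used about equally often, which is the situation a "weakening grasp" on a spelling would move towards.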
Summary of findings
• American English – almost 100% adherence in 9 out of 17 cases
• British English – almost 100% adherence in 6 out of 17 cases
• British English – weakening grasp on colour, practise as verb, travelling, queueing
• American English – switched to amoeba, weakening grasp on queuing
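Summary of all spellings
[Figure: chart of overall spelling preference (y-axis -100 to 100) at 1931, 1961, 1991 and 2006; series: American, British. Data labels, in year order: -88.9, -85.4, -85.8, -86.4 for one series and 89.6, 93.1, 92.3, 91.9 for the other]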
Is automation of language responsible?
• Spell-check can be set to American or British English.
• Grammar and style checks may also influence writing, e.g. advice to avoid clichéd language: on the part of, the fact that, so far as, as to the, the spirit of, for the most part
• Avoid passive sentences – big decreases in taken, given and made (in passive cases), increases in I. Also decreases in dummy pronouns it and there as well as decreases in BE, VVN
Part of Speech tags
• Decreasing modality and hedging – less use of modal verbs, gradable adverbs (AE leading)
• 96% of RG consists of: very, so, as, too, about, quite, over, rather, far and pretty
• Rather – often used to ‘understate’ a negative evaluation: rather unseemly, rather unsightly, rather disappointing
• Very – to strengthen positive evaluation: good, great, useful, important etc.
• This could be a move towards densification too?
Gradable adverbs
[Figure: line chart of gradable adverb frequency (y-axis 0 to 6000) against year (x-axis 1920 to 2020); series: American (CV=11), British (CV=17)]
Translation Table (Telegraph 2 September 2013)
What the British say | What the British mean | What foreigners understand
That is a very brave proposal | You are insane | He thinks I have courage
Quite good | A bit disappointing | Quite good
I would suggest | Do it or be prepared to justify yourself | Think about the idea, but do what you like
Very interesting | That is clearly nonsense | They are impressed
You must come for dinner | It’s not an invitation, I’m just being polite | I will get an invitation soon
Could we consider some other opinions | I don’t like your idea | They have not yet decided
Decline of DDQ (which/what)
[Figure: line chart of DDQ frequency (y-axis 0 to 8000) against year (x-axis 1931 to 2001); series: American, British]
Are we using relative clauses differently?
• Theory 1 – which is declining due to other ways of writing relative clauses (e.g. that or the zero clause)
• Theory 2 - which is declining because relative clauses are declining generally
• Theory 3 – which is declining despite changes to relative clauses
• *_AT* (_{A})? *_N* (_{N})? (that|which)
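The query above looks for an article, an optional adjective, a noun, an optional second noun and then that or which. As a rough sketch of the same idea in Python, the regular expression below runs over text in word_TAG format. The tag prefixes (AT for articles, J for adjectives, N for nouns) and the tagged example sentence are assumptions made for illustration; this is not the corpus tool or the exact query syntax used in the talk.

```python
import re
from collections import Counter

# article, optional adjective, noun, optional second noun, then that/which
RELATIVE = re.compile(
    r"\S+_AT\S*"          # article, e.g. the_AT
    r"(?: \S+_J\S*)?"     # optional adjective, e.g. old_JJ
    r" \S+_N\S*"          # noun, e.g. house_NN1
    r"(?: \S+_N\S*)?"     # optional second noun
    r" (that|which)_\S+"  # the relativiser, captured
)

# Hypothetical tagged text, invented for this example.
tagged = ("the_AT old_JJ house_NN1 which_DDQ stood_VVD there_RL and_CC "
          "the_AT book_NN1 that_CST I_PPIS1 read_VVD")

counts = Counter(RELATIVE.findall(tagged))
print(counts["which"], counts["that"])
```

A pattern this simple will pick up some constructions that are not relative clauses, so the hits would still need manual checking before the counts are compared across corpora.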
The “which” relative clause
[Figure: line chart of frequency (y-axis 0 to 2500) against year (x-axis 1931 to 2001); series: American, British]
The “that” relative clause
[Figure: line chart of frequency (y-axis 0 to 2500) against year (x-axis 1931 to 2001); series: American that, British that, American which, British which]
Finding zero relative clauses
_A* (_{A})* (_{N})* _N* (he|she|they|you|it|we)
False positives (around 10%)
• Shake the hand of a squaddie They deserve our thanks
• To a certain extent it is horribly dangerous
The majority replace that not which
• Apart from anything else , there was a feeling he had become a joke figure
• by the time you get this Mary will be turned three
The zero relative clause
[Figure: line chart of frequency (y-axis 0 to 2500) against year (x-axis 1931 to 2001); series: American, British]
Grammar/style checkers
Key Semantic tags
Key in American English 1931 1961 1991 2006
G2.1 Law and Order ✓ ✓ ✓ ✓
G3 Warfare, defence and army; weapons ✓ ✓ ✓ ✓
S5+ Belonging to a group ✓ ✓ ✓
Y2 Information, technology and computing ✓ ✓ ✓ ✓
Key in British English 1931 1961 1991 2006
A3+ Existing ✓ ✓ ✓ ✓
A13.3 Degree: Boosters ✓ ✓ ✓
A13.5 Degree: Compromisers ✓ ✓ ✓
Qualitative analysis
• Information, technology and computing is key in AmE largely because the American spelling program(s) is frequent and receives this tag, whereas the British spelling programme(s) does not.
• Warfare – common in the press – Vietnam, Gulf War, War on Terror. The firearms debate in the US.
G2.1 Law and Order
[Figure: line chart of frequency (y-axis 0 to 6000) against year (x-axis 1931 to 2001); series: American, British]
S5+ Belonging to a group
[Figure: line chart of frequency (y-axis 0 to 6000) against year (x-axis 1931 to 2001); series: American, British]
Conclusions
• General trends in densification, democratisation, colloquialisation
• More cases where American English seems to have led change, especially for grammatical tags and tag sequences
• Moves towards convergence in the latter time period.
• But spelling differences are likely to hold in the future (word processing options?)
Future predictions
• Further densification: e.g. “zero” forms, “This.”, “Because science”, missing apostrophes, acronyms, emojis
• Vanishing titles and references to gender: gender-neutral pronouns.
• Less hedging, more active forms, more first and second person pronouns, but more nominalisation.
• Taboo language more common (except for religious taboo)
• Wider vocabulary
Thank you
• Baker, P. (2011) 'Times may change but we'll always have money: a corpus driven examination of vocabulary change in four diachronic corpora.' Journal of English Linguistics 39: 65-88.
• Finegan, E. (2004) American English and its distinctiveness. In E. Finegan and J. R. Rickford (eds) Language in the USA. Cambridge: Cambridge University Press, pp. 18-38.
• Leech, G. (2002) Recent grammatical change in English: data, description, theory. In K. Aijmer and B. Altenberg (eds) Proceedings of the 2002 ICAME Conference. Gothenburg, pp. 61-81.
• Potts, A. and Baker, P. (2012) 'Does semantic tagging identify cultural change in British and American English?' International Journal of Corpus Linguistics 17(3): 295-324.
• Smith, N. (2002) Ever moving on? The progressive in recent British English. In P. Peters, P. Collins and A. Smith (eds) New frontiers of corpus research. Amsterdam: Rodopi, pp. 317-330.