
Terttu Nevalainen

Research Unit for Variation, Contacts and Change in English

Department of English, University of Helsinki


Historical sociolinguistics as corpus linguistics

Challenge: the ‘bad-data problem’

Historical linguistics can then be thought of as the art of making the best use of bad data. The art is a highly developed one, but there are some limitations of the data that cannot be compensated for. Except for very recent times, no phonetic records are available for instrumental measurements. We usually know very little about the social position of the writers and not much more about the social structure of the community. Though we know what was written, we know nothing about what was understood, and we are in no position to perform controlled experiments on crossdialectal comprehension. (Labov 1994: 11)

… and how to deal with it

by systematic corpus compilation

collecting metadata

reconstructing earlier communities

building up baseline evidence

Topics addressed

Kinds of corpora

What is a historical sociolinguistic corpus?

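HC (Helsinki Corpus), CEEC (Corpus of Early English Correspondence), OBC (Old Bailey Corpus), CED (Corpus of English Dialogues)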

What can historical sociolinguistic corpora tell us about language change?

a case study

Kinds of corpora

synchronic vs. diachronic

single-genre vs. multigenre

special purpose vs. general

small and tidy vs. big and messy

flat vs. annotated

> in each pair, the first category is more frequent than the second in sociolinguistic corpora

What is a sociolinguistic corpus?

sampling unit: person

sampling frame: regional variation, variation in socio-economic status, gender, age, ethnicity, etc.

e.g. Sali Tagliamonte’s Roots Corpora:

Northwest England, Lowland Scotland, Northern Ireland

110 speakers, c. 1 million words
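The idea of the person as sampling unit can be illustrated with a small Python sketch that tabulates speakers by the cells of a sampling frame. The Speaker fields, the cell_counts helper and the four speakers are hypothetical, chosen only for illustration; they do not reproduce the Roots Corpora's actual metadata.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    """Hypothetical speaker record: one person = one sampling unit."""
    speaker_id: str
    region: str
    gender: str
    age_group: str


def cell_counts(speakers, *variables):
    """Count speakers in each cell of a sampling frame built from `variables`."""
    return Counter(tuple(getattr(s, v) for v in variables) for s in speakers)


# Illustrative speakers only; not drawn from any real corpus.
speakers = [
    Speaker("s1", "Northwest England", "female", "older"),
    Speaker("s2", "Northwest England", "male", "older"),
    Speaker("s3", "Lowland Scotland", "female", "younger"),
    Speaker("s4", "Northern Ireland", "female", "older"),
]

by_region_gender = cell_counts(speakers, "region", "gender")
print(by_region_gender[("Northwest England", "female")])  # 1
print(by_region_gender[("Lowland Scotland", "male")])     # 0 (an empty cell)
```

Cells that return 0 show at a glance where a sample leaves part of its frame unfilled.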

Historical ‘proto-corpora’

diachronic

multigenre

general-purpose

small and tidy

increasingly grammatically annotated

include basic metadata

Sub-period          Words       %

OLD ENGLISH
I    -850           2 190       0.5
II   850-950        92 050      22.3
III  950-1050       251 630     60.9
IV   1050-1150      67 380      16.3
Total               413 250     100.0

MIDDLE ENGLISH
I    1150-1250      113 010     18.6
II   1250-1350      97 480      16.0
III  1350-1420      184 230     30.3
IV   1420-1500      213 850     35.1
Total               608 570     100.0

EModE, BRITISH
I    1500-1570      190 160     34.5
II   1570-1640      189 800     34.5
III  1640-1710      171 040     31.0
Total               551 000     100.0

The Helsinki Corpus of English Texts.

(http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/index.html)
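The percentages in the Helsinki Corpus table are each sub-period's share of its period total. The following Python sketch recomputes them from the word counts, assuming plain rounding to one decimal place. On that assumption Early Modern English II comes out at 34.4 rather than the printed 34.5 (189 800 / 551 000 ≈ 34.45%), so the printed column appears to have been adjusted to sum to 100.0.

```python
# Word counts per sub-period, copied from the Helsinki Corpus table above.
HC_WORDS = {
    "Old English": {"I": 2_190, "II": 92_050, "III": 251_630, "IV": 67_380},
    "Middle English": {"I": 113_010, "II": 97_480, "III": 184_230, "IV": 213_850},
    "Early Modern English": {"I": 190_160, "II": 189_800, "III": 171_040},
}


def period_shares(counts):
    """Return the period total and each sub-period's share in per cent (1 decimal)."""
    total = sum(counts.values())
    return total, {sub: round(100 * words / total, 1) for sub, words in counts.items()}


for period, counts in HC_WORDS.items():
    total, shares = period_shares(counts)
    print(f"{period}: {total:,} words, shares {shares}")
```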

Resampling general-purpose, multigenre diachronic corpora for sociolinguistic studies

Helsinki Corpus (before 850 to 1710)

letters

diaries

trials

Helsinki Corpus of Older Scots (1450-1700)

letters

diaries

trials

A Representative Corpus of Historical English Registers (ARCHER; 1650-1990)

letters

Letters, diaries and trials in the Helsinki Corpus.

Sub-period Words (total) %

OLD ENGLISH
I    -850           -
II   850-950        -
III  950-1050       -
IV   1050-1150      -
Total               - (413 250)         - (100.0)

MIDDLE ENGLISH
I    1150-1250      -
II   1250-1350      -
III  1350-1420      5 010               0.8
IV   1420-1500      19 090              3.1
Total               24 100 (608 570)    3.9 (100.0)

EMODE, BRITISH
I    1500-1570      45 970              8.3
II   1570-1640      44 000              8.0
III  1640-1710      43 980              8.0
Total               133 950 (551 000)   24.3 (100.0)

Lady Hoby’s diary (Margaret Hoby, 1571-1633, http://www.oxforddnb.com/view/article/37555)

(Munday the 17)

After priuat praier I saw a mans Legg dressed, took order for #

thinges in the house, and wrough tell dinner time : after dinner I #

went about the house, and read of the arball : then I tooke my

Cocth and Came to Linton, wher, after I had talked a whill with

my mother, examened my selfe and praied, I went to supper, and

then praied publeckly, and so to bed :

(E2 NN DIARY HOBY 72)

Header code Explanation

<QE2 NN DIARY HOBY> (text identifier)

<N DIARY HOBY> (name of text)

<A HOBY MARGARET> (author)

<C E2> (corpus period)

<O 1570-1640> (period of original)

<D ENGLISH> (dialect)

<V PROSE> (verse/prose)

<T DIARY PRIV> (text type)

<W WRITTEN> (relationship to spoken language)

<X FEMALE> (sex of author)

<Y 20-40> (age of author)

<H HIGH> (social rank of author)

<I INFORMAL> (setting)

<Z NARR NON-IMAG> (prototypical text category)

HC reference codes (http://www.helsinki.fi/varieng/CoRD/corpora/HelsinkiCorpus/generalintro.html)
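Header codes of this kind are straightforward to read programmatically, which is what makes resampling a corpus by its metadata practical. The Python sketch below assumes each code is a single capital letter followed by its value inside angle brackets, as in the Hoby header above; parse_header and matches are hypothetical helper names, not part of any Helsinki Corpus distribution.

```python
import re

# One capital letter, optional whitespace, then the value up to the closing ">".
COCOA_CODE = re.compile(r"<([A-Z])\s*([^>]*)>")


def parse_header(text):
    """Map each one-letter code to its value, e.g. {"X": "FEMALE", ...}."""
    return {code: value.strip() for code, value in COCOA_CODE.findall(text)}


def matches(meta, **wanted):
    """True if every requested code has the requested value."""
    return all(meta.get(code) == value for code, value in wanted.items())


header = """
<N DIARY HOBY> <A HOBY MARGARET> <C E2> <O 1570-1640>
<D ENGLISH> <V PROSE> <T DIARY PRIV> <W WRITTEN>
<X FEMALE> <Y 20-40> <H HIGH> <I INFORMAL> <Z NARR NON-IMAG>
"""

meta = parse_header(header)
print(meta["A"], "|", meta["X"], "|", meta["H"])
print(matches(meta, X="FEMALE", H="HIGH"))
```

Applied to every text header in turn, a filter such as matches(meta, X="FEMALE") is one way to pull a socially defined subcorpus out of a general-purpose corpus.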

Historical sociolinguistic corpora

diachronic

single-genre: typically letters or trials, which share a number of linguistic characteristics with face-to-face conversation

provide data from known individuals, similar to the interview data used in present-day sociolinguistic research

purpose-built

small to medium size

increasingly grammatically annotated

include metadata

The CEEC family of corpora (http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/index.html).

              CEEC 1998      CEEC Extension   CEEC Supplement   TOTALS
words         2,597,795      2,219,422        442,484           5,259,701
collections   96             77               19                192
letters       5,961          4,923            829               11,713
writers       778            308              94                1,180
time span     c. 1410-1681   1653-1800        1402-1663         1402-1800
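The TOTALS column is the sum of the three component corpora. A short Python check follows; it copies the figures from the table and assumes, as the printed writers total appears to, that no writer is counted in more than one component.

```python
# Component figures copied from the CEEC family table above.
COMPONENTS = {
    "CEEC 1998": {"words": 2_597_795, "collections": 96, "letters": 5_961, "writers": 778},
    "CEEC Extension": {"words": 2_219_422, "collections": 77, "letters": 4_923, "writers": 308},
    "CEEC Supplement": {"words": 442_484, "collections": 19, "letters": 829, "writers": 94},
}
PRINTED_TOTALS = {"words": 5_259_701, "collections": 192, "letters": 11_713, "writers": 1_180}


def column_sums(rows):
    """Add up each measure across the component corpora."""
    sums = {}
    for figures in rows.values():
        for measure, value in figures.items():
            sums[measure] = sums.get(measure, 0) + value
    return sums


sums = column_sums(COMPONENTS)
# Collect any measure whose computed sum differs from the printed total.
mismatches = {m: (sums[m], PRINTED_TOTALS[m]) for m in PRINTED_TOTALS if sums[m] != PRINTED_TOTALS[m]}
print(mismatches or "all totals agree")
```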

Published versions of the CEEC corpora (http://www.helsinki.fi/varieng/CoRD/corpora/CEEC/index.html).

              CEEC Sampler (1999)   Parsed CEEC (2006)*
words         450,085               2,159,132
collections   23                    84
letters       1,123                 4,970
writers       194                   666
time span     1418-1681             1410-1681

* Historical Sociolinguistics Team’s collaborative project with the University of York, Prof. Anthony Warner, Dr. Susan Pintzuk and Dr. Ann Taylor. Tagging by Arja Nurmi and parsing by Ann Taylor.

The Corpus of Early English Correspondence (1998 version: c. 2.6 million words; 778 writers; c. 6,000 letters)

TIME SPAN: 1410–1681

WRITERS BY SOCIAL RANK:
Royalty: 3%
Nobility: 15%
Gentry: 39%
Clergy: 14%
Professionals: 11%
Merchants: 8%
Other non-gentry: 10%

WRITERS BY DOMICILE:
Court: 8%
London: 14%
East Anglia: 17%
North: 12%
Other: 49%

WRITERS BY GENDER:
Female: 26%
Male: 74%

Compilers: Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi, Minna Palander-Collin

Gregory King’s estimate of population and wealth in England and Wales, 1688: cumulative percentages of population (Nevalainen 2010: 8).

Social status                       Average annual income (£)   Number of families
Temporal lords                      6,060                       200
Baronets                            1,500                       800
Bishops                             1,300                       26
Knights                             800                         600
Esquires                            562.5                       3,000
Merchants, greater                  800                         5,264
Gentlemen                           280                         15,000
Persons in greater offices          240                         5,000
1% of population
Merchants, lesser                   400                         21,057
Artisans and handicrafts            200                         6,745
Law                                 154                         8,062
Persons in lesser offices           120                         5,000
Freeholders, greater                91                          27,568
5% of population
Naval officers                      80                          5,000
Clergymen, greater                  72                          2,000
Military officers                   60                          4,000
Science, liberal arts               60                          12,898
Freeholders, lesser                 55                          96,490
Clergymen, lesser                   50                          10,000
10% of population
Shopkeepers and tradesmen           45                          101,704
20% of population
Farmers                             42.5                        103,382
Manufacturing trades                38                          162,863
Building trades                     25                          73,018
Common seamen                       20                          50,000
Miners                              15                          14,240
40% of population
Labouring people and outservants    15                          284,997
60% of population
Common soldiers                     14                          35,000
Cottagers and paupers               6.5                         313,183
Vagrants                            2                           23,489

I am sure, howsoever I measurd by the cold clime Aprill for a late May, or missed to signe my name, I omitted it not for want of grace, but for hast; which shall be at layzure mended. The hand as I take it was, as this, my owne, and therefore my owne, and not my secretarie’s fault; and I confesse I love to write no dobles of letters, but will affirm my hand and it whansoever your Grace shall nede to call uppon it.

(CEEC, Peregrine Bertie 1598 (HUTTON) 131)

Holograph vs. autograph?

If you could read my lres [letters] your self I would have written largelie of your owne buisenes, And because I will have none acquainted wth them but who you thinke fitt besides your self, I have taken the paines to write it in Romaine hand in this inclosed paper, wch I thinke your self can read.

(CEEC, Anthony Antony 1615 (Stockwell) I,37)

Secretary vs. italic hand?

http://www.crazydiamond.co.uk/fonts/

Hands: secretary (above) and italic (below)

George Puttenham: The Arte of English Poesie (1589)

GEORGE CELY AT ANTWERP TO RICHARD CELY THE YOUNGER AT CALAIS, 27 SEPTEMBER 1476

Ryght whellbelovyd brothyr, I recomeavnde me vnto as lowyngly as I con or may. Fordyrmor, plesythe yt yow to vndyrstonde I hawe resseywyd an letter ffrom yow, the wheche I hawe rede and <P 5> do whell vndyrstonde/ I hawe wrytt owr ffathyr an answer therof, etc. Owr ffathyr wold that I showd hy me vnto Calles. Ytt ys so I resseywyd of Thomas an byll of an Cxxj li. vj s. vj d., wherof I con resseywe but lxx li. Fl., that hawe I resseywyd and Thomas Kesten hathe promyssyd me to delyuyr me the rest, and mor to. Allso ther ys an veryavns bytuyxt Kesten and John Vandyrhay ffor ix sarplers woll. Thys ys an shrowd [{matter{] : I whas at Mekyllyn and saw yt yll woll. Yt ys thys ys bytuyxt Kesten and hym.

Cely letter (first half): flat text

Ryght_ADV whellbelovyd_ADV+VAN brothyr_N ,_, I_PRO

recomeavnde_VBP me_PRO vnto_P as_ADVR lowyngly_ADV as_P

I_PRO con_MD or_CONJ may_MD ._. Fordyrmor_ADVR+QR ,_,

plesythe_VBP yt_PRO yow_PRO to_TO vndyrstonde_VB I_PRO

hawe_HVP resseywyd_VBN an_D letter_N ffrom_P yow_PRO ,_,

the_D wheche_WPRO I_PRO hawe_HVP rede_VBN and_CONJ

<P_5> do_DOP whell_ADV vndyrstonde_VB __. I_PRO hawe_HVP

wrytt_VBN owr_PRO$ ffathyr_N an_D answer_N therof_ADV+P ,_,

etc_FW ._. Owr_PRO$ ffathyr_N wold_VBP that_C I_PRO

showd_MD hy_VB me_PRO vnto_P Calles_NPR ._. Ytt_PRO

ys_BEP so_ADV I_PRO resseywyd_VBD of_P Thomas_NPR an_D

byll_N of_P an_D Cxxj_NUM li._NS vj_NUM s._NS vj_NUM d._NS ,_,

wherof_WADV+P I_PRO con_MD resseywe_VB but_FP lxx_NUM

li_NS ._.

Cely letter (beginning): tagged text
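In the tagged version, each word carries its part-of-speech label after an underscore (e.g. whellbelovyd_ADV+VAN). As a minimal sketch, assuming whitespace-separated tokens and a tag that follows the last underscore, such lines can be read into (word, tag) pairs in Python; parse_tagged is a hypothetical helper, not the tooling actually used for the Parsed CEEC.

```python
from collections import Counter


def parse_tagged(text):
    """Split whitespace-separated word_TAG tokens into (word, tag) pairs.

    The split is made at the last underscore, so compound tags such as
    ADV+VAN or PRO$ stay intact. Tokens starting with "<" (corpus markup
    such as <P_5>) and tokens without any underscore are skipped.
    """
    pairs = []
    for token in text.split():
        if token.startswith("<"):
            continue
        word, sep, tag = token.rpartition("_")
        if sep:
            pairs.append((word, tag))
    return pairs


sample = ("Ryght_ADV whellbelovyd_ADV+VAN brothyr_N ,_, I_PRO "
          "recomeavnde_VBP me_PRO vnto_P as_ADVR lowyngly_ADV as_P <P_5>")
pairs = parse_tagged(sample)
tag_counts = Counter(tag for _, tag in pairs)
print(pairs[:3])
print(tag_counts.most_common(2))
```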

( (IP-MAT (NP-VOC (ADJP (ADV Ryght) (ADV+VAN whellbelovyd))
                  (N brothyr))
          (, ,)
          (NP-SBJ (PRO I))
          (VBP recomeavnde)
          (NP-OB1 (PRO me))
          (PP (P vnto)
              (NP *))
          (ADVP (ADVR as) (ADV lowyngly)
                (PP (P as)
                    (CP-CMP (WADVP-1 0)
                            (C 0)
                            (IP-SUB (ADVP *T*-1)
                                    (NP-SBJ (PRO I))
                                    (MD (MD con) (CONJ or) (MD may))
                                    (VB *)))))

Cely letter (very beginning): parsed text
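The parsed text uses labelled brackets: each node opens with its label and closes with a matching parenthesis. The Python sketch below reads one complete, balanced tree into nested lists and lists its preterminal (tag, word) pairs. read_tree and leaves are illustrative names; the Cely excerpt above stops before its outermost brackets close, so it would have to be completed before this sketch could read it.

```python
import re


def read_tree(text):
    """Parse one balanced, labelled-bracket tree into nested lists.

    Each node becomes [label, child, ...]; words are plain strings.
    The unlabelled outermost bracket gets the label "".
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return [label, *children]

    return node()


def leaves(tree):
    """Return (tag, word) pairs for preterminals, left to right.

    Empty elements such as * or 0 are returned like ordinary words.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, children[0])]
    pairs = []
    for child in children:
        if isinstance(child, list):
            pairs.extend(leaves(child))
    return pairs


tree = read_tree("( (IP-MAT (NP-SBJ (PRO I)) (VBP recomeavnde) (NP-OB1 (PRO me))))")
print(tree[1][0])
print(leaves(tree))
```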

Trials: The Proceedings of the Old Bailey, 1674-1913

A fully searchable edition of the largest body of texts detailing the lives of non-elite people ever published, containing 197,745 criminal trials held at London's central criminal court.

http://www.oldbaileyonline.org/index.jsp

"probably the best accounts we shall ever have of what transpired in ordinary English criminal courts before the later eighteenth century".

the material reported was neither invented nor significantly distorted.

at the same time, the Proceedings are far from comprehensive transcripts of what was said in court.

(see Huber 2007)

“The Old Bailey, Known Also as the Central Criminal Court” (1808)

(http://en.wikipedia.org/wiki/File:Old_Bailey_Microcosm_edited.jpg)

Old Bailey Proceedings: number of words and proportion of direct speech per decade, 1734-1834 (Huber 2007).

Corpus of English Dialogues (compiled under the supervision of Merja Kytö and Jonathan Culpeper).

Degree of narratorial intervention, by authentic vs. constructed dialogue:

Minimum narratorial intervention
  Authentic dialogue: Trial Proceedings 285,660 words
  Constructed dialogue: Drama Comedy 238,590 words; Didactic Works (A. Other 162,250 words; B. Language Teaching 74,390 words); Miscellaneous 25,970 words

Considerable narratorial intervention
  Authentic dialogue: Witness Depositions 172,940 words
  Constructed dialogue: Prose Fiction 223,890 words

Total word count
  Authentic dialogue: 458,600 words
  Constructed dialogue: 725,090 words

http://www.engelska.uu.se/corpus.html

Period word counts for direct speech in the Corpus of English Dialogues (compiled under the supervision of Merja Kytö and Jonathan Culpeper).

Period            Period totals
1  1560-1599      140,410
2  1600-1639      145,880
3  1640-1679      192,150
4  1680-1719      237,030
5  1720-1760      178,630
Total             894,100

(Source: http://www.engelska.uu.se/corpus.html)

Replacement of THOU by YOU: HC1

Helsinki Corpus: c. 500 instances of THOU in 1500-1570 (sermons, the Bible, but also handbooks, educational treatises, fiction, comedy and trials):

This whete and rye that thou shalt sowe ought to be very clene of wede, and therfore er thou thresshe thy corne open thy sheues and pyke oute all maner of wedes, and than thresshe it and wynowe it clene, & so shalt thou haue good clene corne an other yere. (John Fitzherbert, The Boke of Husbandry 1534: 41).

Replacement of THOU by YOU: HC2

Helsinki Corpus: c. 350 instances of THOU in 1570-1640 (sermons, the Bible, comedy, fiction, trials)

sociodialectal narrowing during the seventeenth century:

- in comedies and fiction, for example, thou is found in the mouths of servants and country people.

- to some extent, thou continues to be used by social superiors addressing their inferiors.

rare in letters

Users and non-users of THOU in the CEEC (Nevala 2004: 165).

Writer/recipient relation Users of THOU Non-users of THOU

Family members

15th century 2 (4%) 50 (96%)

16th century 6 (8%) 97 (92%)

17th century 21 (12%) 158 (88%)

18th century 1 (5%) 20 (95%)

Close friends

15th century 0 (0%) 8 (100%)

16th century 0 (0%) 2 (100%)

17th century 7 (21%) 27 (79%)

18th century 1 (6%) 16 (94%)

Findings on CEEC (Nevala 2004)

all THOU users also use YOU

15th–16th c.: mostly from London and the Court

17th c.: mostly from other parts of England

18th c.: in poetical and biblical contexts

17th c. writer/recipient relationships: male writers to their wives, female writers to their husbands and children

some typical users: the Kentish gentleman Henry Oxinden writing to his wife, and Lady Katherine Paston, an East Anglian gentlewoman, writing to her son

‘Heavy’ THOU users (1): Henry Oxinden

Deare Heart

How glad I was to heare from thee I cannott well expresse: I will assure thee, leaving all manner of expressions out which are not as reall as God is true, I do exceedingly love and honour thee. And the more because of thy industrie in advancing her who if this businesse in hand faile, cannot expect anie thing of consequence. Prethee if the rub be onlie in her, remove itt by all meanes possible, and I shall thinke nothing too much for thee that I may be able to give thee. I would thou didst but know one halfe of my ardent affections towards thee and then I dare say thou wouldst run through fire and water to effect my desires.

(Henry Oxinden to Katherine Oxinden, 1647)

‘Heavy’ THOU users (2): Katherine Paston

My good will: Christ Iesus blese the ever: I did take thy wrightinge to me in very kinde parte, seinge that at that time thow mightest haue pretended wearines withe travill yett woldest not make that any lett to hinder me of thy most louinge and respectiue lines, the which wear and ever shall be most well com to me, I was glad to heer of your prosperous Iorny, and of the kind wellcom which you fownd from that worthy master./whom, I wold by any means thou sholdest haue a very reverend respect ofe:/ and beware good child that thou be not too talketiue befor him, but only to learne what is fittinge behauiour for you to vse before him and that observe and doe:

(Lady Katherine Paston to William Paston, April 1624)

Pursuing Region: some cross-corpus comparisons

THOU vs. YOU

corpora: trials

Old Bailey Proceedings, 1674-1913

English Witness Depositions 1560–1760

Q: the disappearance of THOU?

The use of THOU vs. YOU in Old Bailey trials (London).

Period THOU % THOU YOU % YOU

1700-1709 7 28 % 19 (9 items) 72 %

1710-1719 3 1 % 240 (75 items) 99 %

1720-1729 2 < 0.5% (514 items) > 99.5 %

1730-1739 20 < 0.5% (1,539 items) > 99.5 %

1740-1749 12 < 0.5% (2,837 items) > 99.5 %

William Wilson … one silver watch, val. 3 l. one metal watch, val. 4 l. one three pound twelve shill. piece, two thirty six shill. pieces, and three moidores, the property of Joseph Millikin … did steal, take and carry away, 18 June, 1750

We prevailed upon the countryman to change his dress, by pulling his great coat off, and I put my hat and wig on his head, and put on the countryman's wig, and walked up after him. We gave him charge if that was the man to give us notice and we would assist him; he went and took a survey of the man, went past him a few yards, I planted myself by the prisoner, the countryman turn'd upon him, and said, '' mon thou '' hast not altered thy heed if thou hast thy dress, '' thou art the mon that robbed me.''

> report of a ‘countryman’s’ words

Old Bailey: William Davis was indicted for stealing one grey gelding, value 3 l. the property of John Southal, 16 January, 1751

Said I to the prisoner, how came you by this horse? said he, he had been in a pound, and was brought to me; I took him to a justice near there; the justice ask'd him where he was going; said he, to service: said the justice, how much money hast thou in thy pocket? said he, but six-pence; said the justice, thou settest out very empty; the saddle was mine.

> report of a justice talking to the prisoner

The use of THOU vs. YOU in English Witness Depositions (Kytö et al. 2007).

Region/Period              THOU   % THOU   YOU   % YOU
North-east (1696–1760)     30     39 %     47    61 %
North-west (1724–1758)     30     35 %     56    65 %
East (1700–1754)           0      -        55    100 %
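The percentages in the witness-deposition table are shares of THOU among all THOU and YOU tokens in each region. This Python sketch recomputes them from the counts (thou_share is a hypothetical helper name).

```python
# THOU and YOU token counts per region, copied from the table above.
DEPOSITIONS = {
    "North-east (1696-1760)": (30, 47),
    "North-west (1724-1758)": (30, 56),
    "East (1700-1754)": (0, 55),
}


def thou_share(thou, you):
    """Percentage of THOU among all THOU + YOU tokens, rounded to a whole number."""
    return round(100 * thou / (thou + you))


for region, (thou, you) in DEPOSITIONS.items():
    print(f"{region}: {thou_share(thou, you)}% THOU of {thou + you} tokens")
```

With 77 and 86 tokens in the two northern samples, a single token shifts the share by more than a percentage point, which is worth keeping in mind when comparing regions.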

English Witness Depositions examples (Kytö et al. 2007)

said he, Thou knows y=t= y=u= and thy Daughter Murthered a man, and conveyed him away. (North-west: National Archives, London. Palatinate of Lancaster, Crown Court Depositions. MS PL 27/2, the information of Thomas Airton, 1697)

[...] the said Bassett said Damm ye for a whore youhave pict my Pockett (East: Norfolk Record Office, Norwich. Norwich Quarter Sessions files, interrogatories and depositions. MS NCR Case 12b(2), the information of Ellen Wakefeild, 1714)

Conclusions

historical sociolinguistic corpora usefully complement each other

chronologically

regionally

socially

building up baseline evidence is necessary for a comprehensive picture of historical developments

new corpora are always needed to make our ‘bad data’ better!

The work goes on …

Find out more at: http://www.helsinki.fi/varieng/CoRD/index.html

CoRD is an open-access online resource on which academic corpus compilers can make available basic information about their corpora. It is part of the eVARIENG online services, offered and maintained by the Research Unit for Variation, Contacts and Change in English (VARIENG).

References

Beal, J.C., K.P. Corrigan & H.L. Moisl (eds) (2007). Creating and Digitizing Language Corpora. Vol. 2: Diachronic databases. Houndmills: Palgrave Macmillan.

Huber, M. (2007). The Old Bailey Proceedings, 1674-1834. Evaluating and annotating a corpus of 18th- and 19th-century spoken English. Studies in Variation, Contacts and Change in English, Volume 1, ed. by A. Meurman-Solin & A. Nurmi. http://www.helsinki.fi/varieng/journal/volumes/01/huber/

Kytö, M., P. Grund & T. Walker (2007). Regional variation and the language of English witness depositions 1560-1760: constructing a 'linguistic' edition in electronic form. Studies in Variation, Contacts and Change in English, Volume 2, ed. by P. Pahta et al. http://www.helsinki.fi/varieng/journal/volumes/02/kyto_et_al/

Labov, W. (1994). Principles of Linguistic Change. Vol. 1: Internal factors. Oxford: Blackwell.

Nevala, M. (2004). Address in Early English Correspondence: Its Forms and Socio-pragmatic Functions. Mémoires de la Société Néophilologique de Helsinki 64. Helsinki: Société Néophilologique.

Nevalainen, T. (2010). Theory and practice in English historical sociolinguistics. Studies in Modern English 26: 1–24.

Nevalainen, T. & H. Raumolin-Brunberg (2003). Historical Sociolinguistics: Language Change in Tudor and Stuart England. London: Longman.

Romaine, S. (1982). Socio-historical Linguistics: Its Status and Methodology. Cambridge: CUP.

Tagliamonte, S. (2008). Conversations from the speech community: Exploring language variation in synchronic dialect corpora. The Dynamics of Linguistic Variation, ed. by T. Nevalainen, I. Taavitsainen, P. Pahta & M. Korhonen, 107-128. Amsterdam/Philadelphia: Benjamins.