Investigating speech, thought and writing presentation in a corpus of spoken British English
An AHRB funded project under the supervision of
Mick Short, Elena Semino and Tony McEnery
Research Assistants
John Heywood and Dan McIntyre
Project outline
To compare speech, thought and writing presentation in spoken and written English.
To build a new corpus of 260,000 words of spoken British English to compare with the ST&WP Written English Corpus (1995-99).
To investigate the presentation of speech, thought and writing in the ST&WP Spoken Corpus by tagging with the Leech and Short (1981) category set.
To further test and adapt the Leech and Short (1981) model of S&TP.
The project is funded until February 2003.
Construction of the corpus
120 texts - approximately 260,000 words.
Texts rich in ST&WP taken from the British National Corpus (BNC) and the Centre for North West Regional Studies (CNWRS) oral history archives at Lancaster University.
CNWRS interview tapes digitised to be time-aligned with text.
Number and distribution of NWRS files in the corpus
NWRS Archive
Family and Social Life Archive Childhood and Schooling Archive
Male Female Male Female
1890-1940 1940-1970 1890-1940 1940-1970
7 records 7 records 8 records 8 records 15 records 15 records
i.e. 60 files with an equal balance of male and female speakers in each age-range
Number and distribution of BNC files in the corpus
BNC spoken data
Spoken Demographic; Spoken Context-Governed
Male: 0-14, 15-24, 25-34, 35-44, 45-59, 60+ (5 files in each age range)
Female: 0-14, 15-24, 25-34, 35-44, 45-59, 60+ (5 files in each age range)
i.e. 60 files with an equal balance of male and female speakers in each age-range
The development of the tag-set
Leech & Short (1981)
NRA NRSA NRS/IS FIS NRS/DS FDS
NRTA NRT/IT FIT NRT/DT FDT
The ST&WP Written Project (1995…)
3 main genres: Fiction, Biography & Autobiography, and Newspaper Journalism, each divided into Serious/Popular sections.
N NV NRSA-P NRS/IS FIS NRS/DS FDS
N NI NRTA-P NRT/IT FIT NRT/DT FDT
N NW NRWA-P NRW/IW FIW NRW/DW FDW
embedded, hypothetical, inferred, quote
The development of the tag-set – new tags
RM
A RV RSA-P RS/IS FIS RS/DS FDS
A RI RTA-P RT/IT FIT RT/DT FDT
A RN RWA-P RW/IW FIW RW/DW FDW
The ST&WP Spoken Project (2001)
BNC spoken demographic data and NWRS oral history interviews
embedded, negative / absence, hypothetical, inferred, quote, reiterated, interrogative, imperative, uncompleted, 2 / 3 / 4
A 15-field tag-set: 5 main categories
FIELD | CHARACTER | ‘VALUE’
1 | x, A, F | Anything!, Free
2 | x, #, R, I, D | Representation, Indirect, Direct
3 | x, S, T, W, V, I, N, M | Speech, Thought, Writing, Voice, Internal state, WritiNg, Mention
4 | x, A | Act
5 | x, P | toPic
A 15-field tag-set: 10 category attributes
FIELD | CHARACTER | ‘VALUE’
6 | x, #, 1, 2, 3, 4 | # = odd, interesting borderline cases; numbers = repeated (-ing or -ed) adjacent categories
7 | xe | embedded
8 | xxg/a | negative action etc. (e.g. 'we weren't allowed to go'); absence (e.g. 'I didn't say anything')
9 | xxxh | hypothetical
10 | xxxxi | inferred
11 | xxxxxq | quote
12 | xxxxxxr | iterative
13 | xxxxxxxv/p | interrogative, imperative
14 | xxxxxxxxu | uncompleted
15 | xexxxxxxx2 | level of embedding (2, 3, 4)
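The two tables above can be read as a positional code, and a small decoder makes that reading concrete. The sketch below is illustrative only and is not the project's own tooling. It assumes that a tag is a 15-character string with one character per field, that "x" marks an unset field, that in field 8 "g" means negative and "a" means absence, and that in field 13 "v" means interrogative and "p" means imperative. The names FIELDS and decode are hypothetical.

```python
# Illustrative decoder for the 15-field tags (a sketch, not the project's software).
# Assumptions: one character per field, "x" = field not set; the g/a and v/p
# pairings are inferred from the order in which the slide lists the values.

FIELDS = {
    1: {"A": "Anything!", "F": "Free"},
    2: {"#": "# (value not given on the slide)", "R": "Representation",
        "I": "Indirect", "D": "Direct"},
    3: {"S": "Speech", "T": "Thought", "W": "Writing", "V": "Voice",
        "I": "Internal state", "N": "WritiNg", "M": "Mention"},
    4: {"A": "Act"},
    5: {"P": "toPic"},
    6: {"#": "odd, interesting borderline case",
        **{str(n): f"repeated adjacent category ({n})" for n in range(1, 5)}},
    7: {"e": "embedded"},
    8: {"g": "negative", "a": "absence"},            # pairing assumed
    9: {"h": "hypothetical"},
    10: {"i": "inferred"},
    11: {"q": "quote"},
    12: {"r": "iterative"},
    13: {"v": "interrogative", "p": "imperative"},   # pairing assumed
    14: {"u": "uncompleted"},
    15: {str(n): f"level of embedding {n}" for n in (2, 3, 4)},
}


def decode(tag: str) -> dict[int, str]:
    """Return {field number: value} for every field that is set in the tag."""
    if len(tag) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} characters, got {len(tag)}")
    decoded = {}
    for position, char in enumerate(tag, start=1):
        if char == "x":  # placeholder: this field is not used
            continue
        if char not in FIELDS[position]:
            raise ValueError(f"unexpected {char!r} in field {position}")
        decoded[position] = FIELDS[position][char]
    return decoded


if __name__ == "__main__":
    # Free Indirect Speech with no attributes set.
    print(decode("FIS" + "x" * 12))
    # RSA-P, embedded at level 2.
    print(decode("xRSAP" + "xexxxxxxx2"))
```

Under these assumptions, FIS would be written as "FIS" followed by twelve placeholders, and an embedded RSA-P at level 2 would set fields 2 to 5, field 7 and field 15.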
Issues arising
Technical issues:
Legibility.
Comparability between NWRS and BNC data.
Tagging issues:
Comparability between written and spoken corpora.
What counts as ST&WP?
Functional and formal criteria.
Embedding.
Repetition (e.g. he said he said well he said).
Report of ‘mention’.
Reading, hearing, listening and singing dogs!