Page 1: Paul Thompson Applied Linguistics (p.a.thompson@reading.ac.uk)

Corpora: Resources for the study of language

Page 2: British Academic Spoken English corpus (BASE)

160 lectures, 39 seminars

Transcripts, video and audio

199 XML files: transcripts with detailed annotation, with metadata included in the header

The 160 lecture transcripts are tagged for part of speech (POS)

www.reading.ac.uk/AcaDepts/ll/base_corpus/

Funded by AHRB, Euralex, BALEAP and university sources
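The slides do not describe the BASE XML schema. The following is a minimal sketch that assumes a hypothetical layout. Header metadata sits in a `<header>` element, and each POS-tagged token is a `<w>` element with a `pos` attribute. Under those assumptions, Python's standard `xml.etree.ElementTree` can read the metadata and count tags. The element names, attribute names and tag set are all illustrative, not the corpus's own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup for illustration only: <header> holds metadata,
# each token is a <w> element whose "pos" attribute is its POS tag.
SAMPLE = """<transcript>
  <header><title>Sample lecture</title><discipline>Economics</discipline></header>
  <body>
    <u who="lecturer">
      <w pos="PRON">we</w> <w pos="VERB">look</w> <w pos="ADP">at</w>
      <w pos="NOUN">demand</w> <w pos="ADV">today</w>
    </u>
    <u who="student">
      <w pos="PRON">what</w> <w pos="ADP">about</w> <w pos="NOUN">supply</w>
    </u>
  </body>
</transcript>"""


def read_header(root):
    """Return the header's child elements as a dict of tag -> text."""
    header = root.find("header")
    return {child.tag: child.text for child in header} if header is not None else {}


def count_pos(root):
    """Count POS tags over every <w> element anywhere in the document."""
    return Counter(w.get("pos") for w in root.iter("w"))


root = ET.fromstring(SAMPLE)
print(read_header(root))
print(count_pos(root).most_common())
```

For a real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`, with the element and attribute names changed to match the corpus's actual markup.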

Page 3: British Academic Written English corpus (BAWE)

A corpus of assessed student writing at university level

Texts collected at the University of Warwick, the University of Reading and Oxford Brookes University

Funded by the Economic and Social Research Council (ESRC), reference RES-000-23-0800

Page 4

6.5 million words; 2,896 texts; 2,761 assignments

XML files, POS-tagged

30+ disciplines; 4 levels of study

Page 5

Query interface: Sketch Engine

Commercial service: Applied Linguistics pays an annual subscription

Page 8

Level   Raw   Rel %
3       225   121.7
2       275   107.7
1       255    96.0
PG       66    62.1

(Level = level of study; PG = postgraduate)
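The slide does not explain how the "Rel %" column was calculated. Corpus frequency tables often pair raw counts with a figure normalised by subcorpus size, so that subcorpora of different sizes can be compared. The sketch below shows per-million-word normalisation, one common choice, as an illustration only. The subcorpus sizes are made up and are not the BAWE figures behind this table.

```python
def per_million(raw_count: int, subcorpus_words: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    if subcorpus_words <= 0:
        raise ValueError("subcorpus size must be positive")
    return raw_count * 1_000_000 / subcorpus_words


# Hypothetical subcorpora, for illustration only (not data from the slide):
# the same raw count yields different normalised rates in different-sized subcorpora.
subcorpora = {"A": (300, 2_000_000), "B": (300, 1_500_000)}
for name, (raw, size) in subcorpora.items():
    print(name, raw, round(per_million(raw, size), 1))
```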

Page 9

BASE: linking the audio and video to the transcripts, either online or on hard drives

Insertion of timestamp data into the transcripts

Example

Why?

Access to temporal, spatial, paralinguistic and phonological information

Studies of speech rate, for example
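Once timestamps are in the transcripts, speech rate can be computed directly from them. The following is a minimal sketch that assumes a hypothetical format. Each utterance is a `<u>` element carrying `who`, `start` and `end` attributes in seconds. The real BASE timestamp markup may differ, and the sample text is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical timestamped transcript: start/end are in seconds.
SAMPLE = """<transcript>
  <u who="lecturer" start="0.0" end="6.0">so today we are going to look at demand curves</u>
  <u who="lecturer" start="6.0" end="9.0">and then supply</u>
  <u who="student" start="9.5" end="11.5">is that on the exam</u>
</transcript>"""


def speech_rate(root, speaker):
    """Words per minute for one speaker, over that speaker's timed utterances."""
    words = 0
    seconds = 0.0
    for u in root.iter("u"):
        if u.get("who") != speaker:
            continue
        # Whitespace tokenisation is a simplification of word counting.
        words += len((u.text or "").split())
        seconds += float(u.get("end")) - float(u.get("start"))
    return words / seconds * 60 if seconds > 0 else 0.0


root = ET.fromstring(SAMPLE)
for who in ("lecturer", "student"):
    print(who, round(speech_rate(root, who), 1))
```

This version divides only by the time spent inside the speaker's own utterances, so pauses between utterances are excluded. Including them would lower the rate, and either choice should be stated when reporting results.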

Page 10

Comparison between languages

Historical linguistics

Stylistics

Studies of language in use

Specialised language use (e.g. doctor-patient interactions)

Investigations of multimodality

Page 11

PhD thesis corpus: electronic submission

Academic speech events: seminars, tutorials, etc.

Student use of computers in preparing assignments (video and text)

Reading and writing of undergraduates

Page 12

Hosting corpus resources at Reading or another university, preferably on Linux servers, with customisable interfaces

BASE, BAWE and other corpora that Reading possesses

For use by all departments at Reading and also elsewhere

Varied levels of user access

Centralised support needed, because project staff lack continuity