Annotating the HKCSE Pragmatically
Martin Weisser
Visiting Professor
School of English and Education
Guangdong University of Foreign Studies
mail: weissermar@gmail.com
web: martinweisser.org
Outline
• The Conversion Process
• Pre-processing Requirements
• Annotation & Post-processing
• Searching & Exploring the Corpus
• Conclusion
The Conversion Process I – Issues
• how to convert to DART XML format?
  – identify original conventions
    • some documented in Cheng et al. (2008)
    • some undocumented
  – use tone unit marking?
    • unfortunately, tone units in Brazil’s system for ‘discourse intonation’ ≠ C-units
    • → no ‘sentence’ intonation directly inferable
  – remove prosodic information, apart from stress and tone movements, to ensure readability
  – handle overlap
    • exact extent not marked or inferable
    • → better to delete
  – etc.
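The clean-up decisions listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the notation used here (tone units in curly braces, overlap marked by asterisks, stress shown through upper case) is assumed for the example and is not taken from the corpus documentation, and `simplify_line` is a hypothetical helper, not part of DART.

```python
import re

# Assumed notation, for illustration only:
#   { ... }  delimits a tone unit
#   *        marks the start of overlapping speech
TONE_UNIT_BRACES = re.compile(r"[{}]")
OVERLAP_MARKER = re.compile(r"\*+")


def simplify_line(line: str) -> str:
    """Delete assumed overlap markers and tone-unit braces, keep the words.

    Upper-case (assumed stress) marking is left untouched.
    """
    line = OVERLAP_MARKER.sub("", line)      # overlap extent unknown, so delete
    line = TONE_UNIT_BRACES.sub(" ", line)   # tone units are not C-units
    return re.sub(r"\s+", " ", line).strip() # normalise spacing


sample = "{ i THINK * so } { * that is RIGHT }"
print(simplify_line(sample))
```

A real conversion script would need one such rule per documented or undocumented convention, which is presumably why the conventions have to be identified first.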
The Conversion Process II – Original Format
The Conversion Process III – the Conversion Editor
[Screenshot of the conversion editor; labelled parts: original input file, conversion result view, conversion script editor, save output]
The Conversion Process IV – Conversion Results
converted to DART XML format
retained stress marking
converted & moved tone marking
converted ‘non-speech’ to comments
added gender attribute
added speaker type attribute
moved pauses to next turn
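To make the listed results more concrete, here is a minimal sketch of how a converted turn with added speaker attributes and a ‘non-speech’ comment might be built. The element name `turn`, the attribute names, and the label-to-attribute mapping are all hypothetical; the actual DART XML format and the original speaker label conventions may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an original speaker label to the added attributes.
SPEAKER_INFO = {
    "a": {"gender": "f", "type": "local"},
    "B": {"gender": "m", "type": "non-local"},
}


def make_turn(number, label, text, comment=None):
    """Build one turn element; attribute names are assumptions."""
    attributes = {"n": str(number), "speaker": label, **SPEAKER_INFO[label]}
    turn = ET.Element("turn", attributes)
    turn.text = text
    if comment:
        # 'non-speech' events become XML comments rather than text
        turn.append(ET.Comment(comment))
    return turn


turn = make_turn(1, "a", "i THINK so", comment="laugh")
print(ET.tostring(turn, encoding="unicode"))
```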
Pre-processing Requirements
• creating new resources in/for DART
  – adapt DART modules to handle mixed case
  – ‘synthesise’ domain-specific lexicon
  – create domain-specific topic ‘thesaurus’
• pre-processing
  – fix conversion errors
  – identify/mark incomplete words
  – split turns
  – add punctuation, partly based on original prosodic features
  – etc.
Annotation & Post-processing I – Steps
• annotation in DART
  – fully automated
  – less than 80 sec for
    • 24 files
    • ~72,100 words
    • ~10,300 C-units
• post-processing to fix potential errors on the levels of
  – syntax: potentially missing syntax rules
  – pragmatics: missing inferencing rules or modes (‘IFIDs’)
  – semantics: incorrectly identified topics
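As a rough worked example, the figures above imply the following processing rates. The counts are approximate and 80 seconds is an upper bound, so the per-second values should be read as approximate lower bounds rather than measured results.

```python
# Figures as reported on the slide (approximate counts, upper-bound time).
files, words, c_units, seconds = 24, 72_100, 10_300, 80

words_per_second = words / seconds       # at least this many words per second
c_units_per_second = c_units / seconds   # at least this many C-units per second
words_per_c_unit = words / c_units       # average C-unit length in words
words_per_file = words / files           # average file size in words

print(f"words/sec     >= {words_per_second:.2f}")
print(f"C-units/sec   >= {c_units_per_second:.2f}")
print(f"words/C-unit  ~  {words_per_c_unit:.1f}")
print(f"words/file    ~  {words_per_file:.0f}")
```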
Annotation & Post-processing II – Annotation Result
identified syntactic category
automatically split off DM
annotated identifiable speech acts
Searching the Corpus
• easily searchable via DART
  – speech act stats hyperlinked to concordancer
  – formulaic patterns or disfluencies via n-grams
  – manual searches in concordancer for specific
    • speech acts
    • syntactic categories + speech acts
    • speech acts + speaker types
    • speech acts + gender
    • responses to questions
    • searches for specific tone features
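The n-gram idea mentioned above can be illustrated with a generic frequency count. This is a plain-Python sketch, not DART's own n-gram module; the sample utterances are invented, and lower-casing the tokens (so that assumed stress marking through upper case does not split counts) is a design choice made for this example only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield all contiguous n-token sequences of a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(utterances, n=3, k=5):
    """Count n-grams within each utterance and return the k most frequent."""
    counts = Counter()
    for utterance in utterances:
        counts.update(ngrams(utterance.lower().split(), n))
    return counts.most_common(k)


# Invented sample data, for illustration only.
utterances = ["i i think so", "yes i THINK so too", "i think that is right"]
print(top_ngrams(utterances, n=3, k=1))
print(top_ngrams(utterances, n=2, k=3))
```

Frequent trigrams such as a recurring "i think so" would point to formulaic patterns, while repeated-word bigrams such as "i i" are one simple signal of disfluency.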
Conclusion
• DART annotation enriches the HKCSE through
  – adding syntactic and pragmatic annotation
  – ability to analyse features based on (functional) C-units, rather than intonation units
  – new search options based on the above features
References
• Cheng, W., Greaves, C. and Warren, M. 2008. A Corpus-driven Study of Discourse Intonation: the Hong Kong Corpus of Spoken English (prosodic). Amsterdam/Philadelphia: John Benjamins.
• Weisser, M. 2010. Annotating Dialogue Corpora Semi-Automatically: a Corpus-Linguistic Approach to Pragmatics. Unpublished Habilitation (professorial) thesis, University of Bayreuth.
• Weisser, M. 2012; forthcoming 2014. Pragmatic annotation. In: Aijmer, K. & Rühlemann, C. (Eds.). Corpus Pragmatics: a Handbook. Cambridge: CUP.
• Weisser, M. 2014. The DART Manual.
• Weisser, M. (in progress). DART – the Dialogue Annotation and Research Tool.