talk 4: linguistic annotation; customising the tei; real...

81
Linguistic Analysis Customising the TEI Real World TEI Talk 4: Linguistic Annotation; Customising the TEI; Real World TEI James Cummings 19 September 2013 @jamescummings 1/81

Upload: others

Post on 18-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Talk 4: Linguistic Annotation; Customising theTEI; Real World TEI

James Cummings

19 September 2013

@jamescummings 1/81

Page 2: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Linguistic Analysis

associating simple analyses and interpretations with textelementssemantic or syntactic interpretations which an encoder wishesto attach to all or part of a textmainly covering linguistic informationas often in the TEI, you can do the same thing in many ways:

using generic <seg> (任意の句レベルのテキスト単位を示す(要素segを含む))elements with @type attributesusing the straightforward analyses described hereusing the more powerful and general TEI Feature Structures

@jamescummings 2/81

Page 3: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<teiCorpus> reminderGrouping documents into a corpus allows you to factor out themetadata they have in common:.

......

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><!-- shared metadata --></teiHeader><TEI><teiHeader>

<!-- specific metadata --></teiHeader><text>

<!-- ... --></text></TEI><TEI><teiHeader>

<!-- specific metadata --></teiHeader><text>

<!-- ... --></text></TEI></teiCorpus>

@jamescummings 3/81

Page 4: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<particDesc> example (1)

<particDesc> (言語交流における,特定可能な発話者,声,その他の参加者を示す).

......

<particDesc xml:id="p2"><p>Female informant, well-educated, born in Shropshire UK,12 Jan 1950, of unknown occupation. Speaks French fluently.Socio-Economic status B2 in the PEP classificationscheme.</p></particDesc>

<particDesc> can just contain paragraphs of prose, or a more structured<person> element in <listPerson>

@jamescummings 4/81

Page 5: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<particDesc> example (2).

......

<particDesc><listPerson><person xml:id="SL"><persName><forename>Stuart</forename><surname>Lee</surname></persName><note><reftarget="http://users.ox.ac.uk/~stuart/Site/About_Me.html"> Stuart

Lee's home page</ref></note>

<!-- We could give more details about Stuart here --></person><person xml:id="IH"><persName><forename>Ian</forename><surname>Hislop</surname></persName><note><reftarget="http://en.wikipedia.org/wiki/Ian_Hislop"> Ian Hislop's entry

in Wikipedia</ref></note></person></listPerson></particDesc>

@jamescummings 5/81

Page 6: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Basic linguistic units

To mark up text for linguistic purposes:<s> (s-unit) 文に相当するテキスト単位を示す

<cl> (clause) 言語学上の節を示す

<phr> (phrase) 文法上の句を示す

<w> (word) 文法上の語を示す(正書形である必要はない)

<m> (morpheme) 言語学上の形態素を示す

<c> (character) 文字を示す

From the att.segLike class, these elements all have @type and@function attributes

@jamescummings 6/81

Page 7: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Example of linguistic markup

Compare.

......<u>Like a suck of one of my sweets?</u><u>No I don't take sweets from strangers, oh God</u>

with....

@jamescummings 7/81

Page 8: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Another example of linguistic markup.

......

<u who="PS1K5"><s n="5963"><w type="AV0">Like</w><w type="AT0">a</w><w type="NN1">suck</w><w type="PRF">of</w><w type="CRD">one</w><w type="PRF">of</w><w type="DPS">my</w><w type="NN2">sweets</w> ?</s>

</u><u trans="smooth" who="PS1BY"><s n="5964"><w type="ITJ">No </w><w type="PNP">I </w><w type="VDB">do</w><w type="XX0">n't </w><w type="VVI">take </w><w type="NN2">sweets </w><w type="PRP">from </w><w type="NN2">strangers</w><c type="PUN">, </c><w type="ITJ">oh </w><w type="NP0">God</w></s></u>

(from British National Corpus, KSV 5963)

@jamescummings 8/81

Page 9: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Stand-off interpretation

When inline markup is inappropriate, the <span>(テキスト部分に解釈的注釈を関連づける) element can be used to make ad hocremarks about bits of text, linked to by @xml:id. And <spanGrp>(要素spanをまとめる) is available to group assertions together..

......

<sp><speaker>CORNWALL</speaker><ab xml:id="eye_start">Lest it see more, prevent it. Out, vile jelly!</ab><ab>Where is thy lustre now?</ab></sp><sp><speaker>GLOUCESTER</speaker><ab>All dark and comfortless. Where's my son Edmund?</ab><ab>Edmund, enkindle all the sparks of nature,</ab><ab xml:id="eye_end">To quit this horrid act.</ab></sp><span from="#eye_start" to="#eye_end">the eye is pulled out</span>

@jamescummings 9/81

Page 10: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Stand-off interpretation with <interp>

The <interp> (あるテキスト部分とリンクする,特定の解釈的注釈をまとめる) element isused to encode an interpretation. The global @ana attribute can pointfrom the text to such an interpretation (or a taxonomy classification):.

......

<ab n="2Sam_12:14"><gap/>by this deed thou hast given great occasion to the enemies of the LORDto blaspheme, the child also that is born unto thee shall surely die.</ab><ab n="2Sam_12:15"><gap/>And the LORD struck the child that Uriah's wife bare unto David<gap/></ab><gap/><ab n="2Sam_12:18" ana="#infanticide">And it came to pass on the seventh day,that the child died.</ab><!-- elsewhere in document --><interp resp="#SAB" xml:id="infanticide">Infanticide: God seems to likekilling children.</interp>

The <interpGrp> element is used to group interpretations together.

@jamescummings 10/81

Page 11: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Interpretation example (1)

In this example:A set of possible interpretations is defined, using <interp>elements<seg> is used to markup distinct portions of a narrative<s> is used to mark sentencesthe @ana attribute links sections or milestones to appropriateinterpretation

.

......

<interpGrp resp="#TMA" type="structuralUnit"><interp xml:id="INTRO">introduction</interp><interp xml:id="CONFLICT">conflict</interp><interp xml:id="CLIMAX">climax</interp><interp xml:id="REVENGE">revenge</interp><interp xml:id="RECONCIL">reconciliation</interp><interp xml:id="AFTERM">aftermath</interp></interpGrp>

@jamescummings 11/81

Page 12: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Interpretation example (2)

.

......

<p xml:id="PP1"><seg xml:id="SS1-SS3" ana="#INTRO"><s xml:id="SS1">Sigmund ... was a king in Frankish country.</s><s xml:id="SS2">Sinfiotli was the eldest of his sons.</s><s xml:id="SS3">Borghild, Sigmund's wife, had a brother ... </s></seg><s xml:id="SS4A" ana="#CONFLICT">But Sinfiotli ... wooed the same woman</s><s xml:id="SS4B" ana="#I3">and Sinfiotli killed him over it.</s><seg xml:id="SS5-SS17" ana="#CLIMAX"><s xml:id="SS5">And when he came home, ... she was obliged to accept it.</s><s xml:id="SS6">At the funeral feast Borghild was serving beer.</s><s xml:id="SS17">Sinfiotli drank it off and at once fell dead.</s></seg></p><anchor xml:id="NIL1" ana="#RECONCIL"/><p xml:id="PP2">Sigmund carried him a long way in his arms ... </p>

@jamescummings 12/81

Page 13: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<taxonomy> Example

.

......

<taxonomy xml:id="part-of-speech"><category xml:id="adje"><catDesc>adjectives</catDesc><category xml:id="AJ0"><catDesc>adjective (unmarked) (e.g. GOOD, OLD)</catDesc></category><category xml:id="AJC"><catDesc>comparative adjective (e.g. BETTER, OLDER)</catDesc></category><category xml:id="AJS"><catDesc>superlative adjective (e.g. BEST, OLDEST)</catDesc></category></category><category xml:id="AT0"><catDesc>article (e.g. THE, A, AN)</catDesc></category><!-- ... --></taxonomy>

.

......<w ana="#AJ0">brilliant</w>

@jamescummings 13/81

Page 14: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Phrase segmentation

.

......

<s><cl type="finite-declarative" function="independent"><phr type="NP" function="subject">It</phr><phr type="VP" function="predicate"><phr type="V" function="verb-main">was</phr>also

<phr type="NP" function="predicate-nom.">a crucial year for me</phr></phr></cl></s>

@jamescummings 14/81

Page 15: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Words with lemmas and morphemes with types

.

......

<s xml:lang="la"><w lemma="timeo">timeo</w><w lemma="danaii">Danaos</w><w lemma="et">et</w><w lemma="donum">dona</w><w lemma="fero">ferentes</w></s>

or.

......

<w type="adjective"><m type="prefix" baseForm="con">com</m><m type="root">fort</m><m type="suffix">able</m></w>

.

......One could use @lemmaRef instead of @lemma to point to a word’slemma with a URI.

@jamescummings 15/81

Page 16: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Nested <w>

.

......

<s><w>I</w><w><w>did</w><m>n't</m></w><w>do</w><w>it</w><c>.</c></s>

@jamescummings 16/81

Page 17: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Word analysis

.

......

<s><w ana="#AT0">The</w><w ana="#NN1">victim</w><w ana="#POS">'s</w><w ana="#NN2">friends</w><w ana="#VVD">told</w><w ana="#NN2">police</w><w ana="#CJT">that</w><w ana="#NP0">Kruger</w><w ana="#VVD">drove</w><w ana="#PRP">into</w><w ana="#AT0">the</w><w ana="#NN1">quarry</w><w ana="#CJC">and</w><w ana="#AV0">never</w><w ana="#VVD">surfaced</w></s>

@jamescummings 17/81

Page 18: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Interpretation

.

......

<interpGrp type="POS"><interp xml:id="AT0">Definite article</interp><interp xml:id="AV0">Adverb</interp><interp xml:id="CJC">Conjunction</interp><interp xml:id="CJT">Relative that</interp><interp xml:id="NN1">Noun singular</interp><interp xml:id="NN2">Noun plural</interp><interp xml:id="NP0">Proper noun</interp><interp xml:id="POS">Genitive marker</interp><interp xml:id="PRP">Preposition</interp><interp xml:id="VVD">Verb past tense</interp></interpGrp>

@jamescummings 18/81

Page 19: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

More interpretation

.

......

<u xml:id="u1">Can I have ten oranges and a kilo of bananas please?</u><u xml:id="u2">Yes, anything else?</u><u xml:id="u3">No thanks.</u><u xml:id="u4">That'll be dollar forty.</u><u xml:id="u5">Two dollars</u><u xml:id="u6">Sixty, eighty, two dollars. Thank you.</u><spanGrp type="transactions"><span from="#u1">sale request</span><span from="#u2" to="#u3">sale compliance</span><span from="#u4">sale</span><span from="#u5">purchase</span><span from="#u6">purchase closure</span></spanGrp>

@jamescummings 19/81

Page 20: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

British National Corpus

a snapshot of British English, taken at the end of the 20thcentury100 million words in approx 4000 different text samples, bothspoken (10%) and written (90%)synchronic (1990-4), sampled, general purpose corpusavailable under licence; latest edition is BNC-XML (13 March2007)Part-of-speech and lemma taggingUses a variant of TEI XML originally called CDIF

@jamescummings 20/81

Page 21: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

BNC XML.

......

<div level="1" n="1" type="leaflet"><head type="MAIN"><s n="1"><w c5="NN1" hw="factsheet" pos="SUBST">FACTSHEET</w><w c5="DTQ" hw="what" pos="PRON">WHAT</w><w c5="VBZ" hw="be" pos="VERB">IS</w><w c5="NN1" hw="aids" pos="SUBST">AIDS</w><c c5="PUN">?</c></s>  </head><p><s n="2"><hi rend="bo">  <w c5="NN1" hw="aids" pos="SUBST">AIDS</w><c c5="PUL">(</c><w c5="VVN-AJ0" hw="acquire" pos="VERB">Acquired</w><w c5="AJ0" hw="immune" pos="ADJ">Immune</w><w c5="NN1" hw="deficiency" pos="SUBST">Deficiency</w><w c5="NN1" hw="syndrome" pos="SUBST">Syndrome</w><c c5="PUR">)</c></hi><w c5="VBZ" hw="be" pos="VERB">is</w><w c5="AT0" hw="a" pos="ART">a</w><w c5="NN1" hw="condition" pos="SUBST">condition</w>

<!-- ... --></s></p></div>

@jamescummings 21/81

Page 22: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Spoken Texts

A spoken text may contain any of the following components:utterancespausesvocalized but non-lexical phenomena such as coughskinesic (non-verbal, non-lexical) phenomena such as gesturesentirely non-linguistic incidents occurring during and possiblyinfluencing the course of speechwriting, regarded as a special class of incident in that it canbe transcribed, for example captions or overheads displayedduring a lectureshifts or changes in vocal quality

@jamescummings 22/81

Page 23: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

The notion of ”utterance”

<u> (一般に沈黙または話者交替が前後する,発話の時間)problematic, but pragmatica sequence of speech from a single speakermay be grouped into higher-level <div>sor fragmented into smaller segments <seg> or <s>the @who attribute points to speaker information

@jamescummings 23/81

Page 24: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Transcriptions of Speech

Elements defined: <broadcast>, <equipment>, <incident>,<kinesic>, <pause>, <recording>,<recordingStmt>, <scriptStmt>, <shift>, <u>,<vocal>, <writing>,

Classes defined: att.duration, model.divPart.spoken,model.global.spoken, model.recordingPart

@jamescummings 24/81

Page 25: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Simple examplesMixture of utterance and ‘paralinguistic’ information:.

......

<u who="#Jan">This is just delicious</u><incident><desc>telephone rings</desc></incident><u who="#Kim">I'll get it</u><u who="#Tom">I used to <vocal><desc>coughs</desc></vocal> smoke a lot</u><u who="#Bob"><vocal><desc>sniffs</desc></vocal>He thinks he's tough</u><vocal who="#Ann"><desc>snorts</desc></vocal><u who="#Tom">Yeah<kinesic><desc>gives uplifted middle finger sign</desc></kinesic></u>

@jamescummings 25/81

Page 26: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Back channelling

<vocal> (音声化されているが,必ずしも単語化される必要はない現象を示す.例えば,有声の間,単語化されない相づち,など).

......

<u who="#a">So what could I have done <vocal who="#b"><desc>tut-tutting</desc></vocal> about it anyway?</u>

@jamescummings 26/81

Page 27: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Example using other TEI elements.

......

<u who="#mar">you never <pause/> take this cat forshow and tell<pause/> meow meow</u><u who="#ros">yeah well I dont want to</u><incident><desc>toy cat has bell in tail which continuesto make a tinkling sound</desc>

</incident><u who="#ros">because it is so old</u><u who="#mar">how <choice><orig>bout</orig><reg>about</reg></choice><emph>your</emph> cat <pause/>yours is <emph>new</emph><kinesic><desc>shows Father the cat</desc></kinesic></u><u trans="pause" who="#fat">thats <pause/> darling</u><u who="#mar">no <emph>mine</emph> isnt oldmine is just um a little dirty</u>

@jamescummings 27/81

Page 28: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Shifts in voice qualityClassic multiple hierarchy problem

can use <shift> (発話者による一連の発話 (パラ言語) 素性が変化する場所を示す)or <milestone> (当該セクションが構造要素により表現することができない場合に,標準的な参照機能によりテキストの各種セクション間にある境界点を示す) to markboundaries...... or can use typed <seg> elements

useful also for code shifting.

......

<u who="#LB"><shift feature="loud" new="f"/>Elizabeth</u><u who="#EB">Yes</u><u who="#LB"><shift feature="loud"/>Come and try this <pause/><shift feature="loud" new="ff"/>come on!

</u>

@jamescummings 28/81

Page 29: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<shift> vs <incident>

<shift> (発話者による一連の発話 (パラ言語) 素性が変化する場所を示す) vs <incident>(必ずしも言語化またはコミュニケーションには上らない現象や出来事を示す.例えば,偶発的な雑音またはコミュニケーションに影響を与える他の事象な ど) Compare:.

......

<u who="#a">Listen to this <shift new="reading"/>Thegovernment is confident, he said, that the current economicproblems will be completely overcome by June<shift/> whatnonsense!</u>

and.

......

<u who="#a">Listen to this<incident><desc>reads aloud from newspaper</desc></incident> what nonsense!</u>

@jamescummings 29/81

Page 30: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<vocal> vs <u>

<vocal> (音声化されているが,必ずしも単語化される必要はない現象を示す.例えば,有声の間,単語化されない相づち,など) vs <u>(一般に沈黙または話者交替が前後する,発話の時間) Compare:.

......

<vocal who="#ann"><desc>snorts</desc></vocal>

and.

......

<u who="#ann"><vocal><desc>snorts</desc></vocal></u>

@jamescummings 30/81

Page 31: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

<writing> example

<writing> (発話テキストの中で参加者に示される,書かれたテキストの一節を示す).

......

<u who="#a">look at this</u><writing who="#a" type="newspaper" gradual="false">Government claims economic problems <soCalled>over byJune</soCalled></writing><u who="#a">what nonsense!</u>

@jamescummings 31/81

Page 32: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Timing issues

pausing: use <pause> elementduration: use @dur attributesynchronization: use @synch attributeoverlap: use @trans attribute

.

......

<u>Okay <pause dur="PT2M"/>U-m<pause dur="PT75S"/>the sceneopens up<pause dur="PT50S"/> with <pause dur="PT20S"/> um<pause dur="PT145S"/> you see a tree okay?</u>

@jamescummings 32/81

Page 33: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Overlap

Mutt: Have you heard the --Jeff: the election result?Mutt: It's a disaster!

.

......

<u who="#mutt">have you heard the</u><u trans="latching" who="#jeff">the election result</u><u who="#mutt">its a disaster</u><u who="#jeff" trans="overlap">its a miracle</u>

@jamescummings 33/81

Page 34: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Synchronization

.

......

<u who="#mutt">have you heard <anchor synch="#t1"/>the</u><u who="#jeff" synch="#t1">the election result</u><u who="#mutt" synch="#t2">its a disaster</u><u who="#jeff" synch="#t2">its a miracle</u><!-- Elsewhere in Document --><timeline origin="#t1"><when xml:id="t1"/><when xml:id="t2"/></timeline>

<timeline> (時間的なまとまりを示すために,発話テキストの要素をリンクすることができる,時間軸上の順序付き時点の集合を示す)<when> (同じ要素timeline中にある他の要素に対応する時点,または絶対的な時点を示す)

@jamescummings 34/81

Page 35: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Using Elements Seen Elsewhere...

.

......

<u><del type="truncation">s</del>see<del type="repetition">you you</del> you know<del type="falseStart">it's</del> he's crazy

</u>

.

......<gap reason="passing truck" extent="5" unit="s"/>

.

......

<u who="#P1">I proposed that <foreign xml:lang="de"> wirkönnen <pause dur="PT1S"/> vielleicht </foreign> go towarsaw and <emph>vienna</emph></u>

@jamescummings 35/81

Page 36: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Metadata: Participant Description.

......

<particDesc><listPerson><person xml:id="P-1234" sex="2" age="mid"><p>Female informant, well-educated, born in Shropshire

UK, 12 Jan 1950, of unknown occupation. Speaks Frenchfluently. Socio-Economic status B2.</p></person><person xml:id="P-4332" sex="1"><persName><surname>Hancock</surname><forename>Antony</forename><forename>Aloysius</forename><forename>St John</forename></persName><residence notAfter="1959"><address><street>Railway Cuttings</street><settlement>East Cheam</settlement></address></residence><occupation>comedian</occupation></person></listPerson></particDesc>

@jamescummings 36/81

Page 37: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Metadata: <scriptStmt> Example

<scriptStmt> (発話テキストで使われている台本の詳細に関する引用を示す).

......

<sourceDesc><scriptStmt xml:id="CNN12"><bibl><author>CNN Network News</author><title>News headlines</title><date when="1991-06-12">12 Jun 91</date></bibl></scriptStmt></sourceDesc>

@jamescummings 37/81

Page 38: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Metadata: <recordingStmt> Example

<recordingStmt> (発話テキストの転記の元になる録音,録画されたものを示す).

......

<recordingStmt><recording type="audio" dur="P30M"><respStmt><resp>Location recording by</resp><orgName>Sound Services Ltd.</orgName></respStmt><equipment><p>Multiple close microphones mixed down to stereo

Digital Audio Tape, standard play, 44.1 KHz samplingfrequency</p></equipment><date>12 Jan 1987</date></recording></recordingStmt>

@jamescummings 38/81

Page 39: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Detailed <recording><recording>(発話テキストの元資料として使われる,直接または放送から事象を録音,録画したものの詳細を示す).

......

<recording type="audio" dur="P10M"><equipment><p>Recorded from FM Radio to digital tape</p></equipment><broadcast><bibl><title>Interview on foreign policy</title><author>BBC Radio 5</author><respStmt><resp>interviewer</resp><name>Robin Day</name></respStmt><respStmt><resp>interviewee</resp><name>Margaret Thatcher</name></respStmt></bibl></broadcast></recording>

@jamescummings 39/81

Page 40: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Customising the TEI

.

......Every use of the TEI involves making use of a customisation.

@jamescummings 40/81

Page 41: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Some terminologyThe TEI encoding scheme consists of a number of modulesEach module contains a number of element specificationsEach element specification contains:

a canonical name (<gi>) for the element, and optionally othernames in other languagesa canonical description (also possibly translated) of its functiona declaration of the classes to which it belongsa definition for each of its attributesa definition of its content modelusage examples and notes

a TEI schema specification (<schemaSpec>) is made byselecting modules or elements and (optionally) modifying theircontentsa TEI document containing a schema specification is called anODD (One Document Does it all)

@jamescummings 41/81

Page 42: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

What is a module?

A convenient way of grouping together a number of elementdeclarationsThese are usually on a related topic or specific applicationMost chapters of P5 focus on elements drawn from a singlemodule, which that chapter then definesA TEI Schema is created by selecting modules and adding orremoving elements from them as needed

@jamescummings 42/81

Page 43: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Which modules exist?Module name Chapteranalysis Simple Analytic Mechanismscertainty Certainty and Responsibilitycore Elements Available in All TEI Documentscorpus Language Corporadictionaries Dictionariesdrama Performance Textsfigures Tables, Formulae, and Graphicsgaiji Non-standard Characters and Glyphsheader The TEI Headeriso-fs Feature Structureslinking Linking, Segmentation, and Alignmentmsdescription Manuscript Descriptionnamesdates Names, Dates, People, and Placesnets Graphs, Networks, and Treesspoken Transcriptions of Speechtagdocs Documentation Elementstei The TEI Infrastructuretextcrit Critical Apparatustextstructure Default Text Structuretranscr Representation of Primary Sourcesverse Verse

@jamescummings 43/81

Page 44: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

How do you choose?

Just choose everything (not really a good idea)The TEI provides a small set of predefined combinations (TEILite, TEI Bare...)Or you could roll your own (but then you need to know whatyou’re choosing)

Roma a command line script, with a web front end,designed to make this process much easier

http://www.tei-c.org/Roma/

@jamescummings 44/81

Page 45: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: New

@jamescummings 45/81

Page 46: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: Customize

@jamescummings 46/81

Page 47: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: Schema

@jamescummings 47/81

Page 48: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: Documentation

@jamescummings 48/81

Page 49: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

What did we just do?We processed a pre-existing ODD file which contained (as well assome discursive prose) the following schema specification:.

......

<schemaSpec ident="tei_bare" start="TEI"><moduleRef key="core"/><moduleRef key="tei"/><moduleRef key="header"/><moduleRef key="textstructure"/><elementSpec ident="abbr" mode="delete" module="core"/><elementSpec ident="add" mode="delete" module="core"/><!-- ... --><elementSpec ident="trailer" mode="delete" module="textstructure"/><elementSpec ident="title" mode="change" module="core"><attList><attDef ident="level" mode="delete"/></attList></elementSpec><!-- ... --></schemaSpec>

We selected four modules, deleted loads of elements, and alsodeleted an attribute.

@jamescummings 49/81

Page 50: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma provides an interface to the detail

The [Modules] tab shows the modules availableSelecting a module from it shows the elements within thatmodule, and gives you the choice to

include all of them (and then remove some)exclude all of them (and then put back the ones you want)

You can also change an element’s attribute list, and thevalues they permit

@jamescummings 50/81

Page 51: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: Modules

@jamescummings 51/81

Page 52: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: Change Module

@jamescummings 52/81

Page 53: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

The ODD advantageWe can express these constraints in our ODD meta-schema, andthen generate a formal schema to enforce them using whicheverschema language we like.

TEI schemas can be generated inISO RELAX NG languageW3C Schema LanguageXML DTD language

ODD itself defines an element’s content models using a subsetof RELAX NG syntaxDatatypes are defined in terms of W3C datatypesSome facilities (e.g. alternation, namespaces) cannot beexpressed in DTDs — RELAX NG schema is recommendedAdditional constraints stored in the TEI ODD can beexpressed in Schematron

@jamescummings 53/81

Page 54: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: selecting attributes

@jamescummings 54/81

Page 55: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Roma: constraining attribute values

@jamescummings 55/81

Page 56: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

What did we just do?Our ODD now includes something like this:.

......

<elementSpec ident="div" mod-ule="textstructure" mode="change"><attList><attDef ident="type" mode="change" usage="req"><valList type="closed" mode="replace"><valItem ident="prose"/><valItem ident="verse"/><valItem ident="drama"/>

<!-- ... --></valList></attDef></attList></elementSpec>

Note that we can also add documentation to the ODD:.

......

<valItem ident="verse"><gloss>contains (parts of ) a poem</gloss></valItem>

@jamescummings 56/81

Page 57: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Real World TEI

Main website for the TEI communityRuns version of Roma, OxGarage, etc.Maintains a vault of old material and in /Vault/P5/ of all P5releasesURL: http://www.tei-c.org/

@jamescummings 57/81

Page 58: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://www.tei-c.org/

@jamescummings 58/81

Page 59: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/index.html

@jamescummings 59/81

Page 60: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/MS.html

@jamescummings 60/81

Page 61: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/MS.html#msdates

@jamescummings 61/81

Page 62: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/ref-history.html

@jamescummings 62/81

Page 63: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Other sources of information

In addition to the TEI-C Website, the TEI Community has varioustools and sources of information:

an active and extremely helpful mailing list TEI-La detailed wiki http://wiki.tei-c.org/a SourceForge project tei.sourceforge.netwhere you can see allthe work that is done, report bugs, request features, etc.TEI-C Stylesheets (transform to HTML, PDF, ePub, etc.)TEI By Example: http://tbe.kantl.be/TBE/

@jamescummings 63/81

Page 64: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

There are also tools like: OxGarage

An experimental RESTful web service to managetransformations between various formatsUses TEI P5 XML as a pivot formatJoins conversions together into a pipelineCan be installed locally (but non-trivial)http://www.oucs.ox.ac.uk/oxgarage/

@jamescummings 64/81

Page 65: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

@jamescummings 65/81

Page 66: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

@jamescummings 66/81

Page 67: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

TEI Wiki

Open to the TEI community for editingSample files, Stylesheets, CustomisationsFAQs, SIGs work, Information on ToolsAnd much more...http://wiki.tei-c.org/

@jamescummings 67/81

Page 68: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://wiki.tei-c.org/index.php/Main_Page

@jamescummings 68/81

Page 69: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

TEI on SourceForge

Main place of work for the TEI Technical Council (aside frompublicly archived mailing list)Place to submit bugs and feature requestsPlace to download the official releasesWhere the TEI Subversion repository is (up-to-minutechanges)http://tei.sourceforge.net/

@jamescummings 69/81

Page 70: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

http://tei.sourceforge.net

@jamescummings 70/81

Page 71: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Oxford Corpus of Old JapaneseOCOJ is a long-term research project to create an annotatedcorpus of all Old Japanese textsThe texts come primarily from the Asuka and Nara periods(7-8th Century AD)Parallel romanization in a phonemic transcription forphonology of Old JapaneseOCOJ uses the Frellesvig & Whitman system of transcriptionLed by Professor Bjarke Frellesvig with Dr Stephen WrightHorn and Dr Kerri L Russell from Oxford’s Faculty of OrientalStudiesCreated in collaboration with NINJAL in TokyoWhat I did: converted MS Word transcriptions (andtransliterations) to richly marked up TEI XML

.

......http://vsarpj.orinst.ox.ac.uk/corpus/

@jamescummings 71/81

Page 72: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

OCOJ Example

.

......

”It should be better to drink saké and weep drunkenly than talkingin a clever fashion”「さかしみと 物言ふよりは 酒飲みて 酔い泣きするし まさりたるらし」

@jamescummings 72/81

Page 73: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Oxford Corpus of Old Japanese

@jamescummings 73/81

Page 74: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

OCOJ Markup Example

@jamescummings 74/81

Page 75: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

OCOJ Markup Example

@jamescummings 75/81

Page 76: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

William Godwin Diary

William Godwin:1756-1836, philosopher, writer, political activist,husband of Mary Wollstonecraft, father of Mary Godwin (akaMary Wollstonecraft Shelley).

Project: Led by the Politics Department: Dr Mark Philp, DrDavid O’Shaughnessy, two students + othersObjectives include:

provide a browsable/searchable transcription with digitalimages of the diaryresearch to identify people mentioned (over 64000 nameinstances) in the 48 years of diary;

http://godwindiary.bodleian.ox.ac.uk/

@jamescummings 76/81

Page 77: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

William Godwin Diary

@jamescummings 77/81

Page 78: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Wandering Jew’s Chronicle Archive

@jamescummings 78/81

Page 79: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Workshop ConclusionThis workshop has just quickly seen some of the basics of the TEI,but the TEI Guidelines are all freely available online for you to readat a much slower pace!Finally(!) Remember that:

All course materials including:All slides from lecturesAll exercisesAll materials for the exercises

are available at: http://tei.it.ox.ac.uk/Talks/All the slides, exercises, and some materials are licensed witha Creative Commons Attribution license, which means theyare freely available for re-use (though do let us know!)Email: [email protected] or the TEI-L mailing list

@jamescummings 79/81

Page 80: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

After the workshop...

After the workshop, if you have questions about:The workshop materials or teaching other workshops:[email protected] TEI generally by joining: [email protected]

If you mail the TEI-L mailing list it is better because:we’ll still try to answer as well as we would privatelyyou get answers not only from us, but TEI experts around theworldquestions from those of all levels of ability stop the listbecoming too technicaleveryone benefits from having the answers be public – and youbenefit by reading (and sometimes answering!) others’problems

@jamescummings 80/81

Page 81: Talk 4: Linguistic Annotation; Customising the TEI; Real ...tei.oucs.ox.ac.uk/Talks/2013-09-japan/talk4.pdf · 100 million words in approx 4000 different text samples, both spoken

Linguistic Analysis Customising the TEI Real World TEI

Questions?

[email protected] mailing [email protected] ask now if we have time!

@jamescummings 81/81