Bolly CorpAGEst © 2016
Annotation Manual – Version 1.3. This manual was last updated on 2016-05-02.
CorpAGEst (2013-2015): “A corpus-based multimodal approach to the pragmatic competence of the elderly”
People Marie Curie Actions (PIEF-GA-2012-328282)
Multimodal annotation guidelines
II. Speech annotation guidelines (Praat program / EasyAlign plugin / ELAN software)
Original Author: Catherine Bolly
With the collaboration of: Julie Kairet
Contact: catherine.bolly@uclouvain.be; catherinebolly@hotmail.com
Web: http://corpagest.org
Logo: © The Shelf Company http://www.theshelf.fr
Content

Introduction
    The CorpAGEst project
    Multimodal annotation: from form-based to function-based analysis
II. Speech annotation guidelines
    General principles
1. Speech transcription (Praat software)
    Transcription principles
    1.1. Tier structure in Praat
    1.2. Conventions of transcription
        Transcription principles
    1.3. Text/sound anonymization
        Anonymization principles
    1.4. Transcription revision
        1.4.1. Revision with the sound signal
        1.4.2. Revision without the sound signal
    1.5. Transcription export (Transformer program)
        1.5.1. Visualization of the raw text output
        1.5.2. Visualization of the XML output
2. Text-sound alignment
    2.1. Automatic alignment (EasyAlign plugin)
        2.1.1. Creation of a “simplified” textgrid
        2.1.2. Isolation of each speaker in one textgrid
        2.1.3. How to use the EasyAlign plugin
    2.2. Manual verification
        2.2.1. Toward the final textgrid
Introduction

The CorpAGEst project

The CorpAGEst project (“A corpus-based multimodal approach to the pragmatic competence of the elderly”) aims at establishing the gestural and verbal profile of very old people in aging, looking at their pragmatic competence from a naturalistic perspective. The CorpAGEst assumption is that multimodal (inter)subjective markers of stance are highly relevant cues for the measurement of communicative competence in later life. The project ultimately aims at a better understanding of the way in which the verbal and gestural dimensions interact to make sense in real-world settings (thus going far beyond the specific scope of the present project). This project has received funding from the European Union Seventh Framework Programme ([FP7/2007-2013]) under grant agreement n° [PIEF-GA-2012-328282].

The CorpAGEst corpus (Bolly & Boutet, forthcoming) comprises face-to-face conversations between an adult and a very old subject (75 years old and more) living at home or in a residential home. The corpus data consist of semi-directed interviews, which have been audio and video recorded, transcribed and aligned to the sound signal. The corpus is two-fold, including transversal and longitudinal subcorpora. Contextual independent variables are part of the corpus design, namely environment (private vs. residential home), the social tie between the participants (familiar vs. unknown interviewer) and the task type (focusing on past events vs. present-day life). The corpus is part of the international CLARe initiative (“Corpora for Language and Aging Research”), which combines methods in linguistics and issues in aging, and advocates for more corpus-based “naturalistic” approaches in the field.

The multimodal data (text, sound and video) were aligned to the sound signal in partition mode, using the Praat program (Boersma & Weenink, 2014), the EasyAlign plugin (Goldman, 2011), and the ELAN software (Wittenburg et al., 2006). The transcription standards adopted for the oral component were slightly adapted from those of the Valibel research center (Dister et al., 2007 [2009]), as described in the part of the manual dedicated to speech.

For further detail, see the project website: http://corpagest.org.
Multimodal annotation: from form-based to function-based analysis

The perspective adopted in the CorpAGEst project is a form-based one (see Müller et al., 2013), extended and applied to facial expressions, gaze, hand gestures, and body gestures (viz. head, shoulders, torso, legs, feet). Notably, the annotation procedure rests on a triple principle, according to which a visible action is considered as a potentially meaningful gesture unit in the ongoing flow of interaction:
(i) the “visibility” criterion: identification of all actions that are visible in the interaction flow, through the eyes of the camera recorder and through those of the analyst;
(ii) the “meaning potential” criterion: from the semantic-pragmatic perspective, every visible action identified must potentially convey one semantic-pragmatic meaning in the particular context of its realization (thus also including beats, adaptors, deictic, and interactive gestures), from the point of view of the analyst;
(iii) the “formal distinctiveness” criterion: to distinguish between consecutive moves in the interaction flow, there must be at least one change in formal/physiological parameters (e.g. shape for the hand, direction of the head, etc.), by comparison with the preceding and following gesture phase or move.

At the level of speech, the protocol for the identification and annotation of discourse markers follows the one developed within the MDMA project (“Model for Discourse Marker Annotation”; see Bolly et al., 2015; Bolly et al., forthc.). The MDMA methodology starts from an independent selection of candidate discourse markers by several expert coders; the candidates then undergo syntactic and semantic description through an operational annotation model. A specific section of the manual (still in progress) is dedicated to speech transcription (via Praat), alignment (via EasyAlign), and annotation (in ELAN).
Starting with mono-modal analyses (gesture vs. speech) and focusing on one group of articulators at a time within each modality (viz. face, gaze, head, shoulders, torso, hands, legs, and feet for the nonverbal mode), the annotation procedure next moves to a multimodal and functional perspective on pragmatic cues (viz. emotions and (non)verbal pragmatic markers). The model for the annotation of pragmatic functions is part of the MDMA project (see above) and is a collaborative work (see Bolly & Crible, Antwerp 2015), which has been developed to be transferable to several modalities and languages (see Bolly et al., Göttingen 2015).
FORM-BASED: PARAMETER ANALYSIS
    Modality: nonverbal/gesture     Articulators
        1. Facial displays          Eyebrows, Eyes, Gaze, Mouth
        2. Hand gestures            Hands
        3. Body gestures            Head, Shoulders, Torso, Legs, Feet
    Modality: verbal/speech         Levels of analysis
                                    Pragmatic markers
FUNCTION-BASED ANALYSIS
    - Multimodal annotation of emotions
    - Multimodal annotation of pragmatic functions

Table 1. Form-based and function-based approach to corpus data in CorpAGEst
II. Speech annotation guidelines

General principles

How should speech files be named, step by step, during data treatment in order to keep track of the work in progress? (This is a suggestion, not a recommendation, to be adapted according to research needs.)

ageBN1r-1_S2_transcript_Ju_20150313_EC
    Ongoing phase of transcription (“EC” = “En Cours”) by Julie (“Ju”). File last saved on 13/03/2015. Recording, working data: second sample (“S2”) of the first interview (“r-1”) with Nadine (“ageBN1”).

ageBN1r-1_S2_transcript_ok
    Transcription of the sample done (“ok”) and ideally revised by a second coder.

ageBN1r-1_S2_align_Ju_20150313_EC
    Ongoing phase of alignment (“EC” = “En Cours”) by Julie (“Ju”).

ageBN1r-1_S2_align_ok
    Automatic alignment finished, including the a posteriori manual tracking of auto-generated errors (e.g. “mm” erroneously transformed into “millimeter” by the EasyAlign plugin).

ageBN1r-1_S2_L1L2_aligned_ok
    Output file resulting from the alignment procedure.

ageBN1r-1_S2_L1_aligned_ok
    Output file created to obtain one file per speaker (among others, useful for in-depth prosodic analyses). Here, the L1 speaker refers to the older speaker (viz. the informant).

ageBN1r-1_S2_L2_aligned_ok
    Output file created to obtain one file per speaker (among others, useful for in-depth prosodic analyses). Here, the L2 speaker refers to the younger speaker (viz. the interview conductor).
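The working-file names above follow a regular pattern, so they can be checked mechanically. The following Python sketch is only an illustration: the `WorkingFile` type, the `parse_name` function and the regular expression are assumptions made for this example, not part of the CorpAGEst tooling, and the pattern covers only the transcription and alignment working files (not the `_aligned_` output files).

```python
import re
from typing import NamedTuple, Optional


class WorkingFile(NamedTuple):
    recording: str         # recording code, e.g. "ageBN1r-1"
    sample: str            # sample label, e.g. "S2"
    phase: str             # "transcript" or "align"
    coder: Optional[str]   # coder label, e.g. "Ju"; None once the phase is done
    date: Optional[str]    # last save as YYYYMMDD; None once the phase is done
    status: str            # "EC" (en cours) or "ok"


# Hypothetical pattern inferred from the example names in the manual.
NAME_PATTERN = re.compile(
    r"^(?P<recording>age[A-Z]{2}\d+r-\d+)"
    r"_(?P<sample>S\d+)"
    r"_(?P<phase>transcript|align)"
    r"(?:_(?P<coder>[A-Za-z]+)_(?P<date>\d{8}))?"
    r"_(?P<status>EC|ok)$"
)


def parse_name(name: str) -> WorkingFile:
    """Split a working-file name into its components, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the suggested pattern: {name!r}")
    return WorkingFile(**match.groupdict())
```

For instance, `parse_name("ageBN1r-1_S2_transcript_Ju_20150313_EC")` yields the recording code `ageBN1r-1`, the coder `Ju` and the status `EC`, while a finished file such as `ageBN1r-1_S2_align_ok` has no coder or date.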
1. Speech transcription (Praat software)

The sound samples will be transcribed using the software Praat. The use of high-quality headphones is required.

Transcription principles

How to create a file in Praat, that is, how to create a new “Textgrid”?
1) Open the file in Praat (“Open” > “Read from file”)
2) Create a Textgrid (“Annotate” > “Textgrid”)
3) Open the Textgrid and the sound by selecting the textgrid and the sound together (> “View & Edit”)
4) Create 5 tiers (“Tier” > “Add interval tier”, see section 1.1)

How to create an “interval”, that is, how to create an annotation span?
In the targeted tier, click to create a boundary and move it to create an interval. The boundaries are placed so that each interval corresponds roughly to one meaning unit at the level of utterances. Note that there is no theoretical implication here, as this is a technical preliminary step in the data transcription process.

How many speakers at a time?
It is recommended to transcribe the speech of one speaker at a time. This is not a rigid rule, however, as it is sometimes more efficient to take the two speakers into account at the same time, especially in the case of overlaps.

How to save my work?
Be careful: Praat does not automatically save your work! Save your work frequently.

How long must a transcription interval be?
Usually, it is recommended not to exceed 10 seconds per interval to be transcribed (but this is an approximation, of course).

How can I use the Praat functionalities (boundaries, shortcuts, principles, etc.)?
See useful existing papers (Goldman, etc.).
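The 10-second guideline can be checked after the fact once the intervals of a tier are available as data. The sketch below is a hypothetical helper, assuming the intervals have already been read into `(start, end, text)` tuples by whatever means; it does not read TextGrid files itself, and the 10-second default simply mirrors the approximate limit mentioned above.

```python
from typing import List, Tuple

# One interval of a tier: (start time in s, end time in s, transcription text).
Interval = Tuple[float, float, str]


def overlong_intervals(intervals: List[Interval],
                       max_duration: float = 10.0) -> List[Interval]:
    """Return the non-empty intervals that last longer than max_duration seconds."""
    return [
        (start, end, text)
        for start, end, text in intervals
        if text.strip() and (end - start) > max_duration
    ]
```

Empty intervals are skipped on purpose, since only intervals to be transcribed are concerned by the guideline.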
1.1. Tier structure in Praat

Tiers are structured as follows in every TextGrid:

First speaker code [agexxx]
    Orthographic transcription of the first speaker's speech (usually the interviewee)
Second speaker code [agexxx]
    Orthographic transcription of the second speaker's speech (usually the interviewer)
Segment to be anonymized [Anon]
    Delimitation of the segment that will be anonymized (recognizable thanks to the “#” symbol preceding the targeted segment)
Metacomment [Meta]
    Any comment of the transcriber about any metalinguistic event (e.g. about noises occurring within a pause), to be transcribed as in 5.3 (see table below)
Comment [Com]
    Any comment of the transcriber about the transcription process (e.g. when help is needed to decide between two possible choices for one word)
Status [Status]
    Name of the transcriber. Status of the transcription: underway, done, corrected, revised…
1.2. Conventions of transcription

The convention used is mainly inspired by the Valibel transcription convention. However, some changes have been made according to the ICOR and Ciel-F conventions, and several rules are specific to the CorpAGEst project. The conventions are described in the table below; the examples are in French.

Transcription principles

How to calculate the duration of pauses?
In the CorpAGEst project, it has been decided to annotate the duration of pauses manually (rather than to rely on automatic calculation of pausing), mainly in order to keep the perceptive dimension of the fluency of older speakers’ speech, which fluctuates even more in later life. This does not mean that an automatic detection of pauses could not be envisaged for further investigation.

What convention for pausing?
It has been decided to time all the pauses that last longer than 200 ms by looking at their duration as it appears in Praat, and to transpose it into numbers in parentheses. For instance, “(2.4)” stands for “2 seconds and 400 milliseconds”. Micro-pauses, which last less than 200 ms, are coded by the “(.)” convention (see the table below).

We chose not to follow the Valibel conventions for pausing (“/” for short pauses and “//” for long pauses), since the same symbol is also used there for the notation of false starts (“/” without any space before) and is thus prone to errors during the transcription process.
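As an illustration of the pause convention, the following Python sketch turns a duration read from Praat into the corresponding notation. It is a minimal sketch under stated assumptions: the function name `format_pause` is hypothetical, the rounding to one decimal follows the rule given in row 4.2 of the conventions table (hundredths from 5 to 9 are rounded up, 1 to 4 down), and a pause of exactly 200 ms is treated as a timed pause, which the manual does not specify.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: a pause of exactly 0.2 s is timed rather than coded as a micro-pause.
MICRO_PAUSE_THRESHOLD = Decimal("0.2")


def format_pause(duration_s: float) -> str:
    """Return the transcription code for a pause of the given duration in seconds."""
    duration = Decimal(str(duration_s))
    if duration < MICRO_PAUSE_THRESHOLD:
        return "(.)"
    # Round to the nearest tenth, halves upward, keeping the trailing zero.
    rounded = duration.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return f"({rounded})"
```

`Decimal` is used instead of the built-in `round` because binary floating point and Python's round-half-to-even behaviour could otherwise produce a tenth that differs from the stated rule.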
Table of transcription conventions (columns: Phenomenon / Convention / Example / Correspondence with existing conventions)

1 Identification of corpora and speakers

1.1 Speaker code
    Convention: A unique code is assigned to each speaker (5 letters followed by 1 digit):
        • The first 3 letters refer to the name of the corpus (“age” for the CorpAGEst and Corpage corpora).
        • The next 2 letters correspond to the real initials of the speaker (note: the real name of the person never appears in full, neither in the corpus nor in the metadata; cf. the section on anonymization).
        • The final digit is 1 by default (the digits 2, 3, 4, etc. can be assigned to distinguish homonyms, where applicable).
    Example: agePA1 for André Petit
    Correspondence: Valibel

1.2 Recording code
    Convention: A code is assigned to each recording. It consists of the code of the first speaker (the interviewee in semi-directed interviews; the person who speaks first in spontaneous conversations) followed by the letter “r” for recording. When several recordings bear the same name, they are distinguished by adding 2, 3, etc. preceded by a hyphen (in chronological order of recording).
    Example: agePA1r-1, agePA1r-2
    Correspondence: Valibel, CorpAGEst

2 Spelling and typography

2.1 Spelling
    Convention: Conventional spelling is respected. When the speaker utters a word that is not listed in the dictionary, the spelling reflects the pronunciation as closely as possible. Borrowings from another language are transcribed following the orthographic standards of the source language (note: the standard for Walloon is the Feller orthography).
    Correspondence: Valibel, Ciel-F

2.2 Punctuation
    Convention: The period and the comma are not used. The question mark is noted when a rising intonation is observed in a question in declarative form.
    Example: l'endroit idéal? (ageBN1r-2_sample4)
    Correspondence: Valibel, Ciel-F

2.3 Use of capital letters
    Convention: Only proper nouns and nouns with a unique referent begin with a capital letter (as do titles).
    Correspondence: Valibel, Ciel-F

2.4 Titles
    Convention: The hyphen and the capital letter are used in the transcription of titles.
    Example: Le-soir
    Correspondence: Valibel

2.5 Numbers
    Convention: Digits and numbers are transcribed in full, with hyphens.
    Example: dix-neuf-cent-quatre-vingt-trois
    Correspondence: Valibel, Ciel-F, ICOR

2.6 Abbreviations
    Convention: No conventional abbreviations are used.
    Example: et cetera, not etc.
    Correspondence: Valibel, Ciel-F

2.7 Initialisms and acronyms
    Convention: An initialism (pronounced by spelling out the letters) is written in capital letters without spaces or periods. An acronym (pronounced as a word) is written like a proper noun (with an initial capital).
    Example: CNRS, Setca
    Correspondence: Valibel

2.8 Morphology
    • Past participle agreement: Plural agreement is always noted. Feminine agreement is left out only when its non-realization is audible in the speaker's speech.
    Correspondence: Valibel
    • “Il y a”: Whatever the pronunciation, the standard form il y a is always noted (and not y’a or ya).
    • Negation “n(e)”: When the presence or absence of the negation particle n’ is not audible (between the pronoun on and a verb beginning with a vowel), n’ is noted by default.
      Example: on n’est pas certains d’arriver à temps
    • Nonstandard verb variant: If the speaker produces a nonstandard verb variant, it is transcribed as produced.
      Example: j’ai prendu un livre

2.9 Interjections, onomatopoeia and discourse particles
    Convention: Interjections, onomatopoeia and discourse particles are transcribed according to the following list:
    ah, ahlala, ahlàlà, aïe, areu, arf, atchoum, badaboum, baf, bah, bam, bang, bé [be], bè [bɛ], bêêê, ben [be~], beurk, bing, boh, boah, bouh, boum, broum, cataclop, clapclap, coacoa, cocorico, coincoin, crac, croacroa, cuicui, ding, dingdengdong, dingdong, dring, eh [e], ehben [ebe~], ehbien, enfin [a~fe~, fe~], et cetera, euh, euhm, f [f], ff [f:], flicflac, flipflop, froufrou, froufou, glouglou, glouglou, gnagnagna, groingroin, grr, hein, hep, hihan, héhé, hiphiphiphourra, hourra, hu, hum (throat clearing), m [m], mm [m:], mmmm (acknowledgement), mêêê, meuh, mf [mf], mff [mf:], miam, miaou, moah, moh, moui, mouais, m’enfin, of, oh, ohlala, ohlàlà, ok, ouah, ouahouah, ouais, ouf, ouh, ouille, oula, ouhlà, ouhlala, ouhlàlà, oup, oups, p [p], paf, pan, patatras, pchhh, pchit, pf [pf], pff [pf:], pfiou, pfou, pfoua, pif-paf, pinpon, pioupiou, plouf, pof, pouet, pouetpouet, pouf, psst, pt, roh, rohlala, rohlàlà, rohr, ronron, schlaf, snif, splaf, splatch, sss, t, tacatac, tagada, tchac, teufteuf, tictac, toc, tuttut, tss, vlan, vroum, vrrr, wo, wouah, wouaw, wouf, waf, zip.
    Correspondence: Valibel, Ciel-F, TCOF, CorpAGEst

3 Hesitations and false starts

3.1 Exhaustive notation of verbal production
    Convention: Incomplete words, hesitations, repetitions and false starts are transcribed.

3.2 False starts
    Convention: In the case of an interruption (with or without later resumption), the truncated word is immediately followed by a slash (without a space).
    Example: on a de plus en pl/ (.) on est de plu/ (ageBN1r-2_sample4)
    Correspondence: Valibel

    Convention: The personal pronoun il can be pronounced [il] or [i]. When it is pronounced [i], even repeatedly, il is transcribed.
    Example (pronounced [iii]): L1 il il il pense

4 Temporal phenomena

4.1 Overlap
    Convention: Overlaps are enclosed in opening and closing square brackets (“[” and “]”) within each speaker tier. Note: no space may be inserted between the bracket and the overlapped text.
    Example:
        L1: il faut changer (0.7) quand on sait encore [faut] pas le faire trop tard (0.5)
        L2: [ouais]
        (ageBN1r-2_sample4)
    Correspondence: Ciel-F, ICOR

    Convention: A non-linguistic event can also be transcribed as an overlap (in order to make the text-form transcription more precise). Care should be taken, however, to split the speech or the non-linguistic event so as to distinguish the span in which there is overlap from the span in which there is none.
    Example:
        L1: le lieu idéal pour bien vieillir [((rire))]
        L2: [oui]
        (ageBN1r-2_sample4)
4.2 Silence, pause
    Convention: Silences are timed: the duration is given in seconds, in parentheses. Hundredths are rounded up to the next tenth for values between 5 and 9 (e.g. (1.16) becomes (1.2)) and down to the previous tenth for values between 1 and 4 (e.g. (1.12) becomes (1.1)). The 0 after the point is noted, where applicable.
    Example: (1.2) (2.0)
    Correspondence: Ciel-F, ICOR

    Convention: Micro-pauses, that is, silences shorter than 200 ms, are annotated with a period.
    Example: (.)

4.3 Attribution of silences/pauses
    Convention: Silences/pauses are not attributed. Pauses are therefore annotated in the tier of the speaker who held the floor before the pause. Consequently, pauses longer than 200 ms do not admit overlap.
    Correspondence: CorpAGEst

    Convention: In the case of a simultaneous start, the pause is annotated in the tier of the speaker who occupied the channel before the simultaneous turn-taking.

4.4 Inhalation, breath intake
    Convention: Inhalation is noted by the letter “h” preceded by a period (“.h”), if possible in a separate interval. It is attributed to the speaker who produces it.
    Example: (.h)
    Correspondence: Ciel-F, ICOR

4.5 Exhalation, sigh
    Convention: Exhalation is noted by the letter “h” (without the period), if possible in a separate interval. It is attributed to the speaker who produces it.
    Example: (h)
    Correspondence: Ciel-F, ICOR

4.6 Mouth opening
    Convention: The dental or labial sound produced when the mouth opens (sometimes accompanied by a tongue click) is noted by the letters “.tsk” (dental) or “.mt” (labial), preceded by a period, if possible in a separate interval. This sound, which generally precedes the start of a speaking turn, is attributed to the speaker who produces it.
    Example: (.tsk) (.mt)
    Correspondence: ICOR, CorpAGEst

5 Vocal (paraverbal) production

5.1 Vocal production accompanying speech
    Convention: The text concerned by the vocal production is enclosed in angle brackets (“<” and “>”), and the description of the vocal style is noted in double parentheses before the transcription: <((description)) transcription>
    Example: L1: <((rire)) on est> .h on est libre en quelque sorte (ageBN1r-2_sample4)
    Correspondence: Ciel-F, ICOR

5.2 Isolated vocal production
    Convention: Isolated vocal productions such as laughter or coughing are indicated in double parentheses in the tier of the speaker responsible for the production. When they cannot be attributed to a speaker, they are noted in the “Meta” tier.
    Example: ((rire))
    Correspondence: Ciel-F, ICOR

5.3 Spelling of vocal paraverbal elements
    Convention: The transcription of paraverbal elements follows the spelling below:
    ((bâillement)) [yawn], ((chantonné)) [hummed], ((chuchoté)) [whispered], ((imitation)) [imitation], ((rire)) [laughter], ((soupir)) [sigh], ((toux)) [cough]

6 Multiple transcription and inaudible passages

6.1 Doubt
    Convention: When the transcriber doubts the transcription or cannot decide on the form pronounced (among several possibilities), the form that seems most probable is noted between curly braces.
    Example: [ouais {l'âge de la pen/} (.) oui] (ageDA1r-1_sample2)
    Correspondence: Valibel

6.2 Inaudible passage
    Convention: Parentheses are used to indicate an incomprehensible passage:
        - (x) = one inaudible syllable
        - (xx) = a group of inaudible syllables
        - (xxx) = a longer inaudible passage
    Example: que c'est une période euh (.) oui émotionnellement (x) (.) (ageDA1r-1_sample2)
    Correspondence: Valibel
1.3. Text/sound anonymization

• In the speaker’s tier: names, surnames and place names are directly anonymized in the orthographic transcription. The alias is preceded by a “#” symbol.
• In the Anon tier: the passage that has to be anonymized is precisely delimited (the whole passage and only the passage) and marked by the “#” symbol.
• A script will be used to anonymize the sound on the basis of the Anon tier (Daniel Hirst’s script: hdl:11041/sldr000526)

Anonymization principles

How to choose an alias?
The alias must:
• Begin with the first letter of the original name
• Be close to the original ethnic consonance
• Contain the same number of syllables as the original name
• Be listed on the following websites for French-speaking Belgian names:
    - Family names: http://www.nom-famille.com/noms-les-plus-portes-par-initiale.html
    - First names: https://fr.wikipedia.org/wiki/Liste_de_pr%C3%A9noms_fran%C3%A7ais_et_de_la_francophonie#Pr.C3.A9noms_f.C3.A9minins, http://meilleursprenoms.com/
    - Place names: http://fr.wikipedia.org/wiki/Listes_des_villes_du_monde or http://fr.wikipedia.org/wiki/Cat%C3%A9gorie:Village_de_Wallonie

How to avoid mismatching aliases with each other?
Most importantly, the alias must be listed with the original name (and possibly the speaker code) in an Excel spreadsheet in order to assign pseudonyms and aliases (only once) in a structured and non-redundant manner, taking into account the possibility that the same place or person can be mentioned in distinct recordings.
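The bookkeeping described above (one alias per original name, assigned only once) can be sketched in a few lines of Python. This is an illustration only: the `AliasRegistry` class and the names used with it are hypothetical, the manual itself relies on an Excel spreadsheet, and of the alias criteria only the first-letter rule is checked here (syllable count and consonance still need human judgment).

```python
class AliasRegistry:
    """Keeps a one-to-one mapping between original names and aliases."""

    def __init__(self) -> None:
        self._alias_by_original = {}   # normalized original name -> alias
        self._used_aliases = set()     # normalized aliases already assigned

    def assign(self, original: str, alias: str) -> str:
        """Register an alias and return it in transcription form ('#' + alias).

        If the original name already has an alias, that alias is reused,
        so the same person or place gets the same alias in every recording.
        """
        key = original.strip().lower()
        if key in self._alias_by_original:
            return "#" + self._alias_by_original[key]
        if alias[0].lower() != key[0]:
            raise ValueError("the alias must begin with the first letter of the original name")
        if alias.lower() in self._used_aliases:
            raise ValueError(f"alias already assigned to another name: {alias}")
        self._alias_by_original[key] = alias
        self._used_aliases.add(alias.lower())
        return "#" + alias
```

With invented names, `assign("Dupont", "Durand")` returns `#Durand`, and any later call for “Dupont” returns the same alias regardless of the candidate proposed.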
1.4. Transcription revision

It is strongly recommended to check orthographic transcriptions in a two-step process:
1) Revision of the transcription with regard to the sound
2) Revision of the transcription without regard to the sound
Moreover, a second analyst should reread the transcription in a final step, with recourse to the sound signal.

1.4.1. Revision with the sound signal
There are two solutions:
• Reread the Textgrid using Praat (note that it is possible to navigate quickly from one interval to another using the shortcut ALT + keyboard arrows).
• Open the transcription in the software Transformer 6 (be careful: this software can be buggy and is not Mac compatible, so the first option is preferred).

1.4.2. Revision without the sound signal
Rereading the transcription without the sound can help to spot spelling mistakes, transcription mistakes, etc. An output format (cf. section 1.5 below) can be used to facilitate the revision, preferably the table format, which is more pleasant to read. Once the mistakes are spotted, you can correct the transcription directly in Praat or by opening the file with a text editor (e.g. Notepad++). The Notepad++ solution is quick: you can make changes directly in the file using the search and replace option.
Rem.: Always make backup copies of your files. They can be useful in case of problems.

1.5. Transcription export (Transformer program)
Using Transformer 6, you can obtain three output formats that are described further below (NOT Mac compatible!).

To proceed, follow these steps:
1) Import the textgrid(s): “Upload files” function
2) Go to the main window and select the export format needed for an “express” export towards .txt, Praat, ELAN, EXMARaLDA, etc.
1.5.1. Visualization of the raw text output

1.5.2. Visualization of the XML output

    </Metadata>
    <Timeline id="Transformer_Timeline1"/>
    <AG id="Transformer_AG1" type="type" timeline="Transformer_Timeline1">
      <Anchor id="Transformer_AG1_Anchor" offset="1672" unit="milliseconds"/>
      <Annotation id="Transformer_AG1_Annotation1" type="ageBN1" start="Transformer_AG1_Anchor" end="Transformer_AG1_Anchor">
        <Feature name="description">bah on peut peut-être tout doucement passer à à la deuxième [question (.) euh] (.) de l'entretien</Feature>
      </Annotation>
      <Annotation id="Transformer_AG1_Annotation2" type="ageBN1" start="Transformer_AG1_Anchor" end="Transformer_AG1_Anchor">
        <Feature name="description">[.h]</Feature>
      </Annotation>
      <Annotation id="Transformer_AG1_Annotation3" type="ageBN1" start="Transformer_AG1_Anchor" end="Transformer_AG1_Anchor">
        <Feature name="description">[oui] (.) et je vous dirai immédiatement nous avions une grande maison (0.3)</Feature>
      </Annotation>
      <Annotation id="Transformer_AG1_Annotation4" type="ageBN1" start="Transformer_AG1_Anchor" end="Transformer_AG1_Anchor">
        <Feature name="description">[.h]</Feature>
      </Annotation>
      <Annotation id="Transformer_AG1_Annotation5" type="ageBN1" start="Transformer_AG1_Anchor" end="Transformer_AG1_Anchor">
        <Feature name="description">maintenant nous sommes retraités (.) je vous ai dit (.) mon mari a quatre-vingt-deux ans .h</Feature>
2. Text-sound alignment

The recommendations given here concern alignment at the level of the word unit. Note that one part of the spoken data has also been aligned at the level of phones within the framework of Duboisdindien’s PhD thesis.

2.1. Automatic alignment (EasyAlign plugin)

Size of the files to run the plugin
It is recommended to reduce the size of the audio files before starting the alignment procedure. If a file has been recorded at 44,100 Hz, you may need to resample it to 22,050 Hz (for instance, in Audacity or Praat) so that the plugin can run.

2.1.1. Creation of a “simplified” textgrid

In order to prevent the pause durations (for example, (1.4)) from being phonetized, it is recommended to create a new copy of the textgrid with a simplified version of the orthographic transcription.

How to proceed?
1) Always copy the initial textgrid
2) Open the copy in a text editor such as Notepad++ or jEdit
3) Using the search function, delete everything that appears in parentheses (including pause durations, short pauses, comments, styles, breath, etc.)
    a. Select the option that uses regular expressions (Regex)
    b. Search “\(.*\)|\{|\}|#” and replace by nothing, where “.*” matches any sequence of characters within parentheses. Since “.*” is greedy, on a line that contains several parenthesized items it matches from the first “(” to the last “)”, including any words in between. To avoid deleting unintended segments, it is also possible to specify the characters that may occur within the parentheses, by using this Regex: “\([a-z,â,ê,à,é,è,ê,î,ô,û,ç,ë,ï,ü\.,0-9]*\)” OR “([a-z,â,ê,à,é,è,ê,î,ô,û,ç,ë,ï,ü\.,0-9]*)” (depending on the text editor used), which matches every parenthesized string made up of the listed characters, spaces excluded.
    Rem.: the “<”, “>”, “[”, “]” symbols cannot be changed at this stage, because they are used in the language of the text editor; they must be deleted manually later in the data treatment.
4) Do not forget to save the new textgrid, adding the note “simplified”
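The simplification step can also be illustrated in Python. The sketch below is not the manual's procedure (which uses a text editor on the file itself): it assumes the text of a single interval is available as a string, and the function name `simplify` and the pattern are assumptions for this example. Instead of the greedy “.*”, it matches only what lies between one opening and one closing parenthesis (single or double), so speech between two pauses is preserved.

```python
import re

# One parenthesized item, e.g. "(0.7)", "(.)", "(.h)", "(x)" or "((rire))".
PAREN_ITEM = re.compile(r"\(+[^()]*\)+")
# Doubt braces and the anonymization mark, removed without their content.
MARKS = re.compile(r"[{}#]")


def simplify(text: str) -> str:
    """Return the interval text without parenthesized items, braces and '#'."""
    text = PAREN_ITEM.sub(" ", text)
    text = MARKS.sub("", text)
    # Collapse the whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()
```

As in the manual, the “[”, “]”, “<” and “>” symbols are left untouched at this stage. For comparison, applying the greedy pattern `\(.*\)` to `a (0.7) b (0.5)` leaves only `a `, which is why the restricted pattern is used here.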
2.1.2. Isolation of each speaker in one textgrid

In order to facilitate the automatic alignment using the sound from each speaker's microphone, it is recommended to isolate each speaker’s tier in one textgrid.

How to proceed?
1) In Praat, open the “simplified” version of the textgrid
2) Select it and use the function “extract one tier”
3) Indicate the number of the tier that you want to isolate (usually 1 or 2)
=> The textgrid for one speaker is created (textgrid B)

Tip: in order to facilitate the following steps, you can rename the speaker tier “ortho”. If you do not, you must remember, during the following steps, to change the name of the ortho tier in the EasyAlign plugin.

2.1.3. How to use the EasyAlign plugin
1) Open the appropriate sound file
2) Select the textgrid B and the sound file
3) Select EasyAlign > “phonétisation”
Note: the first step (“macrosegmentation”) is unnecessary because the segmentation has already been done during transcription.
4) Manually check the phonetization and remove the “*” symbols. “*” is used by the script to indicate an optional phoneme. Either the phoneme is pronounced, in which case only the “*” is removed, or the phoneme is not pronounced, in which case remove the “*” and the preceding letter (if the phoneme is added to the beginning of the word) or the following letter (if the phoneme is added to the end of the word). This step is also the occasion to delete the following symbols from the ortho tier: “[”, “]”, “<”, “>”.
5) Select EasyAlign > “phonosegmentation”
Note: read the info windows in order to see whether there are any problems and why.
    (i) If some intervals are misformatted: check that the number of words in the ortho tier and in the phono tier is the same. Correct it if this is not the case.
    (ii) Some intervals may be too long: try to make them shorter.
    (iii) The phonetic transcription may be wrong: check it and correct it. This especially concerns words such as: mm, bé…
    (iv) Redo the phonosegmentation step
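Check (i) above lends itself to a quick script. The following Python sketch assumes that the texts of the ortho and phono tiers have been extracted into two parallel lists (one string per interval) and that words are separated by whitespace in both tiers; the function name and the phonetic strings in the usage note are invented for illustration and are not EasyAlign output.

```python
from typing import List


def mismatched_intervals(ortho: List[str], phono: List[str]) -> List[int]:
    """Return the indices of intervals whose word counts differ between tiers."""
    if len(ortho) != len(phono):
        raise ValueError("both tiers must contain the same number of intervals")
    return [
        index
        for index, (ortho_text, phono_text) in enumerate(zip(ortho, phono))
        if len(ortho_text.split()) != len(phono_text.split())
    ]
```

For example, if “mm” had been expanded into a two-word phonetization, the corresponding interval index would be reported and could be corrected before redoing the phonosegmentation.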
2.2. Manual verification

Check the alignment against the sound: do not hesitate to adjust the boundaries. If some intervals are still not aligned, align them manually.
Tip: The shortcut “Shift + use of the mouse” facilitates moving the boundaries: they will remain perfectly aligned in each tier.

2.2.1. Toward the final textgrid
1) Using a text editor, replace every “_” by nothing so that the file is cleaner when you import it into ELAN.
Tip: Then, you can use the option “Merge similar consecutive intervals” to remove all empty intervals. The textgrid will be cleaner.
2) In each speaker textgrid, change the name of the word tier: add the speaker code.
3) Extract the word tiers of each speaker.
4) Select the 3 tiers and the initial textgrid (the one in which the pause durations are annotated) > merge
Do not forget to save the final textgrid and to rename it.

Note: at the end you are supposed to have three different textgrids per sample:
1) The final textgrid containing the initial tiers (described at the beginning of the manual) and the tiers containing the word alignment (imported from the textgrid of each speaker)
2) Two textgrids (one per speaker) containing the simplified orthographic transcription, the phonetic transcription, the alignment by word, syllable and phoneme, and a “status” tier. These are the TextGrids that will be used if more fine-grained prosodic analysis is needed (for instance, study of the intonation periods with Analor, prosodic contour, etc.). However, note that the phoneme and syllable tiers have not been systematically checked within the CorpAGEst project, because they are not crucial for the purpose of the study (with the exception of the “*” symbols, which have been checked, and of the adjustment of word boundaries, where relevant; see above).
Original Author: Catherine Bolly
With the collaboration of: Julie Kairet
Contact: catherine.bolly(at)uclouvain.be; cbolly(at)uni-koeln.de
Web: http://corpagest.org
Logo: © The Shelf Company http://www.theshelf.fr