TQE: Transcription Quality Evaluation


Upload: yamin

Post on 24-Jan-2016



TRANSCRIPT

Page 1: TQE: Transcription Quality Evaluation

TQE: Transcription Quality Evaluation

A CLARIN-NL project

Radboud University Nijmegen

Institute for Dutch Lexicology

Max Planck Institute for Psycholinguistics

Page 2: TQE: Transcription Quality Evaluation

TQE: practical information

• Duration: 01/04/2010 – 01/07/2011

• Type: Demonstrator Project

• Project team:
  o CLST: Centre for Language and Speech Technology
    Helmer Strik (coord.), Joost van Doremalen, Eric Sanders, Catia Cucchiarini, Robin Oostrum, Ferdy Hubers
  o INL: Instituut voor Nederlandse Lexicologie
    Remco van Veenendaal, Laura van Eerten
  o MPI: Max Planck Institute for Psycholinguistics
    Daan Broeder, Tobias van Valkenhoef, Peter Withers

• CLARIN centre
  o MPI: Max Planck Institute for Psycholinguistics
    Daan Broeder

Page 7: TQE: Transcription Quality Evaluation

Automatic Transcription Quality Evaluation

• Input:
  o Audio signals
  o Phone(tic) transcriptions

• Output:
  o For each phone: TQE measure

• How:
  o Audio and phonetic transcriptions are aligned
  o Phone boundaries are derived
  o For each phone a TQE measure is determined: a confidence measure, e.g. ranging from 0-100%, indicating how well phone & segment ‘fit together’, i.e. what the quality of the transcription is
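The slides do not say how the TQE measure is computed, so the following is only a minimal sketch of the last step under stated assumptions. It assumes the alignment has already produced phone boundaries as frame indices, and that an acoustic model supplies a posterior probability per phone for every frame. The names `PhoneSegment`, `tqe_measure` and `posteriors` are hypothetical and do not come from the TQE tool itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PhoneSegment:
    """A transcribed phone with boundaries, as derived by the alignment step."""
    label: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive


def tqe_measure(segment: PhoneSegment, posteriors: List[Dict[str, float]]) -> float:
    """Return a 0-100% confidence for one phone (illustrative assumption).

    The score is the mean posterior of the transcribed phone over the
    frames of its segment, scaled to a percentage.
    """
    frames = posteriors[segment.start_frame:segment.end_frame]
    if not frames:
        # An empty segment gives no evidence that phone and audio fit together.
        return 0.0
    mean = sum(frame.get(segment.label, 0.0) for frame in frames) / len(frames)
    return 100.0 * mean


# Toy input: four frames with made-up posteriors for the phones "a" and "t".
posteriors = [
    {"a": 0.75, "t": 0.25},
    {"a": 0.25, "t": 0.75},
    {"a": 0.5, "t": 0.5},
    {"a": 0.0, "t": 1.0},
]
segments = [PhoneSegment("a", 0, 2), PhoneSegment("t", 2, 4)]
scores = [(s.label, tqe_measure(s, posteriors)) for s in segments]
print(scores)
```

A low score for a phone would flag a segment where the transcription and the audio fit together poorly, which is the kind of per-phone output the slide describes.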

Page 8: TQE: Transcription Quality Evaluation

MPI version

Page 9: TQE: Transcription Quality Evaluation

CLST development version

Page 16: TQE: Transcription Quality Evaluation

Survey: 2a) File formats

Answer   Count   %
WAV      30      34.88
OGG      6       6.98
AIFF     13      15.12
MP3      16      18.60
MP4      5       5.81
FLAC     5       5.81
ALAW     4       4.65
ULAW     3       3.49
other    4       4.65

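The percentages in the survey tables can be reproduced from the counts. The short check below uses the file-format counts from question 2a. It suggests that the percentages are relative to the total number of answers given (86) and not to the number of respondents, which the slides do not state. The function name `percentages` is only illustrative.

```python
from typing import Dict


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert answer counts to percentages of all answers, rounded to 2 decimals."""
    total = sum(counts.values())
    return {answer: round(100 * n / total, 2) for answer, n in counts.items()}


# Counts copied from the table for survey question 2a.
file_formats = {
    "WAV": 30, "OGG": 6, "AIFF": 13, "MP3": 16, "MP4": 5,
    "FLAC": 5, "ALAW": 4, "ULAW": 3, "other": 4,
}
result = percentages(file_formats)
print(sum(file_formats.values()), result)
```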

Page 17: TQE: Transcription Quality Evaluation

Survey: 2c) Recording precision

Answer   Count   %
8 bit    3       7.89
12 bit   3       7.89
16 bit   24      63.16
24 bit   8       21.05


Page 18: TQE: Transcription Quality Evaluation

Survey: 3) Formats and standards for phonetic transcriptions

Answer            Count   %
SAMPA             23      28.40
X-SAMPA           6       7.41
IPA               25      30.86
CGN phoneme set   9       11.11
YAPA              3       3.70
Celex             7       8.64
LH+               3       3.70
other             5       6.17


Page 19: TQE: Transcription Quality Evaluation

Survey: 4) Software

Answer     Count   %
Praat      32      53.33
Audacity   10      16.67
CoolEdit   7       11.67
Audition   5       8.33
other      6       10.00


Page 20: TQE: Transcription Quality Evaluation

Survey: 8) Interest in inclusion in the CLARIN infrastructure

Answer       Count   %
Yes          22      64.71
No           11      32.35
Don't know   1       2.94


Page 21: TQE: Transcription Quality Evaluation

Survey: 9) Willing to supply metadata

Answer   Count   %
Yes      27      90.00
No       3       10.00


Page 22: TQE: Transcription Quality Evaluation

Survey: 10) Current use of metadata format(s)

Answer        Count   %
OLAC          1       2.38
IMDI          4       9.52
CMDI          5       11.90
Dublin Core   4       9.52
TEI           4       9.52
None          21      50.00
Other         3       7.14

