Page 1: D3.3 PROGRESS REPORT ON RICH AUDIO TRANSCRIPTION… · Progress report on rich audio transcription Type R – Report Status Submitted Version number 1 Number of pages 20 WP /Task

D3.3 PROGRESS REPORT ON RICH AUDIO TRANSCRIPTION

Project co-funded by ICT-7th Framework Programme from the European Commission

EUMSSI_D3.3 Progress report on rich audio transcription

Grant Agreement nr 611057
Project acronym EUMSSI
Start date of project (dur.) December 1st 2013 (36 months)
Document due Date : M24
Actual date of delivery December 24th 2015
Leader LIUM
Reply to [email protected]
Document status Submitted

Page 2: D3.3 PROGRESS REPORT ON RICH AUDIO TRANSCRIPTION€¦ · Progress report on rich audio transcription Type R – Report Status Submitted Version number 1 Number of pages 20 WP /Task

Project ref. no. 611057

Project acronym EUMSSI

Project full title Event Understanding through Multimodal Social Stream Interpretation

Document name EUMSSI_D3.1 Progress report on rich audio transcription_20141209

Security (distribution level)

PU - Public

Contractual date of delivery

M24

Actual date of delivery

December 24th 2015

Deliverable name D3.3. Progress report on rich audio transcription

Type R – Report
Status Submitted
Version number 1

Number of pages 20

WP / Task responsible

WP3/LIUM

Author(s) Yannick Estève

Other contributors
EC Project Officer Mrs. Alina Lupu

[email protected]
Abstract Progress report on rich audio transcription: speech recognition in English and German. Architecture, training, data, performances. Error detection

Keywords Speech recognition on video document.

Circulated to partners

Yes

Peer review completed

Yes

Peer-reviewed by IDIAP

Coordinator approval

Yes

EUMSSI_D3.3 Progress report on rich audio transcription


Table of Contents

1 INTRODUCTION
2 ARCHITECTURE OF THE 2015 LIUM ASR SYSTEM
2.1 Evolutions of the LIUM ASR system
2.2 The 2015 ASR system: main language-independent features
2.2.1 Speaker segmentation
2.2.2 Speech recognition
3 EVALUATION OF THE ASR SYSTEMS
3.1 English language: participation in the ASR task of the MGB 2015 Challenge
3.1.1 Language models
3.1.2 Acoustic models: using imperfect transcripts to build a training corpus for acoustic models
3.1.3 Word error rate and computation time
3.2 German language: participation in the ASR task of the IWSLT 2015 evaluation campaign
4 ASR ERROR DETECTION
4.1 Related work
4.2 Set of features
4.2.1 ASR, lexical and syntactic features
4.2.2 Word embeddings
4.3 Neural network architecture
4.4 Experiments
4.4.1 Experimental data
4.4.2 Results
5 CONCLUSION AND PERSPECTIVES


1 INTRODUCTION

This deliverable describes the latest evolutions of the automatic speech recognition (ASR) systems developed by LIUM within the EUMSSI project. At the beginning of the project, LIUM planned to develop competitive ASR systems for four European languages: English, French, German and Spanish. System performance is assessed in terms of both transcription accuracy and computation time. During the first year, competitive systems were produced for French and English (cf. deliverable D[…]). After the first project review in January 2015, it was decided to follow the reviewers' remarks and to focus on only the two languages considered most relevant to the project: English and German.

During this second year, a strong effort was made to obtain the fastest possible ASR system, without reducing accuracy, for both English and German. Developing such a competitive ASR system for German was a real challenge for LIUM, because LIUM had never processed this language before and because the linguistic resources needed to build such a system are very rare at a reasonable cost.

In order to compare our ASR systems with other state-of-the-art ASR systems, and to obtain an independent evaluation of them, we decided to participate in two international evaluation campaigns on speech recognition:

• the ASR task of the MGB Challenge for English;
• the ASR task of the IWSLT 2015 campaign for German.

These participations were successful: LIUM reached second position in the ASR track of the MGB campaign (in collaboration with the CRIM laboratory from Montreal, Quebec, Canada), and LIUM won the ASR task of IWSLT 2015 for German. Moreover, the 2015 ASR system is more than ten times faster than the 2014 one.

In addition to this work, LIUM has started a study on ASR error detection. This task can be very useful within the EUMSSI project for several reasons:

• to filter out misrecognized words, in order to reduce false alarms when searching for automatic transcriptions containing some requested words;
• to help natural language processing applied to automatic transcriptions (like named entity recognition);
• to improve ASR performance by injecting confidently recognized automatic transcriptions into the training corpus of the acoustic models, since a larger amount of training data improves the quality of acoustic models.

In this preliminary study, the proposed approach outperformed state-of-the-art approaches.


2 ARCHITECTURE OF THE 2015 LIUM ASR SYSTEM

2.1 Evolutions of the LIUM ASR system

The LIUM ASR system built for the EUMSSI project evolved during the second year of the project. The engine core, based on the Kaldi Speech Recognition Toolkit […], is the same, but the multi-step architecture has changed in order to reduce computation time. In the 2014 architecture, two successive acoustic decoding passes were needed before rescoring word graphs. The first one, using GMM-HMM¹, was used to compute an fMLLR transformation matrix from its outputs. This fMLLR matrix was applied to the acoustic features in order to make the second acoustic decoding pass (based on DNN-HMM¹) better adapted to the speaker and to the acoustic conditions.
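For readers less familiar with fMLLR, the transform used in the 2014 pipeline is an affine, speaker-specific mapping of each acoustic feature vector. The standard formulation is recalled below as a sketch; the symbols are our own notation, not the deliverable's.

```latex
\hat{\mathbf{x}}_t = \mathbf{A}_s \mathbf{x}_t + \mathbf{b}_s = \mathbf{W}_s \boldsymbol{\xi}_t,
\qquad \mathbf{W}_s = [\,\mathbf{A}_s \;\; \mathbf{b}_s\,],
\qquad \boldsymbol{\xi}_t = [\,\mathbf{x}_t^{\top} \;\; 1\,]^{\top}
```

Here x_t is the feature vector of frame t and W_s is estimated for speaker s so as to maximize the likelihood of that speaker's frames under the GMM-HMM, given the output of the first decoding pass. This dependency on a first-pass output is why dropping fMLLR also removes one decoding pass.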

Among all the steps of the recognition process, acoustic decoding passes are by far the most time- and computation-consuming ones.

We therefore decided to keep only one acoustic decoding pass in the new architecture of the ASR system, instead of two previously. This means that no fMLLR matrix is built and no fMLLR adaptation is applied, so no heavyweight speaker adaptation is performed: only cepstral mean normalization is applied over the speech segments labeled with the same speaker. In addition, the acoustic features are now different: TRAP features replace PLP-LDA features. Figure 1 illustrates these changes.
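The per-speaker normalization that replaces fMLLR amounts to subtracting, for every speaker cluster produced by diarization, the mean feature vector of that speaker. The minimal Python sketch below assumes the features are already computed and that each frame carries a speaker label; the function name is hypothetical and the code is not Kaldi's implementation.

```python
from collections import defaultdict


def speaker_mean_norm(frames, speakers):
    """Per-speaker mean normalization of acoustic feature frames.

    frames:   list of feature vectors (lists of floats), one per frame
    speakers: list of speaker labels from diarization, one per frame
    """
    sums = {}
    counts = defaultdict(int)
    for f, s in zip(frames, speakers):
        acc = sums.setdefault(s, [0.0] * len(f))
        for j, v in enumerate(f):
            acc[j] += v
        counts[s] += 1
    means = {s: [v / counts[s] for v in acc] for s, acc in sums.items()}
    # subtract the mean of the frame's own speaker
    return [[v - m for v, m in zip(f, means[s])] for f, s in zip(frames, speakers)]


frames = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [20.0, 30.0]]
print(speaker_mean_norm(frames, ["spk1", "spk1", "spk2", "spk2"]))
# [[-1.0, -1.0], [1.0, 1.0], [-5.0, -10.0], [5.0, 10.0]]
```

Normalizing per speaker rather than per segment gives more frames per mean estimate, which is the point of reusing the diarization labels.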

2.2 The 2015 ASR system: main language-independent features

This section presents the features shared by the 2015 ASR systems for English and German.

2.2.1 Speaker segmentation

The speaker diarization system used within the ASR process is the same as the one integrated in the 2014 LIUM ASR system: to segment the audio recordings and to cluster speech segments by speaker, we used the LIUM_SpkDiarization speaker diarization toolkit […]. This system is composed of an acoustic segmentation based on the Bayesian Information Criterion (BIC), followed by a BIC-based hierarchical clustering. Each cluster represents a speaker and is modeled with a full-covariance Gaussian. A Viterbi decoding then re-segments the signal using, for each cluster, GMMs with 8 diagonal components learned by EM-ML. Segmentation, clustering and decoding are performed with 12 MFCC+E, computed with a 10 ms frame rate. Gender and bandwidth are detected before transcribing the signal.
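The BIC test at the heart of the segmentation step compares a single full-covariance Gaussian fitted on two adjacent windows with one Gaussian per window. The sketch below implements the classic ΔBIC criterion in plain Python; the penalty weight `lam` and the helper names are illustrative assumptions, not the LIUM_SpkDiarization API, and a real system scans candidate change points over MFCC frames rather than comparing two fixed segments.

```python
import math


def covariance(frames):
    """Maximum-likelihood full covariance matrix of a list of feature vectors."""
    n, d = len(frames), len(frames[0])
    mu = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[sum((f[a] - mu[a]) * (f[b] - mu[b]) for f in frames) / n
             for b in range(d)] for a in range(d)]


def logdet(m, eps=1e-6):
    """log|M + eps*I| for a symmetric positive semi-definite M, via Cholesky."""
    d = len(m)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s + eps)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between segments x and y (lists of frames).

    Positive values favour two separate Gaussians, i.e. a speaker change.
    """
    z = x + y
    n, n1, n2, d = len(z), len(x), len(y), len(x[0])
    # free parameters of a full-covariance Gaussian: mean + covariance
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * lam * n_params * math.log(n)
    return (0.5 * n * logdet(covariance(z))
            - 0.5 * n1 * logdet(covariance(x))
            - 0.5 * n2 * logdet(covariance(y))
            - penalty)
```

Comparing a segment with itself yields exactly minus the penalty (the likelihood terms cancel), while two segments with clearly shifted statistics give a large positive value.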

More details about the speaker segmentation are given in deliverable D[…].

2.2.2 Speech recognition

The LIUM ASR system can still be considered a multi-pass system, even though only one acoustic decoding pass is now performed. It relies on Kaldi for acoustic decoding and on LIUM tools built from the CMU Sphinx project for linguistic rescoring […]. Some parts of the source code were modified to accelerate decoding, for instance by improving multi-threading management to better exploit the computing power available on a machine.

¹ GMM: Gaussian Mixture Model; HMM: Hidden Markov Model; DNN: Deep Neural Network.

[Figure 1 flowchart, 2014 system: speaker segmentation and diarization (0); PLP features (9 x 39, 13-dimension) with LDA to 40 dimensions (+MLLT); GMM-HMM decoding (9,500 tied states, 200,000 Gaussians, bigram language model) and computation of the fMLLR transformation matrix (1); fMLLR application and DNN-HMM decoding (6 hidden layers with 2048 neurons each, bigram language model) building word graph hypotheses (2); rescoring with a 5-gram continuous space language model (3); consensus on the confusion network gives the final recognition hypothesis. A label marks the different acoustic features of the 2015 system.]

Figure 1: Main changes from the 2014 LIUM ASR system to the 2015 LIUM ASR system developed in the framework of the EUMSSI project

The first pass produces word graphs using the DNN acoustic models combined with a bigram language model. For each frame, the DNN inputs are composed of […] TRAP coefficients computed on a sliding window of […] frames. To compute TRAP features, the […]-dimensional filterbank features are normalized to zero mean per audio file. […] frames of these […]-dimensional filterbank features ([…] frames on each side of the current frame) are spliced together to form a […]-dimensional feature vector. This vector is weighted with a Hamming window (to emphasize the center), passed through a discrete cosine transform, and its dimensionality is reduced to […] × […], i.e. a […]-dimensional feature vector per frame. As mentioned above, speaker adaptation is trivial: it only consists of mean subtraction applied to the filterbank features of all the frames associated with a speaker. This choice was retained because internal experiments showed that TRAP features combined with a DNN give accuracy similar to that of our former architecture using PLP-LDA features and fMLLR adaptation, while this new solution more than halves the computation time needed for the speech recognition process. The DNN was built following the approach described in […]; it is composed of six hidden layers of 2048 units, while the output softmax layer has several thousand outputs depending on the language ([…] for English).
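The TRAP computation described above can be sketched as follows. The sketch assumes that the filterbank frames of one audio file are given as lists of floats; `context` and `n_dct` are hypothetical defaults, and the code illustrates the technique (per-file mean normalization, splicing, Hamming weighting, DCT along time) rather than Kaldi's implementation.

```python
import math


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dct2(x, n_keep):
    """First n_keep coefficients of the orthonormal DCT-II of sequence x."""
    n = len(x)
    coeffs = []
    for k in range(n_keep):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        coeffs.append(scale * sum(v * math.cos(math.pi * (t + 0.5) * k / n)
                                  for t, v in enumerate(x)))
    return coeffs


def trap_features(fbank, context=5, n_dct=6):
    """TRAP-style features for one audio file.

    fbank: list of filterbank frames (lists of floats, one value per band).
    Returns one vector of n_bands * n_dct values per frame.
    """
    n_frames, n_bands = len(fbank), len(fbank[0])
    width = 2 * context + 1
    if n_dct > width:
        raise ValueError("n_dct cannot exceed the window length")
    # 1) zero-mean normalization per file, band by band
    means = [sum(f[b] for f in fbank) / n_frames for b in range(n_bands)]
    norm = [[f[b] - means[b] for b in range(n_bands)] for f in fbank]
    win = hamming(width)
    out = []
    for t in range(n_frames):
        # 2) splice +/- context frames, repeating the first/last frame at the edges
        idx = [min(max(t + o, 0), n_frames - 1) for o in range(-context, context + 1)]
        vec = []
        for b in range(n_bands):
            # 3) Hamming-weight the temporal trajectory of band b (emphasizes the centre)
            traj = [norm[i][b] * w for i, w in zip(idx, win)]
            # 4) DCT along time, keeping only the first n_dct coefficients
            vec.extend(dct2(traj, n_dct))
        out.append(vec)
    return out
```

The output dimension per frame is the number of bands times `n_dct`, which is the "bands × coefficients" reduction mentioned above.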

The next passes consist of expanding and rescoring the word graphs using 3-gram and then 4-gram back-off LMs, and finally a 5-gram neural network LM (combined with the 5-gram back-off LM) […].
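Rescoring with a higher-order LM keeps the acoustic scores of the first pass and replaces the language model scores. The sketch below shows the idea on an n-best list, a simplification of the word graphs actually rescored by the LIUM tools; the log-linear weights and the function name are hypothetical.

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=10.0, word_penalty=0.0):
    """Re-rank n-best hypotheses after replacing their LM scores.

    nbest:      list of (words, acoustic_logprob) pairs from the first pass
    lm_logprob: function returning the log-probability of a word list under the new LM
    """
    def score(hyp):
        words, acoustic = hyp
        # log-linear combination: acoustic + weighted LM + word insertion penalty
        return acoustic + lm_weight * lm_logprob(words) + word_penalty * len(words)

    return sorted(nbest, key=score, reverse=True)


toy_lm = {"recognize speech": -2.0, "wreck a nice beach": -8.0}
nbest = [("wreck a nice beach".split(), -100.0), ("recognize speech".split(), -101.0)]
best = rescore_nbest(nbest, lambda ws: toy_lm[" ".join(ws)], lm_weight=1.0)
print(best[0][0])  # ['recognize', 'speech']
```

Working on a word graph instead of an n-best list lets the same rescoring explore far more hypotheses for the same cost.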

Finally, an accelerated version of the consensus approach […], which takes temporal information into account to speed up processing, is applied to the confusion networks built from the 5-gram rescored word graphs.
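In consensus decoding, the word graph is turned into a confusion network, i.e. a sequence of slots holding competing words with their posterior probabilities, and the output keeps the most probable word of each slot. The sketch below covers only this final selection step; building the network, and LIUM's temporal speed-up, are not shown, and the `<eps>` symbol for an empty slot is an assumed convention.

```python
def consensus_hypothesis(confusion_network, eps="<eps>"):
    """Pick, in each slot of a confusion network, the word with the highest posterior.

    confusion_network: list of slots; each slot maps a word (or eps, meaning
    "no word here") to its posterior probability.
    """
    words = []
    for slot in confusion_network:
        best = max(slot, key=slot.get)
        if best != eps:          # an empty slot produces no output word
            words.append(best)
    return words


cn = [{"the": 0.9, "a": 0.1},
      {"<eps>": 0.6, "uh": 0.4},
      {"cat": 0.5, "cap": 0.3, "<eps>": 0.2}]
print(consensus_hypothesis(cn))  # ['the', 'cat']
```

Because each slot is decided independently on word posteriors, this output minimizes expected word errors rather than maximizing the probability of the whole sentence.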

Figure 2 presents the general architecture of the 2015 LIUM ASR system developed in the framework of the EUMSSI project.


[Figure 2 flowchart: speaker segmentation and diarization (0); TRAP features; DNN-HMM decoding (6 hidden layers with 2048 neurons each, bigram language model) building word graph hypotheses (1); rescoring with a 5-gram (continuous space) language model (2); consensus on the confusion network gives the final recognition hypothesis.]

Figure 2: General architecture of the 2015 LIUM ASR system developed in the framework of the EUMSSI project


3 EVALUATION OF THE ASR SYSTEMS

3.1 English language: participation in the ASR task of the MGB 2015 Challenge

The Multi-Genre Broadcast (MGB) Challenge at ASRU 2015 is a controlled evaluation of speech recognition, speaker diarization and lightly supervised alignment using BBC TV recordings. This challenge was an official event of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

LIUM made two submissions. The first is the fast system developed within the EUMSSI project, tuned to be as fast as possible while keeping high accuracy. The second, made in collaboration with the Computer Research Institute of Montréal (CRIM), does not take computation time into account and aims at the lowest possible word error rate.

3.1.1 Language models

The LIUM ASR system used in the MGB challenge relies on 2-gram, 3-gram and 4-gram back-off LMs, and on a 5-gram back-off LM used in combination with a 5-gram feed-forward neural network model: this is the system described in section 2.2.2. Back-off LMs were estimated with the SRILM toolkit, while the neural network language model (NNLM) was estimated with the CSLM toolkit, developed at LIUM and distributed under the LGPL license […]. All these language models were estimated on the entire normalized data provided by the organizers. No LM adaptation was applied.

For this campaign, LIUM's vocabulary contains […]K words, the most frequent ones in the normalized training data. Classical back-off n-gram models were trained using modified Kneser-Ney smoothing, without cutoff or pruning. The 5-gram LM is composed of […]K 1-grams, […]M 2-grams, […]M 3-grams, […]M 4-grams and […]M 5-grams. The 5-gram NNLM is composed of a projection layer of […] units, corresponding to […]-dimensional word embeddings, two hidden layers of […] units each, and an output layer providing probabilities for a short list composed of the […] most frequent words.
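The relation between the NNLM sizes listed above can be made explicit: in a feed-forward n-gram NNLM, the projection layer concatenates the embeddings of the n − 1 history words, and the output softmax only covers the short list (in the usual short-list approach, other words are scored by the back-off LM). This arithmetic sketch counts weights plus biases and uses hypothetical sizes; it is not a description of the CSLM toolkit's internals.

```python
def ffnnlm_sizes(order, emb_dim, hidden, n_hidden, shortlist, vocab):
    """Layer sizes and parameter count (weights + biases) of a feed-forward n-gram NNLM."""
    projection = (order - 1) * emb_dim       # concatenated embeddings of the history words
    params = vocab * emb_dim                 # shared embedding table (no bias)
    prev = projection
    for _ in range(n_hidden):
        params += prev * hidden + hidden     # fully connected hidden layer
        prev = hidden
    params += prev * shortlist + shortlist   # softmax restricted to the short list
    return {"projection_units": projection, "parameters": params}


# Hypothetical example: 256-dimensional embeddings in a 5-gram model
# give a 4 x 256 = 1024-unit projection layer.
print(ffnnlm_sizes(order=5, emb_dim=256, hidden=1024, n_hidden=2,
                   shortlist=16384, vocab=100000)["projection_units"])  # 1024
```

The short list keeps the output layer, usually the most expensive part, independent of the full vocabulary size.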

The impact of each LM is presented in section 3.1.3.

3.1.2 Acoustic models: using imperfect transcripts to build a training corpus for acoustic models

To train the acoustic models, MGB participants could only use the imperfect transcripts of about […] h of TV shows. These imperfect transcripts were both manually made subtitles with very rough timecodes and automatic transcripts provided by a baseline system owned by the organizers.

LIUM investigated its own approach to extract relevant audio-text alignments to train acoustic models. First, the ASR outputs and the pronunciation dictionary provided by the organizers were used to train DNN acoustic models. Then, all the audio files provided by the organizers as part of the training corpus were processed with the LIUM internal speaker diarization tool […]. Each resulting speech segment was transcribed using these first DNN acoustic models combined with a […]-gram language model presented in section 3.1.1. This processing generated a word graph for each speech segment. Each word graph was then aligned with the subtitles made by human annotators and provided with the audio files. Word graph alignment consists of searching for a path within the word graph that matches the subtitles, accepting that the rough timecodes of the subtitles and the precise timecodes within the word graph may differ by up to […] seconds. Only long speech segments with no more than one word mismatch (insertion, substitution or deletion) between the subtitles and the closest path in the word graph were selected. The text associated with a selected speech segment is the one coming from the path of the word graph closest to the subtitles. The training alignments generated by LIUM amount to […] hours of training audio.
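The selection rule above can be sketched as follows. As a simplification, the sketch compares the subtitle with a single candidate path instead of searching the whole word graph, and `max_delay` and `min_words` (the threshold defining a "long" segment) are hypothetical values chosen for illustration.

```python
def word_edit_distance(a, b):
    """Minimum number of word insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # word only in a
                           cur[j - 1] + 1,              # word only in b
                           prev[j - 1] + (wa != wb)))   # match or substitution
        prev = cur
    return prev[-1]


def select_segment(path_words, path_start, subtitle_words, subtitle_start,
                   max_delay=5.0, max_mismatch=1, min_words=5):
    """Keep a segment if a word-graph path matches the subtitle closely enough."""
    if abs(path_start - subtitle_start) > max_delay:   # timecodes too far apart
        return False
    if len(subtitle_words) < min_words:                # keep long segments only
        return False
    return word_edit_distance(path_words, subtitle_words) <= max_mismatch


sub = "the cat sat on a mat".split()
print(select_segment("the cat sat on the mat".split(), 10.0, sub, 12.0))  # True
print(select_segment("a cat sat in the mat".split(), 10.0, sub, 12.0))    # False
```

Tolerating a single mismatch keeps most of the audio while the transcript actually used for training comes from the word graph path, whose timing is precise.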

3.1.3 Word error rate and computation time

Table 1 presents the official results of the MGB campaign. They will be published during the IEEE ASRU workshop in December 2015.

System        Global WER
Sys1          […]
CRIM-LIUM     […]
Sys3          […]
Sys4          […]
Sys5          […]
Fast LIUM     […]
Sys7          […]
Sys8          […]
Sys9          […]
Sys10         […]
Sys11         […]
Sys12         […]

Table 1: Official results of the MGB Challenge.

The fast LIUM ASR system reaches a noteworthy rank: although it is designed to be as fast as possible, it ranks 6th out of 12 participants. This system was also integrated into the ASR system combination built with CRIM, which ranks second in the challenge. The five best systems are all built on an ASR system combination architecture, which implies a much higher computation time than the one needed by the fast LIUM system.

The computation time of the fast LIUM system was analyzed in detail on the development corpus. Each step was studied, and Table 2 presents the results. Speed is measured as a real-time factor (xRT). The entire decoding process runs at […] times real time, which means that […] minutes are needed to process […] minutes ([…]) of speech. This can be reduced to […] times real time if the CSLM rescoring is not applied, at the cost of an increase of the word error rate ([…] point). Other internal experiments, not described here, have shown that similar word error rates were reached by the 2014 system and by the fast 2015 one. The most interesting difference lies in computation time, since the 2015 ASR system is more than ten times faster than the 2014 one.
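The real-time factor used above is simply processing time divided by audio duration. A small sketch with purely illustrative values (not the measured ones):

```python
def real_time_factor(processing_seconds, audio_seconds):
    """xRT: processing time divided by audio duration (< 1.0 means faster than real time)."""
    return processing_seconds / audio_seconds


def processing_minutes(rtf, audio_minutes):
    """Wall-clock minutes needed to decode audio_minutes of speech at a given xRT."""
    return rtf * audio_minutes


# Illustrative values only: 30 minutes of computation for 100 minutes of audio
rtf = real_time_factor(30 * 60, 100 * 60)
print(round(rtf, 2), round(processing_minutes(rtf, 100), 1))  # 0.3 30.0
```

Per-step factors such as those of Table 2 add up to the factor of the full process, since every step runs over the same audio duration.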


Step                        WER     Computation time
DNN + 2-gram                […]     […] xRT
3-gram rescoring            […]     […] xRT
4-gram rescoring            […]     […] xRT
CSLM 5-gram rescoring       […]     […] xRT
Consensus                   […]     […] xRT
Total (full process)        […]     […] xRT (official submission)

Table 2: WER and computation time (in real time) on the Dev set (dev.full/dev.longitudinal) of the fast LIUM system that participated in the MGB Challenge.

The MGB dataset is composed of very heterogeneous data. This implies a high variability of acoustic conditions and speaking styles (fluent, disfluent, spontaneous, prepared, familiar, unfamiliar, ...). Table 3 presents detailed results of the LIUM ASR system for each kind of show in the MGB test dataset. The ASR system obtains good results on documentaries and political news, which are close to the data provided by the Deutsche Welle partner. Performance degrades when sketch comedies or series are processed. A similar behaviour is observed with the CRIM-LIUM ASR system and with all the other ASR systems participating in the MGB campaign.
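When per-show WERs such as those of Table 3 are summarized into a single figure, errors and reference words are normally pooled over all shows rather than averaged per show, so long shows weigh more. A minimal sketch with hypothetical counts (scoring tools such as NIST sclite also compute the word alignment, which is not shown here):

```python
def wer(errors, ref_words):
    """Word error rate in percent: (substitutions + deletions + insertions) / reference words."""
    return 100.0 * errors / ref_words


def global_wer(per_show):
    """Pool error and reference-word counts over shows before dividing.

    per_show: dict mapping a show name to (error_count, reference_word_count)
    """
    errors = sum(e for e, _ in per_show.values())
    words = sum(n for _, n in per_show.values())
    return wer(errors, words)


# Hypothetical counts: a short, easy show and a long, hard one
shows = {"documentary": (10, 100), "sitcom": (90, 300)}
print(global_wer(shows))                                       # 25.0
print(sum(wer(e, n) for e, n in shows.values()) / len(shows))  # 20.0 (unweighted mean)
```

The gap between the two figures grows when hard genres such as sitcoms also make up most of the audio.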

3.2 German language: participation in the ASR task of the IWSLT 2015 evaluation campaign

Last year, LIUM participated in the ASR task of the IWSLT 2014 evaluation campaign for English, within the EUMSSI project, and also in the ASR task for Italian, within a partnership with an industrial partner. This year, we aimed to participate in the ASR task of the IWSLT 2015 campaign for German, in order to evaluate our German ASR system developed for the EUMSSI project.

A crucial issue in developing such a system is access to a training corpus for acoustic models in the targeted language. These training data are ideally audio recordings with manual transcriptions. For German, such data are very rare at a reasonable cost. Fortunately, the industrial partner that co-participated with LIUM in the IWSLT 2014 ASR task for Italian kindly agreed to provide a little more than one hundred hours of German audio recordings with their manual transcriptions. This agreement between LIUM and its industrial partner, which is not an EUMSSI partner, was signed for research purposes only, limited to the framework of the EUMSSI project.

The fast LIUM ASR system described above was adapted to German: a vocabulary of […]K words (twice as large as the English one, to deal with German compound words) was built, with language models trained on the data described in Table 4. Data selection was performed with an internal tool […], XenC (distributed under an open-source license), based on cross-entropy […]. This data selection allowed us to focus on the IWSLT 2015 topics (TED conference talks). Acoustic models were trained on the […] hours provided by LIUM's industrial partner.
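XenC implements cross-entropy-based data selection. The sketch below illustrates, in the spirit of that tool, the cross-entropy difference criterion of Moore and Lewis: sentences that are likely under an in-domain LM and unlikely under a general-domain LM are kept. For brevity it uses add-one unigram models, whereas real selection uses higher-order n-gram LMs; all names are hypothetical and this is not XenC's interface.

```python
import math
from collections import Counter


def unigram_lm(sentences, vocab, alpha=1.0):
    """Add-alpha unigram LM over `vocab` plus an <unk> class."""
    counts = Counter(w if w in vocab else "<unk>" for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * (len(vocab) + 1)
    return lambda w: (counts[w if w in vocab else "<unk>"] + alpha) / total


def cross_entropy(lm, sentence):
    """Per-word cross-entropy (bits) of a sentence under `lm`."""
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)


def select_sentences(candidates, in_domain, general, keep_ratio=0.5):
    """Keep the candidates with the lowest H_in(s) - H_general(s)."""
    vocab = {w for s in in_domain for w in s.split()}
    lm_in = unigram_lm(in_domain, vocab)
    lm_gen = unigram_lm(general, vocab)
    ranked = sorted(candidates,
                    key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s))
    return ranked[:max(1, int(len(ranked) * keep_ratio))]


ted = ["we talk about ideas", "this talk is about science"]
pool = ["the talk is about ideas", "the stock market fell today", "shares fell on the market"]
print(select_sentences(pool, ted, pool, keep_ratio=0.34))  # ['the talk is about ideas']
```

Subtracting the general-domain cross-entropy penalizes sentences that are merely common everywhere, which a plain in-domain perplexity ranking would favour.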


Show                    Global WER   Comments
Dragons' Den            […]          Reality television featuring entrepreneurs pitching their business ideas
Daily Politics          […]          Current affairs and politics: interviews with leading politicians and political commentators
Magnetic North          […]          Documentary
Athletics London        […]
Eggheads                […]          Quiz show
Point of View           […]
Syd Barrett             […]
Top Gear                […]          Motoring entertainment
Blue Peter              […]
Legend of the Dragon    […]
The North West 200      […]
Holby City              […]
The Wall                […]
One Life Special Mum    […]
Goodness Gracious Me    […]          Sketch comedy
Oliver Twist            […]          Miniseries: British television adaptation of Charles Dickens's novel Oliver Twist

Table 3: Detailed official results of the fast LIUM ASR system on the test set of the MGB Challenge.

Following the same strategy as in our joint participation with CRIM in the MGB Challenge on English, we also developed a second ASR system, based on the previous LIUM ASR system developed in 2014 and described in report D[…], in order to obtain better accuracy. Table 5 presents the results obtained by the two single ASR systems and by their combination on the IWSLT 2015 development corpus. The gap between the results of the 2014-based and 2015 systems must not be interpreted as an accuracy improvement from the 2014 to the 2015 system: we also introduced some changes (LM weights, DNN training heuristics, ...) in the 2014-based system that could degrade its results, because our goal was to produce two sufficiently different systems in order to obtain complementarities useful for system combination. Combining the 2015 fast ASR system with the 2014-based one allowed us to improve on the performance of the 2015 ASR system alone.

The official performance of the ASR systems was evaluated by the organizers of the IWSLT 2015 campaign. Official results are presented in Table 6.

The combined LIUM ASR system reached a WER of […]% on the evaluation corpus. With this performance, LIUM won the competition. Second position was reached by the Karlsruhe Institute of Technology (Germany), which had won the competition in […].

As a success indicator in the EUMSSI Description of Work, WP3 had to produce an ASR system able to reach the word error rate obtained by the best ASR system during IWSLT […] for German. In […], the best ASR system reached a WER of […]% on the tst[…] data. Our 2014-based ASR system reaches a WER of […]% on the tst[…] data, which is now part of the IWSLT 2015 development corpus. In addition, this WER can very probably be reduced further by using our 2015 system, and further still by combining the 2014 and 2015 systems: we estimate that our accuracy goal for German has been more than reached.

Corpus                            # words (original)   # words (selected)   % of original
Manual transcriptions of speech   […]M                 […]M                 […]
Common Crawl                      […]M                 […]M                 […]
Europarl                          […]M                 […]M                 […]
News Crawl                        […]G                 […]M                 […]
News Commentary                   […]M                 […]M                 […]
Total (w/o IWSLT […])             […]G                 […]M                 […]

Table 4: Characteristics of the text data used to train the language models for the German ASR systems.

System                    WER
2015 fast ASR system      […]
2014-based ASR system     […]
ASR combination           […]

Table 5: Word error rate of the LIUM ASR systems on the IWSLT 2015 development corpus for German.


System                                     WER
LIUM                                       […]
KIT (Karlsruhe Institute of Technology)    […]
MLLP                                       […]

Table 6: Official results of the German ASR track of IWSLT 2015 on the test corpus.


4 ASR ERROR DETECTION

In the EUMSSI project, automatic transcripts are exploited to retrieve specific video documents and to give fast access to the information conveyed by speech. With state-of-the-art technology, errors are unavoidable, as can be observed in the results presented in section 3. In WP3, we started working on ASR error detection one year ago, within Mrs Sahar Ghannay's Ph.D., partially funded ([…]%) by the EUMSSI project. Very good preliminary results have already been obtained on French, outperforming recent state-of-the-art approaches based on Conditional Random Fields (CRF).

In this work, we investigated the use of word embeddings as input features of a neural network-based error detection system. We experimented with three different types of word embeddings and propose to combine them with an auto-encoder in order to take advantage of their complementarity.

4.1 Related work

For two decades, many studies have focused on the ASR error detection task. Recently, the best-performing approaches have been based on CRF. In […], the authors focused on detecting error regions generated by Out-Of-Vocabulary (OOV) words. They proposed an approach based on a CRF tagger, which takes into account contextual information from neighboring regions instead of considering only the local region of OOV words. A similar approach for other ASR errors was presented in […], which proposes an error detection system based on a CRF tagger using various ASR, lexical and syntactic features.

In the work carried out within the EUMSSI project, we compare the performance of the state-of-the-art CRF-based ASR error detection system proposed in […] with our proposal, which is based on a neural network architecture and on an effective combination of word embeddings built on a huge text corpus.

4.2 Set of features

An error detection system has to assign the label correct (c) or error (e) to each word. This assignment is made by analyzing each recognized word within its context. A set of relevant features must be selected to capture the information needed for a precise classification.
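Training and evaluating such a classifier requires reference labels: each hypothesis word is marked c or e after aligning the ASR output with the manual transcription. A minimal sketch, assuming whitespace-tokenized word lists and a plain Levenshtein alignment with ties broken arbitrarily (the alignment tool actually used is not specified here):

```python
def error_labels(hyp, ref):
    """Label each hypothesis word 'c' (correct) or 'e' (error) from a Levenshtein alignment."""
    n, m = len(hyp), len(ref)
    # d[i][j] = edit distance between hyp[:i] and ref[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                              # insertion
                          d[i][j - 1] + 1,                              # deletion
                          d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]))  # match / substitution
    labels = [""] * n
    i, j = n, m
    while i > 0:                     # backtrace from the bottom-right cell
        if j > 0 and d[i][j] == d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            labels[i - 1] = "c" if hyp[i - 1] == ref[j - 1] else "e"
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            labels[i - 1] = "e"      # inserted word: absent from the reference
            i -= 1
        else:
            j -= 1                   # deleted reference word: no hypothesis word to label
    return labels


print(error_labels("the cat sad on mat".split(), "the cat sat on the mat".split()))
# ['c', 'c', 'e', 'c', 'c']
```

Deletions leave no hypothesis word to label, which is why error detection is framed as a per-hypothesis-word c/e decision.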

4.2.1 ASR, lexical and syntactic features

In this work, we use nearly the same features as those presented in […], detailed as follows:

• ASR features: posterior probabilities generated by the ASR system.
• Lexical features: length of the current word and three binary features indicating whether the three 3-grams containing the current word have been seen in the training corpus of the ASR language model.
• Syntactic features: POS tag, dependency labels and word governors, extracted from the transcriptions using the MACAON NLP tool chain².
• Word: orthographic representation in CRF approaches, as used in […]. With our neural approach, we use word vectors instead, which allow us to take advantage of generalizations extracted during the construction of these word embeddings.

² http://macaon.lif.univ-mrs.fr
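The lexical features above can be computed as in the following sketch. It assumes that the 3-grams seen in the LM training corpus are available as a Python set of word triples, that word length is counted in characters, and that sentence boundaries are padded with `<s>`/`</s>`; these are assumptions for illustration, and the function name is hypothetical.

```python
def lexical_features(words, i, seen_trigrams):
    """[len(w), seen(w-2 w-1 w), seen(w-1 w w+1), seen(w w+1 w+2)] for w = words[i].

    seen_trigrams: set of word triples observed in the LM training corpus.
    Sentence boundaries are padded with <s> / </s> (an assumption of this sketch).
    """
    padded = ["<s>", "<s>"] + list(words) + ["</s>", "</s>"]
    k = i + 2                                   # index of words[i] inside `padded`
    # the three trigrams containing the current word: ending at, centred on, starting at it
    flags = [int(tuple(padded[k - 2 + s:k + 1 + s]) in seen_trigrams) for s in range(3)]
    return [len(words[i])] + flags


seen = {("<s>", "the", "cat"), ("cat", "sat", "</s>")}
print(lexical_features(["the", "cat", "sat"], 1, seen))  # [3, 1, 0, 1]
```

An unseen trigram around a word is a useful error cue, because the ASR language model had no direct evidence for that word sequence.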

4.2.2 Word embeddings

Word embeddings are vector representations of words that have been successfully used in several natural language processing tasks […]. These vectors live in a continuous vector space; they can be trained with different methods and are computed from a text corpus.

In our study, we tested different kinds of word embeddings coming from different available implementations: Collobert and Weston word embeddings revisited by Turian in […], continuous bag-of-words (CBOW) and skip-gram embeddings proposed by Mikolov in […], global vectors (GloVe) introduced in […], and word embeddings extracted from a neural network language model similar to the one used in our ASR system […]. Our goal was to build complementary word embeddings for the ASR error detection task. For this task, we need to capture syntactic information in order to analyze sequences of recognized words, but we also need to capture semantic information to measure the relevance of co-occurrences of words in the same ASR hypothesis.

100-dimensional word embeddings were computed from a large textual corpus of about ? billion words. This corpus was built from articles of the French newspaper "Le Monde", from the French Gigaword corpus, from articles provided by Google News, and from manual transcriptions of about ? hours of French broadcast news.

To take advantage of their complementarity, we propose to combine the word embeddings, investigating either an auto-encoder [20] or a classical Principal Component Analysis (PCA). The auto-encoder is composed of one hidden layer with ? or ? hidden units. It takes as input a concatenation of the different embedding vectors and outputs a vector of the same size as the input vector. The auto-encoder is trained to reproduce at its output the vectors presented at its input. For each word, the vector of numerical values produced by the hidden layer is then used as the combined word embedding.
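The embedding-combination step can be sketched in pure Python as follows. This is a minimal illustration, not the implementation used in our experiments. The tanh hidden layer, linear output, squared reconstruction error, learning rate and initialisation are assumptions of the sketch, and the class and function names are hypothetical. After training, `encode` returns the hidden-layer activations, which play the role of the combined word embedding.

```python
import math
import random
from typing import List

Vector = List[float]


def concat(*embeddings: Vector) -> Vector:
    """Auto-encoder input: the different embeddings of one word, concatenated."""
    return [v for emb in embeddings for v in emb]


class CombiningAutoEncoder:
    """One hidden layer: concatenated embeddings -> hidden -> reconstruction.

    Assumed (not from the report): tanh hidden units, linear output units,
    squared reconstruction error and plain per-example SGD.
    """

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x: Vector) -> Vector:
        """Hidden-layer activations, used as the combined word embedding."""
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h: Vector) -> Vector:
        """Linear reconstruction of the input from the hidden layer."""
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def loss(self, x: Vector) -> float:
        """Half squared error between the input and its reconstruction."""
        y = self.decode(self.encode(x))
        return 0.5 * sum((a - b) ** 2 for a, b in zip(y, x))

    def train_step(self, x: Vector, lr: float = 0.05) -> None:
        """One SGD step that pushes the output towards the input itself."""
        h = self.encode(x)
        y = self.decode(h)
        dy = [a - b for a, b in zip(y, x)]  # gradient w.r.t. the outputs
        # Gradient w.r.t. hidden pre-activations, computed before w2 changes.
        dz = [sum(self.w2[k][j] * dy[k] for k in range(len(dy))) * (1.0 - h[j] ** 2)
              for j in range(len(h))]
        for k, g in enumerate(dy):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dz):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g
```

A PCA-based combination, the other option mentioned above, would instead project the concatenated vectors onto their leading principal components. The auto-encoder can learn a non-linear compression, whereas PCA is linear.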

Neural network architecture

Since neural networks only accept numerical data vectors, features must be represented as numerical values. Some features are non-numeric (POS tags, dependency labels and word governors), so they must be converted to a numerical representation. We propose to use a one-hot representation to replace the POS tags and the dependency labels. For instance, as we use 25 POS tags, we represent the i-th POS tag by a 25-dimensional vector. All its elements are equal to 0, except for the i-th one, which is equal to 1.
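The one-hot encoding described above can be sketched in a few lines of Python. The tag names used in the example are hypothetical placeholders: the actual 25 POS tags and 22 dependency labels come from the MACAON tool chain and are not listed in this report. In this sketch an unknown label raises a `KeyError`; how the real system handles unknown labels is not specified.

```python
from typing import Dict, List, Sequence


def build_index(tagset: Sequence[str]) -> Dict[str, int]:
    """Assign each label of a closed tag set a fixed position."""
    return {tag: i for i, tag in enumerate(tagset)}


def one_hot(label: str, index: Dict[str, int]) -> List[int]:
    """|tagset|-dimensional vector: 1 at the label's position, 0 elsewhere."""
    vec = [0] * len(index)
    vec[index[label]] = 1  # KeyError for labels outside the tag set
    return vec
```

For example, `one_hot("T3", build_index([f"T{k}" for k in range(25)]))` returns a 25-element list whose only 1 is at position 3. The same function encodes dependency labels with a 22-label index.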

The word governors and the current words are represented by their word embeddings. Figure ? presents an example of a 252-dimensional feature vector for one word. An input is the concatenation of 5 word feature vectors: the current word and its two left and two right neighbours.
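The assembly of the 252-dimensional word vector and of the 5-word network input can be sketched as below. The block sizes (100 + 3 + 1 + 1 + 25 + 22 + 100 = 252) follow the input-format figure. Three details are assumptions of this sketch: the order of the blocks, a single posterior value per word, and zero-vector padding at utterance boundaries. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]

# Block sizes taken from the input-format figure.
EMB_DIM, N_POS, N_DEP = 100, 25, 22
WORD_DIM = EMB_DIM + 3 + 1 + 1 + N_POS + N_DEP + EMB_DIM  # = 252


def word_feature_vector(word_emb: Sequence[float],
                        trigram_flags: Sequence[int],
                        length: int,
                        posterior: float,
                        pos_one_hot: Sequence[int],
                        dep_one_hot: Sequence[int],
                        governor_emb: Sequence[float]) -> Vector:
    """Concatenate the per-word features into one WORD_DIM vector."""
    vec = [float(v) for v in word_emb]          # current word embedding
    vec += [float(f) for f in trigram_flags]    # 3 binary 3-gram flags
    vec += [float(length), float(posterior)]    # word length, ASR posterior
    vec += [float(v) for v in pos_one_hot]      # POS tag, one-hot
    vec += [float(v) for v in dep_one_hot]      # dependency label, one-hot
    vec += [float(v) for v in governor_emb]     # governor word embedding
    if len(vec) != WORD_DIM:
        raise ValueError(f"expected {WORD_DIM} values, got {len(vec)}")
    return vec


def window_input(word_vectors: Sequence[Vector], i: int) -> Vector:
    """Concatenate the vectors of words i-2 .. i+2 (5 * WORD_DIM values).

    Positions outside the utterance are filled with zeros (an assumption).
    """
    pad = [0.0] * WORD_DIM
    out: Vector = []
    for j in range(i - 2, i + 3):
        out.extend(word_vectors[j] if 0 <= j < len(word_vectors) else pad)
    return out
```

The size check in `word_feature_vector` catches a mismatched block early, for example a POS one-hot built from the wrong tag set.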

[Figure ?: Neural network input feature vector format. Blocks, in the order shown: current word embedding (100 dim), 3-gram features (3), word length, PAP, POS tag (25 dim), dependency labels (22 dim), word governor embedding (100 dim).]

We propose to extend the classical multilayer perceptron (MLP) classifier by using a multi-stream strategy for training the network. An MLP multi-stream (MLP-MS) architecture is used in order to better integrate the contextual information from neighboring words. This architecture is inspired by [4], where word and semantic features are integrated for theme identification in telephone conversations. The MLP-MS is trained by first pre-training the hidden layers separately and then fine-tuning the whole network.

The proposed architecture, depicted in Figure ?, is detailed as follows. Three feature vectors are used as input to the network:
- the feature vector representing the two left words (L);
- the feature vector representing the current word (W);
- the feature vector representing the two right words (R).

Each feature vector is used separately to train a multilayer perceptron (MLP) with a single hidden layer. The output layer has two nodes, corresponding to the labels c (correct) and e (error). Details are described in [?, ?].

[Figure ?: MLP-MS architecture for the ASR error detection task. The input words W(i-2), W(i-1), W(i), W(i+1), W(i+2) feed three first hidden layers (H1-left, H1-current, H1-right). These feed a second hidden layer H2 and the output layer (Correct/Error).]
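The three-stream architecture can be made concrete with a forward-pass sketch in pure Python. The L, W and R feature vectors each pass through their own first hidden layer (H1-left, H1-current, H1-right). The three outputs are concatenated, fed to H2 and mapped to the two output labels. The layer sizes, tanh activations and random initialisation are assumptions. The separate pre-training and fine-tuning described above are not shown. All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Uniformly initialised weights, zero biases (an assumed scheme)."""
    s = 1.0 / math.sqrt(n_in)
    return ([[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)


def dense(x: Vector, layer: Layer,
          act: Callable[[float], float] = math.tanh) -> Vector:
    """Fully connected layer followed by an element-wise activation."""
    weights, biases = layer
    return [act(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class MLPMS:
    """Three-stream MLP: left (L), current (W) and right (R) word features."""

    def __init__(self, d_word: int, h1: int, h2: int, seed: int = 0):
        rng = random.Random(seed)
        self.h1_left = init_layer(2 * d_word, h1, rng)   # words i-2, i-1
        self.h1_current = init_layer(d_word, h1, rng)    # word i
        self.h1_right = init_layer(2 * d_word, h1, rng)  # words i+1, i+2
        self.h2 = init_layer(3 * h1, h2, rng)            # merges the 3 streams
        self.output = init_layer(h2, 2, rng)             # labels c and e

    def forward(self, left: Vector, current: Vector, right: Vector) -> Vector:
        """Return [P(c), P(e)] for the current word."""
        merged = (dense(left, self.h1_left)
                  + dense(current, self.h1_current)
                  + dense(right, self.h1_right))
        hidden = dense(merged, self.h2)
        return softmax(dense(hidden, self.output, act=lambda v: v))
```

With the 252-dimensional word vectors described earlier, `d_word` would be 252, and the L and R streams would each receive 2 × 252 = 504 values. The hidden sizes used in the experiments are not given here.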

Experiments

The performance of the CRF and neural network approaches is evaluated and compared using two measures:
- the f-measure (based on recall and precision) for erroneous word prediction;
- the global Classification Error Rate (CER), defined as the ratio of the number of misclassifications to the number of recognized words.
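The two evaluation measures can be computed from per-word reference labels and predicted labels, as in the following sketch. It uses the labels `c` (correct) and `e` (error) and computes precision, recall and f-measure on the error class only. It returns 0 when a ratio is undefined. The function name is hypothetical.

```python
from typing import Dict, Sequence

ERROR, CORRECT = "e", "c"


def error_detection_scores(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Precision, recall and f-measure on the error class, plus global CER."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(gold, pred))
    tp = sum(g == ERROR and p == ERROR for g, p in pairs)    # errors found
    fp = sum(g == CORRECT and p == ERROR for g, p in pairs)  # false alarms
    fn = sum(g == ERROR and p == CORRECT for g, p in pairs)  # missed errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # CER: misclassified words over all recognized (hypothesis) words.
    cer = sum(g != p for g, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f_measure": f, "cer": cer}
```

For instance, take reference labels `c e e c c` and predictions `c e c e c`. One error is found, one is missed and one correct word is flagged. Precision, recall and f-measure are therefore all 0.5, and CER = 2/5 = 0.4.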

Experimental data

The experimental data are based on the entire official ETAPE corpus [8], composed of audio recordings of French broadcast news shows with manual transcriptions.

| Name | #words | WER | Error fertility |
|---|---|---|---|
| Train | ?K | ? | ? |
| Dev | ?K | ? | ? |
| Test | ?K | ? | ? |

Table ?: Description of the experimental corpus.

| Corpus | Approach | f-measure | CER |
|---|---|---|---|
| Dev | CRF (state-of-the-art baseline) | ? | ? |
| Dev | MLP, best single word embedding | ? | ? |
| Dev | MLP, word embedding combination | ? | ? |
| Dev | MLP, word embedding combination + prosody | ? | ? |
| Test | CRF (state-of-the-art baseline) | ? | ? |
| Test | MLP, word embedding combination | ? | ? |
| Test | MLP, word embedding combination + prosody | ? | ? |

Table ?: Error detection results on ASR transcriptions.

This corpus was enriched with automatic transcriptions generated by the multi-pass LIUM ASR system that existed before EUMSSI. This system is based on the CMU Sphinx decoder and uses GMM/HMM acoustic models. It won the ETAPE evaluation campaign in 2012. A detailed description is presented in [3].

The automatic transcriptions have been aligned with the reference transcriptions using the sclite tool. From this alignment, each word in the corpora has been labeled as correct or incorrect (error). The size, WER and ASR error fertility of the corpora are described in Table ?. The fertility of an ASR error is the number of contiguous errors (including insertions and substitutions) observed in the automatic transcription for one misrecognized word of the reference transcription.
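The labeling step can be sketched with a plain Levenshtein alignment standing in for sclite; sclite's own alignment and tie-breaking rules may differ. Each hypothesis word aligned to an identical reference word is labeled `c`. Substituted and inserted hypothesis words are labeled `e`. Deleted reference words have no hypothesis counterpart and therefore receive no label. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def align(ref: Sequence[str], hyp: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Levenshtein alignment as (op, ref_idx, hyp_idx), op in C/S/I/D (-1 = none)."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    ops: List[Tuple[str, int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right cell
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("C" if ref[i - 1] == hyp[j - 1] else "S", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(("I", -1, j - 1))
            j -= 1
        else:
            ops.append(("D", i - 1, -1))
            i -= 1
    return ops[::-1]


def label_hypothesis(ref: Sequence[str], hyp: Sequence[str]) -> List[str]:
    """Label each hypothesis word 'c' (correct) or 'e' (substitution/insertion)."""
    labels = ["e"] * len(hyp)
    for op, _, j in align(ref, hyp):
        if op == "C":
            labels[j] = "c"
    return labels
```

For example, aligning the reference `le chat dort` with the hypothesis `le chats dorment bien` yields the labels `c e e e`.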

Results

Our experiments showed that using an auto-encoder to combine different word embeddings provides significant improvements in ASR error detection. Details are provided in [?, ?].

Table ? presents the final results of the CRF approach and of our neural network approach, in terms of CER and f-measure.

These experimental results show that our neural network approach outperforms the state-of-the-art CRF approach for ASR error detection. It uses a combination of the best single word embeddings: skip-gram, GloVe and CBOW.

In [?], we have also proposed adding some prosodic features to those used in the input vector illustrated in Figure ?. This results in a slight reduction of the classification error rate.

Finally, on the test corpus, our approach based on neural networks and word embedding combination reduces the CER by ? compared with the previous state-of-the-art CRF approach presented in [1]. This reduction was shown to be statistically significant.

sclite tool: http://www.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm

CONCLUSION AND PERSPECTIVES

This year, our effort focused on obtaining the fastest possible ASR systems, with good accuracy, for both English and German. Our 2015 EUMSSI ASR system is much faster than the 2014 EUMSSI ASR system: we divided the computation time needed to process speech by ten. The gain is even larger if we compare the 2015 system to the 2013 LIUM ASR system, built before the beginning of the EUMSSI project: in two years, we divided the computation time by ?.

At the same time, we have improved the accuracy of our ASR systems. Our very good results in both international evaluation campaigns, for English and for German, illustrate this.

Moreover, we are exploring new neural approaches to detect ASR errors. This task can be very useful in the framework of the EUMSSI project for several reasons:
(i) to filter misrecognized words, in order to reduce false alarms when looking for automatic transcriptions that contain requested words;
(ii) to help natural language processing applied to automatic transcriptions, such as named entity recognition;
(iii) to improve ASR performance by injecting confident automatic transcriptions into the training corpus of the acoustic models, since a larger amount of training data improves the quality of acoustic models.

Preliminary results outperform the state of the art based on CRF approaches. Next year, we will continue this study and integrate this information into the data provided in the EUMSSI demonstrators. Finally, we now have to integrate the technology developed during the first two years into the EUMSSI workflow. We also have to process all the videos provided by our partners, especially Deutsche Welle.

References

[1] F. Béchet and B. Favre. ASR error segment localisation for spoken recovery strategy. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.

[2] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res., 12:2493–2537, 2011.

[3] P. Deléglise, Y. Estève, S. Meignier, and T. Merlin. Improvements to the LIUM French ASR system based on CMU Sphinx: what helps to significantly reduce the word error rate? In Interspeech, Brighton, United Kingdom, 2009.

[4] Y. Estève, M. Bouallegue, C. Lailler, M. Morchid, R. Dufour, G. Linarès, D. Matrouf, and R. De Mori. Integration of word and semantic features for theme identification in telephone conversations. In 6th International Workshop on Spoken Dialog Systems (IWSDS 2015), 2015.

[5] S. Ghannay, Y. Estève, and N. Camelin. Word embeddings combination and neural networks for robustness in ASR error detection. In European Signal Processing Conference (EUSIPCO 2015), Nice, France, 2015.

[6] S. Ghannay, Y. Estève, N. Camelin, C. Dutrey, F. Santiago, and M. Adda-Decker. Combining continuous word representation and prosodic features for ASR error prediction. In Statistical Language and Speech Processing. Springer, 2015.

[7] S. Ghannay, Y. Estève, N. Camelin, C. Dutrey, F. Santiago, and M. Adda-Decker. Combining Continuous Word Representation and Prosodic Features for ASR Error Prediction. In 3rd International Conference on Statistical Language and Speech Processing (SLSP 2015), 2015.

[8] G. Gravier, G. Adda, N. Paulsson, M. Carré, A. Giraudel, and O. Galibert. The ETAPE corpus for the evaluation of speech-based TV content processing in the French language. In Eighth International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012.

[9] L. Mangu, E. Brill, and A. Stolcke. Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Computer Speech & Language, 14(4):373–400, 2000.

[10] S. Meignier and T. Merlin. LIUM SpkDiarization: an open source toolkit for diarization. In CMU SPUD Workshop, Dallas, Texas, USA, 2010.

[11] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR, 2013.

[12] R. C. Moore and W. Lewis. Intelligent selection of language model training data. In Proceedings of the ACL Conference Short Papers, pages 220–224, July 2010.

[13] C. Parada, M. Dredze, D. Filimonov, and F. Jelinek. Contextual information improves OOV detection in speech. In North American Chapter of the Association for Computational Linguistics (NAACL), 2010.

[14] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.

[15] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, December 2011.

[16] A. Rousseau. XenC: An open-source tool for data selection in natural language processing. The Prague Bulletin of Mathematical Linguistics, 2013.

[17] H. Schwenk. CSLM - a modular open-source continuous space language modeling toolkit. In Interspeech, August 2013.

[18] J. Turian, L. Ratinov, and Y. Bengio. Word representations: A simple and general method for semi-supervised learning. In ACL, pages 384–394, 2010.

[19] K. Veselý, A. Ghoshal, L. Burget, and D. Povey. Sequence-discriminative training of deep neural networks. In Interspeech 2013, Lyon, France, 2013.

[20] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, 2008.
