![Page 1: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/1.jpg)
The IWSLT 2015 Evaluation Campaign
Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany
Sebastian Stüker, KIT, Germany Luisa Bentivogli, FBK, Italy Roldano Cattoni, FBK, Italy
Marcello Federico, FBK-irst, Italy
IWSLT, Da Nang, 3-4 December 2015
![Page 2: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/2.jpg)
Outline
➢ IWSLT review
➢ TED Talks
➢ Tracks
➢ Automatic evaluation
➢ Human evaluation
➢ Future plans
![Page 3: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/3.jpg)
IWSLT Evaluation: record of participants
Participants per year, 2004 to 2015: 13, 18, 15, 23, 18, 17, 19, 12, 15, 18, 21, 16
![Page 4: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/4.jpg)
IWSLT Evaluation: record of participants
Total participations of 2015 participants (one bar per participant): 12, 10, 10, 7, 6, 6, 4, 3, 2, 1, 1, 1, 1, 1, 1, 1
Almost 70 distinct participants in 12 years
![Page 5: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/5.jpg)
TED Talks
● TED LLC is non-profit
● Two annual events
● Short talks
● Variety of topics
● Website with:
   ● Videos
   ● Transcripts
   ● Translations
● CC License
![Page 6: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/6.jpg)
TED Talks Translations
| | Nov '10 | Nov '11 | Nov '12 | Nov '13 | Nov '14 | Nov '15 |
|---|---|---|---|---|---|---|
| Talks (EN) | 800 | 1,080 | 1,395 | ~1,650 | 1,875 | 2,095 |
| Languages | 80 | 83 | 93 | 103 | 105 | 109 |
| Translators | 4,000 | 6,823 | 8,382 | 11,010 | 18,699 | 15,487 |
| Translations | 12,500 | 24,287 (+94%) | 32,707 (+34%) | 49,607 (+52%) | 65,290 (+32%) | 83,265 (+28%) |
![Page 7: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/7.jpg)
Talks available at TED site (Nov 2015)
[Bar chart: number of talks per language, axis 0 to 2,500. Languages shown: Arabic, Chinese (traditional), Dutch, English, Farsi/Persian, French (France), German, Hebrew, Italian, Polish, Portuguese (Brazilian), Romanian, Russian, Slovenian, Spanish, Turkish]
![Page 8: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/8.jpg)
Human task: subtitling and translating
✔ segment audio
✔ transcribe and annotate
✔ split into captions
✔ translate captions
![Page 9: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/9.jpg)
Challenges in TED Task
➢ Language modelling
   ➢ Limited in-domain training data
   ➢ Variability of topics and styles
➢ Acoustic modelling
   ➢ Speaker: accent, fluency, speaking rate, style, ...
   ➢ Noise: mumble, applause, laughter, music, ...
➢ Translation modelling
   ➢ Distant and under-resourced languages
   ➢ Morphologically rich languages
➢ Speech Translation
   ➢ From spontaneous speech to polished text
   ➢ Detection and removal of non-speech events
   ➢ Subtitling and translating in real-time
![Page 10: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/10.jpg)
Challenges for 2011
➢ Language modelling
   ➢ Limited in-domain training data
   ➢ Variability of topics and styles
➢ Acoustic modelling
   ➢ Speaker: accent, fluency, speaking rate, style, ...
   ➢ Noise: mumble, applause, laughter, music, ...
➢ Translation modelling
   ➢ Distant and under-resourced languages
   ➢ Morphologically rich languages
➢ Speech Translation
   ➢ From spontaneous speech to polished text
   ➢ Detection and removal of non-speech events
   ➢ Subtitling and translating a data stream in real-time
![Page 11: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/11.jpg)
Challenges for 2012
![Page 12: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/12.jpg)
Challenges for 2013-2014
![Page 13: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/13.jpg)
Challenges for 2014-2015
![Page 14: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/14.jpg)
2015 Tracks
➢ Automatic Speech Recognition (ASR)
   ➢ Transcription of talks from audio to text
   ➢ English (TED), German (TEDx)
➢ Spoken Language Translation (SLT)
   ➢ Translation of talks from audio (or ASR output) to text
   ➢ German paired with English (TEDx)
   ➢ English paired with Chinese, Czech, French, German, Thai, Vietnamese (TED)
➢ Machine Translation (MT)
   ➢ Translation of talks from text to text
   ➢ German paired with English (TEDx)
   ➢ English paired with Chinese, Czech, French, German, Thai, Vietnamese (TED)
![Page 15: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/15.jpg)
Specifications
| Conditions | ASR | SLT | MT |
|---|---|---|---|
| Input: Pre-segmented | no | no | yes |
| Input: Cased & Punctuated | | no | yes |
| Output: Cased & Punctuated | no | yes | yes |
| Automatic evaluation | yes | yes | yes (1) |
| Human eval (En-Fr/De) | | | yes |
Metrics ASR SLT MT
WER ✔ ✔ ✔
BLEU ✔ ✔
TER ✔ ✔
(1) Non trivial reference baselines prepared for all directions.
NEW
![Page 16: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/16.jpg)
Participants
![Page 17: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/17.jpg)
Results: ASR English (WER%)
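| System | IWSLT15 tst2015 | IWSLT15 tst2014 | IWSLT14 tst2014 | IWSLT14 tst2013 | IWSLT13 tst2013 |
|---|---|---|---|---|---|
| MITLL-AFR | 6.6 | 7.1 | 9.9 | 13.7 | 15.9 |
| HLT-I2R | 7.7 | 8.9 | - | - | - |
| KIT | 9.2 | 9.7 | 11.4 | 14.2 | 14.4 |
| NAIST | 12.0 | 10.4 | - | - | - |
| MLLP | 13.3 | 19.5 | - | - | - |
| IOIT | 13.8 | 13.9 | 19.7 | 24.0 | 27.2 |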
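The WER% figures above are word error rates. As a minimal sketch of how such a score is commonly computed (word-level edit distance divided by the number of reference words), the following uses two made-up sentence pairs; the function names and the example sentences are illustrative assumptions, not the campaign's scoring tool or data, and any text normalization applied by the official scoring is not modelled.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word missing from hypothesis
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER in percent: total edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split())
                for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return 100.0 * edits / words


# Hypothetical reference transcripts and recognizer outputs.
refs = ["the talk was about speech translation", "thank you very much"]
hyps = ["the talk was a bout speech translation", "thank you much"]
print(f"WER = {wer(refs, hyps):.1f}%")  # 3 edits over 10 reference words
```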
![Page 18: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/18.jpg)
Progress in ASR En (best systems WER%)
[Chart: best-system WER (%) by campaign year, 2011 to 2015, with one series per test set: tst2011, tst2012, tst2013, tst2014; vertical axis 0 to 16]
![Page 19: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/19.jpg)
Results: ASR German
TEDx
![Page 20: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/20.jpg)
Results: SLT
TEDx
![Page 21: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/21.jpg)
Results: SLT
![Page 22: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/22.jpg)
Results: MT
![Page 23: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/23.jpg)
Results: MT
![Page 24: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/24.jpg)
Results: MT
![Page 25: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/25.jpg)
Results: MT
![Page 26: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/26.jpg)
Results: MT
![Page 27: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/27.jpg)
Progress in MT (best systems BLEU%)
[Chart: best-system BLEU (%) by campaign year, 2011 to 2015, with one series per direction: English-French, English-German, German-English, Chinese-English; vertical axis 10 to 45]
![Page 28: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/28.jpg)
Human Evaluation
➢ Following IWSLT 2013/14: Post-Editing + HTER
   ➢ TED task as an interesting application scenario to test the utility of MT systems in a real subtitling task
➢ Additional reference translations
➢ Edits point to specific translation errors
➢ HTER correlates well with human judgments
➢ Evaluation of MT-EnDe and MT-ViEn tasks
➢ Performed on 2015 test set (tst2015)
![Page 29: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/29.jpg)
Evaluation Dataset
Human Evaluation (HE) Set:
➢ a subset of tst2015
➢ ~10,000 words
➢ ~ first half of the 12 TED talks composing tst2015
➢ EnDe: 600 segments
➢ ViEn: 500 segments
![Page 30: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/30.jpg)
Evaluation Setup
Lesson learned from IWSLT 2013/2014:
➢ most informative and reliable HTER:
   ➢ not by using the targeted reference only
   ➢ but by exploiting all post-edits

SRC: Tôi lớn lên trong điều kiện nuôi dạy bình thường.

Targeted Reference Only
REF: I had a normal kind of upbringing .
HYP: I grew up in [normal] the conditions raised normal .
TER: 87.50

All Post-Edited References
REF: I grew up in normal raising conditions .
HYP: I grew up in [normal] the conditions raised normal .
TER: 38.46

IWSLT 2015 official evaluation:
➢ HTER calculated on multiple references (post-edits)
➢ EnDe: 5 participants => 5 post-edits
➢ ViEn: 5 participants => 5 post-edits
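To illustrate why scoring against several post-edits lowers the error rate, here is a simplified sketch. It assumes the usual multi-reference convention (edits to the closest reference divided by the average reference length) and deliberately omits TER's block-shift operation, so it is not the scorer used in the campaign. It also assumes that the bracketed `[normal]` in the HYP line is only a shift marker and drops it, and it uses just the two references quoted on the slide. Under those assumptions the targeted-reference score matches the slide (87.50), while the multi-reference score is not expected to match the reported 38.46.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance; no block shifts, unlike real TER."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ter_no_shifts(hyp, refs):
    """Edits to the closest reference over the average reference length, in percent."""
    hyp_tok = hyp.split()
    ref_toks = [r.split() for r in refs]
    edits = min(edit_distance(r, hyp_tok) for r in ref_toks)
    avg_len = sum(len(r) for r in ref_toks) / len(ref_toks)
    return 100.0 * edits / avg_len


hyp = "I grew up in the conditions raised normal ."       # shift marker dropped
targeted = "I had a normal kind of upbringing ."
post_edit = "I grew up in normal raising conditions ."

single = ter_no_shifts(hyp, [targeted])             # targeted reference only
multi = ter_no_shifts(hyp, [targeted, post_edit])   # closest of both references
print(f"targeted only: {single:.2f}")   # 87.50, as on the slide
print(f"both references: {multi:.2f}")  # lower, but without shifts
```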
![Page 33: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/33.jpg)
Data Collection
➢ Bilingual Post-Editing
   ➢ professional translators were required to post-edit the MT output directly according to the source sentence
➢ Data preparation:
   ➢ 5 systems post-edited by 5 professional translators
   ➢ each translator must p-edit all the HE set sentences
   ➢ each translator must p-edit each sentence only once
   ➢ each MT system must be equally p-edited by all translators
   ➢ MT outputs dispatched to translators both randomly and satisfying the uniform assignment constraints
➢ MateCat post-editing interface
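The assignment constraints listed above can be met in several ways; the slides do not say which procedure was used. One possible sketch, under the assumption that the number of sentences is a multiple of the number of systems, rotates the systems cyclically per sentence and shuffles the rotation order. The names `assign`, `S1`..`S5`, and `PE1`..`PE5` are hypothetical.

```python
import random
from collections import Counter


def assign(num_sentences, systems, translators, seed=0):
    """Return plan[sentence][translator] = system whose output is post-edited."""
    n = len(systems)
    assert len(translators) == n and num_sentences % n == 0
    rng = random.Random(seed)
    # Use every cyclic offset equally often, in random order over sentences.
    offsets = [k % n for k in range(num_sentences)]
    rng.shuffle(offsets)
    return [
        {t: systems[(i + off) % n] for i, t in enumerate(translators)}
        for off in offsets
    ]


systems = ["S1", "S2", "S3", "S4", "S5"]          # hypothetical system labels
translators = ["PE1", "PE2", "PE3", "PE4", "PE5"]
plan = assign(600, systems, translators)           # 600 = EnDe HE set size

# How many sentences of each system each translator receives.
load = Counter((t, s) for row in plan for t, s in row.items())
print(sorted(set(load.values())))
```

Each sentence is seen once by every translator, every system output is post-edited once per sentence, and every translator receives the same number of sentences from each system. A cyclic rotation only uses a few of the possible permutations, so a real setup might randomize more aggressively.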
![Page 36: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/36.jpg)
Collected Data
➢ Collected Post-edits
   ➢ 5 new references for each sentence in the HE set
➢ Post-editors characteristics:
   ➢ PE effort (HTER): highly variable among post-editors
   ➢ MT outputs assigned to translators (Sys TER): very homogeneous

| Task | Measure | PE 1 | PE 2 | PE 3 | PE 4 | PE 5 |
|---|---|---|---|---|---|---|
| En-De | PE Effort | 22.49 | 42.68 | 29.21 | 27.66 | 22.19 |
| En-De | PE Effort st-dv | 16.44 | 26.51 | 22.18 | 15.50 | 17.62 |
| En-De | Sys TER | 56.43 | 55.59 | 56.00 | 55.77 | 56.38 |
| En-De | Sys TER st-dv | 20.77 | 20.82 | 20.49 | 21.17 | 20.85 |
| Vi-En | PE Effort | 37.14 | 40.38 | 44.76 | 46.39 | 38.57 |
| Vi-En | PE Effort st-dv | 21.25 | 20.46 | 23.57 | 25.71 | 26.64 |
| Vi-En | Sys TER | 61.38 | 60.34 | 61.66 | 61.69 | 60.14 |
| Vi-En | Sys TER st-dv | 20.96 | 20.94 | 21.74 | 21.59 | 20.43 |
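The contrast between variable PE effort and homogeneous Sys TER can be made concrete by comparing the spread of the two measures across the five post-editors. The sketch below assumes that each post-editor's four numbers read, in order, PE Effort, its st-dv, Sys TER, its st-dv; that reading of the flattened table is an assumption.

```python
from statistics import pstdev

# Per-post-editor means (PE 1 .. PE 5), read from the table under the
# column-order assumption stated above.
pe_effort = {
    "En-De": [22.49, 42.68, 29.21, 27.66, 22.19],
    "Vi-En": [37.14, 40.38, 44.76, 46.39, 38.57],
}
sys_ter = {
    "En-De": [56.43, 55.59, 56.00, 55.77, 56.38],
    "Vi-En": [61.38, 60.34, 61.66, 61.69, 60.14],
}


def spread(values):
    """Return (range, population standard deviation) across post-editors."""
    return max(values) - min(values), pstdev(values)


for task in pe_effort:
    eff_range, eff_sd = spread(pe_effort[task])
    ter_range, ter_sd = spread(sys_ter[task])
    print(f"{task}: PE effort range {eff_range:.2f} (sd {eff_sd:.2f}), "
          f"Sys TER range {ter_range:.2f} (sd {ter_sd:.2f})")
```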
![Page 40: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/40.jpg)
Evaluation Results - EnDe

| System Ranking | HTER HE Set, All PErefs | HTER HE Set, tgt PEref | TER HE Set, ref | TER Test Set, ref |
|---|---|---|---|---|
| SU | 16.16 | 21.09 | 51.15 | 51.13 |
| UEDIN | 21.84 | 27.99 | 56.39 | 56.05 |
| KIT | 22.67 | 28.98 | 55.82 | 55.52 |
| HDU | 23.42 | 29.93 | 57.32 | 56.94 |
| PJAIT | 28.18 | 35.68 | 59.51 | 59.03 |
| Rank corr. | | 1.00 | 0.90 | 0.90 |

➢ Statistical significance at p < 0.01 (approximate randomization)
➢ TER/HTER reduction
➢ Rank corr.: Spearman's rank coefficient
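The "Rank corr." row can be checked with the textbook Spearman formula for untied ranks. The sketch below assumes that each correlation compares a column's ranking with the ranking under HTER on all post-edited references; that choice of baseline is an assumption, though it is consistent with the values reported.

```python
def ranks(scores):
    """Rank 1 = lowest error rate; assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# EnDe scores in table order: SU, UEDIN, KIT, HDU, PJAIT.
hter_all = [16.16, 21.84, 22.67, 23.42, 28.18]
columns = {
    "HTER HE Set, tgt PEref": [21.09, 27.99, 28.98, 29.93, 35.68],
    "TER HE Set, ref": [51.15, 56.39, 55.82, 57.32, 59.51],
    "TER Test Set, ref": [51.13, 56.05, 55.52, 56.94, 59.03],
}
for name, col in columns.items():
    print(f"{name}: {spearman(hter_all, col):.2f}")
# prints 1.00, 0.90, 0.90
```

The two 0.90 values come from a single swap (UEDIN and KIT) between the HTER and TER rankings.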
![Page 44: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/44.jpg)
Evaluation Results - ViEn

| System Ranking | HTER HE Set, All PErefs | HTER HE Set, tgt PEref | TER HE Set, ref | TER Test Set, ref |
|---|---|---|---|---|
| JAIST | 32.24 | 37.25 | 60.10 | 62.35 |
| UMD | 32.71 | 37.99 | 58.92 | 59.19 |
| PJAIT | 34.27* | 40.50 | 59.48 | 62.20 |
| TUT | 38.50 | 43.42 | 62.49 | 62.69 |
| UNETI | 41.42 | 47.97 | 64.21 | 66.33 |
| Rank corr. | | 1.00 | 0.70 | 0.70 |

➢ Statistical significance at p < 0.01 (* = p < 0.05) (approximate randomization)
➢ TER/HTER reduction
➢ Rank corr.: Spearman's rank coefficient
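The slides name approximate randomization as the significance test but give no details. A generic paired version is sketched below: the outputs of two systems are swapped at random per segment, and the p-value is the share of shuffles whose score difference is at least as large as the observed one. The per-segment (edits, reference words) counts are synthetic, and all names are hypothetical.

```python
import random


def hter(stats):
    """Corpus-level score in percent from per-segment (edits, ref_words) pairs."""
    return 100.0 * sum(e for e, _ in stats) / sum(w for _, w in stats)


def approximate_randomization(sys_a, sys_b, trials=2000, seed=0):
    """Two-sided paired approximate randomization test; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(hter(sys_a) - hter(sys_b))
    at_least_as_large = 0
    for _ in range(trials):
        shuf_a, shuf_b = [], []
        for a, b in zip(sys_a, sys_b):
            if rng.random() < 0.5:   # swap the two systems on this segment
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if abs(hter(shuf_a) - hter(shuf_b)) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (trials + 1)   # add-one smoothing


# Synthetic data: 200 segments of 20 reference words; system B needs more edits.
data_rng = random.Random(1)
sys_a = [(data_rng.randint(2, 8), 20) for _ in range(200)]
sys_b = [(edits + data_rng.randint(0, 4), 20) for edits, _ in sys_a]

p_value = approximate_randomization(sys_a, sys_b)
print(f"HTER A = {hter(sys_a):.2f}, HTER B = {hter(sys_b):.2f}, p = {p_value:.4f}")
```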
![Page 48: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/48.jpg)
Future
➢ TED task by now very seasoned
➢ Extend to more realistic lectures
➢ Work on more challenging tasks: conversations
➢ Include more under-resourced languages on the input side
➢ Discussion on co-location with another MT/NLP conference
➢ Continue with HE based on post-editing
➢ Funding by H2020 CSA CRACKER
Detailed discussion with proposals for new tasks tomorrow
![Page 49: The IWSLT 2015 Evaluation Campaignworkshop2015.iwslt.org/downloads/IWSLT_Overview15.pdf · The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany](https://reader034.vdocument.in/reader034/viewer/2022050419/5f8ed907c4059928720f3c39/html5/thumbnails/49.jpg)
Credits
➢ Language resources
   ➢ TED LLC, USA (Talk data)
   ➢ Workshop on Machine Translation (Giga and news data)
   ➢ DFKI, Germany (United Nations data)
   ➢ PJAIT (Wikipedia parallel corpus)
   ➢ Cantab Research (LM and text corpus for TED)
   ➢ Many other external data providers
➢ Funding
   ➢ H2020 CSA CRACKER
   ➢ Internal funds of eval organizers
   ➢ …