Welocalize Throughputs and Post-Editing Productivity Webinar, Laura Casanellas
TRANSCRIPT
What circumstances or variables most reliably facilitate good-quality, highly productive post-editing?
Do conditions and parameters outside the post-editor’s control facilitate or hamper his or her success?
Welocalize Language Tools Team:
- Implementation and management of Machine Translation programs
- Analysis and research
The Database: data gathered from 2013 to date
Objective:
Establish correlations between 3 evaluation approaches to:
- draw conclusions on predicting productivity gains in advance
- see how & when to use the different metrics best
Contents:
- Content Type
- Language Pair (English into XX)
- MT engine provider & owner (i.e. who owns training & maintenance)
- Metrics (BLEU & PE Distance, Adequacy & Fluency, Productivity deltas)
- MT error analysis
- Final QA scores
- Level of experience of the resource performing the productivity test
The throughputs and productivity study is part of a wider study that aims to gain insight into Machine Translation data in order to make informed business decisions for the future.
37 locales in total, with varying amounts of available data
11 different MT systems (SMT / Hybrid)
Content types: Marketing, Patents, Support, Tech. Doc., UA, UI, other
The Database: data used
Throughputs: the setup
The throughput data used in this presentation is a by-product of Welocalize’s productivity tests
Throughputs per hour
Translation from scratch: no translation memory was leveraged for the translation part of the test
185 samples
13 different accounts
6 generic categories
11 different machine translation engines (statistical and hybrid)
All of the engines have been customized
Linguists: at least three years of experience with the specific content type, plus previous exposure to post-editing
Translation versus Post-editing: the data
Note: all resources that have taken part in productivity tests are represented in these two graphs. These graphs include all languages, content types and MT engines used during the tests.
Translation versus Post-editing
When we join the data from the previous graphs together, we note that not all resources improve equally (or at all) when changing activities from translation to post-editing.
Comparison between Translation and Post-editing Throughputs: the difference
Productivity Tests: the iOmegaT environment
Post-Editing versus Human Translation
Tests performed to validate predictive findings
Tool: iOmegaT, an instrumented version of the open-source CAT tool OmegaT, developed in collaboration with John Moran (CNGL)
iOmegaT tracks the time spent editing each segment, as well as editing behaviour and activity
Closely mimics translators’ usual work environment (integrated glossary, concordance, etc.) and is compatible with third-party tools for language quality checks
Translators can revisit a segment several times if they change their mind later during translation or need to implement global changes
Test sets consist of a mix of MT segments to post-edit and no-match segments that need to be translated from scratch
The usual scope is 8 hours of translation / post-editing
Provides the productivity delta between post-edited and translated words
Note: high throughputs need to be interpreted within the context of this test environment
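iOmegaT logs the time spent in each segment, and a segment may be visited more than once. As a minimal sketch, assuming a hypothetical log record (Visit, with a segment id, a kind of "mt" or "new", a word count and the seconds spent per visit), per-segment times can be summed across visits and turned into words-per-hour throughputs for post-edited and from-scratch segments. The record layout and the throughputs helper are illustrative only; iOmegaT's actual log format is not described in this webinar.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Visit:
    """One editing visit to a segment (hypothetical record, not iOmegaT's real log format)."""
    segment_id: str
    kind: str       # "mt" = MT output to post-edit, "new" = no match, translated from scratch
    words: int      # source word count of the segment
    seconds: float  # time spent in the segment during this visit


def throughputs(visits):
    """Return words per hour for each segment kind, summing time over repeat visits."""
    seg_words = {}
    seg_kind = {}
    seg_time = defaultdict(float)
    for v in visits:
        seg_words[v.segment_id] = v.words
        seg_kind[v.segment_id] = v.kind
        seg_time[v.segment_id] += v.seconds  # revisits add to the same segment's time

    words = defaultdict(int)
    secs = defaultdict(float)
    for sid, w in seg_words.items():
        # each segment's words are counted once, however often it was visited
        words[seg_kind[sid]] += w
        secs[seg_kind[sid]] += seg_time[sid]

    return {k: words[k] * 3600 / secs[k] for k in words if secs[k] > 0}


visits = [
    Visit("s1", "mt", 20, 60),
    Visit("s2", "mt", 30, 90),
    Visit("s3", "new", 25, 300),
    Visit("s1", "mt", 20, 30),  # the translator returns to s1 later
]
print(throughputs(visits))  # {'mt': 1000.0, 'new': 300.0}
```

Counting words once per segment but time on every visit reflects the point above: going back to a segment costs time without adding output.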
Evaluation Data: a sample
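Column groups: Productivity Results (Productivity Delta); Human Evaluation (Adequacy, Fluency); LQA; Automatic Scores (BLEU, NIST, TER, METEOR, Precision, Recall, GTM, PE Distance)
MT Engine | Locale | Productivity Delta | Adequacy | Fluency | LQA | BLEU | NIST | TER | METEOR | Precision | Recall | GTM | PE Distance
MS Hub | pt-BR | 73.8% | 3.65 | 3.42 | 99.04% | 65.74 | 9.30 | 21.14 | 73.95 | 81.04 | 80.19 | 69.07 | 26.00%
MS Hub | de-DE | 22.9% | 3.88 | 3.48 | 99.75% | 40.76 | 6.69 | 46.30 | 55.45 | 70.03 | 68.13 | 48.96 | 34.23%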
Data from a sample evaluation – example of evaluation criteria
The productivity delta is the percentage increase in throughput over the average human translation (HT) throughput when post-editing
There is a good correlation between productivity results and automatic scores
In spite of the roughly 20-point BLEU/METEOR/GTM difference between the two engines, there are productivity gains in both
The results reflect the differences between language groups well
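The productivity delta definition above can be written as delta = (PE throughput - HT throughput) / HT throughput × 100. A minimal sketch in Python, with invented throughputs for illustration (not figures from the study); the function name is hypothetical:

```python
def productivity_delta(pe_words_per_hour: float, ht_words_per_hour: float) -> float:
    """Percentage increase of post-editing throughput over the average HT throughput."""
    if ht_words_per_hour <= 0:
        raise ValueError("HT throughput must be positive")
    return (pe_words_per_hour - ht_words_per_hour) * 100 / ht_words_per_hour


# Hypothetical throughputs for one translator (words/hour), not study data
print(productivity_delta(615, 500))  # 23.0
print(productivity_delta(450, 500))  # -10.0: post-editing was slower than translating
```

A negative delta is possible, which matches the observation that not every resource improves when moving from translation to post-editing.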
Throughputs: the trend
Trend 1: higher translation throughputs generally correlate with a lower productivity delta, as the corresponding post-editing throughputs may not be significantly higher
Previous post-editing studies have also highlighted this phenomenon (Guerberof; Plitt & Masselot)
Average productivity delta: 23.14%
Who benefits from Post-editing? Analysis by Language and Content type
Languages selected for this analysis:
- Brazilian Portuguese
- French
- German
- Italian
- Japanese
- Latin-American Spanish
- Polish
- Russian
- Simplified Chinese
- Spanish
Content types: Marketing, Patents, Support, Technical Documentation, UI
Who benefits from Post-editing? Romance Languages
Chart: productivity gains for Romance languages (ES_LA, IT, ES, FR, PT_BR)
Romance languages are the group that usually yields the highest productivity gains. Within Romance languages, Latin American Spanish and Brazilian Portuguese often show the highest productivity gains from post-editing.
Who benefits from Post-editing? German and Slavic Languages
German and the Slavic languages are considered medium-complexity languages. The availability of training resources and of post-editors makes these languages a good fit for MT PE.
Chart: productivity gains for German and Slavic languages (RU, PL)
Who benefits from Post-editing? Asian Languages
Asian languages are considered complex from the point of view of MT. Productivity gains depend on the translator’s method of working and their expertise in PE. Simplified Chinese can yield high productivity gains, as shown in the graph.
Chart: productivity gains for Asian languages (JP, ZH CN)
Average productivity delta, ZH CN: 6%
Content types: Marketing
Marketing remains a challenging content type for post-editing due to its high quality expectations and free style. However, productivity gains can still be realised with well-trained MT systems and content that is not transcreation.
Content types: Technical Documentation
Technical Documentation is a good content type for MT PE.
Characteristics: constrained, often structured language; human-quality translation expectations but without added style and voice requirements.
Content types: Support
Support: knowledge-base content, technical blogs, procedural articles, Q&A, etc.
More relaxed quality expectations make this type of content very suitable for Machine Translation.
In some instances this content is suitable for raw MT publishing when a customized engine is used.
Content types: other content types
Chart: productivity gains for Patents and UI
User Generated Content
• Highly productive due to the low number of touch points during post-editing
• Examples: travel and consumer reviews, blogs
• Quality expectations are very relaxed
• Only accuracy to the original meaning is required
• No terminology checks or cosmetic changes are necessary
• Very high expected throughputs: from 500 to 1,000 words per hour
• Also suitable for raw MT publishing when a customized engine is used
Quality Misconceptions
The idea that high post-editing throughputs lower the quality of the output is inaccurate.
Linguistic issues sometimes appear more frequently in translated segments and in fuzzy matches than in post-edited segments.
Examples of good quality and high throughputs:
Language | MT throughput (words/hr) | LQA percentage
ja_JP | 441 | 99.89%
es_LA | 492 | 99.60%
pl_PL | 644 | 99.91%
sk_SK | 769 | 99.50%
hu_HU | 847 | 99.73%
Post-editing: other factors
Years of experience: in a recent survey, most respondents had more experience with translation than with post-editing.
The overall correlation between translation experience and post-editing experience is “strong”.
However, looking at correlations by locale:
- German: very strong
- French: weak
- Japanese: weak
- PT-BR: strong
- Hungarian: weak
This suggests that, for German and Brazilian Portuguese only, overall experience as a professional translator (whether junior or senior) gives us insight into how much post-editing experience to expect. For the other three locales, profiles are more varied.
Post-editing: other factors
- Experience with a given content type: most linguists used for productivity tests are very experienced in translating / post-editing the tested content type
- No clear trend with regard to background (translation background such as freelance/staff translator, content type experience, etc.)
- No clear trend in relation to working environment (office, home, etc.)
Text input methods:
- French and German translators seem to make more use of CAT tool shortcuts
- Japanese requires the use of Input Method Editors (IMEs), with less use of shortcuts
Final conclusions
• Based on our findings, Romance languages are the best performers in MT PE
• All content types are suitable for MT PE, with the exception of transcreation; Technical Documentation and Technical Support are two of the most suitable (apart from UGC).
• Not all translators improve at the same pace when moving to post-editing
• Productivity increases most in individuals with average translation throughputs
• Knowledge of the subject matter helps in achieving high throughputs
• It is more difficult to foresee post-editing effort than to assess the quality of raw MT. The human effort is still the most variable aspect.
• There is no quality degradation in MT PE
Questions and answers
Any questions?
Laura Casanellas, WL Language Tools