Welocalize Throughputs and Post-Editing Productivity Webinar, Laura Casanellas

January 15th, 2015

Throughputs: What is Behind Productive Post-Editing?

What circumstances or variables most reliably facilitate good-quality, highly productive post-editing?

Do conditions and parameters outside the post-editor’s control facilitate or hamper his or her success?

Welocalize Language Tools Team

Implementation and management of

Machine Translation programs

Analysis and research

The Database: Data gathered from 2013 to date

Objective:

Establish correlations between 3 evaluation approaches to:

- draw conclusions on predicting productivity gains in advance

- see how & when to use the different metrics best

Contents:

- Content Type

- Language Pair (English into XX)

- MT engine provider & owner (i.e. who owns training & maintenance)

- Metrics (BLEU & PE Distance, Adequacy & Fluency, Productivity deltas)

- MT error analysis

- Final QA scores

- Level of experience of resource doing productivity test
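The metrics list names PE Distance without defining it. The following is a minimal sketch, assuming PE distance means the word-level Levenshtein edit distance between the raw MT output and its post-edited version, normalized by the longer of the two segments and expressed as a percentage. The function names and the example sentences are hypothetical, and the calculation actually used in the database may differ.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def pe_distance(raw_mt, post_edited):
    """Assumed definition: word-level edit distance as a % of the longer segment."""
    mt_tokens, pe_tokens = raw_mt.split(), post_edited.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 0.0
    return levenshtein(mt_tokens, pe_tokens) / longest * 100.0


# Hypothetical segment pair: one substitution plus one insertion.
raw = "click the button for save file"
edited = "click the button to save the file"
print(round(pe_distance(raw, edited), 2))
```

Under this assumed definition, a lower percentage means the post-editor changed less of the raw MT output.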

The throughputs and productivity study is carried out as part of a wider study that aims to gain understanding of, and insight into, Machine Translation data, with the goal of making educated business decisions for the future.

37 locales in total, with varying amounts of available data

11 different MT systems (SMT / Hybrid)

[Chart legend, content types: Marketing, Patents, Support, Tech. Doc., UA, UI, other]

The Database: Data used

Throughputs: The setup

The throughput data used in this presentation is a by-product of Welocalize’s productivity tests

Throughputs per hour

Translation from scratch: No translation memory was leveraged for the translation part of the test

185 samples

13 different accounts

6 generic categories

11 different machine translation engines (statistical and hybrid)

All of the engines have been customized

Linguists: At least three years of experience on the specific content type + previous exposure to post-editing

Translation versus Post-editing: The data

Note: All resources that have taken part in productivity tests are represented in these two graphics.

These graphs include all languages, content types and MT engines used during the tests.

Translation versus Post-editing

When we join the data from the previous graphics together we note that not all the resources improve equally (or at all) when changing activities from translation to post-editing.

Comparison between Translation and Post-editing Throughputs

The difference

Productivity Tests: The iOmegaT environment

Post-Editing versus Human Translation

Tests performed to validate predictive findings

Tool: iOmegaT, an instrumented version of the open-source CAT tool OmegaT, developed in collaboration with John Moran (CNGL)

iOmegaT tracks time spent editing segments, editing behaviour & activity

Closely mimics translators’ usual work environment: integrated glossary, concordance, etc., and compatible with 3rd party tools for language quality checks.

Translators can visit a segment several times if they change their mind later during translation, need to implement global changes, etc.

Test sets consist of a mix of MTed segments to post-edit and no matches that need to be translated from scratch

Usual scope is 8h of translation / post-editing

Provides productivity delta between post-edited and translated words

Note: high throughputs need to be interpreted within the context of this test environment
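To illustrate how per-segment timings turn into hourly throughputs, here is a minimal sketch. It assumes a simplified log with one record per segment holding the activity, the word count and the total editing time summed over all visits. The `SegmentLog` structure, its field names and the sample values are hypothetical and do not reflect the real iOmegaT log format.

```python
from dataclasses import dataclass


@dataclass
class SegmentLog:
    mode: str       # "HT" (translated from scratch) or "PE" (post-edited MT)
    words: int      # source word count of the segment
    seconds: float  # total editing time, summed over every visit to the segment


def throughput_per_hour(logs, mode):
    """Words per hour for one activity, aggregated over all its segments."""
    words = sum(s.words for s in logs if s.mode == mode)
    seconds = sum(s.seconds for s in logs if s.mode == mode)
    if seconds == 0:
        raise ValueError(f"no editing time recorded for mode {mode!r}")
    return words * 3600.0 / seconds


# Hypothetical test set mixing both activities.
logs = [
    SegmentLog("HT", 20, 180.0),
    SegmentLog("HT", 10, 90.0),
    SegmentLog("PE", 24, 120.0),
    SegmentLog("PE", 16, 120.0),
]
print(throughput_per_hour(logs, "HT"))  # 400.0
print(throughput_per_hour(logs, "PE"))  # 600.0
```

Aggregating words and time before dividing weights long segments more heavily than averaging per-segment speeds would; either choice is a design decision, not something the slides specify.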

Evaluation Data: A sample

Column groups: Productivity Results, Human Evaluation, LQA, Automatic Scores

| MT Engine | Locale | Productivity Delta (%) | Adequacy Score | Fluency Score | LQA | BLEU | NIST | TER | Meteor | Precision | Recall | GTM | PE Distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MS Hub | pt-BR | 73.8% | 3.65 | 3.42 | 99.04% | 65.74 | 9.30 | 21.14 | 73.95 | 81.04 | 80.19 | 69.07 | 26.00% |
| MS Hub | de-DE | 22.9% | 3.88 | 3.48 | 99.75% | 40.76 | 6.69 | 46.30 | 55.45 | 70.03 | 68.13 | 48.96 | 34.23% |

Data from a sample evaluation – example of evaluation criteria

The productivity delta represents the percentage increase from the average HT throughput when post-editing

Good correlation between productivity results and automatic scores

In spite of the 20 point BLEU/METEOR/GTM difference in the engines, there are productivity gains in both

The results reflect the differences between language groups well
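The productivity delta defined above (the percentage increase over the average HT throughput when post-editing) can be sketched as follows. The function names and the throughput figures are hypothetical; only the 73.8% delta comes from the sample table, and the slides do not report the underlying words-per-hour values.

```python
def productivity_delta(ht_words_per_hour, pe_words_per_hour):
    """Percentage increase of post-editing throughput over HT throughput."""
    if ht_words_per_hour <= 0:
        raise ValueError("HT throughput must be positive")
    return (pe_words_per_hour - ht_words_per_hour) / ht_words_per_hour * 100.0


def pe_throughput_from_delta(ht_words_per_hour, delta_percent):
    """Inverse view: post-editing throughput implied by a given delta."""
    return ht_words_per_hour * (1.0 + delta_percent / 100.0)


# Hypothetical throughputs: 400 words/hr translating, 500 words/hr post-editing.
print(productivity_delta(400, 500))  # 25.0

# The pt-BR delta from the sample (73.8%) applied to a hypothetical
# HT baseline of 400 words/hr.
print(round(pe_throughput_from_delta(400, 73.8), 1))  # 695.2
```

A negative result would indicate that the resource was slower when post-editing than when translating from scratch.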

Throughputs: The trend

Trend 1: higher translation throughputs generally correlate with lower productivity delta, as corresponding post-editing throughputs might not be significantly higher

Previous post-editing studies have also highlighted this phenomenon (Guerberof, Plitt & Masselot)

Average productivity delta 23.14%
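One way to check a trend like Trend 1 is to compute the Pearson correlation between each resource's translation throughput and their productivity delta. This is a minimal sketch under stated assumptions: the five data points are invented for illustration and are not taken from the Welocalize database, and the slides do not say which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical resources: HT throughput (words/hr) and productivity delta (%).
ht_throughput = [300, 400, 500, 600, 700]
delta_percent = [50, 30, 20, 10, 5]

r = pearson(ht_throughput, delta_percent)
print(round(r, 2))  # strongly negative for this invented sample
```

A coefficient close to -1 would match the trend described above: the faster a resource already translates, the smaller the relative gain from post-editing.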

Who benefits from Post-editing? Analysis by Language and Content type

Languages selected for this analysis:

Brazilian Portuguese

French

German

Italian

Japanese

Latin-American Spanish

Polish

Russian

Simplified Chinese

Spanish

Content Types: Marketing, Patents, Support, Technical Documentation, UI

Language complexity grouping for MT PE

MT PE Reference table

Who benefits from Post-editing? Romance Languages

[Chart: productivity delta for Romance languages. Locales shown: ES_LA, IT, ES, FR, PT_BR. Values shown: 38%, 32%, 29%, 26%, 23%.]

Romance languages are the group that usually renders the highest productivity gains.

Within Romance languages, Latin American Spanish and Brazilian Portuguese are often the ones with the highest productivity gains from the point of view of PE.

Who benefits from Post-editing? German and Slavic Languages

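German and Slavic languages are considered medium-complexity languages

Availability of training resources and post-editors makes these languages a good fit for MT PE

[Chart: productivity delta for German and Slavic languages. Axis: 14% to 17%. Labels shown: RU, PL. Values shown: 17%, 15%.]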

Who benefits from Post-editing? Asian Languages

Asian languages are considered complex from the point of view of MT.

Productivity gains depend on the translator’s method of working and their expertise in PE.

Simplified Chinese can render high productivity gains, as shown in the graph.

[Chart: productivity delta for Asian languages. Axis: 0% to 15%. Label and value shown: JP, 14%.]

Average Productivity delta - ZH CN 6%

Content types: Marketing


Marketing remains a challenging content type for post-editing due to high quality expectations and free style. However, productivity gains can still be realised with well-trained MT systems and content that is not transcreation.

Content types: Technical Documentation

Technical Documentation is a good content type for MT PE.

Characteristics: constrained, often structured language; human-quality translation expectations but without added style and voice requirements.

Content types: Support

Support: Knowledge-base content, technical blogs, procedural articles, Q&A, etc.

More relaxed quality expectations make this type of content very suitable for Machine Translation.

In some instances this content is suitable for raw MT publishing when a customized engine is used.

Content types: Other content types

[Chart: productivity delta for Patents and UI. Axis: 14% to 18%. Values shown: 15%, 18%.]

User Generated Content

• Highly productive due to low number of touch points during post-editing

• Examples: travel and consumer reviews, blogs

• Quality expectations are very relaxed

• Only accuracy with respect to the original meaning is required

• No terminology checks or cosmetic changes are necessary

• Very high expected throughputs: from 500 to 1,000 words per hour

• Also suitable for raw MT publishing when a customized engine is used
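The expected throughput range for User Generated Content translates directly into a time estimate for a job. A minimal sketch, assuming the 500 to 1,000 figure is in words per hour; the function name and the 10,000-word job are hypothetical.

```python
def ugc_hours_range(word_count, low_wph=500, high_wph=1000):
    """Return (fastest, slowest) post-editing hours for a job of word_count words."""
    if low_wph <= 0 or high_wph < low_wph:
        raise ValueError("throughputs must satisfy 0 < low_wph <= high_wph")
    return word_count / high_wph, word_count / low_wph


# Hypothetical 10,000-word batch of consumer reviews.
fastest, slowest = ugc_hours_range(10_000)
print(fastest, slowest)  # 10.0 20.0
```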

Quality Misconceptions

The idea that high throughputs lower the quality of post-edited MT is inaccurate.

Sometimes linguistic issues appear more frequently in translated segments and in fuzzy matches than in post-edited segments.

Examples of good quality and high throughputs

| Language | MT (words/hr) | LQA Percentage |
|---|---|---|
| ja_JP | 441 | 99.89% |
| es_LA | 492 | 99.60% |
| pl_PL | 644 | 99.91% |
| sk_SK | 769 | 99.50% |
| hu_HU | 847 | 99.73% |

Post-editing: Other factors

Years of experience

In a recent survey…

Most respondents have more experience with translation than with post-editing

The overall correlation between translation experience and post-editing experience is “strong”

However, looking at correlations by locale:

German: very strong

French: weak

Japanese: weak

PT-BR: strong

Hungarian: weak

This suggests that for German and Brazilian Portuguese only, the overall experience as a professional translator (whether junior or senior) gives us insights into how much post-editing experience to expect. For the other 3 locales, profiles are more varied.

Post-editing: Other factors

- Experience working on certain content type: most linguists used for productivity tests are very experienced translating / post-editing the tested content type

- No clear trend with regard to background, assuming translation background like freelance/staff translator, content type experience, etc.

- No clear trend in relation to working environment (office / at home, etc.)

Text input methods:

French and German translators seem to make more use of CAT tool shortcuts

Japanese requires the use of Input Method Editors and less use of shortcuts

Final conclusions

• Based on our findings, Romance languages are the best performers on MT PE

• All content types are suitable for MT PE, with the exception of Transcreation; Technical Documentation and Technical Support are two of the most suitable (apart from UGC).

• Not all translators improve at the same pace when moving to post-editing

• Productivity increases most in individuals with average translation throughputs

• Knowledge of the subject matter helps achieve high throughputs

• It is more difficult to foresee post-editing effort than to assess the quality of raw MT. The human effort is still the most variable aspect.

• There is no quality degradation in MT PE

Questions and answers

Any questions?

Laura Casanellas, WL Language Tools

[email protected]

