TRANSCRIPT
MT @ Welocalize
Agenda: Section I – Approach
Section II – Analytics
Section III – Supply Chain
Section IV – MT and LQA
Section V – GlobalSight and Teaminology
Section VI – Style and Voice
Section I: Approach
1. Team
2. Welocalize Approach
3. Engagement Scenarios
4. Content objectives and levels of PE
Dedicated Team of Experts
The Welocalize Language Tools team is made up of:
o Engineers and Computational Linguists – find, test and
develop NLP, MT and global content solutions to apply to our
different programs
o Solutions Architects – discovery, MT program design
o Training Managers – education of the language force on CAT
tools, MT and PE practices; evaluation of MT output
o Program Management – working in close collaboration with
production and sales on content evaluation and program implementation
Welocalize Approach - MT
Staying ahead of the game
We use a range of partner and proprietary engines
We have expertise with rule-based MT, statistical MT and hybrid MT engines (e.g.
ProMT, Systran, MSHub, AOL, Safaba, Moses)
We assess MT quality through automatic scoring, human evaluations and
productivity tests and forecast MT program performance
We offer an integrated solution with other language tools (familiar working
environment for translators) and GlobalSight
We offer support & advice to clients on MT engine management and
customization
We support post-editors with trainings, documentation and ongoing guidance
We design feedback loops that ensure translator loyalty and engagement and
engine improvement
Suitable content depends on:
Target audience / user requirements: new or existing clients, marketing audience, end users
Text function and purpose
Quality expectations
Perishability and visibility: how long will the text be visible, where will the text be posted
Volume
Client Engagement Scenario
Stages: requirements gathering > solution architecture > engine training > feedback loop(s) > PE metrics > "go live"
(Diagram: client, MT system and LSP interact at each stage.)
1. Client formulates the program requirements
2. WL and client define the solution architecture
3. WL trains the engine
4. Several feedback loops with automated scores, human PE measurement, human quality assessment
5. WL calculates PE metrics through productivity tests
6. MT-PE projects go “live”
7. WL monitors the engine performance, calculates the correlation between automated
metrics, human feedback and post-editing speed, and forecasts program
performance trends
weImpact – Content Driving PE Quality Decisions
Typical MT Integration
Source content enters the TMS, where TM leverage is applied (exact and fuzzy matches).
New words (no-match segments) are sent to the MT system, which returns MT-populated segments.
The result is a set of pre-populated files ready for fuzzy repair and post-editing, followed by the usual quality checks, producing the localized content.
Communication between the TMS and the MT system happens via a connector.
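As a rough illustration of this routing logic, the Python sketch below sends high-fuzzy segments down the TM path and no-match segments to MT. The tm_lookup and mt_translate callables and the 75% threshold are hypothetical stand-ins, not Welocalize or GlobalSight APIs.

```python
# Illustration only: route segments to TM leverage or MT, as in the flow above.
# `tm_lookup` and `mt_translate` are hypothetical stand-ins for the TMS
# leverage step and the MT connector; they are not Welocalize/GlobalSight APIs.

FUZZY_THRESHOLD = 0.75  # segments leveraging below this go to the MT system


def prepopulate(segments, tm_lookup, mt_translate):
    """Return (source, candidate, origin) tuples ready for fuzzy repair / post-editing."""
    prepared = []
    for source in segments:
        match, score = tm_lookup(source)  # best TM candidate and its fuzzy score (0.0-1.0)
        if score >= FUZZY_THRESHOLD:
            prepared.append((source, match, f"TM {score:.0%}"))
        else:
            prepared.append((source, mt_translate(source), "MT"))
    return prepared
```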
Workflow – HT vs MT PE
Conventional Translation:
Human Translation > Review Stage (LSO / LQA / CRI / …)
Machine Translation + Post-Editing:
Engine Training > Machine Translation > Human Post-Editing > Review Stage (LSO / LQA / CRI / …)
*Post-editing is proven to be faster than Human Translation.
**The detailed review process depends on final quality requirements.
Section II: Analytics
1. Automatic scores
2. Human Evaluation
3. Productivity test
4. Language Tools Analytics Database
Automatic scores
Provided by the MT system (typically BLEU)
Provided by our weScore scoring tool (BLEU, Meteor, Recall, PE Distance and more)
BLEU:
• One of the first algorithms developed to evaluate MT quality
• Evaluation is against a human reference translation
• Good for high-level generic idea of quality; not to be used at segment-level
• High is good, 30 is the lower threshold
F-Measure:
• A combined measure of precision and recall
• Precision: how much of the MT output is found in the human reference (human translation or human post-edited output)
• Recall: how much of the human reference is captured in the MT output
TER (Translation Error Rate):
• An error metric for MT that measures the number of edits required to change a system output into one of the
human references
• Low is good
METEOR:
• Designed to improve the BLEU metric
• Takes into account precision, recall and others (stemming and synonymy)
• Good for segment level (as well as corpus level)
• High is good; 50 is the lower threshold
GTM (General Text Matcher):
• Segment-level metric to measure the similarity between texts (in this instance MT output and the associated
human reference)
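For illustration, the sketch below scores MT output against a human reference with the open-source sacrebleu package and approximates a PE distance with a character-level edit ratio. This is a stand-in for, not an implementation of, the weScore tool, and it assumes a sacrebleu version that exposes corpus_bleu and corpus_ter.

```python
# Illustration only: scoring with the open-source sacrebleu package, not weScore.
import difflib

import sacrebleu

mt_output = ["The contract is governed by the laws of England and Wales."]
references = ["This agreement is governed by the law of England and Wales."]

bleu = sacrebleu.corpus_bleu(mt_output, [references])  # high is good
ter = sacrebleu.corpus_ter(mt_output, [references])    # low is good
print(f"BLEU {bleu.score:.2f}  TER {ter.score:.2f}")


def pe_distance(raw_mt: str, post_edited: str) -> float:
    """Rough PE-distance proxy: share of characters changed during post-editing."""
    return 1.0 - difflib.SequenceMatcher(None, raw_mt, post_edited).ratio()
```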
Human evaluation
The objective of the Human Evaluation is to gain insights into the quality of a given MT engine on a given content type for a specific project and language
pair. It also helps to identify issues, in order to improve the MT output in future
engine trainings.
Adequacy 1-5
Fluency 1-5
This is a sample of the form we use for human evaluations
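A minimal sketch of how the 1–5 ratings collected on such a form could be averaged per engine; the row layout below is hypothetical, not the actual form export.

```python
# Sketch of aggregating 1-5 adequacy/fluency ratings; the row layout is hypothetical.
from statistics import mean

ratings = [
    {"segment": 1, "adequacy": 4, "fluency": 3},
    {"segment": 2, "adequacy": 5, "fluency": 4},
    {"segment": 3, "adequacy": 3, "fluency": 3},
]

adequacy_avg = mean(r["adequacy"] for r in ratings)
fluency_avg = mean(r["fluency"] for r in ratings)
print(f"Adequacy {adequacy_avg:.2f} / 5, Fluency {fluency_avg:.2f} / 5")
```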
Human evaluation – Interpreting results
Productivity tests are performed on demand to validate predictive findings
Tool: iOmegaT, an instrumented version of the open-source CAT tool OmegaT
iOmegaT tracks time spent in each segment plus keystrokes
Testers can go back to already post-edited / translated segments
Closely mimics translators' usual work environment: integrated glossary,
compatible with 3rd-party tools for quality checks
Test sets consist of a mix of MTed segments to post-edit and no matches
that need to be translated from scratch
Usual scope is 8h of translation / post-editing
Provides the productivity delta between post-edited and translated words
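A sketch of that delta calculation, assuming a hypothetical per-segment log with word counts, time in segment and segment origin; the real iOmegaT export format differs.

```python
# Sketch of the productivity-delta calculation from a per-segment log;
# the log structure is hypothetical (the real iOmegaT export differs).

def words_per_hour(rows):
    words = sum(r["word_count"] for r in rows)
    hours = sum(r["seconds_in_segment"] for r in rows) / 3600
    return words / hours if hours else 0.0


def productivity_delta(log_rows):
    """Relative gain of post-editing MT over translating no-match segments from scratch."""
    pe = words_per_hour([r for r in log_rows if r["origin"] == "MT"])
    ht = words_per_hour([r for r in log_rows if r["origin"] == "no match"])
    return (pe - ht) / ht  # e.g. 0.738 -> post-editing is 73.8% faster
```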
Productivity tests – Post-Editing versus Human Translation in iOmegaT
Engine Evaluation Summary
Raw data from a sample evaluation – all engine evaluation parameters are considered in designing the post-editing program
and forecasting performance
Sample results (Productivity Results, Human Evaluation, LQA, Automatic Scores):
MS Hub, pt-BR: Productivity Delta 73.8%, Adequacy 3.65, Fluency 3.42, LQA 99.04%, BLEU 65.74, NIST 9.30, TER 21.14, Meteor 73.95, Precision 81.04, Recall 80.19, GTM 69.07, PE Distance 26.00%
MS Hub, de-DE: Productivity Delta 22.9%, Adequacy 3.88, Fluency 3.48, LQA 99.75%, BLEU 40.76, NIST 6.69, TER 46.30, Meteor 55.45, Precision 70.03, Recall 68.13, GTM 48.96, PE Distance 34.23%
Objective:
establish correlations between our 3 evaluation approaches
draw conclusions on predicting productivity gains
identify shortcomings in evaluation approaches
Contents:
automatic scores (BLEU and PE Distance), Human Evaluation Averages,
Productivity deltas
data from various locales, MT systems, content types
Method:
Calculate correlations using Pearson Product-Moment Correlation
Coefficient (Pearson’s r) between the different evaluation methods
Visualization through scatterplots
Reference new content against trends/benchmarks from our
evaluation database
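For example, the correlation step can be run with the open-source scipy and matplotlib packages; the numbers below are placeholders, not Welocalize data.

```python
# Placeholder numbers, not Welocalize data.
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

adequacy_scores = [3.65, 3.88, 3.10, 4.20, 2.95]           # human evaluation averages
productivity_deltas = [0.738, 0.229, 0.180, 0.850, 0.120]  # from productivity tests

r, p_value = pearsonr(adequacy_scores, productivity_deltas)
print(f"Pearson's r = {r:.2f} (p = {p_value:.3f})")

# Visualization through a scatterplot
plt.scatter(adequacy_scores, productivity_deltas)
plt.xlabel("Adequacy")
plt.ylabel("Productivity delta")
plt.show()
```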
Language Tools Analytics Database
Data: statistics from internal database
Error Typology
Correlation results – Adequacy & Fluency versus Productivity Delta
Productivity correlates with Adequacy across all locales with a cumulative Pearson's r of 0.71, a very strong correlation
Productivity correlates with Fluency across all locales with a cumulative Pearson's r of 0.77, a very strong correlation
According to our data, Human Evaluations are stronger predictors of
post-editing productivity gains than Automatic metrics including PE
distance
Correlation results II – Trends in BLEU scores and productivity gains by language groups
Section III: Supply Chain
1. Who are our Post-editors
2. Training and readiness
3. Post-editing to the required
quality levels
1. PE guidelines
2. PE productivity
Supply Chain Readiness
Thorough training: All language providers that collaborate regularly with
Welocalize have received our proprietary Machine Translation and Post-editing
practices foundation course.
This course is delivered by WL Language Tools Training Managers.
The Training Managers are directly responsible for the education and
ongoing support of WL's Language Teams.
Customized post-editing instructions: These are created by the Training Managers, based on the specific account characteristics (quality expectations,
type of content, workflow, etc).
Shared with the Language Teams
Targeted calls: A few weeks into production, the Training Managers hold calls with individual translation teams to discuss post-editing approaches observed in tests and to address questions.
Feedback Loop: Between the Language Teams and the MT provider. Goal: ongoing improvement of the MT system.
Establish the best LQA process and frequency for the transition, to ensure the client's quality expectations are met.
Who are our Post-editors
Same talent: Our post-editors are our regular language providers.
Account knowledge preservation: This way we keep our know-how and our experience, so the account benefits from using the same resources.
MT often coexists with conventional translation: While some accounts are MT-exclusive, MT and conventional translation often coexist for different content within the same account, so resource consistency is maintained regardless of the production methodology.
Post-editing is just another type of service: We treat PE as another service in the localisation industry, and we do not engage any resources in our Supply Chain who would not be able to deliver the service.
Linguistic ability is the key: Post-editing is just another way to reach the same result; the main skill is linguistic, and PE is simply an alternative technique starting from a different input.
Post-Editing for Different Quality Levels
If the client requests full post-editing, this means publishable quality.
The post-editor is responsible for ensuring that the client's requirements with regard
to final quality expectations are met.
► Client glossary, TM, style guide etc. apply
Light Post-Editing / "understandable quality"
Requests for MT + light(er) post-editing are on the rise for specific types of
content.
Fast turnaround and affordable pricing are key to cope with the volumes and
use scenarios. The final quality can be lower to accommodate this.
Full and Light PE Guidelines
Full Post-Editing | Light Post-Editing
Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
Terminology is accurate & consistent | Key client terminology is accurate & consistent
Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
Style is consistent (headers, list items, …) | Style variations are acceptable
Punctuation is correct | Variations/errors in punctuation are acceptable
Style & tone are appropriate for content | Style & tone are not offensive
Specific requirements apply, e.g. 33 cm (13''), change EN quotation marks to FR/DE/… | Follow MT output, e.g. 13'' (33 cm), EN quotation marks
… | …
Quality Levels Samples – Sample Domain: Legal
WeImpact Low
Source | Raw MT (English) | Post-Edited | Necessary Changes
La Fiscalía General de Costa Rica ha acusado de supuesto
delito de peculado al expresidente costarricense Miguel
Ángel Rodríguez Echeverría ante un juzgado penal y ha
solicitado abrir juicio en su contra.
The Attorney General of Costa Rica has been accused of
alleged embezzlement Costa Rican president Miguel Angel
Rodriguez Echeverria before a criminal court and asked to
pass judgment against him.
The Attorney General of Costa Rica has accused of
alleged embezzlement Costa Rican president Miguel
Angel Rodriguez Echeverria before a criminal court and
asked to pass judgment against him.
accuracy
WeImpact Medium
Source | Raw MT (English) | Post-Edited | Necessary Changes
Für die vorliegende Vereinbarung und das zwischen uns
bestehende Rechtsverhältnis gilt das Recht von England und
Wales. Im Falle von Beschwerden, die nicht anderweitig
beigelegt werden können, haben englische Gerichte eine
nicht-ausschließliche Zuständigkeit. Das bedeutet, Sie können
in England klagen, können aber auch einen anderen
Gerichtsstand wählen. Ihre deutschen
Verbraucherschutzrechte sowie Ihr Recht, gerichtliche
Verfahren vor Luxemburger Gerichten einzuleiten, bleiben
von dieser Regelung unberührt.
The laws of England and Wales applies to this agreement and
the legal relationship between us. In the case of complaints
which cannot be resolved otherwise, English courts shall have
non-exclusive jurisdiction. This means that you can charge in
England, can choose but also a different jurisdiction. Its
German consumer protection law, as well as your right to
initiate judicial proceedings before the Luxembourg courts,
remain unaffected by this regulation.
The laws of England and Wales apply to this
agreement and the legal relationship between us. In
the case of complaints which cannot be resolved
otherwise, English courts shall have non-exclusive
jurisdiction. This means that you can make a complaint
in England, but you can also choose a different
jurisdiction. Your German consumer protection rights,
as well as your right to initiate judicial proceedings
before the Luxembourg courts, remain unaffected by
this regulation.
grammar/fluency, domain terminology and
style
WeImpact High
Source | Raw MT (English) | Post-Edited | Necessary Changes
ECIJA cuenta con un equipo especializado y amplia
trayectoria en prestar asesoramiento jurídico y fiscal, en
todos los aspectos relacionados con el retail, distribución
comercial y franquicias. Nuestro equipo de profesionales
evalúa y redacta los acuerdos de distribución y de
franquicias, y asesora en materia de cumplimiento normativo
en estos ámbitos.
ÉCIJA has a specialized team with extensive experience in
providing legal and tax advice in all aspects related to the
retail, commercial distribution and franchising. Our team of
professionals evaluates and drafting the agreements of
distribution and franchising, and consultant in the field of
compliance in these areas.
ECIJA relies on a specialized team with extensive
experience in providing legal and tax advice in all areas
related to retail, franchising and commercial
distribution. Our team of professionals assesses and
drafts franchising and distribution agreements, and
consults in all aspects of compliance related to these
areas.
accuracy, grammar/fluency, terminology,
style & voice
Post-editing quality levels are agreed on at the program launch time and outlined in the SLA
Typical MT PE Issues
Knowing the patterns of MT output is the key to a successful post-editing program
Even "good" MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
– issues around capitalization
– punctuation (source punctuation is copied)
– spacing
– omissions/additions of text (usually different in nature to those in fuzzy matches)
– unknown/new words may be translated literally or be left in English
– word order: can mirror the source
– compound formation
– word form agreement
→ being aware of typical issues helps good post-editing
General PE Guidelines
Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
Read the MT output & the source > decide quickly what can be used
Use as many “bits/sections“ of the MT output as possible:
move them around, correct word forms, change the part of speech, use them as
inspiration
Look up key terms in your reference material as usual, but also learn to trust the
output
Automate with customized QA checks (maybe even upfront?)
Adjust your expectations. Rethink your approach. Report recurring errors.
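A minimal sketch of what such customized, automatable QA checks could look like; the three checks shown are generic examples, not any client's actual checklist.

```python
# Generic examples of automatable QA checks, not any client's actual checklist.
import re


def qa_check(source: str, target: str) -> list[str]:
    """Return a list of potential issues in a post-edited segment."""
    issues = []
    if re.search(r"\s{2,}", target):
        issues.append("double spaces in target")
    if source.rstrip().endswith(".") != target.rstrip().endswith("."):
        issues.append("end punctuation differs from source")
    for number in re.findall(r"\d+(?:[.,]\d+)?", source):
        if number not in target:
            issues.append(f"number '{number}' from source missing in target")
    return issues
```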
PE => MT Feedback Loop
MT output is expected to contain errors.
These errors vary by language combination and content.
MT output is not “fixed”, but can be improved.
A “Live” feedback loop helps us increase translators’ loyalty and
feeling of “owning” the process
As the post-editor is exposed to the output directly and is familiar with the
correct translation, it is important to provide an MT feedback loop to the
clients’ MT team
Allows post-editors to report frequent issues in the MT output
Structured process for constructive feedback
Recurring issues
Post-editors will learn which issues can be fixed
Factors Determining PE Productivity
Just as with human translation, throughput can vary and depends on:
– language pair
– content type & complexity
– experience
– domain knowledge
– quality requirements
– use of automatic QA tools
– quality of TM and reference material
With MT, additional factors are:
– quality of the MT
– experience with post-editing
Compared to average daily throughputs for human translation, average
daily throughputs for full post-editing can be up to 3 x higher.
Section IV: MT & LQA
1. LQA Process
2. LQA for Different Quality Levels
3. Evaluation Models
4. MT PE and LQA Results
QA – Supply Chain
Selection
•Profiling
•Sourcing
•Screening
•Testing
Certification
•PE training
•Account onboarding
•Knowledge base
Retention
•Team audits
•Attrition management
For a Machine Translation Post-editing program:
• Resources with post-editing experience
• Customized training based on the characteristics of the program
weImpact QA – Levels of post-editing, examples
The following examples illustrate approaches to quality evaluation
QA evaluation models
TAUS
Proprietary
Simple
Different weightings / content type
QTLaunchPad
Public
Complex
Scalable
Welocalize adopts new, flexible quality evaluation models tailored to new processes
that include Machine Translation
New Approaches – TAUS DQF
Category – Subcategories:
Terminology – Non-compliance with company terminology; non-compliance with 3rd party terminology; inconsistency
Accuracy – Mistranslation; omission/addition; untranslated text
Style – Non-compliance with company style guides; literal translation; unidiomatic use of target language; tone; ambiguous translation
Language – Grammar/syntax; punctuation; spelling (errors, accents, capital letters)
Fluency – Evaluating the target
Adequacy – Evaluating source and target
Proprietary framework
Dynamic Quality Framework
Provides a commonly agreed
approach to select the most
appropriate translation
quality evaluation model(s)
and metrics depending on
specific quality requirements.
Emphasizes Machine
Translation
Quality – MT PE results
No fails
Quality results related to one of Welocalize's largest MT programs
(weekly checks performed by a third-party supplier) show consistent
quality gains over Human Translation and Fuzzy Match editing
Section V: GlobalSight and
Teaminology
1. GlobalSight Capabilities
2. Teaminology Community Terminology
Management Platform
3. Sentiment Analysis
GlobalSight – advantages
open-source
community support
free to download, install + try
industry-driven
standards-driven
Integrated with OmegaT
GlobalSight Workflow
Teaminology
Your Community
•Define the community that will add the greatest value
• internal employees
•user groups
•crowd
•consumers
•vendors
•suppliers
Vote
•Community votes on the proposed translation of a term
•or proposes a new translation for the term
Community Action
•Manager makes decision to use certain translations based on how the crowd has voted
•Removes subjectivity from the process
•In-country users provide early feedback.
Tracking Use
•Tracks the activity of the community
•Uses meritocracy to highlight the most active users, most accurate users and more.
Reporting
•Provides detailed reporting on the trends of the crowd in terms of how and when they vote
Community management platform: Teaminology allows a terminology and translation manager to load a list of terms into the system and send them to a community.
Teaminology – Dashboard
Section VI: Tone of Voice
1. Source Content Profiler
2. StyleScorer
3. Sentiment Analysis
Source Content Profiler
Source Content Profiler helps flag issues in the source content that can be potentially problematic for translation
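Purely as an illustration of the kind of source-side flags such a profiler can raise, the sketch below checks sentence length, placeholders and acronyms; these are assumed example rules, not the actual Source Content Profiler implementation.

```python
# Hypothetical examples of source-side flags; not the actual profiler rules.
import re

MAX_WORDS = 25  # assumed sentence-length threshold


def profile_source(sentence: str) -> list[str]:
    """Flag characteristics that can make a source sentence hard to machine-translate."""
    flags = []
    if len(sentence.split()) > MAX_WORDS:
        flags.append("long sentence - consider splitting before MT")
    if re.search(r"%\w|\{\d+\}", sentence):
        flags.append("placeholder/variable - verify MT handles it")
    if re.search(r"\b[A-Z]{2,}\b", sentence):
        flags.append("all-caps token - possible acronym, check terminology")
    return flags
```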
StyleScorer
Test Category | Training Category | Score
SUPPORT | TECH DOC | 3.16
TECH DOC | TECH DOC | 2.94
TECH DOC | LEGAL | ,02
• Identifies stylistic similarity of source document to other documents for the subject matter
• Identifies similarity of target document to other documents for the subject matter
• Example: Is this really a training document? To what degree is it similar to other training documents, or is it closer to support?
• Helps with choosing the best data for MT engines or as a part of the LQA effort
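As a rough illustration of a document-to-category similarity score, the sketch below uses TF-IDF cosine similarity with scikit-learn; it is a stand-in, not the StyleScorer algorithm, and its 0–1 scale differs from the scores in the table above.

```python
# Stand-in for a style/similarity score, not the StyleScorer algorithm itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training_docs = [
    "Click Next to continue the installation.",
    "Select the checkbox and press OK to apply the settings.",
]
test_doc = "Choose the option and click Finish to complete the setup."

vectorizer = TfidfVectorizer().fit(training_docs + [test_doc])
similarity = cosine_similarity(
    vectorizer.transform([test_doc]),
    vectorizer.transform(training_docs),
).mean()
print(f"Similarity of test document to training category: {similarity:.2f}")
```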
Sentiment Analysis – from Big Data to Targeted Sentiment
Unstructured information leads to inefficiency, overlooked data and fatigue
Semantic technologies help to interpret or target data for strategic business information and decisions
Capture the Opinion of Your Global Audience and Translate It into Marketing Metrics
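A minimal example of turning raw comments into a sentiment signal, using NLTK's VADER analyzer (English-only) as a stand-in for the semantic technologies mentioned above; the comments are invented samples.

```python
# Stand-in example using NLTK's VADER analyzer (English-only), not Welocalize tooling.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

comments = [
    "The hotel staff were wonderful and very helpful.",
    "Check-in took forever and the room was noisy.",
]

for comment in comments:
    compound = analyzer.polarity_scores(comment)["compound"]  # -1 negative .. +1 positive
    print(f"{compound:+.2f}  {comment}")
```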
Sentiment Analysis – Analytics
Reports include: frequent phrases and corresponding sentiment; products and places; frequent concepts by geo
THANK YOU
Q&A
Case Study: Dell
27 MT Engines in Nine Months
Welocalize launched a global machine translation (MT)
program for Dell as part of the localization
strategy for Dell.com.
Client Challenge:
– 75% of buyers prefer to buy in their local language
– Dell.com serves over 170 countries
– Evolve the localization strategy and introduce machine translation (MT)
– Reduce translation costs, maintain quality and increase velocity
Welocalize MT Solution:
– Overall translation process differentiates content types and streams and
provides variable quality levels
– After a successful rollout using Safaba Enterprise Machine Translation (EMT) Engine, migrated to Microsoft Translator Hub
– MT approach focused on enterprise optimization aligned to Dell’s exact needs
– Welocalize introduced a solid supply chain to handle post-editing of all of Dell's EMT output
Results:
– Dell's EMT output is faster
– Speed of translation increased without sacrificing quality
– MT reduced translation costs
“Our website serves over 170 countries so
the multilingual element of Dell.com is key. We wanted to
evolve our localization strategy
and introduce machine translation. My objective being
to reduce our translation costs
while maintaining quality and
increasing velocity.”
Wayne Bourland, Director
Case Study: NetApp
2 million to 40 million words in 5 years
Welocalize has closely supported NetApp in their quest
for a proper globalization model. Core to this strategy is
ensuring you have the right expertise, partnership, trust
and agreements in place to deliver an outstanding
program.
Client Challenge:
– Meet growing demand for translation content
– Create value in globalization strategy
– Maximize value of centralized vendor management
– Stay innovative to maintain growth
Welocalize MT Solution:
– Partner as a primary provider in services that go beyond words
– Support specialized workflows to meet various content requirements
– Advise with best practices to move beyond time, cost and quality
– Source the best talent to meet the exact needs of the client
Results:
– Accelerated growth in scale and volume over a 5-year period
– Sustainable foundation to manage today and the future
– Innovation and interoperability investments to support the GPSO
– Platform model to achieve business goals
“In 2009, I envisioned a Virtual Center with a ‘follow
the sun model’ to support any content type, tool, code
or system to be globalized for our customers around the globe. We called this Center
the GPSO, it allows NetApp
to penetrate international markets at faster speeds.
Our first vendor partner was Welocalize. Welocalize has
been there along the journey to scale and speed
up our processes as a trusted advisor.”
Anna Schlegel, Director of NetApp GPSO
Case Study: TripAdvisor
Operational Excellence in Localization
TripAdvisor branded sites make up the largest travel
community in the world with more than 260 million unique
monthly visitors and over 100 million reviews and opinions.
49% of TripAdvisor revenue is from international points-of-
sale.
Client Challenge:
– It is crucial for all TripAdvisor travel sites to be available, real-time, 24 hours a day
– New reviews are posted all the time and read by people all over the world
– TripAdvisor needed an innovative localization strategy to streamline the translation workflow
Welocalize Solution:
– Welocalize developed a localization solution for TripAdvisor, based on operational
excellence and the Localization Maturity Model (LMM)
– Solution removed waste and unnecessary workflows
– Introduced sophisticated levels of process, organization and translation automation
Results:
– Welocalize streamlined the translation workflow from 23 to 5 steps
– 70% time savings for program management, 1,300 engineering hours saved per year
– Translators' admin time reduced by 50%
– 21 new markets and 15 new languages within 3 years
– Increase in productivity and speed of translation
– 423% increase in words translated for 2011–2012
“We’ve made incredible progress at implementing a
solid localization strategy. By using
the CSA’s maturity model and
Welocalize’s approach of Operational
Excellence, we’re meeting and
exceeding TripAdvisor’s international objectives.”
Lorna Whelan, Senior Localization
Manager at TripAdvisor
Case Study: Intuit
weMT + Post-Editing = Success
Intuit views globalization as a primary business driver to
service their global ecosystem of employees, trade
partners, small businesses, customers and accountants.
Client Challenge:
– Getting the essence of the source content
– Translating (liberating) content that would not be translated by humans due to high cost
– Increase efficiency while reducing costs for content requiring human post-editing
– Addressing “urgent” + “on-demand” translation requirements
– Implement a solution ASAP
Welocalize MT Solution:
– Proposed approach for Intuit was to roll the MT solution out in tiers
– Analysis shows the best use cases based on engine maturity + cost benefit
– Stage top priority to the lower priority languages for maximum ROI
Results:
– MT implemented in only three months
– Productivity gains ranged from 5% to 100%
– Average savings = 30% in translation costs
– No compromise on quality for UI project (online software)
– Saved $263,000 on 500,000 words
Savings on 500,000 Words
$29,250 Danish
$28,000 Norwegian
$28,000 Swedish
$24,250 Dutch
$24,250 French (Canada)
$21,750 Finnish
$18,000 Japanese
$16,750 French (France)
$15,500 German
$13,000 Spanish
$11,750 Portuguese
$9,250 Polish
$9,250 Portuguese (Brazil)
$8,000 Turkish
$6,000 Russian