TRANSCRIPT
MT @ Welocalize
Agenda: Section I – Approach
Section II – Analytics
Section III – Supply Chain
Section IV – MT and LQA
Section V – GlobalSight and Teaminology
Section VI – Style and Voice
Section I: Approach
1. Team
2. Welocalize Approach
3. Engagement Scenarios
4. Content objectives and levels of PE
Dedicated Team of Experts
The Welocalize Language Tools team is made up of:
o Engineers and Computational Linguists – find, test and
develop NLP, MT and global content solutions to apply to our
different programs
o Solutions Architects – discovery, MT program design
o Training Managers – education of the language force on CAT
tools, MT and PE practices; evaluation of MT output
o Program Management – working in close collaboration with
production and sales on content evaluation and program implementation
Welocalize Approach - MT
Staying ahead of the game
We use a range of partner and proprietary engines
We have expertise with rule-based MT, statistical MT and hybrid MT engines (e.g.
ProMT, Systran, MSHub, AOL, Safaba, Moses)
We assess MT quality through automatic scoring, human evaluations and
productivity tests and forecast MT program performance
We offer an integrated solution with other language tools (familiar working
environment for translators) and GlobalSight
We offer support & advice to clients on MT engine management and
customization
We support post-editors with trainings, documentation and ongoing guidance
We design feedback loops that ensure translator loyalty and engagement and
engine improvement
Suitable content depends on:
Target audience / user requirements: new or existing clients, marketing audience, end users
Text function and purpose
Quality expectations
Perishability and visibility: how long will the text be visible, where will the text be posted
Volume
Client Engagement Scenario
Stages: requirements gathering > solution architecture > engine training > feedback loop(s) > PE metrics > "go live"
(Diagram: client, MT system and LSP interact at each stage.)
1. Client formulates the program requirements
2. WL and client define the solution architecture
3. WL trains the engine
4. Several feedback loops with automated scores, human PE measurement, human quality assessment
5. WL calculates PE metrics through productivity tests
6. MT-PE projects go “live”
7. WL monitors the engine performance, calculates the correlation between automated
metrics, human feedback and post-editing speed, and forecasts program
performance trends
weImpact – Content Driving PE Quality Decisions
Typical MT Integration
Source content enters the TMS, where TM leverage is applied (exact and fuzzy matches).
New words (no-match segments) are sent to the MT system, which returns MT-populated segments.
The result is a set of pre-populated files ready for fuzzy repair and post-editing, followed by the usual quality checks, producing the localized content.
Communication between the TMS and the MT system happens via a connector.
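As a rough illustration of this routing logic, the Python sketch below sends high-fuzzy segments down the TM path and no-match segments to MT. The tm_lookup and mt_translate callables and the 75% threshold are hypothetical stand-ins, not Welocalize or GlobalSight APIs.

```python
# Illustration only: route segments to TM leverage or MT, as in the flow above.
# `tm_lookup` and `mt_translate` are hypothetical stand-ins for the TMS
# leverage step and the MT connector; they are not Welocalize/GlobalSight APIs.

FUZZY_THRESHOLD = 0.75  # segments leveraging below this go to the MT system


def prepopulate(segments, tm_lookup, mt_translate):
    """Return (source, candidate, origin) tuples ready for fuzzy repair / post-editing."""
    prepared = []
    for source in segments:
        match, score = tm_lookup(source)  # best TM candidate and its fuzzy score (0.0-1.0)
        if score >= FUZZY_THRESHOLD:
            prepared.append((source, match, f"TM {score:.0%}"))
        else:
            prepared.append((source, mt_translate(source), "MT"))
    return prepared
```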
Workflow – HT vs MT PE
Conventional Translation:
Human Translation > Review Stage (LSO / LQA / CRI / …)
Machine Translation + Post-Editing:
Engine Training > Machine Translation > Human Post-Editing > Review Stage (LSO / LQA / CRI / …)
*Post-editing is proven to be faster than Human Translation.
**The detailed review process depends on final quality requirements.
Section II: Analytics
1. Automatic scores
2. Human Evaluation
3. Productivity test
4. Language Tools Analytics Database
Automatic scores
Provided by the MT system (typically BLEU)
Provided by our weScore scoring tool (BLEU, Meteor, Recall, PE Distance and more)
BLEU:
• One of the first algorithms developed to evaluate MT quality
• Evaluation is against a human reference translation
• Good for high-level generic idea of quality; not to be used at segment-level
• High is good, 30 is the lower threshold
F-Measure:
• A combined measure of precision and recall
• Precision: how much of the MT output is found in the human reference (human translation or human post-edited output)
• Recall: how much of the human reference is captured in the MT output
TER (Translation Error Rate):
• An error metric for MT that measures the number of edits required to change a system output into one of the
human references
• Low is good
METEOR:
• Designed to improve the BLEU metric
• Takes into account precision, recall and others (stemming and synonymy)
• Good for segment level (as well as corpus level)
• High is good; 50 is the lower threshold
GTM (General Text Matcher):
• Segment-level metric to measure the similarity between texts (in this instance MT output and the associated
human reference)
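For illustration, the sketch below scores MT output against a human reference with the open-source sacrebleu package and approximates a PE distance with a character-level edit ratio. This is a stand-in for, not an implementation of, the weScore tool, and it assumes a sacrebleu version that exposes corpus_bleu and corpus_ter.

```python
# Illustration only: scoring with the open-source sacrebleu package, not weScore.
import difflib

import sacrebleu

mt_output = ["The contract is governed by the laws of England and Wales."]
references = ["This agreement is governed by the law of England and Wales."]

bleu = sacrebleu.corpus_bleu(mt_output, [references])  # high is good
ter = sacrebleu.corpus_ter(mt_output, [references])    # low is good
print(f"BLEU {bleu.score:.2f}  TER {ter.score:.2f}")


def pe_distance(raw_mt: str, post_edited: str) -> float:
    """Rough PE-distance proxy: share of characters changed during post-editing."""
    return 1.0 - difflib.SequenceMatcher(None, raw_mt, post_edited).ratio()
```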
Human evaluation
The objective of the Human Evaluation is to gain insights into the quality of a given MT engine on a given content type for a specific project and language
pair. It also helps to identify issues, in order to improve the MT output in future
engine trainings.
Adequacy 1-5
Fluency 1-5
This is a sample of the form we use for human evaluations
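A minimal sketch of how the 1–5 ratings collected on such a form could be averaged per engine; the row layout below is hypothetical, not the actual form export.

```python
# Sketch of aggregating 1-5 adequacy/fluency ratings; the row layout is hypothetical.
from statistics import mean

ratings = [
    {"segment": 1, "adequacy": 4, "fluency": 3},
    {"segment": 2, "adequacy": 5, "fluency": 4},
    {"segment": 3, "adequacy": 3, "fluency": 3},
]

adequacy_avg = mean(r["adequacy"] for r in ratings)
fluency_avg = mean(r["fluency"] for r in ratings)
print(f"Adequacy {adequacy_avg:.2f} / 5, Fluency {fluency_avg:.2f} / 5")
```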
Human evaluation – Interpreting results
Productivity tests are performed on demand to validate predictive findings
Tool: iOmegaT, an instrumented version of the open-source CAT tool OmegaT
iOmegaT tracks time spent in each segment plus keystrokes
Testers can go back to already post-edited / translated segments
Closely mimics translators' usual work environment: integrated glossary,
compatible with 3rd-party tools for quality checks
Test sets consist of a mix of MTed segments to post-edit and no matches
that need to be translated from scratch
Usual scope is 8h of translation / post-editing
Provides the productivity delta between post-edited and translated words
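A sketch of that delta calculation, assuming a hypothetical per-segment log with word counts, time in segment and segment origin; the real iOmegaT export format differs.

```python
# Sketch of the productivity-delta calculation from a per-segment log;
# the log structure is hypothetical (the real iOmegaT export differs).

def words_per_hour(rows):
    words = sum(r["word_count"] for r in rows)
    hours = sum(r["seconds_in_segment"] for r in rows) / 3600
    return words / hours if hours else 0.0


def productivity_delta(log_rows):
    """Relative gain of post-editing MT over translating no-match segments from scratch."""
    pe = words_per_hour([r for r in log_rows if r["origin"] == "MT"])
    ht = words_per_hour([r for r in log_rows if r["origin"] == "no match"])
    return (pe - ht) / ht  # e.g. 0.738 -> post-editing is 73.8% faster
```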
Productivity tests – Post-Editing versus Human Translation in iOmegaT
Engine Evaluation Summary
Raw data from a sample evaluation – all engine evaluation parameters are considered in designing the post-editing program
and forecasting performance
Sample results (Productivity Results, Human Evaluation, LQA, Automatic Scores):
MS Hub, pt-BR: Productivity Delta 73.8%, Adequacy 3.65, Fluency 3.42, LQA 99.04%, BLEU 65.74, NIST 9.30, TER 21.14, Meteor 73.95, Precision 81.04, Recall 80.19, GTM 69.07, PE Distance 26.00%
MS Hub, de-DE: Productivity Delta 22.9%, Adequacy 3.88, Fluency 3.48, LQA 99.75%, BLEU 40.76, NIST 6.69, TER 46.30, Meteor 55.45, Precision 70.03, Recall 68.13, GTM 48.96, PE Distance 34.23%
Objective:
establish correlations between our 3 evaluation approaches
draw conclusions on predicting productivity gains
identify shortcomings in evaluation approaches
Contents:
automatic scores (BLEU and PE Distance), Human Evaluation Averages,
Productivity deltas
data from various locales, MT systems, content types
Method:
Calculate correlations using Pearson Product-Moment Correlation
Coefficient (Pearson’s r) between the different evaluation methods
Visualization through scatterplots
Reference new content against trends/benchmarks from our
evaluation database
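For example, the correlation step can be run with the open-source scipy and matplotlib packages; the numbers below are placeholders, not Welocalize data.

```python
# Placeholder numbers, not Welocalize data.
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

adequacy_scores = [3.65, 3.88, 3.10, 4.20, 2.95]           # human evaluation averages
productivity_deltas = [0.738, 0.229, 0.180, 0.850, 0.120]  # from productivity tests

r, p_value = pearsonr(adequacy_scores, productivity_deltas)
print(f"Pearson's r = {r:.2f} (p = {p_value:.3f})")

# Visualization through a scatterplot
plt.scatter(adequacy_scores, productivity_deltas)
plt.xlabel("Adequacy")
plt.ylabel("Productivity delta")
plt.show()
```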
Language Tools Analytics Database
Data: statistics from internal database
Error Typology
Correlation results – Adequacy & Fluency versus Productivity Delta
Productivity correlates with Adequacy across all locales with a cumulative Pearson's r of 0.71, a very strong correlation
Productivity correlates with Fluency across all locales with a cumulative Pearson's r of 0.77, a very strong correlation
According to our data, Human Evaluations are stronger predictors of
post-editing productivity gains than Automatic metrics including PE
distance
Correlation results II – Trends in BLEU scores and productivity gains by language groups
Section III: Supply Chain
1. Who are our Post-editors
2. Training and readiness
3. Post-editing to the required
quality levels
1. PE guidelines
2. PE productivity
Supply Chain Readiness
Thorough training: All language providers that collaborate regularly with
Welocalize have received our proprietary Machine Translation and Post-editing
practices foundation course.
This course is delivered by WL Language Tools Training Managers.
The Training Managers are directly responsible for the education and
ongoing support of WL's Language Teams.
Customized post-editing instructions: These are created by the Training Managers, based on the specific account characteristics (quality expectations,
type of content, workflow, etc).
Shared with the Language Teams
Targeted calls: A few weeks into production, the Training Managers hold calls with individual translation teams to discuss post-editing approaches observed in tests and to address questions.
Feedback Loop: Between the Language Teams and the MT provider. Goal: ongoing improvement of the MT system.
Establish the best LQA process and frequency for the transition, to ensure the client's quality expectations are met.
Who are our Post-editors
Same talent: Our post-editors are our regular language providers.
Account knowledge preservation: This way we keep our know-how and our experience, so the account benefits from using the same resources.
MT often coexists with conventional translation: While some accounts are MT-exclusive, MT and conventional translation often coexist for different content within the same account, so resource consistency is maintained regardless of the production methodology.
Post-editing is just another type of service: We treat PE as another service in the localisation industry, and we do not engage any resources in our Supply Chain who would not be able to deliver the service.
Linguistic ability is the key: Post-editing is just another way to reach the same result; the main skill is linguistic, and PE is simply an alternative technique starting from a different input.
Post-Editing for Different Quality Levels
If the client requests full post-editing, this means publishable quality.
The post-editor is responsible for ensuring that the client's requirements with regard
to final quality expectations are met.
► Client glossary, TM, style guide etc. apply
Light Post-Editing / "understandable quality"
Requests for MT + light(er) post-editing are on the rise for specific types of
content.
Fast turnaround and affordable pricing are key to cope with the volumes and
use scenarios. The final quality can be lower to accommodate this.
Full and Light PE Guidelines
Full Post-Editing | Light Post-Editing
Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
Terminology is accurate & consistent | Key client terminology is accurate & consistent
Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
Style is consistent (headers, list items, …) | Style variations are acceptable
Punctuation is correct | Variations/errors in punctuation are acceptable
Style & tone are appropriate for content | Style & tone are not offensive
Specific requirements apply, e.g. 33 cm (13''), change EN quotation marks to FR/DE/… | Follow MT output, e.g. 13'' (33 cm), EN quotation marks
… | …
Quality Levels Samples – Sample Domain: Legal
WeImpact Low
Source | Raw MT (English) | Post-Edited | Necessary Changes
La Fiscalía General de Costa Rica ha acusado de supuesto
delito de peculado al expresidente costarricense Miguel
Ángel Rodríguez Echeverría ante un juzgado penal y ha
solicitado abrir juicio en su contra.
The Attorney General of Costa Rica has been accused of
alleged embezzlement Costa Rican president Miguel Angel
Rodriguez Echeverria before a criminal court and asked to
pass judgment against him.
The Attorney General of Costa Rica has accused of
alleged embezzlement Costa Rican president Miguel
Angel Rodriguez Echeverria before a criminal court and
asked to pass judgment against him.
accuracy
WeImpact Medium
Source | Raw MT (English) | Post-Edited | Necessary Changes
Für die vorliegende Vereinbarung und das zwischen uns
bestehende Rechtsverhältnis gilt das Recht von England und
Wales. Im Falle von Beschwerden, die nicht anderweitig
beigelegt werden können, haben englische Gerichte eine
nicht-ausschließliche Zuständigkeit. Das bedeutet, Sie können
in England klagen, können aber auch einen anderen
Gerichtsstand wählen. Ihre deutschen
Verbraucherschutzrechte sowie Ihr Recht, gerichtliche
Verfahren vor Luxemburger Gerichten einzuleiten, bleiben
von dieser Regelung unberührt.
The laws of England and Wales applies to this agreement and
the legal relationship between us. In the case of complaints
which cannot be resolved otherwise, English courts shall have
non-exclusive jurisdiction. This means that you can charge in
England, can choose but also a different jurisdiction. Its
German consumer protection law, as well as your right to
initiate judicial proceedings before the Luxembourg courts,
remain unaffected by this regulation.
The laws of England and Wales apply to this
agreement and the legal relationship between us. In
the case of complaints which cannot be resolved
otherwise, English courts shall have non-exclusive
jurisdiction. This means that you can make a complaint
in England, but you can also choose a different
jurisdiction. Your German consumer protection rights,
as well as your right to initiate judicial proceedings
before the Luxembourg courts, remain unaffected by
this regulation.
grammar/fluency, domain terminology and
style
WeImpact High
Source | Raw MT (English) | Post-Edited | Necessary Changes
ECIJA cuenta con un equipo especializado y amplia
trayectoria en prestar asesoramiento jurídico y fiscal, en
todos los aspectos relacionados con el retail, distribución
comercial y franquicias. Nuestro equipo de profesionales
evalúa y redacta los acuerdos de distribución y de
franquicias, y asesora en materia de cumplimiento normativo
en estos ámbitos.
ÉCIJA has a specialized team with extensive experience in
providing legal and tax advice in all aspects related to the
retail, commercial distribution and franchising. Our team of
professionals evaluates and drafting the agreements of
distribution and franchising, and consultant in the field of
compliance in these areas.
ECIJA relies on a specialized team with extensive
experience in providing legal and tax advice in all areas
related to retail, franchising and commercial
distribution. Our team of professionals assesses and
drafts franchising and distribution agreements, and
consults in all aspects of compliance related to these
areas.
accuracy, grammar/fluency, terminology,
style & voice
Post-editing quality levels are agreed on at the program launch time and outlined in the SLA
Typical MT PE Issues
Knowing the patterns of MT output is the key to a successful post-editing program
Even "good" MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
– issues around capitalization
– punctuation (source punctuation is copied)
– spacing
– omissions/additions of text (usually different in nature to those in fuzzy matches)
– unknown/new words may be translated literally or be left in English
– word order: can mirror the source
– compound formation
– word form agreement
→ being aware of typical issues helps good post-editing
General PE Guidelines
Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
Read the MT output & the source > decide quickly what can be used
Use as many “bits/sections“ of the MT output as possible:
move them around, correct word forms, change the part of speech, use them as
inspiration
Look up key terms in your reference material as usual, but also learn to trust the
output
Automate with customized QA checks (maybe even upfront?)
Adjust your expectations. Rethink your approach. Report recurring errors.
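A minimal sketch of what such customized, automatable QA checks could look like; the three checks shown are generic examples, not any client's actual checklist.

```python
# Generic examples of automatable QA checks, not any client's actual checklist.
import re


def qa_check(source: str, target: str) -> list[str]:
    """Return a list of potential issues in a post-edited segment."""
    issues = []
    if re.search(r"\s{2,}", target):
        issues.append("double spaces in target")
    if source.rstrip().endswith(".") != target.rstrip().endswith("."):
        issues.append("end punctuation differs from source")
    for number in re.findall(r"\d+(?:[.,]\d+)?", source):
        if number not in target:
            issues.append(f"number '{number}' from source missing in target")
    return issues
```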
PE => MT Feedback Loop
MT output is expected to contain errors.
These errors vary by language combination and content.
MT output is not “fixed”, but can be improved.
A “Live” feedback loop helps us increase translators’ loyalty and
feeling of “owning” the process
As the post-editor is exposed to the output directly and is familiar with the
correct translation, it is important to provide an MT feedback loop to the
clients’ MT team
Allows post-editors to report frequent issues in the MT output
Structured process for constructive feedback
Recurring issues
Post-editors will learn which issues can be fixed
Factors Determining PE Productivity
Just as with human translation, throughput can vary and depends on:
– language pair
– content type & complexity
– experience
– domain knowledge
– quality requirements
– use of automatic QA tools
– quality of TM and reference material
With MT, additional factors are:
– quality of the MT
– experience with post-editing
Compared to average daily throughputs for human translation, average
daily throughputs for full post-editing can be up to 3 x higher.
Section IV: MT & LQA
1. LQA Process
2. LQA for Different Quality Levels
3. Evaluation Models
4. MT PE and LQA Results
QA – Supply Chain
Selection
•Profiling
•Sourcing
•Screening
•Testing
Certification
•PE training
•Account onboarding
•Knowledge base
Retention
•Team audits
•Attrition management
For a Machine Translation Post-editing program:
• Resources with post-editing experience
• Customized training based on the characteristics of the program
weImpact QA – Levels of post-editing, examples
The following examples illustrate approaches to quality evaluation
QA evaluation models
TAUS
Proprietary
Simple
Different weightings / content type
QTLaunchPad
Public
Complex
Scalable
Welocalize adopts new, flexible quality evaluation models tailored to new processes
that include Machine Translation
New Approaches – TAUS DQF
Category – Subcategories:
Terminology – Non-compliance with company terminology; non-compliance with 3rd party terminology; inconsistency
Accuracy – Mistranslation; omission/addition; untranslated text
Style – Non-compliance with company style guides; literal translation; unidiomatic use of target language; tone; ambiguous translation
Language – Grammar/syntax; punctuation; spelling (errors, accents, capital letters)
Fluency – Evaluating the target
Adequacy – Evaluating source and target
Proprietary framework
Dynamic Quality Framework
Provides a commonly agreed
approach to select the most
appropriate translation
quality evaluation model(s)
and metrics depending on
specific quality requirements.
Emphasizes Machine
Translation
Quality – MT PE results
No fails
Quality results related to one of Welocalize's largest MT programs
(weekly checks performed by a third-party supplier) show consistent
quality gains over Human Translation and Fuzzy Match editing
Section V: GlobalSight and
Teaminology
1. GlobalSight Capabilities
2. Teaminology Community Terminology
Management Platform
3. Sentiment Analysis
GlobalSight – advantages
open-source
community support
free to download, install + try
industry-driven
standards-driven
Integrated with OmegaT
GlobalSight Workflow
Teaminology
Your Community
•Define the community that will add the greatest value
• internal employees
•user groups
•crowd
•consumers
•vendors
•suppliers
Vote
•Community votes on the proposed translation of a term
•or proposes a new translation for the term
Community Action
•Manager makes decision to use certain translations based on how the crowd has voted
•Removes subjectivity from the process
•In-country users provide early feedback.
Tracking Use
•Tracks the activity of the community
•Uses meritocracy to highlight the most active users, most accurate users and more.
Reporting
•Provides detailed reporting on the trends of the crowd in terms of how and when they vote
Community management platform: Teaminology allows a terminology and translation manager to load a list of terms into the system and send them to a community.
Teaminology – Dashboard
Section VI: Tone of Voice
1. Source Content Profiler
2. StyleScorer
3. Sentiment Analysis
Source Content Profiler
Source Content Profiler helps flag issues in the source content that can be potentially problematic for translation
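Purely as an illustration of the kind of source-side flags such a profiler can raise, the sketch below checks sentence length, placeholders and acronyms; these are assumed example rules, not the actual Source Content Profiler implementation.

```python
# Hypothetical examples of source-side flags; not the actual profiler rules.
import re

MAX_WORDS = 25  # assumed sentence-length threshold


def profile_source(sentence: str) -> list[str]:
    """Flag characteristics that can make a source sentence hard to machine-translate."""
    flags = []
    if len(sentence.split()) > MAX_WORDS:
        flags.append("long sentence - consider splitting before MT")
    if re.search(r"%\w|\{\d+\}", sentence):
        flags.append("placeholder/variable - verify MT handles it")
    if re.search(r"\b[A-Z]{2,}\b", sentence):
        flags.append("all-caps token - possible acronym, check terminology")
    return flags
```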
StyleScorer
Test Category | Training Category | Score
SUPPORT | TECH DOC | 3.16
TECH DOC | TECH DOC | 2.94
TECH DOC | LEGAL | ,02
• Identifies stylistic similarity of source document to other documents for the subject matter
• Identifies similarity of target document to other documents for the subject matter
• Example: Is this really a training document? To what degree is it similar to other training documents, or is it closer to support?
• Helps with choosing the best data for MT engines or as a part of the LQA effort
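As a rough illustration of a document-to-category similarity score, the sketch below uses TF-IDF cosine similarity with scikit-learn; it is a stand-in, not the StyleScorer algorithm, and its 0–1 scale differs from the scores in the table above.

```python
# Stand-in for a style/similarity score, not the StyleScorer algorithm itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training_docs = [
    "Click Next to continue the installation.",
    "Select the checkbox and press OK to apply the settings.",
]
test_doc = "Choose the option and click Finish to complete the setup."

vectorizer = TfidfVectorizer().fit(training_docs + [test_doc])
similarity = cosine_similarity(
    vectorizer.transform([test_doc]),
    vectorizer.transform(training_docs),
).mean()
print(f"Similarity of test document to training category: {similarity:.2f}")
```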
Sentiment Analysis – from Big Data to Targeted Sentiment
Unstructured information leads to inefficiency, overlooked data and fatigue
Semantic technologies help to interpret or target data for strategic business information and decisions
Capture the Opinion of Your Global Audience and Translate It into Marketing Metrics
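A minimal example of turning raw comments into a sentiment signal, using NLTK's VADER analyzer (English-only) as a stand-in for the semantic technologies mentioned above; the comments are invented samples.

```python
# Stand-in example using NLTK's VADER analyzer (English-only), not Welocalize tooling.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

comments = [
    "The hotel staff were wonderful and very helpful.",
    "Check-in took forever and the room was noisy.",
]

for comment in comments:
    compound = analyzer.polarity_scores(comment)["compound"]  # -1 negative .. +1 positive
    print(f"{compound:+.2f}  {comment}")
```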
Sentiment Analysis – Analytics
Reports include: frequent phrases and corresponding sentiment; products and places; frequent concepts by geo
THANK YOU
Q&A
Case Study: Dell
27 MT Engines in Nine Months
Welocalize launched a global machine translation (MT)
program for Dell as part of the localization
strategy for Dell.com.
Client Challenge:
– 75% of buyers prefer to buy in their local language
– Dell.com serves over 170 countries
– Evolve the localization strategy and introduce machine translation (MT)
– Reduce translation costs, maintain quality and increase velocity
Welocalize MT Solution:
– Overall translation process differentiates content types and streams and
provides variable quality levels
– After a successful rollout using Safaba Enterprise Machine Translation (EMT) Engine, migrated to Microsoft Translator Hub
– MT approach focused on enterprise optimization aligned to Dell’s exact needs
– Welocalize introduced a solid supply chain to handle post-editing of all of Dell's EMT output
Results:
– Dell's EMT output is faster
– Speed of translation increased without sacrificing quality
– MT reduced translation costs
“Our website serves over 170 countries so
the multilingual element of Dell.com is key. We wanted to
evolve our localization strategy
and introduce machine translation. My objective being
to reduce our translation costs
while maintaining quality and
increasing velocity.”
Wayne Bourland, Director
Case Study: NetApp
2 million to 40 million words in 5 years
Welocalize has closely supported NetApp in their quest
for a proper globalization model. Core to this strategy is
ensuring you have the right expertise, partnership, trust
and agreements in place to deliver an outstanding
program.
Client Challenge:
– Meet growing demand for translation content
– Create value in globalization strategy
– Maximize value of centralized vendor management
– Stay innovative to maintain growth
Welocalize MT Solution:
– Partner as a primary provider in services that go beyond words
– Support specialized workflows to meet various content requirements
– Advise with best practices to move beyond time, cost and quality
– Source the best talent to meet the exact needs of the client
Results:
– Accelerated growth in scale and volume over a 5-year period
– Sustainable foundation to manage today and the future
– Innovation and interoperability investments to support the GPSO
– Platform model to achieve business goals
“In 2009, I envisioned a Virtual Center with a ‘follow
the sun model’ to support any content type, tool, code
or system to be globalized for our customers around the globe. We called this Center
the GPSO, it allows NetApp
to penetrate international markets at faster speeds.
Our first vendor partner was Welocalize. Welocalize has
been there along the journey to scale and speed
up our processes as a trusted advisor.”
Anna Schlegel, Director of NetApp GPSO
Case Study: TripAdvisor
Operational Excellence in Localization
TripAdvisor branded sites make up the largest travel
community in the world with more than 260 million unique
monthly visitors and over 100 million reviews and opinions.
49% of TripAdvisor revenue is from international points-of-
sale.
Client Challenge:
– It is crucial for all TripAdvisor travel sites to be available, real-time, 24 hours a day
– New reviews are posted all the time and read by people all over the world
– TripAdvisor needed an innovative localization strategy to streamline the translation workflow
Welocalize Solution:
– Welocalize developed a localization solution for TripAdvisor, based on operational
excellence and the Localization Maturity Model (LMM)
– Solution removed waste and unnecessary workflows
– Introduced sophisticated levels of process, organization and translation automation
Results:
– Welocalize streamlined the translation workflow from 23 to 5 steps
– 70% time savings for program management, 1,300 engineering hours saved per year
– Translators' admin time reduced by 50%
– 21 new markets and 15 new languages within 3 years
– Increase in productivity and speed of translation
– 423% increase in words translated for 2011–2012
“We’ve made incredible progress at implementing a
solid localization strategy. By using
the CSA’s maturity model and
Welocalize’s approach of Operational
Excellence, we’re meeting and
exceeding TripAdvisor’s international objectives.”
Lorna Whelan, Senior Localization
Manager at TripAdvisor
Case Study: Intuit
weMT + Post-Editing = Success
Intuit views globalization as a primary business driver to
service their global ecosystem of employees, trade
partners, small businesses, customers and accountants.
Client Challenge:
– Getting the essence of the source content
– Translating (liberating) content that would not be translated by humans due to high cost
– Increase efficiency while reducing costs for content requiring human post-editing
– Addressing “urgent” + “on-demand” translation requirements
– Implement a solution ASAP
Welocalize MT Solution:
– Proposed approach for Intuit was to roll the MT solution out in tiers
– Analysis shows the best use cases based on engine maturity + cost benefit
– Stage top priority to the lower priority languages for maximum ROI
Results:
– MT implemented in only three months
– Productivity gains ranged from 5% to 100%
– Average savings = 30% in translation costs
– No compromise on quality for UI project (online software)
– Saved $263,000 on 500,000 words
Savings on 500,000 Words
$29,250 Danish
$28,000 Norwegian
$28,000 Swedish
$24,250 Dutch
$24,250 French (Canada)
$21,750 Finnish
$18,000 Japanese
$16,750 French (France)
$15,500 German
$13,000 Spanish
$11,750 Portuguese
$9,250 Polish
$9,250 Portuguese (Brazil)
$8,000 Turkish
$6,000 Russian