lionbridge - taus tokyo forum 2015
TRANSCRIPT
www.lionbridge.com
http://blog.lionbridge.com
http://twitter.com/Lionbridge
http://www.facebook.com/L10nbridge
Paula Shannon
CSO & Senior Vice President
Email: [email protected]
Mobile: +1 781 530 6730
Twitter: @PaulaBShannon
Copyright 2015. Confidential – Distribution prohibited without permission
TAUS Japan, April 9, 2015
Machine Translation is Our Industry’s Single Biggest Innovation
But most discussions focus on the tactics of machine translation, not the strategy
• Debate on the best MT engine
• Discourse on RBMT vs. SMT
• Disagreement on quality frameworks
• Evaluation against a human standard
• Post-editing output
• Cost and price per word
• Sustaining Innovation focuses on incremental improvements to existing processes or products for existing customers. It eventually creates offerings that are too complex and too costly to compete
• A disruptive innovation helps create a new market and value network, displacing an earlier technology. Initially, disruptive innovation often begins with lower quality and seems to “get it all wrong”
The Innovator’s Dilemma
The Two Types of Innovation: Sustaining or Disruptive
“Can we use MT+Workflow+PostEdit to produce FAQ and Support Content for 45% less cost?”
“Can we embed Automatic Translation in Skype to connect people around the world?”
Lionbridge’s Approach to Machine Translation
We Focus on MT solutions and services. We are MT engine ‘agnostic’ (independent)
We use whatever engine best suits the need:
o Microsoft Translator Hub (with and without GeoFluent)
o Systran Enterprise engine (Hybrid engine)
o MSR-MT (SMT)
o Barcelona (RBMT, our own in-house engine)
o Moses (SMT)
o Apertium (RBMT)
After analyzing each customer’s needs, we will propose (internally or externally) the right MT solution (e.g.: Microsoft, Renault, eBay, Amazon, Alibaba, Becton Dickinson)
Leading both Sustaining and Disruptive Innovation
Lionbridge Machine Translation Data
More than 1 Billion Words pushed through MTM (Machine Translation + Translation Memory)
+100 Million Words in 2014 Alone*
Words Post-Edited Per Year
2009: 18,854,754
2010: 16,445,110
2011: 65,426,924
2012: 77,847,141
2013: 83,224,523
MT research and development started in 1998 (NetX), RBMT deployment in 2002, Statistical in 2003, and Hybrid MT in 2009
30+ language pairs used daily in our MT solution. New record last week: 45 languages for a program
1,500+ post-editors registered in our vendor database
In-house MT customizers for 10+ languages
* 3 Million words cited by other leading MLV
MT is Used in Most of Top 50 Accounts
• 60+ Million Words Post-edited
• 101+ Million Words Machine Translated
• 76% of Projects Use Connected, Automated Workflow (TMS)
• 93% of Projects Rely on Automated Production Workflows (Internal)
29% of Volume is “Premium” output which means professional post-editors perform a full review and edit of the machine translated segments to deliver top human translation quality.
31% of Volume is “Basic” output which includes a light revision of the MT output following agreed guidelines, with a focus on delivering output that is well understood in the target language.
42% of Volume is “Raw” output which means that we have worked to customize the engine and processes but have not post-edited the output
2014 MT Volume by Output
Growing Breadth of Languages
From 5 Language Pairs in 2002 to More than 58 Today
• 30 to 45 Language Pairs on average per week
• 44 Customizations Done (27
Language Pairs Produced
2009 - 2014
English → ‘X’ – Language Family
English → Spanish – Romance
English → French – Romance
English → Portuguese – Romance
English → Italian – Romance
English → Swedish – Scandinavian
English → Norwegian – Scandinavian
English → Danish – Scandinavian
English → Dutch – West Germanic
English → German – West Germanic
English → Czech – Slavic
English → Polish – Slavic
English → Russian – Slavic
English → Chinese – Asian
English → Japanese – Asian
English → Korean – Asian
‘X’ → English
- Normally better quality than in the opposite direction
- Very good when ‘X’ is Romance or Scandinavian
Better
Weaker
How are Engines Customized?
RBMT and Statistical MT share many best-practice steps
Customizing Statistical and Hybrid Machine Translation Engines
Preparation | Technical Analysis and Setup | Linguistic Customization | Training | Publishing | Production
Input Analysis
Setup
Feed MT Engine
Training Process
Production Servers
Source Files Samples
TMs and glossaries
Clean-up unwanted data (noise)
Identify and extract entities, tags and other important source elements
Create filters for training and translation
Classify the elements based on their function in the translation
Upload filters to servers and test them
Extract Terminology
Create Customized Dictionaries
Create Customized Rules
Training Corpus
Dictionaries
Rules
Create Baseline
Create Custom Profile
Run Training
Quality is OK? (NO / YES)
Publish Translation Model
Upload Translation Model
Upload Profiles
Upload Custom Filters
Legend
Common Tasks
Hybrid-specific Tasks
Source Files
Access to MT Engine
Translated Files
Background TM (M-translated TM)
Source Files
Handoff
>75% Match: Leverage from Previous TM
<75% Match
Enhancement steps
Foreground TM (project TM)
Entity Dictionary
Project Dictionary
Long/Freq.
Short/Infreq.
Entity Extractor
Terminology Extractor
Segment Analyzer
Machine Translation
TM Analyzer
Unknown Segments
Translated Segments
-15% penalty
QUICK Term & Punctuation
Lionbridge MTM process workflow
One approach to deploying MT as part of the regular translation process
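The routing idea in the MTM workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lionbridge's implementation: the 75% threshold and the 15% penalty come from the slide, but applying the penalty to background-TM matches is an interpretation of the diagram, `difflib.SequenceMatcher` is a character-level stand-in for a real TM fuzzy-match score, and `route_segment`, `best_match` and the `machine_translate` callback are hypothetical names.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.75     # slide: >75% match is leveraged from the previous TM
BACKGROUND_PENALTY = 0.15  # slide: -15% penalty (assumed to apply to background-TM matches)


def fuzzy_score(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a real TM match score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(segment, tm, penalty=0.0):
    """Return (score, target) for the closest TM source, after subtracting the penalty."""
    best = (0.0, None)
    for source, target in tm.items():
        score = max(0.0, fuzzy_score(segment, source) - penalty)
        if score > best[0]:
            best = (score, target)
    return best


def route_segment(segment, foreground_tm, background_tm, machine_translate):
    """Reuse a TM translation above the threshold; otherwise send the segment to MT."""
    fg = best_match(segment, foreground_tm)
    bg = best_match(segment, background_tm, BACKGROUND_PENALTY)
    score, target = max(fg, bg, key=lambda match: match[0])
    if score > MATCH_THRESHOLD:
        return ("tm", target, score)
    return ("mt", machine_translate(segment), score)
```

In this sketch an exact background-TM hit scores 0.85 after the penalty and is still reused, while a segment with no close match in either TM is passed to the MT callback.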
Edit Distance: A Way to Assess Post Edit Effort Level Needed
Perfect translations
No changes are required to obtain a "human quality" translation for these segments.
Good-quality sentences
Few changes are required to achieve "human quality". The effort necessary to post-edit the sentence is small.
Compensating sentences
Approximately half of the sentence needs to be modified to achieve "human quality".
Mistranslated sentences
Most of the sentence has been wrongly translated. In many cases it is faster to translate from scratch.
The Edit Distance Ratio shows the percentage of changes (insertions, deletions and substitutions of words) needed to achieve the full human quality standard, as represented by the existing translations in the reference TM.
ED is an easy-to-read metric: ED = 0, zero changes; ED = 1, one word changed.
The goal of an ED analysis is to measure the MT quality improvement and to estimate the post-editor's effort.
It is very applicable to the Language Services Provider, who must estimate level of effort, resourcing, and cost accurately.
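A word-level edit distance of the kind described above can be sketched as follows. This is a minimal illustration, not the tool used in the deck: normalizing by the longer of the two sentences is an assumption (the slide does not say how the ratio is normalized), and the band thresholds in `effort_band` are hypothetical cut-offs chosen only to mirror the four categories.

```python
def word_edit_distance(mt_words, ref_words):
    """Levenshtein distance over words: insertions, deletions and substitutions."""
    prev = list(range(len(ref_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref_words, start=1):
            cost = 0 if mt_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete a word from the MT output
                            curr[j - 1] + 1,      # insert a word from the reference
                            prev[j - 1] + cost))  # substitute, or keep a matching word
        prev = curr
    return prev[-1]


def edit_distance_ratio(mt_output, reference):
    """Share of words that must change, normalized by the longer sentence (assumption)."""
    mt_words, ref_words = mt_output.split(), reference.split()
    longest = max(len(mt_words), len(ref_words))
    if longest == 0:
        return 0.0
    return word_edit_distance(mt_words, ref_words) / longest


def effort_band(ratio):
    """Map a ratio to the slide's four categories; the thresholds are hypothetical."""
    if ratio == 0:
        return "perfect"
    if ratio <= 0.25:
        return "good-quality"
    if ratio <= 0.60:
        return "compensating"
    return "mistranslated"
```

For instance, comparing "the cat sat on mat" with the reference "the cat sat on the mat" needs one insertion across six reference words, so the sentence falls in the good-quality band under these assumed thresholds.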
eBay European ecommerce Program - Unlocking Global Listings
Moses and Systran, with a high degree of customization on entity mining and terminology. Unique challenges, as product listings and titles are non-grammatical strings
Microsoft Visual Studio 2005/2008/2010/2012
15 Million words per language
Highly technical content, complex format and tagging
UA and UI translated simultaneously. Tight schedule (throughput required: 2 million words per month)
Becton Dickinson internal ERP deployment
No legacy material and poor source quality
Unmarked, referenced UI strings. Translation had to preserve English
Solved by developing pattern-based rules to detect probable UI strings based on surrounding words
Large Scale Challenges, Complex Processes, Transparent Solutions
Case Studies of MT as Sustaining Innovation
• Sustaining Innovation is typically driven by the need to provide existing customers with incremental improvements and efficiencies over time
• Disruptive innovation is driven by new market entrants who introduce products to the under-served portion of a market, often with lower quality, and who seem to “get it all wrong”
The Innovator’s Dilemma
The Two Types of Innovation: Sustaining or Disruptive
“Can we use MT+Workflow+PostEdit to produce FAQ and Support Content for 45% less cost?”
“Can we embed Automatic Translation in Skype to connect people around the world?”
Machine Translation Unlocking Social Media Monitoring
A process to listen, classify, report, and deliver for action
• Crawl the web in multiple languages/countries using localized keywords to identify ‘conversations’ related to customer products or interests in different social media
• Get results from the crawler, filter and clean them, and, when necessary, fine-tune the crawling rules to obtain more relevant and clean user comments and results
• Using a Sentiment Analysis tool, text analytics experts and the crowd perform sentiment classification
• Results are classified as positive, negative or neutral, and quantified, taking into account product categories and features
• Machine Translation, with special customization, of sentiments to have all the comments in English
• Final human and machine cumulative analysis of all the feedback collected and classified from all the languages
Amplify the customer’s global presence and responsiveness
Solving Social Media Challenges with Machine Translation
Combination of Large Scale Translation and Business Process Crowdsourcing Technology
Automated Entity Identification
Multilingual Crowd Validation & Extraction
Machine Translation
Multilingual Crowd Post-Edit & Audit
Lionbridge Smart Crowd Post-Edit
Lionbridge Hybrid Machine Translation
Lionbridge Smart Crowd Data Extraction
Lionbridge Linguistic Toolbox
Listening on the Topic of… Machine Translation
Analysis, Sentiment, Classification, Reporting
The spike in posts around MT was due to a CNN (generic) program on MT
Traffic | Sentiment | Trending
67% prefer online answers (45% will abandon purchase if hard)*
Channels: Chat, Email, Guided Self-help, Online Communities, IM, Social Networks, Knowledge Bases, Video
*Forrester Research, Inc., Navigate the Future of Customer Service
Communicating with global customers who expect answers in real time
The Global Customer Support Challenge
• Language exacerbates the problem
*Source: Common Sense Advisory Report “Automated Translation Technology”
• Business on the Internet
Pervasive real-time connectivity
Smartphone Internet traffic exceeding desktop
Real-time expectations
Instant gratification syndrome
• Customers want convenience
Social networks & search engines are primary gateways
Preference for online answers
But Raw Online MT is not Appropriate for Business?
Real customer scenario in Home Improvement Retail Store support forum
• Regional or industry-specific vernacular
• Proper names
• Slang
• Branded terms
• Typos / misspellings / contractions
GeoFluent Output
Greetings greetings my handyman people,
have heard the new regional director of Lowes,
Generoso Caminante, (he who calls the shots!)
is considering the consolidation of your DIY line
within the framework of “You and Lowes” as
brand and website. This means that the
products most common DIY, from carpet,
Tapcon screws to drywall Sheetrock will be
available in a common portal. I’m really excited
because it means that I will be able to select
and purchase my materials in one place. Now
I’ll have more time to do my chores, LOL! That
is, it is phenomenal! What is your opinion of
these events?
Generic Online MT
Salu Salu my little hand people, I’ve heard that
the new regional manager of Lowes,
generous Walker, (the K short COD!) This is
considering the consolidation of our DIY line
within the framework of “Tu and Lowes” such
as brand and web site. This means that the
products most common DIY, from folder, to
handsome screws, to the plasterboard cheet
rock will be available in a common portal. ‘toy
really excited because signifika that I’ll be able
to select and purchase my materials in one
place. Now I’ll have more time to do my
chapusas, JA! Or, is the pump! Q say about
these events?
Language Processing Engine
Microsoft® Translator
Real-time Automated Translation
GeoFluent
• Fix slang, shortcuts, misspellings, etc.
• Identify branding and terminology
• Sequester sensitive data
• Output Correction
• Preserve branding and terminology
• Restore sensitive data
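The pre- and post-processing steps listed above can be illustrated with a small Python sketch. It is not GeoFluent's implementation: the slang table, the brand list, the e-mail pattern used as "sensitive data", the `__N__` placeholder format, and the `translate` callback are all hypothetical, and the sketch assumes the MT engine passes placeholders through unchanged. Sensitive data and brand terms are masked before slang is normalized so that normalization cannot alter protected spans, which is a design choice of this sketch.

```python
import re

SLANG = {"thx": "thanks", "u": "you", "pls": "please"}   # hypothetical normalization table
BRAND_TERMS = ["Tapcon", "Sheetrock"]                    # hypothetical protected terms
SENSITIVE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # e-mail addresses, as an example


def protect(text):
    """Replace sensitive data and brand terms with numbered placeholders."""
    vault = []

    def stash(match):
        vault.append(match.group(0))
        return f"__{len(vault) - 1}__"

    text = SENSITIVE.sub(stash, text)
    brands = re.compile("|".join(re.escape(term) for term in BRAND_TERMS))
    return brands.sub(stash, text), vault


def normalize(text):
    """Expand known slang and shortcuts; placeholders contain no letters and are left alone."""
    return re.sub(r"[A-Za-z']+",
                  lambda m: SLANG.get(m.group(0).lower(), m.group(0)),
                  text)


def restore(text, vault):
    """Put the original sensitive data and brand terms back after translation."""
    return re.sub(r"__(\d+)__", lambda m: vault[int(m.group(1))], text)


def translate_message(text, translate):
    """Protect, normalize, translate with the supplied callback, then restore."""
    masked, vault = protect(text)
    return restore(translate(normalize(masked)), vault)
```

A call such as `translate_message(message, engine)` would wrap whatever MT call is in use; the engine only ever sees normalized text with placeholders in place of protected content.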
Real Time Multilingual Chat
Not about translation quality, it’s about call deflection, support costs, and customer satisfaction
Disrupting Customer Support
For Pre-Sales Assistance:
• 11% increase in online conversions*
• 16% productivity increase for call center agents
* Where multilingual chat was previously unavailable
Blended support cost: $150
Cost of a self-served translated page view: $0.15
Deflection rate: 0.5%
Number of translated page views needed for one deflection: 200
Total cost to get one deflection (200 x $0.15): $30.00
Savings per deflection ($150 - $30): $120.00
Net value per translated page view ($120 / 200): $0.60
Breakeven for medium customer (translated page views): 33,000
Breakeven: 1-2 months
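The deflection arithmetic above can be reproduced with a short Python sketch. The function and key names are hypothetical; the inputs ($150 blended support cost, $0.15 per translated page view, 0.5% deflection rate) are the slide's own figures. The breakeven volume is not recomputed here because the slide does not state the cost it is measured against.

```python
def deflection_economics(blended_support_cost, cost_per_view, deflection_rate):
    """Recompute the slide's per-deflection and per-view figures from its three inputs."""
    if deflection_rate <= 0:
        raise ValueError("deflection_rate must be positive")
    views_per_deflection = 1 / deflection_rate                  # e.g. 1 / 0.005
    cost_per_deflection = views_per_deflection * cost_per_view  # views x cost per view
    savings_per_deflection = blended_support_cost - cost_per_deflection
    net_value_per_view = savings_per_deflection / views_per_deflection
    return {
        "views_per_deflection": round(views_per_deflection),
        "cost_per_deflection": round(cost_per_deflection, 2),
        "savings_per_deflection": round(savings_per_deflection, 2),
        "net_value_per_view": round(net_value_per_view, 2),
    }


figures = deflection_economics(150.00, 0.15, 0.005)
```

With the slide's inputs, `figures` holds 200 views per deflection, $30.00 cost per deflection, $120.00 savings per deflection, and $0.60 net value per translated page view, matching the table.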
For Customer Support:
• 15% increase in call deflection
• 21% increase in CSAT among non-English speakers
Machine Translation as TRUE Disruptive Innovation
Microsoft Machine Translation Knowledge + Skype
It’s not about how close it is to human quality - it’s about the quality of the humans being close