Lionbridge - TAUS Tokyo Forum 2015

TAUS Japan, April 9, 2015

Paula Shannon
CSO & Senior Vice President
Email: [email protected]
Mobile: +1 781 530 6730
Twitter: @PaulaBShannon

www.lionbridge.com
http://blog.lionbridge.com
http://twitter.com/Lionbridge
http://www.facebook.com/L10nbridge

Copyright 2015. Confidential – Distribution prohibited without permission


Machine Translation is Our Industry’s Single Biggest Innovation
But most discussions focus on the tactics of machine translation, not the strategy

• Debate on the best MT engine

• Discourse on RBMT vs. SMT

• Disagreement on quality frameworks

• Evaluation against a human standard

• Post-editing output

• Cost and price per word


The Innovator’s Dilemma
The Two Types of Innovation: Sustaining or Disruptive

• Sustaining innovation focuses on incremental improvements to existing processes or products for existing customers. It eventually creates offerings that are too complex and too costly to compete.

• A disruptive innovation helps create a new market and value network, displacing an earlier technology. Initially, disruptive innovation often begins with lower quality and seems to “get it all wrong”.

“Can we use MT+Workflow+PostEdit to produce FAQ and Support Content for 45% less cost?”

“Can we embed Automatic Translation in Skype to connect people around the world?”


Lionbridge’s Approach to Machine Translation

We focus on MT solutions and services. We are MT engine ‘agnostic’ (independent).

We use whatever engine best suits the need:
o Microsoft Translator Hub (w/ and w/o GeoFluent)
o Systran Enterprise engine (Hybrid engine)
o MSR-MT (SMT)
o Barcelona (RBMT engine developed in-house)
o Moses (SMT)
o Apertium (RBMT)

After analyzing each customer’s needs, we will propose (internally or externally) the right MT solution (e.g.: Microsoft, Renault, eBay, Amazon, Alibaba, Becton Dickinson)

Leading both Sustaining and Disruptive Innovation


Lionbridge Machine Translation Data

More than 1 Billion Words pushed through MTM (Machine Translation + Translation Memory)

+100 Million Words in 2014 Alone*

Words Post-Edited Per Year

Year    Words per year
2009    18,854,754
2010    16,445,110
2011    65,426,924
2012    77,847,141
2013    83,224,523

MT research and development started in 1998 (NetX), RBMT deployment in 2002, Statistical in 2003, and Hybrid MT in 2009

30+ language pairs in our MT solution used daily. New record last week, 45 languages for a program

+1500 post-editors registered in our vendor database

In-house MT customizers for 10+ languages

* 3 Million words cited by other leading MLV


MT is Used in Most of Top 50 Accounts

• 60+ Million Words Post-edited

• 101+ Million Words Machine Translated

• 76% of Projects Use Connected, Automated Workflow (TMS)

• 93% of Projects Rely on Automated Production Workflows (Internal)

2014 MT Volume by Output

• 29% of volume is “Premium” output: professional post-editors perform a full review and edit of the machine-translated segments to deliver top human translation quality.

• 31% of volume is “Basic” output: a light revision of the MT output following agreed guidelines, focused on delivering output that is well understood in the target language.

• 42% of volume is “Raw” output: we have worked to customize the engine and processes but have not post-edited the output.


Growing Breadth of Languages

From 5 Language Pairs in 2002 to More than 58 Today

• 30 to 45 Language Pairs on average per week

• 44 Customizations Done (27

Language Pairs Produced, 2009 - 2014

English → ‘X’ by language family, listed from better to weaker:
English → Spanish – Romance
English → French – Romance
English → Portuguese – Romance
English → Italian – Romance
English → Swedish – Scandinavian
English → Norwegian – Scandinavian
English → Danish – Scandinavian
English → Dutch – West Germanic
English → German – West Germanic
English → Czech – Slavic
English → Polish – Slavic
English → Russian – Slavic
English → Chinese – Asian
English → Japanese – Asian
English → Korean – Asian

‘X’ → English
- Normally better quality than in the opposite direction
- Very good when ‘X’ is Romance or Scandinavian


How are Engines Customized?
RBMT and Statistical MT share many best-practice steps

Customizing Statistical and Hybrid Machine Translation Engines

[Flowchart. Legend: Common Tasks, Hybrid-specific Tasks]

Preparation
o Source file samples
o TMs and glossaries

Technical Analysis and Setup (Input Analysis, Setup)
o Clean up unwanted data (noise)
o Identify and extract entities, tags and other important source elements
o Create filters for training and translation
o Classify the elements based on their function in the translation
o Upload filters to servers and test them

Linguistic Customization
o Extract terminology
o Create customized dictionaries
o Create customized rules

Training (Feed MT Engine, Training Process)
o Inputs: training corpus, dictionaries, rules
o Create baseline
o Create custom profile
o Run training
o Quality is OK? NO: loop back; YES: publish translation model

Publishing
o Upload translation model
o Upload profiles
o Upload custom filters

Production (Production Servers)
o Source files → access to MT engine → translated files


Lionbridge MTM process workflow
One approach to deploying MT as part of the regular translation process

[Workflow diagram. Labels: Source Files; Handoff; TM Analyzer; >75% Match: Leverage from Previous TM; <75% Match: Enhancement steps; Entity Extractor and Entity Dictionary; Terminology Extractor and Project Dictionary; Segment Analyzer (Long/Freq., Short/Infreq.); Machine Translation; Unknown Segments; Translated Segments; -15% penalty; QUICK Term & Punctuation; Foreground TM (project TM); Background TM (M-translated TM)]
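The ">75% Match" / "<75% Match" split named in the workflow can be illustrated with a small routing sketch. This is only an illustration under stated assumptions: the match score comes from Python's difflib rather than from any real TM tool, the function names are hypothetical, and the -15% penalty and the enhancement steps are not modeled because the slide does not say how they are applied.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.75  # the 75% split shown in the workflow


def best_tm_match(segment, tm):
    """Return (score, translation) for the closest TM source, or (0.0, None)."""
    best_score, best_target = 0.0, None
    for source, target in tm.items():
        # Assumption: a character-based similarity stands in for a TM fuzzy match.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def route_segments(segments, tm):
    """Split segments into TM-leveraged ones and ones that go on to MT."""
    leveraged, to_mt = [], []
    for segment in segments:
        score, target = best_tm_match(segment, tm)
        if score > MATCH_THRESHOLD:
            leveraged.append((segment, target, round(score, 2)))
        else:
            # Segments at or below the threshold continue to the MT path.
            to_mt.append(segment)
    return leveraged, to_mt


if __name__ == "__main__":
    tm = {"Click the Save button.": "Haga clic en el botón Guardar."}
    segments = ["Click the Save button.", "Restart the server after the update."]
    leveraged, to_mt = route_segments(segments, tm)
    print("From TM:", leveraged)
    print("To MT:", to_mt)
```

The slide does not say where a segment scoring exactly 75% goes; the sketch sends it to MT.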


Edit Distance: A Way to Assess Post Edit Effort Level Needed

Perfect translations
No changes are required to obtain a "human quality" translation for these segments.

Good-quality sentences
Few changes are required to achieve "human quality". The effort necessary to post-edit the sentence is small.

Compensating sentences
Approximately half of the sentence needs to be modified to achieve "human quality".

Mistranslated sentences
Most of the sentence has been wrongly translated. In many cases it is faster to translate from scratch.

The Edit Distance Ratio shows the percentage of changes (insertions, deletions and substitutions of words) needed to achieve the full human quality standard, as represented by the existing translations in the reference TM.

ED is an easy-to-read metric: ED = 0, zero changes; ED = 1, one word changed.

The goal of an ED analysis is to measure the MT quality improvement and to estimate the post-editor's effort.

It is very applicable to the Language Services Provider, who must estimate level of effort, resourcing, and cost accurately.
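A word-level edit distance of the kind described above can be sketched in a few lines. This is a minimal illustration, not Lionbridge's tooling: it uses the standard Levenshtein algorithm over whitespace-separated words, and the ratio is normalized by the reference length, which is an assumption since the slide does not state the denominator.

```python
def word_edit_distance(mt_output, reference):
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    hyp, ref = mt_output.split(), reference.split()
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete a word from the MT output
                current[j - 1] + 1,      # insert a missing reference word
                previous[j - 1] + cost,  # substitute (or keep) a word
            ))
        previous = current
    return previous[-1]


def edit_distance_ratio(mt_output, reference):
    """Edits as a fraction of the reference length (assumed normalization)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        return 0.0 if not mt_output.split() else 1.0
    return word_edit_distance(mt_output, reference) / ref_len


if __name__ == "__main__":
    mt = "the cat sat on mat"
    human = "the cat sat on the mat"
    print(word_edit_distance(mt, human))             # one missing word
    print(f"{edit_distance_ratio(mt, human):.0%}")   # share of the reference changed
```

Here the reference plays the role of the existing translation in the reference TM, so ED = 0 corresponds to a "perfect translation" segment.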


Case Studies of MT as Sustaining Innovation
Large Scale Challenges, Complex Processes, Transparent Solutions

eBay European ecommerce Program - Unlocking Global Listings
Moses and Systran, with a high degree of customization on entity mining and terminology. Unique challenges, as product listings and titles are non-grammatical strings.

Microsoft Visual Studio 2005/2008/2010/2012
15 million words per language
Highly technical content, complex format and tagging
UA and UI translated simultaneously. Tight schedule (throughput required: 2 million words per month)

Becton Dickinson internal ERP deployment
No legacy material and poor source quality
Unmarked, referenced UI strings. Translation had to preserve English. Solved by developing pattern-based rules to detect probable UI strings based on surrounding words.
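As an illustration of what "pattern-based rules to detect probable UI strings based on surrounding words" might look like, here is a minimal sketch. The cue verbs and the regular expression are hypothetical and are not the rules used in the Becton Dickinson project; the assumption is simply that a capitalized phrase right after an instruction verb is likely a UI label that should stay in English.

```python
import re

# Hypothetical cue verbs; a real rule set would be tuned per project.
UI_CUE = re.compile(
    r"\b(?i:click|select|press|choose|open)\s+(?:the\s+)?"
    r"((?:[A-Z][A-Za-z0-9]*)(?:\s+[A-Z][A-Za-z0-9]*)*)"
)


def probable_ui_strings(sentence):
    """Return capitalized phrases that directly follow a UI cue verb."""
    return UI_CUE.findall(sentence)


if __name__ == "__main__":
    text = "Click Save Changes to continue, then select the Print Preview button."
    print(probable_ui_strings(text))
```

Phrases found this way could then be protected from translation, in the same spirit as the entity handling described for the other programs.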


The Innovator’s Dilemma
The Two Types of Innovation: Sustaining or Disruptive

• Sustaining innovation is typically driven by the need to provide existing customers with incremental improvements and efficiencies over time.

• Disruptive innovation is driven by new market entrants who introduce products to the under-served portion of a market. These products often have lower quality and seem to “get it all wrong”.

“Can we use MT+Workflow+PostEdit to produce FAQ and Support Content for 45% less cost?”

“Can we embed Automatic Translation in Skype to connect people around the world?”


Machine Translation Unlocking Social Media Monitoring
A process to listen, classify, report, and deliver for action

• Crawl the web in multiple languages/countries using localized keywords to identify ‘conversations’ related to customer products or interests in different social media

• Get results from the crawler, filter and clean them, and, when necessary, fine-tune the crawling rules to obtain more relevant and clean user comments and results

• Using a Sentiment Analysis tool, text analytics experts and the crowd perform sentiment classification

• Results are classified as positive, negative or neutral, and quantified, taking into account product categories and features

• Machine Translation, with special customization, of sentiments to have all the comments in English

• Final human and machine cumulative analysis of all the feedback collected and classified from all the languages

Amplify the customer’s global presence and responsiveness


Solving Social Media Challenges with Machine Translation
Combination of Large Scale Translation and Business Process Crowdsourcing Technology

[Process diagram]
Steps: Automated Entity Identification; Multilingual Crowd Validation & Extraction; Machine Translation; Multilingual Crowd Post-Edit & Audit
Components: Lionbridge Linguistic Toolbox; Lionbridge Smart Crowd Data Extraction; Lionbridge Hybrid Machine Translation; Lionbridge Smart Crowd Post-Edit


Listening on the Topic of… Machine Translation
Analysis, Sentiment, Classification, Reporting

The spike in posts about MT was caused by a (generic) CNN program on MT

[Chart panels: Traffic, Sentiment, Trending]


The Global Customer Support Challenge
Communicating with global customers who expect answers in real time

• Business on the Internet
o Pervasive real-time connectivity
o Smartphone Internet traffic exceeding desktop
o Real-time expectations
o Instant gratification syndrome

• Customers want convenience
o Social networks & search engines are primary gateways
o Preference for online answers

• Language exacerbates the problem
*Source: Common Sense Advisory Report “Automated Translation Technology”

67% prefer online answers (45% will abandon purchase if hard)*
Channels: Chat, Email, Guided Self-help, Online Communities, IM, Social Networks, Knowledge Bases, Video
*Forrester Research, Inc., Navigate the Future of Customer Service


But Raw Online MT is not Appropriate for Business?
Real customer scenario in Home Improvement Retail Store support forum

• Regional or industry-specific vernacular
• Proper names
• Slang
• Branded terms
• Typos / misspellings / contractions

GeoFluent Output
Greetings greetings my handyman people, have heard the new regional director of Lowes, Generoso Caminante, (he who calls the shots!) is considering the consolidation of your DIY line within the framework of “You and Lowes” as brand and website. This means that the products most common DIY, from carpet, Tapcon screws to drywall Sheetrock will be available in a common portal. I’m really excited because it means that I will be able to select and purchase my materials in one place. Now I’ll have more time to do my chores, LOL! That is, it is phenomenal! What is your opinion of these events?

Generic Online MT
Salu Salu my little hand people, I’ve heard that the new regional manager of Lowes, generous Walker, (the K short COD!) This is considering the consolidation of our DIY line within the framework of “Tu and Lowes” such as brand and web site. This means that the products most common DIY, from folder, to handsome screws, to the plasterboard cheet rock will be available in a common portal. ‘toy really excited because signifika that I’ll be able to select and purchase my materials in one place. Now I’ll have more time to do my chapusas, JA! Or, is the pump! Q say about these events?


Real-time Automated Translation
GeoFluent

[Diagram components: Language Processing Engine, Microsoft® Translator]

• Fix slang, shortcuts, misspellings, etc.

• Identify branding and terminology

• Sequester sensitive data

• Output Correction

• Preserve branding and terminology

• Restore sensitive data
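The bullets above describe a wrap-around pattern: clean and protect the text before the MT engine sees it, then restore what was protected afterwards. The sketch below illustrates that pattern only; it is not GeoFluent's implementation. The slang table, brand list, placeholder format, and the `translate` callable are all hypothetical, and a real engine may need placeholders it is guaranteed to leave untouched.

```python
import re

# Hypothetical examples; real normalization rules and glossaries are project-specific.
SLANG = {"u": "you", "thx": "thanks", "pls": "please"}
BRAND_TERMS = ["Tapcon", "Sheetrock"]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def preprocess(text):
    """Normalize shortcuts, then swap protected spans for placeholders."""
    words = [SLANG.get(word.lower(), word) for word in text.split()]
    text = " ".join(words)
    protected = {}

    def protect(match):
        key = f"__P{len(protected)}__"
        protected[key] = match.group(0)
        return key

    text = EMAIL.sub(protect, text)  # sequester sensitive data
    for term in BRAND_TERMS:         # identify branding and terminology
        text = re.sub(rf"\b{re.escape(term)}\b", protect, text)
    return text, protected


def postprocess(translated, protected):
    """Put the protected spans back into the translated text."""
    for key, original in protected.items():
        translated = translated.replace(key, original)
    return translated


def translate_with_protection(text, translate):
    """Run any translate(text) -> text callable between the two steps."""
    masked, protected = preprocess(text)
    return postprocess(translate(masked), protected)


if __name__ == "__main__":
    # str.upper stands in for an MT engine so the example runs offline.
    print(translate_with_protection(
        "pls send Tapcon screws to jo@example.com thx", str.upper))
```

Because the brand terms and the e-mail address never reach the engine, they come back unchanged regardless of how the rest of the sentence is translated.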


Real Time Multilingual Chat


Disrupting Customer Support
Not about translation quality; it’s about call deflection, support costs, and customer satisfaction

For Pre-Sales Assistance:

• 11% increase in online conversions*

• 16% productivity increase for call center agents

* Where multilingual chat was previously unavailable

Blended support cost                                         $150
Cost of a self-served translated page view                   $0.15
Deflection rate                                              0.5%
Number of translated page views needed for one deflection    200
Total cost to get one deflection (200 x $0.15)               $30.00
Savings per deflection ($150 - $30)                          $120.00
Net value per translated page view ($120 / 200)              $0.60
Breakeven for medium customer (translated page views)        33,000
Breakeven                                                    1-2 months
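The deflection arithmetic above can be reproduced with a short calculation. The function and key names are illustrative; the inputs are the three figures from the slide ($150 blended support cost, $0.15 per translated page view, 0.5% deflection rate). The breakeven volume of 33,000 page views is not derived here because the slide does not give the fixed costs behind it.

```python
def deflection_economics(support_cost, page_view_cost, deflection_rate):
    """Per-deflection and per-page-view economics of self-served translated content."""
    views_per_deflection = 1 / deflection_rate
    cost_per_deflection = views_per_deflection * page_view_cost
    savings_per_deflection = support_cost - cost_per_deflection
    net_value_per_view = savings_per_deflection / views_per_deflection
    return {
        "views_per_deflection": views_per_deflection,
        "cost_per_deflection": cost_per_deflection,
        "savings_per_deflection": savings_per_deflection,
        "net_value_per_view": net_value_per_view,
    }


if __name__ == "__main__":
    result = deflection_economics(
        support_cost=150.0, page_view_cost=0.15, deflection_rate=0.005)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

With the slide's inputs this gives 200 page views per deflection, $30 cost per deflection, $120 saved per deflection, and $0.60 net value per translated page view.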

For Customer Support:

• 15% increase in call deflection

• 21% increase in CSAT among non-English speakers


Machine Translation as TRUE Disruptive Innovation
Microsoft Machine Translation Knowledge + Skype

It’s not about how close it is to human quality - it’s about the quality of the humans being close
