iMT Language Solutions

SDL Proprietary and Confidential Machine Translation: Latest Innovations and their Impact on Commercial Translation Claudiu Stiube, MT Customer Solutions Manager SDL Language Customer Success Summit 2015

Upload: sdl

Post on 22-Jan-2018


Page 1: iMT Language Solutions

SDL Proprietary and Confidential

Machine Translation: Latest Innovations and their Impact on Commercial Translation

Claudiu Stiube, MT Customer Solutions Manager

SDL Language Customer Success Summit 2015

Page 2: iMT Language Solutions

Evolution of MT

Page 3: iMT Language Solutions

Brief history of Machine Translation

Timeline dates: 1950s, 2002, 2010, 2011, 2014, 2015

Timeline events:
o War-time cryptography requirements, with subsequent experiments & investment in automated translation
o SDL acquires RBMT engine… establishes MT group dedicated to improving quality for enterprise applications
o SDL acquires Language Weaver / BeGlobal Statistical Machine Translation (SMT)
o First SDL Post-Editing projects using SMT go into production
o Post-Editing booms: 4-fold increase
o SDL launches PE Certification Program
o SDL launches XMT next-generation MT platform

Page 4: iMT Language Solutions

Overview: The SDL MT Team

Who we are
First to commercialize Statistical Machine Translation
o 50+ Professionals
o Over 10 Nationalities
o Across 5 Time Zones
o 8 Locations
Widespread team of language lovers:
o Ex-translators
o Computational Linguists
o Project Managers
o Data Specialists
o Post-Editors
o Architects
…all gathered from the four corners of SDL!

What we do
Drive MT Adoption: Educate, promote and support MT usage in existing SDL accounts & new opportunities
Custom Engine Builds:
o Design
o Create
o Test
o Implement
o Monitor
…custom Statistical Machine Translation engines
Linguistic Projects: Semantic annotation projects for US Government bodies & academic institutes

How we do it
Two Research Labs:
o Los Angeles, CA
o Cambridge, UK
o 100s of Scientific Publications
o Over 50 Patents Approved or Filed
We’re Evangelists… about Machine Translation, using automation to accelerate productivity

Page 5: iMT Language Solutions

Common MT Use-Cases

Page 6: iMT Language Solutions

o Communication Channels
o Consumer Preferences
o Increased Global Competition
o Export Market Growth

Page 7: iMT Language Solutions

Right translation method, right price, right time

[Chart: content types placed by Quality and Volume, on a range from Human Translation through Post-Edit to Machine Translation]

Content types shown: Blogs, User Forums, Reviews, Chat, Email, Support, FAQ, Websites, Wikis, Knowledge Base, Alerts/Notifications, Help, User Guides, Documentation, Newsletters, Advertising Content, Legal

Page 8: iMT Language Solutions

Translator productivity

Description:
○ Direct access to machine translation from SDL Trados Studio

Benefits:
○ Improve the efficiency of translators by providing results of machine translation to them for segments that do not match entries in translation memory
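
The idea of offering machine translation only for segments with no translation memory match can be sketched as below. This is a minimal illustration, not the SDL Trados Studio implementation: the function names, the 0.75 fuzzy-match threshold, the use of Python's difflib for similarity, and the placeholder `fake_mt` engine are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, tm):
    """Return (score, target) for the most similar source segment in the TM."""
    best_score, best_target = 0.0, None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def suggest(segment, tm, machine_translate, threshold=0.75):
    """Offer the TM match when it is close enough, otherwise fall back to MT.

    The threshold is an assumed value; real tools let users configure it.
    """
    score, target = best_tm_match(segment, tm)
    if target is not None and score >= threshold:
        return {"origin": "TM", "score": round(score, 2), "target": target}
    return {"origin": "MT", "score": None, "target": machine_translate(segment)}


def fake_mt(segment):
    """Hypothetical stand-in for a call to an MT engine."""
    return f"[MT] {segment}"


# Hypothetical one-entry translation memory (English to French).
tm = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}

tm_hit = suggest("Click the Save button.", tm, fake_mt)
mt_fallback = suggest("Restart the printer.", tm, fake_mt)
```

In this sketch the translator always receives a suggestion: a stored human translation when one is similar enough, and raw MT output to post-edit otherwise.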

Page 9: iMT Language Solutions

Live chat translation

Description:
○ Real-time translation of web-based chat conversations

Benefits:
o Reduces the cost of staffing support/sales operations, as they do not need multi-lingual agents
o Customer acquisition rates and satisfaction are much higher if you engage the customer in chat

Page 10: iMT Language Solutions

Community forum translation

Description:
○ Translation of user-generated content in web-based community forums

Benefits:
o Enable interactions between customers who speak different languages
o Leverage community expertise across languages instead of only within the language of community experts

Page 11: iMT Language Solutions

Knowledgebase content translation

Description:
○ Translation of knowledge base content for local language customers of technical solutions

Benefits:
o Reduces customer support costs and activity level by allowing remote language customers to directly access solutions
o Increases customer satisfaction by providing solutions in their native language

Page 12: iMT Language Solutions

Web content translation

Description:
○ Integrate with web content management system to translate web site
○ Embedding MT into the web site to support translation “on demand”

Benefits:
○ Ability to translate large volumes of web content that would not otherwise be translated because of cost
○ Real-time translation can facilitate support for multi-lingual content with minimal changes to the development and storage of the source content

Page 13: iMT Language Solutions

Case study: MT for online customer reviews

Requirements:
o Share customer reviews with international audiences
o Automate the translation of customer reviews into 13 languages

Results:
o Reduced bounce rate from 70% to 25%
o Increased user dwell times and page views
o Economically translate 1 billion words/month

Page 14: iMT Language Solutions

Case study: MT for instant MS Office translation

[a large global retail client]

Requirements:
o Improve communication among geographically scattered company employees
o Fast, low-cost translation of MS Outlook emails & MS Office business documents

Results:
o BeGlobal Machine Translation integrated via API with MS Office apps
o Any employee can instantly translate emails or attachments with a simple double-click

Page 15: iMT Language Solutions

Case study: MT for speedier translation, reduced cost

Requirements:
o Economically and quickly translate content for 4,000 hotels, 4 million words per language

Results:
o Trained MT engine integrated with CMS, Web CMS, Translation Memory, Terminology Management
o Human post-edit review

Page 16: iMT Language Solutions

Engine training: Making MT smarter

o Customized engines
o Domain verticals
o Baselines

Page 17: iMT Language Solutions

Baselines

o Core generic MT engines for each language pair
o Data mined from reliable sources available in the public domain, covering various subjects
o Contain hundreds of millions (100Ms+) of words of bilingual data
o Work well for general & varied content
o Can be used as backup for verticals & customized engines

Page 18: iMT Language Solutions

Domain verticals

o Trained statistical engines exclusive for a domain
o Data selected from sources within a domain or industry
o MT output more likely to follow technical terminology
o Solution used when client-specific data is not available or not enough for a customization

Page 19: iMT Language Solutions

Customized engines

o Optimize the MT output for specific client projects
o Training based on client-specific bilingual data
o More data usually has a positive effect on the MT output
o Quality & consistency of data is as important as quantity
o Adherence to client-specific terminology & style
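
The three engine tiers suggest a simple fallback order: use a customized engine when one exists for the client, otherwise a domain vertical, otherwise the baseline for the language pair. The sketch below illustrates that selection logic only. The `Engine` class, the `select_engine` function, the tier names as strings, and the sample registry are hypothetical and are not part of any SDL API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Engine:
    """Hypothetical description of a deployed MT engine."""
    tier: str                      # "customized", "vertical" or "baseline"
    language_pair: str             # e.g. "en-fr"
    domain: Optional[str] = None   # set for vertical engines
    client: Optional[str] = None   # set for customized engines


def select_engine(engines, language_pair, client=None, domain=None):
    """Pick the most specific engine available, falling back to the baseline."""
    candidates = [e for e in engines if e.language_pair == language_pair]
    for engine in candidates:
        if engine.tier == "customized" and client is not None and engine.client == client:
            return engine
    for engine in candidates:
        if engine.tier == "vertical" and domain is not None and engine.domain == domain:
            return engine
    for engine in candidates:
        if engine.tier == "baseline":
            return engine
    raise LookupError(f"no engine available for {language_pair}")


# Hypothetical registry used only to exercise the fallback order.
ENGINES = [
    Engine("baseline", "en-fr"),
    Engine("vertical", "en-fr", domain="travel"),
    Engine("customized", "en-fr", client="acme"),
]

for_client = select_engine(ENGINES, "en-fr", client="acme", domain="travel")
for_domain = select_engine(ENGINES, "en-fr", client="other", domain="travel")
for_anyone = select_engine(ENGINES, "en-fr", client="other", domain="legal")
```

A client without enough bilingual data for a customization still gets the vertical engine for its industry, and content outside any vertical falls through to the baseline.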

Page 20: iMT Language Solutions

How SDL trains an MT engine

[Workflow diagram]
Phases: Content Evaluation, MT Customization, Production, QA
Engine build: Training Data Prep & Engine Customization; Prep of Testing Material; Evaluate MT Output; Refine Training or Deploy for Production; Integrate MT on Translation Process
Translation flow: Source Content; Apply Translation Memory; Machine Translation; Post-Edit; Quality Assessment & Translation Delivery; Update Translation Memory
Components: SDL MT Server, Translation Memory
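
The "Evaluate MT Output" step followed by "Refine Training or Deploy for Production" can be illustrated with a simple automatic score. The sketch below is an assumption-laden example, not SDL's evaluation method: it uses a word-level edit distance normalized by reference length (a simplified stand-in for metrics such as TER), and the 0.30 decision threshold and the function names are invented for illustration.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions."""
    hyp, ref = hypothesis.split(), reference.split()
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete hyp_word
                               current[j - 1] + 1,       # insert ref_word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def edit_rate(mt_outputs, references):
    """Total word edits divided by total reference words over a test set."""
    edits = sum(word_edit_distance(h, r) for h, r in zip(mt_outputs, references))
    reference_words = sum(len(r.split()) for r in references)
    return edits / reference_words if reference_words else 0.0


def next_step(rate, max_rate=0.30):
    """Assumed decision rule: deploy only when the edit rate is low enough."""
    return "deploy for production" if rate <= max_rate else "refine training"


# Tiny hypothetical test set: one MT output and its human reference.
rate = edit_rate(["the cat sat"], ["the dog sat down"])
decision = next_step(rate)
```

In practice such a score would be computed on the prepared testing material and combined with human evaluation before deciding whether the engine is ready.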

Page 21: iMT Language Solutions

Continuous improvement

SDL MT Group developers are constantly researching ways to improve Generic, Vertical, and Customized MT Engines

SDL Research Scientists are continuously improving the Statistical Machine Translation algorithms (e.g. Language Models, Translation Models, Reordering Models, Syntax, Transliteration, Rule-Based Components, etc.)

SDL Data Engineers are continuously mining large amounts of good data used by the statistical algorithms

Page 22: iMT Language Solutions

Introducing SDL XMT…

A NEW, modular & flexible technology that will power the “next generation” of SDL MT

[Timeline graphic]
Dates shown: 2002, 2003, 2008, 2015
Technologies shown: Word-based Machine Translation, Phrase-based Machine Translation, Syntax-based Machine Translation, XMT

Page 23: iMT Language Solutions

Legacy MT

[Diagram: Foreign Language, Legacy MT (Monolithic Phrase-based), Your Language]

Page 24: iMT Language Solutions

SDL XMT: Next generation technology, higher quality

[Diagram: Foreign Language, XMT, Your Language]

Modular components:
o Neural Networks
o Compound Splitting
o Phrase-Based
o Finite State Automata
o String to Tree
o Rule-Based
o Tree to String
o Pre-Ordering
o Transliteration
o Hidden Markov Model
o Hyper Graphs
o ……

Benefits:
o Modular & Flexible
o “State-of-the-Art” Machine Learning
o Better Translation Quality
o Rapid Research Transition

Page 25: iMT Language Solutions

Language Learning in XMT

Continuous improvement by learning from Post-Editing.

Machine Translation
○ The machine learns how to translate from source to target during the training process
○ The machine does not learn during the translation process

Machine Translation + Language Learning (XMT)
○ The machine learns how to translate from source to target during the training process
○ The machine learns & improves seamlessly, continuously, and in real-time from user feedback during the translation process
○ See it in action: SDL XMT
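
The contrast between an engine that learns only in training and one that also learns from post-editing during translation can be shown with a toy example. The sketch below is not how SDL XMT adapts its models. It is a deliberately simple assumption: a wrapper that remembers post-edited segments and reuses them the next time the same source appears. The class name, method names, and the placeholder base engine are hypothetical.

```python
class AdaptiveTranslator:
    """Toy wrapper that reuses post-edits for previously seen source segments."""

    def __init__(self, base_translate):
        self._base_translate = base_translate  # fixed engine, trained offline
        self._corrections = {}                 # feedback gathered while translating

    @staticmethod
    def _normalize(text):
        # Ignore case and extra whitespace when comparing source segments.
        return " ".join(text.lower().split())

    def translate(self, source):
        key = self._normalize(source)
        if key in self._corrections:
            return self._corrections[key]
        return self._base_translate(source)

    def learn(self, source, post_edited):
        """Record a post-editor's correction so it applies immediately."""
        self._corrections[self._normalize(source)] = post_edited


def base_engine(source):
    """Hypothetical static engine that never changes after training."""
    return f"[MT] {source}"


translator = AdaptiveTranslator(base_engine)
before = translator.translate("Open the settings menu.")
translator.learn("Open the settings menu.", "Ouvrez le menu Paramètres.")
after = translator.translate("open the  settings menu.")
unseen = translator.translate("Close the window.")
```

A real adaptive system would generalize from feedback to new sentences rather than only replaying exact matches; the sketch only shows where feedback enters the translation loop.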

Page 26: iMT Language Solutions

How to Deploy MT Post-Edit

Page 27: iMT Language Solutions

SDL iMT: Key steps in the process

○ Evaluate content and translation assets
○ Train MT engines for your content or use existing solution
○ Configure the trained MT engines with SDL’s translation environment (TMS, WS, Studio)
○ Post-edit the MT output to full publishable quality
○ SDL infrastructure to support these steps

[Diagram: Evaluate, Train MT, Configure, Post-Edit, all supported by SDL Infrastructure]

Page 28: iMT Language Solutions

Quality in MT

Building blocks are there, as a lot of content is pulled from the engines
Allows the linguist to focus on refining the output
Custom engines pull in client terminology & style
Fewer resources equal greater consistency
Trained, certified linguists well-versed in handling MT output

Page 29: iMT Language Solutions

Post-Editing quality requirements

When post-editing to publishable quality, the following basic principles still apply:
o The same references must be used as for conventional translation (project-specific guidelines, TMs, glossaries, termbases, etc.)
o Grammar, spelling and punctuation must be correct
o Appropriate style & correct terminology must be used consistently
o The translation must read well and be suitable for its intended purpose

Customer User Guide

Page 30: iMT Language Solutions

Features to watch out for in MT output…

o Incorrect Formatting
o Additional or Missing words
o Words Not Localized or Wrong Flavor
o Gender, Number, Agreement or Verb Inflection Issues
o Articles & Prepositions
o Syntax & Word Order Issues
o Wrong Punctuation
o Inconsistent or Non-compliant Terminology
o Mistranslations
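
A few of the issue types above lend themselves to crude automated flags that a post-editor can review. The sketch below is a hypothetical helper, not an SDL tool: it covers only non-compliant terminology, mismatched final punctuation, and a large word-count difference as a hint of additional or missing words. The function name, the glossary format, and the 2.0 length-ratio limit are assumptions, and the punctuation check would need adjusting for languages with different punctuation conventions.

```python
import string

PUNCTUATION = set(string.punctuation)


def check_segment(source, target, glossary, max_length_ratio=2.0):
    """Return a list of human-readable warnings for one source/target pair."""
    issues = []

    # Terminology: every glossary term found in the source needs its approved target.
    for source_term, target_term in glossary.items():
        if source_term.lower() in source.lower() and target_term.lower() not in target.lower():
            issues.append(f"terminology: expected '{target_term}' for '{source_term}'")

    # Punctuation: compare the final character of both segments.
    source_end, target_end = source.strip()[-1:], target.strip()[-1:]
    if (source_end in PUNCTUATION or target_end in PUNCTUATION) and source_end != target_end:
        issues.append("punctuation: final punctuation differs from source")

    # Length: a very different word count may mean additional or missing words.
    source_words, target_words = len(source.split()), len(target.split())
    if source_words and target_words:
        ratio = max(source_words, target_words) / min(source_words, target_words)
        if ratio > max_length_ratio:
            issues.append("length: possible additional or missing words")
    elif source_words != target_words:
        issues.append("length: one segment is empty")

    return issues


# Hypothetical glossary and segments (English to French).
glossary = {"power button": "bouton d'alimentation"}
flagged = check_segment("Press the power button.", "Appuyez sur le bouton marche", glossary)
clean = check_segment("Press the power button.", "Appuyez sur le bouton d'alimentation.", glossary)
```

Checks like these only point to candidates for review; mistranslations, agreement errors and word-order problems still need a trained post-editor.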

Page 31: iMT Language Solutions

Post-Editing Machine Translation certification

○ The demand for MT solutions is growing quickly & Post-Editing is becoming a mainstream skill for translators
○ In response, SDL have created Post-Editing Certification, released in June 2014
○ 85% of in-house staff completed the Certification in 2014
○ 2,500+ freelancers signed up for the course
○ The Certification covers the theory behind Machine Translation as well as practical approaches to Post-Editing
○ Our Certification is for anyone impacted by Post-Editing: certified translators can offer an extended skill set

Page 32: iMT Language Solutions
Page 33: iMT Language Solutions

Copyright © 2008-2015 SDL plc. All rights reserved. All company names, brand names, trademarks, service marks, images and logos are the property of their respective owners.

This presentation and its content are SDL confidential unless otherwise specified, and may not be copied, used or distributed except as authorised by SDL.

Global Customer Experience Management