Welocalize Machine Translation Post Editing Basics Course I

Foundations Machine Translation Post-Editing Copyright: Welocalize, Inc. 2014. All Rights Reserved

Upload: welocalize

Post on 13-Jan-2015

Category:

Business


DESCRIPTION

A quick overview of machine translation (MT) and post editing (PE) for localization service buyers and translators. Welocalize Language Tools team presents an overview of Concepts, Why MT?, Examples of Machine Translation and Post-Editing. Discussion of Post Editing and Light Post Editing. Additional topics include benefits of MT, MT patterns, output, raw MT. Automation, Language Services Provider. Contact Welocalize Language Tools for additional information. www.welocalize.com

TRANSCRIPT

Page 1: Welocalize Machine Translation Post Editing Basics Course I

Foundations Machine Translation Post-Editing

Copyright: Welocalize, Inc. 2014. All Rights Reserved

Page 2: Welocalize Machine Translation Post Editing Basics Course I

machine.translation

Page 3: Welocalize Machine Translation Post Editing Basics Course I

machine.translation

Different use cases for MT (audience? perishability? visibility?)

• Contracts
• Patents
• Annual Reports
• Light Marketing
• Software Documentation
• Software User Interface
• SEO (Search Engine Optimization)
• e-Learning Content
• User Guides
• Internal Corporate Communications
• Wikis
• Knowledge Bases
• Proposals / Draft Applications
• User Generated Content

Page 4: Welocalize Machine Translation Post Editing Basics Course I

why.mt

For clients
– Increase throughput and consistency
– Reduce cost of translation
– Content explosion due to the Internet
– Most Internet content is in English (user community is global)
– Desire to translate also "lower quality" content, such as User Generated Content (UGC), at a profitable price
– Quality of MT has improved (new technologies, lots of research)

For the translator
– Increase throughput and consistency
– MT is likely to become commonplace, like TMs before
– More and more clients and LSPs use MT
– Be an early adopter
– MT and new forms of post-editing requirements are evolving fast

Page 5: Welocalize Machine Translation Post Editing Basics Course I

basic.concepts

MT in a nutshell

[…] Machine Translation provides a set of tools by which digital text is automatically translated from one language (e.g. English) into another language (e.g. Spanish).

Source: Systran user guide

There are 3 main types of MT systems with different underlying logics:

– Rule-based (RBMT)
– Statistical (SMT)
– Hybrid (SMT + RBMT)

Most systems used today are either statistical or hybrid. All system types can be customized for specific clients, incorporating client Translation Memories, basic preferences and/or terminology lists.

Page 6: Welocalize Machine Translation Post Editing Basics Course I

basic.concepts

Customizable MT systems (licensed or open source) can draw on three kinds of data:

– Client-specific data: TMs, glossaries
– Domain-specific data: chemistry or mechanical or IT or…
– General language data: anything to "teach the system the basics on the language pair", so all of: tourism, IT, automotive, literature,…

e.g. Google Translate and Bing would be baseline only (general language data).

Page 7: Welocalize Machine Translation Post Editing Basics Course I

basic.concepts

Understanding statistical MT

For the translator, it is important to understand that SMT systems are based on algorithms calculating probabilities within a given set of data (bilingual and monolingual).

In other words, the system learns from legacy human translations (Translation Memories in our case) and calculates probabilities of most likely translations from these, without applying linguistic rules as such.

Page 8: Welocalize Machine Translation Post Editing Basics Course I

basic.concepts

The logic behind statistical machine translation (SMT)

Imagine the TM(s) as an aligned data corpus.

Example: Terminology

The term "click" appears > 16 000 times in TM A.

– In 90% of cases it is translated as "fare clic"
– In 10% of cases as: "selezionare", "scegliere", …

The probability is high that the machine translation will be "fare clic".

…BUT, maybe…

The string "click OK" appears 500 times in TM A.

– In 50% of cases it is translated as "fare clic su OK"
– In 50% of cases as: "selezionare OK"

The probability is 50% that the machine translation will be "selezionare OK".
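The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration of relative-frequency counting, not how any particular MT engine is implemented: real SMT systems combine such counts with other models. The function name and the exact counts are assumptions. The slide gives only "> 16 000" occurrences and a 90% / 10% split, so the breakdown of the remaining 10% between "selezionare" and "scegliere" is invented for the sketch.

```python
def translation_probabilities(counts):
    """Turn raw translation counts into relative frequencies."""
    total = sum(counts.values())
    return {target: n / total for target, n in counts.items()}

# Hypothetical counts for "click" in TM A (16 000 occurrences assumed).
click_counts = {
    "fare clic": 14400,    # 90% of occurrences, as on the slide
    "selezionare": 1000,   # assumed share of the remaining 10%
    "scegliere": 600,      # assumed share of the remaining 10%
}

# Counts for the longer string "click OK" (500 occurrences, split 50/50).
click_ok_counts = {
    "fare clic su OK": 250,
    "selezionare OK": 250,
}

p_click = translation_probabilities(click_counts)
p_click_ok = translation_probabilities(click_ok_counts)

print(p_click["fare clic"])          # share of "fare clic" for "click"
print(p_click_ok["selezionare OK"])  # share of "selezionare OK" for "click OK"
```

The point for the post-editor is that the longer string has its own statistics: a term that is stable on its own may still come out differently inside a frequent phrase.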

Page 9: Welocalize Machine Translation Post Editing Basics Course I

typical.examples

The quality of raw MT output can vary. A distinction is typically made as follows:

– good > perfect to overall understandable and fairly fluent
– medium > contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency
– poor > poor with regard to understandability and fluency

We carry out content evaluations to prevent content with overall poor MT output from going into production.

Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing.

Page 10: Welocalize Machine Translation Post Editing Basics Course I

typical.examples

The quality of raw MT output can vary. Example:

Page 11: Welocalize Machine Translation Post Editing Basics Course I

typical.examples

Know the patterns of MT output

Even "good" MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:

– issues around capitalization

– punctuation (source punctuation is copied)

– spacing

– omissions/additions of text (usually different in nature to those in fuzzy matches)

– unknown/new words may be translated literally or be left in English

– word order: can be mirroring the source

– compound formation

– word form agreement

→ being aware of typical issues helps good post-editing

Page 12: Welocalize Machine Translation Post Editing Basics Course I

typical.examples

Page 13: Welocalize Machine Translation Post Editing Basics Course I

typical.examples

Page 14: Welocalize Machine Translation Post Editing Basics Course I

post.editing

What is Post-Editing?

Page 15: Welocalize Machine Translation Post Editing Basics Course I

post.editing

In other words…

Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)

Read the MT output & the source > decide quickly what can be used

Use as many "bits/sections" of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration

Look up key terms in your reference material as usual, but also learn to trust the customized output

Automate with customized QA checks

Adjust your expectations. Rethink your approach. Report recurring errors.
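As a rough illustration of what an automated QA check might look like, the sketch below flags a few of the typical MT issues named earlier (spacing, punctuation, capitalization, text left in the source language). The function name, the issue labels and the rules themselves are assumptions made for this example. Real checks are normally configured in a QA tool and depend on the language pair; the simple spacing rule here, for instance, would need adjusting for French punctuation conventions.

```python
import re

def qa_check(source, target):
    """Return a list of issue labels for one source/target segment pair."""
    issues = []
    # Spacing: two or more consecutive spaces in the target.
    if re.search(r" {2,}", target):
        issues.append("double space")
    # Punctuation: a space directly before a comma or full stop.
    if re.search(r" [,.]", target):
        issues.append("space before punctuation")
    # Capitalization: source starts upper case, target starts lower case.
    if source[:1].isupper() and target[:1].islower():
        issues.append("capitalization differs from source")
    # Untranslated: target is identical to the source segment.
    if source.strip() == target.strip():
        issues.append("segment left untranslated")
    return issues

for src, tgt in [
    ("Click OK.", "fare clic  su OK ."),
    ("Click OK.", "Click OK."),
    ("Click OK.", "Fare clic su OK."),
]:
    print(tgt, "->", qa_check(src, tgt))
```

Checks like these only catch mechanical patterns; they free up attention for the issues that need a linguist, such as word order and agreement.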

Page 16: Welocalize Machine Translation Post Editing Basics Course I

full.post.editing

full post-editing: "publishable quality"

► Client Glossary, TM, Style Guide and others apply

Examples:
– infinitive / imperative preferences?
– active / passive preferences?
– formal / informal preferences?
– different styles for headers, lists, tables?
– special formatting of UI options? (bilingual, English)
– are measurements to be converted?
– terminology

If the client requests "full post-editing", this means publishable quality.

The post-editor is responsible for ensuring that the client requirements with regard to final quality expectations are met.

Page 17: Welocalize Machine Translation Post Editing Basics Course I

light.post.editing

light post-editing / "understandable quality"

Full Post-Editing | Light Post-Editing
Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
Terminology is accurate & consistent | Terminology is understandable and actionable
Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
Style is consistent (headers, list items,…) | Style variations are acceptable
Punctuation is correct | Variations/errors in punctuation are acceptable
Style & tone are appropriate for content | Style & tone are not offensive
Specific requirements: 33 cm (13''); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13'' (33 cm), English quotation marks,…
… | …
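One way to make the comparison operational is to record, per post-editing level, which kinds of issue may be left untouched. The sketch below is a hypothetical encoding of the rows above. The issue labels, the `ACCEPTABLE` mapping and the `must_fix` helper are all invented for illustration, and a real project would take these rules from the client's requirements.

```python
# Issue types that may be left unfixed at each post-editing level
# (hypothetical labels derived from the comparison above).
ACCEPTABLE = {
    "full": set(),  # publishable quality: nothing is left unfixed
    "light": {
        "minor grammar",
        "spelling variation",
        "style variation",
        "punctuation",
    },
}

def must_fix(issue, level):
    """Return True if this issue type has to be corrected at this level."""
    return issue not in ACCEPTABLE[level]

print(must_fix("punctuation", "light"))     # punctuation under light PE
print(must_fix("punctuation", "full"))      # punctuation under full PE
print(must_fix("offensive tone", "light"))  # tone must never be offensive
```

Anything not explicitly listed as acceptable, such as terminology that cannot be understood or an offensive tone, is treated as a required fix at both levels.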

Page 18: Welocalize Machine Translation Post Editing Basics Course I

post.editing

light post-editing versus full post-editing

*Copyright CSA

Image © Common Sense Advisory, “Post-Edited machine translation defined”, April 30, 2013

Page 19: Welocalize Machine Translation Post Editing Basics Course I

post.editing

Notes on productivity

Just as with human translation, throughput can vary and depends on:

– language pair
– content type & complexity
– experience
– domain knowledge
– quality requirements
– use of automatic QA tools
– quality of TM and reference material

With MT, additional factors are:

– quality of the MT
– experience with post-editing

Compared to average daily throughputs for human translation, average daily throughputs for full post-editing can be up to 3x higher.
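To see what "up to 3x" could mean for scheduling, here is a small worked example. The job size (30 000 words) and the baseline of 2 500 words per day are assumptions chosen only to make the arithmetic concrete; the slides give no baseline figure. The factor of 3 is the upper bound stated above, not a typical value.

```python
import math

def days_needed(word_count, words_per_day, speedup=1.0):
    """Working days for a job, rounded up to whole days."""
    return math.ceil(word_count / (words_per_day * speedup))

JOB_WORDS = 30000       # hypothetical job size
BASELINE_PER_DAY = 2500  # assumed human-translation throughput

print(days_needed(JOB_WORDS, BASELINE_PER_DAY))               # translation from scratch
print(days_needed(JOB_WORDS, BASELINE_PER_DAY, speedup=1.5))  # modest post-editing gain
print(days_needed(JOB_WORDS, BASELINE_PER_DAY, speedup=3.0))  # best case from the slide
```

Because the actual gain depends on MT quality and post-editing experience, it is safer to plan with a range of speed-up factors than with the best case alone.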

Page 20: Welocalize Machine Translation Post Editing Basics Course I

take.aways

– There are different use cases of MT associated with different levels of final (post-edited) quality
– When full PE is requested, this means publishable quality
– There are different MT systems; Welocalize works with a range of them
– MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met
– MT is not expected to be perfect; that's why we need post-editors!
– Post-editing replaces the translation stage in the workflow, but it is a different task, cognitively
– MT systems can improve through adding more data & through constructive feedback from post-editors

Page 21: Welocalize Machine Translation Post Editing Basics Course I

trademark.disclaimer: Product names, logos, brands and other trademarks referenced within this presentation are the property of their respective trademark holders. These trademark holders are not owned by or affiliated with Welocalize, Inc., our products, or our website. They do not sponsor or endorse our materials. Reference is for educational purposes only.

Page 22: Welocalize Machine Translation Post Editing Basics Course I

Questions?

Contact the Welocalize Language Tools team: [email protected], [email protected]

Welocalize
Frederick, Maryland - Headquarters

241 East 4th St. Suite 207
Frederick, Maryland 21701 USA

[t] +1.301.668.0330
[t] +1.800.370.9515 Toll Free

www.welocalize.com
