Post on 16-May-2020

Understand Localization Standards and Use Them Effectively

John Watkins, President, ENLASO jwatkins@enlaso.com

Agenda

• The Standards Universe
• Core Standards
• Using Standards

The Universe: Standards Evolution

Standards Definitions

• De facto Standards
  – Influence through prevalence
  – Standards may evolve from de facto standards through the cooperation of the industry and a relevant standards body
• Standards
  – Remove barriers for the purpose of performing functions that are within an industry
  – Are approved and maintained by neutral third parties
  – Have input from industry to avoid being locked into a proprietary solution
• Open Standards
  – Do the above, but are publicly available (with open access rights)
  – Natural coordination with open source software
  – Luckily the core localization standards are Open Standards

Standards Benefits

• Standards help tools to work together
  – Eases exchange of information among tools
  – Freedom to work with a wide variety of tools
  – Processes are developed independent of the tools
    • Customer files
    • The right linguists
• Consequently
  – Tools are not constrained
  – Workflow is easier
  – Projects can be faster, better, and cheaper

Standards Information

• Standards management spans various organizations
  – OSCAR/LISA -> Disbanded
    • Standards developed by OSCAR under LISA are now under the Creative Commons Attribution license (see GALA below)
    • European Telecommunications Standards Institute (ETSI) Localization Industry Standards (LIS) Industry Specification Group is the successor for the LISA/OSCAR portfolio (TMX, TBX, SRX…): http://goo.gl/y4JgF
  – GALA Open Standards Initiative
    • OSCAR standards: http://www.gala-global.org/standards/
    • Coordination: ETSI, OASIS, Unicode Consortium, ISO TC 37
    • Linport project (open format translation packages): http://www.linport.org/
    • QT Launchpad: flexible quality metrics for human and machine translation
    • Tools Corner
  – OASIS (XLIFF, DITA…): http://www.oasis-open.org/standards
  – W3C (ITS, MultilingualWeb-LT – ITS 2.0): http://www.w3.org/

Core Standards

• Four standards (three areas) that are open, stable, and work well:
  – Translation memories
    • TMX: Translation Memory eXchange¹
      Easy exchange of translation memories among tools
  – Segmentation
    • SRX: Segmentation Rules eXchange¹
      Provides a standard method to describe segmentation rules for TMs that are being exchanged among tools
  – Extracted data
    • ITS: Internationalization Tag Set²
      Supports the internationalization and localization of XML schemas and documents (XML, HTML5)
    • XLIFF: XML Localisation Interchange File Format³
      Stores localizable content and carries it from one step of the localization process to the next, while allowing interoperability among tools

¹ See GALA Open Standards: http://www.gala-global.org/lisa-oscar-standards
² See W3C: http://www.w3.org/TR/its/
³ See OASIS: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff

Using Standards

• Look at an example project
• See the standards involved
• Use standards to provide localized files

Using Standards – Open Source

• Open Standards fit with Open Source
• We work with the Okapi Framework Project¹
• You can use the Okapi Framework to (numbered by the project task each step belongs to):
  1. Manipulate and combine translation memories
  2. Extract text with appropriate filters
  2. Edit segmentation rules and apply them to content
  2. Leverage from TM
  2. Machine translate unmatched text
  2. Create the translation package for the linguists
  3. Rebuild translated files

¹ See Okapi Framework project site at: http://code.google.com/p/okapi
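
The "leverage from TM" and "machine translate unmatched text" steps can be pictured with a small sketch. This is not Okapi code: the in-memory `TM` dictionary, the `leverage` and `pretranslate` functions, the 0.75 threshold, and the `fake_mt` stand-in are all assumptions made for illustration, and the similarity score comes from Python's `difflib` rather than from any TM engine.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory TM: source segment -> translation.
TM = {
    "Save the file": "Enregistrer le fichier",
    "Open the file": "Ouvrir le fichier",
}

def leverage(segment, tm, threshold=0.75):
    """Return (translation, score) for the best TM match, or (None, 0.0)."""
    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_score >= threshold:
        return tm[best_source], best_score
    return None, 0.0

def pretranslate(segments, tm, machine_translate):
    """Fill each segment from the TM, falling back to MT when unmatched."""
    results = []
    for segment in segments:
        translation, score = leverage(segment, tm)
        origin = "tm"
        if translation is None:
            translation, origin = machine_translate(segment), "mt"
        results.append((segment, translation, origin, round(score, 2)))
    return results

def fake_mt(text):
    # Stand-in for a real MT connector; it only tags the text.
    return f"[MT] {text}"

results = pretranslate(["Save the file", "Save the files", "Cancel"], TM, fake_mt)
```

Each result records whether the candidate came from the TM or from MT, so a linguist can review the two kinds of pre-translation differently.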

Example Project

[Diagram: project inputs]
• Translation memory from Trados (Trados TM)
• Translation memory from Wordfast (Wordfast TM)
• Both TMs exchanged as TMX (TMX 1, TMX 2)
• Segmentation rules for the TMs (SRX Rules)
• New version of the documents to translate, from HTML5 and FrameMaker applications (HTML5 File, FrameMaker MIF File)

Three Tasks

1. Consolidate TMs
2. Prepare the translation package for linguists
3. Post-process the files for delivery

1) Translation Memories – TMX

• TMX (Translation Memory eXchange) is the standard way to store source text segments and their corresponding translations

• Supported by most CAT tools
• Customer provided two TMs:
  – Trados
  – Wordfast

1) Combine TMs

[Diagram: Wordfast TM, Trados TM → TMX 1, TMX 2 → Rainbow Toolbox → Pensieve TM. Four different tools sharing data through TMX.]
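
Because TMX is plain XML, consolidating two TMs can be sketched with a standard XML parser. The example below is only an illustration of the idea, not how Rainbow does it: the `merge_tmx`, `source_text`, and `sample_tmx` names and the sample data are hypothetical, and the "first TM wins on duplicate source text" policy is an assumption.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def source_text(tu, src_lang):
    """Return the source-language text of a <tu>, or None if absent."""
    for tuv in tu.findall("tuv"):
        # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
        lang = tuv.get(XML_LANG) or tuv.get("lang")
        seg = tuv.find("seg")
        if lang and seg is not None and lang.lower() == src_lang.lower():
            return "".join(seg.itertext()).strip()
    return None

def merge_tmx(primary_xml, secondary_xml, src_lang):
    """Append units from the secondary TMX whose source text is new."""
    primary = ET.fromstring(primary_xml)
    secondary = ET.fromstring(secondary_xml)
    body = primary.find("body")
    seen = {source_text(tu, src_lang) for tu in body.findall("tu")}
    for tu in secondary.find("body").findall("tu"):
        key = source_text(tu, src_lang)
        if key is not None and key not in seen:
            body.append(tu)
            seen.add(key)
    return primary

def sample_tmx(pairs):
    """Build a tiny English-French TMX document for the demonstration."""
    units = "".join(
        f'<tu><tuv xml:lang="en"><seg>{src}</seg></tuv>'
        f'<tuv xml:lang="fr"><seg>{tgt}</seg></tuv></tu>'
        for src, tgt in pairs
    )
    return (
        '<tmx version="1.4"><header creationtool="sketch" '
        'creationtoolversion="1" segtype="sentence" o-tmf="none" '
        'adminlang="en" srclang="en" datatype="plaintext"/>'
        f"<body>{units}</body></tmx>"
    )

TMX_1 = sample_tmx([("Hello", "Bonjour"), ("Save", "Enregistrer")])
TMX_2 = sample_tmx([("Save", "Sauvegarder"), ("Cancel", "Annuler")])
merged = merge_tmx(TMX_1, TMX_2, "en")
```

A real consolidation would also have to decide what to do with conflicting translations and with inline codes inside `<seg>`, which this sketch flattens when comparing source text.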

2) HTML5 Extraction – ITS

• For XML and HTML5 documents, ITS (Internationalization Tag Set) describes what needs to be extracted and how to extract it

• W3C MultilingualWeb-LT WG is finishing the work on ITS 2.0

• Let's use ITS rules to identify localizable text in the HTML5 document
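
To make the idea of ITS rules concrete, here is a minimal sketch that applies global `its:translateRule` elements to a small XML document. The sample rules and document, and the `translate_flags` function, are invented for illustration. ITS selectors are full XPath expressions, while Python's `ElementTree` supports only a subset, so this handles simple selectors only and is not a conforming ITS processor.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical rules: code is not translatable, except UI labels.
RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:translateRule selector="//code[@class='ui']" translate="yes"/>
</its:rules>"""

DOC = """<doc>
  <p>Press <code class="ui">Save</code> to run <code>save_all()</code>.</p>
  <p>Done.</p>
</doc>"""

def translate_flags(doc_root, rules_root):
    """Map every element to True (translate) or False (do not translate)."""
    explicit = {}
    for rule in rules_root.findall(f"{{{ITS_NS}}}translateRule"):
        # ElementTree only accepts relative paths, so "//x" becomes ".//x".
        selector = "." + rule.get("selector")
        for elem in doc_root.findall(selector):
            explicit[elem] = rule.get("translate") == "yes"  # later rules win
    flags = {}

    def walk(elem, inherited):
        # An element without its own rule inherits from its parent.
        flag = explicit.get(elem, inherited)
        flags[elem] = flag
        for child in elem:
            walk(child, flag)

    walk(doc_root, True)  # elements are translatable by default
    return flags

doc = ET.fromstring(DOC)
flags = translate_flags(doc, ET.fromstring(RULES))
protected = [e.text for e in doc.iter("code") if not flags[e]]
```

Here `protected` holds the text of the code elements a filter would leave out of the extracted content.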

2) ITS Rules

[Diagram: Content Extraction. HTML file → HTML5 filter (with ITS rules); MIF file → MIF filter. ITS rules specify what needs to be translated. Pipeline driven by Rainbow.]

2) Segmentation – SRX

• Translation is done at the segment level
  – SRX (Segmentation Rules eXchange) describes where to break or not break the content into segments
  – Having the rules for source segments allows better re-usability of existing TM, increasing exact matches
  – Maintain SRX rules with an SRX Editor
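
The way break and no-break rules interact can be sketched in a few lines. SRX expresses each rule as a pair of regular expressions (text before the candidate break, text after it) plus a break flag, and the first rule that matches at a position decides. The code below mimics that model with Python's `re` module; the `RULES` list and the `segment` function are illustrative assumptions, not an SRX parser, and real SRX files use ICU regular expression syntax.

```python
import re

# (break?, pattern before the position, pattern after the position)
RULES = [
    (False, r"\b[Vv]\.?[Ss]\.", r"\s"),  # no break after vs. / v.s. / VS. / V.S.
    (True, r"[.?!]", r"\s"),             # break after sentence-final punctuation
]

def segment(text, rules=RULES):
    """Split text into segments; the first matching rule at a position wins."""
    compiled = [
        (brk, re.compile(f"(?:{before})$"), re.compile(after))
        for brk, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for brk, before, after in compiled:
            if before.search(text[:pos]) and after.match(text, pos):
                if brk:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # a matching no-break rule blocks later break rules
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

parts = segment("Compare TMX vs. SRX here. Then segment.")
```

If the exception rule were missing, the text would split after "vs." and the resulting segments would no longer match TM entries created with the exception in place, which is why the rules travel with the TM.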

[Figure: example SRX rule. Don’t break segment after VS. V.S. vs. or v.s.]

[Diagram: Extraction with segmentation. HTML file → HTML5 filter (with ITS rules); MIF file → MIF filter; then segmentation with the SRX rules. Pipeline driven by Rainbow.]

SRX Rules are key to sharing TMs

2) Translation Kit – XLIFF, TMX

• To flow through the translation process, the extracted content needs to be stored in a common format many tools understand
  – XLIFF (XML Localisation Interchange File Format) is a standard way to represent extracted content
  – TMX files with all the translation candidates found in the TM or from MT
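
As a rough picture of what "extracted content in XLIFF" looks like, the sketch below writes a minimal XLIFF 1.2 document from a list of segments. The `build_xliff` function, the file name, and the sample segments are hypothetical. Files produced by real filters also carry inline codes and the information needed to rebuild the original file, which this sketch omits.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"

def build_xliff(original, datatype, src_lang, tgt_lang, units):
    """Serialize (source, target-or-None) pairs as an XLIFF 1.2 string."""
    ET.register_namespace("", XLIFF_NS)
    xliff = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,
        "datatype": datatype,
        "source-language": src_lang,
        "target-language": tgt_lang,
    })
    body = ET.SubElement(file_el, q("body"))
    for number, (source, target) in enumerate(units, start=1):
        unit = ET.SubElement(body, q("trans-unit"), {"id": str(number)})
        ET.SubElement(unit, q("source")).text = source
        if target is not None:
            # Pre-translated segments get a <target>; others stay empty.
            ET.SubElement(unit, q("target")).text = target
    return ET.tostring(xliff, encoding="unicode")

kit = build_xliff(
    "manual.html", "html", "en", "fr",
    [("Save the file", "Enregistrer le fichier"), ("Cancel", None)],
)
```

Any tool that reads XLIFF 1.2 can open such a file, fill in the missing targets, and hand it back for the rebuild step.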

[Diagram: Translation Kit Creation, pipeline driven by Rainbow. HTML file → HTML5 filter (with ITS rules); MIF file → MIF filter; extraction; segmentation with the SRX rules; pre-translate from TM (Pensieve TM connector); pre-translate unmatched from MT (Microsoft MT connector). Translation kit: HTML XLIFF, MIF XLIFF, TMX, etc.]

[Screenshot: Open Source OmegaT TM workbench]

Tool independent kit

3) Post-Processing

[Diagram: Translation Kit Post-Processing, pipeline driven by Rainbow. Translation kit (HTML XLIFF, MIF XLIFF, TMX, etc.) → HTML5 filter and MIF filter → HTML file and MIF file.]

[Screenshots: translated FrameMaker MIF; translated HTML]

Summary

• We Know
  – More about our standards
  – We can (and do) use them today
• Next Steps
  – Consider requiring Open Standards compliance from the tools you use to ensure portability
• Get Involved in the Standards Community
  – GALA Standards Initiative
  – GALA Connect Groups

References

• GALA Standards Initiative http://www.gala-global.org/gala-standards-initiative

• TMX 1.4b – Translation Memory eXchange http://www.gala-global.org/oscarStandards/tmx/

• ITS 1.0 – Internationalization Tag Set http://www.w3.org/TR/its/

• SRX 2.0 – Segmentation Rules eXchange http://www.gala-global.org/oscarStandards/srx/

• XLIFF 1.2 – XML Localisation Interchange File Format http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html

• Okapi Framework (Open Source & cross-platform) http://code.google.com/p/okapi/

Questions?

John Watkins, President, ENLASO jwatkins@enlaso.com
