How to Localize XML Documents: A Workshop on the Do's and Don'ts of XML Localization

November 2014

Upload: andrzej-zydron-mbcs

Post on 15-Apr-2017


Page 2: Dos and donts

XML dominant!

• HTML/XHTML
• Web Services
• Adobe FrameMaker
• Microsoft Office
• Open Office
• ASP
• XAML
• Java Properties
• DITA
• Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
• OAXAL: Open Architecture for XML Authoring and Localization

Page 3: Dos and donts

Benefits of XML for L10N

• Separation of form and content
• Should make documents easier to translate
• There are some critical design decisions
• Mistakes can hinder translatability
• XML can bootstrap its own localization

Page 4: Dos and donts

The real significance of XML

• XML is not just another electronic format
• XML is an eXtensible syntax
• XML is a formal IT grammar
• XML is programmable
• XML can bootstrap its own localization

Page 5: Dos and donts

Benefits of XML for L10N

Why use XML for localization:
• One input format
• Elegant
• Uses the latest IT technology
• Separation of source and content
• One single data bus
• Open Standards based
• You can use XML to assist its own localization
• One extraction + TM + SMT engine

Page 6: Dos and donts

Benefits of XML

Any electronic format not in XML can be converted to XML:
• FrameMaker
• RTF
• Microsoft Office (pre-2007)
• QuarkXPress
• Windows resource files
• Java resources
• PO/POT
• YAML
• Etc.

And then back into the original format

Page 7: Dos and donts

Do: Use Standard XML Libraries

DO!
• Xerces/Xalan
• MSXML

DO NOT!
• Write your own XML parser
• Write your own serializer
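A minimal sketch of why a standard parser is the safer choice. It uses Python's standard-library xml.etree.ElementTree as a stand-in for the Xerces and MSXML libraries named on the slide; the sample document and the naive regular expression are illustrative assumptions, not material from the workshop.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a comment, a character reference and a CDATA section.
DOC = """<doc>
  <!-- <para>not real content</para> -->
  <para>Caf&#233; &amp; bar</para>
  <para><![CDATA[1 < 2]]></para>
</doc>"""


def texts_with_parser(xml_text):
    """Extract paragraph text with a standard XML parser."""
    root = ET.fromstring(xml_text)
    return ["".join(p.itertext()) for p in root.iter("para")]


def texts_with_regex(xml_text):
    """Naive hand-rolled extraction, shown only as a counter-example."""
    return re.findall(r"<para>(.*?)</para>", xml_text)


parsed = texts_with_parser(DOC)   # comment skipped, references resolved
naive = texts_with_regex(DOC)     # picks up the commented-out paragraph
print(parsed)
print(naive)
```

The parser ignores the commented-out paragraph and resolves the references and the CDATA section; the hand-rolled pattern does neither.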

Page 8: Dos and donts

Do: Use XLIFF

XML Localization Interchange File Format

OASIS Standard: https://www.oasis-open.org/committees/xliff

1. Extract text for translation
2. Use a standard library parser!
  • Java/C++: Apache Xerces
  • .NET: MSXML
3. XLIFF:doc, a simplified XLIFF
  • XLIFF 1.2 subset
  • http://code.google.com/p/interoperability-now/
4. Use a skeleton file
  • Use markers for inline elements
  • Use only <g> and <x> inline elements
  • Use <mrk> for terminology
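Steps 1 and 2 above can be sketched roughly as follows, using Python's standard-library parser. This is an illustration under stated assumptions, not the workshop's tooling: the sample document, the file name manual.xml, and the function names are hypothetical, and the skeleton file and the <g>, <x>, and <mrk> inline markers are not handled.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace

# Hypothetical source document.
DOC = """<doc>
  <para>Use a claw hammer to release the catch.</para>
  <para>Remove the CPU from its mount.</para>
</doc>"""


def extract_paragraphs(xml_text):
    """Step 1: pull paragraph text out with a standard parser."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("para")]


def build_xliff(segments, original, source_lang="en", target_lang="pl"):
    """Wrap extracted segments in a minimal XLIFF 1.2 document."""
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for index, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": str(index)})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return xliff


xliff = build_xliff(extract_paragraphs(DOC), original="manual.xml")
print(ET.tostring(xliff, encoding="unicode"))
```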

Page 9: Dos and donts

XML Document Design Issues

Important points to take into account

Page 10: Dos and donts

Word/Phrase Substitution Issues

What works in English or Chinese will not work in most other languages.

English and Chinese have an impoverished morphology:

• Please undo the bolt using a spanner.
• Proszę odkręcić śrubę kluczem.

Nominative form: klucz (the sentence requires the instrumental form, kluczem)

The real killers: gender and case

• New Ford Model: New Fiesta, New Mondeo, New Focus
• Nowa Ford Fiesta
• Nowe Ford Mondeo
• Nowy Ford Focus
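The gender problem can be sketched in a few lines of Python. The Polish forms are taken from the slide; the single reusable template is a hypothetical example of the substitution approach being warned against.

```python
# Hypothetical attempt to reuse one English-style pattern for Polish.
TEMPLATE_PL = "Nowy Ford {model}"

# Correct forms from the slide: the adjective agrees with each model's gender.
CORRECT_PL = {
    "Fiesta": "Nowa Ford Fiesta",
    "Mondeo": "Nowe Ford Mondeo",
    "Focus": "Nowy Ford Focus",
}


def mismatches(template, expected):
    """Return the model names for which the template gives the wrong phrase."""
    return [
        model
        for model, correct in expected.items()
        if template.format(model=model) != correct
    ]


wrong = mismatches(TEMPLATE_PL, CORRECT_PL)
print(wrong)  # models the single template gets wrong
```

Only whole phrases can be translated reliably here, which is why the following slides recommend keeping the substituted word inline in the sentence.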

Page 11: Dos and donts

Avoid translatable Entity References

<para>Use a &tool; to release the catch.</para>

Problems:
• Grammatical difficulties
• Parsing difficulties
• Translation memory problems

Solution:

<para>
  Use a <tool id="a1098">claw hammer</tool>
  to release the CPU retention catch.
</para>
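The parsing difficulty is easy to reproduce. In this sketch, which uses Python's standard-library parser and a hypothetical helper named extract_text, the entity version cannot be parsed at all without the DTD that declares &tool;, while the inline-element version yields the complete sentence.

```python
import xml.etree.ElementTree as ET

WITH_ENTITY = "<para>Use a &tool; to release the catch.</para>"
WITH_ELEMENT = (
    '<para>Use a <tool id="a1098">claw hammer</tool> '
    "to release the CPU retention catch.</para>"
)


def extract_text(xml_text):
    """Return the translatable text, or None if the fragment cannot be parsed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # An entity with no declaration in scope is a well-formedness error.
        return None
    return "".join(root.itertext())


print(extract_text(WITH_ENTITY))
print(extract_text(WITH_ELEMENT))
```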

Page 12: Dos and donts

Avoid Word Substitution Mechanisms

<p>Using a <keyword conref="tools.dita#tools/ClawHammer"/>,
remove the CPU from its mount.</p>

Problems:
• Grammatical difficulties
• Parsing difficulties
• Translation memory problems

Solution:

<para>
  Use a <tool id="a1098">claw hammer</tool>
  to release the CPU from its mount.
</para>

Page 13: Dos and donts

Incorrect use of Translatable Attributes

<para>
  Use a <tool id="a1098" name="claw hammer"/>
  to release the CPU retention catch.
</para>

Problems:
• Grammatical difficulties
• Text flow problems

Solution:

<para>
  Use a <tool id="a1098">claw hammer</tool>
  to release the CPU retention catch.
</para>
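A short sketch of the text flow problem, using Python's standard-library parser; the helper name visible_text is hypothetical. An extractor that collects element content sees a sentence with a hole in it when the tool name lives in an attribute.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_FORM = (
    '<para>Use a <tool id="a1098" name="claw hammer"/> '
    "to release the CPU retention catch.</para>"
)
ELEMENT_FORM = (
    '<para>Use a <tool id="a1098">claw hammer</tool> '
    "to release the CPU retention catch.</para>"
)


def visible_text(xml_text):
    """Collect element content only, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


print(visible_text(ATTRIBUTE_FORM))  # the tool name is missing from the sentence
print(visible_text(ELEMENT_FORM))    # the sentence is complete
```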

Page 14: Dos and donts

CDATA Sections

<TEMPLATE><![CDATA[<p>Please refer to the
  <em>index page</em> for further information</p>
]]></TEMPLATE>

Problems:
• Grammatical difficulties
• Text flow problems

Solution:

<TEMPLATE>
  <dx:p>Please refer to the <dx:em>index page</dx:em>
  for further information</dx:p>
</TEMPLATE>
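To a parser, the contents of a CDATA section are plain character data, so the embedded markup is invisible as structure. The sketch below illustrates this with Python's standard-library parser; the namespace URI bound to the dx prefix is a placeholder assumption, since the slide does not give one.

```python
import xml.etree.ElementTree as ET

CDATA_FORM = (
    "<TEMPLATE><![CDATA[<p>Please refer to the <em>index page</em> "
    "for further information</p>]]></TEMPLATE>"
)
# "urn:example:dx" is a made-up namespace URI for illustration only.
NAMESPACED_FORM = (
    '<TEMPLATE xmlns:dx="urn:example:dx">'
    "<dx:p>Please refer to the <dx:em>index page</dx:em> "
    "for further information</dx:p></TEMPLATE>"
)


def describe(xml_text):
    """Return the descendant tag names and the text content a parser sees."""
    root = ET.fromstring(xml_text)
    child_tags = [el.tag for el in root.iter() if el is not root]
    return child_tags, "".join(root.itertext())


cdata_tags, cdata_text = describe(CDATA_FORM)
ns_tags, ns_text = describe(NAMESPACED_FORM)
print(cdata_tags, cdata_text)
print(ns_tags, ns_text)
```

In the CDATA form the translator receives raw angle brackets as text; in the namespaced form the inline emphasis is a real element that tools can protect.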

Page 15: Dos and donts

Infinite Naming Schemes

<?xml version="1.0" ?>
<resources xml:lang="en">
  <err001>Cannot open file $1.</err001>
  <hint001>Hint: does file $1 exist.</hint001>
  <err002>Incorrect value.</err002>
  <hint002>Hint: value must be between $1 and $2.</hint002>
  <err003>Connection timeout.</err003>
  .
  .
</resources>

Problems:
• Poor XML practice
• Problems for extraction programs

Solution:

Page 16: Dos and donts

Infinite Naming Schemes contd.

Solution:

<?xml version="1.0" ?>
<resources xml:lang="en">
  <error id="001">
    <caption>Cannot open file $1.</caption>
    <hint>Does file $1 exist.</hint>
  </error>
  <error id="002">
    <caption>Incorrect value.</caption>
    <hint>Value must be between $1 and $2.</hint>
  </error>
  .
  .
</resources>
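The difference for an extraction program can be sketched as follows, using Python's standard-library parser. Both helper functions and the tag-name pattern are hypothetical; the point is that the numbered scheme forces the extractor to reverse-engineer a naming convention, whereas the structured form needs only fixed element names.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED = """<resources xml:lang="en">
  <err001>Cannot open file $1.</err001>
  <hint001>Hint: does file $1 exist.</hint001>
  <err002>Incorrect value.</err002>
  <hint002>Hint: value must be between $1 and $2.</hint002>
</resources>"""

STRUCTURED = """<resources xml:lang="en">
  <error id="001">
    <caption>Cannot open file $1.</caption>
    <hint>Does file $1 exist.</hint>
  </error>
  <error id="002">
    <caption>Incorrect value.</caption>
    <hint>Value must be between $1 and $2.</hint>
  </error>
</resources>"""


def extract_numbered(xml_text):
    """Guess the naming convention from the tag names (fragile)."""
    pattern = re.compile(r"^(err|hint)(\d+)$")
    rows = []
    for el in ET.fromstring(xml_text):
        match = pattern.match(el.tag)
        if match:
            rows.append((match.group(2), match.group(1), el.text))
    return rows


def extract_structured(xml_text):
    """Rely on a fixed vocabulary: error, caption, hint."""
    rows = []
    for error in ET.fromstring(xml_text).iter("error"):
        for child in error:
            rows.append((error.get("id"), child.tag, child.text))
    return rows


print(extract_numbered(NUMBERED))
print(extract_structured(STRUCTURED))
```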

Page 17: Dos and donts

Avoid Processing Instructions

<para>
  Use a <?tool name="claw hammer"?> to release
  the CPU retention catch.
</para>

Problems:
• Grammatical difficulties
• PIs are not guaranteed to survive transformations
• Text flow problems

Solution:

<para>
  Use a <tool id="a1098">claw hammer</tool>
  to release the CPU retention catch.
</para>
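As one concrete case of a processing instruction not surviving, the sketch below parses and re-serializes the paragraph with Python's standard-library ElementTree, whose default tree builder discards processing instructions. Other toolchains behave differently, so treat this as an illustration of the risk rather than a general rule.

```python
import xml.etree.ElementTree as ET

WITH_PI = (
    '<para>Use a <?tool name="claw hammer"?> to release '
    "the CPU retention catch.</para>"
)

# Parse, then serialize again: a typical step in a transformation pipeline.
root = ET.fromstring(WITH_PI)
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)  # the tool name is no longer in the output
```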

Page 18: Dos and donts

Avoid text in bitmap graphics

Page 19: Dos and donts

Text expansion

Never make assumptions about text length in design

Page 20: Dos and donts

Do: Always use UTF-8 or UTF-16 Encoding

• Avoid code conversion issues
• Always use Unicode encoding
• Not just CJK issues
• Also dingbats and special characters such as:
  • Em dash
  • En dash
  • Non-breaking spaces
  • Etc.
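A small sketch of the point about special characters, in Python. The sample text is made up; it combines an en dash, a non-breaking space, an em dash, and Japanese. Serialized as UTF-8 it round-trips unchanged, while a legacy single-byte encoding (Latin-1 is used here as an example) cannot represent it.

```python
import xml.etree.ElementTree as ET

para = ET.Element("para")
# En dash (U+2013), non-breaking space (U+00A0), em dash (U+2014), katakana.
para.text = "Pages 10\u201320\u00a0\u2014 \u30db\u30d0\u30fc\u30af\u30e9\u30d5\u30c8"

# Serialize as UTF-8 bytes (with an XML declaration) and parse back.
utf8_bytes = ET.tostring(para, encoding="utf-8")
restored = ET.fromstring(utf8_bytes).text


def fits_legacy_encoding(text, encoding="latin-1"):
    """Report whether the text can be stored in a legacy code page."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(restored == para.text)
print(fits_legacy_encoding(para.text))
```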

Page 21: Dos and donts

Do not break text over non-inline elements

<para>
  <line>This text should not be</line>
  <line>broken this way – the translated
  text may well be in a different order.</line>
</para>

Problems:
• Grammatical difficulties
• Against the principles of XML
• Text flow problems
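A sketch of what a translator receives in each case, using Python's standard-library parser. The helper name segments and the unbroken variant of the paragraph are assumptions added for comparison.

```python
import xml.etree.ElementTree as ET

BROKEN = """<para>
  <line>This text should not be</line>
  <line>broken this way – the translated
  text may well be in a different order.</line>
</para>"""

# Hypothetical unbroken variant of the same sentence.
WHOLE = (
    "<para>This text should not be broken this way – the translated "
    "text may well be in a different order.</para>"
)


def segments(xml_text, tag):
    """Return one whitespace-normalized segment per element named `tag`."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]


print(segments(BROKEN, "line"))  # two fragments, neither a full sentence
print(segments(WHOLE, "para"))   # one complete sentence
```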

Page 22: Dos and donts

Avoid the use of typographical elements

<para><b>Do not use</b> <br/>'br' type elements.</para>

Problems:
• Grammatical difficulties
• Against the principles of XML
• Text flow problems

Solution:

<para>
  <emph>Do not use</emph> 'br' type elements.
</para>

Page 23: Dos and donts

Do not mix translatable and non-translatable text

<data-items>
  <data id="class">
    com.xmlintl.data.dataDefDefinition
  </data>
  <data id="text">Replace generic data definitions with specific instances.
  </data>
</data-items>

Problems:
• Poor XML practice
• Problems for extraction programs

Solution:

Page 24: Dos and donts

Do not mix translatable and non-translatable text contd.

<data-items>
  <class id="com.xmlintl.data.dataDefinition">
    <text>Replace generic data definitions with specific instances.</text>
  </class>
</data-items>
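A sketch of the extraction problem, using Python's standard-library parser and a hypothetical rule that selects translatable text by element name. With the mixed design the rule also sweeps up the class name; with the separated design it does not.

```python
import xml.etree.ElementTree as ET

MIXED = """<data-items>
  <data id="class">com.xmlintl.data.dataDefDefinition</data>
  <data id="text">Replace generic data definitions with specific instances.</data>
</data-items>"""

SEPARATED = """<data-items>
  <class id="com.xmlintl.data.dataDefinition">
    <text>Replace generic data definitions with specific instances.</text>
  </class>
</data-items>"""


def translatable_by_element(xml_text, names):
    """Select text purely by element name, as a simple extraction rule would."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag in names and el.text and el.text.strip()
    ]


mixed_result = translatable_by_element(MIXED, ("data",))
separated_result = translatable_by_element(SEPARATED, ("text",))
print(mixed_result)      # includes the class name, which must not be translated
print(separated_result)  # only the sentence
```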

Page 25: Dos and donts

Avoid mixed language documents

<para>
  <text xml:lang="en">My hovercraft is full of eels.</text>
  <text xml:lang="fr">Mon aéroglisseur est plein d'anguilles.</text>
  <text xml:lang="hu">Légpárnás hajóm tele van angolnákkal.</text>
  <text xml:lang="ja">私のホバークラフトは鰻で一杯です。</text>
  <text xml:lang="pl">Mój poduszkowiec jest pełen węgorzy.</text>
  <text xml:lang="es">Mi aerodeslizador está lleno de anguilas.</text>
  <text xml:lang="zh-CH">我隻氣墊船裝滿晒鱔.</text>
  <text xml:lang="zh-TW">我的氣墊船充滿了鱔魚 [我的气垫船充满了鳝鱼 ]</text>
</para>

Page 26: Dos and donts

Do: clearly mark non-translatable text

<para>
  The following part of this sentence should
  <its:its translate="no">not be translated</its:its> at all.
</para>
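A rough sketch of how an extractor might honour such a marker, using Python's standard-library parser. It mirrors the slide's markup (an unprefixed translate attribute on an its:its element) and adds the W3C ITS namespace declaration so the fragment parses; it does not implement the ITS specification's global rules or its full inheritance model, and the function name split_translatable is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<para xmlns:its="http://www.w3.org/2005/11/its">The following part of this '
    'sentence should <its:its translate="no">not be translated</its:its> '
    "at all.</para>"
)


def split_translatable(xml_text):
    """Separate text runs into translatable and protected lists."""
    translatable, protected = [], []

    def walk(el, inherited):
        flag = el.get("translate")
        locked = inherited if flag is None else flag == "no"
        if el.text and el.text.strip():
            (protected if locked else translatable).append(el.text.strip())
        for child in el:
            walk(child, locked)
            # A child's tail belongs to this element, not to the child.
            if child.tail and child.tail.strip():
                (protected if locked else translatable).append(child.tail.strip())

    walk(ET.fromstring(xml_text), False)
    return translatable, protected


translatable, protected = split_translatable(DOC)
print(translatable)
print(protected)
```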

Page 27: Dos and donts

Core L10N Interoperability Standards

• W3C ITS Document Rules

• ETSI LIS SRX

• ETSI LIS xml:tm

• ETSI LIS TMX

• ETSI LIS TBX

• ETSI LIS GMX

• OASIS XLIFF

• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)

• Unicode TR29

Page 28: Dos and donts

Putting It All Together

Page 29: Dos and donts

• Open Architecture for XML Authoring and Localization (OAXAL)

– http://wiki.oasis-open.org/oaxal/FrontPage

Page 30: Dos and donts

OAXAL

Page 31: Dos and donts

OAXAL

Page 32: Dos and donts

Localization without Standards

[Diagram: the customer's source text passes through extract, TM process, translate, merge, and QA steps, producing extracted text, prepared text, translated text, and target text.]

Page 33: Dos and donts

True Cost of Translation

Page 34: Dos and donts

OAXAL in Action

Page 35: Dos and donts

Translating English Soccer Articles into Arabic 24x7

Page 36: Dos and donts

Translating English Soccer Articles into Arabic 24x7

Page 37: Dos and donts

Flagship website

Page 38: Dos and donts

Flagship website

Page 39: Dos and donts

Browser-Based Workbench

Page 40: Dos and donts

OAXAL In Action

Page 41: Dos and donts

Your opinion is important to us! Please tell us what you thought of the lecture. We look forward to your feedback via smartphone or tablet under

http://LOC23.honestly.de or scan the QR code

The feedback tool will be available even after the conference!

Page 42: Dos and donts

Contact details:
• Andrzej Zydroń
• [email protected]
• http://www.xtm-intl.com