Post on 15-Apr-2017

How to Localize XML Documents: A Workshop on the Do's and Don'ts of XML Localization

November 2014

http:///

XML dominant!

• HTML/XHTML
• Web Services
• Adobe FrameMaker
• Microsoft Office
• Open Office
• ASP
• XAML
• Java Properties
• DITA
• Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm
• OAXAL: Open Architecture for XML Authoring and Localization

Benefits of XML for L10N

• Separation of form and content
• Should make documents easier to translate
• There are some critical design decisions
• Mistakes can hinder translatability
• XML can bootstrap its own localization

The real significance of XML

• XML is not just another electronic format
• XML is an eXtensible syntax
• XML is a formal IT grammar
• XML is programmable
• XML can bootstrap its own localization

Benefits of XML for L10N

Why use XML for localization:
• One input format
• Elegant
• Uses the latest IT technology
• Separation of source and content
• One single data bus
• Open standards based
• You can use XML to assist its own localization
• One extraction + TM + SMT engine

Benefits of XML

Any electronic format not in XML can be converted to XML:
• FrameMaker
• RTF
• Microsoft Office pre-2007
• QuarkXPress
• Windows resource files
• Java resources
• PO/POT
• YAML
• Etc.

And then back into the original format

Do: Use Standard XML Libraries

DO!
• Xerces/Xalan
• MSXML

DO NOT!
• Write your own XML parser
• Write your own serializer
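As a minimal sketch of why a standard parser matters, the following Python example reads a paragraph containing a predefined entity and a numeric character reference. Python's bundled xml.etree.ElementTree stands in here for Xerces or MSXML, which is an assumption made for illustration; the sample text and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: &amp; is a predefined entity and &#233; is a
# numeric character reference for the letter e with an acute accent.
SAMPLE = "<para>Nuts &amp; bolts at the caf&#233;.</para>"


def paragraph_text(xml_source):
    """Return all text inside the root element, as resolved by the parser."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


if __name__ == "__main__":
    print(paragraph_text(SAMPLE))
```

The parser resolves both references to real characters, which a hand-written regular-expression "parser" would have to reimplement and would likely get wrong.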

Do: Use XLIFF

XML Localization Interchange File Format
OASIS Standard: https://www.oasis-open.org/committees/xliff

1. Extract text for translation
2. Use a standard library parser!
• Java/C++: Apache Xerces
• .NET: MSXML
3. XLIFF:doc, a simplified XLIFF
• XLIFF 1.2 subset
• http://code.google.com/p/interoperability-now/
4. Use a skeleton file
• Use markers for inline elements
• Use only <g> and <x> inline elements
• Use <mrk> for terminology
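The extraction step can be sketched roughly as follows, assuming Python's xml.etree.ElementTree as the standard parser and a very small XLIFF 1.2 subset (one file, one source element per trans-unit, a single <g> inline element). The function name make_xliff, the file name manual.xml and the segment layout are hypothetical, and no skeleton file is produced here.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def q(name):
    """Qualify an element name with the XLIFF 1.2 namespace."""
    return "{%s}%s" % (XLIFF_NS, name)


def make_xliff(original, source_lang, segments):
    """Build a minimal XLIFF 1.2 document as a string.

    segments is a list of (before, marked, after) triples; the marked
    text is wrapped in a <g> inline element so it stays inside the
    sentence and can be moved by the translator.
    """
    ET.register_namespace("", XLIFF_NS)  # write XLIFF as the default namespace
    xliff = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,
        "source-language": source_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, q("body"))
    for number, (before, marked, after) in enumerate(segments, start=1):
        unit = ET.SubElement(body, q("trans-unit"), {"id": str(number)})
        source = ET.SubElement(unit, q("source"))
        source.text = before
        g = ET.SubElement(source, q("g"), {"id": "1"})
        g.text = marked
        g.tail = after
    return ET.tostring(xliff, encoding="unicode")


if __name__ == "__main__":
    print(make_xliff("manual.xml", "en",
                     [("Use a ", "claw hammer", " to release the catch.")]))
```

A real extractor would also write the skeleton file and record which original element each <g> id stands for, so that the translated text can be merged back.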

XML Document Design Issues

Important points to take into account

Word/Phrase Substitution Issues

What works in English or Chinese will not work in most other languages. English and Chinese have an impoverished morphology:

• Please undo the bolt using a spanner.

• Proszę odkręcić śrubę kluczem.

Nominative form: klucz. The Polish sentence needs the instrumental form, kluczem, so the dictionary form cannot simply be substituted.

The real killers: gender and case

• New Ford model: New Fiesta, New Mondeo, New Focus
• Nowa Ford Fiesta (feminine)
• Nowe Ford Mondeo (neuter)
• Nowy Ford Focus (masculine)

Avoid translatable Entity References

<para>Use a &tool; to release the catch.</para>

Problems:
• Grammatical difficulties
• Parsing difficulties
• Translation memory problems

Solution:

<para>
   Use a <tool id="a1098">claw hammer</tool>
   to release the CPU retention catch.
</para>

Avoid Word Substitution Mechanisms

<p>Using a <keyword conref="tools.dita#tools/ClawHammer"/>,
remove the CPU from its mount.</p>

Problems:
• Grammatical difficulties
• Parsing difficulties
• Translation memory problems

Solution:

<para>
   Use a <tool id="a1098">claw hammer</tool>
   to release the CPU from its mount.
</para>

Incorrect use of Translatable Attributes

<para>
   Use a <tool id="a1098" name="claw hammer"/>
   to release the CPU retention catch.
</para>

Problems:
• Grammatical difficulties
• Text flow problems

Solution:

<para>
   Use a <tool id="a1098">claw hammer</tool>
   to release the CPU retention catch.
</para>

CDATA Sections

<TEMPLATE><![CDATA[<p>Please refer to the
   <em>index page</em> for further information</p>
]]></TEMPLATE>

Problems:
• Grammatical difficulties
• Text flow problems

Solution:

<TEMPLATE>
   <dx:p>Please refer to the <dx:em>index page</dx:em>
   for further information</dx:p>
</TEMPLATE>
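A short sketch of what an extraction tool sees in each case, assuming Python's xml.etree.ElementTree as the parser. The two samples are simplified versions of the slides above; the dx: prefix is dropped because its namespace declaration is not shown in the source, and inline_tags is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical versions of the two TEMPLATE examples.
CDATA_VERSION = (
    "<TEMPLATE><![CDATA[<p>Please refer to the "
    "<em>index page</em>.</p>]]></TEMPLATE>"
)
MARKUP_VERSION = (
    "<TEMPLATE><p>Please refer to the "
    "<em>index page</em>.</p></TEMPLATE>"
)


def inline_tags(xml_source):
    """List the element names the parser finds below the root element."""
    root = ET.fromstring(xml_source)
    return [el.tag for el in root.iter() if el is not root]


if __name__ == "__main__":
    print(inline_tags(CDATA_VERSION))   # markup hidden inside CDATA
    print(inline_tags(MARKUP_VERSION))  # markup visible to the parser
```

Inside the CDATA section the parser reports no child elements at all, only one opaque string, so inline markup cannot be protected or turned into XLIFF inline elements without a second parsing pass.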

Infinite Naming Schemes

<?xml version="1.0" ?>
<resources xml:lang="en">
   <err001>Cannot open file $1.</err001>
   <hint001>Hint: does file $1 exist?</hint001>
   <err002>Incorrect value.</err002>
   <hint002>Hint: value must be between $1 and $2.</hint002>
   <err003>Connection timeout.</err003>
   .
   .
</resources>

Problems:
• Poor XML practice
• Problems for extraction programs

Solution:

<?xml version="1.0" ?>
<resources xml:lang="en">
   <error id="001">
      <caption>Cannot open file $1.</caption>
      <hint>Does file $1 exist?</hint>
   </error>
   <error id="002">
      <caption>Incorrect value.</caption>
      <hint>Value must be between $1 and $2.</hint>
   </error>
   .
   .
</resources>
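With a fixed element name and an id attribute, extraction needs only one query however many messages exist. A minimal sketch, assuming Python's xml.etree.ElementTree; the sample data mirrors the slide, and extract_messages is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RESOURCES = """<?xml version="1.0" ?>
<resources xml:lang="en">
  <error id="001">
    <caption>Cannot open file $1.</caption>
    <hint>Does file $1 exist?</hint>
  </error>
  <error id="002">
    <caption>Incorrect value.</caption>
    <hint>Value must be between $1 and $2.</hint>
  </error>
</resources>"""


def extract_messages(xml_source):
    """Map each error id to its (caption, hint) pair."""
    root = ET.fromstring(xml_source)
    return {
        error.get("id"): (error.findtext("caption"), error.findtext("hint"))
        for error in root.iter("error")
    }


if __name__ == "__main__":
    for error_id, (caption, hint) in extract_messages(RESOURCES).items():
        print(error_id, caption, hint)
```

With the err001, err002, ... scheme the same tool would have to guess element names by pattern, and no DTD or schema could list them all.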

Avoid Processing Instructions

<para>
   Use a <?tool name="claw hammer"?> to release
   the CPU retention catch.
</para>

Problems:
• Grammatical difficulties
• PIs are not guaranteed to survive transformations
• Text flow problems

Solution:

<para>
   Use a <tool id="a1098">claw hammer</tool>
   to release the CPU retention catch.
</para>

Avoid text in bitmap graphics

Text expansion

Never make assumptions about text length in design

Do: Always use UTF-8 or UTF-16 Encoding

• Avoid code conversion issues
• Always use Unicode encoding
• Not just CJK issues
• Also dingbats and special characters such as:
• Em dash
• En dash
• Non-breaking spaces
• Etc.
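A small round-trip sketch of the point about special characters, assuming Python 3.8 or later and its bundled xml.etree.ElementTree. The sample string and the function name round_trip_utf8 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample containing an em dash (U+2014), an en dash (U+2013)
# and a non-breaking space (U+00A0), written as escapes for clarity.
SAMPLE = "A\u2014B\u2013C\u00a0D"


def round_trip_utf8(text):
    """Serialize text in a <para> as UTF-8 bytes, then parse it back."""
    para = ET.Element("para")
    para.text = text
    data = ET.tostring(para, encoding="utf-8", xml_declaration=True)
    return data, ET.fromstring(data).text


if __name__ == "__main__":
    data, text = round_trip_utf8(SAMPLE)
    print(data)
    print(text == SAMPLE)
```

Because the encoding is declared in the XML declaration and the bytes are real UTF-8, the dashes and the non-breaking space come back unchanged without any code page conversion.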

Do not break text over non-inline elements

<para>
  <line>This text should not be</line>
  <line>broken this way – the translated
   text may well be in a different order.</line>
</para>

Problems:
• Grammatical difficulties
• Against the principles of XML
• Text flow problems

Avoid the use of typographical elements

<para><b>Do not use</b> <br/>'br' type elements.</para>

Problems:
• Grammatical difficulties
• Against the principles of XML
• Text flow problems

Solution:

<para>
  <emph>Do not use</emph> 'br' type elements.
</para>

Do not mix translatable and non-translatable content

<data-items>
  <data id="class">com.xmlintl.data.dataDefinition</data>
  <data id="text">Replace generic data definitions with specific instances.</data>
</data-items>

Problems:
• Poor XML practice
• Problems for extraction programs

Solution:

<data-items>
  <class id="com.xmlintl.data.dataDefinition">
    <text>Replace generic data definitions with specific instances.</text>
  </class>
</data-items>

Avoid mixed language documents

<para>
  <text xml:lang="en">My hovercraft is full of eels.</text>
  <text xml:lang="fr">Mon aéroglisseur est plein d'anguilles.</text>
  <text xml:lang="hu">Légpárnás hajóm tele van angolnákkal.</text>
  <text xml:lang="ja">私のホバークラフトは鰻で一杯です。</text>
  <text xml:lang="pl">Mój poduszkowiec jest pełen węgorzy.</text>
  <text xml:lang="es">Mi aerodeslizador está lleno de anguilas.</text>
  <text xml:lang="zh-HK">我隻氣墊船裝滿晒鱔.</text>
  <text xml:lang="zh-TW">我的氣墊船充滿了鱔魚 [我的气垫船充满了鳝鱼]</text>
</para>

Do: clearly mark non-translatable text

<para>
  The following part of this sentence should
  <its:span translate="no">not be translated</its:span> at all.
</para>
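How an extraction tool might honour such a marker can be sketched as follows. This assumes the W3C ITS namespace http://www.w3.org/2005/11/its, the its:span element with a translate attribute, and the its:translate attribute on other elements; ITS global rules are not handled. The function name translatable_chunks is hypothetical, and Python's xml.etree.ElementTree is used as the parser.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
ITS_SPAN = "{%s}span" % ITS_NS
ITS_TRANSLATE = "{%s}translate" % ITS_NS

SAMPLE = (
    '<para xmlns:its="%s">The following part of this sentence should '
    '<its:span translate="no">not be translated</its:span> at all.</para>'
    % ITS_NS
)


def translatable_chunks(xml_source):
    """Collect text runs that are not marked translate="no"."""
    chunks = []

    def walk(element, inherited):
        # its:span carries a plain translate attribute; any other element
        # carries the namespaced its:translate attribute.
        name = "translate" if element.tag == ITS_SPAN else ITS_TRANSLATE
        flag = element.get(name)
        translate = inherited if flag is None else (flag == "yes")
        if translate and element.text and element.text.strip():
            chunks.append(element.text.strip())
        for child in element:
            walk(child, translate)
            # Tail text belongs to the parent, so the parent's setting applies.
            if translate and child.tail and child.tail.strip():
                chunks.append(child.tail.strip())

    walk(ET.fromstring(xml_source), True)
    return chunks


if __name__ == "__main__":
    print(translatable_chunks(SAMPLE))
```

In a real workflow the protected span would normally stay inside the segment as a locked inline element rather than splitting the sentence in two; the split here only keeps the sketch short.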

Core L10N Interoperability Standards

• W3C ITS Document Rules

• ETSI LIS SRX

• ETSI LIS xml:tm

• ETSI LIS TMX

• ETSI LIS TBX

• ETSI LIS GMX

• OASIS XLIFF

• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)

• Unicode TR29

Putting It All Together

• Open Architecture for XML Authoring and Localization (OAXAL)

– http://wiki.oasis-open.org/oaxal/FrontPage

OAXAL

Localization without Standards

[Figure: workflow diagram. Labels: customer, source text, extract, extracted text, TM process, prepared text, translate, translated text, merge, target text, QA]

True Cost of Translation

OAXAL in Action

Translating English Soccer Articles into Arabic 24x7

Flagship website

Browser-Based Workbench

Your opinion is important to us! Please tell us what you thought of the lecture. We look forward to your feedback via smartphone or tablet under

http://LOC23.honestly.de or scan the QR code

The feedback tool will be available even after the conference!

Contact details:
• Andrzej Zydroń
• azydron@xtm-intl.com
• http://www.xtm-intl.com
