![Page 1: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/1.jpg)
XML and LOCALIZATION
An overview by @Fantpmas from @YamagataEurope
![Page 2: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/2.jpg)
What is XML? And why do you people love acronyms so much?
![Page 3: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/3.jpg)
XML stands for eXtensible Markup Language
You can write your own language/dialect
A language to store data in a human readable format
![Page 4: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/4.jpg)
XML is designed to carry data, not to display data like HTML
XML doesn't do anything on its own: nada, zilch!
![Page 5: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/5.jpg)
A sample XML document (don't worry, it's all plain text)
The root element
3 child elements
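The sample document on this slide was an image and is not reproduced here. As a stand-in, here is a minimal Python sketch that parses a made-up document with one root element and three child elements. The element names (`note`, `to`, `from`, `body`) and their content are assumptions, not the original sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one root element (<note>) with three child elements.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Translator</to>
  <from>Project manager</from>
  <body>Please translate this file.</body>
</note>"""

root = ET.fromstring(SAMPLE.encode("utf-8"))
child_names = [child.tag for child in root]

print(root.tag)      # the root element
print(child_names)   # its child elements, in document order
```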
![Page 6: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/6.jpg)
An XML element in detail
• Start tag
• End tag
• Attribute
• Attribute value
• Element content
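To see those parts from a program's point of view, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The `title` element and its `lang` attribute are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: <title lang="en">XML and Localization</title>
element = ET.fromstring('<title lang="en">XML and Localization</title>')

print(element.tag)          # name shared by the start tag and the end tag
print(element.attrib)       # attributes as a dict: name -> value
print(element.get("lang"))  # a single attribute value
print(element.text)         # the element content
```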
![Page 7: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/7.jpg)
XML elements can be empty
is the same as
Self-closing element
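A quick way to convince yourself that both spellings mean the same thing is to parse them and compare. This is only a sketch, and `linebreak` is a hypothetical element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; both spellings describe an empty element.
with_end_tag = ET.fromstring("<linebreak></linebreak>")
self_closing = ET.fromstring("<linebreak/>")

for element in (with_end_tag, self_closing):
    # No text content and no child elements in either case.
    print(element.tag, element.text, len(element))

# Serializing either one gives the same bytes.
same_output = ET.tostring(with_end_tag) == ET.tostring(self_closing)
print(same_output)
```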
![Page 8: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/8.jpg)
There are rules to follow
When all rules are abided by, the XML is well-formed
![Page 9: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/9.jpg)
XML well-formedness rules (not exhaustive)
• There must be a root element
• Elements must follow naming rules
• All elements must be closed
• Element names are case sensitive
• Elements must be properly nested
• Attribute values must be quoted
• Attributes can only appear once in the same start tag
• Some characters cannot be used as such
• Entities must be declared
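The rules above are easy to try out against a real parser. The sketch below feeds a handful of made-up snippets to Python's standard parser and reports which ones it accepts. The helper name `is_well_formed` and every snippet are assumptions for illustration, and the check covers well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets, roughly one per rule.
CASES = {
    "well-formed document": '<doc><p id="a">text</p></doc>',
    "non-ASCII element name": "<筆者>Yamada</筆者>",
    "two root elements": "<doc></doc><doc></doc>",
    "name starts with a digit": "<1doc></1doc>",
    "whitespace inside a name": "<筆 者></筆 者>",
    "element not closed": "<doc><p>text</doc>",
    "case mismatch": "<Doc></doc>",
    "improper nesting": "<doc><b><i>text</b></i></doc>",
    "unquoted attribute value": "<doc id=a></doc>",
    "duplicate attribute": '<doc id="a" id="b"></doc>',
    "unescaped ampersand": "<doc>R&D</doc>",
    "undeclared entity": "<doc>&nbsp;</doc>",
}

for label, snippet in CASES.items():
    print(f"{label}: {is_well_formed(snippet)}")
```

Only the first two snippets should come back as well-formed; each of the others breaks one rule.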
![Page 10: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/10.jpg)
There must be a root element
![Page 11: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/11.jpg)
Elements must follow naming rules
Names can only start with
• A letter (in any language, including accented letters)
• A colon
• An underscore
Example of a valid name: 筆者 (Japanese for "writer")
![Page 12: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/12.jpg)
Elements must follow naming rules
Names cannot contain
• White spaces
• Most punctuation characters except colon, underscore, hyphen, dot, middle dot
• Symbol characters
Example of an invalid name: 筆 者 (it contains a space)
![Page 13: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/13.jpg)
All elements must be closed
![Page 14: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/14.jpg)
Element names are case sensitive
![Page 15: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/15.jpg)
Elements must be properly nested
![Page 16: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/16.jpg)
Attribute values must be quoted
Single or double quotes
![Page 17: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/17.jpg)
Attention to those darn quotes
If the value is delimited by double quotes, you cannot use a literal double quote inside it (write &quot; instead). The same applies to single quotes (&apos;).
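If you generate XML from code, you don't have to juggle the quotes by hand. As a sketch, Python's `xml.sax.saxutils.quoteattr` picks a delimiter and escapes where needed; the `msg` element, the `text` attribute and the sample values below are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical attribute values containing one or both kinds of quotes.
values = ['She said "OK"', "It's fine", 'It\'s "fine"']

for value in values:
    quoted = quoteattr(value)  # adds the delimiters, escapes if needed
    element = ET.fromstring(f"<msg text={quoted}/>")
    print(quoted, element.get("text") == value)
```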
![Page 18: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/18.jpg)
Attribute names must be unique within a tag
![Page 19: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/19.jpg)
Some characters cannot be used
• < and & need to be escaped as entities: &lt; and &amp;
• Most control characters (such as backspace) are not allowed; tab, line feed and carriage return are the exceptions
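As a sketch of what escaping looks like in practice, Python's `xml.sax.saxutils.escape` replaces the problem characters before the text goes into a document, and the parser turns the entities back into characters on the way out. The `seg` element name and the sample string are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Terms & conditions: price < 100"  # hypothetical string
escaped = escape(raw_text)  # & -> &amp;, < -> &lt;, > -> &gt;
print(escaped)

# The escaped text is safe as element content and parses back unchanged.
element = ET.fromstring(f"<seg>{escaped}</seg>")
print(element.text == raw_text)

# The parser also understands the other predefined entities.
all_five = ET.fromstring("<seg>&lt; &gt; &amp; &apos; &quot;</seg>")
print(all_five.text)
```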
![Page 20: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/20.jpg)
A word about entities
Entities are used to represent characters or sequences of characters that need to be repeated throughout a document
Syntax: an ampersand, the entity name, then a semicolon (&name;)
![Page 21: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/21.jpg)
Predefined XML entities
5 predefined character entities; only 2 (&lt; and &amp;) are obligatory
&lt;    <    less than
&gt;    >    greater than
&amp;   &    ampersand
&apos;  '    apostrophe
&quot;  "    quotation mark
![Page 22: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/22.jpg)
Entities must be declared
Except for the predefined entities, all entities must be declared in the Document Type Definition (DTD)
(Slide figure: an entity reference in the document, and its entity declaration in the DTD)
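The slide's own example is not reproduced here, so the following is a stand-in sketch. It declares a hypothetical `company` entity in an internal DTD and lets Python's standard parser expand it, then shows that the same reference without a declaration is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity "company" declared in an internal DTD subset.
DOCUMENT = """<!DOCTYPE doc [
  <!ENTITY company "Example Corp">
]>
<doc>Welcome to &company;. &company; thanks you.</doc>"""

root = ET.fromstring(DOCUMENT)
print(root.text)  # entity references replaced by the declared text

# Without the declaration the same reference is a parse error.
try:
    ET.fromstring("<doc>&company;</doc>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```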
![Page 23: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/23.jpg)
Other constructs
• XML declaration
• Stylesheet declaration
• Document Type declaration
• Comments
• CDATA
![Page 24: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/24.jpg)
Document Type Definition
A DTD defines the structure of an XML document
![Page 25: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/25.jpg)
How to declare DTDs
DTDs can be internal
DTD
![Page 26: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/26.jpg)
How to declare DTDs
DTDs can be external
![Page 27: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/27.jpg)
XML Schema
XML Schema (*.xsd) is an XML-based alternative to DTD
![Page 28: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/28.jpg)
DTDs in the localization world
Don't be scared, but XML really is everywhere
• TMX
• TBX
• XLIFF
• TTX
• SRX
• Qt Linguist TS
• DITA
• ...
![Page 29: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/29.jpg)
Encoding
All XML parsers must support at least UTF-8 and UTF-16.
The default encoding is UTF-8.
It's always a good idea to specify the encoding in the XML declaration.
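Here is a small sketch of the same content stored both ways and read back with Python's standard parser. The `city` element is a made-up example, and the UTF-16 bytes come from Python's `utf-16` codec, which writes a byte order mark at the start.

```python
import xml.etree.ElementTree as ET

text = "東京"  # sample non-ASCII content

# UTF-8 is assumed when nothing else is declared.
utf8_bytes = f"<city>{text}</city>".encode("utf-8")

# UTF-16 with an explicit declaration; Python's "utf-16" codec adds a BOM.
utf16_bytes = (
    f'<?xml version="1.0" encoding="UTF-16"?><city>{text}</city>'
).encode("utf-16")

for data in (utf8_bytes, utf16_bytes):
    print(len(data), ET.fromstring(data).text)
```

The byte counts differ, but the parsed text should be identical in both cases.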
![Page 30: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/30.jpg)
Byte Order Mark
A character to indicate the byte order of an XML document
In UTF-8 it's optional and not even recommended
In UTF-16 it's used to indicate endianness: little-endian or big-endian
If you see these at the start of a file, something's wrong:
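The characters shown on the slide did not survive extraction. The sketch below assumes the familiar case of a UTF-8 BOM being displayed by a tool that reads the file as Windows-1252, and also shows the two UTF-16 byte order marks. The `city` document is a made-up example.

```python
import codecs

utf8_doc = codecs.BOM_UTF8 + "<city>東京</city>".encode("utf-8")

# A UTF-8 BOM decoded with the wrong (Windows-1252) code page.
print(repr(utf8_doc[:3].decode("cp1252")))

# The "utf-8-sig" codec strips a leading BOM if there is one.
print(utf8_doc.decode("utf-8-sig"))

# UTF-16 BOMs tell the two byte orders apart.
print(codecs.BOM_UTF16_LE.hex(), codecs.BOM_UTF16_BE.hex())
```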
![Page 31: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/31.jpg)
Complementary technologies
What? There's more of this geek stuff!?
![Page 32: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/32.jpg)
Extensible Stylesheet Language Transformations (XSLT)
It's XML to transform another XML document!
![Page 33: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/33.jpg)
XSL Transformations
An XML source can be transformed into (X)HTML, XML or TXT
![Page 34: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/34.jpg)
How to apply an XSLT
Declare the stylesheet in the XML file itself
Use an application like XMLSpy or xmlstarlet
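Declaring the stylesheet in the file is done with an `xml-stylesheet` processing instruction placed before the root element. The sketch below only reads that declaration with Python's standard `minidom`; it does not run the transformation, because the Python standard library has no XSLT processor. The file name `preview.xsl` and the document are hypothetical.

```python
from xml.dom import minidom

# "preview.xsl" is a hypothetical stylesheet file name.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="preview.xsl"?>
<doc><p>Hello</p></doc>"""

dom = minidom.parseString(DOCUMENT.encode("utf-8"))

# Collect processing instructions that appear before the root element.
stylesheets = [
    node.data
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]
print(stylesheets)
```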
![Page 35: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/35.jpg)
XSLT localization examples
• Convert a TTX to a two-column HTML or CSV
• Convert a TMX to a TBX
• Convert a TMX to a TXT (for spell-check in MS Word)
• Convert multilingual XML to TMX/TBX
• Generate HTML preview for XML in SDL Trados Studio
• Prepare XML files for translation
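To give a feel for what such a conversion involves, here is a sketch of the TMX to two-column text idea. Note the swap: it uses Python's `ElementTree` rather than an XSLT stylesheet. The TMX content is simplified and invented (a real TMX header carries more attributes), and `tmx_to_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # xml:lang in ElementTree

# Simplified, hypothetical TMX-like content.
TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file</seg></tuv>
      <tuv xml:lang="nl"><seg>Open het bestand</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save</seg></tuv>
      <tuv xml:lang="nl"><seg>Opslaan</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_rows(tmx_text, source="en", target="nl"):
    """Return (source, target) text pairs, one per translation unit."""
    rows = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        # Map each language code to the text of its segment.
        segs = {tuv.get(XML_LANG): tuv.findtext("seg", "") for tuv in tu.iter("tuv")}
        rows.append((segs.get(source, ""), segs.get(target, "")))
    return rows


for src, tgt in tmx_to_rows(TMX):
    print(f"{src}\t{tgt}")
```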
![Page 36: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/36.jpg)
XPath
It's a query language to select nodes from an XML document
It's used in XSLT
Will select all elements that have an attribute called
and whose value is
And also in SDL Trados Studio file types
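The expression from the slide is not reproduced here, so the sketch below uses an invented one of the same shape: select every element that has a given attribute with a given value. The attribute name `translate`, the value `no` and the document are assumptions. Python's `ElementTree` supports only a subset of XPath, which is enough for this pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; "translate" and "no" are illustrative names only.
DOCUMENT = """<doc>
  <p translate="no">ACME-2000</p>
  <p>Press the button.</p>
  <note translate="no">internal</note>
  <note translate="yes">Visible note</note>
</doc>"""

root = ET.fromstring(DOCUMENT)

# .//*[@translate='no']: any element whose translate attribute equals "no".
locked = root.findall(".//*[@translate='no']")
print([(element.tag, element.text) for element in locked])
```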
![Page 37: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/37.jpg)
Is XML good for localization?
Yes, but not always
![Page 38: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/38.jpg)
XML is great for localization
• Unicode supported by default
• Metadata gives more information about content
• Separates content from formatting (to some extent)
• Human readable
• Easily transformable using XSLT
• Excellent for single-sourcing
![Page 39: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/39.jpg)
But bad XML is bad
• Translatable content in attributes
• No metadata to distinguish between kinds of content, e.g. mixed languages, translatable vs. non-translatable
• CDATA is just plain cheating
• Bad implementations of standards (XLIFF)
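One likely reason CDATA gets called cheating: whatever markup sits inside it is invisible to XML tools. The sketch below parses the same made-up sentence twice, once with real inline markup and once wrapped in CDATA.

```python
import xml.etree.ElementTree as ET

# The same hypothetical sentence, once as real markup and once inside CDATA.
as_markup = ET.fromstring("<p>Click <b>Save</b> to continue.</p>")
as_cdata = ET.fromstring("<p><![CDATA[Click <b>Save</b> to continue.]]></p>")

# Real markup: the parser sees a <b> child element.
print(len(as_markup), as_markup.find("b").text)

# CDATA: no child elements, the tags are just characters in the text.
print(len(as_cdata), as_cdata.text)
```

In the CDATA case a translation tool would have to parse the embedded tags a second time to protect them.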
![Page 40: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/40.jpg)
And also
• Multilingual XML can be challenging (XSLT can help)
東京
• Big files and one-liners can cause processing problems (pretty-printing can help)
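As a sketch of what pretty-printing does to a one-liner, Python's standard `minidom` can re-serialize a document with line breaks and indentation. The document is a made-up example.

```python
from xml.dom import minidom

# Hypothetical one-line document.
ONE_LINER = "<doc><p>First</p><p>Second</p></doc>"

pretty = minidom.parseString(ONE_LINER).toprettyxml(indent="  ")
print(pretty)
```

Keep in mind that pretty-printing adds whitespace, which can matter when elements mix text and inline tags.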
![Page 41: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/41.jpg)
Tools, tools, tools
• Altova XMLSpy: all-round XML editor
• Altova DiffDog: compare XML files
• xmlstarlet: command line XML toolkit
• EditPad Pro for all encoding/BOM matters
![Page 42: XML and Localization](https://reader033.vdocument.in/reader033/viewer/2022042602/55967d561a28abc7368b471d/html5/thumbnails/42.jpg)
"Specification is only theory. In practice, there is only the parser."
@Tnkrd