24th Internationalization and Unicode Conference, Atlanta, GA USA – Sept 2003
Common XML Locale Repository
Dr. Mark Davis
Steven R. Loomis
IBM San José Globalization Center of Competency
Copyright © 2003 IBM Corporation
Locale Data Confusion
Variations in localized data can irritate or confuse users…
OS #1: 2003-02-17 (févr.)
OS #2: 03-02-17 (fév)
Locale Data Problems
Mismatched data can be catastrophic…
OS #1: 10 records in {Z..Aa}
OS #2: 0 records in {Z..Aa}
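A range such as {Z..Aa} only has meaning relative to a sort order, which is one way two systems can return different record counts for the same query. The sketch below is illustrative and not taken from either OS: `aa_last_key` is a hypothetical sort key that places the digraph "aa" after "z" (as traditional Danish ordering does), and the record names are made up.

```python
def codepoint_key(text):
    """Plain binary (code point) ordering."""
    return text


def aa_last_key(text):
    """Hypothetical tailored ordering: case-insensitive, with the
    digraph 'aa' treated as one letter that sorts after 'z'."""
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        if lowered.startswith("aa", i):
            out.append("\x7f")  # U+007F sorts after 'z' (U+007A)
            i += 2
        else:
            out.append(lowered[i])
            i += 1
    return "".join(out)


def records_in_range(records, low, high, key):
    """Return the records r with low <= r <= high under the given key."""
    lo, hi = key(low), key(high)
    return [r for r in records if lo <= key(r) <= hi]


RECORDS = ["Aalborg", "Berlin", "Zebra", "Zoo"]  # made-up sample data

binary_hits = records_in_range(RECORDS, "Z", "Aa", codepoint_key)
tailored_hits = records_in_range(RECORDS, "Z", "Aa", aa_last_key)
print(len(binary_hits), len(tailored_hits))
```

Under code point order "Z" sorts after "Aa", so the range is empty; under the tailored key the same bounds enclose every record starting with "Z".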
What is Locale Data?
• Locale = identifier string referring to linguistic and cultural preferences
• Typical data
– Dates/times
– Numbers
– Measurement
– Currency
– Sorting (Collation)
– Translated country and language names
Where is locale data found?
• International Components for Unicode (ICU)
• OpenOffice.org
• Operating Systems
– Linux, Solaris, AIX, Windows, …
• Java
• Other vendors: PeopleSoft, Oracle,…
Team
• Li18nux is now OpenI18N (part of the Free Standards Group)
– Linux Application Development Environment subgroup
• Common XML Locale Repository project
http://www.openi18n.org/subgroups/lade/locale/
Repository Objectives
• Common XML format for locale data
• Collect data from platforms
• Make repository available to the public
• Validate and release corrected data
• Enable W3C Web Services
– Exchange and display of data in localized form
Repository Features
• Version-controlled database
• HTTP-based: browsing or custom tools
• Compare data between platforms
– (Comparisons available now)
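A cross-platform comparison can be pictured as a diff over flattened key/value data. This is only a sketch of the idea, not the repository's comparison tool: the key names are hypothetical, the month abbreviations come from the "Locale Data Confusion" slide, and the date patterns are inferred from the two sample dates shown there.

```python
def compare_platforms(first, second):
    """Return {key: (value_in_first, value_in_second)} for every key
    whose value differs or is missing on one side."""
    diffs = {}
    for key in sorted(set(first) | set(second)):
        a, b = first.get(key), second.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Hypothetical flattened locale data for French on two platforms.
os1 = {"month.abbr.2": "févr.", "date.short": "yyyy-MM-dd", "decimal": ","}
os2 = {"month.abbr.2": "fév", "date.short": "yy-MM-dd", "decimal": ","}

for key, (a, b) in compare_platforms(os1, os2).items():
    print(f"{key}: OS #1={a!r} OS #2={b!r}")
```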
Repository Structure
• Contents
– Common
– ICU
– Macintosh
– OpenOffice.org
– Windows?
– …
• Allows migration to Common over time
Locale Data Markup Language
• XML "vocabulary" for locale data interchange
• Data stored in separate files (fr.xml or cs_CZ.xml)
• Inheritance used: ‘root.xml’ root locale, ‘fr.xml’ for French, ‘fr_CA.xml’ for French, Canada
• ldml-spec.html
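The file-per-locale layout with inheritance implies a lookup order from the most specific file down to root.xml. A minimal sketch, assuming the chain is obtained simply by dropping trailing "_" fields from the locale ID; the function name is hypothetical and not part of LDML.

```python
def inheritance_chain(locale_id):
    """List the locale files to consult, most specific first."""
    if locale_id == "root":
        return ["root.xml"]
    parts = locale_id.split("_")
    # Drop one trailing field at a time: fr_CA -> fr
    chain = ["_".join(parts[:n]) + ".xml" for n in range(len(parts), 0, -1)]
    chain.append("root.xml")
    return chain


print(inheritance_chain("fr_CA"))
```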
Locale Naming
• ISO-639 + ISO-15924 + ISO-3166 + Variant:
– en: English
– fr_BE: French as in Belgium
– zh_Hant: Traditional Chinese
– sv_FI_AALAND: Swedish as in Finland (Åland)
• or RFC-3066
• with Keywords:
– de_DE@collation=phonebook: German as in Germany, phonebook collation
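The naming scheme above can be split into its fields mechanically. The following is a rough sketch rather than a conforming parser: it assumes a four-letter field is an ISO-15924 script, a two-letter field is an ISO-3166 territory, anything left over is the variant, and multiple keywords are separated by ";" (the separator is an assumption; the slide shows only one keyword).

```python
def parse_locale(locale_id):
    """Split a locale ID into language, script, territory, variant, keywords."""
    base, _, keyword_text = locale_id.partition("@")
    fields = base.split("_")
    result = {
        "language": fields[0],
        "script": None,
        "territory": None,
        "variant": None,
        "keywords": {},
    }
    rest = fields[1:]
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0)
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["territory"] = rest.pop(0)
    if rest:
        result["variant"] = "_".join(rest)
    if keyword_text:
        for pair in keyword_text.split(";"):  # ";" separator is assumed
            key, _, value = pair.partition("=")
            result["keywords"][key] = value
    return result


for example in ("en", "fr_BE", "zh_Hant", "sv_FI_AALAND",
                "de_DE@collation=phonebook"):
    print(example, parse_locale(example))
```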
Locale vs. Language IDs
• In practice, immaterial:
– touchstone: what would copy-editors say?
A. "Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."
B. "Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."
Overall Structure
• <identity> • <localeDisplayNames> • <layout> • <characters> • <delimiters> • <measurement>
• <dates> • <calendars> • <timeZoneNames> • <numbers> • <currencies> • <collations>
NOTE
• Timezones are not represented;
– names of timezones are.
• Currencies are not represented;
– names/symbols are.
Non-localized Data
• In supplemental data
• Example: Currency
<region iso3166="R">
  <currency iso4217="C01" before="1942"/>
  <currency iso4217="C02"/>
  <currency iso4217="C03" before="1927"/>
  <currency iso4217="none" before="1937-02-13"/>
</region>
• Timezones: Olson Data
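The supplemental currency example can be read with any XML parser. The sketch below uses Python's standard ElementTree on the slide's placeholder data and assumes that a <currency> element with no "before" attribute is the one currently in use; that reading of the attribute is an interpretation, not something the slide states.

```python
import xml.etree.ElementTree as ET

# Placeholder data copied from the slide (region "R", currencies C01..C03).
SUPPLEMENTAL = """
<region iso3166="R">
  <currency iso4217="C01" before="1942"/>
  <currency iso4217="C02"/>
  <currency iso4217="C03" before="1927"/>
  <currency iso4217="none" before="1937-02-13"/>
</region>
"""


def currency_history(xml_text):
    """Return (iso4217 code, 'before' value or None) for each entry."""
    region = ET.fromstring(xml_text)
    return [(c.get("iso4217"), c.get("before"))
            for c in region.findall("currency")]


def current_currencies(xml_text):
    """Codes with no 'before' attribute (assumed to mean still in use)."""
    return [code for code, before in currency_history(xml_text)
            if before is None]


print(current_currencies(SUPPLEMENTAL))
```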
Inheritance
fr
• Janvier, Février…
• 1,234.56 ¤
• …
fr_CA
• 1 234,57 $
• …
fr_LX
• 1.234,57 €
• …
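Inheritance means a child locale stores only what differs from its parent, and anything missing is looked up in the parent and finally in root. A minimal sketch with in-memory dictionaries standing in for the XML files; the key names and the sample values are illustrative, loosely following the slide.

```python
# Illustrative data: each locale holds only its own overrides.
LOCALE_DATA = {
    "root": {"currencySymbol": "¤"},
    "fr": {"months": ["Janvier", "Février"], "decimalSeparator": ","},
    "fr_CA": {"currencySymbol": "$"},
}


def lookup(locale_id, key):
    """Resolve a key by walking from the locale up to root."""
    current = locale_id
    while True:
        data = LOCALE_DATA.get(current, {})
        if key in data:
            return data[key]
        if current == "root":
            raise KeyError(key)
        # Drop the last "_" field; an empty result means the parent is root.
        current = current.rpartition("_")[0] or "root"


print(lookup("fr_CA", "months"))          # inherited from fr
print(lookup("fr_CA", "currencySymbol"))  # overridden in fr_CA
print(lookup("fr", "currencySymbol"))     # inherited from root
```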
<alias> element
<localeData>
  <identity>
    <language type="zh"/>
    <territory type="HK"/>
  </identity>
  <collations>
    <alias source="zh_TW"/>
  </collations>
</localeData>
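An <alias> tells the reader to take a section from another locale instead. The sketch below shows one way a consumer might follow such a redirect; the zh_TW content and the `resolve_section` helper are hypothetical, and only the zh_HK structure comes from the slide.

```python
import xml.etree.ElementTree as ET

# zh_HK follows the slide; the zh_TW content is invented for illustration.
LOCALES = {
    "zh_HK": """<localeData>
                  <identity><language type="zh"/><territory type="HK"/></identity>
                  <collations><alias source="zh_TW"/></collations>
                </localeData>""",
    "zh_TW": """<localeData>
                  <identity><language type="zh"/><territory type="TW"/></identity>
                  <collations><collation type="stroke"/></collations>
                </localeData>""",
}


def resolve_section(locale_id, section, seen=None):
    """Return the named section element, following <alias source=...>."""
    seen = set() if seen is None else seen
    if locale_id in seen:
        raise ValueError("alias cycle at " + locale_id)
    seen.add(locale_id)
    element = ET.fromstring(LOCALES[locale_id]).find(section)
    if element is None:
        return None
    alias = element.find("alias")
    if alias is not None:
        return resolve_section(alias.get("source"), section, seen)
    return element


collations = resolve_section("zh_HK", "collations")
print([c.get("type") for c in collations.findall("collation")])
```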
type attribute
<numberFormatStyle type="decimal">
1 234,57
<numberFormatStyle type="percent">
123%
cs_CZ
type attribute in Locale
<numberFormatStyle type="percent">
123%
cs_CZ@numberFormatStyle=percent
Standard Keys/Types
• Collation
– Traditional, Pinyin, Stroke, Direct (Hindi), posix
• Calendar
– Gregorian, Arabic (Religious and Civil), Chinese, Hebrew, Japanese, Thai (Buddhist)
draft and standard
• Unverified data may be marked with draft=true
<localeData draft="true">
• Standard-conforming data may be marked with standard=…
– Name: <collation standard="MSA 200:2002">
– URL: <dateFormatStyle standard="ISO 8601, http://www.iso.ch/iso/…CatalogueDetail?…ICS3=30,DIN 5008">
Data Access
• Normal HTTP request
http://openi18n.org/locale/icu/de_DE.xml?version=2.2&currency=pre-euro
• Accessible by web browser or programmatically.
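For programmatic access, a request URL of the shape shown on this slide can be assembled with the standard library. This sketch only builds the URL and does not fetch it; the path layout and the query parameter names (`version`, and `currency` as the apparent second parameter) are read off the single example URL, and `locale_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://openi18n.org/locale"  # host and path taken from the slide


def locale_url(platform, locale_id, **params):
    """Build the URL for one locale file, with optional query parameters."""
    url = f"{BASE_URL}/{platform}/{locale_id}.xml"
    if params:
        url += "?" + urlencode(params)
    return url


print(locale_url("icu", "de_DE", version="2.2", currency="pre-euro"))
```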
Comparison Data
• en_US.html
• de_DE.html
• ar_EG.html
Open Issues
• Vetting process being defined
• Versioning and release of Repository not finalized
Current Status
• LDML 1.0 Specification released and approved by the OpenI18N steering committee
• Alpha 1.0 common data released
• Database available for reporting bugs or feature requests