24th Internationalization and Unicode Conference, Atlanta, GA USA – Sept 2003

Common XML Locale Repository

Dr. Mark Davis


Steven R. Loomis


IBM San José Globalization Center of Competency

Copyright © 2003 IBM Corporation


Locale Data Confusion

Variations in localized data can irritate or confuse users…

OS #1: 2003-02-17 (févr.)

OS #2: 03-02-17 (fév)


Locale Data Problems

Mismatched data can be catastrophic…

OS #1: 10 records in {Z..Aa}

OS #2: 0 records in {Z..Aa}
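
One way such a mismatch can arise is that two systems order the same strings differently, so a range query that is non-empty on one is empty on the other. The sketch below is only an illustration under stated assumptions: `danish_like_key` is a hypothetical rule, loosely modeled on Danish, that sorts "aa" after "z", and the record names are invented sample data, not taken from the slide.

```python
def codepoint_key(s):
    # Case-insensitive code point order: "aa" sorts before "z".
    return s.lower()


def danish_like_key(s):
    # Hypothetical tailoring: treat "aa" as one letter that sorts after "z".
    # "{" is simply the code point that follows "z" in ASCII.
    return s.lower().replace("aa", "{")


def records_in_range(records, low, high, key):
    # Return the records whose sort key lies between key(low) and key(high).
    lo, hi = key(low), key(high)
    return [r for r in records if lo <= key(r) <= hi]


# Invented sample records.
RECORDS = ["Aa", "Aabenraa", "Zachariassen", "Zoo"]

in_codepoint_order = records_in_range(RECORDS, "Z", "Aa", codepoint_key)
in_tailored_order = records_in_range(RECORDS, "Z", "Aa", danish_like_key)
```

With code point ordering the lower bound sorts after the upper bound, so the range {Z..Aa} cannot match anything; with the tailored ordering the same query returns records.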


What is Locale Data?

• Locale = identifier string referring to linguistic and cultural preferences

• Typical data

– Dates/times

– Numbers

– Measurement

– Currency

– Sorting (Collation)

– Translated country and language names


Where is locale data found?

• International Components for Unicode (ICU)

• OpenOffice.org

• Operating Systems

– Linux, Solaris, AIX, Windows, …

• Java

• Other vendors: PeopleSoft, Oracle, …


Team

• Li18nux is now OpenI18N (part of the Free Standards Group)

– Linux Application Development Environment subgroup

• Common XML Locale Repository project

http://www.openi18n.org/subgroups/lade/locale/



Repository Objectives

• Common XML format for locale data

• Collect data from platforms

• Make repository available to the public

• Validate and release corrected data

• Enable W3C Web Services

– Exchange and display of data in localized form


Repository Features

• Version-controlled database

• HTTP-based: browsing or custom tools

• Compare data between platforms

– (Comparisons available now)


Repository Structure

• Contents

– Common

– ICU

– Macintosh

– OpenOffice.org

– Windows?

– …

• Allows migration to Common over time


Locale Data Markup Language

• XML "vocabulary" for locale data interchange

• Data stored in separate files (fr.xml or cs_CZ.xml)

• Inheritance used: ‘root.xml’ for the root locale, ‘fr.xml’ for French, ‘fr_CA.xml’ for French (Canada)

• ldml-spec.html


Locale Naming

• ISO-639 + ISO-15924 + ISO-3166 + Variant:

en: English

fr_BE: French as in Belgium

zh_Hant: Traditional Chinese

sv_FI_AALAND: Swedish as in Finland (Åland)

• or RFC-3066

• with Keywords:

de_DE@collation=phonebook: German as in Germany, phonebook collation
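
The naming scheme above can be sketched as a small parser. This is a minimal illustration, not the project's own code. It assumes a four-letter field is a script code, a two-letter field is a territory code, anything left over is the variant, and multiple keywords are separated by ";". The function name `parse_locale_id` is hypothetical.

```python
def parse_locale_id(locale_id):
    """Split an ID such as 'de_DE@collation=phonebook' into its parts."""
    base, _, keyword_part = locale_id.partition("@")

    # Keywords follow "@" as key=value pairs (";" separator is an assumption).
    keywords = {}
    if keyword_part:
        for pair in keyword_part.split(";"):
            key, _, value = pair.partition("=")
            keywords[key] = value

    fields = base.split("_")
    result = {
        "language": fields[0],
        "script": None,
        "territory": None,
        "variant": None,
        "keywords": keywords,
    }
    rest = fields[1:]
    # Assumption: a four-letter alphabetic field is an ISO 15924 script code.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0)
    # Assumption: a two-letter alphabetic field is an ISO 3166 territory code.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["territory"] = rest.pop(0)
    # Whatever remains is treated as the variant.
    if rest:
        result["variant"] = "_".join(rest)
    return result
```

For example, `parse_locale_id("sv_FI_AALAND")` yields language `sv`, territory `FI`, and variant `AALAND`, with no script and no keywords.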


Locale vs. Language IDs

• In practice, immaterial:

– touchstone: what would copy-editors say?

A. "Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."

B. "Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."


Overall Structure

• <identity> • <localeDisplayNames> • <layout> • <characters> • <delimiters> • <measurement>

• <dates> • <calendars> • <timeZoneNames> • <numbers> • <currencies> • <collations>


NOTE

• Timezones are not represented;

– names of timezones are.

• Currencies are not represented;

– names/symbols are.


Non-localized Data

• In supplemental data

• Example: Currency

<region iso3166="R">
  <currency iso4217="C01" before="1942"/>
  <currency iso4217="C02"/>
  <currency iso4217="C03" before="1927"/>
  <currency iso4217="none" before="1937-02-13"/>
</region>

• Timezones: Olson Data
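
As a rough sketch of how the supplemental currency example could be consumed, the code below reads the `<region>` fragment with Python's standard `xml.etree.ElementTree`. The helper names are hypothetical, and treating an entry without a `before` attribute as the one still in use is an assumption, since the slide does not spell out the semantics.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide; "R", "C01", etc. are the slide's placeholders.
SUPPLEMENTAL = """
<region iso3166="R">
  <currency iso4217="C01" before="1942"/>
  <currency iso4217="C02"/>
  <currency iso4217="C03" before="1927"/>
  <currency iso4217="none" before="1937-02-13"/>
</region>
"""


def currency_entries(xml_text):
    # Return (code, before) pairs in document order; before is None if absent.
    region = ET.fromstring(xml_text)
    return [(c.get("iso4217"), c.get("before")) for c in region.findall("currency")]


def open_ended_currencies(xml_text):
    # Assumption: no "before" attribute means the currency is still in use.
    return [code for code, before in currency_entries(xml_text) if before is None]
```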


Inheritance

fr

• Janvier, Février…

• 1,234.56 ¤

• …

fr_CA

• 1 234,57 $

• …

fr_LX

• 1.234,57 €

• …
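
The inheritance idea can be sketched as a lookup that walks from the most specific locale toward the root. This is an illustrative model only: the dictionary stands in for the XML files, the keys `months` and `sample_currency` are hypothetical names, and the sample values are copied from the slide.

```python
def fallback_chain(locale_id):
    # "fr_CA" -> ["fr_CA", "fr", "root"]
    parts = locale_id.split("_")
    chain = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


def lookup(data, locale_id, key):
    # Return (value, locale that supplied it), searching child before parent.
    for name in fallback_chain(locale_id):
        values = data.get(name, {})
        if key in values:
            return values[key], name
    raise KeyError(key)


# Stand-in for the contents of root.xml, fr.xml and fr_CA.xml.
DATA = {
    "root": {},
    "fr": {"months": ["Janvier", "Février"], "sample_currency": "1,234.56 ¤"},
    "fr_CA": {"sample_currency": "1 234,57 $"},
}
```

Here `lookup(DATA, "fr_CA", "months")` is answered by `fr`, because `fr_CA` only overrides the currency sample.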


Aliasing

ru

ru_RU

mk

Collation    Collation


<alias> element

<localeData>
  <identity>
    <language type="zh"/>
    <territory type="HK"/>
  </identity>
  <collations>
    <alias source="zh_TW"/>
  </collations>
</localeData>
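
A minimal sketch of how a reader of such a file might detect an alias, using Python's standard `xml.etree.ElementTree`. The function name `alias_source` is hypothetical, and the embedded sample writes the closing tag as `</collations>` so that the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

# The zh_HK example, with straight quotes and matching tags.
ZH_HK = """<localeData>
  <identity>
    <language type="zh"/>
    <territory type="HK"/>
  </identity>
  <collations>
    <alias source="zh_TW"/>
  </collations>
</localeData>"""


def alias_source(xml_text, section):
    # Return the locale a section is aliased to, or None if it is not aliased.
    root = ET.fromstring(xml_text)
    node = root.find(section)
    if node is None:
        return None
    alias = node.find("alias")
    return alias.get("source") if alias is not None else None
```

A caller that gets `"zh_TW"` back for `collations` would then load the collation data from the `zh_TW` file instead.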


type attribute

<numberFormatStyle type="decimal">

1 234,57

<numberFormatStyle type="percent">

123%

cs_CZ


type attribute in Locale

<numberFormatStyle type="percent">

123%

cs_CZ@numberFormatStyle=percent


Standard Keys/Types

• Collation: Traditional, Pinyin, Stroke, Direct (Hindi), posix

• Calendar: Gregorian, Arabic (Religious and Civil), Chinese, Hebrew, Japanese, Thai (Buddhist)


draft and standard

• Unverified data may be marked with draft=true

<localeData draft="true">

• Standard-conforming data may be marked with standard=…

– Name: <collation standard="MSA 200:2002">

– URL: <dateFormatStyle standard="ISO 8601, http://www.iso.ch/iso/…CatalogueDetail?…ICS3=30,DIN 5008">
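
A consumer could use these attributes to separate unverified data from the rest. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper names and the nesting of the sample document are hypothetical, and the sketch checks the attribute only on the element that carries it, since the slide does not say whether `draft` applies to child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the attribute names and values come from the slide.
SAMPLE = """<localeData>
  <collations>
    <collation standard="MSA 200:2002"/>
  </collations>
  <numbers draft="true"/>
</localeData>"""


def draft_elements(xml_text):
    # Tags of elements explicitly marked draft="true".
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if el.get("draft") == "true"]


def declared_standards(xml_text):
    # Map each tag carrying a "standard" attribute to that attribute's value.
    root = ET.fromstring(xml_text)
    return {el.tag: el.get("standard") for el in root.iter() if el.get("standard")}
```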


Data Access

• Normal HTTP request

http://openi18n.org/locale/icu/de_DE.xml?version=2.2&currency=pre-euro

• Accessible by web browser or programmatically.
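
For programmatic access, a request URL of the shape shown above could be assembled as follows. This is a sketch under assumptions: the path layout (`/locale/<platform>/<locale>.xml`) is generalized from the single example URL, the function name `locale_url` is hypothetical, and the code only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Base path taken from the example URL on the slide.
BASE = "http://openi18n.org/locale"


def locale_url(platform, locale_id, **params):
    # Build the URL for one locale file, with optional query parameters.
    url = f"{BASE}/{platform}/{locale_id}.xml"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


example = locale_url("icu", "de_DE", version="2.2", currency="pre-euro")
```

The resulting string can then be passed to any HTTP client, or pasted into a browser.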


XML Format

• en.xml

• ar.xml


Comparison Data

• en_US.html

• de_DE.html

• ar_EG.html


Open Issues

• Vetting process being defined

• Versioning and release of Repository not finalized


Current Status

• LDML 1.0 Specification released and approved by the OpenI18N steering committee

• Alpha 1.0 common data released

• Database available for reporting bugs or feature requests


For More Information

http://www.openi18n.org/subgroups/lade/locale/