Adam Goucher: I18N and L10N

www.jonahgroup.com [email protected] (416) 304-0860 I18N & L10N a technical primer Adam Goucher Senior Quality Specialist, Jonah Group http://www.jonahgroup.com http://adam.goucher.ca

Upload: adam-goucher

Post on 16-Jan-2015

DESCRIPTION

Slides from my recent presentation on I18N and L10N at GLSEC 2007

TRANSCRIPT

Page 1: Adam Goucher   I18n And L10n

www.jonahgroup.com [email protected] (416) 304-0860

I18N & L10N
a technical primer

Adam Goucher
Senior Quality Specialist, Jonah Group

http://www.jonahgroup.com
http://adam.goucher.ca

Page 2: Adam Goucher   I18n And L10n

Definitions

Internationalization (I + 18 chars + N = I18N)
• Your application can accept, store, manipulate, retrieve and display text in the user’s native language

Localization (L + 10 chars + N = L10N)
• Your application looks as if it was designed for the locale it is being used in
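
The letter-counting rule behind both abbreviations can be checked with a few lines of Python. This is only an illustrative sketch; the function name `numeronym` is made up for this example and is not from the slides.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of middle letters + last letter."""
    if len(word) < 3:
        return word.upper()  # too short to have any middle letters
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()


print(numeronym("internationalization"))  # I18N
print(numeronym("localization"))          # L10N
```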

Page 3: Adam Goucher   I18n And L10n

The Problem

English is the native language of only about 30% of the Internet’s population.

To not alienate the other 70% of your potential customers, you need to worry about I18N and L10N.

Page 4: Adam Goucher   I18n And L10n

Don’t worry

I18N and L10N are technical problems, not linguistic ones.

Programmers and testers know how to solve technical problems.

Translation is the linguistic problem. Translators know how to solve linguistic problems.

Page 5: Adam Goucher   I18n And L10n

Unicode

Unicode [1] provides a unique number:
• for every character
• no matter what the platform
• no matter what the program
• no matter what the language

There are a number of ways (called encodings) to represent a Unicode code point (roughly, a single character) as bytes
• UTF-8 [2] is a variable-length encoding built from 8-bit units (one to four bytes per code point)
• UTF-8 is the de facto standard

[1] http://www.unicode.org
[2] http://en.wikipedia.org/wiki/UTF-8
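
To make the difference between a code point and its encoding concrete, here is a small Python sketch. The sample characters and the helper name `describe` are chosen for illustration only. It shows that each character has one code point, while its UTF-8 form takes between one and four bytes.

```python
# Sample characters written as escapes so the source file stays ASCII:
# LATIN CAPITAL A, e-acute, EURO SIGN, and an emoji outside the BMP.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def describe(ch: str) -> tuple[str, int]:
    """Return the U+XXXX label of a character and its UTF-8 length in bytes."""
    return f"U+{ord(ch):04X}", len(ch.encode("utf-8"))


for ch in SAMPLES:
    label, size = describe(ch)
    # Print the code point, the byte count, and the raw bytes in hex.
    print(label, size, ch.encode("utf-8").hex())

# Encoding and decoding with the same encoding returns the original text.
text = "".join(SAMPLES)
assert text.encode("utf-8").decode("utf-8") == text
```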

Page 6: Adam Goucher   I18n And L10n

Resource Bundles

One of the harder things to get right is the string data embedded in your source code.

The easiest solution is to use resource bundles (locale-specific collections of string data).
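
As a rough illustration of the idea, a resource bundle can be modelled as one key-to-string mapping per locale, with a fallback when a locale lacks a key. The Python sketch below is a minimal stand-in, not the API of any particular framework. The names `BUNDLES`, `DEFAULT_LOCALE` and `lookup`, and the sample keys and strings, are assumptions made for this example.

```python
# One dictionary of display strings per locale (hypothetical sample data).
BUNDLES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "fr": {"greeting": "Bonjour"},
}
DEFAULT_LOCALE = "en"


def lookup(key: str, locale: str) -> str:
    """Find key in the locale's bundle, then its language, then the default."""
    language = locale.split("_")[0]  # "fr_CA" -> "fr"
    for candidate in (locale, language, DEFAULT_LOCALE):
        bundle = BUNDLES.get(candidate, {})
        if key in bundle:
            return bundle[key]
    raise KeyError(f"no resource for key {key!r}")


print(lookup("greeting", "fr_CA"))  # Bonjour (falls back from fr_CA to fr)
print(lookup("farewell", "fr"))     # Goodbye (falls back to the default locale)
```

The code only ever refers to keys such as `greeting`; the text itself lives in the bundles, where translators can work on it without touching the source.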

Page 7: Adam Goucher   I18n And L10n

String Rules

Like most tools, resource bundles can make your life difficult if used incorrectly.

• Do not build display strings by concatenating fragments. Concatenation makes translation harder because it removes context

• Include all punctuation in bundle content; otherwise the translated words can be correct while the punctuation is wrong

• Include formatting in bundle content
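
The rules above can be illustrated with a short Python sketch. It assumes a dictionary-based bundle and a made-up helper called `render`; the keys and sample sentences are illustrative only. Each bundle entry is a complete sentence with named placeholders and its own punctuation, so a translator sees the full context and can reorder the parts.

```python
# Anti-pattern, shown only as a comment: the sentence is assembled in code,
# so the translator sees fragments and the word order is fixed.
#   message = user + " deleted " + str(count) + " files" + "."

# Preferred: one complete sentence per key, punctuation included.
BUNDLES = {
    "en": {"files_deleted": "{user} deleted {count} files."},
    "de": {"files_deleted": "{user} hat {count} Dateien entfernt."},
}


def render(locale: str, key: str, **values: object) -> str:
    """Fill the named placeholders of a bundle entry."""
    return BUNDLES[locale][key].format(**values)


print(render("en", "files_deleted", user="Anna", count=3))
print(render("de", "files_deleted", user="Anna", count=3))
```

In the German entry the verb moves to the end of the sentence, which concatenation in code could not express. Plural forms (for example a count of 1) need extra handling that this sketch leaves out.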

Page 8: Adam Goucher   I18n And L10n

Resource Bundle Tests

• LOUD [3] to check for string rules
• Resource key not in code
• Resource key in code, but not bundle
• Key present in (or missing from) different locales

[3] http://adam.goucher.ca/?p=28
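
The three key checks in the list (not LOUD itself) can be sketched as simple set comparisons. The Python below is a hedged illustration under several assumptions: code looks strings up through a hypothetical helper written as `tr("some.key")`, bundles are plain dictionaries, and `en` is the reference locale. None of these names come from the slides.

```python
import re


def keys_used_in_code(source: str) -> set[str]:
    """Collect keys from calls shaped like tr("some.key") (hypothetical helper)."""
    return set(re.findall(r'\btr\(\s*"([^"]+)"\s*\)', source))


def check_bundles(source: str, bundles: dict, reference: str = "en") -> dict:
    """Compare keys used in code, the reference bundle, and the other locales."""
    used = keys_used_in_code(source)
    defined = set(bundles[reference])
    report = {
        "not_in_code": sorted(defined - used),    # key in bundle, never used
        "not_in_bundle": sorted(used - defined),  # key in code, no bundle entry
        "locale_gaps": {},
    }
    for locale, bundle in bundles.items():
        if locale == reference:
            continue
        missing = sorted(defined - set(bundle))
        extra = sorted(set(bundle) - defined)
        if missing or extra:
            report["locale_gaps"][locale] = {"missing": missing, "extra": extra}
    return report


# Hypothetical sample data for the sketch.
SOURCE = 'print(tr("login.title"))\nprint(tr("login.submit"))\n'
SAMPLE_BUNDLES = {
    "en": {"login.title": "Sign in", "login.help": "Need help?"},
    "fr": {"login.title": "Connexion", "login.legacy": "Ancien"},
}
print(check_bundles(SOURCE, SAMPLE_BUNDLES))
```

A check like this is cheap enough to run on every build, failing the build when any of the three lists is non-empty.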

Page 9: Adam Goucher   I18n And L10n

Other areas

I18N and L10N are a huge topic. Some of what has not been discussed:

• Date / Time
• Numbers
• Currency
• Username / Password conventions
• Postal / Zip Codes
• Paper size (when printing)

Page 10: Adam Goucher   I18n And L10n

Testing Advice

• Test your application’s I18N and L10N early to avoid having to re-test everything.

• Include as many checks as possible during the build process

• Beta test translations with friendly customers

Page 11: Adam Goucher   I18n And L10n

Summary

• This is a technical problem, not a linguistic one
• Use UTF-8 everywhere you can
• Use resource bundles instead of putting literal strings in the code
• Learn about the nuances of your target locales
• Test early