Internationalisation and Globalisation

Posted on 15-Nov-2014


DESCRIPTION

This is a very old presentation, but if you gloss over the usage of VB6 there is plenty of value in it. I presented it at the VBUG Annual Conference in 2003.

TRANSCRIPT

Internationalisation and Globalisation

Visual Basic 6

Credit to Kaplan

• “Internationalization with Visual Basic”, Michael S. Kaplan, ISBN 0672319772

Credit to Appleman

• “Visual Basic Programmer’s Guide to the Win32 API”, Dan Appleman, ISBN 0672315904

Outline

• “In a connected world, it is increasingly important to be able to implement solutions for users across the world. Unfortunately, the ability to do this with VB6 is not well documented, requires a lot of effort to understand and is not available 'out of the box'.”

• http://www.unitoolbox.com

Contents

• The following subjects are covered:
    - Characters
    - Keyboards
    - Fonts (very briefly…)
    - Languages
    - Strings
    - Techniques to code an internationalised application

Terminology

Terminology – Contents

• Globalisation

• Internationalisation (i18N)

• Multinationalisation (M18N)

• Translation

• Localisation (L10N)
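
The abbreviations above are numeronyms: the first letter, the number of letters omitted, then the last letter. A minimal Python sketch (the function name `numeronym` is made up for this illustration) shows how each one is derived:

```python
def numeronym(word: str) -> str:
    """Abbreviate as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing to omit
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()


for term in ("internationalisation", "multinationalisation", "localisation"):
    print(term, "->", numeronym(term))
```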

Internationalisation (i18N)

• The process of converting an application to be capable of multinationalisation and localisation

• Culture-specific issues are addressed
    - e.g. conventions, preferences, data formatting

• Depends upon default system or user preferences

• Does not require the translation of the text of an application

Globalisation

• The process of designing and developing an application that supports localized user interfaces and regional data for users in multiple cultures
    - Source: .NET Framework Developer’s Guide

Multinationalisation (M18N)

• The process of converting an application to support multiple cultures

• A significant enhancement of i18N

• Multiple language availability, including crossing the code page barrier
    - e.g. Office 2000 multilanguage packs (langpacks) and Win2000 multilanguage user interface (MUI)

Translation

• The process of representing the text of an application in another language
    - e.g. dialogs, menus, alerts, documentation etc.

• For example, the ‘File|Open’ menu item is translated to ‘Fichier|Ouvrir’ in French
    - Microsoft International Word List

• Converts the meaning and sense of the text, not just the words

Beware Babelfish!

• “Insert the boot disk into Drive A”
    - Translated from English to German using Babelfish: “Setzen Sie die Aufladung Scheibe in Antrieb A ein”, which means “Insert the charge disk in Propulsion A”

• “Legen Sie die Boot Diskette in Laufwerk A ein” is the correct translation

Localisation (L10N)

• The process of converting an application to adhere to the local culture of a user

Terminology - Summary

• Explained some of the general terms used around internationalisation

• Discussed the scope of the terms used

About Characters

About Characters - Contents

• Character Repertoires

• Character Codes & Encoding

• Character Sets
    - ASCII, ANSI, DBCS, Unicode

• Windows Character Set Usage

Character (definition)

• character (noun) … 7. letter or symbol: any written or printed letter, number, or other symbol …
    - Source: Encarta World English Dictionary

Character (alternate definition)

• A character is the atomic unit of textual communication

Character Repertoire

• An abstract set of distinct characters
    - Usually defined by specifying a name and sample presentation of each character
    - The ordering of characters for sorting purposes is not defined
    - Either: Fixed (e.g. English), or Open (e.g. Unicode, Chinese)

Character Repertoire (English)

• The character repertoire of English contains
    - Alphabet: Upper case A ‘A’ … Lower case Z ‘z’
    - Punctuation: Period . Ellipsis … Comma , Semicolon ; Colon : Question Mark ? Exclamation Point ! Quotation Marks “” Parentheses () Apostrophe ’ Hyphen -

Character Repertoires

Character Code

• A mapping between an unsigned integer and a character
    - e.g. 65 = ‘A’

• The VB functions Chr$(…) and Asc(…) address this mapping
    - e.g. Chr$(65) returns "A"
    - e.g. Asc("A") returns 65
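
For readers without VB6 to hand, the same two-way mapping can be sketched in Python, where `chr` and `ord` play roughly the roles of `Chr$` and `Asc`. The variable names are illustrative only, and Python works with Unicode code points rather than the current ANSI code page, so the two agree only for values such as 65 that are shared by both.

```python
# Integer -> character, comparable to Chr$(65) in VB
char = chr(65)

# Character -> integer, comparable to Asc("A") in VB
code = ord("A")

print(char, code)
```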

Character Encoding

• The process of collating code points by assigning an unsigned integer to each character in a repertoire

• The output of encoding is a character set

• The values assigned imply ordering of the character set, but the ordering may not be meaningful

Character Set

• An encoded character repertoire

• There are a large number of character sets

• Character sets are not language specific
    - e.g. Latin Alphabet No.1 (ISO 8859-1)

ASCII Character Set

ANSI Character Sets

Double-byte Character Sets (DBCS)

• aka MBCS (Multi-byte character set)
    - Because the first 128 characters are single-byte encoded, as in ANSI
    - Additional characters are double-byte encoded

• Double-byte encoding
    - The first (or ‘lead’) byte signals that both itself and the next byte are to be interpreted as a single character
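
The lead-byte rule can be sketched in Python using code page 932 (Japanese Shift-JIS) as the example DBCS. The helper names are hypothetical, and the lead-byte ranges are those commonly documented for code page 932; treat this as an illustration rather than a general DBCS parser.

```python
# Lead-byte ranges for code page 932 (assumed here; other DBCS code pages differ)
LEAD_BYTE_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(value: int) -> bool:
    """True if this byte starts a two-byte character."""
    return any(low <= value <= high for low, high in LEAD_BYTE_RANGES)


def count_characters(data: bytes) -> int:
    """Count characters by stepping 2 bytes after a lead byte, else 1."""
    count = 0
    index = 0
    while index < len(data):
        index += 2 if is_lead_byte(data[index]) else 1
        count += 1
    return count


encoded = "Aあ".encode("cp932")  # 'A' takes one byte, 'あ' takes two
print(len(encoded), count_characters(encoded))
```

The byte length and the character count differ, which is why byte-oriented string handling goes wrong on DBCS text.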

Double-byte character

DBCS Example

Unicode Character Set

• All characters are double-byte encoded (as far as Windows is concerned anyway: UCS-2/UTF-16)

• Although DBCS and Unicode both use double-byte encoding, the mapping differs

• All characters in the Unicode character set are given a unique value
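
A short Python comparison, offered as a sketch, of how the same two characters are encoded in UTF-16 and in a DBCS code page (932 is picked arbitrarily). It shows that every character here takes two bytes in UTF-16 and that the byte values differ from the DBCS ones. Characters outside the Basic Multilingual Plane need a surrogate pair in UTF-16, which this example does not cover.

```python
text = "Aあ"

utf16 = text.encode("utf-16-le")  # two bytes per character, little-endian
dbcs = text.encode("cp932")       # one byte for 'A', two for 'あ'

print(utf16.hex(" "))
print(dbcs.hex(" "))
```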

Character Set Comparison

Character Repertoires Revisited

Windows Character Set Usage

• 16-bit Windows uses ANSI character sets
    - Known as Code Pages

• 32-bit Windows uses Unicode

Windows Code Page

• A table of 256(+) code points for a language
    - First 128 code points are the same (the ASCII table of non-printing and English characters)
    - Next 128(+) are used for non-English characters needed by the language

• Based on ANSI character sets
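
To illustrate the two halves of a code page, the Python sketch below decodes the same byte under two Windows code pages (1252 Western European and 1251 Cyrillic, chosen only as examples) and checks that the first 128 code points agree.

```python
upper_byte = bytes([0xE0])

western = upper_byte.decode("cp1252")   # Latin small letter a with grave
cyrillic = upper_byte.decode("cp1251")  # Cyrillic small letter a

# The lower half (0 to 127) decodes identically in both code pages
lower_half = bytes(range(128))
same_lower_half = lower_half.decode("cp1252") == lower_half.decode("cp1251")

print(western, cyrillic, same_lower_half)
```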

About Characters - Summary

• Explained how characters are gathered into repertoires, and are then encoded into character sets

• Described the main character sets supported by Windows

About Keyboards

About Keyboards - Contents

• Scan Codes

• Keyboard Layouts

• Virtual Keys

Scan Code

• A hardware-dependent code sent by a keyboard to indicate a keyboard operation

• Scan codes can vary between different keyboards

Keyboard Layout

• A definition of the scan codes supported by a keyboard
    - Win3.x has a system-wide layout
    - Win9x and WinNT support multiple layouts on a system-wide and per-thread basis

Virtual Key

• An abstraction of scan codes, so that interpretation of input need not be hardware-specific

• API constants exist with a VK_ prefix
    - e.g. VK_RETURN

From Key to Character

Keyboard limitations

• Keyboards are an effective data entry method for most languages

• However, there are no keyboards for character-based languages because there are no keyboards with thousands of keys…
    - i.e. Far East languages (also known as Chinese/Japanese/Korean, or CJK languages)

Input Method Editor (IME)

• Software to allow the input of CJK characters
    - A group that approximates a character is selected
    - An actual character can then be selected from the group

• Run by the Input Method Manager (IMM)

Japanese IME

About Keyboards - Summary

• Explained how keystrokes become characters

• Briefly discussed non-keyboard input

About Fonts

About Fonts - Contents

• Character-based systems

• Graphic-based systems

• Glyphs & Fonts

Character-based Systems

• Such systems display characters only

Graphic-based Systems

• Such systems display glyphs, not characters

Glyph

• A glyph is a graphical representation of a character

Font

• A collection of glyphs

About Fonts - Summary

• Discussed the difference between character-based and graphic-based systems

• Briefly discussed the representation of characters by glyphs and fonts

About Languages

About Languages - Contents

• Languages

• Locales

Language (definition)

• language (noun) 1. speech of group: the speech of a country, region, or group of people, including its diction, syntax, and grammar …
    - Source: Encarta World English Dictionary

Locale

• A specific international market where a target user is working

• Encompasses localisation issues:
    - e.g. conventions, culture, language, preferences
    - including formatting of numbers, currencies, etc.
    - phraseology can vary also
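
As a small illustration of locale-dependent number formatting, the Python sketch below formats one value with two sets of separators. The `SEPARATORS` table and `format_number` helper are hand-written assumptions for this example; a real application would take these settings from the operating system rather than hard-code them.

```python
# Hypothetical table: (grouping separator, decimal separator) per locale name
SEPARATORS = {
    "en-GB": (",", "."),
    "de-DE": (".", ","),
}


def format_number(value: float, locale_name: str) -> str:
    """Format with two decimals using the separators of the given locale."""
    group, decimal = SEPARATORS[locale_name]
    text = f"{value:,.2f}"  # e.g. 1,234,567.89
    # Swap via a placeholder so the two replacements do not collide
    return text.replace(",", "\0").replace(".", decimal).replace("\0", group)


print(format_number(1234567.89, "en-GB"))
print(format_number(1234567.89, "de-DE"))
```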

Locale Identifier (LCID)

• A 32-bit unsigned integer that identifies the locale for the system or thread

• Commonly pronounced el-sid

LCID Structure

LCID Language

• Language Identifier
    - A combination of the primary and secondary language identifiers

• Primary Language Identifier
    - Represents the language itself (e.g. ‘English’)

• Secondary Language Identifier
    - Represents the country or region where the language is spoken (e.g. ‘English as spoken in the United Kingdom’)

LCID Sorting

• Sort Identifier
    - Represents the order in which characters are to be sorted (usually the default)

• Sort Version
    - Currently unused (it is reserved and must be set to 0)
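
The LCID fields described above can be pulled apart with a few bit operations. The sketch below is in Python and assumes the layout given in the Win32 documentation (low 16 bits for the language identifier, of which 10 bits are the primary language and 6 the secondary language, then 4 bits of sort identifier, with the remaining upper bits reserved). The function name `split_lcid` is hypothetical.

```python
def split_lcid(lcid: int) -> dict:
    """Break a 32-bit LCID into its component fields."""
    language_id = lcid & 0xFFFF
    return {
        "language_id": language_id,
        "primary_language": language_id & 0x3FF,  # low 10 bits
        "secondary_language": language_id >> 10,  # next 6 bits
        "sort_id": (lcid >> 16) & 0xF,            # 4 bits
        "reserved": lcid >> 20,                   # should be 0
    }


# 0x0809 is English as spoken in the United Kingdom
parts = split_lcid(0x0809)
print(parts)
```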

Locale Coverage

• Windows does not have locales for all possible language / region combinations
    - In fact, almost without exception, a locale is only supported if there is a country or region that speaks the language
    - For example, there is no locale for Esperanto, Coptic or Latin, and certainly not for Klingon!

Locale Usage

• Settings associated with Locales are heavily used by Windows, COM and VB
    - So, the current Locale fundamentally affects the processing of information on a system

• Settings are accessed by the Regional Options control panel

About Languages - Summary

• Discussed the relationship between languages and locales

• Explained the structure of the locale identifier

About Strings

About Strings - Contents

• C Strings

• VB Strings

• VB String calls to COM and Win32 API functions

String

• An array of characters

• Not a primitive datatype

• A number of string datatypes exist
    - e.g. LPSTR, BSTR, etc.

Pointer to String (LPSTR)

• C datatype

• Null-terminated

• Used extensively throughout the Windows API

Basic String (BSTR)

• COM datatype, used by VB internally

• Unicode pointer to a block of memory prefixed by a length encoding representing the size of the string
    - A contract for creation (allocation)
    - A contract for destruction (deallocation)
    - An API
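
The memory layout behind a BSTR can be mimicked in Python with `struct`. This is only a sketch of the bytes involved (a 4-byte length prefix holding the byte count, the UTF-16 data, then a 2-byte null terminator). It does not allocate a real BSTR, which must go through the COM allocation API, and the function name `bstr_layout` is hypothetical.

```python
import struct


def bstr_layout(text: str) -> bytes:
    """Return the bytes of a BSTR-style block: length prefix, data, terminator."""
    data = text.encode("utf-16-le")
    prefix = struct.pack("<I", len(data))  # length in bytes, excluding terminator
    return prefix + data + b"\x00\x00"


block = bstr_layout("Hi")
print(block.hex(" "))
```

In a real BSTR the pointer handed to callers addresses the first character, so the length prefix sits just before the address the caller sees.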

VB COM Calls

• Both VB and COM use Unicode, so strings are not transposed into alternate character sets

VB Win32 API Calls

• Character encoding
    - VB and WinNT use Unicode encoding, but Win9x uses ANSI encoding

• Unfortunately VB does not know the encoding expected on the target API call
    - Strings are therefore encoded as ANSI
    - Thus the call succeeds on both Win9x and WinNT, but this is wasteful on WinNT…
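
The cost of the ANSI detour is more than wasted conversions: characters outside the current code page do not survive it. The Python sketch below imitates the round trip with code page 1252. The function name is hypothetical, and Python's `errors="replace"` is an assumption standing in for the Windows conversion, whose substitution rules may differ.

```python
def ansi_round_trip(text: str, code_page: str = "cp1252") -> str:
    """Convert Unicode -> ANSI -> Unicode, as happens around an ANSI API call."""
    ansi = text.encode(code_page, errors="replace")  # unmappable chars become '?'
    return ansi.decode(code_page)


print(ansi_round_trip("Fichier"))  # survives unchanged
print(ansi_round_trip("\u03a9"))   # Greek capital omega is not in code page 1252
```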

VB Win9x API Call

VB WinNT API Call

VB WinNT API Call (Unicode)

About Strings - Summary

• Discussed C and VB strings

• Explained how COM and Win32 API string function calls are transacted

An Internationalised App

1.0.1

• ‘Plain vanilla’ VB Standard EXE

2.0.2

• 1st attempt to internationalise
    - Addition of resource file

2.1.2

• 2nd attempt to internationalise
    - Isolate persistent strings

2.2.2

• 3rd attempt to internationalise
    - Parameterise resource strings
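
Parameterising resource strings means keeping placeholders inside the stored string so that each translation can place the values where its grammar needs them. A minimal Python sketch, in which the `RESOURCES` table, the string identifier and both sample strings are invented for illustration:

```python
# Hypothetical string table keyed by language, then by string identifier
RESOURCES = {
    "en": {"FILES_COPIED": "Copied {count} files to {folder}"},
    "de": {"FILES_COPIED": "{count} Dateien wurden nach {folder} kopiert"},
}


def load_string(language: str, string_id: str, **params) -> str:
    """Fetch the template for a language and substitute named parameters."""
    return RESOURCES[language][string_id].format(**params)


print(load_string("en", "FILES_COPIED", count=3, folder="A:"))
print(load_string("de", "FILES_COPIED", count=3, folder="A:"))
```

Named placeholders let the word order change between languages without any change to the calling code, which string concatenation cannot do.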

2.2.3

• 4th attempt to internationalise
    - Loading with current LCID
    - By setting thread locale

3.0.4

• 5th attempt to internationalise
    - Loading with current LCID (again…)
    - By loading resources directly

3.1.5

• 6th attempt to internationalise
    - Loading with current LCID (yet again!)
    - By employing satellite resource

3.1.6

• 7th attempt to internationalise
    - Loading all strings from satellite resources
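
The transcript does not include the demo code, so the following is only a Python sketch of the general idea of looking up strings in satellite resources by LCID. The `SATELLITES` table, the fallback order (exact LCID, then any satellite sharing the primary language, then a default) and all names are assumptions, not the presenter's implementation.

```python
# Hypothetical satellite string tables keyed by LCID
SATELLITES = {
    0x040C: {"FILE_OPEN": "Ouvrir"},  # French (France)
    0x0409: {"FILE_OPEN": "Open"},    # English (United States)
}
DEFAULT_LCID = 0x0409


def candidate_lcids(lcid: int):
    """Yield LCIDs to try: exact match, same primary language, then default."""
    yield lcid
    primary = lcid & 0x3FF
    for other in SATELLITES:
        if other != lcid and (other & 0x3FF) == primary:
            yield other
    yield DEFAULT_LCID


def load_satellite_string(lcid: int, string_id: str) -> str:
    for candidate in candidate_lcids(lcid):
        table = SATELLITES.get(candidate, {})
        if string_id in table:
            return table[string_id]
    raise KeyError(string_id)


print(load_satellite_string(0x0C0C, "FILE_OPEN"))  # French (Canada) falls back to French
print(load_satellite_string(0x0407, "FILE_OPEN"))  # German falls back to the default
```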

Conclusion

• Covered Characters, Keyboards, Fonts, and Languages

• Explained Strings and the usage of Strings

• Coded a simple internationalised application
