The NLS SETUP Application and TRABASE.SAS: Easy ways to customise character conversions using the SAS System
Manfred Kiefer, SAS Institute European Headquarters
Abstract
All software used in a multinational environment must account for the differences in character sets and encoding schemes. It must also accommodate differences in conventional usage between languages, such as the differing usage of upper and lower case. The SAS System provides several features to ensure that applications can be written to use local conventions and provide national language support (NLS). Areas of concern include the following:
• moving data and applications between hosts
• management of text-strings
• displaying and printing national characters other than the standard upper and lower case A-Z, as they are encoded by various ASCII and EBCDIC formats.
Historically, the SAS System has provided internal translation tables, called TRANTAB entries, that convert one character encoding standard to another. Currently shipping as a sample application with the Orlando Release, NLSsetup fully automates the creation of TRANTABs, key maps, and device maps, and provides an easy point-and-click interface for users to transparently specify language features.
The Problem for SAS System Users
As shown in Table 1, each host or platform on which the SAS System runs uses different standards for encoding characters. As a result, you must convert or map characters when you move data across platforms.
Table 1: Operating Systems (Hosts) Grouped by Character-Encoding Standard
EBCDIC hosts:
• CMS
• MVS
• VSE
ASCII-ISO hosts (those that use character set(s) that are defined by the ISO 8859 standard):
• AIX-RS/6000
• Convex
• DG/UX
• HP-UX
• Intel ABI
• MIPS ABI
• OpenVMS-VAX
• OpenVMS-AXP
• Digital UNIX
• Solaris 2
• SunOS 4.1
• ULTRIX
ASCII-ANSI hosts (those that use the MS-Windows ANSI character set; this is essentially ISO 8859, but it is called ASCII-ANSI because it was originally based on an ANSI draft standard):
• Windows 3.1
• Windows 32s
• Windows NT
• Windows 95
ASCII-MAC hosts (those that use character set(s) that are defined by the vendor-specific Apple Macintosh character set):
• Macintosh System 7.5 for Motorola 68020-, 68030-, and 68040-based systems
• PowerPC-based Macintosh systems
ASCII-OEM host (vendor-specific IBM PC-ASCII character set):
• OS/2
When you transfer data with the "standard" A-Z characters, character conversion from one encoding standard to another is not a problem. You simply rely on the default conversion mechanisms. However, when you have data with national characters, such as the æ, ø, and å in Danish; the ä, ö, ü, and ß in German; and accented characters such as á, é, ú, and ñ in Spanish, different conversion mechanisms are involved, and each platform uses its own character encoding standard. The NLSsetup application enables you to adapt conversion (translation) tables for each language. Although this is typically a system administrator's task, you should understand the process so that you can add your own customised tables or modify existing ones.
The problem for a SAS system administrator is to enable users to transfer data and applications from one host to another, or transparently access data on one host from another host without concern about character conversion from one coded character set to another.
The SAS System provides a number of ways of transporting data and applications across hosts. However, the processes and trantabs that are involved differ, depending on the mechanisms you use. The REMOTE engine feature of SAS/CONNECT and SAS/SHARE software uses host-to-host trantabs. PROCs UPLOAD, DOWNLOAD, CPORT, and CIMPORT use transport-format trantabs. Each of these mechanisms is explained in the following sections.
Host-to-Host Trantabs: Transporting Data via the REMOTE Engine
The REMOTE engine is a feature of SAS/CONNECT and SAS/SHARE software that allows you to access remote data. When you move data across platforms, the REMOTE engine translates character sets directly from the source platform's encoding standard to the target platform's encoding standard, as shown in the following diagram.
   source                                    target
   platform <------> translation <--------> platform
               (host-to-host trantabs)
For example, if you are using the REMOTE engine to access data on an MVS host, which uses EBCDIC encoding, from a PC client, which uses IBM PC-ASCII encoding, characters are translated directly from EBCDIC to IBM PC-ASCII and vice-versa. Table 2 shows the trantabs that the SAS System provides for direct host-to-host character-set translation.
Table 2: SAS Host-to-Host Trantabs
Trantab Name Entry and Function                         Specific Hosts
-------------------------------------------------------------------------------------------
On EBCDIC hosts (IBM mainframes)
_0000030     (0) import from ASCII-ISO to EBCDIC        connecting MVS, CMS, or VSE to
             (1) export from EBCDIC to ASCII-ISO        OpenVMS or UNIX systems
_0000060     (0) import from ASCII-ANSI to EBCDIC       connecting MVS, CMS, or VSE to
             (1) export from EBCDIC to ASCII-ANSI       Windows
_00000A0     (0) import from ASCII-OEM to EBCDIC        connecting MVS, CMS, or VSE to
             (1) export from EBCDIC to ASCII-OEM        OS/2
_0000120     (0) import from ASCII-MAC to EBCDIC        connecting MVS, CMS, or VSE to
             (1) export from EBCDIC to ASCII-MAC        MAC
-------------------------------------------------------------------------------------------
On Windows hosts
- Windows in ANSI mode:
_0000050     (0) import from ASCII-ISO to ASCII-ANSI    connecting Windows to
             (1) export from ASCII-ANSI to ASCII-ISO    OpenVMS and UNIX systems
_0000060     (0) import from EBCDIC to ASCII-ANSI       connecting Windows to MVS,
             (1) export from ASCII-ANSI to EBCDIC       CMS, or VSE
_00000C0     (0) import from ASCII-OEM to ASCII-ANSI    connecting Windows to OS/2
             (1) export from ASCII-ANSI to ASCII-OEM    or to Windows in ASCII-OEM mode
_0000140     (0) import from ASCII-MAC to ASCII-ANSI    connecting Windows to MAC
             (1) export from ASCII-ANSI to ASCII-MAC
- Windows in OEM mode:
_0000090     (0) import from ASCII-ISO to ASCII-OEM     connecting Windows to
             (1) export from ASCII-OEM to ASCII-ISO     OpenVMS or UNIX systems
_00000A0     (0) import from EBCDIC to ASCII-OEM        connecting Windows to
             (1) export from ASCII-OEM to EBCDIC        MVS, CMS, or VSE
_00000C0     (0) import from ASCII-ANSI to ASCII-OEM    connecting Windows to
             (1) export from ASCII-OEM to ASCII-ANSI    Windows in ASCII-ANSI mode
_0000180     (0) import from ASCII-MAC to ASCII-OEM     connecting Windows to MAC
             (1) export from ASCII-OEM to ASCII-MAC
-------------------------------------------------------------------------------------------
On OS/2
_0000090     (0) import from ASCII-ISO to ASCII-OEM     connecting OS/2 to OpenVMS
             (1) export from ASCII-OEM to ASCII-ISO     or UNIX systems
_00000A0     (0) import from EBCDIC to ASCII-OEM        connecting OS/2 to MVS,
             (1) export from ASCII-OEM to EBCDIC        CMS, or VSE
_00000C0     (0) import from ASCII-ANSI to ASCII-OEM    connecting OS/2 to Windows
             (1) export from ASCII-OEM to ASCII-ANSI
_0000180     (0) import from ASCII-MAC to ASCII-OEM     connecting OS/2 to MAC
             (1) export from ASCII-OEM to ASCII-MAC
-------------------------------------------------------------------------------------------
On MAC
_0000110     (0) import from ASCII-ISO to ASCII-MAC     connecting MAC to OpenVMS
             (1) export from ASCII-MAC to ASCII-ISO     or UNIX systems
_0000120     (0) import from EBCDIC to ASCII-MAC        connecting MAC to MVS,
             (1) export from ASCII-MAC to EBCDIC        CMS, or VSE
_0000140     (0) import from ASCII-ANSI to ASCII-MAC    connecting MAC to Windows
             (1) export from ASCII-MAC to ASCII-ANSI
_0000180     (0) import from ASCII-OEM to ASCII-MAC     connecting MAC to OS/2
             (1) export from ASCII-MAC to ASCII-OEM
-------------------------------------------------------------------------------------------
On OpenVMS and UNIX hosts
_0000030     (0) import from EBCDIC to ASCII-ISO        connecting OpenVMS or UNIX to
             (1) export from ASCII-ISO to EBCDIC        MVS, CMS, or VSE
_0000050     (0) import from ASCII-ANSI to ASCII-ISO    connecting OpenVMS or UNIX
             (1) export from ASCII-ISO to ASCII-ANSI    to Windows
_0000090     (0) import from ASCII-OEM to ASCII-ISO     connecting OpenVMS or UNIX
             (1) export from ASCII-ISO to ASCII-OEM     to OS/2
_0000110     (0) import from ASCII-MAC to ASCII-ISO     connecting OpenVMS or UNIX
             (1) export from ASCII-ISO to ASCII-MAC     to MAC
-------------------------------------------------------------------------------------------
The same trantabs are used for all connectivity mechanisms, including APPC, TCP/IP, NETBIOS, and DECnet.
As you can see from Tables 1 and 2, the conversion subsystem distinguishes the following character architectures:
• EBCDIC
• ASCII-ISO (ISO 8859)
• ASCII-OEM (which, for our purposes, includes only the IBM PC-ASCII standard)
• Microsoft's ASCII-ANSI
• Apple's ASCII-MAC.
Each host-to-host trantab actually consists of two halves, or "entries":
• entry 0 (for importing)
• entry 1 (for exporting).
For example, on UNIX hosts, the _0000030 (EBCDIC to ASCII-ISO) trantab is shown in Table 3 as it appears when both halves are listed by the TRANTAB procedure.
Table 3: The _0000030 Trantab
Table name is _0000030.
       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '000102039C09867F978D8E0B0C0D0E0F'x
   10 '101112139D8508871819928F1C1D1E1F'x
   20 '80818283840A171B88898A8B8C050607'x
   30 '909116939495960498999A9B14159E1A'x
   40 '20A0A1A2A3A4A5A6A7A8D52E3C282B7C'x
-> 50 '26A9AAABACADAEAFB0B121242A293B5E'x
   60 '2D2FB2B3B4B5B6B7B8B9E52C255F3E3F'x
   70 'BABBBCBDBEBFC0C1C2603A2340273D22'x
   80 'C3616263646566676869C4C5C6C7C8C9'x
   90 'CA6A6B6C6D6E6F707172CBCCCDCECFD0'x
   A0 'D17E737475767778797AD2D3D45BD6D7'x
   B0 'D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7'x
   C0 '7B414243444546474849E8E9EAEBECED'x
   D0 '7D4A4B4C4D4E4F505152EEEFF0F1F2F3'x
   E0 '5C9F535455565758595AF4F5F6F7F8F9'x
   F0 '30313233343536373839FAFBFCFDFEFF'x

       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '00010203372D2E2F1605250B0C0D0E0F'x
   10 '101112133C3D322618193F271C1D1E1F'x
-> 20 '405A7F7B5B6C507D4D5D5C4E6B604B61'x
   30 'F0F1F2F3F4F5F6F7F8F97A5E4C7E6E6F'x
   40 '7CC1C2C3C4C5C6C7C8C9D1D2D3D4D5D6'x
   50 'D7D8D9E2E3E4E5E6E7E8E9ADE0BD5F6D'x
   60 '79818283848586878889919293949596'x
   70 '979899A2A3A4A5A6A7A8A9C04FD0A107'x
   80 '202122232415061728292A2B2C090A1B'x
   90 '30311A333435360838393A3B04143EE1'x
   A0 '41424344454647484951525354555657'x
   B0 '58596263646566676869707172737475'x
   C0 '767778808A8B8C8D8E8F909A9B9C9D9E'x
   D0 '9FA0AAABAC4AAEAFB0B1B2B3B4B5B6B7'x
   E0 'B8B9BABBBC6ABEBFCACBCCCDCECFDADB'x
   F0 'DCDDDEDFEAEBECEDEEEFFAFBFCFDFEFF'x
Each cell in the trantab "maps" an ASCII code point to a corresponding EBCDIC code point, or vice-versa. For example, the ampersand character (&) is '50'x in EBCDIC and '26'x in ASCII-ISO (as in all ASCII standards). Therefore, in the first half of the trantab in Table 3 (the EBCDIC to ASCII-ISO half), the cell that represents EBCDIC code point 50 contains the value 26. In the second half of the table (the ASCII-ISO to EBCDIC half), cell 26 contains the value 50.
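The lookup described above can be reproduced by hand. The following is a minimal sketch rather than part of the shipped samples: the first step lists both halves of the trantab, and the DATA step copies row 50 of the import entry from Table 3 into a variable and reads the cell for EBCDIC code point '50'x. The variable names ROW50 and CELL are illustrative only.

```sas
/* List both entries of the host-to-host trantab, as in Table 3 */
proc trantab table=_0000030;
   list both;
quit;

data _null_;
   length cell $1;

   /* Row 50 of the import entry in Table 3: EBCDIC code points '50'x to '5F'x */
   row50 = '26A9AAABACADAEAFB0B121242A293B5E'x;

   /* Column 0 of row 50 is the cell for EBCDIC '50'x (the ampersand) */
   cell = substr(row50, 1, 1);

   /* Write the cell value to the log in hexadecimal */
   put 'EBCDIC 50 maps to ASCII-ISO ' cell $hex2.;
run;
```

The same arithmetic applies to any cell: the row label gives the high-order hexadecimal digit of the source code point, and the column gives the low-order digit.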
In all cases, the EBCDIC trantabs (_0000030, _0000060, _00000A0, _0000120) are accurate for the U.S. English EBCDIC code page (CECP 037), and for the first 128 ASCII code points. The upper 128 "ASCII" code points, which are used for national characters, vary from one 8-bit ASCII extension to another, or from one code page to another. Therefore, international users who want to preserve their national characters must always customise these trantabs. This is discussed in "Customizing Host-to-Host Trantabs."
Table 4 shows an ASCII trantab, trantab _0000090 (ASCII-OEM to ASCII-ISO).
Table 4: The _0000090 Trantab
Table name is _0000090.
       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '000102030405060708090A0B0C0D0E0F'x
   10 '101112131415161718191A1B1C1D1E1F'x
   20 '202122232425262728292A2B2C2D2E2F'x
   30 '303132333435363738393A3B3C3D3E3F'x
   40 '404142434445464748494A4B4C4D4E4F'x
   50 '505152535455565758595A5B5C5D5E5F'x
   60 '606162636465666768696A6B6C6D6E6F'x
   70 '707172737475767778797A7B7C7D7E7F'x
   80 '808182838485868788898A8B8C8D8E8F'x
   90 '909192939495969798999A9B9C9D9E9F'x
   A0 'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF'x
   B0 'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF'x
   C0 'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF'x
   D0 'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'x
   E0 'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF'x
   F0 'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x
As you can see, this half of the trantab, as well as the other half, which is not shown, is an identity table, in which each code point is mapped to itself. For example, cell 00 contains the value 00, cell 01 contains the value 01, and so on. Again, this is valid for the first 128 code points, which are the same in all ASCII-based standards. International users must customise the upper half of the trantab to reflect the correct mappings for the national characters that they want to preserve.
Customising Host-to-Host Trantabs
You can use the TRANTAB procedure to modify a host-to-host trantab. As previously stated, customisation of these trantabs (except _0000050, where Windows ANSI corresponds exactly to ISO 8859) is always necessary for international users who want to preserve their national characters when they use the REMOTE engine to transfer data across hosts.
For example, suppose that from an OpenVMS session you want to modify a SAS data set in CMS that contains German national characters. The national characters that you want to preserve are ä, ö, ü, ß, Ä, Ö, and Ü.
As Table 2 shows, the trantab that handles character conversion between OpenVMS and CMS is _0000030. Therefore, you customise by mapping the code point for each national character from the German EBCDIC code page (CECP 273) to the corresponding code point on the ISO 8859-1 code page. You do this by performing the following steps.
1. From your OpenVMS session, use PROC TRANTAB to modify the appropriate trantab. For data conversion, only the trantabs in the user's SAS session are used. Do not update a trantab in a user session while you are connected to a foreign host that needs to use the trantab.
   proc trantab table=_0000030;
      /* Customize import entry:     */
      /* convert EBCDIC to ASCII-ISO */
      rep 'c0'x 'e4'x;   /* a diaeresis (ä) */
      rep '6a'x 'f6'x;   /* o diaeresis (ö) */
      rep 'd0'x 'fc'x;   /* u diaeresis (ü) */
      rep 'a1'x 'df'x;   /* s sharp (ß)     */
      rep '4a'x 'c4'x;   /* A diaeresis (Ä) */
      rep 'e0'x 'd6'x;   /* O diaeresis (Ö) */
      rep '5a'x 'dc'x;   /* U diaeresis (Ü) */
      swap;
      /* Customize export entry:     */
      /* convert ASCII-ISO to EBCDIC */
      rep 'e4'x 'c0'x;   /* a diaeresis (ä) */
      rep 'f6'x '6a'x;   /* o diaeresis (ö) */
      rep 'fc'x 'd0'x;   /* u diaeresis (ü) */
      rep 'df'x 'a1'x;   /* s sharp (ß)     */
      rep 'c4'x '4a'x;   /* A diaeresis (Ä) */
      rep 'd6'x 'e0'x;   /* O diaeresis (Ö) */
      rep 'dc'x '5a'x;   /* U diaeresis (Ü) */
      swap;
      save;
   quit;
2. The custom table is written to your SASUSER.PROFILE catalog. If it needs to be generally accessible, copy it to the SASHELP.HOST catalog. By default, the SAS System tries first to locate translation tables in SASUSER.PROFILE, and then in SASHELP.HOST.
3. To start using the modified trantab, you must first close and re-start your SAS session on OpenVMS.
4. Now sign on to CMS. For example:
   options comamid=tcp remote=your_serverid;
   filename rlink 'your_communication_script';
   signon;
5. Assign a libname:
libname test 'file-type file-mode' server=your_serverid;
6. Now you can use the FSEDIT procedure, for instance, to update the data set on the remote host, and keep national characters correct.
If you had not modified the table, the default character conversion would apply. This means, for instance, that 'A1'x would be translated to '7E'x. Therefore, the word "Straße" would appear as "Stra~e" in your OpenVMS session.
You can de-activate the customised character conversion by renaming the customised trantab, de-assigning the library, and assigning it again.
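The de-activation just described can be sketched with the CATALOG procedure. This is only an illustration: it assumes the customised table is in SASUSER.PROFILE, and the new entry name X0000030 is a hypothetical choice. The library assignment repeats the placeholder values used in step 5.

```sas
/* Rename the customised entry so that it is no longer found under the  */
/* name _0000030; X0000030 is a hypothetical name chosen for this sketch */
proc catalog catalog=sasuser.profile;
   change _0000030=x0000030 / entrytype=trantab;
run;
quit;

/* De-assign the remote library and assign it again */
libname test clear;
libname test 'file-type file-mode' server=your_serverid;
```

Renaming rather than deleting keeps the customised table available, so it can be re-activated later by changing the name back.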
Transport-Format Trantabs: Transporting Data via PROCs UPLOAD, DOWNLOAD, CPORT, and CIMPORT
Both the UPLOAD/DOWNLOAD procedures and the CPORT/CIMPORT procedures use an intermediate transport format when transporting files from one host to another. PROCs UPLOAD and DOWNLOAD are part of SAS/CONNECT software. See SAS/CONNECT Software: Usage and Reference, Version 6, 2nd Edition for details. (Appendix 4 lists the default ASCII/EBCDIC translation tables.)
The process is illustrated in the following diagram:
                 translation               translation
             (local-to-transport       (transport-to-local
                  trantab)                  trantab)
                     |                         |
                     |                         |
   source            V         transport       V         target
   platform <----------------> format <----------------> platform
When you are converting from a character-encoding standard to transport format or vice-versa, you use the SAS transport-format trantabs shown in Table 5.
Table 5: SAS Transport-format Trantabs
-----------------------------------------------------------------------------------------
Trantab name   Function
SASXPT         controls local-to-transport-format translation
SASLCL         controls transport-to-local-format translation
-----------------------------------------------------------------------------------------
For example, if you are transporting SAS data from an MVS host, which uses the EBCDIC standard, to an OS/2 system, which uses IBM PC-ASCII, the SASXPT trantab on MVS is used for the conversion from EBCDIC to transport format, and the SASLCL trantab is used for the conversion from transport format to IBM PC-ASCII.
The character transport format is an extended ASCII representation. You can visualise transport format as an 8-bit code page in which the first 128 code points are the same as they are for all ASCII-based standards, and the upper 128 code points are initially unassigned. The upper 128 code points are simply used for mapping the national characters from EBCDIC or from any 8-bit ASCII encoding standard. On any host that uses an ASCII-based standard, SASXPT and SASLCL are identity tables, similar to many of the host-to-host trantabs. If you want to preserve national characters on hosts that use an 8-bit ASCII standard, then you must modify the default mapping of the upper 128 cells to fit the particular ASCII standard and code page that you are using.
On EBCDIC hosts, SASXPT is the same as the second half of the EBCDIC host-to-host trantabs _0000030, _0000060, or _00000A0. SASLCL is the same as the first half of those trantabs. To preserve national characters, you must customise SASXPT and SASLCL, just as you would customise the two halves of the EBCDIC host-to-host trantabs. See "Customizing Transport-Format Trantabs," for more information.
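To see which case applies on a given host, the two transport-format trantabs can be listed before any customisation. A minimal sketch, assuming the default tables are in place; the NLS option identifies SASXPT and SASLCL as special internal tables.

```sas
/* List the local-to-transport-format table */
proc trantab table=sasxpt nls;
   list;
quit;

/* List the transport-to-local-format table */
proc trantab table=saslcl nls;
   list;
quit;
```

On an ASCII-based host both listings should show identity tables like Table 4. On an EBCDIC host they should match the two halves of Table 3.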
Note: To transport SAS files from one host to another via tape or shared DASD, you use the CPORT/CIMPORT procedures, just as you would if you were transporting the files via communications software. The process is virtually identical to that described previously. Only the transport medium is different.
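The CPORT/CIMPORT route can be sketched as follows. All library, data set, and file names here are hypothetical placeholders, and the physical names must be replaced with values that are valid on each host.

```sas
/* On the source host: write a data set to a transport file.       */
/* SASXPT is applied here (local-to-transport-format translation). */
libname mylib 'source-library';
filename tranfile 'transport-file';

proc cport data=mylib.customers file=tranfile;
run;

/* On the target host, after the transport file has been moved:    */
/* SASLCL is applied here (transport-to-local-format translation). */
libname newlib 'target-library';
filename tranfile 'transport-file';

proc cimport infile=tranfile library=newlib;
run;
```

If the transport file is moved with file-transfer software rather than on tape, it should be moved in binary mode, so that the transfer itself does not apply a second character translation.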
Customising Transport-Format Trantabs
Transport-format (SASXPT, SASLCL) trantabs often must be customised to accommodate national language character sets other than U.S. English. There are three ways of customising these tables:
• with the NLSsetup application
• with the TRABASE program
• directly with the TRANTAB procedure.
The TRABASE program actually uses the TRANTAB procedure to create a number of customised trantabs for you. The following sections explain how to use the TRABASE program and how to use PROC TRANTAB separately to create your own customised trantabs.
Building Customized Trantabs with the TRABASE Program
The TRABASE program, which builds transport-format and character-operations trantabs for a number of languages and operating systems, is part of the SAS sample library. It does not create tables for all possible combinations, but it can easily be adapted to specific needs.
When you look at the TRABASE program, you will see that it creates a macro, BTABLE, with the single parameter COUNTRY. When you supply an appropriate country name, BTABLE creates a set of trantabs (corresponding to some or all of the SASXPT, SASLCL, and other default trantabs) to handle the translation of that country's national characters.
The names of the trantabs that are created by TRABASE follow a naming convention. For the local-to-transport-format and the transport-to-local-format tables, SPAETA and SPAATE are typical trantab names, where "SPA" is an abbreviation for Spanish, "ETA" stands for "EBCDIC to ASCII," and "ATE" represents "ASCII to EBCDIC."
Note: In this context, "ASCII" means IBM PC-ASCII.
The following naming convention for the local-to-transport and the transport-to-local entries was used:
• EBCDIC <-> OEM (PC-ASCII): <country>eta, <country>ate
• ISO <-> OEM (PC-ASCII): <country>ita, <country>ati
• EBCDIC <-> ISO : <country>eti, <country>ite
• ISO <-> MAC (Apple) : <country>itm, <country>mti
• ISO <-> ANSI (MS-Windows): <country>itw, <country>wti
• OEM <-> MAC : <country>atm, <country>mta
• EBCDIC <-> MAC : <country>etm, <country>mte
Where country is one of the following:
• dan: Denmark/Norway
• fre: France
• ger: Germany
• hun: Hungary
• ita: Italy
• pol: Poland
• spa: Spain
• swe: Sweden/Finland
• swi: Switzerland (German/French)
See the text of the TRABASE program for further information.
Examples: Using Customised Transport-Format Trantabs
Suppose you are using an OS/2 PC. You have data that contain Spanish characters and you want to use PROC DOWNLOAD to download that data from MVS (EBCDIC) to OS/2 (ASCII, or, more specifically, IBM PC-ASCII or what SAS classifies as ASCII-OEM).
1. First, use the TRABASE program to create the customized transport-format trantabs SPAETA (EBCDIC to ASCII) and SPAATE (ASCII to EBCDIC).
2. To make SAS use these trantabs instead of the default transport-format trantabs (SASXPT and SASLCL), submit the following OPTIONS statement on MVS. The statement belongs on MVS because PROC DOWNLOAD, like PROC UPLOAD, is executed on the remote host, which is the MVS mainframe in this case:
options trantab=(spaeta,spaate);
The SPAETA trantab handles the correct host-to-transport format translation, and SPAATE takes care of the transport-to-host format translation.
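Put together, the download might look like the following sketch, submitted from the OS/2 session after a successful SIGNON to MVS. The library and data set names are hypothetical, and the sketch assumes that SPAETA and SPAATE have already been created on MVS.

```sas
rsubmit;
   /* Runs on MVS: replace SASXPT and SASLCL with the Spanish trantabs */
   options trantab=(spaeta,spaate);

   /* Hypothetical MVS library and data set names */
   libname mvsdata 'userid.spanish.data';

   /* Copy the data set to the WORK library of the local OS/2 session */
   proc download data=mvsdata.clientes out=work.clientes;
   run;
endrsubmit;
```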
As stated earlier, character translation depends on which platforms are involved. If you want to translate characters between the two 8-bit ASCII extensions of OS/2 and UNIX, you need to create a new set of transport-format trantabs. Most UNIX derivatives use the ISO 8859 standard. In accordance with the naming convention used above, you call a Polish SASXPT trantab for OS/2 POLATI (ASCII-to-ISO). A modified SASLCL trantab would be called POLITA (ISO to ASCII).
Note: The version of the TRABASE program that generates the POLATI trantab, along with numerous other customised trantabs, is a recent modification. If you do not find this trantab in your version of TRABASE, you can use PROC TRANTAB to create the table as follows:
   PROC TRANTAB table=SASXPT nls;
      rep '98'x 'b6'x;   /* s acute */
      ...
      ... <more translations>
      save table=polati;
   quit;
A new table is written to your SASUSER.PROFILE catalogue. If a table needs to be generally accessible, copy it to the SASHELP.HOST catalogue.
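Checking the new table and copying it can be sketched as follows. This assumes that POLATI was saved to SASUSER.PROFILE as described, and that you have write access to the SASHELP library.

```sas
/* Display the new table to check the customised cells */
proc trantab table=polati;
   list;
quit;

/* Copy the entry from SASUSER.PROFILE to SASHELP.HOST */
proc catalog catalog=sasuser.profile;
   copy out=sashelp.host;
   select polati / entrytype=trantab;
run;
quit;
```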
NLS SETUP APPLICATION
The NLSsetup application that creates all the necessary trantabs for users (both host-to-host and transport format) is shipped with Release 6.11 in the BASE sample source library. You can also use this to generate devmaps and keymaps just by selecting a country name from a listbox.
To access the NLSsetup application on Windows or OS/2:
• Assign a libref: libname libn '!SASROOT\core\sample';
• Issue: af c=libn.nlssetup.nlssetup.frame
To access the NLSsetup application on UNIX:
• Assign a libref: libname libn '!SASROOT/samples/base';
• Issue: af c=libn.nlssetup.nlssetup.frame
The primary window for the NLSsetup application is shown in the following figure.
[Figure: primary window of the NLSsetup application]
The elements of the figure are described as follows.
• SELECT ONE allows a user to select the country for which the default tables will be generated.
• RESET REMOTE ENGINE TABLES resets the REMOTE engine tables to the default tables initially shipped with the SAS System.
• OK causes the tables to be generated and stored in SASUSER.PROFILE as TRANTAB entries. If you have write access to SASHELP.HOST, it copies the TRANTAB entries from SASUSER.PROFILE to SASHELP.HOST. It also copies the appropriate key map and device map to DEFAULT.KEYMAP and DEFAULT.DEVMAP in GFONT0.FONTS. If you have write access to SASHELP.FONTS, it copies these entries to SASHELP.FONTS.
Further Details
Customised character translation tables are created, which are used for the REMOTE engine as well as some or all of the following TRANTAB entries:
• Local-to-transport format
• Transport-to-local format
• Uppercase-to-lowercase
• Lowercase-to-uppercase
• Character Classification
• Scanner Translation
• Sort Tables
You are free to rename the entries according to your needs. For example, since Danish and Norwegian users both use the danxxx tables, Norwegian users may wish to rename the tables to norxxx.
NLSsetup creates TRANTAB entries for various configurations. You must select the proper TRANTAB entries and name them in the TRANTAB= system option. For example, if you frequently upload or download between OS/2 and a mainframe, then you need to use the xxxETA and xxxATE tables.
In order to use the customised tables properly, you specify the appropriate TRANTAB system option. The easiest way to make the tables available to all SAS users is to add the -TRANTAB option to the CONFIG.SAS file. The arguments to the TRANTAB option are positional and identify the table entry in SASUSER.PROFILE or SASHELP.HOST by name. The example below specifies custom local-to-transport-format and transport-to-local-format translation tables while leaving the other tables unchanged.
-TRANTAB (sweeta,sweate)
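Within a running session, the same setting can be made with an OPTIONS statement and then checked. A minimal sketch, assuming the SWEETA and SWEATE entries already exist in SASUSER.PROFILE or SASHELP.HOST:

```sas
/* Set the first two positions only; the remaining positions are not specified */
options trantab=(sweeta,sweate);

/* Write the current value of the TRANTAB option to the log */
proc options option=trantab;
run;
```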
The custom host-to-host trantabs are written to SASUSER.PROFILE, and, if you have write access, copied to SASHELP.HOST, which overwrites the default TRANTAB entries there. You can reset them with the RESET REMOTE ENGINE TABLES button.
Conclusion
The SAS System helps to compensate for numerous incompatible character encoding standards by providing internal translation tables that convert from one character encoding standard to another. Key maps and device maps compensate for the differences in the character encodings of SAS System graphics, on the one hand, and the character encodings of host systems and output devices on the other. The NLSsetup application provides an easy point-and-click interface that allows you to set up these features transparently.
Manfred Kiefer
SAS Institute
NLS SETUP and TRABASE
Easy ways to customize character
conversions
Overview
Problems for SAS users
Host-to-Host Trantabs: Transporting
Data via the REMOTE Engine
Transport-Format Trantabs:
Transporting Data via PROCs
UPLOAD/DOWNLOAD,
CPORT/CIMPORT
The NLS SETUP Application
SOME Terminology
(coded) character set = encoding:
unambiguous mapping of the items of
a character set (letters, digits, etc.) to
numeric code values
ASCII: American Standard Code for
Information Interchange, a 7-bit code
that includes the (upper- and
lowercase) letters A-Z, digits,
punctuation and control characters
EBCDIC: Extended Binary Coded
Decimal Interchange Code, a family of
8-bit codes
SOME Terminology
national character: a character specific
to a particular nation or group of
nations (ä, è, ð, ñ, ø), or any letter other
than upper- and lowercase A-Z
character conversion = mapping:
changing the representation of data by
using one coded character set in place
of another
trantab = translation table: a SAS
catalog entry that translates from one
character set to another
Easy character conversion?
Character Encoding standards
ANSI X3.4-1977: 7-bit ASCII
ISO 646-1983: 7-bit ASCII (IRV)
ANSI X3.4-1986: 7-bit ASCII
ISO 8859-1:1987: 8-bit
IBM EBCDIC CECP 037
IBM CP 437
DEC Multinational Character Set
HP Roman 8
...
“In hindsight, what we have done is
invented a computer
communications Tower of Babel.”
Edwin Hart
Easy character conversion?
“The mishmash of character encoding
standards makes it hard for users to
share data and for programmers to
create worldwide software. Trying to
pass data from different encodings
across networks or between operating
systems involves a gantlet of
mappings, conversions, fonts and
general headaches.”
Nadine Kano, Asmus Freytag
Problems for SAS Users
character conversion (mapping) when
moving data across platforms
default conversion when transferring
data with “standard” (A-Z) characters
be careful when dealing with national
characters
different conversion mechanisms, and
different character encoding standards
on different platforms.
SAS enables users to:
transparently access data on one
host from another host, or
transfer data and applications
from one host to another
without having to worry about
character conversions from one
coded character set to another.
Transporting Data via the REMOTE Engine
Direct character translation
the same trantabs are used
regardless of the connectivity
mechanism
source platform > target platform
SAS Host-to-Host Trantabs
EBCDIC/ASCII-ISO: _0000030
ASCII-ISO/ASCII-ANSI: _0000050
EBCDIC/ASCII-ANSI: _0000060
ASCII-ISO/ASCII-OEM: _0000090
EBCDIC/ASCII-OEM: _00000A0
ASCII-ANSI/ASCII-OEM: _00000C0
ASCII-MAC/ASCII-ISO: _0000110
ASCII-MAC/EBCDIC: _0000120
ASCII-MAC/ASCII-ANSI: _0000140
ASCII-MAC/ASCII-OEM: _0000180
SAS Host-to-Host Trantabs
Each host-to-host trantab
consists of two halves, or
“entries”
entry 0 (for importing)
entry 1 (for exporting)
_0000030 Import Entry: EBCDIC/ASCII
Table name is _0000030.
0 1 2 3 4 5 6 7 8 9 A B C D E F
00 '000102039C09867F978D8E0B0C0D0E0F'x
10 '101112139D8508871819928F1C1D1E1F'x
20 '80818283840A171B88898A8B8C050607'x
30 '909116939495960498999A9B14159E1A'x
40 '20A0A1A2A3A4A5A6A7A8D52E3C282B7C'x
50 '26A9AAABACADAEAFB0B121242A293B5E'x
60 '2D2FB2B3B4B5B6B7B8B9E52C255F3E3F'x
70 'BABBBCBDBEBFC0C1C2603A2340273D22'x
80 'C3616263646566676869C4C5C6C7C8C9'x
90 'CA6A6B6C6D6E6F707172CBCCCDCECFD0'x
A0 'D17E737475767778797AD2D3D45BD6D7'x
B0 'D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7'x
C0 '7B414243444546474849E8E9EAEBECED'x
D0 '7D4A4B4C4D4E4F505152EEEFF0F1F2F3'x
E0 '5C9F535455565758595AF4F5F6F7F8F9'x
F0 '30313233343536373839FAFBFCFDFEFF'x
_0000030 Import Entry: EBCDIC/ASCII
Which symbol?
The answer depends on ...
the encoding.
In any case ...
EBCDIC trantabs are accurate for the
U.S. EBCDIC code page, and for the
first 128 ASCII code positions
the upper 128 code positions vary from
one ASCII extension to another
international users need to customize
these trantabs.
Character conversion without customization
German text stored in EBCDIC
encoding (CECP 273):
“Blüht die Rose noch so schön,
läßt sie doch die Dornen sehn.”
Displayed under UNIX (ISO 8859-1):
“Bl}ht die Rose noch so sch|n, l{~t
sie doch die Dornen sehn.”
Customize a Host-to-Host Character Conversion
Customize trantabs on local host, e.g.:
proc trantab table = _0000030;
rep 'C0'x 'E4'x; /* a diaeresis */
...
swap;
rep 'E4'x 'C0'x; /* a diaeresis */
...
swap;
save;
quit;
Customize Host-to-Host Character Conversion
The custom table is written to
your SASUSER.PROFILE catalog.
If it needs to be generally accessible,
copy it to the SASHELP.HOST
catalog.
By default, the SAS System tries
first to locate translation tables in
SASUSER.PROFILE, and then in
SASHELP.HOST.
Customize Host-to-Host Character Conversion
Sign on to the remote host, e.g.:
options comamid=tcp
remote=your_serverid;
filename rlink
'your_communication_script';
signon;
Customize Host-to-Host Character Conversion
Assign a libname, e.g.:
libname test 'myid.nls.data';
Use PROC FSEDIT to update the
data on the remote host, and keep
national characters correct.
Character conversion with customization
German text stored in EBCDIC
encoding (CECP 273):
“Blüht die Rose noch so schön,
läßt sie doch die Dornen sehn.”
Displayed under UNIX (ISO 8859-1):
“Blüht die Rose noch so schön,
läßt sie doch die Dornen sehn.”
Transporting Data via the PROCs UPLOAD/DOWNLOAD, CPORT/CIMPORT
intermediate transport format
default trantabs can be overridden
via the TRANTAB= system option
source > transport format > target
SAS Transport-Format Trantabs
SASXPT: controls local-to-
transport-format translation
SASLCL: controls transport-to-
local-format translation
Transporting Data via the PROCs UPLOAD/DOWNLOAD, CPORT/CIMPORT
On EBCDIC hosts, SASXPT is the same
as the second half (export entry) of the
host-to-host trantabs
... SASLCL is the same as the first half
(import entry) of the host-to-host
trantabs
On hosts that use an ASCII-based
standard SASXPT and SASLCL are
identity tables
SASXPT: EBCDIC to ASCII
    0 1 2 3 4 5 6 7 8 9 A B C D E F
00 '000102039C09867F978D8E0B0C0D0E0F'x
10 '101112139D8508871819928F1C1D1E1F'x
20 '80818283840A171B88898A8B8C050607'x
30 '909116939495960498999A9B14159E1A'x
40 '20A0A1A2A3A4A5A6A7A8D52E3C282B7C'x
50 '26A9AAABACADAEAFB0B121242A293B5E'x
60 '2D2FB2B3B4B5B6B7B8B9E52C255F3E3F'x
70 'BABBBCBDBEBFC0C1C2603A2340273D22'x
80 'C3616263646566676869C4C5C6C7C8C9'x
90 'CA6A6B6C6D6E6F707172CBCCCDCECFD0'x
A0 'D17E737475767778797AD2D3D45BD6D7'x
B0 'D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7'x
C0 '7B414243444546474849E8E9EAEBECED'x
D0 '7D4A4B4C4D4E4F505152EEEFF0F1F2F3'x
E0 '5C9F535455565758595AF4F5F6F7F8F9'x
F0 '30313233343536373839FAFBFCFDFEFF'x
Customize Transport-Format Trantabs
with the TRANTAB procedure
proc trantab table=SASXPT NLS;
rep 'C0'x 'E4'x; /* a diaeresis */
...
save table = ... ;
with the TRABASE program
with the NLS Setup Application
Customized Trantabs with TRABASE
part of the SAS sample library
builds trantabs for a number of
countries and operating systems
can easily be adapted to specific
needs
creates a macro with the single
parameter COUNTRY
names of the trantabs follow a
naming convention
TRABASE naming convention
EBCDIC/ASCII-OEM: ETA/ATE
ASCII-ISO/ASCII-OEM: ITA/ATI
EBCDIC/ASCII-ISO: ETI/ITE
ASCII-ISO/ASCII-MAC: ITM/MTI
ASCII-ISO/ASCII-ANSI: ITW/WTI
ASCII-OEM/ASCII-MAC: ATM/MTA
EBCDIC/ASCII-MAC: ETM/MTE
TRABASE naming convention
DAN: Denmark/Norway
FRE: France
GER: Germany/Austria
HUN: Hungary
ITA: Italy
POL: Poland
SPA: Spain
SWE: Sweden/Finland
SWI: Switzerland (Belgium)
TRABASE naming convention
Trabase copies and modifies default
trantabs
gereta and gerate are typical names
where
ger is an abbreviation for “German”
eta stands for “EBCDIC to ASCII”
ate represents “ASCII to EBCDIC”
you are free to rename the trantabs
but use a “telling name”
Using customized Trantabs
use trabase to create customized
transport-format trantabs
these are used instead of the default
(sasxpt, saslcl) via the TRANTAB=
system option, e.g.:
options trantab=(gereta,gerate);
gereta handles correct host-to-
transport format translation
gerate takes care of transport-to-host
format translation
The NLS SETUP Application
creates all necessary trantabs for
users (both host-to-host and
transport-format)
also generates devmaps and
keymaps
is shipped with Orlando in the
BASE sample source library
is fully customizable
The NLS SETUP Application
on Windows or OS/2:
assign a libref:
libname libn '!SASROOT\core\sample';
issue: af
c=libn.nlssetup.nlssetup.frame
on UNIX:
assign a libref:
libname libn '!SASROOT/samples/base';
issue: af
c=libn.nlssetup.nlssetup.frame
The NLS SETUP Application
easy to use:
just select a country from the
listbox
provides online help
will be further enhanced
The NLS SETUP Application: Future
6.12: enhanced version with more
countries, revised help
6.14: production version
The NLS SETUP Application: Demo
NLS SETUP and TRABASE: Questions?
Thank you for
your attention