
Siebel CRM Unicode Conversion 2 – The DBA Perspective

Brian Hitchcock
OCP 8, 8i, 9i DBA
Sun Microsystems
brian.hitchcock@sun.com
brhora@aol.com

DCSIT Technical Services DBA
Brian Hitchcock November 11, 2004 Page 1
www.brianhitchcock.net


CRM Unicode Conversion

Three separate presentations
– 1) The overall conversion process
    What we had, what we wanted, how to get there
    Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db
    What’s the issue, how did it happen
    A general method to find and fix this problem
– 3) The actual conversion
    What really happened
    Issues that came up and how they were resolved

Focus on DBA issues, not Siebel application


How Did I Get Involved?

Sleeping in a meeting…
Heard someone say
– “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped”

Woke up, said
– “I’ve done that before…”
– See “Case of the Missing Kanji”

Don’t wake up in meetings…


What’s The Issue?

Existing Siebel CRM system
– Oracle 8.1.7.4
– Single-byte character set (WE8ISO8859P1)

Interface systems
– Multi-byte character set(s) (UTF8)
– Handle data between single-byte and multi-byte apps

Want to convert to Unicode
– Siebel, database, interfaces all should be UTF8
– Eliminate interface systems


What We Had

[Diagram. Boxes: Siebel CRM, Oracle Db, Users (Amer, Emea, Apac), Custdb Amer, Custdb Emea, Custdb Apac, Tcustdb Emea, Tcustdb Apac, Ordering System. Character set labels: WE8ISO8859P1, 8859P1, UTF8.]


What We Wanted

[Diagram. Boxes: Siebel CRM, Oracle Db, Users (Amer, Emea, Apac), Custdb Amer, Custdb Emea, Custdb Apac, Ordering System. Character set labels: WE8ISO8859P1, UTF8, AL32UTF8.]


What We Wanted

All data in one database
– All languages
– Unicode

Eliminate interface systems
– Reduce support costs

Support increased CRM functionality
– All data in one place
– Supports new business functionality


Multi-byte Data In Source Db?

Source db is WE8ISO8859P1
– Single-byte character set
– Doesn’t support multi-byte characters
    That’s the official story
    The reality is somewhat different

What, if any, multi-byte data is in source db?
– How to determine correct character set?
– How to find, how to fix?
– Japanese, Chinese, others?


But Wait, There’s More…

Not just multi-byte data to look for
Non-P1 character data also
– Non multi-byte character data
– Could be WE P1 (Western European)
    German, Italian, French, etc.
– Could be WE Pn
    Polish, Greek, Russian, etc.

How to find?


How Polish Was Handled

Use separate app that sends Polish (P2) to CRM database
Stored in P1 db
Triggers move this Polish data to TWCD
Triggers in TWCD
– Know that it’s Polish (P2)
– Convert to UTF8 and send to WCD db

Therefore, multiple languages in Siebel P1 db


What’s the Problem?

Character data from multiple languages
– Stored in Oracle db
– Db configured for P1
    P1 supports multiple WE languages
    Does not support Polish, Russian, etc.

Need to find all such character data
Non-P1 can be
– Single-byte (Polish, Russian, etc.)
– Multi-byte (Japanese, Chinese, etc.)


Single-byte Character Sets

All Pn (8859-1, 8859-2, etc.) character sets
– Share same range of byte codes, 0 to 255
– From 0xA1 (decimal 161) upward
    Same byte codes can represent different characters

Example
– WE8ISO8859P1 (8859-1)
    Byte code 0xA3 (decimal 163) is character £
– EE8ISO8859P2 (8859-2)
    Same byte code, 0xA3, is character Ł
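The 0xA3 example can be reproduced outside the database. A minimal Python sketch, assuming Python's iso8859_1 and iso8859_2 codecs correspond to Oracle's WE8ISO8859P1 and EE8ISO8859P2:

```python
# The same byte value decoded under two ISO 8859 character sets.
raw = bytes([0xA3])  # decimal 163

as_p1 = raw.decode("iso8859_1")  # assumed equivalent of WE8ISO8859P1
as_p2 = raw.decode("iso8859_2")  # assumed equivalent of EE8ISO8859P2

print(f"0x{raw.hex().upper()} read as 8859-1: {as_p1}")
print(f"0x{raw.hex().upper()} read as 8859-2: {as_p2}")

# Neither decode raises an error, so the byte alone cannot reveal
# which character set the data was originally entered in.
```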


Finding Non-P1 Char Data?

Logically
– Examine db design, Siebel docs, figure out which tables designed to store language specific (local language) data
– Some column (country code) in these tables to tell you which country data is from
– Determine correct character set for data from each country
– Convert these tables manually to AL32UTF8 as part of overall Unicode conversion process
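The "logical" approach amounts to a lookup from country code to character set, followed by a decode and re-encode. A rough Python sketch of that idea; the country codes, the mapping, and the function name to_utf8 are hypothetical and not taken from the Siebel schema:

```python
# Hypothetical mapping from a country-code column to the character set
# the data was really entered in; codes and choices are assumptions.
COUNTRY_CHARSET = {
    "PL": "iso8859_2",  # Polish
    "DE": "iso8859_1",  # German
    "JP": "euc_jp",     # Japanese
}

def to_utf8(raw: bytes, country: str) -> bytes:
    """Decode raw column bytes with the country's character set, then encode as UTF-8."""
    charset = COUNTRY_CHARSET[country]
    return raw.decode(charset).encode("utf-8")

# Bytes as they might sit in a P1 database for a Polish city name
polish_city = bytes([0xA3, 0xF3, 0x64, 0xBC])
print(to_utf8(polish_city, "PL").decode("utf-8"))
```

Decoding the same bytes with the wrong entry in the mapping would not fail, it would silently produce different characters, which is why the country of origin has to be known first.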


Not Good

Want general method
– No need to analyze the meaning of existing data
– Need automated way to find all non-P1 char data

Can’t do it
– No general way to determine if char data is P1 or P2 or Pn
    As shown before, byte code 0xA3 (decimal 163)
      Character £ in P1
      Character Ł in P2


Good

But, can find non-ASCII data in general
– And then find multi-byte character data

Use separate approach to find non-P1
Use PL/SQL code
– Examine every table
– Examine every column that holds character data
– Determine which rows if any are ASCII
– Rows that aren’t ASCII are ‘suspect’
– Identify tables that have any non-ASCII character data


Why Look For ASCII?

Character data that is ASCII
– Only 7 bits used to encode character
– 8th bit of every byte is 0
– For non-ASCII, 8th bit is set
    WE8ISO8859Pn
    Multi-byte, Japanese, Chinese, etc.

By eliminating all tables that are ASCII
– No need to ask are they P1, P2, Pn or multi-byte
– Greatly reduces the task
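The 8th-bit rule is easy to check on raw bytes. A small Python illustration, where is_ascii is a hypothetical helper and the sample strings are invented for the demonstration:

```python
def is_ascii(data: bytes) -> bool:
    """True when no byte has its 8th (high) bit set."""
    return all((b & 0x80) == 0 for b in data)

samples = {
    "plain ASCII": b"ABCDE",
    "8859-1 text": "Bücher".encode("iso8859_1"),
    "EUC-JP text": "カタカナ".encode("euc_jp"),
}

for label, data in samples.items():
    print(label, is_ascii(data))
```

Only the first sample passes; the other two fail for the same reason even though one is single-byte and the other multi-byte, which is why the ASCII test can eliminate tables but cannot classify what remains.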


How To Find Non-ASCII?

Use SQL function convert
– Convert a given column to ASCII character set
– Compare resulting string with original
– If original string is all ASCII
    Will match converted string
– If not a match
    Column value is non-ASCII
      Could be WE8ISO8859Pn
      Could be multi-byte
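The convert-and-compare idea can be mimicked in Python. This is only an analogy: encode with errors="replace" stands in for Oracle's convert to US7ASCII, and the helper names are hypothetical.

```python
def ascii_convert(text: str) -> str:
    """Mimic the convert step: characters outside ASCII become '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")

def is_non_ascii(text: str) -> bool:
    """A value is suspect when the converted copy differs from the original."""
    return ascii_convert(text) != text

rows = [chr(197) + "BCDE", "ABCDE"]
suspect = [row for row in rows if is_non_ascii(row)]
print(suspect)
```

A value that already contains a literal question mark is still reported as ASCII, because the comparison is against the original string and not a search for '?'.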


Example: Finding Non-ASCII in WE8ISO8859P1 database

create table Psycho_Acircle (text VARCHAR2(100));

insert into Psycho_Acircle values (chr(197)||'BCDE');
insert into Psycho_Acircle values ('ABCDE');

select * from Psycho_Acircle;

TEXT
-----
ÅBCDE
ABCDE

select convert(text,'US7ASCII','WE8ISO8859P1') from Psycho_Acircle;

CONVERT(TEXT,'US7ASCII','WE8ISO8859P1')
---------------------------------------
?BCDE
ABCDE

ÅBCDE is not the same as ?BCDE


Not Included

Did not scan
– LONG datatype columns
– CLOB datatype columns
    Didn’t have any in schema
– PL/SQL code in database
    Dev team determined this wasn’t needed


Scripts Strategy

Eliminate as much as possible
– Identify all ASCII only tables
– Left with set of non-ASCII tables

For remaining tables
– Find likely Japanese character data
– Verify it is Japanese
– Copy to separate table
– Remove from non-ASCII tables

Repeat for other languages
– How to identify byte patterns for each language?


PL/SQL scripts

Scripts used
– Scan_Table_1_Gen_Column_Info.sql
– Scan_Table_2_Gen_Nonascii_rows_Info.sql
– Scan_Table_3_Gen_NonasciiTables_NoLong.sql
– Scan_Table_4_Gen_NonasciiTables_NonasciiCols_Only.sql
– Scan_Table_5_Gen_NonasciiTables_YesLong.sql
– Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info.sql
– Scan_Table_7_Gen_NA_EUCJP_Tables.sql
– Scan_Table_8_Gen_NA_EUCJP_2_rows_info.sql


Scripts

Each script generates table(s)
– Output of each script stored in table(s)

Next script uses tables
Lots of intermediate data stored
– Helped develop scripts
– Each script simpler
– Provided extra output for developers, analysts to help them verify results
    Is this data really Japanese?


What Does Each Script Do?

Scan_Table_1_Gen_Column_Info.sql
– Scans all tables in a schema
– Creates two tables
    Table_Gen_Info
      Info on all tables
    Table_Column_Info
      Info on character columns
      Which contain any non-ASCII strings
      Doesn’t include LONG columns
        Can’t use SQL functions on LONG datatype


What Does Each Script Do?

Scan_Table_2_Gen_Nonascii_rows_Info.sql
– Use table Table_Column_Info
– Examine tables with non-ASCII character data
– Creates two tables
    Table_NonAscii_info
      Number of rows, columns with non-ASCII data
    Table_NonAscii_SQL
      SQL to extract non-ASCII data from each table
      Useful for developers, analysts to extract data from other environments
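The statements stored in Table_NonAscii_SQL follow a regular shape (see the 2nd script results). The PL/SQL that builds them is not shown in these slides, so the following Python function is only a guess at the string-building step; build_nonascii_sql and its parameters are hypothetical names.

```python
def build_nonascii_sql(table, char_columns,
                       target="US7ASCII", source="WE8ISO8859P1"):
    """Build a count(*) query matching rows with any non-ASCII column value."""
    sql = f"select count(*) from {table} where 1=0"
    for col in char_columns:
        # Starting from "1=0" lets every column clause begin with "or"
        sql += f" or {col} != CONVERT ({col}, '{target}', '{source}')"
    return sql

print(build_nonascii_sql("AUDIT_TABLE", ["FIELD2"]))
```

The "where 1=0" prefix keeps the generator simple: no special case is needed for the first column, and a table with no character columns yields a valid query that matches nothing.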


What Does Each Script Do?

Scan_Table_3_Gen_NonasciiTables_NoLong.sql

– Use tables table_gen_info, table_nonascii_sql
– Create copies of tables that have non-ASCII data
– Copies contain only the non-ASCII rows
    Have all character columns of original table
    Helps identify which country data is from
– Creates tables as select * from <tablename>
    Doesn’t work on tables with LONG column
    Tables named NONASCII_<tablename>


What Does Each Script Do?

Scan_Table_4_Gen_NonasciiTables_NonasciiCols_Only.sql

– Similar to third (previous) script
– Table copies only contain columns that have non-ASCII data
– Does handle tables with LONG column
– Creates tables of form NA_CO_<tablename>
    Set of tables containing all non-ASCII data in the schema


What Does Each Script Do?

Scan_Table_5_Gen_NonasciiTables_YesLong.sql

– Creates copies of tables having non-ASCII data
– Copy tables have all char columns of base table
– Only copies tables that have LONG column
– Companion to third script
    Deals with tables that have LONG column
    Tables named NONASCII_<tablename>
– Now have complete set of tables
    Have all non-ASCII char columns of base tables


Katakana, Hiragana?

How to find Japanese character data?
– Look at hex dump of character data and see lots of ¥_¥ and ¤_¤
– The byte code of ¤ is A4, ¥ is A5
– Many Japanese transliterated terms (company names) start with these bytes
– Typical of EUCJP character set
– Find rows that contain '%¥_¥%' or '%¤_¤%'
– Repeated ¥ or ¤ means EUCJP more likely
– Verify that these rows are indeed Japanese
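Why EUCJP data shows up as runs of ¥ and ¤ in a P1 database can be seen by encoding Japanese text as EUC-JP and then reading the bytes as ISO 8859-1. A Python sketch, assuming Python's euc_jp codec matches the EUCJP data in question; the sample strings and the regular expression (a stand-in for the two LIKE patterns) are illustrative only:

```python
import re

def as_p1(text: str) -> str:
    """Encode text as EUC-JP, then misread the bytes as ISO 8859-1."""
    return text.encode("euc_jp").decode("iso8859_1")

# Same idea as LIKE '%¥_¥%' or LIKE '%¤_¤%' on single-byte data
EUCJP_HINT = re.compile("¥.¥|¤.¤", re.DOTALL)

katakana = as_p1("カタカナ")  # lead byte 0xA5 displays as ¥
hiragana = as_p1("ひらがな")  # lead byte 0xA4 displays as ¤
german = "Bücher"             # genuine 8859-1 text

for value in (katakana, hiragana, german):
    print(repr(value), bool(EUCJP_HINT.search(value)))
```

Every katakana character starts with byte 0xA5 and every hiragana character with 0xA4, so two such characters in a row always produce the ¥_¥ or ¤_¤ shape.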


What Does Each Script Do?

Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info.sql

– For table copies with non-ASCII columns only
– Look for specific pattern of '%¥_¥%'
– Or '%¤_¤%'
– Creates tables
    Table_NA_EUCJP_Info
    Table_NA_EUCJP_SQL
    Table_NA_EUCJP_COL_INFO


6th Script

What does each table contain?
– Table_NA_EUCJP_Info
    Number of EUCJP rows in each non-ASCII table
– Table_NA_EUCJP_SQL
    SQL to extract EUCJP rows
– Table_NA_EUCJP_COL_INFO
    Number of EUCJP rows in each column


What Does Each Script Do?

Scan_Table_7_Gen_NA_EUCJP_Tables.sql
– Create two copies of each table that has EUCJP
    Contain rows that have EUCJP
    First table, all char columns
    Second, only EUCJP columns
– Tables created have names
    EUCJP_<tablename>
    EUCJP_CO_<tablename>


After 7th Script

We have identified EUCJP rows
– In non-ASCII tables
– Copied these rows to separate tables

Delete these rows from the non-ASCII tables
As we identify rows from a specific char set
– Remove them from the non-ASCII tables
– Smaller and smaller set of unknown rows


What Does Each Script Do?

Scan_Table_8_Gen_NA_EUCJP_2_rows_info.sql

– Find rows containing ¥ or ¤
– Could be Japanese
– Could be WE
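A single ¥ or ¤ is ambiguous, so these rows need a second look. One possible triage, which is not the method described in these slides (there the dev team reviewed the rows), is to test whether the raw bytes form a valid EUC-JP sequence. A Python sketch with a hypothetical helper and invented sample values:

```python
def decodes_as_eucjp(raw: bytes) -> bool:
    """True when the bytes form a valid EUC-JP sequence."""
    try:
        raw.decode("euc_jp")
        return True
    except UnicodeDecodeError:
        return False

# Only meaningful for values already known to be non-ASCII,
# since pure ASCII bytes are also valid EUC-JP.
candidates = {
    "price": "¥100".encode("iso8859_1"),      # yen sign in Western European text
    "katakana": "カタカナ".encode("euc_jp"),  # Japanese stored as raw bytes
}

for label, raw in candidates.items():
    print(label, decodes_as_eucjp(raw))
```

This is a hint, not proof: Western European text with two adjacent accented characters can also happen to be valid EUC-JP, so a human check is still needed.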


Results

For each script
– Time to run
– Output
– % of total db that is non-ASCII
– Demonstrates power of this approach
– No attempt to speed up
    Only need to scan once, no need for speed
– Copy prod data to separate environment
– Run scripts there, develop the SQL to correctly convert the non-ASCII data as needed
    Apply to prod as part of Unicode conversion


Results

Scripts run against copy of production db
Database
– 25 GB total, but 13 GB free space
– 12 GB of actual data to scan
– (be skeptical when people tell you they support multi-terabyte dbs, size of actual data counts)

Scripts create tables in the same schema they run in


Results

Script 1 – 2 hours
– Scanned 12 GB of data
– 2483 tables, 63138 columns
– Created two tables
    Table_gen_info
    Table_column_info


1st Script Results

SQL> select * from Table_Gen_Info where rownum <=10;

TABLENAME                         NUMROWS    NUMCOLS NUMCHARCOLS NUMCLOBCOLS NUMLONGCOLS
------------------------------ ---------- ---------- ----------- ----------- -----------
ACCNT_STAT                          15775          5           3           0           0
AMER_AR_OWNER                     1085497          7           6           0           0
AMER_AR_T                            1060          3           2           0           0
APAC_AR_OWNER                        2770          6           6           0           0
AR_ADMIN                             5578         35          31           0           0
AR_CON                               3573         22          17           0           0
AR_STAT                             88652          7           5           0           0
AUDIT_TABLE                         53301         29          26           0           0
CONT_CREATED                       515126          2           2           0           0
CON_CREATED                        184744          2           2           0           0


1st Script Results

SQL> select * from Table_Column_Info where rownum <=20;

TABLENAME                         NUMROWS NUMCHARCOLS CHARCOLNUM CHARCOLNAME  NUMNONASCIIROWS
------------------------------ ---------- ----------- ---------- ------------ ---------------
ACCNT_STAT                          15775           3          1 WCD                        0
ACCNT_STAT                          15775           3          2 STATUS                     0
ACCNT_STAT                          15775           3          3 R4_STATUS                  0
...
...
...
AR_ADMIN                             5578          31          1 R4_ID                      0
AR_ADMIN                             5578          31          2 R4_SR_NUM                  0
AR_ADMIN                             5578          31          3 X_DESC                    72

20 rows selected.

SQL>


2nd Script Results

12 minutes
– 68 tables that have non-ASCII char data
– 68 SQL statements

Overall
– We have 12 GB of data
– 68/2483 tables have any non-ASCII char data
– Only 3% of the tables
    But they’re some of the biggest tables
    Schema analysis much easier on 68 tables


2nd Script Results

SQL> select * from Table_NonAscii_Info where rownum <= 10;

TABLENAME                         NUMROWS NUMNONASCIIROWS    NUMCOLS NUMNONASCIICOLS
------------------------------ ---------- --------------- ---------- ---------------
AR_ADMIN                             5578             692         35               6
AR_CON                               3573             107         22               3
AUDIT_TABLE                         53301              17         29               1
CX_S_ADDR_ORG_XM                    69470             275         19               5
C_ACCOUNT                           17897            1114         20               1
C_ACT                                6562             933         21               6
C_ADDRESS                           25590            5490         28               6
C_AR                                88638            3760         26               6
C_CONTACT                           52574           10401         20               3
C_OPTY                               2139             119         25               4


2nd Script Results

SQL> select * from Table_NonAscii_SQL where rownum <= 10;

TABLENAME LENGTHNONASCIISQL

------------------------------ -----------------

NONASCIISQL

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

AR_ADMIN 445

select count(*) from AR_ADMIN where 1=0 or X_DESC != CONVERT (X_DESC, 'US7ASCII', 'WE8ISO8859P1') or LAST_NAME != CONVERT (LAST_NAME, 'US7ASCII', 'WE8ISO8859P1') or FST_NAME != CONVERT (FST_NAME, 'US7ASCII', 'WE8ISO8859P1') or ACCOUNT != CONVERT (ACCOUNT, 'US7ASCII', 'WE8ISO8859P1') or OWNER_LAST_NAME != CONVERT (OWNER_LAST_NAME, 'US7ASCII', 'WE8ISO8859P1') or R3_CREATED_LAST_NAME != CONVERT (R3_CREATED_LAST_NAME, 'US7ASCII', 'WE8ISO8859P1')

AR_CON 233

select count(*) from AR_CON where 1=0 or OWNER_LAST != CONVERT (OWNER_LAST, 'US7ASCII', 'WE8ISO8859P1') or OWNER_FST != CONVERT (OWNER_FST, 'US7ASCII', 'WE8ISO8859P1') or R3_X_NOTES != CONVERT (R3_X_NOTES, 'US7ASCII', 'WE8ISO8859P1')

AUDIT_TABLE 100

select count(*) from AUDIT_TABLE where 1=0 or FIELD2 != CONVERT (FIELD2, 'US7ASCII', 'WE8ISO8859P1')


3rd Script Results

10 minutes
– Create copies of non-ASCII tables
– Copies contain all character columns
    LONG columns not included
– Creates 65 tables

SQL> select table_name from user_tables where table_name like 'NONASCII%'
     and table_name not like '%_ORIG' and rownum <= 5;

TABLE_NAME
------------------------------
NONASCII_AR_ADMIN
NONASCII_AR_CON
NONASCII_AUDIT_TABLE
NONASCII_CX_S_ADDR_ORG_XM
NONASCII_C_ACCOUNT


4th Script Results

7 minutes
– Create copies of non-ASCII tables
– Copies contain only non-ASCII columns
– Creates 68 tables

SQL> select table_name from user_tables where table_name like 'NA_CO_%' and rownum <= 5;

TABLE_NAME
------------------------------
NA_CO_AR_ADMIN
NA_CO_AR_CON
NA_CO_AUDIT_TABLE
NA_CO_CX_S_ADDR_ORG_XM
NA_CO_C_ACCOUNT


5th Script Results

1 minute
– Create copies of non-ASCII tables
– Copies contain all character columns
    LONG column included
– Creates 3 tables
    Only 3 non-ASCII tables have LONG column

TABLE_NAME
------------------------------
NONASCII_EIM_ACCNT_DTL
NONASCII_EIM_OPTY_DTL
NONASCII_S_CS_QUEST_LANG


6th Script Results

27 minutes
– Scan non-ASCII tables
– Find '%¥_¥%' or '%¤_¤%'
– Very likely EUCJP character set
– Create three tables
    Table_NA_EUCJP_Info (68 tables)
    Table_NA_EUCJP_SQL (5 tables)
    TABLE_NA_EUCJP_COL_INFO (213 columns)
– 5 tables have EUCJP character data


6th Script Results

SQL> select * from Table_NA_EUCJP_Info where rownum <= 10;

TABLENAME                      NUM_NONASCII_ROWS NUM_NA_EUCJP_ROWS NUM_NONASCII_COLS NUM_NA_EUCJP_COLS
------------------------------ ----------------- ----------------- ----------------- -----------------
NA_CO_AR_ADMIN                              5578                 9                 6                 1
NA_CO_AR_CON                                3573                 4                 3                 1
NA_CO_AUDIT_TABLE                          53301                 0                 1                 0
NA_CO_CX_S_ADDR_ORG_XM                     69470                 0                 5                 0
NA_CO_C_ACCOUNT                            17897                 0                 1                 0
NA_CO_C_ACT                                 6562                 0                 6                 0
NA_CO_C_ADDRESS                            25590                 0                 6                 0
NA_CO_C_AR                                 88638                 0                 6                 0
NA_CO_C_CONTACT                            52574                 0                 3                 0
NA_CO_C_OPTY                                2139                 0                 4                 0


6th Script Results

SQL> select * from Table_NA_EUCJP_SQL;

TABLENAME LEN_NA_EUCJP_SQL

---------------- ----------------

NA_EUCJP_SQL

--------------------------------------------------------------------------------------------------------

NA_CO_AR_ADMIN 91

select count(*) from NA_CO_AR_ADMIN where 1=0 or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%'

NA_CO_AR_CON 97

select count(*) from NA_CO_AR_CON where 1=0 or R3_X_NOTES like '%¥_¥%' or R3_X_NOTES like '%¤_¤%'

NA_CO_S_ADDR_ORG 97

select count(*) from NA_CO_S_ADDR_ORG where 1=0 or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%'

NA_CO_S_CONTACT 142

select count(*) from NA_CO_S_CONTACT where 1=0 or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%' or X_DEPT like '%¥_¥%' or X_DEPT like '%¤_¤%'

NA_CO_S_SRV_REQ 200

select count(*) from NA_CO_S_SRV_REQ where 1=0 or X_NOTES like '%¥_¥%' or X_NOTES like '%¤_¤%' or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%' or X_EMAIL_NOTES like '%¥_¥%' or X_EMAIL_NOTES like '%¤_¤%'


6th Script Results

SQL> select * from TABLE_NA_EUCJP_COL_INFO where rownum <=10;

TABLENAME           NUMNONASCIIROWS  NUMNACOLS   NACOLNUM NAEUCJPCOLNAME                 NUMNAEUCJPROWS
------------------- --------------- ---------- ---------- ------------------------------ --------------
NA_CO_AR_ADMIN                 5578          6          1 X_DESC                                      9
NA_CO_AR_ADMIN                 5578          6          2 LAST_NAME                                   0
NA_CO_AR_ADMIN                 5578          6          3 FST_NAME                                    0
NA_CO_AR_ADMIN                 5578          6          4 ACCOUNT                                     0
NA_CO_AR_ADMIN                 5578          6          5 OWNER_LAST_NAME                             0
NA_CO_AR_ADMIN                 5578          6          6 R3_CREATED_LAST_NAME                        0
NA_CO_AR_CON                   3573          3          1 OWNER_LAST                                  0
NA_CO_AR_CON                   3573          3          2 OWNER_FST                                   0
NA_CO_AR_CON                   3573          3          3 R3_X_NOTES                                  4
NA_CO_AUDIT_TABLE             53301          1          1 FIELD2                                      0


7th Script Results

6 minutes
– Create two copies of each EUCJP table
– First copy has all character columns of table
– Second copy has only the EUCJP columns
– Tables named
    EUCJP_<tablename>
    EUCJP_CO_<tablename>


7th Script Results

SQL> select table_name from user_tables where table_name like 'EUCJP_%'
     minus select table_name from user_tables where table_name like 'EUCJP_CO_%';

TABLE_NAME
------------------------------
EUCJP_AR_ADMIN
EUCJP_AR_CON
EUCJP_S_ADDR_ORG
EUCJP_S_CONTACT
EUCJP_S_SRV_REQ

SQL> select table_name from user_tables where table_name like 'EUCJP_CO_%';

TABLE_NAME
------------------------------
EUCJP_CO_AR_ADMIN
EUCJP_CO_AR_CON
EUCJP_CO_S_ADDR_ORG
EUCJP_CO_S_CONTACT
EUCJP_CO_S_SRV_REQ


7th Script Results

EUCJP rows selected
Reviewed by dev team
– EUCJP of all rows verified

Make copies of these tables for reference
Delete the EUCJP rows from the non-ASCII tables
Further scanning of the non-ASCII tables won’t consider the EUCJP rows


8th Script Results

47 minutes
– Scan non-ASCII tables (again)
– Find '%¥%' or '%¤%'
– Could be EUCJP character set
    Could also be WE character data
– Create three tables
    Table_NA_EUCJP_2_Info
    Table_NA_EUCJP_2_SQL
    TABLE_NA_EUCJP_2_COL_INFO
– 3 tables have EUCJP character data
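One way to judge whether a matched row is EUCJP or WE data is to look at the stored bytes. In EUC-JP the lead byte 0xA4 starts a hiragana character and 0xA5 a katakana character, and a single-byte Western European character set displays those same bytes as ¤ and ¥. The sketch below uses Oracle's DUMP function with format 16 (hex); the table and column names are hypothetical.

```sql
-- Hypothetical table/column. Assumes a single-byte database character set,
-- so CHR(164) and CHR(165) are the single bytes 0xA4 and 0xA5.
select LAST_NAME,
       dump(LAST_NAME, 16) as stored_bytes
  from NA_S_CONTACT
 where LAST_NAME like '%' || chr(164) || '%'
    or LAST_NAME like '%' || chr(165) || '%';
```

Runs of byte pairs that are all in the a1 to fe range point to EUCJP. A lone a4 or a5 surrounded by ASCII bytes is more likely a WE currency symbol.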


8th Script Results

Possible EUCJP rows selected
Reviewed by dev team
– All rows verified as EUCJP
Make copies of these tables for reference
Delete these EUCJP rows from the non-ASCII tables
Further scanning of the non-ASCII tables won’t consider these EUCJP rows


Next Steps

What I had planned
With the EUCJP rows verified and removed
Scan non-ASCII tables (yet again)
Look for 8859Pn character data
– How?
– WE languages, single isolated 8-bit byte code with ASCII (7-bit) byte codes on either side
– Example: Bücher
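The "isolated 8-bit byte" test could be sketched as a small PL/SQL function. This is an illustration only, not one of the eight scripts: the function name is hypothetical, it assumes a single-byte database character set (so SUBSTR and ASCII work byte by byte), and it treats the start and end of the string as ASCII neighbours.

```sql
-- Sketch only. Returns 1 if the text contains an 8-bit byte that has
-- 7-bit (ASCII) bytes, or a string boundary, on both sides; otherwise 0.
create or replace function has_isolated_8bit (p_text in varchar2)
  return number
is
  v_len  pls_integer := nvl(length(p_text), 0);
  v_prev pls_integer;
  v_next pls_integer;
begin
  for i in 1 .. v_len loop
    if ascii(substr(p_text, i, 1)) > 127 then
      -- byte before the current one (0 stands for "start of string")
      if i = 1 then
        v_prev := 0;
      else
        v_prev := ascii(substr(p_text, i - 1, 1));
      end if;
      -- byte after the current one (0 stands for "end of string")
      if i = v_len then
        v_next := 0;
      else
        v_next := ascii(substr(p_text, i + 1, 1));
      end if;
      if v_prev <= 127 and v_next <= 127 then
        return 1;
      end if;
    end if;
  end loop;
  return 0;
end;
/
```

With this in place, a query such as `select count(*) from NA_S_CONTACT where has_isolated_8bit(LAST_NAME) = 1` (hypothetical table and column) would flag Bücher, where ü sits between B and c, but not a run of EUCJP byte pairs.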


Next Steps

Select likely WE rows from non-ASCII tables
– Review with dev team
– Determine source country for each row
    Schema has ‘country code’
    Select each row using character set of country
– Verify rows with fluent speaker for each country
– Remove rows from non-ASCII tables as verified
What to do with remaining rows
– Not sure…


What Really Happened?

After 8 scripts
Dev team was able to
– Identify likely country for each non-ASCII row
– I identified likely character set for each country
– I selected rows for each country
    Using identified character set
– Fluent speaker from each country verified
    Rows as selected were correct
– Wrote SQL to correctly convert rows to Unicode


Conversion

How to convert non-ASCII rows to Unicode?
– New db uses AL32UTF8 character set
With correct character set identified
After importing into new 9i database
– Convert back to WE8MSWIN1252
– Convert to AL32UTF8
– Example:
    UPDATE <tablename> SET <column> =
      CONVERT (<column>, 'WE8MSWIN1252', 'AL32UTF8');
    UPDATE <tablename> SET <column> =
      CONVERT (<column>, 'AL32UTF8', '<charset>');
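As a concrete illustration of the two steps, here is a sketch for rows that were verified as EUCJP, whose Oracle character set name is JA16EUC. The table, the column, and the use of ROW_ID to pick the rows from the reference copy are assumptions. Note that CONVERT takes the destination character set before the source character set, and both are passed as quoted strings.

```sql
-- Hypothetical table/column; assumes the reference copy kept ROW_ID.
-- CONVERT(value, destination_charset, source_charset)

-- Step 1: convert back to WE8MSWIN1252 to recover the original bytes
update S_CONTACT
   set LAST_NAME = convert(LAST_NAME, 'WE8MSWIN1252', 'AL32UTF8')
 where ROW_ID in (select ROW_ID from EUCJP_S_CONTACT);

-- Step 2: convert those bytes from their real character set to AL32UTF8
update S_CONTACT
   set LAST_NAME = convert(LAST_NAME, 'AL32UTF8', 'JA16EUC')
 where ROW_ID in (select ROW_ID from EUCJP_S_CONTACT);

-- Inspect the result before committing
select LAST_NAME, dump(LAST_NAME, 16) as stored_bytes
  from S_CONTACT
 where ROW_ID in (select ROW_ID from EUCJP_S_CONTACT);
```

Running both updates in one transaction and checking a few rows with a fluent speaker before the commit keeps the change reversible with a rollback.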


Script Summary

8 scripts, scanning 12 GB of data
– Run times
    2 hours
    12 minutes
    10 minutes
    7 minutes
    1 minute
    27 minutes
    6 minutes
    47 minutes
Total run time – 230 minutes, about 4 hours
– Very slow development machine


Conclusions

For character set conversion
– From any 8-bit character set (WE8ISO8859Pn)
– To Unicode
– Accept that some of the existing data may not be in the database character set
– Don’t assume, verify
Use PL/SQL scripts, identify non-ASCII character data
Decide how to evaluate the non-ASCII data
Document, test, communicate
– Make sure everyone knows how data from each character set is identified


Books Used

Oracle PL/SQL By Example
– Rosenzweig, Silvestrova, Prentice Hall 2004
– I needed lots of examples
    multiple nested cursors
– Needed to get going fast
Got help from experienced PL/SQL developer
– Quotes issue
– Even they couldn’t explain why the specific number of quotes works…but it did


CRM Unicode Conversion

Three separate presentations
– 1) The overall conversion process
    What we had, what we wanted, how to get there
    Issues that come up during conversion
– 2) Multi-byte data in the existing CRM db
    What’s the issue, how did it happen
    A general method to find and fix this problem
– 3) The actual conversion
    What really happened
    Issues that came up and how they were resolved
Focus on DBA issues, not Siebel application


PL/SQL Notes

Quotes of quotes
– Hard to know how many you need
– Experiment
– Test
    PL/SQL that generates SQL that contains quoted strings
Keep it simple
Break up the task into multiple scripts
Generate tables of results, next script uses table(s) as input
– Tables provide documentation of intermediate results
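The number of quotes follows one rule: inside a PL/SQL string literal, two single quotes stand for one quote in the resulting value, and every extra level of "SQL inside a quoted string" doubles them again. The block below is a small sketch of that rule; the table T, the column LAST_NAME, and the table SQL_RESULTS are hypothetical, and nothing is executed, only printed.

```sql
set serveroutput on size 1000000

declare
  level1      varchar2(200);
  level2      varchar2(200);
  insert_stmt varchar2(400);
begin
  -- Level 1: SQL that will be executed directly.
  -- '' in the literal becomes ' in the value.
  level1 := 'select count(*) from T where LAST_NAME like ''%x%''';

  -- Level 2: the same SQL, written so that it can sit inside a quoted
  -- literal of another statement. '''' becomes '' in the value.
  level2 := 'select count(*) from T where LAST_NAME like ''''%x%''''';

  -- The insert wraps level2 in one more pair of quotes; when the insert
  -- is parsed, the stored string is identical to level1.
  insert_stmt := 'insert into SQL_RESULTS values (''' || level2 || ''')';

  dbms_output.put_line(level1);       -- ... like '%x%'
  dbms_output.put_line(level2);       -- ... like ''%x%''
  dbms_output.put_line(insert_stmt);  -- ... values ('select ... like ''%x%''')
end;
/
```

Printing each level with dbms_output before running it is a quick way to do the "experiment, test" step.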


PL/SQL Notes

Second script
– Looping to build up select SQL
– Selects data from all non-ASCII columns
Initial select SQL has to be
– NonAsciiSQL_stmt := 'select count(*) from '||TableName||' where 1=0'
– Subsequent SQL of form NonAsciiSQL_stmt := NonAsciiSQL_stmt||' or '||TableCharColName||
– Needed 'where 1=0' so we could append further OR clauses
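A minimal sketch of the 'where 1=0' idea, assuming a hypothetical table S_CONTACT and using a simple LIKE test on byte 0xA4 as a stand-in for whatever predicate the real script appends. Because the statement starts with a condition that is always false, every column can be added with the same ' or ...' suffix, and the statement stays valid even if no column is added.

```sql
set serveroutput on size 1000000

declare
  stmt varchar2(4000);
begin
  -- Always-false seed predicate: the first appended clause needs no special case
  stmt := 'select count(*) from S_CONTACT where 1=0';

  for c in (select column_name
              from user_tab_columns
             where table_name = 'S_CONTACT'
               and data_type in ('CHAR', 'VARCHAR2')
             order by column_id)
  loop
    stmt := stmt || ' or ' || c.column_name
                 || ' like ''%'' || chr(164) || ''%''';
  end loop;

  -- Print rather than execute; dbms_output lines are limited in length
  dbms_output.put_line(substr(stmt, 1, 255));
end;
/
```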


PL/SQL Notes

LONG datatype
– Third script created tables as select * from
    Can’t do this when table has LONG column
– Fourth script creates tables by building up the create table SQL one column at a time
    Skip the LONG column, if present in base table
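A sketch of building the create table statement one column at a time while skipping LONG columns, which CREATE TABLE ... AS SELECT cannot copy. The base table S_SRV_REQ is used as an example and the target name NA_S_SRV_REQ is an assumption; the statement is printed, with the execute left commented out.

```sql
set serveroutput on size 1000000

declare
  col_list varchar2(4000);
  stmt     varchar2(4000);
begin
  for c in (select column_name
              from user_tab_columns
             where table_name = 'S_SRV_REQ'
               and data_type not in ('LONG', 'LONG RAW')
             order by column_id)
  loop
    -- add a comma before every column except the first
    if col_list is not null then
      col_list := col_list || ', ';
    end if;
    col_list := col_list || c.column_name;
  end loop;

  stmt := 'create table NA_S_SRV_REQ as select ' || col_list ||
          ' from S_SRV_REQ';

  dbms_output.put_line(substr(stmt, 1, 255));
  -- execute immediate stmt;
end;
/
```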


PL/SQL Notes

DBMS_OUTPUT limitations
– Only works for so long
– Has limit of 1M characters
Scripts are not commercial grade
– Testing statements are left in
    Commented out
– No error trapping
– Still development scripts
– They work, but they aren’t pretty


PL/SQL Notes

Scripts set up to
– Run in SQL*Plus user’s schema
– Output tables created in user’s schema
Could easily change scripts
– Store output tables in separate schema
– Take a schema as input
    Scan tables in specified schema
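One possible shape for the "take a schema as input" change, sketched under assumptions: the cursors read ALL_TABLES and ALL_TAB_COLUMNS instead of the USER_ views, and the schema name comes from a SQL*Plus substitution variable named schema_name (hypothetical). The running user needs select privileges on the tables in that schema.

```sql
set serveroutput on size 1000000

declare
  cursor c_tabs (i_owner varchar2) is
    select table_name
      from all_tables
     where owner = i_owner
     order by table_name;

  cursor c_cols (i_owner varchar2, i_table_name varchar2) is
    select column_name
      from all_tab_columns
     where owner = i_owner
       and table_name = i_table_name
       and data_type in ('CHAR', 'VARCHAR2')
     order by column_id;

  v_owner varchar2(30) := upper('&schema_name');
begin
  for t in c_tabs (v_owner) loop
    for c in c_cols (v_owner, t.table_name) loop
      -- Any generated SQL must qualify the table as owner.table_name
      dbms_output.put_line(v_owner || '.' || t.table_name || '.' || c.column_name);
    end loop;
  end loop;
end;
/
```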


PL/SQL Script Example

Show PL/SQL of first script
– Cursors with definitions that depend on loop variable of outer loop
– Quotes and more quotes
– Generating insert statements that are inserting strings of SQL


6th Script Text

set serveroutput on size 1000000;

declare
  cursor C_EucJpTabNames is
    select table_name from user_tables
    where table_name like 'NA_CO_%';

  cursor C_EucJpTabCols (i_table_name varchar2) is
    select column_name from user_tab_columns
    where table_name = i_table_name
    order by column_id;

  TableName VARCHAR2(100);
  TableRowCount NUMBER;
  ColCount NUMBER;
  TableCharColName VARCHAR(100);
  NumAsciiPlusNon NUMBER;
  TableCharColNum NUMBER;
  Num_NA_EUCJP_Rows NUMBER;
  TabNum_NA_EUCJP_Rows NUMBER;
  Len_NA_EUCJP_SQL_stmt NUMBER;
  TabNum_NA_EUCJP_Cols NUMBER;
  CurNum_NA_EUCJP_Cols NUMBER;
  Sql_stmt VARCHAR2(4000);
  Sql_stmt2 VARCHAR2(4000) := 'COMMIT';
  NA_EUCJP_SQL_stmt VARCHAR2(4000);
  NA_EUCJP_SQL_stmt_insert VARCHAR2(4000);
  NAColCount NUMBER;

BEGIN
  --dbms_output.disable;

  Sql_stmt := 'create table Table_NA_EUCJP_Info
    (TableName VARCHAR2(30),
    NUM_NONASCII_ROWS NUMBER,
    NUM_NA_EUCJP_ROWS NUMBER,
    NUM_NONASCII_COLS NUMBER,
    NUM_NA_EUCJP_COLS NUMBER)';
  execute immediate Sql_stmt;

  Sql_stmt := 'create table Table_NA_EUCJP_SQL
    (TableName VARCHAR2(30),
    Len_NA_EUCJP_SQL NUMBER,
    NA_EUCJP_SQL VARCHAR2(4000))';
  execute immediate Sql_stmt;

  Sql_stmt := 'create table Table_NA_EUCJP_Col_Info
    (TableName VARCHAR2(30),
    NUMNONASCIIROWS NUMBER,
    NUMNACOLS NUMBER,
    NACOLNUM NUMBER,
    NAEUCJPCOLNAME VARCHAR2(30),
    NUMNAEUCJPROWS NUMBER)';
  execute immediate Sql_stmt;

  open C_EucJpTabNames;
  LOOP
    FETCH C_EucJpTabNames into TableName;
    Exit when C_EucJpTabNames%NOTFOUND;

    NA_EUCJP_SQL_stmt := 'select count(*) from '||TableName||' where 1=0';
    NA_EUCJP_SQL_stmt_insert := '''select count(*) from '||TableName||' where 1=0';

    execute immediate 'select count(*) from
      user_tab_columns where table_name = ''' || TableName || '''' into NAColCount;

    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt_insert ');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'',1,255));
    dbms_output.put_line('table name is '||TableName);

    execute immediate 'select count(*) from '||TableName into TableRowCount;

    TableCharColNum := 0;
    CurNum_NA_EUCJP_Cols := 0;

    open C_EucJpTabCols (TableName);
    LOOP
      FETCH C_EucJpTabCols into TableCharColName;
      Exit when C_EucJpTabCols%NOTFOUND;

      dbms_output.put_line('This is column '||TableCharColName);
      TableCharColNum := TableCharColNum + 1;

      -- compute the number of EUCJP rows for this column...
      execute immediate 'select count(*) from '||TableName||
        ' where '||TableCharColName||' like ''%¥_¥%'' or '
        ||TableCharColName||' like ''%¤_¤%''' into Num_NA_EUCJP_Rows;

      dbms_output.put_line('This column has '||Num_NA_EUCJP_Rows||' NA_EUCJP_ rows');

      IF Num_NA_EUCJP_Rows != 0 THEN
        NA_EUCJP_SQL_stmt := NA_EUCJP_SQL_stmt||' or '||TableCharColName||
          ' like ''%¥_¥%'' or '||TableCharColName||' like ''%¤_¤%''';
        NA_EUCJP_SQL_stmt_insert := NA_EUCJP_SQL_stmt_insert||' or '||TableCharColName||
          ' like ''''%¥_¥%'''' or '||TableCharColName||' like ''''%¤_¤%''''';

        CurNum_NA_EUCJP_Cols := CurNum_NA_EUCJP_Cols + 1;
        dbms_output.put_line('This is NA_EUCJP_Column number '||CurNum_NA_EUCJP_Cols);
        dbms_output.put_line('here is CurNum_NA_EUCJP_Cols');
        dbms_output.put_line(CurNum_NA_EUCJP_Cols);
        dbms_output.put_line('SQL statement appended...');
      END IF;

      -- insert column info...
      --Dummy_col_count := 999;
      Sql_stmt := 'insert into Table_NA_EUCJP_Col_Info values ('''||TableName||''', '||TableRowCount||
        ', '||NAColCount||', '||TableCharColNum||', '''||TableCharColName||''','||Num_NA_EUCJP_Rows||')';
      execute immediate Sql_stmt;
      dbms_output.put_line('Column info insert completed...');
    End Loop;

    NA_EUCJP_SQL_stmt_insert := NA_EUCJP_SQL_stmt_insert||'''';
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt_insert ');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'',1,255));

    TabNum_NA_EUCJP_Cols:= CurNum_NA_EUCJP_Cols;
    dbms_output.put_line('here is TabNum_NA_EUCJP_Cols');
    dbms_output.put_line(TabNum_NA_EUCJP_Cols);

    -- update number of NAEUCJP columns...
    --Sql_stmt := 'update Table_NA_EUCJP_Col_Info set NUMNAEUCJPCOLS = TabNum_NA_EUCJP_Cols
    --where TableName = '''||TableName||'';
    --execute immediate Sql_stmt;
    --dbms_output.put_line('Number of NAEUCJP columns updated...');

    Close C_EucJpTabCols;

    Len_NA_EUCJP_SQL_stmt := LENGTH (NA_EUCJP_SQL_stmt);
    dbms_output.put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt);
    dbms_output.put_line('here is the NA_EUCJP_SQL_stmt');
    dbms_output.put_line(SUBSTR(''||NA_EUCJP_SQL_stmt||'',1,255));

    --this has already been done above...
    --execute immediate 'select count(*) from '||TableName into TableRowCount;

    execute immediate 'select count(*) from
      user_tab_columns where table_name = ''' || TableName || '''' into ColCount;

    --NA_EUCJP_SQL_stmt := 'testing';
    TabNum_NA_EUCJP_Rows := 0;
    execute immediate NA_EUCJP_SQL_stmt into TabNum_NA_EUCJP_Rows;
    dbms_output.put_line('Number of NA_EUCJP_ rows... '||TabNum_NA_EUCJP_Rows);

    --Len_NA_EUCJP_SQL_stmt := 0;
    dbms_output.put_line('Num rows in the table '||TableRowCount);
    dbms_output.put_line('Num columns in the table '||ColCount);
    dbms_output.put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt);
    dbms_output.put_line('Num NAEUCJP_ Rows '||TabNum_NA_EUCJP_Rows);
    dbms_output.put_line('Num NAEUCJP_ Columns '||TabNum_NA_EUCJP_Cols);

    Sql_stmt := 'insert into Table_NA_EUCJP_Info values ('''||TableName||''', '||TableRowCount||
      ', '||TabNum_NA_EUCJP_Rows||', '||ColCount||', '||TabNum_NA_EUCJP_Cols||')';
    execute immediate Sql_stmt;
    dbms_output.put_line('First insert completed...');

    -- If number of EUCJP rows is non-zero, insert select SQL into SQL table
    IF TabNum_NA_EUCJP_Rows != 0 THEN
      Sql_stmt := 'insert into Table_NA_EUCJP_SQL values ('''||TableName||''', '||Len_NA_EUCJP_SQL_stmt||
        ', '||NA_EUCJP_SQL_stmt_insert||')';
      execute immediate Sql_stmt;
      dbms_output.put_line('Second insert completed...');

    End If;

    execute immediate Sql_stmt2;
  End Loop;

  Close C_EucJpTabNames;
End;
/