august 11, 2003 31 schema flavors: build a warehouse session 37018 oracleworld san francisco 2003...
TRANSCRIPT
August 11, 2003
31 Schema Flavors: Build a Warehouse
Session 37018OracleWorld San Francisco 2003
Robert CorfmanBoeing Commercial Airplanes
Information Systems
Ow2003-37018.ppt9/10/2003
What is this Session?
Background on the Boeing Tool Management Project– COTS Package Enhancements– COTS Limitations
Warehouse Limitations / Overview Design Specific Code Examples / Techniques
– Replication Usage– Detecting Schema Differences– Error Capture, Propagation, Reporting– Performance Issues
Ow2003-37018.ppt9/10/2003
Presentation Perspective
Application Architect– Responsible for technical application decisions– Enterprise project and Puget Sound responsibilities– Programmer
Multiple technical facets– COTS modification specs, delivery, testing, deployment– Radio Frequency Implementation– Data Conversion– Legacy application interfaces– Warehouse
Not a Warehousing Specialist . . .Not a Warehousing Specialist . . .
Ow2003-37018.ppt9/10/2003
31 Schema Flavors
Project Background
And
Warehouse Design Decisions
How did we get where we are?How did we get where we are?
Ow2003-37018.ppt9/10/2003
Boeing Tool Standardization
Perishable / Portable Tools and Shop Supplies– Drill Motors, Safety Glasses, Reamers … “Non-Production”
Retirement of multiple legacy applications– Mergers and acquisitions – Robust home-grown applications
Enterprise Commodity Standardization– Internal customer base already communicating
Enterprise application – ‘Loose federation’ of independent site projects
Ow2003-37018.ppt9/10/2003
Site Distribution
Everett, Renton, Auburn, Seattle
Portland
Salt Lake City
Mesa
Philly
San Antonio
St LouisWichita
Cecil FieldMacon
Oak RidgeTulsa
Corinth
Ow2003-37018.ppt9/10/2003
COTS Package
Site vs. Enterprise – daily use vs. high leverage Picked best identified product for
‘site-level’ usability No Enterprise features in selected product
– Cross-site visibility, item linkage, or transfers
Supplier to build-in needed features– No customizations
Be Nimble, Keep the Product StandardBe Nimble, Keep the Product Standard
Ow2003-37018.ppt9/10/2003
Data Warehouse Design Decisions
Don’t break the site-level features/usability– Main reason for selecting this product…
Integrate warehouse features into the product Allow continued site operations when warehouse is
unavailable Drove decision not to integrate all sites into a
single Corporate-level database– Create separate roll-up database of all participating sites
Site Operations Company AssetSite Operations Company Asset
Ow2003-37018.ppt9/10/2003
What Data Goes in the Warehouse?
Data Subsets? Data Summarizations?– How do we decide what data to store? Summarize?– Volatile decision, different for different Companies
Drove decision to replicate all data into the warehouse and not pre-define summarizations
Application runs on multiple databases– Expected warehouse structure defined by COTS– Decision on exactly how to get data into designed
warehouse left outside of COTS package– Additional tables and summaries could be added,
but without direct integration in COTS package
Be Nimble, Keep the Product StandardBe Nimble, Keep the Product Standard
Ow2003-37018.ppt9/10/2003
Application Look and Feel
Site Data ViewSite Data View
Ow2003-37018.ppt9/10/2003
Application Look and Feel
Warehouse Data ViewWarehouse Data View
Ow2003-37018.ppt9/10/2003
Application Look and Feel
Inventory Reports ViewInventory Reports View
Ow2003-37018.ppt9/10/2003
Warehouse Design Considerations
7 x 24 Operations Allowed to be non-real-time Includes Oracle and SQL Server sites COTS package allows customization of site
– User-defined fields in standard tables– Modification of field lengths from standard
Warehouse needs to handle‘greatest common denominator’
Desire to leverage replication
Ow2003-37018.ppt9/10/2003
Warehouse Design Options
Updateable Snapshots– Warehouse is the Master, each site an updatable snapshot– Doesn’t work if sites have schema variants– Master control of data no longer at the sites
Replication directly into Warehouse– Same schema variant problem
Multiple “replicants”– Custom procedures to consolidate into Master warehouse– Allows Replication with Master control at site– Accommodates variation in site schemas
ETL software now considered a possibility . . .ETL software now considered a possibility . . .
Ow2003-37018.ppt9/10/2003
Proposed Warehouse Design
Site 1 (Unix)
Oracle
Site 2 (NT)
Oracle
Site 3 (NT)
SQL Server
Site N
Site Data
Warehouse Site
CompanyData Warehouse
Site 1Snapshot
Site 2Snapshot
Site 3Snapshot
Site NSnapshot
Ow2003-37018.ppt9/10/2003
Initial Issues with Design
Could perform Roll-up via Views– Create or replace view as– Select 1 siteno,item,desc,null,null from site1.table– Union Select 2,item,desc,cust1,null from site2.table– Union Select 3,item,desc,null,cust2 from site3.table;
Dynamically shows replicated data changes Significant difference in query times
– Oracle 9.2 showed an Order of magnitude difference in query times
– Oracle 8.1.7 was significantly worse– this was the version we were using when the decision was
made not to use views
Ow2003-37018.ppt9/10/2003
Initial Issues with Design
Strong desire to have 7x24 data availability Complexity of rebuilding in place
– Difficult to update existing data . . .
Leverage transparency of synonyms– Maintain two copies of warehouse– Rebuild, then re-direct
If rebuild fails, don’t re-direct– Built-in contingency . . .
Be Prepared for Success . . . (or Failure)Be Prepared for Success . . . (or Failure)
Ow2003-37018.ppt9/10/2003
Final Warehouse Design
Warehouse Site
CompanyData Warehouse
Site 1Snapshot
Site 2Snapshot
Site 3Snapshot
Site NSnapshot
Site 1 (Unix)
Oracle
Site 2 (NT)
Oracle
Site 3 (NT)
SQL Server
Site N
Site Data
WarehouseData 1
WarehouseData 2
Ow2003-37018.ppt9/10/2003
Replication of Oracle Data
Standard Oracle fast refresh functionality– Used to move data to the master warehouse site– Flexible schedule, avoid replication during rollup
Dbms_snapshot.refresh vs. Refresh Groups– Groups provided cross-table read consistency, required single
commit– Perceived difficulties in schedule variations– Difficulty in automating snapshot validity prior to refresh
– Custom column additions/deletions, column size changes
– Snapshot refresh chosen, acceptance of potential for minor data inconsistencies
Managed replication via packages– One package set on snapshot-site, companion package on site– Designed validate and rebuild snapshot logs and snapshots as
required
Ow2003-37018.ppt9/10/2003
Oracle Snapshot Verification
Verify_snapshots_sql :='select distinct upper(site.table_name)' ||' from user_tab_columns snap, ' ||' user_tab_columns@'|| db_link || ' site' ||' where upper(site.table_name) = upper(snap.table_name(+))' ||' and upper(site.column_name) = upper(snap.column_name(+))' ||' and upper(site.table_name) not like ''MLOG$%''' ||' and upper(site.table_name) not like ''RUPD$%''' ||' and upper(site.table_name) not in ' ||' (select upper(tablename) from exclude_whse@' || db_link || ')'
||' and (' ||' snap.column_name is null' ||' or snap.data_type != site.data_type' ||' or nvl(snap.data_precision,38)!=nvl(site.data_precision,38)' || -- default to 38 as replicated number have precision of 38...' or nvl(snap.data_length,-1) != nvl(site.data_length,-1)' ||' or nvl(snap.data_scale,-1) != nvl(site.data_scale,-1)'' )';
Never Apologize . . .Never Apologize . . .
Ow2003-37018.ppt9/10/2003
Replication of SQL Server Data
Expected to have full refresh– Site administrators has strong preference against…
Apparent Options– Oracle Transparent Gateway– Two-part Replication
– Microsoft Incremental replication to central LAN
– Full replication to warehouse
Final Decision…– Site researched push incremental replication from SQL
Server to Oracle … It works.– Site schema changes are manually handled
Ow2003-37018.ppt9/10/2003
Build a Warehouse
More Code
For
Rolling the Schemas Together
How did we get there from here?How did we get there from here?
Ow2003-37018.ppt9/10/2003
Automation with Configuration
Packaged functionality run with Oracle Jobs Table-driven configuration
– Which schemas are included– Snapshot and Warehouse Tables build parameters
– with defaults and individual table overrides
– Primary Keys and Indexes Includes infrastructure features
– Logging– Full error trapping– Continued operation after many failures– Statistical performance information– Positive response email results
Keep the DBAs Happy . . . What goes around . . . Keep the DBAs Happy . . . What goes around . . .
Ow2003-37018.ppt9/10/2003
Warehouse Build Steps
Drop sub-warehouse objects Cycle through snapshots and build / modify
sub-warehouse tables Bulk insert each snapshot table into sub-warehouse Create primary keys and indexes Compute statistics Switch master warehouse synonyms to point at
“new” sub-warehouse (upon successful completion) Email rebuild results
Ow2003-37018.ppt9/10/2003
Drop Sub-Warehouse Objects (Simple)
PROCEDURE DROP_WAREHOUSE_TABLES IS
cursor warehouse_tables is
select table_name from user_tables order by 1;
mysql varchar2(255);
BEGIN
FOR tables_rec IN warehouse_tables LOOP
mysql := 'DROP TABLE '|| tables_rec.table_name;
EXECUTE IMMEDIATE mysql;
END LOOP;
END DROP_WAREHOUSE_TABLES;
Ow2003-37018.ppt9/10/2003
Drop Warehouse Objects-Add Logging
PROCEDURE DROP_WAREHOUSE_TABLES IS cursor warehouse_tables is select table_name from user_tables order by 1; mysql varchar2(255); me ERR_TYPE;BEGIN Me.my_name := 'DROP_WAREHOUSE_TABLES'; log_activity(me,'Drop Warehouse tables started','S'); FOR tables_rec IN warehouse_tables LOOP log_activity(me,'Dropping '|| tables_rec.table_name); mysql := 'DROP TABLE ' || tables_rec.table_name; EXECUTE IMMEDIATE mysql; END LOOP; log_activity(me,'Drop Warehouse tables completed','C');END DROP_WAREHOUSE_TABLES;
Ow2003-37018.ppt9/10/2003
Drop Warehouse Objects-Add Error Traps
PROCEDURE DROP_WAREHOUSE_TABLES IS cursor warehouse_tables is select table_name from user_tables order by 1; mysql varchar2(255); me ERR_TYPE;BEGIN Me.my_name := 'DROP_WAREHOUSE_TABLES'; log_activity(me,'Drop Warehouse tables started','S');
FOR tables_rec IN warehouse_tables LOOP log_activity(me,'Dropping '|| tables_rec.table_name);
BEGIN mysql := 'DROP TABLE ' || tables_rec.table_name; EXECUTE IMMEDIATE mysql; EXCEPTION WHEN OTHERS THEN INTERNAL_PROC_ERROR(me, SQLERRM, SQLCODE, mysql); END; END LOOP; log_activity(me,'Drop Warehouse tables completed','C');
IF me.cnt_errors > 0 THEN raise_application_error(-20000, APP_ERR_MSG(me)); END IF;END DROP_WAREHOUSE_TABLES;
Ow2003-37018.ppt9/10/2003
Warehouse Build Steps
Drop sub-warehouse objects Cycle through snapshots and build / modify
sub-warehouse tables Bulk insert each snapshot table into sub-warehouse Create primary keys and indexes Compute statistics Switch master warehouse synonyms to point at
“new” sub-warehouse (upon successful completion) Email rebuild results
Ow2003-37018.ppt9/10/2003
Table Driven Configuration
Warehouse_Sites SiteNo Schema_Name
Warehouse_Sites SiteNo Schema_Name
. . .6 TULSA7 EVERETT8 RENTON9 AUBURN. . .
. . .6 TULSA7 EVERETT8 RENTON9 AUBURN. . .
PROCEDURE VERIFY_TABLES ISBEGIN Loop Through Each Schema (With Error Trapping and Logging) Verify_All_Schema_Tables(Schema_Name); End Loop;END VERIFY_TABLES;
Allows for Adding and Removing Sites as needed
Ow2003-37018.ppt9/10/2003
Build Warehouse Tables
Verify Schema Tables performs the major warehouse structure build-up work
31 Flavors . . . more similar than different– First Schema builds majority of tables / columns
Follow-on schemas modify structure as needed– Greatest common denominator
Loops through all table columns usingdata dictionary information
Very similar to snapshot verification routines– Column names, data type, length, precision, scale
Ow2003-37018.ppt9/10/2003
More Table Driven Configuration
Table_Sizes Table_Name Table_Size
Table_Sizes Table_Name Table_Size
Storage_Parms Table_Size Table_Storage Index_Storage
Storage_Parms Table_Size Table_Storage Index_Storage
STANDARDTABLESPACE WH128K_DATA STORAGE(INITIAL 128K NEXT 128K) NOLOGGINGTABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGINGLARGETABLESPACE WH4M_DATA STORAGE(INITIAL 4M NEXT 4M) NOLOGGINGTABLESPACE WH4M_INDX STORAGE(INITIAL 4M NEXT 4M) NOLOGGING
STANDARDTABLESPACE WH128K_DATA STORAGE(INITIAL 128K NEXT 128K) NOLOGGINGTABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGINGLARGETABLESPACE WH4M_DATA STORAGE(INITIAL 4M NEXT 4M) NOLOGGINGTABLESPACE WH4M_INDX STORAGE(INITIAL 4M NEXT 4M) NOLOGGING
Select ' ' || Table_Storage into BUILD_PARMS from Storage_Parms Where Table_Size = 'LARGE';My_SQL := Create_Table_Statement || BUILD_PARMS;Execute Immediate My_SQL;
Used when Adding a New Table Used when Adding a New Table
Ow2003-37018.ppt9/10/2003
Warehouse Build Steps
Drop sub-warehouse objects Cycle through snapshots and build / modify
sub-warehouse tables Bulk insert each snapshot table into sub-warehouse Create primary keys and indexes Compute statistics Switch master warehouse synonyms to point at
“new” sub-warehouse (upon successful completion) Email rebuild results
Ow2003-37018.ppt9/10/2003
Insert Mechanics
Data Warehouse schema is the sum of every site Similar technique to identifying schema
– Loop through each schema and table
Dynamically build Insert Statement– Based on site columns– Include unique ID for each schema
Same logging techniques built-in– Includes actual SQL statement
Error trapping and continuation built-in
Ow2003-37018.ppt9/10/2003
Minimizing Archive Logs with NoLogging
Archive logging in the production warehouse instance– Some configuration data should be archived– ‘Normal’ Boeing Operation to have Archive Logs in
Production
Warehouse data is recoverable from sites Nologging specified on tablespaces and tables
– Oracle still logs changes and inserts
Daily warehouse rebuild generates Significant Archive Log activity– Big Surprise with initial production use
Ignorance is Bliss ???Ignorance is Bliss ???
Ow2003-37018.ppt9/10/2003
Using Bulk Inserts
Bulk load can be specified on inserts– Direct Load with sql*loader – Append hint with SQL
Only affects tables specified with NoLogging
Insert /*+ APPEND */ into warehouse.inventory (siteno, item, description)Select 1,item,description from site1.inventory;
Invalidates session table use till commitORA-12838:
cannot read/modify an object after modifying it in parallel
Ow2003-37018.ppt9/10/2003
Warehouse Operations Log
Warehouse_Op_Log ID Log_Date Proc Log_Type Log_User Message Error Log_Rows
Warehouse_Op_Log ID Log_Date Proc Log_Type Log_User Message Error Log_Rows
1617893 08/08/03 02:08:02 INSERT_SITE_TABLE_DATA I CMPWHInserting From WICHITA.STATION 0 0
1617894 08/08/03 02:08:03 INSERT_SITE_TABLE_DATA I CMPWHData Inserted from WICHITA.STATION (5031 Rows) 0 5031
1617893 08/08/03 02:08:02 INSERT_SITE_TABLE_DATA I CMPWHInserting From WICHITA.STATION 0 0
1617894 08/08/03 02:08:03 INSERT_SITE_TABLE_DATA I CMPWHData Inserted from WICHITA.STATION (5031 Rows) 0 5031
Ow2003-37018.ppt9/10/2003
Warehouse Build Steps
Drop sub-warehouse objects Cycle through snapshots and build / modify
sub-warehouse tables Bulk insert each snapshot table into sub-warehouse Create primary keys and indexes Compute statistics Switch master warehouse synonyms to point at
“new” sub-warehouse (upon successful completion) Email rebuild results
Ow2003-37018.ppt9/10/2003
Indexing Mechanics
Table-driven primary key and index list Uses table storage size information for tablespace
and other storage clause information Dynamically build Alter table and Create Index
Statements Same logging techniques built-in
– Includes actual SQL statement
Error trapping and continuation built-in
Ow2003-37018.ppt9/10/2003
Error Logging - Continuation
select id, log_date, proc, log_type, message from warehouse_op_log where id between 1576388 and 1576390 order by id;
1576388 08/01/03 02:12:11 CREATE_PK_CONSTRAINTS ICreate Primary Key STATIONHIST.PK_STATIONHIST(CRIBBIN,SITENO)
1576389 08/01/03 02:12:23 CREATE_PK_CONSTRAINTS EORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated (alter table CMPWH1.STATIONHIST ADD CONSTRAINT PK_STATIONHIST PRIMARY KEY (CRIBBIN,SITENO) USING INDEX TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGING)
1576390 08/01/03 02:12:23 CREATE_PK_CONSTRAINTS ICreate Primary Key TASK.PK_TASK(TASKNO,SITENO)
Ow2003-37018.ppt9/10/2003
Error Logging - Propagation
1570901 08/01/03 02:00:02 REBUILD_WAREHOUSE SREBUILDING WAREHOUSE - CMPWH1
1576389 08/01/03 02:12:23 CREATE_PK_CONSTRAINTS EORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated (alter table CMPWH1.STATIONHIST ADD CONSTRAINT PK_STATIONHIST PRIMARY KEY (CRIBBIN,SITENO) USING INDEX TABLESPACE WH128K_INDX STORAGE(INITIAL 128K NEXT 128K) NOLOGGING)
1576418 08/01/03 02:12:49 CREATE_INDEXES EORA-20000: 1 Error Occurred in CREATE_PK_CONSTRAINTS (see log for details). Last was: ORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated
1576471 08/01/03 02:20:31 REBUILD_WAREHOUSE EORA-20000: 1 Error Occurred in CREATE_INDEXES (see log for details). Last was: ORA-20000: 1 Error Occurred in CREATE_PK_CONSTRAINTS (see log for details). Last was: ORA-02437: cannot validate (CMPWH1.PK_STATIONHIST) - primary key violated
1576728 08/01/03 02:41:58 REBUILD_WAREHOUSE CREBUILT WAREHOUSE - CMPWH1 (1 error encountered)
Ow2003-37018.ppt9/10/2003
Warehouse Build Steps
Drop sub-warehouse objects Cycle through snapshots and build / modify
sub-warehouse tables Bulk insert each snapshot table into sub-warehouse Create primary keys and indexes Compute statistics Switch master warehouse synonyms to point at
“new” sub-warehouse (upon successful completion) Email rebuild results
Ow2003-37018.ppt9/10/2003
Warehouse Build Steps
Drop sub-warehouse objects Cycle through snapshots and build / modify
sub-warehouse tables Bulk insert each snapshot table into sub-warehouse Create primary keys and indexes Compute statistics Switch master warehouse synonyms to point at
“new” sub-warehouse (upon successful completion) Email rebuild results
Ow2003-37018.ppt9/10/2003
Positive Email Response
Email sent upon success and failure– Catastrophic failure would result in no email
– Subject clearly states warehouse status
– Table-driven recipient list
Uses dbms_smtp package– Build higher-level packaged functionality
– Used for multiple purposes (such as interface failures)
Email includes statistics regarding– Last snapshot refresh times (system tables)
– Roll-up timing statistics (warehouse operation log)
– Warehouse table row counts (warehouse operation log)
Ow2003-37018.ppt9/10/2003
Status Email Subject and Body
WARNING: Snapshots NOT Healthy, but Warehouse Redirected - CMTWH1
SUCCESS: Production Warehouse Redirected - CMPWH2
The CribMaster Oracle Data Warehouse was successfully rebuild and activated
Last Refresh of Participating Sites:MESA 07/31/03 01:48:17KELLY 07/30/03 23:34:45CECIL 07/30/03 23:14:42. . .Attachment shows statistics
Ow2003-37018.ppt9/10/2003
Email Statistics Attachment
Timing statistics for all major warehouse build steps
8/11/2003 2:00:06 REBUILDING WAREHOUSE - CMPWH2
8/11/2003 2:00:16 Dropping all tables in Warehouse started
8/11/2003 2:00:31 Verify Tables for all Schemas started
8/11/2003 2:02:11 Insert Table Data for all Schemas started
8/11/2003 2:08:02 Create all Primary key constraints started
8/11/2003 2:12:35 Create all indexes started
8/11/2003 2:20:12 Counting Rows in CMPWH2 started
8/11/2003 2:20:13 Analyze all tables in Warehouse started
8/11/2003 2:41:52 Starting with REVOKE-CMPWH2
8/11/2003 2:42:01 Completed REVOKE-CMPWH2
Ow2003-37018.ppt9/10/2003
Email Statistics Attachment
Row count statistics for all warehouse tables
TOTAL TABLE ROWS: 15288750
Row Counts for Significant Tables (rows over 100,000)
8/11/2003 SERIALSTATUSHISTORY 7235395
8/11/2003 TRANS 2945618
8/11/2003 STATIONHISTORY 669932
8/11/2003 STATION 658313
8/11/2003 ALTVENDOR 549365
8/11/2003 ITEMSERIAL 546443
8/11/2003 EMPLOYEE 486319
8/11/2003 INVENTRY 359461
. . .
Ow2003-37018.ppt9/10/2003
Failure Email
ERROR: Production Warehouse Rebuild Failed - CMPWH2(All Snapshots ARE Healthy)There was a problem rebuilding the warehouse. Cutover did NOT occur.Please review the warehouse_op_log for problems, correct them, and establish the correct new warehouse.Statement Executed was (BEGIN CMPWH2.WH_UTIL_MASTER.REBUILD_WAREHOUSE('CMPWH2'); END;)Error was (ORA-20000: 2 Error(s) Occurred in REBUILD_WAREHOUSE (see log for details).Last was: ORA-20000: 6 Error(s) Occurred in INSERT_ALL_DATA (see log for details).Last was: ORA-20000: 105 Error(s) Occurred in INSERT_SCHEMA_DATA (see log for details).Last was: ORA-20000: 1 Error(s) Occurred in INSERT_SITE_TABLE_DATA (see log for details).Last was: ORA-00942: table or view does not exist)
Ow2003-37018.ppt9/10/2003
What to Take Away
Work with COTS Supplier, Avoid Customizations Data needs in a warehouse
– Don’t Over-Analyze, disk is cheap (now) Plan in failure contingency in Build / Rebuild process Table-drive the implementation
– Decisions and Structures change with time Provide Visibility of the Activity
– Ongoing logging and positive results notification Bulk Insert Communications key
– DBA, Developers, COTS Supplier, and Users
Ow2003-37018.ppt9/10/2003
Robert CorfmanTSMS System Architect
Please turn in your evaluation form
Forever New Frontiers