intelligence. innovation. integrity oracle data archiving taming the beast dave moore neon...
TRANSCRIPT
INTELLIGENCE. INNOVATION. INTEGRITY
Oracle Data ArchivingTaming the Beast
Dave Moore Neon Enterprise Software
INTELLIGENCE. INNOVATION. INTEGRITY
Archiving Defined
Requirements and Solutions
Oracle Archiving Strategies
Oracle Row Removal Options
Oracle Post Archive Operations
Agenda
INTELLIGENCE. INNOVATION. INTEGRITY
Dave
Oracle ACE
Using Oracle since 1991
Product Author at Neon Enterprise Software
Creator of OracleUtilities.com
Author of “Oracle Utilities” from Rampant Tech Press
Core competencies include performance, utilities and data management
INTELLIGENCE. INNOVATION. INTEGRITY
Database Archiving
Database Archiving:The process of removing selected data records from operational databases that are not expected to be referencedagain and storing them in an archive data store where they can be retrieved if needed.
Purge
INTELLIGENCE. INNOVATION. INTEGRITY
Trends Impacting Archive Needs
Am
ou
nt
of
Data
Time Required
Complia
nce Pr
otect
ion
0 30+ Yrs
Data Retention Issues:
Volume of data
Length of retention requirement
Varied types of data
Security issues
INTELLIGENCE. INNOVATION. INTEGRITY
Archiving All Types of Data
PaperBlueprintsFormsClaims
WordExcelPDFXML
IMSDB2ORACLESYBASESQL ServerIDMS
VSAMProgramsUNIX Files
OutlookLotus NotesAttachments
SoundPicturesVideo
INTELLIGENCE. INNOVATION. INTEGRITY
Mandatory Retention Period
Data Archiving and ILM
Create DiscardOperational Reference Archive
Needed for completingbusiness transactions
Needed forreportingor expectedqueries
Needed forcompliance and business protection
INTELLIGENCE. INNOVATION. INTEGRITY
What Does It All Mean?
Enterprises must recognize that there is a business value in organizing their information and data.
Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit. (Gartner, 20-April-2007, Research Note G00148170: Cost of E-Discovery Threatens to Skew Justice System)
INTELLIGENCE. INNOVATION. INTEGRITY
Operational Efficiency
Database Archiving can be undertaken to improve operational efficiency
Large volumes of data can interfere with production operations
– efficiency of transactions– efficiency of utilities: BACKUP/RESTORE, REORG,
etc.– Storage
» Gartner: databases copied an average of 6 times!
INTELLIGENCE. INNOVATION. INTEGRITY
What Solutions Are Out There?
Keep Data in Operational Database— Problems with authenticity of large amounts
of data over long retention times
Store Data in UNLOAD files (or backups)— Problems with schema change and reading
archived data; using backups poses even more serious problems
Move Data to a Parallel Reference Database
— Combines problems of the previous two
Move Data to a Database Archive
INTELLIGENCE. INNOVATION. INTEGRITY
Components of aDatabase Archiving Solution
Archive Data Store and Retrieve
Archive DataQuery Access
Archive Administration
Archive Store
Data &Metadata
Production Database
MetadataPoliciesHistory
Recall Database
Captured StructureArchive Policies Data Retention
Data Recall
DataExtract
INTELLIGENCE. INNOVATION. INTEGRITY
Archiving Requirements
Policy based archiving: logical selection
Keep data for very long periods of time
Store very large amounts of data in archive
Maintain Archives for ever changing operational systems
Become independent from Applications/DBMS/Systems
Protect authenticity of data
Access data when needed; as needed
Discard data after retention period automatically
INTELLIGENCE. INNOVATION. INTEGRITY
Policy based archiving
Why : — Business objects are archived, not files
— Rules for when something is ready can be complex
— Data ready to be archived is distributed over database
Implications:— User must provide policies for when something
is to be archived
How:— Full metadata description of data
— Flexible specification of policy : “WHERE clause”
INTELLIGENCE. INNOVATION. INTEGRITY
For Example…
Part Number
Type
Description
Unit Type
Cost
Price
Substitute Parts
Parts Master isthe parent table to all other tables
PARTS MASTER
Part Number
PO Number
Vendor ID
Quantity Ordered
Unit Cost
Date Ordered
Date Received
ORDER INFO
Part Number
Dept. ID
CHIT ID
Qty Disbursed
Date Disbursed
DISBURSEMENT
Part Number
Bin Number
Qty on Hand
Qty on Order
Qty Backorder
STORAGE INFO
Part Number
Year
Q1 Disbursed
Q2 Disbursed
Q3 Disbursed
Q4 Disbursed
SUMMARY BY QUARTER
INTELLIGENCE. INNOVATION. INTEGRITY
Keep Data for a Long Time
Why: retention requirements in decades Implications:
— Archive will outlive applications/DBMS/systems that generated them
— Archive will outlive people who designed and managed operational systems
— Archive will outlive media we store it on How:
— Unique data store— Application/DBMS/system independence— Metadata independence— Continuous management of storage— Continuous management of archive content
INTELLIGENCE. INNOVATION. INTEGRITY
Maintain Archive for Changing Operational Systems
Why : — Metadata changes frequently— Applications are re-engineered periodically
– Change DBMS platform– Change System platform– Replace with new application– Consolidate after merger or acquisition
Implications:— Archive must support multiple variations of an
application— Archive must deal with metadata changes
How:— Manage applications as major archive streams
having multiple minor streams with metadata differences
— Achieve independence from operating environment
INTELLIGENCE. INNOVATION. INTEGRITY
Achieve Metadata Independence
Why : — Operational metadata is inadequate— Operational metadata changes— Operational systems keep only the “current”
metadata— Data in archive often does not mirror data in
operational structures
Implications:— Archive must encapsulate metadata— Metadata must be improved
How:— Metadata Capture, Validate, Enhance capabilities— Store structure that encapsulates with data— Keeps multiple versions of metadata
INTELLIGENCE. INNOVATION. INTEGRITY
Protect Authenticity of Data
Why : — Potential use in lawsuits/ investigations— Potential use in business analysis
Implications:— Protect from unwanted changes— Show original input— Cannot be managed in operational environment
How:— SQL Access that does not support I/U/D— Do not modify archive data on metadata changes— Encryption as stored— Checksum for detection of sabotage— Limit access to functions— Audit use of functions— Maintain offsite backup copies for restore if sabotaged
INTELLIGENCE. INNOVATION. INTEGRITY
Access Data Directly From Archive
Why : — Cannot depend on application environment
Implications:— Full access capability within archive system
How:— Industry standard interface (e.g. JDBC)— LOAD format output for
– For load into a database– May be different from source database
— Requires full and accurate metadata— Ability to review metadata— Ability to function across metadata changes
INTELLIGENCE. INNOVATION. INTEGRITY
Discard Function
Why : — Legal exposure for data kept too long
Implications:— Data cannot be kept in archive beyond retention
period— Must be removed with no exposure to forensic
software
How:— Policy based discard— System level function— Tightly controlled and audited— True “zero out” capability— Discard from backups as well
INTELLIGENCE. INNOVATION. INTEGRITY
Database or Archive?
Performance
Space
Compliance
Keep in DB Keep in Archive
INTELLIGENCE. INNOVATION. INTEGRITY
Based on Data Availability
Must be Available to App
Must be Available
Must Be Secure
Keep in DB Keep in Archive
Not Needed
Purge
INTELLIGENCE. INNOVATION. INTEGRITY
Oracle Archiving Strategies
Designed Up Front (Yeah, right)
Determined by Application Owner
Implemented by ____________
Utilize Oracle Features
INTELLIGENCE. INNOVATION. INTEGRITY
Finding Large Tables
DBA_SEGMENTS (bytes)
DBA_TABLES (num_rows)
or based on I/O
INTELLIGENCE. INNOVATION. INTEGRITY
Rolling Windows
Self Managing
Mostly based on DATE
Utilize DBMS Features — Partitioning
— Transportable Tablespaces– Exchange Partition – Set tablespace read only – Expdp – Copy export file and data file
INTELLIGENCE. INNOVATION. INTEGRITY
Rolling Windows via Partitioning
ProbablyNever
Accessed
Rarely Accessed
Heavily Accessed
Cheap as you can get
Not so fastor expensive
Fast, expensive
Data Profile
Storage Profile
Read / WriteRead OnlyRead Only / Compressed
P1 … … P47
INTELLIGENCE. INNOVATION. INTEGRITY
Why not use transportable tablespacesor Oracle exports for data retention?
INTELLIGENCE. INNOVATION. INTEGRITY
The Problem with Oracle Files
Transportable Tablespaces
Exports
Backups
Export Files &Datafiles Version 16Z
Oracle
Year 2030
ImportTrans Tsp
Year 2007
Not a good method for LT Data Retention
INTELLIGENCE. INNOVATION. INTEGRITY
Partitioning (Old ways)
Range Partitioning
Data is distributed based on partition key range of values – usually a date.
Good When: Data is date-based.
INTELLIGENCE. INNOVATION. INTEGRITY
Partitioning (Old Ways)
Hash Partitioning
Uses hash algorithm to create equally sized buckets of data.
Good When: No natural partition key and desire I/O balancing (hot spots).
INTELLIGENCE. INNOVATION. INTEGRITY
Partitioning (Old Ways)
List Partitioning
Data is distributed based on LIST of values in partition key.
Good When: Have short list of values (States, Regions, Account Types)
INTELLIGENCE. INNOVATION. INTEGRITY
Partitioning (New Ways – 11G)
Interval Partitioning
Initial Partition is created manually, the rest are automatically created as new data arrives.
Good When: Need a rolling window!
INTELLIGENCE. INNOVATION. INTEGRITY
Partitioning (New Ways – 11G)
REF partitioning
Related Tables benefit from same partitioning strategy, whether column exists in children or not!
Good When: Desire related data to be partitioned in the same manner.
INTELLIGENCE. INNOVATION. INTEGRITY
Partitioning (New Ways – 11G)
Virtual Column Partitioning
Partition key may be based on virtual column
Good When: Virtual column is required for partition key.
INTELLIGENCE. INNOVATION. INTEGRITY
Row Removal Options
SQL DELETE
CTAS / DROP / RENAME
TRUNCATE
Row Marking
INTELLIGENCE. INNOVATION. INTEGRITY
SQL DELETE
Good for small number of rows
RI handled automatically
Oracle was born to DELETE, better than any PL/SQL that you write.
Issue with Un-indexed Foreign Keys
INTELLIGENCE. INNOVATION. INTEGRITY
DELETE Optimization
Work in batches, committing (only when programmatically DELETING)
Use parallel DML (Partitioned tables only)
Drop Indexes before (if possible)
Index FK columns
INTELLIGENCE. INNOVATION. INTEGRITY
CTAS
Works well for PURGE, not archive Perfect when you want to keep low
percentage of rows in the table Doesn’t handle RI – no DELETE was issued. Process
1. Create table with rows you want to keep
2. Drop old table
3. Rename table
4. Recreate indexes
create table new_table unrecoverable as select * from old_table where ...
INTELLIGENCE. INNOVATION. INTEGRITY
TRUNCATE
Congratulations if your application lends itself to TRUNCATE without losing new data
What about RI?
May truncate or drop individual partitions
INTELLIGENCE. INNOVATION. INTEGRITY
DROP
DROP PARTITION
What would you do before you drop it?
Exchange partition with table
Transportable tablespace.
INTELLIGENCE. INNOVATION. INTEGRITY
• Benchmark the best way for you • Benchmark against real data if possible• Use parallel DML
Things to Remember
INTELLIGENCE. INNOVATION. INTEGRITY
Design Summary
Create an architecture that lends itself to aging, archiving, deleting
This architecture should compensate for business requirements
— For instance, customer orders not accessible after 6 months … or
— top query performance needed for all ‘ACTIVE’ accounts … etc
Implement it – THE EASY PART
INTELLIGENCE. INNOVATION. INTEGRITY
Post Archive Challenges
“I have successfully deleted 10 billion rows from the table. HoooAhhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars”
INTELLIGENCE. INNOVATION. INTEGRITY
Post Archive Challenges
Hmmmmm. It looks like …
- Queries are not any faster . . . - The Select count(*) took the same amount of time . . . - Space was not freed in Oracle (DBA_FREE_SPACE) . . . - Space was not freed in the operating system . . .
WHY NOT ????? Where are the benefits ???
INTELLIGENCE. INNOVATION. INTEGRITY
Post Archive Challenges
Statistics are not fresh
High Water Marks are very high
Space has not been freed within Oracle (if that’s what you want)
Space has not been freed to the OS
INTELLIGENCE. INNOVATION. INTEGRITY
Refresh Statistics
Help the optimizer, easy enough
dbms_stats provides many options
INTELLIGENCE. INNOVATION. INTEGRITY
Automatic Stats
Recommended by Oracle Calls DBMS_STATS_JOB_PROC Enabled via:
Begin
dbms_auto_task_admin.enable(
client_name => ‘auto optimizer stats collection’,
operation => NULL,
window_name => NULL);
END;
/
INTELLIGENCE. INNOVATION. INTEGRITY
When do you go manual ?
High transaction DELETEs or TRUNCATEs
Bulk loads which add more than 10% of table size
So there’s our answer – go manual.
INTELLIGENCE. INNOVATION. INTEGRITY
How do we Gather Them?
NOT the Analyze Command
Instead DBMS_STATS package
exec dbms_stats.gather_table_stats(ownname => 'BDB', tabname => 'MASTER', estimate_percent =>
dbms_stats.auto_sample_size);
INTELLIGENCE. INNOVATION. INTEGRITY
Reset High Water Mark (HWM)
DROP or TRUNCATE
Multiple OTHER ways to do this depending on version
In v9 … alter table move tablespace [tsp name]; — Row movement must be enabled
— Tablespace must be a LMT
— Can move into same tablespace
— Will occupy 2X space temporarily
— Must then rebuild indexes
In v10 … alter table <table_name> shrink space;
INTELLIGENCE. INNOVATION. INTEGRITY
Create table, check space
SQL> create table space_example as select * from dba_source;
Table created.
SQL> select count(*) from space_example;
COUNT(*)---------- 296463
SQL> exec dbms_space.unused_space(‘DAVE', 'SPACE_EXAMPLE');
Total blocks: 6328Unused blocks: 1Unused bytes: 8192Last Used Block: 55Last Used Block ID: 10377Last Used Ext File ID: 4
INTELLIGENCE. INNOVATION. INTEGRITY
Check datafile space
Size Current Poss.FILE_NAME Poss. Size Savings-------------------------------------------------- -------- -------- --------/export/home/ora102/oradata/ora102/qasb001.dbf 29 46 17/export/home/ora102/oradata/ora102/example01.dbf 69 100 31/export/home/ora102/oradata/ora102/qasb002.dbf 41 41 0/export/home/ora102/oradata/ora102/system01.dbf 493 500 7/export/home/ora102/oradata/ora102/sysaux01.dbf 430 430 0/export/home/ora102/oradata/ora102/undotbs01.dbf 91 175 84/export/home/ora102/oradata/ora102/users01.dbf 44 83 39/export/home/ora102/oradata/ora102/test.dbf 51 70 19
INTELLIGENCE. INNOVATION. INTEGRITY
Delete rows, check space
SQL> delete from space_example;
296463 rows deleted.
SQL> commit;
SQL> exec dbms_space.unused_space(‘DAVE', 'SPACE_EXAMPLE');
Total blocks: 6328Unused blocks: 1Unused bytes: 8192Last Used Block: 55Last Used Block ID: 10377Last Used Ext File ID: 4
Nothing Changed !
INTELLIGENCE. INNOVATION. INTEGRITY
Shrink it, check space
SQL> alter table space_example enable row movement;
SQL> alter table space_example shrink space;
SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE');
Total blocks: 8Unused blocks: 4Unused bytes: 32768Last Used Block: 4Last Used Block ID: 5129Last Used Ext File ID: 4
Space Freed From Table, but still in Oracle
INTELLIGENCE. INNOVATION. INTEGRITY
Check space again
Size Current Poss.FILE_NAME Poss. Size Savings-------------------------------------------------- -------- -------- --------/export/home/ora102/oradata/ora102/qasb001.dbf 29 46 17/export/home/ora102/oradata/ora102/example01.dbf 69 100 31/export/home/ora102/oradata/ora102/qasb002.dbf 41 41 0/export/home/ora102/oradata/ora102/system01.dbf 493 500 7/export/home/ora102/oradata/ora102/sysaux01.dbf 430 430 0/export/home/ora102/oradata/ora102/undotbs01.dbf 171 175 4/export/home/ora102/oradata/ora102/users01.dbf 44 83 39/export/home/ora102/oradata/ora102/test.dbf 1 70 69
This datafile should be resized to save 69 MB
SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m;
INTELLIGENCE. INNOVATION. INTEGRITY
Free the Space
Space is still reserved for future inserts and updates, just not freed back to the OS
Space will not be automatically freed – confirm by checking DBA_FREE_SPACE
Ways to set it free— drop
— truncate
— alter table move …
— alter table shrink space …
INTELLIGENCE. INNOVATION. INTEGRITY
Unindexed Foreign Keys Example
COL1
1 Million Rows
PARENT
COL1
COL1_PARENT
CHILD
1 Million Rows
SQL> DELETE FROM PARENT WHERE COL1 < 1000;
ON DELETE CASCADE
Fky.sql
INTELLIGENCE. INNOVATION. INTEGRITY
Before Index
delete from parentwhere col1 < 1000
call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.01 0.08 2 27 0 0Execute 1 0.90 0.80 4 2208799 6062 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 2 0.91 0.88 6 2208826 6062 999
delete from "DAVE"."CHILD"where "COL1_PARENT" = :1
call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 999 285.94 293.11 1543900 2208789 1029 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 1000 285.94 293.11 1543900 2208789 1029 999
INTELLIGENCE. INNOVATION. INTEGRITY
delete from parentwhere col1 < 1000
call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 1 0.53 0.47 7 13 7053 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 2 0.53 0.47 7 13 7053 999
delete from "DAVE"."CHILD"where "COL1_PARENT" = :1
call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 999 0.42 0.46 2 3002 4058 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 1000 0.42 0.46 2 3002 4058 999
SQL> create index prnt_ndx on child(col1_parent);
INTELLIGENCE. INNOVATION. INTEGRITY
Unindexed Foreign Keys
Problem is not limited to DELETE statements
Search database for unindexed FK columns
Script is on asktom — Search for unindex.sql
INTELLIGENCE. INNOVATION. INTEGRITY
Summary Points
Create sound Archiving strategy based on Oracle technical features as well as business and/or legal requirements
Leverage partitioning Move partitions to cheap disk when
appropriate Make partitions ‘read only’ and
compressed Remove data via DROP or TRUNCATE if
possible If SQL DELETE, make sure to perform
maintenance operations Consider 3rd party solutions