intelligence. innovation. integrity oracle data archiving taming the beast dave moore neon...

70
INTELLIGENCE. INNOVATION. INTEGRITY Oracle Data Archiving Taming the Beast Dave Moore Neon Enterprise Software

Upload: hortense-cooper

Post on 17-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

INTELLIGENCE. INNOVATION. INTEGRITY

Oracle Data ArchivingTaming the Beast

Dave Moore Neon Enterprise Software

INTELLIGENCE. INNOVATION. INTEGRITY

Archiving Defined

Requirements and Solutions

Oracle Archiving Strategies

Oracle Row Removal Options

Oracle Post Archive Operations

Agenda

INTELLIGENCE. INNOVATION. INTEGRITY

Dave

Oracle ACE

Using Oracle since 1991

Product Author at Neon Enterprise Software

Creator of OracleUtilities.com

Author of “Oracle Utilities” from Rampant Tech Press

Core competencies include performance, utilities and data management

INTELLIGENCE. INNOVATION. INTEGRITY

Database Archiving

Database Archiving:The process of removing selected data records from operational databases that are not expected to be referencedagain and storing them in an archive data store where they can be retrieved if needed.

Purge

INTELLIGENCE. INNOVATION. INTEGRITY

Trends Impacting Archive Needs

Am

ou

nt

of

Data

Time Required

Complia

nce Pr

otect

ion

0 30+ Yrs

Data Retention Issues:

Volume of data

Length of retention requirement

Varied types of data

Security issues

INTELLIGENCE. INNOVATION. INTEGRITY

Archiving All Types of Data

PaperBlueprintsFormsClaims

WordExcelPDFXML

IMSDB2ORACLESYBASESQL ServerIDMS

VSAMProgramsUNIX Files

OutlookLotus NotesAttachments

SoundPicturesVideo

INTELLIGENCE. INNOVATION. INTEGRITY

Mandatory Retention Period

Data Archiving and ILM

Create DiscardOperational Reference Archive

Needed for completingbusiness transactions

Needed forreportingor expectedqueries

Needed forcompliance and business protection

INTELLIGENCE. INNOVATION. INTEGRITY

Some Sample Regulations Impacting Data Retention

INTELLIGENCE. INNOVATION. INTEGRITY

What Does It All Mean?

Enterprises must recognize that there is a business value in organizing their information and data.

Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit. (Gartner, 20-April-2007, Research Note G00148170: Cost of E-Discovery Threatens to Skew Justice System)

INTELLIGENCE. INNOVATION. INTEGRITY

Operational Efficiency

Database Archiving can be undertaken to improve operational efficiency

Large volumes of data can interfere with production operations

– efficiency of transactions– efficiency of utilities: BACKUP/RESTORE, REORG,

etc.– Storage

» Gartner: databases copied an average of 6 times!

INTELLIGENCE. INNOVATION. INTEGRITY

What Solutions Are Out There?

Keep Data in Operational Database— Problems with authenticity of large amounts

of data over long retention times

Store Data in UNLOAD files (or backups)— Problems with schema change and reading

archived data; using backups poses even more serious problems

Move Data to a Parallel Reference Database

— Combines problems of the previous two

Move Data to a Database Archive

INTELLIGENCE. INNOVATION. INTEGRITY

Components of aDatabase Archiving Solution

Archive Data Store and Retrieve

Archive DataQuery Access

Archive Administration

Archive Store

Data &Metadata

Production Database

MetadataPoliciesHistory

Recall Database

Captured StructureArchive Policies Data Retention

Data Recall

DataExtract

INTELLIGENCE. INNOVATION. INTEGRITY

Archiving Requirements

Policy based archiving: logical selection

Keep data for very long periods of time

Store very large amounts of data in archive

Maintain Archives for ever changing operational systems

Become independent from Applications/DBMS/Systems

Protect authenticity of data

Access data when needed; as needed

Discard data after retention period automatically

INTELLIGENCE. INNOVATION. INTEGRITY

Policy based archiving

Why : — Business objects are archived, not files

— Rules for when something is ready can be complex

— Data ready to be archived is distributed over database

Implications:— User must provide policies for when something

is to be archived

How:— Full metadata description of data

— Flexible specification of policy : “WHERE clause”

INTELLIGENCE. INNOVATION. INTEGRITY

For Example…

Part Number

Type

Description

Unit Type

Cost

Price

Substitute Parts

Parts Master isthe parent table to all other tables

PARTS MASTER

Part Number

PO Number

Vendor ID

Quantity Ordered

Unit Cost

Date Ordered

Date Received

ORDER INFO

Part Number

Dept. ID

CHIT ID

Qty Disbursed

Date Disbursed

DISBURSEMENT

Part Number

Bin Number

Qty on Hand

Qty on Order

Qty Backorder

STORAGE INFO

Part Number

Year

Q1 Disbursed

Q2 Disbursed

Q3 Disbursed

Q4 Disbursed

SUMMARY BY QUARTER

INTELLIGENCE. INNOVATION. INTEGRITY

Keep Data for a Long Time

Why: retention requirements in decades Implications:

— Archive will outlive applications/DBMS/systems that generated them

— Archive will outlive people who designed and managed operational systems

— Archive will outlive media we store it on How:

— Unique data store— Application/DBMS/system independence— Metadata independence— Continuous management of storage— Continuous management of archive content

INTELLIGENCE. INNOVATION. INTEGRITY

Maintain Archive for Changing Operational Systems

Why : — Metadata changes frequently— Applications are re-engineered periodically

– Change DBMS platform– Change System platform– Replace with new application– Consolidate after merger or acquisition

Implications:— Archive must support multiple variations of an

application— Archive must deal with metadata changes

How:— Manage applications as major archive streams

having multiple minor streams with metadata differences

— Achieve independence from operating environment

INTELLIGENCE. INNOVATION. INTEGRITY

Achieve Metadata Independence

Why : — Operational metadata is inadequate— Operational metadata changes— Operational systems keep only the “current”

metadata— Data in archive often does not mirror data in

operational structures

Implications:— Archive must encapsulate metadata— Metadata must be improved

How:— Metadata Capture, Validate, Enhance capabilities— Store structure that encapsulates with data— Keeps multiple versions of metadata

INTELLIGENCE. INNOVATION. INTEGRITY

Protect Authenticity of Data

Why : — Potential use in lawsuits/ investigations— Potential use in business analysis

Implications:— Protect from unwanted changes— Show original input— Cannot be managed in operational environment

How:— SQL Access that does not support I/U/D— Do not modify archive data on metadata changes— Encryption as stored— Checksum for detection of sabotage— Limit access to functions— Audit use of functions— Maintain offsite backup copies for restore if sabotaged

INTELLIGENCE. INNOVATION. INTEGRITY

Access Data Directly From Archive

Why : — Cannot depend on application environment

Implications:— Full access capability within archive system

How:— Industry standard interface (e.g. JDBC)— LOAD format output for

– For load into a database– May be different from source database

— Requires full and accurate metadata— Ability to review metadata— Ability to function across metadata changes

INTELLIGENCE. INNOVATION. INTEGRITY

Discard Function

Why : — Legal exposure for data kept too long

Implications:— Data cannot be kept in archive beyond retention

period— Must be removed with no exposure to forensic

software

How:— Policy based discard— System level function— Tightly controlled and audited— True “zero out” capability— Discard from backups as well

INTELLIGENCE. INNOVATION. INTEGRITY

Database or Archive?

Performance

Space

Compliance

Keep in DB Keep in Archive

INTELLIGENCE. INNOVATION. INTEGRITY

Based on Data Availability

Must be Available to App

Must be Available

Must Be Secure

Keep in DB Keep in Archive

Not Needed

Purge

INTELLIGENCE. INNOVATION. INTEGRITY

Oracle Archiving Strategies

Designed Up Front (Yeah, right)

Determined by Application Owner

Implemented by ____________

Utilize Oracle Features

INTELLIGENCE. INNOVATION. INTEGRITY

Finding Large Tables

DBA_SEGMENTS (bytes)

DBA_TABLES (num_rows)

or based on I/O

INTELLIGENCE. INNOVATION. INTEGRITY

Rolling Windows

Self Managing

Mostly based on DATE

Utilize DBMS Features — Partitioning

— Transportable Tablespaces– Exchange Partition – Set tablespace read only – Expdp – Copy export file and data file

INTELLIGENCE. INNOVATION. INTEGRITY

Rolling Windows via Partitioning

ProbablyNever

Accessed

Rarely Accessed

Heavily Accessed

Cheap as you can get

Not so fastor expensive

Fast, expensive

Data Profile

Storage Profile

Read / WriteRead OnlyRead Only / Compressed

P1 … … P47

INTELLIGENCE. INNOVATION. INTEGRITY

Why not use transportable tablespacesor Oracle exports for data retention?

INTELLIGENCE. INNOVATION. INTEGRITY

The Problem with Oracle Files

Transportable Tablespaces

Exports

Backups

Export Files &Datafiles Version 16Z

Oracle

Year 2030

ImportTrans Tsp

Year 2007

Not a good method for LT Data Retention

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old ways)

Range Partitioning

Data is distributed based on partition key range of values – usually a date.

Good When: Data is date-based.

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old Ways)

Hash Partitioning

Uses hash algorithm to create equally sized buckets of data.

Good When: No natural partition key and desire I/O balancing (hot spots).

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old Ways)

List Partitioning

Data is distributed based on LIST of values in partition key.

Good When: Have short list of values (States, Regions, Account Types)

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways – 11G)

Interval Partitioning

Initial Partition is created manually, the rest are automatically created as new data arrives.

Good When: Need a rolling window!

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways – 11G)

REF partitioning

Related Tables benefit from same partitioning strategy, whether column exists in children or not!

Good When: Desire related data to be partitioned in the same manner.

INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways – 11G)

Virtual Column Partitioning

Partition key may be based on virtual column

Good When: Virtual column is required for partition key.

INTELLIGENCE. INNOVATION. INTEGRITY

Rows Gotta Go

INTELLIGENCE. INNOVATION. INTEGRITY

Row Removal Options

SQL DELETE

CTAS / DROP / RENAME

TRUNCATE

Row Marking

INTELLIGENCE. INNOVATION. INTEGRITY

SQL DELETE

Good for small number of rows

RI handled automatically

Oracle was born to DELETE, better than any PL/SQL that you write.

Issue with Un-indexed Foreign Keys

INTELLIGENCE. INNOVATION. INTEGRITY

DELETE Optimization

Work in batches, committing (only when programmatically DELETING)

Use parallel DML (Partitioned tables only)

Drop Indexes before (if possible)

Index FK columns

INTELLIGENCE. INNOVATION. INTEGRITY

CTAS

Works well for PURGE, not archive Perfect when you want to keep low

percentage of rows in the table Doesn’t handle RI – no DELETE was issued. Process

1. Create table with rows you want to keep

2. Drop old table

3. Rename table

4. Recreate indexes

create table new_table unrecoverable as select * from old_table where ...

INTELLIGENCE. INNOVATION. INTEGRITY

TRUNCATE

Congratulations if your application lends itself to TRUNCATE without losing new data

What about RI?

May truncate or drop individual partitions

INTELLIGENCE. INNOVATION. INTEGRITY

DROP

DROP PARTITION

What would you do before you drop it?

Exchange partition with table

Transportable tablespace.

INTELLIGENCE. INNOVATION. INTEGRITY

• Benchmark the best way for you • Benchmark against real data if possible• Use parallel DML

Things to Remember

INTELLIGENCE. INNOVATION. INTEGRITY

Design Summary

Create an architecture that lends itself to aging, archiving, deleting

This architecture should compensate for business requirements

— For instance, customer orders not accessible after 6 months … or

— top query performance needed for all ‘ACTIVE’ accounts … etc

Implement it – THE EASY PART

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

“I have successfully deleted 10 billion rows from the table. HoooAhhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars”

INTELLIGENCE. INNOVATION. INTEGRITY

… 2 Days Later …

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

Hmmmmm. It looks like …

- Queries are not any faster . . . - The Select count(*) took the same amount of time . . . - Space was not freed in Oracle (DBA_FREE_SPACE) . . . - Space was not freed in the operating system . . .

WHY NOT ????? Where are the benefits ???

INTELLIGENCE. INNOVATION. INTEGRITY

From Swiss to Provolone

After DELETE After Maintenance

INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges

Statistics are not fresh

High Water Marks are very high

Space has not been freed within Oracle (if that’s what you want)

Space has not been freed to the OS

INTELLIGENCE. INNOVATION. INTEGRITY

Refresh Statistics

Help the optimizer, easy enough

dbms_stats provides many options

INTELLIGENCE. INNOVATION. INTEGRITY

Automatic Stats

Recommended by Oracle Calls DBMS_STATS_JOB_PROC Enabled via:

Begin

dbms_auto_task_admin.enable(

client_name => ‘auto optimizer stats collection’,

operation => NULL,

window_name => NULL);

END;

/

INTELLIGENCE. INNOVATION. INTEGRITY

When do you go manual ?

High transaction DELETEs or TRUNCATEs

Bulk loads which add more than 10% of table size

So there’s our answer – go manual.

INTELLIGENCE. INNOVATION. INTEGRITY

How do we Gather Them?

NOT the Analyze Command

Instead DBMS_STATS package

exec dbms_stats.gather_table_stats(ownname => 'BDB', tabname => 'MASTER', estimate_percent =>

dbms_stats.auto_sample_size);

INTELLIGENCE. INNOVATION. INTEGRITY

High Water Mark

INTELLIGENCE. INNOVATION. INTEGRITY

High Water Mark

INTELLIGENCE. INNOVATION. INTEGRITY

Reset High Water Mark (HWM)

DROP or TRUNCATE

Multiple OTHER ways to do this depending on version

In v9 … alter table move tablespace [tsp name]; — Row movement must be enabled

— Tablespace must be a LMT

— Can move into same tablespace

— Will occupy 2X space temporarily

— Must then rebuild indexes

In v10 … alter table <table_name> shrink space;

INTELLIGENCE. INNOVATION. INTEGRITY

Freeing Allocated Space

INTELLIGENCE. INNOVATION. INTEGRITY

Create table, check space

SQL> create table space_example as select * from dba_source;

Table created.

SQL> select count(*) from space_example;

COUNT(*)---------- 296463

SQL> exec dbms_space.unused_space(‘DAVE', 'SPACE_EXAMPLE');

Total blocks: 6328Unused blocks: 1Unused bytes: 8192Last Used Block: 55Last Used Block ID: 10377Last Used Ext File ID: 4

INTELLIGENCE. INNOVATION. INTEGRITY

Check datafile space

Size Current Poss.FILE_NAME Poss. Size Savings-------------------------------------------------- -------- -------- --------/export/home/ora102/oradata/ora102/qasb001.dbf 29 46 17/export/home/ora102/oradata/ora102/example01.dbf 69 100 31/export/home/ora102/oradata/ora102/qasb002.dbf 41 41 0/export/home/ora102/oradata/ora102/system01.dbf 493 500 7/export/home/ora102/oradata/ora102/sysaux01.dbf 430 430 0/export/home/ora102/oradata/ora102/undotbs01.dbf 91 175 84/export/home/ora102/oradata/ora102/users01.dbf 44 83 39/export/home/ora102/oradata/ora102/test.dbf 51 70 19

INTELLIGENCE. INNOVATION. INTEGRITY

Delete rows, check space

SQL> delete from space_example;

296463 rows deleted.

SQL> commit;

SQL> exec dbms_space.unused_space(‘DAVE', 'SPACE_EXAMPLE');

Total blocks: 6328Unused blocks: 1Unused bytes: 8192Last Used Block: 55Last Used Block ID: 10377Last Used Ext File ID: 4

Nothing Changed !

INTELLIGENCE. INNOVATION. INTEGRITY

Shrink it, check space

SQL> alter table space_example enable row movement;

SQL> alter table space_example shrink space;

SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE');

Total blocks: 8Unused blocks: 4Unused bytes: 32768Last Used Block: 4Last Used Block ID: 5129Last Used Ext File ID: 4

Space Freed From Table, but still in Oracle

INTELLIGENCE. INNOVATION. INTEGRITY

Check space again

Size Current Poss.FILE_NAME Poss. Size Savings-------------------------------------------------- -------- -------- --------/export/home/ora102/oradata/ora102/qasb001.dbf 29 46 17/export/home/ora102/oradata/ora102/example01.dbf 69 100 31/export/home/ora102/oradata/ora102/qasb002.dbf 41 41 0/export/home/ora102/oradata/ora102/system01.dbf 493 500 7/export/home/ora102/oradata/ora102/sysaux01.dbf 430 430 0/export/home/ora102/oradata/ora102/undotbs01.dbf 171 175 4/export/home/ora102/oradata/ora102/users01.dbf 44 83 39/export/home/ora102/oradata/ora102/test.dbf 1 70 69

This datafile should be resized to save 69 MB

SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m;

INTELLIGENCE. INNOVATION. INTEGRITY

Free the Space

Space is still reserved for future inserts and updates, just not freed back to the OS

Space will not be automatically freed – confirm by checking DBA_FREE_SPACE

Ways to set it free— drop

— truncate

— alter table move …

— alter table shrink space …

INTELLIGENCE. INNOVATION. INTEGRITY

Unindexed Foreign Keys Example

COL1

1 Million Rows

PARENT

COL1

COL1_PARENT

CHILD

1 Million Rows

SQL> DELETE FROM PARENT WHERE COL1 < 1000;

ON DELETE CASCADE

Fky.sql

INTELLIGENCE. INNOVATION. INTEGRITY

Before Index

delete from parentwhere col1 < 1000

call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.01 0.08 2 27 0 0Execute 1 0.90 0.80 4 2208799 6062 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 2 0.91 0.88 6 2208826 6062 999

delete from "DAVE"."CHILD"where "COL1_PARENT" = :1

call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 999 285.94 293.11 1543900 2208789 1029 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 1000 285.94 293.11 1543900 2208789 1029 999

INTELLIGENCE. INNOVATION. INTEGRITY

delete from parentwhere col1 < 1000

call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 1 0.53 0.47 7 13 7053 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 2 0.53 0.47 7 13 7053 999

delete from "DAVE"."CHILD"where "COL1_PARENT" = :1

call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 999 0.42 0.46 2 3002 4058 999Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 1000 0.42 0.46 2 3002 4058 999

SQL> create index prnt_ndx on child(col1_parent);

INTELLIGENCE. INNOVATION. INTEGRITY

Unindexed Foreign Keys

Problem is not limited to DELETE statements

Search database for unindexed FK columns

Script is on asktom — Search for unindex.sql

INTELLIGENCE. INNOVATION. INTEGRITY

Summary Points

Create sound Archiving strategy based on Oracle technical features as well as business and/or legal requirements

Leverage partitioning Move partitions to cheap disk when

appropriate Make partitions ‘read only’ and

compressed Remove data via DROP or TRUNCATE if

possible If SQL DELETE, make sure to perform

maintenance operations Consider 3rd party solutions

INTELLIGENCE. INNOVATION. INTEGRITY

Questions?

“Well done is better than well said” Ben Franklin