Post on 07-Jan-2016
Table Compression in Oracle9i R2
Plamen Zyumbyulev
INSIDE OUT
„Let someone know”
Agenda
– Overview: Table Compression
– How does it work?
– Test Environment
– Space Savings
– Query Performance
– Conclusion
Table Compression Facts
Table compression is useful
Everyone benefits from space saving
It not only saves space but can increase performance
It can’t be implemented everywhere
Why Table Compression?
Table Compression increases:
– I/O-subsystem capacity
– I/O throughput
– query scan performance (mainly full table scans)
– buffer cache capacity
Table Compression:
– reduces cost of ownership
– is easy to use
– requires minimal table definition changes
– is transparent to applications
Overview: Table Compression
The compression algorithm is based on removing data redundancy.
Tables and materialized views can be compressed.
– Compression can also be specified at the partition level and at the tablespace level.
– Indexes and index-organized tables are not compressed with this method (there are other methods for index and IOT compression).
Compression results depend on the actual data.
DDL and DML commands are supported on compressed tables.
Columns can be neither added to nor dropped from a compressed table.
Which Applications benefit from Table Compression?
Table Compression targets read intensive applications such as Decision Support and OLAP
All schema designs benefit from Compression
How does Table Compression work?
Data is compressed by eliminating duplicate values in a database block
First Name    Last Name
Scott         Smith
Henry         Smith
Henry         Scott
Henry-Scott   McGryen
A dictionary (symbol table) is built per block; the information needed to uncompress the data is available in each block.
If values from the same or from different columns are equal, they share the same symbol table entry. Only entire column values are compressed.
Sequences of columns are compressed as one entity if a sequence of column values occurs multiple times in many rows.
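As a rough illustration of the idea, here is a minimal Python sketch (not Oracle's on-disk format or actual algorithm): repeated column values within one block are stored once in a block-local symbol table, and the rows reference them by index.

```python
# Hypothetical model of per-block symbol-table compression:
# values that repeat within the block go into the symbol table;
# rows then hold references instead of full values.
from collections import Counter

def compress_block(rows):
    """rows: list of tuples of column values belonging to one block."""
    counts = Counter(v for row in rows for v in row)
    # Only worthwhile compression is performed: values that occur
    # more than once earn a symbol table entry.
    symbols = [v for v, n in counts.items() if n > 1]
    index = {v: i for i, v in enumerate(symbols)}
    # Rows keep unique values inline and reference repeated ones.
    compressed = [tuple(index.get(v, v) for v in row) for row in rows]
    return symbols, compressed

symbols, block = compress_block([
    ("Meyer", "11 Homestead Rd", "13.99"),
    ("Meyer", "11 Homestead Rd", "1.99"),
    ("McGryen", "3 Main Street", "1.99"),
])
print(symbols)  # repeated values, stored once per block
print(block)    # rows as mixes of references and inline values
```

Because the symbol table lives in the same block as the rows, each block can be uncompressed on its own, which is what makes the scheme self-contained per block.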
Block Level Compression
Sample table data:

Invoice   CustName   CustAddr          Sales_amt
1233033   Meyer      11 Homestead Rd   13.99
1212300   Meyer      11 Homestead Rd   1.99
1243012   Meyer      11 Homestead Rd   1.99
9923032   McGryen    3 Main Street     1.99
9833023   McGryen    3 Main Street     1.99
2133056   McGryen    3 Main Street     1.99

Non-compressed block: block header, the six full rows, free space.
Compressed block: block header, a symbol table holding the repeated values (Meyer, 11 Homestead Rd, 1.99, McGryen, 3 Main Street), the rows storing only short references plus their unique values (the invoice numbers and 13.99), and more free space.
How Table Compression works
– All columns are considered for compression.
– Only worthwhile compression is performed.
– A symbol table is created within each database block, depending on the block content.
– The self-tuning symbol table is created automatically by the system.
– There is no explicit declaration of symbol table entries.
– The compression algorithm automatically adapts to changes in data distribution.
Which data is compressed
Compression occurs only when data is inserted with a bulk (direct-path) insert operation.
– Direct-path SQL*Loader
– insert /*+ append */ …
– create table … as select …
– alter table move …
A table can transparently consist of both compressed and uncompressed blocks.
Any DML operation can be applied to a table storing compressed blocks; however, conventional DML operations cause records to be stored uncompressed.
SQL Commands
For a new table:
– Create it with the compress attribute in the table definition:
  create table … compress;

For an existing table:
1. Alter the table to add the compress attribute; only new rows are compressed:
   alter table foo compress;
2. Move the table to compress it; both old and new rows are compressed:
   alter table foo move compress;
Process of Compressing a Block
Deletes, Inserts and Updates
Deletes, Inserts and Updates are possible but can cause fragmentation and waste disk space when modifying compressed data.
Large PCTFREE will lead to low compression ratios. Setting PCTFREE to 0 (default) is recommended for all tables storing compressed data.
Updates
When a column is updated the algorithm checks whether a symbol table entry for the new value exists.
– If it exists, the reference of the updated column is modified to the new symbol table entry and its reference count is increased by one. At the same time the reference count of the old value is decreased by one.
– If no symbol table entry exists for the new column value, that value is inserted non-compressed into the row.
update item set i_color = 'green' where i_color = 'blue';

If the old column value ('blue') was also compressed and its reference count drops to zero after the update, its symbol table entry is replaced with a new one, without having to touch all rows of the block.
Some update operations can take advantage of compression
Deletes
During delete operations all references counters of the deleted rows are decreased by one. Once a reference counter becomes zero, the corresponding symbol table entry is purged.
A symbol table is never deleted from a block even if no reference into it exists because the overhead of an empty symbol table is only 4 bytes.
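The reference-counting behavior described above can be sketched as follows (a hypothetical Python model, not Oracle's implementation): each symbol table entry tracks how many rows reference it, and an entry is purged when its counter reaches zero.

```python
# Hypothetical reference-counted symbol table: add_ref on insert or
# update to a value, drop_ref on delete or update away from a value.
class SymbolTable:
    def __init__(self):
        self.refs = {}  # value -> number of rows referencing it

    def add_ref(self, value):
        self.refs[value] = self.refs.get(value, 0) + 1

    def drop_ref(self, value):
        self.refs[value] -= 1
        if self.refs[value] == 0:
            del self.refs[value]  # entry purged once unreferenced

st = SymbolTable()
for color in ("blue", "blue", "green"):
    st.add_ref(color)
st.drop_ref("green")  # last reference gone -> entry purged
print(st.refs)        # only 'blue' remains, with 2 references
```

Note that in Oracle the (now possibly empty) symbol table structure itself stays in the block, since its overhead is only a few bytes; the sketch above models only the per-entry counters.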
Agenda
Overview Table Compression How does it work? Test Environment Space Savings Query Performance Conclusion
Test Environment:
– One very big table: 2.3 TB.
– The table is partitioned by day.
– One partition is around 3.2 GB.
– Once the data is loaded and processed, it becomes read-only.
– Most table access is via full table scans (FTS).
Space Savings
Table Compression significantly reduces disk and buffer cache requirements
Compression results mostly depend on data content on block level
Definitions:

Compression Factor (CF):

    CF = Non-Compressed Blocks / Compressed Blocks

Space Savings (SS):

    SS = (Non-Compressed Blocks − Compressed Blocks) / Non-Compressed Blocks × 100
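A small worked example of the two definitions (Python is used here only for the arithmetic; the block counts are made up for illustration):

```python
# CF and SS from block counts, following the definitions above.
def compression_factor(uncompressed_blocks, compressed_blocks):
    return uncompressed_blocks / compressed_blocks

def space_savings(uncompressed_blocks, compressed_blocks):
    return (uncompressed_blocks - compressed_blocks) / uncompressed_blocks * 100

# e.g. a segment shrinking from 1000 blocks to 300 blocks:
print(compression_factor(1000, 300))  # CF ≈ 3.33
print(space_savings(1000, 300))       # SS = 70.0 (percent)
```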
What affects Compression?

Table Characteristic        High CF   Low CF
Column length               long      short
Number of distinct values   low       high
Block size                  large     small
Sorted data                 yes       no
Column sequences            yes       no
Modified data               no        yes
Estimating CF by using data samples:

create function compression_ratio (tabname varchar2)
return number is
  -- sample percentage
  pct number := 0.000099;
  -- original block count (should be less than 10k)
  blkcnt number := 0;
  -- compressed block count
  blkcntc number;
begin
  execute immediate 'create table TEMP_UNCOMPRESSED pctfree 0 as select * from ' || tabname || ' where rownum < 1';
  while ((pct < 100) and (blkcnt < 1000)) loop
    execute immediate 'truncate table TEMP_UNCOMPRESSED';
    execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' || tabname || ' sample block (' || pct || ',10)';
    execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_UNCOMPRESSED' into blkcnt;
    pct := pct * 10;
  end loop;
  execute immediate 'create table TEMP_COMPRESSED compress as select * from TEMP_UNCOMPRESSED';
  execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_COMPRESSED' into blkcntc;
  execute immediate 'drop table TEMP_COMPRESSED';
  execute immediate 'drop table TEMP_UNCOMPRESSED';
  return (blkcnt/blkcntc);
end;
/
Ordered vs. Not ordered
The biggest CF increase comes from ordering the data
[Bar chart: compression factor (CF), not ordered data vs. ordered data]
How Data volume affects CF
[Chart: compression factor vs. number of days in one partition (1, 3, 5, 7, 9); CF roughly 2.4–3.6]
Ordered Data

Input data (20 values, one column per row, ordered):
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

Compressed with 4 rows per block:
– Block 1: symbol table contains 1 2 3 4; the block holds 16 values.
– Block 2: symbol table contains only 5; the block holds 4 values.
– Not all values fit into the first block.
– CF = 2.5

Compressed with 5 rows per block:
– Block 1: symbol table contains 1 2 3 4 5; the block holds all 20 values.
– Each value is compressed in one block.
– CF = 4

Sorting can also improve the clustering factor of your indexes.
Not Ordered Data

Input data (20 values, one column per row, not ordered):
1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

Compressed with 5 rows per block:
– Block 1: symbol table contains 1 2 3 4 5; the block holds all 20 values.
– CF = 4

Compressed with 4 rows per block:
– Block 1: 1 2 3 4   Block 2: 5 1 2 3   Block 3: 4 5 1 2   Block 4: 3 4 5 1   Block 5: 2 3 4 5
– Each block holds 4 distinct values, so nothing compresses.
– CF = 1
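The two worked examples above can be reproduced with a small simulation. This is a Python sketch of the slides' simplified model, in which a compressed block's symbol table holds at most as many distinct values as the block has row slots; it is not Oracle's actual algorithm.

```python
# Count compressed blocks needed when each block's symbol table can
# hold at most symbols_per_block distinct values; a block is closed
# as soon as a value arrives that would need a new entry in a full
# symbol table.
from math import ceil

def compressed_blocks(values, symbols_per_block):
    blocks, current = 1, set()
    for v in values:
        if v not in current and len(current) == symbols_per_block:
            blocks += 1       # start a new block
            current = set()
        current.add(v)
    return blocks

rows_per_block = 4
ordered = [1]*4 + [2]*4 + [3]*4 + [4]*4 + [5]*4
unordered = [1, 2, 3, 4, 5] * 4
uncompressed = ceil(len(ordered) / rows_per_block)  # 5 blocks of 4 rows
print(uncompressed / compressed_blocks(ordered, rows_per_block))    # CF = 2.5
print(uncompressed / compressed_blocks(unordered, rows_per_block))  # CF = 1.0
```

The simulation matches the slides: ordered input packs runs of equal values into few blocks, while the round-robin input fills every block with distinct values and gains nothing.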
Choosing the columns to order by
Sorting on fields with very low cardinality does not necessarily yield better compression
The optimal columns to sort on seem to be those that have a table/partition-wide cardinality equal to the number of rows per block
Column correlation should be considered
The process is iterative
Know your data
Without a detailed understanding of the data distribution, it is very difficult to predict the optimal order.
Table/partition statistics are useful:
– dba_tables
– dba_tab_partitions
Looking into a particular data block is very helpful:
– substr(rowid, 1, 15)
Improving ordering speed

Set SORT_AREA_SIZE for the session as big as possible. Use a dedicated temporary tablespace with a big extent size (a multiple of SORT_AREA_SIZE + 1 block).

If the sort needs more space:
– The data is split into smaller sort runs; each piece is sorted individually.
– The server process writes pieces to temporary segments on disk; these segments hold intermediate sort run data while the server works on another sort run.
– The sorted pieces are merged to produce the final result.
– If SORT_AREA_SIZE is not large enough to merge all the runs at once, subsets of the runs are merged in a number of merge passes.
How CF affects FTS performance
Queries are executed against compressed schema and non-compressed schema
Overall query speedup 65%
[Bar chart: throughput in thousands of rows per second, non-compressed vs. compressed (CF = 3.34)]
Query Elapsed Time Speedup
The larger the compression factor, the larger the elapsed time speedup.
Query speedup results from the reduction in I/O operations required:
– Speedup depends on the weakness of the I/O subsystem.
– Speedup depends on how sparse the blocks are that the query accesses.
Performance impact on loads and DML
– On a system with unlimited I/O bandwidth, a data load may take twice as long (even more if the data needs to be ordered). Bulk loads are I/O-bound on many systems.
– Deleting compressed data is 10% faster.
– Inserting new data is as fast as inserting into a non-compressed table.
– UPDATE operations are 10–20% slower on average for compressed tables, mainly due to some complex optimizations that have been implemented for uncompressed tables but not yet for compressed tables.
Other Performance Tests
Parallel load performance (CPU)
Delete operation CPU Utilization Update operation CPU Utilization
Delete/Update Performance
FTS Performance
Parallel Full Table Scan CPU Utilization
Parallel Full Table Scan IO Performance
Table Access by ROWID
Best Practices
– Use compression in read-intensive applications.
– Execute bulk loads (direct-path SQL*Loader and parallel insert) to compress rows.
– Compress older data in large data warehouses.
– Integrate table compression into the 'rolling window' paradigm: compress all but the most recent partition.
– Compress materialized views.
– Only compress infrequently updated tables.
Data normalization and Table Compression
“Normalize till it hurts, denormalize till it works”
High normalization may result in a high number of table joins (bad performance)
Both data normalization and table compression reduce redundancy
Conclusion
Table Compression:
– reduces costs by shrinking the database footprint on disk
– is transparent to applications
– often improves query performance due to reduced disk I/O
– increases buffer cache efficiency
Questions & Answers
zyumbyulev@mobiltel.bg