reorg,dms,hwm,log space consumption
TRANSCRIPT
Bill Minor - IBM Toronto Lab
TLU- 1243A Data Servers - DB2 for Linux, UNIX, Windows
DB2 Real Estate – Buy, Invest, Sell, … Reorg?!
2
Highlights
• The cost of disk storage represents a significant portion of the overall expense associated with large database systems. Once purchased, managing that storage can add significantly to the total cost of ownership. Effective management and utilization of disk space is instrumental in keeping your database Real Estate costs in check.
• The goals of this presentation are to:
  • Provide intimate details of the reorg utility
  • Provide an overview of Data Management in DB2
  • Highlight customer usage scenarios including best practices, monitoring, tuning, autonomics and troubleshooting
  • Illustrate the role of reorganization in new Viper features such as Table Partitioning, Data Row Compression, and Large RIDs
3
Agenda
• ‘DB2 Real Estate’
• Overview of Reorganization
• Table Compression
• Page and Extent Size Selection
• DMS Tablespace Architecture
• Registry Variables
• High Water Mark
• Large Record Identifiers (RIDs)
• Log Space Consumption
4
A Confession!
• I am not a Realtor, Financial Analyst, Investment Advisor, Stock Trader, Card Counter, Poker Tour Champ, …
• One does not have to be an expert to realize that investing in real estate is a significant proposition
• By analogy … your DB2 ‘storage’ is a critical and valuable investment.
• Just as there are many facets/intricacies/strategies when dealing with Real Estate, so too with your management of DB2 Storage.
5
The Costs of ‘Information Real Estate’
• Hardware, Software, Licensing, Support costs
• Poorly optimized and utilized database
• Database Administration/Management: People costs
• Infrastructure costs: floor space, power, cooling
• Frustration
When it comes to storage, the total cost of ownership (TCO) is estimated at $5 for every $1 spent on physical storage.
6
DB2 ‘Real Estate’
• What?
  • Storage objects, DB2 tables
• Table types:
  • ‘regular’
  • Multidimensional Clustered (MDC)
  • Range Partitioned (RPT)
  • Range Clustered Tables (RCT)
  • Database Partitioned
• Also relevant – tablespace type and characteristics
  • SMS vs DMS
  • REGULAR vs. LARGE vs. TEMPORARY
• There are many facets to managing DB2 Real Estate – I am going to focus on storage, with an emphasis on table reorg (reorganization)
7
DB2 Reorganization
• Many changes to table data (INSERTs/UPDATEs/DELETEs) can affect the physical organization of table and index data to the point where performance is adversely affected
• Goals of REORG:
  • Defragment or compact data onto fewer data pages
  • Physically recluster data into the same logical sequence as an index
  • Eliminate pointer-overflow records
  • DB2 9: build a (new) compression dictionary and compress the rows in the table using that dictionary
  • Conversion to Large RIDs
  • Schema changes
• The result:
  • Access to a reorganized object can be done with minimal I/O and bufferpool misses, as well as with maximum prefetcher effectiveness, i.e. query performance is maintained or improved
8
Access Modes of Table REORG
• 'Offline' ==> "Classic Reorg" (as pertains to Tables)
  • ALLOW READ ACCESS (the default)
  • ALLOW NO ACCESS (truly 'offline')
• 'Online' ==> "Inplace Reorg" (not to be confused with 'in-tablespace reorg' as pertains to classic table reorg)
  • ALLOW WRITE ACCESS (the default)
  • ALLOW READ ACCESS
ONLINE: Table available for full S/I/U/D access during reorg
OFFLINE: Table available for read only access during reorg up to copy phase
9
Table Reorganization Command (CLP Syntax)
REORG {TABLE table-name Table-Clause} [On-DbPartitionNum-Clause]
Table-Clause:
[INDEX index-name] [[ALLOW {READ | NO} ACCESS]
[USE tablespace-name] [INDEXSCAN] [LONGLOBDATA [USE long-tablespace-name]]
[KEEPDICTIONARY | RESETDICTIONARY]] |
[INPLACE [ [ALLOW {WRITE | READ} ACCESS] [NOTRUNCATE TABLE]
[START | RESUME] | {STOP | PAUSE} ]]
Examples:
db2 reorg table staff index inx1_staff inplace allow write access
db2 reorg table emp inplace pause on dbpartitionnum(10 to 100)
db2 reorg table emp_resume longlobdata
db2 reorg table department resetdictionary
db2 reorg table payroll index pr1 use tempspace1
10
An Overview of Classic ('Offline') Table Reorg Processing
• Shadow copy approach
  • Tablespace used to hold shadow copy is governed by user (USE clause)
    • For DMS tablespaces, implication to the ‘High Water Mark’ (more to come)
  • TEMP tablespace is required and it varies (next slide)
• Phases: Dictionary Build, Sort, Build, Replace (or Copy), Index Rebuild
  • Dictionary Build: there is an additional scan of the table data if INDEXSCAN specified
  • Index build/rebuild is now parallelized in Viper II (no need to set INTRA_PARALLEL cfg)
• Processing Modes:
  • Reclustering via table scan sort (default) or index scan (via INDEXSCAN clause)
  • Space reclamation (compaction) via table scan
• LONG/LOB data is not reorged by default
  • When reorged, XML data is not "reorged", only empty pages are removed
11
‘Offline’ Table REORG - TEMP Space Usage
• Recall the phases of table reorg:
  • Dictionary Build, Sort, Build, Replace (or Copy), Index Rebuild
• Three of these phases can consume TEMP tablespace
  • Sort: table scan sort (default) processing if sort spills to disk
  • Build: if the shadow copy is to be built in a temp space (USE clause)
  • Index Rebuild: if associated sort processing spills to disk
• If multiple temporary tablespaces exist
  • The table reorg ‘USE <tempspace>’ clause only guarantees that the specified tempspace is used for the table shadow copy
    • Index recreate and scan sort processing can use another available temp space (the choice is governed internally according to temp usage)
12
'Offline' Table Reorg Reclustering: Table Scan Sort (default)
• The table is scanned and its records are sorted to create the new reorganized version of the table (the reclustering index is not scanned)
• A reorg may be required because the clustering index isn't well clustered, so a table scan sort will give better I/O characteristics (it may be slower for sparse tables where the index itself is somewhat small)
• Caveats: table scan sort is disabled 'under-the-covers' if
  • LONG/LOB data is to be reorganized
  • Length of sort record is too large (RID is included in sort record)
• Index recreate optimization:
  • If the reclustering index is SMS type or unique DMS type, recreation of this index will not require a sort. Rather, this index is rebuilt by simply scanning the newly reorganized data table.
  • Any other indexes that require recreation will involve a sort
  • If just the reclustering index (of the required type) exists, there are no temp space considerations in this case
13
'Offline' Table REORG - Scan Sort Temp Storage
[Figure: contents of USERSPACE1 and TEMPSPACE1 during a scan sort reorg, shown for two invocations:
  db2 reorg table T1 index I1
  db2 reorg table T1 index I1 use TEMPSPACE1
Objects shown: T1, SHADOW, SORT, TDASPILL, TDAMERGE. Annotations: "2x TEMP" and "3x TEMP".]
14
Inplace or ‘Online’ Table Reorganization
• Inplace Table Reorganization
  • Rows moved within existing table object to re-establish clustering, reclaim free space, and eliminate overflows
  • Executes as asynchronous background application (process name - db2reorg)
  • Table must be at least 3 pages in size
  • Cannot inplace reorg LONG/LOB data (use 'offline' reorg)
• Attributes:
  • Minimal extra storage requirement
  • Incremental: benefit of effects seen immediately
  • No iterative log processing phase
  • Table quiesce for object 'switch over' at end can be avoided
• Think of it as a Trickle Reorg
15
Online Table Reorganization
Reclustering (db2 reorg table t1 index i1 inplace):
• VACATE PAGE RANGE: MOVE & CLEAN to make space
• FILL PAGE RANGE: MOVE & CLEAN to fill space
• Uses clustering index during FILL phases
Space Reclamation (db2 reorg table t1 inplace):
• VACATE PAGE RANGE: MOVE & CLEAN to make space
• Move rows from end of table, filling up holes at the start
• Backward scan starts at end, fills holes earlier in table identified by simultaneous forward scan
[Figure: page ranges and free space shown over TIME]
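The space reclamation idea (a backward scan moving rows into holes found by a forward scan) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: pages are modeled as Python lists with a fixed row capacity, and the function name `compact_rows` is hypothetical. The real utility works on page ranges and also deals with logging, locking and overflow records, none of which is modeled here.

```python
def compact_rows(pages, capacity):
    """Toy model of space reclamation: move rows from the end of the
    table into free slots nearer the start, then drop empty tail pages.

    pages    -- list of pages, each page a list of rows (modified in place)
    capacity -- assumed fixed number of rows that fit on one page
    """
    front, back = 0, len(pages) - 1
    while front < back:
        if len(pages[front]) >= capacity:
            front += 1          # forward scan: this page has no hole
        elif not pages[back]:
            back -= 1           # backward scan: this page is already empty
        else:
            # Move one row from the tail page into the hole.
            pages[front].append(pages[back].pop())
    # Truncation step: release empty pages at the end of the table.
    while pages and not pages[-1]:
        pages.pop()
    return pages


table = [[1], [], [2, 3], [4]]
print(compact_rows(table, capacity=2))
```

In this model the four sparsely filled pages end up as two full pages, which is the effect the truncate step of an inplace reorg relies on.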
16
REORGCHK - Table Statistics
• F1: 100 * OVERFLOW / CARD < 5
  • The total number of overflow records in the table should be less than 5%
• F2: 100 * (Effective Space Utilization of Data Pages) > 70
  • There should be less than 30% free space in the table
• F3: 100 * (Required Pages / Total Pages) > 80
  • The number of pages that contain no rows at all should be less than 20% of the total number of pages in the table
db2 reorgchk on table bminor.staff3
Table statistics:
F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80
SCHEMA NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG
----------------------------------------------------------------------------------------
Table: BMINOR.STAFF3
6144 0 153 153 - 276480 0 45 100 -*-
----------------------------------------------------------------------------------------
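As a rough illustration of how the three table formulas produce the REORG flags, here is a small sketch applied to the BMINOR.STAFF3 row. The slide does not spell out how "effective space utilization" is computed, so the F2 expression below (TSIZE divided by the usable bytes of FP - 1 pages) and the 4020 usable bytes per 4 KB page are assumptions chosen for illustration. The function name is hypothetical.

```python
def reorgchk_table_flags(card, overflow, np_, fp, tsize, usable_bytes_per_page):
    """Evaluate F1..F3 and return (f1, f2, f3, flags).

    A '*' in flags marks a formula whose threshold is not met.
    usable_bytes_per_page is an ASSUMED figure (page size minus overhead).
    """
    f1 = 100 * overflow // card if card else 0
    # ASSUMPTION: utilization = table size / usable space in (FP - 1) pages.
    usable = max(fp - 1, 1) * usable_bytes_per_page
    f2 = 100 * tsize // usable
    f3 = 100 * np_ // fp if fp else 0
    checks = (f1 < 5, f2 > 70, f3 > 80)
    flags = "".join("-" if ok else "*" for ok in checks)
    return f1, f2, f3, flags


# Values taken from the REORGCHK output for BMINOR.STAFF3.
print(reorgchk_table_flags(card=6144, overflow=0, np_=153, fp=153,
                           tsize=276480, usable_bytes_per_page=4020))
```

Under these assumptions the sketch reproduces the F2 value of 45 and the "-*-" flag pattern: only the space utilization formula is flagged.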
17
Classic vs Inplace Table Reorg

Classic Reorg
  Approach: Shadow copy technique: rebuilds table in different storage; indexes rebuilt and truncated to new size
  Other Objects: Indexes always reorganized; LOB/LF/XML optionally reorganized
  Availability: By default, table is available for read until phase 3); can select no access
  Phases: 0) Dictionary built (if necessary) 1) Sort 2) Build Table 3) Replace/Copy 4) Index Rebuild
  Storage: If TEMP tablespace specified, table rebuilt there, then copied back; else, table rebuilt directly in original tablespace
  Key Options: Scan/sort vs index scan

Inplace Reorg
  Approach: Trickle row movement technique: moves rows within existing table to re-establish clustering and/or pack rows so as to reclaim space
  Other Objects: Neither indexes nor LOB/LF/XML reorganized
  Availability: By default, table is available for R/W access; if/when truncate is done, table is available for read access
  Phases: 1) Move rows 2) Truncate (opt)
  Storage: Table reorganized within original table storage
  Key Options: Clustering index vs no clustering index; no truncation
18
Locks Acquired for Table Reorg

Classic Table Reorg
  Catalog Locks (SYSCAT.TABLES):
  • IS Table Lock
  • NS Row Lock
  Table Being Reorganized:
  • IX Tablespace Lock
  • U Table Lock
  • Upgrade to Z Table Lock for Copy Phase

Inplace Table Reorg
  Catalog Locks (SYSCAT.TABLES):
  • IS Table Lock
  • NS Row Lock
  Table Being Reorganized:
  • IX Tablespace Lock
  • IS Table Lock
  • X Alter Table Lock
  • S Row Lock on rows moved/cleaned
  • Upgrade to S Table Lock to prepare for Truncation
  • Special Z Table Lock for drain/wait on Truncate
19
Table Reorganization Support Matrix

Classic Reorg
  DPF: Fully supported (can be invoked on all or specified DB partitions)
  Table Partitioning: Supported (invoked on all table partitions)
  MDC: Fully supported

Inplace Reorg
  DPF: Fully supported (can be invoked on all or specified DB partitions)
  Table Partitioning: Not supported
  MDC: Not supported (not needed for reclustering)
20
Monitoring Table REORGs
• Table Snapshot
  • db2 get snapshot for tables on SAMPLE
• db2pd tool
  • db2pd -db SAMPLE -reorgs file=reorg_pd.out
  • (db2pd -db SAMPLE -tcbstats)
  • (db2pd -db SAMPLE -mempools)
• LIST HISTORY
  • db2 list history reorg all for SAMPLE
• View and Table Functions
  • db2 "select * from sysibmadm.snaptab_reorg"
  • db2 "select * from table(sysproc.snap_get_tab_reorg('SAMPLE', dbpartitionnum)) as tb"
  • db2 "select * from table(sysproc.admin_list_hist( )) as listhistory"
  • db2 "select * from table(sysproc.admin_get_tab_info('<schema>', '<tabname>')) as t"
  • db2 "select * from table(sysproc.admin_get_tab_compress_info('<schema>', '<tabname>', '<exec-mode>')) as t"
• Administrator Notification Log
  • $HOME/sqllib/db2dump/<instance_name>.nfy
21
db2 list history reorg all for SAMPLE
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
G T 20070313103600 N S0000022.LOG
----------------------------------------------------------------------------
Table: "BMINOR "."STAFF2"
----------------------------------------------------------------------------
Comment: REORG START
Start Time: 20070313103600
End Time: 20070313103600
----------------------------------------------------------------------------
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
G T 20070313103729 N S0000023.LOG
----------------------------------------------------------------------------
Table: "BMINOR "."STAFF2"
----------------------------------------------------------------------------
Comment: REORG Done
Start Time: 20070313103729
End Time: 20070313103729
----------------------------------------------------------------------------
REORG Table - History File (Example)
Annotations on the output above:
• Op "G": Operation = REORG
• Obj: "T" = Table, "I" = Index
• Type: "N" = Online, "F" = Offline (here an 'Online' REORG)
• Log file in the REORG START entry: log file being written to when REORG started
• Log file in the REORG Done entry: log file being written to when REORG completed
• Comment: REORG status
NOTE: The "Comment" field reports REORG status only for the 'online' case. For 'offline' it specifies the reclustering index and temp space ids.
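The timestamps in the two history entries can be used to estimate how long the online reorg ran. A minimal sketch, assuming the 14-digit timestamps follow the YYYYMMDDHHMMSS layout seen in the output; the function names are hypothetical and the two values are copied by hand from the REORG START and REORG Done entries rather than parsed from the command output.

```python
from datetime import datetime


def history_timestamp(ts):
    """Parse a 14-digit history timestamp (assumed YYYYMMDDHHMMSS)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def elapsed_seconds(start_ts, done_ts):
    """Seconds between a REORG START entry and its REORG Done entry."""
    delta = history_timestamp(done_ts) - history_timestamp(start_ts)
    return int(delta.total_seconds())


# Start Time of the "REORG START" entry and of the "REORG Done" entry.
print(elapsed_seconds("20070313103600", "20070313103729"))
```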
22
REORG Monitoring - Table Snapshot
Table Schema = <140><BMINOR >
Table Name = TEMP (00007,00002)
Table Type = Temporary
Data Object Pages = 820
Rows Read = Not Collected
Rows Written = 72178
Overflows = 0
Page Reorgs = 0
Table Schema = BMINOR
Table Name = STAFF3
Table Type = User
Data Object Pages = 1184
Index Object Pages = 190
Rows Read = Not Collected
Rows Written = 20736
Overflows = 0
Page Reorgs = 0
Table Reorg Information:
Reorg Type =
Reclustering
Table Reorg
Allow Read Access
Recluster Via Table Scan
Reorg Data Only
Reorg Index = 1
Reorg Tablespace = 2
Start Time = 02/26/2007 13:48:48.908388
Reorg Phase = 1 - Sort
Max Phase = 5
Phase Start Time = 02/26/2007 13:48:48.923862
Status = Started
Current Counter = 986
Max Counter = 1183
Completion = 0
End Time =
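Annotations on the snapshot above (taken while running: db2 reorg table staff3 index i3):
• Table Name = TEMP (00007,00002): temp table for spilling sort
• Reorg Type: reclustering reorg via table scan sort; 'offline' reorg (read access up until Replace phase)
• Reorg Index = 1: ID of index being used to recluster by
• Reorg Phase = 1 - Sort: Sort phase (phases only applicable to offline reorg)
• Max Phase = 5: total number of phases to occur: Dictionary Build, Sort, Build, Replace, Index Recreate
• Current Counter / Max Counter: progress indicator, currently 83% complete (986/1183 x 100)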
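The progress figure quoted for the snapshot is simply the ratio of the two counters. A minimal sketch of that arithmetic, with a hypothetical helper name; it applies to the current phase only, since the counters are reported per phase in the snapshot.

```python
def phase_progress_percent(current_counter, max_counter):
    """Percentage of the current reorg phase completed, from the
    'Current Counter' and 'Max Counter' snapshot fields."""
    if max_counter <= 0:
        return 0.0
    return 100.0 * current_counter / max_counter


# Counter values from the STAFF3 snapshot.
print(round(phase_progress_percent(986, 1183)))
```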
23
ADMIN_LIST_HIST( ) Table Function - DPF Example
Database Connection Information
Database server = DB2/AIX64 9.1.2
SQL authorization ID = BMINOR
Local database alias = SAMPLE
DBPARTITIONNUM OBJECTTYPE SQLCODE START_TIME
-------------- ---------- ----------- --------------
0 T - 20070303152449
1 record(s) selected.
myhost: db2 connect to sample completed ok
Database Connection Information
Database server = DB2/AIX64 9.1.2
SQL authorization ID = BMINOR
Local database alias = SAMPLE
DBPARTITIONNUM OBJECTTYPE SQLCODE START_TIME
-------------- ---------- ----------- --------------
100 T -964 20070303152118
1 record(s) selected.
myhost: db2 connect to sample completed ok
db2_all "db2 connect to sample; db2 select
dbpartitionnum,objecttype,sqlcode,start_time
from table'(('sysproc.admin_list_hist'())'
as listhistory where operation=\\'G\\'"
OLR encountered SQL0964 ('log full') on dbpartitionnum 100
24
'Offline' or 'Online' Table REORG?
'Offline' Table REORG:
• PROS:
  • Provides the fastest table reorganization, especially if LOBs/LONGs do not need to be reorged (if they do, only classic reorg supports reorging LONG/LOB data)
  • Indexes are rebuilt once the table has been reorganized
  • Original version of table can be read up until the last phase of reorg (replace phase)
  • The only way to build a new compression dictionary, and/or to compress all rows in the table using the existing or newly created compression dictionary
• CONS:
  • Large space requirement: shadow copy approach, so approximately twice as much space as the original table is needed
  • Limited access: read-only until Replace/Copy phase
  • All-or-nothing process
  • Can only be stopped by the app or user who understands how to stop the process
• Recommendation: Choose this method if you can reorganize tables during a maintenance window
25
'Offline' or 'Online' Table REORG?
'Online' Table REORG:
• PROS:
  • Allows apps to access the table while executing
  • Can be paused and resumed
  • Runs asynchronously
  • Requires less working storage since table is incrementally processed
• CONS:
  • Slower than Classic method (~10-20x)
  • Only allowed for tables with type-2 indexes
  • Cannot reorganize LONG/LOBs
  • Indexes are maintained, not rebuilt, so index reorganization may subsequently be required
  • Requires more log space
• Recommendation: Choose this method for 24x7 operations with minimal maintenance windows
26
Additional REORG Notes
• Different tables can be reorged simultaneously as long as there are no resource constraints or limitations
  • Restriction: for offline table reorg, DMS temp spaces cannot be shared by simultaneous reorgs
• If the table contains mixed row formats because table value compression has been activated or deactivated, an offline table reorganization can convert all the existing rows into the target row format
• If the table is partitioned across several database partitions, and the table reorganization fails on any of the affected database partitions, only the failing database partitions will have the table reorganization rolled back
• The granularity of table reorg is at the Database Partition level, not the Table Range Partition level
  • Table ranges are reorganized sequentially, one after the other, and global indexes are rebuilt once all ranges have been reorganized
27
Reducing the Need to Reorganize Tables
• ALTER TABLE to add PCTFREE space to each data page
  • Considered only by the load and table reorg. Range is from 0 to 99% with default value of 0
• Sort the data
• Load the data
• Creating multi-dimensional clustering (MDC) tables
  • (For MDC tables, clustering is maintained on the columns that you specify as arguments to the ORGANIZE BY DIMENSIONS clause of the CREATE TABLE statement. However, REORGCHK might recommend reorganization of an MDC table if it considers that there are too many unused blocks or that blocks should be compacted)
• APPEND mode tables
  • If the index key values of new rows are always new high key values, for example, then the clustering attribute of the table will try to place them at the end of the table. Having free space in other pages will do little to preserve clustering. Hence, placing the table in append mode may be a better choice than a clustering index
• Automatic Dictionary Creation on Table Growth (TLU-1242A)
  • Dictionary created as table is populated and reaches a certain threshold in size (Viper II)
28
Online Index Reorg : Overview
REORG {TABLE table-name Table-Clause | INDEXES ALL FOR TABLE table-name Index-Clause} [On-DbPartitionNum-Clause]
Index-Clause:
[ALLOW {READ | NO | WRITE} ACCESS] [{CLEANUP ONLY [ALL | PAGES] | CONVERT}]
Goals:
• To improve physical clustering
• Remove fragmentation
The table and original index are available for concurrent transactions
There are 4 phases involved in OLIR:
• Build Phase:
  • All indexes on the table are rebuilt in a new index storage object – a “shadow object” (as opposed to the “ghost index”)
• Log Catch up Phase:
  • The catch up is done for all indexes on the table
• Object Switch Phase:
  • Super exclusive table lock acquired
  • “Shadow object” becomes THE index object
• Cleanup Phase:
  • Old index object removed
29
Online Index Reorg : Table Partitioning and MDC Notes and Limitations
• MDC
  • REORG with ALLOW WRITE not supported
    • Note: ALLOW READ is supported
• Table Partitioning
  • Supports ability to reorg individual indexes (as opposed to ALL indexes of a table)
    • Supported in all availability modes (ALLOW NONE, ALLOW READ, ALLOW WRITE)
    • Natural thing to do, since with table partitioning, each index for the table is in its own storage object (and OLIR operates on a storage object basis)
  • Also supports REORG INDEXES ALL in ALLOW NONE
30
OLIR Hints & Tips
• Enlarge the util_heap_sz if you see ADM9500W in the Administration Notification Log (it will also appear in the db2diag.log)
  • Informational log records are buffered in the utility heap
  • If the utility heap is exhausted, performance will suffer as the catch up phase will involve reading log files and, possibly, retrieving them from archive
• Ensure the tablespace is large enough for the shadow/ghost object/index
  • Remember for Reorg, the shadow object will contain all indexes, so will require (very approximately) the same amount of space as the current index object on the table
  • For Create, the ghost index will simply require the space for the newly created index
• Use LARGE tablespaces
• Ensure you commit as soon as possible after index creations
  • Minimizes time table S lock held
31
Locking Associated with Online Index Reorg
Reorg Mode: Online Index Reorg
Catalog Locks (SYSCAT.TABLES):
• IS Table Lock
• NS Row Lock
Table for Indexes Being Reorganized:
• “ALLOW NO ACCESS”: Z lock on table
• “ALLOW READ ACCESS”: S lock on table
• “ALLOW WRITE ACCESS”: IN lock on table
  • S drain lock for each index (all writers must be aware)
  • S lock at end to perform final catch-up
  • Quiesce concurrent writers: Z lock to perform index switch
32
Reorgchk Index Statistics
Index statistics:
F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
The clustering ratio of an index should be greater than 80% (a low cluster ratio means the index sequence is not the same as the table sequence)
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
Less than 50% of the space reserved for index entries should be empty
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
Determine if recreating the index would result in a tree having fewer levels
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
The number of pseudo-deleted RIDs on non-pseudo-empty pages should be less than 20 percent
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20
The number of pseudo-empty leaf pages should be less than 20 percent of the total number of leaf pages
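The two pseudo-delete formulas are plain ratios, so they are easy to check by hand. The sketch below evaluates only F7 and F8 from counts supplied by the caller; the function name and the sample counts are made up for illustration, and the other index formulas (F4 to F6) are not modeled.

```python
def pseudo_delete_flags(pseudo_deleted_rids, total_rids,
                        pseudo_empty_leaves, total_leaves):
    """Evaluate F7 and F8 and return (f7, f8, flags).

    A '*' in flags marks a formula whose threshold (< 20) is not met.
    """
    f7 = 100.0 * pseudo_deleted_rids / total_rids if total_rids else 0.0
    f8 = 100.0 * pseudo_empty_leaves / total_leaves if total_leaves else 0.0
    flags = "".join("-" if value < 20 else "*" for value in (f7, f8))
    return f7, f8, flags


# Hypothetical counts: 30% pseudo-deleted RIDs, 5% pseudo-empty leaf pages.
print(pseudo_delete_flags(3000, 10000, 10, 200))
```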
33
2007-03-12-22.26.22.319328 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:15 Database:SAMPLE
ADM9501W BEGIN online index reorganization on table "BMINOR .STAFF" (ID "3")
and table space "USERSPACE1" (ID "2").
^^
2007-03-12-22.26.22.322948 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:18 Database:SAMPLE
ADM9503W Online index reorganization proceeds on index ID "1" in table "BMINOR
.STAFF" (ID "3") and table space "USERSPACE1" (ID "2").
^^
2007-03-12-22.26.22.345025 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:18 Database:SAMPLE
ADM9503W Online index reorganization proceeds on index ID "2" in table "BMINOR
.STAFF" (ID "3") and table space "USERSPACE1" (ID "2").
^^
2007-03-12-22.26.22.382075 Instance:bminor Node:000
PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618
relation data serv sqlrreorg_indexes Probe:31 Database:SAMPLE
ADM9502W END online index reorganization on table "BMINOR .STAFF" (ID "3")
and table space "USERSPACE1" (ID "2").
Monitoring Online Index Reorg - Administrator Log
<instanceName>.nfy
NOTIFYLEVEL=3 (default)
34
DB2 9 Deep Compression
Estimate Compression:
INSPECT ROWCOMPESTIMATE
Enable Compression:
CREATE TABLE …. [COMPRESS {NO | YES}]
ALTER TABLE … [COMPRESS {NO | YES}]
Compress a table:
REORG TABLE <tabname> … [KEEPDICTIONARY | RESETDICTIONARY]
(Session TLU-1242: Deep Compression)
35
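Table REORG – Deep Compression
[Figure: an EMPTY TABLE defined with COMPRESS YES is populated by INSERT and LOAD with Uncompressed Row Data; a Table REORG then builds the Dictionary and produces Compressed Row Data. An INDEX is also shown.]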
36
Administrative Table Function – ADMIN_GET_TAB_INFO( )
db2 describe "select * from table(sysproc.admin_get_tab_info('BILLM','STAFF')) as t"
• TABSCHEMA
• TABNAME
• TABTYPE
• DBPARTITIONNUM
• AVAILABLE
• DATA_OBJECT_P_SIZE
• DATA_OBJECT_L_SIZE
• INDEX_OBJECT_P_SIZE
• INDEX_OBJECT_L_SIZE
• LONG_OBJECT_P_SIZE
• LONG_OBJECT_L_SIZE
• LOB_OBJECT_P_SIZE
• LOB_OBJECT_L_SIZE
• XML_OBJECT_P_SIZE
• XML_OBJECT_L_SIZE
• INDEX_TYPE
• REORG_PENDING
• INPLACE_REORG_STATUS
• LOAD_STATUS
• READ_ACCESS_ONLY
• NO_LOAD_RESTART
• NUM_REORG_REC_ALTERS
• INDEXES_REQUIRE_REBUILD
• LARGE_RIDS
• LARGE_SLOTS
• DICTIONARY_SIZE
37
ADMIN_GET_TAB_COMPRESS_INFO ( )
New to Viper II
Column                     Data Type
-------------------------  ------------
TABSCHEMA                  VARCHAR(128)
TABNAME                    VARCHAR(128)
DBPARTITIONNUM             SMALLINT
DATA_PARTITION_ID          INTEGER
COMPRESS_ATTR              CHAR(1)
DICT_BUILDER               VARCHAR(30)
DICT_BUILD_TIMESTAMP       TIMESTAMP
COMPRESS_DICT_SIZE         BIGINT
EXPAND_DICT_SIZE           BIGINT
ROWS_SAMPLED               INTEGER
PAGES_SAVED_PERCENT        SMALLINT
BYTES_SAVED_PERCENT        SMALLINT
AVG_COMPRESS_REC_LENGTH    SMALLINT
38
Page Size Selection
• Default DB page size is 4K, can override on CREATE DATABASE
  CREATE DATABASE SAMPLE PAGESIZE 16 k
• Must always have a system temporary table space with a page size that matches the catalog table space (SYSCATSPACE) page size
• All CREATE BUFFERPOOL and CREATE TABLESPACE statements will default to the database page size unless explicitly specified
• Larger page sizes allow
  • Larger capacity limits for objects (REGULAR or LARGE table spaces)
  • Longer rows in tables, larger keys in indexes (25% of page size)
  • Fewer logical and/or physical page reads (more things on each page)
• Smaller page sizes allow
  • Possibly less page contention (fewer rows/keys on each page)
  • Possibly better I/O behavior for pure OLTP environments
• page size * extent size == space per block for MDC tables
  • Very, very important to prevent sparse blocks/cells
39
Extent Size Selection
• Unit of disk allocation to table storage objects
  • Allocate an extent of “page size” pages on init and extend of objects
  • Round robin approach across all containers
  • SMS allocates page by page until size exceeds 1 extent
  • DMS table spaces have an EMP (Extent Map) storage object for each table storage object – 2 extents minimum per object (data, index)
• Larger extents allow for
  • Less frequent allocations during growth
  • Less frequent EMP mapping during table scan
  • Best for large tables
• Smaller extents allow for
  • More optimal storage, less waste due to partial extents being used
  • Less storage for empty or very small tables
• extent size == block size for MDC tables
  • Very, very important to prevent sparse blocks/cells
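To make the page size and extent size trade-off concrete, the arithmetic can be sketched as follows. The helper names are hypothetical, and the "minimum storage" figure only reflects the two-extents-per-object statement above (one EMP extent plus one data extent); it ignores table space metadata.

```python
def extent_bytes(page_size_bytes, extent_size_pages):
    """Bytes in one extent, which is also the MDC block size."""
    return page_size_bytes * extent_size_pages


def min_dms_object_bytes(page_size_bytes, extent_size_pages):
    """Smallest footprint of one DMS storage object, assuming the
    2 extents minimum per object (extent map + first data extent)."""
    return 2 * extent_bytes(page_size_bytes, extent_size_pages)


KB = 1024
# A 16 KB page with a 32-page extent gives 512 KB per MDC block.
print(extent_bytes(16 * KB, 32) // KB, "KB per block")
# A 4 KB page with a 4-page extent needs at least 32 KB per object.
print(min_dms_object_bytes(4 * KB, 4) // KB, "KB minimum per object")
```

A sparsely populated MDC cell still occupies a whole block, which is why a large page size combined with a large extent size can waste a lot of space when cells hold only a few rows.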
40
DMS Tablespace Architecture – ‘The Extent of Extents’
[Figure: extent-by-extent layout of a DMS table space as the following commands run:
  create tablespace dms1 managed by database using (file 'c.1' 50000) extent size 4
  create table t1
  insert into t1…
  create table t2
  load into t1
  load into t2
Extents 0 to 2 hold the Tablespace Header, the First Extent of SMPs and the Object Table Extent. Extent 3 is the Extent Map (EMP) for T1 and extent 4 is the First Extent of Data Pages (DAT) for T1. Later extents hold further EMP and DAT extents for T1 and T2.]
41
DB2_OBJECT_TABLE_ENTRIES Registry Variable
[Figure: two DMS table space layouts, each showing the Tablespace Header, the First Extent of SMPs and one or more Object Table Extents.]
db2set DB2_OBJECT_TABLE_ENTRIES=nnnnn
Specifies the expected number of objects in a table space. If you know that a large number of objects (for example, 1000 or more) will be created in a DMS table space, you should set this registry variable to the approximate number before creating the table space. This will reserve contiguous storage for object metadata during table space creation. Reserving contiguous storage reduces the chance that an online backup will block operations which update entries in the metadata (for example, CREATE INDEX, IMPORT REPLACE). It will also make resizing the table space easier because the metadata will be stored at the start of the table space.
42
DB2_TRUNCATE_REUSESTORAGE Registry Variable
db2set DB2_TRUNCATE_REUSESTORAGE=IMPORT
• You can use this variable to resolve lock contention between the IMPORT with REPLACE command and the BACKUP ... ONLINE command. In some situations, online backup and truncate operations are unable to execute concurrently. When this occurs, you can set DB2_TRUNCATE_REUSESTORAGE to "IMPORT" or "import", and physical truncation of the object, including data, indexes, long fields, large objects and block maps (for multi-dimensional clustering tables), is skipped and only logical truncation is performed. That is, the IMPORT with REPLACE command empties the table, causing the object's logical size to decrease, but the storage on disk remains allocated.
• This registry variable is dynamic; you can set it or unset it without having to stop and start the instance. You can set DB2_TRUNCATE_REUSESTORAGE before an online backup starts and then unset it after the online backup completes. For multi-partitioned environments, the registry variable will only be active on the nodes on which the variable is set. DB2_TRUNCATE_REUSESTORAGE is only effective on DMS permanent objects.
• admin_get_tab_info( )
  • DATA_OBJECT_P_SIZE vs DATA_OBJECT_L_SIZE
43
The "High Water Mark" (HWM)
• It is the page number of the highest allocated page in a DMS tablespace
• HWM is impacted by:
  • 'Offline' REORG of a table within the DMS tablespace that the table resides in
  • Index REORG with either ALLOW READ ACCESS or ALLOW WRITE ACCESS
• HWM affects:
  • Redirected Restore - redefinition of containers allowing tablespace to shrink in size; cannot be shrunk lower than HWM
  • Dropping or reducing the size of container via ALTER TABLESPACE only affects extents above the HWM
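The effect of an in-tablespace offline reorg on the HWM can be shown with a toy extent map. This is a sketch under loud assumptions: the table space is modeled as a list of extents (owner name or `None` for free), the HWM is counted in extents rather than pages, metadata extents are ignored, and the shadow copy is assumed to take the lowest free extents. All names are hypothetical.

```python
def high_water_mark(extents):
    """Number of extents up to and including the highest allocated one."""
    for i in range(len(extents) - 1, -1, -1):
        if extents[i] is not None:
            return i + 1
    return 0


def offline_reorg_in_tablespace(extents, table):
    """Model a classic reorg whose shadow copy is built in the same
    table space: allocate the shadow copy, then free the original."""
    needed = sum(1 for owner in extents if owner == table)
    free = [i for i, owner in enumerate(extents) if owner is None]
    if len(free) < needed:
        raise ValueError("not enough free extents for the shadow copy")
    shadow = set(free[:needed])  # ASSUMPTION: lowest free extents first
    return [table if i in shadow else (None if owner == table else owner)
            for i, owner in enumerate(extents)]


# T1 sits at the start: the shadow copy lands above it and raises the HWM.
before = ["T1", "T1", None, None, None, None]
after = offline_reorg_in_tablespace(before, "T1")
print(high_water_mark(before), "->", high_water_mark(after))

# T1 holds up the HWM and free extents exist below it: the HWM drops.
before = ["T2", None, None, "T1", "T1", None]
after = offline_reorg_in_tablespace(before, "T1")
print(high_water_mark(before), "->", high_water_mark(after))
```

The two cases correspond to the two sides of the story: a reorg inside the table's own DMS table space can push the HWM up, but it can also lower it when the object holding it up can be rebuilt in free extents below.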
[Figure: a DMS PERM TABLESPACE during "db2 reorg table T1", showing T1, its shadow copy (T1') and the position of the HWM.]
44
The "High Water Mark" (HWM)
• If no free extents below the HWM then the only way to reduce the HWM is to drop the object holding it up
• db2dart /DHWM
  • Displays detailed tablespace information including which extents are free, which are in use and what object is using them, as well as information about the object holding up the HWM
• db2dart /LHWM
  • Provides guidance as to how the HWM might potentially be lowered
  • If DMS table data object holding up HWM then 'offline' REORG of table within the DMS tablespace that the table resides in can be used to lower the HWM if enough free extents exist below the HWM to contain the shadow copy
  • If DMS index object holding up HWM, index reorg may be able to reduce HWM
• db2dart /RHWM
  • If empty SMP extent holding up HWM
  • http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=high+water+mark&uid=swg21234267&loc=en_US&cs=utf-8&lang=en
  • Viper II: ALTER TABLESPACE REDUCE and Online Backup will remove these
45
Large RID – the new default in DB2 9
• RID – Row Identifier
  • A reference to the location of a row in a table
  • Contains the page number and the slot number (location on page)
• Before DB2 9
  • RID is 4 bytes, 3 byte page number and 1 byte slot number
  • Default table space data type was REGULAR
  • Tables (data part) could not be placed in LARGE table spaces
• DB2 9
  • New 6 byte RID, 4 byte page number and 2 byte slot number
  • Infrastructure - runtime, sections, sort, log records, locks – all large RID
  • Default table space data type for DMS table spaces is now LARGE
  • Tables can now be placed in LARGE table spaces
  • Indexes contain regular or large RIDs only, based on the table space type where the table data is stored; it has nothing to do with the type of table space where the index resides
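The byte split of the two RID formats explains the regular table space limits directly. The sketch below packs a page number and slot number into the stated number of bytes and derives the capacity implied by a 3 byte page number. The byte order and field layout are assumptions made purely for illustration (the slide gives only the field widths), and the function names are hypothetical. No capacity is derived for large RIDs, because the slide's limits for LARGE table spaces are not simply 2^32 pages.

```python
def pack_rid(page, slot, page_bytes, slot_bytes):
    """Pack page and slot numbers into a RID of page_bytes + slot_bytes.
    ASSUMPTION: big-endian, page number first. Raises OverflowError if a
    value does not fit in its field."""
    return page.to_bytes(page_bytes, "big") + slot.to_bytes(slot_bytes, "big")


def regular_tablespace_limit(page_size_bytes):
    """Capacity addressable with a 3 byte page number."""
    return (2 ** 24) * page_size_bytes


GB = 2 ** 30
print(len(pack_rid(1000, 7, 3, 1)), "byte RID (before DB2 9)")
print(len(pack_rid(1000, 7, 4, 2)), "byte RID (DB2 9 large RID)")
print(regular_tablespace_limit(4096) // GB, "GB with 4 KB pages")
# A 1 byte slot number cannot address more than slot 255 on a page.
print((2 ** 24) * 255, "rows at most with 4 byte RIDs")
```

The last figure, roughly 4 x 10^9, is where the row limit for 4 byte RIDs comes from.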
46
Large RIDs – More pages, More rows, Bigger Tables
More Pages: maximum tablespace size by page size

Page Size    4 Byte RID    6 Byte RID ('Large RIDs')
4 KB         64 GB         2 TB
8 KB         128 GB        4 TB
16 KB        256 GB        8 TB
32 KB        512 GB        16 TB

� 4 Byte RID: for tables in all tablespace types: regular, temporary, DMS, SMS
� 6 Byte RID: new to DB2 9 (default). For tables in LARGE table spaces (DMS only). Also all SYSTEM and USER temporary table spaces

More Rows per Page: maximum rows per page by page size

Page Size    REG TBSP          REG TBSP       LARGE TBSP        LARGE TBSP
             Min Rec Length    Max Records    Min Rec Length    Max Records
4 KB         14                251            12                287
8 KB         30                253            12                580
16 KB        62                254            12                1165
32 KB        127               253            12                2335

Maximum number of rows:
� Large RIDs - 1.1 x 10^12
� 4 byte RIDs - 4 x 10^9
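The 4 byte RID limits above follow directly from the RID layout, and the LARGE limits imply a fixed page count. The sketch below is only a back-of-envelope check, not DB2 code: it assumes a 3 byte page number addresses 2^24 pages and that a 1 byte slot number allows roughly 255 rows per page. For a 32 KB page the same arithmetic gives 512 GB. The LARGE sizes are copied from the table rather than derived.

```python
KB = 1024
GB = 1024 ** 3
TB = 1024 ** 4

PAGE_SIZES_KB = [4, 8, 16, 32]

# Assumption: a 3 byte page number can address 2**24 pages.
REGULAR_MAX_PAGES = 2 ** 24

# LARGE table space limits as listed in the table (page size in KB -> TB).
LARGE_MAX_TB = {4: 2, 8: 4, 16: 8, 32: 16}


def regular_max_bytes(page_kb):
    """Maximum size of a table space addressed with a 3 byte page number."""
    return REGULAR_MAX_PAGES * page_kb * KB


def large_implied_pages(page_kb):
    """Number of pages implied by the listed LARGE table space limit."""
    return LARGE_MAX_TB[page_kb] * TB // (page_kb * KB)


for p in PAGE_SIZES_KB:
    print(f"{p:>2} KB page: 4 byte RID max {regular_max_bytes(p) // GB} GB, "
          f"LARGE limit implies {large_implied_pages(p):,} pages")

# Row ceiling for 4 byte RIDs, assuming ~255 rows per page.
print(f"4 byte RID row ceiling: {REGULAR_MAX_PAGES * 255:,}")
```

Every listed LARGE limit works out to the same page count, and the 4 byte RID row ceiling lands close to the 4 x 10^9 figure quoted above.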
47
Converting Existing Tablespaces To LARGE
� ALTER TABLESPACE <name> CONVERT TO LARGE
� New option must be the only option, cannot be combined with other alter capabilities
� Fully logged and supports ROLLBACK and RESTORE/ROLLFORWARD
� If table space is defined with AUTORESIZE YES
� If MAXSIZE is NONE, then growth of the table space is automatic!
� Else MAXSIZE is restricting table space growth and should be increased
� Otherwise, storage has to be increased to benefit from a larger capacity
� Enable AUTORESIZE or
� Add a new stripe set or
� Extend existing containers
� New tables created will fully support large RIDs, both page and slot numbers
� Previously existing tables continue to be restricted to ~255 rows/page and to 3 byte page numbers until a reorganization of the table or its indexes occurs
� SQL1236N Table "<table-name>" cannot allocate a new page because the index with identifier "<index-id>" does not yet support large RIDs
BEST PRACTICE:
� Perform the ALTER TABLESPACE during upgrade/migration
� Be pro-active in rebuilding indexes on tables (or reorganizing tables) afterwards
48
Large RIDs - What Actions Need To Be Taken?
� The table will not support a larger 4 byte page number until all indexes on the table support large RIDs
� SELECT TABNAME, TABSCHEMA, DBPARTITIONNUM FROM TABLE (ADMIN_GET_TAB_INFO( '', '' )) AS T WHERE LARGE_RIDS = 'P'
� The table will not support >255 rows (slots) per page until the table itself has been reorganized with the classic/offline REORG TABLE
� SELECT TABNAME, TABSCHEMA, DBPARTITIONNUM FROM TABLE (ADMIN_GET_TAB_INFO( '', '' )) AS T WHERE LARGE_SLOTS = 'P'
� Can my table benefit from large slots (more rows per page)?
� SELECT TABSCHEMA, TABNAME, AVGROWSIZE FROM SYSCAT.TABLES
If the (average row size - 2) for a table is smaller than the minimum record length for the page size used, then there could be storage benefits when converting the table space to large and reorganizing the table to enable large slots
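The rule of thumb above can be sketched as a small check. This is an illustration under stated assumptions, not a DB2 utility: the minimum record lengths are the REG TBSP values from the rows-per-page table earlier in this deck, the function name and the sample table names are hypothetical, and a negative AVGROWSIZE is assumed to mean that statistics have not been collected.

```python
# Minimum record length (bytes) in a REGULAR table space, keyed by page
# size in KB. Values taken from the rows-per-page table in this deck.
REG_MIN_REC_LEN = {4: 14, 8: 30, 16: 62, 32: 127}


def may_benefit_from_large_slots(avg_row_size, page_size_kb):
    """Apply the slide's rule: (AVGROWSIZE - 2) < minimum record length.

    Returns None when avg_row_size is negative (assumed: no statistics).
    """
    if avg_row_size < 0:
        return None
    return (avg_row_size - 2) < REG_MIN_REC_LEN[page_size_kb]


# Hypothetical (TABNAME, AVGROWSIZE, page size in KB) rows for illustration.
sample = [("ORDER_FLAGS", 12, 8), ("CUSTOMER", 140, 8), ("LOOKUP", 40, 32)]
for name, avg, page_kb in sample:
    print(name, may_benefit_from_large_slots(avg, page_kb))
```

A True result only flags a candidate. The benefit is realized after the table space is converted to LARGE and the table is reorganized offline.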
49
Log Consumption – INSERT and DELETE
� Row images are logged so that DB2 can redo or undo actions
� Real log space from active log is written and consumed
� Virtual log space from active log is reserved for rollback
� INSERT
� Row image being inserted is logged (required for redo!)
� Reserve log space for “delete” on undo
• Space for row image is not required in reserved space
� DELETE
� Row image being deleted is logged (required for undo!)
� Reserve log space for “insert” on undo
• Space for row image is required in reserved space
� When row compression is active, the row images are compressed, resulting in fewer bytes logged and reserved, and lower log file usage
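The asymmetry between INSERT and DELETE can be made concrete with a toy accounting model. Everything numeric here is an assumption for illustration: HYPOTHETICAL_HEADER_BYTES is a placeholder, not a real DB2 log record header size, and the functions only encode the rules stated on this slide.

```python
from dataclasses import dataclass

# Placeholder overhead per log record; NOT an actual DB2 value.
HYPOTHETICAL_HEADER_BYTES = 40


@dataclass
class LogCost:
    logged: int    # bytes written to the active log
    reserved: int  # bytes reserved in the active log for rollback


def insert_cost(row_len):
    # Row image is logged for redo; the undo ("delete") needs no row image.
    return LogCost(HYPOTHETICAL_HEADER_BYTES + row_len,
                   HYPOTHETICAL_HEADER_BYTES)


def delete_cost(row_len):
    # Row image is logged for undo; the undo ("insert") needs the row image.
    return LogCost(HYPOTHETICAL_HEADER_BYTES + row_len,
                   HYPOTHETICAL_HEADER_BYTES + row_len)


print(insert_cost(100))
print(delete_cost(100))
# A compressed (shorter) row image lowers both logged and reserved bytes.
print(delete_cost(60))
```

Under this model a DELETE ties up more log space than an INSERT of the same row, and a shorter (compressed) row image reduces both figures.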
50
Log Consumption - UPDATE
� There are three different types of UPDATE log records written by DB2:
1. Full before and after row image logging.
� The entire before and after images of the row are logged. This is the only type of logging performed on tables enabled with DATA CAPTURE CHANGES.
2. Full XOR logging.
� The XOR differences between the before and after row images are logged, from the first byte that changes to the end of the shorter row, followed by any residual bytes of the longer row.
3. Partial XOR logging.
� The XOR differences between the before and after row images are logged, from the first byte that changes to the last byte that changes. Byte positions may be the first/last bytes of a column. Row images must be exactly the same length.
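The two XOR variants can be sketched over plain byte strings. This is a minimal illustration of the descriptions above, not DB2's log record format: the function names and return shapes are hypothetical, and real row images carry headers and column metadata that are ignored here.

```python
def first_diff(before: bytes, after: bytes) -> int:
    """Index of the first differing byte, or the shorter length if none."""
    n = min(len(before), len(after))
    for i in range(n):
        if before[i] != after[i]:
            return i
    return n


def full_xor(before: bytes, after: bytes):
    """XOR from the first changed byte to the end of the shorter row,
    plus the residual bytes of the longer row."""
    n = min(len(before), len(after))
    start = first_diff(before, after)
    xor = bytes(b ^ a for b, a in zip(before[start:n], after[start:n]))
    longer = before if len(before) > len(after) else after
    return start, xor, longer[n:]


def partial_xor(before: bytes, after: bytes):
    """XOR from the first changed byte to the last changed byte.
    Row images must be exactly the same length."""
    if len(before) != len(after):
        raise ValueError("partial XOR requires equal-length row images")
    start = first_diff(before, after)
    if start == len(before):
        return start, b""
    end = max(i for i in range(len(before)) if before[i] != after[i])
    xor = bytes(b ^ a for b, a in zip(before[start:end + 1], after[start:end + 1]))
    return start, xor


fred = b"24355TXPlano10000500Fred"
john = b"24355TXPlano10000500John"
frank = b"24355TXPlano10000500Frank"

print(partial_xor(fred, john))  # same length: only the changed span
print(full_xor(fred, frank))    # length change: XOR part plus residual
```

Applying the XOR bytes to the before image at the returned offset reproduces the after image, which is why the differences alone are enough for redo and undo.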
51
Log Consumption – UPDATE Examples
1. Full before and after row image logging (DATA CAPTURE CHANGES)
Row images: 24355TXPlano10000500Fred -> 24355TXPlano10000500John
Logged: 24355TXPlano10000500John 24355TXPlano10000500Fred
2. Full XOR logging (row length changes on update)
Row images: 24355TXPlano10000500Fred -> 24355TXPlano10000500Frank
Logged (XOR): 0111011010100101001100100101001010101011010010101010110101010101010
3. Partial XOR logging (row length does not change)
Row images: 24355TXPlano10000500Fred -> 24355TXPlano10000500John
Logged (XOR): 110110101001010011
52
Log Consumption – Full XOR Logging Details
� When the total length of the row is changing, which is common when variable length columns are updated and also when row compression is enabled, DB2 will determine which byte is first to be changing and log a Full XOR log record.
1. Full XOR logging (length change) with changed column at/near beginning of row
2. Full XOR logging (length change) with changed column at/near end of row
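Row images from the slide:
Changed column at end of row: 24355TXPlano10000500Fred -> 24355TXPlano10000500Frank
Changed column near beginning of row: 24355 FredTXPlano10000500 -> 24355 FrankTXPlano10000500
XOR log record bit patterns shown on the slide (one longer, one shorter):
0111011010100101001100100101001010101011010010101010110101010101010
01110110101001010011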
53
Log Consumption – Partial XOR Logging Details
� When the total length of the row is not changing, even when row compression is enabled, DB2 will compute and write the smallest Partial XOR log record possible.
1. Partial XOR logging (no length change) with a gap between columns being changed
2. Partial XOR logging (no length change) with no gap between columns being changed
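Row images from the slide:
Gap between changed columns: 24355TXPlano10000500Fred 24355TXPlano12345500John
No gap between changed columns: 24355TXPlano12345John500 24355TXPlano10000Fred500
XOR log record bit patterns shown on the slide:
1101101010010100110101
11011010100000000000000001001001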
54
Log Consumption – Best Practices
� Columns which are updated frequently (changing value) should be:
� grouped together
� defined towards or at the end of the table definition
� These recommendations are independent of
� Row compression
� Row format (default or null/value compression)
� The benefit would be:
� better performance
� fewer bytes logged
� fewer log pages written
� smaller active log requirement for transactions performing a large number of updates.
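The effect of column placement can be estimated with the row images used in the earlier examples. The sketch below is illustrative only: logged_payload_len is a hypothetical helper that counts the bytes a Partial XOR record (equal lengths) or Full XOR record (length change) would need to cover, ignoring any record headers.

```python
def logged_payload_len(before: bytes, after: bytes) -> int:
    """Bytes covered by the XOR log record, per the rules described earlier."""
    n = min(len(before), len(after))
    diffs = [i for i in range(n) if before[i] != after[i]]
    if len(before) == len(after):
        # Partial XOR: first changed byte through last changed byte.
        return diffs[-1] - diffs[0] + 1 if diffs else 0
    # Full XOR: first changed byte to end of shorter row, plus residual.
    start = diffs[0] if diffs else n
    return max(len(before), len(after)) - start


# Updated columns separated by an unchanged column vs grouped together.
gap = logged_payload_len(b"24355TXPlano10000500Fred",
                         b"24355TXPlano12345500John")
no_gap = logged_payload_len(b"24355TXPlano10000Fred500",
                            b"24355TXPlano12345John500")

# Variable length column near the beginning vs at the end of the row.
near_start = logged_payload_len(b"24355 FredTXPlano10000500",
                                b"24355 FrankTXPlano10000500")
near_end = logged_payload_len(b"24355TXPlano10000500Fred",
                              b"24355TXPlano10000500Frank")

print("gap:", gap, "no gap:", no_gap)
print("near start:", near_start, "near end:", near_end)
```

Grouping the updated columns shrinks the span that must be covered, and placing a length-changing column at the end keeps the Full XOR record short, which is the reasoning behind the two recommendations.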
55
Summary
� Real Estate is a BIG investment
� Knowing details about your DB2 ‘Real Estate’ will allow you to better leverage that investment
� With DB2 9 (Viper) and Viper II (DB2 9.5), significant new functionality has been developed to help with the management of storage
� Going forward, one can expect the trend to continue
56
THANK YOU!!!.
Your Feedback is greatly appreciated.