DB2 War Stories and Scary Tales
May 9, 2007 3:00 p.m. – 4:00 p.m.
Platform: DB2 for z/OS
DB2 War Stories and Scary Tales (Part 1)
Robert Goodman
Certified DB2 Professional
Certified Business Continuity Planner
Session: A11
Page: 2
DB2 War Stories & Scary Tales
Major Areas
Database Foundation Stones
SQL Exposures
Questionable Strategies
DDL Exposures
Operational Exposures
Page: 3
DB2 on z/OS – A World Class Act!
Page: 4
Database Foundation Stones
Data Integrity
Data Security
Data Recoverability
Data Concurrency
Performance & Scalability
Page: 5
Where Shall We Begin?
Chapters
Fear of Commitment
Trigger Happy
Rotate Roulette
Extents Wide Open
DB2 War Stories
And Scary Tales
By
DATABASE BOB
Think Tank Publications
Page: 6
Fear of Commitment
Chapter 1
Page: 7
Cycling DB2
DB2 Stop
Start DB2
Start: IRLM, DM, DDF
Recover: Unresolved URs
Apps Up
When DB2 Does Lots of Recovery
Apps Up
DB2 Crash
Page: 8
Once Upon A Time
In the kingdom of DB2, the time came for a systems downtime. Many tasks had been busy during the week. It came time to cycle DB2. It was cancelled, but was naughty and wouldn’t come down. After two hours, the operator forcefully killed DB2. Maintenance was applied and the system was brought up. DB2 lingered and wouldn’t wake up. After three hours, the operator killed DB2 for a second time. We tried once again to wake up DB2. This time we asked the wizards at IBM how to slay the problem. They declared that we should just let it run. Twelve hours later DB2 came up. Why did it take so long? And can this happen even today?
Page: 9
What Happened and Why?
Sparse Updating Long URs
[Diagram: timeline of sparse updates (X) by URs A1 and A2 across Day 1, Day 2 and Day 3, written to Log1, Log2, Log3, Log1, Log2, ending with Force DB2.]
DB2 Log Records
1) Do Records
2) Undo Records
DB2 Restart: Recover Incomplete URs
DB2 Up
Page: 10
DB2 Restart
[Diagram: two restart timelines, each running Start DB2, Forward Recovery, Backward Recovery, Restart Complete, then Applications Access DB.]
Deferred Restart (Quicker): UR A1 is deferred and marked X (RECP) while the other applications access the database.
Without Deferred Restart: All Recoveries Must Complete.
Page: 11
Turning On Deferred Restart
LIMIT BACKOUT Parameter
AUTO – Automatically recovers once DB2 is up
YES – RECOVER POSTPONED command resumes recovery
NO – Process all inflight and inabort URs
BACKOUT DURATION Parameter
Number of log records processed during backward recovery before deferring
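The interaction of the two parameters can be pictured with a toy model. This is only a sketch, not DB2 internals: the function name `plan_restart`, the per-UR undo counts, and the idea that the budget is consumed one UR at a time are all assumptions made for illustration.

```python
def plan_restart(limit_backout, backout_duration, undo_counts):
    """Split incomplete URs into those backed out at restart and those postponed.

    limit_backout    -- "AUTO", "YES" or "NO" (the LIMIT BACKOUT values above)
    backout_duration -- log-record budget for backward recovery (hypothetical unit)
    undo_counts      -- {ur_name: undo log records to process} (made-up data)
    """
    if limit_backout not in ("AUTO", "YES", "NO"):
        raise ValueError("LIMIT BACKOUT must be AUTO, YES or NO")

    backed_out, postponed = [], []
    budget = backout_duration
    for ur, undo_records in undo_counts.items():
        if limit_backout == "NO":
            # NO: every inflight and inabort UR is processed before DB2 comes up.
            backed_out.append(ur)
        elif undo_records <= budget:
            backed_out.append(ur)
            budget -= undo_records
        else:
            # Too much backout work: defer it so restart can complete.
            postponed.append(ur)

    how_resumed = {
        "AUTO": "automatically once DB2 is up",
        "YES": "RECOVER POSTPONED command",
        "NO": None,
    }[limit_backout]
    return {
        "backed_out": backed_out,
        "postponed": postponed,
        "resume": how_resumed if postponed else None,
    }


plan = plan_restart("YES", 1000, {"A1": 5000, "A2": 200})
print(plan)
```

In this model the long-running UR A1 is postponed and its pagesets stay unavailable until recovery is resumed, which is the exposure the restart questions describe.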
Page: 12
DB2 Restart Questions
• Does deferred restart always work?
  • In rare cases it fails
• Won’t deferred restart fix all my problems?
  • Normally
  • Deferred pagesets still need recovery
• What is status of pagesets after restart?
  • Most pagesets are available
  • Deferred pagesets are unavailable
• What is the exposure?
  • Applications which use deferred pagesets will fail
• How can we detect long running URs?
  • DB2 log message - DSNR035I UNCOMMITTED UR AFTER ### CHECKPOINTS
• Can I automatically cancel long running URs?
  • Netview can be used to do this
Page: 13
Trigger Happy
Chapter 2
Page: 14
The Trigger Concept
[Diagram: Program1 issues SQL1 against the Master table; the database invokes the After Trigger defined by Create Trigger, which runs its own SQL.]
Trigger Facts:
• Programs are unaware
• Synchronous
• Part of UR
• Firing cost of a FETCH
• Add in trigger SQL
• Plus cost of trigger work
• Adds to SQL elapsed
Page: 15
Once Upon A Time
The king declared that dashboards would help rule the kingdom. This was a daunting task, for many programs had to be changed. Triggers came to the rescue: they were quick and easy. Since one trigger was good, many were even better. They multiplied like rabbits and soon the whole kingdom was full of triggers. The word came down from on high that things were slow. Many programs were dragging but no changes had been made. They noticed that when triggers were added, darkness descended upon the kingdom. What had gone wrong? And how could it be fixed?
Page: 16
Multiple Triggers
[Diagram: Program issues SQL against the Master table; the database invokes a Before Trigger and several After Triggers, each with its own SQL, all inside the SQL Elapsed Time.]
Multiple Triggers
• All are synchronous
• All in UR
• Multiple triggers’ SQL
• Fire one after another
• Fired in timestamp order
• Serially add to elapsed time
Page: 17
Triggers & Stored Procedure
[Diagram: Program issues SQL; in DB2 (MSTR) the database invokes the After Trigger, which invokes a Stored Procedure through DDF; all of it counts toward SQL Elapsed Time.]
Triggers & Stored Procedures
• Synchronous
• SP program load time
• SP execution time
• Can make calls outside DB2
• Greatly extends total times
Page: 18
Stored Procedure w/Transition Tables
[Diagram: Program issues SQL; in DB2 (MSTR, DM, DDF) the database invokes the After Trigger, which invokes a Stored Procedure; all of it counts toward SQL Elapsed Time.]
Using Transition Tables
• Synchronous
• Transition tables
  • Create table time
  • Use table time
  • Delete time
• SP program load time
• SP execution time
• Calls outside DB2 time
• Adds significantly to times
Page: 19
How Expensive Are Triggers?
Fire Trigger – Cost of a FETCH
+ Trigger SQL – Cost of SQL
+ WHEN – Invoked every time trigger event happens
+ Transition Variables – Cost of transition table
+ Invoke Stored Procedure – DDF, Start thread
+ Resident Stored Procedure – work in SP
+ Non-Resident Stored Procedure – start SP + work in SP
Statement Triggers – Cheapest
Row Triggers – Cheap
SP Triggers – Expensive
SP Triggers w/Trans Vars - Priceless
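The additive structure of this slide can be written down as a small cost model. Only the list of components comes from the slide; the relative numbers in `COSTS` and the function name `trigger_cost` are hypothetical placeholders, not measurements.

```python
# Hypothetical relative costs. Only the additive structure is from the slide.
COSTS = {
    "fire": 1.0,         # firing the trigger: roughly the cost of a FETCH
    "trigger_sql": 2.0,  # the SQL inside the trigger body
    "when": 0.5,         # WHEN clause, evaluated on every trigger event
    "transition": 3.0,   # transition variables / transition table
    "invoke_sp": 5.0,    # invoking a stored procedure (DDF, start thread)
    "sp_work": 4.0,      # work done inside the stored procedure
    "start_sp": 10.0,    # extra start-up for a non-resident stored procedure
}


def trigger_cost(uses_when=False, transition=False, stored_proc=None):
    """Add up the cost components for one trigger firing.

    stored_proc is None, "resident" or "non-resident".
    """
    if stored_proc not in (None, "resident", "non-resident"):
        raise ValueError("stored_proc must be None, 'resident' or 'non-resident'")
    total = COSTS["fire"] + COSTS["trigger_sql"]
    if uses_when:
        total += COSTS["when"]
    if transition:
        total += COSTS["transition"]
    if stored_proc is not None:
        total += COSTS["invoke_sp"] + COSTS["sp_work"]
    if stored_proc == "non-resident":
        total += COSTS["start_sp"]
    return total


print(trigger_cost())                                            # plain trigger
print(trigger_cost(transition=True, stored_proc="non-resident")) # the costly end
```

Because triggers fire synchronously and one after another, the elapsed time added to the triggering SQL is the sum of these totals over every trigger on the table.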
Page: 20
Generally Poor Reasons to Use Triggers
1) Just because they’re quick
2) Lazy man’s solution
3) Easier than changing programs
4) Temp fixes that become permanent
5) For data replication
6) To populate summary tables
7) To enforce simple value constraints
8) To enforce RI constraints
9) To maintain dashboards (oops!)
Page: 21
Modifying Triggers
DROP TRIGGER
CREATE TRIGGER
Update TRIGGER
Refresh TRIGGER
How Triggers Are Maintained: DROP & CREATE TRIGGER
Page: 22
Trigger – Firing Sequence
SYSIBM.SYSTRIGGERS CREATEDTS: “It is also used to order the execution of multiple triggers.”
Firing Order (before)
1. Trigger 1 – 2007-01-01
2. Trigger 2 – 2007-01-15
3. Trigger 3 – 2007-01-30
DROP & CREATE of Trigger 2 on 2007-02-15
Firing Order (after)
1. Trigger 1 – 2007-01-01
2. Trigger 3 – 2007-01-30
3. Trigger 2 – 2007-02-15
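The reordering on this slide can be replayed with a few lines of Python. This is a sketch of the ordering rule only: the dictionary stands in for the CREATEDTS column of SYSIBM.SYSTRIGGERS, and the helper names `firing_order` and `drop_and_create` are made up for the example.

```python
from datetime import date


def firing_order(triggers):
    """Return trigger names sorted by creation timestamp (oldest fires first)."""
    return [name for name, created in sorted(triggers.items(), key=lambda kv: kv[1])]


def drop_and_create(triggers, name, when):
    """Model DROP + CREATE: the trigger comes back with a new creation timestamp."""
    updated = dict(triggers)
    updated[name] = when
    return updated


triggers = {
    "Trigger 1": date(2007, 1, 1),
    "Trigger 2": date(2007, 1, 15),
    "Trigger 3": date(2007, 1, 30),
}

before = firing_order(triggers)
after = firing_order(drop_and_create(triggers, "Trigger 2", date(2007, 2, 15)))
print(before)
print(after)
```

Maintaining Trigger 2 silently moves it to the end of the firing sequence, so any logic that depends on trigger order has to be rechecked after every DROP & CREATE.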
Page: 23
Who Is Aware of Triggers?
Source Awareness
Yes - DB2
Should Be - DBAs
Maybe – Programmers
NO – DB2 Utilities
Triggering Awareness
NO – Applications
NO – SQL
NO – DB2 Optimizer
NO – Explain
NO – Resource Limit Facility
NO – Constraints
Triggers - A Run-Time Event
Page: 24
Invisible Program Dependencies
[Diagram: Programs A, B, C, D ... issue SQL; inside DB2 (MSTR, DDF, DBM1) the SQL fires a Trigger that touches transition tables (TT) and application tables (AT) and invokes a Stored Procedure, which touches application tables and makes Calls Outside DB2.]
Invisible Causes of Breakage
• Trigger
• Trigger SQL
• DDF
• Transition tables
• Stored procedure
• Application tables
• Calls outside DB2
• Unavailable resources
• RI & check constraints
• Utilities
• Deadlocks & timeouts
Any Break Causes All Triggering SQL to Fail!
Page: 25
Trigger Traps
The Scenario
1) Update trigger on TableA
   Starts resident stored procedure (SP-X)
   Inserts before image into log – TableB
2) DBA adds column to TableA
3) Days Later - SQL updating TableA starts failing
Corrective Actions
1) Drop the trigger (may require down-time)
2) Drop the stored procedure
3) Add column to parameter lists & SQL
4) Recreate the stored procedure
5) Recreate the trigger
When DDF reloaded the resident SP-X, the transition variables no longer matched the trigger and SP-X. The trigger had to be dropped & recreated to correct this. SP-X had to be changed to include the new column in TableA.
[Diagram labels: Triggers, Alters, DBA]
Page: 26
Rotate Roulette
Chapter 3
Page: 27
The ROTATE Concept
1 Oldest – Limitkey A
2 Old – Limitkey B
3 New – Limitkey C
4 Newer – Limitkey D
ROTATE DDL Command
1) Delete Oldest Partition Rows
2) Reuse Oldest Partition
Delete Old Data, New Last Part Limitkey
1 Newest – Limitkey E
Page: 28
Rotating Partitions
ALTER TABLE table ROTATE PARTITION FIRST TO LAST ENDING AT (limitkeys) RESET;
Page: 29
ROTATE DDL
ROTATE In Action
Physical Partition Dataset      Logical Partition      Logical Partition
                                SYSTABLEPART (V8)      After The ROTATE
catg.DSNDB.db.ts.I0001.A001     LP 1                   LP 4
catg.DSNDB.db.ts.I0001.A002     LP 2                   LP 1
catg.DSNDB.db.ts.I0001.A003     LP 3                   LP 2
catg.DSNDB.db.ts.I0001.A004     LP 4                   LP 3
Page: 30
A Series of ROTATEs
Physical datasets: catg.DSNDB.db.ts.I0001.A001, A002, A003, A004
[Diagram: position of partitions P1 to P4 after each rotate. Limitkeys start at 'A', 'B', 'C', 'D'; each rotate adds the next one, 'E' through 'H'.]
1st Rotate: ALTER TABLE ROTATE ... ENDING AT ('E') RESET;
2nd Rotate: ALTER TABLE ROTATE ... ENDING AT ('F') RESET;
3rd Rotate: ALTER TABLE ROTATE ... ENDING AT ('G') RESET;
4th Rotate: ALTER TABLE ROTATE ... ENDING AT ('H') RESET;
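The four rotates can be traced with a short simulation. It is a sketch of the bookkeeping only: the deque holds (physical partition, limitkey) pairs in logical order, and the function name `rotate` is made up for the example. It models no DB2 catalog tables and no data.

```python
from collections import deque


def rotate(logical_order, new_limitkey):
    """Model ROTATE PARTITION FIRST TO LAST ... ENDING AT (new_limitkey) RESET.

    The first logical partition is emptied and reused as the last logical
    partition with the new limitkey. Returns the physical partition reused.
    """
    physical, _old_limitkey = logical_order.popleft()
    logical_order.append((physical, new_limitkey))
    return physical


# Logical order at the start: physical partitions 1 to 4, limitkeys 'A' to 'D'.
parts = deque([(1, "A"), (2, "B"), (3, "C"), (4, "D")])

history = []
for key in ["E", "F", "G", "H"]:
    reused = rotate(parts, key)
    history.append((reused, list(parts)))
    print(f"ENDING AT ('{key}') reused physical part {reused}: {list(parts)}")
```

After the first rotate, physical partition 1 is the last logical partition, and only after the fourth rotate do logical and physical numbers line up again. That gap is why the physical partition behind the first logical partition has to be looked up before every rotate.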
Page: 31
Once Upon A Time
Version 8 was up and running well. The call came to convert to table based partitioning and reuse the oldest partition. The ROTATE command was chosen to do this non-disruptive deed. Suddenly the phone began to ring and thick darkness covered the database cubicles. A quick check revealed that two parts were mired in REORP status. User processing ground to a halt; workloads were in peril. The database guardians countered with concurrent REORG to fix the problem. This crashed and burned. Share level NONE REORG was called upon. When it finished, the sun came out and life was good again. What happened to disrupt the peace of this database kingdom?
Page: 32
Why Did ROTATE Set REORP?
Last Part Limitkey ('2007')
INSERT INTO tableX (D_YEAR) VALUES ('2008')
Index Based Partitioning (Table_IBP): SQLCODE = 0
Table Based Partitioning (Table_TBP): SQLCODE = -327
Maximum Limitkey in last partition is not enforced by “index based partitioning”
The 1st time only, ROTATE converts Index Based partitioning to Table Based partitioning. Because the limitkey is not enforced in Index Based, the 1st and last parts have to be put in REORP status to eliminate this potential issue.
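The difference between the two partitioning styles can be sketched as a key-to-partition lookup. This is an illustration under assumptions: the limitkeys other than '2007' are invented, the function name `find_partition` is made up, and a Python `ValueError` stands in for SQLCODE -327.

```python
import bisect


def find_partition(key, limitkeys, enforce_last):
    """Return the 1-based partition for key, given ascending limitkeys.

    enforce_last=False models index based partitioning: a key beyond the
    last limitkey still lands in the last partition.
    enforce_last=True models table based partitioning: the insert is rejected.
    """
    idx = bisect.bisect_left(limitkeys, key)  # limitkey is an inclusive upper bound
    if idx < len(limitkeys):
        return idx + 1
    if enforce_last:
        raise ValueError("SQLCODE -327: key is beyond the last partition limitkey")
    return len(limitkeys)


limitkeys = ["2005", "2006", "2007"]  # only '2007' is from the slide

print(find_partition("2008", limitkeys, enforce_last=False))  # last partition
try:
    find_partition("2008", limitkeys, enforce_last=True)
except ValueError as err:
    print(err)
```

Rows that slipped past the last limitkey under index based partitioning are the reason the first conversion has to put partitions in REORP: they must be found and dealt with before the limitkey can be enforced.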
Page: 33
Converting to Table-Controlled Partitioning
ALTER INDEX clustering_index NOT CLUSTER;
(conversion to table-controlled partitioning)
COMMIT WORK;
ALTER INDEX clustering_index CLUSTER;
(clustering index reestablished)
It’s Simple to Do This Before Rotate
Page: 34
ALTER TABLE table ALTER PARTITION # ENDING AT (limitkeys);
(This & next partition put in REORP, data outage!)
REORG TABLESPACE tablespace SCOPE PENDING SHRLEVEL NONE STATISTICS COPYDDN
(Data keys beyond limitkey, discarded during REORG!)
Lowering Limitkeys in Last Part
Data Outage
Page: 35
Which Partition Number is Used?
Cmd/DDL/Utility Physical Part Logical Part
-DISPLAY DB X Order of Display
ALTER ROTATE X
Other ALTERs X
Image Copy X
Unload X
Load X
Reorg X X - Rebalance
Recover/Rebuild X
DB2 Datasets X
-DIS DB(db) SPACENAM(ts)
NAME     TYPE PART  STATUS
SRG9700  TS   0002  RW
 -THRU        0004
SRG9700  TS   0001  RW
Page: 36
Rotating Logical Partitions
[Slide labels whose placement is not recoverable: "Data Outage", "Advisable" (twice), step numbers 1 to 6]
SELECT PARTITION, LOGICAL_PART
  FROM SYSIBM.SYSTABLEPART
  WHERE DBNAME = 'db' AND TSNAME = 'ts' AND LOGICAL_PART = 1
(1st LP Logical Part = ?PP Physical Part)
UNLOAD TABLESPACE tablespace PART ?PP
LOAD TABLESPACE tablespace PART ?PP REPLACE using dummy SYSREC
-START DB(db) SPACE (ts) PART(?PP) ACCESS FORCE
ALTER TABLE table ROTATE PARTITION FIRST TO LAST ENDING AT (limitkeys) RESET;
(1st LP data deleted, becomes last LP)
COPY TABLESPACE tablespace DSNUM ?PP SHRLEVEL CHANGE
(rotated partition now recoverable)
Page: 37
Mitigating Rotate Issues
Avoiding Data Outages
• New Tables – Use “table based partitioning”
• Last Partition – Don’t set max/min limitkey (may cause -327 SQLCODEs)
Converting from “Indexed Based Partitioning”
• Don’t convert to table based partitioning with ROTATE
• Use … ALTER INDEX index NOT CLUSTER
• Then … ALTER INDEX index CLUSTER
Plan for Outage on 1st ROTATE
• Query for values beyond limitkey before reorg
• ALTER ASC/DESC limitkey from max/min value
• Downtime REORG to remove REORP status
Know Logical Partitions Prior to Rotate
• Query SYSIBM.SYSTABLEPART
Page: 38
Rotate Dangers
1) Knowing which physical part is 1st logical part
2) Long running DELETEs to empty 1st logical part (42 secs to ROTATE / delete 1,000,000 row partition)
3) ROTATE can cause an outage (REORP status) (convert to table based partitioning or ALTER limitkeys)
4) Which part # to use for Commands / DDL / Utilities
5) Mistakenly rotating the wrong table (DDL reuse or finger fault)
6) Adding partitions to ROTATEd tables (Confusion factor on first & last parts)
7) With ascending keys, trying to insert null keys
8) Recoverability after a ROTATE
9) Attempting to REBALANCE ROTATEd tables
Page: 39
The Rotate Questions
• Which partition will rotate next?
  • 1st Logical partition - query SYSIBM.SYSTABLEPART
• Does rotate interrupt availability?
  • Yes – if indexed based partitioning (convert to table based, last 2 partitions in REORP)
  • Yes – if last part limitkey is altered
  • No – table based partitioning & limitkey doesn’t need to be altered
• Can rotate be blocked?
  • Set MAXVALUE in limitkey of last partition
• Should we use rotate to convert to Table Based Partitioning?
  • Not advisable (last 2 partitions in REORP)
• Does rotate behave differently with Index Based Partitioning?
  • Yes - the first time
• Does rotate rename datasets?
  • No – does SQL deletes instead
• Does rotate delete SYSCOPY entries?
  • No – puts a rotate row in SYSCOPY
• If I accidentally rotate, can I recover?
  • No – rotate row in SYSCOPY prevents recovery prior to ROTATE
Page: 40
Extents Wide Open
Chapter 4
Page: 41
The Extent Concept
[Diagram: dataset catg.DSNDB.db.ts.I0001.A001 spread across VOL001, VOL002 and VOL003; each "Request 1 Track" adds an extent (1st Extent, 2nd Extent, 3rd Extent); Extent Consolidation.]
DB2 requests space from z/OS, which finds blocks of space and updates the VTOC and Catalog.
Page: 42
Once Upon A Time
Life in the database kingdom was good. Autonomic features had eliminated many servant duties. One day a troubled user called. A table was broken and needed to be recovered. Luckily it had only five million rows. We proclaimed that it would be back in merely moments. Tapes were mounted, disks were spinning and the clock was ticking. Five minutes turned to ten and then to fifteen. After many discussions with management, the recovery finally finished. Why did a small table take so long to recover?
Page: 43
Extent Evolution
Cylinder
Track
Block
21st Century DASD
20th Century DASD
z/OS Rules
DASD Reality
z/OS Rules
DASD Subsystem
Page: 44
Extent Limits
• Logical Extent Limits
  • Max Extents / Dataset
    • 255 extents z/OS 1.6
    • 7,257 extents z/OS 1.7
  • Max Extents / Volume
    • 123 extents / volume
    • Large DASD increases this issue (Mod 27s & 54s)
• Extent Rule
  • 5 pieces primary extent
  • Whatever can get on secondary
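The two limits on this slide combine as a simple minimum, sketched below. The 59-volume figure is not on the slide; it is the z/OS limit on volumes per dataset, brought in here as an assumption to show where 7,257 comes from (123 × 59). The function name `extent_ceiling` is made up.

```python
EXTENTS_PER_VOLUME = 123  # from the slide
MAX_VOLUMES = 59          # assumption: z/OS volumes-per-dataset limit, not on the slide


def extent_ceiling(volumes, dataset_limit):
    """Most extents a dataset can reach on `volumes` volumes under a dataset limit."""
    if not 1 <= volumes <= MAX_VOLUMES:
        raise ValueError("volumes must be between 1 and 59")
    return min(volumes * EXTENTS_PER_VOLUME, dataset_limit)


print(extent_ceiling(1, 7257))            # single volume: the per-volume limit bites
print(extent_ceiling(3, 255))             # z/OS 1.6: the dataset limit bites
print(extent_ceiling(MAX_VOLUMES, 7257))  # z/OS 1.7: both limits coincide
```

On a single large volume the per-volume limit is reached long before the dataset limit, which is one way to read the slide's remark that large DASD increases the issue.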
Page: 45
Solving Systemic Extent Issues
• IBM Strategies
  • Tolerate More Extents
    • z/OS 1.7 – 7,257 extents
  • Make It Harder to Hit Limits
    • SMS Extent Consolidation
  • Automate Extent Management
    • V8 Sliding Secondaries
Page: 46
How Extents Affect Utilities (V8)
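[Bar chart: Elapsed Seconds (axis 0 to 350) for utilities run against a table of 1.2 M rows and 1 index, comparing 1 Extent with 85 Extents. Data labels as extracted: 16, 25, 63, 256, 118, 276, 305, 115, 60, 45; the utility each value belongs to is not recoverable.]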
Page: 47
Avoiding Extent Issues
DASD Fragmentation
• STOGROUPs by size
• Standardized allocations
z/OS Slow Allocations
• Fewer Extents
Automated Method
• Use sliding secondaries
• Can cause fragmentation
Managed Method
• STOGROUPs by size
• Standardized allocations
From z/OS We Need
• Faster Allocation Search
• Faster Cataloging
Page: 48
In Closing
• Beware
• Be Knowledgeable
• Be Careful
• Do Excellent Work
DB2 War Stories
And Scary Tales
By
DATABASE BOB
Think Tank Publications