db2 z/os recovery overview

02/06/22 DB2 Recovery 101 Overview Bill Arledge DB2 Data Management Strategist Mainframe DB Recovery


Post on 27-Jan-2015


Page 1: DB2 z/OS Recovery Overview

04/10/23

DB2 Recovery 101 Overview

Bill Arledge, DB2 Data Management Strategist, Mainframe DB Recovery

Page 2: DB2 z/OS Recovery Overview

04/10/23 ©2007 BMC Software2

Availability is critical …

› Lost revenue
– Up to $6 million/hour for e-businesses
– 2 - 3% of annual revenue for every 10 hours of database outage

› Lost customers
– Nearly 14 times your initial investment to win customers back

› Lost market share
– Approximately 1/2 point of market share for every 8 hours of outage (it will take an estimated 3 years to win customers back)

Page 3: DB2 z/OS Recovery Overview

Recovery Elements

[Diagram boxes: FAILURE, ANALYSIS, CREATE RECOVERY JCL, EXECUTE RECOVERY. Diagram labels: RECOVERY MANAGEMENT, FAST UTILITIES, APPLICATION OUTAGE.]

Page 4: DB2 z/OS Recovery Overview

When Availability is Critical, Recovery is Crucial!

Unplanned downtime is an unfortunate fact of life.

› Up to 80% of all unplanned downtime is caused by software or human error*
› Up to 70% of recovery is “think time”!

*Source: Gartner, “Aftermath: Disaster Recovery”, Vic Wheatman, September 21, 2001

[Chart: share of recovery time by phase]
– Diagnose 20%
– Investigate 20%
– Analyze 30%
– Recover 30%
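
The 70% "think time" figure lines up with the chart if every phase except Recover is counted as think time. That grouping is an assumption, not something the slide states, and the dictionary name is illustrative. A minimal sketch of the arithmetic:

```python
# Shares of total recovery elapsed time, taken from the slide's chart.
phase_share = {"Diagnose": 20, "Investigate": 20, "Analyze": 30, "Recover": 30}

# Assumption: "think time" is every phase except the Recover step itself.
think_time = sum(pct for phase, pct in phase_share.items() if phase != "Recover")

print(f"think time = {think_time}% of recovery elapsed time")
```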

Page 5: DB2 z/OS Recovery Overview

DB2 Recovery Resource Review

– Large lines = User data flow
• Updates to Tablespace are Logged, Copied
– Small lines = DB2 info flow
• Copies and Logs are registered
• Log range for Tablespace recovery tracked
– MVS ICF Catalog
• Watches over all

[Diagram labels: ICF Catalog; SYSLGRNX; TABLESPACE; BSDS; SYSCOPY; Active Log or Archive Logs; Full Copy or Incremental Copies]

Page 6: DB2 z/OS Recovery Overview

ICF Catalog

› All DB2 pagesets must be cataloged
– Cataloging updates three MVS system files
• VTOC - Volume Table of Contents
• VVDS - VSAM Volume Data Set
• ICF - Integrated Catalog Facility

CREATE TABLESPACE TS000001 IN DP000001
  USING VCAT DB2PVCAT ...

ICF Catalog entries:
DB2PVCAT.DSNDBC.DP000001.TS000001.I0001.A001
DB2PVCAT.DSNDBD.DP000001.TS000001.I0001.A001

[Diagram labels: ICF Catalog; VTOC, VVDS]
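
The two cataloged names follow a fixed pattern: the VCAT qualifier, DSNDBC for the VSAM cluster or DSNDBD for its data component, then the database, the space, an instance qualifier, and a data set number. The sketch below rebuilds the names from the slide. The function name and its defaults are illustrative only and are not part of DB2.

```python
def pageset_dataset_names(vcat: str, database: str, space: str,
                          instance: str = "I0001", dsnum: int = 1) -> dict:
    """Build the cluster and data component names for one data set of a page set."""
    suffix = f"{database}.{space}.{instance}.A{dsnum:03d}"
    return {
        "cluster": f"{vcat}.DSNDBC.{suffix}",  # VSAM cluster name
        "data": f"{vcat}.DSNDBD.{suffix}",     # VSAM data component name
    }

# Values from the CREATE TABLESPACE example on the slide.
names = pageset_dataset_names("DB2PVCAT", "DP000001", "TS000001")
print(names["cluster"])
print(names["data"])
```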

Page 7: DB2 z/OS Recovery Overview

Boot Strap Data Set (BSDS)

– Boot Strap Data Set (BSDS)
• Active and Archive Log Inventories
• Active Log is reused, Archive Log is serial up to 10000 (V8 and later)
– Current Active Log is DS01
– Next Archive Log will be A004

BSDS active log inventory:

ACTIVE LOG   START RBA   END RBA   STATUS
DS01         40000       4FFFF     Not Reusable
DS02         20000       2FFFF     Reusable
DS03         30000       3FFFF     Reusable

BSDS archive log inventory:

ARCHIVE LOG  START RBA   END RBA   VOLSER
A001         10000       1FFFF     VOL=T00001
A002         20000       2FFFF     VOL=T00002
A003         30000       3FFFF     VOL=T00003

[Diagram labels: Active Logs DS01, DS02, DS03; BSDS; Archive Logs A001, A002, A003]
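
To make the inventory idea concrete, the sketch below looks up which log data sets cover a given RBA, using the example values from this slide. It is a toy model of the lookup the BSDS makes possible, not DB2 code, and the function name is made up for illustration.

```python
# Inventories copied from the slide's example BSDS (RBAs are hexadecimal).
ACTIVE = [("DS01", 0x40000, 0x4FFFF), ("DS02", 0x20000, 0x2FFFF), ("DS03", 0x30000, 0x3FFFF)]
ARCHIVE = [("A001", 0x10000, 0x1FFFF, "T00001"),
           ("A002", 0x20000, 0x2FFFF, "T00002"),
           ("A003", 0x30000, 0x3FFFF, "T00003")]

def locate_rba(rba: int):
    """Return the active and archive log data sets whose RBA range covers rba."""
    active = [name for name, start, end in ACTIVE if start <= rba <= end]
    archive = [name for name, start, end, _volser in ARCHIVE if start <= rba <= end]
    return active, archive

print(locate_rba(0x25000))  # covered by an active log and by its archived copy
print(locate_rba(0x45000))  # only in the current active log, not yet archived
print(locate_rba(0x15000))  # only on an archive log
```

The second lookup shows why DS01 is marked Not Reusable in the example: its range has no archive copy yet.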

Page 8: DB2 z/OS Recovery Overview

BSDS and Relationship to the LOG

[Diagram: a Checkpoint on the LOG, shown with
– Incomplete UR Summary Record
– Open Pageset Summary Record
– Database and Pageset Exceptions Summary Record
and the DB2 BSDS, shown with
– Active & Archive Log Dataset Information
– DDF Communication Record
– Checkpoint Queue]

Page 9: DB2 z/OS Recovery Overview

Print Log Map (DSNJU004)

› DSNJU004 (Print Log Map) provides
– Active log data set information
– Archive log data set information
– System checkpoints
• Driven by
  – LOGLOAD ZPARM
  – Active log switch

[Screenshot callouts: Local Time, GMT]

Page 10: DB2 z/OS Recovery Overview

DB2 Logs – A Shared Resource

› Primary usage to provide for restart and recovery of DB2 subsystems and objects
– Other uses including audit and data migration
› Log records
– Record updates for all spaces with before and/or after images
• Data and index pages
– Record DDL operations (updates to catalog pages)
– Capture checkpoint and restart information including
• Transaction info
• Object exception information
– Are identified by a log point
• RBAs (Relative Byte Addresses) for non-data sharing
• LRSNs (Log Record Sequence Numbers) for data sharing
– Include the page (and row if applicable) being impacted

Page 11: DB2 z/OS Recovery Overview

DB2 Log Data Flow

– Log Buffers
• RBA (Relative Byte Address)
– Written to Active Log
– Full Active written to Archive

[Diagram: INSERT ONE ROW passes through the DB2 RECOVERY LOG MANAGER into the LOG BUFFERS (RBA 1000 to RBA 2000), then to the Active Log and, at a LOG SWITCH, to the Archive Log. Log records shown: BEGIN_UR RECORD; UNDO / REDO DATA; UNDO / REDO INDEX; UNDO / REDO INDEX; COMMIT RECORDS; END_UR RECORD; OPEN PAGESETS; Checkpoint RECORD.]

Page 12: DB2 z/OS Recovery Overview

DB2 Log Data Breakdown

[Chart: share of log data by record type]
– Data 20%
– Index 50%
– Checkpoint 10%
– Commit 5%
– Other 15%

[Chart labels: UNDO and REDO, each shown twice]

Page 13: DB2 z/OS Recovery Overview

DB2 Log Archival Process

› Active log offload is triggered by several events, including:
– An active log data set is full
– Starting DB2 and an active log data set is full
– ARCHIVE LOG command
– Two uncommon events also trigger the offload:
• An error occurring while writing to an active log data set
• Filling of the last un-archived active log data set

[Diagram: ARCHIVING THE ACTIVE LOG. Steps shown: Write to Active Log; Triggering Event; Offload Process; Write the Archive; Update the BSDS]
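
The offload steps can be pictured with a toy model: a ring of active log data sets where a full data set is written to the next archive log, recorded in the BSDS, and then made reusable. The class and method names are hypothetical. The sketch also assumes offload completes immediately, whereas a real subsystem switches to the next active data set while offload runs in the background.

```python
class LogManagerSketch:
    """Toy model of active log reuse and offload; not DB2 internals."""

    def __init__(self, active_names):
        self.order = list(active_names)
        self.reusable = {name: True for name in self.order}  # already archived?
        self.current = 0      # index of the data set currently being written
        self.archives = []    # archive log names, in offload order
        self.bsds = []        # (archive name, source active data set) inventory

    def fill_current(self) -> str:
        """Triggering event: the current active data set fills up."""
        full = self.order[self.current]
        self.reusable[full] = False                      # full, not yet archived
        archive_name = f"A{len(self.archives) + 1:03d}"
        self.archives.append(archive_name)               # write the archive
        self.bsds.append((archive_name, full))           # update the BSDS
        self.reusable[full] = True                       # now safe to overwrite
        self.current = (self.current + 1) % len(self.order)  # log switch
        return archive_name

mgr = LogManagerSketch(["DS01", "DS02", "DS03"])
for _ in range(4):
    print(mgr.fill_current(), "written; now logging to", mgr.order[mgr.current])
```

The archive names increase serially while the three active names are reused in a cycle, matching the "active log is reused, archive log is serial" point made for the BSDS.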

Page 14: DB2 z/OS Recovery Overview

DSN1LOGP - Looking at Log Records

› DSN1LOGP is a standalone utility available with DB2
› IBM does not set out to document everything you see in a detail report
› You can still get lots of information but not easily
› Most recovery experts would find a more sophisticated log tool handy to
– Re-create SQL from the log
– Report and filter on transaction and column data more effectively
– Handle compression and other issues

Page 15: DB2 z/OS Recovery Overview


DSN1LOGP – Usage Examples

› Who updated that table that's not supposed to be updated?

› Who DROPPED that DATABASE?

› Did BADUSER update anything?

› Are GRANTs and REVOKEs being done outside of our control?

› Who FREEd that PLAN or PACKAGE?

› Are SAVEPOINTs being executed on this subsystem?

› Can I find a common quiet point for two tablespaces for Recovery?

› Sample summary log record for a Unit of Recovery (UOR)

DSN1151I DSN1LPRT UR CONNID=TSO CORRID=BADUSER AUTHID=BADUSER PLAN=DSNESPCS
  START DATE=05.059 TIME=14:29:20 DISP=COMMITTED INFO=COMPLETE
  STARTRBA=024AC666C4C7 ENDRBA=024AC666C76B
  STARTLRSN=BCA426C98AED ENDLRSN=BCA426C98C3E
  NID=* LUWID=USBMCN01.DEBALU.BCA4267B984C.0001
  COORDINATOR=* PARTICIPANTS=*
  DATA MODIFIED:
    DATABASE=0600=KMMSEGDB PAGE SET=0002=KMMSEGS

Page 16: DB2 z/OS Recovery Overview

Identifying Relevant Log

› The SYSLGRNX directory table records log ranges containing updates to a space (or partition)
– There are entries for each data sharing member updating, and these entries give
• the location range on the logs (relative byte address, RBA) and
• the relative time range (log record sequence number, LRSN) to coordinate with copies and other logs
– SYSLGRNX output is provided by the REPORT RECOVERY utility
• Identifies assets required for recovery

Page 17: DB2 z/OS Recovery Overview

DB2 Directory Table

– SYSLGRNX
• Open update log ranges on the DB2 Log
• Provides for faster recovery

SYSLGRNX entries:

DBID   PSID   Start Range   End Range
0105   000F   A             B
0105   000F   C             D
0105   000F   E             F

[Diagram: the LOG timeline with open log ranges A to B, C to D, and E to F, marked with COPY, QUIESCE (Quiet Point), and Current]
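
One way to see why these ranges speed up recovery: log apply only has to read the recorded update ranges that fall at or after the image copy point, and can skip everything else. The sketch below shows that selection. The RBA values standing in for A to F and the function name are invented for illustration.

```python
def log_ranges_to_apply(open_ranges, copy_rba):
    """Trim the recorded update ranges to the parts at or after the copy point."""
    needed = []
    for start, end in open_ranges:
        if end < copy_rba:
            continue  # range closed before the copy was taken: nothing to apply
        needed.append((max(start, copy_rba), end))
    return needed

# Hypothetical RBAs standing in for the A-B, C-D, and E-F ranges on the slide.
ranges = [(0x1000, 0x1800), (0x3000, 0x3400), (0x5000, 0x5A00)]

for start, end in log_ranges_to_apply(ranges, copy_rba=0x2000):
    print(f"apply log from {start:#x} to {end:#x}")
```

With a copy taken between B and C, only the C to D and E to F ranges are read. The log between the ranges is never scanned for this space.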

Page 18: DB2 z/OS Recovery Overview

Spaces Absent in SYSLGRNX

› Some catalog and directory spaces don’t have SYSLGRNX entries
– DSNDB01.SYSUTILX
– DSNDB01.DBD01
– DSNDB01.SYSLGRNX
– DSNDB06.SYSCOPY
– DSNDB06.SYSGROUP
– DSNDB01.SCT02
– DSNDB01.SPT01

Page 19: DB2 z/OS Recovery Overview

Log Compatibility

› Since log records reference pages and rows in spaces, and spaces are identified by internal IDs, certain activities
– make one series of log records incompatible with others
– require a new copy or starting point
› LOAD REPLACE completely resets the data, and REORG and REBUILD change row and key entry locations
› Certain DROPs can be disastrous

Page 20: DB2 z/OS Recovery Overview

Log Roadblocks

[Diagram: a log timeline with two COPY points and a REORG. Log records shown: insert row, page 1F2, row 7; update row, page E2, row 1. REORG is executed; row is now on page E2, row 1. Two “log apply” arrows are shown.]

Page 21: DB2 z/OS Recovery Overview

Problem Categories

› An expert categorizes failures, then plans and performs accordingly. There are three common possibilities.
– A media failure destroys or compromises data (a disk, controller, or cache failure occurs)
– Data becomes logically compromised by an incorrect job or transaction
– The data center is unusable (aka disaster)

Page 22: DB2 z/OS Recovery Overview

Media Failure

› A media failure destroys data or compromises it
– Identify volume contents and RECOVER or REBUILD for traditional DASD
– Identify objects affected by storage component and RECOVER or REBUILD, bearing in mind that some may not be affected because they weren’t recently updated

Page 23: DB2 z/OS Recovery Overview

Pop Quiz - Index Recovery

Maybe Not

Recovery to current (for a media failure) would not require it. Some objects are being recovered to overcome media failure. Related objects should still be consistent if they were unaffected by the media failure.

Page 24: DB2 z/OS Recovery Overview

Logical Data Corruption

› Data has become logically compromised by an incorrect job or transaction
– An expert finds the cause of the problem.
– An expert knows the possible tools to use
• whether it is a set of RECOVER and REBUILD statements or
• a special program or
• a special log tool
• or some combination of the above

Page 25: DB2 z/OS Recovery Overview

Finding Corruption Point

Look for a place where everyone agrees data wasn’t corrupted. Get as close to now as possible!

[Diagram: POINT A (Application Data Is fine), “What happened in between”, POINT B (Application Data Is Corrupted)]

Page 26: DB2 z/OS Recovery Overview

Consistency Point

If RECOVER must be used, a point of consistency across affected table spaces must be located, and any good updates after that point will be lost.

[Diagram labels: Batch Job; Online Trans; Online Trans; Application Data Is fine; Batch Job Rerun; Application Data Now Corrupted]

Page 27: DB2 z/OS Recovery Overview

Looking for QUIESCE Points

• Official points are recorded in SYSCOPY generally as a result of execution of the QUIESCE utility
• Useful queries for evaluating available quiesce (quiet) points on the DB2 log

Identifies all quiesce points after the specified ICDATE:

SELECT DBNAME, TSNAME, ICDATE, ICTIME, HEX(START_RBA)
  FROM SYSIBM.SYSCOPY
 WHERE DBNAME IN ('LSBX', 'LSBQ')
   AND ICTYPE = 'Q'
   AND ICDATE > '020124'
 ORDER BY ICDATE, ICTIME;

Identifies the latest quiesce point for a set of objects:

SELECT HEX(MAX(START_RBA))
  FROM SYSIBM.SYSCOPY
 WHERE DBNAME IN ('LSBX', 'LSBQ')
   AND ICTYPE = 'Q';

Identifies related objects with no entry at the latest quiesce point:

SELECT DBNAME, NAME
  FROM SYSIBM.SYSTABLESPACE
 WHERE DBNAME IN ('LSBX', 'LSBQ')
   AND (DBNAME, NAME) NOT IN (
         SELECT DBNAME, TSNAME
           FROM SYSIBM.SYSCOPY
          WHERE START_RBA = (
                SELECT MAX(START_RBA)
                  FROM SYSIBM.SYSCOPY
                 WHERE DBNAME IN ('LSBX', 'LSBQ')
                   -- ICTYPE = 'Q' added so the maximum is the latest quiesce
                   -- point rather than the latest SYSCOPY row of any type
                   AND ICTYPE = 'Q' ) );
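
The same search can be sketched outside SQL: given SYSCOPY-like rows, find the highest START_RBA at which every space in the set has a quiesce record. The rows below are invented sample data. The sketch assumes that spaces quiesced together are recorded with the same START_RBA.

```python
from collections import defaultdict

# Hypothetical SYSCOPY-like rows: (DBNAME, TSNAME, ICTYPE, START_RBA).
rows = [
    ("LSBX", "TS1", "Q", 0x1000),
    ("LSBQ", "TS2", "Q", 0x1000),
    ("LSBX", "TS1", "F", 0x1800),   # full copy, ignored by the search
    ("LSBX", "TS1", "Q", 0x2000),
    ("LSBQ", "TS2", "Q", 0x2000),
    ("LSBX", "TS1", "Q", 0x3000),   # TS2 was not part of this quiesce
]

def latest_common_quiesce(rows, spaces):
    """Return the highest START_RBA with an ICTYPE 'Q' row for every space, or None."""
    quiesced = defaultdict(set)
    for dbname, tsname, ictype, start_rba in rows:
        if ictype == "Q":
            quiesced[start_rba].add((dbname, tsname))
    common = [rba for rba, members in quiesced.items() if set(spaces) <= members]
    return max(common) if common else None

spaces = [("LSBX", "TS1"), ("LSBQ", "TS2")]
print(hex(latest_common_quiesce(rows, spaces)))
```

The quiesce at 0x3000 is skipped because only one of the two spaces took part in it, which is the situation the third query on the slide is meant to expose.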

Page 28: DB2 z/OS Recovery Overview

Recovering from Logical Errors Using SQL Processes

If the result of the batch job was undone with SQL, then the online transactions might be preserved, and it would not be necessary to find a point of consistency for RECOVER.

[Diagram labels: a run of Online Trans; Batch Job Rerun; Application Data Now Corrupted; Batch Job Reversed; SQL INSERT; SQL DELETE]

Page 29: DB2 z/OS Recovery Overview

Caveats for a SQL approach

› Using SQL to correct logical errors has some possible pitfalls
• The transactions being preserved may have depended on the incorrect data

[Diagram labels: “transaction changes addresses”; “transaction changes salesmen based on addresses”; Good!; BAD!]

Page 30: DB2 z/OS Recovery Overview

Caveats for a SQL approach

› Using SQL to correct logical errors has some possible pitfalls
• The transactions that ran during or after the corruption may have also updated the same column in some of the rows corrupted

[Diagram labels: “many employees 401K deductions set to zero”; “Employee requests to set new percentages for 401K processed”; Good!; BAD!]

Page 31: DB2 z/OS Recovery Overview

Caveats for a SQL approach

› Allowing access to application spaces while they are corrupted may cause problems, as in these examples
• If a group of customers were accidentally deleted from your database, then allowing salesmen to continue placing orders might cause them to recreate customer rows because they don’t see the rows. When an insert is attempted for the customer to correct the delete, it will likely receive SQLCODE -803 or cause an improper duplicate customer record
• Corrupted customer shipping addresses might cause a label to be printed (read only) and packages to be misdirected

Page 32: DB2 z/OS Recovery Overview

Image Copy

› An image copy is a sequential dataset
– Contains page images from the tablespace or indexspace
– Represents at least one data set of a space and at most a complete space (all data sets or partitions)
› Image Copies
– Can be made while changes are taking place (SHRLEVEL CHANGE) or
– Can be made allowing only reads so they are consistent (SHRLEVEL REFERENCE)
– Registered in SYSCOPY and accessible via SQL SELECT
– REPORT RECOVERY identifies copy required for recovery
• No guarantee that a copy listed in SYSCOPY still exists or is still cataloged.
– Can be used to UNLOAD data
– Deleted by
• DROP DDL against the space
• Potentially by the MODIFY utility

Page 33: DB2 z/OS Recovery Overview

Image Copy Types

› Multiple, identical image copies (four) may be made. They are identified as:
• Primary or Backup; and
• Local or Recovery site.
› Image copies may be made with only changed pages. These are incremental image copies.

[Diagram labels: space; copy; x nK pages]

Page 34: DB2 z/OS Recovery Overview

SHRLEVEL CHANGE Copies

[Diagram: timeline of a SHRLEVEL CHANGE copy with numbered steps 1 to 5. Events shown: copy begins; page 2 update; page FFF2 update; page 2 copied; page FFF2 copied; copy ends. Contents of copy: pg0, pg1, pg2 ... pgFFF0, pgFFF1, pgFFF2.]

Page 35: DB2 z/OS Recovery Overview

External Copies Unknown to DB2

› Data set dumps made by DFDSS or DSN1COPY or other mechanisms
– Aren’t registered but may be used by
• Restoring known copies that are consistent because the space was stopped or DB2 was cleanly stopped
• Restoring a complete set of system data
  – ‘flash copied’ or ‘snapped’ between the SET LOG SUSPEND and SET LOG RESUME commands or
  – made while DB2 is down after it was taken down cleanly
  – and then restarting DB2.

Page 36: DB2 z/OS Recovery Overview


SYSCOPY MINING

SYSCOPY contains a wealth of data!

WHO?

GROUP_MEMBER

DBNAME

TSNAME

DSNUM

DSNAME

JOBNAME

AUTHID

WHEN?

TIMESTAMP

START_RBA

ICDATE

ICTIME

WHAT?

ICTYPE

STYPE

SHRLEVEL

ICBACKUP

PIT_RBA

OTYPE

Page 37: DB2 z/OS Recovery Overview

SYSCOPY Example

– DB2 Catalog Table SYSIBM.SYSCOPY
• Backup and recovery point information

IC Type   Description
F         Full
I         Incremental
Q         Quiesce
X         REORG LOG(YES)

SHRLEVEL  Description
R         Reference
C         Change
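
When reading SYSCOPY rows in a report or script, the two code tables above can be turned into a small lookup. The sketch covers only the codes listed on this slide. Any other ICTYPE value is passed through unchanged, and the function name is illustrative.

```python
# Code tables from the slide; SYSCOPY uses further ICTYPE values not listed here.
ICTYPE = {"F": "Full", "I": "Incremental", "Q": "Quiesce", "X": "REORG LOG(YES)"}
SHRLEVEL = {"R": "Reference", "C": "Change"}

def describe(ictype: str, shrlevel: str = "") -> str:
    """Return a readable description of one SYSCOPY row's ICTYPE and SHRLEVEL."""
    kind = ICTYPE.get(ictype, f"other ICTYPE {ictype!r}")
    level = SHRLEVEL.get(shrlevel)
    return f"{kind} (SHRLEVEL {level})" if level else kind

print(describe("F", "C"))
print(describe("Q"))
```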

Page 38: DB2 z/OS Recovery Overview

Spaces not recorded in SYSCOPY

› Three catalog and directory spaces do not have entries in the SYSCOPY table
– DSNDB01.DBD01
– DSNDB01.SYSUTILX
– DSNDB06.SYSCOPY

› Information on copies for these spaces resides in the DB2 log

Page 39: DB2 z/OS Recovery Overview

Pop Quiz - Avoid Copies?

PROBABLY NOT

Not if REORG or LOAD are used with LOG NO.

Page 40: DB2 z/OS Recovery Overview

DB2 Recovery Processing

[Diagram: recovery of a TABLE SPACE in two phases, RESTORE (from the Image Copy) and LOGAPPLY (from the ACTIVE LOG and ARCHIVE LOG), drawing on SYSCOPY, SYSLGRNX, and the BSDS, and producing MESSAGES]

Page 41: DB2 z/OS Recovery Overview


RECOVER Flavors

› RECOVER can use all the log records to the end of the subsystem log(s) or

› RECOVER can be instructed to stop at a particular log point

› RECOVER usually starts by restoring image copies except in the rare cases where everything is on the log and

› RECOVER has a LOGONLY feature that assumes the space is restored outside its control
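
A simplified way to picture these flavors is as a planning step: pick the most recent full copy at or before the target log point, add any later incrementals, and apply the log from the last restored copy up to the target (or to the end of the log). The sketch below is a teaching model with made-up RBAs and a hypothetical function name. The real RECOVER utility weighs more factors than this, such as copy availability and fallback to earlier copies.

```python
def plan_recovery(copies, target_rba=None):
    """Pick the copies to restore and the log range to apply.

    copies: list of (ictype, start_rba), ictype 'F' (full) or 'I' (incremental).
    target_rba: log point to stop at, or None to recover to the end of the log.
    """
    usable = [c for c in copies if target_rba is None or c[1] <= target_rba]
    fulls = [c for c in usable if c[0] == "F"]
    if not fulls:
        return None  # no copy to restore; only the log (if complete) could help
    base = max(fulls, key=lambda c: c[1])
    incrementals = sorted((c for c in usable if c[0] == "I" and c[1] > base[1]),
                          key=lambda c: c[1])
    restore = [base] + incrementals
    return {"restore": restore, "log_from": restore[-1][1], "log_to": target_rba}

# Hypothetical copy history for one table space.
copies = [("F", 0x1000), ("I", 0x2000), ("F", 0x3000), ("I", 0x4000)]
print(plan_recovery(copies))                     # recover to current
print(plan_recovery(copies, target_rba=0x2500))  # stop at a chosen log point
```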

Page 42: DB2 z/OS Recovery Overview

Disaster Recovery

› Options from weekly dumps to offsite logging
– Dumps - Simple, cheap, maximum data loss
• Weekly dumps means several days data loss
– Offsite Logs - Complex, expensive, no data loss
• Applying log data to shadow increases expense
– Compromise - Periodic vaulting of Copies & Logs
• Daily or hourly log shipment will minimize data loss
› Good topic for a future presentation

[Chart labels: Cost, Complexity; Data Loss, Outage Time]

Page 43: DB2 z/OS Recovery Overview

Expert Summary

› Know the basics and don’t be caught by the myths
› Know the assets you are trying to protect
› Know what you have to protect them with
› Plan for each type of failure and practice if you can