online reorg db2 1 ghs 10/6/2015 a method for online reorganization of a database gary sockut (joint...
TRANSCRIPT
Online REORG DB2 1GHS 4/19/23
A Method for Online Reorganization of a Database
Gary Sockut
(joint work with Thomas Beavin & Chung-Chia Chang)
Work performed at
IBM Silicon Valley Laboratory, San Jose, CA
Online REORG DB2 2GHS 4/19/23
Introduction: reorganization
Reorganization: Δ way arranged (logical / physical).
Any DBMS & database might need:
• Δ logical database definition:generalize 1-many -► many-many, split column.
• Δ physical database definition:construct index, split partition.
• Restore physical arrangement of instances without Δ definition:* Compact space.* Collect garbage.* Restore clustering (store near each other; criteria). ▼ I/O. Writing degrades; reorganization restores. This method.
Online REORG DB2 3GHS 4/19/23
highly available (24 x 7) (web commerce, armed forces)
very large
Introduction: offline vs. online reorganization
Reorganize online (concurrently withusage or incrementally within users’transactions). Importance ▲.
TIME
USAGE BATCH WINDOW (offline OK) USAGE
REORGANIZATION
Traditional: reorganize offline:
2 categories:unacceptable
Online REORG DB2 4GHS 4/19/23
Introduction: online reorganization
Method for restoring clustering online.IBM’s Database 2 (DB2) relational DBMS.
Rest of presentation:
• Storage structures
• Overview (unload, reload, process log); problem; solution
• Steps of reorganization & more details
• Comparison: previous research
Online REORG DB2 5GHS 4/19/23
ID map
Storage structures
• Not unique key. File pages, indexes, entries in loguse record identifier (RID) to identify records:– Page # & # within page. ID map.– Δ only during reorganization.
• Header of page: position in log (log record sequence number (LRSN)) current when most recently written.
LRSN data data * *
data record data recordfile page:
• Rows of tables in data records in file pages:
Online REORG DB2 6GHS 4/19/23
ID map
ID map
Storage structures
• Variable-length: grow: regular -► pointer & overflow:
LRSN data RID * *
regulardata record
pointerdata recordfile page:
LRSN data data * *
regulardata record
overflowdata recordfile page:
bad!
Online REORG DB2 7GHS 4/19/23
Storage structures: writing
Users write rows -► DBMS writes data records (& log entries).Usually, Insert / Update / Delete row -► 1 regular data record.Exceptions on Update:
• Has regular & new data too large for regular's page -►I overflow & U regular to pointer.
• Has overflow & new data fits on overflow's page -►U overflow to overflow.
• Has overflow & new data too large for overflow's page -►* Now room on pointer's page -► U pointer to regular & D overflow.* Still no room on pointer's page -► I overflow, D overflow, & U pointer to pointer.
Exception on Delete: has overflow -► D pointer & overflow.
Backout (undo if failure) reverses.
Online REORG DB2 8GHS 4/19/23
Storage structures: reorganization
• Database administrator invokes reorganization:* Restores clustering (clustering key: not necessarily unique).* Removes overflow.* Distributes free space. Improves performance.
• Area being reorganized: set of 1 or more tablesor set of 1 or more partitions of 1 table (parameter).
• Offline:
1. Read-only (R/O): unload data, sort.
2. No access: reload.
• Online: Read/write (R/W): most. --►
Online REORG DB2 9GHS 4/19/23
Overview of method for online reorganization
An inspiration:Fuzzy dumping: unload data (backup)while letting users write:A. Dumping: (1) record current LRSN for
log; (2) unload.
B. Recovery: (1) reload; (2) bring up to date:apply log entries (start: recorded LRSN).Ignore log entry whose LRSN ≤LRSN of indicated page.
data-base
back-up
log
data-base
back-up
1
2
-► Fuzzy reorganization.
Online REORG DB2 10GHS 4/19/23
Overview of method for online reorganization
A) Record current LRSN; unload, sort, reload (reorganize). (R/W):
users’data
old copyof area
new copyof area
log
READ
WRITE
UNLOAD,SORT, RELOAD
users’data
old copyof area
new copyof area
log
READ
WRITE
B) Process log (read; apply). (n iterations R/W; 1 R/O):
PROCESS
users’data
new copyof area
log
READ
WRITE
C) Switch users' accessing to new. (brief offline; R/W):
name 0name 1
name 0 name 1
name 0 name 1
Online REORG DB2 11GHS 4/19/23
Problem & solution
Problem: reorganization Δ RIDs:
• Log entries: old RIDs.
• Apply: identify data record in new copy (new RID).
Solution:
• Temporary table maps old & new RIDs, stores LRSNs.
• Translate log entries before applying.
Novelty: interaction: maintain table & process log.
Online REORG DB2 12GHS 4/19/23
Main steps of reorganization (more detail)
1. Record current LRSN for log. 2.
3.
4. 5. 6. 7.
8. 9.
--►Delete old.Start R/W new.
Switch future accesses: exchange names of files(data, indexes). Users: no access.
Quiesce all access.Process log. Append changed pages to backup. R/O.
Quiesce writing (block; wait until finished). Record LRSN.
Process log: iterate:* Read subset of log between recorded LRSNs (users' writing): translate; apply to new. R/W.* Record current LRSN. Iterate or next step (criteria).* Last iteration: append changed pages to backup.
Unload data, sort, reload into new (includes indexes),create backup copy. Users R/W old.Table maps: old & new RIDs. Record current LRSN.
Online REORG DB2 13GHS 4/19/23
oldRID
newRID
Unloading, sorting, & reloading (R/W)
2. Sort by clustering key.
old copyof area
new copyof area
1. UNLOAD
2. SORT
3. RELOAD
1. ADD ENTRY 3. ADD ENTRY
mappingtable
sort file
1. Unload: scan sequentially:* Regular or overflow: unload data, old RID, & LRSN of page.* Pointer: add entry to mapping table.
3. Reload: also add entry to mapping table.
Online REORG DB2 14GHS 4/19/23
Processing of log (n R/W & 1 R/O) 5 phases
5. APPLY (insertion: actual)
pointers. sort: LRSN copy of log (buffer) log1. COPY
2. SORT BY OLD RID (speed access to mapping table)
pointers. sort: old RID copy of log (buffer)
3. TRANSLATE RIDS (insertion: estimate, for sorting)
pointers. sort: old RID copy of log (buffer)mappingtable
4. SORT BY NEW RID (speed access to new copy of area)
pointers. sort: new RID copy of log (buffer)
pointers. sort: new RID copy of log (buffer)
mapping table
new copyof area
3, 5, control of iterations --►
Online REORG DB2 15GHS 4/19/23
Phase 3: translation of RIDs
DBMS's log application (recovery) ignores log entry if LRSN ≤page’s. Translation ignores log entry if LRSN ≤ mapping table’s.Insertion:
• R or O: estimate new RID; store in log entry (buffer);insert entry in mapping table (old, estimated new).
• P: delete log entry; insert entry in mapping table (old, no new).
Update:• R to R or O to O: store new RID in log entry.• P to P: delete log entry.• P to R: ~I: estimate new RID. Log entry: store estimated new RID; U -► I. Mapping table: store estimated new RID.• R to P: ~D: log entry: store new RID; U -► D.
Deletion:• R or O: store new RID in log entry; delete mapping table’s entry.• P: delete log entry; delete mapping table’s entry.
Online REORG DB2 16GHS 4/19/23
Phase 5: application
Scan set of pointers to log entries (sorted by new RID).For each RID value:1. Find all pointers (contiguous).2.
3. Apply sequentially:I:* Insert in new copy; obtain actual new RID.* In mapping table & in log entries for current RID, estimated new RID -► actual.U or D: treat like DBMS's handling of user's.
≥ 1 D log entry -► delete certain log entries:* 1st entry is I -► delete last D & all preceding.* 1st entry is D or U -► keep last D; delete all preceding.Omit log entries for which no entries in mapping table;omit I U U D.
Online REORG DB2 17GHS 4/19/23
HILO
Control of iterations of log processing
Iterate (allowing R/W) until:1. Estimated time for next ≤ parameter for maximum
R/O -► next is last (R/O). Parameter: trade-off. --►
2. Estimated completion for next > deadline(parameter) -► cancel reorganization.
3. Amount of log for next not sufficiently < current(not catching up) -► send message to operator:After delay (parameter), DBMS will continue,quiesce writing, or cancel (parameter).During delay, database administrator canΔ parameters, adjust priorities,quiesce writing, cancel, or let DBMS take action.
Display status.
iteration
time
Online REORG DB2 18GHS 4/19/23
Scheduling of online reorganization
1. High tolerance of delay -► R/O & offline tolerable.2. Low rate of writing -► easy to catch up to log.3. No long-running transactions -► quick quiescing.
Trade-off.
Online REORG DB2 19GHS 4/19/23
Comparison with previous research
Calculation of clustering: not novel;Sort by clustering key (index); assign to pages.
Compare mapping tables, fuzzy reorganization:
1) Mapping tables:• Omiecinski et al.: reorganization in place;
map RIDs to translate entries in leaves of indexes.• Wiener & Naughton: loading data into object database;
map surrogate object identifiers (source file) into object IDs.
2) Several authors mention fuzzy reorganization (no detail).Unique identifier does not Δ. No mapping table.Many customers dislike requirement.Fuzzy reorganization when RIDs Δ --►
Online REORG DB2 20GHS 4/19/23
Comparison with previous research
O'Toole et al.: fuzzy garbage collectionfor persistent data. Forwarding field: In database context (not O'Toole), mapping table’s advantages:1. Reorganization copies, user deletes, DBMS reuses space
(not O'Toole). Mapping table safe; forwarding field gone.2. Mapping table entry < data record. Fewer pages.
Less I/O to read/write forwarding information.3. Less locking for old (unload old, reload/map, process log):
Mapping table: (R, no, no). Forwarding field: (R, W, R/W?).4. Avoid extra space (permanent) for forwarding field
(O'Toole: space already existed).
Forwarding field’s advantages:1. One field (already existed: O'Toole), not two (temporary).2. Seems simpler.
old copy new copy
data RID data
Online REORG DB2 21GHS 4/19/23
Possible (but rejected) alternatives
1. Reorganize only index online:support, but not enough.
2.
3.
record’sposition
record’sposition
Δ▬►user’s
positionin scan
reorganization
Reorganize partition (index) (support):fine-grained; reorganize offline:* Correct other indexes.* Slow routing, increase space for partition descriptors.* Less uniform growth/shrinkage -► increase total free space: • coarse: | high growth shrinkage low growth | • fine: | high growth | shrinkage | low growth |
Reorganize online in place (notby copying): inaccuracy. --►Complexity and/or slowness,or low-concurrency locking.
Online REORG DB2 22GHS 4/19/23
Publications
Web site: Google GARY SOCKUT(http://alum.mit.edu/www/ghs); click “publications”: -► pdf.
1. This work: G. H. Sockut, T. A. Beavin, & C.-C. Chang, “A Method for On-line Reorganization of a Database,” IBM Systems Journal, Vol. 36, No. 3, 1997, pp. 411-436; erratum in Vol. 37, No. 1, 1998, p. 152. Web site: slides.
2. Survey on online reorganization (not just IBM, not just clustering, not just fuzzy): G. H. Sockut & B. R. Iyer, “Online Reorganization of Databases,” Computing Surveys, Vol. 41, No. 3, ACM, Article 14, July 2009, 136 pages. Web site: table of contents.
Online REORG DB2 23GHS 4/19/23
Reorganize online:
• Copy data from old to new in reorganized form; R/W.Table maps between old & new RIDs.
• Apply log to new; map.
• Switch users' accessing to new.
Summary
Online REORG DB2 24GHS 4/19/23
Appendices
Online REORG DB2 25GHS 4/19/23
Appendix: effect of fine-grained partitioning
p
op
ula
tio
n d
ens
ity
(pe
op
le /
sq
ua
re m
ile)
urban area with many high-rise apartment buildings (Manhattan Community District 8: ~ Upper East Side)
uninhabited area
NewJersey
Alaska
coarse (state)fine (square mile)granularity of partitioning
Population densities in the area covered by the 50 states:
finer
0
~1 (census)
~100K(demographia
.com)
~1K (census)