Global Technology Services
How to health-check your TSM environment
Holger Speh
Consulting IT Specialist
Motivation
Why is my backup so slow?
Why is my restore so slow?
How are my tape drives utilized?
How much data is transferred every day?
Where is my bottleneck?
It's the network!?
Why do I run out of scratch tapes all the time?
Healthcheck your environment!
What is a TSM Healthcheck?
- A short engagement to evaluate an existing TSM solution
- Typically focused on optimisation potential:
  - TSM server health
  - Overall performance
  - Aligning the current configuration with best practices
- Can also be focused on helping with strategic planning for future needs
- Must be clear and honest, even when the news is bad
What is a TSM Healthcheck not?
- It is not just an “overview dashboard lights” tool.
- It is not a deployment.
- It is not an excuse to sell or buy unneeded software.
- It is not a long-term engagement.
- It is not about fixing specific current problems.
- It is not a way for the customer to keep an IBM engineer on the hook indefinitely for the recommendations resulting from the assessment.
- It does not include disaster recovery analysis.
Methodology overview
1. Prepare an assessment workshop
2. Perform assessment workshop
3. Perform analysis and generate diagrams and tables
4. Create report (doc/presentation)
5. Present analysis results and recommendations
Prepare yourself to understand the situation
- Understand why the healthcheck was initiated
- Understand the customer's expectations
- Understand your own expectations
- Prepare a customer assessment workshop and send the customer an assessment worksheet
Perform a detailed client workshop
- Set the customer's expectations of the assessment
- Gather general information about processes, organisation, strategy, etc.
- Understand your client's requirements!
- Gather as much data as possible about the customer's environment:
  - TSM configuration
  - Network topology
  - SAN topology
  - Systems and software used
  - Disk layout
  - Tape infrastructure
- Set up data gathering/monitoring mechanisms:
  - Remember to monitor OS CPU, memory, disk and network
  - Monitor the same time period that you will analyse from within TSM
Know your tools and understand the data
- This TSM Healthcheck is a highly technical tool!
- It does not replace a general architectural review!
- It does not provide all the answers ready-made!
- You need to be able to interpret the data!
- Use the latest available reference literature!
6-May-09
Workflow and Tools used
- TSM administrative command line
- MySQL
- Perl
- Bash shell scripts
- Gnuplot
TSM tables: Actlog, …, Archives, Backups, Contents, Spacemgfiles, …, Volumeusage
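The workflow above pulls data out of the TSM server through the administrative command line and does the number crunching in a separate SQL database (MySQL in the original toolchain). The following Python sketch illustrates that step under stated assumptions. The standard-library `sqlite3` module stands in for MySQL. The data is assumed to have been exported as comma-delimited text, for example with something like `dsmadmc -dataonly=yes -commadelimited "select ..."`. The table name `summary_extract`, its four columns and the sample rows are hypothetical, loosely modelled on the TSM SUMMARY table.

```python
import csv
import io
import sqlite3

# Hypothetical comma-delimited export (start_time, activity, entity, bytes).
SAMPLE = """2007-04-19 21:05:00,BACKUP,NODE_A,1073741824
2007-04-19 22:10:00,BACKUP,NODE_B,536870912
2007-04-20 01:30:00,ARCHIVE,NODE_A,268435456
2007-04-20 02:00:00,BACKUP,NODE_A,1073741824
"""


def load_extract(text: str) -> sqlite3.Connection:
    """Load comma-delimited rows into an in-memory SQLite table (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE summary_extract ("
        " start_time TEXT, activity TEXT, entity TEXT, bytes INTEGER)"
    )
    rows = [
        (r[0], r[1], r[2], int(r[3]))
        for r in csv.reader(io.StringIO(text))
        if r  # skip blank lines
    ]
    conn.executemany("INSERT INTO summary_extract VALUES (?, ?, ?, ?)", rows)
    return conn


def daily_gb(conn: sqlite3.Connection) -> list[tuple[str, str, float]]:
    """'How much data is transferred every day?', split by activity (GB = 2**30 bytes)."""
    return conn.execute(
        "SELECT substr(start_time, 1, 10) AS day, activity,"
        " ROUND(SUM(bytes) / 1073741824.0, 2)"
        " FROM summary_extract GROUP BY day, activity ORDER BY day, activity"
    ).fetchall()


if __name__ == "__main__":
    for day, activity, gb in daily_gb(load_extract(SAMPLE)):
        print(day, activity, gb)
```

Running the aggregation on a copy like this keeps heavy queries off the production TSM server.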
TSM performance measurement areas
- Not all layers can be monitored through TSM
- Measuring client performance for LANfree operations requires a special server parameter
- OS monitoring of the TSM host is needed
The situation: Consolidation of a very large TSM environment
The customer asked for a proposal to simplify and consolidate its TSM environment.
The following points had to be respected:
- LAN-free (150-250 clients), LAN-based (3500-4500 clients)
- Central tape management (eRMM / RMM)
- Dynamic load balancing due to heavy data growth
- Simple and lightweight
- Make optimal use of tape performance
- Use TSM mirroring
- Use available techniques
- Don't consider disaster recovery
- Encryption
- PnP (plug and play of new systems)
- Migrate... finally
Quick overview ordered by sections
Servers: i0, i1 (HP-UX); s1-s8 (zOS); w1, w2 (Windows); x1-x8, xa-xd (AIX)
client activity: 24h traffic, daytime gaps, backup, archive, restores/retrieves
client volume: avg hourly nighttime volume, avg hourly daytime volume, daily file-level, daily TDP
client performance: avg MB/sec, max MB/sec
client sessions: avg parallel count, max parallel count
server activity: 24h traffic, nighttime gaps, migration, reclamation, expiration
server performance: db backup, expiration, migration
server volume: migration
mount wait: migration
[Per-server ratings (graphical symbols per server and criterion) not recoverable from the source]
Section: client activity
- zOS and HP-UX environment
  - Activity around the clock
  - Nearly no free time windows for administrative activities during the day
  - Moderate to frequent restore activities
- Windows and AIX environment
  - Backup operations only during the night
  - Adequate free time windows for administrative activities during the day
  - No restore activities
tsmgroup platform_name count gb
HPUX HPUX 178 4.47
HPUX TDP Oracle HP 41 3.86
HPUX TDP R3 HP 52 3,659.61
MVS AIX 77 48.92
MVS CE Archive 3,593 41.76
MVS HPUX 3,293 114.53
MVS IRIX 16 3.92
MVS Linux390 3 0.00
MVS Linux86 26 298.07
MVS LinuxIA64 1 0.00
MVS LinuxPPC 6 0.00
MVS SUN SOLARIS 13 0.75
MVS TDP Oracle HP 260 156.01
MVS TDP Oracle SUN 5 1.58
MVS TDP R3 HP 31 1,260.61
MVS WinNT 81 20.18
Windows WinNT 56 28.93
AIX AIX 4 0.00
AIX TDP MSExchg 25 780.03
AIX WinNT 60 24.28
SUM 7,821 6,447.52
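The table aggregates activity records by TSM server group and client platform. A small Python sketch can roll the rows up per group and compute the share of each group's volume that comes from TDP clients. Treating every platform name that starts with "TDP" as a TDP client is an assumption of the sketch. With the rows above, 3,663.47 of the 3,667.94 GB in the HPUX group come from TDP clients, consistent with the HP-UX observations in the client volume section.

```python
from collections import defaultdict

# (tsmgroup, platform_name, count, gb) rows taken from the client activity table
ROWS = [
    ("HPUX", "HPUX", 178, 4.47),
    ("HPUX", "TDP Oracle HP", 41, 3.86),
    ("HPUX", "TDP R3 HP", 52, 3659.61),
    ("MVS", "AIX", 77, 48.92),
    ("MVS", "CE Archive", 3593, 41.76),
    ("MVS", "HPUX", 3293, 114.53),
    ("MVS", "IRIX", 16, 3.92),
    ("MVS", "Linux390", 3, 0.00),
    ("MVS", "Linux86", 26, 298.07),
    ("MVS", "LinuxIA64", 1, 0.00),
    ("MVS", "LinuxPPC", 6, 0.00),
    ("MVS", "SUN SOLARIS", 13, 0.75),
    ("MVS", "TDP Oracle HP", 260, 156.01),
    ("MVS", "TDP Oracle SUN", 5, 1.58),
    ("MVS", "TDP R3 HP", 31, 1260.61),
    ("MVS", "WinNT", 81, 20.18),
    ("Windows", "WinNT", 56, 28.93),
    ("AIX", "AIX", 4, 0.00),
    ("AIX", "TDP MSExchg", 25, 780.03),
    ("AIX", "WinNT", 60, 24.28),
]


def rollup(rows):
    """Return {tsmgroup: (count, gb, tdp_share)}, where tdp_share is the
    fraction of the group's GB coming from platforms named 'TDP ...'."""
    count = defaultdict(int)
    gb = defaultdict(float)
    tdp_gb = defaultdict(float)
    for group, platform, n, size in rows:
        count[group] += n
        gb[group] += size
        if platform.startswith("TDP"):  # assumption: TDP clients are named 'TDP ...'
            tdp_gb[group] += size
    return {
        g: (count[g], round(gb[g], 2), tdp_gb[g] / gb[g] if gb[g] else 0.0)
        for g in count
    }


if __name__ == "__main__":
    for group, (n, size, share) in rollup(ROWS).items():
        print(f"{group:8} {n:6} {size:10.2f} GB  TDP share {share:.0%}")
```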
Section: client volume
- zOS environment
  - Moderate to high data volume per hour during the night
  - Low data volume per hour during the day
  - File-level as well as TDP traffic
- HP-UX environment
  - Moderate data volume per hour, at night as well as during the day
  - Little file-level traffic, much TDP traffic
- Windows environment
  - Low utilization per hour
  - Low data traffic
- AIX environment
  - New environment, still with low utilization
  - Day/night pattern recognizable
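The day/night split above comes from averaging the transferred volume per hour separately for night hours and day hours. A minimal Python sketch follows. The 20:00-06:00 night window is a hypothetical value. Attributing each operation's whole volume to its start hour is a simplification, since long-running operations really spread across several hours.

```python
from datetime import datetime

# Hypothetical night window 20:00-06:00; adjust to the customer's backup window.
NIGHT_START_HOUR = 20
NIGHT_END_HOUR = 6


def is_night(ts: datetime) -> bool:
    """True if the timestamp falls into the night window (which wraps midnight)."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def avg_hourly_volume(records, days: int):
    """records: iterable of (start_time, gb). Each operation is attributed to
    its start hour (a simplification). Returns (night_gb_per_hour,
    day_gb_per_hour), averaged over all night/day hours of `days` days."""
    night_hours = (24 - NIGHT_START_HOUR + NIGHT_END_HOUR) * days
    day_hours = 24 * days - night_hours
    night_gb = day_gb = 0.0
    for ts, gb in records:
        if is_night(ts):
            night_gb += gb
        else:
            day_gb += gb
    return night_gb / night_hours, day_gb / day_hours
```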
Section: client performance
- All environments
  - Low throughput during file-level activities
  - Only TDP nodes show higher throughput
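Client throughput can be derived per operation as transferred megabytes divided by elapsed wall-clock time, then averaged and maximised per node. In the sketch below, the node names and the definition MB = 2^20 bytes are assumptions. The elapsed time covers the whole operation, including idle time and media waits, so the result describes end-to-end throughput rather than raw network speed.

```python
from datetime import datetime, timedelta

MB = 1024 * 1024  # assumption: MB = 2**20 bytes


def mb_per_sec(start: datetime, end: datetime, nbytes: int) -> float:
    """Throughput of one operation, based on its wall-clock duration."""
    seconds = (end - start).total_seconds()
    return nbytes / MB / seconds if seconds > 0 else 0.0


def perf_by_node(ops):
    """ops: iterable of (node, start, end, bytes).
    Returns {node: (avg_mb_per_sec, max_mb_per_sec)}."""
    rates = {}
    for node, start, end, nbytes in ops:
        rates.setdefault(node, []).append(mb_per_sec(start, end, nbytes))
    return {node: (sum(r) / len(r), max(r)) for node, r in rates.items()}
```

Usage with hypothetical nodes: `perf_by_node([("FS_NODE", t0, t0 + timedelta(seconds=100), 100 * MB)])` reports 1 MB/sec for that node.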
Section: client sessions
- zOS environment
  - Moderate to many parallel sessions
  - Note: every active session allocates memory and generates locks in the TSM DB
- HP-UX environment
  - Few to moderate parallel sessions
- Windows environment
  - Few parallel sessions
- AIX environment
  - Very few parallel sessions on freshly set up systems
  - Moderate utilization on all other systems
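Every active session costs memory and DB locks, so both the average and the peak number of parallel sessions matter. The Python sketch below works under simple assumptions: sessions are given as start/end timestamps, and a session ending exactly when another starts does not count as overlapping. The peak comes from a sweep over start and end events. The average is the total session time inside an observation window divided by the window length.

```python
from datetime import datetime


def max_parallel(sessions) -> int:
    """Peak number of concurrent sessions (sweep over start/end events).
    A session ending at the same instant another starts does not overlap it."""
    events = []
    for start, end in sessions:
        events.append((start, 1))
        events.append((end, -1))
    events.sort(key=lambda e: (e[0], e[1]))  # at equal times, ends (-1) sort first
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def avg_parallel(sessions, window_start: datetime, window_end: datetime) -> float:
    """Time-weighted average number of sessions active in the window."""
    busy = 0.0
    for start, end in sessions:
        overlap = min(end, window_end) - max(start, window_start)
        busy += max(overlap.total_seconds(), 0.0)
    return busy / (window_end - window_start).total_seconds()
```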
Section: server activity
- zOS environment
  - Activity round the clock with no free time windows
  - Main activity is migration
  - Sometimes very long-running expirations
- HP-UX environment
  - Differing behaviour between the two servers
  - adsmi0 shows server activities only during the day
  - adsmi1 shows activity round the clock with a lot of migration
- Windows environment
  - Few nightly activities
  - Sometimes long-running expirations
- AIX environment
  - Currently server activity only during the day
  - Exception: expiration, which is scheduled at short intervals
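Free time windows for administrative work are the gaps left after merging all busy intervals (client sessions and server processes) within a period of interest. The Python sketch below clips the intervals to that period, sorts them and walks through them once. The one-hour minimum gap and the sample times in the usage example are hypothetical.

```python
from datetime import datetime, timedelta


def free_windows(busy, window_start, window_end, min_len=timedelta(hours=1)):
    """Return idle gaps of at least min_len inside [window_start, window_end).
    busy: iterable of (start, end) intervals of sessions and server processes."""
    # Keep only intervals that touch the window, clipped to its bounds.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in busy
        if e > window_start and s < window_end
    )
    gaps, cursor = [], window_start
    for start, end in clipped:
        if start - cursor >= min_len:  # idle stretch before this interval
            gaps.append((cursor, start))
        cursor = max(cursor, end)      # overlapping intervals merge here
    if window_end - cursor >= min_len:
        gaps.append((cursor, window_end))
    return gaps
```

For example, with busy intervals 05:00-07:00, 09:00-10:30, 10:00-11:00 and 19:30-21:00, the daytime window 06:00-20:00 yields the free windows 07:00-09:00 and 11:00-19:30.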
Section: server performance
- zOS environment
  - Very good DB performance, except on s7
  - Migration performance only moderate
- HP-UX environment
  - Very good DB performance
  - Migration performance only moderate to poor
- Windows environment
  - Mixed behaviour
  - Moderate (w1) and very good (w2) performance
- AIX environment
  - Mixed behaviour
  - Very good DB performance on some servers (x1-x4)
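Server process performance needs a unit that fits the process. Expiration can be judged by the number of objects it examines per hour; database backup and migration can be judged by data volume per hour. The following Python sketch computes such rates from per-process records. The record layout and the activity names `EXPIRATION`, `FULL_DBBACKUP` and `MIGRATION` are assumptions, loosely modelled on the TSM SUMMARY table.

```python
from datetime import datetime

GIB = 2 ** 30


def process_rates(records):
    """records: iterable of (activity, start, end, examined, bytes).
    EXPIRATION is rated in objects examined per hour, all other
    processes (e.g. database backup, migration) in GB per hour.
    Returns {activity: average rate}."""
    rates = {}
    for activity, start, end, examined, nbytes in records:
        hours = (end - start).total_seconds() / 3600
        if hours <= 0:
            continue  # skip records without a usable duration
        if activity == "EXPIRATION":
            rate = examined / hours
        else:
            rate = nbytes / GIB / hours
        rates.setdefault(activity, []).append(rate)
    return {activity: sum(v) / len(v) for activity, v in rates.items()}
```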
Section: server volume
- Only migration shows significant volume
- zOS environment
  - Low to very high migration volume
  - Frequent disk pool overflows
- HP-UX environment
  - Low migration volume
  - Frequent disk pool overflows
- Windows environment
  - Low migration volume
- AIX environment
  - Nearly no migration volume
Section: mount wait
- zOS environment
  - Low waiting times
- HP-UX environment
  - Low to moderate waiting times
- Windows environment
  - Low to moderate waiting times
  - High waiting times for client actions
- AIX environment
  - Low waiting times
  - High waiting times for client actions
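The findings above separate client actions from server processes. That distinction can be made explicit by averaging media wait times separately for the two groups. The Python sketch assumes each record carries an activity name and a media wait in seconds; the SUMMARY table's MEDIAW column is the assumed source. The set of activity names that count as client actions is likewise an assumption.

```python
# Assumption: these activity names mark client operations; everything else is
# treated as a server process (migration, reclamation, ...).
CLIENT_ACTIVITIES = {"BACKUP", "ARCHIVE", "RESTORE", "RETRIEVE"}


def mount_wait_summary(records):
    """records: iterable of (activity, media_wait_seconds).
    Returns the average media wait in seconds for client and server activities."""
    totals = {"client": [0.0, 0], "server": [0.0, 0]}
    for activity, wait in records:
        key = "client" if activity in CLIENT_ACTIVITIES else "server"
        totals[key][0] += wait
        totals[key][1] += 1
    return {key: (s / n if n else 0.0) for key, (s, n) in totals.items()}
```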
Potential Storage Pool Overflows
- Daily migration volume compared to disk pool size
- Assumption: only one migration per day
- A disk pool is potentially too small if its daily migration volume exceeds the storage pool size
server stgpool_name overflows
adsmi0 BACKUP_DISK_LAN1 5
adsmi0 BACKUP_DISK_REDO1 10
adsmi0 BACKUP_DISK_REDO2 10
adsmi1 BACKUP_DISK_REDO1 25
adsmi1 BACKUP_DISK_REDO2 15
adsms1 ARCHIVE_DISK 9
adsms1 BACKUP_DISKJ 21
adsms2 ARCHIVE_DISK 1
adsms2 BACKUP_DISK 12
adsms2 BACKUP_DISKJ 18
adsms3 ARCHIVE_DISK 2
adsms3 BACKUP_DISK 17
adsms4 BACKUP_DISKJ 30
adsms5 ARCHIVE_DISK 7
adsms5 BACKUP_DISKJ 16
adsms6 ARCHIVE_DISK 14
adsms6 BACKUP_DISK 5
adsms6 BACKUP_DISK2 4
adsms6 BACKUP_DISK3 3
adsms6 BACKUP_DISKJ 1
adsms6 BACKUP_DISKJ_GR 4
adsms7 ARCHIVE_DISK 31
adsms8 ARCHIVE_DISK 1
adsms8 BACKUP_DISKJ 22
adsmx7 BACKUP_3592_J1_FS 4
adsmx8 BACKUP_3592_J1_FS 4
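The overflow counts follow the rule stated above. With only one migration per day, a day on which more data was migrated out of a disk pool than the pool can hold means more data arrived than the pool could take. A minimal Python sketch of that count follows. The pool names in the usage example are taken from the table, but the capacities and daily volumes are made up.

```python
def potential_overflows(daily_migration_gb, pool_capacity_gb):
    """daily_migration_gb: {pool: {day: gb migrated}};
    pool_capacity_gb: {pool: capacity in GB}.
    Assuming only one migration per day, every day on which more data was
    migrated than the pool can hold counts as a potential overflow."""
    return {
        pool: sum(1 for gb in days.values() if gb > pool_capacity_gb[pool])
        for pool, days in daily_migration_gb.items()
    }
```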
LANfree considerations – by volume
- 161 LANfree clients (old) vs. 74 LANfree clients (new)
  - Small operations no longer LANfree
  - Massive reduction of LANfree tape mounts
LANfree clients by platform (old):
platform_name Total
HPUX 55
TDP MSExchg 39
TDP Oracle HP 9
TDP R3 HP 58
Grand Total 161
Activities by volume (GB per activity):
Platform GB Count
HPUX <1 820
HPUX 1-10 206
HPUX 10-50 20
HPUX 50-100 1
SUM 1,047
TDP MSExchg <1 4
TDP MSExchg 1-10 9
TDP MSExchg 10-50 11
TDP MSExchg 50-100 50
TDP MSExchg >100 147
SUM 221
TDP Oracle HP <1 9,392
TDP Oracle HP 1-10 1,042
SUM 10,434
TDP R3 HP <1 3,519
TDP R3 HP 1-10 2,240
TDP R3 HP 10-50 376
TDP R3 HP 50-100 259
TDP R3 HP >100 652
SUM 7,046
SUM (activities <50 GB) 17,639
SUM (activities >=50 GB) 1,109
Overall SUM 18,748
Volume by date:
date LAN(new) LANFREE(new) LAN(old) LANFREE(old) Total
2007-04-19 24,166.77 8,402.95 23,353.95 9,215.77 32,569.72
2007-04-20 21,327.20 9,066.45 19,845.86 10,547.78 30,393.65
2007-04-21 18,156.31 8,570.01 16,822.02 9,904.30 26,726.32
2007-04-22 5,233.80 1,621.94 4,995.45 1,860.28 6,855.74
2007-04-23 19,138.22 9,542.61 17,910.95 10,769.87 28,680.82
2007-04-24 20,589.58 10,757.53 19,054.26 12,292.85 31,347.11
2007-04-25 18,260.77 11,085.51 16,958.34 12,387.93 29,346.27
2007-04-26 21,739.70 9,790.04 19,787.00 11,742.74 31,529.75
2007-04-27 20,816.35 9,917.61 19,560.24 11,173.71 30,733.95
2007-04-28 20,546.28 8,830.13 19,231.59 10,144.82 29,376.42
2007-04-29 4,866.14 2,083.47 4,654.87 2,294.74 6,949.61
2007-04-30 16,865.96 8,702.27 15,834.17 9,734.06 25,568.23
2007-05-01 15,993.69 10,719.76 14,715.15 11,998.30 26,713.46
2007-05-02 19,126.66 11,644.60 17,789.90 12,981.37 30,771.27
2007-05-03 22,755.03 9,169.84 21,149.79 10,775.08 31,924.87
2007-05-04 19,576.18 9,992.76 18,225.08 11,343.86 29,568.94
2007-05-05 18,911.51 8,449.42 17,552.52 9,808.40 27,360.93
2007-05-06 5,194.30 1,678.79 5,043.75 1,829.34 6,873.09
2007-05-07 19,298.28 9,299.15 18,092.39 10,505.05 28,597.43
2007-05-08 19,789.73 10,301.16 18,551.39 11,539.50 30,090.89
Median 19,218.25 9,234.50 18,001.67 10,658.83 29,361.34
Max 24,166.77 11,644.60 23,353.95 12,981.37 32,569.72
Min 4,866.14 1,621.94 4,654.87 1,829.34 6,855.74
LANfree clients by platform (new):
platform_name Total
HPUX 1
TDP MSExchg 37
TDP R3 HP 36
Grand Total 74
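The activity counts above hint at the rule behind the "new" columns. The 50-100 GB and >100 GB classes add up to 1,109 activities (1 + 50 + 147 + 259 + 652), and the remaining 17,639 of the 18,748 activities are below 50 GB. A Python sketch follows that applies such a size threshold and recomputes daily LAN and LAN-free volumes. Reading the rule as "LAN-free only from 50 GB upwards" is an inference from those sums, not something the slide states. The operations in the usage example are made up.

```python
from collections import defaultdict

# Size classes as used in the "Activities by volume" table; treating each
# class as lower-bound inclusive (e.g. exactly 10 GB -> "10-50") is an assumption.
SIZE_CLASSES = [(0, 1, "<1"), (1, 10, "1-10"), (10, 50, "10-50"),
                (50, 100, "50-100"), (100, float("inf"), ">100")]


def size_class(gb: float) -> str:
    """Map an activity's volume in GB to its size class label."""
    for low, high, label in SIZE_CLASSES:
        if low <= gb < high:
            return label
    raise ValueError(f"negative volume: {gb}")


def reroute(ops, min_lanfree_gb: float = 50.0):
    """ops: iterable of (day, gb, is_lanfree). LAN-free operations smaller than
    min_lanfree_gb are moved to the LAN path.
    Returns {day: (lan_gb, lanfree_gb)}; each day's total stays the same."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for day, gb, lanfree in ops:
        stays_lanfree = lanfree and gb >= min_lanfree_gb
        totals[day][1 if stays_lanfree else 0] += gb
    return {day: (lan, lanfree) for day, (lan, lanfree) in totals.items()}
```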
LANfree considerations – by performance
- 161 LANfree clients (old) vs. 85 LANfree clients (new)
LANfree clients by platform (old):
platform_name Total
HPUX 55
TDP MSExchg 39
TDP Oracle HP 9
TDP R3 HP 58
Grand Total 161
Activities and average GB volume by throughput:
Platform MB/sec Count Avg GB/Op
HPUX 0-1 362 0.12
HPUX 1-5 561 0.95
HPUX 5-10 93 4.04
HPUX 10-20 31 12.38
SUM 1,047
TDP MSExchg 0-1 1 0.00
TDP MSExchg 1-5 2 0.09
TDP MSExchg 5-10 1 0.09
TDP MSExchg 10-20 18 140.59
TDP MSExchg 20-50 157 145.47
TDP MSExchg >50 42 154.43
SUM 221
TDP Oracle HP 0-1 2,865 0.05
TDP Oracle HP 1-5 7,407 0.32
TDP Oracle HP 5-10 36 2.08
TDP Oracle HP 10-20 113 3.89
TDP Oracle HP 20-50 13 0.77
SUM 10,434
TDP R3 HP 0-1 1,383 0.00
TDP R3 HP 1-5 2,063 0.04
TDP R3 HP 5-10 105 2.65
TDP R3 HP 10-20 1,797 17.52
TDP R3 HP 20-50 1,698 75.16
SUM 7,046
Volume by date:
date LAN(new) LANFREE(new) LAN(old) LANFREE(old) Total
2007-04-19 24,489.57 8,080.15 23,353.95 9,215.77 32,569.72
2007-04-20 21,738.70 8,654.95 19,845.86 10,547.78 30,393.65
2007-04-21 19,242.94 7,483.38 16,822.02 9,904.30 26,726.32
2007-04-22 5,196.03 1,659.71 4,995.45 1,860.28 6,855.74
2007-04-23 19,330.74 9,350.08 17,910.95 10,769.87 28,680.82
2007-04-24 21,497.01 9,850.10 19,054.26 12,292.85 31,347.11
2007-04-25 19,350.50 9,995.77 16,958.34 12,387.93 29,346.27
2007-04-26 22,045.69 9,484.05 19,787.00 11,742.74 31,529.75
2007-04-27 22,113.83 8,620.13 19,560.24 11,173.71 30,733.95
2007-04-28 21,695.53 7,680.89 19,231.59 10,144.82 29,376.42
2007-04-29 4,848.17 2,101.44 4,654.87 2,294.74 6,949.61
2007-04-30 17,271.56 8,296.67 15,834.17 9,734.06 25,568.23
2007-05-01 16,121.22 10,592.24 14,715.15 11,998.30 26,713.46
2007-05-02 20,472.16 10,299.11 17,789.90 12,981.37 30,771.27
2007-05-03 23,449.74 8,475.14 21,149.79 10,775.08 31,924.87
2007-05-04 20,517.47 9,051.46 18,225.08 11,343.86 29,568.94
2007-05-05 19,466.40 7,894.52 17,552.52 9,808.40 27,360.93
2007-05-06 5,168.56 1,704.52 5,043.75 1,829.34 6,873.09
2007-05-07 20,737.75 7,859.68 18,092.39 10,505.05 28,597.43
2007-05-08 22,198.89 7,892.01 18,551.39 11,539.50 30,090.89
Median 20,494.82 8,385.91 18,001.67 10,658.83 29,361.34
Max 24,489.57 10,592.24 23,353.95 12,981.37 32,569.72
Min 4,848.17 1,659.71 4,654.87 1,829.34 6,855.74
LANfree clients by platform (new):
platform_name Total
TDP MSExchg 39
TDP Oracle HP 1
TDP R3 HP 45
Grand Total 85
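The performance view groups the same activities by throughput class and reports the number of activities and the average GB per activity in each class. The Python sketch below reproduces that grouping. It also adds a hypothetical selection rule: a node stays LAN-free if one of its activities reaches a minimum throughput. The slide does not state the threshold that produced 85 LAN-free clients, so the 20 MB/sec in the test is only an illustration, as are the node names.

```python
from collections import defaultdict

# Throughput classes (MB/sec) as used in the "by performance" table;
# lower-bound inclusive classes are an assumption.
RATE_CLASSES = [(0, 1, "0-1"), (1, 5, "1-5"), (5, 10, "5-10"),
                (10, 20, "10-20"), (20, 50, "20-50"), (50, float("inf"), ">50")]


def rate_class(mb_per_sec: float) -> str:
    """Map a throughput in MB/sec to its class label."""
    for low, high, label in RATE_CLASSES:
        if low <= mb_per_sec < high:
            return label
    raise ValueError(f"negative throughput: {mb_per_sec}")


def by_throughput(ops):
    """ops: iterable of (node, platform, mb_per_sec, gb).
    Returns {(platform, class): (count, avg_gb_per_op)}."""
    acc = defaultdict(lambda: [0, 0.0])
    for _node, platform, rate, gb in ops:
        entry = acc[(platform, rate_class(rate))]
        entry[0] += 1
        entry[1] += gb
    return {key: (n, round(total / n, 2)) for key, (n, total) in acc.items()}


def lanfree_nodes(ops, min_mb_per_sec: float):
    """Hypothetical rule: a node stays LAN-free if at least one of its
    operations reaches min_mb_per_sec."""
    return sorted({node for node, _platform, rate, _gb in ops if rate >= min_mb_per_sec})
```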
Solution strategy: Equalize potential bottlenecks by introducing a layered architecture that accounts for all included components
- Each layer is set up according to defined standards and can easily be expanded horizontally by adding components.
- Standardized use of the included components provides vertical efficiency, so that data can be processed according to its business value.
- With planning mechanisms in place, this highly scalable environment can be administered very effectively and efficiently.
Findings require infrastructure alignment to implement the solution strategy
- TSM software
  - Consolidate the environment by focussing on only two platforms
  - Align the configuration with best practices
  - Define standard mechanisms for client backup procedures according to their SLAs
  - Establish a central configuration manager for TSM objects
- Disk hardware
  - Reduce the complexity coming from LANfree backups and force disk-to-disk backups by enlarging disk pools by 10-15 TB
  - Establish a shared disk environment based on GPFS
  - Establish a standardized disk environment with DS8300 for Open Systems and mainframe
- Tape hardware
  - Establish a standardized tape environment with TS1120 for Open Systems and mainframe, which also supports encryption
- SAN hardware
  - New dual-fabric SAN with >200 ports per fabric
What about TSM 6.1?
- DB2 included
  - DB2 instance is hidden
  - Many tables can be seen, but not all can be accessed
  - Don't do number crunching directly on the TSM DB
  - Include DB2 parameters in the assessment
- TSM Reporting available
  - Offers a limited view of the available data
  - Aggregates data
- Presented method is still valid for TSM 6.1
  - Minor adjustments might be needed
Disclaimer
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.
Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This information could include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) at any time without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
The performance data contained herein was obtained in a controlled, isolated environment. Actual results that may be obtained in other operating environments may vary significantly. While IBM has reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customer experiences described herein are based upon information and opinions provided by the customer. The same results may not be obtained by every user.
Reference in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this document is not intended to state or imply that only that program product may be used. Any functionally equivalent program, that does not infringe IBM's intellectual property rights, may be used instead. It is the user's responsibility to evaluate and verify the operation on any non-IBM product, program or service.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR INFRINGEMENT. IBM shall have no responsibility to update this information. IBM products are warranted according to the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not responsible for the performance or interoperability of any non-IBM products discussed herein.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.