© Copyright 12/5/2012 BMC Software, Inc 1
BSA Maintenance / DB Cleanup Best Practice 1
Housekeeping
Please ask questions in the “Q&A” section, not in Chat:
- Our experts can address many “Q&A” questions during the session, while Chat is not seen by the presenter until the end of the session
https://communities.bmc.com/communities/docs/DOC-21692
BMC Server Automation (BladeLogic) v8.2
Best Practices: Maintenance & DB Cleanup 1
Sean Berry
Lead, Customer Engineering Operations
Overview
- First Level Training
- Best Practice vs. How To
- Covers Most Common Maintenance Tasks
- Does not address every scenario
- Assumes prior knowledge of BSA components and terms
Agenda
- Activities & Objects
- Maintenance Overview
- Assessment (Give it to me straight)
- Performance Monitoring – Basic
- Cleanup – Database
- Cleanup – Fileserver & Appserver
- Agent Health & Agent Cleanup
- Upgrades / Upkeep
- Configuration Guidance
- Questions & Feedback
Introduction
Artifacts in the “Best Practices” franchise:
- BSA Best Practices Webinar Series: https://communities.bmc.com/communities/docs/DOC-21692
- BSA Best Practices Webinar Episode 1: Deployment Architecture: https://communities.bmc.com/communities/docs/DOC-21693
- BSA 8.2 base documentation: https://docs.bmc.com/docs/display/bsa82/Home
- Deployment Architecture: https://docs.bmc.com/docs/display/bsa82/Deployment+architecture
- Sizing and Scalability: https://docs.bmc.com/docs/display/bsa82/Sizing+and+scalability+factors
- Disaster Recovery and High Availability: https://docs.bmc.com/docs/display/bsa82/High+availability+and+disaster+recovery
- Large Scale Installations: https://docs.bmc.com/docs/display/bsa82/Large-scale+installations
- App Server sizing spreadsheet (internal)
- BSA Database Cleanup Best Practice White Paper (internal): https://docs.bmc.com/docs/display/NP/BSA+Database+Cleanup
- Agent Cleanup: blcli “Delete cleanup*” spaces
Activities & Objects
Activities & Objects – Typical Usage
Historical data is the data related to job run logs, results and schedules, which is stored in the job results, job events, compliance results, audit results, snapshot results and job schedule tables.
Why Do Maintenance
- Same as any piece of machinery or other system: for smooth running
- Conserve DB & FS capacity
- Ensure performance
- Maintain capacity & capability
- Purge records once they are no longer needed / desired
- Meet data storage compliance requirements (PCI, GLB, SOX)
Activities & Objects – Typical Usage (cont’d)
• All activities use storage of some sort:
• All jobs generate some database records
  • Normal Job Logs & Job Data
  • A large or verbose NSH Script or Deploy Job can easily generate 10,000 rows of Job Run Event Logs
  • Snapshot Job: 1 run * 1,000 targets * 1,000 objects = 1,000,000 rows
• File Server
  • A software package for a full SQL Server 2008 install is 2 GB; many agents have footprints of more than 50-100 MB
• Agent
  • Packages staged for deployment
  • Activity, Agent, other logs
• App Server
  • Temporary files
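The Snapshot Job figure is plain multiplication, so it is easy to re-run with your own numbers before scheduling an environment-wide snapshot. A minimal sketch, assuming (as the slide does) roughly one result row per object, per target, per run; real counts depend on the snapshot template and BSA version, and the 30-run example is hypothetical.

```python
def estimate_snapshot_rows(runs: int, targets: int, objects_per_target: int) -> int:
    """Rough estimate of snapshot result rows.

    Assumes one result row per object, per target, per run (the slide's model);
    actual counts vary with the snapshot template and BSA version.
    """
    return runs * targets * objects_per_target


if __name__ == "__main__":
    # The slide's example: 1 run * 1000 targets * 1000 objects
    print(f"{estimate_snapshot_rows(1, 1000, 1000):,} rows")  # 1,000,000 rows
    # Hypothetical: a nightly snapshot kept for 30 days before cleanup
    print(f"{estimate_snapshot_rows(30, 1000, 1000):,} rows")  # 30,000,000 rows
```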
Concepts & Terminology
Soft delete:
- The object’s “is_deleted” column is set to “1”.
Hard delete:
- The object (and its dependencies) are permanently removed from the database.
Retention Policy:
- How long to keep a given object before cleaning it up. Some jobs, runs, or objects aren’t needed for more than a few days; others need to stay around for at least 90 days (e.g. Patch Catalog Job runs), depending on the use case.
A typical “run” object’s “weight” or “balance”:
- If a given object has many rows of data (10,000 or more) associated with it, we say it’s “heavy”, since it takes a relatively long time to clean up the many rows required to clear that one object.
Concepts & Terminology (cont’d)
Leaf object
- An object that has no dependents: nothing depends on it (like a log file entry)
Dependent object
- A given run of a job is only relevant if the original job object is present, so the run should be cleaned up when the job itself is cleaned up.
Stored Procedure
- A particular type of SQL query that is kept within the database, and may perform relatively quickly.
Truncate Cleanup
- Quickly removes all objects older than a certain date. Only practical on leaf objects.
UNDO space & REDO logs
- Very long running queries can queue up other transactions and put strain on some critical DB resources.
Concepts & Terminology (cont’d)
Historical Data
- Job Run Logs, Job Results (Snapshot, Audit, Compliance), prior Job Schedules and Object Audit Trail objects. These types of data typically consume the bulk of the space in the database and are responsible for the bulk of its daily growth, with the Job Run Event Log data (JobRunEvent) and the Audit Trail data (AuditTrail) typically consuming the most space.
Shared Object Data
- This data is typically related to Snapshot and Audit data, and can consume a large amount of database space, especially if Inventory Snapshots (or any other environment-wide Snapshot Jobs or Audit Jobs) are run frequently.
Soft deleted Data
- Any time an object is deleted in the GUI, or deleted by the retention policy, this data must eventually be hard-deleted from the database by the cleanupDatabase command.
File Server Data
- After objects are removed from the database, the actual underlying file system objects need to be removed from the file server. Note that the cleanupDatabase step must complete before cleanupFileServer is run.
Maintenance Overview
Kinds of Maintenance
- Health / Performance Monitoring (find the problems before they’re out of control)
- Normal day-to-day deletes & dependency checking
- Setting Boundaries (Retention & Job Timeouts)
- Database cleanup
  - Applying retention policies
  - Historical Cleanups
  - Hard Deletes
- Filesystem-based cleanup
  - Agent
  - Appserver
  - Fileserver
Assessment: (How Much Work Do I Have Ahead of Me?)
Assessment: How much work do we have ahead?
DB Cleanup:
- When was gather_stats last run? Was it the “right” gather_stats process?
- Table sizes script: how big are my tables (in millions of rows)?
- When’s the last time we ran cleanup? (Last Cleanup Task Ran)
- Several scripts are available for this task (see the DB Cleanup White Paper)
FS Cleanup:
- How big is my total file server footprint? (du -sk)
- How many blpackages do I have? (ls -d blpackages/* | wc -l)
- Am I creating blpackages automatically and not cleaning them up?
- How much of what I have do I no longer need?
Appserver Cleanup:
- How much space is being used in the tmp folders on our appservers? (not /tmp)
- Is space a problem on the appserver? (df -k)
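For scripted assessments, the two file server questions above can be approximated in Python. This is a sketch: the storage root path is hypothetical, and footprint_kb sums apparent file sizes, so it will not match du -sk exactly (du counts allocated disk blocks).

```python
import os
from pathlib import Path


def footprint_kb(root: Path) -> int:
    """Approximate `du -sk`: total apparent size of regular files under root, in KiB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir symlinks
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total // 1024


def count_blpackages(root: Path) -> int:
    """Approximate `ls -d blpackages/* | wc -l`: non-hidden entries in blpackages/."""
    pkg_dir = root / "blpackages"
    if not pkg_dir.is_dir():
        return 0
    return sum(1 for entry in pkg_dir.iterdir() if not entry.name.startswith("."))


if __name__ == "__main__":
    root = Path("/opt/bmc/storage")  # hypothetical file server storage root
    print(f"{footprint_kb(root)} KiB, {count_blpackages(root)} blpackages")
```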
How much work do we have ahead? (cont’d)
Agent Cleanup:
- What’s our agent footprint? (du -sk in the agent directory)
- Do we have lots of old deploy packages on our agents? (spot check)
- Are we tight on space on our agents, or do we see new deployments coming that will need more or much more space? (big installers)
Stability (some things to think about):
- How stable is my environment? Do I need to restart appservers more than weekly?
- Where should I be focusing my energy?
- Am I adhering to best practices for component placement?
- Do I have unresolved tickets I could close or respond to?
Performance Monitoring - Basic
Performance Monitoring – Basic
Availability:
- How long “should” a given job type take to run in your environment?
- Would you notice if a job or config server instance were down or unresponsive?
- Before your customers called you to tell you?
Capacity:
- Do you know the rough capacity of your environment? How many jobs can run across the entire environment? How many total work item thread minutes are available in an hour?
- How many GUI and CLI users can be logged into your environment? How many did you build the environment for?
Performance:
- Do you notice long running (> 6 hr, > 24 hr) jobs? Are they normal and expected?
- What’s the typical load average on your app servers? Database? Swap utilization?
How much is too much?
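The “work item thread minutes” question is back-of-the-envelope arithmetic. A minimal sketch, assuming a hypothetical environment of 4 job server instances with 50 work-item threads each, and a simplification of one work item per target taking about 3 minutes; your actual per-instance thread count is part of the app server configuration. It ignores database, file server and network limits, which often bind first.

```python
def work_item_thread_minutes_per_hour(job_server_instances: int, threads_per_instance: int) -> int:
    """Total work-item thread minutes available in one hour across the environment."""
    return job_server_instances * threads_per_instance * 60


def max_targets_per_hour(capacity_minutes: float, avg_minutes_per_target: float) -> int:
    """Targets that fit in that capacity if each takes avg_minutes_per_target (one work item each)."""
    if avg_minutes_per_target <= 0:
        raise ValueError("avg_minutes_per_target must be positive")
    return int(capacity_minutes // avg_minutes_per_target)


if __name__ == "__main__":
    # Hypothetical environment: 4 job server instances, 50 work-item threads each.
    capacity = work_item_thread_minutes_per_hour(4, 50)
    print(capacity)  # 12000
    # If a typical target takes ~3 minutes, roughly 4000 targets fit in an hour.
    print(max_targets_per_hour(capacity, 3.0))  # 4000
```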
Performance Monitoring – Basic
How much further can your environment grow before the next app server gets added?
- Next X amount of database capacity?
- File server capacity?
- Reports server capacity?
- (Sizing spreadsheet and available performance capacity)
GUI:
- Does the GUI have enough memory to run on your users’ workstations without swapping? How big is the typical footprint in your environment? Which views can be closed? Paging may be a factor here.
- What’s the typical ping time & bandwidth between the different components of your installation? (A long ping time to agents & repeaters is fine within reason; GUIs and appservers shouldn’t be “far” from each other or from the DB.) Do you measure and record this regularly (monitoring infrastructure)?
Performance Monitoring - Tools
- BPPM 9.0 Knowledge Module
- JMXCLI: very detailed range of configurations
- Other Performance / Capacity Management Tools (BCO etc.)
Classical tools:
- vmstat/iostat/top/etc. (UNIX)
- nstats (NSH)
- Perfmon (Windows)
Empirical tools:
- GUI performance
Cleanup - Database
[Figure: BMC Server Automation (BladeLogic) Depot; tiers: Console, Mid Tier, Nodes]
Configuring BSA DB Cleanup
Configuring BSA DB Cleanup
- To manage database size, you must proactively set up DB Cleanup and run its job(s) at regular intervals.
- The first setup step is to configure a data retention policy.
Database Retention Policies
- The retention policy is the time period for which you need to retain your data in the BSA database.
- If your organization’s policy is to keep only the last 90 days of compliance data, set the retention policy to 90 (days) so that any data older than 90 days is deleted by the cleanup operation.
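Deciding which records a retention policy catches comes down to computing a cutoff date. A minimal sketch, assuming “older than 90 days” means strictly before now minus 90 days; BSA’s exact comparison (and time zone handling) may differ.

```python
from datetime import datetime, timedelta


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Anything recorded before this moment is outside the retention window."""
    return now - timedelta(days=retention_days)


def is_expired(recorded_at: datetime, now: datetime, retention_days: int) -> bool:
    # Assumption: "older than N days" means strictly before the cutoff.
    return recorded_at < retention_cutoff(now, retention_days)


if __name__ == "__main__":
    now = datetime(2012, 12, 5)
    print(retention_cutoff(now, 90))  # 2012-09-06 00:00:00
    print(is_expired(datetime(2012, 9, 1), now, 90))  # True
    print(is_expired(datetime(2012, 11, 1), now, 90))  # False
```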
Configuring BSA DB Cleanup (cont’d)
Setting a DB Retention Policy
- The retention policy is set in two places:
  - The default time (in days) is set in the Property Dictionary in the Job class
  - The policy can be set at the individual job level (instance level), or for a given job type (Property Set Class: PSC)
    – Known issue: setting it at the property set class level does not propagate the settings to all dependent jobs (e.g. those of a patching job)
- For auto-generated Depot objects:
  – The retention time for auto-generated depot objects and jobs is defined by the blasadmin system property AutoGeneratedRetentionTime, e.g.:
    » blasAdmin: set Cleanup AutoGeneratedRetentionTime <number of days>
    » blasAdmin: set cleanup EnableRetentionPolicy true
Default Retention Windows in the Property Dictionary
RESULTS_RETENTION_TIME Property (Default)
Setting NSH Script Job RESULTS_RETENTION_TIME to 90 days
Configuring BSA DB Cleanup (cont’d)
Applying a Retention Policy
- After defining the retention policy, you need to apply it to objects; this marks all relevant objects older than their retention limit for deletion.
- Objects marked for deletion are actually deleted when the DB Cleanup operation is run.
- Applying the retention policy is required before you run DB Cleanup.
- The retention policy is usually applied by running this BLCLI command: Delete ExecuteRetentionPolicy
Configuring BSA DB Cleanup (cont’d)
What does the Retention Policy blcli command do?
- Soft deletion
- It is run regularly by the DB Cleanup job that BSA provides out of the box: the BSA Recommended Database Cleanup Job (an NSH script job)
- Customers usually run the ETL on the BSA DB before they run DB Cleanup, to ensure their oldest data is in the BDS warehouse DB for reporting.
- Execute the ETL, then run the BSA Recommended Database Cleanup Job
- Documentation on using this OOTB BSA job and its parameters: Changing Database Cleanup script options and commands
Out of the Box Database Cleanup Script
Configuring BSA DB Cleanup (cont’d)
Historical Data
- These types of data are typically responsible for the bulk of the database’s daily growth, with the Job Run Event Log data (JobRunEvent) and the Audit Trail data (AuditTrail) usually consuming the most space.
- If you have a very large BSA DB and have never run DB cleanup before, we recommend that, before you run the OOTB NSH job, you execute the following commands manually, in sequence, to clean up your BSA DB in incremental steps:
  BLCLI Delete cleanupHistoricalData JobRunEvent <retention time> <duration>
  BLCLI Delete cleanupHistoricalData AuditTrail
  BLCLI Delete cleanupHistoricalData JobSchedule
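The incremental approach can be planned ahead of time: start just below the age of your oldest data and step the retention value down until it reaches your policy. A sketch that only prints command lines in the slide’s <retention time> <duration> form; the 365-day starting age, the 60-day step and the duration value are hypothetical, and the unit of <duration> should be confirmed in the BLCLI documentation before you run anything.

```python
from typing import List


def incremental_retention_steps(oldest_days: int, target_days: int, step_days: int) -> List[int]:
    """Retention values (in days) for successive runs, oldest data first.

    Each run only removes data older than its retention value, so stepping down
    from just below the oldest data keeps every individual run small.
    """
    if step_days <= 0 or target_days <= 0:
        raise ValueError("step_days and target_days must be positive")
    steps = []
    days = oldest_days - step_days
    while days > target_days:
        steps.append(days)
        days -= step_days
    steps.append(target_days)  # the final run lands exactly on the retention policy
    return steps


def cleanup_commands(data_type: str, steps: List[int], duration: int) -> List[str]:
    """Build command lines in the slide's form: <data type> <retention time> <duration>."""
    return [f"blcli Delete cleanupHistoricalData {data_type} {days} {duration}" for days in steps]


if __name__ == "__main__":
    # Hypothetical: oldest JobRunEvent data ~365 days old, 90-day policy, 60 days per run.
    for line in cleanup_commands("JobRunEvent", incremental_retention_steps(365, 90, 60), 120):
        print(line)
```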
What if I can’t run a full cleanup? (all at once & common problems)
JRE (Job Run Event) / Audit Trail tables are very big
- Offline cleanup: relatively fast, but requires BSA to be completely offline, with no running appservers (truncate)
Other tables are way too big, or a complete outage isn’t possible
- Online “small” cleanup: the size of each cleanup task depends on how many “days” of data you want to clean up. Start with relatively “few” days of cleanup and nibble away at it, depending on your available resources.
- Schedule historical cleanups to run in available windows
Cleanups sometimes compete with the BDSSA ETL process
- Schedule them opposite the ETL (or after it: what better time to run cleanup than after a successful ETL?), and use time limits to prevent collisions.
- Time limit limitations: the limit is checked between SQL queries, so configure blasadmin to hit fewer total objects at a time
Other: open a support ticket if you don’t already have one!
Recommended Schedule
Based on experience at customer environments, we have created some recommendations on how and when to run a database cleanup in a BMC Server Automation environment. General recommendation:
- Run historical cleanups on a daily basis (e.g., Mon-Sat)
- Run the hard delete cleanups (cleanupAllSharedObjects, cleanupDatabase, cleanupFileServer) on a weekly basis (e.g., every Sunday)
If you are using BMC BladeLogic Decision Support for Server Automation (BDSSA, “reports”), ensure that the ETL does not run during the 'hard delete' cleanup run.
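The recommended schedule is easy to encode in a wrapper or a pre-flight check. A minimal sketch that picks the batch job by weekday (Sunday for hard deletes, Monday to Saturday for historical cleanup) and tests whether a hard-delete window collides with an ETL window; the window times in the example are hypothetical.

```python
from datetime import date, datetime


def cleanup_for_day(day: date) -> str:
    """Pick the batch job for a day: hard deletes on Sunday, historical Mon-Sat."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    return "Weekly Cleanup" if day.weekday() == 6 else "Daily Cleanup"


def windows_overlap(start_a: datetime, end_a: datetime,
                    start_b: datetime, end_b: datetime) -> bool:
    """True if two [start, end) windows share any time; touching endpoints do not count."""
    return start_a < end_b and start_b < end_a


if __name__ == "__main__":
    sunday = date(2012, 12, 9)
    print(cleanup_for_day(sunday))  # Weekly Cleanup
    hard_delete = (datetime(2012, 12, 9, 1, 0), datetime(2012, 12, 9, 5, 0))
    etl = (datetime(2012, 12, 9, 4, 0), datetime(2012, 12, 9, 6, 0))
    print(windows_overlap(*hard_delete, *etl))  # True: move the ETL or the cleanup
```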
Recommended Schedules (cont’d)
Typical Jobs and execution schedules:
Daily 'Historical' Cleanup
- Batch Job "Daily Cleanup", set to run in series:
  - Cleanup Retention
  – Batch Job "Cleanup Historical", set to run in parallel:
    » Cleanup JobRunEvent
    » Cleanup AuditTrail
    » Cleanup JobSchedule
    » Cleanup AuditResult
    » Cleanup SnapshotResult
    » Cleanup ComplianceResult
  - Cleanup AllAppServerCaches
Weekly 'Hard Delete' Cleanup
- Batch Job "Weekly Cleanup", set to run in series:
  - Cleanup Retention
  - Cleanup Database
  - Cleanup Shared Objects (8.1 SP4+ only)
  - Cleanup FileServer
Monitoring DB Cleanup
Use one of the following scripts to query the status of cleanup.
For Oracle:
    ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY/MM/DD HH24:MI';
    SELECT task_id, task_name, started_at,
           CAST((CAST(updated_at AS timestamp) - CAST(started_at AS timestamp)) AS interval day(0) to second(0)) AS duration,
           current_action, to_char(deleted_rows, '9G999G999G999') AS deleted_rows
    FROM delete_tasks
    WHERE ended_at IS NULL;
Alternatively, use the linked script for Oracle or for MSSQL.
For SQL Server (note: the datepart for minutes is mi; mm would return months):
    SELECT task_id, task_name, started_at,
           DATEDIFF(mi, started_at, updated_at) AS duration_minutes,
           current_action, deleted_rows
    FROM delete_tasks
    WHERE ended_at IS NULL;
If cleanup fails, the first place to look is the delete_errors table:
    SELECT * FROM delete_errors;
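The status queries above work on any copy of delete_tasks with those columns. The sketch below reproduces the same logic in SQLite so it can be tried without a BSA database: a task with no ended_at is still running, and duration is computed in minutes. Column names follow the queries above; the column types, sample rows and the julianday-based duration are assumptions specific to this SQLite stand-in.

```python
import sqlite3

# Local stand-in for delete_tasks: column names from the queries above,
# types and sample data hypothetical.
SCHEMA = """
CREATE TABLE delete_tasks (
    task_id INTEGER PRIMARY KEY,
    task_name TEXT,
    started_at TEXT,
    updated_at TEXT,
    ended_at TEXT,
    current_action TEXT,
    deleted_rows INTEGER
);
"""

# SQLite version of the "WHERE ended_at IS NULL" status query; julianday
# differences are in days, so multiply by 1440 to get minutes.
RUNNING_TASKS_SQL = """
SELECT task_id, task_name,
       CAST(ROUND((julianday(updated_at) - julianday(started_at)) * 1440) AS INTEGER)
           AS duration_minutes,
       current_action, deleted_rows
FROM delete_tasks
WHERE ended_at IS NULL
ORDER BY task_id;
"""


def running_tasks(conn: sqlite3.Connection) -> list:
    return conn.execute(RUNNING_TASKS_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO delete_tasks VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "cleanupDatabase", "2012-12-02 01:00:00", "2012-12-02 03:00:00",
             "2012-12-02 03:00:00", "done", 500000),
            (2, "cleanupHistoricalData", "2012-12-05 10:00:00", "2012-12-05 11:30:00",
             None, "deleting", 120000),
        ],
    )
    for row in running_tasks(conn):
        print(row)  # (2, 'cleanupHistoricalData', 90, 'deleting', 120000)
```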
Monitoring Growth
Monitoring Oracle tablespace growth (fewer surprises)
- It’s good practice to monitor database growth over time with a SQL script and a cron job. Modify them to match your environment and set them to run as sysdba. This sends you an email every day showing how much tablespace the tables consumed compared to the day before. This method is also useful for spotting problematic trends (for example, someone enabled Audit Trail logging on Server.Read).
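The daily growth report boils down to diffing two sets of table sizes. A minimal sketch of that comparison, assuming your cron job already collects per-table sizes in MB (the table names and numbers below are hypothetical); the 100 MB spike threshold is an arbitrary example.

```python
from typing import Dict, List, Tuple


def daily_growth(yesterday: Dict[str, float], today: Dict[str, float]) -> List[Tuple[str, float]]:
    """Per-table growth (MB) since the previous snapshot, largest first.

    Tables missing from yesterday's snapshot are treated as new (0 MB yesterday).
    """
    growth = {name: size - yesterday.get(name, 0.0) for name, size in today.items()}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


def flag_spikes(growth: List[Tuple[str, float]], threshold_mb: float) -> List[str]:
    """Names of tables that grew by more than threshold_mb in one day."""
    return [name for name, delta in growth if delta > threshold_mb]


if __name__ == "__main__":
    # Hypothetical daily sizes in MB, e.g. collected by a nightly cron job.
    yesterday = {"JOB_RUN_EVENT": 1200.0, "AUDIT_TRAIL": 800.0}
    today = {"JOB_RUN_EVENT": 1350.0, "AUDIT_TRAIL": 805.0}
    growth = daily_growth(yesterday, today)
    print(growth)  # [('JOB_RUN_EVENT', 150.0), ('AUDIT_TRAIL', 5.0)]
    print(flag_spikes(growth, 100.0))  # ['JOB_RUN_EVENT']
```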
Gather statistics (use ours, not just the default)
- Oracle provides a default routine to calculate statistics on the tables, and DBAs usually have it enabled. BMC has worked with Oracle to create an improved procedure, which ships in the <version>-external-files.zip for each version and is available from the EPD. The Oracle default should be disabled, and the DBA should install and enable BMC’s procedure to run on a weekly (at minimum) basis. This improves both cleanup performance and database/user interface performance in general.
Alternative Gather Stats check (dbdiagnostics)
- As an alternative to running the script mentioned above, you can use the 'dbdiagnostics' command to determine whether the stats are up to date. There is a 'dbdiagnostics' binary in the <install_dir>/NSH/br directory. It can be run with:
- # ./dbdiagnostics runDiag diagId=1000006
- The results can be viewed with:
- # ./dbdiagnostics getResLastExec diagId=1000006
  diagId=1000006
  execDiagId=2000040
  execStartTime=2012-09-21 10:00:30.0
  messageLevel=INFO
  message=DBMS_STATS_CHK: DBMS_STATS on the Database ran 221 days ago, which is NOT OK. The Expected running of DBMS_STATS is once in 15 days. Please run BL_GATHER_SCHEMA_STATS PROC for this schema.
  messageTime=2012-09-21 10:00:31.0
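If you run dbdiagnostics from a scheduled job, its message can be checked automatically. A sketch that parses the sample output above; the regular expressions assume the message wording shown, which may change between versions, so an unrecognized message is treated as an error rather than as “stats are fine”.

```python
import re
from typing import Optional, Tuple

# Patterns based on the sample message above; the wording may differ between versions.
RAN_RE = re.compile(r"DBMS_STATS on the Database ran (\d+) days ago")
EXPECTED_RE = re.compile(r"once in (\d+) days")


def stats_age(output: str) -> Optional[Tuple[int, int]]:
    """Return (days since DBMS_STATS last ran, expected interval in days), or None."""
    ran = RAN_RE.search(output)
    expected = EXPECTED_RE.search(output)
    if ran is None or expected is None:
        return None
    return int(ran.group(1)), int(expected.group(1))


def stats_are_stale(output: str) -> bool:
    ages = stats_age(output)
    if ages is None:
        raise ValueError("unrecognized dbdiagnostics output")
    ran_days, expected_days = ages
    return ran_days > expected_days


SAMPLE = (
    "message=DBMS_STATS_CHK: DBMS_STATS on the Database ran 221 days ago, which is NOT OK. "
    "The Expected running of DBMS_STATS is once in 15 days. "
    "Please run BL_GATHER_SCHEMA_STATS PROC for this schema."
)

if __name__ == "__main__":
    print(stats_age(SAMPLE))  # (221, 15)
    print(stats_are_stale(SAMPLE))  # True
```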
How to run cleanup in large / busy environments
- Manage cleanup run time vs. ETL and critical jobs
- Cleanup will consume some amount of performance capacity: plan for it
- Adjust run time and the number of objects considered so that cleanup completes within shorter windows:
  - SQL delete queries are unforgiving of running out of time and UNDO space
  - e.g., 63 minutes of cleanup in a 60-minute window
- Offline cleanups may be required to “catch up”
- Measure how much data is being generated weekly: treading water vs. forward progress
- Manage logging levels and the max number of objects considered in a single query
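The “63 minutes of cleanup in a 60-minute window” effect follows from the time limit being checked only between SQL queries. The simulation below is a simplified model, not BSA code, with hypothetical batch durations: larger batches overrun the window, while smaller ones stop close to the limit, which is why configuring cleanup to hit fewer objects at a time helps.

```python
from typing import List, Tuple


def run_batches(batch_minutes: List[float], limit_minutes: float) -> Tuple[int, float]:
    """Simulate a cleanup whose time limit is only checked between SQL queries.

    Returns (batches completed, total elapsed minutes). A batch that starts before
    the limit always runs to completion, so the total can exceed the limit.
    """
    elapsed = 0.0
    completed = 0
    for cost in batch_minutes:
        if elapsed >= limit_minutes:  # the limit is checked only here, between queries
            break
        elapsed += cost  # once started, a delete query runs to the end
        completed += 1
    return completed, elapsed


if __name__ == "__main__":
    # Large batches (9 minutes each): the 7th starts at 54 min and ends at 63.
    print(run_batches([9.0] * 10, 60))  # (7, 63.0)
    # Smaller batches (3 minutes each) stop right at the limit.
    print(run_batches([3.0] * 30, 60))  # (20, 60.0)
```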
Cleanup – Fileserver & Appserver
File Server Cleanup
Cleans up files by age.
Caution: if your patch payloads or metadata are mounted under the file server (e.g. /patch), cleanup will clean them up too!
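Before enabling file server cleanup, it is worth checking that no patch repository lives under the file server’s storage location. A minimal sketch with hypothetical paths; the check is purely lexical, so resolve symlinks and “..” first (for example with pathlib.Path.resolve()) when checking real directories.

```python
from pathlib import PurePosixPath


def is_under(path: str, root: str) -> bool:
    """True if path is root itself or lies anywhere beneath it (lexical check only)."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


if __name__ == "__main__":
    fileserver_root = "/storage"  # hypothetical file server storage location
    for patch_repo in ("/storage/patch", "/patch"):
        if is_under(patch_repo, fileserver_root):
            print(f"WARNING: {patch_repo} is under {fileserver_root}; file server cleanup may remove it")
        else:
            print(f"OK: {patch_repo} is outside {fileserver_root}")
```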
App Server Cleanup
Logs / cores
- Check the “br” directory for zipped-up old log files and core files (each can easily be 2 GB)
- Old snapshot files (depending on version)
- Extra junk in /tmp on Solaris (impacts swap)
- Full C: drive on Windows
- Easy to spot check with “du -sk *”
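The br spot check can be scripted to list likely cleanup candidates by size. A sketch that treats files named core or core.* and .zip/.gz archives as candidates; the naming rules and the directory path are assumptions, so review the list before deleting anything.

```python
from pathlib import Path
from typing import List, Tuple

ARCHIVE_SUFFIXES = (".zip", ".gz")  # assumed naming of rotated/zipped logs


def looks_like_candidate(name: str) -> bool:
    """Core dumps (core, core.<pid>) and archived logs; an assumed naming convention."""
    return name == "core" or name.startswith("core.") or name.endswith(ARCHIVE_SUFFIXES)


def cleanup_candidates(directory: Path, min_bytes: int = 0) -> List[Tuple[str, int]]:
    """(name, size) of candidate files directly in directory, largest first. Read-only."""
    if not directory.is_dir():
        return []
    found = []
    for entry in directory.iterdir():
        if entry.is_file() and not entry.is_symlink() and looks_like_candidate(entry.name):
            size = entry.stat().st_size
            if size >= min_bytes:
                found.append((entry.name, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    br_dir = Path("/opt/bmc/bladelogic/NSH/br")  # hypothetical install location
    for name, size in cleanup_candidates(br_dir, min_bytes=100 * 1024 * 1024):
        print(f"{size / 1024 ** 3:.1f} GB  {name}")
```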
Agent Health & Agent Cleanup
[Figure: BMC Server Automation (BladeLogic) Logical Architecture; tiers: Console, Mid Tier, Nodes]
Agent Health
Agent Health is critical to successful job runs because the appserver is generous when trying to talk to a slow remote agent.
- JOB_PART_TIMEOUT
Agent Health Survey:
- Managed servers go up and down regularly.
- Run the “Update Server Properties” Job periodically, and before a critical job; it updates the AGENT_STATUS property:
  – “Agent is Alive” for hosts that are up, vs.
  – “Agent is Unavailable” for hosts that are down.
- Use AGENT_STATUS in Server Smart Groups to include only available hosts in Jobs; you can’t deploy to a host that’s not up.
Recovery:
- Re-run the Update Server Properties Job more often against a server group that only includes “down” servers.
- Use a Server Smart Group to identify hosts that have been out of contact for more than 30 days.
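The Smart Group logic described above can be mirrored over exported server data, for example to produce a target list before a critical job. A sketch only: the ManagedServer record and its last_contact field are hypothetical, and the only BSA value it relies on is the “Agent is Alive” AGENT_STATUS string from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

ALIVE = "Agent is Alive"  # AGENT_STATUS value set by Update Server Properties


@dataclass
class ManagedServer:
    name: str
    agent_status: str  # e.g. "Agent is Alive" or "Agent is Unavailable"
    last_contact: datetime  # hypothetical: time of last successful agent contact


def available_targets(servers: List[ManagedServer]) -> List[str]:
    """Hosts to include in a job: only those whose agent answered."""
    return [s.name for s in servers if s.agent_status == ALIVE]


def out_of_contact(servers: List[ManagedServer], now: datetime, days: int = 30) -> List[str]:
    """Hosts not heard from in more than `days` days (recovery candidates)."""
    cutoff = now - timedelta(days=days)
    return [s.name for s in servers if s.last_contact < cutoff]


if __name__ == "__main__":
    now = datetime(2012, 12, 5)
    servers = [
        ManagedServer("web01", ALIVE, now - timedelta(days=1)),
        ManagedServer("app03", "Agent is Unavailable", now - timedelta(days=2)),
        ManagedServer("db07", "Agent is Unavailable", now - timedelta(days=45)),
    ]
    print(available_targets(servers))  # ['web01']
    print(out_of_contact(servers, now))  # ['db07']
```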
Agent Cleanup
The Problem:
- Deployment packages (and their rollbacks) accumulate in the Transactions folder
The Solution:
- blcli Delete cleanupAgent
- Cleans up all objects on the agent older than the specified <retentionTime> parameter
- Typical time frame of 30-90 days
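To gauge how much has accumulated before running blcli Delete cleanupAgent, you can spot-check the age of entries in an agent’s Transactions folder. A read-only sketch, assuming entry modification time reflects package age and using a hypothetical folder path; it deletes nothing and is not a substitute for cleanupAgent.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def old_transactions(transactions_dir: Path, retention_days: int,
                     now: Optional[float] = None) -> List[str]:
    """Names of entries whose mtime is older than retention_days. Read-only."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    if not transactions_dir.is_dir():
        return []
    return sorted(e.name for e in transactions_dir.iterdir() if e.stat().st_mtime < cutoff)


if __name__ == "__main__":
    tx_dir = Path("/opt/bmc/BladeLogic/RSCD/Transactions")  # hypothetical agent path
    stale = old_transactions(tx_dir, retention_days=60)
    print(f"{len(stale)} entries older than 60 days")
```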
Upgrades / Upkeep
Upgrades / Upkeep
Upgrades are the easiest way to get new, improved (supported) code with bug fixes.
Agent Upgrades:
- Agent upgrades are key to retaining functionality and to avoiding “solved” bugs or compatibility issues when upgrading
- File Deploy Job upgrade method
- Unified Agent Installer upgrades coming in a future release
- Upgrading agents outside of BSA (MSI installer)
Upgrades require planning (possibly significant planning for larger environments!)
- There may be benefit to cleaning up before the upgrade, or to using offline upgrade techniques
- Many customers are asking for features that have already been delivered in current releases
- TEST before PROD!
Configuration Guidance
Additional Resources & Information
BSA Best Practices Webinar Series: https://communities.bmc.com/communities/docs/DOC-21692
- BSA Best Practices Webinar Episode 1: Deployment Architecture: https://communities.bmc.com/communities/docs/DOC-21693
Online Documentation
- BSA Deployment Architecture Best Practices: http://docs.bmc.com/docs/display/public/bsa82/Deployment+architecture
- Product Documentation: http://docs.bmc.com/docs/display/public/bsa82/Home
BMC Communities (public forum)
- BMC website: documents, discussions, whitepapers, additional information
- https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic
What to do when you inherit a BSA installation, including “How to” videos: https://communities.bmc.com/communities/community/bsm_initiatives/optimize_it/blog/2012/06/15/taking-the-reins-server-automation
How-to Videos
- Initial Install – Database Setup: On BMCdocs YouTube at http://www.youtube.com/watch?v=91FEUDVD6sE
- Initial Install – File Server and App Server Installs: On Communities YouTube at http://www.youtube.com/watch?v=m7Y3SY23kuQ
- Initial Install – Console GUI and Appserver Config: On Communities YouTube at http://www.youtube.com/watch?v=uwqlj60Lvo0
- Compliance Content Install: On BMCdocs YouTube at http://www.youtube.com/watch?v=bXdaogDsCNc
- Compliance Quick Audit: On BMCdocs YouTube at http://www.youtube.com/watch?v=i8BLi4WAWEY
- BSA 8.2 Patching - Setting Up a Windows Patch Catalog: On Communities YouTube at http://www.youtube.com/watch?v=nfpFpOuub9k
- Windows Patch Analysis: On Communities YouTube at http://www.youtube.com/watch?v=ODWhC01uEaQ
- Patching in Short Maintenance Windows with BMC BladeLogic Server Automation: On Communities YouTube at http://www.youtube.com/watch?v=o6Lfzbb3JZg
- Basic Software Packaging: http://www.youtube.com/watch?feature=player_embedded&v=dtOWTTFqsaY
- SOCKS Proxies: https://communities.bmc.com/communities/community/bmcdn/bmc_service_automation/server_configuration_automation_bladelogic/blog/2012/11/30/how-to-use-socks-proxies-with-bsa-to-deal-with-firewalls-and-overlapping-ip-ranges
Q&A