
Page 1: SCOM Design

No Copyright © 2012 ACME Limited

Project Implementation Architecture (PIA) Document

SCOM – Design

Synopsis: System Center Operations Manager (SCOM) is an end-to-end service monitoring solution that can monitor clients, events, services, applications and network devices. It provides integration with Microsoft products such as Live Communications Server 2005, MS SQL and SharePoint.

Segment: EMEA

Authors: Mr X.
Contributors: Mr Y.
PIA Document Version: 2.3
Document Status: Draft
Date: 24/11/2011
Document Status: 1. Definition Phase Draft
Current Project Phase: Definition

Authorised by:


PIA - SCOM V2.3 Draft


Contents

1. Project Summary
1.1 References
1.2 Change History
1.3 Glossary
2. Business Context
2.1 SCOM Pilot
2.2 Other Key Benefits
2.3 Scope
3. SCOM Architectural Design
3.1 Original Design in ASMB
3.2 Proposed Design
3.3 System Center Operations Manager Data Flow Diagram
3.3a Agent-to-Server Communication
3.4 Previous MOM installation in Interchange
3.5 Microsoft’s Recommendations
3.6 Deployment Plan
3.7 Testing Schedule
4. Project Requirement Analysis
4.1 Requirements
4.2 Admin Requirements
4.3 Equipment List
5. Database Sizing
5.1 Database Sizing and Design
6. Network Recommendations
7. Storage Requirements
8. Security
8.1 McAfee Exclusions for System Center Operations Manager
8.2 Operations Manager 2007 (management servers and agents)
8.3 Exclusion of File Type by Extensions
8.4 Considerations
8.5 Access Control Model
9. Capacity Forecast & Schedule Availability via PAWZ
10. Disaster Recovery Backup & Restore
10.1 Backup SCOM Databases
10.2 Backup Software Requirements
10.3 List of Servers to Be Backed Up
10.4 Clustered SQL Server
11. Maintenance and Best Practices
12. Additional Costs
13. Risks & Issues


1. Project Summary

SCOM is a Microsoft software solution that will provide end-to-end service monitoring of X messaging environments. It provides integration with Microsoft products such as Live Communications Server 2005, MS SQL and SharePoint, as well as with Interchange.

1.1 References

1.2 Change History

Document | Date | Author
1 | |
2 | |
3 | |
4 | |
5 | |

Ver | Date | Author | Key Changes
0.1 | | |
0.2 | | |
0.3 | | |
0.4 | | |


1.3 Glossary

Term | Definition
SCOM | System Center Operations Manager
Collab | Collaboration
LCS | (Microsoft) Live Communications Server
RMS | Root Management Server
ASMB | Assembly Blue
TDP | Tivoli Data Protection
IOPS | Input/Output Operations Per Second
ASM | Assembly Environment
Prod VM LAN | Production Environment Virtual Machine Local Area Network


2. Business Context

GMI currently provides monitoring of the messaging environment; however, its monitoring capabilities are limited. GMI cannot report at the granularity required for critical business systems, nor can it provide a comprehensive view of the health of the environment. SCOM provides tools to manage the environment in its entirety and can integrate with existing tools such as GMI. Key benefits:

• Improve Service Health

Improve service health while driving alignment with business SLAs. SCOM provides easy-to-use reporting and authoring capabilities. It gives full visibility of service health and also allows more proactive monitoring, reducing the likelihood of service-impacting issues.

• Unify Management of Complete Messaging Environment

Visibility across platforms, applications and components. SCOM provides a single view, including application and infrastructure components across Windows and non-Windows environments.

• Dynamically Respond to Changes

SCOM can respond dynamically to changes with automated actions to ensure continued service performance and availability. Remedial actions can be taken directly from the console, making it easy to restore a service to full health in an operationally efficient manner.

2.1 SCOM Pilot

An initial SCOM pilot within the ASMB environment produced the following findings:

• SCOM uses a hierarchical, overall health model in which related alerts are associated with each other to help identify the underlying issue. GMI does not currently offer this.

• SCOM raises alerts on minor and major issues that have not yet become critical or interrupted the normal operation of a configuration item.

• Alerts are visible at the top level of the hierarchy, and it is possible to drill down to the constituent parts causing the alert. Identifying these faulty items allows a failure to be pre-empted before it becomes service-impacting.
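To make the hierarchical health idea concrete, here is a minimal Python sketch of a worst-state rollup with drill-down. It is not SCOM's health model or API: the `Health` and `Component` classes and the component names are hypothetical, and SCOM's own rollup behaviour is defined in its management packs rather than fixed to the simple worst-of rule used here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that max() returns the worst state.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Component:
    """A node in a hypothetical service hierarchy (service -> server -> part)."""
    name: str
    own_state: Health = Health.HEALTHY
    children: list[Component] = field(default_factory=list)

    def rolled_up(self) -> Health:
        # Worst-state rollup: a parent is as unhealthy as its worst descendant.
        return max([self.own_state] + [c.rolled_up() for c in self.children])

    def offenders(self) -> list[str]:
        # Drill down: collect every node whose own state is not healthy.
        found = [self.name] if self.own_state != Health.HEALTHY else []
        for child in self.children:
            found.extend(child.offenders())
        return found


disk = Component("Server A: logical disk C:", Health.WARNING)
server_a = Component("Server A", children=[disk])
database = Component("SQL database")
service = Component("Messaging service", children=[server_a, database])

print(service.rolled_up().name)  # WARNING
print(service.offenders())       # ['Server A: logical disk C:']
```

The warning surfaces at the service level while `offenders()` points at the disk, which is the "alert at the top, drill down to the cause" behaviour the pilot findings describe.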


2.2 Other Key Benefits

• Integration with GMI

Using Microsoft Orchestrator / Opalis, SCOM can integrate with GMI Webtop, taking alerts and passing them to GMI and so eliminating the need for an additional monitor.

• Automation

SCOM has native automation capabilities, which are further enhanced by the presence of Microsoft Orchestrator / Opalis.

• Economy of implementation

Because it uses existing, non-cutting-edge hardware and benefits from lessons learnt and bugs already resolved, this third implementation of the software should be quicker to deliver and more mature.

• Diverse monitoring capability

SCOM can use third-party management packs to monitor many third-party products, such as Solaris, Linux and Cisco switches.

• Built-in knowledge base

The built-in knowledge base describes common problems, how to resolve them and common fixes. It is also extensible and easily accessible through the SCOM Console.

2.3 Scope

In scope:

• SCOM monitoring solution in ASMB

• SCOM monitoring solution in PROD

Out of scope:

• SCOM via Opalis to GMI

• Handover to 2nd Level

• Handover to TPH

• SCOM monitoring solution in ASMB DR

• SCOM monitoring solution in PROD DR

• Anything not listed above as in scope


3. SCOM Architectural Design

3.1 Original Design in ASMB

The diagram above illustrates the original design concept (see Figure 1). This configuration uses a single Root Management Server virtual machine and a single physical database server, chosen for ease of implementation in conjunction with the Lync project. This design did not meet the requirements for deployment into the production environment. Its flaws were:

• Single points of failure

The ASMB concept had two single points of failure, the database and the RMS, neither of which had any inbuilt resiliency (e.g. clustering or additional management servers). This has been addressed in the new design.

• Little scope for growth

There was very little scope to grow the number of monitored systems, as the RMS performed poorly and did not seem able to handle the load efficiently (please see Section 3).

This pilot will remain in ASMB for further analysis during the design and planning stage of the project. Once this stage is complete, the pilot will be replaced with the proposed design, which eliminates the flaws identified in the original design.


3.2 Proposed Design

The new design illustrated above addresses the following:

• High Availability

The clustered configuration of the new design provides extra redundancy in the event of a failure, which the present installation in Assembly Blue does not. The single Root Management Server, single database and single Opalis installation are all single points of failure, which are unacceptable for a production installation.

• Resilience

The new design also has two extra management servers, which provide monitoring redundancy. Active Directory integration makes it possible for agents to fail over automatically to another management server in the management group if their assigned management server fails.

• Use of VMware

The Gateway Server is a virtual machine and benefits from all the resiliency offered by the VMware vSphere infrastructure. It communicates with the management server using certificate-based authentication. If the management server that communicates with the Gateway Server fails, the Gateway Server fails over to the other management server, which then takes over monitoring of the systems in the untrusted domain.
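Agent and Gateway Server failover both follow the same pattern: use the assigned management server, and fall back to an alternative if it fails. The Python sketch below shows only that selection logic, assuming a simple healthy/unhealthy flag per server. `ManagementServer`, `select_server` and the names MS1/MS2 are hypothetical; in SCOM the failover list itself comes from Active Directory integration (agents) or from the gateway's configuration, not from code like this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ManagementServer:
    name: str
    healthy: bool = True


def select_server(primary: ManagementServer,
                  failovers: list[ManagementServer]) -> ManagementServer | None:
    """Return the primary if healthy, otherwise the first healthy failover server."""
    for server in [primary, *failovers]:
        if server.healthy:
            return server
    return None  # no management server available: the agent or gateway is orphaned


ms1 = ManagementServer("MS1")
ms2 = ManagementServer("MS2")

print(select_server(ms1, [ms2]).name)  # MS1
ms1.healthy = False                    # simulate the assigned server failing
print(select_server(ms1, [ms2]).name)  # MS2
```

Returning `None` rather than raising makes the "all management servers down" case explicit. This is the situation the two extra management servers in the new design are meant to make unlikely.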


3.3 System Center Operations Manager Data Flow Diagram

Figure 4: Data Flow Diagram of the SCOM Environment


3.3a Agent-to-Server Communication

The data flow diagram above shows how SCOM monitors managed systems and communicates with the Operations Manager database. The RMS monitors managed systems through the Management Server Action Account and communicates with the database via the SDK and Config account. SCOM uses Management Packs to define Key Performance Indicators for the various types of software it monitors, such as SharePoint or Active Directory. For example, within the Active Directory Management Pack, Microsoft has set thresholds for what it considers normal and abnormal. The agent communicates with the RMS through a heartbeat that beats three times per minute. As soon as the agent detects that a system has fallen outside its normal operating window, it generates an event that is reported back to the management server. Such events can range from software compliance issues to configuration errors; they may not stop the system from running but may impact performance later on.
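The two mechanisms described above, KPI thresholds and the agent heartbeat, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how SCOM is implemented: `ThresholdMonitor`, `agent_unavailable`, the processor thresholds and the tolerated miss count of three are all hypothetical. The heartbeat interval is a parameter, set to 20 seconds in the example to mirror the three-beats-per-minute figure above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class ThresholdMonitor:
    """Hypothetical two-threshold KPI monitor (higher values are worse)."""
    name: str
    warning: float
    critical: float

    def evaluate(self, value: float) -> Health:
        if value >= self.critical:
            return Health.CRITICAL
        if value >= self.warning:
            return Health.WARNING
        return Health.HEALTHY


def agent_unavailable(last_heartbeat: float, now: float,
                      interval: float, allowed_misses: int) -> bool:
    """True once `allowed_misses` whole heartbeat intervals pass without a beat."""
    missed = int((now - last_heartbeat) // interval)
    return missed >= allowed_misses


cpu = ThresholdMonitor("Processor % busy", warning=80, critical=95)
print(cpu.evaluate(50).name, cpu.evaluate(85).name, cpu.evaluate(99).name)
# HEALTHY WARNING CRITICAL

# Times in seconds since the last heartbeat was received.
print(agent_unavailable(0, 50, interval=20, allowed_misses=3))  # False
print(agent_unavailable(0, 65, interval=20, allowed_misses=3))  # True
```

A WARNING result corresponds to the "outside a normal operating window but still running" case described above. Tolerating a few missed heartbeats avoids flagging an agent as down because of a single delayed beat.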

3.4 Previous MOM installation in Interchange

The current MOM configuration does not offer the level of monitoring required to examine the environment and provide proactive support. A number of issues may be going unnoticed and subsequently impacting service. However, an analysis of the previous MOM installation in Interchange showed that parts of it can be salvaged to form the new SCOM design.

The MOM management server will no longer be required, as Interchange will be monitored via the SCOM Gateway Server and the Root Management Server. Agents will need to be reinstalled, as there is no upgrade path from MOM 2005 to SCOM 2007. The previous MOM configuration will then be decommissioned.

3.5 Microsoft’s Recommendations

SCOM via VM

Running SCOM on VMs is not a solution recommended by Microsoft. The present pilot setup includes one Root Management Server on a VM and one database server on a physical machine. Microsoft recommend against using VMs for either of these server components, for performance reasons.

Sluggish performance on a VM

Using VMs has an impact on response times, even though the network is more than able to support SCOM and the database had been tuned to use separate physical disks. This is because the SCOM Root Management Server has high IOPS requirements, which, according to Microsoft, make it unsuitable for VMs.

Fault tolerance


Microsoft recommend adding separate management servers specifically to pass data to the database, easing the load on the Root Management Server. We have also opted to cluster both the SCOM database servers and the Root Management Servers for added fault tolerance. Opalis will be installed on the Root Management Server, as in the old design; because it will run on a cluster, Opalis will benefit from the fault tolerance that was missing from the old model.

Untrusted Domain

There is no firewall between Interchange and the RMS, but because Interchange is in another domain, a gateway server is required. The native SCOM Gateway provides secure, encrypted communication between Messaging and Interchange using mutual certificate authentication over port 5723.
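A quick pre-deployment check is to confirm that TCP port 5723 can be reached from the Gateway Server to its management servers. The Python sketch below tests only TCP reachability; it says nothing about whether the certificate-based mutual authentication will succeed. `port_reachable` is a hypothetical helper, not a SCOM tool.

```python
import socket

SCOM_PORT = 5723  # port named in this design for gateway-to-management-server traffic


def port_reachable(host: str, port: int = SCOM_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, DNS failure or timeout
        return False
```

For example, running `port_reachable("ms1.example.internal")` on the Gateway Server (host name hypothetical) against each management server in turn should return `True` for both servers, because the gateway needs both of them for failover.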

3.6 Deployment Plan

Stage Name | Days | Start Date | End Date

Initiation: Objectives, Scope, Stakeholders, PIA Document | 5 | 21/11/2011 | 25/11/2011

Planning & Design: H/W & S/W Requirements, High Level Design, Capacity, Maintenance, Storage, Database, Network, Security, Access Control Model, Test Plan | 25 | 28/11/2011 | 04/01/2012

PHASE 1: Implementation ASMB (Base Build, Installation, Configuration, Testing) and PHASE 2: Implementation PROD (Base Build, Installation, Configuration, Testing) | 30 | 03/01/2012, 09/01/2012 | 10/02/2012, 24/02/2012

PHASE 3: SCOM via Opalis to GMI, Handover to 2nd Level, Handover to TPH | Out of scope | TBC | TBC

ASMB DR Build, PROD DR Build | Out of scope | TBC | TBC

Closure: Training, Documentation, DR Documentation/Update, Project Closure Doc | 10 | March 2012 | April 2012
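If the Days column counts UK working days, it can be checked against the start and end dates. The Python sketch below counts Monday-to-Friday days inclusively and skips a supplied holiday list. The holiday set, the UK bank holidays inside the Planning & Design window, is an assumption added here, and `working_days` is a hypothetical helper.

```python
from datetime import date, timedelta


def working_days(start: date, end: date, holidays: frozenset = frozenset()) -> int:
    """Count Monday-to-Friday days from start to end inclusive, skipping listed holidays."""
    if end < start:
        raise ValueError("end date is before start date")
    count = 0
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:  # weekday(): Monday=0 ... Sunday=6
            count += 1
        day += timedelta(days=1)
    return count


# Assumption: UK bank holidays falling inside the Planning & Design stage.
UK_HOLIDAYS = frozenset({date(2011, 12, 26), date(2011, 12, 27), date(2012, 1, 2)})

print(working_days(date(2011, 11, 21), date(2011, 11, 25)))             # 5
print(working_days(date(2011, 11, 28), date(2012, 1, 4), UK_HOLIDAYS))  # 25
```

Both results match the Days column for the Initiation and Planning & Design stages. The same call can be used to check later phases once their dates are confirmed.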

3.7 Testing Schedule

Test | Reason for Test | Expected Result | Result Achieved | Success/Failure

Verify RMS and SQL cluster failover | To test cluster failover and check service availability | There should be no interruption to services on the RMS server and the SQL server | |

Agent failover on monitored systems | To test whether monitored systems fail over to an alternate management server if the primary management server goes down | Monitored systems should fail over to the secondary management server when the first management server is switched off | |

Cumulative update process for clustered RMS | To test the cumulative update process for the RMS and verify that the RMS is at the correct patch level | Cumulative Update 5 should be applied to the RMS cluster | |

Analyse events from managed systems | To compare events generated in SCOM with events from GMI in a side-by-side comparison | There should be more granularity in the SCOM events and more proactive monitoring | |


4. Project Requirement Analysis

4.1 Requirements:

ID System Requirements

001 The system SHALL provide end-to-end service monitoring of the messaging environments.

002 The system SHALL integrate with existing monitoring tools.

003 The system SHALL reduce the number of alerts from Microsoft products such as Live Communications Server 2005, MS SQL and SharePoint.

004 The system SHALL improve the maintenance life cycle of all associated configuration items.

005 The system SHALL replace MOM with SCOM in the Interchange ASMB and Production environments.

006 The system SHALL provide better visibility of issues before they become critical.

ID Operational Requirements

007 The system SHALL provide a high-availability solution.

008 The system SHALL provide resilience.

009 The system SHALL have external support agreements.

010 The system SHALL have backup and restore processes.

ID Environment Requirements

011 The system SHALL adhere to X hardware and software standards.

012 The system SHALL include a test/non-production environment.

ID Strategic Requirements

013 The system SHALL be scalable for X growth.

4.2 Admin Requirements

There are no direct administration requirements for the system other than the predefined static configuration files that are part of the overall installation package. All user details and access should be managed by external systems (Collaboration Admin Portal / Siebel / eSpresso / AD).

4.3 Equipment List

LCS | Production | ASMB

Root Management Server | Clustered: 2x HP DL360 G5, 16 GB RAM min., 60 GB native hard disks, 136 GB SAN storage | Clustered/non-clustered: 2/1x HP DL360 G5, 16 GB RAM min., 60 GB native hard disks, 136 GB SAN storage

DB Server | Clustered: 2x HP DL585 G7, 32 GB RAM, 60 GB native hard disks, 500 GB SAN storage (see sizing based on 30 days' data and 500 agents) | Clustered/non-clustered: 2/1x HP DL585 G5, 32 GB RAM, 60 GB native hard disks, 500 GB SAN storage (see sizing based on 30 days' data and 500 agents)

Management Servers | Non-clustered: 2x HP DL360 G5 or equivalent, 60 GB native hard disks, 12 GB RAM min. | Non-clustered: 2x HP DL360 G5 or equivalent, 60 GB native hard disks, 12 GB RAM min.

Gateway Server | 1x virtual machine, 8 GB RAM, 60 GB HDD min. | 1x virtual machine, 8 GB RAM, 60 GB HDD min.


5. Database Sizing

5.1 Database Sizing and Design

The following table shows the amount of space used by the SCOM SQL databases in the present proof of concept in ASMB. These figures were used to work out how much space would be required for a SCOM environment with 500 agents.

Database disk space estimation:

Agent count: 277 physical servers + 125 VMs + 26 Interchange servers. The actual number will be nearer 420 agents, but to leave room for error and growth we will work with a figure of 500 agents.

DB Component | Grooming Interval (days) | Present Disk Space, 40 Agents (MB) | Estimated Disk Space, 500 Agents (40 Agents × 12.5 = 500)

Database

1) Ops Mgr DW DB | 90 | 26796 / 16110 MB | 201375 + 80560 (+40%) = 282 GB
2) Ops Mgr DB | 7 | 2800 MB | 35000 + 14000 (+40%) = 49 GB
3) Opalis DB | N/A | 355 MB | 4437.5 + 1775 (+40%) = 7 GB
4) Reports Server DB | 7 | 8.87 MB | 0.10 GB
5) Reports Server Temp DB | 7 | 3.5 MB | 0.04 GB
SAN Mount Point 1 | | | 340 + 68 (+20%) = 408 GB

6) System Master DB | N/A | 4.0 MB | 5 GB
7) System Model DB | N/A | 1.3 MB |
8) MSDBData DB | N/A | 13 MB |
9) System Temp DB | N/A | 235 MB | 25 GB
SAN Mount Point 2 | | | 30 + 6 (+20%) = 36 GB

Transaction Logs | Autogrow Increment | Size

10) Ops Mgr DW DB Transaction Logs | 200 MB | 130 GB
11) Ops Mgr DB Transaction Logs | 200 MB | 30000 MB
12) Opalis DB Transaction Logs | 200 MB | 435 MB
13) Reports Server DB Transaction Logs | 200 MB | 22 MB
14) Reports Server Temp DB Transaction Logs | 200 MB | 35 MB
15) Master DB Transaction Logs | 200 MB | 1 MB
16) Model DB Transaction Logs | 200 MB | 0.5 MB
17) Temp DB Transaction Logs | 200 MB | 77 MB
18) MSDB Data DB Transaction Logs | 200 MB | 7 MB
SAN Mount Point 3 | | 130 + 26 (+20%) = 156 GB

Quorum Drive

SAN Mount Point 4 | | 0.5 GB

Required disk space total: circa 600 GB

http://www.simple-talk.com/sql/database-administration/estimating-disk-space-requirements-for-databases/
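The mount-point totals in the table can be reproduced with a short Python calculation: a 20% growth buffer on each data mount point, plus the 0.5 GB quorum drive. The mount-point labels are descriptive names added here, and `with_buffer` is a hypothetical helper. The last line also applies the further 20% buffer on the 600 GB total that section 7 proposes.

```python
BUFFER_PERCENT = 20  # growth buffer applied to each data mount point in the table

# Base estimates in GB, taken from the sizing table above.
mount_points_gb = {
    "SAN Mount Point 1 (SCOM, Opalis and reporting databases)": 340,
    "SAN Mount Point 2 (SQL system databases)": 30,
    "SAN Mount Point 3 (transaction logs)": 130,
}
QUORUM_GB = 0.5  # SAN Mount Point 4, listed without a buffer


def with_buffer(size_gb: float, percent: int = BUFFER_PERCENT) -> float:
    """Return size plus `percent`% of size (integer-first order keeps these sums exact)."""
    return size_gb + size_gb * percent / 100


sized = {name: with_buffer(gb) for name, gb in mount_points_gb.items()}
total_gb = sum(sized.values()) + QUORUM_GB

for name, gb in sized.items():
    print(f"{name}: {gb:g} GB")                                  # 408, 36 and 156 GB
print(f"Total: {total_gb:g} GB")                                 # Total: 600.5 GB
print(f"With proposed extra buffer: {with_buffer(600):g} GB")    # 720 GB
```

The computed 600.5 GB matches the "circa 600 GB" total. Keeping the buffer as a parameter makes it easy to rerun the sizing if the agent count or growth assumption changes.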

6. Network Recommendations

Network usage and response times were analysed using a one-hour snapshot of data from the pilot deployment of SCOM in ASMB. It is recommended that two devices share the same access switch, so that traffic is carried on the switch fabric rather than across the LAN via the distribution switch. The analysis shows a risk of performance being impacted by network latency; co-locating the devices would eliminate this risk. For details of the findings and the network requirements analysis, please refer to the following document.

7. Storage Requirements

We assume the disk subsystems have at least the following capability:

• 125 random I/O operations per second per drive.
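With a per-drive figure of 125 random IOPS, the minimum number of drives for a given workload is a ceiling division. The Python sketch below shows that arithmetic only. The 1000 IOPS workload is a hypothetical figure, not a measured SCOM requirement, and the calculation deliberately ignores RAID write penalties and controller caching, both of which change the real answer.

```python
import math

IOPS_PER_DRIVE = 125  # minimum random IOPS per drive assumed in this section


def drives_needed(required_iops: float, iops_per_drive: float = IOPS_PER_DRIVE) -> int:
    """Smallest whole number of drives whose combined random IOPS meets the requirement."""
    if required_iops <= 0:
        return 0
    return math.ceil(required_iops / iops_per_drive)


# Hypothetical workload figure, for illustration only.
print(drives_needed(1000))  # 8
```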

The database disks were sized based on the number of SCOM agents, file sizes, 5 months of data, projected growth and data retention values in the present pilot in ASMB.

For example, the Operations Manager data warehouse file is currently 26796 MB. Since July we have collected about 150 days (30 × 5) of data, but we only wish to keep 90 days' worth.

26796 / 150 = 179 MB (data per day)

90 × 179 = 16110 MB (90 days' worth of data)

To support 500 agents, this number is multiplied by 12.5:

16110 × 12.5 = 201375 MB

Microsoft recommend adding 40% to this figure for operations such as indexing:

201375 / 100 ≈ 2014 (1%)

2014 × 40 = 80560 (40% of 201375)

201375 + 80560 = 281935 MB

This final figure of 282 GB is the projected disk space needed for the SCOM data warehouse.
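The same data warehouse projection can be expressed as a small Python function that follows the steps above in order. Like the text, it rounds daily growth to whole megabytes and treats 1 GB as 1000 MB. The function name and its parameters are hypothetical.

```python
def project_dw_size_mb(current_mb: float, days_collected: int, retention_days: int,
                       agents_now: int, agents_target: int, overhead_percent: int) -> float:
    """Project data warehouse size using the steps in section 7."""
    per_day_mb = round(current_mb / days_collected)        # 26796 / 150 -> 179 MB/day
    retained_mb = per_day_mb * retention_days               # 179 x 90 -> 16110 MB
    scaled_mb = retained_mb * agents_target / agents_now    # x 12.5 -> 201375 MB
    return scaled_mb + scaled_mb * overhead_percent / 100   # + 40% -> 281925 MB


total_mb = project_dw_size_mb(26796, days_collected=150, retention_days=90,
                              agents_now=40, agents_target=500, overhead_percent=40)
print(total_mb, round(total_mb / 1000))  # 281925.0 282
```

The text's 281935 MB is 10 MB higher because 1% of 201375 was rounded up to 2014 before it was multiplied by 40. Both figures round to 282 GB.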


In general there is less activity in Assembly Blue than in Production, but we have no yardstick to measure this by, so I propose adding a further 20% buffer on top of the final total SAN disk space required (600 GB). Our DBA was also consulted about the disk space requirements for the database files and transaction logs, and the following websites were used:

http://www.simple-talk.com/sql/database-administration/estimating-disk-space-requirements-for-databases/

http://technet.microsoft.com/en-us/library/bb735402.aspx

8. Security

8.1 McAfee Exclusions for System Center Operations Manager

For System Center Operations Manager to run effectively, the following exclusions need to be configured on the McAfee ePolicy Orchestrator server and applied to the managed (monitored) systems as well as to the management servers and the RMS.

8.2 Operations Manager 2007 (management servers and agents):

These include the queue and log files used by Operations Manager.

The following paths need to be excluded:

C:\Program Files\System Center Operations Manager 2007

D:\Program Files\System Center Operations Manager 2007\Health Service State\Health Service Store

D:\Program Files\System Center Operations Manager 2007

8.3 Exclusion of File Type by Extensions

SQL Database Servers: These include the SQL Server database files used by Operations Manager components, as well as system database files for the master database and tempdb. Examples: MDF, LDF.

Operations Manager 2007 (management servers and agents): These include the queue and log files used by Operations Manager. Examples: EDB, CHK, LOG.

SQL Database Servers: These include the SQL Server database files used by Operations Manager components, as well as system database files for the master database and tempdb. To exclude these by directory, exclude the directories containing the LDF and MDF files. Examples:

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data

D:\MSSQL\DATA

E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log
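To sanity-check whether a given file would be covered by the exclusions above, the matching rules can be sketched in Python. This only illustrates the logic. The exclusions themselves are configured in McAfee ePolicy Orchestrator, not in code. For brevity the sketch merges the per-role lists (SQL servers versus management servers and agents) into one set. `is_excluded` and `_folded_parts` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import PureWindowsPath

# Directory and extension exclusions listed in sections 8.2 and 8.3.
EXCLUDED_DIRS = [
    r"C:\Program Files\System Center Operations Manager 2007",
    r"D:\Program Files\System Center Operations Manager 2007",
    r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data",
    r"D:\MSSQL\DATA",
    r"E:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log",
]
EXCLUDED_EXTENSIONS = {".mdf", ".ldf", ".edb", ".chk", ".log"}


def _folded_parts(path: str) -> tuple[str, ...]:
    # Windows paths are case-insensitive, so compare lower-cased components.
    return tuple(part.lower() for part in PureWindowsPath(path).parts)


def is_excluded(path: str) -> bool:
    """True if the file has an excluded extension or lies under an excluded directory."""
    if PureWindowsPath(path).suffix.lower() in EXCLUDED_EXTENSIONS:
        return True
    target = _folded_parts(path)
    return any(target[:len(prefix)] == prefix
               for prefix in map(_folded_parts, EXCLUDED_DIRS))
```

Matching on whole path components rather than string prefixes means a sibling folder such as `...\System Center Operations Manager 2007 R2` is not accidentally treated as excluded.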

8.4 Considerations

A Group Policy will be required to place the agent action account into the local Administrators group.

Testing Guide

• Test and verify cluster failover configuration on both the SCOM database and the RMS servers

• Test agent failover configuration

• Test cumulative update process for clustered root management server (Please refer to patching guide for further details)

• Analyse events from managed systems

8.5 Access Control Model

Required Accounts to be Created including Group Policy and Management Packs

The table below shows the accounts, permissions and group memberships required for the SCOM deployment, including any Group Policy changes. Management Pack account requirements are also included.

SCOM Account | Permissions | Group Memberships and Extra Rights | Use

Management Server Action Account (MSAA): SCOMMSAA | Local Admin Access, Allow Log On Locally (globally via Group Policy) | Local Users group, Local Performance Users group; password never expires, user cannot change password | Collect information and run tasks on managed systems

System Center Data Access: SCOMDA | Local Admin Access, Allow Log On Locally (globally via Group Policy) | Local Users group, Local Performance Users group; register the SPN with Active Directory (grant the service account's SELF property the right to register and update SDK); password never expires, user cannot change password | Collect information and run tasks on managed systems

System Center Management Configuration (data access and config): SCOMCDA | Local Admin Access globally (via Group Policy) | Password never expires, user cannot change password | Runs services and writes data to the Operational Database

Data Reader: SCOMDR | Local Admin on SCOM DB | Password never expires, user cannot change password | Query Reporting Services database

Data Warehouse Write Action: SCOMDWWA | Local Admin on SCOM DB | Password never expires, user cannot change password | Writes data to the data warehouse databases

MS SQL Server Action Account: SCOMSQL | Local Admin Access globally (via Group Policy) | Password never expires, user cannot change password | Collect information and run tasks on managed SQL systems

SharePoint Agent Action Account: SCOMSPAA | Local Admin Access | Password never expires, user cannot change password | Collect information and run tasks on managed SharePoint systems (to be confirmed)

SharePoint Pool: SCOMSPP | Local Admin Access | Password never expires, user cannot change password | Advanced tasks for SharePoint (to be confirmed)

SharePoint Config: SCOMSPC | Local Admin Access | Password never expires, user cannot change password | Runs services and writes data to the Operational Database (to be confirmed)

Operations Manager Administrators accounts: SCOMADMIN | On all SCOM servers (via Group Policy) | | Administrator accounts; specific accounts to be confirmed
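The "password never expires" requirement can be audited from each account's Active Directory `userAccountControl` value. The Python sketch below decodes that value using the flag constants Microsoft documents for the attribute. "User cannot change password" is not reliably reflected in `userAccountControl` (it is governed by permissions on the account object), so the sketch does not check it. The UAC values shown are hypothetical, and reading them from AD, for example over LDAP, is left out.

```python
# userAccountControl flag values as documented by Microsoft for Active Directory.
ACCOUNTDISABLE = 0x0002
NORMAL_ACCOUNT = 0x0200
DONT_EXPIRE_PASSWORD = 0x10000


def password_never_expires(uac: int) -> bool:
    return bool(uac & DONT_EXPIRE_PASSWORD)


def account_enabled(uac: int) -> bool:
    return not (uac & ACCOUNTDISABLE)


# Hypothetical userAccountControl values for two of the service accounts above.
accounts = {
    "SCOMMSAA": NORMAL_ACCOUNT | DONT_EXPIRE_PASSWORD,  # 66048: compliant
    "SCOMDR": NORMAL_ACCOUNT,                           # 512: password will expire
}

for name, uac in accounts.items():
    status = "OK" if password_never_expires(uac) and account_enabled(uac) else "CHECK"
    print(f"{name}: uac={uac} {status}")
# SCOMMSAA: uac=66048 OK
# SCOMDR: uac=512 CHECK
```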

The following table shows the groups and members that are required by SCOM, including any Group Policy changes necessary.

SCOM Group | Members | Local Admin Access | Miscellaneous

Operations Manager Administrators Group | SCOMADMIN and any individual accounts for Collab 2nd level | On all SCOM servers |


The following table shows Active Directory requirements for the SCOM deployment.

Additional | Function | Notes

OPSMGR | Organisational Unit in Active Directory for general | Required

OPSMGRSQL | Organisational Unit in Active Directory for MS SQL | To be confirmed

OPSMGRSP | Organisational Unit in Active Directory for SharePoint | To be confirmed

Collaboration Support Team 2nd Level Access to the SCOM Administrator Console

For now, access to SCOM for the Collaboration Support Team will be provided via RDTABs, as with all other monitoring tools. All Collaboration Support Team members will have full SCOM administrator access and all the rights that go with this level of access.

Additional Management Packs to be imported

Management Packs covering a range of functions are installed by default; these are out of scope for this document. The following additional Management Packs are to be imported into SCOM. Their requirements, if any, are covered in the previous section, “Required Accounts to be Created including Group Policy and Management Packs”:

• Microsoft SharePoint 2010 Products

• Microsoft SharePoint Foundation 2010

• Microsoft.Office.LiveCommunicationsServer.2005

• Microsoft.SQLServer.2008.Monitoring

• SQL Server 2005 (Monitoring)

• Windows Server 2003 Operating System

• Windows Server 2008 Operating System

• Windows Server 2008 Cluster Management

• Windows Server 2003 Cluster Management

• Windows Server Internet Information Services 2003

• The core files and libraries on which all of the above depend


9. Capacity Forecast & Schedule Availability via PAWZ

Capacity management is implemented through the standard X PAWZ trend analysis tool. Metrics should be reviewed monthly through the standard “under watch” process. This should result in a monthly report issued by the Global Capacity Management group, followed by a review session to discuss any negative trends and the resulting mitigation activities. All servers will have at least 5GB of free space allocated for performance logs. This section remains a work in progress, because the old SCOM pilot in the ASMB environment cannot be used to collect accurate metrics for the Capacity Risk Proforma. Accurate metrics will be available once the SCOM deployment in ASMB is complete.
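The 5GB performance-log requirement above lends itself to a simple per-server check. The Python sketch below is illustrative only: it assumes "5GB" means 5 GiB (5 × 1024³ bytes) and treats the free space on the volume holding the performance logs as a proxy for the allocation; the path you pass in is a placeholder for wherever the logs actually live.

```python
import shutil

# Assumption: "5GB" in the design is read as 5 GiB.
REQUIRED_FREE_BYTES = 5 * 1024 ** 3


def has_perf_log_headroom(free_bytes: int, required_bytes: int = REQUIRED_FREE_BYTES) -> bool:
    """Return True if the free space meets the performance-log requirement."""
    return free_bytes >= required_bytes


def check_volume(path: str) -> bool:
    """Check the volume that contains `path` (a placeholder perf-log location)."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return has_perf_log_headroom(usage.free)
```

For example, check_volume(r"D:\PerfLogs") would report whether that (hypothetical) volume still has the required headroom; the result could feed the monthly "under watch" review rather than replace PAWZ.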

10. Disaster Recovery Backup & Restore

This section outlines the process required to back up and restore the clustered Root Management Server (RMS) and the clustered SQL database.

Tivoli Storage Manager (TSM) will be used to back up the SCOM environment. The backup requirements for a clustered SCOM deployment are as follows:

10.1 Backup SCOM Databases

• OperationsManager (Operational Database)

• OperationsManagerDW (Data Warehouse Database)

• Ops Mgr DB Transaction Logs

• Ops Mgr DW DB Transaction Logs

• Report Server (Reporting Server Database)

• Report Server TempDB (Reporting Server Temporary Database)

• Master (SQL Server Master Database)

• MsDbData (msdb database)

Other components

• Internet Information Services (IIS) 7.0 Metabase

• Internet Information Services (IIS) 7.0 configuration

• Root Management Server Encryption Key

• List of all installed Management Packs (stored on the shared network drive)

• Backup of unsealed (customised) Management Packs

• SCOM Registry Keys

10.2 Backup Software Requirements

Each server requires the TSM Backup-Archive client to be installed. The clustered SQL database servers and the clustered RMS servers also require the TSM SAN storage client.

10.3 List of servers to be backed up

• Clustered Root Management Server x2


• Clustered SQL Server x2

Backup requirements for the Root Management Server cluster

The RMS cluster keeps its shared resources on a shared drive; the following files need to be backed up:

• Root Management Server encryption key: this key allows the management group to function.

• All currently installed Management Packs: the PowerShell command Get-ManagementPack | Export-Csv c:\ManagementPackList-Nov-2011.csv generates a comprehensive list of the Management Packs that SCOM is currently using. All unsealed Management Packs also need to be backed up; these can be exported to the shared network drive within the cluster on a regular basis.

• Gateway server certificate: this should be exported to the shared network drive so that it can be backed up.

• SCOM registry key: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\. Export this key to a folder on the shared network drive within the cluster; it can then be backed up by TSM with the rest of the files and restored to a new installation of SCOM if necessary.
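Once the Management Pack list has been exported, the unsealed packs that need a separate backup can be picked out of the CSV. The Python sketch below is a hedged illustration: it assumes the CSV contains Name and Sealed columns (taken from the properties of the objects returned by Get-ManagementPack), with Sealed written as True or False, and it skips the #TYPE line that Export-Csv writes by default in Windows PowerShell.

```python
import csv
from typing import Iterable, List


def unsealed_packs(lines: Iterable[str]) -> List[str]:
    """Return the names of unsealed Management Packs from Export-Csv output.

    Assumes 'Name' and 'Sealed' columns, with Sealed as "True" or "False".
    """
    # Drop the "#TYPE ..." line that Export-Csv adds unless -NoTypeInformation is used.
    data = [line for line in lines if not line.startswith("#TYPE")]
    reader = csv.DictReader(data)
    return [row["Name"] for row in reader
            if row.get("Sealed", "").strip().lower() == "false"]


def unsealed_packs_from_file(path: str) -> List[str]:
    # utf-8-sig tolerates a byte-order mark at the start of the file, if present.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return unsealed_packs(handle)
```

Running unsealed_packs_from_file against the exported list gives the set of customised packs to export to the shared drive alongside the other RMS files.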

10.4 Clustered SQL Server

• All of the SCOM databases listed in section 10.1 must be backed up by TSM.

Management Servers

The management servers can be rebuilt fairly quickly, so no backups of these servers are required. Active Directory integration will be used to allow agent failover: if a management server goes down, the secondary server takes over the systems that the failed server was monitoring.
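The failover behaviour described above can be pictured with a toy model. The Python sketch below is not how SCOM implements failover internally; it simply assumes that each agent has an ordered list of management servers (primary first) and reports to the first one that is available. All agent and server names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def current_server(failover_order: List[str], down: Set[str]) -> Optional[str]:
    """Return the first available management server in the agent's list."""
    for server in failover_order:
        if server not in down:
            return server
    return None  # no management server reachable


def assignments(agents: Dict[str, List[str]], down: Set[str]) -> Dict[str, Optional[str]]:
    """Map each agent to the management server it would currently report to."""
    return {agent: current_server(order, down) for agent, order in agents.items()}


# Hypothetical agents, each with a primary and a secondary management server.
AGENTS = {
    "SQLNODE1": ["MS01", "MS02"],
    "SPWEB1": ["MS02", "MS01"],
}
```

With MS01 down, assignments(AGENTS, {"MS01"}) moves SQLNODE1 to MS02 and leaves SPWEB1 on MS02, which is why a failed management server can simply be rebuilt rather than restored.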

Gateway Server

The gateway server does not require backing up, as it will be a virtual machine and will benefit from the built-in resiliency of VMware. If, however, the gateway server needs to be restored, the certificate stored on the Root Management Server will need to be imported into the new gateway server.

X Backup Strategy for SCOM, as Recommended by the Database Administrator

• Weekly: full backup of all SCOM databases

• Daily: differential backup of all SCOM databases

• Hourly: log backup in between database differential backups
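Under this weekly/daily/hourly scheme, a restore needs the most recent full backup, the most recent differential taken after it (differentials are cumulative since the last full backup), and every log backup taken after that differential. The Python sketch below works out that chain for restoring to the latest backup taken at or before a target time; point-in-time recovery inside a log backup is left out for simplicity. The schedule is hypothetical, and the sketch assumes the databases use the full recovery model, since SQL Server only allows log backups under the full or bulk-logged recovery model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str  # "full", "diff" or "log"
    taken: datetime


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Backups needed to restore to the latest backup point at or before `target`."""
    usable = sorted((b for b in backups if b.taken <= target), key=lambda b: b.taken)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    full = fulls[-1]
    # Differentials are cumulative since the last full, so only the newest is needed.
    diffs = [b for b in usable if b.kind == "diff" and b.taken > full.taken]
    chain = [full] + diffs[-1:]
    base = chain[-1].taken
    # Every log backup after the base must then be restored, in order.
    chain += [b for b in usable if b.kind == "log" and b.taken > base]
    return chain


# Hypothetical week: full on Sunday 00:00, differentials at 00:00 Monday to
# Saturday, and hourly log backups in between.
week_start = datetime(2011, 11, 20)  # a Sunday
SCHEDULE = [Backup("full", week_start)]
SCHEDULE += [Backup("diff", week_start + timedelta(days=d)) for d in range(1, 7)]
SCHEDULE += [Backup("log", week_start + timedelta(hours=h))
             for h in range(1, 24 * 7) if h % 24]
```

Restoring to Tuesday 05:00 in this schedule needs the Sunday full backup, the Tuesday 00:00 differential and the five log backups from 01:00 to 05:00, so the worst-case restore chain stays short.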

Please refer to the xxx for further details on the specific backup process.

11. Maintenance and Best Practices

System Centre Operations Manager 2007 will begin to accumulate large amounts of data once it has been deployed into the X Messaging Environment.


To limit service interruption and protect our Operations Manager environment, we will develop and implement a comprehensive and effective maintenance plan. This plan is based mainly on Microsoft best practices, external website sources and our own experience, and aims to ensure effective ongoing maintenance of our Operations Manager environment, improving performance and minimising the chance of failure.

Our maintenance will include the following:

• Regular monitoring of both software and hardware.

• Frequent backups of databases and other critical data so that they can be restored in case of failure.

Please refer to the xxx for further details on the maintenance schedule and best practice recommendations by Microsoft.

12. Additional Costs

Name of item | Cost per unit | Total
Server RAM upgrades for 8 x DL360 (HP Part# 397415-B21) | £500 | £4000

13. Risks & Issues

• A single virtual Gateway server will be deployed to take advantage of the built-in resiliency of VMware, specifically dynamic virtual hardware provisioning and the ability to create snapshots. These features allow rapid recovery from most failures, so relying on a single Gateway server is regarded by us as an accepted risk.

• Staff annual leave at this time of year may delay the delivery of key milestones.

• BAU work takes priority over project work; in the event of a major incident, resources may be required to assist in service restoration.