Transcript
Page 1: Is your data center on the verge of a crisis?

© 2014 Uptime Institute

Is your data center on the verge of a crisis?

Julian Kudritzki Chief Operating Officer

Uptime Institute

Page 2: Is your data center on the verge of a crisis?

What Defines a Crisis?


Page 3: Is your data center on the verge of a crisis?

Tour of Operational Computer Room


Page 4: Is your data center on the verge of a crisis?

Looking for Clues


Page 5: Is your data center on the verge of a crisis?

Tour of ‘Live’ Critical Spaces


Page 6: Is your data center on the verge of a crisis?

Daily Practices Compromise Uptime, Safety, and Security


Page 7: Is your data center on the verge of a crisis?

• Overtime hours exceeding 10%
• Voice mail boxes full
• Emails not responded to
• Email inbox size limit exceeded
• Meetings missed or routinely cancelled
• No time for training
• Shortage of qualified staff
• Personnel performing work outside their competency
• Everything is an emergency
• Personnel turnover

What Else Is Going On?


Page 8: Is your data center on the verge of a crisis?

• Break fix budget exceeded
• Maintenance budget exceeded
• Energy cost estimate exceeded or unknown
• Last minute deployment requirements
• No organization chart
• No responsibilities matrix
• No records of maintenance activities
• No written policies & procedures
• No preventive maintenance schedule
• Back of the server looks like a spaghetti pot exploded

The Issues Add Up


Page 9: Is your data center on the verge of a crisis?

• Cabling is not labeled or, worse, incorrectly labeled
• Equipment is not uniquely labeled
• Loads are consistently out of balance
• Capacities are not managed or tracked
• Deferred maintenance exceeds 10%
• Housekeeping: if it looks like a mess, it is a mess

Maybe you don’t have a crisis, but how do you know how well your data center operation compares to the rest of the industry?

The Issues Add Up
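
The 10% thresholds cited in this deck (overtime hours, deferred maintenance) are simple ratios, so they can be checked mechanically. The Python sketch below is one minimal way to do that. The function name, the sample figures, and the way each ratio is formed (overtime hours over regular hours, deferred tasks over scheduled tasks) are illustrative assumptions, not Uptime Institute definitions.

```python
def exceeds_limit(part: float, whole: float, limit: float = 0.10) -> bool:
    """Return True when part / whole is strictly above limit (default 10%)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole > limit


# Hypothetical monthly figures, for illustration only.
overtime_flag = exceeds_limit(part=190, whole=1600)  # overtime hours vs. regular hours
deferred_flag = exceeds_limit(part=6, whole=80)      # deferred PM tasks vs. scheduled PM tasks
```

With these made-up figures, overtime is flagged and deferred maintenance is not. A site would substitute numbers from its own timekeeping and maintenance records.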


Page 10: Is your data center on the verge of a crisis?

Are you confident in your Facilities team’s capability to manage a technologically advanced and highly efficient design to your 24 x 7 uptime requirements?

• Can you easily replace any member of that team?
• Are you protected against poor operations practices migrating from older sites to higher criticality data centers?
• Do you have sites that operate in isolation, ignoring global corporate standards?
• Do you even have corporate global standards?
• If you outsource any aspect of your data center operations, how do you avoid losing responsibility and accountability?
• Do you manage an outsourcing contract... or direct an expert team?

Ask the Tough Questions


Page 11: Is your data center on the verge of a crisis?

• Initial review
• Gap analysis against industry best practices
  § Staffing and Organization
  § Maintenance
  § Training
  § Planning, Coordination & Management
  § Operating Conditions
• Roadmap to operational excellence
• Plan changes
• Implement changes
• Monitor & refine
• Annual review

Path to Data Center Operations Success


Page 12: Is your data center on the verge of a crisis?

Key Elements of Facilities Management

Staffing and Organization
• Staffing
• Qualifications
• Organization

Maintenance
• Preventive Maintenance (PM) Program
• Housekeeping Policies
• Maintenance Management System (MMS)
• Vendor Support
• Deferred Maintenance Program
• Predictive Maintenance
• Life-Cycle Planning
• Failure Analysis Program


Page 13: Is your data center on the verge of a crisis?

Key Elements of Facilities Management

Training
• Data Center Staff
• Vendors

Planning, Coordination, and Management
• Site Policies
• Financial Management
• Reference Library
• Computer Room Management

Operating Conditions
• Load Management
• Operating Set Points
• Alternating Use of Infrastructure Equipment


Page 14: Is your data center on the verge of a crisis?

Over the years, Uptime Institute has observed that management issues pose the largest risk to physical infrastructure uptime
• Inadequate staffing
• Ineffective or nonexistent maintenance and training programs
• Lacking processes and procedures
• Resulting in the majority of outages being caused by ‘human error’

No standard existed to help Owners/Operators determine
• Common language/vocabulary of data center operations
• Focus of data center management
• Resource allocation
• Resource requirements

Genesis of Industry Best Practices


Page 15: Is your data center on the verge of a crisis?

Data Center Owners / Operators / End Users
• Increased availability and cost savings
• Multi-site consistency
• Benchmark for continuous monitoring and refinement

Colocation / Managed Services Sites
• All of the above plus…
• Customer assurance of consistency
• Competitive differentiator (attain & retain certification)

Industry Benchmark
• No need to rely on opinions and anecdotes

Value of Industry Best Practices


Page 16: Is your data center on the verge of a crisis?

Uptime Institute has been conducting Operational Sustainability Reviews for approximately 3 years, based upon decades of site operations knowledge and experience:
• Operational Sustainability Certifications: Tier + Gold, Silver, or Bronze
• Management & Operations (M&O) Stamps of Approval

See http://uptimeinstitute.com/publications for Tier Standard: Operational Sustainability

Best Practices Reviews


Page 17: Is your data center on the verge of a crisis?

Staffing
• Inadequate staffing
• Excessive overtime (over 10%)
• No escalation process

Qualification
• No list of required qualifications
• No experience with data center specific equipment

Organization
• Roles and Responsibilities not documented
• Data center organization not integrated

Staffing and Organization Significant Findings


Page 18: Is your data center on the verge of a crisis?

Preventive Maintenance (PM)
• No list of required PM activities
• PM activities not fully scripted
• No quality control process

Housekeeping
• Combustibles in the data center
• No documented housekeeping policy

Maintenance Management System (MMS)
• No list of equipment
• Missing critical data: warranty info, maintenance history, performance data, etc.

Maintenance Significant Findings


Page 19: Is your data center on the verge of a crisis?

Vendor Support
• Contracts missing response times, call-in process, detailed SOW, or technician qualifications

Deferred Maintenance
• Unable to produce deferred maintenance report from MMS

Predictive Maintenance
• No predictive maintenance program
• Not comparing current results with previous results

Maintenance Significant Findings


Page 20: Is your data center on the verge of a crisis?

Life-Cycle Planning
• No life-cycle plan
• Not using MMS data to develop plan

Failure Analysis
• No record of outages or near misses

Maintenance Significant Findings


Page 21: Is your data center on the verge of a crisis?

Data Center Staff
• Undocumented On-the-Job Training (OJT) programs
• No formal qualification program
• No list of training required by position
• No formal training program with lesson plans, etc.

Vendors
• No briefing for escorted vendors

Training Significant Findings


Page 22: Is your data center on the verge of a crisis?

Load Management
• Alarm settings not documented
• Alarms not set on PDUs to ensure maximum loads are not exceeded

Operating Set Points
• Cooling set points are not documented or part of Change Management Process
• Changing of set points is not controlled

Operating Conditions Significant Findings
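
The load management finding (no PDU alarms to keep loads under their maximum) can be illustrated with a short Python sketch. Everything named here is hypothetical: the `PduReading` record, the sample readings, and the 80% alarm fraction are assumptions chosen for illustration, and each site would set its own documented threshold.

```python
from dataclasses import dataclass


@dataclass
class PduReading:
    """One measured load on a PDU (hypothetical record layout)."""
    name: str
    rated_amps: float
    measured_amps: float


def pdus_over_limit(readings, alarm_fraction=0.8):
    """Return the names of PDUs whose load exceeds alarm_fraction of their rating."""
    return [
        r.name
        for r in readings
        if r.measured_amps > alarm_fraction * r.rated_amps
    ]


# Made-up readings, for illustration only.
sample = [
    PduReading("PDU-A", rated_amps=30.0, measured_amps=26.0),
    PduReading("PDU-B", rated_amps=30.0, measured_amps=18.0),
]
alarms = pdus_over_limit(sample)
```

Keeping the alarm fraction as an explicit, documented parameter matches the slide's point that alarm settings should be written down and not left implicit.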


Page 23: Is your data center on the verge of a crisis?

Site Policies
• Missing Site Policies
• Especially Site Configuration Policy

Reference Library
• No process for keeping documents up-to-date

Capacity Management
• No process for forecasting future space, power, and cooling requirements
• No active tracking of cooling capacity
• Ineffective management of Cold Aisles/Hot Aisles
• Electrical power monitoring (balancing phases)

Planning, Coordination, and Management Significant Findings
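
For the phase balancing item above, one way to track imbalance is sketched below in Python. The metric used (largest deviation from the mean phase current, divided by the mean) is one common convention and is an assumption here; the slides do not specify a formula, and the function name and sample currents are hypothetical.

```python
def phase_imbalance(amps_a: float, amps_b: float, amps_c: float) -> float:
    """Return the largest deviation from the mean phase current, as a fraction of the mean."""
    phases = (amps_a, amps_b, amps_c)
    mean = sum(phases) / len(phases)
    if mean == 0:
        return 0.0  # no load on any phase, so nothing to balance
    return max(abs(p - mean) for p in phases) / mean


# Made-up phase currents in amps, for illustration only.
balanced = phase_imbalance(10.0, 10.0, 10.0)
skewed = phase_imbalance(12.0, 10.0, 8.0)
```

Logging this figure over time gives the kind of active tracking that the capacity management findings say is often missing.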


Page 24: Is your data center on the verge of a crisis?

Facilities
• Operate and maintain the critical facility infrastructure
• Support the installation of IT equipment (space, power, & cooling)

IT Management
• Operate and maintain IT hardware, software, applications, and network connectivity
• Manage the installation/de-installation of IT equipment

Security
• Access Control
• Physical Security

Typical Data Center Disciplines


Page 25: Is your data center on the verge of a crisis?

Functionally Separate Organization
• Corporate Real Estate (Facilities)
• IT
• Security

Communication between organizations was typically poor
• Data center activities conducted without coordination
• Poor future space, power, and cooling planning

No individual responsible for all aspects of operating a data center

Past Organizational Structures


Page 26: Is your data center on the verge of a crisis?

Factors driving changes to organizational structure
• Rapid changes in technology and speed at which capacity must be brought online
• Increased costs associated with IT and Facilities
• Business objectives of continuous computing availability

Legacy organizations could not accommodate quickly evolving business requirements
• Slow to respond
• Not integrated

Evolving Organizational Structure


Page 27: Is your data center on the verge of a crisis?

The value of industry best practices is in the process of continuous improvement

• Discovery leads to learning
• Learning leads to change
• Change leads to improvement
• Regular reviews lead to discovery
• Crises can be avoided

Summary


Page 28: Is your data center on the verge of a crisis?

For more information contact: Julian Kudritzki

[email protected] 206.706.4143

Questions?

© 2014 Uptime Institute

