data center downtime feb2011

The Truth and Consequences of Data Center Downtime © 2011 Emerson Network Power

Upload: emerson-network-power

Post on 23-Mar-2016

DESCRIPTION

Data Center Downtime Feb2011

TRANSCRIPT

Page 1: Data Center Downtime Feb2011

The Truth and Consequences of Data Center Downtime

© 2011 Emerson Network Power

Presenter
Presentation Notes
Thom: Hello, everyone, and welcome to our Emerson Network Power Business Innovators Series webcast: “The Truth and Consequences of Data Center Downtime.” I’m Thom Gall, and I’ll be your host. As you know, our webcast program is certified by the International Association for Continuing Education and Training, so attending a 1-hour webcast qualifies you to earn one-tenth of a CEU training credit. Just participate in all 60 minutes of our broadcast today, and you’ll receive your certificate via e-mail within 2-3 weeks. Today, we’ll be exploring two very interesting studies on the tangible and intangible costs of downtime in the data center, as well as the proactive measures you can take to prevent some of the most frequent causes of downtime events. This presentation deck will be available to download after the webcast, or you can click the “download slides” button on the bottom right-hand side of your console. I’ll give a brief introduction of Emerson Network Power and its Liebert products and services, then we will move on to our presenters.
Page 2: Data Center Downtime Feb2011

© 2010 Emerson Network Power

Emerson Network Power: The global leader in enabling Business-Critical Continuity

Automatic Transfer Switch

Paralleling Switchgear

Uninterruptible Power Supplies & Batteries

Fire Pump Controller

Surge Protection

Extreme-Density Precision Cooling

Perimeter Precision Cooling

Power Distribution Units

Data Center Infrastructure Management

Integrated Racks

Cooling

Rack

Rack Power Distribution Unit

KVM Switch

UPS

Monitoring

Cold Aisle Containment

Row Based Precision Cooling

Presenter
Presentation Notes
Emerson Network Power is an Emerson business and the global leader in enabling Business-Critical Continuity. That means they provide the technology that powers and protects the critical systems business depends on, like servers and communications equipment. Through its Liebert AC power, precision cooling and monitoring products and services, Emerson Network Power delivers *Efficiency Without Compromise* by helping customers to optimize their data center infrastructures so as to reduce costs and deliver high availability.
Page 3: Data Center Downtime Feb2011

Emerson Network Power: An organization with established customers

Presenter
Presentation Notes
Through their solutions and their service organization, Emerson Network Power helps every company in the Fortune 500 keep its systems running. Here is just a sampling of some of the recognizable organizations that rely on Emerson Network Power to keep their business in business. Emerson Network Power has customers in virtually every industry, including telecommunications, computing, healthcare, transportation, manufacturing, web-hosting, and banking and finance.
Page 4: Data Center Downtime Feb2011

• Emerson Network Power overview

• National Survey on Data Center Downtime: Frequency, Duration and Cost, Dr. Larry Ponemon, Founder and President, Ponemon Institute

• Preventing the Most Common Causes of Downtime: Root Cause Analysis, Best Practice Prevention and Technology, Peter Panfil, Vice President and General Manager, Liebert North America AC Power, Emerson Network Power

• Question and Answer session

Presentation topics

Presenter
Presentation Notes
Now a little bit about how our Webcast will run. In just a minute, I’ll introduce Dr. Larry Ponemon, Founder and President of the Ponemon Institute, who will discuss the results of the Institute’s recent studies on the causes and costs associated with data center downtime. Then, we’ll turn things over to Peter Panfil, Vice President and General Manager for the Liebert North America AC Power business of Emerson Network Power who will suggest several strategies for preventing the downtime events identified in the studies. You can ask questions through the console throughout the Webcast, and we’ll select several to be answered during our closing Q&A. To ask a question, click the questions tab on the bottom of your screen. Type your question in the box along with your company name and submit it.
Page 5: Data Center Downtime Feb2011

National Survey on Data Center Downtime: Frequency, Duration and Cost

Dr. Larry Ponemon, Founder and President, Ponemon Institute

Presenter
Presentation Notes
Thom: And now I’m pleased to introduce Dr. Larry Ponemon, Founder and President of the Ponemon Institute, a research “think tank” dedicated to advancing privacy and data protection practices. Dr. Ponemon consults with leading multinational organizations on global privacy management programs and has extensive knowledge of regulatory frameworks for managing privacy and data security including financial services, health care, pharmaceutical, telecom and Internet. And Larry, I understand that your work in the data center space began with researching the cost of data security breaches, correct? Larry: … … During the Q&A Session.
Page 6: Data Center Downtime Feb2011

• The Institute is dedicated to advancing responsible information management practices that positively affect privacy, data protection and information security in business and government

• The Institute conducts independent research, educates leaders from the private and public sectors and verifies the privacy and data protection practices of organizations

• The Institute is a member of the Council of American Survey Research Organizations (CASRO), and Dr. Ponemon serves as chairman of the Government and Public Affairs Committee of CASRO’s Board

• The Institute has assembled more than 50 leading multinational corporations into the RIM Council, which focuses on the development and execution of ethical principles for the collection and use of personal data about people and households

About the Ponemon Institute

Presenter
Presentation Notes
… basically who we are and what we do.
Page 7: Data Center Downtime Feb2011

• Purpose: Determine the frequency and cost of unplanned data center outages

• Study 1: 453 individuals in U.S. organizations who have responsibility for data center operations
– Perceptions about data center criticality, availability and outages
– Perception differences between executives and associates

• Study 2: Develop an activity-based costing model derived from actual meetings or site visits for 41 data centers that experienced a complete or partial unplanned data center outage, to capture both direct and indirect costs related to:
– Damage to mission critical data
– Impact of downtime on organizational productivity
– Damages to equipment and other assets
– Cost to detect and remediate systems and core business processes
– Legal and regulatory impact, including litigation defense cost
– Lost confidence and trust among key stakeholders

About the studies

Presenter
Presentation Notes
…conversation over the next few minutes.
Page 8: Data Center Downtime Feb2011

Perceptions about data center availability

Agree: Combines strongly agree and agree responses
Disagree: Combines strongly disagree, disagree and unsure responses

Presenter
Presentation Notes
…or prevent a significant data center outage.
Page 9: Data Center Downtime Feb2011

Perception differences between senior management and operators

Supervisor and below
Director and above

Presenter
Presentation Notes
…favorable view than rank-and-file employees.
Page 10: Data Center Downtime Feb2011

Experience with unplanned data center outages

Experienced one or more unplanned data center outages over the past 24 months

Frequency of unplanned data center outages over the past 24 months

Total data center outage: Entire facility is down
Partial outage: Limited to individual rows and racks
Device-level outage: Individual servers and IT units

Presenter
Presentation Notes
..experience an unplanned data center outage.
Page 11: Data Center Downtime Feb2011

Extrapolated duration of data center outages in minutes

Total data center outage: Entire facility is down
Partial outage: Limited to individual rows and racks
Device-level outage: Individual servers and IT units

Presenter
Presentation Notes
… but they can have devastating consequences to organizations.
Page 12: Data Center Downtime Feb2011

Extrapolated frequency of complete data center outages by square footage

Frequency

Duration

Presenter
Presentation Notes
…still seeing outages that are very significant.
Page 13: Data Center Downtime Feb2011

Extrapolated frequency of complete data center outages by industry

Extrapolated frequency of unplanned outages over two years

Presenter
Presentation Notes
…highest end there in terms of frequency.
Page 14: Data Center Downtime Feb2011

Study 2: Activity-based cost framework for the cost of data center outages

Interviewed and audited 41 data center managers who experienced an unplanned outage

Presenter
Presentation Notes
… factor that into the model, moving on to the next slide.
Page 15: Data Center Downtime Feb2011

Cost loadings from ABC Framework

Cost activity centers        Direct cost   Indirect cost   Opportunity cost   Total
Detection                    52%           48%             0%                 100%
Equipment cost               60%           40%             0%                 100%
IT productivity loss         23%           77%             0%                 100%
End-user productivity loss   22%           78%             0%                 100%
Third parties                35%           41%             24%                100%
Recovery                     22%           78%             0%                 100%
Ex-post response             53%           47%             0%                 100%
Lost revenue                 33%           26%             41%                100%
Business disruption          24%           30%             45%                100%

Average contribution         36%           52%             12%
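
The "Average contribution" row appears to be the unweighted mean of each column across the nine cost activity centers. That is an assumption, since the slide does not state how the averages were computed. A minimal Python sketch that reproduces the row from the figures on the slide (the names COST_LOADINGS and average_contribution are illustrative):

```python
# Cost loadings transcribed from the slide, in percent:
# (direct cost, indirect cost, opportunity cost) per activity center.
COST_LOADINGS = {
    "Detection": (52, 48, 0),
    "Equipment cost": (60, 40, 0),
    "IT productivity loss": (23, 77, 0),
    "End-user productivity loss": (22, 78, 0),
    "Third parties": (35, 41, 24),
    "Recovery": (22, 78, 0),
    "Ex-post response": (53, 47, 0),
    "Lost revenue": (33, 26, 41),
    "Business disruption": (24, 30, 45),
}


def average_contribution(loadings):
    """Return the unweighted mean of each cost column, rounded to whole percent.

    Assumption: every activity center carries equal weight in the average.
    """
    rows = list(loadings.values())
    return tuple(round(sum(column) / len(rows)) for column in zip(*rows))


if __name__ == "__main__":
    direct, indirect, opportunity = average_contribution(COST_LOADINGS)
    print(f"Direct {direct}%, indirect {indirect}%, opportunity {opportunity}%")
```

Under that assumption the result matches the slide: 36% direct, 52% indirect, 12% opportunity. The Business disruption row as printed adds up to 99% rather than 100%, presumably because of rounding in the source.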

Interviewed and audited 41 data center managers who experienced an unplanned outage

Presenter
Presentation Notes
…cost is that indirect category.
Page 16: Data Center Downtime Feb2011

Average cost by category

Results shown are derived from the analysis of 41 data centers located in the United States

Presenter
Presentation Notes
… significant part of that cost category.
Page 17: Data Center Downtime Feb2011

Total cost by industry sector

The average duration of the outage for the 41 data centers was 102 minutes

Presenter
Presentation Notes
…the moral of this story, this picture.
Page 18: Data Center Downtime Feb2011

Total cost for partial and total shutdown

Results shown are derived from the analysis of 41 data centers located in the United States

Presenter
Presentation Notes
….and back to you Thom.
Page 19: Data Center Downtime Feb2011

Preventing the Most Common Causes of Downtime: Root Cause Analysis, Best Practice Prevention and Technology

Peter Panfil, Vice President and General Manager, Liebert North America AC Power, Emerson Network Power

Presenter
Presentation Notes
Thom: Thanks very much, Larry, for sharing such compelling data on the costs of downtime. And now, to provide some suggestions on just how your business can avoid incurring those costs, is Peter Panfil, Vice President and General Manager for the Liebert North America AC Power business of Emerson Network Power. With more than 30 years of experience in embedded controls and power, Peter Panfil leads global market and product development for Emerson Network Power’s Liebert AC Power business. He also works to apply the latest power and control technology to industry-proven topologies to provide the highest availability systems for business-critical applications. Peter, welcome back to our webcast program. Peter: Thanks, Thom. (ad lib) Thom: Peter, now that we understand just how costly data center downtime can be, what can we do to prevent it? Peter: …they were certainly for me.
Page 20: Data Center Downtime Feb2011

Were the unplanned outages during the 24 months preventable?

Presenter
Presentation Notes
…80 percent of them were preventable.
Page 21: Data Center Downtime Feb2011

Total cost by industry sector

Data centers experienced multiple outages during the 24-month period surveyed

Presenter
Presentation Notes
….employ to eliminate them.
Page 22: Data Center Downtime Feb2011

• 65% of outages caused by battery failure

• Service life of a battery varies, dependent on:
– Frequency of usage
– Ambient temperatures
– Quality of connections and terminals

• The weakest link in critical power

#1: Battery failure

How?

A single bad cell among thousands can take down a facility

Batteries have a limited life expectancy

False confidence; no indication of problems until needed

Presenter
Presentation Notes
…avoid those at all costs.
Page 23: Data Center Downtime Feb2011

#1: Battery failure

Best Practice: Preventive Maintenance
• Service contracts for inspections and testing
– Monthly, quarterly and annual actions need to be taken

Presenter
Presentation Notes
… personnel there to back you up.
Page 24: Data Center Downtime Feb2011

#1: Battery failure

Best Practice: Real-Time Monitoring
• Measure the internal DC resistance of all battery cells
• Combination of hardware and software
– Alarm management via email and SMS
– Measures the reliability of the entire battery
• Strap
• Inter-tier connections
• Plates
• Battery connection posts/terminals

• Proactively identify and replace bad batteries

White Paper: Implementing Proactive Battery Management Strategies to Protect Your Critical Power System
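
As a rough illustration of the monitoring idea on this slide, the Python sketch below flags cells whose measured internal resistance has risen well above a baseline. The 25% rise threshold, the cell IDs, the readings and the function name flag_suspect_cells are all hypothetical; none of them come from the presentation or from any vendor's monitoring product.

```python
def flag_suspect_cells(resistance_mohm, baseline_mohm, max_rise=0.25):
    """Return the IDs of cells whose internal resistance exceeds the baseline
    by more than max_rise (a fraction, e.g. 0.25 for 25%).

    Assumption: a single baseline value applies to every cell in the string.
    """
    limit = baseline_mohm * (1 + max_rise)
    return sorted(cell for cell, value in resistance_mohm.items() if value > limit)


if __name__ == "__main__":
    # Hypothetical readings in milliohms, keyed by a made-up cell ID.
    readings = {"cell-001": 3.1, "cell-002": 3.0, "cell-003": 4.2, "cell-004": 3.3}
    print(flag_suspect_cells(readings, baseline_mohm=3.0))
```

In practice the baseline and the alarm threshold would come from the battery manufacturer's data and the site's own trend history rather than fixed constants.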

Presenter
Presentation Notes
…available for you to take a look at.
Page 25: Data Center Downtime Feb2011

IT usage is variable, not static

IT gets added without knowledge of infrastructure impact

Redundant UPS loaded over 50%

Should a UPS or battery failure occur, the remaining UPS cannot support 101% of the load

• 53% of outages caused by lack of UPS capacity

• IT growth outpaces AC Power infrastructure growth
• Disconnect between Facilities and IT
– The owner of the UPS might not be IT

• Battery runtime is also dependent on how much load is being supported

#2: UPS capacity exceeded

How?

Presenter
Presentation Notes
…relook at your load-events.
Page 26: Data Center Downtime Feb2011

#2: UPS capacity exceeded

Best Practice: Additional UPS Cores for capacity and redundancy

• Keep redundant UPS capacities at 30% - 40%
– IT load must not exceed the total capacity of a single UPS
– Efficiency of the Liebert NXL optimized at partial loads

• Size the new UPS system on best-case growth
• Real-time capacity monitoring to manage load balancing
• UPS configured in a parallel redundant configuration

Some data centers are willing to trade redundancy for capacity; analyze the costs, risks and benefits
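
The rule that the IT load must not exceed the capacity left after one UPS fails can be checked with simple arithmetic. The Python sketch below is a minimal illustration, assuming the load is shared across parallel units and that the largest unit is the one lost; the function names and the kW figures are hypothetical.

```python
def survives_single_failure(unit_capacities_kw, load_kw):
    """Return True if the load still fits after losing the largest single UPS."""
    if len(unit_capacities_kw) < 2:
        return False  # fewer than two units means there is no redundancy
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return load_kw <= remaining_kw


def installed_load_fraction(unit_capacities_kw, load_kw):
    """Return the load as a fraction of total installed UPS capacity."""
    return load_kw / sum(unit_capacities_kw)


if __name__ == "__main__":
    units = [100, 100]  # two hypothetical 100 kW UPS units in parallel
    for load in (80, 101):
        print(
            f"{load} kW load: {installed_load_fraction(units, load):.1%} of installed "
            f"capacity, survives one failure: {survives_single_failure(units, load)}"
        )
```

With two equal units, any load above 50% of installed capacity is more than 100% of one unit, which is the "101% of the load" situation described on the previous slide.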

Presenter
Presentation Notes
…and their associated costs are well understood.
Page 27: Data Center Downtime Feb2011

#2: UPS capacity exceeded

• Options for parallel redundant UPS

White Paper: High-Availability Power Systems, Part II: Redundancy Options

[Diagram labels: UPS Core, STS, SS, System Control Cabinet, Paralleling Cabinet, IT Load]

N+1: Centralized static transfer switch
– System-level control, fault tolerant
– Size of STS determines total capacity

1+N: Distributed static switches
– Individual cores manage load transfers
– Cannot parallel different sized UPS

Presenter
Presentation Notes
…white paper that can show you those details.
Page 28: Data Center Downtime Feb2011

Pushing the EPO thinking it’s a light switch

Improper equipment operation could drop the entire facility

Careless installation of servers damages infrastructure

• 51% of outages caused by user error

• Many people involved in data center operation
– Too many cooks…
– Alarms and control panels everywhere

• 100% preventable
• Most cost-effective root cause to solve

#3: Accidental EPO / Human error

How?

Presenter
Presentation Notes
…downtime causes in terms of accidental human error.
Page 29: Data Center Downtime Feb2011

#3: Accidental EPO / Human error

Best Practice: Documentation, Standard Procedures, Training and Remote Monitoring

Shield EPO

Documented Maintenance Procedures

Labeling

One-Lines

Follow Processes; No Short Cuts

Personnel Training

Keep it Clean

No Food or Drink

Escort Visitors

Infrastructure Monitoring

Presenter
Presentation Notes
…trained on using those procedures.
Page 30: Data Center Downtime Feb2011

#3: Accidental EPO / Human error

• Best practices for EPO
– A / B EPO in A / B data centers
– Separate EPO from the fire alarm
– Remove local EPO from UPS and PDUs
– Provide physical protection
– Provide maintenance and test features
– Document and label
– Training

• 2011 code changes
– NFPA 70 – 645-10, Disconnecting Means

Presenter
Presentation Notes
…in a way that protects our personnel.
Page 31: Data Center Downtime Feb2011

UPS has components with a finite life; some need to be replaced

UPS repaired with non-OEM parts

Blame the UPS when it’s really the batteries

• 49% of outages caused by UPS failure

• Reliability of a UPS only lasts as long as the shortest component life
– Liebert design philosophy addresses this issue by reducing the number of parts, thus decreasing the chance of a failure

• UPS designed to prevent outages, not cause them

#4: UPS equipment failure

How?

Presenter
Presentation Notes
…across all equipment in your data center.
Page 32: Data Center Downtime Feb2011

#4: UPS equipment failure

Best Practice: Preventive Maintenance by an experienced technician

• At least two PM visits per year
• OEM technician using OEM parts and calibration
• MTBF for units that received two PMs per year is 23 times higher than for a machine with no PM service events

White Paper: The Effect of Regular, Skilled Preventive Maintenance on Critical Power System Reliability

Presenter
Presentation Notes
…is available for your review as well.
Page 33: Data Center Downtime Feb2011

Cooling leaks and chilled water distributed in-row

Repairs to in-row cooling cause chilled water leaks

Server densities are rising, so is the heat

• 35% of outages caused by water incursion
• 33% of outages are heat-related

• As densities increase, cooling is brought closer to the IT load
– For some in-row cooling products, water is on top of, next to and below critical electrical equipment
– Solving the heat problem, but causing a water problem

#5: Heat- and water-related

How?

Presenter
Presentation Notes
…not there, can’t break.
Page 34: Data Center Downtime Feb2011

#5: Heat- and water-related

Best Practice: Utilize refrigerants, easier maintenance and leak detection monitoring

• R410A and Glycol for row-based units
– Eliminate the need for water in the row

• Monitor for leaks under the floor

• Importance of easy maintenance for row CW units
– Do you need to remove the in-row unit for repair?

Refrigerant-based high density cooling

Front and rear parts access

Point or zone detection

Presenter
Presentation Notes
…attack them very quickly.
Page 35: Data Center Downtime Feb2011

#5: Heat- and water-related

Best Practice: Optimized airflow
• Containment
– Increases cooling capacity and energy efficiency

• Temperature sensors
– Supply and return
– Rack-level

• Utilize temperature data to control and optimize cooling output
– Variable Speed Drives
– Digital Scroll Compressors

White Paper: Combining Cold Aisle Containment with Intelligent Control to Optimize Data Center Cooling Efficiency
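
To make the idea of using temperature data to drive cooling output concrete, here is a minimal Python sketch of proportional fan-speed control from a rack inlet temperature. It is only an illustration: the setpoint, control band, speed limits and the function name fan_speed_percent are assumptions, not values from the presentation and not the control logic of any Liebert product.

```python
def fan_speed_percent(inlet_temp_c, setpoint_c=24.0, band_c=6.0,
                      min_speed=30.0, max_speed=100.0):
    """Map a rack inlet temperature to a fan speed percentage.

    At or below the setpoint the fan runs at min_speed; the speed rises
    linearly across band_c degrees and is capped at max_speed.
    All default values are hypothetical.
    """
    error_c = inlet_temp_c - setpoint_c
    fraction = min(max(error_c / band_c, 0.0), 1.0)  # clamp to the range 0..1
    return min_speed + fraction * (max_speed - min_speed)


if __name__ == "__main__":
    for temp_c in (22.0, 24.0, 27.0, 35.0):
        print(f"{temp_c:.1f} C inlet -> {fan_speed_percent(temp_c):.0f}% fan speed")
```

Running the fan at partial speed whenever the inlet temperature allows is where the efficiency benefit of variable speed drives comes from.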

Presenter
Presentation Notes
…controls for you to review.
Page 36: Data Center Downtime Feb2011

#5: Heat- and water-related

• Optimized airflow not only prevents heat-related outages, it improves cooling efficiency

Requires less fan power per kW of cooling

Leverages variable fan speed control

Operates with digital scroll technology for variable capacity control

Up to 33% efficiency gain

Digital Compressor

Variable Speed Fan

Presenter
Presentation Notes
…conventional approaches.
Page 37: Data Center Downtime Feb2011

What could be done to prevent unplanned outages in the future?

How to make the case for more resources and budget?

What can be done short-term?

Presenter
Presentation Notes
…and enhance monitoring.
Page 38: Data Center Downtime Feb2011

1. Educate your senior leaders on frequency and impact of downtime on your business
– 56% of senior leaders think downtime doesn’t happen often

2. Utilize Cost of Downtime data to justify infrastructure improvements
– Develop a business case or your own ABC model

3. Grab the “low-hanging fruit”
– No cost to ensure IT staff doesn’t bring a Big Gulp onto the server floor

4. Conduct assessments and audits
– Assess batteries, capacity, airflow; vendors can help

5. Talk to your infrastructure vendors
– Service contracts, new technology, more best practices

Next steps

Presenter
Presentation Notes
…thanks for your time, I appreciate it.
Page 39: Data Center Downtime Feb2011

Dr. Larry Ponemon, Founder and President, Ponemon Institute
• National Survey on Data Center Outages
• Coming Soon: Cost of Data Center Outages

Q & A, further reading

Peter Panfil, Vice President and General Manager, Liebert North America AC Power, Emerson Network Power
• Addressing the Leading Root Causes of Downtime

Presenter
Presentation Notes
Thom: Thanks very much Peter. At this time, Larry and Peter will be happy to take your questions. We can still accept your questions, which you can submit by clicking on the questions tab and typing your question in the box. Our first question comes from … BEGIN LIVE Q&A ….We’ll need to wrap up the questions now, but Peter and Dr. Ponemon will be answering the questions we didn’t have time for via e-mail. You can also learn more in the two white papers appearing as pop-ups on your screen now: The National Survey on Data Center Outages by the Ponemon Institute, and “Addressing the Leading Root Causes of Downtime,” an accompanying white paper from Emerson Network Power which offers even more actionable strategies you can implement to address the leading causes of data center downtime. In addition, I encourage you to stay tuned to EmersonNetworkPower.com for the upcoming publication of the second Ponemon Institute research study on the financial cost of data center outages and a second accompanying white paper from Emerson Network Power. On behalf of Emerson Network Power, I’d like to thank you all for your time and attention today during our webcast program. And I encourage you to stay tuned to your e-mail for information on a new webcast series from Emerson Network Power, which will continue to provide valuable information on data center best practices from Liebert Products and Services and other Emerson Network Power brands. Sign Off