Here's What Goes Catastrophically Wrong When You Don't Follow Your IT Process


Author: liz-angelene-verano

Post on 19-Feb-2017








    On November 18, 2014, Microsoft Azure went dark. Thousands of the cloud computing service's customers experienced downtime on their sites for over 9 hours. When they flooded Microsoft customer support to ask what the hell had gone wrong, customers learned that it wasn't some glitch, natural disaster, or devious hacking scheme.

    It was pure human error.


    Microsoft deployed an update without running through the standard operating guidelines specifically laid out for this scenario. Instead of rigorously checking that the update was good to go, engineers shipped it on the assumption it was bug-free. This wasn't a one-time incident: engineers at Azure regularly violated standard operating procedure because, 99.9% of the time, following it felt like a total waste of their resources.

    And it's not just Azure that does this. Habitual violation of process leads to a ton of mistakes in all kinds of IT arenas.

    When I was 16, I landed my first real job, which happened to be in IT. I had just passed my CCNA with the help of my dad (apparently making me the youngest person in Australia to do so at the time) and was hungry to get my hands on some real technology.

    But once I started, I found there was rigid process everywhere, and configuring a live Windows 2000 server was a totally different experience from tinkering with my father's machines at home, with serious consequences for not following the process.

    As Microsoft hardware designer Dan Luu writes,

    "How many technical postmortems start off with 'someone skipped some steps because they're inefficient', e.g., 'the programmer force pushed a bad config or bad code because they were sure nothing could go wrong and skipped staging/testing'?"

    His answer: a lot. In fact, a lot of major IT mess-ups (the high-profile Sony hacks, cloud computing outages, downtime on major websites) are totally preventable. They happen because IT people don't follow the processes put in place to prevent them. What's more, they don't even care until it becomes a catastrophe.

    This behavior is actually a widely studied sociological phenomenon called the normalization of deviance, and it has huge implications for your computing security.

    Why Deviance Becomes the Norm

    The term "normalization of deviance" was coined by Columbia University sociology professor Diane Vaughan. She describes it as a gradual process that leads to a situation where unacceptable practices or standards become acceptable, and flagrant violations of procedure become normal, despite the fact that everyone involved knows better.

    Essentially, it's when negligence becomes the norm. In IT scenarios, it's often most noticeable when a new person joins the team. Here's a common scenario, in Dan Luu's words:

    [New person joins the team, discovers bad IT processes]
    New person: WTF WTF WTF WTF WTF
    Old hands: Yeah we know, we're concerned about it.
    New person: WTF WTF wTF wtf wtf w...
    [New person gets used to bad IT processes]

    [New person #2 joins]
    New person #2: WTF WTF WTF WTF
    New person #1: Yeah we know. We're concerned about it.

    "The thing that's really insidious here," Dan writes, "is that people will really buy into the WTF idea, and they can spread it elsewhere for the duration of their career." Things that should be eradicated are totally normalized, whether it's not wearing gloves in a hospital or not double-checking that you have the right inventory inside PlayStation packaging.

    As security expert Bruce Schneier argues, normalization of deviance has huge implications in the field of IT, especially because it's easy for problems to go unnoticed until they reach a large scale.

    Here are a few scenarios of what can happen when IT deviates from processes.

    1. Hackers Steal Your Customers' Data

    For a company like the social sharing tool Buffer, trust is crucial. Customers trust Buffer with their passwords and login info to post things to Facebook and Twitter for them. And when Buffer was hacked in 2013, customers experienced perhaps the most flagrant violation of that trust: their logins were used to post spam.

    Buffer was extremely open about the security breach, sending customers regular updates about the app's functionality and, ultimately, a post-mortem explaining what had happened, including a list of security measures they put in place after the fact.

    The real problem with the hack was that once hackers got into Buffer's system, the data was right there and totally usable, since Buffer hadn't encrypted any of it.

    Encryption was one of the most important measures Buffer implemented after the fact. As they told customers, "we have added encryption of OAuth access tokens and we have changed all API calls to use an added security parameter." Problem was, it was too late.

    Deviant Behavior: Not Encrypting Data

    It's the classic "out of sight, out of mind" problem, the same way that you don't need health insurance until you're diagnosed with a rare skin disease. If you're not planning on getting hacked, it doesn't seem worth the effort to encrypt data. But then again, who plans on getting hacked, or getting a rare skin disease?

    Encrypting data is basic common sense that every IT department needs to follow. Teams really need to regularly check up on and enforce these processes.

    As Auth0 says, normalization of deviance in security is totally normal, especially when companies are focused on other things. Moreover, security gets de-prioritized on your roadmap because it's hard to build and isn't immediately urgent.

    Companies table security to tackle more pressing issues. By consistently doing so, they get accustomed to sweeping this deviant behavior under the rug to the extent that they don't even consider it deviant anymore. It's totally normalized, even though any outsider would recognize that it's bad practice and needs to be fixed immediately.

    What You Can Do

    This de-prioritization is why a lot of companies opt out of important security steps like multi-factor authentication or encrypting data. Encryption is what saved the day in the Patreon hack: because the data was encrypted, no one's credit card information was stolen.

    An easy way to circumvent this deviant behavior is to outsource the issue to another provider. Tools like Auth0 take care of your security needs, providing code for features like multi-factor authentication, so you don't have to worry about it.

    But even if your company has these in place, it's important to regularly run through checklists to make sure everything is up to date. Try running Process Street's checklist on network security management every couple of weeks to provide accountability on a company-wide scale.

    2. You Literally Lose Your Data

    In 2010, Zurich Insurance was fined a record 2.28 million pounds for losing personal details on 46,000 policyholders, which happened because they lost an unencrypted data tape in a routine transfer.

    The real problem in this case (the one that caused the Financial Services Authority real concern) was that they weren't adhering to guidelines that were in place. As the Telegraph reported, Zurich Insurance's big blunder was that it didn't have controls in place to prevent the lost data being used for financial crime.

    It's sloppy behavior, but more common than you might think. Gartner estimates 28 percent of corporate data is stored only on endpoint devices. You lose the device, you lose the data.

    Deviant Behavior: Relying Solely on Endpoint Devices

    The company certainly knew they should have had these controls in place, but somewhere along the line they had grown accustomed to not having them. The deviant behavior (only relying on endpoint devices) was normalized over time.

    Zurich Insurance became used to doing this the wrong way, the way that would make a new hire say "WTF WTF WTF," and everyone became accustomed to it.

    The company didn't do anything about it because it had fallen off their radar. Since the bad behavior was normalized, there wasn't a pressing need to change their processes (or so they thought). The bad process only became a problem once it was too late.

    What You Can Do

    Changing deviant behavior is hard, but sometimes it's just a matter of constantly reminding teams about values and reinforcing your IT processes. Take a look at Process Street's Client Data Backup Best Practices checklist: having teams run through this every couple of weeks puts your company's processes back on everyone's radar.

    This checklist in particular is a reminder that even the small stuff that you might think is irrelevant, like verifying that you have the right kind of equipment, labeling tapes properly, or setting up recurring reminders to automatically perform a differential backup, can make a huge difference.

    Once the process is back in view, deviant behavior will actually feel deviant, which makes teams want to address it.
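
    For readers who haven't run one, a differential backup copies only the files that changed since the last full backup. The following is a minimal sketch of that idea, not the checklist's own procedure: the function names are hypothetical, and it assumes file modification times are a reliable signal of change, which a production backup tool would not take for granted.

```python
import shutil
from pathlib import Path


def files_changed_since(source: Path, last_full_backup: float) -> list[Path]:
    """Return files under `source` modified after the last full backup.

    `last_full_backup` is a Unix timestamp (seconds since the epoch).
    Assumption: modification time is a good enough change signal.
    """
    changed = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > last_full_backup:
            changed.append(path)
    return changed


def differential_backup(source: Path, dest: Path, last_full_backup: float) -> list[Path]:
    """Copy every file changed since the last full backup into `dest`.

    The directory layout under `source` is preserved under `dest`.
    Returns the relative paths that were copied, so the run can be logged.
    """
    copied = []
    for path in files_changed_since(source, last_full_backup):
        relative = path.relative_to(source)
        target = dest / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves modification times
        copied.append(relative)
    return copied
```

    Returning the list of copied paths gives the team something concrete to tick off: an empty list on a busy server is itself a sign that something deserves a second look.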

    3. Your Server Crashes When You Need it Most

    In 2011, Italian designer Margherita Maccapini Missoni, whose clothes often set you back upwards of $4,000, developed a limited-edition line for Target. The release of the line was highly anticipated, and while it made for long lines outside brick-and-mortar stores, it caused Target's website to come crashing down.

    "It's a little bit embarrassing for one of the nation's largest retailers to have a Web site that can't support a rush; it's not like they're any strangers to rushes," said Ian Schafer, chief executive of the digital marketing firm Deep Focus.

    They managed to handle a lot of other rushes, but not this one. Why? Testing went out the window. They figured it wouldn't be a big problem, so they neglected to perform the tests they normally would for a big event like Cyber Monday.

    There are a lot of reasons servers crash, but that's no excuse for not preparing for or preventing the scenario. It's a big problem for customer retention: 74% of visitors leave your site after hitting a 404 error page, and they're much less likely to return in the future.

    Deviant Behavior: Neglecting Server Maintenance

    The problem here is that the IT department knew that they should be doing better, but they ignored it. It's really common. As CEO Steve Klein says, there's a good reason people put off doing server maintenance, or creating an external status page:

    "The idea of creating and maintaining a status page outside of your infrastructure is like a cold that won't go away. You should probably see the doctor, but keep putting it off until suddenly you're on your way to the ER coughing up a lung. Similarly, a status page is one of those things you should create before you experience downtime, but end up building at 4am in the morning when your hosting provider goes down."

    The behavior is deviant, but everyone is so accustomed to it that they don't take action. You don't care that the behavior is deviant until it becomes a problem. And oftentimes, as in the case of the Target crash, teams don't prepare for those emergencies.

    What You Can Do

    IT should regularly run through a checklist to make sure they're not missing any steps of routine server maintenance. Check out Process Street's server maintenance checklist for a template you can customize to your own needs. Setting regular reminders will help your team do the important stuff, like setting up an external status page so that if your site goes down, your visitors won't encounter that dreaded 404 error.

    It's especially important to run through this checklist if you're anticipating any changes to your server, like a surge in traffic.
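
    If your team keeps its maintenance steps in a script or spreadsheet rather than a checklist tool, the "regular reminders" idea can be sketched in a few lines. This is only an illustration: the class, the function, and the example step names below are hypothetical and are not taken from Process Street's template.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ChecklistItem:
    name: str
    every_days: int             # how often the step should be repeated
    last_done: Optional[date]   # None means the step has never been run

    def is_overdue(self, today: date) -> bool:
        """A step is overdue if it was never run or its interval has passed."""
        if self.last_done is None:
            return True
        return today - self.last_done > timedelta(days=self.every_days)


def overdue_items(items: list[ChecklistItem], today: date) -> list[str]:
    """Return the names of steps the team should be reminded about."""
    return [item.name for item in items if item.is_overdue(today)]


if __name__ == "__main__":
    # Hypothetical maintenance steps, for illustration only.
    checklist = [
        ChecklistItem("Review disk usage", 7, date(2017, 2, 15)),
        ChecklistItem("Apply OS security updates", 14, date(2017, 2, 1)),
        ChecklistItem("Verify external status page", 30, None),
    ]
    for name in overdue_items(checklist, today=date(2017, 2, 19)):
        print("Overdue:", name)
```

    Treating a never-run step as overdue is a deliberate choice here: it keeps newly added steps from quietly sitting at the bottom of the list.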

    When Paper magazine released nude photos of Kim Kardashian, they knew they ran a risk of their servers practically melting. Their response? Anything but deviance: they strictly adhered to processes, rigorously checking that the site would stay functional. You can test the server load capacity of your store with tools like LoadImpact.com or

    Bottom Line: Battle the Culture of Complacency

    It's really easy to get complacent, especially when we're so reliant upon technology to do the work for us. It's easy to let things slide, but it can have huge repercussions. There's no excuse for human error, especially when there are so many automation tools at your disposal to prevent it.

    Checklists are a powerful tool to hedge against human error and the normalization of deviance. They're a tool surgeon Atul Gawande uses to combat the very same problem he's witnessed in his operating room. Essentially, deploying checklists is a reminder that our deviant behavior is actually deviant, and not normal. It prevents us from getting accustomed to bad behavior.

    This changes the very culture behind adhering to processes. As Atul writes in The Checklist Manifesto, checklists help fundamentally change the mindset teams enter when they do their work. Rather than just getting complacent and normalizing deviant behavior, they are reminded to be vigilant and disciplined.

    That's because checklists are about more than just adhering to rules. As Atul writes, "Just ticking boxes is not the ultimate goal here. Embracing a culture of teamwork and discipline is." Checklists restrain our natural instinct to normalize deviance. In the case of IT, it's easy to get complacent, especially when everything appears to be running just fine.

    So if you don't want to see your company's name in the headlines of the next high-profile hack, checklists might be the best tool in your belt.
