A Day in the Life: Running Hosted Services in the Cloud Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Page 1: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

A Day in the Life: Running Hosted Services in the Cloud

Gail Warren

Director, Online Services

Microsoft Corporation

SESSION CODE: COS201

Page 2: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Agenda

Business Productivity Online (BPO)

Carrier-class Data Centers

World-class Security

World-class Architecture

Best-of-Breed Hardware

Operational Best Practices

World-class Support

Page 4: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Microsoft’s Significant Investment

Carrier-Class Data Centers

Microsoft is making a significant investment in building online compute capacity

Microsoft has more than 10 and fewer than 100 global data centers that range from 1 megawatt to 60+ megawatts of power

Some of the data centers are massive, roughly the size of 9–10 football fields, and contain enough wire to wrap around the earth several times

Page 5: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Carrier-Class Data Centers

Features

Dual power feeds

Multiple generators

Battery backup

Dual power to each rack

Computer-controlled cooling

Page 7: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Microsoft Online Thinks About Security from 3 Perspectives:

World-class Security

1. Secure from the ground up

Carrier-class data centers

Nine layers of security protecting your data

Secure development life cycle

2. Secure in knowing your data will be there when you need it

Operational best practices

Complete N+1 redundancy

Best-of-breed hardware

3. Security through peace of mind

Audited by third parties

Internal audits

Dedicated SOC

24x7 support any time you need help

Financially backed service level agreements (SLAs)

Page 8: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Service Security

It starts with the data center

Data Center within a Data Center

Motion sensors

24x7 secured access

Biometric controlled access systems

Video camera surveillance

Security breach alarms

World-class Security

Page 9: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Service Security

Then we add multiple layers of logical security…

Filtering Routers

Firewalls

Intrusion Detection

Separate Data Networks

Penetration testing

Scanning and monitoring

AV

Configuration/patch

Host Security (hardened operating system)

Application-Level Countermeasures

Application Authentication

Authentication to Data

World-class Security

Data

Page 10: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Service Security

World-class Security

Data Centers are SAS 70 and ISO 27001 certified

Service is SAS 70 certified

Service is ISO 27001 certified

FISMA targeted for 2010

Customers own their data… our job is to protect it

Security

Risk Management

Privacy

Data

Page 11: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Service Security

World-class Security

Data hygiene supported by multi-layer antivirus and spam filtering

Highly secure data access for users via HTTPS

Geo-redundant data centers certified with SAS 70 and ISO 27001

Page 13: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

BPO Capacity and Reliability

Capacity Management

Continuous capacity review

Buffer capacity for unexpected load

Capacity modeling: capacity is implemented at least 3 months ahead of forecast demand
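
One way to read the lead-time rule above is as a check on the demand forecast: if forecast demand crosses usable capacity within the lead time, new capacity has to be started now. The sketch below is illustrative only. The function names, the 20% buffer, and the sample numbers are assumptions, not the BPO capacity model.

```python
def months_until_exhaustion(capacity, forecast, buffer_fraction=0.2):
    """Return how many months ahead forecast demand first exceeds usable capacity.

    forecast[i] is the expected demand i months from now. Usable capacity is
    total capacity minus the buffer held back for unexpected load.
    Returns None if the forecast never exceeds usable capacity.
    """
    usable = capacity * (1 - buffer_fraction)
    for months_ahead, demand in enumerate(forecast):
        if demand > usable:
            return months_ahead
    return None


def must_add_capacity_now(capacity, forecast, lead_months=3, buffer_fraction=0.2):
    """True if exhaustion falls inside the lead time needed to implement capacity."""
    months_ahead = months_until_exhaustion(capacity, forecast, buffer_fraction)
    return months_ahead is not None and months_ahead <= lead_months


# Hypothetical numbers: 1000 units installed, demand growing month over month.
forecast = [700, 740, 790, 830, 880]
print(months_until_exhaustion(1000, forecast))
print(must_add_capacity_now(1000, forecast))
```

Holding the buffer out of the usable figure keeps the "unexpected load" headroom from being consumed by ordinary growth.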

N+1 Redundancy Throughout

Network

Storage

Servers

Result: 99.9%+ reliability

Financially backed SLA

World-class Architecture
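
To make the 99.9% figure concrete, it helps to convert it into a downtime budget. The sketch below is plain arithmetic. The helper name and the 30-day and 365-day measurement windows are assumptions, and the slide does not say how the SLA window is actually defined.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


monthly = downtime_budget(99.9, timedelta(days=30))
yearly = downtime_budget(99.9, timedelta(days=365))
print(f"30-day budget: {monthly.total_seconds() / 60:.1f} minutes")
print(f"365-day budget: {yearly.total_seconds() / 3600:.2f} hours")
```

At 99.9% the budget is roughly 43 minutes in a 30-day month, which is why N+1 redundancy and fast incident handling both matter.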

Page 15: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

BPO Logical Architecture

Best-of-Breed Hardware

Servers:

Dual power supplies

Dual network interfaces

Full lights-out management capabilities

Storage:

RAID 1 + 5

Optimized for performance and availability

Disk to disk to disk backup

Network:

Full failover capabilities

N+1 throughout the network stack

Page 17: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Operational Best Practices

Operations practices based on Information Technology Infrastructure Library (ITIL) / Microsoft® Operations Framework (MOF)

Change management

Incident management

Problem management

Dedicated Service Operations Center (SOC)

Focused on BPO

Experts in online collaboration services

Dedicated service administration team

ISO 27001 certified operational procedures

Page 18: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Monitoring

Operational Best Practices

Significant investment in tools to ensure the service is there 24x7, and if there are problems, we know ASAP

Complete monitoring suite

Microsoft® System Center Operations Manager

Transaction monitors around the world

Holistic network monitoring

Security monitoring

Custom-built tools to provide further insight

Custom Microsoft® Operations Manager (MOM) packs

Synthetic transactions

Page 19: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Incident Management

Issue discovery

Monitoring

Syntx

Customer reported

Operations monitoring infrastructure

Issue handling

Issue documentation

Issue escalation

Service restoration

Operational Best Practices

Page 20: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Discovery – Monitoring

System Event monitoring with heavy tuning for what goes to the console, using a failure-mode approach

Review how the components could fail

Build rules for each failure mode

Build knowledge for each failure mode to drive quicker resolutions

One can never predict all failure modes, so a closed-loop system is a necessity. If we have an outage without a failure-mode alert, we treat it as a bug and drive it until we have a corresponding rule and TSG (Technical Support Guide) for that specific failure mode in place.

Heavy customizations on top of SCOM platforms. For example:

Transactions added to SCOM specific to mailflow and administrative services

Currently ~20K unique rules for the service

Operational Best Practices
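
The closed loop described on this slide can be sketched as a catalog that pairs each known failure mode with a rule and a TSG, and that records a monitoring bug whenever an outage arrives without a matching alert. This is a minimal illustration. The class names, fields, and sample entries are hypothetical and do not represent SCOM objects or the team's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    name: str
    rule: str  # the monitoring rule that should detect this failure mode
    tsg: str   # the guide operators follow once the rule fires


@dataclass
class FailureModeCatalog:
    modes: dict = field(default_factory=dict)
    monitoring_bugs: list = field(default_factory=list)

    def register(self, mode: FailureMode) -> None:
        self.modes[mode.name] = mode

    def review_outage(self, failure_name: str, alert_fired: bool) -> str:
        """Close the loop: any outage without a failure-mode alert becomes a bug."""
        mode = self.modes.get(failure_name)
        if mode is not None and alert_fired:
            return f"Covered: follow {mode.tsg}"
        self.monitoring_bugs.append(failure_name)
        return f"Gap: opened monitoring bug for '{failure_name}'"


catalog = FailureModeCatalog()
catalog.register(FailureMode("mail queue backlog", "queue-length rule", "TSG-mailflow"))
print(catalog.review_outage("mail queue backlog", alert_fired=True))
print(catalog.review_outage("expired certificate", alert_fired=False))
```

The bug stays open until a rule and TSG for the new failure mode are registered, so the catalog only ever grows toward better coverage.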

Page 21: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Discovery – Syntx

What are the capabilities of the service that end users consume?

E.g. search SharePoint, create a list, post a document, search for a document that was posted yesterday, etc.

How do we emulate the consumption of those capabilities?

Code that emulation = “synthetics”

Run synthetics every X minutes

Alert if the capability is not performing within specifications

Expose synthetic success/failure and performance data for trending

Monitor DIPs and VIPs from LAN

Monitor VIPs from internet

Ideally, two alerts for every issue:

Synthetic alert telling us that the capability is impacted

Failure mode alert telling us what happened

Operational Best Practices
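
A synthetic, as described on this slide, is code that performs a user-facing action, times it, and compares the outcome with a specification. The sketch below shows one possible shape. The names, the 2-second threshold, and the placeholder actions are assumptions, and scheduling ("every X minutes") is left to whatever job runner is in use.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyntheticResult:
    capability: str
    succeeded: bool
    latency_s: float
    within_spec: bool


def run_synthetic(capability: str, action: Callable[[], None],
                  max_latency_s: float) -> SyntheticResult:
    """Run one emulated user action and judge it against its specification."""
    start = time.perf_counter()
    try:
        action()
        succeeded = True
    except Exception:
        succeeded = False
    latency_s = time.perf_counter() - start
    within_spec = succeeded and latency_s <= max_latency_s
    return SyntheticResult(capability, succeeded, latency_s, within_spec)


def create_list():
    """Placeholder for a real scripted action against the service."""


def search_document():
    """Placeholder that simulates a failing capability."""
    raise TimeoutError("search did not return")


history = []  # kept so success/failure and latency can be trended later
for capability, action in [("create a list", create_list),
                           ("search for a document", search_document)]:
    result = run_synthetic(capability, action, max_latency_s=2.0)
    history.append(result)
    if not result.within_spec:
        print(f"ALERT: '{result.capability}' is not performing within specification")
```

The synthetic alert only says which capability is impacted. The failure-mode alert from event monitoring is still needed to say why.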

Page 22: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Discovery – Customer

Despite monitoring and syntx, customers do find and report errors to our Support organization

Operational Best Practices

Page 23: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Continuous Improvement

If a service event is missed by monitoring, a bug is opened and tracked for resolution

Operational Best Practices

Page 24: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Discovery – Infrastructure

Geo-redundant Tier 1 team and SOC Leads

Console, email, and phone monitored 24x7x365

SOC Leads (Ops Managers) are also 24x7x365

Geo-redundant SCOM infrastructure

Alerts to console

Geo-redundant synthetic monitoring infrastructure (separate from SCOM)

Synthetic alerts go to email currently

We will integrate the alert stream into the console, but we will always want visibility outside of the console for resiliency

Operational Best Practices
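
The point about keeping visibility outside the console amounts to fanning each alert out to independent channels, so that losing one channel does not hide the alert. The following is a minimal sketch of that idea. The channel names and the simulated console outage are hypothetical, and no real console or mail API is used.

```python
def fan_out(alert, channels):
    """Send an alert to every channel; return the names that accepted it."""
    delivered = []
    for name, send in channels.items():
        try:
            send(alert)
            delivered.append(name)
        except Exception:
            # One channel being down must not block the others.
            continue
    return delivered


def console_down(alert):
    """Simulates the console being unreachable."""
    raise ConnectionError("console unreachable")


inbox = []  # stands in for an email channel
channels = {"console": console_down, "email": inbox.append}
delivered = fan_out("Synthetic failure: mailflow", channels)
print(delivered)
```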

Page 25: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Documentation

Issues are logged into a tool called Product Studio (specific database is “Service Delivery Escalation” or SDE)

Operational Best Practices

Page 26: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Escalation

Emails to critical teams within Microsoft Online Services are automatically triggered for all escalations entered in SDE

Operational Best Practices

Page 27: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Escalation

For high-severity issues, pagers are triggered and phone bridges are spun up to work on immediate service restoration

Operational Best Practices

Page 28: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue Escalation

Emails to critical teams within Microsoft Online Services are sent out every 30 minutes until service is restored

Linked bugs are opened in SDE for any follow-up work items

Operational Best Practices
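
The escalation behaviour on the Issue Escalation slides can be sketched as two small functions: one that picks the initial actions by severity, and one that lists the 30-minute update times until restoration. The severity labels, action strings, and function names are assumptions for illustration and are not the actual SDE workflow.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=30)


def initial_actions(severity: str) -> list:
    """Every escalation emails the critical teams; high severity adds pagers and a bridge."""
    actions = ["email critical teams"]
    if severity == "high":
        actions += ["trigger pagers", "open phone bridge"]
    return actions


def update_times(opened: datetime, restored: datetime) -> list:
    """Times at which a status email goes out while the service is still down."""
    times = []
    next_update = opened + UPDATE_INTERVAL
    while next_update < restored:
        times.append(next_update)
        next_update += UPDATE_INTERVAL
    return times


opened = datetime(2010, 6, 7, 9, 0)
restored = datetime(2010, 6, 7, 10, 40)
print(initial_actions("high"))
for t in update_times(opened, restored):
    print("status email at", t.strftime("%H:%M"))
```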

Page 29: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Customer View

Sample RSS feed

Operational Best Practices

Page 30: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Problem Management Processes

Present Microsoft Online Services Problem Management processes:

Issue-to-Problem escalation flow

Minimize repeat occurrences (incidents & alerts)

Build a better service (continuous improvement)

Present Microsoft Online Services Service Intelligence processes:

What is SI?

Sample Reports

How is the data used to improve service health?

Operational Best Practices

Page 31: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue-to-Problem Escalation

Issues are logged into a tool called Product Studio

Operational Best Practices

Page 32: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Issue-to-Problem Escalation Flow

Questions asked of each issue:

Are there coding changes required?

Are there configuration changes required?

Are there infrastructure changes required?

Are there operational changes required?

Are there short-term preventative measures required while a longer-term solution is put in place?

Was the issue caught by monitoring?

Was the issue responded to correctly?

Operational Best Practices
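
The review questions on this slide map naturally onto a checklist that turns answers into follow-up work items. The sketch below assumes that a "yes" to any change question, or a "no" to either detection question, produces an item. The keys and the function name are hypothetical.

```python
# Questions where a "yes" answer implies follow-up work.
CHANGE_QUESTIONS = {
    "coding": "Are there coding changes required?",
    "configuration": "Are there configuration changes required?",
    "infrastructure": "Are there infrastructure changes required?",
    "operational": "Are there operational changes required?",
    "short_term_prevention": "Are there short-term preventative measures required?",
}

# Questions where a "no" answer implies follow-up work.
DETECTION_QUESTIONS = {
    "caught_by_monitoring": "Was the issue caught by monitoring?",
    "responded_correctly": "Was the issue responded to correctly?",
}


def follow_up_items(answers: dict) -> list:
    """Return the question keys that need a linked follow-up bug."""
    items = [key for key in CHANGE_QUESTIONS if answers.get(key, False)]
    items += [key for key in DETECTION_QUESTIONS if not answers.get(key, True)]
    return items


answers = {"configuration": True, "caught_by_monitoring": False}
for key in follow_up_items(answers):
    print("open linked bug for:", key)
```

Unanswered questions default to "no work needed" here. A stricter review might instead require an explicit answer to every question.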

Page 33: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Service Intelligence - Definition

Business Intelligence vs. Service Intelligence

Let customers focus on their business while we focus on our service and resources

BI pulls data from the SI platform

“Any metric from any datasource”

Availability, Incidents, Alerts, TTR, TTE

Operational Best Practices
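
Two of the metrics named on this slide can be derived directly from incident start and end times. The sketch below reads TTR as time to restore, which is an assumption because the slide does not expand the acronym. It also assumes incidents do not overlap, and the sample incidents are invented.

```python
from datetime import datetime, timedelta


def availability_pct(incidents, period_start, period_end):
    """Percentage of the period not covered by (non-overlapping) incidents."""
    period = period_end - period_start
    downtime = sum((end - start for start, end in incidents), timedelta())
    return 100 * (1 - downtime / period)


def mean_ttr(incidents):
    """Average duration from incident start to service restoration."""
    durations = [end - start for start, end in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents in a 30-day period: one of 20 minutes, one of 40.
incidents = [
    (datetime(2010, 6, 3, 10, 0), datetime(2010, 6, 3, 10, 20)),
    (datetime(2010, 6, 18, 22, 0), datetime(2010, 6, 18, 22, 40)),
]
june = (datetime(2010, 6, 1), datetime(2010, 7, 1))
print(f"availability: {availability_pct(incidents, *june):.3f}%")
print("mean TTR:", mean_ttr(incidents))
```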

Page 34: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Minimize Repeat Occurrences

Look for trends

Target preventative actions

Operational Best Practices
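
Looking for trends can start as simply as counting incidents by cause and surfacing the ones that recur. The sketch below is a minimal example. The cause labels and the threshold of two are assumptions, and a real report would draw on the incident database instead of a hard-coded list.

```python
from collections import Counter


def repeat_offenders(incident_causes, threshold=2):
    """Return (cause, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(incident_causes)
    return [(cause, n) for cause, n in counts.most_common() if n >= threshold]


causes = [
    "mailflow queue backlog",
    "expired certificate",
    "mailflow queue backlog",
    "disk full",
    "mailflow queue backlog",
    "expired certificate",
]
for cause, n in repeat_offenders(causes):
    print(f"{n}x {cause}: candidate for preventative action")
```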

Page 35: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Build a Better Service

[Flow diagram]

Inputs: MOM Alert, Syntx Alert, Customer Report

Bug in SDE

Outputs: Operational Process Change (+Bug), Code Change (+Bug), Configuration Change (+Bug), Infrastructure Change (+Bug)

Monitor & Measure Impact

Operational Best Practices

Page 37: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

World-Class Support

Dedicated BPO Support organization

Deep service knowledge

Tightly aligned with operations and development organizations

Promotes faster resolution times

Ensures the voice of the customer is heard

24x7 Phone Support and Electronic Support

Support requests can be entered directly into the Service Portal

Continuously updated Knowledge Base articles

World-class Support

Page 38: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Track Resources

Read more about Microsoft Online Services – www.microsoft.com/online

Sign up for a 30-Day Trial of the Business Productivity Online Suite: https://mocp.microsoftonline.com

Use Promo Code TENA2010

Continue the conversation

Microsoft Online Services Team Blog – http://blogs.technet.com/msonline

Facebook Fan Page – http://www.facebook.com/MicrosoftOnlineServices

YouTube Channel – http://www.youtube.com/user/msonlineservices

Twitter – http://twitter.com/msonline

Page 39: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Resources

Sessions On-Demand & Community – www.microsoft.com/teched

Microsoft Certification & Training Resources – www.microsoft.com/learning

Resources for IT Professionals – http://microsoft.com/technet

Resources for Developers – http://microsoft.com/msdn

Page 40: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Complete an evaluation on CommNet and enter to win!

Page 41: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st

http://northamerica.msteched.com/registration

You can also register at the North America 2011 kiosk located at registration

Join us in Atlanta next year

Page 42: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 43: Gail Warren Director, Online Services Microsoft Corporation SESSION CODE: COS201

JUNE 7-10, 2010 | NEW ORLEANS, LA