Page 1: Gail Warren Director, Online Operations Microsoft ITS213
Critical Infrastructure and Operations for Delivering Secure, Enterprise-Class Software Services

Gail Warren
Director, Online Operations
Microsoft
ITS213

Agenda

• Business Productivity Online (BPO)
• Carrier-class Data Centers
• World-class Security
• World-class Architecture
• Best-of-Breed Hardware
• Operational Best Practices
• World-class Support

Microsoft’s Significant Investment

Microsoft is making a huge investment in data center and network capacity

There are currently 13 global data centers that use 70 megawatts of power. By the end of 2009, there will be 20 data centers that use 180 megawatts of power.

The data centers are massive, roughly the size of 9–10 football fields, with significant network capacity (all primary facilities maintain at least OC-192 capacity).

Carrier-Class Data Centers

Features

• Multiple generators
• Dual power feeds
• Battery backup
• Dual power to each rack
• Computer-controlled cooling

Carrier-Class Data Centers

[Figure: world map of data center locations. Regions labeled: North America, Central and South America, Europe, Asia, Africa, Australia. Legend: current locations, future location.]

Microsoft Online Thinks About Security from 3 Perspectives:

1. Secure from the ground up
   • Carrier-class data centers
   • Multiple layers of security protecting your data
   • Secure development life cycle

2. Secure in knowing your data will be there when you need it
   • Operational best practices
   • Complete n+1 redundancy
   • Best-of-breed hardware

3. Security through peace of mind
   • Audited by a third party
   • Internal audits
   • Dedicated service administration resources
   • 24x7 support any time you need help
   • Financially backed service level agreements (SLAs)

Service Security

It starts with the data center

Data Center within a Data Center

Motion sensors

24x7 secured access

Biometric controlled access systems

Video camera surveillance

Security breach alarms

Service Security

Then we add multiple layers of logical security…

• Filtering routers
• Firewalls
• Intrusion detection
• Separate data networks
• Penetration testing
• Scanning and monitoring
• AV
• Configuration/patch
• Host security (hardened operating system)
• Application-level countermeasures
• Application authentication
• Authentication to data
• Data

Service Security

CyberTrust: leading security certification provider
• CyberTrust provides both application and physical security validation
• 4 of the 5 largest banks use CyberTrust
• Certifies more than 95% of all information security software

What they found:
“…not discover a single device with any high-severity vulnerabilities … I can comfortably say in three years of conducting internal scans I have never seen an internal scan without any high-severity vulnerabilities” (CyberTrust)

Service Security

• Data hygiene supported by multi-layer antivirus and spam filtering
• Highly secure data access for users via HTTPS
• Geo-redundant data centers certified with SAS 70 and ISO 27001

BPO Logical Architecture

1. Administration through a defined set of tools promotes availability and security
2. Rich set of tools available to IT professionals to promote visibility into the service
3. Integrated with world-class services such as Live Meeting and Exchange Hosted Services
4. Significant investment in monitoring and management

[Figure: BPO logical architecture diagram with callouts 1 to 4. Component labels: HMC MPS/MPF, HMC namespaces, Providers, SharePoint, OCS, Exchange, Provisioning web service, BPO Admin portal, AD, Live Meeting, Syndicated services, BPO-specific KB, Alert publishing, Ticket mgmt, Customer-centric service health, Service object model, Sign-in service, Customer premise, IT generalist, End user, Service applet (SSO and client config), Service administration, OLS interface, Trials, VL, Deployment and configuration, Service monitoring, Audit collection, Performance logging and collection, Capacity mgmt, Patch mgmt, Backups, EHS.]

BPO Physical Overview

All Services Protected by Microsoft® Forefront™

Location
• Service runs in isolation
• A data center located within a data center

Services
• Administration and user portals
• E-mail
• SharePoint®
• Instant messaging (IM)
• Web conferencing

Availability
• Each service runs with complete n+1 redundancy within the data center
• Multiple data copies to protect against data loss
• Full service geo-replication for disaster recovery
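As a rough illustration of what n+1 redundancy implies for sizing, the sketch below computes how many servers a pool needs to carry a peak load with one spare, and checks that the pool survives a single failure. The function names, load figures, and per-server capacity are hypothetical and do not come from the presentation.

```python
import math

def servers_needed(peak_load: float, per_server_capacity: float, spares: int = 1) -> int:
    """Pool size that carries peak_load with `spares` extra units (N + spares)."""
    if peak_load < 0 or per_server_capacity <= 0:
        raise ValueError("peak_load must be >= 0 and per_server_capacity must be > 0")
    n = math.ceil(peak_load / per_server_capacity)  # N: minimum units to carry the load
    return n + spares                               # N+1 when spares == 1

def survives_single_failure(pool_size: int, peak_load: float, per_server_capacity: float) -> bool:
    """True if the pool still carries peak_load after losing one server."""
    return (pool_size - 1) * per_server_capacity >= peak_load
```

With an assumed peak of 9,000 requests per second and 2,000 per server, N is 5 and the n+1 pool is 6 servers; a pool of only 5 would not survive losing one.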

BPO Capacity and Reliability

Capacity Management
• Continuous capacity review
• Buffer capacity for unexpected load
• Capacity modeling implements capacity at least 3 months in advance of forecast

N+1 Redundancy Throughout
• Network
• Storage
• Servers

Result: 99.9%+ reliability
Financially backed SLA
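To make the 99.9% figure concrete, the following sketch converts an availability percentage into a downtime budget. The 30-day and 365-day periods are assumptions chosen for illustration; the presentation does not state how the SLA measurement window is defined.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Minutes of downtime allowed in the period at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

monthly = downtime_budget_minutes(99.9)        # about 43.2 minutes in 30 days
yearly = downtime_budget_minutes(99.9, 365)    # about 525.6 minutes (8.76 hours) in a year
```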

BPO Logical Architecture
Best-of-Breed Hardware

Servers
• Dual power supplies
• Dual network interfaces
• Full lights-out management capabilities

Storage
• RAID 1 + 5
• Optimized for performance and availability
• Disk to disk to disk backup

Network
• Full failover capabilities
• N+1 throughout the network stack

Operational Best Practices

Operations practices based on Information Technology Infrastructure Library (ITIL) / Microsoft® Operations Framework (MOF)
• Change management
• Incident management
• Problem management

Dedicated Service Operations Center (SOC)
• Focused on BPO
• Experts in online collaboration services

Dedicated service administration team
ISO 27001 aligned operational procedures

Monitoring

Significant investment in tools to ensure the service is there 24x7, and if there are problems, we know ASAP

Complete monitoring suite
• Microsoft® System Center Operations Manager
• Transaction monitors around the world
• Holistic network monitoring
• Security monitoring

Custom-built tools to provide further insight
• Custom Microsoft® Operations Manager (MOM) packs
• Synthetic transactions

Incident Management

Issue discovery
• Monitoring
• Syntx (synthetic transactions)
• Customer reported
• Operations monitoring infrastructure

Issue handling
• Issue documentation
• Issue escalation
• Service restoration

Issue Discovery – Monitoring System

Event monitoring with heavy tuning for what goes to the console, using a failure-mode approach
• Review how the components could fail
• Build rules for each failure mode
• Build knowledge for each failure mode to drive quicker resolutions

One can never predict all failure modes, so a closed-loop system is a necessity. If we have an outage without a failure-mode alert, we treat it as a bug and drive it until we have a corresponding rule and TSG (Technical Support Guide) for that specific failure mode in place.

Heavy customizations on top of SCOM platforms. For example:
• Transactions added to SCOM specific to mailflow and administrative services

Currently ~20K unique rules for the service
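The closed loop described above can be sketched as a small catalog that maps each known failure mode to a rule and a TSG, and opens a tracking bug whenever an outage had no matching alert. This is an illustrative model only: the class names, rule IDs, and TSG references are hypothetical, and it does not use or represent the SCOM API.

```python
from dataclasses import dataclass, field

@dataclass
class FailureMode:
    name: str
    rule_id: str   # monitoring rule that detects this failure mode
    tsg: str       # Technical Support Guide reference for responders

@dataclass
class Catalog:
    modes: dict = field(default_factory=dict)      # failure-mode name -> FailureMode
    open_bugs: list = field(default_factory=list)  # failure modes lacking a rule/TSG

    def register(self, mode: FailureMode) -> None:
        """Add a rule + TSG for a failure mode and close any bug tracking the gap."""
        self.modes[mode.name] = mode
        self.open_bugs = [b for b in self.open_bugs if b != mode.name]

    def review_outage(self, failure_mode: str, alerts_fired: set) -> str:
        """After an outage, check whether a failure-mode alert covered it."""
        mode = self.modes.get(failure_mode)
        if mode is not None and mode.rule_id in alerts_fired:
            return f"covered: follow {mode.tsg}"
        if failure_mode not in self.open_bugs:
            self.open_bugs.append(failure_mode)    # treat the monitoring gap as a bug
        return "gap: bug opened for missing rule/TSG"
```

The bug stays open until `register` is called for that failure mode, which mirrors driving the issue until a rule and TSG are in place.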

Issue Discovery – Syntx

What are the capabilities of the service that end users consume?
• E.g., search SharePoint, create a list, post a document, search for a document that was posted yesterday, etc.

How do we emulate the consumption of those capabilities?
• Code that emulation = “synthetics”
• Run synthetics every X minutes
• Alert if the capability is not performing within specifications
• Expose synthetic success/failure and performance data for trending
• Monitor DIPs and VIPs from LAN
• Monitor VIPs from internet

Ideally, two alerts for every issue:
• Synthetic alert telling us that the capability is impacted
• Failure-mode alert telling us what happened
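A minimal sketch of a synthetic transaction runner follows, assuming each capability is emulated by a probe function that returns True on success. The names `run_synthetic` and `run_every`, the probe callables, and the duration limits are all hypothetical; the presentation does not describe how its synthetics are implemented.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class SyntheticResult:
    name: str
    success: bool
    duration_s: float
    alert: bool   # True when the probe failed or ran slower than its specification

def run_synthetic(name: str, probe: Callable[[], bool], max_duration_s: float) -> SyntheticResult:
    """Run one probe, time it, and flag an alert if it fails or exceeds max_duration_s."""
    start = time.perf_counter()
    try:
        ok = bool(probe())
    except Exception:
        ok = False   # an exception counts as a failed transaction
    duration = time.perf_counter() - start
    return SyntheticResult(name, ok, duration, alert=(not ok) or duration > max_duration_s)

def run_every(interval_s: float, probes: dict, publish: Callable[[SyntheticResult], None],
              cycles: int) -> None:
    """Run every probe once per cycle and publish each result for alerting and trending."""
    for i in range(cycles):
        for name, (probe, limit) in probes.items():
            publish(run_synthetic(name, probe, limit))
        if i < cycles - 1:
            time.sleep(interval_s)
```

Publishing every result, not only the failures, is what makes the success/failure and performance data available for trending.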

Issue Discovery – Customer

Despite monitoring and syntx, customers do find and report errors to our Support organization

Continuous Improvement

If a service event is missed by monitoring, a bug is opened and tracked for resolution

Issue Discovery – Infrastructure

Geo-redundant Tier 1 team and SOC Leads
• Console, email, and phone monitored 24x7x365
• SOC Leads (Ops Managers) are also 24x7x365

Geo-redundant SCOM infrastructure
• Alerts to console

Geo-redundant synthetic monitoring infrastructure (separate from SCOM)
• Synthetic alerts go to email currently
• We will integrate the alert stream into the console, but we will always want visibility outside of the console for resiliency
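The resiliency point, keeping alert visibility outside the console, amounts to delivering each alert over more than one independent channel. The sketch below shows one way to do that; the `fan_out` function and the channel callables are hypothetical stand-ins, not part of any system named in the presentation.

```python
from typing import Callable, Iterable, List, Tuple

def fan_out(alert: str, channels: Iterable[Tuple[str, Callable[[str], None]]]) -> List[str]:
    """Send the alert to every channel and return the names of channels that accepted it."""
    delivered = []
    for name, send in channels:
        try:
            send(alert)
            delivered.append(name)
        except Exception:
            continue   # one failing channel must not block the others
    if not delivered:
        raise RuntimeError("alert could not be delivered on any channel")
    return delivered
```

If the console channel is down, the alert still reaches email, and the caller can see from the return value which channels succeeded.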

Issue Documentation

Issues are logged into a tool called Product Studio (specific database is “Service Delivery Escalation” or SDE)

Issue Escalation

• Emails are automatically triggered for all escalations entered in SDE
• For high-severity issues, pagers are triggered and phone bridges are spun up to work on immediate service restoration
• Emails are sent out every 30 minutes until service is restored
• Linked bugs are opened in SDE for any follow-up work items
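The escalation rules can be expressed as a small sketch: every escalation triggers email, high-severity issues also page and open a phone bridge, and status emails repeat every 30 minutes until restoration. The severity labels and action names are hypothetical; only the 30-minute interval comes from the slides.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=30)   # from the slides: updates every 30 minutes

def initial_actions(severity: str) -> list:
    """Notifications triggered when an escalation is entered."""
    actions = ["email"]                       # every escalation triggers email
    if severity == "high":
        actions += ["page", "phone_bridge"]   # high severity also pages and opens a bridge
    return actions

def update_times(opened: datetime, restored: datetime) -> list:
    """Times at which a status email is due between opening and restoration."""
    times = []
    t = opened + UPDATE_INTERVAL
    while t < restored:
        times.append(t)
        t += UPDATE_INTERVAL
    return times
```

For an issue opened at 17:00 and restored at 18:40, status emails would be due at 17:30, 18:00, and 18:30.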

Customer View

Provide customer with service state
• Mail
• SharePoint

Really Simple Syndication (RSS) feeds

Customer View

[Figure: sample RSS feed]
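The sample feed from the slide is not preserved in this transcript. As a stand-in, the sketch below builds a minimal RSS 2.0 service-health feed with Python's standard library. The feed title, URL, and item text are invented for illustration and do not reproduce the actual Microsoft Online feed.

```python
import xml.etree.ElementTree as ET

def build_status_feed(title: str, link: str, items: list) -> str:
    """Return an RSS 2.0 document; each item is a dict with title, description, pubDate."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Service health updates"
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "description").text = entry["description"]
        ET.SubElement(item, "pubDate").text = entry["pubDate"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical example data; example.com is a placeholder URL.
feed = build_status_feed(
    "Service Health",
    "https://example.com/service-health",
    [{"title": "Mail: service restored",
      "description": "Mail flow has returned to normal.",
      "pubDate": "Thu, 12 Nov 2009 17:00:00 GMT"}],
)
```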

Problem Management Processes

Present Microsoft Online Services Problem Management processes:

• Issue-to-Problem escalation flow
• Minimize repeat occurrences (incidents & alerts)
• Build a better service (continuous improvement)

Present Microsoft Online Services Service Intelligence processes:

• What is SI?
• Sample reports
• How is the data used to improve service health?

Issue-to-Problem Escalation

Issues are logged into a tool called Product Studio

Issue-to-Problem Escalation Flow

Questions asked of each issue:
• Are there coding changes required?
• Are there configuration changes required?
• Are there infrastructure changes required?
• Are there operational changes required?
• Are there short-term preventative measures required while a longer-term solution is put in place?
• Was the issue caught by monitoring?
• Was the issue responded to correctly?
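One way to picture the review is as a checklist whose answers generate follow-up work items. In the sketch below, a "yes" to any of the change questions or a "no" to either of the monitoring and response questions produces an item. The key names and item wording are hypothetical, and the actual Product Studio workflow is not described in the presentation.

```python
# Questions where "yes" means follow-up work is needed.
CHANGE_QUESTIONS = ["coding", "configuration", "infrastructure",
                    "operational", "short_term_mitigation"]
# Questions where "no" means follow-up work is needed.
CHECK_QUESTIONS = ["caught_by_monitoring", "responded_correctly"]

def follow_up_items(issue_id: str, answers: dict) -> list:
    """Turn review answers (question key -> bool) into follow-up work items."""
    missing = [q for q in CHANGE_QUESTIONS + CHECK_QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered review questions: {missing}")
    items = [f"{issue_id}: follow up on {q}" for q in CHANGE_QUESTIONS if answers[q]]
    items += [f"{issue_id}: fix gap, not {q}" for q in CHECK_QUESTIONS if not answers[q]]
    return items
```

Requiring an answer to every question keeps a review from being closed with a question silently skipped.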

Service Intelligence - Definition

Business Intelligence vs. Service Intelligence
• Let customers focus on their business while we focus on our service and resources
• BI pulls data from the SI platform

“Any metric from any datasource”

Availability, Incidents, Alerts, TTR, TTE
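As an example of deriving such metrics from incident records, the sketch below computes availability and a mean restoration time from a list of outage windows. It assumes TTR means time to restore, which the slides do not spell out, and it leaves TTE undefined for the same reason. The record format and function names are hypothetical.

```python
from datetime import datetime, timedelta

def availability_pct(period: timedelta, outages: list) -> float:
    """Percent of the period not covered by outages.

    outages is a list of (start, end) datetime pairs, assumed not to overlap.
    """
    down = sum((end - start for start, end in outages), timedelta())
    return 100.0 * (1 - down / period)

def mean_time_to_restore(outages: list) -> timedelta:
    """Average outage duration (assumed reading of TTR)."""
    if not outages:
        raise ValueError("no outages recorded")
    total = sum((end - start for start, end in outages), timedelta())
    return total / len(outages)
```

Two outages of 30 and 10 minutes in a 30-day period give a mean time to restore of 20 minutes and an availability of roughly 99.91%.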

Minimize Repeat Occurrences

• Look for trends
• Target preventative actions
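Looking for trends can be as simple as counting incidents by failure mode and surfacing the ones that recur. The sketch below assumes each incident record carries a `failure_mode` field; that field name and the sample values are hypothetical.

```python
from collections import Counter

def repeat_offenders(incidents: list, min_count: int = 2) -> list:
    """Return (failure_mode, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(i["failure_mode"] for i in incidents)
    return [(mode, n) for mode, n in counts.most_common() if n >= min_count]
```

The resulting list gives a starting point for deciding where preventative actions should be targeted first.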

Build a Better Service

[Figure: flow diagram. Inputs: MOM Alert, Syntx Alert, Customer Report. Center: Bug in SDE. Outputs, each marked "+Bug": Operational Process Change, Code Change, Configuration Change, Infrastructure Change. Final step: Monitor & Measure Impact.]

World-Class Support

Dedicated BPO Support organization
• Deep service knowledge

Tightly aligned with operations and development organizations
• Promotes faster resolution times
• Ensures the voice of the customer is heard

24x7 Phone Support and Electronic Support
• Support requests can be entered directly into the Service Portal
• Continuously updated Knowledge Base articles

question & answer

Resources

www.microsoft.com/teched
Sessions On-Demand & Community

http://microsoft.com/technet
Resources for IT Professionals

http://microsoft.com/msdn
Resources for Developers

www.microsoft.com/learning
Microsoft Certification & Training Resources

Related Content

Breakout Sessions

• UNC203 - 11/09/2009 09:00-10:15 [Cyril Sultan] Implementing and Administering Microsoft Online Services

• OFS209 - 11/10/2009 17:00-18:15 [Kimmo Forss] SharePoint Online Overview

• SIA08-IS - 11/11/2009 10:45-12:00 [Mike Chan] Security Services in the Cloud

• UNC205 - 11/11/2009 17:30-18:45 [Cyril Sultan] Tips and Tricks for Planning, Deploying, and Troubleshooting the Office Live Meeting Service

• UNC310 - 11/12/2009 13:30-14:45 [David Anderson] Migrating Data, Co-Existence, and Directory Synchronization with Microsoft Online Services

• ITS213 - 11/12/2009 17:00-18:15 [Gail Warren] Critical Infrastructure and Operations for Delivering Secure, Enterprise-Class Software Services

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.