TRANSCRIPT
How Microsoft Monitors Applications Using APM, Global Service Monitor, and Microsoft Visual Studio Web Testing
Charlie Satterfield, Senior Program Manager
MDC-B317
Session Objectives and Takeaways
Session Objective(s): Review capabilities and use of:
• Web Availability Monitors using Global Service Monitor (GSM)
• Visual Studio Web Tests using GSM
• Application Performance Monitoring (APM)
Takeaways:
• Benefits of GSM web testing features
• Benefits of APM for real world web applications
Agenda
• About Monitoring and Management (M&M)
• How M&M thinks about monitoring
• System Center 2012 SP1 app monitoring
• App monitoring in action
• Challenges and changes in M&M implementation
About Monitoring and Management
Datacenter Monitoring & Management
• Monitor and manage 9000+ servers
• 30+ Azure Hosted Services
• 10 global data center facilities & 6 domains
• 110+ internet web sites & 6,900+ databases
• Generate 3,000+ OpsMgr alerts and Service Manager incidents per day
Our Customers
• Microsoft.com
• Windows Update / Microsoft Update
• The Windows Store
• MSDN
• TechNet
• Windows Intune
• System Center Advisor
• Visual Studio Online
• Tofino
• GSM
• And more…
About M&M: Team Structure
• Tier 1
  • 40 vendors split between Redmond and India, 24/7
  • 15-minute SLA to resolve or escalate
  • 1 full-time manager
• Tier 2
  • 4 vendor service engineers, 24/7
  • SLA varies by severity
  • 1 full-time Tier 2 manager
• Tier 3 / 4
  • 4 full-time service engineers
• PM / Service Manager
  • 1 full-time PM/Architect
  • 1 full-time Service Manager
How M&M thinks about monitoring: Inside Out
• App level monitors based on events and/or counters
  • Web: Monitor 1, Monitor 2, Monitor 3
  • WS: Monitor 1, Monitor 2, Monitor 3
  • DB: Monitor 1, Monitor 2, Monitor 3
• Custom MPs for unique application events
• HW, OS, and service component monitoring through retail MPs
[Diagram: Operations Manager 2012 and the OpsMgr Agent alongside the monitored stack: Web Application, IIS, SQL, Windows, Hardware Infra]
How M&M thinks about monitoring: Outside In
External probes / Synthetic Transactions
• Web Service with Client UI (S1, S2, S3, S4): Synthetic Transactions (SCOM)
  • Test core user paths in UI with Synthetic Transactions
• Web Service Only (S1, S2, S3, S4): HTTP Probes (SCOM)
  • Uses same tools as SynTran
  • Expose secured web page that performs API level tests and returns result codes
  • Test for event codes with HTTP Probes
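The "secured web page that performs API level tests and returns result codes" idea can be sketched roughly as follows. This is a minimal illustration, not the M&M implementation: the result codes (0 and 1), the `name=code` page format, and every function name are assumptions made for the example, and the check names S1 to S4 are simply borrowed from the diagram labels. Authentication and the HTTP layer itself are left out.

```python
from typing import Callable, Dict, List

# Hypothetical result codes; the slides do not define a specific scheme.
OK = 0
FAILED = 1


def run_api_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, int]:
    """Run each API-level check and map its outcome to a result code."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = OK if check() else FAILED
        except Exception:
            # A check that raises is reported as failed instead of breaking the page.
            results[name] = FAILED
    return results


def render_status_page(results: Dict[str, int]) -> str:
    """Render one 'name=code' line per check; this is the body a probe would read."""
    return "\n".join(f"{name}={code}" for name, code in sorted(results.items()))


def probe_failures(page_body: str) -> List[str]:
    """What an HTTP probe might do with the body: list checks whose code is not OK."""
    failures = []
    for line in page_body.splitlines():
        name, _, code = line.partition("=")
        if code.strip() != str(OK):
            failures.append(name)
    return failures


checks = {
    "S1": lambda: True,
    "S2": lambda: True,
    "S3": lambda: False,      # simulated API failure
    "S4": lambda: 1 / 0 > 0,  # simulated exception inside a check
}
page = render_status_page(run_api_checks(checks))
print(page)
print(probe_failures(page))  # ['S3', 'S4']
```

Keeping the page body to simple codes means the probe only needs a content match, which is the kind of test an HTTP probe can express.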
[Diagram: the monitored stack (Web Application, IIS, SQL, Windows, Hardware Infra) with Operations Manager 2012 and the OpsMgr Agent, plus outside-in sources: 3rd Party URL Monitor, Custom Dev URL Monitor, HTTP Probes]
System Center 2012 SP1 App Monitoring
• Web Application Transaction Monitoring
• Web Availability Monitoring
• Visual Studio Web Tests
• Application Performance Monitoring
Web Availability Monitoring and GSM
Used by application owners/engineers to:
• Measure web application availability and performance
• Pinpoint resource failures for quick resolution
Application owners/engineers leverage several test types:
• URL test
  • Executed by: GSM
  • Alerting: a failure of this test indicates a problem with one or more networks, sites, or servers
  • Availability reporting: provides availability % seen by users of that region
• VIP test
  • Executed by: internal watcher nodes (management pools)
  • Alerting: a failure of this test indicates a problem with one or more servers belonging to this site
  • Availability reporting: provides availability % specific to this site
• DIP test
  • Executed by: internal watcher nodes (management pools)
  • Alerting: a failure of this test indicates a problem with a specific server
  • Availability reporting: provides availability % specific to this server
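One way to read the three test types together is to check the most specific scope first. The sketch below is an interpretation of the table above, not documented product behavior: the function names, the ordering of the checks, and the server names are all assumptions for illustration.

```python
from typing import Dict, List


def localize_failure(url_ok: bool, vip_ok: bool, dip_ok: Dict[str, bool]) -> str:
    """Narrow a failure down using the most specific failing test type."""
    failed_servers = sorted(name for name, ok in dip_ok.items() if not ok)
    if failed_servers:
        # DIP failure: a specific server is at fault.
        return "server problem: " + ", ".join(failed_servers)
    if not vip_ok:
        # VIP failure with healthy DIPs: something at the site level.
        return "site problem: one or more servers belonging to this site"
    if not url_ok:
        # Only the external URL test fails: network, site, or server on the user path.
        return "network, site, or server problem as seen from the GSM location"
    return "healthy"


def availability_percent(results: List[bool]) -> float:
    """Availability % for one scope (region, site, or server) from pass/fail results."""
    if not results:
        raise ValueError("no test results to report on")
    return 100.0 * sum(results) / len(results)


print(localize_failure(True, True, {"web01": True, "web02": True}))
print(localize_failure(False, False, {"web01": True, "web02": False}))
print(localize_failure(False, True, {"web01": True, "web02": True}))
print(availability_percent([True, True, True, False]))  # 75.0
```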
Demo: Web Availability Monitors and GSM
Charlie Satterfield
Web Availability Monitoring: Benefits
• Ability to provision an application test with multiple URLs
• Easy to provision tests to internal and external (GSM) watcher nodes
• Automatic health rollups
• Availability and Performance dashboards
• Availability reporting
• It's free, saving our customers as much as $600/month per base URL test compared to a 3rd party
Web Availability Monitoring: Challenges
• No way to create or modify multiple web application test settings at once
• No way to view alerts corresponding to outages from OpsMgr reports
• No breakdown of performance data for page components
Visual Studio Web Tests and GSM
Used by application owners/engineers to determine the health of end-to-end user scenarios:
• Search, validate expected results
• Login, validate content, logout
• And more…
Considerations
• The web test file size is less than 100 KB.
• Number of steps in the test cannot be more than 100.
• The test overall must execute faster than 30 seconds.
• There are no loop statements, plugins, or references to other tests.
• ThinkTime parameters in the test must be set to 0.
• Each subscription cannot have more than 3 tests per location, or 45 tests total.
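The considerations above lend themselves to a pre-upload check. The following is a sketch only: `WebTestSummary` and its fields are hypothetical (it does not parse a real .webtest file), and treating 100 KB as 100 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WebTestSummary:
    """Hypothetical summary of a web test; field names are illustrative only."""
    file_size_bytes: int
    step_count: int
    duration_seconds: float
    uses_loops_plugins_or_references: bool
    max_think_time_seconds: int


def web_test_violations(test: WebTestSummary) -> List[str]:
    """Return the considerations from the slide that this test would break."""
    problems = []
    if test.file_size_bytes >= 100 * 1024:  # assumes 1 KB = 1024 bytes
        problems.append("file size must be less than 100 KB")
    if test.step_count > 100:
        problems.append("no more than 100 steps")
    if test.duration_seconds >= 30:
        problems.append("must execute faster than 30 seconds")
    if test.uses_loops_plugins_or_references:
        problems.append("no loops, plugins, or references to other tests")
    if test.max_think_time_seconds != 0:
        problems.append("ThinkTime must be 0")
    return problems


def subscription_violations(tests_per_location: Dict[str, int]) -> List[str]:
    """Check the per-subscription limits: 3 tests per location, 45 tests total."""
    problems = [
        f"{location}: more than 3 tests"
        for location, count in tests_per_location.items()
        if count > 3
    ]
    if sum(tests_per_location.values()) > 45:
        problems.append("more than 45 tests in the subscription")
    return problems


good = WebTestSummary(40 * 1024, 12, 8.5, False, 0)
bad = WebTestSummary(150 * 1024, 120, 42.0, True, 3)
print(web_test_violations(good))  # []
print(web_test_violations(bad))
print(subscription_violations({"Location A": 3, "Location B": 4}))
```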
Demo: Visual Studio Web Tests and GSM
Charlie Satterfield
Visual Studio Web Tests and GSM: Benefits
• Ability to record web application user actions
• Transactional and authentication capabilities
• Ability to add specific validation to determine success or failure of test
• Validation and performance collection for each test step
• It's free!
Visual Studio Web Tests and GSM: Challenges
• No way to run a Visual Studio web test from an internal OpsMgr watcher node
Microsoft.com and APM
About Microsoft.com
What we run
• www.Microsoft.com
• Download.Microsoft.com
• Profile.Microsoft.com
• Careers.Microsoft.com
• Plus a bunch more…
By the numbers
• 20K to 28K web requests per second to www.Microsoft.com
• ~1.6B requests per day from 57M unique IPs
• 550K concurrent connections
• #9 corporate web site on the web in terms of reach
About Microsoft.com
By the numbers: WWW
• 24 WWW front end Application Request Routing servers
• 64 WWW backend servers
• Multiple other clusters serving sites like /surface, /licensing/servicecenter, etc.
• SLA of global 99.90% platform availability as measured by GSM
• Objective of global 99.80% for page delivery as measured by Keynote
IIS Config WWW
• Windows Server 2012 / IIS 8
• 3100+ web applications
• 26 application pools
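To put the 99.90% and 99.80% targets above in concrete terms, the arithmetic below converts an availability percentage into allowed downtime. The 30-day measurement period is an assumption for illustration; the slides do not state the window over which the SLA is measured.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period at the given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.90, 99.80):
    print(f"{target:.2f}% -> {allowed_downtime_minutes(target):.1f} minutes per 30 days")
# 99.90% -> 43.2 minutes per 30 days
# 99.80% -> 86.4 minutes per 30 days
```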
Using APM in practice
What debugging in production used to look like:
Using APM in practice: Debugging Today
• APM is now the primary tool for debugging on WWW
• With 3100+ applications we cannot target all of them all of the time
• We push out an APM MP for targeted apps and collect data
• Tight integration with our development team through TFS and APM data exchange
Using APM in practice
A real world example: Identify most problematic application(s)
AppAdvisor Console: Summary Failure Analysis
• Graphical view of event count over time for APM monitored applications
• Allows for quick identification of the most problematic applications over a time period
• Drill in to get more information on top 5 exception events
• Drill in further to get the exception call stack details
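The ranking behind a view like this can be approximated by counting events per application and then per exception type. This sketch shows the idea only and says nothing about how AppAdvisor computes its report; the event tuples, application paths, and function names are invented for the example.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical APM events as (application, exception type) pairs.
Event = Tuple[str, str]


def most_problematic_apps(events: List[Event], top_n: int = 5) -> List[Tuple[str, int]]:
    """Rank applications by how many events they produced in the period."""
    return Counter(app for app, _ in events).most_common(top_n)


def top_exceptions(events: List[Event], app: str, top_n: int = 5) -> List[Tuple[str, int]]:
    """Drill in: rank exception types for one application."""
    return Counter(exc for a, exc in events if a == app).most_common(top_n)


events = [
    ("/downloads", "NullReferenceException"),
    ("/downloads", "NullReferenceException"),
    ("/downloads", "TimeoutException"),
    ("/careers", "SqlException"),
    ("/careers", "SqlException"),
    ("/profile", "TimeoutException"),
]
print(most_problematic_apps(events))
# [('/downloads', 3), ('/careers', 2), ('/profile', 1)]
print(top_exceptions(events, "/downloads"))
# [('NullReferenceException', 2), ('TimeoutException', 1)]
```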
Using APM in practice
A real world example: Health of my application over time
AppAdvisor Console: Application Status
• Quick review of app health over time
• Compares current, previous, and average events and performance
• Highlights top 10 New Problems
• Displays top 10 most frequent problems with trending
• Drill in to get exception call stack details
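The difference between "new problems" and "most frequent problems with trending" can be illustrated with two period counts. The definitions used here are assumptions for the sketch (a problem is "new" if it has no events in the previous period; the trend is the change in count between periods) and are not taken from the AppAdvisor documentation.

```python
from collections import Counter
from typing import List, Tuple


def new_problems(current: Counter, previous: Counter, top_n: int = 10) -> List[Tuple[str, int]]:
    """Problems seen in the current period that had no events in the previous one."""
    fresh = Counter({p: n for p, n in current.items() if previous.get(p, 0) == 0})
    return fresh.most_common(top_n)


def frequent_with_trend(current: Counter, previous: Counter, top_n: int = 10) -> List[Tuple[str, int, int]]:
    """Most frequent current problems as (problem, count, change since previous period)."""
    return [(p, n, n - previous.get(p, 0)) for p, n in current.most_common(top_n)]


previous = Counter({"TimeoutException": 40, "SqlException": 10})
current = Counter({"TimeoutException": 25, "SqlException": 30, "NullReferenceException": 5})

print(new_problems(current, previous))
# [('NullReferenceException', 5)]
print(frequent_with_trend(current, previous))
# [('SqlException', 30, 20), ('TimeoutException', 25, -15), ('NullReferenceException', 5, 5)]
```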
Demo: Escalate APM exceptions
Charlie Satterfield
Using APM in practice: Benefits
• Ability to quickly configure monitors using IIS application inventory
• Ability to profile applications for exceptions or performance events without modifying code
• Near zero-touch, non-intrusive debugging and integration with IntelliTrace data
• Statistical views and analysis of top failures and worst performance
• Tight integration with TFS for seamless bug fix integration between Operations and Development
Using APM in practice: Challenges
• Helps to have IT Operations resources with code analysis skills to read the APM data
• IIS reset required to enable APM profiling
• Some apps not discovered with defaults: handlers that may not have .aspx files on disk require extra work
• Does not provide insight into memory leak investigations
Challenges in M&M Implementation
Some application monitoring features are not easily leveraged in a large multi-tenant management group.
• Product constraints
  • 400 APM agents / 700 APM monitored applications
  • Single GSM account per management group
• Implementation constraints
  • Restricted Operations Manager Console access for security, performance, and reliability
[Diagram: Operations Manager 2012 (single Mgmt Group, 9 business units, ~7000 agents) connected by Alerts to Service Manager 2012]
Resolution to M&M Challenges: Implementation Changes
• Provision new dedicated Operations Manager management groups per logical business unit
• Each management group sized by business unit, monitoring as many as 3000 agents
Enables:
• Operations Manager Console access
• GSM account per business unit
• APM configuration by application owner
• Web testing via GSM by application owner
[Diagram: OpsMgr 2012 SP1 Mgmt Group (1 business unit, < 3000 agents) connected by Alerts to Service Manager 2012]
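When sizing a management group per business unit, the figures from these slides can be used as a quick sanity check. The sketch assumes the APM limits (400 agents, 700 monitored applications) apply per management group and takes 3000 agents as the ceiling; the `ManagementGroupPlan` type, the business unit names, and their numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Figures quoted in the slides; applying them per management group is an assumption.
MAX_AGENTS = 3000
MAX_APM_AGENTS = 400
MAX_APM_APPS = 700


@dataclass
class ManagementGroupPlan:
    """Hypothetical sizing plan for one business unit's management group."""
    business_unit: str
    agents: int
    apm_agents: int
    apm_apps: int


def plan_issues(plan: ManagementGroupPlan) -> List[str]:
    """List the quoted limits that this plan would exceed."""
    issues = []
    if plan.agents > MAX_AGENTS:
        issues.append(f"{plan.agents} agents exceeds {MAX_AGENTS}")
    if plan.apm_agents > MAX_APM_AGENTS:
        issues.append(f"{plan.apm_agents} APM agents exceeds {MAX_APM_AGENTS}")
    if plan.apm_apps > MAX_APM_APPS:
        issues.append(f"{plan.apm_apps} APM applications exceeds {MAX_APM_APPS}")
    return issues


plans = [
    ManagementGroupPlan("Business Unit A", agents=2400, apm_agents=120, apm_apps=300),
    ManagementGroupPlan("Business Unit B", agents=3500, apm_agents=450, apm_apps=650),
]
for plan in plans:
    print(plan.business_unit, plan_issues(plan) or "fits")
```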
In Review: Session Objectives and Takeaways
Session Objective(s): Review capabilities and use of:
• Web Availability Monitors using GSM
• Visual Studio Web Tests using GSM
• Application Performance Monitoring (APM)
Takeaways:
• Benefits of GSM web testing features
• Benefits of APM for real world web applications
Track resources
• Learn more about Windows Server 2012 R2 Preview, download the datasheet and evaluation bits on http://aka.ms/WS2012R2
• Learn more about System Center 2012 R2 Preview, download the datasheet and evaluation bits on http://aka.ms/SC2012R2
Related content
Breakout Sessions (session codes and titles)
• DEV-B312 DevOps: Increasing Application Lifecycle Efficiencies with Microsoft Visual Studio and System Center
• MDC-H209 Microsoft System Center 2012: Application Performance Monitoring
• MDC-B208 Microsoft System Center 2012 SP1 – Operations Manager: Overview and What's New
Resources
• MSDN: Resources for Developers, http://microsoft.com/msdn
• Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
• TechNet: Resources for IT Professionals, http://microsoft.com/technet
• Sessions on Demand: http://channel9.msdn.com/Events/TechEd
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.