Upload: paul-stocke

Post on 14-Dec-2015

Jeff Hart, M2 Technology

IT Situational Awareness

Situational Awareness

Theoretical model of situation awareness, Dr. Mica Endsley, 1995

Perception of Elements in Current Situation

Comprehension of Current Situation

Projection of Future Status

Gathering Perception

Baseline Discovery

• Track Usage
• Inventory Software
• Inventory Hardware
• Identify Devices
• Discover Environment
• Recognize Applications

Perceiving Current Status

Combining Top Down and Bottom Up Metrics

[Diagram labels: Mission; Organization; Status; N/W, SYS, APPS, Other; Server1, Process, DNS, Switch1; Probe1, Probe2, Probe3, ProbeX]

Perceiving Context

Beginning to define service models

Automating Application Mapping

Automated mapping of what you have and how it relates

Layer 2-7 of the OSI Model

[Diagram labels: Business Process Model (Accounting, CSO, B2B Ordering, Retail store); applications (Accounts receivable_app, Accounts payable_app, Customer support_app, Order_app, Shipping_app); Physical Data Center]

Foundation of Comprehension

Building comprehension from perception

Run-time Service Model

HP CMS 3rd party CMDB

Integrated/Federated

Application Performance Management

Infrastructure Performance Management

Creating Context

Event mapping to CIs in the Run-time Service Model

– Relationship of events to dynamically updated CIs

BSM Platform
• Events and discovery / topology data are brought together
• End-to-end visibility of infrastructure and alerts by showing relationships of events to CIs and business services that are impacted
• Shows CIs in context

Event consolidation through OMi

Run-time Service Model hosts auto-discovered CIs
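The idea of relating an event to a CI and then to the business services it impacts can be sketched as a walk over a dependency graph. This is only an illustration, not the BSM or OMi data model: the `DEPENDENTS` map, the `impacted_services` function, and the relationships between Switch1, Server1, Order_app, and B2B Ordering are assumptions made up for the example (the names are borrowed from the diagrams in this deck).

```python
from collections import deque

# Hypothetical service model: each CI maps to the CIs that depend on it.
# The relationships are assumed for illustration only.
DEPENDENTS = {
    "Switch1": ["Server1"],
    "Server1": ["Order_app"],
    "Order_app": ["B2B Ordering"],
    "B2B Ordering": [],
}
BUSINESS_SERVICES = {"B2B Ordering"}


def impacted_services(ci, dependents=DEPENDENTS, services=BUSINESS_SERVICES):
    """Return the business services reachable from ci via 'depended on by' edges."""
    seen, queue, hit = {ci}, deque([ci]), set()
    while queue:
        node = queue.popleft()
        if node in services:
            hit.add(node)
        for parent in dependents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(hit)


# An event is attached to the CI it was raised on; context comes from the model.
event = {"ci": "Switch1", "message": "link down"}
print(impacted_services(event["ci"]))  # ['B2B Ordering']
```

Because the lookup reads the model at query time, a newly discovered CI only needs to be added to the map for its events to appear in context.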

Bringing Clarity to Complexity

TBEC – Topology Based Event Correlation

Efficiency gains through advanced causal event correlation

Use case addressed by TBEC:
1. Something goes wrong in your environment
2. Monitoring reports multiple problems via events
3. Usually just one of the events describes the CAUSE of the problem
4. Others are just SYMPTOMS
5. Fix the CAUSE and also the SYMPTOMS go away

[Diagram labels: Cause; Cause and Symptom; Symptom]

Automating Correlation & Service Model Management

The “T” in TBEC - rules based on topology

– Adaptive correlation – support for dynamic environments without adding administrative burden

Current discovered Topology utilized to correlate related events

Related events analyzed to determine SYMPTOMS and CAUSE.

As new CIs and relationships are automatically discovered, the TBEC rules are automatically applied. Experts define the rules ONCE and do NOT have to go back and update when the infrastructure changes.

[Diagram labels: Symptoms; Cause]
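A minimal sketch of topology-based correlation, assuming a single simplified rule: an event is a SYMPTOM if its CI depends, directly or transitively, on another CI that also has an active event, and otherwise it is a CAUSE. This is not the TBEC rule format; `DEPENDS_ON`, `upstream`, and `correlate` are hypothetical names, and the topology is invented for the example.

```python
# Hypothetical discovered topology: each CI maps to the CIs it depends on.
DEPENDS_ON = {
    "Order_app": ["Server1"],
    "Server1": ["Switch1"],
    "Switch1": [],
    "DNS": [],
}


def upstream(ci, depends_on):
    """All CIs that ci depends on, directly or transitively."""
    found, stack = set(), list(depends_on.get(ci, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(depends_on.get(node, []))
    return found


def correlate(event_cis, depends_on=DEPENDS_ON):
    """Split CIs with active events into (causes, symptoms)."""
    active = set(event_cis)
    causes, symptoms = [], []
    for ci in sorted(active):
        # Assumed rule: an active upstream CI explains this event.
        if upstream(ci, depends_on) & active:
            symptoms.append(ci)
        else:
            causes.append(ci)
    return causes, symptoms


print(correlate(["Order_app", "Server1", "Switch1"]))
# (['Switch1'], ['Order_app', 'Server1'])

# A newly discovered CI only changes the topology data; the rule is untouched.
grown = dict(DEPENDS_ON, Shipping_app=["Server1"])
print(correlate(["Shipping_app", "Switch1"], grown))
# (['Switch1'], ['Shipping_app'])
```

The second call mirrors the point made on the slide: the rule is written once against the topology, so adding Shipping_app requires no rule change.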

Automatic prioritization of Events

Based on Business / Mission Context

Event Priority is calculated based on severity and business / mission impact.

An event that affects a business service of criticality 4 gets higher priority than an event that affects a business service of criticality 2.

CI business impact is calculated based on the Business Criticality of all affected business services, applications and business process CIs and e.g. SLAs.

Business Criticality

Values: 0..5 (0 = lowest, 5 = highest)
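The slide does not give the actual formula, so the following is only a sketch under stated assumptions: severity is mapped to a made-up numeric scale, business impact is taken as the highest Business Criticality (0..5) among the affected CIs, and the two are simply added. `SEVERITY`, `business_impact`, and `priority_score` are hypothetical names.

```python
# Assumed severity scale; the slides do not define one.
SEVERITY = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


def business_impact(criticalities):
    """Highest Business Criticality (0..5) among affected CIs; 0 if none."""
    for value in criticalities:
        if not 0 <= value <= 5:
            raise ValueError(f"criticality out of range 0..5: {value}")
    return max(criticalities, default=0)


def priority_score(severity, criticalities):
    """Assumed combination: severity weight plus business impact."""
    return SEVERITY[severity] + business_impact(criticalities)


# Same severity, different business context:
high = priority_score("major", [4])   # affects a criticality-4 service
low = priority_score("major", [2])    # affects a criticality-2 service
print(high > low)  # True
```

Any monotonic combination would preserve the property stated on the slide, namely that at equal severity the event touching the more critical service ranks higher.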

Customized console

Optimize use of staff resources

– Optimized use of operations staff resources

Operator Perspectives: Operator can configure his own operator console with the information he needs for his daily tasks

Mash-up UI: Gallery allows user to compose new pages using provided components

Role based consoles

Projection of Future Status

HP Service Health Analyzer (SHA)

1. Anticipate problems before the business is impacted and prevent downtime

2. Automatically correlate information from multiple domains

3. Reduce cost of handling events by proactively investigating anomalies

4. Self learning system

Predictive Analytics

© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Proactive Risk Reduction by fusing IA and Ops

Operational Analytics
#1 See Everything
#2 Provide Context
#3 Act Appropriately

Proactive Risk Reduction

SECURITY
• User Provisioning
• Identity & Access Mgmt
• Database Encryption
• Anti-Virus, Endpoint
• Firewall, Email Security

IT OPERATIONS
• User Management
• App Lifecycle Mgmt
• Information Mgmt
• Operations Mgmt
• Network Mgmt

Operational Analytics

A unified approach to solving IT Operations Management (ITOM) problems

[Diagram labels: Reactive Monitoring, Proactive Monitoring; Known Problems, Unknown Problems; Event Triage, Log Management, Advanced Correlation, Advanced Analytics; Operations Analytics]

Service Health Analyzer powered by RTSM

Run-time Service Model: Comprehensive, automated and up-to-date model for dynamic services

Application Performance Management
• End-User Experience
• Transactions
• App Diagnostics
• Business metrics

Infrastructure Performance Management
• Server
• Network
• Virtualization
• 3rd party

Service Health Analyzer

Projection of Future Status

1. Early morning: Metric performing within baseline
2. 10:30am: SHA detects an anomaly and sends out an alert/event
3. 11:00am: Metric violates threshold
4. 11:30am: Service is now unavailable…

SHA detected a problem and sent an alert one full hour before the service failed
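The timeline above contrasts a learned baseline with a static threshold. The slides do not describe SHA's algorithms, so the sketch below substitutes a plain k-sigma rule purely to illustrate why a baseline deviation can fire before a fixed threshold does. The sample values, the threshold of 90, and the function names are all invented for the example.

```python
from statistics import mean, stdev


def first_anomaly(samples, baseline_size, k=3.0):
    """Index of the first sample more than k standard deviations away from
    the mean of the first baseline_size samples, or None if there is none."""
    base = samples[:baseline_size]
    mu, sigma = mean(base), stdev(base)
    for i in range(baseline_size, len(samples)):
        if abs(samples[i] - mu) > k * sigma:
            return i
    return None


def first_threshold_breach(samples, threshold):
    """Index of the first sample above a static threshold, or None."""
    for i, value in enumerate(samples):
        if value > threshold:
            return i
    return None


# Made-up metric: steady around 50, then climbing toward failure.
samples = [50, 52, 49, 51, 50, 48, 51, 49, 60, 72, 91, 99]

print(first_anomaly(samples, baseline_size=8))        # 8
print(first_threshold_breach(samples, threshold=90))  # 10
```

In this made-up series the deviation from baseline is flagged two samples before the static threshold is crossed, which is the kind of head start the timeline describes.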

Implementing the Ops Bridge

Value to the Organization

Experts: Can focus on mission / business initiatives
• Less time spent on day to day operations tasks
• Less time spent on administration
• Are able to add incremental value to operations more rapidly

Operations: Are more effective at day to day operations activities
• Continued control of OpEx
• Higher efficiency – lower MTTR
• Higher Service levels

Workload Efficiency

Experts
• Spend less time working on day to day operations
• Spend less time maintaining operational solutions
• Optimize time engaged in evolving operational solutions
• Maximize time spent on strategic activities

Tier 1 Operators
• Reduce false alarms
• Work on causes and not symptoms
• Enable co-operative cross-domain working
• Focus on what matters to the business
• Handle a higher proportion of incidents without escalation
• Fix issues more rapidly
• Streamline incident management activities

Questions?

Contact Information:

Jeff Hart, M2 Technology Enterprise Software Specialist

[email protected]

202-595-1917

Thank you
