Data Tagging Architecture for System Monitoring in Dynamic Environments

Bharat Krishnamurthy, Anindya Neogi, Bikram Sengupta, Raghavendra Singh (IBM Research Division in India)

IEEE Network Operations and Management Symposium (NOMS), 2008.



2

Introduction

•Why a monitoring system is needed:

▫Monitoring the parameters of the entire IT system is key to efficient management of a data center environment.

▫Each IT component is often packaged with open monitoring interfaces and tools, e.g., a database server or a network element will have published APIs for querying its performance metrics.

▫Even so, a monitoring system is required to integrate and process the data collected from these heterogeneous sources.

3

Concept: Monitoring Objective

•Specification that defines how to collect a data stream, process it, and generate a type of event or an aggregated data stream.

•Also called Service Level Objectives or SLOs in this paper.

•“Service Level Agreement” (SLA) is viewed as a possible composition of multiple SLOs
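
As an illustration of the definition above, a monitoring objective can be pictured as a small record that bundles a collection step, a processing step, and an event rule. This is only a sketch: the class name `MonitoringObjective`, its fields, the threshold rule, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class MonitoringObjective:
    """Hypothetical SLO: how to collect samples, process them, and emit an event."""
    name: str
    collect: Callable[[], Iterable[float]]   # returns raw samples from a data stream
    process: Callable[[List[float]], float]  # aggregates the samples into one value
    threshold: float                         # an event is generated above this value

    def evaluate(self) -> Optional[str]:
        samples = list(self.collect())
        value = self.process(samples)
        if value > self.threshold:
            return f"{self.name}: value {value:.1f} exceeds {self.threshold:.1f}"
        return None


def mean(samples: List[float]) -> float:
    return sum(samples) / len(samples)


disk_slo = MonitoringObjective(
    name="disk-utilization",
    collect=lambda: [70.0, 90.0, 95.0],  # stand-in for a real sensor (assumed values)
    process=mean,
    threshold=80.0,
)
event = disk_slo.evaluate()
```

Under this view, an SLA would be a collection of such objectives evaluated together.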

4

Issue

•In dynamic environments, monitoring objectives cannot be frozen at setup.

•Monitoring objectives need to be specified on a continuous basis as new applications or hardware are deployed.

•Operations personnel lack sufficient knowledge and skills to use or extend complex data modeling standards

▫such as the Common Information Model (CIM) [10]

[10] DMTF – Common Information Model (CIM). http://www.dmtf.org/standards/cim/

5

Example

•An operations team member wants to measure the utilization of a disk.

•Assume we have a function to get the current utilization of a disk.

•The operations personnel must choose the “BaseMetricDefinition, SystemResource, ComputerSystem” classes from the higher-level CIM classes.

▫This tells the monitoring system what the data is.

•Furthermore, operations personnel have to integrate all monitoring parameters into a single logical one.

6

Goal

•The goal is to balance the benefits (e.g. uniform interpretation) of well-defined taxonomies like CIM with the ease of use that free text descriptions offer.

•Model Driven Monitoring System (MDMS)

▫An SLO authoring and monitoring system

7

Model Driven Monitoring System (MDMS)

•MDMS has two parts:

▫The structured part: uses standard system configuration and monitoring specifications (CIM) to allow standard processing to be performed automatically on the monitoring data.

▫The unstructured part: allows any additional “information” not represented by the structured part to be modeled statically or at runtime.

8

A simplified data model (1/2)

•Use tags to represent the CIM model, and leave out redundant attributes.

•For example:

▫A user wants to monitor the disk utilization on server 1.

▫‘<BaseMetricDefinition.name=utilization>/<SystemResource.name=disk>/<ComputerSystem.name=server1>’

▫The ComputerSystem class may have other, redundant attributes such as Dedicated and ResetCapability.
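
A minimal sketch of how such a tag string could be built and parsed, assuming each tag has the form `<Class.attribute=value>` and that tags are joined with `/`, as in the example above. The helper names `build_path` and `parse_path` are hypothetical, and the sketch assumes values never contain `/`, `<`, or `>`.

```python
import re
from typing import List, Tuple

Tag = Tuple[str, str, str]  # (class name, attribute, value)

# One tag looks like <ComputerSystem.name=server1> (format taken from the slide).
_TAG_RE = re.compile(r"<([^.<>=/]+)\.([^.<>=/]+)=([^<>/]+)>")


def build_path(tags: List[Tag]) -> str:
    """Join (class, attribute, value) triples into a '/'-separated tag string."""
    return "/".join(f"<{cls}.{attr}={value}>" for cls, attr, value in tags)


def parse_path(path: str) -> List[Tag]:
    """Split a tag string back into triples, rejecting malformed tags."""
    tags = []
    for part in path.split("/"):
        match = _TAG_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed tag: {part!r}")
        tags.append(match.groups())
    return tags


disk_path = build_path([
    ("BaseMetricDefinition", "name", "utilization"),
    ("SystemResource", "name", "disk"),
    ("ComputerSystem", "name", "server1"),
])
```

Only the attributes that matter for the objective appear in the string, so redundant CIM attributes never need to be filled in.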

9

A simplified data model (2/2)

•Use a string matching algorithm to find relations between tags.

•For example:

▫A different user wants to measure the throughput of an application on server 1.

▫‘<BaseMetricDefinition.name=throughput>/<application.name=printbill>/<ComputerSystem.name=server1>’

•By using fast string matching algorithms, we can relate this tag string to the disk utilization example on the previous slide: both contain the tag <ComputerSystem.name=server1>.
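
The relation between the two examples can be sketched with plain string comparison. The slides do not say which string matching algorithm MDMS uses, so the set-based lookup below is a stand-in, and the function name `shared_tags` is hypothetical.

```python
from typing import List


def shared_tags(path_a: str, path_b: str) -> List[str]:
    """Return the tags of path_a that also occur in path_b, in path_a's order."""
    tags_b = set(path_b.split("/"))
    return [tag for tag in path_a.split("/") if tag in tags_b]


disk = ("<BaseMetricDefinition.name=utilization>/<SystemResource.name=disk>/"
        "<ComputerSystem.name=server1>")
app = ("<BaseMetricDefinition.name=throughput>/<application.name=printbill>/"
       "<ComputerSystem.name=server1>")

related = shared_tags(disk, app)  # the two objectives share the server tag
```

A non-empty result indicates that the two monitoring objectives refer to a common entity, here the same server.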

10

Incorporating Unstructured Data Types

•Since the system uses tags as its model, unstructured data types are simply “concatenated” to the structured tags.

•In the disk utilization example above, the data collection logic can generate the pairs by collecting per-partition disk utilization.

▫‘<BaseMetricDefinition.name=utilization>/<SystemResource.name=disk>/<ComputerSystem.name=server1>/<part.name=hda>’

•Assumption: analytics application writers are domain experts who can use existing unstructured text mining algorithms and tools to infer types from the free text descriptions and/or the tag values.
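
The concatenation step can be sketched as appending a free-form tag to an existing tag string, then separating the two parts again by checking each tag's class name against the classes the structured part knows. The set `KNOWN_CIM_CLASSES` and the helpers `extend_path` and `split_parts` are hypothetical names introduced for illustration only.

```python
from typing import List, Set, Tuple

# Assumption: the class names the structured (CIM) part recognizes.
KNOWN_CIM_CLASSES: Set[str] = {"BaseMetricDefinition", "SystemResource", "ComputerSystem"}


def extend_path(path: str, name: str, value: str) -> str:
    """Concatenate one more tag, structured or not, to an existing tag string."""
    return f"{path}/<{name}={value}>"


def split_parts(path: str) -> Tuple[List[str], List[str]]:
    """Separate tags whose class is a known CIM class from free-form tags."""
    structured: List[str] = []
    unstructured: List[str] = []
    for tag in path.split("/"):
        class_name = tag.lstrip("<").split(".", 1)[0]
        if class_name in KNOWN_CIM_CLASSES:
            structured.append(tag)
        else:
            unstructured.append(tag)
    return structured, unstructured


base = ("<BaseMetricDefinition.name=utilization>/<SystemResource.name=disk>/"
        "<ComputerSystem.name=server1>")
per_partition = extend_path(base, "part.name", "hda")
structured, unstructured = split_parts(per_partition)
```

Standard processing can then operate on the structured tags, while the remaining tags are left to the analytics writer to interpret.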

11

System Architecture overview

12

Look into agent

13

Data Processing by Server

14

Overhead Experiments (1/3)

•Environment

▫The MDMS server is hosted on an IBM p9113-550 with 16GB of RAM and 4 CPUs running at 1.6 GHz each.

▫At the time of the measurements, close to 1500 SLOs were configured to monitor around 500 servers.

15

Overhead Experiments (2/3)

•MDMS server overhead:

16

Overhead Experiments (3/3)

•The agent measured was configured to handle 235 objectives.

•The utilization measurements do not include the actual running of the code that performs the data sensing and processing activity.

17

Conclusion

•In this paper we described a monitoring system that uses a hybrid data model consisting of structured and unstructured parts to describe the semantics of monitoring data and events.

•The representation of data semantics is in terms of string tags.

▫The authoring of specifications becomes simpler and more intuitive for operations

18

Future work