(towards) a scalable monitoring system for large computing

(Towards) A Scalable Monitoring System for Large Computing Clusters

Moreno Marzolla Email [email protected]: http://www.dsi.unive.it/~marzolla

Dip. Informatica, Università di Venezia

Moreno Marzolla 2

Talk Outline" System Monitoring

� Definition� Tassonomy� Distributed Systems Monitoring

" Case Study: the BaBar Computing Farm� ASC: A prototype of an SNMP−Based

monitoring application" Conclusions

Moreno Marzolla 3

Monitoring" A monitor is a tool used to observe the

activities on a system� Collect performance statistics� Analyze the data� Display the results.

" Why?� Identify frequently used portions of a program;� Measure resource utilization to find performance

bottlenecks;� Characterize the Workload;� Find model parameters, validate models, or develop

inputs for a model.

Moreno Marzolla 4

Monitor Classification" Based on the level at which it is implemented

� Software� Hardware� Firmware� Hybrid

" Based on the triggering mechanism� Event−driven� Sampling based

" Based on the display ability� On−line� Batch

Moreno Marzolla 5

Hw versus Sw Monitors" Input Rate

� Much higher on Hw monitors; � Depends on CPU power for Sw ones.

" Time Resolution� Higher on Hw monitors;

" Overhead� None for Hw monitors. � Can be an issue for Sw monitors.

" Portability� Architecture and/or OS dependent?

Moreno Marzolla 6

Event−Driven vs Sampling Based" Event−Driven monitors collect data when triggered

by some condition: Pros: No polling required. Only "interesting"

notifications are sentO Cons: What happens when the triggering mechanism

crashes? " Sampling Based require periodic queries

: Pros: Can detect crashes of the monitored entity.

Sampling rate can be changed "outside" the monitored system.

O Cons: Can overload the medium used to transport the

requests

Moreno Marzolla 7

Distributed Systems Monitoring" To monitor a distributed system, the

monitor needs itself to be (at least partly) distributed.

Monitor

Host 1

Observer

Host 2

Observer

Host 3

Observer

Host 4

Observer

Moreno Marzolla 8

Layered View of DS monitors

Management

Console

Interpretation

Presentation

Analysis

Collection

ObservationGathers raw data on individual components of the system

Collects data from various observers

Statistical routines to summarize the characteristics of the data

User interface: provides reports, displays, alarms...

The user who interprets the results

Provides an interface to change the parameters and states

The entity deciding to change the parameters using the console

Moreno Marzolla 9

Issues of DS monitoring" Scalability

� Can it monitor 10 hosts? 100 hosts? 1000 hosts? ...

" Latency� Long delays make real−time, on−line monitoring

a challenge" Consistency

� Are you getting a consistent view of the system’s status, despite notification delays?

Moreno Marzolla 10

BaBar Case Study" BaBar is a High Energy Physics

experiment, studying matter−antimatter asimmetry.� http://www.slac.stanford.edu/BFROOT/

" Only a small fraction of events has real information (10−5−10−6)1Large amounts of data needed1Large computing facilities are

required

Moreno Marzolla 11

BaBar Farm @ INFN Padova" ≈150 2xPIII 1.26GHz

Machines, 1GB Ram, RH Linux 7.2

" Tape Library with a capacity of ~70TB not compressed;

" Network switch, UPSes, Environmental conditioning system, ...

Moreno Marzolla 12

Monitoring Requirements" Hardware Status

� Machine Crashes, disk crashes, CPU temperatures, disk/partitions overflows...

" Processes status" Environmental conditions

� Humidity, Temperature, UPS status..." System administrators should be notified as

soon as a problem occurs� Some automatic action should be taken when

possibile (eg, shut down the machines if overheated).

Moreno Marzolla 13

Other requirements" The monitoring system should also be:

� Scalable� Efficient (little resources requirement)

� Flexible and customizable� Easy to configure� Should be able to operate in batch mode (as a

regular UNIX dæmon, no GUI)� Should be able to observe different quantities with

different granularities" eg, CPU utilization sampled every 5 seconds, CPU

temperature sampled every minute, ...

Moreno Marzolla 14

Some existing monitoring tools" There are many of them:

� MRTG http://people.ee.ethz.ch/~oetiker/webtools/mrtg/� Ngop http://www−isd.fnal.gov/ngop/� Netsaint/Nagios http://www.nagios.org/� Ganglia http://ganglia.sourceforge.net/� RemStats

http://silverlock.dgim.crc.ca/remstats/release/index.html� Cricket http://cricket.sourceforge.net/� GxSNMP http://www.gxsnmp.org/� ... (add your own here)

Moreno Marzolla 15

So, what’s wrong?" We examined some publicly available

monitoring tools. Unfortunately, many of them were not suited to us:� Not scalable� Require their own dæmons running on the

monitored hosts" Can’t install a dæmon on a network switch, or on a

tape library

� Hard to configure" In many cases we gave up without trying the program.

� Poorly implemented " "...My God, it’s full of scripting languages..."

Moreno Marzolla 16

Architectural sketch

Observer Observer Observer Observer

Collector

Monitor

Collector

Observer

User Interface User Interface User Interface

Hardware Hardware Hardware Hardware Hardware

Moreno Marzolla 17

The Observer" The Observer must be able to collect

statistics on any networked equipment" We decided to use SNMP (Simple Network

Management Protocol)� It is a well−known protocol� It is implemented by virtually every vendor� It is reasonably simple yet powerful� A very good open source implementation is

available on Unix/Linux platforms http://net−snmp.sourceforge.net/

Moreno Marzolla 18

More on SNMP

Network Protocol

IP

UDP

SNMP

ManagementApplication

GetRequest

GetNextRequest

SetRequest

GetResponse

Trap

Network Protocol

IP

UDP

SNMP

SNMP Managed Objects

GetRequest

GetNextRequest

SetRequest

GetResponse

Trap

Managed Resources

SNMP Messages

LAN/WAN

W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, 3rd edition, p. 81

Moreno Marzolla 19

The Collector/Monitor" Is being developed" Features:

� Asynchronous (nonblocking) parallelized SNMP Polling;

� XML−based configuration file;� The RRDTool package is used to store data

and produce graphs" Old data have lower resolution than recent ones." Round Robin Databases have known maximum size." Graphing capabilities are provided by the library.

� Dynamic generation of HTML pages using XSLT stylesheets.

Moreno Marzolla 20

Architecture

<?xml version="1.0"

standalone="no"?>

<!DOCTYPE monitor

SYSTEM "monitor.dtd">

<monitor>

...

</monitor>

XML Configuration File

Status

ASC HTTPD

Host 1 Host 2 Host n

XSLT Stylesheet 1

XSLT Stylesheet 2

HTML Page 1

HTML Page 2

StylesheetsHTML Pages

Monitored Hosts

Moreno Marzolla 21

Example of XML Configuration File

<?xml version="1.0" standalone="no"?><!DOCTYPE monitor SYSTEM "monitor.dtd">

<monitor numconnections="20" asclogfile="/monitor/asc.log" httpdlogfile="/dev/null" rrddir="/monitor" htmldir="/monitor/html" ascverbosity="3" >

<host name="localhost"> <description>This machine</description> <miblist> <mib id="cpuUser" name=".1.3.6.1.4.1.2021.11.50.0" type="COUNTER"> <archives> <rra cf="AVERAGE" granularity="60" expire="604800"/> </archives> </mib> </miblist> <graphs> <!−− Graph definitions here −−> </graphs> </host>

</monitor>

Moreno Marzolla 22

Example of XML Status Dump<?xml version="1.0"?><hosts>

<host name="localhost" status="NR"><mibs>

<mib id="availSwap" lastUpdated="1018016033">1052248.000000</mib><mib id="totalSwap" lastUpdated="1018016033">1052248.000000</mib><mib id="totalMem" lastUpdated="1018016033">917080.000000</mib><mib id="cachedMem" lastUpdated="1018016033">7128.000000</mib><mib id="bufferMem" lastUpdated="1018016033">35052.000000</mib><mib id="sharedMem" lastUpdated="1018016033">0.000000</mib><mib id="freeMem" lastUpdated="1018016033">833800.000000</mib><mib id="cpuSystem" lastUpdated="1018016033">137587.000000</mib><mib id="cpuUser" lastUpdated="1018016033">13581.000000</mib><mib id="tempCpu2" lastUpdated="1018016033">24500.000000</mib><mib id="tempCpu1" lastUpdated="1018016033">25000.000000</mib><mib id="tempMB" lastUpdated="1018016033">33000.000000</mib>

</mibs><graphs>

<graph id="hourly.png" title="Hourly data"/></graphs><notifications>

<msg ts="1017933071.29405 17:11:11.029405" severity="CRITICAL"> Timeout

</msg></notifications></host>

</hosts>

Moreno Marzolla 23

Sample HTML Output

Moreno Marzolla 24

What happens if it does not scale?" "Hierarchize" ASC?

Host 1

SNMP Agent

Host 2

SNMP Agent

Host 3

SNMP Agent

Host 4

SNMP Agent

HTTPD

ASC

HTTPD

ASC

HTTPD

Proxy

Moreno Marzolla 25

Hierarchical Monitoring" A hierarchical configuration of SNMP

agents has already been shown to scale� Rajesh Subramanyan, José Miguel−Alonso and José A. B.

Fortes, "A Scalable SNMP−Based Distributed Monitoring System for Heterogeneous Network Computing", Proc. SC2000. Dallas, Texas, Nov. 2000.

" What about fault−tolerance?� "Well written applications do not crash" is right,

but machines running them crash anyway...� Well−known leader election techniques could

be applied� Redundant monitoring?

Moreno Marzolla 26

Redundancy for fault−tolerance" Additional collectors could be used to

ensure that at least K out of N of them are sufficient to cover the whole system

Host 1 Host 2 Host 3 Host 4 Host 5 Host 6 Host 7 Host 8

Collector 1 Collector 1

Collector 1

Host 9

Any 2 collectors monitor all the hosts

Moreno Marzolla 27

Conclusions" Monitoring a large computing cluster is a

highly nontrivial task." Many available monitoring tools exist, but

many of them are not adequate for large distributed systems.

" We are trying to build a general−purpose SNMP and XML−based monitoring tool.

" A prototype exists and is working well� ...but see next slide

Moreno Marzolla 28

Future work" Alarms are not currently implemented, but are

at the top position of the to−do list" ASC is running on a small (≈20 machines) test

farm, while the bigger farm is being installed.� ASC is expected to scale; however, serious

scalability experiments will be done as soon as all the machines are up and running.

" Could it be extended for monitoring Grids?

Moreno Marzolla 29

Bibliography" W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, third edition,

Addison−Wesley, 1999" R. Jain, The art of computer systems performance analysis: Techniques

for Experimental Design, Measurement, Simulation, and Modeling, John Wiley and Sons, 1991

" Rajesh Subramanyan, José Miguel−Alonso and José A. B. Fortes, A Scalable SNMP−Based Distributed Monitoring System for Heterogeneous Network Computing, Proc. SC2000. Dallas, Texas, Nov. 2000.

" Grid Performance Working Group, http://www−didc.lbl.gov/GridPerf/" M. Bearden and R. Bianchini Jr., Efficient and Fault−Tolerant Distributed

Host Monitoring using System−Level Diagnosis, Proceedings of the IFIP/IEEE International Conference on Distributed Platforms, Dresden, Germany, pp. 159−172, February 1996.

" Flaviu Cristian, Understanding fault−tolerant distributed systems, Comm. ACM 34(2), 1991

(towards) a scalable monitoring system for large computing

Documents