(towards) a scalable monitoring system for large computing
TRANSCRIPT
(Towards) A Scalable Monitoring System for Large Computing Clusters
Moreno Marzolla Email [email protected]: http://www.dsi.unive.it/~marzolla
Dip. Informatica, Università di Venezia
Moreno Marzolla 2
Talk Outline" System Monitoring
� Definition� Tassonomy� Distributed Systems Monitoring
" Case Study: the BaBar Computing Farm� ASC: A prototype of an SNMP−Based
monitoring application" Conclusions
Moreno Marzolla 3
Monitoring" A monitor is a tool used to observe the
activities on a system� Collect performance statistics� Analyze the data� Display the results.
" Why?� Identify frequently used portions of a program;� Measure resource utilization to find performance
bottlenecks;� Characterize the Workload;� Find model parameters, validate models, or develop
inputs for a model.
Moreno Marzolla 4
Monitor Classification" Based on the level at which it is implemented
� Software� Hardware� Firmware� Hybrid
" Based on the triggering mechanism� Event−driven� Sampling based
" Based on the display ability� On−line� Batch
Moreno Marzolla 5
Hw versus Sw Monitors" Input Rate
� Much higher on Hw monitors; � Depends on CPU power for Sw ones.
" Time Resolution� Higher on Hw monitors;
" Overhead� None for Hw monitors. � Can be an issue for Sw monitors.
" Portability� Architecture and/or OS dependent?
Moreno Marzolla 6
Event−Driven vs Sampling Based" Event−Driven monitors collect data when triggered
by some condition: Pros: No polling required. Only "interesting"
notifications are sentO Cons: What happens when the triggering mechanism
crashes? " Sampling Based require periodic queries
: Pros: Can detect crashes of the monitored entity.
Sampling rate can be changed "outside" the monitored system.
O Cons: Can overload the medium used to transport the
requests
Moreno Marzolla 7
Distributed Systems Monitoring" To monitor a distributed system, the
monitor needs itself to be (at least partly) distributed.
Monitor
Host 1
Observer
Host 2
Observer
Host 3
Observer
Host 4
Observer
Moreno Marzolla 8
Layered View of DS monitors
Management
Console
Interpretation
Presentation
Analysis
Collection
ObservationGathers raw data on individual components of the system
Collects data from various observers
Statistical routines to summarize the characteristics of the data
User interface: provides reports, displays, alarms...
The user who interprets the results
Provides an interface to change the parameters and states
The entity deciding to change the parameters using the console
Moreno Marzolla 9
Issues of DS monitoring" Scalability
� Can it monitor 10 hosts? 100 hosts? 1000 hosts? ...
" Latency� Long delays make real−time, on−line monitoring
a challenge" Consistency
� Are you getting a consistent view of the system’s status, despite notification delays?
Moreno Marzolla 10
BaBar Case Study" BaBar is a High Energy Physics
experiment, studying matter−antimatter asimmetry.� http://www.slac.stanford.edu/BFROOT/
" Only a small fraction of events has real information (10−5−10−6)1Large amounts of data needed1Large computing facilities are
required
Moreno Marzolla 11
BaBar Farm @ INFN Padova" ≈150 2xPIII 1.26GHz
Machines, 1GB Ram, RH Linux 7.2
" Tape Library with a capacity of ~70TB not compressed;
" Network switch, UPSes, Environmental conditioning system, ...
Moreno Marzolla 12
Monitoring Requirements" Hardware Status
� Machine Crashes, disk crashes, CPU temperatures, disk/partitions overflows...
" Processes status" Environmental conditions
� Humidity, Temperature, UPS status..." System administrators should be notified as
soon as a problem occurs� Some automatic action should be taken when
possibile (eg, shut down the machines if overheated).
Moreno Marzolla 13
Other requirements" The monitoring system should also be:
� Scalable� Efficient (little resources requirement)
� Flexible and customizable� Easy to configure� Should be able to operate in batch mode (as a
regular UNIX dæmon, no GUI)� Should be able to observe different quantities with
different granularities" eg, CPU utilization sampled every 5 seconds, CPU
temperature sampled every minute, ...
Moreno Marzolla 14
Some existing monitoring tools" There are many of them:
� MRTG http://people.ee.ethz.ch/~oetiker/webtools/mrtg/� Ngop http://www−isd.fnal.gov/ngop/� Netsaint/Nagios http://www.nagios.org/� Ganglia http://ganglia.sourceforge.net/� RemStats
http://silverlock.dgim.crc.ca/remstats/release/index.html� Cricket http://cricket.sourceforge.net/� GxSNMP http://www.gxsnmp.org/� ... (add your own here)
Moreno Marzolla 15
So, what’s wrong?" We examined some publicly available
monitoring tools. Unfortunately, many of them were not suited to us:� Not scalable� Require their own dæmons running on the
monitored hosts" Can’t install a dæmon on a network switch, or on a
tape library
� Hard to configure" In many cases we gave up without trying the program.
� Poorly implemented " "...My God, it’s full of scripting languages..."
Moreno Marzolla 16
Architectural sketch
Observer Observer Observer Observer
Collector
Monitor
Collector
Observer
User Interface User Interface User Interface
Hardware Hardware Hardware Hardware Hardware
Moreno Marzolla 17
The Observer" The Observer must be able to collect
statistics on any networked equipment" We decided to use SNMP (Simple Network
Management Protocol)� It is a well−known protocol� It is implemented by virtually every vendor� It is reasonably simple yet powerful� A very good open source implementation is
available on Unix/Linux platforms http://net−snmp.sourceforge.net/
Moreno Marzolla 18
More on SNMP
Network Protocol
IP
UDP
SNMP
ManagementApplication
GetRequest
GetNextRequest
SetRequest
GetResponse
Trap
Network Protocol
IP
UDP
SNMP
SNMP Managed Objects
GetRequest
GetNextRequest
SetRequest
GetResponse
Trap
Managed Resources
SNMP Messages
LAN/WAN
W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, 3rd edition, p. 81
Moreno Marzolla 19
The Collector/Monitor" Is being developed" Features:
� Asynchronous (nonblocking) parallelized SNMP Polling;
� XML−based configuration file;� The RRDTool package is used to store data
and produce graphs" Old data have lower resolution than recent ones." Round Robin Databases have known maximum size." Graphing capabilities are provided by the library.
� Dynamic generation of HTML pages using XSLT stylesheets.
Moreno Marzolla 20
Architecture
<?xml version="1.0"
standalone="no"?>
<!DOCTYPE monitor
SYSTEM "monitor.dtd">
<monitor>
...
</monitor>
XML Configuration File
Status
ASC HTTPD
Host 1 Host 2 Host n
XSLT Stylesheet 1
XSLT Stylesheet 2
HTML Page 1
HTML Page 2
StylesheetsHTML Pages
Monitored Hosts
Moreno Marzolla 21
Example of XML Configuration File
<?xml version="1.0" standalone="no"?><!DOCTYPE monitor SYSTEM "monitor.dtd">
<monitor numconnections="20" asclogfile="/monitor/asc.log" httpdlogfile="/dev/null" rrddir="/monitor" htmldir="/monitor/html" ascverbosity="3" >
<host name="localhost"> <description>This machine</description> <miblist> <mib id="cpuUser" name=".1.3.6.1.4.1.2021.11.50.0" type="COUNTER"> <archives> <rra cf="AVERAGE" granularity="60" expire="604800"/> </archives> </mib> </miblist> <graphs> <!−− Graph definitions here −−> </graphs> </host>
</monitor>
Moreno Marzolla 22
Example of XML Status Dump<?xml version="1.0"?><hosts>
<host name="localhost" status="NR"><mibs>
<mib id="availSwap" lastUpdated="1018016033">1052248.000000</mib><mib id="totalSwap" lastUpdated="1018016033">1052248.000000</mib><mib id="totalMem" lastUpdated="1018016033">917080.000000</mib><mib id="cachedMem" lastUpdated="1018016033">7128.000000</mib><mib id="bufferMem" lastUpdated="1018016033">35052.000000</mib><mib id="sharedMem" lastUpdated="1018016033">0.000000</mib><mib id="freeMem" lastUpdated="1018016033">833800.000000</mib><mib id="cpuSystem" lastUpdated="1018016033">137587.000000</mib><mib id="cpuUser" lastUpdated="1018016033">13581.000000</mib><mib id="tempCpu2" lastUpdated="1018016033">24500.000000</mib><mib id="tempCpu1" lastUpdated="1018016033">25000.000000</mib><mib id="tempMB" lastUpdated="1018016033">33000.000000</mib>
</mibs><graphs>
<graph id="hourly.png" title="Hourly data"/></graphs><notifications>
<msg ts="1017933071.29405 17:11:11.029405" severity="CRITICAL"> Timeout
</msg></notifications></host>
</hosts>
Moreno Marzolla 23
Sample HTML Output
Moreno Marzolla 24
What happens if it does not scale?" "Hierarchize" ASC?
Host 1
SNMP Agent
Host 2
SNMP Agent
Host 3
SNMP Agent
Host 4
SNMP Agent
HTTPD
ASC
HTTPD
ASC
HTTPD
Proxy
Moreno Marzolla 25
Hierarchical Monitoring" A hierarchical configuration of SNMP
agents has already been shown to scale� Rajesh Subramanyan, José Miguel−Alonso and José A. B.
Fortes, "A Scalable SNMP−Based Distributed Monitoring System for Heterogeneous Network Computing", Proc. SC2000. Dallas, Texas, Nov. 2000.
" What about fault−tolerance?� "Well written applications do not crash" is right,
but machines running them crash anyway...� Well−known leader election techniques could
be applied� Redundant monitoring?
Moreno Marzolla 26
Redundancy for fault−tolerance" Additional collectors could be used to
ensure that at least K out of N of them are sufficient to cover the whole system
Host 1 Host 2 Host 3 Host 4 Host 5 Host 6 Host 7 Host 8
Collector 1 Collector 1
Collector 1
Host 9
Any 2 collectors monitor all the hosts
Moreno Marzolla 27
Conclusions" Monitoring a large computing cluster is a
highly nontrivial task." Many available monitoring tools exist, but
many of them are not adequate for large distributed systems.
" We are trying to build a general−purpose SNMP and XML−based monitoring tool.
" A prototype exists and is working well� ...but see next slide
Moreno Marzolla 28
Future work" Alarms are not currently implemented, but are
at the top position of the to−do list" ASC is running on a small (≈20 machines) test
farm, while the bigger farm is being installed.� ASC is expected to scale; however, serious
scalability experiments will be done as soon as all the machines are up and running.
" Could it be extended for monitoring Grids?
Moreno Marzolla 29
Bibliography" W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, third edition,
Addison−Wesley, 1999" R. Jain, The art of computer systems performance analysis: Techniques
for Experimental Design, Measurement, Simulation, and Modeling, John Wiley and Sons, 1991
" Rajesh Subramanyan, José Miguel−Alonso and José A. B. Fortes, A Scalable SNMP−Based Distributed Monitoring System for Heterogeneous Network Computing, Proc. SC2000. Dallas, Texas, Nov. 2000.
" Grid Performance Working Group, http://www−didc.lbl.gov/GridPerf/" M. Bearden and R. Bianchini Jr., Efficient and Fault−Tolerant Distributed
Host Monitoring using System−Level Diagnosis, Proceedings of the IFIP/IEEE International Conference on Distributed Platforms, Dresden, Germany, pp. 159−172, February 1996.
" Flaviu Cristian, Understanding fault−tolerant distributed systems, Comm. ACM 34(2), 1991