HPDC Report
Domenico Vicinanza, CERN IT-GD-OPS
CERN, July 12th
Weekly OPS section meeting
HPDC '07 in a nutshell
• Held in Monterey (CA, USA), June 25-29
• Four parallel workshops:
  – Grid Monitoring
  – Workflows in Support of Large-Scale Science (WORKS07)
  – Joint EGEE and OSG Workshop on Data Handling in Production Grids
  – Challenges of Large Applications in Distributed Environments (CLADE 2007)
• Three-day main conference
• http://www.isi.edu/hpdc2007/
Grid Monitoring WS
• Monitoring in Grids
  – Fabric monitoring
    • Publishing Service Availability information to the local fabric monitoring
    • Nagios (integration with SAM)
  – Monitoring from the VO/user perspective
    • INCA (San Diego Supercomputer Center)
      – http://inca.sdsc.edu/
      – Aiming to integrate (part of) their testing infrastructure within the SAM framework
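Integrating SAM results into Nagios means expressing each result in Nagios' plugin convention: one status line of output plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below shows only that translation step. The SAM status labels and the function name `sam_to_nagios` are assumptions for illustration, not the actual SAM or Nagios integration code.

```python
# Standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3
NAGIOS_LABELS = {NAGIOS_OK: "OK", NAGIOS_WARNING: "WARNING",
                 NAGIOS_CRITICAL: "CRITICAL", NAGIOS_UNKNOWN: "UNKNOWN"}

# ASSUMPTION: hypothetical SAM status labels; check them against the real SAM result schema.
SAM_TO_NAGIOS = {
    "ok": NAGIOS_OK,
    "info": NAGIOS_OK,
    "warn": NAGIOS_WARNING,
    "error": NAGIOS_CRITICAL,
    "critical": NAGIOS_CRITICAL,
}


def sam_to_nagios(service: str, sam_status: str, detail: str = "") -> tuple[int, str]:
    """Translate one SAM test result into a Nagios (exit_code, output_line) pair.

    Unrecognized statuses map to UNKNOWN rather than silently to OK.
    """
    code = SAM_TO_NAGIOS.get(sam_status.lower(), NAGIOS_UNKNOWN)
    line = f"{NAGIOS_LABELS[code]} - {service}: SAM status '{sam_status}'"
    if detail:
        line += f" ({detail})"
    return code, line
```

A plugin wrapper would print `line` and exit with `code`, because Nagios reads both the first output line and the exit status.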
Grid Monitoring WS (cont.)
• Interoperability of the monitoring tools and OSG-LCG interoperability
  – Overview of the work done by the Grid Service Monitoring Working Group
  – Service Availability Monitor (SAM) as one of the main components of the monitoring framework prototype for the WLCG/EGEE infrastructure (SAM Team paper)
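Tool interoperability depends on agreeing on what a single monitoring result looks like. As a minimal sketch, assuming a JSON encoding and a handful of fields chosen here for illustration (they are not the Working Group's actual schema), a tool-neutral record could be:

```python
import json
from dataclasses import dataclass, asdict

# ASSUMPTION: status vocabulary borrowed from Nagios-style states for illustration.
ALLOWED_STATUSES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}


@dataclass(frozen=True)
class MonitoringRecord:
    """Hypothetical tool-neutral monitoring result; field names are illustrative."""
    site: str
    service_type: str   # e.g. "CE", "SRM"
    host: str
    metric: str         # name of the test that produced the result
    status: str         # one of ALLOWED_STATUSES
    timestamp: str      # ISO 8601, UTC
    summary: str = ""

    def __post_init__(self):
        # Reject statuses outside the agreed vocabulary so consumers never guess.
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status {self.status!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MonitoringRecord":
        return cls(**json.loads(text))
```

The point of the fixed vocabulary is that a producer (e.g. SAM) and a consumer (e.g. a fabric monitor) can exchange results without per-tool translation tables.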
Grid Monitoring WS (cont.)
• Other monitoring tools:
  – R-GMA (as a general framework for information exchange on large-scale distributed infrastructures)
  – GridICE
  – gLite LB (Logging and Bookkeeping)
  – Centralized logging systems
    • Syslog-NG (OSG)
    • Splunk (Fermilab)
Syslog-NG
• New system logging utility used by OSG
• Can replace the regular syslog daemon or run alongside it
• More powerful facilities for filtering, formatting and redirecting log messages
• Open-source license
• Administered through a PHP/MySQL tool
Syslog-NG facilities
• Can filter log messages based on log level, host, facility, IP address or regular expressions
• Can reformat and modify messages using templates
• Inputs can be files or sockets
• Outputs can be other hosts, files or sockets
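The filtering and templating facilities above can be illustrated with a small Python sketch. It does not use syslog-ng's own configuration language; it only mimics the idea: a message passes a filter when it matches every given criterion (severity threshold, host, facility, regular expression), is rewritten through a template, and is appended to each matching destination. `LogMessage`, `make_filter` and `route` are hypothetical names, and the `host` field could equally hold an IP address.

```python
import re
from dataclasses import dataclass
from string import Template
from typing import Callable, Optional

# Syslog severities (RFC 5424): a lower number means more severe.
SEVERITY = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
            "warning": 4, "notice": 5, "info": 6, "debug": 7}


@dataclass
class LogMessage:
    host: str
    facility: str
    severity: str
    program: str
    text: str


def make_filter(max_severity: str = "debug",
                host: Optional[str] = None,
                facility: Optional[str] = None,
                pattern: Optional[str] = None) -> Callable[[LogMessage], bool]:
    """Build a predicate; every criterion that is given must match."""
    limit = SEVERITY[max_severity]
    regex = re.compile(pattern) if pattern else None

    def accept(msg: LogMessage) -> bool:
        if SEVERITY[msg.severity] > limit:
            return False
        if host is not None and msg.host != host:
            return False
        if facility is not None and msg.facility != facility:
            return False
        if regex is not None and not regex.search(msg.text):
            return False
        return True

    return accept


# Output template, loosely analogous to a syslog-ng template.
TEMPLATE = Template("$host [$facility.$severity] $program: $text")


def route(messages, rules):
    """rules: list of (filter, destination_list); a message may go to several destinations."""
    for msg in messages:
        line = TEMPLATE.substitute(vars(msg))
        for accept, destination in rules:
            if accept(msg):
                destination.append(line)
```

Here destinations are plain lists; in a real deployment they would be files, sockets or remote hosts, and `messages` could be any iterable read from a file or socket.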
Splunk
• Commercial software used to archive and query log messages
• Web interface allows log messages to be categorized and correlated
• Messages can be queried and sorted based on categorization and other parameters
• Also used at Fermilab for internal log collection
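Categorizing, querying and correlating log messages can be sketched in a few lines of Python. This is not Splunk's search language or API; the category rules, `Event`, `categorize`, `query` and `correlate_by_host` are hypothetical and only illustrate the workflow of tagging messages, selecting them by category or host, and grouping them to spot related problems.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# ASSUMPTION: hypothetical category rules; the first matching rule wins.
CATEGORY_RULES = [
    ("auth_failure", re.compile(r"authentication failure|failed password", re.I)),
    ("disk", re.compile(r"no space left|disk full", re.I)),
    ("transfer", re.compile(r"gridftp|transfer", re.I)),
]


@dataclass
class Event:
    timestamp: float    # seconds since the epoch
    host: str
    text: str
    category: str = "uncategorized"


def categorize(events):
    """Tag each event in place with the first matching category."""
    for ev in events:
        for name, rx in CATEGORY_RULES:
            if rx.search(ev.text):
                ev.category = name
                break
    return events


def query(events, category=None, host=None):
    """Select events by category and/or host, sorted by time."""
    hits = [e for e in events
            if (category is None or e.category == category)
            and (host is None or e.host == host)]
    return sorted(hits, key=lambda e: e.timestamp)


def correlate_by_host(events):
    """Count categories per host, to spot hosts showing several kinds of problems."""
    table = defaultdict(Counter)
    for e in events:
        table[e.host][e.category] += 1
    return table
```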
Other topics: Open Grids
• Open Grids
  – BOINC (Berkeley Open Infrastructure for Network Computing)
    • improvements in load balancing
    • new checkpointing methods
    • reliability issues
• RIDGE (a kind of BOINC improvement)
  – observes past behavior and estimates a reliability rating for worker nodes
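One simple way to turn past behavior into a reliability rating is an exponentially weighted success rate, where recent results count more than old ones. The sketch below uses that technique as a stand-in; it is an assumption for illustration and not RIDGE's published rating scheme. The `alpha` weight, the 0.5 starting rating and the names `NodeHistory` and `most_reliable` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHistory:
    """Reliability rating as an exponentially weighted success rate (illustrative, not RIDGE)."""
    alpha: float = 0.2      # weight given to the newest observation
    rating: float = 0.5     # neutral starting value for a node with no history
    observations: int = 0

    def record(self, result_correct: bool) -> float:
        """Fold one validated work-unit outcome into the rating and return the new value."""
        outcome = 1.0 if result_correct else 0.0
        self.rating = (1 - self.alpha) * self.rating + self.alpha * outcome
        self.observations += 1
        return self.rating


def most_reliable(histories: dict[str, NodeHistory], n: int) -> list[str]:
    """Names of the n nodes with the highest current rating."""
    return sorted(histories, key=lambda name: histories[name].rating, reverse=True)[:n]
```

A scheduler could then send work preferentially to high-rated nodes, or ask for extra replicas when a low-rated node is involved.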
Other topics: future improvements
• Provisioning models (modeling needs)
  – performance-cost optimization in grids
  – Genetic Algorithm formulation for provisioning resources for an application
• Condor extensions (data-driven workflow planning)
• Scalable I/O virtualization
  – dynamically managing virtualized components among multiple guest domains
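To make the genetic-algorithm formulation concrete, here is a minimal sketch under invented assumptions: two hypothetical resource types with a price per hour and a throughput in tasks per hour, a workload that must finish before a deadline, and resources reserved for the whole deadline window. The genome is the number of instances of each type; fitness is the cost plus a large penalty for any throughput shortfall. It uses tournament selection, uniform crossover, ±1 mutation and elitism. It is not the formulation presented at the workshop.

```python
import random

# ASSUMPTION: hypothetical resource types as (name, price per hour, tasks per hour).
RESOURCES = [("small", 1.0, 10.0), ("large", 3.5, 40.0)]
MAX_PER_TYPE = 10


def throughput(genome):
    return sum(n * rate for n, (_, _, rate) in zip(genome, RESOURCES))


def cost(genome, deadline_h):
    # Resources are reserved for the whole deadline window.
    return deadline_h * sum(n * price for n, (_, price, _) in zip(genome, RESOURCES))


def fitness(genome, tasks, deadline_h, penalty=1000.0):
    """Lower is better: cost plus a penalty per missing unit of required throughput."""
    shortfall = max(0.0, tasks / deadline_h - throughput(genome))
    return cost(genome, deadline_h) + penalty * shortfall


def evolve(tasks, deadline_h, pop_size=30, generations=60, mutation_rate=0.2, seed=1):
    rng = random.Random(seed)

    def score(g):
        return fitness(g, tasks, deadline_h)

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    population = [[rng.randint(0, MAX_PER_TYPE) for _ in RESOURCES] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            for i in range(len(child)):
                if rng.random() < mutation_rate:
                    child[i] = min(MAX_PER_TYPE, max(0, child[i] + rng.choice((-1, 1))))
            next_gen.append(child)
        population = next_gen
    return min(population, key=score)
```

For example, `evolve(400, 2)` searches for a mix that completes 400 tasks within 2 hours, which requires at least 200 tasks per hour. With these toy numbers, five large instances (cost 35) meet that exactly. Elitism guarantees that the best solution found never gets worse from one generation to the next.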
Environment issues
• How HPDC is affecting the environment
  – warming
  – efforts to deliver energy
  – cooling systems
• Role of renewable energy sources in the future of HPDC
• Solar energy to provide electric power to operate the computers and to cool them
• Covering roofs with solar cells:
  – How much can a house compute?
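The "how much can a house compute?" question is a back-of-the-envelope calculation: roof area × daily solar energy per m² × panel efficiency gives energy per day; dividing by 24 h gives average power; dividing by the power per server, including cooling overhead expressed as PUE (total power / IT power), gives how many servers can run continuously. Every number in the example is a hypothetical assumption, and storage and inverter losses are ignored.

```python
def servers_powered_by_roof(roof_area_m2, daily_insolation_kwh_m2, panel_efficiency,
                            server_power_w, pue):
    """Average number of servers a solar roof can run around the clock.

    Ignores battery/inverter losses and day/night storage.
    """
    daily_energy_kwh = roof_area_m2 * daily_insolation_kwh_m2 * panel_efficiency
    average_power_w = daily_energy_kwh * 1000 / 24
    # PUE = total facility power / IT power, so each server draws server_power_w * pue in total.
    return int(average_power_w // (server_power_w * pue))


# Hypothetical house: 50 m^2 roof, 5 kWh/m^2/day, 15% panels, 250 W servers, PUE 1.5.
print(servers_powered_by_roof(50, 5, 0.15, 250, 1.5))  # 37.5 kWh/day -> 1562.5 W -> 4 servers
```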
Conclusions
• Well-established SAM awareness
• Defining a common monitoring exchange format (Grid Monitoring WG)
  – a growing network of integration/interaction between monitoring tools has started
  – interest in including SAM results in, and feeding them from, other tools (fabric monitoring)
• Importance of logs (and log analysis tools)
• Strong need for improved modeling of
  – resources, needs, workflows
Bibliography
• SAM Team paper (PDF), minutes and slides:
  – http://indico.cern.ch/conferenceDisplay.py?confId=18405