HPDC Report Domenico Vicinanza CERN IT-GD-OPS CERN, July 12th weekly OPS section meeting

HPDC Report

Domenico Vicinanza, CERN IT-GD-OPS

CERN, July 12th

weekly OPS section meeting

HPDC '07 in a nutshell

Held in Monterey (CA, USA), June 25-29

Four parallel workshops:

• Grid Monitoring
• Workflows in Support of Large-Scale Science (WORKS07)
• Joint EGEE and OSG Workshop on Data Handling in Production Grids
• Challenges of Large Applications in Distributed Environments (CLADE 2007)

Three-day conference

http://www.isi.edu/hpdc2007/

Grid Monitoring WS

• Monitoring in Grids
– Fabric monitoring
  • Publishing Service Availability information to the local fabric monitoring
  • Nagios (integration with SAM)
– Monitoring from the VO/user perspective
  • INCA (San Diego Supercomputer Center)
    – http://inca.sdsc.edu/
    – Aiming to integrate (part of) their testing infrastructure within the SAM framework
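
One common way to feed externally produced test results into Nagios is its passive-check interface: a line of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output` written to the Nagios external command file. The sketch below only illustrates that idea. The SAM-style status names, the host name and the test name are assumptions, not the actual SAM schema or the integration presented at the workshop.

```python
# Standard Nagios plugin return codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

# Hypothetical mapping from SAM-style status strings to Nagios return codes.
SAM_TO_NAGIOS = {
    "ok": NAGIOS_OK,
    "warn": NAGIOS_WARNING,
    "error": NAGIOS_CRITICAL,
    "crit": NAGIOS_CRITICAL,
}


def passive_check_line(timestamp, host, service, sam_status, summary):
    """Format one test result as a PROCESS_SERVICE_CHECK_RESULT command line."""
    # Any status not in the mapping is reported as UNKNOWN.
    code = SAM_TO_NAGIOS.get(sam_status.lower(), NAGIOS_UNKNOWN)
    # Semicolons separate fields and newlines separate commands,
    # so both are removed from the free-text summary.
    clean = summary.replace(";", ",").replace("\n", " ")
    return f"[{int(timestamp)}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{clean}"


if __name__ == "__main__":
    # Hypothetical host and test names, for illustration only.
    print(passive_check_line(1184230800, "ce01.example.org", "CE-job-submit",
                             "error", "job submission timed out"))
```

In a real setup the resulting line would be appended to the command file configured in Nagios; the path of that file is site-specific and deliberately left out here.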

cont...

• Interoperability of the monitoring tools and OSG-LCG interoperability
– Overview of the work done by the Grid Service Monitoring Working Group
– Service Availability Monitor as one of the main components of the monitoring framework prototype for the WLCG/EGEE infrastructure (SAM Team paper)

cont...

• Other monitoring tools:
– R-GMA (as a general framework for information exchange on large-scale distributed infrastructures)
– GridICE
– gLite LB
– Centralized logging systems
  • Syslog-NG (OSG)
  • Splunk (Fermilab)

Syslog-NG

• New system logging utility used by OSG
• Can replace the regular syslog daemon or be used in parallel with it
• More powerful facilities for filtering, formatting, and redirecting log messages
• Open source license
• Administered through a PHP/MySQL tool

Syslog-NG facilities

• Can filter log messages based on log level, system host, facility, IP address or regular expressions
• Can reformat and modify messages using template facilities
• Inputs can be files or sockets
• Outputs can be other hosts, files, or sockets
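
The filter-and-template model above can be illustrated with a few lines of Python. This is not syslog-ng configuration syntax; the record fields, the `make_filter` helper and the template string are hypothetical and only mimic the idea of selecting messages by level, host, facility or regular expression and then rewriting them.

```python
import re
from string import Template

# Syslog severities, most severe first (RFC 5424 order).
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def make_filter(min_level=None, host=None, facility=None, pattern=None):
    """Build a predicate over message dicts; criteria left as None are not checked."""
    regex = re.compile(pattern) if pattern else None

    def matches(msg):
        if min_level and LEVELS.index(msg["level"]) > LEVELS.index(min_level):
            return False  # message is less severe than requested
        if host and msg["host"] != host:
            return False
        if facility and msg["facility"] != facility:
            return False
        if regex and not regex.search(msg["text"]):
            return False
        return True

    return matches


def reformat(msg, template="$host [$facility.$level] $text"):
    """Rewrite a message dict into a single line using a simple template."""
    return Template(template).substitute(msg)


if __name__ == "__main__":
    # Hypothetical log records, for illustration only.
    messages = [
        {"host": "ce01", "facility": "daemon", "level": "err", "text": "gridftp: transfer failed"},
        {"host": "ce01", "facility": "auth", "level": "info", "text": "session opened"},
    ]
    keep = make_filter(min_level="warning", pattern=r"gridftp")
    for m in filter(keep, messages):
        print(reformat(m))
```

Separating the predicate from the template mirrors the slide's split between filtering and reformatting: the same filter can feed several differently formatted outputs.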

Splunk

• Commercial software used to archive and query log messages
• Web interface allows log messages to be categorized and correlated
• Messages can be queried and sorted by category and other parameters
• Also used at Fermilab for internal log collection

Other topics: Open Grids

• Open Grids
– BOINC (Berkeley Open Infrastructure for Network Computing)
  • improvements in load balancing
  • new checkpointing methods
  • reliability issues
  • RIDGE (a kind of BOINC improvement)
    – observes past behavior and estimates a reliability rating for worker nodes
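
A reliability rating of this kind can be sketched as a smoothed success ratio over each node's history. The code below is not RIDGE's actual algorithm; the Laplace-smoothed estimate, the class name and the neutral rating of 0.5 for unseen nodes are assumptions chosen for illustration.

```python
class ReliabilityTracker:
    """Track per-node task outcomes and rate each node between 0 and 1."""

    def __init__(self):
        self.successes = {}
        self.attempts = {}

    def record(self, node, ok):
        """Store the outcome of one task that ran on `node`."""
        self.attempts[node] = self.attempts.get(node, 0) + 1
        if ok:
            self.successes[node] = self.successes.get(node, 0) + 1

    def rating(self, node):
        """Laplace-smoothed success ratio: (s + 1) / (n + 2)."""
        s = self.successes.get(node, 0)
        n = self.attempts.get(node, 0)
        return (s + 1) / (n + 2)  # a node with no history rates 0.5

    def ranked(self):
        """Nodes seen so far, most reliable first."""
        return sorted(self.attempts, key=self.rating, reverse=True)
```

A scheduler could then prefer highly rated nodes, or replicate a task more aggressively when only poorly rated nodes are available.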

Other topics: future improvements

• Provisioning models (modeling needs)
– performance-cost optimization in grids
– Genetic Algorithm formulation for provisioning resources for an application
• Condor extensions (data-driven workflow planning)
• Scalable I/O virtualization
– dynamically manage virtualized components among multiple guest domains
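
To make the genetic-algorithm formulation of provisioning concrete, here is a minimal sketch, not the formulation presented at the conference. A plan is a list of instance counts per resource type, and the fitness rewards performance while penalising plans that exceed a budget. The resource types, their cost and performance figures, and all GA parameters are hypothetical.

```python
import random

# Hypothetical resource types: (cost per hour, performance units).
RESOURCE_TYPES = [(1.0, 1.0), (2.5, 3.0), (6.0, 8.0)]
MAX_PER_TYPE = 5


def cost(plan):
    return sum(n * c for n, (c, _) in zip(plan, RESOURCE_TYPES))


def performance(plan):
    return sum(n * p for n, (_, p) in zip(plan, RESOURCE_TYPES))


def fitness(plan, budget):
    # Plans over budget get a negative score proportional to the overshoot.
    over = cost(plan) - budget
    return performance(plan) if over <= 0 else -over


def evolve(budget, generations=60, pop_size=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, MAX_PER_TYPE) for _ in RESOURCE_TYPES]
           for _ in range(pop_size)]
    pop[0] = [0] * len(RESOURCE_TYPES)  # the empty plan is always within budget
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, budget), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, keeps the best plans
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, len(RESOURCE_TYPES) - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:  # mutate a single gene
                child[rng.randrange(len(child))] = rng.randint(0, MAX_PER_TYPE)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, budget))


if __name__ == "__main__":
    best = evolve(budget=20.0)
    print(best, cost(best), performance(best))
```

Because the best half of each generation survives unchanged, the best plan never gets worse from one generation to the next; seeding the population with the empty plan guarantees that the returned plan respects the budget.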

Environment issues

• How HPDC is affecting the environment
– warming
– efforts to deliver energy
– cooling systems
• Role of renewable energy sources in the future of HPDC
• Solar energy to provide electric power to operate the computers and for cooling
• Covering roofs with solar cells:
– How much can a house compute?
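
The closing question is a back-of-the-envelope calculation: roof area times solar irradiance times panel efficiency gives the available electrical power, and dividing by the power drawn per node (including cooling overhead) gives the number of machines. The sketch below only encodes that arithmetic; the function name, the cooling model and every number passed to it are illustrative assumptions, not figures from the talk.

```python
def nodes_supported(roof_area_m2, irradiance_w_per_m2, panel_efficiency,
                    node_power_w, cooling_overhead=0.5):
    """Number of compute nodes a solar roof could power at a given irradiance."""
    electrical_power_w = roof_area_m2 * irradiance_w_per_m2 * panel_efficiency
    # Cooling is modelled as a fixed fraction on top of each node's own draw.
    power_per_node_w = node_power_w * (1 + cooling_overhead)
    return int(electrical_power_w // power_per_node_w)


if __name__ == "__main__":
    # Assumed values: 40 m^2 roof, 1000 W/m^2 peak irradiance,
    # 25% panel efficiency, 200 W per node, 25% cooling overhead.
    print(nodes_supported(40, 1000, 0.25, 200, cooling_overhead=0.25))
```

The estimate holds only at the assumed irradiance; night-time and cloudy periods would require storage or a grid connection, which this sketch ignores.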

Conclusions

• Well-established SAM awareness
• Defining a common monitoring exchange format (Grid Monitoring WG)
– started a growing network of monitoring tool integration/interaction
– interest in including/feeding SAM results from/to other tools (fabric monitoring)
• Importance of logs (and log analysis tools)
• Strong need for improved modeling of
– resources, needs, workflows

Bibliography

• SAM Team paper (PDF), minutes and slides:
– http://indico.cern.ch/conferenceDisplay.py?confId=18405