Intelligent Platform Management Interface (IPMI) Monitoring and Control
Ian Collier
RAL Tier1 Fabric Team
July 2nd 2009, HEPSYSMAN
With apologies/thanks to Massimiliano Masi at CERN
IPMI at RAL Tier1
• At RAL Tier1 we are just beginning to roll out significant use of IPMI
• In our new building we’re able to implement a separate management network for IPMI, APC PDUs etc
What and Why
• Started in 1998, IPMI is now at revision 2.0
• It is a standard accepted by Dell, IBM, Sun, Intel and many others, including SuperMicro of course
• Goal 1: IPMI is a spec for monitoring and controlling the machine via special hardware, the Baseboard Management Controller (BMC)
• Goal 2: Serial over LAN (SOL), a method to redirect serial connections over an Ethernet cable
• Many cards now also provide KVM over LAN, eliminating the need for expensive network KVMs!
What and Why?
Major IPMI concepts:
• Sensors (fan speed, CPU temperature, voltage)
• Events (what should the BMC do when the CPU temperature reaches 100 degrees? e.g. send an SNMP trap)
• SDR (Sensor Data Record repository, describing the sensors the BMC exposes)
• SEL (System Event Log, a log of all critical situations)
• Session (between the client and the BMC)
What and Why?
SECURITY
• Can define users
• Can define privileges
• Can encrypt communication with BMC
The security depends on the version of the specification
• Version 2.0: RMCP/RMCP+: based on RAKP messages (HMAC-like protocol)
• Serial over LAN is encrypted with RMCP+ only
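To make "HMAC-like protocol" concrete, here is a minimal, generic sketch of an HMAC challenge-response exchange in Python. It is not the actual RAKP message layout: the field order, nonce sizes and function names are assumptions chosen for illustration. The point it shows is that the password is never sent; each side proves knowledge of it by keying an HMAC over values both sides have seen.

```python
import hashlib
import hmac
import os


def make_nonce() -> bytes:
    """Random value each side contributes so old exchanges cannot be replayed."""
    return os.urandom(16)


def auth_code(password: bytes, client_nonce: bytes, bmc_nonce: bytes,
              username: bytes) -> bytes:
    """HMAC-SHA1 over the exchanged values, keyed with the shared password.

    Illustrative field order only; the real RAKP messages carry more fields.
    """
    message = client_nonce + bmc_nonce + username
    return hmac.new(password, message, hashlib.sha1).digest()


def verify(password: bytes, client_nonce: bytes, bmc_nonce: bytes,
           username: bytes, received: bytes) -> bool:
    """Recompute the code locally and compare in constant time."""
    expected = auth_code(password, client_nonce, bmc_nonce, username)
    return hmac.compare_digest(expected, received)
```

In this sketch the client would send `auth_code(...)` and the BMC would call `verify(...)` with its own copy of the password; a wrong password yields a different digest and the session is refused.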
Manufacturers provide GUIs
Open source tools
• OpenIPMI (ipmitool)
• lm_sensors
• FreeIPMI (no drivers)
ipmitool sensor local output
[root@lcg0954 ~]# ipmitool sensor
CPU Temp 1 | 35.000 | degrees C | ok | na | na | na | 76.000 | 78.000 | 80.000
CPU Temp 2 | 34.000 | degrees C | ok | na | na | na | 76.000 | 78.000 | 80.000
CPU Temp 3 | na | degrees C | na | na | na | na | 76.000 | 78.000 | 80.000
CPU Temp 4 | na | degrees C | na | na | na | na | 76.000 | 78.000 | 80.000
Sys Temp | 31.000 | degrees C | ok | na | na | na | 76.000 | 78.000 | 80.000
CPU1 Vcore | 1.184 | Volts | ok | 0.680 | 0.688 | 0.696 | 1.624 | 1.632 | 1.640
CPU2 Vcore | 1.192 | Volts | ok | 0.680 | 0.688 | 0.696 | 1.624 | 1.632 | 1.640
3.3V | 3.264 | Volts | ok | 2.912 | 2.928 | 2.944 | 3.648 | 3.664 | 3.680
5V | 4.920 | Volts | ok | 4.416 | 4.440 | 4.464 | 5.520 | 5.544 | 5.568
12V | 11.712 | Volts | ok | 10.464 | 10.560 | 10.656 | 13.344 | 13.440 | 13.536
1.5V | 1.488 | Volts | ok | 1.296 | 1.312 | 1.328 | 1.664 | 1.680 | 1.696
5VSB | 4.896 | Volts | ok | 4.416 | 4.440 | 4.464 | 5.520 | 5.544 | 5.568
VBAT | 3.280 | Volts | ok | 2.912 | 2.928 | 2.944 | 3.648 | 3.664 | 3.680
Fan1 | 10500.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Fan2 | 8700.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Fan3 | 10500.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Fan4 | 8700.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Fan5 | 10400.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Fan6 | 8800.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Fan7 | 0.000 | RPM | nr | 200.000 | 300.000 | 400.000 | na | na | na
Fan8 | 0.000 | RPM | nr | 200.000 | 300.000 | 400.000 | na | na | na
Power Supply | 0x0 | discrete | 0x0000 | na | na | na | na | na | na
CPU0 Internal E | 0x0 | discrete | 0x0000 | na | na | na | na | na | na
CPU1 Internal E | 0x0 | discrete | 0x0000 | na | na | na | na | na | na
CPU Overheat | 0x0 | discrete | 0x0000 | na | na | na | na | na | na
Thermal Trip0 | 0x0 | discrete | 0x0000 | na | na | na | na | na | na
Thermal Trip1 | 0x0 | discrete | 0x0000 | na | na | na | na | na | na
ipmitool sensor remote output
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
sensor get 'CPU1 Temp'
Password:
Locating sensor record...
Sensor ID : CPU1 Temp (0x0)
Entity ID : 3.0
Sensor Type (Discrete): OEM reserved #c0
Note that the lanplus option encrypts communication, including passwords
ipmitool remote power control
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
power off
Password:
Chassis Power Control: Down/Off
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
power status
Password:
Chassis Power is off
# ipmitool -I lanplus -H 172.16.177.64 -U ADMIN \
power on
Password:
Chassis Power Control: Up/On
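The commands above lend themselves to scripting. The following is a minimal Python sketch, not something taken from the talk: the helper names are hypothetical, and it assumes the password is supplied through the `IPMI_PASSWORD` environment variable, which ipmitool reads when given `-E`, so that no interactive prompt is needed. It also assumes the "on" status line mirrors the "Chassis Power is off" line shown above.

```python
import os
import subprocess


def ipmi_command(host, user, *args):
    """Build the argv for a remote ipmitool call over the lanplus interface.

    -E tells ipmitool to take the password from the IPMI_PASSWORD
    environment variable instead of prompting (assumption of this sketch).
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def parse_power_status(output):
    """Map 'Chassis Power is on/off' to True/False."""
    text = output.strip().lower()
    if text.endswith("is on"):
        return True
    if text.endswith("is off"):
        return False
    raise ValueError("unexpected power status output: %r" % output)


def power_status(host, user):
    """Run 'power status' against a BMC and return True if the chassis is on."""
    if "IPMI_PASSWORD" not in os.environ:
        raise RuntimeError("set IPMI_PASSWORD before calling power_status")
    result = subprocess.run(
        ipmi_command(host, user, "power", "status"),
        capture_output=True, text=True, check=True,
    )
    return parse_power_status(result.stdout)
```

For example, `power_status("172.16.177.64", "ADMIN")` would run the same `power status` command as on the slide and return `False` for the "Chassis Power is off" reply.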
ipmitool remote power control
= fewer visits to the machine room!
ipmitool serial over lan
# ipmitool -I lanplus -H 172.16.176.143 -U ADMIN \
sol activate
Password:
[SOL Session operational. Use ~? for help]
Scientific Linux SL release 4.6 (Beryllium)
Kernel 2.6.9-78.0.22.ELsmp on an i686
gdss328.gridpp.rl.ac.uk login:
ipmitool serial over lan
= even fewer visits to the machine room!
Gathering IPMI metrics in Ganglia
• Perl script runs ipmitool sensor and pulls out non-null values
• Metric labels vary with manufacturer and specific BMC
• Test deployment at:
• http://ganglia.gridpp.rl.ac.uk/ganglia/?m=load_one&r=hour&s=descending&c=Workers_SL4&h=lcg0954.gridpp.rl.ac.uk
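The Perl script itself is not shown in the slides. As a rough illustration of the same idea, here is a Python sketch (Python swapped in for Perl) that parses `ipmitool sensor` output, keeps only sensors with a numeric reading, and builds a `gmetric` command line for each. The function names and the `ipmi_` metric prefix are assumptions of this sketch, not the deployed script.

```python
SAMPLE = """\
CPU Temp 1 | 35.000 | degrees C | ok | na | na | na | 76.000 | 78.000 | 80.000
CPU Temp 3 | na | degrees C | na | na | na | na | 76.000 | 78.000 | 80.000
Fan1 | 10500.000 | RPM | ok | 200.000 | 300.000 | 400.000 | na | na | na
Power Supply | 0x0 | discrete | 0x0000| na | na | na | na | na | na
"""


def parse_sensor_output(text):
    """Return {sensor name: (value, unit)} for sensors with a numeric reading."""
    readings = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not a sensor row
        name, value, unit = fields[0], fields[1], fields[2]
        if unit == "discrete":
            continue  # state bitmasks, not measurements
        try:
            readings[name] = (float(value), unit)
        except ValueError:
            continue  # 'na': the sensor is not populated
    return readings


def gmetric_args(name, value, unit):
    """Build a gmetric argv; the 'ipmi_' prefix is a hypothetical naming choice."""
    metric = "ipmi_" + name.replace(" ", "_")
    return ["gmetric", "--name=" + metric, "--value=" + str(value),
            "--type=float", "--units=" + unit]
```

Because metric labels vary with manufacturer and BMC, the sketch derives the metric name from whatever label the BMC reports rather than from a fixed list.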
Future
• Our new hardware has BMCs that support KVM over LAN as well, through SuperMicro’s web interface
• The data gathered by Ganglia can be mined for very granular information about the conditions in the machine room – indicating airflow problems etc.
• Useful in diagnosing hardware problems after the event
• Configure SNMP traps for alarms
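As one possible way of mining the gathered temperatures for airflow problems, the sketch below flags hosts whose system temperature sits well above the median of their neighbours. The host names, readings and the 5-degree margin are all hypothetical; this is an illustration of the idea, not an analysis that was presented.

```python
from statistics import median


def hot_spots(temps, margin=5.0):
    """Return hosts whose temperature exceeds the group median by more than margin.

    temps maps host name -> temperature in degrees C.
    """
    if not temps:
        return []
    mid = median(temps.values())
    return sorted(host for host, t in temps.items() if t - mid > margin)


# Hypothetical readings for one rack
rack = {"node01": 31.0, "node02": 32.0, "node03": 30.0, "node04": 41.0}
print(hot_spots(rack))
```

Comparing against the median of a rack or row, rather than a fixed threshold, is a design choice that highlights local hot spots even when every host is still within its BMC limits.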