4 IPMI checks
Overview
IPMI is a standardized interface for remote “lights-out” or “out-of-band” management of computer systems. It allows monitoring hardware status directly from the so-called “out-of-band” management cards, independently of the operating system and of whether the machine is powered on at all.
Zabbix IPMI monitoring works only for devices that have IPMI support (HP iLO, DELL DRAC, IBM RSA, Sun SSP, etc.).
Since Zabbix 3.4, a new IPMI manager process has been added to schedule IPMI checks by IPMI pollers. Now a host is always polled by only one IPMI poller at a time, reducing the number of open connections to BMC controllers. With those changes it’s safe to increase the number of IPMI pollers without worrying about BMC controller overloading. The IPMI manager process is automatically started when at least one IPMI poller is started.
See also for IPMI checks.
Configuration
Host configuration
A host must be configured to process IPMI checks. An IPMI interface must be added, with the respective IP and port numbers, and IPMI authentication parameters must be defined.
See the configuration of hosts for more details.
Server configuration
By default, the Zabbix server is not configured to start any IPMI pollers, so any added IPMI items won’t work. To change this, open the Zabbix server configuration file (zabbix_server.conf) as root, find the StartIPMIPollers parameter, uncomment it, and set it to the number of IPMI pollers to start.
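A minimal sketch of that edit, using the StartIPMIPollers parameter of zabbix_server.conf. The value of three pollers is only an illustrative assumption; choose a count that suits the number of IPMI hosts being monitored.

```
# zabbix_server.conf
# Shipped default (commented out): no IPMI pollers are started.
# StartIPMIPollers=0

# Uncommented, with three pollers as an illustrative value:
StartIPMIPollers=3
```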
Save the file and restart zabbix_server afterwards.
Item configuration
When configuring an item on the host level:
- Enter an item key that is unique within the host (say, ipmi.fan.rpm)
- For Host interface select the relevant IPMI interface (IP and port). Note that an IPMI interface must exist on the host.
- Specify the IPMI sensor (for example ‘FAN MOD 1A RPM’ on Dell PowerEdge) to retrieve the metric from. By default, the sensor ID should be specified. It is also possible to use prefixes before the value:
  - id: - to specify the sensor ID;
  - name: - to specify the sensor full name. This can be useful in situations when sensors can only be distinguished by specifying the full name.
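The prefix rules for the IPMI sensor field can be illustrated with a short Python sketch. It is not how Zabbix itself parses the field; the function name `parse_ipmi_sensor_field` and the sample full name are hypothetical and exist only to show the three cases (no prefix, `id:`, `name:`).

```python
def parse_ipmi_sensor_field(value: str) -> tuple[str, str]:
    """Split an IPMI sensor field into (kind, sensor), where kind is 'id' or 'name'.

    Hypothetical helper for illustration only; not part of Zabbix.
    """
    for prefix, kind in (("id:", "id"), ("name:", "name")):
        if value.startswith(prefix):
            return kind, value[len(prefix):]
    # No prefix: by default the value is treated as a sensor ID.
    return "id", value


if __name__ == "__main__":
    print(parse_ipmi_sensor_field("FAN MOD 1A RPM"))
    print(parse_ipmi_sensor_field("id:FAN MOD 1A RPM"))
    # "Example Full Name" is a made-up full name, not taken from a real BMC.
    print(parse_ipmi_sensor_field("name:Example Full Name"))
```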
Supported checks
The table below describes in-built items that are supported in IPMI agent checks.
Timeout and session termination
IPMI message timeouts and retry counts are defined in the OpenIPMI library. Due to the current design of OpenIPMI, it is not possible to make these values configurable in Zabbix, either at the interface or at the item level.
The IPMI session inactivity timeout for LAN is 60 +/-3 seconds. Currently it is not possible to implement periodic sending of the Activate Session command with OpenIPMI. If there are no IPMI item checks from Zabbix to a particular BMC for longer than the session timeout configured in the BMC, the next IPMI check after the timeout expires will time out due to individual message timeouts, retries or a receive error. After that, a new session is opened and a full rescan of the BMC is initiated. To avoid unnecessary rescans of the BMC, set the IPMI item polling interval below the IPMI session inactivity timeout configured in the BMC.
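The interval advice can be sanity-checked with a small Python sketch. It assumes the nominal 60-second LAN timeout with its 3-second tolerance unless the BMC is configured differently; the function name `interval_keeps_session_alive` is hypothetical.

```python
SESSION_TIMEOUT_S = 60  # nominal LAN session inactivity timeout
TOLERANCE_S = 3         # the "+/-3 seconds" tolerance


def interval_keeps_session_alive(
    interval_s: float,
    timeout_s: float = SESSION_TIMEOUT_S,
    tolerance_s: float = TOLERANCE_S,
) -> bool:
    """Return True if polling every interval_s seconds stays safely below
    the earliest moment at which the BMC may drop the session."""
    return interval_s < timeout_s - tolerance_s


if __name__ == "__main__":
    for interval in (30, 56, 57, 60, 120):
        print(interval, interval_keeps_session_alive(interval))
```

Pass the timeout actually configured in the BMC as `timeout_s` if it differs from the nominal value.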
Notes on IPMI discrete sensors
To find sensors on a host, start the Zabbix server with DebugLevel=4 enabled. Wait a few minutes and look for the sensor discovery records in the Zabbix server log file.
To decode IPMI sensor types and states, get a copy of the IPMI 2.0 specifications (at the time of writing, the newest document was http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/second-gen-interface-spec-v2.pdf).
As an example, take a sensor with “reading_type:0x3”. “Table 42-1, Event/Reading Type Code Ranges” says that reading type codes 02h-0Ch mean a “Generic Discrete” sensor. Discrete sensors have up to 15 possible states (in other words, up to 15 meaningful bits). For instance, for sensor ‘CATERR’ with “type:0x7”, “Table 42-3, Sensor Type Codes” shows that this type means “Processor” and that the individual bits mean: 00h (the least significant bit) - IERR, 01h - Thermal Trip, etc.
There are a few sensors with “reading_type:0x6f” in our example. For these sensors, “Table 42-1, Event/Reading Type Code Ranges” advises using “Table 42-3, Sensor Type Codes” to decode the meaning of the bits. For example, sensor ‘Power Unit Stat’ has “type:0x9”, which means “Power Unit”. Offset 00h means “PowerOff/Power Down”. In other words, if the least significant bit is 1, the server is powered off. To test this bit, a bitand with mask ‘1’ can be used in a trigger expression to warn about a server power off.
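The bit arithmetic behind discrete sensor values can be sketched in Python. This only illustrates the masking logic and is not Zabbix trigger syntax. The offset tables contain just the states named in the text, and the helper names (`asserted_offsets`, `describe`, `is_powered_off`) are hypothetical.

```python
# Only the offsets mentioned in the text; the IPMI specification lists more.
PROCESSOR_OFFSETS = {0x00: "IERR", 0x01: "Thermal Trip"}
POWER_UNIT_OFFSETS = {0x00: "PowerOff/Power Down"}


def asserted_offsets(value: int, max_states: int = 15) -> list[int]:
    """Return the bit offsets (0 = least significant bit) set in value."""
    return [bit for bit in range(max_states) if value & (1 << bit)]


def describe(value: int, names: dict[int, str]) -> list[str]:
    """Map asserted offsets to state names, falling back to the raw offset."""
    return [names.get(bit, f"offset {bit:02X}h") for bit in asserted_offsets(value)]


def is_powered_off(power_unit_value: int) -> bool:
    """Bitwise AND with mask 1: true when offset 00h is asserted."""
    return (power_unit_value & 1) == 1


if __name__ == "__main__":
    print(describe(0b11, PROCESSOR_OFFSETS))
    print(is_powered_off(1), is_powered_off(2))
```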
Notes on discrete sensor names in OpenIPMI-2.0.16, 2.0.17, 2.0.18 and 2.0.19
Names of discrete sensors in OpenIPMI-2.0.16, 2.0.17 and 2.0.18 often have an additional “0” (or some other digit or letter) appended at the end. For example, while ipmitool and OpenIPMI-2.0.19 display a sensor name as “CATERR”, in OpenIPMI-2.0.16, 2.0.17 and 2.0.18 the name is “CATERR0”.
When configuring an IPMI item with a Zabbix server that uses OpenIPMI-2.0.16, 2.0.17 or 2.0.18, use these names ending with “0” in the IPMI sensor field of IPMI agent items. When your Zabbix server is upgraded to a new Linux distribution that uses OpenIPMI-2.0.19 (or later), items with these IPMI discrete sensors will become “NOT SUPPORTED”. You have to change their IPMI sensor names (remove the ‘0’ at the end) and wait for some time before they turn “Enabled” again.
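When many items are affected, the renames can be planned before editing anything. The Python sketch below assumes you already have the list of configured sensor names and the set of names reported after the upgrade; obtaining those lists is not shown. The function name `plan_sensor_renames` is hypothetical. It drops one trailing character only when the shortened name actually exists, so sensors whose real names end in a digit are left alone.

```python
def plan_sensor_renames(configured: list[str], available: set[str]) -> dict[str, str]:
    """Map old sensor names to new ones by dropping one trailing character,
    but only when the shortened name is known to exist after the upgrade."""
    renames: dict[str, str] = {}
    for name in configured:
        if name in available:
            continue  # still valid, nothing to change
        trimmed = name[:-1]
        if trimmed and trimmed in available:
            renames[name] = trimmed
    return renames


if __name__ == "__main__":
    old_names = ["CATERR0", "FAN MOD 1A RPM"]
    new_names = {"CATERR", "FAN MOD 1A RPM"}
    print(plan_sensor_renames(old_names, new_names))
```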
Notes on threshold and discrete sensor simultaneous availability
Some IPMI agents provide both a threshold sensor and a discrete sensor under the same name. In Zabbix versions prior to 2.2.8 and 2.4.3, the first provided sensor was chosen. Since versions 2.2.8 and 2.4.3, preference is always given to the threshold sensor.
Notes on connection termination
If IPMI checks are not performed (for any reason: all host IPMI items disabled or not supported, host disabled or deleted, host in maintenance, etc.), the IPMI connection will be terminated by the Zabbix server or proxy in 3 to 4 hours, depending on the time when the Zabbix server/proxy was started.