Overview

    IPMI is a standardized interface for remote “lights-out” or “out-of-band” management of computer systems. It allows hardware status to be monitored directly from the out-of-band management cards, independently of the operating system and regardless of whether the machine is powered on.

    Zabbix IPMI monitoring works only for devices that have IPMI support (HP iLO, DELL DRAC, IBM RSA, Sun SSP, etc.).

    Since Zabbix 3.4, an IPMI manager process schedules the IPMI checks performed by IPMI pollers. A host is always polled by only one IPMI poller at a time, which reduces the number of open connections to BMC controllers. As a result, it is safe to increase the number of IPMI pollers without worrying about overloading the BMC controllers. The IPMI manager process is started automatically when at least one IPMI poller is started.

    See also known issues for IPMI checks.

    Configuration

    Host configuration

    A host must be configured to process IPMI checks. An IPMI interface must be added, with the respective IP and port numbers, and IPMI authentication parameters must be defined.

    See the host configuration documentation for more details.

    Server configuration

    By default, the Zabbix server is not configured to start any IPMI pollers, so any added IPMI items won’t work. To change this, open the Zabbix server configuration file as root and look for the following line:

    Uncomment it and set poller count to, say, 3, so that it reads:

    Save the file and restart zabbix_server afterwards.
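    The edit described above can also be scripted. The following is a minimal sketch, assuming the parameter in question is StartIPMIPollers and that it ships commented out as "# StartIPMIPollers=0". The function name set_ipmi_pollers and the sample text are illustrative only, not part of Zabbix.

```python
import re

# Matches an active or commented-out "StartIPMIPollers=..." line, but not
# descriptive comment lines such as "### Option: StartIPMIPollers".
_POLLERS_LINE = re.compile(r"^[ \t]*#?[ \t]*StartIPMIPollers[ \t]*=.*$", re.MULTILINE)


def set_ipmi_pollers(config_text: str, count: int) -> str:
    """Return config_text with StartIPMIPollers uncommented and set to count."""
    new_line = f"StartIPMIPollers={count}"
    if _POLLERS_LINE.search(config_text):
        # Rewrite only the first matching line.
        return _POLLERS_LINE.sub(new_line, config_text, count=1)
    # Parameter not present at all: append it on its own line.
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + separator + new_line + "\n"


# Illustrative sample, not a verbatim copy of the shipped configuration file.
sample = "### Option: StartIPMIPollers\n# StartIPMIPollers=0\n"
print(set_ipmi_pollers(sample, 3))
```

    The function only transforms text. Reading and writing the actual file, and restarting zabbix_server, are left to the administrator.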

    Item configuration
    • Enter an item key that is unique within the host (say, ipmi.fan.rpm).

    • For Host interface select the relevant IPMI interface (IP and port). Note that an IPMI interface must exist on the host.

    • Specify the IPMI sensor (for example, ‘FAN MOD 1A RPM’ on Dell PowerEdge) to retrieve the metric from. By default, the sensor ID should be specified. It is also possible to use prefixes before the value:

      • id: - to specify the sensor ID;

      • name: - to specify the full sensor name. This can be useful when sensors can only be distinguished by their full name.
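    The prefix rules above can be illustrated with a short sketch. It assumes the two prefixes are written "id:" and "name:" and that a value without a prefix is treated as a sensor ID. The function parse_sensor_field is hypothetical and does not reproduce the Zabbix implementation.

```python
from typing import Tuple


def parse_sensor_field(value: str) -> Tuple[str, str]:
    """Split an IPMI sensor field into (lookup kind, sensor identifier)."""
    for prefix, kind in (("id:", "id"), ("name:", "name")):
        if value.startswith(prefix):
            return kind, value[len(prefix):]
    # No prefix: by default the value is interpreted as a sensor ID.
    return "id", value


print(parse_sensor_field("FAN MOD 1A RPM"))  # ('id', 'FAN MOD 1A RPM')
print(parse_sensor_field("name:CATERR"))     # ('name', 'CATERR')
```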

    Supported checks

    The table below describes the built-in items that are supported in IPMI agent checks.

    Timeout and session termination

    IPMI message timeouts and retry counts are defined in the OpenIPMI library. Due to the current design of OpenIPMI, it is not possible to make these values configurable in Zabbix, either at the interface level or at the item level.

    Notes on IPMI discrete sensors

    To find sensors on a host, start Zabbix server with DebugLevel=4 enabled. Wait a few minutes and look for sensor discovery records in the Zabbix server log file:

    To decode IPMI sensor types and states, get a copy of the IPMI 2.0 specification. At the time of writing, the newest document was http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/second-gen-interface-spec-v2.pdf

    The first parameter to look at is “reading_type”. Use “Table 42-1, Event/Reading Type Code Ranges” from the specification to decode the “reading_type” code. Most of the sensors in our example have “reading_type:0x1”, which means a “threshold” sensor. “Table 42-3, Sensor Type Codes” shows that “type:0x1” means a temperature sensor, “type:0x2” a voltage sensor, “type:0x4” a fan, etc. Threshold sensors are sometimes called “analog” sensors, as they measure continuous parameters such as temperature, voltage or revolutions per minute.

    Another example is a sensor with “reading_type:0x3”. “Table 42-1, Event/Reading Type Code Ranges” says that reading type codes 02h-0Ch mean a “Generic Discrete” sensor. Discrete sensors have up to 15 possible states (in other words, up to 15 meaningful bits). For example, for the sensor ‘CATERR’ with “type:0x7”, “Table 42-3, Sensor Type Codes” shows that this type means “Processor” and that the meaning of the individual bits is: 00h (the least significant bit) - IERR, 01h - Thermal Trip, etc.
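    The decoding steps described so far can be sketched as a small lookup. This is an illustration only: the sensor type dictionary holds just the codes named in the text, the 6Fh range is labelled “sensor-specific” following the specification, and the function names are hypothetical.

```python
# Sensor type codes named in the text (IPMI 2.0, Table 42-3); not exhaustive.
SENSOR_TYPES = {
    0x01: "Temperature",
    0x02: "Voltage",
    0x04: "Fan",
    0x07: "Processor",
}


def reading_class(reading_type: int) -> str:
    """Classify an event/reading type code along the lines of Table 42-1."""
    if reading_type == 0x01:
        return "threshold"
    if 0x02 <= reading_type <= 0x0C:
        return "generic discrete"
    if reading_type == 0x6F:
        return "sensor-specific discrete"
    return "other"


def describe(reading_type: int, sensor_type: int) -> str:
    """Return a short human-readable description of a sensor."""
    name = SENSOR_TYPES.get(sensor_type, f"type 0x{sensor_type:x}")
    return f"{name} ({reading_class(reading_type)})"


print(describe(0x1, 0x1))  # Temperature (threshold)
print(describe(0x3, 0x7))  # Processor (generic discrete)
```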

    There are a few sensors with “reading_type:0x6f” in our example. For these sensors, “Table 42-1, Event/Reading Type Code Ranges” advises using “Table 42-3, Sensor Type Codes” to decode the meaning of the bits. For example, the sensor ‘Power Unit Stat’ has “type:0x9”, which means “Power Unit”. Offset 00h means “PowerOff/Power Down”. In other words, if the least significant bit is 1, the server is powered off. To test this bit, the band function with mask 1 can be used. The trigger expression could be like

    to warn about a server power off.
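    The bit test that the band function performs can be illustrated outside Zabbix. The sketch below assumes the discrete sensor value arrives as an integer bitmask. The helper names and sample values are illustrative.

```python
from typing import List

POWER_OFF_OFFSET = 0  # offset 00h of a "Power Unit" sensor: PowerOff/Power Down


def bit_is_set(value: int, offset: int) -> bool:
    """Return True if the state at the given offset is asserted."""
    return ((value >> offset) & 1) == 1


def asserted_offsets(value: int, max_states: int = 15) -> List[int]:
    """List the offsets of all asserted states (up to 15 meaningful bits)."""
    return [offset for offset in range(max_states) if bit_is_set(value, offset)]


print(bit_is_set(0b0001, POWER_OFF_OFFSET))  # True
print(bit_is_set(0b0100, POWER_OFF_OFFSET))  # False
print(asserted_offsets(0b0011))              # [0, 1]
```

    Testing with mask 1 corresponds to bit_is_set(value, 0). Other offsets correspond to masks 2, 4, 8 and so on.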

    Notes on discrete sensor names in OpenIPMI-2.0.16, 2.0.17, 2.0.18 and 2.0.19

    Names of discrete sensors in OpenIPMI-2.0.16, 2.0.17 and 2.0.18 often have an additional “0” (or some other digit or letter) appended at the end. For example, while ipmitool and OpenIPMI-2.0.19 display a sensor name as “CATERR”, in OpenIPMI-2.0.16, 2.0.17 and 2.0.18 the name is “CATERR0”.

    When configuring an IPMI item on a Zabbix server that uses OpenIPMI-2.0.16, 2.0.17 or 2.0.18, use these names ending with “0” in the IPMI sensor field of IPMI agent items. When your Zabbix server is upgraded to a new Linux distribution that uses OpenIPMI-2.0.19 (or later), items with these IPMI discrete sensors will become “NOT SUPPORTED”. You have to change their IPMI sensor names (remove the ‘0’ at the end) and wait for some time before they turn “Enabled” again.
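    When many items are affected, the candidate renames can be worked out mechanically. The sketch below assumes two lists are available: the sensor names currently configured in items and the names reported by the newer OpenIPMI. The helper suggest_renames is hypothetical and only proposes changes; it does not touch any Zabbix configuration.

```python
from typing import Dict, Iterable


def suggest_renames(old_names: Iterable[str], new_names: Iterable[str]) -> Dict[str, str]:
    """Map names carrying one extra trailing character to their newer form."""
    current = set(new_names)
    renames = {}
    for old in old_names:
        if old in current:
            continue  # the name still exists unchanged, nothing to do
        stripped = old[:-1]  # drop the single appended digit or letter
        if stripped and stripped in current:
            renames[old] = stripped
    return renames


print(suggest_renames(["CATERR0", "Fan1"], ["CATERR", "Fan1"]))
# {'CATERR0': 'CATERR'}
```

    A name is only proposed for renaming when the shortened form actually exists in the new list, so sensors whose real names end in a digit are left alone.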

    Notes on threshold and discrete sensor simultaneous availability

    Some IPMI agents provide both a threshold sensor and a discrete sensor under the same name. In Zabbix versions prior to 2.2.8 and 2.4.3, the first provided sensor was chosen. Since versions 2.2.8 and 2.4.3, preference is always given to the threshold sensor.
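    The selection rule used since 2.2.8 and 2.4.3 can be sketched as follows. The Sensor type and the pick_sensor function are hypothetical and only illustrate the preference; they are not taken from the Zabbix source.

```python
from typing import NamedTuple, Optional, Sequence


class Sensor(NamedTuple):
    name: str
    kind: str  # "threshold" or "discrete"


def pick_sensor(sensors: Sequence[Sensor], name: str) -> Optional[Sensor]:
    """Pick the sensor with the given name, preferring a threshold sensor."""
    matches = [s for s in sensors if s.name == name]
    for sensor in matches:
        if sensor.kind == "threshold":
            return sensor
    # No threshold sensor under this name: fall back to the first match, if any.
    return matches[0] if matches else None


sensors = [Sensor("Temp", "discrete"), Sensor("Temp", "threshold")]
print(pick_sensor(sensors, "Temp").kind)  # threshold
```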

    Notes on connection termination