Here we discuss some guidelines on how to proceed if you find that a server has unexpectedly powered off. The examples offered may be specific to a particular command or firmware version; they are provided to illustrate a troubleshooting concept and may not apply to all servers. Always refer to the support documentation for your particular server product to determine the correct equivalent command or procedure.
Various conditions can trigger a system shutdown, including:
§ The temperature of a component or of the ambient air is too high.
§ Multiple cooling fans have failed.
§ A voltage has fluctuated beyond the acceptable threshold.
§ Multiple power supplies have failed or have been removed, causing loss of power redundancy.
§ External (computer room) AC or DC power fails, or falls outside the range required by the server power supplies to safely continue running the system.
§ A component hot-swap circuit has faulted.
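Several of these conditions appear in the System Event Log (SEL) under recognizable sensor types. The following minimal Python sketch maps the sensor-type prefix of a SEL entry to one of the conditions listed above. The mapping, the dictionary and the function name `classify_sensor` are all hypothetical illustrations. Only "Temperature" is confirmed by the SEL examples in this post; the other strings are assumptions that may differ by BMC firmware. Hot-swap faults are left unmapped because their sensor naming varies by platform.

```python
from typing import Optional

# Hypothetical mapping from the sensor-type prefix printed in a SEL entry to
# the shutdown conditions listed above. Adjust the strings to your firmware.
CAUSE_BY_SENSOR_TYPE = {
    "Temperature": "component or ambient temperature too high",
    "Fan": "cooling fan failure",
    "Voltage": "voltage outside acceptable threshold",
    "Power Supply": "power supply failure or removal",
    "Power Unit": "loss of power redundancy or input power",
}


def classify_sensor(sensor_field: str) -> Optional[str]:
    """Return a likely shutdown condition for a SEL sensor field such as
    'Temperature dbp.t_amb', or None if the sensor type is not mapped."""
    for sensor_type, cause in CAUSE_BY_SENSOR_TYPE.items():
        if sensor_field.startswith(sensor_type):
            return cause
    return None
```

A None result, as for "Physical Security sys.intsw", means only that the entry does not match one of these mapped conditions. It may still be relevant.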
The first thing to note is that if the chassis has no power, the Service Processor (SP) will not function, because it operates from standby (housekeeping) voltage. In that case a physical examination of the server is required, as outlined below in the section “Verifying cause of NO chassis power”.
If the SP is accessible, external power is being delivered to at least one of the server power supplies, which in turn supplies standby voltage to the chassis.
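One quick way to check whether the SP is accessible is to open a TCP connection to its web interface. The following is a minimal Python sketch that assumes the SP serves HTTPS on port 443. The function name `sp_reachable` is hypothetical. A failed connection only suggests missing chassis power, because a network, firewall, or SP configuration problem produces the same result.

```python
import socket


def sp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SP succeeds.

    Assumes the SP web interface listens on `port` (443 for HTTPS by default).
    False means "not reachable from here", not necessarily "no chassis power".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

For example, `sp_reachable("sp-hostname.example.com")` (a hypothetical hostname) returns False both when the chassis is unpowered and when the SP network port is unplugged. Confirm the result physically before drawing conclusions.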
Gathering possible reasons for the outage using ipmitool
The ipmitool command can be used to
collect information about the possible reasons for the platform state, such as
voltage & temperature sensors, fault LEDs & indicators, and the
platform System Event Log (SEL).
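The following minimal Python sketch pulls the SEL from a remote SP. It relies on standard ipmitool options:
- `-I lanplus` selects IPMI v2.0 over LAN.
- `-H` and `-U` give the SP host and user.
- `-E` reads the password from the IPMI_PASSWORD environment variable.
- `sel elist` lists SEL entries with sensor names, as in the examples below.

The wrapper function names are hypothetical. The sketch also assumes that IPMI over LAN is enabled on the SP.

```python
import shutil
import subprocess
from typing import List


def build_sel_command(host: str, user: str) -> List[str]:
    """Build an ipmitool command that lists the SEL of a remote SP.

    -E keeps the password off the command line by reading IPMI_PASSWORD.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", "sel", "elist"]


def fetch_sel(host: str, user: str, timeout: float = 60.0) -> List[str]:
    """Run ipmitool and return the SEL as a list of text lines."""
    if shutil.which("ipmitool") is None:
        raise FileNotFoundError("ipmitool is not installed or not on PATH")
    result = subprocess.run(
        build_sel_command(host, user),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout.splitlines()
```

For example, with IPMI_PASSWORD exported, `fetch_sel("sp-hostname.example.com", "root")` (hypothetical host and user) returns one string per SEL entry. Replacing `"sel", "elist"` with `"sensor", "list"` dumps the current sensor readings instead.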
Example - SEL entries showing high temperature events that resulted in automatic system power-off:
281 | 11/01/2014 | 22:23:20 | Temperature dbp.t_amb | Upper Critical going high | Reading 34 > Threshold 33 degrees C
282 | 11/01/2014 | 22:42:07 | Temperature dbp.t_amb | Upper Non-recoverable going high | Reading 44 > Threshold 43 degrees C
283 | 11/01/2014 | 22:42:46 | System ACPI Power State sys.acpi | S5/G2: soft-off | Asserted
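When a SEL is long, it helps to split each entry into fields and pull out threshold readings. The following minimal Python sketch assumes the six pipe-delimited fields shown in the examples. It also assumes that the record ID is hexadecimal, as the consecutive IDs 109 and 10a below suggest. The names `SelEntry`, `parse_sel_line` and `threshold_excess` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SelEntry(NamedTuple):
    record_id: int  # printed in hexadecimal by ipmitool
    date: str
    time: str
    sensor: str
    event: str
    detail: str


# Matches upper-threshold details such as "Reading 44 > Threshold 43 degrees C".
THRESHOLD_RE = re.compile(r"Reading ([\d.]+) > Threshold ([\d.]+)")


def parse_sel_line(line: str) -> Optional[SelEntry]:
    """Split a pipe-delimited SEL line into its six fields."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        return None  # unexpected layout: skip rather than guess
    rid, date, time, sensor, event, detail = fields
    return SelEntry(int(rid, 16), date, time, sensor, event, detail)


def threshold_excess(entry: SelEntry) -> Optional[float]:
    """Return how far a reading exceeded its upper threshold, if stated."""
    m = THRESHOLD_RE.search(entry.detail)
    if m is None:
        return None
    return float(m.group(1)) - float(m.group(2))
```

Applied to entry 282 above, `threshold_excess` returns 1.0 (44 vs. 43 degrees C).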
Example - SEL entry showing that the chassis intrusion switch was triggered when the chassis cover was removed:
200 | 11/01/2014 | 10:35:36 | Physical Security sys.intsw | General Chassis intrusion | Asserted
Example - SEL entries showing that the power button was used to power off the system:
109 | 11/01/2014 | 19:01:26 | Button | Power Button pressed | Asserted
10a | 11/01/2014 | 19:01:29 | System ACPI Power State ACPI | S5/G2: soft-off | Asserted
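In both the temperature and power button examples, the useful clue is the entry logged shortly before the "S5/G2: soft-off" entry. The following minimal Python sketch pairs each soft-off entry with the entries logged within a time window before it. It assumes MM/DD/YYYY dates and pipe-delimited lines as shown above, and the function names are hypothetical. Proximity in time suggests a cause but does not prove one.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

SOFT_OFF = "S5/G2: soft-off"  # ACPI power-state text seen in the examples above


def _timestamp(line: str) -> Optional[datetime]:
    """Parse the date and time fields of a pipe-delimited SEL line.

    Assumes MM/DD/YYYY and HH:MM:SS; anything else yields None.
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 3:
        return None
    try:
        return datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None


def events_before_power_off(
    lines: List[str], window_s: int = 120
) -> List[Tuple[str, List[str]]]:
    """Pair each soft-off entry with entries logged up to window_s seconds before it."""
    results = []
    for i, line in enumerate(lines):
        if SOFT_OFF not in line:
            continue
        off_time = _timestamp(line)
        if off_time is None:
            results.append((line, []))
            continue
        earliest = off_time - timedelta(seconds=window_s)
        preceding = []
        for prev in lines[:i]:
            t = _timestamp(prev)
            if t is not None and earliest <= t <= off_time:
                preceding.append(prev)
        results.append((line, preceding))
    return results
```

With the temperature example, a 120-second window returns only entry 282 ahead of soft-off entry 283. Widening the window to an hour also picks up entry 281.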
Gathering possible reasons for the outage using Service Processor web GUI
Service Processors based on Integrated Lights Out Manager (ILOM) and Embedded Lights Out Manager (ELOM) provide an easy-to-use web interface for managing the platform. Point your web browser to the Service Processor IP address or its DNS hostname, and enter your login credentials when prompted.
Once logged in, click the System Monitoring tab, which reveals additional tabs. Click these to drill down further into:
§ Sensor readings.
§ Event logs.
§ Fault and other indicator LED states.
§ Power management and utilization.
Gathering possible reasons for the outage from the Operating System
If the system can be powered up and the OS boots normally after an unexpected shutdown, check the following:
§ OS messages and event logs: was the shutdown graceful? Is there any record of the power button being pressed, a temperature event, or another event?
§ OS fault manager (such as Solaris FMA) records: were any faults logged?
§ Console log: was anything relevant displayed on the system console at or near the time of the shutdown?
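A keyword scan of the OS message log can speed up the first check. The following minimal Python sketch uses a hypothetical keyword list, and the function name `shutdown_hints` is also hypothetical. The actual wording of shutdown messages differs between operating systems, so review the matches by hand and tune the pattern for your OS.

```python
import re
from typing import Iterable, List

# Hypothetical search terms: adjust to the messages your OS actually logs.
SHUTDOWN_HINTS = re.compile(
    r"shutdown|power button|temperature|thermal|panic|reboot",
    re.IGNORECASE,
)


def shutdown_hints(lines: Iterable[str]) -> List[str]:
    """Return log lines that mention common shutdown-related keywords."""
    return [line.rstrip("\n") for line in lines if SHUTDOWN_HINTS.search(line)]
```

Because the function accepts any iterable of lines, an open file works directly. For example, on Solaris you could pass `open("/var/adm/messages")`. Narrow the matches to the time of the outage before drawing conclusions.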
Verifying cause of NO chassis power
§ Visually inspect each power supply for the status of its AC Present, Power OK, and Fault LEDs. If the Fault LED is illuminated on any of the PSUs, further troubleshooting will be required.
§ If AC Present is NOT illuminated, ensure the AC power cords are securely plugged into the server and connected to working AC power outlets. Test using known good power cables and a known good power source, and engage a qualified electrician to test the voltage on the power cords.
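The LED checks above can be recorded per PSU as a simple decision function. The following minimal Python sketch uses hypothetical names (`PsuLeds`, `next_step`) and checks the Fault LED before AC Present, following the order of the checklist. This post gives no specific guidance for an unlit Power OK LED with AC present. The sketch therefore only points back to the server documentation in that case.

```python
from dataclasses import dataclass


@dataclass
class PsuLeds:
    ac_present: bool
    power_ok: bool
    fault: bool


def next_step(leds: PsuLeds) -> str:
    """Translate one PSU's LED states into the next action from the checklist."""
    if leds.fault:
        return "Fault LED lit: further PSU troubleshooting required"
    if not leds.ac_present:
        return ("No AC present: check cords and outlets, test with known good "
                "cables and power source, have an electrician test the voltage")
    if not leds.power_ok:
        return "AC present but Power OK not lit: consult the server documentation"
    return "No fault indicated by these LEDs"
```

For example, `next_step(PsuLeds(ac_present=False, power_ok=False, fault=False))` points to the cabling and power source checks.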