Predictive failure of hardware components
Abstract
A system is described wherein power degradation can be used in conjunction with predictive failure analysis in order to accurately determine when a hardware component might fail. In one example, printed circuit boards (PCBs) can unexpectedly malfunction due to a variety of reasons including silicon power variation or air mover speed. Other hardware components can include silicon or an integrated circuit. In order to accurately monitor the hardware component, telemetry is used to automatically receive communications regarding measurements of data associated with the hardware component, such as power-related data or temperature data. The different temperature data can include junction temperature or ambient air temperature to determine an expected power usage. The actual power usage is then compared to the expected power usage to determine whether the hardware component can soon fail.
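The comparison step described in the abstract can be illustrated with a minimal sketch. The function name, the use of watts, and the example numbers are assumptions made for illustration; the source does not give an implementation or specific threshold values.

```python
def is_failure_predicted(measured_power_w: float,
                         expected_power_w: float,
                         threshold_w: float) -> bool:
    """Return True when measured power exceeds expected power by more than a threshold.

    All names and units here are hypothetical; the source only states that
    actual power usage is compared to expected power usage.
    """
    excess_w = measured_power_w - expected_power_w
    return excess_w > threshold_w


# Illustrative values only (not taken from the source).
flagged = is_failure_predicted(measured_power_w=130.0,
                               expected_power_w=100.0,
                               threshold_w=20.0)
```

Here the component draws 30 W more than expected against a 20 W margin, so the check flags it; a 10 W excess would not.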
27 Citations
13 Claims
1. A computer-readable storage medium including instructions that upon execution cause a computer system to:
receive, in a management controller, from a hardware component, a measured power usage of the hardware component, wherein the hardware component is an integrated circuit or a printed circuit board including multiple integrated circuits, and wherein the hardware component is within a server computer that includes the management controller;
receive from the hardware component, a measured junction temperature of the hardware component and an air mover speed;
calculate an expected power usage based on the junction temperature, and the air mover speed;
compare the expected power usage to the measured power usage; and
transmit a predictive failure indication if the measured power usage exceeds the expected power usage by a threshold amount so as to provide notification that the hardware component is failing.
Dependent claims: 2, 3, 4
5. A method of predictively determining a hardware failure, comprising:
receiving a transmission in a server computer that includes data indicating a measured power usage of a hardware component and a transmission including an air mover speed in the server computer;
calculating an expected power usage as a function of junction temperature of the hardware component, an ambient temperature upstream of the hardware component, and an air mover speed associated with an air mover used to cool the hardware component;
comparing the measured power usage of the hardware component to the expected power usage; and
transmitting a predictive failure notification upon determining that the measured power usage exceeds the expected power by a threshold amount so as to indicate that the hardware component is failing.
Dependent claims: 6, 7, 8, 9, 10
11. A system, comprising:
a hardware component having a power measurement sub-component built therein and a junction temperature sub-component built therein, the power measurement sub-component for determining power used by the hardware component, and the junction temperature sub-component for determining a junction temperature;
a pump that moves fluid past the hardware component, the pump being an air mover or a liquid pump to cool the hardware component; and
a management controller coupled to the hardware component, the management controller for calculating an expected power usage based on the junction temperature received from the hardware component and based on a measurement received from the pump, and for calculating a difference between the power determined by the power measurement component and the expected power usage, wherein the management controller is for transmitting a predictive failure notification when the difference exceeds a threshold so as to indicate that the hardware component is likely to fail and wherein the management controller is within a server computer and the hardware component is silicon within an integrated circuit.
Dependent claims: 12, 13
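As an illustration of the method of claim 5, the sketch below computes an expected power usage as a function of junction temperature, upstream ambient temperature, and air mover speed, then applies the threshold comparison. The claims do not specify the form of that function, so the linear model, every coefficient, and all names here are hypothetical placeholders; in practice such a model would presumably be calibrated against known-good hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerModel:
    """Hypothetical linear expected-power model; coefficients are placeholders."""
    base_w: float = 20.0          # assumed baseline power in watts
    per_junction_c: float = 0.8   # assumed watts per degree C of junction temperature
    per_ambient_c: float = -0.2   # assumed watts per degree C of upstream ambient temperature
    per_rpm: float = 0.002        # assumed watts per RPM of air mover speed

    def expected_power_w(self, junction_c: float, ambient_c: float,
                         air_mover_rpm: float) -> float:
        # Expected power as a function of the three inputs named in claim 5.
        return (self.base_w
                + self.per_junction_c * junction_c
                + self.per_ambient_c * ambient_c
                + self.per_rpm * air_mover_rpm)


def predictive_failure(model: PowerModel, measured_power_w: float,
                       junction_c: float, ambient_c: float,
                       air_mover_rpm: float, threshold_w: float) -> bool:
    """Return True if measured power exceeds the model's expectation by more than threshold_w."""
    expected = model.expected_power_w(junction_c, ambient_c, air_mover_rpm)
    return measured_power_w - expected > threshold_w


model = PowerModel()
expected = model.expected_power_w(junction_c=85.0, ambient_c=25.0, air_mover_rpm=3000.0)
notify = predictive_failure(model, measured_power_w=105.0, junction_c=85.0,
                            ambient_c=25.0, air_mover_rpm=3000.0, threshold_w=10.0)
```

Keeping the model in its own object separates the calibration data from the comparison logic, so a different expected-power function could be swapped in without touching the threshold check.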
Specification