Data center cost optimization using predictive analytics

US 10,152,394 B2
Filed: 09/27/2016
Issued: 12/11/2018
Est. Priority Date: 09/27/2016
Status: Active Grant

Abstract

A system, method and computer program product for optimizing total cost of ownership (TCO) of a piece of IT equipment, e.g., a hard drive or server, using predictive analytics. The data center environment monitors and measures a number of environment variables, including temperature, Relative Humidity, and corrosion. For each piece of hardware, several pieces of data are assigned, including a criticality measure, an operational cost (function of environment), a static replacement cost, and a downtime cost (function of time). For each piece of hardware, if it has not yet failed, the system predicts a time-to-failure using the environment variables. If predicted time-to-failure exceeds an expected reference life criteria, real time TCO analytics is performed to minimize data center energy usage and/or maximize operational cost-efficiency.
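
The cost terms listed in the abstract can be combined into one figure per component. The following is a minimal sketch, assuming a simple additive model; the class names, field names, and the toy cost functions are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    temperature_c: float
    relative_humidity: float


@dataclass
class Component:
    name: str
    criticality: float  # recorded per the abstract; not used in this sketch
    replacement_cost: float  # static replacement cost
    operational_cost: Callable[[Environment], float]  # cost per day, depends on environment
    downtime_cost: Callable[[float], float]  # cost for a given number of downtime hours


def total_cost(component: Component, env: Environment,
               days: float, downtime_hours: float) -> float:
    """Assumed additive TCO: operating cost over the period, plus replacement, plus downtime."""
    return (
        component.operational_cost(env) * days
        + component.replacement_cost
        + component.downtime_cost(downtime_hours)
    )


# Toy data: cooling to a lower temperature costs more energy per day (assumption).
drive = Component(
    name="hdd-01",
    criticality=0.5,
    replacement_cost=80.0,
    operational_cost=lambda env: 2.0 + 0.25 * (30.0 - env.temperature_c),
    downtime_cost=lambda hours: 50.0 * hours,
)
cool = Environment(temperature_c=22.0, relative_humidity=45.0)
warm = Environment(temperature_c=26.0, relative_humidity=45.0)

print(total_cost(drive, cool, days=100, downtime_hours=2.0))
print(total_cost(drive, warm, days=100, downtime_hours=2.0))
```

In this toy model the warmer room is cheaper to run, but the sketch deliberately ignores the effect of temperature on component life. Balancing that effect against the energy savings is the subject of the claims.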

12 Claims

1. A computer-implemented method to manage environmental conditions of a data center comprising:
- receiving, at a processor unit of a computer, sensor data from sensors monitoring environmental conditions at a data center, the data center housing operating hardware components that have not yet failed, and receiving reliability data of the hardware components; and
  
  for each hardware component;
  
  deriving, using an analytics model stored in a memory storage unit of the computer, an estimated time to failure of the hardware component, said analytics model being run on a processor unit and trained using machine learning, to correlate a component reliability using learned patterns of component failure, said received reliability data and said sensor data of monitored environmental conditions that the hardware component has been subject to at said data center;
  
  determining, at the processor unit, whether said estimated time to failure of the hardware component exceeds an expected reference life criteria time t_exp associated with that component, and for each hardware component having a derived estimated time to failure that does not exceed the expected reference life criteria time t_exp for the respective component;
  
  computing, using the processor unit, a respective time for incurring a lowest cost to replace or repair the component;
  
  generating, using the processor unit, a candidate modification to one or more environmental conditions of said data center, wherein said candidate modification to said one or more environment conditions minimizes energy usage of operations at the data center and extends a life of the respective component while operating under said candidate modification to one or more environmental conditions at said data center to its respective lowest cost time to replace;
  
  computing, using the processor unit, an energy cost impact of letting the respective component operate under said candidate modified environment condition at said data center; and
  
  after generating a candidate modified environment condition associated with each hardware component having a derived estimated time to failure that does not exceed the expected reference life criteria time t_exp for the respective component;
  
  selecting, using the processor unit, an environmental condition modification from said generated candidate modified environment conditions, said selected environmental condition modification corresponding to a respective hardware component having a largest computed energy savings impact and running said analytics model on said processor to derive a new estimated time to failure of remaining hardware components having less than largest energy savings impact, said environmental condition modification selection ensuring that the new derived estimated time to failure of each remaining hardware component exceeds its respective said expected reference life criteria time if operating under the selected modification environment condition;
  
  generating, using the processor unit, an output signal for use in modifying said data center environment according to said selected environment condition modification;
  
  modifying said data center environment according to said selected environment condition modification, and scheduling a replacement of the hardware component corresponding to the selected environmental condition modification having the largest computed energy savings impact in the data center based on said computed time for incurring a lowest cost to replace or repair the component.
- - 2. The method as claimed in claim 1, wherein for a hardware component that is a critical component required for continuous operations, said method further comprising:
    - determining a time said critical component is expected to fail;
      
      computing a data center environment modification to extend life of the critical component to be greater than a first predetermined time t* representing a time buffer between an expectation of failure and a replacement of the critical component; and
      
      modifying the environment at said data center to extend said life of the critical component.
  - 3. The method as claimed in claim 1, wherein said generating said candidate modification to one or more environmental conditions comprises:
    - computing a new expected life (life) for the component operating under said candidate modified environment condition of said data center environment; and
      
      determining whether said computed life is greater than said expected reference life criteria (t_exp), and if said computed life is not greater than t_exp, changing an environment condition to produce a new changed candidate data center environment and computing a new energy savings of moving to said new changed candidate data center environment, wherein said computing an energy cost savings, said computing a new expected life, said determining whether said computed life is greater than said t_exp and said changing an environment condition are repeated until the computed expected life is greater than said t_exp.
  - 4. The method as claimed in claim 3, wherein when said expected life is greater than t_exp:
    - determining whether the new changed candidate data center environment will adjust the expected life to equal said t_min time value; and
      
      if the new data center environment adjusts the expected life to a value of t_min, generating a recommendation to schedule a replacement part for said component at a time t_min;
      
      otherwise, if it is determined that the new data center environment does not adjust the expected life to a value of t_min;
      
      compute a new time t** representative of another time in which to schedule a replacement for the component, wherein t** time is earlier than said t_min time value;
      
      computing a replacement cost penalty for replacing the component at said new time t**;
      
      determining whether said energy cost savings of moving from a current data center environment to said new data center environment exceeds said replacement cost penalty; and
      
      if said energy cost savings of moving from a current data center environment to said new data center environment exceeds said replacement cost penalty, then record the current environment and recommend scheduling a replacement part for said component at a time t**;
      
      otherwise, if said energy cost savings by moving from a current data center environment to said new data center environment does not exceed said replacement cost penalty;
      
      computing a new data center environment to adjust the expected life to equal a value of t_min; and
      
      recommending scheduling a replacement part for said hardware component at a time t_min.
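
The iteration in claim 3 and the scheduling decision in claim 4 can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the environment is reduced to a single temperature setpoint, `predict_life`, `energy_savings`, and `replacement_penalty` are hypothetical callables supplied by the caller, and t** (`t_alt`) is passed in because the claim does not say how it is computed.

```python
from typing import Callable, Iterable, Optional, Tuple

Env = float  # hypothetical: one supply-air temperature setpoint in deg C


def search_environment(
    candidates: Iterable[Env],
    predict_life: Callable[[Env], float],
    energy_savings: Callable[[Env], float],
    t_exp: float,
) -> Optional[Tuple[Env, float, float]]:
    """Claim 3 style loop: change the environment until predicted life exceeds t_exp."""
    for env in candidates:
        life = predict_life(env)          # new expected life under this candidate
        savings = energy_savings(env)     # energy savings of moving to this candidate
        if life > t_exp:
            return env, life, savings
    return None  # no candidate met the criterion


def recommend_replacement(
    life: float,
    t_min: float,
    t_alt: float,
    savings: float,
    replacement_penalty: Callable[[float], float],
    tol: float = 1e-9,
) -> Tuple[str, float]:
    """Claim 4 style decision between replacing at t_min and at an earlier t**."""
    if abs(life - t_min) <= tol:
        return "replace_at_t_min", t_min
    if t_alt >= t_min:
        raise ValueError("t** must be earlier than t_min")
    if savings > replacement_penalty(t_alt):
        # Savings outweigh the penalty: keep this environment, replace early.
        return "replace_at_t_alt", t_alt
    # Otherwise the environment is recomputed so that life equals t_min.
    return "retune_environment_to_t_min", t_min


# Toy models (assumptions): life falls and savings rise with temperature.
found = search_environment(
    candidates=[27.0, 25.0, 23.0, 21.0, 19.0],
    predict_life=lambda temp: 2000.0 - 50.0 * temp,
    energy_savings=lambda temp: (temp - 18.0) * 100.0,
    t_exp=900.0,
)
print(found)
```

The search stops at the first candidate that satisfies the life criterion, so the order of `candidates` determines which environment is returned. Here the list runs from the warmest setpoint downward so that the result keeps as much of the energy savings as the criterion allows.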

5. A computer program product comprising:
- one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions, when executed by at least one computer, cause said at least one computer to perform a process for managing environmental conditions of a data center, the program instructions comprising instructions configuring a processor unit of said at least one computer to;
  
  receive sensor data from sensors monitoring environmental conditions at a data center, the data center housing operating hardware components that have not yet failed, and receive reliability data of the hardware components; and
  
  for each hardware component;
  
  derive, using an analytics model stored in the one or more computer readable storage media of the at least one computer, an estimated time to failure of the hardware component, said analytics model being run on a processor unit and trained using machine learning, to correlate a component reliability using learned patterns of component failure, said received reliability data and said sensor data of monitored environmental conditions that the hardware component has been subject to at said data center;
  
  determine whether said estimated time to failure of the hardware component exceeds an expected reference life criteria time t_exp associated with that component, and for each hardware component having a derived estimated time to failure that does not exceed the expected reference life criteria time t_exp for the respective component;
  
  compute a respective time for incurring a lowest cost to replace or repair the component;
  
  generate a candidate modification to one or more environmental conditions of said data center, wherein said candidate modification to said one or more environmental conditions minimizes energy usage of operations at the data center and extends a life of the respective component while operating under said modified candidate modification to one or more environmental conditions at said data center to its respective lowest cost time to replace;
  
  compute an energy cost impact of letting the respective component operate under said candidate modified environment condition at said data center; and
  
  after generating a candidate modified environment condition associated with each hardware component having a derived estimated time to failure that does not exceed the expected reference life criteria time t_exp for the respective component;
  
  select an environmental condition modification from said generated candidate modified environment conditions, said selected environmental condition modification corresponding to a respective hardware component having a largest computed energy savings impact, and run said analytics model on said processor to derive a new estimated time to failure of remaining hardware components having less than largest energy savings impact, said environmental condition modification selection ensuring that the new derived estimated time to failure of each remaining hardware component exceeds its respective said expected reference life criteria time if operating under the selected modification environment condition;
  
  generate an output signal for use in modifying said data center environment according to said selected environment condition modification;
  
  modify said data center environment according to said selected environment condition modification, and schedule a replacement of the hardware component corresponding to the selected environmental condition modification having the largest computed energy savings impact in the data center based on said computed time for incurring a lowest cost to replace or repair the component.
- - 6. The computer program product of claim 5, wherein for a hardware component that is a critical component required for continuous operations, said program instructions further comprise instructions configuring a processor unit of said at least one computer to:
    - determine a time said critical component is expected to fail;
      
      compute a data center environment modification to extend life of the critical component to be greater than a first predetermined time t* representing a time buffer between an expectation of failure and a replacement of the critical component; and
      
      modify the environment at said data center to extend said life of the critical component.
  - 7. The computer program product of claim 5, wherein to generate said candidate modification to one or more environmental conditions, said program instructions further comprise instructions configuring the processor unit of said at least one computer to:
    - compute a new expected life (life) for the component operating under said candidate modified environment condition of said data center environment; and
      
      determine whether said computed life is greater than said expected reference life criteria (t_exp), and if said computed life is not greater than t_exp, changing an environment condition to produce a new changed candidate data center environment and computing a new energy savings of moving to said new changed candidate data center environment, wherein said computing a new energy savings, said computing a new expected life, said determining whether said computed life is greater than said t_exp and said changing an environment condition are repeated until the computed expected life is greater than said t_exp.
  - 8. The computer program product of claim 7, wherein when said expected life is greater than t_exp, said program instructions further comprise instructions configuring a processor unit of said at least one computer to:
    - determine whether the new changed candidate data center environment will adjust the expected life time to equal said t_min time value; and
      
      if the new changed candidate data center environment adjusts the expected life to a value of t_min, generate a recommendation to schedule a replacement part for said component at a time t_min;
      
      otherwise, if it is determined that the new data center environment does not adjust the expected life to a value of t_min;
      
      compute a new time t** representative of another time in which to schedule a replacement for the component, wherein t** is earlier than t_min;
      
      compute a replacement cost penalty for replacing the component at said new time t**;
      
      determine whether said energy cost savings of moving from a current data center environment to said new data center environment exceeds said replacement cost penalty; and
      
      if said energy cost savings of moving from a current data center environment to said new data center environment exceeds said replacement cost penalty, then record the current environment and recommend scheduling a replacement part for said component at a time t**;
      
      otherwise, if said energy cost savings by moving from a current data center environment to said new data center environment does not exceed said replacement cost penalty;
      
      compute a new data center environment to adjust the expected life to equal a value of t_min; and
      
      recommend scheduling a replacement part for said hardware component at a time t_min.

9. A computer-implemented system for managing environmental conditions of a data center, the system comprising:
- a memory storage device storing program instructions;
  
  at least one hardware processor coupled to the memory storage device and running said program instructions to configure said at least one hardware processor to;
  
  receive sensor data from sensors monitoring environmental conditions at a data center, the data center housing operating hardware components that have not yet failed, and receive reliability data of the hardware components; and
  
  for each hardware component;
  
  derive, using an analytics model stored in the memory storage device, an estimated time to failure of the hardware component, said analytics model being run on a processor unit and trained using machine learning, to correlate a component reliability using learned patterns of component failure, said received reliability data and said sensor data of monitored environmental conditions at said data center;
  
  determine whether said estimated time to failure of the hardware component exceeds an expected reference life criteria time t_exp associated with the component, and for each hardware component having a derived estimated time to failure that does not exceed the expected reference life criteria time t_exp for the respective component;
  
  compute a respective time for incurring a lowest cost to replace or repair the component;
  
  generate a candidate modification to one or more environmental conditions of said data center, wherein said candidate modification to said one or more environmental conditions minimizes energy usage of operations at the data center, and extends a life of the respective component while operating under said candidate modification to one or more environmental conditions at said data center to its respective lowest cost time to replace;
  
  compute an energy cost impact to an entity of letting the respective component operate under said candidate modified environment condition at said data center; and
  
  after generating a candidate modified environment condition associated with each hardware component having a derived estimated time to failure that does not exceed the expected reference life criteria time t_exp for the respective component;
  
  select an environmental condition modification from said generated candidate modified environment conditions, said selected environmental condition modification corresponding to a respective hardware component having a largest computed energy savings impact and run said analytics model on said processor to derive a new estimated time to failure of remaining hardware components having less than largest energy savings impact, said environmental condition modification selection ensuring that the new derived estimated time to failure of each remaining hardware component exceeds its respective said expected reference life criteria time if operating under the selected modification environment condition;
  
  generate an output signal for use in modifying said data center environment according to said selected environment condition modification;
  
  modify said data center environment according to said selected environment condition modification, and schedule a replacement of the hardware component corresponding to the selected environmental condition modification having the largest computed energy savings impact in the data center based on said computed time for incurring a lowest cost to replace or repair the component.
- - 10. The computer-implemented system of claim 9, wherein for a hardware component that is a critical component required for continuous operations, said program instructions further configure said at least one hardware processor to:
    - determine a time said critical component is expected to fail;
      
      compute a data center environment modification to extend life of the critical component to be greater than a first predetermined time t* representing a time buffer between an expectation of failure and a replacement of the critical component; and
      
      modify the environment at said data center to extend said life of the critical component.
  - 11. The computer-implemented system of claim 10, wherein to generate said candidate modification to one or more environmental conditions, said program instructions further configure said at least one hardware processor to:
    - compute a new expected life (life) for the component operating under said candidate modified environment condition of said data center environment; and
      
      determine whether said computed life is greater than said expected reference life criteria (t_exp), and if said computed life is not greater than t_exp, changing an environment condition to produce a new changed candidate data center environment and computing a new energy savings of moving to said new changed candidate data center environment, wherein said computing an energy cost impact, said computing a new expected life, said determining whether said computed life is greater than said t_exp and said changing an environment condition are repeated until the computed expected life is greater than said t_exp.
  - 12. The computer-implemented system of claim 11, wherein when said expected life is greater than t_exp, said program instructions further configure said at least one hardware processor to:
    - determine whether the new changed candidate data center environment will adjust the expected life to equal said t_min time value; and
      
      if the new data center environment adjusts the expected life to a value of t_min, generate a recommendation to schedule a replacement part for said hardware component at a time t_min;
      
      otherwise, if it is determined that the new data center environment does not adjust the expected life to a value of t_min;
      
      compute a new time t** representative of another time in which to schedule a replacement for the component, wherein t** is earlier than t_min;
      
      compute a replacement cost penalty for replacing the component at said new time t**;
      
      determine whether said energy cost savings of moving from a current data center environment to said new data center environment exceeds said replacement cost penalty; and
      
      if said energy cost savings of moving from a current data center environment to said new data center environment exceeds said replacement cost penalty, then record the current data center environment and recommend scheduling a replacement part for said component at a time t**;
      
      otherwise, if said energy cost savings by moving from a current data center environment to said new data center environment does not exceed said replacement cost penalty;
      
      compute a new data center environment to adjust the expected life to equal a value of t_min; and
      
      recommend scheduling a replacement part for said component at a time t_min.
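
The selection step shared by independent claims 1, 5, and 9 can be sketched as below. This is one possible reading under stated assumptions: each candidate is reduced to a single temperature setpoint, `predict_ttf` stands in for the trained analytics model, and falling back to the next-best candidate when a remaining component would miss its t_exp is an assumption of this sketch rather than something the claims spell out. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    component: str         # component the candidate modification was generated for
    setpoint_c: float      # hypothetical single environment variable
    energy_savings: float  # computed energy savings impact


def select_modification(
    candidates: List[Candidate],
    t_exp: Dict[str, float],
    predict_ttf: Callable[[str, float], float],
) -> Optional[Candidate]:
    """Pick the candidate with the largest savings whose environment keeps every
    remaining component's re-derived time to failure above its t_exp."""
    ranked = sorted(candidates, key=lambda c: c.energy_savings, reverse=True)
    for cand in ranked:
        remaining = [c.component for c in candidates if c.component != cand.component]
        # Re-run the (stand-in) model for the remaining components under this candidate.
        if all(predict_ttf(name, cand.setpoint_c) > t_exp[name] for name in remaining):
            return cand
    return None  # no candidate satisfies the constraint


# Toy stand-in for the analytics model: time to failure drops as the setpoint rises.
base_life = {"hdd": 2000.0, "psu": 2100.0, "fan": 1900.0}


def toy_ttf(name: str, setpoint_c: float) -> float:
    return base_life[name] - 40.0 * setpoint_c


limits = {"hdd": 1000.0, "psu": 1000.0, "fan": 1000.0}
options = [
    Candidate("hdd", 27.0, 500.0),
    Candidate("psu", 24.0, 300.0),
    Candidate("fan", 22.0, 100.0),
]
chosen = select_modification(options, limits, toy_ttf)
print(chosen)
```

With these toy numbers the two candidates with the highest savings are rejected because the fan would fall below its reference life, so the fan's own candidate is selected. A real system would then emit the output signal and schedule the replacement as the claims describe.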

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Demetriou, Dustin W.; Venkatesan, Vidhya Shankar
Primary Examiner(s): Schell, Joseph O
Application Number: US15/277,731
Publication Number: US 20180089042A1
Time in Patent Office: 805 Days

CPC Class Codes

G05B 23/0283   Predictive maintenance, e.g...

G06F 11/008   Reliability or availability...

G06F 11/20   using active fault-masking,...

G06F 11/3006   where the computing system ...

G06F 11/3058   Monitoring arrangements for...

G06F 11/3409   for performance assessment

G06N 20/00   Machine learning

G06N 5/022   Knowledge engineering; Know...

G06Q 10/0631   Resource planning, allocati...

G06Q 10/0635   Risk analysis of enterprise...

G06Q 10/10   Office automation; Time man...

G06Q 10/20   Administration of product r...

Y02D 10/00   Energy efficient computing,...
