DATACENTER LEVEL UTILIZATION PREDICTION WITHOUT OPERATING SYSTEM INVOLVEMENT

US 20200134423A1
Filed: 10/29/2018
Published: 04/30/2020
Est. Priority Date: 10/29/2018
Status: Active Grant

Abstract

Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter. Also, the predictions from the hierarchy of models can be used to detect anomalies of datacenter hardware behavior.
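
The stacking idea in the abstract, where a higher-level model takes lower-level predictions as input, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the ordinary least squares stand-in `LinearModel`, the helper `solve`, the power and counter values, and the two-level server/ToR layout. The patent does not specify these model types or numbers.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


class LinearModel:
    """Least squares with an intercept; a hypothetical stand-in for any trained model."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "LinearModel":
        rows = [[1.0, *x] for x in X]
        k = len(rows[0])
        # Normal equations: (X^T X) w = X^T y
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
        xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
        self.w = solve(xtx, xty)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.w[0] + sum(w * v for w, v in zip(self.w[1:], x)) for x in X]


# Level 1 (server): trained on out-of-band power readings in watts (made-up data).
server_model = LinearModel().fit(
    [[100.0], [150.0], [200.0], [250.0]], [0.10, 0.35, 0.60, 0.85]
)

# First time period: the ToR training set combines non-OS data (network counters)
# with the server model's predictions for the same period.
power_p1 = [[120.0], [180.0], [220.0], [160.0]]
counters_p1 = [10.0, 30.0, 20.0, 40.0]
tor_util_p1 = [0.20, 0.55, 0.55, 0.60]
server_preds_p1 = server_model.predict(power_p1)
tor_model = LinearModel().fit(
    [[c, p] for c, p in zip(counters_p1, server_preds_p1)], tor_util_p1
)

# Second, later time period: fresh counters plus a fresh server-level prediction
# feed the higher-level model.
server_pred_p2 = server_model.predict([[200.0]])[0]
tor_prediction = tor_model.predict([[25.0, server_pred_p2]])[0]
print(f"predicted ToR utilization: {tor_prediction:.2f}")
```

The same pattern could be repeated upward, with ToR predictions feeding a backbone switch model and so on, which is the layering the abstract describes.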

22 Claims

1. A method, comprising:
- generating first one or more predictions of hardware utilization at a first hardware level of a plurality of hardware levels in a system of networked computing devices using a first trained machine learning model;
  
  using training data, training a second machine learning model to predict hardware utilization at a second hardware level of the plurality of hardware levels given hardware utilization features recorded in the training data to produce a second trained machine learning model;
  
  wherein the training data comprises first hardware utilization data for one or more hardware levels of the plurality of hardware levels collected during a first time period and the first one or more predictions of hardware utilization generated using the first trained machine learning model;
  
  generating second one or more predictions of hardware utilization at the first hardware level using the first trained machine learning model;
  
  based, at least in part, on second hardware utilization data for the one or more hardware levels collected during a second time period subsequent to the first time period, and the second one or more predictions of hardware utilization at the first hardware level, generating a prediction of hardware utilization at the second hardware level using the second trained machine learning model;
  
  wherein the method is performed by one or more computing devices.
  - 2. The method of claim 1, wherein:
    - the first hardware level comprises a particular computing device of the system of networked computing devices;
      
      the second one or more predictions predict utilization of the particular computing device; and
      
      the first trained machine learning model generates the second one or more predictions based, at least in part, on input features generated from sensor data collected from an out-of-band subsystem of the particular computing device.
  - 3. The method of claim 2, wherein:
    - the first and second hardware levels are the same level;
      
      the second hardware level comprises the particular computing device of the system of networked computing devices;
      
      the first and second hardware utilization data for the one or more hardware levels comprises one or more of:
      
      network counters, or network flow data; and
      
      the prediction of hardware utilization at the second hardware level predicts one or more of:
      
      an operating system running on the particular computing device, or a type of workload running on the particular computing device.
  - 4. The method of claim 2, wherein:
    - the second hardware level comprises a top-of-rack (ToR) switch that is communicatively connected to the particular computing device;
      
      the first and second hardware utilization data for the one or more hardware levels comprises one or more of:
      
      input features generated from sensor data collected from an out-of-band subsystem of the ToR switch;
      
      network counters;
      
      network flow data;
      
      or network configuration information; and
      
      the prediction of hardware utilization at the second hardware level predicts network utilization of the ToR switch.
  - 5. The method of claim 4, further comprising:
    - generating first one or more predictions of hardware utilization at the second hardware level using the second trained machine learning model;
      
      using second training data, training a third machine learning model to predict hardware utilization at a third hardware level of the plurality of hardware levels given hardware utilization features recorded in the second training data to produce a third trained machine learning model;
      
      wherein the second training data comprises third hardware utilization data for one or more hardware levels of the plurality of hardware levels collected during a third time period and the first one or more predictions of hardware utilization at the second hardware level generated using the second trained machine learning model;
      
      generating second one or more predictions of hardware utilization at the second hardware level using the second trained machine learning model;
      
      based, at least in part, on fourth hardware utilization data for the one or more hardware levels collected during a fourth time period subsequent to the third time period, and the second one or more predictions of hardware utilization at the second hardware level, generating a third-level prediction of hardware utilization at the third hardware level using the third trained machine learning model.
  - 6. The method of claim 5, wherein:
    - the third hardware level comprises a backbone switch that is communicatively connected to the ToR switch;
      
      the third and fourth hardware utilization data for the one or more hardware levels comprises one or more of:
      
      input features generated from sensor data collected from an out-of-band subsystem of the backbone switch;
      
      network counters;
      
      network flow information;
      
      or network configuration information; and
      
      the third-level prediction of hardware utilization at the third hardware level predicts network utilization of the backbone switch.
  - 7. The method of claim 6, further comprising:
    - generating first one or more predictions of hardware utilization at the backbone switch level using the third trained machine learning model;
      
      using third training data, training a fourth machine learning model to predict hardware utilization at a fourth datacenter level given hardware utilization features recorded in the third training data to produce a fourth trained machine learning model;
      
      wherein the third training data comprises the first one or more predictions of hardware utilization at the backbone switch level generated using the third trained machine learning model;
      
      generating second one or more predictions of hardware utilization at the backbone switch level using the third trained machine learning model;
      
      based, at least in part, on the second one or more predictions of hardware utilization at the backbone switch level, generating a datacenter-level prediction of hardware utilization at the datacenter level using the fourth trained machine learning model.
  - 8. The method of claim 1, further comprising:
    - based, at least in part, on a set of predictions, identifying one or more devices as a predicted network hotspot;
      
      wherein the set of predictions includes one or more of:
      
      a set of predictions from the first trained machine learning model, and a set of predictions from the second trained machine learning model;
      
      causing one or more network controllers to reroute one or more network flows away from the predicted network hotspot.
  - 9. The method of claim 1, wherein:
    - a particular prediction, of said second one or more predictions of hardware utilization, predicts utilization of particular hardware at the first hardware level during a particular period of time;
      
      the method further comprises:
      
      detecting actual utilization of the particular hardware during the particular period of time;
      
      determining whether the actual utilization of the particular hardware is within a pre-determined threshold of the particular prediction;
      
      in response to determining that the actual utilization of the particular hardware is not within the pre-determined threshold of the particular prediction, identifying a deviation event for the particular hardware during the particular period of time.
  - 10. The method of claim 9 further comprising:
    - determining whether the deviation event identified for the particular hardware is an anomalous event based, at least in part, on one or more of:
      
      the particular hardware has been restarted within a threshold time period prior to the deviation event;
      
      the particular hardware has changed ownership within a threshold time period prior to the deviation event;
      
      or historical deviation event data recorded for the particular hardware.
  - 11. The method of claim 10 wherein:
    - the historical deviation event data recorded for the particular hardware comprises information about one or more historical deviation events for the particular hardware; and
      
      determining whether the deviation event identified for the particular hardware is an anomalous event is based, at least in part, on the historical deviation event data recorded for the particular hardware, and further comprises:
      
      identifying a historical pattern of deviation events in the historical deviation event data,
      
      determining that the deviation event fails to conform to the historical pattern of deviation events, and
      
      in response to determining that the deviation event fails to conform to the historical pattern of deviation events, determining that the deviation event is an anomalous event.
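
The deviation and anomaly checks of claims 9 through 11 can be sketched as follows. This is only one possible reading, under stated assumptions: the names `DeviationEvent`, `find_deviation`, and `is_anomalous` are hypothetical, the "historical pattern" is reduced to the hours of day at which deviations were seen before, and a recent restart or ownership change is assumed to explain a deviation rather than mark it as anomalous. The claims do not fix any of these choices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviationEvent:
    hardware_id: str
    hour_of_day: int
    predicted: float
    actual: float


def find_deviation(hardware_id: str, hour_of_day: int, predicted: float,
                   actual: float, threshold: float) -> Optional[DeviationEvent]:
    """Return a deviation event if actual utilization is outside the threshold."""
    if abs(actual - predicted) <= threshold:
        return None
    return DeviationEvent(hardware_id, hour_of_day, predicted, actual)


def is_anomalous(event: DeviationEvent, hours_since_restart: float,
                 hours_since_owner_change: float,
                 history: List[DeviationEvent], grace_hours: float = 24.0) -> bool:
    """Classify a deviation event (assumed policy, not taken from the claims)."""
    # Assumption: a recent restart or change of ownership explains the deviation.
    if hours_since_restart < grace_hours or hours_since_owner_change < grace_hours:
        return False
    # Assumption: the historical pattern is the set of hours at which this
    # hardware has deviated before (for example, a nightly batch job).
    usual_hours = {past.hour_of_day for past in history}
    return event.hour_of_day not in usual_hours


# Made-up history: the server deviates every night at 02:00.
history = [DeviationEvent("server-17", 2, 0.30, 0.80) for _ in range(3)]

within = find_deviation("server-17", 14, predicted=0.30, actual=0.35, threshold=0.10)
event = find_deviation("server-17", 14, predicted=0.30, actual=0.85, threshold=0.10)
nightly = find_deviation("server-17", 2, predicted=0.30, actual=0.85, threshold=0.10)

afternoon_is_anomalous = is_anomalous(event, 500.0, 500.0, history)
nightly_is_anomalous = is_anomalous(nightly, 500.0, 500.0, history)
after_restart_is_anomalous = is_anomalous(event, 1.0, 500.0, history)
print(afternoon_is_anomalous, nightly_is_anomalous, after_restart_is_anomalous)
```

A richer pattern model (day of week, magnitude, duration) would slot into `is_anomalous` without changing the threshold check.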

12. One or more non-transitory computer-readable media storing instructions which, when executed by one or more processors, cause:
- generating first one or more predictions of hardware utilization at a first hardware level of a plurality of hardware levels in a system of networked computing devices using a first trained machine learning model;
  
  using training data, training a second machine learning model to predict hardware utilization at a second hardware level of the plurality of hardware levels given hardware utilization features recorded in the training data to produce a second trained machine learning model;
  
  wherein the training data comprises first hardware utilization data for one or more hardware levels of the plurality of hardware levels collected during a first time period and the first one or more predictions of hardware utilization generated using the first trained machine learning model;
  
  generating second one or more predictions of hardware utilization at the first hardware level using the first trained machine learning model;
  
  based, at least in part, on second hardware utilization data for the one or more hardware levels collected during a second time period subsequent to the first time period, and the second one or more predictions of hardware utilization at the first hardware level, generating a prediction of hardware utilization at the second hardware level using the second trained machine learning model.
  - 13. The one or more non-transitory computer-readable media of claim 12, wherein:
    - the first hardware level comprises a particular computing device of the system of networked computing devices;
      
      the second one or more predictions predict utilization of the particular computing device; and
      
      the first trained machine learning model generates the second one or more predictions based, at least in part, on input features generated from sensor data collected from an out-of-band subsystem of the particular computing device.
  - 14. The one or more non-transitory computer-readable media of claim 13, wherein:
    - the first and second hardware levels are the same level;
      
      the second hardware level comprises the particular computing device of the system of networked computing devices;
      
      the first and second hardware utilization data for the one or more hardware levels comprises one or more of:
      
      network counters, or network flow data; and
      
      the prediction of hardware utilization at the second hardware level predicts one or more of:
      
      an operating system running on the particular computing device, or a type of workload running on the particular computing device.
  - 15. The one or more non-transitory computer-readable media of claim 13, wherein:
    - the second hardware level comprises a top-of-rack (ToR) switch that is communicatively connected to the particular computing device;
      
      the first and second hardware utilization data for the one or more hardware levels comprises one or more of:
      
      input features generated from sensor data collected from an out-of-band subsystem of the ToR switch;
      
      network counters;
      
      network flow data;
      
      or network configuration information; and
      
      the prediction of hardware utilization at the second hardware level predicts network utilization of the ToR switch.
  - 16. The one or more non-transitory computer-readable media of claim 15, wherein the instructions further comprise instructions which, when executed by one or more processors, cause:
    - generating first one or more predictions of hardware utilization at the second hardware level using the second trained machine learning model;
      
      using second training data, training a third machine learning model to predict hardware utilization at a third hardware level of the plurality of hardware levels given hardware utilization features recorded in the second training data to produce a third trained machine learning model;
      
      wherein the second training data comprises third hardware utilization data for one or more hardware levels of the plurality of hardware levels collected during a third time period and the first one or more predictions of hardware utilization at the second hardware level generated using the second trained machine learning model;
      
      generating second one or more predictions of hardware utilization at the second hardware level using the second trained machine learning model;
      
      based, at least in part, on fourth hardware utilization data for the one or more hardware levels collected during a fourth time period subsequent to the third time period, and the second one or more predictions of hardware utilization at the second hardware level, generating a third-level prediction of hardware utilization at the third hardware level using the third trained machine learning model.
  - 17. The one or more non-transitory computer-readable media of claim 16, wherein:
    - the third hardware level comprises a backbone switch that is communicatively connected to the ToR switch;
      
      the third and fourth hardware utilization data for the one or more hardware levels comprises one or more of:
      
      input features generated from sensor data collected from an out-of-band subsystem of the backbone switch;
      
      network counters;
      
      network flow information;
      
      or network configuration information; and
      
      the third-level prediction of hardware utilization at the third hardware level predicts network utilization of the backbone switch.
  - 18. The one or more non-transitory computer-readable media of claim 17, wherein the instructions further comprise instructions which, when executed by one or more processors, cause:
    - generating first one or more predictions of hardware utilization at the backbone switch level using the third trained machine learning model;
      
      using third training data, training a fourth machine learning model to predict hardware utilization at a fourth datacenter level given hardware utilization features recorded in the third training data to produce a fourth trained machine learning model;
      
      wherein the third training data comprises the first one or more predictions of hardware utilization at the backbone switch level generated using the third trained machine learning model;
      
      generating second one or more predictions of hardware utilization at the backbone switch level using the third trained machine learning model;
      
      based, at least in part, on the second one or more predictions of hardware utilization at the backbone switch level, generating a datacenter-level prediction of hardware utilization at the datacenter level using the fourth trained machine learning model.
  - 19. The one or more non-transitory computer-readable media of claim 12, wherein the instructions further comprise instructions which, when executed by one or more processors, cause:
    - based, at least in part, on a set of predictions, identifying one or more devices as a predicted network hotspot;
      
      wherein the set of predictions includes one or more of:
      
      a set of predictions from the first trained machine learning model, and a set of predictions from the second trained machine learning model;
      
      causing one or more network controllers to reroute one or more network flows away from the predicted network hotspot.
  - 20. The one or more non-transitory computer-readable media of claim 12, wherein:
    - a particular prediction, of said second one or more predictions of hardware utilization, predicts utilization of particular hardware at the first hardware level during a particular period of time;
      
      the instructions further comprise instructions which, when executed by one or more processors, cause:
      
      detecting actual utilization of the particular hardware during the particular period of time;
      
      determining whether the actual utilization of the particular hardware is within a pre-determined threshold of the particular prediction;
      
      in response to determining that the actual utilization of the particular hardware is not within the pre-determined threshold of the particular prediction, identifying a deviation event for the particular hardware during the particular period of time.
  - 21. The one or more non-transitory computer-readable media of claim 20 wherein the instructions further comprise instructions which, when executed by one or more processors, cause:
    - determining whether the deviation event identified for the particular hardware is an anomalous event based, at least in part, on one or more of:
      
      the particular hardware has been restarted within a threshold time period prior to the deviation event;
      
      the particular hardware has changed ownership within a threshold time period prior to the deviation event;
      
      or historical deviation event data recorded for the particular hardware.
  - 22. The one or more non-transitory computer-readable media of claim 21 wherein:
    - the historical deviation event data recorded for the particular hardware comprises information about one or more historical deviation events for the particular hardware; and
      
      determining whether the deviation event identified for the particular hardware is an anomalous event is based, at least in part, on the historical deviation event data recorded for the particular hardware, and further comprises:
      
      identifying a historical pattern of deviation events in the historical deviation event data,
      
      determining that the deviation event fails to conform to the historical pattern of deviation events, and
      
      in response to determining that the deviation event fails to conform to the historical pattern of deviation events, determining that the deviation event is an anomalous event.
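
Claims 8 and 19 recite identifying a predicted network hotspot and causing network controllers to reroute flows away from it. A minimal sketch of that step is shown here, assuming a fixed utilization limit of 0.9, a hypothetical table of alternate devices, and flows represented as lists of hops. The functions `find_hotspots` and `plan_reroutes` are illustrative names that only compute new paths; how a real network controller would apply them is outside this sketch.

```python
from typing import Dict, List


def find_hotspots(predicted_util: Dict[str, float], limit: float = 0.9) -> List[str]:
    """Devices whose predicted utilization exceeds the (assumed) limit."""
    return sorted(dev for dev, util in predicted_util.items() if util > limit)


def plan_reroutes(flows: Dict[str, List[str]], hotspots: List[str],
                  alternates: Dict[str, str]) -> Dict[str, List[str]]:
    """Replace each hotspot hop with its alternate device, where one is known."""
    hot = set(hotspots)
    return {
        flow_id: [alternates.get(hop, hop) if hop in hot else hop for hop in path]
        for flow_id, path in flows.items()
    }


# Made-up predictions from the model hierarchy, keyed by device name.
predicted = {"tor-1": 0.95, "tor-2": 0.40, "backbone-1": 0.70}
flows = {
    "flow-a": ["server-1", "tor-1", "backbone-1"],
    "flow-b": ["server-2", "tor-2", "backbone-1"],
}
alternates = {"tor-1": "tor-2"}  # hypothetical redundant uplink

hotspots = find_hotspots(predicted)
new_paths = plan_reroutes(flows, hotspots, alternates)
print(hotspots, new_paths["flow-a"])
```

Because the inputs are predictions rather than measurements, the reroute can be planned before the hotspot actually forms.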

Current Assignee
Oracle International Corporation (Oracle Corporation)
Original Assignee
Oracle International Corporation (Oracle Corporation)
Inventors
Pravin Shinde, Felix Schmidt, Onur Kocberber

Granted Patent

US 11,443,166 B2
CPC Class Codes

G06F 11/0709   in a distributed system con...

G06F 11/0751   Error or fault detection no...

G06F 11/0787   Storage of error reports, e...

G06F 11/3409   for performance assessment

G06F 11/3447   Performance evaluation by m...

G06F 11/3452   Performance evaluation by s...

G06F 11/3495   for systems

G06F 21/53   by executing in a restricte...

G06F 21/57   Certifying or maintaining t...

G06F 2221/034   Test or assess a computer o...

G06N 20/20   Ensemble learning

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/084   Backpropagation, e.g. using...

G06N 3/088   Non-supervised learning, e....

G06N 5/01   Dynamic search techniques; ...
