Self-healing and dynamic optimization of VM server cluster management in multi-cloud platform

US 10,361,919 B2
Filed: 11/09/2015
Issued: 07/23/2019
Est. Priority Date: 11/09/2015
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Virtual machine server clusters are managed using self-healing and dynamic optimization to achieve closed-loop automation. The technique uses adaptive thresholding to develop actionable quality metrics for benchmarking and anomaly detection. Real-time analytics are used to determine the root cause of KPI violations and to locate impact areas. Self-healing and dynamic optimization rules can automatically correct, via no-touch automation, common issues in which finger-pointing between operations staff is prevalent, resulting in consolidation, flexibility, and reduced deployment time.
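
The abstract does not say how the adaptive threshold range is derived. A minimal sketch, assuming a band of mean plus or minus k population standard deviations recomputed from the metric's history (the band rule, the multiplier `k = 3.0`, and the names `adaptive_range` and `is_anomalous` are hypothetical, not taken from the patent), shows how such a range adapts as history accumulates and how a monitoring value outside it would be flagged:

```python
from __future__ import annotations

from collections.abc import Sequence
from statistics import fmean, pstdev


def adaptive_range(history: Sequence[float], k: float = 3.0) -> tuple[float, float]:
    """Hypothetical adaptive threshold range: mean +/- k population std devs."""
    mu = fmean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value: float, history: Sequence[float], k: float = 3.0) -> bool:
    """True if the monitoring value falls outside the adaptive range."""
    lower, upper = adaptive_range(history, k)
    return not (lower <= value <= upper)


# Usage: CPU utilization samples (percent) for one VM
cpu_history = [40.0, 50.0, 60.0]
print(adaptive_range(cpu_history))      # band centred on 50.0
print(is_anomalous(90.0, cpu_history))  # True
```

Because the band is recomputed from the current history, it widens for noisy metrics and tightens for stable ones, which is the behavior a fixed static threshold cannot provide.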

18 Claims

1. A method for managing a virtual machine server cluster in a multi-cloud platform, comprising:
- supporting a group of statistics for selecting metric statistics, the group of statistics including each of average, maximum value, minimum value, last value, standard, sum of historical values, sum of squares of historical values, and count of values;
  
  classifying a first quality metric into a load metric class, wherein members of the load metric class indicate a load on a virtual machine;
  
  selecting a load metric statistic based on the classifying the first quality metric into the load metric class, the load metric statistic being selected from the group of statistics;
  
  accumulating values for one or more load metric partial sums from performance monitoring data relating to the first quality metric, the load metric partial sums being selected to calculate a value of the load metric statistic;
  
  calculating the value of the load metric statistic from the load metric partial sums accumulated from performance monitoring data relating to the first quality metric;
  
  classifying a second quality metric into a utilization metric class, wherein members of the utilization metric class indicate a utilization of hardware by the virtual machine;
  
  selecting a utilization metric statistic based on the classifying the second quality metric into the utilization metric class, the utilization metric statistic being selected from the group of statistics and being different from the load metric statistic;
  
  accumulating values for one or more utilization metric partial sums from performance monitoring data relating to the second quality metric, the utilization metric partial sums being selected to calculate a value of the utilization metric statistic;
  
  calculating the value of the utilization metric statistic from the load metric partial sums accumulated from performance monitoring data relating to the second quality metric;
  
  determining an adaptive threshold range for the first quality metric based on the value of the load metric statistic and based on the classifying the first quality metric into the load metric class;
  
  determining an adaptive threshold range for the second quality metric based on the value of the utilization metric statistic and based on the classifying the second quality metric into the utilization metric class;
  
  determining that a monitoring value for one of the first quality metric and the second quality metric is outside the adaptive threshold range for the one quality metric;
  
  performing a self-healing and dynamic optimization task based on the determining that the monitoring value is outside the adaptive threshold range;
  
  determining that a value of one of the partial sums accumulated from performance monitoring data relating to one of the quality metrics exceeds a limit imposed to prevent arithmetic overflow of a value storage; and
  
  dividing values of each partial sum accumulated from performance monitoring data relating to the one of the quality metrics by two.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, further comprising classifying a third quality metric into one of a process efficiency metric class and a response time metric class.
  - 3. The method of claim 1, wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the second quality metric, and performing the self-healing and dynamic optimization task comprises adding a resource if the monitoring value is above an upper threshold of the adaptive threshold range for the second quality metric and removing a resource if the monitoring value is below a lower threshold of the adaptive threshold range for the second quality metric.
  - 4. The method of claim 1, wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the second quality metric, and performing the self-healing and dynamic optimization task comprises performing optimizing resource tuning.
  - 5. The method of claim 1, wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the first quality metric, and performing the self-healing and dynamic optimization task comprises adjusting a system load.
  - 6. The method of claim 1, further comprising:
    - classifying a third quality metric into a process efficiency metric class, wherein members of the process efficiency metric class indicate a number of active processes;
      
      selecting a process efficiency metric statistic based on the classifying the third quality metric into the process efficiency metric class, the process efficiency metric statistic being selected from the group of statistics;
      
      accumulating values for one or more process efficiency metric partial sums from performance monitoring data relating to the third quality metric, the process efficiency metric partial sums being selected to calculate a value of the process efficiency metric statistic;
      
      calculating the value of the process efficiency metric statistic from the process efficiency metric partial sums accumulated from performance monitoring data relating to the third quality metric;
      
      wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the third quality metric, and performing the self-healing and dynamic optimization task comprises performing virtual machine life cycle management.
  - 7. The method of claim 1, further comprising:
    - associating a single, normalized timestamp with all the partial sum values that are accumulated in a single execution of a history update function.
  - 8. The method of claim 1, wherein the partial sums include a sum value, a sum-of-the-squares value, and a count value.
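
Claims 1 and 8 describe keeping per-metric partial sums (sum, sum of squares, count), deriving a class-dependent statistic from them, and dividing every partial sum of a metric by two once one of them exceeds an overflow limit. The sketch below models that under stated assumptions: the limit value, the class-to-statistic mapping (average for load, standard deviation for utilization), and all names are hypothetical, and "standard" in the claimed statistic group is read here as the standard deviation. Halving all three sums together leaves S/N and Q/N unchanged, so the mean and variance survive the rescaling while previously accumulated samples carry half the weight of new ones.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical limit; the claims only require "a limit imposed to prevent
# arithmetic overflow of a value storage".
OVERFLOW_LIMIT = 1e12


@dataclass
class PartialSums:
    """Running partial sums for one quality metric (claim 8)."""
    total: float = 0.0   # sum of historical values
    sum_sq: float = 0.0  # sum of squares of historical values
    count: float = 0.0   # count of values, kept as float so halving is exact

    def add(self, value: float) -> None:
        self.total += value
        self.sum_sq += value * value
        self.count += 1
        # If any partial sum exceeds the limit, divide all of them by two.
        if max(abs(self.total), self.sum_sq, self.count) > OVERFLOW_LIMIT:
            self.total /= 2
            self.sum_sq /= 2
            self.count /= 2

    def average(self) -> float:
        return self.total / self.count

    def std(self) -> float:
        # Population std dev: sqrt(Q/N - (S/N)^2), clamped against rounding
        mean = self.average()
        return math.sqrt(max(self.sum_sq / self.count - mean * mean, 0.0))


# Hypothetical mapping; claim 1 only requires the two classes to use
# different statistics from the supported group.
STATISTIC_FOR_CLASS = {
    "load": PartialSums.average,
    "utilization": PartialSums.std,
}


def metric_statistic(metric_class: str, sums: PartialSums) -> float:
    """Calculate the class-selected statistic from the accumulated partial sums."""
    return STATISTIC_FOR_CLASS[metric_class](sums)


requests_per_s = PartialSums()  # load metric
cpu_percent = PartialSums()     # utilization metric
for r, c in ((120.0, 40.0), (180.0, 50.0), (150.0, 60.0)):
    requests_per_s.add(r)
    cpu_percent.add(c)
print(metric_statistic("load", requests_per_s))  # 150.0
print(metric_statistic("utilization", cpu_percent))
```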

9. A computer-readable storage device having stored thereon computer readable instructions for managing a virtual machine server cluster in a multi-cloud platform, wherein execution of the computer readable instructions by a processor causes the processor to perform operations comprising:
- supporting a group of statistics for selecting metric statistics, the group of statistics including each of average, maximum value, minimum value, last value, standard, sum of historical values, sum of squares of historical values, and count of values;
  
  classifying a first quality metric into a load metric class, wherein members of the load metric class indicate a load on a virtual machine;
  
  selecting a load metric statistic based on the classifying the first quality metric into the load metric class, the load metric statistic being selected from the group of statistics;
  
  accumulating values for one or more load metric partial sums from performance monitoring data relating to the first quality metric, the load metric partial sums being selected to calculate a value of the load metric statistic;
  
  calculating the value of the load metric statistic from the load metric partial sums accumulated from performance monitoring data relating to the first quality metric;
  
  classifying a second quality metric into a utilization metric class, wherein members of the utilization metric class indicate a utilization of hardware by the virtual machine;
  
  selecting a utilization metric statistic based on the classifying the second quality metric into the utilization metric class, the utilization metric statistic being selected from the group of statistics and being different from the load metric statistic;
  
  accumulating values for one or more utilization metric partial sums from performance monitoring data relating to the second quality metric, the utilization metric partial sums being selected to calculate a value of the utilization metric statistic;
  
  calculating the value of the utilization metric statistic from the load metric partial sums accumulated from performance monitoring data relating to the second quality metric;
  
  determining an adaptive threshold range for the first quality metric based on the value of the load metric statistic and based on the classifying the first quality metric into the load metric class;
  
  determining an adaptive threshold range for the second quality metric based on the value of the utilization metric statistic and based on the classifying the second quality metric into the utilization metric class;
  
  determining that a monitoring value for one of the first quality metric and the second quality metric is outside the adaptive threshold range for the one quality metric;
  
  performing a self-healing and dynamic optimization task based on the determining that the monitoring value is outside the adaptive threshold range;
  
  determining that a value of one of the partial sums accumulated from performance monitoring data relating to one of the quality metrics exceeds a limit imposed to prevent arithmetic overflow of a value storage; and
  
  dividing values of each partial sum accumulated from performance monitoring data relating to the one of the quality metrics by two.
- Dependent Claims (10, 11, 12, 13, 14, 15)
- - 10. The computer-readable storage device of claim 9, wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the second quality metric, and performing the self-healing and dynamic optimization task comprises adding a resource if the monitoring value is above an upper threshold of the adaptive threshold range for the second quality metric and removing a resource if the monitoring value is below a lower threshold of the adaptive threshold range for the second quality metric.
  - 11. The computer-readable storage device of claim 9, wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the second quality metric, and performing the self-healing and dynamic optimization task comprises performing optimizing resource tuning.
  - 12. The computer-readable storage device of claim 9, wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the first quality metric, and performing the self-healing and dynamic optimization task comprises adjusting a system load.
  - 13. The computer-readable storage device of claim 9, further comprising:
    - classifying a third quality metric into a process efficiency metric class, wherein members of the process efficiency metric class indicate a number of active processes;
      
      selecting a process efficiency metric statistic based on the classifying the third quality metric into the process efficiency metric class, the process efficiency metric statistic being selected from the group of statistics;
      
      accumulating values for one or more process efficiency metric partial sums from performance monitoring data relating to the third quality metric, the process efficiency metric partial sums being selected to calculate a value of the process efficiency metric statistic;
      
      calculating the value of the process efficiency metric statistic from the process efficiency metric partial sums accumulated from performance monitoring data relating to the third quality metric;
      
      wherein the quality metric for which the monitoring value is determined to be outside the adaptive threshold range is the third quality metric, and performing the self-healing and dynamic optimization task comprises performing virtual machine life cycle management.
  - 14. The computer-readable storage device of claim 9, wherein the operations further comprise:
    - associating a single, normalized timestamp with all the partial sum values that are accumulated in a single execution of a history update function.
  - 15. The computer-readable storage device of claim 9, wherein the partial sums include a sum value, a sum-of-the-squares value, and a count value.
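
Claims 10, 12 and 13 tie the self-healing task to the class of the metric that left its adaptive range: a utilization metric leads to adding or removing a resource, a load metric to adjusting the system load, and a process efficiency metric to virtual machine life cycle management. A minimal dispatch sketch follows; the `Action` members, the class strings, and `choose_action` are hypothetical names, and the claim 11 alternative (resource tuning for a utilization metric) is not modeled.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    ADD_RESOURCE = "add_resource"        # utilization above upper threshold
    REMOVE_RESOURCE = "remove_resource"  # utilization below lower threshold
    ADJUST_LOAD = "adjust_load"          # load metric out of range
    VM_LIFECYCLE = "vm_lifecycle"        # process efficiency out of range


def choose_action(metric_class: str, value: float, lower: float, upper: float) -> Action:
    """Pick a self-healing task from the metric class and the violated bound."""
    if lower <= value <= upper:
        return Action.NONE
    if metric_class == "utilization":
        return Action.ADD_RESOURCE if value > upper else Action.REMOVE_RESOURCE
    if metric_class == "load":
        return Action.ADJUST_LOAD
    if metric_class == "process_efficiency":
        return Action.VM_LIFECYCLE
    raise ValueError(f"unknown metric class: {metric_class}")


print(choose_action("utilization", 95.0, 20.0, 80.0))  # Action.ADD_RESOURCE
```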

16. A system for managing a virtual machine server cluster in a multi-cloud platform, comprising:
- a processor resource;
  
  a performance measurement interface connecting the processor resource to the virtual machine server cluster; and
  
  a computer-readable storage device having stored thereon computer readable instructions, wherein execution of the computer readable instructions by the processor resource causes the processor resource to perform operations comprising:
  
  supporting a group of statistics for selecting metric statistics, the group of statistics including each of average, maximum value, minimum value, last value, standard, sum of historical values, sum of squares of historical values, and count of values;
  
  receiving, by the performance measurement interface, performance monitoring data of a first quality metric and a second quality metric;
  
  classifying, by the processor, the first quality metric into a load metric class, wherein members of the load metric class indicate a load on a virtual machine;
  
  selecting a load metric statistic based on the classifying the first quality metric into the load metric class, the load metric statistic being selected from the group of statistics;
  
  accumulating values for one or more load metric partial sums from the performance monitoring data of the first quality metric, the load metric partial sums being selected to calculate a value of the load metric statistic;
  
  calculating, by the processor, the value of the load metric statistic from the load metric partial sums accumulated from the performance monitoring data of the first quality metric;
  
  classifying, by the processor, the second quality metric into a utilization metric class, wherein members of the utilization metric class indicate a utilization of hardware by the virtual machine;
  
  selecting a utilization metric statistic based on the classifying the second quality metric into the utilization metric class, the utilization metric statistic being selected from the group of statistics and being different from the load metric statistic;
  
  accumulating values for one or more utilization metric partial sums from performance monitoring data relating to the second quality metric, the utilization metric partial sums being selected to calculate a value of the utilization metric statistic;
  
  calculating, by the processor, the value of the utilization metric statistic from the load metric partial sums accumulated from performance monitoring data relating to the second quality metric;
  
  determining, by the processor, an adaptive threshold range for the first quality metric based on the value of the load metric statistic and based on the classifying the first quality metric into the load metric class;
  
  determining, by the processor, an adaptive threshold range for the second quality metric based on the value of the utilization metric statistic and based on the classifying the second quality metric into the utilization metric class;
  
  determining, by the processor, that a monitoring value for one of the first quality metric and the second quality metric is outside the adaptive threshold range for the one quality metric;
  
  performing, by the processor, a self-healing and dynamic optimization task based on the determining that the monitoring value for the second quality metric is outside the adaptive threshold range, the self-healing and dynamic optimization task comprising adding a computing resource if the statistical value is above an upper threshold and removing a computing resource if the statistical value is below a lower threshold;
  
  determining that a value of one of the partial sums accumulated from performance monitoring data relating to one of the quality metrics exceeds a limit imposed to prevent arithmetic overflow of a value storage; and
  
  dividing values of each partial sum accumulated from performance monitoring data relating to the one of the quality metrics by two.
- Dependent Claims (17, 18)
- - 17. The system of claim 16, further comprising:
    - associating a single, normalized timestamp with all the partial sum values that are accumulated in a single execution of a history update function.
  - 18. The system of claim 16, wherein the partial sums comprise a sum value, a sum-of-the-squares value, and a count value.
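
Claims 7, 14 and 17 call for one normalized timestamp shared by all partial sum values accumulated in a single execution of the history update function. A minimal sketch, assuming "normalized" means rounded down to the start of a fixed collection interval (the 300-second interval and the names `normalize_timestamp`, `history_update`, and `MetricHistory` are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetricHistory:
    total: float = 0.0
    sum_sq: float = 0.0
    count: float = 0.0
    updated_at: int | None = None  # normalized epoch seconds of last update


def normalize_timestamp(epoch_seconds: float, interval: int = 300) -> int:
    """Round down to the start of the (hypothetical) collection interval."""
    return int(epoch_seconds // interval) * interval


def history_update(histories: dict[str, MetricHistory],
                   samples: dict[str, list[float]],
                   now: float, interval: int = 300) -> int:
    """Accumulate partial sums for every metric under one shared timestamp."""
    stamp = normalize_timestamp(now, interval)  # computed once per execution
    for metric, values in samples.items():
        h = histories.setdefault(metric, MetricHistory())
        for v in values:
            h.total += v
            h.sum_sq += v * v
            h.count += 1
        h.updated_at = stamp  # every metric touched in this run shares it
    return stamp
```

Computing the stamp once, before the loop, keeps metrics that are processed a few milliseconds apart within the same run aligned on the same history interval.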

Current Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors
Yang, Chen-Yui; Lu, David H.; Baker, Scott; Srdar, Anthony M.; Bourge, Gabriel
Primary Examiner(s)
Hussain, Imad

Application Number

US14/936,095
Publication Number

US 20170134237A1
Time in Patent Office

1,352 Days
Field of Search

709224
US Class Current
CPC Class Codes

G06F 2009/45591   Monitoring or debugging sup...

G06F 2009/45595   Network integration; Enabli...

G06F 9/45558   Hypervisor-specific managem...

H04L 41/0816   the condition being an adap...

H04L 41/0823   characterised by the purpos...

H04L 41/0895   Configuration of virtualise...

H04L 41/0896   Bandwidth or capacity manag...

H04L 41/0897   by horizontal or vertical s...

H04L 41/12   Discovery or management of ...

H04L 41/40   using virtualisation of net...

H04L 41/5009   Determining service level p...

H04L 41/5025   by proactively reacting to ...

H04L 43/0876   Network utilisation, e.g. v...

H04L 43/16   Threshold monitoring
