Organizing network performance metrics into historical anomaly dependency data

US 9,558,056 B2
Filed: 05/13/2014
Issued: 01/31/2017
Est. Priority Date: 07/28/2013
Status: Active Grant



Abstract

The technology disclosed relates to organizing network performance metrics into historical anomaly dependency data. In particular, it relates to calculating cascading failure relationships between correlated anomalies detected in a network. It also relates to illustrating to a network administrator causes of system failure by laying out the graph to show a progression over time of the cascading failures and to identify root causes of the cascading failures. It also relates to ranking anomalies and anomaly clusters in the network based on attributes of the resources exhibiting anomalous performances and attributes of the anomalous performances. It further relates to depicting evolution of resource failures across a network by visually coding impacted resources, adjusting the visual coding over time, and allowing replay over time to visualize propagation of anomalous performances among the impacted resources.


20 Claims

1. A method of organizing network performance metrics into historical anomaly dependency data, the method including:
- assembling performance data for a multiplicity of metrics across a multiplicity of resources on a network and automatically setting criteria based on the performance data over time that qualifies a subset of the performance data as anomalous instance data;
  
  constructing a map of active network communication paths that carry communications among first and second resources subject to anomalous performance and representing the active network communication paths as edges between nodes representing first and second resources, thereby forming connected node pairs;
  
  calculating cascading failure relationships from time-stamped anomalous instance data for the connected node pairs, wherein the cascading failure relationships are based at least in part on whether conditional probabilities of anomalous performance of the second resources given prior anomalous performance of the first resources exceed a predetermined threshold;
  
  wherein calculating the conditional probabilities makes use of a statistical measure of likelihood;
  
  conditional probability=p(anomalous second resource instance|anomalous first resource instance); and
  
  automatically representing the anomalous performance of the second resource as a cascading failure resulting from the anomalous performance of the first resource based on the calculated cascading failure relationships.
  - 2. The method of claim 1, further including preparing and forwarding for viewing visual representation data, wherein:
    - the visual representation data summarize a chain of cascading failure relationships related to a first group of anomalous instance data, including a count of the first and second resources involved in the chain of cascading failure relationships; and
      
      the visual representation data graphically depict the first and second resources impacted by the chain of cascading failure relationships, arranging the first and second resources along a timeline and showing how anomalous performances spread in time among the impacted first and second resources.
  - 3. The method of claim 2, wherein the visual representation data further include graphic depiction of predicted impacts on additional resources not yet impacted, based at least on:
    - active network communication paths that carry communications among first and second resources and the additional resources not yet impacted; and
      
      the calculated conditional probabilities applied to detected anomalous instance data.
  - 4. The method of claim 1, further including preparing and forwarding for viewing visual representation data, wherein:
    - the visual representation data graphically depict the first and second resources impacted by the chain of cascading failure relationships, arranging the first and second resources along a timeline and showing how anomalous performances spread in time among the impacted first and second resources; and
      
      the visual representation data further include replay controls that allow a user to filter by beginning and ending time the depiction of the chain of cascading failure relationships along the timeline.
  - 5. The method of claim 1, further including grouping connected node pairs and patterns of anomalous instance data at the connected node pairs for calculating the conditional probabilities.
  - 6. The method of claim 1, further including:
    - processing data that equates groups of connected node pairs as having similar operating relationships; and
      
      calculating the conditional probabilities for the groups of connected node pairs.
  - 7. The method of claim 1, wherein the time-stamped anomalous instance data identify at least start times of anomalous performances of the first and second resources that are within a predetermined time period, further including automatically representing anomalous performance of the second resource as a cascading failure resulting from the anomalous performance of the first resource.
  - 8. The method of claim 1, wherein the time-stamped anomalous instance data identify at least end times of anomalous performances of the first and second resources that are within a predetermined time period, further including automatically representing anomalous performance of the second resource as a cascading failure resulting from the anomalous performance of the first resource.
  - 9. The method of claim 1, further including calculating cascading failure relationships based at least in part on historical frequency of anomalous performance of the second resources given prior anomalous performance of the first resources.
  - 10. The method of claim 1, further including presenting groups of connected node pairs and calculated cascading failure relationships for expert human ratification or rejection.
  - 11. The method of claim 1, further including:
    - receiving human feedback on cascading failure relationships calculated for a first network communication path; and
      
      using the received human feedback for the first network communication path to calculate cascading failure relationships for other network communication paths.
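
The conditional-probability step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function name `cascading_pairs`, the `window` parameter, and the choice to estimate the probability as a simple historical frequency (in the spirit of claim 9) are all assumptions made for illustration.

```python
def cascading_pairs(edges, anomalies, window, threshold):
    """Estimate p(anomalous second | anomalous first) for connected node pairs.

    edges:     iterable of (first, second) resource pairs (hypothetical input
               standing in for the map of active communication paths)
    anomalies: dict mapping resource -> list of anomaly start timestamps
    window:    assumed time span in which a later anomaly counts as "following"
    threshold: predetermined probability threshold

    Returns a dict of (first, second) -> estimated conditional probability,
    keeping only pairs whose estimate exceeds the threshold.
    """
    result = {}
    for first, second in edges:
        first_starts = anomalies.get(first, [])
        if not first_starts:
            continue  # no prior anomalies, so nothing to condition on
        second_starts = anomalies.get(second, [])
        # Count first-resource anomalies followed by a second-resource
        # anomaly strictly later and within the window.
        followed = sum(
            1
            for t in first_starts
            if any(t < s <= t + window for s in second_starts)
        )
        probability = followed / len(first_starts)
        if probability > threshold:
            result[(first, second)] = probability
    return result


# Illustrative data only; resource names and timestamps are made up.
example_edges = [("db", "api"), ("api", "web")]
example_anomalies = {
    "db": [0, 100, 200, 300],
    "api": [5, 105, 250, 305],
    "web": [400],
}
example_result = cascading_pairs(
    example_edges, example_anomalies, window=10, threshold=0.5
)
```

With this made-up data, three of the four `db` anomalies are followed by an `api` anomaly within the window, so only the `db` to `api` pair would be represented as a cascading failure relationship.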

12. A method of illustrating to a network administrator causes of system failure, the method including:
- generating for display a cluster of operation anomalies that are interrelated as cascading failures in an anomaly impact graph, including:
  
  depicting anomalous instance data in the cluster as nodes in a plot;
  
  representing active network communication paths that carry communications among first and second resources subject to anomalous performances as edges between the nodes, thereby forming connected node pairs; and
  
  depicting at least part of the plot to show a progression over time of the cascading failures for the connected node pairs and to identify one or more root causes of the cascading failures.
  - 13. The method of claim 12, wherein the cluster of operation anomalies includes nodes that are proximate in time and connected by edges that represent cascading failure result links.
  - 14. The method of claim 12, wherein the anomalous instance data identifies when the anomalous performances began and when the anomalous performance ended.
  - 15. The method of claim 12, further including providing a time-lapsed view of the cascading failures that depicts how anomalous performance spread in time among the first and second resources.
  - 16. The method of claim 12, further including a portal with a dashboard that reports a plurality of clusters of operation anomalies with rating of urgency of a cluster and indication of at least a magnitude of anomaly count for anomalies in the cluster.
  - 17. The method of claim 16, wherein the portal includes a slider control that provides a drill-down access to anomalies in the cluster.

18. A method of illustrating to a network administrator causes of system failure, the method including:
- generating for display an anomaly impact graph interface that depicts a cluster of operation anomalies that are interrelated as cascading failures, including:
  
  nodes in a diagram that represent anomalous instance data for different resources in the cluster;
  
  edges between the nodes that represent active network communication path data for communications among first and second resources, wherein the edges and nodes form connected node pairs; and
  
  arrangement of the diagram that shows progression over time of cascading failure result links between anomalous performances of the first and second resources occurring within a predetermined time period.
  - 19. The method of claim 18, further including providing a time-lapsed view of the cascading failure result links that depicts how anomalous performances spread in time among the first and second resources.
  - 20. The method of claim 18, further including a portal with a dashboard that reports a plurality of clusters of operation anomalies with rating of urgency of a cluster and indication of at least a magnitude of anomaly count for anomalies in the cluster.
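
The arrangement recited in claims 12 and 18 (nodes ordered to show progression over time, with root causes identified) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed interface: `layout_impact_graph` is a hypothetical name, and treating a root cause as any node with no incoming cascading-failure edge is an assumption.

```python
def layout_impact_graph(start_times, cascade_edges):
    """Order anomalous resources along a timeline and pick candidate root causes.

    start_times:   dict mapping resource -> anomaly start timestamp
    cascade_edges: list of (cause, effect) pairs, assumed to be the
                   cascading failure result links within one cluster

    Returns (timeline, roots): resources sorted by start time (ties broken
    by name), and the resources that are never the effect of another one.
    """
    timeline = sorted(start_times, key=lambda r: (start_times[r], r))
    effects = {effect for _, effect in cascade_edges}
    roots = [r for r in timeline if r not in effects]
    return timeline, roots


# Illustrative cluster only; names and timestamps are made up.
example_timeline, example_roots = layout_impact_graph(
    {"web": 20, "db": 0, "api": 10, "cache": 12},
    [("db", "api"), ("api", "web"), ("db", "cache")],
)
```

A time-lapsed view, as in claims 15 and 19, could then step through `example_timeline` in order, though the rendering itself is outside the scope of this sketch.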


Current Assignee: Lightbend, Inc.
Original Assignee: OpsClarity, Inc. (Lightbend, Inc.)
Inventors: Sasturkar, Amit; Ngai, Alan
Primary Examiner(s): Manoskey, Joseph D
Application Number: US14/276,826
Publication Number: US 20150033084A1
Time in Patent Office: 994 Days
Field of Search: 714/42, 714/43, 714/25, 714/26, 714/33, 714/37, 714/46, 714/47.1, 714/47.2, 714/47.3, 714/48, 714/57
US Class Current: 1/1
CPC Class Codes:
- G06F 11/0709   in a distributed system con...
- G06F 11/079   Root cause analysis, i.e. e...
- G06F 11/3006   where the computing system ...
- G06F 11/3409   for performance assessment
- G06F 16/24578   using ranking
- G06F 16/26   Visual data mining; Browsin...
- G06F 16/285   Clustering or classification
- H04L 41/0631   using root cause analysis; ...
- H04L 41/064   involving time analysis
- H04L 41/065   involving logical or physic...
- H04L 41/145   involving simulating, desig...
- H04L 41/22   comprising specially adapte...
- H04L 43/04   Processing captured monitor...
