Automatic Root Cause Diagnosis in Networks
First Claim
1. A computer-implemented method comprising:
obtaining a set of data records, wherein the data records include respective pluralities of tuples characterizing operation of communication sessions in a network, wherein the tuples contain signatures representing features and values, wherein the features and values identify hardware or software components related to the network that were involved in the communication sessions;
generating binary labels for the data records, wherein the binary labels respectively indicate whether the communication sessions associated with the data records were successful or failed;
determining degrees to which signatures in the pluralities of tuples are associated with communication problems in the network, wherein, for a particular signature, a degree is based on linear combinations of:
(i) a proportion of the data records not including the particular signature, and (ii) a proportion of the data records labelled as failed that do not include the particular signature;
identifying, from the degrees, a subset of the signatures most associated with the communication problems;
grouping specific pairs from the subset of the signatures into equivalence classes based on co-occurrence of signatures of the specific pairs within the data records;
generating a dependency graph between the equivalence classes in which the equivalence classes are represented as nodes in the dependency graph and edges are placed between a parent equivalence class and a child equivalence class where the data records in the child equivalence class are approximately a subset of the data records in the parent equivalence class;
based on the signatures and the binary labels, determining relative failure ratios of each of the child equivalence classes with respect to their parent equivalence classes;
removing parent or child equivalence classes from the dependency graph where all of the relative failure ratios thereof are less than a pre-determined threshold; and
from the equivalence classes remaining in the dependency graph, selecting a subset of the hardware or software components related to the network that are candidates for involvement with the communication problems.
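The scoring and grouping steps of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. Each data record is modeled as a set of (feature, value) signatures plus a failed flag. The claim does not give the weights of the linear combination, so `w_absent` and `w_failed_absent` are hypothetical; with the defaults 1 and -1 the degree reduces to P(s | failed) - P(s), i.e. how over-represented a signature is among failed sessions. Part (ii) is read as the share of failed records lacking the signature. "Most associated" is assumed to mean the top `k` degrees, and two signatures are assumed to share an equivalence class when the Jaccard similarity of the record sets they occur in reaches a hypothetical `min_jaccard`. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Signature = Tuple[str, str]                  # (feature, value), e.g. ("cell", "A")
Record = Tuple[FrozenSet[Signature], bool]   # (signatures in the session, failed?)


def signature_degrees(records: Sequence[Record],
                      w_absent: float = 1.0,
                      w_failed_absent: float = -1.0) -> Dict[Signature, float]:
    """Degree of each signature as a linear combination of (i) the share of
    all records without it and (ii) the share of failed records without it."""
    n = len(records)
    n_failed = sum(1 for _, failed in records if failed)
    count_all: Dict[Signature, int] = defaultdict(int)
    count_failed: Dict[Signature, int] = defaultdict(int)
    for sigs, failed in records:
        for s in sigs:
            count_all[s] += 1
            if failed:
                count_failed[s] += 1
    degrees = {}
    for s, c in count_all.items():
        p_absent = 1 - c / n
        # Assumption: (ii) is measured among the failed records only.
        p_failed_absent = 1 - count_failed[s] / n_failed if n_failed else 1.0
        degrees[s] = w_absent * p_absent + w_failed_absent * p_failed_absent
    return degrees


def top_signatures(degrees: Dict[Signature, float], k: int) -> List[Signature]:
    """Hypothetical selection rule: keep the k highest degrees (ties by name)."""
    return sorted(degrees, key=lambda s: (-degrees[s], s))[:k]


def equivalence_classes(records: Sequence[Record], subset: List[Signature],
                        min_jaccard: float = 1.0) -> List[FrozenSet[Signature]]:
    """Merge signatures whose record sets have Jaccard similarity >= min_jaccard."""
    occurs = {s: {i for i, (sigs, _) in enumerate(records) if s in sigs}
              for s in subset}
    parent = {s: s for s in subset}

    def find(s: Signature) -> Signature:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for i, a in enumerate(subset):
        for b in subset[i + 1:]:
            union = occurs[a] | occurs[b]
            if union and len(occurs[a] & occurs[b]) / len(union) >= min_jaccard:
                parent[find(a)] = find(b)
    groups: Dict[Signature, List[Signature]] = defaultdict(list)
    for s in subset:
        groups[find(s)].append(s)
    return [frozenset(g) for g in groups.values()]
```

Union-find makes the grouping transitive, so pairwise co-occurrence links chain into larger classes; `min_jaccard=1.0` demands exact co-occurrence, while a lower value tolerates noise.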
Abstract
An embodiment may involve: (i) obtaining a set of data records that include respective pluralities of tuples characterizing operation of communication sessions in a network and that identify hardware or software components related to the network that were involved in the communication sessions; (ii) determining degrees to which signatures in the pluralities of tuples are associated with communication problems in the network; (iii) identifying, from the degrees, a subset of the signatures most associated with the communication problems; (iv) grouping specific pairs from the subset of the signatures into equivalence classes based on co-occurrence of signatures of the specific pairs within the data records; (v) generating and pruning a dependency graph between the equivalence classes; and (vi) from the equivalence classes remaining in the dependency graph, selecting a subset of the hardware or software components related to the network that are candidates for involvement with the communication problems.
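Step (v), building and pruning the dependency graph as claim 1 describes it, might look like the following sketch. The claim fixes neither the subset tolerance nor the ratio threshold, so `subset_tol` and `ratio_threshold` are hypothetical parameters. Further assumptions: a class's records are those containing all of its signatures, a class's failure ratio is its failure rate, an edge runs only from a larger parent to a strictly smaller child, and a class is dropped only when it takes part in at least one edge and every relative failure ratio on its edges is below the threshold. That last rule is one reading of the pruning step, not a confirmed one. Function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Signature = Tuple[str, str]
Record = Tuple[FrozenSet[Signature], bool]
EqClass = FrozenSet[Signature]
Edge = Tuple[EqClass, EqClass]               # (parent, child)


def class_records(records: Sequence[Record], cls: EqClass) -> Set[int]:
    # Assumption: a record belongs to a class if it contains all its signatures.
    return {i for i, (sigs, _) in enumerate(records) if cls <= sigs}


def failure_rate(records: Sequence[Record], idx: Set[int]) -> float:
    return sum(records[i][1] for i in idx) / len(idx) if idx else 0.0


def build_and_prune(records: Sequence[Record], classes: List[EqClass],
                    subset_tol: float = 0.95, ratio_threshold: float = 1.2
                    ) -> Tuple[List[EqClass], Dict[Edge, float]]:
    members = {c: class_records(records, c) for c in classes}
    rate = {c: failure_rate(records, members[c]) for c in classes}

    # Edge parent -> child when the (strictly smaller) child's records are
    # approximately a subset of the parent's; weight = relative failure ratio.
    edges: Dict[Edge, float] = {}
    for p in classes:
        for c in classes:
            rp, rc = members[p], members[c]
            if p == c or not rc or len(rc) >= len(rp):
                continue
            if len(rc & rp) / len(rc) >= subset_tol:
                edges[(p, c)] = rate[c] / rate[p] if rate[p] else float("inf")

    # Drop a class when every ratio it takes part in is below the threshold;
    # classes with no edges are kept (an assumption).
    kept = []
    for node in classes:
        ratios = [r for (p, c), r in edges.items() if node in (p, c)]
        if ratios and all(r < ratio_threshold for r in ratios):
            continue
        kept.append(node)
    kept_set = set(kept)
    pruned = {e: r for e, r in edges.items() if e[0] in kept_set and e[1] in kept_set}
    return kept, pruned
```

A child whose failure rate merely matches (or undercuts) its parent's contributes nothing beyond the parent and falls out, while a child that fails far more often than its parent survives as a sharper candidate.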
20 Claims
11. A computer-implemented method comprising:
obtaining a set of data records, wherein the data records include respective pluralities of tuples characterizing operation of communication sessions in a network, wherein the tuples contain signatures representing features and values, wherein the features and values identify hardware or software components related to the network that were involved in the communication sessions;
determining a 2-signature tuple present in at least one of the data records, wherein the 2-signature tuple is composed of a first signature and a second signature;
calculating, for the 2-signature tuple, a first gain representing an overall relative inefficiency of the communication sessions involving the 2-signature tuple compared to relative inefficiencies of the communication sessions involving the first signature or the second signature;
determining that the first gain exceeds a first pre-determined threshold;
based on determining that the first gain exceeds the first pre-determined threshold, (i) filtering the communication sessions involving the 2-signature tuple to create a subset of the communication sessions involving 1-signatures for which a size of the subset exceeds a second pre-determined threshold, and (ii) calculating a second gain representing the overall relative inefficiency of the communication sessions involving the 2-signature tuple compared to relative inefficiencies of the communication sessions involving the 1-signatures for which the size of the subset exceeds the second pre-determined threshold;
determining that the second gain exceeds the first pre-determined threshold; and
based on determining that the second gain exceeds the first pre-determined threshold, identifying the features and values that are represented by the first signature and the second signature as units of the hardware or software components that are incompatible.
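The two-stage gain test of claim 11 can be sketched as follows. The claim does not define "relative inefficiency", so this sketch assumes it is the failure rate of the sessions containing the given signatures, and assumes each gain is the pair's failure rate divided by the worst failure rate among the comparison signatures. `t_gain` and `t_size` stand in for the first and second pre-determined thresholds, and their defaults are arbitrary. The second stage asks whether some other 1-signature that is common among the pair's sessions (the pair's own two signatures excluded) would explain the failures by itself. Function names are illustrative.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Signature = Tuple[str, str]
Record = Tuple[FrozenSet[Signature], bool]


def inefficiency(records: Sequence[Record], required: FrozenSet[Signature]) -> float:
    """Assumed measure: failure rate of sessions containing every signature in `required`."""
    hits = [failed for sigs, failed in records if required <= sigs]
    return sum(hits) / len(hits) if hits else 0.0


def gain(num: float, denoms: List[float]) -> float:
    """Pair inefficiency relative to the worst comparison inefficiency."""
    if num == 0:
        return 0.0
    worst = max(denoms, default=0.0)
    return num / worst if worst else float("inf")


def incompatible(records: Sequence[Record], s1: Signature, s2: Signature,
                 t_gain: float = 1.5, t_size: int = 1) -> bool:
    pair = frozenset({s1, s2})
    pair_sessions = [sigs for sigs, _ in records if pair <= sigs]
    if not pair_sessions:
        return False  # the 2-signature tuple must occur in at least one record
    pair_ineff = inefficiency(records, pair)

    # Stage 1: compare the pair against each of its own signatures.
    first = gain(pair_ineff, [inefficiency(records, frozenset({s})) for s in pair])
    if first <= t_gain:
        return False

    # Stage 2: keep other 1-signatures seen in more than t_size pair sessions
    # and check the pair is still worse than any of them on its own. If none
    # qualify, nothing else explains the failures and the test passes (assumption).
    counts = Counter(s for sigs in pair_sessions for s in sigs if s not in pair)
    frequent = [s for s, n in counts.items() if n > t_size]
    second = gain(pair_ineff, [inefficiency(records, frozenset({s})) for s in frequent])
    return second > t_gain
```

For example, if a handset model fails only when paired with one firmware version, the pair's failure rate stands well above either signature's rate and above that of any cell common to those sessions, so the pair is reported as incompatible.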
20. An article of manufacture including a non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:
obtaining a set of data records, wherein the data records include respective pluralities of tuples characterizing operation of communication sessions in a network, wherein the tuples contain signatures representing features and values, wherein the features and values identify hardware or software components related to the network that were involved in the communication sessions;
determining a 2-signature tuple present in at least one of the data records, wherein the 2-signature tuple is composed of a first signature and a second signature;
calculating, for the 2-signature tuple, a first gain representing an overall relative inefficiency of the communication sessions involving the 2-signature tuple compared to relative inefficiencies of the communication sessions involving the first signature or the second signature;
determining that the first gain exceeds a first pre-determined threshold;
based on determining that the first gain exceeds the first pre-determined threshold, (i) filtering the communication sessions involving the 2-signature tuple to create a subset of the communication sessions involving 1-signatures for which a size of the subset exceeds a second pre-determined threshold, and (ii) calculating a second gain representing the overall relative inefficiency of the communication sessions involving the 2-signature tuple compared to relative inefficiencies of the communication sessions involving the 1-signatures for which the size of the subset exceeds the second pre-determined threshold;
determining that the second gain exceeds the first pre-determined threshold; and
based on determining that the second gain exceeds the first pre-determined threshold, identifying the features and values that are represented by the first signature and the second signature as units of the hardware or software components that are incompatible.
Specification