MDL-based clustering for application dependency mapping
First Claim
1. A method comprising:
capturing network flow data using sensors executing on servers of a data center network and sensors executing on networking devices connected to the servers;
determining a graph including nodes representing the servers and observed edges and unobserved edges representing the network flow data in which the observed edges between any pair of nodes of the graph indicate there are one or more observed flows between a pair of servers represented by that pair of nodes and in which the unobserved edges between any pair of nodes of the graph indicate there are no observed flows between a pair of servers represented by that pair of nodes;
determining different clusterings of the nodes of the graph in which each clustering includes clusters of one or more nodes of the graph;
determining a minimum description length (MDL) score for each clustering in which the MDL score aggregates a description length of each cluster of the clustering and in which the description length of each cluster is computed based on a minimum value between a number of the observed edges of the graph from the cluster to each other cluster of the clustering and a number of the unobserved edges of the graph from the cluster to each other cluster of the clustering;
identifying a first clustering having a minimum value for the MDL score among the different clusterings; and
generating an application dependency map including representations of applications executing in the data center network and representations of application dependencies in which the representations of applications each correspond to one of the clusters of the first clustering and the representations of application dependencies each correspond to one of the observed edges of the first clustering.
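The claim states only that each cluster's description length is computed *based on* the smaller of two counts: observed edges and unobserved edges from that cluster to the other clusters. It does not give the full encoding. The Python sketch below is one hypothetical way to make that concrete, and every name in it (`block_counts`, `cluster_description_length`, `mdl_score`, `best_clustering`, `dependency_map`) is illustrative. Several assumptions are not in the claim. The per-cluster cost also charges a hypothetical model term: log2(k) bits per member node plus one flag bit per block. Each exception is charged log2 of the block size. The block from a cluster to itself is counted as well, because otherwise a single all-encompassing cluster would have zero data cost. The caller supplies the candidate clusterings.

```python
from math import log2

Edge = tuple[str, str]
Clustering = list[frozenset[str]]


def block_counts(edges: set[Edge], src: frozenset[str], dst: frozenset[str]) -> tuple[int, int]:
    """Return (observed, unobserved) directed edges from cluster src to cluster dst.

    Self-pairs (u, u) are not counted as possible edges."""
    possible = [(u, v) for u in src for v in dst if u != v]
    observed = sum(1 for pair in possible if pair in edges)
    return observed, len(possible) - observed


def cluster_description_length(edges: set[Edge], cluster: frozenset[str],
                               clustering: Clustering) -> float:
    k = len(clustering)
    # Hypothetical model cost: log2(k) bits per member node to record its
    # cluster, plus one bit per block in this cluster's row to flag the block
    # as "mostly observed" or "mostly unobserved".
    bits = len(cluster) * log2(k) + k
    # Assumption: the block from the cluster to itself is included.
    for other in clustering:
        observed, unobserved = block_counts(edges, cluster, other)
        # The term named in the claim: encode whichever outcome is rarer.
        exceptions = min(observed, unobserved)
        if exceptions:
            # Hypothetical cost: each exception is an index into the block.
            bits += exceptions * log2(observed + unobserved)
    return bits


def mdl_score(edges: set[Edge], clustering: Clustering) -> float:
    """Aggregate (sum) the description length of every cluster."""
    return sum(cluster_description_length(edges, c, clustering) for c in clustering)


def best_clustering(edges: set[Edge], candidates: list[Clustering]) -> Clustering:
    """Pick the candidate clustering with the minimum MDL score."""
    return min(candidates, key=lambda c: mdl_score(edges, c))


def dependency_map(edges: set[Edge], clustering: Clustering) -> list[tuple[int, int]]:
    """Applications are cluster indices; (i, j) is a dependency when at least
    one observed flow goes from cluster i to a different cluster j."""
    index = {node: i for i, cluster in enumerate(clustering) for node in cluster}
    return sorted({(index[u], index[v]) for u, v in edges if index[u] != index[v]})
```

Consider three web servers that each send flows to two database servers. Under these assumptions, the two-tier clustering scores 9 bits. That beats both the single-cluster candidate (about 26.9 bits, because of 6 exceptions) and the all-singletons candidate (about 36.6 bits, because of model cost). The resulting map has one dependency, `(0, 1)`.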
1 Assignment
0 Petitions
Abstract
Application dependency mapping (ADM) can be automated in a network. The network can determine an optimum number of clusters for the network using the minimum description length (MDL) principle. The network can capture network and associated data using a sensor network that provides multiple perspectives and generate a graph therefrom. The nodes of the graph can include sources, destinations, and destination ports identified in the captured data, and the edges of the graph can include observed flows from the sources to the destinations at the destination ports. The network can determine different clusterings of the nodes of the graph, and each clustering can be evaluated according to an MDL score. The optimum number of clusters for the network may correspond to the number of clusters of the clustering associated with the minimum MDL score.
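The graph-building step can be sketched in Python as follows. Because the same flow may be reported both by the sensor on the source server and by sensors on the networking devices it crosses, the reports have to be merged into a single observed edge. This is a minimal sketch under stated assumptions. The `FlowRecord` fields and the helpers `build_graph` and `unobserved_edges` are hypothetical names, not a real sensor API. The sketch also follows claim 1 in keeping servers as nodes and records destination ports as annotations on each edge. The abstract's variant would instead promote destination ports to nodes.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """A hypothetical flow report; the field names are illustrative."""
    sensor: str    # which server or networking-device sensor reported it
    src: str
    dst: str
    dst_port: int


def build_graph(records: Iterable[FlowRecord]) -> tuple[set[str], dict[tuple[str, str], set[int]]]:
    """Merge reports from all sensors into (nodes, observed), where observed
    maps each (src, dst) pair with at least one flow to its destination ports."""
    nodes: set[str] = set()
    observed: defaultdict[tuple[str, str], set[int]] = defaultdict(set)
    for r in records:
        nodes.update((r.src, r.dst))
        # Keying on (src, dst) collapses duplicate reports of the same flow
        # seen from different sensor perspectives into one observed edge.
        observed[(r.src, r.dst)].add(r.dst_port)
    return nodes, dict(observed)


def unobserved_edges(nodes: set[str], observed: dict[tuple[str, str], set[int]]) -> set[tuple[str, str]]:
    """Every ordered pair of distinct nodes with no observed flow."""
    return {(u, v) for u in nodes for v in nodes if u != v and (u, v) not in observed}
```

For example, a `w1 -> d1` flow on port 5432 reported by both a host sensor and a top-of-rack switch sensor yields a single observed edge `("w1", "d1")` annotated with `{5432}`. Every other ordered pair of servers lands in the unobserved set.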
573 Citations
19 Claims
1. A method comprising:
capturing network flow data using sensors executing on servers of a data center network and sensors executing on networking devices connected to the servers;
determining a graph including nodes representing the servers and observed edges and unobserved edges representing the network flow data in which the observed edges between any pair of nodes of the graph indicate there are one or more observed flows between a pair of servers represented by that pair of nodes and in which the unobserved edges between any pair of nodes of the graph indicate there are no observed flows between a pair of servers represented by that pair of nodes;
determining different clusterings of the nodes of the graph in which each clustering includes clusters of one or more nodes of the graph;
determining a minimum description length (MDL) score for each clustering in which the MDL score aggregates a description length of each cluster of the clustering and in which the description length of each cluster is computed based on a minimum value between a number of the observed edges of the graph from the cluster to each other cluster of the clustering and a number of the unobserved edges of the graph from the cluster to each other cluster of the clustering;
identifying a first clustering having a minimum value for the MDL score among the different clusterings; and
generating an application dependency map including representations of applications executing in the data center network and representations of application dependencies in which the representations of applications each correspond to one of the clusters of the first clustering and the representations of application dependencies each correspond to one of the observed edges of the first clustering.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
one or more processors; and
memory including instructions that, upon being executed by the one or more processors, cause the system to:
capture network flow data using sensors executing on servers of a data center network and sensors executing on networking devices connected to the servers;
determine a graph including nodes representing the servers and observed edges and unobserved edges representing the network flow data in which the observed edges between any pair of nodes of the graph indicate there are one or more observed flows between a pair of servers represented by that pair of nodes and in which the unobserved edges between any pair of nodes of the graph indicate there are no observed flows between a pair of servers represented by that pair of nodes;
determine different clusterings of the nodes of the graph in which each clustering includes clusters of one or more nodes of the graph;
determine a minimum description length (MDL) score for each clustering of the different clusterings in which the MDL score aggregates a description length of each cluster of the clustering and in which the description length of each cluster is computed based on a minimum value between a number of the observed edges of the graph from the cluster to each other cluster of the clustering and a number of the unobserved edges of the graph from the cluster to each other cluster of the clustering;
identify a first clustering having a minimum value for the MDL score among the different clusterings; and
generate an application dependency map including representations of applications executing in the data center network and representations of application dependencies in which the representations of applications each correspond to one of the clusters of the first clustering and the representations of application dependencies each correspond to one of the observed edges of the first clustering.
Dependent claims: 13, 14, 15
16. A non-transitory computer-readable medium having computer readable instructions that, upon being executed by one or more processors, cause the one or more processors to:
capture network flow data using sensors executing on servers of a data center network and sensors executing on networking devices connected to the servers;
determine a graph including nodes representing the servers and observed edges and unobserved edges representing the network flow data in which the observed edges between any pair of nodes of the graph indicate there are one or more observed flows between a pair of servers represented by that pair of nodes and in which the unobserved edges between any pair of nodes of the graph indicate there are no observed flows between a pair of servers represented by that pair of nodes;
determine different clusterings of the nodes of the graph in which each clustering includes clusters of one or more nodes of the graph;
determine a minimum description length (MDL) score for each clustering of the different clusterings in which the MDL score sums a description length of each cluster of the clustering and in which the description length of each cluster is computed based on a minimum value between a number of the observed edges of the graph from the cluster to each other cluster of the clustering and a number of the unobserved edges of the graph from the cluster to each other cluster of the clustering;
identify a first clustering having a minimum value for the MDL score among the different clusterings; and
generate an application dependency map including representations of applications executing in the data center network and representations of application dependencies in which the representations of applications each correspond to one of the clusters of the first clustering and the representations of application dependencies each correspond to one of the observed edges of the first clustering.
Dependent claims: 17, 18, 19
Specification