System and method for graph based monitoring and management of distributed systems
First Claim
1. A method comprising:
generating, by one or more processors, for each instance of a plurality of instances of a moving window, first metrics indicative of at least one of central processing unit (CPU) utilization, memory utilization, or disk utilization by each of a plurality of servers of a distributed streaming system and second metrics indicative of at least one of throughput or latency of each of the plurality of servers of the distributed streaming system;
generating, by the one or more processors, a topology graph including a plurality of vertices representing the plurality of servers and a plurality of edges representing data flow among the plurality of servers;
generating, by the one or more processors, at least one first metrics graph corresponding to the first metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph, wherein each vertex of the first metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the first metrics graph is indicative of the first metrics of a first server represented by a first vertex of the pair of vertices of the first metrics graph being within a predetermined threshold of the first metrics of a second server represented by a second vertex of the pair of vertices of the first metrics graph;
generating, by the one or more processors, at least one second metrics graph corresponding to the second metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph wherein each vertex of the second metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the second metrics graph is indicative of the second metrics of a first server represented by a first vertex of the pair of vertices of the second metrics graph being within a predetermined threshold of the second metrics of a second server represented by a second vertex of the pair of vertices of the second metrics graph;
identifying, by the one or more processors, one or more differences between at least one of the first metrics graph at a first instance of the plurality of instances of the moving window and the first metrics graph at a second instance of the plurality of instances of the moving window or the second metrics graph at the first instance and the second metrics graph at the second instance; and
displaying, by the one or more processors, the one or more differences as indicative of a malfunction of the distributed streaming system.
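The graph-building and comparison steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each server reports a single scalar metric per window instance (for example, CPU utilization), that "based in part on the topology graph" means only server pairs joined by a data-flow edge are compared, and that a "difference" is an edge present at one window instance but not the other. The names `build_metrics_graph` and `diff_graphs`, the sample values, and the threshold are all hypothetical.

```python
def build_metrics_graph(metrics, topology_edges, threshold):
    """Return the edge set of a metrics graph for one window instance.

    metrics: dict mapping server name -> scalar metric value (assumed scalar).
    topology_edges: iterable of (server, server) data-flow pairs.
    An edge is kept when the two servers' values differ by at most `threshold`.
    """
    edges = set()
    for a, b in topology_edges:
        if abs(metrics[a] - metrics[b]) <= threshold:
            edges.add(frozenset((a, b)))  # undirected edge
    return edges


def diff_graphs(earlier, later):
    """Report edges that appeared or disappeared between two window instances."""
    return {"added": later - earlier, "removed": earlier - later}


# Hypothetical data: three servers in a chain a -> b -> c.
topology = [("a", "b"), ("b", "c")]
cpu_window_1 = {"a": 0.50, "b": 0.55, "c": 0.52}
cpu_window_2 = {"a": 0.50, "b": 0.95, "c": 0.52}  # server b spikes

g1 = build_metrics_graph(cpu_window_1, topology, threshold=0.10)
g2 = build_metrics_graph(cpu_window_2, topology, threshold=0.10)
changes = diff_graphs(g1, g2)

for edge in sorted(tuple(sorted(e)) for e in changes["removed"]):
    print("edge lost between", edge[0], "and", edge[1])
```

In this sketch, server b drifting away from its neighbors removes both of its edges, and those removed edges are the differences that would be displayed as indicative of a possible malfunction.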
Abstract
Systems, methods, and computer-readable media are disclosed for graph based monitoring and management of network components of a distributed streaming system. In one aspect, a method includes generating, by a processor, a first metrics and a second metrics based on data collected on a system; generating, by the processor, a topology graph representing data flow within the system; generating, by the processor, at least one first metrics graph corresponding to the first metrics based in part on the topology graph; generating, by the processor, at least one second metrics graph corresponding to the second metrics based in part on the topology graph; identifying, by the processor, a malfunction within the system based on a change in at least one of the first metrics graph and the second metrics graph; and sending, by the processor, a feedback on the malfunction to an operational management component of the system.
18 Claims
1. A method comprising:
generating, by one or more processors, for each instance of a plurality of instances of a moving window, first metrics indicative of at least one of central processing unit (CPU) utilization, memory utilization, or disk utilization by each of a plurality of servers of a distributed streaming system and second metrics indicative of at least one of throughput or latency of each of the plurality of servers of the distributed streaming system;
generating, by the one or more processors, a topology graph including a plurality of vertices representing the plurality of servers and a plurality of edges representing data flow among the plurality of servers;
generating, by the one or more processors, at least one first metrics graph corresponding to the first metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph, wherein each vertex of the first metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the first metrics graph is indicative of the first metrics of a first server represented by a first vertex of the pair of vertices of the first metrics graph being within a predetermined threshold of the first metrics of a second server represented by a second vertex of the pair of vertices of the first metrics graph;
generating, by the one or more processors, at least one second metrics graph corresponding to the second metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph wherein each vertex of the second metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the second metrics graph is indicative of the second metrics of a first server represented by a first vertex of the pair of vertices of the second metrics graph being within a predetermined threshold of the second metrics of a second server represented by a second vertex of the pair of vertices of the second metrics graph;
identifying, by the one or more processors, one or more differences between at least one of the first metrics graph at a first instance of the plurality of instances of the moving window and the first metrics graph at a second instance of the plurality of instances of the moving window or the second metrics graph at the first instance and the second metrics graph at the second instance; and
displaying, by the one or more processors, the one or more differences as indicative of a malfunction of the distributed streaming system.
Dependent Claims (2, 3, 4, 5, 6)
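The first step of claim 1, generating metrics for each instance of a moving window, might look like the following Python sketch. The claim does not say how raw samples are reduced to a per-window metric, so the per-server mean used here is an assumption, as are the function name `windowed_means`, the `(timestamp, server, value)` sample layout, and the window and step sizes.

```python
from collections import defaultdict


def windowed_means(samples, window, step):
    """Compute a per-server mean for each instance of a moving window.

    samples: list of (timestamp, server, value) tuples (hypothetical layout).
    window: window length; step: how far the window advances per instance.
    Returns a list of (window_start, {server: mean_value}) pairs.
    """
    if not samples:
        return []
    t_min = min(t for t, _, _ in samples)
    t_max = max(t for t, _, _ in samples)
    instances = []
    start = t_min
    while start <= t_max:
        sums = defaultdict(float)
        counts = defaultdict(int)
        for t, server, value in samples:
            if start <= t < start + window:  # sample falls in this instance
                sums[server] += value
                counts[server] += 1
        instances.append((start, {s: sums[s] / counts[s] for s in sums}))
        start += step
    return instances


# Hypothetical latency samples (milliseconds) for two servers.
samples = [
    (0, "a", 10.0), (1, "a", 20.0), (2, "a", 30.0),
    (0, "b", 5.0), (2, "b", 15.0),
]
result = windowed_means(samples, window=2, step=1)
for start, means in result:
    print(start, dict(sorted(means.items())))
```

Each returned dictionary could then serve as the per-server metric values from which a metrics graph for that window instance is built.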
7. A system comprising:
a memory having computer-readable instructions stored therein; and
one or more processors configured to execute the computer-readable instructions to perform operations including;
generating, for each instance of a plurality of instances of a moving window, system metrics indicative of at least one of central processing unit (CPU) utilization, memory utilization, or disk utilization by each of a plurality of servers of a distributed streaming system and performance metrics indicative of at least one of throughput or latency of each of the plurality of servers of the distributed streaming system;
generating a topology graph including a plurality of vertices representing the plurality of servers and a plurality of edges representing data flow among the plurality of servers;
generating at least one system metrics graph corresponding to the system metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph, wherein each vertex of the system metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the system metrics graph is indicative of the system metrics of a first server represented by a first vertex of the pair of vertices of the system metrics graph being within a predetermined threshold of the system metrics of a second server represented by a second vertex of the pair of vertices of the system metrics graph;
generating at least one performance metrics graph corresponding to the performance metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph, wherein each vertex of the performance metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the performance metrics graph is indicative of the performance metrics of a first server represented by a first vertex of the pair of vertices of the performance metrics graph being within a predetermined threshold of the performance metrics of a second server represented by a second vertex of the pair of vertices of the performance metrics graph;
identifying one or more differences between at least one of the system metrics graph at a first instance of the plurality of instances of the moving window and the system metrics graph at a second instance of the plurality of instances of the moving window or the performance metrics graph at the first instance and the performance metrics graph at the second instance; and
displaying the one or more differences as indicative of a malfunction of the distributed streaming system.
Dependent Claims (8, 9, 10, 11, 12)
13. A non-transitory computer-readable medium having computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
generating, for each instance of a plurality of instances of a moving window, system metrics indicative of at least one of central processing unit (CPU) utilization, memory utilization, or disk utilization by each of a plurality of servers of a distributed streaming system and performance metrics indicative of at least one of throughput or latency of each of the plurality of servers of the distributed streaming system;
generating a topology graph including a plurality of vertices representing the plurality of servers and a plurality of edges representing data flow among the plurality of servers;
generating at least one system metrics graph corresponding to the system metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph, wherein each vertex of the system metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the system metrics graph is indicative of the system metrics of a first server represented by a first vertex of the pair of vertices of the system metrics graph being within a predetermined threshold of the system metrics of a second server represented by a second vertex of the pair of vertices of the system metrics graph;
generating at least one performance metrics graph corresponding to the performance metrics for each server of the plurality of servers and for each instance of the plurality of instances of the moving window based in part on the topology graph, wherein each vertex of the performance metrics graph represents one of the servers of the plurality of servers and each edge between each pair of vertices of the performance metrics graph is indicative of the performance metrics of a first server represented by a first vertex of the pair of vertices of the performance metrics graph being within a predetermined threshold of the performance metrics of a second server represented by a second vertex of the pair of vertices of the performance metrics graph;
identifying one or more differences between at least one of the system metrics graph at a first instance of the plurality of instances of the moving window and the system metrics graph at a second instance of the plurality of instances of the moving window or the performance metrics graph at the first instance and the performance metrics graph at the second instance; and
displaying the one or more differences as indicative of a malfunction of the distributed streaming system.
Dependent Claims (14, 15, 16, 17, 18)
Specification