System, method, and computer program for highly available and scalable application monitoring
1 Assignment
0 Petitions
Abstract
A system, method, and computer program product are provided for highly available and scalable application monitoring. In operation, a monitoring system receives a plurality of metrics from a plurality of reporting agents associated with a system being monitored. The system being monitored includes a plurality of heterogeneous components, each associated with at least one of the reporting agents, and the monitoring system uses the metrics to monitor the overall health of that system. Based on one or more of the metrics, the monitoring system determines to dispatch one or more alerts, together with metrics and aggregated metrics, to one or more downstream systems. The monitoring system then dispatches the alerts to those downstream systems through one or more points of access to a plurality of downstream systems that includes them. Each reporting agent automatically connects to one of a plurality of collector servers to communicate its metrics. In addition, the monitoring system allows additional reporting agents to be added automatically, and if the collector server to which a reporting agent is connected fails, that agent automatically reconnects to another one of the collector servers.
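The failover behavior described above, in which a reporting agent reconnects to another collector server when its current one fails, might be sketched as follows. This is a minimal illustration, not the patented implementation: the `ReportingAgent` class, the injected `connect` callable, the `CollectorUnavailable` exception and the try-each-collector-once retry order are all hypothetical, and real network I/O is replaced by a function the caller supplies.

```python
from typing import Callable, Optional, Sequence


class CollectorUnavailable(Exception):
    """Raised by the (hypothetical) connect function when a collector is unreachable."""


class ReportingAgent:
    """Sketch of an agent that fails over between collector servers.

    The connect callable and the round-robin retry order are assumptions,
    not details taken from the patent.
    """

    def __init__(self, agent_id: str, collectors: Sequence[str],
                 connect: Callable[[str], None]) -> None:
        if not collectors:
            raise ValueError("at least one collector is required")
        self.agent_id = agent_id
        self.collectors = list(collectors)
        self.connect = connect
        self.current: Optional[str] = None
        self._next_index = 0  # where the next connection attempt starts

    def ensure_connected(self) -> str:
        """Try each collector at most once, starting after the last one used."""
        for _ in range(len(self.collectors)):
            candidate = self.collectors[self._next_index]
            self._next_index = (self._next_index + 1) % len(self.collectors)
            try:
                self.connect(candidate)
            except CollectorUnavailable:
                continue  # this collector is down; try the next one
            self.current = candidate
            return candidate
        raise CollectorUnavailable("no collector server reachable")

    def on_collector_failure(self) -> str:
        """Drop the failed connection and reconnect to another collector."""
        self.current = None
        return self.ensure_connected()
```

In a test, `connect` can be a stub that raises `CollectorUnavailable` for collectors marked as down. A production agent would also need backoff between attempts, which this sketch omits.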
23 Citations
14 Claims
1. A method, comprising:
assigning, by a monitoring system having a plurality of collector servers, each reporting agent of a plurality of reporting agents to a corresponding collector server of the plurality of collector servers, wherein each reporting agent of the plurality of reporting agents is embedded in a corresponding one of a plurality of applications of an application system being monitored for overall system health and performance, wherein assigning each reporting agent of the plurality of reporting agents to the corresponding collector server of the plurality of collector servers includes:
grouping the reporting agents in the plurality of reporting agents into a plurality of responsibility groups, and
assigning to each collector server of the plurality of collector servers a sequence number corresponding to one of the responsibility groups in the plurality of responsibility groups,
wherein each collector server creates an in-memory list of the reporting agents included in the one of the responsibility groups corresponding to the sequence number assigned to the collector server, and wherein the in-memory list is updated each time one of the reporting agents in the list disconnects from the collector server and each time a new reporting agent connects to the collector server, and
wherein the monitoring system dynamically splits a workload among the plurality of collector servers by reassigning each reporting agent of the plurality of reporting agents to a new corresponding collector server of the plurality of collector servers when a new collector server is added to the plurality of collector servers and when an existing collector server of the plurality of collector servers is removed from the plurality of collector servers;
receiving, by the plurality of collector servers of the monitoring system, a plurality of metrics from the plurality of reporting agents, the plurality of metrics including health state and performance data for the plurality of applications;
aggregating, by the monitoring system, the plurality of metrics in a shared memory accessible to the plurality of collector servers;
applying, by the monitoring system, one or more rules to the plurality of metrics;
based on the applying of the one or more rules to the plurality of metrics, determining, by the monitoring system, to dispatch one or more alerts and the plurality of metrics to one or more downstream systems of the monitoring system;
dispatching, by the monitoring system, the one or more alerts and the plurality of metrics to the one or more downstream systems utilizing one or more points of access to the one or more downstream systems; and
consuming, by the one or more downstream systems of the monitoring system, the plurality of metrics for taking automatic action including scaling of the plurality of applications.
Dependent claims: 2, 3, 4, 5, 6, 7
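One way to realize the responsibility-group assignment recited in claim 1 is sketched below, under stated assumptions. The claim does not say how agents are grouped. This sketch assumes that an agent's group is a stable CRC-32 hash of its identifier modulo the number of collector servers, so the collector with sequence number k serves group k. `CollectorServer`, `Coordinator`, `responsibility_group` and their methods are hypothetical names.

```python
import zlib
from typing import List, Set


def responsibility_group(agent_id: str, num_groups: int) -> int:
    """Map an agent to a responsibility group (assumed: CRC-32 of its ID mod N)."""
    return zlib.crc32(agent_id.encode("utf-8")) % num_groups


class CollectorServer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.sequence_number = -1          # assigned by the coordinator
        self.agents: Set[str] = set()      # in-memory list of connected agents

    def on_connect(self, agent_id: str) -> None:
        self.agents.add(agent_id)

    def on_disconnect(self, agent_id: str) -> None:
        self.agents.discard(agent_id)


class Coordinator:
    """Hypothetical component that assigns sequence numbers and rebalances agents."""

    def __init__(self) -> None:
        self.collectors: List[CollectorServer] = []
        self.agent_ids: Set[str] = set()

    def collector_for(self, agent_id: str) -> CollectorServer:
        # The collector whose sequence number (list index) equals the agent's group.
        return self.collectors[responsibility_group(agent_id, len(self.collectors))]

    def add_agent(self, agent_id: str) -> None:
        self.agent_ids.add(agent_id)
        if self.collectors:
            self.collector_for(agent_id).on_connect(agent_id)

    def add_collector(self, collector: CollectorServer) -> None:
        self.collectors.append(collector)
        self._rebalance()

    def remove_collector(self, name: str) -> None:
        self.collectors = [c for c in self.collectors if c.name != name]
        if self.collectors:  # with no collectors left, agents wait unassigned
            self._rebalance()

    def _rebalance(self) -> None:
        # Sequence numbers 0..N-1, one per responsibility group.
        for seq, collector in enumerate(self.collectors):
            collector.sequence_number = seq
        # Reassign every agent to the collector that now owns its group.
        for agent_id in self.agent_ids:
            target = self.collector_for(agent_id)
            for collector in self.collectors:
                if collector is not target:
                    collector.on_disconnect(agent_id)
            target.on_connect(agent_id)
```

The group is computed modulo the number of collectors, so adding or removing a collector can move most agents. That matches the claim's wording of reassigning each reporting agent. A consistent-hashing ring would move fewer agents, but it would depart from the one-group-per-sequence-number scheme.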
8. A computer program product embodied on a non-transitory computer-readable medium, comprising computer code for:
assigning, by a monitoring system having a plurality of collector servers, each reporting agent of a plurality of reporting agents to a corresponding collector server of the plurality of collector servers, wherein each reporting agent of the plurality of reporting agents is embedded in a corresponding one of a plurality of applications of an application system being monitored for overall system health and performance, wherein assigning each reporting agent of the plurality of reporting agents to the corresponding collector server of the plurality of collector servers includes:
grouping the reporting agents in the plurality of reporting agents into a plurality of responsibility groups, and
assigning to each collector server of the plurality of collector servers a sequence number corresponding to one of the responsibility groups in the plurality of responsibility groups,
wherein each collector server creates an in-memory list of the reporting agents included in the one of the responsibility groups corresponding to the sequence number assigned to the collector server, and wherein the in-memory list is updated each time one of the reporting agents in the list disconnects from the collector server and each time a new reporting agent connects to the collector server, and
wherein the monitoring system dynamically splits a workload among the plurality of collector servers by reassigning each reporting agent of the plurality of reporting agents to a new corresponding collector server of the plurality of collector servers when a new collector server is added to the plurality of collector servers and when an existing collector server of the plurality of collector servers is removed from the plurality of collector servers;
receiving, by the plurality of collector servers of the monitoring system, a plurality of metrics from the plurality of reporting agents, the plurality of metrics including health state and performance data for the plurality of applications;
aggregating, by the monitoring system, the plurality of metrics in a shared memory accessible to the plurality of collector servers;
applying, by the monitoring system, one or more rules to the plurality of metrics;
based on the applying of the one or more rules to the plurality of metrics, determining, by the monitoring system, to dispatch one or more alerts and the plurality of metrics to one or more downstream systems of the monitoring system;
dispatching, by the monitoring system, the one or more alerts and the plurality of metrics to the one or more downstream systems utilizing one or more points of access to the one or more downstream systems; and
consuming, by the one or more downstream systems of the monitoring system, the plurality of metrics for taking automatic action including scaling of the plurality of applications.
Dependent claims: 9, 10, 11, 12, 13
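The aggregation, rule-application and dispatch steps recited in claims 1, 8 and 14 are illustrated in the sketch below. It relies on several assumptions that the claims leave open:

- The "shared memory" is modeled as a lock-protected in-process dictionary. A real deployment might use an OS shared-memory segment or an external cache instead.
- Aggregation is a simple per-application average.
- Rules are upper thresholds.
- Alerts are dispatched together with the aggregated metrics only when at least one rule fires.
- Each downstream "point of access" is a plain callable.

`Metric`, `SharedMetricStore`, `ThresholdRule` and `evaluate_and_dispatch` are hypothetical names.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, str]  # (application, metric name)
Sink = Callable[[List[str], Dict[Key, float]], None]  # a downstream point of access


@dataclass(frozen=True)
class Metric:
    app: str
    name: str
    value: float


class SharedMetricStore:
    """In-process stand-in for memory shared by all collector servers (assumption)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[Key, List[float]] = defaultdict(list)

    def add(self, metric: Metric) -> None:
        # Collectors may write concurrently, so guard the dict with a lock.
        with self._lock:
            self._values[(metric.app, metric.name)].append(metric.value)

    def averages(self) -> Dict[Key, float]:
        with self._lock:
            return {key: sum(vals) / len(vals) for key, vals in self._values.items()}


@dataclass(frozen=True)
class ThresholdRule:
    metric_name: str
    limit: float

    def violated(self, value: float) -> bool:
        return value > self.limit


def evaluate_and_dispatch(store: SharedMetricStore,
                          rules: Sequence[ThresholdRule],
                          downstream: Sequence[Sink]) -> List[str]:
    """Apply every rule to the aggregated metrics; send alerts plus metrics downstream."""
    aggregated = store.averages()
    alerts = [
        f"{app}: {name} avg {value:.1f} > {rule.limit}"
        for (app, name), value in sorted(aggregated.items())
        for rule in rules
        if rule.metric_name == name and rule.violated(value)
    ]
    if alerts:  # dispatch only when some rule fired (one reading of the claim)
        for send in downstream:
            send(alerts, aggregated)
    return alerts
```

Keeping rule evaluation on aggregated values, rather than on each raw sample, is one way to avoid alerting on a single outlier. It is a design choice of this sketch, not something the claims require.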
14. A monitoring system comprising one or more processors operable for:
assigning, by the monitoring system having a plurality of collector servers, each reporting agent of a plurality of reporting agents to a corresponding collector server of the plurality of collector servers, wherein each reporting agent of the plurality of reporting agents is embedded in a corresponding one of a plurality of applications of an application system being monitored for overall system health and performance, wherein assigning each reporting agent of the plurality of reporting agents to the corresponding collector server of the plurality of collector servers includes:
grouping the reporting agents in the plurality of reporting agents into a plurality of responsibility groups, and
assigning to each collector server of the plurality of collector servers a sequence number corresponding to one of the responsibility groups in the plurality of responsibility groups,
wherein each collector server creates an in-memory list of the reporting agents included in the one of the responsibility groups corresponding to the sequence number assigned to the collector server, and wherein the in-memory list is updated each time one of the reporting agents in the list disconnects from the collector server and each time a new reporting agent connects to the collector server, and
wherein the monitoring system dynamically splits a workload among the plurality of collector servers by reassigning each reporting agent of the plurality of reporting agents to a new corresponding collector server of the plurality of collector servers when a new collector server is added to the plurality of collector servers and when an existing collector server of the plurality of collector servers is removed from the plurality of collector servers;
receiving, by the plurality of collector servers of the monitoring system, a plurality of metrics from the plurality of reporting agents, the plurality of metrics including health state and performance data for the plurality of applications;
aggregating, by the monitoring system, the plurality of metrics in a shared memory accessible to the plurality of collector servers;
applying, by the monitoring system, one or more rules to the plurality of metrics;
based on the applying of the one or more rules to the plurality of metrics, determining, by the monitoring system, to dispatch one or more alerts and the plurality of metrics to one or more downstream systems of the monitoring system;
dispatching, by the monitoring system, the one or more alerts and the plurality of metrics to the one or more downstream systems utilizing one or more points of access to the one or more downstream systems; and
consuming, by the one or more downstream systems of the monitoring system, the plurality of metrics for taking automatic action including scaling of the plurality of applications.
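The final step of the claims has a downstream system consume the dispatched metrics to scale the monitored applications automatically. That step could look roughly like the sketch below.

The patent does not specify any scaling formula. As an assumption, this sketch borrows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × observed / target), with no change when the ratio is within a tolerance of 1.0. It also assumes the metric is average CPU percent per application. `desired_replicas` and `AutoScaler` are hypothetical names.

```python
import math
from typing import Dict


def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling: ceil(current * observed / target), clamped to bounds."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    ratio = observed / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: avoid flapping
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


class AutoScaler:
    """Hypothetical downstream consumer that resizes applications from metrics."""

    def __init__(self, target_cpu: float, replicas: Dict[str, int]) -> None:
        self.target_cpu = target_cpu
        self.replicas = dict(replicas)  # current replica count per application

    def consume(self, avg_cpu_by_app: Dict[str, float]) -> Dict[str, int]:
        """Return {app: new replica count} for every application that was resized."""
        changes: Dict[str, int] = {}
        for app, cpu in avg_cpu_by_app.items():
            current = self.replicas.get(app, 1)  # unknown apps assumed to run 1 replica
            new = desired_replicas(current, cpu, self.target_cpu)
            if new != current:
                self.replicas[app] = new
                changes[app] = new
        return changes
```

For example, with a 60% CPU target and four replicas each, an application averaging 90% would be scaled to six replicas and one averaging 30% to two. The tolerance band keeps small fluctuations from causing constant resizing.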
Specification