Intelligent services for application dependency discovery, reporting, and management tool
First Claim
1. A computer-implemented method comprising:
configuring a monitoring application to monitor a first application and a plurality of dependencies of the first application using a plurality of monitoring interfaces;
detecting, by the monitoring application and based on the plurality of monitoring interfaces, that the first application has an unhealthy operating status;
collecting, by one or more data collecting agents and based on detecting that the first application has the unhealthy operating status, system state information corresponding to the first application and each of the plurality of dependencies;
storing the collected system state information in a database as a first incident record corresponding to a first incident event and comprising incident attribute information for the first application and each of the plurality of dependencies;
training a machine learning model based on a plurality of incident records including the first incident record, wherein training the machine learning model comprises:
clustering incident events corresponding to each of the plurality of incident records for the first application, wherein clustering the incident events is based on attributes of the system state information corresponding to each of the plurality of dependencies;
determining one or more patterns of performance based on the clustered incident events, wherein a first pattern of performance of the one or more patterns of performance indicates a potential correlation between a first attribute of the system state information corresponding to a first dependency and the first application having the unhealthy operating status; and
updating the machine learning model based on the determined patterns of performance;
detecting, by the monitoring application and based on the plurality of monitoring interfaces, a current operating status of the first application and the plurality of dependencies; and
generating, using the machine learning model and based on the first pattern of performance and the current operating status, a recommendation regarding operation of the first application or the first dependency.
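The claim does not name a clustering algorithm or say how a "pattern of performance" is derived. The sketch below is one possible reading, assuming plain k-means with deterministic farthest-first seeding. In this sketch each incident record is a dict of dependency attributes (names such as `auth_api.latency_ms` are hypothetical). Every attribute is divided by a healthy baseline value, and an attribute is flagged as a potential correlate of the unhealthy status when its cluster mean is at least `ratio` times the baseline. All function names and thresholds here are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean
import math


@dataclass
class Pattern:
    cluster: int       # index of the incident cluster
    attribute: str     # dependency attribute (hypothetical naming scheme)
    mean_ratio: float  # cluster mean relative to the healthy baseline


def _farthest_first(points, k):
    # Deterministic seeding: start at the first point, then repeatedly add
    # the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < min(k, len(points)):
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, iterations=20):
    """Plain k-means; returns a cluster label for each point."""
    centroids = _farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def patterns_of_performance(incidents, baseline, k=2, ratio=2.0):
    """Cluster incident records and flag attributes elevated versus baseline.

    incidents: list of dicts mapping attribute name -> value at incident time.
    baseline:  dict mapping the same attribute names -> healthy values (> 0).
    """
    keys = sorted(baseline)
    # Normalize each attribute by its healthy baseline so units do not dominate.
    points = [[rec[key] / baseline[key] for key in keys] for rec in incidents]
    labels = kmeans(points, k)
    patterns = []
    for c in sorted(set(labels)):
        members = [p for p, label in zip(points, labels) if label == c]
        for i, key in enumerate(keys):
            cluster_mean = mean(m[i] for m in members)
            if cluster_mean >= ratio:
                patterns.append(Pattern(c, key, cluster_mean))
    return patterns
```

Suppose one group of incidents shows roughly four times the baseline latency on an API dependency and another shows roughly four times the baseline database connections. The sketch then separates the two groups and reports one pattern for each. Normalizing by baseline is a design choice that keeps attributes measured in very different units comparable when computing distances.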
Abstract
Techniques for monitoring operating statuses of an application and its dependencies are provided. A monitoring application may collect and report the operating status of the monitored application and each dependency. Through use of existing monitoring interfaces, the monitoring application can collect operating status without requiring modification of the underlying monitored application or dependencies. The monitoring application may determine a problem service that is a root cause of an unhealthy state of the monitored application. Dependency analyzer and discovery crawler techniques may automatically configure and update the monitoring application. Machine learning techniques may be used to determine patterns of performance based on system state information associated with performance events and provide health reports relative to a baseline status of the monitored application. Also provided are techniques for testing a response of the monitored application through modifications to API calls. Such tests may be used to train the machine learning model.
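The abstract says the monitoring application may identify a "problem service" that is the root cause of the monitored application's unhealthy state. It does not say how. A minimal sketch is shown below, assuming a simple heuristic: starting from the monitored application, the walk follows only edges that lead to unhealthy dependencies. It reports each unhealthy service whose own dependencies are all healthy. The function name, the dict-of-lists dependency map, and the heuristic are assumptions. The sketch also assumes an acyclic dependency graph. In a cycle where every service is unhealthy, each one has an unhealthy dependency, so none is reported.

```python
def find_problem_services(dependencies, unhealthy, root):
    """Return the unhealthy services that are likely root causes for `root`.

    dependencies: dict mapping a service to the services it calls.
    unhealthy:    set of services whose monitoring interface reports unhealthy.
    A service is reported when it is unhealthy, reachable from `root` through a
    chain of unhealthy services, and none of its own dependencies is unhealthy.
    """
    if root not in unhealthy:
        return []
    problems, seen, stack = [], set(), [root]
    while stack:
        service = stack.pop()
        if service in seen:
            continue
        seen.add(service)
        bad_children = [d for d in dependencies.get(service, []) if d in unhealthy]
        if bad_children:
            stack.extend(bad_children)  # follow only the unhealthy edges
        else:
            problems.append(service)    # unhealthy, but its dependencies are healthy
    return sorted(problems)
```

As an example, suppose `web` calls `orders`, `orders` calls `payments`, and `payments` calls `bank_api`, and all four are unhealthy. The sketch then points at `bank_api` rather than at the application the user actually sees failing.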
41 Citations
20 Claims
1. A computer-implemented method comprising the steps recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
17. A system comprising:
a first application having a plurality of dependencies, wherein a first dependency of the plurality of dependencies comprises an Application Programming Interface (API) utilized by the first application;
a monitoring interface application providing a plurality of monitoring interfaces, wherein a first monitoring interface of the plurality of monitoring interfaces is configured to retrieve operating status information for the first application and a second monitoring interface of the plurality of monitoring interfaces is configured to retrieve operating status information for the first dependency;
a database configured to store a plurality of incident records associated with the first application; and
a monitoring device implementing a monitoring application and comprising one or more processors and memory storing instructions that, when executed by the one or more processors, cause the monitoring device to:
configure the monitoring application to monitor the first application and the plurality of dependencies of the first application using the plurality of monitoring interfaces;
detect, based on the plurality of monitoring interfaces, that the first application has an unhealthy operating status;
collect, by one or more data collecting agents and based on detecting that the first application has the unhealthy operating status, system state information corresponding to the first application and each of the plurality of dependencies;
store the collected system state information in the database as a first incident record corresponding to a first incident event and comprising incident attribute information for the first application and each of the plurality of dependencies;
train a machine learning model based on a plurality of incident records including the first incident record, wherein the instructions cause the monitoring device to train the machine learning model by causing the monitoring device to:
cluster incident events corresponding to each of the plurality of incident records for the first application based on attributes of the system state information corresponding to each of the plurality of dependencies;
determine one or more patterns of performance based on the clustered incident events, wherein a first pattern of performance of the one or more patterns of performance indicates a potential correlation between a first attribute of the system state information corresponding to a first dependency and the first application having the unhealthy operating status; and
update the machine learning model based on the determined patterns of performance;
detect, based on the plurality of monitoring interfaces, a current operating status of the first application and the plurality of dependencies; and
generate, using the machine learning model and based on the first pattern of performance and the current operating status, a recommendation regarding operation of the first application or the first dependency.
Dependent claims: 18, 19.
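The system claim pairs one monitoring interface with the first application and another with its API dependency. It also stores incident records in a database once the application is detected as unhealthy. The sketch below shows one possible arrangement under several assumptions. Each interface is modeled as a pair of callables, one returning `"healthy"` or `"unhealthy"` and one standing in for a data collecting agent. An in-memory SQLite table, `incidents`, plays the role of the database. The class names, the table schema, and the status strings are all hypothetical and are not taken from the patent.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitoringInterface:
    # Hypothetical: one interface per monitored component (application or dependency).
    component: str
    get_status: Callable[[], str]   # returns "healthy" or "unhealthy"
    get_state: Callable[[], dict]   # data collecting agent for system state


class MonitoringDevice:
    def __init__(self, app, interfaces, conn):
        self.app = app
        self.interfaces = {i.component: i for i in interfaces}
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS incidents "
            "(id INTEGER PRIMARY KEY, app TEXT, state TEXT)"
        )

    def poll(self):
        """Check the application; if it is unhealthy, store an incident record.

        Returns the new incident row id, or None when the application is healthy.
        """
        if self.interfaces[self.app].get_status() != "unhealthy":
            return None
        # Collect status plus system state for the application and every dependency.
        state = {
            name: {"status": i.get_status(), **i.get_state()}
            for name, i in self.interfaces.items()
        }
        cur = self.conn.execute(
            "INSERT INTO incidents (app, state) VALUES (?, ?)",
            (self.app, json.dumps(state)),
        )
        self.conn.commit()
        return cur.lastrowid
```

Storing the per-dependency state as one JSON document keeps each record a snapshot of the whole dependency set at the moment of the incident. That snapshot is the form the clustering step in the claim works over.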
20. One or more non-transitory computer readable media storing instructions that, when executed by one or more processors, cause a monitoring device to perform steps comprising:
configuring a monitoring application to monitor a first application and a plurality of dependencies of the first application using a plurality of monitoring interfaces, wherein the plurality of monitoring interfaces comprises:
a first monitoring interface configured to determine incident attribute information associated with the first application; and
a second monitoring interface configured to determine incident attribute information associated with a first dependency of the plurality of dependencies;
detecting, by the monitoring application and based on the plurality of monitoring interfaces, that the first application has an unhealthy operating status;
collecting, by one or more data collecting agents and based on detecting that the first application has the unhealthy operating status, system state information corresponding to the first application and each of the plurality of dependencies;
storing the collected system state information in a database as a first incident record corresponding to a first incident event and comprising incident attribute information for the first application and each of the plurality of dependencies;
updating the first incident record to indicate a corrective action taken in response to the first application having the unhealthy state;
training a machine learning model based on a plurality of incident records including the first incident record, wherein training the machine learning model comprises:
clustering incident events corresponding to each of the plurality of incident records for the first application, wherein clustering the incident events is based on attributes of the system state information corresponding to each of the plurality of dependencies;
determining one or more patterns of performance based on the clustered incident events, wherein a first pattern of performance of the one or more patterns of performance indicates a potential correlation between a first attribute of the system state information corresponding to the first dependency and the first application having the unhealthy operating status; and
updating the machine learning model based on the determined patterns of performance;
detecting, by the monitoring application and based on the plurality of monitoring interfaces, a current operating status of the first application and the plurality of dependencies; and
determining, using the machine learning model and based on the first pattern of performance and the current operating status, a suggested action based on the corrective action taken in response to the first incident event.
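Claim 20 adds a corrective action to each incident record and derives a suggested action from those past actions. The sketch below uses a k-nearest-neighbour vote as a simple stand-in for the claimed clustering-based model; it is a swapped-in technique, not the patent's method. Attributes are normalized by a healthy baseline, as in a clustering setup. The current status is compared with resolved incidents, and the function returns the corrective action most common among the `k` closest ones, with ties going to the closest incident. `ResolvedIncident`, `suggest_action`, the attribute names, and the action strings are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ResolvedIncident:
    attributes: dict        # hypothetical: dependency attribute -> value at incident time
    corrective_action: str  # e.g. "restart auth_api"


def suggest_action(history, current, baseline, k=3):
    """Suggest a corrective action for `current` from similar past incidents."""
    keys = sorted(baseline)

    def normalize(attrs):
        # Express each attribute as a multiple of its healthy baseline value.
        return [attrs[key] / baseline[key] for key in keys]

    target = normalize(current)
    nearest = sorted(
        history, key=lambda inc: math.dist(normalize(inc.attributes), target)
    )[:k]
    if not nearest:
        return None
    votes = Counter(inc.corrective_action for inc in nearest)
    top = max(votes.values())
    for inc in nearest:  # the closest incident wins a tie between equally common actions
        if votes[inc.corrective_action] == top:
            return inc.corrective_action
```

As an example, suppose past latency spikes on an API dependency were resolved by restarting it. A current status showing a similar spike would then draw that same restart as the suggestion.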
Specification