Managing a distributed computing system
Abstract
A system manager tests, administers and monitors, and/or diagnoses problems with a distributed system having a plurality of computing machines. Each machine includes an event monitoring agent and the system manager comprises a data collection module (DCM), a data management module (DMM), a data storing module (DSM), and a user interface module (UIM). The DCM receives a request from the DMM describing performance data to be collected from each agent, and based on the request the DCM then collects such performance data from each agent and sends same to the DMM. The DMM stores the performance data at the DSM along with metadata corresponding to the request for current and future usage, including monitoring, analyzing and evaluation. The UIM receives requests from a user and forwards same to the DMM, and formats data from the DSM and represents the formatted data to the user.
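The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch and not the patented implementation: the class names mirror the four modules, but every method name (`collect`, `handle`, `store`, `submit`, `render`), the `CollectionRequest` type, the in-memory `Agent` stand-in, and the metadata keys are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class CollectionRequest:
    """Hypothetical request describing what performance data to collect."""
    machines: list[str]
    counter: str


class Agent:
    """Stand-in for a per-machine event monitoring agent (assumed in-memory)."""

    def __init__(self, readings: dict[str, float]):
        self.readings = readings

    def read(self, counter: str) -> float:
        return self.readings[counter]


class DataStoringModule:
    """DSM: keeps performance data together with its metadata."""

    def __init__(self):
        self.records = []

    def store(self, data: dict[str, float], metadata: dict) -> None:
        self.records.append({"data": data, "metadata": metadata})


class DataCollectionModule:
    """DCM: collects the requested counter from each named machine's agent."""

    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents

    def collect(self, request: CollectionRequest) -> dict[str, float]:
        return {m: self.agents[m].read(request.counter) for m in request.machines}


class DataManagementModule:
    """DMM: sends the request to the DCM, then stores the result at the DSM."""

    def __init__(self, dcm: DataCollectionModule, dsm: DataStoringModule):
        self.dcm = dcm
        self.dsm = dsm

    def handle(self, request: CollectionRequest) -> None:
        data = self.dcm.collect(request)
        # Metadata corresponding to the request (keys are assumptions).
        metadata = {"counter": request.counter, "machines": list(request.machines)}
        self.dsm.store(data, metadata)


class UserInterfaceModule:
    """UIM: forwards user requests to the DMM and formats data from the DSM."""

    def __init__(self, dmm: DataManagementModule, dsm: DataStoringModule):
        self.dmm = dmm
        self.dsm = dsm

    def submit(self, request: CollectionRequest) -> None:
        self.dmm.handle(request)

    def render(self) -> list[str]:
        lines = []
        for record in self.dsm.records:
            counter = record["metadata"]["counter"]
            for machine, value in sorted(record["data"].items()):
                lines.append(f"{machine} {counter}={value}")
        return lines
```

In this sketch the UIM never talks to the DCM directly. Requests go through the DMM and formatted output is read back from the DSM, which matches the division of responsibilities stated in the abstract.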
21 Claims
1. A system manager for testing, administering and monitoring, and/or diagnosing problems with a distributed system having a plurality of computing machines, each machine including an event monitoring agent for monitoring for one or more pre-defined events occurring on the machine and/or for collecting usage statistics with respect to the machine, the system manager comprising a data collection module (DCM), a data management module (DMM), a data storing module (DSM), and a user interface module (UIM),
the DCM for receiving a request from the DMM describing performance data to be collected from each of one or more of the machines of the system by way of the agent thereof, and based on the request for collecting such performance data from each agent and sending same to the DMM,
the DMM for receiving the sent performance data from the DCM and storing the received performance data at the DSM along with metadata corresponding to the request for current and future usage, including monitoring, analyzing and evaluation,
the UIM for receiving requests from a user and forwarding each received request to the DMM, and for formatting data from the DSM and representing the formatted data to the user,
the DSM for storing performance data and corresponding metadata, the performance data comprising data as received from each agent in the system including raw values of a specific performance counter, the metadata comprising a description relating to circumstances of collecting the performance data, including data description information, data environment information, and a rule requiring performance of an action in response to a detection of a condition from performance data,
the DMM for enforcing each rule by reviewing the performance data as received to determine if the condition thereof is met, and if so taking the corresponding action.
13. A method for testing, administering and monitoring, and/or diagnosing problems with a distributed system having a plurality of computing machines, each machine including an event monitoring agent for monitoring for one or more pre-defined events occurring on the machine and/or for collecting usage statistics with respect to the machine, the method employing a system manager including a data collection module (DCM), a data management module (DMM), and a data storing module (DSM), the method comprising:
receiving, by the DCM, a request from the DMM describing performance data to be collected from each of one or more of the machines of the system by way of the agent thereof;
collecting, by the DCM and based on the request, such performance data from each agent;
sending, by the DCM, the collected performance data to the DMM;
receiving, by the DMM, the sent performance data from the DCM; and
storing, by the DMM, the received performance data at the DSM along with metadata corresponding to the request for current and future usage, including monitoring, analyzing and evaluation,
the DSM storing performance data and corresponding metadata, the performance data comprising data as received from each agent in the system including raw values of a specific performance counter, the metadata comprising a description relating to circumstances of collecting the performance data, including data description information, data environment information, and a rule requiring performance of an action in response to a detection of a condition from performance data,
the DMM enforcing each rule by reviewing the performance data as received to determine if the condition thereof is met, and if so taking the corresponding action.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
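The rule-enforcement element of the claim (metadata carrying a rule, and the DMM checking received data against it) might be pictured as follows. This is a hedged sketch only: `Rule`, `Metadata`, `enforce_rules`, and their fields are hypothetical names chosen for illustration, and representing a condition and an action as Python callables is an assumption, not something the claim specifies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical rule: perform `action` when `condition` holds for a raw value."""
    condition: Callable[[float], bool]
    action: Callable[[str, float], None]


@dataclass
class Metadata:
    """Assumed shape of the metadata stored alongside performance data."""
    description: str   # data description information
    environment: str   # data environment information
    rules: list[Rule]  # rules to enforce against the data


def enforce_rules(performance_data: dict[str, float], metadata: Metadata) -> int:
    """Review raw counter values as received; return how many actions were taken."""
    taken = 0
    for machine, raw_value in performance_data.items():
        for rule in metadata.rules:
            if rule.condition(raw_value):
                rule.action(machine, raw_value)
                taken += 1
    return taken
```

A caller would build a `Metadata` instance with, for example, a rule whose condition is `lambda v: v > 90.0` and whose action records an alert, then pass each batch of received counter values to `enforce_rules` before or while storing it.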
Specification