Real-time database exception monitoring tool using instance eviction data
First Claim
1. A method for monitoring a particular node of a first cluster database system that comprises a database, a database server that executes on a plurality of nodes that includes the particular node, wherein each node of the plurality of nodes executes one or more database instances, wherein the database of the first cluster database system is shared by multiple database instances of the database server, the method comprising:
establishing, on the particular node of the plurality of nodes of the first cluster database system, a monitoring process;
the monitoring process collecting a plurality of values, wherein each value of the plurality of values indicates a measure of utilization or responsiveness of a first resource of the particular node at a different instant;
determining that a database instance that was executing on a node of a second cluster database system was evicted from the second cluster database system;
in response to determining that the database instance was evicted from the second cluster database system, storing eviction data that indicates a measure of utilization or responsiveness of a second resource of the node of the second cluster database system;
wherein the eviction data reflects the utilization or responsiveness that the second resource was experiencing when the database instance was evicted from the second cluster database system;
determining, based on (a) one or more values of the plurality of values and (b) the eviction data, a probability that the utilization or responsiveness of the first resource indicated by the one or more values will lead to performance problems for the particular node; and
based on the probability, performing one or more specified actions;
wherein the method is performed by one or more computing devices.
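The claim does not say how the probability is computed. As a minimal sketch, assuming an empirical estimate, the following treats the eviction data as a list of utilization values recorded at eviction time and returns the fraction of those evictions that happened at or below the current peak value. The function names, the sample numbers, and the 0.5 alert cutoff are all hypothetical and are not taken from the patent.

```python
from bisect import bisect_right


def eviction_probability(recent_values, eviction_values):
    """Fraction of recorded evictions that occurred at or below the current peak.

    This empirical estimate is an assumption made for illustration only.
    """
    if not recent_values or not eviction_values:
        return 0.0
    peak = max(recent_values)
    ordered = sorted(eviction_values)
    # Count eviction records whose utilization was <= the current peak.
    return bisect_right(ordered, peak) / len(ordered)


def choose_action(probability, alert_at=0.5):
    """Map the probability to a specified action (hypothetical policy)."""
    return "alert" if probability >= alert_at else "none"


# Hypothetical CPU utilization (percent) stored when instances on nodes of
# another cluster were evicted.
eviction_cpu = [88.0, 92.0, 95.0, 99.0]

# Hypothetical individual CPU samples collected on the monitored node.
recent_cpu = [40.0, 93.0, 61.0]

p = eviction_probability(recent_cpu, eviction_cpu)
action = choose_action(p)
```

Using the peak of the individual samples rather than their mean keeps short spikes visible, which matches the abstract's emphasis on individually collected values.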
Abstract
Techniques for monitoring resources of a computer system are provided. A monitoring process collects and reports utilization data for one or more resources of a computer system, such as CPU, memory, disk I/O, and network I/O. Instead of reporting just an average of the collected data over a period of time (e.g., 10 seconds), the monitoring process at least reports individually collected resource utilization values. If one or more of the utilization values exceed specified thresholds for the respective resources, then an alert may be generated. In one approach, the monitoring process is made a real-time priority process in the computer system to ensure that the memory used by the monitoring process is not swapped out of memory. Also, being a real-time priority process ensures that the monitoring process obtains a CPU in order to collect resource utilization data even when the computer system is in a starvation mode.
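The difference between reporting an average and reporting individual values can be illustrated with a short sketch. The threshold numbers, the sample window, and the function names below are hypothetical. The `try_make_realtime` helper assumes a Linux host and uses Python's `os.sched_setscheduler`; it only requests real-time scheduling and does not pin memory, and it normally needs elevated privileges to succeed.

```python
import os
from statistics import mean

# Hypothetical per-resource alert thresholds (percent utilization).
THRESHOLDS = {"cpu": 90.0, "memory": 85.0}


def find_alerts(samples, thresholds=THRESHOLDS):
    """Return (resource, index, value) for each individual sample over its threshold."""
    alerts = []
    for resource, values in samples.items():
        limit = thresholds.get(resource)
        if limit is None:
            continue  # no threshold configured for this resource
        for index, value in enumerate(values):
            if value > limit:
                alerts.append((resource, index, value))
    return alerts


def try_make_realtime(priority=1):
    """Best-effort request for real-time (SCHED_FIFO) scheduling.

    Returns True on success, False if unsupported or not permitted.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # platform without POSIX scheduler controls
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        return False  # typically a lack of privileges


# Ten one-second CPU samples: a two-second spike hides inside a modest average.
window = {"cpu": [20, 25, 22, 98, 97, 24, 21, 23, 20, 22]}
average = mean(window["cpu"])
alerts = find_alerts(window)
```

Here the 10-second average stays well under the 90 percent threshold, while the per-sample check flags the two spike values.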
29 Claims
1. A method for monitoring a particular node of a first cluster database system that comprises a database, a database server that executes on a plurality of nodes that includes the particular node, wherein each node of the plurality of nodes executes one or more database instances, wherein the database of the first cluster database system is shared by multiple database instances of the database server, the method comprising:
establishing, on the particular node of the plurality of nodes of the first cluster database system, a monitoring process;
the monitoring process collecting a plurality of values, wherein each value of the plurality of values indicates a measure of utilization or responsiveness of a first resource of the particular node at a different instant;
determining that a database instance that was executing on a node of a second cluster database system was evicted from the second cluster database system;
in response to determining that the database instance was evicted from the second cluster database system, storing eviction data that indicates a measure of utilization or responsiveness of a second resource of the node of the second cluster database system;
wherein the eviction data reflects the utilization or responsiveness that the second resource was experiencing when the database instance was evicted from the second cluster database system;
determining, based on (a) one or more values of the plurality of values and (b) the eviction data, a probability that the utilization or responsiveness of the first resource indicated by the one or more values will lead to performance problems for the particular node; and
based on the probability, performing one or more specified actions;
wherein the method is performed by one or more computing devices.
Dependent claims: 2-14
15. One or more non-transitory machine-readable storage media storing instructions for monitoring a particular node of a first cluster database system that comprises a database, a database server that executes on a plurality of nodes that includes the particular node, wherein each node of the plurality of nodes executes one or more database instances, wherein the database of the first cluster database system is shared by multiple database instances of the database server, wherein the instructions, when executed by one or more processors, cause:
establishing, on the particular node of the plurality of nodes of the first cluster database system, a monitoring process;
the monitoring process collecting a plurality of values, wherein each value of the plurality of values indicates a measure of utilization or responsiveness of a first resource of the particular node at a different instant;
determining that a database instance that was executing on a node of a second cluster database system was evicted from the second cluster database system;
in response to determining that the database instance was evicted from the second cluster database system, storing eviction data that indicates a measure of utilization or responsiveness of a second resource of the node of the second cluster database system;
wherein the eviction data reflects the utilization or responsiveness that the second resource was experiencing when the database instance was evicted from the second cluster database system;
determining, based on (a) one or more values of the plurality of values and (b) the eviction data, a probability that the utilization or responsiveness of the first resource indicated by the one or more values will lead to performance problems for the particular node; and
based on the probability, performing one or more specified actions.
Dependent claims: 16-29
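The step of storing eviction data in response to an eviction can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `EvictionRecord` and `NodeSampler` types, the `on_eviction` handler, and the choice to store the most recent sample as the value "experienced when the instance was evicted" are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EvictionRecord:
    """Utilization of one resource on a node at the time of an eviction."""
    cluster: str
    node: str
    resource: str
    value: float


class NodeSampler:
    """Keeps the most recent individual samples for one resource on one node."""

    def __init__(self, cluster, node, resource, maxlen=60):
        self.cluster = cluster
        self.node = node
        self.resource = resource
        self.samples = deque(maxlen=maxlen)  # oldest samples drop off automatically

    def record(self, value):
        self.samples.append(value)


def on_eviction(sampler, store):
    """Store the latest sample as the utilization seen at eviction time.

    Returns the stored record, or None if no sample has been collected yet.
    """
    if not sampler.samples:
        return None
    record = EvictionRecord(
        sampler.cluster, sampler.node, sampler.resource, sampler.samples[-1]
    )
    store.append(record)
    return record


# Hypothetical node in a second cluster whose instance is evicted.
sampler = NodeSampler("cluster-b", "node-2", "cpu", maxlen=3)
for cpu_percent in (50.0, 70.0, 96.0, 99.0):
    sampler.record(cpu_percent)

eviction_store = []
stored = on_eviction(sampler, eviction_store)
```

Records gathered this way could then serve as the eviction data against which another node's collected values are compared.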
Specification