Cluster neighborhood event advisory
First Claim
1. A computer-implemented method comprising steps of:
a first server instance, in a cluster of server instances, detecting that a problem event, from a specified set of problem events, is occurring relative to an operation that the first server instance is attempting to perform;
in response to detecting that the problem event is occurring, the first server instance broadcasting, to one or more other server instances in the cluster, first information that indicates characteristics of the problem event;
the first server instance receiving, from a second server instance in the cluster, second information that indicates characteristics of the problem event; and
based at least in part on the second information received from the second server instance, the first server instance selecting an action from a set of actions; and
the first server instance performing the action;
wherein the set of actions comprises (a) a first action that includes terminating execution of the first server instance and (b) a second action that includes waiting for a specified amount of time, but excludes terminating execution of the first server instance;
wherein the method is performed by one or more computing devices.
Abstract
Database server instances in a database server cluster broadcast, to other instances in the cluster, information concerning certain problem events. Because each server instance is aware of problems that other server instances are experiencing, each server instance is enabled to make more intelligent decisions regarding the actions that it should perform in response to the problems that the server instance is experiencing. Instead of terminating itself, a server instance might opt to wait for a longer amount of time for an operation to complete. The server instance may do so due to the server instance having received information that indicates that other server instances are experiencing similar problems. Whenever the information received from other server instances makes it appear that a problem is unlikely to be solved in the cluster as a whole by terminating a server instance, that server instance may continue to wait instead of terminating itself.
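The decision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how peer information is weighed, so the rule used here (wait if at least `min_similar_peers` other instances report the same event type) and the names `ProblemReport`, `Action`, `select_action`, and the event labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Action(Enum):
    TERMINATE = "terminate"  # the instance ends its own execution
    WAIT = "wait"            # the instance waits longer and keeps running


@dataclass(frozen=True)
class ProblemReport:
    """Characteristics of a problem event, as broadcast by one instance."""
    instance_id: str
    event_type: str  # hypothetical label, e.g. "io_timeout"


def select_action(
    own: ProblemReport,
    peer_reports: Iterable[ProblemReport],
    min_similar_peers: int = 1,  # assumed threshold, not from the source
) -> Action:
    """Choose between waiting and self-termination using peer reports."""
    # Count distinct other instances that report the same kind of problem.
    similar_peers = {
        report.instance_id
        for report in peer_reports
        if report.event_type == own.event_type
        and report.instance_id != own.instance_id
    }
    # If peers share the problem, terminating this instance is unlikely to
    # fix it for the cluster as a whole, so keep waiting instead.
    if len(similar_peers) >= min_similar_peers:
        return Action.WAIT
    return Action.TERMINATE
```

With this rule, an instance that sees no matching reports from peers treats the problem as local and terminates, while an instance whose peers report the same event continues to wait.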
20 Claims
1. A computer-implemented method comprising steps of:
a first server instance, in a cluster of server instances, detecting that a problem event, from a specified set of problem events, is occurring relative to an operation that the first server instance is attempting to perform;
in response to detecting that the problem event is occurring, the first server instance broadcasting, to one or more other server instances in the cluster, first information that indicates characteristics of the problem event;
the first server instance receiving, from a second server instance in the cluster, second information that indicates characteristics of the problem event; and
based at least in part on the second information received from the second server instance, the first server instance selecting an action from a set of actions; and
the first server instance performing the action;
wherein the set of actions comprises (a) a first action that includes terminating execution of the first server instance and (b) a second action that includes waiting for a specified amount of time, but excludes terminating execution of the first server instance;
wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3
4. A computer-implemented method comprising steps of:
receiving, over a network from a first server, a message that indicates a first problem resolution action that the first server plans to perform;
in response to receiving the message from the first server, storing a record that indicates the first problem resolution action;
detecting a problem incident;
in response to detecting the problem incident, and based on information contained in the record, waiting for the first server to perform the first problem resolution action indicated in the record;
after waiting for the first server to perform the first problem resolution action, determining whether the problem incident has been solved by the first server's performance of the first problem resolution action; and
performing a second problem resolution action only in response to determining that the first server's performance of the first problem resolution action did not solve the problem incident;
wherein the steps are performed by one or more computing devices.
Dependent claims: 5, 6, 7, 8, 9, 10
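The sequence in claim 4 (record a peer's planned action, wait for it, then act only if the problem persists) can be sketched as below. This is an illustrative reading under stated assumptions, not the claimed implementation: `PeerActionTracker`, its method names, the fixed `wait_seconds` delay standing in for "waiting for the first server", and the choice to run the fallback immediately when no record exists are all hypothetical.

```python
import time
from typing import Callable, Dict


class PeerActionTracker:
    """Keeps records of resolution actions that peers have announced."""

    def __init__(self) -> None:
        # Maps a server id to the action that server said it plans to perform.
        self._planned: Dict[str, str] = {}

    def on_message(self, server_id: str, planned_action: str) -> None:
        """Store a record of the action announced by a peer."""
        self._planned[server_id] = planned_action

    def handle_incident(
        self,
        server_id: str,
        is_solved: Callable[[], bool],
        fallback: Callable[[], None],
        wait_seconds: float = 0.0,                    # assumed fixed delay
        sleep: Callable[[float], None] = time.sleep,  # injectable for tests
    ) -> bool:
        """Return True if the second (fallback) action was performed."""
        if server_id in self._planned:
            # A record exists: give the peer time to perform its action.
            sleep(wait_seconds)
            # Check whether the peer's action solved the incident.
            if is_solved():
                return False
        # Either the peer's action did not help, or (an assumption of this
        # sketch) no peer action was recorded at all.
        fallback()
        return True
```

A caller would invoke `on_message` when a peer's announcement arrives and `handle_incident` when it detects a problem, passing a check for whether the problem still exists and its own resolution action as callables.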
11. A volatile or non-volatile computer-readable storage medium storing instructions which, when executed by one or more processors, cause said one or more processors to perform steps comprising:
a first server instance, in a cluster of server instances, detecting that a problem event, from a specified set of problem events, is occurring relative to an operation that the first server instance is attempting to perform;
in response to detecting that the problem event is occurring, the first server instance broadcasting, to one or more other server instances in the cluster, first information that indicates characteristics of the problem event;
the first server instance receiving, from a second server instance in the cluster, second information that indicates characteristics of the problem event; and
based at least in part on the second information received from the second server instance, the first server instance selecting an action from a set of actions; and
the first server instance performing the action;
wherein the set of actions comprises (a) a first action that includes terminating execution of the first server instance and (b) a second action that includes waiting for a specified amount of time, but excludes terminating execution of the first server instance.
Dependent claims: 12, 13
14. A volatile or non-volatile computer-readable storage medium storing instructions which, when executed by one or more processors, cause said one or more processors to perform steps comprising:
receiving, over a network from a first server, a message that indicates a first problem resolution action that the first server plans to perform;
in response to receiving the message from the first server, storing a record that indicates the first problem resolution action;
detecting a problem incident;
in response to detecting the problem incident, and based on information contained in the record, waiting for the first server to perform the first problem resolution action indicated in the record;
after waiting for the first server to perform the first problem resolution action, determining whether the problem incident has been solved by the first server's performance of the first problem resolution action; and
performing a second problem resolution action only in response to determining that the first server's performance of the first problem resolution action did not solve the problem incident.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification