Cluster neighborhood event advisory

US 8,117,488 B2
Filed: 10/23/2009
Issued: 02/14/2012
Est. Priority Date: 10/23/2009
Status: Active Grant

Abstract

Database server instances in a database server cluster broadcast, to other instances in the cluster, information concerning certain problem events. Because each server instance is aware of problems that other server instances are experiencing, each server instance is enabled to make more intelligent decisions regarding the actions that it should perform in response to the problems that the server instance is experiencing. Instead of terminating itself, a server instance might opt to wait for a longer amount of time for an operation to complete. The server instance may do so due to the server instance having received information that indicates that other server instances are experiencing similar problems. Whenever the information received from other server instances makes it appear that a problem is unlikely to be solved in the cluster as a whole by terminating a server instance, that server instance may continue to wait instead of terminating itself.
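
The decision described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; it only assumes that an instance treats a peer report as "similar" when the event type and the affected resource match. The names `ProblemReport`, `Action`, `select_action`, and the fields `event_type` and `resource` are hypothetical and do not appear in the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    TERMINATE = "terminate"  # the instance terminates its own execution
    WAIT = "wait"            # the instance waits longer and does not terminate


@dataclass(frozen=True)
class ProblemReport:
    # All field names are illustrative assumptions.
    instance_id: str
    event_type: str  # e.g. "io_timeout"
    resource: str    # e.g. the device or data structure being waited on


def select_action(local: ProblemReport, peer_reports: list[ProblemReport]) -> Action:
    """Choose WAIT when at least one other instance reports a similar problem.

    The similarity rule (same event type and same resource) is an assumption
    made for this sketch; the abstract only says the problems are "similar".
    """
    similar = [
        report
        for report in peer_reports
        if report.instance_id != local.instance_id
        and report.event_type == local.event_type
        and report.resource == local.resource
    ]
    # If peers see the same problem, terminating this instance is unlikely
    # to fix it for the cluster as a whole, so keep waiting instead.
    return Action.WAIT if similar else Action.TERMINATE
```

A caller might build a `ProblemReport` for its own stalled operation, collect the reports broadcast by its neighbors, and pass both to `select_action` before deciding whether to shut down.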

20 Claims

1. A computer-implemented method comprising steps of:
- a first server instance, in a cluster of server instances, detecting that a problem event, from a specified set of problem events, is occurring relative to an operation that the first server instance is attempting to perform;
  
  in response to detecting that the problem event is occurring, the first server instance broadcasting, to one or more other server instances in the cluster, first information that indicates characteristics of the problem event;
  
  the first server instance receiving, from a second server instance in the cluster, second information that indicates characteristics of the problem event; and
  
  based at least in part on the second information received from the second server instance, the first server instance selecting an action from a set of actions; and
  
  the first server instance performing the action;
  
  wherein the set of actions comprises (a) a first action that includes terminating execution of the first server instance and (b) a second action that includes waiting for a specified amount of time, but excludes terminating execution of the first server instance;
  
  wherein the method is performed by one or more computing devices.
  - 2. The method of claim 1, wherein the first action comprises the first server instance waiting for a specified amount of time, and further comprising:
    - the first server instance, after waiting for the specified amount of time, determining whether the problem event has been solved; and
      
      in response to the first server instance determining that the problem event has not been solved after waiting for the specified amount of time, the first server instance terminating itself.
  - 3. The method of claim 1, wherein the first action comprises the first server instance waiting for a specified amount of time, and further comprising:
    - the first server instance, after waiting for the specified amount of time, determining whether the problem event has been solved; and
      
      in response to the first server instance determining that the problem event has been solved after waiting for the specified amount of time, the first server instance continuing execution without performing the first action in response to the problem event.

4. A computer-implemented method comprising steps of:
- receiving, over a network from a first server, a message that indicates a first problem resolution action that the first server plans to perform;
  
  in response to receiving the message from the first server, storing a record that indicates the first problem resolution action;
  
  detecting a problem incident;
  
  in response to detecting the problem incident, and based on information contained in the record, waiting for the first server to perform the first problem resolution action indicated in the record;
  
  after waiting for the first server to perform the first problem resolution action, determining whether the problem incident has been solved by the first server's performance of the first problem resolution action; and
  
  performing a second problem resolution action only in response to determining that the first server's performance of the first problem resolution action did not solve the problem incident;
  
  wherein the steps are performed by one or more computing devices.
  - 5. The method of claim 4, wherein performing the second problem resolution action comprises terminating execution of a second server that received the message from the first server over the network.
  - 6. The method of claim 4, wherein the first problem resolution action involves the first server terminating its own execution.
  - 7. The method of claim 4, wherein waiting for the first server to perform the first problem resolution action indicated in the record comprises waiting for an amount of time indicated within the record.
  - 8. The method of claim 4, wherein the step of waiting for the first server to perform the first problem resolution action is performed in response to determining that the record indicates characteristics that match characteristics of the detected problem incident.
  - 9. The method of claim 4, wherein the step of waiting for the first server to perform the first problem resolution action is additionally performed in response to determining that the record indicates a time-of-day range into which a time-of-day at which the detected problem incident occurred falls.
  - 10. The method of claim 4, wherein the step of waiting for the first server to perform the first problem resolution action is additionally performed in response to determining that the record indicates at least one of:
    - (a) a data structure that a recipient of the message is waiting to access, (b) a device that the recipient of the message is waiting to access, or (c) an error message of an error experienced by the recipient of the message.
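
The flow recited in claims 4 through 10 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the class `Advisor`, the record type `PlannedActionRecord`, and every field and method name are hypothetical. The sketch also assumes that a record "matches" an incident when their characteristic sets are equal, and that when no record matches, the second action runs immediately (the claims do not address that case).

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PlannedActionRecord:
    # Hypothetical record stored when a peer announces what it plans to do.
    sender: str
    action: str                  # e.g. "terminate_self"
    characteristics: frozenset   # characteristics of the problem the peer sees
    wait_seconds: float          # how long to give the peer (compare claim 7)


class Advisor:
    def __init__(self, sleep: Callable[[float], None] = time.sleep) -> None:
        self._records: list[PlannedActionRecord] = []
        self._sleep = sleep  # injectable so the sketch can be tested quickly

    def on_message(self, record: PlannedActionRecord) -> None:
        """Store a record of the action the first server plans to perform."""
        self._records.append(record)

    def handle_incident(
        self,
        characteristics: frozenset,
        is_solved: Callable[[], bool],
        second_action: Callable[[], None],
    ) -> str:
        """Wait for the peer's planned action, then act only if still unsolved."""
        match = next(
            (r for r in self._records if r.characteristics == characteristics),
            None,
        )
        if match is not None:
            # Give the first server time to carry out its announced action.
            self._sleep(match.wait_seconds)
            if is_solved():
                return "solved_by_peer"
        # Assumption: with no matching record, fall through to the second action.
        second_action()
        return "second_action_performed"
```

In use, a server would call `on_message` for each advisory it receives and `handle_incident` when it detects a problem of its own, passing a callable that rechecks the stalled operation and a callable for its own resolution action.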

11. A volatile or non-volatile computer-readable storage medium storing instructions which, when executed by one or more processors, cause said one or more processors to perform steps comprising:
- a first server instance, in a cluster of server instances, detecting that a problem event, from a specified set of problem events, is occurring relative to an operation that the first server instance is attempting to perform;
  
  in response to detecting that the problem event is occurring, the first server instance broadcasting, to one or more other server instances in the cluster, first information that indicates characteristics of the problem event;
  
  the first server instance receiving, from a second server instance in the cluster, second information that indicates characteristics of the problem event; and
  
  based at least in part on the second information received from the second server instance, the first server instance selecting an action from a set of actions; and
  
  the first server instance performing the action;
  
  wherein the set of actions comprises (a) a first action that includes terminating execution of the first server instance and (b) a second action that includes waiting for a specified amount of time, but excludes terminating execution of the first server instance.
  - 12. The computer-readable storage medium of claim 11, wherein the first action comprises the first server instance waiting for a specified amount of time, and further comprising:
    - the first server instance, after waiting for the specified amount of time, determining whether the problem event has been solved; and
      
      in response to the first server instance determining that the problem event has not been solved after waiting for the specified amount of time, the first server instance terminating itself.
  - 13. The computer-readable storage medium of claim 11, wherein the first action comprises the first server instance waiting for a specified amount of time, and further comprising:
    - the first server instance, after waiting for the specified amount of time, determining whether the problem event has been solved; and
      
      in response to the first server instance determining that the problem event has been solved after waiting for the specified amount of time, the first server instance continuing execution without performing the first action in response to the problem event.

14. A volatile or non-volatile computer-readable storage medium storing instructions which, when executed by one or more processors, cause said one or more processors to perform steps comprising:
- receiving, over a network from a first server, a message that indicates a first problem resolution action that the first server plans to perform;
  
  in response to receiving the message from the first server, storing a record that indicates the first problem resolution action;
  
  detecting a problem incident;
  
  in response to detecting the problem incident, and based on information contained in the record, waiting for the first server to perform the first problem resolution action indicated in the record;
  
  after waiting for the first server to perform the first problem resolution action, determining whether the problem incident has been solved by the first server's performance of the first problem resolution action; and
  
  performing a second problem resolution action only in response to determining that the first server's performance of the first problem resolution action did not solve the problem incident.
  - 15. The computer-readable storage medium of claim 14, wherein performing the second problem resolution action comprises terminating execution of a second server that received the message from the first server over the network.
  - 16. The computer-readable storage medium of claim 14, wherein the first problem resolution action involves the first server terminating its own execution.
  - 17. The computer-readable storage medium of claim 14, wherein waiting for the first server to perform the first problem resolution action indicated in the record comprises waiting for an amount of time indicated within the record.
  - 18. The computer-readable storage medium of claim 14, wherein the step of waiting for the first server to perform the first problem resolution action is performed in response to determining that the record indicates characteristics that match characteristics of the detected problem incident.
  - 19. The computer-readable storage medium of claim 14, wherein the step of waiting for the first server to perform the first problem resolution action is additionally performed in response to determining that the record indicates a time-of-day range into which a time-of-day at which the detected problem incident occurred falls.
  - 20. The computer-readable storage medium of claim 14, wherein the step of waiting for the first server to perform the first problem resolution action is additionally performed in response to determining that the record indicates at least one of:
    - (a) a data structure that a recipient of the message is waiting to access, (b) a device that the recipient of the message is waiting to access, or (c) an error message of an error experienced by the recipient of the message.

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: Oracle International Corporation (Oracle Corporation)
Inventors: Chan, Wilson; Pruscino, Angelo; Wang, Tak Fung
Primary Examiner(s): Schell, Joseph
Application Number: US12/605,248
Publication Number: US 20110099412A1
Time in Patent Office: 844 Days
Field of Search: 714/4.11, 714/4.2, 714/4.21, 714/4.3, 714/4.4
US Class Current: 714/4.3
CPC Class Codes:
G06F 11/0709   in a distributed system con...
G06F 11/0766   Error or fault reporting or...
G06F 9/542   Event management; Broadcast...