Fault isolation using code paths
Abstract
Techniques are provided for isolating faults in a software program by providing at least two code paths that are capable of performing the same operation. When a fault occurs while one of the code paths is being used to perform an operation, data that indicates the circumstances under which the fault occurred is stored. For example, a fault-recording mechanism may store data that indicates the entities that were involved in the failed operation. Because they were involved in an operation that experienced a fault, one or more of those entities may be “quarantined”. When subsequent requests arrive to perform the operation, a check may be performed to determine whether the requested operation involves any of the quarantined entities. If the requested operation involves a quarantined entity, a different code path is used to perform the operation, rather than the code path from which the entity is quarantined.
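The flow described in the abstract can be illustrated with a minimal Python sketch. Every name here (`perform`, `fast_path`, `safe_path`, the `quarantined` map) is hypothetical, and the sketch assumes the simplest possible criterion: a single fault quarantines every entity involved in the failed operation. The source does not prescribe either choice.

```python
from typing import Callable, Dict, Set

# Hypothetical store: code path name -> ids of entities quarantined from that path.
quarantined: Dict[str, Set[str]] = {}


def perform(entities: Set[str],
            primary: Callable[[Set[str]], str],
            fallback: Callable[[Set[str]], str]) -> str:
    """Use the primary path unless an involved entity is quarantined from it."""
    blocked = quarantined.setdefault(primary.__name__, set())
    if entities & blocked:
        # A quarantined entity is involved: respond without the primary path.
        return fallback(entities)
    try:
        return primary(entities)
    except Exception:
        # Assumed criterion: one fault quarantines all involved entities.
        blocked.update(entities)
        raise


def fast_path(entities: Set[str]) -> str:
    """Stand-in for a code path that faults on one particular entity."""
    if "stmt-42" in entities:
        raise RuntimeError("simulated fault")
    return "fast"


def safe_path(entities: Set[str]) -> str:
    """Stand-in for an alternative code path performing the same operation."""
    return "safe"
```

In this sketch the faulting request itself still fails; only later requests that involve `stmt-42` are routed to `safe_path`, while requests involving other entities keep using `fast_path`.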
29 Claims
1. A method comprising:
receiving a first request to perform a first operation that may be performed by a first code path within a server;
detecting occurrence of a fault during performance of the first operation by the first code path;
identifying a first entity involved in the first operation that experienced the fault;
wherein the first entity is a first type of entity, wherein the first entity is one of: a SQL plan step, a SQL plan, a SQL statement, or a storage location targeted by an I/O request;
in response to the first entity being involved in the first operation that experienced the fault, determining that quarantine criteria for the first code path has been satisfied relative to the first entity;
in response to determining that quarantine criteria for the first code path has been satisfied relative to the first entity, storing data that indicates that the first entity is quarantined relative to the first code path while continuing to allow the first code path to be executed for operations involving other entities of the first type of entity;
after storing data that indicates that the first entity is quarantined relative to the first code path, receiving a second request to perform a second operation;
in response to the second request, determining whether the first entity is involved in the second operation;
in response to determining that the first entity is involved in the second operation and that the first entity is quarantined relative to the first code path, responding to the second request without using the first code path;
wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method comprising:
receiving a first request to perform a first operation that may be performed by a first code path within a server;
detecting occurrence of a fault during performance of the first operation by the first code path;
identifying a first entity involved in the first operation that experienced the fault;
in response to the first entity being involved in the first operation that experienced the fault, determining that quarantine criteria for the first code path has been satisfied relative to the first entity;
in response to determining that quarantine criteria for the first code path has been satisfied relative to the first entity, storing data that indicates that the first entity is quarantined relative to the first code path;
after storing data that indicates that the first entity is quarantined relative to the first code path, receiving a second request to perform a second operation;
in response to the second request, determining whether the first entity is involved in the second operation;
in response to determining that the first entity is involved in the second operation and that the first entity is quarantined relative to the first code path, responding to the second request without using the first code path;
wherein the first entity has a hierarchical relationship relative to a plurality of other entities;
wherein the quarantine criteria is satisfied relative to the first entity in response to a particular number of said other entities being quarantined relative to the first code path;
wherein the method is performed by one or more computing devices.
Dependent claims: 17
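The hierarchical criterion in claim 16 can be sketched as follows. This is only an illustration under stated assumptions: the class `HierarchicalQuarantine`, its `parent_of` map, and the `threshold` parameter are hypothetical names, and the example hierarchy (plan steps belonging to a plan) is one possible reading of "hierarchical relationship", not something the claim fixes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HierarchicalQuarantine:
    """Entities quarantined from one code path, with escalation to the parent."""
    parent_of: Dict[str, str]   # child entity id -> parent entity id (assumed shape)
    threshold: int              # the "particular number" of quarantined children
    quarantined: Set[str] = field(default_factory=set)

    def quarantine(self, entity: str) -> None:
        self.quarantined.add(entity)
        parent = self.parent_of.get(entity)
        if parent is None or parent in self.quarantined:
            return
        # Count how many children of the same parent are already quarantined.
        hit = sum(1 for child, p in self.parent_of.items()
                  if p == parent and child in self.quarantined)
        if hit >= self.threshold:
            # Recurse so escalation can continue further up the hierarchy.
            self.quarantine(parent)

    def is_quarantined(self, entity: str) -> bool:
        return entity in self.quarantined
```

With a threshold of 2 and three plan steps under one plan, quarantining two of the steps would also quarantine the plan, while the third step itself stays unquarantined.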
18. A non-transitory computer-readable storage that stores instructions which, when executed by one or more processors, causes the one or more processors to perform steps comprising:
receiving a first request to perform a first operation that may be performed by any one of a plurality of code paths within a server;
selecting a first code path, of the plurality of code paths, to perform the first operation;
detecting occurrence of a fault during performance of the first operation by the first code path;
identifying a first entity involved in the first operation that experienced the fault;
wherein the first entity is a first type of entity, wherein the first entity is one of: a SQL plan step, a SQL plan, a SQL statement, or a storage location targeted by an I/O request;
in response to the first entity being involved in the first operation that experienced the fault, determining that quarantine criteria for the first code path has been satisfied relative to the first entity;
in response to determining that quarantine criteria for the first code path has been satisfied relative to the first entity, storing data that indicates that the first entity is quarantined relative to the first code path while continuing to allow the first code path to be executed for operations involving other entities of the first type of entity;
after storing data that indicates that the first entity is quarantined relative to the first code path, receiving a second request to perform a second operation that can be performed by any one of the plurality of code paths;
in response to the second request, determining whether the first entity is involved in the second operation;
in response to determining that the first entity is involved in the second operation and that the first entity is quarantined relative to the first code path, selecting a second code path of the plurality of code paths to perform the second operation;
wherein the second code path is different from the first code path.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26
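Selection among a plurality of code paths, as recited in claim 18, might look like the sketch below. The function name `select_code_path`, the preference ordering of `paths`, and the path names used in the usage note are assumptions made for illustration; the claim only requires that a different code path be selected when an involved entity is quarantined from the first one.

```python
from typing import Dict, Iterable, Optional, Sequence, Set


def select_code_path(paths: Sequence[str],
                     entities: Iterable[str],
                     quarantined: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first path, in preference order, that no involved entity is quarantined from."""
    involved = set(entities)
    for path in paths:
        if not (involved & quarantined.get(path, set())):
            return path
    # Assumed behavior: signal that every candidate path is blocked.
    return None
```

For example, with `quarantined = {"offload": {"block-9"}}`, a request involving `block-9` would be steered from `"offload"` to the next path in the list, while a request involving only `block-3` would still get `"offload"`.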
27. A non-transitory computer-readable storage that stores instructions which, when executed by one or more processors, causes the one or more processors to perform steps comprising:
receiving a first request to perform a first operation that may be performed by any one of a plurality of code paths within a server;
selecting a first code path, of the plurality of code paths, to perform the first operation;
detecting occurrence of a fault during performance of the first operation by the first code path;
identifying a first entity involved in the first operation that experienced the fault;
in response to the first entity being involved in the first operation that experienced the fault, determining that quarantine criteria for the first code path has been satisfied relative to the first entity;
in response to determining that quarantine criteria for the first code path has been satisfied relative to the first entity, storing data that indicates that the first entity is quarantined relative to the first code path;
after storing data that indicates that the first entity is quarantined relative to the first code path, receiving a second request to perform a second operation that can be performed by any one of the plurality of code paths;
in response to the second request, determining whether the first entity is involved in the second operation;
in response to determining that the first entity is involved in the second operation and that the first entity is quarantined relative to the first code path, selecting a second code path of the plurality of code paths to perform the second operation;
wherein the second code path is different from the first code path;
wherein the first entity has a hierarchical relationship relative to a plurality of other entities; and
wherein the quarantine criteria is satisfied relative to the first entity in response to a particular number of said other entities being quarantined relative to the first code path.
Dependent claims: 28
29. A storage server comprising:
storage for storing data that belongs to databases managed by one or more database servers;
a first code path for handling I/O requests from the one or more database servers;
a second code path for handling I/O requests from the one or more database servers;
a code path selector configured to determine whether entities involved in I/O requests are quarantined from the first code path and, when an entity involved in a particular I/O request is quarantined from the first code path, causing the particular I/O request to be handled by the second code path;
wherein the entity is a first type of entity, wherein the entity is one of a SQL plan step, a SQL plan, a SQL statement, or a storage location targeted by the particular I/O request;
a fault handler configured to store information about which entities are involved in faults when faults occur while the first code path is being used to service I/O requests; and
a quarantine handler configured to determine, based on which entities are involved in faults, whether quarantine criteria is satisfied for those entities, and to store data that indicates that the entities are quarantined in response to determining that the quarantine criteria is satisfied while continuing to allow the first code path to be executed for operations involving other non-quarantined entities of the first type of entity.
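The division of labor among the components in claim 29 can be sketched in Python. All class and method names below are hypothetical, and the quarantine criterion (an entity is quarantined once it has been involved in `min_faults` faults) is an assumption chosen for illustration; the claim leaves the criteria open.

```python
from collections import Counter
from typing import Callable, Iterable, Set

Handler = Callable[[Set[str]], str]


class FaultHandler:
    """Stores which entities were involved in faults on the first code path."""
    def __init__(self) -> None:
        self.fault_counts: Counter = Counter()

    def record(self, entities: Iterable[str]) -> None:
        self.fault_counts.update(set(entities))


class QuarantineHandler:
    """Applies the assumed criterion: quarantine after `min_faults` faults."""
    def __init__(self, faults: FaultHandler, min_faults: int) -> None:
        self.faults = faults
        self.min_faults = min_faults
        self.quarantined: Set[str] = set()

    def evaluate(self) -> None:
        for entity, count in self.faults.fault_counts.items():
            if count >= self.min_faults:
                self.quarantined.add(entity)


class CodePathSelector:
    """Routes a request to the second path when an involved entity is quarantined."""
    def __init__(self, first: Handler, second: Handler,
                 quarantine: QuarantineHandler) -> None:
        self.first, self.second, self.quarantine = first, second, quarantine

    def select(self, entities: Set[str]) -> Handler:
        return self.second if entities & self.quarantine.quarantined else self.first


class StorageServer:
    """Wires the three components together for handling I/O requests."""
    def __init__(self, first: Handler, second: Handler, min_faults: int = 2) -> None:
        self.faults = FaultHandler()
        self.quarantine = QuarantineHandler(self.faults, min_faults)
        self.selector = CodePathSelector(first, second, self.quarantine)

    def handle_io(self, entities: Iterable[str]) -> str:
        involved = set(entities)
        path = self.selector.select(involved)
        try:
            return path(involved)
        except Exception:
            if path is self.selector.first:
                # Only faults on the first code path feed the quarantine decision.
                self.faults.record(involved)
                self.quarantine.evaluate()
            raise


def first_path(entities: Set[str]) -> str:
    """Stand-in first code path that faults on one storage location."""
    if "block-9" in entities:
        raise OSError("simulated fault")
    return "first"


def second_path(entities: Set[str]) -> str:
    """Stand-in second code path."""
    return "second"
```

With `StorageServer(first_path, second_path, min_faults=2)`, two faulting requests for `block-9` would quarantine that location, after which requests involving it go to `second_path` while requests for other locations continue on `first_path`.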
Specification