Obviation of Recovery of Data Store Consistency for Application I/O Errors
First Claim
1. A method for handling an error of an input/output (I/O) operation, the method comprising:
a computer intercepting the error, wherein the I/O operation is via a first path to a shared storage system of data, wherein the intercepting comprises the computer preventing execution of code of the application in response to the error;
the computer completing the I/O operation via a second path to the shared storage;
the computer creating a checkpoint image of a set of processes that the application comprises, wherein the checkpoint image enables resumption of execution of the application, wherein the resumption of execution comprises starting execution at a point in the application subsequent to completion of the I/O operation; and
the computer transferring the checkpoint image to a second computer to enable the second computer to resume execution of the application via the checkpoint image.
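The first two steps of claim 1 (intercepting the error on the first path, then completing the same I/O on a second path) can be illustrated with a minimal Python sketch. It is not the patented implementation. `StoragePath`, `InterceptingIO`, `PathError` and `application_write` are hypothetical names, and a shared `dict` stands in for the shared storage system. The point shown is that the application's own `except` branch never runs, because the error is absorbed before it reaches the application.

```python
from dataclasses import dataclass, field


class PathError(OSError):
    """Hypothetical error raised when a single storage path fails."""


@dataclass
class StoragePath:
    """Hypothetical path (e.g. one adapter and port) to shared storage."""
    name: str
    healthy: bool = True
    storage: dict = field(default_factory=dict)  # stands in for the shared store

    def write(self, key: str, value: bytes) -> None:
        if not self.healthy:
            raise PathError(f"I/O error on {self.name}")
        self.storage[key] = value


class InterceptingIO:
    """Completes a failed write on a second path instead of surfacing it."""

    def __init__(self, first: StoragePath, second: StoragePath):
        self.first = first
        self.second = second
        self.intercepted = []  # errors that never reached the application

    def write(self, key: str, value: bytes) -> str:
        try:
            self.first.write(key, value)
            return self.first.name
        except PathError as err:
            # Intercept: keep the error out of the application so none of its
            # error-handling code runs, then complete the I/O on the second path.
            self.intercepted.append(str(err))
            self.second.write(key, value)
            return self.second.name


def application_write(io: InterceptingIO, handler_calls: list) -> str:
    """Application code; its except-branch is what interception keeps idle."""
    try:
        return io.write("record-1", b"payload")
    except OSError:
        handler_calls.append("recovery")  # application's own recovery logic
        raise


shared = {}  # one store reachable through both paths
io = InterceptingIO(StoragePath("path-A", healthy=False, storage=shared),
                    StoragePath("path-B", storage=shared))
handler_calls = []
used_path = application_write(io, handler_calls)
```

Here the write lands in shared storage via `path-B`, one error is recorded as intercepted, and `handler_calls` stays empty.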
2 Assignments
0 Petitions
Abstract
Embodiments comprise a plurality of computing devices that dynamically intercept application I/O errors. Various embodiments comprise two or more computing devices, such as two or more servers, each having access to a shared data storage system. An application may be executing on the first computing device and performing an I/O operation when an I/O error occurs. The first computing device may intercept the I/O error, rather than passing it back to the application, and thereby prevent the error from affecting the application. The first computing device may then complete the I/O operation, and any other pending I/O operations not yet written to disk, via an alternate path; perform a checkpoint operation to capture the state of the set of processes associated with the application; and transfer the checkpoint image to the second computing device. The second computing device may then resume operation of the application from the checkpoint image.
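The ordering described in the abstract (drain every pending I/O over the alternate path first, checkpoint only afterwards) can be sketched in Python as follows. This is an illustrative assumption, not the patent's mechanism: `drain_and_checkpoint` is a hypothetical helper, a `deque` of `(key, value)` pairs stands in for writes not yet on disk, and a plain `dict` stands in for process state.

```python
from collections import deque


def drain_and_checkpoint(pending: deque, write_alternate, process_state: dict):
    """Flush every queued write via the alternate path, then checkpoint.

    pending: hypothetical queue of (key, value) writes not yet on disk.
    write_alternate: callable performing one write on the alternate path.
    process_state: stands in for the state of the application's processes.
    """
    completed = 0
    while pending:
        key, value = pending.popleft()
        write_alternate(key, value)
        completed += 1
    # The state is captured only once the queue is empty, so the image never
    # depends on an I/O that has not yet reached shared storage.
    checkpoint = {"state": dict(process_state), "pending_io": list(pending)}
    return checkpoint, completed


disk = {}
queue = deque([("a", b"1"), ("b", b"2"), ("c", b"3")])
image, flushed = drain_and_checkpoint(queue, disk.__setitem__, {"pc": 42})
```

Capturing state before the queue is empty would risk a resumed application assuming data is on disk when it is not, which is why the flush precedes the checkpoint in this sketch.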
71 Citations
24 Claims
1. A method for handling an error of an input/output (I/O) operation (the full text of claim 1 is reproduced under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8.

9. An apparatus, comprising:
input/output (I/O) hardware coupled to a storage device, wherein the I/O hardware enables performance of I/O operations for an application;
an application module to execute the application and generate a state of a set of processes for the application, wherein the application module is configured to perform an I/O operation for the application, wherein the I/O operation is via a first path to data of the storage device;
an error module to intercept an error of the I/O operation and prevent the error from causing the application module to execute code in response to receiving the error, wherein the application module is configured to complete the I/O operation via a second path to data of the storage device in response to the error module intercepting the error; and
a checkpoint module to create a checkpoint image of the state in response to completion of the I/O operation via the second path, wherein the checkpoint module is configured to transfer the checkpoint image to a second apparatus, wherein the checkpoint module is configured to create the checkpoint image in a manner which enables the second apparatus to resume execution of the application by starting execution at a point in the application subsequent to the completion of the I/O operation.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.

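Claim 9 requires a checkpoint that lets the second apparatus start "at a point in the application subsequent to the completion of the I/O operation". A minimal Python sketch of that idea, under the assumption that the application can be modeled as an ordered list of steps: the checkpoint records the index of the next step plus the captured variables, so the second apparatus neither repeats the completed I/O nor reruns setup. `Checkpoint`, `run` and `make_steps` are hypothetical; real process checkpointing captures registers, memory and file state rather than a step index.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    next_step: int   # first step to run on resume (after the completed I/O)
    variables: dict  # application state captured by the checkpoint module


def run(steps, state, start=0, stop_after=None):
    """Run steps[start:] on state; if stop_after is reached, return a
    Checkpoint that resumes at the step following it."""
    for i in range(start, len(steps)):
        steps[i](state)
        if i == stop_after:
            return Checkpoint(next_step=i + 1, variables=dict(state))
    return None


def make_steps(log):
    """Hypothetical three-step application; log records which steps ran."""
    def setup(s):
        log.append("setup")
        s["x"] = 1

    def io_write(s):
        log.append("io")  # assume this I/O was completed via the second path
        s["written"] = True

    def compute(s):
        log.append("compute")
        s["y"] = s["x"] * 10

    return [setup, io_write, compute]


# First apparatus: run through the I/O step (index 1), then checkpoint.
log1 = []
cp = run(make_steps(log1), {}, stop_after=1)

# Second apparatus: restore the captured state and continue after the I/O.
log2 = []
state2 = dict(cp.variables)
run(make_steps(log2), state2, start=cp.next_step)
```

The second apparatus runs only `compute`, which also illustrates the "obviating initialization" language of claim 21: `setup` is not executed again.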
17. A server for processing an error of an input/output (I/O) operation, the server comprising:
one or more processors, one or more computer-readable memories and one or more computer-readable, tangible storage devices;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to enable an application of the server to perform the I/O operation with a storage subsystem via a first storage connection, wherein the application of the server comprises a set of processes;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to detect the error of the set and prevent the error from causing one or more processes of the set from executing code in response to the error;
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to complete the I/O operation via a second storage connection and create a checkpoint image of the set upon the completion of the I/O operation, wherein the checkpoint image enables resumption of execution of the application by starting execution at a point in the application subsequent to the completion of the I/O operation; and
program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to transfer the checkpoint image to a second server to enable the second server to resume operation of the application, wherein the transference is via at least one of a shared memory connection and a network interface.
Dependent claims: 18, 19, 20.

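Claim 17 allows the checkpoint image to travel over a network interface. One simple way to move such an image between two endpoints is a length-prefixed byte stream, sketched below in Python with the standard `pickle`, `socket` and `struct` modules. The framing scheme and the helper names `send_checkpoint` and `recv_checkpoint` are assumptions for illustration, and `socket.socketpair()` stands in for a real connection between two servers.

```python
import pickle
import socket
import struct


def send_checkpoint(sock: socket.socket, image: dict) -> None:
    payload = pickle.dumps(image)
    # 8-byte big-endian length prefix tells the receiver where the image ends.
    sock.sendall(struct.pack("!Q", len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the image was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_checkpoint(sock: socket.socket) -> dict:
    (length,) = struct.unpack("!Q", _recv_exact(sock, 8))
    # pickle must only be used between trusted peers.
    return pickle.loads(_recv_exact(sock, length))


first_server, second_server = socket.socketpair()
image = {"next_step": 2, "variables": {"x": 1, "written": True}}
send_checkpoint(first_server, image)
restored = recv_checkpoint(second_server)
first_server.close()
second_server.close()
```

The explicit length prefix matters because a stream socket does not preserve message boundaries; without it the receiver cannot tell whether a partial read is the whole image.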
21. A computer program product for handling an error of an input/output (I/O) operation, the computer program product comprising:
one or more computer-readable, tangible storage devices;
program instructions, stored on at least one of the one or more storage devices, to intercept the error of the I/O operation of an application, wherein the I/O operation is via a first path from a first computing device to a shared storage system of data;
program instructions, stored on at least one of the one or more storage devices, to complete the I/O operation via a second path to the shared storage system of data;
program instructions, stored on at least one of the one or more storage devices, to create a checkpoint image of the application, wherein the checkpoint image comprises state of the application, wherein the program instructions to intercept the error comprise program instructions to prevent the first computing device from executing code of the application in response to the error, wherein the checkpoint image enables resuming execution of the application, wherein the resuming the execution comprises obviating initialization of the application; and
program instructions, stored on at least one of the one or more storage devices, to enable the generation of the checkpoint image in a second computing device.
Dependent claims: 22, 23, 24.

Specification