EFFICIENT FAILURE DETECTION FOR LONG RUNNING DATA TRANSFER JOBS
First Claim
1. A computer implemented method of handling errors during a data transfer, comprising:
(a) for a first task that is configured to transfer a plurality of data records from a source to a destination storage system and when a specific record of such first task fails to be transferred to the destination storage system, causing the first task to retry transferring of the specific record to the destination storage system so that such retry is only performed a predefined number of times; and
(b) when the first task has been caused to retry transferring of a specific record of the first task more than the predefined number of times, storing the specific record in an error log for a later transfer attempt.
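The two steps of the claim can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: the names `transfer_with_retries`, `write_record`, `error_log`, and `max_retries` are hypothetical, the destination is modeled as a callable that raises `IOError` on failure, and the error log is modeled as an in-memory list of JSON strings.

```python
import json
from typing import Callable, Iterable, List


def transfer_with_retries(
    records: Iterable[dict],
    write_record: Callable[[dict], None],  # hypothetical destination writer
    error_log: List[str],                  # hypothetical stand-in for an error log file
    max_retries: int = 3,                  # the "predefined number of times"
) -> int:
    """Transfer each record; return how many were written successfully."""
    written = 0
    for record in records:
        # One initial attempt plus at most max_retries retries.
        for _attempt in range(max_retries + 1):
            try:
                write_record(record)
                written += 1
                break
            except IOError:
                continue
        else:
            # Retries exhausted: keep the record for a later transfer attempt.
            error_log.append(json.dumps(record))
    return written
```

A record that keeps failing costs a bounded number of attempts and then lands in the log, so one bad record does not stall the rest of the task.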
Abstract
Disclosed are methods and apparatus for error handling within jobs that utilize a plurality of tasks for data transfer of individual data records to a storage destination. For each task, one or more failed records may be logged to a file for later insertion. If a high percentage of a task's output (e.g., writes to another data storage system) is determined to be failing, the task short-circuits itself. Each task is also configured to perform checkpoint logging as the task completes work. If the entire job later short-circuits and is to be restarted, the restarted job only repeats a minimal amount of previously completed work for the tasks which have not already completed their data insertions. Together, these techniques can ensure that, in the face of periodic failures, a long-running job completes in minimal time with minimal effects.
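The short-circuit and checkpoint behavior described in the abstract might look roughly like the following. This is a sketch under stated assumptions: `run_task`, `TaskShortCircuited`, `max_failure_ratio`, and `min_attempts` are hypothetical names, the 50% threshold is an arbitrary choice for illustration, and the checkpoint store is a plain dictionary rather than a durable log.

```python
from typing import Callable, Dict, List


class TaskShortCircuited(Exception):
    """Raised when too large a fraction of a task's writes are failing."""


def run_task(
    task_id: str,
    records: List[dict],
    write_record: Callable[[dict], None],  # hypothetical destination writer
    checkpoints: Dict[str, int],           # task id -> index of next record to write
    error_log: List[dict],                 # failed records kept for later insertion
    max_failure_ratio: float = 0.5,        # assumed threshold, not from the source
    min_attempts: int = 4,                 # avoid tripping on the first few writes
) -> None:
    """Write records from the last checkpoint onward, short-circuiting on heavy failure."""
    start = checkpoints.get(task_id, 0)
    attempts = 0
    failures = 0
    for index in range(start, len(records)):
        attempts += 1
        try:
            write_record(records[index])
        except IOError:
            failures += 1
            if attempts >= min_attempts and failures / attempts > max_failure_ratio:
                # Stop early; the checkpoint still points at this record.
                raise TaskShortCircuited(task_id)
            error_log.append(records[index])
        # Record progress so a restart skips everything up to this point.
        checkpoints[task_id] = index + 1
```

When the job is restarted with the same `checkpoints` mapping, the task resumes at the record that triggered the short-circuit instead of repeating earlier writes.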
21 Claims
1. A computer implemented method of handling errors during a data transfer, comprising:
(a) for a first task that is configured to transfer a plurality of data records from a source to a destination storage system and when a specific record of such first task fails to be transferred to the destination storage system, causing the first task to retry transferring of the specific record to the destination storage system so that such retry is only performed a predefined number of times; and
(b) when the first task has been caused to retry transferring of a specific record of the first task more than the predefined number of times, storing the specific record in an error log for a later transfer attempt.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus comprising at least a processor and a memory, wherein the processor and/or memory are configured to perform the following operations:
(a) for a first task that is configured to transfer a plurality of data records from a source to a destination storage system and when a specific record of such first task fails to be transferred to the destination storage system, causing the first task to retry transferring of the specific record to the destination storage system so that such retry is only performed a predefined number of times; and
(b) when the first task has been caused to retry transferring of a specific record of the first task more than the predefined number of times, storing the specific record in an error log for a later transfer attempt.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product for partitioning a native table in a database, comprising at least one computer-readable medium having computer instructions stored therein which are operable to cause a computer device to perform the following operations:
(a) for a first task that is configured to transfer a plurality of data records from a source to a destination storage system and when a specific record of such first task fails to be transferred to the destination storage system, causing the first task to retry transferring of the specific record to the destination storage system so that such retry is only performed a predefined number of times; and
(b) when the first task has been caused to retry transferring of a specific record of the first task more than the predefined number of times, storing the specific record in an error log for a later transfer attempt.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification