Replaying processing of a restarted application
Abstract
A computer-implemented method according to one embodiment includes establishing a predetermined checkpoint and storing duplicate read data in association with the predetermined checkpoint during a running of an application that is processing at least one data set, identifying a failure of the application, restarting the application in response to the failure, and enabling a replay of the processing of the at least one data set by the restarted application, utilizing the predetermined checkpoint and the duplicate read data.
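The abstract's idea of storing duplicate read data and replaying from it can be illustrated with a minimal sketch. Everything named here (`ReadLog`, `logged_read`, `replay_reads`, the checkpoint identifier `cp-1`, the list-based data set) is a hypothetical stand-in for illustration and is not taken from the patent. The sketch assumes the data set may change between the original run and the restart, which is one reason a replay might need the logged images rather than fresh reads.

```python
class ReadLog:
    """Duplicate read data stored in association with one checkpoint (hypothetical)."""

    def __init__(self, checkpoint_id):
        self.checkpoint_id = checkpoint_id
        self.images = []  # copies of every record handed to the application


def logged_read(data_set, position, log):
    """Read one record during the normal run and keep a duplicate in the log."""
    record = data_set[position]
    log.images.append(record)
    return record


def replay_reads(log):
    """After a restart, serve the same reads from the log instead of the data set."""
    return list(log.images)


data_set = ["r0", "r1", "r2"]
log = ReadLog(checkpoint_id="cp-1")

# Normal run: two reads are made and duplicated into the log.
seen_before_failure = [logged_read(data_set, i, log) for i in range(2)]

# Assumed scenario: the data set changes, then the application fails and restarts.
data_set[0] = "r0-modified"

# Replay: the restarted application sees the original images, not the changed data.
seen_during_replay = replay_reads(log)
```

In this sketch the replayed reads match what the application saw before the failure even though the underlying data set no longer holds the same values.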
24 Claims
1. A computer-implemented method, comprising:
establishing a predetermined checkpoint and storing a log of duplicate read data in association with the predetermined checkpoint during a running of an application that is processing at least one data set, the duplicate read data including an image of all data retrieved from the at least one data set in response to a plurality of data reads made by the application before the predetermined checkpoint;
identifying a first failure of the application;
restarting the application and performing a first replay of the application in response to the first failure;
identifying a second failure of the application during the first replay; and
restarting the application and performing a second replay of the application, utilizing the predetermined checkpoint and the log of duplicate read data, where the second failure of the application results in stale read data within the log that is skipped over, utilizing a filter.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
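Claim 1's final element, skipping stale read data with a filter during the second replay, might look roughly like the sketch below. The claim text does not say how the filter recognizes stale entries, so this sketch assumes each log entry carries an `attempt` number and that entries written by the failed first replay are marked stale. `LogEntry`, `replay_filter`, and `stale_attempts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    attempt: int  # assumption: which run of the application wrote this entry
    image: str    # duplicate of the data returned to the application


def replay_filter(entries, stale_attempts):
    """Yield images for replay, skipping entries written by attempts marked stale."""
    for entry in entries:
        if entry.attempt in stale_attempts:
            continue  # stale read data is skipped over
        yield entry.image


log = [
    LogEntry(attempt=0, image="r0"),          # original run
    LogEntry(attempt=0, image="r1"),          # original run
    LogEntry(attempt=1, image="r2-partial"),  # written during the failed first replay
]

# Second replay: only entries from the original run are fed back to the application.
second_replay_input = list(replay_filter(log, stale_attempts={1}))
```

Tagging entries by attempt is only one possible design; a filter keyed on log positions or timestamps would serve the same purpose.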
12. A computer program product for enabling a replay of a processing of at least one data set by a restarted application, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method comprising:
establishing, utilizing the processor, a predetermined checkpoint and storing, utilizing the processor, a log of duplicate read data in association with the predetermined checkpoint during a running of an application that is processing at least one data set, the duplicate read data including an image of all data retrieved from the at least one data set in response to a plurality of data reads made by the application before the predetermined checkpoint;
identifying, utilizing the processor, a first failure of the application;
restarting, utilizing the processor, the application and performing a first replay of the application in response to the first failure;
identifying, utilizing the processor, a second failure of the application during the first replay; and
restarting the application, utilizing the processor, and performing a second replay of the application, utilizing the predetermined checkpoint and the log of duplicate read data, where the second failure of the application results in stale read data within the log that is skipped over, utilizing a filter.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A system, comprising:
a processor; and
logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to:
establish a predetermined checkpoint and store a log of duplicate read data in association with the predetermined checkpoint during a running of an application that is processing at least one data set, the duplicate read data including an image of all data retrieved from the at least one data set in response to a plurality of data reads made by the application before the predetermined checkpoint;
identify a first failure of the application;
restart the application and perform a first replay of the application in response to the first failure;
identify a second failure of the application during the first replay; and
restart the application and perform a second replay of the application, utilizing the predetermined checkpoint and the log of duplicate read data, where the second failure of the application results in stale read data within the log that is skipped over, utilizing a filter.
24. A computer-implemented method, comprising:
identifying a first failure of an application, where the application is processing at least one data set at a first partition of a system;
restarting the application at a second partition of the system in response to the first failure;
identifying a first plurality of data reads from the restarted application at the second partition of the system, where the first plurality of data reads occur before a predetermined checkpoint;
in response to the first plurality of data reads, retrieving logged data from the first partition of the system and returning the logged data to the restarted application at the second partition of the system;
performing a first replay of the application at the second partition of the system in response to the first failure;
identifying a second failure of the application during the first replay; and
restarting the application and performing a second replay of the application, utilizing the predetermined checkpoint and the logged data, where the second failure of the application results in stale read data within the logged data that is skipped over, utilizing a filter.
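The cross-partition read handling in claim 24 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `Partition`, `serve_read`, and the use of a read index to decide whether a read falls before the checkpoint are all hypothetical.

```python
class Partition:
    """A system partition that may hold logged read data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.read_log = []  # logged data recorded while the application ran here


def serve_read(read_index, checkpoint_index, source_partition, data_set):
    """Answer one read made by the application restarted on another partition.

    Assumption: reads numbered below checkpoint_index occur before the
    checkpoint and are answered from the first partition's logged data;
    later reads go to the data set itself.
    """
    if read_index < checkpoint_index:
        return source_partition.read_log[read_index]
    return data_set[read_index]


first = Partition("P1")
first.read_log = ["r0", "r1"]  # images logged before the first failure

# The data set has changed since the original run.
data_set = ["r0-changed", "r1", "r2"]

# The application restarts on a second partition and repeats its reads.
replayed = [serve_read(i, 2, first, data_set) for i in range(3)]
```

Here the first two reads come from the logged data held at the first partition, and the third, which falls after the assumed checkpoint position, is read from the data set.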
Specification