Transparent checkpointing and process migration in a distributed system

US 20070277056A1
Filed: 11/17/2004
Published: 11/29/2007
Est. Priority Date: 11/17/2003
Status: Active Grant

Abstract

A distributed system for creating a checkpoint for a plurality of processes running on the distributed system. The distributed system includes a plurality of compute nodes with an operating system executing on each compute node. A checkpoint library resides at the user level on each of the compute nodes, and the checkpoint library is transparent to the operating system residing on the same compute node and to the other compute nodes. Each checkpoint library uses a windowed messaging logging protocol for checkpointing of the distributed system. Processes participating in a distributed computation on the distributed system may be migrated from one compute node to another compute node in the distributed system by re-mapping of hardware addresses using the checkpoint library.

109 Claims

1-46. (canceled)

47. A compute node capable of operating as part of a distributed system, comprising:
- memory; and
- a processor configured to access the memory to perform a process in a distributed computation running on the distributed system, the processor being further configured to record a first set of memory locations modified by the processor during a first checkpoint interval, and create a checkpoint from the contents of the first set of memory locations, while recording a second set of memory locations modified by the processor during a second checkpoint interval.
- Dependent claims (48, 49, 50, 51, 52, 53, 54, 55, 56, 57)
  - 48. The compute node of claim 47 wherein the processor is further configured to write protect the first set of memory locations before modifying the second set of memory locations.
  - 49. The compute node of claim 48 wherein the processor is further configured to suspend the process between the first and second checkpoint intervals, the processor being further configured to write protect the first set of memory locations while the process is suspended.
  - 50. The compute node of claim 49 wherein the processor is further configured to enter a barrier following the completion of write protecting the first set of memory locations and exit the barrier before resuming the process during the second checkpoint interval.
  - 51. The compute node of claim 48 wherein the processor is further configured to create the checkpoint by storing the contents of the first set of memory locations, the processor being further configured to remove the write protection for a memory location from the first set when the processor needs to modify the memory location during the second checkpoint interval after the contents of the memory location has been stored.
  - 52. The compute node of claim 51 wherein the processor is further configured to create the checkpoint by storing the contents of the first set of memory locations in a certain order, the processor further being configured to store the contents of a memory location from the first set earlier than it would otherwise be stored when the processor needs to modify the memory location during the second checkpoint interval.
  - 53. The compute node of claim 52 wherein the processor is further configured to remove the write protection for the memory location after the contents of the memory location has been stored.
  - 54. The compute node of claim 47 further comprising a checkpoint file, wherein the processor is further configured to create the checkpoint by storing the contents of the first set of memory locations to the checkpoint file.
  - 55. The compute node of claim 54 wherein the processor is further configured to remove the record of a memory location from the first set after the contents from the memory location is stored in the checkpoint file.
  - 56. The compute node of claim 47 wherein the processor is further configured to create the checkpoint by storing the contents of the first set of memory locations to non-volatile storage.
  - 57. The compute node of claim 47 wherein the processor is further configured to store in the memory a copy of each message output from the compute node during the process until an acknowledgement is received, and output each message copied in the memory that does not receive an acknowledgement, and wherein the processor is further configured to receive messages during the process, and output an acknowledgement for each message received, the processor being further configured to recognize and discard duplicate messages received by the compute node, and for each duplicate message, output an acknowledgement.
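
The interval-based recording in claim 47 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: `DirtyTracker`, `end_interval`, and `create_checkpoint` are hypothetical names, memory is modeled as a Python list, and the values of the first set are copied eagerly at the interval boundary instead of being write protected as in claim 48.

```python
class DirtyTracker:
    """Toy model: records which locations are modified in each checkpoint interval."""

    def __init__(self, size):
        self.memory = [0] * size
        self.current = set()   # locations modified in the interval now running
        self.pending = set()   # locations recorded during the previous interval
        self.frozen = {}       # values of the pending locations at the boundary

    def write(self, addr, value):
        # Every write is recorded against the interval that is currently running.
        self.memory[addr] = value
        self.current.add(addr)

    def end_interval(self):
        # The first set becomes "pending"; a fresh, empty set starts recording.
        self.pending = self.current
        self.current = set()
        # Assumption: eager copy stands in for write protection.
        self.frozen = {addr: self.memory[addr] for addr in self.pending}

    def create_checkpoint(self):
        # Built only from the first set, while `current` keeps recording.
        checkpoint = dict(self.frozen)
        self.pending = set()
        self.frozen = {}
        return checkpoint


def demo():
    tracker = DirtyTracker(8)
    tracker.write(1, 10)            # first checkpoint interval
    tracker.write(3, 30)
    tracker.end_interval()
    tracker.write(1, 11)            # second checkpoint interval
    tracker.write(5, 50)
    checkpoint = tracker.create_checkpoint()
    return checkpoint, sorted(tracker.current)
```

In `demo()`, the checkpoint holds the values locations 1 and 3 had at the end of the first interval, even though location 1 is overwritten during the second interval before the checkpoint is assembled.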

58. A compute node capable of operating as part of a distributed system, comprising:
- memory; and
- a processor configured to perform a process in a distributed computation running on the distributed system, the processor being further configured to store in the memory a copy of each message output from the compute node until an acknowledgement is received, the processor being further configured to create a checkpoint, and if a subsequent failure occurs, roll the compute node back to the checkpoint and output each message copied in the memory that does not receive an acknowledgement after the compute node is rolled back to the checkpoint.
- Dependent claims (59)
  - 59. The compute node of claim 58 wherein the processor is further configured to receive messages, and output an acknowledgement for each message received, the processor being further configured to recognize and discard duplicate messages received by the compute node, and for each duplicate message, output an acknowledgement.

60. A compute node capable of operating as part of a distributed system, comprising:
- a processor configured to perform a process in a distributed computation running on the distributed system, the processor being further configured to receive messages, and output an acknowledgement for each message received, the processor being further configured to create a checkpoint, and if a subsequent failure occurs, roll the compute node back to the checkpoint, recognize and discard duplicate messages received by the compute node after the compute node is rolled back to the checkpoint, and for each duplicate message, output an acknowledgement.
- Dependent claims (61)
  - 61. The compute node of claim 60 further comprising memory, wherein the processor is further configured to store in the memory a copy of each message output from the compute node until an acknowledgement is received, the processor being further configured to output each message copied in the memory that does not receive an acknowledgement.
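
The sender-side logging of claims 58 and 61 and the receiver-side duplicate handling of claims 59 and 60 can be sketched together. The code below is a minimal illustration under assumptions: the `Node` class, the per-sender sequence numbers used to recognize duplicates, and the in-process "network" are all hypothetical, and the rollback itself is not modeled, only the retransmission that follows it.

```python
class Node:
    """Toy compute node that logs outgoing messages and filters duplicates."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0
        self.unacked = {}      # seq -> payload, kept until an acknowledgement arrives
        self.seen = set()      # (sender, seq) pairs already delivered
        self.delivered = []    # payloads handed to the application

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload          # keep a copy in memory
        return (self.name, seq, payload)

    def receive(self, message):
        sender, seq, payload = message
        if (sender, seq) not in self.seen:   # first delivery
            self.seen.add((sender, seq))
            self.delivered.append(payload)
        # Duplicates are discarded, but every message is acknowledged.
        return (sender, seq)

    def on_ack(self, ack):
        _, seq = ack
        self.unacked.pop(seq, None)          # the copy is no longer needed

    def resend_unacked(self):
        # What a node would output again after rolling back to a checkpoint.
        return [(self.name, seq, p) for seq, p in sorted(self.unacked.items())]


def demo():
    a, b = Node("a"), Node("b")
    m0, m1 = a.send("m0"), a.send("m1")
    ack0 = b.receive(m0)
    b.receive(m1)                            # assume this acknowledgement is lost
    a.on_ack(ack0)
    for message in a.resend_unacked():       # retransmission after the rollback
        a.on_ack(b.receive(message))
    return b.delivered, a.unacked
```

In `demo()`, node `b` sees `m1` twice but delivers it once, and the second acknowledgement lets node `a` drop its stored copy.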

62. A compute node capable of operating as part of a distributed system, comprising:
- a processor configured to perform a process in a distributed computation running on the distributed system, the processor being further configured to create a checkpoint for the process, and in response to a preemptive scheduling request, store the checkpoint to non-volatile memory and halt the process.
- Dependent claims (63, 64, 65)
  - 63. The compute node of claim 62 wherein the processor is further configured to use the stored checkpoint to resume the process.
  - 64. The compute node of claim 62 wherein the processor is further configured to perform a process in a second distributed computation running on the distributed system after the previous process is halted.
  - 65. The compute node of claim 62 wherein the processor is further configured to perform a second process in the distributed computation previously being performed by another compute node in the distributed system, the processor being further configured to resume the second process from the checkpoint last taken by said another compute node for the second process.
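
Claims 62 and 63 describe storing a checkpoint and halting on a preemptive scheduling request, then resuming from the stored checkpoint. A rough sketch follows, assuming the process state fits in a small dictionary and a JSON file stands in for non-volatile memory; `Worker`, `preempt`, and `resume` are hypothetical names, not terms from the patent.

```python
import json
import os
import tempfile


class Worker:
    """Toy process whose whole state is a small dictionary."""

    def __init__(self, state=None):
        self.state = state if state is not None else {"step": 0, "total": 0}
        self.running = True

    def step(self):
        if not self.running:
            raise RuntimeError("process is halted")
        self.state["step"] += 1
        self.state["total"] += self.state["step"]

    def preempt(self, path):
        # Write to a temporary file, sync it, then rename it into place so a
        # crash mid-write cannot leave a partial checkpoint at `path`.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
        self.running = False                 # halt the process

    @classmethod
    def resume(cls, path):
        with open(path) as f:
            return cls(json.load(f))


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "checkpoint.json")
        worker = Worker()
        worker.step()
        worker.step()
        worker.preempt(path)                 # preemptive scheduling request
        resumed = Worker.resume(path)        # later, possibly on another node
        resumed.step()
        return worker.running, resumed.state
```

The resumed worker continues from step 2 rather than starting over, which is the property the stored checkpoint is meant to provide.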

66. Computer readable media embodying a program of instructions executable by a processor to perform a method of creating a checkpoint for a process in a distributed computation running on a distributed system, the method comprising:
- recording a first set of memory locations modified by the process during a first checkpoint interval; and
- creating a checkpoint from the contents of the first set of memory locations, while recording a second set of memory locations modified by the process during a second checkpoint interval.
- Dependent claims (67, 68, 69, 70, 71, 72, 73, 74, 75, 76)
  - 67. The computer readable media of claim 66 wherein the method further comprises write protecting the first set of memory locations before the process modifies the second set of memory locations.
  - 68. The computer readable media of claim 67 wherein the method further comprises suspending the process between the first and second checkpoint intervals, and wherein the first set of memory locations are write protected while the process is suspended.
  - 69. The computer readable media of claim 67 wherein the method further comprises entering a barrier following the completion of write protecting the first set of memory locations and exiting the barrier before resuming the process during the second checkpoint interval.
  - 70. The computer readable media of claim 67 wherein the checkpoint is created by storing the contents of the first set of memory locations, the method further comprising removing the write protection for a memory location from the first set when the process needs to modify the memory location during the second checkpoint interval after the contents of the memory location has been stored.
  - 71. The computer readable media of claim 70 wherein the checkpoint is created by storing the contents of the first set of memory locations in a certain order, the method further comprising storing the contents of a memory location from the first set earlier than it would otherwise be stored when the process needs to modify the memory location during the second checkpoint interval.
  - 72. The computer readable media of claim 71 wherein the method further comprises removing the record of the memory location and the write protection for the memory location, after the contents of the memory location has been stored.
  - 73. The computer readable media of claim 66 wherein the checkpoint is created by storing the contents of the first set of memory locations to a checkpoint file.
  - 74. The computer readable media of claim 73 wherein the method further comprises removing the record of a memory location from the first set after the contents from the memory location is stored in the checkpoint file.
  - 75. The computer readable media of claim 66 wherein the checkpoint is created by storing the contents of the first set of memory locations to non-volatile storage.
  - 76. The computer readable media of claim 66 wherein the process is performed by a compute node in the distributed system, the method further comprising storing in the memory a copy of each message output from the compute node during the process until an acknowledgement is received, and outputting each message copied in the memory that does not receive an acknowledgement, and wherein the method further comprises receiving messages during the process, outputting an acknowledgement for each message received, recognizing and discarding duplicate messages received by the compute node, and for each duplicate message, outputting an acknowledgement.

77. Computer readable media embodying a program of instructions executable by a processor to perform a method of creating a checkpoint for a process in a distributed computation running on a distributed system, the process being performed by a compute node, the method comprising:
- storing in a memory a copy of each message output from the compute node until an acknowledgement is received;
- creating a checkpoint for the process;
- rolling the compute node back to the checkpoint in response to a failure; and
- outputting each message copied in the memory that does not receive an acknowledgement after the compute node is rolled back to the checkpoint.
- Dependent claims (78)
  - 78. The computer readable media of claim 77 wherein the method further comprises outputting an acknowledgement for each message received by the compute node, and recognizing and discarding duplicate messages received by the compute node, and for each duplicate message, outputting an acknowledgement.

79. Computer readable media embodying a program of instructions executable by a processor to perform a method of creating a checkpoint for a process in a distributed computation running on a distributed system, the process being performed by a compute node, the method comprising:
- outputting an acknowledgement for each message received by the compute node;
- creating a checkpoint for the process;
- rolling the compute node back to the checkpoint in response to a failure; and
- recognizing and discarding duplicate messages received by the compute node after the compute node is rolled back to the checkpoint, and for each duplicate message, outputting an acknowledgement.
- Dependent claims (80)
  - 80. The computer readable media of claim 79 wherein the method further comprises storing in the memory a copy of each message output from the compute node until an acknowledgement is received, and outputting each message copied in the memory that does not receive an acknowledgement.

81. Computer readable media embodying a program of instructions comprising a checkpoint library executable by a processor having access to an operating system and a distributed application to perform a process in a distributed computation running on a distributed system, the checkpoint library comprising:
- instructions to create a checkpoint for the process, the creation of the checkpoint being transparent to the operating system and the distributed application.
- Dependent claims (82, 83)
  - 82. The computer readable media of claim 81 wherein the instructions to create the checkpoint comprise instructions to record a first set of memory locations modified by the process during a first checkpoint interval, and instructions to create the checkpoint from the contents of the first set of memory locations, while recording a second set of memory locations modified by the process during a second checkpoint interval.
  - 83. The computer readable media of claim 81 wherein the process is performed in a compute node in the distributed system, and wherein the instructions to create the checkpoint comprise instructions to store in memory a copy of each message output from the compute node during the process until an acknowledgement is received, and output each message copied in the memory that does not receive an acknowledgement, and instructions to receive messages during the process, output an acknowledgement for each message received, recognize and discard duplicate messages received by the compute node, and for each duplicate message, output an acknowledgement.

84. Computer readable media embodying a program of instructions executable by a processor to perform a method of creating a checkpoint for a process in a distributed computation running on a distributed system, the method comprising:
- creating a checkpoint for the process; and
- storing the checkpoint to non-volatile memory and halting the process in response to a preemptive scheduling request.
- Dependent claims (85, 86, 87)
  - 85. The computer readable media of claim 84 wherein the method further comprises using the stored checkpoint to resume the process.
  - 86. The computer readable media of claim 84 wherein the method further comprises performing a process in a second distributed computation running on the distributed system after the previous process is halted.
  - 87. The computer readable media of claim 84 wherein the process is performed in a compute node, the method further comprising performing a second process in the distributed computation previously being performed by a second compute node in the distributed system, the second process being resumed from the checkpoint last taken by the second compute node.

88. A method of creating a checkpoint for a process in a distributed computation running on a distributed system, the method comprising:
- recording a first set of memory locations modified by the process during a first checkpoint interval; and
- creating a checkpoint from the contents of the first set of memory locations, while recording a second set of memory locations modified by the process during a second checkpoint interval.
- Dependent claims (89, 90, 91, 92, 93, 94, 95, 96, 97, 98)
  - 89. The method of claim 88 further comprising write protecting the first set of memory locations before the process modifies the second set of memory locations.
  - 90. The method of claim 89 further comprising suspending the process between the first and second checkpoint intervals, and wherein the first set of memory locations are write protected while the process is suspended.
  - 91. The method of claim 90 further comprising entering a barrier following the completion of write protecting the first set of memory locations and exiting the barrier before resuming the process during the second checkpoint interval.
  - 92. The method of claim 89 wherein the checkpoint is created by storing the contents of the first set of memory locations, the method further comprising removing the write protection for a memory location from the first set when the process needs to modify the memory location during the second checkpoint interval after the contents of the memory location has been stored.
  - 93. The method of claim 92 wherein the checkpoint is created by storing the contents of the first set of memory locations in a certain order, the method further comprising storing the contents of a memory location from the first set earlier than it would otherwise be stored when the process needs to modify the memory location during the second checkpoint interval.
  - 94. The method of claim 93 further comprising removing the record of the memory location and the write protection for the memory location, after the contents of the memory location has been stored.
  - 95. The method of claim 88 wherein the checkpoint is created by storing the contents of the first set of memory locations to a checkpoint file.
  - 96. The method of claim 95 further comprising removing the record of a memory location from the first set after the contents from the memory location is stored in the checkpoint file.
  - 97. The method of claim 88 wherein the checkpoint is created by storing the contents of the first set of memory locations to non-volatile storage.
  - 98. The method of claim 88 wherein the process is performed by a compute node in the distributed system, the method further comprising storing in the memory a copy of each message output from the compute node during the process until an acknowledgement is received, and outputting each message copied in the memory that does not receive an acknowledgement, and wherein the method further comprises receiving messages during the process, outputting an acknowledgement for each message received, recognizing and discarding duplicate messages received by the compute node, and for each duplicate message, outputting an acknowledgement.
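
The write-protection behavior in claims 89 and 92 to 94 can be pictured as a store queue that a pending write is allowed to jump. The following is a simplified simulation under assumptions: `ProtectedCheckpointer` is a hypothetical name, protection is a Python set rather than real page protection, and ascending address order is used as the "certain order" only for illustration.

```python
from collections import deque


class ProtectedCheckpointer:
    """Toy model of storing a write-protected first set in the background."""

    def __init__(self, memory, first_set):
        self.memory = memory
        self.protected = set(first_set)      # write-protected, not yet stored
        self.queue = deque(sorted(first_set))  # planned storage order (assumed)
        self.checkpoint = {}
        self.order = []                      # order in which locations were stored

    def _store(self, addr):
        self.checkpoint[addr] = self.memory[addr]
        self.order.append(addr)
        self.protected.discard(addr)         # remove the record and the protection

    def store_next(self):
        # Background step: store the next location that is still protected.
        while self.queue:
            addr = self.queue.popleft()
            if addr in self.protected:
                self._store(addr)
                return addr
        return None

    def write(self, addr, value):
        # A write to a protected location stores it earlier than planned,
        # so the old value reaches the checkpoint before it is overwritten.
        if addr in self.protected:
            self._store(addr)
        self.memory[addr] = value


def demo():
    memory = [0, 10, 20, 30]
    ckpt = ProtectedCheckpointer(memory, {1, 2, 3})
    ckpt.store_next()        # stores location 1 in the planned order
    ckpt.write(3, 99)        # location 3 is stored early, then modified
    ckpt.store_next()        # stores location 2
    ckpt.store_next()        # nothing left: location 3 was already stored
    return ckpt.checkpoint, ckpt.order, memory[3]
```

In `demo()`, location 3 is stored ahead of location 2 because the process needs to modify it, and the checkpoint still holds its pre-write value.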

99. A method of creating a checkpoint for a process in a distributed computation running on a distributed system, the process being performed by a compute node, the method comprising:
- storing in a memory a copy of each message output from the compute node until an acknowledgement is received;
- creating a checkpoint for the process;
- rolling the compute node back to the checkpoint in response to a failure; and
- outputting each message copied in the memory that does not receive an acknowledgement after the compute node is rolled back to the checkpoint.
- Dependent claims (100)
  - 100. The method of claim 99 further comprising outputting an acknowledgement for each message received by the compute node, and recognizing and discarding duplicate messages received by the compute node, and for each duplicate message, outputting an acknowledgement.

101. A method of creating a checkpoint for a process in a distributed computation running on a distributed system, the process being performed by a compute node, the method comprising:
- outputting an acknowledgement for each message received by the compute node;
- creating a checkpoint for the process;
- rolling the compute node back to the checkpoint in response to a failure; and
- recognizing and discarding duplicate messages received by the compute node after the compute node is rolled back to the checkpoint, and for each duplicate message, outputting an acknowledgement.
- Dependent claims (102)
  - 102. The method of claim 101 further comprising storing in the memory a copy of each message output from the compute node until an acknowledgement is received, and outputting each message copied in the memory that does not receive an acknowledgement.

103. A method of creating a checkpoint for a process in a distributed computation running on a distributed system, the method comprising:
- creating a checkpoint for the process; and
- storing the checkpoint to non-volatile memory and halting the process in response to a preemptive scheduling request.
- Dependent claims (104, 105, 106)
  - 104. The method of claim 103 further comprising using the stored checkpoint to resume the process.
  - 105. The method of claim 103 further comprising performing a process in a second distributed computation running on the distributed system after the previous process is halted.
  - 106. The method of claim 103 wherein the process is performed in a compute node, the method further comprising performing a second process in the distributed computation previously being performed by a second compute node in the distributed system, the second process being resumed from the checkpoint last taken by the second compute node.

107. A method of migrating a process in a distributed computation running on a distributed system from a first compute node to a second compute node in the distributed system, each of the first and second compute nodes having an operating system, the method comprising:
- creating a checkpoint for the process in the first compute node; and
- migrating the process to the second compute node by providing the second compute node with the checkpoint without migrating the operating system from the first compute node to the second compute node.
- Dependent claims (108, 109)
  - 108. The method of claim 107 wherein the checkpoint is created by recording a first set of memory locations modified by the process in the first compute node during a first checkpoint interval, and creating the checkpoint from the contents of the first set of memory locations, while recording a second set of memory locations modified by the process in the first compute node during a second checkpoint interval.
  - 109. The method of claim 107 wherein the checkpoint is created by storing in memory a copy of each message output from the first compute node during the process in the first compute node until an acknowledgement is received, and outputting each message copied in the memory that does not receive an acknowledgement, and receiving messages during the process in the first compute node, outputting an acknowledgement for each message received, recognizing and discarding duplicate messages received by the first compute node, and for each duplicate message, outputting an acknowledgement.
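
The migration in claim 107, combined with the address re-mapping mentioned in the abstract, can be sketched as moving only the process checkpoint and updating a routing table. This is a loose illustration under assumptions: `ComputeNode`, `migrate`, the `routing` dictionary keyed by process rank, and the addresses are all hypothetical and do not come from the patent.

```python
import copy


class ComputeNode:
    """Toy compute node: its operating system never leaves the node."""

    def __init__(self, address):
        self.address = address
        self.os = "os@" + address            # stays put during migration
        self.processes = {}                  # rank -> process state


def migrate(rank, source, target, routing):
    # Create a checkpoint of the process on the first node.
    checkpoint = copy.deepcopy(source.processes[rank])
    del source.processes[rank]
    # Provide the second node with the checkpoint; only process state moves.
    target.processes[rank] = checkpoint
    # Assumption: peers find the process through a rank-to-address table,
    # so re-mapping the entry redirects traffic to the new node.
    routing[rank] = target.address


def demo():
    first = ComputeNode("10.0.0.1")
    second = ComputeNode("10.0.0.2")
    first.processes[0] = {"iteration": 42}
    routing = {0: first.address}
    migrate(0, first, second, routing)
    return first.processes, second.processes, routing, second.os
```

After `migrate`, the process state lives on the second node, the routing entry points at the second node's address, and each node still runs its own operating system.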

Current Assignee: International Business Machines Corporation
Original Assignee: Virginia Tech Intellectual Properties, Inc. (Virginia Polytechnic Institute and State University)
Inventors: Srinidhi Varadarajan; Joseph Ruscio

Granted Patent: US 7,536,591 B2
US Class Current: 714/15
CPC Class Codes

G06F 11/1438   Restarting or rejuvenating

G06F 11/1451   by selection of backup cont...

G06F 11/1458   Management of the backup or...

G06F 11/1464   for networked environments

G06F 11/1469   Backup restoration techniques

G06F 11/1471   involving logging of persis...

G06F 11/1482   by means of middleware or O...

G06F 11/203   using migration

G06F 11/2046   where the redundant compone...

G06F 16/215   Improving data quality; Dat...

G06F 2201/82   Solving problems relating t...

G06F 2201/84   Using snapshots, i.e. a log...

G06F 3/0619   in relation to data integri...

G06F 3/065   Replication mechanisms

G06F 3/0683   Plurality of storage devices
