LOW OVERHEAD FAULT TOLERANCE THROUGH HYBRID CHECKPOINTING AND REPLAY
Abstract
A virtualized computer system provides fault-tolerant operation of a primary virtual machine. In one embodiment, this system includes a backup computer system that stores a snapshot of the primary virtual machine and a log file containing non-deterministic events occurring in the instruction stream of the primary virtual machine. The primary virtual machine periodically updates the snapshot and the log file. Upon a failure of the primary virtual machine, the backup computer can instantiate a failover backup virtual machine by consuming the stored snapshot and log file.
20 Claims
1. A method for providing fault-tolerance protection for a primary virtual machine using a snapshot image of the primary virtual machine that represents a state of the primary virtual machine at a time t0, the method comprising:
tracking virtual machine instructions executed by the primary virtual machine since time t0;
tracking a set of changes made to the state of the primary virtual machine since time t0; and
merging the set of changes into the snapshot image, thereby creating a new snapshot image for a backup virtual machine.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
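The merging step of claim 1 can be pictured with a minimal sketch. It assumes, purely for illustration, that the snapshot image and the tracked change set are both maps from a page number to that page's contents; the claim itself does not prescribe any representation. The names `merge_changes`, `snapshot_t0`, and `changes_since_t0` are hypothetical.

```python
from typing import Dict

# Assumed representation (not from the claim): page number -> page contents.
Snapshot = Dict[int, bytes]


def merge_changes(snapshot: Snapshot, changes: Snapshot) -> Snapshot:
    """Return a new snapshot: the old image with every changed page overwritten."""
    merged = dict(snapshot)  # copy, so the image taken at t0 stays intact
    merged.update(changes)   # changed pages replace their older versions
    return merged


snapshot_t0 = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
# Changes tracked since t0: page 1 was rewritten, page 3 was touched for the first time.
changes_since_t0 = {1: b"bbbb", 3: b"DDDD"}

snapshot_t1 = merge_changes(snapshot_t0, changes_since_t0)
```

Copying before merging keeps the t0 image usable until the new image is complete; an in-place merge would also fit the claim language but would leave no consistent image if the merge were interrupted.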
12. A virtualized computer platform adapted to provide failure-free operation of a primary virtual machine, the virtualized computer platform comprising:
a primary computer system comprising a processor programmed to execute a virtual machine software layer configured to (a) instantiate the primary virtual machine, (b) identify non-deterministic events occurring in the primary virtual machine's instruction stream, and (c) periodically determine checkpoint changes made to the primary virtual machine's state; and
a backup computer system coupled to the primary computer system, the backup computer system comprising a data store and a processor coupled to the data store, the processor programmed to (a) receive notification of a failure of the primary virtual machine, (b) retrieve from the data store a snapshot image of the primary virtual machine, (c) retrieve from the data store a log file containing information relating to non-deterministic events in the primary virtual machine's instruction stream, and (d) provide the snapshot image and the log file to an instantiated backup virtual machine for restoring a state of the primary virtual machine by using the snapshot image and replaying a set of virtual machine instructions using the log file.
Dependent claims: 13, 14
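To illustrate how a snapshot plus a log of non-deterministic events can reproduce the primary's state, the following toy sketch may help. Everything in it is an assumption made for illustration: `ToyVM` reduces the guest state to a single integer, each "instruction" is one fixed arithmetic update, and the log is a map from instruction count to the external value that arrived at that point. It is not the patented implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ToyVM:
    acc: int = 0       # stands in for the whole guest state
    executed: int = 0  # number of instructions executed so far

    def step(self, external_input: int = 0) -> None:
        # Deterministic update plus whatever non-deterministic input arrived.
        self.acc = (self.acc * 31 + 7 + external_input) % 1_000_003
        self.executed += 1


def restore_and_replay(snapshot: Tuple[int, int], log: Dict[int, int], until: int) -> ToyVM:
    """Rebuild the primary's state from a snapshot plus the event log."""
    vm = ToyVM(acc=snapshot[0], executed=snapshot[1])
    while vm.executed < until:
        # Inject each logged event at the instruction count where it occurred.
        vm.step(log.get(vm.executed, 0))
    return vm


# Primary side: run ten instructions, checkpoint at instruction 5, log the events.
primary = ToyVM()
inputs = {3: 42, 7: 99}  # hypothetical non-deterministic events, e.g. device input
log: Dict[int, int] = {}
snapshot = (0, 0)
for _ in range(10):
    if primary.executed == 5:
        snapshot = (primary.acc, primary.executed)  # new checkpoint
        log.clear()  # events before the checkpoint are already in the snapshot
    value = inputs.get(primary.executed, 0)
    if value:
        log[primary.executed] = value
    primary.step(value)

# Backup side, after a failure notification: restore, then replay.
backup = restore_and_replay(snapshot, log, until=primary.executed)
```

Only the event at instruction 7 has to be replayed here, because the checkpoint at instruction 5 already reflects the earlier one. In this toy, that is what keeps the replay work after a failure short.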
15. A computer readable storage medium having stored therein a computer program for providing a fault-tolerant virtual machine operating environment, wherein the computer program is configured to be executed on a primary computer system to carry out the steps of:
instantiating a primary virtual machine;
establishing a connection to a backup computer system;
transmitting a snapshot image of the primary virtual machine to the backup computer system;
logging information relating to virtual machine instructions executed by the primary virtual machine;
periodically determining a set of changes made to the state of the primary virtual machine; and
transmitting the set of changes to the backup computer system.
Dependent claims: 16, 17, 18, 19, 20
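The primary-side steps of claim 15 can be sketched roughly as follows. This is a sketch under stated assumptions, not the claimed program: the "connection" is a plain Python list standing in for a network channel, the VM state is a map of pages, the sketch forwards each log entry over that same channel, and the class `Primary` with its methods `write`, `log_event`, and `checkpoint` is hypothetical. A real system would presumably invoke `checkpoint` on a timer rather than by hand.

```python
from typing import Dict, List, Set, Tuple

Message = Tuple[str, object]  # (kind, payload) sent to the backup


class Primary:
    def __init__(self, pages: Dict[int, bytes], channel: List[Message]) -> None:
        self.pages = dict(pages)
        self.dirty: Set[int] = set()  # pages written since the last checkpoint
        self.channel = channel        # stands in for the backup connection
        # Transmit the initial snapshot image.
        self.channel.append(("snapshot", dict(self.pages)))

    def write(self, page: int, data: bytes) -> None:
        """Guest write: update the page and remember that it changed."""
        self.pages[page] = data
        self.dirty.add(page)

    def log_event(self, info: str) -> None:
        """Log information about executed instructions for later replay."""
        self.channel.append(("log", info))

    def checkpoint(self) -> None:
        """Determine the changes since the last checkpoint and transmit them."""
        changes = {p: self.pages[p] for p in sorted(self.dirty)}
        self.channel.append(("changes", changes))
        self.dirty.clear()


channel: List[Message] = []
vm = Primary({0: b"boot"}, channel)
vm.write(1, b"data")
vm.log_event("timer interrupt at instruction 120")
vm.checkpoint()
vm.write(1, b"DATA")
vm.checkpoint()
```

Each checkpoint carries only the pages written since the previous one, so the second `changes` message contains page 1 alone rather than the full image.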
Specification