Virtual machine fault tolerance
First Claim
1. In a computer system running at least a primary virtual machine (VM) on virtualization software on a primary virtualized computer system (VCS) and running a secondary VM on virtualization software on a secondary VCS, a computer implemented method for the secondary VM to provide quasi-lockstep fault tolerance for the primary VM comprises:
as the primary VM is executing a workload, virtualization software in the primary VCS is:
(a) causing predetermined events to be recorded in an event log, (b) keeping output associated with the predetermined events pending, and (c) sending the log entries to the virtualization software in the secondary VCS;
as the secondary VM is replaying the workload, virtualization software in the secondary VCS is:
(a) sending acknowledgements indicating that log entries have been received;
(b) when the virtualization software encounters one of the predetermined events, searching the log entries to determine whether a log entry corresponding to the same event was received from the primary VCS, and if so, comparing data associated with the predetermined event produced by the secondary VM with that of the primary VM;
if there is a match, the virtualization software in the secondary VCS transmitting an acknowledgement to the virtualization software in the primary VCS;
one of the virtualization software in the primary or secondary VCS dropping the event and the other dispatching the output; and
if there is no match, performing a checkpoint resynchronization;
wherein at least one of the predetermined events relates to virtual disk I/O, and virtual disks are shared; and
wherein the secondary VM output is not sent to the virtual disks after a match is found.
Abstract
In a computer system running a primary virtual machine (VM) on virtualization software on a primary virtualized computer system (VCS) and running a secondary VM on virtualization software on a secondary VCS, a method for the secondary VM to provide quasi-lockstep fault tolerance for the primary VM includes: as the primary VM is executing a workload, virtualization software in the primary VCS is: (a) causing predetermined events to be recorded in an event log, (b) keeping output associated with the predetermined events pending, and (c) sending the log entries to the virtualization software in the secondary VCS; as the secondary VM is replaying the workload, virtualization software in the secondary VCS is: (a) sending acknowledgements indicating that log entries have been received; (b) when the virtualization software encounters one of the predetermined events, searching the log entries to determine whether a log entry corresponding to the same event was received from the primary VCS, and if so, comparing data associated with the predetermined event produced by the secondary VM with that of the primary VM; if there is a match, the virtualization software in the secondary VCS transmitting an acknowledgement to the virtualization software in the primary VCS; one of the virtualization software in the primary or secondary VCS dropping the event and the other dispatching the output; and if there is no match, performing a checkpoint resynchronization.
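The log, compare, and acknowledge flow described in the abstract can be sketched roughly as follows. This is only an illustrative model, not the patented implementation: the class names `Primary` and `Secondary`, the `LogEntry` fields, the integer event identifiers, and the string results `"received"`, `"match"`, `"resync"`, and `"wait"` are all hypothetical, and the sketch assumes the primary is the side that dispatches the output while the secondary drops its own copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    event_id: int   # hypothetical identifier for a predetermined event
    data: bytes     # output data the primary VM produced for that event


class Primary:
    """Models the virtualization software on the primary VCS (assumed names)."""

    def __init__(self):
        self.pending = {}      # event_id -> output kept pending
        self.dispatched = []   # outputs released after a match

    def record(self, event_id: int, data: bytes) -> LogEntry:
        # (a) record the event, (b) keep its output pending,
        # (c) the returned entry is what would be sent to the secondary.
        self.pending[event_id] = data
        return LogEntry(event_id, data)

    def on_match_ack(self, event_id: int) -> None:
        # Assumption: the primary dispatches once the secondary reports a match.
        self.dispatched.append(self.pending.pop(event_id))


class Secondary:
    """Models the virtualization software on the secondary VCS (assumed names)."""

    def __init__(self):
        self.log = {}

    def receive(self, entry: LogEntry) -> str:
        self.log[entry.event_id] = entry
        return "received"      # acknowledgement that the entry arrived

    def on_event(self, event_id: int, data: bytes) -> str:
        entry = self.log.get(event_id)
        if entry is None:
            return "wait"      # no corresponding log entry received yet
        if entry.data == data:
            return "match"     # secondary drops its output, acks the primary
        return "resync"        # divergence: checkpoint resynchronization
```

In this sketch the caller would invoke `Primary.on_match_ack` when `Secondary.on_event` returns `"match"`, and start a checkpoint resynchronization when it returns `"resync"`.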
17 Claims
1. In a computer system running at least a primary virtual machine (VM) on virtualization software on a primary virtualized computer system (VCS) and running a secondary VM on virtualization software on a secondary VCS, a computer implemented method for the secondary VM to provide quasi-lockstep fault tolerance for the primary VM comprises:
as the primary VM is executing a workload, virtualization software in the primary VCS is:
(a) causing predetermined events to be recorded in an event log, (b) keeping output associated with the predetermined events pending, and (c) sending the log entries to the virtualization software in the secondary VCS;
as the secondary VM is replaying the workload, virtualization software in the secondary VCS is:
(a) sending acknowledgements indicating that log entries have been received;
(b) when the virtualization software encounters one of the predetermined events, searching the log entries to determine whether a log entry corresponding to the same event was received from the primary VCS, and if so, comparing data associated with the predetermined event produced by the secondary VM with that of the primary VM;
if there is a match, the virtualization software in the secondary VCS transmitting an acknowledgement to the virtualization software in the primary VCS;
one of the virtualization software in the primary or secondary VCS dropping the event and the other dispatching the output; and
if there is no match, performing a checkpoint resynchronization;
wherein at least one of the predetermined events relates to virtual disk I/O, and virtual disks are shared; and
wherein the secondary VM output is not sent to the virtual disks after a match is found.
Dependent claims: 2, 3.
4. In a computer system running at least a first virtual machine (VM) and a second VM on virtualization software, a computer implemented method for the second VM to provide quasi-lockstep fault tolerance for the first VM comprises:
recording predetermined operations of the first VM to log entries and communicating the log entries to the second VM until the first VM reaches a first externally visible output;
replaying the log entries at the second VM until the second VM reaches a second externally visible output; and
responsive to the second externally visible output diverging from the first externally visible output, the first VM initiating a checkpoint resynchronization process and the second VM completing the checkpoint resynchronization process for the second VM to restore checkpointed states of the first VM and be synchronized with the first VM, wherein the checkpoint resynchronization process comprises:
(a) pausing progress of the first VM, or preventing the first VM's checkpointed state from being overwritten before the second VM restores the checkpointed state of the first VM, wherein pausing includes stopping I/O completions from being posted to the first VM and stopping guest operating system instructions from being executed at the first VM;
(b) serializing an emulation state of the first VM, storing the emulation state of the first VM in a serialized file, and sending the serialized file to the second VM;
(c) restoring execution at the second VM based on the serialized state of the first VM; and
(d) reissuing pending I/O operations at the second VM.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
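Steps (a) through (d) of the checkpoint resynchronization process in claim 4 can be illustrated with a toy model. Everything here is an assumption made for illustration: the `VM` fields, the use of `pickle` as the serialization format, passing the serialized state as in-memory bytes rather than a file sent between hosts, and resuming the first VM at the end (the claim does not state when the first VM resumes).

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class VM:
    # Hypothetical, highly simplified emulation state.
    registers: dict = field(default_factory=dict)
    memory: bytes = b""
    pending_io: list = field(default_factory=list)
    issued_io: list = field(default_factory=list)
    paused: bool = False


def serialize_state(vm: VM) -> bytes:
    # (b) serialize the emulation state; a real system would store it in a
    # file and send that file to the second VM.
    return pickle.dumps(
        {"registers": vm.registers, "memory": vm.memory, "pending_io": vm.pending_io}
    )


def restore_state(vm: VM, blob: bytes) -> None:
    # (c) restore execution state at the second VM from the serialized state.
    state = pickle.loads(blob)
    vm.registers = state["registers"]
    vm.memory = state["memory"]
    vm.pending_io = state["pending_io"]


def checkpoint_resync(first: VM, second: VM) -> None:
    first.paused = True                          # (a) pause the first VM
    blob = serialize_state(first)                # (b) serialize and "send"
    restore_state(second, blob)                  # (c) restore at the second VM
    second.issued_io.extend(second.pending_io)   # (d) reissue pending I/O
    first.paused = False                         # assumption: resume afterwards
```

Because the state passes through a serialize and deserialize round trip, the second VM ends up with its own copy of the state rather than sharing objects with the first VM.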
17. In a computer system running at least a primary virtual machine (VM) on virtualization software on a primary virtualized computer system (VCS) and running a secondary VM on virtualization software on a secondary VCS, a computer implemented method for the secondary VM to provide quasi-lockstep fault tolerance for the primary VM comprises:
as the primary VM is executing a workload, virtualization software in the primary VCS is:
(a) causing predetermined events to be recorded in an event log, (b) keeping output associated with the predetermined events pending, and (c) sending the log entries to the virtualization software in the secondary VCS;
as the secondary VM is replaying the workload, virtualization software in the secondary VCS is:
(a) sending acknowledgements indicating that log entries have been received;
(b) when the virtualization software encounters one of the predetermined events, searching the log entries to determine whether a log entry corresponding to the same event was received from the primary VCS, and if so, comparing data associated with the predetermined event produced by the secondary VM with that of the primary VM;
if there is a match, the virtualization software in the secondary VCS transmitting an acknowledgement to the virtualization software in the primary VCS;
one of the virtualization software in the primary or secondary VCS dropping the event and the other dispatching the output; and
if there is no match, performing a checkpoint resynchronization;
wherein the predetermined events relate to virtual disk I/O, and the virtual disks are not shared; and
wherein the secondary VM output is sent to the second VM's virtual disk after a match is found.
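Claims 1 and 17 differ only in how the secondary VM's disk output is handled after a match: with shared virtual disks it is not sent, and with non-shared virtual disks it is sent to the second VM's own virtual disk. A minimal sketch of that decision, using a hypothetical function name and hypothetical result strings, might look like this:

```python
def secondary_disk_action(match: bool, disks_shared: bool) -> str:
    """Return what the secondary side does with its disk output (illustrative)."""
    if not match:
        # Outputs diverged: both claims call for a checkpoint resynchronization.
        return "checkpoint_resync"
    if disks_shared:
        # Claim 1: shared virtual disks, so the secondary output is not sent.
        return "drop"
    # Claim 17: non-shared virtual disks, so the secondary writes its own disk.
    return "write_to_secondary_disk"
```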
Specification