Recovery from data fetch errors in hypervisor code
Abstract
A method, system, and apparatus for isolating fatal data fetch errors to a single partition within a logically partitioned data processing system is provided. In one embodiment, the logically partitioned data processing system includes a plurality of operating systems and a plurality of processors. Each of the operating systems is assigned to a separate one of a plurality of logical partitions. Each of the processors is assigned to one of the plurality of logical partitions. The logically partitioned data processing system also includes a hypervisor for creating and maintaining separation of the plurality of logical partitions. The hypervisor contains services and functions accessed by each of the logical partitions and, to prevent fatal data fetch errors in one partition from affecting other partitions within the logically partitioned data processing system, the hypervisor includes a plurality of data structure areas. A fatal data fetch error occurring in one of the plurality of data structure areas results in rebooting the data processing system components associated with only the single affected logical partition of the plurality of logical partitions within the logically partitioned data processing system.
10 Claims
1. A logically partitioned data processing system, comprising:
a plurality of processors, each of which is assigned to one of a plurality of partitions; and
a plurality of private data areas, each of which is assigned to one of the plurality of processors and each of the plurality of private data areas includes a primary copy of data and an alternate copy of data;
wherein each of the plurality of processors is configured, during data fetch operations, to attempt to retrieve data from the primary copy of the data and, responsive to a failure to retrieve the data from the primary copy, to attempt to retrieve the data from the alternate copy of the data; and
each of the plurality of processors is configured, responsive to a failure of a write operation to the primary copy of data, to copy data from the alternate copy of the data into the primary copy of the data and re-attempt the write operation.
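The fetch and write recovery recited in claim 1 can be illustrated with a minimal sketch. Everything named here (`Copy`, `PrivateDataArea`, `FetchError`, `WriteError`, and the simulated fault flags) is a hypothetical stand-in chosen for illustration and is not taken from the patent; real hypervisor code would react to hardware error signals rather than Python exceptions.

```python
class FetchError(Exception):
    """Hypothetical stand-in for a hardware data fetch error."""


class WriteError(Exception):
    """Hypothetical stand-in for a failed write operation."""


class Copy:
    """One copy of the data in a private data area, with simulated faults."""

    def __init__(self, data, read_faulty=False, write_faults=0):
        self.data = dict(data)
        self.read_faulty = read_faulty    # every read fails while True
        self.write_faults = write_faults  # number of upcoming writes that fail

    def read(self, key):
        if self.read_faulty:
            raise FetchError(key)
        return self.data[key]

    def write(self, key, value):
        if self.write_faults > 0:
            self.write_faults -= 1
            raise WriteError(key)
        self.data[key] = value


class PrivateDataArea:
    """Per-processor area holding a primary and an alternate copy."""

    def __init__(self, primary, alternate):
        self.primary = primary
        self.alternate = alternate

    def fetch(self, key):
        # Try the primary copy first; fall back to the alternate on failure.
        try:
            return self.primary.read(key)
        except FetchError:
            return self.alternate.read(key)

    def store(self, key, value):
        # On a failed write, refresh the primary from the alternate and
        # re-attempt the write once. Keeping the alternate in sync afterwards
        # is outside this sketch.
        try:
            self.primary.write(key, value)
        except WriteError:
            self.primary.data = dict(self.alternate.data)
            self.primary.write(key, value)
```

If the alternate copy also fails, the exception propagates to the caller, which corresponds to the fatal case that the remaining claims handle by rebooting only the affected partition.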
a support processor, wherein the support processor is configured to monitor the log file and, responsive to an indication that a processor has recorded a fatal data fetch error in the log file, to notify other processors, allocated to a same partition as the processor reporting the fatal data fetch error, of the error, whereby the other processors allocated to the same partition initiate the same partition's reboot policy.
5. The logically partitioned data processing system as recited in claim 2, further comprising:
a support processor, wherein the support processor performs surveillance on each of the plurality of partitions and, responsive to a determination that a processor within one of the plurality of partitions does not respond after a timeout period, performs a system reset on the other processors within the partition to which the non-responding processor is allocated.
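The surveillance behavior in claim 5 might be sketched as follows. The function name `reset_targets`, the heartbeat timestamps, and the dictionary layout are all assumptions made for illustration; the claim does not say how responsiveness is measured.

```python
def reset_targets(partition_of, last_response, now, timeout):
    """Map each non-responding processor to the peers that would be reset.

    partition_of:  dict of processor id -> partition id (hypothetical layout)
    last_response: dict of processor id -> time of its last response
    """
    targets = {}
    for cpu, seen in last_response.items():
        if now - seen > timeout:
            part = partition_of[cpu]
            # Only the other processors in the same partition are reset;
            # processors in other partitions are left untouched.
            targets[cpu] = sorted(
                other
                for other, p in partition_of.items()
                if p == part and other != cpu
            )
    return targets
```

For example, with processors 0 and 1 in partition "A" and processors 2 and 3 in partition "B", a stalled processor 1 yields a reset target of processor 0 only.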
6. The logically partitioned data processing system as recited in claim 2, further comprising:
a support processor, wherein the support processor, responsive to receipt of a user-initiated reset command, performs a reset on partition processors allocated to a same partition as that to which the processor receiving the fatal data fetch error is allocated.
7. A method of preventing a data fetch error occurring within one partition from affecting the operation of other partitions within a logically partitioned data processing system, the method comprising:
receiving, at a processor, a data fetch error;
creating an error log file indicating the receipt of the data fetch error; and
initiating a reboot policy for the one partition to which the processor receiving the data fetch error is allocated;
wherein the step of initiating a reboot policy comprises:
other processors, allocated to the one partition to which the processor receiving the data fetch error is allocated, polling the error log file; and
responsive to determining that a data fetch error has been recorded, the other processors executing the reboot policy.
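The polling variant recited in claim 7 can be sketched as a single polling pass. The log representation (a list of `(processor, partition)` tuples), the function name `poll_and_reboot`, and the `reboot_policy` callback are hypothetical choices for illustration only.

```python
def poll_and_reboot(error_log, partition_of, reboot_policy):
    """Run one polling pass over the error log for every processor.

    error_log:     list of (processor id, partition id) entries recorded by
                   processors that received a data fetch error (assumed format)
    partition_of:  dict of processor id -> partition id
    reboot_policy: callable invoked as reboot_policy(processor id, partition id)
    """
    rebooted = []
    for cpu, part in sorted(partition_of.items()):
        # Processors that logged an error for this processor's own partition.
        reporters = {c for c, p in error_log if p == part}
        # A processor acts only on errors logged for its own partition, and
        # the reporting processor itself is not one of the "other processors".
        if reporters and cpu not in reporters:
            reboot_policy(cpu, part)
            rebooted.append(cpu)
    return rebooted
```

Because each processor looks only at entries for its own partition, an error logged in one partition never triggers the reboot policy in another.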
8. A computer program product for preventing a data fetch error occurring within one partition from affecting the operation of other partitions within a logically partitioned data processing system, the product comprising:
instruction means for receiving, by a processor, a data fetch error;
instruction means for creating an error log file indicating the receipt of the data fetch error; and
instruction means for initiating a reboot policy for the one partition to which the processor receiving the data fetch error is allocated;
wherein the instruction means for initiating a reboot policy further comprises:
instruction means for polling the error log file by other processors allocated to the one partition to which the processor receiving the data fetch error is allocated; and
instruction means, responsive to determining that a data fetch error has been recorded, for executing the reboot policy by the other processors.
9. A method of preventing a data fetch error occurring within one partition from affecting the operation of other partitions within a logically partitioned data processing system, the method comprising:
receiving, at a processor, a data fetch error;
creating an error log file indicating the receipt of the data fetch error; and
initiating a reboot policy for the one partition to which the processor receiving the data fetch error is allocated;
wherein the initiating of the reboot policy comprises receipt of a user command to initiate the reboot policy.
10. A method of preventing a data fetch error occurring within one partition from affecting the operation of other partitions within a logically partitioned data processing system, the method comprising:
receiving, at a processor, a data fetch error;
creating an error log file indicating the receipt of the data fetch error; and
initiating a reboot policy for the one partition to which the processor receiving the data fetch error is allocated;
wherein the initiating of the reboot policy comprises:
the processor signaling a system reset to other processors allocated to a same partition; and
initiation of the reboot policy by the other processors.
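Claim 10 replaces polling with a direct signal from the processor that received the error. A minimal sketch of that flow, assuming a hypothetical in-memory log (a list of dictionaries) and a hypothetical `signal_reset` callback standing in for whatever inter-processor mechanism the platform provides:

```python
def handle_fetch_error(cpu, partition_of, error_log, signal_reset):
    """Log a data fetch error on `cpu` and signal a reset to its peers.

    partition_of: dict of processor id -> partition id (assumed layout)
    error_log:    list that receives one entry per error (assumed format)
    signal_reset: callable invoked once per peer processor id
    """
    part = partition_of[cpu]
    # Create the error log entry indicating receipt of the data fetch error.
    error_log.append({"cpu": cpu, "partition": part, "kind": "data_fetch"})
    # Signal only the other processors allocated to the same partition.
    peers = sorted(c for c, p in partition_of.items() if p == part and c != cpu)
    for peer in peers:
        signal_reset(peer)
    return peers
```

In this sketch the signaled processors would then initiate the reboot policy themselves; that step is left to the `signal_reset` callback.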
Specification