Method, apparatus and program product to concurrently detect, repair, verify and isolate memory failures
First Claim
1. A method for repairing memory failure in a computer system, comprising:
receiving a command that a failed memory unit has been replaced and to test a new memory unit in a memory subsystem having one or more memory units concurrently being used by a running processor;
determining a test pattern;
determining time duration for testing the new memory unit;
writing the test pattern to the new memory unit;
reading the written test pattern from the new memory unit;
comparing the test pattern read with the test pattern that was written, if the read test pattern and the written test pattern do not match, notifying that the new memory unit is bad and if the read test pattern and the written test pattern match, determining if the time duration for testing has expired;
if the time duration has not expired, repeating the steps of writing, reading, and comparing; and
if the time duration has expired, configuring the new memory as being available for use.
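The timed write/read/compare loop recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The memory unit is simulated in software, and `SimulatedMemoryUnit`, `run_memory_unit_test`, the byte-wide cells and the stuck-at-zero fault model are all hypothetical. A real system would reach physical memory through firmware or hardware, not a `bytearray`.

```python
import time
from typing import Callable, Dict, Optional, Sequence


class SimulatedMemoryUnit:
    """Hypothetical stand-in for a replaceable memory unit.

    stuck_low maps an address to a bitmask of bits stuck at 0, so a faulty
    unit can be modelled without real hardware.
    """

    def __init__(self, size: int, stuck_low: Optional[Dict[int, int]] = None):
        self._cells = bytearray(size)
        self._stuck_low = stuck_low or {}

    def __len__(self) -> int:
        return len(self._cells)

    def write(self, addr: int, value: int) -> None:
        # Faulty bits are forced to 0 regardless of the value written.
        self._cells[addr] = value & ~self._stuck_low.get(addr, 0) & 0xFF

    def read(self, addr: int) -> int:
        return self._cells[addr]


def run_memory_unit_test(
    unit: SimulatedMemoryUnit,
    pattern: Sequence[int],
    duration_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Repeat write/read/compare passes until duration_s has elapsed.

    Returns False at the first mismatch (the unit is bad) and True once the
    test duration expires with every pass matching (the unit may then be
    configured as available for use).
    """
    deadline = clock() + duration_s
    while True:
        # Write the test pattern across the unit, repeating it as needed.
        for addr in range(len(unit)):
            unit.write(addr, pattern[addr % len(pattern)])
        # Read the pattern back and compare it with what was written.
        for addr in range(len(unit)):
            if unit.read(addr) != pattern[addr % len(pattern)]:
                return False
        # All addresses matched: stop only once the testing time has expired.
        if clock() >= deadline:
            return True
```

The injectable `clock` lets a caller drive the duration check deterministically. With `duration_s=0`, exactly one write/read/compare pass runs before the expiry check succeeds.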
Abstract
A method and system for repairing memory failure in a computer system, in one aspect, determine one or more test patterns and a time duration for testing a new memory unit that has replaced a failed memory unit. The test pattern is written to the new memory unit and then read back from it. The pattern read is compared with the pattern that was written. If the read and written test patterns do not match, a further repair action is taken. If they match, the writing and reading of the test pattern repeat until the testing time duration expires. The new memory unit may be configured as available for use when the write-and-read test completes successfully for the full testing time duration.
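The abstract allows one or more test patterns but does not say which ones. As an assumption, the sketch below generates a few patterns that are common in memory testing (all zeros, all ones, a checkerboard, walking ones and walking zeros) for cells that are `width_bits` wide. The function name `standard_test_patterns` and the choice of patterns are hypothetical, not taken from the patent.

```python
from typing import List, Tuple


def standard_test_patterns(width_bits: int = 8) -> List[Tuple[int, ...]]:
    """Return illustrative test patterns for cells that are width_bits wide.

    Each pattern is a tuple of cell values meant to be repeated across the unit.
    """
    if width_bits <= 0 or width_bits % 2:
        raise ValueError("width_bits must be a positive even number")
    mask = (1 << width_bits) - 1
    checker = int("10" * (width_bits // 2), 2)  # 0xAA for 8-bit cells
    return [
        (0,),                                               # all zeros
        (mask,),                                            # all ones
        (checker, checker ^ mask),                          # checkerboard
        tuple(1 << i for i in range(width_bits)),           # walking ones
        tuple(mask ^ (1 << i) for i in range(width_bits)),  # walking zeros
    ]
```

Each tuple could serve as the pattern for one write/read/compare pass, with the tester cycling through the list so that every bit is exercised as both 0 and 1 before the testing time expires.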
13 Claims
6. A system for recovering memory failure in a computer system, comprising:
a processor operable to receive a command that a failed memory unit has been replaced and to test the new memory unit in a memory subsystem of a computer system, the memory subsystem having one or more memory units, the processor further operable to determine a test pattern and determine time duration for testing the new memory unit, the processor further operable to write the test pattern to the new memory unit, read the written test pattern from the new memory unit, and compare the test pattern read with the test pattern that was written, if the read test pattern and the written test pattern do not match, the processor operable to notify that the new memory unit is bad and if the read test pattern and the written test pattern match, the processor operable to determine if the time duration for testing has expired, and if the time duration has not expired, the processor operable to repeat the steps of writing, reading, and comparing, and if the time duration has expired, the processor operable to configure the new memory as being available for use.
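The system of claim 6 implies some bookkeeping of which memory units the running processor may use. The Python sketch below models that bookkeeping as a small state machine, under stated assumptions. `UnitState`, `MemorySubsystem`, the state names and the rule that only a unit previously marked failed may be reported as replaced are all hypothetical. The test itself is passed in as a callable, and the concurrent workload on the other units is not modelled; they simply stay in use throughout.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List


class UnitState(Enum):
    IN_USE = "in use"
    FAILED = "failed"          # isolated after a detected failure
    UNDER_TEST = "under test"  # replaced, but hidden from the running workload
    AVAILABLE = "available"    # passed the timed test, configured for use
    BAD = "bad"                # failed the test, needs further repair action


class MemorySubsystem:
    """Hypothetical bookkeeping for a memory subsystem with several units."""

    def __init__(self, unit_ids: Iterable[str]):
        self.state: Dict[str, UnitState] = {u: UnitState.IN_USE for u in unit_ids}

    def mark_failed(self, unit_id: str) -> None:
        # Isolate the unit so the running workload stops using it.
        self.state[unit_id] = UnitState.FAILED

    def handle_replaced(self, unit_id: str, run_test: Callable[[], bool]) -> UnitState:
        """Process a 'failed unit has been replaced' command for one unit."""
        if self.state.get(unit_id) is not UnitState.FAILED:
            raise ValueError(f"{unit_id} is not marked as failed")
        self.state[unit_id] = UnitState.UNDER_TEST
        passed = run_test()  # e.g. a timed write/read/compare loop
        self.state[unit_id] = UnitState.AVAILABLE if passed else UnitState.BAD
        return self.state[unit_id]

    def usable_units(self) -> List[str]:
        # Units the running processor may use; a unit under test is excluded.
        return sorted(
            u for u, s in self.state.items()
            if s in (UnitState.IN_USE, UnitState.AVAILABLE)
        )
```

Keeping the unit under test out of `usable_units()` until it passes reflects the claim's ordering: the new memory is configured as available only after the testing time has expired without a mismatch.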
9. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for repairing memory failure in a computer system, comprising:
receiving a command that a failed memory unit has been replaced and to test a new memory unit in a memory subsystem having one or more memory units concurrently being used by a running processor; determining a test pattern; determining time duration for testing the new memory unit; writing the test pattern to the new memory unit; reading the written test pattern from the new memory unit; comparing the test pattern read with the test pattern that was written, if the read test pattern and the written test pattern do not match, notifying that the new memory unit is bad and if the read test pattern and the written test pattern match, determining if the time duration for testing has expired; if the time duration has not expired, repeating the steps of writing, reading, and comparing; and if the time duration has expired, configuring the new memory as being available for use.
Specification