Computer system operation with corrected read data function
Abstract
A computer system having a memory with an ECC function employs an improved method for handling corrected read data events, so that transient errors caused by alpha-particle hits in DRAMs can be distinguished from hard errors. When a corrected read data event occurs, a footprint defining its location is compared with previously stored footprints to determine whether this location has failed before. In addition, a location showing a corrected read data event is "scrubbed" (its data is read, corrected and rewritten), so that a transient error at that location is cleared. If another corrected read data event occurs at the same location after scrubbing, the location is assumed to have a hard fault, and the page containing it is replaced.
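The policy in the abstract (compare the footprint, scrub on first failure, replace the page on a repeat) can be sketched as follows. This is a minimal illustration, not the patented implementation: the ECC itself is abstracted away, and `FaultyMemory`, `CrdHandler`, `PAGE_SIZE`, `WORD_SIZE` and the choice of "aligned ECC-word address" as the footprint are all hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical page size in bytes
WORD_SIZE = 8     # hypothetical ECC word size in bytes


@dataclass
class FaultyMemory:
    """Hypothetical memory model with the ECC abstracted away.

    Addresses in `transient` hold a soft error that a rewrite clears;
    addresses in `stuck` hold a hard fault that survives every rewrite.
    Both are treated as correctable: reads return the stored (corrected)
    value plus a flag saying whether a correction was needed.
    """
    data: dict[int, int] = field(default_factory=dict)
    transient: set[int] = field(default_factory=set)
    stuck: set[int] = field(default_factory=set)

    def read(self, addr: int) -> tuple[int, bool]:
        corrected = addr in self.transient or addr in self.stuck
        return self.data.get(addr, 0), corrected

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value
        self.transient.discard(addr)  # a rewrite clears a soft error only


class CrdHandler:
    """CPU-side handling of corrected read data (CRD) events."""

    def __init__(self, memory: FaultyMemory) -> None:
        self.memory = memory
        self.footprints: set[int] = set()     # previously stored footprints
        self.retired_pages: set[int] = set()  # page bases marked for replacement

    @staticmethod
    def footprint(addr: int) -> int:
        # Assumption: the footprint is the address of the ECC word holding addr.
        return addr - addr % WORD_SIZE

    def on_crd(self, addr: int) -> str:
        fp = self.footprint(addr)
        if fp in self.footprints:
            # Same location failed again after being scrubbed: treat as hard.
            self.retired_pages.add(addr - addr % PAGE_SIZE)
            return "replace"
        self.footprints.add(fp)            # compare first, then store
        value, _ = self.memory.read(addr)  # value is already corrected
        self.memory.write(addr, value)     # scrub: rewrite the corrected data
        return "scrubbed"

    def read(self, addr: int) -> int:
        value, corrected = self.memory.read(addr)
        if corrected:
            self.on_crd(addr)
        return value


# Usage: one transient (alpha-hit) location and one hard-fault location.
mem = FaultyMemory(data={0x1000: 7, 0x2008: 9},
                   transient={0x1000}, stuck={0x2008})
cpu = CrdHandler(mem)
for _ in range(2):
    cpu.read(0x1000)
    cpu.read(0x2008)
print([hex(p) for p in sorted(cpu.retired_pages)])  # ['0x2000']
```

After one scrub the transient location reads clean, while the stuck location raises a second CRD with a matching footprint, so only its page is marked for replacement.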
22 Claims
1. A method of operating a computer system comprising the steps of:
a) sending requests by a CPU to a memory to read selected data items stored in said memory with ECC codes;
b) retrieving said selected data items from said memory and checking each said data item using said ECC code, and, if a correctable error is detected, correcting said data item and signalling to said CPU that a corrected read data event has occurred;
c) storing by said CPU an identification print for a corrected read data event;
d) comparing an identification print, before storing, with previously-stored ones of said stored identification prints to see if a match exists;
e) scrubbing a location in said memory for which a corrected read data event occurred; and
f) moving a block of said memory to another location in said memory when a corrected read data event occurs having an identification print matching that of a previous identification print.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

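Step b) of claim 1 (check each data item against its ECC code, correct a correctable error, and signal a corrected read data event) can be illustrated with a small single-error-correcting code. The claims do not name a specific ECC; the Hamming(7,4) code below is a swapped-in textbook example, and `encode`, `decode` and `read_word` are hypothetical names. Real memory systems typically use wider SEC-DED codes, which also detect double-bit errors; this 7-bit code cannot.

```python
from collections.abc import Callable


def encode(nibble: int) -> int:
    """Encode 4 data bits as a 7-bit Hamming codeword (bit p-1 holds position p)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8                         # index 0 unused; positions 1..7
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def decode(codeword: int) -> tuple[int, bool]:
    """Return (data, corrected); corrected is True if a single-bit error was fixed."""
    bits = [0] + [(codeword >> (p - 1)) & 1 for p in range(1, 8)]
    syndrome = 0
    for p in range(1, 8):
        if bits[p]:
            syndrome ^= p                  # XOR of the positions of all set bits
    corrected = syndrome != 0
    if corrected:
        bits[syndrome] ^= 1                # the syndrome names the flipped position
    return bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3, corrected


def read_word(memory: dict[int, int], addr: int,
              signal_crd: Callable[[int], None]) -> int:
    """Check one stored codeword and signal the CPU on a corrected read."""
    data, corrected = decode(memory[addr])
    if corrected:
        signal_crd(addr)                   # the "corrected read data event"
    return data


memory = {0x40: encode(0b1011) ^ (1 << 4)}  # stored word with position 5 flipped
events: list[int] = []
print(read_word(memory, 0x40, events.append), events)  # 11 [64]
```

The CPU still receives correct data; the only visible effect of the error is the signalled event, which is what the remaining steps of the claim act on.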
10. A computer system comprising:
a) a CPU and a memory, the CPU having means for sending data items to the memory for storing identified by physical addresses and for sending requests to said memory to read selected ones of said data items;
b) the memory having ECC means for storing said data items along with an ECC code for each item and retrieving said data items and checking said data items against said ECC code to detect and correct any correctable errors occurring;
c) said memory having means for sending to said CPU an identification print of said physical address in memory of each said data item for which said correctable error is detected and corrected;
d) said CPU having means for storing said identification prints and comparing each one of said identification prints representing a correctable error, before storing, with previously-stored ones of said stored identification prints to see if a match exists;
e) means for scrubbing a location in said memory containing each said physical address for which a correctable error was detected; and
f) means for moving each page of said memory to another page location in said memory when a correctable error occurs having an identification print matching that of a previous correctable error.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18

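Element f) of claim 10 (moving a page to another page location when a correctable error repeats) amounts to copying the page into a spare frame and redirecting every mapping to it. The sketch below is a hypothetical toy model: `PhysicalMemory`, `relocate`, the spare-frame pool and `PAGE_SIZE` are assumptions for illustration, not details taken from the patent.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


class PhysicalMemory:
    """Toy model of page frames, a spare-frame pool and a page table."""

    def __init__(self, num_frames: int) -> None:
        self.frames = [bytearray(PAGE_SIZE) for _ in range(num_frames)]
        self.free = list(range(num_frames))   # unused frame numbers
        self.retired: set[int] = set()        # frames taken out of service
        self.page_table: dict[int, int] = {}  # virtual page -> physical frame

    def map(self, vpn: int) -> int:
        pfn = self.free.pop(0)
        self.page_table[vpn] = pfn
        return pfn

    def relocate(self, phys_addr: int) -> int:
        """Move the page holding phys_addr to a spare frame; retire the old frame.

        Raises IndexError if no spare frame is left.
        """
        bad = phys_addr // PAGE_SIZE
        new = self.free.pop(0)
        # In real hardware this copy reads through the ECC logic, so the data
        # written to the new frame is the corrected data.
        self.frames[new][:] = self.frames[bad]
        for vpn, pfn in list(self.page_table.items()):
            if pfn == bad:
                # A real OS must also invalidate stale TLB entries here.
                self.page_table[vpn] = new
        self.retired.add(bad)
        return new


pm = PhysicalMemory(num_frames=4)
old = pm.map(vpn=10)
pm.frames[old][:5] = b"hello"
new = pm.relocate(old * PAGE_SIZE + 0x123)  # repeat CRD at this physical address
print(old, new, pm.page_table[10], bytes(pm.frames[new][:5]))  # 0 1 1 b'hello'
```

Because the identification print in claim 10 is a physical address, dividing it by the page size is enough to find the frame to move; software that uses the virtual page is unaffected once the mapping is updated.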
19. A method of operating a computer system comprising the steps of:
a) sending data items to memory by a CPU identified by physical addresses;
b) storing said data items in memory along with an ECC code for each item;
c) sending requests by said CPU to said memory to read selected ones of said data items;
d) retrieving said selected ones of said data items from said memory and checking said data items using said ECC code, and correcting each of said data items if a correctable error is detected;
e) storing by said CPU an identification print of said physical address in memory of each said data item for which said correctable error is detected and corrected;
f) comparing each one of said identification prints representing a correctable error, before storing, with previously-stored ones of said stored identification prints to see if a match exists;
g) scrubbing a location in said memory containing each said physical address for which a correctable error was detected; and
h) moving a block of said memory containing the physical address of a correctable error to another location in said memory when a correctable error occurs having an identification print matching that of a previous correctable error.
Dependent claims: 20, 21, 22

Specification