Systems and methods for memory failure prevention, management, and mitigation
First Claim
1. A computer-implemented method of monitoring and retiring memory pages in random access memory (RAM), the computer implemented method comprising:
monitoring, by a computer system, correctable error statistics for each of a plurality of memory pages, wherein the correctable error statistics comprise one or more page retirement criteria, wherein the one or more page retirement criteria comprise a correctable error count, correctable error rate, or a time since a most recent correctable error;
detecting, by the computer system, a high-risk page, wherein detecting the high-risk page comprises determining whether the page retirement criteria of the high-risk page has exceeded a retirement criteria threshold;
placing, by the computer system, page information associated with the high-risk page on a retired page list, wherein the retired page list has a size corresponding to a number of spare pages stored in a reserved space of RAM;
storing, by the computer system, identical data to data stored in the high-risk page in a spare page; and
identifying, by the computer system in a mapping of the plurality of memory pages, the high-risk page such that one or more references to the high-risk page in the mapping are rerouted to the spare page, wherein the computer system comprises a processor and the RAM.
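The monitoring and detecting steps of claim 1 can be illustrated with a minimal Python sketch. Everything named here (`PageStats`, `is_high_risk`, the `max_count`, `max_rate`, and `window` parameters) is a hypothetical illustration and not taken from the patent. The claim does not say how the time since the most recent correctable error is compared against a threshold, so the sketch only reports that value and bases the high-risk decision on count and rate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageStats:
    """Correctable-error (CE) history for one memory page (hypothetical model)."""
    timestamps: List[float] = field(default_factory=list)

    def record(self, now: float) -> None:
        """Log one correctable error observed at time `now` (seconds)."""
        self.timestamps.append(now)

    def count(self) -> int:
        """Total correctable errors seen on this page."""
        return len(self.timestamps)

    def rate(self, now: float, window: float) -> float:
        """Errors per second over the trailing `window` seconds."""
        recent = [t for t in self.timestamps if now - window < t <= now]
        return len(recent) / window

    def time_since_last(self, now: float) -> Optional[float]:
        """Seconds since the most recent CE, or None if none was recorded."""
        return now - self.timestamps[-1] if self.timestamps else None


def is_high_risk(stats: PageStats, now: float, max_count: int,
                 max_rate: float, window: float) -> bool:
    """Assumed policy: a page is high-risk once either criterion exceeds its threshold."""
    return stats.count() > max_count or stats.rate(now, window) > max_rate
```

In this sketch the thresholds are plain parameters, so a caller could tune them per system; how thresholds are actually chosen is left open here.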
Abstract
Some embodiments described herein are directed to memory page or bad block monitoring and retirement algorithms, systems and methods for random access memory (RAM). Reliability issues or errors can be detected for multiple memory pages using one or more retirement criteria. In some embodiments, when reliability errors are detected, it may be desired to remove such pages from operation before they create a more serious problem, such as a computer crash. Thus, bad block retirement and replacement mechanisms are described herein.
20 Claims
20. A computing system comprising:
one or more computer readable storage devices configured to store a plurality of computer executable instructions; and
one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the plurality of computer executable instructions in order to cause the system to:
monitor correctable error statistics for each of a plurality of memory pages, wherein the correctable error statistics comprise one or more page retirement criteria, wherein the one or more page retirement criteria comprise a correctable error count, correctable error rate, or a time since a most recent correctable error;
detect a high-risk page, wherein detecting the high-risk page comprises determining whether the page retirement criteria of the high-risk page has exceeded a retirement criteria threshold;
place page information associated with the high-risk page on a retired page list, wherein the retired page list has a size corresponding to a number of spare pages stored in a reserved space of a RAM;
store identical data to data stored in the high-risk page in a spare page; and
identify, in a mapping of the plurality of memory pages, the high-risk page such that one or more references to the high-risk page in the mapping are rerouted to the spare page.
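The retirement steps recited in this claim (listing the page, copying its data to a spare, and rerouting references in the mapping) can be sketched as follows. This is a toy model under stated assumptions, not the patented implementation: RAM is modelled as a dictionary from frame number to bytes, the mapping as a dictionary from logical page to frame, and the `PageRetirer` class and all of its attribute names are hypothetical.

```python
from typing import Dict, Hashable, List


class PageRetirer:
    """Toy model of retiring a high-risk frame onto a reserved spare frame."""

    def __init__(self, memory: Dict[int, bytes],
                 mapping: Dict[Hashable, int],
                 spare_frames: List[int]) -> None:
        self.memory = memory              # frame number -> contents (assumed model of RAM)
        self.mapping = mapping            # logical page -> frame number
        self.spares = list(spare_frames)  # frames held back in the reserved space
        self.capacity = len(self.spares)  # retired list is sized to the spare count
        self.retired: List[int] = []      # the retired page list

    def retire(self, bad_frame: int) -> bool:
        """Retire `bad_frame`; return False if already retired or no spare is left."""
        if bad_frame in self.retired or not self.spares:
            return False
        spare = self.spares.pop(0)
        self.retired.append(bad_frame)
        # Store identical data in the spare page.
        self.memory[spare] = bytes(self.memory[bad_frame])
        # Reroute every reference to the bad frame to the spare frame.
        for page, frame in list(self.mapping.items()):
            if frame == bad_frame:
                self.mapping[page] = spare
        return True
```

Because each retirement consumes exactly one spare, the retired list in this sketch can never grow beyond the number of reserved spare frames, which mirrors the sizing relationship stated in the claim.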
Specification