TOLERATING MEMORY STACK FAILURES IN MULTI-STACK SYSTEMS
First Claim
1. A memory system, comprising:
a random-access memory including a plurality of memory stacks, each including a plurality of stacked random-access memory integrated circuit dies;
a memory controller coupled to said random-access memory and operable to:
receive a block of data for writing to the memory stacks;
divide the block of data into a plurality of sub-blocks;
create a reliability sub-block based on the plurality of sub-blocks;
cause the plurality of sub-blocks and the reliability sub-block each to be written to a different one of the memory stacks;
cause the plurality of sub-blocks to be read from the plurality of memory stacks and detect an error therein indicating a failure within one of the memory stacks; and
in response to detecting the error, recover correct data based on the reliability sub-block.
Abstract
Memory management circuitry and processes operate to improve reliability of a group of memory stacks, providing that if a memory stack or a portion thereof fails during the product's lifetime, the system may still recover with no errors or data loss. A front-end controller receives a block of data requested to be written to memory, divides the block into sub-blocks, and creates a new redundant reliability sub-block. The sub-blocks are then written to different memory stacks. When reading data from the memory stacks, the front-end controller detects errors indicating a failure within one of the memory stacks, and recovers corrected data using the reliability sub-block. The front-end controller may monitor errors for signs of a stack failure and disable the failed stack.
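The write and recovery path described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the reliability sub-block is computed, so this sketch assumes bytewise XOR parity, which is one common choice and not necessarily the claimed one. The function names (split_block, make_reliability_sub_block, recover) are hypothetical, and a sub-block from a failed stack is modeled as None on the assumption that a separate check has already flagged it.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_block(block: bytes, n: int) -> List[bytes]:
    """Divide a block into n equal-sized sub-blocks, one per data stack."""
    if n <= 0 or len(block) % n != 0:
        raise ValueError("block length must be a positive multiple of n")
    size = len(block) // n
    return [block[i * size:(i + 1) * size] for i in range(n)]


def make_reliability_sub_block(sub_blocks: List[bytes]) -> bytes:
    """Assumed scheme: the reliability sub-block is the XOR of all sub-blocks."""
    return reduce(xor_bytes, sub_blocks)


def recover(sub_blocks: List[Optional[bytes]], reliability: bytes) -> bytes:
    """Rebuild the original block when at most one sub-block is missing (None)."""
    missing = [i for i, s in enumerate(sub_blocks) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one missing sub-block")
    pieces = list(sub_blocks)
    if missing:
        survivors = [s for s in pieces if s is not None]
        # XOR of the parity with every surviving sub-block yields the lost one.
        pieces[missing[0]] = reduce(xor_bytes, survivors, reliability)
    return b"".join(pieces)


if __name__ == "__main__":
    block = bytes(range(64))
    subs = split_block(block, 4)          # four data stacks
    parity = make_reliability_sub_block(subs)  # stored on a fifth stack
    damaged = list(subs)
    damaged[2] = None                     # simulate the failure of one stack
    print(recover(damaged, parity) == block)
```

Under this assumed scheme, one extra stack protects against the loss of any single data stack; tolerating two simultaneous stack failures would need a stronger code than plain XOR.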
20 Claims
1. A memory system, comprising:
a random-access memory including a plurality of memory stacks, each including a plurality of stacked random-access memory integrated circuit dies;
a memory controller coupled to said random-access memory and operable to:
receive a block of data for writing to the memory stacks;
divide the block of data into a plurality of sub-blocks;
create a reliability sub-block based on the plurality of sub-blocks;
cause the plurality of sub-blocks and the reliability sub-block each to be written to a different one of the memory stacks;
cause the plurality of sub-blocks to be read from the plurality of memory stacks and detect an error therein indicating a failure within one of the memory stacks; and
in response to detecting the error, recover correct data based on the reliability sub-block.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method of managing memory access, comprising:
receiving a block of data for writing to a random-access memory;
dividing the block of data into a plurality of sub-blocks;
creating a reliability sub-block based on the plurality of sub-blocks;
causing the plurality of sub-blocks and the reliability sub-block each to be written to different ones of a plurality of memory stacks, each memory stack comprising a plurality of stacked random-access memory integrated circuits;
causing the plurality of sub-blocks to be read from the plurality of memory stacks and detecting an error therein indicating a failure within one of the memory stacks; and
in response to detecting the error, recovering correct data based on the reliability sub-block.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
16. A memory controller circuit for interfacing with a plurality of random-access memory stacks, comprising:
a plurality of memory channel controllers coupled to the random-access memory stacks; and
a front-end controller coupled to the plurality of memory channel controllers and operable to:
receive a block of data for writing to the random-access memory stacks;
divide the block of data into a plurality of sub-blocks;
create a reliability sub-block based on the plurality of sub-blocks;
direct selected ones of the memory channel controllers to cause the plurality of sub-blocks and the reliability sub-block each to be written to a different one of the random-access memory stacks;
direct selected ones of the memory channel controllers to cause the plurality of sub-blocks to be read from the random-access memory stacks;
detect an error in the plurality of sub-blocks indicating a failure within one of the memory stacks; and
in response to detecting the error, recover correct data based on the reliability sub-block.
Dependent claims: 17, 18, 19, 20
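The arrangement in claim 16, together with the error monitoring mentioned in the abstract, can be sketched in Python as a front-end controller that directs per-stack channel controllers. Everything here is an illustrative assumption and not the patented design: the class names, the CRC-32 check used to detect a bad sub-block, the XOR parity held on a fixed last channel, and the ERROR_THRESHOLD value are all hypothetical.

```python
import zlib
from functools import reduce
from typing import Dict, List, Optional, Set, Tuple

ERROR_THRESHOLD = 3  # hypothetical: errors observed before a stack is disabled


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class ChannelController:
    """Hypothetical stand-in for one memory channel controller and its stack."""

    def __init__(self) -> None:
        self.cells: Dict[int, Tuple[bytes, int]] = {}
        self.failed = False  # test hook that simulates a stack failure

    def write(self, addr: int, data: bytes) -> None:
        self.cells[addr] = (data, zlib.crc32(data))

    def read(self, addr: int) -> Tuple[bytes, bool]:
        data, crc = self.cells[addr]
        if self.failed:
            data = bytes(b ^ 0xFF for b in data)  # failed stack returns corrupt data
        return data, zlib.crc32(data) == crc


class FrontEndController:
    """Stripes each block over the channels; the last channel holds XOR parity."""

    def __init__(self, channels: List[ChannelController]) -> None:
        self.channels = channels
        self.error_counts = [0] * len(channels)
        self.disabled: Set[int] = set()

    def write(self, addr: int, block: bytes) -> None:
        n = len(self.channels) - 1
        if len(block) % n != 0:
            raise ValueError("block length must be a multiple of the data stack count")
        size = len(block) // n
        subs = [block[i * size:(i + 1) * size] for i in range(n)]
        pieces = subs + [reduce(xor_bytes, subs)]
        for i, (channel, piece) in enumerate(zip(self.channels, pieces)):
            if i not in self.disabled:
                channel.write(addr, piece)

    def read(self, addr: int) -> bytes:
        pieces: List[Optional[bytes]] = []
        for i, channel in enumerate(self.channels):
            data, ok = (b"", False) if i in self.disabled else channel.read(addr)
            if not ok and i not in self.disabled:
                # Monitor errors per stack and disable a stack that keeps failing.
                self.error_counts[i] += 1
                if self.error_counts[i] >= ERROR_THRESHOLD:
                    self.disabled.add(i)
            pieces.append(data if ok else None)
        bad = [i for i, p in enumerate(pieces) if p is None]
        if len(bad) > 1:
            raise IOError("more than one stack failed; XOR parity cannot recover")
        if bad and bad[0] != len(pieces) - 1:
            # Rebuild the lost data sub-block from the survivors and the parity.
            good = [p for p in pieces if p is not None]
            pieces[bad[0]] = reduce(xor_bytes, good)
        return b"".join(p for p in pieces[:-1] if p is not None)
```

A possible usage, with four data stacks and one parity stack: create five ChannelController objects, pass them to FrontEndController, write a 32-byte block at address 0, set failed = True on one channel, and read the block back. In this sketch the read still returns the original bytes, and after ERROR_THRESHOLD failing reads the channel index appears in disabled, after which the front end stops accessing that stack and reconstructs its sub-block on every read.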
Specification