System and method for enhancing the reliability of a computer system by combining a cache sync-flush engine with a replicated memory module
Abstract
A computer system and method for enhancing the reliability of a computer system by combining a cache sync-flush engine with a replicated memory module includes placing a “lock” command on a common bus. The lock protects or controls accesses to a number of memory locations in the memory modules designated by the programmer. At any point in time, only one processor can hold the lock, and hence have access to the memory locations protected by the lock. Other processors may attempt to acquire or make a request for the same lock; however, they will fail until the processor that holds the lock has released (i.e., “unlocked”) it. The other processors will keep trying to get the lock. The processor that obtains the lock instructs the system control unit to begin logging or monitoring all subsequent memory addresses that appear on the common bus. After the processor gets the lock, it can start reading from and writing to the memory locations, which are implemented as a number of replicated memory modules. A data value is then determined based on the data held by a majority of the replicated memory modules. The data value is transmitted to the cache of the processor. After the data is processed, an “unlock” command is transmitted from the processor to the system control unit, which issues a write-back request on the common bus that flushes the data value from the cache to the replicated memory modules.
17 Claims
-
1. In a computer system comprising a node coupled to a shared memory via an interconnect network, the shared memory having a plurality of replicated memory modules, the node having first and second processors and caches, the caches connected to a system control unit via a common bus, a method of ensuring that only one processor has access to a particular memory address in a shared memory at a given time, the method comprising:
-
issuing a lock command on the common bus;
requesting the lock command using the first processor;
retrieving data from the particular memory address located in a plurality of replicated memory modules;
determining a data value corresponding to the data that is held by a majority of the plurality of replicated memory modules;
transmitting the data value to the cache of the first processor; and
sending an unlock command from the first processor to the system control unit.
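By way of illustration only, the determining step of claim 1 might be sketched in Python as follows. The function name `majority_value` is hypothetical, and raising an error when no value is held by a strict majority is an assumption of this sketch; the claim does not say what happens in that case.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_value(copies: Sequence[Hashable]) -> Hashable:
    """Return the value held by a strict majority of the replicated copies.

    Hypothetical helper: raising ValueError when no strict majority exists
    is an assumption of this sketch, not something the claim specifies.
    """
    if not copies:
        raise ValueError("no replicated copies were read")
    value, count = Counter(copies).most_common(1)[0]
    if 2 * count <= len(copies):
        raise ValueError("no value is held by a majority of the modules")
    return value


# Three replicated modules, one of which holds a corrupted copy.
voted = majority_value([0x2A, 0x2A, 0x13])
print(hex(voted))
```

With three modules, a single faulty copy is outvoted by the two that agree, so the value delivered to the cache of the first processor is the one the majority holds.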
-
-
2. In a computer system comprising a memory module and a plurality of connected processor and cache configured to issue lock and unlock requests, the method of reading and writing back memory module data, comprising:
-
receiving a lock request for a given region of the memory module from a given connected processor and cache;
locking the given region to control access to data stored at addresses from within the given region;
reading and transmitting data from one or more addresses within the given region to the given connected processor and cache;
storing the one or more addresses from within the given region in a buffer;
receiving an unlock request for the given region;
unlocking the given region; and
writing back data from the given connected processor and cache to each address that is stored in the buffer and within the given region.
-
-
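The steps of claim 2 can be pictured with a small single-threaded model. This is a sketch under stated assumptions, not the claimed implementation: the class `RegionController`, its method names, and the use of dictionaries for the memory module and the cache are all hypothetical, and the choice to perform the write-back before releasing ownership is an assumption of the sketch.

```python
from typing import Dict, List, Optional


class RegionController:
    """Toy model of one lockable memory region (all names hypothetical)."""

    def __init__(self, memory: Dict[int, int], region: range) -> None:
        self.memory = memory              # stands in for the memory module
        self.region = region              # addresses covered by the lock
        self.owner: Optional[str] = None  # processor currently holding the lock
        self.buffer: List[int] = []       # addresses read while locked

    def lock(self, requester: str) -> bool:
        """Grant the lock only if no other requester holds it."""
        if self.owner is not None:
            return False
        self.owner = requester
        self.buffer.clear()
        return True

    def read(self, requester: str, address: int, cache: Dict[int, int]) -> int:
        """Copy one word into the requester's cache and log its address."""
        if requester != self.owner or address not in self.region:
            raise PermissionError("address is not accessible to this requester")
        if address not in self.buffer:
            self.buffer.append(address)
        cache[address] = self.memory[address]
        return cache[address]

    def unlock(self, requester: str, cache: Dict[int, int]) -> None:
        """Write back every logged address from the cache, then release."""
        if requester != self.owner:
            raise PermissionError("only the lock holder may unlock")
        for address in self.buffer:
            self.memory[address] = cache[address]
        self.buffer.clear()
        self.owner = None


memory = {addr: 0 for addr in range(8)}
ctrl = RegionController(memory, range(0, 4))
cache_p1: Dict[int, int] = {}

got_p1 = ctrl.lock("P1")         # first requester obtains the lock
got_p2 = ctrl.lock("P2")         # second requester is refused
value = ctrl.read("P1", 2, cache_p1)
cache_p1[2] = value + 5          # the processor modifies its cached copy
ctrl.unlock("P1", cache_p1)      # logged addresses are flushed to memory
print(got_p1, got_p2, memory[2])
```

Because the buffer records every address that was read, the write-back on unlock covers each logged address whether or not the cached data was changed.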
receiving a lock request for a second given region of the memory module from a second given connected processor and cache in a second node;
locking the second given region to control access to data stored at addresses from within the second given region;
reading and transmitting data from one or more addresses within the second given region to the second given connected processor and cache;
storing the one or more addresses from within the second given region in a buffer;
receiving an unlock request for the second given region;
unlocking the second given region; and
writing back data from the second given connected processor and cache to each address that is stored in the buffer and within the second given region;
wherein steps relating to the reading and writing back of data in the first given region occur in parallel with steps relating to the reading and writing back of data in the second given region.
-
-
4. The method of claim 3, the computer system further comprising one or more additional memory modules to form a plurality of memory modules, each memory module being configured with a copy of the data in the other memory modules, wherein:
-
the step of reading and transmitting data from the first given region includes reading data from each of the plurality of memory modules, comparing the read data to determine data values held by a majority of the plurality of memory modules, and transmitting the data values held by a majority of the plurality of memory modules;
the step of writing back data from the first given connected processor and cache comprises writing back the data to each memory module of the plurality of memory modules;
the step of reading and transmitting data from the second given region includes reading data from each of the plurality of memory modules, comparing the read data to determine data values held by a majority of the plurality of memory modules, and transmitting the data values held by a majority of the plurality of memory modules; and
the step of writing back data from the second given connected processor and cache comprises writing back the data to each memory module of the plurality of memory modules.
-
-
5. The method of claim 3, wherein:
-
the set of one or more of the plurality of connected processor and cache for the first node includes at least two connected processor and cache;
the set of one or more of the plurality of connected processor and cache for the second node includes at least two connected processor and cache;
the step of receiving a lock request for a given region of the memory module includes monitoring a first common bus connected to each connected processor and cache of the first node; and
the step of receiving a lock request for a second given region of the memory module includes monitoring a second common bus connected to each connected processor and cache of the second node.
-
-
6. The method of claim 5, and further comprising:
-
maintaining a readable port accessible through the first common bus and configured to provide information on the availability for receiving a lock request from any connected processor and cache of the first node; and
maintaining a readable port accessible through the second common bus and configured to provide information on the availability for receiving a lock request from any connected processor and cache of the second node.
-
-
7. The method of claim 2, the computer system further comprising one or more additional memory modules to form a plurality of memory modules, each memory module being configured with a copy of the data in the other memory modules, wherein:
-
the step of reading and transmitting data includes reading data from each of the plurality of memory modules, comparing the read data to determine data values held by a majority of the plurality of memory modules, and transmitting the data values held by a majority of the plurality of memory modules; and
the step of writing back data comprises writing back the data from the given connected processor and cache to each memory module of the plurality of memory modules.
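As a hedged illustration of claim 7, the following Python sketch models a plurality of memory modules that each hold a copy of the same data. The class `ReplicatedMemory`, its method names, and the dictionary representation of a module are hypothetical; treating the absence of a strict majority as an error is also an assumption.

```python
from collections import Counter
from typing import Dict, List


class ReplicatedMemory:
    """Hypothetical model: each module is a dict mapping address -> value."""

    def __init__(self, copies: int, size: int) -> None:
        self.modules: List[Dict[int, int]] = [
            {addr: 0 for addr in range(size)} for _ in range(copies)
        ]

    def read(self, address: int) -> int:
        """Read every module and return the value held by a majority."""
        values = [module[address] for module in self.modules]
        value, count = Counter(values).most_common(1)[0]
        if 2 * count <= len(values):
            # Assumption of this sketch: no strict majority is an error.
            raise ValueError("no majority among the memory modules")
        return value

    def write_back(self, address: int, value: int) -> None:
        """Write the same value to each module of the plurality."""
        for module in self.modules:
            module[address] = value


mem = ReplicatedMemory(copies=3, size=4)
mem.write_back(1, 7)
mem.modules[2][1] = 99            # simulate a fault in one module
voted = mem.read(1)               # the two agreeing modules outvote it
mem.write_back(1, voted + 1)      # every module agrees again afterwards
after = [module[1] for module in mem.modules]
print(voted, after)
```

In this model, writing back to every module also overwrites the faulty copy, so the modules are consistent again after the write-back.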
-
-
8. The method of claim 2, wherein the step of storing the one or more addresses comprises writing the one or more addresses to a region in the memory module.
-
9. The method of claim 2, wherein at least some data written in the step of writing back data is changed from when it was transmitted in the step of reading and transmitting data.
-
10. The method of claim 2, wherein at least some data written in the step of writing back data is unchanged from when it was transmitted in the step of reading and transmitting data.
-
11. The method of claim 2, wherein the step of locking the given region includes setting a lock bit associated with the given region to a lock setting, and wherein the step of unlocking the given region includes setting the lock bit to an unlocked setting.
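The lock bit of claim 11 might, for example, be kept as one bit per region in a single word. The sketch below assumes that layout; the class `LockBits`, the packing of bits into an integer, and the convention that 1 means locked are all assumptions made for illustration.

```python
class LockBits:
    """One lock bit per region, packed into an integer (hypothetical layout)."""

    def __init__(self, regions: int) -> None:
        self.regions = regions
        self.bits = 0  # all regions start in the unlocked setting

    def _check(self, region: int) -> None:
        if not 0 <= region < self.regions:
            raise IndexError("no such region")

    def is_locked(self, region: int) -> bool:
        self._check(region)
        return bool((self.bits >> region) & 1)

    def lock(self, region: int) -> bool:
        """Set the region's bit to the lock setting; fail if already set."""
        if self.is_locked(region):
            return False
        self.bits |= 1 << region
        return True

    def unlock(self, region: int) -> None:
        """Set the region's bit back to the unlocked setting."""
        self._check(region)
        self.bits &= ~(1 << region)


locks = LockBits(regions=4)
first = locks.lock(2)     # region 2 becomes locked
second = locks.lock(2)    # a second request for region 2 is refused
other = locks.lock(1)     # region 1 is independent of region 2
locks.unlock(2)
print(first, second, other, bin(locks.bits))
```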
-
12. A computer system, comprising:
-
a memory module;
a memory controller;
a plurality of connected processors and caches, each connected processor and cache being configured to issue lock and unlock requests for regions of the memory module; and
a system control unit connected to a given connected processor and cache of the plurality of connected processors and caches;
wherein the memory controller is configured to lock and unlock access to individual regions of the memory module in response to lock and unlock requests from connected processors and caches;
wherein the system control unit is configured to store memory-module addresses of data copied from a locked memory region to the given connected processor and cache; and
wherein the system control unit is configured to write back cache data to the locked memory region in response to an unlock request for the locked memory region issued by the given connected processor and cache.
-
-
the computer system defines a plurality of nodes;
each node of the plurality of nodes including a system control unit of the plurality of system control units and one or more of the plurality of connected processors and caches;
wherein each system control unit is configured to store memory-module addresses of data copied from locked memory regions to connected processors and caches within the same node as the system control unit; and
wherein each system control unit is configured to write back cache data to the locked memory region in response to an unlock request for the locked memory region issued by connected processors and caches within the same node as the system control unit.
-
-
16. The computer system of claim 15, and further comprising one or more additional memory modules to form a plurality of memory modules, each memory module of the plurality of memory modules being configured to store a copy of the data in the other memory modules, wherein the memory controller includes a voter configured to compare copies of data received from the plurality of memory modules and select data having the greatest occurrence from among the copies of data.
-
17. The computer system of claim 15, wherein for at least one node of the plurality of nodes:
-
the node further includes two or more of the plurality of connected processors and caches;
the node further includes a bus connecting the system control unit to the two or more connected processors and caches;
the system control unit receives lock and unlock requests from the two or more connected processors and caches via the common bus; and
the system control unit includes a readable port accessible through the common bus and configured to provide information on the availability of the system control unit for receiving a lock request from the two or more connected processors and caches.
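One way to picture the readable port of claim 17 is as a status value that each processor on the common bus polls before issuing a lock request. The following Python sketch assumes a boolean port and a bounded polling loop; the names `SystemControlUnit`, `read_port`, `request_lock`, `request_unlock`, and `acquire` are hypothetical.

```python
from typing import Optional


class SystemControlUnit:
    """Toy system control unit exposing an availability port (hypothetical)."""

    def __init__(self) -> None:
        self._holder: Optional[str] = None

    def read_port(self) -> bool:
        """Readable port: True if a lock request can currently be accepted."""
        return self._holder is None

    def request_lock(self, processor: str) -> bool:
        if not self.read_port():
            return False
        self._holder = processor
        return True

    def request_unlock(self, processor: str) -> None:
        if self._holder != processor:
            raise PermissionError("only the lock holder may unlock")
        self._holder = None


def acquire(scu: SystemControlUnit, processor: str, max_polls: int) -> bool:
    """Poll the port up to max_polls times before giving up (assumed policy)."""
    for _ in range(max_polls):
        if scu.read_port() and scu.request_lock(processor):
            return True
    return False


scu = SystemControlUnit()
p1_ok = acquire(scu, "P1", max_polls=3)      # port shows available
p2_ok = acquire(scu, "P2", max_polls=3)      # port shows busy on every poll
scu.request_unlock("P1")
p2_retry = acquire(scu, "P2", max_polls=3)   # succeeds once P1 has unlocked
print(p1_ok, p2_ok, p2_retry)
```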
-
Specification