High availability error self-recovering shared cache for multiprocessor systems
First Claim
1. A high availability shared cache in a multiprocessor system having at least two processing units, for storing information as congruence classes to be utilized by the processing units, wherein the multiprocessor system comprises:
cache means associated with each of said processing units for storing information to be utilized by the processing units;
main memory means for storing information to be utilized by the processing units, which are managed by arbitration means;
bus means for transferring information between the processing units, the shared cache, and the main memory means;
shared cache directory means for managing the shared cache, comprising entries of all the information stored in the shared cache, a Valid bit for the status of the information stored in the shared cache, and a Parity bit for indicating errors in the shared cache; and
error status register means for recording error events, which are provided by the shared cache;
wherein a request for information is transmitted to the shared cache and the main memory means, in parallel, and in case of the error status register means recording an error in any entry of a congruence class in the shared cache directory means, self-recovery of the shared cache is accomplished by invalidating all the entries in the shared cache directory means of the accessed congruence class by resetting the Valid bits to "0" and by setting the Parity bit to a correct value; and
the request for information to the main memory means is not cancelled.
Abstract
A high availability shared cache memory in a tightly coupled multiprocessor system provides an error self-recovery mechanism for errors in the associated cache directory or the shared cache itself. After an error in a congruence class of the cache is indicated by an error status register, self-recovery is accomplished by invalidating all the entries in the shared cache directory means of the accessed congruence class by resetting Valid bits to "0" and by setting the Parity bit to a correct value, wherein the request for data to the main memory is not cancelled.
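The directory self-recovery described above can be illustrated with a minimal sketch. It is a software model under stated assumptions, not the patented hardware: the names `DirEntry`, `SharedCacheDirectory`, `even_parity`, `lookup` and `install` are hypothetical, one even-parity bit per directory entry is assumed, and the error status register is modelled as a plain list of affected congruence classes.

```python
from dataclasses import dataclass, field


def even_parity(tag: int, valid: int) -> int:
    """Parity bit that makes the number of 1 bits in (tag, valid, parity) even.

    Even parity is an assumption; the source only says "a correct value".
    """
    return (bin(tag).count("1") + valid) & 1


@dataclass
class DirEntry:
    tag: int = 0
    valid: int = 0
    parity: int = 0

    def parity_ok(self) -> bool:
        return self.parity == even_parity(self.tag, self.valid)


@dataclass
class SharedCacheDirectory:
    ways: int = 4        # entries per congruence class (assumed size)
    classes: int = 8     # number of congruence classes (assumed size)
    error_status: list = field(default_factory=list)  # stands in for the error status register

    def __post_init__(self):
        self.rows = [[DirEntry() for _ in range(self.ways)]
                     for _ in range(self.classes)]

    def install(self, cls: int, way: int, tag: int) -> None:
        """Record a newly loaded line and store a matching parity bit."""
        entry = self.rows[cls][way]
        entry.tag, entry.valid = tag, 1
        entry.parity = even_parity(tag, 1)

    def lookup(self, cls: int, tag: int):
        """Return the hit way, or None on a miss.

        If any entry of the accessed congruence class fails its parity check,
        the whole class is invalidated and the access is treated as a miss.
        The request sent to main memory in parallel is not cancelled, so it
        supplies the data.
        """
        row = self.rows[cls]
        if any(not e.parity_ok() for e in row):
            self.error_status.append(cls)      # record the error event
            for e in row:
                e.valid = 0                    # reset the Valid bit to "0"
                e.parity = even_parity(e.tag, e.valid)  # set parity to a correct value
            return None
        for way, e in enumerate(row):
            if e.valid and e.tag == tag:
                return way
        return None
```

In this model, flipping a single tag bit in one entry makes the next `lookup` on that congruence class return a miss, after which every entry of the class is invalid and passes its parity check again.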
Multiple bit failures in the cached data are recovered by setting the Valid bit in the matching column to "0". The processor reissues the request for data, which is loaded into the processor's private cache and the shared cache as well. Further requests to this data by other processors are served by the shared cache.
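The recovery from multiple bit failures can be sketched in the same spirit. This is an illustrative model only: the classes `SharedCache` and `Processor`, the `corrupt` set that stands in for the error correction facility, and the `"retry"` status used to tell the processor that the data cannot be provided are all assumptions introduced here.

```python
class SharedCache:
    def __init__(self):
        self.lines = {}       # address -> (data, valid bit)
        self.corrupt = set()  # addresses with a simulated multiple bit failure

    def fill(self, addr, data):
        self.lines[addr] = (data, True)

    def read(self, addr):
        """Return ('hit', data), ('miss', None) or ('retry', None)."""
        entry = self.lines.get(addr)
        if entry is None or not entry[1]:
            return ("miss", None)
        if addr in self.corrupt:
            # Uncorrectable data: reset the Valid bit of the matching entry
            # and tell the requester that the data cannot be provided.
            self.lines[addr] = (None, False)
            self.corrupt.discard(addr)
            return ("retry", None)
        return ("hit", entry[0])


class Processor:
    def __init__(self, shared, memory):
        self.shared = shared
        self.memory = memory  # dict standing in for main memory
        self.private = {}     # private cache (its own lookup is omitted in this sketch)

    def load(self, addr):
        """Return (data, source) where source names who served the request."""
        status, data = self.shared.read(addr)
        if status == "retry":
            # Reissue the request; the entry is now invalid, so it misses.
            status, data = self.shared.read(addr)
        source = "shared cache"
        if status != "hit":
            data = self.memory[addr]
            self.shared.fill(addr, data)  # reload the shared cache as well
            source = "main memory"
        self.private[addr] = data
        return data, source
```

With one memory dictionary and one `SharedCache` shared by several `Processor` objects, a load that hits a corrupted line is served by main memory after the reissue, and the next load of the same address by another processor is served by the shared cache.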
12 Claims
1. A high availability shared cache in a multiprocessor system having at least two processing units, for storing information as congruence classes to be utilized by the processing units, wherein the multiprocessor system comprises:
cache means associated with each of said processing units for storing information to be utilized by the processing units;
main memory means for storing information to be utilized by the processing units, which are managed by arbitration means;
bus means for transferring information between the processing units, the shared cache, and the main memory means;
shared cache directory means for managing the shared cache, comprising entries of all the information stored in the shared cache, a Valid bit for the status of the information stored in the shared cache, and a Parity bit for indicating errors in the shared cache; and
error status register means for recording error events, which are provided by the shared cache;
wherein a request for information is transmitted to the shared cache and the main memory means, in parallel, and in case of the error status register means recording an error in any entry of a congruence class in the shared cache directory means, self-recovery of the shared cache is accomplished by invalidating all the entries in the shared cache directory means of the accessed congruence class by resetting the Valid bits to "0" and by setting the Parity bit to a correct value; and
the request for information to the main memory means is not cancelled.
Dependent claims: 2, 3, 4, 5, 6, 11
7. A high availability shared cache in a multiprocessor system having at least two processing units, for storing information, to be utilized by the processing units, as congruence classes, wherein the multiprocessor system comprises:
cache means associated with each of said processing units for storing information to be utilized by the processing units;
main memory means for storing information to be utilized by the processing units, which are managed by arbitration means;
bus means for transferring information between the processing units, the shared cache, and the main memory means;
shared cache directory means for managing the shared cache, comprising entries of all the information stored in the shared cache, a Valid bit for the status of the information stored in the shared cache, and a Parity bit for indicating errors in the shared cache; and
error status register means for recording error events, which are provided by the shared cache;
wherein a request for information is transmitted to the shared cache and the main memory means, in parallel, and in case of multiple bit failures in the shared cache as indicated by an error correction facility, self-recovery of the shared cache is accomplished by invalidating matching entries in the shared cache directory means of the corresponding congruence class by resetting the Valid bits to "0" and by informing the requesting processing unit that the information can not be provided, whereby the processing unit issues the request for information again and the request is served by the main memory means.
Dependent claims: 8, 9, 10, 12
Specification