Multiprocessor system having controller for controlling the number of processors for which cache coherency must be guaranteed
Abstract
A large-scale multiprocessor system executes area-limited cache coherency control, achieving high-speed operation while substantially reducing processor-to-processor communication. A translation lookaside buffer retains cache coherency attribute information that defines a limitable cache coherent area within which data consistency among caches is maintained, and a processor memory interface unit includes a cache coherency control that identifies, on the basis of this attribute information, whether cache coherency is required only within a particular cluster of processors or for every cache memory in every cluster throughout the system. In another version of the large-scale multiprocessor system, each cluster may be provided with an export directory that registers an identifier of each data item whose copy is cached in cache memories in other clusters. Because the cache coherent area can be limited according to the characteristics of the data, latency in cache coherency procedures can be reduced greatly. Inter-cluster communication can also be reduced greatly, since it is no longer necessary to broadcast to all processors in the system on every memory read or write.
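To make the communication saving concrete, the following Python sketch counts the coherency messages generated by one write. It assumes a hypothetical system of 4 clusters with 4 processors each, one message per snooped processor, and no export directory; the function names are illustrative and do not come from the specification.

```python
# Hypothetical message-count model; a sketch, not the patented mechanism.

def snoop_messages(num_clusters: int, procs_per_cluster: int, local_only: bool) -> int:
    """Coherency messages generated by one write when the request is sent
    to every other processor in its coherent area (one message per target assumed)."""
    area_size = procs_per_cluster if local_only else num_clusters * procs_per_cluster
    return area_size - 1  # the requesting processor does not snoop its own request


def inter_cluster_messages(num_clusters: int, procs_per_cluster: int, local_only: bool) -> int:
    """Messages that must cross the processor global bus for the same write."""
    return 0 if local_only else (num_clusters - 1) * procs_per_cluster


# Hypothetical system of 4 clusters with 4 processors each.
print(snoop_messages(4, 4, local_only=False), inter_cluster_messages(4, 4, local_only=False))  # 15 12
print(snoop_messages(4, 4, local_only=True), inter_cluster_messages(4, 4, local_only=True))    # 3 0
```

Under these assumptions a cluster-limited write touches 3 caches instead of 15 and puts nothing on the global bus.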
16 Claims
1. A multiprocessor system comprising:
a plurality of clusters of processors interconnected via a processor global bus, in which each cluster includes at least two processors having a cache memory and a translation lookaside buffer, a local shared memory, and a memory interface unit which is interconnected to said at least two processors and said local shared memory for controlling an access from said processors to said local shared memory;
a global shared memory;
a system control unit connected between said processor global bus and said global shared memory for controlling an access from a processor in any of said plurality of clusters to said global shared memory; and
means responsive to area attribute information which is held in said translation lookaside buffer in each of said processors for identifying, for an access from any of said plurality of processors, whether cache coherency is to be guaranteed among cache memories in a local cluster or is to be expanded to include all cache memories in all clusters throughout the system.
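A minimal Python sketch of the identification step in claim 1, assuming a one-bit area attribute per TLB entry that selects either the local cluster or the whole system. The names CoherencyScope, TlbEntry and coherency_scope are hypothetical; the claim does not fix any encoding of the attribute.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class CoherencyScope(Enum):
    LOCAL_CLUSTER = 0  # snoop only the caches of the requester's own cluster
    SYSTEM = 1         # snoop every cache in every cluster


@dataclass(frozen=True)
class TlbEntry:
    virtual_page: int
    real_page: int
    area_attribute: CoherencyScope  # hypothetical one-bit encoding of the area attribute


def coherency_scope(tlb: Dict[int, TlbEntry], virtual_page: int) -> CoherencyScope:
    """Identify, per access, how far coherency must extend. A TLB miss (KeyError)
    would trigger a page-table walk, which is not modeled here."""
    return tlb[virtual_page].area_attribute


tlb = {
    0x10: TlbEntry(0x10, 0x2A0, CoherencyScope.LOCAL_CLUSTER),  # e.g. cluster-private data
    0x11: TlbEntry(0x11, 0x2A1, CoherencyScope.SYSTEM),         # e.g. data shared system-wide
}
print(coherency_scope(tlb, 0x10))  # CoherencyScope.LOCAL_CLUSTER
```

Keeping the attribute in the TLB means the scope is known at translation time, before any coherency request is issued.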
2. A multiprocessor system comprising:
a plurality of clusters of processors interconnected via a processor global bus, in which each cluster includes at least two processors having a cache memory and a translation lookaside buffer, a local shared memory, and a memory interface unit which is interconnected to said at least two processors and said local shared memory for controlling an access from said processors to said local shared memory;
a global shared memory;
a system control unit connected between said processor global bus and said global shared memory for controlling an access from a processor in any of said plurality of clusters to said global shared memory; and
means responsive to area attribute information which is held in said translation lookaside buffer in each of said processors for identifying, for an access from any of said plurality of processors, whether cache coherency is to be guaranteed among every one of the cache memories throughout the system, or whether cache coherency is to be maintained among cache memories in a limited area of the system.
3. A multiprocessor system comprising:
a plurality of clusters of processors interconnected via a processor global bus, in which each cluster includes at least two processors having a cache memory and a translation lookaside buffer, a local shared memory, and a memory interface unit which is interconnected to said at least two processors and said local shared memory for controlling an access from said processors to said local shared memory;
a global shared memory; and
a system control unit connected between said processor global bus and said global shared memory for controlling an access from a processor in any of said plurality of clusters to said global shared memory;
wherein area attribute information is held in said translation lookaside buffer to identify, for an access from one of said plurality of processors, whether cache coherency is to be guaranteed among every one of the cache memories throughout the system, or whether cache coherency is to be maintained among cache memories in a limited area of the system; and
wherein said memory interface unit comprises cache coherency area determination means for determining a limited area of cache memories in the system which are to be guaranteed cache coherency on the basis of the area attribute information held in said translation lookaside buffer.
7. A multiprocessor system comprising:
a plurality of clusters of processors interconnected via a processor global bus, in which each cluster includes at least two processors having a cache memory and a translation lookaside buffer, a local shared memory, and a memory interface unit which is interconnected to said at least two processors and said local shared memory for controlling an access from said processors to said local shared memory;
a global shared memory; and
a system control unit connected between said processor global bus and said global shared memory for controlling an access from a processor in any of said plurality of clusters to said global shared memory;
wherein area attribute information is held in said translation lookaside buffer to identify, for an access from one of said plurality of processors, whether cache coherency is to be guaranteed among every one of the cache memories throughout the system, or whether cache coherency is to be maintained among cache memories in a limited area of the system; and
wherein said memory interface unit comprises cache coherency area determination means for determining a limited area of cache memories in the system which are to be guaranteed cache coherency on the basis of the area attribute information held in said translation lookaside buffer, and broadcast means for broadcasting information for use in cache coherency to processors within the area specified by said cache coherency area determination means.
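The area determination and broadcast means of claim 7 can be sketched as follows, assuming processors are numbered consecutively cluster by cluster and that the determined area is either the requester's cluster or the whole system. coherent_area and broadcast are hypothetical names, and the returned dictionary merely stands in for real bus transfers.

```python
from typing import Dict, List

# Hypothetical helpers; processor ids are assumed to be numbered cluster by cluster.


def coherent_area(requester: int, procs_per_cluster: int, num_clusters: int, local_only: bool) -> List[int]:
    """Area determination: ids c*procs_per_cluster .. (c+1)*procs_per_cluster - 1 form cluster c."""
    if local_only:
        base = (requester // procs_per_cluster) * procs_per_cluster
        return list(range(base, base + procs_per_cluster))
    return list(range(num_clusters * procs_per_cluster))


def broadcast(requester: int, area: List[int], message: str) -> Dict[int, str]:
    """Send the coherency message to every processor in the area except the requester;
    the returned dict stands in for the bus transfers that would actually occur."""
    return {pid: message for pid in area if pid != requester}


area = coherent_area(requester=5, procs_per_cluster=4, num_clusters=4, local_only=True)
print(area)                                      # [4, 5, 6, 7]
print(sorted(broadcast(5, area, "invalidate")))  # [4, 6, 7]
```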
8. A multiprocessor system comprising:
a plurality of clusters of processors interconnected via a processor global bus, in which each cluster includes at least two processors having a cache memory and a translation lookaside buffer, a local shared memory, and a memory interface unit which is interconnected to said at least two processors and said local shared memory for controlling an access from said processors to said local shared memory;
a global shared memory; and
a system control unit connected between said processor global bus and said global shared memory for controlling an access from a processor in any of said plurality of clusters to said global shared memory;
wherein area attribute information is held in said translation lookaside buffer to identify, for an access from one of said plurality of processors, whether cache coherency is to be maintained among every one of the cache memories throughout the system, or whether cache coherency is to be maintained among cache memories in a limited area of the system; and
wherein said memory interface unit comprises cache coherency area determination means for determining a limited area of cache memories in the system which are to be guaranteed cache coherency on the basis of the area attribute information held in said translation lookaside buffer, cache coherency monitor means for monitoring a cache coherency transaction among processors within the limited area specified by said cache coherency area determination means until the transaction is completed, and data supplier select means for selecting, upon completion of the cache coherency transaction, whether to execute a cache-to-cache data transfer within its own cluster, to read data from said local shared memory, or to read data from said global shared memory.
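A sketch of the monitor and data supplier select means of claim 8, assuming the limited area is the requester's own cluster (the processors that must reply) and that whether the line is homed in the local or the global shared memory is known in advance (an assumed input, homed_locally). The supplier is chosen only after every processor in the area has replied; all names here are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class Supplier(Enum):
    CACHE_TO_CACHE = "cache-to-cache transfer within own cluster"
    LOCAL_SHARED_MEMORY = "local shared memory"
    GLOBAL_SHARED_MEMORY = "global shared memory"


def transaction_complete(replies: Dict[int, bool], area: Set[int]) -> bool:
    """Monitor: the coherency transaction is complete once every processor
    in the limited area has returned its snoop reply."""
    return area.issubset(replies)


def select_supplier(replies: Dict[int, bool], area: Set[int], homed_locally: bool) -> Supplier:
    """replies maps processor id -> True if that cache can supply the line.
    homed_locally is a hypothetical input: True if the line lives in the
    local shared memory rather than the global shared memory."""
    if not transaction_complete(replies, area):
        raise RuntimeError("cache coherency transaction still in progress")
    if any(replies[pid] for pid in area):
        return Supplier.CACHE_TO_CACHE
    return Supplier.LOCAL_SHARED_MEMORY if homed_locally else Supplier.GLOBAL_SHARED_MEMORY


area = {4, 6, 7}  # other processors in the requester's cluster
print(select_supplier({4: False, 6: True, 7: False}, area, homed_locally=True))
# Supplier.CACHE_TO_CACHE
```

Refusing to select a supplier before all replies arrive mirrors the claim's ordering: monitor until completion, then choose the data source.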
10. A multiprocessor system comprising:
a plurality of processors each having a cache memory and a translation lookaside buffer;
a main memory for storing instructions and data processed by said plurality of processors;
a memory interface unit coupled to said plurality of processors and said main memory for controlling an access from said plurality of processors to said main memory;
means responsive to area attribute information, retained in said translation lookaside buffer, for identifying, for an access from one of said plurality of processors, whether cache coherency should be maintained among every one of the cache memories throughout the system, or only among cache memories in a limited area of the system; and
wherein said memory interface unit comprises cache coherency area determination means for determining a limited area of cache memories in the system which are to be guaranteed cache coherency on the basis of the area attribute information held in said translation lookaside buffer.
11. A multiprocessor system comprising:
a plurality of processors having cache memories and translation lookaside buffers;
a main memory for storing instructions and data for processing by said plurality of processors; and
a memory interface unit coupled to said plurality of processors and said main memory for controlling an access from said plurality of processors to said main memory;
wherein area attribute information is held in said translation lookaside buffer for identifying, for an access from one of said plurality of processors, whether cache coherency should be maintained among every one of the cache memories throughout the system, or only among cache memories in a limited area therein; and
wherein said memory interface unit comprises cache coherent area determination means for determining a cache coherent area in which cache coherency is necessitated in accordance with area attribute information held in said translation lookaside buffer, cache coherency monitor means for monitoring whether or not a cache coherency operation is completed among cache memories in processors within an area designated by said cache coherent area determination means, and data supplier select means for selecting, after completion of the cache coherency operation, whether to carry out a cache-to-cache data transfer or to execute a data read from said main memory.
14. An area limitable processor system comprising:
a plurality of processors, each of which includes:
an instruction cache memory which retains a part of the instructions stored in a main memory, a data cache memory which retains a part of the data stored in said main memory, an instruction fetch unit which reads out an instruction to be executed from said instruction cache memory or from said main memory, an instruction execution unit which interprets the instruction fetched by said instruction fetch unit and then reads out data from said data cache memory or said main memory so as to execute the interpreted instruction, and a translation lookaside buffer which translates a virtual address issued from said instruction fetch unit or said instruction execution unit into a real address, wherein said translation lookaside buffer includes a memory space for holding area attribute information which defines a limited area in which cache coherency is to be maintained among a plurality of cache memories of said plurality of processors.
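A sketch of the translation lookaside buffer of claim 14, assuming 4 KiB pages and encoding the area attribute as a hypothetical bitmask of clusters (bit c set means cluster c lies inside the coherent area). The class name AreaLimitedTlb and the bitmask encoding are illustrative assumptions, not taken from the specification.

```python
from typing import Dict, Tuple

PAGE_SHIFT = 12                  # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class AreaLimitedTlb:
    """Hypothetical TLB whose entries carry, next to the real page number, an area
    attribute encoded as a cluster bitmask (bit c set: cluster c is in the area)."""

    def __init__(self) -> None:
        self._entries: Dict[int, Tuple[int, int]] = {}  # virtual page -> (real page, area mask)

    def insert(self, virtual_page: int, real_page: int, area_mask: int) -> None:
        self._entries[virtual_page] = (real_page, area_mask)

    def translate(self, virtual_address: int) -> Tuple[int, int]:
        """Return (real address, area mask) for an instruction-fetch or data access."""
        page, offset = virtual_address >> PAGE_SHIFT, virtual_address & PAGE_MASK
        if page not in self._entries:
            raise LookupError(f"TLB miss for virtual page {page:#x}")  # refill not modeled
        real_page, area_mask = self._entries[page]
        return (real_page << PAGE_SHIFT) | offset, area_mask


tlb = AreaLimitedTlb()
tlb.insert(0x12345, 0x00ABC, area_mask=0b0011)  # coherency limited to clusters 0 and 1
real, mask = tlb.translate(0x12345678)
print(hex(real), bin(mask))                      # 0xabc678 0b11
```

A bitmask is only one possible encoding; a single local/global bit would also satisfy the claim's "limited area" wording.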
15. A distributed-memory type multiprocessor system having a cache memory coherency protocol function, comprising:
a plurality of clusters coupled to each other via a cluster communication control unit therefor, each cluster being defined by a group including:
a plurality of processors including built-in cache memories, and a local memory connected to said plurality of processors, said cluster communication control unit being connected to said plurality of processors and said local memory, wherein said cluster communication control unit comprises a small-capacity export directory which holds only the addresses of those data in said local memory of the local cluster whose copies are exported to a cache memory in a remote cluster; and
cache coherency area determination means for determining, depending upon the contents of said export directory, whether cache coherency is guaranteed among every cache memory throughout the system or only among cache memories within the local cluster.
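A sketch of the export directory of claim 15. It assumes a fixed capacity with oldest-first replacement; on overflow the evicted line's remote copies would have to be invalidated (not modeled), so that an address missing from the directory really has no remote copy. ExportDirectory and its methods are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional


class ExportDirectory:
    """Hypothetical small-capacity export directory: it records only addresses of
    local-memory lines whose copies are cached in a remote cluster."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._lines: "OrderedDict[int, None]" = OrderedDict()  # insertion order = age

    def record_export(self, line_addr: int) -> Optional[int]:
        """Register a line supplied to a remote cache. When the directory is full the
        oldest entry is evicted and returned; the caller must then invalidate that
        line's remote copies (not modeled) so the directory never under-reports."""
        if line_addr in self._lines:
            return None
        victim = None
        if len(self._lines) >= self.capacity:
            victim, _ = self._lines.popitem(last=False)
        self._lines[line_addr] = None
        return victim

    def release(self, line_addr: int) -> None:
        """Forget a line once no remote cluster holds a copy any longer."""
        self._lines.pop(line_addr, None)

    def needs_global_coherency(self, line_addr: int) -> bool:
        """System-wide coherency only for exported lines; otherwise local cluster only."""
        return line_addr in self._lines


directory = ExportDirectory(capacity=2)
directory.record_export(0x1000)
print(directory.needs_global_coherency(0x1000), directory.needs_global_coherency(0x1080))  # True False
```

Because only exported lines are tracked, the directory stays small, and every access to an unlisted local line can be kept inside the cluster.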
16. A distributed-memory type multiprocessor system with a cache memory coherency protocol function, comprising:
a plurality of clusters coupled to each other via a cluster communication control unit and a cluster bus, in which each cluster is defined by a group including:
a plurality of processors including built-in cache memories, a local bus for connecting said plurality of processors, a local memory coupled to said local bus, and said cluster communication control unit connected to said local bus, wherein each of said plurality of processors comprises a local bus cache coherency protocol function for monitoring said local bus and performing a cache coherency protocol as required, and wherein said cluster communication control unit comprises:
a local bus cache coherency protocol function for monitoring said local bus, whereby any necessary cache coherency protocol is executed,
a cluster bus cache coherency protocol function for monitoring said cluster bus, whereby any necessary cache coherency protocol is executed among clusters, and
an export directory with a small memory capacity for holding only the addresses of those data in said local memory of its own cluster whose copies are exported to a cache memory in a remote cluster; and
means for determining, depending upon the contents of said export directory, whether said local bus cache coherency protocol function must be tied up with said cluster bus cache coherency protocol function, or whether the cache coherency protocol can be completed by said local bus cache coherency protocol function alone, without such a tie-up.
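The tie-up decision of claim 16 might look like the following sketch, made by the cluster communication control unit as it snoops a local-bus transaction. The export-directory test follows the claim; treating every line homed in a remote cluster as requiring the cluster bus is an added assumption, and Resolution and resolve are hypothetical names.

```python
from enum import Enum, auto
from typing import AbstractSet


class Resolution(Enum):
    LOCAL_BUS_ONLY = auto()           # local-bus snooping alone completes the protocol
    TIE_UP_WITH_CLUSTER_BUS = auto()  # the cluster-bus protocol must also run


def resolve(line_addr: int, home_cluster: int, own_cluster: int,
            export_directory: AbstractSet[int]) -> Resolution:
    """Hypothetical decision rule applied while snooping a local-bus transaction."""
    if home_cluster != own_cluster:
        # Assumption (not stated in the claim): remotely homed lines always involve the cluster bus.
        return Resolution.TIE_UP_WITH_CLUSTER_BUS
    if line_addr in export_directory:
        # A copy of this local line sits in a remote cache, so other clusters must be snooped.
        return Resolution.TIE_UP_WITH_CLUSTER_BUS
    return Resolution.LOCAL_BUS_ONLY


print(resolve(0x1000, home_cluster=0, own_cluster=0, export_directory={0x2000}))
# Resolution.LOCAL_BUS_ONLY
```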
Specification