Memory controller for controlling memory accesses across networks in distributed shared memory processing systems
Abstract
A shared memory parallel processing system interconnected by a multi-stage network combines new system configuration techniques with special-purpose hardware to provide remote memory accesses across the network, while controlling cache coherency efficiently across the network. The system configuration techniques include a systematic method for partitioning and controlling the memory in relation to local versus remote accesses and changeable versus unchangeable data. Most of the special-purpose hardware is implemented in the memory controller and network adapter, which implement three send FIFOs and three receive FIFOs at each node to segregate and efficiently handle invalidate functions, remote stores, and remote accesses requiring cache coherency. The segregation of these three functions into different send and receive FIFOs greatly facilitates the cache coherency function over the network. In addition, the network itself is tailored to provide the best efficiency for remote accesses.
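The six-FIFO organization described in the abstract can be sketched in code. The following Python model is illustrative only; every name in it (`NetworkAdapter`, `Traffic`, and so on) is an assumption for exposition, not an identifier from the patent.

```python
# Hypothetical sketch of the adapter's six FIFOs: three send FIFOs and three
# receive FIFOs per node, segregating invalidates, remote stores, and remote
# accesses requiring cache coherency. All names are illustrative.
from collections import deque
from enum import Enum, auto

class Traffic(Enum):
    INVALIDATE = auto()       # cache-line invalidate messages
    REMOTE_STORE = auto()     # stores routed to the home node's memory
    COHERENT_ACCESS = auto()  # remote reads that require coherency handling

class NetworkAdapter:
    """Per-node adapter with one send and one receive FIFO per traffic class."""
    def __init__(self):
        self.send = {t: deque() for t in Traffic}
        self.recv = {t: deque() for t in Traffic}

    def enqueue_send(self, kind: Traffic, message):
        self.send[kind].append(message)

    def deliver(self, kind: Traffic, message):
        self.recv[kind].append(message)

    def next_receive(self, kind: Traffic):
        # Segregation lets each class drain independently, so slow coherent
        # reads cannot block urgent invalidates in the same queue.
        return self.recv[kind].popleft() if self.recv[kind] else None

adapter = NetworkAdapter()
adapter.enqueue_send(Traffic.INVALIDATE, ("invalidate", 0x40))
adapter.deliver(Traffic.REMOTE_STORE, ("store", 0x80, 123))
print(adapter.next_receive(Traffic.REMOTE_STORE))  # ('store', 128, 123)
```

Because each traffic class owns its own pair of queues, head-of-line blocking between invalidates, stores, and coherent reads is avoided by construction, which is the property the abstract attributes to the segregation.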
20 Claims
1. A cache coherency system for a shared memory parallel processing system including a plurality of processing nodes, comprising:
a single multi-stage communication network for interconnecting said processing nodes, said network including a dual priority switch at each node for selectively operating in normal low priority mode and camp-on high priority mode;
each said processing node including a unique section of shared memory which is not a cache;
each said processing node including one or more caches for storing a plurality of cache lines;
a cache coherency directory which is distributed to each of said nodes for tracking which of one or more of said nodes have copies of each cache line; and
an adapter for storing changed data immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed, such that said shared memory always contains the most recent data according to a two hop process including in hop 1) a requesting node requests most recent data of a home node, and in hop 2) said home node immediately returns said most recent data from its shared memory to said requesting node.
2. The cache coherency system of claim 1, further comprising:
a shared memory including a first memory portion for storing unchangeable data and a second memory portion for storing changeable data; and
said cache coherency directory listing which nodes of said plurality of processing nodes have accessed copies of said cache lines in said second memory portion.
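The two hop access of claim 1 and the distributed directory of claim 2 can be illustrated together. This is a minimal sketch under assumed names (`HomeNode`, `read_request`); the patent does not prescribe any particular implementation, and the directory here simply records which nodes hold copies of each changeable line.

```python
# Minimal sketch of the claimed two hop access: hop 1, the requesting node
# asks the home node; hop 2, the home node answers directly from its shared
# memory section, which always holds the most recent data because stores go
# straight to home memory. All names are assumptions for illustration.

class HomeNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}     # this node's unique section of shared memory
        self.directory = {}  # line address -> set of sharer node ids

    def store(self, addr, value, writer_id):
        # Changed data is written immediately to home memory, regardless of
        # which node performed the write.
        self.memory[addr] = value
        self.directory.setdefault(addr, set())

    def read_request(self, addr, requester_id):
        # Hop 1 arrives here; hop 2 returns data straight from memory --
        # no third hop to a "current owner" cache is ever needed, because
        # memory is never stale.
        self.directory.setdefault(addr, set()).add(requester_id)
        return self.memory[addr]

home = HomeNode(node_id=0)
home.store(0x100, 42, writer_id=3)                # remote store lands in home memory
value = home.read_request(0x100, requester_id=7)  # two hops: request, reply
print(value, home.directory[0x100])               # 42 {7}
```

The design choice the claims emphasize is visible here: because every store updates home memory immediately, a read is always satisfied in exactly two hops, whereas a protocol that left the newest copy in a writer's cache could need a third hop to fetch it.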
3. The cache coherency system of claim 2, each of said plurality of processing nodes being operable for reading, storing, and invalidating said shared memory at any other of said processing nodes.
4. The cache coherency system of claim 3, further comprising at a first node of said plurality of processing nodes a memory controller selectively operable first responsive to a request for access to a memory word by first accessing the cache at said first node and, if said requested memory word is not available in said cache, selectively operable second for accessing said memory word selectively from said shared memory regardless of which of said nodes includes the section of shared memory being accessed, and storing said cache line including said memory word to said cache at said first node.
5. The cache coherency system of claim 4, said memory controller further being selectively operable for deleting a cache line from said cache at said first node when said cache is full to provide space for a new cache line to be stored to said cache, and for sending the address of the deleted cache line to an invalidation directory to indicate said node no longer has a copy of said cache line.
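The eviction path of claim 5 can be sketched as follows, assuming an LRU cache and an invalidation directory modeled as a mapping from line addresses to sharer sets; the class and parameter names are illustrative, not the patent's.

```python
# Hedged sketch of claim 5's eviction path: when the local cache is full, the
# memory controller evicts a line and reports the evicted address to the
# invalidation directory, so this node is no longer listed as holding a copy.
from collections import OrderedDict

class MemoryController:
    def __init__(self, node_id, capacity, invalidation_directory):
        self.node_id = node_id
        self.capacity = capacity
        self.cache = OrderedDict()  # insertion order stands in for LRU order
        self.invalidation_directory = invalidation_directory

    def fill(self, addr, line):
        if addr not in self.cache and len(self.cache) >= self.capacity:
            evicted_addr, _ = self.cache.popitem(last=False)  # oldest line
            # Tell the directory this node no longer has a copy of the line.
            self.invalidation_directory[evicted_addr].discard(self.node_id)
        self.cache[addr] = line

directory = {0x00: {1}, 0x40: {1}, 0x80: set()}
mc = MemoryController(node_id=1, capacity=2, invalidation_directory=directory)
mc.fill(0x00, b"A")
mc.fill(0x40, b"B")
mc.fill(0x80, b"C")  # cache full: evicts 0x00 and updates the directory
print(sorted(mc.cache), directory[0x00])  # [64, 128] set()
```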
6. The cache coherency system of claim 4, said memory controller further being selectively operable for sending cache update messages to update corresponding cache lines at all remote nodes having copies of a changed cache line and for receiving cache lines of data from remote nodes for updating the cache at said first node.
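Claim 6's update traffic can be sketched as a broadcast driven by the directory: on a change, the controller sends cache update messages to every remote node listed as holding a copy, and each remote node applies the update in place. Names (`broadcast_update`, `receive_update`) are assumptions for illustration.

```python
# Illustrative sketch of claim 6: the writer's controller consults the
# directory entry for the changed line and sends an update to each sharer,
# which patches its cached copy rather than invalidating it.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = {}

    def receive_update(self, addr, line):
        # Remote copies are updated in place if present.
        if addr in self.cache:
            self.cache[addr] = line

def broadcast_update(addr, new_line, directory, nodes, writer_id):
    for sharer in directory.get(addr, ()):
        if sharer != writer_id:  # the writer already has the new data
            nodes[sharer].receive_update(addr, new_line)

nodes = {i: Node(i) for i in range(3)}
nodes[1].cache[0x40] = b"old"
nodes[2].cache[0x40] = b"old"
directory = {0x40: {1, 2}}
broadcast_update(0x40, b"new", directory, nodes, writer_id=2)
print(nodes[1].cache[0x40])  # b'new'
```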
7. A cache coherency system for a shared memory parallel processing system including a plurality of processing nodes, comprising:
a multi-stage communication network for interconnecting said processing nodes;
each said processing node including a unique section of shared memory which is not a cache;
each said processing node including one or more caches for storing a plurality of cache lines;
a cache coherency directory which is distributed to each of said nodes for tracking which of said nodes have copies of each cache line; and
a network adapter for controlling cache coherency autonomously, without intervention from any said processing node, storing changed data immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed, according to a two hop process including in a first hop a requesting node requests most recent data of a home node and in a second hop said home node immediately returns said most recent data from its shared memory to said requesting node, such that said shared memory always contains the most recent data.
8. The cache coherency system of claim 7, further comprising:
a shared memory including a first memory portion for storing unchangeable data and a second memory portion for storing changeable data; and
said cache coherency directory listing which nodes of said plurality of processing nodes have accessed copies of said cache lines in said second memory portion.
9. The cache coherency system of claim 8, each of said plurality of processing nodes being operable for reading, storing, and invalidating said shared memory at any other of said processing nodes.
10. The cache coherency system of claim 9, further comprising at a first node of said plurality of processing nodes a memory controller selectively operable first responsive to a request for access to a memory word by first accessing the cache at said first node and, if said requested memory word is not available in said cache, selectively operable second for accessing said memory word selectively from said shared memory regardless of which of said nodes includes the section of shared memory being accessed, and storing said cache line including said memory word to said cache at said first node.
11. The cache coherency system of claim 10, said memory controller further being selectively operable for deleting a cache line from said cache at said first node when said cache is full to provide space for a new cache line to be stored to said cache, and for sending the address of the deleted cache line to an invalidation directory to indicate said node no longer has a copy of said cache line.
12. The cache coherency system of claim 10, said memory controller further being selectively operable for sending cache update messages to update corresponding cache lines at all remote nodes having copies of a changed cache line and for receiving cache lines of data from remote nodes for updating the cache at said first node.
13. A method for operating a shared memory parallel processing system as a cache coherency system including a plurality of processing nodes, each said processing node including a unique section of shared memory which is not a cache, comprising the steps of:
interconnecting said processing nodes through a single multi-stage communication network, said network including a dual priority switch at each node for selectively operating in normal low priority mode and camp-on high priority mode;
storing at each said processing node a plurality of cache lines in one or more caches;
distributing to each of said processing nodes a cache coherency directory;
tracking in said cache coherency directory which of said one or more of said processing nodes have copies of each cache line; and
changing said shared memory according to a two hop process including in hop 1) a requesting node requests most recent data of a home node, and in hop 2) said home node immediately returns said most recent data from its shared memory to said requesting node, wherein changed data is stored immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed, wherein said shared memory always contains the most recent data.
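The dual priority switch behavior recited in claims 1 and 13 (normal low priority mode versus camp-on high priority mode) can be modeled as a simple arbiter: a low priority request that finds the output port busy is rejected and must be retried by the sender, while a camp-on request waits at the switch and takes the port the moment it frees. This toy model and its names are assumptions, not the patent's switch design.

```python
# Hypothetical model of a dual priority switch port. Low priority: reject on
# conflict (sender retries). Camp-on high priority: queue at the switch and
# win the port as soon as it is released.

class DualPrioritySwitch:
    def __init__(self):
        self.port_busy = False
        self.camped = []  # high priority requests camped on the busy port

    def request(self, msg, camp_on=False):
        if not self.port_busy:
            self.port_busy = True
            return "granted"
        if camp_on:
            self.camped.append(msg)  # camp-on: wait instead of rejecting
            return "camped"
        return "rejected"            # low priority: caller must retry

    def release(self):
        # Port freed: a camped high priority request wins it immediately,
        # so the port never goes idle while a camper is waiting.
        if self.camped:
            self.camped.pop(0)
            return "granted-from-camp"
        self.port_busy = False
        return "idle"

sw = DualPrioritySwitch()
print(sw.request("a"))                # granted
print(sw.request("b"))                # rejected
print(sw.request("c", camp_on=True))  # camped
print(sw.release())                   # granted-from-camp
```

Camping avoids the retry latency of the low priority path, which is why the claims reserve it for traffic where timely delivery matters.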
14. A method for operating a shared memory parallel processing system as a cache coherency system including a plurality of processing nodes, each said processing node including a unique section of shared memory which is not a cache, comprising the steps of:
interconnecting said processing nodes through a multi-stage communication network;
storing at each said processing node a plurality of cache lines in one or more caches;
distributing to each of said processing nodes a cache coherency directory;
tracking in said cache coherency directory which of said processing nodes have copies of each cache line; and
changing said shared memory according to a two hop process including in a first hop a requesting node requests most recent data of a home node and in a second hop said home node immediately returns said most recent data from its shared memory to said requesting node, wherein changed data is stored immediately to said unique section of shared memory without intervention from any said processing node regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed, wherein said shared memory always contains the most recent data.
15. A program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps for operating a shared memory parallel processing system including a plurality of processing nodes, each said processing node including a unique section of shared memory which is not a cache, said method steps comprising:
interconnecting said processing nodes through a single multi-stage communication network, said network including a dual priority switch at each node for selectively operating in normal low priority mode and camp-on high priority mode;
storing at each said processing node a plurality of cache lines in one or more caches;
tracking in a cache coherency directory which is distributed to each of said processing nodes which of one or more of said processing nodes have copies of each cache line; and
changing said unique section of shared memory according to a two hop process including in hop 1) a requesting node requests most recent data of a home node, and in hop 2) said home node immediately returns said most recent data from its shared memory to said requesting node, wherein changed data is stored immediately to shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed, wherein said shared memory always contains the most recent data.
16. A program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps for operating a shared memory parallel processing system including a plurality of processing nodes, each said processing node including a unique section of shared memory which is not a cache, said method steps comprising:
interconnecting said processing nodes through a multi-stage communication network;
storing at each said processing node a plurality of cache lines in one or more caches;
tracking in a cache coherency directory which is distributed to each of said processing nodes which of said processing nodes have copies of each cache line; and
changing said unique section of shared memory according to a two hop process including a requesting node requests most recent data of a home node and said home node immediately returns said most recent data from its shared memory to said requesting node, wherein changed data is stored immediately to shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed, wherein said shared memory always contains the most recent data.
17. An article of manufacture comprising:
a computer useable medium having computer readable program code means embodied therein for operating a shared memory parallel processing system including a plurality of processing nodes, each said processing node including a unique section of shared memory which is not a cache, the computer readable program means in said article of manufacture comprising:
computer readable program code means for causing a computer to effect interconnecting said processing nodes through a multi-stage communication network, said network including a dual priority switch at each node for selectively operating in normal low priority mode and camp-on high priority mode;
computer readable program code means for causing a computer to effect storing at each said processing node a plurality of cache lines in one or more caches;
computer readable program code means for causing a computer to effect tracking in a cache coherency directory which is distributed to each of said processing nodes which of said processing nodes have copies of each cache line; and
computer readable program code means for storing changed data immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed according to a two hop process including in hop 1) a requesting node requests most recent data of a home node, and in hop 2) said home node immediately returns said most recent data from its shared memory to said requesting node, such that said shared memory always contains the most recent data.
18. An article of manufacture comprising:
a computer useable medium having computer readable program code means embodied therein for operating a shared memory parallel processing system including a plurality of processing nodes, each said processing node including a unique section of shared memory which is not a cache, the computer readable program means in said article of manufacture comprising:
computer readable program code means for causing a computer to effect interconnecting said processing nodes through a multi-stage communication network;
computer readable program code means for causing a computer to effect storing at each said processing node a plurality of cache lines in one or more caches;
computer readable program code means for causing a computer to effect tracking in a cache coherency directory which is distributed to each of said processing nodes which of said processing nodes have copies of each cache line; and
computer readable program code means for executing a two stage process including a requesting node requests most recent data of a home node and said home node immediately returns said most recent data from its shared memory to said requesting node thus storing changed data immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed.
19. A computer program product or computer program element for operating a shared memory parallel processing system including a plurality of processing nodes, each said node including a unique section of shared memory which is not a cache, according to the steps of:
interconnecting said processing nodes through a single multi-stage communication network, said network including a dual priority switch at each node for selectively operating in normal low priority mode and camp-on high priority mode;
storing at each said processing node a plurality of cache lines in one or more caches;
distributing to each of said processing nodes a cache coherency directory;
tracking in said cache coherency directory which of said processing nodes have copies of each cache line; and
storing changed data immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed according to a two hop process including in hop 1) a requesting node requests most recent data of a home node, and in hop 2) said home node immediately returns said most recent data from its shared memory to said requesting node such that said shared memory always contains the most recent data.
20. A computer program product or computer program element for operating a shared memory parallel processing system including a plurality of processing nodes, each said node including a unique section of shared memory which is not a cache, according to the steps of:
interconnecting said processing nodes through a multi-stage communication network including a dual priority switch at each node for selectively operating in normal low priority mode and camp-on high priority mode;
storing at each said processing node a plurality of cache lines in one or more caches;
distributing to each of said processing nodes a cache coherency directory;
tracking in said cache coherency directory which of said processing nodes have copies of each cache line; and
storing changed data immediately to said unique section of shared memory regardless of which of said nodes is changing the data and which of said nodes includes the section of shared memory to be changed according to a two hop process including in a first hop a requesting node requests most recent data of a home node and in a second hop said home node immediately returns said most recent data from its shared memory to said requesting node, such that said shared memory always contains the most recent data.
Specification