Cache coherent network adapter for scalable shared memory processing systems
First Claim
1. A cache coherency controller for a distributed, scalable shared memory system, said system including a scalable plurality of nodes, comprising:
- shared memory distributed to the node memory at each node for storing a plurality of storage words at addressable memory locations in each of a plurality of cache lines;
said node memory subdivided into a first section for changeable data and a second section for unchangeable data;
a status bit associated with each of said storage words for defining whether the corresponding memory location contains changeable or constant data;
a distributed invalidation directory at each node associated with said first section for listing and tracking which nodes have copies of each cache line in said first section, said invalidation directory being expandable when necessary by using an overflow directory so as not to limit the number of nodes that can access each cache line;
a memory controller at each node for determining whether an address in shared memory to which access is being sought by a first thread is located in local memory or remote memory; and
if the access is remote, for signaling the node processor that a remote read is required for said first thread, enabling said node processor selectively to respond by switching program threads;
generating a read request message for the cache line containing a requested storage word to the remote node having the memory address being accessed;
receiving the requested cache line from said remote node;
storing the requested cache line to local cache; and
signaling the node processor that the requested data is available; and
if data is stored to a cache line which resides in said changeable portion of memory, for invalidating copies of said cache line stored at remote nodes.
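The read and store paths recited in this claim can be sketched in Python as follows. This is only an illustrative model, not the patented hardware: the `Node` class, the 128-byte line size, the use of the high address bits to name the home node (`NODE_SHIFT`), and the representation of the changeable section as a set of line addresses are all assumptions made for the example.

```python
from dataclasses import dataclass, field

LINE_BYTES = 128   # assumed cache-line size in bytes
NODE_SHIFT = 24    # assumed: address bits above bit 23 name the home node


def home_node(addr):
    """Node whose memory holds this address (assumed encoding)."""
    return addr >> NODE_SHIFT


def line_addr(addr):
    """Address of the cache line containing addr."""
    return addr & ~(LINE_BYTES - 1)


@dataclass
class Node:
    node_id: int
    memory: dict = field(default_factory=dict)      # line address -> data
    cache: dict = field(default_factory=dict)       # line address -> data
    changeable: set = field(default_factory=set)    # lines in the changeable section
    directory: dict = field(default_factory=dict)   # line address -> set of node IDs
    events: list = field(default_factory=list)      # signals to the node processor

    def read(self, addr, nodes):
        line = line_addr(addr)
        if home_node(addr) == self.node_id:          # local access
            return self.memory.get(line)
        if line in self.cache:                       # remote line already cached
            return self.cache[line]
        # Remote access: the processor is told and may switch threads.
        self.events.append("remote_read_required")
        data = nodes[home_node(addr)].serve_read(line, self.node_id)
        self.cache[line] = data                      # store the line to local cache
        self.events.append("data_available")
        return data

    def serve_read(self, line, requester):
        # Only changeable lines need their copies tracked.
        if line in self.changeable:
            self.directory.setdefault(line, set()).add(requester)
        return self.memory.get(line)

    def store_local(self, addr, data, nodes):
        line = line_addr(addr)
        self.memory[line] = data
        if line in self.changeable:
            # Invalidate every remote copy listed for this line.
            for remote in self.directory.pop(line, set()):
                nodes[remote].cache.pop(line, None)
```

In this sketch a read of a remote address records the two processor signals in `events`, and a later `store_local` at the home node removes the line from every cache listed in the directory.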
Abstract
A shared memory parallel processing system interconnected by a multi-stage network combines new system configuration techniques with special-purpose hardware to provide remote memory accesses across the network, while controlling cache coherency efficiently across the network. The system configuration techniques include a systematic method for partitioning and controlling the memory in relation to local versus remote accesses and changeable versus unchangeable data. Most of the special-purpose hardware is implemented in the memory controller and network adapter, which implements three send FIFOs and three receive FIFOs at each node to segregate and handle efficiently invalidate functions, remote stores, and remote accesses requiring cache coherency. The segregation of these three functions into different send and receive FIFOs greatly facilitates the cache coherency function over the network. In addition, the network itself is tailored to provide the best efficiency for remote accesses.
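The segregation of traffic into three send FIFOs and three receive FIFOs can be illustrated with a small Python sketch. The `Kind` and `Adapter` names are hypothetical, and the service order used by `next_to_send` (invalidates first, then remote stores, then remote accesses) is an assumption made for the example; the abstract does not state a priority.

```python
from collections import deque
from enum import Enum


class Kind(Enum):
    INVALIDATE = "invalidate"
    REMOTE_STORE = "remote_store"
    REMOTE_ACCESS = "remote_access"   # remote accesses requiring cache coherency


class Adapter:
    """Toy network adapter with one send and one receive FIFO per function."""

    def __init__(self):
        self.send = {kind: deque() for kind in Kind}
        self.recv = {kind: deque() for kind in Kind}

    def enqueue_send(self, kind, message):
        self.send[kind].append(message)

    def deliver(self, kind, message):
        # Called when a message of the given kind arrives from the network.
        self.recv[kind].append(message)

    def next_to_send(self):
        # Assumed priority: Enum definition order (invalidates first).
        for kind in Kind:
            if self.send[kind]:
                return kind, self.send[kind].popleft()
        return None
```

Because each function has its own queue, a backlog of remote accesses in this model never delays an invalidate that is queued behind it in time.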
132 Citations
7 Claims
-
2. A cache coherency controller for a processing node of a shared memory parallel processing system, said node including a node memory and a cache, said cache coherency controller comprising:
- an invalidation directory for storing a list of node identifier segments for nodes which have copied a cache line from said cache since the last time said cache line was changed;
an overflow directory for expanding said invalidation directory;
tracking means for adding to said invalidation directory or said overflow directory the node identifier segment of nodes copying each cache line;
invalidation means responsive to a change to a cache line for invalidating the copies of said changed cache line at all local and remote nodes listed in said invalidation directory or overflow directory for said changed cache line;
said tracking means being responsive (A) to said dedicated invalidation word not containing any previously invalid node identifier segments and having a valid extension address field pointing to an address in said overflow directory for (1) accessing a second invalidation word from said overflow directory which is solely dedicated to the same cache line of local node memory, (2) adding the node identifier segment of the node requesting the access as a valid node identifier segment to a node identifier segment that was previously invalid in said dedicated invalidation word, and (3) returning the modified and dedicated invalidation word to said overflow directory to the address defined by said extension address; and
(B) to said dedicated invalidation word not containing any previously invalid node identifier segments and having an invalid extension address field for (1) procuring a new extension address field, (2) storing said new extension address as a valid extension field to the first invalidation word, (3) stores said first invalidation word to the invalidation directory, (4) creating a second invalidation word which is initially all zeroes and which is solely dedicated to the same cache line of local memory, (5) adding the node ID number of the node requesting the access as a valid node identifier segment to a node identifier segment that was previously invalid in said dedicated invalidation word, and (6) returning the modified and dedicated invalidation word to said overflow directory to the address defined by said extension address.
Dependent claims: 3
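The two tracking cases, (A) a full invalidation word with a valid extension address and (B) a full word with an invalid one, can be sketched in Python as a chain of fixed-size words. This is an illustrative model only: the `InvalidationWord` and `Directory` names, the two-slot word size (`SLOTS`), the use of `None` for an invalid segment or extension, and the counter that hands out extension addresses are all assumptions, not details taken from the claim.

```python
SLOTS = 2   # assumed number of node identifier segments per invalidation word


class InvalidationWord:
    def __init__(self):
        self.ids = [None] * SLOTS   # None models an invalid node identifier segment
        self.ext = None             # None models an invalid extension address


class Directory:
    def __init__(self):
        self.primary = {}    # cache line address -> first invalidation word
        self.overflow = {}   # extension address -> overflow invalidation word
        self.next_ext = 0    # hypothetical allocator for extension addresses

    def track(self, line, node_id):
        """Record that node_id has copied the given cache line."""
        word = self.primary.setdefault(line, InvalidationWord())
        while True:
            if node_id in word.ids:
                return                                   # already listed
            if None in word.ids:
                word.ids[word.ids.index(None)] = node_id  # fill a free segment
                return
            if word.ext is None:
                # Case (B): word is full and has no extension; allocate one
                # and create an empty word dedicated to the same line.
                word.ext = self.next_ext
                self.next_ext += 1
                self.overflow[word.ext] = InvalidationWord()
            # Case (A): follow the valid extension address into the overflow.
            word = self.overflow[word.ext]

    def invalidate(self, line):
        """Return every node listed for the line and clear its words."""
        nodes = []
        word = self.primary.pop(line, None)
        while word is not None:
            nodes += [n for n in word.ids if n is not None]
            word = self.overflow.pop(word.ext, None) if word.ext is not None else None
        return nodes
```

With two segments per word, tracking five nodes for one line fills the first word, chains a second word through the overflow directory, and starts a third; `invalidate` then walks the chain and returns all five node IDs.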
-
-
4. A shared memory processing node, comprising:
- a network adapter for interfacing a communications network;
a node memory including a section of a shared memory;
a local processor;
at least one local cache accessible only by said local processor;
said local processor comprising means for writing data to said private cache while selectively writing said data to said section of shared memory or loading said data to said network adapter for updating a cache and section of shared memory at another processing node;
said network adapter further comprising;
an invalidation directory for storing a list of node identifier segments for nodes which have copied a cache line from said cache since the last time said cache line was changed;
a remote data storing means for storing a quantity of data equal to a cache line over said network to a remote memory at any of a plurality of remote nodes; and
cache coherency means responsive to said invalidation directory for maintaining coherency of said cache and caches at all of said plurality of remote nodes;
a local memory controller;
said processor being operable responsive to a request from a requesting node for accessing data at a memory address for transmitting the cache line address and a cache line of data to said local memory controller and said local cache with a command (1) to change the addressed cache line in said local cache and said local memory if said memory address addresses a location in that portion of shared memory at this local node; or (2) to change the addressed cache line in said local caches, the remote memory, and the remote caches of remote nodes if said memory address addresses a location in that portion of shared memory at a remote node;
invalidation command means for accessing said invalidation directory to invalidate all copies of said cache line stored to remote nodes upon detecting said cache line address is for storing data to said local memory;
a send FIFO;
a receive FIFO; and
store message processing means for generating store messages and controlling the operation of said send FIFO and said receive FIFO for selectively sending and receiving said store messages;
said store messages operable for changing a cache line of data in a remote node over said network and including a cache line of data words and a message header including destination node indicia equal to a sector segment of a cache line address and source node indicia equal the node ID number of the local node; and
memory address indicia equal to said memory address;
said store message processing means controlling the operation of said send FIFO for storing and forwarding said store message to said network adapter for transmission to said network and thence to the remote node selected by said destination node indicia, and thereafter deleting said store message;
said store message processing means for converting the message header of a store message received to said receive FIFO to the local memory address of a cache line of data to be changed, and delivering said memory address and message data to the caches and local memory of the local node for the purpose of updating the addressed cache line.
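The store message described in this claim, with a header carrying destination node, source node, and memory address, can be sketched in Python as follows. The field names, the `NODE_SHIFT` position of the sector segment within the address, and the helper functions are assumptions made for illustration; the claim does not specify bit widths or a message layout.

```python
from collections import deque
from dataclasses import dataclass

NODE_SHIFT = 24   # assumed: the sector segment is the address bits above bit 23


@dataclass(frozen=True)
class StoreMessage:
    dest: int      # destination node, taken from the sector segment of the address
    src: int       # node ID of the sending (local) node
    addr: int      # full memory address of the cache line to change
    data: tuple    # one cache line of data words


def make_store_message(local_id, addr, data):
    """Build a store message whose header is derived from the address."""
    return StoreMessage(dest=addr >> NODE_SHIFT, src=local_id,
                        addr=addr, data=tuple(data))


def forward_from_send_fifo(send_fifo, network):
    """Store-and-forward: hand the oldest message to the network, then drop it."""
    message = send_fifo.popleft()
    network.append(message)
    return message


def apply_store_message(message, memory, cache):
    """At the receiver, convert the header to a local address and update both."""
    local_addr = message.addr & ((1 << NODE_SHIFT) - 1)
    memory[local_addr] = message.data
    cache[local_addr] = message.data
    return local_addr
```

A sender would append the result of `make_store_message` to a `deque` acting as the send FIFO, call `forward_from_send_fifo`, and the destination node would call `apply_store_message` on each message taken from its receive FIFO.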
-
-
5. A shared memory processing node, comprising:
- a network adapter for interfacing a communications network;
a node memory including a section of a shared memory;
at least one local cache;
a local processor for writing data to said private cache while selectively writing said data to said section of shared memory or loading said data to said network adapter for updating a cache and section of shared memory at another processing node;
said network adapter further comprising;
a send FIFO;
receive FIFO;
an invalidation directory for storing a list of node identifier segments for nodes which have copied a cache line from said cache since the last time said cache line was changed;
store message processing means for generating store messages and controlling the operation of said send FIFO and said receive FIFO for selectively sending and receiving said store messages;
said store message processing means being responsive to a store message for a selected cache line from a given remote node and to said invalidation directory for storing said cache line to said local cache and node memory and for providing a cache line invalidation message to said network adapter for communication to all remote nodes other than said given remote node;
a time stamp register for providing a time value to said cache line invalidation message for communication to remote nodes; and
said store message processing means being responsive to a cache line invalidation message including a time value received from a remote node selectively for invalidating an addressed cache line in said local cache.
Dependent claims: 6, 7
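One way a time value could make invalidation selective is sketched below in Python. The claim says only that the receiving node invalidates "selectively" in response to a message carrying a time value; the specific rule used here (invalidate only if the local copy was filled before the invalidation was stamped) is an assumption made for the example, as are the `Invalidate` and `Cache` names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invalidate:
    line: int   # address of the cache line to invalidate
    time: int   # value read from the sender's time stamp register


class Cache:
    def __init__(self):
        self.lines = {}   # line address -> (data, time the copy was filled)

    def fill(self, line, data, time):
        self.lines[line] = (data, time)

    def on_invalidate(self, message):
        """Drop the line only if the local copy predates the invalidation.

        Assumed rule: a copy filled after the message was stamped already
        reflects the newer data, so the late-arriving message is ignored.
        """
        entry = self.lines.get(message.line)
        if entry is not None and entry[1] < message.time:
            del self.lines[message.line]
            return True
        return False
```

Under this assumed rule, an invalidation that is delayed in the network cannot discard a copy that was fetched after the change it announces.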
-
Specification