System for handling coherence protocol races in a scalable shared memory system based on chip multiprocessing
Abstract
In a chip multiprocessor system, the coherence protocol is split into two cooperating protocols implemented by different hardware modules. One protocol is responsible for cache coherence management within the chip, and is implemented by a second-level cache controller. The other protocol is responsible for cache coherence management across chip multiprocessor nodes, and is implemented by separate cache coherence protocol engines. The cache controller and the protocol engine within each node communicate and synchronize memory transactions involving multiple nodes to maintain cache coherence within and across the nodes. The present invention addresses race conditions that arise during this communication and synchronization.
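The two-protocol split and the stall-until-instructions handshake described in the abstract can be sketched roughly as follows. This is an illustrative model only; every class, method, and state name here is invented, and the actual mechanism is hardware, not software:

```python
# Illustrative sketch: a per-node protocol engine (inter-node coherence)
# and a cache controller (intra-node coherence) exchange requests, and the
# engine stalls an internal request that collides with an in-flight
# external transaction by first asking the controller for instructions.

class CacheController:
    """Maintains coherence inside the node; owns the directory."""
    def __init__(self):
        self.directory = {}  # address -> directory state (invented encoding)

    def handle_external(self, address, new_state):
        # Process a request that arrived from another node and update
        # the directory to reflect the new state of the line.
        self.directory[address] = new_state
        return ("response", address)

    def instructions_for(self, address):
        # Asked by the protocol engine how to resolve a race: consult
        # the current directory state for the line.
        state = self.directory.get(address, "invalid")
        return ["abort"] if state == "exclusive-remote" else ["proceed"]

class ProtocolEngine:
    """Maintains coherence across nodes; tracks in-flight transactions."""
    def __init__(self, controller):
        self.controller = controller
        self.in_flight_external = set()  # lines with pending external work

    def send_external(self, address, new_state):
        self.in_flight_external.add(address)
        return self.controller.handle_external(address, new_state)

    def handle_internal(self, address):
        # Race: the internal request overlaps a pending external one,
        # so stall and fetch instructions before acting on it.
        if address in self.in_flight_external:
            return self.controller.instructions_for(address)
        return ["proceed"]
```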
41 Claims
1. A multiprocessor computer system comprising a plurality of nodes, each node from said plurality of nodes comprising:
an interface to a local memory subsystem, the local memory subsystem storing a multiplicity of memory lines of information and a directory;
a first memory cache for caching memory lines of information, said memory lines of information including memory lines of information stored in the local memory subsystem and memory lines of information stored in a remote memory subsystem that is local to another node;
a protocol engine configured to maintain cache coherence across the plurality of nodes;
a cache controller configured to maintain cache coherence within the node;
the protocol engine configured to transmit an external request concerning a memory line of information to the cache controller for processing and a response, the external request originating from another node;
the cache controller configured to transmit an internal request concerning the memory line of information to the protocol engine for processing and a response, the internal request originating from the first memory cache;
the protocol engine configured to process the transmitted internal request, if a memory transaction corresponding to the transmitted internal request and a memory transaction corresponding to the transmitted external request overlap, by sending an instruction request to the cache controller for a set of one or more instructions concerning the transmitted internal request; and
stalling action on the transmitted internal request until after the set of one or more instructions is received.
2. The system of claim 1, wherein
the protocol engine includes a first memory transaction array, said first memory transaction array comprising one or more entries corresponding to zero or more internal requests and zero or more external requests.
3. The system of claim 2, wherein
each of said one or more entries includes an identifier of a memory line of information.
4. The system of claim 3, wherein
the identifier is the physical address of the memory line of information.
5. The system of claim 2, wherein
the protocol engine is configured to add an entry associated with the transmitted external request to the first memory transaction array.
6. The system of claim 5, wherein
the protocol engine is configured to add an entry associated with the transmitted internal request to the first memory transaction array upon receiving the transmitted internal request.
7. The system of claim 6, wherein
the protocol engine is configured to scan the first memory transaction array for the entry associated with the transmitted external request upon receiving the transmitted internal request; and
confirm a match between an identifier of the memory line of information included in the entry associated with the transmitted external request and the identifier of the memory line of information included in the entry associated with the transmitted internal request.
8. The system of claim 7, wherein
the protocol engine is configured to modify the entry associated with the transmitted internal request to reflect the match, said match establishing that the memory transaction corresponding to the transmitted internal request and the memory transaction corresponding to the transmitted external request overlap.
9. The system of claim 8, wherein
the protocol engine is configured to modify the entry associated with the transmitted internal request to require said protocol engine to execute the set of one or more instructions before taking further action on said transmitted internal request, whereby action on said transmitted internal request is stalled.
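The transaction-array behavior of claims 2 through 9 (entries keyed by physical address, a scan on arrival of an internal request, and a stall flag set on a match) might be modeled as below; the field names and helper functions are invented:

```python
# Illustrative sketch of the memory transaction array of claims 2-9.
# All field and function names are invented for clarity.

def add_entry(array, address, kind):
    """Append an entry; 'kind' is 'internal' or 'external'."""
    entry = {"address": address, "kind": kind, "needs_instructions": False}
    array.append(entry)
    return entry

def note_overlap(array, internal_entry):
    """Scan for an external entry on the same physical address and, if
    found, stall the internal request behind the instruction fetch."""
    for entry in array:
        if entry["kind"] == "external" and entry["address"] == internal_entry["address"]:
            internal_entry["needs_instructions"] = True  # stall until instructions arrive
            return True
    return False
```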
10. The system of claim 1, wherein
the protocol engine is configured to scan the transmitted internal request to determine whether said transmitted internal request indicates that the memory transaction corresponding to the transmitted internal request and the memory transaction corresponding to the transmitted external request overlap.
11. The system of claim 1, wherein
each node from the plurality of nodes further comprises an output buffer, said output buffer configured to receive the transmitted internal request and a response to the transmitted external request from the cache controller and forward said transmitted internal request and said response to the protocol engine.
12. The system of claim 11, wherein
the output buffer is configured to determine whether the memory transaction corresponding to the transmitted external request and the memory transaction corresponding to the transmitted internal request overlap.
13. The system of claim 11, wherein
the output buffer is configured to modify the transmitted internal request to indicate that the memory transaction corresponding to the transmitted external request and the memory transaction corresponding to the transmitted internal request overlap if the output buffer receives the transmitted internal request from the cache controller before the response but forwards said response to the protocol engine before the transmitted internal request.
14. The system of claim 11, wherein
the output buffer includes a high priority lane, a low priority lane, and a comparator;
the high priority lane configured to store a plurality of high priority messages received from the cache controller, said plurality of high priority messages including a response to the transmitted external request;
the low priority lane configured to store a plurality of low priority messages received from the cache controller, said plurality of low priority messages including the transmitted internal request; and
the comparator configured to determine whether the response, when selected for transmittal from the high priority lane, matches the transmitted internal request, said comparator further configured to modify said transmitted internal request to indicate that the memory transaction corresponding to the transmitted internal request and the memory transaction corresponding to the transmitted external request overlap.
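The two-lane output buffer and comparator of claim 14 can be sketched as follows; lane depths, the message layout, and the `overlap` flag name are assumptions, not taken from the claims:

```python
# Illustrative model of the claim-14 output buffer: two ordered lanes plus
# a comparator that flags a queued internal request as overlapping when a
# matching high-priority response is selected for forwarding first.
from collections import deque

class OutputBuffer:
    def __init__(self):
        self.high = deque()  # high priority staging (responses to external requests)
        self.low = deque()   # low priority staging (internal requests)

    def push_high(self, message):
        self.high.append(message)          # message: {"address": ...}

    def push_low(self, message):
        message.setdefault("overlap", False)
        self.low.append(message)

    def forward(self):
        """Select the next message for the protocol engine; high priority
        wins (claim 19). The comparator marks matching queued low-priority
        messages on the way out (claims 13, 20)."""
        if self.high:
            response = self.high.popleft()
            for request in self.low:       # comparator
                if request["address"] == response["address"]:
                    request["overlap"] = True   # set the overlap indication
            return response
        if self.low:
            return self.low.popleft()
        return None
```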
15. The system of claim 14, wherein
the low priority lane comprises a series of low priority staging buffers, said plurality of low priority messages being individually stored in the series of low priority staging buffers and selected for transmittal in the order received from the cache controller.
16. The system of claim 14, wherein
the high priority lane comprises a series of high priority staging buffers, said plurality of high priority messages being individually stored in the series of high priority staging buffers and selected for transmittal in the order received from the cache controller.
17. The system of claim 14, wherein
the comparator is configured to compare an identifier of a memory line of information included in a high priority message from the plurality of high priority messages to an identifier of a memory line of information included in a low priority message included in the plurality of low priority messages.
18. The system of claim 17, wherein
the identifier of the memory line of information included in the high priority message and the identifier of the memory line of information included in the low priority message included in the plurality of low priority messages are each a physical memory address of the respective memory lines of information.
19. The system of claim 14, wherein
said output buffer is configured to select an available high priority message from the high priority lane over an available low priority message from the low priority lane for forwarding to the protocol engine.
20. The system of claim 14, wherein
the output buffer is configured to modify the transmitted internal request to indicate that the memory transaction corresponding to said transmitted internal request and the memory transaction corresponding to the transmitted external request overlap by setting one or more bits of said transmitted internal request.
21. The system of claim 14, wherein
the output buffer is an integrated element of the cache controller.
22. The system of claim 21, wherein
the transmitted internal request originates from the first memory cache.
23. The system of claim 11, wherein
each node from the plurality of nodes further comprises an input buffer, said input buffer configured to receive the transmitted internal request and the response to the transmitted external request from the output buffer, said protocol engine configured to access said input buffer to process said transmitted internal request and to process said response to the transmitted external request.
24. The system of claim 23, wherein
the protocol engine is configured to mark the transmitted internal request as stale to indicate that the memory transaction corresponding to the transmitted internal request and the memory transaction corresponding to the response to the transmitted external request overlap if the protocol engine extracts the response to the transmitted external request from the input buffer before the transmitted internal request when the transmitted internal request is received by the input buffer before the response to the transmitted external request.
25. The system of claim 23, wherein
the input buffer comprises a set of high priority buffers and a set of low priority buffers;
the set of high priority buffers configured to store a plurality of high priority messages received from the output buffer, said plurality of high priority messages including the response to the transmitted external request; and
the set of low priority buffers configured to store a plurality of low priority messages received from the output buffer, said plurality of low priority messages including the transmitted internal request.
26. The system of claim 25, wherein
the protocol engine compares an identifier of a memory line of information included in a high priority message stored in said input buffer to an identifier of a memory line of information included in a low priority message stored in said input buffer upon extracting said high priority message from said input buffer to determine if a memory transaction corresponding to the high priority message and a memory transaction corresponding to the low priority message overlap.
27. The system of claim 26, wherein
the identifier comprises a plurality of bits; and
the match is limited to a subset of the plurality of bits.
28. The system of claim 25, wherein
the protocol engine is configured to select a high priority message from the set of high priority buffers over a low priority message from the set of low priority buffers when extracting a message from said input buffer.
29. The system of claim 25, wherein
the protocol engine is configured to modify a low priority buffer from the set of low priority buffers that stores the transmitted internal request to indicate that the memory transaction corresponding to the response to the transmitted external request and the memory transaction corresponding to the transmitted internal request overlap if the protocol engine extracts the response to the transmitted external request from the input buffer before the transmitted internal request when the transmitted internal request is received by the input buffer before the response to the transmitted external request.
30. The system of claim 23, wherein
the input buffer is an integrated element of the protocol engine.
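The input-buffer extraction order of claims 24 through 29, including the partial-address match of claim 27, might look like the following illustrative model; the 12-bit mask width and the `stale` field name are invented:

```python
# Illustrative sketch of the claim-23/29 input buffer on the protocol
# engine side: high priority messages are extracted first, and any queued
# low priority (internal) request on a matching line is marked stale.
# Per claim 27 the match may use only a subset of the address bits; the
# mask width here is a hypothetical choice.
from collections import deque

MATCH_MASK = 0xFFF   # hypothetical: compare only the low 12 address bits

class InputBuffer:
    def __init__(self):
        self.high = deque()  # responses to external requests
        self.low = deque()   # internal requests

def extract(buffer):
    """Pop the next message, preferring high priority (claim 28), and
    mark overlapping low priority requests as stale (claims 24, 29)."""
    if buffer.high:
        response = buffer.high.popleft()
        for request in buffer.low:
            if (request["address"] & MATCH_MASK) == (response["address"] & MATCH_MASK):
                request["stale"] = True
        return response
    return buffer.low.popleft() if buffer.low else None
```

A subset match can report false positives (two lines sharing the masked bits), which is safe here: a spuriously stalled request merely fetches instructions it did not strictly need.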
31. The system of claim 1, wherein
the memory transaction corresponding to the transmitted internal request and the memory transaction corresponding to the transmitted external request overlap if the transmitted internal request is received before a response to the transmitted external request.
32. The system of claim 1, wherein
the cache controller is configured to update the directory upon processing the transmitted external request, said directory subsequently reflecting a state of the memory line of information consistent with the transmitted external request.
33. The system of claim 1, wherein
the cache controller is configured to respond to the instruction request by determining a consistency of the transmitted internal request with a state of the memory line of information, said state of said memory line of information stored in the directory, said consistency guiding the selection of the set of one or more instructions.
34. The system of claim 33, wherein
the transmitted internal request is not consistent with the state of the memory line of information if the transmitted internal request is for a shared or exclusive copy of the memory line of information and said state of said memory line of information indicates that the memory line of information is not exclusively owned or shared by another node, said set of one or more instructions directing the protocol engine to abort the transmitted internal request.
35. The system of claim 33, wherein
the transmitted internal request is not consistent with the state of the memory line of information if the transmitted internal request is for exclusive ownership of the memory line of information and said state of said memory line of information indicates that the memory line of information is exclusively owned by another node, said set of one or more instructions directing the protocol engine to abort the transmitted internal request.
36. The system of claim 33, wherein
the transmitted internal request is not consistent with the state of the memory line of information if the state of the memory line of information indicates that a different set of one or more nodes may be sharing or exclusively owning the memory line of information than when the cache controller transmitted the transmitted internal request, said set of one or more instructions including up-to-date sharing information extracted from said state of said memory line of information and directing the protocol engine to execute the transmitted internal request with reference to said up-to-date sharing information.
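The consistency checks of claims 33 through 36 might be modeled as a single selection function; the directory-state encoding, request-type strings, and instruction names here are all invented:

```python
# Illustrative sketch of the claim-33 consistency check: the cache
# controller compares the stalled internal request against the current
# directory state and selects instructions for the protocol engine.

def select_instructions(request_type, directory_state, expected_sharers):
    """Pick instructions based on whether the internal request is still
    consistent with the directory; names and encodings are invented."""
    if request_type == "get-exclusive" and directory_state["owner"] not in (None, "self"):
        # Line became exclusively owned elsewhere: abort (claim 35).
        return {"action": "abort"}
    if directory_state["sharers"] != expected_sharers:
        # Sharing set changed since the request was sent: proceed with
        # up-to-date sharing information (claim 36).
        return {"action": "proceed", "sharers": directory_state["sharers"]}
    return {"action": "proceed", "sharers": expected_sharers}
```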
37. The system of claim 1, wherein
the cache controller is configured to not update the directory with respect to the transmitted internal request until after receiving from the protocol engine a response to the transmitted internal request.
38. The system of claim 1, wherein
the protocol engine is configured to defer action on an additional external request concerning the memory line of information until after the set of one or more instructions is executed.
39. The system of claim 38, wherein
the protocol engine is configured to add an entry to a first memory transaction array concerning the additional external request, said first memory transaction array comprising one or more entries corresponding to zero or more internal requests and zero or more external requests.
40. The system of claim 38, wherein
the protocol engine is configured to scan the first memory transaction array for the entry associated with the transmitted internal request; and
confirm a match between an identifier of the memory line of information included in the entry associated with the transmitted internal request and the identifier of the memory line of information included in the entry associated with the additional external request.
41. The system of claim 40, wherein
deferring action on the additional external request includes modifying the entry associated with the additional external request to indicate the set of one or more instructions must be executed before continuing progress on the additional external request, whereby the additional external request is stalled.
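The deferral of additional external requests in claims 38 through 41 can be sketched as a per-line queue that is released once the pending instruction set has executed; all names below are invented:

```python
# Illustrative sketch of claims 38-41: additional external requests for a
# line are deferred behind an outstanding instruction set and released
# when that set executes.

class DeferredWork:
    def __init__(self):
        self.pending_instructions = {}  # address -> instructions not yet executed
        self.deferred_external = {}     # address -> queued external requests

    def defer_if_needed(self, address, external_request):
        """Queue the request if instructions for this line are outstanding."""
        if address in self.pending_instructions:
            self.deferred_external.setdefault(address, []).append(external_request)
            return True   # stalled (claim 41)
        return False      # may proceed immediately

    def instructions_executed(self, address):
        """Release any requests deferred behind the instruction set."""
        self.pending_instructions.pop(address, None)
        return self.deferred_external.pop(address, [])
```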
Specification