FAULT-TOLERANT CACHE COHERENCE OVER A LOSSY NETWORK
Abstract
A cache coherence system manages both internode and intranode cache coherence in a cluster of nodes. Each node in the cluster is either a single processor or a collection of processors that run an intranode coherence protocol among themselves. A node comprises a plurality of coherence ordering units (COUs), which are hardware circuits configured to manage intranode coherence of caches within the node and/or internode coherence with caches on other nodes in the cluster. Each node contains one or more directories that track the state of the cache line entries managed by that node, and may also contain one or more scoreboards for managing the status of ongoing transactions. The internode cache coherence protocol implemented in the COUs may be used to detect and resolve communication errors, such as message packets dropped between nodes, late message delivery at a node, or node failure. In addition, a transport layer manages communication between the nodes in the cluster and can also be used to detect and resolve communication errors.
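The directory role described above can be pictured with a small sketch. It assumes a textbook MSI-style directory (states Invalid, Shared, and Modified, with a set of sharer nodes and at most one owner per cache line); the abstract does not specify how directory state is encoded, and the names `LineState`, `DirectoryEntry`, and `Directory` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LineState(Enum):
    # Hypothetical MSI-style encoding; the source does not define directory states.
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


@dataclass
class DirectoryEntry:
    state: LineState = LineState.INVALID
    sharers: set[int] = field(default_factory=set)  # node ids holding a read-only copy
    owner: Optional[int] = None                      # node id holding the modified copy


class Directory:
    """Tracks, per cache-line address, which nodes hold the line and in what state."""

    def __init__(self) -> None:
        self.entries: dict[int, DirectoryEntry] = {}

    def _entry(self, line: int) -> DirectoryEntry:
        return self.entries.setdefault(line, DirectoryEntry())

    def read(self, line: int, node: int) -> None:
        """Record that `node` obtained a shared copy of `line`."""
        e = self._entry(line)
        if e.state is LineState.MODIFIED and e.owner is not None:
            # The previous owner's dirty copy is assumed written back; it keeps a shared copy.
            e.sharers = {e.owner}
            e.owner = None
        e.sharers.add(node)
        e.state = LineState.SHARED

    def write(self, line: int, node: int) -> set[int]:
        """Grant `node` exclusive ownership; return the nodes whose copies must be invalidated."""
        e = self._entry(line)
        holders = e.sharers | ({e.owner} if e.owner is not None else set())
        e.sharers = set()
        e.owner = node
        e.state = LineState.MODIFIED
        return holders - {node}
```

For example, after nodes 1 and 2 read line 0x40, a write by node 3 returns `{1, 2}` as the copies to invalidate and leaves the line Modified with node 3 as owner.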
18 Claims
1. A method, comprising:
sending, by a first hardware unit of a first plurality of hardware units on a requester node of a cluster of nodes, a first request for data to a home node of the cluster of nodes, wherein the first request comprises a first sequence number;
wherein each node of the cluster of nodes comprises a plurality of hardware units wherein each hardware unit of the plurality of hardware units is coupled to a particular memory and a particular cache and each particular hardware unit of the plurality of hardware units is configured as a cache controller of the particular memory and the particular cache;
determining, by the first hardware unit, that no data message has been received by the first hardware unit from the home node in a specified time out period for the first request for data;
based on the determining that no data message has been received, sending, by the first hardware unit on the requester node, a second request for data to the home node, wherein the second request comprises a second sequence number;
receiving, by the first hardware unit on the requester node, a first data message containing the requested data from the home node; and
sending, by the first hardware unit on the requester node, an acknowledgement message to the home node.
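The requester-side sequence recited in claim 1 (request, time out, re-request with a new sequence number, receive data, acknowledge) can be sketched as follows. This is a minimal, synchronous model, not the patented hardware: a `None` reply stands in for "no data message within the time-out period", incrementing the sequence number is one assumed way to obtain the second sequence number, and accepting only replies tagged with the current sequence number is an assumption. `DataMessage`, `LossyHomeNode`, `fetch_with_retry`, `drop_seqs`, and `max_attempts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataMessage:
    seq: int        # sequence number of the request this message answers
    payload: bytes  # the requested data


class LossyHomeNode:
    """Hypothetical home node behind a link that loses some messages.

    Losses are scripted through `drop_seqs` so the run is deterministic.
    """

    def __init__(self, data: bytes, drop_seqs: set[int]) -> None:
        self.data = data
        self.drop_seqs = drop_seqs
        self.acks: list[int] = []

    def request(self, seq: int) -> Optional[DataMessage]:
        # Returning None stands in for "no data message within the time-out period".
        if seq in self.drop_seqs:
            return None
        return DataMessage(seq=seq, payload=self.data)

    def acknowledge(self, seq: int) -> None:
        self.acks.append(seq)


def fetch_with_retry(home: LossyHomeNode, first_seq: int = 1, max_attempts: int = 4) -> bytes:
    """Requester side: request, retry on time-out with a new sequence number, then acknowledge."""
    seq = first_seq
    for _ in range(max_attempts):
        reply = home.request(seq)
        # Accept only a reply tagged with the current sequence number (assumption).
        if reply is not None and reply.seq == seq:
            home.acknowledge(seq)
            return reply.payload
        seq += 1  # time-out: the next request carries a fresh sequence number
    raise TimeoutError(f"no data message after {max_attempts} attempts")
```

With `drop_seqs={1}`, the first request is lost, the retry with sequence number 2 succeeds, and the home node records a single acknowledgement for sequence number 2. Bounding the retries with `max_attempts` is a design choice of this sketch; the claim itself does not limit the number of retries.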
7. A first node for use in a distributed computing system, the first node comprising:
one or more hardware units, wherein each hardware unit of the one or more hardware units comprises one or more processors, registers, content-addressable memories, and/or other computer-implemented hardware circuitry;
wherein each hardware unit of the one or more hardware units is coupled to a particular memory and a particular cache and each particular hardware unit of the one or more hardware units is configured as a cache controller of the particular memory and the particular cache;
wherein a first hardware unit, of the one or more hardware units, is configured to:
send a first request for data to a second node, wherein the first request comprises a first sequence number;
determine that no data message has been received by the first hardware unit from the second node in a specified time out period for the first request for data;
based on the determination that no data message has been received, send a second request for data to the second node, wherein the second request comprises a second sequence number;
receive a first data message containing the requested data from the second node; and
send an acknowledgement message to the second node.
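The claims recite the requester's side of the exchange; the second node's side is not recited. One plausible counterpart, sketched here under stated assumptions, is a home-node scoreboard (the abstract mentions scoreboards for managing the status of ongoing transactions) that keeps one open entry per requester and cache line, answers each request with a data message tagged with that request's sequence number, drops requests older than the newest one seen, and retires the entry when a matching acknowledgement arrives. That the acknowledgement carries the sequence number and line address is an assumption, and `HomeScoreboard`, `Transaction`, `on_request`, and `on_ack` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    seq: int     # newest sequence number seen from this requester for this line
    status: str  # "data_sent" until the acknowledgement arrives


class HomeScoreboard:
    """Hypothetical home-node scoreboard keyed by (requester, line address)."""

    def __init__(self, memory: dict[int, bytes]) -> None:
        self.memory = memory
        self.open: dict[tuple[int, int], Transaction] = {}

    def on_request(self, requester: int, line: int, seq: int) -> Optional[tuple[int, bytes]]:
        """Return a (seq, data) message, or None if the request is stale."""
        key = (requester, line)
        txn = self.open.get(key)
        if txn is not None and seq < txn.seq:
            return None  # late, superseded request: drop it (assumption)
        # A retry (newer sequence number) replaces the open entry and gets fresh data.
        self.open[key] = Transaction(seq=seq, status="data_sent")
        return seq, self.memory[line]

    def on_ack(self, requester: int, line: int, seq: int) -> bool:
        """Retire the transaction if the acknowledgement matches the open entry."""
        key = (requester, line)
        txn = self.open.get(key)
        if txn is None or txn.seq != seq:
            return False  # stale or unknown acknowledgement (assumption)
        del self.open[key]
        return True
```

In this sketch, if requester 7's first request (sequence number 1) is answered but the data is lost, its retry (sequence number 2) is answered again, and the original request delivered late afterwards is dropped. Once an entry is retired, a duplicate arriving even later would open a fresh transaction; this sketch does not guard against that case.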
13. One or more non-transitory computer-readable storage media storing instructions, which when executed by one or more processors, cause:
sending, by a first hardware unit of a first plurality of hardware units on a requester node of a cluster of nodes, a first request for data to a home node of the cluster of nodes, wherein the first request comprises a first sequence number;
wherein each node of the cluster of nodes comprises a plurality of hardware units wherein each hardware unit of the plurality of hardware units is coupled to a particular memory and a particular cache and each particular hardware unit of the plurality of hardware units is configured as a cache controller of the particular memory and the particular cache;
determining, by the first hardware unit, that no data message has been received by the first hardware unit from the home node in a specified time out period for the first request for data;
based on the determining that no data message has been received, sending, by the first hardware unit on the requester node, a second request for data to the home node, wherein the second request comprises a second sequence number;
receiving, by the first hardware unit on the requester node, a first data message containing the requested data from the home node; and
sending, by the first hardware unit on the requester node, an acknowledgement message to the home node.
Specification