Directory-based, shared-memory, scaleable multiprocessor computer system having deadlock-free transaction flow sans flow control protocol
Abstract
A method and apparatus are provided which eliminate the need for an active traffic-flow-control protocol to manage request-transaction flow between the nodes of a directory-based, scalable, shared-memory, multi-processor computer system. This is accomplished by determining the maximum number of requests that any node can receive at any given time, providing an input buffer at each node that can store at least that maximum number of requests, and transferring stored requests from the buffer as the node completes requests in process and is able to process additional incoming requests. Because each node may have only a certain finite number of pending requests, that number is the maximum number of requests that can be received by a node acting in slave capacity from any other node acting in requester capacity. In addition, each node may also issue requests that must be processed within that node, so the input buffer must be sized to accommodate not only external requests, but internal ones as well. Thus, the buffer must be able to store at least the maximum number of transaction requests that may be pending at any node, multiplied by the number of nodes present in the system.
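The sizing argument in the abstract can be sketched in code. The following is a minimal illustrative model, not the patented implementation; the class and function names are ours. It shows why a buffer of n × y slots at a receiving agent can never overflow when every one of the n nodes is limited to y outstanding requests, so no negative-acknowledge/retry or credit-based flow control is needed:

```python
# Illustrative sketch (hypothetical names): with each of n nodes limited to
# y outstanding requests, an input buffer of n * y slots at any node can
# always accept an arriving request without a flow-control protocol.

from collections import deque

def input_buffer_slots(n_nodes: int, y_max_outstanding: int) -> int:
    """Worst case: every node (including the receiver itself) directs all
    of its y outstanding requests at the same agent."""
    return n_nodes * y_max_outstanding

class HomeAgent:
    def __init__(self, n_nodes: int, y: int):
        self.capacity = input_buffer_slots(n_nodes, y)
        self.buffer = deque()

    def receive(self, request):
        # Cannot overflow: all senders together hold at most `capacity`
        # outstanding requests, so a free slot always exists.
        assert len(self.buffer) < self.capacity, "overflow impossible by sizing"
        self.buffer.append(request)

    def complete_one(self):
        # Completing a request frees a slot; only then may the requester
        # reuse that outstanding-request credit.
        return self.buffer.popleft() if self.buffer else None

# Worst-case burst: 4 nodes, each with at most 2 outstanding requests,
# all aimed at one node's home agent.
agent = HomeAgent(n_nodes=4, y=2)
for node in range(4):
    for r in range(2):
        agent.receive((node, r))
print(len(agent.buffer))  # 8 == 4 * 2: buffer exactly full, never overflowed
```

The key design point modeled here is that the bound on outstanding requests per requester, combined with the n × y buffer, makes overflow structurally impossible rather than merely unlikely.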
17 Claims
1. A multi-processor computer system comprising:
a global interconnect;
a plurality of n nodes, each node having:
a local interconnect;
at least one processor, said processor being coupled to the local interconnect;
a cache associated with each processor;
a main memory coupled to the local interconnect, said main memory being equally accessible to all processors within its respective node;
a global interface which couples the global interconnect to the local interconnect of its respective node, said global interface including a transaction filter, a tag memory, a home agent, a slave agent, and a request agent, wherein said transaction filter routes cache coherency transactions from said local interconnect through a local physical address-to-global address translator to said request agent, said transaction filter routes input/output transactions from said local interconnect through an I/O input queue to said request agent, and said tag memory stores a permission status entry for each of said routed cache coherency transactions and said routed input/output transactions; and
at least one input buffer associated with each home agent and each slave agent and forming a portion of said global interface, each input buffer associated with said each home agent and said each slave agent of each global interface of each of the plurality of n nodes sized to contain a number of storage locations corresponding to at least a maximum number of outstanding transaction requests receivable at each node of the plurality of nodes, the maximum number of outstanding transaction requests being the outstanding transaction requests together issuable by all of said plurality of n nodes.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
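The routing behavior of the claimed transaction filter can be illustrated with a small sketch. This is a hypothetical model under our own names (`GlobalInterface`, `lpa_to_ga`, etc.), not the patented circuit: cache coherency transactions pass through a local physical-address-to-global-address translator, I/O transactions pass through an I/O input queue, both terminate at the request agent, and the tag memory records a permission-status entry for every routed transaction:

```python
# Hypothetical sketch of the claim-1 routing; all names are ours.
from collections import deque

class GlobalInterface:
    def __init__(self, node_id: int, n_nodes: int):
        self.node_id = node_id
        self.n_nodes = n_nodes
        self.io_queue = deque()            # I/O input queue
        self.tag_memory = {}               # transaction id -> permission status
        self.request_agent_inbox = deque() # requests handed to the request agent

    def lpa_to_ga(self, local_pa: int) -> int:
        # Toy translator: fold the node id into the upper address bits.
        return (self.node_id << 40) | local_pa

    def filter_transaction(self, txn_id: int, kind: str, local_pa: int):
        if kind == "coherency":
            # Coherency path: translate the local physical address to a
            # global address, then hand off to the request agent.
            ga = self.lpa_to_ga(local_pa)
            self.request_agent_inbox.append((txn_id, "coherency", ga))
        elif kind == "io":
            # I/O path: stage the transaction in the I/O input queue on its
            # way to the request agent (modeled as an immediate pass-through).
            self.io_queue.append((txn_id, "io", local_pa))
            self.request_agent_inbox.append(self.io_queue.popleft())
        else:
            raise ValueError(f"unknown transaction kind: {kind}")
        # Tag memory keeps a permission-status entry for every routed txn.
        self.tag_memory[txn_id] = "pending"
```

Note that only coherency transactions are address-translated in this sketch; I/O transactions keep their local address, mirroring the two distinct routes named in the claim.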
11. In a multi-processor computer system having multiple nodes, each node having a block of main memory and multiple microprocessors, and each node having a global interface which incorporates a home agent, a slave agent and a request agent, a method for providing the orderly flow of memory-request and request-compliance traffic between nodes without resorting to a complex flow control protocol, said method comprising the steps of:
identifying a number y, which represents the maximum number of incomplete transaction requests that any single node may have outstanding, the number y limited to a certain, determinable finite number;
multiplying the number y by the number n, which represents the number of nodes within the computer system;
providing temporary storage at a buffer of the global interface for at least a number ny of requests at the home agent of each node so that pending requests received by that home agent may be stored until it is able to process them;
processing the requests stored at the temporary storage, provided during said step of providing, at the microprocessor;
maintaining a status indicator at each node for each received request once processing of that request begins, indicating whether processing of the request is complete or still pending;
transferring stored requests as the requests stored during said step of providing are processed;
receiving cache coherency transactions and input/output transactions from the multiple microprocessors;
routing said cache coherency transactions through a local physical address-to-global address translator to the request agent;
routing said input/output transactions through an I/O input queue to the request agent; and
storing a permission status entry for each of said routed cache coherency transactions and said routed input/output transactions.
- View Dependent Claims (12, 13, 14, 15, 16, 17)