Conservation of distributed memory bandwidth in multiprocessor system with cache coherency by transmitting cancel subsequent to victim write

US 6,393,529 B1
Filed: 08/10/1999
Issued: 05/21/2002
Est. Priority Date: 12/21/1998
Status: Expired due to Term

Abstract

A messaging scheme that conserves system memory bandwidth and maintains cache coherency during a victim block write operation in a multiprocessing computer system is described. A source node having a dirty victim cache block—a modified cache block that is being written back to a corresponding system memory—sends a victim block command along with the dirty cache block data to the target processing node having associated therewith the corresponding system memory. The target node responds with a target done message sent to the source node and also initiates a memory write cycle to transfer the received cache block to the corresponding memory location. If the source node encounters an invalidating probe between the time it sent the victim block command and the time it received the target done response, the source node sends a memory cancel response to the target node. The memory cancel response helps maintain cache coherency within the system by causing the target node to abort further processing of the memory write cycle involving the victim block because the victim block may no longer contain the valid data. The memory cancel response may also conserve the system memory bandwidth by attempting to avoid relatively lengthy memory write cycles when the victim block may represent stale data. If the source node receives the target done response and if the victim block is still valid, the source node sends, instead, a source done message to the target node to indicate completion of the victim block transfer operation and to allow the target node to commit the victim block to the corresponding memory location.
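
The source node's choice between the two closing messages can be illustrated with a small sketch. This is not the patented implementation, only a minimal model under stated assumptions: the enum members and the function `source_node_reply` are hypothetical names, and the list `events` stands in for whatever messages the source node receives after it has sent the victim block command.

```python
from enum import Enum, auto


class Msg(Enum):
    """Message kinds named in the abstract (identifiers are illustrative)."""
    TARGET_DONE = auto()         # target -> source: victim block received
    INVALIDATING_PROBE = auto()  # reaches the source while the write-back is in flight
    MEM_CANCEL = auto()          # source -> target: abort the memory write
    SOURCE_DONE = auto()         # source -> target: commit the memory write


def source_node_reply(events):
    """Pick the source node's closing message for one victim write-back.

    `events` is the ordered sequence of messages the source node receives
    after sending the victim block command. Whichever of the two relevant
    messages arrives first decides the reply.
    """
    for event in events:
        if event is Msg.INVALIDATING_PROBE:
            # Probe arrived before the target done message: the victim
            # block may be stale, so cancel the target's memory write.
            return Msg.MEM_CANCEL
        if event is Msg.TARGET_DONE:
            # No probe intervened: let the target commit the write.
            return Msg.SOURCE_DONE
    return None  # neither message has arrived yet


probe_first = source_node_reply([Msg.INVALIDATING_PROBE, Msg.TARGET_DONE])
done_first = source_node_reply([Msg.TARGET_DONE, Msg.INVALIDATING_PROBE])
```

Under these assumptions `probe_first` is `Msg.MEM_CANCEL` and `done_first` is `Msg.SOURCE_DONE`; a probe that arrives only after the target done message does not change the outcome of this write-back.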

21 Claims

1. A multiprocessing computer system comprising:
- a plurality of processing nodes interconnected through an interconnect structure, wherein said plurality of processing nodes includes:
  
  a first processing node with a cache memory, wherein said first processing node is configured to identify a dirty cache line in said cache memory that is to be written into a designated memory location and to generate a first memory write operation to transfer said dirty cache line to said designated memory location; and
  
  a second processing node configured to receive said dirty cache line and to responsively initiate a second memory write operation to write said dirty cache line received from said first processing node into said designated memory location, wherein said second processing node is further configured to transmit a target done message to said first processing node upon receiving said dirty cache line, wherein said first processing node is configured to transmit a memory cancel response to said second processing node when said first processing node receives an invalidating probe prior to receiving said target done message, and wherein said memory cancel response causes said second processing node to abort further execution of said second memory write operation.
2. The multiprocessing computer system of claim 1, wherein said interconnect structure includes a first plurality of dual-unidirectional links.
3. The multiprocessing computer system as in claim 2, wherein each dual-unidirectional link in said first plurality of dual-unidirectional links interconnects a respective pair of processing nodes from said plurality of processing nodes.
4. The multiprocessing computer system according to claim 3, further comprising a plurality of I/O devices, wherein said interconnect structure further includes a second plurality of dual-unidirectional links, and wherein each of said plurality of I/O devices is coupled to a respective processing node through a corresponding one of said second plurality of dual-unidirectional links.
5. The multiprocessing computer system of claim 4, wherein each dual-unidirectional link in said first and said second plurality of dual-unidirectional links performs packetized information transfer and includes a pair of unidirectional buses comprising:
6. The multiprocessing computer system of claim 5, wherein each of said plurality of processing nodes includes:
- a plurality of circuit elements comprising:
  
  a processor core, a cache memory, a memory controller, a bus bridge, a graphics logic, a bus controller, and a peripheral device controller; and
  
  a plurality of interface ports, wherein each of said plurality of circuit elements is coupled to at least one of said plurality of interface ports.
7. The multiprocessing computer system according to claim 6, wherein at least one of said plurality of interface ports in said each of said plurality of processing nodes is connected to a corresponding dual-unidirectional link selected from the group consisting of said first and said second plurality of dual-unidirectional links.
8. The multiprocessing computer system of claim 1, further comprising:
- a plurality of system memories; and
  
  a plurality of memory buses, wherein each of said plurality of system memories is coupled to a corresponding one of said plurality of processing nodes through a respective one of said plurality of memory buses.
9. The multiprocessing computer system as in claim 8, wherein each of said plurality of memory buses is bidirectional.
10. The multiprocessing computer system according to claim 8, wherein a first memory from said plurality of system memories is coupled to said second processing node, and wherein said first memory includes said designated memory location.
11. The multiprocessing computer system according to claim 1, wherein said second processing node is configured to transmit said target done message concurrently with initiation of said second memory write operation.
12. The multiprocessing computer system of claim 1, wherein said target done message functions to inform said first processing node of reception of said dirty cache line by said second processing node.
13. The multiprocessing computer system as recited in claim 1, wherein said second processing node is configured to send said invalidating probe.
14. The multiprocessing computer system of claim 13, wherein said second processing node transmits said invalidating probe in response to a data transfer request from a third processing node in said plurality of processing nodes, and wherein said data transfer request is addressed to said designated memory location.
15. The multiprocessing computer system according to claim 14, wherein said data transfer request from said third processing node indicates an intent of said third processing node to modify data contained in said designated memory location.
16. The multiprocessing computer system as in claim 1, wherein said first processing node is configured to transmit a source done message to said second processing node when said first processing node receives said target done message from said second processing node prior to receiving said invalidating probe.
17. The multiprocessing computer system according to claim 16, wherein said source done message signifies completion of execution of said first memory write operation according to a predetermined data transfer protocol and allows said second processing node to respond to a subsequent data transfer request addressed to said designated memory location.

18. In a multiprocessing computer system comprising a plurality of processing nodes interconnected through an interconnect structure, wherein said plurality of processing nodes includes a first processing node, a second processing node, and a third processing node, a method for selectively writing a dirty cache line stored within said first processing node into a corresponding memory location in a memory associated with said second processing node, said method comprising:
- said first processing node transmitting a write command along with said dirty cache line to said second processing node;
  
  said second processing node transmitting a target done message to said first processing node upon receiving said dirty cache line;
  
  said second processing node initiating a memory write operation in response to said write command to write said dirty cache line into said corresponding memory location;
  
  said first processing node receiving an invalidating probe prior to receiving said target done message;
  
  said first processing node transmitting a memory cancel response to said second processing node upon receiving said invalidating probe;
  
  and said memory cancel response causing said second processing node to abort further processing of said memory write operation.
19. The method of claim 18, wherein said first processing node receiving said invalidating probe includes:
20. The method according to claim 19, wherein said data transfer request from said third processing node indicates an intent of said third processing node to modify data contained in said corresponding memory location.
21. The method as in claim 18, further comprising:
- said first processing node transmitting a source done message to said second processing node upon receiving said target done message prior to said invalidating probe, thereby allowing said memory write operation to be completed by said second processing node.
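
The second processing node's side of the method in claims 18 and 21 might be sketched as follows. This is an illustrative model only: the class `TargetNode`, its method names, and the dictionaries used for memory and for writes in flight are all assumptions made for the example, and an initiated memory write is modeled simply as a buffered entry that is either dropped or committed.

```python
class TargetNode:
    """Toy model of the node that owns the designated memory location."""

    def __init__(self):
        self.memory = {}    # address -> committed data
        self.pending = {}   # address -> data of a write that has been initiated
        self.outbox = []    # messages this node has sent

    def on_write_command(self, source, address, data):
        # Initiate the memory write and acknowledge receipt to the source.
        self.pending[address] = data
        self.outbox.append(("target_done", source, address))

    def on_memory_cancel(self, address):
        # Abort further processing of the write; memory is left untouched.
        self.pending.pop(address, None)

    def on_source_done(self, address):
        # The source saw no invalidating probe first, so complete the write.
        if address in self.pending:
            self.memory[address] = self.pending.pop(address)


node = TargetNode()

# Case 1: the source was probed first and cancels the write.
node.on_write_command("node0", 0x40, b"dirty")
node.on_memory_cancel(0x40)
after_cancel = dict(node.memory)

# Case 2: the source received target done first and confirms the write.
node.on_write_command("node0", 0x40, b"dirty")
node.on_source_done(0x40)
after_commit = dict(node.memory)
```

In this model the cancelled write never reaches `memory`, while the confirmed write does, which mirrors the abort and completion outcomes the claims describe.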

Current Assignee: Advanced Micro Devices, Inc.
Original Assignee: Advanced Micro Devices, Inc.
Inventors: Keller, James B.
Primary Examiner(s): Kim, Kenneth S.
Application Number: US09/370,970
Time in Patent Office: 1,015 Days
Field of Search: 709/216, 711/141, 711/143, 711/147, 712/28
US Class Current: 711/141
CPC Class Codes:
G06F 12/0808   with cache invalidating mea...
G06F 12/0813   with a network or matrix co...
G06F 12/0815   Cache consistency protocols
