Relaxed memory consistency model

US 8,307,194 B1
Filed: 08/18/2003
Issued: 11/06/2012
Est. Priority Date: 08/18/2003
Status: Active Grant

Abstract

A method and apparatus provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also described is a global synchronization (Gsync) instruction that provides synchronization even outside a single MSP system. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is there any need to refrain from determining addresses for subsequent memory accesses and inserting them into the pipeline.

16 Claims

1. A computer processing method comprising:
- providing a computer system having a shared memory and a multistream processor (MSP), wherein the MSP includes a plurality of single stream processors (SSPs) including a first SSP, each one of the plurality of SSPs having a scalar section and one or more vector sections, wherein each of the plurality of SSPs is operatively coupled to the memory;
  
  defining program order between operations on the first SSP;
  
  defining operation dependence order of vector memory references to the memory with respect to each other and with respect to scalar memory references to the memory;
  
  maintaining a minimal guarantee on the ordering using an active list located in the scalar section, wherein maintaining includes:
  
  placing each instruction in order in the active list, wherein placing includes initializing each instruction to a speculative status;
  
  determining if the speculative status instruction is branch speculative or trap speculative;
  
  if the speculative status instruction is neither branch speculative nor trap speculative, checking to see if all scalar operands for the speculative status instruction are present;
  
  if all scalar operands for the speculative status instruction are present, moving the speculative status instruction to a “scalar committed” status and issuing a scalar commitment notice from the active list to the one or more vector sections;
  
  checking to see if all vector operands for the scalar committed status instruction are present;
  
  if all vector operands for the scalar committed status instruction are present, moving the scalar committed status instruction to a “committed” status;
  
  checking to see if all instructions previous to the committed status instruction are completed; and
  
  if all instructions previous to the committed status instruction are completed, moving the committed status instruction to a “graduated” status; and
  
  maintaining memory consistency between multiple vector memory references and between vector and scalar memory references by guaranteeing no vector store reference can be sent to memory prior to a scalar or vector load that occurs earlier in the program order.

2. The method of claim 1, wherein maintaining memory consistency includes synchronizing memory references by executing a predefined Lsync operation within a local SSP and a predefined Msync operation among SSPs.

3. The method of claim 2, wherein the computer system includes two or more MSPs and a Gsync operation for synchronizing across the two or more MSPs, wherein maintaining memory consistency further includes executing the Gsync operation among all participating SSPs of the two or more MSPs.

4. The method of claim 1, wherein maintaining memory consistency includes monitoring a reference sent bit in the active list.
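
The status progression recited in claim 1 (speculative, scalar committed, committed, graduated) can be read as a small state machine over the active list. The following is a minimal sketch of that reading, not the patented hardware: the class names, the boolean fields, the `completed` flag, and the one-transition-per-`step` policy are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    SPECULATIVE = auto()
    SCALAR_COMMITTED = auto()
    COMMITTED = auto()
    GRADUATED = auto()


@dataclass
class Entry:
    """One active-list slot. All field names are hypothetical."""
    name: str
    branch_speculative: bool = False
    trap_speculative: bool = False
    scalar_operands_ready: bool = False
    vector_operands_ready: bool = False
    completed: bool = False  # assumed stand-in for "instruction is completed"
    status: Status = Status.SPECULATIVE


class ActiveList:
    def __init__(self):
        self.entries = []  # kept in program order
        self.notices = []  # scalar commitment notices sent to the vector sections

    def place(self, entry):
        # Every instruction enters the list in order with speculative status.
        entry.status = Status.SPECULATIVE
        self.entries.append(entry)

    def step(self):
        # Advance each entry by at most one status per call.
        for i, e in enumerate(self.entries):
            if e.status is Status.SPECULATIVE:
                speculative = e.branch_speculative or e.trap_speculative
                if not speculative and e.scalar_operands_ready:
                    e.status = Status.SCALAR_COMMITTED
                    self.notices.append(e.name)
            elif e.status is Status.SCALAR_COMMITTED:
                if e.vector_operands_ready:
                    e.status = Status.COMMITTED
            elif e.status is Status.COMMITTED:
                # Graduate only when every earlier instruction is completed.
                if all(p.completed for p in self.entries[:i]):
                    e.status = Status.GRADUATED


active = ActiveList()
load = Entry("load", scalar_operands_ready=True,
             vector_operands_ready=True, completed=True)
vstore = Entry("vstore", scalar_operands_ready=True)
active.place(load)
active.place(vstore)
for _ in range(3):
    active.step()
print(load.status.name, vstore.status.name)
```

In this sketch the vector store stalls at scalar committed until its vector operands arrive, while the earlier load proceeds to graduated.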

5. An apparatus comprising:
- a shared memory;
  
  one or more multistream processors (MSPs), wherein each MSP includes a plurality of single stream processors (SSPs) including a first SSP, each one of the plurality of SSPs having a scalar section and one or more vector sections, wherein each of the SSPs is operatively coupled to the memory;
  
  means for defining program order between operations on the first SSP;
  
  means for defining operation dependence order of vector memory references to the memory with respect to each other and with respect to scalar memory references to the memory;
  
  means for maintaining a minimal guarantee on the ordering on the first SSP using an active list located in the scalar section, wherein means for maintaining includes:
  
  means for placing each instruction in order in the active list, wherein means for placing includes means for initializing each instruction to a speculative status;
  
  means for determining if the speculative status instruction is branch speculative or trap speculative;
  
  means for, if the speculative status instruction is neither branch speculative nor trap speculative, checking to see if all scalar operands for the speculative status instruction are present;
  
  means for, if all scalar operands for the speculative status instruction are present, moving the speculative status instruction to a “scalar committed” status and issuing a scalar commitment notice from the active list to the one or more vector sections;
  
  means for checking to see if all vector operands for the scalar committed status instruction are present;
  
  means for, if all vector operands for the scalar committed status instruction are present, moving the scalar committed status instruction to a “committed” status;
  
  means for checking to see if all instructions previous to the committed status instruction are completed; and
  
  means for, if all instructions previous to the committed status instruction are completed, moving the committed status instruction to a “graduated” status; and
  
  means for maintaining memory consistency between the plurality of single stream processors (SSPs) by guaranteeing no vector store reference can be sent to memory prior to a scalar or vector load that occurs earlier in the program order.

6. The apparatus of claim 5, wherein means for maintaining memory consistency includes means for synchronizing memory references by executing a predefined Lsync operation within a local SSP and a predefined Msync operation among SSPs.

7. The apparatus of claim 6, wherein the apparatus includes two or more MSPs and a Gsync operation for synchronizing across the two or more MSPs, wherein means for maintaining memory consistency further includes means for executing the Gsync operation among all participating SSPs of the two or more MSPs.

8. The apparatus of claim 5, wherein means for maintaining memory consistency includes means for monitoring a reference sent bit in the active list.
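
The closing element of claim 5, together with the reference sent bit of claim 8, suggests a simple gating check: a vector store is held back while any load earlier in program order has not yet been sent to memory. The sketch below illustrates that check under stated assumptions. The claims do not say how the bit is consulted, so `MemRef`, its `kind` strings, and `may_send_vector_store` are hypothetical names.

```python
from dataclasses import dataclass

LOAD_KINDS = ("scalar_load", "vector_load")


@dataclass
class MemRef:
    """A memory reference tracked in program order (illustrative only)."""
    kind: str                     # "scalar_load", "vector_load" or "vector_store"
    reference_sent: bool = False  # assumed meaning: reference already sent to memory


def may_send_vector_store(program_order, index):
    """Return True if every load earlier in program order has been sent."""
    if program_order[index].kind != "vector_store":
        raise ValueError("entry at index is not a vector store")
    earlier_loads = (r for r in program_order[:index] if r.kind in LOAD_KINDS)
    return all(r.reference_sent for r in earlier_loads)


refs = [
    MemRef("scalar_load"),
    MemRef("vector_load", reference_sent=True),
    MemRef("vector_store"),
]
blocked = may_send_vector_store(refs, 2)   # scalar load not yet sent
refs[0].reference_sent = True
released = may_send_vector_store(refs, 2)  # all earlier loads now sent
print(blocked, released)
```

Only loads gate the store here, because the guarantee quoted in the claim constrains vector stores relative to earlier scalar or vector loads.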

9. A computer processing method comprising:
- providing a memory having a plurality of addressable locations;
  
  providing a multistream processor (MSP), wherein the MSP includes a plurality of single stream processors (SSPs) including a first SSP, each one of the plurality of SSPs having a scalar section and one or more vector sections, wherein each of the plurality of SSPs is operatively coupled to the memory;
  
  defining program order between operations on the first SSP;
  
  defining operation dependence order of vector memory references to the memory with respect to each other and with respect to scalar memory references to the memory;
  
  serializing writes to any given one of the plurality of addressable locations of memory in the order using an active list in the scalar section, wherein serializing includes:
  
  placing each instruction in order in the active list, wherein placing includes initializing each instruction to a speculative status;
  
  determining if the speculative status instruction is branch speculative or trap speculative;
  
  if the speculative status instruction is neither branch speculative nor trap speculative, checking to see if all scalar operands for the speculative status instruction are present;
  
  if all scalar operands for the speculative status instruction are present, moving the speculative status instruction to a “scalar committed” status and issuing a scalar commitment notice from the active list to the one or more vector sections;
  
  checking to see if all vector operands for the scalar committed status instruction are present;
  
  if all vector operands for the scalar committed status instruction are present, moving the scalar committed status instruction to a “committed” status;
  
  checking to see if all instructions previous to the committed status instruction are completed; and
  
  if all instructions previous to the committed status instruction are completed, moving the committed status instruction to a “graduated” status;
  
  making a write globally visible when no one of the plurality of SSPs can read the value produced by an earlier write in a sequential order of writes to that location;
  
  preventing an SSP from reading a value written by another MSP before that value becomes globally visible; and
  
  performing memory consistency between the plurality of single stream processors (SSPs) by guaranteeing no vector store reference can be sent to memory prior to a scalar or vector load that occurs earlier in the program order.

10. The method of claim 9, wherein performing memory consistency includes synchronizing memory references by executing a predefined Lsync operation within a local SSP and a predefined Msync operation among SSPs.

11. The method of claim 10, wherein the computer system includes two or more MSPs and a Gsync operation for synchronizing across the two or more MSPs, wherein performing memory consistency further includes executing the Gsync operation among all participating SSPs of the two or more MSPs.

12. The method of claim 9, wherein performing includes monitoring a reference sent bit in the active list.
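
Claim 9 treats a write as globally visible once no SSP can still read the value of an earlier write in the serialized order of writes to that location. One way to picture this definition, purely as an assumption-laden sketch, is to number the writes to a single location and record, per SSP, the newest write that SSP can currently read. The function name and the dictionary representation are hypothetical.

```python
def globally_visible(write_index, newest_readable_by_ssp):
    """Check global visibility of one write to a single memory location.

    write_index: position of the write in the serialized order of writes.
    newest_readable_by_ssp: for each SSP, the position of the newest write
        whose value that SSP can currently read (illustrative model).
    """
    # Visible globally when no SSP can still read an earlier write's value.
    return all(seen >= write_index for seen in newest_readable_by_ssp.values())


lagging = globally_visible(2, {"ssp0": 2, "ssp1": 1})   # ssp1 still reads write 1
caught_up = globally_visible(2, {"ssp0": 2, "ssp1": 3})
print(lagging, caught_up)
```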

13. An apparatus comprising:
- a memory having a plurality of addressable locations;
  
  one or more multistream processors (MSPs), wherein each MSP includes a plurality of single stream processors (SSPs) including a first SSP, each one of the plurality of SSPs having a scalar section and one or more vector sections, wherein each of the SSPs is operatively coupled to the memory;
  
  means for defining program order between operations on the first SSP;
  
  means for defining operation dependence order of vector memory references to the memory with respect to each other and with respect to scalar memory references to the memory;
  
  means for serializing writes to any given one of the plurality of addressable locations of memory in the order using an active list in the scalar section, wherein means for serializing includes:
  
  means for placing each instruction in order in the active list, wherein means for placing includes means for initializing each instruction to a speculative status;
  
  means for determining if the speculative status instruction is branch speculative or trap speculative;
  
  means for, if the speculative status instruction is neither branch speculative nor trap speculative, checking to see if all scalar operands for the speculative status instruction are present;
  
  means for, if all scalar operands for the speculative status instruction are present, moving the speculative status instruction to a “scalar committed” status and issuing a scalar commitment notice from the active list to the one or more vector sections;
  
  means for checking to see if all vector operands for the scalar committed status instruction are present;
  
  means for, if all vector operands for the scalar committed status instruction are present, moving the scalar committed status instruction to a “committed” status;
  
  means for checking to see if all instructions previous to the committed status instruction are completed; and
  
  means for, if all instructions previous to the committed status instruction are completed, moving the committed status instruction to a “graduated” status;
  
  means for making a write globally visible when no one of the plurality of SSPs can read the value produced by an earlier write in a sequential order of writes to that location;
  
  means for preventing an SSP from reading a value written by another MSP before that value becomes globally visible; and
  
  means for performing memory consistency between the plurality of single stream processors (SSPs) by guaranteeing no vector store reference can be sent to memory prior to a scalar or vector load that occurs earlier in the program order.

14. The apparatus of claim 13, wherein means for performing memory consistency includes means for executing a predefined Lsync operation within a local SSP and a predefined Msync operation among SSPs.

15. The apparatus of claim 14, wherein the computer system includes two or more MSPs and a Gsync operation for synchronizing across the two or more MSPs, wherein means for performing memory consistency further includes means for executing the Gsync operation among all the participating SSPs of the two or more MSPs.

16. The apparatus of claim 13, wherein means for performing includes means for monitoring a reference sent bit in the active list.

Current Assignee: CRAY Incorporated (Hewlett-Packard Enterprise Company)
Original Assignee: CRAY Incorporated (Hewlett-Packard Enterprise Company)
Inventors: Scott, Steven L.; Faanes, Gregory J.; Stephenson, Brick; Moore, William T. Jr.; Kohn, James R.
Primary Examiner(s): FENNEMA, ROBERT E
Application Number: US10/643,754
Time in Patent Office: 3,368 Days
Field of Search: 712/34
US Class Current: 712/34
CPC Class Codes

G06F 12/0817   using directory methods

G06F 12/0855   Overlapped cache accessing,...

G06F 12/0862   with prefetch

G06F 12/1027   using associative or pseudo...

G06F 9/30018   Bit or string instructions

G06F 9/30036   Instructions to perform ope...

G06F 9/30038   using a mask

G06F 9/3004   to perform operations on me...

G06F 9/30087   Synchronisation or serialis...

G06F 9/30094   Condition code generation, ...

G06F 9/34   Addressing or accessing the...

G06F 9/3455   using stride

G06F 9/3824   Operand accessing

G06F 9/3834   Maintaining memory consistency

G06F 9/3836   Instruction issuing, e.g. d...

G06F 9/3861   Recovery, e.g. branch miss-...

G06F 9/52   Program synchronisation; Mu...

G06F 9/522   Barrier synchronisation
