STREAM PROCESSOR WITH HIGH BANDWIDTH AND LOW POWER VECTOR REGISTER FILE
First Claim
1. A system comprising:
a memory; and
a processor coupled to the memory, wherein the processor comprises:
a vector register file;
a source operand buffer;
a vector arithmetic logic unit (VALU); and
a vector destination cache for storing results of instructions executed by the VALU;
wherein the processor is configured to:
evict a first cache line from the vector destination cache; and
write the first cache line to the source operand buffer responsive to determining that the first cache line comprises one or more source operands targeted by a pending instruction.
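The eviction step recited in the claim can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the patented implementation: the class names (`Processor`, `PendingInstruction`), the two-line cache capacity, the least-recently-written eviction policy, and the write-back of evicted lines to the vector register file are all hypothetical choices made for illustration. Only the conditional forwarding of an evicted line to the source operand buffer comes from the claim language.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class PendingInstruction:
    """Hypothetical record of an instruction still waiting on source registers."""
    name: str
    source_regs: frozenset


@dataclass
class Processor:
    """Toy model; capacity and eviction policy are assumptions, not from the claim."""
    dest_cache_capacity: int = 2
    dest_cache: OrderedDict = field(default_factory=OrderedDict)  # reg -> cache line
    source_operand_buffer: dict = field(default_factory=dict)
    vector_register_file: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def write_result(self, reg, line):
        """Store a VALU result in the destination cache, evicting first if full."""
        if reg in self.dest_cache:
            self.dest_cache.move_to_end(reg)
        elif len(self.dest_cache) >= self.dest_cache_capacity:
            self.evict()
        self.dest_cache[reg] = line

    def evict(self):
        """Evict the least recently written line and return its register name."""
        reg, line = self.dest_cache.popitem(last=False)
        # Assumption: evicted lines are also written back to the register file.
        self.vector_register_file[reg] = line
        # Claimed step: forward the line to the source operand buffer only when
        # a pending instruction targets it as a source operand.
        if any(reg in instr.source_regs for instr in self.pending):
            self.source_operand_buffer[reg] = line
        return reg
```

In this sketch, if a pending instruction names `v0` as a source and `v0` is evicted, the line lands in `source_operand_buffer`, so the pending instruction would not need a separate register file read for it; lines that no pending instruction needs go only to the register file.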
Abstract
Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit (VALU) and a high bandwidth, low power vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAM's output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional R/W bandwidth for the vector register file.
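The idea of using each RAM bank's output flops as a last level cache can be sketched as follows. This is an illustrative model only: the class name `BankedVectorRegisterFile`, the modulo bank-selection scheme, the bank and row counts, and the invalidate-on-write behavior are assumptions not stated in the abstract. The sketch shows just the general effect, namely that a repeated request for the row most recently read from a bank can be served from the flops without another RAM access.

```python
class BankedVectorRegisterFile:
    """Toy multi-bank register file; each bank's output flops hold the last row read."""

    def __init__(self, num_banks=4, rows_per_bank=8):
        self.num_banks = num_banks
        self.banks = [[0] * rows_per_bank for _ in range(num_banks)]
        # One (row, data) entry per bank, or None when nothing valid is latched.
        self.output_flops = [None] * num_banks
        self.ram_reads = 0  # counts accesses that actually reach a RAM bank

    def _locate(self, reg):
        # Assumption: registers are interleaved across banks by index.
        return reg % self.num_banks, reg // self.num_banks

    def write(self, reg, value):
        bank, row = self._locate(reg)
        self.banks[bank][row] = value
        # Assumption: a write invalidates a latched copy of the same row.
        flop = self.output_flops[bank]
        if flop is not None and flop[0] == row:
            self.output_flops[bank] = None

    def read(self, reg):
        bank, row = self._locate(reg)
        flop = self.output_flops[bank]
        if flop is not None and flop[0] == row:
            return flop[1]  # duplicate request served from the output flops
        self.ram_reads += 1
        data = self.banks[bank][row]
        self.output_flops[bank] = (row, data)
        return data
```

Two back-to-back instructions that both read the same register would, in this model, cost one RAM read rather than two; a read of a different row in the same bank replaces the latched entry.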
20 Claims
1. A system comprising:
a memory; and
a processor coupled to the memory, wherein the processor comprises:
a vector register file;
a source operand buffer;
a vector arithmetic logic unit (VALU); and
a vector destination cache for storing results of instructions executed by the VALU;
wherein the processor is configured to:
evict a first cache line from the vector destination cache; and
write the first cache line to the source operand buffer responsive to determining that the first cache line comprises one or more source operands targeted by a pending instruction.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method comprising:
evicting a first cache line from the vector destination cache; and
writing the first cache line to the source operand buffer responsive to determining that the first cache line comprises one or more source operands targeted by a pending instruction.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An apparatus comprising:
a vector register file;
a source operand buffer;
a vector arithmetic logic unit (VALU); and
a vector destination cache for storing results of instructions executed by the VALU;
wherein the apparatus is configured to:
evict a first cache line from the vector destination cache; and
write the first cache line to the source operand buffer responsive to determining that the first cache line comprises one or more source operands targeted by a pending instruction.
Dependent claims: 16, 17, 18, 19, 20
Specification